JP4907483B2 - Video display device

Info

Publication number: JP4907483B2
Application number: JP2007255906A
Authority: JP
Inventors: 茂則前田; 幸太郎坂田; 俊弥中; 啓介早田; 智英石上; 知成高橋
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2007-09-28
Filing date: 2007-09-28
Publication date: 2012-03-28
Anticipated expiration: 2027-09-28
Also published as: JP2009087026A

Description

The present invention relates to a video display device such as a TV, and in particular to a video display device that is controlled not only by conventional button input but also by user sensing information such as the user's position and orientation, line of sight, hand gestures, and hand and arm movements.

The present invention also relates to a method for controlling the applications and services provided by the video display device, which assumes operation by multiple people and performs control through combinations of their operations.

Furthermore, the present invention relates to a scheme that exploits the large screen of the video display device for life-size display and natural interaction, creating a sense of presence as if the user were actually at the displayed location.

As screens become larger and thinner, the TV is no longer used only for watching programs and movies. New uses are becoming possible: using several sources of information at once, browsing large amounts of information at a glance, communicating with life-size realism, serving as a bulletin board that always presents information, and acting as interior decoration in the manner of wallpaper or a picture frame.

In addition, with the rise of home networks, it is becoming realistic to share each user's operation history and state, as detected by individual devices, over the network and to sense the user's context and situation comprehensively (see, for example, Patent Document 1).

Under these circumstances, users should be able not only to watch conventional programs and movies but also to use, with ease, the more complex functions that these new uses require. This calls for new operation methods that can be performed intuitively, and for an autonomous operation method that uses user sensing information to anticipate the user's situation and intent and thereby reduces the amount and burden of explicit operation.

In addition, to support uses beyond passive viewing while taking advantage of the fact that a TV is installed in a space shared by the family, such as the living room, it is desirable to adopt an input method and operation scheme that assume operation by multiple people.
JP 2004-246856 A

The present invention solves the above problems. Its object is to provide a video display device that multiple people can operate intuitively, by controlling the application that displays video in a manner corresponding to each of one or more viewing target users.

A further object is to provide a method for controlling the services and applications provided by such a video display device.

To solve the above problems, the video display device of the present invention is a video display device operated by one or more users, comprising: identification means for identifying the one or more users; detection means for detecting the positions and motions of the one or more users; display means for displaying video; determination means for determining, from among the identified users and based on the detected positions and motions, the one or more users who are viewing targets of the displayed video; control means for controlling an application that displays the video; and a user database that stores user attribute information. The attribute information includes body feature information containing at least body information, and human relationship information between the one or more users. The control means controls the application that displays the video in a manner corresponding to the attribute information of each of the one or more viewing target users.
Here, the human relationship information between the one or more users may be information on the degree of intimacy between users, and the control means may determine the extent to which information may be shared according to that degree of intimacy and control the application that displays the video accordingly.
The video display device may further control an application that operates in cooperation with a counterpart video display device installed at another location and connected over a network, and may comprise: data receiving means for receiving operation information transmitted from the counterpart video display device; life-size display conversion means for scale-converting, based on the presentation screen size, those parts of the operation information that concern objects presented at actual size, such as life-size video captured by the counterpart video display device, and the operations on those objects; life-size information adding means for adding to the operation information, based on the user's size information and user position information, the life-size information needed for life-size display on the counterpart video display device; and data transmitting means for transmitting the operation information with the life-size information added, together with the life-size video, to the counterpart video display device. The control means may then process the scale-converted operation information together with the input information to control the application.
The video display device may further comprise remote controls that acquire explicit input from users, and the control means may control the application based on the relative positions of a plurality of the remote controls.
The remote control may be spherical so that the user can throw it or roll it across a surface, and the control means may control the application that displays the video based on the remote control's trajectory or stopping position.

The video display device of the present invention is further characterized in that the control means adjusts the display position and/or size of the video to be viewed according to at least the viewing position of each of the one or more viewing target users.

The video display device of the present invention is also characterized in that it comprises a user database that stores user attribute information, the attribute information includes body feature information containing at least height information, and the control means controls the application that displays the video in a manner corresponding to the attribute information of each of the one or more viewing target users.

The video display device of the present invention is also characterized in that it comprises management means for managing the viewing state of each viewing target user, namely whether the user is standing or sitting. The body feature information includes height and eye height when standing and height and eye height when sitting, and the control means controls the application that displays the video based on the viewing state and body feature information of the viewing target user.

The video display device of the present invention is also characterized in that the body feature information includes dominant eye information and visual acuity information, and the control means controls the application that displays the video based on the dominant eye information and visual acuity information of the viewing target user.

According to the video display device of the present invention, the application that displays video is controlled for each of one or more viewing target users according to that user's attributes and situation, so viewing target users can switch seamlessly among multiple modals. This preserves the ease of use that a TV has compared with a PC, while providing a foundation for varied ways of enjoying TV among multiple people that go beyond watching conventional programs and movies.

The best mode for carrying out the video display device and control method according to the present invention is described below with reference to the accompanying drawings.

(Embodiment 1)
FIG. 1 illustrates an example of the external appearance of the video display device in this embodiment and its interfaces to related devices. In addition to an antenna for receiving broadcast programs, the video display device includes one or more user detection cameras. The video display device is controlled according to the user's position and movement, which are detected by analyzing the image information captured by the user detection cameras.

The video display device may also be controlled by the user's remote control operations, such as the movement of the user's hand holding the gesture input remote control or the pressing of buttons on that remote control.

Although not shown in FIG. 1, the video display device may have multiple speakers placed apart from one another, for example at the top and bottom edges or at the left and right edges of the device.

Controlling the speaker output according to the user's attributes and position, the state of the application, and so on can give the user a stronger sense of presence.

The video display device is also network-connected to other home appliances, such as a digital still camera or a digital camcorder, wirelessly or via a router or hub.

The video display device can present on its screen the digital information and operation menus received from those devices, and can send back to those devices the user's operations on the presented menus.

Furthermore, by connecting over a network to a video display device installed at another location, the video display device lets users at separate locations share digital information such as photos and videos.

The video display device also lets users work together, for example to assemble a slide show from multiple photos.

In particular, when the Internet is used as the means of network connection, users far apart from each other can share information and collaborate.

(Embodiment 2)
FIG. 2 shows an example of a situation in which multiple users are present in a room containing the video display device of this embodiment.

FIG. 3 is a block diagram showing the functions of the video display device, and FIG. 4 outlines the data structure of the user information DB (database) shown in FIG. 3. FIG. 5 is a flowchart outlining the processing of the video display device.

The function of each block in FIG. 3 is described below.
The user detection camera 100 captures images of users in front of the video display device. The user position detection means 101 extracts the user region from the multiple images captured by the user detection camera 100 and then calculates the position of the user relative to the video display device by the principle of stereo vision, using the correspondence between the user regions in those images. The user identification means 102 extracts the face region, matches the extracted face image against face images registered in advance, and outputs user identification information that identifies the user. The line-of-sight detection means 103 calculates the user's gaze direction from the position of the dark (iris) region of the user's eyes. The hand position/shape detection means 104 extracts the user's hand region, obtains the hand position, checks which predefined shape the hand shape matches, and outputs hand shape information such as "rock" (closed fist), "paper" (open hand), "select right", or "select left".

The gesture input remote control 105 consists of button input means 1051, which outputs a button state indicating which of the buttons is being pressed by the user; a motion sensor 1052, which detects the movement of the hand holding the gesture input remote control 105; and gesture identification means 1053. The motion sensor 1052 outputs hand movement information made up of translation and rotation. The gesture identification means 1053 matches the output of the motion sensor against predefined hand movement (gesture) information and outputs gesture type information, such as "shake", "turn", or "throw", together with an operation amount.

The application control means 106 interprets the following as input information: the user position output by the user position detection means 101, the user identification information output by the user identification means 102, the gaze direction output by the line-of-sight detection means 103, the hand position and hand shape information output by the hand position/shape detection means 104, the button state output by the button input means 1051, and the gesture type information output by the gesture identification means 1053. In addition to this input information, the application control means 106 controls the application using input from the user information DB 107A, the user state management means 107B, the modal control means 107C, and the viewing state determination means 107D.

As shown in FIG. 4, the user information DB 107A stores basic attribute information, body feature information, human relationship information, and the like. The basic attribute information includes, for example, name, sex, age, birthday, and family relationship. The user state management means 107B manages the user state of each user. The modal control means 107C selects the applicable modal for each user and then, based on its combination with the current state, decides the function to execute next and the state to transition to. The viewing state determination means 107D determines which of the one or more identified users are viewing targets of the displayed video, taking into account at least the detected position or motion of each user.

When the content drawn on the screen is to be updated, the application control means 106 outputs the update information to the screen drawing means 110. The screen drawing means 110 presents the drawn content on the screen 111.

The application control means 106 can select the information to present to a user based on the basic attribute information. For example, the video display device can select and display news that is relevant to, or likely to interest, each user in light of that user's age and occupation. Taking the users in FIG. 4 as an example, two weeks before December 21, the birthday of Hanako Matsushita, the device can display a message prompting her husband Taro to buy Hanako a birthday present, and can show the history of past presents and recommended present candidates. On the birthday itself, December 21, it can also display a congratulatory message for Hanako.
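The birthday reminder described above can be sketched as a simple date check. This is a minimal illustration only: the function name `birthday_reminder_due` and the choice to keep showing the reminder throughout the two weeks before the birthday are assumptions, and February 29 birthdays are not handled.

```python
from datetime import date, timedelta

def birthday_reminder_due(today: date, month: int, day: int,
                          lead: timedelta = timedelta(weeks=2)) -> bool:
    """Return True while the next birthday is 1 day to `lead` away.

    Hypothetical helper: the source only says the message appears
    two weeks before the birthday.
    """
    birthday = date(today.year, month, day)
    if birthday < today:
        # This year's birthday has passed; look at next year's.
        birthday = date(today.year + 1, month, day)
    remaining = birthday - today
    return timedelta(0) < remaining <= lead
```

With Hanako's birthday of December 21, the check becomes true on December 7 and false again on the birthday itself, when a congratulatory message would be shown instead.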

The body feature information stored in the user information DB 107A covers the user's body shape in each posture and the user's viewing and listening ability: height and eye height when standing, height and eye height when sitting, dominant hand, dominant eye, visual acuity, hearing, and so on.

Based on this body feature information, the application control means 106 can present information at a position and size that is easy for the user to see and at a volume that is easy to hear. For example, when the device recognizes that the user is standing, it can display information centered on the user's eye height, using the stored standing eye height. The video display device can also enlarge the display for users with poor eyesight, and may fine-tune the display based on the dominant eye information.

Assuming that a one-handed hand gesture is made with the dominant hand, the display position of an icon or other prompt for a hand gesture can be decided from the dominant hand information. That is, for a right-handed user the prompt is displayed at the screen position corresponding to slightly right of the user's position, and for a left-handed user at the screen position corresponding to slightly left of the user's position, which makes the interaction more natural.
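The placement rule for gesture prompts can be sketched as follows. The coordinate convention (horizontal position in millimetres from the left edge of the screen as seen by the user), the 300 mm offset, and the function name `gesture_icon_x` are all assumptions made for illustration; the source does not specify them.

```python
def gesture_icon_x(user_x_mm: float, dominant_hand: str,
                   screen_width_mm: float, offset_mm: float = 300.0) -> float:
    """Horizontal screen position for a hand gesture prompt.

    user_x_mm is the screen position directly in front of the user.
    The prompt is shifted toward the user's dominant hand and clamped
    so that it stays on the screen.
    """
    shift = offset_mm if dominant_hand == "right" else -offset_mm
    return min(max(user_x_mm + shift, 0.0), screen_width_mm)
```

For a user standing in front of the 1000 mm mark of a 2000 mm wide screen, this places the prompt at 1300 mm for a right-handed user and at 700 mm for a left-handed user.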

Furthermore, the human relationship information stored in the user information DB 107A holds the degree of intimacy between users registered in the database as a value from 0.0 to 1.0, as shown for example in FIG. 4(c). Based on this degree of intimacy, the application control means 106 can control, among other things, whether information may be shared between users.
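One way to picture intimacy-based sharing control is a lookup table of pairwise scores with a threshold. The sketch below is an assumption-laden illustration: the table layout, the 0.5 threshold, the rule that every current viewer must meet the threshold, and the names `intimacy` and `can_share` are hypothetical, and the scores are invented example values rather than the contents of FIG. 4(c).

```python
def intimacy(table: dict, a: str, b: str) -> float:
    """Look up the symmetric intimacy score (0.0 to 1.0) for two users."""
    if a == b:
        return 1.0
    # Unregistered pairs are treated as strangers (assumption).
    return table.get(frozenset((a, b)), 0.0)

def can_share(table: dict, owner: str, viewers: list, required: float = 0.5) -> bool:
    """Allow display only if every viewer is close enough to the owner."""
    return all(intimacy(table, owner, v) >= required for v in viewers)

# Hypothetical scores for illustration only.
scores = {
    frozenset(("Taro", "Hanako")): 0.9,
    frozenset(("Taro", "Guest")): 0.2,
}
```

Under these example values, Taro's content could be shown while only Hanako is watching, but not while the guest is also a viewing target.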

In this way, the video display device of the present invention controls the application that displays video in a manner corresponding to each of the one or more viewing target users.

For an application that operates in cooperation with a video display device installed at another location and connected over a network, the data receiving means 108 receives the operation information transmitted from the counterpart video display device. The life-size display conversion means 109 scale-converts, based on the presentation screen size, those parts of the operation information that depend on display size (information on objects presented at actual size, such as life-size video captured by the counterpart video display device, and the operations on those objects). The application control means 106 processes the scale-converted operation information together with the input information and then controls the application.
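The scale conversion for life-size presentation amounts to converting physical lengths into pixels using the physical size of the local screen. The following is a minimal sketch under that assumption; the function names and the millimetre and pixel units are illustrative choices, not taken from the source.

```python
def pixels_per_mm(screen_width_px: int, screen_width_mm: float) -> float:
    """Pixel density of the presenting screen along its width."""
    return screen_width_px / screen_width_mm

def life_size_pixels(length_mm: float, screen_width_px: int,
                     screen_width_mm: float) -> float:
    """Pixels needed to show a physical length at actual size."""
    return length_mm * pixels_per_mm(screen_width_px, screen_width_mm)
```

For example, on a screen 1000 mm wide with 2000 horizontal pixels, a person 1700 mm tall would need to be drawn 3400 pixels tall to appear life-size, which also shows when a subject cannot fit on a given screen at actual size.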

The application control means 106 also generates operation information for the counterpart video display device. The application control means 106 may pass to the life-size information adding means 112 the user's size information, which it obtains from the user information DB 107A based on the user identification information input from the user identification means 102, and the user position information input from the user position detection means 101. When the input operation information is life-size video captured by the user detection camera 100, the life-size information adding means 112 adds to the operation information, based on the user's size information and user position information, the life-size information needed for life-size display on the counterpart video display device. The data transmitting means 113 transmits the operation information with the life-size information added, together with the life-size video, to the counterpart video display device.

The processing flow of the video display device is now described with reference to the flowchart of FIG. 5.
First, when the user detection camera 100 detects a face, the user identification means 102 identifies the user by matching against the body feature information stored in the pre-registered user information DB 107A (S201). For each identified user, the user position detection means 101 calculates user position information, the line-of-sight detection means 103 calculates gaze direction information, and the hand position/shape detection means 104 calculates hand position and hand shape information (S202). The viewing state determination means 107D then determines which of the one or more users identified in S201 are viewing targets of the displayed video, taking into account at least the detected position or motion of each user (S203).

Taking the use scene shown in FIG. 2 as an example, user A is standing and moving about near the video display device and is therefore determined to be a viewing target user. User B is sitting on the sofa facing the video display device and is therefore determined to be a viewing target user. User C is standing with his or her back to the video display device and moving away from it, and is therefore determined not to be a viewing target user.
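The three cases in this use scene can be reproduced with a small rule-based sketch. The source gives only the examples, not the decision rule, so the rule below, the 2.0 m proximity threshold, and the names `UserObservation` and `is_viewing_target` are assumptions chosen to match users A, B, and C.

```python
from dataclasses import dataclass

@dataclass
class UserObservation:
    distance_m: float      # distance from the screen
    facing_screen: bool    # body or face oriented toward the screen
    moving_away: bool      # currently moving away from the screen

def is_viewing_target(u: UserObservation, near_m: float = 2.0) -> bool:
    """Hypothetical rule for deciding whether a user is a viewing target."""
    if not u.facing_screen and u.moving_away:
        return False  # like user C: back turned and leaving
    # Like user B (facing the screen) or user A (close to the screen).
    return u.facing_screen or u.distance_m <= near_m

user_a = UserObservation(distance_m=1.5, facing_screen=False, moving_away=False)
user_b = UserObservation(distance_m=3.0, facing_screen=True, moving_away=False)
user_c = UserObservation(distance_m=2.5, facing_screen=False, moving_away=True)
```

A real implementation would presumably combine position, gaze, and motion over time rather than a single snapshot.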

When more than one user is determined to be a viewing target user (YES in S204), the modal control means 107C determines the form of collaborative work from the user position information and the current application information (S205).

Collaborative work takes two broad forms: one in which several people share the same task at the same time, and one in which several people carry out separate tasks at the same time. The former further includes a form in which one person takes the lead and another gives advice, and a form in which several people work on an equal footing.

Next, the modal control means 107C determines the operation modal for each user from the user position information, gaze direction information, and hand position and hand shape information calculated in S202 and from the current application state (S206). An operation modal here is one of the four methods described below: the operation method based on user position, the operation method based on gaze and face orientation, the operation method based on freehand gestures, and the operation method based on the gesture input remote control. Once the operation modal is determined, an operation command is generated from the corresponding input information (S207). For the freehand gesture method, for example, the command is generated from hand shape information such as "rock", "paper", "select right", or "select left".
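The step from operation modal to operation command (S206 to S207) can be sketched as an enumeration plus a lookup. The four modal names follow the text; the command names in `HAND_SHAPE_COMMANDS` and the dictionary-based input format are hypothetical, since the source does not say which command each hand shape produces.

```python
from enum import Enum
from typing import Optional

class OperationModal(Enum):
    USER_POSITION = "user position"
    GAZE_FACE = "gaze / face orientation"
    FREEHAND_GESTURE = "freehand gesture"
    GESTURE_REMOTE = "gesture input remote control"

# Hypothetical mapping from hand shape to command, for illustration only.
HAND_SHAPE_COMMANDS = {
    "rock": "grab",
    "paper": "release",
    "select_right": "next",
    "select_left": "previous",
}

def make_command(modal: OperationModal, input_info: dict) -> Optional[str]:
    """Generate an operation command from the input that matches the modal."""
    if modal is OperationModal.FREEHAND_GESTURE:
        return HAND_SHAPE_COMMANDS.get(input_info.get("hand_shape"))
    # Other modals would read their own inputs (position, gaze, remote).
    return None
```

Keeping the modal decision separate from command generation mirrors the split between S206 and S207 in the flowchart.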

If the application requires life-size display (YES in S208), the scale information needed for life-size display is acquired (S209) and the life-size display information is generated (S210).

The application control means 106 then decides the function to execute next from the operation command generated in S207 and the current application state (S211), and executes that function (S212).

(Embodiment 3)
FIG. 6 shows how the user position detection means 101 in this embodiment calculates the user position based on the principle of stereo vision.

As shown in FIG. 6(a), the user detection cameras 100 are installed as a pair, separated by a distance B and parallel to the screen of the video display device. The user position detection means 101 calculates the distance D between the user and the screen of the video display device from the offset between the positions of the corresponding user regions in the images captured by the two cameras. The region of each image in which the user appears can be extracted, for example, by storing in advance an image captured by each camera with no user present and taking the difference from the image captured when a user appears. Alternatively, the user's face region can be found by face image detection and face image matching, and that face region used as the user region.

FIG. 6(b) shows the principle of stereo vision used to obtain the distance D between the user and the camera installation plane (the screen of the video display device) from the corresponding user regions in the two images. Taking the corresponding user region in the images from the two cameras as the object whose position is to be measured, its image is projected onto the two image planes as shown in FIG. 6(b). If Z is the offset between the corresponding images, then from the focal length f of the cameras and the distance B between their optical axes, the distance D between the user and the video display device is given by
D = f × B / Z
The user position in the direction parallel to the screen of the video display device can be obtained from the position of the user region in the image and the distance D. The position of the user relative to the video display device obtained in this way is input from the user position detection means 101 to the application control means 106.
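The stereo relation D = f × B / Z translates directly into code. In the sketch below, the unit choices (f and Z in pixels, B in metres, so D comes out in metres) are assumptions, and `lateral_position` uses the standard pinhole camera relation X = x × D / f, which the source implies but does not write out.

```python
def stereo_distance(focal_length_px: float, baseline_m: float,
                    disparity_px: float) -> float:
    """Distance D to the user from D = f * B / Z."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

def lateral_position(x_px: float, distance_m: float,
                     focal_length_px: float) -> float:
    """Position parallel to the screen (pinhole model, an assumption).

    x_px is the horizontal offset of the user region from the image centre.
    """
    return x_px * distance_m / focal_length_px
```

With f = 800 px, B = 0.2 m, and an offset Z of 80 px, the user is about 2 m from the screen; because D is inversely proportional to Z, distance resolution degrades as the user moves farther away.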

Based on this user position information, the application control means 106 determines, for example, where on the screen of the video display device to display the information to be presented to the user. As shown in FIGS. 7(a) and 7(b), this makes it possible to keep presenting information at a position that is easy for the user to see even when the user moves. For example, as shown in FIG. 7(a), when the user moves in front of the video display device, the information can be displayed at the screen position closest to the user. As shown in FIG. 7(b), when the user moves toward or away from the video display device, the application control means 106 reduces or enlarges the display size of the information, respectively, so that it is shown at a size that is easy for the user to read. In particular, when the user approaches the video display device and can be judged to be interested in the displayed information, the application control means 106 may switch the displayed information to a more detailed version. Furthermore, as shown in FIG. 7(c), information can be displayed at a comfortable height according to the height of each user's face. This way of operating the video display device according to the user's position is called the operation method based on user position.
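
One way to picture this position-dependent layout is the sketch below. It is only an assumption about how such a rule could look: the function `place_info`, the linear size rule, the default values, and the coordinate convention (metres from the lower-left corner of the screen) are all hypothetical.

```python
def place_info(user_x_m, user_dist_m, face_height_m, screen_w_m, screen_h_m,
               base_size_m=0.10, ref_dist_m=1.0):
    """Return (center_x, center_y, size) of an information panel.

    The size grows linearly with the user's distance so the panel keeps
    roughly the same apparent size; the center follows the user's lateral
    position and face height, clamped so the panel stays on the screen.
    """
    size = base_size_m * user_dist_m / ref_dist_m
    half = size / 2
    center_x = min(max(user_x_m, half), screen_w_m - half)
    center_y = min(max(face_height_m, half), screen_h_m - half)
    return center_x, center_y, size
```

A user standing 2 m away gets a panel twice as large as one standing at the 1 m reference distance, and a user standing beyond the edge of the screen gets the panel at the nearest edge.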

(Embodiment 4)
FIGS. 8 and 9 show the gaze direction detection method used by the line-of-sight detection means 103 in the present embodiment.

The gaze direction is calculated from a combination of the orientation of the face and the direction of the iris (the dark part of the eye) within the eye. The three-dimensional face orientation of the person is therefore estimated first, then the iris direction is estimated, and the two are combined to calculate the gaze direction.

As shown in FIG. 8(a), the line-of-sight detection means 103 first estimates the face orientation from the image captured by the user detection camera 100. The face orientation can be estimated, for example, by the method described below with reference to FIG. 8(b) and FIGS. 9(a) and 9(b). FIG. 8(b) shows the overall flow. Regions of facial part feature points, such as the eyes, nose, and mouth, within a detected face region are prepared in advance for each of several face orientations. In the example of FIG. 9(a), feature point regions are prepared for the frontal orientation and for ±20 degrees to the left and right. Template images cut out from the area around each facial part feature point are also prepared.

First, the user detection camera 100 captures an image of the user in front of the video display device (S401), and a face region is detected in the captured image (S402). Next, the facial part feature point regions corresponding to each face orientation are applied to the detected face region (S403), and a region image is cut out for each facial part feature point. The correlation between each cut-out region image and the template image prepared in advance is calculated (S404), and the sum of the face orientation angles weighted by the ratio of their correlations is taken as the face orientation of the detected face (S405). In the example of FIG. 9(a), the correlation is 0.85 for the +20 degree orientation, 0.14 for the frontal orientation, and 0.01 for the -20 degree orientation, so the face orientation is calculated as 20 × 0.85 + 0 × 0.14 + (-20) × 0.01 = 16.8 degrees.
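
The weighted sum in step S405 can be reproduced with a few lines. This sketch assumes the correlations are normalized by their total so that the weights are ratios; the function name and the dictionary format are illustrative.

```python
def face_orientation(correlations):
    """Weighted face orientation in degrees.

    correlations maps each prepared face orientation (degrees) to the
    correlation obtained for its templates; weights are the correlation
    ratios, i.e. each correlation divided by the total.
    """
    total = sum(correlations.values())
    if total <= 0:
        raise ValueError("no positive correlation")
    return sum(angle * c for angle, c in correlations.items()) / total
```

With the values from the FIG. 9(a) example, `face_orientation({20: 0.85, 0: 0.14, -20: 0.01})` gives approximately 16.8 degrees.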

Here each facial part region is used for the correlation calculation, but the calculation is not limited to this; for example, the entire face region may be used. Another known approach is to detect facial part feature points such as the eyes, nose, and mouth in the face image and calculate the face orientation from their positional relationship. One such method rotates and scales a three-dimensional model of facial part feature points prepared in advance so that it best matches the feature points obtained from a single camera, and calculates the face orientation from the resulting rotation of the model. Another method, using the stereo vision principle with images from two cameras as described in Embodiment 3, calculates the three-dimensional position of each facial part feature point from the displacement between its positions in the left and right camera images, and calculates the face orientation from the positional relationship of the resulting points. For example, the normal direction of the plane spanned by the three-dimensional coordinates of both eyes and the mouth can be taken as the face orientation.
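
The last variant, taking the normal of the plane through both eyes and the mouth, amounts to a cross product. The sketch below assumes three 3-D points from stereo measurement; the function name and the sign convention of the normal are assumptions, and which side counts as "front" depends on the coordinate system chosen.

```python
import math


def face_normal(left_eye, right_eye, mouth):
    """Unit normal of the plane through both eyes and the mouth."""
    u = [r - l for r, l in zip(right_eye, left_eye)]
    v = [m - l for m, l in zip(mouth, left_eye)]
    # Cross product u x v is perpendicular to the eye-eye-mouth plane.
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("feature points are collinear")
    return tuple(c / length for c in n)
```

For a face lying in the plane z = 0, the result is parallel to the z axis, as expected.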

After the face orientation has been determined, the line-of-sight detection means 103 estimates the iris direction. The iris direction can be estimated, for example, by the following method, which is outlined with reference to FIGS. 8(b) and 9(b).

In this method, gaze detection proceeds in three steps: calculation of the gaze direction reference plane, detection of the iris centers, and calculation of the gaze direction.

First, the gaze direction reference plane is the plane that serves as the reference when calculating the gaze direction, and it coincides with the plane of left-right symmetry of the face. Because the inner corners of the eyes vary less with facial expression and are detected with fewer errors than other facial parts such as the outer corners of the eyes, the corners of the mouth, and the eyebrows, this method calculates the plane of symmetry of the face from the three-dimensional positions of the inner eye corners.

The three-dimensional positions of the inner eye corners are measured by detecting them in the two images captured by the stereo camera, using a face detection module and a facial part detection module, and then performing stereo measurement on the detected points (S406). As shown in FIG. 9(b), the gaze direction reference plane is obtained as the perpendicular bisector plane of the line segment whose endpoints are the detected left and right inner eye corners.
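
The perpendicular bisector plane of two 3-D points is easy to write down explicitly. The sketch below represents the plane as a unit normal n and an offset c with n·x = c; this representation and the function names are assumptions for illustration.

```python
import math


def bisector_plane(p_left, p_right):
    """Perpendicular bisector plane of the segment p_left-p_right.

    Returns (n, c): unit normal n along the segment and offset c such
    that points x on the plane satisfy n . x = c.
    """
    d = [r - l for r, l in zip(p_right, p_left)]
    length = math.sqrt(sum(v * v for v in d))
    if length == 0:
        raise ValueError("the two points coincide")
    n = [v / length for v in d]
    mid = [(l + r) / 2 for l, r in zip(p_left, p_right)]
    c = sum(a * b for a, b in zip(n, mid))
    return n, c


def signed_distance(plane, point):
    """Signed distance of a point from the plane (positive toward p_right)."""
    n, c = plane
    return sum(a * b for a, b in zip(n, point)) - c
```

The signed distance is included because a point's distance from this reference plane is the quantity that a gaze calculation based on the plane would need.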

Next, the iris centers are detected. What a person sees is light that enters through the pupil, reaches the retina, and is transmitted to the brain as an electrical signal. To detect the gaze direction, it would therefore be enough to observe the movement of the pupil. In Japanese people, however, the iris is black or brown, which makes the pupil hard to distinguish from the iris in an image. Because the center of the pupil and the center of the iris nearly coincide, the iris center is detected here as the gaze direction feature. To find the iris center, the outer and inner corners of the eye are detected first, and the region of minimum luminance within the eye region containing those corners, as in FIG. 9(c-1), is detected as the iris region. Next, an iris detection filter consisting of regions 1 and 2, as in FIG. 9(c-2), is set, the circle center that maximizes the between-region variance of the luminance of the pixels in regions 1 and 2 is searched for, and this is taken as the iris center. Finally, as before, the three-dimensional position of the iris center is obtained by stereo measurement (S407).
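
The circle-center search can be sketched as below. The exact shape of regions 1 and 2 in FIG. 9(c-2) is not recoverable from the text, so this sketch assumes region 1 is a disc of radius r and region 2 is the surrounding ring out to radius 2r. It also searches the whole eye region rather than starting from the minimum-luminance region. The function name and image format are hypothetical.

```python
def iris_center(img, r):
    """Return (x, y) of the filter center with maximal between-region variance.

    img is a list of rows of luminance values. Region 1 is the disc of
    radius r around the candidate center, region 2 the ring between r and
    2r (assumed shapes).
    """
    h, w = len(img), len(img[0])
    best, best_score = None, -1.0
    for cy in range(2 * r, h - 2 * r):
        for cx in range(2 * r, w - 2 * r):
            inner, outer = [], []
            for dy in range(-2 * r, 2 * r + 1):
                for dx in range(-2 * r, 2 * r + 1):
                    d2 = dx * dx + dy * dy
                    if d2 <= r * r:
                        inner.append(img[cy + dy][cx + dx])
                    elif d2 <= 4 * r * r:
                        outer.append(img[cy + dy][cx + dx])
            n1, n2 = len(inner), len(outer)
            m1, m2 = sum(inner) / n1, sum(outer) / n2
            # Between-region variance for two classes: w1 * w2 * (m1 - m2)^2.
            score = n1 * n2 * (m1 - m2) ** 2 / (n1 + n2) ** 2
            if score > best_score:
                best, best_score = (cx, cy), score
    return best
```

The score is largest when the dark iris fills the inner disc and the brighter sclera fills the ring, which is why the maximum marks the iris center.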

Finally, the gaze direction is detected using the calculated gaze direction reference plane and the three-dimensional positions of the iris centers. The diameter of the human eyeball is known to vary little between adults; for Japanese adults it is about 24 mm. Therefore, if the position of the iris center when the user looks in a reference direction (for example, straight ahead) is known, the displacement from there to the current iris center position can be converted into a gaze direction. Conventional methods required calibration because the iris center position for the reference direction was not known. This method instead uses the fact that, when the user looks straight ahead, the midpoint of the left and right iris centers lies at the center of the face, that is, on the gaze direction reference plane, and calculates the gaze direction by measuring the distance between that midpoint and the reference plane (S408).

In this method, the gaze direction is obtained as a rotation angle θ in the left-right direction relative to the front of the face. The rotation angle θ is obtained by the following equation.

R: eyeball radius (12 mm)
d: distance between the gaze direction reference plane and the midpoint of the iris centers
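
The equation itself does not survive in this text. A relation consistent with the definitions of R and d is θ = arcsin(d / R), which treats the iris center as a point on a sphere of radius R displaced sideways by d. The sketch below uses that relation as an explicit assumption, not as the patent's confirmed formula; the function and constant names are illustrative.

```python
import math

EYEBALL_RADIUS_MM = 12.0  # R, as given in the text


def gaze_rotation_deg(d_mm, radius_mm=EYEBALL_RADIUS_MM):
    """Left-right gaze rotation in degrees, assuming theta = asin(d / R)."""
    ratio = max(-1.0, min(1.0, d_mm / radius_mm))  # guard against noise
    return math.degrees(math.asin(ratio))
```

Under this assumption, a displacement of d = 6 mm corresponds to a rotation of about 30 degrees, and d = 0 to a gaze straight ahead relative to the face.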

By combining the three-dimensional face orientation calculated by the above procedure with the iris direction within the face, the gaze direction in real space can be detected (S409).

Based on the gaze direction information detected by the line-of-sight detection means 103 in the above procedure, the application control means 106 determines, for example, where on the screen of the video display device to display the information to be presented to the user. The video display device can thus keep presenting information at a position that is easy for the user to see even when the user's gaze or face orientation changes. This way of operating the video display device by changing the gaze or face orientation is called the operation method based on gaze and face orientation.

(Embodiment 5)
FIG. 10 shows the hand position and shape detection method used by the hand position/shape detection means 104 in the present embodiment.

As shown in FIG. 10(a), the position of the person is first detected in the image captured by the user detection camera 100, and the position and shape of the hand are then detected around the person's position. The hand position and shape can be estimated, for example, by the following method, described with reference to FIG. 10(b).

First, as offline processing, the hand position/shape detection means 104 prepares a large number of training images of the hand to be detected (S501). Conditions such as lighting and hand orientation in the training images should match the actual detection environment as closely as possible. Next, principal component analysis is applied to the training images prepared in S501 to create an eigenspace spanned by the principal components of the hand (S502). Hand template images are also prepared as samples of the hand to be detected. The template may be the average of the prepared hand images, or several hand images, such as a closed fist and an open hand, may be prepared. The projection matrix onto the created eigenspace and the hand template images are stored in a hand template database (S503).

Next, the online processing that performs the actual detection is described.
First, the user detection camera 100 captures an image of the user in front of the video display device (S504), and a face region is detected in the captured image (S505).

When a face region is detected in S505, the hand is searched for around that region. The area around the face region is scanned for regions similar to the prepared hand templates, using the hand templates stored in the hand template database (S506). The area around the face may be a range of preset size relative to the face position, or the search range may be narrowed, using the stereo vision principle with two cameras, to the regions around the face whose depth is close to that of the face. To calculate the similarity used for matching, the cut-out hand candidate region image and the hand template image are first projected into the eigenspace with the projection matrix prepared in advance, and the distance between the two in the eigenspace is compared. Comparing distances in the space that represents the principal components of the hand reduces the influence of noise such as the background. Within the search area, the region that satisfies a predetermined threshold and is closest to a hand template is taken as the hand position. The shape of the closest hand template (for example, closed fist or open hand) is taken as the detected hand shape (S507).
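
The eigenspace comparison in S506 and S507 can be sketched as follows. The projection matrix is assumed to have been computed offline by principal component analysis and is passed in as `basis` (one row per principal component) together with the mean training image. All names and data formats are hypothetical.

```python
import math


def project(vec, mean, basis):
    """Project a flattened image into the eigenspace."""
    centered = [v - m for v, m in zip(vec, mean)]
    return [sum(b * c for b, c in zip(row, centered)) for row in basis]


def match_hand(candidates, templates, mean, basis, threshold):
    """Return (position, shape) of the best match, or None if nothing
    lies within the threshold distance.

    candidates maps a window position to its flattened pixel vector;
    templates maps a shape name to its flattened template image.
    """
    projected = {name: project(t, mean, basis) for name, t in templates.items()}
    best = None
    for pos, vec in candidates.items():
        p = project(vec, mean, basis)
        for name, q in projected.items():
            dist = math.dist(p, q)  # distance measured in the eigenspace
            if dist <= threshold and (best is None or dist < best[0]):
                best = (dist, pos, name)
    return None if best is None else (best[1], best[2])
```

Components outside the eigenspace, such as background variation, are discarded by the projection and so do not contribute to the distance. Returning None corresponds to the case where no hand is being held out.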

If no region in the search area satisfies the threshold, detection ends on the assumption that the user is not holding out a hand.

This example uses template matching to detect the hand position and shape, but other methods, such as boosting, may be used instead.

Based on this hand position and shape information, the application control means 106 can, for example, present the user with several options together with the hand position or shape assigned to each, and determine from changes in the user's hand position and shape which option has been selected. This way of operating the video display device by changing the shape and position of the hand is called the operation method based on freehand gestures.

(Embodiment 6)
FIG. 11 shows the operation method using the gesture input remote control 105 in the present embodiment.

As shown in FIG. 11(a), while holding the gesture input remote control 105 in the hand, the user can operate the video display device by making a predefined hand movement (gesture) such as shaking or turning, or by pointing at a desired position on the video display device.

FIG. 11(b) shows the configuration of the gesture input remote control 105. The remote control contains a motion sensor 1052 that detects the movement of the hand holding it. The remote control may also have buttons on its surface, as shown in FIG. 11(b).

The motion sensor 1052 consists of an acceleration sensor, an angular acceleration sensor (rate gyro), or a geomagnetic sensor (electronic compass), or a combination of two or more of these. The acceleration sensor detects acceleration along predetermined axes, for example along each of three orthogonal axes X, Y, and Z as shown in FIG. 11(b). When the user moves the wrist and/or arm while holding the gesture input remote control 105 and its position and/or orientation changes, the gesture identification means 1053 matches the sensor data against data for predefined hand movements (gestures), and the identification result is input from the gesture identification means 1053 to the application control means 106.
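
The text does not say how the gesture identification means 1053 performs the matching. Purely as an illustration, the sketch below compares a recorded sequence of 3-axis acceleration samples with stored gesture sequences of the same length by mean squared error. The comparison rule, the function name, and the data format are all assumptions.

```python
def identify_gesture(samples, templates, max_error):
    """Return the name of the closest predefined gesture, or None.

    samples is a list of (ax, ay, az) tuples; templates maps a gesture
    name to a reference list of the same length (hypothetical format).
    """
    best_name, best_error = None, None
    for name, reference in templates.items():
        if len(reference) != len(samples):
            continue  # this sketch only compares equal-length sequences
        error = sum(
            (a - b) ** 2
            for s, r in zip(samples, reference)
            for a, b in zip(s, r)
        ) / len(samples)
        if error <= max_error and (best_error is None or error < best_error):
            best_name, best_error = name, error
    return best_name
```

A practical system would also need to handle gestures performed at different speeds, which this sketch ignores.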

An acceleration sensor is used here as the example means of detecting the position and/or orientation of the gesture input remote control 105, but an angular acceleration sensor (rate gyro) or a geomagnetic sensor (electronic compass) can be used for the same purpose.

Based on the identified gesture, the application control means 106 controls the application, for example by scrolling the screen in response to a "turn" gesture as shown in FIG. 12(a).

In addition, when a geomagnetic sensor is used as the motion sensor 1052, then, as a person skilled in the art can readily infer, the corresponding position on the screen can be calculated when the user points at a desired position on the video display device, and the result can be input to the application control means 106.
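
The text leaves this calculation to the skilled reader. One plausible form is a ray-plane intersection, sketched below under several assumptions not stated in the source: the screen lies in the plane z = 0, the remote's position is known from some other means, and its heading (yaw) and elevation (pitch) relative to the screen normal are available from the sensor. All names are hypothetical.

```python
import math


def pointed_position(remote_pos, yaw_deg, pitch_deg):
    """Intersect the pointing ray with the screen plane z = 0.

    remote_pos is (x, y, z) in metres with z > 0 in front of the screen;
    yaw = pitch = 0 means pointing straight at the screen. Returns the
    (x, y) hit point, or None if the remote points away from the screen.
    """
    x, y, z = remote_pos
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    dx = math.sin(yaw) * math.cos(pitch)
    dy = math.sin(pitch)
    dz = -math.cos(yaw) * math.cos(pitch)
    if dz >= 0:
        return None
    t = -z / dz  # ray parameter at which the ray reaches z = 0
    return x + t * dx, y + t * dy
```

A remote held 2 m from the screen and turned 45 degrees sideways points at a spot about 2 m to the side of the position directly in front of it.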

Based on this input of the corresponding screen position, the application control means 106 controls the application, for example by placing the focus there as shown in FIG. 12(b) or by launching the application associated with that position.

This way of operating the video display device, in which the user holds the gesture input remote control 105 and makes a predefined hand movement (gesture) or points at a desired position on the video display device, is called the operation method based on the gesture input remote control 105.

The shape of the gesture input remote control 105 is not limited to the elongated rectangular plan shape shown in FIG. 11. The plan shape may be another elongated shape such as an ellipse, and the cross-section need not be rectangular but may be circular or polygonal. The remote control may also be a sphere, as shown in FIG. 13.

With a spherical gesture input remote control, one or more of the spherical remote controls shown in FIG. 13 can be used to manipulate objects on the screen (such as GUI objects) according to the gesture, by (a-1) rolling the ball on a constraining surface, (a-2) gripping it in the hand and turning or moving it about the wrist, or (a-3) pushing or rotating a ball floating in the air with one or both hands. These operations can also be performed when several people cooperatively operate a GUI for the same purpose. FIG. 13(b) shows a possible structure for the spherical gesture input remote control of FIG. 13(a). In addition to its spherical shell, it includes a motion sensor (an acceleration sensor, an angular acceleration sensor (rate gyro), a geomagnetic sensor (electronic compass), or the like), a pressure sensor that senses the force of the gripping hand, a display mechanism (such as LEDs) and a vibration motor that present information from the device side, a mechanism for two-way communication of the remote control's signals with the receiving side, a drive mechanism, and a power supply. As shown in FIG. 13(b-2), the floating type of spherical gesture input remote control is made of a transparent or translucent elastic resin filled with helium gas or the like, and contains a sensor group, rotation mechanism, communication mechanism, display mechanism, and so on, like those shown in FIG. 13(b-1). A display mechanism such as an LED or projector also serves to project the system's state unobtrusively as feedback.

As one example of an operation system using these spherical gesture input remote controls, shown in FIG. 14(a), the user holds the spherical remote control in the hand and rotates or moves the wrist to manipulate a three-dimensional character (avatar) or object on the screen like a marionette. In this case, the character is defined by a hierarchical structure such as a skeleton, and the translation (three degrees of freedom: X, Y, Z) and rotation (three degrees of freedom: X, Y, Z) of its end point are specified as a trajectory by moving the spherical remote control. The positions of the joints other than the end point can be computed by a technique such as inverse kinematics. This enables, for example, an interaction in an application that displays a virtual space modeled on a safari park, in which the user offers food to a life-size animal (for example, a giraffe) displayed in the virtual safari space. Specifically, when the user holds the spherical remote control and extends the hand forward by at least a predetermined distance, food is displayed near the life-size giraffe in the virtual safari space. The user then moves the hand so that the food moves toward the giraffe's mouth, and when the food comes within a predetermined distance of the mouth, the giraffe eats it. Life-size display that takes advantage of the large screen of the video display device, combined with natural interaction of this kind, creates a sense of presence as if the user were actually at that place (the safari park). The user can also grab a GUI element on the screen by lightly squeezing the spherical remote control in the hand, and then move or deform the GUI element by rotating and moving the ball in the hand.
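
The feeding interaction reduces to two threshold tests, which the sketch below makes explicit. The state names, the function name, and the numeric thresholds are hypothetical; the text only speaks of "a predetermined distance" in both cases.

```python
import math


def feeding_state(hand_reach_m, food_pos, mouth_pos,
                  min_reach_m=0.4, eat_radius_m=0.15):
    """Return the state of the food object for the current frame.

    "hidden": the hand is not extended far enough for food to appear.
    "shown":  food is displayed and follows the hand.
    "eaten":  food is within the eating radius of the animal's mouth.
    """
    if hand_reach_m < min_reach_m:
        return "hidden"
    if math.dist(food_pos, mouth_pos) <= eat_radius_m:
        return "eaten"
    return "shown"
```

The application would evaluate this every frame, moving `food_pos` with the position reported by the spherical remote control.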

The user can also grab or turn a spherical gesture input remote control floating in the air to select or move a bubble-shaped GUI element on the screen. A distinctive feature here is that the user experiences, as a sense of presence, the unity between the physical act of grabbing or turning a real ball and the resulting behavior of the GUI on the screen. As another example of operation with the spherical remote control, shown in FIG. 14(b), rolling or throwing it toward the video display device can trigger an interaction such as sending information entered by the user (voice mail, for example) to a predetermined recipient. Operations that previously required navigating GUI menus can thus be performed with an intuitive gesture such as rolling a ball, so that users who have found complex GUI operations difficult, such as elderly people and children, can operate the device easily with the spherical remote control. Shaking or throwing the spherical remote control may also cause a GUI object or content on the screen to move as though the information entered by the user (voice mail, for example) were being sent or received. Alternatively, bringing several spherical remote controls physically close together, touching them together, or exchanging them can merge or exchange attributes assigned to the balls in advance, giving the user an intuitive, tangible way of operating with the spherical remote controls. In this case, the on-screen GUI can make the change of attributes explicit by having spherical GUI elements or content, like metaballs, merge or separate in the manner of nuclear fusion.

The present invention can be used as a video display device, in particular one controlled not only by conventional button input but also by user sensing information such as the user's position and orientation, line of sight, hand gestures, and hand and arm movements, for example as a TV installed in a space shared by a family, such as a living room.

FIG. 1: Diagram illustrating an example of the external appearance of the video display device and its interfaces with related equipment
FIG. 2: Diagram showing an example of a situation in which several users are located in a room containing the video display device
FIG. 3: Main functional block diagram of the video display device
FIG. 4: Diagram outlining the data structure of the user information DB
FIG. 5: Flowchart outlining the processing of the video display device
FIG. 6: Diagram outlining the operation method based on user position and how it is realized
FIG. 7: Diagram illustrating operation examples of the operation method based on user position
FIG. 8: Diagram outlining the operation method based on gaze and face orientation and how it is realized
FIG. 9: Diagram outlining the operation method based on gaze and face orientation and how it is realized
FIG. 10: Diagram outlining the operation method based on freehand gestures and how it is realized
FIG. 11: Diagram outlining the operation method based on the gesture input remote control and how it is realized
FIG. 12: Diagram illustrating operation examples using the gesture input remote control
FIG. 13: Diagram outlining the operation method based on the spherical gesture input remote control and how it is realized
FIG. 14: Diagram illustrating operation examples using the spherical gesture input remote control

Explanation of symbols

100 User detection camera
101 User position detection means
102 User identification means
103 Line-of-sight detection means
104 Hand position / shape detection means
105 Gesture input remote control
106 Application control means
107A User information DB
107B User state management means
107C Modal control means
107D Viewing state determination means
108 Data receiving means
109 Life-size display conversion means
110 Screen drawing means
111 Screen
112 Life-size information adding means
113 Data transmission means

Claims

1. A video display device operated by one or more users, comprising:
identification means for identifying the one or more users;
detection means for detecting the position and motion of the one or more users;
display means for displaying video;
determination means for determining, among the one or more identified users, one or more users who are viewing the displayed video, based on the detected position and motion of each user;
control means for controlling an application that displays the video; and
a user database that stores user attribute information,
wherein the attribute information includes physical feature information, including at least body information, and human relationship information between the one or more users, and
the control means controls the application that displays the video in accordance with the attribute information of the one or more viewing users.

2. The video display device according to claim 1, wherein the human relationship information between the one or more users is information on the degree of intimacy between users, and
the control means determines the degree of information sharing according to the degree of intimacy between the users and controls the application that displays the video.

3. The video display device according to claim 1, which controls an application that operates in cooperation with a counterpart video display device installed at another location and connected via a network, further comprising:
data receiving means for receiving operation information transmitted from the counterpart video display device;
life-size display conversion means for performing scale conversion, based on the size of the presenting screen, on information within the operation information concerning an object captured by the counterpart video display device that is to be presented at life size, and on operation information for that object;
life-size information adding means for adding, to operation information, the life-size information necessary for life-size display on the counterpart video display device, based on user size information and user position information; and
data transmission means for transmitting the operation information to which the life-size information has been added, together with life-size video, to the counterpart video display device,
wherein the control means processes the scale-converted operation information together with input information and controls the application.
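
As a non-limiting illustration of the scale conversion recited in the life-size display conversion means, the arithmetic can be sketched as follows. The claims do not specify a data format, so this sketch assumes that the life-size information carries physical sizes in millimetres and that each device knows the physical width and pixel width of its own screen. The names `Screen`, `to_life_size_px`, and `life_size_scale` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Screen:
    """Assumed description of a presenting screen (not from the specification)."""
    width_mm: float  # physical width of the visible area
    width_px: int    # horizontal resolution

    @property
    def px_per_mm(self) -> float:
        # Pixel density along the horizontal axis.
        return self.width_px / self.width_mm


def to_life_size_px(size_mm: float, screen: Screen) -> float:
    """Pixel length needed to draw a physical length at 1:1 scale on `screen`."""
    return size_mm * screen.px_per_mm


def life_size_scale(sender: Screen, receiver: Screen) -> float:
    """Factor applied to sender pixel coordinates (objects and operations)
    so that they keep the same physical size on the receiver's screen."""
    return receiver.px_per_mm / sender.px_per_mm


if __name__ == "__main__":
    remote = Screen(width_mm=500.0, width_px=2000)   # counterpart device
    local = Screen(width_mm=1000.0, width_px=2000)   # presenting device
    print(to_life_size_px(300.0, local))   # pixels for a 300 mm object
    print(life_size_scale(remote, local))  # rescale factor for received coordinates
```

Under these assumptions, the same factor would be applied to both the object and the coordinates of operations on that object, so that pointing positions remain aligned with the object after conversion.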

4. The video display device according to claim 3, further comprising a remote control for acquiring explicit input information from the user,
wherein the control means controls the application based on relative positional relationships among a plurality of the remote controls.

5. The video display device according to claim 4, wherein the remote control is spherical, such that the user can throw it or roll it on a surface, and
the control means controls the application that displays the video based on information on the trajectory of the remote control or information on its stop position.