JP2000112610A

JP2000112610A - Contents display selecting system and contents recording medium

Info

Publication number: JP2000112610A
Application number: JP29311498A
Authority: JP
Inventors: Toshihiro Maruyama; 俊弘丸山
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 1998-09-30
Filing date: 1998-09-30
Publication date: 2000-04-21

Abstract

PROBLEM TO BE SOLVED: To provide a content display selection system that accepts the intentions and commands of a user operating a computer through a plurality of recognition methods, including not only speech recognition but at least image recognition, and that links to the designated link destination. SOLUTION: In this system, a 'category' indicating the type of recognition device, 'recognition candidate data' representing the user's selectable items to be recognized by the recognition devices, and 'link information' giving the link destination of a selection are described in the content description text. A control means 10 sends the 'category' of the display content shown on a display device 12 to recognition devices 14A and 14B and sends them the 'recognition candidate data' for recognition; on receiving the recognition result, it selects the applicable link information and changes the displayed content to the content indicated by that link information.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to the display of data by a computer and the acceptance of commands and the like from a computer user, and more particularly to a computer-based content display selection system used for browsing home-page content on the Internet and the like, and to a content recording medium on which such content is recorded.

[0002]

2. Description of the Related Art

As computers have become multimedia-capable and are used on networks via communication lines and the like, and with the spread of the Internet, the number of computer users has grown dramatically. Until now, commands have been given to computers mainly through manually operated devices such as keyboards and mice. Systems have been developed that, in place of such manual devices, recognize speech uttered by a person and thereby accept the computer user's intentions and commands. A speech-interactive computer system using such speech recognition is disclosed, for example, in Japanese Patent Application Laid-Open No. 8-335160. In the prior art described in that publication, means is provided for recognizing a video screen display signal obtained through a data connection together with a set of data links, a grammar corresponding to the data links is obtained, and that grammar can be received by a speech recognition unit.

[0003] As another prior-art technique, IEICE Technical Report IE95-46, MVE95-39 (July 1995) discloses a technique for mechanically extracting the information needed for recognition from an existing content description language.

[0004]

PROBLEMS TO BE SOLVED BY THE INVENTION

With these prior-art techniques, only a limited set of recognition methods can be selected, and only within a particular system. That is, the method of Japanese Patent Application Laid-Open No. 8-335160 can accept a user's intentions and commands only through speech recognition. No technique had been contemplated in which recognition of, for example, gestures, sign language, facial expressions, or handwritten patterns is described for several recognition means at once in the same content description text, so that the user's intentions and commands are accepted through all of those recognition means.

[0005] Furthermore, with mechanical extraction, as described on page 36 of the above IEICE Technical Report, when different link destinations are defined with the same character string, it is unclear which utterance should designate which content (display portion). In addition, even where a special kind of recognition is available, the method of operation is hard to understand without familiarity with the system or special training, which makes it difficult for beginners to use. Moreover, from the viewpoint of assisting users with impaired hand function, the use of multiple recognition methods is desirable. Internet businesses that provide content-browsing systems such as home pages therefore want the content itself to be improved and, correspondingly, the hardware used by the people who browse that content.

[0006] Accordingly, a first object of the present invention is to provide a content display selection system that can accept the intentions and commands of a user operating a computer by means of a plurality of recognition methods that include not only speech recognition but at least image recognition, and that can link to the link destination designated by the user's instruction.

[0007] A second object of the present invention is to provide a content recording medium on which content is recorded that allows a plurality of recognition methods, including not only speech recognition but at least image recognition, to be used to accept the intentions and commands of a user operating a computer, and that allows linking to the link destination designated by the user's instruction. In the present invention, "image recognition" is a concept that includes recognition of the gestures, sign language, facial expressions, handwritten patterns, and the like of a user operating a computer. It therefore covers not only optical recognition using a video camera or the like but recognition by any image recognition means, such as shape recognition based on pen pressure.

[0008]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above objects, in the present invention the content to be displayed on a computer's display device includes in advance a description that gives the category of recognition method (speech or image), so that the user's intention can be recognized by any one of a plurality of recognition methods including at least speech recognition and image recognition. The content is displayed by a content display selection system, namely a computer on which browser software is installed. This computer comprises a recognition device that includes at least speech recognition and image recognition, a display device that displays the content, and control means that controls the recognition device and the display device in association with each other. In the description text describing the content to be displayed, a "category" indicating the type of recognition device, "recognition candidate data" that are candidates for the user's selection to be recognized by the recognition device, and "link information" that is the link destination of the selected item are described as recognition information. The control means sends the "category" of the display content shown on the display device to the recognition device; sends the "recognition candidate data" to the recognition device so that the "recognition candidate data" shown on the display device can be recognized by it; and, on receiving the recognition result from the recognition device, selects the corresponding link information and controls the display device so as to replace the displayed content with the content indicated by that link information.

[0009] That is, according to the present invention there is provided a content display selection system comprising a recognition device including at least speech recognition and image recognition, a display device that displays content, and control means that controls the recognition device and the display device in association with each other. In the description text describing the content to be displayed, a "category" indicating the type of the recognition device, "recognition candidate data" that are candidates for the user's selection to be recognized by the recognition device, and "link information" that is the link destination of the selected item are described as recognition information. The control means is configured to send the "category" of the display content shown on the display device to the recognition device; to send the "recognition candidate data" to the recognition device so that the "recognition candidate data" shown on the display device can be recognized by it; and, on receiving the recognition result from the recognition device, to select the corresponding link information and control the display device so as to replace the displayed content with the content indicated by that link information.
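The control relationship described in [0009] can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not the patent's implementation: the `RecognitionInfo` record, the `Recognizer` protocol, and the methods `load_candidates`, `display`, and `on_recognized` are hypothetical names introduced here, and the display device is reduced to a field holding the currently shown link.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple


@dataclass(frozen=True)
class RecognitionInfo:
    category: str   # "category": type of recognition device, e.g. "Voice", "Gesture"
    candidate: str  # "recognition candidate data": what the user may say or do
    link: str       # "link information": destination if this candidate is recognized


class Recognizer(Protocol):
    # Hypothetical interface for a recognition device such as 14A or 14B.
    category: str

    def load_candidates(self, candidates: List[str]) -> None: ...


class Controller:
    """Hypothetical control means linking recognizers and the display."""

    def __init__(self, recognizers: List[Recognizer]) -> None:
        self.recognizers: Dict[str, Recognizer] = {r.category: r for r in recognizers}
        self.links: Dict[Tuple[str, str], str] = {}
        self.current_link = ""  # stands in for the content on the display device

    def display(self, infos: List[RecognitionInfo]) -> None:
        """Send each category's candidates to the matching recognizer; keep the links."""
        self.links.clear()
        per_category: Dict[str, List[str]] = {}
        for info in infos:
            if info.category not in self.recognizers:
                continue  # no recognizer of this type is attached
            per_category.setdefault(info.category, []).append(info.candidate)
            self.links[(info.category, info.candidate)] = info.link
        for category, candidates in per_category.items():
            self.recognizers[category].load_candidates(candidates)

    def on_recognized(self, category: str, candidate: str) -> None:
        """Called with a recognition result; switch to the linked content."""
        link = self.links.get((category, candidate))
        if link is not None:
            self.current_link = link
```

Any object with a `category` attribute and a `load_candidates` method can act as a recognizer here; a speech engine would be registered under "Voice" and a camera-based engine under "Gesture". Keying links by (category, candidate) keeps the same word usable with different meanings in different recognizers.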

[0010] Further, according to the present invention there is provided a content recording medium on which content is recorded in computer-readable form, the content including a description that gives the category of recognition method (speech or image) so that the user's intention can be recognized by any one of a plurality of recognition methods including at least speech recognition and image recognition.

[0011] In a preferred aspect of the present invention, "auxiliary information" for displaying a supplementary character string relating to the recognition candidate data is described in the description text as part of the recognition information; the control means sends the "auxiliary information" to the display device, and the display device uses it to display on the screen the supplementary character string relating to the recognition candidate data.

[0012] In another preferred aspect, "guide information" indicating the recognition methods included in the recognition device is described in the description text as part of the recognition information; the control means sends the "guide information" to the display device, and the display device uses it to show on the screen the recognition methods included in the recognition device, thereby giving the user a guide to those recognition methods.

[0013] In a further preferred aspect, the image recognition recognizes at least one of gestures, sign language, facial expressions, and handwritten patterns.

[0014] According to the present invention, when a home page on the Internet or the like is displayed, the displayed content can be selected not only with a keyboard, mouse, or the like but also by speech, gestures, sign language, facial expressions, handwritten patterns, and so on. To this end, a format is provided for describing, in the home page's description text, the category of recognition system and information on the related content, and the display system obtains from the description text the information needed for recognition according to the recognition devices with which it is equipped.

[0015] It is a further preferred aspect of the present invention to provide, for each supported recognition system, guide data indicating which speech, gesture, sign-language, facial-expression, and handwritten-pattern inputs are available when a home page is displayed, and to present this guidance at the same time: a spoken announcement for speech recognition; an animation or video display for gestures, sign language, and facial expressions; and a reference figure (possibly animated) for handwritten patterns.
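The pairing of recognition system and guidance medium in [0015] amounts to a lookup table. The sketch below is a hypothetical illustration: the category keys reuse the names later defined in [0026], while `GUIDE_MODALITY` and `guide_plan` are names invented here.

```python
from typing import Dict, List

# Assumed mapping from recognition category to the medium used for its guide data,
# following the pairing described in [0015].
GUIDE_MODALITY: Dict[str, str] = {
    "Voice": "spoken announcement",
    "Gesture": "animation or video",
    "Sign language": "animation or video",
    "Face": "animation or video",
    "Pen": "reference figure (may be animated)",
}


def guide_plan(equipped: List[str]) -> List[str]:
    """For each equipped recognizer with a known category, say how its guide is shown."""
    return [f"{cat}: {GUIDE_MODALITY[cat]}" for cat in equipped if cat in GUIDE_MODALITY]
```

For example, a browser equipped only with speech and pen input would plan a spoken announcement plus a reference figure, and would skip guidance for any category it does not know.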

[0016]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Preferred embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram conceptually showing a preferred embodiment of the content display selection system according to the present invention. The system consists of a computer and its related devices. Functionally, it comprises a control device 10 including a CPU (central processing unit, not shown), a main storage device, and an interface; a display device 12 that displays images and reproduces sound; a first recognition device 14A that performs speech recognition; a second recognition device 14B that performs image recognition; a link information storage unit 16 that stores link information; and an auxiliary information display control unit 18 that controls the display of auxiliary information. The figure also shows the system receiving content description text 20 from a server 22.

[0017] First, it is assumed that browser software for displaying HTML-described content has been installed in advance in the main storage device of the control device 10. The second recognition device 14B recognizes one or more of gestures made with the user's upper body or whole body, sign language, handwritten patterns, and the like. Therefore, by using the first recognition device 14A and the second recognition device 14B together, the user's intentions and commands can be recognized by speech and gesture recognition, by speech and sign-language recognition, by speech and handwritten-pattern recognition, or by two or more of speech, gesture, sign-language, and handwritten-pattern recognition.

[0018] FIG. 2 is a flowchart showing the operation of the configuration of FIG. 1, and the operation is described below following this flowchart. When the browser software starts, the program of FIG. 2 begins; assume that the control device 10 has obtained content description text 20 from the server 22 via a network, file system, or the like (step S1). The control device 10 parses the content description text 20 (step S2) and determines whether it contains extended recognition information (step S3). If it does not, that is, if the content is ordinary content, images, text, and the like are displayed by conventional display processing (step S4).

[0019] On the other hand, when extended recognition information is found, the control device 10 searches it for recognition information matching the recognition methods of the recognition devices 14A and 14B connected to it. When recognition information corresponding to the recognition methods of the available recognition devices 14A and 14B (speech recognition and image recognition in the example of FIG. 1) is found, the control device 10 extracts that recognition information from the content description text (step S5). It then sends the corresponding recognition candidate data to the first recognition device 14A and the second recognition device 14B (steps S6A and S6B), and at the same time stores in the link information storage unit 16 the link destinations associated with the recognition candidates. In steps S6A and S6B, the corresponding "auxiliary information" described in the content description text is also sent to the auxiliary information display control unit 18. The control device 10 then sends a recognition start command to the recognition devices 14A and 14B (step S7), and in response they begin recognition processing based on the recognition information sent from the control device 10.
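Steps S5 to S6B route three kinds of data to three destinations. The sketch below is a hypothetical illustration: it assumes the extracted recognition information has already been flattened into (category, candidate, link, auxiliary) tuples, and `prepare_recognition` is an invented name. Step S7 (the start command) is left to the caller.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (category, candidate, link, auxiliary): an assumed flattened form of the
# recognition information extracted in step S5.
Entry = Tuple[str, str, str, str]


def prepare_recognition(entries: List[Entry], equipped: List[str]):
    """Steps S5-S6B: route candidates to recognizers, store links and auxiliary text."""
    candidates: Dict[str, List[str]] = defaultdict(list)  # sent to each recognizer
    link_store: Dict[Tuple[str, str], str] = {}           # link information storage unit 16
    auxiliary: List[str] = []                             # sent to display control unit 18
    for category, candidate, link, aux in entries:
        if category not in equipped:
            continue  # no matching recognition device: skip this entry
        candidates[category].append(candidate)
        link_store[(category, candidate)] = link
        auxiliary.append(aux)
    return dict(candidates), link_store, auxiliary
```

Keeping the link store separate from the recognizers means a recognizer only has to report which candidate it heard or saw; it never needs to know about URLs.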

[0020] Next, it is checked whether recognition has occurred at each of the recognition devices 14A and 14B (step S8). In this flowchart the two recognition devices 14A and 14B are checked in turn; when there are many recognition devices or checking takes a long time, interrupt processing can be used instead. When step S8 determines that recognition has occurred, the recognition device 14A or 14B returns to the control device 10 which of its recognition candidates was recognized. The control device 10 displays the recognition result on the display device 12 (step S9), searches the link information storage unit 16 for the link destination associated with that result (step S10), selects the matching link destination, and obtains the new content description text 20 for it from the server 22 (step S11). Meanwhile, the auxiliary information display control unit 18 receives the "auxiliary information" described in the content description text and produces the corresponding display on the display device 12. The display device 12 reproduces not only images but also sound; specifically, it includes a display and a speaker (or headphones).
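One polling pass of steps S8 to S11 might look like the sketch below. It assumes a hypothetical `poll()` method that returns the recognized candidate or `None`, and injects the display and fetch operations as callables so the logic can run without a real server; as [0020] notes, interrupt-driven notification could replace polling when there are many devices.

```python
from typing import Callable, Dict, List, Optional, Protocol, Tuple


class PolledRecognizer(Protocol):
    category: str

    def poll(self) -> Optional[str]: ...  # hypothetical: recognized candidate or None


def run_once(recognizers: List[PolledRecognizer],
             link_store: Dict[Tuple[str, str], str],
             show_result: Callable[[str], None],
             fetch: Callable[[str], str]) -> Optional[str]:
    """One pass of steps S8-S11; returns the new description text, if any."""
    for rec in recognizers:                              # S8: check each device in turn
        candidate = rec.poll()
        if candidate is None:
            continue
        show_result(candidate)                           # S9: display recognition result
        link = link_store.get((rec.category, candidate))  # S10: look up the link
        if link is not None:
            return fetch(link)                           # S11: obtain new content text
    return None
```

The caller would repeat `run_once` until it returns new text, then restart the flow at step S2 with that text.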

[0021] Next, a second embodiment of the present invention is described with reference to FIGS. 3 and 4. The second embodiment is useful when the content additionally describes "guide information". That is, when "guide information" is attached to the content, the other data or content that the guide information points to is obtained, and a guide information display control unit 24 controls the display device 12 to reproduce guidance such as audio, video, or animation. As shown in FIG. 3, in the second embodiment the guide information display control unit 24 is connected to the control device in addition to the configuration of FIG. 1. The guide information display control unit 24 temporarily stores guide information 1 and guide information 2 included in the content and controls how they are presented on the display device 12. The flowchart of FIG. 4 is that of FIG. 2 with steps S12A and S12B, which display and reproduce guide information 1 and guide information 2 in turn, inserted between steps S6B and S7.
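The ordering added by FIG. 4 (guide playback after S6B, before the recognition start command of S7) can be sketched as follows. `GuideInfoDisplayControl`, `start_page`, and the callables passed in are hypothetical stand-ins for the guide information display control unit 24, the display device 12, and the S7 command.

```python
from typing import Callable, List


class GuideInfoDisplayControl:
    """Hypothetical stand-in for guide information display control unit 24."""

    def __init__(self, play: Callable[[str], None]) -> None:
        self._play = play  # e.g. hands an audio/video/animation reference to display device 12
        self._queue: List[str] = []

    def store(self, guide_items: List[str]) -> None:
        self._queue = list(guide_items)  # temporarily hold guide information 1, 2, ...

    def play_all(self) -> None:
        for item in self._queue:  # S12A, S12B: present the items in order
            self._play(item)


def start_page(guide: GuideInfoDisplayControl, guide_items: List[str],
               start_recognition: Callable[[], None]) -> None:
    guide.store(guide_items)
    guide.play_all()     # S12A/S12B run after S6B ...
    start_recognition()  # ... and before S7
```

Playing the guidance before recognition starts avoids the guide audio being picked up by the speech recognizer as user input, which is one plausible reason for this placement.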

[0022] Next, examples of the recognition information added to content are described. The recognition information is the information expressed by the "category", "recognition candidate data", "link information", and "auxiliary information" added to the description text that describes the content. The category gives the type of recognition device, for example speech recognition, gestures, sign language, facial expressions, or handwritten patterns.

[0023] Recognition information for speech recognition supplies the speech candidates needed for speech recognition, and the control device 10 selects link information from the result recognized by the recognition device 14. For gestures, the motion of the user's upper body or whole body is captured as camera video, the sequence of motions is analyzed by motion capture or the like, and the result is associated with links to content on the screen. Gestures become more broadly usable if a fixed set of motion sequences is defined in advance, for example "bow", "raise right hand", "raise left hand", "sit down", "jump", "put right hand on hip", "wave left hand", and "raise right leg", and the results of motion analysis by the second recognition device 14B are standardized to that set.
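Standardizing motion-analysis results, as suggested in [0023], could be done with a fixed vocabulary plus an alias table. In the sketch below the enum values follow the eight example motions in [0023]; the raw labels in `VENDOR_ALIASES` (such as `R_ARM_UP`) are hypothetical outputs of some motion-analysis engine, not labels from any real product.

```python
from enum import Enum
from typing import Dict, Optional


class Gesture(Enum):
    BOW = "bow"
    RAISE_RIGHT_HAND = "raise right hand"
    RAISE_LEFT_HAND = "raise left hand"
    SIT_DOWN = "sit down"
    JUMP = "jump"
    RIGHT_HAND_ON_HIP = "put right hand on hip"
    WAVE_LEFT_HAND = "wave left hand"
    RAISE_RIGHT_LEG = "raise right leg"


# Hypothetical engine-specific labels mapped onto the standard vocabulary,
# so content authors only ever need the standard names.
VENDOR_ALIASES: Dict[str, Gesture] = {
    "R_ARM_UP": Gesture.RAISE_RIGHT_HAND,
    "L_ARM_UP": Gesture.RAISE_LEFT_HAND,
    "BOW_FWD": Gesture.BOW,
}


def standardize(raw_label: str) -> Optional[Gesture]:
    """Map a raw analysis label to a standard gesture, or None if unrecognized."""
    if raw_label in VENDOR_ALIASES:
        return VENDOR_ALIASES[raw_label]
    try:
        return Gesture(raw_label)  # label already uses the standard wording
    except ValueError:
        return None
```

With such a layer, the same content description works with any recognition device whose output can be mapped onto the shared vocabulary.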

[0024] For sign language, sets of motions are already defined, and, depending on the analysis capability of the second recognition device 14B, anything from simple to complex motions can be selected. Motions that are not specifically recognized as sign language can be treated as one form of gesture recognition. Facial expressions can likewise be recognized from camera images and the like; as with gestures, examples include "close right eye", "close left eye", "open mouth", and "stick out tongue". Handwritten patterns can serve as a substitute for a keyboard by applying techniques such as handwritten character recognition, but even without full character recognition, simple figures such as a circle, a cross, a wavy line, a peak (caret) shape, a square, or a triangle are sufficient.

[0025] The types of recognition device corresponding to these recognition methods are treated as categories, and the recognition information for each category is defined as follows:

    <Extend Recognition Category="xxxx">     xxxx: category
    <Language=yyyy>                          yyyy: information on the description language
    <List aaaa, bbbb, cccc>                  aaaa: recognition candidate
                                             bbbb: link destination
                                             cccc: auxiliary information
    </Extend Recognition>
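A reader for this format can be sketched with regular expressions. This is an assumption-laden illustration, not a grammar from the patent: it expects ASCII straight quotes and ASCII commas (the published text shows full-width punctuation in places), accepts both `<Language=yyyy>` and `<Language="yyyy">`, and assumes that no item contains a comma or `>`. `Entry` and `parse_recognition_info` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

BLOCK_RE = re.compile(
    r'<Extend\s+Recognition\s+Category\s*=\s*"([^"]+)"\s*>(.*?)</Extend\s+Recognition>',
    re.S,
)
# Either a Language tag (quoted or unquoted value) or a List tag.
TAG_RE = re.compile(r'<Language\s*=\s*"?([^">]*)"?\s*>|<List\s+([^>]*)>')


@dataclass
class Entry:
    category: str
    language: Optional[str]
    candidate: str
    link: str
    auxiliary: str


def parse_recognition_info(text: str) -> List[Entry]:
    entries: List[Entry] = []
    for category, body in BLOCK_RE.findall(text):
        language: Optional[str] = None
        for lang, items in TAG_RE.findall(body):
            if lang:
                language = lang  # applies to the List tags that follow it
                continue
            parts = [p.strip().strip('"') for p in items.split(",")]
            if len(parts) != 3:
                continue  # malformed List: skip it rather than fail the whole page
            candidate, link, auxiliary = parts
            entries.append(Entry(category, language, candidate, link, auxiliary))
    return entries
```

Tracking the most recent Language tag lets one block carry candidates in several languages, which the examples in [0029] rely on.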

[0026] The categories are defined as follows:

    "Voice"           speech recognition
    "Gesture"         gestures
    "Sign language"   sign language
    "Face"            facial expressions
    "Pen"             handwritten patterns, tablet input, etc.

Further categories can be added beyond these.

[0027] Because the control device 10 ignores extensions it is not equipped to handle, any description of an unknown recognition device in the content description text is simply ignored, provided it follows the above format.
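The tolerance rule in [0027] can be sketched by modelling the categories of [0026] as an enum and skipping, rather than rejecting, anything outside it or outside the device's equipment. `Category` and `usable_blocks` are hypothetical names, and the (category, body) pairs stand in for parsed `Extend Recognition` blocks.

```python
from enum import Enum
from typing import Iterable, List, Set, Tuple


class Category(str, Enum):
    VOICE = "Voice"
    GESTURE = "Gesture"
    SIGN_LANGUAGE = "Sign language"
    FACE = "Face"
    PEN = "Pen"


def usable_blocks(blocks: Iterable[Tuple[str, str]],
                  equipped: Set[Category]) -> List[Tuple[Category, str]]:
    """Keep (category, body) blocks this device can handle; silently drop the rest."""
    kept: List[Tuple[Category, str]] = []
    for name, body in blocks:
        try:
            category = Category(name)
        except ValueError:
            continue  # category unknown to this device (e.g. a later extension)
        if category in equipped:
            kept.append((category, body))
    return kept
```

Ignoring unknown categories instead of raising an error is what allows new categories to be added to content without breaking older browsers.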

[0028] Examples of recognition information follow. For speech recognition:

    <Language="Japanese">
    <List "りんくいち", http://www.server1.com, "りんく1">

The language system is Japanese. The reading candidate is "りんくいち" (rinku ichi, "link one"), the link destination is http://www.server1.com, and the character string "りんく1" is used when the candidate is displayed on screen.

[0029] With the List shown here, when the utterance "りんくいち" is recognized, the content at the link destination http://www.server1.com is displayed. When the recognition candidate is shown on screen, the third item, the character string "りんく1", is used. The language system indicates which language the recognition candidates are written in; to provide candidates to recognition systems for languages other than Japanese, several Language tags can be supplied so that other languages are supported as well.
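With several Language tags in one page, each recognizer should load only the candidates written in its own language. The sketch below assumes entries have already been parsed and tagged with the Language value in force; the English entry in the usage example is hypothetical, added only to show the selection.

```python
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    language: str   # value of the Language tag in force for this List
    candidate: str
    link: str
    auxiliary: str


def vocabulary_for(entries: List[Entry], recognizer_language: str) -> Dict[str, Tuple[str, str]]:
    """Candidates a recognizer for `recognizer_language` should load, with link and display text."""
    return {
        e.candidate: (e.link, e.auxiliary)
        for e in entries
        if e.language == recognizer_language
    }
```

For example, given a Japanese entry for "りんくいち" and a hypothetical English entry for "link one" pointing at the same URL, an English recognizer would load only "link one", while both utterances would lead to the same content.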

[0030] For gestures:

    <Language="Japanese">
    <List "右手上げ", http://www.server1.com, "右手を上げる">

The motion description is in Japanese. The motion candidate is "右手上げ" (raise right hand), the link destination is http://www.server1.com, and the character string "右手を上げる" ("raise your right hand") is used when the candidate is displayed on screen.

[0031] Here, too, a Language tag is provided so that motions can be described in languages other than Japanese. With the List given here, when the motion of raising the right hand is recognized, the content at the link destination http://www.server1.com is displayed. When motion candidates are shown on screen, the third item, the character string "右手を上げる", is displayed.

[0032] For sign language:

    <Language="Japanese">
    <List "こんにちわ", http://www.server1.com, "こんにちわ">

The sign-language system is Japanese. The sign candidate is "こんにちわ" (hello), the link destination is http://www.server1.com, and the character string "こんにちわ" is used when the candidate is displayed on screen.

[0033] A language is specified as the sign-language system because sign languages are defined differently from country to country, so a Language tag is provided here as well. With the List given here, when the signed motion expressing "こんにちわ" (hello) is recognized, the content at the link destination http://www.server1.com is displayed. When sign-language candidates are shown on screen, the third item, the character string "こんにちわ", is displayed.

[0034] For facial expressions:

    <Language="Japanese">
    <List "口開ける", http://www.server1.com, "口を開ける">

The motion notation is Japanese. The motion candidate is "口開ける" (open mouth), the link destination is http://www.server1.com, and the character string "口を開ける" ("open your mouth") is used when the candidate is displayed on screen.

[0035] Here, too, a Language tag is provided so that motions can be described in languages other than Japanese. With the List given here, when the motion of opening the mouth is recognized, the content at the link destination http://www.server1.com is displayed. When facial-expression candidates are shown on screen, the third item, the character string "口を開ける", is displayed.

[0036] For handwritten patterns:
<Language="Japanese"> (the notation is Japanese)
<List "circle", http://www.server1.com, "Draw a circle"> (the action candidate is drawing a "circle", the link destination is http://www.server1.com, and the string "Draw a circle" is used when the candidate is displayed on the screen)

[0037] Here too, a Language tag is provided because the actions may be described in a language other than Japanese. Notations other than Japanese include writing the symbol itself directly or using an English notation such as Draw Circle. With the List shown here, when the action of drawing a circle is recognized, the content at the link destination http://www.server1.com is displayed. When action or symbol candidates are displayed on the screen, the third parameter, the string "Draw a circle", is shown.

[0038] As described above, the content description text is prepared so that recognition candidates, link destinations, and auxiliary information can be provided for each of a variety of recognition methods. When the control device 10 confirms that the content description text it has read contains candidates for the recognition systems described above, it selects the recognition information that matches the recognition systems provided in the control device 10 itself and registers the candidate data with the recognition unit of each such recognition system. When the user performs one of the registered actions or candidates, the recognition system returns which candidate was recognized, and the control device 10 selects the link destination on the basis of the recognition result and displays the content at that link destination.
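The flow in the preceding paragraph (filter by the recognizers actually present, register candidates, map a recognition result back to a link) can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the patent's implementation: recognizers are represented only by category names, registration is a dictionary, and `RecognitionInfo`, `ControlDevice`, and the category strings "voice" and "gesture" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RecognitionInfo:
    category: str  # hypothetical category names, e.g. "voice" or "gesture"
    phrase: str    # recognition candidate data
    link: str      # link information


@dataclass
class ControlDevice:
    """Hypothetical stand-in for control device 10."""
    equipped: List[str]  # categories of the recognizers this device has
    registered: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def load(self, infos: List[RecognitionInfo]) -> None:
        # Keep only recognition information that matches our own recognizers
        # and register each candidate with the unit for that category.
        self.registered = {c: {} for c in self.equipped}
        for info in infos:
            if info.category in self.registered:
                self.registered[info.category][info.phrase] = info.link

    def on_recognized(self, category: str, phrase: str) -> Optional[str]:
        # A recognizer reports which registered candidate it recognized;
        # return the link destination whose content should be shown next.
        return self.registered.get(category, {}).get(phrase)
```

A device equipped only for speech recognition silently ignores gesture entries in the same page, which mirrors the idea that one content description text can serve control devices with different recognizers.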

[0039] In the second embodiment, guide information that guides the user to each recognition candidate is described as a fourth parameter of the List notation above. For example, a List for speech recognition names a waveform file containing audio data as its fourth parameter, as in <List "rinku ichi", http://www.server1.com, "Link 1", Link1.wav>. When the display system finds this fourth parameter, it plays back the waveform file as audio at the same time as it displays the content.
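The fourth parameter only names a file, so the display system must decide how to play it alongside the page. The Python sketch below assumes the decision is made by file extension (.wav as audio, .avi as video, following the examples in this description); that rule, the `PLAYERS` table, and `guide_actions` are hypothetical, since the patent does not say how the media type is determined.

```python
import os
from typing import List, Optional, Tuple

# Assumed mapping from file extension to playback kind (not specified in the text).
PLAYERS = {".wav": "audio", ".avi": "video"}


def guide_actions(guides: List[Optional[str]]) -> List[Tuple[str, str]]:
    """Given the 4th parameter of each List (None when absent), return the
    (kind, file) playback actions to start together with the page display."""
    actions = []
    for guide in guides:
        if guide is None:
            continue  # List without guide information (first embodiment)
        ext = os.path.splitext(guide)[1].lower()
        actions.append((PLAYERS.get(ext, "unknown"), guide))
    return actions
```

Entries without a fourth parameter are skipped, so first-embodiment content still displays normally.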

[0040] For example, when a page is displayed on which a link can be designated by speaking "rinku ichi" through speech recognition, an audio guide such as "The content displayed on this page can be selected by saying 'rinku ichi'" can be attached. When the page contains several Lists, the control device 10 prepares in advance the fixed audio phrases that open and close the message ("The content displayed on this page" and "can be selected by saying"), extracts from each List the audio for its candidate, such as "rinku ichi" and "rinku ni", and concatenates the segments into a single guide message: "The content displayed on this page can be selected by saying 'rinku ichi' or 'rinku ni'."
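The message assembly described above is simple concatenation of fixed phrases with per-List candidate segments. The Python sketch below uses text strings as stand-ins for the audio segments the patent actually concatenates; the English `PREFIX` and `SUFFIX` wording and the "a, b or c" joining rule are assumptions made for illustration.

```python
from typing import List

# Assumed English renderings of the fixed phrases held by control device 10.
PREFIX = "The content displayed on this page can be selected by saying "
SUFFIX = "."


def build_guide(candidates: List[str]) -> str:
    """Concatenate the fixed phrases with the candidate words from each List."""
    quoted = [f"'{c}'" for c in candidates]
    if len(quoted) <= 1:
        body = "".join(quoted)
    else:
        body = ", ".join(quoted[:-1]) + " or " + quoted[-1]
    return PREFIX + body + SUFFIX
```

In an audio implementation, each string piece would instead be a clip, and the same ordering (prefix, candidates, suffix) would drive playback.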

[0041] For gestures, facial expressions, and sign language, the fourth parameter describes data such as a moving image or an animation, as in <List "raise right hand", http://www.server1.com, "raise right hand", motion1.avi>, and this video data with audio is used as the guide information. At the same time as the content is displayed, the candidate can be explained by the video together with a spoken message such as "The content displayed on this page can be selected by raising your right hand, as you can see on the screen." When several selections are possible, the candidates are guided by concatenating the audio guidance, in the same way as for speech recognition.

[0042] Similarly, for sign language and handwritten patterns, displaying the candidates on the screen with audio together with a moving image, an animation, or a still image lets even a novice user operate the system without confusion. Depending on the control device 10, several kinds of such recognition information can also be handled at the same time, and the display system can provide guidance for several methods at once, for example for both "speech recognition" and "gesture".
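When one control device supports several recognition methods at once, guidance can be produced per method. A minimal Python sketch follows, assuming entries are (method, label) pairs and that methods the device lacks are dropped; the method names and `guidance_by_method` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def guidance_by_method(entries: List[Tuple[str, str]],
                       equipped: List[str]) -> Dict[str, List[str]]:
    """Group candidate labels by recognition method, keeping only methods the
    device supports, so one guidance block can be produced for each method."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for method, label in entries:
        if method in equipped:
            grouped[method].append(label)
    # Preserve the device's own method order and omit methods with no candidates.
    return {m: grouped[m] for m in equipped if grouped[m]}
```

Each resulting group could then be turned into its own concatenated guide message, one for speech recognition and one for gestures.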

[0043]

EFFECTS OF THE INVENTION As described above, according to the present invention, recognition candidate data and link information for a plurality of recognition devices can be provided in a single content description text, and the control device 10 can obtain the recognition candidate data and link information it needs according to the recognition devices with which it is equipped. In addition, by providing and displaying guide information at the same time, the user can be guided in how to use the recognition means.

[Brief description of the drawings]

FIG. 1 is a schematic block diagram of a first embodiment of the content display selection system according to the present invention.

FIG. 2 is a flowchart showing the operation of the first embodiment of FIG. 1.

FIG. 3 is a schematic block diagram of a second embodiment of the content display selection system according to the present invention.

FIG. 4 is a flowchart showing the operation of the second embodiment of FIG. 3.

[Explanation of symbols]

10 Control device
12 Display device that displays images and reproduces sound
14A First recognition device that performs speech recognition
14B Second recognition device that performs image recognition
16 Link information storage unit
18 Auxiliary information display control unit
20 Content description text
22 Server
24 Guide information display control unit

Continuation of front page: (51) Int.Cl.⁷ G10L 15/00; FI G10L 3/00 551P

Claims

1. A content display selection system comprising: recognition devices including at least speech recognition and image recognition; a display device for displaying content; and control means for controlling the recognition devices and the display device in association with each other, wherein the description text describing the content to be displayed contains, as recognition information, a "category" indicating the type of recognition device, "recognition candidate data" that are candidates for the user selection items to be recognized by the recognition device, and "link information" indicating the link destination of a selection result; and wherein the control means sends the "category" of the display content displayed on the display device to the recognition device, sends the "recognition candidate data" to the recognition device so that the recognition device can recognize the "recognition candidate data" displayed on the display device, selects the corresponding link information in response to the recognition result from the recognition device, and controls the display device so as to change the content displayed on the display device to the content indicated by the corresponding link information.

2. The content display selection system according to claim 1, wherein "auxiliary information" for displaying a supplementary character string relating to the recognition candidate data is described in the description text as the recognition information, the control means sends the "auxiliary information" to the display device, and the display device is configured to display, on the screen, the supplementary character string for the recognition candidate data according to the "auxiliary information".

3. The content display selection system according to claim 1, wherein "guide information" indicating a recognition method provided by the recognition device is described in the description text as the recognition information, the control means sends the "guide information" to the display device, and the display device is configured to indicate, on the screen, the recognition method provided by the recognition device according to the "guide information" and thereby guide the user in that recognition method.

4. The content display selection system according to claim 1, wherein said image recognition recognizes at least one of a gesture, a sign language, a facial expression, and a handwritten pattern.

5. A content recording medium on which content is recorded in a readable state, the content including a description that gives the category of a recognition method using voice or images, so that the user's intention can be recognized by at least one of a plurality of recognition methods including speech recognition and image recognition.