JP2006227219A - Information generating device, information output device, and program

Info

Publication number: JP2006227219A
Application number: JP2005039794A
Authority: JP
Inventors: Yohei Kawaguchi; 洋平川口; Yasuyuki Sumi; 康之角; Masaru Kumagai; 賢熊谷; Toyoaki Nishida; 豊明西田; Kenji Mase; 健二間瀬
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2005-02-16
Filing date: 2005-02-16
Publication date: 2006-08-31

Abstract

PROBLEM TO BE SOLVED: To provide an information generating device, an information output device, and a program with which content units, each a portion of the content, can be accumulated from raw content related to the explanation or discussion of an exhibited object, and with which those content units can be output appropriately.

SOLUTION: The information generating device accumulates content units, each a portion of the content, from raw content related to the explanation or discussion of an exhibited object. It has a voice information receiving section that receives voice information (information containing voice); a content unit extracting section that, based on the received voice information, extracts a content unit containing the voice information for the span of time that matches a prescribed condition; and a content unit accumulating section that accumulates content unit information, that is, information about the extracted content unit. The information generating device can thereby extract reusable content units.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to an information generation device, an information output device, and the like that, for example, record audio information produced when an object such as an exhibit or an exhibition panel is explained or discussed, perform summarization that cuts out part of that audio information, or output it.

Among conventional methods for summarizing video and the like, there is a video summarization method capable of automatically extracting semantically important events in a video (see, for example, Patent Document 1). This method includes a feature extractor 40 that extracts features of the video; a probabilistic model 42 that uses a model such as a hidden Markov model to integrate the features and determine boundaries; a commercial/non-commercial filter 44 that distinguishes commercial from non-commercial slow-motion replay segments; and a summary generator 46 that generates a summary based on the detected slow-motion replay segments. The feature extractor 40 extracts features from color histograms in block 50 and extracts three features from pixel-based differences in block 52. The features extracted in block 52 characterize the slow-motion, still-field, and/or normal-speed replay components of a replay segment. The features extracted in block 50 characterize the editing-effect components.
In addition, Non-Patent Document 1 discloses the method of acquiring raw content described in the present embodiment.
Patent Document 1: JP 2002-232840 A (page 1, FIG. 1, etc.)
Non-Patent Document 1: Yasuyuki Sumi and 9 others, "Recording and Sharing Experiences in Ubiquitous Environments", Systems, Control and Information (journal of the Institute of Systems, Control and Information Engineers), November 2004, Vol. 48, No. 1, pp. 458-463

However, conventional video summarization methods cannot extract reusable content units from the raw content recorded when an object such as an exhibit or an exhibition panel is explained or discussed. They also cannot realize an alter-ego agent, that is, an agent that outputs such content units appropriately and gives explanations at an exhibition or the like in place of a person.

The information generation device of the first invention accumulates content units, each a portion of the content, from raw content related to the explanation or discussion of an exhibited object. It comprises an audio information reception unit that receives audio information, which is information containing audio; a content unit extraction unit that, based on the audio information received by the audio information reception unit, extracts a content unit containing the audio information for the span of time that matches a predetermined condition; and a content unit accumulation unit that accumulates content unit information, which is information about the content unit extracted by the content unit extraction unit. Here, an object is an exhibition panel or an exhibit. An exhibition panel is, for example, display information, that is, information shown on a display. An exhibit is, for example, a painting in an art museum or a dinosaur in a museum.
With this configuration, meaningful, reusable content units, which are content related to the explanation or discussion of an exhibited object, can be extracted automatically.

The information generation device of the second invention, relative to the first invention, further comprises a display information storage unit that stores display information, which is information to be displayed; a display unit that displays the display information; an instruction reception unit that receives an instruction directed at the display information shown by the display unit; and a position information acquisition unit that acquires position information, which is information about the position of the instruction received by the instruction reception unit. Based on the instruction received by the instruction reception unit and the audio information received by the audio information reception unit, the content unit extraction unit extracts a content unit containing the audio information for the span of time that matches a predetermined condition, together with the position information acquired by the position information acquisition unit during that time.
With this configuration, content units that correspond to the location being explained or discussed can be extracted automatically.

In the information generation device of the third invention, relative to the first or second invention, the content unit extraction unit comprises: explanation audio detection means for detecting explanation audio, which is audio judged to have been uttered by a single speaker for at least a predetermined time; typical explanation information storage means for storing typical explanation information, which contains one or more pieces of position information indicating positions of instructions directed at the display information; similarity detection means for detecting, based on the one or more pieces of position information acquired by the position information acquisition unit and the one or more pieces of position information stored in the typical explanation information storage means, the similarity between the explanation corresponding to the acquired position information and the explanation corresponding to the typical explanation information; and lecture content unit extraction means for extracting, when the explanation audio detection means has detected explanation audio and the similarity detected by the similarity detection means is at or above a predetermined similarity, a lecture content unit containing the audio information for the duration of that explanation audio.
With this configuration, lecture content units, which are content units for typical explanations, can be extracted automatically.
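
The two conditions of the third invention (a sufficiently long utterance by one speaker, and a sufficiently high similarity to the typical explanation) can be illustrated with a minimal Python sketch. The specification does not state how the similarity is computed; the Jaccard overlap of pointed-at paragraph identifiers used here, the names `Utterance`, `jaccard`, and `is_lecture`, and the threshold values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker_id: str   # speaker identifier paired with the audio
    start: float      # start time in seconds
    end: float        # end time in seconds


def jaccard(a, b):
    """Overlap of two collections of position labels (assumed similarity measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_lecture(utterance, pointed, typical, min_seconds=30.0, min_similarity=0.5):
    """True when one speaker talked long enough AND the positions pointed at
    during the utterance resemble the stored typical explanation."""
    long_enough = (utterance.end - utterance.start) >= min_seconds
    return long_enough and jaccard(pointed, typical) >= min_similarity
```

For example, a 45-second utterance that touches paragraphs p1, p2, and p3 while the typical explanation touches p1, p2, and p4 would qualify under these assumed thresholds, whereas a 10-second utterance would not.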

In the information generation device of the fourth invention, relative to the first or second invention, the content unit extraction unit comprises: dialogue audio detection means for detecting dialogue audio, which contains audio paired with one speaker identifier and, nearly contiguous with it, audio paired with another speaker identifier; and interaction content unit extraction means for extracting, when the dialogue audio detection means has detected dialogue audio, an interaction content unit containing the audio information for the duration of that dialogue audio.
With this configuration, interaction content units, which are content related to discussion of an exhibited object, can be extracted automatically.
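
As an illustration of dialogue audio detection, the following Python sketch groups utterances that follow one another with only a short pause and keeps the groups in which at least two different speaker identifiers appear. The specification says only that the audio of the speakers is "nearly contiguous"; the 2-second `max_gap`, the tuple layout, and the function names are assumptions.

```python
def find_dialogues(utterances, max_gap=2.0):
    """utterances: list of (speaker_id, start_sec, end_sec), sorted by start time.
    Returns (start_sec, end_sec) spans that contain two or more speakers."""
    dialogues = []
    run = []
    for utt in utterances:
        # A pause longer than max_gap closes the current run of utterances.
        if run and utt[1] - run[-1][2] > max_gap:
            _flush(run, dialogues)
            run = []
        run.append(utt)
    _flush(run, dialogues)
    return dialogues


def _flush(run, dialogues):
    """Record the run as a dialogue only if more than one speaker took part."""
    if len({u[0] for u in run}) >= 2:
        dialogues.append((run[0][1], max(u[2] for u in run)))
```

Each returned span gives the time range whose audio information would make up one interaction content unit.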

The information generation device of the fifth invention, relative to any of the first to third inventions, further comprises a content unit output unit that outputs at least the audio information contained in the content units accumulated by the content unit accumulation unit.
With this configuration, the accumulated content units can be output and thus reused. In the first to fourth inventions, the content units may be output by another device.

The sixth invention is an information output device comprising: a content unit storage unit that stores one or more content units, which are content related to the explanation or discussion of an exhibited object; a trigger detection unit that detects a trigger for outputting a content unit; a content unit acquisition unit that acquires one or more content units from the content unit storage unit when the trigger detection unit detects a trigger; and a content unit output unit that outputs the content units acquired by the content unit acquisition unit.
This configuration is advantageous because a content unit is output automatically when a trigger is detected.
The seventh invention is the information output device of the sixth invention, further comprising a display information storage unit that stores display information, which is information to be displayed, and a display unit that displays the display information.
With this configuration, content units such as an explanation of an exhibition panel can be output automatically.
The eighth invention is the information output device of the seventh invention, in which the trigger detected by the trigger detection unit is an instruction directed at the display information shown by the display unit.
This configuration is advantageous because a content unit is output automatically in response to an instruction directed at the display information.

The ninth invention is the information output device of the eighth invention, in which each content unit contains audio information and position information. The device further comprises a position information acquisition unit that acquires position information, which is information about the position of the instruction detected by the trigger detection unit, and the content unit acquisition unit acquires, from the content unit storage unit, the content unit that corresponds to the position information acquired by the position information acquisition unit.
This configuration is advantageous because the content unit that is output is the explanation or discussion corresponding to the location the user indicated.
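
One way to picture "the content unit that corresponds to the position information" is a proximity lookup. In the Python sketch below, each stored unit keeps the coordinates that were pointed at while it was recorded, and a unit is selected when any of those coordinates lies within a radius of the point the visitor touches. The matching rule, the 50-pixel radius, and the names `StoredUnit` and `units_near` are assumptions; the specification does not fix how correspondence is decided.

```python
import math
from dataclasses import dataclass, field


@dataclass
class StoredUnit:
    unit_id: str
    audio_path: str                                  # where the audio is kept (hypothetical)
    positions: list = field(default_factory=list)    # (x, y) points pointed at during recording


def units_near(units, x, y, radius=50.0):
    """Return the units recorded while someone pointed close to (x, y)."""
    return [
        u for u in units
        if any(math.hypot(px - x, py - y) <= radius for px, py in u.positions)
    ]
```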

The tenth invention is the information output device of any of the sixth to ninth inventions, in which the content unit acquisition unit comprises distance information acquisition means for acquiring distance information, which is information about the distance between the user and the display showing the display information, and content unit acquisition means for acquiring one or more content units from the content unit storage unit based on the distance information acquired by the distance information acquisition means.
With this configuration, the user's interest in an exhibited object can be estimated from the distance between the object and the user, and a content unit suited to that interest can be output.
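
The following Python sketch shows one possible distance-based selection. The passage above states only that interest is estimated from distance; the rule that a nearby user receives detailed units and a distant user receives overview units, the 1.5 m boundary, and the `detail` field are assumptions introduced for illustration.

```python
def pick_by_distance(units, distance_m, near_m=1.5):
    """units: list of dicts with 'unit_id' and 'detail' ('overview' or 'detail').
    Assumed rule: a user standing close is more interested and gets detailed units."""
    wanted = "detail" if distance_m <= near_m else "overview"
    return [u["unit_id"] for u in units if u["detail"] == wanted]
```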

The eleventh invention is the information output device of any of the sixth to tenth inventions, further comprising an object identifier acquisition unit that acquires an object identifier, which is information identifying a user, and an already-explained information accumulation unit that accumulates the object identifier acquired by the object identifier acquisition unit while the content unit output unit is outputting a content unit, in association with that content unit. The content unit acquisition unit does not acquire a content unit that is associated with the object identifier acquired by the object identifier acquisition unit.
With this configuration, repeating the same explanation to a given user can be avoided.
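
The bookkeeping behind this duplicate avoidance can be sketched in Python as a mapping from content unit to the users who were present while it played. The class name `ExplainedLog`, its methods, and the use of string identifiers are hypothetical.

```python
class ExplainedLog:
    """Remembers which users (object identifiers) were present for which unit."""

    def __init__(self):
        self._seen = {}  # unit_id -> set of object identifiers observed during output

    def record(self, unit_id, user_ids):
        """Call while a unit is being output, with the identifiers detected nearby."""
        self._seen.setdefault(unit_id, set()).update(user_ids)

    def not_yet_heard(self, unit_ids, user_id):
        """Filter out units already output while this user was present."""
        return [u for u in unit_ids if user_id not in self._seen.get(u, set())]
```

A content unit acquisition step would pass its candidate units through `not_yet_heard` before output, so that a returning visitor hears only units not yet presented to that visitor.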

The twelfth invention is the information output device of any of the sixth to eleventh inventions, further comprising an agent storage unit that stores one or more agents, in which the content unit output unit also outputs the agent when it outputs a content unit.

With this configuration, content units are output as though an alter-ego agent standing in for the explainer were presenting them, so that even when no explainer is near the exhibit, the user (visitor) can be given the sense of presence of an explainer actually explaining.

The information generation device according to the present invention can accumulate content units, each a portion of the content, from raw content related to the explanation or discussion of an exhibited object. The information output device according to the present invention can output content units appropriately.

Embodiments of the information generation device and related devices are described below with reference to the drawings. Components given the same reference numerals in the embodiments operate in the same way, so repeated description of them may be omitted.
(Embodiment 1)

Embodiment 1 describes an information generation device that records video and audio produced when an object such as an exhibit or an exhibition panel is explained or discussed, and that extracts and accumulates, from the recorded video, audio, and so on, content considered important for explaining or discussing the object (such content is referred to as a "content unit" where appropriate). The case in which the information generation device also has an information output function is covered here, but the output function is described mainly in Embodiment 2. Unprocessed content containing video and audio produced when an object such as an exhibit or an exhibition panel is explained or discussed is called raw content.
FIG. 1 is a block diagram of the information generation device in this embodiment.

The information generation device comprises a target identifier storage unit 101, a display information storage unit 102, a display unit 103, an instruction reception unit 104, a position information acquisition unit 105, a video information reception unit 106, an audio information reception unit 107, an object identifier acquisition unit 108, a content unit extraction unit 109, a content unit accumulation unit 110, and a content unit output unit 111.

The content unit extraction unit 109 comprises explanation audio detection means 1091, typical explanation information storage means 1092, similarity detection means 1093, lecture content unit extraction means 1094, dialogue audio detection means 1095, and interaction content unit extraction means 1096.

The target identifier storage unit 101 stores a target identifier that identifies the target of an explanation or discussion. The target of an explanation or discussion is, for example, an exhibition panel. The description here mainly treats the target as an exhibition panel shown on a display (such as a touch panel), but targets also include exhibits in museums, art galleries, and the like. The target identifier may have any data structure, such as an ID or a target name. Target identifiers are needed when raw content and content units for multiple targets are handled together. In the specific examples in this specification, however, a single target is normally assumed to keep the description simple. The target identifier storage unit 101 is preferably a non-volatile recording medium, but it can also be realized with a volatile recording medium.

The display information storage unit 102 stores display information, which is information to be displayed. Display information is, for example, the information that makes up an exhibition panel, and its structure is not restricted. It may be anything, for example a bitmap file, a text file, or a file written in a document description language such as TeX or PostScript. The display information storage unit 102 is preferably a non-volatile recording medium, but it can also be realized with a volatile recording medium.

The display unit 103 displays the display information held in the display information storage unit 102. The display unit 103 may or may not be regarded as including an output device such as a display. It can be realized by the driver software of an output device, or by that driver software together with the output device. Here, the display has, for example, a touch panel.

The instruction reception unit 104 receives various instructions from users. For example, it receives an instruction directed at the display information shown by the display unit 103. This instruction is entered by an explainer who is explaining the display information, by a person discussing the display information, or the like. The instruction reception unit 104 also receives a content unit extraction instruction, which is an instruction to extract content units. A content unit extraction instruction is an instruction to obtain lecture content units, interaction content units, and the like (described later) from the accumulated audio information and other data. The instruction reception unit 104 also receives a content unit output instruction, which is an instruction to output a content unit. Such instructions are normally entered by pressing the touch panel, but any input means may be used, such as a mouse, numeric keypad, keyboard, or menu screen. The instruction reception unit 104 can be realized by a device driver for an input means such as a touch panel or mouse.

The position information acquisition unit 105 acquires position information, which is information about the position of an instruction received by the instruction reception unit 104. Position information is normally coordinate information (x, y), but it may also be the region around the indicated point or the paragraph around it (a paragraph of the document shown on the panel). The position information acquisition unit 105 is realized by, for example, touch panel driver software. It may also be realized by an MPU, memory, and the like. When the indicated target is a physical exhibit (for example, a dinosaur in a museum), the position information acquisition unit 105 consists of means for acquiring the position at which the exhibit was indicated. In that case, for example, buttons that can be pressed are installed on the exhibit, and the position information acquisition unit 105 detects that a button was pressed, recognizes which button it was, and acquires the position information corresponding to that button.
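
As a small illustration of position information expressed as a paragraph rather than as raw coordinates, the following Python sketch maps a touched coordinate to the paragraph whose bounding box contains it. The bounding-box table and the name `paragraph_at` are hypothetical; the specification does not describe how the paragraph is determined.

```python
def paragraph_at(paragraph_boxes, x, y):
    """paragraph_boxes: dict mapping paragraph id -> (x_min, y_min, x_max, y_max).
    Returns the id of the first paragraph containing (x, y), or None."""
    for pid, (x0, y0, x1, y1) in paragraph_boxes.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return pid
    return None
```

The same lookup shape would serve a physical exhibit if the table instead mapped button identifiers to positions on the exhibit.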

The video information reception unit 106 receives video information, which contains video. The video normally shows the person explaining the target, the people discussing it, the exhibit that is the target, and so on. Video information contains, for example, video and a camera identifier that identifies the camera that captured it. However, video information may consist of video alone or may include other information, such as time information or location information (location information is obtained, for example, by a GPS receiver, not shown); its data structure is not restricted. The video information reception unit 106 may or may not be regarded as including the camera or other device that captures the video. It can be realized by communication means that receives video information from a camera, or by means for reading video information from a recording medium such as a DVD.

The audio information reception unit 107 receives audio information, which is information containing audio. The audio is, for example, speech uttered by a speaker such as an explainer explaining the target or a person discussing it. Audio information contains, for example, audio and a speaker identifier that identifies the speaker. The audio information reception unit 107 may or may not be regarded as including a microphone. It can be realized, for example, by means for receiving audio information transmitted from a microphone. In that case, the microphone pairs the audio with information that identifies the microphone (here, the speaker identifier) and transmits the resulting audio information to the audio information reception unit 107. The microphone may be a throat microphone.

The object identifier acquisition unit 108 identifies one or more objects and acquires one or more object identifiers, which are information identifying those objects. Here, an object means a target such as an exhibit or an exhibition panel, or a person (speaker) who explains or discusses the target.

An example of a method for outputting such object identifiers follows. First, tags that emit an infrared ID (referred to below as "LED tags" where appropriate) are attached to the targets and speakers. A large number of video cameras are installed in the venue where the exhibition panels and other targets are set up, and each video camera is paired with a sensor that recognizes infrared IDs (referred to below as an "IR tracker"). The object identifier acquisition unit 108 can be realized by such an IR tracker. Details of an example implementation of the object identifier acquisition unit 108 are given in Non-Patent Document 1, so a detailed description is omitted here. The object identifier acquisition unit 108 may instead be configured to perform image recognition on the video received by the video information reception unit 106, identify the speaker shown in the video, and acquire a speaker ID (object identifier).

コンテンツユニット抽出部１０９は、音声情報受付部１０７が受け付けた音声情報に基づいて、所定の条件に合致する時間分の音声情報と、前記対象識別子を有するコンテンツユニットを抽出する。「所定の条件に合致する」とは、例えば、以下で詳述するレクチャーであることを認識できたこと、または、以下で詳述するインタラクションであることを認識できたことである。「所定の条件に合致する」とは、例えば、音声の大きさが所定以上の時間が、予め決められた時間以上続く場合に、当該時間分の音声情報等が所定の条件に合致することとなる。その他、所定の条件には種々ある。コンテンツユニットは、少なくとも音声情報は有する。その他、コンテンツユニットは、映像、１以上のオブジェクト識別子、位置情報、当該位置情報に対応する表示情報に関する情報である表示関連情報のうち１種以上の情報を有しても良い。表示関連情報は、表示情報の段落を示す情報や表示情報の一部の情報等である。また、コンテンツユニットは、以下で詳述するレクチャーコンテンツユニットやインタラクションコンテンツユニットなどの種類がある。さらに、「コンテンツユニットを抽出する」とは、出力されるコンテンツユニット自体を抽出することでも良いし、出力されるコンテンツユニットを抽出するための情報（ポインタ情報）を抽出することでも良い。コンテンツユニット抽出部１０９は、通常、ＭＰＵやメモリ等から実現され得る。コンテンツユニット抽出部１０９の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 Based on the audio information received by the audio information receiving unit 107, the content unit extraction unit 109 extracts a content unit having the audio information for a period of time that matches a predetermined condition and the target identifier. “Matching a predetermined condition” means, for example, that the lecture described in detail below has been recognized, or that the interaction described in detail below has been recognized. As another example, when the volume of the audio stays at or above a predetermined level for at least a predetermined length of time, the audio information and the like for that period match the predetermined condition. Various other predetermined conditions are possible. The content unit has at least audio information. In addition, the content unit may have one or more of the following kinds of information: video, one or more object identifiers, position information, and display related information, which is information about the display information corresponding to the position information. The display related information is, for example, information indicating a paragraph of the display information or a part of the display information. Content units come in types such as the lecture content unit and the interaction content unit, which are described in detail below. Further, “extracting a content unit” may mean extracting the content unit to be output itself, or extracting information (pointer information) for extracting the content unit to be output. The content unit extraction unit 109 can usually be realized by an MPU, a memory, or the like. The processing procedure of the content unit extraction unit 109 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (a dedicated circuit).
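The volume-based example of a "predetermined condition" can be illustrated with a minimal Python sketch. It assumes the audio has already been reduced to one volume value per fixed-length frame; the function name `find_loud_segments` and its parameters are hypothetical and do not appear in the specification.

```python
from typing import List, Tuple


def find_loud_segments(
    volumes: List[float],
    frame_sec: float,
    min_volume: float,
    min_duration_sec: float,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans in which the volume stays at or
    above min_volume for at least min_duration_sec."""
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current loud run
    # A sentinel below any threshold closes a run that reaches the end.
    for idx, volume in enumerate(volumes + [float("-inf")]):
        if volume >= min_volume:
            if start is None:
                start = idx
        elif start is not None:
            if (idx - start) * frame_sec >= min_duration_sec:
                segments.append((start * frame_sec, idx * frame_sec))
            start = None
    return segments
```

Each returned span is a candidate time range for a content unit; the thresholds are free parameters, in line with the specification leaving the condition open.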

コンテンツユニット蓄積部１１０は、コンテンツユニット抽出部１０９が抽出したコンテンツユニットに関する情報であるコンテンツユニット情報を蓄積する。コンテンツユニット情報は、出力されるコンテンツユニット自体でも良いし、出力されるコンテンツユニットを抽出するための情報（ポインタ情報）でも良い。コンテンツユニット蓄積部１１０がコンテンツユニットを蓄積する記録媒体は、情報生成装置の外部装置の記録媒体でも良いし、情報生成装置が有する記録媒体でも良い。コンテンツユニット蓄積部１１０は、通常、ＭＰＵやメモリ等から実現され得る。コンテンツユニット蓄積部１１０の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 The content unit accumulating unit 110 accumulates content unit information that is information regarding the content unit extracted by the content unit extracting unit 109. The content unit information may be the output content unit itself, or may be information (pointer information) for extracting the output content unit. The recording medium in which the content unit storage unit 110 stores the content unit may be a recording medium of an external device of the information generation apparatus or a recording medium included in the information generation apparatus. The content unit storage unit 110 can usually be realized by an MPU, a memory, or the like. The processing procedure of the content unit storage unit 110 is usually realized by software, and the software is recorded in a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
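The two forms of content unit information (the content unit itself, or pointer information into the raw content) could be modeled as follows. This is only an illustrative sketch: the class names, the byte-offset representation of the pointer, and the `resolve` helper are assumptions, not part of the specification.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ContentUnit:
    """A content unit stored as the data itself."""
    audio: bytes
    target_id: Optional[str] = None


@dataclass
class ContentUnitPointer:
    """Pointer information: start and end offsets into the raw content."""
    start: int
    end: int
    target_id: Optional[str] = None


def resolve(info: Union[ContentUnit, ContentUnitPointer], raw_audio: bytes) -> ContentUnit:
    """Return the content unit to output, slicing the raw content if needed."""
    if isinstance(info, ContentUnit):
        return info
    return ContentUnit(audio=raw_audio[info.start:info.end], target_id=info.target_id)
```

With the pointer form, the raw content must still be available at output time, which is consistent with the raw content being kept on a recording medium.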

コンテンツユニット出力部１１１は、コンテンツユニット蓄積部１１０が蓄積したコンテンツユニットを出力する。コンテンツユニット出力部１１１は、コンテンツユニットのすべての種類の情報を出力する必要はない。コンテンツユニット出力部１１１は、コンテンツユニットが有する音声情報を少なくとも出力する。コンテンツユニット出力部１１１は、ディスプレイやスピーカー等の出力デバイスを含むと考えても含まないと考えても良い。コンテンツユニット出力部１１１は、出力デバイスのドライバーソフトまたは、出力デバイスのドライバーソフトと出力デバイス等で実現され得る。ここで、出力とは、ディスプレイへの表示、プリンタへの印字、音出力、外部の装置への送信等を含む概念である。 The content unit output unit 111 outputs the content unit stored by the content unit storage unit 110. The content unit output unit 111 need not output all types of information of content units. The content unit output unit 111 outputs at least audio information included in the content unit. The content unit output unit 111 may be considered as including or not including an output device such as a display or a speaker. The content unit output unit 111 can be implemented by output device driver software, or output device driver software and an output device. Here, output is a concept including display on a display, printing on a printer, sound output, transmission to an external device, and the like.

説明音声検出手段１０９１は、一の発話者により所定の時間以上、発話されていると判断される音声である説明音声を検出する。説明音声検出手段１０９１は、通常、ＭＰＵやメモリ等から実現され得る。説明音声検出手段の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 The explanation voice detecting means 1091 detects explanation voice, which is a voice that is determined to be spoken by a single speaker for a predetermined time or more. The explanation voice detecting means 1091 can be usually realized by an MPU, a memory or the like. The processing procedure of the explanation voice detecting means is usually realized by software, and the software is recorded in a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).

典型説明情報格納手段１０９２は、表示情報に対する指示の位置を示す位置情報を１以上有する典型説明情報を格納している。典型説明情報は、通常、展示パネルなどの対象（ここでの表示情報）についての典型的な説明を、説明員が行った場合に、当該説明員が対象を指示する位置の情報を１以上有する。ただし、典型説明情報が示す意義は問わない。つまり、典型説明情報は、特異な場合の説明パターンにおいて、説明員が対象を指示する位置の情報を１以上有するものでも良い。典型説明情報は、位置情報と時刻情報と対に有する指示情報を１以上有する構成でも良い。典型説明情報格納手段１０９２は、不揮発性の記録媒体が好適であるが、揮発性の記録媒体でも実現可能である。不揮発性の記録媒体でも、揮発性の記録媒体でも良い。 The typical explanation information storage unit 1092 stores typical explanation information having one or more pieces of position information, each indicating the position of an instruction on the display information. The typical explanation information usually has one or more pieces of information on the positions that an explainer points to on the target when giving a typical explanation of a target such as an exhibition panel (here, the display information). However, the significance of the typical explanation information does not matter. That is, the typical explanation information may instead have one or more pieces of information on the positions that the explainer points to in an atypical explanation pattern. The typical explanation information may also have one or more pieces of instruction information, each pairing position information with time information. The typical explanation information storage unit 1092 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium.

類似度検出手段１０９３は、位置情報取得部１０５が取得した１以上の位置情報と典型説明情報格納手段１０９２が格納している１以上の位置情報に基づいて、位置情報取得部が取得した１以上の位置情報に対応する説明と、典型説明情報に対応する説明との類似度を検出する。なお、類似度検出手段１０９３は、位置情報取得部１０５が取得した位置情報と、当該位置情報を取得した時刻の情報を対に有する１以上の指示情報（これを取得指示情報という）と、典型説明情報格納手段１０９２の典型説明情報（時刻の情報も含む）を比較して、類似度を検出しても良い。かかる類似度の検出方法には、種々あり、その方法は問わない。類似度検出手段１０９３は、通常、ＭＰＵやメモリ等から実現され得る。類似度検出手段１０９３の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 Based on the one or more pieces of position information acquired by the position information acquisition unit 105 and the one or more pieces of position information stored in the typical explanation information storage unit 1092, the similarity detection means 1093 detects the degree of similarity between the explanation corresponding to the acquired position information and the explanation corresponding to the typical explanation information. The similarity detection means 1093 may also detect the similarity by comparing one or more pieces of instruction information, each pairing position information acquired by the position information acquisition unit 105 with the time at which it was acquired (referred to as acquired instruction information), with the typical explanation information (including time information) in the typical explanation information storage unit 1092. There are various methods for detecting the similarity, and any method may be used. The similarity detection means 1093 can usually be realized by an MPU, a memory, or the like. The processing procedure of the similarity detection means 1093 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (a dedicated circuit).

レクチャーコンテンツユニット抽出手段１０９４は、説明音声検出手段１０９１が説明音声を検出し、かつ類似度検出手段１０９３が検出した類似度が予め決められた類似度以上の類似度である場合に、当該説明音声の時間分の音声情報と対象識別子を有するレクチャーコンテンツユニットを抽出する。ただし、対象が一つの場合は、レクチャーコンテンツユニットにおいて、音声情報を有するのみでも良い。レクチャーコンテンツユニット抽出手段１０９４は、通常、ＭＰＵやメモリ等から実現され得る。レクチャーコンテンツユニット抽出手段１０９４の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 When the explanation voice detection means 1091 detects explanation voice and the similarity detected by the similarity detection means 1093 is equal to or higher than a predetermined similarity, the lecture content unit extraction means 1094 extracts a lecture content unit having the audio information for the duration of that explanation voice and the target identifier. However, when there is only one target, the lecture content unit may have audio information only. The lecture content unit extraction means 1094 can usually be realized by an MPU, a memory, or the like. The processing procedure of the lecture content unit extraction means 1094 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (a dedicated circuit).

対話音声検出手段１０９５は、一の発話者識別子と対になる音声と、ほぼ連続する他の発話者識別子と対になる音声を有する対話音声を検出する。「ほぼ連続する」とは、所定の間隔以内の時間的な重なりがあっても良いし、所定の間隔以内の時間的な間があっても良いことをいう。また、時間的な重なりがある場合と、時間的な間がある場合の「所定の間隔」は、異なっていても良い。対話音声検出手段１０９５は、通常、ＭＰＵやメモリ等から実現され得る。対話音声検出手段１０９５の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。 The dialogue voice detecting means 1095 detects dialogue voice, which consists of a voice paired with one speaker identifier and a substantially continuous voice paired with another speaker identifier. “Substantially continuous” means that the two voices may overlap in time by no more than a predetermined interval, or may be separated in time by no more than a predetermined interval. The “predetermined interval” used for a temporal overlap may differ from the one used for a temporal gap. The dialogue voice detecting means 1095 can usually be realized by an MPU, a memory, or the like. The processing procedure of the dialogue voice detecting means 1095 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (a dedicated circuit).
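The "substantially continuous" test can be sketched in Python as below. The sketch assumes each utterance is given as a `(start, end)` pair in seconds and that the first utterance starts no later than the second; the function name and the two tolerance parameters are hypothetical.

```python
from typing import Tuple

Span = Tuple[float, float]  # (start_sec, end_sec) of one utterance


def is_substantially_continuous(
    first: Span, second: Span, max_gap: float, max_overlap: float
) -> bool:
    """Check whether `second` follows `first` closely enough to count as dialogue.

    A silent gap is tolerated up to max_gap and an overlap up to max_overlap;
    the two tolerances may differ, as the description allows.
    """
    gap = second[0] - first[1]  # negative when the utterances overlap
    if gap >= 0:
        return gap <= max_gap
    return -gap <= max_overlap
```

Keeping separate tolerances lets an implementation be stricter about speakers talking over each other than about pauses between turns, or the reverse.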

インタラクションコンテンツユニット抽出手段１０９６は、対話音声検出手段１０９５が対話音声を検出した場合、当該対話音声の時間分の音声情報と対象識別子を有するインタラクションコンテンツユニットを抽出する。なお、インタラクションコンテンツユニットについても、対象識別子は必須ではない。インタラクションコンテンツユニットは、音声情報だけでも良い。ただ、インタラクションコンテンツユニットは、２人以上の音声を含むので、音声を識別する情報（たとえば、上述したオブジェクト識別子）を有することは好ましい。また、インタラクションコンテンツユニットは、誰の音声かを認識する情報は無くても、異なる者の音声であることを認識できる情報を有するだけでも良い。また、インタラクションコンテンツユニット抽出手段１０９６は、音声情報だけではなく、後述するように、対話している２以上の者（オブジェクト）が対向していることを検出することを条件として、インタラクションコンテンツユニットを抽出しても良い。ここで、「対向する」とは、真正面から向きあう必要はなく、対話できる位置関係にあれば良い。「対向する」ことを認識するアルゴリズム例は、後述するが、その方法は問わない。インタラクションコンテンツユニット抽出手段１０９６は、通常、ＭＰＵやメモリ等から実現され得る。インタラクションコンテンツユニット抽出手段１０９６の処理手順は、通常、ソフトウェアで実現され、当該ソフトウェアはＲＯＭ等の記録媒体に記録されている。但し、ハードウェア（専用回路）で実現しても良い。
次に、情報生成装置の動作について図２から図６のフローチャートを用いて説明する。
（ステップＳ２０１）表示部１０３は、表示情報格納部１０２の表示情報を読み出す。
（ステップＳ２０２）表示部１０３は、ステップＳ２０１で取得した表示情報を表示する。 When the dialogue voice detection unit 1095 detects dialogue voice, the interaction content unit extraction unit 1096 extracts an interaction content unit having the audio information for the duration of that dialogue voice and the target identifier. Note that the target identifier is not essential for the interaction content unit either. The interaction content unit may consist of audio information only. However, since the interaction content unit includes the voices of two or more people, it preferably has information for identifying the voices (for example, the object identifier described above). Alternatively, the interaction content unit may have only information that makes it possible to recognize that the voices belong to different people, without information identifying whose voice each one is. Further, the interaction content unit extraction unit 1096 may extract the interaction content unit on the condition that, in addition to the audio information, two or more persons (objects) in dialogue are detected to be facing each other, as described later. Here, “facing” does not require the persons to face each other squarely; it is sufficient that they are in a positional relationship that allows dialogue. An example of an algorithm for recognizing “facing” will be described later, but any method may be used. The interaction content unit extraction unit 1096 can usually be realized by an MPU, a memory, or the like. The processing procedure of the interaction content unit extraction unit 1096 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (a dedicated circuit).
Next, the operation of the information generating apparatus will be described using the flowcharts of FIGS. 2 to 6.
(Step S201) The display unit 103 reads display information from the display information storage unit 102.
(Step S202) The display unit 103 displays the display information acquired in step S201.

（ステップＳ２０３）指示受付部１０４は、コンテンツユニット抽出指示を受け付けたか否かを判断する。コンテンツユニット抽出指示を受け付ければステップＳ２０４に行き、コンテンツユニット抽出指示を受け付けなければステップＳ２０６に飛ぶ。 (Step S203) The instruction receiving unit 104 determines whether or not a content unit extraction instruction has been received. If a content unit extraction instruction is accepted, the process proceeds to step S204, and if a content unit extraction instruction is not accepted, the process jumps to step S206.

（ステップＳ２０４）コンテンツユニット抽出部１０９は、音声情報受付部１０７が受け付けた音声情報に基づいて、所定の条件に合致する時間分の音声情報と、前記対象識別子を有するコンテンツユニットを抽出する。コンテンツユニットを抽出する処理の詳細は、後述する。
（ステップＳ２０５）コンテンツユニット蓄積部１１０は、ステップＳ２０４で抽出したコンテンツユニット情報を蓄積する。 (Step S204) Based on the audio information received by the audio information receiving unit 107, the content unit extracting unit 109 extracts audio information for a time that matches a predetermined condition and a content unit having the target identifier. Details of the process of extracting the content unit will be described later.
(Step S205) The content unit storage unit 110 stores the content unit information extracted in step S204.

（ステップＳ２０６）指示受付部１０４は、コンテンツユニット出力指示を受け付けたか否かを判断する。コンテンツユニット出力指示を受け付ければステップＳ２０７に行き、コンテンツユニット出力指示を受け付けなければステップＳ２０９に飛ぶ。 (Step S206) The instruction receiving unit 104 determines whether or not a content unit output instruction has been received. If the content unit output instruction is accepted, the process goes to step S207, and if the content unit output instruction is not accepted, the process jumps to step S209.

（ステップＳ２０７）コンテンツユニット出力部１１１は、コンテンツユニット蓄積部１１０が蓄積したコンテンツユニットが存在するか否かを判断する。または、コンテンツユニット出力部１１１は、ステップＳ２０６で受け付けた指示に対応するコンテンツユニットが存在するか否かを判断する。上記判断の結果、条件に合致するコンテンツユニットが存在するとの判断の場合はステップＳ２０８に行き、存在しないとの判断の場合はステップＳ２０３に戻る。 (Step S207) The content unit output unit 111 determines whether there is a content unit stored by the content unit storage unit 110. Alternatively, the content unit output unit 111 determines whether there is a content unit corresponding to the instruction received in step S206. As a result of the determination, if it is determined that there is a content unit that matches the condition, the process goes to step S208. If it is determined that there is no content unit, the process returns to step S203.

（ステップＳ２０８）コンテンツユニット出力部１１１は、コンテンツユニット蓄積部１１０が蓄積したコンテンツユニットを出力する。なお、ここで、コンテンツユニット出力部１１１は、ステップＳ２０６で受け付けた指示に対応するコンテンツユニットを選択して出力することは好適である。「受け付けた指示に対応するコンテンツユニット」とは、例えば、指示が示す位置情報に対応する（当該位置情報に近い位置情報と対になる）コンテンツユニットである。また、コンテンツユニット出力部１１１は、その他の条件に応じて、コンテンツユニットを選択して出力することは好適である。コンテンツユニットを出力する処理の例の詳細は、実施の形態２において説明する。
（ステップＳ２０９）映像情報受付部１０６は、映像を有する映像情報を受け付ける。映像情報受付部１０６は、が一度に受け付ける映像情報のデータ量（撮影時間）は、問わない。
（ステップＳ２１０）音声情報受付部１０７は、音声を有する情報である音声情報を受け付ける。 (Step S208) The content unit output unit 111 outputs the content units stored by the content unit storage unit 110. Here, it is preferable that the content unit output unit 111 selects and outputs a content unit corresponding to the instruction received in step S206. The “content unit corresponding to the received instruction” is, for example, a content unit corresponding to position information indicated by the instruction (paired with position information close to the position information). In addition, it is preferable that the content unit output unit 111 selects and outputs a content unit according to other conditions. Details of an example of processing for outputting a content unit will be described in Embodiment 2.
(Step S209) The video information receiving unit 106 receives video information having video. The amount of video data (shooting time) that the video information receiving unit 106 receives at one time does not matter.
(Step S210) The voice information receiving unit 107 receives voice information that is information having voice.
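The selection described for step S208, in which the content unit paired with position information close to the indicated position is chosen, might look like the following sketch. It assumes two-dimensional positions, Euclidean distance, and a cut-off radius; all names are hypothetical and the specification does not fix a distance measure.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def select_unit_near(
    units: List[Tuple[Point, str]], indicated: Point, max_distance: float
) -> Optional[str]:
    """Return the id of the stored unit closest to `indicated`,
    or None when no unit lies within max_distance."""
    best_id: Optional[str] = None
    best_dist = max_distance
    for position, unit_id in units:
        dist = math.dist(position, indicated)  # Euclidean distance (assumption)
        if dist <= best_dist:
            best_id, best_dist = unit_id, dist
    return best_id
```

Returning `None` corresponds to the branch of step S207 in which no matching content unit exists.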

（ステップＳ２１１）オブジェクト識別子取得部１０８は、オブジェクトを識別する情報であるオブジェクト識別子を１以上取得する。なお、オブジェクト識別子取得部１０８は、例えば、ＩＲトラッカからオブジェクト識別子を１以上取得する。 (Step S211) The object identifier acquisition unit 108 acquires one or more object identifiers which are information for identifying an object. The object identifier acquisition unit 108 acquires one or more object identifiers from the IR tracker, for example.

（ステップＳ２１２）コンテンツユニット抽出部１０９は、対象識別子格納部１０１から対象識別子を取得する。対象識別子は、例えば、展示パネルや、展示パネルが展示されているブースや、展示物（例えば、美術館の絵画など）を識別する識別子である。なお、対象識別子を取得する処理は必須ではない。 (Step S212) The content unit extraction unit 109 acquires the target identifier from the target identifier storage unit 101. The target identifier is, for example, an identifier for identifying an exhibition panel, a booth where the exhibition panel is displayed, or an exhibit (for example, a picture of a museum). Note that the process of acquiring the target identifier is not essential.

（ステップＳ２１３）指示受付部１０４は、表示情報に対する指示を受け付けたか否かを判断する。表示情報に対する指示を受け付ければステップＳ２１４に行き、当該指示を受け付けなければステップＳ２１５に飛ぶ。なお、指示受付部１０４は、表示情報ではなく、展示物に対する指示を受け付けても良い。
（ステップＳ２１４）位置情報取得部１０５は、ステップＳ２１３で受け付けた指示の位置に関する情報である位置情報を取得する。 (Step S213) The instruction receiving unit 104 determines whether an instruction on the display information has been received. If an instruction on the display information is received, the process goes to step S214; if no such instruction is received, the process jumps to step S215. Note that the instruction receiving unit 104 may receive an instruction on the exhibit instead of on the display information.
(Step S214) The position information acquisition unit 105 acquires position information that is information regarding the position of the instruction received in step S213.

（ステップＳ２１５）コンテンツユニット抽出部１０９は、ステップＳ２０９からステップＳ２１４で取得された映像情報、音声情報、オブジェクト識別子、対象識別子、位置情報を対にして、蓄積する情報を構成する。なお、蓄積する情報は、映像情報、音声情報、オブジェクト識別子、対象識別子、位置情報のすべての情報を必須とはせず、例えば、音声情報だけ、また音声情報と位置情報だけ、また音声情報と対象識別子だけ等でも良い。また、蓄積する情報は、上述した生コンテンツである。 (Step S215) The content unit extraction unit 109 configures the information to be accumulated by pairing the video information, audio information, object identifier, target identifier, and position information acquired in steps S209 to S214. Note that the information to be accumulated need not include all of the video information, audio information, object identifier, target identifier, and position information; it may be, for example, the audio information only, the audio information and the position information only, or the audio information and the target identifier only. The information to be accumulated is the raw content described above.

（ステップＳ２１６）コンテンツユニット抽出部１０９は、ステップＳ２１５で構成した生コンテンツを一時蓄積する。そして、コンテンツユニット抽出指示を受け付けるまで、少なくともコンテンツユニット抽出部１０９は、生コンテンツを保持している。かかる情報を保持している記録媒体は、ハードディスクやＲＯＭ等の不揮発性の記録媒体でも、ＲＡＭ等の揮発性の記録媒体でも良い。また、コンテンツユニット抽出部１０９は、生コンテンツを、不揮発性の記録媒体に蓄積することは好適である。特に、コンテンツユニット情報が、出力されるコンテンツユニットを抽出するための情報（ポインタ情報）である場合、生コンテンツは、通常、不揮発性の記録媒体に蓄積される。
なお、図２のフローチャートにおいて、ステップＳ２１５やステップＳ２１６の処理は、コンテンツユニット抽出部１０９ではない、他の手段や図示しない他の手段が行っても良い。
また、図２のフローチャートにおいて、電源オフや処理終了の割り込みにより処理は終了する。
次に、コンテンツユニットを抽出する処理の第一の例について、図３、図４のフローチャートを用いて説明する。図３、図４において、レクチャーコンテンツユニットを抽出する処理について説明する。
（ステップＳ３０１）類似度検出手段１０９３は、典型説明情報格納手段１０９２が格納している典型説明情報を取得する。
（ステップＳ３０２）類似度検出手段１０９３は、カウンタｉに１を代入する。なお、カウンタｉをどのようにインクリメントするか（インクリメントのデータサイズの幅）は問わない。 (Step S216) The content unit extraction unit 109 temporarily stores the raw content configured in step S215. The content unit extraction unit 109 holds the raw content at least until a content unit extraction instruction is received. The recording medium holding this information may be a non-volatile recording medium such as a hard disk or a ROM, or a volatile recording medium such as a RAM. It is preferable, however, that the content unit extraction unit 109 stores the raw content in a non-volatile recording medium. In particular, when the content unit information is information (pointer information) for extracting the content unit to be output, the raw content is normally stored in a non-volatile recording medium.
In the flowchart of FIG. 2, the processing of step S215 and step S216 may be performed by other means other than the content unit extraction unit 109 or other means (not shown).
In the flowchart of FIG. 2, the process is ended by power-off or a process end interrupt.
Next, a first example of processing for extracting a content unit will be described with reference to the flowcharts of FIGS. 3 and 4. FIGS. 3 and 4 illustrate the process of extracting a lecture content unit.
(Step S301) The similarity detection unit 1093 acquires the typical explanation information stored in the typical explanation information storage unit 1092.
(Step S302) The similarity detection unit 1093 substitutes 1 for a counter i. It does not matter how the counter i is incremented (increment data size width).

（ステップＳ３０３）類似度検出手段１０９３は、ｉ番目の音声の大きさが所定の大きさ以上であるか否か（つまり、発話されているか否か）を判断する。所定の大きさ以上であればステップＳ３０４に行き、所定の大きさ未満であればステップＳ３０９に行く。なお、ｉが決まれば、音声のデータのうちのどの位置（アドレス）の音声のデータであるかが決まる、とする。
（ステップＳ３０４）類似度検出手段１０９３は、ｉ番目の音声と同一の発話者が発した連続する音声を取得する。なお、ここでは、音声と対に発話者識別子が存在する、とする。 (Step S303) The similarity detection unit 1093 determines whether or not the volume of the i-th voice is equal to or greater than a predetermined volume (that is, whether or not someone is speaking). If it is equal to or greater than the predetermined volume, the process goes to step S304; if it is less than the predetermined volume, the process goes to step S309. It is assumed that once i is determined, the position (address) of the corresponding data within the audio data is also determined.
(Step S304) The similarity detection unit 1093 acquires continuous voices uttered by the same speaker as the i-th voice. Here, it is assumed that a speaker identifier exists in a pair with the voice.

（ステップＳ３０５）類似度検出手段１０９３は、ステップＳ３０１で取得した典型説明情報と、ステップＳ３０４で取得した連続する音声（以下、適宜「音声群」という）の時間分の位置情報等を取得し、当該位置情報等と典型説明情報との類似度を算出する。なお、類似度算出処理の例の詳細は、図４のフローチャートを用いて説明する。また、ここで、位置情報等とは、位置情報だけでも良いし、位置情報と位置情報に対応する時刻情報を有しても良い。なお、ステップＳ３０５で取得する類似度が、似ていない度合いであっても良い。ステップＳ３０５で取得する類似度が、似ていない度合である場合、似ているとは、類似度が所定の値以下である場合である。 (Step S305) The similarity detection unit 1093 acquires the position information and the like for the duration of the continuous voice acquired in step S304 (hereinafter referred to as “voice group” as appropriate), and calculates the similarity between that position information and the like and the typical explanation information acquired in step S301. Details of an example of the similarity calculation process will be described with reference to the flowchart of FIG. 4. Here, the position information and the like may be position information only, or may include position information together with the time information corresponding to it. Note that the similarity acquired in step S305 may instead be a degree of dissimilarity. In that case, “similar” means that the value is equal to or less than a predetermined value.

（ステップＳ３０６）レクチャーコンテンツユニット抽出手段１０９４は、ステップＳ３０５で算出した類似度が所定値以上（似ている、ということ）であるか否かを判断する。類似度が所定値以上であれば（似ていれば）ステップＳ３０７に行き、類似度が所定値未満（似ていなければ）であればステップＳ３０８に行く。 (Step S306) The lecture content unit extraction unit 1094 determines whether the similarity calculated in step S305 is equal to or greater than a predetermined value (that is, similar). If the similarity is equal to or greater than the predetermined value (similar), the process goes to step S307; if the similarity is less than the predetermined value (not similar), the process goes to step S308.

（ステップＳ３０７）レクチャーコンテンツユニット抽出手段１０９４は、音声情報等を含むレクチャーコンテンツユニットを抽出する。ここで、抽出とは、レクチャーコンテンツユニットを別途、図示しない記録媒体に蓄積することでも良いし、ステップＳ２１６で蓄積されたコンテンツ中に、レクチャーコンテンツユニットである旨のフラグ（通常、開始フラグ（下記の例では、始点）と終了フラグ（下記の例では、終点）がある）を記録することでも良い。フラグを記録した場合は、出力時に当該フラグを目印にして、レクチャーコンテンツユニットを取得る。フラグとは、レクチャーコンテンツユニットを構成するコンテンツの始点、終点を示すポインタでも良い。 (Step S307) The lecture content unit extraction unit 1094 extracts a lecture content unit including audio information and the like. Here, extraction may be to separately store a lecture content unit in a recording medium (not shown), or in the content stored in step S216, a flag indicating that it is a lecture content unit (usually a start flag (described below)). In this example, a start point) and an end flag (in the following example, there is an end point) may be recorded. When the flag is recorded, the lecture content unit is obtained by using the flag as a mark at the time of output. The flag may be a pointer indicating the start point and the end point of the content constituting the lecture content unit.

（ステップＳ３０８）レクチャーコンテンツユニット抽出手段１０９４は、カウンタｉを１、インクリメントする。なお、ｉを１、インクリメントすることは、類似度を算出すべき次の音声のデータまで、ポインタを進めることであり、インクリメントするｉの幅（音声データ中のバイト数）は一定でなくても良い、つまり、ステップＳ３０４で取得した連続する音声の類似度が低い場合（典型説明情報に類似していなかった場合）、レクチャーコンテンツユニット抽出手段１０９４は、その次の音声までポインタを進める。また、ｉ番目の音声の大きさが所定の大きさ未満である場合、例えば、レクチャーコンテンツユニット抽出手段１０９４は、次の音声までインタを進める。ステップＳ３０３に行く。
（ステップＳ３０９）レクチャーコンテンツユニット抽出手段１０９４は、ｉ番目の音声が存在するか否かを判断する。ｉ番目の音声が存在すればステップＳ３０８に行き、存在しなければ上位関数にリターンする。
次に、類似度算出処理の例の詳細について図４のフローチャートを用いて説明する。なお、本類似度算出処理の説明において、典型説明情報と取得した連続音声に対応する位置情報等の類似度を算出する。 (Step S308) The lecture content unit extraction unit 1094 increments the counter i by 1. Incrementing i by 1 means advancing the pointer to the next audio data for which the similarity is to be calculated, and the width of the increment (the number of bytes in the audio data) need not be constant. That is, when the similarity of the continuous audio acquired in step S304 is low (it was not similar to the typical explanation information), the lecture content unit extraction unit 1094 advances the pointer to the audio that follows it. Likewise, when the volume of the i-th audio is less than the predetermined volume, the lecture content unit extraction unit 1094 advances the pointer to the next audio, for example. The process then returns to step S303.
(Step S309) The lecture content unit extraction means 1094 determines whether or not the i-th audio exists. If the i-th voice exists, the process goes to step S308, and if not, the process returns to the upper function.
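The loop of steps S302 to S309 can be outlined with the Python sketch below. It is a simplified reading of the flowchart under several assumptions: the audio is pre-split into frames tagged with a speaker identifier and a volume, indices are 0-based, the similarity calculation is passed in as a callable, and extraction records start and end indices (the pointer form). None of these names come from the specification.

```python
from typing import Callable, List, NamedTuple, Tuple


class Frame(NamedTuple):
    speaker: str   # speaker identifier paired with the audio
    volume: float  # loudness of this frame


def extract_lecture_units(
    frames: List[Frame],
    min_volume: float,
    similarity_of: Callable[[int, int], float],
    min_similarity: float,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges (end exclusive) judged to be lectures."""
    units: List[Tuple[int, int]] = []
    i = 0
    while i < len(frames):                      # S309: does the i-th audio exist?
        if frames[i].volume < min_volume:       # S303: nobody is speaking
            i += 1                              # S308: advance the pointer
            continue
        end = i                                 # S304: extend over the same speaker
        while (end + 1 < len(frames)
               and frames[end + 1].speaker == frames[i].speaker
               and frames[end + 1].volume >= min_volume):
            end += 1
        if similarity_of(i, end + 1) >= min_similarity:  # S305, S306
            units.append((i, end + 1))                   # S307: record pointers
        i = end + 1                             # S308: skip past the voice group
    return units
```

Advancing `i` past the whole voice group reflects the remark that the increment width need not be constant.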
Next, details of an example of similarity calculation processing will be described using the flowchart of FIG. In the description of the similarity calculation process, the similarity of the position information corresponding to the typical explanation information and the acquired continuous sound is calculated.

（ステップＳ４０１）類似度検出手段１０９３は、初期化する。つまり、類似度検出手段１０９３は、カウンタｊに１を代入する。また、類似度検出手段１０９３は、変数「距離」および変数「時間差」に０を代入する。変数「距離」は、２つの位置情報の差を累積した情報を代入する変数である。なお、２つの位置情報とは、典型説明情報の位置情報と、連続音声に対応する位置情報等の位置情報である。変数「時間差」は、典型説明情報が有する２つの位置情報が記録された時間間隔と、位置情報等が有する２つの位置情報が記録された時間間隔の差の累積を代入する変数である。 (Step S401) The similarity detection unit 1093 performs initialization. That is, the similarity detection unit 1093 assigns 1 to the counter j and assigns 0 to the variable “distance” and the variable “time difference”. The variable “distance” holds the accumulated difference between two pieces of position information, namely the position information in the typical explanation information and the position information in the position information and the like corresponding to the continuous voice. The variable “time difference” holds the accumulated difference between the time interval at which two pieces of position information in the typical explanation information were recorded and the time interval at which the two corresponding pieces of position information in the position information and the like were recorded.

（ステップＳ４０２）類似度検出手段１０９３は、典型説明情報中にｊ番目の位置情報が存在するか否かを判断する。ｊ番目の位置情報が存在すればステップＳ４０３に行き、ｊ番目の位置情報が存在しなければステップＳ４１２に行く。
（ステップＳ４０３）類似度検出手段１０９３は、ｊ番目の位置情報を典型説明情報から取得する。
（ステップＳ４０４）類似度検出手段１０９３は、取得した連続音声に対応する位置情報等からｊ番目の位置情報を取得する。
（ステップＳ４０５）類似度検出手段１０９３は、ステップＳ４０３で取得した位置情報と、ステップＳ４０４で取得した位置情報との距離を算出する。
（ステップＳ４０６）類似度検出手段１０９３は、変数「距離」に、ステップＳ４０５で算出した距離を加える。
（ステップＳ４０７）類似度検出手段１０９３は、（ｊ−１）番目からｊ番目の位置情報の間の時間を、典型説明情報から取得する。なお、ｊが「１」の場合、位置情報の間の時間は、「０」とする。 (Step S402) The similarity detection unit 1093 determines whether or not the j-th position information exists in the typical explanation information. If the j-th position information exists, the process proceeds to step S403, and if the j-th position information does not exist, the process proceeds to step S412.
(Step S403) The similarity detection unit 1093 acquires the j-th position information from the typical explanation information.
(Step S404) The similarity detection unit 1093 acquires the j-th position information from the position information corresponding to the acquired continuous sound.
(Step S405) The similarity detection unit 1093 calculates the distance between the position information acquired in step S403 and the position information acquired in step S404.
(Step S406) The similarity detection unit 1093 adds the distance calculated in step S405 to the variable “distance”.
(Step S407) The similarity detection unit 1093 acquires the time between the (j−1) th to jth position information from the typical explanation information. When j is “1”, the time between the position information is “0”.

（ステップＳ４０８）類似度検出手段１０９３は、（ｊ−１）番目からｊ番目の位置情報の間の時間を、取得した連続音声に対応する位置情報等から取得する。なお、ｊが「１」の場合、位置情報の間の時間は、「０」とする。
（ステップＳ４０９）類似度検出手段１０９３は、ステップＳ４０７で取得した時間と、ステップＳ４０８で取得した時間の差である時間差を算出する。
（ステップＳ４１０）類似度検出手段１０９３は、変数「時間差」に、ステップＳ４０９で算出した時間差を加える。
（ステップＳ４１１）類似度検出手段１０９３は、カウンタｊを１、インクリメントする。ステップＳ４０２に戻る。 (Step S408) The similarity detection unit 1093 acquires the time between the (j−1) th to jth position information from the position information corresponding to the acquired continuous sound. When j is “1”, the time between the position information is “0”.
(Step S409) The similarity detection unit 1093 calculates a time difference which is a difference between the time acquired in Step S407 and the time acquired in Step S408.
(Step S410) The similarity detection unit 1093 adds the time difference calculated in Step S409 to the variable “time difference”.
(Step S411) The similarity detection unit 1093 increments the counter j by 1. The process returns to step S402.

（ステップＳ４１２）類似度検出手段１０９３は、変数「距離」の値、変数「時間差」の値をパラメータとして、類似度を算出する。かかる算出の関数は問わない。例えば、類似度検出手段１０９３は、ｆ＝「０．７×（１／変数「距離」の値）＋０．３×（１／変数「時間差」の値）」により、類似度を算出する。かかる場合、距離の差を、時間差に比べて重視したこととなる。上位関数にリターンする。 (Step S412) The similarity detection unit 1093 calculates the similarity using the value of the variable “distance” and the value of the variable “time difference” as parameters. Any function may be used for this calculation. For example, the similarity detection unit 1093 calculates the similarity as f = 0.7 × (1 / distance) + 0.3 × (1 / time difference), where “distance” and “time difference” are the values of the respective variables. In this case, the difference in distance is weighted more heavily than the time difference. The process then returns to the calling function.
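Steps S401 to S412 can be sketched in Python as follows. The sketch makes assumptions the specification leaves open: each pointing event is an `(x, y, t)` tuple, the distance between positions is Euclidean, time differences are accumulated as absolute values, comparison stops at the shorter sequence, and a small `eps` (not in the specification) guards the reciprocals against division by zero.

```python
import math
from typing import List, Tuple

Event = Tuple[float, float, float]  # (x, y, time_sec) of one pointing event


def similarity(
    typical: List[Event],
    observed: List[Event],
    w_dist: float = 0.7,
    w_time: float = 0.3,
    eps: float = 1e-9,
) -> float:
    """Compute f = w_dist * (1 / distance) + w_time * (1 / time difference)."""
    distance = 0.0   # S401: accumulated positional difference
    time_diff = 0.0  # S401: accumulated difference of time intervals
    for j in range(min(len(typical), len(observed))):       # S402
        tx, ty, tt = typical[j]                              # S403
        ox, oy, ot = observed[j]                             # S404
        distance += math.hypot(tx - ox, ty - oy)             # S405, S406
        if j > 0:  # for the first event the interval is treated as 0
            typical_gap = tt - typical[j - 1][2]             # S407
            observed_gap = ot - observed[j - 1][2]           # S408
            time_diff += abs(typical_gap - observed_gap)     # S409, S410
    return w_dist / (distance + eps) + w_time / (time_diff + eps)  # S412
```

For example, with `typical = [(0, 0, 0), (3, 4, 2)]` and `observed = [(0, 0, 0), (0, 0, 3)]`, the accumulated distance is 5 and the accumulated time difference is 1, so f is approximately 0.7 / 5 + 0.3 / 1 = 0.44.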

なお、類似度の算出アルゴリズムは、上記に限られないことは言うまでもない。例えば、類似度検出手段１０９３は、位置情報のみを比較して類似度を算出しても良い。また、例えば、類似度検出手段１０９３は、典型説明情報の位置情報が示す領域を、取得した連続音声に対応する位置情報等が示す領域がすべて含む場合（連続音声が、典型的説明において説明する箇所をすべて説明した際の音声であると考えられる場合）に、類似度を高い値に設定するようなアルゴリズムでも良い。また、類似度は、「似ている（値が「１」）」、「似ていない（値が「０」）」のような簡単な情報でも良い。
次に、コンテンツユニットを抽出する処理の第一の例について、図５、図６のフローチャートを用いて説明する。図５、図６において、インタラクションコンテンツユニットを抽出する処理について説明する。 Needless to say, the algorithm for calculating the similarity is not limited to the above. For example, the similarity detection unit 1093 may calculate the similarity by comparing only the position information. As another example, the similarity detection unit 1093 may use an algorithm that sets the similarity to a high value when the area indicated by the position information and the like corresponding to the acquired continuous voice entirely includes the area indicated by the position information of the typical explanation information (that is, when the continuous voice is considered to be the voice of an explanation covering every part that the typical explanation covers). The similarity may also be simple information such as “similar” (value “1”) or “not similar” (value “0”).
Next, a first example of processing for extracting a content unit will be described with reference to the flowcharts of FIGS. 5 and 6. FIGS. 5 and 6 illustrate the process of extracting an interaction content unit.

(Step S501) The dialogue voice detecting means 1095 determines a pair (i, j), where i and j correspond to speaker identifiers. That is, the process assumes that two or more speakers exist, and in this step a pair of two speakers is determined. Normally, the initial pair is i = 1 and j = 2.
(Step S502) The dialogue voice detecting means 1095 acquires the voice of the speaker identifier corresponding to i.
(Step S503) The dialogue voice detecting means 1095 acquires the voice of the speaker identifier corresponding to j.

(Step S504) The dialogue voice detecting means 1095 performs a process of detecting a dialogue based on the voice acquired in step S502 and the voice acquired in step S503. The details of this dialogue detection process will be described with reference to the flowchart of FIG. 6.

(Step S505) The interaction content unit extraction means 1096 extracts an interaction content unit, which includes the audio information and other data constituting the dialogue, based on the dialogue voice group detected in step S504. Extracting an interaction content unit may mean acquiring all the content (audio, video, and so on) that constitutes the interaction content unit, or acquiring pointer information into the raw content (for example, a “start point” and an “end point”).

(Step S506) The dialogue voice detecting means 1095 determines whether a next pair (i, j) exists. If a next pair exists, the process returns to step S501; otherwise, the process returns to the calling function. The pair (i, j) that follows (1, 2) is, for example, (1, 3). Here, speaker identifiers are assumed to be assigned as “1, 2, 3, ...”.
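The pair selection of steps S501 and S506 can be sketched as below. The description leaves the selection algorithm open, so this is just one possible ordering; the helper name `speaker_pairs` is hypothetical. It visits every unordered pair once, with (1, 3) following (1, 2) as in the example given for step S506.

```python
from itertools import combinations

def speaker_pairs(speaker_ids):
    """Return every unordered pair (i, j) of speaker identifiers, in order."""
    return list(combinations(sorted(speaker_ids), 2))

# Three speakers numbered 1, 2, 3 give the pairs (1, 2), (1, 3), (2, 3).
pairs = speaker_pairs([1, 2, 3])
```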
Next, an example of the dialogue detection process will be described with reference to the flowchart of FIG. 6.
(Step S601) The dialogue voice detecting means 1095 performs initialization. Here, the dialogue voice detecting means 1095 assigns 1 to the counters k and l.

(Step S602) The dialogue voice detecting means 1095 determines whether the k-th continuous voice group of speaker i exists. If the voice group exists, the process proceeds to step S603; if not, the process returns to the calling function.
(Step S603) The dialogue voice detecting means 1095 acquires the k-th continuous voice group of speaker i.

(Step S604) The dialogue voice detecting means 1095 determines whether the l-th continuous voice group of speaker j exists. If the voice group exists, the process proceeds to step S605; if not, the process returns to the calling function.
(Step S605) The dialogue voice detecting means 1095 acquires the l-th continuous voice group of speaker j.

(Step S606) The dialogue voice detecting means 1095 determines whether the k-th continuous voice group of speaker i and the l-th continuous voice group of speaker j are nearly continuous. If they are, the process proceeds to step S607; if not, it proceeds to step S608. Here, “nearly continuous” refers to the situations in FIG. 7(a), (b), and (c). In (a), the two voice groups (voice 1 and voice 2) follow one another with only a slight overlap (x); a suitable value of x is, for example, 5 seconds or less. In (b), the two voice groups follow one another with neither overlap nor gap. In (c), there is a short gap (y) between the two voice groups; a suitable value of y is, for example, 3 seconds or less. The situations shown in FIG. 7(d) and (e) cannot be judged “nearly continuous”, because the time during which the two speakers speak simultaneously is equal to or longer than a predetermined time. In FIG. 7, a line indicates that a speaker is speaking, and the horizontal axis is time (t). Voice 1 and voice 2 are voices uttered by different persons (voices paired with different speaker identifiers).
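The “nearly continuous” test of step S606 might be sketched as follows. This is a minimal sketch under assumptions: the function name, the representation of a voice group as a `(start, end)` tuple, and the treatment of time values as seconds are hypothetical, while the default thresholds of 5 and 3 are the example values given above.

```python
def nearly_continuous(first, second, max_overlap=5.0, max_gap=3.0):
    """Judge whether two voice groups from different speakers nearly abut.

    first, second: (start, end) tuples. The order of the arguments does not
    matter; the group that starts earlier is treated as the leading one.
    """
    earlier, later = sorted([first, second])
    gap = later[0] - earlier[1]        # > 0: silence, < 0: overlap, 0: abutting
    if gap >= 0:
        return gap <= max_gap          # cases (b) and (c) of FIG. 7
    overlap = min(earlier[1], later[1]) - later[0]
    return overlap <= max_overlap      # case (a); long overlaps are rejected
```

With these defaults, a group ending at 260 followed by one starting at 258 overlaps by 2 and passes, while a group starting at 400 after one ending at 260 leaves a gap of 140 and fails.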

(Step S607) The dialogue voice detecting means 1095 assigns, to the k-th continuous voice group of speaker i and the l-th continuous voice group of speaker j, a mark (a predetermined value) indicating that they are voices constituting one dialogue. Once step S606 has judged a pair of voice groups to be not “nearly continuous”, the predetermined value constituting the mark is changed, for example by incrementing it, so that a different mark is assigned from then on.

(Step S608) The dialogue voice detecting means 1095 determines whether the counter k or the counter l was incremented most recently. If the counter k was incremented most recently, the process proceeds to step S609; if the counter l was incremented most recently, the process proceeds to step S610.
(Step S609) The dialogue voice detecting means 1095 increments the counter l by 1. The process returns to step S604.
(Step S610) The dialogue voice detecting means 1095 increments the counter k by 1. The process returns to step S602.
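The loop of steps S601 to S610 can be sketched roughly as below. Several points are assumptions rather than statements of the description: the names `detect_dialogue` and `is_nearly_continuous`, the `(start, end)` representation of a voice group, the choice to continue to step S608 after marking in step S607, and the decision to increment k on the first pass (the flowchart does not say which counter counts as "most recently incremented" before any increment has occurred).

```python
def detect_dialogue(groups_i, groups_j, is_nearly_continuous):
    """Walk two speakers' voice groups alternately (steps S601 to S610).

    groups_i, groups_j: lists of (start, end) tuples in time order.
    Returns (k, l, mark) triples with 1-based counters, one per pair that
    step S606 judges nearly continuous.
    """
    k = l = 1                                   # S601
    mark = 1
    seen_break = False
    last_incremented = "l"                      # assumption: first pass bumps k
    marked = []
    while k <= len(groups_i) and l <= len(groups_j):        # S602, S604
        if is_nearly_continuous(groups_i[k - 1], groups_j[l - 1]):  # S606
            if seen_break and marked:           # S607: new mark after a break
                mark += 1
            seen_break = False
            marked.append((k, l, mark))
        else:
            seen_break = True
        if last_incremented == "k":             # S608
            l += 1                              # S609
            last_incremented = "l"
        else:
            k += 1                              # S610
            last_incremented = "k"
    return marked
```

Alternating the two counters lets the walk follow a turn-taking exchange: speaker i's k-th group, then speaker j's l-th group, then speaker i's next group, and so on.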

In the flowchart of FIG. 6, dialogue voice is detected based on audio information alone, but it may instead be detected based on object identifiers together with the audio. For example, if the object identifier acquired by the “IR tracker” worn by user A is user B's object identifier, and the object identifier acquired by the “IR tracker” worn by user B is user A's object identifier, it can be determined that user A and user B are facing each other. In other words, when the object identifier transmitted from user A's information acquisition device (which has the “IR tracker”) is user B's object identifier, and the object identifier transmitted from user B's information acquisition device is user A's object identifier, the information generation apparatus judges that user A and user B are facing each other.
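The mutual-detection rule described above can be illustrated with a short sketch. The function name `facing_each_other` and the dictionary `seen_by`, which maps each user's object identifier to the set of identifiers currently picked up by that user's IR tracker, are hypothetical and only model the rule.

```python
def facing_each_other(seen_by: dict, a: int, b: int) -> bool:
    """True when a's tracker reports b's identifier and b's reports a's."""
    return b in seen_by.get(a, set()) and a in seen_by.get(b, set())

# Hypothetical snapshot: each key is a user's object identifier and the
# value is the set of identifiers that user's IR tracker currently receives.
snapshot = {35: {38, 1}, 38: {35}, 52: {1}}
```

In this snapshot, identifiers 35 and 38 see each other and are judged to be facing, while 35 and 52 are not.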

An information acquisition device having an “IR tracker” also has, for example, a CCD camera and a microphone. FIG. 8 shows an example of such an information acquisition device. In FIG. 8, a “CCD camera”, an “infrared ID tag”, and an “infrared sensor” are mounted above the user's ear. The “CCD camera” captures video. The “infrared ID tag” emits an infrared signal on which a signal indicating this user's object identifier is superimposed. The “infrared sensor” receives infrared signals from outside; that is, the “infrared sensor” is the “IR tracker” described above. The device also has a “microphone” near the mouth and a “throat microphone” at the throat, and an HMD (head-mounted display) in front of the eyes. The signal from the “CCD camera” is acquired by a PC carried on the user's back and transmitted from the PC to the information generation apparatus. The HMD is used to output information about the user's location, the exhibits the user has viewed, and the people (objects) the user has faced. Needless to say, the information acquisition device of FIG. 8 is merely one example of a way of acquiring information.
Hereinafter, a specific operation of the information generation apparatus according to the present embodiment will be described.

Suppose now that the setting is an academic conference venue with many exhibition panels. Each exhibition panel shows the panel being presented (information giving a technical explanation) on a display equipped with a touch panel. Many visitors to the exhibition panels are at the venue; the visitors are, for example, user A and user B. An explainer is also present for each exhibition panel; the explainer here is, for example, user C. Each user wears the information acquisition device of FIG. 8, which acquires the video the user sees, the object identifier identifying the object (a person, an exhibition panel, etc.) the user is facing, and the voice the user utters (the set of such information is the raw content). The raw content is then sent to the information generation apparatus.

Now, user A and user B are having a discussion in front of exhibition panel X. Here, the object identifier of user A is “35”, the object identifier of user B is “38”, and the object identifier of exhibition panel X is “1”. Each information acquisition device holds its object identifier in advance. The “infrared ID tag” of an information acquisition device emits an infrared signal on which the object identifier is superimposed, and the “IR tracker” acquires an object identifier from the signal emitted by the “infrared ID tag” attached to the object (a person, an exhibition panel, etc.) that it faces.

FIG. 9 shows a conceptual diagram of the raw content (raw content having video, audio, and object identifiers) acquired by each information acquisition device in this case. The information generation apparatus receives the raw content acquired by the information acquisition devices of user A, user B, and user C. The raw content is received, for example, over wireless or wired communication means.

In FIG. 9, the raw content sent from each user's information acquisition device is a set of video, audio, and object identifiers, and these three kinds of information are sent in synchronization. In FIG. 9, the information sent from user A's information acquisition device consists of the video user A is viewing, the voice user A is uttering, and the identifier of the object user A is facing (here, the identifier of user B, etc.).
Next, suppose that user C, the explainer for exhibition panel X, comes up to exhibition panel X and starts explaining it to user A and user B.

FIG. 10 shows an example of the raw content sent in this case to the information generation apparatus from user C's information acquisition device and from an information acquisition device installed on or around exhibition panel X. FIG. 10 is a conceptual diagram of the raw content accumulated from around the time user C started explaining exhibition panel X. Because user C gives the explanation while pointing at the relevant locations on exhibition panel X, position information (here, coordinate information) indicating each pointed-at position is also acquired. The header of the position information records the target that was pointed at (here, the target identifier “1” of exhibition panel X). In this example, the target identifier “1” is the same as the object identifier of exhibition panel X.

The information generation apparatus first temporarily stores the raw content received from one or more information acquisition devices and the position information indicated by the instructions on the target received by the instruction receiving unit 104. FIGS. 9 and 10 are examples of this temporarily stored information. The raw content may be regarded as including the position information as well.
Next, suppose that the user inputs an instruction to extract content units. The information generation apparatus then starts extracting content units.
A specific example of the content unit extraction process is described below.

First, the process of extracting a lecture content unit from the raw content of FIG. 10 will be described. FIG. 11 is an example of typical explanation information, which here has time information and position information. This typical explanation information indicates that the explainer first points at (explains) the area around (20, 40), then 100 seconds later the area around (20, 80), then 80 seconds after that the area around (120, 40), and then 30 seconds after that the area around (120, 80). FIG. 12 is a schematic diagram showing the pointing sequence of such a typical explanation.

First, the explanation voice detecting means 1091 detects a section in which user C's voice in FIG. 10 is uttered for a predetermined time or longer. Specifically, the explanation voice detecting means 1091 detects the section from t = 350 to t = 2850 in user C's voice in FIG. 10.
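A minimal sketch of this detection step is shown below. The function name `explanation_segments`, the `(start, end)` representation of a continuous utterance, and the threshold value used in the example are hypothetical; the description only says “a predetermined time or longer”.

```python
def explanation_segments(segments, min_duration):
    """Keep the utterance segments that last at least min_duration."""
    return [(start, end) for start, end in segments if end - start >= min_duration]

# Hypothetical input: one long utterance and one short remark by the same speaker.
long_enough = explanation_segments([(350, 2850), (3000, 3010)], min_duration=60)
```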

Next, for this voice section, the similarity detection means 1093 acquires the position information “(20, 38), t = 611”, “(20, 88), t = 720”, “(125, 40), t = 800”, and “(131, 80), t = 841” of FIG. 10. Each value of t indicates the time at which the corresponding position information was acquired (see FIG. 10).

The similarity detection means 1093 then calculates the distance between (20, 40) and (20, 38), the distance between (20, 80) and (20, 88), the distance between (120, 40) and (125, 40), and the distance between (120, 80) and (131, 80), and obtains “26” as the sum of the four distances (2 + 8 + 5 + 11).

The similarity detection means 1093 also obtains “20” as the sum of the differences between the time intervals of the four points: the difference between (100 − 0) and (720 − 611), the difference between (180 − 100) and (800 − 720), and the difference between (210 − 180) and (841 − 800), that is, 9 + 0 + 11.
The similarity detection means 1093 then calculates the similarity from “26” and “20” as, for example, 1 / (26 + 20).
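The arithmetic of this worked example can be reproduced with the short sketch below. The variable and function names are hypothetical, and the typical times 0, 100, 180, and 210 are derived from the intervals (100, 80, and 30 seconds) given for the typical explanation information of FIG. 11.

```python
from math import hypot

# (time, (x, y)) for the typical explanation and for the observed pointing.
typical = [(0, (20, 40)), (100, (20, 80)), (180, (120, 40)), (210, (120, 80))]
observed = [(611, (20, 38)), (720, (20, 88)), (800, (125, 40)), (841, (131, 80))]

def distance_sum(ref, obs):
    """Sum of Euclidean distances between corresponding points."""
    return sum(hypot(px - qx, py - qy)
               for (_, (px, py)), (_, (qx, qy)) in zip(ref, obs))

def interval_diff_sum(ref, obs):
    """Sum of absolute differences between successive time intervals."""
    ref_gaps = [b[0] - a[0] for a, b in zip(ref, ref[1:])]
    obs_gaps = [b[0] - a[0] for a, b in zip(obs, obs[1:])]
    return sum(abs(r - o) for r, o in zip(ref_gaps, obs_gaps))

d = distance_sum(typical, observed)        # 2 + 8 + 5 + 11
t = interval_diff_sum(typical, observed)   # 9 + 0 + 11
score = 1 / (d + t)
```

Here `d` evaluates to 26, `t` to 20, and `score` to 1/46, matching the figures in the text.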

Because the similarity “1/46” is equal to or greater than the predetermined similarity “1/100”, the lecture content unit extraction means 1094 judges that the explanation can be adopted as a typical explanation, and extracts the information (user ID = 52, start time = 350, end time = 2850) covering user C's voice in FIG. 10 from time “350” until the utterance ends at “2850” (not shown). The lecture content unit extraction means 1094 may instead cut out and acquire the information, including the audio, from time “350” to “2850” from user C's information in FIG. 10.
Next, the process of extracting an interaction content unit from the various information of FIG. 9 will be described.

First, the dialogue voice detecting means 1095 determines the two speaker identifiers to be examined. Here, the two speaker identifiers are determined to be (35, 38). The speaker identifiers are determined by successively choosing combinations of two speaker identifiers from all the speaker identifiers received by the content unit extraction unit 109; any algorithm may be used for this. Normally, the interaction content unit extraction process described below is performed for all combinations.
The dialogue voice detecting means 1095 then acquires the voices corresponding to the speaker identifiers “35” and “38”.

Next, the dialogue voice detecting means 1095 acquires the first continuous voice group in the voice of speaker identifier “35”. As a result, it acquires the voice from t = 0 to t = 260 (see FIG. 9) in the voice of speaker identifier “35” (referred to as the first voice). This is the voice uttered when the speaker identified by speaker identifier “35” said “This exhibition panel is interesting, isn't it? ...”.

Next, the dialogue voice detecting means 1095 acquires the first continuous voice group in the voice of speaker identifier “38”. As a result, it acquires the voice from t = 258 to t = 360 (see FIG. 9) in the voice of speaker identifier “38” (referred to as the second voice). This is the voice uttered when the speaker identified by speaker identifier “38” said “Is it? Which part?”.

The dialogue voice detecting means 1095 then determines whether the first voice and the second voice are nearly continuous. Here, the difference between the end point “260” of the first voice and the start point “258” of the second voice is “2”, which is within the predetermined time interval (for example, “5”), so the two voices are judged to be nearly continuous.

Next, the dialogue voice detecting means 1095 attempts to acquire the second continuous voice group in the voice of speaker identifier “35”. However, no second continuous voice group exists in FIG. 9, so none can be acquired, and the dialogue detection process ends.

Next, the interaction content unit extraction means 1096 acquires the start point “0” of the first voice and the end point “360” of the second voice. The content unit storage unit 110 then stores content unit information (here, information having at least “0” and “360”). The content unit information may be only the information indicating the start point and end point used to extract the content unit to be output, or it may be the content unit itself. The content unit itself is content assembled by cutting out the audio, video, etc. of speaker identifier “35” for the interval from t = 0 to t = 260 and the audio, video, etc. of speaker identifier “38” for the interval from t = 258 to t = 360. Normally, the content unit information consists of the start point and end point used to extract the content unit to be output, together with information identifying the full content of speaker identifiers “35” and “38” (for example, information having “35” and “38”). FIG. 13 shows an example of content unit information. The content unit information in FIG. 13 is a data structure having a first object identifier (speaker identifier “35”), the start point and end point for cutting out the audio etc. originating from the object identified by the first object identifier, a second object identifier (speaker identifier “38”), and the start point and end point for cutting out the audio etc. originating from the object identified by the second object identifier.
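The data structure described for FIG. 13 could be modeled along the following lines. This is a sketch only: the class name `InteractionUnitInfo`, its field names, and the `span` helper are hypothetical, and FIG. 13 itself may lay out the fields differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InteractionUnitInfo:
    """Pointer-style content unit information for a two-speaker dialogue."""
    first_object_id: int    # e.g. speaker identifier 35
    first_start: int        # start of the first speaker's cut-out interval
    first_end: int          # end of the first speaker's cut-out interval
    second_object_id: int   # e.g. speaker identifier 38
    second_start: int
    second_end: int

    def span(self):
        """Overall start and end of the unit across both speakers."""
        return (min(self.first_start, self.second_start),
                max(self.first_end, self.second_end))

# The worked example: speaker 35 from t = 0 to 260, speaker 38 from t = 258 to 360.
unit = InteractionUnitInfo(35, 0, 260, 38, 258, 360)
```

Storing only identifiers and time points keeps the stored record small; the audio and video can be cut out of the raw content when the unit is output.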

It is even more preferable to add to the above, as described earlier, a process that determines whether the object with speaker identifier “35” and the object with speaker identifier “38” are facing each other, and to extract an interaction content unit only when they are. In FIG. 9, the information for speaker identifier “35” has the object identifier “38” during the time from t = 0 to t = 260, and the information for speaker identifier “38” has the object identifier “35” during the time from t = 258 to t = 360, so the two are determined to be facing each other.
Next, in the same manner as above, the dialogue voice detecting means 1095 acquires the voices corresponding to the speaker identifiers “35” and “52”.

Next, the dialogue voice detecting means 1095 acquires the first continuous voice group in the voice of speaker identifier “52”. As a result, it acquires the voice from t = 400 to t = 500 (see FIG. 9) in the voice of speaker identifier “52” (referred to as the second voice). This is the voice uttered when the speaker identified by speaker identifier “52” said “Let me explain. ...”.

The dialogue voice detecting means 1095 then determines whether the first voice and the second voice are nearly continuous. Under the algorithm described above, the first voice and the second voice do not satisfy the nearly-continuous condition, so the interaction content unit extraction means 1096 does not extract an interaction content unit for the pair of speaker identifiers “35” and “52”.

Likewise, the dialogue voice detecting means 1095 acquires the voices corresponding to the speaker identifiers “38” and “52”. As above, the first voice of speaker identifier “38” and the second voice of speaker identifier “52” do not satisfy the nearly-continuous condition, so the interaction content unit extraction means 1096 does not extract an interaction content unit for the pair of speaker identifiers “38” and “52”.

Through the above processing, lecture units and interaction content units are extracted. Lecture units and interaction content units are stored in a form that allows them to be distinguished from each other. For example, they may be stored on different recording media (including different tables), or a flag identifying the unit type may be stored together with each piece of content unit information.

Next, suppose that the user inputs a content unit output instruction. The instruction receiving unit 104 receives the content unit output instruction. Specifically, for example, exhibition panel X is shown on a display equipped with a touch panel, as in FIG. 14, and a “Lecture” button and an “Interaction” button are displayed at the lower left of the display. When the user presses the “Lecture” button, the instruction receiving unit 104 recognizes the press, and the content unit output unit 111 acquires and outputs the lecture content unit stored by the content unit storage unit 110.

A configuration is also suitable in which, when the user presses a location in the exhibition panel that he or she has a question about, the position information acquisition unit 105 acquires the position information (coordinate information), and a lecture content unit containing the explanation corresponding to that position information is acquired and output.

When the user presses the “Interaction” button, the instruction receiving unit 104 recognizes the press, and the content unit output unit 111 acquires and outputs the interaction content unit stored by the content unit storage unit 110.
As described above, according to the present embodiment, reusable content units can be automatically extracted from the raw content recorded when a target such as an exhibit or an exhibition panel is explained or discussed.

The processing in the present embodiment may also be realized by software. This software may be distributed by software download or the like, or recorded on a recording medium such as a CD-ROM and distributed. The same applies to the other embodiments in this specification. The software that implements the information generation apparatus of the present embodiment is the following program. That is, this program is a program for accumulating content units, each of which is a part of the content, from raw content related to the explanation or discussion of an exhibited object, and it causes a computer to execute: a voice information receiving step of receiving voice information, which is information having voice; a content unit extraction step of extracting, based on the voice information received in the voice information receiving step, a content unit having voice information for a time period that matches a predetermined condition; and a content unit accumulating step of accumulating content unit information, which is information about the content unit extracted in the content unit extraction step.

The above program may further cause the computer to execute a display step of displaying display information, an instruction receiving step of receiving an instruction on the display information displayed in the display step, and a position information acquisition step of acquiring position information, which is information about the position of the instruction received in the instruction receiving step. In that case, the content unit extraction step extracts, based on the instruction received in the instruction receiving step and the voice information received in the voice information receiving step, a content unit having the voice information for a time period that matches a predetermined condition and the position information acquired in the position information acquisition step during that time period.

In the above program, the content unit extraction step may include: an explanation voice detection substep of detecting explanation voice, which is voice determined to have been uttered by a single speaker for a predetermined time or longer; a similarity detection substep of detecting, based on the one or more pieces of position information acquired in the position information acquisition step and the one or more pieces of position information included in stored typical explanation information, the similarity between the explanation corresponding to the acquired position information and the explanation corresponding to the typical explanation information; and a lecture content unit extraction substep of extracting a lecture content unit having audio information for the duration of the explanation voice when explanation voice is detected in the explanation voice detection substep and the similarity detected in the similarity detection substep is equal to or higher than a predetermined similarity.

In the above program, the content unit extraction step may include: a dialogue voice detection substep of detecting dialogue voice, which consists of voice paired with one speaker identifier and substantially continuous voice paired with another speaker identifier; and an interaction content unit extraction substep of extracting, when dialogue voice is detected in the dialogue voice detection substep, an interaction content unit having audio information for the duration of the dialogue voice.

The above program may further cause the computer to execute a content unit output step of outputting at least the audio information of a content unit accumulated in the content unit accumulation step.
(Embodiment 2)
FIG. 15 is a block diagram of the information output apparatus in the present embodiment.

The information output apparatus includes an instruction receiving unit 104, a content unit storage unit 1501, a trigger detection unit 1502, a position information acquisition unit 1503, an object identifier acquisition unit 1504, an already-explained information storage unit 1505, a content unit acquisition unit 1506, an agent storage unit 1507, a content unit output unit 1508, a display information storage unit 1509, and a display unit 1510.
The content unit acquisition unit 1506 includes distance information acquisition means 15061 and content unit acquisition means 15062.

The content unit storage unit 1501 stores one or more content units. A stored content unit may be the content unit itself, which is output as is, or may be raw content together with information (pointer information) for extracting the content unit to be output. Raw content is information including the video, audio, and the like exactly as acquired by the video information receiving unit 106, the audio information receiving unit 107, and so on described above. A content unit has, for example, audio information and position information. A content unit is, for example, a lecture content unit or an interaction content unit as described above. Such content units are preferably content units extracted by the information output apparatus described in Embodiment 1. The content unit storage unit 1501 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium.

The trigger detection unit 1502 detects a trigger for outputting a content unit. The trigger is, for example, an instruction for the display information displayed on the display unit 1510. The trigger may also be, for example, recognition that a user is nearby. Such recognition is performed, for example, by the object identifier acquisition unit 1504 receiving a signal output from the user's IR tracker and acquiring an object identifier from that signal. The trigger detection unit 1502 can usually be realized by an MPU, a memory, and the like. Its processing procedure is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may also be realized by hardware (a dedicated circuit).

The position information acquisition unit 1503 acquires position information, which is information on the position of the instruction detected by the trigger detection unit 1502. The position information is usually coordinate information (x, y), but may be the region around the indicated point or the paragraph around the indicated point (a paragraph of the document displayed on the panel). The position information acquisition unit 1503 is realized by, for example, touch panel driver software. It may also be realized by an MPU, a memory, and the like. When the indicated target is a physical exhibit (for example, a dinosaur in a museum), the position information acquisition unit 1503 is configured by means for acquiring the position at which the exhibit was indicated. In such a case, for example, buttons that can be pressed are installed on the exhibit, and the position information acquisition unit 1503 recognizes that a button has been pressed and which button it was, and acquires the position information corresponding to that button.

The object identifier acquisition unit 1504 acquires an object identifier, which is information identifying a user. Here, the user is a person, and a person is one kind of object. The user here is a person viewing an exhibit or an exhibition panel. The object identifier acquisition unit 1504 can be realized by, for example, the IR tracker described above, or by software that receives an object identifier from the IR tracker.

The already-explained information storage unit 1505 stores the object identifier acquired by the object identifier acquisition unit 1504 while the content unit output unit 1508 is outputting a content unit, in association with that content unit. For example, the already-explained information storage unit 1505 may construct and store already-explained information having a content unit identifier that identifies the content unit and one or more object identifiers. The data structure of the stored information does not matter. It suffices for the already-explained information storage unit 1505 to associate the object identifier of each object to which the content unit has already been explained (output) with the output content unit. The already-explained information storage unit 1505 can usually be realized by an MPU, a memory, and the like. Its processing procedure is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may also be realized by hardware (a dedicated circuit).

The content unit acquisition unit 1506 acquires one or more content units from the content unit storage unit 1501 when the trigger detection unit 1502 detects a trigger. The content unit acquisition unit 1506 may select and acquire a content unit at random each time an instruction is given. It may also acquire, from the content unit storage unit 1501, a content unit corresponding to the position information acquired by the position information acquisition unit 1503. A content unit corresponding to the position information may be a content unit paired with the acquired position information itself, or a content unit paired with position information that lies within a predetermined distance of the acquired position information. The content unit acquisition unit 1506 may also detect the distance to the user and select and acquire a content unit according to that distance. This processing is performed by the distance information acquisition means 15061 and the content unit acquisition means 15062.
Moreover, the content unit acquisition unit 1506 preferably does not acquire a content unit that is associated with the object identifier acquired by the object identifier acquisition unit 1504. This prevents a content unit that has been output once from being output again to the same user (object). The content unit acquisition unit 1506 can usually be realized by an MPU, a memory, and the like. Its processing procedure is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may also be realized by hardware (a dedicated circuit).

The agent storage unit 1507 stores one or more agents. An agent is a graphic, mark, or the like that evokes an explainer of an exhibit, a visitor, and so on; its data structure does not matter. An agent is, for example, bitmap data. The agent storage unit 1507 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium.

The content unit output unit 1508 outputs the content unit acquired by the content unit acquisition unit 1506. When outputting a content unit, the content unit output unit 1508 normally also outputs an agent, although agent output is not mandatory. Here, output is a concept that includes display on a display, printing on a printer, sound output, transmission to an external device, and the like. The content unit output unit 1508 may or may not be regarded as including an output device such as a display or a speaker. It can be realized by the driver software of an output device, or by the driver software together with the output device.

The display information storage unit 1509 stores display information, which is the information to be displayed. Display information is the information that makes up an exhibition panel. When the exhibited object is a physical exhibit (for example, a dinosaur in a museum), the display information storage unit 1509 and the display unit 1510 may be unnecessary. The display information storage unit 1509 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium.

The display unit 1510 displays the display information stored in the display information storage unit 1509. The display unit 1510 may or may not be regarded as including a device such as a display or a touch panel. It can be realized by the driver software of the device, or by the driver software together with the output device.

The distance information acquisition means 15061 acquires distance information, which is information on the distance between an exhibit (for example, the display on which the display information is shown) and the user. The distance information acquisition means 15061 can be realized by, for example, an ultrasonic receiver and software that calculates the distance. It may also be realized by software that detects an approximate distance according to which infrared signals are received from two or more IR tags with different communication ranges (IR tags carried by the user). There are various means for calculating the distance between two objects (here, the exhibited object and the user), and any of them may be used.

The content unit acquisition means 15062 acquires one or more content units from the content unit storage unit 1501 based on the distance information acquired by the distance information acquisition means 15061. The content unit acquisition means 15062 can usually be realized by an MPU, a memory, and the like. Its processing procedure is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may also be realized by hardware (a dedicated circuit).
Next, the operation of the information output apparatus will be described with reference to the flowcharts of FIGS. 16 and 17.

(Step S1601) The trigger detection unit 1502 determines whether a trigger for outputting a content unit has been detected. If a trigger is detected, the process goes to step S1602; otherwise, the process returns to step S1601. Such a trigger is, for example, a press on the touch panel, or a user (visitor) coming within a predetermined distance of the exhibit.
(Step S1602) The content unit acquisition unit 1506 acquires a content unit. Details of the content unit acquisition processing will be described with reference to the flowchart of FIG. 17.

(Step S1603) The content unit output unit 1508 acquires, from the agent storage unit 1507, the agent corresponding to the content unit acquired in step S1602. The agent corresponding to a content unit is determined, for example, as follows. The agent storage unit 1507 stores, for example, an agent for an explainer (having, for example, the attribute value "0") and an agent for a visitor (having, for example, the attribute value "1"). If the content unit acquired in step S1602 is a lecture content unit, the explainer agent with attribute value "0" is acquired. If the content unit acquired in step S1602 is an interaction content unit, both the explainer agent with attribute value "0" and the visitor agent with attribute value "1" are acquired. When the agents are output, the two kinds of agents are placed and output in correspondence with their respective voices and the like.
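The agent selection in step S1603 can be sketched as a small lookup. This is only an illustration: the attribute values 0 and 1 follow the example in the text, while the content unit type constants and the function name `agents_for` are assumptions introduced here.

```python
# Agent attribute values, following the example in the text.
EXPLAINER_AGENT = 0
VISITOR_AGENT = 1

# Content unit type IDs (assumed here: 0 = lecture, 1 = interaction).
LECTURE = 0
INTERACTION = 1


def agents_for(content_unit_type: int) -> list:
    """Return the agent attribute values to output for a content unit type."""
    if content_unit_type == LECTURE:
        # A lecture is a single explainer speaking.
        return [EXPLAINER_AGENT]
    if content_unit_type == INTERACTION:
        # An interaction involves an explainer and a visitor.
        return [EXPLAINER_AGENT, VISITOR_AGENT]
    raise ValueError(f"unknown content unit type: {content_unit_type}")
```

Raising on an unknown type is a design choice of this sketch; the description does not say how other types would be handled.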

(Step S1604) The content unit output unit 1508 configures the content to be output based on the content unit acquired in step S1602 and the agent acquired in step S1603. For example, the content unit output unit 1508 overlays the agent on the display information to form the display information to be output, and configures content having that display information and the audio to be output.

(Step S1605) The content unit output unit 1508 starts outputting the content unit configured in step S1604. Here, the content unit output unit 1508, for example, shows the display information with the agent overlaid on the display and outputs the audio to the speaker. It is assumed here that the output of the content unit proceeds in parallel with the processing in the following steps.

(Step S1606) The object identifier acquisition unit 1504 acquires an object identifier, which is information identifying a user. This user is a visitor viewing the exhibited object (for example, the display on which the display information is shown) and is a person near that exhibited object.

(Step S1607) The already-explained information storage unit 1505 determines whether an object identifier was acquired in step S1606. If an object identifier was acquired, the process goes to step S1608; if not, the process jumps to step S1609.

(Step S1608) The already-explained information storage unit 1505 temporarily stores the object identifier acquired in step S1606. The recording medium used for this temporary storage is, for example, a main memory or a hard disk (not shown).

(Step S1609) The already-explained information storage unit 1505 determines whether the output of the content unit started in step S1605 has finished. If the output has finished, the process goes to step S1610; if the content unit is still being output, the process returns to step S1606.

(Step S1610) Based on the one or more object identifiers temporarily stored in step S1608, the already-explained information storage unit 1505 determines, for the output content unit, the object identifiers of the objects to which it has already been explained (that is, the objects that listened to and/or watched the output content unit). This determination is made, for example, by the following algorithm. The already-explained information storage unit 1505 determines as already explained those object identifiers that were stored within a predetermined time (for example, 5 seconds) after the start of content output and were also stored within a predetermined time (for example, 8 seconds) before its end. Alternatively, the already-explained information storage unit 1505 may determine all object identifiers temporarily stored in step S1608 to be object identifiers of already-explained objects. Further, it may determine as already explained the object identifiers that were detected in every predetermined interval from the start to the end of the output of the content unit.
Any other algorithm for determining such object identifiers may be used. The number of object identifiers determined in step S1610 is not necessarily one; it may be two or more, or zero. If the number of object identifiers determined in step S1610 is zero, the processing of steps S1611 and S1612 is not performed.
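The first example algorithm in step S1610 can be sketched as an intersection of two sets. This is a minimal sketch, assuming the temporarily stored identifiers are kept as (timestamp, object identifier) pairs; the function name `explained_ids` and that representation are assumptions, while the 5 second and 8 second windows are the example values from the text.

```python
def explained_ids(sightings, start, end, head=5.0, tail=8.0):
    """Return object identifiers seen both near the start and near the end.

    sightings: iterable of (timestamp_seconds, object_id) pairs recorded
               while the content unit was being output (assumed format).
    start/end: output start and end times in seconds.
    head/tail: window lengths after the start and before the end.
    """
    seen_at_start = {oid for t, oid in sightings if start <= t <= start + head}
    seen_at_end = {oid for t, oid in sightings if end - tail <= t <= end}
    # Only objects present in both windows count as "already explained".
    return seen_at_start & seen_at_end


sightings = [(1, "A"), (58, "A"), (2, "B"), (30, "C"), (59, "C")]
result = explained_ids(sightings, start=0, end=60)
```

Here only "A" is returned: "B" left before the end, and "C" arrived after the opening window. An empty result corresponds to the case in which steps S1611 and S1612 are skipped.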

(Step S1611) The already-explained information storage unit 1505 constructs already-explained information based on the object identifiers determined in step S1610 and the content unit acquired in step S1602. The already-explained information has, for example, a content unit identifier that identifies the content unit and one or more object identifiers.
(Step S1612) The already-explained information storage unit 1505 stores the already-explained information constructed in step S1611.
Note that the method of associating an output content unit with the persons (objects) to whom it has already been explained (output) is not limited to the processing in FIG. 16.
In the flowchart of FIG. 16, the processing ends upon power-off or an interrupt that terminates the processing.
Next, details of the content unit acquisition processing will be described with reference to the flowchart of FIG. 17.

(Step S1701) The object identifier acquisition unit 1504 acquires an object identifier, which is information identifying a user. It is assumed here that the object identifier acquisition unit 1504 acquires a single object identifier.
(Step S1702) The position information acquisition unit 1503 acquires position information, which is information on the position of the instruction detected by the trigger detection unit 1502.
(Step S1703) The distance information acquisition means 15061 acquires distance information, which is information on the distance between the exhibit (for example, the display on which the display information is shown) and the user.
(Step S1704) The content unit acquisition means 15062 assigns 1 to the counter i.

(Step S1705) The content unit acquisition means 15062 determines whether an i-th content unit exists in the content unit storage unit 1501. If the i-th content unit exists, the process goes to step S1706; if it does not exist, the process returns to the calling function. If the process returns to the calling function without having acquired a content unit, the processing from step S1603 onward in FIG. 16 is not performed. That is, no content unit is output.

(Step S1706) The content unit acquisition means 15062 determines whether the i-th content unit satisfies the condition on the position information acquired in step S1702. A specific example of this determination is as follows. A content unit has position information as an attribute value. This position information indicates the position, within the object (an exhibit, display information, or the like) that the content unit explains or discusses, of the information being explained or discussed. The content unit acquisition means 15062 determines whether the position information that is the attribute value of the content unit has a predetermined relationship with the position information acquired in step S1702. The predetermined relationship is, for example, that the two match or that they are within a predetermined distance of each other. If the two pieces of position information have the predetermined relationship, the content unit acquisition means 15062 determines that the position information condition is satisfied. If the condition is satisfied, the process goes to step S1707; if not, the process jumps to step S1710.
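The position condition of step S1706 can be sketched as follows. This is an illustrative sketch, assuming the position information is an (x, y) coordinate pair and that "within a predetermined distance" means Euclidean distance; the function name `position_matches` and the threshold parameter are assumptions.

```python
import math


def position_matches(unit_pos, pointed_pos, max_distance=0.0):
    """Check the position condition between two (x, y) positions.

    With max_distance == 0.0 the positions must match exactly;
    a positive value accepts positions within that Euclidean distance.
    """
    return math.dist(unit_pos, pointed_pos) <= max_distance
```

For example, `position_matches((10, 10), (13, 14), max_distance=5.0)` holds because the two points are exactly 5 units apart. If the position information were a region or a paragraph instead of coordinates, a different comparison would be needed.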

(Step S1707) The content unit acquisition means 15062 determines whether the i-th content unit satisfies the condition on the distance information acquired in step S1703. Specifically, for example, the content unit acquisition means 15062 first obtains the output time of the i-th content unit. The output time may be stored as an attribute value of the content unit or may be calculated from the data length of the content unit. Next, the content unit acquisition means 15062 determines, from the distance information, the output time (usually a range) that a content unit to be output should have. If the output time of the i-th content unit falls within the determined output time, the distance information condition is regarded as satisfied. In this example, a user close to the exhibit can be judged to be interested in it, so a long content unit may be output, whereas a user far from the exhibit can be judged to have little interest in it, so a short content unit is selected and output. If the distance information condition is satisfied, the process goes to step S1708; if not, the process goes to step S1710.
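The distance condition of step S1707 maps a distance to an acceptable range of output times and then checks the candidate against it. The sketch below only illustrates that idea: the distance thresholds, the duration limits, and the function names are all hypothetical, since the description gives no concrete values.

```python
def allowed_duration(distance_m):
    """Map user distance (meters) to an allowed (min, max) output time in seconds.

    All thresholds are made-up illustration values.
    """
    if distance_m <= 1.0:
        return (0.0, 120.0)  # close: presumably interested, long units allowed
    if distance_m <= 3.0:
        return (0.0, 30.0)   # middle range: medium-length units only
    return (0.0, 10.0)       # far: presumably less interested, short units only


def satisfies_distance(duration_s, distance_m):
    """Return True if a unit of the given duration suits the given distance."""
    shortest, longest = allowed_duration(distance_m)
    return shortest <= duration_s <= longest
```

With these values, a 60 second content unit is acceptable for a user 0.5 m away but not for a user 2 m away.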

(Step S1708) The content unit acquisition means 15062 determines whether the i-th content unit has already been output to the user identified by the object identifier acquired in step S1701. If it has not yet been output, the process goes to step S1709; if it has already been output, the process goes to step S1710.
(Step S1709) The content unit acquisition means 15062 acquires the i-th content unit. The process returns to the calling function.
(Step S1710) The content unit acquisition means 15062 increments the counter i by 1. The process returns to step S1705.
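The loop of steps S1704 to S1710 can be sketched as a linear scan that returns the first content unit passing all three checks. This is a simplified sketch under stated assumptions: the `ContentUnit` fields, the use of a maximum duration as the distance condition, and the `explained` mapping (content unit identifier to the set of object identifiers already served) are all hypothetical names and representations.

```python
import math
from dataclasses import dataclass


@dataclass
class ContentUnit:
    unit_id: int
    position: tuple      # (x, y) attribute of the unit
    duration_s: float    # output time in seconds


def acquire(units, pointed_pos, max_duration_s, user_id, explained,
            max_pos_distance=5.0):
    """Return the first suitable content unit, or None if there is none."""
    for unit in units:                                   # S1704, S1705, S1710
        if math.dist(unit.position, pointed_pos) > max_pos_distance:
            continue                                     # S1706 failed
        if unit.duration_s > max_duration_s:
            continue                                     # S1707 failed
        if user_id in explained.get(unit.unit_id, set()):
            continue                                     # S1708: already output
        return unit                                      # S1709
    return None                                          # no unit: nothing is output


units = [
    ContentUnit(1, (0, 0), 20.0),
    ContentUnit(2, (10, 10), 90.0),
    ContentUnit(3, (12, 10), 15.0),
]
first = acquire(units, (10, 10), 30.0, "u1", {})
repeat = acquire(units, (10, 10), 30.0, "u1", {3: {"u1"}})
```

In this usage, unit 1 is rejected by position, unit 2 by duration, and unit 3 is returned; once unit 3 is recorded as already output to "u1", the second call returns `None`.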

In the flowchart of FIG. 17, a content unit is acquired based on three conditions: the position information, the distance information, and whether the content unit has not already been output. However, a content unit that satisfies only one or two of these conditions may be acquired instead.

In the flowchart of FIG. 17, a single object identifier is handled, but two or more object identifiers may be handled. Handling two or more object identifiers means acquiring two or more object identifiers in step S1701 and acquiring a content unit when all of the acquired object identifiers satisfy one or more of the three conditions (the position information, the distance information, and whether the content unit has not already been output). Which of the three conditions must be satisfied for a content unit to be acquired may be chosen freely.
Hereinafter, a specific operation of the information output apparatus in the present embodiment will be described.

FIG. 18 shows the content unit management table stored in the content unit storage unit 1501 of the information output apparatus. The content unit storage unit 1501 stores the content unit management table and raw content. Examples of raw content are shown in FIGS. 9 and 10, among others. The content unit management table has the fields “ID”, “content unit type ID”, and “content unit information”. “Content unit information” is information about a content unit and here comprises “object identifier”, “start point (t)”, “end point (t)”, and “position information (x, y)”. “ID” identifies a record and exists for record management in the table. “Content unit type ID” identifies the type of content unit; here, for example, “0” denotes a lecture content unit and “1” denotes an interaction content unit. “Object identifier” identifies the object that sent the raw content. “Start point (t)” and “end point (t)” are the start and end points for cutting the content unit out of the raw content; here they are time information (either relative times or absolute times). “Position information (x, y)” is, for example, a coordinate position on the touch panel designated by the user identified by the “object identifier”.
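The record layout of FIG. 18 can be pictured as a small data structure. The following Python sketch is illustrative only: the class and field names are hypothetical, and the sample values other than the start point 400, the end point 500, and the position (20, 88) quoted in the worked example are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Content unit type IDs as given in the description of FIG. 18.
LECTURE = 0
INTERACTION = 1


@dataclass
class ContentUnitInfo:
    """One row of the content unit management table (field names are hypothetical)."""
    id: int                  # record ID
    type_id: int             # LECTURE or INTERACTION
    object_identifier: int   # object that sent the raw content
    start: int               # start point (t) within the raw content
    end: int                 # end point (t) within the raw content
    positions: List[Tuple[int, int]] = field(default_factory=list)  # designated (x, y) points

    def output_time(self) -> int:
        """Output time derived from the start and end points."""
        return self.end - self.start


# Sample row: type_id and object_identifier are assumed values for illustration.
row = ContentUnitInfo(id=1, type_id=LECTURE, object_identifier=5,
                      start=400, end=500, positions=[(20, 88)])
print(row.output_time())  # 100
```

Deriving the output time from the two time points corresponds to one of the two options the description allows; storing it as a separate attribute would work equally well.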

FIG. 19 shows an example of the agents stored in the agent storage unit 1507. Here there are an agent for the explainer with “ID = 0” and an agent for the visitor with “ID = 1”. The agent storage unit 1507 may hold other agents as well. With a variety of agents available, the agent may be changed depending on whether the content unit to be output is a lecture content unit or an interaction content unit, or depending on the size, type, and so on of the output display.

FIG. 20 shows the already-explained information management table, which holds the already-explained information accumulated by the already-explained information accumulation unit 1505. The table has the fields “ID”, “content unit ID”, and “object identifier”. Each record indicates that the content unit identified by “content unit ID” has already been output to the person identified by “object identifier”.
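Because each record of this table is just a pairing of a content unit and a person, it can be modeled as a set of pairs. The sketch below is one possible representation, not the patent's implementation; the function names are hypothetical, and the single entry mirrors the record with content unit ID 1 and object identifier 63 used in the worked example.

```python
from typing import Set, Tuple

# Each entry is (content_unit_id, object_identifier):
# "this content unit has already been output to this person".
already_explained: Set[Tuple[int, int]] = {(1, 63)}


def record_output(content_unit_id: int, object_identifier: int) -> None:
    """Accumulate the fact that a unit was output to a person."""
    already_explained.add((content_unit_id, object_identifier))


def already_output(content_unit_id: int, object_identifier: int) -> bool:
    """Check whether a unit was already output to a person."""
    return (content_unit_id, object_identifier) in already_explained
```

A set makes the "already output?" check a constant-time lookup and makes repeated recording of the same pair harmless.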

FIG. 21 shows the distance-time relationship management table held by the content unit acquisition means 15062. This table manages the relationship between the distance from the user to the exhibit and the output time of the content unit to be output. It stores one or more records having “ID”, “distance (m)”, and “output time (seconds)”. “Distance (m)” is the distance between the user and the exhibit, and “output time (seconds)” is the output-time attribute value of the content unit to be output. The record with “ID = 1” indicates that when the distance between the user and the exhibit is 0 m or more and less than 1.0 m (that is, the user is close), a content unit with an output time of 100 seconds or more (a long content unit) is selected.
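The lookup in this table amounts to mapping a distance to a range of output times and then testing a content unit against that range. A minimal sketch follows; only the first row is taken from the description of FIG. 21, while the other rows, the tuple layout, and the function names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (min_distance_m, max_distance_m, min_output_s, max_output_s)
# Upper bounds are exclusive; None means "no upper bound".
DISTANCE_TIME_TABLE: List[Tuple[float, Optional[float], float, Optional[float]]] = [
    (0.0, 1.0, 100.0, None),   # from the description: 0 m <= d < 1.0 m -> 100 s or more
    (1.0, 3.0, 30.0, 100.0),   # assumed row
    (3.0, None, 0.0, 30.0),    # assumed row
]


def output_time_range(distance_m: float) -> Optional[Tuple[float, Optional[float]]]:
    """Return the (min, max) output time for a distance, or None if no row matches."""
    for d_min, d_max, t_min, t_max in DISTANCE_TIME_TABLE:
        if distance_m >= d_min and (d_max is None or distance_m < d_max):
            return (t_min, t_max)
    return None


def satisfies_distance_condition(output_time_s: float, distance_m: float) -> bool:
    """True if a unit's output time falls in the range chosen by the distance."""
    time_range = output_time_range(distance_m)
    if time_range is None:
        return False
    t_min, t_max = time_range
    return output_time_s >= t_min and (t_max is None or output_time_s < t_max)
```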

Under these conditions, suppose that the visitor with object identifier “63” passes in front of the exhibit (here, an exhibition panel) and stops. The object identifier acquisition unit 1504 of the information output apparatus then acquires the object identifier “63”. The object identifier “63” is assumed to have been acquired by the information acquisition device installed at the exhibit.

Next, suppose that the visitor with object identifier “63” approaches the exhibition panel and touches position (20, 90) on the panel's touch-panel display. The trigger detection unit 1502 detects the trigger, which is the press on the touch panel. The position information acquisition unit 1503 then acquires the position information (20, 90).

Next, the content unit acquisition unit 1506 acquires a content unit by the following processing. First, the distance information acquisition means 15061 acquires the distance “0.5 m” between the exhibition panel and the visitor with object identifier “63”. Then the content unit acquisition means 15062 acquires the first content unit from the content unit storage unit 1501; here, it acquires the content unit information with “ID = 1” from the content unit management table of FIG. 18.

Next, the content unit acquisition means 15062 determines whether the acquired position information (20, 90) and the position information of the content unit information with “ID = 1” satisfy a predetermined condition. Here, the condition is that the distance between the acquired position (20, 90) and any one of the positions in the content unit information is 10 or less. The distance between one of the positions in the content unit information, (20, 88), and the acquired position (20, 90) is 2, so the condition is satisfied.
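The position condition in this example can be checked with a few lines of code. The sketch below assumes Euclidean distance in touch-panel coordinates, which the text does not state explicitly; the function name and the default threshold parameter are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def satisfies_position_condition(touched: Point,
                                 unit_positions: Iterable[Point],
                                 threshold: float = 10.0) -> bool:
    """True if any position stored for the content unit lies within
    `threshold` of the touched position (Euclidean distance assumed)."""
    return any(math.dist(touched, p) <= threshold for p in unit_positions)


# Values from the worked example: (20, 88) versus the touch at (20, 90).
print(math.dist((20, 90), (20, 88)))                      # 2.0
print(satisfies_position_condition((20, 90), [(20, 88)])) # True
```

A content unit with no stored positions never satisfies this check, which matches the later treatment of the unit with “ID = 2”.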

Next, the content unit acquisition means 15062 obtains the output time “100” of the content from the end point “500” and the start point “400” of the content unit information with “ID = 1” (500 − 400 = 100). It then searches the distance-time relationship management table of FIG. 21 using the distance “0.5 m” between the exhibition panel and the visitor with object identifier “63” as the key and obtains the output time “100 or more”. The content unit acquisition means 15062 thus finds that the content unit information with “ID = 1” satisfies the distance condition.

Next, based on the already-explained information management table of FIG. 20, the content unit acquisition means 15062 determines whether the visitor with object identifier “63” has already received the explanation of the content unit corresponding to the content unit information with “ID = 1”. Because that table contains the record “ID = 1” with content unit ID “1” and object identifier “63”, it determines that the visitor has already received that explanation.
The content unit acquisition means 15062 therefore does not select the content unit corresponding to the content unit information with “ID = 1”.
Next, the content unit acquisition means 15062 acquires the content unit information with “ID = 2” from the content unit management table.

The content unit acquisition means 15062 then determines whether the content unit information with “ID = 2” satisfies the position, distance, and other conditions described above. Because the content unit information with “ID = 2” does not satisfy the position condition (it has no position information), the content unit acquisition means 15062 does not select the corresponding content unit.
Next, the content unit acquisition means 15062 acquires the content unit information with “ID = 3” from the content unit management table.

The content unit information with “ID = 3” satisfies the position and distance conditions. Furthermore, based on the already-explained information management table of FIG. 20, it can be determined that the visitor with object identifier “63” has not yet received the explanation of the content unit corresponding to the content unit information with “ID = 3”; that is, the table has no record with content unit ID “3” and object identifier “63”. The content unit acquisition means 15062 therefore selects and acquires the content unit corresponding to the content unit information with “ID = 3”.
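The walk through IDs 1, 2, and 3 follows the loop of FIG. 17: test the position condition, then the distance condition, then whether the unit was already output, and return the first unit that passes all three. The following sketch reproduces that outcome under stated assumptions: the rows for IDs 2 and 3 are invented values chosen to match the narrative, Euclidean distance is assumed, and all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# id -> (start, end, positions). ID 1 uses the values quoted in the text;
# IDs 2 and 3 are assumed values for illustration.
UNITS: Dict[int, Tuple[int, int, List[Tuple[int, int]]]] = {
    1: (400, 500, [(20, 88)]),
    2: (100, 250, []),           # no position information
    3: (600, 720, [(22, 93)]),
}
# (content_unit_id, object_identifier) pairs already output.
ALREADY_EXPLAINED: Set[Tuple[int, int]] = {(1, 63)}


def select_unit(object_id: int, touched: Tuple[int, int],
                min_output_time: int,
                max_touch_distance: float = 10.0) -> Optional[int]:
    """Return the ID of the first unit meeting all three conditions, else None."""
    for unit_id in sorted(UNITS):
        start, end, positions = UNITS[unit_id]
        # Position condition: some stored position is close to the touch.
        if not any(math.dist(touched, p) <= max_touch_distance for p in positions):
            continue
        # Distance condition: output time meets the minimum chosen by the distance.
        if end - start < min_output_time:
            continue
        # Already-output condition: skip units this person has received.
        if (unit_id, object_id) in ALREADY_EXPLAINED:
            continue
        return unit_id
    return None


print(select_unit(63, (20, 90), 100))  # 3
```

With these sample rows, unit 1 is rejected because it was already output to visitor 63, unit 2 because it has no position information, and unit 3 is returned.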

Next, the content unit output unit 1508 acquires, from the agent storage unit 1507, the agent corresponding to the acquired content unit. Here, it acquires the explainer agent with ID “0” in FIG. 19. The content unit output unit 1508 then composes the content to be output from the acquired content unit and the acquired agent: it overlays the agent on the display information to form the display information to be output, and composes content comprising that display information and the audio to be output.

Next, the content unit output unit 1508 starts outputting the composed content unit. FIG. 22 shows an example of such output. In FIG. 22, the audio of the content unit is also being output. The content unit output unit 1508 displays not only the agent but also a character string obtained by recognizing the audio, in a speech balloon. The character string need not come from speech recognition; a placeholder string such as “XXXXYYYY...” may be used instead. Speech recognition and speech-balloon output are known techniques, so detailed description is omitted. A speech balloon can be output, for example, by holding a balloon graphic in advance and drawing it near the agent's mouth (whose coordinates are also managed).
When the trigger is an instruction on the touch panel, the content unit acquisition means 15062 may select, for example, only from lecture content units having content unit type ID “0”.
Next, suppose that the visitor with object identifier “8” passes near the exhibition panel and stops.

The information output apparatus then detects that a user has remained nearby for a predetermined time or longer. That is, the trigger detection unit 1502 detects a trigger because the IR tracker installed at the exhibit has continued to receive the object identifier “8” for a certain time or longer.
Next, the distance information acquisition means 15061 acquires the distance “0.3 m” between the exhibition panel and the visitor with object identifier “8”.
Then, based on the distance “0.3 m”, the output time “100 seconds or more” is acquired.
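The dwell-based trigger described here can be sketched as a scan over time-ordered IR tracker sightings. The text does not specify the sighting format or the time thresholds, so the `(timestamp, object_identifier)` tuples, the 5-second dwell time, and the 1-second gap tolerance below are all assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def detect_dwell_triggers(sightings: Iterable[Tuple[float, int]],
                          dwell_s: float = 5.0,
                          max_gap_s: float = 1.0) -> List[int]:
    """Return object identifiers received continuously for at least `dwell_s`
    seconds. `sightings` must be ordered by timestamp; a silence longer than
    `max_gap_s` ends a continuous run."""
    run_start: Dict[int, float] = {}
    last_seen: Dict[int, float] = {}
    triggered: List[int] = []
    for t, obj in sightings:
        if obj not in last_seen or t - last_seen[obj] > max_gap_s:
            run_start[obj] = t  # start (or restart) a continuous run
        last_seen[obj] = t
        if t - run_start[obj] >= dwell_s and obj not in triggered:
            triggered.append(obj)
    return triggered


# Visitor 8 is seen once per second for five seconds; visitor 63 only briefly.
samples = [(0.0, 8), (0.5, 63), (1.0, 8), (2.0, 8), (3.0, 8), (4.0, 8), (5.0, 8)]
print(detect_dwell_triggers(samples))  # [8]
```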

Next, the content unit acquisition means 15062 examines the content unit information in the content unit management table in order and acquires a content unit that matches the output time “100 seconds or more” and that the visitor with object identifier “8” has not been given in the past (that is, excluding the content units identified by the content unit IDs associated with object identifier “8” in FIG. 20).
Here, for example, the content unit acquisition means 15062 acquires the content unit corresponding to the content unit information with ID “72” in FIG. 18.

The content unit output unit 1508 then acquires, from the agent storage unit 1507, the agents corresponding to the acquired content unit. Here, it acquires the explainer agent with ID “0” and the visitor agent with ID “1” in FIG. 19. The content unit output unit 1508 then composes the content to be output from the acquired content unit and the acquired agents: it overlays the agents on the display information to form the display information to be output, and composes content comprising that display information and the audio to be output.
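One way to read the two worked examples is that a lecture content unit is presented by the explainer agent alone, while an interaction content unit is presented by both the explainer and visitor agents. The text does not state this mapping as a rule, so the sketch below is an assumption, and its names are hypothetical.

```python
from typing import List

# Content unit type IDs (FIG. 18) and agent IDs (FIG. 19) as given in the text.
LECTURE, INTERACTION = 0, 1
EXPLAINER_AGENT, VISITOR_AGENT = 0, 1


def agents_for(type_id: int) -> List[int]:
    """Return the agent IDs to overlay for a content unit type
    (assumed mapping: lecture -> explainer; interaction -> both)."""
    if type_id == LECTURE:
        return [EXPLAINER_AGENT]
    if type_id == INTERACTION:
        return [EXPLAINER_AGENT, VISITOR_AGENT]
    raise ValueError(f"unknown content unit type ID: {type_id}")
```

Other selection keys mentioned in the description, such as display size or type, could be added as further parameters.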

Next, the content unit output unit 1508 starts outputting the composed content unit. FIG. 23 shows an example of such output. The output is not limited to FIG. 23; as shown in FIG. 24, the video of the content unit may also be output (on the right side of the window). In other words, the content unit output unit 1508 outputs the audio information of the content unit, but which of the other information it outputs is not restricted. The content unit output unit 1508 may or may not output an agent.
As described above, according to the present embodiment, the information output apparatus outputs a content unit that matches a predetermined condition.

In the present embodiment, the predetermined condition is a combination of the three conditions described above (position information, distance information, and whether the explanation has already been received), but other conditions may also be taken into account. For example, content with a long output time may be withheld just before the exhibition closes. In that case, the information output apparatus stores information about the time from which the exhibition is judged to be about to close.
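Such a closing-time condition could be layered on top of the other checks as one more filter. In the sketch below, the 16:45 cut-off, the 30-second limit, and the function names are all assumed values for illustration; the text only says that a time indicating "just before closing" is stored.

```python
from datetime import datetime, time
from typing import Optional


def max_output_time_s(now: datetime,
                      closing_soon_from: time = time(16, 45),
                      short_limit_s: float = 30.0) -> Optional[float]:
    """Return an upper bound on output time near closing, or None if unrestricted."""
    return short_limit_s if now.time() >= closing_soon_from else None


def allowed_near_closing(output_time_s: float, now: datetime) -> bool:
    """True if a unit of this length may still be output at time `now`."""
    limit = max_output_time_s(now)
    return limit is None or output_time_s <= limit
```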

Furthermore, the software that implements the information output apparatus in the present embodiment is the following program. That is, this program is for a computer that stores one or more content units, which are content related to explanation or discussion of an exhibited object, and causes the computer to execute: a trigger detection step of detecting a trigger for output of a content unit; a content unit acquisition step of acquiring one or more content units from the stored content units when a trigger is detected in the trigger detection step; and a content unit output step of outputting the content unit acquired in the content unit acquisition step.
The program may further cause the computer to execute a display step of displaying the display information.
In the program, the trigger detected in the trigger detection step may be an instruction on the display information displayed in the display step.

In the program, the content unit may have audio information and position information, the program may further cause the computer to execute a position information acquisition step of acquiring position information, which is information about the position of the instruction detected in the trigger detection step, and the content unit acquisition step may acquire a content unit corresponding to the position information acquired in the position information acquisition step.

In the program, the content unit acquisition step may comprise a distance information acquisition sub-step of acquiring distance information, which is information about the distance between the user and the display on which the display information is displayed, and a content unit acquisition sub-step of acquiring one or more content units based on the distance information acquired in the distance information acquisition sub-step.

In the program, the computer may further be caused to execute an object identifier acquisition step of acquiring an object identifier, which is information identifying a user, and an already-explained information accumulation step of accumulating the object identifier acquired in the object identifier acquisition step in association with the content unit while the content unit is being output in the content unit output step; in the content unit acquisition step, a content unit associated with the object identifier acquired in the object identifier acquisition step is then not acquired.
In the program, one or more agents may be stored, and in the content unit output step the agent may also be output when the content unit is output.

FIG. 25 shows the external appearance of a computer that executes the program described in this specification to realize the information generation apparatus and/or the information output apparatus of the embodiments described above. The embodiments described above can be realized by computer hardware and a computer program executed on it. FIG. 25 is an overview of the computer system 250, and FIG. 26 is a block diagram of the computer system 250.

In FIG. 25, the computer system 250 includes a computer 251 that includes an FD (Flexible Disk) drive and a CD-ROM (Compact Disk Read Only Memory) drive, a keyboard 252, a mouse 253, a monitor 254, a touch panel 255, a microphone 256, a speaker 257, and an IR tracker 258.

In FIG. 26, the computer 251 includes, in addition to the FD drive 2511 and the CD-ROM drive 2512, a CPU (Central Processing Unit) 2513; a bus 2514 connected to the CPU 2513, the CD-ROM drive 2512, and the FD drive 2511; a ROM (Read-Only Memory) 2515 for storing programs such as a boot-up program; a RAM (Random Access Memory) 2516 connected to the CPU 2513 for temporarily storing the instructions of application programs and providing temporary storage space; and a hard disk 2517 for storing application programs, system programs, and data. Although not shown, the computer 251 may further include a network card that provides a connection to a LAN.

A program that causes the computer system 250 to execute the functions of the information generation apparatus and the information output apparatus of the embodiments described above may be stored on a CD-ROM 2601 or an FD 2602, which is inserted into the CD-ROM drive 2512 or the FD drive 2511, and then transferred to the hard disk 2517. Alternatively, the program may be sent to the computer 251 over a network (not shown) and stored on the hard disk 2517. The program is loaded into the RAM 2516 at execution time. The program may also be loaded directly from the CD-ROM 2601, the FD 2602, or the network.

The program need not include an operating system (OS), third-party programs, or the like that cause the computer 251 to execute the functions of the information generation apparatus and the information output apparatus of the embodiments described above. The program need only include those instructions that call the appropriate functions (modules) in a controlled manner to obtain the desired results. How the computer system 250 operates is well known, and detailed description is omitted.

In each of the above embodiments, each process (each function) may be realized by centralized processing on a single apparatus (system) or by distributed processing across a plurality of apparatuses. For example, the information generation apparatus described in Embodiment 1 and the information output apparatus described in Embodiment 2 may be realized as a single system (information processing system).

In each of the above embodiments, each component may be implemented in dedicated hardware, or a component that can be realized in software may be realized by executing a program. For example, each component can be realized by a program execution unit such as a CPU reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
The program may be executed by a single computer or by a plurality of computers. That is, either centralized processing or distributed processing may be performed.
In each of the above embodiments, it goes without saying that two or more communication means (such as information transmission units) present in one apparatus may be physically realized by a single medium.
The present invention is not limited to the above embodiments; various modifications are possible, and it goes without saying that these are also included within the scope of the present invention.

As described above, the information output apparatus according to the present invention has the effect of appropriately outputting content units related to explanation or discussion of an exhibited object, and is useful as a presentation apparatus or the like.

FIG. 1: Block diagram of the information generation apparatus according to Embodiment 1
FIG. 2: Flowchart explaining the operation of the information generation apparatus
FIG. 3: Flowchart explaining the operation of the information generation apparatus
FIG. 4: Flowchart explaining the operation of the information generation apparatus
FIG. 5: Flowchart explaining the operation of the information generation apparatus
FIG. 6: Flowchart explaining the operation of the information generation apparatus
FIG. 7: Explanatory diagram of the interaction content unit extraction process
FIG. 8: Diagram showing an example of the information acquisition device
FIG. 9: Conceptual diagram of raw content acquired by the information acquisition device
FIG. 10: Conceptual diagram of raw content acquired by the information acquisition device
FIG. 11: Diagram showing an example of the typical explanation information
FIG. 12: Schematic diagram showing how instructions are given in a typical explanation
FIG. 13: Diagram showing an example of the content unit information
FIG. 14: Diagram showing an example input screen for a content unit output instruction
FIG. 15: Block diagram of the information output apparatus according to Embodiment 2
FIG. 16: Flowchart explaining the operation of the information output apparatus
FIG. 17: Flowchart explaining the operation of the information output apparatus
FIG. 18: Diagram showing the content unit management table
FIG. 19: Diagram showing the agent management table
FIG. 20: Diagram showing the already-explained information management table
FIG. 21: Diagram showing the distance-time relationship management table
FIG. 22: Diagram showing an output example of the information output apparatus
FIG. 23: Diagram showing an output example of the information output apparatus
FIG. 24: Diagram showing an output example of the information output apparatus
FIG. 25: External view of the computer
FIG. 26: Block diagram of the computer system

Explanation of symbols

101 Target identifier storage unit
102, 1509 Display information storage unit
103, 1510 Display unit
104 Instruction receiving unit
105, 1503 Position information acquisition unit
106 Video information receiving unit
106 Video receiving unit
107 Voice information receiving unit
108, 1504 Object identifier acquisition unit
109 Content unit extraction unit
110 Content unit accumulation unit
111, 1508 Content unit output unit
1091 Explanation voice detection means
1092 Typical explanation information storage means
1093 Similarity detection means
1094 Lecture content unit extraction means
1095 Dialogue voice detection means
1096 Interaction content unit extraction means
1501 Content unit storage unit
1502 Trigger detection unit
1505 Already-explained information accumulation unit
1506 Content unit acquisition unit
1507 Agent storage unit
15061 Distance information acquisition means
15062 Content unit acquisition means

Claims

An information generation apparatus that accumulates content units, each being a part of content, from raw content related to explanation or discussion of an exhibited object, the apparatus comprising:
a voice information receiving unit that receives voice information, which is information containing voice;
a content unit extraction unit that extracts, based on the voice information received by the voice information receiving unit, a content unit having the voice information for a time period that matches a predetermined condition; and
a content unit accumulation unit that accumulates content unit information, which is information about the content unit extracted by the content unit extraction unit.

The information generating device according to claim 1, further comprising:
A display information storage unit that stores display information, which is information to be displayed;
A display unit that displays the display information;
An instruction reception unit that receives an instruction directed at the display information displayed by the display unit; and
A position information acquisition unit that acquires position information, which is information on the position indicated by the instruction received by the instruction reception unit,
Wherein the content unit extraction unit extracts, based on the instruction received by the instruction reception unit and the audio information received by the audio information reception unit, a content unit having the audio information for a period of time that satisfies a predetermined condition and the position information acquired by the position information acquisition unit during that period.

The information generating device according to claim 1, wherein the content unit extraction unit includes:
Explanation voice detection means for detecting explanation voice, which is voice judged to be uttered by a single speaker for a predetermined time or longer;
Typical explanation information storage means for storing typical explanation information having one or more pieces of position information, each indicating the position of an instruction directed at the display information;
Similarity detection means for detecting, based on the one or more pieces of position information acquired by the position information acquisition unit and the one or more pieces of position information included in the typical explanation information, the similarity between the explanation corresponding to the acquired position information and the explanation corresponding to the typical explanation information; and
Lecture content unit extraction means for extracting a lecture content unit having audio information corresponding to the time of the explanation voice when the explanation voice detection means detects the explanation voice and the similarity detected by the similarity detection means is equal to or higher than a predetermined similarity.
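The lecture extraction conditions in the preceding claim can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Segment` and `Pointing` records, the threshold values, and in particular the similarity measure (the fraction of typical pointing positions that have an observed pointing position within a given radius) are assumptions chosen for illustration, since the claim only requires that some similarity be compared with a predetermined value.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Segment:
    """A stretch of speech attributed to one speaker (hypothetical record)."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Pointing:
    """A pointing instruction on the displayed information (hypothetical record)."""
    time: float
    x: float
    y: float


def detect_explanation_voice(segments, min_duration):
    """Keep segments in which a single speaker talks for at least min_duration seconds."""
    return [s for s in segments if s.end - s.start >= min_duration]


def similarity(observed, typical, radius):
    """Assumed measure: share of typical positions with an observed pointing within radius."""
    if not typical:
        return 0.0
    hits = sum(
        1
        for tx, ty in typical
        if any(hypot(p.x - tx, p.y - ty) <= radius for p in observed)
    )
    return hits / len(typical)


def extract_lecture_units(segments, pointings, typical,
                          min_duration=10.0, radius=50.0, min_similarity=0.5):
    """Extract a lecture content unit for each explanation voice whose pointing
    positions are similar enough to the typical explanation."""
    units = []
    for seg in detect_explanation_voice(segments, min_duration):
        during = [p for p in pointings if seg.start <= p.time <= seg.end]
        if similarity(during, typical, radius) >= min_similarity:
            units.append({
                "speaker": seg.speaker,
                "start": seg.start,
                "end": seg.end,
                "positions": [(p.x, p.y) for p in during],
            })
    return units
```

In this sketch a unit carries both the time span of the audio and the positions pointed at during that span, so a later output stage could replay the explanation together with its pointing trace.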

The information generating device according to claim 1, wherein the content unit extraction unit includes:
Dialogue voice detection means for detecting dialogue voice, which has a voice paired with one speaker identifier and a substantially continuous voice paired with another speaker identifier; and
Interaction content unit extraction means for extracting an interaction content unit having audio information corresponding to the time of the dialogue voice when the dialogue voice detection means detects the dialogue voice.
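As a rough illustration of the dialogue condition above, the following Python sketch groups speech segments into interaction content units. It assumes that speaker-labelled segments are already available as `(speaker_id, start, end)` tuples sorted by start time, and it interprets "substantially continuous" as a gap of at most `max_gap` seconds between turns; both the tuple layout and the gap threshold are assumptions, not details given in the claim.

```python
def extract_interaction_units(segments, max_gap=1.0):
    """Group consecutive turns by different speakers into interaction units.

    segments: list of (speaker_id, start, end) tuples sorted by start time.
    Returns a list of (start, end, speakers) tuples, one per detected dialogue.
    """
    runs, run = [], []
    for seg in segments:
        speaker, start, _end = seg
        # A turn continues the dialogue when the speaker changes and the
        # pause since the previous turn is short enough.
        if run and speaker != run[-1][0] and start - run[-1][2] <= max_gap:
            run.append(seg)
        else:
            runs.append(run)
            run = [seg]
    runs.append(run)
    # Only runs with at least two turns count as dialogue.
    return [
        (r[0][1], r[-1][2], sorted({s[0] for s in r}))
        for r in runs
        if len(r) >= 2
    ]
```

A long monologue or an isolated remark yields no unit here, because a run needs at least two turns by different speakers.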

An information output device comprising:
A content unit storage unit that stores one or more content units, each being content relating to an explanation or discussion of an exhibited object;
A trigger detection unit that detects a trigger for outputting a content unit;
A content unit acquisition unit that acquires one or more content units from the content unit storage unit when the trigger detection unit detects the trigger; and
A content unit output unit that outputs the content unit acquired by the content unit acquisition unit.
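The store, trigger, acquire, and output flow of the information output device can be sketched in Python as follows. The claim leaves the trigger open; this sketch assumes, for illustration only, that the trigger is a visitor coming within `trigger_distance` of an exhibited object, and that content units already output are skipped. The class name, the record fields `id` and `summary`, and the distance threshold are all hypothetical.

```python
class InformationOutputDevice:
    """Minimal sketch of the output flow; not the claimed implementation."""

    def __init__(self, content_units, trigger_distance=2.0):
        # content_units: mapping from object identifier to a list of
        # content unit records (dicts with "id" and "summary" keys).
        self.store = content_units
        self.trigger_distance = trigger_distance
        self.already_output = set()

    def detect_trigger(self, distance):
        """Assumed trigger: the visitor is close enough to the object."""
        return distance <= self.trigger_distance

    def acquire(self, object_id):
        """Fetch stored units for the object that have not been output yet."""
        return [
            u for u in self.store.get(object_id, [])
            if u["id"] not in self.already_output
        ]

    def step(self, object_id, distance, output=print):
        """Run one detect, acquire, and output cycle; return the units output."""
        if not self.detect_trigger(distance):
            return []
        units = self.acquire(object_id)
        for unit in units:
            output(unit["summary"])
            self.already_output.add(unit["id"])
        return units
```

Passing a different `output` callable, for example one that drives a display or a speaker, keeps the acquisition logic independent of how a content unit is presented.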

The information output device according to claim 5, further comprising:
A display information storage unit that stores display information, which is information to be displayed; and
A display unit that displays the display information.

A program for accumulating content units, each being a part of content, from raw content relating to an explanation or discussion of an exhibited object, the program causing a computer to execute:
An audio information reception step of receiving audio information, which is information containing voice;
A content unit extraction step of extracting, based on the audio information received in the audio information reception step, a content unit having the audio information for a period of time that satisfies a predetermined condition; and
A content unit accumulation step of accumulating content unit information, which is information on the content unit extracted in the content unit extraction step.

A program for a computer that stores one or more content units, each being content relating to an explanation or discussion of an exhibited object, the program causing the computer to execute:
A trigger detection step of detecting a trigger for outputting a content unit;
A content unit acquisition step of acquiring one or more of the stored content units when the trigger is detected in the trigger detection step; and
A content unit output step of outputting the content unit acquired in the content unit acquisition step.