JP2006268395A - Information processor, information processing method and program

Info

Publication number: JP2006268395A
Application number: JP2005085198A
Authority: JP
Inventors: Hideto Yuzawa; 秀人湯澤
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2005-03-24
Filing date: 2005-03-24
Publication date: 2006-10-05

Abstract

PROBLEM TO BE SOLVED: To provide an information processor that allows contents to be looked back upon, including the feelings of conference participants and the like.

SOLUTION: This information processor has: a synchronization part 8 that synchronizes bioinformation of a person with video information; a state estimation processing part 10 that finds a state estimation value showing a psychological state of the person by use of a prescribed evaluation function on the basis of the bioinformation of the person or voice information at the time of photography; an index file storage part 12 that stores the state estimation value in correspondence with a scene of a video; a retrieval means that retrieves the scene of the video having the state estimation value according to movement of an object; and a display control part 16 that displays a screen wherein the object showing the person is disposed inside a coordinate space expressed by use of the psychological state of the person according to the state estimation value. Because the object showing the person is disposed inside the coordinate space expressed by use of the psychological state of the person, the contents including the feelings can be looked back upon. The psychological state can also be converted into a distance and disposed accordingly, so that subjective information is expressed as positional information.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

Conventionally, decisions made at a meeting or the like can be looked back on by rereading the minutes issued afterward. However, the detailed circumstances that led to a decision are not recorded, so the meeting cannot be properly looked back on.

Methods that use moving images have been proposed as a technique for supporting such looking back. For example, a meeting is recorded on video, and recall is supported by later playing back the scene the user wants to remember. This requires a technique for quickly searching for the scene the user wants to see.

For example, scenes (still images) extracted from a moving image at fixed time intervals are presented to the user as representative images, and the user relies on them to infer the temporal order and reach the desired scene. In this case, however, the scenes presented to the user are simply cut out at fixed intervals, and it is unclear how useful they are for estimating time. For example, when several similar scenes are extracted, it is difficult to infer the temporal order from them.

To solve this problem, there is a method of presenting to the user, as a representative image, a still image from a point where the content of the moving image changes greatly. Unlike representative images obtained at simple time intervals, the points at which the scene changes become the representative images, so similar scenes do not appear consecutively as representative images. With this method, however, no representative image may be obtained when there is little change in the image. Conversely, when the camera pans, shakes, or zooms, or when someone passes in front of the camera, the scene is regarded as having changed greatly and may be presented as a representative image. Such scenes are meaningless to (not remembered by) a user looking back on the meeting, and are therefore of no use.

A method has also been proposed that identifies which speaker spoke and when, displays this visually as a timeline, and gives access from the timeline to the part of the video to be played back. However, the user must look at the displayed timeline and guess where the scene being sought is located. Therefore, the longer the meeting, the wider the range to be examined, which places a burden on the user.

There is also a method in which the user gives the system an image of the desired scene, and the system searches for scenes similar to that image. In this case the user supplies an image as the search request; for a recorded conference, however, the user must assume the position of the installed camera, create an image as seen from that position, and give it to the system. Users usually pay no attention to the camera position, so creating and supplying an image from the camera's viewpoint is difficult.

The above methods also have the problem that they cannot detect meaningful scenes. The following conventional techniques have been proposed to solve this problem.

For example, a method has been proposed that detects the degree of interest of participants in a plurality of slides displayed sequentially at the beginning of a presentation, holds the result as a table of slide numbers and interest levels, and retrieves and displays slides according to the slide progress pattern corresponding to the slide number with the highest interest level (see Patent Document 1).

A method has also been proposed in which the presenter's voice is input during a presentation, the volume and intonation of the input voice are analyzed, and the presentation screen is highlighted at emphasized portions where the voice is loud or the intonation is strong (see Patent Document 2).

Also proposed are a method in which the photographer determines the importance and tags the video (see Patent Document 3), a method of calculating the degree of interest from facial skin temperature and facial distance (see Patent Document 4), and a method of supporting operation by monitoring the operator's operating status during terminal operation (see Patent Document 5).

Patent Document 1: JP 2002-149145 A
Patent Document 2: JP 2002-23716 A
Patent Document 3: JP 2000-197002 A
Patent Document 4: JP 2001-100888 A
Patent Document 5: JP H05-176325 A

However, when the content of a conference is later looked back on together with the emotions of the participants, information related to emotions offers few clues and is difficult to look back on. In addition, in a remote conference the participants are not physically close to one another, so subtle emotions are difficult to express (the personal space problem).

In the apparatus described in Patent Document 1, the target is the slide rather than the content itself. In a meeting or presentation, participants look back and forth between the speaker and the materials, each complementing the other, so gazing at a slide does not necessarily correspond to the degree of interest, and the content cannot be looked back on together with the participants' emotions. Nor can the degree of interest of each individual participant be determined.

With the apparatus described in Patent Document 2, it is difficult to identify meaningful scenes from speech volume, because in many meetings speech is concentrated in a few speakers, and in presentations viewers have extremely few opportunities to speak. The apparatus described in Patent Document 3 depends on the decision of a single photographer and cannot reflect the intention of the viewers. Furthermore, the photographer must deliberately perform the tagging work.

The apparatus described in Patent Document 4 is a technique for supporting the input of emotion in TV games and the like, and important scenes cannot be looked back on by linking its records with video. Nor can the apparatus described in Patent Document 5 be used to look back on content together with emotions.

Accordingly, the present invention has been made in view of the above problems, and an object thereof is to provide an information processing apparatus, an information processing method, and a program with which content can be looked back on together with the feelings of conference participants and the like.

To solve the above problems, the present invention is an information processing apparatus comprising: storage means for storing a mental state of a person, estimated using at least one of biological information and audio information of the person obtained at the time of shooting, in association with video information from the time of shooting; and display means for displaying a screen in which an object representing the person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis.

According to the present invention, displaying a screen in which an object representing a person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis makes it possible to look back on content together with emotions. In addition, by converting the mental state into a distance and arranging the object accordingly, subjective information can be expressed as position information. That is, personal space can be visualized. Furthermore, because the mental state is turned into position information, it can be remembered as a pattern, which makes it easier to find later. Here, the term coordinate space also covers a coordinate plane, that is, the two-dimensional case.

The present invention further comprises search means for searching the storage means for video corresponding to the position of the object in the coordinate space, and the display means displays the video found by the search means. According to the present invention, a scene with the desired psychological state can be found intuitively, using the object representing the person as a search key.

The display means displays thumbnails of screens in which the object is arranged in a coordinate space having the mental state of the person as at least one coordinate axis. This allows the psychological state to be checked at a glance. The display means displays the video corresponding to a selected thumbnail. The display means displays the transition of the object. The coordinate space has a time axis as one of its coordinate axes. The present invention further comprises estimation means for obtaining, using a predetermined evaluation function, a state estimation value representing the degree of the mental state of the person on the basis of the biological information of the person or the audio information from the time of shooting.

The present invention is an information processing apparatus comprising: storage means for storing a mental state of a person, estimated using at least one of biological information and audio information of the person obtained at the time of shooting, in association with video information from the time of shooting; display means for displaying an object representing the person and a coordinate space having the mental state of the person as at least one coordinate axis; and search means for searching the storage means for video corresponding to the position of the object in the coordinate space. According to the present invention, a scene with the desired psychological state can be found intuitively by moving the object representing the person. The mental state of the person includes at least one of the person's degree of interest, excitement, comfort, understanding, memory, support, shared feeling, subjectivity, objectivity, dislike, and fatigue.

The present invention is an information processing method comprising: a step of acquiring a state estimation value indicating a mental state of a person, calculated using at least one of biological information and audio information of the person obtained at the time of shooting; and a step of displaying, in accordance with the state estimation value, a screen in which an object representing the person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis. According to the present invention, because a screen is displayed in which an object representing a person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis, content can be looked back on together with emotions. In addition, by converting the mental state into a distance and arranging the object accordingly, subjective information can be expressed as position information. That is, personal space can be visualized. Furthermore, because the mental state is turned into position information, it can be remembered as a pattern, which makes it easier to find later.

The present invention further comprises a step of displaying thumbnails of screens in which the object representing the person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis. The present invention is also an information processing method comprising: a step of displaying a screen in which an object representing a person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis; and a step of searching for a video scene having a state estimation value corresponding to the position of the object.

The present invention is a program for causing a computer to execute: a step of acquiring a state estimation value indicating a mental state of a person, calculated using at least one of biological information and audio information of the person obtained at the time of shooting; and a step of generating, in accordance with the state estimation value, information for displaying a screen in which an object representing the person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis. According to the present invention, because a screen is displayed in which an object representing a person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis, content can be looked back on together with emotions. In addition, by converting the mental state into a distance and arranging the object accordingly, subjective information can be expressed as position information. That is, personal space can be visualized. Furthermore, because the mental state is turned into position information, it can be remembered as a pattern, which makes it easier to find later.

According to the present invention, it is possible to provide an information processing apparatus, an information processing method, and a program with which content can be looked back on together with the feelings of conference participants and the like.

The best mode for carrying out the present invention is described below. FIG. 1 is a block diagram of an information retrieval system 1 according to an embodiment of the present invention. As shown in FIG. 1, the information retrieval system 1 comprises a biological information detection unit 2, an audio information detection unit 3, a conference video detection unit 4, a biological information storage unit 5, an audio information storage unit 6, a conference video storage unit 7, a synchronization unit 8, a conference information storage unit 9, a state estimation processing unit 10, a state rank determination unit 11, an index file storage unit 12, a search request input unit 13, a search request recording unit 14, a search unit 15, a display control unit 16, and a display unit 17.

This information retrieval system 1 makes it possible to search conference video for a desired scene. The biological information detection unit 2 detects biological information of the conference viewers, such as eye-related information (blinks, pupil diameter, gaze target, gaze time) and facial skin temperature. Preferably, this biological information can be acquired without attaching any measuring device to the viewers.

The biological information detection unit 2 obtains information on blinks and pupil diameter, for example, by using a known technique to extract the face region from a face image of the viewer captured by a camera, locating the eye region, and then counting blinks or measuring the pupil diameter. The gaze target and gaze time can be obtained from images captured by cameras or the like placed on the side of each candidate gaze target: the eye region is located by the same technique, the gaze target is identified from the position of the camera that captured it, and the gaze time is determined from the time during which the eye region was captured. Facial skin temperature can be obtained with an infrared camera, thermography, or the like, without attaching any measuring device to the viewer.

The audio information detection unit 3 comprises, for example, a sound-collecting microphone, and detects the voices of the conference viewers and speakers. The conference video detection unit 4 comprises a video camera or the like and detects video information on the conference. A wide-angle camera capable of capturing the presentation materials used in the conference, the viewers, and so on may be used as the conference video detection unit 4.

The biological information storage unit 5 stores the biological information detected by the biological information detection unit 2 as a tabular data sheet. FIG. 2 shows a data sheet stored in the biological information storage unit 5. In FIG. 2, the pupil diameters x and y, gaze target, gaze time, blinks, and facial skin temperature recorded at each time t are the biological information. This biological information is stored for each viewer in association with the time t. The gaze target is determined by identifying the position of the camera, among those installed on the gaze-target side, that was able to capture the viewer's eye. The gaze time is recorded as the cumulative time calculated for each target.

The audio information storage unit 6 stores the audio and other sound detected by the audio information detection unit 3 as a tabular data sheet. FIG. 3 shows a data sheet stored in the audio information storage unit 6. In FIG. 3, the presence or absence of an utterance, the volume, the presence or absence of a speaker utterance, the speaker volume, and the environmental sound are the audio information. This audio information is stored for each viewer in association with the time t. The environmental sound in particular includes acoustic information that accompanies the video.

The conference video storage unit 7 stores the conference video detected by the conference video detection unit 4 as a data sheet in list form. The conference video stored in the conference video storage unit 7 is described next. FIG. 4 shows a data sheet stored in the conference video storage unit 7. As shown in FIG. 4, each piece of conference video is stored in association with an identifier (ID).

The synchronization unit 8 synchronizes the biological information of each person and the audio information from the time of shooting with the video information. The conference information storage unit 9 comprises a recording device such as a semiconductor memory, a hard disk, or a flexible disk, and holds the information synchronized by the synchronization unit 8 as a conference index file. FIG. 5 shows a data sheet stored in the conference information storage unit 9. As shown in FIG. 5, the conference information includes biological information such as the time, pupil diameter, gaze target, gaze time, blinks, and facial skin temperature; audio information such as the presence or absence of an utterance, the volume, the presence or absence of a speaker utterance, the speaker volume, and the environmental sound; and the conference video information data ID. The conference information is associated with the time for each viewer.
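
The synchronization step described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the row layout, the field names (`viewer`, `t`, `video_id`), the `ConferenceRecord` type, and the rule of matching rows by exactly equal timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConferenceRecord:
    """One synchronized row: a viewer's measurements at time t plus the video ID."""
    viewer: str
    t: int
    bio: dict       # e.g. pupil diameter, blink count, facial skin temperature
    audio: dict     # e.g. utterance flag, volume, environmental sound
    video_id: str


def synchronize(bio_rows, audio_rows, video_rows):
    """Join biological and audio rows with video segments on (viewer, t).

    Assumption: all three sources share one clock and the same sampling
    times, so rows are matched by exact equality of t.
    """
    audio_by_key = {(r["viewer"], r["t"]): r for r in audio_rows}
    video_by_t = {r["t"]: r["video_id"] for r in video_rows}
    records = []
    for b in bio_rows:
        key = (b["viewer"], b["t"])
        if key not in audio_by_key or b["t"] not in video_by_t:
            continue  # skip times for which some source has no sample
        a = audio_by_key[key]
        records.append(ConferenceRecord(
            viewer=b["viewer"],
            t=b["t"],
            bio={k: v for k, v in b.items() if k not in ("viewer", "t")},
            audio={k: v for k, v in a.items() if k not in ("viewer", "t")},
            video_id=video_by_t[b["t"]],
        ))
    return records
```

A real system would more likely align streams sampled at different rates by nearest timestamp or by interval; exact matching is used here only to keep the sketch short.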

The state estimation processing unit 10 obtains a state estimation value using a predetermined evaluation function on the basis of the biological information and audio information of a person, and stores the obtained state estimation value in the index file storage unit 12 in association with the scene ID of the conference video. In this way, the state estimation processing unit 10 assigns an index to the conference video. The state estimation value indicates the mental state of the person and is calculated using at least one of the biological information and audio information of the person obtained at the time of shooting.

The mental state of a conference viewer includes, for example, the viewer's cognitive state and mental information. Here, the mental state of a person includes the person's degree of interest, excitement, comfort, understanding, memory, support, shared feeling, subjectivity, objectivity, dislike, fatigue, and the like. The evaluation function weights and adds the biological information of the viewer and the audio information obtained while the viewer is being filmed. The function of the state estimation processing unit 10 is realized by executing a predetermined program.

The state rank determination unit 11 ranks the viewer's states on the basis of the state estimation values calculated by the state estimation processing unit 10 and reduces the number of estimation results from the state estimation processing unit 10. Limiting the scenes to a number suitable as landmarks in this way prevents the candidates from becoming so numerous that search performance deteriorates markedly.
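
One simple way to realize such a reduction is to keep only the highest-ranked scenes for a given state and flag the rest as deleted. The following Python sketch assumes this top-N rule, along with hypothetical field names (`video_id`, `interest`) and unique video IDs; the text does not specify the actual ranking criterion.

```python
def mark_low_ranked(index_rows, state, keep):
    """Flag all but the `keep` highest-valued rows for `state` as deleted.

    Assumption: a plain top-N cut on one state estimation value. Rows are
    returned in their original (time) order with an added `deleted` flag.
    """
    ranked = sorted(index_rows, key=lambda r: r[state], reverse=True)
    kept_ids = {r["video_id"] for r in ranked[:keep]}
    return [dict(r, deleted=r["video_id"] not in kept_ids) for r in index_rows]


rows = [
    {"video_id": "0001", "interest": 0.2},
    {"video_id": "0002", "interest": 0.9},
    {"video_id": "0003", "interest": 0.5},
]
reduced = mark_low_ranked(rows, "interest", keep=2)
```

Keeping the flagged rows instead of dropping them mirrors the idea of marking deleted scenes in the index while leaving the timeline intact.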

The index file storage unit 12 stores, as an index file, the state estimation values of each person in association with the time and the conference video ID. The index file is stored, for example, as a tabular data sheet for each viewer.

FIG. 6 shows a data sheet stored in the index file storage unit 12. As shown in FIG. 6, the columns are, from the left, the time t, the conference video information ID, and the items of the viewer's mental state, such as the degree of interest, excitement, comfort, understanding, and memory. The time t column holds the time corresponding to each piece of conference video information. The conference video information ID column holds the number that identifies the conference video information. Each mental state column holds the state estimation value obtained by the evaluation function. Scenes marked with an asterisk (*) in the figure have been deleted by the state rank determination unit 11, which reduces the number of scenes to be searched. Although the figure shows the degree of interest, excitement, comfort, understanding, and memory as the viewer's mental states, the viewer's mental states are not limited to these.

The search request input unit 13 comprises, for example, a touch panel, a mouse, or a keyboard, and the user enters a specific search request by operating it. Through the search request input unit 13 the user can, for example, specify a particular viewer and then specify a mental state of that viewer, such as the state of being interested. The search request recording unit 14 comprises a recording device such as a semiconductor memory, a hard disk, or a flexible disk, and records the search conditions entered through the search request input unit 13.

On the basis of the search request entered through the search request input unit 13, the search unit 15 searches the index file storage unit 12 for a video scene having a state estimation value corresponding to the position of the object in the coordinate space (that is, video corresponding to the position of the object). The display control unit 16 acquires the conference video data ID and the state estimation values from the index file storage unit 12, converts the state estimation values into position information, and displays a screen in which an object representing the person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis. At this time, the display control unit 16 refers to the conference video data ID to acquire the video information from the conference video storage unit 7, and causes the display unit 17 to display the video in association with the object.
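
As an illustration of searching by object position, the sketch below treats two state estimation values as x and y coordinates and returns the scene whose position is closest to where the object was moved. The choice of axes, the Euclidean nearest-neighbour rule, and the field names are assumptions; the text states only that a scene with a state estimation value corresponding to the object position is retrieved.

```python
import math


def to_position(row, x_state, y_state):
    """Convert two state estimation values of an index row into (x, y)."""
    return (row[x_state], row[y_state])


def search_scene(index_rows, target_xy, x_state, y_state):
    """Return the index row whose position is nearest to target_xy.

    Assumption: the match is the nearest neighbour in Euclidean distance.
    """
    return min(
        index_rows,
        key=lambda r: math.dist(to_position(r, x_state, y_state), target_xy),
    )


index_rows = [
    {"video_id": "0001", "interest": 0.2, "excitement": 0.1},
    {"video_id": "0002", "interest": 0.9, "excitement": 0.3},
    {"video_id": "0003", "interest": 0.5, "excitement": 0.8},
]
# The user drags the object to high interest, low excitement.
hit = search_scene(index_rows, (0.8, 0.2), "interest", "excitement")
```

Here `hit["video_id"]` is `"0002"`, the scene whose interest and excitement values lie closest to the requested position.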

Next, the operation of the state estimation processing unit 10 will be described. According to the psychology of eye movement (the experimental psychology of eye movement, the psychology of blinking, and the psychology of pupil movement), interest is considered to be related to pupil diameter, understanding and memory to blinking, and excitement and comfort to blinking and facial skin temperature. Taken individually, however, these measures cannot maintain accuracy; for example, facial skin temperature also rises with the room temperature. The states are therefore identified by an evaluation function that weights and sums these data together with speech volume, environmental sound, and the like.

The evaluation functions used in the state estimation processing unit 10 are shown below. They yield state estimation values for a person's degree of interest, excitement, comfort, understanding, and memory. State estimation values for a person's degree of support, sense of sharing, subjectivity, objectivity, aversion, and fatigue can be obtained in the same way.

・Degree of interest f1 = w11 * pupil diameter (amount of change, rate of change) + w12 * gaze (duration, count) + w13 * blink (rate, count, number of burst blinks) + w14 * change in facial skin temperature + w15 * speech volume + w16 * speaker's speech volume + w16 * environmental sound ... (1)

・Degree of excitement f2 = w21 * pupil diameter (amount of change, rate of change) + w22 * gaze (duration, count) + w23 * blink (rate, count, number of burst blinks) + w24 * change in facial skin temperature + w25 * speech volume + w26 * speaker's speech volume + w26 * environmental sound ... (2)

・Degree of comfort f3 = w31 * pupil diameter (amount of change, rate of change) + w32 * gaze (duration, count) + w33 * blink (rate, count, number of burst blinks) + w34 * change in facial skin temperature + w35 * speech volume + w36 * speaker's speech volume + w36 * environmental sound ... (3)

・Degree of understanding f4 = w41 * pupil diameter (amount of change, rate of change) + w42 * gaze (duration, count) + w43 * blink (rate, count, number of burst blinks) + w44 * change in facial skin temperature + w45 * speech volume + w46 * speaker's speech volume + w46 * environmental sound ... (4)

・Degree of memory f5 = w51 * pupil diameter (amount of change, rate of change) + w52 * gaze (duration, count) + w53 * blink (rate, count, number of burst blinks) + w54 * change in facial skin temperature + w55 * speech volume + w56 * speaker's speech volume + w56 * environmental sound ... (5)

Next, the evaluation functions will be described. With the degree of interest f1 of equation (1), portions of the video in which the change in pupil diameter is large, the gaze time is long, the number of blinks is small, the change in facial skin temperature is large, and the speech volume is high can be identified as scenes of interest. Accordingly, the degree of interest f1 of equation (1) is calculated as a state estimation value by weighting the pupil diameter (amount of change, rate of change), gaze (duration, count), blink (rate, count, number of burst blinks), change in facial skin temperature, speech volume, speaker's speech volume, environmental sound, and so on with the weighting coefficients w11 to w1n, respectively, and summing them.

The degree of excitement f2 of equation (2) is calculated as a state estimation value by weighting the pupil diameter (amount of change, rate of change), gaze (duration, count), blink (rate, count, number of burst blinks), change in facial skin temperature, speech volume, speaker's speech volume, environmental sound, and so on with the weighting coefficients w21 to w2n, respectively, and summing them. The degree of comfort f3 of equation (3) is calculated in the same way from the same quantities, weighted with the coefficients w31 to w3n, respectively.

For the degree of understanding of equation (4), blinking is considered to increase when the content is hard to understand (psychology of blinking). In general, when understanding is difficult, viewers are thought to seek supplementary information, so gaze time tends to increase; portions of the video with long gaze time and many blinks are therefore identified as scenes that are difficult to understand. Accordingly, the degree of understanding f4 of equation (4) is calculated as a state estimation value by weighting the pupil diameter (amount of change, rate of change), gaze (duration, count), blink (rate, count, number of burst blinks), change in facial skin temperature, speech volume, speaker's speech volume, environmental sound, and so on with the weighting coefficients w41 to w4n, respectively, and summing them.

Likewise, the degree of memory f5 of equation (5) is calculated as a state estimation value by weighting the pupil diameter (amount of change, rate of change), gaze (duration, count), blink (rate, count, number of burst blinks), change in facial skin temperature, speech volume, speaker's speech volume, environmental sound, and so on with the weighting coefficients w51 to w5n, respectively, and summing them.
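As a rough illustration of how an evaluation function of this weighted-sum form might be computed, consider the following Python sketch. The feature names, the reduction of each measurement to a single normalized scalar, and all weight values are assumptions made for illustration; the specification gives no numerical weights.

```python
from typing import Mapping

# Illustrative labels for the measurements appearing in equations (1) to (5).
FEATURES = (
    "pupil_diameter", "gaze", "blink", "skin_temp_change",
    "speech_volume", "speaker_volume", "ambient_sound",
)

def state_estimate(features: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of the measurements; a missing weight counts as zero."""
    return sum(weights.get(name, 0.0) * features[name] for name in FEATURES)

# Hypothetical weights for the degree of interest: larger pupil change,
# longer gaze, and fewer blinks raise the estimate.
INTEREST_WEIGHTS = {
    "pupil_diameter": 0.5, "gaze": 0.25, "blink": -0.25,
    "skin_temp_change": 0.25, "speech_volume": 0.25,
    "speaker_volume": 0.0, "ambient_sound": 0.0,
}

# Hypothetical, already normalized measurements for one scene.
sample = {
    "pupil_diameter": 1.0, "gaze": 2.0, "blink": 1.0,
    "skin_temp_change": 0.0, "speech_volume": 2.0,
    "speaker_volume": 1.0, "ambient_sound": 1.0,
}

interest = state_estimate(sample, INTEREST_WEIGHTS)
```

Each of the five states would use its own weight table with the same function, which is why the weights are passed in rather than hard-coded.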

FIG. 7 is a diagram for explaining the detection and recording of biometric information, audio information, and conference video information. Reference numeral 200 denotes a conference room. In the example shown in FIG. 7, the biometric information detection unit 2 described above consists of infrared cameras 211 and 212. Thus, for example, when viewer A gazes at the slide display unit 205, the infrared camera 211 captures an image of the eye; when viewer A gazes at the speaker 206, the eye is imaged by the infrared camera 212. The audio information detection unit 3 described above consists of, for example, a sound pressure sensor and a directional microphone 202, which detect, for example, viewer A's utterances. The conference video information detection unit 4 consists of a conference video detection camera 203.

FIG. 8 is a flowchart of the index assignment process. In step S101, the state estimation processing unit 10 reads the conference video index file from the conference information storage unit 9. In step S102, the state estimation processing unit 10 reads the parameters of the conference video index file for each scene ID, calculates the person's state estimation values using the evaluation functions, and stores them in the index file storage unit 12. In step S103, the state rank determination unit 11 ranks the viewer's states based on the state estimation values calculated by the state estimation processing unit 10 and reduces the number of estimation results. Limiting the scenes by this ranking to a number suitable for use as landmarks prevents the candidates from becoming so numerous that search performance degrades significantly. In this way, an index corresponding to the person's mental state can be assigned to the video information. The state rank determination of step S103 may be performed optionally.
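The ranking of step S103 can be pictured with a short Python sketch. This is only one plausible reading, assuming that ranking is done per viewer and per state and that the highest estimates are kept; the `IndexEntry` record and `select_landmarks` function are hypothetical names, not structures defined in the specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    scene_id: int
    viewer: str
    state: str    # e.g. "interest"
    value: float  # state estimation value for this scene

def select_landmarks(entries, top_n=5):
    """Keep, for each (viewer, state) pair, the top_n scenes by estimate."""
    groups = {}
    for entry in entries:
        groups.setdefault((entry.viewer, entry.state), []).append(entry)
    kept = []
    for key in sorted(groups):
        ranked = sorted(groups[key], key=lambda e: e.value, reverse=True)
        kept.extend(ranked[:top_n])
    return kept

# Hypothetical estimates for viewer A over four scenes.
entries = [
    IndexEntry(i, "A", "interest", v)
    for i, v in enumerate([0.2, 0.9, 0.5, 0.7])
]
landmarks = select_landmarks(entries, top_n=2)
```

With `top_n=2`, only the two scenes with the highest interest estimates remain as landmark candidates, which keeps the later search space small.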

Next, the display process will be described. FIG. 9 is a flowchart of the display process. The user operates the search request input unit 13 to enter the degree of interest and the degree of understanding as the viewer's mental states to search for. In step S201, the state rank determination unit 11 consults the landmark selection criteria based on the evaluation functions described above and refers to, for example, the top five results. In step S202, the conference video ID is acquired by referring to the index file of FIG. 6 based on the information from step S201.

In step S203, the display control unit 16 selects the coordinate space. That is, the display control unit 16 creates the coordinate space of FIG. 10 based on the request information obtained through the search request input unit 13, for example a request to search by degree of interest and degree of understanding. In step S204, the display control unit 16 refers to each evaluation function for the selected scene ID of each conference participant and obtains the values of the degree of interest and the degree of understanding. In step S205, the display control unit 16 places the objects in the coordinate space of FIG. 10 based on the values in FIG. 6.
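The placement in step S205 amounts to converting two state estimation values into a screen position. A minimal Python sketch follows, assuming estimates normalized to the range 0 to 1, understanding on the horizontal axis, interest on the vertical axis, and a pixel origin at the top left; the axis assignment, screen size, and the `to_screen` name are all assumptions.

```python
def to_screen(understanding, interest, width=400, height=300, lo=0.0, hi=1.0):
    """Map two state estimates in [lo, hi] to pixel coordinates.

    x grows with understanding (left to right); y is flipped so that
    higher interest is drawn nearer the top of the screen.
    """
    def norm(value):
        # Clamp to [0, 1] so out-of-range estimates stay on screen.
        return min(max((value - lo) / (hi - lo), 0.0), 1.0)

    x = round(norm(understanding) * width)
    y = round((1.0 - norm(interest)) * height)
    return x, y

# Hypothetical (understanding, interest) estimates for three viewers.
estimates = {"A": (0.25, 0.75), "B": (0.0, 0.0), "C": (1.0, 1.0)}
positions = {name: to_screen(u, i) for name, (u, i) in estimates.items()}
```

Under these assumptions a viewer with low understanding and low interest lands at the lower left and one with high values at the upper right.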

FIG. 10 shows a user interface in this embodiment. Reference numeral 21 denotes the conference video timeline, 22 the conference video, and 23 the search result screen. The user operates the search request input unit 13 to enter the degree of interest and the degree of understanding as the viewer's mental states to search for. Based on these search conditions, the search unit 15 acquires the state estimation values for the degree of interest and the degree of understanding, together with the corresponding conference video information ID, from the index file storage unit 12 and sends them to the display control unit 16. According to the state estimation values, the display control unit 16 displays the search result screen 23, in which objects (avatars) A to D representing the viewers (persons) are arranged in a coordinate plane whose axes are the persons' mental states (degree of understanding, degree of interest). As a result, a person with low understanding and low interest is placed at the lower left and a person with high understanding and high interest at the upper right, so that subjective states can be grasped at a glance. An avatar is a figure icon that schematically represents an individual placed in FIG. 10; it may instead be, for example, an image of the actual person or a shape such as a circle or a square.

The search result screen 23 shows that viewer A is interested but does not understand, viewer B is neither interested nor understands, viewer C is both interested and understands, and viewer D understands but is not interested. By displaying a screen in which objects representing persons are arranged in a coordinate plane having a person's mental state as at least one coordinate axis, the content of the conference can be reviewed together with the viewers' feelings. In addition, by converting mental states into distances for placement, subjective information can be expressed as position information. In other words, personal space can be visualized. The time period in which a mental state changed also becomes a cue for searching.

FIG. 11 shows another example of the user interface in this embodiment. Reference numeral 21 denotes the conference video timeline, 22 the conference video, and 24 the search result screen. The user operates the search request input unit 13 to enter the degree of concentration and the degree of comfort as the viewer's mental states to search for. Based on these search conditions, the search unit 15 acquires the state estimation values for the degree of concentration and the degree of comfort, together with the corresponding conference video information ID, from the index file storage unit 12 and sends them to the display control unit 16. According to the state estimation values, the display control unit 16 displays the search result screen 24, in which objects (avatars) A to D representing the viewers (persons) are arranged in a coordinate plane whose axes are the persons' mental states (degree of concentration, degree of comfort).

The search result screen 24 shows that viewer A has a low degree of concentration, viewer B is neither comfortable nor concentrating, viewer C is comfortable and concentrating, and viewer D is concentrating but not comfortable. By displaying a screen in which objects representing persons are arranged in a coordinate plane having a person's mental state as at least one coordinate axis, the content of the conference can be reviewed together with the viewers' feelings. In addition, by converting mental states into distances for placement, subjective information can be expressed as position information. In other words, personal space can be visualized.

FIG. 12 shows another example of the user interface in this embodiment. Reference numeral 21 denotes the conference video timeline, 22 the conference video, and 25 the search result screen. The coordinates of a viewer X are given as (degree of understanding, degree of interest, degree of comfort). The user operates the search request input unit 13 to enter the degree of understanding, the degree of interest, and the degree of comfort as the viewer's mental states to search for. Based on these search conditions, the search unit 15 acquires the state estimation values for the degree of understanding, the degree of interest, and the degree of comfort, together with the corresponding conference video information ID, from the index file storage unit 12 and sends them to the display control unit 16. According to the state estimation values, the display control unit 16 displays the search result screen 25, in which objects (avatars) A to D representing the viewers (persons) are arranged in a coordinate space whose axes are the persons' mental states (degree of understanding, degree of interest, degree of comfort). The coordinate space may also use time as one of its axes.

The search result screen 25 shows that viewer A is interested and comfortable but does not understand. Viewer B is not interested, does not understand, and is not comfortable. Viewer C is interested, understands, and is comfortable. Viewer D is interested and understands but is not comfortable. By displaying a screen in which objects representing persons are arranged in a coordinate space having a person's mental state as at least one coordinate axis, the content of the conference can be reviewed together with the viewers' feelings. In addition, by converting mental states into distances for placement, subjective information can be expressed as position information. In other words, personal space can be visualized.

FIG. 13 shows another example of the user interface in this embodiment. Reference numeral 21 denotes the conference video timeline, 22 the conference video, and 26 the search result screen. The user operates the search request input unit 13 to enter the degree of interest as the viewer's mental state to search for. Based on this search condition, the search unit 15 acquires the state estimation values for the degree of interest, together with the corresponding conference video information ID, from the index file storage unit 12 and sends them to the display control unit 16. According to the state estimation values, the display control unit 16 displays the search result screen 26, in which objects (avatars) A and B representing the viewers (persons) are arranged in a coordinate space having the person's mental state (degree of interest) as a coordinate axis. The coordinate plane uses time as one of its axes.

The search result screen 26 shows that viewer A's degree of interest is highest at (2) and lower thereafter. Viewer B's degree of interest was low at first, rose slightly at (2), then fell, and rose sharply at (5). Displaying the transitions of the objects in this way makes changes in mental state easy to grasp.
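Such a transition could be summarized numerically as well as visually. The Python sketch below is an illustration only: the interest values are invented to resemble the pattern described for viewer B, and `peak_and_changes` is a hypothetical helper, not part of the described system.

```python
def peak_and_changes(series):
    """Return the 1-based time point of the maximum and the step-to-step changes."""
    peak = max(range(len(series)), key=series.__getitem__) + 1
    changes = [later - earlier for earlier, later in zip(series, series[1:])]
    return peak, changes

# Hypothetical interest values for viewer B at time points (1) to (5):
# low at first, slightly higher at (2), lower again, then a sharp rise at (5).
viewer_b = [1, 2, 1, 1, 5]
peak, changes = peak_and_changes(viewer_b)
```

Large positive entries in `changes` mark the time periods in which the mental state shifted, which is the kind of cue the transition display is meant to expose.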

FIG. 14 shows an example of presenting search results as thumbnails. Reference numeral 21 denotes the conference video timeline, 22 the conference video, and 271 to 274 thumbnails of search result screens. The user operates the search request input unit 13 to enter the degree of interest and the degree of understanding as the viewer's mental states to search for. Based on these search conditions, the search unit 15 acquires the state estimation values for the degree of interest and the degree of understanding, together with the corresponding conference video information ID, from the index file storage unit 12 and sends them to the display control unit 16. The display control unit 16 displays the thumbnails 271 to 274 of search result screens in which objects A to D representing the viewers are arranged in a coordinate plane having a person's mental state (degree of understanding, degree of interest) as at least one coordinate axis. A coordinate space may be used instead of the coordinate plane.

Thumbnail 272 shows that viewer A's degree of understanding has risen, thumbnail 273 shows that viewer D's degree of interest has risen, and thumbnail 274 shows that viewer A's degree of understanding and viewer D's degree of interest have both fallen. Displaying the search results as thumbnails in this way allows mental states to be checked at a glance, and the points of interest are immediately apparent. Moreover, because mental states are expressed as position information, they can be remembered as patterns. This makes later searching easier, since the time period in which a person moved (that is, in which the mental state changed) serves as a search cue. When a thumbnail is clicked, the display control unit 16 follows its link to the original video and displays that portion of the video.

FIG. 15 is a diagram for explaining the search user interface. Reference numeral 21 denotes the conference video timeline, 28 the search user interface, 291 and 292 thumbnails of search results, and 30 the conference video. As the search user interface 28, the display control unit 16 displays objects representing persons in a participant area 281 and displays a screen 282 in which they can be arranged in a coordinate plane having a person's mental state as at least one coordinate axis. A coordinate space may be used instead of a coordinate plane. The user selects any viewer in the participant area 281 and moves that viewer into the coordinate space 282. The figure shows an example in which the object representing viewer B has been placed at a position of low interest and low understanding.

The display control unit 16 sends the position information of viewer B's object in the coordinate space 282 to the search unit 15. The search unit 15 searches the index file storage unit 12 for video scene IDs whose state estimation values correspond to the position of viewer B's object in the coordinate plane, and sends them to the display control unit 16. The display control unit 16 acquires the video scene matching the video scene ID from the conference video storage unit 7 and displays the conference video scene. The display control unit 16 also displays the thumbnails 291 and 292 of the search results matching the video scene 30. By treating a participant's object (avatar) as a search key and placing it in a two-dimensional virtual conference in this way, the user can intuitively find the portions with the desired mental state. Moving an object also expresses a change in mental state and gives quick access to the original video for that state. The search results are not presented only after all conference participants have been placed; candidate videos are displayed as thumbnails while placement is still in progress.
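One way to realize this kind of position-based search is a nearest-neighbor lookup over the stored state estimation values. The Python sketch below assumes a two-dimensional (understanding, interest) plane, Euclidean distance, and a fixed match radius; the index layout, the radius, and the `search_scenes` name are assumptions, since the specification does not say how a position is matched to scenes.

```python
import math

def search_scenes(index, viewer, target, radius=0.2):
    """Return scene IDs whose (understanding, interest) estimate for `viewer`
    lies within `radius` of `target`, nearest first."""
    hits = []
    for scene_id, per_viewer in index.items():
        if viewer not in per_viewer:
            continue
        distance = math.dist(per_viewer[viewer], target)
        if distance <= radius:
            hits.append((distance, scene_id))
    return [scene_id for _, scene_id in sorted(hits)]

# Hypothetical index: scene ID -> viewer -> (understanding, interest).
index = {
    101: {"B": (0.1, 0.2)},
    102: {"B": (0.8, 0.9)},
    103: {"B": (0.25, 0.1), "A": (0.9, 0.9)},
}

# Viewer B's avatar dropped near the low-understanding, low-interest corner.
result = search_scenes(index, "B", target=(0.1, 0.1))
```

Because the lookup is cheap, it could be rerun each time an avatar is moved, which matches the description of candidates appearing while placement is still in progress.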

FIG. 16 is a diagram for explaining an example in which the present invention is applied to a remote conference. Reference numeral 21 denotes the conference video timeline, 30 the conference video, and 311 and 312 thumbnails of search results. The user operates the search request input unit 13 to enter, as search conditions, the degree of understanding and the degree of interest as the viewer's mental states, and, as conference participants, participants A and B attending from remote location X and participants C and D attending from remote location Y. Based on these search conditions, the search unit 15 acquires the state estimation values for the degree of understanding and the degree of interest, together with the corresponding conference video information ID, from the index file storage unit 12 and sends them to the display control unit 16. The display control unit 16 displays the thumbnails 311 and 312 of search result screens in which objects A to D representing the participants at remote locations X and Y are arranged in a coordinate plane having a person's mental state (degree of understanding, degree of interest) as at least one coordinate axis.

People sometimes express their psychological state through their distance from others (social psychology: personal space). In a remote conference, however, the physical distance is large, so this cannot be expressed. With conventional technology, therefore, the psychological states of conference participants A to D could not be expressed in a remote conference between remote location X and remote location Y. Applying the present invention to a virtual conference room and arranging the objects with their psychological states taken into account makes it possible to express psychological states even in a remote conference.

As described above, a person's psychological state is identified using biometric information, converted into a distance, and placed in a virtual conference space. The position information of an avatar is also used as a search key for searching by subjective information. Thus, for example, by displaying the virtual conference space in the two dimensions of understanding and interest and placing objects at these coordinates, a person with low understanding and low interest is placed at the lower left and a person with high understanding and high interest at the upper right, so that subjective states can be grasped at a glance. Subjective information during the conference is expressed as position information, which supports pattern recognition and can serve as a landmark for memory. Subjective information during the conference can also be expressed as position information so that personal space is visualized and presented.

The information retrieval system 1 is implemented using a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), a hard disk device, a display device, and the like. The information processing method according to the present invention is implemented by the information retrieval system 1: a program is installed from a hard disk device or a portable storage medium such as a CD-ROM, a DVD, or a flexible disk, or downloaded over a communication line, and each step is carried out by the CPU executing this program.

The program causes a computer to execute a step of acquiring a state estimation value that indicates a person's mental state and is calculated using at least one of the person's biometric information and audio information obtained at the time of shooting, and a step of displaying, according to the state estimation value, a screen in which an object representing the person is arranged in a coordinate space having the person's mental state as at least one coordinate axis. The information retrieval system 1 corresponds to the information processing apparatus, the index file storage unit 12 to the storage means, the display control unit 16 to the display means, the search unit 15 to the search means, and the state estimation processing unit 10 to the estimation means.

Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims. For example, the above embodiments were described using conference video, but the video of the present invention is not limited to conference video.

A block diagram of the information retrieval system in an embodiment of the present invention.
A diagram showing the data sheet stored in the biological information storage unit.
A diagram showing the data sheet stored in the audio information storage unit.
A diagram showing the data sheet stored in the conference video storage unit.
A diagram showing the data sheet stored in the conference information storage unit.
A diagram showing the data sheet stored by the index file storage unit.
A diagram for explaining the detection and recording of biological information, audio information, and conference video information.
An operation flowchart of the indexing process.
An operation flowchart of the display process.
A diagram showing the user interface in this embodiment.
A diagram showing another example of the user interface in this embodiment.
A diagram showing another example of the user interface in this embodiment.
A diagram showing another example of the user interface in this embodiment.
A diagram showing an example of displaying search results as thumbnails.
A diagram for explaining the search user interface.
A diagram for explaining an example in which the present invention is applied to a remote conference.

Explanation of symbols

1 Information retrieval system
2 Biological information detection unit
3 Audio information detection unit
4 Conference video detection unit
8 Synchronization unit
9 Conference information storage unit
10 State estimation processing unit
11 State rank determination unit
12 Index file storage unit
13 Search request input unit
15 Search unit
16 Display control unit
17 Display unit

Claims

An information processing apparatus comprising:
storage means for storing the mental state of a person, estimated using at least one of the biological information and the audio information of the person obtained at the time of shooting, in association with the video information obtained at the time of shooting; and
display means for displaying a screen in which an object representing the person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis.

The information processing apparatus according to claim 1, further comprising search means for searching the storage means for a video according to the position of the object in the coordinate space,
wherein the display means displays the video retrieved by the search means.

The information processing apparatus according to claim 1, wherein the display unit displays a thumbnail of a screen on which the object is arranged in a coordinate space having the mental state of the person as at least one coordinate axis.

The information processing apparatus according to claim 3, wherein the display unit displays an image corresponding to the selected thumbnail.

The information processing apparatus according to claim 1, wherein the display unit displays a transition of the object.

The information processing apparatus according to claim 1, wherein the coordinate space has a time axis as one coordinate axis.

The information processing apparatus according to any one of claims 1 to 6, further comprising estimation means for obtaining a state estimation value representing the degree of the mental state of the person, using a predetermined evaluation function based on the biological information or the audio information of the person obtained at the time of shooting.

An information processing apparatus comprising:
storage means for storing the mental state of a person, estimated using at least one of the biological information and the audio information of the person obtained at the time of shooting, in association with the video information obtained at the time of shooting;
display means for displaying an object representing the person and a coordinate space having the mental state of the person as at least one coordinate axis; and
search means for searching the storage means for a video corresponding to the position of the object in the coordinate space.

The information processing apparatus according to claim 1, wherein the mental state of the person includes at least one of the person's degree of interest, degree of excitement, degree of comfort, degree of understanding, degree of memory, degree of support, degree of shared feeling, degree of subjectivity, degree of objectivity, degree of disgust, and degree of fatigue.

An information processing method comprising:
obtaining a state estimation value indicating the mental state of a person, calculated using at least one of the biological information and the audio information of the person obtained at the time of shooting; and
displaying, according to the state estimation value, a screen in which an object representing the person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis.

The information processing method according to claim 10, further comprising: displaying a thumbnail of a screen on which an object representing the person is arranged in a coordinate space in which the mental state of the person is at least one coordinate axis.

An information processing method comprising:
displaying a screen in which an object representing a person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis; and
searching for a video scene having a state estimation value corresponding to the position of the object.

A program for causing a computer to execute:
a step of obtaining a state estimation value indicating the mental state of a person, calculated using at least one of the biological information and the audio information of the person obtained at the time of shooting; and
a step of generating, according to the state estimation value, information for displaying a screen in which an object representing the person is arranged in a coordinate space having the mental state of the person as at least one coordinate axis.