JP2006185187A

JP2006185187A - Information processor, information processing method, and program

Info

Publication number: JP2006185187A
Application number: JP2004378220A
Authority: JP
Inventors: Hideto Yuzawa; 秀人湯澤
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2004-12-27
Filing date: 2004-12-27
Publication date: 2006-07-13

Abstract

PROBLEM TO BE SOLVED: To provide an information processor that lets a user see or hear scenes that the user missed seeing or hearing. SOLUTION: An information retrieval system 1 is provided with: a gaze object determination part 10 for determining the object at which a prescribed person gazes, on the basis of biological information of persons synchronized with video information obtained by photographing a prescribed space; a sound pressure object determination part 11 for determining a sound pressure object on the basis of sound information synchronized with the video information; an indexing part 12 for indexing missed scenes within the video information on the basis of the person's gaze object and the sound pressure object; a retrieval part 16 for referring to the indexes given by the indexing part 12 to retrieve missed scenes; and a display control part 17 for displaying the missed scenes retrieved by the retrieval part 16. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

Meetings are routinely held almost every day, often every hour. As a result, the conference data captured and stored for each meeting is enormous, and its volume grows daily. When a user wants to reuse conference data, for example to look back on decisions made at a meeting, identifying the relevant meeting and then locating the desired portion (scene) within its data is laborious, and may be difficult or impossible.

Decisions made at meetings can also be reviewed through the minutes issued afterward. However, minutes do not record the detailed process that led to each decision, so the scenes a user wants to see cannot be reviewed efficiently. A user may also want to recall later things that never appear in the minutes, such as utterances or materials that seemed important to that user. The following conventional technologies have been proposed to support such review.

The device described in Patent Document 1 presents the user with scenes (still images) extracted from a moving image at fixed time intervals as representative images. The user infers the temporal context from these representative images to reach the desired scene.

When a meeting attendee leaves partway through, the device described in Patent Document 2 converts the utterances from the missed portion into text and provides them as minutes.

Patent Document 1: JP 2002-262240 A
Patent Document 2: JP 2002-99530 A

However, during meetings and seminars, attendees frequently miss seeing or hearing scenes while they write notes or take mobile phone calls. Moreover, recording an entire meeting or seminar produces an enormous amount of information, which markedly lowers searchability.

The technique of Patent Document 1 has poor searchability because the user must find a missed scene by browsing representative scenes extracted from the moving image at fixed intervals.

The technique of Patent Document 2 creates minutes of the missed portion when an attendee leaves partway through the meeting. However, scenes are missed in other situations as well, so these minutes do not let the user review every scene that was missed.

The present invention has been made in view of the above problems. Its object is to provide an information processing apparatus, an information processing method, and a program that let a user efficiently review scenes that the user missed seeing (missed-viewing scenes) or missed hearing (missed-listening scenes).

To solve the above problems, the present invention is an information processing apparatus comprising two means. Gaze target determination means determines the gaze target of a person based on biological information of the person synchronized with video information obtained by photographing a predetermined space. Assigning means assigns an index to the person's missed-viewing or missed-listening scenes in the video information based on the person's gaze target. Because these indexes follow the person's gaze target, the person can efficiently review the scenes that the person missed seeing or hearing.
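The patent does not specify how per-time gaze judgments become indexed scenes. The following Python sketch shows one possible approach. It assumes gaze data arrives as time-ordered (t, gaze_target) samples. Purely for illustration, it also assumes that a sample is flagged when its gaze target lies outside a hypothetical set `attended`. Each run of consecutive flagged samples becomes one indexed scene. `Scene` and `index_scenes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    scene_id: int  # index assigned to the scene
    start: float   # time of the first flagged sample
    end: float     # time of the last flagged sample


def index_scenes(samples, attended):
    """Group consecutive samples whose gaze target is not in `attended` into scenes.

    samples: list of (t, gaze_target) tuples sorted by t.
    attended: hypothetical set of gaze targets counted as "paying attention".
    """
    scenes = []
    start = last = None
    for t, target in samples:
        if target not in attended:
            if start is None:
                start = t  # open a new scene
            last = t       # extend the current scene
        elif start is not None:
            scenes.append(Scene(len(scenes) + 1, start, last))
            start = None   # close the scene when attention returns
    if start is not None:  # close a scene still open at the end of the data
        scenes.append(Scene(len(scenes) + 1, start, last))
    return scenes


samples = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 4), (5, 1)]
print(index_scenes(samples, attended={1, 2}))
```

This example uses the gaze-target numbering of FIG. 6 with `attended={1, 2}`. Looking at the notebook (3) and then elsewhere (4) merges into a single scene spanning t = 2 to 4.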

The present invention may further comprise sound pressure target determination means, which determines a sound pressure target based on audio information synchronized with the video information. In that case, the assigning means assigns the index to the person's missed-viewing or missed-listening scenes based on both the person's gaze target and the sound pressure target. Indexing on both lets the person review those scenes efficiently.

The present invention may further comprise sound pressure target determination means, which determines a sound pressure target based on audio information synchronized with the video information. The assigning means then excludes from the candidate missed-viewing or missed-listening scenes any scene in which the person's gaze target and the sound pressure target are the same. When the two coincide, the viewer is looking at the speaker, so the scene was not missed.

The present invention may further comprise sound pressure target determination means, which determines a sound pressure target based on audio information synchronized with the video information. The assigning means then extracts scenes in which the person's gaze target differs from the sound pressure target as candidate missed-viewing or missed-listening scenes. When the two differ, the viewer is not looking at the speaker. Treating such scenes as candidates therefore lets the person's missed scenes be found efficiently.
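The exclusion rule and the extraction rule above are complementary, so a single comparison covers both. The following is a minimal sketch. It assumes the gaze target and the sound pressure target share one set of identifiers, for instance the string "speaker" for the speaker 206 in both. `Label` and `classify` are hypothetical names.

```python
from enum import Enum


class Label(Enum):
    EXCLUDED = "excluded"    # gaze target == sound pressure target: viewer watches the source
    CANDIDATE = "candidate"  # gaze target != sound pressure target: possible missed scene


def classify(gaze_target, sound_target):
    """Compare what the person looks at with who is producing sound in one scene.

    Assumes gaze targets and sound pressure targets use the same identifiers.
    """
    return Label.EXCLUDED if gaze_target == sound_target else Label.CANDIDATE


# Viewer looks at the speaker while the speaker talks, then looks elsewhere.
print(classify("speaker", "speaker"))  # Label.EXCLUDED
print(classify("other", "speaker"))    # Label.CANDIDATE
```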

The present invention may also comprise sound pressure target determination means, which determines a sound pressure target based on audio information synchronized with the video information. The assigning means then extracts, as a candidate missed-listening scene for the person, a scene that meets two conditions: the person's gaze target differs from the sound pressure target, and the person is producing sound pressure (speaking). In such a scene the viewer has probably missed what was being said. Indexing it lets the person review missed-listening scenes efficiently.
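The missed-listening rule above can be sketched as a single predicate. This is an illustration, not the patent's implementation. It represents the active sound sources as a set, which is an assumption so that a viewer talking over the speaker is still handled. `is_missed_listening_candidate` is a hypothetical name.

```python
def is_missed_listening_candidate(gaze_target, sound_sources, person):
    """Return True if `person` is producing sound and is not looking at any active source.

    sound_sources: set of identifiers currently producing sound pressure. Using a set
    rather than a single target is an assumption that covers overlapping speech.
    """
    return person in sound_sources and gaze_target not in sound_sources


# FIG. 7 style cases for viewer "A"
print(is_missed_listening_candidate("other", {"A"}, "A"))           # True
print(is_missed_listening_candidate("notebook", {"speaker"}, "A"))  # False: A is silent
```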

The present invention is also an information processing apparatus comprising two means. Sound pressure target determination means determines a sound pressure target based on audio information synchronized with video information obtained by photographing a predetermined space. Assigning means assigns an index to a person's missed-viewing or missed-listening scenes in the video information based on the sound pressure target. Because these indexes follow the sound pressure target, the person can review those scenes efficiently.

The present invention may further comprise synchronization means for synchronizing the person's biological information with the video information, and synchronization means for synchronizing the audio information with the video information. It may also comprise search means, which searches for the person's missed-viewing or missed-listening scenes by referring to the index assigned by the assigning means. Display control means can then display the scenes found by the search means. Together these let the person review missed scenes efficiently.

The present invention is also an apparatus comprising storage means and display control means. The storage means stores an index assigned to a predetermined person's missed-viewing or missed-listening scenes in video information obtained by photographing a predetermined space. That index was derived from the person's gaze target, obtained from the person's biological information synchronized with the video information. The display control means refers to the stored index and displays the person's missed-viewing or missed-listening scenes. This lets the person review those scenes efficiently.

The present invention is also an information processing method comprising two steps. The first step determines the gaze target of a predetermined person based on the person's biological information, synchronized with video information obtained by photographing a predetermined space. The assigning step then assigns an index to missed-viewing or missed-listening scenes in the video information based on the person's gaze target. Because these indexes follow the person's gaze target, those scenes can be reviewed efficiently.

The method may further comprise a step of determining a sound pressure target based on audio information synchronized with the video information. In that case, the assigning step assigns the index to the person's missed-viewing or missed-listening scenes based on both the person's gaze target and the sound pressure target. This lets the person review those scenes efficiently.

The present invention is also an information processing method comprising two steps. The first step determines a sound pressure target based on audio information synchronized with video information obtained by photographing a predetermined space. The second step assigns an index to a person's missed-viewing or missed-listening scenes in the video information based on the sound pressure target. This lets the person review those scenes efficiently.

The method may further comprise a step of displaying the person's missed-viewing or missed-listening scenes with reference to the index, so that the person can review them efficiently.

The present invention is also a program that causes a computer to execute two steps. The first step determines the gaze target of a predetermined person based on the person's biological information, synchronized with video information obtained by photographing a predetermined space. The assigning step then assigns an index to the person's missed-viewing or missed-listening scenes in the video information based on the person's gaze target. Because these indexes follow the person's gaze target, the person can review those scenes efficiently.

The present invention is also a program that causes a computer to execute two steps. The first step determines a sound pressure target based on audio information synchronized with video information obtained by photographing a predetermined space. The second step assigns an index to missed-viewing or missed-listening scenes in the video information based on the sound pressure target. This lets a person review the scenes that the person missed efficiently.

ADVANTAGE OF THE INVENTION: The present invention provides an information processing apparatus, an information processing method, and a program that allow missed-viewing or missed-listening scenes to be reviewed efficiently.

The best mode for carrying out the present invention is described below. FIG. 1 is a block diagram of an information retrieval system 1 according to an embodiment of the present invention. As shown in FIG. 1, the information retrieval system 1 includes the following units: a biological information detection unit 2, an audio information detection unit 3, a conference video detection unit 4, a biological information storage unit 5, an audio information storage unit 6, a conference video storage unit 7, a synchronization unit 8, a conference information storage unit 9, a gaze target determination unit 10, a sound pressure target determination unit 11, an index assignment unit 12, an index file storage unit 13, a search request input unit 14, a search request recording unit 15, a search unit 16, a display control unit 17, and a display unit 18.

The information retrieval system 1 makes it possible to search conference video for scenes that a person missed seeing or hearing. The biological information detection unit 2 detects biological information of the conference viewers. This includes eye-related information, such as blinks, pupil diameter, gaze target, and gaze time, as well as facial skin temperature. Preferably, this biological information can be acquired without attaching any measuring device to the viewers.

The biological information detection unit 2 consists of a camera, an image processing device, and the like. It obtains blink and pupil-diameter information using known techniques. First it extracts the face region from an image of the viewer's face and locates the eye region. It then counts blinks or measures the pupil diameter. The gaze target and gaze time are obtained from images captured by cameras placed on the side of each candidate gaze target. The eye region is located by the same technique, and the gaze target is identified from the position of the camera that captured it. The gaze time is determined from the times at which the eye region was captured. Facial skin temperature can be obtained with an infrared camera, thermography, or the like, without attaching a measuring device to the viewer.
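The patent leaves blink counting to known techniques. As an illustration only, assume the image-processing stage already yields one boolean per frame (True means the eye is open). A blink can then be counted as each open-to-closed transition. `count_blinks` is a hypothetical name, and real detectors typically add debouncing that this sketch omits.

```python
def count_blinks(eye_open):
    """Count blinks as open -> closed transitions in a per-frame eye-state sequence.

    eye_open: list of booleans, one per video frame (True = eye open).
    """
    blinks = 0
    for prev, cur in zip(eye_open, eye_open[1:]):
        if prev and not cur:  # eye was open in the previous frame and is now closed
            blinks += 1
    return blinks


print(count_blinks([True, True, False, True, False, False, True]))  # 2
```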

The audio information detection unit 3 consists of, for example, a sound-collecting microphone and detects the voices of the conference viewers and speakers. The conference video detection unit 4 consists of a video camera or the like and captures the conference video. It may be a camera able to capture, at a wide angle, both the presentation materials used in the conference and the viewers. The biological information detection unit 2, the audio information detection unit 3, and the conference video detection unit 4 are installed at predetermined positions in the conference room. While the conference video detection unit 4 is recording, the biological information and audio information of the conference participants are detected.

The biological information storage unit 5 stores the biological information detected by the biological information detection unit 2 in the form of a data sheet. FIG. 2 shows a data sheet stored in the biological information storage unit 5. In FIG. 2, the biological information includes pupil diameter (x, y), gaze target, gaze time, blinks, and facial skin temperature. These values are stored for each viewer against time t. Time t corresponds to the conference video captured by the conference video detection unit 4. The gaze target is determined by identifying which camera on the gaze-target side was able to capture the viewer's eyes. Gaze time is recorded as a cumulative time calculated for each target. In the example of FIG. 2, viewer A's gaze target moves in the order 1, 2, 3, 1.
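FIG. 2 records gaze time as a running total per target. The following is a minimal sketch of that bookkeeping. It assumes each row is a (t, target) sample taken at a fixed interval `dt` seconds; the fixed sampling interval is an assumption. `cumulative_gaze_time` is a hypothetical name.

```python
from collections import defaultdict


def cumulative_gaze_time(rows, dt):
    """Return (t, target, cumulative seconds spent on that target up to t) for each row.

    rows: list of (t, gaze_target) samples in time order, one every `dt` seconds.
    """
    totals = defaultdict(float)  # gaze target -> accumulated seconds
    out = []
    for t, target in rows:
        totals[target] += dt
        out.append((t, target, totals[target]))
    return out


# Viewer A's gaze moves 1, 2, 3, 1 as in FIG. 2 (with two samples on target 1 at the start).
for row in cumulative_gaze_time([(0, 1), (1, 1), (2, 2), (3, 3), (4, 1)], dt=1.0):
    print(row)
```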

The audio information storage unit 6 stores the audio detected by the audio information detection unit 3 in the form of a data sheet. FIG. 3 shows a data sheet stored in the audio information storage unit 6. In FIG. 3, the audio information includes utterance presence/absence, volume, speaker utterance presence/absence, speaker volume, and environmental sound. This audio information is stored for each viewer against time t. The environmental sound includes, in particular, acoustic information produced along with the video. In the example of FIG. 3, viewer A produces sound pressure during the periods in which utterance is marked "Y".

The conference video storage unit 7 stores the conference video detected by the conference video detection unit 4 as a data sheet in list form. FIG. 4 shows a data sheet stored in the conference video storage unit 7. As shown in FIG. 4, each conference video is stored against an identification symbol ID.

The synchronization unit 8 synchronizes two kinds of information with the video information stored in the conference video storage unit 7. These are the person's biological information stored in the biological information storage unit 5 and the audio information stored in the audio information storage unit 6. The conference information storage unit 9 consists of a recording device such as a semiconductor memory, a hard disk, or a flexible disk. It holds the information synchronized by the synchronization unit 8 as a conference index file.
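The patent does not detail how the synchronization unit 8 aligns the data. One simple approach, shown here as an assumption, attaches to each video frame time the most recent sensor sample taken at or before that time. The sketch uses the standard `bisect` module. `synchronize` is a hypothetical name.

```python
import bisect


def synchronize(frame_times, sample_times, samples):
    """For each video frame time, pick the latest sample whose timestamp is <= that time.

    sample_times must be sorted ascending and parallel to `samples`.
    Frames earlier than the first sample get None.
    """
    aligned = []
    for ft in frame_times:
        i = bisect.bisect_right(sample_times, ft) - 1  # index of last sample at or before ft
        aligned.append(samples[i] if i >= 0 else None)
    return aligned


print(synchronize([0.0, 0.5, 1.0, 1.7], [0.2, 1.0, 1.5], ["a", "b", "c"]))
```

A nearest-preceding match avoids using data from the future. Interpolation or nearest-in-either-direction matching would be equally valid design choices.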

FIG. 5 shows a data sheet stored in the conference information storage unit 9. As shown in FIG. 5, the conference information has four parts. The first is the time. The second is biological information such as pupil diameter, gaze target, gaze time, blinks, and facial skin temperature. The third is sound information such as utterance presence/absence, volume, speaker utterance presence/absence, speaker volume, and environmental sound. The fourth is the conference video data ID. The conference information is associated with time for each viewer.

The gaze target determination unit 10 determines the gaze target of a predetermined person based on the person's biological information acquired from the conference information storage unit 9. The sound pressure target determination unit 11 determines the sound pressure target based on the audio information acquired from the conference information storage unit 9. The sound pressure target is whoever is producing voice. The index assignment unit 12 assigns an index to missed-viewing or missed-listening scenes in the video information, based on the person's gaze target and the sound pressure target. Alternatively, it may assign the index based on the person's gaze target alone. The index file storage unit 13 stores each missed-viewing or missed-listening scene ID in association with that scene's index. The index file storage unit 13 therefore stores indexes assigned to missed-viewing or missed-listening scenes in video information of a predetermined space. These indexes were derived from the person's gaze target, obtained from biological information synchronized with that video information.
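The patent defines the sound pressure target as whoever is producing voice but gives no decision rule. The following is a minimal sketch under stated assumptions. Each sensor reports a level in dB for one source, and a hypothetical threshold separates speech from background noise. The loudest source above the threshold is taken as the sound pressure target. `sound_pressure_target` and `threshold_db` are hypothetical names.

```python
def sound_pressure_target(levels, threshold_db):
    """Return the loudest source whose level is at or above the threshold, or None.

    levels: dict mapping a source id (one sensor per source) to its level in dB.
    threshold_db: hypothetical speech-detection threshold.
    """
    active = {src: lvl for src, lvl in levels.items() if lvl >= threshold_db}
    if not active:
        return None  # nobody is speaking
    return max(active, key=active.get)


# Sensor 302 on the speaker and sensor 301 on viewer A, as in FIG. 6.
print(sound_pressure_target({"speaker": 62.0, "A": 40.0}, threshold_db=50.0))  # speaker
```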

The search request input unit 14 consists of, for example, a touch panel, a mouse, or a keyboard. By operating it, the user can enter a specific search request, for example one designating a particular viewer. The search request recording unit 15 consists of a recording device such as a semiconductor memory, a hard disk, or a flexible disk. It records the search conditions entered through the search request input unit 14.

The search unit 16 acts on the search request entered through the search request input unit 14. It accesses the index file storage unit 13 and sends the matching missed-viewing or missed-listening scene IDs to the display control unit 17. The display control unit 17 acquires the video information for those scene IDs from the conference video storage unit 7, generates drawing information, and sends it to the display unit 18. The display unit 18 displays the missed-viewing or missed-listening scenes based on that drawing information. The display control unit 17 may also display thumbnails of the video information based on the scene IDs.
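The path from search request to displayed scenes can be sketched as follows. This assumes the index file is a list of records holding the viewer, scene ID, scene kind, time span, and conference video ID. It also assumes the conference video store maps a video ID to a file name. All field, function, and file names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    viewer: str
    scene_id: str
    kind: str      # "viewing" (missed-viewing) or "listening" (missed-listening)
    start: float   # seconds from the start of the conference video
    end: float
    video_id: str


def search(index_file, viewer, kind=None):
    """Return the viewer's indexed scenes, optionally filtered by kind, in time order."""
    hits = [e for e in index_file if e.viewer == viewer and (kind is None or e.kind == kind)]
    return sorted(hits, key=lambda e: e.start)


def clips_to_display(hits, video_store):
    """Resolve each hit to (video source, start, end) for the display step."""
    return [(video_store[e.video_id], e.start, e.end) for e in hits]


index_file = [
    IndexEntry("A", "s2", "listening", 30.0, 40.0, "v1"),
    IndexEntry("A", "s1", "viewing", 10.0, 20.0, "v1"),
    IndexEntry("B", "s3", "viewing", 5.0, 8.0, "v1"),
]
print(clips_to_display(search(index_file, "A"), {"v1": "meeting.mp4"}))
```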

FIG. 6 illustrates the detection and recording of biological information, audio information, and conference video information. As shown in FIG. 6, viewers A to D sit on chairs around a table 201 in a conference room 200. A projector 204 projects slides onto a slide display unit 205, and a speaker 206 gives a talk along with the slides. A video camera 203 corresponds to the conference video detection unit 4 described above and captures the conference video. Video cameras 211, 212, 213, and 214 correspond to the biological information detection unit 2 described above. The gaze targets are numbered as follows:
(1) the slide display unit 205 (visual target);
(2) the speaker 206;
(3) a notebook computer 207 placed in front of viewer A;
(4) any other position ("other").

When viewer A gazes at the slide display unit 205, video camera 211 detects viewer A's face, and viewer A's gaze target is (1). When viewer A gazes at the speaker 206, video camera 212 detects the face, and the gaze target is (2). When viewer A gazes at the notebook computer 207, video camera 213 detects the face, and the gaze target is (3). When viewer A gazes at any other position ("other"), video camera 214 detects the face, and the gaze target is (4).
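The camera-to-target mapping above fits in a small lookup table. The following sketch assumes the face detector reports, for each frame, the set of camera numbers in which viewer A's eye region was found. Returning None when no camera, several cameras, or an unknown camera fires is an assumption, since the patent does not cover those cases. `gaze_target_from_cameras` is a hypothetical name.

```python
# Camera numbers from FIG. 6 mapped to viewer A's gaze targets (1)-(4).
CAMERA_TO_TARGET = {
    211: 1,  # slide display unit 205
    212: 2,  # speaker 206
    213: 3,  # notebook computer 207
    214: 4,  # any other position
}


def gaze_target_from_cameras(detecting_cameras):
    """Return the gaze target whose camera captured viewer A's face, or None.

    detecting_cameras: set of camera numbers in which the eye region was found
    for one frame. None for zero, multiple, or unknown cameras is an assumption.
    """
    if len(detecting_cameras) != 1:
        return None
    (camera,) = detecting_cameras
    return CAMERA_TO_TARGET.get(camera)


print(gaze_target_from_cameras({212}))  # 2: viewer A is looking at the speaker
```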

The gaze target determination unit 10 identifies viewer A's gaze target based on the biological information. Sound pressure sensors 301 and 302 correspond to the audio information detection unit 3. Sensor 301 detects viewer A's audio, and sensor 302 detects the speaker 206's audio. The sound pressure target determination unit 11 identifies the sound pressure target based on the audio information detected by sensors 301 and 302.

FIG. 7 illustrates the extraction of candidate missed-viewing and missed-listening scenes. In FIG. 7, the upper chart shows viewer A's gaze target (visual target) and the lower chart shows the sound pressure target. The labels (1) to (4) are the gaze targets shown in FIG. 6. Viewer A's gaze target moves in the order (1), (2), (3), (4), (1). In the lower chart, solid lines mark the utterance intervals of the speaker 206 and broken lines mark those of viewer A. Take the case where viewer A's gaze target is (2) and the sound pressure target is the speaker 206. Viewer A is looking at the speaker 206, so the scene is excluded from the candidates. Now take the case where viewer A's gaze target is (4) and the sound pressure target is viewer A. Viewer A is talking while looking away from the speaker 206, so the scene becomes a candidate missed-viewing or missed-listening scene.

Next, the process of extracting candidate overlooked and unheard scenes is described with reference to FIGS. 7 and 8. FIG. 8 is a flowchart of this extraction process and shows the flow for a single scene.

In step S101, the biological information detection unit 2 acquires facial video information and similar data of the conference participants and detects biological information, including the gaze target. The audio information detection unit 3 detects the audio information of speaker 206 and viewer A. In step S102, the synchronization unit 8 synchronizes the biological information and the audio information with the conference video and stores them in the conference information storage unit 9. In step S103, the gaze target determination unit 10 determines the gaze target of viewer A based on the biological information retrieved from the conference information storage unit 9. The sound pressure target determination unit 11 determines the sound pressure target based on the audio information retrieved from the same storage unit.
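
The text does not specify how the synchronization unit 8 aligns the streams in step S102. A minimal sketch, assuming timestamped sensor samples and a sample-and-hold rule: each video frame takes the most recent sample at or before its own timestamp. The function name `synchronize` is illustrative.

```python
import bisect
from typing import List, Optional, Tuple

Sample = Tuple[float, str]  # (timestamp_s, value), sorted by timestamp

def synchronize(frame_times: List[float], samples: List[Sample]) -> List[Optional[str]]:
    """For each video frame time, return the latest sample value at or before it."""
    times = [t for t, _ in samples]
    aligned: List[Optional[str]] = []
    for ft in frame_times:
        # Index of the last sample whose timestamp is <= the frame time.
        i = bisect.bisect_right(times, ft) - 1
        aligned.append(samples[i][1] if i >= 0 else None)  # None before first sample
    return aligned
```

With samples `[(0.2, "target_1"), (1.0, "speaker_206")]` and frames at 0.0, 0.5, 1.0, and 1.5 s, the first frame gets `None`, the second gets `"target_1"`, and the last two get `"speaker_206"`.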

In step S104, the index assigning unit 12 uses viewer A's gaze target, obtained from the gaze target determination unit 10, and the sound pressure target, obtained from the sound pressure target determination unit 11, to determine whether the two targets differ. If it determines that they are the same, the process proceeds to step S107, and the scene under evaluation is excluded from the candidates for unheard or overlooked scenes. For example, in FIG. 7, when viewer A's gaze target is (2) and the sound pressure target is speaker 206, the two targets coincide, which means the viewer is looking at the speaker. In such a case, the index assigning unit 12 excludes the scene, in which the person's gaze target and the sound pressure target are the same, from the candidates for overlooked or unheard scenes.

If the index assigning unit 12 determines in step S104 that the sound pressure target and the gaze target differ, the process proceeds to step S105, where it determines whether viewer A is producing sound pressure. If viewer A is not producing sound pressure, the scene is extracted as a candidate overlooked scene in step S108. An index identifying it as an overlooked scene is then assigned to the scene and stored in the index file storage unit 13. For example, in FIG. 7, when viewer A's gaze target is (3) and the sound pressure target is speaker 206, the gaze target and the sound pressure target differ, which means the viewer is not looking at the speaker. In such a case, the index assigning unit 12 extracts the scene, in which the person's gaze target and the sound pressure target differ, as a candidate overlooked scene.

If the index assigning unit 12 determines in step S105 that viewer A is producing sound pressure, it extracts the scene as a candidate unheard scene in step S106. It then assigns the scene an index identifying it as an unheard scene and stores the index in the index file storage unit 13. For example, in FIG. 7, when viewer A's gaze target is (4) and the sound pressure target is viewer A, the gaze target and the sound pressure target differ and the viewer is producing sound pressure, so the scene is considered an unheard scene. In such a case, the index assigning unit 12 determines that the person's gaze target differs from the sound pressure target and that the person is producing sound pressure, and it extracts the scene as a candidate unheard scene.
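
Steps S104 to S108 amount to a small decision function. The Python sketch below is one reading of that flow. It makes two assumptions. First, "viewer A is producing sound pressure" in step S105 is taken to mean that viewer A is the sound pressure target. Second, a scene with no sound pressure target at all is excluded; this rule is added here and is not part of the flowchart. The names `SceneLabel` and `classify_scene` are hypothetical.

```python
from enum import Enum
from typing import Optional

class SceneLabel(Enum):
    EXCLUDED = "excluded"      # S107: removed from the candidates
    UNHEARD = "unheard"        # S106: candidate unheard scene
    OVERLOOKED = "overlooked"  # S108: candidate overlooked scene

def classify_scene(gaze_target: Optional[str],
                   sound_target: Optional[str],
                   viewer: str = "viewer_A") -> SceneLabel:
    """One reading of the FIG. 8 flow for a single scene."""
    # Added assumption: with no sound pressure target there is nothing to miss.
    if sound_target is None:
        return SceneLabel.EXCLUDED
    # S104: gaze target equals sound pressure target -> S107 (exclude).
    if gaze_target == sound_target:
        return SceneLabel.EXCLUDED
    # S105: is the viewer the one producing sound pressure?
    if sound_target == viewer:
        return SceneLabel.UNHEARD   # S106
    return SceneLabel.OVERLOOKED    # S108
```

Applied to the FIG. 7 examples, with gaze target (2) written as `"speaker_206"`, the function gives the following results:

- Gaze (2) with speaker 206 as the sound pressure target gives `EXCLUDED`.
- Gaze (3) with speaker 206 gives `OVERLOOKED`.
- Gaze (4) with viewer A as the sound pressure target gives `UNHEARD`.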

Next, the display process is described. Based on a search request from the search request input unit 14, the search unit 16 searches for overlooked or unheard scenes by referring to the indexes stored in the index file storage unit 13. The display control unit 17 retrieves the conference video corresponding to the overlooked or unheard scenes found by the search unit 16 from the conference video storage unit 7 and displays it on the display unit 18. By viewing the display unit 18, the user can efficiently review the scenes he or she overlooked or failed to hear.
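
The following is a hypothetical sketch of the search side. Index entries are modeled as `IndexEntry` records, each holding a time range and a kind. The `search` function filters the entries by kind, roughly as the search unit 16 might. The `to_clips` function merges nearby hits into playback ranges that a display controller could cut from the stored video. The record layout and the `max_gap_s` merging rule are assumptions and are not part of the source.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class IndexEntry:
    start_s: float
    end_s: float
    kind: str  # "overlooked" or "unheard" (hypothetical labels)

def search(index: List[IndexEntry], kind: str) -> List[IndexEntry]:
    """Return the entries of the requested kind, ordered by start time."""
    return sorted((e for e in index if e.kind == kind), key=lambda e: e.start_s)

def to_clips(entries: List[IndexEntry], max_gap_s: float = 1.0) -> List[Tuple[float, float]]:
    """Merge time-ordered entries separated by at most max_gap_s into playback ranges."""
    clips: List[Tuple[float, float]] = []
    for e in entries:
        if clips and e.start_s - clips[-1][1] <= max_gap_s:
            clips[-1] = (clips[-1][0], max(clips[-1][1], e.end_s))
        else:
            clips.append((e.start_s, e.end_s))
    return clips
```

Merging adjacent hits avoids replaying a continuous overlooked passage as several separate clips when the scene segmentation has split it.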

The embodiment above assigns indexes using both the gaze target and the sound pressure target, but indexes may also be assigned to scenes using the sound pressure target alone. In this case, the information processing apparatus consists of two units. The sound pressure target determination unit 11 determines the sound pressure target based on audio information synchronized with video information obtained by photographing a predetermined space. The index assigning unit 12 assigns indexes to overlooked or unheard scenes in the video information based on the sound pressure target. The information retrieval system 1 is composed of a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk drive, a display device, and the like.
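
The text does not state the rule for indexing with the sound pressure target alone. One plausible rule, offered only as an assumption, follows the FIG. 7 reasoning: intervals in which viewer A is the sound pressure target are indexed as candidate unheard scenes. Very short utterances can optionally be ignored. The function name and the `min_duration_s` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float, Optional[str]]  # (start_s, end_s, sound pressure target)

def index_by_sound_only(sound: List[Interval],
                        viewer: str = "viewer_A",
                        min_duration_s: float = 2.0) -> List[Tuple[float, float, str]]:
    """Mark intervals where the viewer is the sound pressure target as unheard candidates."""
    entries = []
    for start, end, target in sound:
        # Hypothetical rule: while the viewer talks, he may not hear the speaker.
        if target == viewer and end - start >= min_duration_s:
            entries.append((start, end, "unheard"))
    return entries
```

Without gaze information, this variant cannot produce overlooked-scene candidates. It can only flag the intervals in which the viewer's own speech may have covered the speaker.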

The information processing method of the present invention is implemented using, for example, a CPU, ROM, and RAM. A program is installed from a hard disk drive or from a portable storage medium such as a CD-ROM, DVD, or flexible disk, or it is downloaded over a communication line. Each step is carried out when the CPU executes the program.

The program causes a computer to execute two steps. The first is a step of determining a person's gaze target based on biological information of the person synchronized with video information obtained by photographing a predetermined space. The second is an assigning step of assigning indexes to overlooked or unheard scenes in the video information based on the person's gaze target. The program may also cause a computer to execute a step of determining a sound pressure target based on audio information synchronized with video information obtained by photographing a predetermined space, and a step of assigning indexes to overlooked or unheard scenes in the video information based on the sound pressure target. The gaze target determination unit 10 corresponds to the gaze target determination means, the sound pressure target determination unit 11 corresponds to the sound pressure target determination means, and the index assigning unit 12 corresponds to the assigning means. The information processing apparatus is incorporated into the information retrieval system 1.

Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments. Various modifications and changes are possible within the scope of the gist of the invention as set forth in the claims. For example, the embodiments above use conference video, but the video handled by the present invention is not limited to conference video.

FIG. 1 is a block diagram of the information retrieval system according to an embodiment of the present invention.
FIG. 2 shows a data sheet stored in the biological information storage unit.
FIG. 3 shows a data sheet stored in the audio information storage unit.
FIG. 4 shows a data sheet stored in the conference video storage unit.
FIG. 5 shows a data sheet stored in the conference information storage unit.
FIG. 6 illustrates the detection and recording of biological information, audio information, and conference video information.
FIG. 7 illustrates the extraction of candidate overlooked scenes and unheard scenes.
FIG. 8 is a flowchart of the process for extracting overlooked or unheard scenes.

Explanation of symbols

1 Information retrieval system
2 Biological information detection unit
3 Audio information detection unit
4 Conference video detection unit
8 Synchronization unit
9 Conference information storage unit
10 Gaze target extraction unit
11 Sound pressure target extraction unit
12 Index assigning unit
13 Index storage unit
14 Search request input unit
16 Search unit
17 Display control unit

Claims

An information processing apparatus comprising: gaze target determination means for determining a gaze target of a person based on biological information of the person synchronized with video information obtained by photographing a predetermined space; and
assigning means for assigning an index to the person's overlooked scene or unheard scene in the video information based on the person's gaze target.

The information processing apparatus according to claim 1, further comprising sound pressure target determination means for determining a sound pressure target based on audio information synchronized with the video information,
wherein the assigning means assigns an index to the person's overlooked scene or unheard scene in the video information based on the person's gaze target and the sound pressure target.

The information processing apparatus according to claim 1, further comprising sound pressure target determination means for determining a sound pressure target based on audio information synchronized with the video information,
wherein the assigning means excludes, from the candidates for the person's overlooked scene or unheard scene, any scene in the video information in which the person's gaze target and the sound pressure target are the same.

The information processing apparatus, further comprising sound pressure target determination means for determining a sound pressure target based on audio information synchronized with the video information,
wherein the assigning means extracts, from the scenes in the video information, a scene in which the person's gaze target and the sound pressure target differ as a candidate for the person's overlooked scene or unheard scene.

The information processing apparatus according to claim 1, further comprising sound pressure target determination means for determining a sound pressure target based on audio information synchronized with the video information,
wherein the assigning means extracts, from the scenes in the video information, a scene in which the person's gaze target and the sound pressure target differ and the person is producing sound pressure as a candidate for the person's unheard scene.

An information processing apparatus comprising: sound pressure target determination means for determining a sound pressure target based on audio information synchronized with video information obtained by photographing a predetermined space; and
assigning means for assigning an index to a person's overlooked scene or unheard scene in the video information based on the sound pressure target.

The information processing apparatus according to claim 1, further comprising synchronization means for synchronizing the biological information of the person with the video information.

The information processing apparatus according to any one of claims 2 to 6, further comprising a synchronization unit configured to synchronize the audio information with the video information.

The information processing apparatus according to claim 1, further comprising search means for searching for the person's overlooked scene or unheard scene by referring to the index assigned by the assigning means.

The information processing apparatus according to claim 9, further comprising display control means for displaying the person's overlooked scene or unheard scene found by the search means.

An information processing apparatus comprising: storage means for storing an index assigned to a predetermined person's overlooked scene or unheard scene in video information obtained by photographing a predetermined space, the index having been assigned using the person's gaze target obtained from biological information of the person synchronized with the video information; and
display control means for displaying the person's overlooked scene or unheard scene in the video information by referring to the index stored in the storage means.

An information processing method comprising: determining a gaze target of a predetermined person based on biological information of the person synchronized with video information obtained by photographing a predetermined space; and
assigning an index to the person's overlooked scene or unheard scene in the video information based on the person's gaze target.

The information processing method according to claim 12, further comprising a step of determining a sound pressure target based on audio information synchronized with the video information,
wherein the assigning step assigns an index to the person's overlooked scene or unheard scene in the video information based on the person's gaze target and the sound pressure target.

An information processing method comprising: determining a sound pressure target based on audio information synchronized with video information obtained by photographing a predetermined space; and
assigning an index to a person's overlooked scene or unheard scene in the video information based on the sound pressure target.

The information processing method according to claim 12, further comprising a step of displaying the person's overlooked scene or unheard scene by referring to the index.

A program for causing a computer to execute: a step of determining a gaze target of a predetermined person based on biological information of the person synchronized with video information obtained by photographing a predetermined space; and
an assigning step of assigning an index to the person's overlooked scene or unheard scene in the video information based on the person's gaze target.

A program for causing a computer to execute: a step of determining a sound pressure target based on audio information synchronized with video information obtained by photographing a predetermined space; and
a step of assigning an index to a person's overlooked scene or unheard scene in the video information based on the sound pressure target.