JP2009065324A

JP2009065324A - Image providing device, image providing method and image providing program

Info

Publication number: JP2009065324A
Application number: JP2007229969A
Authority: JP
Inventors: Mitsuhiro Wagatsuma; 光洋我妻; Yukinobu Taniguchi; 行信谷口; Yosuke Torii; 陽介鳥井
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-09-05
Filing date: 2007-09-05
Publication date: 2009-03-26
Anticipated expiration: 2027-09-05
Also published as: JP4755156B2

Abstract

PROBLEM TO BE SOLVED: To provide a new technology for providing an image that a user desires to view by retrieving it from among a large number of moving images and still images photographed at a specific location.

SOLUTION: When images photographed at the specific location are input and stored in a storing means, the photographing position and photographing angle of each input image are estimated by collating the image with an environmental model of the specific location, and the estimated photographing position and photographing angle are recorded in association with the image stored in the storing means. When a user issues a browsing request that designates a viewpoint position and a line-of-sight angle, images seen from the designated viewpoint position and line-of-sight angle are retrieved by searching the photographing positions and photographing angles stored in the storing means; an image of the environmental model of the specific location as seen from the designated viewpoint position and line-of-sight angle is displayed on a display; and a list of summary images outlining the retrieved images is displayed on that image.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an image providing apparatus and method that search a vast number of moving images and still images taken at a specific location for an image that the user wants to see and provide it to the user, and to an image providing program used to realize the apparatus. In particular, it relates to an image providing apparatus, method, and program that search for the image the user wants to see using the shooting position, shooting direction, and shooting time as clues.

There is a need to shoot sports events such as soccer and baseball with a large number of cameras and to search the captured video for scenes that show a target player from a preferred direction and position.

Furthermore, by archiving and sharing within a community the videos and photos that many people have taken at events such as athletic meets, users could search for and view highlight scenes that they themselves failed to capture.

Against this background, websites are also provided for sharing digital information about the same subject photographed by many users.

On the other hand, the storage media of digital products are growing in capacity, and viewing all of the shared photos and videos takes too much time.

There is therefore a demand for searching in a quick and intuitive manner, for example a search for viewing together the photos taken from a specific place during a specific time period.

As a conventional image search technique, there is a technique that attaches tags to individual images after they are shared and searches based on those tags.

For example, there is a technique in which a "rating" or "popularity" is set for each photo or video. After a photo is published, viewers enter ratings, or popularity is calculated from the number of views, and highly rated or popular items are then retrieved using these values.

Such image search techniques presuppose action on the viewing side. As a technique that presupposes action on the uploading side, there is also one in which comments are attached to each photo or video and images are retrieved by searching those comments for specific terms.

There is also a technique that records user behavior to extract groups of photos and videos that many people view in succession, and introduces to a person who has viewed a specific photo other photos and videos belonging to the same group.

Sites are also provided that offer a service of displaying photos from a community site for sharing digital photos on a map and searching by means of the map (see, for example, page 76 of Non-Patent Document 1).
Non-Patent Document 1: Ambient Findability, Publisher: O'Reilly & Associates Inc (2005/10), ISBN-10: 0596007655, ISBN-13: 978-0596007652

As described above, conventional image search techniques include those that presuppose action on the viewing side and those that presuppose action on the uploading side.

Of these, a service that presupposes action on the viewing side becomes usable only after an unspecified large number of users have actively used it. It is therefore difficult to search for an image or video immediately after it is published. Moreover, for information disclosed only to a specific community, the number of users is limited, so it is difficult to make such information searchable with these techniques.

On the other hand, in a service that presupposes action on the uploading side, information such as the shooting position must be entered manually. However, storage media that can hold several hundred photos are not uncommon. A user attending a particular event may well take more than 100 photos in a day, and manually adding information to all of them is not realistic.

It is also possible to detect the shooting position automatically with a GPS function and use it as metadata. However, cameras built into GPS-equipped mobile phones have limited resolution, and a service that has the user carry a GPS unit together with a high-resolution digital camera to attach shooting positions to photos involves the nuisance of carrying the GPS unit along with the camera in the first place.

Moreover, the GPS services currently used with cameras carry an error of about 10 m, which is insufficient accuracy for identifying a location within a specific event venue.

In addition, whereas digital photos carry metadata called Exif (Exchangeable Image File Format), some video file formats cannot hold the shooting time as metadata. Such files cannot be searched by shooting time.

The present invention has been made to solve these problems. Its object is to provide a new image providing technique that, in searching a vast number of moving images and still images taken at a specific location for an image that the user wants to see and providing it to the user, requires no active action by viewing users or posting users, acquires shooting position information with higher accuracy than GPS, and thereby enables intuitive search.

To achieve this object, the image providing apparatus of the present invention, which searches images taken at a specific location for images satisfying a user's browsing request and provides them to the user, comprises: (1) input means for inputting a moving image or still image taken at the specific location and saving it in storage means; (2) estimation means for estimating the shooting position and shooting angle of the image input by the input means by collating that image with an environmental model of the specific location; (3) recording means for recording the shooting position and shooting angle information estimated by the estimation means in association with the image saved in the storage means; (4) search means for, when the user issues an image browsing request specifying viewpoint position and/or line-of-sight angle information, searching the shooting position and shooting angle information saved in the storage means to retrieve images that would be seen from the viewpoint position and/or line-of-sight direction specified by the user; and (5) display means for displaying on a display an image of the environmental model of the specific location as seen from the viewpoint position and/or line-of-sight direction specified by the user, and displaying on that image a list of summary images outlining the images retrieved by the search means.

With this configuration, the apparatus may further comprise second estimation means that detects similarities in the audio of the images saved in the storage means and, based on them, estimates the shooting time of images saved in the storage means that have no shooting time information.

When the second estimation means is provided, the recording means additionally records, for input images that have shooting time information, that shooting time information, and, for images that have none, the shooting time estimated by the second estimation means. When the user issues an image browsing request that also specifies shooting time information, the search means also searches the shooting time information saved in the storage means and thereby retrieves images taken at the shooting time specified by the user.

Each of the above processing means can also be realized by a computer program. The computer program is provided recorded on a suitable computer-readable recording medium or provided via a network, and realizes the present invention by being installed and running on control means such as a CPU when the invention is carried out.

In the image providing apparatus of the present invention configured in this way, when a moving image or still image taken at a specific location is input and saved in the storage means, the input image is collated with the environmental model of the specific location to estimate its shooting position and shooting angle, and the estimated shooting position and shooting angle information is recorded in association with the saved image. At this time, for input images that have shooting time information, that shooting time information is also recorded.

Here, to make the estimation of the shooting position and shooting angle reliable when a moving image is input, the shooting position and shooting angle of the moving image may be estimated by collating with the environmental model either the most zoomed-out frame selected from the moving image or a panoramic image created from the moving image.

In this way, moving images and still images taken by many posting users at the specific location accumulate in the storage means, together with information on their shooting positions and shooting angles.

At this time, shooting time information is additionally recorded for images that have it. For images without shooting time information, similarities in the audio of the images saved in the storage means are detected, and the shooting time is estimated from them and recorded.

As a result, the storage means ultimately holds the moving images and still images taken by many posting users at the specific location, together with information on their shooting positions and shooting angles and information on their shooting times.

Given the images saved in the storage means and their attribute information, when a viewing user issues an image browsing request specifying viewpoint position or line-of-sight angle information, the shooting position and shooting angle information saved in the storage means is searched to retrieve images that would be seen from the viewpoint position or line-of-sight direction specified by the viewing user.

Then, to present the search result to the viewing user, an image of the environmental model of the specific location as seen from the specified viewpoint position and line-of-sight direction is displayed on the display, and a list of summary images outlining the retrieved images is displayed on that image.

At this time, if the viewing user's browsing request also specifies shooting time information, the shooting time information saved in the storage means is searched as well, so that images taken at the shooting time specified by the viewing user are retrieved.

In addition, when an input image is collated with the environmental model of the specific location, a confidence level for the estimated shooting position and shooting angle may be calculated from the degree of matching. To provide the viewing user with images that better match the browsing request, the final search result may then be obtained by selecting from the retrieved images with priority given to those with high confidence.
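The confidence-based prioritization can be illustrated with a minimal sketch. The function name `prioritize`, the `(image_id, confidence)` tuple layout, and the `limit` parameter are assumptions made for illustration; the source does not specify how the confidence is computed or how many images are kept.

```python
from typing import List, Tuple


def prioritize(results: List[Tuple[str, float]], limit: int) -> List[str]:
    """Keep the `limit` retrieved images with the highest estimation confidence.

    `results` holds hypothetical (image_id, confidence) pairs produced by an
    earlier position/angle search; higher confidence means a better match
    between the image and the environmental model.
    """
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    return [image_id for image_id, _ in ranked[:limit]]
```

A threshold on the confidence value would be an equally plausible reading of "priority"; the top-N form is used here only because it always returns something to display.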

Also, so that the viewing user can grasp which shooting locations the provided images show, the summary images of the search result may be displayed at the positions on the environmental model image that correspond to the shooting locations indicated by the shooting positions and shooting angles of the retrieved images.

Also, so that the viewing user can grasp how the provided images are tilted, the summary images may be displayed tilted according to the shooting angle of each retrieved image, in a form matched to the state in which the viewing user views them.

Also, so that the viewing user can grasp whether an image was taken from far away or from nearby, the summary images may be displayed at different sizes according to the shooting position of each retrieved image, with images taken from farther away shown smaller.
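As a rough sketch of the distance-dependent sizing, the summary image could be scaled in inverse proportion to the distance between the user's viewpoint and the shooting position. The inverse-proportional rule, the function name `thumbnail_size`, and all default values below are assumptions; the source states only that images taken from farther away are shown smaller.

```python
def thumbnail_size(distance: float,
                   base_px: int = 128,
                   reference_distance: float = 10.0,
                   min_px: int = 24) -> int:
    """Return the edge length in pixels of a summary image.

    Images shot within `reference_distance` of the viewpoint are drawn at
    `base_px`; beyond that the size shrinks in inverse proportion to the
    distance, but never below `min_px` so the thumbnail stays recognizable.
    """
    if distance <= reference_distance:
        return base_px
    return max(min_px, round(base_px * reference_distance / distance))
```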

According to the present invention, including a stage of registering multiple moving images and still images to construct an environmental model has the effect that shooting position information can be acquired with higher accuracy than GPS and that the shooting angle can also be determined.

Also, by automatically assigning a shooting position, shooting angle, and shooting time to each of a large number of moving images and still images and using them, search requires no active action by viewers or posters.

Also, creating an environmental model and displaying it provides viewers with an intuitive search.

Also, even for moving images without time information, the time information is estimated by attempting to synchronize their audio with that of already registered images, which widens the range of search targets.
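One common way to synchronize two audio tracks is cross-correlation, sketched below with NumPy. This is an illustrative technique chosen here, not necessarily the method the source uses to detect audio similarities; the function names and the assumption that both tracks are mono arrays at the same sample rate are hypothetical.

```python
import numpy as np


def estimate_offset_samples(reference: np.ndarray, clip: np.ndarray) -> int:
    """Return the lag, in samples, at which `clip` best aligns within `reference`."""
    # Remove the mean so a constant bias does not dominate the correlation.
    ref = reference - reference.mean()
    qry = clip - clip.mean()
    corr = np.correlate(ref, qry, mode="full")
    # Index i of the full correlation corresponds to lag i - (len(clip) - 1).
    return int(np.argmax(corr)) - (len(qry) - 1)


def estimate_start_time(reference_start_s: float, sample_rate_hz: int,
                        reference: np.ndarray, clip: np.ndarray) -> float:
    """Estimate the clip's shooting start time from a reference with a known start time."""
    lag = estimate_offset_samples(reference, clip)
    return reference_start_s + lag / sample_rate_hz
```

For long recordings an FFT-based correlation would be faster, and real footage from different cameras would likely need a more robust audio feature than raw samples.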

Hereinafter, the present invention will be described in detail according to embodiments.

FIG. 1 illustrates the system configuration of an image providing system embodying the present invention.

The image providing system embodying the present invention searches the moving images and still images taken at a specific event venue or the like for those that satisfy a viewing user's browsing request (hereinafter, moving images and still images may be collectively referred to as images) and provides them to the viewing user. To perform this processing, as shown in FIG. 1, the system comprises an environmental model calculation information input device 10, an environmental model calculation device 11, an environmental model storage device 12, an image input device 13, an image storage device 14, an estimation device 15, an image attribute information storage device 16, a search condition input device 17, an image search device 18, and an output device 19.

The environmental model calculation information input device 10 inputs the information needed to calculate an environmental model (a three-dimensional solid model) of the specific event venue to be photographed. The environmental model calculation device 11 calculates the environmental model of the event venue based on the information input by the environmental model calculation information input device 10. The environmental model storage device 12 stores the environmental model calculated by the environmental model calculation device 11.

The image input device 13 inputs moving images and still images taken at the specific event venue. The image storage device 14 stores the images input by the image input device 13.

To describe the information stored in the image storage device 14 in more detail: when the image input by the image input device 13 is a still image, the device stores the still image file and, as shown in FIG. 2(a), information such as an image ID, the image file type, a contributor ID, and the file name. When the input image is a moving image, the device stores the moving image file and the file of a representative image extracted from the moving image and, as shown in FIG. 2(b), information such as a video ID, the moving image file type, the representative image file type, a contributor ID, the moving image file name, and the representative image file name.

The estimation device 15 collates the images stored in the image storage device 14 with the environmental model stored in the environmental model storage device 12 to estimate the shooting position and shooting direction (shooting angle) of each image input by the image input device 13. It also detects similarities in the audio of the images stored in the image storage device 14 and, based on them, estimates the shooting time of stored images that have no shooting time information.

As described in Reference 1 below, a technique has been disclosed that uses image processing to estimate not only the shooting position of an image but also its shooting direction. In this technique, the environment is actually photographed, feature points are extracted from the captured image sequence, and the 3D model of the subject is reconstructed while the shooting location is estimated.

The estimation device 15 uses, for example, the technique described in Reference 1 to collate the images stored in the image storage device 14 with the environmental model stored in the environmental model storage device 12, and thereby estimates the shooting position and shooting direction of each image input by the image input device 13.

[Reference 1] Carlo Tomasi, Takeo Kanade: "Shape and Motion from Image Streams", International Journal of Computer Vision, Volume 9, Number 2, pp. 137-154 (1992).

The image attribute information storage device 16 stores, in correspondence with the images stored in the image storage device 14, the shooting position and shooting angle information estimated by the estimation device 15, together with the shooting time information of each image (either the shooting time carried by the image input by the image input device 13 or the shooting time estimated by the estimation device 15).

Here, the image storage device 14 and the image attribute information storage device 16 correspond to the storage means recited in the claims.

To describe the information stored in the image attribute information storage device 16 in more detail: when the image stored in the image storage device 14 is a still image, information such as the shooting time, shooting position, shooting angle, estimation confidence, number of subjects, and popularity is stored in association with the image ID, as shown in FIG. 3(a). When the image is a moving image, information such as the shooting start time, shooting end time, shooting position, shooting angle, synchronization list, estimation confidence, and popularity is stored in association with the video ID, as shown in FIG. 3(b).
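The attribute records listed for FIG. 3(a) and FIG. 3(b) might be modeled as follows. This is only a sketch: the class names, field names, types, and defaults are assumptions, since the source names the stored items but not their representation (for example, whether the shooting angle is stored as three rotation angles).

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class StillImageAttributes:
    """Attributes kept per still image, keyed by image ID (cf. FIG. 3(a))."""
    image_id: str
    shot_at: Optional[datetime]   # None until a shooting time is known
    position: Vec3                # estimated shooting position in model coordinates
    angle: Vec3                   # estimated shooting angle (assumed: three rotations)
    confidence: float             # confidence of the position/angle estimate
    subject_count: int = 0
    popularity: int = 0


@dataclass
class VideoAttributes:
    """Attributes kept per moving image, keyed by video ID (cf. FIG. 3(b))."""
    video_id: str
    started_at: Optional[datetime]
    ended_at: Optional[datetime]
    position: Vec3
    angle: Vec3
    sync_list: List[str] = field(default_factory=list)  # assumed: IDs of audio-synchronized videos
    confidence: float = 0.0
    popularity: int = 0
```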

The search condition input device 17 receives the search request issued by a viewing user and inputs image search conditions that specify viewpoint position and line-of-sight angle information. The image search device 18 uses the search conditions input by the search condition input device 17 as a key to search the shooting position, shooting angle, and shooting time information stored in the image attribute information storage device 16, thereby retrieving the images indicated by the search conditions, and generates three-dimensional drawing data representing the search result (drawing data for a 3D map that shows the retrieved images in association with the 3D event venue). The output device 19 outputs the three-dimensional drawing data generated by the image search device 18.
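A minimal sketch of such a search is given below, assuming that an image matches when its estimated shooting position lies within some distance of the requested viewpoint and its shooting direction deviates from the requested line of sight by no more than some angle. The matching rule, the `Shot` record, the function names, and the threshold defaults are all hypothetical; the source does not define the matching criterion.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Shot:
    image_id: str
    position: Vec3    # estimated shooting position
    direction: Vec3   # non-zero vector along the estimated shooting direction


def angle_between_deg(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in degrees between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def search(shots: List[Shot], viewpoint: Vec3, gaze: Vec3,
           max_distance: float = 5.0, max_angle_deg: float = 30.0) -> List[str]:
    """Return IDs of shots taken near `viewpoint` and looking roughly along `gaze`."""
    return [
        s.image_id
        for s in shots
        if math.dist(s.position, viewpoint) <= max_distance
        and angle_between_deg(s.direction, gaze) <= max_angle_deg
    ]
```

A shooting-time condition could be added as one more predicate in the same list comprehension when the request specifies a time range.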

FIG. 4 shows the flowchart executed by the image providing system embodying the present invention configured in this way.

Next, the processing executed by the image providing system is described according to this flowchart.

As shown in the flowchart of FIG. 4, in the image providing system embodying the present invention, first, in step S100, the administrative user obtains by calculation an environmental model of the event venue to be photographed and registers it in the environmental model storage device 12.

Specifically, the environmental model can be constructed by the following methods.

(a) Method 1
The administrative user installs software capable of handling CAD or 3D data on the environmental model calculation device 11, enters the information written in the design drawings of the event venue to be modeled in 3D from the environmental model calculation information input device 10 into the environmental model calculation device 11, and transfers the resulting environmental model of the event venue to the environmental model storage device 12 for storage.

(b) Method 2
The administrative user shoots the event venue to be reconstructed in 3D with a video camera from two or more sufficiently separated locations. The captured moving images are encoded in a format such as avi (Audio Video Interleave) or mpg (Moving Picture Experts Group) and transferred from the environmental model calculation information input device 10 to the environmental model calculation device 11. The environmental model calculation device 11 extracts feature points, detects stationary objects, and then builds a 3D environmental model of the event venue by matching feature points. The resulting environmental model is transferred to the environmental model storage device 12 and stored.

Here, feature points can be extracted with the Harris operator, the Moravec operator, or the like described in Reference 2 below. Stationary objects can be detected by tracking feature points, and the KLT method or the like can be used for feature point extraction. To obtain a 3D model from the matched feature points, the Tomasi-Kanade factorization method described in Reference 1 above can be used. Data is transferred from the environmental model calculation information input device 10 to the environmental model calculation device 11 via the Internet. Moving images shot with a mobile phone can also be uploaded in a format such as mpg. Of course, the environmental model calculation information input device 10 and the environmental model calculation device 11 may be a single device, in which case no network is needed.
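The core step of the factorization method can be sketched as a rank-3 decomposition of the matrix of tracked feature coordinates. The sketch below assumes an orthographic camera and that every feature is tracked in every frame; it recovers motion and shape only up to an affine ambiguity, since the metric constraints that the full method applies afterwards are omitted. The function name and the row layout of `W` are assumptions.

```python
import numpy as np


def factorize(W: np.ndarray):
    """Split a measurement matrix into motion and shape factors.

    W has shape (2F, P): for each of F frames, one row of x coordinates and
    one row of y coordinates of P tracked feature points.
    Returns (M, S) with M of shape (2F, 3) and S of shape (3, P) such that
    M @ S is the best rank-3 approximation of the centered measurements.
    """
    # Subtract each row's mean, which removes the per-frame translation.
    W_centered = W - W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
    # Keep the three dominant singular values; the rest is treated as noise.
    root = np.sqrt(s[:3])
    M = U[:, :3] * root          # camera motion, up to an affine transform
    S = root[:, None] * Vt[:3]   # 3D shape, up to the inverse transform
    return M, S
```

With noise-free orthographic projections the centered measurement matrix has rank at most 3, so `M @ S` reproduces it exactly; with real tracks the discarded singular values indicate how well the model fits.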

[Reference 2] Yasushi Kanazawa, Kenichi Kanatani: "Extraction of Image Feature Points for Computer Vision," Journal of IEICE, Vol. 87, No. 12, pp. 1043-1048 (2004).
(C) Method 3
The management user takes a sufficiently large number of still images of the event venue to be reconstructed in 3D, encodes them in a format such as jpg (Joint Photographic Experts Group), and transfers the data from the environmental model calculation information input device 10 to the environmental model calculation device 11. The environmental model calculation device 11 extracts feature points and constructs a 3D environmental model of the event venue by matching the feature points. The resulting environmental model is transferred to and stored in the environmental model storage device 12.

When the environmental model of the event venue is constructed by Method 2 or Method 3, the event venue must be shot with a camera or video camera and feature points must be extracted from each image. To improve the accuracy of the environmental model, subjects other than the event venue should be kept out of the images. If such subjects are included, it is preferable to take measures such as using a zoomed-out image or generating and using a panoramic image.

Using these methods, in step S100 the environmental model of the event venue to be photographed is obtained and registered in the environmental model storage device 12. Subsequently, in step S101, the posting user inputs the images taken at the event venue, together with information such as his or her name, from the image input device 13, and these are transferred to the image storage device 14.

As a result, as shown in FIG. 2, the image storage device 14 stores the images taken by the posting user at the event venue together with information such as the poster's name.

Subsequently, in step S102, feature points are extracted from each image stored in the image storage device 14 according to the method described in Reference 1 above and are collated with the feature point coordinates stored in the environmental model storage device 12 to estimate where each image was shot and in which direction. In addition, for an image whose shooting time is unknown, the shooting time is estimated by extracting similarities in the audio that accompanies the images. The estimated coordinates (location), direction, time, and other information are then stored in the image attribute information storage device 16 together with the image ID or moving image ID.

As a result, as shown in FIG. 3, the image attribute information storage device 16 stores information such as the shooting time, shooting location, and shooting angle of each image, in association with the corresponding image stored in the image storage device 14.

The processing of step S102 is described in detail later with reference to the flowcharts of FIGS. 5 and 6.

Subsequently, in step S103, the viewing user inputs search conditions from the search condition input device 17, such as the viewpoint position, the angle, the time range, and whether to limit the results to photographs showing faces. A 3D map that satisfies the search conditions is then generated and displayed on the output device 19.

That is, based on the feature point information stored in the environmental model storage device 12, an image of the event venue as specified by the search conditions (the event venue as it would appear to the viewing user) is displayed on the output device 19. At the same time, the images that satisfy the search conditions are identified from the information stored in the image attribute information storage device 16, the identified images (representative images in the case of moving images) are read from the image storage device 14, and thumbnails are created and superimposed on the image of the event venue. In this way, the output device 19 displays a 3D map that shows the retrieved images in association with the 3D event venue.

The processing of step S103 is described in detail later with reference to the flowchart of FIG. 9.

Next, the processing executed in step S102 is described in detail with reference to the flowcharts of FIGS. 5 and 6.

Before the processing of step S102 begins, as shown in the flowchart of FIG. 5, the posting user first inputs, in step S200, the photographs (still images) or moving images taken at the event venue together with information such as his or her name from the image input device 13, and this information is transferred to the image storage device 14. In other words, the processing of step S101 in the flowchart of FIG. 4 is executed.

The process then enters step S201, and in steps S201 to S214 the processing described below is repeated for each input file (photograph or moving image). This processing is executed by the estimation device 15.

That is, in step S202, it is determined from the extension, the file header, and the like whether the input file is a still image file or a moving image file.
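
The determination in step S202 could be sketched as follows. This is only an illustration: the function name classify_upload and the specific set of extensions and header signatures (JPEG, AVI, and MPEG program stream) are assumptions, since the description only says that the extension and file header are used.

```python
import os

# Extensions for the formats named in the description (assumed set).
STILL_EXTS = {".jpg", ".jpeg"}
MOVIE_EXTS = {".avi", ".mpg", ".mpeg"}


def classify_upload(filename, header=b""):
    """Return 'still', 'movie', or 'unknown' for an uploaded file.

    header holds the first bytes of the file; when it carries a known
    signature, that takes priority over the file extension.
    """
    if header[:3] == b"\xff\xd8\xff":
        return "still"  # JPEG start-of-image marker
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "movie"  # AVI is a RIFF container
    if header[:4] == b"\x00\x00\x01\xba":
        return "movie"  # MPEG program stream pack header
    ext = os.path.splitext(filename)[1].lower()
    if ext in STILL_EXTS:
        return "still"
    if ext in MOVIE_EXTS:
        return "movie"
    return "unknown"
```

Checking the header first guards against files whose extension has been changed or is missing.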

When step S202 determines that the input file is a moving image file, the process proceeds to step S203 to verify whether a panoramic image can be created from that moving image file. A panoramic image is an image generated by compositing a sequence of frames from a section that contains camera operations such as panning, tilting, and zooming. Whether a panoramic image can be created can be determined by the method described in Reference 3 below.

[Reference 3] Yukinobu Taniguchi, Akihito Akutsu, Yoshinobu Tonomura: "Panorama Excerpts: Video List by Automatic Generation and Layout of Panoramic Images," IEICE Transactions D-II, Vol. J82-D-II, No. 3, pp. 390-398 (1999).
When step S203 determines that a panoramic image can be created from the input moving image file, the process proceeds to step S204, where a panoramic image is created from the input moving image file by the method described in Reference 3 and stored in the image storage device 14 as the representative image. Using a panoramic image as the representative image means that more of the event venue is included as background, so more features can be extracted. This improves the accuracy with which the shooting location and shooting direction are estimated in the processing described later.

On the other hand, when step S203 determines that a panoramic image cannot be created from the input moving image file, the process proceeds to step S205, where the most zoomed-out frame is selected from among the in-focus frames of the moving image and stored in the image storage device 14 as the representative image. Using the most zoomed-out frame as the representative image likewise means that more of the event venue is included as background, so more features can be extracted and the accuracy of the later estimation of the shooting location and shooting direction improves.

Subsequently, in step S206, the already registered moving images are searched for ones estimated to have been shot at the same time, and any that are detected are registered in the synchronization list (the synchronization list shown in FIG. 3(b)). The method for time-synchronizing moving images is described in detail later with reference to the flowchart of FIG. 6.

Subsequently, in step S207, it is determined whether this time synchronization processing was able to obtain shooting time information from a moving image shot at the same time. If it was not, the process proceeds to step S208, automatic detection of the shooting time is abandoned, and the variable time_s is set to 0 (not set).

On the other hand, when step S202 determines that the input file is a still image file (photograph), the process proceeds to step S209 to check whether the still image file holds exif information.

When the check in step S209 confirms that the input still image file holds exif information, the process proceeds to step S210 and the shooting time recorded in the exif information is stored in the variable time. When the input still image file does not hold exif information, the process proceeds to step S208, automatic detection of the shooting time is abandoned, and the variable time is set to 0 (not set).
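
The branch between steps S210 and S208 could look like the sketch below. It assumes the exif tags have already been decoded into a dictionary by some other component; that dictionary, the function name shooting_time_from_exif, and the constant UNSET are hypothetical. The date format "YYYY:MM:DD HH:MM:SS" is the one used by the exif DateTimeOriginal tag.

```python
from datetime import datetime

UNSET = 0  # the description uses 0 to mean "not set"


def shooting_time_from_exif(exif):
    """Return the shooting time as a datetime, or UNSET if unavailable.

    exif is a hypothetical dict of already-decoded tags, for example
    {"DateTimeOriginal": "2007:09:05 10:30:00"}.
    """
    if not exif:
        return UNSET  # no exif information at all (step S208)
    raw = exif.get("DateTimeOriginal") or exif.get("DateTime")
    if not raw:
        return UNSET
    try:
        return datetime.strptime(raw, "%Y:%m:%d %H:%M:%S")  # step S210
    except ValueError:
        return UNSET  # malformed timestamp: treat as not set
```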

Subsequently, in step S211, if the input image is a still image, human faces are extracted from it and the number of extracted faces is recorded. If the input image is a moving image, human faces are extracted from its representative image and the number of extracted faces is recorded.

Subsequently, in step S212, if the input image is a still image, its feature points are extracted. If the input image is a moving image, the feature points of its representative image are extracted. These feature points are then collated with the feature points stored in the environmental model storage device 12 to estimate the position and direction from which the input image was shot, and the estimate is stored together with a certainty factor. The certainty factor is calculated as a function of the number of feature points that were used for the shooting position estimate and the number that were not used.
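
The description does not give the form of the certainty function. One simple possibility, shown purely as an assumption, is the fraction of extracted feature points that contributed to the position estimate:

```python
def certainty(used, unused):
    """Hypothetical certainty factor: share of feature points that were used.

    used   -- number of feature points used for the position estimate
    unused -- number of extracted feature points that were not used
    """
    total = used + unused
    if total == 0:
        return 0.0  # no feature points at all: no confidence
    return used / total
```

With this choice, an image in which 30 of 40 feature points matched the environmental model would have a certainty of 0.75.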

Subsequently, in step S213, the values of the information obtained in the operations from step S200 onward (the information shown in FIG. 3, excluding popularity) are stored in the image attribute information storage device 16.

Next, the time synchronization processing for moving images executed in step S206 is described with reference to the flowchart of FIG. 6.

On entering step S206, as shown in the flowchart of FIG. 6, the moving image file to be processed (the moving image file input by the posting user in step S200) is first input in step S300.

Subsequently, in step S301, it is determined whether the input moving image file has shooting time metadata (metadata that records the shooting start and end times).

When step S301 determines that the input moving image file has no shooting time metadata, the process proceeds to step S302 and the variable time_s is set to 0 (not set) to indicate that the shooting time has not been set. When it determines that the file does have shooting time metadata, the process proceeds to step S303 and the shooting start and end times are assigned to time_s and time_e, respectively.

Subsequently, in step S304, the audio of the input moving image file is extracted, and in steps S305 to S315 the processing described below is repeated, using this extracted audio, for every moving image file registered in the past.

That is, in step S306, the audio of a previously registered moving image file is extracted and searched for portions that match the audio of the input moving image file (synchronized points).

Specifically, for example, the audio of the input moving image file and the audio of the previously registered moving image file are both cut at 500 [ms] intervals, and each segment whose sound pressure exceeds a certain threshold is checked to see whether it matches any part of the audio of the previously registered moving image file. The method described in Reference 4 below can be used for this matching. When the number of matching portions exceeds a threshold, the two audio tracks are regarded as synchronized.
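
The segment-and-match procedure above can be sketched as follows. This sketch does not implement the histogram-based search of Reference 4; it swaps in a naive comparison of per-segment RMS levels, and the names window_rms and find_sync and the parameters loud, tol, and match_threshold are assumptions for illustration.

```python
import math


def window_rms(samples, rate, window_ms=500):
    """Split samples into fixed windows and return the RMS level of each."""
    size = rate * window_ms // 1000
    if size <= 0:
        raise ValueError("sample rate too low for the window length")
    return [
        math.sqrt(sum(s * s for s in samples[i:i + size]) / size)
        for i in range(0, len(samples) - size + 1, size)
    ]


def find_sync(new_rms, old_rms, loud=0.1, tol=0.05, match_threshold=2):
    """Return (offset, matches) for the best alignment, or None.

    offset is the window index in old_rms that lines up with window 0
    of new_rms. Only windows louder than `loud` on both sides are
    compared, and the clips count as synchronized only when the number
    of matching windows exceeds match_threshold.
    """
    best = None
    for offset in range(-(len(new_rms) - 1), len(old_rms)):
        matches = 0
        for i, level in enumerate(new_rms):
            j = offset + i
            if 0 <= j < len(old_rms) and level > loud and old_rms[j] > loud:
                if abs(level - old_rms[j]) <= tol:
                    matches += 1
        if best is None or matches > best[1]:
            best = (offset, matches)
    if best is not None and best[1] > match_threshold:
        return best
    return None
```

A real system would compare spectral features rather than loudness alone, since two different sounds can easily share the same RMS level.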

[Reference 4] Kunio Kashino, Gavin Smith, Hiroshi Murase: "Fast Search Method for Acoustic Signals Using Histogram Features," IEICE Transactions D-II, Vol. J82-D-II, No. 9, pp. 1365-1373 (1999).
Subsequently, in step S307, it is determined whether the search in step S306 detected synchronized audio. If no synchronized audio was detected, the process proceeds to step S315 and moves on to the next previously registered moving image file.

On the other hand, when step S307 determines that synchronized audio was detected, the process proceeds to step S308, where the ID (moving image ID) of each file is added to the other file's synchronization list (the synchronization list shown in FIG. 3(b)).

Subsequently, in step S309, it is determined whether the previously registered moving image file currently being processed has no time information while the time information of the input moving image file is fixed.

When step S309 determines that this condition does not hold, the process proceeds to step S310, where it is determined whether the time information of the previously registered moving image file currently being processed is fixed while the input moving image file has no time information.

When step S310 determines that this condition does not hold either, the process proceeds to step S315 and moves on to the next previously registered moving image file.

On the other hand, when step S310 determines that the time information of the previously registered moving image file currently being processed is fixed and the input moving image file has no time information, the process proceeds to step S311. There, the shooting start and end times of the input moving image file are estimated from the point at which its audio synchronizes with the audio of the previously registered moving image file, and they are assigned to time_s and time_e of the input moving image file. The process then proceeds to step S315 and moves on to the next previously registered moving image file.
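
The estimate in step S311 (and the mirror-image estimate in step S312) reduces to simple arithmetic once a synchronized point is known. A minimal sketch, with all times in seconds and with the function and parameter names assumed for illustration:

```python
def estimate_times(known_start, sync_in_known, sync_in_unknown, unknown_duration):
    """Return (time_s, time_e) for the clip whose shooting time is unknown.

    known_start      -- absolute start time of the clip with known time
    sync_in_known    -- offset of the synchronized point inside that clip
    sync_in_unknown  -- offset of the same point inside the unknown clip
    unknown_duration -- length of the unknown clip
    """
    # Both offsets refer to the same instant, so
    # known_start + sync_in_known == time_s + sync_in_unknown.
    time_s = known_start + sync_in_known - sync_in_unknown
    time_e = time_s + unknown_duration
    return time_s, time_e
```

For example, if the known clip starts at 36000 s (10:00:00), the matching sound occurs 120 s into it and 20 s into the unknown clip, and the unknown clip lasts 300 s, the unknown clip is estimated to run from 36100 s to 36400 s.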

On the other hand, when step S309 determines that the previously registered moving image file currently being processed has no time information and the time information of the input moving image file is fixed, the process proceeds to step S312. There, the shooting start and end times of the previously registered moving image file are estimated from the point at which its audio synchronizes with the audio of the input moving image file, and they are assigned to time_s and time_e of that previously registered moving image file.

Subsequently, in step S313, it is determined whether the synchronization list of the previously registered moving image file contains any file whose shooting time is unknown.

When step S313 determines that the synchronization list contains no file whose shooting time is unknown, the process proceeds to step S315 and moves on to the next previously registered moving image file.

On the other hand, when step S313 determines that the synchronization list does contain a file whose shooting time is unknown, the process proceeds to step S314, where the shooting times of all moving image files that are linked through synchronization lists and whose shooting times are unknown are recursively estimated and assigned. The process then proceeds to step S315 and moves on to the next previously registered moving image file.

The time synchronization processing of FIG. 6 described above relies on the following observation. Because the moving images were shot at the same event venue, environmental sounds such as a baseball cheering song, the cheers at a soccer goal, or the public address announcements at a sports day are also recorded in the other moving images, so the shooting time can be estimated by synchronizing the audio. On this basis, the shooting time of a moving image whose shooting time is unknown is estimated.

For example, as shown in FIG. 7, when the shooting time of moving image A is known and that of moving image B is not, the point at which the audio of moving image A matches the audio of moving image B is found, and the shooting time of moving image B (shooting start time and shooting end time) is estimated from it. Further, as shown in FIG. 8, when moving images A, B, C, and D are posted in that order and only the shooting time of moving image D is known, the matching points between the audio of A and B, between B and C, and between C and D are found, and the shooting times of moving images A, B, and C (shooting start time and shooting end time) are estimated from them.
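
The chained estimation of FIG. 8 can be sketched as a walk over the synchronization lists. The sketch below uses an iterative breadth-first traversal instead of recursion, represents an unknown time as None rather than 0, and tracks only start times; the data layout and the name propagate_start_times are assumptions.

```python
from collections import deque


def propagate_start_times(start_times, sync_links):
    """Fill in unknown start times by following synchronization links.

    start_times -- {clip_id: start time in seconds, or None if unknown}
    sync_links  -- {(a, b): (t_a, t_b)}, meaning that second t_a of clip a
                   and second t_b of clip b were recorded at the same instant
    """
    # Undirected graph; each edge stores start(neighbor) - start(clip).
    graph = {}
    for (a, b), (t_a, t_b) in sync_links.items():
        graph.setdefault(a, []).append((b, t_a - t_b))
        graph.setdefault(b, []).append((a, t_b - t_a))
    result = dict(start_times)
    queue = deque(c for c, t in result.items() if t is not None)
    while queue:
        clip = queue.popleft()
        for other, delta in graph.get(clip, []):
            if result.get(other) is None:
                result[other] = result[clip] + delta
                queue.append(other)
    return result
```

With only D known, the times of C, B, and A are filled in one link at a time, which mirrors the order D to C to B to A implied by FIG. 8.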

In this way, following the flowchart of FIG. 4, the image providing system of the present invention obtains the environmental model of the event venue to be photographed and registers it in the environmental model storage device 12 (step S100), and stores the images taken by the posting user at the event venue, together with information such as the poster's name, in the image storage device 14 (step S101). It then enters step S102 and executes the flowcharts of FIGS. 5 and 6: feature points are extracted from each image stored in the image storage device 14 and collated with the feature point coordinates stored in the environmental model storage device 12 to estimate where each image was shot and in which direction. In addition, for an image whose shooting time is unknown, the shooting time is estimated by extracting similarities in the audio that accompanies the images, and the estimated coordinates (location), direction, time, and other information are stored in the image attribute information storage device 16 together with the image ID or moving image ID.

As a result, as shown in FIG. 3, the image attribute information storage device 16 stores information such as the shooting time, shooting location, and shooting angle of each image, in association with the corresponding image stored in the image storage device 14.

As described for step S103 of the flowchart of FIG. 4, once information such as the shooting time, shooting location, and shooting angle of the images taken by posting users has been stored in the image attribute information storage device 16, that information is used to provide the viewing user with the images he or she has requested.

Next, the processing executed in step S103 is described in detail with reference to the flowchart of FIG. 9. The processing of this flowchart is executed by the image search device 18.

On entering step S103, as shown in the flowchart of FIG. 9, search conditions are first input from the viewing user via the search condition input device 17 in step S400. The search conditions describe the viewpoint position, the angle, the time range, and whether to limit the results to images showing faces.

Subsequently, in step S401, based on the feature point information stored in the environmental model storage device 12, an easy-to-view 3D map of the event venue as seen from the input viewpoint position and angle is created and drawn on the output device 19.

Here, the ease-of-viewing index is a function of the viewpoint position and angle. From the viewpoint position, it is determined whether the viewer is looking as if standing on the ground or looking down from above, and this decides whether the map is drawn with perspective.

Subsequently, in step S402, the information stored in the image attribute information storage device 16 is searched to extract the IDs of images that satisfy all of the following conditions: (1) the image is visible from the viewpoint position and angle specified in the search conditions; (2) the image was shot within the time range specified in the search conditions; (3) if the search conditions limit the results to images showing faces, the image satisfies that limitation; and (4) a thumbnail of the image can be drawn at its shooting location on the 3D map.
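
Conditions (1) to (3) can be sketched as a filter over attribute records. The sketch simplifies condition (1) to a 2D horizontal field-of-view test on the shooting position and leaves condition (4) to the drawing stage; the record fields ("id", "pos", "time", "faces"), the fov_deg parameter, and the function names are assumptions, not the layout of FIG. 3.

```python
import math


def in_view(viewpoint, view_dir_deg, fov_deg, position):
    """True if position lies within the horizontal field of view."""
    dx, dy = position[0] - viewpoint[0], position[1] - viewpoint[1]
    if dx == 0 and dy == 0:
        return True
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular difference folded into [-180, 180).
    diff = (bearing - view_dir_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0


def search(records, viewpoint, view_dir_deg, fov_deg, t_from, t_to, faces_only):
    """Return the IDs of records satisfying conditions (1) to (3)."""
    hits = []
    for r in records:
        if not in_view(viewpoint, view_dir_deg, fov_deg, r["pos"]):
            continue  # condition (1): not visible from the viewpoint
        if r["time"] == 0 or not (t_from <= r["time"] <= t_to):
            continue  # condition (2): time unset or outside the range
        if faces_only and r["faces"] == 0:
            continue  # condition (3): no faces in the image
        hits.append(r["id"])
    return hits
```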

If no image ID satisfying the above conditions can be extracted in step S402, the processing ends. If such image IDs can be extracted, the process proceeds to step S403, where the image with the highest evaluation is selected from the extracted images (for a moving image, the representative image is used). The evaluation of an image is a function of the certainty of its shooting location, its popularity, the number of faces shown in it, and so on.
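
The description leaves the evaluation function open. A weighted sum is one simple form it could take; the weights, the record fields, and the names evaluate and best_image below are assumptions for illustration only.

```python
def evaluate(record, w_certainty=1.0, w_popularity=0.5, w_faces=0.2):
    """Hypothetical evaluation: weighted sum of the three attributes."""
    return (w_certainty * record["certainty"]
            + w_popularity * record["popularity"]
            + w_faces * record["faces"])


def best_image(candidates):
    """Return the candidate with the highest evaluation (step S403)."""
    return max(candidates, key=evaluate)
```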

Subsequently, in step S404, it is determined whether, when the thumbnail of the new image is drawn at its shooting position on the 3D map, at least half of its area can be displayed without being hidden by the thumbnails of other images that are already displayed.

For example, when the current output is as shown in FIG. 10 and the next image is image G shown in FIG. 11, more than half of the area of image G is hidden by images B and C and cannot be displayed. Step S404 determines whether such a situation arises. Here, the size of a thumbnail is a function of how the 3D map is drawn and of the distance from the viewpoint position.

When step S404 determines that less than half of the image selected in step S403 can be displayed, the process proceeds to step S405, where the image is regarded as undrawable and drawing it is abandoned. The image is also excluded from the candidates so that it is not selected again in step S402, and the process returns to step S402.

On the other hand, when step S404 determines that at least half of the image selected in step S403 can be displayed, the process proceeds to step S406. There, the image indicated by the image ID selected in step S403 (the representative image in the case of a moving image) is read from the image storage device 14, a thumbnail of it is generated, and the thumbnail is drawn at the shooting position of the image on the 3D map on the output device 19. The process then returns to step S402.
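
The loop of steps S402 to S406 can be sketched as a greedy placement over axis-aligned thumbnail rectangles. This is an assumption-laden illustration: thumbnails are treated as rectangles (x0, y0, x1, y1) already projected onto the screen, candidates are visited in descending evaluation order instead of being re-queried each time, and the names visible_fraction and place_thumbnails are hypothetical.

```python
def visible_fraction(rect, placed):
    """Fraction of rect (x0, y0, x1, y1) not covered by rectangles in placed."""
    x0, y0, x1, y1 = rect
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0
    # Clip every placed rectangle to rect.
    clipped = []
    for px0, py0, px1, py1 in placed:
        cx0, cy0 = max(x0, px0), max(y0, py0)
        cx1, cy1 = min(x1, px1), min(y1, py1)
        if cx0 < cx1 and cy0 < cy1:
            clipped.append((cx0, cy0, cx1, cy1))
    # Coordinate compression: each grid cell is either fully hidden or not.
    xs = sorted({x0, x1, *(c[0] for c in clipped), *(c[2] for c in clipped)})
    ys = sorted({y0, y1, *(c[1] for c in clipped), *(c[3] for c in clipped)})
    hidden = 0.0
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            if any(c[0] <= xa and xb <= c[2] and c[1] <= ya and yb <= c[3]
                   for c in clipped):
                hidden += (xb - xa) * (yb - ya)
    return 1.0 - hidden / area


def place_thumbnails(candidates):
    """candidates: list of (score, rect). Returns the rects actually drawn."""
    placed = []
    for _score, rect in sorted(candidates, key=lambda c: c[0], reverse=True):
        # Step S404: draw only if at least half of the area stays visible.
        if visible_fraction(rect, placed) >= 0.5:
            placed.append(rect)  # step S406
        # Otherwise the image is skipped, as in step S405.
    return placed
```

Computing the hidden area over the union of the occluding rectangles matters here, because two thumbnails that overlap each other must not be counted twice.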

In this way, the image providing system of the present invention executes the flowcharts of FIGS. 5 and 6 to store information such as the shooting time, shooting location, and shooting angle of the images taken by posting users in the image attribute information storage device 16. It then executes the flowchart of FIG. 9: an image of the event venue as it would appear to the viewing user is displayed on the output device 19, the images that satisfy the search conditions are identified from the information stored in the image attribute information storage device 16, the identified images (representative images in the case of moving images) are read from the image storage device 14, and thumbnails are created and superimposed on the image of the event venue. As a result, the output device 19 displays a 3D map that shows the retrieved images in association with the 3D event venue.

FIGS. 10 to 13 show examples of the output display of the 3D map showing the retrieved images.

FIG. 10 shows a bird's-eye view of the event venue. By specifying only a time range and looking down on the entire venue, the user can identify a location to examine in more detail. Image F in the figure is a panoramic image created from a moving image.

FIG. 11 is a diagram explaining that, while the display of FIG. 10 is shown, an image G that has been retrieved is not treated as a drawing target.

FIG. 12 shows the view from ground level. Images taken farther away are displayed smaller. To search using the shooting angle as well, the displayed images can also be given an angle; FIG. 13 shows an example using angled images.
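The distance-dependent sizing described for FIG. 12 can be sketched as follows. The text states only that images taken farther away are shown smaller; the inverse-distance rule, the function name `thumbnail_size`, and every default value below are assumptions made for illustration.

```python
import math


def thumbnail_size(viewpoint, shot_position,
                   base_size=128.0, reference_distance=10.0, min_size=16.0):
    """Return a thumbnail edge length (pixels) that shrinks with distance.

    viewpoint and shot_position are (x, y, z) coordinates in the environmental
    model. All parameter defaults are illustrative assumptions.
    """
    distance = math.dist(viewpoint, shot_position)
    if distance <= reference_distance:
        # Nearby images are shown at full size.
        return base_size
    # Perspective-like scaling: size is inversely proportional to distance,
    # clamped so that distant thumbnails stay legible.
    return max(min_size, base_size * reference_distance / distance)
```

With these defaults, an image taken 20 units from the viewpoint is drawn at half the base size, and very distant images bottom out at `min_size` rather than vanishing.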

Making the display form switchable in this way lets the user enter a time and look down on the event venue to grasp the rough layout of the images. Switching to the ground-level view lets the user search for images as if actually standing there. Switching further to the view that uses the estimated shooting angle narrows the images down to, for example, those showing a specific subject, which makes the search easier.
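Narrowing by shooting angle can be sketched as a simple filter. The text does not specify how angles are represented or compared, so this sketch assumes a single horizontal heading in degrees per image and a fixed tolerance; the record layout (`"id"`, `"angle_deg"`) and the function names are hypothetical.

```python
def angle_difference(a_deg, b_deg):
    """Smallest absolute difference between two headings, in degrees (0 to 180)."""
    diff = (a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def filter_by_angle(records, view_angle_deg, tolerance_deg=30.0):
    """Keep records whose estimated shooting heading is close to the viewing heading.

    Each record is a dict with an "angle_deg" key (an assumed layout).
    """
    return [
        r for r in records
        if angle_difference(r["angle_deg"], view_angle_deg) <= tolerance_deg
    ]


# Usage: headings of 350 and 10 degrees differ by only 20 degrees.
images = [
    {"id": "a", "angle_deg": 350.0},
    {"id": "b", "angle_deg": 90.0},
    {"id": "c", "angle_deg": 15.0},
]
facing_north = filter_by_angle(images, view_angle_deg=0.0)
```

The modulo arithmetic handles wrap-around at 0/360 degrees, so images shot slightly to either side of the viewing direction are both retained.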

When the viewing user selects one of the thumbnails displayed in this way and requests a detailed view, the original image from which the selected thumbnail was created is displayed.

The present invention is applicable to cases where an image that a user wants to see is retrieved from a vast number of moving images and still images taken at a specific location and provided to the user. It makes it possible to retrieve and provide that image using the shooting position, shooting direction, and shooting time as clues.

FIG. 1 is a system configuration diagram of an image providing system embodying the present invention.
FIG. 2 is an explanatory diagram of the information stored in the image storage device.
FIG. 3 is an explanatory diagram of the information stored in the image attribute information storage device.
FIG. 4 is a flowchart executed by the image providing system embodying the present invention.
FIG. 5 is a flowchart executed by the image providing system embodying the present invention.
FIG. 6 is a flowchart executed by the image providing system embodying the present invention.
FIG. 7 is an explanatory diagram of the time synchronization processing of moving images.
FIG. 8 is an explanatory diagram of the time synchronization processing of moving images.
FIG. 9 is a flowchart executed by the image providing system embodying the present invention.
FIG. 10 is a diagram showing an example of the output display of search results.
FIG. 11 is a diagram showing an example of the output display of search results.
FIG. 12 is a diagram showing an example of the output display of search results.
FIG. 13 is a diagram showing an example of the output display of search results.

Explanation of symbols

10 Environmental model calculation information input device
11 Environmental model calculation device
12 Environmental model storage device
13 Image input device
14 Image storage device
15 Estimation device
16 Image attribute information storage device
17 Search condition input device
18 Image search device
19 Output device

Claims

An image providing apparatus that searches images taken at a specific location for an image satisfying a user's browsing request and provides the image to the user, comprising:
input means for inputting a moving image or a still image taken at the location and storing it in storage means;
estimation means for estimating the shooting position and shooting angle of the image input by the input means by collating that image with an environmental model of the location;
recording means for recording the shooting position and shooting angle estimated by the estimation means in association with the image stored in the storage means;
search means for, when the user issues an image browsing request specifying a viewpoint position and/or a line-of-sight angle, searching the shooting position and shooting angle information stored in the storage means to find images that would be visible from the viewpoint position and/or line-of-sight direction specified by the user; and
display means for displaying on a display an image of the environmental model as seen from the viewpoint position and/or line-of-sight direction specified by the user, and displaying on that image a list of summary images outlining the images found by the search means.

The image providing apparatus according to claim 1,
further comprising second estimation means for detecting similar points in the sound stored in the storage means and, based on the detected similarity, estimating the shooting time of an image stored in the storage means that has no shooting time information,
wherein the recording means further records, for each image input by the input means that has shooting time information, that shooting time, and, for each image that has no shooting time information, the shooting time estimated by the second estimation means, and
wherein, when the user issues an image browsing request that also specifies a shooting time, the search means also searches the shooting time information stored in the storage means to find images taken at the shooting time specified by the user.

The image providing apparatus according to claim 1 or 2,
wherein, when the input means inputs a moving image, the estimation means estimates the shooting position and shooting angle of the moving image by collating the most zoomed-out image selected from the moving image with the environmental model.

The image providing apparatus according to claim 1 or 2,
wherein, when the input means inputs a moving image, the estimation means estimates the shooting position and shooting angle of the moving image by collating a panorama image created from the moving image with the environmental model.

The image providing apparatus according to any one of claims 1 to 4,
wherein the estimation means calculates a certainty factor for the estimation based on the degree of matching in the collation, and
the search means obtains the final search result by selecting, from the images found, images with a high certainty factor in preference to others.

The image providing apparatus according to any one of claims 1 to 5,
wherein the display means displays the summary image at the position on the image of the environmental model that corresponds to the shooting location indicated by the shooting position and shooting angle of the image found by the search means.

The image providing apparatus according to any one of claims 1 to 6,
wherein the display means displays the summary image tilted, according to the shooting angle of the image found by the search means, so that it matches how the image would appear from the user's viewpoint.

The image providing apparatus according to any one of claims 1 to 6,
wherein the display means displays the summary image at a size that varies with the shooting position of the image found by the search means, becoming smaller the farther away the image was taken.

An image providing method executed by an image providing apparatus that searches images taken at a specific location for an image satisfying a user's browsing request and provides the image to the user, the method comprising:
a step of inputting a moving image or a still image taken at the location and storing it in storage means;
a step of estimating the shooting position and shooting angle of the input image by collating the input image with an environmental model of the location;
a step of recording the estimated shooting position and shooting angle of the image in association with the image stored in the storage means;
a step of, when the user issues an image browsing request specifying a viewpoint position and/or a line-of-sight angle, searching the shooting position and shooting angle information stored in the storage means to find images that would be visible from the viewpoint position and/or line-of-sight direction specified by the user; and
a step of displaying on a display an image of the environmental model as seen from the viewpoint position and/or line-of-sight direction specified by the user, and displaying on that image a list of summary images outlining the images found.

An image providing program used to realize an image providing apparatus that searches images taken at a specific location for an image satisfying a user's browsing request and provides the image to the user, the program causing a computer to function as:
input means for inputting a moving image or a still image taken at the location and storing it in storage means;
estimation means for estimating the shooting position and shooting angle of the image input by the input means by collating that image with an environmental model of the location;
recording means for recording the shooting position and shooting angle estimated by the estimation means in association with the image stored in the storage means;
search means for, when the user issues an image browsing request specifying a viewpoint position and/or a line-of-sight angle, searching the shooting position and shooting angle information stored in the storage means to find images that would be visible from the viewpoint position and/or line-of-sight direction specified by the user; and
display means for displaying on a display an image of the environmental model as seen from the viewpoint position and/or line-of-sight direction specified by the user, and displaying on that image a list of summary images outlining the images found by the search means.