JPWO2011132382A1

JPWO2011132382A1 - Information providing system, information providing method, and information providing program

Info

Publication number: JPWO2011132382A1
Application number: JP2012511530A
Authority: JP
Inventors: 仙田 修司 (Shuji Senda)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2010-04-19
Filing date: 2011-04-13
Publication date: 2013-07-18
Also published as: WO2011132382A1

Abstract

Provided is an information providing system capable of providing moving-image information, such as a video image related to a real-world scene, in association with the real world as seen from the user's viewpoint. A camera is worn by the user and photographs the real world. A position/orientation estimation means estimates the position and orientation of the real world from the video captured by the camera. A video image transformation means transforms a video image, which is footage of a real-world scene captured in advance, to match the estimated position and orientation of the real world. A superimposition means superimposes the video captured by the camera and the video image transformed by the video image transformation means.

Description

The present invention relates to an information providing system, a video display terminal, a video reproduction control device, an information providing method, and an information providing program, and more particularly to an information providing system, video display terminal, video reproduction control device, information presentation method, and information presentation program that provide, in an easy-to-understand manner, information associated with an object photographed by a camera.

As a method of providing information associated with the real world, there is a technology called augmented reality (AR), in which the real world is photographed with a camera from the user's viewpoint and information related to the content of the captured real world is superimposed on the video. An example of a device using augmented reality is described in Patent Document 1. The work information providing apparatus described in Patent Document 1 identifies the spatial arrangement of a work object photographed by a camera, and superimposes and displays information on the work the worker should perform in association with that work object. Such a method makes it possible to provide information that is easy for the worker to understand.

Patent Document 2 describes a captured-image processing system that displays a captured image superimposed on a map. In that system, the shooting position in the air is identified three-dimensionally, the orientation of the camera and the airframe relative to the ground surface is identified to calculate the shooting image frame, and the image is transformed to fit that frame.

Patent Document 3 describes an information presentation device that outputs, to an image display device, a video obtained by combining real-world video from the user's viewpoint position with virtual-world video. The information presentation device described in Patent Document 3 calculates the user's field of view in the virtual world based on the user's viewpoint position and orientation information. It then generates CG (computer graphics) that overlaps the user's field of view, based on the data for the portion that falls within that field of view.

Patent Document 1: JP 2001-282349 A
Patent Document 2: JP 2003-316259 A
Patent Document 3: JP 2005-174021 A

Video images are among the information that can be associated with real-world scenes. However, typical devices do not consider how to display video images in association with the real world. Consequently, the work information providing apparatus described in Patent Document 1 cannot display video images, which are easy to create, in association with the real world.

The system described in Patent Document 2 transforms the image being captured by the camera itself, based on the camera's orientation relative to the ground surface, and superimposes the transformed image. In other words, in that system it is the captured video that is transformed to fit the information associated with the real world. It therefore cannot really be said to provide information related to the real world as seen from the user's viewpoint.

The information presentation device described in Patent Document 3 merely outputs to the display device a video in which CG is simply superimposed on the user's field of view (the real world); the superimposed information (CG) has no association with the real world. Therefore, even with that device, a video image cannot be displayed in association with the real world.

It is therefore an object of the present invention to provide an information providing system, a video display terminal, a video reproduction control device, an information presentation method, and an information presentation program capable of providing moving-image information related to a real-world scene in association with the real world as seen from the user's viewpoint.

An information providing system according to the present invention comprises: a camera that is worn by a user and photographs the real world; a position/orientation estimation means that estimates the position and orientation of the real world from the video captured by the camera; a video image transformation means that transforms a video image, which is footage of a real-world scene captured in advance, to match the estimated position and orientation of the real world; and a superimposition means that superimposes the video captured by the camera and the video image transformed by the video image transformation means.

A video display terminal according to the present invention comprises: a camera that is worn by a user and photographs the real world; a transmission means that transmits the real-world video captured by the camera to a server device that estimates the position and orientation of the real world from video of the real world; a video image transformation means that transforms a video image, which is footage of a real-world scene captured in advance, to match the position and orientation of the real world estimated by the server device from the video captured by the camera; and a superimposition means that superimposes the video captured by the camera and the video image transformed by the video image transformation means.

A video reproduction control device according to the present invention comprises: a position/orientation estimation means that estimates the position and orientation of the real world from video received from a terminal device that transmits video of the real world; a segment determination means that determines segment boundaries in the user's actions by comparing the video received from the terminal device with segment information, which is information predetermined as segment boundaries in a video image (footage of a real-world scene captured in advance); a playback control signal generation means that generates, according to the determination result, a playback control signal for controlling playback of the video image in the section specified by the segment information; and an information transmission means that transmits, to the terminal device, information indicating the position and orientation of the real world estimated by the position/orientation estimation means, together with the playback control signal.

An information presentation method according to the present invention comprises: estimating the position and orientation of the real world from video captured by a camera that is worn by a user and photographs the real world; determining segment boundaries in the user's actions by comparing the video captured by the camera with segment information, which is information defined as segment boundaries in a video image (footage of a real-world scene captured in advance); controlling, according to the determination result, playback of the video image in the section specified by the segment information; transforming the video image to match the estimated position and orientation of the real world; and superimposing the video captured by the camera and the transformed video image.

An information presentation program according to the present invention is applied to a computer equipped with a camera that is worn by a user and photographs the real world, and causes the computer to execute: a position/orientation estimation process of estimating the position and orientation of the real world from the video captured by the camera; a segment determination process of determining segment boundaries in the user's actions by comparing the video captured by the camera with segment information, which is information defined as segment boundaries in a video image (footage of a real-world scene captured in advance); a playback control process of controlling, according to the determination result, playback of the video image in the section specified by the segment information; a video image transformation process of transforming the video image to match the estimated position and orientation of the real world; and a superimposition process of superimposing the video captured by the camera and the transformed video image.

According to the present invention, moving-image information such as a video image related to a real-world scene can be provided in association with the real world as seen from the user's viewpoint.

FIG. 1 is a block diagram showing an example of the information providing system according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the operation in the first embodiment.
FIG. 3 is a block diagram showing an example of the information providing system according to the second embodiment of the present invention.
FIG. 4 is a flowchart showing an example of the operation in the second embodiment.
FIG. 5 is an explanatory diagram showing a specific example of providing work support in a specific work environment.
FIG. 6 is an explanatory diagram showing an example of one scene in a video image.
FIG. 7 is a block diagram showing an example of the minimum configuration of the information providing system according to the present invention.
FIG. 8 is a block diagram showing an example of the minimum configuration of the video display terminal according to the present invention.
FIG. 9 is a block diagram showing an example of the minimum configuration of the video reproduction control device according to the present invention.

Embodiments of the present invention will be described below with reference to the drawings.

Embodiment 1.
FIG. 1 is a block diagram showing an example of the information providing system according to the first embodiment of the present invention. The information providing system in this embodiment includes a camera 1, a display device 2, a position/orientation estimation unit 3, a segment estimation unit 4, a segment information storage unit 5, a video image storage unit 6, a playback control unit 7, a position/orientation information storage unit 8, a video image transformation unit 9, and a superimposition unit 10.

The camera 1 captures the real world as seen from the user's line of sight. It is mounted close to the user's eyes and captures an image similar to the real world the user is viewing.

The display device 2 displays information associated with the real world superimposed on the real-world video. For example, the display device 2 presents to the user a video in which the superimposition unit 10, described later, has superimposed a video associated with the real world onto the video captured by the camera 1. Such a mechanism is called video see-through. Alternatively, when the display device 2 includes a half mirror, it may present the user not with the video captured by the camera 1 but with the actual scene seen through the half mirror, with only the associated information superimposed. In this case, the video captured by the camera 1 is used by the position/orientation estimation unit 3, described later, to estimate the spatial position and orientation of the real world.

The position/orientation estimation unit 3 estimates the spatial position and orientation of the real world captured by the camera 1. Specifically, it analyzes the real-world image captured by the camera 1 and estimates the relative spatial position and orientation of the real world being captured. Here, the spatial position of the real world means its position as seen from the camera 1, which can be expressed by the direction of the camera 1 and the distance between the camera 1 and the real world. The orientation of the real world means its orientation as seen from the camera 1, which indicates the degree to which the real world as a whole is rotated.
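The definition above splits the estimate into a direction, a distance, and an amount of rotation. As a minimal sketch, assuming the estimate arrives as a rotation matrix plus a translation vector of the real-world origin in camera coordinates (a common convention that the text does not specify), those three quantities can be read off as follows. The function name `describe_pose` is hypothetical.

```python
import math
from typing import List, Tuple

Matrix3 = List[List[float]]
Vector3 = Tuple[float, float, float]


def describe_pose(rotation: Matrix3, translation: Vector3) -> Tuple[Vector3, float, float]:
    """Split an estimated pose into direction, distance and rotation angle.

    Assumption: `translation` is the real-world origin expressed in camera
    coordinates, and `rotation` maps real-world axes onto camera axes.
    """
    distance = math.sqrt(sum(c * c for c in translation))
    if distance == 0.0:
        raise ValueError("real-world origin coincides with the camera center")
    # Direction in which the camera sees the real-world origin (unit vector).
    direction = tuple(c / distance for c in translation)
    # Overall rotation of the real world as a single angle (axis-angle form).
    trace = rotation[0][0] + rotation[1][1] + rotation[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    angle_deg = math.degrees(math.acos(cos_angle))
    return direction, distance, angle_deg
```

For example, a real world rotated 90 degrees about the viewing axis and placed 2 units straight ahead yields direction (0, 0, 1), distance 2, and a rotation of 90 degrees.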

Methods by which the position/orientation estimation unit 3 can estimate the position and orientation of the real world include using markers placed in the real world and using image features extracted from real-world video. Here, a marker means a graphic designed to be easy to identify.

For example, markers may be placed in advance in the real world to be photographed, and the position/orientation estimation unit 3 may estimate the position and orientation of the real world based on the type, position, and appearance of the markers in the image captured by the camera 1. Such a graphic-marker method can be implemented with software called ARToolKit, for example, which is introduced in Reference 1 below.
[Reference 1] http://www.artoolworks.com/ARToolKit_Professional.html
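For a square planar marker, the mapping from the marker's own plane to its four detected image corners is a homography, from which the marker's position and orientation in the image follow (recovering a full 3D pose additionally requires the camera's intrinsic parameters). The sketch below estimates that homography with the standard direct linear transform, fixing the bottom-right entry to 1. It is a generic plain-Python illustration, not ARToolKit's API, and it assumes corner detection has already been done.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate marker corners")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def marker_homography(marker: Sequence[Point], image: Sequence[Point]) -> List[List[float]]:
    """Homography H (with H[2][2] = 1) mapping marker-plane corners to image corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(marker, image):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h: List[List[float]], p: Point) -> Point:
    """Map a marker-plane point into the image."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

With the marker described as the unit square and its detected corners as input, `apply_homography` then locates any point of the marker plane in the current frame.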

When no markers are placed, the position/orientation estimation unit 3 may, for example, extract image features from the real-world video and match them against the positions of image features registered in advance to estimate the position and orientation of the real world. PTAM is known as software that implements such a markerless method; it is described in Reference 2 below.
[Reference 2] http://www.robots.ox.ac.uk/~gk/PTAM/
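The markerless variant relies on matching features from the current frame to features registered in advance. The sketch below shows only that matching step, using nearest-neighbour search with a ratio test on hypothetical fixed-length descriptors. It is a generic technique, not PTAM's actual tracking method, and the resulting correspondences would still have to be passed to a pose solver.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def match_features(frame: List[Descriptor], registered: List[Descriptor],
                   ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Match frame descriptors to registered descriptors.

    Returns (frame_index, registered_index) pairs whose best match is clearly
    closer than the second-best one; ambiguous matches are discarded.
    """
    if len(registered) < 2:
        raise ValueError("the ratio test needs at least two registered features")
    matches = []
    for i, d in enumerate(frame):
        # Distances to every registered feature, closest first.
        dists = sorted((math.dist(d, r), j) for j, r in enumerate(registered))
        (best, j_best), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j_best))
    return matches
```

A feature lying halfway between two registered features is rejected, which keeps unreliable correspondences out of the pose estimate.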

The video image storage unit 6 stores footage of real-world scenes captured in advance. Such footage is a so-called moving image, for example a video image of an operation performed in the real world. Specific examples include instructional video images that serve as a model for a task, such as how to assemble furniture or how to fillet a fish. In the following, a video image is used as the example of footage of a real-world scene captured in advance.

The video image is related to the real-world video captured by the camera 1. Specifically, it is information displayed over the real-world video captured by the camera 1 so that the user can work along with it. For example, when a user wearing the camera 1 and the display device 2 assembles furniture, the display device 2 displays a video image showing how to assemble the furniture over the video in which the camera 1 captures the assembly in progress. The video image displayed on the display device 2 may be selected in advance according to the task the user performs.

The position/orientation information storage unit 8 stores information, analyzed in advance, on the spatial position and orientation of the real world captured in the video image. Hereinafter, information indicating the spatial position and orientation of the real world in the captured video image is referred to as position/orientation information.

The position/orientation information is analyzed using, for example, the same method the position/orientation estimation unit 3 uses to estimate the current position and orientation of the real world captured by the camera 1. However, the analysis method is not limited to this. For example, a sensor capable of detecting the position and orientation of the real world may be attached to the camera when the video image is shot, and the information detected by that sensor may be used as the position/orientation information. In short, the position/orientation information storage unit 8 stores position/orientation information obtained by any method.

An example of position/orientation information is a transformation matrix for transforming the video. For example, if the video image was shot with a fixed camera, the position/orientation information storage unit 8 may store a single transformation matrix for as long as the camera stays fixed. If the video image was shot with a moving camera, the position/orientation information storage unit 8 may store transformation matrices that change over time. However, the position/orientation information is not limited to a transformation matrix.
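The fixed-camera and moving-camera cases can share one storage layout. In the hypothetical sketch below, each stored matrix applies from its start frame until the next entry, so a fixed-camera shot needs a single entry while a moving-camera shot can carry one entry per frame. The class name and the frame-indexed layout are assumptions for illustration.

```python
import bisect
from typing import List

Matrix3 = List[List[float]]


class PoseTrack:
    """Hypothetical per-video store of transformation matrices.

    Entries are (start_frame, matrix) pairs; a matrix stays valid until the
    next entry's start frame.
    """

    def __init__(self) -> None:
        self._starts: List[int] = []
        self._matrices: List[Matrix3] = []

    def add(self, start_frame: int, matrix: Matrix3) -> None:
        if self._starts and start_frame <= self._starts[-1]:
            raise ValueError("entries must be added in increasing frame order")
        self._starts.append(start_frame)
        self._matrices.append(matrix)

    def matrix_at(self, frame: int) -> Matrix3:
        # Index of the last entry whose start frame is <= frame.
        i = bisect.bisect_right(self._starts, frame) - 1
        if i < 0:
            raise KeyError(f"no matrix recorded at or before frame {frame}")
        return self._matrices[i]
```

Lookup is a binary search, so per-frame entries for a long moving-camera shot stay cheap to query during playback.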

The video image transformation unit 9 transforms the video image so that it matches the estimated position and orientation of the real world, that is, according to the position and orientation estimated by the position/orientation estimation unit 3.

Specifically, the real-world position and orientation of the photographed object (that is, the position/orientation information) are obtained from the video image in advance and stored in the position/orientation information storage unit 8. The video image transformation unit 9 then transforms the video image so that its position/orientation information matches the position and orientation of the real world being captured by the camera 1 (that is, the position and orientation estimated by the position/orientation estimation unit 3). The video image transformation unit 9 may, for example, transform the video image by perspective projection transformation, although the transformation method is not limited to that.
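When the scene is approximately planar, the perspective transformation can be expressed with homographies. Assuming (hypothetically) that the stored position/orientation information is a matrix `h_video` mapping a shared reference plane into the video frame, and that the position/orientation estimation unit 3 supplies `h_camera` mapping the same plane into the live camera frame, one video frame can be warped into the camera view by inverse mapping with nearest-neighbour sampling. This is a plain-Python sketch on small grayscale images, not a production warper.

```python
from typing import List, Optional

Matrix3 = List[List[float]]
Image = List[List[Optional[int]]]


def mat_mul(a: Matrix3, b: Matrix3) -> Matrix3:
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_inv(m: Matrix3) -> Matrix3:
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular transform")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def warp_video_frame(video: Image, h_video: Matrix3, h_camera: Matrix3,
                     out_w: int, out_h: int) -> Image:
    """Warp one stored video frame into the current camera view.

    h_video: reference plane -> video frame (stored position/orientation info).
    h_camera: reference plane -> camera frame (live estimate).
    Output pixels that map outside the video frame are left as None.
    """
    # Inverse mapping: camera pixel -> reference plane -> video pixel.
    m = mat_mul(h_video, mat_inv(h_camera))
    out: Image = [[None] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            w = m[2][0] * x + m[2][1] * y + m[2][2]
            if w == 0:
                continue
            u = round((m[0][0] * x + m[0][1] * y + m[0][2]) / w)
            v = round((m[1][0] * x + m[1][1] * y + m[1][2]) / w)
            if 0 <= v < len(video) and 0 <= u < len(video[0]):
                out[y][x] = video[v][u]  # nearest-neighbour sample
    return out
```

Sampling from each camera pixel back into the video frame, rather than pushing video pixels forward, visits every output pixel exactly once and leaves no holes.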

Transforming the video image in this way makes the current real world and the video image look the same. The user can therefore view a video image that is highly intuitive and easy to understand as information associated with the real world.

The superimposition unit 10 superimposes the video image transformed by the video image transformation unit 9 on the video captured by the camera 1.

For example, the superimposition unit 10 may superimpose a semi-transparent video image at exactly the same position as the real world. Because superimposing a semi-transparent video image on real-world video can make both hard to see, the superimposition unit 10 may process the video image to make it easier to see, that is, into a form distinguishable from the real-world video captured by the camera 1. For example, it may change the color of the video image, emphasize the edges of the video image and make everything other than the edges transparent, or vary the degree of transparency over time.
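Semi-transparent superimposition is commonly done by alpha blending. The sketch below blends a warped grayscale video frame over the camera frame and, as one possible way to realize the time-varying transparency mentioned above, lets the opacity oscillate between two bounds. The bounds, the period, and the function names are hypothetical; pixels with no video content (`None`) keep the camera value.

```python
import math
from typing import List, Optional


def pulsing_alpha(t: float, period: float = 2.0, lo: float = 0.2, hi: float = 0.7) -> float:
    """Opacity that rises from lo to hi and back once per period (assumed values)."""
    phase = 0.5 * (1.0 - math.cos(2.0 * math.pi * t / period))  # 0 -> 1 -> 0
    return lo + (hi - lo) * phase


def blend(camera: List[List[int]], video: List[List[Optional[int]]],
          alpha: float) -> List[List[int]]:
    """Alpha-blend a warped video frame over the camera frame."""
    out = []
    for cam_row, vid_row in zip(camera, video):
        out.append([c if v is None else round((1.0 - alpha) * c + alpha * v)
                    for c, v in zip(cam_row, vid_row)])
    return out
```

Calling `blend(camera_frame, warped_frame, pulsing_alpha(t))` once per displayed frame makes the guidance fade in and out, so neither layer permanently hides the other.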

The superimposition unit 10 may also superimpose the video image with its direction and size aligned but its display position slightly offset. Alternatively, it may superimpose the video image so that it is always displayed at the edge of the screen.

The segment information storage unit 5 stores segment information, which is information predetermined as segment boundaries in the video image stored in the video image storage unit 6. The segment information storage unit 5 may store, for example, the frame at each boundary as segment information. However, segment information is not limited to video; any information from which a boundary in the video image can be identified may be used. For example, features obtained from the video, rather than the video itself, may serve as segment information. In that case, it is desirable to extract features that are easy to use for matching against video.

The segment estimation unit 4 determines the segment boundaries of the user's actions by comparing the real-world video with the segment information. When it determines that the user's action has reached a boundary, it sends information to that effect to the playback control unit 7.

Specifically, the segment estimation unit 4 first transforms the real-world video captured by the camera 1 to align it with the video image, using the position and orientation estimated by the position/orientation estimation unit 3 and the position/orientation information corresponding to the video image. It then compares the transformed real-world video with the segment information stored in the segment information storage unit 5 to estimate the progress of the user's action. In the following description, segment information is also referred to as the final state.

For example, suppose that the video image is divided in advance into action units, and that the segment information storage unit 5 stores, as segment information, a frame showing the final state of each action unit. The segment estimation unit 4 then compares the frame showing the final state of the section containing the current action with the real-world video captured by the camera 1. When it determines that the two match, it regards the action in the current section as finished and sends a signal for advancing to the next section (hereinafter, a playback control signal) to the playback control unit 7.
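The comparison against a final-state frame can be sketched as an image-similarity test. Below, the camera frame is assumed to have already been warped into the video image's coordinates, and a mean absolute grey-level difference under a hypothetical threshold counts as a match. The function name and threshold are illustrative; a practical system would likely need a more robust similarity measure.

```python
from typing import List, Optional


def reached_final_state(aligned_frame: List[List[Optional[int]]],
                        final_state: List[List[int]],
                        threshold: float = 10.0) -> bool:
    """Return True when the aligned camera frame matches the final-state frame.

    aligned_frame: camera frame warped into video-image coordinates, with None
    where no camera pixel maps. Only overlapping pixels are compared.
    """
    total, count = 0, 0
    for a_row, f_row in zip(aligned_frame, final_state):
        for a, f in zip(a_row, f_row):
            if a is not None:
                total += abs(a - f)
                count += 1
    if count == 0:
        return False  # nothing to compare, so do not advance
    return total / count < threshold
```

A True result corresponds to emitting the playback control signal that advances to the next section.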

The above description covers the case where the segment estimation unit 4 compares a frame showing the final state with the real-world video captured by the camera 1. However, what the segment estimation unit 4 compares with the real-world video is not limited to video; it may instead determine a boundary by comparing information indicating features of the final state with the real-world video.

The playback control unit 7 controls playback of the video image in the section specified by the segment information, according to the boundaries determined by the segment estimation unit 4. For example, the playback control unit 7 repeatedly plays the video image corresponding to the section up to the currently determined boundary.

Specifically, when the segment estimation unit 4 determines that the segment information matches the video captured by the camera 1, the playback control unit 7 plays the video image of the next section. When the segment estimation unit 4 determines that they do not match, the playback control unit 7 repeatedly plays the video image of the current section.

When repeatedly playing the video image of a section, the playback control unit 7 may pause for a fixed period after each rewind (that is, each time playback completes) before playing it again. The playback control unit 7 may also change the playback speed with each repetition (for example, playing more slowly). By lowering the playback speed as the number of repetitions increases, a video image that is too fast for the user can be gradually slowed down, which makes the provided information easier to understand.
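The repeat-or-advance policy of the playback control unit 7 amounts to a small state machine. In the hypothetical sketch below, it is consulted each time a section's clip finishes: if a segment boundary has been reached it moves to the next section at normal speed, and otherwise it replays the same section after a pause, slowing down by a fixed factor per repeat until a floor is reached. The pause length, slowdown factor, and floor are assumed values, not taken from the source.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PlaybackController:
    """Hypothetical repeat/advance policy for the playback control unit 7."""
    num_sections: int
    pause_s: float = 1.0     # idle time before each repeat (assumed value)
    slowdown: float = 0.8    # speed multiplier per repeat (assumed value)
    min_speed: float = 0.25  # slowest allowed playback speed (assumed value)
    section: int = 0
    repeats: int = 0

    def speed(self) -> float:
        return max(self.min_speed, self.slowdown ** self.repeats)

    def on_clip_end(self, boundary_reached: bool) -> Optional[Tuple[int, float, float]]:
        """Return (section to play, playback speed, pause before playing), or None when done."""
        if boundary_reached:
            if self.section == self.num_sections - 1:
                return None  # final section completed
            self.section += 1
            self.repeats = 0
            return self.section, self.speed(), 0.0
        self.repeats += 1
        return self.section, self.speed(), self.pause_s
```

Resetting the repeat count on each advance means every new section starts again at normal speed.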

In this way, the position/orientation estimation unit 3 and the video image transformation unit 9 compensate for the difference in position and orientation between the real world and the video image, and the playback control unit 7 controls playback of the video image while the delimiter estimation unit 4 tracks the progress of the work. As a result, information such as work procedures linked to the real world can be displayed in augmented reality using video images that are easy to create.

The delimiter information storage unit 5, the video image storage unit 6, and the position/orientation information storage unit 8 are each implemented by a magnetic disk or the like.

The position/orientation estimation unit 3, the delimiter estimation unit 4, the playback control unit 7, and the video image transformation unit 9 are implemented by the CPU of a computer operating according to a program (the information providing program). For example, the program may be stored in a storage unit (not shown) in the information providing system, and the CPU may read the program and operate, according to it, as the position/orientation estimation unit 3, the delimiter estimation unit 4, the playback control unit 7, and the video image transformation unit 9.

The information providing system may also be split into a video display terminal that displays video and a video playback control device that controls the video to be played. In this case, for example, the video display terminal includes the camera 1, the display device 2, the video image transformation unit 9, and the superimposing unit 10, and the video playback control device includes the position/orientation estimation unit 3, the delimiter estimation unit 4, the playback control unit 7, the delimiter information storage unit 5, the video image storage unit 6, and the position/orientation information storage unit 8.

In this configuration, a control unit (not shown) may transmit the real-world video captured by camera 1 to the video playback control device and have that device estimate the position and orientation of the real world. The playback control unit 7 may then generate a signal that controls playback of the video image (that is, a playback control signal) and transmit it to the video display terminal.
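The document does not specify how the playback control signal is encoded. As a minimal sketch, assuming a JSON message, a hypothetical `PlaybackControlMessage` could bundle the estimated real-world pose with the playback command so the terminal can both warp and schedule the video image; every field name below is an illustrative assumption.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class PlaybackControlMessage:
    """Hypothetical message from the video playback control device to the terminal."""
    frame_id: int             # camera frame the estimate was computed from
    rotation: List[float]     # 3x3 rotation matrix of the real world, row-major
    translation: List[float]  # translation vector of the real world
    section_index: int        # section of the video image to play
    command: str              # "advance" or "repeat"

    def __post_init__(self):
        # Reject malformed messages early instead of failing later on the terminal.
        if self.command not in ("advance", "repeat"):
            raise ValueError(f"unknown command: {self.command}")
        if len(self.rotation) != 9 or len(self.translation) != 3:
            raise ValueError("rotation needs 9 values and translation 3")

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "PlaybackControlMessage":
        return cls(**json.loads(text))
```

Carrying the frame identifier lets the terminal pair each pose estimate with the camera frame it was computed from, which matters when network latency delays the reply.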

Alternatively, each of the position/orientation estimation unit 3, the delimiter estimation unit 4, the playback control unit 7, and the video image transformation unit 9 may be implemented by dedicated hardware.

Next, the operation is described. FIG. 2 is a flowchart showing an example of the operation in the first embodiment.

First, the position/orientation estimation unit 3 analyzes the image captured by camera 1 to estimate the position and orientation of the real world relative to the user (step S1). Next, using the real-world position and orientation estimated by the position/orientation estimation unit 3 and the position/orientation information associated with the video image, the delimiter estimation unit 4 transforms the real-world video captured by camera 1 so that it matches the viewpoint of the video image. The delimiter estimation unit 4 then compares the transformed real-world video with the final state of the current section stored in the delimiter information storage unit 5, and determines whether the user's progress has reached the end of the section (that is, the delimiter) (step S2).

If the user's action is determined to have reached the end of the section (Yes in step S2), the playback control unit 7 advances the playback position of the video image to the section that ends at the next delimiter (step S3). While the playback control unit 7 plays the video image of the section up to the current delimiter, the video image transformation unit 9 transforms the video image to match the real-world position and orientation estimated by the position/orientation estimation unit 3 (step S4). Finally, the superimposing unit 10 superimposes the transformed video image on the video captured by camera 1 and displays the result on the display device 2 (step S5).

If, on the other hand, the user's action is determined in step S2 not to have reached the end of the section (No in step S2), the playback control unit 7 continues repeating the video image of the section being played, and the processing from step S4 onward is performed.
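The loop of steps S1 to S5 could be organized per camera frame roughly as below. This is a hypothetical sketch: `ARGuide` and the injected callables `estimate_pose`, `reached_delimiter`, `warp` and `superimpose` are placeholders standing in for units 3, 4, 9 and 10, and each section is assumed to be a list of already decoded video frames.

```python
from typing import Any, Callable, List


class ARGuide:
    """Hypothetical per-frame driver for steps S1 to S5 of FIG. 2."""

    def __init__(self, sections: List[List[Any]],
                 estimate_pose: Callable[[Any], Any],
                 reached_delimiter: Callable[[Any, Any, int], bool],
                 warp: Callable[[Any, Any], Any],
                 superimpose: Callable[[Any, Any], Any]):
        self.sections = sections          # each section: list of video frames
        self.estimate_pose = estimate_pose
        self.reached_delimiter = reached_delimiter
        self.warp = warp
        self.superimpose = superimpose
        self.index = 0                    # current section
        self.pos = 0                      # frame counter within the section

    def step(self, camera_frame: Any) -> Any:
        pose = self.estimate_pose(camera_frame)                       # S1
        if self.reached_delimiter(camera_frame, pose, self.index):    # S2: Yes
            if self.index + 1 < len(self.sections):
                self.index += 1                                       # S3: next section
                self.pos = 0
        frames = self.sections[self.index]
        # The modulo wraps around, so a section repeats until S2 answers Yes.
        video_frame = frames[self.pos % len(frames)]
        self.pos += 1
        warped = self.warp(video_frame, pose)                         # S4
        return self.superimpose(camera_frame, warped)                 # S5
```

Injecting the units as callables keeps the control flow of the flowchart separate from any particular pose estimator or image library.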

Next, the effects of this embodiment are described. In this embodiment, the position/orientation estimation unit 3 estimates the position and orientation of the real world from the video captured by camera 1. The delimiter estimation unit 4 compares the delimiter information with the video captured by the camera to determine the delimiters of the user's actions, and the playback control unit 7 controls playback of the video image of the section specified by the delimiter information according to the result. The video image transformation unit 9 transforms the video image to match the estimated real-world position and orientation, and the superimposing unit 10 superimposes the transformed video image on the video captured by camera 1. Moving-image information, such as a video image related to a real-world scene, can therefore be provided in association with the real world as seen from the user's viewpoint.

That is, in this embodiment, the position/orientation estimation unit 3 estimates the position and orientation of the real world as seen by the user, and the video image transformation unit 9 and the playback control unit 7 play the video image while transforming it to match that position and orientation. The user can therefore work while easily comparing the instructional video image with the real world.

Furthermore, in this embodiment, the video image is divided in advance at the end of each series of actions, and the delimiter estimation unit 4 and the playback control unit 7 play it back section by section as the actions progress. The video image is therefore synchronized in time with the user's actual actions at each delimiter. Existing video images can also be used as-is as the information presented in augmented reality.

Embodiment 2.
FIG. 3 is a block diagram showing an example of the information providing system according to the second embodiment of the present invention. Components identical to those of the first embodiment are given the same reference numerals as in FIG. 1, and their description is omitted. Instead of the delimiter information storage unit 5, the video image storage unit 6, and the position/orientation information storage unit 8 of the first embodiment, the information providing system of this embodiment includes a first video information storage unit 12 and a second video information storage unit 13. It also includes a video information selection unit 11 in addition to the components of the first embodiment.

The first video information storage unit 12 stores information consisting of a set of a video image, the delimiter information corresponding to that video image, and position/orientation information. Hereinafter, such information is called video information, and the video information stored in the first video information storage unit 12 is called first video information.

The second video information storage unit 13 stores information consisting of a set of a video image different from that of the first video information, the delimiter information corresponding to that video image, and position/orientation information. Hereinafter, the video information stored in the second video information storage unit 13 is called second video information.

The first video information storage unit 12 and the second video information storage unit 13 each contain the equivalent of the delimiter information storage unit 5, the video image storage unit 6, and the position/orientation information storage unit 8 of the first embodiment. The video information stored in the two units includes information to be provided in different scenes. The first video information storage unit 12 and the second video information storage unit 13 are each implemented by a magnetic disk or the like.

In the example shown in FIG. 3, each piece of video information is stored in a different storage device (specifically, the first video information storage unit 12 and the second video information storage unit 13). However, the pieces of video information need not be stored in separate storage devices and may be stored in a single storage device.

The video information selection unit 11 compares the real-world video with the video images in the multiple pieces of video information and selects the most suitable video information. Specifically, it selects, from the multiple pieces of video information, the one containing the video image most similar to the video captured by camera 1. For example, the video information selection unit 11 compares the video images in the first and second video information with the real-world video and selects the video information containing the more suitable video image. The video information selection unit 11 is implemented, for example, by a CPU operating according to a program.

The method by which the video information selection unit 11 selects a video image similar to the real-world video is as follows. First, using the real-world position and orientation estimated by the position/orientation estimation unit 3, together with the position/orientation information in the first video information and in the second video information, the video information selection unit 11 transforms the real-world video captured by camera 1 to match the viewpoint of each video image (that is, of the first video information and of the second video information). It then compares each transformed real-world video with the corresponding video image and selects the more similar video image.

The above description assumes that the video information selection unit 11 compares the transformed real-world video with the video images themselves. However, what the video information selection unit 11 compares with the transformed real-world video is not limited to images: it may instead compare information describing the features of each video image with the transformed real-world video to select the more suitable video information.

In the following description, as illustrated in FIG. 3, the video information selection unit 11 compares two pieces of video information: the first video information and the second video information. However, the number of pieces of video information is not limited to two and may be three or more. In that case, the video information selection unit 11 selects the video information containing the most similar video image.
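The selection step might be sketched as follows for any number of candidates. Several assumptions are made here: each piece of video information is reduced to a single representative `key_frame`, images are small grayscale lists of rows, similarity is the mean absolute pixel difference (the document does not name a measure), and the viewpoint warp is passed in as a hypothetical callable `warp_to_pose`.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel values


@dataclass
class VideoInfo:
    name: str
    key_frame: Image          # representative frame of the video image (assumption)
    delimiters: List[float]   # delimiter times in seconds
    pose: Any = None          # position/orientation information for key_frame


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute pixel difference of two equally sized images."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / sum(len(r) for r in a)


def select_video_info(camera_frame: Image, camera_pose: Any,
                      candidates: Sequence[VideoInfo],
                      warp_to_pose: Callable[[Image, Any, Any], Image]) -> VideoInfo:
    """Pick the candidate whose key frame best matches the camera frame once the
    camera frame has been warped into that candidate's viewpoint."""
    def score(info: VideoInfo) -> float:
        warped = warp_to_pose(camera_frame, camera_pose, info.pose)
        return mean_abs_diff(warped, info.key_frame)
    return min(candidates, key=score)
```

Because the camera frame is warped separately into each candidate's viewpoint before scoring, candidates recorded from different camera positions are compared on equal terms.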

Next, the operation is described. FIG. 4 is a flowchart showing an example of the operation in the second embodiment. First, as in the first embodiment, the position/orientation estimation unit 3 analyzes the image captured by camera 1 to estimate the position and orientation of the real world relative to the user (step S1). The video information selection unit 11 then selects, from the first and second video information, the video information containing the video image that is more similar to the real-world video (step S10).

The subsequent processing, up to transforming the selected video image and displaying it superimposed on the real-world video, is the same as steps S2 to S5 in FIG. 2.

Next, the effects of this embodiment are described. In this embodiment, the video information selection unit 11 selects, from multiple pieces of video information, the one containing the video image most similar to the video captured by camera 1 (that is, the current real-world video). Simply by preparing a variety of video images in advance, video suited to the current real-world situation can therefore be displayed automatically.

The present invention is described below with reference to specific examples, but the scope of the invention is not limited to what is described below.

FIG. 5 is an explanatory diagram showing a specific example of work support in a particular work environment. In the environment illustrated in FIG. 5, a user 110 wears a wearable camera 111 and a wearable display device 112. The wearable camera 111 corresponds to camera 1 in FIG. 1, and the wearable display device 112 corresponds to the display device 2 in FIG. 1.

The information providing system in this example is implemented by the wearable camera 111, the wearable display device 112, and a small personal computer equipped with a CPU and memory. In addition to the wearable camera 111 and the wearable display device 112, the system includes the position/orientation estimation unit 3, the delimiter estimation unit 4, the delimiter information storage unit 5, the video image storage unit 6, the playback control unit 7, the position/orientation information storage unit 8, the video image transformation unit 9, and the superimposing unit 10 of the first embodiment. In FIG. 5, however, components other than the wearable camera 111 and the wearable display device 112 are not shown.

This example assumes that an instructional (teacher) video of the work is used as the video image, and describes how work support is provided using it.

First, the wearable camera 111, mounted on top of the wearable display device 112, captures video that approximates what the user 110 sees. The captured video contains the markers 101 to 104 and the work target 120.

The position/orientation estimation unit 3 detects the markers in the captured video. It then integrates the information representing the position and orientation of each of the markers 101 to 104, obtained from the captured video, to estimate the position and orientation of the real world relative to the user 110. Here, the orientation of a marker means the degree to which the marker is rotated with respect to the real world.

Placing multiple markers makes it possible to reduce the error in the estimated real-world position and orientation, for example by averaging the positions and orientations obtained from the individual markers. It also allows the real-world position and orientation to be estimated even when some of the markers are not visible.
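Averaging is straightforward for positions but needs care for orientations. As a minimal sketch, assuming each visible marker has already been converted into an estimate of the same real-world pose (position plus unit quaternion `(w, x, y, z)`), the hypothetical `average_pose` below takes the arithmetic mean of the positions and a sign-aligned, normalized mean of the quaternions. The quaternion mean is a common approximation that is reasonable when the per-marker estimates are close together; it is a choice made for this sketch, not a method stated in the document.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), unit length


def average_pose(marker_poses: List[Tuple[Vec3, Quat]]) -> Tuple[Vec3, Quat]:
    """Fuse per-marker pose estimates; works with any subset of visible markers."""
    if not marker_poses:
        raise ValueError("no markers visible")
    n = len(marker_poses)
    pos = tuple(sum(p[i] for p, _ in marker_poses) / n for i in range(3))
    ref = marker_poses[0][1]
    acc = [0.0, 0.0, 0.0, 0.0]
    for _, q in marker_poses:
        # q and -q are the same rotation: flip into ref's hemisphere before summing.
        sign = 1.0 if sum(a * b for a, b in zip(q, ref)) >= 0 else -1.0
        for i in range(4):
            acc[i] += sign * q[i]
    norm = math.sqrt(sum(c * c for c in acc))
    return pos, tuple(c / norm for c in acc)
```

Because the function accepts however many estimates are available, markers hidden by the user's hands simply drop out of the average.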

FIG. 6 is an explanatory diagram showing an example of one scene in a video image stored in the video image storage unit 6. The video image illustrated in FIG. 6 shows markers 201 to 204 and a work target 220, which correspond to the markers 101 to 104 and the work target 120 illustrated in FIG. 5. This video image is divided in advance at each delimiter of the work procedure, and the information that divides the video image (that is, the delimiter information) is stored in the delimiter information storage unit 5.

In addition, the markers 201 to 204 illustrated in FIG. 6 are detected to analyze the position/orientation information, and the result is stored in advance in the position/orientation information storage unit 8. Position/orientation information for scenes other than the one illustrated in FIG. 6 is analyzed in the same way and stored in the position/orientation information storage unit 8.

Based on the real-world position and orientation estimated from the markers 101 to 104 illustrated in FIG. 5, the delimiter estimation unit 4 transforms the real-world video so that it matches the real-world position and orientation estimated from the markers 201 to 204 at the delimiter (final state) of the video image illustrated in FIG. 6. The delimiter estimation unit 4 then compares the video image showing the delimiter (final state) with the transformed real-world video to determine whether the user's progress has reached the end of the section.
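The comparison with the final state could be as simple as a thresholded image difference. The sketch below is hypothetical: the document does not state a similarity measure or threshold, so `threshold` (mean absolute pixel difference) and the debounce parameter `hold_frames`, which requires the match to persist over several consecutive frames so that a single noisy frame does not advance playback, are both assumptions.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


class DelimiterEstimator:
    """Hypothetical delimiter check: the warped camera frame must stay close to
    the section's final-state frame for `hold_frames` consecutive frames."""

    def __init__(self, threshold: float = 10.0, hold_frames: int = 3):
        self.threshold = threshold      # max mean abs difference counted as "same state"
        self.hold_frames = hold_frames  # consecutive matches required (debounce)
        self._streak = 0

    @staticmethod
    def mean_abs_diff(a: Image, b: Image) -> float:
        total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
        return total / sum(len(r) for r in a)

    def update(self, warped_camera: Image, final_state: Image) -> bool:
        """Feed one warped camera frame; True once the delimiter is judged reached."""
        if self.mean_abs_diff(warped_camera, final_state) <= self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        return self._streak >= self.hold_frames
```

In practice the comparison would likely be restricted to the region around the work target, since lighting changes elsewhere in the frame would otherwise dominate the difference.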

If the user's progress is determined to have reached the end of the section, the playback control unit 7 advances the playback position of the video image to the section up to the next delimiter and plays the video image of the current section. During playback, the video image transformation unit 9 transforms the video image to match the real-world position and orientation estimated by the position/orientation estimation unit 3. The superimposing unit 10 then superimposes the transformed video image on the video captured by camera 1 and displays the result on the wearable display device 112.
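When the markers lie on a common plane, one way to transform a video frame so that it matches the camera's viewpoint is a planar homography computed from corresponding marker points (for example, points of markers 101 to 104 in the camera image and of markers 201 to 204 in the video frame). The pure-Python sketch below is an assumption-laden illustration, not the patent's method: it solves the standard 8-unknown linear system for four point pairs and warps a small grayscale image by inverse mapping with nearest-neighbour sampling. A production system would more likely use a library routine such as OpenCV's homography functions.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("degenerate point correspondences")
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix H (with H[2][2] = 1) mapping four src points onto four dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(H: List[List[float]], x: float, y: float) -> Point:
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def warp_image(img: List[List[float]], H_out_to_img: List[List[float]],
               out_h: int, out_w: int, fill: Optional[float] = None):
    """For each output (camera) pixel, sample the video frame at the mapped point."""
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            sx, sy = apply_h(H_out_to_img, x, y)
            ix, iy = round(sx), round(sy)
            inside = 0 <= iy < len(img) and 0 <= ix < len(img[0])
            row.append(img[iy][ix] if inside else fill)
        out.append(row)
    return out
```

Computing the homography from camera points to video points gives the inverse mapping directly, so every output pixel is filled (or marked with `fill`) without holes.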

The above description covers the case in which an instructional video of the work is used as the video image to support the work. The information providing system according to the present invention can also be applied, for example, to route guidance.

When the information providing system according to the present invention is used for route guidance, an instructional video is prepared in advance, for example by filming the route while walking with a handheld camera and explaining landmark buildings and the like with narration or subtitles. In this case, the position/orientation estimation unit 3 estimates the current position and orientation using feature points present in the video (buildings, signboards, scenery, and so on). The video image may also be divided at each action, such as walking a certain distance or turning a corner. When the user reaches the final scene of a section, the video image of the next section is transformed to match the scenery the user is viewing and is displayed.

Next, the minimum configuration of the present invention is described. FIG. 7 is a block diagram showing an example of the minimum configuration of the information providing system according to the present invention. FIG. 8 is a block diagram showing an example of the minimum configuration of the video display terminal according to the present invention. FIG. 9 is a block diagram showing an example of the minimum configuration of the video playback control device according to the present invention.

The information providing system according to the present invention includes: a camera 81 (for example, camera 1) worn by a user to capture the real world; position/orientation estimation means 82 (for example, the position/orientation estimation unit 3) for estimating the position and orientation of the real world (for example, the direction of the camera, its distance, and the degree of rotation) from the video captured by the camera 81; video image transformation means 83 (for example, the video image transformation unit 9) for transforming a video image, that is, video of a real-world scene captured in advance (for example, a video image stored in the video image storage unit 6), to match the estimated real-world position and orientation; and superimposing means 84 (for example, the superimposing unit 10) for superimposing the video image transformed by the video image transformation means 83 on the video captured by the camera 81.

This configuration makes it possible to provide moving-image information, such as a video image related to a real-world scene, in association with the real world as seen from the user's viewpoint.

The information providing system may further include: delimiter determination means (for example, the delimiter estimation unit 4) for comparing delimiter information, which is information predetermined as delimiters in the video image (for example, the delimiter information stored in the delimiter information storage unit 5, that is, the final states), with the video captured by the camera 81 to determine the delimiters of the user's actions; and playback control means (for example, the playback control unit 7) for controlling, according to the determination result, playback of the video image in the section specified by the delimiter information.

The delimiter determination means may determine the delimiters of the user's actions by determining whether the state in the video image indicated by the delimiter information is the same as the state shown in the video captured by the camera 81. The playback control means may then play the video image of the section following the delimiter specified by the delimiter information when the delimiter determination means determines that the states are the same, and repeatedly play the video image of the section up to that delimiter when it determines that they are not.

The playback control means may also repeatedly play the video image of the section specified by the delimiter information while changing the playback speed on each repetition (for example, lowering the playback speed as the number of repetitions increases). This makes the provided information easier for the user to understand.

The information providing system may also include video information storage means (for example, the first video information storage unit 12 and the second video information storage unit 13) for storing multiple pieces of video information, each being a set of a video image and delimiter information, and video information selection means (for example, the video information selection unit 11) for selecting, from the multiple pieces of video information, the one containing the video image most similar to the video captured by the camera 81. With this configuration, video suited to the real world can be displayed automatically simply by preparing a variety of video images in advance.

The superimposing means 84 may also process the video image so that it can be distinguished from the video captured by the camera 81 (for example, by changing the color of the video image, by emphasizing its edges and making everything except the edges transparent, or by varying its degree of translucency over time).
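Two of the distinguishing treatments listed above, translucency that varies over time and edge-only overlay, can be sketched on small grayscale images as follows. This is a hypothetical illustration: the cosine opacity schedule, the simple forward-difference edge measure, and the parameters `period`, `lo`, `hi` and `edge_thresh` are all assumptions, and `None` marks pixels not covered by the warped video image.

```python
import math
from typing import List, Optional


def blend_alpha(t: float, period: float = 2.0, lo: float = 0.3, hi: float = 0.7) -> float:
    """Opacity of the video image, oscillating between lo and hi every `period` seconds."""
    return lo + (hi - lo) * 0.5 * (1 - math.cos(2 * math.pi * t / period))


def superimpose(camera: List[List[float]], video: List[List[Optional[float]]],
                t: float, edge_only: bool = False, edge_thresh: float = 30.0):
    a = blend_alpha(t)
    out = []
    for y, (crow, vrow) in enumerate(zip(camera, video)):
        row = []
        for x, (c, v) in enumerate(zip(crow, vrow)):
            if v is None:                 # pixel not covered by the warped video image
                row.append(c)
                continue
            if edge_only:
                # Forward-difference edge strength; non-edge pixels stay transparent.
                right = vrow[x + 1] if x + 1 < len(vrow) and vrow[x + 1] is not None else v
                below = (video[y + 1][x]
                         if y + 1 < len(video) and video[y + 1][x] is not None else v)
                if abs(right - v) + abs(below - v) < edge_thresh:
                    row.append(c)
                    continue
            row.append((1 - a) * c + a * v)   # alpha-blend the video over the camera
        out.append(row)
    return out
```

Keeping the opacity between `lo` and `hi` rather than fading fully in and out means the instructional video never disappears completely, while the pulsing still marks it as an overlay.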

A video display terminal according to the present invention includes: a camera 71 (for example, camera 1) worn by a user to capture the real world; video transmission means 72 (for example, a control unit, not shown) for transmitting the video captured by the camera 71 to a server device 60 (for example, the video playback control device) that estimates the position and orientation of the real world from captured video; video image transformation means 73 (for example, the video image transformation unit 9) for transforming a video image, that is, video of a real-world scene captured in advance, to match the real-world position and orientation estimated by the server device 60 from the video captured by the camera 71; and superimposing means 74 for superimposing the video image transformed by the video image transformation means 73 on the video captured by the camera 71.

A video reproduction control device according to the present invention includes: position and orientation estimation means 61 (for example, the position and orientation estimation unit 3) that estimates the real-world position and orientation from video received from a terminal device 70 (for example, a video display terminal) that transmits video of the real world; delimiter determination means 62 (for example, the delimiter estimation unit 4) that determines a delimiter in the user's actions by comparing the video received from the terminal device 70 with delimiter information (for example, delimiter information stored in the delimiter information storage unit 5), that is, information predetermined as a delimiter within a video image of the real-world scene captured in advance (for example, a video image stored in the video image storage unit 6); reproduction control signal generation means 63 (for example, the reproduction control unit 7) that, according to the determination result, generates a reproduction control signal for controlling playback of the section of the video image specified by the delimiter information; and information transmission means 64 (for example, the reproduction control unit 7) that transmits, to the terminal device 70, information indicating the real-world position and orientation estimated by the position and orientation estimation means 61 together with the reproduction control signal.
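Supplementary notes 3, 4 and 13 below state the playback rule: when the user's state matches the state at the delimiter, play the next section; otherwise replay the section up to the delimiter, changing the playback speed on repeats. The sketch below assumes delimiters are stored as frame indices, that a state "matches" when the mean absolute pixel difference falls below a threshold, and that each repeat slows playback by a factor of 0.75, floored at 0.25x. The class names, the JSON message layout, and these numbers are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PlaybackControl:
    """Hypothetical reproduction control signal sent to the terminal."""
    action: str       # "advance", "repeat", or "done"
    start_frame: int  # first frame of the section to play
    end_frame: int    # delimiter frame that closes the section
    speed: float      # playback-rate multiplier


def states_match(camera_frame: list[list[int]], delimiter_frame: list[list[int]],
                 threshold: float = 10.0) -> bool:
    """Assumed delimiter test: mean absolute pixel difference below a threshold."""
    diffs = [abs(a - b) for ra, rb in zip(camera_frame, delimiter_frame)
             for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs) < threshold


class PlaybackController:
    def __init__(self, delimiters: list[int], slowdown: float = 0.75,
                 min_speed: float = 0.25) -> None:
        self.bounds = [0] + delimiters  # section i spans bounds[i]..bounds[i + 1]
        self.section = 0                # section currently being played
        self.repeats = 0                # times the current section has been replayed
        self.slowdown = slowdown
        self.min_speed = min_speed

    def on_section_end(self, state_matches: bool) -> PlaybackControl:
        """Decide what to play once the current section reaches its delimiter."""
        last = len(self.bounds) - 1  # number of sections
        if self.section < last:
            if state_matches:
                self.section += 1  # user caught up: move to the next section
                self.repeats = 0
            else:
                self.repeats += 1  # user not there yet: replay this section
        if self.section >= last:
            end = self.bounds[-1]
            return PlaybackControl("done", end, end, 1.0)
        speed = max(self.min_speed, self.slowdown ** self.repeats)
        action = "repeat" if self.repeats else "advance"
        return PlaybackControl(action, self.bounds[self.section],
                               self.bounds[self.section + 1], speed)


def to_message(pose: dict, control: PlaybackControl) -> str:
    """Bundle the estimated pose and the control signal (hypothetical JSON format)."""
    return json.dumps({"pose": pose, "control": asdict(control)})
```

Slowing down on each repeat is only one reading of "changing the playback speed"; the claims do not say whether the speed should rise or fall.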

These configurations, too, can provide moving-image information, such as video related to a real-world scene, in association with the real world as seen from the user's viewpoint.

Part or all of the above embodiments can also be described as in the following supplementary notes, but are not limited to them.

(Supplementary note 1) An information presentation system comprising: a camera that is worn by a user and captures the real world; position and orientation estimation means for estimating the real-world position and orientation from the video captured by the camera; video image transformation means for transforming a video image, which is a video of a real-world scene captured in advance, to match the estimated real-world position and orientation; and superimposing means for superimposing the video captured by the camera and the video image transformed by the video image transformation means.

(Supplementary note 2) The information presentation system according to supplementary note 1, further comprising: delimiter determination means for determining a delimiter in the user's actions by comparing delimiter information, which is information predetermined as a delimiter within the video image, with the video captured by the camera; and reproduction control means for controlling, according to the determination result, playback of the section of the video image specified by the delimiter information.

(Supplementary note 3) The information presentation system according to supplementary note 2, wherein the delimiter determination means determines the delimiter in the user's actions by determining whether the state in the video image indicated by the delimiter information is the same as the state shown in the video captured by the camera, and the reproduction control means, when the delimiter determination means determines that the states are the same, controls playback so that the section of the video image following the delimiter specified by the delimiter information is played, and, when the delimiter determination means determines that the states are not the same, controls playback so that the section of the video image up to the delimiter specified by the delimiter information is played repeatedly.

(Supplementary note 4) The information presentation system according to supplementary note 2 or 3, wherein the reproduction control means controls playback so that the section of the video image specified by the delimiter information is played repeatedly, and changes the playback speed when playing that section repeatedly.

(Supplementary note 5) The information presentation system according to any one of supplementary notes 2 to 4, further comprising: video information storage means for storing multiple pieces of video information, each pairing a video image with delimiter information; and video information selection means for selecting, from the multiple pieces of video information, the video information containing the video image most similar to the video captured by the camera.

(Supplementary note 6) The information presentation system according to any one of supplementary notes 1 to 5, wherein the superimposing means processes the video image so that it can be distinguished from the video captured by the camera.

(Supplementary note 7) The information presentation system according to any one of supplementary notes 1 to 6, wherein the video image transformation means transforms the video image so that position and orientation information, which indicates the real-world position and orientation in the video image, is consistent with the estimated real-world position and orientation.

(Supplementary note 8) The information presentation system according to any one of supplementary notes 1 to 7, wherein the position and orientation estimation means extracts marker positions from the video captured by the camera and estimates the real-world position and orientation from them.

(Supplementary note 9) The information presentation system according to any one of supplementary notes 1 to 7, wherein the position and orientation estimation means estimates the real-world position and orientation based on feature points extracted from the video captured by the camera.

(Supplementary note 10) A video display terminal comprising: a camera that is worn by a user and captures the real world; transmission means for transmitting the real-world video captured by the camera to a server device that estimates the real-world position and orientation from video of the real world; video image transformation means for transforming a video image, which is a video of a real-world scene captured in advance, to match the real-world position and orientation estimated by the server device from the video captured by the camera; and superimposing means for superimposing the video captured by the camera and the video image transformed by the video image transformation means.

(Supplementary note 11) The video display terminal according to supplementary note 10, wherein the superimposing means processes the video image so that it can be distinguished from the video captured by the camera.

(Supplementary note 12) A video reproduction control device comprising: position and orientation estimation means for estimating the real-world position and orientation from video received from a terminal device that transmits video of the real world; delimiter determination means for determining a delimiter in the user's actions by comparing delimiter information, which is information predetermined as a delimiter within a video image that is a video of a real-world scene captured in advance, with the video received from the terminal device; reproduction control signal generation means for generating, according to the determination result, a reproduction control signal for controlling playback of the section of the video image specified by the delimiter information; and information transmission means for transmitting, to the terminal device, information indicating the real-world position and orientation estimated by the position and orientation estimation means together with the reproduction control signal.

(Supplementary note 13) The video reproduction control device according to supplementary note 12, wherein the delimiter determination means determines the delimiter in the user's actions by determining whether the state in the video image indicated by the delimiter information is the same as the state shown in the video captured by the camera, and the reproduction control means, when the delimiter determination means determines that the states are the same, generates a reproduction control signal for playing the section of the video image following the delimiter specified by the delimiter information, and, when the delimiter determination means determines that the states are not the same, generates a reproduction control signal for repeatedly playing the section of the video image up to the delimiter specified by the delimiter information.

(Supplementary note 14) An information presentation method comprising: estimating the real-world position and orientation from the video captured by a camera that is worn by a user and captures the real world; determining a delimiter in the user's actions by comparing delimiter information, which is information defined as a delimiter within a video image that is a video of a real-world scene captured in advance, with the video captured by the camera; controlling, according to the determination result, playback of the section of the video image specified by the delimiter information; transforming the video image to match the estimated real-world position and orientation; and superimposing the video captured by the camera and the transformed video image.

(Supplementary note 15) The information presentation method according to supplementary note 14, wherein playback is controlled so that the section of the video image specified by the delimiter information is played repeatedly, and the playback speed is changed when that section is played repeatedly.

(Supplementary note 16) An information presentation program applied to a computer equipped with a camera that is worn by a user and captures the real world, the program causing the computer to execute: a position and orientation estimation process of estimating the real-world position and orientation from the video captured by the camera; a delimiter determination process of determining a delimiter in the user's actions by comparing delimiter information, which is information defined as a delimiter within a video image that is a video of a real-world scene captured in advance, with the video captured by the camera; a reproduction control process of controlling, according to the determination result, playback of the section of the video image specified by the delimiter information; a video image transformation process of transforming the video image to match the estimated real-world position and orientation; and a superimposition process of superimposing the video captured by the camera and the transformed video image.

(Supplementary note 17) The information presentation program according to supplementary note 16, causing the computer, in the reproduction control process, to control playback so that the section of the video image specified by the delimiter information is played repeatedly, and to change the playback speed when that section is played repeatedly.
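Supplementary note 8 estimates position and orientation from marker positions extracted from the camera video. A minimal sketch of the extraction step, assuming the markers appear as dark blobs on a lighter background in a grayscale frame: threshold the pixels, group dark pixels into 4-connected components with a breadth-first flood fill, and report each component's centroid. The threshold, minimum area, and the name `marker_centroids` are assumptions; production systems typically use dedicated fiducial detectors (for example, OpenCV's ArUco module). Four such centroids could then serve as point correspondences for estimating a planar transformation.

```python
from collections import deque


def marker_centroids(gray: list[list[int]], threshold: int = 64,
                     min_area: int = 4) -> list[tuple[float, float]]:
    """Centroids (x, y) of dark 4-connected blobs, in raster order of their first pixel."""
    h, w = len(gray), len(gray[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or gray[y0][x0] >= threshold:
                continue
            # Breadth-first flood fill over 4-connected dark pixels.
            queue, pixels = deque([(x0, y0)]), []
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h and not seen[ny][nx]
                            and gray[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Ignore specks smaller than min_area pixels (treated as noise).
            if len(pixels) >= min_area:
                n = len(pixels)
                centroids.append((sum(p[0] for p in pixels) / n,
                                  sum(p[1] for p in pixels) / n))
    return centroids
```

The `min_area` filter is a crude noise guard; identifying which centroid is which marker (needed before pairing them with marker positions in the prerecorded video) is left out of this sketch.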

Although the present invention has been described with reference to embodiments and examples, the present invention is not limited to them. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

This application claims priority based on Japanese Patent Application No. 2010-96055, filed on April 19, 2010, the entire disclosure of which is incorporated herein.

The present invention is suitably applied to information providing systems that present information associated with an object captured by a camera in an easy-to-understand manner. Specifically, it can be applied to an information presentation system or information providing program that transforms an instructional video to match a real-world object and plays it back. It can also be applied to an information presentation system or information providing program that plays back a video image divided by operation step in step with the user's progress.

DESCRIPTION OF SYMBOLS
1 Camera
2 Display device
3 Position and orientation estimation unit
4 Delimiter estimation unit
5 Delimiter information storage unit
6 Video image storage unit
7 Reproduction control unit
8 Position and orientation information storage unit
9 Video image transformation unit
10 Superimposing unit
11 Video information selection unit
12 First video information storage unit
13 Second video information storage unit
101 to 104, 201 to 204 Markers
120, 220 Work target

Claims

An information presentation system comprising:
a camera that is worn by a user and captures the real world;
position and orientation estimation means for estimating the real-world position and orientation from the video captured by the camera;
video image transformation means for transforming a video image, which is a video of a real-world scene captured in advance, to match the estimated real-world position and orientation; and
superimposing means for superimposing the video captured by the camera and the video image transformed by the video image transformation means.

The information presentation system according to claim 1, further comprising:
delimiter determination means for determining a delimiter in the user's actions by comparing delimiter information, which is information predetermined as a delimiter within the video image, with the video captured by the camera; and
reproduction control means for controlling, according to the determination result, playback of the section of the video image specified by the delimiter information.

The information presentation system according to claim 2, wherein
the delimiter determination means determines the delimiter in the user's actions by determining whether the state in the video image indicated by the delimiter information is the same as the state shown in the video captured by the camera, and
the reproduction control means, when the delimiter determination means determines that the states are the same, controls playback so that the section of the video image following the delimiter specified by the delimiter information is played, and, when the delimiter determination means determines that the states are not the same, controls playback so that the section of the video image up to the delimiter specified by the delimiter information is played repeatedly.

The information presentation system according to claim 2 or 3, wherein the reproduction control means controls playback so that the section of the video image specified by the delimiter information is played repeatedly, and changes the playback speed when playing that section repeatedly.

The information presentation system according to any one of claims 2 to 4, further comprising:
video information storage means for storing multiple pieces of video information, each pairing a video image with delimiter information; and
video information selection means for selecting, from the multiple pieces of video information, the video information containing the video image most similar to the video captured by the camera.

The information presentation system according to any one of claims 1 to 5, wherein the superimposing means processes the video image so that it can be distinguished from the video captured by the camera.

A video display terminal comprising:
a camera that is worn by a user and captures the real world;
transmission means for transmitting the real-world video captured by the camera to a server device that estimates the real-world position and orientation from video of the real world;
video image transformation means for transforming a video image, which is a video of a real-world scene captured in advance, to match the real-world position and orientation estimated by the server device from the video captured by the camera; and
superimposing means for superimposing the video captured by the camera and the video image transformed by the video image transformation means.

A video reproduction control device comprising:
position and orientation estimation means for estimating the real-world position and orientation from video received from a terminal device that transmits video of the real world;
delimiter determination means for determining a delimiter in the user's actions by comparing delimiter information, which is information predetermined as a delimiter within a video image that is a video of a real-world scene captured in advance, with the video received from the terminal device;
reproduction control signal generation means for generating, according to the determination result, a reproduction control signal for controlling playback of the section of the video image specified by the delimiter information; and
information transmission means for transmitting, to the terminal device, information indicating the real-world position and orientation estimated by the position and orientation estimation means together with the reproduction control signal.

An information presentation method comprising:
estimating the real-world position and orientation from the video captured by a camera that is worn by a user and captures the real world;
determining a delimiter in the user's actions by comparing delimiter information, which is information defined as a delimiter within a video image that is a video of a real-world scene captured in advance, with the video captured by the camera;
controlling, according to the determination result, playback of the section of the video image specified by the delimiter information;
transforming the video image to match the estimated real-world position and orientation; and
superimposing the video captured by the camera and the transformed video image.

An information presentation program applied to a computer equipped with a camera that is worn by a user and captures the real world, the program causing the computer to execute:
a position and orientation estimation process of estimating the real-world position and orientation from the video captured by the camera;
a delimiter determination process of determining a delimiter in the user's actions by comparing delimiter information, which is information defined as a delimiter within a video image that is a video of a real-world scene captured in advance, with the video captured by the camera;
a reproduction control process of controlling, according to the determination result, playback of the section of the video image specified by the delimiter information;
a video image transformation process of transforming the video image to match the estimated real-world position and orientation; and
a superimposition process of superimposing the video captured by the camera and the transformed video image.