JP2016500169A

JP2016500169A - Annotation method and apparatus

Info

Publication number: JP2016500169A
Application number: JP2015534916A
Authority: JP
Inventors: Laurent Rime; Mathieu Monney; Serge Ayer; Martin Vetterli
Original assignee: Vidinoti SA
Current assignee: Vidinoti SA
Priority date: 2012-10-05
Filing date: 2012-10-05
Publication date: 2016-01-07
Also published as: KR20150082195A; CN104798128A; WO2014053194A1; EP2904605A1

Abstract

Problem: to solve or alleviate the problems of existing augmented reality systems. Solution: an annotation method comprising: capturing (100) data representing a light field with a plenoptic image capture device (4); matching (101) the captured data with corresponding reference data; retrieving (102) an annotation associated with an element of the reference data; and rendering (103) a view generated from the captured data and including at least one annotation. [Selected figure] FIG. 3

Description

The present invention relates to an annotation method for adding annotations to data corresponding to a scene.

Rapid progress in the development of handheld portable devices such as smartphones, palmtop computers, portable media players and personal digital assistant (PDA) devices has led to proposals to include new features and applications involving image processing. In one such application, namely image annotation or captioning, the user points the portable device towards a scene, e.g. a landscape, a building, a poster or a painting in a museum, and the display shows the image together with superimposed information relating to that scene. Such information may include, for example, the names of mountains and settlements, the names of people, historical information about buildings, and commercial information such as advertisements, e.g. a restaurant menu. Examples of such systems are described in EP 1246080 and EP 2207113.

The annotation information can be supplied to the portable device by a server in a wireless communication network. The corresponding functional configuration, comprising the communication network, the server and the portable device, is referred to here as an annotation system.

WO 05/114476 describes a mobile image-based information retrieval system comprising a mobile phone and a remote recognition server. In this system, an image taken with the mobile phone's camera is transmitted to the remote server, where the recognition process is performed. This approach requires high bandwidth to transmit the images and introduces a delay while the server computes the annotations and sends them back to the mobile phone.

Many annotation systems and methods include a step of comparing images acquired by the annotation device with a set of reference images stored in a database. Since the actual viewing angle and lighting conditions differ from those of the images stored in the database, the comparison algorithm must remove the effects of these parameters.
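As an illustration of removing one such effect, here is a minimal sketch, assuming grayscale patches of equal size and a purely global change in brightness (gain and offset). Normalized cross-correlation is used here only as a familiar example of an illumination-tolerant similarity measure; the source does not name the comparison algorithm, and this measure does nothing about viewing-angle changes. The function name is hypothetical.

```python
import math


def normalized_cross_correlation(a, b):
    """Compare two equally sized grayscale patches (flat lists of floats).

    Subtracting each patch's mean and dividing by its norm cancels a global
    gain and offset in brightness, so the score reflects structure only.
    """
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and equally sized")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    norm_a = math.sqrt(sum(x * x for x in da))
    norm_b = math.sqrt(sum(y * y for y in db))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a flat patch carries no structure to correlate
    return sum(x * y for x, y in zip(da, db)) / (norm_a * norm_b)


reference = [10.0, 20.0, 30.0, 40.0]
# The same pattern under brighter, higher-contrast lighting: 2 * x + 15.
captured = [2.0 * x + 15.0 for x in reference]
print(round(normalized_cross_correlation(reference, captured), 6))  # 1.0
```

A score of 1.0 means the two patches differ only by gain and offset; a comparison on raw pixel differences would instead report a large mismatch.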

A more advanced image annotation technique uses a 3D reference model. In many cases this involves a registration process, i.e. a process of spatially transforming the captured (or target) image so that it aligns with the 3D reference model. In the case of a building, for example, a 3D model of the object is stored in the reference database together with the details to be annotated. The 2D image acquired by the portable device is registered with this model and, if a match is found, the object is recognized and the corresponding annotations are superimposed on the 2D image.

Image annotation based on 3D models has the advantage over 2D models of depending less on the viewing angle: a single 3D model can serve as the reference for matching many different 2D images captured from different locations and at different angles. However, building a collection of 3D models is a difficult and cumbersome process that usually requires a 3D or stereo camera. Moreover, registering captured 2D images with 3D models is time-consuming.

It is therefore an object of the present invention to solve, or at least mitigate, the above-mentioned problems of existing augmented reality systems.

According to the present invention, these objects are achieved by a method comprising:
- capturing data representing a light field with a plenoptic capture device;
- executing program code for matching the captured data with corresponding reference data;
- executing program code for retrieving an annotation associated with an element of the reference data;
- executing program code for rendering a view generated from the captured data and including at least one annotation.
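The three program-code steps of the method can be sketched as follows, assuming capture has already produced a flattened list of ray intensities. Everything here is hypothetical: `ReferenceEntry`, the sum-of-squared-differences matcher, the example labels and the dictionary returned by `render_view` are placeholders for illustration, not the patent's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceEntry:
    """Hypothetical reference record: a light field plus its annotations."""
    name: str
    light_field: list                        # flattened ray intensities (assumed layout)
    annotations: list = field(default_factory=list)


def ray_distance(a, b):
    """Sum of squared differences between two equally sized light fields."""
    if len(a) != len(b):
        raise ValueError("light fields must have the same number of samples")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match(captured, references):
    """Matching step: pick the closest reference light field."""
    return min(references, key=lambda ref: ray_distance(captured, ref.light_field))


def render_view(captured, annotations):
    """Rendering stand-in: total intensity (a crude 2D projection) plus labels."""
    return {"intensity": sum(captured), "labels": list(annotations)}


def annotate(captured, references):
    """Capture is assumed done; run the matching, retrieval and rendering steps."""
    reference = match(captured, references)
    annotations = reference.annotations      # retrieval step
    return render_view(captured, annotations)


references = [
    ReferenceEntry("poster", [1.0, 0.0, 0.0, 1.0], ["Concert tonight"]),
    ReferenceEntry("facade", [0.0, 1.0, 1.0, 0.0], ["Built in 1890"]),
]
print(annotate([0.9, 0.1, 0.0, 1.0], references)["labels"])  # ['Concert tonight']
```

Keeping matching, retrieval and rendering as separate functions mirrors the claim structure and makes it easy to run any of them on the device or on a server.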

These objects are also achieved by an apparatus for capturing and annotating data corresponding to a scene, comprising:
- a plenoptic camera for capturing data representing a light field;
- a processor;
- a display;
- program code which, when executed, causes the processor to retrieve an annotation associated with an element of the data captured by the camera, and to render on the display a view generated from the captured data and including at least one annotation.

The present invention also provides an apparatus for determining annotations, comprising:
- a processor;
- a store;
- program code which, when executed, causes the processor to receive data representing a light field, match the data with reference data in the store, determine an annotation associated with the reference data, and send the annotation to a remote device.

Plenoptic cameras are known per se and are commercially available at low cost. Unlike conventional cameras, which only capture a 2D projection of the scene on the sensor, a plenoptic camera captures data representing the light field, i.e. a matrix indicating not only the intensity of light on each pixel but also the direction of the light reaching that pixel/sub-image, or at least the intensity of the light reaching each individual sub-image from various directions.

A plenoptic sensor therefore generates data containing more information about the light reaching each sub-image than the 2D image data generated by a conventional 2D image sensor.

The data generated by a plenoptic sensor include information about the scene that is not directly available from conventional 3D sensors or from stereoscopic cameras. Because more, and different, information is available, matching the captured data with reference data is more reliable than conventional methods that match 2D images with 2D or 3D models. Intuitively, having more information about the captured scene helps to improve both recognition performance and registration quality.

Likewise, matching a model with data provided by a plenoptic camera is more robust than matching a 3D model with 2D or 3D captured image data.

Matching data that represent a light field and were captured by a plenoptic sensor may include projecting the light field data onto a 2D image and matching this 2D image with a 2D or 3D reference model. Since different projections are possible (for example, corresponding to the different focal points that can be selected when rendering the plenoptic image), this increases the likelihood of a match. However, computing this projection or these projections requires additional resources, and information about the captured scene is lost in the process, which reduces the accuracy and speed of the matching.

Therefore, in one embodiment, the data captured by the plenoptic sensor are matched with reference data representing a reference light field. Advantageously, this matching is performed without projecting any of the captured light field data onto a 2D image, and/or without projecting any of the reference light field data onto a 2D image. Matching is thus performed entirely in the plenoptic domain, without loss of information due to conversion into 2D or 3D images, and is based not only on the brightness at each point of the scene but also on the direction of the light rays in the captured data and in the reference data.
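Why the projection loses information can be shown with a toy example, assuming a 1D light field stored as a list of sub-images, each holding one intensity per ray direction (the layout of FIGS. 1 to 3). The two light fields below are invented for illustration: they project to the same 2D image, so a 2D comparison cannot tell them apart, while a ray-by-ray comparison in the plenoptic domain can. Function names are hypothetical.

```python
def project_to_2d(light_field):
    """Collapse each sub-image (one value per ray direction) to one pixel.

    This mimics a conventional sensor: the direction of every ray is discarded.
    """
    return [sum(sub_image) for sub_image in light_field]


def squared_distance(a, b):
    """Sum of squared differences between two equally long sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def plenoptic_distance(lf_a, lf_b):
    """Compare two light fields ray by ray, keeping each ray's direction."""
    if len(lf_a) != len(lf_b):
        raise ValueError("light fields must have the same number of sub-images")
    return sum(squared_distance(sa, sb) for sa, sb in zip(lf_a, lf_b))


# Two hypothetical 1D light fields: 2 sub-images of N = 3 directions each.
captured = [[1, 0, 1], [0, 2, 0]]
reference = [[0, 2, 0], [1, 0, 1]]

print(squared_distance(project_to_2d(captured), project_to_2d(reference)))  # 0
print(plenoptic_distance(captured, reference))  # 12
```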

The method may include registering the captured light field data with the reference light field data. This registration process aims to find the geometric relationship between the captured light field data and the various annotations to be displayed. For map data, for example, the ultimate goal of the registration process is to find where the light field captured by the plenoptic sensor lies within the reference map, so that map annotations can later be overlaid at the correct location. Performing this registration entirely in plenoptic space makes use of all the information present in the data representing the light field and produces a more accurate annotation of the scene.
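To make "finding the geometric relationship" concrete, the sketch below fits a least-squares similarity transform (rotation, uniform scale, translation) between matched feature positions and then maps an annotation anchor from the reference map into the captured view. This is a deliberately simplified 2D stand-in: the registration described above works in plenoptic space, and the source does not specify the transform model beyond listing rotation, scaling, translation or homography as options. Point coordinates, the anchor and all function names are hypothetical.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform between matched 2D points.

    Points are complex numbers x + 1j*y and the model is dst ≈ a * src + b:
    the complex factor a encodes rotation and scale, b the translation.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(src)
    mean_s = sum(src) / n
    mean_d = sum(dst) / n
    num = sum((d - mean_d) * (s - mean_s).conjugate() for s, d in zip(src, dst))
    den = sum(abs(s - mean_s) ** 2 for s in src)
    if den == 0:
        raise ValueError("source points are all identical")
    a = num / den
    b = mean_d - a * mean_s
    return a, b


def apply_similarity(a, b, point):
    """Map a reference-map position into the captured view."""
    return a * point + b


# Hypothetical matched features: reference map -> captured view.
reference_pts = [0j, 1 + 0j, 1j, 1 + 1j]
captured_pts = [2j * p + (3 + 1j) for p in reference_pts]  # rotate 90°, scale 2, shift
a, b = fit_similarity(reference_pts, captured_pts)
label_anchor = 0.5 + 0.5j  # where an annotation sits in the reference map
print(apply_similarity(a, b, label_anchor))  # (2+2j)
```

Using complex numbers keeps the closed-form solution short; a homography would need a general linear solver instead.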

The method may include matching the captured light field data with reference light field data generated by a plenoptic sensor at a different position. It may include matching the captured light field data with reference light field data generated by a plenoptic sensor at a different distance. It may also include matching the captured light field data with reference light field data generated by a different type of plenoptic sensor, or having a different number of pixels in each sub-image.

Registering the captured light field data with the reference light field data exploits all the information present in the captured light field data to register it properly and accurately with the more complete information present in the reference light field data, so that the scene can be annotated properly and accurately.

The invention will be better understood with the aid of the description of an embodiment given by way of example and illustrated by the figures.

FIG. 1 schematically illustrates a plenoptic camera capturing data representing the light field of a scene, with an object at a first distance.
FIG. 2 schematically illustrates a plenoptic camera capturing data representing the light field of a scene, with an object at a second distance.
FIG. 3 schematically illustrates a plenoptic camera capturing data representing the light field of a scene, with an object at a third distance.
FIG. 4 schematically illustrates a system comprising various apparatus elements that together implement the invention.
FIG. 5 is a block diagram of a method for capturing data representing a light field and rendering an annotated 2D image.
FIG. 6 is a block diagram of a local registration method using a global model.
FIG. 7 is a block diagram of a global registration method in plenoptic space.

Unlike conventional cameras, which only capture a 2D projection of a scene on the sensor, plenoptic sensors aim to capture the complete light field present in a given scene. A complete light field may comprise seven parameters for each pixel: three for position, two for direction, one for wavelength and one for time.

A plenoptic sensor generates data representing a so-called plenoptic light field, i.e. a matrix from which at least four of these parameters can be computed, namely the 2D position and the 2D direction of the light rays hitting each pixel of the plenoptic sensor. We may refer to these data as "light field data".
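The parameter counts above can be summarized as follows. The symbols $P$, $L$, $k$, $i$, $j$ and the sampled notation are introduced here for illustration only and do not appear in the source; $n$, $N$ and $M$ follow the sensor description of FIGS. 1 to 3.

```latex
\begin{aligned}
&\text{complete light field:} && P(x, y, z, \theta, \phi, \lambda, t) \\
&\text{recoverable by a plenoptic sensor:} && L(x, y, \theta, \phi) \\
&\text{sampled:} && L_{k,i,j},\quad k = 1,\dots,n,\; i = 1,\dots,N,\; j = 1,\dots,M \\
&\text{samples per capture:} && n \cdot N \cdot M \quad (\text{FIGS. 1 to 3: } 9 \cdot 3 \cdot 1 = 27)
\end{aligned}
```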

As of today, at least two companies, Lytro and Raytrix, have proposed plenoptic sensors capable of recording such plenoptic light fields. Although the two companies' cameras differ slightly in design, the main idea is to decompose the different directions of the light that would hit a single photosite (or pixel) in a standard camera sensor. For this purpose, as shown in FIG. 1, an array of microlenses 20 is placed behind the main lens 1, in place of the sensor of a conventional camera. The image sensor 21 is moved further back.

The microlenses 20 thus redirect the light rays according to their angle of incidence, and the redirected rays reach different pixels 210 of the sensor 21. The amount of light measured by each of the N×M pixels 210 making up a sub-image depends on the direction of the light beams hitting the microlens 20 in front of that sub-image.

FIGS. 1 to 3 show a simple one-dimensional sensor comprising n = 9 sub-images, each sub-image having a single row of N×M pixels (or photosites) 210, where in this example N equals 3 and M equals 1. Many plenoptic sensors have more sub-images and more pixels per sub-image, for example 9×9 pixels, which makes it possible to distinguish between N×M = 81 different light orientations on each microlens 20. Assuming that all objects in the scene are in focus, each sub-image thus contains a patch of brightness values representing the amount of light arriving on that sub-image from the different directions.

In this configuration, the array of microlenses 20 is positioned on the image plane formed by the camera's main lens 1, and the sensor 21 is positioned at a distance f from the microlenses, where f is the focal length of the microlenses. While this design allows high angular resolution, its spatial resolution is relatively low (the number of effective pixels per rendered image equals the number of microlenses). This problem is addressed by other plenoptic cameras in which the microlenses focus on the image plane of the main lens, thus creating a gap between the microlenses and the image plane. Such designs come at the price of a lower angular resolution.
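The trade-off between spatial and angular resolution in this configuration is simple arithmetic: the sensor's pixels are shared out between microlenses (one rendered pixel each) and directions (pixels per sub-image). A minimal sketch, assuming square-grid sub-images that tile the sensor exactly; the 3240 × 3240-pixel sensor is a hypothetical figure, and the function name is illustrative.

```python
def plenoptic_resolution(sensor_px_x, sensor_px_y, lens_px_x, lens_px_y):
    """Split a sensor's pixel budget between spatial and angular resolution.

    Assumes each microlens covers a lens_px_x by lens_px_y block of pixels and
    contributes one pixel to the rendered image.
    """
    if sensor_px_x % lens_px_x or sensor_px_y % lens_px_y:
        raise ValueError("sub-image size must divide the sensor size")
    lenses = (sensor_px_x // lens_px_x) * (sensor_px_y // lens_px_y)
    spatial = lenses                      # effective pixels per rendered image
    angular = lens_px_x * lens_px_y       # distinguishable directions per microlens
    return spatial, angular


# The 1D sensor of FIGS. 1 to 3: 27 pixels, N = 3, M = 1.
print(plenoptic_resolution(27, 1, 3, 1))          # (9, 3)
# A hypothetical 3240 x 3240 sensor with 9 x 9 pixels per microlens.
print(plenoptic_resolution(3240, 3240, 9, 9))     # (129600, 81)
```

The product of the two results always equals the sensor's pixel count, which is why gaining angular resolution costs spatial resolution and vice versa.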

As can be seen from FIGS. 1 to 3, the plenoptic light field corresponding to a scene that, in this example, consists of a single point 3 depends on the distance from the point 3 to the main lens 1. In FIG. 1, all the light beams from this object reach the same microlens 20, resulting in a plenoptic light field in which all the pixels of the sub-image corresponding to this microlens record a first, positive light intensity, while all the pixels corresponding to the other microlenses record a different, null light intensity. In FIG. 2, where the object 3 is closer to the lens 1, some of the light beams originating from point 3 reach pixels of other sub-images, namely the sub-images associated with the two microlenses adjacent to the previously hit microlens. In FIG. 3, where the object 3 is farther from the lens 1, some of the light beams originating from point 3 reach different pixels of the sub-images associated with the two microlenses adjacent to the previously hit microlens. The digital data 22 delivered by the sensor 21 therefore depend on the distance to the object 3.
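The distance dependence described for FIGS. 1 to 3 can be reproduced with a small simulation, assuming an ideal thin main lens, an on-axis point, and the microlens array placed on the image plane for a chosen focus distance. In this model each microlens images the main-lens aperture, so the pixel a ray lands on is set by where it crossed the aperture, while the microlens it lands on is set by its height on the array. All numeric defaults (50 mm focal length, 25 mm aperture, 0.5 mm microlens pitch) and the function name are hypothetical.

```python
import math


def capture_1d(object_distance, focus_distance, focal_length=50.0,
               aperture=25.0, n_lenses=9, n_pixels=3, lens_pitch=0.5,
               n_rays=3000):
    """Simulate a 1D plenoptic sensor imaging an on-axis point (units: mm).

    Returns n_lenses sub-images, each a list of n_pixels ray counts.
    """
    if min(object_distance, focus_distance) <= focal_length:
        raise ValueError("object and focus distances must exceed the focal length")
    # Thin-lens equation: 1/f = 1/d_object + 1/d_image.
    image_dist = 1.0 / (1.0 / focal_length - 1.0 / object_distance)
    array_dist = 1.0 / (1.0 / focal_length - 1.0 / focus_distance)
    data = [[0] * n_pixels for _ in range(n_lenses)]
    for k in range(n_rays):
        u = -aperture / 2 + (k + 0.5) * aperture / n_rays   # height on main lens
        x = u * (1.0 - array_dist / image_dist)             # height on lens array
        lens = math.floor(x / lens_pitch + 0.5) + n_lenses // 2
        pixel = min(int((u + aperture / 2) / (aperture / n_pixels)), n_pixels - 1)
        if 0 <= lens < n_lenses:                            # rays off the array are lost
            data[lens][pixel] += 1
    return data


in_focus = capture_1d(1000.0, 1000.0)   # like FIG. 1
near = capture_1d(500.0, 1000.0)        # closer object, like FIG. 2
far = capture_1d(3000.0, 1000.0)        # farther object, like FIG. 3
print(in_focus[4])                      # [1000, 1000, 1000]
print(near[3], near[5], far[3], far[5])
```

In focus, every ray lands under the central microlens and spreads over all its pixels. For the nearer and farther points, some rays reach the two neighbouring microlenses, and the pixels they hit there are mirrored between the two cases, so the sensor data encode the distance.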

Thus, for each sub-image corresponding to a microlens 20, the plenoptic sensor 21 delivers light field data 22 containing a set of N×M values representing the amount of light arriving from various directions on the lens above that sub-image. For a given focused object point, each pixel of the sub-image corresponds to a measure of the intensity of the light rays hitting the sensor at a given angle of incidence.

Knowing the direction of the light rays has many advantages. By carefully rearranging the rays it is possible, among other tasks, to perform refocusing (changing which object in the scene is in focus) or to change the camera's viewpoint.
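Both operations can be sketched on a 1D light field stored as a list of sub-images with one value per direction. Picking the same direction from every sub-image gives a "sub-aperture" view, i.e. the scene seen through one part of the main lens, which changes the viewpoint. Refocusing is shown with shift-and-add, a common technique named here plainly since the source does not specify one; integer shifts only, and the function names and example data are hypothetical.

```python
def sub_aperture_view(light_field, direction):
    """Pick the same direction from every sub-image: one viewpoint of the scene."""
    return [sub_image[direction] for sub_image in light_field]


def refocus(light_field, shift_per_direction):
    """Shift-and-add refocusing of a 1D light field (list of sub-images).

    Each sub-aperture view is shifted in proportion to its offset from the
    central direction, then all views are summed. A shift of 0 gives the plain
    sum over directions (the image at the original focus); other integer
    values move the plane of focus.
    """
    n_lenses = len(light_field)
    n_dirs = len(light_field[0])
    center = n_dirs // 2
    image = [0] * n_lenses
    for d in range(n_dirs):
        view = sub_aperture_view(light_field, d)
        shift = shift_per_direction * (d - center)
        for k in range(n_lenses):
            src = k + shift
            if 0 <= src < n_lenses:
                image[k] += view[src]
    return image


# One defocused point whose rays spread over three microlenses,
# a pattern like the one described for FIG. 2.
lf = [[0, 0, 0] for _ in range(9)]
lf[3][0] = lf[4][1] = lf[5][2] = 1
print(sub_aperture_view(lf, 0))  # [0, 0, 0, 1, 0, 0, 0, 0, 0]
print(refocus(lf, 0))            # [0, 0, 0, 1, 1, 1, 0, 0, 0]
print(refocus(lf, 1))            # [0, 0, 0, 0, 3, 0, 0, 0, 0]
```

With a shift of 1 the three rays are brought back under a single pixel, i.e. the blurred point is brought into focus.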

FIG. 4 schematically shows a block diagram of an annotation system embodying the invention. The system comprises a user device 4, for example a handheld device, a smartphone, a tablet, a camera, glasses or goggles. The device 4 comprises a plenoptic camera 41, such as the camera shown in FIGS. 1 to 3, for capturing data representing the light field of a scene 3; a processor, such as a microprocessor 400 with suitable program code; and a communication module 401, such as a cellular interface and/or WIFI, for connecting the device 4 to a remote server 5, for example a cloud server, over a network such as the Internet 6. The server 5 comprises a storage 50, such as an SQL database, a set of XML documents or a set of light field data images, for storing a collection of reference light field data and/or one or more global models; and a processor 51 comprising a microprocessor with computer code causing the microprocessor to perform the operations required by the annotation method. Annotations and their corresponding positions can also be stored in the storage 50 together with the reference light field data.

The program code executed by the user device 4 may include, for example, application software or an app that the user can download and install on the user device 4. The program code may also include part of the operating code of the user device 4. It may further include code embedded in a web page or executed in a browser, including, for example, Java®, JavaScript or HTML5 code. The program code may be stored as a computer program product in a tangible apparatus-readable medium, such as flash memory, a hard disk, or any type of permanent or semi-permanent memory.

The program code is executed by the microprocessor 400 in the user device 4 and causes it to send the captured data sets corresponding to the light field, or at least some features of these data sets, to the remote server 5. The program code is arranged to send these light field data in a "plenoptic format", i.e. without losing information about the direction of the light rays. The program code can also cause the microprocessor 400 to receive from the server 5 annotated data in light field format, an annotated image, or annotations relating to the previously sent light field data, and to render a view corresponding to the captured data together with the annotations.

In one embodiment, the program code in the user device 4 also includes a module for identifying local features present in the captured data and for computing a description of these local features, for example a binary vector, which the program code can then cause to be sent to the remote server 5.
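One way a local feature can be described by a compact binary vector is a BRIEF-style descriptor: each bit records whether one sample of the patch is darker than another, for a fixed list of sample pairs shared by the device and the server. This is a swapped-in, well-known technique, not the patent's descriptor; applying it to a flattened 9 × 9 sub-image, the pair count and all function names are assumptions for illustration. Because the bits depend only on the order of intensities, they survive any increasing brightness change, and 256 bits fit in 32 bytes for transmission.

```python
import random


def make_test_pairs(patch_size, n_bits, seed=0):
    """Fixed pseudo-random sample pairs; device and server must use the same ones."""
    rng = random.Random(seed)
    return [(rng.randrange(patch_size), rng.randrange(patch_size))
            for _ in range(n_bits)]


def binary_descriptor(patch, pairs):
    """Pack the results of pairwise intensity comparisons into one integer."""
    bits = 0
    for i, (a, b) in enumerate(pairs):
        if patch[a] < patch[b]:
            bits |= 1 << i
    return bits


def hamming(d1, d2):
    """Number of differing bits between two descriptors."""
    return bin(d1 ^ d2).count("1")


pairs = make_test_pairs(81, 256)                 # 81 = 9 x 9 samples, 256 bits
patch = [float((7 * i) % 23) for i in range(81)]  # hypothetical sub-image values
brighter = [1.5 * v + 10.0 for v in patch]        # same patch, brighter lighting
print(hamming(binary_descriptor(patch, pairs), binary_descriptor(brighter, pairs)))  # 0
```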

The program code executed by the microprocessor in the server 5 may include an executable program or other code causing the server 5 to perform at least some of the following tasks:
- receiving data representing a light field from the device;
- retrieving a model and/or a plurality of reference data sets in plenoptic format;
- matching the data received from the user device with a part of the model or with one of the plurality of reference data sets, respectively;
- determining the annotations associated with the model or with the matching one of the reference data sets, respectively;
- sending to the device the annotations, an annotated image or annotated data corresponding to the received data.

In various embodiments, instead of sending the captured data set to a remote server for matching with reference data held in the server, the matching can be performed locally, against a set of reference data or a model stored locally in the user device.

Various possible embodiments of methods that can be implemented with the systems, apparatus and arrangements of FIGS. 1 to 4 will now be described.

A. Multiple independent reference data sets based on plenoptic local features
In one embodiment, a collection of known, previously captured reference data sets representing light fields is available in the storage 50 of the server 5, for example reference data previously captured with a plenoptic camera or converted from 3D models. In this case, the matching data must first be recognized among the set of reference data before proper registration is possible. Only then is registration with the matching reference data performed.

本実施形態において使用される、あり得る一連のステップが、図５に示されている。これには以下のものが含まれる：
ステップ１００：アノテートすべきライトフィールドがユーザーデバイス４内でプレノプティックカメラ４１によりキャプチャされるか、または、ライトフィールドデータの任意の考えられるソースからリトリーブされる。キャプチャされたプレノプティックライトフィールドの２Ｄ投影がユーザーデバイス４の２Ｄディスプレイ４０上に表示されてよいが、データは好ましくは、ライトフィールドデータとして、すなわち各サブ画像上の入射光線の方向についての情報を失なうことなく記憶される。
ステップ１０１：基準データをキャプチャするために使用されるプレノプティックカメラが、アノテートすべきライトフィールドデータをキャプチャするために使用されるプレノプティックカメラと同じタイプのものでない場合には、プロセスは、いずれか一方のデータを他方のデータのフォーマットにコンバートまたはリサンプリングするステップ１０１を含んでいてよい。例えば異なるプレノプティックカメラが、各サブ画像内に異なる数のピクセルを有するライトフィールドデータを生成するかまたは、異なる方法でライトフィールドをサンプリングしてもよい。このコンバージョンは、ユーザーデバイス４および／または遠隔サーバー５内で行うことができてよい。
ステップ１０２：キャプチャされたデータ内のローカルフィーチャの検出。以下で記述するように、例えば、ＤＰＦ（ｄｅｐｔｈｐｌｅｎｏｐｔｉｃｆｅａｔｕｒｅ：深さプレノプティックフィーチャ）アルゴリズムにしたがうことによってか、またはライトフィールド内に含まれる視差情報（ｄｉｓｐａｒｉｔｙｉｎｆｏｒｍａｔｉｏｎ）を使用することによってか、あるいは、エピポーラボリューム（ｅｐｉｐｏｌａｒｖｏｌｕｍｅ）でライトフィールドを表現することによって、検出を行うことができる。他の検出方法および他のタイプのローカルフィーチャを使用してもよい。使用されるローカルフィーチャのタイプおよび検出方法は、シーン、場所、ユーザーの選択などによって左右される可能性がある。
ステップ１０３：キャプチャデータ内で検出されたこれらのローカルフィーチャの記述。先行ステップの間に検出されたローカルフィーチャのタイプに応じて、例えば、以下で説明するように、バイナリーベクトル、またはエピポーラボリュームでの視差またはローカルフィーチャ点の記述によりうまく適応させられた他の記述子（ｄｅｓｃｒｉｐｔｏｒ）などを含め、異なるタイプの記述子を使用できると考えられる。ローカルフィーチャの検出および記述は、有利には、サーバー５にこれらの短かい記述を送ることしか必要としないユーザーデバイス４内の好適なソフトウェアモジュールによって行われる。完全なライトフィールドデータをサーバー５に送ることも同様に可能であり、サーバー５はそしてローカルフィーチャを検出し記述するが、この結果として、利用可能な帯域幅の使用効率は低くなると考えられる。
ステップ１０４：記述されたローカルフィーチャに基づく、キャプチャされたデータの認識。これは様々な方法で行うことができる。一実施形態においては、ローカルフィーチャを量子化し（ステップ１０４０）、その後この量子化されたフィーチャを用いて、ステップ１０４１の間に、同じ（またはほぼ同じ）量子化されたフィーチャセットを有する基準データを検索することができる。基準データは、ユーザーデバイスからおよび／または遠隔サーバー５内の遠隔ストレージ５０からリトリーブされてよい。基準データのプレフィルタリングは、さまざまなフィルタリングクリテリア（ｃｒｉｔｅｒｉａ）、例えば衛星または地上（ｔｅｒｒｅｓｔｒｉａｌ）位置特定システムから予め決定されたユーザーデバイス４の場所、シーンから受信した信号、ユーザーの選択などに基づいて行われてよい。基準データは２Ｄ画像、３Ｄモデルまたは好ましくはライトフィールドを表わすデータを含む場合がある。このステップは、サーバー５内の好適なプログラムコードによって実行されてよいが、基準データの数が過度に多くなければ、ユーザーデバイス４内でのローカル認識も可能である。
量子化ステップ１０４０は、既知の基準の数が増大した場合に、システムをより容易にスケーリングできるようにする。
ステップ１０６：キャプチャされたデータ中の検出されたローカルフィーチャと、先のステップ中に識別された基準データ中のローカルフィーチャとのマッチング。基準データ中のローカルフィーチャは、コレクション５０が構築された時点で、先行する段階において、検出され記述される。このステップは、サーバー５内で好適なプログラムコードによって実行されてよいが、ユーザーデバイス４内で実行してもよい。
ステップ１０７：キャプチャデータから検出されたローカルフィーチャを、マッチングする基準データへとマッピングする、幾何学的変換（ｔｒａｎｓｆｏｒｍａｔｉｏｎ）を見つける。このステップは、「レジストレーション」と呼ばれる。変換には、ローテーションを用いたキャプチャされたデータのワーピング（ｗａｒｐｉｎｇ）、スケーリング、トランスレーションまたはホモグラフィが含まれる。複数の基準画像が利用可能である場合には、このステップは、レジストレーションの質が最良である基準データの決定を含んでいてよい。レジストレーションは、ユーザーデバイス４内、遠隔サーバー５内または、部分的にユーザーデバイス内と遠隔サーバー内において行われてよい。
一実施形態において、レジストレーションプロセスの結果は、また、「拡張レイヤー（ａｕｇｍｅｎｔｅｄｌａｙｅｒ）」として表示されるべき情報との関係におけるシーンをキャプチャするユーザーデバイス４の全位置をも示す。カメラの位置および配向は、６つのパラメータ、すなわち位置について３つ、その配向について３つのパラメータによって識別されてよい。このステップは、サーバー５内で好適なプログラムコードにより実行されてよいが、ユーザーデバイス４内で実行されてもよい。
ステップ１０８：コレクション５０内の基準データと結びつけられた少なくとも１つのアノテーション、ならびにこのアノテーションが結びつけられるべき画像の位置またはフィーチャをリトリーブする。
ステップ１０９：ステップ１０８中にリトリーブされたアノテーションのうちの少なくとも１つを伴うキャプチャデータに基づいて、２Ｄまたは３Ｄ画像などのビューを、ユーザーデバイス４のディスプレー４０上にレンダリングする。 A possible sequence of steps used in this embodiment is shown in FIG. This includes the following:
Step 100: The light field to be annotated is captured by the plenoptic camera 41 in the user device 4, or retrieved from any other possible source of light field data. A 2D projection of the captured plenoptic light field may be displayed on the 2D display 40 of the user device 4, but the data are preferably stored as light field data, i.e. without losing the information about the direction of the incident rays on each sub-image.
Step 101: If the plenoptic camera used to capture the reference data is not of the same type as the plenoptic camera used to capture the light field data to be annotated, the process may include a step 101 of converting or resampling either data set into the format of the other. For example, different plenoptic cameras may generate light field data with a different number of pixels in each sub-image, or may sample the light field in different ways. This conversion may be performed in the user device 4 and/or in the remote server 5.
Step 102: Detection of local features in the captured data. As described below, the detection can be performed, for example, by following a DPF (depth plenoptic feature) algorithm, by using the disparity information contained in the light field, or by representing the light field as an epipolar volume. Other detection methods and other types of local features may be used. The type of local features and the detection method may depend on the scene, the location, the user's choice, etc.
Step 103: Description of the local features found in the captured data. Depending on the type of local feature detected during the previous step, different types of descriptors may be used, for example binary vectors as described below, or other descriptors well suited to describing disparities or local feature points in the epipolar volume. The detection and description of local features are advantageously performed by a suitable software module in the user device 4, which then only needs to send these short descriptions to the server 5. It is also possible to send the complete light field data to the server 5, which then detects and describes the local features itself, but this makes less efficient use of the available bandwidth.
Step 104: Recognition of the captured data based on the described local features. This can be done in various ways. In one embodiment, the local features are quantized (step 1040), and the quantized features are then used during step 1041 to search for reference data having the same (or nearly the same) set of quantized features. The reference data may be retrieved from the user device and/or from the remote storage 50 in the remote server 5. The reference data may be pre-filtered based on various criteria, such as the location of the user device 4 previously determined by a satellite or terrestrial positioning system, a signal received from the scene, the user's selection, etc. The reference data may include 2D images, 3D models or, preferably, data representing a light field. This step may be performed by suitable program code in the server 5, but local recognition in the user device 4 is also possible if the amount of reference data is not too large.
The quantization step 1040 makes it easier to scale the system as the number of known references grows.
Step 106: Matching of the local features detected in the captured data with the local features of the reference data identified during the previous step. The local features of the reference data are detected and described beforehand, when the collection 50 is built. This step may be performed by suitable program code in the server 5, but may also be performed in the user device 4.
Step 107: Find a geometric transformation that maps the local features detected in the captured data onto the matching reference data. This step is called "registration". The transformation may include a warping of the captured data using rotation, scaling, translation or homography. If several reference images are available, this step may include determining the reference data with the best registration quality. Registration may take place in the user device 4, in the remote server 5, or partly in the user device and partly in the remote server.
In one embodiment, the result of the registration process also indicates the full position (location and orientation) of the user device 4 that captured the scene, relative to the information to be displayed as an "augmented layer". The camera position and orientation may be identified by six parameters: three for its position and three for its orientation. This step may be performed by suitable program code in the server 5, but may also be performed in the user device 4.
Step 108: Retrieve at least one annotation associated with the reference data in the collection 50, together with the position or feature of the image to which this annotation is to be attached.
Step 109: Render a view, such as a 2D or 3D image, on the display 40 of the user device 4 based on the captured data with at least one of the annotations retrieved during step 108.
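Steps 1040 and 1041 can be illustrated with a bag-of-visual-words sketch. This technique is an illustrative assumption, not the method the source prescribes. Each descriptor is quantized to its nearest codeword in a hypothetical `codebook`. Each light field is then summarized by a histogram of codeword indices, and the reference whose histogram overlaps most with the query is returned. All function names are hypothetical. A real system would precompute the reference histograms and use an inverted index rather than this linear scan.

```python
from collections import Counter

def quantize(descriptor, codebook):
    """Step 1040 (sketch): index of the nearest codeword by squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])))

def bag_of_words(descriptors, codebook):
    """Histogram of quantized feature indices summarizing one light field."""
    return Counter(quantize(d, codebook) for d in descriptors)

def histogram_overlap(h1, h2):
    """Histogram intersection, normalized by the smaller histogram's total count."""
    shared = sum(min(count, h2[k]) for k, count in h1.items())
    return shared / max(1, min(sum(h1.values()), sum(h2.values())))

def best_reference(query_descriptors, references, codebook):
    """Step 1041 (sketch): id of the reference with the most similar quantized feature set."""
    query = bag_of_words(query_descriptors, codebook)
    return max(references,
               key=lambda ref_id: histogram_overlap(query, bag_of_words(references[ref_id], codebook)))

codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]  # hypothetical 3-word vocabulary
references = {"poster": [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0)],
              "facade": [(0.0, 9.0), (1.0, 11.0), (0.0, 10.0)]}
print(best_reference([(0.0, 9.5), (0.5, 10.0)], references, codebook))  # facade
```

Only small integer histograms are compared, so the per-reference cost does not depend on the size of the raw descriptors.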

B. Global Reference Data Set Based on Plenoptic Local Features
Method A described above relies on a collection of reference data representing different light fields. It also relies on a process that determines, based on local features, which reference data match the captured data with the highest accuracy or confidence.

We now describe a global method that uses a global model of the scene and does not depend on the availability of a collection of reference light field data. This method still uses local features to match and register the captured data with the model. Methods of this kind are useful, for example, for outdoor localization. They could also be used inside buildings, museums, malls, etc., or in other augmented reality applications where a model of the whole scene is available.

The global model may consist of a cloud of local features computed on light field data sets captured with one or more plenoptic cameras. For example, a model of a city or of a reference scene may be built by aggregating a large set of light field data captured by various cameras. Local features are detected and described in each of these pieces of data. The described features are then assigned to specific physical locations in a global coordinate system. The resulting model is thus a cloud of local features, each representing a specific physical location in the global coordinate system. For a city, the coordinate system may be, for example, the one used by GPS (WGS84), and each feature may then represent a particular point or local area in that coordinate system.

Alternatively, the model may not consist of plenoptic local features extracted from plenoptic samples. For example, a 3D model of a city may be available while the query is a plenoptic sample. In that case, one possibility is to render synthetic light field data from the 3D model. Another possibility is to use a mutual information measure between the two data modalities. A minimization process is then applied in which the geometric transformation that maps the input plenoptic image onto the 3D model is optimized with respect to this measure.
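The mutual information measure mentioned above can be estimated from a joint histogram of corresponding intensities. One example is the captured ray intensities paired with the values rendered from the 3D model under a candidate transformation. This sketch assumes intensities normalized to [0, 1] and a fixed number of bins, and the function name is hypothetical. Mutual information grows as each modality becomes more predictable from the other. A registration loop would therefore keep the candidate transformation with the highest value, which is equivalent to minimizing its negative.

```python
import math
from collections import Counter

def mutual_information(a, b, bins=8, lo=0.0, hi=1.0):
    """Mutual information (bits) between two equally long intensity sequences."""
    def to_bin(v):
        k = int((v - lo) / (hi - lo) * bins)
        return min(max(k, 0), bins - 1)  # clamp values at the range boundaries
    n = len(a)
    joint = Counter((to_bin(x), to_bin(y)) for x, y in zip(a, b))
    pa = Counter(to_bin(x) for x in a)
    pb = Counter(to_bin(y) for y in b)
    mi = 0.0
    for (i, j), count in joint.items():
        pij = count / n
        mi += pij * math.log2(pij / ((pa[i] / n) * (pb[j] / n)))
    return mi

# Perfectly dependent (inverted) intensities share all their information ...
a = [0.05, 0.3, 0.55, 0.8] * 25
print(mutual_information(a, [1.0 - x for x in a]))  # 2.0 bits (4 equiprobable levels)
```

For two independent sequences the estimate drops to zero. This is why the measure can compare modalities, such as rendered model intensities and captured rays, whose raw values need not agree.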

To match new data captured by the plenoptic camera 41 in the user device 4 against this cloud of local plenoptic features, the following approach, illustrated in FIG. 6, may be used:
Step 100: Capture or retrieve data representing a light field to be annotated.
Step 101: Resample data if necessary.
Steps 102-103: Detect and describe local features in the captured data representing the light field.
Step 110: Match the detected local features with the local features of the global model 1101, for example a model stored in the database 50. The matching can be sped up by binning features together to accelerate the search. Based on prior information 1102 (GPS information, user input, etc.), a pruning step 1100 may be performed to speed up the matching further. The matching is then performed only on the subset of local features corresponding to this prior information. Locality-sensitive hashing may also be used. In that case, a set of hash functions is computed on the feature descriptors in order to create clusters based on the different hash values. The hash functions are chosen such that two descriptors that are close to each other in descriptor space produce the same hash value.
Step 111: Compute the geometric transformation that projects the local features detected in the captured data onto the matching local features of the global model. This is the registration step. Its output is a pose estimate of the camera 41, which tells where the camera that captured the data is located relative to the model coordinate system.
Step 108: The annotations are then retrieved. Annotations are usually position-dependent and are themselves registered in the model coordinate system.
Step 109: An annotated image is rendered.
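The locality-sensitive hashing of step 110 can be sketched for binary descriptors with bit sampling, a standard LSH family for Hamming space. Each table keys a descriptor by a fixed random subset of its bits. Descriptors that differ in only a few bits are therefore likely, though not guaranteed, to land in the same bucket in at least one table. The class name, table count and key length are illustrative assumptions, and descriptors are stored as Python integers.

```python
import random
from collections import defaultdict

class BitSamplingLSH:
    """Bit-sampling LSH index for binary descriptors stored as Python ints."""

    def __init__(self, n_bits, n_tables=4, bits_per_key=8, seed=0):
        rng = random.Random(seed)
        # Each table keys a descriptor by a fixed random subset of its bit positions.
        self.samples = [rng.sample(range(n_bits), bits_per_key) for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    @staticmethod
    def _key(descriptor, positions):
        return tuple((descriptor >> p) & 1 for p in positions)

    def add(self, descriptor, feature_id):
        for positions, table in zip(self.samples, self.tables):
            table[self._key(descriptor, positions)].append((feature_id, descriptor))

    def candidates(self, descriptor):
        """Union of the buckets the query falls into: feature_id -> stored descriptor."""
        found = {}
        for positions, table in zip(self.samples, self.tables):
            for feature_id, stored in table.get(self._key(descriptor, positions), []):
                found[feature_id] = stored
        return found

index = BitSamplingLSH(n_bits=64, seed=1)
index.add(0x0123456789ABCDEF, "corner_17")  # hypothetical model feature
print(index.candidates(0x0123456789ABCDEF))
```

The candidates returned by `candidates` would then be ranked by exact Hamming distance before the registration of step 111.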

Here again, using plenoptic information improves the robustness of the matching and registration process, in particular under varying lighting conditions, image deformations, etc.

C. Global Registration Based on Light Field Data Using a Global Model
We now describe a further registration method, based on global registration with a global model. Like method B above, this method can be used when a known global model of the scene is available. For example, in a city we may have a priori information that we are in a given city, and can therefore load a 3D model of that city that is already available. The registration process outputs the position of the camera that captured the light field data, relative to the model coordinate system.

As an example, a typical method based on global registration may include the following steps, illustrated in FIG. 7:
Step 152: During step 152, the global model of the scene or environment where the user is currently located is loaded into the memory of the user device 4. The user device 4 may be, for example, the user's smartphone, tablet or navigation system including the plenoptic sensor 2. The model loaded from the storage 50 may depend on, for example, the user's location determined by GPS, the user's selection, an automatic analysis of the scene, or other a priori known information.
Step 100: The light field to be annotated is captured by the camera 41 of the user device 4. A 2D projection of the captured plenoptic light field may be displayed on the 2D display 40 of the user device 4, but the data are preferably stored as light field data, i.e. without losing the information about the direction of the incident rays on each pixel.
Step 101: The process may include an additional step of converting or resampling the captured data to facilitate or speed up the matching and recognition process, for example if the model has a different format. For instance, different plenoptic cameras may generate data with a different number of pixels in each sub-image, or may sample the light field in different ways. This conversion can take place in the user device 4 or in the remote server 5.
Step 150: The initial position may be estimated based on, for example, GPS, user input information, or other similar prior information.
Step 151: The captured data are registered with respect to the model. The output is the position of the camera relative to the model, with all six degrees of freedom. If the model has been loaded into the user device 4, the registration can be performed by a processor in this device.
Step 108: A set of annotations is retrieved from the model. These are annotations associated with locations around the computed position of the device 4, or with what should be visible from this position.
Step 109: A view, such as a 2D or 3D image, is rendered on the display 40 of the user device 4 based on the captured data with at least one of the annotations retrieved during the previous step.
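Step 151 outputs the camera pose with six degrees of freedom: three for position and three for orientation. The source does not fix how the orientation is parameterized. This minimal sketch assumes yaw, pitch and roll angles in radians, using the ZYX convention. Under that assumption, the six parameters define the transform that maps a point from camera coordinates into model coordinates. The function names are hypothetical.

```python
import math

def rotation_zyx(yaw, pitch, roll):
    """Rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [[cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr]]

def camera_to_model(pose, point):
    """Map a camera-frame point into the model frame.

    pose = (tx, ty, tz, yaw, pitch, roll): three position and three orientation parameters.
    """
    tx, ty, tz, yaw, pitch, roll = pose
    R = rotation_zyx(yaw, pitch, roll)
    return tuple(sum(R[i][k] * point[k] for k in range(3)) + t
                 for i, t in enumerate((tx, ty, tz)))

# Camera at (1, 2, 3), turned 90 degrees about the vertical axis:
print(camera_to_model((1.0, 2.0, 3.0, math.pi / 2, 0.0, 0.0), (1.0, 0.0, 0.0)))
```

With this transform, the annotations registered in the model coordinate system can be expressed relative to the camera for rendering.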

The registration step 151 of the global registration method described above preferably starts from a given estimate of the camera position. It uses an objective function to compute the error made when projecting the plenoptic light field sample onto the model (in the case above, the city model). With this objective function, also known as a cost function, an iterative optimization process can be applied that refines the camera position estimate so as to minimize the projection error. This optimization process can be broken down into the following steps:
1. Obtain or compute an initial estimate of the position of the user device. For a smartphone including a plenoptic camera, for example, this can be done by using the smartphone's GPS, accelerometer and compass to compute the position and orientation of the device. Set this initial estimate as the current estimate.
2. Compute the projection of the input plenoptic sample into the model, and compute the projection error using the objective function (step 1510).
3. Given the error and the objective function, compute the next camera position estimate (step 1511) and set it as the current estimate.
4. If the error is larger than a given threshold, return to step 2; otherwise, proceed to step 5.
5. The current estimate is the optimized position of the user device and corresponds to the actual position of the device in relation to the model.
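Steps 1 to 5 can be illustrated on a deliberately simplified toy problem. In this toy, the camera position is 2D and the projection into the model is reduced to the bearing angles towards known landmarks. The objective function is the sum of squared angular errors. The simplifications, the landmark layout and the Gauss-Newton update are all assumptions chosen for this sketch, not the optimizer prescribed by the source. A real implementation would optimize all six pose parameters against a light-field-aware objective.

```python
import math

def bearing_residuals(pos, landmarks, observed):
    """Angular error between predicted and observed bearings, wrapped to [-pi, pi]."""
    residuals = []
    for (lx, ly), obs in zip(landmarks, observed):
        diff = math.atan2(ly - pos[1], lx - pos[0]) - obs
        residuals.append(math.atan2(math.sin(diff), math.cos(diff)))
    return residuals

def refine_position(initial, landmarks, observed, threshold=1e-10, max_iter=50, h=1e-6):
    pos = list(initial)  # step 1: initial estimate (e.g. from GPS) is the current estimate
    for _ in range(max_iter):
        r = bearing_residuals(pos, landmarks, observed)
        error = sum(e * e for e in r)  # step 2: objective = sum of squared projection errors
        if error < threshold:  # step 4: stop once the error is below the threshold
            break
        # step 3: Gauss-Newton update using a finite-difference Jacobian (2 unknowns)
        cols = []
        for k in range(2):
            shifted = pos[:]
            shifted[k] += h
            rk = bearing_residuals(shifted, landmarks, observed)
            cols.append([(a - b) / h for a, b in zip(rk, r)])
        a11 = sum(j * j for j in cols[0])
        a22 = sum(j * j for j in cols[1])
        a12 = sum(i * j for i, j in zip(cols[0], cols[1]))
        g1 = -sum(j * e for j, e in zip(cols[0], r))
        g2 = -sum(j * e for j, e in zip(cols[1], r))
        det = a11 * a22 - a12 * a12  # assumed non-zero (landmarks well spread around pos)
        pos[0] += (a22 * g1 - a12 * g2) / det
        pos[1] += (a11 * g2 - a12 * g1) / det
    return pos, error  # step 5: the current estimate is the optimized position

landmarks = [(10.0, 0.0), (0.0, 10.0), (-10.0, 0.0), (0.0, -10.0)]  # hypothetical model points
true_pos = (1.0, 2.0)
observed = [math.atan2(ly - true_pos[1], lx - true_pos[0]) for lx, ly in landmarks]
estimate, final_error = refine_position((0.0, 0.0), landmarks, observed)
```

Gauss-Newton is used here only because it converges in a few iterations on this small least-squares problem. Any method that lowers the objective from the current estimate fits the loop structure.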

Since we use data representing a light field, the objective function used in step 1510 can be tailored to exploit all the information present in the data set. This makes the registration more robust than when standard 2D images are used.

An objective function specifically tailored to plenoptic input samples can be derived to make the registration more robust to all kinds of transformations and lighting conditions. If no plenoptic model is available, one possible approach is to generate synthetic plenoptic samples from the 3D model. Such samples can be generated by simulating a virtual plenoptic camera and running a ray-tracing process over the points of the 3D model. Each point of the 3D model may be represented by its 3D coordinates and by physical properties such as reflectance or transparency. The light sources of the scene may also be described in order to obtain a realistic 3D scene. If no scene light source is described, the lighting can be considered ambient, i.e. affecting every object of the scene equally. The ray-tracing method then reconstructs the paths of rays in space to simulate the actual light rays traveling through the scene. When light sources are present, rays are traced starting from these light sources and propagated onto the objects of the scene. When ambient lighting is assumed, rays are generated directly from the physical points of the 3D model. Ray tracing can simulate optical effects such as reflection, refraction, scattering and dispersion to make the scene rendering realistic.

A virtual plenoptic camera can be placed in the virtual scene to simulate the light field that hits the plenoptic camera sensor. All rays entering the main lens of this camera can then be virtually projected onto a virtual sensor, producing plenoptic reference data corresponding to the 3D model.

Once this plenoptic reference data has been obtained, the camera viewpoint can be chosen as the one that maximizes the correlation between the ray intensities in the reference data and those in the captured data. Other objective functions could also be used to determine the most likely viewpoint of the camera in the model.
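As a sketch of the correlation criterion, assume the ray-traced reference data has already been rendered for a few candidate viewpoints. Assume also that each rendering has been flattened into an intensity list aligned with the captured rays. Both are assumptions, and the dictionary keys below are hypothetical viewpoint labels. Zero-mean normalized cross-correlation is one common choice of correlation measure, and the candidate with the highest score is selected.

```python
import math

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally long intensity lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0  # flat input -> 0

def best_viewpoint(captured, rendered_by_viewpoint):
    """Candidate viewpoint whose synthetic ray intensities correlate best with the capture."""
    return max(rendered_by_viewpoint, key=lambda v: ncc(captured, rendered_by_viewpoint[v]))

captured = [0.1, 0.5, 0.9, 0.3]
renders = {"A": [0.9, 0.5, 0.1, 0.7],   # anticorrelated rendering
           "B": [0.2, 1.0, 1.8, 0.6],   # same pattern, different exposure
           "C": [0.5, 0.5, 0.5, 0.5]}   # featureless rendering
print(best_viewpoint(captured, renders))  # B
```

Normalizing by mean and variance makes the score insensitive to global brightness and contrast differences between the synthetic rendering and the real capture.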

Local Feature Detection and Description
Both methods A and B aim to reduce the registration space to specific, hence local, features of the data that are truly informative, i.e. whose entropy is high compared with other areas of the space. Moreover, the mutual information, or relative entropy, between two local features should be low. Two local features representing two different areas can then easily be distinguished from each other. The last desired property of these features is repeatability: given two views of the same scene, the same features should be detected whatever the transformation between the two views (geometric transformation, exposure changes, etc.).
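The entropy criterion can be illustrated by scoring image patches with the Shannon entropy of their quantized intensities and keeping the highest-scoring ones. This is only a toy proxy for "truly informative" areas. The patch layout, bin count and function names are assumptions. A flat area scores zero, while a richly varied one scores high.

```python
import math
from collections import Counter

def shannon_entropy(values, bins=16, lo=0.0, hi=1.0):
    """Entropy (bits) of the quantized intensity distribution of one patch."""
    counts = Counter(min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def most_informative_patches(patches, keep):
    """Ids of the `keep` patches with the highest entropy."""
    return sorted(patches, key=lambda k: shannon_entropy(patches[k]), reverse=True)[:keep]

patches = {"sky": [0.5] * 16,                        # uniform patch
           "corner": [i / 16 for i in range(16)]}    # 16 distinct intensity levels
print(most_informative_patches(patches, keep=1))    # ['corner']
```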

According to one aspect, the type of local features used for registration and recognition is selected depending on the type of scene. For example, the features used for a natural panoramic view are not the same as those used for a city at street level. In the former case, the horizon line can be used as a feature. In the latter case, suitable features are points where several different depths intersect.

WO 2012/084362, the contents of which are incorporated herein by reference, describes an augmented reality method in which the algorithm depends on the scene. However, that document does not suggest adapting the type of local features used for registration to the type of scene. A method similar to the one described in WO 2012/084362 can be used in the apparatus and method described here to choose the type of local features according to the scene type. The scene type may be determined, for example, from the device location, an analysis of the image, the user's selection, a received signal, etc.

First example of local features: Depth Plenoptic Features (DPF)
In one embodiment, the local features used to register the captured data include intersections of planes.

Photographs of urban environments, or images of manufactured objects such as mechanical parts, often contain many man-made structures. These structures are usually very regular geometrically and weakly textured. In such areas, points where several planes intersect typically represent corners in 3D. In these man-made scenes, feature points can therefore be defined as areas where a minimum number of planes intersect.

This type of feature can be detected efficiently and accurately by exploiting all the information present in the captured data in light field format.

Consider the data output by the plenoptic sensor in the plenoptic camera 41 (FIG. 4). In these data, different pixels of a sub-image correspond to light beams that arrive on the microlens 20 at different angles of incidence, i.e. that come from objects at different distances. Areas where an object is in focus in different focal planes can therefore be easily detected as sub-images in which several adjacent pixels have the same or substantially the same value.
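Detecting sub-images whose neighboring pixels are (almost) identical can be sketched as a per-microlens uniformity test. This assumes an idealized raw sensor image in which each microlens covers an axis-aligned p x p block of pixels. Real lenslet arrays are often hexagonal and require calibration first. The standard-deviation threshold and the function names are hypothetical.

```python
def split_sub_images(raw, p):
    """Cut a raw lenslet image (list of rows) into p x p sub-images keyed by microlens (row, col)."""
    rows, cols = len(raw) // p, len(raw[0]) // p
    return {(r, c): [raw[r * p + i][c * p + j] for i in range(p) for j in range(p)]
            for r in range(rows) for c in range(cols)}

def uniform_sub_images(raw, p, max_std=0.02):
    """Microlenses whose sub-image pixels are (almost) identical, i.e. in-focus candidates."""
    found = []
    for key, pixels in split_sub_images(raw, p).items():
        mean = sum(pixels) / len(pixels)
        std = (sum((v - mean) ** 2 for v in pixels) / len(pixels)) ** 0.5
        if std <= max_std:
            found.append(key)
    return sorted(found)

raw = [[0.5, 0.5, 0.1, 0.9],
       [0.5, 0.5, 0.8, 0.2],
       [0.3, 0.7, 0.6, 0.6],
       [0.9, 0.1, 0.6, 0.6]]  # 2 x 2 microlenses, 2 x 2 pixels each
print(uniform_sub_images(raw, p=2))  # [(0, 0), (1, 1)]
```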

Pixel sets taken from different sub-images can therefore be retrieved to produce images focused at different depths. This requires no depth-map computation or other computationally intensive task. Depth plenoptic features can be defined as areas where physical points at different depths are present simultaneously, and these features can be used to register captured data with reference data.

Consider a stack of different projections of the light field at different focal distances. An object that is in focus in one image of this stack is less in focus in the previous image, and likewise in the next one. A 3D gradient can therefore be computed over this stack. Areas with a high gradient magnitude correspond to objects or pixels with a high level of focus. Areas with a low gradient magnitude correspond to objects present at different depths, which can be detected and used as high-entropy features for registering the captured data. Plenoptic cameras can provide differently focused information for the same physical area. Combining that ability with this in-focus detection technique yields highly informative, repeatable features.
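The 3D gradient over the focal stack can be sketched with central differences along the depth axis and the two image axes. The stack is assumed to be a nested list indexed [z][y][x] (refocused slice, row, column), and borders are handled by clamping indices. Following the text above, voxels with a low gradient magnitude are returned as candidate depth plenoptic features. The names and the threshold are illustrative.

```python
import math

def gradient_magnitude_3d(stack):
    """Central-difference 3D gradient magnitude of a focal stack indexed [z][y][x]."""
    nz, ny, nx = len(stack), len(stack[0]), len(stack[0][0])

    def at(z, y, x):
        # Clamp indices at the borders (replicate padding).
        return stack[min(max(z, 0), nz - 1)][min(max(y, 0), ny - 1)][min(max(x, 0), nx - 1)]

    mag = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                gz = (at(z + 1, y, x) - at(z - 1, y, x)) / 2
                gy = (at(z, y + 1, x) - at(z, y - 1, x)) / 2
                gx = (at(z, y, x + 1) - at(z, y, x - 1)) / 2
                mag[z][y][x] = math.sqrt(gx * gx + gy * gy + gz * gz)
    return mag

def low_gradient_voxels(mag, threshold):
    """(z, y, x) positions whose gradient magnitude is below the threshold."""
    return [(z, y, x) for z, plane in enumerate(mag) for y, row in enumerate(plane)
            for x, v in enumerate(row) if v < threshold]

ramp = [[[float(z)] * 3 for _ in range(3)] for z in range(3)]  # intensity grows with depth index
print(gradient_magnitude_3d(ramp)[1][1][1])  # 1.0
```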

This local feature detection method may thus include, for example, detecting areas of the data that correspond to planes roughly parallel to the line of sight. In such areas, the 3D gradient of the stack is low and the data correspond to the same object at different depths. The method may also include detecting areas of the data that correspond to planes roughly orthogonal to the line of sight, in which adjacent pixels therefore have similar values. The local feature detection method may further include detecting intersections between planes roughly orthogonal to the line of sight and planes roughly parallel to it.

More generally, the detection of local features may include detecting areas of the plenoptic light field in which pixels corresponding to a particular depth have a predetermined relationship with pixels of the same sub-image at different depths. For example, high entropy or high frequency in the depth direction (parallel to the line of sight) may also be considered a useful feature for registration.

Second example of local features: local features based on disparities
In one embodiment, the local features used to identify the captured plenoptic light field use the disparity information contained in the light field.

The disparity of a physical point is the displacement between two projections of that point onto a plane. In a typical vision system, disparity is computed as the difference in position of the same physical point projected from two different views onto the same image plane.

The displacement between the projections of a point from two different views depends on the depth of that point with respect to the plane onto which it is projected. A point at a given distance from the camera plane has a higher disparity (displacement) than a point farther away from that plane. In other words, the closer an object is to the plane, the larger its disparity. Depth is therefore inversely related to disparity.
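For two rectified pinhole views, the inverse relation between depth and disparity takes the familiar form Z = f * B / d. Here f is the focal length in pixels, B is the baseline between the two viewpoints and d is the disparity in pixels. This sketch assumes the relation carries over to a plenoptic camera, with B taken as the spacing between two sub-aperture viewpoints. The text above only states that depth is inversely related to disparity.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (metres) of a point seen with the given disparity by two rectified views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px

# Hypothetical values: 1000 px focal length, 1 cm between viewpoints.
print(depth_from_disparity(5.0, 1000.0, 0.01))   # about 2.0 m
print(depth_from_disparity(10.0, 1000.0, 0.01))  # about 1.0 m: doubling disparity halves depth
```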

A plenoptic light field capture includes information on the position and direction of the rays coming from a physical point. It is therefore possible to extract different rays corresponding to different views of the same physical point. The sub-image pixels associated with these rays can then be used to compute disparity and depth information.

This depth information can then be associated with the local features to improve the robustness of identification and matching.

In one embodiment, depth information can be used as a means of clustering points into objects lying at a particular depth. This embodiment is particularly advantageous for manufactured objects or urban scenes, which often contain a significant number of geometrically regular man-made structures. Planes are indeed frequent in such man-made environments. Each cluster then represents a plane at a particular depth, orthogonal to the camera's line of sight.

Clusters make the matching more robust. Instead of constraining only individual local features, groups of features can be matched together. Matching these clusters is more constrained than matching individual local features alone, and therefore produces better results.

Clustering keypoints also has the advantage of discarding isolated, meaningless features that do not belong to any cluster. This reduces the number of features needed to match a scene. The approach is therefore better suited to systems that must match large numbers of annotations or captured images.
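Clustering by depth and discarding isolated features can be sketched in a single pass. Features are sorted by depth, and a new cluster starts whenever the depth jumps by more than a gap. Clusters that are too small are then dropped. The gap and the minimum cluster size are hypothetical tuning parameters. This simple 1D gap-based grouping stands in for whatever clustering a real implementation would use.

```python
def cluster_by_depth(features, gap=0.25, min_size=3):
    """Group (feature_id, depth) pairs into depth layers; drop layers with < min_size features."""
    ordered = sorted(features, key=lambda f: f[1])
    clusters, current = [], []
    for feature_id, depth in ordered:
        if current and depth - current[-1][1] > gap:  # depth jump: close the current layer
            clusters.append(current)
            current = []
        current.append((feature_id, depth))
    if current:
        clusters.append(current)
    # Isolated features (clusters that are too small) are discarded.
    return [[fid for fid, _ in c] for c in clusters if len(c) >= min_size]

features = [("a", 2.0), ("b", 2.1), ("c", 1.95),   # facade at about 2 m
            ("d", 5.0), ("e", 5.1), ("f", 5.05),   # wall at about 5 m
            ("g", 9.0)]                            # isolated point
print(cluster_by_depth(features))  # [['c', 'a', 'b'], ['d', 'f', 'e']]
```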

Third example of local features: epipolar volume form
In one embodiment, epipolar volumes are used to detect meaningful and stable local feature points. More specifically, the lines within these volumes, called epipolar lines, are used. Epipolar lines can be combined with other feature detectors, such as the Harris-affine feature region detector. Representing plenoptic light field samples in epipolar volume form is very useful, because it simplifies and speeds up many analyses of the plenoptic volume. An epipolar volume is created by stacking images together when the camera motion between two images is a pure horizontal translation. Analyzing these volumes leads to the following conclusion: a line present in such a volume may represent a single physical point, and the slope of that line defines the depth of the point.
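The relationship between an epipolar line's slope and depth can be illustrated with a small sketch. It assumes views ordered by a purely horizontal camera translation and a Lambertian scene, where a point keeps the same intensity in every view. Slopes are searched over a hypothetical candidate set. The function names are illustrative. The recovered slope is the disparity per view step, so depth is inversely proportional to it.

```python
import numpy as np

def build_epi(views, row):
    """Stack one image row from each horizontally translated view.

    `views` is a sequence of 2-D arrays ordered by camera position; the
    result (an epipolar-plane image) has shape (num_views, width).
    """
    return np.stack([v[row] for v in views])

def epi_line_slope(epi, x0, slopes):
    """Return the candidate slope whose line through (0, x0) is most uniform.

    Along the true line x = x0 + slope * s, a Lambertian scene point keeps
    the same intensity, so the variance of the sampled values is minimal.
    Returns None if no candidate line stays inside the EPI.
    """
    n_views, width = epi.shape
    best, best_var = None, np.inf
    for k in slopes:
        xs = np.rint(x0 + k * np.arange(n_views)).astype(int)
        if xs.min() < 0 or xs.max() >= width:
            continue  # line leaves the EPI; skip this hypothesis
        var = epi[np.arange(n_views), xs].var()
        if var < best_var:
            best, best_var = k, var
    return best
```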

Thus, in one embodiment, local features in the light field data are determined and projected into epipolar volume space. In this space, points are clustered into lines, and only a single local feature point is retained per line. Lines that are too short are filtered out to remove unstable (non-stable) features. The output is a set of stable local features, i.e. features detected consistently under different viewpoints.
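The line filtering can be sketched with a small Hough-style vote. The patent does not specify this grouping step; the voting scheme, the integer intercept and slope grid, and the `min_views` threshold are all illustrative assumptions. Each detection votes for every candidate line through it. Lines seen in too few views are dropped as unstable, and each surviving line yields one representative (its intercept and slope).

```python
from collections import defaultdict

def stable_features(detections, slopes, min_views=3):
    """Cluster per-view detections into EPI lines and keep one point per line.

    `detections` holds (view_index, x) pairs found on the same image row of
    horizontally translated views. Each detection votes for every line
    x = x0 + slope * view passing through it (integer grid only). Lines
    supported by fewer than `min_views` views are treated as unstable.
    """
    votes = defaultdict(set)
    for view, x in detections:
        for k in slopes:
            votes[(x - k * view, k)].add(view)
    kept, used = [], set()
    # Visit the best-supported lines first so each detection is used once.
    for (x0, k), views in sorted(votes.items(), key=lambda kv: -len(kv[1])):
        members = {(v, x0 + k * v) for v in views}
        if len(views) < min_views or members & used:
            continue
        used |= members
        kept.append((x0, k))  # one representative per line: (intercept, slope)
    return sorted(kept)
```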

Local feature description: binary plenoptic feature descriptor
The local features (e.g. in step 103 of Figure 5) could be described in binary form. Two features can then be compared with the Hamming distance to see whether they are similar. This significantly reduces the descriptor size of each feature and also speeds up comparison. The Hamming distance can be computed efficiently with dedicated vector instructions that process several bytes at once.

The DPF features described above can be described using descriptors that exploit information from a gradient operator. A faster method describes the detected features by comparing pixel values, which can be seen as a simplified version of the gradient operator. The pixel-value comparisons are made around previously detected feature points, which preserves the repeatability and informativeness expected of such descriptors. Each comparison yields one bit of information. Performing many comparisons produces a bit-string descriptor in which each bit corresponds to one specific comparison.

This principle of binarized descriptors can also be applied in plenoptic space. A plenoptic binary descriptor is obtained by exploiting all the information in the plenoptic light field data. For an image produced by a standard pinhole camera, comparing pixel values amounts to comparing the visual information of the image. For a plenoptic camera, the comparisons are made along different dimensions in order to maximize the entropy of the descriptor.

As seen earlier, a plenoptic image is composed of a plurality of sub-images. A single sub-image contains multiple representations of the same physical point under different viewpoints. The plenoptic binary descriptor makes use of this information redundancy. When the plenoptic binary descriptor is combined with the DPF detector described above, the focal stack used by that detector can also serve as a source of comparison points. The plenoptic binary descriptor therefore contains information both about different views of the area and about different depths of the feature area.

The plenoptic binary descriptor is computed by selecting a set of comparison-point pairs. One element of each pair is the location of a pixel value taken from the sub-images around the feature point area detected by the DPF detector. The other element is a point around the same feature point area, but at a different depth in the DPF detector's focal stack. The set of pairs is selected only once, and the same set is used to compute every descriptor.
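A fixed pair set of this kind can be drawn once and reused. This sketch uses the random strategy. The pair encoding `((dy1, dx1), (slice_idx, dy2, dx2))`, the sampling radius and the seed are illustrative assumptions. The first element is an offset in the sub-image around the feature, and the second is an offset in one slice of the focal stack.

```python
import random

def make_comparison_pairs(n_pairs, radius, n_slices, seed=42):
    """Draw a fixed set of comparison-point pairs once, for all descriptors.

    Each pair is ((dy1, dx1), (slice_idx, dy2, dx2)): the first point is an
    offset inside the sub-image around the feature, the second an offset in
    one slice of the focal stack. A fixed seed makes the set reproducible.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        p = (rng.randint(-radius, radius), rng.randint(-radius, radius))
        q = (rng.randrange(n_slices),
             rng.randint(-radius, radius), rng.randint(-radius, radius))
        pairs.append((p, q))
    return pairs
```

Because the seed is fixed, the captured data and the reference light field are described with exactly the same tests. This is what makes their bit strings comparable.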

Several strategies exist for selecting this set of comparison points. The first is to select the points randomly in the desired space, which can be either the focal stack or the sub-images. This works reliably. Alternatively, machine learning can be used to learn the best set, maximizing the inter-distance between different features while minimizing the intra-distance between instances of the same feature. For medium-sized feature areas, a greedy search for the best comparison points is performed. It maximizes the variance of the descriptor while minimizing the correlation between its bits.
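A greedy search of the kind mentioned above can be sketched as follows. The sketch follows the approach popularized by ORB's rBRIEF; the patent does not say this is its exact procedure. Candidate tests are ranked by how close their mean outcome is to 0.5 (maximum variance for a binary test). A test is kept only if it is weakly correlated with every test already chosen. The threshold `max_corr` and the training matrix `outcomes` are assumptions.

```python
import numpy as np

def greedy_select_tests(outcomes, n_select, max_corr=0.2):
    """Greedily pick comparison tests with high variance and low correlation.

    `outcomes` is a (n_training_patches, n_candidate_tests) array of 0/1
    results. Tests whose mean is closest to 0.5 are tried first; a test is
    kept only if its absolute correlation with every already-kept test is
    below `max_corr`.
    """
    outcomes = np.asarray(outcomes, dtype=float)
    order = np.argsort(np.abs(outcomes.mean(axis=0) - 0.5), kind="stable")
    chosen = []
    for j in order:
        col = outcomes[:, j]
        if col.std() == 0:
            continue  # a constant test carries no information
        ok = all(abs(np.corrcoef(col, outcomes[:, c])[0, 1]) < max_corr
                 for c in chosen)
        if ok:
            chosen.append(int(j))
        if len(chosen) == n_select:
            break
    return chosen
```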

To compute a binary descriptor representing a given feature area, the following procedure can be applied:
1. For each comparison-point pair, determine whether the rendered grayscale pixel value at the first comparison point is smaller than the value at the other point.
2. If the comparison is true, append a binary "1" to the descriptor (which is initially empty); otherwise append a binary "0".
3. Repeat the procedure for each comparison-point pair to build a binary string descriptor.
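The three-step procedure above can be sketched directly. This minimal example assumes that the rendered sub-image and the focal stack are available as grayscale NumPy arrays centred on the same feature. It also assumes pairs encoded as `((dy1, dx1), (slice_idx, dy2, dx2))`. Both assumptions are illustrative and do not come from the source.

```python
import numpy as np

def binary_descriptor(sub_image, focal_stack, center, pairs):
    """Build a bit-string descriptor from grayscale comparison-point pairs.

    `sub_image` is a 2-D grayscale array, `focal_stack` a 3-D array of
    refocused grayscale slices (slice, row, col), and `center` the
    (row, col) of the detected feature in both. Each pair is
    ((dy1, dx1), (slice_idx, dy2, dx2)).
    """
    cy, cx = center
    bits = []  # the descriptor starts empty
    for (dy1, dx1), (s, dy2, dx2) in pairs:
        a = sub_image[cy + dy1, cx + dx1]
        b = focal_stack[s, cy + dy2, cx + dx2]
        bits.append("1" if a < b else "0")  # step 2: "1" if the test is true
    return "".join(bits)
```

The resulting string can be packed into an integer with `int(bits, 2)` for fast XOR-based comparison.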

Using these techniques, the binary descriptors determined for the captured data can be compared with the binary descriptors of the reference plenoptic light field. The comparison may be based on the Hamming distance, which gives their relative distance in the plenoptic feature space.
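Such a comparison can be sketched as brute-force nearest-neighbour matching. This minimal example assumes that descriptors are packed into Python integers. The acceptance threshold `max_dist` is a hypothetical parameter. A production system would more likely use an indexed or vectorized matcher.

```python
def match_descriptors(query, reference, max_dist):
    """Brute-force nearest-neighbour matching of binary descriptors.

    Descriptors are Python ints holding the packed bits. For each query
    descriptor the closest reference descriptor in Hamming distance is
    returned as (query_index, reference_index, distance), provided the
    distance does not exceed `max_dist`.
    """
    if not reference:
        return []
    matches = []
    for qi, q in enumerate(query):
        dists = [bin(q ^ r).count("1") for r in reference]  # Hamming distances
        ri = min(range(len(dists)), key=dists.__getitem__)
        if dists[ri] <= max_dist:
            matches.append((qi, ri, dists[ri]))
    return matches
```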

From registration to augmented scene
After registration with any one of the methods described above, the position and orientation of the plenoptic camera 41 in the user device 4 are known relative to the registered reference scene. The reference data corresponding to the captured data are also known. In the reference database, they are associated with a set of annotations for various elements or features of the data. Annotations may consist of text, images, video, audio, manipulation or highlighting of existing features, 3D objects, and so on. They depend on the context of the scene and of the view to be annotated.

The final augmented (annotated) image is then rendered. For example, a 2D image (still or video) can be generated that shows the captured landscape with the names of mountains or other annotations superimposed on it. In an urban environment, directions to the nearest shops and amenities can instead be displayed on the image.

In one embodiment, the view (in-focus objects, camera viewpoint) is rendered before the annotations are incorporated. The pose of a given rendered view and the positions of the annotations in the model are both known. The annotations can therefore be projected into whichever view has been chosen for rendering.
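Projecting an annotation into the chosen view can be illustrated with the standard pinhole camera model, x ~ K(RX + t). This sketch assumes that each annotation is anchored at a single 3D point in model coordinates. It also assumes that the rendered view's pose (R, t) and intrinsic matrix K are available. Lens distortion is ignored.

```python
import numpy as np

def project_annotation(point_world, R, t, K):
    """Project a 3-D annotation anchor into the rendered view.

    Uses the pinhole model x ~ K (R X + t), where (R, t) is the pose of the
    rendered view relative to the reference model and K its intrinsic
    matrix. Returns pixel coordinates (u, v), or None if the point lies
    behind the camera.
    """
    Xc = R @ np.asarray(point_world, dtype=float) + t  # camera coordinates
    if Xc[2] <= 0:
        return None
    uvw = K @ Xc
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```

With K = [[500, 0, 320], [0, 500, 240], [0, 0, 1]] and an identity pose, the model point (0.2, -0.1, 2.0) lands at pixel (370, 215). The annotation label would be drawn there.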

Augmented reality (AR) plenoptic rendering and applications
Capturing a scene in plenoptic space opens the door to new possibilities for augmented reality rendering. The position and direction of the rays hitting the sensor of a plenoptic camera make it possible, among other things, to retrieve depth information, to refocus after the image has been captured, or to change the user's viewpoint. This information can be exploited to improve scene rendering and give the user a new experience. The following paragraphs describe some possible advanced rendering capabilities.

One particular advantage of augmented reality is that the user can interact with elements of the rendered image. For example, the user can click on a feature of interest to obtain related additional information. This interaction is very advantageous because the user interacts directly with objects, whether real or virtual, instead of remaining passive.

It is often desirable to tell the user which objects in the rendered image are interactive and associated with an annotation, and can therefore be clicked. One way to do this is to display a notification, such as a text box with an arrow pointing to the object. However, if the captured scene contains several interactive objects, many notifications are needed to tell the user which elements are interactive.

The plenoptic space allows new interactive elements that give the user a better experience. As described above, data captured by a plenoptic sensor can be rendered as 2D images with different focal distances even after capture. Moreover, refocusing can be computed independently for each local part of the data, without necessarily considering the data as a whole. In other words, particular objects in an image can be brought into focus even if they do not lie at the same depth in the scene.
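Refocusing after capture is commonly done with shift-and-add synthetic-aperture refocusing; the patent does not name a specific algorithm. This minimal sketch assumes that the light field is stored as a (U, V, H, W) grid of sub-aperture views. It rounds shifts to whole pixels and uses wrap-around `np.roll`. Real implementations interpolate sub-pixel shifts and handle image borders. The parameter `shift` selects the depth brought into focus.

```python
import numpy as np

def refocus(lf, shift):
    """Shift-and-add refocusing of a 4-D light field.

    `lf` has shape (U, V, H, W): a grid of sub-aperture views. Each view is
    translated in proportion to its offset from the central view and the
    results are averaged; `shift` (pixels per view step, rounded to whole
    pixels here) selects the depth brought into focus.
    """
    U, V, H, W = lf.shape
    uc, vc = (U - 1) / 2, (V - 1) / 2
    acc = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            dy = int(round(shift * (u - uc)))
            dx = int(round(shift * (v - vc)))
            acc += np.roll(lf[u, v], (dy, dx), axis=(0, 1))
    return acc / (U * V)
```

Running `refocus` on a cropped region with a region-specific `shift` is one way to realize the local, per-object refocusing described above.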

Annotated objects or annotated image features can therefore be rendered in focus while the other elements of the scene are blurred. The user can then see immediately which objects in the image are annotated or interactive, and which are not.
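The "annotated objects sharp, everything else blurred" rendering can be sketched as a mask-based composite. This example makes two assumptions. Each annotated object comes with a boolean mask and an image already refocused at that object's depth. A box blur stands in for the defocused background; in practice the background would come from rendering the rest of the light field at a different focus.

```python
import numpy as np

def box_blur(img, k=5):
    """Separable box blur with edge padding (a stand-in for defocus)."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    kernel = np.ones(k) / k
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def highlight_annotated(sharp_layers, background):
    """Composite annotated objects in focus over a blurred background.

    `sharp_layers` is a list of (mask, image) pairs, where each image has
    been refocused at the depth of one annotated object and `mask` is a
    boolean array marking that object's pixels.
    """
    out = box_blur(background)
    for mask, img in sharp_layers:
        out = np.where(mask, img, out)  # keep the in-focus object pixels
    return out
```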

As an example, consider an interactive augmented reality manual or video tutorial for a printer. The printer's knobs and parts carry useful instructions that are displayed in augmented reality when the user selects them. A 2D annotated image showing the printer can be rendered from the plenoptic light field, with all its interactive knobs and parts in focus while the rest of the image is blurred. The user is thus shown the interactive parts of the printer and can click on them to access the annotations. The user can also change the focal depth to obtain a focused view of other elements.

Changing the viewpoint of the plenoptic camera makes it possible to render each point of a scene as a partial 3D element. The 3D reconstruction is only partial, because the rays coming from the scene are captured from a single position rather than from all positions around the object. Nevertheless, this partial 3D reconstruction allows objects in the scene to be rendered with a swinging or jittering motion. These objects appear as 3D objects that pop out of the image when seen from a particular direction. This effect, too, can be computed locally for selected objects in the scene. The interactive elements of a scene can therefore be displayed as moving objects that draw the user's attention, while the other objects remain still. The user can then click on these swinging elements to display the content of the annotations.
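The swinging effect can be approximated by animating the viewpoint only inside the selected object's region. This is a minimal sketch, assuming a (U, V, H, W) light field, a boolean mask for the interactive object, and a sinusoidal horizontal swing. The amplitude, frame count and function names are illustrative. The rest of each frame always comes from the central view, so it stays still.

```python
import math
import numpy as np

def swing_offsets(n_frames, amplitude):
    """Horizontal viewpoint offsets (in view steps) for one back-and-forth swing."""
    return [int(round(amplitude * math.sin(2 * math.pi * f / n_frames)))
            for f in range(n_frames)]

def swing_frames(lf, mask, amplitude=1, n_frames=8):
    """Render frames in which only the masked object changes viewpoint.

    `lf` has shape (U, V, H, W). Inside `mask`, pixels come from the view
    horizontally offset from the centre by the current swing offset;
    elsewhere they always come from the central view.
    """
    U, V = lf.shape[:2]
    uc, vc = U // 2, V // 2
    centre = lf[uc, vc]
    frames = []
    for off in swing_offsets(n_frames, amplitude):
        v = min(max(vc + off, 0), V - 1)  # clamp to the available views
        frames.append(np.where(mask, lf[uc, v], centre))
    return frames
```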

The various operations of the methods described above may be performed by any suitable means capable of performing them, such as various hardware and/or software component(s), circuits, and/or module(s). In general, any operation described in this application may be performed by corresponding functional means capable of performing it. The various means, logical blocks, and modules may include various hardware and/or software component(s) and/or module(s), including, but not limited to, a circuit, an application-specific integrated circuit (ASIC), a general-purpose processor, a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any commercially available processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. A server may be implemented as a single machine, as a set of machines, as a virtual server, or as a cloud server.

As used herein, the expression “light field data” designates any data that describe a light field image of a scene, i.e. an image that stores not only the brightness and color of the light but also its direction. Such data may be generated with a plenoptic camera, or computed from a 3D model as if they had been captured with a plenoptic camera. A 2D or 3D projection rendered from a plenoptic light field image is not considered a plenoptic light field image, because the direction of the light has been lost.

As used herein, the expression “plenoptic space” may designate a multidimensional space in which a light field can be described. In other words, it is a space in which one can describe a function giving the amount of light in every direction in space, or the amount of light reaching the sensor. A plenoptic space may be described by at least two parameters for the position of each sub-image and at least one additional parameter for the direction of the light reaching that sub-image. Frequently, a plenoptic space is described by two parameters for the position of each sub-image, two parameters for the direction of the light on that sub-image, at least one parameter for the wavelength, and possibly one parameter for time (in the case of video).
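The two-position, two-direction parametrization above maps naturally onto a 4-D array. This sketch assumes an idealized sensor in which each microlens covers an aligned square block of p × p pixels. Real lenslet images, including hexagonal layouts, need calibration and resampling first. (s, t) indexes the microlens, i.e. the sub-image position, and (u, v) indexes the pixel under it, i.e. the ray direction. Wavelength and time would add further axes.

```python
import numpy as np

def raw_to_lightfield(raw, p):
    """Rearrange a raw lenslet image into a 4-D light field L[u, v, s, t].

    Assumes an ideal sensor where every microlens covers an aligned p x p
    block of pixels: (s, t) indexes the microlens (sub-image position) and
    (u, v) the pixel under it (ray direction).
    """
    H, W = raw.shape
    S, T = H // p, W // p
    blocks = raw[:S * p, :T * p].reshape(S, p, T, p)  # axes (s, u, t, v)
    return blocks.transpose(1, 3, 0, 2)  # -> axes (u, v, s, t)
```

Fixing (u, v) and reading out `lf[u, v]` yields one sub-aperture view, i.e. the scene as seen from one direction.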

As used herein, the term “annotation” encompasses a wide variety of possible elements. These include, for example, text, still images, video images, logos, image layers, sounds, and/or other elements that can be superimposed on or otherwise added to an image.

As used herein, the term “pixel” may designate a single monochrome photosite, or a plurality of adjacent photosites for detecting light of different colors. For example, three adjacent photosites for detecting red, green, and blue light can form a single pixel.

As used herein, the term “determining” encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., in a table, a database, or another data structure), ascertaining, estimating, and the like. “Determining” may also include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory), and the like. “Determining” may further include resolving, selecting, choosing, establishing, and the like.

Capturing an image of a scene involves using a digital camera to measure the brightness of the light reaching the camera's image sensor. Capturing light field data may involve using a plenoptic camera. It may also involve generating light field data from a 3D model or other description of the scene and the light sources.

The expression “rendering a view”, for example “rendering a 2D view from light field data”, encompasses the action of computing or generating an image. One example is computing a 2D image from the information contained in the light field data. The expression “projecting a view”, for example “projecting a 2D view based on light field data”, is sometimes used to emphasize that several different views may be rendered.

The steps of a method or algorithm described in connection with the present disclosure may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in any form of storage medium known in the art. Some examples of storage media that may be used include random access memory (RAM), read-only memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, and so forth. A software module may comprise a single instruction or many instructions. It may be distributed over several different code segments, among different programs, and across multiple storage media. A software module may consist of an executable program, a portion, routine, or library used in a complete program, a plurality of interconnected programs, an “app” executed by a smartphone, tablet, or computer, a widget, a Flash application, a portion of HTML code, etc. A storage medium may be coupled to a processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. A database may be implemented as any structured collection of data, such as an SQL database, a set of XML documents, a semantic database, or a set of information available over an IP network, or as any other suitable structure.

Thus, certain aspects may comprise a computer program product for performing the operations presented herein. For example, such a computer program product may comprise a computer-readable medium having instructions stored (and/or encoded) thereon. These instructions are executable by one or more processors to perform the operations described herein. For certain aspects, the computer program product may include packaging material.

It is to be understood that the claims are not limited to the precise configuration and components illustrated above. Various modifications, changes, and variations may be made in the arrangement, operation, and details of the methods and apparatus described above without departing from the scope of the claims.

DESCRIPTION OF SYMBOLS
1 Main lens
4 User device
5 Server
6 Network
20 Microlens
21 Image sensor
40 Display
41 Plenoptic camera
50 Storage
51 Processor
210 Pixel
400 Processor
401 Communication module

EP 1246080
European Patent Application Publication No. 2207113
International Publication No. WO 05/114476
International Publication No. WO 2012/084362

Claims

An annotation method comprising:
- capturing (100) data representing a light field with a plenoptic camera (41) in the device (4);
- executing (101) program code for matching the captured data with the reference data;
- executing (102) program code for retrieving annotations associated with an element of the reference data;
- executing (103) program code for rendering a view generated from the captured data and including at least one annotation.

The method of claim 1, wherein the reference data defines a reference light field.

The method of claim 2, comprising generating the reference data from a 3D model of the scene.

The method of claim 2, wherein the matching step includes matching the captured data with a data piece of a plurality of reference data pieces representing different right views.

The method of any one of the preceding claims, comprising detecting (102) local features in the captured data.

A method wherein the local feature detection step (102) includes detecting areas where pixels at a first depth have a predetermined relationship with pixels at different depths.

The method of claim 5, wherein the local feature detection step (102) includes detecting parallax in the captured data.

The method of claim 5, wherein the local feature detection step (102) includes calculating an epipolar volume or line.

The method according to any one of claims 5 to 8, comprising the step (1011) of describing the local features.

The method of claim 9, wherein the local feature is described by a descriptor in binary form.

The method of claim 10, comprising calculating a Hamming distance between the descriptors.

The method of any one of claims 5 to 11, comprising matching (106) the local features in the captured data with local features in the reference data.

13. A method according to any one of claims 5 to 12, comprising registering (107) the reference data and the plenoptic data using the local features.

14. A method according to any one of claims 5 to 13, comprising detecting a type of scene and determining a type of local feature to be detected in the captured data in accordance with the scene type.

15. Selecting one or a limited number of reference data pieces prior to the matching depending on the position of the device (4), selections made by a user or received signals. The method as described in any one of.

The method according to claim 1, wherein the reference data includes a global model of a scene.

17. The method of claim 16, comprising minimizing (1510) a cost function representing a projection error of the captured data onto the reference data.

18. The step (109) of rendering a view includes rendering a 2D view derived from captured data and overlaying an annotation on the 2D view. the method of.

The step (109) of rendering the 2D view from the captured data is focused on the annotated object or feature of the annotated image while the rest of the scene is blurred. 18. The method of claim 17, comprising displaying the annotated object or annotated image features in a conforming fashion.

A device (4) for capturing and annotating data corresponding to a scene,
A plenoptic camera (41) for capturing (100) data representing a light field;
A processor (400);
-A display (40);
- program code which, when executed, causes the processor to retrieve annotations associated with an element of the data captured by the camera, and to render on the display (40) a view generated from the captured data and including at least one annotation;
Including the device.

The apparatus of claim 20, wherein the program code is further arranged to cause the processor (400) to detect local features present in captured data when the program code is executed.

21. The apparatus of claim 20, wherein the program code is further arranged to describe each detected local feature with a binary vector.

A computer program product comprising a tangible device readable medium for causing the device to perform the method of any one of claims 1-19.

An apparatus (5) for determining an annotation,
A processor (51);
-Store (50);
A program code that, when the program code is executed, causes the processor to receive data representing a light field, matches the data with one reference data in the store, and is associated with the reference data; Program code for determining the annotation and sending the annotation to the remote device (4);
Including the device.