JP2020113159A - Information terminal device and program

Info

Publication number: JP2020113159A
Application number: JP2019005017A
Authority: JP
Inventors: Haruhisa Kato (加藤 晴久)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2019-01-16
Filing date: 2019-01-16
Publication date: 2020-07-27
Anticipated expiration: 2039-01-16
Also published as: JP7074694B2

Abstract

To provide an information terminal device that can estimate a posture while suppressing delay. SOLUTION: The information terminal device includes: an imaging unit 1 that captures an image; a first estimation unit 3 that estimates posture information of the imaging unit at a past time from the captured image; a sensor unit 2 that continuously acquires sensor values reflecting the posture of the imaging unit; a second estimation unit 4 that estimates posture change information toward a future time from the continuously acquired sensor values; and a synthesis unit 5 that obtains combined posture information by combining the posture information and the posture change information. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information terminal device and a program capable of estimating a posture while suppressing delay.

Techniques for estimating the posture of a camera that captures images include, for example, those of Patent Documents 1 and 2 and Non-Patent Document 1. Patent Document 1 proposes a method that extracts feature descriptors from images and estimates the posture change between images by finding correspondences between the descriptors across a plurality of images. It improves reliability by using sensor information to normalize the feature descriptors. Patent Document 2 proposes a method that estimates an initial posture with a sensor and corrects it with images. It claims that the sensor shrinks the solution space of posture estimation and improves the convergence of image-based posture optimization. Non-Patent Document 1 estimates the posture from images and sensor data using deep learning.

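Patent Document 1: Japanese Unexamined Patent Application Publication No. 2014-241155
Patent Document 2: Japanese Translation of PCT International Application Publication No. 2015-532077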

Non-Patent Document 1: Ronald Clark, Sen Wang, Hongkai Wen, Andrew Markham, Niki Trigoni, "VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem," Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3995-4001, 2017.
Non-Patent Document 2: Kato, H., Billinghurst, M., "Marker Tracking and HMD Calibration for a video-based Augmented Reality Conferencing System," In Proc. of the 2nd Int. Workshop on Augmented Reality, 1999.
Non-Patent Document 3: D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. Journal of Computer Vision, 60(2), pp. 91-110, 2004.

However, the conventional techniques described above have the following problem. Estimating a posture from an image, which carries a large amount of information, imposes a processing load, so time elapses between image capture and posture calculation, and the posture may already have changed by the time the calculation result is obtained. In particular, when the real-world scene is visible without delay, as with an optical see-through head-mounted display (HMD), and an augmented reality (AR) display application is to be built on this posture calculation result, the posture estimation result is reflected late.

As a result of this delay, when the camera is moving relative to the background or the object, a user experiencing an AR display based on such conventional posture estimation may see an unnatural display. That is, the AR display is rendered against the background and object visible at the current time using a posture from a past time, which differs from the current posture, so the overlay lands at a position that deviates from where it should be superimposed. For example, when the motion is fast, even a past posture delayed by only about 0.1 to 0.2 seconds from the current time may produce an unnatural display.

In view of the above problem of the conventional techniques, an object of the present invention is to provide an information terminal device and a program capable of estimating a posture while suppressing delay.

To achieve the above object, the present invention is an information terminal device comprising: an imaging unit that captures an image to obtain a captured image; a first estimation unit that estimates, from the captured image, posture information of the imaging unit at a past time; a sensor unit that continuously acquires sensor values reflecting the posture of the imaging unit; a second estimation unit that estimates, from the continuously acquired sensor values, posture change information toward a future time; and a synthesis unit that obtains combined posture information by combining the posture information and the posture change information. The present invention is also a program that causes a computer to function as the information terminal device.

According to the present invention, past posture information is estimated with high accuracy from the captured image, posture change information toward a future time is estimated in advance from the sensor values, and combined posture information is obtained by combining the two. When the current time reaches that future time, the combined posture information therefore provides a posture estimation result that is both highly accurate and subject to little delay.
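The idea of combining an image-derived past posture with a sensor-derived posture change can be illustrated with homogeneous matrices. The following is a minimal sketch under stated assumptions, not the method of the embodiment: the helper names `matmul`, `pose`, and `combine` are hypothetical, the rotation is restricted to one axis for brevity, and the composition order (change applied after the past posture) is assumed for illustration only.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def pose(yaw_rad, tx, ty, tz):
    """Hypothetical posture matrix: a rotation about the z axis combined
    with a translation, i.e. (translation matrix) x (rotation matrix)."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def combine(pose_past, pose_change):
    """Assumed composition rule: apply the sensor-derived change
    after the image-derived posture of the past time."""
    return matmul(pose_change, pose_past)

# Posture estimated from an image captured at a past time (illustrative values).
p_past = pose(math.radians(10.0), 0.0, 0.0, 1.0)
# Posture change from that past time to a future time, from sensor values.
delta = pose(math.radians(5.0), 0.02, 0.0, 0.0)
p_combined = combine(p_past, delta)
```

Because the expensive image-based estimate and the cheap sensor-based change are computed separately, the combination itself is a single matrix product and can be evaluated as soon as both inputs are available.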

FIG. 1 is a functional configuration diagram of an information terminal device according to an embodiment.
FIG. 2 is a diagram showing an example of the hardware configuration of a general computer device that can realize the information terminal device.
FIG. 3 is a diagram showing a schematic example of the time course of the operation of the information terminal device as a whole, with its units operating in cooperation.
FIG. 4 is a flowchart of the operation of the imaging unit according to an embodiment.
FIG. 5 is a flowchart of the operation of the first estimation unit according to an embodiment.
FIG. 6 is a flowchart of the operation of the sensor unit according to an embodiment.
FIG. 7 is a flowchart of the operation of the second estimation unit according to an embodiment.
FIG. 8 is a flowchart of the operation of the synthesis unit and the presentation unit according to an embodiment.

FIG. 1 is a functional configuration diagram of an information terminal device according to an embodiment. As illustrated, the information terminal device 10 includes an imaging unit 1, a sensor unit 2, a first estimation unit 3, a second estimation unit 4, a synthesis unit 5, and a presentation unit 6. The hardware configuration of the information terminal device 10 can be realized by a general computer device implemented as a smartphone, a tablet terminal, a desktop or laptop computer, an HMD, or the like.

FIG. 2 is a diagram showing an example of the hardware configuration of a general computer device 20 that can realize the information terminal device 10. As shown in FIG. 2, the computer device 20 includes a CPU (central processing unit) 101 that executes predetermined instructions; a dedicated processor 102 (a GPU (graphics processing unit), a processor dedicated to deep learning, or the like) that executes part or all of the instructions of the CPU 101 in place of, or in cooperation with, the CPU 101; a RAM 103 as a main storage device that provides a work area for the CPU 101 and the dedicated processor 102; a ROM 104 as an auxiliary storage device; a camera 201; an ordinary display 202 that is not of the see-through type; a see-through display 203 for realizing an optical see-through HMD; a sensor 204; a communication interface 205; and a bus B for exchanging data among them. Depending on the embodiment, only one of the display 202 and the see-through display 203 may be provided.

Each unit of the information terminal device 10, described in detail below, can be realized by the CPU 101 and/or the dedicated processor 102 reading a predetermined program corresponding to the function of that unit from the ROM 104 and executing it. In addition, the camera 201 operates in conjunction when imaging-related processing is performed, the display 202 or the see-through display 203 operates in conjunction when display-related processing is performed, the sensor 204 (of one or more types) operates in conjunction when sensor values are acquired, and the communication interface 205 operates in conjunction when communication-related processing for sending and receiving data over a network is performed.

The information terminal device 10 may be realized by a single computer device 20, such as a smartphone terminal, or by two or more computer devices 20. For example, the information terminal device 10 may be a system in which a first computer device 20, an HMD terminal including at least the camera 201 and the see-through display 203, and a second computer device 20, a smartphone terminal that provides the AR overlay content to be displayed on the HMD terminal, communicate with each other by short-range wireless communication or the like via the communication interface 205 of each terminal. In addition to or instead of this, part of the functions of the information terminal device 10 may be realized on a third computer device 20 that is a server terminal on a network (when the use case is AR display at the site where the user is present, this excludes the imaging unit 1, the sensor unit 2, and the presentation unit 6).

Each unit of the information terminal device 10 is described below. FIG. 3 is a diagram showing a schematic example of the time course of the operation of the information terminal device 10 as a whole, with its units operating in cooperation. In FIG. 3, a common time axis runs in the horizontal direction and is shown on line L1 with, for example, times t₁, t₂, …, t₈, and schematic examples of the processing of each unit are shown along this time axis. The following description refers to this example of FIG. 3 as appropriate.

The imaging unit 1 captures an image and outputs the obtained image, together with time information (a timestamp of the capture time), as imaging information to the first estimation unit 3 and the presentation unit 6. As hardware, the imaging unit 1 can be realized using the camera 201, such as a digital camera.

FIG. 4 is a flowchart of the operation of the imaging unit 1 according to an embodiment. In step S11, it is determined whether the current time has reached the capture timing. If it has, the process proceeds to step S12; if not, the process returns to step S11 and repeats the determination. In step S12, an image is captured at this capture timing, imaging information in which the obtained image is associated with a timestamp is transferred to the first estimation unit 3 and the presentation unit 6 (that is, to the RAM 103 serving as the image buffer that the first estimation unit 3 and the presentation unit 6 refer to for processing), and the process returns to step S11. With the flow of FIG. 4, the imaging unit 1 repeats image capture at each capture timing determined in step S11.
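The loop of steps S11 and S12 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: `capture_loop`, `grab_frame`, and the in-memory `buffer` standing in for the image buffer in the RAM 103 are hypothetical names, and the loop is bounded by `n_frames` only so that the sketch terminates.

```python
import time
from collections import deque

def capture_loop(grab_frame, interval_s, buffer, n_frames,
                 clock=time.monotonic, sleep=time.sleep):
    """Capture n_frames images at roughly interval_s spacing.

    grab_frame: hypothetical callable returning one image.
    buffer: shared container standing in for the image buffer in RAM.
    """
    next_due = clock()
    for _ in range(n_frames):
        # Step S11: repeat the check until the capture timing is reached.
        while clock() < next_due:
            sleep(0.001)
        # Step S12: capture, attach the actual capture time, and hand over.
        stamp = clock()
        buffer.append((stamp, grab_frame()))
        next_due += interval_s

# Usage with a dummy frame source and a 10 ms capture interval.
frames = deque()
capture_loop(lambda: "frame", 0.01, frames, 3)
```

The timestamp is taken at the moment of capture rather than at the scheduled timing, which matches the text's point that the recorded capture time is the actual one.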

The capture timing in step S11 may be set at predetermined time intervals specified in advance, but the actual capture time (the capture time reflected as the time information in the imaging information) may deviate from the predetermined capture timing, for example because the device waits for processing such as focusing to complete. In the example of FIG. 3, imaging information I₁, I₂, …, I₈ shown on line L2 is obtained at such capture times t₁, t₂, …, t₈ shown on line L1 and its transfer is started, and the state in which each transfer has completed is shown on line L3. This transfer is generally subject to non-uniform delay. Causes of the non-uniform delay include uneven load arising from other processing (such as background processing managed by the operating system, unrelated to the present invention) running concurrently on the information terminal device 10, which is a general-purpose computer device 20. In the example of FIG. 3, because of this non-uniform delay, the time required to transfer the imaging information I₃ captured at time t₃ is longer than the time required to transfer the imaging information I₂ captured at time t₂.

As additional processing on top of the flow of FIG. 4, when the delay (transfer time) is so large that the transfer has not completed by the next capture timing (the timing of an affirmative determination in step S11), the imaging information whose transfer has not completed may be discarded. Even if such discarding occurs for some frames, the information terminal device 10 can continue to perform posture estimation.

In an embodiment where the presentation unit 6 does not need the imaging information (for example, an embodiment in which AR display is performed on a see-through HMD), output of the imaging information from the imaging unit 1 to the presentation unit 6 may be omitted, and the imaging information may be output only to the first estimation unit 3.

The first estimation unit 3 detects a predetermined object (imaging target) in the captured image I_n (n=1,2,…) of the imaging information input from the imaging unit 1, estimates relative posture information P_n (n=1,2,…) of the imaging target with respect to the camera 201 constituting the imaging unit 1, and outputs this posture information P_n to the synthesis unit 5. With the camera parameters of the camera 201 given as known, this posture information can be obtained in the form of a planar projective transformation matrix that is the product of a translation component matrix and a rotation component matrix. Any existing method may be used for the posture estimation itself. For example, the method of Non-Patent Document 2 may be used with a black-and-white square marker (AR marker) as the imaging target, or the method of Non-Patent Document 3 may be used, which detects an imaging target with an arbitrary pattern by extracting SIFT feature information, an example of natural feature information (feature points and the feature quantities obtained from their surroundings). Alternatively, although this data flow is omitted in FIG. 1, the first estimation unit 3 may take as input not only the captured image I_n obtained by the imaging unit 1 but also the sensor values s_m from the sensor unit 2, and estimate the posture using the deep learning method of Non-Patent Document 1. When posture estimation is performed continuously, object detection may be performed at some initial time, and object tracking, which can run faster than detection, may be performed at subsequent times.

FIG. 5 is a flowchart of the operation of the first estimation unit 3 according to an embodiment. In step S31, it is determined whether there is imaging information that has not yet been processed for posture estimation and whose transfer from the imaging unit 1 has completed. If there is, the process proceeds to step S32; if not, the process returns to step S31 and repeats the determination until imaging information satisfying the conditions appears. In step S32, posture estimation is performed on the captured image I_n (n=1,2,…) of the imaging information for which the immediately preceding step S31 gave an affirmative determination, the obtained posture information P_n (n=1,2,…) is output to the synthesis unit 5, and the process returns to step S31.

The first estimation unit 3 may perform a plurality of posture estimations concurrently by running multiple instances of the posture estimation program as threads on a multi-core CPU 101 and/or a dedicated processor 102 such as a GPU. That is, the flow of FIG. 5 may be executed per thread, with several threads running in parallel. In this case, the determination of "unprocessed" in step S31 should also take into account whether the imaging information is unprocessed in the threads other than the current one. Imaging information for which another thread has already started posture estimation, even if that estimation has not yet finished, is excluded from the candidates of the current thread, so that the same processing is not duplicated across threads.
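One way to keep threads from duplicating work is to mark a frame as claimed under a lock at the moment a thread picks it up. The sketch below illustrates that idea only; `FrameQueue`, `worker`, and `estimate_pose` are hypothetical names, the frames are plain strings, and each worker exits when nothing is left instead of polling, so that the example terminates.

```python
import threading

class FrameQueue:
    """Transferred frames plus the set of frames some thread has started."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frames = {}      # frame index n -> image
        self._claimed = set()  # indices already started by any thread

    def put(self, index, image):
        with self._lock:
            self._frames[index] = image

    def claim(self):
        """Step S31: return a transferred frame no thread has started, or None."""
        with self._lock:
            for index in sorted(self._frames):
                if index not in self._claimed:
                    self._claimed.add(index)  # started frames count as taken
                    return index, self._frames[index]
        return None

def worker(queue, results, estimate_pose):
    while True:
        item = queue.claim()
        if item is None:
            return  # sketch only: a long-running thread would keep polling
        index, image = item
        results[index] = estimate_pose(image)  # step S32

calls = []

def estimate_pose(image):
    """Stand-in for the real posture estimation (assumption)."""
    calls.append(image)
    return "pose-of-" + image

queue, results = FrameQueue(), {}
for n in range(1, 7):
    queue.put(n, "I%d" % n)
threads = [threading.Thread(target=worker, args=(queue, results, estimate_pose))
           for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Claiming at start time, rather than marking frames only on completion, is what prevents two threads from estimating the same frame while the first estimation is still running.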

In the example of FIG. 3, the case where the first estimation unit 3 runs three estimations in parallel is shown on lines L4, L5, and L6, where the time range of each individual posture estimation is drawn as a hatched box.

That is, on line L4, the first thread starts posture estimation on the captured image I₁, whose transfer completed between times t₁ and t₂ and which is still unprocessed, and the estimation completes between times t₃ and t₄, yielding posture information P₁. Also on line L4, the first thread starts posture estimation on the captured image I₄ between times t₄ and t₅ and obtains posture information P₄ between times t₆ and t₇. Similarly, on line L5, the second thread starts posture estimation on the captured image I₂ between times t₂ and t₃ and obtains posture information P₂ between times t₅ and t₆, then starts posture estimation on the captured image I₅ between times t₅ and t₆ and obtains posture information P₅ after time t₈. Likewise, on line L6, the third thread starts posture estimation on the captured image I₃ between times t₃ and t₄ and obtains posture information P₃ between times t₅ and t₆, then starts posture estimation on the captured image I₆ between times t₆ and t₇ and obtains posture information P₆ after time t₈.

As this shows, a certain amount of time is needed to estimate posture information once the captured image to be processed has been obtained. In the example of FIG. 3, each time imaging information is input, a thread that is waiting for work (a thread waiting for an affirmative determination in step S31) starts posture estimation, and each estimation is drawn as taking a time equivalent to two to three capture intervals of the imaging unit 1. Because the processing load varies with the content of the imaging information, the presence of the background processing mentioned above, and so on, it is not known at run time when a given estimation will complete.

The posture information P_n (n=1,2,…) obtained for the captured image I_n (n=1,2,…) when posture estimation completes is very small compared with the data volume of the captured image. Unlike a captured image, whose transfer takes at least a certain amount of time, the posture information is therefore output to the synthesis unit 5 at almost the same time it is obtained (that is, it is written to the RAM 103 that the synthesis unit 5 refers to) and becomes available to the synthesis unit 5.

In the example of FIG. 3, posture estimation is performed on all frames (all of the captured images I_n (n=1,2,…) on the time axis), but frames may be thinned out on the time axis before posture estimation. For example, by thinning to roughly 1/3, only the posture estimation shown on line L4 for a single thread may be performed, rather than all of lines L4, L5, and L6 for three threads.

For thinning, a first method may be used in which the candidates for posture estimation in step S31 are thinned in advance at a fixed rate on the time axis and only the remaining frames are considered. Alternatively, a second method may be used in which, when step S31 finds multiple frames that are unprocessed (as noted above, in the multithreaded case a frame whose processing another thread has started counts as processed) and transferred, only the frame closest to the current time is selected and the frames not selected are excluded from posture estimation from then on (in the multithreaded case, excluded in all threads). The first and second methods may also be combined. For the first method, the load on the CPU 101 and/or the dedicated processor 102 may be monitored at regular intervals and the thinning rate increased as the load grows, so that the load on the CPU 101 and/or the dedicated processor 102 is optimized dynamically.
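The two thinning methods and the load-dependent thinning rate can be sketched as small selection functions. This is an illustrative sketch only: the function names, the tuple layout `(timestamp, index)`, and the linear mapping from load to thinning rate are assumptions, not details given in the description.

```python
def thin_fixed(indices, keep_every):
    """First method: keep one frame out of every keep_every frames."""
    return [n for n in indices if (n - 1) % keep_every == 0]

def pick_newest(pending, now):
    """Second method: among pending (timestamp, index) frames, keep only the
    one closest to the current time; the rest are dropped for good."""
    chosen = min(pending, key=lambda frame: abs(now - frame[0]))
    dropped = [frame for frame in pending if frame is not chosen]
    return chosen, dropped

def keep_every_for_load(load, max_keep_every=4):
    """Assumed linear rule: a load of 0.0 keeps every frame and a load of 1.0
    keeps one frame in max_keep_every."""
    load = min(max(load, 0.0), 1.0)
    return 1 + round(load * (max_keep_every - 1))

# Thinning frames 1..8 to roughly one third keeps frames 1, 4 and 7.
kept = thin_fixed(range(1, 9), 3)
chosen, dropped = pick_newest([(0.1, 1), (0.2, 2), (0.3, 3)], now=0.35)
```

With `keep_every` set to 3, the kept frames include I₁ and I₄, the two frames handled by the single thread on line L4 in FIG. 3.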

The sensor unit 2 outputs each measured value, together with time information (a timestamp of the measurement time), as sensor information to the second estimation unit 4. As hardware for the sensor unit 2, a sensor 204 composed of one or more sensor devices can be used, such as an acceleration sensor that measures acceleration and translational motion and/or an inertial sensor (a gyro sensor that measures rotational angular velocity and rotational motion), which are often standard equipment on mobile terminals.

Specifically, the sensor unit 2 acquires sensor information that represents the posture of the camera 201. To this end, for example, the sensor 204 serving as the hardware of the sensor unit 2 may be fixed in the same housing as the camera 201, so that the relative positional relationship between the sensor 204 and the camera 201 is rigid and does not change.

FIG. 6 is a flowchart of the operation of the sensor unit 2 according to an embodiment. In step S21, it is determined whether the current time has reached the measurement timing. If it has, the process proceeds to step S22; if not, the process returns to step S21 and repeats the determination. In step S22, the sensor unit 2 performs a measurement at this measurement timing, outputs sensor information in which the obtained sensor value is associated with a timestamp to the second estimation unit 4, and returns to step S21. With the flow of FIG. 6, the sensor unit 2 repeats sensor measurement at each measurement timing determined in step S21.

Through this repetition, the sensor information is obtained as a time series. The change in the sensor information between any two times t_a and t_b is a measurement of the change in posture of the camera 201 (the hardware realizing the imaging unit 1) between those two times, and gives the same kind of information as the relative posture information obtained by the first estimation unit 3. That is, the time-series sensor information obtained by the sensor unit 2 is of the same kind as the information obtained by the first estimation unit 3, for example in the form of a planar projective transformation matrix given by the product of a translation component matrix and a rotation component matrix, which is one way of expressing relative posture information (posture information as a change from a reference posture).

The measurement timing in step S21 may be set at predetermined time intervals specified in advance. This interval is preferably finer (shorter) than the interval of the imaging timing of the imaging unit 1 in step S11. In FIG. 3, as shown on line L7, seven sensor measurements yielding the sensor values s₁, s₂, …, s₇ are performed between times t₁ and t₂, which delimit one imaging interval; the figure thus illustrates a case in which the measurement interval is roughly six times finer than (about 1/6 as long as) the imaging interval.

In the example of FIG. 3, the sensor value obtained in the m-th measurement (m = 1, 2, …) is written s_m. For data other than sensor values, such as the captured images I_n (n = 1, 2, …) shown on lines L2 and L3, the subscript corresponds to the n-th imaging time t_n. For the sensor value s_m, however, the subscript corresponds to the measurement count m of the sensor unit 2 and does not correspond directly to an imaging time t_n. (In the example of FIG. 3, the relation m = 6n−5 holds approximately, so that the (6n−5)-th sensor value s_{6n−5} is acquired at roughly the same time as the imaging time t_n.) Also, to avoid cluttering FIG. 3, labels are given only to the sensor values s_m referred to in the main description, not to all of them.
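The index relation for FIG. 3 can be written as a small helper. This is only a sketch: the function name and the `samples_per_frame` parameter are hypothetical, and the relation holds only approximately in the figure.

```python
def sensor_index_at_capture(n: int, samples_per_frame: int = 6) -> int:
    """Return the measurement count m whose sample s_m roughly coincides
    with the n-th imaging time t_n (m = 6n - 5 when samples_per_frame is 6).

    Illustrative helper only; the source gives the relation, not a function.
    """
    if n < 1:
        raise ValueError("imaging index n starts at 1")
    return samples_per_frame * (n - 1) + 1
```

For example, with the default of six samples per imaging interval, imaging time t₂ corresponds to s₇ and t₆ to s₃₁, matching the sample indices used in the FIG. 3 examples.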

The data volume of the sensor information s_m (m = 1, 2, …) obtained when a measurement completes is far smaller than that of a captured image. Therefore, unlike a captured image, which needs at least a certain time for its transfer to complete, the sensor information is output to the second estimation unit 4 (that is, written to the RAM 103 referenced by the second estimation unit 4) almost at the moment it is obtained, and becomes available to the second estimation unit 4.

Using the sensor information continuously input from the sensor unit 2 as a time series, the second estimation unit 4 estimates, for each presentation processing interval, the posture change that will occur during a presentation processing interval of the presentation unit 6 that lies in the future as seen from the current time, and outputs this posture change information to the synthesis unit 5. As described later, the presentation unit 6 can realize AR display and the like by continuously performing presentation processing at a predetermined presentation processing interval (frame rate).

In FIG. 3, as with the first estimation unit 3 shown on lines L4, L5, and L6, the individual estimation processes of the second estimation unit 4 are shown on line L8 as hatched frames indicating the time ranges over which they run. The example on line L8 assumes that the presentation processing intervals of the presentation unit 6 coincide with the imaging times t₁, t₂, …, t₈ of the imaging unit 1.

That is, when the current time t_now lies between imaging times t₂ and t₃ (t₂ < t_now < t₃) and sensor information up to the 11th item s₁₁ is available, estimation of the posture change information ΔP_{3,4} for the interval between times t₃ and t₄, a future presentation processing interval of the presentation unit 6, is started at t_now using the sensor information up to s₁₁, and is completed before the current time reaches the future time t₄ at the end of that interval. Similarly, when t_now lies between imaging times t₃ and t₄ (t₃ < t_now < t₄) and sensor information up to the 17th item s₁₇ is available, estimation of the posture change information ΔP_{4,5} for the future interval between t₄ and t₅ is started at t_now using the sensor information up to s₁₇, and is completed before the current time reaches the future time t₅ at the end of that interval.

Similarly, when the current time t_now lies between imaging times t₄ and t₅ (t₄ < t_now < t₅) and sensor information up to the 24th item s₂₄ is available, estimation of the posture change information ΔP_{5,6} for the future interval between t₅ and t₆ is started at t_now using the sensor information up to s₂₄, and is completed before the current time reaches the future time t₆ at the end of that interval. Similarly, when t_now lies between imaging times t₆ and t₇ (t₆ < t_now < t₇), just after time t₆, and sensor information up to the 31st item s₃₁ obtained at time t₆ is available, estimation of the posture change information ΔP_{6,7} for the interval between t₆ and t₇ (t₆ is already past, but t₇ is in the future) is started at t_now using the sensor information up to s₃₁, and is completed before the current time reaches the future time t₇ at the end of that interval. Similarly, when t_now lies between imaging times t₆ and t₇ (t₆ < t_now < t₇) and sensor information up to the 36th item s₃₆ is available, estimation of the posture change information ΔP_{7,8} for the future interval between t₇ and t₈ is started at t_now using the sensor information up to s₃₆, and is completed before the current time reaches the future time t₈ at the end of that interval.

In the second estimation unit 4, estimation of the posture change information starts with the sensor information available up to the current time t_now as input, and takes a certain amount of time to complete. (That is, denoting the completion time by t_done, t_now < t_done.) In the example of FIG. 3 above, the operation of the second estimation unit 4 shown on line L8 is drawn as running the estimation with the sensor information available at the start of processing and taking roughly 0.5 to 0.8 imaging intervals. Unlike processing that includes image processing, as in the first estimation unit 3, where the load can vary greatly with the image content, the processing load of the second estimation unit 4 varies little with the content of the sensor information, so the completion time is approximately known when processing starts.

Based on this consideration, it is desirable to start the processing in the second estimation unit 4 no later than the time obtained by going back from the end time t_end of the next presentation interval (from t_start to t_end) of the presentation unit 6 by the sum T_est2 + T_syn + T_pres, where T_est2 is the time required for the second estimation unit 4 to complete its processing, T_syn is the time required for the synthesis unit 5 to complete the synthesis, and T_pres is the time required for the presentation unit 6 to perform presentation processing (the time needed to prepare the AR display information and the like for one frame of presentation). That is, the current time t_now at which estimation starts, as described in the example of FIG. 3 above, is desirably set so as to satisfy Equation (1) below. (In Equation (1) and in the examples above, times are ordered in the usual way, with later times having larger values. The same applies in the following description.)
t_now ≤ t_end − (T_est2 + T_syn + T_pres) …(1)

For the three required times T_est2 (second estimation), T_syn (synthesis), and T_pres (presentation) in Equation (1), measured values may be collected experimentally in the real environment in which the information terminal device 10 is used, and some representative value of those measurements may be used. For example, the mean or the mode may be used, or the mean or mode plus a predetermined margin.
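The timing condition of Equation (1), together with the use of representative measured durations, can be sketched as below. The function names, the choice of the mean as the representative value, and the argument names are illustrative assumptions; the source leaves the representative value open (mean, mode, or either plus a margin).

```python
from statistics import mean
from typing import Sequence


def representative(samples: Sequence[float], margin: float = 0.0) -> float:
    """Representative duration: mean of measured values plus a margin.

    The mean is one of the options named in the text; the mode would be
    an equally valid choice.
    """
    return mean(samples) + margin


def latest_start_time(t_end: float, t_est2: float, t_syn: float, t_pres: float) -> float:
    """Latest estimation start time allowed by Equation (1):
    t_now <= t_end - (T_est2 + T_syn + T_pres)."""
    return t_end - (t_est2 + t_syn + t_pres)


def satisfies_eq1(t_now: float, t_end: float, t_est2: float, t_syn: float, t_pres: float) -> bool:
    """True if starting estimation at t_now meets Equation (1)."""
    return t_now <= latest_start_time(t_end, t_est2, t_syn, t_pres)
```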

For the posture change estimation itself, the second estimation unit 4 may use any existing method for predicting the future behavior of time-series data of any kind, not only posture data; for example, a Kalman filter or deep learning may be used.

FIG. 7 is a flowchart of the operation of the second estimation unit 4 according to one embodiment. In step S41, it is determined whether the current time has reached the estimation timing; if so, the process proceeds to step S42, and if not, the process returns to step S41 and repeats step S41 until an affirmative determination is obtained. In step S42, using the time-series history of sensor information available up to the current time t_now at which the affirmative determination was obtained in step S41, the posture change information for the next presentation interval (from t_start to t_end) of the presentation unit 6 is estimated and output to the synthesis unit 5, and the process returns to step S41.

The determination in step S41 should at least satisfy Equation (1) above. In detail, the current time at which step S41 yields an affirmative determination, and the posture change information to be estimated in the subsequent step S42, can be set based on the following considerations.

The second estimation unit 4 estimates the posture change information as a prediction of the change over the next presentation interval (from t_start to t_end) of the presentation unit 6, which lies in the future relative to the current time t_now (of the start time t_start and the end time t_end, at least the end time is in the future relative to t_now, so that t_now < t_end). In general, the further ahead a prediction reaches, the lower its accuracy tends to be. To secure the estimation accuracy of the posture change information, it is therefore preferable to choose the nearest possible future as the prediction target.

Accordingly, let the k-th (k = 1, 2, …) processing interval of the presentation unit 6 run from t_start[k] to t_end[k] (the interval defined by the k-th start time t_start[k] and the k-th end time t_end[k], so that t_start[k] = t_end[k−1]). It is desirable to set the current time t_now[k], which is the start timing of the k-th posture change estimation in step S41, in synchronization with the k-th presentation processing interval (synchronization in the sense of matching the timing intervals), so that when the current time t_now lies within the k-th interval (t_start[k] < t_now < t_end[k]), the posture change information is estimated for the nearest future interval, namely the (k+1)-th interval from t_start[k+1] to t_end[k+1]. That is, a predetermined value c shorter than the length t_end[k] − t_start[k] of the k-th presentation processing interval is chosen,
0 < c < t_end[k] − t_start[k] …(2)
and the current time t_now[k], which is the start timing of the k-th posture change estimation in step S41 (the start timing of the estimation for the next, (k+1)-th, presentation processing interval), may be set as in Equation (3):
t_now[k] = t_end[k] − c …(3)

Suppose that, with a constant c in the range of Equation (2), Equation (3) is set so that estimation starts for the (k+1)-th presentation processing interval, but Equation (1) (with t_end = t_end[k+1] and t_now = t_now[k]) does not hold. That is, suppose that starting the estimation for the (k+1)-th interval while the current time is within the k-th interval would not allow the processing to complete by the end time t_end[k+1]. In that case, the target of the estimation may be moved to the following, (k+2)-th, presentation processing interval from t_start[k+2] to t_end[k+2]. If Equation (1) (with t_end = t_end[k+2] and t_now = t_now[k]) still does not hold, the target is moved further ahead to the (k+3)-th, (k+4)-th, … presentation processing intervals, and the nearest future presentation processing interval for which Equation (1) holds is used as the target of the posture change estimation.

Conversely, if the estimation start timing t_now[k] is set by some constant c satisfying Equation (2) while the current time is within the k-th presentation processing interval from t_start[k] to t_end[k], and the estimation for the k-th interval itself completes by its end time t_end[k] (that is, if Equation (1) holds even with t_end = t_end[k] and t_now = t_now[k]), the target may be set accordingly. In other words, the estimation start timing t_now[k] may be set while the current time is within the k-th presentation processing interval, and the estimation target may be the posture change information for that same k-th interval, to which t_now[k] belongs.
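The target selection described above, which picks the nearest presentation interval for which Equation (1) holds, can be sketched as follows. The function name, the 0-based list `interval_ends` holding the end times t_end[j], and the `allow_current` flag are hypothetical; `total_latency` stands for the sum T_est2 + T_syn + T_pres.

```python
from typing import Optional, Sequence


def choose_target_interval(
    t_now: float,
    interval_ends: Sequence[float],
    k: int,
    total_latency: float,
    allow_current: bool = True,
) -> Optional[int]:
    """Return the index of the nearest interval whose end time satisfies
    Equation (1), i.e. t_now <= t_end[j] - total_latency.

    k is the index of the interval containing t_now. With allow_current
    set, the current interval k itself is tried first (the "conversely"
    case); otherwise the search starts at k + 1. Returns None if no
    listed interval qualifies. All names are illustrative assumptions.
    """
    first = k if allow_current else k + 1
    for j in range(first, len(interval_ends)):
        if t_now <= interval_ends[j] - total_latency:
            return j
    return None
```

Searching forward from the nearest candidate reflects the stated preference for predicting the nearest possible future, since prediction accuracy tends to fall as the horizon grows.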

The synthesis unit 5 combines the posture information obtained from the first estimation unit 3 with the posture change information obtained from the second estimation unit 4 to synthesize posture information for a time in the future as seen from the current time, and outputs it to the presentation unit 6 as combined posture information. As described later, when this future combined posture information is later used by the presentation unit 6, it is used as corresponding to the then-current time, which is the presentation timing of the presentation unit 6.

In one embodiment, the presentation timing of the presentation unit 6 is set to coincide with the imaging times of the imaging unit 1. The synthesis unit 5 synthesizes the combined posture information Pe_i by Equation (4) below, using the latest posture information P_a available at the start of the synthesis (the current time t_syn) and the one or more items of posture change information ΔP_{k,k+1} (k = a, a+1, …, i−2, i−1) covering the span from the time t_a associated with that posture information to the next presentation timing t_i.
Pe_i = ΔP_{i−1,i} ΔP_{i−2,i−1} ⋯ ΔP_{a+1,a+2} ΔP_{a,a+1} P_a …(4)
As described above, the posture information P_a is estimated by the first estimation unit 3 from the captured image I_a at time t_a, which is in the past as seen from t_syn, and is available to the synthesis unit 5 at t_syn. The next presentation timing t_i is in the future as seen from t_syn.

As already described, both the posture information P_a and the posture change information ΔP_{k,k+1} can be obtained in the form of planar projective transformation matrices. Equation (4) takes the product of these matrices, so the combined posture information Pe_i is likewise obtained as a planar projective transformation matrix.

Even the latest available posture information P_a is not adopted by the synthesis unit 5 if the corresponding posture change information ΔP_{k,k+1} (k = a, a+1, …, i−2, i−1) does not exist. That is, among the available items of posture information for which the corresponding posture change information ΔP_{k,k+1} (k = a, a+1, …, i−2, i−1) exists, the synthesis unit 5 selects the latest one as P_a and performs the synthesis by Equation (4).
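The selection rule and the matrix product can be sketched as follows, assuming that the product runs as in the worked instances (4-7) and (4-8), that is, Pe_i = ΔP_{i−1,i} ⋯ ΔP_{a,a+1} P_a with the newest change leftmost. The dictionary layout and the names `compose_pose` and `matmul3` are hypothetical, and 3×3 matrices are stored as nested lists to keep the sketch dependency-free.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def compose_pose(
    poses: Dict[int, Matrix],
    deltas: Dict[Tuple[int, int], Matrix],
    i: int,
) -> Optional[Tuple[int, Matrix]]:
    """Return (a, Pe_i) for presentation index i, or None.

    poses maps an imaging index a to P_a; deltas maps (k, k+1) to
    ΔP_{k,k+1}. The latest P_a with a < i whose full chain of changes
    up to i exists is selected, then each change is left-multiplied.
    """
    for a in sorted(poses, reverse=True):
        if a >= i:
            continue  # t_i is in the future; only past poses qualify
        if all((k, k + 1) in deltas for k in range(a, i)):
            pe = poses[a]
            for k in range(a, i):
                pe = matmul3(deltas[(k, k + 1)], pe)
            return a, pe
    return None
```

With `poses` holding P₃ and `deltas` holding ΔP_{3,4} through ΔP_{6,7}, calling `compose_pose(poses, deltas, 7)` evaluates the same product as Equation (4-7).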

In the example of FIG. 3, the synthesis by the synthesis unit 5 according to Equation (4) is shown on line L9. As Equation (4) shows, the synthesis is a multiplication of planar projective transformation matrices (of size 3×3) and can be computed quickly (almost instantly). Therefore, unlike the processing examples of the first estimation unit 3 on lines L4, L5, and L6 and of the second estimation unit 4 on line L8, line L9 does not show processing time ranges as hatched frames; it shows only the combined posture information Pe_i (i = 4, 5, 6, 7, 8), which is obtained almost instantly.

Specifically, the combined posture information Pe_i (i = 7, 8) on line L9 is obtained as follows. For the future presentation time t₇, at the current time t_syn (t₆ < t_syn < t₇), the posture information P₃ at time t₃, which is available and has corresponding posture change information, is combined by Equation (4) with the posture change information ΔP_{3,4}, ΔP_{4,5}, ΔP_{5,6}, ΔP_{6,7} covering the span from time t₃ to the future presentation time t₇. This yields the combined posture information Pe₇ at the future presentation time t₇, as in Equation (4-7):
Pe₇ = ΔP_{6,7} ΔP_{5,6} ΔP_{4,5} ΔP_{3,4} P₃ …(4-7)

Similarly, for the future presentation time t₈, at the current time t_syn (t₇ < t_syn < t₈), the posture information P₄ at time t₄, which is available and has corresponding posture change information, is combined by Equation (4) with the posture change information ΔP_{4,5}, ΔP_{5,6}, ΔP_{6,7}, ΔP_{7,8} covering the span from time t₄ to the future presentation time t₈. This yields the combined posture information Pe₈ at the future presentation time t₈, as in Equation (4-8):
Pe₈ = ΔP_{7,8} ΔP_{6,7} ΔP_{5,6} ΔP_{4,5} P₄ …(4-8)

The other combined posture information Pe_i (i = 4, 5, 6) on line L9 can likewise be synthesized by Equation (4), by selecting, at an appropriate synthesis start timing t_syn and in the same way as in the examples (4-7) and (4-8) above, the posture change information ΔP_{1,2}, ΔP_{2,3}, and so on (not shown on line L8) and one of the posture information items P₁, P₂, P₃.

Using the imaging information obtained from the imaging unit 1 and the combined posture information obtained from the synthesis unit 5, the presentation unit 6 generates presentation information according to the combined posture information and presents it superimposed on the imaging information. The presentation information may be generated with existing AR technology and may have any content suited to the application that the information terminal device 10 implements. For example, the presentation information may be a signboard bearing an explanatory comment on an object captured in the imaging information, generated by CG (computer graphics) or the like so that it occupies a spatial position determined by the spatial position of the object and the posture of the imaging camera 201 (that is, the posture expressed in the combined posture information).

The embodiment above concerns the case in which the presentation unit 6 is realized with an ordinary display 202 rather than the see-through display 203. In an embodiment in which the presentation unit 6 is realized with the see-through display 203, the presentation unit 6 may display on the see-through display 203 only the presentation information generated according to the combined posture information and the object existing in real space (the object imaged by the camera 201), without using the imaging information (which corresponds to the scene the user sees through the see-through display 203).

FIG. 8 is a flowchart of the operation of the synthesis unit 5 and the presentation unit 6 according to one embodiment. In FIG. 8, steps S51, S52, and S53 relate to the operation of the synthesis unit 5, and steps S54 and S55 relate to the operation of the presentation unit 6.

In step S51, it is determined whether the current time has reached the timing of the synthesis by the synthesis unit 5; if so, the process proceeds to step S52, and if not, the process returns to step S51 and repeats the determination. The timing at which step S51 yields an affirmative determination is the synthesis start time t_syn of the synthesis unit 5 described above. In step S52, the posture information available at the current time t_syn at which the affirmative determination was obtained, and the corresponding posture change information, are acquired, and the process proceeds to step S53. In step S53, using the information acquired in step S52, the synthesis unit 5 obtains the combined posture information by Equation (4) above and outputs it to the presentation unit 6, and the process proceeds to step S54.

In step S54, it is determined whether the current time (which is shortly after the time t_syn at which the affirmative determination was obtained in step S51) has reached the presentation timing of the presentation unit 6; if so, the process proceeds to step S55, and if not, the process returns to step S54 and repeats the determination. In step S55, presentation information is generated using the combined posture information corresponding to the current time thus reached. The presentation unit 6 then presents only this presentation information in the case of the see-through display 203, or this presentation information superimposed on the imaging information in the case of the ordinary display 202, and the process returns to step S51.

Note that the generation of the presentation information by the presentation unit 6, and its further superimposition on the captured image, may instead be performed immediately after the synthesis unit 5 obtains the synthesized posture information in step S53, so that step S55 simply presents the presentation information that has already been generated and superimposed.
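The flow of steps S51 to S55 can be read as a two-state loop: wait for the synthesis timing, synthesize, wait for the presentation timing, present. The following is a minimal sketch under that reading only. The function `run_loop`, its parameters, and the use of integer ticks as a clock are all assumptions made for illustration and do not appear in the patent.

```python
def run_loop(ticks, synth_times, present_times, synthesize, present):
    """Simulate steps S51-S55 over a sequence of clock ticks (hypothetical helper).

    ticks         -- iterable of increasing time values (assumed integer clock)
    synth_times   -- set of synthesis start times (affirmative result in S51)
    present_times -- set of presentation timings (affirmative result in S54)
    synthesize    -- callback standing in for S52-S53, returns a pose
    present       -- callback standing in for S55
    """
    events = []
    state = "wait_synth"  # S51: waiting for the synthesis timing
    pose = None
    for now in ticks:
        if state == "wait_synth" and now in synth_times:
            pose = synthesize(now)            # S52-S53: gather inputs, synthesize
            events.append(("synthesized", now))
            state = "wait_present"            # S54: waiting for the presentation timing
        elif state == "wait_present" and now in present_times:
            present(now, pose)                # S55: present, then back to S51
            events.append(("presented", now))
            state = "wait_synth"
    return events
```

A real device would block on a timer rather than iterate over ticks; the sketch only makes the ordering of the two waits explicit.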

As already mentioned, the presentation timing of the presentation unit 6, i.e. the timing at which an affirmative determination is obtained in step S54, can be set at a predetermined rate and may, for example, coincide with the imaging time (or imaging timing) of the imaging unit 1. The k-th presentation processing interval t_{start}[k], t_{end}[k] mentioned in the description of the second estimation unit 4 is defined by this presentation timing. That is, the start time t_{start}[k] and the end time t_{end}[k] are both presentation timings, and they are adjacent presentation timings.

The synthesis timing t_{synthesis start}, i.e. the time at which an affirmative determination is obtained in step S51, may be synchronized with the presentation timing of step S54 (synchronized in the sense that the timing intervals are made equal) and set as a predetermined time immediately before that presentation timing. That is, letting t_{synthesis start}[k] denote the k-th synthesis timing, it may be set in synchronization with the presentation timing as in equations (5A) to (5C) below.
t_{start}[k] < t_{synthesis start}[k] < t_{end}[k] …(5A)
t_{synthesis start}[k] = t_{end}[k] - b (b is a constant in the range given by equation (5C)) …(5B)
0 < b < t_{end}[k] - t_{start}[k] …(5C)
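As a small illustration of equations (5A) to (5C), the sketch below derives the synthesis start times from a list of presentation timings and a lead time b, rejecting any b outside the range of (5C). The function name and the choice of integer time units are assumptions for illustration.

```python
def synthesis_start_times(presentation_times, b):
    """Return t_synthesis_start[k] = t_end[k] - b for each adjacent pair of
    presentation timings (t_start[k], t_end[k]), per equations (5A)-(5C).

    presentation_times -- increasing list of presentation timings
    b                  -- lead time before each presentation (same units)
    """
    starts = []
    for t_start, t_end in zip(presentation_times, presentation_times[1:]):
        # (5C): b must lie strictly inside the presentation interval
        if not 0 < b < t_end - t_start:
            raise ValueError("b must satisfy 0 < b < t_end - t_start")
        starts.append(t_end - b)  # (5B); (5A) then holds automatically
    return starts
```

For instance, with presentation timings 0, 33 and 66 and b = 5, the synthesis would start at 28 and 61.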

In the example of FIG. 3, the synthesized posture information Pe_i (i=4,5,…,8) on line L9 is shown not at the synthesis timing t_{synthesis start}[k] above but at the time at which the synthesis process completed. (The synthesis process is practically instantaneous compared with image transfer but, as explained for equation (1), it takes a small and variable amount of time.)

In the example of FIG. 3, an example of processing by the presentation unit 6 is shown on line L10. The presentation timing of the presentation unit 6 is set to coincide with the imaging times t_i (i=4,5,…,8). Immediately before each presentation timing t_i, the presentation information A_i is generated using the corresponding synthesized posture information Pe_i, and at the presentation timing t_i the presentation unit 6 presents the presentation information A_i.

As described above, according to the present invention as exemplified by this embodiment, the posture of the camera 201 is obtained with high accuracy as posture information from a past captured image that is as close as possible to the current time; the change in this past posture over the short time up to the current time is obtained as posture change information by applying future prediction to the output of the sensor 204; and the posture of the camera 201 at the current time is obtained as synthesized posture information. The posture of the camera 201 at the current time can therefore be obtained with high accuracy and without delay. If the presentation unit 6 uses this synthesized posture information for AR display, a highly accurate AR display without delay is achieved even when the see-through display 203 is used (i.e. when the scene visible to the user is always the present one).

Also when the presentation unit 6 is implemented with the normal display 202 rather than the see-through type, a highly accurate AR display without delay at the current time can be achieved in almost the same way as with the see-through display 203. This is done by using, as the captured image on which the presentation information is superimposed, one that can be regarded as captured at the current time (for example, one obtained from the preview display while the camera 201 is capturing video). In other words, a past captured image is used to obtain the posture information, while the current captured image is used as the target on which the presentation information is superimposed.

That is, in the example of FIG. 3, the presentation information A_i (i=4,5,…,8) at the presentation timing t_i on line L10 may be superimposed on the captured image I_i captured at the same time, the imaging time t_i, and the result presented as video by the presentation unit 6.

Additional description of other embodiments of the present invention follows.

(1) The following embodiment can reduce the load of the process by which the synthesis unit 5 synthesizes the synthesized posture information when a certain condition is satisfied. As already described, the synthesis unit 5 repeatedly synthesizes synthesized posture information. If the posture information P_a used in equation (4) does not change between the k-th synthesis process and the following (k+1)-th synthesis process, the (k+1)-th synthesized posture information Pe_{k+1} may be computed simply, as in equation (6) below, by multiplying the already obtained k-th synthesized posture information Pe_k by the single piece of posture change information ΔP_{k,k+1} corresponding to the change to Pe_{k+1}, without computing every product in equation (4).
Pe_{k+1} = ΔP_{k,k+1} Pe_k …(6)

If the posture information P_a used in equation (4) at the k-th time still has not changed at the (k+2)-th and later times, equation (6) may likewise be applied repeatedly.
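The saving offered by equation (6) can be checked numerically: multiplying the previous result by one more posture change matrix gives the same pose as redoing the whole product from P_a. The sketch below assumes 3×3 matrices held as nested lists and the left-multiplication order seen in the example Pe₄=ΔP_{3,4}ΔP_{2,3}ΔP_{1,2}P₁; the helper names are hypothetical.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def full_product(p_a, deltas):
    """Equation (4) style: apply the posture changes oldest first, so the
    newest change ends up as the leftmost factor of the product."""
    pose = p_a
    for delta in deltas:
        pose = matmul3(delta, pose)
    return pose


def incremental_update(pe_k, delta_next):
    """Equation (6): Pe_{k+1} = dP_{k,k+1} * Pe_k, valid while P_a is unchanged."""
    return matmul3(delta_next, pe_k)
```

The incremental form costs one matrix product per cycle regardless of how far back P_a lies, whereas the full product grows with the number of intervening posture changes.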

Here, the posture information P_a used in equation (4) is judged not to change between the k-th synthesis process and the following (k+1)-th (and later) synthesis processes in the first and second cases below. For this explanation, the posture information used in the k-th and (k+1)-th synthesis processes is written P_{a[k]} and P_{a[k+1]}, as obtained from the captured images I_{a[k]} and I_{a[k+1]} at the a[k]-th and a[k+1]-th imaging times t_{a[k]} and t_{a[k+1]} respectively (including the case where these times are the same). The first case is a[k]=a[k+1], i.e. the posture information used at the k-th and (k+1)-th times was obtained from the captured image at the same imaging time t_{a[k]}=t_{a[k+1]}. By this definition, the first case can be determined automatically. It arises either when the captured images later than this imaging time have been discarded, or when a captured image later than this imaging time has been obtained but estimation of its posture information has not completed by the start of the (k+1)-th synthesis process.

The second case is a[k]≠a[k+1]: the posture information P_{a[k]} and P_{a[k+1]} has been obtained from the captured images I_{a[k]} and I_{a[k+1]} at different imaging times t_{a[k]}≠t_{a[k+1]}, but the two are judged to be nearly equal, i.e. "P_{a[k]}≈P_{a[k+1]}". This may be judged, for example, by the norm of the difference of the two matrices, |P_{a[k]}-P_{a[k+1]}| (for example, the norm given by the sum of the absolute values of the matrix elements; the same applies below), being at or below a threshold.
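The two cases can be combined into one predicate: either the same source image index is used, or the two pose matrices differ by less than a threshold in the element-wise absolute-sum norm. This is a minimal sketch; the function names and the threshold value passed by the caller are assumptions, since the text does not fix a threshold.

```python
def abs_sum_norm_diff(p, q):
    """Sum of absolute element-wise differences between two matrices
    (the norm the text gives as an example)."""
    return sum(abs(x - y) for row_p, row_q in zip(p, q)
               for x, y in zip(row_p, row_q))


def pose_unchanged(a_k, a_k1, p_k, p_k1, threshold):
    """Return True when equation (6) may be used for the next synthesis.

    First case:  a[k] == a[k+1] (same captured image was the source).
    Second case: different images, but |P_a[k] - P_a[k+1]| <= threshold.
    """
    if a_k == a_k1:
        return True
    return abs_sum_norm_diff(p_k, p_k1) <= threshold
```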

As a concrete example of applying equation (6) to the examples of equations (4-7) and (4-8) above: suppose the k-th time (k=7) is computed by equation (4-7), and at the (k+1)-th time (k+1=8), unlike the example of FIG. 3, the posture information P₄ is not available, or P₄ is available but is judged "P₃≈P₄". Then, instead of the many multiplications of equation (4-8), the synthesized posture information Pe₈ can be computed simply by equation (7) below, multiplying the immediately preceding synthesized posture information Pe₇ by the posture change information ΔP_{7,8}.
Pe₈ = ΔP_{7,8} Pe₇ …(7)

(2) As already described, the imaging unit 1 repeatedly captures images at a predetermined imaging rate and obtains captured images I_n (n=1,2,…) on the time axis. The first estimation unit 3 may estimate posture information for all of these captured images I_n, or only for those remaining after thinning at a fixed ratio by the first method already described.

The following embodiment dynamically determines the thinning ratio in the first method. It assumes that the presentation timing of the presentation unit 6, with which the synthesis unit 5 synchronizes its synthesis of synthesized posture information, coincides with the imaging time (or imaging timing) of the imaging unit 1. As additional processing, at the presentation timing, at the timing when the estimation process of step S31 completes, or at any other predetermined timing, the first estimation unit 3 compares, for the same past presentation timing t_{past}, the synthesized posture information Pe_{past} obtained by the synthesis unit 5 with the posture information P_{past} that the first estimation unit 3 has already estimated from the captured image I_{past} captured by the imaging unit 1 at that past time t_{past}. Among the pairs that can be referenced, the one closest to the current time is used. That is, the norm of the difference of the two matrices, |Pe_{past}-P_{past}|, is evaluated.

If this comparison finds the norm of the difference at or below a predetermined threshold, so that the two are judged approximately equal ("Pe_{past}≈P_{past}"), this means that the posture change information used by the synthesis unit 5 was accurate. High accuracy of the posture change information obtained from the output of the sensor 204 is in turn presumed to result from the actual posture change of the camera 201 being small, i.e. from little movement. It can therefore be expected that the accuracy of the synthesized posture information obtained by the synthesis unit 5 is maintained even if the rate at which the first estimation unit 3 performs its accurate but computationally expensive image-based posture estimation is lowered.

Based on the above, when "Pe_{past}≈P_{past}" is judged, the first estimation unit 3 may dynamically increase the thinning ratio of the estimation process. More generally, the difference between Pe_{past} and P_{past} may be evaluated as the difference norm |Pe_{past}-P_{past}|, and the thinning ratio set larger the smaller the difference norm. The past time t_{past} to be evaluated may be only the most recent available one, or two or more may be used and evaluated by a linear sum of their difference norms.

In the example of FIG. 3, the synthesized posture information Pe₄ is synthesized for time t₄ as the product "Pe₄=ΔP_{3,4}ΔP_{2,3}ΔP_{1,2}P₁" by equation (4) (as already mentioned, ΔP_{2,3} and ΔP_{1,2} are not shown on line L8 but can be estimated over the appropriate time ranges). The first estimation unit 3 later completes its estimation of the posture information P₄ for time t₄ between times t₆ and t₇. A dynamic thinning ratio can then be set by, for example, comparing Pe₄ with P₄ at time t₇ immediately afterwards, reducing the thinning ratio if the degree of agreement is low and maintaining or increasing it if the degree of agreement is high.
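One possible way to turn the difference norm into a thinning decision is sketched below. It expresses the thinning ratio as "estimate every `step`-th image" and moves `step` up or down by one depending on a weighted sum of recent difference norms. The two thresholds, the unit step change, and all names are assumptions chosen for illustration; the text only requires that a smaller norm lead to a larger thinning ratio.

```python
def next_thinning_step(diff_norms, weights, low, high, step, max_step):
    """Update the image-thinning step from recent |Pe_past - P_past| norms.

    diff_norms -- difference norms for one or more past times
    weights    -- coefficients of the linear sum over those norms
    low, high  -- hypothetical thresholds on the weighted score
    step       -- current setting: the pose is estimated for every step-th image
    max_step   -- upper bound on step
    """
    score = sum(w * n for w, n in zip(weights, diff_norms))
    if score <= low:
        # Good agreement: sensor prediction suffices, thin more aggressively.
        return min(step + 1, max_step)
    if score > high:
        # Poor agreement: estimate from images more often.
        return max(step - 1, 1)
    return step  # In between: keep the current ratio.
```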

Note that this embodiment assumes that the object used by the first estimation unit 3 to estimate the posture of the camera 201, such as a square marker, is stationary.

(3) At the time t_{synthesis start} at which the synthesis unit 5 starts obtaining the synthesized posture information by equation (4), some of the one or more pieces of posture change information ΔP_{k,k+1} (k=a,a+1,…,i-2,i-1) up to the next presentation timing t_i may already have a value actually measured by the sensor unit 2, rather than only the future predicted value from the estimation process of the second estimation unit 4. In that case, equation (4) may be applied with those predicted values replaced by the measured values.

That is, with a<r<i-1, suppose that for k=a,a+1,…,r-1,r the sensor measured values ΔP_{k,k+1}[measured] corresponding to the predicted values ΔP_{k,k+1} have been obtained, and that for k=r+1,r+2,…,i-2,i-1 only the predicted values ΔP_{k,k+1} have been obtained. The synthesized posture information may then be obtained by equation (8) instead of equation (4).
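Equation (8) itself is not reproduced in this text. A reconstruction that is consistent with the index ranges just described and with the worked example of equation (9), and that assumes the same right-to-left ordering as equation (4), would read:

```latex
Pe_i \;=\;
\underbrace{\Delta P_{i-1,i}\,\Delta P_{i-2,i-1}\cdots\Delta P_{r+1,r+2}}_{\text{predicted}}\;
\underbrace{\Delta P^{[\mathrm{measured}]}_{r,r+1}\,\Delta P^{[\mathrm{measured}]}_{r-1,r}\cdots\Delta P^{[\mathrm{measured}]}_{a,a+1}}_{\text{measured}}\;
P_a \qquad (8)
```

Setting a=3, r=5 and i=7 reduces this to the four-factor product of equation (9).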

In this embodiment, the second estimation unit 4 performs the following processing in addition to the estimation process of step S42: when, among the predicted values ΔP_{k,k+1} obtained in the past, there are any for which a measured value ΔP_{k,k+1}[measured] based on the output of the sensor unit 2 (its output between times t_k and t_{k+1}) can now be referenced, it outputs this measured value to the synthesis unit 5.

For example, in the example of equation (4-7) above, suppose that for the three oldest of the four estimated pieces of posture change information ΔP_{3,4}, ΔP_{4,5}, ΔP_{5,6}, ΔP_{6,7} at the start of this synthesis process, the measured posture change information ΔP_{3,4}[measured], ΔP_{4,5}[measured], ΔP_{5,6}[measured] has been obtained and output. The synthesized posture information may then be obtained by equation (9) below instead of equation (4-7).
Pe₇ = ΔP_{6,7} ΔP_{5,6}[measured] ΔP_{4,5}[measured] ΔP_{3,4}[measured] P₃ …(9)
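The substitution in equation (9) amounts to preferring a measured posture change over a predicted one wherever a measurement exists for the same interval. The sketch below assumes 3×3 matrices as nested lists and dictionaries keyed by the interval index k for ΔP_{k,k+1}; these data structures and names are illustrative assumptions.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def synthesize_with_measured(p_a, predicted, measured):
    """Compose the pose from P_a and the posture changes dP_{k,k+1}.

    predicted -- dict {k: matrix} of predicted changes for every interval
    measured  -- dict {k: matrix} of measured changes, where available
    A measured value replaces the predicted one for the same k.
    """
    pose = p_a
    for k in sorted(predicted):           # oldest interval first
        delta = measured.get(k, predicted[k])
        pose = matmul3(delta, pose)       # newer changes multiply from the left
    return pose
```

With predicted entries for k=3,…,6 and measured entries for k=3,4,5, the loop evaluates exactly the product on the right-hand side of equation (9).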

(4) The description so far has taken as an example the case where the posture information obtained by the first estimation unit 3 (posture information as a change from a reference posture) and the posture change information obtained by the second estimation unit 4 are obtained in the form of a plane projective transformation matrix (size 3×3), which performs in two-dimensional image coordinates (homogeneous coordinates) the transformation corresponding to a posture change in three-dimensional space. Other forms may also be used. For example, the posture information and posture change information may express the posture change in three-dimensional space directly, in the form of the external parameters M (size 4×4) of the camera 201 given by equation (10), composed of rotation components r_{ij} (1≦i,j≦3) and translation components t_X, t_Y, t_Z. When the external parameters M are used, the synthesis by the synthesis unit 5 can likewise be performed almost instantaneously in the form of a product, as in equation (4) and the other example equations above.
When the presentation unit 6 generates the presentation information, a perspective projection matrix (camera matrix) may be obtained from the external parameters M and predetermined internal parameters of the camera 201 by mathematical relationships known in the CG field, and the presentation information may be generated by projecting a three-dimensional CG model defined in model space onto the image coordinates of the imaging unit 1 using this matrix. Alternatively, a plane projective transformation matrix may be obtained from the external parameters M and the internal parameters, using known epipolar-geometry relationships, as the transformation between the two image coordinates of a point on a common plane in space imaged from two different camera positions, and the presentation information may be generated from this matrix by the method already described.
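Equation (10) is not reproduced in this text. Under the conventional homogeneous layout of camera external parameters, which is assumed here, the matrix built from the rotation and translation components named above would be:

```latex
M =
\begin{pmatrix}
r_{11} & r_{12} & r_{13} & t_X \\
r_{21} & r_{22} & r_{23} & t_Y \\
r_{31} & r_{32} & r_{33} & t_Z \\
0      & 0      & 0      & 1
\end{pmatrix}
\qquad (10)
```

With this layout, and writing K for the 3×3 internal parameter matrix (a symbol introduced here for illustration), the standard pinhole relationship gives the 3×4 perspective projection matrix as

```latex
P_{\mathrm{cam}} = K \,\bigl[\, I_{3} \;\big|\; \mathbf{0} \,\bigr]\, M
```

so that a homogeneous model point X is projected to image coordinates as x ∼ P_cam X.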

10…information terminal device, 1…imaging unit, 2…sensor unit, 3…first estimation unit, 4…second estimation unit, 5…synthesis unit, 6…presentation unit

Claims

An information terminal device comprising:
an imaging unit that performs imaging to obtain a captured image;
a first estimation unit that estimates, from the captured image, posture information of the imaging unit at a past time;
a sensor unit that continuously acquires sensor values reflecting the posture of the imaging unit;
a second estimation unit that estimates, from the continuously acquired sensor values, posture change information toward a future time; and
a synthesis unit that obtains synthesized posture information by synthesizing the posture information and the posture change information.

The information terminal device according to claim 1, further comprising a presentation unit that performs augmented reality display at each display time using the synthesized posture information,
wherein the second estimation unit estimates the posture change information toward a future time that is the display time.

The information terminal device according to claim 2, wherein the synthesis unit synthesizes in advance the posture information and the posture change information estimated by the second estimation unit toward the future time, thereby obtaining synthesized posture information for the future time, and
wherein, when the current time reaches the future time, the presentation unit performs the augmented reality display using the synthesized posture information synthesized in advance.

The information terminal device according to claim 2 or 3, wherein the second estimation unit starts the process of estimating the posture change information toward the future time that is the display time at a time that precedes the future time by at least a fixed interval.

The information terminal device according to claim 4, wherein the fixed interval is the sum of the time required for the second estimation unit to estimate the posture change information, the time required for the synthesis unit to obtain the synthesized posture information, and the time required for the presentation unit to perform the augmented reality display.

The information terminal device according to claim 1, wherein the synthesis unit repeatedly obtains synthesized posture information at each time by synthesizing the posture information and the posture change information that can be referenced at that time, and
wherein, when the posture information that can be referenced at the current time is judged not to have changed from the posture information that could be referenced at the immediately preceding time, the synthesized posture information for the current time is obtained by synthesizing the synthesized posture information obtained at the immediately preceding time with the posture change information from the immediately preceding time to the current time.

The information terminal device according to any one of claims 1 to 6, wherein the imaging unit obtains a captured image at each time,
the synthesis unit obtains synthesized posture information for each such time as a future time,
the first estimation unit estimates the posture information for all or some of the captured images obtained at each time,
the first estimation unit repeatedly compares the posture information it has already estimated for a past time with the synthesized posture information obtained by the synthesis unit for that past time, and evaluates the degree of agreement between the two, and
the higher the degree of agreement, the larger the thinning ratio with which the first estimation unit thins the captured images obtained at each time, only the images remaining after thinning being subject to estimation of the posture information.

The information terminal device according to claim 1, wherein the second estimation unit, at each time, estimates the posture change information toward a future time relative to that time and, when actually measured posture change information based on the sensor values continuously acquired by the sensor unit exists for posture change information already estimated at a past time, acquires that measured posture change information, and
wherein the synthesis unit, at each time, obtains the synthesized posture information by synthesizing the posture information and the posture change information that can be referenced at that time and, when corresponding measured posture change information acquired by the second estimation unit exists for posture change information that can be referenced at that time, uses the measured posture change information in place of the posture change information estimated by the second estimation unit.

The information terminal device according to any one of claims 1 to 8, wherein the sensor unit includes an acceleration sensor and/or an angular velocity sensor.

A program causing a computer to function as the information terminal device according to any one of claims 1 to 9.