JP2017055397A

JP2017055397A - Image processing apparatus, image composing device, image processing system, image processing method and program

Info

Publication number: JP2017055397A
Application number: JP2016167289A
Authority: JP
Inventors: Masaaki Kobayashi (小林 正明)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-09-08
Filing date: 2016-08-29
Publication date: 2017-03-16
Anticipated expiration: 2036-08-29
Also published as: JP6768416B2

Abstract

PROBLEM TO BE SOLVED: To enable an MR (Mixed Reality) apparatus to perform highly accurate position and orientation estimation while allowing high-definition video viewing, by combining an imaging system that degrades little under motion with an imaging system that degrades more under motion but can capture high-definition video.
SOLUTION: An image processing apparatus outputs a first video using first imaging means, which has relatively little image degradation when a subject moves, and outputs a second video using second imaging means, which has relatively large image degradation when a subject moves. The image processing apparatus analyzes the first video to generate position and orientation information of the image processing apparatus. It then draws a CG object on the second video, superimposed at a position determined based on the position and orientation information.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing apparatus, an image composing apparatus, an image processing system, an image processing method, and a program.

In recent years, a technique called Visual SLAM (Simultaneous Localization and Mapping), which estimates a three-dimensional position and orientation from the video of a moving camera, has been put into practical use. One application is MR (Mixed Reality) / AR (Augmented Reality), in which a virtual three-dimensional computer graphics object is drawn on the video based on the position and orientation of the camera. Techniques for estimating the camera's position and orientation from video include those that use markers and those that do not. Both identify a marker or a natural subject across frames and track its movement (hereinafter, tracking) to estimate the position and orientation of the camera in three-dimensional space. One marker-based method is described in Non-Patent Document 1. One markerless method is described in Non-Patent Document 2 (commonly known as PTAM). In MR, an environment map, which indicates the three-dimensional positions of markers and subjects, is generated from the estimated camera position and orientation. The environment map is used to determine the position and orientation of a CG object, which is then superimposed on the input video. This yields a video in which the CG object appears to exist in real space. Whether the CG can be superimposed at the correct position depends on the tracking accuracy, which in turn depends heavily on the characteristics of each frame image of the video.

The characteristics of a frame image depend on the sensor and its driving conditions. For example, a rolling shutter sensor, a type commonly used in CMOS sensors, produces distortion called rolling shutter distortion when the scene contains a moving subject or when the camera pans. This distortion lowers the accuracy with which markers and subjects are identified across frames, and consequently lowers tracking accuracy and position and orientation estimation accuracy. A global shutter sensor, typified by a CCD, does not produce rolling shutter distortion. However, a global shutter sensor generally requires a high driving voltage, and increasing its resolution or frame rate is considered difficult. Even with a rolling shutter sensor, raising the sensor's driving speed can greatly reduce rolling shutter distortion.

Non-Patent Document 1: Hirokazu Kato and Mark Billinghurst, "Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System", Proceedings, 2nd IEEE and ACM International Workshop on Augmented Reality '99
Non-Patent Document 2: Georg Klein and David Murray, "Parallel Tracking and Mapping on a Camera Phone", In Proc. International Symposium on Mixed and Augmented Reality (ISMAR'09, Orlando)

In an MR apparatus, video from a rolling shutter sensor suffers rolling shutter distortion, which lowers position and orientation estimation accuracy. With a global shutter sensor, it is difficult to handle images of sufficient resolution and high frame rate at low cost.

To achieve the object of the present invention, an image processing apparatus of the present invention has, for example, the following configuration. That is, the apparatus comprises:
first imaging means for outputting a first video, with relatively little image degradation caused by movement of a subject;
second imaging means for outputting a second video, with relatively large image degradation caused by movement of a subject;
estimating means for analyzing the first video to generate position and orientation information of the image processing apparatus; and
drawing means for drawing a CG object on the second video so that the CG object is superimposed at a position determined based on the position and orientation information.

High-accuracy position and orientation estimation and the generation of high-definition viewing video are both achieved.

FIG. 1 is a diagram illustrating the configuration of the MR apparatus in Embodiment 1.
FIG. 2 is a diagram illustrating the configuration of the MR apparatus in Embodiment 2.
FIG. 3 is a diagram illustrating the configuration of the MR apparatus in Embodiment 3.
FIG. 4 is a diagram illustrating a method of detecting a motion amount in Embodiment 3.
FIG. 5 is a diagram illustrating geometric transformation of a frame image.
FIG. 6 is a diagram illustrating a modification of the method of detecting a motion amount in Embodiment 3.
FIG. 7 is a diagram illustrating the configuration of the MR apparatus in Embodiment 4.
FIG. 8 is a diagram illustrating the configuration of the MR system in Embodiment 5.
FIG. 9 is a diagram illustrating the configuration of the MR apparatus in Embodiment 6.
FIG. 10 is a diagram illustrating the MR processing method in Embodiment 6.
FIG. 11 is a diagram illustrating a modification of the MR processing method in Embodiment 6.
FIG. 12 is a diagram illustrating the configuration of the MR system in Embodiment 7.
FIG. 13 is a diagram illustrating the MR processing method in Embodiment 7.

[Embodiment 1]
An MR apparatus 100, which is an image processing apparatus according to Embodiment 1 of the present invention, will be described. In this embodiment, processing is executed on an MR apparatus that includes an imaging unit and a display unit. The configuration of the MR apparatus and the operation of each module are described below with reference to FIG. 1. FIG. 1 is a diagram illustrating the configuration of the MR apparatus (for example, a head-mounted display) according to Embodiment 1.

The first sensor 101 captures a first video. In this embodiment, the first sensor 101 is a global shutter sensor capable of capturing 960 pixels vertically by 540 pixels horizontally at 60 fps. A lens unit is attached to the sensor, which captures a video made up of consecutive images and produces a sensor signal. The resolution and frame rate are not limited to these values.

The first ISP (image signal processor) 102 converts the sensor signal obtained by the first sensor 101 into video. In this embodiment, the first ISP 102 is a module that outputs image data or an encoded stream obtained by encoding image data. Specifically, the first ISP 102 combines several image processing functions, such as generating an RGB image from the sensor signal, scaling images up or down, and encoding video. The RAM required for this processing is assumed to be built into the first ISP 102, but an external RAM may additionally be connected to the first ISP 102.

The position and orientation estimation unit 103 estimates the position and orientation of the MR apparatus 100 using the video obtained by the first ISP 102. For example, it can estimate the position and orientation of the second sensor 104 mounted on the MR apparatus 100. In this embodiment, the position and orientation estimation unit 103 is a CPU with built-in ROM (read-only memory) and RAM (random access memory). The CPU performs position and orientation estimation by running a position and orientation estimation program stored in the ROM, using the RAM as a work area. A program following the method described in Non-Patent Document 1 can be used as the position and orientation estimation program. The position and orientation estimation unit 103 may instead be dedicated hardware. The estimation processing is also not limited to the method of Non-Patent Document 1; for example, it can follow a Visual SLAM method such as that described in Non-Patent Document 2.

The second sensor 104 captures a second video. In this embodiment, the second sensor 104 is a rolling shutter sensor capable of capturing 1920 pixels vertically by 1080 pixels horizontally at 60 fps. A lens unit is attached to the sensor, which captures a video made up of consecutive images and produces a sensor signal. The resolution and frame rate are not limited to these values.

The second ISP 105 converts the sensor signal obtained by the second sensor 104 into video. The second ISP 105 can have the same functions as the first ISP 102.

The CG drawing unit 106 generates a composite image using the image output from the second ISP 105. In this embodiment, the CG drawing unit 106 holds three-dimensional CG (computer graphics) object information and generates the composite image by drawing a virtual CG object over the image. For example, the CG drawing unit 106 can render a virtual CG object according to the CG object information and superimpose the result on the image. The virtual CG object to be superimposed, and the position at which it is superimposed, are controlled according to the position and orientation of the MR apparatus 100 estimated by the position and orientation estimation unit 103. For example, the CG drawing unit 106 can place a viewpoint at a position corresponding to the estimated position and orientation of the MR apparatus 100, render the virtual CG object from that viewpoint, and superimpose the resulting virtual image on the image output from the second ISP 105 to generate the composite image.

The display unit 107 displays the composite image generated by the CG drawing unit 106. A user of the MR apparatus 100 views the video through the display unit 107.

In this embodiment, the sensors 101 and 104 are assumed to be fixed close enough to each other to capture images with the same angle of view. A half mirror may also be inserted in the optical path so that the sensors 101 and 104 capture images with the same angle of view. An identical angle of view is not essential. When the angles of view differ, the position and orientation estimation unit 103 may, for example, correct for the difference, or the ISPs 102 and 105 may apply geometric correction so that the angles of view of their output images match before outputting them.

The overall operation of the MR apparatus 100 is described below. In FIG. 1, the arrows indicate the flow of the main data. Control signals and the like can be exchanged bidirectionally between the units, but their description is omitted.

The signal from the sensor 101 is output to the ISP 102. The ISP 102 generates an RGB image and outputs it as a signal to the position and orientation estimation unit 103. In this specification, a sensor paired with an ISP is called an imaging system. The imaging system comprising the sensor 101 and the ISP 102 is called the analysis video imaging system, and the video it generates is called the analysis video.

The position and orientation estimation unit 103 analyzes the analysis video and generates position and orientation information. In this embodiment, the position and orientation estimation unit 103 generates position and orientation information indicating the position and orientation of the MR apparatus 100 in three-dimensional space, generates an environment map based on the position and orientation information, and outputs the position and orientation information and the environment map to the CG drawing unit 106.

The signal from the sensor 104 is output to the ISP 105. The ISP 105 generates an RGB image and outputs it as a signal to the CG drawing unit 106. In this specification, the imaging system comprising the sensor 104 and the ISP 105 is called the viewing video imaging system, and the video it generates is called the viewing video.

Based on the position and orientation information, the CG drawing unit 106 draws a virtual three-dimensional computer graphics object on the viewing video as an image projected onto the screen of the viewing video. In this embodiment, the CG drawing unit 106 draws a virtual CG object on the viewing video and outputs the resulting composite video as a signal to the display unit 107. To do so, the CG drawing unit 106 obtains the position and orientation information and the environment map information from the position and orientation estimation unit 103, and draws the virtual CG object on the viewing video as if the CG object existed at a fixed position in the virtual three-dimensional space. The display unit 107 displays the video obtained from the CG drawing unit 106. These operations are performed while images are continuously captured and processed.
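The projection of a CG object onto the viewing video can be illustrated with a minimal sketch. It assumes a simple pinhole camera model and a world-to-camera pose (rotation matrix and translation vector); the function name `project_point`, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, and all numeric values are hypothetical and do not come from the specification, which does not state which projection model the CG drawing unit 106 uses.

```python
def project_point(point_world, rotation, translation, fx, fy, cx, cy):
    """Project a 3-D world point into pixel coordinates (pinhole model).

    rotation is a 3x3 nested list and translation a 3-element list; together
    they form a hypothetical world-to-camera pose such as an estimator might
    output. Returns (u, v), or None when the point is behind the camera.
    """
    # World -> camera coordinates: p_cam = R * p_world + t
    x, y, z = [
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    if z <= 0:
        return None  # not visible, so nothing is drawn
    # Perspective division followed by the intrinsic mapping to pixels
    return (fx * x / z + cx, fy * y / z + cy)


# Illustrative values only: identity pose, point 2 m straight ahead.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point([0.0, 0.0, 2.0], identity, [0.0, 0.0, 0.0],
                      1000.0, 1000.0, 960.0, 540.0)
```

In this sketch a point on the optical axis lands on the assumed principal point, and a point behind the camera is skipped. A real renderer would project every vertex of the CG object and handle occlusion using the environment map.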

In this embodiment, the MR apparatus 100 has the analysis video imaging system (101, 102), the viewing video imaging system (104, 105), and the display unit 107. However, the MR apparatus 100 may have two sets of these units, in which case it can be used as an HMD (head-mounted display) that captures and displays left-eye and right-eye video. Although FIG. 1 shows the units as directly connected, the configuration is not limited to this; the units may be connected via a bus, in which case they exchange signals over the bus.

A conventional MR apparatus has only one imaging system. The position and orientation of the MR apparatus are estimated from the video obtained by that imaging system, and the virtual CG object is drawn on video from the same imaging system. Consequently, when a rolling shutter sensor is used in the imaging system, rolling shutter distortion occurs and makes it difficult to estimate the position and orientation of the MR apparatus. On the other hand, when a global shutter sensor such as a CCD is used, power consumption increases markedly as the resolution or frame rate is raised.

The MR apparatus 100 according to this embodiment has two imaging systems: the analysis video imaging system and the viewing video imaging system. The analysis video imaging system, which includes the sensor 101, suffers relatively little image degradation for moving objects, and the viewing video imaging system, which includes the sensor 104, suffers relatively more. In other words, the analysis video imaging system degrades less than the viewing video imaging system when capturing a moving object. In one embodiment, on the other hand, the viewing video imaging system has better image quality than the analysis video imaging system; for example, it has a higher resolution. For example, a lower-resolution global shutter sensor is used in the analysis video imaging system and a higher-resolution rolling shutter sensor is used in the viewing video imaging system. This makes it possible to achieve both position and orientation estimation performance and viewing image quality.

A rolling shutter sensor that can be driven at high speed may, however, be used as the sensor 101 of the analysis video imaging system in place of a global shutter sensor. For example, when frame images are acquired with a sensor driven at a vertical scanning time of 4 ms (corresponding to 240 fps), the amount of rolling shutter distortion is one quarter of that at a vertical scanning time of 16 ms (corresponding to 60 fps). Because this yields analysis video with less rolling shutter distortion than a rolling shutter sensor driven at low speed, position and orientation estimation performance improves. The sensor 101 may have fewer pixels than the sensor 104, so adopting a sensor that can be driven at high speed is straightforward.
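The one-quarter figure follows from the distortion scaling linearly with the vertical scanning time. The sketch below assumes a uniform horizontal pan, for which the skew between the first and last rows equals the image-plane speed multiplied by the scanning time; the function name and the pan speed of 1000 pixels per second are hypothetical, chosen only to make the arithmetic concrete.

```python
def rolling_shutter_skew_px(pan_speed_px_per_s, vertical_scan_time_s):
    """Horizontal offset (pixels) between the first and last sensor rows.

    Assumes a uniform horizontal pan: the last row is exposed
    vertical_scan_time_s later than the first, so the scene has shifted by
    speed * time in between.
    """
    return pan_speed_px_per_s * vertical_scan_time_s


PAN_SPEED = 1000.0  # hypothetical image-plane speed in pixels per second

skew_slow = rolling_shutter_skew_px(PAN_SPEED, 0.016)  # 16 ms scan
skew_fast = rolling_shutter_skew_px(PAN_SPEED, 0.004)  # 4 ms scan
ratio = skew_fast / skew_slow
```

Under this assumption the ratio depends only on the two scanning times, so it stays at one quarter whatever pan speed is chosen.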

The sensor 101 and the sensor 104 may also both be global shutter sensors. In general, capturing at a fast shutter speed produces sharp images with little motion blur (blur caused by movement). Motion blur makes video analysis difficult, so video with little motion blur is well suited to position and orientation estimation. However, video captured at a fast shutter speed exhibits a discontinuity of motion called jerkiness, which looks unnatural to a viewer. The sensor 101 can therefore be driven at a fast shutter speed (for example, an exposure time of 4 ms) and the sensor 104 at a slow shutter speed (for example, an exposure time of 16 ms). This configuration also achieves both position and orientation estimation performance and viewing image quality. The sensor 101 may have fewer pixels than the sensor 104, so driving it at a fast shutter speed is straightforward.

The analysis video and the viewing video may have different frame rates. In one embodiment, the sensor 101 uses high-speed driving and a fast shutter to improve position and orientation estimation performance. In this case, the frame rate of the analysis video is higher than that of the viewing video.

In one embodiment, the sensor 101 has a relatively high frame rate and the sensor 104 has a relatively low frame rate. In one embodiment, the sensor 101 has a relatively fast shutter speed and the sensor 104 has a relatively slow shutter speed. In one embodiment, the resolution of the analysis video is lower than that of the viewing video. In one embodiment, the frame rate of the analysis video is higher than that of the viewing video.

As described above, the image processing apparatus according to this embodiment includes a first imaging unit that outputs a first video with relatively little image degradation caused by movement of a subject, and a second imaging unit that outputs a second video with relatively large image degradation caused by movement of a subject. The position and orientation estimation unit 103 analyzes the first video to generate position and orientation information of the image processing apparatus, and the CG drawing unit 106 draws a CG object on the second video so that it is superimposed at a position determined based on the position and orientation information. Here, when the movement of the image processing apparatus is larger than a predetermined threshold, the position and orientation information is generated by analyzing the first video, and when the movement is smaller than the predetermined threshold, it is generated by analyzing the second video.

[Embodiment 2]
An MR apparatus 200, which is an image processing apparatus according to Embodiment 2 of the present invention, will be described with reference to FIG. 2. Compared with the MR apparatus 100, the MR apparatus 200 adds an inertial sensor 201 and a selector 202. Unless otherwise noted, the operation of the MR apparatus 200 is the same as that of the MR apparatus 100 (for example, a head-mounted display) according to Embodiment 1.

The inertial sensor 201 outputs velocity information for the MR apparatus 200. A gyro sensor, an acceleration sensor, or the like can be used as the inertial sensor 201. These are sensors that detect acceleration, and velocity can be computed from the acceleration. In this embodiment, the inertial sensor 201 includes a gyro sensor, an acceleration sensor, and a computation unit, and can output velocity information for the MR apparatus 200. The inertial sensor 201 outputs velocities in the horizontal and vertical directions relative to the sensor plane of the apparatus, as well as angular velocity. Well-known methods can be used to compute velocity information from acceleration information, so a detailed description is omitted here. The velocity computation need not be performed inside the inertial sensor 201; an external arithmetic unit may perform it.
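The specification leaves the velocity computation to well-known methods. One such method, shown here purely as an illustrative sketch and not as the method the inertial sensor 201 actually uses, is trapezoidal integration of uniformly sampled acceleration. The function name, the sampling interval, and the sample values are hypothetical, and practical concerns such as gravity compensation and drift are ignored.

```python
def integrate_velocity(accel_samples, dt, v0=0.0):
    """Integrate acceleration samples (m/s^2) into velocities (m/s).

    Uses the trapezoidal rule over samples spaced dt seconds apart, starting
    from the initial velocity v0. Returns one velocity per sample.
    """
    velocities = [v0]
    for a_prev, a_next in zip(accel_samples, accel_samples[1:]):
        # Average acceleration over the interval times its duration
        velocities.append(velocities[-1] + 0.5 * (a_prev + a_next) * dt)
    return velocities


# Illustrative input: constant 1 m/s^2 sampled every 0.5 s.
velocities = integrate_velocity([1.0, 1.0, 1.0], dt=0.5)
```

With constant acceleration the result grows linearly, which is a quick sanity check on the integration step.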

The selector 202 outputs one of its two inputs. For example, the selector 202 has a programmable arithmetic unit and can select and output one input according to a computation result. The operation of the selector 202 is described later.

The operation of the MR apparatus 200 in Embodiment 2 is described next. In this embodiment, the ISP 102 enlarges the internally generated RGB image by a factor of two vertically and horizontally before outputting it. The inertial sensor 201 is fixed to the MR apparatus 200 itself, detects velocity information for the MR apparatus 200, and outputs it to the selector 202. The selector 202 computes the motion amount of the MR apparatus 200 from the velocity information and, when the motion amount is determined to be equal to or greater than a threshold, outputs the video signal from the ISP 102 to the position and orientation estimation unit 103. When the motion amount is determined to be smaller than the threshold, the selector 202 outputs the video signal from the ISP 105 to the position and orientation estimation unit 103.
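The selection rule can be sketched as a per-frame comparison. The names `select_source` and `route_frames`, the string labels, and the threshold and motion values below are hypothetical placeholders; the sketch only illustrates the "equal to or greater than the threshold" rule stated above.

```python
ANALYSIS = "ISP102"  # analysis video (enlarged 2x by the ISP 102)
VIEWING = "ISP105"   # viewing video


def select_source(motion_amount, threshold):
    """Choose which ISP output feeds the estimator for one frame."""
    # Large motion: use the analysis video, which degrades less under motion.
    return ANALYSIS if motion_amount >= threshold else VIEWING


def route_frames(motion_amounts, threshold):
    """Apply the selection independently to each frame's motion amount."""
    return [select_source(m, threshold) for m in motion_amounts]


# Illustrative motion amounts and threshold (placeholder values).
routes = route_frames([0.5, 3.9, 4.0, 7.2], threshold=4.0)
```

A motion amount exactly at the threshold selects the analysis video, matching the inclusive comparison in the description.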

The threshold can be set so that the video signal from the ISP 102 is output to the position and orientation estimation unit 103 when the MR apparatus 200 moves so much that estimating its position and orientation from the viewing video is difficult, and the video signal from the ISP 105 is output otherwise. The motion amount varies with sensor size, resolution, lens focal length, subject distance, and so on. In this embodiment, the motion amount of the MR apparatus 200 is expressed as the movement, over one frame interval (for example, 16 ms), of a subject located at the center of the image, 30 cm from the camera. The motion amount can therefore be expressed in pixels, and the input video from the ISP 102 is used when the motion amount is, for example, 4 pixels or more. The threshold is not limited to 4 pixels. Because these operations are performed while images are continuously captured and processed, switching by the selector 202 is also performed frame by frame.
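How a velocity reading might be converted into a motion amount in pixels can be sketched under a pinhole-camera assumption. The specification does not give this formula; the model below (lateral translation scaled by focal length over subject distance, plus a rotation term, summed as a conservative bound), the function name, and the focal length of 1000 pixels are all assumptions made for illustration. Only the 30 cm subject distance, the 16 ms frame interval, and the 4-pixel example threshold come from the description.

```python
import math


def motion_amount_px(speed_m_per_s, angular_speed_rad_per_s, focal_length_px,
                     subject_distance_m=0.30, frame_interval_s=0.016):
    """Approximate per-frame image motion (pixels) at the image center.

    Assumed pinhole model: a lateral camera translation d shifts a subject at
    distance Z by f * d / Z pixels, and a rotation by angle a shifts the image
    center by f * tan(a) pixels. The two magnitudes are added as an upper
    bound on the combined motion.
    """
    translation_px = (focal_length_px * speed_m_per_s * frame_interval_s
                      / subject_distance_m)
    rotation_px = focal_length_px * math.tan(
        angular_speed_rad_per_s * frame_interval_s)
    return abs(translation_px) + abs(rotation_px)


FOCAL_LENGTH_PX = 1000.0  # hypothetical focal length in pixels

fast = motion_amount_px(0.10, 0.0, FOCAL_LENGTH_PX)  # 10 cm/s lateral motion
slow = motion_amount_px(0.05, 0.0, FOCAL_LENGTH_PX)  # 5 cm/s lateral motion
use_analysis_video = fast >= 4.0  # 4 pixels is the example threshold
```

With these assumed values, 10 cm/s of lateral motion exceeds the 4-pixel example threshold while 5 cm/s does not, which shows why the appropriate threshold depends on focal length, resolution, and subject distance.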

In the present embodiment, adding the inertial sensor 201 and the selector 202 makes it possible, when the MR apparatus 200 is stationary or moving only slightly, to estimate the position and orientation more accurately by using the higher-resolution viewing video. When the MR apparatus 200 is moving, accurate position and orientation estimation is possible using the analysis video, which has little rolling shutter distortion, as described in the first embodiment. In one embodiment, the output destination of a frame image can be switched according to the motion amount of the MR apparatus 200 at the time the frame image is captured. However, the output destination may instead be switched according to the motion amount of the MR apparatus 200 before or after the time the frame image is captured, and the effect described above is obtained in this case as well.

As described above, the image processing apparatus according to the present embodiment includes the inertial sensor 201, which detects the motion of the image processing apparatus.

[Embodiment 3]
An MR apparatus 300, which is an image processing apparatus according to the third embodiment of the present invention, will be described with reference to FIGS. 3 to 6. FIG. 3 is a diagram illustrating the configuration of the MR apparatus 300 according to the third embodiment. Compared with the MR apparatus 100, the MR apparatus 300 additionally includes a motion detection unit 301 and a selector 302. Unless otherwise specified, the operation of the MR apparatus 300 is the same as that of the MR apparatus 100 according to the first embodiment.

The motion detection unit 301 detects motion between images captured by the sensor 101 and outputs the result to the selector 302. In the present embodiment, the motion detection unit 301 includes a ROM, a RAM, and a CPU, and the CPU executes a program stored in the ROM that performs the operations described below, using the RAM as a work area. The motion detection unit 301 does not have to include a CPU; dedicated hardware with equivalent functions can also be used as the motion detection unit 301.

The selector 302 has the same function as the selector 202 according to the second embodiment, and its operation will be described later.

The overall operation of the MR apparatus 300 (for example, a head mounted display) will now be described. The motion detection unit 301 acquires images from the ISP 102, detects the motion amount between the images, and outputs it to the selector 302. The method of calculating the motion amount will be described later. When the motion amount is determined to be equal to or greater than the threshold, the selector 302 outputs the video signal from the ISP 102 to the position/orientation estimation unit 103. When the motion amount is determined to be smaller than the threshold, the selector 302 outputs the video signal from the ISP 105 to the position/orientation estimation unit 103. The threshold can be set in the same manner as in the second embodiment; here, the input video from the ISP 102 is adopted when the motion amount is 4 pixels or more.

A method by which the motion detection unit 301 detects the motion amount between frames will now be described with reference to FIG. 4. In step S4010, the motion detection unit 301 detects motion vectors of objects in the image across a plurality of frame images. For example, a plurality of motion vectors for a plurality of objects can be detected using two consecutive frame images output from the ISP 102. In step S4020, the motion detection unit 301 calculates the length of each of the detected motion vectors. In step S4030, the motion detection unit 301 calculates the average of the lengths of the detected motion vectors. The average value thus obtained is used as the motion amount.
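Steps S4020 and S4030 can be sketched as below. The sketch assumes that the motion vectors of step S4010 have already been detected and are given as pairs of start and end points in pixel coordinates; the function name and the choice to return 0.0 when no vector is available are assumptions made for illustration.

```python
import math


def mean_motion_amount(vectors):
    """Average length of motion vectors given as ((x0, y0), (x1, y1)) pairs."""
    if not vectors:
        return 0.0  # assumption: no detected vectors is treated as no motion
    # S4020: length of each motion vector.
    lengths = [math.hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in vectors]
    # S4030: average of the lengths, used as the motion amount.
    return sum(lengths) / len(lengths)
```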

The method of detecting the motion amount is not limited to the method shown in FIG. 4. For example, the motion amount can also be detected using the method shown in FIGS. 5 and 6, described next. FIG. 5 is a diagram explaining a geometric transformation of a frame image. The unshaded rectangle represents the current frame, and the shaded region represents the immediately preceding frame. FIG. 5 shows that, in addition to horizontal and vertical translation, a tilting motion of the MR apparatus 300 has occurred between the immediately preceding frame and the current frame. Such motion of a plane in three-dimensional space can be expressed by a homography matrix. A method of calculating the homography matrix and then the motion amount is described below with reference to FIG. 6.

In step S6010, the motion detection unit 301 detects a plurality of motion vectors in the same manner as in step S4010. In step S6020, the motion detection unit 301 calculates a homography matrix from the detected motion vectors. Robust estimation such as RANSAC or M-estimation can be used to calculate the homography matrix, but the calculation method is not limited. In step S6030, the motion detection unit 301 uses the homography matrix to project the start points and end points of the motion vectors of the points at the four corners of the screen. In step S6040, the motion detection unit 301 calculates the length of each of the four motion vectors obtained by the projection from its start point and end point. In step S6050, the motion detection unit 301 selects the longest of the motion vectors obtained by the projection. In the present embodiment, the length of the motion vector thus selected is used as the motion amount. Although the motion vectors of the four corner points of the screen are projected in this description, motion vectors of other points may be projected, and the number of projected points may be larger or smaller.
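Steps S6030 to S6050 can be illustrated with the sketch below. It assumes that a 3x3 homography matrix has already been estimated in step S6020 and is given as nested lists, and it takes one reading of step S6030 in which each screen corner is the start point and its image under the homography is the end point. The function names are hypothetical.

```python
import math


def project(h, point):
    """Apply a 3x3 homography (nested lists) to a 2D point."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def max_corner_motion(h, width, height):
    """Longest displacement of the four screen corners under the homography."""
    corners = [(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)]
    lengths = []
    for start in corners:
        end = project(h, start)                      # S6030: projected end point
        lengths.append(math.hypot(end[0] - start[0],
                                  end[1] - start[1]))  # S6040: vector length
    return max(lengths)                              # S6050: longest vector


# A pure translation by (3, 4) pixels moves every corner by 5 pixels.
translation = [[1.0, 0.0, 3.0], [0.0, 1.0, 4.0], [0.0, 0.0, 1.0]]
```

Taking the maximum over the corners, rather than an average, makes the measure sensitive to tilt, where one side of the image moves much more than the other.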

In the present embodiment, the inertial sensor 201 used in the second embodiment can be omitted by using instead the motion detection unit 301, which detects motion electronically. Although the motion detection unit 301 has a dedicated CPU in the present embodiment, a single CPU can also carry out the processing of a plurality of units, such as the motion detection unit 301 and the selector 302, for example by time-division processing. In that case, a CPU with sufficient computing capacity to carry out the processing of the plurality of units is used.

As described above, the image processing apparatus according to the present embodiment includes the motion detection unit 301, which analyzes the first video to detect the motion of the image processing apparatus.

[Embodiment 4]
An MR apparatus 700 (for example, a head mounted display), which is an image processing apparatus according to the fourth embodiment of the present invention, will be described with reference to FIG. 7. In the present embodiment, the position/orientation estimation unit 103 has a function of calculating motion information. Unless otherwise specified, the operation of the MR apparatus 700 is the same as that of the MR apparatus 200 according to the second embodiment.

The overall operation of the MR apparatus 700 will now be described. The position and orientation information calculated by the position/orientation estimation unit 103 can be used to calculate the relative position of the MR apparatus 700 in three-dimensional space. In the present embodiment, the position/orientation estimation unit 103 calculates the change in this relative position between frames, that is, the motion amount of the MR apparatus 700. The motion amount can be expressed in the same manner as in the second embodiment.

The position/orientation estimation unit 103 feeds this motion amount back to the selector 302. The selector 302 outputs to the position/orientation estimation unit 103 the input video signal from the ISP 102 when the motion amount is determined to be so large that video analysis from the viewing video is difficult, and the input video signal from the ISP 105 when the motion amount is determined to be small. This is the same as the operation of the selector 302 in the third embodiment.

Since these operations are performed while images are continuously captured and processed, switching by the selector is also performed in units of frame images. Note, however, that switching by the selector takes effect from the frame following the frame in which the motion occurred. The same effect as in the second embodiment can be obtained without an additional module such as an inertial sensor.
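The one-frame delay introduced by this feedback can be illustrated with the sketch below. It assumes that a motion amount in pixels is available for every frame and that no motion is known before the first frame; the function name and the source labels are hypothetical.

```python
def select_sources_with_feedback(motion_per_frame, threshold_px=4.0):
    """Return the video source chosen for each frame under one-frame-delayed feedback."""
    sources = []
    previous_motion = 0.0  # assumption: nothing has been measured before the first frame
    for motion in motion_per_frame:
        # The choice for this frame uses the motion fed back from the previous frame.
        sources.append("ISP102" if previous_motion >= threshold_px else "ISP105")
        previous_motion = motion
    return sources


# Motion starts in the second frame, so the switch appears in the third frame.
example = select_sources_with_feedback([0.0, 6.0, 6.0, 1.0, 0.0])
```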

As described above, the position/orientation estimation unit 103 analyzes the first video to generate the position and orientation information and to detect the motion of the image processing apparatus, and, when the detected motion is larger than a predetermined threshold, switches the video analyzed in generating the position and orientation information to the second video.

[Embodiment 5]

A fifth embodiment, relating to an MR system, will be described with reference to FIG. 8. In the present embodiment, the system consists of an HMD and a host computer, which are connected via a network. Unless otherwise specified, the operation is the same as that of the second embodiment described with reference to FIG. 2.

FIG. 8 is a diagram illustrating the configuration of the MR apparatus in the fifth embodiment. Reference numeral 801 denotes a selector that controls whether a signal is output. Reference numeral 802 denotes an NW (network system). The HMD side and the host computer side each have a packetizer circuit, a baseband engine, an RF unit, and an antenna, so that various data can be transmitted freely between the HMD and the host computer. Reference numerals 803, 805, and 811 denote video decoding units. Reference numeral 804 denotes a selector that selects a signal. Reference numeral 807 denotes a CPU, 808 a RAM, and 809 a nonvolatile storage. The CPU 807 loads a program from the nonvolatile storage 809 into the RAM 808 and executes it. Reference numeral 810 denotes a bus. The modules are connected by the bus and, unless otherwise specified, exchange data over it.

In the present embodiment, unlike the first embodiment, the ISP 102 and the ISP 105 encode the video before outputting it. The encoding scheme handled by 102, 105, 803, 805, 806, and 811 is described here as H.264, but is not limited to this. The ISP 102 outputs an encoded stream to the selector 801. When the input motion amount is determined to be so large that video analysis from the viewing video is difficult, the selector 801 outputs the input video signal to the video decoding unit 803 via the NW 802; when the motion amount is determined to be small, it stops the output. While output of the video signal is stopped, the selector 801 continues to output a stop signal to the ISP 102. In an encoding format that uses inter frames, such as H.264, subsequent inter frames can be decoded only after an intra frame has been decoded. Therefore, while receiving the stop signal, the ISP 102 encodes the video using intra frames only, and it starts inter-frame encoding once the stop signal ends. This achieves highly compressed encoding using inter frames while ensuring that an intra frame is transmitted when transmission of the encoded stream resumes, so that decoding can resume promptly.
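The interaction between the stop signal and the choice of frame type can be sketched as follows. This is a simplified model rather than an H.264 encoder: frames are labeled only "I" (intra) or "P" (inter), and the sketch assumes that the first frame after the stop signal ends is still intra coded so that the first transmitted frame is decodable on its own. The function name and this boundary handling are assumptions.

```python
def plan_encoding(stop_signal_per_frame):
    """Return (frame_type, transmitted) for each frame given the stop signal state."""
    plan = []
    previous_stopped = True  # assumption: the start of the stream behaves like a resume
    for stopped in stop_signal_per_frame:
        # Intra-only while stopped, and for the first frame after the stop ends.
        frame_type = "I" if (stopped or previous_stopped) else "P"
        # The selector 801 forwards the stream only while it is not stopped.
        plan.append((frame_type, not stopped))
        previous_stopped = stopped
    return plan


# Two stopped frames, then transmission resumes with an intra frame.
example_plan = plan_encoding([True, True, False, False])
```

Under this model every transmitted run begins with an intra frame, which is the property the description relies on for prompt resumption of decoding.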

The video decoding unit 803 decodes the encoded stream to generate a video signal and outputs it to the selector 804. The ISP 105 outputs its encoded stream to the video decoding unit 805 via the NW 802. The video decoding unit 805 decodes the encoded stream to generate a video signal. The selector 804 outputs to the CPU 807 the video signal input from 803 when the selector 801 has transmitted data, and the video signal input from 805 when it has not. This output may be performed via the RAM. The CPU 807 performs position and orientation estimation and outputs the position and orientation information and the environment map to the CG drawing unit 106. The CG drawing unit 106 performs the operation described in the first embodiment and outputs the generated video to the video encoding unit 806. The video encoding unit 806 encodes the video and outputs the resulting encoded stream to the video decoding unit 811 via the NW 802. The video decoding unit 811 decodes the encoded stream and outputs the generated video signal to the display unit 107. Since these operations are performed while images are continuously captured and processed, switching by the selectors is also performed in units of frame images.

Thus, according to the present embodiment, both the analysis video and the viewing video are sent from the HMD to the host computer, and the host computer generates the composite video using both, so the same effect as in the first embodiment is obtained. In addition, in the present embodiment, transmission of the analysis video is controlled according to the motion amount. That is, in addition to the effect shown in the second embodiment, NW transmission of the analysis video is suppressed when the motion amount of the video is small, so the NW bandwidth can be reduced. The code amount saved in this way may also be allocated to the encoding of the viewing video so as to generate a higher-definition viewing video. On the other hand, controlling transmission of the analysis video according to the motion amount is not essential for obtaining the same effect as in the first embodiment.

In the present embodiment, the HMD functions as an image processing device, and the host computer functions as an image composition device. Together, the HMD and the host computer constitute an image processing system.

[Embodiment 6]
A sixth embodiment, relating to an MR system, will be described with reference to FIGS. 9 and 10. FIG. 9 is a diagram illustrating the configuration of the MR apparatus in the sixth embodiment. In FIG. 9, elements 901 to 904 are added to the configuration of FIG. 2. Unless otherwise specified, the operation is the same as that of the second embodiment described with reference to FIG. 2.

Reference numeral 901 denotes a bus, 902 a CPU, 903 a RAM, and 904 a nonvolatile storage. A program stored in the nonvolatile storage is loaded into the RAM via the bus and executed by the CPU. In the present embodiment, a position and orientation estimation program is executed to generate position and orientation information and environment map information. The motion amount output by the inertial sensor 201 and the video data output by the ISPs 102 and 105 are stored in the RAM 903 via the bus and serve as input data for the program. The CG drawing unit 106 acquires the image data stored in the RAM 903 via the bus, draws a computer graphic object, and outputs the result to the display unit 107. The CPU 902 also controls each module. Unless otherwise specified, modules connected to the bus input and output data via the bus.

The operation of the position and orientation estimation program executed by the CPU 902 will now be described with reference to FIG. 10. FIG. 10 is a diagram illustrating the position and orientation estimation method in the sixth embodiment. In S10000, the motion amount is acquired. In the present embodiment, the motion amount output from the inertial sensor 201 is acquired. In S10010, the viewing video is acquired. In S10020, the magnitude of the motion amount is determined. If the motion amount is determined to be so large that video analysis from the viewing video is difficult, S10030 is executed. If it is determined to be small, S10040 is executed. In S10030, the analysis video is acquired. As described above, the analysis video suffers less image quality degradation due to motion than the viewing video. In S10035, the analysis video is set as video A. In S10040, the viewing video is set as video A. In S10050, the position and orientation information and the environment map are generated using video A.
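The branch of FIG. 10 that decides which video becomes video A can be sketched as below. The sketch assumes that the analysis video is fetched through a callable, so that it is acquired only on the large-motion branch as in S10030; the function name, the callable parameter, and the 4-pixel default threshold are illustrative assumptions.

```python
def choose_video_a(motion_amount, viewing_frame, acquire_analysis_frame,
                   threshold=4.0):
    """Return the frame used as video A for position and orientation estimation."""
    if motion_amount >= threshold:          # S10020: motion judged large
        return acquire_analysis_frame()     # S10030 and S10035: analysis video
    return viewing_frame                    # S10040: viewing video


# Frames are represented by plain strings for illustration.
large_motion_choice = choose_video_a(5.0, "viewing", lambda: "analysis")
small_motion_choice = choose_video_a(1.0, "viewing", lambda: "analysis")
```

Video A chosen this way would then be passed to the estimation of S10050, while the viewing video remains the target on which the CG object is drawn.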

In S10060, a computer graphic object is drawn on the viewing video using the position and orientation information and the environment map. This generates a video in which the computer graphic object appears to exist at a fixed position in the three-dimensional space. In the present embodiment, this step is executed by the CPU 902 issuing an instruction to the CG drawing unit 106, but this is not a limitation, and the CPU 902 may perform the superimposition itself. Since these operations are performed while images are continuously captured and processed, the condition determination is also performed in units of frame images.

The configuration of the present embodiment also provides the same effect as the second embodiment. Although the output of the inertial sensor is used as the motion amount in the present embodiment, a value calculated by software processing may be acquired instead, as shown in the third and fourth embodiments. The present embodiment shows an example in which the motion amount is acquired, but a configuration that does not use the motion amount is also possible. This modification is described below with reference to FIG. 11.

FIG. 11 is a diagram illustrating a modification of the sixth embodiment. In S11000, the analysis video is acquired. In S11010, the viewing video is acquired. In S11020, the position and orientation information and the environment map are generated using the analysis video. In S11030, a computer graphic object is drawn on the viewing video using the position and orientation information and the environment map.

This generates a video in which the computer graphic object appears to exist at a fixed position in the three-dimensional space. Even with this configuration, the same effect as in the first embodiment can be obtained.

[Embodiment 7]
A seventh embodiment, relating to an MR system, will be described with reference to FIGS. 12 and 13. FIG. 12 is a diagram illustrating the configuration of the MR apparatus in the seventh embodiment. Unless otherwise specified, the operation is the same as that of the fifth embodiment described with reference to FIG. 8.

In the fifth embodiment, the host computer was shown as a configuration in which dedicated hardware exists and the modules are directly connected. In the present embodiment, each module is connected to a bus, and the video decoding and video encoding functions are also performed by the CPU. Since the configuration of FIG. 12 does not have the selector function described in the fifth embodiment, the CPU 807 controls the data flow and performs position and orientation estimation. The control flow for this is described with reference to FIG. 13. FIG. 13 is a diagram illustrating the position and orientation estimation method in the seventh embodiment. Compared with FIG. 10, S10010 is replaced with S13000 and S10020 is replaced with S13020 in FIG. 13. The operation of the other steps is the same as described for FIG. 10.

In S13000, the state indicating whether the host computer has received the analysis video is acquired. Reception of the analysis video is handled frame by frame. In S13020, S10030 is executed if the analysis video has been received; otherwise, S10040 is executed. Since these operations are performed while images are continuously captured and processed, the condition determination is also performed in units of frame images. Even with the configuration shown in the present embodiment, the same effect as in the fifth embodiment can be obtained.

According to the present embodiment, the image processing apparatus further includes a configuration for determining the reception state of the first video. When the first video is being received, the position and orientation information is generated by analyzing the first video; when the first video is not being received, the position and orientation information is generated by analyzing the second video.

The head mounted display given above as an example of the MR apparatus has been described as a video see-through type, which displays on a display unit a composite image obtained by superimposing a CG object on a captured image for the user to observe. However, the head mounted display according to the present specification may instead be an optical see-through type, which superimposes and displays a CG object on a display through which the real space can be observed.

(Other Embodiments)
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

101 sensor; 102 ISP; 103 position/orientation estimation unit; 104 sensor; 105 ISP; 106 CG drawing unit; 107 display unit; 201 inertial sensor; 202 selector; 301 motion detection unit; 302 selector; 801 selector; 802 NW; 803 video decoding unit; 804 selector; 805 video decoding unit; 806 video encoding unit; 807 CPU; 808 RAM; 809 nonvolatile storage; 810 bus; 811 video decoding unit; 901 bus; 902 CPU; 903 RAM; 904 nonvolatile storage

Claims

An image processing apparatus comprising:
first imaging means for outputting a first video in which image degradation caused by movement of a subject is relatively small;
second imaging means for outputting a second video in which image degradation caused by movement of the subject is relatively large;
estimating means for analyzing the first video and generating position and orientation information of the image processing apparatus; and
drawing means for drawing a CG object on the second video, superimposed at a position determined based on the position and orientation information.

The image processing apparatus according to claim 1, wherein the estimating means generates the position and orientation information by analyzing the first video when the motion of the image processing apparatus is larger than a predetermined threshold, and by analyzing the second video when the motion of the image processing apparatus is smaller than the predetermined threshold.

The image processing apparatus according to claim 2, further comprising a motion detection unit that detects a motion of the image processing apparatus using an inertial sensor.

The image processing apparatus according to claim 2, further comprising a motion detection unit that analyzes the first video and detects a motion of the image processing apparatus.

The image processing apparatus according to claim 1, wherein the estimating means analyzes the first video to generate the position and orientation information and to detect the motion of the image processing apparatus, and, when the detected motion is larger than a predetermined threshold, switches the video analyzed in generating the position and orientation information to the second video.

The image processing apparatus according to claim 1, wherein the first imaging unit is a global shutter sensor, and the second imaging unit is a rolling shutter sensor.

The first imaging means is a sensor with a relatively high frame rate, and the second imaging means is a sensor with a relatively low frame rate. The image processing apparatus according to item.

The image processing apparatus according to claim 1, wherein the first imaging means is a sensor having a relatively high shutter speed, and the second imaging means is a sensor having a relatively low shutter speed.

The image processing apparatus according to claim 1, wherein a resolution of the first video is lower than a resolution of the second video.

The image processing apparatus according to claim 1, wherein a frame rate of the first video is higher than a frame rate of the second video.

First acquisition means for receiving, from the image processing apparatus, a first video that has relatively little image degradation due to movement of the subject;
Second acquisition means for receiving, from the image processing device, a second video that has relatively large image degradation due to movement of the subject;
Estimating means for analyzing the first video and generating position and orientation information of the image processing device;
Drawing means for drawing a CG object on the second video so as to be superimposed on a position determined based on the position and orientation information;
An image composition apparatus comprising:

The image composition apparatus according to claim 11, further comprising a determination unit for determining a reception state of the first video, wherein the estimating means generates the position and orientation information by analyzing the first video when the first video is received, and by analyzing the second video when the first video is not received.
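
The reception-state fallback recited above can be illustrated with a minimal sketch. Representing a frame that was not received as None, and the function and label names, are assumptions made for illustration only.

```python
from typing import Optional, Tuple


def choose_video_for_pose(
    first_frame: Optional[bytes], second_frame: bytes
) -> Tuple[str, bytes]:
    """Pick the frame to analyze, based on whether the first video arrived.

    Assumption: `first_frame` is None when the first video was not received.
    """
    # First video received: analyze it.
    if first_frame is not None:
        return ("first", first_frame)
    # First video not received: fall back to the second video.
    return ("second", second_frame)
```

In this sketch the second video is always available because it is also the video on which the CG object is drawn.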

An image processing system comprising an image processing device and an image composition device,
The image processing apparatus includes:
First imaging means for outputting a first video with relatively little image degradation due to movement of the subject;
Second imaging means for outputting a second video with relatively large image degradation due to movement of the subject;
Transmission means for transmitting the first video and the second video to the image composition device,
The image composition device includes:
First acquisition means for receiving the first video from the image processing device;
Second acquisition means for receiving the second video from the image processing device;
Estimating means for analyzing the first video and generating position and orientation information of the image processing device;
Drawing means for drawing a CG object on the second video so as to be superimposed on a position determined based on the position and orientation information.

The image processing system according to claim 13, wherein the transmission means transmits both the first video and the second video when the movement of the image processing apparatus is larger than a predetermined threshold, and transmits the second video without transmitting the first video when the movement of the image processing apparatus is smaller than the predetermined threshold.
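
The transmission rule recited above can be illustrated with a minimal sketch. The function name, the labeled-tuple representation of frames, and the handling of a movement exactly equal to the threshold are assumptions made for illustration; the claim does not specify them.

```python
from typing import List, Tuple


def frames_to_transmit(
    motion: float, threshold: float, first_frame: bytes, second_frame: bytes
) -> List[Tuple[str, bytes]]:
    """Return the labeled frames to send to the image composition device."""
    # Large movement: send both videos so the receiver can estimate pose
    # from the first video and draw on the second.
    if motion > threshold:
        return [("first", first_frame), ("second", second_frame)]
    # Small movement: send only the second video.
    # Assumption: motion == threshold is treated as small; the claim is silent here.
    return [("second", second_frame)]
```

Omitting the first video during small movements reduces the amount of data transmitted, and pairs naturally with a receiver that falls back to the second video when the first is not received.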

An image processing method performed by an image processing apparatus,
Using the first imaging means to obtain a first video with relatively little image degradation due to movement of the subject;
Using the second imaging means to obtain a second video that has a relatively large image degradation due to movement of the subject;
Analyzing the first video to generate position and orientation information of the image processing device;
Drawing a CG object on the second video so as to be superimposed on a position determined based on the position and orientation information;
An image processing method comprising:

An image processing method performed by an image composition device,
Receiving, from the image processing apparatus, a first video with relatively little image degradation due to movement of the subject;
Receiving, from the image processing apparatus, a second video with relatively large image degradation due to movement of the subject;
Analyzing the first video to generate position and orientation information of the image processing device;
Drawing a CG object on the second video so as to be superimposed on a position determined based on the position and orientation information;
An image processing method comprising:

An image processing method performed by an image processing system including an image processing device and an image composition device,
The image processing apparatus using the first imaging means to obtain a first video that has relatively little image degradation due to movement of the subject;
The image processing apparatus using the second imaging means to obtain a second video that has relatively large image degradation due to movement of the subject;
The image processing device transmitting the first video and the second video to the image composition device;
The image composition device receiving the first video from the image processing device;
The image composition device receiving the second video from the image processing device;
The image composition device analyzing the first video to generate position and orientation information of the image processing device;
The image composition device drawing a CG object on the second video so as to be superimposed on a position determined based on the position and orientation information;
An image processing method comprising:

A program for causing a computer to execute:
Obtaining a first video, captured by the image processing apparatus, with relatively little image degradation due to movement of the subject;
Obtaining a second video, captured by the image processing apparatus, with relatively large image degradation due to movement of the subject;
Analyzing the first video to generate position and orientation information of the image processing device;
Drawing a CG object on the second video so as to be superimposed on a position determined based on the position and orientation information.