JP2016218594A

JP2016218594A - Image processor, control method of image processor, and computer program

Info

Publication number: JP2016218594A
Application number: JP2015100694A
Authority: JP
Inventors: Masashi Aonuma (青沼 正志); Kaoru Yamaguchi (山口 薫)
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2015-05-18
Filing date: 2015-05-18
Publication date: 2016-12-22
Anticipated expiration: 2035-05-18
Also published as: JP6609988B2

Abstract

PROBLEM TO BE SOLVED: To provide an image processor capable of improving user-friendliness by associating a moving image displayed on a display device with an actual object. SOLUTION: The image processor includes: an external scene sensor which picks up at least one object; and an image generation section that generates a virtual image corresponding to at least one moving object in the picked-up image of the object. SELECTED DRAWING: Figure 1

Description

The present invention relates to image processing apparatus technology.

As described in Patent Document 1, a display device is conventionally known that generates an image corresponding to part of the motion of a video-recorded subject and displays that image in synchronization with the video of the subject.

Patent Document 1: Japanese Patent Laid-Open No. 2002-230086

However, in the technique described in Patent Document 1, although the video of the subject displayed on the display device and the generated image are synchronized, the video and other content displayed on the display device have no relationship to real objects that are not displayed on the display device. There has therefore been a need to improve convenience and usability for the user by associating the video displayed on the display device with a real object. There has also been a need to easily create video and similar content that associates the video displayed on the display device with a real object.

The present invention has been made to solve at least part of the problems described above, and can be implemented in the following forms.

(1) According to one aspect of the present invention, an image processing apparatus is provided. The image processing apparatus includes an outside scene sensor that images at least one target, and an image generation unit that generates a virtual image corresponding to at least one moving target among the imaged targets. With the image processing apparatus of this aspect, because the image generation unit generates a virtual image corresponding to a moving target, the user can easily create a video that includes a virtual image, for example for work support, which improves usability.

(2) In the image processing apparatus of the above aspect, the image generation unit may generate the virtual image of an image generation target, which is the target for which the virtual image is generated, by associating the movement region of the image generation target with at least one of the imaged targets other than the image generation target. With this aspect, when a video including the generated virtual image is played back, the virtual image is displayed in association with the position, size, and other properties of the real object associated with it. For example, when the video is a support video for a task, superimposing the virtual image on the object being worked on further improves the user's work efficiency and usability.

(3) In the image processing apparatus of the above aspect, the image generation unit may generate the virtual image such that at least one of the size of the virtual image and the movement region is associated with the size of the target that is associated with the movement region of the image generation target. With this aspect, when a video including the generated virtual image is played back, the virtual image is displayed in closer association with the position, size, and other properties of the real object associated with it, which further improves usability.

(4) In the image processing apparatus of the above aspect, the image generation unit may generate the virtual image such that whether the virtual image is displayed corresponds to whether a set trigger target is detected. With this aspect, because the virtual image is created so that it is displayed when a specific preset condition is detected, the display timing can be set to suit the purpose of the virtual image.

(5) In the image processing apparatus of the above aspect, the image generation unit may set, as trigger targets, a moving target, which is a target that is moving among the plurality of imaged targets, and a related stationary target, which is a target that is not moving and is determined to be within a predetermined distance of the moving target; and may generate a virtual image that is a combination of a moving-target image, which is a virtual image of the moving target, and a related-stationary-target image, which is a virtual image of the related stationary target, in association with the combination of the presence or absence of the moving target and the presence or absence of the related stationary target. With this aspect, videos of a plurality of virtual images, including a virtual image related to the moving target, are created without any particular operation being accepted, which improves the usability of the image processing apparatus.
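The combination logic in aspect (5) can be pictured as a small lookup keyed on which trigger targets are currently detected. The following is a minimal sketch for illustration only, not the method of the specification: the function name `select_virtual_images`, the image labels, and in particular the choice of which images are paired with which combination are hypothetical placeholders.

```python
# Hypothetical sketch: map the detected / not-detected state of the two
# trigger targets to the virtual images to display. The table values are
# placeholders chosen for illustration; the specification defines its own
# combinations.
MOVING_IMAGE = "moving_target_image"
STATIONARY_IMAGE = "related_stationary_target_image"

DISPLAY_TABLE = {
    # (moving target detected, related stationary target detected): images
    (True, True): (),
    (True, False): (STATIONARY_IMAGE,),
    (False, True): (MOVING_IMAGE,),
    (False, False): (MOVING_IMAGE, STATIONARY_IMAGE),
}


def select_virtual_images(moving_detected: bool, stationary_detected: bool) -> tuple:
    """Return the virtual images associated with one detection combination."""
    return DISPLAY_TABLE[(moving_detected, stationary_detected)]
```

Keying the table on a pair of booleans makes it easy to check that all four presence/absence combinations are covered.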

(6) The image processing apparatus of the above aspect may further include an operation receiving unit that receives operations. Here, the image generation unit may generate the virtual image with unnecessary portions erased on the basis of a received operation. With this aspect, moving objects that need not be generated as virtual images and stationary objects that do need to be generated can be selected, so an AR scenario or composite scenario that is easier for the user to use can be created, which improves usability.

(7) In the image processing apparatus of the above aspect, the image generation unit may generate, as the virtual image, an image corresponding to the period during which a moving target among the plurality of imaged targets is moving. With this aspect, a virtual image of a moving object is generated automatically even when no operation is performed to select the target for which a virtual image is to be generated. Thus, for example, when the video including the virtual image is a support video for a task and the task requires some object to be moved, a virtual image of the moving object, which is the object to be moved, is generated automatically, which improves usability.

(8) The image processing apparatus of the above aspect may further include a target selection unit. The target selection unit may distinguish, as the at least one target, the shape of a human body from shapes other than a human body, and the image generation unit need not generate the virtual image corresponding to the shape of a human body among the imaged targets. With this aspect, when a video including a virtual image is being played, a person's hand or the like, which is merely the means of moving the object to be moved, is not displayed as a virtual image, so the user does not see unnecessary virtual images such as hands, which improves convenience.

(9) The image processing apparatus of the above aspect may further include an audio acquisition unit that acquires external audio, and the image generation unit may generate the virtual image by associating the image generation target, which is the target for which the virtual image is generated, with the audio acquired while the image generation target is moving. With this aspect, the created video including the virtual image contains a virtual image associated not only with visual information, such as the virtual image generated from the captured image, but also with auditory information such as audio, which improves convenience.

(10) In the image processing apparatus of the above aspect, the image generation unit may generate the acquired audio as a text image in association with the virtual image. With this aspect, audio can be generated as visual information alongside the virtual image, so the user can recognize the information easily, which further improves convenience.

(11) The image processing apparatus of the above aspect may further include a distance measurement unit that measures the distance to the target, and the image generation unit may generate the virtual image on the basis of the measured distance. With this aspect, the virtual image can be generated as a three-dimensional model, so the user can recognize the virtual image more easily, which further improves convenience.

(12) In the image processing apparatus of the above aspect, when the virtual image is a virtual video that changes over time, the image generation unit may insert a specific image at a specific time point in the virtual video. With this aspect, the user can be made to recognize, as visual information via the specific image, what is being done at the specific time point, which improves usability.

(13) The image processing apparatus of the above aspect may further include an audio acquisition unit that acquires external audio, and when the virtual image is a virtual video that changes over time, the image generation unit may generate the virtual video by associating a specific time point in the virtual video with the acquired audio. With this aspect, the user can be made aware of the specific time point by audio, so unlike the case where a specific image is displayed, no specific image overlaps the virtual image, which further improves usability at the specific time point.

Not all of the plurality of constituent elements of each aspect of the present invention described above are essential. To solve some or all of the problems described above, or to achieve some or all of the effects described in this specification, some of the constituent elements may be changed, deleted, or replaced with other new constituent elements, or part of their limiting content may be deleted, as appropriate. Likewise, to solve some or all of the problems described above, or to achieve some or all of the effects described in this specification, some or all of the technical features included in one aspect of the present invention described above may be combined with some or all of the technical features included in another aspect described above to form an independent aspect of the present invention.

For example, one aspect of the present invention can be realized as a device including one or both of two elements: an outside scene sensor and an image generation unit. That is, the device may or may not have an outside scene sensor, and may or may not have an image generation unit. The outside scene sensor may, for example, image at least one target. The image generation unit may, for example, generate a virtual image corresponding to at least one moving target among the imaged targets. Such a device can be realized as an image processing apparatus, for example, but can also be realized as a device other than an image processing apparatus. This form can solve at least one of various problems, such as improving and simplifying the operability of the device, integrating the device, and improving convenience for the user of the device. Some or all of the technical features of each aspect of the image processing apparatus described above can be applied to this device.

The present invention can also be realized in various forms other than an image processing apparatus. For example, it can be realized as a control method for an image processing apparatus, a system including an image processing apparatus, a computer program for realizing the control method or the system, a recording medium on which the computer program is recorded, or a data signal that includes the computer program and is embodied in a carrier wave.

The drawings, listed in order, show the following.
A block diagram functionally showing the configuration of an image processing apparatus according to a first embodiment of the present invention.
An explanatory diagram of imaging an outside scene including a subject with an RGB camera and a distance sensor.
A flowchart of AR scenario creation processing.
A flowchart of AR scenario creation processing.
An explanatory diagram showing a captured image taken before the start of work, with no subject present.
An explanatory diagram showing a captured image of an outside scene including a tracked moving object.
An explanatory diagram showing an image in which additional information has been added to a captured image at a specific time point included in an AR scenario.
An explanatory diagram showing an image after unnecessary objects have been erased from a captured image.
An explanatory diagram showing a generated AR image and the object associated with the AR image.
A flowchart of composite scenario creation processing.
An explanatory diagram showing an editing image displayed when a trigger is being set.
An explanatory diagram showing an image displayed on branching to a branch scenario when a trigger is detected.
An explanatory diagram showing the external configuration of a head-mounted display (HMD).
A flowchart of composite scenario execution processing.
An explanatory diagram showing the visual field seen by the user when a corresponding object set in the composite scenario being executed is detected.
An explanatory diagram showing the visual field seen by the user when a trigger target set in the branch scenario being executed is detected.
A block diagram functionally showing the configuration of an image processing apparatus according to a second embodiment.
A flowchart of part of the AR scenario creation processing in the second embodiment.
A block diagram functionally showing the configuration of an image processing apparatus according to a third embodiment.
A flowchart of part of the AR scenario creation processing in the third embodiment.
A flowchart of AR scenario creation processing in a fourth embodiment.
A flowchart of AR scenario creation processing in the fourth embodiment.
An explanatory diagram of imaging an outside scene including a plurality of subjects with the RGB camera and the distance sensor in the fourth embodiment.
An explanatory diagram of imaging an outside scene including a plurality of subjects with the RGB camera and the distance sensor in the fourth embodiment.
An explanatory diagram of imaging an outside scene including a plurality of subjects with the RGB camera and the distance sensor in the fourth embodiment.
An explanatory diagram of imaging an outside scene including a plurality of subjects with the RGB camera and the distance sensor in the fourth embodiment.
An explanatory diagram of imaging an outside scene including a plurality of subjects with the RGB camera and the distance sensor in the fourth embodiment.
A table showing an example of the combinations of trigger targets and AR scenarios created in step S95 of the AR scenario creation processing of the fourth embodiment.
A flowchart of display image determination processing while an AR scenario is being executed.
An explanatory diagram showing an example of the visual field seen by the user when the display image associated with combination (1) is displayed on the optical image display unit.
An explanatory diagram showing an example of the visual field seen by the user when the display image associated with combination (2) is displayed on the optical image display unit.
An explanatory diagram showing an example of the visual field seen by the user when the display image associated with combination (3) is displayed on the optical image display unit.
An explanatory diagram showing an example of the visual field seen by the user when the display image associated with combination (4) is displayed on the optical image display unit.

In this specification, the term "outside scene sensor" covers at least one of the RGB camera and the distance sensor described below. Accordingly, an RGB camera, a distance sensor, and a combination of the two are each an example of an "outside scene sensor". The "outside scene sensor" is of course not limited to the RGB camera, the distance sensor, or the combination described in the embodiments; it refers to any device that acquires and outputs information for estimating the two-dimensional or three-dimensional coordinates of the outside scene or of a target included in the outside scene (these may also be called the real environment and real objects).

In this specification, the term "target selection unit" refers to a component that has the function of selecting a real object to be represented as a virtual image (AR image), or the function of providing information that serves as the basis for that selection. In the embodiments, each of the object tracking units 12a, 12b, and 12c is an example of a "target selection unit".

A. First embodiment:
A-1. Configuration of the image processing apparatus:
FIG. 1 is a block diagram functionally showing the configuration of the image processing apparatus 100 according to the first embodiment of the present invention. The image processing apparatus 100 generates a three-dimensional model of a continuously imaged subject, and generates an AR (augmented reality) image on the basis of the generated three-dimensional model and the various operations it receives. In this embodiment, an AR image is an image displayed in association with a real object recognized by image recognition or similar means.

The image processing apparatus 100 includes a CPU 10, a data storage unit 50, a power supply 60, an RGB camera 31, a distance sensor 32, a microphone 33, an operation unit 34, a display unit 35, a ROM 41, and a RAM 42. The data storage unit 50 stores various data and is implemented by a hard disk drive or the like. The power supply 60 supplies power to each unit of the image processing apparatus 100. A secondary battery, for example, can be used as the power supply 60.

The RGB camera 31 is a camera that images an outside scene in a predetermined range including the subject. In this embodiment, the RGB camera 31 consists of three cameras placed at different positions: a first camera 311, a second camera 312, and a third camera 313. The RGB camera 31 transmits the RGB data of the imaged outside scene to the sensor control unit 15 of the CPU 10, described later. The distance sensor 32 is a depth sensor that measures the distance to an illuminated target by using an infrared camera to capture the countless points it projects. In this embodiment, the distance sensors are placed next to the respective cameras in one-to-one correspondence with the first camera 311, the second camera 312, and the third camera 313. That is, like the RGB camera 31, the distance sensor 32 consists of three depth sensors placed at different positions (a first distance sensor 321, a second distance sensor 322, and a third distance sensor 323). The distance sensor 32 images with the infrared camera and transmits data on the countless points of infrared light reflected by object surfaces to the sensor control unit 15 of the CPU 10. In other embodiments, the distance sensor 32 may measure the distance to the target using a TOF (time-of-flight) method. Although in this embodiment the RGB camera 31 and the distance sensor 32 consist of three cameras 311, 312, 313 and three sensors 321, 322, 323, respectively, other embodiments may use fewer or more than three cameras or sensors. The number of RGB cameras 31 and the number of distance sensors 32 need not be the same, and they need not correspond one-to-one. The RGB camera 31 and the cameras 311, 312, and 313 correspond to the outside scene sensor in the claims. However, when the RGB camera 31 and the distance sensor 32 correspond one-to-one, capturing the outside scene or a real object with the RGB camera 31 and the distance sensor 32, including measuring the distance D from the distance sensor 32 to the real object, may also be referred to as "imaging". In that case, the captured image is represented as RGBD data. RGBD data is, for example, data that holds values for R, G, B, and the distance D for each pixel.
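To make the per-pixel RGBD representation concrete, the following minimal sketch shows one way such a pixel could be held and turned into a three-dimensional point in the camera's own coordinate system. It is an illustration under stated assumptions, not the implementation in the specification: the names `RGBDPixel` and `back_project`, the pinhole camera model, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, and the reading of D as depth along the optical axis in metres are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RGBDPixel:
    """One pixel of RGBD data: colour plus distance D (assumed to be metres)."""
    r: int
    g: int
    b: int
    d: float


def back_project(u, v, pixel, fx, fy, cx, cy):
    """Convert pixel (u, v) and its distance D into a camera-frame 3D point.

    Assumes a pinhole model with focal lengths (fx, fy) and principal point
    (cx, cy) in pixels, and treats D as depth along the optical axis.
    """
    z = pixel.d
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


# Usage: a pixel at the principal point lies on the optical axis.
centre = back_project(320, 240, RGBDPixel(10, 20, 30, 2.0), 500.0, 500.0, 320.0, 240.0)
```

Points produced this way are expressed relative to a single camera, which is why the positions of the cameras relative to one another matter when several viewpoints are combined.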

FIG. 2 is an explanatory diagram of imaging an outside scene SC including a subject OB with the RGB camera 31 and the distance sensor 32. As shown in FIG. 2, three cameras 331, 332, and 333 placed at different positions image the outside scene including the subject OB. The image captured by the camera 331 corresponds to the image captured by the first camera 311 and the first distance sensor 321, the image captured by the camera 332 corresponds to the image captured by the second camera 312 and the second distance sensor 322, and the image captured by the camera 333 corresponds to the image captured by the third camera 313 and the third distance sensor 323. In FIG. 2, the image shows a cook, the subject OB, holding a fish scaler TL in the right hand and pressing down the head of a fish FS with the left hand while removing the scales of the fish FS with the scaler TL. On the basis of the RGB data and distance data acquired by the cameras 331, 332, and 333 placed at different positions, the positions and colors of the targets included in the outside scene SC, such as the subject OB, the scaler TL, and the fish FS, are determined.

Other methods may also be used to specify the positions of the cameras and sensors. The three-dimensional position of a real object in the outside scene SC can be obtained, for example, as follows. When the positional relationship and the camera parameters of the cameras 331, 332, and 333 are known and the horizontal axis of the distance sensor 32 is parallel to the ground, a light source that emits light (for example, infrared light) intermittently is installed within the common imaging range of the cameras 331, 332, and 333. Each of the cameras 331, 332, and 333 images the light source (the image is represented by RGBD data for each pixel, where the distance D is the distance from the distance sensor 32), so that the three-dimensional position (Xi, Yi, Zi) (i = 0, 1, 2) of the light source as seen from each camera is estimated. Then, for example, the two coordinates of the common light source as seen from the cameras 332 and 333 (i = 1, 2) are converted into coordinates as seen from the camera 331 (i = 0). To do so, a transformation matrix that makes the converted coordinates coincide with the coordinates seen from the camera 331 is derived for each of the cameras 332 and 333. Specifically, the transformation matrix that minimizes the difference between the coordinates of the camera 331 and the converted coordinates is obtained by iterative calculation. With this setup, the three-dimensional models generated from the respective viewpoints of the cameras 331, 332, and 333 can be fused into one three-dimensional model that does not depend on the camera viewpoint. The accuracy may be improved by moving the installed light source and imaging it again with the cameras 331, 332, and 333. The number of sets of an RGB camera 31 and a distance sensor 32 may also be four or more.
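The alignment of one camera's coordinates to the reference camera can be illustrated with a small sketch. It is not the method of this embodiment: the text describes an iterative calculation, whereas the sketch swaps in a closed-form least-squares fit, and it assumes that the two cameras differ only by a rotation about the vertical axis plus a translation (an assumption suggested by, but not stated in, the condition that the sensor's horizontal axis is parallel to the ground). The function names and the sample points are hypothetical.

```python
import math

def apply_transform(theta, t, p):
    """Rotate point p about the vertical (Y) axis by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] + s * p[2] + t[0],
            p[1] + t[1],
            -s * p[0] + c * p[2] + t[2])

def fit_yaw_translation(src, dst):
    """Least-squares yaw angle and translation that map src onto dst.

    src: light-source positions seen from another camera (e.g. camera 332).
    dst: the same positions seen from the reference camera (e.g. camera 331).
    Assumption: the cameras differ only by a yaw rotation and a translation.
    """
    n = len(src)
    cs = [sum(p[k] for p in src) / n for k in range(3)]  # centroid of src
    cd = [sum(q[k] for q in dst) / n for k in range(3)]  # centroid of dst
    c_sum = s_sum = 0.0
    for p, q in zip(src, dst):
        px, pz = p[0] - cs[0], p[2] - cs[2]
        qx, qz = q[0] - cd[0], q[2] - cd[2]
        c_sum += qx * px + qz * pz   # coefficient of cos(theta)
        s_sum += qx * pz - qz * px   # coefficient of sin(theta)
    theta = math.atan2(s_sum, c_sum)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated src centroid onto the dst centroid.
    t = (cd[0] - (c * cs[0] + s * cs[2]),
         cd[1] - cs[1],
         cd[2] - (-s * cs[0] + c * cs[2]))
    return theta, t

def rms_residual(theta, t, src, dst):
    """Root-mean-square distance between transformed src points and dst points."""
    sq = sum(math.dist(apply_transform(theta, t, p), q) ** 2
             for p, q in zip(src, dst))
    return math.sqrt(sq / len(src))
```

Several light-source positions are needed for the fit, which matches the remark that the light source may be moved and imaged again. The residual returned by `rms_residual` is the quantity an iterative scheme would drive toward its minimum.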

The microphone 33 (FIG. 1) acquires external sound while the RGB camera 31 and the distance sensor 32 are imaging a predetermined range and while an operation from the user is being accepted. The microphone 33 transmits an audio signal based on the acquired sound to the UI control unit 16 of the CPU 10, described later. The operation unit 34 is a user interface (UI) that accepts input from the user and consists of a keyboard and a mouse. The operation unit 34 transmits, to the UI control unit 16 of the CPU 10, control signals corresponding to pressed keyboard keys and control signals based on changes in the position of the mouse pointer. The display unit 35 is a liquid crystal panel that displays images based on image signals transmitted from the UI control unit 16. The user can operate the image processing apparatus 100 through the operation unit 34 and the microphone 33 while viewing the image displayed on the display unit 35. The operation unit 34 and the microphone 33 correspond to the operation reception unit in the claims, and the microphone 33 also corresponds to the audio acquisition unit.

The CPU 10 controls the image processing apparatus 100 by reading a computer program stored in the ROM 41, loading it into the RAM 42, and executing it. The CPU 10 includes an AR scenario control unit 11, an object tracking unit 12, an object recognition unit 13, a three-dimensional model generation unit 14 (3D model generation unit 14), a sensor control unit 15, a user interface control unit 16 (UI control unit 16), an AR scenario operation setting unit 17, an additional information acquisition unit 18, an unnecessary image erasing unit 19, and an AR image extraction unit 21.

The sensor control unit 15 acquires the RGB data of the outside scene transmitted from the RGB camera 31 and the data of the innumerable points imaged by the infrared camera, transmitted from the distance sensor 32. The sensor control unit 15 transmits the data acquired from the RGB camera 31 and the distance sensor 32 to both the object tracking unit 12 and the 3D model generation unit 14. It also controls the RGB camera 31 and the distance sensor 32 based on control signals transmitted from the UI control unit 16.

Based on the control signal transmitted from the AR scenario control unit 11, the 3D model generation unit 14 creates a three-dimensional model (3D model) of the targets present in the imaged predetermined range, using the RGB data of the RGB camera 31 and the distance data of the distance sensor 32 transmitted from the sensor control unit 15. Specifically, the 3D model generation unit 14 acquires the shapes of the objects in the imaging range based on the distance data acquired by the distance sensor 32, detects identical boundaries in the acquired shapes based on the same distance data, and generates the three-dimensional model. The 3D model generation unit 14 also colors the generated three-dimensional model based on the RGB data transmitted from the RGB camera 31. It then transmits the colored three-dimensional model and the data on the detected identical boundaries to the object recognition unit 13.
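As a rough illustration of how distance data and RGB data can be combined into colored three-dimensional data, the following sketch back-projects a depth map into colored points. It assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy` and a depth map aligned pixel for pixel with the RGB image; none of these details are specified in this embodiment, and the function name is hypothetical.

```python
def depth_to_colored_points(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth map into a list of ((x, y, z), (R, G, B)) points.

    depth: 2D list of distances along the optical axis (None or 0 = no data).
    rgb:   2D list of (R, G, B) tuples aligned with depth (assumption).
    fx, fy, cx, cy: hypothetical pinhole intrinsics in pixels.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if not z:            # skip pixels without a distance measurement
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append(((x, y, z), rgb[v][u]))
    return points
```

Each returned point carries its color, so a surface built from the points can be colored directly from the RGB data.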

Based on the control signal transmitted from the AR scenario control unit 11, the object recognition unit 13 uses the generated three-dimensional model and the detected identical-boundary data to recognize each part of the three-dimensional model that has continuous boundary data as a single object. In other words, the object recognition unit 13 splits the three-dimensional model where the boundary data is discontinuous and recognizes each resulting part as an individual object. The object recognition unit 13 also extracts human bodies from the three-dimensional model by comparing it, via the AR scenario control unit 11, with human body parts (for example, hands and feet) stored in the data storage unit 50, described later, using pattern matching or a statistical identification method. The distance sensor 32 and the distance sensors 321, 322, and 323 correspond to a distance measuring unit.
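One simple way to picture the separation of objects at discontinuous boundaries is connected-component labeling on a depth map: neighboring pixels are grouped when their distances are close, and a jump in distance starts a new object. The sketch below is only an illustration under that assumption; the threshold `max_jump` and the function name are hypothetical, and the embodiment does not specify the actual procedure.

```python
from collections import deque

def label_objects(depth, max_jump=0.05):
    """Label 4-connected regions whose neighboring depths differ by <= max_jump.

    depth: 2D list of distances (None = no measurement).
    Returns (labels, count); label 0 marks pixels without a measurement.
    """
    h, w = len(depth), len(depth[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] or depth[sy][sx] is None:
                continue
            count += 1                      # start a new object
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:                    # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny][nx]
                            and depth[ny][nx] is not None
                            and abs(depth[ny][nx] - depth[y][x]) <= max_jump):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Human-body extraction by pattern matching, which the paragraph also mentions, is outside the scope of this sketch.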

Based on the control signal transmitted from the AR scenario control unit 11, the object tracking unit 12 identifies, among the individually recognized objects, the motion of the objects that move during imaging. The object tracking unit 12 transmits information that distinguishes moving objects from objects that are not moving (stationary objects) to the UI control unit 16 and the AR scenario control unit 11.

The UI control unit 16 transmits control signals to the display unit 35 and to the units included in the CPU 10 based on operations accepted by the microphone 33 or the operation unit 34. For example, based on an operation accepted by the operation unit 34, the UI control unit 16 transmits a signal for controlling the RGB camera 31 and the distance sensor 32 to the sensor control unit 15. For the objects identified and transmitted by the object tracking unit 12, it also transmits an image signal for displaying each object on the display unit 35 so that the user can select and operate each object. In addition, the UI control unit 16 includes a text conversion unit 161 that, during the AR scenario creation described later, automatically converts sound acquired by the microphone 33 into a character image. The text conversion unit 161 performs speech recognition on the acquired sound and converts it into the corresponding character image.

The AR scenario operation setting unit 17 sets the conditions under which an AR scenario created by the image processing apparatus 100 operates. In the present embodiment, an AR scenario is a moving image that includes an AR image corresponding to at least one moving object, and it may also include sound, character images, and the like inserted by the user. For example, the AR scenario operation setting unit 17 sets the AR scenario to be executed when the object associated with a generated AR image is detected as a real object by image recognition or the like. The AR scenario operation setting unit 17 also sets, for example, branching among a plurality of AR scenarios that takes effect when a preset specific object is detected in the real world.

The additional information acquisition unit 18 acquires information to be added to the AR scenario, based on the operation signals accepted by the UI and transmitted from the UI control unit 16 and on the control signals transmitted from the AR scenario control unit 11. Examples of information added to the AR scenario include a display-method setting, made by an operation accepted by the operation unit 34, for displaying an AR image enlarged or reduced, and the insertion of text converted from sound acquired by the microphone 33.

The AR scenario control unit 11 controls each unit of the CPU 10 in order to create an AR scenario. Based on the moving objects and stationary objects identified by the object tracking unit 12 and on the operations accepted by the UI, the AR scenario control unit 11 distinguishes the objects for which an AR image is generated from the objects for which no AR image is generated, and transmits the result to the unnecessary image erasing unit 19 and the AR image extraction unit 21. By exchanging various data with the data storage unit 50, the AR scenario control unit 11 also reads and edits previously created AR scenarios and stores newly created AR scenarios in the data storage unit 50.

Based on the control signals transmitted from the AR scenario control unit 11 and the AR scenario operation setting unit 17, the unnecessary image erasing unit 19 erases, from among the identified objects, the images of the objects for which no AR image is generated. Put differently, the unnecessary image erasing unit 19 selects, from the captured image, the objects for which an AR image is generated. The unnecessary image erasing unit 19 transmits the image signals of the images erased as unnecessary objects to the AR scenario control unit 11.

Based on the control signal transmitted from the AR scenario control unit 11, the AR image extraction unit 21 extracts the objects to be displayed as AR images in the AR scenario and generates their images. The AR image extraction unit 21 generates each AR image as a three-dimensional image based on the distance data acquired by the distance sensor 32, and colors the generated AR image based on the RGB data acquired by the RGB camera 31. The AR image extraction unit 21 transmits, to the AR scenario control unit 11, a signal that identifies the extracted objects for which AR images are generated. In addition, when a predetermined operation is accepted via the operation unit 34, the AR image extraction unit 21 can automatically extract a specific object stored in the data storage unit 50 as an object for which an AR image is generated. An example of such an object is one specified by a drawing created by CAD (computer aided design). The AR image extraction unit 21 corresponds to the image generation unit in the claims.

A-2. AR scenario creation process:
FIGS. 3 and 4 are flowcharts of the AR scenario creation process. In the AR scenario creation process, the image processing apparatus 100 creates AR images of moving objects and the like included in the outside scene image captured by the RGB camera 31 and the distance sensor 32.

In the AR scenario creation process, the microphone 33 or the operation unit 34 first waits to accept an operation that starts the creation of an AR scenario (step S12). The image processing apparatus 100 starts the AR scenario creation process when the microphone 33 accepts a predetermined preset sound or when the operation unit 34 accepts a predetermined preset keyboard button operation. If neither the microphone 33 nor the operation unit 34 accepts an operation that starts the AR scenario creation process (step S12: NO), they continue to wait for such an operation (step S12).

When an operation that starts the AR scenario creation process is accepted (step S12: YES), the AR scenario control unit 11 sets the imaging range to be imaged by the RGB camera 31 and the distance sensor 32 (step S14). The AR scenario control unit 11 sets the extent and position of the imaging range when the operation unit 34 accepts a predetermined operation. In the present embodiment, the imaging range of the RGB camera 31 and the imaging range of the distance sensor 32 are set to the same range, but in other embodiments the two imaging ranges may be set separately.

Once the imaging range of the RGB camera 31 and the distance sensor 32 is set, the RGB camera 31 acquires RGB data of the imaging range, and the distance sensor 32 measures the distance from the distance sensor 32 to the objects present in the imaging range (step S16). The RGB camera 31 transmits the acquired RGB data of the imaging range to the 3D model generation unit 14 and the object tracking unit 12 via the sensor control unit 15. The distance sensor 32 transmits the measured distance data for the objects in the imaging range to the 3D model generation unit 14 and the object tracking unit 12 via the sensor control unit 15.

The 3D model generation unit 14 generates a three-dimensional model (3D model) based on the RGB data transmitted from the RGB camera 31 and the distance data transmitted from the distance sensor 32 (step S18). The 3D model generation unit 14 generates a three-dimensional model of the shapes of the objects included in the imaging range based on the distance data, and colors the generated model based on the RGB data. In the present embodiment, the three-dimensional model generated by the 3D model generation unit 14 is the fusion, into a single model, of the three-dimensional models generated from the respective viewpoints of the cameras 331, 332, and 333. The object recognition unit 13 recognizes the individual objects included in the generated three-dimensional model by using the detected identical-boundary data (step S20).

FIG. 5 is an explanatory diagram showing an image captured before the work starts, with the subject OB absent. Once the imaging range is set, the RGB camera 31 and the distance sensor 32 start imaging the outside scene SC. Immediately after imaging starts, the subject OB is not in the imaging range, so the captured image does not include the cook who is the subject OB, as shown in FIG. 5. In the present embodiment, three-dimensional models of the fish FS and of the fish scaler TL are created immediately after imaging starts, while the subject OB is absent. As described in detail later, the generated three-dimensional model of the scaler TL is associated with the motion of the scaler TL identified by the object tracking unit 12 and is used as part of the images that make up the AR scenario.

After the process of step S20 in FIG. 3, the microphone 33 or the operation unit 34 waits to accept an operation indicating that the initial setting before moving-image capture has been completed (step S21). If no such operation is accepted, the CPU 10 executes the processes from step S14 again. If the microphone 33 or the operation unit 34 accepts an operation indicating that the initial setting has been completed (step S21: YES), it next waits to accept an operation that starts imaging of the moving objects (step S22). If the microphone 33 or the operation unit 34 does not accept an operation that starts imaging (step S22: NO), it continues to wait (step S22). When an operation that starts imaging is accepted (step S22: YES), the RGB camera 31 and the distance sensor 32 image the set imaging range as a moving image over time (step S22). The object tracking unit 12 distinguishes moving objects from stationary objects among the objects that are included in the imaging range and generated as three-dimensional models, and tracks the moving objects (step S24). The object tracking unit 12 measures the amount of change in the RGB data and the amount of change in the measured distance for each distinguished moving object, and thereby identifies changes in its position, such as its trajectory and posture.
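The distinction between moving and stationary objects can be sketched as follows. The sketch assumes that each recognized object has already been reduced to one centroid per frame, and it classifies an object as moving when the length of its path exceeds a threshold; the threshold `min_path`, the object identifiers, and the function name are hypothetical and are not taken from this embodiment.

```python
import math

def track_objects(frames, min_path=0.02):
    """Build per-object trajectories and classify objects as moving or stationary.

    frames: list of dicts {object_id: (x, y, z) centroid}, one dict per frame.
    min_path: hypothetical threshold on total path length (same unit as x, y, z).
    """
    trajectories = {}
    for frame in frames:
        for obj_id, pos in frame.items():
            trajectories.setdefault(obj_id, []).append(pos)
    result = {}
    for obj_id, path in trajectories.items():
        # Sum of distances between consecutive centroids.
        length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
        result[obj_id] = {
            "trajectory": path,
            "path_length": length,
            "moving": length > min_path,
        }
    return result
```

With a threshold of this kind, an object that shifts only slightly during imaging is still treated as stationary.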

FIG. 6 is an explanatory diagram showing a captured image of the outside scene SC that includes tracked moving objects. FIG. 6 shows the outside scene SC imaged by the RGB camera 31 and the distance sensor 32 when the same imaging range as in FIG. 2 is set. The outside scene SC of FIG. 6 differs from that of FIG. 2 in that the fish scaler TL held by the cook who is the subject OB is closer to the head of the fish FS, which the subject OB holds down with the left hand. The object tracking unit 12 therefore tracks the scaler TL and the right hand of the subject OB holding it as moving objects, and classifies the rest, such as the left hand of the subject OB and the fish FS, as stationary objects. Because FIG. 6 shows the images captured by the cameras 331, 332, and 333, the cameras themselves do not appear in the captured image. In the present embodiment, the right hand of the subject OB and the scaler TL are tracked as moving objects, but in other embodiments the scales stripped from the fish FS by the movement of the scaler TL may also be tracked as moving objects.

The AR scenario control unit 11 sets, in the AR scenario data, at least one of the tracked moving objects or the stationary objects in contact with a tracked moving object as a trigger for starting execution of the created AR scenario. Once a trigger is set, the AR scenario for which it is set is executed automatically when the trigger is detected in an image captured by a camera mounted on a head-mounted display (HMD) capable of AR display. The AR scenario control unit 11 may or may not set a trigger for a given AR scenario.

Next, the additional information acquisition unit 18 acquires the additional information to be added to the AR scenario being created, based on operations accepted by the microphone 33 or the operation unit 34 (step S26). One example of additional information is a character image obtained when the text conversion unit 161 converts sound acquired by the microphone 33 while the RGB camera 31 and the distance sensor 32 are imaging the imaging range including the moving objects. As another example, for the case where the created AR scenario is used on another information processing apparatus, a kitchen knife may be set as an object whose detection is undesirable (non-recommended object), in contrast to the fish scaler TL, which is an object whose detection is desirable (recommended object). In this case, if the kitchen knife, a non-recommended object, is detected while the AR scenario is being executed, the AR scenario may be stopped or switched to another AR scenario. Details of the execution of an AR scenario are described later in "A-4. Execution of composite scenario".
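The roles of a trigger and of a non-recommended object during playback can be pictured with a small decision function. This is only a sketch under assumptions: the dictionary keys `trigger`, `non_recommended`, and `fallback`, the object labels, and the returned action names are hypothetical, and the actual playback behavior is the subject of a later section.

```python
def next_action(scenario, detected, running):
    """Decide what a playback device might do for one captured frame.

    scenario: dict with hypothetical keys 'trigger' (label or None),
              'non_recommended' (set of labels), 'fallback' (scenario name or None).
    detected: set of object labels recognized in the current frame.
    running:  True if the scenario is already being executed.
    """
    if running:
        # A non-recommended object stops the scenario or switches to another one.
        if scenario.get("non_recommended", set()) & detected:
            fallback = scenario.get("fallback")
            return ("switch", fallback) if fallback else ("stop", None)
        return ("continue", None)
    trigger = scenario.get("trigger")
    if trigger is not None and trigger in detected:
        return ("start", None)   # trigger detected: start the scenario
    return ("idle", None)
```

A scenario without a trigger never starts from this function, which mirrors the statement that setting a trigger is optional.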

FIG. 7 is an explanatory diagram showing an image in which additional information has been added to the captured image at a specific point in time included in the AR scenario. FIG. 7 shows the captured image of FIG. 6, taken by the RGB camera 31 and the distance sensor 32, with the sound acquired by the microphone 33 added as a text image TX1 of additional information. The font size and color of the text image TX1 and the position at which it is added to the captured image may be changed when the operation unit 34 accepts a user operation.

After the additional information is acquired (step S26 in FIG. 3), the operation unit 34 accepts an operation that selects whether to automatically erase the unnecessary images of objects that are not targets for AR image creation, so that the AR scenario control unit 11 can decide for which objects AR images are created (step S28). The AR scenario control unit 11 causes the display unit 35 to display a selection screen for choosing between automatic and manual erasure of unnecessary images, and decides how unnecessary images are erased according to the operation accepted by the operation unit 34. If an operation selecting automatic erasure is accepted (step S28: YES), the unnecessary image erasing unit 19 erases the stationary objects and, among the moving objects, the human body as unnecessary objects that are not targets for AR image generation (step S38). In other words, the unnecessary image erasing unit 19 leaves the moving objects other than the human body unerased, as targets for AR image generation. The unnecessary image erasing unit 19 also erases from the captured image the human body parts extracted by the object recognition unit 13. Objects erased from the captured image are not displayed as AR images in the created AR scenario. In the present embodiment, even the same object is called a moving object while it is moving and a stationary object while it is not. In other embodiments, moving objects and stationary objects may be defined object by object.
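The automatic rule described above (erase stationary objects and the human body, keep the remaining moving objects) amounts to a simple filter. The sketch below assumes each object is described by a dictionary with the hypothetical keys `id`, `moving`, and `human`; these names are illustrative only.

```python
def select_ar_targets(objects):
    """Split objects into AR-image targets and erased objects under the automatic rule.

    objects: list of dicts with hypothetical keys 'id', 'moving' (bool), 'human' (bool).
    Returns (kept_ids, erased_ids).
    """
    kept, erased = [], []
    for obj in objects:
        # Keep only moving objects that are not part of a human body.
        if obj["moving"] and not obj["human"]:
            kept.append(obj["id"])
        else:
            erased.append(obj["id"])
    return kept, erased
```

Applied to the scene of this embodiment, only the fish scaler would remain as a target, while both hands and the fish would be erased.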

FIG. 8 is an explanatory diagram showing the image after the unnecessary objects have been erased from the captured image. The only moving object that is not erased as unnecessary is the fish scaler TL, but for the sake of explanation FIG. 8 also shows the fish FS (broken line) and the subject OB (dash-dot line), which have been erased as unnecessary objects. The unnecessary image erasing unit 19 erases stationary objects included in the outside scene SC, such as the fish FS, from the captured image, and also erases the cook who is the subject OB, who is judged to be a human body, regardless of whether the subject is stationary or moving. In the present embodiment, the fish FS has been described as a stationary object, but whether an object is stationary or moving may be judged from the amount of movement. For example, the subject OB may lift the tail of the fish FS; in such a case, even though the fish FS is moving, it may still be judged to be a stationary object depending on the amount or speed of the movement.

After the unnecessary objects are erased from the video (step S38 in FIG. 4), the AR image extraction unit 21 extracts the objects other than the unnecessary objects, that is, the objects selected by the unnecessary image erasing unit 19 as targets for AR image generation, and generates an AR image in which each extracted object (hereinafter also called the "extracted object") is associated with the object it is in contact with (hereinafter called the "contact object") (step S34). One example of this association is to relate the movement region and size of the extracted object to the size, orientation, and movement region of the contact object. Because the movement region, size, and so on of the object targeted for AR image generation are associated with those of an object within a predetermined distance of it, when the created AR scenario is executed and the contact object associated with the AR image is detected, the AR image generated from the extracted object is displayed in accordance with the position, shape, and size of the detected contact object. The AR image extraction unit 21 may also associate with the AR image, as additional information, the sound acquired by the microphone 33 while the moving object for which the AR image is generated was moving during imaging. One way to associate sound with an AR image is, for example, to display the sound associated with the moving object as a text image only while the AR image of that moving object is displayed. The target for which an AR image is generated corresponds to the image generation target in the claims.
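The association between an extracted object and its contact object can be illustrated with a minimal sketch. It assumes the association is stored as the extracted object's offset from the contact object, expressed in units of the contact object's size, so that the AR image can be placed and scaled relative to a contact object detected later. Orientation is ignored for brevity, and the function names, the single scalar size, and the sample coordinates are all hypothetical.

```python
def record_association(extracted_pos, contact_pos, contact_size):
    """Store the extracted object's offset from the contact object.

    The offset is normalized by the contact object's size (assumption),
    so it can be reapplied to a contact object of a different size.
    """
    offset = tuple((e - c) / contact_size
                   for e, c in zip(extracted_pos, contact_pos))
    return {"offset": offset, "contact_size": contact_size}

def place_ar_image(assoc, detected_pos, detected_size):
    """Return (position, scale) for the AR image given a detected contact object."""
    position = tuple(d + o * detected_size
                     for d, o in zip(detected_pos, assoc["offset"]))
    scale = detected_size / assoc["contact_size"]
    return position, scale
```

For a moving extracted object, the same calculation would be repeated for every sample of its recorded trajectory, so that the whole motion follows the detected contact object.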

FIG. 9 is an explanatory diagram showing a generated AR image and the object associated with it. FIG. 9 shows the image AR1 of the scale remover TL generated as an AR image (solid line) and the fish FS associated with the image AR1 (broken line). When an AR scenario containing the image AR1 is being executed and the device executing it detects the fish FS, the image AR1 of the scale remover TL is displayed in correspondence with the position of the detected fish FS. Details of how the fish FS is detected and how the image AR1 is displayed while a device executes the AR scenario are described in “A-4. Execution of composite scenario”.

When the AR image has been generated (step S34 in FIG. 4), the AR scenario control unit 11 creates a moving image as an AR scenario based on the AR image and the additional information and saves the created AR scenario data in the data storage unit 50 (step S36), and the image processing apparatus 100 ends the AR scenario creation process.

If, in step S28, an operation to select the unnecessary images manually rather than automatically is accepted (step S28: NO), the AR scenario control unit 11 causes the display unit 35 to display a selection image for selecting each of the moving objects and stationary objects included in the imaging range (step S30). Based on the operation accepted by the operation unit 34, the unnecessary image erasing unit 19 erases, from the images captured by the RGB camera 31 and the distance sensor 32, the moving objects and stationary objects selected as unnecessary objects to be erased. When the objects to be erased are selected manually, the user can freely choose the target for which an AR image is generated, unlike the case where stationary objects and human body parts are erased from the captured image automatically. For example, in addition to the AR image of the scale remover TL, an AR image of the right hand of the subject OB holding the scale remover TL or an AR image of the fish FS may be generated. In other embodiments, the user may select the target for which an AR image is generated rather than the objects to be erased. When the unnecessary objects have been erased by the user's operation (step S32), the AR image extraction unit 21 and the AR scenario control unit 11 execute the processing from step S34 onward.

A-3. Composite scenario creation process:
FIG. 10 is a flowchart of the composite scenario creation process. In this process, the CPU 10 creates a composite scenario that combines a plurality of AR scenarios. For example, the AR scenario operation setting unit 17 of the CPU 10 creates a composite scenario that branches to another AR scenario when a trigger is detected during a given AR scenario. Examples of triggers include detection of a specific object within the imaging range and an operation accepted by the microphone 33 or the operation unit 34.

In the composite scenario creation process, the microphone 33 or the operation unit 34 first waits to accept an operation that starts the creation of a composite scenario (step S42). If no such operation is accepted (step S42: NO), the AR scenario operation setting unit 17 ends the composite scenario creation process.

If, in step S42, an operation to start the composite scenario creation process is accepted (step S42: YES), the AR scenario operation setting unit 17 selects a screen that lets the user choose the one AR scenario on which the composite scenario is based (hereinafter also called the “basic scenario”) (step S44). To have the user choose the basic scenario, the AR scenario operation setting unit 17 displays the AR scenarios saved in the data storage unit 50 on the display unit 35 and has the user operate the operation unit 34 to choose one basic scenario from among them. The method of selecting the basic scenario is not limited to this and can be modified in various ways.

Next, the AR scenario operation setting unit 17 sets a trigger for branching to another scenario combined with the basic scenario (hereinafter also called a “branch scenario”) (step S46). The AR scenario operation setting unit 17 sets the trigger for the basic scenario based on operations accepted by the microphone 33 and the operation unit 34. While the trigger is being set, the AR image extraction unit 21 causes the display unit 35 to display a preset image so that the user can see that the basic scenario for which the trigger is being set is under editing. In other words, when a composite scenario is edited, the AR image extraction unit 21 inserts a preset image into the AR scenario. In this embodiment, a branch scenario also includes a new AR scenario that is executed after the basic scenario has finished completely.

FIG. 11 is an explanatory diagram showing the edited image KC displayed while a trigger is being set. FIG. 11 shows the edited image KC displayed during editing of a branch to a branch scenario, for the case where the basic scenario is an AR scenario that prompts the user to scrape the scales off the fish FS. While a trigger is being set for the basic scenario, the image AR1, which is the AR image of the scale remover TL, is displayed in correspondence with the position of the detected real fish FS, just as when the basic scenario is executed, together with the edited image KC. The edited image KC is an image indicating that the basic scenario is the “scale removal” AR scenario, that “8 minutes 37 seconds” have elapsed since the basic scenario started, and that the current state is “branch editing”. The edited image KC can be moved or erased by an operation accepted by the operation unit 34. The time of editing corresponds to the specific time point in the claims, and the edited image KC corresponds to the specific image in the claims.

After setting the trigger (step S46 in FIG. 10), the AR scenario operation setting unit 17 sets the branch scenario to branch to when the set trigger is detected while the basic scenario is being executed (step S48). To set the branch scenario, the AR scenario operation setting unit 17 displays the AR scenarios saved in the data storage unit 50 on the display unit 35 and has the user operate the operation unit 34 to set one branch scenario from among them. The method of selecting the branch scenario is not limited to this and can be modified in various ways.

FIG. 12 is an explanatory diagram showing the images displayed on branching to a branch scenario when a trigger is detected. FIG. 12 shows the text image TX2 and the image AR2, which is an AR image, displayed when the knife KN set as the trigger is detected in the captured image and execution branches to the branch scenario. The text image TX2 is a character image that prompts the user to scrape the scales off the fish FS with the scale remover TL, and it is additional information added for display after the branch. The text image TX2 is set to be displayed in correspondence with the display range in which the device executing the AR scenario can display images. The image AR2 is an AR image of an “×” indicating that the knife KN detected as the trigger has nothing to do with scraping the scales off the fish FS. The image AR2 is additional information set to be displayed in the branch scenario when the knife KN, the trigger, is detected. The image AR2 is set to be displayed in correspondence with the position of the detected knife KN.

When the branch scenario has been set (step S48 in FIG. 10), the microphone 33 or the operation unit 34 accepts an operation indicating whether to add a further branch scenario to the selected basic scenario (step S50). If an operation to add another branch scenario is accepted (step S50: YES), the AR scenario operation setting unit 17 executes the processing from step S46 onward.

If, in step S50, no operation to add another branch scenario is accepted (step S50: NO), the AR scenario operation setting unit 17 combines the selected basic scenario with the branch scenarios set for it into a composite scenario, saves the created composite scenario in the data storage unit 50, and ends the composite scenario creation process.
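The result of steps S44 to S50 can be thought of as one basic scenario plus a table that maps each trigger to a branch scenario. The sketch below illustrates that reading only; the class names `ARScenario` and `CompositeScenario`, their fields, and the use of string labels for triggers are hypothetical and are not the data format used by the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ARScenario:
    """A single AR scenario, identified here only by a name (hypothetical type)."""
    name: str


@dataclass
class CompositeScenario:
    """One basic scenario plus trigger-to-branch-scenario mappings (hypothetical type)."""
    base: ARScenario
    branches: Dict[str, ARScenario] = field(default_factory=dict)

    def add_branch(self, trigger: str, scenario: ARScenario) -> None:
        # Corresponds loosely to steps S46 (set trigger) and S48 (set branch scenario).
        self.branches[trigger] = scenario

    def branch_for(self, detected: str) -> Optional[ARScenario]:
        # Returns the branch scenario for a detected trigger, or None if there is none.
        return self.branches.get(detected)


composite = CompositeScenario(base=ARScenario("scale removal"))
composite.add_branch("knife", ARScenario("wrong tool warning"))
```

Repeating `add_branch` mirrors the loop back to step S46 when the user chooses to add another branch scenario in step S50.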

A-4. Execution of composite scenario:
The composite scenario execution process, in which a composite scenario created by the image processing apparatus 100 is executed, is described next. In this process, a device that can detect objects in the outside scene and display AR images on an image display unit executes a specific composite scenario based on a specific object detected in the outside scene. One example of a device that can execute a composite scenario is an HMD equipped with a camera for detecting objects in the outside scene.

FIG. 13 is an explanatory diagram showing the external configuration of the head-mounted display device 200 (HMD 200). The HMD 200 is an optically transmissive head-mounted display device that lets the user view a virtual image and, at the same time, view the outside scene directly. The HMD 200 includes an image display unit 80, which lets the user view a virtual image while worn on the user's head, and a control unit 70 (controller 70), which controls the image display unit 80.

The image display unit 80 is a wearable body worn on the user's head and has the shape of eyeglasses. The image display unit 80 includes a right display drive unit 82, a left display drive unit 84, a right optical image display unit 86, a left optical image display unit 88, a camera 89, a depth sensor 91, and a 9-axis sensor 87. The right optical image display unit 86 and the left optical image display unit 88 are arranged so as to be positioned in front of the user's right and left eyes, respectively, when the user wears the image display unit 80. The right display drive unit 82 and the left display drive unit 84 are arranged on the side that faces the user's head when the user wears the image display unit 80.

The display drive units 82 and 84 are formed of liquid crystal displays. The optical image display units 86 and 88, which serve as optical members, each include a light guide plate and a light control plate. The light guide plate is made of a light-transmissive resin material or the like and guides the image light output from the display drive units 82 and 84 to the user's eyes. The light control plate is a thin plate-shaped optical element arranged so as to cover the front side of the image display unit 80, that is, the side opposite the user's eyes.

The camera 89 is arranged at a position corresponding to the point between the user's eyebrows when the user wears the image display unit 80. With the image display unit 80 worn on the user's head, the camera 89 therefore images the outside scene, that is, the external view in the direction of the user's line of sight, and acquires the captured image. The depth sensor 91 is a distance sensor that measures the distance to objects included in the imaging range.

The 9-axis sensor 87 is arranged at a position corresponding to the user's right temple. The 9-axis sensor 87 is a motion sensor that detects acceleration (3 axes), angular velocity (3 axes), and geomagnetism (3 axes). Because the 9-axis sensor 87 is provided in the image display unit 80, it functions as a motion detection unit that detects the movement of the head of the user of the HMD 200 while the image display unit 80 is worn on the user's head. Here, the movement of the head includes the velocity, acceleration, angular velocity, orientation, and changes in orientation of the head.

The image display unit 80 further includes a connection unit 85 for connecting the image display unit 80 to the control unit 70. Part of the connection unit 85 extends to a right earphone 81 and a left earphone 83. A metal cable or an optical fiber, for example, can be used as the cord that forms the connection unit 85. The image display unit 80 and the control unit 70 exchange various signals through the connection unit 85.

The control unit 70 is a device for controlling the HMD 200. The control unit 70 serves as an operation unit made up of a plurality of keys, a track pad, and the like. The keys of the control unit 70 detect a press and transmit a control signal corresponding to the pressed key to the image display unit 80. The track pad of the control unit 70 detects the movement of the user's finger on its operation surface and outputs a signal corresponding to what it detects.

The control unit 70 has a CPU 75 (not shown) that controls the image display unit 80. The CPU 75 executes a composite scenario that was saved in the data storage unit 50 and received via wireless communication or the like. When the control unit 70 accepts a predetermined key operation, the CPU 75 detects, in the image captured by the camera 89, the object associated with an AR image included in the basic scenario of the composite scenario (hereinafter also called the “corresponding object”). The CPU 75 displays the AR image included in the basic scenario on the optical image display units 86 and 88 of the image display unit 80 in correspondence with the position of the detected corresponding object. The CPU 75 also detects, in the image captured by the camera 89, the image of the target that serves as the trigger for branching from the basic scenario to a branch scenario (hereinafter also called the “trigger target”). When the CPU 75 detects the image of the trigger target in the image captured by the camera 89, it branches from the basic scenario to the branch scenario and displays the AR image based on the branch scenario on the optical image display units 86 and 88.

FIG. 14 is a flowchart of the composite scenario execution process. In this process, it is first determined whether the control unit 70 of the HMD 200 worn on the user's head has accepted an operation to execute a composite scenario (step S61). If the control unit 70 has not accepted such an operation (step S61: NO), the HMD 200 ends the composite scenario execution process.

If, in step S61, the control unit 70 has accepted an operation to execute a composite scenario (step S61: YES), it displays on the optical image display units 86 and 88 of the image display unit 80 an image that lets the user select the composite scenario to execute (step S63). The user can select one composite scenario to execute by viewing the image displayed on the optical image display units 86 and 88 and operating the keys of the control unit 70. When a composite scenario has been selected, the CPU 75 of the control unit 70 detects, in the image captured by the camera 89, the corresponding object associated with an AR image included in the basic scenario of the selected composite scenario (step S65). The CPU 75 detects the corresponding object of the AR image in the captured image by pattern matching or a statistical identification method. The CPU 75 also obtains the distance to the corresponding object as measured by the depth sensor 91. In this embodiment one composite scenario is selected, but in other embodiments a plurality of composite scenarios may be selected and the composite scenario to execute may be determined by the detected corresponding object. The number of composite scenarios executed and the method of selecting them can be modified in various ways.
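Pattern matching of the kind mentioned for step S65 can be illustrated with a brute-force template search. This is a minimal sketch, assuming small grayscale images held as nested lists and a sum-of-absolute-differences cost; the function `find_template` is hypothetical, and the embodiment does not specify which matching method the CPU 75 uses.

```python
def find_template(image, template):
    """Return ((row, col), cost) of the best match by sum of absolute differences.

    `image` and `template` are lists of rows of grayscale values.
    A lower cost means a closer match; 0 means an exact match.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_cost = None, float("inf")
    # Slide the template over every position where it fits entirely inside the image.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            cost = sum(
                abs(image[r + dr][c + dc] - template[dr][dc])
                for dr in range(th)
                for dc in range(tw)
            )
            if cost < best_cost:
                best_pos, best_cost = (r, c), cost
    return best_pos, best_cost


# Toy captured image with a 2x2 "object" embedded at row 1, column 1.
captured = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 9, 0],
    [0, 0, 0, 0],
]
object_template = [
    [9, 8],
    [7, 9],
]
position, cost = find_template(captured, object_template)
```

A practical detector would also need to handle changes in scale, rotation, and lighting, which this sketch does not attempt.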

When the CPU 75 detects the corresponding object in the captured image, it identifies the position of the corresponding object in the captured image and displays the AR image and additional information included in the composite scenario in correspondence with that position (step S67). In the HMD 200, the imaging range of the camera 89 and the positions of the pixels displayed on the optical image display units 86 and 88 viewed by the user are set in advance so that they match. Therefore, when the CPU 75 displays the AR image on the optical image display units 86 and 88 in correspondence with the corresponding object, the user sees the AR image aligned with the position of the real corresponding object. The CPU 75 can also display the AR image stereoscopically according to the distance to the corresponding object measured by the depth sensor 91 (stereoscopic display here means displaying the two AR images for the left and right eyes with parallax between them). If the CPU 75 does not detect the corresponding object, it does not display the AR image included in the composite scenario. The additional information includes not only images displayed on the optical image display units 86 and 88, such as text images, but also sound and the like output through the earphones 81 and 83.
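The parallax used for stereoscopic display can be related to the measured distance with a simple geometric model. The sketch below assumes a pinhole model with parallel left and right viewing axes, in which the disparity in pixels is the focal length times the eye separation divided by the distance. The function `stereo_offsets` and the default values for focal length and eye separation are illustrative assumptions, not parameters of the HMD 200.

```python
def stereo_offsets(center_x_px, distance_m, focal_px=1000.0, eye_separation_m=0.063):
    """Return (left_x, right_x) horizontal positions for the two per-eye AR images.

    Assumed model: disparity = focal_px * eye_separation_m / distance_m.
    The left-eye image is shifted right and the right-eye image left by half of it,
    so nearer objects get larger parallax.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    disparity = focal_px * eye_separation_m / distance_m
    return center_x_px + disparity / 2, center_x_px - disparity / 2


# Corresponding object measured at 0.63 m and at 2.0 m by the depth sensor.
near_left, near_right = stereo_offsets(640, 0.63)
far_left, far_right = stereo_offsets(640, 2.0)
```

With these assumed values, the nearer object yields the larger separation between the two images, so the AR image appears closer to the user.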

FIG. 15 is an explanatory diagram showing the visual field VR seen by the user when the corresponding object set in the executed composite scenario is detected. As shown in FIG. 15, the visual field VR includes the outside scene SC seen through the optical image display units 86 and 88 of the image display unit 80 worn on the head, together with the image AR1 of the scale remover TL and the text image TX1 displayed on the optical image display units 86 and 88. The outside scene SC includes the fish FS placed on a cutting board. The image AR1 of the scale remover TL and the text image TX1 are displayed on the optical image display units 86 and 88 in correspondence with the position of the fish FS that the CPU 75 detected in the captured image. The image AR1 is not a still image but a moving image that travels back and forth between the head and the tail of the fish FS.

After displaying the AR image included in the composite scenario on the optical image display units 86 and 88 (step S67 in FIG. 14), the CPU 75 monitors the image captured by the camera 89 for the image of a trigger target for branching to a branch scenario (step S69). If the CPU 75 detects the image of a trigger target in the captured image (step S69: YES), it branches to and executes the branch scenario associated with the detected trigger target (step S73). Having branched to the branch scenario, the CPU 75 detects the corresponding object associated with the AR image included in the branch scenario (step S65). The CPU 75 identifies the position of the detected corresponding object and displays the AR image included in the branch scenario in correspondence with that position (step S67). The corresponding object associated with the AR image in the branch scenario and the trigger target may be the same object or different objects.

FIG. 16 is an explanatory diagram showing the visual field VR seen by the user when the trigger target set in the executed branch scenario is detected. As shown in FIG. 16, the visual field VR includes the outside scene SC seen through the optical image display units 86 and 88 of the image display unit 80 worn on the head, together with the image AR2 of the “×” and the text image TX2 displayed on the optical image display units 86 and 88. The outside scene SC includes the fish FS placed on the cutting board and the knife KN held in the user's right hand. The text image TX2 is displayed on the optical image display units 86 and 88 in correspondence with the position of the fish FS that the CPU 75 detected in the captured image. The image AR2 is displayed on the optical image display units 86 and 88 in correspondence with the position of the knife KN detected in the captured image.

If, in step S69 of FIG. 14, the CPU 75 does not detect the image of a trigger target in the captured image (step S69: NO), it determines whether to end the AR scenario currently being executed within the composite scenario (hereinafter also called the “execution scenario”) (step S71). The CPU 75 ends the execution scenario when the control unit 70 accepts an operation to end it or when the CPU 75 determines that the execution scenario no longer needs to be displayed. One example of the latter is the case where, while the AR scenario prompting the user to scale the fish FS is being executed, it is determined that no scales remain on the surface of the fish FS within the imaging range of the camera 89.

If, in step S71, the CPU 75 determines not to end the execution scenario (step S71: NO), it continues to display the AR image and additional information included in the execution scenario on the optical image display units 86 and 88. If, in step S71, it is determined that the execution scenario is to be ended (step S71: YES), the HMD 200 ends the composite scenario execution process.
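The control flow of steps S65 to S73 can be summarised as a loop over captured frames. The following is a simplified sketch, assuming each frame has already been reduced to a set of detected object labels; the function `run_composite`, the dictionary layout of a scenario, and the `finished` callback standing in for the end determination of step S71 are all hypothetical.

```python
def run_composite(frames, base, branches, finished):
    """Simulate the execution loop and return what is displayed for each frame.

    frames:   iterable of sets of labels detected in each captured image
    base:     scenario dict with keys "target" (corresponding object) and "ar_image"
    branches: dict mapping a trigger label to a branch scenario dict
    finished: callable(scenario, detected) -> bool, a stand-in for step S71
    """
    current = base
    shown = []
    for detected in frames:
        # S65/S67: display the AR image only if the corresponding object is detected.
        shown.append(current["ar_image"] if current["target"] in detected else None)
        # S69: look for a trigger target in the captured image.
        trigger = next((t for t in branches if t in detected), None)
        if trigger is not None:
            current = branches[trigger]  # S73: branch, then return to S65.
            continue
        # S71: decide whether to end the execution scenario.
        if finished(current, detected):
            break
    return shown


base_scenario = {"target": "fish", "ar_image": "AR1"}
branch_table = {"knife": {"target": "knife", "ar_image": "AR2"}}
captured_frames = [{"fish"}, {"fish", "knife"}, {"fish", "knife"}, set()]
displayed = run_composite(
    captured_frames, base_scenario, branch_table,
    finished=lambda scenario, detected: not detected,
)
```

In this run the image AR1 is shown while only the fish is visible, the image AR2 is shown once the knife has triggered the branch, and the loop ends when nothing is detected.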

As described above, in the image processing apparatus 100 of this embodiment, the AR image extraction unit 21 generates an AR image of a moving object from among the individual objects recognized by the object recognition unit 13 and the 3D model generation unit 14. Because an AR image corresponding to an object is generated simply by identifying the object as moving, the user can easily create an AR scenario for work support or the like, which improves usability.

In the image processing apparatus 100 of this embodiment, the AR image extraction unit 21 also generates the AR image by associating the movement region of the target for which the AR image is generated with an object erased by the unnecessary image erasing unit 19. Therefore, when an AR scenario containing the generated AR image is executed, the AR image is displayed in correspondence with the position, size, and so on of the real object associated with it. For example, when the AR scenario is a support video for a task, superimposing the AR image on the object being worked on further improves the user's work efficiency and usability.

In the image processing apparatus 100 of this embodiment, the AR image extraction unit 21 also generates the AR image by associating the size and movement region of the target for which the AR image is generated with the size of the object to be associated with the AR image. Therefore, when an AR scenario containing the generated AR image is executed, the AR image is displayed in closer correspondence with the position, size, and so on of the real object associated with it, which further improves usability.

As described above, in the image processing apparatus 100 of the present embodiment, the distance sensor 32 measures the distance to the surface of the imaged target, and the object recognition unit 13 recognizes each individual object in the captured image using the three-dimensional model generated by the 3D model generation unit 14. The AR image extraction unit 21 generates AR images of the moving objects, that is, the objects other than the stationary objects erased by the unnecessary image erasing unit 19. Therefore, an AR scenario including an AR image of a moving object whose distance has been measured can be created merely by imaging a predetermined range, so the user can create an AR scenario easily, and convenience improves.

Further, in the image processing apparatus 100 of the present embodiment, the unnecessary image erasing unit 19 erases, based on an operation received by the operation unit 34, a moving object or stationary object selected from the captured image as a target for which no AR image is generated. In other words, the unnecessary image erasing unit 19 selects the targets of the AR images to be generated. Therefore, moving objects that need not be generated as AR images and stationary objects that do need to be generated can be selected, so the user can create AR scenarios and composite scenarios that are easier to use, and usability improves.

Further, in the image processing apparatus 100 of the present embodiment, the unnecessary image erasing unit 19 erases stationary objects, that is, objects in the captured image other than moving objects, as targets for which no AR image is generated, and the AR image extraction unit 21 generates AR images of the moving objects in the captured image that were not erased by the unnecessary image erasing unit 19. Therefore, the image processing apparatus 100 generates AR images of moving objects automatically, even when no operation for selecting the targets is performed. For example, when the AR scenario is a support video for a task in which some target must be moved, an AR image of the moving object, which is the object to be moved, is generated automatically, and usability improves.

Further, in the image processing apparatus 100 of the present embodiment, the unnecessary image erasing unit 19 erases from the captured image the human body parts extracted by the object recognition unit 13, and the AR image extraction unit 21 does not generate AR images of the erased body parts. Therefore, when the AR scenario is executed, a person's hand or the like, which is merely the means of moving the object to be moved, is not displayed. The user does not have to view unnecessary AR images such as hands, and convenience improves.

Further, in the image processing apparatus 100 of the present embodiment, the AR image extraction unit 21 generates the AR image of a moving object in association with the sound acquired by the microphone 33 while the imaged moving object is moving. Therefore, the created AR scenario includes AR images associated with auditory information such as sound, in addition to visual information such as the AR image generated from the captured image, and convenience improves.

Further, in the image processing apparatus 100 of the present embodiment, the sound acquired by the microphone 33 is converted into a text image and associated with the moving object when the AR image is generated. The sound can thus be presented as visual information alongside the AR image, which makes the information easier for the user to recognize and further improves convenience.

Further, in the image processing apparatus 100 of the present embodiment, the AR image extraction unit 21 colors the generated AR image using the RGB data of the imaging range acquired by the RGB camera 31. The generated AR image therefore resembles the target in the captured image more closely than an uncolored AR image would, so the user can recognize the AR image more easily, and convenience improves further.

Further, in the image processing apparatus 100 of the present embodiment, when a composite scenario is edited, for example when a trigger for branching to a branch scenario is set, the AR image extraction unit 21 inserts an edit image KC indicating the editing state into the AR scenario. Therefore, while an AR scenario is being edited, an image that lets the user visually recognize that editing is in progress is inserted into that AR scenario, and usability improves.

Further, in the image processing apparatus 100 of the present embodiment, the three-dimensional model serving as the AR image is generated so that it can be viewed from any direction over 360 degrees. Therefore, when the AR scenario is executed, the user can check the three-dimensional model from any direction, and convenience improves.

B. Second embodiment:
FIG. 17 is a block diagram functionally showing the configuration of the image processing apparatus 100a in the second embodiment. The second embodiment differs from the first embodiment in that the RGBD data transmitted from the sensor control unit 15a is output as streaming data, so that the 3D model generation unit 14a, the object recognition unit 13a, and the object tracking unit 12a generate three-dimensional models of all targets included in the imaging range and transmit the generated three-dimensional models to the AR scenario control unit 11a as streaming data.

FIG. 18 is a flowchart of part of the AR scenario creation process in the second embodiment. In this process, while the sensor control unit 15a images the outside scene and outputs per-pixel RGBD data as streaming data, the desired three-dimensional model is also output as streaming data. The process of step S24a in FIG. 18 therefore differs from step S24 of the AR scenario creation process of the first embodiment (FIG. 3). Accordingly, only step S24a of FIG. 18 is described for the second embodiment, and description of the other processes is omitted. In step S24a, based on the streaming per-pixel RGBD data from the sensor control unit 15a, the 3D model generation unit 14a generates a three-dimensional model that includes all real objects and real-environment targets present in the imaging range (hereinafter simply called the "overall three-dimensional model"). Specifically, in the present embodiment, the 3D model generation unit 14a generates a three-dimensional model from the viewpoint of each of the cameras 331, 332, and 333, and fuses these three-dimensional models into one to obtain an overall three-dimensional model that does not depend on the viewpoints of the cameras 331, 332, and 333. In the present embodiment, the overall three-dimensional model is represented by polygon mesh data (for example, a rendered textured triangle mesh). The 3D model generation unit 14a then outputs the data of the overall three-dimensional model as streaming data. Hereinafter, outputting as streaming data is also simply called streaming output.
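The fusion of the per-camera models into a viewpoint-independent model can be illustrated with a minimal sketch. The specification does not state how the fusion is performed; the sketch assumes that each camera's model is a list of points in that camera's coordinates and that a rigid camera-to-world transform (a rotation matrix and a translation) is known for each camera. The names to_world, fuse_models, and IDENTITY are hypothetical.

```python
# Sketch only: assumes known camera-to-world rigid transforms, which the
# specification does not describe. All names here are hypothetical.

IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def to_world(point, rotation, translation):
    """Apply a rigid transform: world = rotation * point + translation."""
    return tuple(
        sum(rotation[row][col] * point[col] for col in range(3)) + translation[row]
        for row in range(3)
    )


def fuse_models(per_camera, decimals=3):
    """Merge per-camera point lists into one world-frame point list.

    per_camera is a list of (points, rotation, translation) tuples.
    Points that coincide after rounding are kept only once, so a surface
    seen by several cameras is not duplicated in the fused model.
    """
    seen = set()
    fused = []
    for points, rotation, translation in per_camera:
        for point in points:
            world = to_world(point, rotation, translation)
            key = tuple(round(c, decimals) for c in world)
            if key not in seen:
                seen.add(key)
                fused.append(key)
    return fused
```

A practical system would fuse meshes rather than raw points and would have to estimate the transforms by calibration; the sketch shows only why the fused result no longer depends on any single camera's viewpoint.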

Next, the object recognition unit 13a distinguishes and recognizes the three-dimensional models that are the individual elements of the overall three-dimensional model (hereinafter also called "element three-dimensional models"), based on the streaming RGB data within the per-pixel RGBD data transmitted from the sensor control unit 15a. In the second embodiment, as one way of distinguishing the models, the object recognition unit 13a distinguishes real objects in the RGB data by edge detection or the like, and maps the region that each distinguished real object occupies in the image space represented by the RGB data to a region in the space of the overall three-dimensional model. The part of the overall three-dimensional model contained in that region (the element three-dimensional model) is thereby distinguished from the other parts. The object recognition unit 13a corrects the element three-dimensional models included in the overall three-dimensional model according to the result of this distinction.
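The mapping from an image-space region to a region of the model can be sketched as follows. This is an illustration under assumptions, not the method of the specification: it assumes the model vertices are expressed in the camera's coordinates, that the camera follows a pinhole model with hypothetical intrinsics fx, fy, cx, cy, and that the region found by edge detection is given as a set of pixel coordinates.

```python
# Sketch only: pinhole projection with hypothetical intrinsics.

def project(vertex, fx, fy, cx, cy):
    """Project a camera-frame 3D vertex to integer pixel coordinates."""
    x, y, z = vertex
    return (int(round(fx * x / z + cx)), int(round(fy * y / z + cy)))


def select_element_vertices(vertices, region_pixels, fx, fy, cx, cy):
    """Return indices of vertices whose projection lies in the 2D region.

    region_pixels is a set of (u, v) pixels occupied by one real object in
    the RGB image. Vertices behind the camera (z <= 0) are skipped.
    """
    selected = []
    for index, vertex in enumerate(vertices):
        if vertex[2] <= 0:
            continue
        if project(vertex, fx, fy, cx, cy) in region_pixels:
            selected.append(index)
    return selected
```

The selected vertex indices play the role of the element three-dimensional model; occlusion handling is omitted for brevity.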

The object tracking unit 12a performs image processing on the streaming RGB data transmitted from the sensor control unit 15a to identify real objects that are moving (moving objects) and real objects that are stationary. The object tracking unit 12a tracks each identified moving object in the image space represented by the RGB data.
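One simple way to separate moving objects from stationary ones is to compare each object's image-space position across frames. The sketch below assumes that per-object centroid tracks in pixels are already available and uses a hypothetical displacement threshold; the specification does not say which image processing the object tracking unit 12a actually uses.

```python
import math

# Sketch only: the tracks and the threshold are assumptions.

def classify_motion(tracks, threshold=2.0):
    """Split objects into moving and stationary by centroid displacement.

    tracks maps an object name to its list of (u, v) centroids, one per
    frame. An object is moving if any centroid lies farther than threshold
    pixels from its first observed position.
    """
    moving, stationary = [], []
    for name, positions in tracks.items():
        start = positions[0]
        max_shift = max(math.dist(start, p) for p in positions)
        if max_shift > threshold:
            moving.append(name)
        else:
            stationary.append(name)
    return moving, stationary
```

The threshold absorbs small jitter in the detected centroids so that a stationary object is not misclassified as moving.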

Among the element three-dimensional models within the overall three-dimensional model identified by the object recognition unit 13a, the AR scenario control unit 11a streams out the element three-dimensional model corresponding to the moving object tracked by the object tracking unit 12a. The element three-dimensional model that is streamed out reflects not only the movement of the three-dimensional model but also changes in its pose, including its orientation (for example, rotation). In the present embodiment, the streamed element three-dimensional model is represented by polygon mesh data.

When an element three-dimensional model streamed out by the AR scenario control unit 11a contains an unnecessary part, the AR image extraction unit 21 erases that part from the element three-dimensional model. An unnecessary part identified by the AR image extraction unit 21 may be one whole element three-dimensional model among several, or a part of an element three-dimensional model, for example the portion representing a part of a human body (such as a hand) that covers the target real object. The AR image extraction unit 21 streams out the element three-dimensional model that remains after the unnecessary parts are deleted.
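Deleting an unnecessary part from a polygon mesh can be sketched as dropping the labeled faces and re-indexing the remaining vertices. The per-face labels (for example "hand") are an assumption introduced for illustration; the specification does not describe how the unnecessary parts are marked.

```python
# Sketch only: per-face labels are a hypothetical input.

def remove_labeled_faces(vertices, faces, face_labels, unwanted):
    """Drop faces whose label is in unwanted and compact the vertex list.

    vertices: list of (x, y, z); faces: list of vertex-index triples;
    face_labels: one label per face. Returns (new_vertices, new_faces).
    """
    kept_faces = [
        face for face, label in zip(faces, face_labels) if label not in unwanted
    ]
    # Keep only vertices still referenced by a surviving face.
    used = sorted({index for face in kept_faces for index in face})
    remap = {old: new for new, old in enumerate(used)}
    new_vertices = [vertices[index] for index in used]
    new_faces = [tuple(remap[index] for index in face) for face in kept_faces]
    return new_vertices, new_faces
```

Re-indexing keeps the remaining mesh self-consistent, so it can be streamed out without dangling vertex references.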

The data storage unit 50 records the element three-dimensional model streamed out by the AR image extraction unit 21 as an extracted three-dimensional model. The AR scenario control unit 11a then creates an AR scenario using the recorded extracted three-dimensional model. The AR image included in the AR scenario may be an image representing the extracted three-dimensional model, or an image in which the appearance of the extracted three-dimensional model has been modified. The extracted three-dimensional model may also be a three-dimensional model captured at the start of imaging in the AR scenario creation process (for example, the three-dimensional model of the fish scaler TL shown in FIG. 5), or it may be replaced with another three-dimensional model, such as one based on CAD data. In that case, even after unnecessary parts are deleted from the element three-dimensional model, an AR image is obtained that does not lack the portions that were hidden by, for example, a hand. The AR image of the present embodiment is represented by stream data of a three-dimensional model (for example, polygon mesh stream data). Consequently, at any point within the time period between the start and the end of the stream data, the viewpoint on the AR image can be changed freely and the orientation of the displayed AR image can be changed. The viewpoint of any one of the cameras 331 to 333 may be included in the AR scenario by the AR scenario control unit 11a as default viewpoint information.

The created AR scenario may include an AR image with continuous motion, captured while the streamed moving object was actually moving. The length of time over which the AR image moves continuously may be the same as, or different from, the time over which the captured moving object actually moved. The AR scenario may also be composed of AR images with discrete motion rather than continuous motion. As AR images with discrete motion, for example, the state at at least one point in time between the moment the captured moving object started moving and the moment it finished moving may be generated as an AR image, or AR images may be generated for the start of the movement, the end of the movement, and one point in time between them.
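Choosing discrete states from a continuous recording can be sketched as picking evenly spaced frames that always include the first and last frames. The function name and the even spacing are assumptions; the specification requires only that at least one point between the start and the end may be used.

```python
# Sketch only: even spacing is an assumption made for illustration.

def discrete_keyframes(frames, count=3):
    """Pick count frames, always including the first and the last.

    With count=3 this yields the start, one intermediate state, and the end
    of the movement. If fewer frames exist than requested, all are returned.
    """
    if count < 2:
        raise ValueError("count must be at least 2")
    if len(frames) <= count:
        return list(frames)
    last = len(frames) - 1
    indices = sorted({round(i * last / (count - 1)) for i in range(count)})
    return [frames[i] for i in indices]
```

For a recording of 11 frames and count=3, the selected frames are the first, the sixth, and the last.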

C. Third embodiment:
FIG. 19 is a block diagram functionally showing the configuration of the image processing apparatus 100b in the third embodiment. FIG. 20 is a flowchart of part of the AR scenario creation process in the third embodiment. The third embodiment differs from the second embodiment in that the object recognition unit 13a, which the CPU 10a of the image processing apparatus 100a has in the second embodiment, is not provided; the rest of the configuration is the same. As shown in FIG. 19, because the third embodiment has no object recognition unit 13a, the process of step S24b of the AR scenario creation process differs from the process of step S24a of the AR scenario creation process of the second embodiment (FIG. 18). For the third embodiment, only the differences from the second embodiment are described, and description of the common points is omitted.

In the process of step S24b in FIG. 20, the object tracking unit 12b receives the overall three-dimensional model streamed out by the 3D model generation unit 14b. The object tracking unit 12b then identifies (distinguishes), within the overall three-dimensional model, the three-dimensional models that are moving (element three-dimensional models) and the three-dimensional models that are not moving. The object tracking unit 12b then streams out the identified element three-dimensional models.

D. Fourth embodiment:
The fourth embodiment differs from the first and second embodiments mainly in that multiple AR scenarios, each combining an AR image of a moving object with an AR image of a related stationary object (a stationary object determined to be related to the moving object), are created automatically from captured imaging data. In the fourth embodiment, the AR scenario control unit 11a sets the moving object and the related stationary object each as a trigger target for executing an AR scenario. The AR scenario control unit 11a creates an AR scenario for each of the following cases (1) to (4), which are the predetermined combinations of the moving object and the related stationary object, as trigger targets, that may be detected when the AR scenario is executed.
(1) When only the moving object as a trigger target is detected
(2) When only the related stationary object as a trigger target is detected
(3) When both the moving object and the related stationary object as trigger targets are detected
(4) When neither the moving object nor the related stationary object as trigger targets is detected
In the fourth embodiment, an AR scenario is created for each of the conditions (1) to (4); in other embodiments, AR scenarios may be created for three or fewer of the four cases (1) to (4).
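The four detection cases can be expressed as a small lookup from the two detection results to the case number. This is only a sketch of the case logic; the function name and the boolean inputs are hypothetical, and what each scenario displays is not modeled here.

```python
# Sketch only: maps detection results to the case numbers (1) to (4).

def select_case(moving_detected: bool, related_detected: bool) -> int:
    """Return the case number for the detected trigger targets."""
    cases = {
        (True, False): 1,   # only the moving object is detected
        (False, True): 2,   # only the related stationary object is detected
        (True, True): 3,    # both are detected
        (False, False): 4,  # neither is detected
    }
    return cases[(moving_detected, related_detected)]
```

Because two boolean detection results have exactly four combinations, the table covers every situation that can arise when the AR scenario is executed.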

FIGS. 21 and 22 are flowcharts of the AR scenario creation process in the fourth embodiment. This process differs from the AR scenario creation process of the second embodiment in two respects: an AR scenario can be created from imaging data that has already been captured, and the multiple AR scenarios for cases (1) to (4) above are created automatically. The other processing in the fourth embodiment is the same as in the AR scenario creation process of the second embodiment. Description of steps S14 through S22, which are the same as in the second and third embodiments, is therefore omitted.

In the AR scenario creation process of the fourth embodiment, the microphone 33 or the operation unit 34 first receives a predetermined operation that starts creation of an AR scenario (step S81). Upon receiving that operation (step S81: YES), the microphone 33 or the operation unit 34 receives a predetermined operation specifying whether the AR scenario is to be created from imaging data (step S83). When an operation specifying that the AR scenario is not to be created from imaging data is received (step S83: NO), the AR scenario control unit 11a performs steps S14 through S22 as in the second embodiment. When an operation specifying that the AR scenario is to be created from imaging data is received in step S83 (step S83: YES), the 3D model generation unit 14a generates the overall three-dimensional model as in the second embodiment (step S24a in FIG. 22). Specifically, the 3D model generation unit 14a generates the overall three-dimensional model based on the streaming per-pixel RGBD data from the sensor control unit 15a.

FIGS. 23 to 27 are explanatory diagrams of imaging an outside scene SC containing multiple subjects with the RGB camera 31 and the distance sensor 32 in the fourth embodiment. FIG. 23 shows an outside scene SC including a housing BX, a cover CV, a driver DV (a screwdriver), and four bolts BT. FIGS. 23 to 26 show the changes through which the housing BX and the cover CV, initially separate parts, are attached to each other as a single part by means of the driver DV and the bolts BT. The housing BX has four female thread portions Bh into which the male thread portions of the bolts BT fit. To fix the cover CV to the housing BX, the cover CV has circular holes Ch at the positions that correspond to the female thread portions Bh of the housing BX when the housing BX and the cover CV are combined.

FIG. 24 shows an outside scene SC in which, compared with the outside scene SC of FIG. 23, the worker's left hand LH holds the cover CV at the position where it is to be fixed to the housing BX, which has not moved. In the state shown in FIG. 24, the housing BX and the cover CV are not fixed by the bolts BT; the position of the cover CV is held temporarily by the left hand LH. In the change from the outside scene SC of FIG. 23 to that of FIG. 24, the cover CV moves and is therefore a moving object. The housing BX is a stationary object in contact with the moving cover CV and is therefore a related stationary object.

FIG. 25 shows an outside scene SC in which, compared with the outside scene SC of FIG. 24, the worker's right hand RH grips the driver DV and one bolt BT is mounted on the tip of the driver DV. In FIG. 25, the positions of the housing BX, the cover CV, and the left hand LH have not changed. In the change from the outside scene SC of FIG. 24 to that of FIG. 25, the driver DV and the bolt BT mounted on its tip are moving objects. While the driver DV and the bolt BT move together, they can also be regarded as a single moving object. The three bolts BT not mounted on the driver DV are stationary objects unrelated to the moving object. Although not illustrated, before the driver DV and the bolt BT are joined there is a state in which one of them is the moving object and the other is a related stationary object.

FIG. 26 shows an outside scene SC in which, compared with the outside scene SC of FIG. 25, the bolt BT mounted on the tip of the driver DV gripped by the right hand RH has been inserted into one female thread portion Bh of the housing BX and is being rotated. In the change from the outside scene SC of FIG. 25 to that of FIG. 26, the housing BX and the cover CV do not move. The driver DV and the bolt BT are therefore moving objects, and the housing BX and the cover CV are related stationary objects.

FIG. 27 shows an outside scene SC in which, compared with the outside scene SC of FIG. 26, one bolt BT is engaged with one female thread portion Bh of the housing BX so that the housing BX and the cover CV are fixed together, and the tip of the driver DV has separated from the bolt BT. In the change from the outside scene SC of FIG. 26 to that of FIG. 27, the driver DV is a moving object, and the housing BX, the cover CV, and the bolt BT are related stationary objects. In other embodiments, the driver DV may be treated as the moving object while the housing BX, the cover CV, and the bolt BT are treated as unrelated to the driver DV and not as related stationary objects.

In step S24a of FIG. 22, three-dimensional models of all targets, including the moving objects and the related stationary objects, are created from the captured images of the changing outside scene SC shown in FIGS. 23 to 27. The AR scenario control unit 11a then selects, from the generated overall three-dimensional model, one moving object from among the one or more moving objects identified by the object tracking unit 12a (step S85). For example, among the changes in the outside scene SC from FIG. 23 to FIG. 27, the AR scenario control unit 11a selects the cover CV, which is the moving object in the change from FIG. 23 to FIG. 24.

The AR image extraction unit 21 generates a cover image IMC, described later, as the AR image of the cover CV selected as the moving object (step S87). The AR scenario control unit 11a then determines, using the object recognition unit 13a and based on the measured distances, whether there is a related stationary object identified as being in contact with the selected cover CV (step S89). The AR scenario control unit 11a identifies a stationary object located within a predetermined distance of the moving object as a related stationary object in contact with the moving object. In the change of the outside scene SC from FIG. 23 to FIG. 24, the AR scenario control unit 11a identifies the housing BX as the related stationary object of the cover CV. It is therefore determined that a related stationary object exists for the cover CV (step S89: YES), and the AR image extraction unit 21 generates AR images of all related stationary objects (step S91). As the AR image of the housing BX, which is the related stationary object, the AR image extraction unit 21 generates a housing image IMX, described later.
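The distance test of step S89 can be sketched as a nearest-point check between the moving object and each stationary object. The point-set representation, the function names, and the value of max_gap are assumptions; the specification states only that a stationary object within a predetermined distance is treated as being in contact.

```python
import math

# Sketch only: objects are hypothetical lists of 3D points.

def min_distance(points_a, points_b):
    """Smallest distance between any point of A and any point of B."""
    return min(math.dist(p, q) for p in points_a for q in points_b)


def find_related_stationary(moving_points, stationary_objects, max_gap):
    """Return names of stationary objects within max_gap of the moving object.

    stationary_objects maps a name to that object's list of 3D points.
    """
    return [
        name
        for name, points in stationary_objects.items()
        if min_distance(moving_points, points) <= max_gap
    ]
```

With the cover CV as the moving object, a housing whose nearest point lies within max_gap would be returned as related, while bolts lying farther away would not.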

The unnecessary image erasing unit 19 then erases, as unnecessary objects, the moving objects and stationary objects other than the selected moving object and its related stationary objects (step S93). In the change of the outside scene SC from FIG. 23 to FIG. 24, the unnecessary image erasing unit 19 erases the driver DV, the bolts BT, and the left hand LH as unnecessary objects. The AR scenario control unit 11a then sets the cover CV, the moving object for which an AR image was generated, and the housing BX, the related stationary object for which an AR image was generated, each as a trigger target for executing the AR scenario. The AR scenario control unit 11a creates AR scenarios corresponding to the combinations of the presence or absence of the moving object and the presence or absence of the related stationary object set as trigger targets (step S95). The AR scenarios created for the combinations of trigger targets are described in detail later.

After creating the AR scenarios corresponding to the predetermined combinations, the AR scenario control unit 11a determines whether every moving object included in the imaging data has been selected and has had AR scenarios created for it (step S97). At this point the AR scenario control unit 11a has selected only the cover CV and has not yet selected all the moving objects (step S97: NO), so it repeats the processing from step S85. When the AR scenario control unit 11a determines in step S97 that all the moving objects in the imaging data, including those other than the cover CV, have been selected (step S97: YES), it saves all the created AR scenarios in the data storage unit 50 and ends the AR scenario creation process.
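The loop over moving objects (steps S85 to S97) and the enumeration of trigger combinations (step S95) can be sketched as follows. This is only an illustration under assumptions: the function names and the dictionary layout are hypothetical, and the sketch covers only moving objects for which a related stationary object was found.

```python
from itertools import product


def trigger_combinations():
    """All presence/absence pairs for (moving object, related stationary object)."""
    return list(product([True, False], repeat=2))


def create_ar_scenarios(pairs):
    """Create one scenario entry per trigger combination for each selected moving object.

    pairs: iterable of (moving_name, related_name), assumed to come from steps S85 to S91.
    """
    scenarios = []
    for moving_name, related_name in pairs:  # repeated until step S97 answers YES
        for moving_seen, related_seen in trigger_combinations():  # step S95
            scenarios.append({
                "moving": moving_name,
                "related": related_name,
                "moving_detected": moving_seen,
                "related_detected": related_seen,
            })
    return scenarios


scenarios = create_ar_scenarios([("cover CV", "housing BX")])
```

For one moving object and one related stationary object this yields four entries, matching the four combinations listed in FIG. 28.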

FIG. 28 is a table showing an example of the combinations of trigger targets and the corresponding AR scenarios created in step S95 of the AR scenario creation process of the fourth embodiment. FIG. 28 shows the display images that are displayed, when a created AR scenario is executed, for each of the predetermined combinations (1) to (4) of detected trigger targets. The display images for combinations (1) to (4) are shown for the case where the moving object is the cover CV and the related stationary object is the housing BX. The following describes the display image that is displayed for each combination of detected trigger targets while the AR scenario is being executed.

FIG. 29 is a flowchart of the display image determination process performed while an AR scenario is being executed. In the display image determination process, the HMD 200, which is the device executing the AR scenario, determines the display image to be displayed on the optical image display units 86 and 88 according to the combination of trigger targets detected.

In the display image determination process, the camera 89 of the HMD 200 first captures an image of the outside scene (step S101). The CPU 75 of the HMD 200 determines whether the trigger target of combination (1) shown in FIG. 28 has been detected in the image captured by the camera 89 (step S103). When the CPU 75 determines that the trigger target of combination (1) has been detected (step S103: YES), it selects the display image associated with combination (1) as the image to be displayed on the optical image display units 86 and 88 (step S111). The CPU 75 then ends the display image determination process.

FIG. 30 is an explanatory diagram showing an example of the visual field VR visually recognized by the user when the display image associated with combination (1) is displayed on the optical image display units 86 and 88. FIG. 30 shows an example of the visual field VR seen by a user wearing the image display unit 80 of the HMD 200 on the head when the cover CV, the moving object serving as a trigger target, is detected. As shown in FIG. 30, in addition to the real cover CV, screwdriver DV, and bolts BT included in the outside scene SC, the user sees the housing image IMX displayed as an image. In other words, as in combination (1) of FIG. 28, the AR scenario control unit 11a creates an AR scenario in which only the housing image IMX of the related stationary object is set as the display image for the case where, during execution of the AR scenario, only the cover CV is detected as a trigger target. Although the real screwdriver DV and a plurality of real bolts BT are detected in FIG. 30, in the combinations shown in FIG. 28 the real screwdriver DV and bolts BT have no bearing on whether a display image is shown. The same applies to FIGS. 31 to 33 described later.

When the CPU 75 determines in step S103 of FIG. 29 that the trigger target of combination (1) has not been detected (step S103: NO), it determines whether the trigger target of combination (2) shown in FIG. 28 has been detected (step S105). When the CPU 75 determines that the trigger target of combination (2) has been detected (step S105: YES), it selects the display image associated with combination (2) as the image to be displayed on the optical image display units 86 and 88 (step S113). The CPU 75 then ends the display image determination process.

FIG. 31 is an explanatory diagram showing an example of the visual field VR visually recognized by the user when the display image associated with combination (2) is displayed on the optical image display units 86 and 88. FIG. 31 shows an example of the visual field VR seen by the user of the HMD 200 when the housing BX, the related stationary object serving as a trigger target, is detected. As shown in FIG. 31, when the CPU 75 executes the AR scenario and detects only the housing BX as a trigger target, it causes the optical image display units 86 and 88 to display the image shown for combination (2) in FIG. 28. As the display image, the CPU 75 displays the cover image IMC of the cover CV at the position where the cover CV finally came to rest in the imaging data used to create the AR scenario, that is, at the position where the cover CV is attached to the housing BX. In other words, the AR scenario control unit 11a creates an AR scenario that displays the cover image IMC, the AR image of the cover CV, at a position where it is integrated with the housing BX, the related stationary object.

When the CPU 75 determines in step S105 of FIG. 29 that the trigger target of combination (2) has not been detected (step S105: NO), it determines whether the trigger targets of combination (3) shown in FIG. 28 have been detected (step S107). When the CPU 75 determines that the trigger targets of combination (3) have been detected (step S107: YES), it selects the display image associated with combination (3) as the image to be displayed on the optical image display units 86 and 88 (step S115). The CPU 75 then ends the display image determination process.

FIG. 32 is an explanatory diagram showing an example of the visual field VR visually recognized by the user when the display image associated with combination (3) is displayed on the optical image display units 86 and 88. FIG. 32 shows an example of the visual field VR seen by the user of the HMD 200 when both the cover CV, the moving object, and the housing BX, the related stationary object, are detected as trigger targets. As shown in FIG. 32, when the CPU 75 executes the AR scenario and detects both the housing BX and the cover CV as trigger targets, it causes the optical image display units 86 and 88 to display the image shown for combination (3) in FIG. 28. As with the cover image IMC shown in FIG. 31, the CPU 75 displays the cover image IMC of the cover CV at a position where it is integrated with the housing BX. In other words, the AR scenario control unit 11a creates an AR scenario that displays the cover image IMC, the AR image of the cover CV, at a position where it is integrated with the housing BX, the related stationary object.

When the CPU 75 determines in step S107 of FIG. 29 that the trigger targets of combination (3) have not been detected (step S107: NO), it determines whether the condition of combination (4) shown in FIG. 28 is met (step S109). When the CPU 75 determines that the condition of combination (4) is met (step S109: YES), it selects the display image associated with combination (4) as the image to be displayed on the optical image display units 86 and 88 (step S117). The CPU 75 then ends the display image determination process.

FIG. 33 is an explanatory diagram showing an example of the visual field VR visually recognized by the user when the display image associated with combination (4) is displayed on the optical image display units 86 and 88. FIG. 33 shows the visual field VR seen by the user of the HMD 200 when neither the cover CV, the moving object, nor the housing BX, the related stationary object, is detected as a trigger target. As shown in FIG. 33, when the CPU 75 executes the AR scenario and detects none of the trigger targets, it causes the optical image display units 86 and 88 to display the image shown for combination (4) in FIG. 28. As the display image, the CPU 75 displays the cover image IMC and the housing image IMX in a state representing the cover CV and the housing BX integrated with each other. In other words, the AR scenario control unit 11a creates an AR scenario that, when none of the trigger targets is detected, displays the housing image IMX and the cover image IMC as an AR image in which the housing BX and the cover CV are integrated.

When the CPU 75 determines in step S109 of FIG. 29 that the condition of combination (4) is not met (step S109: NO), it ends the display image determination process without displaying any AR image on the optical image display units 86 and 88. In other embodiments, the display-related process of the fourth embodiment may be followed by a display-related process whose trigger targets are a different moving object and related stationary object. In this way, as shown in FIG. 28, the AR scenario control unit 11a of the fourth embodiment automatically creates an AR scenario including the AR images corresponding to each combination of trigger targets. The cover CV, which is the moving object, corresponds to the moving target in the claims, and the housing BX, which is the related stationary object, corresponds to the related stationary target in the claims. The cover image IMC of the cover CV corresponds to the moving-target-corresponding image in the claims, and the housing image IMX of the housing BX corresponds to the related stationary target image in the claims.
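The cascade of checks in FIG. 29 (steps S103 to S117) can be summarized in a short sketch. The table below restates combinations (1) to (4) as described for FIGS. 30 to 33; the function name, the tuple keys, and the string labels are illustrative assumptions rather than part of the specification.

```python
from typing import List

# Keys are (cover CV detected, housing BX detected); values are the AR images shown.
DISPLAY_IMAGES = {
    (True, False): ["housing image IMX"],                      # combination (1)
    (False, True): ["cover image IMC"],                        # combination (2)
    (True, True): ["cover image IMC"],                         # combination (3)
    (False, False): ["cover image IMC", "housing image IMX"],  # combination (4)
}

# Order in which the combinations are tested in steps S103, S105, S107, and S109.
CHECK_ORDER = [(True, False), (False, True), (True, True), (False, False)]


def determine_display_images(cover_detected: bool, housing_detected: bool) -> List[str]:
    """Return the AR images for the first combination that matches the detection result."""
    detected = (cover_detected, housing_detected)
    for combination in CHECK_ORDER:
        if detected == combination:
            return DISPLAY_IMAGES[combination]  # steps S111, S113, S115, or S117
    return []  # step S109: NO, no AR image is displayed
```

For example, `determine_display_images(True, False)` returns only the housing image, which corresponds to the view in FIG. 30. In this two-trigger sketch the four combinations cover every detection result, so the final `return []` is reached only if entries are removed from `CHECK_ORDER`.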

As described above, in the image processing apparatus 100a of the fourth embodiment, the AR scenario control unit 11a sets the cover CV and the housing BX as trigger targets for executing the AR scenario. Because the AR scenario is thus created so that it is executed upon detection of a preset specific condition, the timing of execution can be set to suit the intended use of the AR scenario.

In addition, in the image processing apparatus 100a of the fourth embodiment, the AR scenario control unit 11a creates an AR scenario that specifies whether the cover image IMC and the housing image IMX are displayed as AR images for each combination of the presence or absence of the cover CV (the moving object) and the presence or absence of the housing BX (the related stationary object) detected when the AR scenario is executed. Consequently, a plurality of AR scenarios including AR images related to the moving object are created from the imaging data without any particular operation by the user, which improves the usability of the image processing apparatus 100a.

E. Modifications:
The invention is not limited to the embodiments described above and can be implemented in various forms without departing from its gist. For example, the following modifications are possible.

E-1. Modification 1:
The first embodiment described a composite scenario consisting of a basic scenario that prompts the user to remove the scales of the fish FS and a branch scenario that branches from the basic scenario when the kitchen knife KN is detected. Composite scenarios are not limited to this form and can be modified in various ways. For example, the scenario need not be a composite of two AR scenarios; it may be a single AR scenario or a composite of three or more AR scenarios. Also, instead of distinguishing AR scenarios hierarchically as a basic scenario and a branch scenario, a composite scenario may be created from a plurality of AR scenarios treated in parallel.

In the above embodiments, AR images and the like are generated from data acquired by the RGB camera 31 and the microphone 33, but these devices are not essential and can be modified in various ways. For example, the image processing apparatus 100 need not include the RGB camera 31, the microphone 33, the operation unit 34, and the display unit 35, and may create an AR scenario by generating only AR images of moving objects within an automatically captured imaging range. The CPU 10 also need not include the additional information acquisition unit 18 and may create an AR scenario based solely on the image data of the captured images.

In the first embodiment, when creating a composite scenario consisting of a basic scenario and a branch scenario, the AR scenario operation setting unit 17 displays an image such as the edited image KC while the trigger for branching from the basic scenario to the branch scenario is being set, but such an image need not be displayed. Instead of displaying the edited image KC during editing, the AR scenario operation setting unit 17 may change the color of the AR image or the like so that the user recognizes that a trigger is being set. Alternatively, the AR scenario operation setting unit 17 may output sound instead of displaying the edited image KC so that the user recognizes that a trigger is being set. In this modification, because the user is made aware of the editing state by sound, the edited image KC does not overlap the AR image, unlike the case where an image such as the edited image KC is displayed, which further improves usability during editing.

In the above embodiments, imaging is performed by the three cameras 311, 312, and 313 and the three distance sensors 321, 322, and 323, but imaging may instead be performed by a single camera, or by one camera paired with one distance sensor. For example, the outside scene SC may be imaged by a camera 311 and a distance sensor 321 mounted as a pair on an HMD, with the wearer of the HMD acting as the subject OB (the demonstrator), and an AR scenario may be created by tracking the actions or work of the subject OB. In this case, even if the imaging range changes as the head of the subject OB moves, the correspondence between real space and the space of the three-dimensional model can be corrected using, for example, the 9-axis sensor 87 mounted on the HMD. Even with a single camera, an AR scenario including AR images of a two-dimensional or three-dimensional model can be created easily. In this case, the image processing apparatus 100 that generates the AR image or AR scenario may be implemented by a processor and storage device on the HMD, or by a processor and storage device in an external computer capable of two-way communication with the HMD via a network (for example, a wireless LAN).

In this case, in addition to the method described above, the following is one way to prevent occluded portions from arising in the AR image because the object is hidden by a hand or the like. For example, when the AR scenario is being created, the HMD presents the demonstrator wearing it with a visual or audible message such as "Show the object you are holding from a different angle or orientation." When the camera 311 and the distance sensor 321 then acquire images of the object at different angles or orientations, those images can be combined into a single three-dimensional model.

In the above embodiments, the unnecessary image erasing unit 19 deletes unnecessary objects so that they are not generated as AR images, but the handling of unnecessary objects can be modified in various ways. For example, an unnecessary object may be generated as an AR image in the same way as a moving object, or may be generated as a translucent AR image or a two-dimensional image by changing its RGB data. In this modification, for example, when a user executing the created AR scenario does not know how to hold the fish scaler TL, displaying an image of the hand of the subject OB in association with the AR image of the fish scaler TL improves convenience for the user.

E-2. Modification 2:
The CPU 10 may also include an automatic learning unit. By learning the orientation and position of object images included in the captured images, the automatic learning unit can recognize moving objects as well as irregularly shaped objects that do not conform to a fixed standard. In addition, by automatically selecting the optimal combination of features such as shape, color, and pattern using multiple recognition algorithms, the automatic learning unit can achieve learning-based recognition that adapts well to environmental changes (for example, changes in process or lighting).

In the above embodiments, the target for which an AR image is generated is associated with its positional relationship to a corresponding object detected within a predetermined range of that target, but the AR image need not be generated in association with the positional relationship to a nearby corresponding object. For example, when the corresponding object is detected, the AR image may be displayed at a preset position regardless of where in the captured image the corresponding object was detected. The positional relationship between the corresponding object and the AR image may also be set as appropriate through operations received by the operation unit 34.
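The two placement policies contrasted above can be sketched as follows. This is a minimal illustration under assumptions: the function name, the two-dimensional screen coordinates, and the `offset` and `preset` parameters are hypothetical and are not defined in the specification.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def ar_image_position(detected_at: Optional[Point],
                      offset: Point = (0.0, 0.0),
                      preset: Optional[Point] = None) -> Optional[Point]:
    """Decide where to draw the AR image once the corresponding object is searched for."""
    if detected_at is None:
        return None  # corresponding object not detected, so nothing is placed
    if preset is not None:
        return preset  # modification: fixed position, independent of the detected position
    # Embodiment behavior: position tied to the detected corresponding object.
    return (detected_at[0] + offset[0], detected_at[1] + offset[1])
```

Passing `preset` models the modification in which the AR image appears at a preset position, while `offset` stands in for a positional relationship that could be set through the operation unit 34.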

In the above embodiments, the unnecessary image erasing unit 19 generates the AR image in association with the position of the detected corresponding object, but the AR image need not be generated in association with that position. For example, the AR image may be generated in association with the user's voice acquired by the microphone 33.

The first and second embodiments were described using cooking as an example of the action or work performed by the subject OB. However, other aspects of the invention are applicable to embodiments that create AR images displayed during work performed on real mechanical objects, such as mounting machine parts in a factory or inspecting machinery. Still other aspects of the invention are applicable to embodiments that create AR images displayed during activities involving toys (leisure or games), such as assembling LEGO (registered trademark) blocks from LEGO.

E-3. Modification 3:
As described above, in the image processing apparatus 100a of the fourth embodiment, the AR scenario control unit 11a automatically creates an AR scenario corresponding to each combination of trigger targets detected when the AR scenario is executed. In addition, when the AR scenario control unit 11a receives a predetermined operation, the additional information acquisition unit 18 may add additional information such as sound to the created AR scenario, as in the first embodiment.

In the image processing apparatus 100a of the fourth embodiment, the AR scenario control unit 11a creates AR scenarios for all combinations of the presence or absence of the moving object and the presence or absence of the related stationary object, but it may create AR scenarios for only some of the combinations. Likewise, although in step S97 of FIG. 22 the AR scenario control unit 11a creates AR scenarios for combinations in which every moving object in the imaging data has been selected, it may create AR scenarios for only a selected subset of the moving objects.

In the display image determination process of the fourth embodiment shown in FIG. 29, the display image is determined by checking whether the detected trigger targets match each combination shown in FIG. 28, but the display image may instead be determined by checking whether each trigger target is detected, one by one. For example, when the trigger targets included in the AR scenario are four parts, namely the cover CV, the housing BX, the screwdriver DV, and the bolt BT, the flowchart may branch according to whether each part is detected, and the display image may be determined accordingly. Specifically, detection of the cover CV is checked first, then the housing BX, then the screwdriver DV, and then the bolt BT, and the display image is determined from all of the results. With the display image determination process of this modification, the AR image included in the corresponding AR scenario can be displayed even when a plurality of trigger targets are set in the AR scenario.
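The per-target variant can be pictured as building one detection result per trigger target and then looking up the display image from the full set of results. The sketch below assumes a fixed check order and a lookup table; the table entries are hypothetical, since the specification does not list display images for the four-target case.

```python
# Trigger targets in the order in which their detection is checked.
TRIGGERS = ("cover CV", "housing BX", "screwdriver DV", "bolt BT")


def detection_key(detected_names):
    """Check each trigger target in order and return one boolean per target."""
    return tuple(name in detected_names for name in TRIGGERS)


def choose_display_image(detected_names, table, default=None):
    """Pick the display image from the combined result of all four checks."""
    return table.get(detection_key(detected_names), default)


# Hypothetical entries for illustration only.
example_table = {
    (True, False, True, True): "housing image IMX",
    (False, True, True, True): "cover image IMC",
}

key = detection_key({"cover CV", "screwdriver DV", "bolt BT"})
image = choose_display_image({"cover CV", "screwdriver DV", "bolt BT"}, example_table)
```

Keying the table on the whole tuple means that adding a trigger target only lengthens the key, without changing the lookup logic.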

The invention is not limited to the embodiments and modifications described above and can be realized in various configurations without departing from its spirit. For example, the technical features of the embodiments and modifications that correspond to the technical features of the aspects described in the Summary of the Invention may be replaced or combined as appropriate in order to solve some or all of the problems described above or to achieve some or all of the effects described above. Any technical feature not described as essential in this specification may be deleted as appropriate.

DESCRIPTION OF SYMBOLS
10 ... CPU
11 ... AR scenario control unit
12 ... Object tracking unit (target selection unit)
13 ... Object recognition unit
14 ... 3D model generation unit
15 ... Sensor control unit
16 ... UI control unit
17 ... AR scenario operation setting unit
18 ... Additional information acquisition unit
19 ... Unnecessary image erasing unit
20 ... Image display unit
21 ... AR image extraction unit (image generation unit)
31 ... RGB camera (imaging unit)
32 ... Distance sensor (distance measurement unit)
33 ... Microphone (operation reception unit, voice acquisition unit)
34 ... Operation unit (operation reception unit)
35 ... Display unit
50 ... Data storage unit
60 ... Power supply
70 ... Control unit
75 ... CPU
80 ... Image display unit
81 ... Right earphone
82 ... Right display drive unit
83 ... Left earphone
84 ... Left display drive unit
85 ... Connection unit
86 ... Right optical image display unit
87 ... 9-axis sensor
88 ... Left optical image display unit
89 ... Camera
91 ... Depth sensor
100 ... Image processing apparatus
161 ... Text conversion unit
200 ... HMD
311 ... First camera (imaging unit)
312 ... Second camera (imaging unit)
313 ... Third camera (imaging unit)
321 ... First distance sensor (distance measurement unit)
322 ... Second distance sensor (distance measurement unit)
323 ... Third distance sensor (distance measurement unit)
OB ... Subject
SC ... Outside scene
KC ... Edited image (specific image)
TL ... Fish scaler
KN ... Kitchen knife
VR ... Visual field
FS ... Fish
AR1, AR2 ... Image
TX1, TX2 ... Text image
CV ... Cover (moving target)
BX ... Housing (related stationary target)
DV ... Screwdriver
BT ... Bolt
Bh ... Female thread portion of the housing
Ch ... Hole in the cover
LH ... Left hand
RH ... Right hand
IMX ... Housing image (related stationary target image)
IMC ... Cover image (moving-target-corresponding image)

Claims

An image processing apparatus comprising:
an outside scene sensor that images at least one object; and
an image generation unit that generates a virtual image corresponding to at least one moving object among the imaged objects.
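
As an illustration only, the selection of moving objects recited above might be sketched as follows. The sketch assumes that each imaged object has already been tracked into a list of per-frame positions; the names TrackedObject, moving_objects, generate_virtual_images and the min_displacement threshold are hypothetical and do not appear in this specification, and the "virtual image" is reduced here to the recorded trajectory.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class TrackedObject:
    # Hypothetical container: a label plus one (x, y, z) position per captured frame.
    name: str
    positions: list


def moving_objects(objects, min_displacement=0.01):
    """Return the objects whose position changes by more than
    min_displacement between at least one pair of consecutive frames."""
    moving = []
    for obj in objects:
        steps = zip(obj.positions, obj.positions[1:])
        if any(dist(a, b) > min_displacement for a, b in steps):
            moving.append(obj)
    return moving


def generate_virtual_images(objects):
    """Assumed stand-in for the image generation unit: keep only the
    trajectories of the moving objects, keyed by object name."""
    return {obj.name: list(obj.positions) for obj in moving_objects(objects)}


scene = [
    TrackedObject("cover", [(0.0, 0.0, 0.0), (0.0, 0.1, 0.0)]),
    TrackedObject("housing", [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
]
virtual = generate_virtual_images(scene)
```

In this example only the cover changes position between frames, so only the cover yields a virtual image; the stationary housing is ignored.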

The image processing apparatus according to claim 1, wherein
the image generation unit generates the virtual image in which a moving region of an image generation target, which is the object for which the virtual image is generated, is associated with at least one of the imaged objects other than the image generation target.

The image processing apparatus according to claim 2, wherein
the image generation unit generates the virtual image in which at least one of the size of the virtual image and the moving region is associated with the size of the object that is associated with the moving region of the image generation target.
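
One possible reading of this association is a proportional scaling, sketched below. This is an assumption made for illustration: the claim only requires that the sizes be associated, and the function scale_virtual_image and its parameters are hypothetical names, not part of this specification.

```python
def scale_virtual_image(image_size, moving_region, recorded_ref_size, observed_ref_size):
    """Scale a virtual image and its moving region by the ratio between the
    size of the associated object as currently observed and its size when
    the virtual image was recorded (assumed proportional relationship)."""
    if recorded_ref_size <= 0:
        raise ValueError("recorded_ref_size must be positive")
    scale = observed_ref_size / recorded_ref_size
    scaled_size = (image_size[0] * scale, image_size[1] * scale)
    scaled_region = [(x * scale, y * scale) for x, y in moving_region]
    return scaled_size, scaled_region


# The associated object appears at half its recorded size, so the virtual
# image and the path it moves along are both halved.
size, region = scale_virtual_image((10, 4), [(0, 0), (2, 6)], 20.0, 10.0)
```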

The image processing apparatus according to claim 2, wherein
the image generation unit determines whether to display the virtual image according to whether a set trigger target is detected.

The image processing apparatus according to claim 4, wherein
the image generation unit
sets, as the trigger target, among the plurality of imaged objects, a moving target, which is the object that is moving, and a related stationary target, which is an object that is not moving and is determined to be within a predetermined distance from the moving target, and
generates a virtual image that combines, in association with the combination of the presence or absence of the moving target and the presence or absence of the related stationary target, a moving target corresponding image serving as the virtual image of the moving target and a related stationary target image serving as the virtual image of the related stationary target.
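
The two steps recited above can be sketched as follows, under stated assumptions. The claim does not say which images are shown for each presence or absence combination, so EXAMPLE_POLICY below is purely illustrative; find_related_stationary, select_images and the image labels are likewise hypothetical names.

```python
from math import dist


def find_related_stationary(moving_pos, stationary, max_distance):
    """Return the names of non-moving objects that lie within max_distance
    (the predetermined distance) of the moving target's position."""
    return [name for name, pos in stationary.items()
            if dist(moving_pos, pos) <= max_distance]


# Illustrative policy only (not taken from this specification): keys are
# (moving target detected, related stationary target detected).
EXAMPLE_POLICY = {
    (True, True): [],
    (True, False): ["related_stationary_image"],
    (False, True): ["moving_target_image"],
    (False, False): ["moving_target_image", "related_stationary_image"],
}


def select_images(policy, moving_detected, stationary_detected):
    """Look up which virtual images to combine for the detected combination."""
    return policy[(moving_detected, stationary_detected)]


stationary = {"housing": (0.0, 0.0, 0.0), "table": (5.0, 0.0, 0.0)}
related = find_related_stationary((0.1, 0.0, 0.0), stationary, 0.5)
shown = select_images(EXAMPLE_POLICY, False, True)
```

Keeping the policy as a lookup table keyed on the detection combination makes it straightforward to substitute whatever mapping an actual embodiment uses.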

The image processing apparatus according to any one of claims 1 to 5, further comprising
an operation reception unit that receives an operation, wherein
the image generation unit generates the virtual image with unnecessary portions deleted based on the received operation.

The image processing apparatus according to any one of claims 1 to 6, wherein
the image generation unit generates, as the virtual image, an image corresponding to the moving object, among the plurality of imaged objects, while that object is moving.

The image processing apparatus according to any one of claims 1 to 7, further comprising
a target selection unit, wherein
the target selection unit identifies, as the at least one object, a shape of a human body and a shape other than a human body, and
the image generation unit does not generate the virtual image corresponding to the shape of the human body among the imaged objects.

The image processing apparatus according to any one of claims 1 to 8, further comprising
an audio acquisition unit that acquires external audio, wherein
the image generation unit generates the virtual image by associating the image generation target, which is the object for which the virtual image is generated, with the audio acquired while the image generation target is moving.

The image processing apparatus according to claim 9, wherein
the image generation unit generates the acquired audio as a character image in association with the virtual image.

The image processing apparatus according to any one of claims 1 to 10, further comprising
a distance measuring unit that measures the distance to the object, wherein
the image generation unit generates the virtual image based on the measured distance.

The image processing apparatus according to any one of claims 1 to 11, wherein,
when the virtual image is a virtual moving image that changes over time, the image generation unit inserts a specific image at a specific time point of the virtual moving image.

The image processing apparatus according to any one of claims 1 to 12, further comprising
an audio acquisition unit that acquires external audio, wherein,
when the virtual image is a virtual moving image that changes over time, the image generation unit generates the virtual moving image by associating a specific time point of the virtual moving image with the acquired audio.

A control method for an image processing apparatus, comprising:
imaging at least one object; and
generating a virtual image corresponding to at least one moving object among the imaged objects.

A computer program for an image processing apparatus, the computer program causing a computer to realize:
an object imaging function of imaging at least one object; and
an image generation function of generating a virtual image corresponding to at least one moving object among the imaged objects.