JP7412826B1

JP7412826B1 - Video compositing device, video compositing method, and program

Info

Publication number: JP7412826B1
Application number: JP2023123184A
Authority: JP
Inventors: 直広早石; 英由樹安藤
Original assignee: KEISUUGIKEN CORPORATION
Current assignee: KEISUUGIKEN CORPORATION
Priority date: 2023-07-28
Filing date: 2023-07-28
Publication date: 2024-01-15
Anticipated expiration: 2043-07-28

Abstract

[Problem] To provide a video compositing device that prevents a reference video and a self-video from being perceived as fused, and that allows a user who is a learner to easily follow the motion of an imitation target. [Solution] A video compositing device 1 includes: a storage unit 11 that stores a reference video, which is a video of the motion of an imitation target whose motion the user imitates; a video acquisition unit 12 that acquires a self-video, which is a video of the motion of a motion target that the user moves; a compositing unit 13 that composites the reference video and the self-video to generate a composite video; and a video output unit 14 that outputs the composite video. The compositing unit 13 composites the two videos so that the compositing ratio of the reference video and the self-video changes continuously and repeatedly over time. [Selected drawing] FIG. 1

Description

The present invention relates to a video compositing device and the like that composites and outputs a reference video, which is a video of the motion of an imitation target, and a self-video, which is a video of the motion of a user's motion target.

Conventionally, for learning movements such as those used in surgery, a learning support device is known that alternately switches between and displays a reference video, which is a video of the motion of an imitation target that a learner imitates, and a self-video, which is a video of the learner's own motion (see, for example, Patent Document 1). By referring to such a display, the learner can train to perform the same motion as the imitation target.

Japanese Unexamined Patent Application Publication No. 2014-071443

However, when the reference video and the self-video are alternately switched and displayed, the two videos are perceived as if they were fused. This is not a problem as long as the learner is properly following the motion of the imitation target in the reference video. Otherwise, the distinction between the reference video and the self-video becomes unclear, and it becomes difficult for the learner to follow the motion of the imitation target. For example, it may be difficult for the learner to readily tell which of the two is the self-video.

The present invention has been made to solve the above problem. Its object is to provide a video compositing device and the like that can prevent the reference video and the self-video from being perceived as fused, and that allows a user who is a learner to easily follow the motion of an imitation target.

To achieve the above object, a video compositing device according to one aspect of the present invention includes: a storage unit that stores a reference video, which is a video of the motion of an imitation target whose motion a user imitates; a video acquisition unit that acquires a self-video, which is a video of the motion of a motion target that the user moves; a compositing unit that composites the reference video and the self-video to generate a composite video; and a video output unit that outputs the composite video. The compositing unit composites the two videos so that the compositing ratio of the reference video and the self-video changes continuously and repeatedly over time.

With this configuration, the two videos can be composited so that the compositing ratio of the reference video and the self-video changes continuously and repeatedly over time. Because the reference video and the self-video are not displayed by alternately switching between them, the user is prevented from perceiving the two as fused and can readily identify the self-video. As a result, the user can easily follow the motion of the imitation target.

In the video compositing device according to one aspect of the present invention, the compositing unit may composite the two videos so that the change over time in the compositing ratio of the reference video and the self-video forms a sine wave.

With this configuration, for example when compositing is performed so that the proportion of one video varies between 0 and 100%, there are periods in which mainly only one of the videos is displayed, as well as periods in which the two videos are displayed overlapping each other. The user can therefore easily distinguish the two videos and can easily match the motion target that he or she is moving to the motion of the imitation target.

The video compositing device according to one aspect of the present invention may further include a distance acquisition unit that acquires the distance between the imitation target and the motion target in the composite video, and the compositing unit may perform compositing so that the greater the distance, the more of the self-video is displayed.

With this configuration, when the distance between the imitation target and the motion target is large, more of the self-video is displayed, which helps the user move the motion target so as to reduce the distance. When the distance is small, more of the imitation target is displayed, so the user can check the motion of the imitation target in more detail.
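One way to picture this distance-dependent compositing is as a bias applied to the self-video opacity. The following is a minimal sketch under stated assumptions: the function name `biased_opacity`, the `max_distance` parameter, and the linear mapping are all hypothetical. The source only states that a greater distance results in more of the self-video being displayed.

```python
def biased_opacity(base_opacity, distance, max_distance):
    """Raise the self-video opacity (0-100 %) as the two targets move apart.

    Hypothetical mapping: the source only requires that a larger distance
    shows more of the self-video; the linear interpolation is an assumption.
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    # Normalise the distance to [0, 1], clamping out-of-range values.
    weight = min(max(distance / max_distance, 0.0), 1.0)
    # Interpolate between the time-varying base opacity and fully opaque.
    return base_opacity + (100.0 - base_opacity) * weight
```

With this mapping, a distance of zero leaves the time-varying opacity unchanged, and a distance at or beyond `max_distance` shows only the self-video.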

The video compositing device according to one aspect of the present invention may further include a background changing unit that changes the background region of a change-target video, which is at least one of the reference video and the self-video, to transparent or to a predetermined single color. For the change-target video, the compositing unit may composite the video as changed by the background changing unit.

With this configuration, the imitation target and the motion target can be made easier to see in the composite video. In a composite video made from a reference video and a self-video that each include a background, the positions and states of the imitation target and the motion target become hard to grasp when, for example, the imitation target overlaps the background region of the self-video or the motion target overlaps the background region of the reference video. Changing the background region of at least one of the videos to transparent or to a single color improves this situation.
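The background change can be illustrated with a small sketch. It assumes that a foreground mask marking the imitation or motion target is already available. How that mask is obtained is not described here, and the function name, frame representation, and default color are hypothetical.

```python
def replace_background(frame, foreground_mask, color=(0, 255, 0)):
    """Return a copy of `frame` whose background pixels are set to `color`.

    frame: rows of (R, G, B) tuples.
    foreground_mask: rows of booleans, True where the pixel belongs to the
    imitation target or motion target (obtaining the mask is out of scope).
    """
    return [
        [pixel if is_fg else color for pixel, is_fg in zip(row, mask_row)]
        for row, mask_row in zip(frame, foreground_mask)
    ]
```

A transparent background could be handled the same way by using pixels with an alpha channel instead of a fixed color.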

The video compositing device according to one aspect of the present invention may further include an identification unit that identifies a point of interest in an identification-target video, which is at least one of the reference video and the self-video, and a blur processing unit that blurs the regions of the identification-target video other than the identified point of interest. For the identification-target video, the compositing unit may composite the video after blurring by the blur processing unit.

With this configuration, blurring is applied to everything other than the point of interest, so the point of interest attracts attention more readily.

In the video compositing device according to one aspect of the present invention, the identification unit may identify, as the point of interest, a location where there is movement.

With this configuration, the point of interest can be identified automatically.
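As an illustration of identifying a moving location automatically, the sketch below uses simple frame differencing. This technique is an assumption chosen for illustration, since the source does not say how movement is detected. The function name, the grayscale frame representation, and the `threshold` value are likewise hypothetical.

```python
def moving_region(prev_frame, curr_frame, threshold=30):
    """Return the bounding box (top, left, bottom, right) of changed pixels.

    Frames are equally sized rows of grayscale values (0-255). Returns None
    when no pixel differs by more than `threshold` between the two frames.
    """
    rows, cols = [], []
    for y, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            if abs(c - p) > threshold:
                rows.append(y)
                cols.append(x)
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)
```

The returned box could then serve as the point of interest, with the area outside it passed to the blur processing.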

The video compositing device according to one aspect of the present invention may further include a reception unit that receives, from the user, point-of-interest information indicating a point of interest, and the identification unit may identify the point of interest according to the point-of-interest information received by the reception unit.

With this configuration, the user can designate the point of interest.

A video compositing method according to one aspect of the present invention is a method processed using a storage unit that stores a reference video, which is a video of the motion of an imitation target whose motion a user imitates, a video acquisition unit, a compositing unit, and a video output unit. The method includes: a step in which the video acquisition unit acquires a self-video, which is a video of the motion of a motion target that the user moves; a step in which the compositing unit composites the reference video and the self-video to generate a composite video; and a step in which the video output unit outputs the composite video. In the step of generating the composite video, the two videos are composited so that the compositing ratio of the reference video and the self-video changes continuously and repeatedly over time.

With the video compositing device and the like according to one aspect of the present invention, the reference video and the self-video can be prevented from being perceived as fused, and a user who is a learner can easily follow the motion of the imitation target.

Block diagram showing the configuration of a video compositing device according to an embodiment of the present invention
Diagram showing an example of the video compositing device according to the same embodiment in use
Diagram showing an example of a reference video in the same embodiment
Diagram showing an example of a self-video in the same embodiment
Diagram showing an example of a change in the compositing ratio in the same embodiment
Diagram showing an example of a change in the compositing ratio in the same embodiment
Diagram showing an example of a change in the compositing ratio in the same embodiment
Diagram showing an example of a change in the compositing ratio in the same embodiment
Diagram showing an example of a composite video in the same embodiment
Diagram showing an example of a composite video in the same embodiment
Flowchart showing the operation of the video compositing device according to the same embodiment
Block diagram showing another example of the configuration of the video compositing device according to the same embodiment
Block diagram showing another example of the configuration of the video compositing device according to the same embodiment
Block diagram showing another example of the configuration of the video compositing device according to the same embodiment
Diagram showing multiple regions on a video in the same embodiment
Diagram showing an example of the configuration of a computer in the same embodiment

A video compositing device and a video compositing method according to the present invention are described below using an embodiment. In the following embodiment, components and steps denoted by the same reference numerals are identical or equivalent, and repeated explanation of them may be omitted. The video compositing device according to this embodiment composites the reference video and the self-video so that their compositing ratio changes continuously and repeatedly over time.

FIG. 1 is a block diagram showing the configuration of a video compositing device 1 according to this embodiment. FIG. 2 shows a situation in which a user 30, who is a learner, is learning the motion of an imitation target using the video compositing device 1. The video compositing device 1 according to this embodiment includes a storage unit 11, a video acquisition unit 12, a compositing unit 13, and a video output unit 14. As an example, the video compositing device 1 may be implemented by a computer 900, as shown in FIG. 13 and elsewhere (described later), or by dedicated hardware. This embodiment mainly describes the former case.

The storage unit 11 stores a reference video, which is a video of the motion of an imitation target whose motion the user imitates. The user is a learner who learns a motion while referring to the reference video. The motion that the user learns may be, for example, a motion in surgery, a work motion in a factory, a task motion in nursing care or hotel work, a cooking motion, a motion for creating a work such as a craft item, a sports motion, a calligraphy motion, a rope-tying motion, or any other motion. The imitation target may be, for example, a part of the body of the imitated person or an object moved by the imitated person. The imitated person may be, for example, someone who acts as a teacher for the user and who is skilled in the motion to be learned. The part of the imitated person's body may include, for example, the imitated person's hand. The object moved by the imitated person may be, for example, a tool held by the imitated person, such as forceps, a scalpel, tweezers, scissors, or a brush. The reference video is usually a video captured by a camera, but may be a CG (computer graphics) video equivalent to one captured by a camera.

As an example, the reference video may be a video from the viewpoint of the imitated person who moves the imitation target, that is, a first-person video of the imitated person. In this case, the reference video may be, for example, a video captured by a head-mounted camera worn by the imitated person. This embodiment mainly describes the case in which the storage unit 11 stores the reference video shown in FIG. 3, which includes an imitation target 41 that is forceps used by the imitated person.

The storage unit 11 may store, for example, the entire reference video or only a part of it. As an example, when the video compositing device 1 displays the reference video while receiving it from an external source, the most recently received portion of the reference video may be stored in the storage unit 11, read out and displayed, and then sequentially overwritten. Information other than the reference video may also be stored in the storage unit 11. For example, the self-video acquired by the video acquisition unit 12 may be stored in the storage unit 11.
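The streaming case, in which only the latest received portion of the reference video is held and older data is overwritten, behaves like a bounded buffer. The following is a minimal sketch of that idea. The class name `ReferenceBuffer`, its methods, and the fixed capacity are hypothetical and are not taken from the source.

```python
from collections import deque


class ReferenceBuffer:
    """Holds only the most recently received reference frames."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry when full,
        # which models sequential overwriting of older frames.
        self._frames = deque(maxlen=capacity)

    def receive(self, frame):
        """Store a newly received frame."""
        self._frames.append(frame)

    def read_oldest(self):
        """Return and remove the oldest buffered frame, or None if empty."""
        return self._frames.popleft() if self._frames else None
```

For example, with a capacity of 2, receiving three frames in a row leaves only the last two available for display.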

The process by which information comes to be stored in the storage unit 11 does not matter. For example, information may be stored in the storage unit 11 via a recording medium, information transmitted over a communication line or the like may be stored in the storage unit 11, or information input via a device such as a camera may be stored in the storage unit 11. The storage unit 11 is preferably implemented by a nonvolatile recording medium, but may be implemented by a volatile recording medium. The recording medium may be, for example, a semiconductor memory or a magnetic disk.

The video acquisition unit 12 acquires a self-video, which is a video of the motion of a motion target that the user moves. The video acquisition unit 12 may itself be an optical device such as a camera that captures video, or may acquire video captured by such a device. This embodiment mainly describes the case in which the video acquisition unit 12 receives video captured by a camera 901. The camera 901 that captures the self-video may be a camera worn by the user 30, such as a head-mounted camera, or a camera installed in the shooting environment on a support such as a tripod. The motion target corresponds to the imitation target, and the two are usually of the same type. The motion target may therefore be, for example, a part of the user's body or an object moved by the user. The part of the user's body may include, for example, the user's hand. The object moved by the user may be, for example, a tool held by the user, such as forceps, a scalpel, tweezers, scissors, or a brush. For example, when the imitation target is forceps, the motion target is preferably also forceps.

As an example, the self-video may be a video from the viewpoint of the user who moves the motion target, that is, a first-person video of the user. In this case, the self-video may be, for example, a video captured by the camera 901 worn on the head of the user 30, as shown in FIG. 2. The optical axis of the camera 901 preferably points toward the motion target. This embodiment mainly describes the case in which the video acquisition unit 12 acquires the self-video shown in FIG. 4, which includes a motion target 51 that is forceps used by the user 30. The relative positional relationship between the reference-video camera and the imitation target is preferably the same as, or close to, the relative positional relationship between the self-video camera and the motion target. In addition, each frame of the reference video and each frame of the self-video are preferably the same size, for example. Frames being the same size may mean that they have the same number of pixels vertically and the same number of pixels horizontally.

The compositing unit 13 composites the reference video and the self-video to generate a composite video. In doing so, the compositing unit 13 composites the two videos so that the compositing ratio of the reference video and the self-video changes continuously and repeatedly over time. The compositing ratio of the reference video and the self-video may be the proportion in which the reference video or the self-video is displayed in the composite video. This ratio may be, for example, the opacity or transparency of one of the videos. This embodiment mainly describes the case in which the compositing ratio is the opacity of the self-video. In this case, the self-video may be composited in front of, that is, on top of, the reference video. That the compositing ratio changes continuously along the time direction may mean, for example, that the change in the ratio is continuous over the whole of the time direction (see, for example, FIGS. 5A and 5B), or that the change in the ratio is continuous over at least part of the time direction (see, for example, FIG. 5C). The repetition of the compositing ratio in the time direction may be periodic or non-periodic. The period of a periodically changing compositing ratio is not particularly limited, but may be, for example, about 0.5 to 3 seconds. The faster the imitation target moves, the shorter the period should preferably be. As an example, the compositing ratio of the reference video and the self-video may vary between 100:0 and 0:100, or may vary over a range such as 95:5 to 5:95 so that there is no period in which the composite video consists of only one of the videos. This embodiment mainly describes the former case. The continuous change in the compositing ratio in the time direction may or may not be smooth. If it is smooth, the compositing ratio may be differentiable with respect to time.

As an example, the compositing unit 13 may composite the two videos so that the change over time in the compositing ratio of the reference video and the self-video forms a sine wave. In this case, the opacity of the self-video composited in front of the reference video (that is, the compositing ratio) may change over time as shown in FIG. 5A. There is then a period in which the reference video dominates the composite video (when the opacity of the self-video is near 0%) and a period in which the self-video dominates (when the opacity of the self-video is near 100%), so the user can view the reference video alone and can also view the self-video alone. This makes it easy for the user to tell apart the imitation target in the reference video and the motion target in the self-video. In the remaining periods, the composite video contains both the reference video and the self-video, so the user can see the imitation target and the motion target together. This makes it easy for the user to move the motion target in the self-video in step with the motion of the imitation target in the reference video. In other words, the user can move the motion target so that it follows the motion of the imitation target.

The compositing unit 13 may, for example, determine the compositing ratio of the reference video and the self-video at the time of compositing, and composite the two videos at that ratio. For example, when the change over time in the compositing ratio is as shown in FIG. 5A, the compositing unit 13, when compositing the two videos at time t, may determine the opacity of the self-video corresponding to time t and composite the two videos so that the self-video placed in front of the reference video for time t has that opacity. The change over time in the compositing ratio may be given by, for example, a graph, a table, or a function, and the compositing unit 13 may use it to determine the compositing ratio at the time of compositing. When the compositing ratio is as shown in FIG. 5A, the function giving the opacity (%) of the self-video at time t may be, for example, the following, where P is the period of the opacity.

Opacity of self-video (%) = 50 × (sin(2πt/P - π/2) + 1)
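The opacity formula can be tried out directly. The sketch below evaluates it and applies the result to a single pixel. The function names and the per-pixel alpha-blending rule are assumptions made for illustration. The source states only that the ratio may be the opacity of the self-video composited in front of the reference video.

```python
import math


def self_opacity_percent(t, period):
    """Opacity (%) of the self-video: 50 * (sin(2*pi*t/P - pi/2) + 1)."""
    return 50.0 * (math.sin(2.0 * math.pi * t / period - math.pi / 2.0) + 1.0)


def blend_pixel(reference_px, self_px, opacity_percent):
    """Alpha-blend one self-video pixel over the reference pixel.

    Assumed rule: out = a * self + (1 - a) * reference, with a in [0, 1].
    """
    a = opacity_percent / 100.0
    return tuple(
        round(a * s + (1.0 - a) * r) for r, s in zip(reference_px, self_px)
    )
```

With this phase, the opacity starts at 0% (reference video only) at t = 0, reaches 50% at t = P/4, and peaks at 100% (self-video only) at t = P/2.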

FIGS. 6 and 7 show examples of the composite video. FIG. 6 is a composite video generated with the opacity at time T1 in FIG. 5A, and FIG. 7 is a composite video generated with the opacity at time T2 in FIG. 5A. In the composite video of FIG. 6, the opacity of the self-video is below 50%, so the imitation target 41 is displayed more clearly than the motion target 51. In the composite video of FIG. 7, the opacity of the self-video is above 50%, so the motion target 51 is displayed more clearly than the imitation target 41.

The compositing unit 13 may also composite the two videos so that the change over time in the compositing ratio follows a waveform other than a sine wave. For example, the compositing unit 13 may composite the two videos so that the change over time in the compositing ratio forms a triangular wave as shown in FIG. 5B, or a sawtooth wave as shown in FIG. 5C. In addition, when the frames of the reference video and the frames of the self-video differ in size, the compositing unit 13 may, for example, resize one video to match the other before compositing.
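The alternative waveforms can be sketched as functions of time in the same way as the sine wave. The exact phase and direction of the waves in FIGS. 5B and 5C are not reproduced here, so the shapes below (both starting from 0% at t = 0) and the function names are assumptions.

```python
def triangle_opacity(t, period):
    """Triangular wave: rises 0 -> 100 % over half a period, then falls back."""
    phase = (t % period) / period  # position within the current cycle, [0, 1)
    return 200.0 * phase if phase < 0.5 else 200.0 * (1.0 - phase)


def sawtooth_opacity(t, period):
    """Sawtooth wave: rises 0 -> 100 % over a period, then drops back to 0."""
    return 100.0 * ((t % period) / period)
```

The triangular wave is continuous everywhere, whereas the sawtooth wave is continuous only within each cycle and jumps at the cycle boundary.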

The video output unit 14 outputs the composite video generated by the composition unit 13. This output may be, for example, display on a display device (for example, a liquid crystal display or an organic EL display), or transmission over a communication line to a device that displays the composite video. The video output unit 14 may or may not include the device that performs the output (for example, a display device or a communication device). The display device on which the output video is displayed may be, for example, a display placed in the environment of the user 30, as shown in FIG. 2, or a head-mounted display worn by the user 30. The video output unit 14 may be realized by hardware, or by software such as a driver that drives such devices.

Next, the operation of the video synthesis device 1 will be described with reference to the flowchart of FIG. 8.
(Step S101) The composition unit 13 determines whether to combine the reference video and the self-video. If so, the process proceeds to step S102; if not, step S101 is repeated until it determines that the composition is to be performed. The composition unit 13 may determine that the composition is to be performed at each display interval of the frames in the videos. For example, when the frame rate is A (fps), the composition unit 13 may determine that the composition is to be performed every 1/A seconds.

(Step S102) The composition unit 13 reads the frame of the reference video to be combined from the storage unit 11. The frame read T seconds after the composition process starts may be, for example, the frame located T seconds from the beginning of the reference video.

(Step S103) The video acquisition unit 12 acquires a frame of the self-video. For example, the video acquisition unit 12 may acquire the latest self-video frame from the camera 901.

(Step S104) The composition unit 13 specifies the composition ratio of the reference video and the self-video. For example, when the composition ratio changes over time as shown in FIG. 5A and T seconds have elapsed since the composition process started, the composition unit 13 may specify the composition ratio corresponding to T seconds.

(Step S105) The composition unit 13 generates a frame of the composite video by combining the reference video frame read in step S102 and the self-video frame acquired in step S103 at the ratio specified in step S104.

(Step S106) The video output unit 14 outputs the composite video frame generated in step S105, and the process returns to step S101. The two videos are thus combined by compositing and outputting their frames one after another.

The order of processing in the flowchart of FIG. 8 is an example, and the order of the steps may be changed as long as the same result is obtained. For example, steps S102 to S104 may be executed in an order other than that shown in FIG. 8. In the flowchart of FIG. 8, the process ends when the power is turned off or a processing-termination interrupt occurs.
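The flow of steps S101 to S106 can be sketched as a per-frame loop. This is an illustration under stated assumptions rather than the device's actual implementation: frames are small nested lists of grayscale values, `capture_self_frame` is a hypothetical callback standing in for the camera 901, and the sine schedule of FIG. 5A is assumed for step S104.

```python
import math
from typing import Callable, Iterable, Iterator, List

Frame = List[List[float]]  # assumed representation: rows of grayscale values


def sine_opacity_percent(t: float, period: float) -> float:
    """Self-video opacity (%) at time t, following the FIG. 5A equation."""
    return 50.0 * (math.sin(2.0 * math.pi * t / period - math.pi / 2.0) + 1.0)


def blend_frames(reference: Frame, self_frame: Frame, alpha: float) -> Frame:
    """Draw self_frame in front of reference with opacity alpha (0.0 to 1.0)."""
    return [
        [alpha * s + (1.0 - alpha) * r for r, s in zip(ref_row, self_row)]
        for ref_row, self_row in zip(reference, self_frame)
    ]


def composite_stream(
    reference_frames: Iterable[Frame],
    capture_self_frame: Callable[[], Frame],
    frame_rate: float,
    period: float,
) -> Iterator[Frame]:
    """Yield one composite frame per reference frame."""
    for index, reference in enumerate(reference_frames):  # S101, S102
        elapsed = index / frame_rate       # seconds since composition began
        self_frame = capture_self_frame()  # S103: latest self-video frame
        alpha = sine_opacity_percent(elapsed, period) / 100.0  # S104
        yield blend_frames(reference, self_frame, alpha)       # S105
        # S106: the caller consumes the yielded frame and outputs it.
```

The generator does not wait 1/A seconds between frames; pacing is left to the caller that displays each yielded frame.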

Next, the operation of the video synthesis device 1 according to this embodiment will be described using a specific example. In this example, the reference video shows the imitation target 41, which is a pair of forceps, being used by an expert. The reference video is, for example, the video shown in FIG. 3. As shown in FIG. 2, a camera 901 worn on the head of the user 30 captures the self-video of the motion target 51, which is the forceps used by the user 30. The self-video is, for example, the video shown in FIG. 4.

First, when the user 30 operates the video synthesis device 1 to start the composition of the reference video and the self-video, the composition unit 13 reads the first frame of the reference video stored in the storage unit 11 (steps S101 and S102). The video acquisition unit 12 receives a frame of the self-video captured by the camera 901 over a wired or wireless connection (step S103). The composition unit 13 then specifies the composition ratio at that time (step S104). In this example, the self-video is composited in front of the reference video using the self-video opacity shown in FIG. 5A, and the opacity of the self-video is assumed to be 0% when the composition process starts. The composition unit 13 therefore generates a composite video frame in which the self-video with 0% opacity is composited in front of the reference video, and passes it to the video output unit 14 (step S105). In other words, a composite video frame containing only the reference video is generated. On receiving the composite video frame, the video output unit 14 outputs it to the display device 902 (step S106). The user 30 can thus view the composite video.

The composition ratio of the reference video and the self-video changes as shown in FIG. 5A. The proportion of the self-video in the composite video therefore increases gradually: the composite video shown in FIG. 6 is displayed at time T1, and the composite video shown in FIG. 7 is displayed at time T2. After a composite video containing only the self-video has been displayed, the proportion of the self-video gradually decreases. In this way, the proportion of the self-video in the composite video changes periodically between 0% and 100%, following a sine wave. While watching the composite video displayed on the display device 902, the user 30 can learn by moving the motion target so that its motion matches that of the imitation target.

As described above, the video synthesis device 1 according to this embodiment can combine the reference video and the self-video so that their composition ratio changes continuously and repeatedly over time. Because the reference video and the self-video are not displayed by switching between them alternately, the user can be prevented from perceiving the two videos as fused. As a result, a user viewing the composite video can easily distinguish the imitation target 41 in the reference video from the motion target 51 in the self-video and can move the motion target so that it follows the motion of the imitation target, which increases the effectiveness of learning with the reference video. Furthermore, when the composition ratio changes periodically, the user can easily predict when the reference video or the self-video will be displayed predominantly, which makes it easier to identify the imitation target and the motion target in the composite video.

The video synthesis device 1 according to this embodiment may further include, for example, as shown in FIG. 9, a distance acquisition unit 21 that acquires the distance between the imitation target and the motion target in the composite video, and the composition unit 13 may perform the composition so that, for example, the greater the distance acquired by the distance acquisition unit 21, the more the self-video is displayed. The distance between the imitation target and the motion target in the composite video is the distance between the two in the composite video that would be generated if the reference video read at a given time and the self-video acquired at that time were combined. This distance may be, for example, the distance between the imitation target and the motion target in the most recently generated composite video frame; the distance between them in a composite video frame generated, for the purpose of acquiring the distance, from the reference video frame and the self-video frame at the time of composition; or the distance between the position of the imitation target in the reference video frame at the time of composition and the position of the motion target in the latest self-video frame.

The region of the imitation target or the motion target in a video frame may be identified by, for example, object detection, segmentation, or pattern matching in the frame. The position of the imitation target or the motion target may be, for example, a representative position of the region identified for it in the frame. The representative position of a region may be, for example, the centroid of the region or the center of a bounding box surrounding the region.
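The two representative positions and the resulting distance can be sketched as follows, assuming each region has already been identified and is given as a binary mask (rows of 0/1 values). The mask representation and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # assumed: nonzero marks a pixel inside the region
Point = Tuple[float, float]     # (x, y) in pixel coordinates


def _region_pixels(mask: Mask) -> List[Tuple[int, int]]:
    pixels = [(x, y) for y, row in enumerate(mask) for x, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("mask contains no region pixels")
    return pixels


def centroid(mask: Mask) -> Point:
    """Mean position of all pixels in the region."""
    pixels = _region_pixels(mask)
    count = len(pixels)
    return (sum(x for x, _ in pixels) / count, sum(y for _, y in pixels) / count)


def bounding_box_center(mask: Mask) -> Point:
    """Center of the smallest axis-aligned box enclosing the region."""
    pixels = _region_pixels(mask)
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return ((min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0)


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two representative positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])
```

For an L-shaped region the two representative positions differ, so whichever definition is chosen should be applied to both the imitation target and the motion target.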

For example, when the acquired distance is greater than a threshold TH, the composition unit 13 may combine the reference video and the self-video so that more of the self-video is displayed than when the acquired distance is less than the threshold TH. When the acquired distance is equal to the threshold TH, the composition unit 13 may or may not combine the videos so that more of the self-video is displayed than when the acquired distance is less than the threshold TH. Combining so that more of the self-video is displayed may mean, for example, combining so that the average proportion of the self-video in the composite video over a predetermined period of time is larger. When the composition ratio changes periodically, the predetermined period may be an integer multiple of one cycle of the composition ratio. In this case, more of the self-video is displayed in the composite video when the acquired distance is greater than the threshold TH than when it is less than the threshold TH. As an example, when the acquired distance is greater than the threshold TH, the composition may be performed so that the average proportion of the self-video in the composite video exceeds 50%. When the distance between the imitation target and the motion target is large, the user usually moves the motion target toward the position of the imitation target rather than matching its motion to that of the imitation target. Displaying more of the motion target than when the distance is small therefore makes this movement easier. Conversely, when the distance is small, the user usually moves the motion target so that its motion matches that of the imitation target. Displaying more of the imitation target than when the distance is large therefore makes it easier to imitate the motion.

As an example, when the acquired distance is greater than the threshold TH, the composition unit 13 may combine the reference video and the self-video according to a self-video opacity that changes over time as shown in FIG. 5D. When the acquired distance is less than the threshold TH, the composition unit 13 may combine the videos according to a self-video opacity that changes over time as shown in FIG. 5A. In this way, the videos can be combined so that more of the self-video is displayed when the acquired distance is greater than the threshold TH than when it is less than the threshold TH.
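A sketch of this threshold-based switching is shown below. The waveform of FIG. 5D is not reproduced here, so the code substitutes a hypothetical schedule: a sine wave compressed into the 50% to 100% range, which gives an average above 50% as the text requires. Treating a distance equal to the threshold as the near case is also an assumption; the text permits either choice.

```python
import math


def opacity_for_distance(t: float, period: float, distance: float, threshold: float) -> float:
    """Self-video opacity (%) at time t, chosen according to the acquired distance."""
    # Normalized sine in the range 0.0 to 1.0, with the same phase as FIG. 5A.
    base = 0.5 * (math.sin(2.0 * math.pi * t / period - math.pi / 2.0) + 1.0)
    if distance > threshold:
        # Hypothetical stand-in for FIG. 5D: oscillates between 50% and 100%.
        return 50.0 + 50.0 * base
    # FIG. 5A schedule: oscillates between 0% and 100%.
    return 100.0 * base
```

Averaged over one full period, the far schedule yields 75% self-video and the near schedule 50%, so the self-video is shown more when the targets are far apart.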

The video synthesis device 1 according to this embodiment may further include, for example, as shown in FIG. 10, a background changing unit 22 that changes the background region of a change-target video, which is at least one of the reference video and the self-video, to transparent or to a predetermined single color, and the composition unit 13 may composite the change-target video as changed by the background changing unit 22. The change-target video may be, for example, only the reference video, only the self-video, or both. When only the reference video is changed, the composition unit 13 may combine the reference video with the changed background and the self-video with its background unchanged. When only the self-video is changed, the composition unit 13 may combine the reference video with its background unchanged and the self-video with the changed background. When both videos are changed, the composition unit 13 may combine the reference video and the self-video, each with its changed background.

The background changing unit 22 may, for example, identify the region of the imitation target or the motion target and treat everything outside the identified region as the background region. The region of the imitation target or the motion target may be identified as described above. The region of an object (for example, an organ or a simulated organ) that is subjected to predetermined processing by the imitation target or the motion target (for example, suturing performed with forceps serving as the imitation target or the motion target) may also be excluded from the background. The background changing unit 22 may then make the identified background region transparent or change it to a predetermined single color. When both the reference video and the self-video are change-target videos and the background regions of both are changed to a single color, the background color of the reference video may differ from that of the self-video. This allows the user to tell easily from the background color which video is being displayed predominantly when the composite video is displayed. Software that makes the background region of an image transparent or a specific single color is well known, and the background changing unit 22 may use such software to change the background region in each frame of the change-target video.

Changing the background region of at least one of the videos to transparent or a single color in this way makes the imitation target and the motion target easier to see in the composite video. For example, because a transparent or single-color background region of the self-video is displayed over the region of the imitation target, or a transparent or single-color background region of the reference video is displayed over the region of the motion target, the user can easily identify the imitation target and the motion target.
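The background change can be sketched as a per-pixel operation once the target region is known. The code below assumes frames are rows of (r, g, b, a) tuples with 8-bit channels and that the target region is supplied as a boolean mask; both representations and the function name are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int, int]  # (red, green, blue, alpha), each 0 to 255
Color = Tuple[int, int, int]


def change_background(
    frame: Sequence[Sequence[Pixel]],
    target_mask: Sequence[Sequence[bool]],
    fill: Optional[Color] = None,
) -> List[List[Pixel]]:
    """Return a copy of frame whose background pixels are transparent or one color.

    Pixels where target_mask is True (imitation target, motion target, or
    any other region excluded from the background) are left unchanged.
    With fill=None the background becomes fully transparent; otherwise it
    becomes the opaque color given by fill.
    """
    result = []
    for row, mask_row in zip(frame, target_mask):
        new_row = []
        for (r, g, b, a), is_target in zip(row, mask_row):
            if is_target:
                new_row.append((r, g, b, a))
            elif fill is None:
                new_row.append((r, g, b, 0))
            else:
                new_row.append((fill[0], fill[1], fill[2], 255))
        result.append(new_row)
    return result
```

Calling the function with different `fill` colors for the reference video and the self-video would give the two backgrounds distinct colors, as described above.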

The video synthesis device 1 according to this embodiment may further include, for example, as shown in FIG. 11, a reception unit 23 that receives point-of-interest information indicating a point of interest from the user, a specifying unit 24 that specifies the point of interest in a specification-target video, which is at least one of the reference video and the self-video, and a blurring processing unit 25 that blurs the region outside the point of interest specified in the specification-target video; the composition unit 13 may composite the specification-target video after it has been blurred by the blurring processing unit 25. The specification-target video may be, for example, only the reference video, only the self-video, or both.

The reception unit 23 may receive, for example, point-of-interest information that is at least one of a plurality of region identifiers, each identifying one of the predetermined regions 61 to 65 in a video frame, as shown in FIG. 12. In this case, the region identified by the region identifier received as the point-of-interest information may be the point of interest. The region identifiers may be, for example, "center", "upper left", "upper right", "lower left", and "lower right", identifying the regions 61 to 65, respectively. As an example, the reception unit 23 may receive point-of-interest information that is the result of speech recognition, which allows the user to designate the point of interest by voice. Needless to say, the number and positions of the points of interest that can be specified in a video may differ from those shown in FIG. 12.

The reception unit 23 may receive, for example, information input from an input device (for example, a keyboard, a mouse, or a touch panel), or information transmitted over a wired or wireless communication line. The reception unit 23 may or may not include a device for receiving the information (for example, an input device or a communication device). The reception unit 23 may be realized by hardware, or by software such as a driver that drives a given device.

The specifying unit 24 may, for example, specify the point of interest according to the point-of-interest information received by the reception unit 23. In this case, the specifying unit 24 may specify, as the point of interest, the region identified by the region identifier received as the point-of-interest information. Specifically, when point-of-interest information consisting of the region identifier "center" is received, the specifying unit 24 may specify the region 61, identified by the region identifier "center", as the point of interest. When point-of-interest information is received and the specification-target video is both the reference video and the self-video, the points of interest in the two videos may be the same or different. In the latter case, point-of-interest information may be received for each video, for example.
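The lookup from a region identifier to a region can be sketched as a simple table. The geometry of regions 61 to 65 in FIG. 12 is not reproduced here, so the layout below is purely hypothetical: four quadrants plus a central rectangle half the width and height of the frame. The function name and the rectangle convention are likewise assumptions.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def region_for_identifier(identifier: str, width: int, height: int) -> Rect:
    """Return the rectangle for a region identifier in a width x height frame."""
    half_w, half_h = width // 2, height // 2
    regions: Dict[str, Rect] = {
        # Hypothetical layout; the actual regions 61 to 65 may differ.
        "center": (width // 4, height // 4, width // 4 + half_w, height // 4 + half_h),
        "upper left": (0, 0, half_w, half_h),
        "upper right": (half_w, 0, width, half_h),
        "lower left": (0, half_h, half_w, height),
        "lower right": (half_w, half_h, width, height),
    }
    if identifier not in regions:
        raise ValueError(f"unknown region identifier: {identifier!r}")
    return regions[identifier]
```

A speech recognition result such as "center" could be passed to this function directly once it has been normalized to one of the identifier strings.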

The specifying unit 24 may also specify the point of interest without using received point-of-interest information. In this case, the specifying unit 24 may, for example, specify as the point of interest a location in the specification-target video where there is motion. Such a location may be found, for example, by extracting feature points from two frames of the video separated by a predetermined time interval and selecting locations where the change in position of corresponding feature points per unit time, that is, the distance moved, exceeds a threshold. As an example, a location where the magnitude of the optical flow vector in the video exceeds a predetermined threshold may be taken as a point of interest with motion. This threshold may be set, for example, so that the imitation target or the motion target is included in the point of interest. The shape of the point of interest may be predetermined, for example rectangular or elliptical, and its size may also be predetermined. In this case, the specifying unit 24 may specify, as the point of interest, a region of the predetermined size and shape that contains the location with motion in the specification-target video. When the point of interest is specified without point-of-interest information and the specification-target video is both the reference video and the self-video, the points of interest in the two videos may differ. When the point of interest is specified without received point-of-interest information, the video synthesis device 1 need not include the reception unit 23.
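The idea of placing a fixed-size region around a moving location can be sketched as follows. Note the substitution: instead of feature-point tracking or optical flow, this sketch uses simple frame differencing (per-pixel absolute change between two grayscale frames) as a stand-in motion measure. The function name, the frame representation, and the choice to center the box on the mean position of the changed pixels are all assumptions.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]         # assumed: rows of grayscale values
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def moving_region(
    previous: Frame,
    current: Frame,
    threshold: float,
    box_width: int,
    box_height: int,
) -> Optional[Rect]:
    """Return a fixed-size box around the pixels that changed, or None if none did."""
    moving = [
        (x, y)
        for y, (prev_row, curr_row) in enumerate(zip(previous, current))
        for x, (p, c) in enumerate(zip(prev_row, curr_row))
        if abs(c - p) > threshold  # frame differencing, not optical flow
    ]
    if not moving:
        return None
    center_x = sum(x for x, _ in moving) / len(moving)
    center_y = sum(y for _, y in moving) / len(moving)
    height, width = len(current), len(current[0])
    # Center the box on the mean changed position, clamped to the frame.
    left = min(max(int(round(center_x - box_width / 2)), 0), max(width - box_width, 0))
    top = min(max(int(round(center_y - box_height / 2)), 0), max(height - box_height, 0))
    return (left, top, left + box_width, top + box_height)
```

Because the box has a predetermined size, widely scattered motion may fall partly outside it; a fuller implementation would need a policy for that case.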

The blurring processing unit 25 blurs the region of the specification-target video outside the point of interest specified by the specifying unit 24. As an example, the specifying unit 24 may pass the specification-target video and the region indicating the point of interest in it to the blurring processing unit 25, which may use them to blur the region outside the point of interest. The processing performed by the blurring processing unit 25 may be, for example, the processing known as blurring in image processing software. As an example, it may replace the color of each pixel with the average of the colors in a predetermined range centered on that pixel.
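The averaging-based blur described above corresponds to a box blur, and restricting it to the area outside the point of interest can be sketched as follows. The sketch assumes a grayscale frame stored as nested lists and a rectangular point of interest; the function name and the `radius` parameter are illustrative.

```python
from typing import List, Tuple

Frame = List[List[float]]         # assumed: rows of grayscale values
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def blur_outside(frame: Frame, point_of_interest: Rect, radius: int = 1) -> Frame:
    """Box-blur every pixel outside point_of_interest; leave the inside unchanged."""
    left, top, right, bottom = point_of_interest
    height, width = len(frame), len(frame[0])
    result = [row[:] for row in frame]
    for y in range(height):
        for x in range(width):
            if left <= x < right and top <= y < bottom:
                continue  # inside the point of interest: keep the original pixel
            # Average the original values in the window centered on (x, y),
            # clipped at the frame edges.
            window = [
                frame[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            result[y][x] = sum(window) / len(window)
    return result
```

The averages are always taken from the original frame rather than from already-blurred output, so the result does not depend on the order in which pixels are visited.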

Blurring the region outside the point of interest in this way makes it easier for the user to focus on the point of interest. For example, when a reference video and a self-video that have not been blurred are combined and the composite video contains both, the background of the self-video may be displayed over the region of the imitation target, or the background of the reference video over the region of the motion target, which can make it difficult for the user to identify the imitation target and the motion target. In contrast, when the region outside the point of interest is blurred in at least one of the videos, a blurred region of the self-video is displayed over the region of the imitation target, or a blurred region of the reference video over the region of the motion target, for example, so the user can easily identify the imitation target and the motion target.

This embodiment has mainly described the case where the reference video, the self-video, and the composite video are each a single two-dimensional video, but they may be stereo videos. In this case, the video output unit 14 may output the stereo composite video to a head-mounted display worn by the user 30.

The above embodiment has described the case where the video synthesis device 1 is a standalone device, but the video synthesis device 1 may be either a standalone device or a server device in a server-client system. In the latter case, the reception unit, the acquisition unit, and the output unit may receive input, acquire information, and output information over a communication line.

In the above embodiment, each process or function may be realized by centralized processing in a single device or system, or by distributed processing across multiple devices or systems.

In the above embodiment, when two components that exchange information are physically separate, the exchange may be performed by one component outputting the information and the other receiving it; when the two components are physically the same, the exchange may be performed by moving from the processing phase corresponding to one component to the processing phase corresponding to the other.

Further, in the above embodiment, information related to the processing executed by each component may be held temporarily or for a long period in a recording medium (not shown), even where the above description does not say so explicitly. Such information includes, for example, information that each component has accepted, acquired, selected, generated, transmitted, or received, as well as thresholds, formulas, addresses, and other information that each component uses in its processing. The information may be stored in that recording medium by each component or by an accumulation unit (not shown). Likewise, the information may be read from that recording medium by each component or by a reading unit (not shown).

Further, in the above embodiment, where information used by a component (for example, thresholds, addresses, and various setting values used in its processing) may be changed by the user, the user may or may not be allowed to change that information as appropriate, even where the above description does not say so explicitly. Where the user can change the information, the change may be realized by, for example, a reception unit (not shown) that accepts a change instruction from the user and a change unit (not shown) that changes the information in accordance with that instruction. The reception unit (not shown) may accept the change instruction, for example, from an input device, by receiving information transmitted over a communication line, or by accepting information read from a predetermined recording medium.

Further, in the above embodiment, when two or more components of the video synthesis device 1 have a communication device, an input device, or the like, those components may physically share a single device or may each have a separate device.

Further, in the above embodiment, each component may be configured with dedicated hardware, or components that can be realized in software may be realized by executing a program. For example, each component can be realized by a program execution unit such as a CPU reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory. During execution, the program execution unit may access a storage unit or a recording medium. The software that realizes the video synthesis device 1 in the above embodiment is a program such as the following. That is, the program may be one that causes a computer capable of accessing a storage unit, which stores a reference video that is a video of the motion of an imitation target whose motion the user imitates, to function as: a video acquisition unit that acquires a self-video, which is a video of the motion of a motion target that the user moves; a synthesis unit that combines the reference video and the self-video to generate a composite video; and a video output unit that outputs the composite video, wherein the synthesis unit combines the two videos such that the ratio at which the reference video and the self-video are combined changes continuously and repeatedly over time.
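As one non-limiting illustration of the synthesis step such a program performs, the following Python sketch blends a reference frame and a self-video frame with a ratio that varies periodically in time. The sinusoidal form is the one named in claim 2; the function names `blend_ratio` and `composite_frame`, the 2-second default period, and the representation of a frame as a flat list of pixel intensities are assumptions made only for this sketch and do not come from the specification.

```python
import math


def blend_ratio(t: float, period: float = 2.0) -> float:
    """Share of the reference video at time t (seconds), in [0, 1].

    Assumption for this sketch: a sine wave with the given period,
    offset and scaled so that it stays between 0 and 1.
    """
    return 0.5 * (1.0 + math.sin(2.0 * math.pi * t / period))


def composite_frame(reference, self_frame, t, period=2.0):
    """Blend two equal-sized frames given as flat lists of pixel values.

    Each output pixel is a * reference + (1 - a) * self, where a is the
    time-varying ratio returned by blend_ratio.
    """
    if len(reference) != len(self_frame):
        raise ValueError("frames must have the same number of pixels")
    a = blend_ratio(t, period)
    return [a * r + (1.0 - a) * s for r, s in zip(reference, self_frame)]


if __name__ == "__main__":
    reference = [100.0, 200.0]   # hypothetical reference-video pixels
    self_frame = [0.0, 50.0]     # hypothetical self-video pixels
    for t in (0.0, 0.5, 1.0, 1.5):
        print(t, composite_frame(reference, self_frame, t))
```

With these assumed settings the composite shows an equal mix at t = 0, only the reference video at a quarter period, and only the self-video at three quarters of a period, after which the cycle repeats. Any other continuous periodic function could be substituted for the sine wave.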

Note that the functions realized by the above program do not include functions that can be realized only by hardware. For example, functions that can be realized only by hardware such as a modem or an interface card, in an acquisition unit that acquires information or an output unit that outputs information, are at least not among the functions realized by the program.

Further, the program may be executed after being downloaded from a server or the like, or by being read from a predetermined recording medium (for example, an optical disc such as a CD-ROM, a magnetic disk, or a semiconductor memory). The program may also be used as a program constituting a program product.

Further, the program may be executed by a single computer or by multiple computers. That is, the processing may be centralized or distributed.

FIG. 13 is a diagram showing an example of the configuration of a computer 900 that executes the above program to realize the video synthesis device 1 according to the above embodiment. In FIG. 13, the computer 900 includes an MPU (Micro Processing Unit) 911; a ROM 912 for storing programs such as a boot-up program; a RAM 913 that is connected to the MPU 911, temporarily stores the instructions of application programs, and provides temporary storage space; a storage unit 914 that stores application programs, system programs, and data; and a communication module 915 that provides a connection to a LAN, a WAN, or the like. The MPU 911, the ROM 912, and so on may be interconnected by a bus together with a camera 901, a display device 902, a keyboard 903, and a pointing device 904 such as a touchpad or a mouse. The storage unit 914 may be, for example, a hard disk or an SSD (Solid State Drive). The camera 901, the display device 902, the keyboard 903, the pointing device 904, and the like may be, for example, external devices or devices built into the computer 900.

The program that causes the computer 900 to execute the functions of the video synthesis device 1 according to the above embodiment may be loaded into the RAM 913 at execution time. The program may be loaded directly from, for example, the storage unit 914 or a network.

The program need not include an operating system (OS), a third-party program, or the like that causes the computer 900 to execute the functions of the video synthesis device 1 according to the above embodiment. The program may include only those instructions that call the appropriate functions or modules in a controlled manner so that the desired results are obtained. How the computer 900 operates is well known, so a detailed description is omitted.

Further, the above embodiment is an example for concretely carrying out the present invention and does not limit its technical scope. The technical scope of the present invention is indicated by the claims rather than by the description of the embodiment, and is intended to include modifications within the literal scope of the claims and within the range of equivalents.

1 Video synthesis device
11 Storage unit
12 Video acquisition unit
13 Synthesis unit
14 Video output unit
21 Distance acquisition unit
22 Background change unit
23 Reception unit
24 Specification unit
25 Processing unit

Claims

1. A video synthesis device comprising:
a storage unit that stores a reference video, which is a video of the motion of an imitation target whose motion the user imitates;
a video acquisition unit that acquires a self-video, which is a video of the motion of a motion target that the user moves;
a synthesis unit that combines the reference video and the self-video to generate a composite video; and
a video output unit that outputs the composite video,
wherein the synthesis unit combines the reference video and the self-video such that the ratio at which the reference video and the self-video are combined changes continuously and repeatedly over time.

2. The video synthesis device according to claim 1, wherein the synthesis unit combines the reference video and the self-video such that the change over time in the ratio at which the reference video and the self-video are combined is a sine wave.

3. The video synthesis device according to claim 1, further comprising a distance acquisition unit that acquires the distance between the imitation target and the motion target in the composite video,
wherein the synthesis unit performs the combination such that the greater the distance, the more of the self-video is displayed.

4. The video synthesis device according to claim 1, further comprising a background change unit that changes the background region of a change-target video, which is at least one of the reference video and the self-video, to transparent or to a predetermined single color,
wherein, for the change-target video, the synthesis unit combines the video as changed by the background change unit.

5. The video synthesis device according to claim 1 or 2, further comprising:
a specifying unit that specifies a point of interest in a specific-target video, which is at least one of the reference video and the self-video; and
a blurring processing unit that blurs the region other than the specified point of interest in the specific-target video,
wherein, for the specific-target video, the synthesis unit combines the video as blurred by the blurring processing unit.

6. The video synthesis device according to claim 5, wherein the specifying unit specifies, as the point of interest, a point where there is motion.

7. The video synthesis device according to claim 5, further comprising a reception unit that accepts, from the user, point-of-interest information indicating a point of interest,
wherein the specifying unit specifies the point of interest according to the point-of-interest information accepted by the reception unit.

8. A video synthesis method performed using a storage unit that stores a reference video, which is a video of the motion of an imitation target whose motion the user imitates, a video acquisition unit, a synthesis unit, and a video output unit, the method comprising:
a step in which the video acquisition unit acquires a self-video, which is a video of the motion of a motion target that the user moves;
a step in which the synthesis unit combines the reference video and the self-video to generate a composite video; and
a step in which the video output unit outputs the composite video,
wherein, in the step of generating the composite video, the reference video and the self-video are combined such that the ratio at which the reference video and the self-video are combined changes continuously and repeatedly over time.

9. A program that causes a computer capable of accessing a storage unit, which stores a reference video that is a video of the motion of an imitation target whose motion the user imitates, to function as:
a video acquisition unit that acquires a self-video, which is a video of the motion of a motion target that the user moves;
a synthesis unit that combines the reference video and the self-video to generate a composite video; and
a video output unit that outputs the composite video,
wherein the synthesis unit combines the reference video and the self-video such that the ratio at which the reference video and the self-video are combined changes continuously and repeatedly over time.