JP2023160275A

JP2023160275A - System, method, and program for three-dimensionally displaying two-dimensional moving images

Info

Publication number: JP2023160275A
Application number: JP2022070514A
Authority: JP
Inventors: 浩司大川; Koji Okawa; 洋輔佐藤; Yosuke Sato
Original assignee: Novius Inc
Current assignee: Novius Inc
Priority date: 2022-04-22
Filing date: 2022-04-22
Publication date: 2023-11-02

Abstract

To three-dimensionally display two-dimensional moving images captured by a monocular imaging apparatus. SOLUTION: A system for three-dimensionally displaying two-dimensional moving images of the present invention comprises: receiving means that receives the two-dimensional moving images; depth information acquisition means that acquires depth information from each of the frames of the two-dimensional moving images; image pair creation means that creates an image pair having parallax for each of the frames of the two-dimensional moving images by using the depth information acquired from each of the frames of the two-dimensional moving images; and display means that displays the image pair for each of the frames of the two-dimensional moving images. SELECTED DRAWING: Figure 1

Description

The present invention relates to a system, a method, and a program for displaying two-dimensional video three-dimensionally.

Compound-eye imaging devices capable of capturing three-dimensional images are known (for example, Patent Document 1).

Patent Document 1: JP 2009-48181 A

An object of the present invention is to make it possible to display three-dimensionally even a two-dimensional video that has no depth information.

The present invention provides, for example, a system for displaying two-dimensional video three-dimensionally that makes it possible to display three-dimensionally even a two-dimensional video that has no depth information.

In one embodiment, the present invention provides the following items.
(Item 1)
A system for displaying a two-dimensional video three-dimensionally, the system comprising:
receiving means for receiving a two-dimensional video;
depth information acquisition means for acquiring depth information from each frame of the two-dimensional video;
image pair creation means for creating, for each frame of the two-dimensional video, an image pair having parallax by using the depth information of that frame; and
display means for displaying the image pair of each frame of the two-dimensional video.
(Item 2)
The system according to item 1, wherein the image pair creation means creates, for each frame of the two-dimensional video, an image pair composed of the frame and a parallax image by:
determining a displacement amount of at least one pixel of the frame based on the depth information; and
creating the parallax image, in which parallax is added to the frame, by shifting the at least one pixel of the frame by the determined displacement amount.
(Item 3)
The system according to item 2, wherein the image pair creation means determines the displacement amount such that the smaller the depth, the larger the displacement amount.
(Item 4)
The system according to item 2 or item 3, wherein the image pair creation means determines, for each frame of the two-dimensional video, pixels to be processed and pixels not to be processed based on the depth information, and
the at least one pixel is at least one of the pixels to be processed.
(Item 5)
The system according to any one of items 2 to 4, wherein the image pair creation means, for each frame of the two-dimensional video:
determines an enlargement ratio or a reduction ratio of at least one region of the frame based on the depth information; and
enlarges or reduces the at least one region of the frame and the corresponding at least one region of the parallax image at the determined enlargement ratio or reduction ratio.
(Item 6)
The system according to item 5, wherein the image pair creation means determines the enlargement ratio and/or the reduction ratio such that the enlargement ratio is larger for a region with a smaller depth and/or the reduction ratio is larger for a region with a larger depth.
(Item 7)
The system according to any one of items 1 to 6, wherein the display means includes a lenticular display.
(Item 8)
A method for displaying a two-dimensional video three-dimensionally, the method comprising:
receiving a two-dimensional video;
acquiring depth information from each frame of the two-dimensional video;
creating, for each frame of the two-dimensional video, an image pair having parallax by using the depth information of that frame; and
displaying the image pair of each frame of the two-dimensional video.
(Item 9)
A program for displaying a two-dimensional video three-dimensionally, the program being executed on a computer including a processor unit and a display unit, the program causing the processor unit to perform processing comprising:
receiving a two-dimensional video;
acquiring depth information from each frame of the two-dimensional video;
creating, for each frame of the two-dimensional video, an image pair having parallax by using the depth information of that frame; and
displaying the image pair of each frame of the two-dimensional video on the display unit.

According to the present invention, even a two-dimensional image that has no depth information can be displayed three-dimensionally.

A diagram schematically showing an example of a flow for displaying a two-dimensional video three-dimensionally using the system of the present invention.
A diagram showing an example of the configuration of the system 100 for displaying a two-dimensional video three-dimensionally.
A diagram schematically showing an example of processing by the image pair creation means 130.
A diagram showing an example of the result of enlarging or reducing the image pair shown in FIG. 3A at an enlargement or reduction ratio that depends on depth.
A flowchart of an example of processing (processing 400) in the system 100 for displaying a two-dimensional video three-dimensionally.
A flowchart showing an example of the processing performed by the processor unit in step S403.
A diagram showing image pairs created in the example.
A diagram showing image pairs created in the example.
A diagram showing image pairs created in the example.
A diagram showing image pairs created in the example.

Embodiments of the present invention are described below with reference to the drawings.

(1. System for three-dimensionally displaying monocular endoscope video)
The inventors of the present invention recognized problems in microscopic surgery and endoscopic surgery. One of these problems is that two-dimensional video captured by a microscope or an endoscope does not appear three-dimensional. A skilled surgeon may be able to imagine the three-dimensional structure from the two-dimensional video, but an inexperienced surgeon cannot. Performing surgery while watching video acquired by a microscope or an endoscope is therefore very dangerous for an inexperienced surgeon.

One means of solving this problem is to use a compound-eye endoscope. A compound-eye endoscope can acquire two-dimensional video that has depth information. Because video obtained with a compound-eye endoscope can be displayed three-dimensionally, even an inexperienced surgeon can recognize the three-dimensional structure while operating. However, the inventors of the present invention considered the use of a compound-eye endoscope to have drawbacks. Specifically, compared with a monocular endoscope, a compound-eye endoscope performs poorly at high magnification: the image goes out of focus and blurs. In addition, displaying the two-dimensional video obtained with a compound-eye endoscope three-dimensionally requires wearing 3D glasses or the like, and watching video through 3D glasses for a long time tends to cause so-called 3D sickness. Furthermore, the need to wear 3D glasses or the like prevents the video from being shared among multiple people, which creates difficulties in, for example, medical education. Finally, compound-eye endoscopes are far more expensive than monocular endoscopes, so the cost of introducing them is very high.

As a result of intensive research, the inventors of the present invention developed a system for three-dimensionally displaying two-dimensional video obtained from a monocular endoscope, without relying on a compound-eye endoscope. In this specification, displaying a video three-dimensionally means displaying it so that the viewer perceives it as three-dimensional, in other words, displaying it so as to enable stereoscopic viewing; the displayed video itself need not be a three-dimensional video.

FIG. 1 schematically shows an example of a flow for displaying a two-dimensional video three-dimensionally using the system of the present invention.

First, a body part S within the field of view of a monocular endoscope 10 is imaged using the monocular endoscope 10. The body part S includes a structure A and a structure B. The three-dimensional shapes of structure A and structure B cannot be recognized from the video captured by the monocular endoscope 10, because that video is two-dimensional and lacks depth information.

In step S1, the two-dimensional video captured by the monocular endoscope 10 is input to the system 100 of the present invention. Here, a video is a sequence of consecutive still images, and in this specification each still image making up the video is referred to as a "frame." When the two-dimensional video is input, the system 100 creates an image pair for each frame of the two-dimensional video. The two images of an image pair have parallax with respect to each other; one serves as the right-eye frame and the other as the left-eye frame.

In step S2, the system 100 of the present invention displays the image pairs created from the video in succession. By viewing the right-eye frame with the right eye and the left-eye frame with the left eye, the viewer can perceive the subject shown in the image pair having parallax as three-dimensional. For example, a video 20 is presented to the viewer. In the video 20, the body part S imaged by the monocular endoscope 10 is displayed three-dimensionally, which allows the viewer to recognize the three-dimensional shapes of structure A and structure B. In the example shown in FIG. 1, the video 20 shows that structure A is recessed relative to its surroundings and structure B is raised relative to its surroundings.

The image pair can be displayed by any three-dimensional display method. For example, the image pair may be displayed so that the viewer can watch the video 20 by wearing a device for 3D display (for example, 3D glasses), or so that the viewer can watch the video 20 without wearing any dedicated device. Preferably, the image pair is displayed so that the viewer can watch the video 20 without wearing a dedicated device. This can be achieved, for example, by displaying the image pair on a special display device such as a lenticular display.

In this way, even a two-dimensional video captured by the monocular endoscope 10 can be displayed three-dimensionally by the system 100 of the present invention, so that, for example, a surgeon can operate while recognizing the three-dimensional structure. Moreover, compared with a compound-eye endoscope, the monocular endoscope 10 copes well with high magnification and stays in focus even at high magnification, so the three-dimensional shape of even fine structures can be recognized.

Furthermore, when the image pair is displayed on a lenticular display or the like so that no dedicated wearable device is required, the viewer is less prone to so-called 3D sickness. The video 20 can also be shared and viewed by multiple people, which is useful for medical education. In addition, a dedicated wearable device must be touched when it is put on and taken off; because no such device is required, the system 100 of the present invention is a contactless tool, which is also useful in the post-COVID-19 era.

A further advantage is that an expensive compound-eye endoscope no longer needs to be introduced, so safe surgery can be performed at low cost.

In addition, the system 100 of the present invention can output image pairs in substantially real time (for example, with a delay of about 20 to 50 milliseconds or less relative to the time at which the two-dimensional video is input, preferably about 30 milliseconds or less, and typically about 33 milliseconds or less). The human visual delay discrimination threshold is said to be 20 to 30 milliseconds, so a person perceives a delay of about 50 milliseconds or less, preferably about 30 milliseconds or less, as real time. The ability of the system to output image pairs in substantially real time is particularly desirable in surgical applications, because a gap between the time the video is captured and the time it is displayed makes accurate surgery impossible for the surgeon.
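The delay figures above can be turned into a simple per-frame check. The following is a minimal sketch, not part of the disclosed system: the names `measure_latency_ms` and `classify_latency` and the two constants are hypothetical, and the thresholds simply restate the "about 50 ms" and "about 30 ms" figures from the text. As a side observation, one frame period at 30 frames per second is roughly 33 ms; the source does not say whether that is the origin of its 33 ms figure.

```python
import time

# Delay budgets restated from the text, in milliseconds (constant names are hypothetical).
PERCEIVED_REALTIME_MS = 50.0   # "about 50 milliseconds or less"
PREFERRED_MS = 30.0            # "preferably about 30 milliseconds or less"


def frame_interval_ms(fps: float) -> float:
    """Time between consecutive frames at the given frame rate."""
    return 1000.0 / fps


def measure_latency_ms(process, frame):
    """Run process(frame) once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = process(frame)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def classify_latency(latency_ms: float) -> str:
    """Compare a measured delay with the two budgets quoted in the text."""
    if latency_ms <= PREFERRED_MS:
        return "preferred"
    if latency_ms <= PERCEIVED_REALTIME_MS:
        return "acceptable"
    return "too slow"
```

In practice `process` would be whatever function turns one input frame into an image pair, and the measurement would be repeated over many frames rather than taken once.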

The example above was described using a two-dimensional video captured by the monocular endoscope 10, but what the system 100 of the present invention can display three-dimensionally is not limited to such video. The system 100 can display any other two-dimensional video three-dimensionally, as long as the two-dimensional video does not have depth information. The two-dimensional video may be, for example, an existing animation, movie, TV program, or drama, which would then be displayed three-dimensionally. This may create a new user experience.

The system 100 of the present invention described above can have the configuration described below.

(2. Configuration of the system for three-dimensionally displaying two-dimensional video)
FIG. 2 shows an example of the configuration of the system 100 for displaying a two-dimensional video three-dimensionally.

The system 100 includes receiving means 110, depth information acquisition means 120, image pair creation means 130, and display means 140.

The receiving means 110 is configured to receive a two-dimensional video.

The two-dimensional video may be any two-dimensional video. Preferably, it is a video captured by a monocular imaging device (for example, a monocular endoscope). The two-dimensional video may also be, for example, an existing animation, movie, TV program, or drama.

The two-dimensional video received by the receiving means 110 is passed to the depth information acquisition means 120 and the image pair creation means 130.
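The data flow among the four means can be pictured as a per-frame loop. The sketch below is only an illustration under stated assumptions: the function `run_pipeline`, the type aliases, and the idea of passing each means in as a callable are hypothetical and are not taken from the source, which does not prescribe any software structure.

```python
from typing import Callable, Iterable, List, Tuple

Frame = List[List[int]]        # one still image as rows of pixel values (assumed representation)
DepthMap = List[List[float]]   # one depth value per pixel (assumed representation)
ImagePair = Tuple[Frame, Frame]


def run_pipeline(
    frames: Iterable[Frame],
    acquire_depth: Callable[[Frame], DepthMap],
    create_pair: Callable[[Frame, DepthMap], ImagePair],
    display: Callable[[ImagePair], None],
) -> int:
    """Process a video frame by frame and return the number of frames shown."""
    shown = 0
    for frame in frames:                   # role of the receiving means 110
        depth = acquire_depth(frame)       # role of the depth information acquisition means 120
        pair = create_pair(frame, depth)   # role of the image pair creation means 130
        display(pair)                      # role of the display means 140
        shown += 1
    return shown
```

Note that the frame is handed both to the depth step and to the pair-creation step, mirroring the statement that the received video is passed to both means 120 and 130.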

The depth information acquisition means 120 is configured to acquire depth information from each frame of the two-dimensional video received by the receiving means 110. Depth information is information representing the distance (depth) from a predetermined reference point (for example, the camera position) to the subject. The depth information acquisition means 120 may, for example, acquire depth information for each pixel, or acquire depth information for each region containing a plurality of pixels. Acquiring depth information per region rather than per pixel can reduce the load of the acquisition processing and speed it up. The depth information may be represented, for example, as a depth map, or in a form such as a matrix that describes each pixel and its depth.
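To make the difference between per-pixel and per-region depth information concrete, the sketch below collapses a per-pixel depth map into one value per square block. It is an illustration only: the function name `region_depth`, the square-block layout, and the use of the mean are assumptions. It shows the coarser representation; the load reduction described above would come from estimating depth at that coarser granularity in the first place, which this sketch does not do.

```python
from typing import List


def region_depth(depth: List[List[float]], block: int) -> List[List[float]]:
    """Average a per-pixel depth map over block x block regions.

    Regions at the right and bottom edges may be smaller than `block`.
    """
    rows, cols = len(depth), len(depth[0])
    coarse = []
    for r0 in range(0, rows, block):
        coarse_row = []
        for c0 in range(0, cols, block):
            values = [
                depth[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            ]
            coarse_row.append(sum(values) / len(values))
        coarse.append(coarse_row)
    return coarse
```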

The depth information acquisition means 120 can acquire the depth information of each frame using, for example, a machine learning model. The machine learning model is a model that has learned the relationship between an image and the depth information of that image. Such a model can be built by training on a plurality of images, using each image as input training data and the depth information of that image as output training data. Alternatively, it can be built by training on a plurality of images, using each image as input training data and the depth map of that image as output training data. When an image is input to such a model, the depth information or depth map of that image is output.

The images used for training are preferably images captured by a compound-eye imaging device, because depth information can be acquired easily, or a depth map created easily, from such images. The technique for acquiring depth information or creating a depth map from an image captured by a compound-eye imaging device may be any technique known in the art.

The depth information of each pixel of each frame of the two-dimensional video, created by the depth information acquisition means 120, is passed to the image pair creation means 130.

The image pair creation means 130 is configured to create, for each frame of the two-dimensional video, an image pair having parallax by using the depth information. The image pair may be a pair consisting of the frame and an image created by the image pair creation means 130, or a pair consisting of a first image and a second image both created by the image pair creation means 130. Preferably, the image pair consists of the frame and an image created by the image pair creation means 130, because using the frame itself as one image of the pair reduces the processing performed by the image pair creation means 130 and can thus speed up processing.

The frame and the created image have parallax with respect to each other, as do the first image and the second image. Here, two images having parallax means that the position of the same subject in the two images is shifted by a displacement amount corresponding to the difference between the viewpoints (for example, camera positions) of the two images. For example, two images of the same subject captured from two different viewpoints have parallax. In an image pair created by the image pair creation means 130, the position of a subject in the frame and the position of the same subject in the created image are shifted relative to each other by a displacement amount corresponding to the distance between a person's eyes; alternatively, the position of a subject in the created first image and the position of the same subject in the created second image are shifted relative to each other by such a displacement amount. The image pair creation means 130 can, for example, create an image pair in which the positional shift between the two images is larger for a nearer subject (smaller depth) and smaller for a farther subject (larger depth).

For each frame of the two-dimensional video, the image pair creation means 130 can create an image pair composed of the frame and a parallax image by determining a displacement amount of at least one pixel of the frame based on the depth information, and creating the parallax image, in which parallax is added to the frame, by shifting the at least one pixel of the frame by the determined displacement amount. In doing so, the image pair creation means 130 can determine the displacement amount such that the smaller the depth, the larger the displacement amount. The depth and the displacement amount may have, for example, a linear relationship. For example, if a first pixel in the frame has a first depth and a second pixel has a depth twice the first depth, the displacement amount of the first pixel may be twice that of the second pixel.
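One way to turn a depth into a displacement amount is sketched below. It assumes an inverse-proportional mapping, chosen only because it reproduces the doubling example above (twice the depth gives half the displacement); the source does not give a formula. The function name `displacement_px`, the `scale` constant, and the `max_shift` cap are hypothetical.

```python
def displacement_px(depth: float, scale: float = 1.0, max_shift: int = 8) -> int:
    """Return a whole-pixel displacement that grows as depth shrinks.

    Assumed mapping (not from the source): displacement = scale / depth,
    rounded to the nearest pixel and capped at max_shift.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return min(max_shift, round(scale / depth))
```

With `scale=1.0`, depths of 0.5, 1.0, and 10 map to 2, 1, and 0 pixels respectively, which are the same values used in the FIG. 3A example described in the text. The cap keeps very small depths from producing shifts wider than the image.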

Shifting pixels by a displacement amount that depends on depth in this way makes the created image pair appear more natural when displayed three-dimensionally, because shifting a subject with a small depth (near) by a larger amount and a subject with a large depth (far) by a smaller amount is consistent with the physiology of human binocular vision.

FIG. 3A schematically shows an example of processing by the image pair creation means 130. FIG. 3A(a) represents a frame of the video, and FIG. 3A(b) represents the created parallax image. For simplicity, each image is described as having 6×7 pixels. Pixels shown in light gray have a depth of 0.5, pixels shown in dark gray have a depth of 1.0, and pixels shown in white have a depth of 10. A smaller depth means the pixel is closer to the viewpoint of the image (for example, the camera position).

When the frame shown in FIG. 3A(a) is input to the image pair creation means 130, the image pair creation means 130 determines, for example, a displacement amount of 2 pixels for the pixels shown in light gray, 1 pixel for the pixels shown in dark gray, and 0 pixels for the pixels shown in white. The image pair creation means 130 then creates the parallax image shown in FIG. 3A(b) by shifting each pixel by its determined displacement amount. For example, shifting the light gray pixel at (3, 1) in FIG. 3A(a) two pixels to the right on the page places the corresponding pixel at (3, 3) in FIG. 3A(b). Likewise, shifting the dark gray pixel at (2, 2) in FIG. 3A(a) one pixel to the right places the corresponding pixel at (2, 3) in FIG. 3A(b). In this way, an image pair composed of the frame shown in FIG. 3A(a) and the parallax image shown in FIG. 3A(b) is created.
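The shifting step can be sketched as follows. This is an illustration under assumptions the source does not state: the function name `make_parallax_image` is hypothetical, positions vacated by a shifted pixel are painted with a `fill` value, pixels pushed past the right edge are dropped, and when two pixels land on the same position the one with the larger shift (the nearer one) wins.

```python
from typing import List


def make_parallax_image(
    frame: List[List[int]],
    shifts: List[List[int]],
    fill: int = 0,
) -> List[List[int]]:
    """Shift each pixel to the right by its displacement amount (in pixels)."""
    rows, cols = len(frame), len(frame[0])
    out = [row[:] for row in frame]          # pixels with zero shift stay in place

    moving = [
        (shifts[r][c], r, c)
        for r in range(rows)
        for c in range(cols)
        if shifts[r][c] > 0
    ]

    for _, r, c in moving:                   # clear the positions being vacated
        out[r][c] = fill

    for shift, r, c in sorted(moving):       # small shifts first, larger shifts drawn on top
        target = c + shift
        if target < cols:                    # drop pixels that leave the image
            out[r][target] = frame[r][c]
    return out


# One-row example: the pixel with value 9 moves two columns to the right.
frame = [[0, 9, 0, 0, 0]]
shifts = [[0, 2, 0, 0, 0]]
image_pair = (frame, make_parallax_image(frame, shifts))
```

The original frame is kept unchanged as one image of the pair, in line with the preference stated earlier for reusing the frame itself.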

In one embodiment, the image pair creation means 130 may determine, based on the depth information, which pixels are to be processed and which are not, perform processing only on the pixels to be processed (specifically, determining a displacement amount and shifting by that amount), and do nothing to the pixels that are not to be processed. This reduces the resources required for processing, which lightens the load on the computer and speeds up processing. This embodiment is preferable for displaying image pairs in substantially real time.

The image pair creation means 130 can, for example, determine that pixels whose depth is less than a predetermined threshold are to be processed and that pixels whose depth is equal to or greater than the threshold are not to be processed. The predetermined threshold may be a fixed value or a variable value. When it is a variable value, it may be, for example, the mean, median, first quartile, or third quartile of the depths of all pixels, or, for example, 90%, 80%, 70%, 60%, 40%, or 30% of the maximum depth.

For example, in the example described above with reference to FIG. 3A, the image pair creation means 130 determines, based on the depth information, that the pixels represented in white and the pixels represented in dark gray should not be processed, and performs no processing on them. On the other hand, the image pair creation means 130 determines, based on the depth information, that the pixels represented in light gray should be processed, determines their displacement amount, and shifts them by that amount. Because the pixels represented in white and in dark gray need not be processed, the resources required for processing are reduced and processing is faster.
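The threshold-based selection can be sketched as below. The function names `depth_threshold` and `pixels_to_process`, the mode strings, and the sample depth map are hypothetical; only the threshold choices themselves (fixed value, mean, median, quartiles, fraction of the maximum depth) come from the text.

```python
from statistics import mean, median, quantiles


def depth_threshold(depths, mode="fixed", value=1.0):
    """Return a threshold for a 2D depth map (list of rows).

    mode: "fixed", "mean", "median", "q1", "q3", or "max_fraction"
    value: the fixed threshold, or the fraction used by "max_fraction"
    """
    flat = [d for row in depths for d in row]
    if mode == "fixed":
        return value
    if mode == "mean":
        return mean(flat)
    if mode == "median":
        return median(flat)
    if mode == "q1":
        return quantiles(flat, n=4)[0]
    if mode == "q3":
        return quantiles(flat, n=4)[2]
    if mode == "max_fraction":
        return value * max(flat)
    raise ValueError(f"unknown mode: {mode}")


def pixels_to_process(depths, threshold):
    """Return (row, column) indices of pixels whose depth is below the threshold."""
    return [
        (r, c)
        for r, row in enumerate(depths)
        for c, d in enumerate(row)
        if d < threshold
    ]


# Hypothetical depth map: 0.5 = light gray, 1.0 = dark gray, 10 = white.
depths = [
    [10, 10, 10],
    [10, 1.0, 10],
    [0.5, 10, 10],
]
selected = pixels_to_process(depths, depth_threshold(depths, "fixed", 1.0))
```

With a fixed threshold of 1.0, only the light gray pixel (depth 0.5) is selected, which mirrors the example in which the dark gray and white pixels are left untouched.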

In addition to determining the displacement amount as described above, the image pair creation means 130 may, for each frame of the two-dimensional moving image, determine an enlargement rate or reduction rate for at least one region of the frame based on the depth information, and enlarge or reduce the at least one region of the frame and the corresponding at least one region of the parallax image at the determined rate. The at least one region may be, for example, a region containing pixels having the same or similar depths. In this specification, a similar depth means a depth within ±10%. The image pair creation means 130 can determine the enlargement rate and/or reduction rate such that a region with a smaller depth has a larger enlargement rate and/or a region with a larger depth has a larger reduction rate. The depth and the enlargement and reduction rates may have, for example, a linear relationship. For example, when a first region in the frame has a first depth and a second region has a depth twice the first depth, the enlargement rate of the first region may be twice that of the second region.
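One possible way to turn depth into an enlargement rate is sketched below. The worked example in the text (doubling the depth halves the rate) corresponds to a rate inversely proportional to depth, so the sketch assumes that form. The function names, the reference depth, the clamping limits, and the reading of "within ±10%" as relative to a reference depth are all assumptions for illustration.

```python
def enlargement_rate(depth, reference_depth=1.0, min_rate=1.0, max_rate=2.0):
    """Return a scale factor (1.0 = unchanged) that grows as depth shrinks.

    The rate is reference_depth / depth, clamped to [min_rate, max_rate].
    Setting min_rate below 1.0 allows far regions to be reduced.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    rate = reference_depth / depth
    return max(min_rate, min(max_rate, rate))


def has_similar_depth(depth, reference, tolerance=0.10):
    """Return True if `depth` lies within +/- tolerance of `reference`."""
    return abs(depth - reference) <= tolerance * reference


near = enlargement_rate(0.5)                 # nearer than the reference depth
far = enlargement_rate(10.0)                 # clamped: left unchanged
far_reduced = enlargement_rate(10.0, min_rate=0.5)
```

Clamping keeps distant regions at their original size by default, while lowering `min_rate` permits the reduction of far regions that the text also allows.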

Enlarging or reducing at a rate that depends on depth in this way causes the created image pair to be displayed three-dimensionally with a more natural appearance. Making a subject with a small depth (near) larger and a subject with a large depth (far) smaller is consistent with human physiological binocular perception and can therefore enhance the effect of sensory stereopsis. Furthermore, such enlargement or reduction changes where the viewer focuses within the image, so the edges of the subject appear emphasized. As a result, for example, a near subject appears sharp, with continuous contours and/or good resolution, whereas a far subject appears blurred, with discontinuous contours and/or poor resolution. This, too, can enhance the effect of sensory stereopsis.

FIG. 3B shows an example of the result of applying depth-dependent enlargement/reduction to the image pair shown in FIG. 3A. FIG. 3B(a) represents the frame of the moving image, and FIG. 3B(b) represents the created parallax image. As described above with reference to FIG. 3A, the pixels represented in light gray have a depth of 0.5, the pixels represented in dark gray have a depth of 1.0, and the pixels represented in white have a depth of 10.

For the image pair shown in FIG. 3A, the image pair creation means 130 determines, for example, that the enlargement rate of the region represented in light gray is 160% and that the enlargement rate of the regions represented in dark gray and in white is 100% (that is, no enlargement or reduction). The image pair creation means 130 then creates the image pair shown in FIG. 3B by enlarging each region at its determined rate. That is, an image pair is created that consists of the image shown in FIG. 3B(a), in which at least one region of the frame of the moving image has been enlarged, and the parallax image shown in FIG. 3B(b), in which the corresponding at least one region has been enlarged.

In this example as well, the image pair creation means 130 may determine, based on the depth information, which pixels should be processed and which should not, process only the pixels that should be processed (specifically, determine an enlargement/reduction rate and enlarge or reduce accordingly), and do nothing to the pixels that should not be processed. This has the advantage that an image pair with a strong sensory stereoscopic effect can be created quickly and with a reduced load on the computer.

The created image pair is passed to the display means 140.

The display means 140 is configured to display the image pair created for each frame of the two-dimensional moving image. The display means 140 can display the image pair using any three-dimensional display method. For example, the display means 140 may display the image pair such that a viewer sees the three-dimensional moving image formed by the image pairs by wearing equipment for 3D display (for example, 3D glasses), or such that the viewer sees it without needing to wear dedicated equipment. Preferably, the display means 140 displays the image pair such that the viewer can see the three-dimensional moving image without needing to wear dedicated equipment. This makes the viewer less prone to so-called 3D sickness and also allows several people to view the three-dimensional moving image together. Moreover, whereas dedicated wearable equipment must be touched to put it on and take it off, the system 100 of the present invention, by not requiring such equipment, serves as a contactless tool and is useful in the post-COVID era as well.

Specifically, the display means 140 may be a lenticular display. A lenticular display is a display in which lenticular lenses are arranged on the display surface, and it allows three-dimensional images to be viewed without wearing equipment for 3D display.

In this manner, the system 100 of the present invention creates an image pair from each frame of the input moving image and displays the image pairs so that the moving image appears three-dimensional. Because the system 100 of the present invention does not need processing such as combining the image pairs to generate a three-dimensional moving image, the load can be reduced and processing can be made faster. This can lead to near real-time display. In addition, selectively processing only the pixels that are to be shifted by a large displacement amount, selectively processing only the regions that are to be enlarged or reduced at a large rate, and/or selectively processing only the pixels or regions that should be processed can also contribute to near real-time display.

The system 100 described above can be implemented, for example, as a computer device.

The computer device includes an input unit, a memory unit, a processor unit, and a display unit.

The input unit is configured to allow input from outside the computer device. A two-dimensional moving image can be input to the input unit, in any manner. For example, the two-dimensional moving image may be input directly from an imaging device, may be input from an imaging device via a network (for example, a LAN or the Internet), or may be input via a storage medium. Preferably, the two-dimensional moving image is input directly from an imaging device, because imaging by the imaging device and display of the moving image by the computer device can then be performed substantially simultaneously.

For example, the receiving means 110 of the system 100 of the present invention can be implemented by the input unit.

The memory unit stores programs required for the computer device to execute processing, data required to execute those programs, and the like. For example, it stores part or all of a program for causing the computer device to perform processing for displaying a two-dimensional moving image three-dimensionally (for example, a program that implements the processing shown in FIGS. 4A and 4B described later). How the program is stored in the memory unit does not matter. For example, the program may be preinstalled in the memory unit. Alternatively, the program may be installed in the memory unit by being downloaded via a network. The program may be stored on a computer-readable tangible storage medium. The memory unit may be implemented by any storage means.

The processor unit controls the operation of the computer device as a whole. The processor unit reads a program stored in the memory unit and executes it. This allows the computer device to function as a device that executes the desired steps. The processor unit may be implemented by a single processor or by a plurality of processors.

For example, at least part of the receiving means 110, the depth information acquisition means 120, the image pair creation means 130, and the display means 140 of the system 100 of the present invention may be implemented by the processor unit.

The display unit is configured to display images. The display unit can display the created image pairs. The display unit may be any display device and is preferably a lenticular display.

For example, the display means 140 of the system 100 of the present invention may be implemented by the display unit.

Each component of the computer device may be provided inside or outside the computer device. For example, when the input unit, memory unit, processor unit, and display unit are each configured as separate hardware components, the hardware components may be connected via any network, of any type. The hardware components may be connected, for example, via a LAN, wirelessly, or by wire. The computer device is not limited to a specific hardware configuration. For example, configuring the processor unit with an analog circuit rather than a digital circuit is also within the scope of the present invention. The configuration of the computer device is not limited to that described above as long as its functions can be realized.

The computer device can be used, for example, as a surgery support device. In this case, a two-dimensional moving image of the surgical site captured by a surgical imaging device (for example, a monocular endoscope) is input to the input unit, and the display unit displays that two-dimensional moving image of the surgical site three-dimensionally.

The computer device can also be used, for example, as a moving image playback device. In this case, an existing two-dimensional moving image (for example, an animation, a movie, a TV program, or a drama) is input to the input unit, and the display unit displays that existing two-dimensional moving image three-dimensionally.

(3. Processing in the system for three-dimensionally displaying two-dimensional moving images)
FIG. 4A shows a flowchart of an example of processing (process 400) in the system 100 for three-dimensionally displaying two-dimensional moving images. In this example, the system 100 is described as being implemented by a computer device. The process 400 is executed in the processor unit of the computer device.

In step S401, the processor unit receives, from the input unit, the two-dimensional moving image that was input to the input unit. The two-dimensional moving image may be any two-dimensional moving image. It is preferably a moving image captured by a monocular imaging device (for example, a monocular endoscope). It may also be, for example, an existing animation.

In step S402, the processor unit acquires depth information from each frame of the two-dimensional moving image received in step S401.

The processor unit can acquire the depth information of each frame using, for example, a machine learning model. The machine learning model is a model that has learned the relationship between an image and the depth information of that image. When each frame of the two-dimensional moving image received in step S401 is input to such a machine learning model, the depth information of that frame is output.

In step S403, the processor unit creates an image pair having parallax for each frame of the two-dimensional moving image received in step S401, using the depth information acquired in step S402 for that frame. The image pair may be a pair of the frame and an image created by the image pair creation means 130, or a pair of a first image and a second image both created by the image pair creation means 130. The image pair creation means 130 can create, for example, an image pair in which the nearer a subject is (the smaller its depth), the larger the positional offset between the two images, and the farther a subject is (the larger its depth), the smaller the positional offset between the two images.

In step S403, the processor unit can create, for each frame of the two-dimensional moving image, an image pair consisting of the frame and a parallax image by determining the displacement amount of at least one pixel of the frame based on the depth information and then creating the parallax image, that is, an image given parallax relative to the frame, by shifting the at least one pixel by the determined displacement amount. Here, the processor unit can determine the displacement amount such that the smaller the depth, the larger the displacement amount. The depth and the displacement amount may have, for example, a linear relationship. For example, when a first pixel in the frame has a first depth and a second pixel has a depth twice the first depth, the displacement amount of the first pixel may be twice that of the second pixel.

In step S403, the processor unit may determine, based on the depth information, which pixels should be processed and which should not, process only the pixels that should be processed (specifically, determine a displacement amount and shift the pixel by that amount), and do nothing to the pixels that should not be processed. This reduces the resources required for processing, which lightens the load on the computer and speeds up processing.

In step S403, in addition to the processing described above, the processor unit may, for each frame of the two-dimensional moving image, determine an enlargement rate or reduction rate for at least one region of the frame based on the depth information, and enlarge or reduce the at least one region of the frame and the corresponding at least one region of the parallax image at the determined rate. Here, the processor unit can determine the enlargement rate and/or reduction rate such that a region with a smaller depth has a larger enlargement rate and/or a region with a larger depth has a larger reduction rate. The at least one region may be, for example, a region containing pixels having the same or similar depths.

In step S404, the processor unit displays on the display unit the image pair created in step S403 for each frame of the two-dimensional moving image. The processor unit can display the image pair on the display unit using any three-dimensional display method. For example, the processor unit may display the image pair on the display unit such that a viewer sees the three-dimensional moving image formed by the image pairs by wearing equipment for 3D display (for example, 3D glasses), or such that the viewer sees it without needing to wear dedicated equipment. Preferably, the processor unit displays the image pair on the display unit such that the viewer can see the three-dimensional moving image without needing to wear dedicated equipment. This makes the viewer less prone to so-called 3D sickness and also allows several people to view the three-dimensional moving image together.
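Steps S401 to S404 can be summarized in a short Python sketch. The callables `estimate_depth`, `create_pair`, and `display` are hypothetical stand-ins for the depth model, the image pair creation, and the display unit; the specification does not name any particular function or library.

```python
from typing import Callable, Iterable, List, Tuple

Frame = List[List[int]]
DepthMap = List[List[float]]
ImagePair = Tuple[Frame, Frame]


def process_400(
    frames: Iterable[Frame],
    estimate_depth: Callable[[Frame], DepthMap],
    create_pair: Callable[[Frame, DepthMap], ImagePair],
    display: Callable[[ImagePair], None],
) -> int:
    """Run S401 to S404 frame by frame and return the number of frames shown."""
    shown = 0
    for frame in frames:                  # S401: receive the moving image
        depth = estimate_depth(frame)     # S402: depth information per frame
        pair = create_pair(frame, depth)  # S403: image pair with parallax
        display(pair)                     # S404: display the pair
        shown += 1
    return shown


# Usage with trivial stand-ins (assumptions for illustration only).
displayed = []
count = process_400(
    frames=[[[1, 2]], [[3, 4]]],
    estimate_depth=lambda f: [[1.0 for _ in row] for row in f],
    create_pair=lambda f, d: (f, [row[:] for row in f]),
    display=displayed.append,
)
```

Each pair is handed to the display as soon as it is created, so no separate step that assembles a three-dimensional moving image is needed.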

FIG. 4B is a flowchart showing an example of the processing performed by the processor unit in step S403. Step S403 is performed in the same way for each frame of the two-dimensional moving image. The processing is described here for a single frame of the two-dimensional moving image.

In step S4031, the processor unit determines, based on the depth information, which pixels should be processed and which should not. For example, the processor unit can determine that pixels whose depth is less than a predetermined threshold should be processed and that pixels whose depth is greater than or equal to the predetermined threshold should not be processed. The predetermined threshold may be a fixed value or a variable value. When it is a variable value, the predetermined threshold may be, for example, the mean, median, first quartile, or third quartile of the depths of all pixels, or may be, for example, 90%, 80%, 70%, 60%, 40%, or 30% of the maximum value.

In step S4032, i is set to 1.

In steps S4033 to S4034, the pixels determined in step S4031 to be pixels that should be processed are processed in order. Pixels determined to be pixels that should not be processed are not processed at all. This reduces the resources required for processing, which lightens the load on the computer and speeds up processing.

The pixels are preferably processed in steps S4033 to S4034 in order from the pixel with the largest depth to the pixel with the smallest depth. This is because errors such as a farther subject being drawn over a nearer subject are then less likely to occur in the created parallax image.

In step S4033, the processor unit determines, based on the depth information, the displacement amount of the i-th pixel among the pixels determined to be pixels that should be processed. The processor unit can determine the displacement amount of the i-th pixel according to, for example, a predetermined relationship between depth information and displacement amount. This predetermined relationship may be, for example, linear, and may be expressed as (displacement amount) = α × (depth) + β, where α and β are constants.
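As a worked illustration of the linear relationship, the sketch below derives α and β from two (depth, displacement) points and then evaluates the formula. The two points are taken from the FIG. 3A values (depth 0.5 shifted by 2 pixels, depth 1.0 shifted by 1 pixel); the function names, the rounding to whole pixels, and the clamping of negative results to zero are assumptions.

```python
def fit_linear(depth_a, shift_a, depth_b, shift_b):
    """Return (alpha, beta) of the line through two (depth, displacement) points."""
    alpha = (shift_b - shift_a) / (depth_b - depth_a)
    beta = shift_a - alpha * depth_a
    return alpha, beta


def displacement(depth, alpha, beta):
    """Evaluate alpha * depth + beta, rounded to whole pixels and never negative."""
    return max(0, round(alpha * depth + beta))


# alpha is negative because a smaller depth must give a larger displacement.
alpha, beta = fit_linear(0.5, 2, 1.0, 1)
shifts = [displacement(d, alpha, beta) for d in (0.5, 1.0, 10)]
```

With these two points, α = -2 and β = 3. The clamp returns a displacement of 0 for the white pixels (depth 10), consistent with the FIG. 3A example in which those pixels are not moved.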

In step S4034, the processor unit shifts the i-th pixel in the frame to the right or to the left by the displacement amount determined in step S4033. The direction of the shift is determined according to whether the parallax image being created is to be the image for the left eye or the image for the right eye.

In step S4035, i is incremented (that is, i = i + 1).

In step S4036, it is determined whether all of the pixels determined in step S4031 to be pixels that should be processed have been processed. If so, the process proceeds to step S4037. If not, the process returns to step S4033, and steps S4033 to S4036 are repeated until it is determined that all of the pixels have been processed.

In step S4037, the parallax image, in which pixels have been shifted based on the depth information, is completed, and an image pair consisting of the frame and the created parallax image is created.
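The flow of steps S4031 to S4037 for a single frame might look like the following sketch. It assumes a grayscale frame stored as a list of rows, the linear relationship with constants `alpha` and `beta`, a `direction` of +1 for a shift to the right and -1 for a shift to the left, and a `fill` value written to vacated positions; none of these representation choices is prescribed by the text.

```python
def create_image_pair(frame, depths, threshold, alpha, beta, direction=1, fill=0):
    """Return (frame, parallax_image) following steps S4031 to S4037."""
    height, width = len(frame), len(frame[0])

    # S4031: select the pixels whose depth is below the threshold.
    targets = [
        (r, c) for r in range(height) for c in range(width)
        if depths[r][c] < threshold
    ]
    # Far-to-near order, so that nearer pixels are written last and stay on top.
    targets.sort(key=lambda rc: depths[rc[0]][rc[1]], reverse=True)

    # Unselected pixels are copied unchanged; positions vacated by selected
    # pixels are set to `fill` (an assumption).
    parallax = [row[:] for row in frame]
    for r, c in targets:
        parallax[r][c] = fill

    i = 0  # S4032 (0-based here; the text counts from 1)
    while i < len(targets):  # S4036: repeat until every selected pixel is done
        r, c = targets[i]
        shift = max(0, round(alpha * depths[r][c] + beta))  # S4033
        new_c = c + direction * shift                       # S4034
        if 0 <= new_c < width:
            parallax[r][new_c] = frame[r][c]
        i += 1                                              # S4035

    return frame, parallax  # S4037


frame = [[5, 7, 0, 0, 0]]
depths = [[0.5, 1.0, 10, 10, 10]]
pair = create_image_pair(frame, depths, threshold=2.0, alpha=-2, beta=3)
```

Here both selected pixels land on the same target column. Because the farther pixel (value 7) is written first and the nearer pixel (value 5) last, the nearer pixel remains visible, which is the reason given in the text for processing from large depth to small depth.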

Note that, before step S4037, an enlargement rate or reduction rate for at least one region of the frame may be determined based on the depth information, and the at least one region of the frame and the corresponding at least one region of the parallax image may be enlarged or reduced at the determined rate. With depth-dependent enlargement or reduction, the image pair created in step S4037 can be displayed three-dimensionally with a more natural appearance. Making a subject with a small depth (near) larger and a subject with a large depth (far) smaller is consistent with human physiological binocular perception and can therefore enhance the effect of sensory stereopsis. Furthermore, such enlargement or reduction changes where the viewer focuses within the image, so the edges of the subject appear emphasized. As a result, for example, a near subject appears sharp, with continuous contours and/or good resolution, whereas a far subject appears blurred, with discontinuous contours and/or poor resolution. This, too, can enhance the effect of sensory stereopsis.

In this manner, the process 400 creates image pairs from the input two-dimensional moving image and displays them so that the two-dimensional moving image appears three-dimensional. The process 400 does not need processing such as combining the image pairs to generate a three-dimensional moving image, so the load can be reduced and processing can be made faster. This can lead to near real-time display. In addition, selectively processing only the pixels that are to be shifted by a large displacement amount, selectively processing only the regions that are to be enlarged or reduced at a large rate, and/or selectively processing only the pixels or regions that should be processed can also contribute to near real-time display.

The example above describes the steps as being performed in a specific order, but the order of the steps is not limited to the one described. The steps can be performed in any logically possible order. For example, step S4034 may be skipped after step S4033 and instead performed after it is determined in step S4036 that all pixels that should be processed have been handled, at which point the target pixels are shifted all at once by their determined displacement amounts.

In the example described above, the processing of each step shown in FIGS. 4A and 4B was described as being realized by the processor unit and a program stored in the memory unit, but the present invention is not limited to this. At least one of the steps shown in FIGS. 4A and 4B may be realized by a hardware configuration such as a control circuit.

The present invention is not limited to the embodiments described above. It is understood that the scope of the present invention should be interpreted only by the claims. It is also understood that those skilled in the art can, from the description of the specific preferred embodiments of the present invention, implement an equivalent scope based on the description of the present invention and common general technical knowledge.

手術中に撮影された２次元動画を用いて、本発明のシステムにより、画像対を作成した。 Image pairs were created by the system of the present invention using two-dimensional videos taken during surgery.

手術に用いた顕微鏡には、下記のカメラが搭載されていた。
・ＮＩＲカメラ×１（ＩＣＧ用）
・白色光カメラ×２（左眼用、右眼用）
（センターサイズ：１／１．２インチＣＭＯＳ）
The microscope used for the surgery was equipped with the following cameras.
・NIR camera x 1 (for ICG)
・White light camera x 2 (for left eye, for right eye)
(Center size: 1/1.2 inch CMOS)

解像度は、１０８０ｐ（ＦｕｌｌＨＤ）であった。 The resolution was 1080p (Full HD).

搭載されていたカメラのうち、右眼用の白色光カメラを用いて、撮影を行った。左眼用の白色光カメラ、ＮＩＲカメラは用いなかった。 Of the mounted cameras, the white light camera for the right eye was used for imaging. The white light camera for the left eye and the NIR camera were not used.

右眼用の白色光カメラによって撮影された２次元動画を本発明のシステムにより処理した。本発明のシステムを、ＮＶＩＤＩＡ（ＧＰＵ）のコンピュータを用いて、Ｐｙｔｈｏｎのプログラム言語により実装した。 The two-dimensional video captured by the white light camera for the right eye was processed by the system of the present invention. The system of the present invention was implemented in the Python programming language on a computer with an NVIDIA GPU.

図５Ａ～図５Ｄは、本実施例で作成された画像対を示す。 5A-5D show image pairs created in this example.

図５Ａ～図５Ｄでは、左側の（ａ）が白色光カメラによって撮影された２次元動画のフレームであり、右側の（ｂ）が本発明のシステムによって作成されたフレームである。 In FIGS. 5A to 5D, (a) on the left is a frame of a two-dimensional video captured by a white light camera, and (b) on the right is a frame created by the system of the present invention.

図５Ａ～図５Ｄにおいて、右眼で左側の（ａ）のフレーム（実際の映像）を見て、左眼で右側の（ｂ）のフレーム（本発明のシステムによって作成された映像）を見て、これらのフレームを重ね合わせて見ることで画像が立体的に見えることがわかる。なお、本例では、実際の映像を左側に置き、本発明のシステムによって作成された映像を右側に置いたが、置き方はこれに限定されない。例えば、実際の映像を右側に置き、本発明のシステムによって作成された映像を左側に置いてもよく、この場合も、重ね合わせられた画像は立体的に見える。 In FIGS. 5A to 5D, when the left frame (a) (the actual image) is viewed with the right eye and the right frame (b) (the image created by the system of the present invention) is viewed with the left eye so that the two frames are superimposed, the image appears three-dimensional. Note that in this example the actual image is placed on the left and the image created by the system of the present invention is placed on the right, but the arrangement is not limited to this. For example, the actual image may be placed on the right and the image created by the system of the present invention on the left; in this case too, the superimposed image appears three-dimensional.
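The two arrangements described above can be sketched as a simple side-by-side composition. This is an illustrative helper only; the name `side_by_side` and the flag `actual_on_left` are assumptions, and frames are represented as nested lists of pixel values.

```python
from typing import List

Frame = List[List[int]]  # rows of pixel values


def side_by_side(actual: Frame, generated: Frame, actual_on_left: bool = True) -> Frame:
    """Place the captured frame and the generated frame next to each other.

    actual_on_left=True reproduces the (a) left / (b) right layout of the
    figures; False gives the swapped layout that the text also allows.
    """
    if len(actual) != len(generated):
        raise ValueError("frames must have the same height")
    left, right = (actual, generated) if actual_on_left else (generated, actual)
    return [left_row + right_row for left_row, right_row in zip(left, right)]
```

For example, `side_by_side([[1, 2], [3, 4]], [[5, 6], [7, 8]])` yields rows `[1, 2, 5, 6]` and `[3, 4, 7, 8]`, and passing `actual_on_left=False` swaps the two halves.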

このように、本発明のシステムによれば、２次元動画のフレームから、立体視を可能にするように画像対を作成することができた。 In this way, according to the system of the present invention, image pairs could be created from frames of a two-dimensional video so as to enable stereoscopic viewing.

また、本発明のシステムでは、３０ＦＰＳ（３０フレーム／秒）で２次元動画の各フレームを処理して画像対を作成することができた。このように、本発明のシステムは、略リアルタイムで画像対を出力することができた。また、作成された画像対を表示装置に表示したところ、遅延を感じさせることなく動画が表示された。 Furthermore, the system of the present invention was able to process each frame of the two-dimensional video at 30 FPS (30 frames/second) to create image pairs. Thus, the system of the present invention was able to output image pairs in near real time. In addition, when the created image pairs were displayed on a display device, the video was displayed without any perceptible delay.
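A rate of 30 FPS leaves a budget of 1/30 s, roughly 33.3 ms, for each frame. A minimal way to check a per-frame routine against that budget is sketched below; `frame_budget_ms`, `meets_realtime`, and the `process_frame` callback are hypothetical names and do not describe how the measurement in this example was actually taken.

```python
import time
from typing import Callable, Iterable


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def meets_realtime(process_frame: Callable[[object], object],
                   frames: Iterable[object],
                   fps: float = 30.0) -> bool:
    """Return True if every frame is processed within its time budget."""
    budget_s = 1.0 / fps
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)  # e.g. depth estimation plus image pair creation
        if time.perf_counter() - start > budget_s:
            return False
    return True
```

A stricter check would also account for capture and display latency, which this sketch ignores.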

本発明は、深度情報を有しない２次元画像であっても、３次元的に表示することが可能なシステム等を提供するものとして有用である。 INDUSTRIAL APPLICABILITY The present invention is useful in providing a system and the like capable of three-dimensionally displaying even a two-dimensional image that does not have depth information.

１０ 単眼式内視鏡
２０ 動画
１００ システム
１１０ 受信手段
１２０ 深度情報取得手段
１３０ 画像対作成手段
１４０ 表示手段
10 Monocular endoscope
20 Video
100 System
110 Receiving means
120 Depth information acquisition means
130 Image pair creation means
140 Display means

Claims

A system for three-dimensionally displaying a two-dimensional video, the system comprising:
receiving means for receiving a two-dimensional video;
depth information acquisition means for acquiring depth information from each frame of the two-dimensional video;
image pair creation means for creating an image pair having parallax for each frame of the two-dimensional video using the depth information of each frame of the two-dimensional video; and
display means for displaying the image pair of each frame of the two-dimensional video.

The system of claim 1, wherein the image pair creation means creates the image pair by, for each frame of the two-dimensional video:
determining a displacement amount of at least one pixel of the frame based on the depth information; and
creating a parallax image in which parallax is added to the frame by shifting the at least one pixel of the frame by the determined displacement amount.

The system according to claim 2, wherein the image pair creation means determines the displacement amount such that the smaller the depth, the larger the displacement amount.

The system of claim 2, wherein the image pair creation means determines, for each frame of the two-dimensional video, pixels to be processed and pixels not to be processed based on the depth information, and
the at least one pixel is at least one of the pixels to be processed.

The system of claim 3, wherein the image pair creation means determines, for each frame of the two-dimensional video, pixels to be processed and pixels not to be processed based on the depth information, and
the at least one pixel is at least one of the pixels to be processed.

The system described in [claim reference missing], wherein the image pair creation means, for each frame of the two-dimensional video:
determines an enlargement ratio or a reduction ratio of at least one region of the frame based on the depth information; and
enlarges or reduces the at least one region of the frame and the corresponding at least one region of the parallax image at the determined enlargement ratio or reduction ratio.

The system of claim 6, wherein the image pair creation means determines the enlargement ratio and/or the reduction ratio such that the enlargement ratio is larger for a region with a smaller depth and/or the reduction ratio is larger for a region with a greater depth.

The system of claim 1, wherein the display means includes a lenticular display.

A method for displaying a two-dimensional video three-dimensionally, the method comprising:
receiving a two-dimensional video;
obtaining depth information from each frame of the two-dimensional video;
creating an image pair having parallax for each frame of the two-dimensional video using the depth information of each frame of the two-dimensional video; and
displaying the image pair of each frame of the two-dimensional video.

A program for three-dimensionally displaying a two-dimensional video, the program being executed in a computer including a processor unit and a display unit, the program comprising:
receiving a two-dimensional video;
obtaining depth information from each frame of the two-dimensional video;
creating an image pair having parallax for each frame of the two-dimensional video using the depth information of each frame of the two-dimensional video; and
displaying the image pair of each frame of the two-dimensional video on the display unit.