WO2024095356A1 - Graphics generation device, graphics generation method, and program - Google Patents

Graphics generation device, graphics generation method, and program

Info

Publication number
WO2024095356A1
WO2024095356A1 (PCT application PCT/JP2022/040852, JP2022040852W)
Authority
WO
WIPO (PCT)
Prior art keywords
virtual
real
image
dimensional
dimensional real
Prior art date
Application number
PCT/JP2022/040852
Other languages
French (fr)
Japanese (ja)
Inventor
塁 佐藤
Original Assignee
株式会社バーチャルウインドウ
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社バーチャルウインドウ filed Critical 株式会社バーチャルウインドウ
Priority to PCT/JP2022/040852 priority Critical patent/WO2024095356A1/en
Publication of WO2024095356A1 publication Critical patent/WO2024095356A1/en

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00: Manipulating 3D models or images for computer graphics

Definitions

  • This disclosure relates to technology for overlaying CG images onto real-world images.
  • Non-Patent Document 1 shows an automobile driving simulator. A seat, steering wheel, levers, etc. that mimic the driver's seat of an automobile are arranged, an LCD monitor is placed in the area corresponding to the windshield, and a CG-created image that mimics the view seen through the windshield from the driver's seat is displayed on this monitor. When used for driving lessons at driving schools, CG that recreates automobile accidents may be added to the image.
  • Patent Document 1 discloses a head-mounted display that expresses occlusion in augmented reality (AR).
  • The head-mounted display in Patent Document 1 is equipped with cameras at approximately the same positions as the user's left and right eyes, calculates the distance to an object based on the parallax of the images captured by the two cameras, calculates the occluded area of a CG (Computer Graphics) image based on that distance, and generates an augmented reality image by superimposing a CG image, from which the occluded area has been removed, onto an image of the real scene captured by the camera.
  • Patent Document 2 also discloses exercise equipment that can reduce the user's anxiety and discomfort while providing the user with a high sense of realism and immersion.
  • The exercise equipment disclosed in Patent Document 2 includes an exercise device for making the user perform a predetermined exercise, a measurement device for measuring the user's viewpoint position in a predetermined reference coordinate system when the user exercises using the exercise device, and a video device for generating a display image of an object that simulates how the object appears when viewed in a virtual space of the reference coordinate system from the viewpoint position through a fixed screen according to the viewpoint position, and displaying the display image on the screen.
  • In the driving simulator of Non-Patent Document 1 used in driving schools and the like, CG-created roads, buildings, cars, pedestrians, etc. may be displayed on an LCD monitor installed in front of a seat simulating a car driver's seat, to allow users to experience highly realistic accident situations that cannot be experienced in driving lessons with an actual car.
  • However, the images used in the simulator are CG rather than live action, and the images displayed on the LCD monitor are fixed regardless of the driver's viewpoint position.
  • An ideal car driving simulator needs to provide an experience so realistic that it gives the illusion of being in a real car.
  • To achieve this, the images displayed on the monitor should be based on live-action footage, and occlusion processing is required to superimpose CG onto the live-action footage with high fidelity in order to recreate accident situations.
  • Furthermore, processing that applies virtual reality (VR) technology to present images according to the viewpoint position is essential, creating the illusion that the real world really extends beyond the monitor.
  • In Patent Document 1, occlusion is achieved by calculating the distance to the object in real time while capturing the image with a camera, and then calculating the occluded area of the CG based on that distance.
  • However, when shooting live-action images from a vehicle for use in a car driving simulator, such three-dimensional measurements are not usually performed, and only simple two-dimensional video is obtained. Therefore, when creating content to be projected on a car driving simulator based on this video, it is difficult to achieve advanced video expression such as superimposing live-action images and CG with occlusion processing.
  • In addition, using a head-mounted display, which is commonly used in virtual reality technology, to produce video expression according to the driver's viewpoint is not suitable for driving training materials, because the experience differs significantly from the actual driving situation.
  • Here, a car driving simulator is given as an example, but other examples are also possible.
  • For example, consider a case in which live-action content previously shot with a 360-degree camera is projected onto three walls of a private room where a fitness bike is installed. While exercising on the fitness bike in the private room, the user can feel as if they are actually riding a bicycle through a cityscape or landscape.
  • For the video content based on 360-degree video of real space used here, if one considers superimposing 3DCG with occlusion processing to further entertain the user, difficulties arise because three-dimensional measurements like those in Patent Document 1 are not usually performed when shooting the base 360-degree video. In this way, the demand for superimposing advanced CG afterwards on simple two-dimensional video, centered on scenery, arises not only in the entertainment field but also in various other fields.
  • One objective of one aspect of the present disclosure is to provide technology that enables occlusion processing that seamlessly superimposes a virtual object (CG) onto a two-dimensional image of real space captured without three-dimensional measurement, taking into account the front-to-back positional relationship and the degree of overlap between the two.
  • Another objective of another aspect of the present disclosure is to provide technology that gives the user the illusion of the existence of a virtual world that spreads out beyond the display device.
  • A graphics generation device according to one aspect of the present disclosure is a graphics generation device having a storage unit and a processing unit, in which data of a two-dimensional live-action image captured in real space and three-dimensional CG data representing a virtual object superimposed on the two-dimensional live-action image are stored in the storage unit, and the processing unit is configured to execute the following: acquire data of a three-dimensional real-space model that models a real object existing in the real space; generate a virtual space including a virtual surface onto which the two-dimensional live-action image is pasted as a texture, the three-dimensional real-space model, and the virtual object; identify an occluded area that is occluded in the virtual object based on an observation position in the virtual space that corresponds to the shooting position in the real world when the two-dimensional live-action image was captured, the virtual surface, and the three-dimensional real-space model; and generate the virtual object with the occluded area occluded.
  • A graphics generation device according to another aspect of the present disclosure is a graphics generation device having a storage unit and a processing unit, in which data of a two-dimensional live-action image captured of a real space and three-dimensional CG data representing a virtual object to be superimposed on the two-dimensional live-action image are stored in the storage unit, and the processing unit is configured to execute the following: acquire data of a three-dimensional real-space model that models a real object existing in the real space; generate a virtual space including a virtual surface onto which the two-dimensional live-action image is attached as a texture, the three-dimensional real-space model, and the virtual object; and, when the distance from an observation position in the virtual space corresponding to the shooting position in the real world when the two-dimensional live-action image was captured to the virtual object is longer than the distance from the observation position to the three-dimensional real-space model, further generate an overlapped virtual surface in which a texture obtained by two-dimensionally converting the virtual surface by perspective projection starting from the observation position is superimposed on the texture and attached to the virtual surface.
  • A graphics generation device according to yet another aspect of the present disclosure is a graphics generation device having a storage unit and a processing unit, in which data of a two-dimensional live-action image captured of a real space and three-dimensional CG data representing a virtual object to be superimposed on the two-dimensional live-action image are stored in the storage unit, and the processing unit is configured to execute the following operations: acquire data of a three-dimensional real space model that models a real object existing in the real space; generate a virtual space including a virtual surface onto which the two-dimensional live-action image is pasted as a texture, the three-dimensional real space model, and the virtual object; set the three-dimensional real space model as an occluder for calculating an occluded area in a virtual object other than the virtual surface; and generate the virtual object with the occluded area occluded.
  • According to one aspect of the present disclosure, occlusion processing is possible that allows a virtual object (CG) to be seamlessly superimposed on a two-dimensional image of real space captured without performing three-dimensional measurements, taking into account the front-to-back positional relationship and the degree of overlap between the two. Furthermore, according to another aspect of the present disclosure, it is possible to create an experience in which the virtual world created by the image that has been subjected to such processing is perceived as if it exists on the other side of the display device, without using a head-mounted display.
  • FIG. 1 is a schematic configuration diagram of an image display system according to an embodiment of the present disclosure.
  • FIG. 2 is a conceptual diagram for explaining a display image that follows a viewpoint position.
  • FIGS. 2A and 2B are diagrams for explaining how the image display system of the present embodiment displays an image according to a user's viewpoint position.
  • FIG. 3 is a conceptual diagram for explaining calculation of a viewpoint position.
  • FIG. 4 is a block diagram showing a configuration of a graphics generation device according to an embodiment of the present invention.
  • FIG. 5 is a flowchart showing the overall processing by the graphics generation device of the present embodiment.
  • FIG. 5A is a flowchart of another example in which some processing differs from that of the flowchart shown in FIG. 5.
  • FIG. 6 is a conceptual diagram for explaining the process of step S102 in the flowchart shown in FIG. 5 .
  • FIG. 7 is a diagram conceptually showing a state in which a three-dimensional real space model is reproduced in real size within a virtual space by the process of step S104 in the flowchart shown in FIG. 5.
  • FIG. 8 is a conceptual diagram showing a state in which an observation position for identifying an occluded area, at which a real object included in a two-dimensional live-action image appears to match the three-dimensional real space model corresponding to the real object, is identified by the processing of step S105 in the flowchart shown in FIG. 5.
  • FIG. 9 is a conceptual diagram showing an example of an occluded area on a virtual object identified by the processing of step S106 in the flowchart shown in FIG. 5.
  • FIG. 5B is a conceptual diagram showing an example of a masked area on a virtual surface identified by the process of step S106A in the flowchart shown in FIG. 5A.
  • FIG. 10 is a diagram showing an example of a display image displayed on the display device in step S109 in the flowchart shown in FIG. 5.
  • FIG. 1 is a schematic diagram of an image display system according to an embodiment of the present disclosure.
  • The graphics display system (hereinafter sometimes simply referred to as the "system") 1 of this embodiment mainly comprises a graphics generation device 10 that generates graphics to be presented to a user, a measurement device 20 that measures the viewpoint position of a user 2, and a display device 30 that displays the image generated by the graphics generation device 10.
  • The graphics generation device 10 is a computer that executes a software program with a processor, and also serves as a calculation unit (not shown) of the measurement device 20.
  • The display device 30 includes, as an example, four display devices: a display device 30A located in front of the user 2, a display device 30B located to the right of the user 2, a display device 30C (not shown in FIG. 1) located to the left of the user 2, and a display device 30D located above the user 2, and is configured to surround at least a portion of the user 2's periphery, including the front of the user 2.
  • Each of the display devices 30A-30D is installed in a housing 3 that mimics the driver's seat and passenger seat of a car.
  • The housing 3 is equipped with seats 3B that correspond to the driver's seat and passenger seat of a car, and the user sits in these seats 3B to view the display device 30.
  • The measurement device 20 (sensor 22) is provided, as an example, in a lower part 3A of the display device 30A located in front of the user 2 on the housing 3, which corresponds to the dashboard of a car, and measures the user's viewpoint position from in front of the user 2.
  • The installation position of the measurement device 20 is not limited to this, and it can be placed in any position where the user's viewpoint position can be measured.
  • The system 1 of this embodiment configured in this way can operate as a so-called driving simulator, in which the graphics generation device 10 generates a view seen from the window of a car according to the viewpoint position of the user 2 and displays it on the display device 30.
  • The measuring device 20 measures the position of the viewpoint 2A of the user 2 in a predetermined reference coordinate system while the user 2 is looking at the display device 30.
  • The viewpoint 2A is a position that corresponds to the position of the eyes.
  • The specific viewpoint 2A used in the processing is not particularly limited; for example, the midpoint between the eyes of the user 2, the center point of the head, or a position a predetermined distance inward from the center of the eyes can be used as the viewpoint 2A.
  • In FIG. 1, as an example of a reference coordinate system, a Cartesian coordinate system is shown having an X-axis pointing to the right of the user 2, a Y-axis pointing upward, and a Z-axis pointing backward.
  • The positions and orientations of the sensor 22 (not shown in FIG. 1) in the measurement device 20 and of each of the display devices 30A-30D are fixed in this reference coordinate system.
  • The orientation is represented by Pitch around the X-axis, Yaw around the Y-axis, and Roll around the Z-axis.
  • The graphics generation device 10 generates a display image of a virtual object that simulates how the virtual object appears in the virtual space of the reference coordinate system when viewed from viewpoint position 2A through a fixed screen, according to viewpoint position 2A and information on the position and shape of the screen of the fixed display device 30, and displays the display image on that screen.
  • The screen fixed to the reference coordinate system is the screen of the display device 30. Since the display device 30 includes four display devices 30A-30D, the graphics generation device 10 displays, on each screen of the display devices 30A-30D, a display image of the virtual object in the virtual space as seen from viewpoint position 2A through that screen.
  • The graphics generation device 10 stores in advance, in an internal storage unit 12 (see FIG. 4), data of two-dimensional live-action images of real space captured in advance using a video camera or a 360° camera, and three-dimensional CG data representing virtual objects to be superimposed on the two-dimensional live-action images, such as data on the position and shape of an elephant drawn in three-dimensional CG. Based on the viewpoint position 2A and the position and shape of each screen, it generates display images using a three-dimensional to two-dimensional conversion calculation so that the two-dimensional live-action images and the three-dimensional CG are displayed appropriately on the display devices 30A to 30D and look plausible and natural to the user 2, giving the illusion that the virtual objects in the virtual space are actually present there.
  • The graphics generation device 10 of this embodiment is particularly configured to superimpose the two-dimensional live-action images and the virtual objects while occluding the parts of the virtual objects that are occluded by real objects, such as buildings, contained in the two-dimensional live-action images.
  • When generating a display image, for example an image to be displayed on the screen of display device 30A, the graphics generation device 10 performs projective transformation, perspective projection transformation, or similar calculation processing to project a virtual object in a virtual space in which three-dimensional CG data is arranged onto the screen of display device 30A, i.e., a two-dimensional surface, based on the user's viewpoint position 2A.
  • The graphics generation device 10 also generates display images for display devices 30B to 30D using the same processing.
  • Here, the term "image" is used to include not only one or more still images, but also moving images (video) made up of multiple images that are consecutive in time series.
  • In this way, an image in which a virtual object is superimposed on the real-space scenery seen from the window of a car is generated according to the viewpoint position 2A of the user 2 and the position and shape information of the screen of the fixed display device 30, and is displayed on the display device 30, so that the user 2 can feel as if he or she is in a virtual world in which virtual objects exist in real space.
  • The display device using the graphics generation device 10 can be used as a display device in virtual reality and augmented reality technology, and can allow the user to experience realistic virtual reality and augmented reality in the same way as a head-mounted display (HMD), without causing the user 2 the inconvenience or discomfort associated with wearing a head-mounted display.
  • The screen of the display device 30 is shown as an example of a screen that is fixed to the environment, rather than a screen that moves with the user's head like a head-mounted display.
  • As the display device 30, a liquid crystal display, an organic EL (Electro-Luminescence) display, a plasma display, an FED (Field Emission Display), or the like can be used.
  • Instead of the display device 30, a video projection device and a screen may be used.
  • The image display system 1 of this embodiment will be described in more detail below.
  • The measuring device 20 continuously measures the viewpoint position 2A, and the graphics generating device 10 generates a display image that follows the viewpoint position 2A and displays it on the screen of the display device 30.
  • The display image displayed on the display device 30 changes to follow the moving viewpoint position 2A, so that the user 2 can feel as if he or she is in a virtual space.
  • The tracking of the viewpoint position 2A is realized by continuously generating an image of how the drawn content looks when seen from the viewpoint position 2A at that time.
  • An image may also be generated by predicting the viewpoint position 2A a little ahead on the time axis.
  • FIG. 2 is a conceptual diagram for explaining a display image that tracks the viewpoint position.
  • This is a conceptual diagram of a display device 30 installed in a housing 3 as seen from above, and shows three display devices 30A to 30C, excluding display device 30D placed above user 2.
  • Objects 4A to 4H are virtual objects (buildings in the illustrated example) arranged in a reference coordinate system that are displayed as display images on display device 30.
  • Suppose the viewpoint position 2A of user 2 moves left and right as shown by the solid and dashed lines in FIG. 2.
  • When viewpoint position 2A moves from the left (solid line) to the right (dashed line) as shown in FIG. 2, the image to be displayed on the screen of display device 30 changes.
  • When viewpoint position 2A is on the left (solid line), the entire object 4B is displayed on the left display device 30C; when viewpoint position 2A is on the right (dashed line), object 4B is displayed so as to straddle the left display device 30C and the front display device 30A.
  • When viewpoint position 2A is on the left (solid line), the display device 30C is viewed at a more acute angle than when viewpoint position 2A is on the right (dashed line) (when the viewing direction along the normal to the display screen is defined as 0 degrees, an acute angle refers to a viewing direction close to ±90 degrees, that is, a state in which the screen is viewed from an oblique direction that is difficult to see). Therefore, without correction, the objects 4A and 4B would look like buildings that are thinly squashed in the horizontal direction of the display device.
  • To prevent this, a three-dimensional to two-dimensional conversion process that projects the three-dimensional objects in the virtual space onto the display device 30, i.e., onto a two-dimensional surface, is performed using projective transformation, perspective projection transformation, or a calculation process similar to these, based on the user's viewpoint position 2A and the position and shape of the display screen.
  • FIG. 2A is a diagram for explaining how the image display system according to the present embodiment displays an image according to the user's viewpoint position.
  • The display device 30A is shown in FIG. 2A.
  • The user space on the front side of the display device 30A is defined as the real space, and the space on the back side is defined as the virtual space.
  • The virtual space defined on the back side of the display device 30A is displayed on the display device 30A as an image seen, from the position P1 where the head of the user 2 is located, through a pseudo window formed by the display device 30A (hereinafter also referred to as the "pseudo window").
  • The virtual objects in the virtual space are defined by the three-dimensional CG data described later. In the example of FIG. 2A, six trees are arranged in the virtual space as virtual objects.
  • The virtual space is defined on the back side of the display device 30A in FIG. 2A for the sake of explanation, but the virtual space may be defined on the front side of the display device 30A, or the space including both the front side and the back side of the display device 30A may be defined as the virtual space.
  • In this case, the field of view FoV1 of the virtual space seen through the pseudo window created by display device 30A is wide, and all six trees are displayed on display device 30A as if they are within the field of view FoV1 (display D1).
  • When the viewpoint position is farther from the display device 30A, the field of view FoV2 of the virtual space seen through the pseudo window created by display device 30A narrows, and only three whole trees and parts of the trees on either side of them are displayed in field of view FoV2 (display D2).
  • When the viewpoint position moves in the x direction, the field of view FoV3 of the virtual space seen through the pseudo window created by display device 30A also shifts in the x direction, and only the three trees on the right side are included in field of view FoV3.
  • In this case, the screen is viewed from an oblique direction rather than from the front of the display device 30A, but the horizontal thickness of the trees seen through the pseudo window needs to be the same as when viewed from the front (display D3'). For this reason, when displaying the trees on the display device 30A, an image that has been appropriately stretched so that it looks like display D3' to the user is displayed (display D3).
  • To generate such images, a process (projective transformation, perspective projection transformation, or a calculation process similar thereto) is performed to project the virtual objects in the virtual space defined in the three-dimensional CG data onto the display device 30A, i.e., onto a two-dimensional surface.
  • For example, each point of the three-dimensional CG data may be projected onto the point where the straight line connecting that point and the user's viewpoint position intersects the display device 30 in the reference coordinate space.
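  • As a concrete illustration of this point-by-point projection, the following is a minimal sketch rather than the patent's actual implementation: it intersects the straight line from the viewpoint through a CG vertex with a planar screen defined in the reference coordinate space. The screen parameterization (one corner plus two edge vectors) and all names are assumptions introduced for the example.

```python
import numpy as np

def project_point_to_screen(point, eye, screen_origin, screen_u, screen_v):
    """Project a 3D point onto a planar screen as seen from the eye position.

    The screen is defined by an origin (one corner) and two in-plane edge
    vectors screen_u, screen_v (width and height directions). Returns the
    (u, v) coordinates of the intersection in screen units, or None when the
    line is parallel to the screen or the hit lies behind the viewer.
    """
    normal = np.cross(screen_u, screen_v)
    direction = point - eye                       # line from the eye through the CG point
    denom = np.dot(normal, direction)
    if abs(denom) < 1e-9:                         # line parallel to the screen plane
        return None
    t = np.dot(normal, screen_origin - eye) / denom
    if t <= 0:                                    # intersection is behind the eye
        return None
    hit = eye + t * direction                     # point on the screen plane
    rel = hit - screen_origin
    u = np.dot(rel, screen_u) / np.dot(screen_u, screen_u)
    v = np.dot(rel, screen_v) / np.dot(screen_v, screen_v)
    return u, v                                   # values in 0..1 fall inside the screen

# Example: a vertex 3 m behind a 1.6 m x 0.9 m screen, viewed from 0.6 m in front of it.
eye = np.array([0.0, 0.0, 0.6])
vertex = np.array([0.4, 0.2, -3.0])
screen_origin = np.array([-0.8, -0.45, 0.0])
u, v = project_point_to_screen(vertex, eye, screen_origin,
                               np.array([1.6, 0.0, 0.0]), np.array([0.0, 0.9, 0.0]))
```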
  • Another processing method for generating an image to be displayed on the display device 30A may involve performing arithmetic operations on specific matrices or values based on empirical rules on the image or the three-dimensional parameters of the image.
  • The measuring device 20 of this embodiment has an imaging unit that images the user 2, and a calculation unit that determines the viewpoint position 2A based on the imaging information of the user 2 captured by the imaging unit.
  • The imaging unit is realized by the sensor 22 installed in the measuring device 20 shown in FIG. 1.
  • The calculation unit is realized by the processing unit 14 (see FIG. 4) of the graphics generating device 10 shown in FIG. 1.
  • Alternatively, the calculation unit may be realized by a processing unit (not shown) within the measuring device 20.
  • The sensor 22 is a depth sensor that measures the depth from the sensor 22 to an object (here, the body of the user 2) at each pixel.
  • The calculation unit realized by the processing unit 14 of the graphics generation device 10 estimates the shape of the human body based on the depth of each pixel measured by the sensor 22, and calculates the viewpoint position 2A based on the position of the head in that body.
  • In this way, the viewpoint position 2A can be identified with high accuracy even if the position and orientation of the body of the user 2 change in various ways.
  • However, the method of identifying the viewpoint position is not limited to this. It is also possible to directly recognize the position of a person's face or eyes by image processing, and to identify the viewpoint position 2A from the spatial coordinate value of the eye position combined with the depth information at that time.
  • FIG. 3 is a conceptual diagram for explaining the calculation of the viewpoint position.
  • The position (X_s, Y_s, Z_s) and orientation (Pitch_s, Yaw_s, Roll_s) of the sensor 22 in the reference coordinate system are set in advance.
  • The coordinates (X_h, Y_h, Z_h) of the viewpoint position 2A can be calculated from the depth of each pixel acquired by the sensor 22 and the position and orientation of the sensor 22.
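  • A minimal sketch of this conversion is shown below, assuming the sensor reports the head point in its own coordinate system and that its orientation is given as Pitch, Yaw, and Roll about the reference axes; the rotation composition order and the function names are assumptions, not taken from the patent.

```python
import numpy as np

def rotation_matrix(pitch, yaw, roll):
    """Rotation about X (pitch), Y (yaw) and Z (roll) of the reference frame, in radians."""
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return rz @ ry @ rx                     # one possible composition order

def viewpoint_in_reference(head_in_sensor, sensor_pos, sensor_pitch, sensor_yaw, sensor_roll):
    """Transform the head position measured in sensor coordinates (e.g. derived
    from the depth image) into reference coordinates (X_h, Y_h, Z_h)."""
    r = rotation_matrix(sensor_pitch, sensor_yaw, sensor_roll)
    return r @ np.asarray(head_in_sensor) + np.asarray(sensor_pos)

# Example: sensor at (0, 0.9, -0.4) m, tilted up by 10 degrees, head 0.8 m in front of it.
xh, yh, zh = viewpoint_in_reference([0.0, 0.1, 0.8],
                                    sensor_pos=[0.0, 0.9, -0.4],
                                    sensor_pitch=np.radians(10.0),
                                    sensor_yaw=0.0, sensor_roll=0.0)
```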
  • Each of the display devices 30A to 30D is a rectangle or a trapezoid, and its shape is expressed by a height (Height_m) and a width (Width_m).
  • Alternatively, the display devices 30A to 30D may be any polygon, curved surface, or spherical surface.
  • The position, attitude, and shape of the display device and/or its display screen may also be defined by a method other than the above.
  • For example, when the display device is rectangular, if the spatial coordinate values of one vertex of the display screen and of the vertex on the diagonal from it, together with the normal vector of the front of the display, are collectively defined as data on the position, attitude, and shape of one display device and its display screen, the position, attitude, and shape of the display device can be set uniquely without the separate settings of the above definition, and another benefit is obtained, such as a small number of data variables.
  • The above is merely an example; there are various ways of uniquely expressing the position, attitude, and shape of the display device and its display screen. These are collectively called arrangement data, and any format may be used for the arrangement data. The same applies to the sensor position and the viewpoint position.
  • The image captured by the sensor 22 may be a depth image, or a depth image and a visible image.
  • For example, the imaging unit may include a sensor that captures a depth image and a camera that captures a visible image, and the calculation unit may calculate the viewpoint position 2A using both the depth image and the visible image. Two visible images captured at different positions may also be used; in this case, the viewpoint position 2A may be calculated using the parallax between the two visible images.
  • The viewpoint position 2A may also be estimated from a single visible image by image processing using AI.
  • FIGS. 1 and 3 show an example in which the measuring device 20 (sensor 22) is installed on the dashboard, but the installation is not limited to this. It may be installed in a suspended position between display devices 30A and 30D, or between other display devices. Although an example in which it is installed in front of the user 2 is shown, it can also be placed behind the user 2 in a position that does not overlap with the display device 30, or installed behind the display to measure the user's viewpoint position through the display surface. Alternatively, it is possible to use multiple sensors and use an image captured by any one of them for the measurement, or to integrate information from multiple sensors to determine the viewpoint position.
  • FIG. 4 is a block diagram showing the configuration of the graphics generation device in this embodiment.
  • The graphics generation device 10 has a storage unit 12, a processing unit 14, a display output unit 16, and a communication unit 18.
  • Although the graphics generation device 10 is depicted as a single element in FIG. 4, it does not necessarily have to be a single physical element, and may be composed of multiple physically separated elements.
  • The storage unit 12 includes temporary or non-temporary storage media such as ROM (Read Only Memory), RAM (Random Access Memory), HDD (Hard Disk Drive), and SSD (Solid State Drive).
  • The storage unit 12 stores the computer programs executed by the processing unit 14 and the various data described below.
  • The computer programs stored in the non-temporary storage media of the storage unit 12 include instructions for carrying out each process of the image generation method by the processing unit 14, which will be described later with reference to FIG. 5 and other figures.
  • The storage unit 12 holds screen layout data indicating the position, orientation, and shape of each of the display devices 30A-30D in a predetermined reference coordinate space, measurement device layout data indicating the position and orientation of the measurement device 20 in the reference coordinate space, two-dimensional live-action image data capturing an image of real space, and three-dimensional CG data representing a virtual object to be superimposed on the two-dimensional live-action image.
  • The reference coordinate space is a space represented by a coordinate system (reference coordinate system) having a predetermined origin O that serves as the basis for calculations in this embodiment, and is a Cartesian coordinate space having three axes x, y, and z.
  • How the reference coordinate space and its origin O are set is arbitrary; for example, the reference coordinate space may be fixed relative to the display device 30, fixed relative to the measurement device 20, fixed relative to both the display device 30 and the measurement device 20, or fixed to neither of them.
  • "Two-dimensional live-action image data capturing real space" is data captured using a camera with a built-in image sensor that captures ordinary live-action images.
  • For example, the camera is mounted on a vehicle such as an automobile, and images are acquired by capturing the real space around the vehicle while the vehicle moves along a route in real space.
  • The two-dimensional live-action image may be one or more still images, but is preferably a moving image (video) consisting of multiple frame images captured continuously in a time series.
  • The two-dimensional live-action image data may be captured by a camera with a normal viewing angle, but is more preferably a wide-angle image captured by a camera with a wide viewing angle covering all directions (360°).
  • The data of the two-dimensional live-action image may include not only the image data itself but also image capture position information regarding the position where each image (or each frame image) was captured and image capture orientation information regarding the image capture direction of the camera at the time of capturing each image, stored in association with each image (each frame image).
  • For example, position information from a GPS (Global Positioning System) may be used as the image capture position information, and velocity vector information acquired by the GPS at the time of capture may be used as the image capture orientation information.
  • Alternatively, both the image capture position information and the image capture orientation information may be acquired using a VPS (Visual Positioning System) and stored in association with each image (each frame image).
  • For example, a program that automatically performs such association is created, and data of a series of 2D live-action images (a moving image) arranged in time series is input to this program, so that a series of data in which imaging position information and imaging orientation information are associated with each image (each frame image) in the input data is automatically created and stored.
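  • One way to picture the resulting data series is a simple per-frame record such as the sketch below; the field names and the nearest-in-time association are assumptions for illustration, not the patent's data format.

```python
from dataclasses import dataclass

@dataclass
class FrameRecord:
    frame_index: int          # position of the frame in the time series
    timestamp: float          # capture time in seconds
    lat: float                # image capture position (GPS latitude)
    lon: float                # image capture position (GPS longitude)
    alt: float                # image capture position (GPS altitude, metres)
    heading_deg: float        # image capture orientation, e.g. from the GPS velocity vector

def associate(frame_times, gps_samples):
    """Attach the nearest GPS sample to every frame of the 2D live-action video.

    frame_times: list of frame timestamps in seconds.
    gps_samples: list of (timestamp, lat, lon, alt, heading_deg) tuples.
    """
    records = []
    for i, t in enumerate(frame_times):
        # nearest-in-time sample; a real pipeline might interpolate instead
        ts, lat, lon, alt, heading = min(gps_samples, key=lambda s: abs(s[0] - t))
        records.append(FrameRecord(i, t, lat, lon, alt, heading))
    return records
```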
  • The storage unit 12 can store a plurality of sets of 2D live-action image data corresponding to different routes, captured while moving along various routes.
  • The shooting position obtained in this way is a position in the real world (strictly speaking, a spatial coordinate value uniquely determined by latitude, longitude, altitude, etc.), and is converted into position information data defined correspondingly in a virtual space that is virtually constructed to correspond to the real world, as described later.
  • Because this position information data (a spatial coordinate value uniquely determined by X, Y, and Z coordinates in the virtual space) can be used as basic information for identifying occluded areas of virtual objects in the virtual space, it is called the observation position for identifying occluded areas and will be referred to as such in the following explanation.
  • The shooting position in the real world and the observation position for identifying occluded areas in virtual space can be uniquely converted into each other using a specific formula.
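  • The patent does not state the formula, but as one common possibility, a local tangent-plane approximation around a chosen origin can convert latitude, longitude, and altitude into X, Y, Z coordinates of the virtual space and back; the constants and the axis assignment below are assumptions.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 equatorial radius

def geo_to_virtual(lat, lon, alt, origin_lat, origin_lon, origin_alt=0.0):
    """Approximate conversion of a real-world shooting position to virtual-space
    coordinates (X east, Y up, Z south) relative to a chosen origin.

    Valid only for small areas around the origin; a production system would use
    a proper map projection or the transform supplied with the 3D city model.
    """
    lat0 = math.radians(origin_lat)
    x = math.radians(lon - origin_lon) * EARTH_RADIUS_M * math.cos(lat0)  # east
    z = -math.radians(lat - origin_lat) * EARTH_RADIUS_M                  # south (north is -Z)
    y = alt - origin_alt                                                  # up
    return x, y, z

def virtual_to_geo(x, y, z, origin_lat, origin_lon, origin_alt=0.0):
    """Inverse of geo_to_virtual under the same approximation."""
    lat0 = math.radians(origin_lat)
    lat = origin_lat + math.degrees(-z / EARTH_RADIUS_M)
    lon = origin_lon + math.degrees(x / (EARTH_RADIUS_M * math.cos(lat0)))
    return lat, lon, y + origin_alt
```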
  • Three-dimensional CG is also called 3DCG because it is a virtual solid object in three-dimensional space that differs from two-dimensional images in that it contains depth (solid) information.
  • Three-dimensional CG is a virtual solid object (a model) composed of multiple faces created with vertices at points placed on the three-dimensional coordinates of the virtual space, and may be rendered by providing information reproducing the material and other properties of each face and illuminating the object with an arbitrary light intensity, light source position, and so on.
  • Three-dimensional CG data includes information regarding the placement position within the virtual space of the virtual object displayed in the three-dimensional CG.
  • The storage unit 12 can store three-dimensional CG data for a variety of virtual objects.
  • The processing unit 14 is composed of, for example, one or more CPUs (Central Processing Units).
  • The processing unit 14 may also include one or more GPUs (Graphics Processing Units).
  • The processing unit 14 executes a computer program stored in the storage unit 12, thereby carrying out each process of the graphics generation method described below with reference to FIG. 5 and other figures.
  • The display output unit 16 is an output unit to which the display device 30 is connected, and which outputs an image signal for displaying the image generated by the processing unit 14 on the display device 30.
  • The display output unit 16 is configured with output terminals such as VGA, DVI, DisplayPort (trademark), HDMI (registered trademark), or USB (trademark) Type-C, and can be connected to the display device 30 via a wired connection.
  • Alternatively, the display output unit 16 may be configured to be wirelessly connected to the display device 30 using any wireless communication technology.
  • The communication unit 18 has the function of transmitting and receiving data between the graphics generating device 10 and an external device.
  • For example, the communication unit 18 may receive image data captured by the sensor 22 of the measuring device 20.
  • The communication unit 18 may also obtain, via a network, only the necessary parts of the data of a three-dimensional real space model that models real objects existing in real space.
  • In addition, a two-dimensional live-action image may be sent to an external VPS server, and imaging position information and imaging orientation information of the image may be obtained from the server via the network.
  • Communication between the external device and the communication unit 18 may be either wired or wireless.
  • FIG. 5 is an example of a flowchart showing the overall processing by the graphics generation device of this embodiment. Each process in FIG. 5 is executed by the processing unit 14.
  • When the image generated by the graphics generation device 10 is a moving image made up of multiple frame images that are successive in time series, the following calculations may be performed for each frame of the moving image.
  • The processing unit 14 of the graphics generating device 10 first acquires data of a three-dimensional real space model that is a three-dimensional model of real objects existing in real space (step S101).
  • Real objects that exist in real space are, for example, buildings, houses, bridges, railways, roads, and other structures in the real world.
  • 3D real space model data that is a 3D model of such real objects contains position information of each real object in the real world and information about the shape of each real object, as information for reproducing the real objects in virtual space.
  • There are services that provide 3D city model data, which is a 3D model of real space covering the shapes of various cities and natural features.
  • Well-known services that provide 3D city model data include, for example, "Project PLATEAU" from the Ministry of Land, Infrastructure, Transport and Tourism and “3D Map Data” from Zenrin Co., Ltd. In the future, it is expected that various businesses and organizations will provide 3D model data for even more cities and nature.
  • the "three-dimensional real space model data" can be obtained by using a service provided by a business or institution.
  • the communication unit 18 of the graphics generating device 10 connects to a service providing server of the business or institution via the Internet or the like, and the necessary parts of the three-dimensional real space model data are downloaded from the server, thereby obtaining the three-dimensional real space model data.
  • the obtained three-dimensional real space model data is stored in the memory unit 12.
  • the processing unit 14 then generates a virtual space including a virtual surface onto which the two-dimensional real-world image is attached as a texture (step S102).
  • FIG. 6 is a conceptual diagram for explaining the processing of step S102 in the flowchart shown in FIG. 5.
  • FIG. 6(a) shows an example of a two-dimensional live-action image captured in real space, and FIG. 6(b) shows a virtual space including a virtual surface S onto which the two-dimensional live-action image is attached as a texture.
  • In FIG. 6(a), one frame of a two-dimensional live-action image captured while driving forward on a city road lined with buildings on both sides is shown, as imaging data for the front half (180 degrees) of an image captured by a 360-degree camera.
  • In the processing of step S102, the processing unit 14 generates a virtual space based on a virtual space coordinate system Cv corresponding to the reference coordinate system Cr of the real space, and generates within the virtual space a virtual surface S onto whose inside the two-dimensional live-action image stored in the storage unit 12 is pasted as a texture.
  • The two-dimensional live-action image pasted on the virtual surface S is displayed as a background both when viewed from an observation position OP for identifying an occluded area in the virtual space, and when the image pasted on the virtual surface S is projected (including a calculation process of three-dimensional to two-dimensional conversion) onto a display image displayed on the display device 30 as described below and viewed from the viewpoint position 2A of the user 2 in the real space.
  • The virtual surface S is composed of, for example, the entire curved surface forming a sphere or hemisphere, or a part of such a curved surface. If the two-dimensional live-action image is a wide-angle image with, for example, a 180° angle of view, the virtual surface S can be composed of a curved surface forming a sphere or hemisphere that has an area onto which a 180° angle of view can be projected, and if the image has an omnidirectional (360°) angle of view, the virtual surface S can be composed of the curved surface forming the entire sphere or hemisphere.
  • Angles such as 180° and 360° are merely examples, and the invention is not limited to these angles; the image can be pasted onto an appropriate area of the virtual surface S according to the viewing angle of the camera that captured it.
  • The size of the virtual surface S in the virtual space is set to a radius large enough to contain the three-dimensional real space model, which will be described later.
  • The processing unit 14 then pastes the two-dimensional live-action image as a texture onto the inside (hereinafter, "inside" will be omitted) of the virtual surface S generated within the virtual space (step S103).
  • In step S103, the processing unit 14 reads the data of the two-dimensional live-action image to be pasted onto the virtual surface S from the storage unit 12, and pastes the image onto the virtual surface S as a texture. At this time, the processing unit 14 matches the horizontal plane in the two-dimensional live-action image with the horizontal plane of the virtual surface S. If the two-dimensional live-action image is a 360° image, the celestial-sphere image obtained by shooting can be pasted onto the virtual surface S using a general method.
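  • As a rough sketch of what pasting onto the inside of the sphere means, assuming the 360° image is stored in the common equirectangular format, each direction from the sphere's centre can be mapped to texture coordinates as follows (the axis convention, with Y up and -Z forward, is an assumption):

```python
import math

def direction_to_equirect_uv(dx, dy, dz):
    """Map a unit direction from the sphere centre (Y up, -Z forward) to
    texture coordinates (u, v) in [0, 1] of an equirectangular 360-degree image."""
    yaw = math.atan2(dx, -dz)                    # horizontal angle, -pi..pi
    pitch = math.asin(max(-1.0, min(1.0, dy)))   # vertical angle, -pi/2..pi/2
    u = 0.5 + yaw / (2.0 * math.pi)              # longitude -> horizontal texture axis
    v = 0.5 - pitch / math.pi                    # latitude  -> vertical texture axis
    return u, v

def sphere_vertex_uv(vertex, center=(0.0, 0.0, 0.0)):
    """UV for a vertex of the virtual surface S (a sphere around the observation point)."""
    dx, dy, dz = (vertex[i] - center[i] for i in range(3))
    length = math.sqrt(dx * dx + dy * dy + dz * dz) or 1.0
    return direction_to_equirect_uv(dx / length, dy / length, dz / length)
```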
  • The processing unit 14 then places the three-dimensional real space model and the virtual objects in the virtual space generated as described above (step S104).
  • In step S104, the processing unit 14 places in the virtual space the three-dimensional real space model associated with the two-dimensional live-action image data pasted onto the virtual surface.
  • At this time, the three-dimensional real space model generated based on the real space reference coordinate system Cr is converted into the virtual space coordinate system Cv that constitutes the virtual space, and the three-dimensional real space model is reproduced at actual size in the virtual space.
  • The processing unit 14 further places the virtual object represented by the three-dimensional CG at the relevant position in the virtual space, according to the information regarding the position of the virtual object in the virtual space that is stored in the storage unit 12 in a form linked to the three-dimensional CG data representing the virtual object.
  • FIG. 7 conceptually illustrates a state in which a three-dimensional real space model is reproduced in actual size within a virtual space by the processing of step S104 in the flowchart shown in FIG. 5.
  • A three-dimensional real space model Mdl is reproduced within the virtual space as shown in FIG. 7.
  • FIG. 7 also illustrates a virtual object (e.g., the three-dimensional CG of an elephant in FIG. 7) placed within the virtual space.
  • The virtual object (elephant) is placed at a position behind a specific building represented by the three-dimensional real space model Mdl in FIG. 7 when viewed from the observation position OP for identifying occluded areas, which will be described later.
  • The processing unit 14 then identifies an observation position OP for identifying an occluded area in the virtual space, at which a real object included in the two-dimensional live-action image appears to match the three-dimensional real-space model corresponding to the real object (step S105).
  • Specifically, an observation point is searched for in the virtual space such that a characteristic part of a real-life building (for example, a corner of the building's exterior) seen when the virtual surface S is viewed from that point generally matches the corresponding characteristic part of the three-dimensional real-space model when the model is viewed from the same observation point in the same direction.
  • The observation position found at this time is defined as the observation position OP for identifying the occluded area.
  • This trial-and-error search can be performed by an operator operating the graphics generation device 10 to change various parameters (the X, Y, Z coordinates of the observation points that are OP candidates and the rotation angle Yaw at which the texture is applied) with the aim of achieving a rough match of the feature points described above, but it can also be automated by the processing unit 14 using a specific algorithm (such as a feature point matching algorithm) or AI, while referring to position information (VPS, GPS, etc.) acquired when the 2D live-action image was captured.
  • Here, a method for automating the above-mentioned process in S105 using a Visual Positioning System (VPS) will be described.
  • In a VPS, the position and direction at which an image was captured are identified based on a live-action image captured in real space.
  • Specifically, a database is prepared in which visual features, such as the outlines of objects like buildings appearing in many images whose capture positions and directions have been identified in advance, are extracted and stored as searchable indexes together with the position and direction information.
  • Then, visual features are extracted from the newly acquired image and compared with the visual features stored in the database to identify the position and direction at which the image was captured.
  • The identified position is used as the observation position OP for identifying the occluded area. If there is a slight deviation or error, the observation viewpoint is directed from the observation position OP toward the direction identified from the image, and the feature points are further compared, as described above, between the three-dimensional real-space model in the field of view and the two-dimensional live-action image pasted as a texture on the virtual surface S behind it.
  • Then, a matching process is performed in which the rotation angle (Yaw around the Y axis) of the position on the virtual surface S at which the texture of the two-dimensional live-action image is applied is changed so that the feature points generally match, the degree of match of the feature points is scored while finely adjusting the position coordinates of the observation position OP, and the position coordinates and rotation angle Yaw of the observation position OP with the highest score are selected.
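  • A minimal sketch of this search loop is shown below. It assumes a hypothetical scoring function feature_match_score(pos, yaw_deg) that compares the feature points of the three-dimensional real-space model seen from a candidate position with those of the two-dimensional live-action texture rotated by the candidate Yaw; the step sizes and search ranges are likewise assumptions.

```python
import itertools
import numpy as np

def refine_observation_position(initial_pos, feature_match_score,
                                pos_step=0.25, pos_range=1.0,
                                yaw_step_deg=1.0, yaw_range_deg=10.0):
    """Brute-force refinement of the observation position OP and texture yaw.

    initial_pos: (X, Y, Z) estimate in virtual space, e.g. from GPS/VPS.
    feature_match_score: callable(pos, yaw_deg) -> float, assumed to return a
        higher value when the feature points of the 3D real-space model seen
        from pos match those of the live-action texture rotated by yaw_deg.
    Returns the (position, yaw) pair with the highest score.
    """
    offsets = np.arange(-pos_range, pos_range + 1e-9, pos_step)
    yaws = np.arange(-yaw_range_deg, yaw_range_deg + 1e-9, yaw_step_deg)
    best = (None, None, -np.inf)
    for dx, dy, dz, yaw in itertools.product(offsets, offsets, offsets, yaws):
        pos = (initial_pos[0] + dx, initial_pos[1] + dy, initial_pos[2] + dz)
        score = feature_match_score(pos, yaw)
        if score > best[2]:
            best = (pos, yaw, score)
    return best[0], best[1]
```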
  • In this way, the above-mentioned work may be automated by using a VPS.
  • For the matching, the entire image may be used, or a part of the image may be cut out and used.
  • The corresponding observation position OP may also be moved along the path traveled when the two-dimensional live-action image was captured, while determining the observation position OP for identifying the occluded area for each captured frame, or fine adjustment may be made by the above method based on the observation position OP for identifying the occluded area determined in this way.
  • Alternatively, the processing unit 14 may first perform a feature search on the two-dimensional live-action image pasted on the virtual surface S of the virtual space.
  • In the feature search, a real object is detected from within the two-dimensional live-action image.
  • When the real object in the two-dimensional live-action image is, for example, a building, the outlines of the building's walls and the points where these outlines intersect may be treated as feature parts.
  • Any known image recognition technology may be used for the process of detecting a real object in the two-dimensional real-life image.
  • A method may also be used in which a learner (a trained artificial intelligence program) trained by deep learning or machine learning directly recognizes a real object from an image.
  • Next, the processing unit 14 may execute a matching process between a real object in the two-dimensional live-action image pasted onto the virtual surface S in the virtual space and the three-dimensional real-space model as projected onto the virtual surface S when the model is viewed from a certain observation point in the virtual space.
  • This matching process compares the feature values in the two-dimensional image obtained by perspectively projecting the three-dimensional real-space model from the observation point onto the virtual surface S with the feature values of the real object in the two-dimensional live-action image on the virtual surface S, and identifies, as the observation position OP for identifying an occluded area, an observation point in the virtual space from which the real object in the two-dimensional live-action image on the virtual surface S and the corresponding three-dimensional real-space model appear to roughly overlap.
  • The processing unit 14 may execute the feature search process and the matching process, using either or both of the two methods described above, multiple times to find multiple candidate observation positions OP in the virtual space.
  • The processing unit 14 may then perform an evaluation based on the feature values related to the real object included in the two-dimensional live-action image and the feature values related to the three-dimensional real space model, and identify an observation point with a high matching score as the observation position for identifying an occluded area.
  • In this way, an observation position OP for identifying an occluded area is identified in the virtual space, at which the real object included in the two-dimensional live-action image appears to roughly match the three-dimensional real space model corresponding to the real object.
  • FIG. 8 is a conceptual diagram showing a state in which an observation position OP for identifying an occluded area is identified, where a real object included in a two-dimensional live-action image appears to match a three-dimensional real space model corresponding to the real object, by the processing of step S105 in the flowchart shown in FIG. 5.
  • As shown in FIG. 8, the processing of step S105 identifies an observation position OP for identifying an occluded area, at which a real object Obj (a building in this example) included in a two-dimensional live-action image appears to match the three-dimensional real space model Mdl of the building corresponding to the real object Obj.
  • Next, the processing unit 14 identifies an occluded area of the virtual object placed in the virtual space that is occluded by the three-dimensional real space model when viewed from the observation position for identifying an occluded area in the virtual space (step S106).
  • Specifically, the processing unit 14 detects the outer shape of the three-dimensional real space model projected onto the virtual object Obj when the three-dimensional real space model is viewed from the observation position OP for identifying the occluded area.
  • The outer shape of this three-dimensional real space model is conceptually the shadow (silhouette) that appears on the virtual object Obj when light is projected only onto the three-dimensional real space model from the observation position OP for identifying the occluded area.
  • The processing unit 14 identifies, as the occluded area of the virtual object Obj that is occluded by the three-dimensional real space model when viewed from the observation position OP for identifying the occluded area in the virtual space, the area that includes all of the volumetric parts where the projected outer shape of the three-dimensional real space model penetrates the virtual object Obj.
  • FIG. 9 is a conceptual diagram showing an example of the occlusion area Ocl of the virtual object Obj identified by the processing of step S106 in the flowchart shown in FIG. 5.
  • In FIG. 9, the occluded area Ocl is shown, conceived as the volume portion on the virtual object Obj that is penetrated when light is projected from the observation position OP for identifying the occluded area onto the three-dimensional real space model Mdl.
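  • A minimal sketch of this silhouette idea, assuming the three-dimensional real space model is available as a set of triangles and the virtual object as a set of vertices: a vertex is counted as part of the occluded area when the ray from the observation position OP towards it hits a model triangle first. The Moller-Trumbore intersection routine is a standard technique; the function names and data layout are assumptions.

```python
import numpy as np

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Return the ray parameter t of the hit with triangle (v0, v1, v2), or None.
    Standard Moller-Trumbore intersection; t is in units of |direction|."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:
        return None
    inv = 1.0 / det
    s = origin - v0
    u = np.dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = np.dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = np.dot(e2, q) * inv
    return t if t > eps else None

def occluded_vertices(op, object_vertices, model_triangles):
    """Indices of virtual-object vertices hidden by the 3D real-space model when
    viewed from the observation position OP for identifying occluded areas."""
    hidden = []
    for i, vtx in enumerate(object_vertices):
        direction = vtx - op                      # unnormalised: t == 1 at the vertex itself
        for tri in model_triangles:
            t = ray_hits_triangle(op, direction, *tri)
            if t is not None and t < 1.0:         # model surface lies between OP and the vertex
                hidden.append(i)
                break
    return hidden
```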
  • The processing unit 14 then makes the portion of the virtual object that corresponds to the occluded area transparent (or deletes, erases, or otherwise makes it invisible) (step S107).
  • As a result, the virtual object appears to overlap with the live-action image on the virtual surface behind it, with the occluded area cut out.
  • Based on the relationship between the position of the virtual object in the virtual space and the placement position of the three-dimensional real space model, the processing unit 14 leaves the part of the virtual object, or the whole of the virtual object, that is located closer than the three-dimensional real space model (on the side closer to the observation position OP for identifying the occluded area) without making it transparent.
  • The processing unit 14 then performs processing to generate a display image for displaying the three-dimensional objects in the virtual space on the display device 30 according to the viewpoint position 2A of the user 2 and the information on the position and shape of the display device (step S108). More specifically, the processing unit 14 calculates the viewpoint position 2A of the user 2 in the reference coordinate system based on an image (a depth image, etc.) captured by the sensor 22 of the measurement device 20, and generates a display image that simulates how the virtual space appears when viewed through the display device 30 from the viewpoint position 2A. When the display device 30 is composed of multiple display devices 30A-30D as in this embodiment, the processing unit 14 generates a display image to be displayed on each of the display devices 30A-30D.
  • The processing unit 14 can accurately generate a display image as seen through the screen of the display device 30 from the viewpoint position 2A of the user 2 by performing projective transformation, perspective projection transformation, or similar calculation processing, based on the viewpoint position 2A, onto the screen of the display device 30, for example of the virtual object whose occluded area has been made transparent and of the virtual surface S onto which the two-dimensional live-action image has been attached as a texture.
  • through this projective transformation, the virtual object in the virtual space is projected geometrically accurately onto the screen of the display device 30, taking into account the viewpoint position 2A of the user, so that even if the user 2 looks at the screen from an oblique direction, the display image presented on the display device 30 appears as a realistic and natural image, as if seen from the viewpoint position 2A, of the real objects (e.g., buildings and other structures) in the two-dimensional live-action image and the three-dimensional CG (e.g., the elephant as a virtual object).
  • the processing performed here is not limited to projective transformation only.
  • a perspective projection transformation method may be used, or a specific matrix or numerical arithmetic operation according to empirical rules may be used.
  • the above-mentioned methods realize occlusion processing with a certain degree of accuracy.
  • a simple method can be considered that simplifies the processing and reduces the processing load by accepting a loss of accuracy compared to the above-mentioned methods.
  • in this simplified method, the processing of S106 and S107 for identifying the occluded area on the virtual object and applying it can be skipped, and the occlusion is instead handled at the same time as the display image is generated in S108.
  • in the method described above, the display image to be presented on the display device 30 is generated by the above-mentioned calculation processing from a virtual object that has already been subjected to occlusion processing in steps S106 and S107.
  • in this modified method, no such prior occlusion processing is performed.
  • instead, in S108, starting from the user's viewpoint position 2A, all three-dimensional objects in the virtual space visible through the display device 30 are converted to two dimensions according to the display screen by the above-mentioned calculation processing, including projective transformation.
  • FIG. 10 is a diagram showing an example of a display image displayed on the display device 30 in step S109 in the flowchart shown in FIG. 5.
  • in this example, a vehicle equipped with a camera moves along a certain route in real space, a two-dimensional live-action image of the real space around the vehicle is captured by the camera, the virtual object is superimposed on that two-dimensional live-action image, and the resulting image is displayed across the display devices 30A to 30D.
  • the front-leg portion of the virtual object (the elephant), which is located on the near side of the real object in the two-dimensional live-action image, is not occluded, while the other portions (part of the right ear, the rear part of the torso, and the hind legs) are shown occluded by the occlusion area.
  • FIG. 5A is a flowchart of another example in which some of the processing is different from that of the flowchart shown in FIG. 5.
  • Each process of the flowchart shown in FIG. 5A is executed by the processing unit 14 in the same way as in FIG. 5.
  • when the image generated by the graphics generating device 10 is a moving image consisting of multiple frame images that are successive in time series, the processing is basically the same as in FIG. 5, including the fact that the calculations may be performed for each frame of the moving image.
  • steps S106A, S107A, and S108A in FIG. 5A, which are different from FIG. 5, will be described.
  • the process performed in step S106A identifies, as an occluded area, the area covered by the outer shape of the three-dimensional real-space model on the virtual surface S when the three-dimensional real-space model placed in the virtual space is perspectively projected onto the virtual surface S from the observation position OP for identifying an occluded area.
  • the processing unit 14 detects the outer shape of the three-dimensional real-space model that is perspectively projected onto the virtual surface S when the three-dimensional real-space model is viewed from the observation position OP for identifying an occluded area.
  • the outer shape of this three-dimensional real-space model is conceptually a shadow (silhouette) that appears on the virtual surface S when light is projected onto the three-dimensional real-space model from the observation position OP for identifying an occluded area.
  • the processing unit 14 identifies the area within the outer shape of the three-dimensional real-space model perspectively projected onto the virtual surface S as an occluded area that is occluded by the three-dimensional real-space model when viewed from the observation position OP for identifying an occluded area in the virtual space.
  • FIG. 9A is a conceptual diagram showing an example of an occlusion area Ocl on the virtual surface S identified by the processing of step S106A in the flowchart shown in FIG. 5A.
  • the occlusion area Ocl is conceived as a shadow that appears on the virtual surface S when light is projected from the observation position OP for identifying the occlusion area onto the three-dimensional real space model Mdl.
  • in step S107A, the processing unit 14 then performs a 3D-to-2D conversion by perspectively projecting the virtual object (three-dimensional CG image) onto the virtual surface S, and generates an overlapping virtual surface by making transparent (or deleting, erasing, or making invisible) the portion of the converted virtual object (now a two-dimensional CG image) that corresponds to the occluded area.
  • more specifically, in step S107A, when the virtual object is viewed from the observation position OP for identifying the occluded area and a part or all of the three-dimensional real space model exists in front of it (on the OP side), the processing unit 14 first performs a perspective projection of the virtual object onto the virtual surface S and superimposes it on the virtual surface S, and then performs a process of making the occluded area identified in the processing of step S106A transparent (including deleting, erasing, or making it invisible).
  • in other words, when the distance from the observation position OP for identifying the occluded area in the virtual space to the virtual object is greater than the distance from the observation position OP to the three-dimensional real space model, a virtual object in which the part corresponding to the occluded area hidden by the three-dimensional real space model has been occluded is perspectively projected onto the virtual surface S, and an overlapping virtual surface with that projection superimposed on it is generated (a minimal code sketch of this occlusion test is given after this list).
  • when viewed from the observation position OP for identifying the occluded area, the virtual object then appears to overlap with the live-action image on the virtual surface behind it, as if the occluded area portion had been cut out.
  • a single overlapping virtual surface is generated by perspectively projecting onto a virtual surface all virtual objects in the virtual world observed from the observation position OP for identifying an occluded area, including virtual objects whose portions overlapping with the occluded area have disappeared.
  • the overlapping virtual surface is overlaid on the two-dimensional live-action image that was originally applied as texture to the virtual surface, with the virtual objects being perspectively projected onto the virtual surface. Therefore, the overlapping virtual surface created in this way exists as a three-dimensional object with textures applied in multiple layers in the virtual space.
  • the processing unit 14 takes into consideration the relationship between the position of the virtual object in virtual space and the position of the 3D real space model that forms the occlusion area, and can perform processing to leave a part or all of the virtual object that is located in front of the 3D real space model (the side closer to the observation position OP for identifying the occlusion area) without making it transparent, even if it is an occlusion area.
  • in step S108A, the processing unit 14 then performs processing to generate a display image for displaying the three-dimensional object on the display device 30 according to the viewpoint position 2A of the user 2 and the position and shape of the display device, on the assumption that the overlapping virtual surface exists in the virtual space.
  • in step S108A, the processing unit 14 generates an image that can be seen from the viewpoint position 2A of the user 2 through the display device 30 for the overlapping virtual surface generated in step S107A that exists in the virtual space. More specifically, the processing unit 14 calculates the viewpoint position 2A of the user 2 in the reference coordinate system based on an image (depth image, etc.) captured by the sensor 22 of the measurement device 20, and generates a display image that simulates how the virtual space appears when viewed from the viewpoint position 2A through the display device 30.
  • the processing unit 14 generates a display image to be displayed on each of the display devices 30A to 30D.
  • the processing unit 14 can generate a display image that can be seen through the screen of the display device 30 from the viewpoint position 2A of the user 2 by performing, based on the viewpoint position 2A of the user 2, projective transformation onto the screen of the display device 30 of the overlapping virtual surface (that is, a surface on which the two-dimensional live-action image and the two-dimensional virtual object (originally three-dimensional CG) superimposed on it are pasted as textures).
  • the overlapping virtual surface is projected onto the screen of the display device 30 in a geometrically consistent manner that takes into account the viewpoint position 2A of the user, so that even if the user 2 looks into the screen of the display device 30 from an oblique direction, a display image is generated that is presented on the display device 30 as a realistic and natural image of the real objects (e.g., buildings and other structures) in the two-dimensional live-action image and the three-dimensional CG (e.g., the elephant as a virtual object), as if seen through the screen from the viewpoint position 2A.
  • the processing performed here is not limited to projective transformation only.
  • a perspective projection transformation method may be used, or a specific matrix or numerical arithmetic operation according to empirical rules may be used.
  • step S109 is the same in the method of FIG. 5 and the method of FIG. 5A.
  • if the two-dimensional live-action image is a moving image made up of multiple frame images captured successively in time series, then, for example, each frame of the moving image is processed and displayed in turn.
  • all of the steps shown in FIG. 5 and FIG. 5A may be performed for each frame, or any step that is common to all frames, or that only needs to be performed once for the first frame, may be skipped.
  • the order of each step may also be partially changed, or the steps may be processed in parallel. As a simplified method, certain steps may be skipped as described above.
  • the observation position OP for identifying the occluded area (defined in X, Y, Z coordinates) and the rotation angle (the rotation angle Yaw about the Y axis) of the placement of the virtual surface S to which the texture of the two-dimensional live-action image is applied are both determined either by trial and error or automatically by a program; in the case of a moving image as described above, particularly when the direction of the lens of the shooting camera and the shooting direction of the video are locked, the Yaw may in principle be left unchanged after it has been determined for the first frame.
  • in this way, images in which the virtual object rendered in three-dimensional CG is superimposed on the virtual surface are generated continuously, following changes in the viewpoint position 2A of the user 2, and an augmented reality video is provided to the user 2 via the display device 30 in which the virtual object rendered in three-dimensional CG (in this example, an elephant) appears to be present in the real space represented by the two-dimensional live-action image (in this example, an urban area outside the vehicle).
  • in the embodiment above, a driving simulator has been described in which the graphics generating device 10 generates the scenery seen from the window of a car according to the viewpoint position of the user 2 and displays it on the display device 30, but the application of the system 1 is not limited to this example.
  • it can also be used in an example in which live-action content previously captured by a 360-degree camera is projected on three walls of a private room in which a fitness bike is installed. While exercising on the fitness bike in a private room, the user can feel the experience of actually riding the bike through a cityscape or landscape.
  • even in such simple two-dimensional footage centered on the scenery of the base 360-degree video, CG can be superimposed with occlusion processing applied.
  • the system can also be applied, in fields other than entertainment as well, to simulators of other types of vehicles that display virtual objects superimposed on the scenery seen from the window of an airplane, ship, submarine, spaceship, and so on.
  • for example, by configuring the system 1 to display virtual objects superimposed on images captured by an endoscope inserted into the patient's digestive tract through the mouth, it is possible to provide a user experience with an unprecedented worldview, in which the user feels as if they are inside the human body, observing its interior.
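The disclosure itself contains no code; as an editor's rough illustration of the occlusion test shared by steps S106/S107 and S106A/S107A, the following Python sketch checks, for sample points of a virtual object, whether the ray from the observation position OP passes through the three-dimensional real space model before reaching the point, and perspectively projects the surviving points onto a virtual surface at a fixed depth. The use of NumPy, the function names, and the choice of an axis-aligned box as a stand-in for the real space model are all assumptions of this sketch, not part of the disclosure; in the actual method the projection would be rasterized into layered textures on the virtual surface rather than handled point by point.

```python
import numpy as np

def project_to_plane(points, op, plane_z):
    """Perspective projection of 3D points onto the plane z = plane_z,
    with op (the observation position OP) as the centre of projection."""
    d = points - op
    t = (plane_z - op[2]) / d[:, 2]            # ray parameter where each ray meets the plane
    return op + t[:, None] * d

def occluded_by_box(points, op, box_min, box_max):
    """True for points whose ray from op passes through the axis-aligned box
    (a stand-in for the three-dimensional real space model) before reaching them."""
    occluded = np.zeros(len(points), dtype=bool)
    for i, p in enumerate(points):
        d = p - op                              # assumes no component is exactly zero
        t1, t2 = (box_min - op) / d, (box_max - op) / d
        t_near = np.minimum(t1, t2).max()
        t_far = np.maximum(t1, t2).min()
        # the box hides the point if the ray enters it between op (t=0) and the point (t=1)
        occluded[i] = (t_near <= t_far) and (0.0 < t_near < 1.0)
    return occluded

if __name__ == "__main__":
    op = np.array([0.2, 1.5, 0.0])               # observation position OP
    building = (np.array([-1.0, 0.0, -6.0]),     # real space model, here a simple box
                np.array([1.0, 3.0, -4.0]))
    # a few sample points of the virtual object (e.g. vertices of the elephant CG)
    elephant = np.array([[0.0, 1.0, -8.0],       # behind the building -> occluded
                         [3.0, 1.0, -8.0]])      # off to the side -> visible
    hidden = occluded_by_box(elephant, op, *building)
    visible_on_surface = project_to_plane(elephant[~hidden], op, plane_z=-10.0)
    print(hidden)                 # -> [ True False]
    print(visible_on_surface)     # projected position on the virtual surface at z = -10
```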

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention enables a CG image to be superimposed on a two-dimensional image while expressing occlusion. This graphics generation device comprises a storage unit and a processing unit. The storage unit stores data of two-dimensional live-action images in which a real space is captured, and three-dimensional CG data representing virtual objects to be superimposed on the two-dimensional live-action images. The processing unit: acquires data of a three-dimensional real space model in which real objects present in the real space are modeled; generates a virtual space that includes a virtual surface to which a two-dimensional live-action image is affixed as a texture, the three-dimensional real space model, and a virtual object; identifies an occluded region of the virtual object on the basis of an observation position in the virtual space that corresponds to the shooting position in the real world when the two-dimensional live-action image was captured, the virtual surface, and the three-dimensional real space model; and generates a virtual object in which the occluded region is occluded.

Description

グラフィックス生成装置、グラフィックス生成方法及びプログラムGraphics generation device, graphics generation method and program
 本開示は、現実画像にCGを重畳して表示する技術に関する。 This disclosure relates to technology for overlaying CG images onto real-world images.
 非特許文献1には、自動車運転シミュレータが示されている。自動車の運転席を模した座席やハンドル、レバーなどが配置されると同時に、自動車のフロントガラスに該当する部分に液晶モニターが配置され、こちらのモニターにCGで作成された運転席から自動車のフロントガラス越しに見える景色を模した映像が映し出される。自動車教習所の運転講習に用いられる用途においては、映像に自動車事故を再現するようなCGを加える場合もある。 Non-Patent Document 1 shows an automobile driving simulator. A seat, steering wheel, levers, etc. that mimics the driver's seat of an automobile are arranged, and an LCD monitor is placed in the area that corresponds to the windshield of an automobile, and a CG-created image that mimics the view seen through the windshield from the driver's seat is displayed on this monitor. When used for driving lessons at driving schools, CG that recreates automobile accidents may be added to the image.
 特許文献1には、拡張現実(AR:Augmented Reality)におけるオクルージョンを表現するヘッドマウントディスプレイが開示されている。特許文献1のヘッドマウントディスプレイは、ユーザの左右の目とほぼ同じ位置にそれぞれカメラを備え、それら2つのカメラで撮影された画像の視差を基に物体までの距離を算出し、その距離に基づいてCG(Computer Graphics)画像の遮蔽領域を算出し、カメラで撮影された現実光景の画像に遮蔽領域を遮蔽削除したCG画像を重畳することにより拡張現実画像を生成する。 Patent Document 1 discloses a head-mounted display that expresses occlusion in augmented reality (AR). The head-mounted display in Patent Document 1 is equipped with cameras at approximately the same positions as the user's left and right eyes, calculates the distance to an object based on the parallax of images captured by the two cameras, calculates the occluded area of a CG (Computer Graphics) image based on that distance, and generates an augmented reality image by superimposing a CG image in which the occluded area has been occluded and removed onto an image of the real scene captured by the camera.
 また、特許文献2には、ユーザに高い臨場感や没入感を与えつつ、ユーザの不安感や不快感を低減することが可能な運動用設備が開示されている。特許文献2に開示された運動用設備は、ユーザに所定の運動を行わせるための運動装置と、ユーザが運動装置により運動を行うとき、所定の基準座標系におけるユーザの視点位置を測定する測定装置と、視点位置から、固定された画面を介して基準座標系の仮想空間上の物体を見たときの見え方を疑似する物体の表示映像を、視点位置に応じて生成し、表示映像を画面に表示する映像装置とを有する。 Patent Document 2 also discloses exercise equipment that can reduce the user's anxiety and discomfort while providing the user with a high sense of realism and immersion. The exercise equipment disclosed in Patent Document 2 includes an exercise device for making the user perform a predetermined exercise, a measurement device for measuring the user's viewpoint position in a predetermined reference coordinate system when the user exercises using the exercise device, and a video device for generating a display image of an object that simulates how the object appears when viewed in a virtual space of the reference coordinate system from the viewpoint position through a fixed screen according to the viewpoint position, and displaying the display image on a screen.
International Publication WO2020/115815A1
International Publication WO2020/218368A1
For example, in a car driving simulator (Non-Patent Document 1) used in driving schools, etc., CG-created roads, buildings, cars, pedestrians, etc. may be displayed on an LCD monitor installed in front of a seat simulating a car driver's seat to allow users to experience highly realistic accident situations that cannot be experienced in driving lessons with an actual car. However, in the method of Non-Patent Document 1, the images used in the simulator are CG rather than real, and the images displayed on the LCD monitor are fixed regardless of the driver's viewpoint position. Therefore, unlike an actual windshield, even if the driver leans forward to look ahead, the image of the scenery seen by the driver does not change, which often gives the impression of a video game; the reproducibility of the driving experience is low, making it ineffective as a teaching material for driving training. An ideal car driving simulator needs to provide an experience so realistic that it gives the illusion of being in a real car. To achieve this, firstly, the images displayed on the monitor must be based on real-life footage, and occlusion processing is required to superimpose CG onto that footage to a high degree in order to recreate accident situations. Secondly, rather than displaying an image with a fixed viewing angle on the monitor, processing that applies virtual reality (VR) technology to present images according to the viewpoint position is essential, creating the illusion that the real world really spreads out beyond the monitor.
In the method of Patent Document 1, occlusion is achieved by calculating the distance to the object from the images in real time while shooting with the camera, and then calculating the occluded area of the CG based on that distance. However, when shooting live-action images from a vehicle for use in a car driving simulator, such three-dimensional measurements are not usually performed and only a simple two-dimensional image video is obtained. Therefore, when creating content to be projected on a car driving simulator based on this video, it is difficult to achieve advanced video expression such as superimposing live-action images and CG based on occlusion processing. Furthermore, using a head-mounted display, which is commonly used in virtual reality technology to produce video expression according to the driver's viewpoint, is not suitable as a driving training material because it differs significantly from the actual driving situation.
Here, a car driving simulator is given as an example, but other examples are also possible. For example, in Patent Document 2, live-action content previously shot with a 360-degree camera is projected on three walls of a private room where a fitness bike is installed. While exercising on the fitness bike in the private room, the user can feel as if they are actually riding a bicycle through a cityscape or landscape. For the video content based on the 360-degree video of real space used here, if one considers additionally superimposing 3DCG by occlusion processing to further entertain the user, difficulties arise because three-dimensional measurements like those in Patent Document 1 are not usually performed when shooting the base 360-degree video. In this way, the demand to superimpose sophisticated CG afterwards on simple two-dimensional image video centered on scenery arises not only in the entertainment field but also in various other fields.
One objective of one aspect of the present disclosure is to provide technology that enables occlusion processing that seamlessly superimposes a virtual object (CG) on a two-dimensional image of real space captured without three-dimensional measurement, taking into account the relative positions of the front and back and the degree of overlap between the two. Another objective of another aspect of the present disclosure is to provide technology that gives the user the illusion of the existence of a virtual world that spreads out beyond the display device.
A graphics generation device according to one aspect of the present disclosure is a graphics generation device having a storage unit and a processing unit, in which data of a two-dimensional live-action image captured of a real space and three-dimensional CG data representing a virtual object to be superimposed on the two-dimensional live-action image are stored in the storage unit, and the processing unit is configured to execute the following: acquire data of a three-dimensional real space model that models a real object existing in the real space; generate a virtual space including a virtual surface onto which the two-dimensional live-action image is pasted as a texture, the three-dimensional real space model, and the virtual object; identify an occluded area that is occluded in the virtual object based on an observation position in the virtual space that corresponds to the shooting position in the real world when the two-dimensional live-action image was captured, the virtual surface, and the three-dimensional real space model; and generate a virtual object in which the occluded area is occluded.
A graphics generation device according to another aspect of the present disclosure is a graphics generation device having a storage unit and a processing unit, in which data of a two-dimensional live-action image captured of a real space and three-dimensional CG data representing a virtual object to be superimposed on the two-dimensional live-action image are stored in the storage unit, and the processing unit is configured to execute the following: acquire data of a three-dimensional real space model that models a real object existing in the real space; generate a virtual space including a virtual surface onto which the two-dimensional live-action image is attached as a texture, the three-dimensional real space model, and the virtual object; and, when the distance from an observation position in the virtual space corresponding to the shooting position in the real world when the two-dimensional live-action image was captured to the virtual object is greater than the distance from the observation position to the three-dimensional real space model, further generate an overlapping virtual surface in which a texture, obtained by two-dimensionally converting onto the virtual surface, by perspective projection originating from the observation position, the virtual object with the portion corresponding to the occluded area occluded by the three-dimensional real space model occluded, is pasted on top of the original texture.
In addition, a graphics generation device according to another aspect of the present disclosure is a graphics generation device having a storage unit and a processing unit, in which data of a two-dimensional live-action image captured of a real space and three-dimensional CG data representing a virtual object to be superimposed on the two-dimensional live-action image are stored in the storage unit, and the processing unit is configured to execute the following: acquire data of a three-dimensional real space model that models a real object existing in the real space; generate a virtual space including a virtual surface onto which the two-dimensional live-action image is pasted as a texture, the three-dimensional real space model, and the virtual object; set the three-dimensional real space model as an occluder for calculating an occluded area in virtual objects other than the virtual surface; and generate a virtual object in which the occluded area is occluded.
According to one aspect of the present disclosure, it is possible to perform occlusion processing that allows a virtual object (CG) to be seamlessly superimposed on a two-dimensional image of real space captured without performing three-dimensional measurements, taking into account the relative positions of the front and rear and the degree of overlap between the two. Furthermore, according to another aspect of the present disclosure, it is possible to create an experience in which the virtual world created by images subjected to such processing is perceived as if it exists on the other side of the display device, without using a head-mounted display.
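As a purely structural sketch of the configuration described above, the following Python skeleton mirrors the claimed steps in order. The disclosure defines no API, so every class and method name here is an illustrative assumption, and the step bodies are left as stubs.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Storage:
    """Corresponds to the storage unit: 2D live-action frames plus 3D CG data."""
    live_action_frames: List[Any] = field(default_factory=list)
    virtual_object_cg: Any = None          # e.g. the elephant model

class GraphicsGenerator:
    """Corresponds to the processing unit; method names are assumptions of this sketch."""

    def __init__(self, storage: Storage):
        self.storage = storage

    def acquire_real_space_model(self) -> Any:
        """Obtain a 3D model of real objects (e.g. buildings) seen in the footage."""
        raise NotImplementedError

    def build_virtual_space(self, frame: Any, real_space_model: Any) -> Any:
        """Place a virtual surface textured with the frame, the real space model,
        and the virtual object together in one virtual space."""
        raise NotImplementedError

    def identify_occluded_area(self, virtual_space: Any, observation_position: Any) -> Any:
        """Find the part of the virtual object hidden by the real space model when
        viewed from the observation position matching the camera position."""
        raise NotImplementedError

    def render(self, virtual_space: Any, occluded_area: Any,
               viewpoint: Any, display_layout: Any) -> Any:
        """Occlude the identified area and generate the per-screen display image."""
        raise NotImplementedError

    def process_frame(self, frame: Any, observation_position: Any,
                      viewpoint: Any, display_layout: Any) -> Any:
        model = self.acquire_real_space_model()
        space = self.build_virtual_space(frame, model)
        occluded = self.identify_occluded_area(space, observation_position)
        return self.render(space, occluded, viewpoint, display_layout)
```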
A schematic configuration diagram of an image display system according to an embodiment of the present disclosure.
A conceptual diagram for explaining a display image that follows the viewpoint position.
A diagram for explaining how the image display system of the present embodiment displays an image according to the user's viewpoint position.
A conceptual diagram for explaining the calculation of the viewpoint position.
A block diagram showing the configuration of the graphics generation device of the present embodiment.
A flowchart showing the overall processing by the graphics generation device of the present embodiment.
A flowchart of another example in which some of the processing differs from the flowchart shown in FIG. 5.
A conceptual diagram for explaining the processing of step S102 in the flowchart shown in FIG. 5.
A diagram conceptually showing a state in which the three-dimensional real space model is reproduced at real size in the virtual space by the processing of step S104 in the flowchart shown in FIG. 5.
A diagram conceptually showing a state in which an observation position for identifying an occluded area, at which a real object included in the two-dimensional live-action image appears to coincide with the three-dimensional real space model corresponding to that real object, has been identified by the processing of step S105 in the flowchart shown in FIG. 5.
A conceptual diagram showing an example of the occluded area on the virtual object identified by the processing of step S106 in the flowchart shown in FIG. 5.
A conceptual diagram showing an example of the occluded area on the virtual surface identified by the processing of step S106A in the flowchart shown in FIG. 5A.
A diagram showing an example of the display image displayed on the display device in step S109 in the flowchart shown in FIG. 5.
An embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a schematic configuration diagram of an image display system according to an embodiment of the present disclosure. As shown in FIG. 1, the graphics display system (hereinafter sometimes simply referred to as the "system") 1 of this embodiment mainly comprises a graphics generation device 10 that generates graphics to be presented to a user, a measurement device 20 that measures the viewpoint position of a user 2, and a display device 30 that displays the image generated by the graphics generation device 10. As an example, the graphics generation device 10 is a computer that executes a software program with a processor, and also serves as a calculation unit (not shown) of the measurement device 20.
In this embodiment, the display device 30 includes, as an example, four display devices: a display device 30A located in front of the user 2, a display device 30B located to the right of the user 2, a display device 30C (not shown in FIG. 1) located to the left of the user 2, and a display device 30D located above the user 2, and is configured to surround at least a portion of the user 2's periphery, including the front of the user 2.
In this embodiment, each of the display devices 30A to 30D is installed in a housing 3 that mimics the driver's seat and passenger seat of a car. The housing 3 is equipped with seats 3B that correspond to the driver's seat and passenger seat of a car, and the user sits in these seats 3B to view the display device 30. The measurement device 20 (sensor 22) is provided, as an example, in a lower part 3A of the housing 3 below the display device 30A located in front of the user 2, corresponding to the dashboard of a car, and measures the user's viewpoint position from in front of the user 2. The installation position of the measurement device 20 is not limited to this, and it can be placed in any position where the user's viewpoint position can be measured.
As an example, the system 1 of this embodiment configured in this way can operate as a so-called driving simulator, in which the graphics generation device 10 generates the scenery seen from the window of a car according to the viewpoint position of the user 2 and displays it on the display device 30.
The measurement device 20 measures the viewpoint position 2A of the user 2 in a predetermined reference coordinate system while the user 2 is looking at the display device 30. The viewpoint position 2A is a position that corresponds to the position of the eyes. The specific viewpoint position 2A used in the processing is not particularly limited; for example, the midpoint between the eyes of the user 2, the center point of the head, or a position a predetermined distance inward of the head from the center of the eyes can be used as the viewpoint position 2A.
In FIG. 1, as an example of a reference coordinate system, a Cartesian coordinate system is shown having an X axis pointing to the right of the user 2, a Y axis pointing upward, and a Z axis pointing backward. The positions and orientations of the sensor 22 (not shown in FIG. 1) in the measurement device 20 and of each of the display devices 30A to 30D are fixed in this reference coordinate system. The orientation is represented by Pitch about the X axis, Yaw about the Y axis, and Roll about the Z axis.
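Since orientations are given as Pitch (about X), Yaw (about Y), and Roll (about Z) in the reference coordinate system, such an attitude can be turned into a rotation matrix as in the sketch below. The composition order is an assumption of this sketch (the disclosure does not specify one), and NumPy is used for convenience.

```python
import numpy as np

def rot_x(pitch):
    c, s = np.cos(pitch), np.sin(pitch)
    return np.array([[1, 0, 0],
                     [0, c, -s],
                     [0, s,  c]])

def rot_y(yaw):
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[ c, 0, s],
                     [ 0, 1, 0],
                     [-s, 0, c]])

def rot_z(roll):
    c, s = np.cos(roll), np.sin(roll)
    return np.array([[c, -s, 0],
                     [s,  c, 0],
                     [0,  0, 1]])

def attitude(pitch, yaw, roll):
    """Rotation from a device's local frame to the reference frame,
    assuming yaw * pitch * roll composition (one common convention)."""
    return rot_y(yaw) @ rot_x(pitch) @ rot_z(roll)

# Example: a point 2 m in front of a sensor (-Z in its local frame) that is
# tilted 10 degrees about the X axis and mounted at (0, 0.8, -0.5).
p_local = np.array([0.0, 0.0, -2.0])
p_ref = attitude(np.deg2rad(10), 0.0, 0.0) @ p_local + np.array([0.0, 0.8, -0.5])
print(p_ref)
```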
The graphics generation device 10 generates a display image of a virtual object that simulates how the virtual object in the virtual space of the reference coordinate system appears when viewed from the viewpoint position 2A through a fixed screen, according to the viewpoint position 2A and information on the position and shape of the screen of the fixed display device 30, and displays that display image on the screen. In this embodiment, the screen fixed in the reference coordinate system is the screen of the display device 30. Since the display device 30 includes the four display devices 30A to 30D, the graphics generation device 10 displays, on the screen of each of the display devices 30A to 30D, a display image of the virtual objects in the virtual space as seen from the viewpoint position 2A through that screen.
The graphics generation device 10 stores in advance, in an internal storage unit 12 (see FIG. 4), data of two-dimensional live-action images of real space captured beforehand using a video camera, a 360-degree camera, or the like, and three-dimensional CG data representing the virtual objects to be superimposed on the two-dimensional live-action images, such as data on the position and shape of an elephant drawn in three-dimensional CG. Based on a calculation method for three-dimensional to two-dimensional conversion, it generates display images so that the two-dimensional live-action image and the three-dimensional CG are appropriately displayed on the display devices 30A to 30D according to the viewpoint position 2A and the position and shape of each screen, so that the images look plausible and natural to the user 2, giving the illusion that the virtual objects in the virtual space are actually present there. The graphics generation device 10 of this embodiment is particularly configured to superimpose the two-dimensional live-action image and the virtual object while occluding those parts of the virtual object that are hidden by real objects, such as buildings, contained in the two-dimensional live-action image. When generating a display image, for example an image to be displayed on the screen of the display device 30A, the graphics generation device 10 performs projective transformation, perspective projection transformation, or similar calculation processing to project the virtual objects in the virtual space in which the three-dimensional CG data is arranged onto the screen of the display device 30A, i.e., onto a two-dimensional surface, based on the user's viewpoint position 2A. The graphics generation device 10 generates the display images for the display devices 30B to 30D by the same processing.
In this specification and the claims, the term "image" is used to include not only one or more still images, but also moving images (video) made up of multiple images that are consecutive in time series.
According to this embodiment, as an example, an image in which a virtual object is superimposed on the real-space scenery seen from the window of a car is generated according to the viewpoint position 2A of the user 2 and the position and shape information of the screen of the fixed display device 30, and is displayed on the display device 30, so that the user 2 can feel as if he or she were in a virtual world in which virtual objects exist in real space. In addition, since this embodiment employs a method of displaying images on a screen fixed to the environment rather than on a head-mounted display, the display used by the graphics generation device 10 can serve as a display device for virtual reality and augmented reality technology, allowing the user to experience realistic virtual reality and augmented reality in the same way as a head-mounted display (HMD), without causing the user 2 the inconvenience or discomfort that comes with wearing a head-mounted display.
In this embodiment, the screen of the display device 30 is shown as an example of a screen that is fixed to the environment, rather than a screen that moves with the user's head like a head-mounted display. As the display device 30, a liquid crystal display, an organic EL (Electro-Luminescence) display, a plasma display, an FED (Field Emission Display), or the like can be used. Also, instead of the display device 30, a video projection device and a screen may be used.
The image display system 1 of this embodiment will be described in more detail below.
The measurement device 20 continuously measures the viewpoint position 2A, and the graphics generation device 10 generates the display image so as to follow the viewpoint position 2A and displays it on the screen of the display device 30. As a result, when the head of the user 2 moves, the display image displayed on the display device 30 changes, following the viewpoint position 2A that moves with it, so that the user 2 can feel as if he or she were in the virtual space. Tracking of the viewpoint position 2A is realized by continuously generating images of how the drawn content looks when seen from the viewpoint position 2A at each point in time. Although tracking is described here, the system is not limited to this, and an image may instead be generated by predicting the viewpoint position 2A slightly ahead on the time axis. When the viewpoint position 2A moves, if an image is generated based on the current position 2A and the result is then displayed, the image display lags by the time required for the calculation, which may make the user feel uncomfortable. To solve this, a method is also possible in which, while taking time-series data of the viewpoint position 2A, the position slightly ahead is predicted and the image is generated in advance.
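The look-ahead idea mentioned above can be sketched as a simple linear extrapolation over the recent time series of measured viewpoint positions. The constant-velocity assumption, the lead time, and the function name are choices of this sketch only; the disclosure does not prescribe a particular prediction method.

```python
import numpy as np

def predict_viewpoint(timestamps, positions, lead_time):
    """Predict the viewpoint position `lead_time` seconds ahead by fitting a
    constant-velocity model to the most recent samples.

    timestamps : (N,) measurement times in seconds
    positions  : (N, 3) measured viewpoint positions 2A
    """
    t = np.asarray(timestamps, dtype=float)
    p = np.asarray(positions, dtype=float)
    if len(t) < 2:
        return p[-1]                       # not enough history: use the latest sample
    v = (p[-1] - p[-2]) / (t[-1] - t[-2])  # velocity from the last two samples
    return p[-1] + v * lead_time

# Example: 60 Hz samples of a head drifting to the right; predict ~17 ms ahead,
# roughly one frame, to hide the rendering latency.
ts = [0.000, 0.017, 0.033]
ps = [[0.00, 1.20, 0.40], [0.01, 1.20, 0.40], [0.02, 1.20, 0.40]]
print(predict_viewpoint(ts, ps, lead_time=0.017))   # -> about [0.03, 1.2, 0.4]
```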
FIG. 2 is a conceptual diagram for explaining a display image that follows the viewpoint position. It is a conceptual top view of the display device 30 installed in the housing 3, showing the three display devices 30A to 30C, excluding the display device 30D placed above the user 2.
Objects 4A to 4H are virtual objects (buildings in the illustrated example) arranged in the reference coordinate system and displayed as display images on the display device 30. Assume that the viewpoint position 2A of the user 2 moves left and right as shown by the solid and dashed lines in FIG. 2. When the viewpoint position 2A moves from the left (solid line) to the right (dashed line) as shown in FIG. 2, the image to be displayed on the screen of the display device 30 changes. For example, when the viewpoint position 2A is on the left (solid line), the entire object 4B is displayed on the left display device 30C, but when the viewpoint position 2A is on the right (dashed line), the object 4B is displayed so as to straddle the left display device 30C and the front display device 30A. In addition, when the viewpoint position 2A is on the left (solid line), the display device 30C is viewed at a more acute angle than when it is on the right (dashed line) (defining the viewing angle as 0 degrees when the viewpoint lies in the normal direction of the display screen, an acute angle here means that the viewpoint position is close to ±90 degrees, i.e., the screen is viewed from an oblique direction from which it is hard to see). Therefore, the objects 4A and 4B would normally look like buildings squashed thin along the horizontal direction of the display. In that case, even if pseudo-projected images of the objects 4A and 4B were displayed on the display device 30C, they would not look plausible and natural enough to give the user 2 the illusion that the buildings exist there (at their places in the virtual world spreading out beyond the screen). For this reason, when the objects 4A and 4B are displayed on the display device 30C, it is necessary to display an image whose shape has been appropriately adjusted as a result of suitable geometric calculation processing, as described below with reference to FIG. 2A. In this way, as processing that makes the image look plausible and natural so that the virtual objects in the virtual space appear to the user 2 as if they were really present there, in this embodiment, three-dimensional to two-dimensional conversion processing that projects the three-dimensional objects in the virtual space onto the display device 30, i.e., onto a two-dimensional surface, is performed using projective transformation, perspective projection transformation, or similar calculation processing based on the user viewpoint position 2A and the position and shape of the display screen.
FIG. 2A is a diagram for explaining how the image display system according to this embodiment displays an image according to the user's viewpoint position. For convenience of explanation, only the display device 30A is shown in FIG. 2A. In the explanation of FIG. 2A, the user space on the near side of the display device 30A is defined as the real space, and the space on the far side is defined as the virtual space. The virtual space defined on the far side of the display device 30A is displayed on the display device 30A as an image seen, from the position P1 where the head of the user 2 is located, through a pseudo window formed by the display device 30A (hereinafter also referred to as a "pseudo window"). The virtual objects in the virtual space are defined by the three-dimensional CG data described later. In the example of FIG. 2A, six trees are arranged side by side as virtual objects. Note that, although the virtual space is defined on the far side of the display device 30A in FIG. 2A for the sake of explanation, the virtual space may instead be defined on the near side of the display device 30A, or a space including both the near side and the far side of the display device 30A may be defined as the virtual space. This makes it possible to display on the display device 30A not only images in which a virtual object in the virtual space appears to be beyond the pseudo window formed by the display device 30A, but also images in which a virtual object in the virtual space appears to protrude toward the viewer from the display device 30A.
When the user 2 is at position P1, close to and directly in front of the display device 30A, the field of view FoV1 of the virtual space seen through the pseudo window formed by the display device 30A is wide, and all six trees are displayed on the display device 30A as falling within the field of view FoV1 (display D1). When the user 2 moves from position P1 in the z direction to position P2, away from the display device 30A, the field of view FoV2 of the virtual space seen through the pseudo window narrows, and only three whole trees and parts of the trees on either side of them fall within the field of view FoV2 (display D2). Furthermore, when the user 2 moves from position P1 in the -x (negative x) direction to position P3, the field of view FoV3 of the virtual space seen through the pseudo window shifts in the x direction, and only the three rightmost trees fall within the field of view FoV3. In addition, at the field of view FoV3 the screen is viewed from an oblique direction rather than from directly in front of the display device 30A, yet the horizontal thickness of the trees seen through the pseudo window needs to be the same as when viewed from the front (display D3'). For this reason, when displaying the trees on the display device 30A, an image that has been appropriately stretched so that it looks like display D3' to the user is displayed (display D3). In this way, as processing that makes the image look plausible and natural so that the user 2 has the illusion that the virtual objects in the virtual space are actually present there, in this embodiment, when generating an image to be displayed on the display device 30A, processing (projective transformation, perspective projection transformation, or similar calculation processing) is performed to project the virtual objects in the virtual space defined by the three-dimensional CG data onto the display device 30A, i.e., onto a two-dimensional surface. As another method, each point of the three-dimensional CG data may be projected, in the reference coordinate space, onto the point where the straight line connecting that point and the user viewpoint position intersects the display device 30. As yet another processing method for generating the image to be displayed on the display device 30A, arithmetic operations using specific matrices or values according to empirical rules may be applied to the image or to the three-dimensional parameters of the image.
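The "other method" mentioned above, projecting each point of the 3D CG data to where the straight line through the point and the viewpoint meets the display, can be sketched as a ray-plane intersection followed by a change to screen-local coordinates. Describing the screen by its centre, unit right/up axes, and size is just one possible form of the placement data, and all names here are assumptions of this sketch.

```python
import numpy as np

def project_point_to_screen(point, eye, center, right, up, width, height):
    """Intersect the line from `eye` (viewpoint 2A) through `point` with the screen
    plane and return (u, v) in metres relative to the screen centre, or None if the
    intersection lies behind the viewer or outside the screen."""
    normal = np.cross(right, up)                      # screen plane normal
    d = point - eye
    denom = np.dot(d, normal)
    if abs(denom) < 1e-9:
        return None                                   # line parallel to the screen
    t = np.dot(center - eye, normal) / denom
    if t <= 0:
        return None                                   # intersection behind the viewer
    hit = eye + t * d
    u, v = np.dot(hit - center, right), np.dot(hit - center, up)
    if abs(u) > width / 2 or abs(v) > height / 2:
        return None                                   # outside the pseudo window
    return u, v

# Example: a 1.6 m x 0.9 m front screen centred 1.5 m ahead of the origin,
# a viewer slightly left of centre, and one vertex of a virtual object behind it.
uv = project_point_to_screen(
    point=np.array([0.5, 1.2, -4.0]),
    eye=np.array([-0.2, 1.2, 0.0]),
    center=np.array([0.0, 1.2, -1.5]),
    right=np.array([1.0, 0.0, 0.0]),
    up=np.array([0.0, 1.0, 0.0]),
    width=1.6, height=0.9)
print(uv)   # roughly (0.06, 0.0): the vertex appears just right of the screen centre
```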
The measurement device 20 of this embodiment is configured to have an imaging unit that images the user 2, and a calculation unit that determines the viewpoint position 2A based on the imaging information of the user 2 captured by the imaging unit. The imaging unit is realized by the sensor 22 installed in the measurement device 20 shown in FIG. 1. The calculation unit is realized by the processing unit 14 (see FIG. 4) of the graphics generation device 10 shown in FIG. 1. Alternatively, the calculation unit may be realized by a processing unit (not shown) within the measurement device 20.
As an example, in this embodiment, the sensor 22 (imaging unit) is a depth sensor that measures, for each pixel, the depth from the sensor 22 to an object (here, the body of the user 2). The calculation unit realized by the processing unit 14 of the graphics generation device 10 estimates the shape of the human body based on the depth of each pixel measured by the sensor 22, and calculates the viewpoint position 2A based on the position of the head in that body. Since the human body shape is estimated from the depth of each pixel and the position of the head in that body shape is used, the viewpoint position 2A can be identified with high accuracy even if the position and orientation of the body of the user 2 change in various ways. The method of identifying the viewpoint position is not limited to this; a method may also be used in which the position of the person's face or eyes is recognized directly by image processing and the viewpoint position 2A is identified from the spatial coordinate values of the eye position in combination with the depth information at that time.
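As a very rough sketch of locating the head in a depth image, the following picks the topmost band of foreground pixels closer than a threshold and back-projects its centroid with a pinhole model, giving a point in the sensor's own coordinate frame. Real body-shape estimation is considerably more involved; the intrinsics, the threshold, and the "topmost foreground" heuristic are assumptions of this sketch only, and converting the result into the reference coordinate system is the role of the placement data discussed next.

```python
import numpy as np

def head_point_from_depth(depth, fx, fy, cx, cy, max_range=2.5, band_px=20):
    """Return an approximate head position in the sensor's own frame
    (x right, y down, z forward) from a depth image in metres.

    depth : (H, W) array, 0 where no measurement is available
    fx, fy, cx, cy : assumed pinhole intrinsics of the depth sensor
    """
    vs, us = np.nonzero((depth > 0) & (depth < max_range))   # foreground pixels
    if len(vs) == 0:
        return None
    top = vs.min()                                           # topmost body row
    sel = vs < top + band_px                                 # a thin band below it: the head
    u, v, z = us[sel].mean(), vs[sel].mean(), depth[vs[sel], us[sel]].mean()
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Example with a tiny synthetic depth map: a "head" blob at about 1.2 m.
depth = np.zeros((120, 160))
depth[30:50, 70:90] = 1.2        # head
depth[50:120, 60:100] = 1.3      # torso
print(head_point_from_depth(depth, fx=150.0, fy=150.0, cx=80.0, cy=60.0))
```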
 FIG. 3 is a conceptual diagram for explaining the calculation of the viewpoint position. The position (X_s, Y_s, Z_s) and orientation (Pitch_s, Yaw_s, Roll_s) of the sensor 22 in the reference coordinate system are set in advance. The coordinates (X_h, Y_h, Z_h) of the viewpoint position 2A can be calculated from the per-pixel depth acquired by the sensor 22 together with the position and orientation of the sensor 22. As shown in FIG. 3, the position (X_m, Y_m, Z_m), orientation (Pitch_m, Yaw_m, Roll_m), and shape (Height_m, Width_m) of each display device (display device 30A in the example of FIG. 3) in the reference coordinate system are also set in advance, and the positions, orientations, and shapes of the other display devices 30B to 30D are set in advance in the same way. In this embodiment, as an example, each of the display devices 30A to 30D is a rectangle or a trapezoid whose shape is expressed by a height (Height_m) and a width (Width_m); the display devices 30A to 30D may instead be arbitrary polygons, curved surfaces, or spherical surfaces. The position, orientation, and shape of a display device and/or its display screen may also be defined by methods other than the above. For example, if the display device is rectangular, the spatial coordinates of one corner of its display screen and of the diagonally opposite corner, together with the normal vector of the display front, can collectively be defined as the data describing the position, orientation, and shape of that display device and its screen. With such a definition, the position, orientation, and shape are determined uniquely without being set separately as above, and additional benefits, such as a smaller number of data variables, are obtained. This is merely one example: there are various representations that uniquely determine the position, orientation, and shape of a display device and its screen. These are collectively called arrangement data, and any format may be used for the arrangement data. The same applies to the sensor position and the viewpoint position.
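 For illustration only, the conversion of a depth measurement into viewpoint coordinates in the reference coordinate system can be sketched as a pinhole back-projection followed by a rigid transform with the sensor pose. The intrinsic parameters fx, fy, cx, cy and the rotation composition order below are assumptions for the sketch, not values defined in this embodiment.

import numpy as np

def rotation_from_pyr(pitch, yaw, roll):
    """Rotation matrix from pitch (about x), yaw (about y), roll (about z), in radians.
    The composition order Ry @ Rx @ Rz is one possible convention."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cr, sr = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])
    return Ry @ Rx @ Rz

def head_pixel_to_reference(u, v, depth, fx, fy, cx, cy, sensor_pos, sensor_pyr):
    """Back-project the head pixel (u, v) with measured depth into the reference
    coordinate system, given the sensor pose (X_s, Y_s, Z_s, Pitch_s, Yaw_s, Roll_s)
    and assumed pinhole intrinsics fx, fy, cx, cy."""
    # point in the sensor's own camera frame
    p_cam = np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])
    R = rotation_from_pyr(*sensor_pyr)
    return np.asarray(sensor_pos) + R @ p_cam   # (X_h, Y_h, Z_h)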
 The image captured by the sensor 22 may be a depth image, or a depth image together with a visible-light image. For example, the imaging unit may include a sensor that captures a depth image and a camera that captures a visible image, and the calculation unit may calculate the viewpoint position 2A using both the depth image and the visible image. Two visible images captured from different positions may also be used; in that case the viewpoint position 2A may be calculated from the parallax between the two images. The viewpoint position 2A may also be estimated from a single visible image by image processing using AI.
 In the examples of FIGS. 1 and 3 the measuring device 20 (sensor 22) is installed on the dashboard, but this is not limiting. It may be suspended between display devices 30A and 30D, or installed between other display devices. Although the examples show it installed in front of user 2, it may also be placed behind user 2 at a position that does not overlap the display devices 30. It is also possible to install it behind a display and measure the user's viewpoint position through the display surface. Furthermore, multiple such sensors may be used, with the measurement based on the image captured by any one of them, or the viewpoint position may be obtained by integrating information from multiple sensors.
 Next, the graphics generation device 10 of this embodiment will be described. FIG. 4 is a block diagram showing the configuration of the graphics generation device of this embodiment. As shown in FIG. 4, the graphics generation device 10 has a storage unit 12, a processing unit 14, a display output unit 16, and a communication unit 18. Although the graphics generation device 10 is depicted as a single element in FIG. 4, it does not have to be a single physical element and may be composed of multiple physically separate elements.
 The storage unit 12 includes temporary or non-temporary storage media such as ROM (Read Only Memory), RAM (Random Access Memory), HDD (Hard Disk Drive), and SSD (Solid State Drive). The storage unit 12 stores the computer programs executed by the processing unit 14 and the various data described later. The computer programs stored in the non-temporary storage media of the storage unit 12 include instructions for carrying out each process of the image generation method performed by the processing unit 14, described later with reference to FIG. 5 and other figures.
 The storage unit 12 holds screen arrangement data indicating the position, orientation, and shape of each of the display devices 30A to 30D in a predetermined reference coordinate space, measuring-device arrangement data indicating the position and orientation of the measuring device 20 in the reference coordinate space, two-dimensional live-action image data obtained by imaging real space, and three-dimensional CG data representing virtual objects to be superimposed on the two-dimensional live-action images. The reference coordinate space is a space represented by a coordinate system (reference coordinate system) having a predetermined origin O that serves as the basis for the calculations in this embodiment, and is an orthogonal coordinate system space with three axes x, y, and z. How the reference coordinate space and its origin O are set is arbitrary; for example, the reference coordinate space may be fixed relative to the display devices 30, fixed relative to the measuring device 20, fixed relative to both, or fixed to neither.
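 For illustration only, the arrangement data held in the storage unit 12 might be organized as in the following minimal sketch. The field names are illustrative assumptions and follow the corner-plus-normal definition given above; they are not definitions made by this embodiment.

from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class DisplayArrangement:
    """Illustrative arrangement data for one rectangular display screen in the
    reference coordinate space: one screen corner, the diagonally opposite
    corner, and the normal vector of the display front uniquely fix its
    position, orientation, and shape."""
    corner_a: Vec3          # one corner of the display screen
    corner_b: Vec3          # diagonally opposite corner
    front_normal: Vec3      # unit normal of the display front

@dataclass
class MeasuringDeviceArrangement:
    """Illustrative arrangement data for the measuring device (sensor 22)."""
    position: Vec3          # (X_s, Y_s, Z_s)
    orientation: Vec3       # (Pitch_s, Yaw_s, Roll_s)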
 "Two-dimensional live-action image data obtained by imaging real space" is data captured with an ordinary camera containing an image sensor. As an example, the camera is mounted on a vehicle such as an automobile, and the images are acquired by imaging the real space around the vehicle while the vehicle moves along a route in real space. The two-dimensional live-action image may be one or more still images, but is preferably a moving image (video) consisting of multiple frame images captured continuously in time series. The two-dimensional live-action image data may be captured with a camera having an ordinary viewing angle, but is more preferably a wide-angle image captured with a camera having a wide, omnidirectional (360°) angle of view.
 Furthermore, the two-dimensional live-action image data need not consist of the images alone: imaging position information about where each image (or frame) was captured, and imaging orientation information about the direction in which the two-dimensional camera was facing from that position at the time of capture, may be stored in association with each image (each frame). For the imaging position information, position data from GPS (Global Positioning System) can be used, for example. For the imaging orientation information, the velocity vector obtained by GPS at the time of capture can be used, for example. Alternatively, both the imaging position information and the imaging orientation information may be obtained using a VPS (Visual Positioning System) and stored in association with each image (each frame). In this method, a large number of images whose capture positions and directions have been identified in advance are used; visual features such as the outlines of buildings and other objects appearing in those images are extracted, and the position and direction information is accumulated in a database indexed so that the visual features can be searched. When an arbitrary captured two-dimensional live-action image is input, visual features are extracted from it and compared with the visual features accumulated in the database to identify the position and direction at which the image was captured, and the identified position and direction are converted into imaging position information and imaging orientation information. For example, a program that performs this conversion automatically may be created, and a series of time-ordered two-dimensional live-action images (a moving image) may be input to it so that a series of data in which imaging position information and imaging orientation information are associated with each image (each frame) is generated automatically and stored. The storage unit 12 can store, for example, multiple sets of two-dimensional live-action image data corresponding to different routes, captured while moving along those routes.
 The shooting position obtained in this way is a position in the real world (strictly, a spatial coordinate value uniquely determined on the earth by latitude, longitude, altitude, and so on), but it is convenient to also have position data defined in a corresponding manner in the virtual space that, as described later, is constructed to correspond to the real world. In particular, because this position data can be used as the basic information for identifying the occluded regions of virtual objects in the virtual space (a spatial coordinate value uniquely determined by X, Y, Z coordinates in the virtual space), it is named the observation position for identifying occluded areas and is used as such in the following description. The shooting position in the real world and the observation position for identifying occluded areas in the virtual space correspond one to one and can be uniquely converted into each other by a specific formula.
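 For illustration only, the per-frame metadata and the conversion from a real-world shooting position to the corresponding observation position for identifying occluded areas might be sketched as follows. The field names and the local tangent-plane approximation are assumptions made for the sketch; the actual conversion formula is implementation-specific and is not defined here.

from dataclasses import dataclass
import numpy as np

@dataclass
class CapturedFrame:
    """One frame of the two-dimensional live-action video with the metadata
    described above (field names are illustrative)."""
    image: np.ndarray            # the frame pixels
    lat: float                   # imaging position (GPS/VPS), degrees
    lon: float
    altitude: float              # metres above sea level
    heading: float               # imaging orientation, radians

EARTH_RADIUS = 6_378_137.0       # metres, WGS-84 equatorial radius

def to_observation_position(frame, origin_lat, origin_lon, origin_alt):
    """Convert a real-world imaging position into a corresponding observation
    position for identifying occluded areas, approximated here by a local
    tangent-plane (east, up, north) conversion around a chosen origin."""
    d_lat = np.radians(frame.lat - origin_lat)
    d_lon = np.radians(frame.lon - origin_lon)
    x = EARTH_RADIUS * d_lon * np.cos(np.radians(origin_lat))   # east
    z = EARTH_RADIUS * d_lat                                    # north
    y = frame.altitude - origin_alt                             # up
    return np.array([x, y, z])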
 "Three-dimensional CG" refers to a virtual solid object in three-dimensional space; because it contains depth (solid) information, unlike a two-dimensional image, it is also called 3DCG. A three-dimensional CG is a virtual solid composed of multiple faces whose vertices are points placed at three-dimensional coordinates in the virtual space (modeling); it may be rendered by giving each face information that reproduces its material and the like and illuminating the object with an arbitrary light intensity, light-source position, and so on. The three-dimensional CG data includes information about the placement position, within the virtual space, of the virtual object represented by that three-dimensional CG. The storage unit 12 can store three-dimensional CG data for a variety of virtual objects.
 In this specification and the claims, "computer graphics (CG)" or "graphics" refers to images generated by a computer, such as a virtual surface onto which a two-dimensional live-action image is pasted as a texture, a three-dimensional real-space model, and three-dimensional CG representing a virtual object.
 The processing unit 14 is composed of, for example, one or more CPUs (Central Processing Units). The processing unit 14 may also include one or more GPUs (Graphics Processing Units). By executing the computer programs stored in the storage unit 12, the processing unit 14 carries out each process of the graphics generation method described later with reference to FIG. 5 and other figures.
 The display output unit 16 is an output unit to which the display devices 30 are connected and which outputs the image signals for displaying the images generated by the processing unit 14 on the display devices 30. The display output unit 16 is configured with output terminals such as VGA, DVI, DisplayPort (trademark), HDMI (registered trademark), or USB (trademark) Type-C, and can be connected to the display devices 30 by wire. Alternatively, the display output unit 16 may be connected to the display devices 30 wirelessly using any wireless communication technology.
 The communication unit 18 has the function of transmitting and receiving data between the graphics generation device 10 and external devices. In particular, in this embodiment the communication unit 18 may receive data of images captured by the sensor 22 of the measuring device 20. It may also obtain, over a network, only the necessary portions of the data of a three-dimensional real-space model that models real objects existing in real space. It may further send a two-dimensional live-action image to an external server for VPS and obtain the imaging position information and imaging orientation information of that image from the server over the network. The communication between the external devices and the communication unit 18 may be either wired or wireless.
 Next, as an example, a graphics generation method performed by the graphics generation device 10 of this embodiment will be described with reference to FIG. 5 and other figures. FIG. 5 is an example of a flowchart showing the overall processing by the graphics generation device of this embodiment. Each process in FIG. 5 is executed by the processing unit 14. When the image generated by the graphics generation device 10 is a moving image consisting of multiple frame images that are consecutive in time series, the calculation may be performed for each frame of the moving image.
 Referring to FIG. 5, the processing unit 14 of the graphics generation device 10 first acquires data of a three-dimensional real-space model, which is a three-dimensional model of real objects existing in real space (step S101).
 Real objects existing in real space are, for example, real-world structures such as buildings, houses, bridges, railways, and roads. The data of a three-dimensional real-space model obtained by three-dimensionally modeling such real objects includes, as information for reproducing the real objects in the virtual space, position information of each real object in the real world and information about the shape of each real object. At present, 3D city model data in which real spaces are three-dimensionally modeled for a number of cities and natural terrains is provided by several businesses and organizations. Known services providing 3D city model data include, for example, "Project PLATEAU" of the Ministry of Land, Infrastructure, Transport and Tourism and "3D Map Data" of Zenrin Co., Ltd. It is expected that 3D model data for many more cities and natural environments will be provided by various businesses and organizations in the future.
 In this embodiment, as an example, the "data of a three-dimensional real-space model" can be obtained using a service provided by such a business or organization. The communication unit 18 of the graphics generation device 10 connects to the service provider's server via the Internet or the like and downloads the necessary portion of the three-dimensional real-space model data from the server, thereby acquiring the data. The acquired three-dimensional real-space model data is stored in the storage unit 12.
 The processing unit 14 then generates a virtual space that includes a virtual surface onto whose inner side a two-dimensional live-action image is pasted as a texture (step S102).
 FIG. 6 is a conceptual diagram for explaining the processing of step S102 in the flowchart shown in FIG. 5. FIG. 6(a) shows an example of a two-dimensional live-action image captured in real space, and FIG. 6(b) shows a virtual space including a virtual surface S onto whose inner side the two-dimensional live-action image is pasted as a texture. In the example shown in FIG. 6(a), one frame of a two-dimensional live-action image, captured while moving forward along an urban road lined with buildings on both sides, is shown as the image data of the front half (180 degrees) of an image taken with a 360-degree camera.
 In the processing of step S102, the processing unit 14 generates a virtual space based on a virtual-space coordinate system Cv associated with the reference coordinate system Cr of the real space, and generates within that virtual space a virtual surface S onto whose inner side the two-dimensional live-action image stored in the storage unit 12 is pasted as a texture. The two-dimensional live-action image pasted on the virtual surface S is displayed as the background both when it is viewed from the observation position OP for identifying occluded areas in the virtual space and when, as described later, the image pasted on the virtual surface S is transformed by projective transformation (including three-dimensional-to-two-dimensional conversion calculations) into the display image shown on the display devices 30 and is viewed from the viewpoint position 2A of user 2 in real space. The virtual surface S is composed of, for example, all or part of a curved surface forming a sphere or hemisphere. If the two-dimensional live-action image is a wide-angle image with, for example, a 180° angle of view, the virtual surface S can be a spherical or hemispherical curved surface providing an area onto which a 180° angle of view can be projected; if the image has an omnidirectional (360°) angle of view, the virtual surface S can be the curved surface forming an entire sphere or hemisphere. The angles of 180° and 360° given here are merely examples and are not limiting; the captured image only needs to be pasted onto an appropriate region of the virtual surface S according to the viewing angle of the camera used for shooting. The virtual surface S in the virtual space has a radius large enough to enclose the three-dimensional models described later.
 The processing unit 14 then performs a process of pasting the two-dimensional live-action image as a texture onto the inner side of the virtual surface S generated in the virtual space (hereinafter, "the inner side of" is omitted) (step S103).
 In step S103, the processing unit 14 reads the data of the two-dimensional live-action image to be pasted onto the virtual surface S from the storage unit 12 and pastes that image onto the virtual surface S as a texture. At this time, the processing unit 14 performs the pasting so that the horizontal plane of the two-dimensional live-action image coincides with the horizontal plane of the virtual surface S. If the two-dimensional live-action image is a 360° image, the omnidirectional (celestial-sphere) image obtained by shooting may be pasted onto the virtual surface S by a standard method.
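 For illustration only, one common way to paste a 360° image onto the inside of a spherical virtual surface S is to map each direction from the centre of S to texture coordinates of an equirectangular image. The sketch below assumes an equirectangular layout, the image's horizontal plane aligned with that of S, and the y axis treated as up; it is one standard formulation, not necessarily the method used in the embodiment.

import numpy as np

def direction_to_equirect_uv(d):
    """Map a unit direction vector from the sphere centre to texture
    coordinates (u, v) of an equirectangular 360-degree image.
    u runs with azimuth (yaw) around the vertical axis, v with elevation."""
    x, y, z = d
    yaw = np.arctan2(x, z)                       # azimuth
    pitch = np.arcsin(np.clip(y, -1.0, 1.0))     # elevation
    u = (yaw / (2 * np.pi)) + 0.5                # 0..1 across the image width
    v = 0.5 - (pitch / np.pi)                    # 0..1 from top to bottom
    return u, v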
 The processing unit 14 then places the three-dimensional real-space model and the virtual objects in the virtual space generated as described above (step S104).
 In the processing of step S104, the processing unit 14 places in the virtual space the three-dimensional real-space model associated with the two-dimensional live-action image data pasted onto the virtual surface. At this time, the three-dimensional real-space model, which has been generated based on the reference coordinate system Cr of the real space, is converted into the virtual-space coordinate system Cv that constitutes the virtual space, and the three-dimensional real-space model is reproduced at actual size in the virtual space. In the processing of step S104, the processing unit 14 further places each virtual object represented by three-dimensional CG at its placement position in the virtual space, according to the information about that placement position stored in the storage unit 12 in association with the three-dimensional CG data representing the virtual object.
 FIG. 7 conceptually shows the state in which the three-dimensional real-space model has been reproduced at actual size in the virtual space by the processing of step S104 in the flowchart shown in FIG. 5. Through the processing of step S104, the three-dimensional real-space model Mdl is reproduced in the virtual space as shown in FIG. 7. FIG. 7 also shows a virtual object placed in the virtual space (for example, the three-dimensional CG elephant in FIG. 7). In this example, the virtual object (the elephant) is placed at a position behind a specific structure represented by the three-dimensional real-space model Mdl when viewed from the observation position OP for identifying occluded areas, described later.
 The processing unit 14 then identifies, in the virtual space, the observation position OP for identifying occluded areas, at which the real objects included in the two-dimensional live-action image and the three-dimensional real-space model corresponding to those real objects appear to coincide (step S105).
 When a certain observation point is placed in the virtual space, a search is made in the virtual space for an observation point such that a characteristic part of a photographed structure seen on the virtual surface S from that point (for example, a corner of a building's outline) roughly coincides with the corresponding characteristic part of the three-dimensional real-space model when the model is viewed from the same observation point in the same direction. For example, in FIG. 8, each vertex of the photographed structure (Obj) that appears as a rectangle on the virtual surface S when viewed from a certain observation point OP roughly coincides with each vertex of the model Mdl observed when the three-dimensional real-space model Mdl is viewed from the same observation point OP in the same direction. The observation position OP at this time is defined as the observation position OP for identifying occluded areas. To determine this observation position OP, trial and error is required in which, in addition to the position of the candidate observation point (defined by X, Y, Z coordinates) in the virtual space, the rotation angle of the position at which the texture of the two-dimensional live-action image is pasted on the virtual surface S (the rotation angle Yaw about the Y axis) is varied. This trial and error can be performed by an operator who operates the graphics generation device 10 and changes the various parameters (the X, Y, Z coordinates of the candidate observation point and the rotation angle Yaw at which the texture is pasted) while aiming for the rough coincidence of the feature points described above, but it may also be automated by the processing unit 14 using a specific algorithm (such as a feature-point matching algorithm) or AI, with reference to the position information (VPS, GPS, etc.) acquired when the two-dimensional live-action image was captured.
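 For illustration only, the trial-and-error search described above can be pictured as a brute-force loop over candidate observation points and texture rotation angles, scored by how well the feature points coincide. In the sketch below, score_fn stands for whatever feature-point matching measure is used; it is an assumption and is not defined in this document.

import itertools
import numpy as np

def find_observation_position(candidate_positions, yaw_angles, score_fn):
    """Brute-force sketch of the search for the observation position OP.

    candidate_positions: iterable of (X, Y, Z) observation-point candidates.
    yaw_angles: iterable of texture rotation angles about the Y axis (radians).
    score_fn(pos, yaw): assumed callable returning how well the feature points
        of the photographed structures on the virtual surface S coincide with
        the projected feature points of the 3D real-space model (higher is better).
    Returns (position, yaw, score) of the best-scoring candidate.
    """
    best = (None, None, -np.inf)
    for pos, yaw in itertools.product(candidate_positions, yaw_angles):
        s = score_fn(np.asarray(pos), yaw)
        if s > best[2]:
            best = (np.asarray(pos), yaw, s)
    return best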
 As an example, a method of automating the above-described processing of S105 using VPS (Visual Positioning System) will be described. In this method, the position and direction at which a live-action image was captured are identified based on that image. To this end, a database is prepared in advance from many images whose capture positions and directions are known: visual features such as the outlines of buildings and other objects appearing in those images are extracted and accumulated, together with the position and direction information, as a searchable index of visual features. When a live-action image is input, visual features are extracted from it and compared with the visual features accumulated in the database to identify the position and direction at which the image was captured. This identified position is taken as the observation position OP for identifying occluded areas. If a slight deviation or error remains, then, when the observation viewpoint is directed from the observation position OP in the direction identified from the image, the feature points are further compared, as described above, between the three-dimensional real-space model in the field of view and the two-dimensional live-action image pasted as a texture on the virtual surface S behind it. A matching process is performed in which the rotation angle of the position at which the texture of the two-dimensional live-action image is pasted on the virtual surface S (the rotation angle Yaw about the Y axis) is varied and the position coordinates of the observation position OP are finely adjusted so that the feature points roughly coincide, the degree of coincidence of the feature points is scored, and the position coordinates and rotation angle Yaw of the observation position OP with the highest score are searched for. By using VPS in this way, the above work may be automated. When a 360-degree image is used as the input to the VPS, either the entire image or a cut-out portion of it may be used. After the first observation position OP for identifying occluded areas has been identified in this way, the observation position OP for each captured frame may be determined by moving the corresponding observation position OP along the route traveled when the two-dimensional live-action image was captured, or the observation position OP determined in that way may be finely adjusted by the method described above.
 Another example of the processing of step S105 will be described. For example, the processing unit 14 may first perform a feature search on the two-dimensional live-action image pasted on the virtual surface S of the virtual space. In the feature search, real objects are detected in the two-dimensional live-action image. In this embodiment, when a real object in the two-dimensional live-action image is a building, for example, the outlines of the building's walls and the points where those outlines intersect can be considered as characteristic parts. Any known image recognition technique may be used for detecting real objects in the two-dimensional live-action image. A method may also be used in which real objects are recognized with a learner (a trained artificial-intelligence program), trained by deep learning or machine learning, that directly recognizes real objects in images.
 Next, the processing unit 14 may execute a matching process between the real objects in the two-dimensional live-action image pasted on the virtual surface S and the three-dimensional real-space model as projected onto the virtual surface S when the model is viewed from a certain observation point in the virtual space. This matching process compares the features of the two-dimensional image obtained by perspectively projecting the three-dimensional real-space model from the observation point onto the virtual surface S with the features of the real objects in the two-dimensional live-action image on the virtual surface S, and identifies, as the observation position OP for identifying occluded areas, an observation point in the virtual space from which the real objects in the two-dimensional live-action image on the virtual surface S and the corresponding three-dimensional real-space model appear to roughly overlap on the virtual surface S.
 The processing unit 14 may execute the feature search and matching processes using either or both of the two methods described above multiple times in order to find multiple candidates for the observation position OP in the virtual space. The processing unit 14 may then perform an evaluation based on the features of the real objects included in the two-dimensional live-action image and the features of the three-dimensional real-space model, and identify the observation point with the highest matching score as the observation position for identifying occluded areas. Through the above processing, the observation position OP for identifying occluded areas, at which the real objects included in the two-dimensional live-action image and the corresponding three-dimensional real-space model appear to roughly coincide, is identified in the virtual space.
 FIG. 8 is a conceptual diagram showing the state in which the observation position OP for identifying occluded areas, at which a real object included in the two-dimensional live-action image appears to coincide with the three-dimensional real-space model corresponding to that real object, has been identified by the processing of step S105 in the flowchart shown in FIG. 5. Through the processing of step S105, the observation position OP for identifying occluded areas is identified, at which the real object Obj (a building in this example) included in the two-dimensional live-action image appears to coincide with the three-dimensional real-space model Mdl of the building corresponding to that real object Obj.
 The processing unit 14 then identifies the occluded region, that is, the part of the virtual objects placed in the virtual space that is occluded by the three-dimensional real-space model when viewed from the observation position for identifying occluded areas in the virtual space (step S106).
 As an example of the processing of step S106, the processing unit 14 detects the outline of the three-dimensional real-space model as projected onto the virtual object Obj when the model is viewed from the observation position OP for identifying occluded areas. Conceptually, this outline of the three-dimensional real-space model is the shadow (silhouette) that appears on the virtual object Obj when light is projected from the observation position OP onto the three-dimensional real-space model alone. The processing unit 14 identifies the region containing the entire volume through which the outline of the three-dimensional real-space model projected onto the virtual object Obj penetrates the virtual object Obj as the occluded region of the virtual object Obj, that is, the region occluded by the three-dimensional real-space model when viewed from the observation position OP for identifying occluded areas in the virtual space.
 FIG. 9 is a conceptual diagram showing an example of the occluded region Ocl of the virtual object Obj identified by the processing of step S106 in the flowchart shown in FIG. 5. FIG. 9 shows the occluded region Ocl, conceived as the volume that penetrates the virtual object Obj when light is projected from the observation position OP for identifying occluded areas onto the three-dimensional real-space model Mdl.
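 For illustration only, one way to carry out the occlusion test of step S106 is to cast a ray from the observation position OP towards each vertex of the virtual object and check whether it is blocked by the three-dimensional real-space model. The sketch below assumes the model is available as a triangle mesh; it is a minimal sketch, not the embodiment's actual implementation.

import numpy as np

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray/triangle intersection; returns the hit distance t or None."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direction, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:
        return None
    inv = 1.0 / det
    s = origin - v0
    u = np.dot(s, p) * inv
    if u < 0 or u > 1:
        return None
    q = np.cross(s, e1)
    v = np.dot(direction, q) * inv
    if v < 0 or u + v > 1:
        return None
    t = np.dot(e2, q) * inv
    return t if t > eps else None

def occluded_vertices(op, object_vertices, model_triangles):
    """Mark each vertex of the virtual object that is hidden by the real-space
    model (given as triangles of three vertices each) when seen from the
    observation position OP for identifying occluded areas."""
    occluded = []
    for vtx in object_vertices:
        d = vtx - op
        dist = np.linalg.norm(d)
        d = d / dist
        hit = any(
            (t := ray_hits_triangle(op, d, *tri)) is not None and t < dist
            for tri in model_triangles
        )
        occluded.append(hit)
    return np.array(occluded)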
 The processing unit 14 then makes the portion of each virtual object corresponding to the occluded region transparent (which includes deleting it, cutting it away, or otherwise making it invisible) (step S107). As a result, when viewed from the observation position OP for identifying occluded areas, the virtual object (three-dimensional CG) appears with the occluded portion cut out, overlapping the live-action image on the virtual surface behind it.
 As described above, the position information of each virtual object in the virtual space exists in the storage unit 12 in association with its three-dimensional shape data. Based on the relationship between the position of the virtual object in the virtual space and the placement position of the three-dimensional real-space model, any part of the virtual object that is located in front of the three-dimensional real-space model (on the side closer to the observation position OP for identifying occluded areas) therefore remains without being made transparent.
 The processing unit 14 then performs a process of generating the display images that present the three-dimensional objects in the virtual space on the display devices 30 according to the viewpoint position 2A of user 2 and the information on the position and shape of each display device (step S108). More specifically, the processing unit 14 calculates the viewpoint position 2A of user 2 in the reference coordinate system based on the images (depth images and the like) captured by the sensor 22 of the measuring device 20, and generates display images that simulate how the virtual space looks when viewed from that viewpoint position 2A through the display devices 30. When the display device 30 is composed of multiple display devices 30A to 30D, as in this embodiment, the processing unit 14 generates a display image for each of the display devices 30A to 30D.
 As an example, the processing unit 14 can accurately generate the display image seen from the viewpoint position 2A of user 2 through the screen of a display device 30 by applying projective transformation, perspective projection transformation, or a similar calculation, based on the viewpoint position 2A, to the virtual objects whose occluded regions have been made transparent and to the virtual surface S onto which the two-dimensional live-action image has been pasted as a texture, projecting them onto the screen of the display device 30. Through such calculations, the virtual objects in the virtual space are projected onto the screen of the display device 30 geometrically correctly with the user's viewpoint position 2A taken into account, so that even when user 2 looks at the screen of the display device 30 obliquely, a display image is generated in which the real objects in the two-dimensional live-action image (for example, buildings) and the three-dimensional CG (for example, the elephant as a virtual object) are presented on the display device 30 as a plausible, natural image as seen from that viewpoint position 2A. The processing performed here is not limited to projective transformation; a perspective projection transformation may be used, or arithmetic operations with specific matrices or values based on empirical rules may be used.
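 For illustration only, the viewpoint-dependent projection onto a display screen described above can be realized, for example, with the well-known off-axis (generalized) perspective projection, in which the viewing frustum is defined by the eye position and the screen corners. The sketch below assumes a rectangular screen given by three of its corners in the reference coordinate space; it is one possible formulation, not necessarily the calculation used in this embodiment.

import numpy as np

def off_axis_projection(pa, pb, pc, pe, near, far):
    """Generalized (off-axis) perspective projection for a rectangular screen.
    pa, pb, pc: lower-left, lower-right, upper-left screen corners in the
    reference coordinate space; pe: user viewpoint; near/far: clip planes.
    Returns a 4x4 matrix mapping homogeneous reference-space points to clip space."""
    vr = pb - pa; vr = vr / np.linalg.norm(vr)          # screen right axis
    vu = pc - pa; vu = vu / np.linalg.norm(vu)          # screen up axis
    vn = np.cross(vr, vu); vn = vn / np.linalg.norm(vn) # screen normal (towards eye)
    va, vb, vc = pa - pe, pb - pe, pc - pe              # from eye to corners
    d = -np.dot(va, vn)                                 # eye-to-screen distance
    l = np.dot(vr, va) * near / d
    r = np.dot(vr, vb) * near / d
    b = np.dot(vu, va) * near / d
    t = np.dot(vu, vc) * near / d
    P = np.array([
        [2 * near / (r - l), 0, (r + l) / (r - l), 0],
        [0, 2 * near / (t - b), (t + b) / (t - b), 0],
        [0, 0, -(far + near) / (far - near), -2 * far * near / (far - near)],
        [0, 0, -1, 0]])
    M = np.eye(4)                                       # rotate into screen axes
    M[0, :3], M[1, :3], M[2, :3] = vr, vu, vn
    T = np.eye(4); T[:3, 3] = -pe                       # translate eye to origin
    return P @ M @ T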
 The methods described so far realize the occlusion processing with a certain degree of accuracy. On the other hand, a simplified method is also conceivable that simplifies the processing and reduces the processing load in exchange for accepting somewhat less accuracy than the methods above. As one example, in the flowchart shown in FIG. 5, the processing of S106 and S107, which identifies the occluded region and applies it to the virtual objects, may be skipped, and the display image may be generated in S108 with the occlusion processing included at the same time. In the method described above, in S108 of FIG. 5 the display image presented on the display device 30 is generated, by the calculation described above, from virtual objects to which the occlusion processing has already been applied in steps S106 and S107. In this variant, however, no such prior occlusion processing is performed. In S108, starting from the user viewpoint position 2A, all three-dimensional objects in the virtual space visible through the display devices 30 are converted as they are into two dimensions, matched to the display screens, by the calculation described above including projective transformation. At this time, by treating the three-dimensional real-space model as an occlusion object (occluder), every portion of a virtual object (or the whole object) that is hidden behind the three-dimensional real-space model as seen from the viewpoint position 2A is occluded (made transparent). Because this occlusion processing (transparency) is not applied to the virtual surface lying further behind, the two-dimensional live-action image on the virtual surface behind the occluded portion is visible from the user viewpoint position 2A, as described above. With this method, if the observation position for identifying occluded areas and the user viewpoint position 2A happen to coincide, accurate occlusion is achieved without any sense of incongruity. In ordinary use cases, however, these two positions do not coincide, so an error arises in the occlusion processing; that is, a misalignment occurs in the superimposition of the live-action image and the virtual image. This misalignment grows as the spatial distance between the observation position for identifying occluded areas and the user viewpoint position 2A increases, but depending on the use case it is not much of a problem.
 When actually creating images with this kind of occlusion, if, for example, the live-action images are shot with a camera installed at the windshield of a passenger car, the camera position (observation position) and the user viewpoint position (the driver's viewpoint in the driver's seat) differ by only about 1 m, which is sufficiently small compared with the distance between the car and the real objects outside it (in many cases several metres to several tens of metres or more), so the misalignment of the superimposition is often not noticeable. This methodology is therefore often necessary and sufficient for achieving the present purpose.
 The processing unit 14 transmits the signals of the display images generated in this way to the display devices 30 via the display output unit 16 of the graphics generation device 10 and causes the display devices 30 to display them (step S109). FIG. 10 shows an example of the display image displayed on the display devices 30 in step S109 of the flowchart shown in FIG. 5. As shown in FIG. 10, in this example an image in which a virtual object rendered in three-dimensional CG (here, the 3DCG elephant) is superimposed on the two-dimensional live-action image, acquired by imaging the real space around a camera-equipped vehicle while the vehicle moved along a route in real space, is displayed across the display devices 30A to 30D. In the example shown in FIG. 10, of the virtual object (the elephant) overlapping the occluded region that coincides with the outline of the real object (the building) in the two-dimensional live-action image, the front-leg portion of the virtual object located on the near side of the real object is not occluded, while the other portions (part of the right ear, the rear of the body, and the hind legs) are shown occluded by the occluded region.
 FIG. 5A is a flowchart of another example in which some of the processing differs from the flowchart shown in FIG. 5. Each process of the flowchart shown in FIG. 5A is executed by the processing unit 14, as in FIG. 5. It is also basically the same as FIG. 5 in that, when the image generated by the graphics generation device 10 is a moving image consisting of multiple frame images that are consecutive in time series, the calculation may be performed for each frame of the moving image. Only steps S106A, S107A, and S108A of FIG. 5A, whose content differs from FIG. 5, are described below.
 The processing performed in step S106A identifies, as the occluded region, the outline of the real-space three-dimensional model on the virtual surface S obtained when the real-space three-dimensional model placed in the virtual space is perspectively projected onto the virtual surface S from the observation position OP for identifying occluded areas. The processing unit 14 detects the outline of the three-dimensional real-space model as perspectively projected onto the virtual surface S when the model is viewed from the observation position OP. Conceptually, this outline of the three-dimensional real-space model is the shadow (silhouette) that appears on the virtual surface S when light is projected from the observation position OP onto the three-dimensional real-space model. The processing unit 14 identifies the area inside the outline of the three-dimensional real-space model perspectively projected onto the virtual surface S as the occluded region, that is, the region occluded by the three-dimensional real-space model when viewed from the observation position OP for identifying occluded areas in the virtual space.
 FIG. 9A is a conceptual diagram showing an example of the occluded region Ocl on the virtual surface S identified by the processing of step S106A in the flowchart shown in FIG. 5A. Referring to FIG. 9A, the occluded region Ocl is conceived as the shadow that appears on the virtual surface S when light is projected from the observation position OP for identifying occluded areas onto the three-dimensional real-space model Mdl.
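 For illustration only, the perspective projection of step S106A amounts to intersecting the ray from the observation position OP through each silhouette vertex of the model with the virtual surface S. The sketch below assumes S is a sphere given by its centre and radius and that OP lies inside it; the polygon formed by the projected silhouette vertices then outlines the occluded region Ocl on S.

import numpy as np

def project_onto_sphere(op, vertex, centre, radius):
    """Project a vertex of the 3D real-space model onto the spherical virtual
    surface S along the ray from the observation position OP through the vertex."""
    d = vertex - op
    d = d / np.linalg.norm(d)
    oc = op - centre
    b = np.dot(d, oc)
    disc = b * b - (np.dot(oc, oc) - radius * radius)
    if disc < 0:
        return None                      # ray misses S (cannot happen if OP is inside S)
    t = -b + np.sqrt(disc)               # OP lies inside S, so take the positive root
    return op + t * d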
 In step S107A, the processing unit 14 then performs a three-dimensional-to-two-dimensional conversion by perspectively projecting the virtual objects (three-dimensional CG) onto the virtual surface S, and in doing so makes the portions of the converted virtual objects (now two-dimensional CG) that correspond to the occluded region transparent (which includes deleting them, cutting them away, or otherwise making them invisible), generating a layered virtual surface on which they are superimposed.
 In step S107A, when part or all of the three-dimensional real-space model lies in front of a virtual object (on the OP side) as seen from the observation position OP for identifying occluded areas, the processing unit 14 first perspectively projects the virtual object onto the virtual surface S and superimposes it on the virtual surface S, and then makes the occluded region identified in the processing of step S106A transparent (which includes deleting it, cutting it away, or otherwise making it invisible). Stated more precisely, when the distance from the observation position OP in the virtual space to the virtual object is greater than the distance from the observation position OP to the three-dimensional real-space model, the virtual object with the portion corresponding to the region occluded by the three-dimensional real-space model removed is perspectively projected onto the virtual surface S by this three-dimensional-to-two-dimensional conversion, and the superimposed, layered virtual surface is generated. As a result, when viewed from the observation position OP for identifying occluded areas, the virtual object appears with the occluded portion cut out, overlapping the live-action image on the virtual surface behind it. In other words, a single layered virtual surface is generated by perspectively projecting onto the virtual surface all the virtual objects of the virtual world observed from the observation position OP for identifying occluded areas, including the virtual objects whose portions overlapping the occluded region have disappeared. On the layered virtual surface, the virtual objects, perspectively projected onto the virtual surface, are laid over the two-dimensional live-action image originally pasted on the virtual surface as a texture. The layered virtual surface created in this way therefore exists in the virtual space as a three-dimensional object onto which textures are pasted in multiple layers.
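 For illustration only, the construction of the layered virtual surface can be pictured, in texture space, as alpha-compositing the perspectively projected virtual object over the live-action texture while zeroing its coverage inside the occluded region Ocl. The sketch below is a simplification; the handling of parts located in front of the model (described in the next paragraph) is assumed to be folded into the occlusion mask.

import numpy as np

def composite_layered_surface(base_texture, object_layer, object_alpha, occlusion_mask):
    """Compose the layered virtual surface in texture space.

    base_texture:   HxWx3 two-dimensional live-action image pasted on S.
    object_layer:   HxWx3 virtual object rendered by perspective projection onto S.
    object_alpha:   HxW coverage of the projected object (0..1).
    occlusion_mask: HxW mask of the occluded region Ocl on S (1 = occluded).
    Object pixels falling inside Ocl are made fully transparent, so the
    live-action background shows through them."""
    alpha = object_alpha * (1.0 - occlusion_mask)          # cut the occluded part out
    alpha = alpha[..., np.newaxis]
    return alpha * object_layer + (1.0 - alpha) * base_texture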
 As described above, information on the position of the three-dimensional CG virtual object in the virtual space is stored in the storage unit 12 in association with the data of the three-dimensional CG image. Based on this position information, the processing unit 14 can take into account the relationship between the position of the virtual object in the virtual space and the position of the three-dimensional real space model that forms the occlusion area, and leave any part of the virtual object located in front of the three-dimensional real space model (on the side closer to the occlusion-area-identifying observation position OP) untouched, without making it transparent, even if it falls within the occlusion area.
 In step S108A, on the premise that the overlapping virtual surface exists in the virtual space, the processing unit 14 then generates a display image that presents this three-dimensional object on the display device 30 in accordance with the viewpoint position 2A of the user 2 and the position and shape of the display device.
 In step S108A, the processing unit 14 generates, for the overlapping virtual surface generated in step S107A and existing in the virtual space, the image seen from the viewpoint position 2A of the user 2 through the display device 30. More specifically, the processing unit 14 calculates the viewpoint position 2A of the user 2 in the reference coordinate system based on images (such as depth images) captured by the sensor 22 of the measurement device 20, and generates a display image that simulates how the virtual space appears when viewed from that viewpoint position 2A through the display device 30. When the display device 30 is composed of multiple display devices 30A to 30D, as in this embodiment, the processing unit 14 generates a display image for each of the display devices 30A to 30D.
 As one example, the processing unit 14 can generate the display image seen from the viewpoint position 2A of the user 2 through the screen of the display device 30 by projectively transforming the overlapping virtual surface (on which the two-dimensional live-action image and the two-dimensionalized virtual objects (originally three-dimensional CG) superimposed on it are applied as textures) onto the screen of the display device 30 based on the viewpoint position 2A of the user 2. Through this projective transformation, the overlapping virtual surface is projected onto the screen of the display device 30 in a mathematically consistent manner that accounts for the user's viewpoint position 2A, so that even when the user 2 looks at the screen of the display device 30 obliquely, a display image is generated in which the real objects in the two-dimensional live-action image (for example, buildings) and the three-dimensional CG (for example, an elephant as a virtual object) appear plausible and natural on the display device 30, as if seen through the screen from that viewpoint position 2A. The processing performed here is not limited to projective transformation; a perspective projection transformation may be used, or arithmetic operations on specific matrices or values derived from empirical rules may be used.
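 One way to realise such a projection is sketched below under several assumptions that are not part of the embodiment: the overlapping virtual surface and the display screen are treated as planar quads with known corners in the reference coordinate system, the screen's spanning vectors are orthogonal, the corner order is top-left, top-right, bottom-right, bottom-left, and OpenCV is used to build the plane-to-plane homography; the function and parameter names are illustrative.

```python
import numpy as np
import cv2

def eye_projection_homography(eye, surface_corners_3d, screen_origin, screen_u, screen_v,
                              screen_px, texture_size):
    """Map the overlapping virtual surface texture onto the display screen as seen from `eye`.

    eye                : (3,) user viewpoint position 2A in the reference coordinate system
    surface_corners_3d : (4, 3) corners of the overlapping virtual surface S
    screen_origin      : (3,) top-left corner of the display screen
    screen_u, screen_v : (3,) vectors spanning the screen's width and height
    screen_px          : (width, height) of the screen in pixels
    texture_size       : (width, height) of the overlapping virtual surface texture
    """
    n = np.cross(screen_u, screen_v)                      # screen plane normal
    dst = []
    for corner in surface_corners_3d:
        d = corner - eye                                  # ray from the eye through the corner
        t = np.dot(screen_origin - eye, n) / np.dot(d, n) # ray-plane intersection parameter
        p = eye + t * d - screen_origin
        # Express the intersection in the screen's (u, v) basis, then in pixels.
        u = np.dot(p, screen_u) / np.dot(screen_u, screen_u)
        v = np.dot(p, screen_v) / np.dot(screen_v, screen_v)
        dst.append([u * screen_px[0], v * screen_px[1]])

    tw, th = texture_size
    src = np.float32([[0, 0], [tw, 0], [tw, th], [0, th]])
    return cv2.getPerspectiveTransform(src, np.float32(dst))

# Example use: warp the composited texture for one display panel.
# H = eye_projection_homography(eye, corners, origin, u_vec, v_vec, (1920, 1080), (w, h))
# frame = cv2.warpPerspective(composite, H, (1920, 1080))
```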
 The processing in step S109 is the same in the method of FIG. 5 and the method of FIG. 5A.
 When the two-dimensional live-action image is a moving image composed of multiple frame images captured successively in time series, the processing is performed and the result displayed, for example, for each frame of the moving image. In this case, all of the steps shown in FIG. 5 or FIG. 5A may be performed for every frame, or any step that is common to all frames or needs to be performed only once may be skipped. The order of the steps may also be partially changed, or steps may be processed in parallel. As a simplified method, specific steps may be skipped as described above.
 In S104, the position of the occlusion-area-identifying observation position OP (defined in X, Y, Z coordinates) and the rotation angle (the rotation angle Yaw about the Y axis) of the position on the virtual surface S at which the texture of the two-dimensional live-action image is applied were both determined by trial and error or by automation using a program. In the case of a moving image as described above, however, and particularly under the assumption that the direction of the camera lens and the capture direction of the video are locked, the Yaw may, as a rule, be left unchanged after it has been determined for the first frame.
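 Put together, a per-frame driver loop for such a moving image could look like the following sketch. Every name here (`pipeline`, `user_tracker`, `displays` and their methods) is hypothetical shorthand for the steps of FIGs. 5 and 5A; which steps are hoisted out of the loop, skipped, or parallelised is an application choice, as described above.

```python
def render_video(frames, pipeline, user_tracker, displays):
    """Hypothetical per-frame driver for the steps of FIG. 5A (names are illustrative)."""
    # One-time setup: the observation position OP and the surface yaw are fixed on the
    # first frame and, as a rule, not changed afterwards when the camera direction is locked.
    op, yaw = pipeline.estimate_observation_pose(frames[0])              # S104, first frame only

    for frame in frames:
        surface = pipeline.texture_virtual_surface(frame, yaw)           # apply frame as texture on S
        occlusion = pipeline.identify_occlusion(op, surface)             # S106A
        layered = pipeline.build_overlapping_surface(surface, occlusion) # S107A
        eye = user_tracker.current_viewpoint()                           # viewpoint 2A, every frame
        for display in displays:                                         # S108A-S109, one image per panel
            display.show(pipeline.project_to_screen(layered, eye, display))
```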
 By performing the processing of each step for each frame image of the moving image in this way, images in which the virtual objects rendered in three-dimensional CG are superimposed on the virtual surface are generated continuously, following changes in the viewpoint position 2A of the user 2, and an augmented-reality moving image in which the virtual object rendered in three-dimensional CG (in this example, an elephant) appears to exist in the real space represented by the two-dimensional live-action image (in this example, an urban area outside the vehicle) is provided to the user 2 via the display device 30. This gives the user 2 viewing the image on the display device 30 the illusion that the display screen is a real car window, and because the three-dimensional CG virtual objects are superimposed on the realistic scenery on the display screen with depth information taken into account, the real (real space) and the virtual (virtual space) feel as though they are blended together and represented within a single, unified world.
 As one application example of the system 1 according to this embodiment, a driving simulator has been described in which the graphics generation device 10 generates the scenery seen from the window of a car according to the viewpoint position of the user 2 and displays it on the display device 30, but the applications of the system 1 are not limited to this example. For example, the system can also be used in a setup in which live-action content captured in advance with a 360-degree camera is projected at large scale onto three walls of a private room equipped with a fitness bike. While exercising on the fitness bike in the room, the user can feel as though they are actually riding a bicycle through a cityscape or landscape. When additional CG is to be occluded against the 360-degree video used here to further entertain the user, the CG can be occluded against the simple two-dimensional footage centered on the scenery of the base 360-degree video. Such requirements arise in various entertainment fields, but beyond entertainment, the system can also be configured as a simulator for other types of vehicles, such as airplanes, ships, submersibles, and spacecraft, in which virtual objects are superimposed on the scenery seen from their windows. In addition, by configuring the system 1 to superimpose virtual objects on images captured by an endoscope inserted into the digestive tract through the patient's mouth, for example, it is possible to provide an unprecedented user experience, as if the user had entered the human body and were observing its interior.
 The embodiments of the present invention described above are illustrative examples for explaining the present invention and are not intended to limit the scope of the present invention to those embodiments. Those skilled in the art can implement the present invention in various other forms without departing from the gist of the present invention. Although the description of the embodiments omits details such as the specific arithmetic expressions for the projective transformation, it goes without saying that those skilled in the art can implement the embodiments by appropriately using known projective transformation expressions.
1...image display system, 2...user, 2A...viewpoint position, 3...housing, 4A to 4H...virtual objects, 10...graphics generation device, 20...measurement device, 30, 30A to 30D...display devices, S...virtual surface

Claims (22)

  1.  A graphics generation device having a storage unit and a processing unit, wherein
     the storage unit stores:
     data of a two-dimensional live-action image capturing a real space; and
     data of three-dimensional CG representing a virtual object to be superimposed on the two-dimensional live-action image,
     and the processing unit is configured to execute:
     acquiring data of a three-dimensional real space model that models a real object existing in the real space;
     generating a virtual space including a virtual surface onto which the two-dimensional live-action image is applied as a texture, the three-dimensional real space model, and the virtual object;
     identifying an occlusion area in which the virtual object is occluded, based on an observation position in the virtual space corresponding to the shooting position in the real world at which the two-dimensional live-action image was captured, the virtual surface, and the three-dimensional real space model; and
     generating the virtual object with the occlusion area occluded.
  2.  A graphics generation device having a storage unit and a processing unit, wherein
     the storage unit stores:
     data of a two-dimensional live-action image capturing a real space; and
     data of three-dimensional CG representing a virtual object to be superimposed on the two-dimensional live-action image,
     and the processing unit is configured to execute:
     acquiring data of a three-dimensional real space model that models a real object existing in the real space;
     generating a virtual space including a virtual surface onto which the two-dimensional live-action image is applied as a texture, the three-dimensional real space model, and the virtual object; and
     when the distance from an observation position in the virtual space corresponding to the shooting position in the real world at which the two-dimensional live-action image was captured to the virtual object is greater than the distance from the observation position to the three-dimensional real space model, further generating an overlapping virtual surface by two-dimensionally converting the virtual object, in which the portion corresponding to the occlusion area occluded by the three-dimensional real space model is occluded, onto the virtual surface by perspective projection originating at the observation position, and applying the resulting texture on top of the texture.
  3.  A graphics generation device having a storage unit and a processing unit, wherein
     the storage unit stores:
     data of a two-dimensional live-action image capturing a real space; and
     data of three-dimensional CG representing a virtual object to be superimposed on the two-dimensional live-action image,
     and the processing unit is configured to execute:
     acquiring data of a three-dimensional real space model that models a real object existing in the real space;
     generating a virtual space including a virtual surface onto which the two-dimensional live-action image is applied as a texture, the three-dimensional real space model, and the virtual object;
     setting the three-dimensional real space model as an occluder for calculating an occlusion area in the virtual object other than the virtual surface; and
     generating the virtual object with the occlusion area occluded.
  4.  The graphics generation device according to any one of claims 1 to 3, wherein the processing unit is further configured to execute:
     identifying, in the virtual space, the observation position from which the real object in the real space shown in the two-dimensional live-action image applied to the virtual surface appears to coincide with the three-dimensional real space model corresponding to that real object.
  5.  The graphics generation device according to claim 4, wherein identifying the observation position includes determining the observation position using a plurality of candidate observation positions in the virtual space, based on features of the real object in the real space shown in the two-dimensional live-action image applied to the virtual surface as seen from the candidate observation positions, and features in projected images obtained by perspectively projecting the three-dimensional real space model onto the virtual surface from the candidate observation positions.
  6.  The graphics generation device according to any one of claims 1 to 3, wherein the processing unit is configured to identify the observation position and an observation direction from the observation position based on visual features appearing in the two-dimensional live-action image.
  7.  The graphics generation device according to claim 2, wherein generating the overlapping virtual surface includes identifying the occlusion area occluded by the three-dimensional real space model as seen from the observation position.
  8.  The graphics generation device according to claim 1, wherein
     the two-dimensional live-action image includes a plurality of frame images captured successively at a plurality of positions on a path while moving along the path in the real space, and
     the processing unit is configured to execute, for each of the frame images, determining the observation position in the same virtual space and identifying the occlusion area.
  9.  The graphics generation device according to claim 2, wherein
     the two-dimensional live-action image includes a plurality of frame images captured successively at a plurality of positions on a path while moving along the path in the real space, and
     the processing unit is configured to execute, for each of the frame images, determining the observation position in the same virtual space and generating the overlapping virtual surface.
  10.  The graphics generation device according to claim 3, wherein
     the two-dimensional live-action image includes a plurality of frame images captured successively at a plurality of positions on a path while moving along the path in the real space, and
     the processing unit is configured to execute, for each of the frame images, determining the observation position in the same virtual space and setting the occluder.
  11.  The graphics generation device according to any one of claims 1 to 10, wherein the processing unit is further configured to execute:
     generating, by projective transformation or perspective projection transformation, a display image that simulates how the virtual space appears when viewed from a user viewpoint position through a screen of a display capable of displaying images, based on arrangement data of the screen in a predetermined reference coordinate space and the user viewpoint position in the reference coordinate space, and displaying the display image on the screen.
  12.  The graphics generation device according to claim 11, wherein the processing unit is further configured to execute:
     generating, by projective transformation or perspective projection transformation, a display image that simulates how the virtual space appears when viewed from the user viewpoint position through the screen, based on the arrangement data of the screen in the predetermined reference coordinate space and the user viewpoint position in the reference coordinate space measured by a measurement device, and displaying the display image on the screen.
  13.  The graphics generation device according to claim 12, wherein
     the user viewpoint position is continuously measured by the measurement device, and
     displaying the display image on the screen includes generating the display image so as to follow the user viewpoint position and displaying it on the screen.
  14.  The graphics generation device according to any one of claims 1 to 13, wherein
     the screen is configured to surround at least a part of the surroundings of the user, including the area in front of the user, and
     displaying the display image on the screen includes generating, based on the arrangement data of the screen and the user viewpoint position, the display image that simulates how the virtual space appears when viewed through each part of the screen, and displaying the display image on each part of the screen.
  15.  A graphics generation method executed by a graphics generation device having a storage unit and a processing unit, wherein
     the storage unit stores:
     data of a two-dimensional live-action image capturing a real space; and
     data of three-dimensional CG representing a virtual object to be superimposed on the two-dimensional live-action image,
     and the graphics generation method includes:
     acquiring data of a three-dimensional real space model that models a real object existing in the real space;
     generating a virtual space including a virtual surface onto which the two-dimensional live-action image is applied as a texture, the three-dimensional real space model, and the virtual object;
     identifying an occlusion area in which the virtual object is occluded, based on an observation position in the virtual space corresponding to the shooting position in the real world at which the two-dimensional live-action image was captured, the virtual surface, and the three-dimensional real space model; and
     generating the virtual object with the occlusion area occluded.
  16.  A graphics generation method executed by a graphics generation device having a storage unit and a processing unit, wherein
     the storage unit stores:
     data of a two-dimensional live-action image capturing a real space; and
     data of three-dimensional CG representing a virtual object to be superimposed on the two-dimensional live-action image,
     and the graphics generation method includes:
     acquiring data of a three-dimensional real space model that models a real object existing in the real space;
     generating a virtual space including a virtual surface onto which the two-dimensional live-action image is applied as a texture, the three-dimensional real space model, and the virtual object; and
     when the distance from the observation position for identifying an occlusion area to the virtual object is greater than the distance from the observation position to the three-dimensional real space model, further generating an overlapping virtual surface by two-dimensionally converting the virtual object, in which the portion corresponding to the occlusion area occluded by the three-dimensional real space model is occluded, onto the virtual surface by perspective projection, and applying the resulting texture on top of the texture.
  17.  A graphics generation method executed by a graphics generation device having a storage unit and a processing unit, wherein
     the storage unit stores:
     data of a two-dimensional live-action image capturing a real space; and
     data of three-dimensional CG representing a virtual object to be superimposed on the two-dimensional live-action image,
     and the graphics generation method includes:
     acquiring data of a three-dimensional real space model that models a real object existing in the real space;
     generating a virtual space including a virtual surface onto which the two-dimensional live-action image is applied as a texture, the three-dimensional real space model, and the virtual object;
     setting the three-dimensional real space model as an occluder for calculating an occlusion area in the virtual object other than the virtual surface; and
     generating the virtual object with the occlusion area occluded.
  18.  The graphics generation method according to any one of claims 15 to 17, including identifying the observation position and an observation direction from the observation position based on visual features appearing in the two-dimensional live-action image.
  19.  A program causing a computer having a storage unit and a processing unit to execute graphics generation processing, wherein
     the storage unit stores:
     data of a two-dimensional live-action image capturing a real space; and
     data of three-dimensional CG representing a virtual object to be superimposed on the two-dimensional live-action image,
     and the program causes the processing unit to execute:
     acquiring data of a three-dimensional real space model that models a real object existing in the real space;
     generating a virtual space including a virtual surface onto which the two-dimensional live-action image is applied as a texture, the three-dimensional real space model, and the virtual object;
     identifying an occlusion area in which the virtual object is occluded, based on an observation position in the virtual space corresponding to the shooting position in the real world at which the two-dimensional live-action image was captured, the virtual surface, and the three-dimensional real space model; and
     generating the virtual object with the occlusion area occluded.
  20.  A program causing a computer having a storage unit and a processing unit to execute graphics generation processing, wherein
     the storage unit stores:
     data of a two-dimensional live-action image capturing a real space; and
     data of three-dimensional CG representing a virtual object to be superimposed on the two-dimensional live-action image,
     and the program causes the processing unit to execute:
     acquiring data of a three-dimensional real space model that models a real object existing in the real space;
     generating a virtual space including a virtual surface onto which the two-dimensional live-action image is applied as a texture, the three-dimensional real space model, and the virtual object; and
     when the distance from the observation position for identifying an occlusion area to the virtual object is greater than the distance from the observation position to the three-dimensional real space model, further generating an overlapping virtual surface by two-dimensionally converting the virtual object, in which the portion corresponding to the occlusion area occluded by the three-dimensional real space model is occluded, onto the virtual surface by perspective projection, and applying the resulting texture on top of the texture.
  21.  A program causing a computer having a storage unit and a processing unit to execute graphics generation processing, wherein
     the storage unit stores:
     data of a two-dimensional live-action image capturing a real space; and
     data of three-dimensional CG representing a virtual object to be superimposed on the two-dimensional live-action image,
     and the program causes the processing unit to execute:
     acquiring data of a three-dimensional real space model that models a real object existing in the real space;
     generating a virtual space including a virtual surface onto which the two-dimensional live-action image is applied as a texture, the three-dimensional real space model, and the virtual object;
     setting the three-dimensional real space model as an occluder for calculating an occlusion area in the virtual object other than the virtual surface; and
     generating the virtual object with the occlusion area occluded.
  22.  The program according to any one of claims 19 to 21, including identifying the observation position and an observation direction from the observation position based on visual features appearing in the two-dimensional live-action image.
PCT/JP2022/040852 2022-11-01 2022-11-01 Graphics generation device, graphics generation method, and program WO2024095356A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/040852 WO2024095356A1 (en) 2022-11-01 2022-11-01 Graphics generation device, graphics generation method, and program

Publications (1)

Publication Number Publication Date
WO2024095356A1 true WO2024095356A1 (en) 2024-05-10

Family

ID=90930046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/040852 WO2024095356A1 (en) 2022-11-01 2022-11-01 Graphics generation device, graphics generation method, and program

Country Status (1)

Country Link
WO (1) WO2024095356A1 (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08153213A (en) * 1994-09-29 1996-06-11 Hitachi Ltd Picture compositing and displaying method
JP2003143477A (en) * 2001-10-31 2003-05-16 Canon Inc Image compositing device and method
JP2003141569A (en) * 2001-10-31 2003-05-16 Canon Inc Method of processing information and image synthesis device
JP2004234549A (en) * 2003-01-31 2004-08-19 Canon Inc Actual object model preparation method
JP2013254338A (en) * 2012-06-06 2013-12-19 Nippon Telegr & Teleph Corp <Ntt> Video generation system, video generation device, video generation method, and computer program
JP2018142090A (en) * 2017-02-27 2018-09-13 Kddi株式会社 Character image generating device, character image generating method, program, recording medium and character image generating system
JP2019086961A (en) * 2017-11-06 2019-06-06 株式会社タイトー Apparatus and program for image processing
WO2019171557A1 (en) * 2018-03-08 2019-09-12 塁 佐藤 Image display system

Similar Documents

Publication Publication Date Title
US20230226445A1 (en) Reality vs virtual reality racing
EP1521482B1 (en) Image display apparatus and method
CN103337095B (en) The tridimensional virtual display methods of the three-dimensional geographical entity of a kind of real space
JP3991020B2 (en) Image display method and image display system
CN103080983A (en) Vehicle system
CN106408515A (en) Augmented reality-based vision synthesis system
CN104331929A (en) Crime scene reduction method based on video map and augmented reality
JPH05174129A (en) Modeling apparatus for imaging three-dimensional model
US11257390B2 (en) Evaluation of a simulated vehicle-related feature
JPH07311857A (en) Picture compositing and display device and simulation system
US20130135310A1 (en) Method and device for representing synthetic environments
JP6345381B2 (en) Augmented reality system
Ienaga et al. First deployment of diminished reality for anatomy education
JPH09138637A (en) Pseudo visibility device
JP6723533B2 (en) Driving simulator
Li et al. Augmented reality and virtual reality
WO2024095356A1 (en) Graphics generation device, graphics generation method, and program
JP2006352473A (en) Method for visualizing shielded space
JP5883673B2 (en) Point designation system in 3D map
JP4366165B2 (en) Image display apparatus and method, and storage medium
CN109840943B (en) Three-dimensional visual analysis method and system
JP4530214B2 (en) Simulated field of view generator
JP2005258792A (en) Apparatus, method and program for generating image
WO2024095357A1 (en) Video system
Alfakhori et al. Occlusion screening using 3d city models as a reference database for mobile ar-applications