WO2019193699A1 - Reference image generation device, display image generation device, reference image generation method, and display image generation method


Info

Publication number
WO2019193699A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
reference image
viewpoint
display
data
Prior art date
Application number
PCT/JP2018/014478
Other languages
French (fr)
Japanese (ja)
Inventor
Yuki Karasawa
Andrew James Bigos
Original Assignee
Sony Interactive Entertainment Inc.
Sony Interactive Entertainment Europe Limited
Priority date
Filing date
Publication date
Application filed by Sony Interactive Entertainment Inc. and Sony Interactive Entertainment Europe Limited
Priority to PCT/JP2018/014478
Publication of WO2019193699A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics

Definitions

  • The present invention relates to a reference image generation device that generates data used to display an image according to a user's viewpoint, a display image generation device that generates a display image using the data, a reference image generation method, and a display image generation method performed by these devices.
  • An image display system that allows the user to view a target space from a free viewpoint has become widespread.
  • For example, a system has been developed in which a panoramic image is displayed on a head-mounted display, and a panoramic image corresponding to the line-of-sight direction is displayed when the user wearing the head-mounted display rotates his or her head.
  • By using a head-mounted display, it is possible to enhance the sense of immersion in video and to improve the operability of applications such as games.
  • A walk-through system has also been developed in which a user wearing a head-mounted display can virtually walk around the displayed space by physically moving.
  • Image display technology that supports free viewpoints requires high responsiveness of the display to viewpoint movement.
  • On the other hand, increasing the resolution or performing complicated calculations to improve image quality increases the image processing load, so the display may fail to keep up with the movement of the viewpoint and the sense of realism may be impaired.
  • The present invention has been made in view of these problems, and an object of the present invention is to provide a technique capable of achieving both responsiveness of image display with respect to the viewpoint and image quality.
  • an aspect of the present invention relates to a reference image generation device.
  • This reference image generation device generates data of a reference image that is used to generate a display image of a space including an object to be displayed as viewed from an arbitrary viewpoint, the reference image representing the space as viewed from a predetermined reference viewpoint.
  • The device includes a reference image data generation unit that generates the reference image and a depth image corresponding to it, specifies, for each predetermined region on the surface of the object, a reference image in which the region appears as an image using the depth image, and outputs the specification result and the data of the reference image.
  • The display image generation device includes: an object model storage unit that stores information defining an object in a display target space; a reference image data storage unit that stores data of a reference image representing an image of the space including the object as viewed from a predetermined reference viewpoint; a viewpoint information acquisition unit that acquires information related to the user's viewpoint; a projection unit that represents an image of the object on a plane of the display image as the space viewed from the user's viewpoint; a pixel value determination unit that, for each pixel in the display image, specifies a reference image in which the corresponding point on the object is represented by reading additional information of the object model stored in the object model storage unit, and determines the color of the pixel using the color of the image in the specified reference image; and an output unit that outputs the data of the display image.
  • Still another embodiment of the present invention relates to a reference image generation method.
  • This reference image generation method generates data of a reference image that is used to generate a display image of a space including an object to be displayed as viewed from an arbitrary viewpoint, the reference image representing the space as viewed from a predetermined reference viewpoint.
  • The method, performed by a reference image generation device, includes: a step of arranging the object in the space in accordance with information defining the object; a step of generating the reference image and a depth image corresponding to the reference image with a field of view corresponding to the reference viewpoint placed in the space; and a step of specifying, for each predetermined area on the surface of the object, a reference image in which the area appears as an image using the depth image, and outputting the specification result and the data of the reference image.
  • Still another aspect of the present invention relates to a display image generation method.
  • This display image generation method includes: a step of reading, from a memory, information defining an object in a display target space; a step of reading, from the memory, data of a reference image representing an image of the space including the object as viewed from a predetermined reference viewpoint; a step of acquiring information relating to the user's viewpoint; a step of representing an image of the object on a plane of the display image as the space viewed from the user's viewpoint; a step of specifying, for each pixel in the display image, a reference image in which the corresponding point on the object is represented, based on additional information of the object model included in the information defining the object, and determining the color of the pixel using the color of the image in the specified reference image; and a step of outputting the data of the display image.
  • FIG. 1 is a diagram showing an example of the external appearance of the head-mounted display of the present embodiment.
  • FIG. 2 is a configuration diagram of the image processing system of the present embodiment.
  • FIG. 3 is a diagram for explaining an example of the image world that the display image generation device of the present embodiment displays on the head-mounted display.
  • FIG. 4 is a diagram showing the internal circuit configuration of the display image generation device of the present embodiment.
  • FIG. 5 is a diagram showing the functional blocks of the display image generation device in the present embodiment.
  • FIG. 6 is a diagram showing the functional blocks of the reference image generation device in the present embodiment.
  • Another figure is a diagram for describing a mode in which the reference image used for generating the display image is switched.
  • Another figure shows the configuration of the functional blocks of the reference image data generation unit of the reference image generation device and the pixel value determination unit of the display image generation device when a compression/decompression processing function for reference image data is introduced.
  • Another figure schematically shows an example of an integrated moving image generated by the data compression unit.
  • Another figure shows a structural example of data after compression in a mode in which the compression processing of the reference image is controlled.
  • Another figure is a diagram for explaining an example of data compression processing when the reference image is an omnidirectional image.
  • Another figure shows the configuration of the functional blocks of the reference image data generation unit of the reference image generation device and the pixel value determination unit of the display image generation device when a function of storing information on the reference image to be referred to in association with a position on the object surface is introduced.
  • This embodiment basically displays an image with a field of view according to the user's viewpoint.
  • the type of device that displays an image is not particularly limited, and any of a wearable display, a flat panel display, a projector, and the like may be used.
  • a head-mounted display will be described as an example of a wearable display.
  • In the case of a head-mounted display, the user's line of sight can be estimated approximately by a built-in motion sensor.
  • With other display devices, the line of sight can be detected by having the user wear a motion sensor on the head or by using a gaze point detection device.
  • Alternatively, a marker may be attached to the user's head and the line of sight may be estimated by analyzing a captured image of the user's appearance, or any of these techniques may be combined.
  • FIG. 1 shows an example of the appearance of the head mounted display 100.
  • the head mounted display 100 includes a main body portion 110, a forehead contact portion 120, and a temporal contact portion 130.
  • The head-mounted display 100 is a display device that is worn on the user's head so that the user can view still images and moving images shown on the display and listen to sound and music output from the headphones.
  • Posture information such as the rotation angle and inclination of the head of the user wearing the head mounted display 100 can be measured by a motion sensor built in or externally attached to the head mounted display 100.
  • the head mounted display 100 is an example of a “wearable display device”.
  • The wearable display device is not limited to the head-mounted display 100 in the narrow sense, but includes eyeglasses, eyeglass-type displays, eyeglass-type cameras, headphones, headsets (headphones with a microphone), earphones, earrings, ear-mounted cameras, hats, hats with cameras, hair bands, and any other wearable display device.
  • FIG. 2 is a configuration diagram of the image processing system according to the present embodiment.
  • the head mounted display 100 is connected to the display image generating apparatus 200 via an interface 205 for connecting peripheral devices such as wireless communication or USB.
  • the display image generation apparatus 200 may be further connected to a server via a network. In that case, the server may provide the display image generating apparatus 200 with image data to be displayed on the head mounted display 100.
  • The display image generation device 200 specifies the position of the viewpoint and the direction of the line of sight based on the position and posture of the head of the user wearing the head-mounted display 100, generates a display image with a corresponding field of view, and outputs it to the head-mounted display 100.
  • The purpose of displaying the image may vary.
  • For example, the display image generation apparatus 200 may generate a virtual world that serves as the stage of an electronic game as the display image while the game progresses, or may display a moving image or the like for viewing, whether it shows a virtual world or the real world.
  • When the display device is a head-mounted display, displaying a panoramic image over a wide angle range centered on the viewpoint can produce a sense of being immersed in the displayed world.
  • FIG. 3 is a diagram for explaining an example of an image world displayed on the head mounted display 100 by the display image generating apparatus 200 in the present embodiment.
  • a state in which the user 12 is in a room that is a virtual space is created.
  • objects such as walls, floors, windows, tables, and objects on the table are arranged in the world coordinate system that defines the virtual space.
  • the display image generation apparatus 200 defines a view screen 14 in the world coordinate system in accordance with the position of the viewpoint of the user 12 and the direction of the line of sight, and draws a display image by projecting an image of the object there.
  • the position of the viewpoint of the user 12 and the direction of the line of sight (hereinafter, these may be collectively referred to as “viewpoint”) are acquired at a predetermined rate, and the position and direction of the view screen 14 can be changed accordingly.
  • an image can be displayed with a field of view corresponding to the user's viewpoint. If a stereo image having parallax is generated and displayed in front of the left and right eyes on the head mounted display 100, the virtual space can be stereoscopically viewed. Thereby, the user 12 can experience a virtual reality as if he were in a room in the display world.
  • In the figure, the display target is a virtual world based on computer graphics; however, a captured image of the real world, such as a panoramic photograph, may also be used, or such an image may be combined with the virtual world.
  • an image viewed from a specific viewpoint is acquired in advance and used to determine the pixel value of the display image for an arbitrary viewpoint. That is, the color of the object appearing as an image in the display image is determined by extracting from the corresponding portion of the image acquired in advance.
  • the viewpoint set in the prior image acquisition is referred to as “reference viewpoint”
  • the image acquired in advance viewed from the reference viewpoint is referred to as “reference image” or “reference viewpoint image”.
  • In this way, an object can be expressed with high accuracy in accordance with the viewpoint at the time of display. More specifically, when the viewpoint at the time of display coincides with one of the reference viewpoints, the pixel values of the reference image corresponding to that reference viewpoint can be adopted as they are. When the viewpoint at the time of display lies between a plurality of reference viewpoints, the pixel values of the display image are determined by combining the pixel values of the reference images corresponding to those reference viewpoints.
  • FIG. 4 shows the internal circuit configuration of the display image generating apparatus 200.
  • the display image generation apparatus 200 includes a CPU (Central Processing Unit) 222, a GPU (Graphics Processing Unit) 224, and a main memory 226. These units are connected to each other via a bus 230. An input / output interface 228 is further connected to the bus 230.
  • To the input/output interface 228 are connected a communication unit 232 including a peripheral device interface such as USB or IEEE 1394 and a wired or wireless LAN network interface, a storage unit 234 such as a hard disk drive or a nonvolatile memory, an output unit 236 that outputs data to a display device such as the head-mounted display 100, an input unit 238 that inputs data from the head-mounted display 100, and a recording medium driving unit 240 that drives a removable recording medium such as a magnetic disk, an optical disk, or a semiconductor memory.
  • the CPU 222 controls the entire display image generation apparatus 200 by executing the operating system stored in the storage unit 234.
  • the CPU 222 also executes various programs read from the removable recording medium and loaded into the main memory 226 or downloaded via the communication unit 232.
  • the GPU 224 has a function of a geometry engine and a function of a rendering processor, performs a drawing process according to a drawing command from the CPU 222, and stores a display image in a frame buffer (not shown). Then, the display image stored in the frame buffer is converted into a video signal and output to the output unit 236.
  • the main memory 226 includes a RAM (Random Access Memory) and stores programs and data necessary for processing.
  • FIG. 5 shows a functional block configuration of the display image generation apparatus 200 in the present embodiment.
  • The display image generation apparatus 200 may also perform general information processing such as advancing an electronic game or communicating with a server, but here the figure focuses on the function of generating display image data according to the viewpoint.
  • at least a part of the functions of the display image generation apparatus 200 shown in FIG. 5 may be mounted on the head mounted display 100.
  • at least a part of the functions of the display image generation device 200 may be implemented in a server connected to the display image generation device 200 via a network.
  • The functional blocks shown in FIG. 5 and in FIG. 6 described later can be realized in hardware by the CPU, GPU, and various memories shown in FIG. 4, and in software by a program, loaded into memory from a recording medium or the like, that provides functions such as data input, data holding, image processing, and communication. Therefore, it will be understood by those skilled in the art that these functional blocks can be realized in various forms by hardware only, software only, or a combination of these, and are not limited to any one of them.
  • the display image generation apparatus 200 includes a viewpoint information acquisition unit 260 that acquires information related to a user's viewpoint, a space construction unit 262 that constructs a space composed of objects to be displayed, a projection unit 264 that projects an object on a view screen, A pixel value determining unit 266 that determines values of pixels constituting the image and completes a display image, and an output unit 268 that outputs display image data to the head mounted display 100 are provided.
  • the display image generation apparatus 200 further includes an object model storage unit 254 that stores data related to an object model necessary for constructing a space, and a reference image data storage unit 256 that stores data related to a reference image.
  • the viewpoint information acquisition unit 260 includes the input unit 238 and the CPU 222 shown in FIG. 4 and acquires the position of the user's viewpoint and the direction of the line of sight at a predetermined rate. For example, the output value of the acceleration sensor built in the head mounted display 100 is sequentially acquired, and thereby the posture of the head is acquired. Further, a light emitting marker (not shown) is provided outside the head mounted display 100, and the captured image is acquired from an imaging device (not shown), thereby acquiring the position of the head in real space.
  • an imaging device (not shown) that captures an image corresponding to the user's visual field may be provided on the head-mounted display 100 side, and the position and posture of the head may be acquired by a technique such as SLAM (Simultaneous Localization and Mapping). If the position and orientation of the head can be acquired in this way, the position of the user's viewpoint and the direction of the line of sight can be specified approximately.
  • the space construction unit 262 is configured by the CPU 222, the GPU 224, the main memory 226, and the like in FIG. 4, and constructs a shape model of the space where the object to be displayed exists.
  • objects such as walls, floors, windows, tables, and objects on the table representing the room are arranged in the world coordinate system that defines the virtual space.
  • Information related to the shape of each object is read from the object model storage unit 254.
  • the space construction unit 262 may determine the shape, position, and orientation of the object, and can use a modeling technique based on a surface model in general computer graphics.
  • the object model storage unit 254 also stores data defining the movement and deformation of the object. For example, time-series data representing the position and shape of the object at predetermined time intervals is stored. Alternatively, a program for generating such a change is stored.
  • the space construction unit 262 reads the data and changes the object arranged in the virtual space.
  • the projection unit 264 includes the GPU 224 and the main memory 226 shown in FIG. 4 and sets the view screen according to the viewpoint information acquired by the viewpoint information acquisition unit 260. That is, by setting the screen coordinates corresponding to the position of the head and the direction of the face, the display target space is drawn on the screen plane with a field of view corresponding to the position and direction of the user.
  • the projection unit 264 further projects the object in the space constructed by the space construction unit 262 onto the view screen at a predetermined rate.
  • This processing can also use a general computer graphics technique for perspective-transforming a mesh such as a polygon.
  • the pixel value determining unit 266 includes the GPU 224, the main memory 226, and the like shown in FIG. 4, and determines the values of the pixels constituting the object image projected onto the view screen.
  • Specifically, the pixel value determination unit 266 reads the reference image data from the reference image data storage unit 256, extracts pixel values representing points on the same object, and uses them to determine the pixel values of the display image.
  • In this way, a high-definition image representation close to that obtained by ray tracing can be realized at run time with the light-load calculation of reading out the corresponding pixel values and performing weighted averaging.
  • the pixel value determination unit 266 refers to the frame of the reference image at the corresponding time for the moving image of the object projected by the projection unit 264. That is, the pixel value determination unit 266 refers to the moving image of the reference image after synchronizing with the movement of the object in the virtual space generated by the space construction unit 262.
  • the reference image is not limited to a graphics image drawn by ray tracing, and may be an image obtained by photographing a real space from a reference viewpoint in advance.
  • the space construction unit 262 constructs a shape model of the real space to be imaged, and the projection unit 264 projects the shape model onto the view screen corresponding to the viewpoint at the time of display.
  • the processing of the space construction unit 262 and the projection unit 264 can be omitted if the position of the image of the object to be imaged can be determined with a visual field corresponding to the viewpoint at the time of display.
  • The output unit 268 includes the CPU 222, the main memory 226, the output unit 236, and the like shown in FIG. 4, and transmits, at a predetermined rate, the display image data completed by the pixel value determination unit 266 determining the pixel values to the head-mounted display 100.
  • the output unit 268 When a stereo image is generated for stereoscopic viewing, the output unit 268 generates and outputs an image obtained by connecting the left and right as a display image.
  • the output unit 268 may perform correction on the display image in consideration of distortion caused by the lens.
  • FIG. 6 shows functional blocks of a device that generates reference image data.
  • The reference image generation device 300 may be a part of the display image generation device 200 of FIG. 5, or may be provided independently as a device that generates data used for display. Further, the generated reference image data, the object model used for the generation, and the data defining the movement may be stored in a recording medium or the like as electronic content so that they can be loaded into the main memory of the display image generation device 200 at run time.
  • the internal circuit configuration of the reference image generation device 300 may be the same as the internal circuit configuration of the display image generation device 200 shown in FIG.
  • The reference image generation device 300 includes a reference viewpoint setting unit 310 that sets reference viewpoints, a space construction unit 316 that constructs a space composed of objects to be displayed, a reference image data generation unit 318 that generates data of a reference image for each reference viewpoint based on the constructed space, an object model storage unit 314 that stores data relating to the object model necessary for constructing the space, and a reference image data storage unit 256 that stores the data of the generated reference images.
  • the reference viewpoint setting unit 310 includes an input unit 238, a CPU 222, a main memory 226, and the like, and sets the position coordinates of the reference viewpoint in the display target space.
  • a plurality of reference viewpoints are distributed so as to cover the range of viewpoints that the user can take.
  • the appropriate values of the range and the number of reference viewpoints vary depending on the configuration of the display target space, the purpose of display, the accuracy required for display, the processing performance of the display image generation device 200, and the like. Therefore, the reference viewpoint setting unit 310 may accept a setting input of the position coordinates of the reference viewpoint from the creator of the display content. Alternatively, the reference viewpoint setting unit 310 may change the position of the reference viewpoint according to the movement of the object, as will be described later.
  • the space construction unit 316 includes a CPU 222, a GPU 224, a main memory 226, and the like, and constructs a shape model of a space in which an object to be displayed exists. This function corresponds to the function of the space construction unit 262 shown in FIG.
  • the reference image generating apparatus 300 in FIG. 6 uses a modeling method based on a solid model in consideration of the color and material of an object in order to accurately draw an image of the object by ray tracing or the like. Therefore, the object model storage unit 314 stores object model data including information such as color and material.
  • the space construction unit 316 moves or deforms the object in the virtual space.
  • the lighting state may be changed or the color of the object may be changed.
  • Information defining such a change may be read out from information stored in the object model storage unit 314 or may be set by direct input by the creator of the display content. In the latter case, the space construction unit 316 changes the object in accordance with the input information, and stores information defining the change in the object model storage unit 314 so that the same change occurs during display.
  • the reference image data generation unit 318 includes a CPU 222, a GPU 224, a main memory 226, and the like, and draws a display target object visible from the reference viewpoint for each reference viewpoint set by the reference viewpoint setting unit 310 at a predetermined rate.
  • If the reference image is prepared as an omnidirectional panoramic image, the viewpoint at the time of display can be changed freely in all directions. Further, it is desirable to accurately represent the appearance from each reference viewpoint in the reference image by taking time to calculate the propagation of light rays.
  • the reference image data generation unit 318 also generates a depth image corresponding to each reference image. That is, the distance (depth value) from the screen surface of the object represented by each pixel of the reference image is obtained, and a depth image representing this as a pixel value is generated.
  • When the reference image is an omnidirectional panoramic image, the view screen is a spherical surface, and the depth value is the distance to the object in the normal direction of that spherical surface.
  • the generated depth image is used for selecting a reference image to be referred to when determining the pixel value of the display image.
  • the reference image data generation unit 318 may generate other information used when selecting the reference image for reference at the time of display instead of the depth image. Specifically, for the position on the object surface, a reference image to be referred to when drawing the position is obtained in advance. In this case, the reference image data generation unit 318 stores the information in the object model storage unit 314 as additional information of the object model.
  • the object model storage unit 254 in FIG. 5 may store at least data used for generating a display image among the data stored in the object model storage unit 314 in FIG.
  • the reference image data generation unit 318 stores the generated data in the reference image data storage unit 256 in association with the position coordinates of the reference viewpoint.
  • The reference image data storage unit 256 basically stores a pair of a reference image and a depth image for each reference viewpoint. However, in a mode in which the depth image is not used during display as described above, only the reference image is stored for each reference viewpoint.
  • a pair of a reference image and a depth image may also be referred to as “reference image data”.
  • the reference image data generation unit 318 reduces the data size and the processing load when generating the display image by using a data structure in which the image is updated only for a moving region in the generated moving image.
  • Alternatively, an integrated moving image may be generated in which a reference image frame and a depth image frame at the same time step are represented in one frame, and the data size may be reduced by compression-encoding this as a unit, which also reduces the load of decoding/decompression processing and synchronization processing at the time of display. Details will be described later.
  • FIG. 7 shows an example of setting the reference viewpoint.
  • a plurality of reference viewpoints are set on the horizontal plane 20a at the height of the eyes when the user 12 stands and the horizontal plane 20b at the height of the eyes when sitting, as indicated by black circles.
  • For example, the horizontal plane 20a is set at 1.4 m from the floor and the horizontal plane 20b at 1.0 m from the floor.
  • The reference viewpoints are distributed within the corresponding rectangular area.
  • the reference viewpoint is arranged at every other intersection of the grids that divide the rectangular area into four equal parts in the X-axis direction and the Y-axis direction, respectively. Further, the upper and lower horizontal surfaces 20a and 20b are arranged so as to be shifted so that the reference viewpoints do not overlap. As a result, in the example shown in FIG. 7, a total of 25 reference viewpoints are set, 13 points on the upper horizontal plane 20a and 12 points on the lower horizontal plane 20b.
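  • As a concrete illustration of this arrangement, the following sketch generates reference viewpoints on the two horizontal planes: every other intersection of the grid that divides a rectangular area into four equal parts in the X-axis and Y-axis directions, with the two planes offset so that the viewpoints do not overlap. The extent of the rectangular area and the use of Python/NumPy are assumptions for illustration only.
```python
import numpy as np

def reference_viewpoints(x_range=(-2.0, 2.0), y_range=(-2.0, 2.0),
                         heights=(1.4, 1.0)):
    """Place reference viewpoints on two horizontal planes as in FIG. 7.

    Each plane's rectangle is divided into four equal parts along X and Y,
    giving a 5x5 grid of intersections; every other intersection is used,
    and the two planes use complementary intersections so the viewpoints
    do not overlap (13 + 12 = 25 points).  The rectangle extents are
    illustrative assumptions.
    """
    xs = np.linspace(*x_range, 5)
    ys = np.linspace(*y_range, 5)
    points = []
    for plane, z in enumerate(heights):
        parity = plane % 2          # shift the checkerboard on the second plane
        for i, x in enumerate(xs):
            for j, y in enumerate(ys):
                if (i + j) % 2 == parity:
                    points.append((x, y, z))
    return np.array(points)

print(len(reference_viewpoints()))  # -> 25
```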
  • the distribution of the reference viewpoint is not limited to this, and may be distributed on a plurality of planes including a vertical plane, or may be distributed on a curved surface such as a spherical surface.
  • The distribution of reference viewpoints need not be uniform; the reference viewpoints may be distributed at a higher density in a range where the user is more likely to be present.
  • the reference viewpoint may be arranged so as to correspond to the object to be displayed, and the reference viewpoint may be moved according to the movement of the object.
  • the reference image is moving image data that reflects the movement of each reference viewpoint.
  • In this case, an image may first be generated for each object at the time of display, and the display image may then be generated by combining these images.
  • the positional relationship between the object and the reference viewpoint can be controlled independently.
  • important objects or objects that are likely to be seen in close proximity can be expressed in more detail, or even when different movements are performed for each object, the details of all objects can be expressed uniformly.
  • an increase in data size can be suppressed by representing the reference image as a still image from a fixed reference viewpoint.
  • FIG. 8 is a diagram for explaining a method in which the pixel value determining unit 266 of the display image generating apparatus 200 selects a reference image used for determining the pixel value of the display image.
  • the figure shows a state in which a display target space including the object 24 is viewed from above. In this space, it is assumed that five reference viewpoints 28a to 28e are set, and reference image data is generated for each of them.
  • circles centered on the reference viewpoints 28a to 28e schematically show the screen surface of the reference image prepared as a panoramic image of the whole celestial sphere.
  • the projection unit 264 determines the view screen so as to correspond to the virtual camera 30 and projects the model shape of the object 24. As a result, the correspondence between the pixel in the display image and the position on the surface of the object 24 is determined. For example, when determining the value of a pixel representing the image of the point 26 on the surface of the object 24, the pixel value determining unit 266 first specifies a reference image in which the point 26 appears as an image.
  • Since the position coordinates of the reference viewpoints 28a to 28e and the point 26 in the world coordinate system are known, their distances can easily be obtained.
  • the distance is indicated by the length of a line segment connecting the reference viewpoints 28a to 28e and the point 26. If the point 26 is projected onto the screen surface of each reference viewpoint, the position of the pixel where the image of the point 26 should appear in each reference image can be specified. On the other hand, depending on the position of the reference viewpoint, the point 26 may be behind the object or concealed by an object in front, and the image may not appear at the position of the reference image.
  • the pixel value determining unit 266 confirms the depth image corresponding to each reference image.
  • the pixel value of the depth image represents the distance from the screen surface of the object that appears as an image in the corresponding reference image. Therefore, by comparing the distance from the reference viewpoint to the point 26 and the depth value of the pixel in which the image of the point 26 in the depth image should appear, it is determined whether or not the image is the image of the point 26.
  • In the example of FIG. 8, the distances Dd and De to the object at the corresponding pixels, obtained from the depth images of the reference viewpoints 28d and 28e, differ from the distances from those reference viewpoints to the point 26 by a threshold value or more, so those reference images are excluded.
  • Conversely, the distances Da and Db to the object at the corresponding pixels, obtained from the depth images of the reference viewpoints 28a and 28b, are determined by the threshold test to be substantially the same as the distances from the reference viewpoints 28a and 28b to the point 26, so those reference images can be identified as representing the image of the point 26.
  • the pixel value determination unit 266 selects the reference image used for calculating the pixel value for each pixel of the display image by performing the screening using the depth value in this way.
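  • A minimal sketch of this screening by depth value is shown below; it assumes hypothetical helper functions project_to_reference() and sample_depth() that map a world-space point to the pixel where it should appear in a given reference image and read the stored depth value there. These helpers and the threshold value are illustrative and not part of the embodiment's stated interface.
```python
import numpy as np

def select_reference_views(point, ref_viewpoints, depth_images,
                           project_to_reference, sample_depth,
                           threshold=0.05):
    """Return indices of reference images in which `point` is actually visible.

    For each reference viewpoint, the distance to the point is compared with
    the depth value stored at the pixel where the point would project; if the
    two differ by less than `threshold`, the point is not occluded in that
    reference image and the view is kept.
    """
    selected = []
    for i, viewpoint in enumerate(ref_viewpoints):
        distance = np.linalg.norm(np.asarray(point) - np.asarray(viewpoint))
        u, v = project_to_reference(point, i)        # pixel where the point should appear
        depth = sample_depth(depth_images[i], u, v)  # stored distance to the nearest surface
        if abs(distance - depth) < threshold:
            selected.append(i)
    return selected
```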
  • FIG. 8 illustrates five reference viewpoints, but in practice the comparison using depth values is performed for all of the reference viewpoints distributed as shown in FIG. 7. This allows a highly accurate display image to be drawn.
  • On the other hand, referring to roughly 25 depth images and reference images for every pixel of the display image may impose a load that cannot be overlooked, depending on the processing performance of the apparatus. Therefore, prior to selecting the reference images used for determining the pixel value as described above, the reference images that become selection candidates may be narrowed down according to a predetermined criterion. For example, reference viewpoints existing within a predetermined range from the virtual camera 30 are extracted, and the selection processing using depth values is performed only on the reference images from those viewpoints.
  • In doing so, an upper limit such as 10 or 20 may be set on the number of reference viewpoints to be extracted, and the extraction range may be adjusted so as not to exceed that limit, or the reference viewpoints may be selected at random or according to a predetermined rule.
  • the number of reference viewpoints to be extracted may be varied depending on the area on the display image. For example, when virtual reality is realized using a head-mounted display, since the center area of the display image coincides with the direction in which the user's line of sight is directed, drawing with higher accuracy than the peripheral area is desirable.
  • a certain number of reference viewpoints are selected as candidates for pixels that are within a predetermined range from the center of the display image, while the number of selection candidates is reduced for pixels that are outside of that range.
  • the number of regions is not limited to two, and may be three or more regions. In addition to the division depending on the distance from the center of the display image, it is conceivable to dynamically divide according to the image area of the object of interest.
  • FIG. 9 is a diagram for explaining a method in which the pixel value determining unit 266 determines the pixel value of the display image. As shown in FIG. 8, it is assumed that the image of the point 26 of the object 24 is represented in the reference images of the reference viewpoints 28a and 28b.
  • the pixel value determination unit 266 basically determines the pixel value of the image of the point 26 in the display image corresponding to the actual viewpoint by blending the pixel values of the image of the point 26 in those reference images.
  • In the illustrated example, the pixel value C in the display image is calculated as C = w1·c1 + w2·c2, where c1 and c2 are the pixel values of the image of the point 26 in the reference images of the reference viewpoints 28a and 28b, and w1 and w2 are weighting coefficients satisfying w1 + w2 = 1.
  • Generalizing, when N is the number of reference images to be used, i (1 ≤ i ≤ N) is the identification number of the reference viewpoint, Δi is the distance from the virtual camera 30 to the i-th reference viewpoint, and ci and wi are the corresponding pixel value and weighting coefficient, C = Σi wi·ci, with a larger weighting coefficient given to a reference viewpoint closer to the virtual camera 30.
  • When the virtual camera 30 coincides with one of the reference viewpoints (Δi = 0), the weighting coefficient for the pixel value of the corresponding reference image is set to 1, and the weighting coefficients for the pixel values of the other reference images are set to 0.
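  • The following sketch blends the reference pixel values using one concrete choice of weighting, namely coefficients inversely proportional to the square of the distance Δi and normalized to sum to 1; as noted below, the embodiment does not fix the formula, so this is only an illustrative assumption. The special case in which the virtual camera coincides with a reference viewpoint is handled by giving that reference image a weight of 1.
```python
import numpy as np

def blend_pixel(ref_colors, distances, eps=1e-6):
    """Blend reference-image pixel values c_i into a display pixel value C.

    distances[i] is the distance from the virtual camera to the i-th
    reference viewpoint.  The inverse-square weighting is an illustrative
    assumption; the text notes the formula is not limited to this.
    """
    ref_colors = np.asarray(ref_colors, dtype=float)   # shape (N, 3) for RGB
    distances = np.asarray(distances, dtype=float)     # shape (N,)
    if np.any(distances < eps):                        # camera sits on a reference viewpoint
        weights = (distances < eps).astype(float)
    else:
        weights = 1.0 / distances ** 2
    weights /= weights.sum()
    return weights @ ref_colors                        # C = sum_i w_i * c_i
```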
  • However, the calculation formula is not limited to this.
  • The parameter used for calculating the weighting coefficient is also not limited to the distance from the virtual camera to the reference viewpoint.
  • For example, the angles θa and θb (0 ≤ θa, θb ≤ 90°) formed between the line-of-sight vector Vr from the virtual camera 30 to the point 26 and the line-of-sight vectors Va and Vb from the respective reference viewpoints to the point 26 may be used.
  • For example, the weighting coefficient may be calculated so that it is larger the smaller this angle is.
  • The specific calculation formula is not particularly limited.
  • The weighting coefficient may also be determined by evaluating the "closeness of state" from both the distance and the angle.
  • the surface shape of the object 24 at the point 26 may be taken into consideration.
  • The brightness of light reflected from an object generally has an angle dependence based on the inclination (normal) of the surface. Therefore, the angle formed between the normal vector at the point 26 and the line-of-sight vector Vr from the virtual camera 30 may be compared with the angle formed between that normal vector and the line-of-sight vector Va or Vb from each reference viewpoint, and a larger weighting coefficient may be given the smaller the difference between these angles is.
  • the function itself for calculating the weighting coefficient may be switched depending on attributes such as the material and color of the object 24.
  • For example, in the case of a material in which the specular reflection component is dominant, the reflected light has strong directivity, and the observed color varies greatly depending on the angle of the line-of-sight vector.
  • In the case of a material in which the diffuse reflection component is dominant, the color does not change so much with the angle of the line-of-sight vector. Therefore, in the former case, a function may be used that increases the weighting coefficient for reference viewpoints whose line-of-sight vectors are closer to the line-of-sight vector Vr from the virtual camera 30 to the point 26; in the latter case, the weighting coefficients for all the reference viewpoints may be made equal, or a function whose angle dependence is smaller than in the specular-dominant case may be used.
  • For the same reason, in the case of a material in which the diffuse reflection component is dominant, the reference images used to determine the pixel value C of the display image may be thinned out, or only reference images having line-of-sight vectors whose angles are within a predetermined value of the actual line-of-sight vector Vr may be used, so that the number of reference images itself is reduced and the calculation load is suppressed.
  • In order to switch the weighting rule in this way, the reference image data storage unit 256 stores, in association with the reference image, data representing attributes such as the material of the object represented by each image in the reference image.
  • According to these modes, the surface shape and material of the object can be taken into account, and the directivity of light due to specular reflection and the like can be reflected in the display image more accurately.
  • Note that the weighting coefficient may be determined by combining two or more of: a calculation based on the shape of the object, a calculation based on its attributes, a calculation based on the distance from the virtual camera to the reference viewpoint, and a calculation based on the angles formed by the respective line-of-sight vectors.
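  • The following sketch is one illustrative way of combining such factors into a single weighting function: it decreases with the distance from the virtual camera to the reference viewpoint, with the angle between the line-of-sight vectors Vr and Va, and with the mismatch of the angles that those vectors form with the surface normal, and it sharpens the angle dependence for a specular-dominant material. The specific functional forms and constants are assumptions, since the embodiment leaves them unspecified.
```python
import numpy as np

def angle_between(u, v):
    """Angle in radians between two vectors."""
    cosv = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    return np.arccos(np.clip(cosv, -1.0, 1.0))

def combined_weight(cam_pos, ref_pos, point, normal, specular_dominant):
    """Illustrative combination of distance, angle, normal, and material factors."""
    cam_pos, ref_pos, point, normal = map(np.asarray, (cam_pos, ref_pos, point, normal))
    vr = point - cam_pos        # line-of-sight vector from the virtual camera
    va = point - ref_pos        # line-of-sight vector from the reference viewpoint
    distance = np.linalg.norm(ref_pos - cam_pos)
    view_angle = angle_between(vr, va)                                   # closeness of viewing direction
    normal_mismatch = abs(angle_between(normal, vr) - angle_between(normal, va))
    sharpness = 8.0 if specular_dominant else 1.0                        # stronger directivity for specular materials
    return np.exp(-sharpness * (view_angle + normal_mismatch)) / (1.0 + distance ** 2)
```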
  • FIG. 10 is a flowchart illustrating a processing procedure in which the display image generation apparatus 200 generates a display image corresponding to the viewpoint. This flowchart is started when an application or the like is started by a user operation, an initial image is displayed, and a viewpoint movement is accepted. As described above, various information processing such as an electronic game may be performed in parallel with the display processing illustrated.
  • the space construction unit 262 forms an initial state of a three-dimensional space in which a display target object exists in the world coordinate system (S10).
  • the viewpoint information acquisition unit 260 identifies the position of the viewpoint and the direction of the line of sight at that time based on the position and orientation of the user's head (S12).
  • the projection unit 264 sets a view screen for the viewpoint, and projects an object existing in the display target space (S14).
  • the pixel value determining unit 266 sets one target pixel among the pixels inside the mesh thus projected (S16), and selects a reference image used for determining the pixel value (S18).
  • Specifically, the reference images in which the point on the object represented by the target pixel appears as an image are determined based on the depth image of each reference image. The pixel value determination unit 266 then determines weighting coefficients based on the positional relationship between the reference viewpoints of those reference images and the virtual camera corresponding to the actual viewpoint, the shape of the object, its material, and the like, and determines the value of the target pixel by weighted-averaging the corresponding pixel values of the reference images (S20).
  • It should be understood by those skilled in the art that the calculation for deriving the pixel value of the target pixel from the pixel values of the reference images can take various forms, such as statistical processing and interpolation processing, in addition to the weighted average.
  • The processes of S16 to S20 are repeated for all pixels on the view screen (N in S22, then S16).
  • When the values of all pixels have been determined, the output unit 268 outputs the data to the head-mounted display 100 as display image data (S24).
  • When display images for the left eye and the right eye are generated, the processes of S16 to S22 are performed for each of them, and the images are connected and output as appropriate.
  • The space construction unit 262 then forms the display target space for the next time step (S10); that is, the object is moved or deformed by one time step from the initial state.
  • the pixel values are determined using the reference image for all the pixels on the view screen, but the drawing method may be switched depending on the region on the display image and the position of the viewpoint. For example, for an image of an object that does not require changes in light or color due to viewpoint movement, only conventional texture mapping may be performed. In addition, a state observed only from a local viewpoint, such as reflected light with high directivity, may not be expressed from the surrounding reference image. Therefore, the amount of data prepared as a reference image can be suppressed by switching to rendering by ray tracing only when the viewpoint enters the corresponding range.
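  • The per-frame flow of FIG. 10 can be summarized by the following sketch; the five arguments stand in for the space construction unit, viewpoint information acquisition unit, projection unit, pixel value determination unit, and output unit, and their method names are hypothetical.
```python
def display_loop(space_builder, viewpoint_source, projector, pixel_shader, output):
    """Per-frame procedure following the flow of FIG. 10 (S10-S24); interfaces are illustrative."""
    space = space_builder.initial_state()                      # S10: construct the display target space
    while not output.should_stop():
        viewpoint = viewpoint_source.current()                 # S12: position of viewpoint, gaze direction
        screen = projector.set_view_screen(viewpoint)
        fragments = projector.project(space, screen)           # S14: project object meshes onto the screen
        image = {}
        for pixel, surface_point in fragments.items():         # S16: one target pixel at a time
            refs = pixel_shader.select_references(surface_point)              # S18: depth-based selection
            image[pixel] = pixel_shader.blend(refs, surface_point, viewpoint) # S20: weighted average
        output.send(image)                                     # S24: output the completed display image
        space = space_builder.advance(space)                   # next time step (back to S10)
```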
  • FIG. 11 shows an example of the structure of data stored in the reference image data storage unit 256.
  • the reference image data 270 has a data structure in which the reference viewpoint position coordinates 274, the reference image 276, and the depth image 278 are associated with each reference image identification information 272.
  • the reference viewpoint position coordinates 274 are three-dimensional position coordinates in the virtual space set by the reference viewpoint setting unit 310 in consideration of the movable range of the user 12.
  • the reference image 276 is moving image data representing a space including a moving object when viewed from each reference viewpoint.
  • the depth image 278 also becomes moving image data representing the distance from the screen surface of the space including the moving object.
  • In the figure, the reference images are represented by labels such as “moving image A”, “moving image B”, and “moving image C”, and the depth images by “depth moving image A”, “depth moving image B”, and “depth moving image C”, but in practice the data may include information such as the storage area in the reference image data storage unit 256.
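  • A minimal sketch of one record of the reference image data 270 in FIG. 11; the field and type names are illustrative, and the video fields could equally hold storage-area information as noted above.
```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ReferenceImageData:
    """One entry of the reference image data 270 in FIG. 11 (illustrative names)."""
    ref_id: int                            # reference image identification information 272
    viewpoint: Tuple[float, float, float]  # reference viewpoint position coordinates 274
    reference_video: str                   # reference image 276, e.g. "moving image A" or a storage location
    depth_video: str                       # depth image 278, e.g. "depth moving image A" or a storage location

table = [
    ReferenceImageData(0, (0.0, 1.4, 0.0), "moving image A", "depth moving image A"),
    ReferenceImageData(1, (1.0, 1.4, 0.0), "moving image B", "depth moving image B"),
]
```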
  • FIG. 12 shows an example of setting a reference viewpoint for representing a moving object.
  • the representation of the figure is the same as in FIG.
  • An object 34 and a moving object 35 exist in the virtual space shown in (a) and (b) of the figure.
  • the reference viewpoint setting unit 310 of the reference image generating apparatus 300 sets five reference viewpoints 30a, 30b, 30c, 30d, and 30e.
  • (a) shows a mode in which the reference viewpoint is not moved.
  • the change in each reference image is mainly limited to the image area of the object 35. That is, in each frame of the moving image of the reference image and the moving image of the depth image, no change occurs in a wide range of areas, and therefore, for example, the data size can be reduced by applying a compression method using inter-frame difference.
  • (b) shows a mode in which at least some of the reference viewpoints 30a to 30e are moved so as to follow the movement of the object 35, giving the reference viewpoints 36a to 36e.
  • In the illustrated example, the four reference viewpoints 30a to 30d are moved to the reference viewpoints 36a to 36d with the same velocity vector as that of the object 35.
  • The movement rule is not limited to this; it is sufficient to move the reference viewpoints so that, for example, the distance to the object does not exceed a predetermined threshold and the distance between reference viewpoints does not fall below a predetermined threshold.
  • the reference viewpoint setting rule is appropriately selected in consideration of the level of detail required for the display image, the range of movement of the object, a suitable data size, and the like when expressing the object.
  • For example, reference viewpoints in charge of an object are distributed within a predetermined range, and the positions of those reference viewpoints are controlled so that their positional relationship with the object is maintained.
  • Here, being "in charge" refers only to tracking the position; the reference image may represent all objects visible from the reference viewpoint.
  • Alternatively, only the image of the object in charge may be represented in the reference image, and such images may be combined when determining the pixel values of the display image.
  • In that case, for example, the display image is overwritten using the reference image representing only the object in the foreground.
  • a certain reference viewpoint may be moved by an average vector of moving speed vectors of a plurality of objects.
  • In a mode in which the reference viewpoints are moved in this way, among the reference image data shown in FIG. 11, the position coordinates of the reference viewpoint change along the time axis.
  • the reference image generating apparatus 300 stores the reference image data and the reference viewpoint position coordinates in the reference image data storage unit 256 in association with each time step.
  • The pixel value determination unit 266 of the display image generation apparatus 200 calculates the weighting coefficients based on the positional relationship between the reference viewpoints and the user's viewpoint at the same time step, and then determines the pixel values of the display image for that time step.
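  • A sketch of the viewpoint-tracking mode described above: the reference viewpoints in charge of a moving object are shifted by its velocity vector and then pulled back if they drift farther from the object than a threshold. The threshold value and the correction rule are illustrative assumptions; the per-time-step positions produced here would be stored together with the reference image frames, as stated above.
```python
import numpy as np

def update_reference_viewpoints(viewpoints, follow_mask, object_velocity, dt,
                                object_pos, max_dist=3.0):
    """Move the reference viewpoints in charge of a moving object by its velocity vector.

    viewpoints: (M, 3) array of reference viewpoint positions.
    follow_mask: boolean array marking the viewpoints in charge of the object.
    max_dist is an illustrative threshold for the distance to the object.
    """
    viewpoints = np.asarray(viewpoints, dtype=float).copy()
    object_pos = np.asarray(object_pos, dtype=float)
    viewpoints[follow_mask] += np.asarray(object_velocity, dtype=float) * dt
    for i in np.flatnonzero(follow_mask):
        offset = viewpoints[i] - object_pos
        dist = np.linalg.norm(offset)
        if dist > max_dist:                       # pull the viewpoint back toward the object
            viewpoints[i] = object_pos + offset * (max_dist / dist)
    return viewpoints
```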
  • FIG. 13 is a diagram for describing a mode in which the reference image used for generating the display image is switched according to the movement of the object.
  • the way of representing the figure is the same as in FIG. That is, the objects 34 and 35 exist in the virtual space, and the latter moves as indicated by arrows.
  • the reference image generation apparatus 300 sets fixed reference viewpoints 38a to 38f so as to cover the movement range of the object, and generates respective reference images.
  • The display image generation device 200 switches the reference images used for display according to the movement of the object. For example, at the initial position of the object 35, the reference images indicated by solid lines (the reference images of the reference viewpoints 38a, 38b, 38c, and 38f) are used to generate the display image.
  • When the object 35 moves as indicated by the arrow, the reference images of the reference viewpoints 38d and 38e indicated by broken lines are used instead of the reference images of the reference viewpoints 38b and 38f indicated by thick solid lines.
  • That is, reference images corresponding to reference viewpoints whose distance from the object 34 or 35 is smaller than a threshold value are used to generate the display image.
  • In this way, the object can be expressed with a stable level of detail, in the same manner as when the reference viewpoints are moved.
  • Moreover, since the viewpoint of each reference image does not move, the region that changes between frames is limited and the compression efficiency increases.
  • On the other hand, the number of reference image moving images tends to increase.
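  • A sketch of the switching criterion with fixed reference viewpoints: at each time step, only reference images whose viewpoints lie within a threshold distance of some object are used. The threshold value is an illustrative assumption.
```python
import numpy as np

def active_reference_views(ref_viewpoints, object_positions, threshold=3.0):
    """Indices of fixed reference viewpoints used at the current time step.

    A reference viewpoint is used when its distance to at least one object is
    smaller than the threshold, matching the switching criterion above; the
    threshold value itself is an illustrative assumption.
    """
    ref = np.asarray(ref_viewpoints, dtype=float)       # shape (M, 3)
    obj = np.asarray(object_positions, dtype=float)     # shape (K, 3)
    dists = np.linalg.norm(ref[:, None, :] - obj[None, :, :], axis=-1)  # (M, K) pairwise distances
    return np.flatnonzero((dists < threshold).any(axis=1))
```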
  • As described above, the reference image in the present embodiment is basically moving image data. Therefore, the data can be stored in the reference image data storage unit 256 or transmitted after being compression-encoded with a general moving image compression encoding method such as MPEG (Moving Picture Experts Group).
  • Alternatively, when an omnidirectional image is represented by equirectangular projection, it may be converted into coefficients of spherical harmonics and compressed. Further, each frame may be compressed using a general still image compression encoding method such as JPEG (Joint Photographic Experts Group).
  • FIG. 14 shows the configuration of the functional blocks of the reference image data generation unit of the reference image generation device 300 and the pixel value determination unit of the display image generation device 200 when a compression/decompression processing function for the reference image data is introduced.
  • the reference image data generation unit 318a includes a reference image generation unit 330, a depth image generation unit 332, and a data compression unit 334.
  • the reference image generation unit 330 and the depth image generation unit 332 generate the reference image and depth image data as described above. That is, a moving image of the reference image representing the state of the space from each reference viewpoint set by the reference viewpoint setting unit 310 and a moving image of the depth image representing the distance value are generated.
  • the reference viewpoint may be fixed, or a part thereof may be moved according to the movement of the object.
  • The data compression unit 334 compresses the reference images and depth images thus generated at a predetermined rate with respect to the time axis according to a predetermined rule. Specifically, at least one of the following processes is performed: (1) the reference image and the depth image at the same time step are reduced as necessary and connected into an integrated moving image in which they are expressed as a single frame; (2) only regions with changes in the reference image and the depth image are represented as time-series data.
  • the data compression unit 334 stores the compressed data in the reference image data storage unit 256. At this time, one frame of the integrated image or the image of the area with change may be further compressed by JPEG. Alternatively, the moving image of the integrated image may be compressed by MPEG.
  • the pixel value determination unit 266a includes a data decompression unit 336, a reference unit 338, and a calculation unit 340.
  • the data decompression unit 336 restores the reference image and the depth image by reading the reference image data at each time step from the reference image data storage unit 256 and decompressing the data.
  • When the compression of (1) has been performed, the data decompression unit 336 cuts out the reference image and the depth image from each frame of the integrated moving image and enlarges them as necessary.
  • When the compression of (2) has been performed, only the changed regions of the image of the previous frame are updated using the time-series data.
  • The reference unit 338 uses the depth images of each time step restored in this way to select, for each pixel of the display image, the reference images in which the point on the object to be drawn is represented, and acquires the pixel values from those reference images. The calculation unit 340 then determines the pixel value of the display image by averaging the acquired pixel values with appropriate weights, as described above.
  • FIG. 15 schematically illustrates an example of an integrated moving image generated by the data compression unit 334.
  • The integrated moving image 42 has a data structure in which one frame 40 is divided into four regions, each of which represents a frame of the same time step of the “first reference image” and “second reference image” generated for two reference viewpoints and of the corresponding “first depth image” and “second depth image”.
  • the data compression unit 334 appropriately reduces the frames of the reference image and the depth image according to the size of the image plane set in the integrated moving image 42, and connects them in a predetermined arrangement as illustrated.
  • the data compression unit 334 reduces the frame of each reference image and depth image to 1/2 in both the vertical and horizontal directions.
  • The data compression unit 334 further associates the position coordinates of the two reference viewpoints whose data have been integrated with the integrated moving image as its additional data.
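  • The following sketch assembles one frame of the integrated moving image of FIG. 15 by halving two reference frames and their depth frames and tiling them into four regions; for simplicity all inputs are assumed to be single-channel 2-D arrays of the same size, and the 2x2-averaging reduction is an illustrative choice.
```python
import numpy as np

def integrate_frame(ref1, ref2, depth1, depth2):
    """Pack two reference frames and their depth frames into one frame (FIG. 15).

    Each input image is reduced to 1/2 in both directions by simple 2x2
    averaging (an illustrative reduction method) and placed in one quadrant
    of the integrated frame.
    """
    def half(img):
        h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
        img = img[:h, :w].astype(float)
        return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4

    r1, r2, d1, d2 = map(half, (ref1, ref2, depth1, depth2))
    top = np.concatenate([r1, r2], axis=1)      # first / second reference image
    bottom = np.concatenate([d1, d2], axis=1)   # first / second depth image
    return np.concatenate([top, bottom], axis=0)
```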
  • FIG. 16 schematically shows another example of the integrated moving image generated by the data compression unit 334.
  • This integrated moving image 46 is divided into four areas obtained by dividing one frame 44, and “first reference image”, “second reference image”, and “third reference image” generated for three reference viewpoints. ”And the corresponding“ first depth image ”,“ second depth image ”, and“ third depth image ”, have a data structure representing frames of the same time step.
  • the channels and gradations to be used are not limited by representing the “first depth image” and the “second depth image” in different regions of the image plane.
  • In contrast, the integrated moving image 46 shown in FIG. 16 represents the “first depth image”, “second depth image”, and “third depth image” in the same region of the image plane, using the three channels of red (R), green (G), and blue (B). The three reference images can then be represented in the remaining three regions.
  • the image reduction rate is the same as in FIG. 15, but data of three reference viewpoints can be included in one moving image.
  • As a result, synchronization processing and decoding/decompression processing can be made more efficient while maintaining image quality.
  • Note that if the RGB images are converted into YCbCr images for compression encoding, the pixel values of the depth images held in the separate channels affect one another when the display image generating apparatus 200 performs decoding and decompression, and cannot be completely restored. It is therefore desirable to employ a compression encoding method that can accurately restore the RGB values.
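  • A minimal sketch of the channel packing used in FIG. 16, assuming each depth image has already been quantized to a single 8-bit channel; because the three depth images share one region, the note above about accurately restoring RGB values applies directly.

```python
import numpy as np

def pack_depths_rgb(d1, d2, d3):
    # Pack three single-channel depth images into the R, G and B channels
    # of one region of the integrated frame (all must share the same shape).
    return np.stack([d1, d2, d3], axis=-1)

def unpack_depths_rgb(region):
    # Inverse operation on the decoded region; only exact if the encoding
    # preserves RGB values (e.g. lossless or RGB-native compression).
    return region[..., 0], region[..., 1], region[..., 2]
```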
  • FIG. 17 is a diagram for explaining a technique for converting only an image in a region having a change into time-series data as one of the compression processes performed by the data compression unit 334.
  • Here, a moving image representing a car traveling on a road is assumed, and (a) shows six consecutive frames of the reference image, with the horizontal axis representing time.
  • each frame of the reference image represents an omnidirectional image viewed from the reference viewpoint in equirectangular projection. In this case, there is almost no movement in the road and background other than the automobile, which is the object of interest.
  • (b) in the figure shows fixed-size regions (for example, region 50) containing the automobile, extracted from each of the frames shown in (a).
  • the data compression unit 334 stores the entire region of a frame at a certain point in time, for example frame 52, as a reference frame, and for the subsequent time steps stores the time-series data of the images of the fixed-size region containing the object (for example, image 54) together with the position information of that region on the reference image plane, in association with each other, to obtain the compressed reference image data.
  • the data decompression unit 336 uses the reference frame as the reference image for the time step to which it is assigned, and for the subsequent time steps restores the image by updating only the area stored as time-series data.
  • the image 54 of the fixed-size area containing the object may have a higher resolution than the corresponding area 50 in the reference frame. In this way, even if the reference frame is reduced to shrink the data size, the level of detail can be maintained in the object region that the user is expected to watch.
  • the reference frame may be the first frame of each moving image or may be a frame at a predetermined time interval.
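  • The following is a simplified sketch of the compression and restoration of (b), assuming a fixed region given as (x, y, w, h) on the reference image plane and frames held as NumPy arrays; resolution enhancement of the extracted region and periodic reference frames are omitted for brevity.

```python
import numpy as np

def compress_fixed_region(frames, box):
    # frames: the reference image of each time step; box: (x, y, w, h) of the
    # fixed-size region containing the object on the reference image plane.
    x, y, w, h = box
    return {
        "reference_frame": frames[0],                              # whole image kept once
        "box": box,                                                # position information
        "regions": [f[y:y+h, x:x+w].copy() for f in frames[1:]],  # time-series data
    }

def decompress_fixed_region(data):
    x, y, w, h = data["box"]
    restored = [data["reference_frame"]]
    for region in data["regions"]:
        frame = restored[-1].copy()
        frame[y:y+h, x:x+w] = region                               # update only the changed area
        restored.append(frame)
    return restored
```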
  • (c) in the figure further extracts only the image area of the object, for example a rectangular area whose four sides lie at a predetermined distance from the outline of the object.
  • the size of the extracted region varies depending on the positional relationship between the reference viewpoint and the object.
  • the data compression unit 334 extracts the object image from each frame of the reference image shown in (a). Then, the entire region of a frame at a certain point in time, for example frame 52, is stored as a reference frame, and for the subsequent time steps the time-series data of the images of the object region (for example, image 56) are stored in association with the position information and size information of that region on the reference image plane, to obtain the compressed reference image data.
  • an image representing only the object may be generated as the image 56 when the reference image generating unit 330 generates the reference image.
  • the screen surface may be adjusted so that the object is zoomed while the reference viewpoint is fixed.
  • the operation of the data decompression unit 336 is the same as in the case of (b).
  • the modes (a) to (c) can be implemented not only for the reference image but also for the depth image.
  • the compression method applied to the reference image and the depth image may be the same or different. According to the compression method (c), object information can be held with the same level of detail regardless of the distance between the reference viewpoint and the object.
  • FIG. 18 is a diagram for explaining a method of using information representing only pixels with changes as time-series data as one of the compression processes performed by the data compression unit 334.
  • the horizontal axis of the figure indicates time.
  • the image 60 is one frame of the reference image or a part thereof.
  • the image 62a corresponds to the frame following the image 60; pixels whose values differ from the image 60 by a predetermined value or more are shown in gray.
  • the image 62b corresponds to the frame after that, and likewise pixels whose values differ from the previous frame by a predetermined value or more are shown in gray.
  • the data compression unit 334 takes the inter-frame difference of the reference image and extracts the pixels whose values differ by a predetermined value or more. As a result, in the illustrated example, the front portion of the automobile, including the hood and bumper, and the pixels representing the road surface in front of the automobile are extracted. The data compression unit 334 then generates images 64a and 64b that hold, in raster order, the data (x, y, R, G, B) composed of the position coordinates of the extracted pixels and their changed pixel values as 5-channel pixel values.
  • Here, (x, y) are the position coordinates of the pixel on the reference image plane, and (R, G, B) is the pixel value of the reference image, that is, the color value.
  • the data decompression unit 336 uses the reference frame as the reference image for the time step to which it is assigned, and for the subsequent time steps restores the image by updating only the pixels stored as time-series data. The same applies to the depth image. As a result, the data size can be reduced further than in the mode shown in FIG. 17, since the shape of the object is taken into account.
  • the reference frame may be the first frame of each moving image or may be a frame at a predetermined time interval.
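  • As a sketch of the pixel-level difference data of FIG. 18, the following records changed pixels as (x, y, R, G, B) rows in raster order and applies them to the previous frame on restoration; the threshold value and function names are illustrative assumptions.

```python
import numpy as np

def encode_changed_pixels(prev, curr, threshold=8):
    # Extract pixels whose value differs from the previous frame by the
    # threshold or more, as rows of (x, y, R, G, B) in raster order.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16)).max(axis=-1)
    ys, xs = np.nonzero(diff >= threshold)   # nonzero scans the image in raster order
    return np.column_stack([xs, ys, curr[ys, xs]]).astype(np.int32)

def apply_changed_pixels(prev, records):
    # Restore the next frame by updating only the pixels stored as time-series data.
    out = prev.copy()
    xs, ys = records[:, 0], records[:, 1]
    out[ys, xs] = records[:, 2:5].astype(prev.dtype)
    return out
```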
  • the embodiment of FIG. 17 and the embodiment of FIG. 18 may be appropriately combined.
  • FIG. 19 exemplifies two frames that move back and forth in the moving image of the reference image.
  • In many cases, the difference between frames is limited to only part of the image. Even in the illustrated image of the traveling automobile, only a slight movement of the automobile images 70a and 70b and slight changes in reflection on the roads 72a and 72b occur between the upper frame and the lower frame.
  • regions 74a and 74b above the road on the image plane are distant views.
  • the distant view is different in nature from the surfaces of the objects arranged in the display target space assumed in the present embodiment, and often does not need to change with the movement of the user's viewpoint. Therefore, an image taken from a predetermined reference viewpoint may be reflected in the display image separately, by texture mapping or the like. In other words, there is less need to hold the image data of that area for each reference viewpoint.
  • the compression process may be controlled in units of tile images after the reference image and the depth image are divided into tile images of a predetermined size.
  • FIG. 20 is a diagram for explaining a method in which the data compression unit 334 controls the compression process of the reference image in units of tile images.
  • the illustrated image corresponds to one frame shown in FIG. 19, and a matrix-like rectangle divided by a lattice represents a tile image.
  • the size of the tile image is set in advance.
  • the tile images surrounded by white frames, which belong to the distant view area 80, do not need to reflect the movement of the user's viewpoint as described above, and are therefore excluded from the reference image data held for each reference viewpoint.
  • Since the tile images surrounded by the remaining black lines belong to the foreground, that is, to the region 82 used for drawing the object, they are included in the reference image data for each reference viewpoint as time-series data.
  • Among these, only the tile images having a difference from the previous frame, such as the tile image surrounded by a solid line (for example, tile image 84), may be extracted and included in the reference image data as time-series data. For example, when the average pixel value of the tile image at the same position differs between frames by a predetermined value or more, it is determined that a difference from the previous frame has occurred, and the tile image is extracted.
  • Alternatively, pixels having a difference of a predetermined value or more from the previous frame may be extracted, and an image representing data composed of the position coordinates and pixel values of those pixels may be generated. This processing is as described with reference to FIG. 18.
  • In this way, data can be omitted or the degree of compression can be controlled in units of tile images.
  • When the distance value must be expressed in the 256 gradations of SDR (Standard Dynamic Range), information after the decimal point is lost. If the original pixel values (distance values) are instead stored as floating-point data in units of tile images, the resolution of the distance values increases, and the reference image used for drawing can be selected with high accuracy.
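  • A minimal sketch of the tile-unit control described above, assuming a tile size of 64 pixels and a mean-value threshold; the foreground/distant-view classification is taken as given (for example, produced when the space is constructed), and the names used here are illustrative.

```python
import numpy as np

TILE = 64  # tile size in pixels (set in advance)

def iter_tiles(img):
    h, w = img.shape[:2]
    for ty in range(0, h, TILE):
        for tx in range(0, w, TILE):
            yield (tx // TILE, ty // TILE), img[ty:ty + TILE, tx:tx + TILE]

def tiles_to_store(prev, curr, is_foreground, threshold=2.0):
    # Return the positions of foreground tiles whose average pixel value
    # differs from the previous frame by the threshold or more.
    changed = []
    for (pos, t_prev), (_, t_curr) in zip(iter_tiles(prev), iter_tiles(curr)):
        if not is_foreground(pos):
            continue                    # distant-view tiles are excluded entirely
        if abs(float(t_curr.mean()) - float(t_prev.mean())) >= threshold:
            changed.append(pos)         # only these tiles enter the time-series data
    return changed
```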
  • FIG. 21 shows an example of the structure of data after compression in a mode in which the compression processing of the reference image and the depth image is controlled in units of tile images.
  • the post-compression reference image data 350 is generated for each reference viewpoint, and has a data structure in which the tile image data is connected in time series in association with the position coordinates of the tile image (denoted as “tile position”) on the image plane.
  • the time series are in the order of “frame numbers” 0, 1, 2,.
  • For tile positions in the distant view, the image in that area is not used for drawing the object and is therefore invalid as reference image data.
  • “-” indicates that the data of the tile image is invalid.
  • For valid tile positions, the tile image data of the first frame (frame number “0”), such as “image a” and “image b”, is included in the reference image data.
  • In the illustrated example, the tile images at the position coordinates (70, 65) and (71, 65) have a change at the frame number “1”, so images representing those differences are included in association with that frame number. A further difference image c2 is included in association with the frame number “2”.
  • the difference image is an image representing a difference from the previous frame, and corresponds to, for example, the images 64a and 64b in FIG.
  • Similarly, the tile image at the position coordinates (30, 50) has a change at the frame number “24” and the tile image at the position coordinates (31, 50) has a change at the frame number “25”, so the images “difference image a1” and “difference image b1” representing the respective differences are included.
  • the data decompression unit 336 of the display image generation apparatus 200 restores the reference image and the depth image of the first frame by connecting the tile images associated with the frame number “0” based on their position coordinates. For the subsequent frames, the moving images of the reference image and the depth image can be restored by updating, only in the tile areas for which a difference image is included, the pixels represented in that difference image.
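  • The compressed structure of FIG. 21 and its restoration can be pictured as follows; the mapping layout, the helper apply_difference, and the use of the (x, y, R, G, B) records of FIG. 18 as the difference format are all assumptions made for illustration.

```python
from typing import Dict, Tuple
import numpy as np

# tile position -> {frame number -> data}; frame 0 carries the full tile image,
# later frame numbers carry a difference image only where a change occurred.
# Tile positions belonging to the distant view are simply absent (invalid).
CompressedViewpoint = Dict[Tuple[int, int], Dict[int, np.ndarray]]

def apply_difference(tile, records):
    # Assumed difference format: rows of (x, y, R, G, B) local to the tile.
    out = tile.copy()
    out[records[:, 1], records[:, 0]] = records[:, 2:5].astype(out.dtype)
    return out

def restore_tiles(data: CompressedViewpoint, frame_no: int, prev_tiles: dict) -> dict:
    # Rebuild the tiles of one frame, updating only tiles that carry data
    # for this frame number; all other tiles are carried over unchanged.
    tiles = dict(prev_tiles)
    for pos, series in data.items():
        if frame_no in series:
            tiles[pos] = series[0] if frame_no == 0 else apply_difference(tiles[pos], series[frame_no])
    return tiles
```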
  • FIG. 22 is a diagram for explaining an example of data compression processing in a case where images of all directions of the reference image and the depth image are represented by a cube map.
  • (a) shows the relationship between the omnidirectional screen surface and the surfaces of a cube map.
  • the cube map surfaces 362 are the surfaces of a cube that encloses the spherical screen surface 360, which lies at the same distance from the viewpoint 364 in all directions.
  • a certain pixel 366 on the screen surface 360 is mapped to a position 368 where a straight line from the viewpoint 364 to the pixel 366 intersects with the surface 362 of the cube map.
  • Such a cube mapping technique is known as one of panoramic image expression means.
  • the reference image and the depth image can be held as cube map data.
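  • The mapping from a direction on the spherical screen surface to a cube map surface can be sketched as follows, using one common face/axis convention; the embodiment does not depend on this particular convention.

```python
def direction_to_cube_face(x, y, z):
    # Map a direction (x, y, z) from the viewpoint 364 to (face, u, v) on the
    # cube map, with u, v in [0, 1]. Faces: 0:+X 1:-X 2:+Y 3:-Y 4:+Z 5:-Z.
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:        # the ray exits through an X face
        face, sc, tc, ma = (0 if x > 0 else 1), (-z if x > 0 else z), -y, ax
    elif ay >= az:                   # the ray exits through a Y face
        face, sc, tc, ma = (2 if y > 0 else 3), x, (z if y > 0 else -z), ay
    else:                            # the ray exits through a Z face
        face, sc, tc, ma = (4 if z > 0 else 5), (x if z > 0 else -x), -y, az
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)
```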
  • (B) is a development view of six surfaces when a depth image at a certain reference viewpoint is represented by a cube map.
  • When the reference image is a moving image, image data such as that illustrated is generated at a predetermined rate.
  • When the space illustrated in FIGS. 17 to 20 is represented, the difference from the previous frame is limited to the area of the car image indicated by the arrow in the figure.
  • With the cube map, only the surface containing the movement (the surface 370 in the illustrated example) can easily be included in the reference image data as time-series data.
  • The operations of the data compression unit 334 and the data decompression unit 336 are the same as described above.
  • the surface of the cube map may be further divided into tile images, and it may be determined whether or not to include them in the reference image data in units of tile images.
  • Alternatively, as shown in FIG. 18, the data may be an “image” representing only the information relating to the pixels where a difference has occurred.
  • When the reference image or depth image is represented by the equirectangular projection, the image of an object directly below or above the viewpoint is stretched laterally at the lower or upper part of the image plane, and the efficiency of data compression deteriorates. With the cube map, by contrast, the change in the image plane is limited to the area corresponding to the change in the space, so the efficiency of data compression can be stabilized.
  • In the modes described so far, a reference image and a depth image are generated as a pair for each reference viewpoint, and both are compressed, decompressed, and used for drawing the object in the same way.
  • the depth image is used to select the reference images to be referred to when drawing each point on the object surface. If this selection is calculated in advance and associated with the positions on the object surface, the depth image itself does not need to be included in the reference image data.
  • FIG. 23 shows the configuration of the functional blocks of the reference image data generation unit of the reference image generation device 300 and the pixel value determination unit of the display image generation device 200 when a function for storing information on the reference image of the reference destination in association with positions on the object surface is introduced.
  • the reference image data generation unit 318b includes a reference image generation unit 330, a depth image generation unit 332, a data compression unit 334, and a reference destination information addition unit 342.
  • the functions of the reference image generation unit 330, the data compression unit 334, and the depth image generation unit 332 are the same as the corresponding functional blocks shown in FIG.
  • the reference destination information adding unit 342 uses the depth images generated by the depth image generation unit 332 to generate, for each position on the object surface, information designating the reference images to be referred to when drawing that position. This process is basically the same as that shown in FIG. 8. That is, the reference images in which a point on the object (such as the point 26 in FIG. 8) appears as an image are determined by comparing the distance to the object indicated by the depth image with the distance from the reference viewpoint to the point in the display target space.
  • The unit of area on the object surface for which the reference destination information adding unit 342 obtains the reference destination is set according to a predetermined rule; specific examples will be described later.
  • the reference destination information adding unit 342 writes the identification information of the reference destination reference image thus determined in association with the object model stored in the object model storage unit 254.
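  • The determination made by the reference destination information adding unit 342 can be sketched as the following visibility test, assuming each reference viewpoint exposes its position and a function that samples its depth image in a given direction; the attribute names and the tolerance are illustrative.

```python
import numpy as np

def visible_reference_images(point, reference_views, tolerance=1e-3):
    # Return the identification information ("A", "B", ...) of every reference
    # image in which `point` on the object surface appears as an image.
    ids = []
    for view in reference_views:
        to_point = np.asarray(point, dtype=float) - np.asarray(view.position, dtype=float)
        distance = float(np.linalg.norm(to_point))
        sampled = view.sample_depth(to_point / distance)   # depth image value in that direction
        # If the depth image records (approximately) the same distance, nothing
        # occludes the point when seen from this reference viewpoint.
        if abs(sampled - distance) <= tolerance * distance:
            ids.append(view.id)
    return ids
```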
  • the data compression unit 334 compresses only the reference image generated by the reference image generation unit 330 by any one of the methods described above, and stores it in the reference image data storage unit 256.
  • the pixel value determination unit 266b of the display image generation device 200 includes a data decompression unit 336, a reference unit 344, and a calculation unit 340.
  • the functions of the data decompression unit 336 and the calculation unit 340 are the same as the corresponding functional blocks shown in FIG.
  • the data decompression unit 336 performs the decompression process only on the reference image stored in the reference image data storage unit 256 as described above.
  • the reference unit 344 determines the reference image used to draw the point on the object corresponding to each pixel of the display image, based on the information added to the object model.
  • a pixel value representing the image of the point is acquired from the determined reference image and supplied to the calculation unit 340.
  • the processing load of the reference unit 344 is reduced, and the display image generation process can be speeded up.
  • Since the identification information of the reference image of the reference destination requires fewer gradations than the distance values of the depth image, the data size can be kept small even when held as time-series data.
  • FIG. 24 is a diagram for explaining an example of a method for associating identification information of a reference image of a reference destination with an object model.
  • the representation of the figure is the same as in FIG. That is, five reference viewpoints are set in a space where the object 424 exists, and reference images 428a, 428b, 428c, 428d, and 428e are generated.
  • the identification information of each reference image (or reference viewpoint) is “A”, “B”, “C”, “D”, and “E”.
  • the reference destination information adding unit 342 associates the identification information of the reference images to be referred to in units of the vertices of the object 424, indicated by circles, or in units of the surfaces (meshes) enclosed by the straight lines connecting the vertices.
  • For example, if it is found that the surface 430a appears in the reference images of the identification information “A” and “C”, the identification information “A” and “C” is associated with the surface 430a. If it is found that the surface 430b appears in the reference images of the identification information “A” and “B”, the identification information “A” and “B” is associated with the surface 430b. If it is found that the surface 430c appears in the reference images of the identification information “C” and “D”, the identification information “C” and “D” is associated with the surface 430c.
  • In practice, as described above, the reference images of the reference destinations are identified using the depth images, and the identification information is associated accordingly.
  • the identification information to be associated is shown in a balloon from each surface of the object 424.
  • the reference unit 344 of the display image generation device 200 identifies the surface containing the point on the object corresponding to the pixel to be drawn, or a vertex in its vicinity, and acquires the identification information of the reference images associated with it. With such a configuration, the information can be added using the vertex and mesh information already defined in the object model as it is, so an increase in data size can be suppressed. Furthermore, since the reference destinations are narrowed down in the object model, the processing load during display is small.
  • FIG. 25 is a diagram for explaining another example of a method for associating identification information of a reference image of a reference destination with an object model.
  • the distribution of the identification information of the reference image of the reference destination is generated as a texture image.
  • a texture image 432 is generated that represents the identification information of the reference image as a pixel value for each position on the surface. If the reference destination does not change in the plane, the pixel values of the texture image 432 are uniform. When the reference image of the reference destination changes in the plane due to occlusion or the like, the pixel value of the texture image 432 changes so as to correspond to it. This makes it possible to control the reference destination with a smaller granularity than the surface unit.
  • the reference unit 344 of the display image generation device 200 specifies the (u, v) coordinates on the texture image corresponding to the point on the object to be drawn, and reads the identification information of the reference image represented at that position.
  • This process is basically the same as general texture mapping in computer graphics. According to such a configuration, switching of reference destinations within the same plane by occlusion or the like can be realized with a light load without dividing the mesh defined by the object model.
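  • A lookup of the identification information stored in the texture image 432 might look like the following, where nearest-neighbour sampling is used because interpolating between identification values would be meaningless; the array layout is an assumption.

```python
def reference_ids_at(u, v, id_texture):
    # id_texture: 2D array whose entries encode reference image identification
    # information for each position on the object surface (u, v in [0, 1]).
    h, w = id_texture.shape[:2]
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    return id_texture[y, x]
```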
  • FIG. 26 is a diagram for explaining still another example of a method for associating identification information of a reference image of a reference destination with an object model.
  • the space in which the object exists is divided into voxels of a predetermined size, and the identification information of the reference images to be referred to is associated in units of voxels.
  • For example, the identification information “A” and “C” is associated with the voxels containing the surface 430a (for example, the voxels 432a and 432b).
  • reference destination information may be associated with each plane.
  • the reference unit 344 of the display image generation apparatus 200 specifies a voxel including a point on the drawing target object, and acquires identification information of a reference image associated with the voxel. According to such a configuration, an image can be drawn with high accuracy with a unified data structure and processing regardless of the shape of the object and the complexity of the space.
  • In the figure, voxels of the same size are represented by a set of squares, viewed from above.
  • the unit of the three-dimensional space that associates the identification information of the reference image to be referred to is not limited to voxels having the same size.
  • space division by an octree, which is widely known as a method for efficiently retrieving information associated with positions in a three-dimensional space, may be introduced. In this method, the target space is taken as a root box, which is divided in two along each of the three dimensions to form eight boxes, and each box is in turn divided into eight boxes as required. By repeating this process, the space is represented as an octree structure.
  • the size of the boxes that are finally formed can be controlled according to how locally fine-grained the space with which the information is associated needs to be. Furthermore, the relationship between the index numbers given to these boxes and the positions in the space can easily be found by simple bit operations.
  • the reference unit 344 of the display image generation apparatus 200 obtains, by bit operations, the index number of the box containing the point on the object to be drawn, and can thereby quickly identify the identification information of the reference image associated with it.
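  • One way to realize the "simple bit operations" mentioned above is Morton coding, sketched below under the assumption of a cubic root box subdivided to a fixed number of levels; the choice of Morton coding is an illustrative one.

```python
def morton_index(ix, iy, iz, levels):
    # Interleave the bits of the integer box coordinates to obtain the index
    # number of the box; the inverse mapping is an equally simple bit operation.
    index = 0
    for i in range(levels):
        index |= ((ix >> i) & 1) << (3 * i)
        index |= ((iy >> i) & 1) << (3 * i + 1)
        index |= ((iz >> i) & 1) << (3 * i + 2)
    return index

def box_index_of_point(p, root_min, root_size, levels):
    # Index of the box containing point p after `levels` subdivisions of the
    # cubic root box [root_min, root_min + root_size) along each axis.
    n = 1 << levels
    coords = [min(int((p[i] - root_min[i]) / root_size * n), n - 1) for i in range(3)]
    return morton_index(coords[0], coords[1], coords[2], levels)
```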
  • According to the present embodiment described above, moving images in which the movement of an object is viewed from a plurality of reference viewpoints are prepared as reference images.
  • the object is projected onto the view screen based on the user's viewpoint at a predetermined time step, and the pixel value representing the same object is obtained from the reference image at each time to obtain the pixel value of the display image.
  • a rule based on the positional relationship between the actual viewpoint and the reference viewpoint and the attribute of the object is introduced.
  • Since the reference images can be generated over time, at a timing independent of the display that follows the viewpoint, high-quality images can be prepared. By drawing values from these high-quality images at the time of display, a high-quality image can be presented without taking time. If the reference viewpoints are moved so as to follow the movement of the object, the level of detail of the object in the reference images can be kept constant, and the image of the object can be expressed stably and with high quality in the display image.
  • Furthermore, the reference image of the reference destination is specified in advance for each position on the object surface, and the identification information is associated with the object model. This further reduces the size of the data required for display. In addition, since the process of determining the reference image to refer to by calculation can be omitted at the time of display, the time from the acquisition of the user's viewpoint to the display can be shortened.
  • 100 head mounted display, 200 display image generation device, 222 CPU, 224 GPU, 226 main memory, 236 output unit, 238 input unit, 254 object model storage unit, 256 reference image data storage unit, 260 viewpoint information acquisition unit, 262 space construction unit, 264 projection unit, 266 pixel value determination unit, 268 output unit, 300 reference image generation device, 310 reference viewpoint setting unit, 314 object model storage unit, 316 space construction unit, 318 reference image data generation unit, 330 reference image generation unit, 332 depth image generation unit, 334 data compression unit, 336 data decompression unit, 338 reference unit, 340 calculation unit, 342 reference destination information addition unit, 344 reference unit.
  • the present invention can be used for various information processing devices such as a head mounted display, a game device, an image display device, a portable terminal, and a personal computer, and an information processing system including any one of them.

Abstract

The present invention: prepares, as reference images 428a–428e, images of a space including an object 424 to be displayed, viewed from reference viewpoints; synthesizes these in accordance with an actual viewpoint position; and draws a display image. For each prescribed region on the surface of a model of the object 424, identification information is associated for the reference images in which that region appears as an image. The reference images A and C associated with a surface 430a are used as references when drawing said surface 430a.

Description

基準画像生成装置、表示画像生成装置、基準画像生成方法、および表示画像生成方法Reference image generation device, display image generation device, reference image generation method, and display image generation method
 この発明は、ユーザの視点に応じた画像を表示するのに用いるデータを生成する基準画像生成装置、当該データを用いて表示画像を生成する表示画像生成装置および、それらによる基準画像生成方法、表示画像生成方法に関する。 The present invention relates to a reference image generation device that generates data used to display an image according to a user's viewpoint, a display image generation device that generates a display image using the data, a reference image generation method using the same, and a display The present invention relates to an image generation method.
 対象空間を自由な視点から鑑賞できる画像表示システムが普及している。例えばヘッドマウントディスプレイにパノラマ映像を表示し、ヘッドマウントディスプレイを装着したユーザが頭部を回転させると視線方向に応じたパノラマ画像が表示されるようにしたシステムが開発されている。ヘッドマウントディスプレイを利用することで、映像への没入感を高めたり、ゲームなどのアプリケーションの操作性を向上させたりすることもできる。また、ヘッドマウントディスプレイを装着したユーザが物理的に移動することで、映像として表示された空間内を仮想的に歩き回ることのできるウォークスルーシステムも開発されている。 An image display system that allows users to appreciate the target space from a free viewpoint has become widespread. For example, a system has been developed in which a panoramic image is displayed on a head-mounted display, and a panoramic image corresponding to the line-of-sight direction is displayed when a user wearing the head-mounted display rotates his head. By using a head-mounted display, it is possible to enhance the sense of immersion in video and improve the operability of applications such as games. In addition, a walk-through system has been developed that allows a user wearing a head-mounted display to physically move around a space displayed as an image when the user physically moves.
 表示装置の種類によらず、自由視点に対応する画像表示技術においては、視点の動きに対する表示の変化に高い応答性が求められる。一方で、画像世界の臨場感を高めるには、解像度を高くしたり複雑な計算を実施したりする必要が生じ、画像処理の負荷が増大する。そのため視点の移動に対し表示が追いつかず、結果として臨場感が損なわれてしまうこともあり得る。 ∙ Regardless of the type of display device, image display technology that supports free viewpoints requires high responsiveness to changes in display with respect to viewpoint movement. On the other hand, in order to increase the realism of the image world, it is necessary to increase the resolution or perform complicated calculations, which increases the load of image processing. For this reason, the display cannot catch up with the movement of the viewpoint, and as a result, the sense of reality may be impaired.
 本発明はこうした課題に鑑みてなされたものであり、その目的は、視点に対する画像表示の応答性と画質を両立させることのできる技術を提供することにある。 The present invention has been made in view of these problems, and an object of the present invention is to provide a technique capable of achieving both image display responsiveness and image quality with respect to the viewpoint.
 上記課題を解決するために、本発明のある態様は基準画像生成装置に関する。この基準画像生成装置は、表示対象のオブジェクトを含む空間を任意視点から見たときの表示画像を生成するのに用いる、当該空間を所定の基準視点から見たときの像を表す基準画像のデータ生成する基準画像生成装置であって、オブジェクトを規定する情報に従い、空間に当該オブジェクトを配置する空間構築部と、空間に配置した基準視点に対応する視野で、基準画像とそれに対応するデプス画像を生成したうえ、デプス画像を用いて、オブジェクトの表面上の所定領域ごとに、当該領域が像として表れている基準画像を特定し、特定結果と基準画像のデータを出力する基準画像データ生成部と、を備えたことを特徴とする。 In order to solve the above-described problems, an aspect of the present invention relates to a reference image generation device. This reference image generation device is used to generate a display image when a space including an object to be displayed is viewed from an arbitrary viewpoint, and data of a reference image representing an image when the space is viewed from a predetermined reference viewpoint A reference image generation device for generating a reference image and a depth image corresponding to the reference image and a depth image corresponding to the reference viewpoint arranged in the space, and a space construction unit that arranges the object in the space according to information defining the object A reference image data generation unit that generates a reference image in which the region appears as an image for each predetermined region on the surface of the object using the depth image, and outputs the identification result and the data of the reference image; , Provided.
 本発明の別の態様は、表示画像生成装置に関する。この表示画像生成装置は、表示対象の空間におけるオブジェクトを規定する情報を格納するオブジェクトモデル記憶部と、オブジェクトを含む空間を、所定の基準視点から見たときの像を表す基準画像のデータを格納する基準画像データ記憶部と、ユーザの視点に係る情報を取得する視点情報取得部と、空間をユーザの視点から見たときのオブジェクトの像を表示画像の平面に表す射影部と、表示画像における画素ごとに、対応するオブジェクト上のポイントが表されている基準画像を、オブジェクトモデル記憶部に格納されたオブジェクトモデルの付加情報を読み出すことにより特定し、当該画素の色を、特定した基準画像における像の色を用いて決定する画素値決定部と、表示画像のデータを出力する出力部と、を備えたことを特徴とする。 Another aspect of the present invention relates to a display image generation apparatus. The display image generation device stores an object model storage unit that stores information defining an object in a display target space, and reference image data representing an image when the space including the object is viewed from a predetermined reference viewpoint. A reference image data storage unit, a viewpoint information acquisition unit that acquires information related to a user's viewpoint, a projection unit that displays an image of an object on a plane of the display image when the space is viewed from the user's viewpoint, and a display image For each pixel, a reference image in which a point on the corresponding object is represented is specified by reading additional information of the object model stored in the object model storage unit, and the color of the pixel is determined in the specified reference image. A pixel value determining unit that determines the color of the image; and an output unit that outputs display image data. .
 本発明のさらに別の態様は、基準画像生成方法に関する。この基準画像生成方法は、表示対象のオブジェクトを含む空間を任意視点から見たときの表示画像を生成するのに用いる、当該空間を所定の基準視点から見たときの像を表す基準画像のデータ生成する基準画像生成装置が、オブジェクトを規定する情報に従い、空間に当該オブジェクトを配置するステップと、空間に配置した基準視点に対応する視野で、基準画像とそれに対応するデプス画像を生成するステップと、デプス画像を用いて、オブジェクトの表面上の所定領域ごとに、当該領域が像として表れている基準画像を特定し、特定結果と基準画像のデータを出力するステップと、を含むことを特徴とする。 Still another embodiment of the present invention relates to a reference image generation method. This reference image generation method is used to generate a display image when a space including an object to be displayed is viewed from an arbitrary viewpoint, and data of a reference image representing an image when the space is viewed from a predetermined reference viewpoint A reference image generating device to generate, in accordance with information defining the object, a step of arranging the object in a space; a step of generating a reference image and a depth image corresponding to the reference image in a field of view corresponding to the reference viewpoint arranged in the space; And, for each predetermined area on the surface of the object using the depth image, specifying a reference image in which the area appears as an image, and outputting a specification result and data of the reference image. To do.
 本発明のさらに別の態様は、表示画像生成方法に関する。この表示画像生成方法は、表示対象の空間におけるオブジェクトを規定する情報をメモリから読み出すステップと、オブジェクトを含む空間を、所定の基準視点から見たときの像を表す基準画像のデータをメモリから読み出すステップと、ユーザの視点に係る情報を取得するステップと、空間をユーザの視点から見たときのオブジェクトの像を表示画像の平面に表すステップと、表示画像における画素ごとに、対応するオブジェクト上のポイントが表されている基準画像を、オブジェクトを規定する情報に含まれるオブジェクトモデルの付加情報に基づき特定し、当該画素の色を、特定した基準画像における像の色を用いて決定するステップと、表示画像のデータを出力するステップと、を含むことを特徴とする。 Still another aspect of the present invention relates to a display image generation method. In this display image generation method, a step of reading information defining an object in a display target space from a memory, and a reference image data representing an image when the space including the object is viewed from a predetermined reference viewpoint are read from the memory. A step of acquiring information relating to the user's viewpoint, a step of representing an image of the object when the space is viewed from the user's viewpoint on a plane of the display image, and a pixel on the corresponding object for each pixel in the display image Identifying a reference image in which points are represented based on additional information of the object model included in the information defining the object, and determining the color of the pixel using the color of the image in the identified reference image; Outputting display image data.
 なお、以上の構成要素の任意の組合せ、本発明の表現を方法、装置、システム、コンピュータプログラム、データ構造、記録媒体などの間で変換したものもまた、本発明の態様として有効である。 It should be noted that any combination of the above-described constituent elements and the expression of the present invention converted between a method, apparatus, system, computer program, data structure, recording medium, etc. are also effective as an aspect of the present invention.
 本発明によれば、視点に対する画像表示の応答性と画質を両立させることができる。 According to the present invention, it is possible to achieve both image display responsiveness and image quality with respect to the viewpoint.
本実施の形態のヘッドマウントディスプレイの外観例を示す図である。It is a figure which shows the example of an external appearance of the head mounted display of this Embodiment. 本実施の形態の画像処理システムの構成図である。It is a block diagram of the image processing system of this Embodiment. 本実施の形態の表示画像生成装置がヘッドマウントディスプレイに表示させる画像世界の例を説明するための図である。It is a figure for demonstrating the example of the image world which the display image generation apparatus of this Embodiment displays on a head mounted display. 本実施の形態の表示画像生成装置の内部回路構成を示す図である。It is a figure which shows the internal circuit structure of the display image generation apparatus of this Embodiment. 本実施の形態における表示画像生成装置の機能ブロックを示す図である。It is a figure which shows the functional block of the display image generation apparatus in this Embodiment. 本実施の形態における基準画像生成装置の機能ブロックを示す図である。It is a figure which shows the functional block of the reference | standard image generation apparatus in this Embodiment. 本実施の形態における基準視点の設定例を示す図である。It is a figure which shows the example of a setting of the reference viewpoint in this Embodiment. 本実施の形態における画素値決定部が、表示画像の画素値の決定に用いる基準画像を選択する手法を説明するための図である。It is a figure for demonstrating the method in which the pixel value determination part in this Embodiment selects the reference | standard image used for the determination of the pixel value of a display image. 本実施の形態における画素値決定部が、表示画像の画素値を決定する手法を説明するための図である。It is a figure for demonstrating the method in which the pixel value determination part in this Embodiment determines the pixel value of a display image. 本実施の形態において表示画像生成装置が視点に応じた表示画像を生成する処理手順を示すフローチャートである。It is a flowchart which shows the process sequence which the display image generation apparatus produces | generates the display image according to a viewpoint in this Embodiment. 本実施の形態において基準画像データ記憶部に格納されるデータの構造例を示す図である。It is a figure which shows the structural example of the data stored in a reference | standard image data memory | storage part in this Embodiment. 本実施の形態において動きのあるオブジェクトを表すための基準視点の設定例を示す図である。It is a figure which shows the example of a setting of the reference viewpoint for representing the object with a motion in this Embodiment. 本実施の形態において表示画像の生成に用いる基準画像を、オブジェクトの動きに応じて切り替える態様を説明するための図である。It is a figure for demonstrating the aspect which switches the reference | standard image used for the production | generation of a display image in this Embodiment according to a motion of an object. 本実施の形態において、基準画像のデータの圧縮/伸張処理機能を導入した場合の、基準画像生成装置の基準画像データ生成部と、表示画像生成装置の画素値決定部の機能ブロックの構成を示す図である。In this embodiment, the configuration of functional blocks of a reference image data generation unit of a reference image generation device and a pixel value determination unit of a display image generation device when a compression / decompression processing function of reference image data is introduced is shown. FIG. 本実施の形態におけるデータ圧縮部によって生成される、統合動画像の例を模式的に示す図である。It is a figure which shows typically the example of an integrated moving image produced | generated by the data compression part in this Embodiment. 本実施の形態におけるデータ圧縮部によって生成される、統合動画像の別の例を模式的に示す図である。It is a figure which shows typically another example of the integrated moving image produced | generated by the data compression part in this Embodiment. 本実施の形態においてデータ圧縮部が実施する圧縮処理の一つとして、変化のある領域の画像のみを時系列データとする手法を説明するための図である。It is a figure for demonstrating the method of making only the image of the area | region with a change into time series data as one of the compression processes which a data compression part implements in this Embodiment. 
本実施の形態においてデータ圧縮部が実施する圧縮処理の一つとして、変化のある画素のみを表す情報を時系列データとする手法を説明するための図である。It is a figure for demonstrating the method of making the information showing only a pixel with a change into time series data as one of the compression processes which a data compression part implements in this Embodiment. 本実施の形態の基準画像の動画において前後する2つのフレームを例示する図である。It is a figure which illustrates two frames which move forward and backward in the animation of the standard picture of this embodiment. 本実施の形態において、データ圧縮部が基準画像の圧縮処理をタイル画像単位で制御する手法を説明するための図である。In this Embodiment, it is a figure for demonstrating the method in which a data compression part controls the compression process of a reference | standard image per tile image. 本実施の形態において、基準画像およびデプス画像の圧縮処理をタイル画像単位で制御する態様における、圧縮後のデータの構造例を示す図である。In this Embodiment, it is a figure which shows the structural example of the data after compression in the aspect which controls the compression process of a reference | standard image and a depth image per tile image. 本実施の形態において、基準画像およびデプス画像の全方位の画像をキューブマップで表した場合の、データ圧縮処理の例を説明するための図である。In this Embodiment, it is a figure for demonstrating the example of a data compression process when the image of all the directions of a reference | standard image and a depth image is represented by the cube map. 本実施の形態において、参照先の基準画像に係る情報を、オブジェクト表面上の位置に対応づけて保存する機能を導入した場合の、基準画像生成装置の基準画像データ生成部と、表示画像生成装置の画素値決定部の機能ブロックの構成を示す図である。In this embodiment, a reference image data generation unit of a reference image generation device and a display image generation device when a function of storing information related to a reference image of a reference destination in association with a position on the object surface is introduced. It is a figure which shows the structure of the functional block of the pixel value determination part. 本実施の形態において参照先の基準画像の識別情報をオブジェクトモデルに対応づける手法の例を説明するための図である。It is a figure for demonstrating the example of the method of matching the identification information of the reference image of a reference destination with an object model in this Embodiment. 本実施の形態において参照先の基準画像の識別情報をオブジェクトモデルに対応づける手法の別の例を説明するための図である。It is a figure for demonstrating another example of the method of matching the identification information of the reference image of a reference destination with an object model in this Embodiment. 本実施の形態において参照先の基準画像の識別情報をオブジェクトモデルに対応づける手法のさらに別の例を説明するための図である。It is a figure for demonstrating another example of the method of matching the identification information of the reference image of a reference destination with an object model in this Embodiment.
 本実施の形態は基本的に、ユーザの視点に応じた視野で画像を表示する。その限りにおいて画像を表示させる装置の種類は特に限定されず、ウェアラブルディスプレイ、平板型のディスプレイ、プロジェクタなどのいずれでもよいが、ここではウェアラブルディスプレイのうちヘッドマウントディスプレイを例に説明する。 This embodiment basically displays an image with a field of view according to the user's viewpoint. As long as this is the case, the type of device that displays an image is not particularly limited, and any of a wearable display, a flat panel display, a projector, and the like may be used. Here, a head-mounted display will be described as an example of a wearable display.
 ウェアラブルディスプレイの場合、ユーザの視線は内蔵するモーションセンサによりおよそ推定できる。その他の表示装置の場合、ユーザがモーションセンサを頭部に装着したり、注視点検出装置を用いたりすることで視線を検出できる。あるいはユーザの頭部にマーカーを装着させ、その姿を撮影した画像を解析することにより視線を推定してもよいし、それらの技術のいずれかを組み合わせてもよい。 In the case of a wearable display, the user's line of sight can be estimated approximately by a built-in motion sensor. In the case of other display devices, the user can detect the line of sight by wearing a motion sensor on the head or using a gaze point detection device. Alternatively, the user's head may be attached with a marker, and the line of sight may be estimated by analyzing an image of the appearance, or any of those techniques may be combined.
 図1は、ヘッドマウントディスプレイ100の外観例を示す。ヘッドマウントディスプレイ100は、本体部110、前頭部接触部120、および側頭部接触部130を含む。ヘッドマウントディスプレイ100は、ユーザの頭部に装着してディスプレイに表示される静止画や動画などを鑑賞し、ヘッドホンから出力される音声や音楽などを聴くための表示装置である。ヘッドマウントディスプレイ100に内蔵または外付けされたモーションセンサにより、ヘッドマウントディスプレイ100を装着したユーザの頭部の回転角や傾きといった姿勢情報を計測することができる。 FIG. 1 shows an example of the appearance of the head mounted display 100. The head mounted display 100 includes a main body portion 110, a forehead contact portion 120, and a temporal contact portion 130. The head mounted display 100 is a display device that is worn on the user's head and enjoys still images and moving images displayed on the display, and listens to sound and music output from the headphones. Posture information such as the rotation angle and inclination of the head of the user wearing the head mounted display 100 can be measured by a motion sensor built in or externally attached to the head mounted display 100.
 ヘッドマウントディスプレイ100は、「ウェアラブルディスプレイ装置」の一例である。ウェアラブルディスプレイ装置には、狭義のヘッドマウントディスプレイ100に限らず、めがね、めがね型ディスプレイ、めがね型カメラ、ヘッドホン、ヘッドセット(マイクつきヘッドホン)、イヤホン、イヤリング、耳かけカメラ、帽子、カメラつき帽子、ヘアバンドなど任意の装着可能なディスプレイ装置が含まれる。 The head mounted display 100 is an example of a “wearable display device”. The wearable display device is not limited to the head-mounted display 100 in a narrow sense, but includes glasses, glasses-type displays, glasses-type cameras, headphones, headsets (headphones with microphones), earphones, earrings, ear-mounted cameras, hats, hats with cameras, Any wearable display device such as a hair band is included.
 図2は、本実施の形態に係る画像処理システムの構成図を示している。ヘッドマウントディスプレイ100は、無線通信またはUSBなどの周辺機器を接続するインタフェース205により表示画像生成装置200に接続される。表示画像生成装置200は、さらにネットワークを介してサーバに接続されてもよい。その場合、サーバはヘッドマウントディスプレイ100に表示させる画像のデータを表示画像生成装置200に提供してもよい。 FIG. 2 is a configuration diagram of the image processing system according to the present embodiment. The head mounted display 100 is connected to the display image generating apparatus 200 via an interface 205 for connecting peripheral devices such as wireless communication or USB. The display image generation apparatus 200 may be further connected to a server via a network. In that case, the server may provide the display image generating apparatus 200 with image data to be displayed on the head mounted display 100.
 表示画像生成装置200は、ヘッドマウントディスプレイ100を装着したユーザの頭部の位置や姿勢に基づき視点の位置や視線の方向を特定し、それに応じた視野となるように表示画像を生成してヘッドマウントディスプレイ100に出力する。この限りにおいて画像を表示する目的は様々であってよい。例えば表示画像生成装置200は、電子ゲームを進捗させつつゲームの舞台である仮想世界を表示画像として生成してもよいし、仮想世界が実世界かに関わらず観賞用として動画像などを表示させてもよい。表示装置をヘッドマウントディスプレイとした場合、視点を中心に広い角度範囲でパノラマ画像を表示できるようにすれば、表示世界に没入した状態を演出することもできる。 The display image generation device 200 identifies the position of the viewpoint and the direction of the line of sight based on the position and orientation of the head of the user wearing the head mounted display 100, generates a display image so as to have a field of view corresponding thereto, and Output to the mount display 100. As long as this is the case, the purpose of displaying an image may be various. For example, the display image generation apparatus 200 may generate a virtual world that is the stage of a game as a display image while advancing an electronic game, or display a moving image or the like for viewing regardless of whether the virtual world is a real world. May be. When the display device is a head-mounted display, if the panoramic image can be displayed in a wide angle range with the viewpoint as the center, it is possible to produce a state of being immersed in the display world.
 図3は、本実施の形態で表示画像生成装置200がヘッドマウントディスプレイ100に表示させる画像世界の例を説明するための図である。この例ではユーザ12が仮想空間である部屋にいる状態を作り出している。仮想空間を定義するワールド座標系には図示するように、壁、床、窓、テーブル、テーブル上の物などのオブジェクトを配置している。表示画像生成装置200は当該ワールド座標系に、ユーザ12の視点の位置や視線の方向に応じてビュースクリーン14を定義し、そこにオブジェクトの像を射影することで表示画像を描画する。 FIG. 3 is a diagram for explaining an example of an image world displayed on the head mounted display 100 by the display image generating apparatus 200 in the present embodiment. In this example, a state in which the user 12 is in a room that is a virtual space is created. As shown in the figure, objects such as walls, floors, windows, tables, and objects on the table are arranged in the world coordinate system that defines the virtual space. The display image generation apparatus 200 defines a view screen 14 in the world coordinate system in accordance with the position of the viewpoint of the user 12 and the direction of the line of sight, and draws a display image by projecting an image of the object there.
 ユーザ12の視点の位置や視線の方向(以後、これらを包括的に「視点」と呼ぶ場合がある)を所定のレートで取得し、これに応じてビュースクリーン14の位置や方向を変化させれば、ユーザの視点に対応する視野で画像を表示させることができる。視差を有するステレオ画像を生成し、ヘッドマウントディスプレイ100において左右の目の前に表示させれば、仮想空間を立体視させることもできる。これによりユーザ12は、あたかも表示世界の部屋の中にいるような仮想現実を体験することができる。なお図示する例では表示対象を、コンピュータグラフィックスを前提とする仮想世界としたが、パノラマ写真など実世界の撮影画像としたり、それと仮想世界とを組み合わせたりしてもよい。 The position of the viewpoint of the user 12 and the direction of the line of sight (hereinafter, these may be collectively referred to as “viewpoint”) are acquired at a predetermined rate, and the position and direction of the view screen 14 can be changed accordingly. For example, an image can be displayed with a field of view corresponding to the user's viewpoint. If a stereo image having parallax is generated and displayed in front of the left and right eyes on the head mounted display 100, the virtual space can be stereoscopically viewed. Thereby, the user 12 can experience a virtual reality as if he were in a room in the display world. In the illustrated example, the display target is a virtual world based on computer graphics. However, a real world photographed image such as a panoramic photograph may be used, or it may be combined with the virtual world.
 このような表示に臨場感を持たせるためには、表示対象の空間で生じる物理現象をできるだけ正確に反映させることが望ましい。例えばオブジェクト表面での拡散反射や鏡面反射、環境光など、目に到達する様々な光の伝播を正確に計算することにより、視点の動きによるオブジェクト表面の色味や輝度の変化をよりリアルに表現することができる。これを実現する代表的な手法がレイトレーシングである。しかしながら自由視点を許容する環境では特に、そのような物理計算を高精度に行うことにより、表示までに看過できないレイテンシが生じることが考えられる。 In order to give such a display a sense of reality, it is desirable to reflect the physical phenomenon that occurs in the display target space as accurately as possible. For example, by accurately calculating the propagation of various light that reaches the eyes, such as diffuse reflection, specular reflection, and ambient light on the object surface, changes in the color and brightness of the object surface due to the movement of the viewpoint can be expressed more realistically can do. A typical method for realizing this is ray tracing. However, particularly in an environment that allows a free viewpoint, it is conceivable that latency that cannot be overlooked before display is caused by performing such physical calculation with high accuracy.
 そこで本実施の形態では、特定の視点から見た画像を前もって取得しておき、任意の視点に対する表示画像の画素値の決定に利用する。すなわち表示画像に像として表れるオブジェクトの色を、前もって取得しておいた画像の対応する箇所から抽出することで決定する。以後、事前の画像取得において設定する視点を「基準視点」、基準視点から見た事前に取得する画像を「基準画像」または「基準視点の画像」と呼ぶ。表示画像の描画に用いるデータの一部を、基準画像として事前に取得しておくことにより、視点の移動から表示までのレイテンシを抑えられる。また基準画像の生成段階においては基本的に時間的な制約がないため、レイトレーシングなどの物理計算を、時間をかけて高精度に行うことができる。 Therefore, in the present embodiment, an image viewed from a specific viewpoint is acquired in advance and used to determine the pixel value of the display image for an arbitrary viewpoint. That is, the color of the object appearing as an image in the display image is determined by extracting from the corresponding portion of the image acquired in advance. Hereinafter, the viewpoint set in the prior image acquisition is referred to as “reference viewpoint”, and the image acquired in advance viewed from the reference viewpoint is referred to as “reference image” or “reference viewpoint image”. By acquiring a part of data used for drawing a display image in advance as a reference image, the latency from the movement of the viewpoint to the display can be suppressed. In addition, since there is basically no time restriction in the generation stage of the reference image, physical calculations such as ray tracing can be performed with high accuracy over time.
 基準視点を、表示時の視点に想定される可動範囲に分散させて複数個設定し、それぞれについて基準画像を準備しておけば、複数の視点から見た同じオブジェクトの色味を加味して、表示時の視点に応じたオブジェクトをより高精度に表現できる。より具体的には、表示時の視点が基準視点の一つと一致しているとき、当該基準視点に対応する基準画像の画素値をそのまま採用できる。表示時の視点が複数の基準視点の間にあるとき、当該複数の基準視点に対応する基準画像の画素値を合成することにより、表示画像の画素値を決定する。 Set a plurality of reference viewpoints in the movable range assumed for the viewpoint at the time of display, and prepare a reference image for each, taking into account the color of the same object seen from multiple viewpoints, Objects according to the viewpoint at the time of display can be expressed with higher accuracy. More specifically, when the viewpoint at the time of display coincides with one of the reference viewpoints, the pixel value of the reference image corresponding to the reference viewpoint can be adopted as it is. When the viewpoint at the time of display is between a plurality of reference viewpoints, the pixel values of the display image are determined by combining the pixel values of the reference image corresponding to the plurality of reference viewpoints.
 図4は表示画像生成装置200の内部回路構成を示している。表示画像生成装置200は、CPU(Central Processing Unit)222、GPU(Graphics Processing Unit)224、メインメモリ226を含む。これらの各部は、バス230を介して相互に接続されている。バス230にはさらに入出力インタフェース228が接続されている。 FIG. 4 shows the internal circuit configuration of the display image generating apparatus 200. The display image generation apparatus 200 includes a CPU (Central Processing Unit) 222, a GPU (Graphics Processing Unit) 224, and a main memory 226. These units are connected to each other via a bus 230. An input / output interface 228 is further connected to the bus 230.
 入出力インタフェース228には、USBやIEEE1394などの周辺機器インタフェースや、有線又は無線LANのネットワークインタフェースからなる通信部232、ハードディスクドライブや不揮発性メモリなどの記憶部234、ヘッドマウントディスプレイ100などの表示装置へデータを出力する出力部236、ヘッドマウントディスプレイ100からデータを入力する入力部238、磁気ディスク、光ディスクまたは半導体メモリなどのリムーバブル記録媒体を駆動する記録媒体駆動部240が接続される。 The input / output interface 228 includes a peripheral device interface such as USB or IEEE1394, a communication unit 232 including a wired or wireless LAN network interface, a storage unit 234 such as a hard disk drive or a nonvolatile memory, and a display device such as the head mounted display 100. An output unit 236 that outputs data to the head, an input unit 238 that inputs data from the head mounted display 100, and a recording medium driving unit 240 that drives a removable recording medium such as a magnetic disk, an optical disk, or a semiconductor memory are connected.
 CPU222は、記憶部234に記憶されているオペレーティングシステムを実行することにより表示画像生成装置200の全体を制御する。CPU222はまた、リムーバブル記録媒体から読み出されてメインメモリ226にロードされた、あるいは通信部232を介してダウンロードされた各種プログラムを実行する。GPU224は、ジオメトリエンジンの機能とレンダリングプロセッサの機能とを有し、CPU222からの描画命令に従って描画処理を行い、表示画像を図示しないフレームバッファに格納する。そしてフレームバッファに格納された表示画像をビデオ信号に変換して出力部236に出力する。メインメモリ226はRAM(Random Access Memory)により構成され、処理に必要なプログラムやデータを記憶する。 The CPU 222 controls the entire display image generation apparatus 200 by executing the operating system stored in the storage unit 234. The CPU 222 also executes various programs read from the removable recording medium and loaded into the main memory 226 or downloaded via the communication unit 232. The GPU 224 has a function of a geometry engine and a function of a rendering processor, performs a drawing process according to a drawing command from the CPU 222, and stores a display image in a frame buffer (not shown). Then, the display image stored in the frame buffer is converted into a video signal and output to the output unit 236. The main memory 226 includes a RAM (Random Access Memory) and stores programs and data necessary for processing.
 図5は、本実施の形態における表示画像生成装置200の機能ブロックの構成を示している。表示画像生成装置200は上述のとおり、電子ゲームを進捗させたりサーバと通信したりする一般的な情報処理を行ってよいが、図5では特に、視点に応じて表示画像のデータを生成する機能に着目して示している。なお図5で示される表示画像生成装置200の機能のうち少なくとも一部を、ヘッドマウントディスプレイ100に実装してもよい。あるいは、表示画像生成装置200の少なくとも一部の機能を、ネットワークを介して表示画像生成装置200に接続されたサーバに実装してもよい。 FIG. 5 shows a functional block configuration of the display image generation apparatus 200 in the present embodiment. As described above, the display image generation apparatus 200 may perform general information processing such as progressing an electronic game or communicating with a server. In FIG. 5, in particular, a function of generating display image data according to a viewpoint. It is shown paying attention to. Note that at least a part of the functions of the display image generation apparatus 200 shown in FIG. 5 may be mounted on the head mounted display 100. Alternatively, at least a part of the functions of the display image generation device 200 may be implemented in a server connected to the display image generation device 200 via a network.
 また図5および後述する図6に示す機能ブロックは、ハードウェア的には、図4に示したCPU、GPU、各種メモリなどの構成で実現でき、ソフトウェア的には、記録媒体などからメモリにロードした、データ入力機能、データ保持機能、画像処理機能、通信機能などの諸機能を発揮するプログラムで実現される。したがって、これらの機能ブロックがハードウェアのみ、ソフトウェアのみ、またはそれらの組合せによっていろいろな形で実現できることは当業者には理解されるところであり、いずれかに限定されるものではない。 The functional blocks shown in FIG. 5 and FIG. 6 to be described later can be realized in hardware by the configuration of the CPU, GPU, various memories shown in FIG. 4, and loaded in the memory from a recording medium or the like in software. It is realized by a program that exhibits various functions such as a data input function, a data holding function, an image processing function, and a communication function. Therefore, it is understood by those skilled in the art that these functional blocks can be realized in various forms by hardware only, software only, or a combination thereof, and is not limited to any one.
 表示画像生成装置200は、ユーザの視点に係る情報を取得する視点情報取得部260、表示対象のオブジェクトからなる空間を構築する空間構築部262、ビュースクリーンにオブジェクトを射影する射影部264、オブジェクトの像を構成する画素の値を決定し表示画像を完成させる画素値決定部266、表示画像のデータをヘッドマウントディスプレイ100に出力する出力部268を備える。表示画像生成装置200はさらに、空間の構築に必要なオブジェクトモデルに係るデータを記憶するオブジェクトモデル記憶部254、および、基準画像に係るデータを記憶する基準画像データ記憶部256を備える。 The display image generation apparatus 200 includes a viewpoint information acquisition unit 260 that acquires information related to a user's viewpoint, a space construction unit 262 that constructs a space composed of objects to be displayed, a projection unit 264 that projects an object on a view screen, A pixel value determining unit 266 that determines values of pixels constituting the image and completes a display image, and an output unit 268 that outputs display image data to the head mounted display 100 are provided. The display image generation apparatus 200 further includes an object model storage unit 254 that stores data related to an object model necessary for constructing a space, and a reference image data storage unit 256 that stores data related to a reference image.
 視点情報取得部260は、図4の入力部238、CPU222などで構成され、ユーザの視点の位置や視線の方向を所定のレートで取得する。例えばヘッドマウントディスプレイ100に内蔵した加速度センサの出力値を逐次取得し、それによって頭部の姿勢を取得する。さらにヘッドマウントディスプレイ100の外部に図示しない発光マーカーを設け、その撮影画像を図示しない撮像装置から取得することで、実空間での頭部の位置を取得する。 The viewpoint information acquisition unit 260 includes the input unit 238 and the CPU 222 shown in FIG. 4 and acquires the position of the user's viewpoint and the direction of the line of sight at a predetermined rate. For example, the output value of the acceleration sensor built in the head mounted display 100 is sequentially acquired, and thereby the posture of the head is acquired. Further, a light emitting marker (not shown) is provided outside the head mounted display 100, and the captured image is acquired from an imaging device (not shown), thereby acquiring the position of the head in real space.
Alternatively, an imaging device (not shown) that captures images corresponding to the user's field of view may be provided on the head mounted display 100 side, and the position and posture of the head may be acquired by a technique such as SLAM (Simultaneous Localization and Mapping). If the position and posture of the head can be acquired in this way, the position of the user's viewpoint and the direction of the line of sight can be specified approximately. Those skilled in the art will understand that various methods for acquiring information on the user's viewpoint are conceivable, and that they are not limited to the case of using the head mounted display 100.
The space construction unit 262 is composed of the CPU 222, the GPU 224, the main memory 226, and the like shown in FIG. 4, and constructs a shape model of the space in which the objects to be displayed exist. In the example shown in FIG. 3, objects representing a room, such as walls, a floor, a window, a table, and items on the table, are arranged in the world coordinate system that defines the virtual space. Information on the shape of each object is read from the object model storage unit 254. Here, the space construction unit 262 only needs to determine the shape, position, and posture of each object, and can use a modeling technique based on surface models in general computer graphics.
In the present embodiment, it is possible to express how an object moves or deforms in the virtual space. For this purpose, the object model storage unit 254 also stores data defining the movement and deformation of the objects. For example, time-series data representing the position and shape of an object at predetermined time intervals is stored, or alternatively a program for generating such changes is stored. The space construction unit 262 reads this data and changes the objects arranged in the virtual space accordingly.
The projection unit 264 is composed of the GPU 224, the main memory 226, and the like shown in FIG. 4, and sets the view screen according to the viewpoint information acquired by the viewpoint information acquisition unit 260. That is, by setting the screen coordinates so as to correspond to the position of the head and the direction in which the face is oriented, the space to be displayed is drawn on the screen plane with a field of view corresponding to the user's position and orientation.
The projection unit 264 further projects the objects in the space constructed by the space construction unit 262 onto the view screen at a predetermined rate. This processing can also use the general computer graphics technique of perspective-transforming a mesh such as polygons. The pixel value determination unit 266 is composed of the GPU 224, the main memory 226, and the like shown in FIG. 4, and determines the values of the pixels constituting the images of the objects projected onto the view screen. At this time, as described above, it reads the reference image data from the reference image data storage unit 256, and extracts and uses the values of pixels representing points on the same object.
For example, corresponding pixels are identified in the reference images generated for the reference viewpoints around the actual viewpoint, and the pixel value of the display image is obtained by averaging them with weights based on the distances or angles between the actual viewpoint and the reference viewpoints. By generating the reference images accurately in advance, taking time with ray tracing or the like, a high-definition image representation close to that obtained by ray tracing can be realized at run time with the light computational load of reading out the corresponding pixel values and taking their weighted average.
When the movement or deformation of an object is represented, the reference image naturally becomes a moving image showing that movement as seen from the reference viewpoint. Therefore, for the moving image of the object projected by the projection unit 264, the pixel value determination unit 266 refers to the frame of the reference image at the corresponding time. That is, the pixel value determination unit 266 refers to the moving image of the reference image in synchronization with the movement of the object in the virtual space generated by the space construction unit 262.
Note that the reference image is not limited to a graphics image drawn by ray tracing, and may be, for example, an image obtained by photographing a real space from the reference viewpoint in advance. In this case, the space construction unit 262 constructs a shape model of the photographed real space, and the projection unit 264 projects this shape model onto the view screen corresponding to the viewpoint at the time of display. Alternatively, if the position of the image of the photographed object can be determined for a field of view corresponding to the viewpoint at the time of display, the processing of the space construction unit 262 and the projection unit 264 can be omitted.
When the display image is viewed stereoscopically, the projection unit 264 and the pixel value determination unit 266 perform processing for the left-eye and right-eye viewpoints, respectively. The output unit 268 is composed of the CPU 222, the main memory 226, the output unit 236, and the like shown in FIG. 4, and sends the display image data, completed by the pixel value determination unit 266 determining the pixel values, to the head mounted display 100 at a predetermined rate. When stereo images are generated for stereoscopic viewing, the output unit 268 generates and outputs, as the display image, an image in which they are joined side by side. In the case of a head mounted display 100 configured so that the display image is viewed through lenses, the output unit 268 may apply a correction to the display image that takes into account the distortion caused by the lenses.
FIG. 6 shows the functional blocks of a device that generates reference image data. The reference image generation device 300 may be a part of the display image generation apparatus 200 of FIG. 5, or may be provided independently as a device that generates data to be used for display. The generated reference image data, the object models used for the generation, and the data defining their movement may be stored on a recording medium or the like as electronic content, and loaded into the main memory of the display image generation apparatus 200 at run time. The internal circuit configuration of the reference image generation device 300 may be the same as that of the display image generation apparatus 200 shown in FIG. 4.
The reference image generation device 300 includes a reference viewpoint setting unit 310 that sets reference viewpoints, a space construction unit 316 that constructs a space composed of objects to be displayed, a reference image data generation unit 318 that generates reference image data for each reference viewpoint based on the constructed space, an object model storage unit 314 that stores data on the object models necessary for constructing the space, and a reference image data storage unit 256 that stores the generated reference image data.
The reference viewpoint setting unit 310 is composed of the input unit 238, the CPU 222, the main memory 226, and the like, and sets the position coordinates of the reference viewpoints in the space to be displayed. Preferably, a plurality of reference viewpoints are distributed so as to cover the range of viewpoints that the user can take. Appropriate values for such a range and for the number of reference viewpoints differ depending on the configuration of the space to be displayed, the purpose of the display, the accuracy required for the display, the processing performance of the display image generation apparatus 200, and so on. For this reason, the reference viewpoint setting unit 310 may accept input of the position coordinates of the reference viewpoints from the creator of the display content. Alternatively, as described later, the reference viewpoint setting unit 310 may change the positions of the reference viewpoints according to the movement of objects.
The space construction unit 316 is composed of the CPU 222, the GPU 224, the main memory 226, and the like, and constructs a shape model of the space in which the objects to be displayed exist. This function corresponds to the function of the space construction unit 262 shown in FIG. 5. On the other hand, the reference image generation device 300 of FIG. 6 uses a modeling method based on solid models that take into account the color and material of the objects, in order to draw the images of the objects accurately by ray tracing or the like. For this purpose, the object model storage unit 314 stores object model data including information such as color and material.
The space construction unit 316 also moves and deforms objects in the virtual space, and may additionally change the lighting conditions or the colors of objects. The information defining such changes may be read out from data stored in advance in the object model storage unit 314, or may be set by direct input from the creator of the display content. In the latter case, the space construction unit 316 changes the objects in accordance with the input information and stores the information defining the change in the object model storage unit 314, so that the same change occurs at the time of display.
The reference image data generation unit 318 is composed of the CPU 222, the GPU 224, the main memory 226, and the like, and, for each reference viewpoint set by the reference viewpoint setting unit 310, draws the objects to be displayed that are visible from that reference viewpoint at a predetermined rate. Preferably, the reference image is prepared as a panoramic moving image covering all directions from the reference viewpoint, so that the viewpoint at the time of display can also be changed freely in all directions. It is also desirable to represent the appearance from each reference viewpoint accurately in the reference image by calculating the propagation of light rays over time.
The reference image data generation unit 318 also generates a depth image corresponding to each reference image. That is, it obtains the distance (depth value) from the screen surface of the object represented by each pixel of the reference image, and generates a depth image representing this distance as a pixel value. When the reference image is an omnidirectional panoramic image, the view screen is a spherical surface, so the depth value is the distance to the object in the normal direction of that spherical surface. The generated depth images are used to select the reference images to be referred to when determining the pixel values of the display image.
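As one way to picture this relationship, the following Python sketch computes such a depth image for an equirectangular omnidirectional reference image; the scene.cast_ray helper, the pixel-to-direction mapping, and all names are assumptions introduced only for illustration, not part of the embodiment described above.

    import numpy as np

    def make_depth_image(scene, viewpoint, width, height):
        """Depth image for an omnidirectional (equirectangular) reference image.

        For a spherical view screen, the depth value of a pixel is the distance
        to the object along the outward normal of the sphere, i.e. along the
        viewing ray through that pixel. scene.cast_ray is a hypothetical helper
        returning that distance (or inf when nothing is hit).
        """
        depth = np.full((height, width), np.inf, dtype=np.float32)
        for v in range(height):
            lat = np.pi / 2 - np.pi * (v + 0.5) / height     # +90 deg (top) .. -90 deg (bottom)
            for u in range(width):
                lon = -np.pi + 2 * np.pi * (u + 0.5) / width  # -180 deg .. +180 deg
                direction = np.array([np.cos(lat) * np.sin(lon),
                                      np.sin(lat),
                                      np.cos(lat) * np.cos(lon)])
                depth[v, u] = scene.cast_ray(origin=viewpoint, direction=direction)
        return depth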
Alternatively, as described later, the reference image data generation unit 318 may generate, instead of the depth images, other information to be used when selecting the reference images to refer to at the time of display. Specifically, for each position on an object surface, the reference images to be referred to when drawing that position are obtained in advance. In this case, the reference image data generation unit 318 stores this information in the object model storage unit 314 as additional information of the object model. Note that the object model storage unit 254 of FIG. 5 need only store, among the data stored in the object model storage unit 314 of FIG. 6, at least the data used for generating the display image.
The reference image data generation unit 318 stores the generated data in the reference image data storage unit 256 in association with the position coordinates of the reference viewpoints. The reference image data storage unit 256 basically stores a pair of a reference image and a depth image for each reference viewpoint, but in a mode in which depth images are not used at the time of display as described above, only the reference image is stored for each reference viewpoint. Hereinafter, a pair of a reference image and a depth image may also be referred to as "reference image data".
In the present embodiment, since the reference images and the depth images are moving images, the data size of the reference images tends to increase with the number of reference viewpoints. Therefore, the reference image data generation unit 318 reduces the data size and the processing load at the time of display image generation by using a data structure in which, within the generated moving images, the image is updated only in regions with motion. It also generates an integrated moving image in which a frame of the reference image and a frame of the depth image at the same time are represented within a single frame, and compresses and encodes the data in that unit, thereby reducing the data size as well as the load of decoding, decompression, and synchronization processing at the time of display. Details will be described later.
FIG. 7 shows an example of setting the reference viewpoints. In this example, a plurality of reference viewpoints, indicated by black circles, are set on each of a horizontal plane 20a at the eye height of the user 12 when standing and a horizontal plane 20b at the eye height when sitting. As an example, the horizontal plane 20a is 1.4 m from the floor and the horizontal plane 20b is 1.0 m from the floor. A movement range corresponding to the display content is assumed in the left-right direction (the X-axis direction in the figure) and the front-back direction (the Y-axis direction in the figure) centered on the user's standard position (home position), and the reference viewpoints are distributed in the corresponding rectangular regions on the horizontal planes 20a and 20b.
In this example, the reference viewpoints are arranged at every other intersection of a grid that divides the rectangular region into four equal parts in each of the X-axis and Y-axis directions. In addition, the upper and lower horizontal planes 20a and 20b are offset from each other so that the reference viewpoints do not overlap. As a result, in the example shown in FIG. 7, a total of 25 reference viewpoints are set: 13 on the upper horizontal plane 20a and 12 on the lower horizontal plane 20b.
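The following Python sketch reproduces this particular arrangement (a minimal illustration; the rectangle dimensions, the coordinate convention with height on the third axis, and the function name are assumptions, with only the heights 1.4 m and 1.0 m taken from the example above).

    def reference_viewpoints(width=2.0, depth=2.0, z_stand=1.4, z_sit=1.0, divisions=4):
        """Distribute reference viewpoints on two horizontal planes.

        Dividing the rectangle into `divisions` equal parts along X and Y gives
        (divisions + 1) x (divisions + 1) grid intersections; every other
        intersection is used on the standing-height plane and the complementary
        set on the sitting-height plane, so the two planes do not overlap when
        seen from above.
        """
        viewpoints = []
        n = divisions + 1
        for i in range(n):
            for j in range(n):
                x = -width / 2 + width * i / divisions
                y = -depth / 2 + depth * j / divisions
                if (i + j) % 2 == 0:          # 13 points on the standing plane
                    viewpoints.append((x, y, z_stand))
                else:                          # 12 points on the sitting plane
                    viewpoints.append((x, y, z_sit))
        return viewpoints

    assert len(reference_viewpoints()) == 25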
However, the distribution of the reference viewpoints is not limited to this; they may be distributed on a plurality of planes, including vertical planes, or on a curved surface such as a sphere. The distribution also need not be uniform; the reference viewpoints may be distributed with a higher density in ranges where the user is likely to be. Further, as described above, the reference viewpoints may be arranged so as to correspond to the objects to be displayed, and may be moved according to the movement of the objects. In this case, the reference image becomes moving image data that reflects the movement of each reference viewpoint.
Alternatively, by setting reference viewpoints so as to surround each object and preparing reference images that represent only that object, an image may first be generated for each object at the time of display, and the display image may then be generated by compositing them. In this way, the positional relationship between an object and its reference viewpoints can be controlled independently. As a result, for example, an important object or an object that is likely to be viewed from close up can be expressed in more detail, or the level of detail of all objects can be kept uniform even when each object moves differently. For stationary objects such as the background, an increase in data size can be suppressed by representing the reference image as a still image from fixed reference viewpoints.
FIG. 8 is a diagram for explaining the method by which the pixel value determination unit 266 of the display image generation apparatus 200 selects the reference images used to determine the pixel values of the display image. The figure shows the space to be displayed, including an object 24, viewed from above. In this space, five reference viewpoints 28a to 28e are set, and reference image data is generated for each of them. The circles centered on the reference viewpoints 28a to 28e schematically show the screen surfaces of the reference images prepared as panoramic images covering the whole celestial sphere.
Assuming that the user's viewpoint at the time of image display is at the position of a virtual camera 30, the projection unit 264 determines the view screen so as to correspond to the virtual camera 30 and projects the model shape of the object 24. As a result, the correspondence between the pixels of the display image and positions on the surface of the object 24 is determined. Then, for example, when determining the value of a pixel representing the image of a point 26 on the surface of the object 24, the pixel value determination unit 266 first identifies the reference images in which the point 26 appears as an image.
Since the position coordinates of the reference viewpoints 28a to 28e and of the point 26 in the world coordinate system are known, the distances between them are easily obtained. In the figure, these distances are indicated by the lengths of the line segments connecting the reference viewpoints 28a to 28e with the point 26. By projecting the point 26 onto the screen surface of each reference viewpoint, the position of the pixel at which the image of the point 26 should appear in each reference image can also be identified. On the other hand, depending on the position of a reference viewpoint, the point 26 may be on the back side of the object or hidden behind an object in front of it, so that its image does not actually appear at that position in the reference image.
Therefore, the pixel value determination unit 266 checks the depth image corresponding to each reference image. A pixel value of a depth image represents the distance from the screen surface of the object that appears as an image at that pixel in the corresponding reference image. Thus, by comparing the distance from the reference viewpoint to the point 26 with the depth value of the pixel at which the image of the point 26 should appear in the depth image, it is determined whether or not that image is actually the image of the point 26.
For example, on the line of sight from the reference viewpoint 28c to the point 26 there is a point 32 on the back side of the object 24, so the pixel at which the image of the point 26 should appear in the corresponding reference image actually represents the image of the point 32. Accordingly, the value indicated by the corresponding pixel of the depth image is the distance to the point 32, and the distance Dc, converted to a value measured from the reference viewpoint 28c, is clearly smaller than the distance dc to the point 26 calculated from the coordinate values. Therefore, when the difference between the distance Dc obtained from the depth image and the distance dc to the point 26 obtained from the coordinate values is equal to or greater than a threshold value, that reference image is excluded from the calculation of the pixel value representing the point 26.
Similarly, the distances Dd and De to the objects at the corresponding pixels, obtained from the depth images of the reference viewpoints 28d and 28e, differ from the distances from the reference viewpoints 28d and 28e to the point 26 by the threshold value or more, so these reference images are also excluded from the calculation. On the other hand, it can be determined by the threshold comparison that the distances Da and Db to the objects at the corresponding pixels, obtained from the depth images of the reference viewpoints 28a and 28b, are substantially the same as the distances from the reference viewpoints 28a and 28b to the point 26. By performing screening using the depth values in this way, the pixel value determination unit 266 selects, for each pixel of the display image, the reference images to be used for calculating its pixel value.
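A minimal Python sketch of this depth-based screening is shown below; the per-view project and depth accessors, the view objects themselves, and the threshold value are assumptions introduced only for illustration.

    import numpy as np

    def select_reference_images(point, reference_views, threshold=0.05):
        """Screen reference images using their depth images.

        reference_views is assumed to be a list of objects with a `position`
        attribute, a project(point) method returning the (u, v) pixel at which
        `point` should appear (or (None, None) when it is outside the screen),
        and a depth(u, v) method returning the depth value at that pixel
        measured from the reference viewpoint. A view is kept only when the
        stored depth agrees with the geometric distance to the point, i.e.
        when the point is actually visible from that viewpoint.
        """
        selected = []
        for view in reference_views:
            d = np.linalg.norm(np.asarray(point) - np.asarray(view.position))
            u, v = view.project(point)
            if u is None:
                continue
            if abs(view.depth(u, v) - d) < threshold:
                selected.append(view)
        return selected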
Although FIG. 8 illustrates five reference viewpoints, in practice the comparison using the depth values is performed for all of the reference viewpoints distributed as shown in FIG. 7. This makes it possible to draw a highly accurate display image. On the other hand, referring to about 25 depth images and reference images for every pixel of the display image may produce a load that cannot be overlooked, depending on the processing performance of the apparatus. Therefore, prior to selecting the reference images used for determining a pixel value as described above, the reference images to be treated as selection candidates may be narrowed down according to a predetermined criterion. For example, the reference viewpoints existing within a predetermined range from the virtual camera 30 are extracted, and the selection processing using the depth values is performed only on the reference images from those viewpoints.
At this time, an upper limit on the number of reference viewpoints to be extracted may be set, for example to 10 or 20, and the extraction range may be adjusted so as to fall within that upper limit, or the viewpoints may be selected randomly or according to a predetermined rule. The number of reference viewpoints to be extracted may also be varied depending on the region of the display image. For example, when virtual reality is realized using a head mounted display, the central region of the display image coincides with the direction in which the user's line of sight is directed, so drawing with higher accuracy than in the peripheral region is desirable.
Therefore, for pixels within a predetermined range from the center of the display image, a relatively large number of reference viewpoints (reference images) are taken as selection candidates, while for pixels outside that range the number of selection candidates is reduced. As an example, about 20 reference images may be used as selection candidates for the central region and about 10 for the peripheral region. However, the number of regions is not limited to two, and three or more regions may be used. The division is also not limited to one based on the distance from the center of the display image; it is also conceivable to divide the image dynamically, for example according to the region of the image of an object receiving attention. In this way, by controlling the number of reference images to be referred to based on factors other than whether or not the image of the object is captured, the display image can be drawn under optimal conditions that take into account the processing performance of the apparatus, the accuracy required of the display, the content of the display, and so on.
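As a simple illustration of such region-dependent narrowing, the following sketch returns a candidate budget per pixel from its distance to the image center; the radius ratio is an assumed value, and only the counts of about 20 and 10 are taken from the example above.

    def candidate_count(px, py, width, height, central_ratio=0.5,
                        central_candidates=20, peripheral_candidates=10):
        """Number of nearby reference viewpoints to keep as selection candidates
        for pixel (px, py), using a larger budget near the image center, where
        the user's line of sight is assumed to be directed."""
        cx, cy = width / 2, height / 2
        r = ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5
        r_max = min(width, height) / 2
        return central_candidates if r < central_ratio * r_max else peripheral_candidates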
FIG. 9 is a diagram for explaining the method by which the pixel value determination unit 266 determines the pixel values of the display image. As shown in FIG. 8, suppose it has been found that the image of the point 26 on the object 24 is represented in the reference images of the reference viewpoints 28a and 28b. The pixel value determination unit 266 basically determines the pixel value of the image of the point 26 in the display image corresponding to the actual viewpoint by blending the pixel values of the image of the point 26 in those reference images.
Here, letting c1 and c2 be the pixel values (color values) of the image of the point 26 in the reference images of the reference viewpoints 28a and 28b, respectively, the pixel value C in the display image is calculated as follows.

  C = w1·c1 + w2·c2

Here, the coefficients w1 and w2 are weights satisfying the relationship w1 + w2 = 1, that is, they represent the contribution ratios of the reference images, and they are determined based on the positional relationship between the reference viewpoints 28a and 28b and the virtual camera 30 representing the actual viewpoint. For example, the shorter the distance from the virtual camera 30 to a reference viewpoint, the larger the coefficient and hence the larger the contribution ratio.

In this case, letting Δa and Δb be the distances from the virtual camera 30 to the reference viewpoints 28a and 28b, and setting sum = 1/Δa² + 1/Δb², the weighting coefficients can, for example, be given by the following functions.

  w1 = (1/Δa²) / sum
  w2 = (1/Δb²) / sum

Generalizing the above expressions, with N the number of reference images used, i (1 ≤ i ≤ N) the identification number of a reference viewpoint, Δi the distance from the virtual camera 30 to the i-th reference viewpoint, ci the corresponding pixel value in each reference image, and wi the weighting coefficient, they become

  C = Σi wi·ci,  where wi = (1/Δi²) / Σj (1/Δj²)
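A minimal Python sketch of this distance-based blending, including the case where the virtual camera coincides with one of the reference viewpoints (described in the next paragraph), might look as follows; the function name and data layout are illustrative, and each color is assumed to be an RGB triple.

    import numpy as np

    def blend_by_distance(camera_pos, ref_positions, ref_colors, eps=1e-6):
        """Weighted average of reference-image colors with weights
        w_i = (1 / d_i^2) / sum_j (1 / d_j^2), where d_i is the distance from
        the virtual camera to reference viewpoint i. If the camera coincides
        with a reference viewpoint, that reference color is returned as is."""
        d = np.array([np.linalg.norm(np.asarray(camera_pos) - np.asarray(p))
                      for p in ref_positions])
        colors = np.asarray(ref_colors, dtype=np.float64)
        if np.any(d < eps):                     # camera on a reference viewpoint
            return colors[int(np.argmin(d))]
        inv_sq = 1.0 / d ** 2
        w = inv_sq / inv_sq.sum()
        return (w[:, None] * colors).sum(axis=0)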
In the above expressions, when Δi is 0, that is, when the virtual camera 30 coincides with one of the reference viewpoints, the weighting coefficient for the pixel value of the corresponding reference image is set to 1 and the weighting coefficients for the pixel values of the other reference images are set to 0. In this way, a reference image created accurately for that viewpoint can be reflected in the display image as it is. However, the calculation formula is not limited to this.
The parameters used for calculating the weighting coefficients are not limited to the distances from the virtual camera to the reference viewpoints. For example, the coefficients may be based on the angles θa and θb (0 ≤ θa, θb ≤ 90°) formed by the line-of-sight vectors Va and Vb from the respective reference viewpoints to the point 26 with the line-of-sight vector Vr from the virtual camera 30 to the point 26. For example, using the inner products (Va·Vr) and (Vb·Vr) of the vectors Va and Vb with the vector Vr, the weighting coefficients are calculated as follows.

  w1 = (Va·Vr) / ((Va·Vr) + (Vb·Vr))
  w2 = (Vb·Vr) / ((Va·Vr) + (Vb·Vr))

Generalizing these expressions as above, with N the number of reference images used, Vi the line-of-sight vector from the reference viewpoint i to the point 26, and wi the weighting coefficient, they become

  C = Σi wi·ci,  where wi = (Vi·Vr) / Σj (Vj·Vr)
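The inner-product version can be sketched in the same way (a minimal illustration; it assumes the angles do not exceed 90°, so that all inner products are non-negative, and that each color is an RGB triple).

    import numpy as np

    def blend_by_angle(point, camera_pos, ref_positions, ref_colors):
        """Weighted average with w_i = (V_i . V_r) / sum_j (V_j . V_r), where
        V_r is the unit line-of-sight vector from the virtual camera to the
        point and V_i is the unit line-of-sight vector from reference viewpoint
        i to the point."""
        point = np.asarray(point, dtype=np.float64)
        vr = point - np.asarray(camera_pos, dtype=np.float64)
        vr /= np.linalg.norm(vr)
        dots = []
        for p in ref_positions:
            vi = point - np.asarray(p, dtype=np.float64)
            vi /= np.linalg.norm(vi)
            dots.append(float(np.dot(vi, vr)))
        w = np.array(dots) / sum(dots)
        return (w[:, None] * np.asarray(ref_colors, dtype=np.float64)).sum(axis=0)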
In any case, as long as a calculation rule is introduced such that a reference viewpoint whose state with respect to the point 26 is closer to that of the virtual camera 30 is given a larger weighting coefficient, the specific calculation formula is not particularly limited. The "closeness of state" may also be evaluated from multiple aspects, based on both distance and angle, to determine the weighting coefficients. Furthermore, the surface shape of the object 24 at the point 26 may be taken into account. The luminance of light reflected from an object generally has an angular dependence based on the inclination (normal) of the surface. Therefore, the angle formed between the normal vector at the point 26 and the line-of-sight vector Vr from the virtual camera 30 may be compared with the angles formed between that normal vector and the line-of-sight vectors Va and Vb from the respective reference viewpoints, and the weighting coefficient may be increased as this difference becomes smaller.
The function itself used to calculate the weighting coefficients may also be switched depending on attributes of the object 24 such as its material and color. For example, in the case of a material in which the specular reflection component is dominant, the reflection has strong directivity, and the observed color changes greatly depending on the angle of the line-of-sight vector. On the other hand, in the case of a material in which the diffuse reflection component is dominant, the change in color with the angle of the line-of-sight vector is not so large. Therefore, in the former case a function may be used that gives a larger weighting coefficient to a reference viewpoint whose line-of-sight vector is closer to the line-of-sight vector Vr from the virtual camera 30 to the point 26, while in the latter case the weighting coefficients may be made equal for all reference viewpoints, or a function may be used whose angular dependence is smaller than in the case where the specular reflection component is dominant.
For the same reason, in the case of a material in which the diffuse reflection component is dominant, the reference images used to determine the pixel value C of the display image may be thinned out, or only reference images having line-of-sight vectors whose angles are within a predetermined value of the actual line-of-sight vector Vr may be used, so as to reduce the number of reference images itself and suppress the computational load. When the rule for determining the pixel value C is varied in this way depending on the attributes of objects, the reference image data storage unit 256 stores, in association with each image within a reference image, data representing attributes such as the material of the object it represents.
By the aspects described above, the surface shape and material of objects can be taken into account, and the directivity of light due to specular reflection and the like can be reflected more accurately in the display image. Note that the weighting coefficients may be determined by combining any two or more of a calculation based on the shape of the object, a calculation based on its attributes, a calculation based on the distance from the virtual camera to the reference viewpoint, and a calculation based on the angles formed by the line-of-sight vectors.
Next, the operation of the image generation apparatus that can be realized by the configuration described so far will be explained. FIG. 10 is a flowchart showing the processing procedure by which the display image generation apparatus 200 generates a display image corresponding to the viewpoint. This flowchart starts when an application or the like is started by a user operation, an initial image is displayed, and the apparatus enters a state of accepting viewpoint movement. As described above, various information processing such as an electronic game may be performed in parallel with the illustrated display processing. First, the space construction unit 262 forms, in the world coordinate system, the initial state of the three-dimensional space in which the objects to be displayed exist (S10).
Meanwhile, the viewpoint information acquisition unit 260 specifies the position of the viewpoint and the direction of the line of sight at that time, based on the position and posture of the user's head (S12). Next, the projection unit 264 sets a view screen for the viewpoint and projects the objects existing in the space to be displayed (S14). As described above, this processing need only consider the surface shapes, for example by perspective-transforming the vertices of the polygon meshes that form the three-dimensional models. Next, the pixel value determination unit 266 sets one target pixel among the pixels inside the projected meshes (S16) and selects the reference images to be used for determining its pixel value (S18).
That is, as described above, the reference images in which the point on the object represented by the target pixel appears as an image are determined based on the depth image of each reference image. The pixel value determination unit 266 then determines the weighting coefficients based on the positional relationships between the reference viewpoints of those reference images and the virtual camera corresponding to the actual viewpoint, the shape and material of the object, and so on, and determines the value of the target pixel, for example by taking a weighted average of the corresponding pixel values of the reference images (S20). It will be understood by those skilled in the art that, besides a weighted average, various calculations such as statistical processing and interpolation processing are conceivable for deriving the value of the target pixel from the pixel values of the reference images.
The processing of S18 and S20 is repeated for all pixels on the view screen (N of S22, S16). When the pixel values of all pixels have been determined (Y of S22), the output unit 268 outputs the data to the head mounted display 100 as display image data (S24). When display images for the left eye and the right eye are generated, the processing of S16 to S22 is performed for each of them, and they are joined together as appropriate and output. If the display need not be ended (N of S26), the space construction unit 262 forms the space to be displayed for the next time step (S10). That is, the objects are moved or deformed from the initial state by one time step. Then, after acquiring information on the user's viewpoint at that time, the view screen is set, and the display image is generated and output (S12 to S24). The processing from S10 to S24 is repeated until the display processing ends, and when it becomes necessary to end the display, all processing ends (Y of S26).
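The overall loop of FIG. 10 can be summarized by the following sketch, which reuses the screening and blending sketches given earlier; every object and method name here is an illustrative placeholder rather than the actual implementation.

    def run_display_loop(space, viewpoint_source, reference_views, display):
        """High-level sketch of the per-frame procedure of FIG. 10 (S10 to S26)."""
        space.build_initial_state()                        # S10 (initial state)
        while not display.should_end():                    # S26
            eye, gaze = viewpoint_source.current()         # S12
            screen = space.project(eye, gaze)              # S14: set view screen, project objects
            image = screen.new_image()
            for pixel in screen.covered_pixels():          # S16: pixels inside projected meshes
                point = screen.object_point(pixel)         # point on the object surface
                refs = select_reference_images(point, reference_views)              # S18
                image[pixel] = blend_by_distance(eye,
                                                 [r.position for r in refs],
                                                 [r.color_at(point) for r in refs])  # S20
            display.output(image)                          # S24
            space.step()                                   # S10: next time step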
In the example of FIG. 10, the pixel values of all pixels on the view screen are determined using the reference images, but the drawing method may be switched depending on the region of the display image or the position of the viewpoint. For example, for the image of an object that does not require changes in lighting or color tone due to viewpoint movement, only conventional texture mapping may be performed. In addition, states observed only from local viewpoints, such as highly directional reflected light, may not be fully expressible from the surrounding reference images. Therefore, the amount of data prepared as reference images can be suppressed by switching to drawing by ray tracing only when the viewpoint enters the corresponding range.
FIG. 11 shows an example of the structure of the data stored in the reference image data storage unit 256. The reference image data 270 has a data structure in which, for each piece of reference image identification information 272, the position coordinates 274 of the reference viewpoint, a reference image 276, and a depth image 278 are associated. As described with reference to FIG. 7, the position coordinates 274 of the reference viewpoint are three-dimensional position coordinates in the virtual space, set by the reference viewpoint setting unit 310 in consideration of the movable range of the user 12 and the like.
The reference image 276 is moving image data representing the space including the moving objects as seen from each reference viewpoint. The depth image 278 is likewise moving image data representing the distances from the screen surface of the space including the moving objects. In the figure, the reference images are represented by character strings such as "video A", "video B", and "video C", and the depth images by "depth video A", "depth video B", and "depth video C", but in practice they may include information such as the storage areas within the reference image data storage unit 256.
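One possible in-memory representation of such an entry is sketched below; the field names and the example values are illustrative assumptions rather than the actual storage format.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class ReferenceImageData:
        """One entry of the reference image data 270 (FIG. 11): an identifier,
        the reference viewpoint position, and the associated reference video and
        depth video. Here the videos are simply storage-area identifiers; when
        the reference viewpoint moves, the position would instead be a list of
        per-time-step coordinates."""
        identifier: int
        viewpoint_position: Tuple[float, float, float]
        reference_video: str          # e.g. "video A"
        depth_video: str              # e.g. "depth video A"

    reference_image_data: List[ReferenceImageData] = [
        ReferenceImageData(0, (0.0, 0.0, 1.4), "video A", "depth video A"),
        ReferenceImageData(1, (0.5, 0.0, 1.4), "video B", "depth video B"),
    ]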
FIG. 12 shows an example of setting reference viewpoints for representing moving objects. The figure is drawn in the same manner as FIG. 8. In the virtual space shown in (a) and (b) of the figure, an object 34 and an object 35 exist. For these, the reference viewpoint setting unit 310 of the reference image generation device 300 sets five reference viewpoints 30a, 30b, 30c, 30d, and 30e. Suppose now that one of the objects, the object 35, moves as indicated by the arrow. Part (a) shows a mode in which the reference viewpoints are not moved.
In this case, the change in each reference image is mainly limited to the region of the image of the object 35. That is, in each frame of the moving images of the reference images and of the depth images, no change occurs over a wide region, so the data size can be reduced, for example, by applying a compression method that uses inter-frame differences. The mode shown in (b), on the other hand, moves at least some of the reference viewpoints 30a to 30e so as to follow the movement of the object 35, making them reference viewpoints 36a to 36e. In the illustrated example, the four reference viewpoints 30a to 30d are moved to the reference viewpoints 36a to 36d with the same velocity vector as that of the object 35. However, the movement rule is not limited to this; the reference viewpoints may be moved so that the distance to the object does not exceed a predetermined threshold and the distance between reference viewpoints does not fall below a predetermined threshold.
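A minimal sketch of such a movement rule is shown below; the threshold values and the fallback of keeping a viewpoint in place when a move would violate a constraint are assumptions, and other rules are equally possible.

    import numpy as np

    def update_reference_viewpoints(viewpoints, object_pos, object_velocity, dt,
                                    max_object_dist=3.0, min_pair_dist=0.5):
        """Move reference viewpoints with the object's velocity vector, while
        checking the constraints mentioned above: each viewpoint should stay
        within max_object_dist of the object, and no two viewpoints should come
        closer than min_pair_dist. A viewpoint that would violate a constraint
        simply keeps its previous position in this sketch."""
        object_pos = np.asarray(object_pos, dtype=np.float64)
        updated = []
        for i, p in enumerate(viewpoints):
            candidate = np.asarray(p, dtype=np.float64) + np.asarray(object_velocity) * dt
            ok = np.linalg.norm(candidate - object_pos) <= max_object_dist
            for j, q in enumerate(viewpoints):
                if j != i and np.linalg.norm(candidate - np.asarray(q)) < min_pair_dist:
                    ok = False
            updated.append(candidate if ok else np.asarray(p, dtype=np.float64))
        return updated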
In this case, since the background and other elements apart from the moving object 35 also change relatively, the region that changes between frames becomes wider and the data compression efficiency decreases. On the other hand, since the distance between the object and the reference viewpoints can be kept approximately constant, the level of detail of the image of the object in the display image is less likely to change. Taking these points into account, the rule for setting the reference viewpoints is selected as appropriate, in consideration of the level of detail the display image requires for representing the objects, the range of movement of the objects, a suitable data size, and so on.
However, it is not necessary to move all the reference viewpoints according to the same rule. For example, as illustrated, when a plurality of objects 34 and 35 exist in the space to be displayed and only one of them moves, the reference viewpoint 30e (= 36e) in the vicinity of the stationary object 34 may be kept fixed. When the movement direction and speed differ among a plurality of objects, the movement directions and speeds of the reference viewpoints may be set individually to correspond to them.
For example, for each object, the reference viewpoints in charge of that object are distributed within a predetermined range around it, and the positions of those reference viewpoints are controlled so that the positional relationship with the object is maintained. Here, "in charge of" refers only to tracking the position, and the reference image may still represent all objects visible from that reference viewpoint. Alternatively, as described above, only the image of the object in charge may be represented in the reference image, and the images may be composited when determining the pixel values of the display image.
For example, after provisionally determining the pixel values of the display image using reference images representing only the background, the display image is overwritten using reference images representing only the foreground objects. There may also be reference viewpoints in charge of a plurality of objects simultaneously; for example, a certain reference viewpoint may be moved with the average of the movement velocity vectors of a plurality of objects. In the mode of (b), among the reference image data shown in FIG. 11, the position coordinates of the reference viewpoints become data that changes along the time axis.
Therefore, the reference image generation device 300 stores the reference image data and the position coordinates of the reference viewpoints in the reference image data storage unit 256 in association with each time step. The pixel value determination unit 266 of the display image generation apparatus 200 calculates the above-described weighting coefficients based on the positional relationship between the reference viewpoints and the user's viewpoint at the same time step, and then determines the pixel values of the display image for that time step.
The example of FIG. 12 assumes that the display image is generated using all of the prepared reference images, but the reference images may instead be generated with the reference viewpoints fixed, and the reference images used for generating the display image may be switched according to the movement of the objects. FIG. 13 is a diagram for explaining a mode in which the reference images used for generating the display image are switched according to the movement of an object. The figure is drawn in the same manner as FIG. 12. That is, objects 34 and 35 exist in the virtual space, and the latter moves as indicated by the arrow.
The reference image generation device 300 sets fixed reference viewpoints 38a to 38f so as to cover the movement range of the objects, and generates a reference image for each of them. Meanwhile, the display image generation apparatus 200 switches the reference images used for display according to the movement of the objects. For example, at the initial position of the object 35, the reference images indicated by solid lines (the reference images of the reference viewpoints 38a, 38b, 38c, and 38f) are used for generating the display image. At the position after the movement, the reference images indicated by broken lines (the reference images of the reference viewpoints 38d and 38e) are added as reference targets for display image generation, and at the same time the reference images indicated by thick solid lines (the reference images of the reference viewpoints 38b and 38f) are excluded from the reference targets.
At this time, for example, the reference images corresponding to reference viewpoints whose distances from the objects 34 and 35 are smaller than a threshold value are used for generating the display image. Even in this way, the objects can be expressed with a stable level of detail, substantially as in the case where the reference viewpoints are moved. In addition, since the viewpoint of each reference image video itself does not move, the region that changes between frames is limited and the compression efficiency is high. However, since a relatively large number of reference viewpoints must be provided, the number of reference image videos tends to increase.
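The switching itself can be sketched as follows (the threshold value and data layout are assumptions); calling this every time step adds the viewpoints near an approaching object and drops those the object has moved away from.

    import numpy as np

    def active_reference_views(fixed_viewpoints, object_positions, threshold=2.0):
        """Select, from fixed reference viewpoints, those to be used for the
        current time step: a viewpoint is active when its distance to at least
        one object is below `threshold` (an assumed value)."""
        active = []
        for i, vp in enumerate(fixed_viewpoints):
            for obj in object_positions:
                if np.linalg.norm(np.asarray(vp) - np.asarray(obj)) < threshold:
                    active.append(i)
                    break
        return active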
As described so far, the reference images are basically moving image data. Therefore, the data can be stored in the reference image data storage unit 256 and transmitted using a general compression encoding method for moving image data such as MPEG (Moving Picture Experts Group). Alternatively, when an omnidirectional image is represented in equirectangular form, it may be converted into coefficients of spherical harmonics and compressed. Furthermore, each frame may be compressed using a general compression encoding method for still image data such as JPEG (Joint Photographic Experts Group).
On the other hand, the present embodiment has the characteristics that the reference image and depth image videos form pairs, and that videos from a plurality of reference viewpoints that must be kept synchronized are to be stored, so the effect can be enhanced by introducing a compression method specific to these characteristics. FIG. 14 shows the configuration of the functional blocks of the reference image data generation unit of the reference image generation device 300 and the pixel value determination unit of the display image generation apparatus 200 when a compression/decompression processing function for the reference image data is introduced.
In this aspect, the reference image data generation unit 318a includes a reference image generation unit 330, a depth image generation unit 332, and a data compression unit 334. The reference image generation unit 330 and the depth image generation unit 332 generate the reference image and depth image data as described above. That is, they generate the moving images of the reference images representing the appearance of the space from each reference viewpoint set by the reference viewpoint setting unit 310, and the moving images of the depth images representing the distance values. Here, the reference viewpoints may be fixed, or some of them may be moved according to the movement of the objects.
 The data compression unit 334 compresses the reference images and depth images, generated in this way at a predetermined rate along the time axis, according to a predetermined rule. Specifically, at least one of the following processes is performed.
(1) The reference image and the depth image at the same time step are reduced as necessary and represented together as a single frame, forming an integrated video.
(2) In the reference images and depth images, only the regions with change are represented as time-series data.
 The data compression unit 334 stores the data compressed in this way in the reference image data storage unit 256. At this time, one frame of the integrated image or of the changed-region images may be further compressed by JPEG, or the integrated image video may be compressed by MPEG. The pixel value determination unit 266a, in turn, includes a data decompression unit 336, a reference unit 338, and a calculation unit 340. The data decompression unit 336 reads the reference image data for each time step from the reference image data storage unit 256 and decompresses it, thereby restoring the reference images and depth images.
 That is, when the compression of (1) above has been performed, the data decompression unit cuts out the reference image and depth image from each frame of the integrated video and enlarges them as necessary. When the compression of (2) above has been performed, only the changed regions of the preceding frame image are updated using the time-series data. When the compressions of (1) and (2) are performed together, both operations are likewise performed during decompression.
 Using the depth images for each time step restored in this way, the reference unit 338 selects, for each pixel of the display image, the reference images in which the point on the object to be drawn appears, as described above, and acquires the pixel values of those reference images. As also described above, the calculation unit 340 determines the pixel value of the display image by averaging the pixel values acquired from the referenced reference images with appropriate weights.
 FIG. 15 schematically shows an example of the integrated video generated by the data compression unit 334. The integrated video 42 has a data structure in which the four regions obtained by dividing one frame 40 each hold, for the same time step, a frame of the "first reference image" and "second reference image" generated for two reference viewpoints and of the corresponding "first depth image" and "second depth image". The data compression unit 334 reduces the frames of the reference images and depth images as appropriate according to the size of the image plane set for the integrated video 42, and joins them in a predetermined arrangement as illustrated.
 For example, when the integrated video 42 has the same size as the frames of the original reference images and depth images, the data compression unit 334 reduces each reference image and depth image frame to 1/2 in both the vertical and horizontal directions. The data compression unit 334 further associates the position coordinates of the two reference viewpoints integrated into the video with the video as additional data. These processes correspond to converting two rows of the reference image data shown in FIG. 11 into one video.
 This makes it possible to reduce the overall size of the reference image data, thereby saving transmission bandwidth and storage capacity. In addition, since the four kinds of video can be decoded and decompressed at once, parallel processing for restoration is easy even when many reference viewpoints are set. Furthermore, since the four kinds of data are inherently synchronized with one another, synchronization processing can be simplified even when the data of all reference viewpoints are taken into account. Note that the number of reference viewpoints integrated into one integrated video 42 is not limited to two, and may be larger depending on the reduction ratio allowed for each image.
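 As a rough illustration only, the following Python sketch packs two reference frames and their depth frames into a single integrated frame as in FIG. 15 and unpacks them again. It assumes numpy and OpenCV are available and that all four frames are same-sized 3-channel arrays (the depth frames already encoded into 3 channels); it is not the device's actual implementation.

```python
import numpy as np
import cv2  # assumed available; any resampling routine would serve

def pack_integrated_frame(ref1, ref2, depth1, depth2):
    # Halve each frame in both directions and tile them into one frame of the
    # original size: reference images on top, depth images below (cf. FIG. 15).
    h, w = ref1.shape[:2]
    half = (w // 2, h // 2)
    r1, r2 = cv2.resize(ref1, half), cv2.resize(ref2, half)
    d1, d2 = cv2.resize(depth1, half), cv2.resize(depth2, half)
    return np.vstack([np.hstack([r1, r2]), np.hstack([d1, d2])])

def unpack_integrated_frame(frame):
    # Cut out the four quadrants and enlarge them back to the original size.
    h, w = frame.shape[:2]
    qh, qw = h // 2, w // 2
    quads = [frame[:qh, :qw], frame[:qh, qw:], frame[qh:, :qw], frame[qh:, qw:]]
    return [cv2.resize(q, (w, h)) for q in quads]  # ref1, ref2, depth1, depth2
```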
 FIG. 16 schematically shows another example of the integrated video generated by the data compression unit 334. The integrated video 46 has a data structure in which the four regions obtained by dividing one frame 44 hold, for the same time step, frames of the "first reference image", "second reference image", and "third reference image" generated for three reference viewpoints and of the corresponding "first depth image", "second depth image", and "third depth image".
 In the integrated video 42 shown in FIG. 15, the "first depth image" and "second depth image" are placed in separate regions of the image plane, so the channels and gradations used are not restricted. The integrated video 46 shown in FIG. 16, in contrast, represents the "first depth image", "second depth image", and "third depth image" in the same region of the image plane using the three channels of red (R), green (G), and blue (B). This leaves the remaining three regions free to hold the three reference images.
 With such a data structure, the data of three reference viewpoints can be included in one video while the image reduction ratio remains the same as in FIG. 15. As a result, synchronization and decoding/decompression can be made more efficient while image quality is maintained. However, if the RGB image is converted into a YCbCr image for compression encoding, the pixel values of the other depth images may interfere when the display image generation device 200 decodes and decompresses the data, so the values may not be restored exactly. It is therefore desirable to employ a compression encoding scheme that can restore RGB values accurately.
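 A minimal sketch of this channel packing, assuming the three depth frames have already been quantized to single-channel 8-bit arrays of equal size (the quantization step itself is not shown):

```python
import numpy as np

def pack_three_depths(d1, d2, d3):
    # Carry three single-channel depth frames in the R, G and B channels of one
    # region of the integrated frame (cf. FIG. 16).
    return np.stack([d1, d2, d3], axis=-1)

def unpack_three_depths(packed):
    return packed[..., 0], packed[..., 1], packed[..., 2]
```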
 FIG. 17 is a diagram for explaining, as one of the compression processes performed by the data compression unit 334, a technique of turning only the images of changed regions into time-series data. The example assumes a video showing a car traveling on a road; (a) shows six of its reference image frames in sequence, with the horizontal axis representing time. Here, each frame of the reference image represents the omnidirectional view from the reference viewpoint in equirectangular form. In this case, the road and background other than the car, which is the object, show almost no motion.
 (b) shows a fixed-size region containing the car (for example, region 50) extracted from each frame shown in (a). As described above, the changes in the reference image video are almost entirely confined to the extracted regions. The data compression unit 334 therefore stores the whole area of a frame at a certain point in time, for example frame 52, as a reference frame, and for frames at later time steps it stores the time-series data of the images of a predetermined-size region containing the object (for example, image 54), associated with the position information of that region on the reference image plane, as the compressed reference image data.
 The data decompression unit 336 uses the reference frame as the reference image for the time step at which it is given, and for later time steps it restores the reference image by updating only the regions stored as time-series data. As illustrated, the image 54 of the fixed-size region containing the object may have a higher resolution than the image of the corresponding region 50 in the reference frame. In this way, even if the size of the reference frame is reduced to lower the data size, the level of detail can be maintained for the object region that the user is expected to gaze at. The reference frame may be the first frame of each video, or frames may be taken at predetermined time intervals.
 (c) goes further and extracts only the region of the object's image, for example a rectangular region whose four sides lie at a predetermined distance from the contour of the object. In this case, the size of the extracted region varies with the positional relationship between the reference viewpoint and the object. The data compression unit 334 extracts the image of the object from each frame of the reference image shown in (a) and determines the region to cut out. It then stores the whole area of a frame at a certain point in time, for example frame 52, as a reference frame, and for frames at later time steps it stores the time-series data of the images of the object's image region (for example, image 56), associated with the position information and size information of that region on the reference image plane, as the compressed reference image data.
 Alternatively, at the stage where the reference image generation unit 330 generates the reference images, an image representing only the object may be generated as the image 56. In this case, the screen surface may be adjusted so as to zoom in on the object while the reference viewpoint remains fixed. The operation of the data decompression unit 336 is the same as in case (b). The modes (a) to (c) can be applied not only to the reference images but also to the depth images in the same way. The compression techniques applied to the reference images and to the depth images may be the same or different. With the compression technique of (c), the object information can be held at the same level of detail regardless of the distance between the reference viewpoint and the object.
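 The following Python sketch illustrates, under stated assumptions, the scheme of FIG. 17(b): the first frame is kept whole as the reference frame, and each later frame contributes only a fixed-size region containing the object together with its position. The helper locate_object is hypothetical and stands in for whatever mechanism provides the region's top-left corner.

```python
import numpy as np

def compress_changed_region(frames, region_size, locate_object):
    # Keep frame 0 whole; store only (position, patch) pairs for later frames.
    ref_frame = frames[0].copy()
    h, w = region_size
    series = []
    for frame in frames[1:]:
        y, x = locate_object(frame)  # hypothetical helper: top-left of the region
        series.append({'pos': (y, x), 'patch': frame[y:y + h, x:x + w].copy()})
    return ref_frame, series

def decompress_changed_region(ref_frame, series):
    # Restore the video by updating only the stored region at each time step.
    restored = [ref_frame.copy()]
    current = ref_frame.copy()
    for entry in series:
        y, x = entry['pos']
        ph, pw = entry['patch'].shape[:2]
        current[y:y + ph, x:x + pw] = entry['patch']
        restored.append(current.copy())
    return restored
```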
 FIG. 18 is a diagram for explaining, as one of the compression processes performed by the data compression unit 334, a technique of using information representing only the changed pixels as time-series data. The horizontal axis of the figure represents time. First, the image 60 is one frame of the reference image, or a part of one. The image 62a corresponds to the frame following the image 60, with the pixels whose values differ from those of the image 60 by a predetermined value or more shown in gray. The image 62b corresponds to the frame after that and likewise shows in gray the pixels whose values differ from those of the preceding frame by a predetermined value or more.
 The data compression unit 334 takes the difference between frames of the reference image and extracts the pixels whose values differ by a predetermined value or more. In the illustrated example, this extracts the pixels representing the front of the car, including the hood and bumper, and the road surface ahead of the car. Next, the data compression unit 334 generates images 64a and 64b that hold the data (x, y, R, G, B), consisting of the position coordinates of the extracted pixels and their new pixel values, packed in raster order as five-channel pixel values. Here, (x, y) are the position coordinates of the pixel on the reference image plane, and (R, G, B) are the pixel values of the reference image, that is, the color values.
 In the case of a depth image, letting d be the pixel value of the depth image, that is, the distance value, an image is generated that holds the data (x, y, d), consisting of the position coordinates of the extracted pixels and their new pixel values, packed in raster order as three-channel pixel values. The whole area of the image 60 is then stored as a reference frame, and for frames at later time steps the images 64a and 64b, representing only the information of the changed pixels, are stored as time-series data, yielding the compressed reference image video data.
 The data decompression unit 336 uses the reference frame as the reference image for the time step at which it is given, and for later time steps it restores the reference image by updating only the pixels stored as time-series data. The same applies to the depth images. Compared with the mode shown in FIG. 17, this takes the shape of the object into account and reduces the data size further. The reference frame may be the first frame of each video, or frames may be taken at predetermined time intervals. The mode of FIG. 17 and the mode of FIG. 18 may also be combined as appropriate.
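 A minimal sketch of this changed-pixel packing, assuming 3-channel uint8 frames and an illustrative threshold; the packed array carries (x, y, R, G, B) per changed pixel in raster order, and the inverse step updates only those pixels.

```python
import numpy as np

def diff_pixels(prev, curr, threshold=8):
    # Extract pixels whose value changed by at least `threshold` in any channel.
    changed = np.any(np.abs(curr.astype(np.int16) - prev.astype(np.int16)) >= threshold, axis=2)
    ys, xs = np.nonzero(changed)                     # nonzero() scans in raster order
    return np.column_stack([xs, ys, curr[ys, xs]])   # (N, 5): x, y, R, G, B

def apply_diff(prev, packed):
    # Reconstruct the next frame by overwriting only the stored pixels.
    out = prev.copy()
    xs, ys = packed[:, 0], packed[:, 1]
    out[ys, xs] = packed[:, 2:]
    return out
```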
 FIG. 19 shows two successive frames of a reference image video. As described so far, when the number of main objects that move or deform in the space to be displayed is limited, differences between frames arise only in a small part of the image. Even in the illustrated video of the traveling car, between the upper frame and the lower frame only slight movements of the car images 70a and 70b and slight changes in the reflections 72a and 72b on the road occur.
 In this example, moreover, the regions 74a and 74b above the road in the image plane are distant views. A distant view differs in nature from the surfaces of the objects placed in the display target space assumed in the present embodiment, and in many cases need not change with movement of the user's viewpoint. An image from a predetermined reference viewpoint may therefore be rendered into the display image separately, by texture mapping or the like. In other words, there is little need to hold the image data of such regions for each reference viewpoint. Taking advantage of these properties, the reference images and depth images may be divided into tile images of a predetermined size, and the compression processing may be controlled in units of those tile images.
 FIG. 20 is a diagram for explaining a technique in which the data compression unit 334 controls the compression processing of the reference image in units of tile images. The illustrated image corresponds to the frame shown in FIG. 19, and the matrix of rectangles formed by the grid represents the tile images. The size of the tile images is set in advance. Among these tile images, those enclosed by white frames and included in the distant-view region 80 need not reflect movement of the user's viewpoint, as described above, and are therefore excluded from the reference image data held for each reference viewpoint.
 The remaining tile images, enclosed by black lines, belong to the near view, that is, to the region 82 used for drawing the object, and are therefore included as time-series data in the reference image data for each reference viewpoint. Alternatively, only the tile images in which a difference from the previous frame has arisen, such as those enclosed by solid lines (for example, tile image 84), may be extracted, and only their time-series data included in the reference image data. For example, when the average pixel value of the tile image at the same position differs between frames by a predetermined value or more, it is determined that a difference from the previous frame has arisen, and the tile is extracted.
 Alternatively, from a tile image in which a difference from the previous frame has arisen (for example, tile image 84), the pixels that differ from the previous frame by a predetermined value or more may be extracted, and an image may be generated representing the data consisting of the position coordinates and pixel values of those pixels. This processing is as described with reference to FIG. 18. For the depth images too, data can likewise be omitted or the compression state controlled in units of tile images. If the entire depth image is handled as general video data, the distance values must be expressed, for example, with the 256 gradations of SDR (Standard Dynamic Range), so the information below the decimal point is lost. If the original pixel values (distance values) are stored as floating-point data in units of tile images, the resolution of the distance values increases, and the reference images used for drawing can be selected with high accuracy.
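 As a rough illustration of this per-tile control, the Python sketch below skips tiles flagged as distant view and emits a near-view tile only when its mean pixel value differs from the previous frame by at least a threshold; the tile size, threshold, and boolean distant-view grid are all assumptions made for the example.

```python
import numpy as np

def changed_tiles(prev, curr, tile=64, threshold=2.0, distant_mask=None):
    # Return (tile position, tile image) pairs for near-view tiles that changed.
    h, w = curr.shape[:2]
    out = []
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            if distant_mask is not None and distant_mask[ty // tile][tx // tile]:
                continue  # distant view: drawn separately, not stored per viewpoint
            a = prev[ty:ty + tile, tx:tx + tile].astype(np.float32)
            b = curr[ty:ty + tile, tx:tx + tile].astype(np.float32)
            if abs(a.mean() - b.mean()) >= threshold:
                out.append(((tx // tile, ty // tile), curr[ty:ty + tile, tx:tx + tile].copy()))
    return out
```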
 FIG. 21 shows an example of the structure of the compressed data in the mode in which the compression processing of the reference images and depth images is controlled in units of tile images. The post-compression reference image data 350 is generated for each reference viewpoint, and has a data structure in which tile image data are linked in time-series order, associated with the position coordinates of the tile image on the image plane (denoted "tile position"). In the figure, the time series follows the order of frame numbers 0, 1, 2, and so on. For example, when the tile images at position coordinates (0, 0) and (1, 0) are included in the distant-view region, the images of those regions are not used for drawing the object; they are therefore treated as invalid in the reference image data and are prepared separately in a form such as texture data. In the figure, "-" indicates that the data of the tile image is invalid.
 For tile images that are included in the near-view region and may be used for drawing the object, on the other hand, the data of the first frame (frame number "0") is first included in the reference image data. In the figure, these tile images are denoted "image a", "image b", and so on. For subsequent frames, information representing a change is included in the reference image data only when a change has occurred in the tile image. In the illustrated example, the tile images at position coordinates (70, 65) and (71, 65) change at frame number "1", so the images representing those differences, "difference image c1" and "difference image d1", are included.
 Since a difference also arises in the tile image at position coordinates (70, 65) in the next frame, "difference image c2" is included in association with frame number "2". Here, a difference image is an image representing the difference from the previous frame and corresponds, for example, to the images 64a and 64b in FIG. 18. The tile image at position coordinates (30, 50) changes at frame number "24" and the tile image at position coordinates (31, 50) at frame number "25", so the images representing those differences, "difference image a1" and "difference image b1", are included.
 The data decompression unit 336 of the display image generation device 200 restores the reference image and depth image of a frame by joining the tile images associated with frame number "0" on the basis of their position coordinates. For subsequent frames, the whole reference image and depth image video can be restored by updating the pixels represented in a difference image only in the tile regions for which a difference image is included.
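 The following sketch shows one way such a structure might be rebuilt on the decompression side. It assumes, for simplicity, that the per-tile history maps a tile position to (frame number, full tile image) pairs with difference images already applied; the real data would carry difference images and position information as described above.

```python
import numpy as np

def restore_frame(compressed, frame_no, tile, frame_shape):
    # `compressed` maps (tx, ty) -> list of (frame_number, tile_image), sorted by
    # frame number; distant-view tiles are simply absent (cf. FIG. 21).
    frame = np.zeros(frame_shape, dtype=np.uint8)
    for (tx, ty), history in compressed.items():
        latest = None
        for n, img in history:   # take the latest entry at or before frame_no
            if n <= frame_no:
                latest = img
        if latest is not None:
            frame[ty * tile:(ty + 1) * tile, tx * tile:(tx + 1) * tile] = latest
    return frame
```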
 The modes described so far have assumed that the omnidirectional image is represented as the reference image in equirectangular form, but the present embodiment is not limited to this. FIG. 22 is a diagram for explaining an example of data compression processing when the omnidirectional reference images and depth images are represented as cube maps. (a) shows the relationship between the omnidirectional screen surface and the cube map faces. The cube map faces 362 are the faces of a cube enclosing the spherical screen surface 360, which lies at the same distance from the viewpoint 364 in all directions.
 A pixel 366 on the screen surface 360 is mapped to the position 368 where the straight line from the viewpoint 364 through the pixel 366 intersects the cube map face 362. Such cube mapping techniques are known as one means of representing panoramic images. In the present embodiment, the reference images and depth images can be held as cube map data. (b) shows the six faces unfolded when the depth image of a certain reference viewpoint is represented as a cube map.
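 For illustration, the sketch below maps a view direction from the viewpoint to a cube face and (u, v) coordinates on that face; the face naming and orientation follow one common convention and are an assumption of the example, not the notation of the embodiment.

```python
def cube_face_and_uv(dx, dy, dz):
    # Pick the face hit by the direction (dx, dy, dz) and return (face, u, v)
    # with u, v in [0, 1]; the direction is assumed to be non-zero.
    ax, ay, az = abs(dx), abs(dy), abs(dz)
    if ax >= ay and ax >= az:
        face, u, v, m = ('+x', -dz, -dy, ax) if dx > 0 else ('-x', dz, -dy, ax)
    elif ay >= ax and ay >= az:
        face, u, v, m = ('+y', dx, dz, ay) if dy > 0 else ('-y', dx, -dz, ay)
    else:
        face, u, v, m = ('+z', dx, -dy, az) if dz > 0 else ('-z', -dx, -dy, az)
    return face, 0.5 * (u / m + 1.0), 0.5 * (v / m + 1.0)
```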
 As described so far, when the reference image is a video, image data like that illustrated is generated at a predetermined rate. When a space such as those illustrated in FIGS. 17 to 20 is represented, however, differences from the previous frame arise only in the region of the car image indicated by the arrow in (b). Since a cube map divides the image plane into six sections from the outset, it is easy to exploit this and include only the faces with motion (face 370 in the illustrated example) as time-series data in the reference image data.
 For example, in the data structure shown in FIG. 21, if the tile images are replaced with cube map faces and the "difference image" is taken to be the image of a face in which a difference from the previous frame has arisen, the operations of the data compression unit 334 and the data decompression unit 336 are the same as described above. Alternatively, the cube map faces may be further divided into tile images, and whether to include them in the reference image data may be determined in units of tile images. Furthermore, for a cube map face that has changed from the previous frame, or for a tile image within it that has changed, data representing only the information on the pixels where differences have arisen, as shown in FIG. 18, may be used as the "difference image".
 When the reference images and depth images are represented by the equirectangular projection, the images of objects directly below or directly above the viewpoint are, by the nature of the projection, stretched horizontally at the bottom or top of the image plane. When such a region changes in the space to be displayed, a wide range of the equirectangular image therefore changes, and the efficiency of data compression may suffer. With the cube map method, a change in the image plane is limited to an area corresponding to the change in the space, so the efficiency of data compression can be kept stable.
 In the modes described so far, a reference image and a depth image are in principle generated as a pair for each reference viewpoint, compressed and decompressed in the same manner, and used for drawing the object. Here, the depth images are used to select, for each point on the object surface, the reference images to be referred to when drawing that point. If this selection is computed in advance and associated with positions on the object surface, the depth images themselves no longer need to be included in the reference image data.
 FIG. 23 shows the configuration of functional blocks of the reference image data generation unit of the reference image generation device 300 and the pixel value determination unit of the display image generation device 200 when a function is introduced for storing information on the reference images to be referred to in association with positions on the object surface. In this aspect, the reference image data generation unit 318b includes a reference image generation unit 330, a data compression unit 334, a depth image generation unit 332, and a reference destination information addition unit 342. The functions of the reference image generation unit 330, the data compression unit 334, and the depth image generation unit 332 are the same as those of the corresponding functional blocks shown in FIG. 14.
 The reference destination information addition unit 342 uses the depth images generated by the depth image generation unit to generate, for positions on the object surface, information designating the reference images to be referred to when drawing those positions. This processing is basically the same as that shown in FIG. 8. That is, the reference images in which a point on the object (such as point 26 in FIG. 8) appears as an image are determined by comparing the distance to the object indicated by the depth image with the distance from the reference viewpoint to the point in the display target space.
 However, whereas selecting the reference destination at display time, as described with reference to FIG. 8, starts from the pixel to be drawn in the display image and determines the corresponding point, the reference destination information addition unit 342 instead sets, according to a predetermined rule, the unit regions on the object surface for which the reference destinations are determined. Specific examples will be described later. The reference destination information addition unit 342 writes the identification information of the reference images determined in this way in association with the object model stored in the object model storage unit 254.
 When the object moves or deforms, its appearance from the reference viewpoints also changes, so part of the identification information of the reference images written into the object model becomes time-series data. With this configuration, the display image generation device 200 no longer needs to refer to the depth images when generating the display image. The data compression unit 334 therefore compresses only the reference images generated by the reference image generation unit 330, using any of the techniques described above, and stores them in the reference image data storage unit 256.
 The pixel value determination unit 266b of the display image generation device 200 includes a data decompression unit 336, a reference unit 344, and a calculation unit 340. The functions of the data decompression unit 336 and the calculation unit 340 are the same as those of the corresponding functional blocks shown in FIG. 14. However, the data decompression unit 336 performs the decompression processing described above only on the reference images stored in the reference image data storage unit 256. The reference unit 344, unlike the reference unit 338 of FIG. 14, determines the reference images used to draw the point on the object corresponding to each pixel of the display image on the basis of the information added to the object model.
 From the reference images thus determined it acquires the pixel values representing the image of that point and supplies them to the calculation unit 340. With this configuration, the processing load on the reference unit 344 is reduced and display image generation can be sped up. Moreover, the identification information of the referenced reference images requires fewer gradations than the distance values of a depth image, so the data size remains small even as time-series data.
 FIG. 24 is a diagram for explaining an example of a technique for associating the identification information of the referenced reference images with the object model. The figure is drawn in the same way as FIG. 8. That is, five reference viewpoints are set in the space where the object 424 exists, and reference images 428a, 428b, 428c, 428d, and 428e are generated. Let the identification information of the reference images (or reference viewpoints) be "A", "B", "C", "D", and "E". In this example, the reference destination information addition unit 342 associates the identification information of the reference images to be referred to in units of the vertices of the object 424, shown as circles, or in units of the faces (meshes) enclosed by the straight lines connecting the vertices.
 For example, the depth images show that the face 430a of the object 424 appears in the reference images with identification information "A" and "C". The identification information "A" and "C" is therefore associated with the face 430a. If the face 430b is found to appear in the reference images with identification information "A" and "B", the identification information "A" and "B" is associated with the face 430b. If the face 430c is found to appear in the reference images with identification information "C" and "D", the identification information "C" and "D" is associated with the face 430c.
 For the other faces of the object as well, the depth images are used to determine in which reference images their images appear, and the corresponding identification information is associated with them. In the figure, the associated identification information is shown in balloons extending from each face of the object 424. The reference unit 344 of the display image generation device 200 identifies the face containing the point on the object that corresponds to the pixel to be drawn, or a vertex near it, and acquires the identification information of the reference images associated with it. With this configuration, the information can be added to the vertex and mesh information already present in the object model as it is, so the increase in data size is kept small. In addition, since the reference destinations in the object model are limited, the processing load at display time is small.
 On the other hand, because the granularity at which the information is stored, such as faces or vertices, is coarse, when the referenced reference images change within a single face due to occlusion or the like, this cannot be represented accurately. One option in this case is to take as reference destinations only the reference images in which the entire face appears, but then fewer reference images are used for drawing and the quality of the display image may drop. To maintain image quality, the faces (meshes) would have to be subdivided into regions with different reference destinations and the reference image information set for each such unit, which is disadvantageous in terms of data size and processing load. For these reasons, it is desirable to apply the illustrated technique to objects of relatively simple shape.
 FIG. 25 is a diagram for explaining another example of a technique for associating the identification information of the referenced reference images with the object model. The figure is drawn in the same way as FIG. 24. In this mode, the distribution of the identification information of the referenced reference images is generated as a texture image. For example, for the face 430a of the object 424, a texture image 432 is generated that represents, as pixel values, the identification information of the referenced reference images for each position on the face. If the reference destinations do not change within the face, the pixel values of the texture image 432 are uniform. If the referenced reference images change within the face due to occlusion or the like, the pixel values of the texture image 432 change accordingly. This makes it possible to control the reference destinations at a granularity finer than the face unit.
 In this case, the reference unit 344 of the display image generation device 200 identifies the (u, v) coordinates on the texture image corresponding to the point on the object to be drawn, and reads out the identification information of the reference images represented at that position. This processing is basically the same as ordinary texture mapping in computer graphics. With this configuration, switching of the reference destinations within a single face due to occlusion or the like can be realized with a light load, without dividing the meshes defined by the object model.
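 A minimal sketch of such a lookup, assuming the identification information is encoded in the texture as a bitmask (bit 0 for "A", bit 1 for "B", and so on); the encoding is an assumption made for the example, and nearest-neighbour sampling is used because identifiers must not be interpolated.

```python
def reference_ids_at(texture, u, v):
    # `texture` is an H x W integer array; (u, v) in [0, 1] comes from ordinary
    # texture mapping of the drawn point onto the face (cf. FIG. 25).
    h, w = texture.shape[:2]
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    value = int(texture[y, x])
    return [chr(ord('A') + bit) for bit in range(8) if value & (1 << bit)]
```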
 FIG. 26 is a diagram for explaining yet another example of a technique for associating the identification information of the referenced reference images with the object model. The figure is drawn in the same way as FIG. 24. In this mode, the object is divided into voxels of a predetermined size, and the identification information of the reference images to be referred to is associated with each voxel. For example, when the face 430a of the object 424 appears in the reference images with identification information "A" and "C", the identification information "A" and "C" is associated with the voxels containing that face (for example, voxels 432a and 432b). The same applies to the voxels containing the other faces. When two faces are contained in one voxel, the reference destination information may be associated with each face.
 If the reference destinations do not change within a face, the information associated with the voxels containing it is identical. Even when the referenced reference images change within a face due to occlusion or the like, holding the reference destination information in units of voxels makes it possible to obtain the appropriate reference destinations at a fine granularity. In this case, the reference unit 344 of the display image generation device 200 identifies the voxel containing the point on the object to be drawn and acquires the identification information of the reference images associated with it. With this configuration, images can be drawn with high accuracy using a unified data structure and processing, regardless of the shape of the object or the complexity of the space.
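 For illustration, a voxel association of this kind could be held as a simple map from integer voxel indices to reference image identifiers; the dictionary representation and the axis-aligned grid in the sketch below are assumptions made for the example.

```python
def voxel_lookup(reference_map, point, voxel_size):
    # `reference_map` maps (ix, iy, iz) -> identification info of the reference
    # images to refer to; the voxel containing a surface point is found by
    # integer division of its coordinates (cf. FIG. 26).
    ix = int(point[0] // voxel_size)
    iy = int(point[1] // voxel_size)
    iz = int(point[2] // voxel_size)
    return reference_map.get((ix, iy, iz), [])
```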
 In the illustrated example, voxels of the same size are viewed from above and shown as a set of squares. However, the unit of three-dimensional space with which the identification information of the reference images to be referred to is associated is not limited to voxels of the same size. For example, space division by an octree, widely known as one technique for efficiently searching for information associated with positions in three-dimensional space, may be introduced. In this technique, the target space is taken as the root box and divided in two along each of the three axes into eight boxes, each of which is further divided into eight boxes, and so on; by repeating this process as necessary, the space is represented by an octree structure.
 By varying the number of divisions with position, the size of the boxes finally formed can be controlled according to the local granularity of the space with which the information is associated. Moreover, the relationship between the index numbers given to these boxes and their positions in space can easily be determined by simple bit operations. In this case, the reference unit 344 of the display image generation device 200 can quickly identify the identification information of the associated reference images by obtaining, through a bit operation, the index number of the box containing the point on the object to be drawn.
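 One commonly used bit operation of this kind is Morton (bit-interleaved) indexing; the sketch below, offered only as an example of how such an index might be computed, interleaves the bits of the integer cell coordinates at a given subdivision depth.

```python
def morton_index(ix, iy, iz, depth):
    # Interleave the bits of (ix, iy, iz) to obtain the octree box index; a table
    # keyed by such indices can then return the associated reference image IDs.
    index = 0
    for level in range(depth):
        index |= ((ix >> level) & 1) << (3 * level)
        index |= ((iy >> level) & 1) << (3 * level + 1)
        index |= ((iz >> level) & 1) << (3 * level + 2)
    return index
```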
 According to the present embodiment described above, in a technique for viewing a moving image from an arbitrary viewpoint, videos showing the movement of the object as seen from a plurality of reference viewpoints are prepared as reference images, together with the data defining the movement of the object in the virtual space. At display time, the object is projected onto a view screen based on the user's viewpoint at predetermined time steps, and the pixel values of the display image are determined by acquiring, from the reference images at each time, the values of the pixels representing the same object. In calculating the pixel values, rules based on the positional relationship between the actual viewpoint and the reference viewpoints and on the attributes of the object are introduced.
 Since the reference images can be generated over time, at a timing separate from the viewpoint-dependent display, they can be prepared with high quality. At display time, values are drawn from these high-quality images, so a high-quality image can be presented without taking time. If the reference viewpoints are moved so as to follow the movement of the object, the level of detail of the object in the reference images can be kept constant, and the image of the object can also be represented stably and with high quality in the display image.
 In addition, in the videos of the reference images and of the depth images used to select the referenced reference images at display time, only the changing regions are extracted and treated as time-series data, which keeps the required data size small even for video display. Furthermore, generating integrated video data in which corresponding frames of the reference images and depth images are contained in the same frame, and compression-encoding it in units of that video, reduces the load of decoding and synchronization processing at display time.
 Furthermore, instead of using the depth image data to determine the referenced reference images, the referenced reference images are specified in advance for positions on the object surface, and their identification information is associated with the object model. This further reduces the size of the data required for display. In addition, since the process of determining the referenced reference images by computation can be omitted at display time, the time from acquisition of the user's viewpoint to display can be shortened.
 The present invention has been described above on the basis of the embodiments. The embodiments are examples, and those skilled in the art will understand that various modifications can be made to the combinations of their constituent elements and processes, and that such modifications are also within the scope of the present invention.
 100 head mounted display, 200 display image generation device, 222 CPU, 224 GPU, 226 main memory, 236 output unit, 238 input unit, 254 object model storage unit, 256 reference image data storage unit, 260 viewpoint information acquisition unit, 262 space construction unit, 264 projection unit, 266 pixel value determination unit, 268 output unit, 300 reference image generation device, 310 reference viewpoint setting unit, 314 object model storage unit, 316 space construction unit, 318 reference image data generation unit, 330 reference image generation unit, 332 depth image generation unit, 334 data compression unit, 336 data decompression unit, 338 reference unit, 340 calculation unit, 342 reference destination information addition unit, 344 reference unit.
 As described above, the present invention can be used in various information processing devices such as head mounted displays, game devices, image display devices, mobile terminals, and personal computers, and in information processing systems including any of these.

Claims (13)

  1.  A reference image generation device that generates data of a reference image representing an image of a space containing an object to be displayed as viewed from a predetermined reference viewpoint, the data being used to generate a display image of the space viewed from an arbitrary viewpoint, the device comprising:
     a space construction unit that arranges the object in the space in accordance with information defining the object; and
     a reference image data generation unit that generates the reference image and a corresponding depth image with a field of view corresponding to a reference viewpoint arranged in the space, identifies, using the depth image and for each predetermined region on the surface of the object, the reference images in which that region appears as an image, and outputs the identification result and the data of the reference images.
  2.  The reference image generation device according to claim 1, wherein the reference image data generation unit associates the identification information of the identified reference images with the corresponding regions of an object model.
  3.  The reference image generation device according to claim 2, wherein the reference image data generation unit identifies the reference images for the vertices or meshes defining the object model and associates the identification information with them.
  4.  The reference image generation device according to claim 2, wherein the reference image data generation unit generates, as a texture image to be mapped onto each face of the object model, an image representing the distribution of the identification information of the reference images.
  5.  The reference image generation device according to claim 2, wherein the reference image data generation unit identifies the reference images in units of regions contained in voxels obtained by dividing the object model, and associates the identification information with the voxels.
  6.  The reference image generation device according to any one of claims 1 to 5, wherein the space construction unit changes the object in the space in accordance with information defining the change of the object, and
     the reference image data generation unit generates, at a predetermined rate, the reference images representing the change of the object, and outputs time-series data of the identification information by acquiring changes in the reference images in which each region appears as an image.
  7.  A display image generation device comprising:
     an object model storage unit that stores information defining objects in a space to be displayed;
     a reference image data storage unit that stores data of reference images representing images of the space containing the objects as viewed from predetermined reference viewpoints;
     a viewpoint information acquisition unit that acquires information relating to a user's viewpoint;
     a projection unit that represents, on a plane of a display image, the images of the objects as the space is viewed from the user's viewpoint;
     a pixel value determination unit that, for each pixel of the display image, identifies the reference images in which the point on the corresponding object is represented, by reading out additional information of the object model stored in the object model storage unit, and determines the color of that pixel using the colors of the images in the identified reference images; and
     an output unit that outputs the data of the display image.
  8.  The display image generation device according to claim 7, wherein the pixel value determination unit reads out, as the additional information, a texture image associated with the face containing the point and representing the distribution of the identification information of the reference images in which that face is represented, and identifies the reference images in which the point is represented by mapping the texture image onto that face.
  9.  The display image generation device according to claim 7 or 8, wherein the projection unit represents the images of objects changing in the space on the plane of the display image at a predetermined rate, and
     the pixel value determination unit identifies, at a predetermined rate, the reference images used to determine the colors of the pixels on the basis of the time-series data represented as the additional information.
  10.  A reference image generation method in which a reference image generation device, which generates data of a reference image representing an image when a space including an object to be displayed is viewed from a predetermined reference viewpoint, for use in generating a display image when the space is viewed from an arbitrary viewpoint, performs the steps of:
     placing the object in the space in accordance with information defining the object;
     generating the reference image and a depth image corresponding to it, with a field of view corresponding to a reference viewpoint placed in the space; and
     identifying, using the depth image, for each predetermined region on the surface of the object, the reference image in which the region appears as an image, and outputting the identification result and the data of the reference image.
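The visibility test implied by claim 10 (and by the corresponding program of claim 12) can be realised with the generated depth images: a surface region appears in a reference image only if its distance from the reference viewpoint matches the depth recorded at its projected position, that is, if nothing closer occludes it. A minimal sketch under those assumptions; the attributes id, project and depth on each reference view are hypothetical.

```python
def views_showing_point(point, reference_views, eps=1e-3):
    """Return the identifiers of the reference images in which a surface point
    appears, judged against each view's depth image."""
    visible_in = []
    for view in reference_views:
        projected = view.project(point)            # -> (px, py, depth_from_viewpoint) or None
        if projected is None:
            continue                               # outside this reference view's field of view
        px, py, point_depth = projected
        recorded_depth = view.depth[int(py), int(px)]
        if abs(point_depth - recorded_depth) < eps:
            visible_in.append(view.id)             # the point is unoccluded in this view
    return visible_in
```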
  11.  A display image generation method performed by a display image generation device, comprising the steps of:
     reading, from a memory, information defining an object in a space to be displayed;
     reading, from a memory, data of a reference image representing an image when the space including the object is viewed from a predetermined reference viewpoint;
     acquiring information relating to a user's viewpoint;
     representing an image of the object on a plane of a display image when the space is viewed from the user's viewpoint;
     identifying, for each pixel of the display image, the reference image in which the point on the object corresponding to the pixel is represented, on the basis of additional information of the object model included in the information defining the object, and determining the color of the pixel using the color of the image in the identified reference image; and
     outputting data of the display image.
  12.  A computer program for causing a computer, which generates data of a reference image representing an image when a space including an object to be displayed is viewed from a predetermined reference viewpoint, for use in generating a display image when the space is viewed from an arbitrary viewpoint, to realize:
     a function of placing the object in the space in accordance with information defining the object;
     a function of generating the reference image and a depth image corresponding to it, with a field of view corresponding to the reference viewpoint placed in the space; and
     a function of identifying, using the depth image, for each predetermined region on the surface of the object, the reference image in which the region appears as an image, and outputting the identification result and the data of the reference image.
  13.  A computer program for causing a computer to realize:
     a function of reading, from a memory, information defining an object in a space to be displayed;
     a function of reading, from a memory, data of a reference image representing an image when the space including the object is viewed from a predetermined reference viewpoint;
     a function of acquiring information relating to a user's viewpoint;
     a function of representing an image of the object on a plane of a display image when the space is viewed from the user's viewpoint;
     a function of identifying, for each pixel of the display image, the reference image in which the point on the object corresponding to the pixel is represented, on the basis of additional information of the object model included in the information defining the object, and determining the color of the pixel using the color of the image in the identified reference image; and
     a function of outputting data of the display image.
PCT/JP2018/014478 2018-04-04 2018-04-04 Reference image generation device, display image generation device, reference image generation method, and display image generation method WO2019193699A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/014478 WO2019193699A1 (en) 2018-04-04 2018-04-04 Reference image generation device, display image generation device, reference image generation method, and display image generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/014478 WO2019193699A1 (en) 2018-04-04 2018-04-04 Reference image generation device, display image generation device, reference image generation method, and display image generation method

Publications (1)

Publication Number Publication Date
WO2019193699A1 true WO2019193699A1 (en) 2019-10-10

Family

ID=68100222

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/014478 WO2019193699A1 (en) 2018-04-04 2018-04-04 Reference image generation device, display image generation device, reference image generation method, and display image generation method

Country Status (1)

Country Link
WO (1) WO2019193699A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003051026A (en) * 2001-06-29 2003-02-21 Samsung Electronics Co Ltd Three-dimensional object and image-based method for presenting and rendering animated three-dimensional object
JP2006072805A (en) * 2004-09-03 2006-03-16 Nippon Hoso Kyokai <Nhk> Three-dimensional model display device and program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220141480A1 (en) * 2019-04-01 2022-05-05 Nippon Telegraph And Telephone Corporation Image generation apparatus, image generation method and program
US11800129B2 (en) * 2019-04-01 2023-10-24 Nippon Telegraph And Telephone Corporation Image generation apparatus, image generation method and program

Similar Documents

Publication Publication Date Title
KR102214263B1 (en) Image generating apparatus, image generating method, computer program, and recording medium
JP7212519B2 (en) Image generating device and method for generating images
JP6980031B2 (en) Image generator and image generation method
JP6934957B2 (en) Image generator, reference image data generator, image generation method, and reference image data generation method
US11893705B2 (en) Reference image generation apparatus, display image generation apparatus, reference image generation method, and display image generation method
US11315309B2 (en) Determining pixel values using reference images
US20200342656A1 (en) Efficient rendering of high-density meshes
WO2017086244A1 (en) Image processing device, information processing device, and image processing method
WO2018052100A1 (en) Image processing device, image processing method, and image processing program
US9225968B2 (en) Image producing apparatus, system and method for producing planar and stereoscopic images
WO2019193699A1 (en) Reference image generation device, display image generation device, reference image generation method, and display image generation method
WO2019193698A1 (en) Reference image generation device, display image generation device, reference image generation method, and display image generation method
US20100177098A1 (en) Image generation system, image generation method, and computer program product
JP2021028853A (en) Data structure of electronic content
WO2022244131A1 (en) Image data generation device, display device, image display system, image data generation method, image display method, and data structure of image data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18913698

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18913698

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP