WO2019193699A1 - Reference image generation device, display image generation device, reference image generation method, and display image generation method


Info

Publication number
WO2019193699A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
reference image
viewpoint
display
data
Prior art date
Application number
PCT/JP2018/014478
Other languages
French (fr)
Japanese (ja)
Inventor
Yuki Karasawa
Andrew James Bigos
Original Assignee
Sony Interactive Entertainment Inc.
Sony Interactive Entertainment Europe Limited
Priority date
Filing date
Publication date
Application filed by Sony Interactive Entertainment Inc. and Sony Interactive Entertainment Europe Limited
Priority to PCT/JP2018/014478
Publication of WO2019193699A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics

Definitions

  • The present invention relates to a reference image generation device that generates data used to display an image according to a user's viewpoint, a display image generation device that generates a display image using the data, a reference image generation method, and a display image generation method performed by these devices.
  • An image display system that allows the user to view a target space from a free viewpoint has become widespread.
  • For example, a system has been developed in which a panoramic image is displayed on a head-mounted display, and a panoramic image corresponding to the line-of-sight direction is displayed when the user wearing the head-mounted display rotates his or her head.
  • By using a head-mounted display, it is possible to enhance the sense of immersion in video and to improve the operability of applications such as games.
  • A walk-through system has also been developed in which a user wearing a head-mounted display can virtually walk around the displayed space by physically moving.
  • Image display technology that supports free viewpoints requires high responsiveness of the display to viewpoint movement.
  • On the other hand, increasing the resolution or performing complicated calculations to improve image quality increases the image processing load, so the display may fail to keep up with the movement of the viewpoint and the sense of realism may be impaired.
  • The present invention has been made in view of these problems, and an object of the present invention is to provide a technique capable of achieving both responsiveness of image display with respect to the viewpoint and image quality.
  • an aspect of the present invention relates to a reference image generation device.
  • This reference image generation device generates data of a reference image that is used to generate a display image of a space including an object to be displayed as viewed from an arbitrary viewpoint, the reference image representing the space as viewed from a predetermined reference viewpoint.
  • The device includes a reference image data generation unit that generates the reference image and a depth image corresponding to it, specifies, for each predetermined region on the surface of the object, a reference image in which the region appears as an image using the depth image, and outputs the specification result and the data of the reference image.
  • The display image generation device includes: an object model storage unit that stores information defining an object in a display target space; a reference image data storage unit that stores data of a reference image representing an image of the space including the object as viewed from a predetermined reference viewpoint; a viewpoint information acquisition unit that acquires information related to the user's viewpoint; a projection unit that represents an image of the object on a plane of the display image as the space viewed from the user's viewpoint; a pixel value determination unit that, for each pixel in the display image, specifies a reference image in which the corresponding point on the object is represented by reading additional information of the object model stored in the object model storage unit, and determines the color of the pixel using the color of the image in the specified reference image; and an output unit that outputs the data of the display image.
  • Still another embodiment of the present invention relates to a reference image generation method.
  • This reference image generation method generates data of a reference image that is used to generate a display image of a space including an object to be displayed as viewed from an arbitrary viewpoint, the reference image representing the space as viewed from a predetermined reference viewpoint.
  • The method, performed by a reference image generation device, includes: a step of arranging the object in the space in accordance with information defining the object; a step of generating the reference image and a depth image corresponding to the reference image with a field of view corresponding to the reference viewpoint placed in the space; and a step of specifying, for each predetermined area on the surface of the object, a reference image in which the area appears as an image using the depth image, and outputting the specification result and the data of the reference image.
  • Still another aspect of the present invention relates to a display image generation method.
  • This display image generation method includes: a step of reading, from a memory, information defining an object in a display target space; a step of reading, from the memory, data of a reference image representing an image of the space including the object as viewed from a predetermined reference viewpoint; a step of acquiring information relating to the user's viewpoint; a step of representing an image of the object on a plane of the display image as the space viewed from the user's viewpoint; a step of specifying, for each pixel in the display image, a reference image in which the corresponding point on the object is represented, based on additional information of the object model included in the information defining the object, and determining the color of the pixel using the color of the image in the specified reference image; and a step of outputting the data of the display image.
  • FIG. 1 is a diagram showing an example of the external appearance of the head-mounted display of the present embodiment.
  • FIG. 2 is a configuration diagram of the image processing system of the present embodiment.
  • FIG. 3 is a diagram for explaining an example of the image world that the display image generation device of the present embodiment displays on the head-mounted display.
  • FIG. 4 is a diagram showing the internal circuit configuration of the display image generation device of the present embodiment.
  • FIG. 5 is a diagram showing the functional blocks of the display image generation device in the present embodiment.
  • FIG. 6 is a diagram showing the functional blocks of the reference image generation device in the present embodiment.
  • Another figure is a diagram for describing a mode in which the reference image used for generating the display image is switched.
  • Another figure shows the configuration of the functional blocks of the reference image data generation unit of the reference image generation device and the pixel value determination unit of the display image generation device when a compression/decompression processing function for reference image data is introduced.
  • Another figure schematically shows an example of an integrated moving image generated by the data compression unit.
  • Another figure shows a structural example of data after compression in a mode in which the compression processing of the reference image is controlled.
  • Another figure is a diagram for explaining an example of data compression processing when the reference image is an omnidirectional image.
  • Another figure shows the configuration of the functional blocks of the reference image data generation unit of the reference image generation device and the pixel value determination unit of the display image generation device when a function of storing information on the reference image to be referred to in association with a position on the object surface is introduced.
  • This embodiment basically displays an image with a field of view according to the user's viewpoint.
  • the type of device that displays an image is not particularly limited, and any of a wearable display, a flat panel display, a projector, and the like may be used.
  • a head-mounted display will be described as an example of a wearable display.
  • In the case of a head-mounted display, the user's line of sight can be estimated approximately by a built-in motion sensor.
  • With other display devices, the line of sight can be detected by having the user wear a motion sensor on the head or by using a gaze point detection device.
  • Alternatively, a marker may be attached to the user's head and the line of sight may be estimated by analyzing a captured image of the user's appearance, or any of these techniques may be combined.
  • FIG. 1 shows an example of the appearance of the head mounted display 100.
  • the head mounted display 100 includes a main body portion 110, a forehead contact portion 120, and a temporal contact portion 130.
  • The head-mounted display 100 is a display device that is worn on the user's head so that the user can view still images and moving images shown on the display and listen to sound and music output from the headphones.
  • Posture information such as the rotation angle and inclination of the head of the user wearing the head mounted display 100 can be measured by a motion sensor built in or externally attached to the head mounted display 100.
  • the head mounted display 100 is an example of a “wearable display device”.
  • The wearable display device is not limited to the head-mounted display 100 in the narrow sense, but includes eyeglasses, eyeglass-type displays, eyeglass-type cameras, headphones, headsets (headphones with a microphone), earphones, earrings, ear-mounted cameras, hats, hats with cameras, hair bands, and any other wearable display device.
  • FIG. 2 is a configuration diagram of the image processing system according to the present embodiment.
  • the head mounted display 100 is connected to the display image generating apparatus 200 via an interface 205 for connecting peripheral devices such as wireless communication or USB.
  • the display image generation apparatus 200 may be further connected to a server via a network. In that case, the server may provide the display image generating apparatus 200 with image data to be displayed on the head mounted display 100.
  • The display image generation device 200 specifies the position of the viewpoint and the direction of the line of sight based on the position and posture of the head of the user wearing the head-mounted display 100, generates a display image with a corresponding field of view, and outputs it to the head-mounted display 100.
  • The purpose of displaying the image may vary.
  • For example, the display image generation apparatus 200 may generate a virtual world that serves as the stage of an electronic game as the display image while the game progresses, or may display a moving image or the like for viewing, whether it shows a virtual world or the real world.
  • When the display device is a head-mounted display, displaying a panoramic image over a wide angle range centered on the viewpoint can produce a sense of being immersed in the displayed world.
  • FIG. 3 is a diagram for explaining an example of an image world displayed on the head mounted display 100 by the display image generating apparatus 200 in the present embodiment.
  • a state in which the user 12 is in a room that is a virtual space is created.
  • objects such as walls, floors, windows, tables, and objects on the table are arranged in the world coordinate system that defines the virtual space.
  • the display image generation apparatus 200 defines a view screen 14 in the world coordinate system in accordance with the position of the viewpoint of the user 12 and the direction of the line of sight, and draws a display image by projecting an image of the object there.
  • the position of the viewpoint of the user 12 and the direction of the line of sight (hereinafter, these may be collectively referred to as “viewpoint”) are acquired at a predetermined rate, and the position and direction of the view screen 14 can be changed accordingly.
  • an image can be displayed with a field of view corresponding to the user's viewpoint. If a stereo image having parallax is generated and displayed in front of the left and right eyes on the head mounted display 100, the virtual space can be stereoscopically viewed. Thereby, the user 12 can experience a virtual reality as if he were in a room in the display world.
  • In the figure, the display target is a virtual world based on computer graphics; however, a captured image of the real world, such as a panoramic photograph, may also be used, or such an image may be combined with the virtual world.
  • an image viewed from a specific viewpoint is acquired in advance and used to determine the pixel value of the display image for an arbitrary viewpoint. That is, the color of the object appearing as an image in the display image is determined by extracting from the corresponding portion of the image acquired in advance.
  • the viewpoint set in the prior image acquisition is referred to as “reference viewpoint”
  • the image acquired in advance viewed from the reference viewpoint is referred to as “reference image” or “reference viewpoint image”.
  • In this way, an object can be expressed with high accuracy in accordance with the viewpoint at the time of display. More specifically, when the viewpoint at the time of display coincides with one of the reference viewpoints, the pixel values of the reference image corresponding to that reference viewpoint can be adopted as they are. When the viewpoint at the time of display lies between a plurality of reference viewpoints, the pixel values of the display image are determined by combining the pixel values of the reference images corresponding to those reference viewpoints.
  • FIG. 4 shows the internal circuit configuration of the display image generating apparatus 200.
  • the display image generation apparatus 200 includes a CPU (Central Processing Unit) 222, a GPU (Graphics Processing Unit) 224, and a main memory 226. These units are connected to each other via a bus 230. An input / output interface 228 is further connected to the bus 230.
  • To the input/output interface 228 are connected a communication unit 232 including a peripheral device interface such as USB or IEEE 1394 and a wired or wireless LAN network interface, a storage unit 234 such as a hard disk drive or a nonvolatile memory, an output unit 236 that outputs data to a display device such as the head-mounted display 100, an input unit 238 that inputs data from the head-mounted display 100, and a recording medium driving unit 240 that drives a removable recording medium such as a magnetic disk, an optical disk, or a semiconductor memory.
  • the CPU 222 controls the entire display image generation apparatus 200 by executing the operating system stored in the storage unit 234.
  • the CPU 222 also executes various programs read from the removable recording medium and loaded into the main memory 226 or downloaded via the communication unit 232.
  • the GPU 224 has a function of a geometry engine and a function of a rendering processor, performs a drawing process according to a drawing command from the CPU 222, and stores a display image in a frame buffer (not shown). Then, the display image stored in the frame buffer is converted into a video signal and output to the output unit 236.
  • the main memory 226 includes a RAM (Random Access Memory) and stores programs and data necessary for processing.
  • FIG. 5 shows a functional block configuration of the display image generation apparatus 200 in the present embodiment.
  • The display image generation apparatus 200 may also perform general information processing such as advancing an electronic game or communicating with a server, but here the figure focuses on the function of generating display image data according to the viewpoint.
  • at least a part of the functions of the display image generation apparatus 200 shown in FIG. 5 may be mounted on the head mounted display 100.
  • at least a part of the functions of the display image generation device 200 may be implemented in a server connected to the display image generation device 200 via a network.
  • The functional blocks shown in FIG. 5 and in FIG. 6 described later can be realized in hardware by the CPU, GPU, and various memories shown in FIG. 4, and in software by a program, loaded into memory from a recording medium or the like, that provides functions such as data input, data holding, image processing, and communication. Therefore, it will be understood by those skilled in the art that these functional blocks can be realized in various forms by hardware only, software only, or a combination of these, and are not limited to any one of them.
  • the display image generation apparatus 200 includes a viewpoint information acquisition unit 260 that acquires information related to a user's viewpoint, a space construction unit 262 that constructs a space composed of objects to be displayed, a projection unit 264 that projects an object on a view screen, A pixel value determining unit 266 that determines values of pixels constituting the image and completes a display image, and an output unit 268 that outputs display image data to the head mounted display 100 are provided.
  • the display image generation apparatus 200 further includes an object model storage unit 254 that stores data related to an object model necessary for constructing a space, and a reference image data storage unit 256 that stores data related to a reference image.
  • the viewpoint information acquisition unit 260 includes the input unit 238 and the CPU 222 shown in FIG. 4 and acquires the position of the user's viewpoint and the direction of the line of sight at a predetermined rate. For example, the output value of the acceleration sensor built in the head mounted display 100 is sequentially acquired, and thereby the posture of the head is acquired. Further, a light emitting marker (not shown) is provided outside the head mounted display 100, and the captured image is acquired from an imaging device (not shown), thereby acquiring the position of the head in real space.
  • an imaging device (not shown) that captures an image corresponding to the user's visual field may be provided on the head-mounted display 100 side, and the position and posture of the head may be acquired by a technique such as SLAM (Simultaneous Localization and Mapping). If the position and orientation of the head can be acquired in this way, the position of the user's viewpoint and the direction of the line of sight can be specified approximately.
  • the space construction unit 262 is configured by the CPU 222, the GPU 224, the main memory 226, and the like in FIG. 4, and constructs a shape model of the space where the object to be displayed exists.
  • objects such as walls, floors, windows, tables, and objects on the table representing the room are arranged in the world coordinate system that defines the virtual space.
  • Information related to the shape of each object is read from the object model storage unit 254.
  • the space construction unit 262 may determine the shape, position, and orientation of the object, and can use a modeling technique based on a surface model in general computer graphics.
  • the object model storage unit 254 also stores data defining the movement and deformation of the object. For example, time-series data representing the position and shape of the object at predetermined time intervals is stored. Alternatively, a program for generating such a change is stored.
  • the space construction unit 262 reads the data and changes the object arranged in the virtual space.
  • the projection unit 264 includes the GPU 224 and the main memory 226 shown in FIG. 4 and sets the view screen according to the viewpoint information acquired by the viewpoint information acquisition unit 260. That is, by setting the screen coordinates corresponding to the position of the head and the direction of the face, the display target space is drawn on the screen plane with a field of view corresponding to the position and direction of the user.
  • the projection unit 264 further projects the object in the space constructed by the space construction unit 262 onto the view screen at a predetermined rate.
  • This processing can also use a general computer graphics technique for perspective-transforming a mesh such as a polygon.
  • the pixel value determining unit 266 includes the GPU 224, the main memory 226, and the like shown in FIG. 4, and determines the values of the pixels constituting the object image projected onto the view screen.
  • Specifically, the pixel value determination unit 266 reads the reference image data from the reference image data storage unit 256, extracts pixel values representing points on the same object, and uses them to determine the pixel values of the display image.
  • In this way, a high-definition image representation close to that obtained by ray tracing can be realized at run time with the light-load calculation of reading out the corresponding pixel values and performing weighted averaging.
  • the pixel value determination unit 266 refers to the frame of the reference image at the corresponding time for the moving image of the object projected by the projection unit 264. That is, the pixel value determination unit 266 refers to the moving image of the reference image after synchronizing with the movement of the object in the virtual space generated by the space construction unit 262.
  • the reference image is not limited to a graphics image drawn by ray tracing, and may be an image obtained by photographing a real space from a reference viewpoint in advance.
  • the space construction unit 262 constructs a shape model of the real space to be imaged, and the projection unit 264 projects the shape model onto the view screen corresponding to the viewpoint at the time of display.
  • the processing of the space construction unit 262 and the projection unit 264 can be omitted if the position of the image of the object to be imaged can be determined with a visual field corresponding to the viewpoint at the time of display.
  • The output unit 268 includes the CPU 222, the main memory 226, the output unit 236, and the like shown in FIG. 4, and transmits, at a predetermined rate, the display image data completed by the pixel value determination unit 266 determining the pixel values to the head-mounted display 100.
  • the output unit 268 When a stereo image is generated for stereoscopic viewing, the output unit 268 generates and outputs an image obtained by connecting the left and right as a display image.
  • the output unit 268 may perform correction on the display image in consideration of distortion caused by the lens.
  • FIG. 6 shows functional blocks of a device that generates reference image data.
  • The reference image generation device 300 may be a part of the display image generation device 200 of FIG. 5, or may be provided independently as a device that generates data used for display. Further, the generated reference image data, the object model used for the generation, and the data defining the movement may be stored in a recording medium or the like as electronic content so that they can be loaded into the main memory of the display image generation device 200 at run time.
  • the internal circuit configuration of the reference image generation device 300 may be the same as the internal circuit configuration of the display image generation device 200 shown in FIG.
  • The reference image generation device 300 includes a reference viewpoint setting unit 310 that sets reference viewpoints, a space construction unit 316 that constructs a space composed of objects to be displayed, a reference image data generation unit 318 that generates data of a reference image for each reference viewpoint based on the constructed space, an object model storage unit 314 that stores data relating to the object model necessary for constructing the space, and a reference image data storage unit 256 that stores the data of the generated reference images.
  • the reference viewpoint setting unit 310 includes an input unit 238, a CPU 222, a main memory 226, and the like, and sets the position coordinates of the reference viewpoint in the display target space.
  • a plurality of reference viewpoints are distributed so as to cover the range of viewpoints that the user can take.
  • the appropriate values of the range and the number of reference viewpoints vary depending on the configuration of the display target space, the purpose of display, the accuracy required for display, the processing performance of the display image generation device 200, and the like. Therefore, the reference viewpoint setting unit 310 may accept a setting input of the position coordinates of the reference viewpoint from the creator of the display content. Alternatively, the reference viewpoint setting unit 310 may change the position of the reference viewpoint according to the movement of the object, as will be described later.
  • the space construction unit 316 includes a CPU 222, a GPU 224, a main memory 226, and the like, and constructs a shape model of a space in which an object to be displayed exists. This function corresponds to the function of the space construction unit 262 shown in FIG.
  • the reference image generating apparatus 300 in FIG. 6 uses a modeling method based on a solid model in consideration of the color and material of an object in order to accurately draw an image of the object by ray tracing or the like. Therefore, the object model storage unit 314 stores object model data including information such as color and material.
  • the space construction unit 316 moves or deforms the object in the virtual space.
  • the lighting state may be changed or the color of the object may be changed.
  • Information defining such a change may be read out from information stored in the object model storage unit 314 or may be set by direct input by the creator of the display content. In the latter case, the space construction unit 316 changes the object in accordance with the input information, and stores information defining the change in the object model storage unit 314 so that the same change occurs during display.
  • the reference image data generation unit 318 includes a CPU 222, a GPU 224, a main memory 226, and the like, and draws a display target object visible from the reference viewpoint for each reference viewpoint set by the reference viewpoint setting unit 310 at a predetermined rate.
  • If the reference image is prepared as an omnidirectional panoramic image, the viewpoint at the time of display can be changed freely in all directions. Further, it is desirable to accurately represent the appearance from each reference viewpoint in the reference image by taking time to calculate the propagation of light rays.
  • the reference image data generation unit 318 also generates a depth image corresponding to each reference image. That is, the distance (depth value) from the screen surface of the object represented by each pixel of the reference image is obtained, and a depth image representing this as a pixel value is generated.
  • When the reference image is an omnidirectional panoramic image, the view screen is a spherical surface, and the depth value is the distance to the object in the normal direction of that spherical surface.
  • the generated depth image is used for selecting a reference image to be referred to when determining the pixel value of the display image.
  • the reference image data generation unit 318 may generate other information used when selecting the reference image for reference at the time of display instead of the depth image. Specifically, for the position on the object surface, a reference image to be referred to when drawing the position is obtained in advance. In this case, the reference image data generation unit 318 stores the information in the object model storage unit 314 as additional information of the object model.
  • the object model storage unit 254 in FIG. 5 may store at least data used for generating a display image among the data stored in the object model storage unit 314 in FIG.
  • the reference image data generation unit 318 stores the generated data in the reference image data storage unit 256 in association with the position coordinates of the reference viewpoint.
  • The reference image data storage unit 256 basically stores a pair of a reference image and a depth image for each reference viewpoint. However, in a mode in which the depth image is not used during display as described above, only the reference image is stored for each reference viewpoint.
  • a pair of a reference image and a depth image may also be referred to as “reference image data”.
  • the reference image data generation unit 318 reduces the data size and the processing load when generating the display image by using a data structure in which the image is updated only for a moving region in the generated moving image.
  • Alternatively, an integrated moving image may be generated in which a reference image frame and a depth image frame at the same time step are represented in one frame, and the data size may be reduced by compression-encoding this as a unit, which also reduces the load of decoding/decompression processing and synchronization processing at the time of display. Details will be described later.
  • FIG. 7 shows an example of setting the reference viewpoint.
  • a plurality of reference viewpoints are set on the horizontal plane 20a at the height of the eyes when the user 12 stands and the horizontal plane 20b at the height of the eyes when sitting, as indicated by black circles.
  • For example, the horizontal plane 20a is set at 1.4 m from the floor and the horizontal plane 20b at 1.0 m from the floor.
  • The reference viewpoints are distributed within the corresponding rectangular area.
  • the reference viewpoint is arranged at every other intersection of the grids that divide the rectangular area into four equal parts in the X-axis direction and the Y-axis direction, respectively. Further, the upper and lower horizontal surfaces 20a and 20b are arranged so as to be shifted so that the reference viewpoints do not overlap. As a result, in the example shown in FIG. 7, a total of 25 reference viewpoints are set, 13 points on the upper horizontal plane 20a and 12 points on the lower horizontal plane 20b.
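  • As a concrete illustration of this arrangement, the following sketch generates reference viewpoints on the two horizontal planes: every other intersection of the grid that divides a rectangular area into four equal parts in the X-axis and Y-axis directions, with the two planes offset so that the viewpoints do not overlap. The extent of the rectangular area and the use of Python/NumPy are assumptions for illustration only.
```python
import numpy as np

def reference_viewpoints(x_range=(-2.0, 2.0), y_range=(-2.0, 2.0),
                         heights=(1.4, 1.0)):
    """Place reference viewpoints on two horizontal planes as in FIG. 7.

    Each plane's rectangle is divided into four equal parts along X and Y,
    giving a 5x5 grid of intersections; every other intersection is used,
    and the two planes use complementary intersections so the viewpoints
    do not overlap (13 + 12 = 25 points).  The rectangle extents are
    illustrative assumptions.
    """
    xs = np.linspace(*x_range, 5)
    ys = np.linspace(*y_range, 5)
    points = []
    for plane, z in enumerate(heights):
        parity = plane % 2          # shift the checkerboard on the second plane
        for i, x in enumerate(xs):
            for j, y in enumerate(ys):
                if (i + j) % 2 == parity:
                    points.append((x, y, z))
    return np.array(points)

print(len(reference_viewpoints()))  # -> 25
```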
  • the distribution of the reference viewpoint is not limited to this, and may be distributed on a plurality of planes including a vertical plane, or may be distributed on a curved surface such as a spherical surface.
  • The distribution of reference viewpoints need not be uniform; the reference viewpoints may be distributed at a higher density in a range where the user is more likely to be present.
  • the reference viewpoint may be arranged so as to correspond to the object to be displayed, and the reference viewpoint may be moved according to the movement of the object.
  • the reference image is moving image data that reflects the movement of each reference viewpoint.
  • In this case, an image may first be generated for each object at the time of display, and the display image may then be generated by combining these images.
  • the positional relationship between the object and the reference viewpoint can be controlled independently.
  • important objects or objects that are likely to be seen in close proximity can be expressed in more detail, or even when different movements are performed for each object, the details of all objects can be expressed uniformly.
  • an increase in data size can be suppressed by representing the reference image as a still image from a fixed reference viewpoint.
  • FIG. 8 is a diagram for explaining a method in which the pixel value determining unit 266 of the display image generating apparatus 200 selects a reference image used for determining the pixel value of the display image.
  • the figure shows a state in which a display target space including the object 24 is viewed from above. In this space, it is assumed that five reference viewpoints 28a to 28e are set, and reference image data is generated for each of them.
  • circles centered on the reference viewpoints 28a to 28e schematically show the screen surface of the reference image prepared as a panoramic image of the whole celestial sphere.
  • the projection unit 264 determines the view screen so as to correspond to the virtual camera 30 and projects the model shape of the object 24. As a result, the correspondence between the pixel in the display image and the position on the surface of the object 24 is determined. For example, when determining the value of a pixel representing the image of the point 26 on the surface of the object 24, the pixel value determining unit 266 first specifies a reference image in which the point 26 appears as an image.
  • Since the position coordinates of the reference viewpoints 28a to 28e and the point 26 in the world coordinate system are known, their distances can easily be obtained.
  • the distance is indicated by the length of a line segment connecting the reference viewpoints 28a to 28e and the point 26. If the point 26 is projected onto the screen surface of each reference viewpoint, the position of the pixel where the image of the point 26 should appear in each reference image can be specified. On the other hand, depending on the position of the reference viewpoint, the point 26 may be behind the object or concealed by an object in front, and the image may not appear at the position of the reference image.
  • the pixel value determining unit 266 confirms the depth image corresponding to each reference image.
  • the pixel value of the depth image represents the distance from the screen surface of the object that appears as an image in the corresponding reference image. Therefore, by comparing the distance from the reference viewpoint to the point 26 and the depth value of the pixel in which the image of the point 26 in the depth image should appear, it is determined whether or not the image is the image of the point 26.
  • In the example of FIG. 8, the distances Dd and De to the object at the corresponding pixels, obtained from the depth images of the reference viewpoints 28d and 28e, differ from the distances from those reference viewpoints to the point 26 by a threshold value or more, so those reference images are excluded.
  • Conversely, the distances Da and Db to the object at the corresponding pixels, obtained from the depth images of the reference viewpoints 28a and 28b, are determined by the threshold test to be substantially the same as the distances from the reference viewpoints 28a and 28b to the point 26, so those reference images can be identified as representing the image of the point 26.
  • the pixel value determination unit 266 selects the reference image used for calculating the pixel value for each pixel of the display image by performing the screening using the depth value in this way.
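  • A minimal sketch of this screening by depth value is shown below; it assumes hypothetical helper functions project_to_reference() and sample_depth() that map a world-space point to the pixel where it should appear in a given reference image and read the stored depth value there. These helpers and the threshold value are illustrative and not part of the embodiment's stated interface.
```python
import numpy as np

def select_reference_views(point, ref_viewpoints, depth_images,
                           project_to_reference, sample_depth,
                           threshold=0.05):
    """Return indices of reference images in which `point` is actually visible.

    For each reference viewpoint, the distance to the point is compared with
    the depth value stored at the pixel where the point would project; if the
    two differ by less than `threshold`, the point is not occluded in that
    reference image and the view is kept.
    """
    selected = []
    for i, viewpoint in enumerate(ref_viewpoints):
        distance = np.linalg.norm(np.asarray(point) - np.asarray(viewpoint))
        u, v = project_to_reference(point, i)        # pixel where the point should appear
        depth = sample_depth(depth_images[i], u, v)  # stored distance to the nearest surface
        if abs(distance - depth) < threshold:
            selected.append(i)
    return selected
```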
  • FIG. 8 illustrates five reference viewpoints, but in practice the comparison using depth values is performed for all of the reference viewpoints distributed as shown in FIG. 7. This allows a highly accurate display image to be drawn.
  • On the other hand, referring to roughly 25 depth images and reference images for every pixel of the display image may impose a load that cannot be overlooked, depending on the processing performance of the apparatus. Therefore, prior to selecting the reference images used for determining the pixel value as described above, the reference images that become selection candidates may be narrowed down according to a predetermined criterion. For example, reference viewpoints existing within a predetermined range from the virtual camera 30 are extracted, and the selection processing using depth values is performed only on the reference images from those viewpoints.
  • In doing so, an upper limit such as 10 or 20 may be set on the number of reference viewpoints to be extracted, and the extraction range may be adjusted so as not to exceed that limit, or the reference viewpoints may be selected at random or according to a predetermined rule.
  • the number of reference viewpoints to be extracted may be varied depending on the area on the display image. For example, when virtual reality is realized using a head-mounted display, since the center area of the display image coincides with the direction in which the user's line of sight is directed, drawing with higher accuracy than the peripheral area is desirable.
  • a certain number of reference viewpoints are selected as candidates for pixels that are within a predetermined range from the center of the display image, while the number of selection candidates is reduced for pixels that are outside of that range.
  • the number of regions is not limited to two, and may be three or more regions. In addition to the division depending on the distance from the center of the display image, it is conceivable to dynamically divide according to the image area of the object of interest.
  • FIG. 9 is a diagram for explaining a method in which the pixel value determining unit 266 determines the pixel value of the display image. As shown in FIG. 8, it is assumed that the image of the point 26 of the object 24 is represented in the reference images of the reference viewpoints 28a and 28b.
  • the pixel value determination unit 266 basically determines the pixel value of the image of the point 26 in the display image corresponding to the actual viewpoint by blending the pixel values of the image of the point 26 in those reference images.
  • In the illustrated example, the pixel value C in the display image is calculated as C = w1·c1 + w2·c2, where c1 and c2 are the pixel values of the image of the point 26 in the reference images of the reference viewpoints 28a and 28b, and w1 and w2 are weighting coefficients satisfying w1 + w2 = 1.
  • Generalizing, when N is the number of reference images to be used, i (1 ≤ i ≤ N) is the identification number of the reference viewpoint, Δi is the distance from the virtual camera 30 to the i-th reference viewpoint, and ci and wi are the corresponding pixel value and weighting coefficient, C = Σi wi·ci, with a larger weighting coefficient given to a reference viewpoint closer to the virtual camera 30.
  • When the virtual camera 30 coincides with one of the reference viewpoints (Δi = 0), the weighting coefficient for the pixel value of the corresponding reference image is set to 1, and the weighting coefficients for the pixel values of the other reference images are set to 0.
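  • The following sketch blends the reference pixel values using one concrete choice of weighting, namely coefficients inversely proportional to the square of the distance Δi and normalized to sum to 1; as noted below, the embodiment does not fix the formula, so this is only an illustrative assumption. The special case in which the virtual camera coincides with a reference viewpoint is handled by giving that reference image a weight of 1.
```python
import numpy as np

def blend_pixel(ref_colors, distances, eps=1e-6):
    """Blend reference-image pixel values c_i into a display pixel value C.

    distances[i] is the distance from the virtual camera to the i-th
    reference viewpoint.  The inverse-square weighting is an illustrative
    assumption; the text notes the formula is not limited to this.
    """
    ref_colors = np.asarray(ref_colors, dtype=float)   # shape (N, 3) for RGB
    distances = np.asarray(distances, dtype=float)     # shape (N,)
    if np.any(distances < eps):                        # camera sits on a reference viewpoint
        weights = (distances < eps).astype(float)
    else:
        weights = 1.0 / distances ** 2
    weights /= weights.sum()
    return weights @ ref_colors                        # C = sum_i w_i * c_i
```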
  • However, the calculation formula is not limited to this.
  • The parameter used for calculating the weighting coefficient is also not limited to the distance from the virtual camera to the reference viewpoint.
  • For example, the angles θa and θb (0 ≤ θa, θb ≤ 90°) formed between the line-of-sight vector Vr from the virtual camera 30 to the point 26 and the line-of-sight vectors Va and Vb from the respective reference viewpoints to the point 26 may be used.
  • For example, the weighting coefficient may be calculated so that it is larger the smaller this angle is.
  • The specific calculation formula is not particularly limited.
  • The weighting coefficient may also be determined by evaluating the "closeness of state" from both the distance and the angle.
  • the surface shape of the object 24 at the point 26 may be taken into consideration.
  • The brightness of light reflected from an object generally has an angle dependence based on the inclination (normal) of the surface. Therefore, the angle formed between the normal vector at the point 26 and the line-of-sight vector Vr from the virtual camera 30 may be compared with the angle formed between that normal vector and the line-of-sight vector Va or Vb from each reference viewpoint, and a larger weighting coefficient may be given the smaller the difference between these angles is.
  • the function itself for calculating the weighting coefficient may be switched depending on attributes such as the material and color of the object 24.
  • For example, in the case of a material in which the specular reflection component is dominant, the reflected light has strong directivity, and the observed color varies greatly depending on the angle of the line-of-sight vector.
  • In the case of a material in which the diffuse reflection component is dominant, the color does not change so much with the angle of the line-of-sight vector. Therefore, in the former case, a function may be used that increases the weighting coefficient for reference viewpoints whose line-of-sight vectors are closer to the line-of-sight vector Vr from the virtual camera 30 to the point 26; in the latter case, the weighting coefficients for all the reference viewpoints may be made equal, or a function whose angle dependence is smaller than in the specular-dominant case may be used.
  • For the same reason, in the case of a material in which the diffuse reflection component is dominant, the reference images used to determine the pixel value C of the display image may be thinned out, or only reference images having line-of-sight vectors whose angles are within a predetermined value of the actual line-of-sight vector Vr may be used, so that the number of reference images itself is reduced and the calculation load is suppressed.
  • In order to switch the weighting rule in this way, the reference image data storage unit 256 stores, in association with the reference image, data representing attributes such as the material of the object represented by each image in the reference image.
  • According to these modes, the surface shape and material of the object can be taken into account, and the directivity of light due to specular reflection and the like can be reflected in the display image more accurately.
  • Note that the weighting coefficient may be determined by combining two or more of: a calculation based on the shape of the object, a calculation based on its attributes, a calculation based on the distance from the virtual camera to the reference viewpoint, and a calculation based on the angles formed by the respective line-of-sight vectors.
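  • The following sketch is one illustrative way of combining such factors into a single weighting function: it decreases with the distance from the virtual camera to the reference viewpoint, with the angle between the line-of-sight vectors Vr and Va, and with the mismatch of the angles that those vectors form with the surface normal, and it sharpens the angle dependence for a specular-dominant material. The specific functional forms and constants are assumptions, since the embodiment leaves them unspecified.
```python
import numpy as np

def angle_between(u, v):
    """Angle in radians between two vectors."""
    cosv = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    return np.arccos(np.clip(cosv, -1.0, 1.0))

def combined_weight(cam_pos, ref_pos, point, normal, specular_dominant):
    """Illustrative combination of distance, angle, normal, and material factors."""
    cam_pos, ref_pos, point, normal = map(np.asarray, (cam_pos, ref_pos, point, normal))
    vr = point - cam_pos        # line-of-sight vector from the virtual camera
    va = point - ref_pos        # line-of-sight vector from the reference viewpoint
    distance = np.linalg.norm(ref_pos - cam_pos)
    view_angle = angle_between(vr, va)                                   # closeness of viewing direction
    normal_mismatch = abs(angle_between(normal, vr) - angle_between(normal, va))
    sharpness = 8.0 if specular_dominant else 1.0                        # stronger directivity for specular materials
    return np.exp(-sharpness * (view_angle + normal_mismatch)) / (1.0 + distance ** 2)
```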
  • FIG. 10 is a flowchart illustrating a processing procedure in which the display image generation apparatus 200 generates a display image corresponding to the viewpoint. This flowchart is started when an application or the like is started by a user operation, an initial image is displayed, and a viewpoint movement is accepted. As described above, various information processing such as an electronic game may be performed in parallel with the display processing illustrated.
  • the space construction unit 262 forms an initial state of a three-dimensional space in which a display target object exists in the world coordinate system (S10).
  • the viewpoint information acquisition unit 260 identifies the position of the viewpoint and the direction of the line of sight at that time based on the position and orientation of the user's head (S12).
  • the projection unit 264 sets a view screen for the viewpoint, and projects an object existing in the display target space (S14).
  • the pixel value determining unit 266 sets one target pixel among the pixels inside the mesh thus projected (S16), and selects a reference image used for determining the pixel value (S18).
  • Specifically, the reference images in which the point on the object represented by the target pixel appears as an image are determined based on the depth image of each reference image. The pixel value determination unit 266 then determines weighting coefficients based on the positional relationship between the reference viewpoints of those reference images and the virtual camera corresponding to the actual viewpoint, the shape of the object, its material, and the like, and determines the value of the target pixel by weighted-averaging the corresponding pixel values of the reference images (S20).
  • It should be understood by those skilled in the art that the calculation for deriving the pixel value of the target pixel from the pixel values of the reference images can take various forms, such as statistical processing and interpolation processing, in addition to the weighted average.
  • The processes of S16 to S20 are repeated for all pixels on the view screen (N in S22, then S16).
  • When the values of all pixels have been determined, the output unit 268 outputs the data to the head-mounted display 100 as display image data (S24).
  • When display images for the left eye and the right eye are generated, the processes of S16 to S22 are performed for each of them, and the images are connected and output as appropriate.
  • The space construction unit 262 then forms the display target space for the next time step (S10); that is, the object is moved or deformed by one time step from the initial state.
  • the pixel values are determined using the reference image for all the pixels on the view screen, but the drawing method may be switched depending on the region on the display image and the position of the viewpoint. For example, for an image of an object that does not require changes in light or color due to viewpoint movement, only conventional texture mapping may be performed. In addition, a state observed only from a local viewpoint, such as reflected light with high directivity, may not be expressed from the surrounding reference image. Therefore, the amount of data prepared as a reference image can be suppressed by switching to rendering by ray tracing only when the viewpoint enters the corresponding range.
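  • The per-frame flow of FIG. 10 can be summarized by the following sketch; the five arguments stand in for the space construction unit, viewpoint information acquisition unit, projection unit, pixel value determination unit, and output unit, and their method names are hypothetical.
```python
def display_loop(space_builder, viewpoint_source, projector, pixel_shader, output):
    """Per-frame procedure following the flow of FIG. 10 (S10-S24); interfaces are illustrative."""
    space = space_builder.initial_state()                      # S10: construct the display target space
    while not output.should_stop():
        viewpoint = viewpoint_source.current()                 # S12: position of viewpoint, gaze direction
        screen = projector.set_view_screen(viewpoint)
        fragments = projector.project(space, screen)           # S14: project object meshes onto the screen
        image = {}
        for pixel, surface_point in fragments.items():         # S16: one target pixel at a time
            refs = pixel_shader.select_references(surface_point)              # S18: depth-based selection
            image[pixel] = pixel_shader.blend(refs, surface_point, viewpoint) # S20: weighted average
        output.send(image)                                     # S24: output the completed display image
        space = space_builder.advance(space)                   # next time step (back to S10)
```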
  • FIG. 11 shows an example of the structure of data stored in the reference image data storage unit 256.
  • the reference image data 270 has a data structure in which the reference viewpoint position coordinates 274, the reference image 276, and the depth image 278 are associated with each reference image identification information 272.
  • the reference viewpoint position coordinates 274 are three-dimensional position coordinates in the virtual space set by the reference viewpoint setting unit 310 in consideration of the movable range of the user 12.
  • the reference image 276 is moving image data representing a space including a moving object when viewed from each reference viewpoint.
  • the depth image 278 also becomes moving image data representing the distance from the screen surface of the space including the moving object.
  • In the figure, the reference images are represented by labels such as “moving image A”, “moving image B”, and “moving image C”, and the depth images by “depth moving image A”, “depth moving image B”, and “depth moving image C”, but in practice the data may include information such as the storage area in the reference image data storage unit 256.
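  • A minimal sketch of one record of the reference image data 270 in FIG. 11; the field and type names are illustrative, and the video fields could equally hold storage-area information as noted above.
```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ReferenceImageData:
    """One entry of the reference image data 270 in FIG. 11 (illustrative names)."""
    ref_id: int                            # reference image identification information 272
    viewpoint: Tuple[float, float, float]  # reference viewpoint position coordinates 274
    reference_video: str                   # reference image 276, e.g. "moving image A" or a storage location
    depth_video: str                       # depth image 278, e.g. "depth moving image A" or a storage location

table = [
    ReferenceImageData(0, (0.0, 1.4, 0.0), "moving image A", "depth moving image A"),
    ReferenceImageData(1, (1.0, 1.4, 0.0), "moving image B", "depth moving image B"),
]
```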
  • FIG. 12 shows an example of setting a reference viewpoint for representing a moving object.
  • the representation of the figure is the same as in FIG.
  • An object 34 and a moving object 35 exist in the virtual space shown in (a) and (b) of the figure.
  • the reference viewpoint setting unit 310 of the reference image generating apparatus 300 sets five reference viewpoints 30a, 30b, 30c, 30d, and 30e.
  • (a) shows a mode in which the reference viewpoint is not moved.
  • the change in each reference image is mainly limited to the image area of the object 35. That is, in each frame of the moving image of the reference image and the moving image of the depth image, no change occurs in a wide range of areas, and therefore, for example, the data size can be reduced by applying a compression method using inter-frame difference.
  • (b) shows a mode in which at least some of the reference viewpoints 30a to 30e are moved so as to follow the movement of the object 35, giving the reference viewpoints 36a to 36e.
  • In the illustrated example, the four reference viewpoints 30a to 30d are moved to the reference viewpoints 36a to 36d with the same velocity vector as that of the object 35.
  • The movement rule is not limited to this; it is sufficient to move the reference viewpoints so that, for example, the distance to the object does not exceed a predetermined threshold and the distance between reference viewpoints does not fall below a predetermined threshold.
  • the reference viewpoint setting rule is appropriately selected in consideration of the level of detail required for the display image, the range of movement of the object, a suitable data size, and the like when expressing the object.
  • For example, reference viewpoints in charge of an object are distributed within a predetermined range, and the positions of those reference viewpoints are controlled so that their positional relationship with the object is maintained.
  • Here, being "in charge" refers only to tracking the position; the reference image may represent all objects visible from the reference viewpoint.
  • Alternatively, only the image of the object in charge may be represented in the reference image, and such images may be combined when determining the pixel values of the display image.
  • In that case, for example, the display image is overwritten using the reference image representing only the object in the foreground.
  • a certain reference viewpoint may be moved by an average vector of moving speed vectors of a plurality of objects.
  • In a mode in which the reference viewpoints are moved in this way, among the reference image data shown in FIG. 11, the position coordinates of the reference viewpoint change along the time axis.
  • the reference image generating apparatus 300 stores the reference image data and the reference viewpoint position coordinates in the reference image data storage unit 256 in association with each time step.
  • The pixel value determination unit 266 of the display image generation apparatus 200 calculates the weighting coefficients based on the positional relationship between the reference viewpoints and the user's viewpoint at the same time step, and then determines the pixel values of the display image for that time step.
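  • A sketch of the viewpoint-tracking mode described above: the reference viewpoints in charge of a moving object are shifted by its velocity vector and then pulled back if they drift farther from the object than a threshold. The threshold value and the correction rule are illustrative assumptions; the per-time-step positions produced here would be stored together with the reference image frames, as stated above.
```python
import numpy as np

def update_reference_viewpoints(viewpoints, follow_mask, object_velocity, dt,
                                object_pos, max_dist=3.0):
    """Move the reference viewpoints in charge of a moving object by its velocity vector.

    viewpoints: (M, 3) array of reference viewpoint positions.
    follow_mask: boolean array marking the viewpoints in charge of the object.
    max_dist is an illustrative threshold for the distance to the object.
    """
    viewpoints = np.asarray(viewpoints, dtype=float).copy()
    object_pos = np.asarray(object_pos, dtype=float)
    viewpoints[follow_mask] += np.asarray(object_velocity, dtype=float) * dt
    for i in np.flatnonzero(follow_mask):
        offset = viewpoints[i] - object_pos
        dist = np.linalg.norm(offset)
        if dist > max_dist:                       # pull the viewpoint back toward the object
            viewpoints[i] = object_pos + offset * (max_dist / dist)
    return viewpoints
```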
  • FIG. 13 is a diagram for describing a mode in which the reference image used for generating the display image is switched according to the movement of the object.
  • the way of representing the figure is the same as in FIG. That is, the objects 34 and 35 exist in the virtual space, and the latter moves as indicated by arrows.
  • the reference image generation apparatus 300 sets fixed reference viewpoints 38a to 38f so as to cover the movement range of the object, and generates respective reference images.
  • The display image generation device 200 switches the reference images used for display according to the movement of the object. For example, at the initial position of the object 35, the reference images indicated by solid lines (the reference images of the reference viewpoints 38a, 38b, 38c, and 38f) are used to generate the display image.
  • When the object 35 moves as indicated by the arrow, the reference images of the reference viewpoints 38d and 38e indicated by broken lines are used instead of the reference images of the reference viewpoints 38b and 38f indicated by thick solid lines.
  • That is, reference images corresponding to reference viewpoints whose distance from the object 34 or 35 is smaller than a threshold value are used to generate the display image.
  • In this way, the object can be expressed with a stable level of detail, in the same manner as when the reference viewpoints are moved.
  • Moreover, since the viewpoint of each reference image does not move, the region that changes between frames is limited and the compression efficiency increases.
  • On the other hand, the number of reference image moving images tends to increase.
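  • A sketch of the switching criterion with fixed reference viewpoints: at each time step, only reference images whose viewpoints lie within a threshold distance of some object are used. The threshold value is an illustrative assumption.
```python
import numpy as np

def active_reference_views(ref_viewpoints, object_positions, threshold=3.0):
    """Indices of fixed reference viewpoints used at the current time step.

    A reference viewpoint is used when its distance to at least one object is
    smaller than the threshold, matching the switching criterion above; the
    threshold value itself is an illustrative assumption.
    """
    ref = np.asarray(ref_viewpoints, dtype=float)       # shape (M, 3)
    obj = np.asarray(object_positions, dtype=float)     # shape (K, 3)
    dists = np.linalg.norm(ref[:, None, :] - obj[None, :, :], axis=-1)  # (M, K) pairwise distances
    return np.flatnonzero((dists < threshold).any(axis=1))
```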
  • As described above, the reference image in the present embodiment is basically moving image data. Therefore, the data can be stored in the reference image data storage unit 256 or transmitted after being compression-encoded with a general moving image compression encoding method such as MPEG (Moving Picture Experts Group).
  • Alternatively, when an omnidirectional image is represented by equirectangular projection, it may be converted into coefficients of spherical harmonics and compressed. Further, each frame may be compressed using a general still image compression encoding method such as JPEG (Joint Photographic Experts Group).
  • FIG. 14 shows the configuration of the functional blocks of the reference image data generation unit of the reference image generation device 300 and the pixel value determination unit of the display image generation device 200 when a compression/decompression processing function for the reference image data is introduced.
  • the reference image data generation unit 318a includes a reference image generation unit 330, a depth image generation unit 332, and a data compression unit 334.
  • the reference image generation unit 330 and the depth image generation unit 332 generate the reference image and depth image data as described above. That is, a moving image of the reference image representing the state of the space from each reference viewpoint set by the reference viewpoint setting unit 310 and a moving image of the depth image representing the distance value are generated.
  • the reference viewpoint may be fixed, or a part thereof may be moved according to the movement of the object.
  • The data compression unit 334 compresses the reference images and depth images thus generated at a predetermined rate with respect to the time axis according to a predetermined rule. Specifically, at least one of the following processes is performed: (1) the reference image and the depth image at the same time step are reduced as necessary and connected into an integrated moving image in which they are expressed as a single frame; (2) only regions with changes in the reference image and the depth image are represented as time-series data.
  • the data compression unit 334 stores the compressed data in the reference image data storage unit 256. At this time, one frame of the integrated image or the image of the area with change may be further compressed by JPEG. Alternatively, the moving image of the integrated image may be compressed by MPEG.
  • the pixel value determination unit 266a includes a data decompression unit 336, a reference unit 338, and a calculation unit 340.
  • the data decompression unit 336 restores the reference image and the depth image by reading the reference image data at each time step from the reference image data storage unit 256 and decompressing the data.
  • When the compression of (1) has been performed, the data decompression unit 336 cuts out the reference image and the depth image from each frame of the integrated moving image and enlarges them as necessary.
  • When the compression of (2) has been performed, only the changed regions of the image of the previous frame are updated using the time-series data.
  • The reference unit 338 uses the depth images of each time step restored in this way to select, for each pixel of the display image, the reference images in which the point on the object to be drawn is represented, and acquires the pixel values from those reference images. The calculation unit 340 then determines the pixel value of the display image by averaging the acquired pixel values with appropriate weights, as described above.
  • FIG. 15 schematically illustrates an example of an integrated moving image generated by the data compression unit 334.
  • The integrated moving image 42 has a data structure in which one frame 40 is divided into four regions, each of which represents a frame of the same time step of the “first reference image” and “second reference image” generated for two reference viewpoints and of the corresponding “first depth image” and “second depth image”.
  • the data compression unit 334 appropriately reduces the frames of the reference image and the depth image according to the size of the image plane set in the integrated moving image 42, and connects them in a predetermined arrangement as illustrated.
  • the data compression unit 334 reduces the frame of each reference image and depth image to 1/2 in both the vertical and horizontal directions.
  • The data compression unit 334 further associates the position coordinates of the two reference viewpoints whose data have been integrated with the integrated moving image as its additional data.
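  • The following sketch assembles one frame of the integrated moving image of FIG. 15 by halving two reference frames and their depth frames and tiling them into four regions; for simplicity all inputs are assumed to be single-channel 2-D arrays of the same size, and the 2x2-averaging reduction is an illustrative choice.
```python
import numpy as np

def integrate_frame(ref1, ref2, depth1, depth2):
    """Pack two reference frames and their depth frames into one frame (FIG. 15).

    Each input image is reduced to 1/2 in both directions by simple 2x2
    averaging (an illustrative reduction method) and placed in one quadrant
    of the integrated frame.
    """
    def half(img):
        h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
        img = img[:h, :w].astype(float)
        return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4

    r1, r2, d1, d2 = map(half, (ref1, ref2, depth1, depth2))
    top = np.concatenate([r1, r2], axis=1)      # first / second reference image
    bottom = np.concatenate([d1, d2], axis=1)   # first / second depth image
    return np.concatenate([top, bottom], axis=0)
```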
  • FIG. 16 schematically shows another example of the integrated moving image generated by the data compression unit 334.
  • This integrated moving image 46 is divided into four areas obtained by dividing one frame 44, and “first reference image”, “second reference image”, and “third reference image” generated for three reference viewpoints. ”And the corresponding“ first depth image ”,“ second depth image ”, and“ third depth image ”, have a data structure representing frames of the same time step.
  • the channels and gradations to be used are not limited by representing the “first depth image” and the “second depth image” in different regions of the image plane.
  • In contrast, the integrated moving image 46 shown in FIG. 16 represents the “first depth image”, “second depth image”, and “third depth image” in the same region of the image plane, using the three channels of red (R), green (G), and blue (B). The three reference images can then be represented in the remaining three regions.
  • the image reduction rate is the same as in FIG. 15, but data of three reference viewpoints can be included in one moving image.
  • As a result, synchronization processing and decoding/decompression processing can be made more efficient while maintaining image quality.
  • Note that if the RGB images are converted into YCbCr images for compression encoding, the pixel values of the depth images held in the separate channels affect one another when the display image generating apparatus 200 performs decoding and decompression, and cannot be completely restored. It is therefore desirable to employ a compression encoding method that can accurately restore the RGB values.
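  • A minimal sketch of the channel packing used in FIG. 16, assuming each depth image has already been quantized to a single 8-bit channel; because the three depth images share one region, the note above about accurately restoring RGB values applies directly.

```python
import numpy as np

def pack_depths_rgb(d1, d2, d3):
    # Pack three single-channel depth images into the R, G and B channels
    # of one region of the integrated frame (all must share the same shape).
    return np.stack([d1, d2, d3], axis=-1)

def unpack_depths_rgb(region):
    # Inverse operation on the decoded region; only exact if the encoding
    # preserves RGB values (e.g. lossless or RGB-native compression).
    return region[..., 0], region[..., 1], region[..., 2]
```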
  • FIG. 17 is a diagram for explaining a technique for converting only an image in a region having a change into time-series data as one of the compression processes performed by the data compression unit 334.
  • Here, a moving image representing a car traveling on a road is assumed, and (a) shows six consecutive frames of the reference image, with the horizontal axis representing time.
  • each frame of the reference image represents an omnidirectional image viewed from the reference viewpoint in equirectangular projection. In this case, there is almost no movement in the road and background other than the automobile, which is the object of interest.
  • (b) in the figure shows fixed-size regions (for example, region 50) containing the automobile, extracted from each of the frames shown in (a).
  • the data compression unit 334 stores the entire region of a frame at a certain point in time, for example frame 52, as a reference frame, and for the subsequent time steps stores the time-series data of the images of the fixed-size region containing the object (for example, image 54) together with the position information of that region on the reference image plane, in association with each other, to obtain the compressed reference image data.
  • the data decompression unit 336 uses the reference frame as the reference image for the time step to which it is assigned, and for the subsequent time steps restores the image by updating only the area stored as time-series data.
  • the image 54 of the fixed-size area containing the object may have a higher resolution than the corresponding area 50 in the reference frame. In this way, even if the reference frame is reduced to shrink the data size, the level of detail can be maintained in the object region that the user is expected to watch.
  • the reference frame may be the first frame of each moving image or may be a frame at a predetermined time interval.
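  • The following is a simplified sketch of the compression and restoration of (b), assuming a fixed region given as (x, y, w, h) on the reference image plane and frames held as NumPy arrays; resolution enhancement of the extracted region and periodic reference frames are omitted for brevity.

```python
import numpy as np

def compress_fixed_region(frames, box):
    # frames: the reference image of each time step; box: (x, y, w, h) of the
    # fixed-size region containing the object on the reference image plane.
    x, y, w, h = box
    return {
        "reference_frame": frames[0],                              # whole image kept once
        "box": box,                                                # position information
        "regions": [f[y:y+h, x:x+w].copy() for f in frames[1:]],  # time-series data
    }

def decompress_fixed_region(data):
    x, y, w, h = data["box"]
    restored = [data["reference_frame"]]
    for region in data["regions"]:
        frame = restored[-1].copy()
        frame[y:y+h, x:x+w] = region                               # update only the changed area
        restored.append(frame)
    return restored
```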
  • (c) in the figure further extracts only the image area of the object, for example a rectangular area whose four sides lie at a predetermined distance from the outline of the object.
  • the size of the extracted region varies depending on the positional relationship between the reference viewpoint and the object.
  • the data compression unit 334 extracts the object image from each frame of the reference image shown in (a). Then, the entire region of a frame at a certain point in time, for example frame 52, is stored as a reference frame, and for the subsequent time steps the time-series data of the images of the object region (for example, image 56) are stored in association with the position information and size information of that region on the reference image plane, to obtain the compressed reference image data.
  • an image representing only the object may be generated as the image 56 when the reference image generating unit 330 generates the reference image.
  • the screen surface may be adjusted so that the object is zoomed while the reference viewpoint is fixed.
  • the operation of the data decompression unit 336 is the same as in the case of (b).
  • the modes (a) to (c) can be implemented not only for the reference image but also for the depth image.
  • the compression method applied to the reference image and the depth image may be the same or different. According to the compression method (c), object information can be held with the same level of detail regardless of the distance between the reference viewpoint and the object.
  • FIG. 18 is a diagram for explaining a method of using information representing only pixels with changes as time-series data as one of the compression processes performed by the data compression unit 334.
  • the horizontal axis of the figure indicates time.
  • the image 60 is one frame of the reference image or a part thereof.
  • the image 62a corresponds to the frame following the image 60; pixels whose values differ from the image 60 by a predetermined value or more are shown in gray.
  • the image 62b corresponds to the frame after that, and likewise pixels whose values differ from the previous frame by a predetermined value or more are shown in gray.
  • the data compression unit 334 takes the inter-frame difference of the reference image and extracts the pixels whose values differ by a predetermined value or more. As a result, in the illustrated example, the front portion of the automobile, including the hood and bumper, and the pixels representing the road surface in front of the automobile are extracted. The data compression unit 334 then generates images 64a and 64b that hold, in raster order, the data (x, y, R, G, B) composed of the position coordinates of the extracted pixels and their changed pixel values as 5-channel pixel values.
  • Here, (x, y) are the position coordinates of the pixel on the reference image plane, and (R, G, B) is the pixel value of the reference image, that is, the color value.
  • the data decompression unit 336 uses the reference frame as the reference image for the time step to which it is assigned, and for the subsequent time steps restores the image by updating only the pixels stored as time-series data. The same applies to the depth image. As a result, the data size can be reduced further than in the mode shown in FIG. 17, since the shape of the object is taken into account.
  • the reference frame may be the first frame of each moving image or may be a frame at a predetermined time interval.
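  • As a sketch of the pixel-level difference data of FIG. 18, the following records changed pixels as (x, y, R, G, B) rows in raster order and applies them to the previous frame on restoration; the threshold value and function names are illustrative assumptions.

```python
import numpy as np

def encode_changed_pixels(prev, curr, threshold=8):
    # Extract pixels whose value differs from the previous frame by the
    # threshold or more, as rows of (x, y, R, G, B) in raster order.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16)).max(axis=-1)
    ys, xs = np.nonzero(diff >= threshold)   # nonzero scans the image in raster order
    return np.column_stack([xs, ys, curr[ys, xs]]).astype(np.int32)

def apply_changed_pixels(prev, records):
    # Restore the next frame by updating only the pixels stored as time-series data.
    out = prev.copy()
    xs, ys = records[:, 0], records[:, 1]
    out[ys, xs] = records[:, 2:5].astype(prev.dtype)
    return out
```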
  • the embodiment of FIG. 17 and the embodiment of FIG. 18 may be appropriately combined.
  • FIG. 19 exemplifies two frames that move back and forth in the moving image of the reference image.
  • In many cases, the difference between frames is limited to only part of the image. Even in the illustrated image of the traveling automobile, only a slight movement of the automobile images 70a and 70b and slight changes in reflection on the roads 72a and 72b occur between the upper frame and the lower frame.
  • regions 74a and 74b above the road on the image plane are distant views.
  • the distant view is different in nature from the surfaces of the objects arranged in the display target space assumed in the present embodiment, and often does not need to change with the movement of the user's viewpoint. Therefore, an image taken from a predetermined reference viewpoint may be reflected in the display image separately, by texture mapping or the like. In other words, there is less need to hold the image data of that area for each reference viewpoint.
  • the compression process may be controlled in units of tile images after the reference image and the depth image are divided into tile images of a predetermined size.
  • FIG. 20 is a diagram for explaining a method in which the data compression unit 334 controls the compression process of the reference image in units of tile images.
  • the illustrated image corresponds to one frame shown in FIG. 19, and a matrix-like rectangle divided by a lattice represents a tile image.
  • the size of the tile image is set in advance.
  • the tile images surrounded by white frames, which belong to the distant view area 80, do not need to reflect the movement of the user's viewpoint as described above, and are therefore excluded from the reference image data held for each reference viewpoint.
  • Since the tile images surrounded by the remaining black lines belong to the foreground, that is, to the region 82 used for drawing the object, they are included in the reference image data for each reference viewpoint as time-series data.
  • Among these, only the tile images having a difference from the previous frame, such as the tile image surrounded by a solid line (for example, tile image 84), may be extracted and included in the reference image data as time-series data. For example, when the average pixel value of the tile image at the same position differs between frames by a predetermined value or more, it is determined that a difference from the previous frame has occurred, and the tile image is extracted.
  • Alternatively, pixels having a difference of a predetermined value or more from the previous frame may be extracted, and an image representing data composed of the position coordinates and pixel values of those pixels may be generated. This processing is as described with reference to FIG. 18.
  • In this way, data can be omitted or the degree of compression can be controlled in units of tile images.
  • When the distance value must be expressed in the 256 gradations of SDR (Standard Dynamic Range), information after the decimal point is lost. If the original pixel values (distance values) are instead stored as floating-point data in units of tile images, the resolution of the distance values increases, and the reference image used for drawing can be selected with high accuracy.
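  • A minimal sketch of the tile-unit control described above, assuming a tile size of 64 pixels and a mean-value threshold; the foreground/distant-view classification is taken as given (for example, produced when the space is constructed), and the names used here are illustrative.

```python
import numpy as np

TILE = 64  # tile size in pixels (set in advance)

def iter_tiles(img):
    h, w = img.shape[:2]
    for ty in range(0, h, TILE):
        for tx in range(0, w, TILE):
            yield (tx // TILE, ty // TILE), img[ty:ty + TILE, tx:tx + TILE]

def tiles_to_store(prev, curr, is_foreground, threshold=2.0):
    # Return the positions of foreground tiles whose average pixel value
    # differs from the previous frame by the threshold or more.
    changed = []
    for (pos, t_prev), (_, t_curr) in zip(iter_tiles(prev), iter_tiles(curr)):
        if not is_foreground(pos):
            continue                    # distant-view tiles are excluded entirely
        if abs(float(t_curr.mean()) - float(t_prev.mean())) >= threshold:
            changed.append(pos)         # only these tiles enter the time-series data
    return changed
```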
  • FIG. 21 shows an example of the structure of data after compression in a mode in which the compression processing of the reference image and the depth image is controlled in units of tile images.
  • the post-compression reference image data 350 is generated for each reference viewpoint, and has a data structure in which the tile image data is connected in time series in association with the position coordinates of the tile image (denoted as “tile position”) on the image plane.
  • the time series are in the order of “frame numbers” 0, 1, 2,.
  • For tile positions in the distant view, the image in that area is not used for drawing the object and is therefore invalid as reference image data.
  • “-” indicates that the data of the tile image is invalid.
  • For valid tile positions, the tile image data of the first frame (frame number “0”), such as “image a” and “image b”, is included in the reference image data.
  • In the illustrated example, the tile images at the position coordinates (70, 65) and (71, 65) have a change at the frame number “1”, so images representing those differences are included in association with that frame number. A further difference image c2 is included in association with the frame number “2”.
  • the difference image is an image representing a difference from the previous frame, and corresponds to, for example, the images 64a and 64b in FIG.
  • Similarly, the tile image at the position coordinates (30, 50) has a change at the frame number “24” and the tile image at the position coordinates (31, 50) has a change at the frame number “25”, so the images “difference image a1” and “difference image b1” representing the respective differences are included.
  • the data decompression unit 336 of the display image generation apparatus 200 restores the reference image and the depth image of the first frame by connecting the tile images associated with the frame number “0” based on their position coordinates. For the subsequent frames, the moving images of the reference image and the depth image can be restored by updating, only in the tile areas for which a difference image is included, the pixels represented in that difference image.
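  • The compressed structure of FIG. 21 and its restoration can be pictured as follows; the mapping layout, the helper apply_difference, and the use of the (x, y, R, G, B) records of FIG. 18 as the difference format are all assumptions made for illustration.

```python
from typing import Dict, Tuple
import numpy as np

# tile position -> {frame number -> data}; frame 0 carries the full tile image,
# later frame numbers carry a difference image only where a change occurred.
# Tile positions belonging to the distant view are simply absent (invalid).
CompressedViewpoint = Dict[Tuple[int, int], Dict[int, np.ndarray]]

def apply_difference(tile, records):
    # Assumed difference format: rows of (x, y, R, G, B) local to the tile.
    out = tile.copy()
    out[records[:, 1], records[:, 0]] = records[:, 2:5].astype(out.dtype)
    return out

def restore_tiles(data: CompressedViewpoint, frame_no: int, prev_tiles: dict) -> dict:
    # Rebuild the tiles of one frame, updating only tiles that carry data
    # for this frame number; all other tiles are carried over unchanged.
    tiles = dict(prev_tiles)
    for pos, series in data.items():
        if frame_no in series:
            tiles[pos] = series[0] if frame_no == 0 else apply_difference(tiles[pos], series[frame_no])
    return tiles
```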
  • FIG. 22 is a diagram for explaining an example of data compression processing in a case where images of all directions of the reference image and the depth image are represented by a cube map.
  • (a) shows the relationship between the omnidirectional screen surface and the surfaces of a cube map.
  • the cube map surfaces 362 are the surfaces of a cube that encloses the spherical screen surface 360, which lies at the same distance from the viewpoint 364 in all directions.
  • a certain pixel 366 on the screen surface 360 is mapped to a position 368 where a straight line from the viewpoint 364 to the pixel 366 intersects with the surface 362 of the cube map.
  • Such a cube mapping technique is known as one of panoramic image expression means.
  • the reference image and the depth image can be held as cube map data.
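  • The mapping from a direction on the spherical screen surface to a cube map surface can be sketched as follows, using one common face/axis convention; the embodiment does not depend on this particular convention.

```python
def direction_to_cube_face(x, y, z):
    # Map a direction (x, y, z) from the viewpoint 364 to (face, u, v) on the
    # cube map, with u, v in [0, 1]. Faces: 0:+X 1:-X 2:+Y 3:-Y 4:+Z 5:-Z.
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:        # the ray exits through an X face
        face, sc, tc, ma = (0 if x > 0 else 1), (-z if x > 0 else z), -y, ax
    elif ay >= az:                   # the ray exits through a Y face
        face, sc, tc, ma = (2 if y > 0 else 3), x, (z if y > 0 else -z), ay
    else:                            # the ray exits through a Z face
        face, sc, tc, ma = (4 if z > 0 else 5), (x if z > 0 else -x), -y, az
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)
```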
  • (B) is a development view of six surfaces when a depth image at a certain reference viewpoint is represented by a cube map.
  • When the reference image is a moving image, image data such as that illustrated is generated at a predetermined rate.
  • When the space illustrated in FIGS. 17 to 20 is represented, the difference from the previous frame is limited to the area of the car image indicated by the arrow in the figure.
  • With the cube map, only the surface containing the movement (the surface 370 in the illustrated example) can easily be included in the reference image data as time-series data.
  • The operations of the data compression unit 334 and the data decompression unit 336 are the same as described above.
  • the surface of the cube map may be further divided into tile images, and it may be determined whether or not to include them in the reference image data in units of tile images.
  • Alternatively, as shown in FIG. 18, the data may be an “image” representing only the information relating to the pixels where a difference has occurred.
  • When the reference image or depth image is represented by the equirectangular projection, the image of an object directly below or above the viewpoint is stretched laterally at the lower or upper part of the image plane, and the efficiency of data compression deteriorates. With the cube map, by contrast, the change in the image plane is limited to the area corresponding to the change in the space, so the efficiency of data compression can be stabilized.
  • In the modes described so far, a reference image and a depth image are generated as a pair for each reference viewpoint, and both are compressed, decompressed, and used for drawing the object in the same way.
  • the depth image is used to select the reference images to be referred to when drawing each point on the object surface. If this selection is calculated in advance and associated with the positions on the object surface, the depth image itself does not need to be included in the reference image data.
  • FIG. 23 shows the configuration of the functional blocks of the reference image data generation unit of the reference image generation device 300 and the pixel value determination unit of the display image generation device 200 when a function for storing information on the reference image of the reference destination in association with positions on the object surface is introduced.
  • the reference image data generation unit 318b includes a reference image generation unit 330, a depth image generation unit 332, a data compression unit 334, and a reference destination information addition unit 342.
  • the functions of the reference image generation unit 330, the data compression unit 334, and the depth image generation unit 332 are the same as the corresponding functional blocks shown in FIG.
  • the reference destination information adding unit 342 uses the depth images generated by the depth image generation unit 332 to generate, for each position on the object surface, information designating the reference images to be referred to when drawing that position. This process is basically the same as that shown in FIG. 8. That is, the reference images in which a point on the object (such as the point 26 in FIG. 8) appears as an image are determined by comparing the distance to the object indicated by the depth image with the distance from the reference viewpoint to the point in the display target space.
  • The unit of area on the object surface for which the reference destination information adding unit 342 obtains the reference destination is set according to a predetermined rule; specific examples will be described later.
  • the reference destination information adding unit 342 writes the identification information of the reference destination reference image thus determined in association with the object model stored in the object model storage unit 254.
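  • The determination made by the reference destination information adding unit 342 can be sketched as the following visibility test, assuming each reference viewpoint exposes its position and a function that samples its depth image in a given direction; the attribute names and the tolerance are illustrative.

```python
import numpy as np

def visible_reference_images(point, reference_views, tolerance=1e-3):
    # Return the identification information ("A", "B", ...) of every reference
    # image in which `point` on the object surface appears as an image.
    ids = []
    for view in reference_views:
        to_point = np.asarray(point, dtype=float) - np.asarray(view.position, dtype=float)
        distance = float(np.linalg.norm(to_point))
        sampled = view.sample_depth(to_point / distance)   # depth image value in that direction
        # If the depth image records (approximately) the same distance, nothing
        # occludes the point when seen from this reference viewpoint.
        if abs(sampled - distance) <= tolerance * distance:
            ids.append(view.id)
    return ids
```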
  • the data compression unit 334 compresses only the reference image generated by the reference image generation unit 330 by any one of the methods described above, and stores it in the reference image data storage unit 256.
  • the pixel value determination unit 266b of the display image generation device 200 includes a data decompression unit 336, a reference unit 344, and a calculation unit 340.
  • the functions of the data decompression unit 336 and the calculation unit 340 are the same as the corresponding functional blocks shown in FIG.
  • the data decompression unit 336 performs the decompression process only on the reference image stored in the reference image data storage unit 256 as described above.
  • the reference unit 344 determines the reference image used to draw the point on the object corresponding to each pixel of the display image, based on the information added to the object model.
  • a pixel value representing the image of the point is acquired from the determined reference image and supplied to the calculation unit 340.
  • the processing load of the reference unit 344 is reduced, and the display image generation process can be speeded up.
  • Since the identification information of the reference image of the reference destination requires fewer gradations than the distance values of the depth image, the data size can be kept small even when held as time-series data.
  • FIG. 24 is a diagram for explaining an example of a method for associating identification information of a reference image of a reference destination with an object model.
  • the representation of the figure is the same as in FIG. That is, five reference viewpoints are set in a space where the object 424 exists, and reference images 428a, 428b, 428c, 428d, and 428e are generated.
  • the identification information of each reference image (or reference viewpoint) is “A”, “B”, “C”, “D”, and “E”.
  • the reference destination information adding unit 342 associates the identification information of the reference images to be referred to in units of the vertices of the object 424, indicated by circles, or in units of the surfaces (meshes) enclosed by the straight lines connecting the vertices.
  • For example, if it is found that the surface 430a appears in the reference images of the identification information “A” and “C”, the identification information “A” and “C” is associated with the surface 430a. If it is found that the surface 430b appears in the reference images of the identification information “A” and “B”, the identification information “A” and “B” is associated with the surface 430b. If it is found that the surface 430c appears in the reference images of the identification information “C” and “D”, the identification information “C” and “D” is associated with the surface 430c.
  • In practice, as described above, the reference images of the reference destinations are identified using the depth images, and the identification information is associated accordingly.
  • the identification information to be associated is shown in a balloon from each surface of the object 424.
  • the reference unit 344 of the display image generation device 200 identifies the surface containing the point on the object corresponding to the pixel to be drawn, or a vertex in its vicinity, and acquires the identification information of the reference images associated with it. With such a configuration, the information can be added using the vertex and mesh information already defined in the object model as it is, so an increase in data size can be suppressed. Furthermore, since the reference destinations are narrowed down in the object model, the processing load during display is small.
  • FIG. 25 is a diagram for explaining another example of a method for associating identification information of a reference image of a reference destination with an object model.
  • the distribution of the identification information of the reference image of the reference destination is generated as a texture image.
  • a texture image 432 is generated that represents the identification information of the reference image as a pixel value for each position on the surface. If the reference destination does not change in the plane, the pixel values of the texture image 432 are uniform. When the reference image of the reference destination changes in the plane due to occlusion or the like, the pixel value of the texture image 432 changes so as to correspond to it. This makes it possible to control the reference destination with a smaller granularity than the surface unit.
  • the reference unit 344 of the display image generation device 200 specifies the (u, v) coordinates on the texture image corresponding to the point on the object to be drawn, and reads the identification information of the reference image represented at that position.
  • This process is basically the same as general texture mapping in computer graphics. According to such a configuration, switching of reference destinations within the same plane by occlusion or the like can be realized with a light load without dividing the mesh defined by the object model.
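  • A lookup of the identification information stored in the texture image 432 might look like the following, where nearest-neighbour sampling is used because interpolating between identification values would be meaningless; the array layout is an assumption.

```python
def reference_ids_at(u, v, id_texture):
    # id_texture: 2D array whose entries encode reference image identification
    # information for each position on the object surface (u, v in [0, 1]).
    h, w = id_texture.shape[:2]
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    return id_texture[y, x]
```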
  • FIG. 26 is a diagram for explaining still another example of a method for associating identification information of a reference image of a reference destination with an object model.
  • the space in which the object exists is divided into voxels of a predetermined size, and the identification information of the reference images to be referred to is associated in units of voxels.
  • For example, the identification information “A” and “C” is associated with the voxels containing the surface 430a (for example, the voxels 432a and 432b).
  • reference destination information may be associated with each plane.
  • the reference unit 344 of the display image generation apparatus 200 specifies a voxel including a point on the drawing target object, and acquires identification information of a reference image associated with the voxel. According to such a configuration, an image can be drawn with high accuracy with a unified data structure and processing regardless of the shape of the object and the complexity of the space.
  • In the figure, voxels of the same size are represented by a set of squares, viewed from above.
  • the unit of the three-dimensional space that associates the identification information of the reference image to be referred to is not limited to voxels having the same size.
  • space division by an octree, which is widely known as a method for efficiently retrieving information associated with positions in a three-dimensional space, may be introduced. In this method, the target space is taken as a root box, which is divided in two along each of the three dimensions to form eight boxes, and each box is in turn divided into eight boxes as required. By repeating this process, the space is represented as an octree structure.
  • the size of the boxes that are finally formed can be controlled according to how locally fine-grained the space with which the information is associated needs to be. Furthermore, the relationship between the index numbers given to these boxes and the positions in the space can easily be found by simple bit operations.
  • the reference unit 344 of the display image generation apparatus 200 obtains, by bit operations, the index number of the box containing the point on the object to be drawn, and can thereby quickly identify the identification information of the reference image associated with it.
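  • One way to realize the "simple bit operations" mentioned above is Morton coding, sketched below under the assumption of a cubic root box subdivided to a fixed number of levels; the choice of Morton coding is an illustrative one.

```python
def morton_index(ix, iy, iz, levels):
    # Interleave the bits of the integer box coordinates to obtain the index
    # number of the box; the inverse mapping is an equally simple bit operation.
    index = 0
    for i in range(levels):
        index |= ((ix >> i) & 1) << (3 * i)
        index |= ((iy >> i) & 1) << (3 * i + 1)
        index |= ((iz >> i) & 1) << (3 * i + 2)
    return index

def box_index_of_point(p, root_min, root_size, levels):
    # Index of the box containing point p after `levels` subdivisions of the
    # cubic root box [root_min, root_min + root_size) along each axis.
    n = 1 << levels
    coords = [min(int((p[i] - root_min[i]) / root_size * n), n - 1) for i in range(3)]
    return morton_index(coords[0], coords[1], coords[2], levels)
```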
  • According to the present embodiment described above, moving images in which the movement of an object is viewed from a plurality of reference viewpoints are prepared as reference images.
  • the object is projected onto the view screen based on the user's viewpoint at a predetermined time step, and the pixel value representing the same object is obtained from the reference image at each time to obtain the pixel value of the display image.
  • a rule based on the positional relationship between the actual viewpoint and the reference viewpoint and the attribute of the object is introduced.
  • Since the reference images can be generated over time, at a timing independent of the display that follows the viewpoint, high-quality images can be prepared. By drawing values from these high-quality images at the time of display, a high-quality image can be presented without taking time. If the reference viewpoints are moved so as to follow the movement of the object, the level of detail of the object in the reference images can be kept constant, and the image of the object can be expressed stably and with high quality in the display image.
  • Furthermore, the reference image of the reference destination is specified in advance for each position on the object surface, and the identification information is associated with the object model. This further reduces the size of the data required for display. In addition, since the process of determining the reference image to refer to by calculation can be omitted at the time of display, the time from the acquisition of the user's viewpoint to the display can be shortened.
  • 100 head mounted display, 200 display image generation device, 222 CPU, 224 GPU, 226 main memory, 236 output unit, 238 input unit, 254 object model storage unit, 256 reference image data storage unit, 260 viewpoint information acquisition unit, 262 space construction unit, 264 projection unit, 266 pixel value determination unit, 268 output unit, 300 reference image generation device, 310 reference viewpoint setting unit, 314 object model storage unit, 316 space construction unit, 318 reference image data generation unit, 330 reference image generation unit, 332 depth image generation unit, 334 data compression unit, 336 data decompression unit, 338 reference unit, 340 calculation unit, 342 reference destination information addition unit, 344 reference unit.
  • the present invention can be used for various information processing devices such as a head mounted display, a game device, an image display device, a portable terminal, and a personal computer, and an information processing system including any one of them.

Abstract

The present invention: prepares, as reference images 428a–428e, images of a space including an object 424 to be displayed, viewed from reference viewpoints; synthesizes these in accordance with an actual viewpoint position; and draws a display image. For each prescribed region on the surface of a model of the object 424, identification information is associated for the reference images in which that region appears as an image. The reference images A and C associated with a surface 430a are used as references when drawing said surface 430a.

Description

基準画像生成装置、表示画像生成装置、基準画像生成方法、および表示画像生成方法Reference image generation device, display image generation device, reference image generation method, and display image generation method
 この発明は、ユーザの視点に応じた画像を表示するのに用いるデータを生成する基準画像生成装置、当該データを用いて表示画像を生成する表示画像生成装置および、それらによる基準画像生成方法、表示画像生成方法に関する。 The present invention relates to a reference image generation device that generates data used to display an image according to a user's viewpoint, a display image generation device that generates a display image using the data, a reference image generation method using the same, and a display The present invention relates to an image generation method.
 対象空間を自由な視点から鑑賞できる画像表示システムが普及している。例えばヘッドマウントディスプレイにパノラマ映像を表示し、ヘッドマウントディスプレイを装着したユーザが頭部を回転させると視線方向に応じたパノラマ画像が表示されるようにしたシステムが開発されている。ヘッドマウントディスプレイを利用することで、映像への没入感を高めたり、ゲームなどのアプリケーションの操作性を向上させたりすることもできる。また、ヘッドマウントディスプレイを装着したユーザが物理的に移動することで、映像として表示された空間内を仮想的に歩き回ることのできるウォークスルーシステムも開発されている。 An image display system that allows users to appreciate the target space from a free viewpoint has become widespread. For example, a system has been developed in which a panoramic image is displayed on a head-mounted display, and a panoramic image corresponding to the line-of-sight direction is displayed when a user wearing the head-mounted display rotates his head. By using a head-mounted display, it is possible to enhance the sense of immersion in video and improve the operability of applications such as games. In addition, a walk-through system has been developed that allows a user wearing a head-mounted display to physically move around a space displayed as an image when the user physically moves.
 表示装置の種類によらず、自由視点に対応する画像表示技術においては、視点の動きに対する表示の変化に高い応答性が求められる。一方で、画像世界の臨場感を高めるには、解像度を高くしたり複雑な計算を実施したりする必要が生じ、画像処理の負荷が増大する。そのため視点の移動に対し表示が追いつかず、結果として臨場感が損なわれてしまうこともあり得る。 ∙ Regardless of the type of display device, image display technology that supports free viewpoints requires high responsiveness to changes in display with respect to viewpoint movement. On the other hand, in order to increase the realism of the image world, it is necessary to increase the resolution or perform complicated calculations, which increases the load of image processing. For this reason, the display cannot catch up with the movement of the viewpoint, and as a result, the sense of reality may be impaired.
 本発明はこうした課題に鑑みてなされたものであり、その目的は、視点に対する画像表示の応答性と画質を両立させることのできる技術を提供することにある。 The present invention has been made in view of these problems, and an object of the present invention is to provide a technique capable of achieving both image display responsiveness and image quality with respect to the viewpoint.
 上記課題を解決するために、本発明のある態様は基準画像生成装置に関する。この基準画像生成装置は、表示対象のオブジェクトを含む空間を任意視点から見たときの表示画像を生成するのに用いる、当該空間を所定の基準視点から見たときの像を表す基準画像のデータ生成する基準画像生成装置であって、オブジェクトを規定する情報に従い、空間に当該オブジェクトを配置する空間構築部と、空間に配置した基準視点に対応する視野で、基準画像とそれに対応するデプス画像を生成したうえ、デプス画像を用いて、オブジェクトの表面上の所定領域ごとに、当該領域が像として表れている基準画像を特定し、特定結果と基準画像のデータを出力する基準画像データ生成部と、を備えたことを特徴とする。 In order to solve the above-described problems, an aspect of the present invention relates to a reference image generation device. This reference image generation device is used to generate a display image when a space including an object to be displayed is viewed from an arbitrary viewpoint, and data of a reference image representing an image when the space is viewed from a predetermined reference viewpoint A reference image generation device for generating a reference image and a depth image corresponding to the reference image and a depth image corresponding to the reference viewpoint arranged in the space, and a space construction unit that arranges the object in the space according to information defining the object A reference image data generation unit that generates a reference image in which the region appears as an image for each predetermined region on the surface of the object using the depth image, and outputs the identification result and the data of the reference image; , Provided.
 本発明の別の態様は、表示画像生成装置に関する。この表示画像生成装置は、表示対象の空間におけるオブジェクトを規定する情報を格納するオブジェクトモデル記憶部と、オブジェクトを含む空間を、所定の基準視点から見たときの像を表す基準画像のデータを格納する基準画像データ記憶部と、ユーザの視点に係る情報を取得する視点情報取得部と、空間をユーザの視点から見たときのオブジェクトの像を表示画像の平面に表す射影部と、表示画像における画素ごとに、対応するオブジェクト上のポイントが表されている基準画像を、オブジェクトモデル記憶部に格納されたオブジェクトモデルの付加情報を読み出すことにより特定し、当該画素の色を、特定した基準画像における像の色を用いて決定する画素値決定部と、表示画像のデータを出力する出力部と、を備えたことを特徴とする。 Another aspect of the present invention relates to a display image generation apparatus. The display image generation device stores an object model storage unit that stores information defining an object in a display target space, and reference image data representing an image when the space including the object is viewed from a predetermined reference viewpoint. A reference image data storage unit, a viewpoint information acquisition unit that acquires information related to a user's viewpoint, a projection unit that displays an image of an object on a plane of the display image when the space is viewed from the user's viewpoint, and a display image For each pixel, a reference image in which a point on the corresponding object is represented is specified by reading additional information of the object model stored in the object model storage unit, and the color of the pixel is determined in the specified reference image. A pixel value determining unit that determines the color of the image; and an output unit that outputs display image data. .
 本発明のさらに別の態様は、基準画像生成方法に関する。この基準画像生成方法は、表示対象のオブジェクトを含む空間を任意視点から見たときの表示画像を生成するのに用いる、当該空間を所定の基準視点から見たときの像を表す基準画像のデータ生成する基準画像生成装置が、オブジェクトを規定する情報に従い、空間に当該オブジェクトを配置するステップと、空間に配置した基準視点に対応する視野で、基準画像とそれに対応するデプス画像を生成するステップと、デプス画像を用いて、オブジェクトの表面上の所定領域ごとに、当該領域が像として表れている基準画像を特定し、特定結果と基準画像のデータを出力するステップと、を含むことを特徴とする。 Still another embodiment of the present invention relates to a reference image generation method. This reference image generation method is used to generate a display image when a space including an object to be displayed is viewed from an arbitrary viewpoint, and data of a reference image representing an image when the space is viewed from a predetermined reference viewpoint A reference image generating device to generate, in accordance with information defining the object, a step of arranging the object in a space; a step of generating a reference image and a depth image corresponding to the reference image in a field of view corresponding to the reference viewpoint arranged in the space; And, for each predetermined area on the surface of the object using the depth image, specifying a reference image in which the area appears as an image, and outputting a specification result and data of the reference image. To do.
 本発明のさらに別の態様は、表示画像生成方法に関する。この表示画像生成方法は、表示対象の空間におけるオブジェクトを規定する情報をメモリから読み出すステップと、オブジェクトを含む空間を、所定の基準視点から見たときの像を表す基準画像のデータをメモリから読み出すステップと、ユーザの視点に係る情報を取得するステップと、空間をユーザの視点から見たときのオブジェクトの像を表示画像の平面に表すステップと、表示画像における画素ごとに、対応するオブジェクト上のポイントが表されている基準画像を、オブジェクトを規定する情報に含まれるオブジェクトモデルの付加情報に基づき特定し、当該画素の色を、特定した基準画像における像の色を用いて決定するステップと、表示画像のデータを出力するステップと、を含むことを特徴とする。 Still another aspect of the present invention relates to a display image generation method. In this display image generation method, a step of reading information defining an object in a display target space from a memory, and a reference image data representing an image when the space including the object is viewed from a predetermined reference viewpoint are read from the memory. A step of acquiring information relating to the user's viewpoint, a step of representing an image of the object when the space is viewed from the user's viewpoint on a plane of the display image, and a pixel on the corresponding object for each pixel in the display image Identifying a reference image in which points are represented based on additional information of the object model included in the information defining the object, and determining the color of the pixel using the color of the image in the identified reference image; Outputting display image data.
 なお、以上の構成要素の任意の組合せ、本発明の表現を方法、装置、システム、コンピュータプログラム、データ構造、記録媒体などの間で変換したものもまた、本発明の態様として有効である。 It should be noted that any combination of the above-described constituent elements and the expression of the present invention converted between a method, apparatus, system, computer program, data structure, recording medium, etc. are also effective as an aspect of the present invention.
 本発明によれば、視点に対する画像表示の応答性と画質を両立させることができる。 According to the present invention, it is possible to achieve both image display responsiveness and image quality with respect to the viewpoint.
本実施の形態のヘッドマウントディスプレイの外観例を示す図である。It is a figure which shows the example of an external appearance of the head mounted display of this Embodiment. 本実施の形態の画像処理システムの構成図である。It is a block diagram of the image processing system of this Embodiment. 本実施の形態の表示画像生成装置がヘッドマウントディスプレイに表示させる画像世界の例を説明するための図である。It is a figure for demonstrating the example of the image world which the display image generation apparatus of this Embodiment displays on a head mounted display. 本実施の形態の表示画像生成装置の内部回路構成を示す図である。It is a figure which shows the internal circuit structure of the display image generation apparatus of this Embodiment. 本実施の形態における表示画像生成装置の機能ブロックを示す図である。It is a figure which shows the functional block of the display image generation apparatus in this Embodiment. 本実施の形態における基準画像生成装置の機能ブロックを示す図である。It is a figure which shows the functional block of the reference | standard image generation apparatus in this Embodiment. 本実施の形態における基準視点の設定例を示す図である。It is a figure which shows the example of a setting of the reference viewpoint in this Embodiment. 本実施の形態における画素値決定部が、表示画像の画素値の決定に用いる基準画像を選択する手法を説明するための図である。It is a figure for demonstrating the method in which the pixel value determination part in this Embodiment selects the reference | standard image used for the determination of the pixel value of a display image. 本実施の形態における画素値決定部が、表示画像の画素値を決定する手法を説明するための図である。It is a figure for demonstrating the method in which the pixel value determination part in this Embodiment determines the pixel value of a display image. 本実施の形態において表示画像生成装置が視点に応じた表示画像を生成する処理手順を示すフローチャートである。It is a flowchart which shows the process sequence which the display image generation apparatus produces | generates the display image according to a viewpoint in this Embodiment. 本実施の形態において基準画像データ記憶部に格納されるデータの構造例を示す図である。It is a figure which shows the structural example of the data stored in a reference | standard image data memory | storage part in this Embodiment. 本実施の形態において動きのあるオブジェクトを表すための基準視点の設定例を示す図である。It is a figure which shows the example of a setting of the reference viewpoint for representing the object with a motion in this Embodiment. 本実施の形態において表示画像の生成に用いる基準画像を、オブジェクトの動きに応じて切り替える態様を説明するための図である。It is a figure for demonstrating the aspect which switches the reference | standard image used for the production | generation of a display image in this Embodiment according to a motion of an object. 本実施の形態において、基準画像のデータの圧縮/伸張処理機能を導入した場合の、基準画像生成装置の基準画像データ生成部と、表示画像生成装置の画素値決定部の機能ブロックの構成を示す図である。In this embodiment, the configuration of functional blocks of a reference image data generation unit of a reference image generation device and a pixel value determination unit of a display image generation device when a compression / decompression processing function of reference image data is introduced is shown. FIG. 本実施の形態におけるデータ圧縮部によって生成される、統合動画像の例を模式的に示す図である。It is a figure which shows typically the example of an integrated moving image produced | generated by the data compression part in this Embodiment. 本実施の形態におけるデータ圧縮部によって生成される、統合動画像の別の例を模式的に示す図である。It is a figure which shows typically another example of the integrated moving image produced | generated by the data compression part in this Embodiment. 本実施の形態においてデータ圧縮部が実施する圧縮処理の一つとして、変化のある領域の画像のみを時系列データとする手法を説明するための図である。It is a figure for demonstrating the method of making only the image of the area | region with a change into time series data as one of the compression processes which a data compression part implements in this Embodiment. 
本実施の形態においてデータ圧縮部が実施する圧縮処理の一つとして、変化のある画素のみを表す情報を時系列データとする手法を説明するための図である。It is a figure for demonstrating the method of making the information showing only a pixel with a change into time series data as one of the compression processes which a data compression part implements in this Embodiment. 本実施の形態の基準画像の動画において前後する2つのフレームを例示する図である。It is a figure which illustrates two frames which move forward and backward in the animation of the standard picture of this embodiment. 本実施の形態において、データ圧縮部が基準画像の圧縮処理をタイル画像単位で制御する手法を説明するための図である。In this Embodiment, it is a figure for demonstrating the method in which a data compression part controls the compression process of a reference | standard image per tile image. 本実施の形態において、基準画像およびデプス画像の圧縮処理をタイル画像単位で制御する態様における、圧縮後のデータの構造例を示す図である。In this Embodiment, it is a figure which shows the structural example of the data after compression in the aspect which controls the compression process of a reference | standard image and a depth image per tile image. 本実施の形態において、基準画像およびデプス画像の全方位の画像をキューブマップで表した場合の、データ圧縮処理の例を説明するための図である。In this Embodiment, it is a figure for demonstrating the example of a data compression process when the image of all the directions of a reference | standard image and a depth image is represented by the cube map. 本実施の形態において、参照先の基準画像に係る情報を、オブジェクト表面上の位置に対応づけて保存する機能を導入した場合の、基準画像生成装置の基準画像データ生成部と、表示画像生成装置の画素値決定部の機能ブロックの構成を示す図である。In this embodiment, a reference image data generation unit of a reference image generation device and a display image generation device when a function of storing information related to a reference image of a reference destination in association with a position on the object surface is introduced. It is a figure which shows the structure of the functional block of the pixel value determination part. 本実施の形態において参照先の基準画像の識別情報をオブジェクトモデルに対応づける手法の例を説明するための図である。It is a figure for demonstrating the example of the method of matching the identification information of the reference image of a reference destination with an object model in this Embodiment. 本実施の形態において参照先の基準画像の識別情報をオブジェクトモデルに対応づける手法の別の例を説明するための図である。It is a figure for demonstrating another example of the method of matching the identification information of the reference image of a reference destination with an object model in this Embodiment. 本実施の形態において参照先の基準画像の識別情報をオブジェクトモデルに対応づける手法のさらに別の例を説明するための図である。It is a figure for demonstrating another example of the method of matching the identification information of the reference image of a reference destination with an object model in this Embodiment.
 本実施の形態は基本的に、ユーザの視点に応じた視野で画像を表示する。その限りにおいて画像を表示させる装置の種類は特に限定されず、ウェアラブルディスプレイ、平板型のディスプレイ、プロジェクタなどのいずれでもよいが、ここではウェアラブルディスプレイのうちヘッドマウントディスプレイを例に説明する。 This embodiment basically displays an image with a field of view according to the user's viewpoint. As long as this is the case, the type of device that displays an image is not particularly limited, and any of a wearable display, a flat panel display, a projector, and the like may be used. Here, a head-mounted display will be described as an example of a wearable display.
 ウェアラブルディスプレイの場合、ユーザの視線は内蔵するモーションセンサによりおよそ推定できる。その他の表示装置の場合、ユーザがモーションセンサを頭部に装着したり、注視点検出装置を用いたりすることで視線を検出できる。あるいはユーザの頭部にマーカーを装着させ、その姿を撮影した画像を解析することにより視線を推定してもよいし、それらの技術のいずれかを組み合わせてもよい。 In the case of a wearable display, the user's line of sight can be estimated approximately by a built-in motion sensor. In the case of other display devices, the user can detect the line of sight by wearing a motion sensor on the head or using a gaze point detection device. Alternatively, the user's head may be attached with a marker, and the line of sight may be estimated by analyzing an image of the appearance, or any of those techniques may be combined.
 図1は、ヘッドマウントディスプレイ100の外観例を示す。ヘッドマウントディスプレイ100は、本体部110、前頭部接触部120、および側頭部接触部130を含む。ヘッドマウントディスプレイ100は、ユーザの頭部に装着してディスプレイに表示される静止画や動画などを鑑賞し、ヘッドホンから出力される音声や音楽などを聴くための表示装置である。ヘッドマウントディスプレイ100に内蔵または外付けされたモーションセンサにより、ヘッドマウントディスプレイ100を装着したユーザの頭部の回転角や傾きといった姿勢情報を計測することができる。 FIG. 1 shows an example of the appearance of the head mounted display 100. The head mounted display 100 includes a main body portion 110, a forehead contact portion 120, and a temporal contact portion 130. The head mounted display 100 is a display device that is worn on the user's head and enjoys still images and moving images displayed on the display, and listens to sound and music output from the headphones. Posture information such as the rotation angle and inclination of the head of the user wearing the head mounted display 100 can be measured by a motion sensor built in or externally attached to the head mounted display 100.
 ヘッドマウントディスプレイ100は、「ウェアラブルディスプレイ装置」の一例である。ウェアラブルディスプレイ装置には、狭義のヘッドマウントディスプレイ100に限らず、めがね、めがね型ディスプレイ、めがね型カメラ、ヘッドホン、ヘッドセット(マイクつきヘッドホン)、イヤホン、イヤリング、耳かけカメラ、帽子、カメラつき帽子、ヘアバンドなど任意の装着可能なディスプレイ装置が含まれる。 The head mounted display 100 is an example of a “wearable display device”. The wearable display device is not limited to the head-mounted display 100 in a narrow sense, but includes glasses, glasses-type displays, glasses-type cameras, headphones, headsets (headphones with microphones), earphones, earrings, ear-mounted cameras, hats, hats with cameras, Any wearable display device such as a hair band is included.
 図2は、本実施の形態に係る画像処理システムの構成図を示している。ヘッドマウントディスプレイ100は、無線通信またはUSBなどの周辺機器を接続するインタフェース205により表示画像生成装置200に接続される。表示画像生成装置200は、さらにネットワークを介してサーバに接続されてもよい。その場合、サーバはヘッドマウントディスプレイ100に表示させる画像のデータを表示画像生成装置200に提供してもよい。 FIG. 2 is a configuration diagram of the image processing system according to the present embodiment. The head mounted display 100 is connected to the display image generating apparatus 200 via an interface 205 for connecting peripheral devices such as wireless communication or USB. The display image generation apparatus 200 may be further connected to a server via a network. In that case, the server may provide the display image generating apparatus 200 with image data to be displayed on the head mounted display 100.
 表示画像生成装置200は、ヘッドマウントディスプレイ100を装着したユーザの頭部の位置や姿勢に基づき視点の位置や視線の方向を特定し、それに応じた視野となるように表示画像を生成してヘッドマウントディスプレイ100に出力する。この限りにおいて画像を表示する目的は様々であってよい。例えば表示画像生成装置200は、電子ゲームを進捗させつつゲームの舞台である仮想世界を表示画像として生成してもよいし、仮想世界が実世界かに関わらず観賞用として動画像などを表示させてもよい。表示装置をヘッドマウントディスプレイとした場合、視点を中心に広い角度範囲でパノラマ画像を表示できるようにすれば、表示世界に没入した状態を演出することもできる。 The display image generation device 200 identifies the position of the viewpoint and the direction of the line of sight based on the position and orientation of the head of the user wearing the head mounted display 100, generates a display image so as to have a field of view corresponding thereto, and Output to the mount display 100. As long as this is the case, the purpose of displaying an image may be various. For example, the display image generation apparatus 200 may generate a virtual world that is the stage of a game as a display image while advancing an electronic game, or display a moving image or the like for viewing regardless of whether the virtual world is a real world. May be. When the display device is a head-mounted display, if the panoramic image can be displayed in a wide angle range with the viewpoint as the center, it is possible to produce a state of being immersed in the display world.
 図3は、本実施の形態で表示画像生成装置200がヘッドマウントディスプレイ100に表示させる画像世界の例を説明するための図である。この例ではユーザ12が仮想空間である部屋にいる状態を作り出している。仮想空間を定義するワールド座標系には図示するように、壁、床、窓、テーブル、テーブル上の物などのオブジェクトを配置している。表示画像生成装置200は当該ワールド座標系に、ユーザ12の視点の位置や視線の方向に応じてビュースクリーン14を定義し、そこにオブジェクトの像を射影することで表示画像を描画する。 FIG. 3 is a diagram for explaining an example of an image world displayed on the head mounted display 100 by the display image generating apparatus 200 in the present embodiment. In this example, a state in which the user 12 is in a room that is a virtual space is created. As shown in the figure, objects such as walls, floors, windows, tables, and objects on the table are arranged in the world coordinate system that defines the virtual space. The display image generation apparatus 200 defines a view screen 14 in the world coordinate system in accordance with the position of the viewpoint of the user 12 and the direction of the line of sight, and draws a display image by projecting an image of the object there.
 ユーザ12の視点の位置や視線の方向(以後、これらを包括的に「視点」と呼ぶ場合がある)を所定のレートで取得し、これに応じてビュースクリーン14の位置や方向を変化させれば、ユーザの視点に対応する視野で画像を表示させることができる。視差を有するステレオ画像を生成し、ヘッドマウントディスプレイ100において左右の目の前に表示させれば、仮想空間を立体視させることもできる。これによりユーザ12は、あたかも表示世界の部屋の中にいるような仮想現実を体験することができる。なお図示する例では表示対象を、コンピュータグラフィックスを前提とする仮想世界としたが、パノラマ写真など実世界の撮影画像としたり、それと仮想世界とを組み合わせたりしてもよい。 The position of the viewpoint of the user 12 and the direction of the line of sight (hereinafter, these may be collectively referred to as “viewpoint”) are acquired at a predetermined rate, and the position and direction of the view screen 14 can be changed accordingly. For example, an image can be displayed with a field of view corresponding to the user's viewpoint. If a stereo image having parallax is generated and displayed in front of the left and right eyes on the head mounted display 100, the virtual space can be stereoscopically viewed. Thereby, the user 12 can experience a virtual reality as if he were in a room in the display world. In the illustrated example, the display target is a virtual world based on computer graphics. However, a real world photographed image such as a panoramic photograph may be used, or it may be combined with the virtual world.
 このような表示に臨場感を持たせるためには、表示対象の空間で生じる物理現象をできるだけ正確に反映させることが望ましい。例えばオブジェクト表面での拡散反射や鏡面反射、環境光など、目に到達する様々な光の伝播を正確に計算することにより、視点の動きによるオブジェクト表面の色味や輝度の変化をよりリアルに表現することができる。これを実現する代表的な手法がレイトレーシングである。しかしながら自由視点を許容する環境では特に、そのような物理計算を高精度に行うことにより、表示までに看過できないレイテンシが生じることが考えられる。 In order to give such a display a sense of reality, it is desirable to reflect the physical phenomenon that occurs in the display target space as accurately as possible. For example, by accurately calculating the propagation of various light that reaches the eyes, such as diffuse reflection, specular reflection, and ambient light on the object surface, changes in the color and brightness of the object surface due to the movement of the viewpoint can be expressed more realistically can do. A typical method for realizing this is ray tracing. However, particularly in an environment that allows a free viewpoint, it is conceivable that latency that cannot be overlooked before display is caused by performing such physical calculation with high accuracy.
 そこで本実施の形態では、特定の視点から見た画像を前もって取得しておき、任意の視点に対する表示画像の画素値の決定に利用する。すなわち表示画像に像として表れるオブジェクトの色を、前もって取得しておいた画像の対応する箇所から抽出することで決定する。以後、事前の画像取得において設定する視点を「基準視点」、基準視点から見た事前に取得する画像を「基準画像」または「基準視点の画像」と呼ぶ。表示画像の描画に用いるデータの一部を、基準画像として事前に取得しておくことにより、視点の移動から表示までのレイテンシを抑えられる。また基準画像の生成段階においては基本的に時間的な制約がないため、レイトレーシングなどの物理計算を、時間をかけて高精度に行うことができる。 Therefore, in the present embodiment, an image viewed from a specific viewpoint is acquired in advance and used to determine the pixel value of the display image for an arbitrary viewpoint. That is, the color of the object appearing as an image in the display image is determined by extracting from the corresponding portion of the image acquired in advance. Hereinafter, the viewpoint set in the prior image acquisition is referred to as “reference viewpoint”, and the image acquired in advance viewed from the reference viewpoint is referred to as “reference image” or “reference viewpoint image”. By acquiring a part of data used for drawing a display image in advance as a reference image, the latency from the movement of the viewpoint to the display can be suppressed. In addition, since there is basically no time restriction in the generation stage of the reference image, physical calculations such as ray tracing can be performed with high accuracy over time.
 基準視点を、表示時の視点に想定される可動範囲に分散させて複数個設定し、それぞれについて基準画像を準備しておけば、複数の視点から見た同じオブジェクトの色味を加味して、表示時の視点に応じたオブジェクトをより高精度に表現できる。より具体的には、表示時の視点が基準視点の一つと一致しているとき、当該基準視点に対応する基準画像の画素値をそのまま採用できる。表示時の視点が複数の基準視点の間にあるとき、当該複数の基準視点に対応する基準画像の画素値を合成することにより、表示画像の画素値を決定する。 Set a plurality of reference viewpoints in the movable range assumed for the viewpoint at the time of display, and prepare a reference image for each, taking into account the color of the same object seen from multiple viewpoints, Objects according to the viewpoint at the time of display can be expressed with higher accuracy. More specifically, when the viewpoint at the time of display coincides with one of the reference viewpoints, the pixel value of the reference image corresponding to the reference viewpoint can be adopted as it is. When the viewpoint at the time of display is between a plurality of reference viewpoints, the pixel values of the display image are determined by combining the pixel values of the reference image corresponding to the plurality of reference viewpoints.
 図4は表示画像生成装置200の内部回路構成を示している。表示画像生成装置200は、CPU(Central Processing Unit)222、GPU(Graphics Processing Unit)224、メインメモリ226を含む。これらの各部は、バス230を介して相互に接続されている。バス230にはさらに入出力インタフェース228が接続されている。 FIG. 4 shows the internal circuit configuration of the display image generating apparatus 200. The display image generation apparatus 200 includes a CPU (Central Processing Unit) 222, a GPU (Graphics Processing Unit) 224, and a main memory 226. These units are connected to each other via a bus 230. An input / output interface 228 is further connected to the bus 230.
 入出力インタフェース228には、USBやIEEE1394などの周辺機器インタフェースや、有線又は無線LANのネットワークインタフェースからなる通信部232、ハードディスクドライブや不揮発性メモリなどの記憶部234、ヘッドマウントディスプレイ100などの表示装置へデータを出力する出力部236、ヘッドマウントディスプレイ100からデータを入力する入力部238、磁気ディスク、光ディスクまたは半導体メモリなどのリムーバブル記録媒体を駆動する記録媒体駆動部240が接続される。 The input / output interface 228 includes a peripheral device interface such as USB or IEEE1394, a communication unit 232 including a wired or wireless LAN network interface, a storage unit 234 such as a hard disk drive or a nonvolatile memory, and a display device such as the head mounted display 100. An output unit 236 that outputs data to the head, an input unit 238 that inputs data from the head mounted display 100, and a recording medium driving unit 240 that drives a removable recording medium such as a magnetic disk, an optical disk, or a semiconductor memory are connected.
 CPU222は、記憶部234に記憶されているオペレーティングシステムを実行することにより表示画像生成装置200の全体を制御する。CPU222はまた、リムーバブル記録媒体から読み出されてメインメモリ226にロードされた、あるいは通信部232を介してダウンロードされた各種プログラムを実行する。GPU224は、ジオメトリエンジンの機能とレンダリングプロセッサの機能とを有し、CPU222からの描画命令に従って描画処理を行い、表示画像を図示しないフレームバッファに格納する。そしてフレームバッファに格納された表示画像をビデオ信号に変換して出力部236に出力する。メインメモリ226はRAM(Random Access Memory)により構成され、処理に必要なプログラムやデータを記憶する。 The CPU 222 controls the entire display image generation apparatus 200 by executing the operating system stored in the storage unit 234. The CPU 222 also executes various programs read from the removable recording medium and loaded into the main memory 226 or downloaded via the communication unit 232. The GPU 224 has a function of a geometry engine and a function of a rendering processor, performs a drawing process according to a drawing command from the CPU 222, and stores a display image in a frame buffer (not shown). Then, the display image stored in the frame buffer is converted into a video signal and output to the output unit 236. The main memory 226 includes a RAM (Random Access Memory) and stores programs and data necessary for processing.
 図5は、本実施の形態における表示画像生成装置200の機能ブロックの構成を示している。表示画像生成装置200は上述のとおり、電子ゲームを進捗させたりサーバと通信したりする一般的な情報処理を行ってよいが、図5では特に、視点に応じて表示画像のデータを生成する機能に着目して示している。なお図5で示される表示画像生成装置200の機能のうち少なくとも一部を、ヘッドマウントディスプレイ100に実装してもよい。あるいは、表示画像生成装置200の少なくとも一部の機能を、ネットワークを介して表示画像生成装置200に接続されたサーバに実装してもよい。 FIG. 5 shows a functional block configuration of the display image generation apparatus 200 in the present embodiment. As described above, the display image generation apparatus 200 may perform general information processing such as progressing an electronic game or communicating with a server. In FIG. 5, in particular, a function of generating display image data according to a viewpoint. It is shown paying attention to. Note that at least a part of the functions of the display image generation apparatus 200 shown in FIG. 5 may be mounted on the head mounted display 100. Alternatively, at least a part of the functions of the display image generation device 200 may be implemented in a server connected to the display image generation device 200 via a network.
 また図5および後述する図6に示す機能ブロックは、ハードウェア的には、図4に示したCPU、GPU、各種メモリなどの構成で実現でき、ソフトウェア的には、記録媒体などからメモリにロードした、データ入力機能、データ保持機能、画像処理機能、通信機能などの諸機能を発揮するプログラムで実現される。したがって、これらの機能ブロックがハードウェアのみ、ソフトウェアのみ、またはそれらの組合せによっていろいろな形で実現できることは当業者には理解されるところであり、いずれかに限定されるものではない。 The functional blocks shown in FIG. 5 and FIG. 6 to be described later can be realized in hardware by the configuration of the CPU, GPU, various memories shown in FIG. 4, and loaded in the memory from a recording medium or the like in software. It is realized by a program that exhibits various functions such as a data input function, a data holding function, an image processing function, and a communication function. Therefore, it is understood by those skilled in the art that these functional blocks can be realized in various forms by hardware only, software only, or a combination thereof, and is not limited to any one.
 表示画像生成装置200は、ユーザの視点に係る情報を取得する視点情報取得部260、表示対象のオブジェクトからなる空間を構築する空間構築部262、ビュースクリーンにオブジェクトを射影する射影部264、オブジェクトの像を構成する画素の値を決定し表示画像を完成させる画素値決定部266、表示画像のデータをヘッドマウントディスプレイ100に出力する出力部268を備える。表示画像生成装置200はさらに、空間の構築に必要なオブジェクトモデルに係るデータを記憶するオブジェクトモデル記憶部254、および、基準画像に係るデータを記憶する基準画像データ記憶部256を備える。 The display image generation apparatus 200 includes a viewpoint information acquisition unit 260 that acquires information related to a user's viewpoint, a space construction unit 262 that constructs a space composed of objects to be displayed, a projection unit 264 that projects an object on a view screen, A pixel value determining unit 266 that determines values of pixels constituting the image and completes a display image, and an output unit 268 that outputs display image data to the head mounted display 100 are provided. The display image generation apparatus 200 further includes an object model storage unit 254 that stores data related to an object model necessary for constructing a space, and a reference image data storage unit 256 that stores data related to a reference image.
 視点情報取得部260は、図4の入力部238、CPU222などで構成され、ユーザの視点の位置や視線の方向を所定のレートで取得する。例えばヘッドマウントディスプレイ100に内蔵した加速度センサの出力値を逐次取得し、それによって頭部の姿勢を取得する。さらにヘッドマウントディスプレイ100の外部に図示しない発光マーカーを設け、その撮影画像を図示しない撮像装置から取得することで、実空間での頭部の位置を取得する。 The viewpoint information acquisition unit 260 includes the input unit 238 and the CPU 222 shown in FIG. 4 and acquires the position of the user's viewpoint and the direction of the line of sight at a predetermined rate. For example, the output value of the acceleration sensor built in the head mounted display 100 is sequentially acquired, and thereby the posture of the head is acquired. Further, a light emitting marker (not shown) is provided outside the head mounted display 100, and the captured image is acquired from an imaging device (not shown), thereby acquiring the position of the head in real space.
Alternatively, an imaging device (not shown) that captures images corresponding to the user's field of view may be provided on the head mounted display 100 side, and the position and posture of the head may be acquired by a technique such as SLAM (Simultaneous Localization and Mapping). If the position and posture of the head can be acquired in this way, the position of the user's viewpoint and the direction of the line of sight can be specified approximately. Those skilled in the art will understand that various methods for acquiring information on the user's viewpoint are conceivable, and that they are not limited to the case of using the head mounted display 100.
The space construction unit 262 is composed of the CPU 222, the GPU 224, the main memory 226, and the like shown in FIG. 4, and constructs a shape model of the space in which the objects to be displayed exist. In the example shown in FIG. 3, objects representing a room, such as walls, a floor, a window, a table, and items on the table, are arranged in the world coordinate system that defines the virtual space. Information on the shape of each object is read from the object model storage unit 254. Here, the space construction unit 262 only needs to determine the shape, position, and posture of each object, and can use a modeling technique based on surface models in general computer graphics.
In the present embodiment, it is possible to express how an object moves or deforms in the virtual space. For this purpose, the object model storage unit 254 also stores data defining the movement and deformation of the objects. For example, time-series data representing the position and shape of an object at predetermined time intervals is stored, or alternatively a program for generating such changes is stored. The space construction unit 262 reads this data and changes the objects arranged in the virtual space accordingly.
The projection unit 264 is composed of the GPU 224, the main memory 226, and the like shown in FIG. 4, and sets the view screen according to the viewpoint information acquired by the viewpoint information acquisition unit 260. That is, by setting the screen coordinates so as to correspond to the position of the head and the direction in which the face is oriented, the space to be displayed is drawn on the screen plane with a field of view corresponding to the user's position and orientation.
The projection unit 264 further projects the objects in the space constructed by the space construction unit 262 onto the view screen at a predetermined rate. This processing can also use the general computer graphics technique of perspective-transforming a mesh such as polygons. The pixel value determination unit 266 is composed of the GPU 224, the main memory 226, and the like shown in FIG. 4, and determines the values of the pixels constituting the images of the objects projected onto the view screen. At this time, as described above, it reads the reference image data from the reference image data storage unit 256, and extracts and uses the values of pixels representing points on the same object.
For example, corresponding pixels are identified in the reference images generated for the reference viewpoints around the actual viewpoint, and the pixel value of the display image is obtained by averaging them with weights based on the distances or angles between the actual viewpoint and the reference viewpoints. By generating the reference images accurately in advance, taking time with ray tracing or the like, a high-definition image representation close to that obtained by ray tracing can be realized at run time with the light computational load of reading out the corresponding pixel values and taking their weighted average.
When the movement or deformation of an object is represented, the reference image naturally becomes a moving image showing that movement as seen from the reference viewpoint. Therefore, for the moving image of the object projected by the projection unit 264, the pixel value determination unit 266 refers to the frame of the reference image at the corresponding time. That is, the pixel value determination unit 266 refers to the moving image of the reference image in synchronization with the movement of the object in the virtual space generated by the space construction unit 262.
Note that the reference image is not limited to a graphics image drawn by ray tracing, and may be, for example, an image obtained by photographing a real space from the reference viewpoint in advance. In this case, the space construction unit 262 constructs a shape model of the photographed real space, and the projection unit 264 projects this shape model onto the view screen corresponding to the viewpoint at the time of display. Alternatively, if the position of the image of the photographed object can be determined for a field of view corresponding to the viewpoint at the time of display, the processing of the space construction unit 262 and the projection unit 264 can be omitted.
When the display image is viewed stereoscopically, the projection unit 264 and the pixel value determination unit 266 perform processing for the left-eye and right-eye viewpoints, respectively. The output unit 268 is composed of the CPU 222, the main memory 226, the output unit 236, and the like shown in FIG. 4, and sends the display image data, completed by the pixel value determination unit 266 determining the pixel values, to the head mounted display 100 at a predetermined rate. When stereo images are generated for stereoscopic viewing, the output unit 268 generates and outputs, as the display image, an image in which they are joined side by side. In the case of a head mounted display 100 configured so that the display image is viewed through lenses, the output unit 268 may apply a correction to the display image that takes into account the distortion caused by the lenses.
FIG. 6 shows the functional blocks of a device that generates reference image data. The reference image generation device 300 may be a part of the display image generation apparatus 200 of FIG. 5, or may be provided independently as a device that generates data to be used for display. The generated reference image data, the object models used for the generation, and the data defining their movement may be stored on a recording medium or the like as electronic content, and loaded into the main memory of the display image generation apparatus 200 at run time. The internal circuit configuration of the reference image generation device 300 may be the same as that of the display image generation apparatus 200 shown in FIG. 4.
The reference image generation device 300 includes a reference viewpoint setting unit 310 that sets reference viewpoints, a space construction unit 316 that constructs a space composed of objects to be displayed, a reference image data generation unit 318 that generates reference image data for each reference viewpoint based on the constructed space, an object model storage unit 314 that stores data on the object models necessary for constructing the space, and a reference image data storage unit 256 that stores the generated reference image data.
The reference viewpoint setting unit 310 is composed of the input unit 238, the CPU 222, the main memory 226, and the like, and sets the position coordinates of the reference viewpoints in the space to be displayed. Preferably, a plurality of reference viewpoints are distributed so as to cover the range of viewpoints that the user can take. Appropriate values for such a range and for the number of reference viewpoints differ depending on the configuration of the space to be displayed, the purpose of the display, the accuracy required for the display, the processing performance of the display image generation apparatus 200, and so on. For this reason, the reference viewpoint setting unit 310 may accept input of the position coordinates of the reference viewpoints from the creator of the display content. Alternatively, as described later, the reference viewpoint setting unit 310 may change the positions of the reference viewpoints according to the movement of objects.
The space construction unit 316 is composed of the CPU 222, the GPU 224, the main memory 226, and the like, and constructs a shape model of the space in which the objects to be displayed exist. This function corresponds to the function of the space construction unit 262 shown in FIG. 5. On the other hand, the reference image generation device 300 of FIG. 6 uses a modeling method based on solid models that take into account the color and material of the objects, in order to draw the images of the objects accurately by ray tracing or the like. For this purpose, the object model storage unit 314 stores object model data including information such as color and material.
The space construction unit 316 also moves and deforms objects in the virtual space, and may additionally change the lighting conditions or the colors of objects. The information defining such changes may be read out from data stored in advance in the object model storage unit 314, or may be set by direct input from the creator of the display content. In the latter case, the space construction unit 316 changes the objects in accordance with the input information and stores the information defining the change in the object model storage unit 314, so that the same change occurs at the time of display.
The reference image data generation unit 318 is composed of the CPU 222, the GPU 224, the main memory 226, and the like, and, for each reference viewpoint set by the reference viewpoint setting unit 310, draws the objects to be displayed that are visible from that reference viewpoint at a predetermined rate. Preferably, the reference image is prepared as a panoramic moving image covering all directions from the reference viewpoint, so that the viewpoint at the time of display can also be changed freely in all directions. It is also desirable to represent the appearance from each reference viewpoint accurately in the reference image by calculating the propagation of light rays over time.
The reference image data generation unit 318 also generates a depth image corresponding to each reference image. That is, it obtains the distance (depth value) from the screen surface of the object represented by each pixel of the reference image, and generates a depth image representing this distance as a pixel value. When the reference image is an omnidirectional panoramic image, the view screen is a spherical surface, so the depth value is the distance to the object in the normal direction of that spherical surface. The generated depth images are used to select the reference images to be referred to when determining the pixel values of the display image.
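As one way to picture this relationship, the following Python sketch computes such a depth image for an equirectangular omnidirectional reference image; the scene.cast_ray helper, the pixel-to-direction mapping, and all names are assumptions introduced only for illustration, not part of the embodiment described above.

    import numpy as np

    def make_depth_image(scene, viewpoint, width, height):
        """Depth image for an omnidirectional (equirectangular) reference image.

        For a spherical view screen, the depth value of a pixel is the distance
        to the object along the outward normal of the sphere, i.e. along the
        viewing ray through that pixel. scene.cast_ray is a hypothetical helper
        returning that distance (or inf when nothing is hit).
        """
        depth = np.full((height, width), np.inf, dtype=np.float32)
        for v in range(height):
            lat = np.pi / 2 - np.pi * (v + 0.5) / height     # +90 deg (top) .. -90 deg (bottom)
            for u in range(width):
                lon = -np.pi + 2 * np.pi * (u + 0.5) / width  # -180 deg .. +180 deg
                direction = np.array([np.cos(lat) * np.sin(lon),
                                      np.sin(lat),
                                      np.cos(lat) * np.cos(lon)])
                depth[v, u] = scene.cast_ray(origin=viewpoint, direction=direction)
        return depth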
Alternatively, as described later, the reference image data generation unit 318 may generate, instead of the depth images, other information to be used when selecting the reference images to refer to at the time of display. Specifically, for each position on an object surface, the reference images to be referred to when drawing that position are obtained in advance. In this case, the reference image data generation unit 318 stores this information in the object model storage unit 314 as additional information of the object model. Note that the object model storage unit 254 of FIG. 5 need only store, among the data stored in the object model storage unit 314 of FIG. 6, at least the data used for generating the display image.
The reference image data generation unit 318 stores the generated data in the reference image data storage unit 256 in association with the position coordinates of the reference viewpoints. The reference image data storage unit 256 basically stores a pair of a reference image and a depth image for each reference viewpoint, but in a mode in which depth images are not used at the time of display as described above, only the reference image is stored for each reference viewpoint. Hereinafter, a pair of a reference image and a depth image may also be referred to as "reference image data".
In the present embodiment, since the reference images and the depth images are moving images, the data size of the reference images tends to increase with the number of reference viewpoints. Therefore, the reference image data generation unit 318 reduces the data size and the processing load at the time of display image generation by using a data structure in which, within the generated moving images, the image is updated only in regions with motion. It also generates an integrated moving image in which a frame of the reference image and a frame of the depth image at the same time are represented within a single frame, and compresses and encodes the data in that unit, thereby reducing the data size as well as the load of decoding, decompression, and synchronization processing at the time of display. Details will be described later.
FIG. 7 shows an example of setting the reference viewpoints. In this example, a plurality of reference viewpoints, indicated by black circles, are set on each of a horizontal plane 20a at the eye height of the user 12 when standing and a horizontal plane 20b at the eye height when sitting. As an example, the horizontal plane 20a is 1.4 m from the floor and the horizontal plane 20b is 1.0 m from the floor. A movement range corresponding to the display content is assumed in the left-right direction (the X-axis direction in the figure) and the front-back direction (the Y-axis direction in the figure) centered on the user's standard position (home position), and the reference viewpoints are distributed in the corresponding rectangular regions on the horizontal planes 20a and 20b.
In this example, the reference viewpoints are arranged at every other intersection of a grid that divides the rectangular region into four equal parts in each of the X-axis and Y-axis directions. In addition, the upper and lower horizontal planes 20a and 20b are offset from each other so that the reference viewpoints do not overlap. As a result, in the example shown in FIG. 7, a total of 25 reference viewpoints are set: 13 on the upper horizontal plane 20a and 12 on the lower horizontal plane 20b.
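The following Python sketch reproduces this particular arrangement (a minimal illustration; the rectangle dimensions, the coordinate convention with height on the third axis, and the function name are assumptions, with only the heights 1.4 m and 1.0 m taken from the example above).

    def reference_viewpoints(width=2.0, depth=2.0, z_stand=1.4, z_sit=1.0, divisions=4):
        """Distribute reference viewpoints on two horizontal planes.

        Dividing the rectangle into `divisions` equal parts along X and Y gives
        (divisions + 1) x (divisions + 1) grid intersections; every other
        intersection is used on the standing-height plane and the complementary
        set on the sitting-height plane, so the two planes do not overlap when
        seen from above.
        """
        viewpoints = []
        n = divisions + 1
        for i in range(n):
            for j in range(n):
                x = -width / 2 + width * i / divisions
                y = -depth / 2 + depth * j / divisions
                if (i + j) % 2 == 0:          # 13 points on the standing plane
                    viewpoints.append((x, y, z_stand))
                else:                          # 12 points on the sitting plane
                    viewpoints.append((x, y, z_sit))
        return viewpoints

    assert len(reference_viewpoints()) == 25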
However, the distribution of the reference viewpoints is not limited to this; they may be distributed on a plurality of planes, including vertical planes, or on a curved surface such as a sphere. The distribution also need not be uniform; the reference viewpoints may be distributed with a higher density in ranges where the user is likely to be. Further, as described above, the reference viewpoints may be arranged so as to correspond to the objects to be displayed, and may be moved according to the movement of the objects. In this case, the reference image becomes moving image data that reflects the movement of each reference viewpoint.
Alternatively, by setting reference viewpoints so as to surround each object and preparing reference images that represent only that object, an image may first be generated for each object at the time of display, and the display image may then be generated by compositing them. In this way, the positional relationship between an object and its reference viewpoints can be controlled independently. As a result, for example, an important object or an object that is likely to be viewed from close up can be expressed in more detail, or the level of detail of all objects can be kept uniform even when each object moves differently. For stationary objects such as the background, an increase in data size can be suppressed by representing the reference image as a still image from fixed reference viewpoints.
FIG. 8 is a diagram for explaining the method by which the pixel value determination unit 266 of the display image generation apparatus 200 selects the reference images used to determine the pixel values of the display image. The figure shows the space to be displayed, including an object 24, viewed from above. In this space, five reference viewpoints 28a to 28e are set, and reference image data is generated for each of them. The circles centered on the reference viewpoints 28a to 28e schematically show the screen surfaces of the reference images prepared as panoramic images covering the whole celestial sphere.
Assuming that the user's viewpoint at the time of image display is at the position of a virtual camera 30, the projection unit 264 determines the view screen so as to correspond to the virtual camera 30 and projects the model shape of the object 24. As a result, the correspondence between the pixels of the display image and positions on the surface of the object 24 is determined. Then, for example, when determining the value of a pixel representing the image of a point 26 on the surface of the object 24, the pixel value determination unit 266 first identifies the reference images in which the point 26 appears as an image.
Since the position coordinates of the reference viewpoints 28a to 28e and of the point 26 in the world coordinate system are known, the distances between them are easily obtained. In the figure, these distances are indicated by the lengths of the line segments connecting the reference viewpoints 28a to 28e with the point 26. By projecting the point 26 onto the screen surface of each reference viewpoint, the position of the pixel at which the image of the point 26 should appear in each reference image can also be identified. On the other hand, depending on the position of a reference viewpoint, the point 26 may be on the back side of the object or hidden behind an object in front of it, so that its image does not actually appear at that position in the reference image.
Therefore, the pixel value determination unit 266 checks the depth image corresponding to each reference image. A pixel value of a depth image represents the distance from the screen surface of the object that appears as an image at that pixel in the corresponding reference image. Thus, by comparing the distance from the reference viewpoint to the point 26 with the depth value of the pixel at which the image of the point 26 should appear in the depth image, it is determined whether or not that image is actually the image of the point 26.
For example, on the line of sight from the reference viewpoint 28c to the point 26 there is a point 32 on the back side of the object 24, so the pixel at which the image of the point 26 should appear in the corresponding reference image actually represents the image of the point 32. Accordingly, the value indicated by the corresponding pixel of the depth image is the distance to the point 32, and the distance Dc, converted to a value measured from the reference viewpoint 28c, is clearly smaller than the distance dc to the point 26 calculated from the coordinate values. Therefore, when the difference between the distance Dc obtained from the depth image and the distance dc to the point 26 obtained from the coordinate values is equal to or greater than a threshold value, that reference image is excluded from the calculation of the pixel value representing the point 26.
Similarly, the distances Dd and De to the objects at the corresponding pixels, obtained from the depth images of the reference viewpoints 28d and 28e, differ from the distances from the reference viewpoints 28d and 28e to the point 26 by the threshold value or more, so these reference images are also excluded from the calculation. On the other hand, it can be determined by the threshold comparison that the distances Da and Db to the objects at the corresponding pixels, obtained from the depth images of the reference viewpoints 28a and 28b, are substantially the same as the distances from the reference viewpoints 28a and 28b to the point 26. By performing screening using the depth values in this way, the pixel value determination unit 266 selects, for each pixel of the display image, the reference images to be used for calculating its pixel value.
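A minimal Python sketch of this depth-based screening is shown below; the per-view project and depth accessors, the view objects themselves, and the threshold value are assumptions introduced only for illustration.

    import numpy as np

    def select_reference_images(point, reference_views, threshold=0.05):
        """Screen reference images using their depth images.

        reference_views is assumed to be a list of objects with a `position`
        attribute, a project(point) method returning the (u, v) pixel at which
        `point` should appear (or (None, None) when it is outside the screen),
        and a depth(u, v) method returning the depth value at that pixel
        measured from the reference viewpoint. A view is kept only when the
        stored depth agrees with the geometric distance to the point, i.e.
        when the point is actually visible from that viewpoint.
        """
        selected = []
        for view in reference_views:
            d = np.linalg.norm(np.asarray(point) - np.asarray(view.position))
            u, v = view.project(point)
            if u is None:
                continue
            if abs(view.depth(u, v) - d) < threshold:
                selected.append(view)
        return selected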
Although FIG. 8 illustrates five reference viewpoints, in practice the comparison using the depth values is performed for all of the reference viewpoints distributed as shown in FIG. 7. This makes it possible to draw a highly accurate display image. On the other hand, referring to about 25 depth images and reference images for every pixel of the display image may produce a load that cannot be overlooked, depending on the processing performance of the apparatus. Therefore, prior to selecting the reference images used for determining a pixel value as described above, the reference images to be treated as selection candidates may be narrowed down according to a predetermined criterion. For example, the reference viewpoints existing within a predetermined range from the virtual camera 30 are extracted, and the selection processing using the depth values is performed only on the reference images from those viewpoints.
At this time, an upper limit on the number of reference viewpoints to be extracted may be set, for example to 10 or 20, and the extraction range may be adjusted so as to fall within that upper limit, or the viewpoints may be selected randomly or according to a predetermined rule. The number of reference viewpoints to be extracted may also be varied depending on the region of the display image. For example, when virtual reality is realized using a head mounted display, the central region of the display image coincides with the direction in which the user's line of sight is directed, so drawing with higher accuracy than in the peripheral region is desirable.
Therefore, for pixels within a predetermined range from the center of the display image, a relatively large number of reference viewpoints (reference images) are taken as selection candidates, while for pixels outside that range the number of selection candidates is reduced. As an example, about 20 reference images may be used as selection candidates for the central region and about 10 for the peripheral region. However, the number of regions is not limited to two, and three or more regions may be used. The division is also not limited to one based on the distance from the center of the display image; it is also conceivable to divide the image dynamically, for example according to the region of the image of an object receiving attention. In this way, by controlling the number of reference images to be referred to based on factors other than whether or not the image of the object is captured, the display image can be drawn under optimal conditions that take into account the processing performance of the apparatus, the accuracy required of the display, the content of the display, and so on.
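As a simple illustration of such region-dependent narrowing, the following sketch returns a candidate budget per pixel from its distance to the image center; the radius ratio is an assumed value, and only the counts of about 20 and 10 are taken from the example above.

    def candidate_count(px, py, width, height, central_ratio=0.5,
                        central_candidates=20, peripheral_candidates=10):
        """Number of nearby reference viewpoints to keep as selection candidates
        for pixel (px, py), using a larger budget near the image center, where
        the user's line of sight is assumed to be directed."""
        cx, cy = width / 2, height / 2
        r = ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5
        r_max = min(width, height) / 2
        return central_candidates if r < central_ratio * r_max else peripheral_candidates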
FIG. 9 is a diagram for explaining the method by which the pixel value determination unit 266 determines the pixel values of the display image. As shown in FIG. 8, suppose it has been found that the image of the point 26 on the object 24 is represented in the reference images of the reference viewpoints 28a and 28b. The pixel value determination unit 266 basically determines the pixel value of the image of the point 26 in the display image corresponding to the actual viewpoint by blending the pixel values of the image of the point 26 in those reference images.
Here, letting c1 and c2 be the pixel values (color values) of the image of the point 26 in the reference images of the reference viewpoints 28a and 28b, respectively, the pixel value C in the display image is calculated as follows.

  C = w1·c1 + w2·c2

Here, the coefficients w1 and w2 are weights satisfying the relationship w1 + w2 = 1, that is, they represent the contribution ratios of the reference images, and they are determined based on the positional relationship between the reference viewpoints 28a and 28b and the virtual camera 30 representing the actual viewpoint. For example, the shorter the distance from the virtual camera 30 to a reference viewpoint, the larger the coefficient and hence the larger the contribution ratio.

In this case, letting Δa and Δb be the distances from the virtual camera 30 to the reference viewpoints 28a and 28b, and setting sum = 1/Δa² + 1/Δb², the weighting coefficients can, for example, be given by the following functions.

  w1 = (1/Δa²) / sum
  w2 = (1/Δb²) / sum

Generalizing the above expressions, with N the number of reference images used, i (1 ≤ i ≤ N) the identification number of a reference viewpoint, Δi the distance from the virtual camera 30 to the i-th reference viewpoint, ci the corresponding pixel value in each reference image, and wi the weighting coefficient, they become

  C = Σi wi·ci,  where wi = (1/Δi²) / Σj (1/Δj²)
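A minimal Python sketch of this distance-based blending, including the case where the virtual camera coincides with one of the reference viewpoints (described in the next paragraph), might look as follows; the function name and data layout are illustrative, and each color is assumed to be an RGB triple.

    import numpy as np

    def blend_by_distance(camera_pos, ref_positions, ref_colors, eps=1e-6):
        """Weighted average of reference-image colors with weights
        w_i = (1 / d_i^2) / sum_j (1 / d_j^2), where d_i is the distance from
        the virtual camera to reference viewpoint i. If the camera coincides
        with a reference viewpoint, that reference color is returned as is."""
        d = np.array([np.linalg.norm(np.asarray(camera_pos) - np.asarray(p))
                      for p in ref_positions])
        colors = np.asarray(ref_colors, dtype=np.float64)
        if np.any(d < eps):                     # camera on a reference viewpoint
            return colors[int(np.argmin(d))]
        inv_sq = 1.0 / d ** 2
        w = inv_sq / inv_sq.sum()
        return (w[:, None] * colors).sum(axis=0)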
In the above expressions, when Δi is 0, that is, when the virtual camera 30 coincides with one of the reference viewpoints, the weighting coefficient for the pixel value of the corresponding reference image is set to 1 and the weighting coefficients for the pixel values of the other reference images are set to 0. In this way, a reference image created accurately for that viewpoint can be reflected in the display image as it is. However, the calculation formula is not limited to this.
The parameters used for calculating the weighting coefficients are not limited to the distances from the virtual camera to the reference viewpoints. For example, the coefficients may be based on the angles θa and θb (0 ≤ θa, θb ≤ 90°) formed by the line-of-sight vectors Va and Vb from the respective reference viewpoints to the point 26 with the line-of-sight vector Vr from the virtual camera 30 to the point 26. For example, using the inner products (Va·Vr) and (Vb·Vr) of the vectors Va and Vb with the vector Vr, the weighting coefficients are calculated as follows.

  w1 = (Va·Vr) / ((Va·Vr) + (Vb·Vr))
  w2 = (Vb·Vr) / ((Va·Vr) + (Vb·Vr))

Generalizing these expressions as above, with N the number of reference images used, Vi the line-of-sight vector from the reference viewpoint i to the point 26, and wi the weighting coefficient, they become

  C = Σi wi·ci,  where wi = (Vi·Vr) / Σj (Vj·Vr)
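The inner-product version can be sketched in the same way (a minimal illustration; it assumes the angles do not exceed 90°, so that all inner products are non-negative, and that each color is an RGB triple).

    import numpy as np

    def blend_by_angle(point, camera_pos, ref_positions, ref_colors):
        """Weighted average with w_i = (V_i . V_r) / sum_j (V_j . V_r), where
        V_r is the unit line-of-sight vector from the virtual camera to the
        point and V_i is the unit line-of-sight vector from reference viewpoint
        i to the point."""
        point = np.asarray(point, dtype=np.float64)
        vr = point - np.asarray(camera_pos, dtype=np.float64)
        vr /= np.linalg.norm(vr)
        dots = []
        for p in ref_positions:
            vi = point - np.asarray(p, dtype=np.float64)
            vi /= np.linalg.norm(vi)
            dots.append(float(np.dot(vi, vr)))
        w = np.array(dots) / sum(dots)
        return (w[:, None] * np.asarray(ref_colors, dtype=np.float64)).sum(axis=0)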
In any case, as long as a calculation rule is introduced such that a reference viewpoint whose state with respect to the point 26 is closer to that of the virtual camera 30 is given a larger weighting coefficient, the specific calculation formula is not particularly limited. The "closeness of state" may also be evaluated from multiple aspects, based on both distance and angle, to determine the weighting coefficients. Furthermore, the surface shape of the object 24 at the point 26 may be taken into account. The luminance of light reflected from an object generally has an angular dependence based on the inclination (normal) of the surface. Therefore, the angle formed between the normal vector at the point 26 and the line-of-sight vector Vr from the virtual camera 30 may be compared with the angles formed between that normal vector and the line-of-sight vectors Va and Vb from the respective reference viewpoints, and the weighting coefficient may be increased as this difference becomes smaller.
The function itself used to calculate the weighting coefficients may also be switched depending on attributes of the object 24 such as its material and color. For example, in the case of a material in which the specular reflection component is dominant, the reflection has strong directivity, and the observed color changes greatly depending on the angle of the line-of-sight vector. On the other hand, in the case of a material in which the diffuse reflection component is dominant, the change in color with the angle of the line-of-sight vector is not so large. Therefore, in the former case a function may be used that gives a larger weighting coefficient to a reference viewpoint whose line-of-sight vector is closer to the line-of-sight vector Vr from the virtual camera 30 to the point 26, while in the latter case the weighting coefficients may be made equal for all reference viewpoints, or a function may be used whose angular dependence is smaller than in the case where the specular reflection component is dominant.
For the same reason, in the case of a material in which the diffuse reflection component is dominant, the reference images used to determine the pixel value C of the display image may be thinned out, or only reference images having line-of-sight vectors whose angles are within a predetermined value of the actual line-of-sight vector Vr may be used, so as to reduce the number of reference images itself and suppress the computational load. When the rule for determining the pixel value C is varied in this way depending on the attributes of objects, the reference image data storage unit 256 stores, in association with each image within a reference image, data representing attributes such as the material of the object it represents.
By the aspects described above, the surface shape and material of objects can be taken into account, and the directivity of light due to specular reflection and the like can be reflected more accurately in the display image. Note that the weighting coefficients may be determined by combining any two or more of a calculation based on the shape of the object, a calculation based on its attributes, a calculation based on the distance from the virtual camera to the reference viewpoint, and a calculation based on the angles formed by the line-of-sight vectors.
Next, the operation of the image generation apparatus that can be realized by the configuration described so far will be explained. FIG. 10 is a flowchart showing the processing procedure by which the display image generation apparatus 200 generates a display image corresponding to the viewpoint. This flowchart starts when an application or the like is started by a user operation, an initial image is displayed, and the apparatus enters a state of accepting viewpoint movement. As described above, various information processing such as an electronic game may be performed in parallel with the illustrated display processing. First, the space construction unit 262 forms, in the world coordinate system, the initial state of the three-dimensional space in which the objects to be displayed exist (S10).
Meanwhile, the viewpoint information acquisition unit 260 specifies the position of the viewpoint and the direction of the line of sight at that time, based on the position and posture of the user's head (S12). Next, the projection unit 264 sets a view screen for the viewpoint and projects the objects existing in the space to be displayed (S14). As described above, this processing need only consider the surface shapes, for example by perspective-transforming the vertices of the polygon meshes that form the three-dimensional models. Next, the pixel value determination unit 266 sets one target pixel among the pixels inside the projected meshes (S16) and selects the reference images to be used for determining its pixel value (S18).
That is, as described above, the reference images in which the point on the object represented by the target pixel appears as an image are determined based on the depth image of each reference image. The pixel value determination unit 266 then determines the weighting coefficients based on the positional relationships between the reference viewpoints of those reference images and the virtual camera corresponding to the actual viewpoint, the shape and material of the object, and so on, and determines the value of the target pixel, for example by taking a weighted average of the corresponding pixel values of the reference images (S20). It will be understood by those skilled in the art that, besides a weighted average, various calculations such as statistical processing and interpolation processing are conceivable for deriving the value of the target pixel from the pixel values of the reference images.
The processing of S18 and S20 is repeated for all pixels on the view screen (N of S22, S16). When the pixel values of all pixels have been determined (Y of S22), the output unit 268 outputs the data to the head mounted display 100 as display image data (S24). When display images for the left eye and the right eye are generated, the processing of S16 to S22 is performed for each of them, and they are joined together as appropriate and output. If the display need not be ended (N of S26), the space construction unit 262 forms the space to be displayed for the next time step (S10). That is, the objects are moved or deformed from the initial state by one time step. Then, after acquiring information on the user's viewpoint at that time, the view screen is set, and the display image is generated and output (S12 to S24). The processing from S10 to S24 is repeated until the display processing ends, and when it becomes necessary to end the display, all processing ends (Y of S26).
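The overall loop of FIG. 10 can be summarized by the following sketch, which reuses the screening and blending sketches given earlier; every object and method name here is an illustrative placeholder rather than the actual implementation.

    def run_display_loop(space, viewpoint_source, reference_views, display):
        """High-level sketch of the per-frame procedure of FIG. 10 (S10 to S26)."""
        space.build_initial_state()                        # S10 (initial state)
        while not display.should_end():                    # S26
            eye, gaze = viewpoint_source.current()         # S12
            screen = space.project(eye, gaze)              # S14: set view screen, project objects
            image = screen.new_image()
            for pixel in screen.covered_pixels():          # S16: pixels inside projected meshes
                point = screen.object_point(pixel)         # point on the object surface
                refs = select_reference_images(point, reference_views)              # S18
                image[pixel] = blend_by_distance(eye,
                                                 [r.position for r in refs],
                                                 [r.color_at(point) for r in refs])  # S20
            display.output(image)                          # S24
            space.step()                                   # S10: next time step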
In the example of FIG. 10, the pixel values of all pixels on the view screen are determined using the reference images, but the drawing method may be switched depending on the region of the display image or the position of the viewpoint. For example, for the image of an object that does not require changes in lighting or color tone due to viewpoint movement, only conventional texture mapping may be performed. In addition, states observed only from local viewpoints, such as highly directional reflected light, may not be fully expressible from the surrounding reference images. Therefore, the amount of data prepared as reference images can be suppressed by switching to drawing by ray tracing only when the viewpoint enters the corresponding range.
FIG. 11 shows an example of the structure of the data stored in the reference image data storage unit 256. The reference image data 270 has a data structure in which, for each piece of reference image identification information 272, the position coordinates 274 of the reference viewpoint, a reference image 276, and a depth image 278 are associated. As described with reference to FIG. 7, the position coordinates 274 of the reference viewpoint are three-dimensional position coordinates in the virtual space, set by the reference viewpoint setting unit 310 in consideration of the movable range of the user 12 and the like.
The reference image 276 is moving image data representing the space including the moving objects as seen from each reference viewpoint. The depth image 278 is likewise moving image data representing the distances from the screen surface of the space including the moving objects. In the figure, the reference images are represented by character strings such as "video A", "video B", and "video C", and the depth images by "depth video A", "depth video B", and "depth video C", but in practice they may include information such as the storage areas within the reference image data storage unit 256.
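One possible in-memory representation of such an entry is sketched below; the field names and the example values are illustrative assumptions rather than the actual storage format.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class ReferenceImageData:
        """One entry of the reference image data 270 (FIG. 11): an identifier,
        the reference viewpoint position, and the associated reference video and
        depth video. Here the videos are simply storage-area identifiers; when
        the reference viewpoint moves, the position would instead be a list of
        per-time-step coordinates."""
        identifier: int
        viewpoint_position: Tuple[float, float, float]
        reference_video: str          # e.g. "video A"
        depth_video: str              # e.g. "depth video A"

    reference_image_data: List[ReferenceImageData] = [
        ReferenceImageData(0, (0.0, 0.0, 1.4), "video A", "depth video A"),
        ReferenceImageData(1, (0.5, 0.0, 1.4), "video B", "depth video B"),
    ]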
FIG. 12 shows an example of setting reference viewpoints for representing moving objects. The figure is drawn in the same manner as FIG. 8. In the virtual space shown in (a) and (b) of the figure, an object 34 and an object 35 exist. For these, the reference viewpoint setting unit 310 of the reference image generation device 300 sets five reference viewpoints 30a, 30b, 30c, 30d, and 30e. Suppose now that one of the objects, the object 35, moves as indicated by the arrow. Part (a) shows a mode in which the reference viewpoints are not moved.
In this case, the change in each reference image is mainly limited to the region of the image of the object 35. That is, in each frame of the moving images of the reference images and of the depth images, no change occurs over a wide region, so the data size can be reduced, for example, by applying a compression method that uses inter-frame differences. The mode shown in (b), on the other hand, moves at least some of the reference viewpoints 30a to 30e so as to follow the movement of the object 35, making them reference viewpoints 36a to 36e. In the illustrated example, the four reference viewpoints 30a to 30d are moved to the reference viewpoints 36a to 36d with the same velocity vector as that of the object 35. However, the movement rule is not limited to this; the reference viewpoints may be moved so that the distance to the object does not exceed a predetermined threshold and the distance between reference viewpoints does not fall below a predetermined threshold.
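A minimal sketch of such a movement rule is shown below; the threshold values and the fallback of keeping a viewpoint in place when a move would violate a constraint are assumptions, and other rules are equally possible.

    import numpy as np

    def update_reference_viewpoints(viewpoints, object_pos, object_velocity, dt,
                                    max_object_dist=3.0, min_pair_dist=0.5):
        """Move reference viewpoints with the object's velocity vector, while
        checking the constraints mentioned above: each viewpoint should stay
        within max_object_dist of the object, and no two viewpoints should come
        closer than min_pair_dist. A viewpoint that would violate a constraint
        simply keeps its previous position in this sketch."""
        object_pos = np.asarray(object_pos, dtype=np.float64)
        updated = []
        for i, p in enumerate(viewpoints):
            candidate = np.asarray(p, dtype=np.float64) + np.asarray(object_velocity) * dt
            ok = np.linalg.norm(candidate - object_pos) <= max_object_dist
            for j, q in enumerate(viewpoints):
                if j != i and np.linalg.norm(candidate - np.asarray(q)) < min_pair_dist:
                    ok = False
            updated.append(candidate if ok else np.asarray(p, dtype=np.float64))
        return updated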
In this case, since the background and other elements apart from the moving object 35 also change relatively, the region that changes between frames becomes wider and the data compression efficiency decreases. On the other hand, since the distance between the object and the reference viewpoints can be kept approximately constant, the level of detail of the image of the object in the display image is less likely to change. Taking these points into account, the rule for setting the reference viewpoints is selected as appropriate, in consideration of the level of detail the display image requires for representing the objects, the range of movement of the objects, a suitable data size, and so on.
However, it is not necessary to move all the reference viewpoints according to the same rule. For example, as illustrated, when a plurality of objects 34 and 35 exist in the space to be displayed and only one of them moves, the reference viewpoint 30e (= 36e) in the vicinity of the stationary object 34 may be kept fixed. When the movement direction and speed differ among a plurality of objects, the movement directions and speeds of the reference viewpoints may be set individually to correspond to them.
For example, for each object, the reference viewpoints in charge of that object are distributed within a predetermined range around it, and the positions of those reference viewpoints are controlled so that the positional relationship with the object is maintained. Here, "in charge of" refers only to tracking the position, and the reference image may still represent all objects visible from that reference viewpoint. Alternatively, as described above, only the image of the object in charge may be represented in the reference image, and the images may be composited when determining the pixel values of the display image.
For example, after provisionally determining the pixel values of the display image using reference images representing only the background, the display image is overwritten using reference images representing only the foreground objects. There may also be reference viewpoints in charge of a plurality of objects simultaneously; for example, a certain reference viewpoint may be moved with the average of the movement velocity vectors of a plurality of objects. In the mode of (b), among the reference image data shown in FIG. 11, the position coordinates of the reference viewpoints become data that changes along the time axis.
Therefore, the reference image generation device 300 stores the reference image data and the position coordinates of the reference viewpoints in the reference image data storage unit 256 in association with each time step. The pixel value determination unit 266 of the display image generation apparatus 200 calculates the above-described weighting coefficients based on the positional relationship between the reference viewpoints and the user's viewpoint at the same time step, and then determines the pixel values of the display image for that time step.
The example of FIG. 12 assumes that the display image is generated using all of the prepared reference images, but the reference images may instead be generated with the reference viewpoints fixed, and the reference images used for generating the display image may be switched according to the movement of the objects. FIG. 13 is a diagram for explaining a mode in which the reference images used for generating the display image are switched according to the movement of an object. The figure is drawn in the same manner as FIG. 12. That is, objects 34 and 35 exist in the virtual space, and the latter moves as indicated by the arrow.
The reference image generation device 300 sets fixed reference viewpoints 38a to 38f so as to cover the movement range of the objects, and generates a reference image for each of them. Meanwhile, the display image generation apparatus 200 switches the reference images used for display according to the movement of the objects. For example, at the initial position of the object 35, the reference images indicated by solid lines (the reference images of the reference viewpoints 38a, 38b, 38c, and 38f) are used for generating the display image. At the position after the movement, the reference images indicated by broken lines (the reference images of the reference viewpoints 38d and 38e) are added as reference targets for display image generation, and at the same time the reference images indicated by thick solid lines (the reference images of the reference viewpoints 38b and 38f) are excluded from the reference targets.
At this time, for example, the reference images corresponding to reference viewpoints whose distances from the objects 34 and 35 are smaller than a threshold value are used for generating the display image. Even in this way, the objects can be expressed with a stable level of detail, substantially as in the case where the reference viewpoints are moved. In addition, since the viewpoint of each reference image video itself does not move, the region that changes between frames is limited and the compression efficiency is high. However, since a relatively large number of reference viewpoints must be provided, the number of reference image videos tends to increase.
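The switching itself can be sketched as follows (the threshold value and data layout are assumptions); calling this every time step adds the viewpoints near an approaching object and drops those the object has moved away from.

    import numpy as np

    def active_reference_views(fixed_viewpoints, object_positions, threshold=2.0):
        """Select, from fixed reference viewpoints, those to be used for the
        current time step: a viewpoint is active when its distance to at least
        one object is below `threshold` (an assumed value)."""
        active = []
        for i, vp in enumerate(fixed_viewpoints):
            for obj in object_positions:
                if np.linalg.norm(np.asarray(vp) - np.asarray(obj)) < threshold:
                    active.append(i)
                    break
        return active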
As described so far, the reference images are basically moving image data. Therefore, the data can be stored in the reference image data storage unit 256 and transmitted using a general compression encoding method for moving image data such as MPEG (Moving Picture Experts Group). Alternatively, when an omnidirectional image is represented in equirectangular form, it may be converted into coefficients of spherical harmonics and compressed. Furthermore, each frame may be compressed using a general compression encoding method for still image data such as JPEG (Joint Photographic Experts Group).
On the other hand, the present embodiment has the characteristics that the reference image and depth image videos form pairs, and that videos from a plurality of reference viewpoints that must be kept synchronized are to be stored, so the effect can be enhanced by introducing a compression method specific to these characteristics. FIG. 14 shows the configuration of the functional blocks of the reference image data generation unit of the reference image generation device 300 and the pixel value determination unit of the display image generation apparatus 200 when a compression/decompression processing function for the reference image data is introduced.
In this aspect, the reference image data generation unit 318a includes a reference image generation unit 330, a depth image generation unit 332, and a data compression unit 334. The reference image generation unit 330 and the depth image generation unit 332 generate the reference image and depth image data as described above. That is, they generate the moving images of the reference images representing the appearance of the space from each reference viewpoint set by the reference viewpoint setting unit 310, and the moving images of the depth images representing the distance values. Here, the reference viewpoints may be fixed, or some of them may be moved according to the movement of the objects.
 The data compression unit 334 compresses the reference images and depth images, generated in this way at a predetermined rate along the time axis, according to a predetermined rule. Specifically, at least one of the following processes is performed.
(1) The reference image and the depth image at the same time step are reduced as necessary and represented together as a single frame, forming an integrated video.
(2) In the reference images and depth images, only the regions with change are represented as time-series data.
 The data compression unit 334 stores the data compressed in this way in the reference image data storage unit 256. At this time, one frame of the integrated image or of the changed-region images may be further compressed by JPEG, or the integrated image video may be compressed by MPEG. The pixel value determination unit 266a, in turn, includes a data decompression unit 336, a reference unit 338, and a calculation unit 340. The data decompression unit 336 reads the reference image data for each time step from the reference image data storage unit 256 and decompresses it, thereby restoring the reference images and depth images.
 That is, when the compression of (1) above has been performed, the data decompression unit cuts out the reference image and depth image from each frame of the integrated video and enlarges them as necessary. When the compression of (2) above has been performed, only the changed regions of the preceding frame image are updated using the time-series data. When the compressions of (1) and (2) are performed together, both operations are likewise performed during decompression.
 Using the depth images for each time step restored in this way, the reference unit 338 selects, for each pixel of the display image, the reference images in which the point on the object to be drawn appears, as described above, and acquires the pixel values of those reference images. As also described above, the calculation unit 340 determines the pixel value of the display image by averaging the pixel values acquired from the referenced reference images with appropriate weights.
 FIG. 15 schematically shows an example of the integrated video generated by the data compression unit 334. The integrated video 42 has a data structure in which the four regions obtained by dividing one frame 40 each hold, for the same time step, a frame of the "first reference image" and "second reference image" generated for two reference viewpoints and of the corresponding "first depth image" and "second depth image". The data compression unit 334 reduces the frames of the reference images and depth images as appropriate according to the size of the image plane set for the integrated video 42, and joins them in a predetermined arrangement as illustrated.
 For example, when the integrated video 42 has the same size as the frames of the original reference images and depth images, the data compression unit 334 reduces each reference image and depth image frame to 1/2 in both the vertical and horizontal directions. The data compression unit 334 further associates the position coordinates of the two reference viewpoints integrated into the video with the video as additional data. These processes correspond to converting two rows of the reference image data shown in FIG. 11 into one video.
 This makes it possible to reduce the overall size of the reference image data, thereby saving transmission bandwidth and storage capacity. In addition, since the four kinds of video can be decoded and decompressed at once, parallel processing for restoration is easy even when many reference viewpoints are set. Furthermore, since the four kinds of data are inherently synchronized with one another, synchronization processing can be simplified even when the data of all reference viewpoints are taken into account. Note that the number of reference viewpoints integrated into one integrated video 42 is not limited to two, and may be larger depending on the reduction ratio allowed for each image.
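 As a rough illustration only, the following Python sketch packs two reference frames and their depth frames into a single integrated frame as in FIG. 15 and unpacks them again. It assumes numpy and OpenCV are available and that all four frames are same-sized 3-channel arrays (the depth frames already encoded into 3 channels); it is not the device's actual implementation.

```python
import numpy as np
import cv2  # assumed available; any resampling routine would serve

def pack_integrated_frame(ref1, ref2, depth1, depth2):
    # Halve each frame in both directions and tile them into one frame of the
    # original size: reference images on top, depth images below (cf. FIG. 15).
    h, w = ref1.shape[:2]
    half = (w // 2, h // 2)
    r1, r2 = cv2.resize(ref1, half), cv2.resize(ref2, half)
    d1, d2 = cv2.resize(depth1, half), cv2.resize(depth2, half)
    return np.vstack([np.hstack([r1, r2]), np.hstack([d1, d2])])

def unpack_integrated_frame(frame):
    # Cut out the four quadrants and enlarge them back to the original size.
    h, w = frame.shape[:2]
    qh, qw = h // 2, w // 2
    quads = [frame[:qh, :qw], frame[:qh, qw:], frame[qh:, :qw], frame[qh:, qw:]]
    return [cv2.resize(q, (w, h)) for q in quads]  # ref1, ref2, depth1, depth2
```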
 FIG. 16 schematically shows another example of the integrated video generated by the data compression unit 334. The integrated video 46 has a data structure in which the four regions obtained by dividing one frame 44 hold, for the same time step, frames of the "first reference image", "second reference image", and "third reference image" generated for three reference viewpoints and of the corresponding "first depth image", "second depth image", and "third depth image".
 In the integrated video 42 shown in FIG. 15, the "first depth image" and "second depth image" are placed in separate regions of the image plane, so the channels and gradations used are not restricted. The integrated video 46 shown in FIG. 16, in contrast, represents the "first depth image", "second depth image", and "third depth image" in the same region of the image plane using the three channels of red (R), green (G), and blue (B). This leaves the remaining three regions free to hold the three reference images.
 With such a data structure, the data of three reference viewpoints can be included in one video while the image reduction ratio remains the same as in FIG. 15. As a result, synchronization and decoding/decompression can be made more efficient while image quality is maintained. However, if the RGB image is converted into a YCbCr image for compression encoding, the pixel values of the other depth images may interfere when the display image generation device 200 decodes and decompresses the data, so the values may not be restored exactly. It is therefore desirable to employ a compression encoding scheme that can restore RGB values accurately.
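 A minimal sketch of this channel packing, assuming the three depth frames have already been quantized to single-channel 8-bit arrays of equal size (the quantization step itself is not shown):

```python
import numpy as np

def pack_three_depths(d1, d2, d3):
    # Carry three single-channel depth frames in the R, G and B channels of one
    # region of the integrated frame (cf. FIG. 16).
    return np.stack([d1, d2, d3], axis=-1)

def unpack_three_depths(packed):
    return packed[..., 0], packed[..., 1], packed[..., 2]
```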
 FIG. 17 is a diagram for explaining, as one of the compression processes performed by the data compression unit 334, a technique of turning only the images of changed regions into time-series data. The example assumes a video showing a car traveling on a road; (a) shows six of its reference image frames in sequence, with the horizontal axis representing time. Here, each frame of the reference image represents the omnidirectional view from the reference viewpoint in equirectangular form. In this case, the road and background other than the car, which is the object, show almost no motion.
 (b) shows a fixed-size region containing the car (for example, region 50) extracted from each frame shown in (a). As described above, the changes in the reference image video are almost entirely confined to the extracted regions. The data compression unit 334 therefore stores the whole area of a frame at a certain point in time, for example frame 52, as a reference frame, and for frames at later time steps it stores the time-series data of the images of a predetermined-size region containing the object (for example, image 54), associated with the position information of that region on the reference image plane, as the compressed reference image data.
 The data decompression unit 336 uses the reference frame as the reference image for the time step at which it is given, and for later time steps it restores the reference image by updating only the regions stored as time-series data. As illustrated, the image 54 of the fixed-size region containing the object may have a higher resolution than the image of the corresponding region 50 in the reference frame. In this way, even if the size of the reference frame is reduced to lower the data size, the level of detail can be maintained for the object region that the user is expected to gaze at. The reference frame may be the first frame of each video, or frames may be taken at predetermined time intervals.
 (c) goes further and extracts only the region of the object's image, for example a rectangular region whose four sides lie at a predetermined distance from the contour of the object. In this case, the size of the extracted region varies with the positional relationship between the reference viewpoint and the object. The data compression unit 334 extracts the image of the object from each frame of the reference image shown in (a) and determines the region to cut out. It then stores the whole area of a frame at a certain point in time, for example frame 52, as a reference frame, and for frames at later time steps it stores the time-series data of the images of the object's image region (for example, image 56), associated with the position information and size information of that region on the reference image plane, as the compressed reference image data.
 Alternatively, at the stage where the reference image generation unit 330 generates the reference images, an image representing only the object may be generated as the image 56. In this case, the screen surface may be adjusted so as to zoom in on the object while the reference viewpoint remains fixed. The operation of the data decompression unit 336 is the same as in case (b). The modes (a) to (c) can be applied not only to the reference images but also to the depth images in the same way. The compression techniques applied to the reference images and to the depth images may be the same or different. With the compression technique of (c), the object information can be held at the same level of detail regardless of the distance between the reference viewpoint and the object.
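 The following Python sketch illustrates, under stated assumptions, the scheme of FIG. 17(b): the first frame is kept whole as the reference frame, and each later frame contributes only a fixed-size region containing the object together with its position. The helper locate_object is hypothetical and stands in for whatever mechanism provides the region's top-left corner.

```python
import numpy as np

def compress_changed_region(frames, region_size, locate_object):
    # Keep frame 0 whole; store only (position, patch) pairs for later frames.
    ref_frame = frames[0].copy()
    h, w = region_size
    series = []
    for frame in frames[1:]:
        y, x = locate_object(frame)  # hypothetical helper: top-left of the region
        series.append({'pos': (y, x), 'patch': frame[y:y + h, x:x + w].copy()})
    return ref_frame, series

def decompress_changed_region(ref_frame, series):
    # Restore the video by updating only the stored region at each time step.
    restored = [ref_frame.copy()]
    current = ref_frame.copy()
    for entry in series:
        y, x = entry['pos']
        ph, pw = entry['patch'].shape[:2]
        current[y:y + ph, x:x + pw] = entry['patch']
        restored.append(current.copy())
    return restored
```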
 FIG. 18 is a diagram for explaining, as one of the compression processes performed by the data compression unit 334, a technique of using information representing only the changed pixels as time-series data. The horizontal axis of the figure represents time. First, the image 60 is one frame of the reference image, or a part of one. The image 62a corresponds to the frame following the image 60, with the pixels whose values differ from those of the image 60 by a predetermined value or more shown in gray. The image 62b corresponds to the frame after that and likewise shows in gray the pixels whose values differ from those of the preceding frame by a predetermined value or more.
 The data compression unit 334 takes the difference between frames of the reference image and extracts the pixels whose values differ by a predetermined value or more. In the illustrated example, this extracts the pixels representing the front of the car, including the hood and bumper, and the road surface ahead of the car. Next, the data compression unit 334 generates images 64a and 64b that hold the data (x, y, R, G, B), consisting of the position coordinates of the extracted pixels and their new pixel values, packed in raster order as five-channel pixel values. Here, (x, y) are the position coordinates of the pixel on the reference image plane, and (R, G, B) are the pixel values of the reference image, that is, the color values.
 In the case of a depth image, letting d be the pixel value of the depth image, that is, the distance value, an image is generated that holds the data (x, y, d), consisting of the position coordinates of the extracted pixels and their new pixel values, packed in raster order as three-channel pixel values. The whole area of the image 60 is then stored as a reference frame, and for frames at later time steps the images 64a and 64b, representing only the information of the changed pixels, are stored as time-series data, yielding the compressed reference image video data.
 The data decompression unit 336 uses the reference frame as the reference image for the time step at which it is given, and for later time steps it restores the reference image by updating only the pixels stored as time-series data. The same applies to the depth images. Compared with the mode shown in FIG. 17, this takes the shape of the object into account and reduces the data size further. The reference frame may be the first frame of each video, or frames may be taken at predetermined time intervals. The mode of FIG. 17 and the mode of FIG. 18 may also be combined as appropriate.
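 A minimal sketch of this changed-pixel packing, assuming 3-channel uint8 frames and an illustrative threshold; the packed array carries (x, y, R, G, B) per changed pixel in raster order, and the inverse step updates only those pixels.

```python
import numpy as np

def diff_pixels(prev, curr, threshold=8):
    # Extract pixels whose value changed by at least `threshold` in any channel.
    changed = np.any(np.abs(curr.astype(np.int16) - prev.astype(np.int16)) >= threshold, axis=2)
    ys, xs = np.nonzero(changed)                     # nonzero() scans in raster order
    return np.column_stack([xs, ys, curr[ys, xs]])   # (N, 5): x, y, R, G, B

def apply_diff(prev, packed):
    # Reconstruct the next frame by overwriting only the stored pixels.
    out = prev.copy()
    xs, ys = packed[:, 0], packed[:, 1]
    out[ys, xs] = packed[:, 2:]
    return out
```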
 FIG. 19 shows two successive frames of a reference image video. As described so far, when the number of main objects that move or deform in the space to be displayed is limited, differences between frames arise only in a small part of the image. Even in the illustrated video of the traveling car, between the upper frame and the lower frame only slight movements of the car images 70a and 70b and slight changes in the reflections 72a and 72b on the road occur.
 In this example, moreover, the regions 74a and 74b above the road in the image plane are distant views. A distant view differs in nature from the surfaces of the objects placed in the display target space assumed in the present embodiment, and in many cases need not change with movement of the user's viewpoint. An image from a predetermined reference viewpoint may therefore be rendered into the display image separately, by texture mapping or the like. In other words, there is little need to hold the image data of such regions for each reference viewpoint. Taking advantage of these properties, the reference images and depth images may be divided into tile images of a predetermined size, and the compression processing may be controlled in units of those tile images.
 FIG. 20 is a diagram for explaining a technique in which the data compression unit 334 controls the compression processing of the reference image in units of tile images. The illustrated image corresponds to the frame shown in FIG. 19, and the matrix of rectangles formed by the grid represents the tile images. The size of the tile images is set in advance. Among these tile images, those enclosed by white frames and included in the distant-view region 80 need not reflect movement of the user's viewpoint, as described above, and are therefore excluded from the reference image data held for each reference viewpoint.
 The remaining tile images, enclosed by black lines, belong to the near view, that is, to the region 82 used for drawing the object, and are therefore included as time-series data in the reference image data for each reference viewpoint. Alternatively, only the tile images in which a difference from the previous frame has arisen, such as those enclosed by solid lines (for example, tile image 84), may be extracted, and only their time-series data included in the reference image data. For example, when the average pixel value of the tile image at the same position differs between frames by a predetermined value or more, it is determined that a difference from the previous frame has arisen, and the tile is extracted.
 Alternatively, from a tile image in which a difference from the previous frame has arisen (for example, tile image 84), the pixels that differ from the previous frame by a predetermined value or more may be extracted, and an image may be generated representing the data consisting of the position coordinates and pixel values of those pixels. This processing is as described with reference to FIG. 18. For the depth images too, data can likewise be omitted or the compression state controlled in units of tile images. If the entire depth image is handled as general video data, the distance values must be expressed, for example, with the 256 gradations of SDR (Standard Dynamic Range), so the information below the decimal point is lost. If the original pixel values (distance values) are stored as floating-point data in units of tile images, the resolution of the distance values increases, and the reference images used for drawing can be selected with high accuracy.
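 As a rough illustration of this per-tile control, the Python sketch below skips tiles flagged as distant view and emits a near-view tile only when its mean pixel value differs from the previous frame by at least a threshold; the tile size, threshold, and boolean distant-view grid are all assumptions made for the example.

```python
import numpy as np

def changed_tiles(prev, curr, tile=64, threshold=2.0, distant_mask=None):
    # Return (tile position, tile image) pairs for near-view tiles that changed.
    h, w = curr.shape[:2]
    out = []
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            if distant_mask is not None and distant_mask[ty // tile][tx // tile]:
                continue  # distant view: drawn separately, not stored per viewpoint
            a = prev[ty:ty + tile, tx:tx + tile].astype(np.float32)
            b = curr[ty:ty + tile, tx:tx + tile].astype(np.float32)
            if abs(a.mean() - b.mean()) >= threshold:
                out.append(((tx // tile, ty // tile), curr[ty:ty + tile, tx:tx + tile].copy()))
    return out
```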
 FIG. 21 shows an example of the structure of the compressed data in the mode in which the compression processing of the reference images and depth images is controlled in units of tile images. The post-compression reference image data 350 is generated for each reference viewpoint, and has a data structure in which tile image data are linked in time-series order, associated with the position coordinates of the tile image on the image plane (denoted "tile position"). In the figure, the time series follows the order of frame numbers 0, 1, 2, and so on. For example, when the tile images at position coordinates (0, 0) and (1, 0) are included in the distant-view region, the images of those regions are not used for drawing the object; they are therefore treated as invalid in the reference image data and are prepared separately in a form such as texture data. In the figure, "-" indicates that the data of the tile image is invalid.
 For tile images that are included in the near-view region and may be used for drawing the object, on the other hand, the data of the first frame (frame number "0") is first included in the reference image data. In the figure, these tile images are denoted "image a", "image b", and so on. For subsequent frames, information representing a change is included in the reference image data only when a change has occurred in the tile image. In the illustrated example, the tile images at position coordinates (70, 65) and (71, 65) change at frame number "1", so the images representing those differences, "difference image c1" and "difference image d1", are included.
 Since a difference also arises in the tile image at position coordinates (70, 65) in the next frame, "difference image c2" is included in association with frame number "2". Here, a difference image is an image representing the difference from the previous frame and corresponds, for example, to the images 64a and 64b in FIG. 18. The tile image at position coordinates (30, 50) changes at frame number "24" and the tile image at position coordinates (31, 50) at frame number "25", so the images representing those differences, "difference image a1" and "difference image b1", are included.
 The data decompression unit 336 of the display image generation device 200 restores the reference image and depth image of a frame by joining the tile images associated with frame number "0" on the basis of their position coordinates. For subsequent frames, the whole reference image and depth image video can be restored by updating the pixels represented in a difference image only in the tile regions for which a difference image is included.
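 The following sketch shows one way such a structure might be rebuilt on the decompression side. It assumes, for simplicity, that the per-tile history maps a tile position to (frame number, full tile image) pairs with difference images already applied; the real data would carry difference images and position information as described above.

```python
import numpy as np

def restore_frame(compressed, frame_no, tile, frame_shape):
    # `compressed` maps (tx, ty) -> list of (frame_number, tile_image), sorted by
    # frame number; distant-view tiles are simply absent (cf. FIG. 21).
    frame = np.zeros(frame_shape, dtype=np.uint8)
    for (tx, ty), history in compressed.items():
        latest = None
        for n, img in history:   # take the latest entry at or before frame_no
            if n <= frame_no:
                latest = img
        if latest is not None:
            frame[ty * tile:(ty + 1) * tile, tx * tile:(tx + 1) * tile] = latest
    return frame
```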
 The modes described so far have assumed that the omnidirectional image is represented as the reference image in equirectangular form, but the present embodiment is not limited to this. FIG. 22 is a diagram for explaining an example of data compression processing when the omnidirectional reference images and depth images are represented as cube maps. (a) shows the relationship between the omnidirectional screen surface and the cube map faces. The cube map faces 362 are the faces of a cube enclosing the spherical screen surface 360, which lies at the same distance from the viewpoint 364 in all directions.
 A pixel 366 on the screen surface 360 is mapped to the position 368 where the straight line from the viewpoint 364 through the pixel 366 intersects the cube map face 362. Such cube mapping techniques are known as one means of representing panoramic images. In the present embodiment, the reference images and depth images can be held as cube map data. (b) shows the six faces unfolded when the depth image of a certain reference viewpoint is represented as a cube map.
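 For illustration, the sketch below maps a view direction from the viewpoint to a cube face and (u, v) coordinates on that face; the face naming and orientation follow one common convention and are an assumption of the example, not the notation of the embodiment.

```python
def cube_face_and_uv(dx, dy, dz):
    # Pick the face hit by the direction (dx, dy, dz) and return (face, u, v)
    # with u, v in [0, 1]; the direction is assumed to be non-zero.
    ax, ay, az = abs(dx), abs(dy), abs(dz)
    if ax >= ay and ax >= az:
        face, u, v, m = ('+x', -dz, -dy, ax) if dx > 0 else ('-x', dz, -dy, ax)
    elif ay >= ax and ay >= az:
        face, u, v, m = ('+y', dx, dz, ay) if dy > 0 else ('-y', dx, -dz, ay)
    else:
        face, u, v, m = ('+z', dx, -dy, az) if dz > 0 else ('-z', -dx, -dy, az)
    return face, 0.5 * (u / m + 1.0), 0.5 * (v / m + 1.0)
```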
 As described so far, when the reference image is a video, image data like that illustrated is generated at a predetermined rate. When a space such as those illustrated in FIGS. 17 to 20 is represented, however, differences from the previous frame arise only in the region of the car image indicated by the arrow in (b). Since a cube map divides the image plane into six sections from the outset, it is easy to exploit this and include only the faces with motion (face 370 in the illustrated example) as time-series data in the reference image data.
 For example, in the data structure shown in FIG. 21, if the tile images are replaced with cube map faces and the "difference image" is taken to be the image of a face in which a difference from the previous frame has arisen, the operations of the data compression unit 334 and the data decompression unit 336 are the same as described above. Alternatively, the cube map faces may be further divided into tile images, and whether to include them in the reference image data may be determined in units of tile images. Furthermore, for a cube map face that has changed from the previous frame, or for a tile image within it that has changed, data representing only the information on the pixels where differences have arisen, as shown in FIG. 18, may be used as the "difference image".
 When the reference images and depth images are represented by the equirectangular projection, the images of objects directly below or directly above the viewpoint are, by the nature of the projection, stretched horizontally at the bottom or top of the image plane. When such a region changes in the space to be displayed, a wide range of the equirectangular image therefore changes, and the efficiency of data compression may suffer. With the cube map method, a change in the image plane is limited to an area corresponding to the change in the space, so the efficiency of data compression can be kept stable.
 In the modes described so far, a reference image and a depth image are in principle generated as a pair for each reference viewpoint, compressed and decompressed in the same manner, and used for drawing the object. Here, the depth images are used to select, for each point on the object surface, the reference images to be referred to when drawing that point. If this selection is computed in advance and associated with positions on the object surface, the depth images themselves no longer need to be included in the reference image data.
 FIG. 23 shows the configuration of functional blocks of the reference image data generation unit of the reference image generation device 300 and the pixel value determination unit of the display image generation device 200 when a function is introduced for storing information on the reference images to be referred to in association with positions on the object surface. In this aspect, the reference image data generation unit 318b includes a reference image generation unit 330, a data compression unit 334, a depth image generation unit 332, and a reference destination information addition unit 342. The functions of the reference image generation unit 330, the data compression unit 334, and the depth image generation unit 332 are the same as those of the corresponding functional blocks shown in FIG. 14.
 The reference destination information addition unit 342 uses the depth images generated by the depth image generation unit to generate, for positions on the object surface, information designating the reference images to be referred to when drawing those positions. This processing is basically the same as that shown in FIG. 8. That is, the reference images in which a point on the object (such as point 26 in FIG. 8) appears as an image are determined by comparing the distance to the object indicated by the depth image with the distance from the reference viewpoint to the point in the display target space.
 However, whereas selecting the reference destination at display time, as described with reference to FIG. 8, starts from the pixel to be drawn in the display image and determines the corresponding point, the reference destination information addition unit 342 instead sets, according to a predetermined rule, the unit regions on the object surface for which the reference destinations are determined. Specific examples will be described later. The reference destination information addition unit 342 writes the identification information of the reference images determined in this way in association with the object model stored in the object model storage unit 254.
 When the object moves or deforms, its appearance from the reference viewpoints also changes, so part of the identification information of the reference images written into the object model becomes time-series data. With this configuration, the display image generation device 200 no longer needs to refer to the depth images when generating the display image. The data compression unit 334 therefore compresses only the reference images generated by the reference image generation unit 330, using any of the techniques described above, and stores them in the reference image data storage unit 256.
 The pixel value determination unit 266b of the display image generation device 200 includes a data decompression unit 336, a reference unit 344, and a calculation unit 340. The functions of the data decompression unit 336 and the calculation unit 340 are the same as those of the corresponding functional blocks shown in FIG. 14. However, the data decompression unit 336 performs the decompression processing described above only on the reference images stored in the reference image data storage unit 256. The reference unit 344, unlike the reference unit 338 of FIG. 14, determines the reference images used to draw the point on the object corresponding to each pixel of the display image on the basis of the information added to the object model.
 From the reference images thus determined it acquires the pixel values representing the image of that point and supplies them to the calculation unit 340. With this configuration, the processing load on the reference unit 344 is reduced and display image generation can be sped up. Moreover, the identification information of the referenced reference images requires fewer gradations than the distance values of a depth image, so the data size remains small even as time-series data.
 FIG. 24 is a diagram for explaining an example of a technique for associating the identification information of the referenced reference images with the object model. The figure is drawn in the same way as FIG. 8. That is, five reference viewpoints are set in the space where the object 424 exists, and reference images 428a, 428b, 428c, 428d, and 428e are generated. Let the identification information of the reference images (or reference viewpoints) be "A", "B", "C", "D", and "E". In this example, the reference destination information addition unit 342 associates the identification information of the reference images to be referred to in units of the vertices of the object 424, shown as circles, or in units of the faces (meshes) enclosed by the straight lines connecting the vertices.
 For example, the depth images show that the face 430a of the object 424 appears in the reference images with identification information "A" and "C". The identification information "A" and "C" is therefore associated with the face 430a. If the face 430b is found to appear in the reference images with identification information "A" and "B", the identification information "A" and "B" is associated with the face 430b. If the face 430c is found to appear in the reference images with identification information "C" and "D", the identification information "C" and "D" is associated with the face 430c.
 For the other faces of the object as well, the depth images are used to determine in which reference images their images appear, and the corresponding identification information is associated with them. In the figure, the associated identification information is shown in balloons extending from each face of the object 424. The reference unit 344 of the display image generation device 200 identifies the face containing the point on the object that corresponds to the pixel to be drawn, or a vertex near it, and acquires the identification information of the reference images associated with it. With this configuration, the information can be added to the vertex and mesh information already present in the object model as it is, so the increase in data size is kept small. In addition, since the reference destinations in the object model are limited, the processing load at display time is small.
 On the other hand, because the granularity at which the information is stored, such as faces or vertices, is coarse, when the referenced reference images change within a single face due to occlusion or the like, this cannot be represented accurately. One option in this case is to take as reference destinations only the reference images in which the entire face appears, but then fewer reference images are used for drawing and the quality of the display image may drop. To maintain image quality, the faces (meshes) would have to be subdivided into regions with different reference destinations and the reference image information set for each such unit, which is disadvantageous in terms of data size and processing load. For these reasons, it is desirable to apply the illustrated technique to objects of relatively simple shape.
 FIG. 25 is a diagram for explaining another example of a technique for associating the identification information of the referenced reference images with the object model. The figure is drawn in the same way as FIG. 24. In this mode, the distribution of the identification information of the referenced reference images is generated as a texture image. For example, for the face 430a of the object 424, a texture image 432 is generated that represents, as pixel values, the identification information of the referenced reference images for each position on the face. If the reference destinations do not change within the face, the pixel values of the texture image 432 are uniform. If the referenced reference images change within the face due to occlusion or the like, the pixel values of the texture image 432 change accordingly. This makes it possible to control the reference destinations at a granularity finer than the face unit.
 In this case, the reference unit 344 of the display image generation device 200 identifies the (u, v) coordinates on the texture image corresponding to the point on the object to be drawn, and reads out the identification information of the reference images represented at that position. This processing is basically the same as ordinary texture mapping in computer graphics. With this configuration, switching of the reference destinations within a single face due to occlusion or the like can be realized with a light load, without dividing the meshes defined by the object model.
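 A minimal sketch of such a lookup, assuming the identification information is encoded in the texture as a bitmask (bit 0 for "A", bit 1 for "B", and so on); the encoding is an assumption made for the example, and nearest-neighbour sampling is used because identifiers must not be interpolated.

```python
def reference_ids_at(texture, u, v):
    # `texture` is an H x W integer array; (u, v) in [0, 1] comes from ordinary
    # texture mapping of the drawn point onto the face (cf. FIG. 25).
    h, w = texture.shape[:2]
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    value = int(texture[y, x])
    return [chr(ord('A') + bit) for bit in range(8) if value & (1 << bit)]
```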
 FIG. 26 is a diagram for explaining yet another example of a technique for associating the identification information of the referenced reference images with the object model. The figure is drawn in the same way as FIG. 24. In this mode, the object is divided into voxels of a predetermined size, and the identification information of the reference images to be referred to is associated with each voxel. For example, when the face 430a of the object 424 appears in the reference images with identification information "A" and "C", the identification information "A" and "C" is associated with the voxels containing that face (for example, voxels 432a and 432b). The same applies to the voxels containing the other faces. When two faces are contained in one voxel, the reference destination information may be associated with each face.
 If the reference destinations do not change within a face, the information associated with the voxels containing it is identical. Even when the referenced reference images change within a face due to occlusion or the like, holding the reference destination information in units of voxels makes it possible to obtain the appropriate reference destinations at a fine granularity. In this case, the reference unit 344 of the display image generation device 200 identifies the voxel containing the point on the object to be drawn and acquires the identification information of the reference images associated with it. With this configuration, images can be drawn with high accuracy using a unified data structure and processing, regardless of the shape of the object or the complexity of the space.
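 For illustration, a voxel association of this kind could be held as a simple map from integer voxel indices to reference image identifiers; the dictionary representation and the axis-aligned grid in the sketch below are assumptions made for the example.

```python
def voxel_lookup(reference_map, point, voxel_size):
    # `reference_map` maps (ix, iy, iz) -> identification info of the reference
    # images to refer to; the voxel containing a surface point is found by
    # integer division of its coordinates (cf. FIG. 26).
    ix = int(point[0] // voxel_size)
    iy = int(point[1] // voxel_size)
    iz = int(point[2] // voxel_size)
    return reference_map.get((ix, iy, iz), [])
```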
 In the illustrated example, voxels of the same size are viewed from above and shown as a set of squares. However, the unit of three-dimensional space with which the identification information of the reference images to be referred to is associated is not limited to voxels of the same size. For example, space division by an octree, widely known as one technique for efficiently searching for information associated with positions in three-dimensional space, may be introduced. In this technique, the target space is taken as the root box and divided in two along each of the three axes into eight boxes, each of which is further divided into eight boxes, and so on; by repeating this process as necessary, the space is represented by an octree structure.
 By varying the number of divisions with position, the size of the boxes finally formed can be controlled according to the local granularity of the space with which the information is associated. Moreover, the relationship between the index numbers given to these boxes and their positions in space can easily be determined by simple bit operations. In this case, the reference unit 344 of the display image generation device 200 can quickly identify the identification information of the associated reference images by obtaining, through a bit operation, the index number of the box containing the point on the object to be drawn.
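 One commonly used bit operation of this kind is Morton (bit-interleaved) indexing; the sketch below, offered only as an example of how such an index might be computed, interleaves the bits of the integer cell coordinates at a given subdivision depth.

```python
def morton_index(ix, iy, iz, depth):
    # Interleave the bits of (ix, iy, iz) to obtain the octree box index; a table
    # keyed by such indices can then return the associated reference image IDs.
    index = 0
    for level in range(depth):
        index |= ((ix >> level) & 1) << (3 * level)
        index |= ((iy >> level) & 1) << (3 * level + 1)
        index |= ((iz >> level) & 1) << (3 * level + 2)
    return index
```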
 According to the present embodiment described above, in a technique for viewing a moving image from an arbitrary viewpoint, videos showing the movement of the object as seen from a plurality of reference viewpoints are prepared as reference images, together with the data defining the movement of the object in the virtual space. At display time, the object is projected onto a view screen based on the user's viewpoint at predetermined time steps, and the pixel values of the display image are determined by acquiring, from the reference images at each time, the values of the pixels representing the same object. In calculating the pixel values, rules based on the positional relationship between the actual viewpoint and the reference viewpoints and on the attributes of the object are introduced.
 Since the reference images can be generated over time, at a timing separate from the viewpoint-dependent display, they can be prepared with high quality. At display time, values are drawn from these high-quality images, so a high-quality image can be presented without taking time. If the reference viewpoints are moved so as to follow the movement of the object, the level of detail of the object in the reference images can be kept constant, and the image of the object can also be represented stably and with high quality in the display image.
 In addition, in the videos of the reference images and of the depth images used to select the referenced reference images at display time, only the changing regions are extracted and treated as time-series data, which keeps the required data size small even for video display. Furthermore, generating integrated video data in which corresponding frames of the reference images and depth images are contained in the same frame, and compression-encoding it in units of that video, reduces the load of decoding and synchronization processing at display time.
 Furthermore, instead of using the depth image data to determine the referenced reference images, the referenced reference images are specified in advance for positions on the object surface, and their identification information is associated with the object model. This further reduces the size of the data required for display. In addition, since the process of determining the referenced reference images by computation can be omitted at display time, the time from acquisition of the user's viewpoint to display can be shortened.
 The present invention has been described above on the basis of the embodiments. The embodiments are examples, and those skilled in the art will understand that various modifications can be made to the combinations of their constituent elements and processes, and that such modifications are also within the scope of the present invention.
 100 head mounted display, 200 display image generation device, 222 CPU, 224 GPU, 226 main memory, 236 output unit, 238 input unit, 254 object model storage unit, 256 reference image data storage unit, 260 viewpoint information acquisition unit, 262 space construction unit, 264 projection unit, 266 pixel value determination unit, 268 output unit, 300 reference image generation device, 310 reference viewpoint setting unit, 314 object model storage unit, 316 space construction unit, 318 reference image data generation unit, 330 reference image generation unit, 332 depth image generation unit, 334 data compression unit, 336 data decompression unit, 338 reference unit, 340 calculation unit, 342 reference destination information addition unit, 344 reference unit.
 As described above, the present invention can be used in various information processing devices such as head mounted displays, game devices, image display devices, mobile terminals, and personal computers, and in information processing systems including any of these.

Claims (13)

  1.  A reference image generation device that generates data of a reference image representing an image of a space containing an object to be displayed as viewed from a predetermined reference viewpoint, the data being used to generate a display image of the space viewed from an arbitrary viewpoint, the device comprising:
     a space construction unit that arranges the object in the space in accordance with information defining the object; and
     a reference image data generation unit that generates the reference image and a corresponding depth image with a field of view corresponding to a reference viewpoint arranged in the space, identifies, using the depth image and for each predetermined region on the surface of the object, the reference images in which that region appears as an image, and outputs the identification result and the data of the reference images.
  2.  The reference image generation device according to claim 1, wherein the reference image data generation unit associates the identification information of the identified reference images with the corresponding regions of an object model.
  3.  The reference image generation device according to claim 2, wherein the reference image data generation unit identifies the reference images for the vertices or meshes defining the object model and associates the identification information with them.
  4.  The reference image generation device according to claim 2, wherein the reference image data generation unit generates, as a texture image to be mapped onto each face of the object model, an image representing the distribution of the identification information of the reference images.
  5.  The reference image generation device according to claim 2, wherein the reference image data generation unit identifies the reference images in units of regions contained in voxels obtained by dividing the object model, and associates the identification information with the voxels.
  6.  The reference image generation device according to any one of claims 1 to 5, wherein the space construction unit changes the object in the space in accordance with information defining the change of the object, and
     the reference image data generation unit generates, at a predetermined rate, the reference images representing the change of the object, and outputs time-series data of the identification information by acquiring changes in the reference images in which each region appears as an image.
  7.  A display image generation device comprising:
     an object model storage unit that stores information defining objects in a space to be displayed;
     a reference image data storage unit that stores data of reference images representing images of the space containing the objects as viewed from predetermined reference viewpoints;
     a viewpoint information acquisition unit that acquires information relating to a user's viewpoint;
     a projection unit that represents, on a plane of a display image, the images of the objects as the space is viewed from the user's viewpoint;
     a pixel value determination unit that, for each pixel of the display image, identifies the reference images in which the point on the corresponding object is represented, by reading out additional information of the object model stored in the object model storage unit, and determines the color of that pixel using the colors of the images in the identified reference images; and
     an output unit that outputs the data of the display image.
  8.  The display image generation device according to claim 7, wherein the pixel value determination unit reads out, as the additional information, a texture image associated with the face containing the point and representing the distribution of the identification information of the reference images in which that face is represented, and identifies the reference images in which the point is represented by mapping the texture image onto that face.
  9.  The display image generation device according to claim 7 or 8, wherein the projection unit represents the images of objects changing in the space on the plane of the display image at a predetermined rate, and
     the pixel value determination unit identifies, at a predetermined rate, the reference images used to determine the colors of the pixels on the basis of the time-series data represented as the additional information.
  10.  A reference image generation method in which a reference image generation device, which generates data of a reference image representing an image when a space including an object to be displayed is viewed from a predetermined reference viewpoint, for use in generating a display image when the space is viewed from an arbitrary viewpoint, performs the steps of:
     placing the object in the space in accordance with information defining the object;
     generating the reference image and a depth image corresponding to it, with a field of view corresponding to a reference viewpoint placed in the space; and
     identifying, using the depth image, for each predetermined region on the surface of the object, the reference image in which the region appears as an image, and outputting the identification result and the data of the reference image.
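The visibility test implied by claim 10 (and by the corresponding program of claim 12) can be realised with the generated depth images: a surface region appears in a reference image only if its distance from the reference viewpoint matches the depth recorded at its projected position, that is, if nothing closer occludes it. A minimal sketch under those assumptions; the attributes id, project and depth on each reference view are hypothetical.

```python
def views_showing_point(point, reference_views, eps=1e-3):
    """Return the identifiers of the reference images in which a surface point
    appears, judged against each view's depth image."""
    visible_in = []
    for view in reference_views:
        projected = view.project(point)            # -> (px, py, depth_from_viewpoint) or None
        if projected is None:
            continue                               # outside this reference view's field of view
        px, py, point_depth = projected
        recorded_depth = view.depth[int(py), int(px)]
        if abs(point_depth - recorded_depth) < eps:
            visible_in.append(view.id)             # the point is unoccluded in this view
    return visible_in
```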
  11.  A display image generation method performed by a display image generation device, comprising the steps of:
     reading, from a memory, information defining an object in a space to be displayed;
     reading, from a memory, data of a reference image representing an image when the space including the object is viewed from a predetermined reference viewpoint;
     acquiring information relating to a user's viewpoint;
     representing an image of the object on a plane of a display image when the space is viewed from the user's viewpoint;
     identifying, for each pixel of the display image, the reference image in which the point on the object corresponding to the pixel is represented, on the basis of additional information of the object model included in the information defining the object, and determining the color of the pixel using the color of the image in the identified reference image; and
     outputting data of the display image.
  12.  A computer program for causing a computer, which generates data of a reference image representing an image when a space including an object to be displayed is viewed from a predetermined reference viewpoint, for use in generating a display image when the space is viewed from an arbitrary viewpoint, to realize:
     a function of placing the object in the space in accordance with information defining the object;
     a function of generating the reference image and a depth image corresponding to it, with a field of view corresponding to the reference viewpoint placed in the space; and
     a function of identifying, using the depth image, for each predetermined region on the surface of the object, the reference image in which the region appears as an image, and outputting the identification result and the data of the reference image.
  13.  A computer program for causing a computer to realize:
     a function of reading, from a memory, information defining an object in a space to be displayed;
     a function of reading, from a memory, data of a reference image representing an image when the space including the object is viewed from a predetermined reference viewpoint;
     a function of acquiring information relating to a user's viewpoint;
     a function of representing an image of the object on a plane of a display image when the space is viewed from the user's viewpoint;
     a function of identifying, for each pixel of the display image, the reference image in which the point on the object corresponding to the pixel is represented, on the basis of additional information of the object model included in the information defining the object, and determining the color of the pixel using the color of the image in the identified reference image; and
     a function of outputting data of the display image.
PCT/JP2018/014478 2018-04-04 2018-04-04 Reference image generation device, display image generation device, reference image generation method, and display image generation method WO2019193699A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/014478 WO2019193699A1 (en) 2018-04-04 2018-04-04 Reference image generation device, display image generation device, reference image generation method, and display image generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/014478 WO2019193699A1 (en) 2018-04-04 2018-04-04 Reference image generation device, display image generation device, reference image generation method, and display image generation method

Publications (1)

Publication Number Publication Date
WO2019193699A1 true WO2019193699A1 (en) 2019-10-10

Family

ID=68100222

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/014478 WO2019193699A1 (en) 2018-04-04 2018-04-04 Reference image generation device, display image generation device, reference image generation method, and display image generation method

Country Status (1)

Country Link
WO (1) WO2019193699A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003051026A (en) * 2001-06-29 2003-02-21 Samsung Electronics Co Ltd Three-dimensional object and image-based method for presenting and rendering animated three-dimensional object
JP2006072805A (en) * 2004-09-03 2006-03-16 Nippon Hoso Kyokai <Nhk> Three-dimensional model display device and program

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220141480A1 (en) * 2019-04-01 2022-05-05 Nippon Telegraph And Telephone Corporation Image generation apparatus, image generation method and program
US11800129B2 (en) * 2019-04-01 2023-10-24 Nippon Telegraph And Telephone Corporation Image generation apparatus, image generation method and program

Similar Documents

Publication Publication Date Title
KR102214263B1 (en) Image generating apparatus, image generating method, computer program, and recording medium
JP7212519B2 (en) Image generating device and method for generating images
JP6980031B2 (en) Image generator and image generation method
JP6934957B2 (en) Image generator, reference image data generator, image generation method, and reference image data generation method
US11893705B2 (en) Reference image generation apparatus, display image generation apparatus, reference image generation method, and display image generation method
US11315309B2 (en) Determining pixel values using reference images
US20200342656A1 (en) Efficient rendering of high-density meshes
WO2017086244A1 (en) Image processing device, information processing device, and image processing method
WO2018052100A1 (en) Image processing device, image processing method, and image processing program
US9225968B2 (en) Image producing apparatus, system and method for producing planar and stereoscopic images
WO2019193699A1 (en) Reference image generation device, display image generation device, reference image generation method, and display image generation method
WO2019193698A1 (en) Reference image generation device, display image generation device, reference image generation method, and display image generation method
US20100177098A1 (en) Image generation system, image generation method, and computer program product
JP2021028853A (en) Data structure of electronic content
WO2022244131A1 (en) Image data generation device, display device, image display system, image data generation method, image display method, and data structure of image data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18913698

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18913698

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP