WO2017191703A1 - Image processing device - Google Patents

Image processing device

Info

Publication number
WO2017191703A1
WO2017191703A1, PCT/JP2017/005742, JP2017005742W
Authority
WO
WIPO (PCT)
Prior art keywords
image
user
image processing
real space
processing apparatus
Prior art date
Application number
PCT/JP2017/005742
Other languages
French (fr)
Japanese (ja)
Inventor
良徳 大橋 (Yoshinori Ohashi)
Original Assignee
株式会社ソニー・インタラクティブエンタテインメント (Sony Interactive Entertainment Inc.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社ソニー・インタラクティブエンタテインメント (Sony Interactive Entertainment Inc.)
Priority to JP2018515396A (published as JPWO2017191703A1)
Publication of WO2017191703A1

Classifications

    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36 Control arrangements or circuits characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • G09G5/38 Control arrangements or circuits characterised by the display of a graphic pattern, with means for controlling the display position

Definitions

  • the present invention relates to an image processing apparatus that is connected to a display device that a user wears on the head.
  • Such a display device causes the user to view the image by forming an image in front of the user's eyes.
  • Such display devices include a non-transmissive type, in which a display unit covers the user's eyes and prevents the user from seeing the real space in front of them, and a transmissive (optical see-through) type, in which the display unit is built from a half mirror or the like so that the user can still see the real space in front of them.
  • Even with a non-transmissive display device, the real space in front of the user's eyes can be photographed by a camera and displayed on the display unit, so that the user can see the real space in front of them much as with a transmissive device; devices that realize such a pseudo-transmissive display are said to use a camera see-through method.
  • The present invention has been made in view of the above circumstances, and one of its objects is to provide an image processing apparatus capable of displaying an image while giving the user a more accurate sense of the distance to his or her hands.
  • The present invention, which solves the problems of the conventional example, is an image processing apparatus connected to a display device that a user wears on the head when in use, comprising: image acquisition means for acquiring an image of the real space around the user; determination means for determining visual-field information; and image generation means for generating, based on the acquired real-space image, an image of the visual field specified by the determined information, wherein the image generated by the image generation means is output to the display device.
  • FIG. 1 is a configuration block diagram illustrating an example of an image processing system including an image processing apparatus according to an embodiment of the present invention. FIG. 2 is a functional block diagram illustrating an example of the image processing apparatus according to the embodiment. FIG. 3 is an explanatory diagram illustrating an example of the head tilt information used by the image processing apparatus according to the embodiment. FIG. 4 is an explanatory diagram outlining the object buffer generated by the image processing apparatus according to the embodiment. FIG. 5 is an explanatory diagram illustrating a projection image of the object buffer generated by the image processing apparatus according to the embodiment. FIG. 6 is a flowchart illustrating an operation example of the image processing apparatus according to the embodiment. FIG. 7 is an explanatory diagram illustrating an operation example of the image processing apparatus according to the embodiment.
  • As illustrated in FIG. 1, an image processing system 1 including an image processing apparatus 10 according to an embodiment of the present invention is configured to include the image processing apparatus 10, an operation device 20, a relay device 30, and a display device 40.
  • the image processing apparatus 10 is an apparatus that supplies an image to be displayed by the display device 40.
  • the image processing apparatus 10 is a consumer game machine, a portable game machine, a personal computer, a smartphone, a tablet, or the like.
  • the image processing apparatus 10 includes a control unit 11, a storage unit 12, and an interface unit 13.
  • The control unit 11 is a program control device such as a CPU, and executes the program stored in the storage unit 12. In the present embodiment, the control unit 11 acquires an image of the real space around the user wearing the display device 40, and generates an image of a designated visual field based on the acquired image of the real space.
  • Specifically, in one example of the present embodiment, the control unit 11 constructs a virtual three-dimensional space (hereinafter referred to as the virtual space) corresponding to a real space of a predetermined size around the user, centered on the user's position and including the area behind the user (hereinafter referred to as the target space): for example, a rectangular cuboid 10 m wide (in the direction orthogonal to the user's line of sight at the time the display device 40 is put on, i.e. the initial line-of-sight direction, and parallel to the floor), 10 m deep (along the initial line-of-sight direction parallel to the floor), and 3 m high.
  • That is, the control unit 11 places virtual three-dimensional objects in this virtual space, or applies video effects, while referring to the image of the real space. The control unit 11 also obtains information on the visual field used for rendering an image of this virtual space (either separate information for a visual field corresponding to the user's left eye and one corresponding to the right eye, or a single common visual field), generates an image of the virtual space as seen from the visual field specified by that information (a stereoscopic image when separate left-eye and right-eye visual fields are used), and outputs the generated image to the display device 40. The detailed operation of the control unit 11 will be described later.
  • the storage unit 12 includes at least one memory device such as a RAM and stores a program executed by the control unit 11.
  • the storage unit 12 also operates as a work memory for the control unit 11 and stores data used by the control unit 11 in the course of program execution.
  • the program may be provided by being stored in a computer-readable non-transitory recording medium and stored in the storage unit 12.
  • the interface unit 13 is an interface for the control unit 11 of the image processing apparatus 10 to perform data communication with the operation device 20 and the relay device 30.
  • the image processing apparatus 10 is connected to the operation device 20, the relay apparatus 30, or the like via the interface unit 13 by either wired or wireless.
  • As an example, the interface unit 13 may include a multimedia interface such as HDMI (High-Definition Multimedia Interface) for transmitting the image (stereoscopic image) and audio supplied by the image processing apparatus 10 to the relay device 30. It may also include a data communication interface such as USB for receiving various information from the display device 40 via the relay device 30 and for transmitting control signals and the like. The interface unit 13 may further include a data communication interface such as USB for receiving signals indicating the content of the user's operation input on the operation device 20.
  • the operation device 20 is a controller or the like of a consumer game machine, and is used by the user to perform various instruction operations on the image processing apparatus 10.
  • the content of the user's operation input to the operation device 20 is transmitted to the image processing apparatus 10 by either wired or wireless.
  • the operation device 20 is not necessarily separate from the image processing apparatus 10, and may include operation buttons, a touch panel, and the like arranged on the surface of the image processing apparatus 10.
  • The relay device 30 is connected to the display device 40 either by wire or wirelessly; it receives the data of the image (stereoscopic image) supplied from the image processing apparatus 10 and outputs a video signal corresponding to the received data to the display device 40. At this time, the relay device 30 may, as necessary, execute processing such as correcting the distortion caused by the optical system of the display device 40 on the video represented by the supplied image, and output a video signal representing the corrected video. If the image supplied from the image processing apparatus 10 is a stereoscopic image, the video signal supplied from the relay device 30 to the display device 40 includes two video signals, a left-eye video signal and a right-eye video signal, generated based on the stereoscopic image. Besides the stereoscopic image and the video signal, the relay device 30 also relays various information transmitted and received between the image processing apparatus 10 and the display device 40, such as audio data and control signals.
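  • The embodiment does not specify how the relay device 30 corrects the distortion of the display optics; a common approach for head-mounted displays is to pre-distort each eye image with a simple radial model so that the lens cancels it. The sketch below is only an illustration of that idea, with made-up coefficients; it is not the correction the patent performs.

```python
import cv2
import numpy as np

def predistort(eye_image, k1=0.22, k2=0.24):
    """Pre-distort one eye image with a radial model r' = r * (1 + k1*r^2 + k2*r^4).

    k1 and k2 are illustrative lens coefficients, not values from the patent.
    """
    h, w = eye_image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    # Normalised coordinates with the optical centre at the image centre.
    nx, ny = (xs - w / 2) / (w / 2), (ys - h / 2) / (h / 2)
    r2 = nx * nx + ny * ny
    scale = 1 + k1 * r2 + k2 * r2 * r2
    # Sample the source image at the radially scaled positions.
    map_x = (nx * scale * (w / 2) + w / 2).astype(np.float32)
    map_y = (ny * scale * (h / 2) + h / 2).astype(np.float32)
    return cv2.remap(eye_image, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```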
  • The display device 40 is a display device that the user wears on the head when in use, and it displays video corresponding to the video signal input from the relay device 30 so that the user can view it. In the present embodiment, it is assumed that the display device 40 displays a video corresponding to each eye in front of each of the user's right eye and left eye. As shown in FIG. 1, the display device 40 includes a video display element 41, an optical element 42, a camera 43, a sensor unit 44, and a communication interface 45.
  • the video display element 41 is an organic EL display panel, a liquid crystal display panel, or the like, and displays a video corresponding to a video signal supplied from the relay device 30.
  • The video display element 41 may be a single display element that displays the left-eye video and the right-eye video side by side, or it may comprise a pair of display elements that display the left-eye video and the right-eye video independently.
  • The display screen of a smartphone or the like may also be used directly as the video display element 41; in that case, the smartphone or the like displays video corresponding to the video signal supplied from the relay device 30.
  • the display device 40 may be a retinal irradiation type (retinal projection type) device that directly projects an image on a user's retina.
  • the image display element 41 may be configured by a laser that emits light and a MEMS (Micro Electro Mechanical Systems) mirror that scans the light.
  • the optical element 42 is a hologram, a prism, a half mirror, or the like, and is disposed in front of the user's eyes.
  • the optical element 42 transmits or refracts the image light displayed by the image display element 41 to enter the user's eyes.
  • The optical element 42 may include a left-eye optical element 42L and a right-eye optical element 42R; in that case, the left-eye video displayed by the video display element 41 may be made to enter the user's left eye via the left-eye optical element 42L, and the right-eye video to enter the user's right eye via the right-eye optical element 42R.
  • the user can view the left-eye video with the left eye and the right-eye video with the right eye while the display device 40 is mounted on the head.
  • the display device 40 is assumed to be a non-transmissive display device in which the user cannot visually recognize the appearance of the outside world.
  • The camera 43 has a pair of image sensors 430L and 430R arranged slightly toward the front (the user's front side) of the display device 40, one to the left and one to the right of its center (hereinafter collectively referred to as the image sensors 430 when there is no need to distinguish left from right).
  • the camera 43 may include at least one image sensor 430B disposed on the back side of the user.
  • The camera 43 captures, at least with the image sensors 430L and 430R, an image of the real space in front of the user, and outputs the image data obtained by the capture to the image processing apparatus 10 via the relay device 30.
  • The sensor unit 44 includes a head direction sensor 441 that detects the direction of the user's head (the front direction of the user's face) and its position. The head direction sensor 441 is a gyro sensor or the like, and detects and outputs, relative to the initial direction when the display device 40 is put on, the rotation angle of the head within a plane parallel to the floor, the rotation angle in the elevation direction, and the rotation angle about the line-of-sight axis.
  • The head direction sensor 441 also uses a predetermined position of the display device 40 (for example, the midpoint of the line segment connecting the image sensors 430L and 430R of the camera 43) as a reference position, and detects and outputs the amount of movement (x, y, z) of that reference position since the device was put on, along the user's left-right direction (the axis where the transverse and coronal planes intersect, hereinafter the X axis), front-rear direction (the axis where the sagittal and transverse planes intersect, hereinafter the Y axis), and vertical direction (the Z axis). The relative coordinates of each image sensor 430 with respect to this reference position, taken as the origin, are assumed to be known.
  • the communication interface 45 is an interface for communicating data such as video signals and image data with the relay device 30.
  • the communication interface 45 includes a communication antenna and a communication module.
  • By executing the program stored in the storage unit 12, the control unit 11 functionally realizes an image acquisition unit 21, a visual field determination processing unit 23, an image generation unit 24, and an output unit 25.
  • the image acquisition unit 21 acquires an image of the real space around the user wearing the display device 40. Specifically, the image acquisition unit 21 receives image data captured by the camera 43 from the display device 40 via the relay device 30.
  • The image data captured by the camera 43 is a pair of images captured by the pair of image sensors 430 arranged on the left and right, and from the parallax between the two images the distance to each object captured in the real space can be determined. Based on this image data, the image acquisition unit 21 generates and outputs a depth map of the same size as the image data (hereinafter called captured image data, to distinguish it from the depth map). The depth map is image data in which the value of each pixel is information indicating the distance to the object imaged at the corresponding pixel of the captured image data.
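  • The embodiment does not describe how the depth map is computed from the pair of captured images; as one illustrative possibility, a rectified stereo pair can be block-matched and the resulting disparity converted to metric depth. The sketch below uses OpenCV's StereoBM; the focal length and baseline are assumed example values, not figures from the patent.

```python
import cv2
import numpy as np

def depth_map_from_stereo(left_bgr, right_bgr, focal_px=700.0, baseline_m=0.064):
    """Estimate a depth map (metres) from a rectified stereo pair.

    focal_px and baseline_m are illustrative values for the HMD camera,
    not figures taken from the patent.
    """
    left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)

    # Block matching gives disparity in 1/16 pixel units (int16).
    matcher = cv2.StereoBM_create(numDisparities=96, blockSize=15)
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0

    depth = np.full(disparity.shape, np.inf, dtype=np.float32)
    valid = disparity > 0
    # depth = f * B / d for a rectified pair.
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth  # same resolution as the captured image data
```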
  • the visual field determination processing unit 23 determines visual field information for rendering the virtual space.
  • As an example, the visual field determination processing unit 23 obtains, as visual field information, the position coordinates RC of a camera used for rendering in the virtual space (hereinafter referred to as the rendering camera; the image is rendered as seen from this position), which are determined in advance (for example, hard-coded in a program or described in a setting file) independently of the position of the image sensors 430 of the camera 43, together with information representing the direction of the visual field (for example, a vector starting at the position coordinates RC and passing through the center of the visual field).
  • Alternatively, the visual field determination processing unit 23 may obtain the position coordinates RC of the rendering camera in the virtual space as relative coordinates from a reference position in the real space that changes over time with the user's movement. For example, the reference position may be the position of an image sensor 430, and the position in the virtual space corresponding to the position displaced from that image sensor by a predetermined relative coordinate value may be used as the position coordinates RC of the rendering camera. The relative coordinates may be, for example, those from the position of the image sensor 430R (or 430L) to the position where the right eye (or left eye) of the user wearing the display device 40 should be; in that case, the position in the virtual space corresponding to the position of the user's eye becomes the position coordinates RC of the rendering camera.
  • The visual field determination processing unit 23 may obtain the information representing the direction of the visual field (for example, a vector starting at the position coordinates RC and passing through the center of the visual field) from the information output by the head direction sensor 441. In this case, as illustrated in FIG. 3, the visual field determination processing unit 23 acquires from the head direction sensor 441 the rotation angle of the head within a plane parallel to the floor (the yaw angle), measured from the initial direction when the display device 40 was put on, the rotation angle in the elevation direction, the rotation angle about the line-of-sight axis (the roll angle), and the movement amount (x, y, z) of the head. The visual field determination processing unit 23 then determines the direction of the user's visual field from the yaw and elevation angles, and determines the tilt of the user's neck about the visual-field direction from the roll angle.
  • In one example of the present embodiment, the visual field determination processing unit 23 uses, as the position coordinates RC of the rendering camera, the coordinates in the virtual space corresponding to the positions of the user's left and right eyes in the real space. That is, it obtains the coordinates of the left image sensor 430L and the right image sensor 430R in the target space (from the information on the amount of movement of the user's head and the known relative positions from the reference position to each image sensor 430), then obtains, from the predetermined relative coordinates of each eye with respect to the corresponding image sensor 430, the coordinates of the user's left and right eyes in the XYZ coordinate system of the target space, and outputs the coordinate information in the virtual space corresponding to those eye positions to the image generation unit 24 as the position coordinates RC of the rendering camera.
  • The visual field determination processing unit 23 also outputs to the image generation unit 24 the information on the direction of the user's visual field determined from the yaw and elevation angles, and the information on the tilt of the user's neck about the visual-field direction determined from the roll angle.
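  • A minimal sketch of how the position coordinates RC and the visual-field direction might be computed from the head direction sensor's output (yaw, elevation and roll angles plus the head movement (x, y, z)), under an assumed axis convention. The sensor and eye offsets are hypothetical example values; the patent only states that such relative coordinates are predetermined.

```python
import numpy as np

def head_rotation(yaw, pitch, roll):
    """Yaw about the vertical Z axis, pitch (elevation) about the X axis and
    roll about the line-of-sight Y axis (one possible convention)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Rz @ Rx @ Ry

def rendering_camera(reference_pos, yaw, pitch, roll, sensor_offset, eye_offset):
    """Return (RC, view direction) for one eye.

    reference_pos: reference position of the display device in target-space
                   coordinates (its initial position plus the movement (x, y, z)).
    sensor_offset: image sensor 430L/430R relative to the reference position.
    eye_offset:    eye position relative to that image sensor (predetermined).
    The offsets used in the example call are hypothetical, not patent figures.
    """
    rot = head_rotation(yaw, pitch, roll)
    rc = np.asarray(reference_pos, dtype=float) + rot @ (
        np.asarray(sensor_offset, dtype=float) + np.asarray(eye_offset, dtype=float))
    view_dir = rot @ np.array([0.0, 1.0, 0.0])   # the initial gaze is along +Y
    return rc, view_dir

# Example for the left eye with made-up offsets (metres).
rc_left, gaze = rendering_camera((0.0, 0.0, 1.6), yaw=0.2, pitch=-0.1, roll=0.0,
                                 sensor_offset=(-0.03, 0.02, 0.0),
                                 eye_offset=(0.0, -0.02, 0.0))
```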
  • The image generation unit 24 receives the information on the position coordinates RC of the rendering camera and the direction of the visual field from the visual field determination processing unit 23, and generates image data of the virtual space as seen from the visual field specified by that information.
  • the image generation unit 24 first generates environment mesh list information and an object buffer based on the depth map information output from the image acquisition unit 21.
  • The environment mesh list information is obtained by dividing the depth map into meshes, and includes, for each mesh, the vertex coordinates of the mesh (information indicating pixel positions), mesh identification information, information on the normal of the object imaged at the pixels of the captured image data corresponding to the pixels in the mesh, mesh type information (information indicating which of several predetermined types the mesh belongs to), and information on the surface shape of the mesh. The mesh type information indicates whether the object imaged at the corresponding pixels of the captured image data is a ceiling, a wall, an obstacle (an object other than a wall within a predetermined height from the floor), or something else. The information on the surface shape of the mesh indicates whether the surface is a plane, an uneven surface, a spherical surface, or a complex surface.
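  • For concreteness, the environment mesh list can be pictured as a list of per-mesh records such as the following; the type names and field names are illustrative, not identifiers from the patent.

```python
from dataclasses import dataclass
from enum import Enum, auto

class MeshType(Enum):
    CEILING = auto()
    WALL = auto()
    OBSTACLE = auto()   # a non-wall object within a predetermined height of the floor
    OTHER = auto()

class SurfaceShape(Enum):
    PLANE = auto()
    UNEVEN = auto()
    SPHERICAL = auto()
    COMPLEX = auto()

@dataclass
class EnvironmentMesh:
    mesh_id: int
    vertices: list            # pixel positions of the mesh vertices on the depth map
    normal: tuple             # normal of the object imaged at the corresponding pixels
    mesh_type: MeshType
    surface_shape: SurfaceShape

environment_mesh_list = []    # the environment mesh list is a list of such records
```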
  • The object buffer represents, as a virtual voxel space, a real space of a predetermined size that includes the user's position and extends behind the user's line of sight (hereinafter referred to as the target space): for example, a rectangular cuboid 10 m wide (in the direction orthogonal to the initial line-of-sight direction and parallel to the floor), 10 m deep (along the initial line-of-sight direction parallel to the floor), and 3 m high. Each voxel is a virtual volume element, for example a cube 10 cm wide, 10 cm deep, and 10 cm high. The value of a voxel in which an object exists (the voxel value) is set to “1”, the value of a voxel in which nothing exists is set to “0”, and the value of a voxel for which it is unknown whether an object exists is set to “-1” (FIG. 4). FIG. 4 shows only some of the voxels for convenience of illustration, and their size is also chosen for explanation; the voxel size relative to the target space does not necessarily represent a size suitable for implementation.
  • In the example of FIG. 4, a cubic object is placed in the far corner of the target space: the voxels corresponding to its surface are set to “1”, indicating that an object exists; the voxels hidden behind that surface are set to “-1”, indicating that their state is unknown; and the voxels between the camera and the object surface are set to “0”, indicating that nothing is there.
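  • A sketch of the object buffer as a dense voxel grid covering the 10 m × 10 m × 3 m target space with 10 cm voxels, initialised to the “unknown” value -1. The helper that maps a target-space point to a voxel index assumes a particular placement of the origin, which the patent does not specify.

```python
import numpy as np

VOXEL = 0.10                      # voxel edge length in metres
EXTENT = (10.0, 10.0, 3.0)        # width, depth and height of the target space
GRID = tuple(int(round(e / VOXEL)) for e in EXTENT)   # (100, 100, 30)

UNKNOWN, EMPTY, OCCUPIED = -1, 0, 1

# Every voxel starts as "unknown whether an object exists here".
object_buffer = np.full(GRID, UNKNOWN, dtype=np.int8)

def voxel_index(point_xyz, origin=(-5.0, -5.0, 0.0)):
    """Map a target-space point (metres) to a voxel index, assuming the user
    starts at the centre of the horizontal extent of the target space."""
    idx = tuple(int((p - o) // VOXEL) for p, o in zip(point_xyz, origin))
    if all(0 <= i < n for i, n in zip(idx, GRID)):
        return idx
    return None
```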
  • the image generation unit 24 sets this voxel value based on the depth map information.
  • Each pixel on the depth map corresponds to a small angular region obtained by dividing the angle of view of the depth map by its resolution (py pixels vertically by px pixels horizontally), with its apex at the position of the camera 43 at the time the image data underlying the depth map was captured (this may be the reference position; hereinafter called the shooting position). Accordingly, from the coordinates of the shooting position, the angle of view of the depth map, and its resolution, a vector giving the direction of each pixel can be calculated, either as a vector parallel to a line passing through a vertex of the pixel starting from the shooting position (a difference of coordinates in the world coordinate system) or as a vector parallel to a line passing through the center of the pixel starting from the shooting position.
  • For each pixel on the depth map, the image generation unit 24 sets to “1” the voxel that lies in the direction of that pixel from the coordinates in the object buffer corresponding to the shooting position (which may be the coordinates of the reference position), at the distance to the object represented by the depth map, and sets to “0” every other voxel on the line from that voxel back to the camera 43.
  • For portions that are hidden by objects in the real space and therefore not captured in the image data picked up by the camera 43 (for example, the space behind a desk, a wall, or the floor), the image generation unit 24 sets the corresponding voxel values to “-1”, because it is unknown whether an object exists there.
  • When the user moves or changes the direction of the line of sight, so that the camera 43 captures image data of a portion that has not been captured before and for which it was unknown whether an object exists (a portion corresponding to voxels whose value was “-1”), the image generation unit 24 obtains the depth map of that portion and updates the values of the corresponding voxels to “0” or “1” based on the obtained depth map.
  • Various methods other than the one described here, widely known as 3D scanning methods, can be employed to set voxel values in a three-dimensional space representing the extent of objects from information such as a depth map.
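  • One way to realise the voxel update described above is to walk, for each depth-map pixel, from the shooting position toward the observed surface, clearing the voxels passed through and marking the voxel at the measured distance as occupied. The step size and the precomputed per-pixel direction vectors are implementation assumptions.

```python
import numpy as np

def update_object_buffer(buffer, depth_map, shoot_pos, pixel_dirs,
                         voxel=0.10, origin=(-5.0, -5.0, 0.0)):
    """depth_map: (py, px) distances in metres; pixel_dirs: (py, px, 3) unit
    direction vectors of each pixel as seen from the shooting position."""
    def to_index(point):
        idx = tuple(int((c - o) // voxel) for c, o in zip(point, origin))
        if all(0 <= i < n for i, n in zip(idx, buffer.shape)):
            return idx
        return None

    py, px = depth_map.shape
    shoot_pos = np.asarray(shoot_pos, dtype=float)
    for v in range(py):
        for u in range(px):
            d = depth_map[v, u]
            if not np.isfinite(d):
                continue
            direction = pixel_dirs[v, u]
            # March in half-voxel steps and clear the free space up to the surface.
            for t in np.arange(0.0, d, voxel / 2.0):
                idx = to_index(shoot_pos + t * direction)
                if idx is not None:
                    buffer[idx] = 0          # nothing exists between camera and surface
            hit = to_index(shoot_pos + d * direction)
            if hit is not None:
                buffer[hit] = 1              # the observed object surface
    # Voxels never reached by any ray keep their value of -1 (unknown).
```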
  • The image generation unit 24 then generates a projection image (FIG. 5) by two-dimensionally projecting the voxels in the object buffer whose value is not “0”, as seen from the rendering camera position coordinates RC and in the visual-field direction specified by the information input from the visual field determination processing unit 23. In this way the image generation unit 24 detects the objects arranged in the real space based on the acquired real-space image, and generates and outputs a stereoscopic image in which a virtual object is placed at the position of each detected object.
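  • The projection image of FIG. 5 can be approximated by perspective-projecting the centre of every voxel whose value is not “0” onto the rendering camera's image plane, keeping the nearest voxel per pixel. The pinhole model, focal length and image size below are illustrative assumptions.

```python
import numpy as np

def project_voxels(buffer, rc, view_rot, voxel=0.10, origin=(-5.0, -5.0, 0.0),
                   focal_px=400.0, size=(480, 640)):
    """Return an image whose pixels carry the projected voxel values (1 or -1).

    view_rot: 3x3 matrix whose rows are the rendering camera's right, up and
    forward axes, derived from the visual-field direction.
    """
    h, w = size
    image = np.zeros(size, dtype=np.int8)
    nearest = np.full(size, np.inf)
    occupied = np.argwhere(buffer != 0)                   # voxels that are 1 or -1
    centres = (occupied + 0.5) * voxel + np.asarray(origin)
    cam = (centres - np.asarray(rc, dtype=float)) @ np.asarray(view_rot).T
    for (ix, iy, iz), (x, y, z) in zip(occupied, cam):
        if z <= 0:                                        # behind the rendering camera
            continue
        u = int(w / 2 + focal_px * x / z)
        v = int(h / 2 - focal_px * y / z)
        if 0 <= u < w and 0 <= v < h and z < nearest[v, u]:
            nearest[v, u] = z                             # keep the closest voxel per pixel
            image[v, u] = buffer[ix, iy, iz]
    return image
```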
  • the image generation unit 24 configures a virtual space according to a predetermined rule (referred to as an object conversion rule).
  • The object conversion rules are, for example, as follows, applied where the voxel value is “1” (a lookup sketch in code follows this list):
  • if the mesh type of the corresponding portion is “ceiling”, a background is composited (so that the scene appears to have no ceiling);
  • a virtual object “operation panel” is placed at the position of an object whose mesh type is “obstacle” and whose mesh surface shape is a plane;
  • a virtual object “rock” or “box” is placed at the position of an object whose mesh type is “obstacle” and whose mesh surface shape is an uneven surface;
  • a virtual object “light” is placed at the position of an object whose mesh type is “obstacle” and whose mesh surface shape is spherical;
  • virtual objects “trees and plants” are arranged over the range of an object whose mesh type is “obstacle” and whose mesh surface shape is a complex surface.
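  • The object conversion rules above reduce to a small lookup from mesh type and surface shape to the virtual object to be placed, as in the following sketch (the string labels are illustrative).

```python
def convert_object(mesh_type, surface_shape):
    """Return the virtual object to place for a region of voxels with value 1,
    following the object conversion rules of the embodiment."""
    if mesh_type == "ceiling":
        return "background"            # composite the background so no ceiling is shown
    if mesh_type == "obstacle":
        return {
            "plane": "operation panel",
            "uneven": "rock or box",
            "spherical": "light",
            "complex": "trees and plants",
        }.get(surface_shape)
    return None                        # other mesh types are left unchanged here
```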
  • The image generation unit 24 separately receives input of a background image and of three-dimensional model data for virtual objects such as the operation panel, rocks, boxes, trees, and plants. The image generation unit 24 places the virtual object “operation panel” at the position corresponding to the range of the projection image illustrated in FIG. 5 onto which voxels with value “1” are projected and whose mesh is classified, by the object conversion rules, as “operation panel”; likewise it places a virtual object “rock” or “box” at the position corresponding to the range onto which voxels with value “1” are projected and whose mesh is classified as “rock” or “box”; and it arranges each of the other virtual objects in the same manner. Since the process of arranging a virtual object represented by three-dimensional model data in a virtual space is widely known in three-dimensional graphics creation, detailed description is omitted here.
  • The image generation unit 24 renders, in the virtual space configured in this way, the images viewed from the rendering camera position coordinates RC input from the visual field determination processing unit 23 (here, the coordinates corresponding to the user's left eye and right eye) in the visual-field direction specified by the information input from the visual field determination processing unit 23, and outputs the pair of image data obtained by the rendering to the output unit 25 as a stereoscopic image.
  • the output unit 25 outputs the image data of the stereoscopic image generated by the image generation unit 24 to the display device 40 via the relay device 30.
  • At the time of rendering, the image generation unit 24 may also use the information on the roll angle related to the tilt of the user's neck to generate image data tilted by that angle.
  • When so instructed, the image generation unit 24 outputs, instead of the stereoscopic image of the virtual space, the real-space images captured by the image sensors 430L and 430R of the camera 43 as they are (a so-called camera see-through function). In this case, the image data captured by the left image sensor 430L and the right image sensor 430R of the camera 43 are output as they are, and the output unit 25 displays the images based on these image data directly as the left-eye image and the right-eye image.
  • the image processing apparatus 10 basically includes the above configuration and operates as follows.
  • The image processing apparatus 10 starts the processing illustrated in FIG. 6 and takes the reference position of the display device 40 (for example, the position of the center of gravity of the image sensors 430 of the camera 43) as the origin of the target space. The target space is represented by an object buffer in which the space is expressed as virtual voxels (virtual volume elements, for example cubes 10 cm wide, 10 cm deep, and 10 cm high); initially all voxel values are set to “-1”, and this object buffer is stored in the storage unit 12 (S2).
  • the display device 40 repeatedly captures an image with the camera 43 at predetermined timings (for example, every 1/1000 seconds), and sends image data obtained by the imaging to the image processing device 10.
  • the image processing apparatus 10 receives image data captured by the camera 43 from the display apparatus 40 via the relay apparatus 30 (S3).
  • the image processing apparatus 10 acquires information on the direction of the user's head (face direction) and the amount of movement (for example, expressed by the coordinate values in the XYZ space). Specifically, the information on the head direction and the movement amount of the user may be information detected by the head direction sensor 441 of the display device 40 and output to the image processing device 10.
  • The image processing apparatus 10 determines the position coordinates RC of the rendering camera and the visual-field direction (S4). Specifically, as illustrated in FIG. 7, the image processing apparatus 10 refers to the acquired information on the amount of movement of the head and obtains the coordinates, in the target space, of the left image sensor 430L and the right image sensor 430R of the camera 43. Then, using these coordinates and the predetermined relative coordinates between each image sensor 430L, 430R and the corresponding eye of the user (the left eye for the image sensor 430L, the right eye for the image sensor 430R), it obtains the coordinates of the user's left and right eyes in the XYZ coordinate system of the target space, and sets the coordinates in the virtual space corresponding to these coordinates as the position coordinates RC of the rendering camera. The XYZ coordinate system of the target space and the coordinate system of the virtual space may be made to coincide, in which case the coordinate values in the XYZ coordinate system of the target space may be used as the position coordinates RC of the rendering camera as they are.
  • the image processing apparatus 10 determines the viewing direction based on the acquired information on the direction of the head.
  • the image processing apparatus 10 receives from the operation device 20 an instruction on whether to display an image in the real space (operation as a so-called camera see-through) or an image in the virtual space from the user (S5). Then, the image processing apparatus 10 starts processing for displaying the instructed image.
  • If the user has instructed display of the real-space image (S5: “real space”), the image processing apparatus 10 displays an image using the real-space image data captured by the camera 43 and received from the display device 40 in process S3.
  • Specifically, the image processing apparatus 10 uses the image data input from the image sensor 430L of the camera 43 as it is as the image data for the left eye, and the image data input from the image sensor 430R as it is as the image data for the right eye, thereby generating the left-eye and right-eye image data (S6).
  • the image processing apparatus 10 outputs the image data for the left eye and the image data for the right eye generated here to the display apparatus 40 via the relay apparatus 30 (S7).
  • the display device 40 causes the left-eye image data to enter the left eye of the user via the left-eye optical element 42L as a left-eye image. Further, the display device 40 causes the image data for the right eye to enter the right eye of the user via the optical element for the right eye 42R as a video image for the right eye.
  • Thereby, the user visually recognizes an image of the real space captured by the camera 43 whose viewpoint has been converted to the position of the user's eyes. Thereafter, the image processing apparatus 10 returns to process S3 and repeats the processing.
  • If it is determined in process S5 that the user has instructed display of an image of the virtual space (S5: “virtual space”), the image processing apparatus 10 generates a depth map based on the image data captured by the camera 43 (S8).
  • The image processing apparatus 10 divides the generated depth map into meshes and determines, for each mesh, whether the object captured at the pixels of the captured image data corresponding to the pixels of the mesh is a ceiling, a wall, an obstacle (an object other than a wall within a predetermined height from the floor), or something else. Further, from the values of the depth-map pixels in each mesh, it determines whether the surface shape of the mesh is a plane, an uneven surface, a spherical surface, or a complex surface. The image processing apparatus 10 stores in the storage unit 12, as environment mesh list information associated with the generated depth map, information indicating the position of each mesh on the depth map (which may be the coordinates of the mesh vertices), the mesh type information, and the surface shape information (S9: generate environment mesh list information).
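  • The embodiment does not spell out how a mesh is labelled as ceiling, wall, obstacle or other; one plausible heuristic, sketched below, uses the orientation of the mesh normal and the mesh's height above the floor. All thresholds are assumptions for illustration.

```python
import numpy as np

def classify_mesh(normal, max_height, obstacle_limit=1.5, ceiling_height=2.3):
    """Guess the mesh type from its (unit) normal and its top height in metres.

    obstacle_limit stands in for the "predetermined height from the floor" of
    the embodiment; all concrete numbers here are illustrative only.
    """
    nz = abs(float(np.asarray(normal, dtype=float)[2]))   # vertical component
    if nz > 0.8:                                          # roughly horizontal surface
        if max_height >= ceiling_height:
            return "ceiling"
        if max_height <= 0.05:
            return "other"                                # e.g. the floor itself
        return "obstacle"                                 # e.g. a desk top
    if nz < 0.2 and max_height > obstacle_limit:
        return "wall"                                     # vertical and tall
    if max_height <= obstacle_limit:
        return "obstacle"                                 # non-wall object near the floor
    return "other"
```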
  • The image processing apparatus 10 then sequentially selects each pixel of the depth map and sets to “1” the voxel corresponding to the distance to the object represented by the selected pixel, in the direction corresponding to that pixel, while setting to “0” the other voxels on the line from that voxel back to the camera 43. For portions that are hidden by objects in the real space and therefore not captured in the image data picked up by the camera 43, it is unknown whether an object exists, so the corresponding voxel values remain “-1”.
  • When the user moves or changes the line-of-sight direction, so that the camera 43 captures image data of a portion that has not been captured before and for which it was unknown whether an object exists (a portion corresponding to voxels whose value is “-1”), the image processing apparatus 10 obtains the depth map of that portion and updates the values of the corresponding voxels to “0” or “1” based on the obtained depth map (S10: update the object buffer).
  • The image processing apparatus 10 then generates a projection image (FIG. 5) by two-dimensionally projecting, from the position coordinates of the rendering camera, the voxels in the object buffer whose value is not “0” within the visual field determined in process S4. In this way the image processing apparatus 10 detects objects arranged in the real space based on the acquired real-space image, and outputs a stereoscopic image in which the detected objects are replaced with virtual objects according to a predetermined rule (when game processing is performed, a rule defined in the program of the game).
  • Specifically, the image processing apparatus 10 accepts input of a background image and of three-dimensional model data such as the operation panel, rocks, boxes, trees, and plants. In a virtual three-dimensional space, it places the virtual object “operation panel” at the position corresponding to the range onto which voxels with value “1” are projected and whose mesh is classified, by the object conversion rules, as “operation panel”, and places a virtual object “rock” or “box” at the position corresponding to the range onto which voxels with value “1” are projected and whose mesh is classified as “rock” or “box”; arranging each virtual object in the virtual space in this way, it constructs the virtual space. For portions whose mesh type is “ceiling”, a background image is composited so that the virtual space appears to have no ceiling (S11: construct the virtual space).
  • In the virtual three-dimensional space in which the three-dimensional model data are arranged (a space whose coordinate system corresponds to the target space), the image processing apparatus 10 renders the image data (image data to be presented to the left eye and to the right eye, respectively) viewed in the visual-field direction determined in process S4 from the rendering camera position coordinates RC determined in process S4 (the coordinates corresponding to the user's left and right eyes) (S12: rendering). Then, the image processing apparatus 10 outputs the pair of image data thus rendered as a stereoscopic image (S13).
  • the display device 40 causes the image data for the left eye to enter the left eye of the user through the left-eye optical element 42L as an image for the left eye. Further, the display device 40 causes the image data for the right eye to enter the right eye of the user via the optical element for the right eye 42R as a video image for the right eye.
  • the user visually recognizes an image in a virtual space in which an object in the real space is changed to a virtual object (such as an operation panel, a rock, or a box). Thereafter, the image processing apparatus 10 returns to the process S3 and repeats the process.
  • Note that the processing for updating the object buffer may also be performed while the processes S6 to S7 for displaying an image of the real space are being executed.
  • According to the present embodiment, the rendering camera is placed at a position corresponding to the position of the user's eyes, and the image presented to the user is rendered from there (image data in which objects in the real space are replaced with virtual objects that reflect the real space), so an image can be displayed while giving the user a more accurate sense of the distance to his or her hands. That is, when the user reaches out to touch a virtual object in the virtual space, the user's hand actually reaches the corresponding object in the real space.
  • The left-eye and right-eye image data to be generated may instead be produced by reconstructing the target space from the real-space image data: the captured image is pasted as a texture onto virtual objects built from the meshes obtained by the same processing as in processes S8 to S9, these virtual objects are arranged in a virtual space corresponding to the target space, and the result is rendered in the visual-field direction from the rendering camera position coordinates RC determined in process S4 and displayed. In this case, the image representing the real space is also presented as viewed from an arbitrary position such as the position of the user's eyes, and the image of the virtual space is a rendered image viewed from the same position, so the visual field does not substantially move at the moment of switching and the sense of incongruity can be further reduced. Moreover, when switching from the real space to the virtual space, the virtual space can be rendered and the resulting image data output so as to stage a scene, in a game or the like, in which the objects of the real space are replaced little by little with the virtual objects of the virtual space.
  • In the description so far, the image of the real space around the user is obtained by the camera 43 provided on the display device 40 worn by the user, but the present embodiment is not limited to this; an image captured by a camera arranged in the room where the user is located may also be used.
  • Also, in the example described so far, the rendering camera position coordinates RC represent a position corresponding to the position of the user's eyes and the visual-field direction is the direction of the user's face, but the present embodiment is not limited to this. For example, the position coordinates RC of the rendering camera may be coordinates representing a position looking down on the virtual space from above (for example, the X-axis and Y-axis components may be the user's position coordinates and the Z-axis component a predetermined value), and the visual-field direction may be a direction behind the user (the direction opposite to the direction of the user's face).
  • the range of the field of view may be changed to produce the effect of zooming.
  • the range of view (view angle) may be arbitrarily set by a user operation.
  • In the description so far, the information on the position and tilt of the user's head is obtained from the head direction sensor 441 provided in the display device 40, but the present embodiment is not limited to this. For example, the user may be imaged by a camera arranged at a known position in the room where the user is located, and the position and tilt angle of the user's head may be detected by detecting the position and orientation of a predetermined point that moves with the user's head, for example a predetermined marker arranged in advance on the display device 40 worn by the user. Since techniques for detecting the position and inclination of such a marker from image data in which the marker is captured are widely known, detailed description is omitted here. When information on the position and tilt of the user's head is acquired by this method, the display device 40 does not necessarily need to be provided with the head direction sensor 441.
  • In the description so far, the relative coordinates between the reference position and the positions of the user's eyes are determined in advance, but the display device 40 may instead be provided with an eye sensor 440 that detects the positions of the user's eyes, so that the relative coordinates between the reference position and the positions of the user's eyes are obtained by detection. The eye sensor 440 may be, for example, a visible-light or infrared camera arranged at a known position relative to a predetermined position of the display device 40. The eye sensor 440 detects the position of the user's eye, iris, cornea, or pupil (as a position relative to the eye sensor 440), obtains relative coordinate information from known predetermined positions of the display device 40 (for example, the position of the left image sensor 430L for the left eye and the position of the right image sensor 430R for the right eye), and outputs it as information representing, for example, the position of the iris or the center of the pupil on the eyeball surface.
  • When the eye sensor 440 can also detect vector information on the direction of the user's line of sight (the direction of the eyeballs), the parameters of the video displayed on the display device 40 (for example, distortion correction parameters) may be changed using that line-of-sight information.
  • As one example, the image processing apparatus 10 obtains the line-of-sight direction vectors VL and VR of the user's left and right eyes in the XYZ coordinate system of the target space, and then obtains the distances rL and rR from the position coordinates RC of the rendering camera to the virtual objects lying in the directions of these vectors.
  • Using these distances rL and rR, the image processing apparatus 10 determines, for example, a lower limit distance rmin and an upper limit distance rmax that specify the distance range r that is in focus. The image processing apparatus 10 then processes the rendered image data (the image data of the stereoscopic image) using the information specifying the depth of field obtained in this way (here, rmin and rmax).
  • Specifically, the image generation unit 24 of the image processing apparatus 10 divides the generated image data of the stereoscopic image (each of the left-eye and right-eye image data) into a plurality of image regions (for example, image blocks of a predetermined size). For each image region, it obtains the pixel values on the depth map corresponding to the pixels of the region (the distances from the display device 40 worn by the user), and from these values and the information on the positions of the user's eyes it obtains the distance D from the user's eyes to the object imaged at a pixel of the region (for example, the center pixel of the region). If rmin < D < rmax, the image generation unit 24 does nothing for that image region. If D is smaller than rmin, it applies a blurring process (for example, a Gaussian filter) to the pixels of the region with an intensity corresponding to the magnitude of rmin - D; and if D is larger than rmax, it applies a blurring process (likewise, for example, a Gaussian filter) with an intensity corresponding to the magnitude of D - rmax, so that the image there appears out of focus.
  • In this way, a stereoscopic image having the designated depth of field can be generated, whether for the viewpoint-converted real-space image based on the acquired real-space image or for the virtual-space image obtained by rendering. Image portions outside the distance range at which the user is gazing are blurred, so a more natural image is provided to the user.
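  • A sketch of the region-by-region defocus described above: each block's representative distance D is compared with the in-focus range between rmin and rmax and, outside that range, the block is blurred with a Gaussian filter whose strength grows with the distance from the range. The block size and the mapping from distance to kernel size are assumptions.

```python
import cv2
import numpy as np

def apply_depth_of_field(image, depth, rmin, rmax, block=16):
    """image: HxWx3 uint8 rendered eye image; depth: HxW distances in metres."""
    out = image.copy()
    h, w = depth.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            region = (slice(y, min(y + block, h)), slice(x, min(x + block, w)))
            d = float(np.median(depth[region]))     # representative distance of the block
            if rmin < d < rmax or not np.isfinite(d):
                continue                            # in focus: leave the block alone
            excess = (rmin - d) if d <= rmin else (d - rmax)
            # Map the distance outside the focus range to an odd Gaussian kernel size.
            k = min(31, 2 * int(1 + 4 * excess) + 1)
            out[region] = cv2.GaussianBlur(image[region], (k, k), 0)
    return out
```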

Abstract

This image processing device, which is connected to a display device which is worn on the head of a user when used, acquires an image of a real space around the user, determines information related to the field of view, and generates, on the basis of the acquired image of the real space, an image of the field of view identified by the determined information. The image processing device outputs the generated image to the display device.

Description

Image processing device
 The present invention relates to an image processing apparatus that is connected to a display device that a user wears on the head when in use.
 In recent years, display devices that a user wears on the head, such as head-mounted displays, have become widespread. Such a display device forms an image in front of the user's eyes, allowing the user to view that image. These devices include non-transmissive types, in which a display unit covers the user's eyes and prevents the user from seeing the real space in front of them, and transmissive (optical see-through) types, in which the display unit is built from a half mirror or the like so that the user can still see the real space in front of them.
 Further, even with a non-transmissive display device, the real space in front of the user's eyes can be photographed by a camera and displayed on the display unit, so that the user can see the real space in front of them much as with a transmissive device; devices that realize such a pseudo-transmissive display are said to use a camera see-through method.
 However, with a camera see-through display device, the user sees an image whose viewpoint is the camera rather than the actual position of the user's eyes, so there has been a problem that it is difficult for the user to judge, for example, the distance to his or her own hands.
 The present invention has been made in view of the above circumstances, and one of its objects is to provide an image processing apparatus capable of displaying an image while giving the user a more accurate sense of the distance to his or her hands.
 The present invention, which solves the problems of the conventional example, is an image processing apparatus connected to a display device that a user wears on the head when in use, comprising: image acquisition means for acquiring an image of the real space around the user; determination means for determining visual-field information; and image generation means for generating, based on the acquired real-space image, an image of the visual field specified by the determined information, wherein the image generated by the image generation means is output to the display device.
 FIG. 1 is a configuration block diagram illustrating an example of an image processing system including an image processing apparatus according to an embodiment of the present invention. FIG. 2 is a functional block diagram illustrating an example of the image processing apparatus according to the embodiment. FIG. 3 is an explanatory diagram illustrating an example of the head tilt information used by the image processing apparatus according to the embodiment. FIG. 4 is an explanatory diagram outlining the object buffer generated by the image processing apparatus according to the embodiment. FIG. 5 is an explanatory diagram illustrating a projection image of the object buffer generated by the image processing apparatus according to the embodiment. FIG. 6 is a flowchart illustrating an operation example of the image processing apparatus according to the embodiment. FIG. 7 is an explanatory diagram illustrating an operation example of the image processing apparatus according to the embodiment.
 Embodiments of the present invention will be described with reference to the drawings. As illustrated in FIG. 1, an image processing system 1 including an image processing apparatus 10 according to an embodiment of the present invention is configured to include the image processing apparatus 10, an operation device 20, a relay device 30, and a display device 40.
 The image processing apparatus 10 is an apparatus that supplies the images to be displayed by the display device 40. For example, the image processing apparatus 10 is a consumer game machine, a portable game machine, a personal computer, a smartphone, a tablet, or the like. As shown in FIG. 1, the image processing apparatus 10 includes a control unit 11, a storage unit 12, and an interface unit 13.
 The control unit 11 is a program control device such as a CPU, and executes the program stored in the storage unit 12. In the present embodiment, the control unit 11 acquires an image of the real space around the user wearing the display device 40, and generates an image of a designated visual field based on the acquired image of the real space.
 Specifically, in one example of the present embodiment, the control unit 11 constructs a virtual three-dimensional space (hereinafter referred to as the virtual space) corresponding to a real space of a predetermined size around the user, centered on the user's position and including the area behind the user (hereinafter referred to as the target space): for example, a rectangular cuboid 10 m wide (in the direction orthogonal to the user's line of sight at the time the display device 40 is put on, i.e. the initial line-of-sight direction, and parallel to the floor), 10 m deep (along the initial line-of-sight direction parallel to the floor), and 3 m high.
 That is, the control unit 11 places virtual three-dimensional objects in this virtual space, or applies video effects, while referring to the image of the real space. The control unit 11 also obtains information on the visual field used for rendering an image of this virtual space (either separate information for a visual field corresponding to the user's left eye and one corresponding to the right eye, or a single common visual field), generates an image of the virtual space as seen from the visual field specified by that information (a stereoscopic image when separate left-eye and right-eye visual fields are used), and outputs the generated image to the display device 40. The detailed operation of the control unit 11 will be described later.
The storage unit 12 includes at least one memory device such as a RAM, and stores the program executed by the control unit 11. The storage unit 12 also operates as a work memory for the control unit 11, and stores data used by the control unit 11 in the course of program execution. The program may be provided stored on a computer-readable, non-transitory recording medium and then stored in the storage unit 12.
The interface unit 13 is an interface through which the control unit 11 of the image processing apparatus 10 performs data communication with the operation device 20 and the relay device 30. The image processing apparatus 10 is connected to the operation device 20, the relay device 30, and the like via the interface unit 13, either by wire or wirelessly. As an example, the interface unit 13 may include a multimedia interface such as HDMI (registered trademark) (High-Definition Multimedia Interface) in order to transmit the images (stereoscopic images) and audio supplied by the image processing apparatus 10 to the relay device 30. It may also include a data communication interface such as USB in order to receive various kinds of information from the display device 40 via the relay device 30 and to transmit control signals and the like. The interface unit 13 may further include a data communication interface such as USB in order to receive signals indicating the content of the user's operation input to the operation device 20.
The operation device 20 is, for example, a controller of a consumer game machine, and is used by the user to perform various instruction operations on the image processing apparatus 10. The content of the user's operation input to the operation device 20 is transmitted to the image processing apparatus 10 either by wire or wirelessly. The operation device 20 does not necessarily have to be separate from the image processing apparatus 10, and may include operation buttons, a touch panel, or the like arranged on the housing surface of the image processing apparatus 10.
The relay device 30 is connected to the display device 40 either by wire or wirelessly, receives the data of the images (stereoscopic images) supplied from the image processing apparatus 10, and outputs a video signal corresponding to the received data to the display device 40. At this time, the relay device 30 may, as necessary, perform processing such as correcting the distortion caused by the optical system of the display device 40 on the video represented by the supplied images, and output a video signal representing the corrected video. If the image supplied from the image processing apparatus 10 is a stereoscopic image, the video signal supplied from the relay device 30 to the display device 40 includes two video signals generated based on the stereoscopic image: a video signal for the left eye and a video signal for the right eye. In addition to the stereoscopic images and video signals, the relay device 30 also relays various kinds of information transmitted and received between the image processing apparatus 10 and the display device 40, such as audio data and control signals.
The display device 40 is a display device that the user wears on the head, and it displays video corresponding to the video signal input from the relay device 30 so that the user can view it. In the present embodiment, the display device 40 displays a video corresponding to each eye in front of each of the user's right eye and left eye. As shown in FIG. 1, the display device 40 includes a video display element 41, an optical element 42, a camera 43, a sensor unit 44, and a communication interface 45.
The video display element 41 is an organic EL display panel, a liquid crystal display panel, or the like, and displays video corresponding to the video signal supplied from the relay device 30. The video display element 41 may be a single display element that displays the left-eye video and the right-eye video side by side, or may include a pair of display elements that display the left-eye video and the right-eye video independently. A display screen of a smartphone or the like may also be used as the video display element 41 as it is; in this case, the smartphone or the like displays video corresponding to the video signal supplied from the relay device 30.
The display device 40 may also be a retinal irradiation type (retinal projection type) device that projects video directly onto the user's retina. In this case, the video display element 41 may be composed of a laser that emits light and a MEMS (Micro Electro Mechanical Systems) mirror that scans the light.
The optical element 42 is a hologram, a prism, a half mirror, or the like; it is arranged in front of the user's eyes and transmits or refracts the light of the video displayed by the video display element 41 so that the light enters the user's eyes. Specifically, the optical element 42 may include a left-eye optical element 42L and a right-eye optical element 42R. In this case, the left-eye video displayed by the video display element 41 may enter the user's left eye via the left-eye optical element 42L, and the right-eye video may enter the user's right eye via the right-eye optical element 42R. This allows the user, with the display device 40 worn on the head, to see the left-eye video with the left eye and the right-eye video with the right eye. In the present embodiment, the display device 40 is assumed to be a non-transmissive display device through which the user cannot see the outside world.
The camera 43 includes a pair of image sensors 430L and 430R arranged slightly to the left and slightly to the right of the center of the front face of the display device 40 (the user's front side); in the following description, they are collectively referred to as the image sensors 430 when there is no need to distinguish left from right. The camera 43 may also include at least one image sensor 430B arranged on the user's back side. The camera 43 captures images of the real space in front of the user with at least the image sensors 430L and 430R, and outputs the image data obtained by the capturing to the image processing apparatus 10 via the relay device 30.
The sensor unit 44 may further include a head direction sensor 441 that detects the direction of the user's head (the front direction of the user's face) and its position. In this case, the head direction sensor 441 detects the direction of the user's head (the direction of the face). Specifically, the head direction sensor 441 is a gyroscope or the like, and detects and outputs the rotation angle of the head direction within a plane parallel to the floor, measured from the initial direction at the time the display device 40 is put on, the rotation angle in the elevation direction, and the rotation angle about the axis of the viewing direction. The head direction sensor 441 also detects and outputs, taking a predetermined position on the display device 40 (for example, the midpoint of the line segment connecting the image sensors 430L and 430R of the camera 43) as a reference position, the amount of movement (x, y, z) of this reference position since the device was put on, in the user's left-right direction (the axis where the transverse plane and the coronal plane intersect, hereinafter the X axis), the front-rear direction (the axis where the sagittal plane and the transverse plane intersect, hereinafter the Y axis), and the up-down direction (the Z axis). The relative coordinates of each image sensor 430 with this reference position as the origin are assumed to be known.
The communication interface 45 is an interface for communicating data such as video signals and image data with the relay device 30. For example, when the display device 40 transmits and receives data to and from the relay device 30 by wireless communication such as a wireless LAN or Bluetooth (registered trademark), the communication interface 45 includes a communication antenna and a communication module.
Next, the operation of the control unit 11 of the image processing apparatus 10 according to the embodiment of the present invention will be described. By executing the program stored in the storage unit 12, the control unit 11 functionally implements an image acquisition unit 21, a visual field determination processing unit 23, an image generation unit 24, and an output unit 25, as illustrated in FIG. 2.
The image acquisition unit 21 acquires images of the real space around the user wearing the display device 40. Specifically, the image acquisition unit 21 receives the image data captured by the camera 43 from the display device 40 via the relay device 30. In the example of the present embodiment, the image data captured by the camera 43 is a pair of image data captured by the pair of image sensors 430 arranged on the left and right, and the distance to an object in the captured real space can be determined from the parallax between the two images. In the present embodiment, the image acquisition unit 21 generates and outputs, based on the image data captured by the camera 43, a depth map of the same size as that image data (hereinafter referred to as the captured image data for distinction). Here, the depth map is image data in which the value of each pixel is information representing the distance to the object captured at the corresponding pixel of the image data captured by the camera 43.
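The disclosure does not prescribe a particular implementation for this step; the following minimal sketch shows one conventional way such a depth map could be computed from the left/right pair, assuming OpenCV block matching and placeholder values for the focal length and the baseline between the image sensors 430L and 430R.

```python
# Hypothetical sketch: depth map from the stereo pair captured by image sensors
# 430L/430R. FOCAL_PX and BASELINE_M are assumed values, not figures from the patent.
import cv2
import numpy as np

FOCAL_PX = 700.0      # assumed focal length of image sensor 430, in pixels
BASELINE_M = 0.064    # assumed distance between image sensors 430L and 430R, in metres

def depth_map_from_stereo(img_left: np.ndarray, img_right: np.ndarray) -> np.ndarray:
    """Return a depth map (metres) with the same resolution as the captured images."""
    gray_l = cv2.cvtColor(img_left, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(img_right, cv2.COLOR_BGR2GRAY)
    # Block matching returns disparity in 1/16-pixel units.
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(gray_l, gray_r).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan          # no match: distance unknown
    return FOCAL_PX * BASELINE_M / disparity    # depth = f * B / d
```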
The visual field determination processing unit 23 determines the visual field information used for rendering the virtual space. As a specific example, the visual field determination processing unit 23 obtains, independently of the positions of the image sensors 430 included in the camera 43, predetermined values (which may, for example, be hard-coded in the program or read from a setting file) for the position coordinates RC of the camera used when rendering the virtual space (hereinafter referred to as the rendering camera; the image seen from the position of this rendering camera is rendered) and for information representing the direction of the visual field (for example, vector information starting at the position coordinates RC and passing through the center of the visual field), and uses these as the visual field information.
As another example, the visual field determination processing unit 23 may obtain the position coordinates RC of the rendering camera in the virtual space as coordinates relative to a reference position in the real space that changes over time as the user moves. As one example, the reference position may be the position of an image sensor 430, and the position in the virtual space corresponding to a position displaced from that image sensor 430 by a predetermined relative coordinate value may be used as the position coordinates RC of the rendering camera.
Here, the relative coordinates may be, for example, the relative coordinates from the position of the image sensor 430R (or 430L) to the position where the right eye (or left eye) of the user wearing the display device 40 should be. In this example, the position in the virtual space corresponding to the position of the user's eye becomes the position coordinates RC of the rendering camera.
The visual field determination processing unit 23 may also obtain information representing the direction of the visual field (for example, vector information starting at the position coordinates RC and passing through the center of the visual field) from the information output by the head direction sensor 441. In this case, the visual field determination processing unit 23 acquires, from the information output by the head direction sensor 441, the rotation angle θ of the head direction within a plane parallel to the floor measured from the initial direction at the time the display device 40 is put on, the rotation angle φ in the elevation direction, the rotation angle ψ about the axis of the viewing direction, and the amount of movement (x, y, z) of the head, as illustrated in FIG. 3. The visual field determination processing unit 23 then determines the direction of the user's visual field from the rotation angles θ and φ, and determines the tilt of the user's neck about the viewing direction from the rotation angle ψ.
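As a non-authoritative illustration of the geometry described above (the axis convention and the use of radians are assumptions), the viewing direction can be derived from θ and φ, with ψ kept separately as the roll about that direction.

```python
# Hypothetical sketch: viewing direction from the head direction sensor 441 output.
# Axis convention (X = user's left-right, Y = initial gaze direction, Z = up) is assumed.
import math

def view_direction(theta: float, phi: float) -> tuple[float, float, float]:
    """Unit vector of the visual field direction for yaw theta and elevation phi (radians)."""
    dx = math.cos(phi) * math.sin(theta)
    dy = math.cos(phi) * math.cos(theta)   # theta = 0, phi = 0 -> the initial gaze along +Y
    dz = math.sin(phi)
    return (dx, dy, dz)

# The roll angle psi is not part of the direction itself; it is applied later as a
# rotation of the rendered image about the viewing axis.
```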
As one example, the visual field determination processing unit 23 uses, as the position coordinates RC of the rendering camera, the coordinates in the virtual space that correspond to the positions of the user's left and right eyes in the real space. That is, from the coordinate information of the left image sensor 430L and the right image sensor 430R of the camera 43 within the target space (which can be computed from the information on the amount of movement of the user's head and the relative coordinates from the reference position to each image sensor 430), and from the information on the relative coordinates of the left and right eyes with respect to the position of each image sensor 430, it obtains the coordinate information in the virtual space corresponding to the positions of the user's left and right eyes in the XYZ coordinate system of the target space, and outputs that information to the image generation unit 24 as the position coordinates RC of the rendering camera.
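A minimal sketch of this bookkeeping follows, assuming simple fixed offsets; the numeric values are invented for illustration and, for brevity, the offsets are not rotated by the head orientation.

```python
# Hypothetical sketch: rendering camera positions RC for the left and right eyes,
# derived from the head movement (x, y, z) reported by the head direction sensor 441
# and known offsets. All offset values below are assumptions, not patent figures.
import numpy as np

SENSOR_OFFSET = {            # image sensor 430L/430R relative to the reference position
    "L": np.array([-0.032, 0.0, 0.0]),
    "R": np.array([+0.032, 0.0, 0.0]),
}
EYE_OFFSET = np.array([0.0, -0.02, -0.03])   # eye position relative to its image sensor

def rendering_camera_positions(head_movement_xyz):
    """Return {eye: RC coordinate} in the target-space XYZ system."""
    head = np.asarray(head_movement_xyz, dtype=float)
    # (For brevity the offsets are not rotated by the current head orientation.)
    return {eye: head + SENSOR_OFFSET[eye] + EYE_OFFSET for eye in ("L", "R")}
```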
The visual field determination processing unit 23 also outputs to the image generation unit 24 the information on the direction of the visual field determined from the rotation angles θ and φ, and the information on the tilt of the user's neck about the viewing direction determined from the rotation angle ψ.
The image generation unit 24 receives the position coordinates RC of the rendering camera and the information on the direction of the visual field from the visual field determination processing unit 23. Based on the acquired image of the real space, the image generation unit 24 then generates a stereoscopic image as seen in the visual field specified by that information.
In one example of the present embodiment, the image generation unit 24 first generates environment mesh list information and an object buffer based on the depth map information output by the image acquisition unit 21. Here, the environment mesh list information is obtained by dividing the depth map into meshes, and includes the vertex coordinates of each mesh (information representing pixel positions), identification information of the mesh, information on the normals of the objects captured at the pixels of the captured image data corresponding to the pixels within the mesh, mesh type information (information indicating which of predetermined types the mesh belongs to), and information on the surface shape of the mesh.
The mesh type information indicates whether the object captured at the pixels of the captured image data corresponding to the pixels within the mesh is a floor, a ceiling, a wall, an obstacle (defined in advance as, for example, an object other than a wall located within a predetermined height from the floor), or something else. The information on the surface shape of the mesh indicates which surface shape it is: a flat surface, an uneven surface, a spherical surface, or a surface with a complex shape.
There are various methods for recognizing the type of object in the captured image data, its surface shape, and so on from the depth map information and the like; which method is adopted is not material here.
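Purely as an illustration of the data involved (the field names and enum members are assumptions that merely mirror the categories named above, not terms of the disclosure), one environment mesh list entry could be represented as follows.

```python
# Hypothetical sketch of one environment mesh list entry as described above.
from dataclasses import dataclass
from enum import Enum, auto

class MeshType(Enum):
    FLOOR = auto()
    CEILING = auto()
    WALL = auto()
    OBSTACLE = auto()
    OTHER = auto()

class SurfaceShape(Enum):
    FLAT = auto()
    UNEVEN = auto()
    SPHERICAL = auto()
    COMPLEX = auto()

@dataclass
class EnvironmentMesh:
    mesh_id: int
    vertex_pixels: list[tuple[int, int]]            # vertex coordinates on the depth map
    normals: list[tuple[float, float, float]]       # normals of the captured object
    mesh_type: MeshType
    surface_shape: SurfaceShape
```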
The object buffer is a representation of a real space of a predetermined size around the user, including the user's position and the area behind the user's line-of-sight direction (for example, a rectangular parallelepiped 10 m wide in the direction orthogonal to the initial line-of-sight direction and parallel to the floor, 10 m deep along the initial line-of-sight direction parallel to the floor, and 3 m high; hereinafter referred to as the target space), expressed virtually as a space of voxels (virtual volume elements, for example cubic elements 10 cm wide, 10 cm deep, and 10 cm high). The value of a voxel in which an object exists (the voxel value) is set to "1", the value of a voxel in which no object exists is set to "0", and the value of a voxel for which it is unknown whether an object exists is set to "-1" (FIG. 4).
In FIG. 4, for convenience of illustration, only some of the voxels are shown, and the voxel size is chosen for the sake of explanation; the voxel size relative to the target space does not necessarily represent one suited to actual implementation. FIG. 4 shows an example in which a cubic object is placed in the far corner of the target space: the values of the voxels corresponding to its surface are set to "1", indicating that an object exists; the values of the voxels in the portion hidden behind that surface are set to "-1", indicating that their state is unknown; and the values of the voxels between the camera and the object surface are set to "0", indicating that nothing is there.
The image generation unit 24 sets these voxel values based on the depth map information. Each pixel of the depth map corresponds to one cell obtained by dividing, at the resolution of the depth map (py pixels vertically by px pixels horizontally), the base of a virtual quadrangular pyramid whose apex is the position coordinates of the camera 43 at the time the image data underlying the depth map was captured (these may be the coordinates of the reference position; hereinafter referred to as the capture-time position) and whose extent corresponds to the angle of view of the depth map. Accordingly, a vector parallel to the line segment from the capture-time position through the vertices of each pixel (a difference of coordinates in the world coordinate system), or a vector parallel to the line segment from the capture-time position through the center of each pixel, can be computed as the direction of that pixel from the coordinates of the capture-time position, the information representing the angle of view of the depth map, and the resolution of the depth map.
For each pixel of the depth map, the image generation unit 24 therefore sets to "1" the value of the voxel that lies, from the coordinates in the object buffer corresponding to the capture-time position (which may be the coordinates of the reference position), in the direction of that pixel and at the distance to the object represented by the depth map, and sets to "0" the values of the other voxels on the line from that voxel to the camera 43. For portions of the image data captured by the camera 43 that are hidden by objects in the real space and therefore not captured (portions behind a desk or a wall, or in the shadow of something placed on the floor), the image generation unit 24 sets the values of the corresponding voxels to "-1", since it is unknown whether an object exists there.
When the user moves or changes the line-of-sight direction and the image data captured by the camera 43 yields a depth map for a portion that was not captured before and for which it was therefore unknown whether an object exists (the portion corresponding to voxels whose value was "-1"), the image generation unit 24 updates the values of those voxels to "0" or "1" based on the newly obtained depth map.
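The following sketch is illustrative only and shows one way the object buffer update described above could be organized; the grid dimensions, voxel size, and helper functions are assumptions rather than part of the disclosure.

```python
# Hypothetical sketch of the object buffer update. Grid size, voxel size, and the
# ray construction are assumptions for illustration.
import numpy as np

VOXEL = 0.1                                        # 10 cm voxels
GRID = (100, 100, 30)                              # 10 m x 10 m x 3 m target space
object_buffer = np.full(GRID, -1, dtype=np.int8)   # -1 = unknown

def to_index(p):
    """World coordinates (origin at a corner of the target space) -> voxel index."""
    return tuple(int(c // VOXEL) for c in p)

def update_object_buffer(depth_map, camera_pos, pixel_direction):
    """depth_map[v, u] is the distance to the object seen at pixel (u, v);
    pixel_direction(u, v) returns that pixel's unit direction as a numpy vector."""
    camera_pos = np.asarray(camera_pos, dtype=float)
    h, w = depth_map.shape
    for v in range(h):
        for u in range(w):
            d = depth_map[v, u]
            if not np.isfinite(d):
                continue                           # distance unknown: leave voxels at -1
            ray = pixel_direction(u, v)
            # Empty space between the camera and the hit point.
            for t in np.arange(0.0, d, VOXEL):
                idx = to_index(camera_pos + t * ray)
                if all(0 <= i < n for i, n in zip(idx, GRID)):
                    object_buffer[idx] = 0
            # The voxel at the measured distance contains an object surface.
            idx = to_index(camera_pos + d * ray)
            if all(0 <= i < n for i, n in zip(idx, GRID)):
                object_buffer[idx] = 1
```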
Note that, besides the method described here, various methods can be adopted for setting the voxel values of a three-dimensional space representing the extent in which such objects exist from information such as a depth map, including methods widely known as 3D scanning methods.
Furthermore, the image generation unit 24 generates a projected image obtained by two-dimensionally projecting, from the position coordinates RC of the rendering camera specified by the information input from the visual field determination processing unit 23, the voxels in the object buffer whose value is not "0" and which lie in the visual field direction specified by the information input from the visual field determination processing unit 23 (FIG. 5). Based on the acquired image of the real space, the image generation unit 24 then detects the objects arranged in the real space, and generates and outputs a stereoscopic image in which a virtual object is placed at the position of each detected object. As a specific example, the image generation unit 24 constructs the virtual space according to predetermined rules (referred to as object conversion rules).
As an example, the object conversion rules are as follows. For meshes whose corresponding voxels have the value "1":
(1) If the mesh type is "ceiling", the background is composited.
(2) At the position of an object whose mesh type is obstacle and whose mesh surface shape is flat, the virtual object "operation panel" is placed.
(3) At the position of an object whose mesh type is obstacle and whose mesh surface shape is an uneven surface, the virtual object "rock" or "box" is placed.
(4) At the position of an object whose mesh type is obstacle and whose mesh surface shape is spherical, the virtual object "light" is placed.
(5) In the range of an object whose mesh type is obstacle and whose mesh surface shape is complex, the virtual objects "trees and plants" are placed.
The image generation unit 24 separately accepts input of a background image and of three-dimensional model data of virtual objects such as the operation panel, rocks, boxes, trees, and plants. Then, within the projected image illustrated in FIG. 5, it places the virtual object of the operation panel in the range onto which voxels with the voxel value "1" are projected and for which the mesh of the corresponding range is designated as "operation panel" by the object conversion rules. Similarly, it places the "rock" or "box" virtual object at the positions corresponding to the portions onto which voxels with the voxel value "1" are projected and for which the mesh of the corresponding range is designated as "rock" or "box" by the object conversion rules, and so on for each virtual object. The process of arranging virtual objects represented by three-dimensional model data in a virtual space in this way is widely known in the creation of three-dimensional graphics, and a detailed description is therefore omitted here.
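As an illustrative aid only, the object conversion rules listed above can be summarized as a lookup table; the string labels below are placeholders, not identifiers from the disclosure.

```python
# Hypothetical sketch of the object conversion rules, expressed as a lookup from
# (mesh type, surface shape) to the virtual object to place.
CONVERSION_RULES = {
    ("obstacle", "flat"):      "operation_panel",
    ("obstacle", "uneven"):    ("rock", "box"),       # either may be chosen
    ("obstacle", "spherical"): "light",
    ("obstacle", "complex"):   "trees_and_plants",
}

def virtual_object_for(mesh_type: str, surface_shape: str):
    """Return the virtual object name(s) for a mesh, or None if no rule applies.
    Meshes of type "ceiling" are handled separately by compositing the background."""
    if mesh_type == "ceiling":
        return "background"
    return CONVERSION_RULES.get((mesh_type, surface_shape))
```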
After arranging the virtual objects, the image generation unit 24 renders the images seen in the visual field direction specified by the information input from the visual field determination processing unit 23, from the position coordinates RC of the rendering camera input from the visual field determination processing unit 23 (here, the coordinates corresponding to each of the user's left and right eyes). The image generation unit 24 outputs the pair of image data obtained by the rendering to the output unit 25 as a stereoscopic image. The output unit 25 then outputs the image data of the stereoscopic image generated by the image generation unit 24 to the display device 40 via the relay device 30.
When rendering, the image generation unit 24 may use the information on the angle ψ related to the tilt of the user's neck to generate image data tilted by this angle ψ.
Furthermore, in the present embodiment, the image generation unit 24 may, when instructed, output the images of the real space captured by the image sensors 430L and 430R of the camera 43 as they are, as a stereoscopic image of the real space, instead of the stereoscopic image of the virtual space (providing a so-called camera see-through function).
In this example, the image generation unit 24 outputs the image data captured by the left image sensor 430L and the right image sensor 430R of the camera 43 as they are, and the output unit 25 displays the images based on these image data as they are, as the left-eye image and the right-eye image.
[Operation]
The image processing apparatus 10 according to the embodiment of the present invention basically has the configuration described above, and operates as follows. When the user puts the display device 40 on the head, the image processing apparatus 10 starts the processing illustrated in FIG. 6 and, with the reference position of the display device 40 (for example, the centroid position of the image sensors 430 of the camera 43) as the origin, sets as the target space a real space in a rectangular parallelepiped range around the user, including the area behind the user, extending ±5 m (10 m in total) in the X-axis direction, ±5 m (10 m in total) in the Y-axis direction, and up to 3 m above the floor in the Z-axis direction (S1).
The image processing apparatus 10 then sets up an object buffer in which this target space is virtually represented as a space of voxels (virtual volume elements, for example cubic elements 10 cm wide, 10 cm deep, and 10 cm high), with the values of all voxels initially set to "-1", and stores it in the storage unit 12 (S2).
The display device 40 repeatedly captures images with the camera 43 at predetermined intervals (for example, every 1/1000 second) and sends the image data obtained by the capturing to the image processing apparatus 10. The image processing apparatus 10 receives the image data captured by the camera 43 from the display device 40 via the relay device 30 (S3).
The image processing apparatus 10 acquires information on the direction of the user's head (the direction of the face) and the amount of movement (for example, expressed as coordinate values in the XYZ space described above). Specifically, the information on the direction and the amount of movement of the user's head may be detected by the head direction sensor 441 of the display device 40 and output to the image processing apparatus 10.
The image processing apparatus 10 determines the position coordinates RC of the rendering camera and the visual field direction (S4). Specifically, as illustrated in FIG. 7, the image processing apparatus 10 refers to the acquired information on the amount of movement of the head and obtains the coordinate information, within the target space, of the left image sensor 430L and the right image sensor 430R of the camera 43. Then, using the coordinate information obtained here and the predetermined relative coordinate information between each of the image sensors 430L and 430R and the user's corresponding eye (the user's left eye for the image sensor 430L and the user's right eye for the image sensor 430R), it obtains the coordinate information of the user's left and right eyes in the XYZ coordinate system of the target space. The coordinate information in the virtual space corresponding to this coordinate information is then used as the position coordinates RC of the rendering camera. Note that the XYZ coordinate system of the target space and the coordinate system of the virtual space may be aligned, in which case the coordinate values in the XYZ coordinate system of the target space can be used directly as the position coordinates RC of the rendering camera. The image processing apparatus 10 also determines the visual field direction based on the acquired information on the direction of the head.
Furthermore, the image processing apparatus 10 accepts, for example from the operation device 20, an instruction from the user as to whether to display an image of the real space (operation as so-called camera see-through) or an image of the virtual space (S5). The image processing apparatus 10 then starts the processing for displaying the instructed image. Here, the case of displaying an image of the real space is described first (S5: "real space").
In this case, the image processing apparatus 10 displays images using the real-space images captured by the camera 43 and received from the display device 40 in S3. In one example of the present embodiment, the image processing apparatus 10 uses the image data input from the image sensor 430L of the camera 43 as it is as the image data for the left eye, and uses the image data input from the image sensor 430R of the camera 43 as it is as the image data for the right eye. The image processing apparatus 10 generates the image data for the left eye and the right eye in this way, for example (S6).
The image processing apparatus 10 outputs the image data for the left eye and the image data for the right eye generated here to the display device 40 via the relay device 30 (S7). The display device 40 causes the image data for the left eye to enter the user's left eye via the left-eye optical element 42L as the left-eye video, and causes the image data for the right eye to enter the user's right eye via the right-eye optical element 42R as the right-eye video. As a result, the user views an image of the real space captured by the camera 43, with the viewpoint converted to the positions of the user's eyes. The image processing apparatus 10 then returns to S3 and repeats the processing.
If it is determined in S5 that the user has instructed display of an image of the virtual space (S5: "virtual space"), the image processing apparatus 10 generates, based on the image data captured by the camera 43, the depth map obtained from that image data (S8).
The image processing apparatus 10 divides the generated depth map into meshes, and determines whether the object captured at the pixels of the captured image data corresponding to the pixels within each mesh is a ceiling, a wall, an obstacle (defined in advance as, for example, an object other than a wall located within a predetermined height from the floor), or something else. In addition, from the values of the pixels on the depth map within each mesh, it determines whether the surface shape of the mesh is a flat surface, an uneven surface, a spherical surface, or a surface with a complex shape.
Then, as information related to the generated depth map, the image processing apparatus 10 associates information representing the position of each mesh on the depth map (which may be the coordinates of the mesh vertices), the mesh type information, and the surface shape information with one another, and stores them in the storage unit 12 as the environment mesh list information (S9: environment mesh list information generation).
While sequentially selecting each pixel of the depth map, the image processing apparatus 10 sets to "1" the value of the voxel that lies in the direction corresponding to the selected pixel on the depth map and at the distance to the object represented by the selected pixel, and sets to "0" the values of the other voxels on the line from that voxel to the camera 43. Here, for portions of the image data captured by the camera 43 that are hidden by objects in the real space and therefore not captured, the values of the corresponding voxels remain "-1", since it is unknown whether an object exists there.
Note that when the user moves or changes the line-of-sight direction and the image data captured by the camera 43 yields a depth map for a portion that was not captured before and for which it was therefore unknown whether an object exists (the portion corresponding to voxels whose value was "-1"), the image processing apparatus 10 updates the values of those voxels to "0" or "1" based on the newly obtained depth map (S10: object buffer update).
From the coordinates represented by the position coordinates of the rendering camera, the image processing apparatus 10 generates a projected image obtained by two-dimensionally projecting the voxels in the object buffer whose value is not "0" and which lie in the visual field direction determined in S4 (FIG. 5). Based on the acquired image of the real space, the image processing apparatus 10 then detects the objects arranged in the real space and outputs a stereoscopic image in which each detected object is replaced with a virtual object according to predetermined rules (when game processing is performed, rules defined by the game program).
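As a hedged illustration of this projection step, the sketch below projects each non-empty voxel with a simple pinhole model; the camera basis vectors and focal length are assumptions, and depth ordering between voxels is ignored for brevity.

```python
# Hypothetical sketch of projecting non-empty object-buffer voxels into the current
# view (the FIG. 5 step). The pinhole projection and its parameters are assumptions.
import numpy as np

def project_voxels(object_buffer, voxel_size, rc, forward, right, up,
                   focal_px, width, height):
    """Return an image whose pixels hold the voxel value (1 or -1) seen there, else 0.
    rc, forward, right, up are numpy vectors; forward/right/up form the camera basis."""
    image = np.zeros((height, width), dtype=np.int8)
    occupied = np.argwhere(object_buffer != 0)
    for ix, iy, iz in occupied:
        center = (np.array([ix, iy, iz]) + 0.5) * voxel_size
        rel = center - rc
        z = np.dot(rel, forward)          # depth along the viewing direction
        if z <= 0:
            continue                      # behind the rendering camera
        x = np.dot(rel, right)
        y = np.dot(rel, up)
        u = int(width / 2 + focal_px * x / z)
        v = int(height / 2 - focal_px * y / z)
        if 0 <= u < width and 0 <= v < height:
            image[v, u] = object_buffer[ix, iy, iz]
    return image
```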
Specifically, the image processing apparatus 10 accepts input of a background image and of three-dimensional model data such as the operation panel, rocks and boxes, and trees and plants, and, within the virtual three-dimensional space, places the virtual object of the operation panel at the position corresponding to the range onto which voxels with the voxel value "1" are projected and for which the mesh of the corresponding range is designated as "operation panel" by the object conversion rules. Similarly, it places the "rock" or "box" virtual object at the positions onto which voxels with the voxel value "1" are projected and for which the mesh of the corresponding range is designated as "rock" or "box" by the object conversion rules, and so on, arranging each virtual object in the virtual space to construct the virtual space. For the range corresponding to the ceiling, the background image is composited to construct a virtual space that appears as if there were no ceiling (S11: construct virtual space).
Within the virtual three-dimensional space in which the three-dimensional model data has been arranged (a space in the coordinate system corresponding to the target space), the image processing apparatus 10 renders the image data in the visual field direction determined in S4 (the image data to be presented to each of the left eye and the right eye) as seen from the position coordinates RC of the rendering camera determined in S4 (the coordinate information corresponding to each of the user's left and right eyes) (S12: rendering). The image processing apparatus 10 then outputs the pair of image data thus rendered as a stereoscopic image (S13).
The display device 40 causes the image data for the left eye to enter the user's left eye via the left-eye optical element 42L as the left-eye video, and causes the image data for the right eye to enter the user's right eye via the right-eye optical element 42R as the right-eye video. As a result, the user views an image of a virtual space in which the objects in the real space have been changed into virtual objects (such as the operation panel, rocks, and boxes). The image processing apparatus 10 then returns to S3 and repeats the processing.
In the example of FIG. 6, the processing for updating the object buffer (S10) may also be performed when executing the processing of S6 and S7 for displaying an image of the real space.
As described above, in the example of the present embodiment, when the virtual space is displayed, the rendering camera is placed, for example, at a position corresponding to the position of the user's eyes, as illustrated in FIG. 7, and a rendered image reflecting the real space (for example, image data in which objects in the real space are replaced with virtual objects) is presented. The image can therefore be displayed while giving the user a more accurate sense of the distance to their hands: if the user reaches out as if to touch a virtual object in the virtual space, the user touches the corresponding object in the real space.
In the processing of S6 in FIG. 6 described above, the image data generated for the left eye and the right eye may also be displayed as a reconstruction of the target space: meshes obtained by the same processing as S8 and S9, with textures extracted from the real-space image data pasted onto them, are used as virtual objects; these virtual objects are arranged in a virtual space corresponding to the target space to construct the virtual space, which is then rendered as an image viewed in the visual field direction from the position coordinates RC of the rendering camera determined in S4.
According to this example, the image representing the real space is also presented as seen from an arbitrary position, such as the position of the user's eyes, and the image of the virtual space is presented as a rendered image seen from the same position. Consequently, even when switching between the image of the real space and the image of the virtual space, the visual field does not substantially move at the moment of switching, which further reduces the sense of incongruity. Moreover, when switching from the real space to the virtual space, by performing the rendering and outputting the rendered image data while sequentially selecting the real-space objects to be replaced with virtual objects in the virtual space construction processing S11, it is also possible to stage a scene in which the objects in the real space appear to be replaced, little by little, with the virtual objects of the virtual space used in a game or the like.
[Modification]
In the description so far, the images of the real space around the user are obtained by the camera 43 provided on the display device 40 worn by the user, but the present embodiment is not limited to this. As another example of the present embodiment, they may be images captured by a camera arranged in the room where the user is located.
In the description so far, the position coordinates RC of the rendering camera represent the position corresponding to the position of the user's eyes and the visual field direction is the direction of the user's face, but the present embodiment is not limited to this. For example, the position coordinates RC of the rendering camera may be set to coordinates representing a position from which the virtual space is viewed from above (for example, the X-axis and Y-axis components are the user's position coordinates, and the Z-axis component is a predetermined value). The direction of the visual field may also be the direction behind the user (the direction opposite to the direction of the user's face). Furthermore, the range of the visual field may be changed to produce an effect as if zooming were performed. The range of the visual field (angle of view) may be made arbitrarily settable by the user's operation.
[Another example of acquiring information on the position and tilt of the user's head]
In the above description, the information on the position and tilt of the user's head is obtained from the head direction sensor 441 provided on the display device 40, but the present embodiment is not limited to this. For example, the user may be imaged by a camera arranged at a known position in the room where the user is located, and the position and tilt angle of the user's head may be detected by detecting the position and orientation of a predetermined point that moves with the user's head, for example a predetermined marker arranged in advance on the display device 40 worn by the user. Since techniques for detecting the position and tilt of such a marker based on image data capturing it are widely known, a detailed description is omitted here.
When the information on the position and tilt of the user's head is acquired by this method, the display device 40 does not necessarily need to be provided with the head direction sensor 441.
[Use of eye sensor]
Furthermore, in the description so far, the relative coordinates between the reference position and the positions of the user's eyes, such as the relative coordinates between the image sensors 430 and the user's eyes, are determined in advance, but the present embodiment is not limited to this. The display device 40 may be provided with an eye sensor 440 that detects the positions of the user's eyes, and the relative coordinates between the reference position and the positions of the user's eyes may be obtained by detecting the positions of the user's eyes with it.
Such an eye sensor 440 may be a visible-light camera or an infrared camera arranged at a known position relative to a predetermined position on the display device 40. The eye sensor 440 detects the positions of the user's inner eye corner, iris, cornea, and pupil (positions relative to the eye sensor 440), obtains their relative coordinate information from the known predetermined position on the display device 40 (for example, the position of the left image sensor 430L for the left eye and the position of the right image sensor 430R for the right eye), and outputs it as information representing the center position of the iris or pupil on the eyeball surface.
[Example using gaze information]
Furthermore, when the eye sensor 440 can detect vector information on the direction of the user's gaze (the orientation of the eyeballs), the gaze information may be used to change the parameters of the video displayed on the display device 40 (for example, distortion correction parameters).
[Depth of field]
This gaze information may also be used, at the time of rendering, to determine the distance to the virtual object lying in the direction of the gaze and to set the depth of field (here meaning rmin ≤ r ≤ rmax, which specifies the range of distances r that are in focus). As an example, after obtaining the gaze direction vectors VL and VR for each of the user's left and right eyes in the XYZ coordinate system of the target space, the image processing apparatus 10 obtains the distances rL and rR from the position coordinates RC of the rendering camera to the virtual objects lying in the directions of these vectors.
The image processing apparatus 10 then determines rmin (the lower limit distance) and rmax (the upper limit distance) specifying the in-focus distance range r, for example as follows. As an example, the image processing apparatus 10 obtains the arithmetic mean r of the distances rL and rR obtained above. If this r exceeds a predetermined threshold rth, rmin is set to "0" and rmax to infinity (that is, pan focus). If r does not exceed the predetermined threshold rth, the image processing apparatus 10 sets rmin = r − α and rmax = r + β. Here, α and β are positive values determined experimentally, and α = β is also possible.
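A minimal sketch of this rule follows, with the threshold rth and the margins α and β left as placeholder values, since the disclosure leaves them to experiment.

```python
# Hypothetical sketch of the rmin/rmax rule described above. R_TH, ALPHA, and BETA
# are placeholders, not values from the patent.
import math

R_TH = 5.0      # assumed threshold distance rth (metres)
ALPHA = 0.5     # assumed lower margin alpha
BETA = 0.5      # assumed upper margin beta

def depth_of_field(r_left: float, r_right: float) -> tuple[float, float]:
    """Return (rmin, rmax) for the gaze distances rL and rR of the left and right eyes."""
    r = (r_left + r_right) / 2.0            # arithmetic mean of the two gaze distances
    if r > R_TH:
        return 0.0, math.inf                # pan focus: everything in focus
    return r - ALPHA, r + BETA
```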
Of course, the examples of computing rmin and rmax shown here are merely examples, and other computation methods may be used as long as a depth of field that does not feel unnatural can be set experimentally according to the direction of the user's gaze. The image processing apparatus 10 processes the rendered image data (the image data of the stereoscopic image) using the information specifying the computed depth of field (rmin and rmax in this example).
 Specifically, the image generation unit 24 of the image processing apparatus 10 divides the generated stereoscopic image data (the left-eye and right-eye image data) into a plurality of image regions (for example, image blocks of a predetermined size). For each image region, the image generation unit 24 obtains the depth-map pixel value (the distance from the display device 40 worn by the user) corresponding to a pixel in that region. Based on this depth-map value and the information on the position of the user's eyes, the image generation unit 24 obtains the distance D from the user's eyes to the object imaged at that pixel (for example, the pixel at the center of the image region). If rmin ≦ D ≦ rmax, the image generation unit 24 does nothing to that image region.
 If the distance D obtained here satisfies D < rmin, the image generation unit 24 applies a blurring process (for example, a Gaussian filter) to the pixels of that image region with a strength corresponding to the magnitude of rmin - D, so that the region appears out of focus.
 Similarly, if the distance D satisfies D > rmax, the image generation unit 24 applies a blurring process (again, for example, a Gaussian filter) to the pixels of that image region with a strength corresponding to the magnitude of D - rmax, so that the region appears out of focus.
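 A condensed sketch of the per-block procedure of the last three paragraphs. It assumes the depth map has already been converted to distances D from the user's eyes; the block size, the blur-strength scaling, and the use of OpenCV are assumptions, and blurring each block independently is a simplification of a real post-process.

```python
import numpy as np
import cv2

def apply_depth_of_field(image, depth, rmin, rmax, block=16):
    """Blur image blocks whose distance D lies outside [rmin, rmax]."""
    out = image.copy()
    h, w = depth.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            # Distance D at (roughly) the centre pixel of this image region.
            d = float(depth[min(y + block // 2, h - 1), min(x + block // 2, w - 1)])
            if rmin <= d <= rmax:
                continue                          # in focus: leave the block as-is
            excess = (rmin - d) if d < rmin else (d - rmax)
            sigma = 0.5 + excess                  # blur strength grows with the excess
            roi = np.ascontiguousarray(image[y:y + block, x:x + block])
            out[y:y + block, x:x + block] = cv2.GaussianBlur(roi, (0, 0), sigma)
    return out
```

Note that in the pan-focus case (rmin = 0, rmax = ∞) every block falls inside the in-focus range and the image is left untouched, matching the behaviour described above.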
 In this way, based on the information specifying the depth of field, a stereoscopic image having the specified depth of field can be generated both for a view-converted image based on the acquired real-space image and for a virtual-space image obtained by rendering.
 According to this example of the present embodiment, image portions outside the distance range the user is gazing at appear blurred, so a more natural image is presented to the user.
 DESCRIPTION OF SYMBOLS: 10 image processing apparatus, 11 control unit, 12 storage unit, 13 interface unit, 20 operation device, 21 image acquisition unit, 23 visual-field determination processing unit, 24 image generation unit, 25 output unit, 30 relay device, 40 display device, 41 video display element, 42 optical element, 43 camera, 44 sensor, 45 communication interface, 430 imaging element, 440 eye sensor, 441 head direction sensor.

Claims (6)

  1.  An image processing apparatus connected to a display device that a user wears on the head for use, the image processing apparatus comprising:
     image acquisition means for acquiring an image of the real space around the user;
     determination means for determining visual-field information; and
     image generation means for generating, based on the acquired image of the real space, an image of the visual field specified by the determined information,
     wherein the image processing apparatus outputs the image generated by the image generation means to the display device.
  2.  The image processing apparatus according to claim 1, wherein
     the visual-field information includes information representing the position of the eyes of the user wearing the display device.
  3.  The image processing apparatus according to claim 1 or 2, wherein
     the visual-field information includes information on the orientation of the face of the user wearing the display device.
  4.  The image processing apparatus according to any one of claims 1 to 3, wherein
     the image generation means detects objects arranged in the real space based on the acquired image of the real space, arranges, in a virtual space corresponding to the real space, a three-dimensional model corresponding to each detected object at a position in the virtual space corresponding to the position of that object in the real space, and generates the image of the visual field.
  5.  The image processing apparatus according to any one of claims 1 to 4, wherein
     the image generation means detects objects arranged in the real space based on the acquired image of the real space, arranges, in a virtual space corresponding to the real space, a three-dimensional model corresponding to each detected object at a position in the virtual space corresponding to the position of that object in the real space, and generates the image of the visual field, and
     the image processing apparatus executes, in accordance with an instruction, either
     a process of outputting the acquired image of the real space, or
     a process of outputting the image generated by the image generation means.
  6.  A program for causing an image processing apparatus connected to a display device that a user wears on the head for use to function as:
     image acquisition means for acquiring an image of the real space around the user;
     determination means for determining visual-field information; and
     image generation means for generating, based on the acquired image of the real space, an image of the visual field specified by the determined information,
     and for causing the image processing apparatus to output the image generated by the image generation means to the display device.
PCT/JP2017/005742 2016-05-02 2017-02-16 Image processing device WO2017191703A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2018515396A JPWO2017191703A1 (en) 2016-05-02 2017-02-16 Image processing device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016-092561 2016-05-02
JP2016092561 2016-05-02

Publications (1)

Publication Number Publication Date
WO2017191703A1 true WO2017191703A1 (en) 2017-11-09

Family

ID=60203728

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/005742 WO2017191703A1 (en) 2016-05-02 2017-02-16 Image processing device

Country Status (2)

Country Link
JP (1) JPWO2017191703A1 (en)
WO (1) WO2017191703A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3631151B2 (en) * 2000-11-30 2005-03-23 キヤノン株式会社 Information processing apparatus, mixed reality presentation apparatus and method, and storage medium
JP2013218535A (en) * 2012-04-09 2013-10-24 Crescent Inc Method and device for displaying finger integrated into cg image in three-dimensionally modeled cg image and wide viewing angle head mount display device for displaying three-dimensionally modeled cg image
JP6294054B2 (en) * 2013-11-19 2018-03-14 株式会社Nttドコモ Video display device, video presentation method, and program

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004038470A (en) * 2002-07-02 2004-02-05 Canon Inc Augmented reality system and information processing method
JP2005327204A (en) * 2004-05-17 2005-11-24 Canon Inc Image composing system, method, and device
JP2009271732A (en) * 2008-05-07 2009-11-19 Sony Corp Device and method for presenting information, imaging apparatus, and computer program
JP2014511512A (en) * 2010-12-17 2014-05-15 マイクロソフト コーポレーション Optimal focus area for augmented reality display
JP2012133471A (en) * 2010-12-20 2012-07-12 Kokusai Kogyo Co Ltd Image composer, image composition program and image composition system
WO2016048658A1 (en) * 2014-09-25 2016-03-31 Pcms Holdings, Inc. System and method for automated visual content creation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TOSHIKAZU OSHIMA ET AL.: "RV-Border Guards : A Multi-player Mixed Reality Entertainment", TRANSACTIONS OF THE VIRTUAL REALITY SOCIETY OF JAPAN, vol. 4, no. 4, 31 December 1999 (1999-12-31), pages 699 - 705, XP055410021, ISSN: 1344-011X *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021200270A1 (en) * 2020-03-31 2021-10-07 ソニーグループ株式会社 Information processing device and information processing method
WO2024004321A1 (en) * 2022-06-28 2024-01-04 キヤノン株式会社 Image processing device, head-mounted display device, control method for image processing device, control method for head-mounted display device, and program

Also Published As

Publication number Publication date
JPWO2017191703A1 (en) 2018-10-04

Similar Documents

Publication Publication Date Title
JP5996814B1 (en) Method and program for providing image of virtual space to head mounted display
JP6933727B2 (en) Image processing equipment, image processing methods, and programs
CN107209950B (en) Automatic generation of virtual material from real world material
CN110022470B (en) Method and system for training object detection algorithm using composite image and storage medium
US11184597B2 (en) Information processing device, image generation method, and head-mounted display
US9106906B2 (en) Image generation system, image generation method, and information storage medium
US10607398B2 (en) Display control method and system for executing the display control method
JP6899875B2 (en) Information processing device, video display system, information processing device control method, and program
TW201831947A (en) Helmet mounted display, visual field calibration method thereof, and mixed reality display system
US20230156176A1 (en) Head mounted display apparatus
JP6682624B2 (en) Image processing device
JP6649010B2 (en) Information processing device
WO2017191703A1 (en) Image processing device
JP6687751B2 (en) Image display system, image display device, control method thereof, and program
JP6591667B2 (en) Image processing system, image processing apparatus, and program
JP2019102828A (en) Image processing system, image processing method, and image processing program
JP5037713B1 (en) Stereoscopic image display apparatus and stereoscopic image display method
JP6613099B2 (en) Program, computer and head-mounted display system for stereoscopic display of virtual reality space
WO2017163649A1 (en) Image processing device
WO2018173206A1 (en) Information processing device
JP2017142769A (en) Method and program for providing head-mounted display with virtual space image

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018515396

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17792636

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17792636

Country of ref document: EP

Kind code of ref document: A1