WO2017191703A1 - Image processing device - Google Patents

Image processing device

Info

Publication number
WO2017191703A1
WO2017191703A1, PCT/JP2017/005742, JP2017005742W
Authority
WO
WIPO (PCT)
Prior art keywords
image
user
image processing
real space
processing apparatus
Prior art date
Application number
PCT/JP2017/005742
Other languages
French (fr)
Japanese (ja)
Inventor
良徳 大橋 (Yoshinori Ohashi)
Original Assignee
株式会社ソニー・インタラクティブエンタテインメント (Sony Interactive Entertainment Inc.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社ソニー・インタラクティブエンタテインメント (Sony Interactive Entertainment Inc.)
Priority to JP2018515396A (published as JPWO2017191703A1)
Publication of WO2017191703A1

Classifications

    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36 Control arrangements or circuits characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • G09G5/38 Control arrangements or circuits characterised by the display of a graphic pattern, with means for controlling the display position

Definitions

  • the present invention relates to an image processing apparatus that is connected to a display device that a user wears on the head.
  • Such a display device causes the user to view the image by forming an image in front of the user's eyes.
  • Such display devices include a non-transmissive type, in which a display unit covers the user's eyes and prevents the user from seeing the real space in front of them, and a transmissive (optical see-through) type, in which the display unit is built from a half mirror or the like so that the user can still see the real space in front of them.
  • Even with a non-transmissive display device, the real space in front of the user's eyes can be photographed by a camera and displayed on the display unit, so that the user can see the real space in front of them much as with a transmissive device; devices that realize such a pseudo-transmissive display are said to use a camera see-through method.
  • The present invention has been made in view of the above circumstances, and one of its objects is to provide an image processing apparatus capable of displaying an image while giving the user a more accurate sense of the distance to his or her hands.
  • The present invention, which solves the problems of the conventional example, is an image processing apparatus connected to a display device that a user wears on the head when in use, comprising: image acquisition means for acquiring an image of the real space around the user; determination means for determining visual-field information; and image generation means for generating, based on the acquired real-space image, an image of the visual field specified by the determined information, wherein the image generated by the image generation means is output to the display device.
  • FIG. 1 is a configuration block diagram illustrating an example of an image processing system including an image processing apparatus according to an embodiment of the present invention. FIG. 2 is a functional block diagram illustrating an example of the image processing apparatus according to the embodiment. FIG. 3 is an explanatory diagram illustrating an example of the head tilt information used by the image processing apparatus according to the embodiment. FIG. 4 is an explanatory diagram outlining the object buffer generated by the image processing apparatus according to the embodiment. FIG. 5 is an explanatory diagram illustrating a projection image of the object buffer generated by the image processing apparatus according to the embodiment. FIG. 6 is a flowchart illustrating an operation example of the image processing apparatus according to the embodiment. FIG. 7 is an explanatory diagram illustrating an operation example of the image processing apparatus according to the embodiment.
  • As illustrated in FIG. 1, an image processing system 1 including an image processing apparatus 10 according to an embodiment of the present invention is configured to include the image processing apparatus 10, an operation device 20, a relay device 30, and a display device 40.
  • the image processing apparatus 10 is an apparatus that supplies an image to be displayed by the display device 40.
  • the image processing apparatus 10 is a consumer game machine, a portable game machine, a personal computer, a smartphone, a tablet, or the like.
  • the image processing apparatus 10 includes a control unit 11, a storage unit 12, and an interface unit 13.
  • The control unit 11 is a program control device such as a CPU, and executes the program stored in the storage unit 12. In the present embodiment, the control unit 11 acquires an image of the real space around the user wearing the display device 40, and generates an image of a designated visual field based on the acquired image of the real space.
  • Specifically, in one example of the present embodiment, the control unit 11 constructs a virtual three-dimensional space (hereinafter referred to as the virtual space) corresponding to a real space of a predetermined size around the user, centered on the user's position and including the area behind the user (hereinafter referred to as the target space): for example, a rectangular cuboid 10 m wide (in the direction orthogonal to the user's line of sight at the time the display device 40 is put on, i.e. the initial line-of-sight direction, and parallel to the floor), 10 m deep (along the initial line-of-sight direction parallel to the floor), and 3 m high.
  • That is, the control unit 11 places virtual three-dimensional objects in this virtual space, or applies video effects, while referring to the image of the real space. The control unit 11 also obtains information on the visual field used for rendering an image of this virtual space (either separate information for a visual field corresponding to the user's left eye and one corresponding to the right eye, or a single common visual field), generates an image of the virtual space as seen from the visual field specified by that information (a stereoscopic image when separate left-eye and right-eye visual fields are used), and outputs the generated image to the display device 40. The detailed operation of the control unit 11 will be described later.
  • the storage unit 12 includes at least one memory device such as a RAM and stores a program executed by the control unit 11.
  • the storage unit 12 also operates as a work memory for the control unit 11 and stores data used by the control unit 11 in the course of program execution.
  • the program may be provided by being stored in a computer-readable non-transitory recording medium and stored in the storage unit 12.
  • the interface unit 13 is an interface for the control unit 11 of the image processing apparatus 10 to perform data communication with the operation device 20 and the relay device 30.
  • the image processing apparatus 10 is connected to the operation device 20, the relay apparatus 30, or the like via the interface unit 13 by either wired or wireless.
  • As an example, the interface unit 13 may include a multimedia interface such as HDMI (High-Definition Multimedia Interface) for transmitting the image (stereoscopic image) and audio supplied by the image processing apparatus 10 to the relay device 30. It may also include a data communication interface such as USB for receiving various information from the display device 40 via the relay device 30 and for transmitting control signals and the like. The interface unit 13 may further include a data communication interface such as USB for receiving signals indicating the content of the user's operation input on the operation device 20.
  • the operation device 20 is a controller or the like of a consumer game machine, and is used by the user to perform various instruction operations on the image processing apparatus 10.
  • the content of the user's operation input to the operation device 20 is transmitted to the image processing apparatus 10 by either wired or wireless.
  • the operation device 20 is not necessarily separate from the image processing apparatus 10, and may include operation buttons, a touch panel, and the like arranged on the surface of the image processing apparatus 10.
  • The relay device 30 is connected to the display device 40 either by wire or wirelessly; it receives the data of the image (stereoscopic image) supplied from the image processing apparatus 10 and outputs a video signal corresponding to the received data to the display device 40. At this time, the relay device 30 may, as necessary, execute processing such as correcting the distortion caused by the optical system of the display device 40 on the video represented by the supplied image, and output a video signal representing the corrected video. If the image supplied from the image processing apparatus 10 is a stereoscopic image, the video signal supplied from the relay device 30 to the display device 40 includes two video signals, a left-eye video signal and a right-eye video signal, generated based on the stereoscopic image. Besides the stereoscopic image and the video signal, the relay device 30 also relays various information transmitted and received between the image processing apparatus 10 and the display device 40, such as audio data and control signals.
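  • The embodiment does not specify how the relay device 30 corrects the distortion of the display optics; a common approach for head-mounted displays is to pre-distort each eye image with a simple radial model so that the lens cancels it. The sketch below is only an illustration of that idea, with made-up coefficients; it is not the correction the patent performs.

```python
import cv2
import numpy as np

def predistort(eye_image, k1=0.22, k2=0.24):
    """Pre-distort one eye image with a radial model r' = r * (1 + k1*r^2 + k2*r^4).

    k1 and k2 are illustrative lens coefficients, not values from the patent.
    """
    h, w = eye_image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    # Normalised coordinates with the optical centre at the image centre.
    nx, ny = (xs - w / 2) / (w / 2), (ys - h / 2) / (h / 2)
    r2 = nx * nx + ny * ny
    scale = 1 + k1 * r2 + k2 * r2 * r2
    # Sample the source image at the radially scaled positions.
    map_x = (nx * scale * (w / 2) + w / 2).astype(np.float32)
    map_y = (ny * scale * (h / 2) + h / 2).astype(np.float32)
    return cv2.remap(eye_image, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```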
  • The display device 40 is a display device that the user wears on the head when in use, and it displays video corresponding to the video signal input from the relay device 30 so that the user can view it. In the present embodiment, it is assumed that the display device 40 displays a video corresponding to each eye in front of each of the user's right eye and left eye. As shown in FIG. 1, the display device 40 includes a video display element 41, an optical element 42, a camera 43, a sensor unit 44, and a communication interface 45.
  • the video display element 41 is an organic EL display panel, a liquid crystal display panel, or the like, and displays a video corresponding to a video signal supplied from the relay device 30.
  • The video display element 41 may be a single display element that displays the left-eye video and the right-eye video side by side, or it may comprise a pair of display elements that display the left-eye video and the right-eye video independently.
  • The display screen of a smartphone or the like may also be used directly as the video display element 41; in that case, the smartphone or the like displays video corresponding to the video signal supplied from the relay device 30.
  • the display device 40 may be a retinal irradiation type (retinal projection type) device that directly projects an image on a user's retina.
  • the image display element 41 may be configured by a laser that emits light and a MEMS (Micro Electro Mechanical Systems) mirror that scans the light.
  • the optical element 42 is a hologram, a prism, a half mirror, or the like, and is disposed in front of the user's eyes.
  • the optical element 42 transmits or refracts the image light displayed by the image display element 41 to enter the user's eyes.
  • The optical element 42 may include a left-eye optical element 42L and a right-eye optical element 42R; in that case, the left-eye video displayed by the video display element 41 may be made to enter the user's left eye via the left-eye optical element 42L, and the right-eye video to enter the user's right eye via the right-eye optical element 42R.
  • the user can view the left-eye video with the left eye and the right-eye video with the right eye while the display device 40 is mounted on the head.
  • the display device 40 is assumed to be a non-transmissive display device in which the user cannot visually recognize the appearance of the outside world.
  • The camera 43 has a pair of image sensors 430L and 430R arranged slightly toward the front (the user's front side) of the display device 40, one to the left and one to the right of its center (hereinafter collectively referred to as the image sensors 430 when there is no need to distinguish left from right).
  • the camera 43 may include at least one image sensor 430B disposed on the back side of the user.
  • The camera 43 captures, at least with the image sensors 430L and 430R, an image of the real space in front of the user, and outputs the image data obtained by the capture to the image processing apparatus 10 via the relay device 30.
  • The sensor unit 44 includes a head direction sensor 441 that detects the direction of the user's head (the front direction of the user's face) and its position. The head direction sensor 441 is a gyro sensor or the like, and detects and outputs, relative to the initial direction when the display device 40 is put on, the rotation angle of the head within a plane parallel to the floor, the rotation angle in the elevation direction, and the rotation angle about the line-of-sight axis.
  • The head direction sensor 441 also uses a predetermined position of the display device 40 (for example, the midpoint of the line segment connecting the image sensors 430L and 430R of the camera 43) as a reference position, and detects and outputs the amount of movement (x, y, z) of that reference position since the device was put on, along the user's left-right direction (the axis where the transverse and coronal planes intersect, hereinafter the X axis), front-rear direction (the axis where the sagittal and transverse planes intersect, hereinafter the Y axis), and vertical direction (the Z axis). The relative coordinates of each image sensor 430 with respect to this reference position, taken as the origin, are assumed to be known.
  • the communication interface 45 is an interface for communicating data such as video signals and image data with the relay device 30.
  • the communication interface 45 includes a communication antenna and a communication module.
  • By executing the program stored in the storage unit 12, the control unit 11 functionally realizes an image acquisition unit 21, a visual field determination processing unit 23, an image generation unit 24, and an output unit 25.
  • the image acquisition unit 21 acquires an image of the real space around the user wearing the display device 40. Specifically, the image acquisition unit 21 receives image data captured by the camera 43 from the display device 40 via the relay device 30.
  • The image data captured by the camera 43 is a pair of images captured by the pair of image sensors 430 arranged on the left and right, and from the parallax between the two images the distance to each object captured in the real space can be determined. Based on this image data, the image acquisition unit 21 generates and outputs a depth map of the same size as the image data (hereinafter called captured image data, to distinguish it from the depth map). The depth map is image data in which the value of each pixel is information indicating the distance to the object imaged at the corresponding pixel of the captured image data.
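  • The embodiment does not describe how the depth map is computed from the pair of captured images; as one illustrative possibility, a rectified stereo pair can be block-matched and the resulting disparity converted to metric depth. The sketch below uses OpenCV's StereoBM; the focal length and baseline are assumed example values, not figures from the patent.

```python
import cv2
import numpy as np

def depth_map_from_stereo(left_bgr, right_bgr, focal_px=700.0, baseline_m=0.064):
    """Estimate a depth map (metres) from a rectified stereo pair.

    focal_px and baseline_m are illustrative values for the HMD camera,
    not figures taken from the patent.
    """
    left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)

    # Block matching gives disparity in 1/16 pixel units (int16).
    matcher = cv2.StereoBM_create(numDisparities=96, blockSize=15)
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0

    depth = np.full(disparity.shape, np.inf, dtype=np.float32)
    valid = disparity > 0
    # depth = f * B / d for a rectified pair.
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth  # same resolution as the captured image data
```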
  • the visual field determination processing unit 23 determines visual field information for rendering the virtual space.
  • As an example, the visual field determination processing unit 23 obtains, as visual field information, the position coordinates RC of a camera used for rendering in the virtual space (hereinafter referred to as the rendering camera; the image is rendered as seen from this position), which are determined in advance (for example, hard-coded in a program or described in a setting file) independently of the position of the image sensors 430 of the camera 43, together with information representing the direction of the visual field (for example, a vector starting at the position coordinates RC and passing through the center of the visual field).
  • Alternatively, the visual field determination processing unit 23 may obtain the position coordinates RC of the rendering camera in the virtual space as relative coordinates from a reference position in the real space that changes over time with the user's movement. For example, the reference position may be the position of an image sensor 430, and the position in the virtual space corresponding to the position displaced from that image sensor by a predetermined relative coordinate value may be used as the position coordinates RC of the rendering camera. The relative coordinates may be, for example, those from the position of the image sensor 430R (or 430L) to the position where the right eye (or left eye) of the user wearing the display device 40 should be; in that case, the position in the virtual space corresponding to the position of the user's eye becomes the position coordinates RC of the rendering camera.
  • The visual field determination processing unit 23 may obtain the information representing the direction of the visual field (for example, a vector starting at the position coordinates RC and passing through the center of the visual field) from the information output by the head direction sensor 441. In this case, as illustrated in FIG. 3, the visual field determination processing unit 23 acquires from the head direction sensor 441 the rotation angle of the head within a plane parallel to the floor (the yaw angle), measured from the initial direction when the display device 40 was put on, the rotation angle in the elevation direction, the rotation angle about the line-of-sight axis (the roll angle), and the movement amount (x, y, z) of the head. The visual field determination processing unit 23 then determines the direction of the user's visual field from the yaw and elevation angles, and determines the tilt of the user's neck about the visual-field direction from the roll angle.
  • In one example of the present embodiment, the visual field determination processing unit 23 uses, as the position coordinates RC of the rendering camera, the coordinates in the virtual space corresponding to the positions of the user's left and right eyes in the real space. That is, it obtains the coordinates of the left image sensor 430L and the right image sensor 430R in the target space (from the information on the amount of movement of the user's head and the known relative positions from the reference position to each image sensor 430), then obtains, from the predetermined relative coordinates of each eye with respect to the corresponding image sensor 430, the coordinates of the user's left and right eyes in the XYZ coordinate system of the target space, and outputs the coordinate information in the virtual space corresponding to those eye positions to the image generation unit 24 as the position coordinates RC of the rendering camera.
  • The visual field determination processing unit 23 also outputs to the image generation unit 24 the information on the direction of the user's visual field determined from the yaw and elevation angles, and the information on the tilt of the user's neck about the visual-field direction determined from the roll angle.
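  • A minimal sketch of how the position coordinates RC and the visual-field direction might be computed from the head direction sensor's output (yaw, elevation and roll angles plus the head movement (x, y, z)), under an assumed axis convention. The sensor and eye offsets are hypothetical example values; the patent only states that such relative coordinates are predetermined.

```python
import numpy as np

def head_rotation(yaw, pitch, roll):
    """Yaw about the vertical Z axis, pitch (elevation) about the X axis and
    roll about the line-of-sight Y axis (one possible convention)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return Rz @ Rx @ Ry

def rendering_camera(reference_pos, yaw, pitch, roll, sensor_offset, eye_offset):
    """Return (RC, view direction) for one eye.

    reference_pos: reference position of the display device in target-space
                   coordinates (its initial position plus the movement (x, y, z)).
    sensor_offset: image sensor 430L/430R relative to the reference position.
    eye_offset:    eye position relative to that image sensor (predetermined).
    The offsets used in the example call are hypothetical, not patent figures.
    """
    rot = head_rotation(yaw, pitch, roll)
    rc = np.asarray(reference_pos, dtype=float) + rot @ (
        np.asarray(sensor_offset, dtype=float) + np.asarray(eye_offset, dtype=float))
    view_dir = rot @ np.array([0.0, 1.0, 0.0])   # the initial gaze is along +Y
    return rc, view_dir

# Example for the left eye with made-up offsets (metres).
rc_left, gaze = rendering_camera((0.0, 0.0, 1.6), yaw=0.2, pitch=-0.1, roll=0.0,
                                 sensor_offset=(-0.03, 0.02, 0.0),
                                 eye_offset=(0.0, -0.02, 0.0))
```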
  • The image generation unit 24 receives the information on the position coordinates RC of the rendering camera and the direction of the visual field from the visual field determination processing unit 23, and generates image data of the virtual space as seen from the visual field specified by that information.
  • the image generation unit 24 first generates environment mesh list information and an object buffer based on the depth map information output from the image acquisition unit 21.
  • The environment mesh list information is obtained by dividing the depth map into meshes, and includes, for each mesh, the vertex coordinates of the mesh (information indicating pixel positions), mesh identification information, information on the normal of the object imaged at the pixels of the captured image data corresponding to the pixels in the mesh, mesh type information (information indicating which of several predetermined types the mesh belongs to), and information on the surface shape of the mesh. The mesh type information indicates whether the object imaged at the corresponding pixels of the captured image data is a ceiling, a wall, an obstacle (an object other than a wall within a predetermined height from the floor), or something else. The information on the surface shape of the mesh indicates whether the surface is a plane, an uneven surface, a spherical surface, or a complex surface.
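  • For concreteness, the environment mesh list can be pictured as a list of per-mesh records such as the following; the type names and field names are illustrative, not identifiers from the patent.

```python
from dataclasses import dataclass
from enum import Enum, auto

class MeshType(Enum):
    CEILING = auto()
    WALL = auto()
    OBSTACLE = auto()   # a non-wall object within a predetermined height of the floor
    OTHER = auto()

class SurfaceShape(Enum):
    PLANE = auto()
    UNEVEN = auto()
    SPHERICAL = auto()
    COMPLEX = auto()

@dataclass
class EnvironmentMesh:
    mesh_id: int
    vertices: list            # pixel positions of the mesh vertices on the depth map
    normal: tuple             # normal of the object imaged at the corresponding pixels
    mesh_type: MeshType
    surface_shape: SurfaceShape

environment_mesh_list = []    # the environment mesh list is a list of such records
```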
  • The object buffer represents, as a virtual voxel space, a real space of a predetermined size that includes the user's position and extends behind the user's line of sight (hereinafter referred to as the target space): for example, a rectangular cuboid 10 m wide (in the direction orthogonal to the initial line-of-sight direction and parallel to the floor), 10 m deep (along the initial line-of-sight direction parallel to the floor), and 3 m high. Each voxel is a virtual volume element, for example a cube 10 cm wide, 10 cm deep, and 10 cm high. The value of a voxel in which an object exists (the voxel value) is set to “1”, the value of a voxel in which nothing exists is set to “0”, and the value of a voxel for which it is unknown whether an object exists is set to “-1” (FIG. 4). FIG. 4 shows only some of the voxels for convenience of illustration, and their size is also chosen for explanation; the voxel size relative to the target space does not necessarily represent a size suitable for implementation.
  • In the example of FIG. 4, a cubic object is placed in the far corner of the target space: the voxels corresponding to its surface are set to “1”, indicating that an object exists; the voxels hidden behind that surface are set to “-1”, indicating that their state is unknown; and the voxels between the camera and the object surface are set to “0”, indicating that nothing is there.
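  • A sketch of the object buffer as a dense voxel grid covering the 10 m × 10 m × 3 m target space with 10 cm voxels, initialised to the “unknown” value -1. The helper that maps a target-space point to a voxel index assumes a particular placement of the origin, which the patent does not specify.

```python
import numpy as np

VOXEL = 0.10                      # voxel edge length in metres
EXTENT = (10.0, 10.0, 3.0)        # width, depth and height of the target space
GRID = tuple(int(round(e / VOXEL)) for e in EXTENT)   # (100, 100, 30)

UNKNOWN, EMPTY, OCCUPIED = -1, 0, 1

# Every voxel starts as "unknown whether an object exists here".
object_buffer = np.full(GRID, UNKNOWN, dtype=np.int8)

def voxel_index(point_xyz, origin=(-5.0, -5.0, 0.0)):
    """Map a target-space point (metres) to a voxel index, assuming the user
    starts at the centre of the horizontal extent of the target space."""
    idx = tuple(int((p - o) // VOXEL) for p, o in zip(point_xyz, origin))
    if all(0 <= i < n for i, n in zip(idx, GRID)):
        return idx
    return None
```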
  • the image generation unit 24 sets this voxel value based on the depth map information.
  • Each pixel on the depth map corresponds to a small angular region obtained by dividing the angle of view of the depth map by its resolution (py pixels vertically by px pixels horizontally), with its apex at the position of the camera 43 at the time the image data underlying the depth map was captured (this may be the reference position; hereinafter called the shooting position). Accordingly, from the coordinates of the shooting position, the angle of view of the depth map, and its resolution, a vector giving the direction of each pixel can be calculated, either as a vector parallel to a line passing through a vertex of the pixel starting from the shooting position (a difference of coordinates in the world coordinate system) or as a vector parallel to a line passing through the center of the pixel starting from the shooting position.
  • For each pixel on the depth map, the image generation unit 24 sets to “1” the voxel that lies in the direction of that pixel from the coordinates in the object buffer corresponding to the shooting position (which may be the coordinates of the reference position), at the distance to the object represented by the depth map, and sets to “0” every other voxel on the line from that voxel back to the camera 43.
  • For portions that are hidden by objects in the real space and therefore not captured in the image data picked up by the camera 43 (for example, the space behind a desk, a wall, or the floor), the image generation unit 24 sets the corresponding voxel values to “-1”, because it is unknown whether an object exists there.
  • When the user moves or changes the direction of the line of sight, so that the camera 43 captures image data of a portion that has not been captured before and for which it was unknown whether an object exists (a portion corresponding to voxels whose value was “-1”), the image generation unit 24 obtains the depth map of that portion and updates the values of the corresponding voxels to “0” or “1” based on the obtained depth map.
  • Various methods other than the one described here, widely known as 3D scanning methods, can be employed to set voxel values in a three-dimensional space representing the extent of objects from information such as a depth map.
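  • One way to realise the voxel update described above is to walk, for each depth-map pixel, from the shooting position toward the observed surface, clearing the voxels passed through and marking the voxel at the measured distance as occupied. The step size and the precomputed per-pixel direction vectors are implementation assumptions.

```python
import numpy as np

def update_object_buffer(buffer, depth_map, shoot_pos, pixel_dirs,
                         voxel=0.10, origin=(-5.0, -5.0, 0.0)):
    """depth_map: (py, px) distances in metres; pixel_dirs: (py, px, 3) unit
    direction vectors of each pixel as seen from the shooting position."""
    def to_index(point):
        idx = tuple(int((c - o) // voxel) for c, o in zip(point, origin))
        if all(0 <= i < n for i, n in zip(idx, buffer.shape)):
            return idx
        return None

    py, px = depth_map.shape
    shoot_pos = np.asarray(shoot_pos, dtype=float)
    for v in range(py):
        for u in range(px):
            d = depth_map[v, u]
            if not np.isfinite(d):
                continue
            direction = pixel_dirs[v, u]
            # March in half-voxel steps and clear the free space up to the surface.
            for t in np.arange(0.0, d, voxel / 2.0):
                idx = to_index(shoot_pos + t * direction)
                if idx is not None:
                    buffer[idx] = 0          # nothing exists between camera and surface
            hit = to_index(shoot_pos + d * direction)
            if hit is not None:
                buffer[hit] = 1              # the observed object surface
    # Voxels never reached by any ray keep their value of -1 (unknown).
```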
  • The image generation unit 24 then generates a projection image (FIG. 5) by two-dimensionally projecting the voxels in the object buffer whose value is not “0”, as seen from the rendering camera position coordinates RC and in the visual-field direction specified by the information input from the visual field determination processing unit 23. In this way the image generation unit 24 detects the objects arranged in the real space based on the acquired real-space image, and generates and outputs a stereoscopic image in which a virtual object is placed at the position of each detected object.
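  • The projection image of FIG. 5 can be approximated by perspective-projecting the centre of every voxel whose value is not “0” onto the rendering camera's image plane, keeping the nearest voxel per pixel. The pinhole model, focal length and image size below are illustrative assumptions.

```python
import numpy as np

def project_voxels(buffer, rc, view_rot, voxel=0.10, origin=(-5.0, -5.0, 0.0),
                   focal_px=400.0, size=(480, 640)):
    """Return an image whose pixels carry the projected voxel values (1 or -1).

    view_rot: 3x3 matrix whose rows are the rendering camera's right, up and
    forward axes, derived from the visual-field direction.
    """
    h, w = size
    image = np.zeros(size, dtype=np.int8)
    nearest = np.full(size, np.inf)
    occupied = np.argwhere(buffer != 0)                   # voxels that are 1 or -1
    centres = (occupied + 0.5) * voxel + np.asarray(origin)
    cam = (centres - np.asarray(rc, dtype=float)) @ np.asarray(view_rot).T
    for (ix, iy, iz), (x, y, z) in zip(occupied, cam):
        if z <= 0:                                        # behind the rendering camera
            continue
        u = int(w / 2 + focal_px * x / z)
        v = int(h / 2 - focal_px * y / z)
        if 0 <= u < w and 0 <= v < h and z < nearest[v, u]:
            nearest[v, u] = z                             # keep the closest voxel per pixel
            image[v, u] = buffer[ix, iy, iz]
    return image
```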
  • the image generation unit 24 configures a virtual space according to a predetermined rule (referred to as an object conversion rule).
  • The object conversion rules are, for example, as follows, applied where the voxel value is “1” (a lookup sketch in code follows this list):
  • if the mesh type of the corresponding portion is “ceiling”, a background is composited (so that the scene appears to have no ceiling);
  • a virtual object “operation panel” is placed at the position of an object whose mesh type is “obstacle” and whose mesh surface shape is a plane;
  • a virtual object “rock” or “box” is placed at the position of an object whose mesh type is “obstacle” and whose mesh surface shape is an uneven surface;
  • a virtual object “light” is placed at the position of an object whose mesh type is “obstacle” and whose mesh surface shape is spherical;
  • virtual objects “trees and plants” are arranged over the range of an object whose mesh type is “obstacle” and whose mesh surface shape is a complex surface.
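  • The object conversion rules above reduce to a small lookup from mesh type and surface shape to the virtual object to be placed, as in the following sketch (the string labels are illustrative).

```python
def convert_object(mesh_type, surface_shape):
    """Return the virtual object to place for a region of voxels with value 1,
    following the object conversion rules of the embodiment."""
    if mesh_type == "ceiling":
        return "background"            # composite the background so no ceiling is shown
    if mesh_type == "obstacle":
        return {
            "plane": "operation panel",
            "uneven": "rock or box",
            "spherical": "light",
            "complex": "trees and plants",
        }.get(surface_shape)
    return None                        # other mesh types are left unchanged here
```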
  • The image generation unit 24 separately receives input of a background image and of three-dimensional model data for virtual objects such as the operation panel, rocks, boxes, trees, and plants. The image generation unit 24 places the virtual object “operation panel” at the position corresponding to the range of the projection image illustrated in FIG. 5 onto which voxels with value “1” are projected and whose mesh is classified, by the object conversion rules, as “operation panel”; likewise it places a virtual object “rock” or “box” at the position corresponding to the range onto which voxels with value “1” are projected and whose mesh is classified as “rock” or “box”; and it arranges each of the other virtual objects in the same manner. Since the process of arranging a virtual object represented by three-dimensional model data in a virtual space is widely known in three-dimensional graphics creation, detailed description is omitted here.
  • The image generation unit 24 renders, in the virtual space configured in this way, the images viewed from the rendering camera position coordinates RC input from the visual field determination processing unit 23 (here, the coordinates corresponding to the user's left eye and right eye) in the visual-field direction specified by the information input from the visual field determination processing unit 23, and outputs the pair of image data obtained by the rendering to the output unit 25 as a stereoscopic image.
  • the output unit 25 outputs the image data of the stereoscopic image generated by the image generation unit 24 to the display device 40 via the relay device 30.
  • At the time of rendering, the image generation unit 24 may also use the information on the roll angle related to the tilt of the user's neck to generate image data tilted by that angle.
  • When so instructed, the image generation unit 24 outputs, instead of the stereoscopic image of the virtual space, the real-space images captured by the image sensors 430L and 430R of the camera 43 as they are (a so-called camera see-through function). In this case, the image data captured by the left image sensor 430L and the right image sensor 430R of the camera 43 are output as they are, and the output unit 25 displays the images based on these image data directly as the left-eye image and the right-eye image.
  • the image processing apparatus 10 basically includes the above configuration and operates as follows.
  • The image processing apparatus 10 starts the processing illustrated in FIG. 6 and takes the reference position of the display device 40 (for example, the position of the center of gravity of the image sensors 430 of the camera 43) as the origin of the target space. The target space is represented by an object buffer in which the space is expressed as virtual voxels (virtual volume elements, for example cubes 10 cm wide, 10 cm deep, and 10 cm high); initially all voxel values are set to “-1”, and this object buffer is stored in the storage unit 12 (S2).
  • the display device 40 repeatedly captures an image with the camera 43 at predetermined timings (for example, every 1/1000 seconds), and sends image data obtained by the imaging to the image processing device 10.
  • the image processing apparatus 10 receives image data captured by the camera 43 from the display apparatus 40 via the relay apparatus 30 (S3).
  • the image processing apparatus 10 acquires information on the direction of the user's head (face direction) and the amount of movement (for example, expressed by the coordinate values in the XYZ space). Specifically, the information on the head direction and the movement amount of the user may be information detected by the head direction sensor 441 of the display device 40 and output to the image processing device 10.
  • The image processing apparatus 10 determines the position coordinates RC of the rendering camera and the visual-field direction (S4). Specifically, as illustrated in FIG. 7, the image processing apparatus 10 refers to the acquired information on the amount of movement of the head and obtains the coordinates, in the target space, of the left image sensor 430L and the right image sensor 430R of the camera 43. Then, using these coordinates and the predetermined relative coordinates between each image sensor 430L, 430R and the corresponding eye of the user (the left eye for the image sensor 430L, the right eye for the image sensor 430R), it obtains the coordinates of the user's left and right eyes in the XYZ coordinate system of the target space, and sets the coordinates in the virtual space corresponding to these coordinates as the position coordinates RC of the rendering camera. The XYZ coordinate system of the target space and the coordinate system of the virtual space may be made to coincide, in which case the coordinate values in the XYZ coordinate system of the target space may be used as the position coordinates RC of the rendering camera as they are.
  • the image processing apparatus 10 determines the viewing direction based on the acquired information on the direction of the head.
  • the image processing apparatus 10 receives from the operation device 20 an instruction on whether to display an image in the real space (operation as a so-called camera see-through) or an image in the virtual space from the user (S5). Then, the image processing apparatus 10 starts processing for displaying the instructed image.
  • If the user has instructed display of the real-space image (S5: “real space”), the image processing apparatus 10 displays an image using the real-space image data captured by the camera 43 and received from the display device 40 in process S3.
  • Specifically, the image processing apparatus 10 uses the image data input from the image sensor 430L of the camera 43 as it is as the image data for the left eye, and the image data input from the image sensor 430R as it is as the image data for the right eye, thereby generating the left-eye and right-eye image data (S6).
  • the image processing apparatus 10 outputs the image data for the left eye and the image data for the right eye generated here to the display apparatus 40 via the relay apparatus 30 (S7).
  • the display device 40 causes the left-eye image data to enter the left eye of the user via the left-eye optical element 42L as a left-eye image. Further, the display device 40 causes the image data for the right eye to enter the right eye of the user via the optical element for the right eye 42R as a video image for the right eye.
  • Thereby, the user visually recognizes an image of the real space captured by the camera 43 whose viewpoint has been converted to the position of the user's eyes. Thereafter, the image processing apparatus 10 returns to process S3 and repeats the processing.
  • If it is determined in process S5 that the user has instructed display of an image of the virtual space (S5: “virtual space”), the image processing apparatus 10 generates a depth map based on the image data captured by the camera 43 (S8).
  • The image processing apparatus 10 divides the generated depth map into meshes and determines, for each mesh, whether the object captured at the pixels of the captured image data corresponding to the pixels of the mesh is a ceiling, a wall, an obstacle (an object other than a wall within a predetermined height from the floor), or something else. Further, from the values of the depth-map pixels in each mesh, it determines whether the surface shape of the mesh is a plane, an uneven surface, a spherical surface, or a complex surface. The image processing apparatus 10 stores in the storage unit 12, as environment mesh list information associated with the generated depth map, information indicating the position of each mesh on the depth map (which may be the coordinates of the mesh vertices), the mesh type information, and the surface shape information (S9: generate environment mesh list information).
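  • The embodiment does not spell out how a mesh is labelled as ceiling, wall, obstacle or other; one plausible heuristic, sketched below, uses the orientation of the mesh normal and the mesh's height above the floor. All thresholds are assumptions for illustration.

```python
import numpy as np

def classify_mesh(normal, max_height, obstacle_limit=1.5, ceiling_height=2.3):
    """Guess the mesh type from its (unit) normal and its top height in metres.

    obstacle_limit stands in for the "predetermined height from the floor" of
    the embodiment; all concrete numbers here are illustrative only.
    """
    nz = abs(float(np.asarray(normal, dtype=float)[2]))   # vertical component
    if nz > 0.8:                                          # roughly horizontal surface
        if max_height >= ceiling_height:
            return "ceiling"
        if max_height <= 0.05:
            return "other"                                # e.g. the floor itself
        return "obstacle"                                 # e.g. a desk top
    if nz < 0.2 and max_height > obstacle_limit:
        return "wall"                                     # vertical and tall
    if max_height <= obstacle_limit:
        return "obstacle"                                 # non-wall object near the floor
    return "other"
```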
  • The image processing apparatus 10 then sequentially selects each pixel of the depth map and sets to “1” the voxel corresponding to the distance to the object represented by the selected pixel, in the direction corresponding to that pixel, while setting to “0” the other voxels on the line from that voxel back to the camera 43. For portions that are hidden by objects in the real space and therefore not captured in the image data picked up by the camera 43, it is unknown whether an object exists, so the corresponding voxel values remain “-1”.
  • When the user moves or changes the line-of-sight direction, so that the camera 43 captures image data of a portion that has not been captured before and for which it was unknown whether an object exists (a portion corresponding to voxels whose value is “-1”), the image processing apparatus 10 obtains the depth map of that portion and updates the values of the corresponding voxels to “0” or “1” based on the obtained depth map (S10: update the object buffer).
  • The image processing apparatus 10 then generates a projection image (FIG. 5) by two-dimensionally projecting, from the position coordinates of the rendering camera, the voxels in the object buffer whose value is not “0” within the visual field determined in process S4. In this way the image processing apparatus 10 detects objects arranged in the real space based on the acquired real-space image, and outputs a stereoscopic image in which the detected objects are replaced with virtual objects according to a predetermined rule (when game processing is performed, a rule defined in the program of the game).
  • Specifically, the image processing apparatus 10 accepts input of a background image and of three-dimensional model data such as the operation panel, rocks, boxes, trees, and plants. In a virtual three-dimensional space, it places the virtual object “operation panel” at the position corresponding to the range onto which voxels with value “1” are projected and whose mesh is classified, by the object conversion rules, as “operation panel”, and places a virtual object “rock” or “box” at the position corresponding to the range onto which voxels with value “1” are projected and whose mesh is classified as “rock” or “box”; arranging each virtual object in the virtual space in this way, it constructs the virtual space. For portions whose mesh type is “ceiling”, a background image is composited so that the virtual space appears to have no ceiling (S11: construct the virtual space).
  • In the virtual three-dimensional space in which the three-dimensional model data are arranged (a space whose coordinate system corresponds to the target space), the image processing apparatus 10 renders the image data (image data to be presented to the left eye and to the right eye, respectively) viewed in the visual-field direction determined in process S4 from the rendering camera position coordinates RC determined in process S4 (the coordinates corresponding to the user's left and right eyes) (S12: rendering). Then, the image processing apparatus 10 outputs the pair of image data thus rendered as a stereoscopic image (S13).
  • the display device 40 causes the image data for the left eye to enter the left eye of the user through the left-eye optical element 42L as an image for the left eye. Further, the display device 40 causes the image data for the right eye to enter the right eye of the user via the optical element for the right eye 42R as a video image for the right eye.
  • the user visually recognizes an image in a virtual space in which an object in the real space is changed to a virtual object (such as an operation panel, a rock, or a box). Thereafter, the image processing apparatus 10 returns to the process S3 and repeats the process.
  • Note that the processing for updating the object buffer may also be performed while the processes S6 to S7 for displaying an image of the real space are being executed.
  • According to the present embodiment, the rendering camera is placed at a position corresponding to the position of the user's eyes, and the image presented to the user is rendered from there (image data in which objects in the real space are replaced with virtual objects that reflect the real space), so an image can be displayed while giving the user a more accurate sense of the distance to his or her hands. That is, when the user reaches out to touch a virtual object in the virtual space, the user's hand actually reaches the corresponding object in the real space.
  • The left-eye and right-eye image data to be generated may instead be produced by reconstructing the target space from the real-space image data: the captured image is pasted as a texture onto virtual objects built from the meshes obtained by the same processing as in processes S8 to S9, these virtual objects are arranged in a virtual space corresponding to the target space, and the result is rendered in the visual-field direction from the rendering camera position coordinates RC determined in process S4 and displayed. In this case, the image representing the real space is also presented as viewed from an arbitrary position such as the position of the user's eyes, and the image of the virtual space is a rendered image viewed from the same position, so the visual field does not substantially move at the moment of switching and the sense of incongruity can be further reduced. Moreover, when switching from the real space to the virtual space, the virtual space can be rendered and the resulting image data output so as to stage a scene, in a game or the like, in which the objects of the real space are replaced little by little with the virtual objects of the virtual space.
  • In the description so far, the image of the real space around the user is obtained by the camera 43 provided on the display device 40 worn by the user, but the present embodiment is not limited to this; an image captured by a camera arranged in the room where the user is located may also be used.
  • Also, in the example described so far, the rendering camera position coordinates RC represent a position corresponding to the position of the user's eyes and the visual-field direction is the direction of the user's face, but the present embodiment is not limited to this. For example, the position coordinates RC of the rendering camera may be coordinates representing a position looking down on the virtual space from above (for example, the X-axis and Y-axis components may be the user's position coordinates and the Z-axis component a predetermined value), and the visual-field direction may be a direction behind the user (the direction opposite to the direction of the user's face).
  • the range of the field of view may be changed to produce the effect of zooming.
  • the range of view (view angle) may be arbitrarily set by a user operation.
  • In the description so far, the information on the position and tilt of the user's head is obtained from the head direction sensor 441 provided in the display device 40, but the present embodiment is not limited to this. For example, the user may be imaged by a camera arranged at a known position in the room where the user is located, and the position and tilt angle of the user's head may be detected by detecting the position and orientation of a predetermined point that moves with the user's head, for example a predetermined marker arranged in advance on the display device 40 worn by the user. Since techniques for detecting the position and inclination of such a marker from image data in which the marker is captured are widely known, detailed description is omitted here. When information on the position and tilt of the user's head is acquired by this method, the display device 40 does not necessarily need to be provided with the head direction sensor 441.
  • In the description so far, the relative coordinates between the reference position and the positions of the user's eyes are determined in advance, but the display device 40 may instead be provided with an eye sensor 440 that detects the positions of the user's eyes, so that the relative coordinates between the reference position and the positions of the user's eyes are obtained by detection. The eye sensor 440 may be, for example, a visible-light or infrared camera arranged at a known position relative to a predetermined position of the display device 40. The eye sensor 440 detects the position of the user's eye, iris, cornea, or pupil (as a position relative to the eye sensor 440), obtains relative coordinate information from known predetermined positions of the display device 40 (for example, the position of the left image sensor 430L for the left eye and the position of the right image sensor 430R for the right eye), and outputs it as information representing, for example, the position of the iris or the center of the pupil on the eyeball surface.
  • When the eye sensor 440 can also detect vector information on the direction of the user's line of sight (the direction of the eyeballs), the parameters of the video displayed on the display device 40 (for example, distortion correction parameters) may be changed using that line-of-sight information.
  • As one example, the image processing apparatus 10 obtains the line-of-sight direction vectors VL and VR of the user's left and right eyes in the XYZ coordinate system of the target space, and then obtains the distances rL and rR from the position coordinates RC of the rendering camera to the virtual objects lying in the directions of these vectors.
  • Using these distances rL and rR, the image processing apparatus 10 determines, for example, a lower limit distance rmin and an upper limit distance rmax that specify the distance range r that is in focus. The image processing apparatus 10 then processes the rendered image data (the image data of the stereoscopic image) using the information specifying the depth of field obtained in this way (here, rmin and rmax).
  • Specifically, the image generation unit 24 of the image processing apparatus 10 divides the generated image data of the stereoscopic image (each of the left-eye and right-eye image data) into a plurality of image regions (for example, image blocks of a predetermined size). For each image region, it obtains the pixel values on the depth map corresponding to the pixels of the region (the distances from the display device 40 worn by the user), and from these values and the information on the positions of the user's eyes it obtains the distance D from the user's eyes to the object imaged at a pixel of the region (for example, the center pixel of the region). If rmin < D < rmax, the image generation unit 24 does nothing for that image region. If D is smaller than rmin, it applies a blurring process (for example, a Gaussian filter) to the pixels of the region with an intensity corresponding to the magnitude of rmin - D; and if D is larger than rmax, it applies a blurring process (likewise, for example, a Gaussian filter) with an intensity corresponding to the magnitude of D - rmax, so that the image there appears out of focus.
  • In this way, a stereoscopic image having the designated depth of field can be generated, whether for the viewpoint-converted real-space image based on the acquired real-space image or for the virtual-space image obtained by rendering. Image portions outside the distance range at which the user is gazing are blurred, so a more natural image is provided to the user.
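  • A sketch of the region-by-region defocus described above: each block's representative distance D is compared with the in-focus range between rmin and rmax and, outside that range, the block is blurred with a Gaussian filter whose strength grows with the distance from the range. The block size and the mapping from distance to kernel size are assumptions.

```python
import cv2
import numpy as np

def apply_depth_of_field(image, depth, rmin, rmax, block=16):
    """image: HxWx3 uint8 rendered eye image; depth: HxW distances in metres."""
    out = image.copy()
    h, w = depth.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            region = (slice(y, min(y + block, h)), slice(x, min(x + block, w)))
            d = float(np.median(depth[region]))     # representative distance of the block
            if rmin < d < rmax or not np.isfinite(d):
                continue                            # in focus: leave the block alone
            excess = (rmin - d) if d <= rmin else (d - rmax)
            # Map the distance outside the focus range to an odd Gaussian kernel size.
            k = min(31, 2 * int(1 + 4 * excess) + 1)
            out[region] = cv2.GaussianBlur(image[region], (k, k), 0)
    return out
```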

Abstract

This image processing device, which is connected to a display device which is worn on the head of a user when used, acquires an image of a real space around the user, determines information related to the field of view, and generates, on the basis of the acquired image of the real space, an image of the field of view identified by the determined information. The image processing device outputs the generated image to the display device.

Description

Image processing device
 The present invention relates to an image processing apparatus that is connected to a display device that a user wears on the head when in use.
 In recent years, display devices that a user wears on the head, such as head-mounted displays, have become widespread. Such a display device forms an image in front of the user's eyes, allowing the user to view that image. These devices include non-transmissive types, in which a display unit covers the user's eyes and prevents the user from seeing the real space in front of them, and transmissive (optical see-through) types, in which the display unit is built from a half mirror or the like so that the user can still see the real space in front of them.
 Further, even with a non-transmissive display device, the real space in front of the user's eyes can be photographed by a camera and displayed on the display unit, so that the user can see the real space in front of them much as with a transmissive device; devices that realize such a pseudo-transmissive display are said to use a camera see-through method.
 However, with a camera see-through display device, the user sees an image whose viewpoint is the camera rather than the actual position of the user's eyes, so there has been a problem that it is difficult for the user to judge, for example, the distance to his or her own hands.
 The present invention has been made in view of the above circumstances, and one of its objects is to provide an image processing apparatus capable of displaying an image while giving the user a more accurate sense of the distance to his or her hands.
 The present invention, which solves the problems of the conventional example, is an image processing apparatus connected to a display device that a user wears on the head when in use, comprising: image acquisition means for acquiring an image of the real space around the user; determination means for determining visual-field information; and image generation means for generating, based on the acquired real-space image, an image of the visual field specified by the determined information, wherein the image generated by the image generation means is output to the display device.
 FIG. 1 is a configuration block diagram illustrating an example of an image processing system including an image processing apparatus according to an embodiment of the present invention. FIG. 2 is a functional block diagram illustrating an example of the image processing apparatus according to the embodiment. FIG. 3 is an explanatory diagram illustrating an example of the head tilt information used by the image processing apparatus according to the embodiment. FIG. 4 is an explanatory diagram outlining the object buffer generated by the image processing apparatus according to the embodiment. FIG. 5 is an explanatory diagram illustrating a projection image of the object buffer generated by the image processing apparatus according to the embodiment. FIG. 6 is a flowchart illustrating an operation example of the image processing apparatus according to the embodiment. FIG. 7 is an explanatory diagram illustrating an operation example of the image processing apparatus according to the embodiment.
 Embodiments of the present invention will be described with reference to the drawings. As illustrated in FIG. 1, an image processing system 1 including an image processing apparatus 10 according to an embodiment of the present invention is configured to include the image processing apparatus 10, an operation device 20, a relay device 30, and a display device 40.
 The image processing apparatus 10 is an apparatus that supplies the images to be displayed by the display device 40. For example, the image processing apparatus 10 is a consumer game machine, a portable game machine, a personal computer, a smartphone, a tablet, or the like. As shown in FIG. 1, the image processing apparatus 10 includes a control unit 11, a storage unit 12, and an interface unit 13.
 The control unit 11 is a program control device such as a CPU, and executes the program stored in the storage unit 12. In the present embodiment, the control unit 11 acquires an image of the real space around the user wearing the display device 40, and generates an image of a designated visual field based on the acquired image of the real space.
 Specifically, in one example of the present embodiment, the control unit 11 constructs a virtual three-dimensional space (hereinafter referred to as the virtual space) corresponding to a real space of a predetermined size around the user, centered on the user's position and including the area behind the user (hereinafter referred to as the target space): for example, a rectangular cuboid 10 m wide (in the direction orthogonal to the user's line of sight at the time the display device 40 is put on, i.e. the initial line-of-sight direction, and parallel to the floor), 10 m deep (along the initial line-of-sight direction parallel to the floor), and 3 m high.
 That is, the control unit 11 places virtual three-dimensional objects in this virtual space, or applies video effects, while referring to the image of the real space. The control unit 11 also obtains information on the visual field used for rendering an image of this virtual space (either separate information for a visual field corresponding to the user's left eye and one corresponding to the right eye, or a single common visual field), generates an image of the virtual space as seen from the visual field specified by that information (a stereoscopic image when separate left-eye and right-eye visual fields are used), and outputs the generated image to the display device 40. The detailed operation of the control unit 11 will be described later.
The storage unit 12 includes at least one memory device such as a RAM, and stores the program executed by the control unit 11. The storage unit 12 also operates as a work memory for the control unit 11, and stores data used by the control unit 11 in the course of program execution. The program may be provided stored on a computer-readable, non-transitory recording medium and then stored in the storage unit 12.
The interface unit 13 is an interface through which the control unit 11 of the image processing apparatus 10 performs data communication with the operation device 20 and the relay device 30. The image processing apparatus 10 is connected to the operation device 20, the relay device 30, and the like via the interface unit 13, either by wire or wirelessly. As an example, the interface unit 13 may include a multimedia interface such as HDMI (registered trademark) (High-Definition Multimedia Interface) in order to transmit the images (stereoscopic images) and audio supplied by the image processing apparatus 10 to the relay device 30. It may also include a data communication interface such as USB in order to receive various kinds of information from the display device 40 via the relay device 30 and to transmit control signals and the like. The interface unit 13 may further include a data communication interface such as USB in order to receive signals indicating the content of the user's operation input to the operation device 20.
The operation device 20 is, for example, a controller of a consumer game machine, and is used by the user to perform various instruction operations on the image processing apparatus 10. The content of the user's operation input to the operation device 20 is transmitted to the image processing apparatus 10 either by wire or wirelessly. The operation device 20 does not necessarily have to be separate from the image processing apparatus 10, and may include operation buttons, a touch panel, or the like arranged on the housing surface of the image processing apparatus 10.
The relay device 30 is connected to the display device 40 either by wire or wirelessly, receives the data of the images (stereoscopic images) supplied from the image processing apparatus 10, and outputs a video signal corresponding to the received data to the display device 40. At this time, the relay device 30 may, as necessary, perform processing such as correcting the distortion caused by the optical system of the display device 40 on the video represented by the supplied images, and output a video signal representing the corrected video. If the image supplied from the image processing apparatus 10 is a stereoscopic image, the video signal supplied from the relay device 30 to the display device 40 includes two video signals generated based on the stereoscopic image: a video signal for the left eye and a video signal for the right eye. In addition to the stereoscopic images and video signals, the relay device 30 also relays various kinds of information transmitted and received between the image processing apparatus 10 and the display device 40, such as audio data and control signals.
The display device 40 is a display device that the user wears on the head, and it displays video corresponding to the video signal input from the relay device 30 so that the user can view it. In the present embodiment, the display device 40 displays a video corresponding to each eye in front of each of the user's right eye and left eye. As shown in FIG. 1, the display device 40 includes a video display element 41, an optical element 42, a camera 43, a sensor unit 44, and a communication interface 45.
The video display element 41 is an organic EL display panel, a liquid crystal display panel, or the like, and displays video corresponding to the video signal supplied from the relay device 30. The video display element 41 may be a single display element that displays the left-eye video and the right-eye video side by side, or may include a pair of display elements that display the left-eye video and the right-eye video independently. A display screen of a smartphone or the like may also be used as the video display element 41 as it is; in this case, the smartphone or the like displays video corresponding to the video signal supplied from the relay device 30.
The display device 40 may also be a retinal irradiation type (retinal projection type) device that projects video directly onto the user's retina. In this case, the video display element 41 may be composed of a laser that emits light and a MEMS (Micro Electro Mechanical Systems) mirror that scans the light.
The optical element 42 is a hologram, a prism, a half mirror, or the like; it is arranged in front of the user's eyes and transmits or refracts the light of the video displayed by the video display element 41 so that the light enters the user's eyes. Specifically, the optical element 42 may include a left-eye optical element 42L and a right-eye optical element 42R. In this case, the left-eye video displayed by the video display element 41 may enter the user's left eye via the left-eye optical element 42L, and the right-eye video may enter the user's right eye via the right-eye optical element 42R. This allows the user, with the display device 40 worn on the head, to see the left-eye video with the left eye and the right-eye video with the right eye. In the present embodiment, the display device 40 is assumed to be a non-transmissive display device through which the user cannot see the outside world.
The camera 43 includes a pair of image sensors 430L and 430R arranged slightly to the left and slightly to the right of the center of the front face of the display device 40 (the user's front side); in the following description, they are collectively referred to as the image sensors 430 when there is no need to distinguish left from right. The camera 43 may also include at least one image sensor 430B arranged on the user's back side. The camera 43 captures images of the real space in front of the user with at least the image sensors 430L and 430R, and outputs the image data obtained by the capturing to the image processing apparatus 10 via the relay device 30.
The sensor unit 44 may further include a head direction sensor 441 that detects the direction of the user's head (the front direction of the user's face) and its position. In this case, the head direction sensor 441 detects the direction of the user's head (the direction of the face). Specifically, the head direction sensor 441 is a gyroscope or the like, and detects and outputs the rotation angle of the head direction within a plane parallel to the floor, measured from the initial direction at the time the display device 40 is put on, the rotation angle in the elevation direction, and the rotation angle about the axis of the viewing direction. The head direction sensor 441 also detects and outputs, taking a predetermined position on the display device 40 (for example, the midpoint of the line segment connecting the image sensors 430L and 430R of the camera 43) as a reference position, the amount of movement (x, y, z) of this reference position since the device was put on, in the user's left-right direction (the axis where the transverse plane and the coronal plane intersect, hereinafter the X axis), the front-rear direction (the axis where the sagittal plane and the transverse plane intersect, hereinafter the Y axis), and the up-down direction (the Z axis). The relative coordinates of each image sensor 430 with this reference position as the origin are assumed to be known.
The communication interface 45 is an interface for communicating data such as video signals and image data with the relay device 30. For example, when the display device 40 transmits and receives data to and from the relay device 30 by wireless communication such as a wireless LAN or Bluetooth (registered trademark), the communication interface 45 includes a communication antenna and a communication module.
Next, the operation of the control unit 11 of the image processing apparatus 10 according to the embodiment of the present invention will be described. By executing the program stored in the storage unit 12, the control unit 11 functionally implements an image acquisition unit 21, a visual field determination processing unit 23, an image generation unit 24, and an output unit 25, as illustrated in FIG. 2.
The image acquisition unit 21 acquires images of the real space around the user wearing the display device 40. Specifically, the image acquisition unit 21 receives the image data captured by the camera 43 from the display device 40 via the relay device 30. In the example of the present embodiment, the image data captured by the camera 43 is a pair of image data captured by the pair of image sensors 430 arranged on the left and right, and the distance to an object in the captured real space can be determined from the parallax between the two images. In the present embodiment, the image acquisition unit 21 generates and outputs, based on the image data captured by the camera 43, a depth map of the same size as that image data (hereinafter referred to as the captured image data for distinction). Here, the depth map is image data in which the value of each pixel is information representing the distance to the object captured at the corresponding pixel of the image data captured by the camera 43.
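The disclosure does not prescribe a particular implementation for this step; the following minimal sketch shows one conventional way such a depth map could be computed from the left/right pair, assuming OpenCV block matching and placeholder values for the focal length and the baseline between the image sensors 430L and 430R.

```python
# Hypothetical sketch: depth map from the stereo pair captured by image sensors
# 430L/430R. FOCAL_PX and BASELINE_M are assumed values, not figures from the patent.
import cv2
import numpy as np

FOCAL_PX = 700.0      # assumed focal length of image sensor 430, in pixels
BASELINE_M = 0.064    # assumed distance between image sensors 430L and 430R, in metres

def depth_map_from_stereo(img_left: np.ndarray, img_right: np.ndarray) -> np.ndarray:
    """Return a depth map (metres) with the same resolution as the captured images."""
    gray_l = cv2.cvtColor(img_left, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(img_right, cv2.COLOR_BGR2GRAY)
    # Block matching returns disparity in 1/16-pixel units.
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(gray_l, gray_r).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan          # no match: distance unknown
    return FOCAL_PX * BASELINE_M / disparity    # depth = f * B / d
```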
The visual field determination processing unit 23 determines the visual field information used for rendering the virtual space. As a specific example, the visual field determination processing unit 23 obtains, independently of the positions of the image sensors 430 included in the camera 43, predetermined values (which may, for example, be hard-coded in the program or read from a setting file) for the position coordinates RC of the camera used when rendering the virtual space (hereinafter referred to as the rendering camera; the image seen from the position of this rendering camera is rendered) and for information representing the direction of the visual field (for example, vector information starting at the position coordinates RC and passing through the center of the visual field), and uses these as the visual field information.
As another example, the visual field determination processing unit 23 may obtain the position coordinates RC of the rendering camera in the virtual space as coordinates relative to a reference position in the real space that changes over time as the user moves. As one example, the reference position may be the position of an image sensor 430, and the position in the virtual space corresponding to a position displaced from that image sensor 430 by a predetermined relative coordinate value may be used as the position coordinates RC of the rendering camera.
Here, the relative coordinates may be, for example, the relative coordinates from the position of the image sensor 430R (or 430L) to the position where the right eye (or left eye) of the user wearing the display device 40 should be. In this example, the position in the virtual space corresponding to the position of the user's eye becomes the position coordinates RC of the rendering camera.
The visual field determination processing unit 23 may also obtain information representing the direction of the visual field (for example, vector information starting at the position coordinates RC and passing through the center of the visual field) from the information output by the head direction sensor 441. In this case, the visual field determination processing unit 23 acquires, from the information output by the head direction sensor 441, the rotation angle θ of the head direction within a plane parallel to the floor measured from the initial direction at the time the display device 40 is put on, the rotation angle φ in the elevation direction, the rotation angle ψ about the axis of the viewing direction, and the amount of movement (x, y, z) of the head, as illustrated in FIG. 3. The visual field determination processing unit 23 then determines the direction of the user's visual field from the rotation angles θ and φ, and determines the tilt of the user's neck about the viewing direction from the rotation angle ψ.
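As a non-authoritative illustration of the geometry described above (the axis convention and the use of radians are assumptions), the viewing direction can be derived from θ and φ, with ψ kept separately as the roll about that direction.

```python
# Hypothetical sketch: viewing direction from the head direction sensor 441 output.
# Axis convention (X = user's left-right, Y = initial gaze direction, Z = up) is assumed.
import math

def view_direction(theta: float, phi: float) -> tuple[float, float, float]:
    """Unit vector of the visual field direction for yaw theta and elevation phi (radians)."""
    dx = math.cos(phi) * math.sin(theta)
    dy = math.cos(phi) * math.cos(theta)   # theta = 0, phi = 0 -> the initial gaze along +Y
    dz = math.sin(phi)
    return (dx, dy, dz)

# The roll angle psi is not part of the direction itself; it is applied later as a
# rotation of the rendered image about the viewing axis.
```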
As one example, the visual field determination processing unit 23 uses, as the position coordinates RC of the rendering camera, the coordinates in the virtual space that correspond to the positions of the user's left and right eyes in the real space. That is, from the coordinate information of the left image sensor 430L and the right image sensor 430R of the camera 43 within the target space (which can be computed from the information on the amount of movement of the user's head and the relative coordinates from the reference position to each image sensor 430), and from the information on the relative coordinates of the left and right eyes with respect to the position of each image sensor 430, it obtains the coordinate information in the virtual space corresponding to the positions of the user's left and right eyes in the XYZ coordinate system of the target space, and outputs that information to the image generation unit 24 as the position coordinates RC of the rendering camera.
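A minimal sketch of this bookkeeping follows, assuming simple fixed offsets; the numeric values are invented for illustration and, for brevity, the offsets are not rotated by the head orientation.

```python
# Hypothetical sketch: rendering camera positions RC for the left and right eyes,
# derived from the head movement (x, y, z) reported by the head direction sensor 441
# and known offsets. All offset values below are assumptions, not patent figures.
import numpy as np

SENSOR_OFFSET = {            # image sensor 430L/430R relative to the reference position
    "L": np.array([-0.032, 0.0, 0.0]),
    "R": np.array([+0.032, 0.0, 0.0]),
}
EYE_OFFSET = np.array([0.0, -0.02, -0.03])   # eye position relative to its image sensor

def rendering_camera_positions(head_movement_xyz):
    """Return {eye: RC coordinate} in the target-space XYZ system."""
    head = np.asarray(head_movement_xyz, dtype=float)
    # (For brevity the offsets are not rotated by the current head orientation.)
    return {eye: head + SENSOR_OFFSET[eye] + EYE_OFFSET for eye in ("L", "R")}
```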
The visual field determination processing unit 23 also outputs to the image generation unit 24 the information on the direction of the visual field determined from the rotation angles θ and φ, and the information on the tilt of the user's neck about the viewing direction determined from the rotation angle ψ.
The image generation unit 24 receives the position coordinates RC of the rendering camera and the information on the direction of the visual field from the visual field determination processing unit 23. Based on the acquired image of the real space, the image generation unit 24 then generates a stereoscopic image as seen in the visual field specified by that information.
In one example of the present embodiment, the image generation unit 24 first generates environment mesh list information and an object buffer based on the depth map information output by the image acquisition unit 21. Here, the environment mesh list information is obtained by dividing the depth map into meshes, and includes the vertex coordinates of each mesh (information representing pixel positions), identification information of the mesh, information on the normals of the objects captured at the pixels of the captured image data corresponding to the pixels within the mesh, mesh type information (information indicating which of predetermined types the mesh belongs to), and information on the surface shape of the mesh.
The mesh type information indicates whether the object captured at the pixels of the captured image data corresponding to the pixels within the mesh is a floor, a ceiling, a wall, an obstacle (defined in advance as, for example, an object other than a wall located within a predetermined height from the floor), or something else. The information on the surface shape of the mesh indicates which surface shape it is: a flat surface, an uneven surface, a spherical surface, or a surface with a complex shape.
There are various methods for recognizing the type of object in the captured image data, its surface shape, and so on from the depth map information and the like; which method is adopted is not material here.
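Purely as an illustration of the data involved (the field names and enum members are assumptions that merely mirror the categories named above, not terms of the disclosure), one environment mesh list entry could be represented as follows.

```python
# Hypothetical sketch of one environment mesh list entry as described above.
from dataclasses import dataclass
from enum import Enum, auto

class MeshType(Enum):
    FLOOR = auto()
    CEILING = auto()
    WALL = auto()
    OBSTACLE = auto()
    OTHER = auto()

class SurfaceShape(Enum):
    FLAT = auto()
    UNEVEN = auto()
    SPHERICAL = auto()
    COMPLEX = auto()

@dataclass
class EnvironmentMesh:
    mesh_id: int
    vertex_pixels: list[tuple[int, int]]            # vertex coordinates on the depth map
    normals: list[tuple[float, float, float]]       # normals of the captured object
    mesh_type: MeshType
    surface_shape: SurfaceShape
```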
The object buffer is a representation of a real space of a predetermined size around the user, including the user's position and the area behind the user's line-of-sight direction (for example, a rectangular parallelepiped 10 m wide in the direction orthogonal to the initial line-of-sight direction and parallel to the floor, 10 m deep along the initial line-of-sight direction parallel to the floor, and 3 m high; hereinafter referred to as the target space), expressed virtually as a space of voxels (virtual volume elements, for example cubic elements 10 cm wide, 10 cm deep, and 10 cm high). The value of a voxel in which an object exists (the voxel value) is set to "1", the value of a voxel in which no object exists is set to "0", and the value of a voxel for which it is unknown whether an object exists is set to "-1" (FIG. 4).
In FIG. 4, for convenience of illustration, only some of the voxels are shown, and the voxel size is chosen for the sake of explanation; the voxel size relative to the target space does not necessarily represent one suited to actual implementation. FIG. 4 shows an example in which a cubic object is placed in the far corner of the target space: the values of the voxels corresponding to its surface are set to "1", indicating that an object exists; the values of the voxels in the portion hidden behind that surface are set to "-1", indicating that their state is unknown; and the values of the voxels between the camera and the object surface are set to "0", indicating that nothing is there.
The image generation unit 24 sets these voxel values based on the depth map information. Each pixel of the depth map corresponds to one cell obtained by dividing, at the resolution of the depth map (py pixels vertically by px pixels horizontally), the base of a virtual quadrangular pyramid whose apex is the position coordinates of the camera 43 at the time the image data underlying the depth map was captured (these may be the coordinates of the reference position; hereinafter referred to as the capture-time position) and whose extent corresponds to the angle of view of the depth map. Accordingly, a vector parallel to the line segment from the capture-time position through the vertices of each pixel (a difference of coordinates in the world coordinate system), or a vector parallel to the line segment from the capture-time position through the center of each pixel, can be computed as the direction of that pixel from the coordinates of the capture-time position, the information representing the angle of view of the depth map, and the resolution of the depth map.
For each pixel of the depth map, the image generation unit 24 therefore sets to "1" the value of the voxel that lies, from the coordinates in the object buffer corresponding to the capture-time position (which may be the coordinates of the reference position), in the direction of that pixel and at the distance to the object represented by the depth map, and sets to "0" the values of the other voxels on the line from that voxel to the camera 43. For portions of the image data captured by the camera 43 that are hidden by objects in the real space and therefore not captured (portions behind a desk or a wall, or in the shadow of something placed on the floor), the image generation unit 24 sets the values of the corresponding voxels to "-1", since it is unknown whether an object exists there.
When the user moves or changes the line-of-sight direction and the image data captured by the camera 43 yields a depth map for a portion that was not captured before and for which it was therefore unknown whether an object exists (the portion corresponding to voxels whose value was "-1"), the image generation unit 24 updates the values of those voxels to "0" or "1" based on the newly obtained depth map.
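The following sketch is illustrative only and shows one way the object buffer update described above could be organized; the grid dimensions, voxel size, and helper functions are assumptions rather than part of the disclosure.

```python
# Hypothetical sketch of the object buffer update. Grid size, voxel size, and the
# ray construction are assumptions for illustration.
import numpy as np

VOXEL = 0.1                                        # 10 cm voxels
GRID = (100, 100, 30)                              # 10 m x 10 m x 3 m target space
object_buffer = np.full(GRID, -1, dtype=np.int8)   # -1 = unknown

def to_index(p):
    """World coordinates (origin at a corner of the target space) -> voxel index."""
    return tuple(int(c // VOXEL) for c in p)

def update_object_buffer(depth_map, camera_pos, pixel_direction):
    """depth_map[v, u] is the distance to the object seen at pixel (u, v);
    pixel_direction(u, v) returns that pixel's unit direction as a numpy vector."""
    camera_pos = np.asarray(camera_pos, dtype=float)
    h, w = depth_map.shape
    for v in range(h):
        for u in range(w):
            d = depth_map[v, u]
            if not np.isfinite(d):
                continue                           # distance unknown: leave voxels at -1
            ray = pixel_direction(u, v)
            # Empty space between the camera and the hit point.
            for t in np.arange(0.0, d, VOXEL):
                idx = to_index(camera_pos + t * ray)
                if all(0 <= i < n for i, n in zip(idx, GRID)):
                    object_buffer[idx] = 0
            # The voxel at the measured distance contains an object surface.
            idx = to_index(camera_pos + d * ray)
            if all(0 <= i < n for i, n in zip(idx, GRID)):
                object_buffer[idx] = 1
```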
Note that, besides the method described here, various methods can be adopted for setting the voxel values of a three-dimensional space representing the extent in which such objects exist from information such as a depth map, including methods widely known as 3D scanning methods.
Furthermore, the image generation unit 24 generates a projected image obtained by two-dimensionally projecting, from the position coordinates RC of the rendering camera specified by the information input from the visual field determination processing unit 23, the voxels in the object buffer whose value is not "0" and which lie in the visual field direction specified by the information input from the visual field determination processing unit 23 (FIG. 5). Based on the acquired image of the real space, the image generation unit 24 then detects the objects arranged in the real space, and generates and outputs a stereoscopic image in which a virtual object is placed at the position of each detected object. As a specific example, the image generation unit 24 constructs the virtual space according to predetermined rules (referred to as object conversion rules).
As an example, the object conversion rules are as follows. For meshes whose corresponding voxels have the value "1":
(1) If the mesh type is "ceiling", the background is composited.
(2) At the position of an object whose mesh type is obstacle and whose mesh surface shape is flat, the virtual object "operation panel" is placed.
(3) At the position of an object whose mesh type is obstacle and whose mesh surface shape is an uneven surface, the virtual object "rock" or "box" is placed.
(4) At the position of an object whose mesh type is obstacle and whose mesh surface shape is spherical, the virtual object "light" is placed.
(5) In the range of an object whose mesh type is obstacle and whose mesh surface shape is complex, the virtual objects "trees and plants" are placed.
The image generation unit 24 separately accepts input of a background image and of three-dimensional model data of virtual objects such as the operation panel, rocks, boxes, trees, and plants. Then, within the projected image illustrated in FIG. 5, it places the virtual object of the operation panel in the range onto which voxels with the voxel value "1" are projected and for which the mesh of the corresponding range is designated as "operation panel" by the object conversion rules. Similarly, it places the "rock" or "box" virtual object at the positions corresponding to the portions onto which voxels with the voxel value "1" are projected and for which the mesh of the corresponding range is designated as "rock" or "box" by the object conversion rules, and so on for each virtual object. The process of arranging virtual objects represented by three-dimensional model data in a virtual space in this way is widely known in the creation of three-dimensional graphics, and a detailed description is therefore omitted here.
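As an illustrative aid only, the object conversion rules listed above can be summarized as a lookup table; the string labels below are placeholders, not identifiers from the disclosure.

```python
# Hypothetical sketch of the object conversion rules, expressed as a lookup from
# (mesh type, surface shape) to the virtual object to place.
CONVERSION_RULES = {
    ("obstacle", "flat"):      "operation_panel",
    ("obstacle", "uneven"):    ("rock", "box"),       # either may be chosen
    ("obstacle", "spherical"): "light",
    ("obstacle", "complex"):   "trees_and_plants",
}

def virtual_object_for(mesh_type: str, surface_shape: str):
    """Return the virtual object name(s) for a mesh, or None if no rule applies.
    Meshes of type "ceiling" are handled separately by compositing the background."""
    if mesh_type == "ceiling":
        return "background"
    return CONVERSION_RULES.get((mesh_type, surface_shape))
```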
After arranging the virtual objects, the image generation unit 24 renders the images seen in the visual field direction specified by the information input from the visual field determination processing unit 23, from the position coordinates RC of the rendering camera input from the visual field determination processing unit 23 (here, the coordinates corresponding to each of the user's left and right eyes). The image generation unit 24 outputs the pair of image data obtained by the rendering to the output unit 25 as a stereoscopic image. The output unit 25 then outputs the image data of the stereoscopic image generated by the image generation unit 24 to the display device 40 via the relay device 30.
When rendering, the image generation unit 24 may use the information on the angle ψ related to the tilt of the user's neck to generate image data tilted by this angle ψ.
Furthermore, in the present embodiment, the image generation unit 24 may, when instructed, output the images of the real space captured by the image sensors 430L and 430R of the camera 43 as they are, as a stereoscopic image of the real space, instead of the stereoscopic image of the virtual space (providing a so-called camera see-through function).
In this example, the image generation unit 24 outputs the image data captured by the left image sensor 430L and the right image sensor 430R of the camera 43 as they are, and the output unit 25 displays the images based on these image data as they are, as the left-eye image and the right-eye image.
[Operation]
The image processing apparatus 10 according to the embodiment of the present invention basically has the configuration described above, and operates as follows. When the user puts the display device 40 on the head, the image processing apparatus 10 starts the processing illustrated in FIG. 6 and, with the reference position of the display device 40 (for example, the centroid position of the image sensors 430 of the camera 43) as the origin, sets as the target space a real space in a rectangular parallelepiped range around the user, including the area behind the user, extending ±5 m (10 m in total) in the X-axis direction, ±5 m (10 m in total) in the Y-axis direction, and up to 3 m above the floor in the Z-axis direction (S1).
The image processing apparatus 10 then sets up an object buffer in which this target space is virtually represented as a space of voxels (virtual volume elements, for example cubic elements 10 cm wide, 10 cm deep, and 10 cm high), with the values of all voxels initially set to "-1", and stores it in the storage unit 12 (S2).
The display device 40 repeatedly captures images with the camera 43 at predetermined intervals (for example, every 1/1000 second) and sends the image data obtained by the capturing to the image processing apparatus 10. The image processing apparatus 10 receives the image data captured by the camera 43 from the display device 40 via the relay device 30 (S3).
The image processing apparatus 10 acquires information on the direction of the user's head (the direction of the face) and the amount of movement (for example, expressed as coordinate values in the XYZ space described above). Specifically, the information on the direction and the amount of movement of the user's head may be detected by the head direction sensor 441 of the display device 40 and output to the image processing apparatus 10.
The image processing apparatus 10 determines the position coordinates RC of the rendering camera and the visual field direction (S4). Specifically, as illustrated in FIG. 7, the image processing apparatus 10 refers to the acquired information on the amount of movement of the head and obtains the coordinate information, within the target space, of the left image sensor 430L and the right image sensor 430R of the camera 43. Then, using the coordinate information obtained here and the predetermined relative coordinate information between each of the image sensors 430L and 430R and the user's corresponding eye (the user's left eye for the image sensor 430L and the user's right eye for the image sensor 430R), it obtains the coordinate information of the user's left and right eyes in the XYZ coordinate system of the target space. The coordinate information in the virtual space corresponding to this coordinate information is then used as the position coordinates RC of the rendering camera. Note that the XYZ coordinate system of the target space and the coordinate system of the virtual space may be aligned, in which case the coordinate values in the XYZ coordinate system of the target space can be used directly as the position coordinates RC of the rendering camera. The image processing apparatus 10 also determines the visual field direction based on the acquired information on the direction of the head.
Furthermore, the image processing apparatus 10 accepts, for example from the operation device 20, an instruction from the user as to whether to display an image of the real space (operation as so-called camera see-through) or an image of the virtual space (S5). The image processing apparatus 10 then starts the processing for displaying the instructed image. Here, the case of displaying an image of the real space is described first (S5: "real space").
In this case, the image processing apparatus 10 displays images using the real-space images captured by the camera 43 and received from the display device 40 in S3. In one example of the present embodiment, the image processing apparatus 10 uses the image data input from the image sensor 430L of the camera 43 as it is as the image data for the left eye, and uses the image data input from the image sensor 430R of the camera 43 as it is as the image data for the right eye. The image processing apparatus 10 generates the image data for the left eye and the right eye in this way, for example (S6).
The image processing apparatus 10 outputs the image data for the left eye and the image data for the right eye generated here to the display device 40 via the relay device 30 (S7). The display device 40 causes the image data for the left eye to enter the user's left eye via the left-eye optical element 42L as the left-eye video, and causes the image data for the right eye to enter the user's right eye via the right-eye optical element 42R as the right-eye video. As a result, the user views an image of the real space captured by the camera 43, with the viewpoint converted to the positions of the user's eyes. The image processing apparatus 10 then returns to S3 and repeats the processing.
If it is determined in S5 that the user has instructed display of an image of the virtual space (S5: "virtual space"), the image processing apparatus 10 generates, based on the image data captured by the camera 43, the depth map obtained from that image data (S8).
The image processing apparatus 10 divides the generated depth map into meshes, and determines whether the object captured at the pixels of the captured image data corresponding to the pixels within each mesh is a ceiling, a wall, an obstacle (defined in advance as, for example, an object other than a wall located within a predetermined height from the floor), or something else. In addition, from the values of the pixels on the depth map within each mesh, it determines whether the surface shape of the mesh is a flat surface, an uneven surface, a spherical surface, or a surface with a complex shape.
Then, as information related to the generated depth map, the image processing apparatus 10 associates information representing the position of each mesh on the depth map (which may be the coordinates of the mesh vertices), the mesh type information, and the surface shape information with one another, and stores them in the storage unit 12 as the environment mesh list information (S9: environment mesh list information generation).
While sequentially selecting each pixel of the depth map, the image processing apparatus 10 sets to "1" the value of the voxel that lies in the direction corresponding to the selected pixel on the depth map and at the distance to the object represented by the selected pixel, and sets to "0" the values of the other voxels on the line from that voxel to the camera 43. Here, for portions of the image data captured by the camera 43 that are hidden by objects in the real space and therefore not captured, the values of the corresponding voxels remain "-1", since it is unknown whether an object exists there.
Note that when the user moves or changes the line-of-sight direction and the image data captured by the camera 43 yields a depth map for a portion that was not captured before and for which it was therefore unknown whether an object exists (the portion corresponding to voxels whose value was "-1"), the image processing apparatus 10 updates the values of those voxels to "0" or "1" based on the newly obtained depth map (S10: object buffer update).
From the coordinates represented by the position coordinates of the rendering camera, the image processing apparatus 10 generates a projected image obtained by two-dimensionally projecting the voxels in the object buffer whose value is not "0" and which lie in the visual field direction determined in S4 (FIG. 5). Based on the acquired image of the real space, the image processing apparatus 10 then detects the objects arranged in the real space and outputs a stereoscopic image in which each detected object is replaced with a virtual object according to predetermined rules (when game processing is performed, rules defined by the game program).
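As a hedged illustration of this projection step, the sketch below projects each non-empty voxel with a simple pinhole model; the camera basis vectors and focal length are assumptions, and depth ordering between voxels is ignored for brevity.

```python
# Hypothetical sketch of projecting non-empty object-buffer voxels into the current
# view (the FIG. 5 step). The pinhole projection and its parameters are assumptions.
import numpy as np

def project_voxels(object_buffer, voxel_size, rc, forward, right, up,
                   focal_px, width, height):
    """Return an image whose pixels hold the voxel value (1 or -1) seen there, else 0.
    rc, forward, right, up are numpy vectors; forward/right/up form the camera basis."""
    image = np.zeros((height, width), dtype=np.int8)
    occupied = np.argwhere(object_buffer != 0)
    for ix, iy, iz in occupied:
        center = (np.array([ix, iy, iz]) + 0.5) * voxel_size
        rel = center - rc
        z = np.dot(rel, forward)          # depth along the viewing direction
        if z <= 0:
            continue                      # behind the rendering camera
        x = np.dot(rel, right)
        y = np.dot(rel, up)
        u = int(width / 2 + focal_px * x / z)
        v = int(height / 2 - focal_px * y / z)
        if 0 <= u < width and 0 <= v < height:
            image[v, u] = object_buffer[ix, iy, iz]
    return image
```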
Specifically, the image processing apparatus 10 accepts input of a background image and of three-dimensional model data such as the operation panel, rocks and boxes, and trees and plants, and, within the virtual three-dimensional space, places the virtual object of the operation panel at the position corresponding to the range onto which voxels with the voxel value "1" are projected and for which the mesh of the corresponding range is designated as "operation panel" by the object conversion rules. Similarly, it places the "rock" or "box" virtual object at the positions onto which voxels with the voxel value "1" are projected and for which the mesh of the corresponding range is designated as "rock" or "box" by the object conversion rules, and so on, arranging each virtual object in the virtual space to construct the virtual space. For the range corresponding to the ceiling, the background image is composited to construct a virtual space that appears as if there were no ceiling (S11: construct virtual space).
Within the virtual three-dimensional space in which the three-dimensional model data has been arranged (a space in the coordinate system corresponding to the target space), the image processing apparatus 10 renders the image data in the visual field direction determined in S4 (the image data to be presented to each of the left eye and the right eye) as seen from the position coordinates RC of the rendering camera determined in S4 (the coordinate information corresponding to each of the user's left and right eyes) (S12: rendering). The image processing apparatus 10 then outputs the pair of image data thus rendered as a stereoscopic image (S13).
The display device 40 causes the image data for the left eye to enter the user's left eye via the left-eye optical element 42L as the left-eye video, and causes the image data for the right eye to enter the user's right eye via the right-eye optical element 42R as the right-eye video. As a result, the user views an image of a virtual space in which the objects in the real space have been changed into virtual objects (such as the operation panel, rocks, and boxes). The image processing apparatus 10 then returns to S3 and repeats the processing.
In the example of FIG. 6, the processing for updating the object buffer (S10) may also be performed when executing the processing of S6 and S7 for displaying an image of the real space.
As described above, in the example of the present embodiment, when the virtual space is displayed, the rendering camera is placed, for example, at a position corresponding to the position of the user's eyes, as illustrated in FIG. 7, and a rendered image reflecting the real space (for example, image data in which objects in the real space are replaced with virtual objects) is presented. The image can therefore be displayed while giving the user a more accurate sense of the distance to their hands: if the user reaches out as if to touch a virtual object in the virtual space, the user touches the corresponding object in the real space.
In the processing of S6 in FIG. 6 described above, the image data generated for the left eye and the right eye may also be displayed as a reconstruction of the target space: meshes obtained by the same processing as S8 and S9, with textures extracted from the real-space image data pasted onto them, are used as virtual objects; these virtual objects are arranged in a virtual space corresponding to the target space to construct the virtual space, which is then rendered as an image viewed in the visual field direction from the position coordinates RC of the rendering camera determined in S4.
According to this example, the image representing the real space is also presented as seen from an arbitrary position, such as the position of the user's eyes, and the image of the virtual space is presented as a rendered image seen from the same position. Consequently, even when switching between the image of the real space and the image of the virtual space, the visual field does not substantially move at the moment of switching, which further reduces the sense of incongruity. Moreover, when switching from the real space to the virtual space, by performing the rendering and outputting the rendered image data while sequentially selecting the real-space objects to be replaced with virtual objects in the virtual space construction processing S11, it is also possible to stage a scene in which the objects in the real space appear to be replaced, little by little, with the virtual objects of the virtual space used in a game or the like.
[Modification]
In the description so far, the images of the real space around the user are obtained by the camera 43 provided on the display device 40 worn by the user, but the present embodiment is not limited to this. As another example of the present embodiment, they may be images captured by a camera arranged in the room where the user is located.
In the description so far, the position coordinates RC of the rendering camera represent the position corresponding to the position of the user's eyes and the visual field direction is the direction of the user's face, but the present embodiment is not limited to this. For example, the position coordinates RC of the rendering camera may be set to coordinates representing a position from which the virtual space is viewed from above (for example, the X-axis and Y-axis components are the user's position coordinates, and the Z-axis component is a predetermined value). The direction of the visual field may also be the direction behind the user (the direction opposite to the direction of the user's face). Furthermore, the range of the visual field may be changed to produce an effect as if zooming were performed. The range of the visual field (angle of view) may be made arbitrarily settable by the user's operation.
[Another example of acquiring information on the position and tilt of the user's head]
In the above description, the information on the position and tilt of the user's head is obtained from the head direction sensor 441 provided on the display device 40, but the present embodiment is not limited to this. For example, the user may be imaged by a camera arranged at a known position in the room where the user is located, and the position and tilt angle of the user's head may be detected by detecting the position and orientation of a predetermined point that moves with the user's head, for example a predetermined marker arranged in advance on the display device 40 worn by the user. Since techniques for detecting the position and tilt of such a marker based on image data capturing it are widely known, a detailed description is omitted here.
When the information on the position and tilt of the user's head is acquired by this method, the display device 40 does not necessarily need to be provided with the head direction sensor 441.
[Use of eye sensor]
Furthermore, in the description so far, the relative coordinates between the reference position and the positions of the user's eyes, such as the relative coordinates between the image sensors 430 and the user's eyes, are determined in advance, but the present embodiment is not limited to this. The display device 40 may be provided with an eye sensor 440 that detects the positions of the user's eyes, and the relative coordinates between the reference position and the positions of the user's eyes may be obtained by detecting the positions of the user's eyes with it.
Such an eye sensor 440 may be a visible-light camera or an infrared camera arranged at a known position relative to a predetermined position on the display device 40. The eye sensor 440 detects the positions of the user's inner eye corner, iris, cornea, and pupil (positions relative to the eye sensor 440), obtains their relative coordinate information from the known predetermined position on the display device 40 (for example, the position of the left image sensor 430L for the left eye and the position of the right image sensor 430R for the right eye), and outputs it as information representing the center position of the iris or pupil on the eyeball surface.
[Example using gaze information]
Furthermore, when the eye sensor 440 can detect vector information on the direction of the user's gaze (the orientation of the eyeballs), the gaze information may be used to change the parameters of the video displayed on the display device 40 (for example, distortion correction parameters).
[Depth of field]
This gaze information may also be used, at the time of rendering, to determine the distance to the virtual object lying in the direction of the gaze and to set the depth of field (here meaning rmin ≤ r ≤ rmax, which specifies the range of distances r that are in focus). As an example, after obtaining the gaze direction vectors VL and VR for each of the user's left and right eyes in the XYZ coordinate system of the target space, the image processing apparatus 10 obtains the distances rL and rR from the position coordinates RC of the rendering camera to the virtual objects lying in the directions of these vectors.
The image processing apparatus 10 then determines rmin (the lower limit distance) and rmax (the upper limit distance) specifying the in-focus distance range r, for example as follows. As an example, the image processing apparatus 10 obtains the arithmetic mean r of the distances rL and rR obtained above. If this r exceeds a predetermined threshold rth, rmin is set to "0" and rmax to infinity (that is, pan focus). If r does not exceed the predetermined threshold rth, the image processing apparatus 10 sets rmin = r − α and rmax = r + β. Here, α and β are positive values determined experimentally, and α = β is also possible.
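A minimal sketch of this rule follows, with the threshold rth and the margins α and β left as placeholder values, since the disclosure leaves them to experiment.

```python
# Hypothetical sketch of the rmin/rmax rule described above. R_TH, ALPHA, and BETA
# are placeholders, not values from the patent.
import math

R_TH = 5.0      # assumed threshold distance rth (metres)
ALPHA = 0.5     # assumed lower margin alpha
BETA = 0.5      # assumed upper margin beta

def depth_of_field(r_left: float, r_right: float) -> tuple[float, float]:
    """Return (rmin, rmax) for the gaze distances rL and rR of the left and right eyes."""
    r = (r_left + r_right) / 2.0            # arithmetic mean of the two gaze distances
    if r > R_TH:
        return 0.0, math.inf                # pan focus: everything in focus
    return r - ALPHA, r + BETA
```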
Of course, the examples of computing rmin and rmax shown here are merely examples, and other computation methods may be used as long as a depth of field that does not feel unnatural can be set experimentally according to the direction of the user's gaze. The image processing apparatus 10 processes the rendered image data (the image data of the stereoscopic image) using the information specifying the computed depth of field (rmin and rmax in this example).
 Specifically, the image generation unit 24 of the image processing apparatus 10 divides the generated stereoscopic image data (the left-eye and right-eye image data) into a plurality of image regions (for example, image blocks of a predetermined size). For each image region, the image generation unit 24 obtains the depth-map pixel value (the distance from the display device 40 worn by the user) corresponding to a pixel in that region. Based on this depth-map value and the information on the position of the user's eyes, the image generation unit 24 obtains the distance D from the user's eyes to the object imaged at that pixel (for example, the pixel at the center of the image region). If rmin ≦ D ≦ rmax, the image generation unit 24 does nothing to that image region.
 If the distance D obtained here satisfies D < rmin, the image generation unit 24 applies a blurring process (for example, a Gaussian filter) to the pixels of that image region with a strength corresponding to the magnitude of rmin - D, so that the region appears out of focus.
 Similarly, if the distance D satisfies D > rmax, the image generation unit 24 applies a blurring process (again, for example, a Gaussian filter) to the pixels of that image region with a strength corresponding to the magnitude of D - rmax, so that the region appears out of focus.
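 A condensed sketch of the per-block procedure of the last three paragraphs. It assumes the depth map has already been converted to distances D from the user's eyes; the block size, the blur-strength scaling, and the use of OpenCV are assumptions, and blurring each block independently is a simplification of a real post-process.

```python
import numpy as np
import cv2

def apply_depth_of_field(image, depth, rmin, rmax, block=16):
    """Blur image blocks whose distance D lies outside [rmin, rmax]."""
    out = image.copy()
    h, w = depth.shape[:2]
    for y in range(0, h, block):
        for x in range(0, w, block):
            # Distance D at (roughly) the centre pixel of this image region.
            d = float(depth[min(y + block // 2, h - 1), min(x + block // 2, w - 1)])
            if rmin <= d <= rmax:
                continue                          # in focus: leave the block as-is
            excess = (rmin - d) if d < rmin else (d - rmax)
            sigma = 0.5 + excess                  # blur strength grows with the excess
            roi = np.ascontiguousarray(image[y:y + block, x:x + block])
            out[y:y + block, x:x + block] = cv2.GaussianBlur(roi, (0, 0), sigma)
    return out
```

Note that in the pan-focus case (rmin = 0, rmax = ∞) every block falls inside the in-focus range and the image is left untouched, matching the behaviour described above.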
 In this way, based on the information specifying the depth of field, a stereoscopic image having the specified depth of field can be generated both for a view-converted image based on the acquired real-space image and for a virtual-space image obtained by rendering.
 According to this example of the present embodiment, image portions outside the distance range the user is gazing at appear blurred, so a more natural image is presented to the user.
 DESCRIPTION OF SYMBOLS: 10 image processing apparatus, 11 control unit, 12 storage unit, 13 interface unit, 20 operation device, 21 image acquisition unit, 23 visual-field determination processing unit, 24 image generation unit, 25 output unit, 30 relay device, 40 display device, 41 video display element, 42 optical element, 43 camera, 44 sensor, 45 communication interface, 430 imaging element, 440 eye sensor, 441 head direction sensor.

Claims (6)

  1.  An image processing apparatus connected to a display device that a user wears on the head for use, the image processing apparatus comprising:
     image acquisition means for acquiring an image of the real space around the user;
     determination means for determining visual-field information; and
     image generation means for generating, based on the acquired image of the real space, an image of the visual field specified by the determined information,
     wherein the image processing apparatus outputs the image generated by the image generation means to the display device.
  2.  The image processing apparatus according to claim 1, wherein
     the visual-field information includes information representing the position of the eyes of the user wearing the display device.
  3.  The image processing apparatus according to claim 1 or 2, wherein
     the visual-field information includes information on the orientation of the face of the user wearing the display device.
  4.  The image processing apparatus according to any one of claims 1 to 3, wherein
     the image generation means detects objects arranged in the real space based on the acquired image of the real space, arranges, in a virtual space corresponding to the real space, a three-dimensional model corresponding to each detected object at a position in the virtual space corresponding to the position of that object in the real space, and generates the image of the visual field.
  5.  The image processing apparatus according to any one of claims 1 to 4, wherein
     the image generation means detects objects arranged in the real space based on the acquired image of the real space, arranges, in a virtual space corresponding to the real space, a three-dimensional model corresponding to each detected object at a position in the virtual space corresponding to the position of that object in the real space, and generates the image of the visual field, and
     the image processing apparatus executes, in accordance with an instruction, either
     a process of outputting the acquired image of the real space, or
     a process of outputting the image generated by the image generation means.
  6.  A program for causing an image processing apparatus connected to a display device that a user wears on the head for use to function as:
     image acquisition means for acquiring an image of the real space around the user;
     determination means for determining visual-field information; and
     image generation means for generating, based on the acquired image of the real space, an image of the visual field specified by the determined information,
     and for causing the image processing apparatus to output the image generated by the image generation means to the display device.
PCT/JP2017/005742 2016-05-02 2017-02-16 Image processing device WO2017191703A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2018515396A JPWO2017191703A1 (en) 2016-05-02 2017-02-16 Image processing device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016-092561 2016-05-02
JP2016092561 2016-05-02

Publications (1)

Publication Number Publication Date
WO2017191703A1 true WO2017191703A1 (en) 2017-11-09

Family

ID=60203728

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/005742 WO2017191703A1 (en) 2016-05-02 2017-02-16 Image processing device

Country Status (2)

Country Link
JP (1) JPWO2017191703A1 (en)
WO (1) WO2017191703A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3631151B2 (en) * 2000-11-30 2005-03-23 キヤノン株式会社 Information processing apparatus, mixed reality presentation apparatus and method, and storage medium
JP2013218535A (en) * 2012-04-09 2013-10-24 Crescent Inc Method and device for displaying finger integrated into cg image in three-dimensionally modeled cg image and wide viewing angle head mount display device for displaying three-dimensionally modeled cg image
JP6294054B2 (en) * 2013-11-19 2018-03-14 株式会社Nttドコモ Video display device, video presentation method, and program

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004038470A (en) * 2002-07-02 2004-02-05 Canon Inc Augmented reality system and information processing method
JP2005327204A (en) * 2004-05-17 2005-11-24 Canon Inc Image composing system, method, and device
JP2009271732A (en) * 2008-05-07 2009-11-19 Sony Corp Device and method for presenting information, imaging apparatus, and computer program
JP2014511512A (en) * 2010-12-17 2014-05-15 マイクロソフト コーポレーション Optimal focus area for augmented reality display
JP2012133471A (en) * 2010-12-20 2012-07-12 Kokusai Kogyo Co Ltd Image composer, image composition program and image composition system
WO2016048658A1 (en) * 2014-09-25 2016-03-31 Pcms Holdings, Inc. System and method for automated visual content creation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TOSHIKAZU OSHIMA ET AL.: "RV-Border Guards : A Multi-player Mixed Reality Entertainment", TRANSACTIONS OF THE VIRTUAL REALITY SOCIETY OF JAPAN, vol. 4, no. 4, 31 December 1999 (1999-12-31), pages 699 - 705, XP055410021, ISSN: 1344-011X *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021200270A1 (en) * 2020-03-31 2021-10-07 ソニーグループ株式会社 Information processing device and information processing method
WO2024004321A1 (en) * 2022-06-28 2024-01-04 キヤノン株式会社 Image processing device, head-mounted display device, control method for image processing device, control method for head-mounted display device, and program

Also Published As

Publication number Publication date
JPWO2017191703A1 (en) 2018-10-04

Similar Documents

Publication Publication Date Title
JP5996814B1 (en) Method and program for providing image of virtual space to head mounted display
JP6933727B2 (en) Image processing equipment, image processing methods, and programs
CN107209950B (en) Automatic generation of virtual material from real world material
CN110022470B (en) Method and system for training object detection algorithm using composite image and storage medium
US11184597B2 (en) Information processing device, image generation method, and head-mounted display
US9106906B2 (en) Image generation system, image generation method, and information storage medium
US10607398B2 (en) Display control method and system for executing the display control method
JP6899875B2 (en) Information processing device, video display system, information processing device control method, and program
TW201831947A (en) Helmet mounted display, visual field calibration method thereof, and mixed reality display system
US20230156176A1 (en) Head mounted display apparatus
JP6682624B2 (en) Image processing device
JP6649010B2 (en) Information processing device
WO2017191703A1 (en) Image processing device
JP6687751B2 (en) Image display system, image display device, control method thereof, and program
JP6591667B2 (en) Image processing system, image processing apparatus, and program
JP2019102828A (en) Image processing system, image processing method, and image processing program
JP5037713B1 (en) Stereoscopic image display apparatus and stereoscopic image display method
JP6613099B2 (en) Program, computer and head-mounted display system for stereoscopic display of virtual reality space
WO2017163649A1 (en) Image processing device
WO2018173206A1 (en) Information processing device
JP2017142769A (en) Method and program for providing head-mounted display with virtual space image

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018515396

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17792636

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17792636

Country of ref document: EP

Kind code of ref document: A1