WO2017191703A1 - Image processing device - Google Patents

Image processing device Download PDF

Info

Publication number
WO2017191703A1
WO2017191703A1 (PCT/JP2017/005742)
Authority
WO
WIPO (PCT)
Prior art keywords
image
user
image processing
real space
processing apparatus
Prior art date
Application number
PCT/JP2017/005742
Other languages
English (en)
Japanese (ja)
Inventor
良徳 大橋
Original Assignee
株式会社ソニー・インタラクティブエンタテインメント
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社ソニー・インタラクティブエンタテインメント filed Critical 株式会社ソニー・インタラクティブエンタテインメント
Priority to JP2018515396A priority Critical patent/JPWO2017191703A1/ja
Publication of WO2017191703A1 publication Critical patent/WO2017191703A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09GARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory
    • G09G5/38Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory with means for controlling the display position

Definitions

  • The present invention relates to an image processing apparatus that is connected to a display device that a user wears on the head.
  • Such a display device lets the user view an image by forming the image in front of the user's eyes.
  • Display devices of this kind include a non-transmissive type, in which the display unit covers the front of the user's eyes and prevents the user from seeing the real space in front of them, and a transmissive (optical see-through) type, in which the display unit includes a half mirror or the like so that the user can still visually recognize the real space.
  • There are also devices that realize a pseudo-transmissive display, referred to as the camera see-through method, in which the real space in front of the user's eyes photographed by a camera is displayed on the display unit so that the user can visually recognize the real space in front of them in the same manner as with a transmissive display device.
  • The present invention has been made in view of the above circumstances, and one of its purposes is to provide an image processing apparatus capable of displaying an image while giving the user a more accurate sense of the position of their hands.
  • The present invention that solves the problems of the conventional example is an image processing apparatus connected to a display device that a user wears on the head, comprising means for acquiring an image of the real space around the user, means for determining visual field information, and image generation means for generating an image of the visual field specified by the determined information based on the acquired real-space image, the image generated by the image generation means being output to the display device.
  • FIG. 1 is a configuration block diagram illustrating an example of an image processing system including an image processing apparatus according to an embodiment of the present invention. Further figures include a functional block diagram showing an example of the image processing apparatus according to the embodiment, an explanatory diagram showing an example of the head-inclination information used by the image processing apparatus, and further explanatory diagrams showing an outline of its operation.
  • An image processing system 1 including the image processing apparatus 10 according to the present embodiment includes the image processing apparatus 10, an operation device 20, a relay device 30, and a display device 40.
  • the image processing apparatus 10 is an apparatus that supplies an image to be displayed by the display device 40.
  • the image processing apparatus 10 is a consumer game machine, a portable game machine, a personal computer, a smartphone, a tablet, or the like.
  • the image processing apparatus 10 includes a control unit 11, a storage unit 12, and an interface unit 13.
  • The control unit 11 is a program control device such as a CPU, and executes a program stored in the storage unit 12. In the present embodiment, the control unit 11 acquires an image of the real space around the user wearing the display device 40 and generates an image of a designated visual field based on the acquired real-space image.
  • Specifically, the control unit 11 configures a virtual space (hereinafter referred to as the virtual space) corresponding to a real-space range of predetermined size, for example a rectangular cuboid 10 m wide (the direction perpendicular to the user's initial line-of-sight direction when the display device 40 is worn, parallel to the floor surface), 10 m deep (the initial line-of-sight direction, parallel to the floor surface), and 3 m high.
  • The control unit 11 places virtual three-dimensional objects in this virtual space or applies video effects while referring to the image of the real space.
  • The control unit 11 also determines information on the visual field used when rendering an image of the virtual space (either separate information for a visual field corresponding to the user's left eye and one corresponding to the right eye, or information common to both) and generates an image of the virtual space viewed from the visual field specified by that information (a stereoscopic image when left-eye and right-eye visual field information is used). The control unit 11 then outputs the generated image to the display device 40. The detailed operation of the control unit 11 is described later.
  • the storage unit 12 includes at least one memory device such as a RAM and stores a program executed by the control unit 11.
  • the storage unit 12 also operates as a work memory for the control unit 11 and stores data used by the control unit 11 in the course of program execution.
  • the program may be provided by being stored in a computer-readable non-transitory recording medium and stored in the storage unit 12.
  • the interface unit 13 is an interface for the control unit 11 of the image processing apparatus 10 to perform data communication with the operation device 20 and the relay device 30.
  • the image processing apparatus 10 is connected to the operation device 20, the relay apparatus 30, or the like via the interface unit 13 by either wired or wireless.
  • The interface unit 13 may include a multimedia interface such as HDMI (High-Definition Multimedia Interface) for transmitting the images (stereoscopic images) and sound supplied by the image processing apparatus 10 to the relay device 30. It may also include a data communication interface such as USB for receiving various information from the display device 40 via the relay device 30 and for transmitting control signals and the like.
  • the interface unit 13 may include a data communication interface such as a USB in order to receive a signal indicating the content of the user's operation input to the operation device 20.
  • the operation device 20 is a controller or the like of a consumer game machine, and is used by the user to perform various instruction operations on the image processing apparatus 10.
  • the content of the user's operation input to the operation device 20 is transmitted to the image processing apparatus 10 by either wired or wireless.
  • the operation device 20 is not necessarily separate from the image processing apparatus 10, and may include operation buttons, a touch panel, and the like arranged on the surface of the image processing apparatus 10.
  • The relay device 30 is connected to the display device 40 by wire or wirelessly, receives the image data (stereoscopic images) supplied from the image processing apparatus 10, and outputs a video signal corresponding to the received data to the display device 40. At this time, the relay device 30 may, as necessary, apply processing that corrects the distortion produced by the optical system of the display device 40 to the video represented by the supplied image, and output a video signal representing the corrected video. If the image supplied from the image processing apparatus 10 is a stereoscopic image, the video signal supplied from the relay device 30 to the display device 40 consists of two signals: a left-eye video signal and a right-eye video signal, both generated from the stereoscopic image. In addition to the stereoscopic images and video signals, the relay device 30 relays various information transmitted and received between the image processing apparatus 10 and the display device 40, such as audio data and control signals.
  • the display device 40 is a display device that the user wears on the head and uses the video according to the video signal input from the relay device 30 to allow the user to browse. In the present embodiment, it is assumed that the display device 40 displays an image corresponding to each eye in front of each of the user's right eye and left eye. As shown in FIG. 1, the display device 40 includes a video display element 41, an optical element 42, a camera 43, a sensor unit 44, and a communication interface 45.
  • the video display element 41 is an organic EL display panel, a liquid crystal display panel, or the like, and displays a video corresponding to a video signal supplied from the relay device 30.
  • The video display element 41 may be a single display element that displays the left-eye video and the right-eye video side by side, or it may include a pair of display elements that display the left-eye video and the right-eye video independently.
  • a display screen such as a smartphone may be used as the video display element 41 as it is. In this case, a smartphone or the like displays a video corresponding to the video signal supplied from the relay device 30.
  • the display device 40 may be a retinal irradiation type (retinal projection type) device that directly projects an image on a user's retina.
  • the image display element 41 may be configured by a laser that emits light and a MEMS (Micro Electro Mechanical Systems) mirror that scans the light.
  • the optical element 42 is a hologram, a prism, a half mirror, or the like, and is disposed in front of the user's eyes.
  • the optical element 42 transmits or refracts the image light displayed by the image display element 41 to enter the user's eyes.
  • the optical element 42 may include a left-eye optical element 42L and a right-eye optical element 42R.
  • In this case, the left-eye image displayed by the image display element 41 may be made to enter the user's left eye via the left-eye optical element 42L, and the right-eye image may be made to enter the user's right eye via the right-eye optical element 42R.
  • the user can view the left-eye video with the left eye and the right-eye video with the right eye while the display device 40 is mounted on the head.
  • the display device 40 is assumed to be a non-transmissive display device in which the user cannot visually recognize the appearance of the outside world.
  • The camera 43 has a pair of image sensors 430L and 430R arranged slightly toward the front (user-facing front) side of the display device 40, one to the left and one to the right of its center (in the following description, when there is no need to distinguish left from right, they are collectively referred to as the image sensors 430).
  • The camera 43 may further include at least one image sensor 430B disposed on the back side of the user.
  • The camera 43 captures at least images of the real space in front of the user with the image sensors 430L and 430R and outputs the image data obtained by this capture to the image processing apparatus 10 via the relay device 30.
  • The sensor unit 44 may further include a head direction sensor 441 that detects the direction of the user's head (the front direction of the user's face) and its position.
  • The head direction sensor 441 is, for example, a gyro sensor; it detects and outputs the rotation angle of the head direction within a plane parallel to the floor surface measured from the initial direction when the display device 40 was mounted, the rotation angle in the elevation direction, and the rotation angle about the axis of the viewing direction.
  • The head direction sensor 441 also detects and outputs the amount of movement (x, y, z) since the display device 40 was mounted, taking a predetermined position of the display device 40 (for example, the midpoint of the line segment connecting the image sensor 430L and the image sensor 430R of the camera 43) as the reference position, along the user's left-right direction (the axis where the transverse plane and the coronal plane intersect, hereinafter the X axis), the front-rear direction (the axis where the sagittal plane and the transverse plane intersect, hereinafter the Y axis), and the vertical direction (the Z axis).
  • The relative coordinates of each image sensor 430, with the reference position as the origin, are assumed to be known.
  • the communication interface 45 is an interface for communicating data such as video signals and image data with the relay device 30.
  • the communication interface 45 includes a communication antenna and a communication module.
  • By executing the program stored in the storage unit 12, the control unit 11 functionally realizes an image acquisition unit 21, a visual field determination processing unit 23, an image generation unit 24, and an output unit 25.
  • the image acquisition unit 21 acquires an image of the real space around the user wearing the display device 40. Specifically, the image acquisition unit 21 receives image data captured by the camera 43 from the display device 40 via the relay device 30.
  • The image data captured by the camera 43 is a pair of images captured by the pair of imaging elements 430 arranged on the left and right, and the distance to an object in the captured real space can be determined from the parallax between the two images.
  • The image acquisition unit 21 generates and outputs, based on the image data captured by the camera 43, a depth map of the same size as that image data (hereinafter referred to as the captured image data, for distinction).
  • The depth map is image data in which the value of each pixel is information indicating the distance to the object imaged at the corresponding pixel of the captured image data.
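  • As an illustration only, the depth map described above could be derived from the left/right image pair by stereo block matching, as in the following sketch; the use of OpenCV's StereoBM and the focal-length and baseline parameters are assumptions made for the example, not part of the embodiment.

```python
# Hypothetical sketch: derive a depth map from the stereo pair captured by
# image sensors 430L/430R.  The block matcher and the camera parameters
# (focal length in pixels, baseline in metres) are assumptions for illustration.
import cv2
import numpy as np

def make_depth_map(left_gray: np.ndarray, right_gray: np.ndarray,
                   focal_px: float, baseline_m: float) -> np.ndarray:
    """Return a depth map (metres) the same size as the captured image data.

    left_gray / right_gray must be 8-bit single-channel images.
    """
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.zeros_like(disparity)
    valid = disparity > 0                      # zero/negative disparity = unknown
    # Pinhole stereo model: depth = focal length * baseline / disparity.
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```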
  • the visual field determination processing unit 23 determines visual field information for rendering the virtual space.
  • In one example, the visual field determination processing unit 23 obtains, as the visual field information, the position coordinates RC of the camera used for rendering in the virtual space (hereinafter referred to as the rendering camera; the image is rendered as viewed from the position of this camera), which are determined in advance (for example, hard-coded in the program or described in a settings file) regardless of the position of the image sensors 430 of the camera 43, together with information representing the direction of the visual field (for example, vector information starting from the position coordinates RC and passing through the center of the visual field).
  • Alternatively, the visual field determination processing unit 23 may obtain the position coordinates RC of the rendering camera in the virtual space as relative coordinates from a reference position in the real space that changes over time according to the user's movement.
  • For example, the reference position may be the position of an image sensor 430, and the position in the virtual space corresponding to the position displaced from the image sensor 430 by a predetermined relative coordinate value may be used as the position coordinates RC of the rendering camera.
  • The relative coordinates may be, for example, the relative coordinates from the position of the image sensor 430R (or 430L) to the position where the right eye (or left eye) of the user wearing the display device 40 should be. In this case, the position in the virtual space corresponding to the position of the user's eyes becomes the position coordinates RC of the rendering camera.
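  • As a minimal sketch of this idea, the following assumes that the image-sensor position and a fixed sensor-to-eye offset are both expressed in the target-space XYZ coordinate system; the variable names and the offset values are placeholders, not values taken from the embodiment.

```python
import numpy as np

# Hypothetical fixed offsets from each image sensor to the corresponding eye,
# expressed in the target-space XYZ coordinate system (placeholder values).
SENSOR_TO_EYE_OFFSET = {
    "430L": np.array([0.00, -0.02, -0.03]),   # left sensor -> left eye
    "430R": np.array([0.00, -0.02, -0.03]),   # right sensor -> right eye
}

def rendering_camera_position(sensor_pos: np.ndarray, sensor_id: str) -> np.ndarray:
    """Position coordinates RC of the rendering camera for one eye."""
    return sensor_pos + SENSOR_TO_EYE_OFFSET[sensor_id]
```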
  • The visual field determination processing unit 23 may also obtain the information representing the direction of the visual field (for example, vector information starting from the position coordinates RC and passing through the center of the visual field) from the information output by the head direction sensor 441. In this case, from the information output by the head direction sensor 441, the visual field determination processing unit 23 acquires, as illustrated in the figure showing the head inclination information, the rotation angle θ of the head direction within a plane parallel to the floor surface measured from the initial direction when the display device 40 was mounted, the rotation angle φ in the elevation direction, the rotation angle ψ about the axis of the viewing direction, and the movement amount (x, y, z) of the head.
  • The visual field determination processing unit 23 determines the direction of the user's visual field from the rotation angles θ and φ, and determines the inclination of the user's neck about the visual field direction from the rotation angle ψ.
  • The visual field determination processing unit 23 also sets the coordinates in the virtual space corresponding to the positions of the user's left and right eyes in the real space as the position coordinates RC of the rendering camera. That is, it obtains the coordinate information of the user's left and right eyes in the XYZ coordinate system of the target space from the coordinate information of the left image sensor 430L and the right image sensor 430R of the camera 43 in the target space (obtained from the information on the amount of movement of the user's head and the relative positions from the reference position to each image sensor 430) and from the relative coordinates of the left and right eyes with respect to the position of each image sensor 430, and outputs the coordinate information in the virtual space corresponding to these positions to the image generation unit 24 as the position coordinates RC of the rendering camera.
  • The visual field determination processing unit 23 further outputs to the image generation unit 24 the information on the direction of the user's visual field determined from the rotation angles θ and φ and the information on the inclination of the user's neck about the visual field direction determined from the rotation angle ψ.
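  • A minimal sketch of turning these rotation angles into a visual-field direction vector follows; the axis conventions (initial line of sight along +Y, Z pointing up) are assumptions made for the example.

```python
import numpy as np

def view_direction(theta: float, phi: float) -> np.ndarray:
    """Unit vector of the visual field direction from the head rotation angles.

    theta: rotation angle of the head direction in a plane parallel to the floor,
           measured from the initial direction (radians).
    phi:   rotation angle in the elevation direction (radians).
    Assumes the initial line of sight is the +Y axis and Z points up.
    """
    return np.array([
        np.sin(theta) * np.cos(phi),   # X: left-right component
        np.cos(theta) * np.cos(phi),   # Y: front-back component
        np.sin(phi),                   # Z: up-down component
    ])

# The roll angle psi about the viewing axis is applied later, when the
# rendered image is tilted by the image generation unit.
```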
  • The image generation unit 24 receives the information on the position coordinates RC of the rendering camera and the direction of the visual field from the visual field determination processing unit 23, and generates, based on this information, the image data of the virtual space to be displayed.
  • the image generation unit 24 first generates environment mesh list information and an object buffer based on the depth map information output from the image acquisition unit 21.
  • The environment mesh list information is obtained by dividing the depth map into meshes, and includes, for each mesh, the vertex coordinates of the mesh (information indicating pixel positions), mesh identification information, information on the normal of the object imaged at the pixels of the captured image data corresponding to the pixels in the mesh, mesh type information (information indicating which of several predetermined types the mesh belongs to), and information on the surface shape of the mesh.
  • The mesh type information indicates whether the object imaged at the pixels of the captured image data corresponding to the pixels in the mesh is a floor, a ceiling, a wall, an obstacle (an object other than a wall lying within a predetermined height from the floor), or something else. The information on the surface shape of the mesh indicates whether the surface shape is a plane, an uneven surface, a spherical surface, or a complex surface.
  • The object buffer represents, as a space of voxels (virtual volume elements, for example cubes 10 cm wide, 10 cm deep, and 10 cm high), a real-space range of predetermined size that includes the user's position and extends behind the user's line-of-sight direction (hereinafter referred to as the target space), for example a rectangular cuboid 10 m wide (the direction perpendicular to the initial line-of-sight direction of the user and parallel to the floor surface), 10 m deep, and 3 m high. The value of a voxel in which an object exists (the voxel value) is set to "1", the value of a voxel in which no object exists is set to "0", and the value of a voxel for which it is unknown whether an object exists is set to "-1" (FIG. 4).
  • FIG. 4 shows only some of the voxels for convenience of illustration, and the voxel size is likewise chosen for explanation; the size of the voxels relative to the target space does not necessarily indicate a size suitable for implementation.
  • In the example of FIG. 4, a cubic object is placed in the far corner of the target space. The values of the voxels corresponding to its surface are set to "1", indicating that an object exists; the values of the voxels hidden behind that surface are set to "-1", indicating that their contents are unknown; and the values of the voxels lying between the camera and the object surface are set to "0", indicating that nothing exists there.
  • the image generation unit 24 sets this voxel value based on the depth map information.
  • Each pixel on the depth map corresponds to a small region obtained by dividing the field of view of the depth map, whose apex is at the position coordinates of the camera 43 at the time the image data underlying the depth map was captured (this may be the reference position; hereinafter referred to as the shooting position), by the resolution of the depth map (py pixels vertically by px pixels horizontally).
  • The direction of each pixel as seen from the shooting position, for example a vector starting from the coordinates of the shooting position and parallel to a line segment passing through a vertex of the pixel, or a vector parallel to a line segment passing through the center of the pixel (each expressed as a coordinate difference in the world coordinate system), can therefore be calculated from the coordinates of the shooting position, the information representing the angle of view of the depth map, and the resolution of the depth map.
  • For each pixel on the depth map, the image generation unit 24 sets to "1" the value of the voxel that lies, in the direction of that pixel from the coordinates in the object buffer corresponding to the shooting position (which may be the coordinates of the reference position), at the distance to the object represented by the depth map, and sets to "0" the values of the other voxels on the line from that voxel back to the camera 43.
  • For portions that are hidden by objects in the real space and therefore not captured in the image data from the camera 43 (for example, the space behind a desk, a wall, or the floor), the image generation unit 24 sets the corresponding voxel values to "-1", since it is unknown whether an object exists there.
  • When the user moves or changes the line-of-sight direction and the camera 43 captures portions that had not been captured before (portions corresponding to voxels whose value was "-1"), the image generation unit 24 obtains a depth map for those portions and updates the values of the corresponding voxels to "0" or "1" based on it.
  • Methods for setting voxel values in a three-dimensional space so as to represent the extent of objects from information such as a depth map are widely known as 3D scanning methods, and various methods other than the one described here may be employed. A minimal sketch of the update described above follows.
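  • The sketch below illustrates the object-buffer update under the assumption of a pinhole model for the depth map and a uniform 10 cm voxel grid: it steps along each pixel ray, marks the traversed voxels "0" (empty) and the hit voxel "1" (occupied), and leaves untouched voxels at "-1" (unknown). Function and variable names are illustrative only.

```python
import numpy as np

UNKNOWN, EMPTY, OCCUPIED = -1, 0, 1
VOXEL_SIZE = 0.10  # metres; 10 cm cubes as in the example above

def _voxel_index(buffer: np.ndarray, point: np.ndarray):
    """Map a target-space point (metres) to a voxel index, or None if outside."""
    idx = tuple((point // VOXEL_SIZE).astype(int))
    if all(0 <= i < n for i, n in zip(idx, buffer.shape)):
        return idx
    return None

def update_object_buffer(buffer: np.ndarray, depth_map: np.ndarray,
                         shoot_pos: np.ndarray, pixel_dirs: np.ndarray) -> None:
    """Carve free space and mark hit surfaces in the voxel object buffer.

    buffer:     int array of voxel values, initially all UNKNOWN (-1).
    depth_map:  (py, px) distances to the object seen by each pixel (0 = no data).
    shoot_pos:  shooting position in target-space coordinates (metres).
    pixel_dirs: (py, px, 3) unit direction vectors of each pixel.
    """
    for v in range(depth_map.shape[0]):
        for u in range(depth_map.shape[1]):
            d = float(depth_map[v, u])
            if d <= 0.0:                     # no depth measured for this pixel
                continue
            direction = pixel_dirs[v, u]
            # March from the camera towards the hit point, marking empty voxels.
            for t in np.arange(0.0, d, VOXEL_SIZE):
                idx = _voxel_index(buffer, shoot_pos + t * direction)
                if idx is not None:
                    buffer[idx] = EMPTY
            hit = _voxel_index(buffer, shoot_pos + d * direction)
            if hit is not None:
                buffer[hit] = OCCUPIED       # voxels behind the hit stay UNKNOWN
```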
  • The image generation unit 24 then generates a projection image by two-dimensionally projecting the voxels in the object buffer whose value is not "0", as seen from the rendering camera position coordinates RC specified by the information input from the visual field determination processing unit 23, in the visual field direction specified by that information (FIG. 5). The image generation unit 24 detects the objects arranged in the real space based on the acquired real-space image, generates a stereoscopic image in which a virtual object is arranged at the position of each detected object, and outputs it.
  • the image generation unit 24 configures a virtual space according to a predetermined rule (referred to as an object conversion rule).
  • The object conversion rule is, for example, as follows (a minimal sketch of this mapping appears after the list).
  • (1) If the voxel value is "1" and the type of the corresponding mesh is "ceiling", the background is composited at that position.
  • (2) A virtual object "operation panel" is placed at the position of an object whose mesh type is "obstacle" and whose mesh surface shape is a plane.
  • (3) A virtual object "rock" or "box" is placed at the position of an object whose mesh type is "obstacle" and whose mesh surface shape is an uneven surface.
  • (4) A virtual object "light" is placed at the position of an object whose mesh type is "obstacle" and whose mesh surface shape is spherical.
  • (5) Virtual objects "trees and plants" are placed over the range of an object whose mesh type is "obstacle" and whose mesh surface shape is complex.
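  • The sketch below expresses the conversion rule above as a simple lookup table; the string labels follow the description, while the function name and the handling of unmatched meshes are assumptions made for illustration.

```python
from typing import Optional

# Encoding of the object conversion rule above.  The mesh-type and
# surface-shape labels follow the description; everything else is assumed.
CONVERSION_RULE = {
    ("ceiling", None): "background",
    ("obstacle", "plane"): "operation panel",
    ("obstacle", "uneven"): "rock or box",
    ("obstacle", "spherical"): "light",
    ("obstacle", "complex"): "trees and plants",
}

def virtual_object_for(mesh_type: str, surface_shape: Optional[str]) -> Optional[str]:
    """Return the virtual object to place for a mesh, or None if no rule applies."""
    if mesh_type == "ceiling":                      # rule (1): composite the background
        return CONVERSION_RULE[("ceiling", None)]
    return CONVERSION_RULE.get((mesh_type, surface_shape))
```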
  • the image generation unit 24 separately receives input of a background image, an operation panel, and three-dimensional model data of a virtual object such as a rock, a box, a tree, and a plant.
  • In the virtual three-dimensional space, the image generation unit 24 arranges the virtual object "operation panel" at a position corresponding to the range onto which voxels with a voxel value of "1" are projected in the projection image illustrated in FIG. 5 and for which the mesh of the corresponding range is designated as "operation panel" by the object conversion rule, arranges a virtual object "rock" or "box" at a position corresponding to the range onto which voxels with a voxel value of "1" are projected and for which the mesh of the corresponding range is designated as "rock" or "box", and arranges each of the other virtual objects in the same way.
  • Since the process of arranging a virtual object represented by three-dimensional model data in a virtual space is widely known in three-dimensional graphics processing, a detailed description is omitted here.
  • The image generation unit 24 renders the images viewed, from the rendering camera position coordinates RC input from the visual field determination processing unit 23 (here, the coordinates corresponding to the user's left eye and right eye), in the visual field direction specified by the information input from the visual field determination processing unit 23.
  • the image generation unit 24 outputs the pair of image data obtained by rendering to the output unit 25 as a stereoscopic image.
  • the output unit 25 outputs the image data of the stereoscopic image generated by the image generation unit 24 to the display device 40 via the relay device 30.
  • At the time of rendering, the image generation unit 24 may use the information on the angle ψ related to the inclination of the user's neck to generate image data tilted by this angle ψ.
  • In response to an instruction, the image generation unit 24 may also output, in place of the stereoscopic image of the virtual space, the real-space images captured by the imaging elements 430L and 430R of the camera 43 as they are (a so-called camera see-through function).
  • In this case, the image generation unit 24 outputs the image data captured by the left imaging element 430L and the right imaging element 430R of the camera 43 as they are, and the output unit 25 displays the images based on these image data as the left-eye image and the right-eye image without modification.
  • the image processing apparatus 10 basically includes the above configuration and operates as follows.
  • The image processing apparatus 10 starts the processing illustrated in FIG. 6 and sets the reference position of the display device 40 (for example, the position of the center of gravity of the image sensors 430 of the camera 43) as the origin of the target space.
  • The image processing apparatus 10 represents this target space by an object buffer that divides it into virtual voxels (virtual volume elements, for example cubic elements 10 cm wide, 10 cm deep, and 10 cm high), sets all voxel values initially to "-1", and stores the buffer in the storage unit 12 (S2).
  • the display device 40 repeatedly captures an image with the camera 43 at predetermined timings (for example, every 1/1000 seconds), and sends image data obtained by the imaging to the image processing device 10.
  • the image processing apparatus 10 receives image data captured by the camera 43 from the display apparatus 40 via the relay apparatus 30 (S3).
  • the image processing apparatus 10 acquires information on the direction of the user's head (face direction) and the amount of movement (for example, expressed by the coordinate values in the XYZ space). Specifically, the information on the head direction and the movement amount of the user may be information detected by the head direction sensor 441 of the display device 40 and output to the image processing device 10.
  • Next, the image processing apparatus 10 determines the position coordinates RC of the rendering camera and the viewing direction (S4). Specifically, as illustrated in FIG. 7, the image processing apparatus 10 refers to the acquired information on the amount of head movement and obtains the coordinate information of the left image sensor 430L and the right image sensor 430R of the camera 43 in the target space. Then, using the coordinate information obtained here and the predetermined relative coordinate information between each of the imaging elements 430L and 430R and the corresponding eye of the user (the user's left eye for the imaging element 430L and the user's right eye for the imaging element 430R), it obtains the coordinate information of the user's left and right eyes in the XYZ coordinate system of the target space.
  • The coordinate information in the virtual space corresponding to this coordinate information is set as the position coordinates RC of the rendering camera.
  • The XYZ coordinate system of the target space and the coordinate system of the virtual space may be made to coincide, in which case the coordinate values in the XYZ coordinate system of the target space can be used directly as the position coordinates RC of the rendering camera.
  • the image processing apparatus 10 determines the viewing direction based on the acquired information on the direction of the head.
  • the image processing apparatus 10 receives from the operation device 20 an instruction on whether to display an image in the real space (operation as a so-called camera see-through) or an image in the virtual space from the user (S5). Then, the image processing apparatus 10 starts processing for displaying the instructed image.
  • If the user has instructed display of an image of the real space (S5: "real space"), the image processing apparatus 10 displays an image using the real-space image captured by the camera 43 and received from the display device 40 in step S3.
  • the image processing apparatus 10 uses the image data input from the image sensor 430L of the camera 43 as it is as image data for the left eye. Further, the image processing apparatus 10 directly uses the image data input from the image sensor 430R of the camera 43 as image data for the right eye. For example, the image processing apparatus 10 generates image data for the left eye and right eye in this way (S6).
  • the image processing apparatus 10 outputs the image data for the left eye and the image data for the right eye generated here to the display apparatus 40 via the relay apparatus 30 (S7).
  • the display device 40 causes the left-eye image data to enter the left eye of the user via the left-eye optical element 42L as a left-eye image. Further, the display device 40 causes the image data for the right eye to enter the right eye of the user via the optical element for the right eye 42R as a video image for the right eye.
  • the user visually recognizes an image of the real space that is captured by the camera 43 and the viewpoint is converted to the position of the user's eyes. Thereafter, the image processing apparatus 10 returns to the process S3 and repeats the process.
  • If it is determined in step S5 that the user has instructed display of an image of the virtual space (S5: "virtual space"), the image processing apparatus 10 generates a depth map based on the image data captured by the camera 43 (S8).
  • The image processing apparatus 10 divides the generated depth map into meshes and determines, for each mesh, whether the object imaged at the pixels of the captured image data corresponding to the pixels of the mesh is a ceiling, a wall, an obstacle (a predetermined category covering objects other than walls lying within a predetermined height from the floor), or something else. It also determines, from the values of the pixels of the depth map within each mesh, whether the surface shape of the mesh is a plane, an uneven surface, a spherical surface, or a complex surface.
  • The image processing apparatus 10 stores, as information related to the generated depth map, information indicating the position of each mesh on the depth map (which may be the coordinates of the mesh vertices), the mesh type information, and the surface shape information in the storage unit 12 as environment mesh list information (S9: generate environment mesh list information). A minimal sketch of such a classification is shown below.
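  • As an illustration of this classification step, the sketch below labels a mesh from its average surface normal and height, and labels its surface shape from the spread of its depth residuals; the thresholds and the handling of each case are assumptions, not values taken from the embodiment (detection of spherical surfaces, for example by sphere fitting, is omitted).

```python
import numpy as np

def classify_mesh(normal: np.ndarray, mean_height: float,
                  depth_residuals: np.ndarray,
                  obstacle_max_height: float = 1.5) -> tuple:
    """Return (mesh type, surface shape) for one mesh of the depth map.

    normal:           average unit normal of the mesh, target-space coords, Z up.
    mean_height:      average height of the mesh above the floor (metres).
    depth_residuals:  per-pixel deviation of the mesh from its best-fit plane (metres).
    """
    # Mesh type: floor / ceiling / wall / obstacle (thresholds are illustrative).
    if abs(normal[2]) > 0.9:                        # roughly horizontal surface
        mesh_type = "floor" if mean_height < 0.2 else "ceiling"
    elif mean_height > obstacle_max_height:
        mesh_type = "wall"
    else:
        mesh_type = "obstacle"   # non-wall object within a set height of the floor

    # Surface shape: plane / uneven / complex, judged from how far the mesh
    # departs from its best-fit plane (spherical detection omitted here).
    spread = float(np.std(depth_residuals))
    if spread < 0.01:
        surface_shape = "plane"
    elif spread < 0.05:
        surface_shape = "uneven"
    else:
        surface_shape = "complex"
    return mesh_type, surface_shape
```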
  • Next, the image processing apparatus 10 sequentially selects each pixel of the depth map and, in the direction corresponding to the selected pixel, sets to "1" the value of the voxel corresponding to the distance to the object represented by that pixel, and sets to "0" the values of the other voxels on the line from that voxel back to the camera 43. The values of voxels corresponding to portions that are hidden by objects in the real space and therefore not captured in the image data from the camera 43 remain "-1", since it is unclear whether an object exists there.
  • When the user moves or changes the line-of-sight direction and the camera 43 captures portions that had not been captured before (portions corresponding to voxels whose value is "-1"), the image processing apparatus 10 obtains a depth map for those portions and updates the values of the corresponding voxels to "0" or "1" based on it (S10: update object buffer).
  • The image processing apparatus 10 generates, from the coordinates represented by the position coordinates of the rendering camera, a projection image obtained by two-dimensionally projecting, in the visual field determined in step S4, the voxels in the object buffer whose value is not "0" (FIG. 5). The image processing apparatus 10 then detects the objects arranged in the real space based on the acquired real-space image and outputs a stereoscopic image in which the detected objects are replaced with virtual objects according to a predetermined rule (when game processing is performed, a rule defined in the game program).
  • Specifically, the image processing apparatus 10 accepts input of a background image, an operation panel, and three-dimensional model data such as rocks, boxes, trees, and plants. In the virtual three-dimensional space, it arranges the virtual object of the operation panel at a position corresponding to the range onto which voxels with a voxel value of "1" are projected and for which the mesh of the corresponding range is designated as "operation panel" by the object conversion rule, arranges a virtual object "rock" or "box" at a position corresponding to the range onto which voxels with a voxel value of "1" are projected and for which the mesh of the corresponding range is designated as "rock" or "box", and arranges each of the other virtual objects in the virtual space in the same way to configure the virtual space.
  • A background image is also composited so as to form a virtual space that appears to have no ceiling (S11: configure virtual space).
  • In the virtual three-dimensional space in which the three-dimensional model data are arranged (a space whose coordinate system corresponds to the target space), the image processing apparatus 10 renders the image data viewed, in the visual field direction determined in step S4, from the rendering camera position coordinates RC determined in step S4 (the coordinate information corresponding to each of the user's left and right eyes), that is, the image data to be presented to each of the left eye and the right eye (S12: rendering). The image processing apparatus 10 then outputs the pair of image data thus rendered as a stereoscopic image (S13).
  • the display device 40 causes the image data for the left eye to enter the left eye of the user through the left-eye optical element 42L as an image for the left eye. Further, the display device 40 causes the image data for the right eye to enter the right eye of the user via the optical element for the right eye 42R as a video image for the right eye.
  • the user visually recognizes an image in a virtual space in which an object in the real space is changed to a virtual object (such as an operation panel, a rock, or a box). Thereafter, the image processing apparatus 10 returns to the process S3 and repeats the process.
  • The processing for updating the object buffer may also be performed when steps S6 to S7 for displaying an image of the real space are executed.
  • According to the present embodiment, the rendering camera is placed at a position corresponding to the position of the user's eyes, and the presented image is a rendering that reflects the real space (image data in which objects in the real space are replaced with virtual objects), so an image can be displayed while giving the user a more accurate sense of the distance to their hands. That is, when the user reaches out to touch a virtual object in the virtual space, the user's hand reaches the corresponding object in the real space.
  • The left-eye and right-eye image data to be generated may also be produced by reconstructing the target space from the real-space image data: textures extracted from the real-space image data are pasted onto virtual objects built from the meshes obtained by the same processing as steps S8 to S9, these virtual objects are arranged in a virtual space corresponding to the target space, and the result is rendered in the viewing direction from the rendering camera position coordinates RC determined in step S4 and displayed as the rendered image.
  • In this way, an image viewed from an arbitrary position such as the position of the user's eyes is presented as the image representing the real space, and a rendered image viewed from the same position is presented as the image of the virtual space, so the visual field does not move substantially at the moment of switching and the sense of incongruity can be further reduced.
  • In addition, when switching from the real space to the virtual space, it is also possible, in a game or the like, to render the virtual space and output the image data obtained by the rendering so as to stage a scene in which the objects of the real space appear to be replaced little by little with the virtual objects of the virtual space.
  • the image of the real space around the user is obtained by the camera 43 provided in the display device 40 worn by the user.
  • the present embodiment is not limited to this.
  • an image captured by a camera arranged in a room where the user is located may be used.
  • In the description so far, the rendering camera position coordinates RC represent a position corresponding to the position of the user's eyes, and the visual field direction is the direction of the user's face; however, the present embodiment is not limited to this.
  • For example, the position coordinates RC of the rendering camera may be coordinates representing a position looking down on the virtual space from above (for example, the coordinate values of the X-axis and Y-axis components are the user's position coordinates and the coordinate value of the Z-axis component is a predetermined value).
  • The direction of the visual field may also be a direction behind the user (the direction opposite to the direction of the user's face).
  • the range of the field of view may be changed to produce the effect of zooming.
  • the range of view (view angle) may be arbitrarily set by a user operation.
  • the information on the position and tilt of the user's head is assumed to be obtained from the head direction sensor 441 provided in the display device 40.
  • However, the present embodiment is not limited to this. For example, the user may be imaged by a camera arranged at a known position in the room where the user is located, and the position and tilt angle of the user's head may be detected by detecting the position and orientation of a predetermined point that moves together with the user's head, for example a predetermined marker arranged in advance on the display device 40 worn by the user.
  • Since techniques for detecting the position and inclination of such a marker from image data obtained by imaging it are widely known, a detailed description is omitted here.
  • When the information on the position and tilt of the user's head is acquired by this method, the display device 40 does not necessarily need to be provided with the head direction sensor 441.
  • In the above description, the relative coordinates between the reference position and the position of the user's eyes are determined in advance.
  • However, the display device 40 may instead be provided with an eye sensor 440 that detects the position of the user's eyes, so that the relative coordinates between the reference position and the position of the user's eyes are obtained by detection.
  • An example of such an eye sensor 440 is a visible-light camera or an infrared camera arranged at a known position relative to a predetermined position of the display device 40.
  • The eye sensor 440 detects the position of the user's eye, iris, cornea, or pupil (as a position relative to the eye sensor 440), obtains relative coordinate information from a known predetermined position of the display device 40 (for example, the position of the left imaging element 430L for the left eye and the position of the right imaging element 430R for the right eye), and outputs it, for example, as information representing the center position of the iris or pupil on the surface of the eyeball.
  • If the eye sensor 440 can detect vector information representing the direction of the user's line of sight (the direction of the eyeball), the parameters of the video displayed on the display device 40 (for example, distortion correction parameters) may be changed using this line-of-sight information.
  • In this case, the image processing apparatus 10 obtains the line-of-sight direction vectors VL and VR for the user's left and right eyes in the XYZ coordinate system of the target space, and then obtains the distances rL and rR from the position coordinates RC of the rendering camera to the virtual objects lying in the directions of these vectors.
  • The image processing apparatus 10 determines rmin (a lower-limit distance) and rmax (an upper-limit distance) specifying the in-focus distance range r, for example, as follows.
  • The image processing apparatus 10 then processes the rendered image data (the stereoscopic image data) using the information specifying the depth of field computed in this way (rmin and rmax in this example).
  • Specifically, the image generation unit 24 of the image processing apparatus 10 divides the generated stereoscopic image data (each of the left-eye and right-eye image data) into a plurality of image regions (for example, image blocks of a predetermined size). For each image region, the image generation unit 24 obtains the pixel values on the depth map corresponding to the pixels of the image region (the distances from the display device 40 worn by the user), and, based on these depth-map pixel values and the information on the position of the user's eyes, obtains the distance D from the user's eyes to the object imaged at the pixels of the image region (for example, at the center pixel of the image region). If rmin ≤ D ≤ rmax, the image generation unit 24 does nothing to that image region.
  • If D < rmin, the image generation unit 24 applies to the pixels of the image region a blurring process (for example, a Gaussian filter) with a strength corresponding to the magnitude of rmin - D.
  • If D > rmax, the image generation unit 24 applies to the pixels of the image region a blurring process (again, for example, a Gaussian filter) with a strength corresponding to the magnitude of D - rmax, making the image appear out of focus. A minimal sketch of this per-region blur follows.
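  • The per-region blur described above might be sketched as follows; the block size, the mapping from distance excess to blur strength, and the use of OpenCV's GaussianBlur are assumptions made for illustration.

```python
import cv2
import numpy as np

def apply_depth_of_field(image: np.ndarray, region_depths: np.ndarray,
                         rmin: float, rmax: float, block: int = 32) -> np.ndarray:
    """Blur image blocks whose distance D falls outside [rmin, rmax].

    region_depths: distance D from the user's eyes for each block, with shape
                   (image rows // block, image cols // block).
    """
    out = image.copy()
    for by in range(region_depths.shape[0]):
        for bx in range(region_depths.shape[1]):
            d = float(region_depths[by, bx])
            if rmin <= d <= rmax:
                continue                         # in focus: leave the block alone
            # Blur strength grows with how far D lies outside the in-focus range.
            excess = (rmin - d) if d < rmin else (d - rmax)
            sigma = min(8.0, 1.0 + 4.0 * excess)
            y, x = by * block, bx * block
            roi = out[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = cv2.GaussianBlur(roi, (0, 0), sigma)
    return out
```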
  • In this way, a stereoscopic image having the designated depth of field can be generated, whether for the viewpoint-converted real-space image based on the acquired real-space image or for the virtual-space image obtained by rendering.
  • As a result, image portions outside the distance range at which the user is gazing are blurred, and a more natural image is provided to the user.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Processing Or Creating Images (AREA)
  • Position Input By Displaying (AREA)
  • Controls And Circuits For Display Device (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention is an image processing apparatus which is connected to a display device worn on the user's head during use, acquires an image of the real space around the user, determines visual field information, and generates, on the basis of the acquired real-space image, an image of the visual field specified by the determined information. The image processing apparatus outputs the generated image to the display device.
PCT/JP2017/005742 2016-05-02 2017-02-16 Dispositif de traitement d'images WO2017191703A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2018515396A JPWO2017191703A1 (ja) 2016-05-02 2017-02-16 画像処理装置

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016-092561 2016-05-02
JP2016092561 2016-05-02

Publications (1)

Publication Number Publication Date
WO2017191703A1 true WO2017191703A1 (fr) 2017-11-09

Family

ID=60203728

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/005742 WO2017191703A1 (fr) 2016-05-02 2017-02-16 Dispositif de traitement d'images

Country Status (2)

Country Link
JP (1) JPWO2017191703A1 (fr)
WO (1) WO2017191703A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021200270A1 (fr) * 2020-03-31 2021-10-07 ソニーグループ株式会社 Dispositif de traitement d'informations et procédé de traitement d'informations
WO2024004321A1 (fr) * 2022-06-28 2024-01-04 キヤノン株式会社 Dispositif de traitement d'image, dispositif de visiocasque, procédé de commande pour dispositif de traitement d'image, procédé de commande pour dispositif de visiocasque, et programme

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004038470A (ja) * 2002-07-02 2004-02-05 Canon Inc 複合現実感装置および情報処理方法
JP2005327204A (ja) * 2004-05-17 2005-11-24 Canon Inc 画像合成システムおよび画像合成方法、および画像合成装置
JP2009271732A (ja) * 2008-05-07 2009-11-19 Sony Corp 情報提示装置及び情報提示方法、撮像装置、並びにコンピュータ・プログラム
JP2012133471A (ja) * 2010-12-20 2012-07-12 Kokusai Kogyo Co Ltd 画像合成装置、画像合成プログラム、及び画像合成システム
JP2014511512A (ja) * 2010-12-17 2014-05-15 マイクロソフト コーポレーション 拡張現実ディスプレイ用最適焦点エリア
WO2016048658A1 (fr) * 2014-09-25 2016-03-31 Pcms Holdings, Inc. Système et procédé de création automatisée de contenu visuel

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3631151B2 (ja) * 2000-11-30 2005-03-23 キヤノン株式会社 情報処理装置、複合現実感提示装置及びその方法並びに記憶媒体
JP2013218535A (ja) * 2012-04-09 2013-10-24 Crescent Inc 三次元モデリングされたcg画像内にcg画像化された手指を表示する方法及び装置、並びに三次元モデリングされたcg画像を表示する広視野角ヘッドマウントディスプレイ装置
JP6294054B2 (ja) * 2013-11-19 2018-03-14 株式会社Nttドコモ 映像表示装置、映像提示方法及びプログラム

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004038470A (ja) * 2002-07-02 2004-02-05 Canon Inc 複合現実感装置および情報処理方法
JP2005327204A (ja) * 2004-05-17 2005-11-24 Canon Inc 画像合成システムおよび画像合成方法、および画像合成装置
JP2009271732A (ja) * 2008-05-07 2009-11-19 Sony Corp 情報提示装置及び情報提示方法、撮像装置、並びにコンピュータ・プログラム
JP2014511512A (ja) * 2010-12-17 2014-05-15 マイクロソフト コーポレーション 拡張現実ディスプレイ用最適焦点エリア
JP2012133471A (ja) * 2010-12-20 2012-07-12 Kokusai Kogyo Co Ltd 画像合成装置、画像合成プログラム、及び画像合成システム
WO2016048658A1 (fr) * 2014-09-25 2016-03-31 Pcms Holdings, Inc. Système et procédé de création automatisée de contenu visuel

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TOSHIKAZU OSHIMA ET AL.: "RV-Border Guards : A Multi-player Mixed Reality Entertainment", TRANSACTIONS OF THE VIRTUAL REALITY SOCIETY OF JAPAN, vol. 4, no. 4, 31 December 1999 (1999-12-31), pages 699 - 705, XP055410021, ISSN: 1344-011X *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021200270A1 (fr) * 2020-03-31 2021-10-07 ソニーグループ株式会社 Dispositif de traitement d'informations et procédé de traitement d'informations
WO2024004321A1 (fr) * 2022-06-28 2024-01-04 キヤノン株式会社 Dispositif de traitement d'image, dispositif de visiocasque, procédé de commande pour dispositif de traitement d'image, procédé de commande pour dispositif de visiocasque, et programme

Also Published As

Publication number Publication date
JPWO2017191703A1 (ja) 2018-10-04

Similar Documents

Publication Publication Date Title
JP6933727B2 (ja) 画像処理装置、画像処理方法、およびプログラム
JP5996814B1 (ja) 仮想空間の画像をヘッドマウントディスプレイに提供する方法及びプログラム
CN107209950B (zh) 从现实世界材料自动生成虚拟材料
US11184597B2 (en) Information processing device, image generation method, and head-mounted display
US9106906B2 (en) Image generation system, image generation method, and information storage medium
JP6899875B2 (ja) 情報処理装置、映像表示システム、情報処理装置の制御方法、及びプログラム
US20180246331A1 (en) Helmet-mounted display, visual field calibration method thereof, and mixed reality display system
US10607398B2 (en) Display control method and system for executing the display control method
US20230156176A1 (en) Head mounted display apparatus
JP6687751B2 (ja) 画像表示システム、画像表示装置、その制御方法、及びプログラム
JP6682624B2 (ja) 画像処理装置
JP6649010B2 (ja) 情報処理装置
WO2017191703A1 (fr) Dispositif de traitement d'images
JP6591667B2 (ja) 画像処理システム、画像処理装置、及びプログラム
JP2019102828A (ja) 画像処理装置、画像処理方法、及び画像処理プログラム
WO2017163649A1 (fr) Dispositif de traitement d'image
JP6613099B2 (ja) 仮想現実空間を立体的に表示するためのプログラム、コンピュータ及びヘッドマウントディスプレイシステム
WO2018173206A1 (fr) Dispositif de traitement d'informations
JP2017142769A (ja) 仮想空間の画像をヘッドマウントディスプレイに提供する方法及びプログラム

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018515396

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17792636

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17792636

Country of ref document: EP

Kind code of ref document: A1