WO2018016316A1 - Image processing device, image processing method, program, and telepresence system - Google Patents

Image processing device, image processing method, program, and telepresence system

Info

Publication number
WO2018016316A1
WO2018016316A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
user
images
unit
hand
Prior art date
Application number
PCT/JP2017/024571
Other languages
English (en)
Japanese (ja)
Inventor
穎 陸
祐介 阪井
雅人 赤尾
Original Assignee
ソニー株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニー株式会社
Publication of WO2018016316A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 - General purpose image data processing
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working

Definitions

  • The present disclosure relates to an image processing device, an image processing method, a program, and a telepresence system, and more particularly, to an image processing device, an image processing method, a program, and a telepresence system that enable communication with less discomfort.
  • Conventionally, the imaging device that captures the user on the other side and the display device that displays the user on the other side are arranged at different positions, so the users' lines of sight do not match, which can cause a sense of incongruity.
  • For example, Patent Document 1 proposes a camera-integrated display device in which the users' lines of sight are matched by imaging with a camera provided on the back side of the display main body through a hole provided in the polarizing plate of the liquid crystal display unit.
  • This disclosure has been made in view of such a situation, and is intended to enable communication without a sense of incongruity.
  • An image processing apparatus according to one aspect of the present disclosure includes: an image acquisition unit that acquires a plurality of images obtained by capturing a user from a plurality of viewpoints; an image generation unit that, based on the plurality of images, generates a display image in which a part of the user is appropriately displayed even when that part is outside the angle of view at the time the plurality of images were captured; and an image transmission unit that transmits the display image generated by the image generation unit.
  • An image processing method or program according to one aspect of the present disclosure includes the steps of acquiring a plurality of images obtained by capturing a user from a plurality of viewpoints, generating, based on the plurality of images, a display image in which a part of the user is appropriately displayed even when that part is outside the angle of view at the time the plurality of images were captured, and transmitting the generated display image.
  • A telepresence system according to one aspect of the present disclosure includes: a display device that displays a display image transmitted from a counterpart; a plurality of imaging devices that are arranged around the display device and capture a user from a plurality of viewpoints; an image acquisition unit that acquires the images captured by each of the plurality of imaging devices; an image generation unit that, based on the plurality of images, generates a display image in which a part of the user is appropriately displayed even when that part is outside the angle of view at the time the plurality of images were captured; and an image transmission unit that transmits the display image generated by the image generation unit.
  • In one aspect of the present disclosure, a plurality of images obtained by capturing a user from a plurality of viewpoints are acquired, a display image is generated based on the plurality of images so that a part of the user is appropriately displayed even when that part is outside the angle of view at the time the plurality of images were captured, and the generated display image is transmitted.
  • FIG. 1 is a perspective view illustrating a schematic configuration of a telepresence system to which the present technology is applied. FIG. 2 is a block diagram showing a configuration example of the first embodiment of the image processing apparatus. FIG. 3 is a flowchart explaining the depth generation processing.
  • FIG. 18 is a block diagram illustrating a configuration example of an embodiment of a computer to which the present technology is applied.
  • FIG. 1 is a perspective view showing a schematic configuration of a telepresence system to which the present technology is applied.
  • the telepresence system 11 includes a display device 12, imaging devices 13a to 13d, and an image processing device 14.
  • the telepresence system 11 can provide, for example, a communication experience in which two users at remote locations are facing each other.
  • In the following, the user in front of the display device 12 shown in FIG. 1 is referred to as the own-side user, and the user displayed on the display device 12 is referred to as the other-side (partner) user.
  • the telepresence system 11 is provided on both the own side and the other side, and the telepresence systems 11 on the own side and the other side can communicate with each other via a network.
  • The display device 12 is connected to a communication device (not shown) that can communicate with the telepresence system 11 on the other side, and displays an image transmitted from the telepresence system 11 on the other side, so that the other-side user is displayed on its screen.
  • the imaging devices 13a to 13d are arranged around the display device 12.
  • the imaging devices 13a to 13d image the user from the viewpoints of the respective arrangement positions, and supply images (RGB color images) obtained by the imaging to the image processing device 14.
  • In FIG. 1, the four imaging devices 13a to 13d are arranged in a 2 × 2 layout (two vertically by two horizontally), but the arrangement positions are not limited to the example shown in FIG. 1. As long as the user can be imaged from a plurality of viewpoints, the number of imaging devices 13 may be three or less, or five or more.
  • The image processing device 14 uses the four images supplied from the imaging devices 13a to 13d to perform image processing that generates an image viewed from a virtual viewpoint different from the viewpoints of the imaging devices 13a to 13d, and transmits the result to the telepresence system 11 on the other side. For example, during telepresence, the image processing apparatus 14 can set the virtual viewpoint (viewpoint P, described later with reference to FIG. 5) so that the other-side user does not feel uncomfortable when viewing the own-side user. The detailed configuration of the image processing apparatus 14 will be described with reference to FIG. 2.
  • For example, when the users perform communication in which they put their hands together through the telepresence system 11, the hand approaching the display device 12 may move outside the angle of view that can be imaged by the imaging devices 13a to 13d. Therefore, the image processing device 14 performs image processing so that an image in which the palms can be put together at the center of the display device 12 is displayed.
  • FIG. 2 is a block diagram illustrating a configuration example of the first embodiment of the image processing apparatus 14.
  • the image processing apparatus 14 includes an image acquisition unit 21, a depth generation unit 22, an image generation unit 23, and an image transmission unit 24.
  • The image acquisition unit 21 is connected to the imaging devices 13a to 13d in FIG. 1 by wire or wirelessly, acquires the four images obtained by capturing the user from the viewpoints of the imaging devices 13a to 13d, and supplies them to the depth generation unit 22.
  • The depth generation unit 22 uses the four images supplied from the image acquisition unit 21 to generate, for each image, a depth representing the depth at each coordinate in the plane direction of that image, and supplies the depths to the image generation unit 23. Specifically, the depth generation unit 22 obtains a stereo depth for each image by a stereo matching method using two images that are adjacent vertically or horizontally, and then combines the vertical-direction and horizontal-direction stereo depths of each image, so that the depths for the four images can finally be generated.
  • the image generation unit 23 includes a base image generation unit 31, a data recording unit 32, and an image composition unit 33.
  • When the users of the telepresence system 11 perform communication in which they join hands and the user's hand is outside the angle of view that can be captured by the imaging devices 13a to 13d, the image generation unit 23 generates an image in which the hand can be appropriately displayed on the display device 12 of the telepresence system 11 on the other side.
  • The base image generation unit 31 either selects one of the four images acquired by the image acquisition unit 21 to be displayed on the display device 12 of the telepresence system 11 on the other side, or generates an image of the user viewed from the virtual viewpoint as a base image for display on the display device 12 of the telepresence system 11 on the other side.
  • The base image generation unit 31 can generate such a base image, in which the user is viewed from the virtual viewpoint, based on the four images acquired by the image acquisition unit 21 and the depths generated by the depth generation unit 22 for those four images. Note that the base image generation processing of the base image generation unit 31 will be described later.
  • In the data recording unit 32, a 3D model of a hand, in which the shape of the user's hand is formed three-dimensionally and the texture of the hand is pasted on it, is recorded in advance.
  • That is, when the users of the telepresence system 11 perform communication in which they put their hands together, an image in which the palm of the other user can be seen is displayed on each other's display device 12, and therefore a 3D model of the hand whose palm can be displayed is created in advance. Note that the 3D model of the hand may be created so as to include, for example, the part from the hand to the elbow rather than only the hand, or a hand 3D model used in existing computer graphics may be used.
  • When the user's hand approaches the display device 12 and is so close that a part of the hand is outside the angle of view that can be captured by the imaging devices 13a to 13d (for example, state C in FIG. 4 described later), the image compositing unit 33 combines the user's hand with the base image.
  • That is, so that the user's hand is appropriately displayed on the display device 12 of the telepresence system 11 on the other side, the image synthesizing unit 33 synthesizes, into the base image viewed from the virtual viewpoint, an image of the user's palm based on the 3D model of the hand recorded in the data recording unit 32. Note that the image composition processing of the image composition unit 33 will be described later.
  • The image transmission unit 24 is connected to a communication device (not shown) that can communicate via the network with the telepresence system 11 on the other side, and transmits the image generated by the image generation unit 23 as a display image to be displayed on the telepresence system 11 on the other side.
  • The image processing device 14 configured as described above can cause an image in which the user's hand, based on the 3D model of the hand, is combined with the base image generated from the images captured by the imaging devices 13a to 13d to be displayed on the display device 12 on the other side. Thereby, the telepresence system 11 enables communication in which the users put their hands together without a sense of incongruity.
  • Next, the depth generation process executed in the depth generation unit 22 will be described with reference to the flowchart shown in FIG. 3.
  • When an image a captured by the imaging device 13a, an image b captured by the imaging device 13b, an image c captured by the imaging device 13c, and an image d captured by the imaging device 13d are supplied from the image acquisition unit 21 to the depth generation unit 22, the process is started.
  • In step S11, the depth generation unit 22 uses the image a captured by the imaging device 13a and the image b captured by the imaging device 13b to calculate, by the stereo matching method, the first stereo depth a1 of the image a and the first stereo depth b1 of the image b.
  • In step S12, the depth generation unit 22 uses the image c captured by the imaging device 13c and the image d captured by the imaging device 13d to calculate, by the stereo matching method, the first stereo depth c1 of the image c and the first stereo depth d1 of the image d.
  • In step S13, the depth generation unit 22 uses the image a captured by the imaging device 13a and the image c captured by the imaging device 13c to calculate, by the stereo matching method, the second stereo depth a2 of the image a and the second stereo depth c2 of the image c.
  • In step S14, the depth generation unit 22 uses the image b captured by the imaging device 13b and the image d captured by the imaging device 13d to calculate, by the stereo matching method, the second stereo depth b2 of the image b and the second stereo depth d2 of the image d.
  • In step S15, the depth generation unit 22 combines the first stereo depth a1 of the image a calculated in step S11 and the second stereo depth a2 of the image a calculated in step S13, thereby obtaining the depth a for the image a. Similarly, the depth generation unit 22 combines the first stereo depth b1 of the image b calculated in step S11 and the second stereo depth b2 of the image b calculated in step S14, thereby obtaining the depth b for the image b. In the same way, the depth generation unit 22 obtains the depth c for the image c and the depth d for the image d, and the depth generation process ends.
  • the depth generation unit 22 can generate a depth for each of the four images obtained by capturing the user from the viewpoints of the imaging devices 13a to 13d.
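  • As a rough illustration of steps S11 to S15, the per-pair stereo depth computation and the fusion of the horizontal and vertical estimates could be sketched as follows. This is a minimal sketch, assuming rectified image pairs, known focal length and baselines, and OpenCV's block matcher as a stand-in for the unspecified stereo matching method; all function names and parameter values are illustrative.

```python
import cv2
import numpy as np

def stereo_depth(img_ref, img_other, focal_px, baseline_m, num_disp=64, block=15):
    """Estimate a depth map for img_ref by stereo matching against img_other.
    For a vertically arranged pair, both images would first be rotated so that
    the baseline becomes horizontal, since the block matcher assumes
    horizontal epipolar lines."""
    matcher = cv2.StereoBM_create(numDisparities=num_disp, blockSize=block)
    gray_ref = cv2.cvtColor(img_ref, cv2.COLOR_BGR2GRAY)
    gray_other = cv2.cvtColor(img_other, cv2.COLOR_BGR2GRAY)
    disparity = matcher.compute(gray_ref, gray_other).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan               # mark unmatched pixels
    return focal_px * baseline_m / disparity          # depth = f * B / d

def fuse_depths(depth_first, depth_second):
    """Combine the first (horizontal-pair) and second (vertical-pair) stereo
    depths of one image, as in step S15; simple averaging of the valid
    estimates is assumed here."""
    return np.nanmean(np.dstack([depth_first, depth_second]), axis=2)

# For image a (imaging device 13a), paired with 13b horizontally and 13c vertically:
# depth_a1 = stereo_depth(img_a, img_b, f_px, baseline_ab)
# depth_a2 = stereo_depth(img_a, img_c, f_px, baseline_ac)
# depth_a  = fuse_depths(depth_a1, depth_a2)
```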
  • FIG. 4 shows an example of four images a to d captured by the imaging devices 13a to 13d in three states according to the distance from the display device 12 to the user's hand.
  • the base image generation unit 31 determines whether or not to generate a base image viewed from the virtual viewpoint according to the distance from the display device 12 to the user's hand.
  • For example, when the user's hand is sufficiently far from the display device 12 (hereinafter referred to as state A as appropriate), the base image generation unit 31 determines not to generate a base image, and selects any one of the four images a to d to be displayed on the display device 12 of the telepresence system 11 on the other side.
  • On the other hand, when the distance from the display device 12 to the user's hand becomes less than a predetermined first distance and the user's hand appears in the periphery of the screen (hereinafter referred to as state B as appropriate), the base image generation unit 31 determines to generate a base image, and generates a base image in which the user is viewed from the virtual viewpoint, based on the four images acquired by the image acquisition unit 21 and the depths generated by the depth generation unit 22 for the four images.
  • Furthermore, when the user's hand moves closer to the center of the display device 12 than in state B, the user's hand approaches the outside of the angle of view that can be imaged by the imaging devices 13a to 13d. When the distance from the display device 12 to the user's hand becomes less than a predetermined second distance, a part of the user's hand is no longer shown in the four images a to d, as shown in FIG. 4C. In this state, in which the hand is close enough that a part of the user's hand cannot be imaged (hereinafter referred to as state C as appropriate), a base image lacking a part of the user's hand would be generated.
  • If such a base image were displayed as it is, the user on the other side would feel even more uncomfortable than in state B.
  • Therefore, in state C, the base image generation unit 31 generates a base image viewed from the virtual viewpoint, and the user's hand is then synthesized into the generated base image so as to compensate for the missing part of the user's hand.
  • the base image generation unit 31 determines whether the state is the state A, the state B, or the state C according to the distance from the display device 12 to the user's hand.
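  • The decision between states A, B, and C based on the display-to-hand distance can be written as a small helper; the two thresholds correspond to the "first distance" and "second distance" mentioned above, and the numeric default values are assumptions for illustration only.

```python
from enum import Enum

class HandState(Enum):
    A = "far"        # hand sufficiently far from the display device 12
    B = "periphery"  # hand visible in the periphery of the captured images
    C = "too_close"  # part of the hand outside the capturable angle of view

def classify_hand_state(hand_distance_m, first_distance_m=1.0, second_distance_m=0.3):
    """Map the distance from the display device to the user's hand onto
    state A, B, or C; threshold values are illustrative."""
    if hand_distance_m >= first_distance_m:
        return HandState.A
    if hand_distance_m >= second_distance_m:
        return HandState.B
    return HandState.C
```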
  • When the base image generation unit 31 determines that state A has changed to state B, it performs a process of switching the image to be displayed on the display device 12 of the telepresence system 11 on the other side from any one of the images a to d to the base image viewed from the virtual viewpoint. Note that the base image generation unit 31 may generate a base image even in state A.
  • For example, the global coordinate system is set with the center of the display device 12 as the origin O, the direction orthogonal to the surface of the display device 12 as the Z axis, the horizontal direction along the surface of the display device 12 as the X axis, and the vertical direction along the surface of the display device 12 as the Y axis.
  • For example, assuming that the height of the other-side user is 150 cm and the height of the display device 12 is L, the center of the viewpoint P of the partner user can be set at the coordinates (0, 150 - L/2, -0.5) in the global coordinate system.
  • the x-axis, y-axis, and z-axis of the local coordinate system of the viewpoint P are set to be parallel to the X-axis, Y-axis, and Z-axis of the global coordinate system, respectively.
  • The base image generation unit 31 uses the viewpoint P of the other-side user as the virtual viewpoint when generating the base image, so that even when the user's hand is very close to the display device 12, it is possible to generate a base image that does not make the other user feel uncomfortable.
  • Note that the telepresence system 11 can, at any time, obtain user information that specifies the user's viewpoint position, such as the distance from the display device 12 to the user and the height of the user's viewpoint, based on the images captured by the imaging devices 13a to 13d, and transmit it over the network. The base image generation unit 31 can then determine the coordinates of the virtual viewpoint P based on the user information received from the other side.
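  • Setting the virtual viewpoint from the partner-side user information amounts to a small piece of arithmetic; the following sketch simply reproduces the example above, with the caveat that the text mixes a height given in centimetres with a 0.5 offset along Z, so in practice consistent units (and the actual measured viewpoint height and distance) would be used.

```python
def virtual_viewpoint(display_height, partner_eye_height=150.0, partner_distance=0.5):
    """Coordinates of the virtual viewpoint P in the global coordinate system
    of FIG. 5 (origin at the display centre, X horizontal, Y vertical,
    Z orthogonal to the screen). The values would normally come from the
    partner-side user information received over the network."""
    return (0.0, partner_eye_height - display_height / 2.0, -partner_distance)

# Example from the text: a 150 cm partner and a display of height L
# gives P = (0, 150 - L/2, -0.5).
```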
  • FIG. 6 is a flowchart illustrating a base image generation process in which the base image generation unit 31 generates a base image.
  • In step S21, the base image generation unit 31 converts the four images acquired by the image acquisition unit 21 and the depths generated by the depth generation unit 22 for these four images into a point cloud in the global coordinate system. Thereby, a single point cloud that three-dimensionally represents, as a set of points, the surface of the user as seen from the imaging devices 13a to 13d side is synthesized.
  • In step S22, based on the point cloud in the global coordinate system synthesized in step S21, the base image generation unit 31 generates, as the base image, an image of the own-side user as viewed from the virtual viewpoint P (the other user's viewpoint) shown in FIG. 5.
  • In this way, the base image generation unit 31 can generate an image viewed from the virtual viewpoint P based on the four images obtained by capturing the user from the viewpoints of the imaging devices 13a to 13d and the depths of those images.
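  • Steps S21 and S22 can be sketched as follows, assuming pinhole intrinsics for each imaging device and device poses known from calibration; the function names, the principal point placed at the image centre, and the simple z-buffer are assumptions for illustration, not the method as actually claimed.

```python
import numpy as np

def depth_to_points(depth, color, fx, fy, cx, cy, cam_to_global):
    """Back-project one depth map into a coloured point cloud in the global
    coordinate system (step S21); cam_to_global is the 4x4 pose of the
    imaging device."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    pts_global = (cam_to_global @ pts_cam.T).T[:, :3]
    return pts_global, color[valid]

def render_base_image(points, colors, global_to_viewpoint, f, width, height):
    """Project the merged point cloud into the virtual viewpoint P (step S22),
    keeping the nearest point per pixel with a simple z-buffer."""
    pts = (global_to_viewpoint @ np.hstack([points, np.ones((len(points), 1))]).T).T
    z = pts[:, 2]
    front = z > 0
    u = np.round(f * pts[front, 0] / z[front] + width / 2).astype(int)
    v = np.round(f * pts[front, 1] / z[front] + height / 2).astype(int)
    img = np.zeros((height, width, 3), dtype=np.uint8)
    zbuf = np.full((height, width), np.inf)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for uu, vv, zz, col in zip(u[inside], v[inside], z[front][inside], colors[front][inside]):
        if zz < zbuf[vv, uu]:
            zbuf[vv, uu] = zz
            img[vv, uu] = col
    return img
```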
  • In the case of state A described above with reference to FIG. 4, that is, when the user's hand is sufficiently far from the display device 12, the image composition unit 33 outputs the image selected by the base image generation unit 31 as it is. Further, in the case of state B described above with reference to FIG. 4, that is, when the user's hand is visible in the periphery of the screen, the image composition unit 33 outputs the base image generated by the base image generation unit 31 as it is.
  • On the other hand, in the case of state C, the image synthesizing unit 33 performs image composition processing that synthesizes the user's hand into the base image, using the base image generated by the base image generating unit 31, the point cloud in the global coordinate system, and the 3D model of the hand recorded in the data recording unit 32.
  • When combining the user's hand with the base image, the image combining unit 33 first estimates the depth Z0 at which the 3D model of the hand is to be placed.
  • For example, FIG. 7A shows a state in which the user reaches for the display device 12 in state B, and FIG. 7B shows a state in which the user reaches for the display device 12 in state C. Assuming that the relative distance from the user's body to the hand does not change when state B changes to state C, the image compositing unit 33 can refer to the depth difference L1 between the body and the hand at the time of state B and infer the hand depth Z0 from the body depth Zs at the time of state C.
  • Specifically, the image composition unit 33 detects the region in which the user's body is shown and the region in which the user's hand is shown from the image acquired by the image acquisition unit 21, and calculates the average depth of each region from the depth of the image generated by the depth generation unit 22. For example, learning performed in advance can be used to detect the regions in which the user's body and hand are shown. Then, the image composition unit 33 obtains the depth difference L1 by calculating the difference between the average depth of the region in which the user's body is shown and the average depth of the region in which the user's hand is shown, and records it in the data recording unit 32.
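  • A minimal sketch of this estimate, assuming the body and hand regions come from a detector trained in advance as mentioned above:

```python
import numpy as np

def record_depth_offset(depth, body_mask, hand_mask):
    """In state B, record the depth difference L1 between the region showing
    the body and the region showing the hand."""
    return float(np.nanmean(depth[body_mask]) - np.nanmean(depth[hand_mask]))  # L1

def estimate_hand_depth(body_depth_state_c, depth_offset_l1):
    """In state C, infer the depth Z0 at which the 3D model of the hand is
    placed, assuming the body-to-hand distance is unchanged since state B."""
    return body_depth_state_c - depth_offset_l1   # Z0 = Zs - L1
```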
  • Alternatively, the depth Z0 at which the 3D model of the hand is placed may be estimated using a depth camera 15. In this case, the telepresence system 11 is configured to include the depth camera 15 on the ceiling of the room in which the display device 12 is arranged. The image composition unit 33 can then estimate the depth Z0 at which the 3D model of the hand is placed, based on the distance from the display device 12 to the user's hand measured by the depth camera 15.
  • FIG. 9 shows an example of the base image generated by the base image generation unit 31 in state C. In state C, a part of the user's hand is outside the angle of view that can be captured by the imaging devices 13a to 13d, so images in which that part of the user's hand is not shown are acquired, and as a result a base image lacking a part of the hand is generated.
  • For example, as shown in equation (1), the image compositing unit 33 obtains the center position (u_h, v_h) of the hand region lacking in the base image based on the luminance I(u_i, v_i) of the pixels (u_i, v_i) in the base image.
  • Then, based on the center position (u_h, v_h) of the hand region in the base image generated by the base image generation unit 31 and the depth Z0 of the 3D model of the hand obtained as described above with reference to FIG. 7, the image composition unit 33 projects the 3D model of the hand onto the base image.
  • The local coordinate system T of the 3D model of the hand can be defined, for example, as a right-handed coordinate system whose origin (X_T0, Y_T0, Z_T0) is the center of gravity of the hand, whose z axis is the direction opposite to the palm, and whose y axis is the direction of the middle finger.
  • The 3D model of the hand is projected onto the base image so that the center of gravity of the 3D model of the hand is projected onto the center position (u_h, v_h) of the hand region in the base image viewed from the viewpoint P shown in FIG. 5.
  • That is, the image composition unit 33 calculates the coordinates (X_T0, Y_T0, Z_T0) of the center of gravity of the 3D model of the hand in the coordinate system of the viewpoint P based on equation (2).
  • f in Expression (2) is a focal length when the base image of the viewpoint P is generated.
  • It is assumed that the x, y, and z axes of the local coordinate system T of the 3D model of the hand and the x, y, and z axes of the coordinate system of the virtual viewpoint P are parallel to each other. Thereby, when each point in the local coordinate system T of the 3D model of the hand is converted into the coordinate system of the virtual viewpoint P, no rotation is necessary and only a translation is required.
  • Accordingly, the image composition unit 33 converts the coordinates (X_Ti, Y_Ti, Z_Ti) of each point i in the local coordinate system T of the 3D model of the hand into the coordinates (X_Pi, Y_Pi, Z_Pi) in the coordinate system of the virtual viewpoint P according to equation (3).
  • Then, the image composition unit 33 calculates equation (4) to obtain the pixel (u_i, v_i) at which each point i of the 3D model of the hand is projected from the coordinate system of the virtual viewpoint P onto the base image. In the depth of the base image, the depth at the pixel (u_i, v_i) is Z_Pi. Note that multiple points of the 3D model of the hand may be projected onto the same pixel of the base image; in this case, the point with the smallest depth among them is selected as the one to be projected onto the base image.
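  • The translation-plus-projection described by equations (3) and (4), together with the nearest-point rule for pixels that receive several model points, could be sketched as follows; representing the textured model as per-point colours and placing the principal point at the image centre are simplifying assumptions.

```python
import numpy as np

def project_hand_model(model_points, model_colors, center_in_p, f, base_image, base_depth):
    """Project the hand 3D model into the base image. model_points are given in
    the local coordinate system T of the model; center_in_p is the model origin
    (X_T0, Y_T0, Z_T0) expressed in the viewpoint-P system. Because the axes of
    T and P are assumed parallel, only a translation is applied (equation (3));
    each point is then projected with a pinhole model (equation (4)), and when
    several points land on the same pixel the smaller depth wins."""
    pts_p = np.asarray(model_points) + np.asarray(center_in_p)
    h, w = base_depth.shape
    out_img, out_depth = base_image.copy(), base_depth.copy()
    for (x, y, z), col in zip(pts_p, model_colors):
        if z <= 0:
            continue
        u = int(round(f * x / z + w / 2))
        v = int(round(f * y / z + h / 2))
        if 0 <= u < w and 0 <= v < h and z < out_depth[v, u]:
            out_depth[v, u] = z          # keep the point with the smaller depth
            out_img[v, u] = col
    return out_img, out_depth
```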
  • FIG. 11 is a flowchart illustrating an image composition process in which the image composition unit 33 composes the user's hand with the base image.
  • In step S31, for example as described above with reference to FIG. 7, the image composition unit 33 estimates the depth Z0 at which the 3D model of the hand is to be placed.
  • In step S32, the image composition unit 33 estimates the center position of the hand region lacking in the base image generated by the base image generation unit 31, that is, the center position (u_h, v_h) described above.
  • In step S33, the image composition unit 33 projects the 3D model of the hand onto the base image based on the depth Z0 for placing the 3D model of the hand estimated in step S31 and the center position (u_h, v_h) of the hand region in the base image estimated in step S32. Thereby, the image composition unit 33 can generate an image in which the user's hand can be seen, by compositing the hand into the base image in which a part of the user's hand was missing.
  • In this way, the image composition unit 33 can perform image composition so that the user's hand is appropriately displayed, based on the depth at which the 3D model of the hand is placed and the center position of the hand region missing in the base image.
  • In step S41, the image acquisition unit 21 acquires the four images obtained by capturing the user from the viewpoints of the imaging devices 13a to 13d, and supplies them to the depth generation unit 22.
  • In step S42, the depth generation unit 22 performs the depth generation processing (the flowchart of FIG. 3 described above) that generates the depths for the four images supplied from the image acquisition unit 21 in step S41. The depth generation unit 22 then supplies the four images and the depth corresponding to each image to the base image generation unit 31 and the image synthesis unit 33.
  • In step S43, the base image generation unit 31 obtains the distance from the display device 12 to the user's hand based on the depths supplied in step S42, and determines whether the user's hand is sufficiently far away that it does not cause discomfort as viewed from the other user.
  • When the base image generation unit 31 determines in step S43 that the user's hand is sufficiently far away (state A in FIG. 4), the process proceeds to step S44.
  • In step S44, the base image generation unit 31 selects any one of the four images captured by the imaging devices 13a to 13d as the image to be displayed on the display device 12 of the telepresence system 11 on the other side.
  • On the other hand, when it is determined in step S43 that the user's hand is not sufficiently far away, the process proceeds to step S45, in which the base image generation unit 31 performs the base image generation processing (the flowchart of FIG. 6 described above) using the four images captured by the imaging devices 13a to 13d and the depths generated by the depth generation unit 22 in step S42.
  • In step S46, based on the distance from the display device 12 to the user's hand obtained from the depths generated by the depth generation unit 22 in step S42, the image composition unit 33 determines whether or not the hand is so close that a part of the user's hand cannot be captured.
  • If the image compositing unit 33 determines in step S46 that the hand is close enough that a part of the user's hand cannot be captured (state C in FIG. 4), the process proceeds to step S47.
  • In step S47, the image composition unit 33 performs the image composition processing (the flowchart of FIG. 11 described above) that composites the user's hand into the base image generated in step S45, based on the 3D model of the hand recorded in the data recording unit 32.
  • In step S48, the image synthesizing unit 33 supplies the image in which the user's hand has been synthesized into the base image to the image transmitting unit 24, and the image transmitting unit 24 transmits that image.
  • When step S44 is performed, the process likewise proceeds to step S48, where the image composition unit 33 supplies the image selected by the base image generation unit 31 to the image transmission unit 24, and the image transmission unit 24 transmits that image. Also, if it is determined in step S46 that the hand is not so close that a part of the user's hand cannot be captured (state B in FIG. 4), the process proceeds to step S48, where the image compositing unit 33 supplies the base image generated by the base image generation unit 31 in step S45 to the image transmission unit 24 as it is, and the image transmission unit 24 transmits the base image.
  • After step S48, the process returns to step S41, and the same processing is repeated for the next images captured by the imaging devices 13a to 13d.
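  • Put together, one pass of steps S41 to S48 reduces to a small dispatch over the three states; classify_hand_state refers to the sketch given earlier, while generate_base_image and composite_hand are hypothetical wrappers around the point-cloud rendering and hand-composition routines sketched above.

```python
def process_frame(images, depths, hand_distance_m):
    """One iteration of the flow in steps S41 to S48 (illustrative only)."""
    state = classify_hand_state(hand_distance_m)          # steps S43 / S46
    if state is HandState.A:
        return images[0]                                  # step S44: any one captured image
    base = generate_base_image(images, depths)            # step S45
    if state is HandState.B:
        return base                                       # transmit the base image as is
    return composite_hand(base, depths)                   # step S47: synthesize the hand
```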
  • As described above, based on the images captured by the imaging apparatuses 13a to 13d arranged around the display apparatus 12, the image processing apparatus 14 can realize image processing that allows the users to join hands with each other without a sense of incongruity. Thereby, the telepresence system 11 can provide more friendly communication.
  • FIG. 13 is a block diagram illustrating a configuration example of the second embodiment of the image processing apparatus 14.
  • the same reference numerals are given to the same components as those in the image processing apparatus 14 in FIG. 2, and detailed descriptions thereof are omitted.
  • Like the image processing apparatus 14 of FIG. 2, the image processing apparatus 14A includes an image acquisition unit 21, a depth generation unit 22, and an image transmission unit 24, and its image generation unit 23A includes a base image generation unit 31, a data recording unit 32, and an image composition unit 33. However, the image processing apparatus 14A differs from the image processing apparatus 14 of FIG. 2 in that the base image generation unit 31 supplies the point cloud in the global coordinate system to the data recording unit 32 to be recorded.
  • That is, when it is determined that the user's hand is in state B, in which the hand appears in the periphery of the screen, the base image generation unit 31 supplies the point cloud synthesized from the depths generated by the depth generation unit 22 for the four images to the data recording unit 32.
  • Then, when it is determined that the hand is in state C, in which the hand is so close that a part of the user's hand cannot be captured, the image composition unit 33 aligns the state-B point cloud recorded in the data recording unit 32 with the state-C point cloud and synthesizes the user's hand into the base image.
  • FIG. 14 is a diagram showing an image of aligning two point clouds.
  • For example, the image composition unit 33 aligns these two point clouds using a technique such as ICP (Iterative Closest Point), and obtains the portion that is missing from the state-C point cloud from the state-B point cloud.
  • Then, the image composition unit 33 projects the state-C point cloud, in which the hand portion has been complemented from the state-B point cloud, onto the base image viewed from the virtual viewpoint P, thereby synthesizing the user's hand.
  • Specifically, the image composition unit 33 places the state-B point cloud and the state-C point cloud in the coordinate system of the virtual viewpoint P shown in FIG. 5 and calculates the above-described equation (4), so that the point cloud of the hand portion can be projected onto the base image.
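  • As one concrete choice of the ICP technique mentioned above, Open3D's point-to-point ICP could be used to align the recorded state-B cloud to the current state-C cloud before the hand portion is taken over; the correspondence threshold and the simple merge at the end are assumptions.

```python
import numpy as np
import open3d as o3d

def complete_state_c_cloud(points_state_b, points_state_c, max_corr_dist=0.05):
    """Align the state-B point cloud to the state-C point cloud with ICP and
    merge the aligned points so that the hand portion missing in state C is
    complemented from state B (in practice only the missing hand region
    would be taken over)."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points_state_b))
    tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points_state_c))
    reg = o3d.pipelines.registration.registration_icp(
        src, tgt, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    src.transform(reg.transformation)
    aligned_b = np.asarray(src.points)
    return np.vstack([points_state_c, aligned_b])
```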
  • In this way, the image generation unit 23A does not use a 3D model of the hand created in advance, but combines the user's hand with the base image using the state-B point cloud from several frames earlier, so that communication in which the users put their hands together can be realized.
  • FIG. 15 is a block diagram illustrating a configuration example of the third embodiment of the image processing apparatus 14.
  • the same reference numerals are given to the same components as those in the image processing device 14 in FIG. 2, and detailed description thereof is omitted.
  • The image processing device 14B includes an image acquisition unit 21, a depth generation unit 22, and an image transmission unit 24, as with the image processing device 14 of FIG. 2.
  • the image generation unit 23B of the image processing apparatus 14B includes a user recognition unit 34 in addition to the base image generation unit 31, the data recording unit 32, and the image synthesis unit 33.
  • In the data recording unit 32, a 3D model of a hand is recorded for each of a plurality of users, and the features of each user are recorded as well. A user ID (identification) for identifying each user is set, and the data recording unit 32 supplies the 3D model of the hand corresponding to the user ID specified by the user recognition unit 34 to the image combining unit 33.
  • The user recognizing unit 34 detects a user feature from the image obtained from the base image generating unit 31, refers to the user features recorded in the data recording unit 32, and specifies to the data recording unit 32 the user ID corresponding to the detected user feature. For example, the user recognizing unit 34 detects a face as the user feature using a face detection method, and recognizes the user by a face recognition method using the facial feature appearing in the image and the facial feature of each user recorded in the data recording unit 32. The user recognition unit 34 can then specify to the data recording unit 32 the user ID of the user recognized as the same person from the facial features. For the face detection method and the face recognition method of the user recognition unit 34, a learning method such as deep learning can be used, for example.
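  • As a rough sketch of the user recognition unit 34, an off-the-shelf face recognition stack such as the face_recognition package could be used to match the face in the current frame against the enrolled features and return the corresponding user ID; the library choice, the distance threshold, and the data layout are all assumptions.

```python
import numpy as np
import face_recognition  # one possible off-the-shelf face detection/recognition stack

def identify_user(frame_rgb, enrolled_encodings):
    """Return the user ID whose enrolled 128-d face encoding is closest to the
    face detected in frame_rgb; enrolled_encodings maps user_id -> encoding
    recorded in advance (playing the role of the data recording unit 32)."""
    probes = face_recognition.face_encodings(frame_rgb)
    if not probes:
        return None
    ids = list(enrolled_encodings.keys())
    dists = face_recognition.face_distance([enrolled_encodings[i] for i in ids], probes[0])
    best = int(np.argmin(dists))
    return ids[best] if dists[best] < 0.6 else None

# The returned user ID would then select the corresponding hand 3D model
# recorded in the data recording unit 32.
```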
  • In this way, the image processing apparatus 14B records the 3D models of the hands of a plurality of users in the data recording unit 32 in advance, and by recognizing the user with the user recognition unit 34, the hand of the recognized user can be combined with the base image.
  • FIG. 16 is a block diagram showing a configuration example of the fourth embodiment of the image processing apparatus 14.
  • the same reference numerals are given to the same components as those in the image processing device 14 in FIG. 2, and detailed description thereof will be omitted.
  • The image processing device 14C includes the image acquisition unit 21, the depth generation unit 22, the image generation unit 23, and the image transmission unit 24, similarly to the image processing device 14 of FIG. 2. Furthermore, the image processing apparatus 14C is configured to include a hand alignment recognition unit 25.
  • The hand alignment recognizing unit 25 recognizes the user's intention to perform communication in which the users put their hands together, using any one of the depths of the four images generated by the depth generating unit 22 and the image corresponding to that depth. The hand alignment recognizing unit 25 then transmits the recognition result indicating that the user intends to perform such communication to the telepresence system 11 on the other side via a network (not shown).
  • For example, the hand alignment recognizing unit 25 recognizes the region in which the hand is shown from the image and extracts the depth of the recognized hand region. Then, referring to the extracted depth of the hand region, when it is determined that the user's hand is at or within a predetermined distance, the hand alignment recognizing unit 25 can recognize that the user intends to perform communication in which the hands are put together.
  • Alternatively, the hand alignment recognizing unit 25 records the extracted depth of the hand region for several previous frames, and determines whether or not the user's hand is approaching based on the depth of the hand region in those previous frames and the depth of the hand region in the current frame. When it is determined that the user's hand is approaching, the hand alignment recognizing unit 25 can recognize that the user intends to perform communication in which the hands are put together.
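  • Both criteria, being already within a fixed distance and steadily approaching over the last few frames, can be captured in a small recognizer like the following sketch; the threshold, the history length, and the use of the mean depth of the hand region are assumptions.

```python
from collections import deque
import numpy as np

class HandIntentRecognizer:
    """Sketch of the hand alignment recognition unit 25."""

    def __init__(self, near_threshold_m=0.4, history=5):
        self.near_threshold_m = near_threshold_m
        self.history = deque(maxlen=history)

    def update(self, hand_region_depth):
        """Feed the depth values of the detected hand region for one frame and
        return True when the intention to put hands together is recognized."""
        hand_dist = float(np.nanmean(hand_region_depth))
        self.history.append(hand_dist)
        if hand_dist <= self.near_threshold_m:
            return True                                     # hand already close enough
        if len(self.history) == self.history.maxlen:
            return bool(np.all(np.diff(self.history) < 0))  # monotonically approaching
        return False
```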
  • In this way, the image processing apparatus 14C can recognize whether or not the user intends to perform communication in which the users join hands, and when that intention is recognized, image processing suitable for the users joining hands can be performed more reliably.
  • Further, by transmitting the recognition result, feedback indicating that such communication can be performed can be provided to the other side.
  • In the above description, communication in which the users bring their hands together via the telepresence system 11 has been described as an example, but the telepresence system 11 can also perform image processing other than combining the user's hand with the base image. For example, the telepresence system 11 can perform image processing in which a 3D model of a part such as the body or the face is combined with the base image.
  • The processes described with reference to the flowcharts above do not necessarily have to be performed in chronological order in the sequence described in the flowcharts; they also include processes executed in parallel or individually (for example, parallel processing or processing by objects).
  • the program may be processed by one CPU, or may be distributedly processed by a plurality of CPUs.
  • the above-described series of processing can be executed by hardware or can be executed by software.
  • When the series of processing is executed by software, the program constituting the software is installed from a program recording medium on which the program is recorded into a computer incorporated in dedicated hardware, or into, for example, a general-purpose personal computer that can execute various functions by installing various programs.
  • FIG. 18 is a block diagram showing an example of the hardware configuration of a computer that executes the above-described series of processing by a program.
  • The computer includes a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, and an EEPROM (Electrically Erasable and Programmable Read Only Memory) 104, which are connected to one another via a bus 105.
  • the CPU 101 loads the program stored in the ROM 102 and the EEPROM 104 to the RAM 103 via the bus 105 and executes the program, thereby performing the above-described series of processing.
  • The program executed by the computer (CPU 101) can be written in the ROM 102 in advance, or can be installed or updated in the EEPROM 104 from the outside via the input/output interface 105.
  • In this specification, the term "system" represents an entire apparatus composed of a plurality of apparatuses.
  • Note that the present technology can also take the following configurations.
  • (1) An image processing apparatus including: an image acquisition unit that acquires a plurality of images obtained by capturing a user from a plurality of viewpoints; an image generation unit that generates a display image based on the plurality of images so that a part of the user is appropriately displayed even when that part is outside the angle of view at the time the plurality of images were captured; and an image transmission unit that transmits the display image generated by the image generation unit.
  • (2) The image processing device according to (1), further including a depth generation unit that generates, based on the plurality of images, a depth representing the depth in each image, wherein the image generation unit generates the display image based on the plurality of images and the corresponding depths.
  • (3) The image processing device according to (2), wherein the image generation unit includes a base image generation unit that generates a base image, which is an image of the user viewed from a virtual viewpoint different from the plurality of viewpoints, based on the plurality of images and the corresponding depths.
  • (4) The image processing apparatus according to (3), wherein the virtual viewpoint is set to the viewpoint of the counterpart user to whom the image transmission unit transmits the display image.
  • (5) The image processing apparatus according to (3) or (4), wherein a plurality of imaging devices that image the user are arranged around a display device that displays a display image transmitted from the other party to which the image transmission unit transmits the display image, and the base image generation unit generates the base image when the distance from the display device to a part of the user becomes less than a predetermined first distance.
  • (6) The image processing apparatus according to (5), wherein the image transmission unit transmits, as the display image, any one of the plurality of images obtained by capturing the user from the plurality of viewpoints.
  • (7) The image processing device according to (5) or (6), wherein the image generation unit includes an image combining unit that combines a part of the user with the base image when the distance from the display device to the part of the user is less than a second distance, at which the part of the user no longer appears in the plurality of images, and the image transmission unit transmits, as the display image, the image in which the part of the user has been combined with the base image by the image combining unit.
  • (8) The image processing device according to (7), wherein, when the distance from the display device to the part of the user is less than the first distance and greater than or equal to the second distance, the image transmission unit transmits the base image generated by the base image generation unit as the display image.
  • (9) The image processing device according to (7) or (8), wherein the image generation unit further includes a data recording unit that records a 3D model in which a part of the user is formed three-dimensionally, and the image synthesis unit synthesizes the part of the user with the base image using the 3D model recorded in the data recording unit.
  • (10) The image processing device according to any one of (7) to (9), wherein the data recording unit records the point cloud of the user used when the base image is generated while the distance from the display device to the part of the user is less than the first distance and greater than or equal to the second distance, and the image synthesizing unit synthesizes the part of the user with the base image using the point cloud recorded in the data recording unit when the distance from the display device to the part of the user is less than the second distance, at which the part of the user is not shown in the plurality of images.
  • (11) The image processing apparatus according to (9), wherein the image generation unit further includes a user recognition unit that recognizes the user shown in the image, the data recording unit records the 3D model for each of a plurality of users, and the image synthesis unit synthesizes the part of the user with the base image using the 3D model corresponding to the user recognized by the user recognition unit.
  • (12) The image processing apparatus according to any one of (1) to (11), wherein the part of the user is the user's hand used in communication in which the user's own hand is put together with the other user's hand displayed on the display device that displays the display image transmitted from the other party to which the image transmission unit transmits the display image.
  • (13) A telepresence system including: a display device that displays a display image transmitted from a counterpart; a plurality of imaging devices that are arranged around the display device and capture a user from a plurality of viewpoints; an image acquisition unit that acquires the images captured by each of the plurality of imaging devices; an image generation unit that, based on the plurality of images, generates a display image in which a part of the user is appropriately displayed even when that part is outside the angle of view at the time the plurality of images were captured; and an image transmission unit that transmits the display image generated by the image generation unit.
  • 11 telepresence system, 12 display device, 13a to 13d imaging device, 14 image processing device, 15 depth camera, 21 image acquisition unit, 22 depth generation unit, 23 image generation unit, 24 image transmission unit, 25 hand alignment recognition unit, 31 base image generation unit, 32 data recording unit, 33 image composition unit, 34 user recognition unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure relates to an image processing device, an image processing method, a program, and a telepresence system that enable more natural communication. A plurality of imaging devices used for capturing images of a user from a plurality of viewpoints is arranged around a display device that displays a display image transmitted from a counterpart, and an image acquisition unit acquires the images captured by each of the plurality of imaging devices. Based on the plurality of images, an image generation unit generates a display image so that when a hand of the user goes outside the angle of view at the time the plurality of images are captured, that part is appropriately displayed, and an image transmission unit transmits the display image to the counterpart. The present disclosure can be applied, for example, to a telepresence system.
PCT/JP2017/024571 2016-07-19 2017-07-05 Image processing device, image processing method, program, and telepresence system WO2018016316A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016141553 2016-07-19
JP2016-141553 2016-07-19

Publications (1)

Publication Number Publication Date
WO2018016316A1 (fr) 2018-01-25

Family

ID=60992195

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/024571 WO2018016316A1 (fr) 2016-07-19 2017-07-05 Image processing device, image processing method, program, and telepresence system

Country Status (1)

Country Link
WO (1) WO2018016316A1 (fr)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08123938A (ja) * 1994-10-20 1996-05-17 Ohbayashi Corp Monitoring device for remote operation
US20100149310A1 (en) * 2008-12-17 2010-06-17 Microsoft Corporation Visual feedback for natural head positioning
JP2011138314A (ja) * 2009-12-28 2011-07-14 Sharp Corp Image processing device
JP2013025528A (ja) * 2011-07-20 2013-02-04 Nissan Motor Co Ltd Image generation device for vehicle and image generation method for vehicle
JP2014056466A (ja) * 2012-09-13 2014-03-27 Canon Inc Image processing device and method

Similar Documents

Publication Publication Date Title
US11887234B2 (en) Avatar display device, avatar generating device, and program
WO2017141511A1 (fr) Appareil de traitement d'informations, système de traitement d'informations, procédé de traitement d'informations et programme
JP2011090400A (ja) 画像表示装置および方法、並びにプログラム
JP5237234B2 (ja) 映像コミュニケーションシステム、及び映像コミュニケーション方法
WO2018188277A1 (fr) Procédé et dispositif de correction de visée, terminal de conférence intelligent et support de stockage
JP5833526B2 (ja) 映像コミュニケーションシステム及び映像コミュニケーション方法
EP2661077A1 (fr) Système et procédé pour l'alignement oculaire dans une vidéo
JP4144492B2 (ja) 画像表示装置
JP2008140271A (ja) 対話装置及びその方法
JP2020065229A (ja) 映像通信方法、映像通信装置及び映像通信プログラム
JP5478357B2 (ja) 表示装置および表示方法
KR101579491B1 (ko) 증강현실에서의 원격 협업을 위한 디지로그 공간 생성기 및 그를 이용한 디지로그 공간 생성 방법
JP5731462B2 (ja) 映像コミュニケーションシステム及び映像コミュニケーション方法
JP2011097447A (ja) コミュニケーションシステム
JP2011113206A (ja) 映像コミュニケーションシステム、及び映像コミュニケーション方法
JP5712737B2 (ja) 表示制御装置、表示制御方法、及びプログラム
JP5759439B2 (ja) 映像コミュニケーションシステム及び映像コミュニケーション方法
WO2018016316A1 (fr) Dispositif de traitement d'image, procédé de traitement d'image, programme, et système de téléprésence
JP5833525B2 (ja) 映像コミュニケーションシステム及び映像コミュニケーション方法
JP5898036B2 (ja) 映像コミュニケーションシステム及び映像コミュニケーション方法
WO2018173205A1 (fr) Système de traitement d'informations, son procédé de commande et programme
JP2015184986A (ja) 複合現実感共有装置
JP5647813B2 (ja) 映像提示システム、プログラム及び記録媒体
JP5485102B2 (ja) コミュニケーション装置、コミュニケーション方法、及びプログラム
JPWO2020090128A1 (ja) 画像処理装置、方法、コンピュータプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17830836; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17830836; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)