WO2024062002A2 - System and method for visualizing a person's face - Google Patents

System and method for visualizing a person's face Download PDF

Info

Publication number
WO2024062002A2
Authority
WO
WIPO (PCT)
Prior art keywords
video stream
image
face
floor
unit
Prior art date
Application number
PCT/EP2023/076006
Other languages
French (fr)
Other versions
WO2024062002A3 (en)
Inventor
Andreas SANSANO
Lukas JULEN
Original Assignee
Abusizz Ag
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Abusizz Ag filed Critical Abusizz Ag
Publication of WO2024062002A2 publication Critical patent/WO2024062002A2/en
Publication of WO2024062002A3 publication Critical patent/WO2024062002A3/en

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/142 Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals

Definitions

  • the invention makes it possible to provide an improved meeting experience in meetings with one or more remote participants.
  • the invention can make the appearance of remote participants more realistic.
  • the size of an image of a remote participant’s head or face and its height (above a floor) are adjusted so that they resemble the size and height of the heads or faces of the non-remote participants.
  • a stand, to which an image output surface on which the remote person’s head or face is made visible is fixed, e.g., integrated, has a predefined alignment when placed on a floor, such that the stand, and thus the image of the remote person’s head or face, can be placed near a table at which non-remote participants are sitting.
  • a processing unit accomplishing the scaling and positioning of the remote participant’s head or face in a video stream can be integrated in a visualization unit comprising the stand and the image output surface, or it can be realized on a remote server connected to the internet.
  • Today’s video conferencing software, known image processing software and known face recognition software have useful features which can be used in the invention, such as face detection, eye detection, distance detection (e.g., between the two eyes of an imaged person), and identification of the image of a person’s head in order to identify the complementing background, e.g., for cropping or for filling the background with different image information.
  • the processing unit can receive an input video stream, e.g., via the internet, e.g., generated by the remote person’s laptop, and apply such processing to the input video stream or to a video stream derived therefrom. For example, eye distance and a center of face position (or vertical eye position) can be determined and based thereon, appropriate placement and scaling of the person’s face in an output video stream of the processing unit can be achieved.
  • the output video stream can have a system-defined number of pixels (horizontally and vertically), such as given by the visualization unit and more particularly by a display unit thereof, e.g., an output monitor. E.g., a scaling of the input video stream onto that system-defined number of pixels is carried out, wherein the examination regarding size and position of the remote person’s face can be accomplished before or after said scaling.
  • the visualization unit can be placed at a table instead of a chair, to impersonate the remote person sitting at the table. This way, a realistic meeting experience can be achieved with relatively simple means.
  • the invention can be particularly useful in conjunction with interactive display apparatuses as described in WO 2021/175 997.
  • the invention comprises systems as well as corresponding methods.
  • an interactive display apparatus is described. Using such interactive display apparatuses or also in other situations, it can be useful to create an image of a current situation relating to (a) a video projected onto a projection surface such as onto a table top and (b) an object present on the projection surface during the projection.
  • an augmented image can be created from the live image taken of the object (“second image”) and from a video image (“first image”) which would have been projected at least approximately at the time of muting the projection and capturing the live image. These two images can be merged, e.g., to show, within the video, the object where it has been located at that time.
  • the augmented image can be of very high image quality.
  • the 3D sensor can distinguish between regions where nothing stands on the projection surface (empty projection surface) and where an object is present on the projection surface because such an object is elevated relative to the projection surface.
  • a 3D sensor calibration can be useful in this regard, e.g., including determining the 3D position of the projection surface (in many 3D sensing locations across the projection surface) with no object present on it as a reference. E.g., any location where the 3D sensor senses a height which lies above the reference height of the (empty) projection surface is considered to be a place where an object is present on the projection surface.
  • a calibration to match the live image pixels and the video pixels can be made, e.g., associating four corner points of a rectangular overlap area and interpolating between these.
  • Contents of pixels identified in the live image as object-associated pixels can be used for defining the contents in corresponding regions of the augmented image (and optionally, also contents of the video image in said corresponding regions can be used therefor), and contents of the remaining regions of the augmented image can be determined from contents in corresponding regions in the video image, in particular solely therefrom.
  • the muting can mean, but does not necessarily mean, that the projection is shut off completely. In embodiments, it is possible that the projection continues, e.g., with (strongly) reduced intensity.
  • the imaging unit used for taking the live image (second image) can be a digital imaging unit.
  • the capturing of the second image is accomplished in a pause between the projection of subsequent video images - i.e. where momentarily no projection takes place anyway, such as in the pause occurring 60 times per second in a 60 Hz projection.
  • a control unit of the system in this case does not necessarily have to change the way the projection is “naturally” carried out but can control the projection just the way it would be controlled without the capture of the live image and the generation of the augmented image.
  • the capturing of the second image can also be accomplished by eliminating at least one of the video images of the video projection and capturing the live image in the corresponding time slot; this, too, can be done repeatedly, also at a relatively high rate of succession.
  • the system according to the invention can be a system for the generation of an augmented video stream.
  • the invention comprises systems as well as corresponding methods.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Studio Devices (AREA)

Abstract

System for visualizing a person's face, comprising a processing unit configured for receiving an input video stream showing the face and for outputting an output video stream showing the face, and a visualization unit configured for receiving the output video stream. The visualization unit comprises a display unit comprising an image output surface, the display unit being configured for displaying the output video stream on the image output surface, and a stand comprising a foot section to be placed on a floor, the stand having a predefined alignment relative to a floor when it is placed with its foot section on the floor. The image output surface is positionally fixed to the stand such that, when the stand has the predefined alignment on a floor, the image output surface is aligned in an upright fashion; the image output surface has a vertical extension of at least 20 cm; the image output surface has a horizontal extension of between 22 cm and 100 cm; and at least a portion of the image output surface extends across a vertical range of between 115 cm and 130 cm above the floor. The processing unit is configured for deriving the output video stream from the input video stream in such a way that, when the stand has the predefined alignment on a floor and the output video stream is displayed on the image output surface, the face is displayed on the image output surface at least approximately in natural size and with the eyes of the face positioned between 100 cm and 145 cm above the floor.

Description

SYSTEM AND METHOD FOR VISUALIZING A PERSON’S FACE,
AND
DEVICE, SYSTEM AND METHOD FOR GENERATING AUGMENTED IMAGES OR VIDEO STREAMS
DESCRIPTION
FIRST INVENTION
The invention makes it possible to provide an improved meeting experience in meetings with one or more remote participants. In particular, the invention can make the appearance of remote participants more realistic.
More particularly, the size of an image of a remote participant’s head or face and its height (above a floor) are adjusted so that they resemble the size and height of the heads or faces of the non-remote participants. A stand, to which an image output surface on which the remote person’s head or face is made visible is fixed, e.g., integrated, has a predefined alignment when placed on a floor, such that the stand, and thus the image of the remote person’s head or face, can be placed near a table at which non-remote participants are sitting.
A processing unit accomplishing the scaling and positioning of the remote participant’s head or face in a video stream can be integrated in a visualization unit comprising the stand and the image output surface, or it can be realized on a remote server connected to the internet. Today’s video conferencing software, known image processing software and known face recognition software have useful features which can be used in the invention, such as face detection, eye detection, distance detection (e.g., between the two eyes of an imaged person), and identification of the image of a person’s head in order to identify the complementing background, e.g., for cropping or for filling the background with different image information.
For example, the processing unit can receive an input video stream, e.g., via the internet, e.g., generated by the remote person’s laptop, and apply such processing to the input video stream or to a video stream derived therefrom. For example, eye distance and a center of face position (or vertical eye position) can be determined and, based thereon, appropriate placement and scaling of the person’s face in an output video stream of the processing unit can be achieved. The output video stream can have a system-defined number of pixels (horizontally and vertically), such as given by the visualization unit and more particularly by a display unit thereof, e.g., an output monitor. E.g., a scaling of the input video stream onto that system-defined number of pixels is carried out, wherein the examination regarding size and position of the remote person’s face can be accomplished before or after said scaling.
Of course, assumptions have to be made about the remote person’s real eye distance (more precisely, eye distance here means pupil distance), or about some other characteristic face feature dimension, e.g., based on a preset value such as approximately 6 cm or on a known average value, as well as about the height above the floor of a non-remote person’s eyes (or head or some other characteristic feature), which can be assumed in a similar fashion, e.g., taking the height above the floor of the eyes of a person sitting at a table (in the non-remote location) to be approximately 122 cm, in order to achieve a life-sized (natural size) image of the remote person’s face on the image output surface which, in addition, is visible at an appropriate height above the floor. It has been found that this way, placing the visualization unit at the table at which the non-remote participants are sitting creates a very agreeable situation for the non-remote participants in which the virtual presence of the remote participant is surprisingly realistic and much improved over standard remote video conferencing setups, in which the remote person’s face is arbitrarily scaled and placed.
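To make the processing concrete, the following is a minimal sketch of how a processing unit could derive an output frame under the assumptions stated above, not the patent's specific implementation: a detected pupil distance is compared with an assumed real pupil distance of about 6 cm to obtain a scaling factor, and the face is then placed so that the eyes land at an assumed target height of about 122 cm above the floor. The eye detector, the physical resolution of the image output surface (pixels per centimetre) and the height of the surface's lower edge above the floor are placeholders.

```python
# Illustrative sketch only: life-size scaling and eye-height placement of a face.
# detect_eyes() stands in for any face/eye detector; the 6 cm pupil distance and
# 122 cm eye height are the assumptions mentioned in the description.
import numpy as np
import cv2

ASSUMED_PUPIL_DISTANCE_CM = 6.0    # assumed real-world pupil distance
TARGET_EYE_HEIGHT_CM = 122.0       # assumed eye height above the floor

def detect_eyes(frame):
    """Placeholder: return ((x_left, y_left), (x_right, y_right)) eye positions in pixels."""
    raise NotImplementedError

def derive_output_frame(frame, out_w, out_h, px_per_cm, surface_bottom_cm):
    """Scale the input frame so the face appears in natural size and paste it so the
    eyes end up TARGET_EYE_HEIGHT_CM above the floor on the image output surface."""
    (xl, yl), (xr, yr) = detect_eyes(frame)
    eye_dist_px = float(np.hypot(xr - xl, yr - yl))

    # Scale factor mapping the imaged pupil distance to the assumed real distance.
    scale = ASSUMED_PUPIL_DISTANCE_CM * px_per_cm / eye_dist_px
    scaled = cv2.resize(frame, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)

    # Eye midpoint in the scaled frame.
    eye_x, eye_y = scale * (xl + xr) / 2.0, scale * (yl + yr) / 2.0

    # Output row (counted from the top) at which the eyes must appear so that they
    # sit TARGET_EYE_HEIGHT_CM above the floor.
    surface_top_cm = surface_bottom_cm + out_h / px_per_cm
    target_row = (surface_top_cm - TARGET_EYE_HEIGHT_CM) * px_per_cm
    target_col = out_w / 2.0  # horizontally centered

    # Paste the scaled frame into a blank output frame, clipping at its borders.
    out = np.zeros((out_h, out_w, 3), dtype=scaled.dtype)
    dx, dy = int(round(target_col - eye_x)), int(round(target_row - eye_y))
    h, w = scaled.shape[:2]
    x0, y0 = max(dx, 0), max(dy, 0)
    x1, y1 = min(dx + w, out_w), min(dy + h, out_h)
    if x1 > x0 and y1 > y0:
        out[y0:y1, x0:x1] = scaled[y0 - dy:y1 - dy, x0 - dx:x1 - dx]
    return out
```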
Outputting sound picked up in the remote location (from the remote participant) close to the image output surface furthermore strongly enhances the meeting experience.
The visualization unit can be placed at a table instead of a chair, to impersonate the remote person sitting at the table. This way, a realistic meeting experience can be achieved with relatively simple means.
The invention can be particularly useful in conjunction with interactive display apparatuses as described in WO 2021/175 997.
The invention comprises systems as well as corresponding methods.
Note: When an item is described to be “configured” to carry out a step, this means that concrete measures have been taken which factually enable the item to carry out the step. For example, dedicated program code is implemented enabling the item to carry out the step when the program code is executed. Thus, this does not include, e.g., the mere suitability to (possibly) make the item carry out the step, as may be the case for a computer without a dedicated program code.
Aspects of the invention are also described in the claims.
SECOND INVENTION
In WO 2021/175 997, an interactive display apparatus is described. Using such interactive display apparatuses, or also in other situations, it can be useful to create an image of a current situation relating to (a) a video projected onto a projection surface, such as onto a table top, and (b) an object present on the projection surface during the projection. In a simple approach to obtaining such an image, a photograph is simply taken by a camera, which has several drawbacks: (i) The image quality in image ranges where the video is visible on the projection surface is low, as any projection surface produces only a limited-quality reproduction of the video. (ii) The image quality suffers from interference effects due to the reproduction of the video in a certain number of video images per unit of time, e.g., 60 video images per second, and mostly also due to a line-wise build-up of each video image, which furthermore can be accomplished separately for different color components; brightness changes and coloring across the image occur, which can be particularly pronounced when using a digital camera for taking the live image. (iii) The object in the image is illuminated by the projection of the video, which may be undesired in many applications.
It has been found that these drawbacks can be overcome when muting the projection during the time of capturing the image. And subsequently, an augmented image can be created from the live image taken of the object (“second image”) and from a video image (“first image”) which would have been projected at least approximately at the time of muting the projection and capturing the live image. These two images can be merged, e.g., to show, within the video, the object where it has been located at that time.
The augmented image can be of very high image quality.
It can be determined where the object is shown in the live image, e.g., using image processing such as object recognition, or using a 3D sensor, e.g., comprising a range imaging device or a time-of-flight camera or a stereo imaging device or a structured light based 3D sensing device or a radar-based 3D sensing device. The 3D sensor can distinguish between regions where nothing stands on the projection surface (empty projection surface) and where an object is present on the projection surface because such an object is elevated relative to the projection surface. A 3D sensor calibration can be useful in this regard, e.g., including determining the 3D position of the projection surface (in many 3D sensing locations across the projection surface) with no object present on it as a reference. E.g., any location where the 3D sensor senses a height which lies above the reference height of the (empty) projection surface is considered to be a place where an object is present on the projection surface.
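As a hedged illustration of this reference-height approach, one possible realization (an assumption, not the patent's specific implementation) records the empty projection surface once as a reference height map and then treats any sensing location whose reading exceeds that reference by more than a noise tolerance as covered by an object:

```python
# Sketch of the 3D-sensor calibration described above: the empty projection surface
# serves as a per-location reference; anything sensed above it (beyond a tolerance)
# is taken as an object standing on the surface. The tolerance value is an assumption.
import numpy as np

def calibrate_reference(empty_height_maps):
    """Average several height maps of the empty projection surface into a reference map."""
    return np.mean(np.stack(empty_height_maps, axis=0), axis=0)

def object_present_mask(height_map, reference, tolerance_mm=5.0):
    """Boolean mask: True at 3D sensing locations where an object is present."""
    return (height_map - reference) > tolerance_mm
```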
Furthermore, a calibration to match the live image pixels and the video pixels can be made, e.g., associating four corner points of a rectangular overlap area and interpolating between these. Contents of pixels identified in the live image as object-associated pixels can be used for defining the contents in corresponding regions of the augmented image (and optionally, also contents of the video image in said corresponding regions can be used therefor), and contents of the remaining regions of the augmented image can be determined from contents in corresponding regions in the video image, in particular solely therefrom.
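A possible realization of these two steps, sketched under the assumption that the four-corner association is implemented as a perspective mapping (one way of interpolating between the corner points) and that an object mask for the live image is already available, could look as follows; the function and parameter names are illustrative only:

```python
# Sketch: (1) four-corner calibration mapping live-image pixels to video pixels,
# (2) composing the augmented image from video content plus object pixels taken
# from the live image. The perspective transform is one possible interpolation choice.
import numpy as np
import cv2

def calibrate_four_corners(live_corners, video_corners):
    """live_corners/video_corners: four matching corner points (4x2) of the overlap area.
    Returns a 3x3 matrix mapping live-image pixel coordinates to video pixel coordinates."""
    return cv2.getPerspectiveTransform(np.float32(live_corners), np.float32(video_corners))

def compose_augmented(video_image, live_image, object_mask, H):
    """Use video content everywhere except where the live image shows an object;
    there, the (warped) live-image pixels define the augmented image."""
    h, w = video_image.shape[:2]
    live_warped = cv2.warpPerspective(live_image, H, (w, h))
    mask_warped = cv2.warpPerspective(object_mask.astype(np.uint8), H, (w, h)) > 0
    augmented = video_image.copy()
    augmented[mask_warped] = live_warped[mask_warped]
    return augmented
```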
The muting can mean, but does not necessarily mean, that the projection is shut off completely. In embodiments, it is possible that the projection continues, e.g., with (strongly) reduced intensity.
The imaging unit used for taking the live image (second image) can be a digital imaging unit.
In some embodiments, the capturing of the second image (live image) is accomplished in a pause between the projection of subsequent video images - i.e., where momentarily no projection takes place anyway, such as in the pause occurring 60 times per second in a 60 Hz projection. A control unit of the system in this case does not necessarily have to change the way the projection is “naturally” carried out but can control the projection just the way it would be controlled without the capture of the live image and the generation of the augmented image.
In this way, but also otherwise, it is possible to create several augmented images, even at a relatively high rate of succession, such that an augmented video can be created; a synchronization between the video projection and the capturing of the second images (live images) can be advantageous in such cases. In some embodiments, the capturing of the second image (live image) is accomplished by eliminating at least one of the video images of the video projection and capturing the live image in the corresponding time slot. This, too, can be done repeatedly, also at a relatively high rate of succession. Accordingly, the system according to the invention can be a system for the generation of an augmented video stream.
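The following is a speculative timing sketch of the eliminated-frame variant just mentioned, written under assumptions and with placeholder functions: every N-th projection slot is used to capture a live image instead of projecting, and each such slot yields one augmented image, so that an augmented video stream is produced at a regular rate.

```python
# Speculative sketch of producing an augmented video stream: in every `capture_every`-th
# slot the video image is not projected; a live image is captured instead and merged with
# the video image that would have been shown. project(), capture_live() and
# compose_augmented() are placeholders for the projector, camera and merging step.
def augmented_stream(video_frames, project, capture_live, compose_augmented, capture_every=60):
    for i, frame in enumerate(video_frames):
        if i % capture_every == 0:
            live = capture_live()                  # projection muted/eliminated in this slot
            yield compose_augmented(frame, live)   # merge with the frame that would have been shown
        else:
            project(frame)
```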
The invention comprises systems as well as corresponding methods.
Note: When an item is described to be “configured” to carry out a step, this means that concrete measures have been taken which factually enable the item to carry out the step. For example, dedicated program code is implemented enabling the item to carry out the step when the program code is executed. Thus, this does not include, e.g., the mere suitability to (possibly) make the item carry out the step, as may be the case for a computer without a dedicated program code.
Aspects of the invention are also described in the claims.

Claims

Patent Claims
1. System for visualizing a person’s face, in particular a remote person’s face participating in a partially virtual meeting, comprising a processing unit configured for receiving an input video stream showing the face and for outputting an output video stream showing the face; and a visualization unit configured for receiving the output video stream; the visualization unit comprising a display unit comprising an image output surface, the display unit being configured for displaying the output video stream on the image output surface, a stand comprising a foot section to be placed on a floor, the stand having a predefined alignment relative to a floor when it is placed with its foot section on the floor; the image output surface, in particular the display unit, being positionally fixed to the stand, in particular being integrated in the stand, such that, when the stand has the predefined alignment on a floor, the image output surface is aligned in an upright fashion; the image output surface has a vertical extension of at least 20 cm, more particularly of at least 25 cm; the image output surface has a horizontal extension of between 22 cm and
100 cm, more particularly of between 25 cm and 80 cm; and at least a portion of the image output surface extends across a vertical range of between 115 cm and 130 cm above the floor, more particularly of between 110 cm and 135 cm above the floor; wherein the processing unit is configured for deriving the output video stream from the input video stream in such a way that, when the stand has the predefined alignment on a floor and the output video stream is displayed on the image output surface, the face is displayed on the image output surface at least approximately in natural size and with eyes of the face being positioned between 100 cm and 145 cm above the floor, more particularly between 110 cm and 135 cm above the floor.
2. The system according to claim 1, the deriving comprising resizing and repositioning the face in the input video stream or in a video stream originating from the input video stream.
3. The system according to claim 1 or claim 2, the visualization unit further comprising an audio output unit configured for receiving an audio stream associated with the input video stream and for outputting, in particular from an audio emission area, sound according to the audio stream, in particular wherein the audio output unit is positionally fixed to the stand.
4. The system according to claim 3, wherein, when the stand has the predefined alignment on a floor, an audio emission area of the audio output unit for outputting the sound is located between 60 cm and 180 cm above the floor, more particularly between 70 cm and 150 cm above the floor.
5. The system according to any one of the preceding claims, the deriving comprising repeatedly carrying out an examination step on the input video stream or on a video stream originating from the input video stream, the examination step comprising applying face detection to the respective video stream to identify the face; determining a position of the face identified in the respective video stream; and determining a characteristic face feature dimension, e.g., an eye distance, of the face identified in the respective video stream; the deriving further comprising using the determined position for positioning the face in the output video stream, in particular for positioning the face in the output video stream in an at least approximately vertically centered fashion; and using the determined characteristic face feature dimension for scaling a size of the face in the output video stream.
6. The system according to claim 5, the examination step further comprising identifying one or more pixel ranges in the respective video stream showing a portion of the person’s body comprising the person’s head; the deriving further comprising using the identified one or more pixel ranges for identifying complementing pixel ranges in the respective video stream and for making the complementing pixel ranges show a background unrelated to the input video stream.
7. The system according to any one of the preceding claims, wherein the display unit comprises a monitor, in particular a color monitor.
8. The system according to any one of the preceding claims, wherein the display unit comprises a screen embodying the image output surface and a projector for illuminating the screen according to the output video stream, in particular wherein the projector is positionally fixed to the stand and thus in fixed relative position to the screen.
9. Device for a generation of one or more sets of a first and a second image each, each of the sets being related to one or more physical objects present on a projection surface in a range referred to as overlap area, the device comprising the projection surface, e.g., a table top; a control unit; a projector having a field of view, configured for receiving an input video stream and for generating, controlled by the control unit, from the input video stream a projection on the projection surface; an imaging unit having a field of view; the field of view of the projector and the field of view of the imaging unit overlapping on the projection surface in the overlap area; wherein the control unit is configured, for the generation of each of the one or more sets, for controlling the projector to mute the projection during a capture duration; for capturing, as the first image of the respective set, a portion of the input video stream present in the input video stream within the capture duration; for controlling the imaging unit to capture, as the second image of the respective set, a live image within the capture duration showing each of the one or more physical objects to an extent in which the respective physical object is present in the overlap area; wherein the control unit is furthermore configured for carrying out a calibration routine by which calibration data are determined, the calibration data enabling for any place within the overlap area, an association between a video pixel in the input video stream projected by the projector onto the respective place; and an image pixel in the live image onto which the imaging unit images the respective place; in particular wherein the device comprises a housing housing the control unit; the projector; and the imaging unit, more particularly wherein the projector and the imaging unit are fixed in a fixed position relative to the housing and to one another.
10. System for a generation of one or more augmented images, comprising a device according to claim 9 and a processing unit, the processing unit being configured for receiving each of the second images and, for each of the second images, for identifying one or more pixel ranges in the respective second image showing at least a portion of one of the physical objects; for creating from the respective first and second images a respective augmented image, the creating comprising assigning, to the respective augmented image in the identified pixel ranges, contents derived from, in particular solely from, respective associated pixels of the respective second image and optionally also from respective associated pixels of the respective first image; and assigning, to the respective augmented image outside the identified pixel ranges, contents derived from, in particular solely from, associated pixels of the respective second image.
11. The system according to claim 10, the identifying comprising applying object recognition to the respective second image.
12. The system according to claim 10 or 11, the identifying comprising applying 3D sensing to the overlap area.
13. The system according to claim 10 or 11, further comprising a 3D sensing unit having a field of view; the field of view of the projector and the field of view of the imaging unit and the field of view of the 3D sensing unit overlapping on the projection surface in the overlap area; the 3D sensing unit being configured for determining for a plurality of 3D sensing locations within the overlap area whether or not one of the physical objects is present on the projection surface from the fact that if a physical object is present on the projection surface, that physical object extends above the projection surface; the calibration data enabling for any place within the overlap area, an association between a video pixel in the input video stream projected by the projector onto the respective place; an image pixel in the live image onto which the imaging unit images the respective place, and a 3D sensing location and the respective place; the identifying comprising using the 3D sensing unit for determining those 3D sensing locations, referred to as object-showing locations, in which one of the physical objects is present on the projection surface and for determining from the object-showing locations the associated video pixels in the input video stream.
PCT/EP2023/076006 2022-09-22 2023-09-21 System and method for visualizing a person's face WO2024062002A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263408909P 2022-09-22 2022-09-22
US63/408,909 2022-09-22

Publications (2)

Publication Number Publication Date
WO2024062002A2 (en) 2024-03-28
WO2024062002A3 WO2024062002A3 (en) 2024-05-02

Family

ID=88517403

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/076006 WO2024062002A2 (en) 2022-09-22 2023-09-21 System and method for visualizing a person's face

Country Status (1)

Country Link
WO (1) WO2024062002A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021175997A1 (en) 2020-03-04 2021-09-10 Abusizz Ag Interactive display apparatus and method for operating the same

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015194075A1 (en) * 2014-06-18 2015-12-23 ソニー株式会社 Image processing device, image processing method, and program
US10129506B2 (en) * 2014-09-25 2018-11-13 Steve H. McNelley Advanced transparent projection communication terminals
US20230103284A9 (en) * 2016-04-26 2023-03-30 View, Inc. Immersive collaboration of remote participants via media displays
EP3493533B1 (en) * 2016-08-01 2024-04-03 Sony Group Corporation Information processing device, information processing method, and program

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021175997A1 (en) 2020-03-04 2021-09-10 Abusizz Ag Interactive display apparatus and method for operating the same

Also Published As

Publication number Publication date
WO2024062002A3 (en) 2024-05-02

Similar Documents

Publication Publication Date Title
TWI479452B (en) Method and apparatus for modifying a digital image
US10085008B2 (en) Image processing apparatus and method
WO2017033853A1 (en) Image processing device and image processing method
WO2014064870A1 (en) Image processing device and image processing method
US9679369B2 (en) Depth key compositing for video and holographic projection
EP1912175A1 (en) System and method for generating a video signal
EP1843581A2 (en) Video processing and display
US20210166485A1 (en) Method and apparatus for generating augmented reality images
KR101757627B1 (en) Marker tracking apparatus for projection area in augmented reality environment using three-dimensional model and marker tracking method thereof
KR20170013704A (en) Method and system for generation user's vies specific VR space in a Projection Environment
JPWO2016152634A1 (en) Information processing apparatus, information processing method, and program
CN110730340B (en) Virtual audience display method, system and storage medium based on lens transformation
US20220207848A1 (en) Method and apparatus for generating three dimensional images
JP2009141508A (en) Television conference device, television conference method, program, and recording medium
WO2024062002A2 (en) System and method for visualizing a person's face
US20210065659A1 (en) Image processing apparatus, image processing method, program, and projection system
Rhee et al. Low-cost telepresence for collaborative virtual environments
JPH0962444A (en) Indication information input device
KR20120092960A (en) System and method for controlling virtual character
Fukui et al. A virtual studio system for TV program production
KR100632533B1 (en) Method and device for providing animation effect through automatic face detection
KR101895281B1 (en) Apparatus for capturing stick-type object in augmented reality environment and method thereof
Thomas Virtual Graphics for Broadcast Production
WO2022226745A1 (en) Photographing method, control apparatus, photographing device, and storage medium
WO2020166352A1 (en) Information processing device, information processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23793996

Country of ref document: EP

Kind code of ref document: A2