US20200045261A1 - Gaze-correct video conferencing systems and methods - Google Patents
- Publication number
- US20200045261A1 (application US 16/056,446)
- Authority
- US
- United States
- Prior art keywords
- image
- video conferencing
- camera
- rgb
- participant
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04N5/44504: Circuit details of the additional information generator, e.g. details of the character or graphics signal generator, overlay mixing circuits
- H04N7/147: Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
- G06K9/00597
- G06T7/50: Depth or shape recovery
- G06T7/70: Determining position or orientation of objects or cameras
- G06V40/18: Eye characteristics, e.g. of the iris
- H04N23/90: Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
- H04N5/247
- H04N5/272: Means for inserting a foreground image in a background image, i.e. inlay, outlay
- H04N7/144: Constructional details of the terminal equipment; camera and display on the same optical axis, e.g. optically multiplexing the camera and display for eye-to-eye contact
- H04N7/15: Conference systems
- G06T7/194: Segmentation; edge detection involving foreground-background segmentation
- H04N23/10: Cameras or camera modules comprising electronic image sensors, for generating image signals from different wavelengths
- H04N23/11: Cameras or camera modules for generating image signals from visible and infrared light wavelengths
- H04N5/2628: Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation
Definitions
- Video conferencing technologies have become increasingly commonplace. As globalization continues to spread throughout the world economy, it is increasingly common to find projects whose team members are widely distributed across continents. Video conferencing has long been considered a critical technology for reducing high travel expenses for distributed workforces.
- a video conference session in which there is real-time variability in the physical position of participant(s) relative to a camera or to one another may preclude the capture of a consistent or reliable view of the participant(s) for the remote users.
- Eye contact can instill trust and foster an environment of collaboration and partnership. Lack of eye contact, on the other hand, may generate feelings of distrust and discomfort. Unfortunately, eye contact is usually not preserved in typical video conferencing.
- a video conferencing system in accord with a first aspect of this disclosure, includes a first device including a first display device and a first camera, one or more processors, and one or more computer readable media including instructions which, when executed by the one or more processors, cause the one or more processors to obtain a first RGB image captured, at a first time during a video conferencing session, by the first camera, wherein the first camera is positioned to capture the first RGB image through a first pixel display region of the first display device.
- the instructions also cause the one or more processors to receive at the first device, via the video conferencing session, a first video stream providing a first series of live images of a first human participant of the video conferencing session, wherein the first series of live images includes a first image portion depicting the eyes of the first human participant.
- the instructions cause the one or more processors to display, at about the first time, a first composite image on the first display device, wherein a first pixel position of the first composite image is displayed by the first pixel display region, the first pixel position having a first lateral pixel position in the first composite image.
- the instructions cause the one or more processors to, before the display of the first composite image, composite the first image portion at about the first lateral pixel position in the first composite image, segment a first foreground image, corresponding to a second human participant of the video conferencing session, from the first RGB image, cause, via the video conferencing session, a second composite image to be displayed by a second device at a different geographic location than the first device, wherein the second composite image includes the first foreground image composited with a first background image.
- a method for video conferencing includes obtaining a first RGB image captured, at a first time during a video conferencing session, by a first camera included in a first device, wherein the first camera is positioned to capture the first RGB image through a first pixel display region of a first display device included in the first device.
- the method also includes receiving at the first device, via the video conferencing session, a first video stream providing a first series of live images of a first human participant of the video conferencing session, wherein the first series of live images includes a first image portion depicting the eyes of the first human participant.
- the method includes displaying, at about the first time, a first composite image on the first display device, wherein a first pixel position of the first composite image is displayed by the first pixel display region, the first pixel position having a first lateral pixel position in the first composite image.
- the method further includes, before the display of the first composite image, compositing the first image portion at about the first lateral pixel position in the first composite image.
- the method involves segmenting a first foreground image, corresponding to a second human participant of the video conferencing session, from the first RGB image, and causing, via the video conferencing session, a second composite image to be displayed by a second device at a different geographic location than the first device, wherein the second composite image includes the first foreground image composited with a first background image.
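The compositing step described above places the incoming eye-region image at the lateral pixel position of the display region that the local camera captures through, so that a participant looking at the displayed eyes is also looking into the camera. The following is a minimal sketch of that placement; the function name and the clamped-span model are illustrative assumptions, not the patent's implementation.

```python
def composite_at_camera_column(frame_width, camera_column, eye_region_width):
    """Return the horizontal pixel span at which to composite the eye-region
    image so its center coincides with the pixel column of the display
    region that the camera looks through."""
    left = camera_column - eye_region_width // 2
    # Clamp so the composited region stays inside the composite image.
    left = max(0, min(left, frame_width - eye_region_width))
    return left, left + eye_region_width

# For a 1920-pixel-wide composite image and a camera behind column 480:
span = composite_at_camera_column(frame_width=1920, camera_column=480,
                                  eye_region_width=200)
```

In this model, a camera near the edge of the display simply clamps the eye region to the image boundary rather than letting it run off screen.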
- FIG. 1 illustrates an example of a video conferencing system that includes a first multimedia communication device being used to access and participate in a video conferencing session.
- FIG. 2 illustrates an exploded view of the first multimedia communication device illustrated in FIG. 1 .
- FIG. 3A illustrates an example of capturing and displaying human foreground subject images.
- FIG. 3B illustrates an example of segmentation of a foreground image from an RGB image captured by the multimedia communication device for the scene shown in FIG. 3A .
- FIG. 3C shows details of the foreground image obtained in FIG. 3B for the scene shown in FIG. 3A .
- FIG. 3D shows positions in a composite image corresponding to each of the RGB camera pixel display regions of a remote multimedia communication device that will display the composite image, such as the remote multimedia communication device in FIG. 1 .
- FIG. 3E illustrates a portion of the composite image generated for the scene shown in FIG. 3A using the foreground image shown in FIG. 3C .
- FIG. 3F illustrates an example scene in which the foreground subject has moved laterally from the physical position in FIG. 3A and a resulting composite image for the scene in FIG. 3F .
- FIG. 3G illustrates an example scene in which the foreground subject has moved laterally from the physical position in FIG. 3F and a resulting composite image for the scene in FIG. 3G .
- FIG. 4 illustrates use of image distortion correction applied in some implementations to reduce distortions occurring in various portions of the fields of view of the RGB cameras.
- FIGS. 5A-5D illustrate techniques which may be applied by the video conferencing system in response to changes in distance between multimedia communication devices and respective foreground subjects.
- FIG. 5A illustrates a first scenario occurring at about a first time and a resulting composite image.
- FIG. 5B illustrates aspects of scaling of a foreground image by the video conferencing system for the composite image in FIG. 5A based on at least a distance between a multimedia communication device and a participant.
- FIG. 5C illustrates a second scenario occurring at about a second time after the first time in FIG. 5A in which a participant has moved closer to a multimedia communication device and a resulting composite image.
- FIG. 5D illustrates aspects of scaling of a foreground image by the video conferencing system for the second scenario shown in FIG. 5C .
- FIGS. 5E and 5F illustrate additional techniques which may be applied by the video conferencing system in response to changes in distance between the first multimedia communication device and a foreground subject.
- FIG. 5E illustrates an example scene in which the foreground subject has moved from the physical position shown in FIG. 3F to a new physical position closer to the multimedia communication device and the resulting composite image.
- FIG. 5F illustrates an example scene in which the foreground subject has moved from the physical position shown in FIG. 5E to a new physical position further away from the multimedia communication device and the resulting composite image.
- FIGS. 6A-6D illustrate techniques for selecting and changing RGB cameras that further support providing gaze-correct video conferencing sessions among and between various participants at various geographic locations during a single video conferencing session.
- FIG. 6A illustrates a first scenario occurring at a first time, including a scene at the first geographic location shown in FIG. 1 and a scene at the second geographic location shown in FIG. 1 .
- FIG. 6B illustrates a second scenario occurring at a second time after the first time shown in FIG. 6A and during the video conferencing session shown in FIG. 6A .
- FIG. 6C illustrates a third scenario occurring at a third time after the second time shown in FIG. 6B and during the video conferencing session shown in FIGS. 6A and 6B .
- FIG. 6D illustrates a fourth scenario occurring at a fourth time after the third time shown in FIG. 6C and during the video conferencing session shown in FIGS. 6A-6C .
- FIGS. 7A-7C illustrate a technique used in some implementations, in which rendered foreground images make an animated transition from one RGB camera area to another when a new foreground camera is selected, in which over several successive video frames the rendered foreground images “glide” or otherwise approximate lateral human motion from the previous RGB camera area to the new RGB camera area.
- FIG. 8 illustrates techniques involving having multiple participants concurrently participating in a video conferencing session via a single shared multimedia communication device.
- FIG. 9 illustrates an example of gaze-correct multi-party video conferencing among five participants each at a different geographic location.
- FIG. 10 illustrates an example in which two multimedia communication devices are tiled adjacent to each other to provide a larger multimedia communication device or system.
- FIG. 11 is a block diagram illustrating an example software architecture, various portions of which may be used in conjunction with various hardware architectures herein described.
- FIG. 12 is a block diagram illustrating components of an example machine configured to read instructions from a machine-readable medium and perform any of the features described herein.
- the following implementations introduce video conferencing systems and processes for facilitating eye contact between participants of a video conferencing session. These systems are configured to improve gaze alignment between live participants and projected images of remote counterparts by generating composite images that maximize the presentation of a participant's face and eyes. In addition, segmentation of the image allows foreground images to be composited with background images. These systems present images of the participant(s) such that the projected person appears to be looking directly at a camera. As a result, the participants can have a gaze-correct multi-party video conferencing session.
- eye contact refers to a situation in which two individuals are looking directly into each other's eyes, where an image of a live person's eyes appears to be directed toward a person viewing the image, and/or where a live person's eyes are directed toward the eyes of a projected image of a person.
- eye gaze carries important information about another person's focus of attention, emotional and mental states, and intentions, and signals that person's potential interest in social interaction.
- through eye contact, two persons share emotions and can more readily develop a connection.
- the perception of a direct gaze can trigger self-referential processing that leads, for example, to the enhanced processing of incoming information, enhancement of self-awareness, and increased prosocial behavior.
- the eye region is a key region of the face that individuals tend to pay attention to during conversations, as shown in multiple studies using eye tracking technology.
- a direct gaze can hold an audience's attention more effectively than other gaze directions.
- FIG. 1 illustrates an example of a video conferencing system 102 that includes a first multimedia communication device 100 (which may be referred to as a “teleconferencing device,” “telepresence device”, “video conferencing device,” or “participant device”) being used to access and participate in a video conferencing session (which may be referred to as a “telepresence session”).
- the video conferencing system 102 further includes a second multimedia communication device 160 at a different second geographic location 150 .
- the second multimedia communication device 160 is configured with essentially the same features and to operate substantially the same as the first multimedia communication device 100 .
- the multimedia communication devices 100 and 160 may each be implemented in various other embodiments.
- the video conferencing system 102 may include additional such multimedia communication devices, which may be used to access and participate in the video conferencing session shown in FIG. 1 and/or other video conferencing sessions.
- the video conferencing system 102 may include and/or make use of additional network-connected computing devices and systems, with the video conferencing system 102 being configured to use such additional computing devices and systems for establishing video conferencing sessions, maintaining video conferencing sessions, image segmentation, and/or image compositing.
- the first multimedia communication device 100 is arranged and operating at a first geographic location 120 as an endpoint in a video conferencing session.
- a video conferencing session may also be referred to as a “video conference.”
- the first multimedia communication device 100 is operating to provide a video stream providing a series of live images depicting one or more participants (which may be referred to as “subjects” or “users”) at the first geographic location 120 to the second multimedia communication device 160 for viewing by a remote participant 155 .
- the first multimedia communication device 100 is operating to receive a video stream from the second multimedia communication device 160 providing a series of live images depicting the remote participant 155 .
- the first multimedia communication device 100 may be referred to as a “local” device
- the second multimedia communication device 160 may be referred to as a "remote" device.
- the multimedia communication device 100 is embodied as an interactive device that includes a display device 105 for presenting images, although it is noted that the multimedia communication device 100 is not limited to such embodiments.
- the multimedia communication device 100 may present images via a display device that is not itself included in the device.
- the display device 105 is positioned to present images to participants at the first geographic location 120 .
- the multimedia communication device 100 may be configured to display images and/or video streams from one or more remote devices or systems participating in a video conferencing session with the multimedia communication device 100 , such as from the multimedia communication device 160 .
- the multimedia communication device 100 may be mounted on a wall, as illustrated in FIG. 1 .
- the display device 105 is also configured to operate as a touch screen to receive user input.
- the first geographic location 120 is a conference room with seated participants 134 , 136 , and 138 at a table 125 and a standing participant 132 in closer proximity to the multimedia communication device 100 .
- the example illustrated in FIG. 1 is not intended to limit applications or environments in which the multimedia communication device 100 may be used.
- the table 125 is shown closer in FIG. 1 than in FIG. 3 below.
- the four participants 132 , 134 , 136 , and 138 are participating in the video conferencing session via the multimedia communication device 100 .
- the term “video conferencing” applies to electronic communications in which a video stream including images captured by a first participant device is received and displayed by at least a second participant device, and may include, but does not require, the first participant device displaying a video stream provided by the second participant device.
- the illustrated video conferencing session includes the remote participant 155 at the second geographic location 150 , who is participating via the multimedia communication device 160 (which may also be referred to as a “remote participant device”) configured to serve as an endpoint in the video conferencing session.
- the multimedia communication device 160 receives the video stream via one or more data communication networks (not illustrated in FIG. 1 ). It is noted that use of the multimedia communication device 100 is not necessarily limited to video conferencing activities. For example, the multimedia communication device 100 may provide a virtual whiteboard or run arbitrary computer program applications, and display information and/or user interfaces for such other activities on the display device 105 . Such other activities may be performed during a video conferencing session and result in additional data being exchanged among devices participating in a video conferencing session.
- the multimedia communication device 100 includes a plurality of RGB (red-green-blue) imaging cameras 110 a , 110 b , 110 c , and 110 d (collectively referred to as "RGB cameras 110 "). Although the example illustrated in FIG. 1 includes four RGB cameras 110 , in other implementations there may be two or more RGB cameras 110 . Each of the RGB cameras 110 is positioned behind the display device 105 to capture images from light received through the display device 105 , and accordingly the cameras are not directly visible in FIG. 1 . By positioning the RGB cameras 110 behind the display device 105 , images can be displayed on the display device 105 over the physical positions of the RGB cameras 110 .
- subject gazes may be directed at the RGB cameras 110 , enabling gaze-correct multi-party video conferencing as discussed in more detail herein. Additionally, by placing the RGB cameras 110 behind the display device 105 , greater numbers of RGB cameras 110 may be more easily included, the RGB cameras 110 may be arranged to capture images from more natural angles (for example, for near and/or far features), and an additional non-display user-facing surface (such as a bezel) is not necessary to accommodate the RGB cameras 110 .
- the RGB cameras 110 are positioned such that, when the multimedia communication device 100 is operated, a leftmost RGB camera 110 (in FIG. 1 , the RGB camera 110 a ) and a rightmost RGB camera 110 (in FIG. 1 , the RGB camera 110 d ) span a horizontal distance that is at least large enough, in most conditions, to obtain a view around a human subject located close to and within a field of view (FOV) of one or more of the RGB cameras 110 .
- an image of the standing participant 132 is included in an image 140 b captured by the RGB camera 110 b , whereas the standing participant 132 is not visible in an image 140 d captured by the RGB camera 110 d at approximately the same time.
- the RGB camera 110 a may be positioned at a height less than or about equal to a height of the RGB camera 110 d .
- Various other arrangements and numbers for the RGB cameras 110 are also effective, such as, but not limited to, an array, along multiple parallel lines, or along perpendicular lines (for example, to increase a horizontal span when operated in portrait orientation perpendicular to the landscape orientation illustrated in FIG. 1 ).
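Because the leftmost and rightmost RGB cameras span a horizontal distance wide enough to see around a near subject, the system can pair a camera near the subject's lateral position (for the foreground view) with a distant camera whose view around the subject is unobstructed (for the background view). A hypothetical selection rule consistent with the FIG. 1 example, in which camera 110 b serves as the foreground camera and 110 d as the background camera, might look like the following; the nearest/farthest heuristic and the metric lateral positions are illustrative assumptions.

```python
def select_cameras(camera_x_positions, subject_x):
    """Pick the camera laterally closest to the subject as the foreground
    camera, and the farthest camera as the background camera, whose view
    around the subject is least likely to be blocked."""
    fg = min(range(len(camera_x_positions)),
             key=lambda i: abs(camera_x_positions[i] - subject_x))
    bg = max(range(len(camera_x_positions)),
             key=lambda i: abs(camera_x_positions[i] - subject_x))
    return fg, bg

# Four cameras spaced across the display, subject standing near the second:
fg_index, bg_index = select_cameras([0.2, 0.8, 1.4, 2.0], subject_x=0.7)
```

With the subject at 0.7 m this picks index 1 (analogous to 110 b) as foreground and index 3 (analogous to 110 d) as background, matching the roles shown in FIG. 1.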
- the RGB cameras 110 are configured and operated to periodically capture images at a frame rate suitable for video conferencing.
- the multimedia communication device 160 similarly includes RGB cameras 180 a , 180 b , 180 c , and 180 d.
- the multimedia communication device 100 includes one or more depth cameras 115 , such as the two depth cameras 115 a and 115 b .
- some or all of the depth cameras 115 are positioned behind the display device 105 to capture light for depth estimation through the display device 105 , such as is illustrated for the two depth cameras 115 a and 115 b (which accordingly are not directly visible in FIG. 1 ).
- by placing the depth cameras 115 behind the display device 105 , greater numbers of depth cameras 115 may be more easily included, and an additional non-display user-facing surface is not necessary for the depth cameras 115 .
- a depth estimate may also be referred to as an “estimated depth,” “distance estimate,” or “estimated distance.”
- the depth cameras 115 produce depth maps (also referred to as “depth images”) that include depth estimates for multiple physical positions within the FOV of the depth cameras 115 .
- Depth estimates obtained using the depth cameras 115 may be used by the video conferencing system 102 (for example, at the multimedia communication device 100 ) to, among other things, determine when a subject has come into proximity to the multimedia communication device 100 , estimate a distance between the multimedia communication device 100 and a subject, estimate a physical position of a subject relative to one or more of the RGB cameras 110 , and/or identify discontinuities in a depth image and related depth image data used to aid image segmentation for a foreground subject in an image captured by one of the RGB cameras 110 .
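As an illustrative sketch only (the disclosure does not specify an algorithm), the proximity determination described above might be implemented as below; the threshold distance and minimum-region fraction are invented values standing in for the threshold distance 302 :

```python
import numpy as np

# Hypothetical sketch, not the patented implementation: decide whether a
# subject has come into proximity by checking a depth map for a sufficiently
# large region of depth estimates under a threshold distance.
THRESHOLD_DISTANCE_M = 1.5   # assumed analog of the threshold distance 302
MIN_REGION_FRACTION = 0.05   # assumed minimum fraction of pixels for a subject

def subject_in_proximity(depth_map, threshold_m=THRESHOLD_DISTANCE_M,
                         min_fraction=MIN_REGION_FRACTION):
    """Return True if enough valid depth estimates fall inside the threshold."""
    valid = depth_map > 0                      # zero often marks invalid depth
    near = (depth_map < threshold_m) & valid
    return bool(near.sum() >= min_fraction * depth_map.size)

# Example: a 4 m background with a near "subject" patch at about 1 m.
depth = np.full((120, 160), 4.0)
depth[30:90, 60:100] = 1.0
print(subject_in_proximity(depth))  # True
```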
- the video conferencing system 102 (for example, the multimedia communication device 100 ) is configured to select one or more foreground cameras from the multiple RGB cameras 110 for capturing one or more images of one or more identified foreground subjects (for example, a human subject).
- the term “foreground” may be abbreviated as “FG” in portions of this disclosure.
- the standing participant 132 may also be referred to as “foreground subject 132 .”
- the RGB camera 110 b has been selected as a foreground camera, and has captured an RGB image 140 b in which the foreground subject 132 can be seen.
- Image segmentation is performed to identify a foreground image portion of the RGB image 140 b corresponding to the foreground subject 132 , which is used to generate a foreground image 142 of the foreground subject 132 .
- the video conferencing system 102 (for example, the multimedia communication device 100 ) is configured to select a background camera from the multiple RGB cameras 110 for capturing one or more images of at least a portion of a background area behind the foreground subject 132 .
- the term “background” may be abbreviated as “BG” in portions of this disclosure.
- the RGB camera 110 d has been selected as a background camera, and a background image 140 d has been obtained from the selected RGB camera 110 d .
- the background image 140 d includes images of the table 125 and the participants 134 , 136 , and 138 , but does not show the foreground subject 132 .
- the foreground image 142 has been scaled and composited with the background image 140 d to produce a composite image 145 .
- the scaled foreground image 142 has been positioned in the composite image 145 so that when the composite image 145 is displayed by the multimedia communication device 160 , an image portion depicting the eyes of the foreground subject 132 is shown at about the position of the RGB camera 180 a .
- as the participant 155 views the composite image 145 (and other such images) on the multimedia communication device 160 , the participant 155 looks toward the displayed eyes of the foreground subject 132 and, as a result, appears in RGB images captured by the RGB camera 180 a to be looking directly at the RGB camera 180 a .
- When such RGB images are used to generate images of the participant 155 on the multimedia communication device 100 , it appears to at least some of the participants at the first geographic location 120 that they are in direct eye contact with the participant 155 .
- an image portion depicting the eyes of the participant 155 is shown at about the position of the RGB camera 110 b used as a foreground camera for the foreground subject 132 .
- when the foreground subject 132 views such images of the participant 155 on the multimedia communication device 100 , the foreground subject 132 is looking directly at the RGB camera 110 b .
- as a result, the participant 155 views images of the participant 132 in which the participant 132 is in eye contact with the participant 155 , and the participant 132 views images of the participant 155 in which the participant 155 is in eye contact with the participant 132 .
- the participants 132 and 155 have a gaze-correct multi-party video conferencing session.
- Because the participants 132 and 155 are actually looking at the RGB cameras 110 and 180 , there is no need to modify the portions of the RGB images depicting the eyes to achieve gaze alignment, thereby avoiding application of gaze correction techniques that generally result in unnatural images.
- the composite image 145 and/or the foreground image 142 is digitally encoded by the video conferencing system 102 to produce an encoded image (such as, but not limited to, a frame of an encoded video stream).
- the encoded image is then provided to the remote multimedia communication device 160 , thereby causing the composite image 145 to be displayed, at least in part, by the remote multimedia communication device 160 , such as via a video conferencing application program executed by the remote multimedia communication device 160 .
- Similar processing may be performed to generate a sequence of multiple such images, based on images captured by the RGB cameras 110 , used for a sequence of frames that are encoded in one or more video streams transmitted to participants of the video conferencing session.
- Although the image 170 is illustrated as occupying an entire display surface of the remote device 160 , the image 170 may be displayed in a subportion of the display surface; for example, the image 170 may be displayed in a window or a video display region of a user interface.
- the multimedia communication device 100 and/or the multimedia communication device 160 may display images received from one or more remote devices in a similar manner.
- FIG. 2 illustrates an exploded view of the first multimedia communication device 100 illustrated in FIG. 1 .
- FIG. 2 is presented with reference to a Z axis 202 , a Y axis 204 , and an X axis 206 .
- a positive direction (illustrated with "+") may be referred to as a "forward" direction, and a negative direction (illustrated with "−") may be referred to as a "backward" direction.
- the display device 105 is arranged perpendicular to the Z axis 202 and configured to emit light in the forward direction through a front (and user-viewable) surface 205 of the display device 105 (which also, in this example, is a front surface 205 of the first multimedia communication device 100 ) in response to signals received from a controller 250 included in the first multimedia communication device 100 .
- a horizontally arranged axis of the first multimedia communication device 100 may be referred to as a lateral axis or direction, and a vertically arranged axis of the first multimedia communication device 100 may be referred to as a longitudinal axis or direction (which may define an "upward" direction and a "downward" direction).
- the X axis 206 may be referred to as a lateral axis and the Y axis 204 may be referred to as a longitudinal axis.
- the X axis 206 may be referred to as a longitudinal axis and the Y axis 204 may be referred to as a lateral axis.
- the display device 105 may be implemented with technologies such as liquid-crystal displays (LCDs), organic light-emitting diode type displays (OLEDs), quantum dot-based displays, or various other light-emitting displays that permit RGB cameras 110 to capture suitable images through the display device 105 .
- Light received by the RGB cameras 110 a , 110 b , 110 c , and 110 d from a scene 240 in front of the display device 105 passes through respective pixel display regions 210 a , 210 b , 210 c , and 210 d of the display device 105 (collectively referred to as “pixel display regions 210 ”, which may also be referred to as “RGB camera pixel display regions”).
- Light received by the depth cameras 115 a and 115 b from the scene 240 passes through respective pixel display regions 215 a and 215 b of the display device 105 (collectively referred to as "pixel display regions 215 ", which may also be referred to as "depth camera pixel display regions").
- One or more scene illumination sources may also be positioned behind the display device 105 .
- one or more of the depth cameras 115 may include an integrated infrared (IR) illumination source.
- the display device 105 includes multiple display panels.
- the display device 105 is a forward-emitting display device, such as an OLED-based forward-emitting display device, arranged such that a small portion or substantially none of the light emitted by the display device 105 is emitted through a rear surface of the display device 105 .
- some OLED-based forward-emitting display devices have about a 5% backward emission of display light.
- image correction is performed to correct for backward-emitted light; for example, image contents for an RGB camera pixel display region 210 may be used to estimate and subtract or otherwise correct the effect of backward-emitted light captured by an RGB camera 110 .
- the RGB cameras 110 and/or the depth cameras 115 may capture images at any time, independent of synchronization with operation of the display device 105 .
- image capture operations performed by the RGB cameras 110 are synchronized with at least operation of their respective pixel display regions 210 .
- image capture operations for an RGB camera 110 may be performed when its respective pixel display region 210 is not emitting light, such as, but not limited to, in synchronization with display refresh periods or by displaying a dimmed image (including, for example, a black image) in the pixel display region 210 during image capture operations. Additional approaches are described in U.S. Patent Application Publication Number 2015/0341593 (published on Nov. 26, 2015 and entitled "Imaging Through a Display device"), which is incorporated by reference herein in its entirety.
- depth image capture operations performed by the depth cameras 115 are similarly synchronized with at least operation of their respective depth camera pixel display regions 215 .
- each of the RGB cameras 110 is positioned at about a same first distance upward (and away) from a lateral midline 206 of the display device 105 .
- the physical positions of the RGB cameras 110 relative to one another and/or the lateral midline 206 can vary.
- the first multimedia communication device 100 also includes the controller 250 .
- the controller 250 includes a logic subsystem, a data holding subsystem, a display controller, and a communications subsystem, and is communicatively coupled to the display device 105 , RGB cameras 110 , and depth cameras 115 .
- the logic subsystem may include, for example, one or more processors configured to execute instructions and communicate with the other elements of the first multimedia communication device 100 according to such instructions to realize various aspects of this disclosure. Such aspects include, but are not limited to, configuring and controlling the other elements of the first multimedia communication device 100 , input and commands, communicating with other computer systems, processing images captured by the RGB cameras 110 and the depth cameras 115 , and/or displaying image data received from remote systems.
- the data holding subsystem includes one or more memory devices (such as, but not limited to, DRAM devices) and/or one or more storage devices (such as, but not limited to, flash memory devices).
- the data holding subsystem includes one or more media having instructions stored thereon which are executable by the logic subsystem, which cause the logic subsystem to realize various aspects of this disclosure. Such instructions may be included as part of firmware, an operating system, device drivers, application programs, or other executable programs.
- the communications subsystem is arranged to allow the first multimedia communication device 100 to communicate with other computer systems. Such communication may be performed via, for example, wired or wireless data communication. Other examples for the controller 250 are illustrated in FIGS. 11 and 12 .
- the first multimedia communication device 100 also includes an enclosure 260 , arranged to be mechanically coupled to the display device 105 and enclose internal components of the first multimedia communication device 100 , including the RGB cameras 110 , the depth cameras 115 , and the controller 250 .
- the enclosure 260 may also be referred to as a “housing.” In this example, when the illustrated first multimedia communication device 100 is assembled, the RGB cameras 110 are all encompassed by the single enclosure 260 and positioned behind the single display device 105 .
- the display device 105 has a 16:9 aspect ratio, with a diagonal size of approximately 213 centimeters.
- the RGB cameras 110 a , 110 b , 110 c , and 110 d are positioned equidistantly along a line substantially parallel to the lateral axis 206 with a distance of about 150 centimeters between the optical axes of the RGB cameras 110 a and 110 d .
- the RGB cameras 110 are positioned above a lateral midline of the display device 105 (for example, the lateral midline 206 illustrated in FIG. 2 ).
- the RGB cameras 110 are positioned along a horizontal line, the vertical center of the display device 105 is approximately 154 centimeters above a floor, and the optical axes of the RGB cameras 110 are positioned approximately 6 centimeters above the vertical center of the display device 105 , placing the optical axes of the RGB cameras 110 approximately 160 centimeters from the floor and at approximately eye level for a standing human subject.
- as a result, a subject's eyes are more likely to be aligned with the RGB cameras 110 , improving both capture of gaze-aligned images (images in which a subject is looking directly at the camera) and display of images of remote participants that are perceived as making direct eye-to-eye contact.
- An optical axis of the depth camera 115 a is oriented 11 degrees left of the forward direction and an optical axis of the depth camera 115 b is oriented 11 degrees right of the forward direction, thereby providing an increased combined FOV for the depth cameras 115 .
- An optical center of the depth camera 115 a is positioned approximately 66 centimeters in the lateral direction from an optical center of the depth camera 115 b .
- the optical centers of the depth cameras 115 are positioned approximately 13 centimeters below the optical axes of the RGB cameras 110 .
- the RGB cameras 110 and the depth cameras 115 each capture images with a 16:9 aspect ratio and with a horizontal FOV of approximately 100 degrees.
- In some implementations, the first multimedia communication device 100 may be implemented across multiple devices. For example, selected operations may be performed by a computer system not within the illustrated enclosure 260 , and/or some or all of the depth cameras 115 may be included in one or more separate devices instead of being positioned behind the display device 105 or otherwise within the enclosure 260 .
- FIG. 3A illustrates an example of capturing and displaying human foreground subject images.
- FIG. 3A shows a top view of an example scene 300 in which the four participants 132 , 134 , 136 , and 138 are arranged much as shown in FIG. 1 , with seated participants 134 , 136 , and 138 , and standing participant 132 , during a video conferencing session.
- the standing participant 132 has advanced toward the multimedia communication device 100 and within an example threshold distance 302 and a corresponding foreground space 303 .
- the video conferencing system 102 (for example, the multimedia communication device 100 ) may be configured to determine a subject distance based on depth images captured by the depth cameras 115 .
- the video conferencing system 102 is configured to ignore features beyond the threshold distance 302 or outside of the foreground space 303 for identifying foreground subjects.
- the shape, physical positions, and distances illustrated in FIG. 3A for the threshold distance 302 and the foreground space 303 are generally illustrated for discussion, and may be different in various implementations.
- the threshold distance 302 and/or a shape of, and physical positions for, the foreground space 303 may be defined and/or adjusted by a user; for example, during a setup process.
- the video conferencing system 102 (for example, the multimedia communication device 100 ) has identified the participant 132 as a foreground subject for segmentation from RGB images.
- the video conferencing system 102 has selected the RGB camera 110 b , with a corresponding FOV 304 b (shown in part), as the foreground camera for capturing images of the foreground subject 132 . It is noted that foreground camera selection may occur after the foreground image has been captured and be based on the content of the RGB images and/or corresponding depth images.
- FIG. 3B illustrates an example of segmentation of a foreground image 330 , corresponding to the foreground subject 132 , from an RGB image 310 captured by the multimedia communication device 100 for the scene 300 shown in FIG. 3A .
- the segmentation of a foreground image from an RGB image results in labeling of pixels in the RGB image, rather than generating a foreground image separate from the RGB image.
- the RGB image 310 has been captured by the selected foreground RGB camera 110 b .
- the foreground subject 132 has a height 312 of about 74% of the height of the RGB image 310 , and the eyes of the foreground subject 132 are centered at a lateral distance 314 of about 74% of the width of the RGB image 310 .
- an RGB image based segmentation is performed, identifying a first foreground mask 316 identifying pixel positions corresponding to the foreground subject 132 and, in some examples, a first background mask 318 .
- a machine-trained model, trained to identify instances of certain types of objects, may be applied to the RGB image 310 to identify the first foreground mask 316 and/or the first background mask 318 .
- a trained neural network such as a trained convolutional neural network (CNN), may be used for this purpose.
- a depth image 320 has been captured for the scene 300 by the depth camera 115 a . Due to limitations of patent illustrations, the depth image 320 is illustrated with only a few different levels of shading. In the depth image 320 , there is a portion 322 with depth estimates that are substantially discontinuous along edges between the portion 322 and surrounding areas of the depth image 320 . Based on the depth image 320 , the video conferencing system 102 (for example, the multimedia communication device 100 ) identifies a first foreground depth mask 324 identifying positions in the depth image 320 corresponding to the foreground subject 132 and, in some examples, a first background depth mask 326 .
- Based on the above-mentioned discontinuities between the portion 322 and surrounding areas of the depth image 320 , the video conferencing system 102 identifies the portion 322 as a foreground portion 322 of the depth image 320 . In some examples, the video conferencing system 102 may further determine a distance d 305 and/or physical position for the identified foreground portion 322 . Based on, for example, the determined distance d 305 being less than the threshold distance 302 and/or the determined physical position being within the foreground space 303 , the video conferencing system 102 identifies a foreground subject corresponding to the participant 132 .
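A minimal sketch of this depth-based identification, with an intentionally simplified discontinuity test and invented thresholds (the disclosure does not specify these details):

```python
import numpy as np

DISCONTINUITY_M = 0.5        # assumed depth jump marking a subject boundary
THRESHOLD_DISTANCE_M = 1.5   # assumed analog of the threshold distance 302

def foreground_depth_mask(depth_map):
    """Mask of pixels much closer than the background depth level."""
    background_level = np.median(depth_map)
    return (background_level - depth_map) > DISCONTINUITY_M

def identify_foreground_subject(depth_map):
    """Return a foreground mask if a near subject is found, else None."""
    mask = foreground_depth_mask(depth_map)
    if not mask.any():
        return None
    distance = float(depth_map[mask].mean())   # analog of the distance d 305
    return mask if distance < THRESHOLD_DISTANCE_M else None

depth = np.full((100, 100), 4.0)   # 4 m background
depth[20:80, 40:70] = 1.2          # near subject patch
mask = identify_foreground_subject(depth)
print(mask is not None and int(mask.sum()))  # 1800
```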
- the video conferencing system 102 (for example, the multimedia communication device 100 ) is configured to identify portions of the RGB image 310 corresponding to the first foreground depth mask 324 , resulting in a second foreground mask 328 and, in some implementations, a second background mask 329 .
- various techniques can be used individually or in combination, including, but not limited to, rotations and/or translations of two-dimensional (2D) and/or 3D points and/or vectors (including, for example, use of one or more transformation matrices); optical distortion correction for a depth camera and/or RGB camera (including, for example, correction of complex asymmetric optical distortion); geometric transformations such as, but not limited to, affine transformations (linear conformal (scaling, translations, rotations) and shears), projective transformations (projections, homographies, and collineations), and piecewise linear transformations (for example, affine transformations applied separately to triangular regions of an image); and/or nonlinear image transformations such as, but not limited to, polynomial transformations, nonuniform scaling, circular or radial distortion (barrel, pincushion, moustache, and multiorder), and tangential distortion.
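As a sketch of one of the listed techniques, the snippet below applies a projective transformation (homography) to map pixel positions in a depth image into RGB-image coordinates; the matrix values are invented for illustration, whereas real values would come from calibrating the depth camera and RGB camera pair:

```python
import numpy as np

# Made-up homography: slight scaling plus a small offset between the depth
# camera's image plane and the RGB camera's image plane (assumed values).
H = np.array([
    [1.05, 0.00, 12.0],
    [0.00, 1.05, -8.0],
    [0.00, 0.00,  1.0],
])

def map_depth_to_rgb(points_xy, homography=H):
    """Apply a homography to an (N, 2) array of depth-image pixel positions."""
    pts = np.hstack([points_xy, np.ones((len(points_xy), 1))])
    mapped = pts @ homography.T
    return mapped[:, :2] / mapped[:, 2:3]      # perspective divide

depth_pts = np.array([[100.0, 50.0], [200.0, 120.0]])
print(map_depth_to_rgb(depth_pts))
```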
- the video conferencing system 102 (for example, the multimedia communication device 100 ) is configured to, based on the first foreground mask 316 , the second foreground mask 328 , the first background mask 318 , and/or the second background mask 329 , segment from the RGB image 310 a foreground image 330 corresponding to the foreground subject 132 .
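A sketch of this combination step, assuming a simple logical-AND of the RGB-based and depth-derived masks (the actual combination logic is not specified here, and the masks below are synthetic stand-ins):

```python
import numpy as np

def segment_foreground(rgb_image, rgb_mask, depth_mask):
    """Zero out pixels outside the combined foreground mask."""
    combined = rgb_mask & depth_mask
    out = np.zeros_like(rgb_image)
    out[combined] = rgb_image[combined]
    return out, combined

# Synthetic stand-ins for the RGB image 310, the first foreground mask 316,
# and the depth-derived second foreground mask 328.
rgb = np.random.randint(0, 256, (60, 80, 3), dtype=np.uint8)
rgb_mask = np.zeros((60, 80), dtype=bool)
rgb_mask[10:50, 20:60] = True
depth_mask = np.zeros((60, 80), dtype=bool)
depth_mask[15:55, 25:65] = True
fg, combined = segment_foreground(rgb, rgb_mask, depth_mask)
print(int(combined.sum()))  # 1225 pixels in the overlap
```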
- Other techniques that may be applied for segmenting the foreground image 330 are described in U.S. patent application Ser. No. 15/975,640 (filed on May 9, 2018 and entitled “Skeleton-Based Supplementation for Foreground Image Segmentation”), which is incorporated by reference herein in its entirety.
- FIG. 3C shows details of the foreground image 330 obtained in FIG. 3B for the scene 300 shown in FIG. 3A .
- the foreground image 330 has a total height of about 74% of the height of the RGB image 310 and a total width of about 25% of the width of the RGB image 310 .
- the video conferencing system 102 (for example, the multimedia communication device 100 and/or 160 ) is configured to obtain an eye pixel position 332 for the foreground image 330 , corresponding to an image portion included in the foreground image 330 depicting the eyes of the foreground subject 132 .
- the eye pixel position 332 may be determined based on a centroid, middle position, or average position for an image portion identified as a portion of the foreground image 330 depicting the eyes of the foreground subject 132 .
- a machine-trained algorithm used to identify the first foreground mask 316 may also be trained to identify a portion of the RGB image 310 depicting the eyes of the foreground subject 132 and/or estimate the eye pixel position 332 .
- the eye pixel position 332 is at a lateral (or “x”) pixel position or distance 334 of about 50% of the width of the foreground image 330 , and is at a longitudinal (or “y”) pixel position or distance 336 of about 85% of the height of the foreground image 330 .
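The centroid option mentioned above can be sketched directly; the eye mask here is synthetic stand-in data for what an eye detector (for example, the machine-trained model mentioned earlier) would produce:

```python
import numpy as np

def eye_pixel_position(eye_mask):
    """Centroid (x, y) of True pixels in a boolean eye-region mask."""
    ys, xs = np.nonzero(eye_mask)
    return float(xs.mean()), float(ys.mean())

# Synthetic mask over a 100 x 200 (width x height) foreground image, with the
# eye region near 50% of the width and 85% of the height, as in the text.
mask = np.zeros((200, 100), dtype=bool)
mask[169:172, 48:53] = True
x, y = eye_pixel_position(mask)
print(round(x / 100, 2), round(y / 200, 2))  # 0.5 0.85
```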
- FIG. 3D shows pixel positions 343 , 345 , 347 , and 349 in a composite image 350 corresponding to respective RGB camera pixel display regions 190 a , 190 b , 190 c , and 190 d for RGB cameras 180 a , 180 b , 180 c , and 180 d of the remote multimedia communication device 160 that will display the composite image 350 .
- each of the pixel positions 343 , 345 , 347 , and 349 is at a longitudinal pixel position or distance 340 (in this example, along a Y axis similar to the Y axis 204 shown in FIG. 2 ) of about 55% of the height of the composite image 350 .
- the pixel position 343 , corresponding to the pixel display region 190 a and the RGB camera 180 a , has a lateral pixel position or distance 342 (in this example, along an X axis similar to the X axis 206 shown in FIG. 2 ) of about 11% of the width of the composite image 350 .
- Pixel position 345 has a lateral pixel position or distance 344 of about 35%
- pixel position 347 has a lateral pixel position or distance 346 of about 65%
- pixel position 349 has a lateral pixel position or distance 348 of about 89%.
- the video conferencing system 102 is configured to generate the composite image 350 .
- the pixel positions 343 , 345 , 347 , and 349 are provided by the remote multimedia communication device 160 to the multimedia communication device 100 , and compositing is performed by the multimedia communication device 100 .
- the pixel positions 343 , 345 , 347 , and 349 are determined and used by the remote multimedia communication device 160 that will display the composite image 350 , and compositing is performed by the remote multimedia communication device 160 .
- FIG. 3E illustrates a portion of the composite image 350 generated for the scene 300 shown in FIG. 3A using the foreground image 330 shown in FIG. 3C .
- the foreground image 330 is selectively positioned such that the eye pixel position 332 of the foreground image 330 is at about the pixel position 347 for the RGB camera 180 c and, as a result, is displayed at the pixel display region 190 c .
- the foreground image 330 is scaled for composition in the composite image 350 . This scaling is discussed in more detail in connection with FIGS. 5A-5F . In the example shown in FIG. 3E , the foreground image 330 is scaled such that it would have a total height 354 of about 93% of the height of the composite image 350 (an increase of about 26% from the proportionate size of the foreground image 330 portion of the RGB image 310 ).
- as a result of this scaling and positioning, the rendered height 356 of the rendered portion 352 of the foreground image 330 is only about 59% of the height of the composite image 350 .
- the eye pixel position 332 of the rendered portion 352 of the foreground image 330 is at about the lateral pixel position 346 in the composite image 350 .
- the eyes of the foreground subject 132 are displayed at about the pixel display region 190 c that will be used to capture RGB images of the participant viewing the composite image 350 .
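The positioning and scaling described in FIGS. 3C-3E reduce to simple arithmetic over the stated percentages. In this sketch, the pixel dimensions and the scale factor of 1.26 (roughly the 26% increase noted above) are assumptions for illustration:

```python
def composite_placement(fg_w, fg_h, eye_x_frac, eye_y_frac,
                        comp_w, comp_h, cam_x_frac, cam_y_frac, scale):
    """Return the (left, top) placement of the scaled foreground image such
    that its eye pixel position lands on the camera's pixel position."""
    eye_x = eye_x_frac * fg_w * scale          # eye offset after scaling
    eye_y = eye_y_frac * fg_h * scale
    cam_x = cam_x_frac * comp_w                # target camera pixel position
    cam_y = cam_y_frac * comp_h
    return cam_x - eye_x, cam_y - eye_y

# Eyes at 50% width / 85% height of the foreground image (FIG. 3C); target
# pixel position at 65% width / 55% height of the composite image (FIG. 3D).
left, top = composite_placement(480, 798, 0.50, 0.85,
                                1920, 1080, 0.65, 0.55, scale=1.26)
print(round(left), round(top))  # 946 -261
```

A negative top value indicates that the upper part of the scaled foreground image falls outside the composite image, consistent with only a rendered portion of the foreground image being visible.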
- FIG. 3F illustrates an example scene 360 in which the foreground subject 132 has moved laterally from the physical position in FIG. 3A and a resulting composite image 374 for the scene 360 in FIG. 3F .
- the composite image 374 is generated according to the techniques described in FIGS. 3A-3E .
- the video conferencing system 102 again selects the RGB camera 110 b as the foreground camera for the foreground subject 132 .
- the foreground subject 132 is at a distance d 362 from the selected RGB camera 110 b .
- FIG. 3F shows an RGB image 364 , obtained from the selected RGB camera 110 b for the scene 360 , in which the foreground subject 132 has a height 366 of about 74% of the height of the RGB image 364 , and the eyes of the foreground subject 132 are centered (at a position similar to the eye pixel position 332 shown in FIG. 3C ) at a lateral distance 368 of about 59% of the width of the RGB image 364 .
- the resulting foreground image 370 is scaled and composited into the composite image 374 such that an eye position for the rendered portion 372 of the foreground image 370 is at about the longitudinal pixel position 340 and lateral pixel position 346 for the pixel display region 190 c.
- FIG. 3G illustrates an example scene 380 in which the foreground subject 132 has moved laterally from the physical position in FIG. 3F and a resulting composite image 394 for the scene 380 in FIG. 3G .
- the composite image 394 is generated according to the techniques described in FIGS. 3A-3E .
- the video conferencing system 102 again selects the RGB camera 110 b as the foreground camera for the foreground subject 132 .
- FIG. 3G shows an RGB image 384 , obtained from the selected RGB camera 110 b for the scene 380 , in which the foreground subject 132 has a height 386 of about 74% of the height of the RGB image 384 , and the eyes of the foreground subject 132 are centered at a lateral distance 388 of about 26% of the width of the RGB image 384 .
- the resulting foreground image 390 is scaled and composited into the composite image 394 such that an eye position for the rendered portion 392 of the foreground image 390 is at about the longitudinal pixel position 340 and lateral pixel position 346 for the pixel display region 190 c.
- the resulting composite images 350 , 374 , and 394 consistently rendered the eyes of the foreground subject 132 at about the longitudinal pixel position 340 of about 55% and the lateral pixel position 346 of about 65%, and maintained the rendered position of the eyes of the foreground subject 132 over the foreground camera being used to capture RGB images of the participant viewing the composite images.
- FIG. 4 illustrates use of image distortion correction applied in some implementations to reduce distortions occurring in various portions of the fields of view of the RGB cameras 110 .
- some or all of the RGB cameras 110 have wide fields of view of about 90 degrees or more.
- curvilinear distortion such as barrel distortion is common.
- FIG. 4 shows an uncorrected image 400 obtained from a wide angle RGB camera 110 , with dashed lines added to more clearly illustrate barrel distortion in the uncorrected image 400 .
- the distortion is relatively minor at a central portion 410 of the uncorrected image 400 , as shown by a representative foreground image 420 .
- toward peripheral portions of the uncorrected image 400 , the distortion becomes more severe and noticeable, as shown by the representative foreground image 425 from a peripheral portion of the uncorrected image 400 , in contrast to the central foreground image 420 .
- such distortion, if uncorrected, can cause the eyes of the foreground subject to appear to be looking away from a remote participant even when the foreground subject is looking at the RGB camera.
- axial distortion associated with subject distance can cause participant gaze angles to deviate.
- as a foreground subject moves across the FOV, the resulting foreground images demonstrate distortions in different directions, resulting in an unusual and disturbing visual effect when the foreground subject is maintained at the same rendered lateral position as shown in FIGS. 3E, 3F, and 3G .
- the video conferencing system 102 (for example, the multimedia communication device 100 ) is configured to “undistort” or correct the RGB images to reduce such distortion.
- FIG. 4 shows a corrected image 430 , resulting from correction of the barrel distortion in the uncorrected image 400 .
- the appearance of the foreground subject is more consistent across the FOV of an RGB camera 110 , as illustrated by the foreground images 450 and 455 from respective portions 440 and 445 of the corrected image 430 .
- other image corrections may be applied, including, but not limited to, corrections for more complex (non-curvilinear) optical distortions, vignetting, and chromatic aberration.
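A sketch of a single-coefficient radial model commonly used for such curvilinear correction; the coefficient is a made-up value, and calibrated cameras typically use more elaborate models (including the non-curvilinear corrections mentioned above):

```python
import numpy as np

K1 = -0.18   # assumed radial coefficient; negative k1 models barrel distortion

def undistort_points(points, k1=K1):
    """Approximately invert the radial model x_d = x_u * (1 + k1 * r_u**2).

    One fixed-point step, x_u ~= x_d / (1 + k1 * r_d**2), is often a good
    approximation near the image center; exact inversion needs iteration.
    """
    pts = np.asarray(points, dtype=float)
    r2 = (pts ** 2).sum(axis=1, keepdims=True)
    return pts / (1.0 + k1 * r2)

# Normalized coordinates: a point at the center is unchanged; a peripheral
# point is pushed back outward, undoing the inward "barrel" compression.
center = undistort_points([[0.0, 0.0]])
edge = undistort_points([[0.8, 0.45]])
print(center[0], edge[0])
```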
- Various image corrections may be performed using the techniques described in connection with transforming depth images in FIG. 3B .
- non-optical distortions can occur in the form of subject distance distortions when a participant is close to an RGB camera 110 .
- depth images obtained from the depth cameras 115 may be used to correct for certain subject distance distortions
- the multimedia communication device 100 is configured to present images and interfaces on the display 105 so as to reduce the occurrence of such distortions.
- interactive user interface elements responsive to touch-based user input are presented in portions of the display device 105 likely to reduce the occurrence of images with such disproportionate portions.
- interactive user interface elements may be positioned at or near the right or left ends of a display device 105 configured to operate as a touch screen to receive user input, such that input via a finger or handheld instrument is more likely to occur at positions away from an optical axis of an RGB camera 110 (including, for example, positions outside of an FOV of the RGB camera 110 ).
- such interactive user interface elements may be dynamically positioned and/or repositioned based on at least a detected position of a foreground subject. For example, an interactive user interface element may be moved from a left end to a right end in response to a corresponding lateral movement of a foreground subject.
- the dynamic positioning and/or repositioning of user interface elements may include selecting one of multiple areas of the display device 105 where touch-based input occurs away from optical axes of one or more of the RGB cameras 110 .
- a hand or limb likely to be used for touch-based input may be determined for a foreground subject (for example, a determination of a dominant hand based on past user input events), and dynamic positioning or repositioning is performed based on which hand is determined likely to be used. For example, positions to the left (as viewed by a user looking at the display device) of a foreground camera may be preferred to avoid a left-handed foreground subject reaching across an FOV of the foreground camera.
- a user interface may be selectively positioned to place a display area of the user interface closer than an input portion of the user interface to an optical axis of an RGB camera 110 , thereby guiding a foreground subject's gaze toward a RGB camera 110 at times that they are interacting with an application on the multimedia communication device 100 and not looking at an image of a remote participant, while also guiding the foreground subject's input interactions away from the RGB camera 110 so as to avoid subject distance distortions.
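- The dynamic placement of touch-input elements described above can be sketched as a simple rule: find the camera nearest the detected subject position and bias the input region away from that camera's optical axis, on the side that keeps the dominant hand from crossing the camera's FOV. The following Python sketch is illustrative only; the names and the 25% offset are assumptions, not from the source:

```python
def choose_ui_position(subject_x, camera_xs, display_width, dominant_hand="right"):
    """Pick a lateral position (in pixels) for a touch-input UI element
    that keeps touch interactions away from the optical axis of the
    RGB camera nearest the foreground subject.

    subject_x: detected lateral position of the foreground subject
    camera_xs: lateral positions of the RGB cameras behind the display
    """
    nearest_cam = min(camera_xs, key=lambda cx: abs(cx - subject_x))
    margin = display_width * 0.1
    # Bias the element to the left of the camera for left-handed users
    # (and to the right otherwise) so the reaching hand does not cross
    # the camera's FOV.
    if dominant_hand == "left":
        return max(margin, nearest_cam - display_width * 0.25)
    return min(display_width - margin, nearest_cam + display_width * 0.25)
```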
- FIGS. 5A-5D illustrate techniques which may be applied by the video conferencing system 102 in response to changes in distance between multimedia communication devices and respective foreground subjects.
- FIG. 5A illustrates a first scenario 500 occurring at about a first time, including a scene 500 a at a first geographic location and a scene 500 b at a different second geographic location, and a resulting composite image 540 .
- a first participant 504 is participating in a video conferencing session via a first multimedia communication device 502 .
- a second participant 514 is participating in the video conferencing session via a second multimedia communication device 512 .
- Each of the multimedia communication devices 502 and 512 may be configured as described for the multimedia communication devices 100 and 160 in the preceding figures.
- the multimedia communication devices 502 and 512 have smaller display screens than the multimedia communication device 100 , but otherwise are similarly configured.
- the first and second multimedia communication devices 502 and 512 are included in the video conferencing system 102 .
- the video conferencing system 102 determines a distance d 505 (in this example, about 70 centimeters) between the first multimedia communication device 502 and the first participant 504 .
- the first multimedia communication device 502 includes an RGB camera 506 c with a horizontal FOV 507 c (in this example, about 100 degrees), which is used to capture an RGB image 520 .
- a shoulder width of the first participant 504 occupies a horizontal angle or FOV 509 of the RGB camera 506 c of about 27.4 degrees.
- a foreground image portion 522 of the RGB image 520 corresponding to the first participant 504 , has a shoulder width 524 of about 20.4% of the width of the RGB image 520 and a height 526 of about 82% of the height of the RGB image 520 .
- the video conferencing system 102 (for example, the first multimedia communication device 502 ) segments a foreground image 528 , corresponding to the first participant 504 , from the RGB image 520 .
- the video conferencing system 102 determines a distance d 515 (in this example, about 140 centimeters) between the second multimedia communication device 512 and the second participant 514 .
- the second multimedia communication device 512 includes an RGB camera 516 c , which is used to capture an RGB image (not shown in FIG. 5A ).
- a shoulder width of the second participant 514 occupies a horizontal angle or FOV 519 of the RGB camera 516 c of about 13.4 degrees.
- FIG. 5B illustrates aspects of scaling of the foreground image 528 by the video conferencing system 102 (for example, the multimedia communication devices 502 and/or 512 ) for the composite image 540 based on at least the distance d 505 between the first multimedia communication device 502 and the first participant 504 .
- the video conferencing system 102 is configured to determine an apparent distance d 534 based on the distances d 505 and d 515 .
- the apparent distance d 534 is a sum of the distance d 505 and the distance d 515 , although other techniques may be used, including, but not limited to, limiting distances d 505 and/or d 515 to minimum and/or maximum distances, and/or applying a weighting or scaling factor to distances d 505 and/or d 515 .
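- The apparent-distance computation can be sketched as follows, covering the default sum as well as the optional clamping and weighting variants mentioned above. The parameter names and default limits are illustrative assumptions:

```python
def apparent_distance(d_local, d_remote, d_min=50.0, d_max=300.0, weight=1.0):
    """Combine the device-to-participant distances at both endpoints
    into a single 'virtual window' apparent distance (centimeters).

    Default behavior is a simple sum; each distance is clamped to a
    plausible range and an optional scaling weight may be applied.
    """
    def clamp(d):
        return max(d_min, min(d_max, d))
    return weight * (clamp(d_local) + clamp(d_remote))
```

- With the FIG. 5A values, apparent_distance(70, 140) yields the 210 centimeter apparent distance d 534 .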
- a portion of a display screen of the second multimedia communication device 512 appears to the second participant 514 to be like a “virtual window” 532 , through which the first participant 504 appears to be at the apparent distance d 534 from the second participant 514 .
- the video conferencing system 102 is configured to scale the foreground image 528 based on the apparent distance d 534 , resulting in the foreground image 528 being scaled such that it would have a total height 544 of about 95% of the height of the composite image 540 , resulting in the rendered foreground image 542 having a shoulder width 538 of about 22.7% of the width of the composite image 540 , spanning a horizontal FOV 536 of the second participant 514 of about 10.1 degrees.
- the video conferencing system 102 is configured to generate the composite image 540 with the eye position of the rendered foreground image 542 composited at about the pixel display region 508 c for the foreground RGB camera 516 c . This results in the rendered foreground image 542 having a height 546 of about 63% of the height of the composite image 540 . It is noted that the video conferencing system 102 may be configured to similarly scale an image of the second participant 514 for display to the first participant 504 via the first multimedia communication device 502 , thereby achieving the same “virtual window” effect for both participants 504 and 514 .
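- The scaling for the “virtual window” effect can be approximated with similar triangles: a subject of physical width W rendered as if at apparent distance D behind the screen, viewed from distance d, occupies an on-screen width of about W * d / D. The following Python sketch is a simplified planar approximation; the parameter names are illustrative:

```python
def rendered_width_fraction(subject_width_cm, apparent_dist_cm,
                            viewer_dist_cm, screen_width_cm):
    """Fraction of the screen width occupied by a foreground subject
    rendered as if at the 'virtual window' apparent distance.

    By similar triangles, the on-screen width is
    subject_width * viewer_distance / apparent_distance.
    """
    on_screen_cm = subject_width_cm * viewer_dist_cm / apparent_dist_cm
    return on_screen_cm / screen_width_cm
```

- With the FIG. 5A values (a shoulder width of about 34 cm, an apparent distance of 210 cm, a viewer at 140 cm, and an assumed screen width of about 100 cm), this yields approximately the 22.7% rendered shoulder width described above.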
- FIG. 5C illustrates a second scenario 550 occurring at about a second time after the first time in FIG. 5A and during the video conferencing session shown in FIG. 5A in which the second participant 514 has moved closer to the second multimedia communication device 512 , including a scene 550 a for the first participant 504 and a scene 550 b for the second participant 514 , and a resulting composite image 562 .
- the first participant 504 has remained in the physical position shown in FIG. 5A .
- the distance d 555 and horizontal FOV 509 are essentially the same, and the RGB image 552 captured by the RGB camera 506 c has a foreground image portion 554 with a shoulder width 556 and height 558 that are approximately the same as the shoulder width 524 and height 526 in FIG. 5A , resulting in a foreground image 560 similar to the foreground image 528 in FIG. 5A .
- FIG. 5D illustrates aspects of scaling of the foreground image 560 by the video conferencing system 102 for the composite image 562 based on at least the distance d 505 between the first multimedia communication device 502 and the first participant 504 in accordance with the techniques described in FIG. 5A .
- the movement of the second participant 514 has resulted in a decreased apparent distance d 535 and an increased horizontal FOV 537 of about 14.3 degrees.
- the net result is the foreground image 560 being scaled smaller than in FIG. 5A .
- the foreground image 560 being scaled such that it would have a total height 566 of about 71% of the height of the composite image 562 (a decrease of about 15% from the scaling of the foreground image 528 for the composite image 540 in FIG. 5A ), resulting in the rendered foreground image 564 having a shoulder width 539 of about 16.9% of the width of the composite image 562 , spanning a horizontal FOV 537 of the second participant 514 of about 14.3 degrees (an increase by about 42% over the horizontal FOV 536 in FIG. 5A ).
- the rendered foreground image 564 has a height 568 of about 60% of the height of the composite image 562 .
- FIGS. 5E and 5F illustrate additional techniques which may be applied by the video conferencing system 102 (for example, by multimedia communication devices 100 and/or 160 ) in response to changes in distance between the first multimedia communication device 100 and a foreground subject 132 .
- FIG. 5E illustrates an example scene 570 in which the foreground subject 132 has moved from the physical position shown in FIG. 3F to a new physical position closer to the multimedia communication device 100 , at a distance d 571 , and the resulting composite image 577 displayed by the multimedia communication device 160 .
- the video conferencing system 102 (for example, the multimedia communication device 100 or 160 ) is configured to generate the composite image 577 with the eye position of the rendered foreground image 578 composited at about the pixel display region 190 for the foreground camera (in this case, the pixel display region 190 c , as in FIGS. 3E, 3F, and 3G ).
- a different and larger view of the foreground subject 132 is captured in a foreground image portion 573 of an RGB image 572 from the RGB camera 110 b than in the examples shown in FIGS. 3B, 3F, and 3G .
- a shoulder width 574 of the foreground image portion 573 (at about 30% of the width of the RGB image 572 ) is about 70% greater than in those examples, the foreground image portion 573 has a height 575 of about 82% of the height of the RGB image 572 , and only a portion of the foreground subject 132 above the waist was captured in the RGB image 572 .
- the video conferencing system 102 segments a foreground image 576 corresponding to the foreground subject 132 from the RGB image 572 .
- the video conferencing system 102 (for example, the multimedia communication device 100 or 160 ) is configured to scale the foreground image 576 based on at least the distance d 571 between the multimedia communication device 100 and the foreground subject 132 .
- the video conferencing system 102 (for example, the multimedia communication device 100 ) may determine the distance d 571 based on at least depth images from the depth cameras 115 .
- the foreground image 576 is scaled such that it would have a total height 580 of about 65% of the height of the composite image 577 (a decrease of about 21% from the proportionate size of the foreground image portion 573 of the RGB image 572 ), resulting in a rendered shoulder width 579 of about 23.2%.
- Since a lower portion of the foreground subject 132 was not captured in the RGB image 572 , most of the foreground image 576 is included in the composite image 577 , with the rendered portion 578 of the foreground image 576 having a rendered height 581 of about 59% of the height of the composite image 577 . As a result of the scaling based on distance, the foreground subject 132 has a very similar appearance in FIGS. 3F and 5E despite the differences in the captured RGB images 364 and 572 .
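- The distance-based scaling that keeps the subject's rendered size consistent can be sketched to first order with a pinhole model: the captured size of a subject falls off roughly as 1/distance, so the segmented foreground image is scaled by the ratio of the measured distance to a reference distance. A sketch; the reference-distance parameter is an assumption, not from the source:

```python
def scale_for_constant_size(subject_dist_cm, reference_dist_cm):
    """Scale factor for a segmented foreground image so the subject
    keeps a consistent rendered size as they move toward or away from
    the device.

    Captured size is roughly proportional to 1/distance under a
    pinhole model, so multiplying by distance / reference_distance
    compensates: closer subjects are scaled down, farther ones up.
    """
    return subject_dist_cm / reference_dist_cm
```

- This is consistent with the behavior above: the closer subject in FIG. 5E is scaled down, while the farther subject in FIG. 5F is scaled up, yielding substantially similar rendered shoulder widths and heights.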
- FIG. 5F illustrates an example scene 582 in which the foreground subject 132 has moved from the physical position shown in FIG. 5E to a new physical position further away from the multimedia communication device 100 , at a distance d 583 , and the resulting composite image 589 .
- a different and smaller view of the foreground subject 132 is captured in a foreground image portion 585 of an RGB image 584 from the RGB camera 110 b than in the examples shown in FIGS. 3B, 3F, 3G, and 5E .
- a shoulder width 586 of the foreground image portion 585 is only about 15.6% of the width of the RGB image 584 , while the foreground image portion 585 has a height 587 of about 65% of the height of the RGB image 584 .
- the video conferencing system 102 segments a foreground image 588 corresponding to the foreground subject 132 from the RGB image 584 .
- the video conferencing system 102 again scales the foreground image 588 based on at least the distance d 583 between the multimedia communication device 100 and the foreground subject 132 .
- the foreground image 588 is scaled such that it would have a total height 592 of about 97% of the height of the composite image 589 (an increase of about 49% over the scaling of the foreground image 576 portion for the composite image 577 in FIG. 5E ), resulting in the rendered foreground image 590 having a shoulder width 591 of about 23.2%, which is substantially similar to the shoulder width 579 in FIG. 5E .
- the rendered foreground image 590 of the foreground image 588 has a rendered height 592 of about 59% of the height of the composite image 589 , which is substantially similar to the rendered height 581 in FIG. 5E .
- movement of the foreground subject 132 throughout much of an FOV of an RGB camera has a substantially reduced effect, both reducing distraction from changes in appearance caused by such movements of the foreground subject 132 and enabling a gaze-correct multi-party video conferencing session between at least those two participants despite such movements, granting participants more freedom within more effective video conferencing sessions.
- a side-by-side arrangement where two participants stand close together facing the same direction.
- an off-axis arrangement where two individuals stand off-axis to each other (for example, perpendicularly to each other in an L-arrangement as if standing on two edges of the letter ‘L’).
- the face-to-face arrangement (an arrangement commonly achieved by conventional video conferencing) is considered confrontational and uncomfortable over time, and instead the off-axis arrangement is preferred.
- spatial positioning is dynamic over the course of a conversation. For example, the face-to-face arrangement is often preferred when people greet each other at a beginning of a conversation, which then shifts to the off-axis arrangement.
- FIGS. 6A-6D illustrate techniques for selecting and changing RGB cameras that further support providing gaze-correct video conferencing sessions among and between various participants at various geographic locations during a single video conferencing session.
- FIG. 6A illustrates a first scenario 600 occurring at about a first time, including a scene 600 a at the second geographic location 150 shown in FIG. 1 and a scene 600 b at the first geographic location 120 shown in FIG. 1 .
- Two views are shown for the scene 600 a : on the left is a top view showing a physical position of the participant 155 relative to the multimedia communication device 160 , and on the right a perspective view showing the participant 155 interacting with a rendered foreground image 606 of the participant 132 displayed by the multimedia communication device 160 .
- two views are shown for the scene 600 b : on the left is a top view showing a physical position of the participant 132 relative to the multimedia communication device 100 , and on the right a perspective view showing the participant 132 interacting with a rendered foreground image 616 of the participant 155 displayed by the multimedia communication device 100 .
- the video conferencing system 102 is configured to determine (for example, at the multimedia communication device 160 ) a physical position of the participant 155 relative to the multimedia communication device 160 for selecting (for example, at the multimedia communication device 100 ) an RGB camera 110 of the multimedia communication device 100 as a foreground camera which will be used by the multimedia communication device 100 to capture images of the participant 132 and to which the portion of the rendered foreground image 616 depicting the eyes of the participant 155 will be aligned.
- the video conferencing system 102 (for example, the multimedia communication device 100 ) is configured to determine (for example, at the multimedia communication device 100 ) a physical position of the participant 132 relative to the multimedia communication device 100 for selecting (for example, at the multimedia communication device 160 ) an RGB camera 180 of the multimedia communication device 160 as a foreground camera which will be used by the multimedia communication device 100 to capture images of the participant 132 and to which the portion of the rendered foreground image 606 depicting the eyes of the participant 132 will be aligned.
- the video conferencing system 102 is configured to select the RGB camera 180 having a lateral position most closely corresponding to a detected lateral physical position of the participant 132 relative to the multimedia display device 100 .
- the video conferencing system 102 is configured to determine which of the RGB cameras 110 the participant 132 is most directly aligned with, and the video conferencing system 102 is configured to select the corresponding RGB camera 180 as the active camera (where RGB cameras 180 a , 180 b , 180 c , and 180 d respectively correspond to the RGB cameras 110 a , 110 b , 110 c , and 110 d ).
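- The camera-correspondence rule described above reduces to a nearest-position selection. A Python sketch with normalized lateral coordinates (the normalization and the names are illustrative assumptions):

```python
def select_foreground_camera(remote_subject_x, camera_positions):
    """Return the index of the local RGB camera whose lateral position
    most closely matches the remote participant's detected lateral
    position.

    remote_subject_x: lateral position of the remote participant,
    normalized to [0, 1] across their display.
    camera_positions: normalized lateral positions of this device's
    cameras, ordered to correspond with the remote device's cameras.
    """
    return min(range(len(camera_positions)),
               key=lambda i: abs(camera_positions[i] - remote_subject_x))
```

- The selected index identifies both the remote camera the participant is aligned with and, by the one-to-one correspondence between the devices' camera arrays, the local camera chosen as the foreground camera.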
- the multimedia communication devices 100 and 160 are also configured reciprocally.
- the video conferencing system 102 determines that the participant 155 is laterally aligned with the RGB camera 180 c . In response to this determination, the video conferencing system 102 selects the corresponding RGB camera 110 c as the foreground camera for the participant 132 . As a result, an RGB image captured by the RGB camera 110 c will be used for generating the rendered foreground image 606 , and the eyes of the participant 155 depicted in the rendered foreground image 616 are aligned with the position of the pixel display region 210 c for the RGB camera 110 c.
- the video conferencing system 102 may determine that the participant 132 is laterally aligned with the RGB camera 110 c . In response to this determination, the video conferencing system 102 selects the corresponding RGB camera 180 c as the foreground camera for the participant 155 . As a result, an RGB image captured by the RGB camera 180 c will be used for generating the rendered foreground image 616 , the eyes of the participant 132 depicted in the rendered foreground image 606 are aligned with the position of the pixel display region 190 c for the RGB camera 180 c , and the gaze direction 602 of the participant 155 is directed at the RGB camera 180 c.
- a gaze direction 612 of the participant 132 is directed at the RGB camera 110 c behind the displayed eyes of the participant 155 .
- a gaze direction 602 of the participant 155 is directed at the RGB camera 180 c behind the displayed eyes of the participant 132 .
- both of the multimedia communication devices 100 and 160 capture foreground images in which the participants 132 and 155 are looking directly at the foreground cameras, resulting in a gaze-correct video conferencing session in which the participants 132 and 155 feel that they are making eye contact with each other.
- the multimedia communication devices 100 and 160 each convey a face-to-face spatial arrangement to the participants 132 and 155 , which may be preferable at certain times during the session, such as an initial salutary portion in which the participants 132 and 155 greet each other.
- FIG. 6B illustrates a second scenario 620 occurring at about a second time after the first time shown in FIG. 6A and during the video conferencing session shown in FIG. 6A , including a scene 620 a at the second geographic location 150 and a scene 620 b at the first geographic location 120 .
- the video conferencing system 102 (for example, the multimedia communication device 160 ) has determined that the participant 155 has moved to a new physical position, which is still within an FOV 184 c of the RGB camera 180 c .
- the video conferencing system 102 determines that the participant 155 is at a lateral physical position relative to the multimedia communication device 160 that is more aligned with the RGB camera 180 b than the previous RGB camera 180 c . In response to this determination, the video conferencing system 102 selects the corresponding RGB camera 110 b as the foreground camera for the participant 132 , changing from the RGB camera 110 c selected in FIG. 6A .
- With the RGB camera 110 b as the foreground camera for the participant 132 in response to the new physical position of the participant 155 , images of the participant 155 are displayed in alignment with the RGB camera area 210 b for the RGB camera 110 b , as shown by the position of the rendered foreground image 636 in FIG. 6B .
- the gaze direction 632 of the participant 132 moves from the RGB camera area 210 c to the RGB camera area 210 b .
- An RGB image captured by the RGB camera 110 b will be used for generating the rendered foreground image 626 displayed to the participant 155 via the video conferencing session, and with the gaze direction 632 directed at the RGB camera 110 b , a gaze-correct video conferencing session is maintained.
- the rendered foreground image 626 continues to be aligned with the RGB camera area 190 c as in FIG. 6A , as the participant 132 has not moved significantly and the video conferencing system 102 continues to determine that the subject 132 is most aligned with the RGB camera 110 c (as in FIG. 1 ). Due to the new physical position of the participant 155 in FIG. 6B , the participant 155 has turned slightly to continue a gaze direction 622 directed at the RGB camera 180 c , and a gaze-correct video conferencing session is maintained.
- the multimedia communication devices 100 and 160 each convey an off-axis spatial arrangement to each of the participants 132 and 155 that is responsive to movements of the participant 132 and/or 155 , as further illustrated by FIGS. 6C and 6D below.
- FIG. 6C illustrates a third scenario 640 occurring at about a third time after the second time shown in FIG. 6B and during the video conferencing session shown in FIGS. 6A and 6B , including a scene 640 a at the second geographic location 150 and a scene 640 b at the first geographic location 120 .
- the video conferencing system 102 has determined that the participant 155 has moved to another new physical position, which is still within an FOV 184 c of the RGB camera 180 c . Based on the new physical position, the video conferencing system 102 determines that the participant 155 is at a lateral physical position relative to the multimedia communication device 160 that is more aligned with the RGB camera 180 a than the previous RGB camera 180 b . In response to this determination, the video conferencing system 102 selects the corresponding RGB camera 110 a as the foreground camera for the participant 132 , changing from the RGB camera 110 b selected in FIG. 6B .
- With the RGB camera 110 a as the foreground camera for the participant 132 in response to the new physical position of the participant 155 , images of the participant 155 are displayed in alignment with the RGB camera area 210 a for the RGB camera 110 a , as shown by the position of the rendered foreground image 656 in FIG. 6C .
- the gaze direction 652 of the participant 132 moves from the RGB camera area 210 b to the RGB camera area 210 a , and the participant 132 turns his body to facilitate the new gaze direction 652 .
- An RGB image captured by the RGB camera 110 a will be used for generating the rendered foreground image 646 displayed to the participant 155 via the video conferencing session, and with the gaze direction 652 directed at the RGB camera 110 a , a gaze-correct video conferencing session is maintained.
- the rendered foreground image 646 continues to be aligned with the RGB camera area 190 c as in FIG. 6B . Due to the new physical position of the participant 155 in FIG. 6C , the participant 155 has turned her head to continue a gaze direction 642 directed at the RGB camera 180 c , and a gaze-correct video conferencing session is maintained.
- the multimedia communication devices 100 and 160 each convey a more oblique off-axis spatial arrangement to each of the participants 132 and 155 than in FIG. 6B .
- FIG. 6D illustrates a fourth scenario 660 occurring at about a fourth time after the third time shown in FIG. 6C and during the video conferencing session shown in FIGS. 6A-6C , including a scene 660 a at the second geographic location 150 and a scene 660 b at the first geographic location 120 .
- the video conferencing system 102 (for example, the multimedia communication device 100 ) has determined that the participant 132 has moved to a new physical position, which is still within an FOV 304 a of the RGB camera 110 a .
- the video conferencing system 102 determines that the participant 132 is at a lateral physical position relative to the multimedia communication device 100 that is more aligned with the RGB camera 110 b than the previous RGB camera 110 c . In response to this determination, the video conferencing system 102 selects the corresponding RGB camera 180 b as the foreground camera for the participant 155 , changing from the RGB camera 180 c selected in FIG. 6A .
- With the RGB camera 180 b as the foreground camera for the participant 155 in response to the new physical position of the participant 132 , images of the participant 132 are displayed in alignment with the RGB camera area 190 b for the RGB camera 180 b , as shown by the position of the rendered foreground image 666 in FIG. 6D .
- the gaze direction 662 of the participant 155 moves from the RGB camera area 190 c to the RGB camera area 190 b .
- An RGB image captured by the RGB camera 180 b will be used for generating the rendered foreground image 676 displayed to the participant 132 via the video conferencing session, and with the gaze direction 662 directed at the RGB camera 180 b , a gaze-correct video conferencing session is maintained.
- the rendered foreground image 676 continues to be aligned with the RGB camera area 210 a as in FIG. 6C .
- With a gaze direction 672 continuing to be directed at the RGB camera 110 a , a gaze-correct video conferencing session is maintained.
- the multimedia communication devices 100 and 160 each convey a different off-axis spatial arrangement to each of the participants 132 and 155 than illustrated in FIG. 6C .
- the video conferencing system 102 via the multimedia communication devices 100 and 160 , enables spatial arrangements to be dynamically created, communicated, and controlled by video conferencing session participants.
- participants can assume a natural off-axis, diagonally opposite formation while retaining gaze awareness.
- a participant can look at another participant in the eyes when they want to, but is not forced to do so.
- the video conferencing system 102 conveys when another participant chooses to look away. This interaction and information is conveyed in a natural manner that conforms to established social conventions for in-person face-to-face interactions.
- When the techniques of FIGS. 5A-5D are combined with the techniques of FIGS. 6A-6D , spatial arrangements may be controlled and perceived in further detail, further enhancing interactions.
- FIGS. 7A-7C illustrate a technique used in some implementations, in which rendered foreground images make an animated transition from one RGB camera area to another when a new foreground camera is selected, in which over several successive video frames the rendered foreground images “glide” or otherwise approximate lateral human motion from the previous RGB camera area to the new RGB camera area.
- FIG. 7A illustrates a position of the rendered foreground image 646 in FIG. 6C at a point when the RGB camera 180 c has been selected as the foreground camera for the participant 155 . Accordingly, the eyes of the participant 132 in the rendered foreground image 646 are aligned with the RGB camera area 190 c .
- FIG. 7B illustrates an animated transition to a new RGB camera area 190 b in response to the scenario 660 shown in FIG. 6D .
- a first rendered foreground image 710 for the participant 132 is first displayed at an intermediate lateral position 720 between the RGB camera areas 190 c and 190 b , followed by a second rendered foreground image 712 for the participant 132 being displayed at an intermediate lateral position 722 between the intermediate lateral position 720 and the RGB camera area 190 b , which is followed by a third rendered foreground image 714 for the participant 132 being displayed at an intermediate lateral position 724 between the intermediate lateral position 722 and the RGB camera area 190 b .
- FIG. 7C illustrates the rendered foreground image 666 at its target position aligned with the RGB camera area 190 b , as shown in FIG. 6D .
- An advantage of performing the animated transition shown in FIGS. 7A-7C is that the gaze direction 662 of the participant 155 will track the animated position, resulting in a smoother transition in the gaze direction captured by the new foreground camera and displayed to the participant 132 . Additionally, such animated transitions in position are visually engaging for participants, further drawing participants' gazes to the rendered eye positions. In some implementations, more exaggerated motions may be implemented and selected to further enhance these effects.
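- The animated “glide” between camera areas can be sketched as eased interpolation over several successive video frames. The smoothstep easing below is an assumption chosen to approximate natural lateral human motion; the source does not specify an easing curve:

```python
def glide_positions(start_x, end_x, frames):
    """Generate per-frame lateral positions for a rendered foreground
    image gliding from the previous camera area to the new one.

    Uses smoothstep easing (slow-fast-slow) so the motion reads as
    natural rather than a linear slide or an instantaneous jump.
    """
    positions = []
    for f in range(1, frames + 1):
        t = f / frames
        eased = t * t * (3.0 - 2.0 * t)  # smoothstep on [0, 1]
        positions.append(start_x + (end_x - start_x) * eased)
    return positions
```

- Rendering one position per frame moves the image through intermediate lateral positions (such as 720, 722, and 724 above) until it reaches the new RGB camera area.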
- FIG. 8 illustrates techniques involving having multiple participants 132 and 134 concurrently participating in a video conferencing session via a single shared multimedia communication device 100 .
- FIG. 8 continues the video conferencing session shown in FIGS. 6A-6D , and illustrates a fifth scenario 800 including a scene 800 a at the second geographic location 150 and a scene 800 b at the first geographic location 120 .
- the previously seated participant 134 is now standing and in close proximity to the multimedia communication device 100 .
- the video conferencing system 102 (for example, the multimedia communication device 100 ) has identified the two participants 132 and 134 as two different and concurrent foreground subjects. Additionally, the participant 132 is at a different physical position than in FIG. 6D .
- the video conferencing system 102 determines that the participant 132 is at a lateral physical position relative to the multimedia communication device 100 that is most aligned with the RGB camera 110 d and that the participant 134 is at a lateral physical position relative to the multimedia communication device 100 that is most aligned with the RGB camera 110 b.
- the video conferencing system 102 selects the RGB camera area 190 d for the RGB camera 180 d corresponding to the RGB camera 110 d for alignment of the rendered foreground image 812 .
- the video conferencing system 102 selects the RGB camera area 190 b for the RGB camera 180 b corresponding to the RGB camera 110 b for alignment of the rendered foreground image 814 .
- the eyes of each of the participants 132 and 134 are displayed by the multimedia communication device 160 in front of respective RGB cameras 180 d and 180 b , enabling the multimedia communication device 160 to capture gaze-aligned RGB images of the participant 155 when the participant 155 looks at either of the participants 132 and 134 .
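Selecting the RGB camera area "most aligned" with a participant's lateral physical position, as described above, reduces to a nearest-neighbor search over the cameras' lateral coordinates. The following is a minimal sketch under that assumption; the coordinate convention and values are hypothetical, not from the patent.

```python
def select_camera_area(participant_x, camera_xs):
    """Return the index of the RGB camera whose lateral (x) position is
    closest to the participant's lateral physical position relative to
    the multimedia communication device."""
    return min(range(len(camera_xs)),
               key=lambda i: abs(camera_xs[i] - participant_x))
```

For example, with four cameras at hypothetical lateral positions `[0.2, 0.8, 1.4, 2.0]` (meters), a participant standing at `1.9` would be assigned the fourth camera, mirroring how participant 134 is matched to RGB camera 110 b in FIG. 8.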
- the video conferencing system 102 (for example, the multimedia communication device 160 ) is configured to dynamically select a foreground camera from one of the RGB cameras 180 associated with a displayed participant.
- the video conferencing system 102 is configured to determine a gaze direction for the participant 155 and select the RGB camera 180 most directly aligned with the gaze direction of the participant 155 .
- the participant 155 is currently looking at the participant 132 along the gaze direction 902 a , and as a result, the current foreground camera for the participant 155 is the RGB camera 180 d .
- the video conferencing system 102 may select the RGB camera 180 b as the foreground camera.
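Selecting the RGB camera "most directly aligned with the gaze direction," as described above, can be sketched as picking the camera whose viewing direction has the largest cosine similarity with the estimated gaze vector. This is an illustrative sketch; the patent does not specify this computation, and the 2D vector representation is an assumption.

```python
import math

def select_foreground_camera(gaze_dir, camera_dirs):
    """Pick the index of the camera whose direction (from the participant)
    is most directly aligned with the participant's gaze direction,
    i.e. has the largest cosine similarity between the two vectors."""
    def cos_sim(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)
    return max(range(len(camera_dirs)),
               key=lambda i: cos_sim(gaze_dir, camera_dirs[i]))
```

When the gaze direction shifts toward a different displayed participant, the maximizing index changes, which is when the system would switch foreground cameras as in the transition from RGB camera 180 d to 180 b.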
- the participant 155 is also at a different physical position than shown in FIG. 6D .
- the video conferencing system 102 determines that the participant 155 is at a lateral physical position relative to the multimedia communication device 160 that is most aligned with the RGB camera 180 c .
- the video conferencing system 102 selects the corresponding RGB camera 110 c as the foreground camera for the participant 132 .
- the video conferencing system 102 also selects the corresponding RGB camera 110 c as the foreground camera for the participant 134 .
- the RGB camera 110 c is effective for capturing gaze-aligned RGB images for both of the participants 132 and 134 for generating the rendered foreground images 912 and 914 .
- the multimedia communication devices 100 and 160 effectively establish a gaze-correct video conferencing session for all three participants 132 , 134 , and 155 , even where there is a greater number of participants than a number of multimedia communication devices.
- FIG. 9 illustrates an example of gaze-correct multi-party video conferencing among five participants each at a different geographic location. In some examples, similar techniques and advantages may be realized with three or more participants each at different locations.
- FIG. 9 illustrates a scenario 900 including five scenes 900 a , 900 b , 900 c , 900 d , and 900 e at respective different geographic locations 910 , 912 , 914 , 916 , and 918 with respective multimedia communication devices 930 , 932 , 934 , 936 , and 938 used by respective participants 920 , 922 , 924 , 926 , and 928 to participate in a single multi-party video conference session.
- Each of the multimedia communication devices 930 , 932 , 934 , 936 , and 938 may be configured as described for the multimedia communication devices 100 and 160 in FIGS. 1-8 .
- the multimedia communication devices 930 , 932 , 934 , 936 , and 938 are included in the video conferencing system 102 .
- the discussion will focus on the multimedia communication device 930 , as it is generally representative of the behavior of the other multimedia communication devices 932 , 934 , 936 , and 938 in this example.
- the video conferencing system 102 determines for the multimedia communication device 930 which RGB camera is aligned with each of the rendered foreground images of the other participants 922 , 924 , 926 , and 928 .
- each of the rendered foreground images has a narrower width than in the previous examples.
- the eyes of all of the participants 922 , 924 , 926 , and 928 are displayed over respective RGB camera areas. This, much as in FIG. 8 , enables the multimedia communication device 930 to capture gaze-aligned RGB images of the participant 920 when the participant 920 looks at any of the participants 922 , 924 , 926 , and 928 .
- the participant 924 is currently speaking, and accordingly may be referred to as the “active speaker” in the video conferencing session.
- the video conferencing system 102 (for example, the multimedia communication device 930 ) may automatically select the RGB camera associated with the active speaker as the foreground camera, although gaze detection may be used in some implementations, as discussed in FIG. 8 .
- the participant 924 is engaged in a discussion with the participant 920 , and as a result the gaze direction of the participant 924 is directed at the RGB camera corresponding to the participant 920 .
- the video conferencing system 102 may be configured to provide a visual indication of the active speaker, to assist participant identification of and focus on the active speaker.
- a graphical element 950 such as, but not limited to, an icon or outline may be included in a composite image 1042 to highlight the active speaker.
- the active speaker may be scaled differently than other participants and shown at a larger size than the other participants while still aligning the displayed eyes of the participants with respective RGB cameras.
- the multimedia communication devices 930 , 932 , 934 , 936 , and 938 effectively establish a gaze-correct multi-party video conferencing session even where there is a large number of participants using different multimedia communication devices.
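The multi-party layout above, where each remote participant's rendered foreground image is aligned with its own RGB camera area and the active speaker may be drawn at a larger size, can be sketched as follows. The participant identifiers, camera positions, and the 1.25x emphasis factor are illustrative placeholders, not values from the patent.

```python
def layout_participants(participant_ids, camera_xs, active_speaker=None):
    """Assign each remote participant's rendered foreground image to one
    RGB camera area so the displayed eyes sit in front of a camera, and
    optionally scale up the active speaker to draw attention.

    Returns (participant_id, lateral_position, scale) tuples, one per
    displayed participant.
    """
    layout = []
    for pid, cam_x in zip(participant_ids, camera_xs):
        # Hypothetical emphasis factor for the active speaker; the eyes
        # remain aligned with the camera regardless of scale.
        scale = 1.25 if pid == active_speaker else 1.0
        layout.append((pid, cam_x, scale))
    return layout
```

With four remote participants and four camera areas, this yields one narrower foreground image per camera, matching the FIG. 9 arrangement in which all displayed eyes sit over respective RGB camera areas.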
- FIG. 10 illustrates an example in which two multimedia communication devices 1020 and 1040 are tiled adjacent to each other to provide a larger multimedia communication device or system 1010 .
- Each of the multimedia communication devices or systems 1010 , 1020 , and 1040 may be configured as described for the multimedia communication devices 100 , 160 , 932 , 934 , 936 , and 938 in FIGS. 1-9 .
- First and second multimedia communication devices 1020 and 1040 are positioned in landscape orientations and horizontally adjacent to each other. In some implementations, the first and second multimedia communication devices 1020 and 1040 are at fixed positions, such as mounted on a wall or stand.
- the second multimedia communication device 1040 may be dynamically combined, including during an ongoing video conferencing session, with the first multimedia communication device 1020 to provide the larger multimedia communication device 1010 .
- the two multimedia communication devices 1020 and 1040 are communicatively coupled to operate together as a single larger multimedia communication device or system 1010 , which is configured to make use of the RGB cameras 1030 a , 1030 b , 1030 c , 1030 d , 1050 a , 1050 b , 1050 c , and 1050 d , and the depth cameras 1035 a , 1035 b , 1055 a , and 1055 b , arranged behind display devices 1025 and 1045 .
- multiple devices may be used, such as, but not limited to, multiple devices positioned in portrait orientations and horizontally adjacent to each other, and arrays of devices (for example, a 2×2 array). Such arrangements offer more cameras and a wider FOV. Additionally, multiprocessing may be performed among multiple multimedia communication devices.
- In some implementations, various features described in FIGS. 1-10 are implemented in respective modules, which may also be referred to as, and/or include, logic, components, units, and/or mechanisms. Modules may constitute either software modules (for example, code embodied on a machine-readable medium) or hardware modules.
- a hardware module may be implemented mechanically, electronically, or with any suitable combination thereof.
- a hardware module may include dedicated circuitry or logic that is configured to perform certain operations.
- a hardware module may include a special-purpose processor, such as a field-programmable gate array (FPGA) or an Application Specific Integrated Circuit (ASIC).
- a hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations, and may include a portion of machine-readable medium data and/or instructions for such configuration.
- a hardware module may include software encompassed within a programmable processor configured to execute a set of software instructions. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (for example, configured by software) may be driven by cost, time, support, and engineering considerations.
- hardware module should be understood to encompass a tangible entity capable of performing certain operations and may be configured or arranged in a certain physical manner, be that an entity that is physically constructed, permanently configured (for example, hardwired), and/or temporarily configured (for example, programmed) to operate in a certain manner or to perform certain operations described herein.
- “hardware-implemented module” refers to a hardware module. Considering examples in which hardware modules are temporarily configured (for example, programmed), each of the hardware modules need not be configured or instantiated at any one instance in time.
- a hardware module includes a programmable processor configured by software to become a special-purpose processor
- the programmable processor may be configured as respectively different special-purpose processors (for example, including different hardware modules) at different times.
- Software may accordingly configure a particular processor or processors, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
- a hardware module implemented using one or more processors may be referred to as being “processor implemented” or “computer implemented.”
- Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (for example, over appropriate circuits and buses) between or among two or more of the hardware modules. In implementations in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory devices to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output in a memory device, and another hardware module may then access the memory device to retrieve and process the stored output.
- At least some of the operations of a method may be performed by one or more processors or processor-implemented modules.
- the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS).
- at least some of the operations may be performed by, and/or among, multiple computers (as examples of machines including processors), with these operations being accessible via a network (for example, the Internet) and/or via one or more software interfaces (for example, an application program interface (API)).
- the performance of certain of the operations may be distributed among the processors, not only residing within a single machine, but deployed across a number of machines.
- Processors or processor-implemented modules may be located in a single geographic location (for example, within a home or office environment, or a server farm), or may be distributed across multiple geographic locations.
- FIG. 11 is a block diagram 1100 illustrating an example software architecture 1102 , various portions of which may be used in conjunction with various hardware architectures herein described, which may implement any of the above-described features.
- FIG. 11 is a non-limiting example of a software architecture and it will be appreciated that many other architectures may be implemented to facilitate the functionality described herein.
- the software architecture 1102 may execute on hardware such as a device 120 of FIG. 1A that includes, among other things, document storage 1170 , processors, memory, and input/output (I/O) components.
- a representative hardware layer 1104 is illustrated and can represent, for example, the device 120 of FIG. 1 .
- the representative hardware layer 1104 includes a processing unit 1106 and associated executable instructions 1108 .
- the executable instructions 1108 represent executable instructions of the software architecture 1102 , including implementation of the methods, modules and so forth described herein.
- the hardware layer 1104 also includes a memory/storage 1110 , which also includes the executable instructions 1108 and accompanying data.
- the hardware layer 1104 may also include other hardware modules 1112 .
- Instructions 1108 held by processing unit 1106 may be portions of instructions 1108 held by the memory/storage 1110 .
- the example software architecture 1102 may be conceptualized as layers, each providing various functionality.
- the software architecture 1102 may include layers and components such as an operating system (OS) 1114 , libraries 1116 , frameworks 1118 , applications 1120 , and a presentation layer 1124 .
- the applications 1120 and/or other components within the layers may invoke API calls 1124 to other layers and receive corresponding results 1126 .
- the layers illustrated are representative in nature and other software architectures may include additional or different layers. For example, some mobile or special purpose operating systems may not provide the frameworks/middleware 1118 .
- the OS 1114 may manage hardware resources and provide common services.
- the OS 1114 may include, for example, a kernel 1128 , services 1130 , and drivers 1132 .
- the kernel 1128 may act as an abstraction layer between the hardware layer 1104 and other software layers.
- the kernel 1128 may be responsible for memory management, processor management (for example, scheduling), component management, networking, security settings, and so on.
- the services 1130 may provide other common services for the other software layers.
- the drivers 1132 may be responsible for controlling or interfacing with the underlying hardware layer 1104 .
- the drivers 1132 may include display drivers, camera drivers, memory/storage drivers, peripheral device drivers (for example, via Universal Serial Bus (USB)), network and/or wireless communication drivers, audio drivers, and so forth depending on the hardware and/or software configuration.
- the libraries 1116 may provide a common infrastructure that may be used by the applications 1120 and/or other components and/or layers.
- the libraries 1116 typically provide functionality for use by other software modules to perform tasks, rather than interacting directly with the OS 1114 .
- the libraries 1116 may include system libraries 1134 (for example, C standard library) that may provide functions such as memory allocation, string manipulation, and file operations.
- the libraries 1116 may include API libraries 1136 such as media libraries (for example, supporting presentation and manipulation of image, sound, and/or video data formats), graphics libraries (for example, an OpenGL library for rendering 2D and 3D graphics on a display), database libraries (for example, SQLite or other relational database functions), and web libraries (for example, WebKit that may provide web browsing functionality).
- the libraries 1116 may also include a wide variety of other libraries 1138 to provide many functions for applications 1120 and other software modules.
- the frameworks 1118 provide a higher-level common infrastructure that may be used by the applications 1120 and/or other software modules.
- the frameworks 1118 may provide various graphic user interface (GUI) functions, high-level resource management, or high-level location services.
- the frameworks 1118 may provide a broad spectrum of other APIs for applications 1120 and/or other software modules.
- the applications 1120 include built-in applications 1120 and/or third-party applications 1122 .
- built-in applications 1120 may include, but are not limited to, a contacts application, a browser application, a location application, a media application, a messaging application, and/or a game application.
- Third-party applications 1122 may include any applications developed by an entity other than the vendor of the particular platform.
- the applications 1120 may use functions available via OS 1114 , libraries 1116 , frameworks 1118 , and presentation layer 1124 to create user interfaces to interact with users.
- the virtual machine 1128 provides an execution environment where applications/modules can execute as if they were executing on a hardware machine (such as the machine 1200 of FIG. 12 , for example).
- the virtual machine 1128 may be hosted by a host OS (for example, OS 1114 ) or hypervisor, and may have a virtual machine monitor 1126 which manages operation of the virtual machine 1128 and interoperation with the host operating system.
- a software architecture which may be different from software architecture 1102 outside of the virtual machine, executes within the virtual machine 1128 such as an OS 1150 , libraries 1152 , frameworks 1154 , applications 1156 , and/or a presentation layer 1158 .
- FIG. 12 is a block diagram illustrating components of an example machine 1200 configured to read instructions from a machine-readable medium (for example, a machine-readable storage medium) and perform any of the features described herein.
- the example machine 1200 is in the form of a computer system, within which instructions 1216 (for example, in the form of software components) for causing the machine 1200 to perform any of the features described herein may be executed.
- the instructions 1216 may be used to implement modules or components described herein.
- the instructions 1216 cause an unprogrammed and/or unconfigured machine 1200 to operate as a particular machine configured to carry out the described features.
- the machine 1200 may be configured to operate as a standalone device or may be coupled (for example, networked) to other machines.
- the machine 1200 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a node in a peer-to-peer or distributed network environment.
- Machine 1200 may be embodied as, for example, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a gaming and/or entertainment system, a smart phone, a mobile device, a wearable device (for example, a smart watch), and an Internet of Things (IoT) device.
- the machine 1200 may include processors 1210 , memory 1230 , and I/O components 1250 , which may be communicatively coupled via, for example, a bus 1202 .
- the bus 1202 may include multiple buses coupling various elements of machine 1200 via various bus technologies and protocols.
- the processors 1210 (including, for example, a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an ASIC, or a suitable combination thereof) may include one or more processors 1212 a to 1212 n that may execute the instructions 1216 and process data.
- one or more processors 1210 may execute instructions provided or identified by one or more other processors 1210 .
- the term "processor" includes a multi-core processor including cores that may execute instructions contemporaneously.
- although FIG. 12 shows multiple processors, the machine 1200 may include a single processor with a single core, a single processor with multiple cores (for example, a multi-core processor), multiple processors each with a single core, multiple processors each with multiple cores, or any combination thereof.
- the machine 1200 may include multiple processors distributed among multiple machines.
- the memory/storage 1230 may include a main memory 1232 , a static memory 1234 , or other memory, and a storage unit 1236 , each accessible to the processors 1210 such as via the bus 1202 .
- the storage unit 1236 and memory 1232 , 1234 store instructions 1216 embodying any one or more of the functions described herein.
- the memory/storage 1230 may also store temporary, intermediate, and/or long-term data for processors 1210 .
- the instructions 1216 may also reside, completely or partially, within the memory 1232 , 1234 , within the storage unit 1236 , within at least one of the processors 1210 (for example, within a command buffer or cache memory), within memory of at least one of the I/O components 1250 , or any suitable combination thereof, during execution thereof.
- the memory 1232 , 1234 , the storage unit 1236 , memory in processors 1210 , and memory in I/O components 1250 are examples of machine-readable media.
- machine-readable medium refers to a device able to temporarily or permanently store instructions and data that cause machine 1200 to operate in a specific fashion.
- the term “machine-readable medium,” as used herein, does not encompass transitory electrical or electromagnetic signals per se (such as on a carrier wave propagating through a medium); the term “machine-readable medium” may therefore be considered tangible and non-transitory.
- Non-limiting examples of a non-transitory, tangible machine-readable medium may include, but are not limited to, nonvolatile memory (such as flash memory or read-only memory (ROM)), volatile memory (such as a static random-access memory (RAM) or a dynamic RAM), buffer memory, cache memory, optical storage media, magnetic storage media and devices, network-accessible or cloud storage, other types of storage, and/or any suitable combination thereof.
- the term "machine-readable medium" applies to a single medium, or a combination of multiple media, used to store instructions (for example, instructions 1216 ) for execution by a machine 1200 such that the instructions, when executed by one or more processors 1210 of the machine 1200 , cause the machine 1200 to perform any one or more of the features described herein.
- the I/O components 1250 may include a wide variety of hardware components adapted to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on.
- the specific I/O components 1250 included in a particular machine will depend on the type and/or function of the machine. For example, mobile devices such as mobile phones may include a touch input device, whereas a headless server or IoT device may not include such a touch input device.
- the particular examples of I/O components illustrated in FIG. 12 are in no way limiting, and other types of components may be included in machine 1200 .
- the grouping of I/O components 1250 are merely for simplifying this discussion, and the grouping is in no way limiting.
- the I/O components 1250 may include user output components 1252 and user input components 1254 .
- User output components 1252 may include, for example, display components for displaying information (for example, a liquid crystal display (LCD) or a projector), acoustic components (for example, speakers), haptic components (for example, a vibratory motor or force-feedback device), and/or other signal generators.
- User input components 1254 may include, for example, alphanumeric input components (for example, a keyboard or a touch screen), pointing components (for example, a mouse device, a touchpad, or another pointing instrument), and/or tactile input components (for example, a physical button or a touch screen that provides location and/or force of touches or touch gestures) configured for receiving various user inputs, such as user commands and/or selections.
- the I/O components 1250 may include biometric components 1256 and/or position components 1262 , among a wide array of other environmental sensor components.
- the biometric components 1256 may include, for example, components to detect body expressions (for example, facial expressions, vocal expressions, hand or body gestures, or eye tracking), measure biosignals (for example, heart rate or brain waves), and identify a person (for example, via voice-, retina-, and/or facial-based identification).
- the position components 1262 may include, for example, location sensors (for example, a Global Positioning System (GPS) receiver), altitude sensors (for example, an air pressure sensor from which altitude may be derived), and/or orientation sensors (for example, magnetometers).
- the I/O components 1250 may include communication components 1264 , implementing a wide variety of technologies operable to couple the machine 1200 to network(s) 1270 and/or device(s) 1280 via respective communicative couplings 1272 and 1282 .
- the communication components 1264 may include one or more network interface components or other suitable devices to interface with the network(s) 1270 .
- the communication components 1264 may include, for example, components adapted to provide wired communication, wireless communication, cellular communication, Near Field Communication (NFC), Bluetooth communication, Wi-Fi, and/or communication via other modalities.
- the device(s) 1280 may include other machines or various peripheral devices (for example, coupled via USB).
- the communication components 1264 may detect identifiers or include components adapted to detect identifiers.
- the communication components 1264 may include Radio Frequency Identification (RFID) tag readers, NFC detectors, optical sensors (for example, one- or multi-dimensional bar codes, or other optical codes), and/or acoustic detectors (for example, microphones to identify tagged audio signals).
- location information may be determined based on information from the communication components 1264 , such as, but not limited to, geo-location via Internet Protocol (IP) address, location via Wi-Fi, cellular, NFC, Bluetooth, or other wireless station identification and/or signal triangulation.
Description
- Video conferencing technologies have become increasingly commonplace. As globalization continues to spread throughout the world economy, it is increasingly common to find projects where team members are widely distributed across continents. Video conferencing has long been considered a critical technology to reduce high travel expenses for distributed workforces.
- During a teleconference or other video conferencing session, individuals may "interact" and engage in face-to-face conversations through images and sound captured by digital cameras and transmitted to participants. There is a growing reliance on such network-based video conferencing and video chat applications and services, such as Skype®, Google Chat®, and iChat®. Nevertheless, even with high-end teleconferencing solutions, a face-to-face meeting is usually still a better experience than a remote meeting.
- In some cases, there may be video conferences where participants wish to move through their environment or otherwise change their physical position. A video conference session in which there is real-time variability in the physical position of participant(s) relative to a camera or to one another may preclude the capture of a consistent or reliable view of the participant(s) for the remote users. One of the factors that is known to be essential for face-to-face communication is eye contact. Eye contact can instill trust and foster an environment of collaboration and partnership. Lack of eye contact, on the other hand, may generate feelings of distrust and discomfort. Unfortunately, eye contact is usually not preserved in typical video conferencing. Although various techniques have been employed for improving the quality of video conferencing, there remain significant areas for new and improved ideas for capturing and presenting video in video conferencing sessions.
- A video conferencing system, in accord with a first aspect of this disclosure, includes a first device including a first display device and a first camera, one or more processors, and one or more computer readable media including instructions which, when executed by the one or more processors, cause the one or more processors to obtain a first RGB image captured, at a first time during a video conferencing session, by the first camera, wherein the first camera is positioned to capture the first RGB image through a first pixel display region of the first display device. The instructions also cause the one or more processors to receive at the first device, via the video conferencing session, a first video stream providing a first series of live images of a first human participant of the video conferencing session, wherein the first series of live images includes a first image portion depicting the eyes of the first human participant. In addition, the instructions cause the one or more processors to display, at about the first time, a first composite image on the first display device, wherein a first pixel position of the first composite image is displayed by the first pixel display region, the first pixel position having a first lateral pixel position in the first composite image. Furthermore, the instructions cause the one or more processors to, before the display of the first composite image, composite the first image portion at about the first lateral pixel position in the first composite image, segment a first foreground image, corresponding to a second human participant of the video conferencing session, from the first RGB image, cause, via the video conferencing session, a second composite image to be displayed by a second device at a different geographic location than the first device, wherein the second composite image includes the first foreground image composited with a first background image.
- A method for video conferencing, in accord with a second aspect of this disclosure, includes obtaining a first RGB image captured, at a first time during a video conferencing session, by a first camera included in a first device, wherein the first camera is positioned to capture the first RGB image through a first pixel display region of a first display device included in the first device. The method also includes receiving at the first device, via the video conferencing session, a first video stream providing a first series of live images of a first human participant of the video conferencing session, wherein the first series of live images includes a first image portion depicting the eyes of the first human participant. In addition, the method includes displaying, at about the first time, a first composite image on the first display device, wherein a first pixel position of the first composite image is displayed by the first pixel display region, the first pixel position having a first lateral pixel position in the first composite image. The method further includes, before the display of the first composite image, compositing the first image portion at about the first lateral pixel position in the first composite image. In addition, the method involves segmenting a first foreground image, corresponding to a second human participant of the video conferencing session, from the first RGB image, and causing, via the video conferencing session, a second composite image to be displayed by a second device at a different geographic location than the first device, wherein the second composite image includes the first foreground image composited with a first background image.
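The compositing step recited in the method above — placing the image portion depicting a participant's eyes at about the lateral pixel position of a camera's pixel display region — can be sketched in a few lines of array code. This is only an illustrative sketch, not the claimed implementation: the array shapes, the RGBA alpha convention, and the assumption that an eye position within the foreground image is already known are all hypothetical.

```python
import numpy as np

def composite_at_camera(background, foreground_rgba, eye_xy, camera_xy):
    """Alpha-composite foreground_rgba over background so that the point
    eye_xy (x, y) in the foreground lands at camera_xy (x, y) in the
    composite image - i.e., at the pixel display region covering the
    remote device's camera. Foreground pixels falling outside the
    background bounds are cropped."""
    out = background.copy()
    fh, fw = foreground_rgba.shape[:2]
    bh, bw = background.shape[:2]
    # Top-left corner of the paste rectangle.
    x0 = camera_xy[0] - eye_xy[0]
    y0 = camera_xy[1] - eye_xy[1]
    # Clip the paste rectangle to the background bounds.
    bx0, by0 = max(x0, 0), max(y0, 0)
    bx1, by1 = min(x0 + fw, bw), min(y0 + fh, bh)
    fx0, fy0 = bx0 - x0, by0 - y0
    fg = foreground_rgba[fy0:fy0 + (by1 - by0), fx0:fx0 + (bx1 - bx0)]
    alpha = fg[..., 3:4].astype(np.float32) / 255.0
    region = out[by0:by1, bx0:bx1].astype(np.float32)
    out[by0:by1, bx0:bx1] = (alpha * fg[..., :3] + (1 - alpha) * region).astype(np.uint8)
    return out

# Hypothetical example: a black 200x100 background, a 20x40 opaque red
# "subject" whose eyes sit at (10, 5), and a camera behind pixel (150, 30).
bg = np.zeros((100, 200, 3), dtype=np.uint8)
fg = np.zeros((40, 20, 4), dtype=np.uint8)
fg[..., 0] = 255   # red subject
fg[..., 3] = 255   # fully opaque
result = composite_at_camera(bg, fg, eye_xy=(10, 5), camera_xy=(150, 30))
```

After the call, the foreground's eye point covers pixel (150, 30) of the composite, so a viewer looking at the displayed eyes is looking into the camera behind that pixel region.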
- This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
- The drawing figures depict one or more implementations in accord with the present teachings, by way of example only, not by way of limitation. In the figures, like reference numerals refer to the same or similar elements. Furthermore, it should be understood that the drawings are not necessarily to scale.
-
FIG. 1 illustrates an example of a video conferencing system that includes a first multimedia communication device being used to access and participate in a video conferencing session. -
FIG. 2 illustrates an exploded view of the first multimedia communication device illustrated in FIG. 1. -
FIG. 3A illustrates an example of capturing and displaying human foreground subject images. FIG. 3B illustrates an example of segmentation of a foreground image from an RGB image captured by the multimedia communication device for the scene shown in FIG. 3A. FIG. 3C shows details of the foreground image obtained in FIG. 3B for the scene shown in FIG. 3A. FIG. 3D shows positions in a composite image corresponding to each of the RGB camera pixel display regions of a remote multimedia communication device that will display the composite image, such as the remote multimedia communication device in FIG. 1. FIG. 3E illustrates a portion of the composite image generated for the scene shown in FIG. 3A using the foreground image shown in FIG. 3C. FIG. 3F illustrates an example scene in which the foreground subject has moved laterally from the physical position in FIG. 3A and a resulting composite image for the scene in FIG. 3F. FIG. 3G illustrates an example scene in which the foreground subject has moved laterally from the physical position in FIG. 3F and a resulting composite image for the scene in FIG. 3G. -
FIG. 4 illustrates use of image distortion correction applied in some implementations to reduce distortions occurring in various portions of the fields of view of the RGB cameras. -
FIGS. 5A-5D illustrate techniques which may be applied by the video conferencing system in response to changes in distance between multimedia communication devices and respective foreground subjects. FIG. 5A illustrates a first scenario occurring at about a first time and a resulting composite image. FIG. 5B illustrates aspects of scaling of a foreground image by the video conferencing system for the composite image in FIG. 5A based on at least a distance between a multimedia communication device and a participant. -
FIG. 5C illustrates a second scenario occurring at about a second time after the first time in FIG. 5A, in which a participant has moved closer to a multimedia communication device, and a resulting composite image. FIG. 5D illustrates aspects of scaling of a foreground image by the video conferencing system for the second scenario shown in FIG. 5C. -
FIGS. 5E and 5F illustrate additional techniques which may be applied by the video conferencing system in response to changes in distance between the first multimedia communication device and a foreground subject. FIG. 5E illustrates an example scene in which the foreground subject has moved from the physical position shown in FIG. 3F to a new physical position closer to the multimedia communication device, and the resulting composite image. FIG. 5F illustrates an example scene in which the foreground subject has moved from the physical position shown in FIG. 5E to a new physical position further away from the multimedia communication device, and the resulting composite image. -
FIGS. 6A-6D illustrate techniques for selecting and changing RGB cameras that further support providing gaze-correct video conferencing sessions among and between various participants at various geographic locations during a single video conferencing session. FIG. 6A illustrates a first scenario occurring at a first time, including a scene at the first geographic location shown in FIG. 1 and a scene at the second geographic location shown in FIG. 1. FIG. 6B illustrates a second scenario occurring at a second time after the first time shown in FIG. 6A and during the video conferencing session shown in FIG. 6A. -
FIG. 6C illustrates a third scenario occurring at a third time after the second time shown in FIG. 6B and during the video conferencing session shown in FIGS. 6A and 6B. FIG. 6D illustrates a fourth scenario occurring at a fourth time after the third time shown in FIG. 6C and during the video conferencing session shown in FIGS. 6A-6C. -
FIGS. 7A-7C illustrate a technique used in some implementations, in which rendered foreground images make an animated transition from one RGB camera area to another when a new foreground camera is selected: over several successive video frames, the rendered foreground images “glide” or otherwise approximate lateral human motion from the previous RGB camera area to the new RGB camera area. -
FIG. 8 illustrates techniques for having multiple participants concurrently participate in a video conferencing session via a single shared multimedia communication device. -
FIG. 9 illustrates an example of gaze-correct multi-party video conferencing among five participants each at a different geographic location. -
FIG. 10 illustrates an example in which two multimedia communication devices are tiled adjacent to each other to provide a larger multimedia communication device or system. -
FIG. 11 is a block diagram illustrating an example software architecture, various portions of which may be used in conjunction with various hardware architectures herein described. -
FIG. 12 is a block diagram illustrating components of an example machine configured to read instructions from a machine-readable medium and perform any of the features described herein. - In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent that the present teachings may be practiced without such details. In other instances, well known methods, procedures, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings. In the following material, indications of direction, such as “top” or “left,” are merely to provide a frame of reference during the following discussion, and are not intended to indicate a required, desired, or intended orientation of the described articles unless expressly indicated.
- The following implementations introduce video conferencing systems and processes for facilitating eye contact between participants of a video conferencing session. These systems are configured to improve gaze alignment between live participants and projected images of their remote counterparts. This is achieved by generating composite images that position a participant’s face and eyes at or near a camera location. In addition, segmentation allows foreground images of participants to be composited with separately captured background images. These systems present images of the participant(s) such that the projected person appears to be looking directly at the viewer, while the viewer, in looking back, is looking directly at a camera. As a result, the participants can have a gaze-correct multi-party video conferencing session.
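As a rough illustration of the segmentation idea — not the disclosed implementation, which may additionally use depth-image discontinuities and other data — a foreground mask might be derived by thresholding a depth map registered to the RGB image. The array shapes, the 1.5-meter threshold, and the assumption of pixel-aligned depth and RGB images are all assumptions for this sketch:

```python
import numpy as np

def segment_foreground(rgb, depth, max_fg_depth_m=1.5):
    """Segment a foreground subject from an RGB image using a depth map.

    Assumes rgb (H, W, 3) and depth (H, W) are registered to the same
    viewpoint, with depth values in meters and 0 marking pixels that
    have no depth estimate. Returns an RGBA image whose alpha channel
    is 255 for pixels within max_fg_depth_m and 0 elsewhere, suitable
    for later compositing over a background image."""
    mask = (depth > 0) & (depth <= max_fg_depth_m)
    rgba = np.zeros(rgb.shape[:2] + (4,), dtype=np.uint8)
    rgba[..., :3] = rgb
    rgba[..., 3] = np.where(mask, 255, 0).astype(np.uint8)
    return rgba

# Synthetic inputs: a uniform gray frame, background at 3 m, and a
# rectangular "subject" patch at 1 m.
rgb = np.full((120, 160, 3), 200, dtype=np.uint8)
depth = np.full((120, 160), 3.0)
depth[40:100, 60:110] = 1.0   # subject region
fg = segment_foreground(rgb, depth)
```

In practice a depth camera and an RGB camera have different viewpoints, so the depth map would first need to be reprojected into the RGB camera's frame before a mask like this could be applied.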
- For purposes of this application, the terms “eye contact,” “gaze alignment,” and “direct gaze” refer to situations in which two individuals are looking directly into each other’s eyes, in which an image of a live person’s eyes appears to be directed toward a person viewing the image, and/or in which a live person’s eyes are directed toward the eyes of a projected image of a person. As noted above, eye gaze carries important information about another person’s focus of attention, emotional and mental states, and intentions, and signals a person’s potential interest in social interaction. Through eye contact, two persons share emotions and can more readily develop a connection. The perception of a direct gaze can trigger self-referential processing that leads, for example, to enhanced processing of incoming information, enhancement of self-awareness, and increased prosocial behavior. The eye region is a key region of the face that individuals tend to pay attention to during conversations, as shown in multiple studies using eye tracking technology. In addition, a direct gaze can hold an audience’s attention more effectively than other gaze directions. Thus, it is increasingly important to provide video conference participants with reliable systems and processes by which they may maintain consistent eye contact during virtual meetings.
-
FIG. 1 illustrates an example of a video conferencing system 102 that includes a first multimedia communication device 100 (which may be referred to as a “teleconferencing device,” “telepresence device,” “video conferencing device,” or “participant device”) being used to access and participate in a video conferencing session (which may be referred to as a “telepresence session”). The video conferencing system 102 further includes a second multimedia communication device 160 at a different second geographic location 150. For convenience of discussion, the second multimedia communication device 160 is configured with essentially the same features and to operate substantially the same as the first multimedia communication device 100; however, the multimedia communication devices need not be identical. The video conferencing system 102 may include additional such multimedia communication devices, which may be used to access and participate in the video conferencing session shown in FIG. 1 and/or other video conferencing sessions. In some examples, the video conferencing system 102 may include and/or make use of additional network-connected computing devices and systems, with the video conferencing system 102 being configured to use such additional computing devices and systems for establishing video conferencing sessions, maintaining video conferencing sessions, image segmentation, and/or image compositing. - In
FIG. 1, the first multimedia communication device 100 is arranged and operating at a first geographic location 120 as an endpoint in a video conferencing session. A video conferencing session may also be referred to as a “video conference.” During the video conferencing session, the first multimedia communication device 100 is operating to provide a video stream providing a series of live images depicting one or more participants (which may be referred to as “subjects” or “users”) at the first geographic location 120 to the second multimedia communication device 160 for viewing by a remote participant 155. Further, the first multimedia communication device 100 is operating to receive a video stream from the second multimedia communication device 160 providing a series of live images depicting the remote participant 155. In the example illustrated in FIG. 1, the first multimedia communication device 100 may be referred to as a “local” device, and the second multimedia communication device 160 may be referred to as a “remote” device. - In the examples illustrated in
FIGS. 1-3, 5-7, and 9, the multimedia communication device 100 is embodied as an interactive device that includes a display device 105 for presenting images, although it is noted that the multimedia communication device 100 is not limited to such embodiments. For example, in some implementations, the multimedia communication device 100 may present images via, but not include, a display device. In FIG. 1, the display device 105 is positioned to present images to participants at the first geographic location 120. In some examples, the multimedia communication device 100 may be configured to display images and/or video streams from one or more remote devices or systems participating in a video conferencing session with the multimedia communication device 100, such as from the multimedia communication device 160. For example, the multimedia communication device 100 may be mounted on a wall, as illustrated in FIG. 1, or on a stand (which may be movable). In some examples, the display device 105 is also configured to operate as a touch screen to receive user input. In this example, the first geographic location 120 is a conference room with seated participants around a desk 125 and a standing participant 132 in closer proximity to the multimedia communication device 100. The example illustrated in FIG. 1 is not intended to limit applications or environments in which the multimedia communication device 100 may be used. Also, in order to more compactly illustrate features of the first geographic location 120, the desk 125 is shown closer in FIG. 1 than in FIG. 3 below. - At the time illustrated in
FIG. 1, the four participants at the first geographic location 120 are participating in a video conferencing session via the multimedia communication device 100. The term “video conferencing” applies to electronic communications in which a video stream including images captured by a first participant device is received and displayed by at least a second participant device, and may include, but does not require, the first participant device displaying a video stream provided by the second participant device. The illustrated video conferencing session includes the remote participant 155 at the second geographic location 150, who is participating via the multimedia communication device 160 (which may also be referred to as a “remote participant device”) configured to serve as an endpoint in the video conferencing session. The multimedia communication device 160 receives the video stream via one or more data communication networks (not illustrated in FIG. 1). It is noted that use of the multimedia communication device 100 is not necessarily limited to video conferencing activities. For example, the multimedia communication device 100 may provide a virtual whiteboard or run arbitrary computer program applications, and display information and/or user interfaces for such other activities on the display device 105. Such other activities may be performed during a video conferencing session and result in additional data being exchanged among devices participating in a video conferencing session. - The
multimedia communication device 100 includes a plurality of RGB (red-green-blue) imaging cameras 110 a, 110 b, 110 c, and 110 d (collectively, the RGB cameras 110). Although the example illustrated in FIG. 1 includes four RGB cameras 110, in other implementations there may be two or more RGB cameras 110. Each of the RGB cameras 110 is positioned behind the display device 105 to capture images from light received through the display device 105, and accordingly the RGB cameras 110 are not directly visible in FIG. 1. By positioning the RGB cameras 110 behind the display device 105, images can be displayed on the display device 105 over the physical positions of the RGB cameras 110. By placing the RGB cameras 110 behind the display device 105, subject gazes may be directed at the RGB cameras 110, enabling gaze-correct multi-party video conferencing as discussed in more detail herein. Additionally, by placing the RGB cameras 110 behind the display device 105, greater numbers of RGB cameras 110 may be more easily included, the RGB cameras 110 may be arranged to capture images from more natural angles (for example, for near and/or far features), and an additional non-display user-facing surface (such as a bezel) is not necessary to accommodate the RGB cameras 110. - In some implementations, as illustrated by the examples in
FIGS. 1-3 and 5-10, the RGB cameras 110 are positioned such that, when the multimedia communication device 100 is operated, a leftmost RGB camera 110 (in FIG. 1, the RGB camera 110 a) and a rightmost RGB camera 110 (in FIG. 1, the RGB camera 110 d) span a horizontal distance that is at least large enough, in most conditions, to obtain a view around a human subject located close to and within a field of view (FOV) of one or more of the RGB cameras 110. For example, in FIG. 1, an image of the standing participant 132 is included in an image 140 b captured by the RGB camera 110 b, whereas the standing participant 132 is not visible in an image 140 d captured by the RGB camera 110 d at approximately the same time. In some examples, the RGB camera 110 a may be positioned at a height less than or about equal to a height of the RGB camera 110 d. Various other arrangements and numbers of RGB cameras 110 are also effective, such as, but not limited to, an array, along multiple parallel lines, or along perpendicular lines (for example, to increase a horizontal span when operated in portrait orientation perpendicular to the landscape orientation illustrated in FIG. 1). In some implementations, the RGB cameras 110 are configured and operated to periodically capture images at a frame rate suitable for video conferencing. The multimedia communication device 160 similarly includes RGB cameras 180 (including the RGB camera 180 a). - In some implementations, the
multimedia communication device 100 includes one or more depth cameras 115, such as the two depth cameras 115 a and 115 b, positioned behind the display device 105 to capture light for depth estimation through the display device 105 (the depth cameras 115 accordingly are not directly visible in FIG. 1). By placing the depth cameras 115 behind the display device 105, greater numbers of depth cameras 115 may be more easily included, and an additional non-display user-facing surface is not necessary for the depth cameras 115. A depth estimate may also be referred to as an “estimated depth,” “distance estimate,” or “estimated distance.” In some implementations, the depth cameras 115 produce depth maps (also referred to as “depth images”) that include depth estimates for multiple physical positions within the FOV of the depth cameras 115. Depth estimates obtained using the depth cameras 115 may be used by the video conferencing system 102 (for example, at the multimedia communication device 100) to, among other things, determine when a subject has come into proximity to the multimedia communication device 100, estimate a distance between the multimedia communication device 100 and a subject, estimate a physical position of a subject relative to one or more of the RGB cameras 110, and/or identify discontinuities in a depth image and related depth image data used to aid image segmentation for a foreground subject in an image captured by one of the RGB cameras 110. - As will be described in more detail below, the video conferencing system 102 (for example, the multimedia communication device 100) is configured to select one or more foreground cameras from the multiple RGB cameras 110 for capturing one or more images of one or more identified foreground subjects (for example, a human subject). The term “foreground” may be abbreviated as “FG” in portions of this disclosure. For the discussion of
FIG. 1, the standing participant 132 may also be referred to as “foreground subject 132.” In the example shown in FIG. 1, the RGB camera 110 b has been selected as a foreground camera, and has captured an RGB image 140 b in which the foreground subject 132 can be seen. Image segmentation is performed to identify a foreground image portion of the RGB image 140 b corresponding to the foreground subject 132, which is used to generate a foreground image 142 of the foreground subject 132. - In some implementations, the video conferencing system 102 (for example, the multimedia communication device 100) is configured to select a background camera from the multiple RGB cameras 110 for capturing one or more images of at least a portion of a background area behind the
foreground subject 132. The term “background” may be abbreviated as “BG” in portions of this disclosure. In the example shown in FIG. 1, the RGB camera 110 d has been selected as a background camera, and a background image 140 d has been obtained from the selected RGB camera 110 d. In this particular example, the background image 140 d includes images of the desk 125 and the seated participants, but not the foreground subject 132. Various techniques and details for dynamically selecting RGB cameras to capture foreground subject images and/or background images, segmenting foreground images, and producing composite images from the foreground images are described in U.S. patent application Ser. No. 15/835,413 (filed on Dec. 7, 2017 and entitled “Video Capture Systems and Methods”), which is incorporated by reference herein in its entirety. - In the example shown in
FIG. 1, the foreground image 142 has been scaled and composited with the background image 140 d to produce a composite image 145. The scaled foreground image 142 has been positioned in the composite image 145 so that when the composite image 145 is displayed by the multimedia communication device 160, an image portion depicting the eyes of the foreground subject 132 is shown at about the position of the RGB camera 180 a. As a result, while the participant 155 views the composite image 145 on the multimedia communication device 160 (and other such images), in RGB images captured by the RGB camera 180 a the participant 155 is looking directly at the RGB camera 180 a. When such RGB images are used to generate images of the participant 155 on the multimedia communication device 100, it appears to at least some of the participants at the first geographic location 120 that they are in direct eye contact with the participant 155. In the example of FIG. 1, an image portion depicting the eyes of the participant 155 is shown at about the position of the RGB camera 110 b used as a foreground camera for the foreground subject 132. As a result, while the foreground subject 132 views such images of the participant 155 on the multimedia communication device 100, in RGB images captured by the RGB camera 110 b the foreground subject 132 is looking directly at the RGB camera 110 b. - With the use of such RGB images, the
participant 155 views images of the participant 132 in which the participant 132 is in eye contact with the participant 155, and the participant 132 views images of the participant 155 in which the participant 155 is in eye contact with the participant 132. As a result, the participants 132 and 155 experience mutual, gaze-correct eye contact during the video conferencing session. Additionally, because the participants are actually looking directly at the respective RGB cameras 110 and 180, there is no need to modify the portions of the RGB images depicting the eyes to achieve gaze alignment, thereby avoiding application of gaze correction techniques that generally result in unnatural images. - For delivery to remote devices such as the
multimedia communication device 160, the composite image 145 and/or the foreground image 142 is digitally encoded by the video conferencing system 102 to produce an encoded image (such as, but not limited to, a frame of an encoded video stream). The encoded image is then provided to the remote multimedia communication device 160, thereby causing the composite image 145 to be displayed, at least in part, by the remote multimedia communication device 160, such as via a video conferencing application program executed by the remote multimedia communication device 160. Similar processing may be performed to generate a sequence of multiple such images, based on images captured by the RGB cameras 110, used for a sequence of frames that are encoded in one or more video streams transmitted to participants of the video conferencing session. Although in FIG. 1 the image 170 is illustrated as occupying an entire display surface of the remote device 160, the image 170 may be displayed in a subportion of the display surface; for example, the image 170 may be displayed in a window or a video display region of a user interface. The multimedia communication device 100 and/or the multimedia communication device 160 may display images received from one or more remote devices in a similar manner. -
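The camera-selection behavior described above for FIG. 1 — a foreground camera near the subject (the RGB camera 110 b) and a background camera that can see around the subject (the RGB camera 110 d) — might be approximated by choosing the cameras laterally nearest and farthest from the subject's estimated position. This is a simplified sketch only; the camera spacing and subject position below are made-up values, and the actual selection techniques are described in the incorporated application Ser. No. 15/835,413:

```python
def select_cameras(camera_x_positions, subject_x):
    """Pick a foreground camera near the subject's lateral position and a
    background camera far from it, so the background camera is more
    likely to have an unobstructed view around the subject.

    camera_x_positions: lateral positions (e.g., in centimeters) of each
    RGB camera along the display; subject_x: estimated lateral position
    of the foreground subject (e.g., from a depth camera).
    Returns (foreground_index, background_index)."""
    fg = min(range(len(camera_x_positions)),
             key=lambda i: abs(camera_x_positions[i] - subject_x))
    bg = max(range(len(camera_x_positions)),
             key=lambda i: abs(camera_x_positions[i] - subject_x))
    return fg, bg

# Four cameras spaced 50 cm apart; subject standing near the second camera,
# mirroring the FIG. 1 scenario (110 b foreground, 110 d background).
fg_cam, bg_cam = select_cameras([0, 50, 100, 150], subject_x=55)
```

A production system would also weigh occlusion, FOV coverage, and hysteresis (to avoid rapid camera switching), but the nearest/farthest heuristic captures the basic geometry.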
FIG. 2 illustrates an exploded view of the first multimedia communication device 100 illustrated in FIG. 1. For purposes of clarity and discussion, FIG. 2 is presented with reference to a Z axis 202, a Y axis 204, and an X axis 206. With respect to the Z axis 202, a positive direction (illustrated with “+”) may be referred to as a “forward” direction, and a negative direction (illustrated with “−”) may be referred to as a “backward” direction. The display device 105 is arranged perpendicular to the Z axis 202 and configured to emit light in the forward direction through a front (and user-viewable) surface 205 of the display device 105 (which also, in this example, is a front surface 205 of the first multimedia communication device 100) in response to signals received from a controller 250 included in the first multimedia communication device 100. In some examples, a horizontally arranged axis of the first multimedia communication device 100 may be referred to as a lateral axis or direction, and a vertically arranged axis of the first multimedia communication device 100 may be referred to as a longitudinal axis or direction (which may define an “upward” direction and a “downward” direction). For example, in the landscape orientation shown in FIG. 1, the X axis 206 may be referred to as a lateral axis and the Y axis 204 may be referred to as a longitudinal axis. In another example, where the first multimedia communication device 100 is rotated about the Z axis 202 by about 90 degrees, the X axis 206 may be referred to as a longitudinal axis and the Y axis 204 may be referred to as a lateral axis. - The
display device 105 may be implemented with technologies such as liquid-crystal displays (LCDs), organic light-emitting diode type displays (OLEDs), quantum dot-based displays, or various other light-emitting displays that permit the RGB cameras 110 to capture suitable images through the display device 105. Light received by the RGB cameras 110 from a scene 240 in front of the display device 105 passes through respective pixel display regions 210 of the display device 105, and light received by the depth cameras 115 from the scene 240 passes through respective depth camera pixel display regions 215. Other light-sensitive components (not illustrated in FIG. 2) may also be positioned behind the display device 105. For example, one or more of the depth cameras 115 may include an integrated infrared (IR) illumination source. In some examples, the display device 105 includes multiple display panels. - Various configurations may be used to allow the RGB cameras 110 to capture images through the display device 105. In some implementations, the display device 105 is a forward-emitting display device, such as an OLED-based forward-emitting display device, arranged such that a small portion or substantially none of the light emitted by the display device 105 is emitted through a rear surface of the display device 105. For example, some OLED-based forward-emitting display devices have about a 5% backward emission of display light. In some implementations, image correction is performed to correct for backward-emitted light; for example, image contents for an RGB camera pixel display region 210 may be used to estimate and subtract or otherwise correct the effect of backward-emitted light captured by an RGB camera 110. With a forward-emitting display device 105, the RGB cameras 110 and/or the depth cameras 115 may capture images at any time, independent of synchronization with operation of the display device 105. - In some implementations, image capture operations performed by the RGB cameras 110 are synchronized with at least operation of their respective pixel display regions 210. For example, image capture periods for an RGB camera 110 may be performed when its respective pixel display region 210 is not emitting light, such as, but not limited to, in synchronization with display refresh periods or by displaying a dimmed image (including, for example, a black image) in the pixel display region 210 during image capture operations. Additional approaches are described in U.S. Patent Application Publication Number 2015/0341593 (published on Nov. 26, 2015 and entitled “Imaging Through a Display device”), which is incorporated by reference herein in its entirety. In some implementations, depth image capture operations performed by the depth cameras 115 are similarly synchronized with at least operation of their respective depth camera pixel display regions 215. In the example of the first
multimedia communication device 100 in FIGS. 1 and 2, each of the RGB cameras 110 is positioned at about a same first distance upward (and away) from a lateral midline 206 of the display device 105. However, in other implementations, the physical positions of the RGB cameras 110 relative to one another and/or the lateral midline 206 can vary. - The first
multimedia communication device 100 also includes the controller 250. The controller 250 includes a logic subsystem, a data holding subsystem, a display controller, and a communications subsystem, and is communicatively coupled to the display device 105, the RGB cameras 110, and the depth cameras 115. The logic subsystem may include, for example, one or more processors configured to execute instructions and communicate with the other elements of the first multimedia communication device 100 according to such instructions to realize various aspects of this disclosure. Such aspects include, but are not limited to, configuring and controlling the other elements of the first multimedia communication device 100, processing user input and commands, communicating with other computer systems, processing images captured by the RGB cameras 110 and the depth cameras 115, and/or displaying image data received from remote systems. The data holding subsystem includes one or more memory devices (such as, but not limited to, DRAM devices) and/or one or more storage devices (such as, but not limited to, flash memory devices). The data holding subsystem includes one or more media having instructions stored thereon which are executable by the logic subsystem, and which cause the logic subsystem to realize various aspects of this disclosure. Such instructions may be included as part of firmware, an operating system, device drivers, application programs, or other executable programs. The communications subsystem is arranged to allow the first multimedia communication device 100 to communicate with other computer systems. Such communication may be performed via, for example, wired or wireless data communication. Other examples for the controller 250 are illustrated in FIGS. 11 and 12. - The first
multimedia communication device 100 also includes an enclosure 260, arranged to be mechanically coupled to the display device 105 and enclose internal components of the first multimedia communication device 100, including the RGB cameras 110, the depth cameras 115, and the controller 250. The enclosure 260 may also be referred to as a "housing." In this example, when the illustrated first multimedia communication device 100 is assembled, the RGB cameras 110 are all encompassed by the single enclosure 260 and positioned behind the single display device 105. - For the examples shown in
FIGS. 1-7, 9, and 10, the display device 105 has a 16:9 aspect ratio, with a diagonal size of approximately 213 centimeters. The RGB cameras 110 are distributed parallel to the lateral midline 206, with a distance of about 150 centimeters between the optical axes of the outermost RGB cameras 110. The lateral midline 206 (illustrated in FIG. 2) is positioned horizontally and approximately 154 centimeters above a floor, and the optical axes of the RGB cameras 110 are positioned approximately 6 centimeters above the vertical center of the display device 105, placing the optical axes of the RGB cameras 110 approximately 160 centimeters from the floor and positioning the RGB cameras 110 at approximately eye level for a standing human subject. By positioning the RGB cameras 110 at an eye-level height, a subject's eyes are more likely to be aligned with the RGB cameras 110, improving both capture of gaze-aligned images (images in which a subject is looking directly at the camera) and display of images of remote participants perceived as direct eye-to-eye contact. An optical axis of the depth camera 115 a is oriented 11 degrees left from the horizontal axis 210 and an optical axis of the depth camera 115 b is oriented 11 degrees right from the horizontal axis 210, thereby providing an increased combined FOV for the depth cameras 115. An optical center of the depth camera 115 a is positioned approximately 66 centimeters in the lateral direction from an optical center of the depth camera 115 b. The optical centers of the depth cameras 115 are positioned approximately 13 centimeters below the optical axes of the RGB cameras 110. The RGB cameras 110 and the depth cameras 115 each capture images with a 16:9 aspect ratio and with a horizontal FOV of approximately 100 degrees. These dimensions and arrangements are provided to more fully describe the illustrations in FIGS. 1-7, 9, and 10, and are not required features of the examples described herein. - Although in
FIGS. 1 and 2 various elements and features of the first multimedia communication device 100 are described as being integrated into a single device, in other implementations, various elements and features of the first multimedia communication device 100 may be implemented across multiple devices. For example, selected operations may be performed by a computer system not within the illustrated enclosure 260, and/or some or all of the depth cameras 115 may be included in one or more separate devices instead of being positioned behind the display device 105 or otherwise not positioned within the enclosure 260. -
FIG. 3A illustrates an example of capturing and displaying human foreground subject images. FIG. 3A shows a top view of an example scene 300 in which the four participants of FIG. 1 are present, with the other participants seated and the participant 132 standing, during a video conferencing session. The standing participant 132 has advanced toward the multimedia communication device 100, within an example threshold distance 302 and a corresponding foreground space 303. The video conferencing system 102 (for example, the multimedia communication device 100) may be configured to determine a subject distance based on depth images captured by the depth cameras 115. In this example, the video conferencing system 102 is configured to ignore features beyond the threshold distance 302 or outside of the foreground space 303 when identifying foreground subjects. The shape, physical positions, and distances illustrated in FIG. 3A for the threshold distance 302 and the foreground space 303 are generally illustrated for discussion, and may be different in various implementations. In some implementations, the threshold distance 302 and/or a shape of, and physical positions for, the foreground space 303 may be defined and/or adjusted by a user; for example, during a setup process. - Based on at least the
participant 132 being within the threshold distance 302, the video conferencing system 102 (for example, the multimedia communication device 100) has identified the participant 132 as a foreground subject for segmentation from RGB images. In FIG. 3A, the video conferencing system 102 has selected the RGB camera 110 b, with a corresponding FOV 304 b (shown in part), as the foreground camera for capturing images of the foreground subject 132. It is noted that foreground camera selection may occur after the foreground image has been captured and may be based on the content of the RGB images and/or corresponding depth images. -
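The threshold-distance test used above to identify foreground subjects can be sketched as follows. This is a minimal sketch, assuming per-pixel depth estimates in meters; the 1.5 m threshold and the toy depth map are illustrative assumptions, not values from the document.

```python
import numpy as np

def foreground_depth_mask(depth_m, threshold_m=1.5):
    """Label depth pixels closer than a threshold distance as candidate
    foreground. depth_m holds per-pixel distance estimates in meters;
    pixels with no valid estimate are encoded as 0 and never labeled.
    The 1.5 m threshold is illustrative, not a value from the document."""
    valid = depth_m > 0.0
    return valid & (depth_m < threshold_m)

depth = np.array([
    [3.1, 3.0, 3.2],
    [1.2, 1.1, 3.1],
    [1.2, 1.1, 0.0],
])
mask = foreground_depth_mask(depth)
# Estimate the subject distance as the median depth over the mask.
subject_distance = float(np.median(depth[mask]))
print(int(mask.sum()), subject_distance)
```

A shaped foreground space (rather than a single distance threshold) could be expressed the same way by testing each back-projected 3D point against a region boundary instead of a scalar distance.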
FIG. 3B illustrates an example of segmentation of a foreground image 330, corresponding to the foreground subject 132, from an RGB image 310 captured by the multimedia communication device 100 for the scene 300 shown in FIG. 3A. In some implementations, the segmentation of a foreground image from an RGB image results in labeling of pixels in the RGB image, rather than generating a foreground image separate from the RGB image. The RGB image 310 has been captured by the selected foreground RGB camera 110 b. In the RGB image 310, the foreground subject 132 has a height 312 of about 74% of the height of the RGB image 310, and the eyes of the foreground subject 132 are centered at a lateral distance 314 of about 74% of the width of the RGB image 310. In this example, an RGB image based segmentation is performed, identifying a first foreground mask 316 identifying pixel positions corresponding to the foreground subject 132 and, in some examples, a first background mask 318. In some examples, a machine-trained model, trained to identify instances of certain types of objects, may be applied to the RGB image 310 to identify the first foreground mask 316 and/or the first background mask 318. For example, a trained neural network, such as a trained convolutional neural network (CNN), may be used for this purpose. - At about a same time as the capture of the
RGB image 310, adepth image 320 has been captured for thescene 300 by thedepth camera 115 a. Due to limitations of patent illustrations, thedepth image 320 is illustrated with only a few different levels of shading. In thedepth image 320, there is aportion 322 with depth estimates that are substantially discontinuous along edges between theportion 322 and surrounding areas of thedepth image 320. Based on thedepth image 320, the video conferencing system 102 (for example, the multimedia communication device 100) identifies a firstforeground depth mask 324 identifying positions in thedepth image 320 corresponding to theforeground subject 132 and, in some examples, a firstbackground depth mask 326. In some implementations, based on the above-mentioned discontinuities between theportion 322 and surrounding areas of thedepth image 320, thevideo conferencing system 102 identifies theportion 322 as aforeground portion 322 of thedepth image 320. In some examples, thevideo conferencing system 102 may further determine a distance d305 and/or physical position for the identifiedforeground portion 322. Based on, for example, the determined distance d305 being less than thethreshold distance 302 and/or the determined physical position being within theforeground space 303, thevideo conferencing system 102 identifies a foreground subject corresponding to theparticipant 132. - In an implementation with the
depth camera 115 a at a different position than the imaging camera 110 b (as illustrated in FIGS. 1 and 2), the video conferencing system 102 (for example, the multimedia communication device 100) is configured to identify portions of the RGB image 310 corresponding to the first foreground depth mask 324, resulting in a second foreground mask 328 and, in some implementations, a second background mask 329. For conversions, transformations, and/or other computations performed to identify the corresponding positions in the RGB image 310, various techniques can be used individually or in combination, including, but not limited to: rotations and/or translations of two-dimensional (2D) and/or 3D points and/or vectors (including, for example, use of one or more transformation matrices); optical distortion correction for a depth camera and/or RGB camera (including, for example, correction of complex asymmetric optical distortion); geometric transformations such as, but not limited to, affine transformations (linear conformal transformations (scaling, translations, rotations) and shears), projective transformations (projections, homographies, and collineations), and piecewise linear transformations (for example, affine transformations applied separately to triangular regions of an image); and/or nonlinear image transformations such as, but not limited to, polynomial transformations, nonuniform scaling, circular or radial distortion (barrel, pincushion, moustache, and multiorder), and tangential distortion (for example, using Brown's model). Such techniques may be implemented using various methods, such as, but not limited to, matrix operations, numerical approximation (such as Taylor series or Newton-Raphson), and/or mapping/interpolation. - The video conferencing system 102 (for example, the multimedia communication device 100) is configured to, based on the
first foreground mask 316, thesecond foreground mask 328, thefirst background mask 318, and/or thesecond background mask 329, segment from the RGB image 310 aforeground image 330 corresponding to theforeground subject 132. Other techniques that may be applied for segmenting theforeground image 330 are described in U.S. patent application Ser. No. 15/975,640 (filed on May 9, 2018 and entitled “Skeleton-Based Supplementation for Foreground Image Segmentation”), which is incorporated by reference herein in its entirety. -
FIG. 3C shows details of the foreground image 330 obtained in FIG. 3B for the scene 300 shown in FIG. 3A. The foreground image 330 has a total height of about 74% of the height of the RGB image 310 and a total width of about 25% of the width of the RGB image 310. The video conferencing system 102 (for example, the multimedia communication device 100 and/or 160) is configured to obtain an eye pixel position 332 for the foreground image 330, corresponding to an image portion included in the foreground image 330 depicting the eyes of the foreground subject 132. In some examples, the eye pixel position 332 may be determined based on a centroid, middle position, or average position for an image portion identified as a portion of the foreground image 330 depicting the eyes of the foreground subject 132. In some implementations, a machine-trained algorithm used to identify the first foreground mask 316 may also be trained to identify a portion of the RGB image 310 depicting the eyes of the foreground subject 132 and/or estimate the eye pixel position 332. In this example, the eye pixel position 332 is at a lateral (or "x") pixel position or distance 334 of about 50% of the width of the foreground image 330, and is at a longitudinal (or "y") pixel position or distance 336 of about 85% of the height of the foreground image 330. -
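The centroid-based estimate of an eye pixel position described above can be sketched as follows; the toy mask is an assumption for illustration, though its resulting fractions happen to mirror the 50%/85% example.

```python
import numpy as np

def eye_pixel_position(eye_mask):
    """Estimate an eye pixel position as the centroid of the pixels labeled
    as eyes, returned as fractions of the image width and height so the
    position is resolution-independent."""
    rows, cols = np.nonzero(eye_mask)
    h, w = eye_mask.shape
    return cols.mean() / w, rows.mean() / h  # (x fraction, y fraction)

# Toy 20x10 foreground image with two "eye" pixels on row 17.
mask = np.zeros((20, 10), dtype=bool)
mask[17, 4] = True
mask[17, 6] = True
x_frac, y_frac = eye_pixel_position(mask)
print(x_frac, y_frac)
```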
FIG. 3D shows pixel positions 343, 345, 347, and 349 in a composite image 350, corresponding to respective RGB camera pixel display regions 190 a, 190 b, 190 c, and 190 d for the RGB cameras 180 a, 180 b, 180 c, and 180 d of the remote multimedia communication device 160 that will display the composite image 350. In this example, each of the pixel positions 343, 345, 347, and 349 is at a longitudinal pixel position or distance 340 (in this example, along a Y axis similar to the Y axis 204 shown in FIG. 2) of about 55% of the height of the composite image 350. The pixel position 343, corresponding to the pixel display region 190 a and the RGB camera 180 a, has a lateral pixel position or distance 342 (in this example, along an X axis similar to the X axis 206 shown in FIG. 2) of about 11% of the width of the composite image 350. Pixel position 345 has a lateral pixel position or distance 344 of about 35%, pixel position 347 has a lateral pixel position or distance 346 of about 65%, and pixel position 349 has a lateral pixel position or distance 348 of about 89%. These pixel positions are merely illustrated for the purpose of discussion, and are not intended to be limiting on other embodiments. The video conferencing system 102 is configured to generate the composite image 350. In some implementations, the pixel positions 343, 345, 347, and 349 are provided by the remote multimedia communication device 160 to the multimedia communication device 100, and compositing is performed by the multimedia communication device 100. In some implementations, the pixel positions 343, 345, 347, and 349 are determined and used by the remote multimedia communication device 160 that will display the composite image 350, and compositing is performed by the remote multimedia communication device 160. -
FIG. 3E illustrates a portion of the composite image 350 generated for the scene 300 shown in FIG. 3A using the foreground image 330 shown in FIG. 3C. The foreground image 330 is selectively positioned such that the eye pixel position 332 of the foreground image 330 is at about the pixel position 347 for the RGB camera 180 c, and as a result is displayed by the pixel display region 190 c. The foreground image 330 is scaled for composition in the composite image 350. This scaling is discussed in more detail in connection with FIGS. 5A-5F. In the example shown in FIG. 3E, the foreground image 330 is scaled such that it would have a total height 354 of about 93% of the height of the composite image 350 (an increase of about 26% from the proportionate size of the foreground image 330 portion of the RGB image 310). However, due to longitudinal positioning or shifting of the foreground image 330 to have the eye position 332 at about the longitudinal position 340, the rendered height 356 of the rendered portion 352 of the foreground image 330 is only about 59% of the height of the composite image 350. The eye pixel position 332 of the rendered portion 352 of the foreground image 330 is at about the lateral pixel position 346 in the composite image 350. As a result, the eyes of the foreground subject 132 are displayed at about the pixel display region 190 c for the RGB camera 180 c that will be used to capture RGB images of the participant viewing the composite image 350. -
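The selective positioning described above amounts to computing a paste offset so the foreground's eye pixel position lands on a target pixel position. This is a minimal sketch; the pixel dimensions are assumptions that echo the example's proportions.

```python
def paste_offset(fg_shape, eye_frac_xy, comp_shape, target_frac_xy):
    """Compute the top-left corner (row, col) at which to paste a scaled
    foreground image into a composite image so that the foreground's eye
    pixel position lands on a target pixel position (for example, the
    display region in front of the capturing RGB camera). A negative
    offset means the overhanging part of the foreground is cropped."""
    fg_h, fg_w = fg_shape
    comp_h, comp_w = comp_shape
    eye_row, eye_col = eye_frac_xy[1] * fg_h, eye_frac_xy[0] * fg_w
    target_row, target_col = target_frac_xy[1] * comp_h, target_frac_xy[0] * comp_w
    return int(round(target_row - eye_row)), int(round(target_col - eye_col))

# Foreground scaled to 93% of a 1080-pixel-high composite, eyes at
# (x=50%, y=85%) of the foreground; target at (x=65%, y=55%) of a
# 1920x1080 composite.
row0, col0 = paste_offset((1004, 340), (0.5, 0.85), (1080, 1920), (0.65, 0.55))
print(row0, col0)
```

A negative row offset, as here, corresponds to the document's observation that part of the shifted foreground image falls outside the composite and only a rendered portion remains visible.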
FIG. 3F illustrates an example scene 360 in which the foreground subject 132 has moved laterally from the physical position in FIG. 3A, and a resulting composite image 374 for the scene 360. The composite image 374 is generated according to the techniques described in FIGS. 3A-3E. In this example, the video conferencing system 102 again selects the RGB camera 110 b as the foreground camera for the foreground subject 132. The foreground subject 132 is at a distance d362 from the selected RGB camera 110 b. FIG. 3F shows an RGB image 364, obtained from the selected RGB camera 110 b for the scene 360, in which the foreground subject 132 has a height 366 of about 74% of the height of the RGB image 364, and the eyes of the foreground subject 132 are centered (at a position similar to the eye pixel position 332 shown in FIG. 3C) at a lateral distance 368 of about 59% of the width of the RGB image 364. As described in FIG. 3E for the foreground image 330, the resulting foreground image 370 is scaled and composited into the composite image 374 such that an eye position for the rendered portion 372 of the foreground image 370 is at about the longitudinal pixel position 340 and lateral pixel position 346 for the pixel display region 190 c. -
FIG. 3G illustrates an example scene 380 in which the foreground subject 132 has moved laterally from the physical position in FIG. 3F, and a resulting composite image 394 for the scene 380. The composite image 394 is generated according to the techniques described in FIGS. 3A-3E. In this example, the video conferencing system 102 again selects the RGB camera 110 b as the foreground camera for the foreground subject 132. FIG. 3G shows an RGB image 384, obtained from the selected RGB camera 110 b for the scene 380, in which the foreground subject 132 has a height 386 of about 74% of the height of the RGB image 384, and the eyes of the foreground subject 132 are centered at a lateral distance 388 of about 26% of the width of the RGB image 384. As described in FIG. 3E for the foreground image 330 and in FIG. 3F for the foreground image 370, the resulting foreground image 390 is scaled and composited into the composite image 394 such that an eye position for the rendered portion 392 of the foreground image 390 is at about the longitudinal pixel position 340 and lateral pixel position 346 for the pixel display region 190 c. - Thus, despite the lateral movements of the foreground subject 132 that occurred from
FIG. 3A to FIG. 3F to FIG. 3G, resulting in significantly different lateral eye positions in the FOV of the RGB camera 110 b of about 74%, 59%, and 26% respectively, throughout that time the resulting composite images 350, 374, and 394 kept the rendered eyes of the foreground subject 132 at the longitudinal pixel position 340 of about 55% and the lateral pixel position 346 of about 65%, and maintained the rendered position of the eyes of the foreground subject 132 over the foreground camera being used to capture RGB images of the participant viewing the composite images. This both reduces distraction caused by such movements of the foreground subject 132 and enables a gaze-correct multi-party video conferencing session between at least those two participants. It is noted that the various techniques for generating composite images and displaying the composite images on the remote multimedia communication device 160 are similarly performed with reversed roles, whereby the remote multimedia communication device 160 captures an RGB image of a remote participant, resulting in a composite image generated by the video conferencing system 102 being displayed on the multimedia communication device 100. -
FIG. 4 illustrates use of image distortion correction applied in some implementations to reduce distortions occurring in various portions of the fields of view of the RGB cameras 110. In some implementations, some or all of the RGB cameras 110 have wide fields of view of about 90 degrees or more. For compact and/or lower cost RGB cameras 110 at such wide fields of view, curvilinear distortion such as barrel distortion is common. FIG. 4 shows an uncorrected image 400 obtained from a wide angle RGB camera 110, with dashed lines added to more clearly illustrate barrel distortion in the uncorrected image 400. The distortion is relatively minor at a central portion 410 of the uncorrected image 400, as shown by a representative foreground image 420. However, when a foreground subject moves towards an edge of the FOV of the RGB camera 110, the distortion becomes more severe and noticeable, as shown by the representative foreground image 425 from a peripheral portion of the uncorrected image 400 in contrast to the central foreground image 420. In addition to being visually noticeable, such distortion, if uncorrected, can cause the eyes of the foreground subject to appear to be looking away from a remote participant even when the foreground subject is looking at the RGB camera. For example, axial distortion associated with subject distance can cause participant gaze angles to deviate. Further, if the foreground subject 132 moves from one side of the FOV to the other, the resulting foreground images demonstrate distortions in different directions, resulting in an unusual and disturbing visual effect when the foreground subject is maintained at the same lateral position as shown in FIGS. 3E, 3F, and 3G. - In some implementations, the video conferencing system 102 (for example, the multimedia communication device 100) is configured to "undistort" or correct the RGB images to reduce such distortion.
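One common way to realize such undistortion is to invert a polynomial radial distortion model. The sketch below uses fixed-point iteration on normalized image coordinates; the coefficient is an illustrative assumption, as real systems use per-camera calibration.

```python
import numpy as np

def undistort_points(pts_norm, k1, k2=0.0):
    """Invert a polynomial radial distortion model x_d = x_u * (1 + k1*r^2
    + k2*r^4) by fixed-point iteration, mapping distorted normalized image
    coordinates back to undistorted ones."""
    pts_norm = np.asarray(pts_norm, dtype=float)
    undistorted = pts_norm.copy()
    for _ in range(20):  # x_u <- x_d / (1 + k1*r^2 + k2*r^4), r taken from x_u
        r2 = np.sum(undistorted ** 2, axis=-1, keepdims=True)
        undistorted = pts_norm / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return undistorted

# With barrel distortion (k1 < 0) a peripheral point is pulled inward;
# undistortion pushes it back outward.
restored = undistort_points([[0.45, 0.0]], k1=-0.2)
print(restored[0, 0] > 0.45)
```

Resampling a whole image through this per-point mapping produces the kind of corrected image the document describes; tangential terms (Brown's model) can be added the same way.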
FIG. 4 shows a corrected image 430, resulting from correction of the barrel distortion in the original uncorrected image 400. As a result of this undistortion, the foreground subject has a more consistent appearance across the FOV of an RGB camera 110, as illustrated by the foreground images from respective portions of the corrected image 430. In some examples, other image corrections may be applied, including, but not limited to, corrections for more complex (non-curvilinear) optical distortions, vignetting, and chromatic aberration. Various image corrections may be performed using the techniques described in connection with transforming depth images in FIG. 3B. - Other non-optical distortions can occur in the form of subject distance distortions when a participant is close to an RGB camera 110. Although in some examples depth images obtained from the depth cameras 115 may be used to correct for certain subject distance distortions, in some implementations the
multimedia communication device 100 is configured to present images and interfaces on the display 105 so as to reduce the occurrence of such distortions. In some implementations, interactive user interface elements responsive to touch-based user input are presented in portions of the display device 105 likely to reduce the occurrence of images with such disproportionate portions. For example, interactive user interface elements may be positioned at or near the right or left ends of a display device 105 configured to operate as a touch screen to receive user input, such that input via a finger or handheld instrument is more likely to occur at positions away from an optical axis of an RGB camera 110 (including, for example, positions outside of an FOV of the RGB camera 110). In some examples, such interactive user interface elements may be dynamically positioned and/or repositioned based on at least a detected position of a foreground subject. For example, an interactive user interface element may be moved from a left end to a right end in response to a corresponding lateral movement of a foreground subject. As another example, the dynamic positioning and/or repositioning of user interface elements may include selecting one of multiple areas of the display device 105 where touch-based input occurs away from optical axes of one or more of the RGB cameras 110. In some examples, a hand or limb likely to be used for touch-based input may be determined for a foreground subject (for example, a determination of a dominant hand based on past user input events), and dynamic positioning or repositioning is performed based on which hand is determined likely to be used. For example, positions to the left (as viewed by a user looking at the display device) of a foreground camera may be preferred to avoid a left-handed foreground subject reaching across an FOV of the foreground camera.
In some examples, a user interface may be selectively positioned to place a display area of the user interface closer than an input portion of the user interface to an optical axis of an RGB camera 110, thereby guiding a foreground subject's gaze toward an RGB camera 110 at times that they are interacting with an application on the multimedia communication device 100 and not looking at an image of a remote participant, while also guiding the foreground subject's input interactions away from the RGB camera 110 so as to avoid subject distance distortions. -
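The dynamic placement logic described above might be sketched as the following heuristic. The 0.25 offset, the clamping bounds, and the example camera positions are assumptions for illustration, not values prescribed by the document.

```python
def place_ui_panel(subject_x_frac, camera_x_fracs, dominant_hand="right"):
    """Choose a lateral position (as a fraction of display width) for a
    touch-input panel so the touching hand stays clear of the optical axis
    of the likely foreground camera."""
    # The camera nearest the subject is the likely foreground camera.
    nearest = min(camera_x_fracs, key=lambda c: abs(c - subject_x_frac))
    offset = 0.25
    if dominant_hand == "left":
        # Prefer positions left of the camera so a left-handed reach
        # does not cross the camera's field of view.
        return max(0.05, nearest - offset)
    return min(0.95, nearest + offset)

# Subject standing at 60% of the display width; cameras at illustrative
# lateral positions (11%, 35%, 65%, 89%).
panel_x = place_ui_panel(0.60, [0.11, 0.35, 0.65, 0.89], dominant_hand="left")
print(panel_x)
```

Re-running this whenever the detected subject position changes yields the repositioning behavior described above.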
FIGS. 5A-5D illustrate techniques which may be applied by the video conferencing system 102 in response to changes in distance between multimedia communication devices and respective foreground subjects. FIG. 5A illustrates a first scenario 500 occurring at about a first time, including a scene 500 a at a first geographic location and a scene 500 b at a different second geographic location, and a resulting composite image 540. In the scene 500 a, a first participant 504 is participating in a video conferencing session via a first multimedia communication device 502. In the scene 500 b, a second participant 514 is participating in the video conferencing session via a second multimedia communication device 512. Each of the multimedia communication devices 502 and 512 may include any of the features described for the multimedia communication devices in FIGS. 1-4. In the examples shown in FIGS. 5A-5D, the multimedia communication devices 502 and 512 are not identical to the multimedia communication device 100, but are otherwise similarly configured. For convenience of discussion, the first and second multimedia communication devices 502 and 512 are described as being included in the video conferencing system 102. - In
FIG. 5A, the video conferencing system 102 (for example, the first multimedia communication device 502) determines a distance d505 (in this example, about 70 centimeters) between the first multimedia communication device 502 and the first participant 504. The first multimedia communication device 502 includes an RGB camera 506 c with a horizontal FOV 507 c (in this example, about 100 degrees), which is used to capture an RGB image 520. A shoulder width of the first participant 504 occupies a horizontal angle or FOV 509 of the RGB camera 506 c of about 27.4 degrees. A foreground image portion 522 of the RGB image 520, corresponding to the first participant 504, has a shoulder width 524 of about 20.4% of the width of the RGB image 520 and a height 526 of about 82% of the height of the RGB image 520. The video conferencing system 102 (for example, the first multimedia communication device 502) segments a foreground image 528, corresponding to the first participant 504, from the RGB image 520. - The video conferencing system 102 (for example, the second multimedia communication device 512) determines a distance d515 (in this example, about 140 centimeters) between the second
multimedia communication device 512 and the second participant 514. The second multimedia communication device 512 includes an RGB camera 516 c, which is used to capture an RGB image (not shown in FIG. 5A). A shoulder width of the second participant 514 occupies a horizontal angle or FOV 519 of the RGB camera 516 c of about 13.4 degrees. -
FIG. 5B illustrates aspects of scaling of the foreground image 528 by the video conferencing system 102 (for example, the multimedia communication devices 502 and/or 512) for the composite image 540 based on at least the distance d505 between the first multimedia communication device 502 and the first participant 504. The video conferencing system 102 is configured to determine an apparent distance d534 based on the distances d505 and d515. In this example, the apparent distance d534 is a sum of the distance d505 and the distance d515, although other techniques may be used, including, but not limited to, limiting distances d505 and/or d515 to minimum and/or maximum distances, and/or applying a weighting or scaling factor to distances d505 and/or d515. A portion of a display screen of the second multimedia communication device 512 (in this example, the entire display screen) appears to the second participant 514 to be like a "virtual window" 532, through which the first participant 504 appears to be at the apparent distance d534 from the second participant 514. - The
video conferencing system 102 is configured to scale the foreground image 528 based on the apparent distance d534, resulting in the foreground image 528 being scaled such that it would have a total height 544 of about 95% of the height of the composite image 540, and in the rendered foreground image 542 having a shoulder width 538 of about 22.7% of the width of the composite image 540, spanning a horizontal FOV 536 of the second participant 514 of about 10.1 degrees. As in the examples in FIGS. 3A-3G, the video conferencing system 102 is configured to generate the composite image 540 with the eye position of the rendered foreground image 542 composited at about an RGB camera pixel display region 508 c for the foreground camera, the RGB camera 516 c. This results in the rendered foreground image 542 having a height 546 of about 63% of the height of the composite image 540. It is noted that the video conferencing system 102 may be configured to similarly scale an image of the second participant 514 for display to the first participant 504 via the first multimedia communication device 502, thereby achieving the same "virtual window" effect for both participants 504 and 514. -
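The "virtual window" scaling above can be sketched by recovering the subject's physical width from the capture geometry and re-projecting it at the apparent distance. Under this simplified angular model the example's numbers give roughly 9.3 degrees, close to but not exactly the 10.1 degrees stated above, since the rendered size also depends on the display geometry; the model is an assumption, not the document's exact computation.

```python
import math

def rendered_angle_deg(capture_dist_m, remote_dist_m, subject_fov_deg):
    """Recover the subject's physical width from the angle it subtends at
    the capturing camera, then re-project that width at the apparent
    distance (here the simple sum of the two participant-to-device
    distances, as in the example)."""
    width_m = 2.0 * capture_dist_m * math.tan(math.radians(subject_fov_deg / 2.0))
    apparent_dist_m = capture_dist_m + remote_dist_m
    return 2.0 * math.degrees(math.atan(width_m / (2.0 * apparent_dist_m)))

# Example values: first participant at 0.70 m with shoulders spanning
# ~27.4 degrees; second participant viewing from 1.40 m.
angle = rendered_angle_deg(0.70, 1.40, 27.4)
print(round(angle, 1))
```

Clamping or weighting the two distances, as the document permits, would simply change how `apparent_dist_m` is computed.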
FIG. 5C illustrates a second scenario 550 occurring at about a second time after the first time in FIG. 5A and during the video conferencing session shown in FIG. 5A, in which the second participant 514 has moved closer to the second multimedia communication device 512, including a scene 550 a for the first participant 504 and a scene 550 b for the second participant 514, and a resulting composite image 562. In this example, the first participant 504 has remained in the physical position shown in FIG. 5A. Thus, the distance d505 and horizontal FOV 509 are essentially the same, and the RGB image 552 captured by the RGB camera 506 c has a foreground image portion 554 with a shoulder width 556 and height 558 that are approximately the same as the shoulder width 524 and height 526 in FIG. 5A, resulting in a foreground image 560 similar to the foreground image 528 in FIG. 5A. - The
second participant 514 has moved to a new distance d555 of about 70 centimeters. A shoulder width of the second participant 514 occupies an increased horizontal angle or FOV 559 of the RGB camera 516 c of about 21.9 degrees. FIG. 5D illustrates aspects of scaling of the foreground image 560 by the video conferencing system 102 for the composite image 562 based on at least the distance d505 between the first multimedia communication device 502 and the first participant 504, in accordance with the techniques described in FIG. 5A. In FIG. 5D, the movement of the second participant 514 has resulted in a decreased apparent distance d535 and an increased horizontal FOV 537 of about 14.3 degrees. Due to the decreased distance d555, the net result is the foreground image 560 being scaled smaller than in FIG. 5A. The foreground image 560 is scaled such that it would have a total height 566 of about 71% of the height of the composite image 562 (a decrease of about 15% from the scaling of the foreground image 528 for the composite image 540 in FIG. 5A), resulting in the rendered foreground image 564 having a shoulder width 539 of about 16.9% of the width of the composite image 562, spanning a horizontal FOV 537 of the second participant 514 of about 14.3 degrees (an increase of about 42% over the horizontal FOV 536 in FIG. 5A). With the eye position of the rendered foreground image 564 composited at about the pixel display region 508 c, the rendered foreground image 564 has a height 568 of about 60% of the height of the composite image 562. -
FIGS. 5E and 5F illustrate additional techniques which may be applied by the video conferencing system 102 (for example, by the multimedia communication devices 100 and/or 160) in response to changes in distance between the first multimedia communication device 100 and a foreground subject 132. FIG. 5E illustrates an example scene 570 in which the foreground subject 132 has moved from the physical position shown in FIG. 3F to a new physical position closer to the multimedia communication device 100, at a distance d571, and the resulting composite image 577 displayed by the multimedia communication device 160. As described in FIGS. 3A-3G, the video conferencing system 102 (for example, the multimedia communication device 100 or 160) is configured to generate the composite image 577 with the eye position of the rendered foreground image 578 composited at about the pixel display region 190 for the foreground camera (in this case, the pixel display region 190 c, as in FIGS. 3E, 3F, and 3G). - In this example, as a result of the shorter distance d571, a different and larger view of the
foreground subject 132 is captured in a foreground image portion 573 of an RGB image 572 from the RGB camera 110 b than in the examples shown in FIGS. 3B, 3F, and 3G. For example, a shoulder width 574 of the foreground image portion 573 (at about 30% of the width of the RGB image 572) is about 70% greater than in those examples, the foreground image portion 573 has a height 575 of about 82% of the height of the RGB image 572, and only a portion of the foreground subject 132 above the waist was captured in the RGB image 572. The video conferencing system 102 segments a foreground image 576 corresponding to the foreground subject 132 from the RGB image 572. - The video conferencing system 102 (for example, the
multimedia communication device 100 or 160) is configured to scale the foreground image 576 based on at least the distance d571 between the multimedia communication device 100 and the foreground subject 132. The video conferencing system 102 (for example, the multimedia communication device 100) may determine the distance d571 based on at least depth images from the depth cameras 115. As a result, the foreground image 576 is scaled such that it would have a total height 580 of about 65% of the height of the composite image 577 (a decrease of about 21% from the proportionate size of the foreground image portion 573 of the RGB image 572), resulting in a rendered shoulder width 579 of about 23.2%. Since a lower portion of the foreground subject 132 was not captured in the RGB image 572, most of the foreground image 576 is included in the composite image 577, with the rendered portion 578 of the foreground image 576 having a rendered height 581 of about 59% of the height of the composite image 577. As a result of the scaling based on distance, the foreground subject 132 has a very similar appearance in FIGS. 3F and 5E despite the differences in the captured RGB images. -
FIG. 5F illustrates an example scene 582 in which the foreground subject 132 has moved from the physical position shown in FIG. 5E to a new physical position further away from the multimedia communication device 100, at a distance d583, and the resulting composite image 589. In this example, as a result of the greater distance d583, a different and smaller view of the foreground subject 132 is captured in a foreground image portion 585 of an RGB image 584 from the RGB camera 110 b than in the examples shown in FIGS. 3B, 3F, 3G, and 5E. For example, a shoulder width 586 of the foreground image portion 585 is only about 15.6% of the width of the RGB image 584, while the foreground image portion 585 has a height 587 of about 65% of the height of the RGB image 584. The video conferencing system 102 segments a foreground image 588 corresponding to the foreground subject 132 from the RGB image 584. - As described in
FIG. 5E, the video conferencing system 102 again scales the foreground image 588 based on at least the distance d583 between the multimedia communication device 100 and the foreground subject 132. As a result, the foreground image 588 is scaled such that it would have a total height 592 of about 97% of the height of the composite image 589 (an increase of about 49% over the scaling of the foreground image 576 for the composite image 577 in FIG. 5E), resulting in the rendered foreground image 590 having a shoulder width 591 of about 23.2%, which is substantially similar to the shoulder width 579 in FIG. 5E. The rendered foreground image 590 of the foreground image 588 has a rendered height of about 59% of the height of the composite image 589, which is substantially similar to the rendered height 581 in FIG. 5E. - Thus, in the examples shown in
FIGS. 5E and 5F, despite changes in distance between the participant 132 and the multimedia communication device 100 and corresponding differences in the captured foreground image portions 573 and 585, the consistent composite image appearance described in FIGS. 3A-3G is maintained, including maintaining the rendered position of the eyes of the foreground subject 132 over the foreground camera being used to capture RGB images of the participant viewing the composite images (in the examples of FIGS. 5E and 5F, RGB camera 180 c). Thus, in the examples of FIGS. 5E and 5F, movement of the foreground subject 132 throughout much of an FOV of an RGB camera has a substantially reduced effect, both reducing distraction from changes in appearance caused by such movements of the foreground subject 132 and enabling a gaze-correct multi-party video conferencing session between at least those two participants despite such movements, granting participants more freedom within more effective video conferencing sessions. - Although an ability to establish eye contact is an important component for improved video conferencing experiences, an ability to effectively convey dynamic cooperative spatial and postural behaviors by which people ordinarily interact adds another significant dimension to the experience and presents another area for improvement. Adam Kendon's F-formation system of spatial organization describes various spatial patterns that naturally arise during face-to-face interactions between two or more people to create a transactional segment (which may be referred to as a joint transactional space or an "o-space") for directing attention and manipulating objects.
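The distance-based scaling of FIGS. 5E and 5F amounts to holding the subject's rendered size constant regardless of capture distance. One way to express this — a minimal sketch, not the claimed implementation; the constant-shoulder-width rule is inferred from the figures' numbers — is:

```python
def scale_factor(captured_shoulder_frac, target_shoulder_frac=0.232):
    """Scale applied to the segmented foreground so the subject's rendered
    shoulder width stays a constant fraction of the composite image width,
    however near or far the subject stands from the device."""
    return target_shoulder_frac / captured_shoulder_frac

# FIG. 5E: subject near the device; shoulders span ~30% of the RGB frame.
near = scale_factor(0.30)     # < 1, shrink the large captured view
# FIG. 5F: subject farther away; shoulders span ~15.6% of the RGB frame.
far = scale_factor(0.156)     # > 1, enlarge the small captured view

# Captured height fractions of 82% (5E) and 65% (5F) then scale to roughly
# 63% and 97% of the composite height, close to the figures' 65% and 97%.
near_height = round(0.82 * near, 2)
far_height = round(0.65 * far, 2)
```

Both rendered shoulder widths come out at about 23.2% of the composite width, matching the substantially similar widths 579 and 591 described above.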
In one-on-one interactions, which are significantly more common than interactions with more than two people, three spatial patterns were observed: a side-by-side arrangement where two participants stand close together facing the same direction, a face-to-face (or vis-à-vis) arrangement with two participants facing each other, and an off-axis arrangement where two individuals stand off-axis to each other (for example, perpendicular to each other in an L-arrangement, as if standing on two edges of the letter 'L'). Subconsciously, the face-to-face arrangement—an arrangement commonly achieved by conventional video conferencing—is considered confrontational and uncomfortable over time, and instead the off-axis arrangement is preferred. Additionally, spatial positioning is dynamic over the course of a conversation. For example, the face-to-face arrangement is often preferred when people greet each other at the beginning of a conversation, which then shifts to the off-axis arrangement.
- The
video conferencing system 102 enables such spatial arrangements to be dynamically created, communicated, and controlled by participants, thereby further improving the perceived quality, comfort, and effectiveness of video conferencing sessions. FIGS. 6A-6D illustrate techniques for selecting and changing RGB cameras that further support providing gaze-correct video conferencing sessions among and between various participants at various geographic locations during a single video conferencing session. FIG. 6A illustrates a first scenario 600 occurring at about a first time, including a scene 600 a at the second geographic location 150 shown in FIG. 1 and a scene 600 b at the first geographic location 120 shown in FIG. 1. Two views are shown for the scene 600 a: on the left is a top view showing a physical position of the participant 155 relative to the multimedia communication device 160, and on the right a perspective view showing the participant 155 interacting with a rendered foreground image 606 of the participant 132 displayed by the multimedia communication device 160. Likewise, two views are shown for the scene 600 b: on the left is a top view showing a physical position of the participant 132 relative to the multimedia communication device 100, and on the right a perspective view showing the participant 132 interacting with a rendered foreground image 616 of the participant 155 displayed by the multimedia communication device 100. - The
video conferencing system 102 is configured to determine (for example, at the multimedia communication device 160) a physical position of the participant 155 relative to the multimedia communication device 160 for selecting (for example, at the multimedia communication device 100) an RGB camera 110 of the multimedia communication device 100 as a foreground camera which will be used by the multimedia communication device 100 to capture images of the participant 132 and to which the portion of the rendered foreground image 616 depicting the eyes of the participant 155 will be aligned. Likewise, the video conferencing system 102 (for example, the multimedia communication device 100) is configured to determine (for example, at the multimedia communication device 100) a physical position of the participant 132 relative to the multimedia communication device 100 for selecting (for example, at the multimedia communication device 160) an RGB camera 180 of the multimedia communication device 160 as a foreground camera which will be used by the multimedia communication device 160 to capture images of the participant 155 and to which the portion of the rendered foreground image 606 depicting the eyes of the participant 132 will be aligned. In some implementations, the video conferencing system 102 is configured to select the RGB camera 180 having a lateral position most closely corresponding to a detected lateral physical position of the participant 132 relative to the multimedia display device 100. In such implementations, in some examples the video conferencing system 102 is configured to determine which of the RGB cameras 110 the participant 132 is most directly aligned with, and the video conferencing system 102 is configured to select the corresponding RGB camera 180 as the active camera (where the RGB cameras 110 and the RGB cameras 180 occupy corresponding positions on their respective multimedia communication devices). -
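The lateral-position-based camera selection described above reduces to a nearest-neighbor choice. The following sketch is illustrative only — the camera spacing and coordinate convention are assumptions, and it presumes corresponding cameras share an index on both devices:

```python
def select_foreground_camera(participant_x_m, camera_positions_m):
    """Return the index of the camera whose lateral position is closest to
    the participant's detected lateral position (e.g. from depth images)."""
    return min(range(len(camera_positions_m)),
               key=lambda i: abs(camera_positions_m[i] - participant_x_m))

# Hypothetical layout: four cameras (a-d) spaced across the display width,
# measured in metres from the display centre.
cameras = [-0.45, -0.15, 0.15, 0.45]

# A participant standing 0.10 m right of centre is most aligned with the
# third camera (index 2); the remote device then uses its own camera at
# the same index as the corresponding foreground camera.
local_index = select_foreground_camera(0.10, cameras)
```

Because both devices' camera arrays occupy corresponding positions, the selected index can be transmitted directly to the remote device to pick its matching camera.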
FIG. 6A, the video conferencing system 102 determines that the participant 155 is laterally aligned with the RGB camera 180 c. In response to this determination, the video conferencing system 102 selects the corresponding RGB camera 110 c as the foreground camera for the participant 132. As a result, an RGB image captured by the RGB camera 110 c will be used for generating the rendered foreground image 606, and the eyes of the participant 155 depicted in the rendered foreground image 616 are aligned with the position of the pixel display region 210 c for the RGB camera 110 c. - Similarly, the
video conferencing system 102 may determine that the participant 132 is laterally aligned with the RGB camera 110 c. In response to this determination, the video conferencing system 102 selects the corresponding RGB camera 180 c as the foreground camera for the participant 155. As a result, an RGB image captured by the RGB camera 180 c will be used for generating the rendered foreground image 616, the eyes of the participant 132 depicted in the rendered foreground image 606 are aligned with the position of the pixel display region 190 c for the RGB camera 180 c, and the gaze direction 602 of the participant 155 is directed at the RGB camera 180 c. - As the
participant 132 tends to gaze at the eyes of the participant 155 during a video conferencing session, a gaze direction 612 of the participant 132 is directed at the RGB camera 110 c behind the displayed eyes of the participant 155. Likewise, as the participant 155 tends to gaze at the eyes of the participant 132 during a video conferencing session, a gaze direction 602 of the participant 155 is directed at the RGB camera 180 c behind the displayed eyes of the participant 132. As a result, both of the multimedia communication devices 100 and 160 capture RGB images in which the gazes of the participants 132 and 155 are directed at the respective foreground cameras, allowing the participants 132 and 155 to establish and maintain eye contact via the multimedia communication devices 100 and 160 in a gaze-correct video conferencing session. -
FIG. 6B illustrates a second scenario 620 occurring at about a second time after the first time shown in FIG. 6A and during the video conferencing session shown in FIG. 6A, including a scene 620 a at the second geographic location 150 and a scene 620 b at the first geographic location 120. In FIG. 6B, the video conferencing system 102 (for example, the multimedia communication device 160) has determined that the participant 155 has moved to a new physical position, which is still within an FOV 184 c of the RGB camera 180 c. Based on the new physical position, the video conferencing system 102 determines that the participant 155 is at a lateral physical position relative to the multimedia communication device 160 that is more aligned with the RGB camera 180 b than the previous RGB camera 180 c. In response to this determination, the video conferencing system 102 selects the corresponding RGB camera 110 b as the foreground camera for the participant 132, changing from the RGB camera 110 c selected in FIG. 6A. - Due to the selection of the
RGB camera 110 b as the foreground camera for the participant 132 in response to the new physical position of the participant 155, images of the participant 155 are displayed in alignment with the RGB camera area 210 b for the RGB camera 110 b, as shown by the position of the rendered foreground image 636 in FIG. 6B. As a result, the gaze direction 632 of the participant 132 moves from the RGB camera area 210 c to the RGB camera area 210 b. An RGB image captured by the RGB camera 110 b will be used for generating the rendered foreground image 626 displayed to the participant 155 via the video conferencing session, and with the gaze direction 632 directed at the RGB camera 110 b, a gaze-correct video conferencing session is maintained. For the participant 155, the rendered foreground image 626 continues to be aligned with the RGB camera area 190 c as in FIG. 6A, as the participant 132 has not moved significantly and the video conferencing system 102 continues to determine that the subject 132 is most aligned with the RGB camera 110 c (as in FIG. 1). Due to the new physical position of the participant 155 in FIG. 6B, the participant 155 has turned slightly to continue a gaze direction 622 directed at the RGB camera 180 c, and a gaze-correct video conferencing session is maintained. Additionally, in response to the detected movement and change in physical position of the participant 155, the multimedia communication devices 100 and 160 convey a changed spatial arrangement between the participants 132 and 155, which may be further adjusted by continued movement of the participant 132 and/or 155, as further illustrated by FIGS. 6C and 6D below. -
FIG. 6C illustrates a third scenario 640 occurring at about a third time after the second time shown in FIG. 6B and during the video conferencing session shown in FIGS. 6A and 6B, including a scene 640 a at the second geographic location 150 and a scene 640 b at the first geographic location 120. In FIG. 6C, the video conferencing system 102 has determined that the participant 155 has moved to another new physical position, which is still within an FOV 184 c of the RGB camera 180 c. Based on the new physical position, the video conferencing system 102 determines that the participant 155 is at a lateral physical position relative to the multimedia communication device 160 that is more aligned with the RGB camera 180 a than the previous RGB camera 180 b. In response to this determination, the video conferencing system 102 selects the corresponding RGB camera 110 a as the foreground camera for the participant 132, changing from the RGB camera 110 b selected in FIG. 6B. - Due to the selection of the
RGB camera 110 a as the foreground camera for the participant 132 in response to the new physical position of the participant 155, images of the participant 155 are displayed in alignment with the RGB camera area 210 a for the RGB camera 110 a, as shown by the position of the rendered foreground image 656 in FIG. 6C. As a result, the gaze direction 652 of the participant 132 moves from the RGB camera area 210 b to the RGB camera area 210 a, and the participant 132 turns his body to facilitate the new gaze direction 652. An RGB image captured by the RGB camera 110 a will be used for generating the rendered foreground image 646 displayed to the participant 155 via the video conferencing session, and with the gaze direction 652 directed at the RGB camera 110 a, a gaze-correct video conferencing session is maintained. For the participant 155, the rendered foreground image 646 continues to be aligned with the RGB camera area 190 c as in FIG. 6B. Due to the new physical position of the participant 155 in FIG. 6C, the participant 155 has turned her head to continue a gaze direction 642 directed at the RGB camera 180 c, and a gaze-correct video conferencing session is maintained. Additionally, in response to the detected movement and change in physical position of the participant 155, the multimedia communication devices 100 and 160 convey a further changed spatial arrangement between the participants 132 and 155, as in FIG. 6B. -
FIG. 6D illustrates a fourth scenario 660 occurring at about a fourth time after the third time shown in FIG. 6C and during the video conferencing session shown in FIGS. 6A-6C, including a scene 660 a at the second geographic location 150 and a scene 660 b at the first geographic location 120. In FIG. 6D, the video conferencing system 102 (for example, the multimedia communication device 100) has determined that the participant 132 has moved to a new physical position, which is still within an FOV 304 a of the RGB camera 110 a. Based on the new physical position, the video conferencing system 102 determines that the participant 132 is at a lateral physical position relative to the multimedia communication device 100 that is more aligned with the RGB camera 110 b than the previous RGB camera 110 c. In response to this determination, the video conferencing system 102 selects the corresponding RGB camera 180 b as the foreground camera for the participant 155, changing from the RGB camera 180 c selected in FIG. 6A. - Due to the selection of the
RGB camera 180 b as the foreground camera for the participant 155 in response to the new physical position of the participant 132, images of the participant 132 are displayed in alignment with the RGB camera area 190 b for the RGB camera 180 b, as shown by the position of the rendered foreground image 666 in FIG. 6D. As a result, the gaze direction 662 of the participant 155 moves from the RGB camera area 190 c to the RGB camera area 190 b. An RGB image captured by the RGB camera 180 b will be used for generating the rendered foreground image 676 displayed to the participant 132 via the video conferencing session, and with the gaze direction 662 directed at the RGB camera 180 b, a gaze-correct video conferencing session is maintained. For the participant 132, the rendered foreground image 676 continues to be aligned with the RGB camera area 210 a as in FIG. 6C. With a gaze direction 672 continuing to be directed at the RGB camera 110 a, a gaze-correct video conferencing session is maintained. Additionally, in response to the detected movement and change in physical position of the participant 132, the multimedia communication devices 100 and 160 convey a further changed spatial arrangement between the participants 132 and 155, as in FIG. 6C. - Thus, as illustrated by the examples shown in
FIGS. 6A-6D, the video conferencing system 102, via the multimedia communication devices 100 and 160, allows participants to dynamically create, communicate, and control spatial arrangements relative to one another, including establishing and maintaining eye contact, and the video conferencing system 102 conveys when another participant chooses to look away. This interaction and information is conveyed in a natural manner that conforms to established social conventions for in-person face-to-face interactions. Further, when the techniques of FIGS. 5A-5D are combined with the techniques of FIGS. 6A-6D, spatial arrangements may be controlled and perceived in further detail, further enhancing interactions. -
FIGS. 7A-7C illustrate a technique used in some implementations, in which rendered foreground images make an animated transition from one RGB camera area to another when a new foreground camera is selected, in which over several successive video frames the rendered foreground images "glide" or otherwise approximate lateral human motion from the previous RGB camera area to the new RGB camera area. FIG. 7A illustrates a position of the rendered foreground image 646 in FIG. 6C at a point when the RGB camera 180 c has been selected as the foreground camera for the participant 155. Accordingly, the eyes of the participant 132 in the rendered foreground image 646 are aligned with the RGB camera area 190 c. FIG. 7B illustrates an animated transition to a new RGB camera area 190 b in response to the scenario 660 shown in FIG. 6D. Over several video frames, a first rendered foreground image 710 for the participant 132 is first displayed at an intermediate lateral position 720 between the RGB camera areas 190 c and 190 b, followed by a second rendered foreground image 712 for the participant 132 being displayed at an intermediate lateral position 722 between the intermediate lateral position 720 and the RGB camera area 190 b, which is followed by a third rendered foreground image 714 for the participant 132 being displayed at an intermediate lateral position 724 between the intermediate lateral position 722 and the RGB camera area 190 b. Although three intermediate lateral positions 720, 722, and 724 are shown in FIG. 7B, any number of intermediate positions may be selected. FIG. 7C illustrates the rendered foreground image 666 at its target position aligned with the RGB camera area 190 b, as shown in FIG. 6D. An advantage of performing the animated transition shown in FIGS. 7A-7C is that the gaze direction 662 of the participant 155 will track the animated position, resulting in a smoother transition in the gaze direction captured by the new foreground camera and displayed to the participant 132.
Additionally, such animated transitions in position are visually engaging for participants, further drawing participants' gazes to the rendered eye positions. In some implementations, more exaggerated motions may be implemented and selected to further enhance these effects. -
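The glide described in FIGS. 7A-7C can be realized by interpolating the rendered image's lateral position over successive frames. The sketch below is illustrative only — the linear easing and frame count are assumptions; the disclosure notes any number of intermediate positions may be used:

```python
def glide_positions(start_x, end_x, frames=4):
    """Lateral positions (e.g. in composite-image pixels) for an animated
    transition of a rendered foreground image from one RGB camera area to
    another. Linear easing; the last frame lands exactly on the target."""
    step = (end_x - start_x) / frames
    return [start_x + step * (i + 1) for i in range(frames)]

# Hypothetical pixel x-coordinates of RGB camera areas 190c and 190b.
path = glide_positions(960, 480, frames=4)
# path == [840.0, 720.0, 600.0, 480.0]: three intermediate positions
# followed by the target camera area, as in FIG. 7B.
```

An ease-in/ease-out curve in place of the linear step would better approximate the lateral human motion the disclosure mentions, at the cost of a slightly longer computation per frame.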
FIG. 8 illustrates techniques involving multiple participants 132 and 134 interacting with the multimedia communication device 100. FIG. 8 continues the video conferencing session shown in FIGS. 6A-6D, and illustrates a fifth scenario 800 including a scene 800 a at the second geographic location 150 and a scene 800 b at the first geographic location 120. In FIG. 8, the previously seated participant 134 is now standing and in close proximity to the multimedia communication device 100. As a result, the video conferencing system 102 (for example, the multimedia communication device 100) has identified the two participants 132 and 134 as foreground subjects. Additionally, the participant 132 is at a different physical position than in FIG. 6D. Based on their physical positions relative to the multimedia communication device 100, the video conferencing system 102 determines that the participant 132 is at a lateral physical position relative to the multimedia communication device 100 that is most aligned with the RGB camera 110 d and that the participant 134 is at a lateral physical position relative to the multimedia communication device 100 that is most aligned with the RGB camera 110 b. - In response to these determinations, for the
participant 132, the video conferencing system 102 (for example, the multimedia communication device 160) selects the RGB camera area 190 d for the RGB camera 180 d corresponding to the RGB camera 110 d for alignment of the rendered foreground image 812. For the participant 134, the video conferencing system 102 (for example, the multimedia communication device 160) selects the RGB camera area 190 b for the RGB camera 180 b corresponding to the RGB camera 110 b for alignment of the rendered foreground image 814. As a result, the eyes of each of the participants 132 and 134 are displayed by the multimedia communication device 160 in front of the respective RGB cameras 180 d and 180 b, enabling the multimedia communication device 160 to capture gaze-aligned RGB images of the participant 155 when the participant 155 looks at either of the participants 132 and 134. - When multiple participants are displayed in alignment with
different RGB cameras 180, the video conferencing system 102 (for example, the multimedia communication device 160) is configured to dynamically select a foreground camera from one of the RGB cameras 180 associated with a displayed participant. In some implementations, the video conferencing system 102 is configured to determine a gaze direction for the participant 155 and select the RGB camera 180 most directly aligned with the gaze direction of the participant 155. In the example shown in FIG. 8, the participant 155 is currently looking at the participant 132 along the gaze direction 902 a, and as a result, the current foreground camera for the participant 155 is the RGB camera 180 d. In response to the participant 155 shifting to the gaze direction 902 b to look at the participant 134, the video conferencing system 102 may select the RGB camera 180 b as the foreground camera. - In
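The gaze-based selection above can be reduced to choosing the candidate camera whose angular position best matches the estimated gaze. This sketch assumes gaze is available as a lateral angle (for example, from an eye tracker); the camera labels and angles are hypothetical:

```python
def select_camera_by_gaze(gaze_angle_deg, camera_angles_deg):
    """Pick the camera most directly aligned with the viewer's gaze.
    camera_angles_deg maps each candidate camera (one per displayed
    participant) to its angular offset from the viewer's position."""
    return min(camera_angles_deg,
               key=lambda cam: abs(camera_angles_deg[cam] - gaze_angle_deg))

# Hypothetical angles: participant 132 rendered over camera "180d" at +20
# degrees, participant 134 rendered over camera "180b" at -15 degrees.
cameras = {"180d": 20.0, "180b": -15.0}

assert select_camera_by_gaze(18.0, cameras) == "180d"   # looking at 132
assert select_camera_by_gaze(-10.0, cameras) == "180b"  # looking at 134
```

Restricting the candidates to cameras that have a displayed participant aligned over them keeps the selection meaningful: a gaze into empty display space simply resolves to the nearest displayed participant's camera.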
FIG. 8, the participant 155 is also at a different physical position than shown in FIG. 6D. Based on the new physical position, the video conferencing system 102 determines that the participant 155 is at a lateral physical position relative to the multimedia communication device 160 that is most aligned with the RGB camera 180 c. As in the scenario 600 shown in FIG. 6A, in response to this determination, the video conferencing system 102 selects the corresponding RGB camera 110 c as the foreground camera for the participant 132. Additionally, as only one participant 155 is displayed on the multimedia communication device 100, the video conferencing system 102 also selects the corresponding RGB camera 110 c as the foreground camera for the participant 134. As both of the participants 132 and 134 tend to direct their gazes at the displayed eyes of the participant 155, as illustrated by their gaze directions in FIG. 8, the single RGB camera 110 c is effective for capturing gaze-aligned RGB images for both of the participants 132 and 134, which are used to generate the rendered foreground images 812 and 814 displayed by the multimedia communication device 160, maintaining a gaze-correct video conferencing session for all three participants. -
FIG. 9 illustrates an example of gaze-correct multi-party video conferencing among five participants each at a different geographic location. In some examples, similar techniques and advantages may be realized with three or more participants each at different locations. FIG. 9 illustrates a scenario 900 including five scenes at five different geographic locations, each with a multimedia communication device and a respective participant. The multimedia communication devices may each be configured much as described for the multimedia communication devices in FIGS. 1-8. For convenience of discussion, the multimedia communication devices are all part of the video conferencing system 102. The discussion will focus on the multimedia communication device 930, as it is generally representative of the behavior of the other multimedia communication devices. - In response to the large number of participants at different geographic locations, the video conferencing system 102 (for example, the multimedia communication device 930) determines for the
multimedia communication device 930 which RGB camera is aligned with each of the rendered foreground images of the other participants in the composite image 940. To accommodate the larger number of participants in the composite image 940, each of the rendered foreground images has a narrower width than in the previous examples. However, as in previous examples, the eyes of all of the displayed participants are positioned in front of respective RGB cameras, which, much as discussed in FIG. 8, enables the multimedia communication device 930 to capture gaze-aligned RGB images of the participant 920 when the participant 920 looks at any of the other participants. - At the time shown in
FIG. 9, the participant 924 is currently speaking, and accordingly may be referred to as the "active speaker" in the video conferencing session. In some implementations or circumstances, the video conferencing system 102 (for example, the multimedia communication device 930) may automatically select the RGB camera associated with the active speaker as the foreground camera, although gaze detection may be used in some implementations, as discussed in FIG. 8. In this example, the participant 924 is engaged in a discussion with the participant 920, and as a result the gaze direction of the participant 924 is directed at the RGB camera corresponding to the participant 920. In some examples, the video conferencing system 102 may be configured to provide a visual indication of the active speaker, to assist participant identification of and focus on the active speaker. In some examples, as shown by the multimedia communication device 932, a graphical element 950, such as, but not limited to, an icon or outline may be included in a composite image to highlight the active speaker. In some examples, as shown by the multimedia communication device 938, the active speaker may be scaled differently than the other participants and shown at a larger size while still aligning the displayed eyes of the participants with respective RGB cameras. - As a result of the techniques described for
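The two selection policies just described — active-speaker priority with gaze detection as an override — might be combined as in the following sketch; the policy ordering and camera labels are illustrative assumptions, not taken from the disclosure:

```python
def choose_foreground_camera(active_speaker_camera, gaze_camera=None):
    """Select the local foreground camera: prefer the camera aligned with
    the rendered participant the viewer is currently gazing at, and fall
    back to the camera aligned with the active speaker's rendered image
    when no gaze estimate is available."""
    return gaze_camera if gaze_camera is not None else active_speaker_camera

# Speaker-driven default: camera behind the active speaker's image.
assert choose_foreground_camera("cam_speaker") == "cam_speaker"
# Gaze override: the viewer looks at a different displayed participant.
assert choose_foreground_camera("cam_speaker", "cam_other") == "cam_other"
```

In a five-party session like FIG. 9, this policy keeps the captured gaze aligned with whichever rendered participant the viewer is actually attending to, while degrading gracefully to the active speaker when gaze tracking is unavailable.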
FIG. 9, the multimedia communication devices enable a gaze-correct video conferencing session to be maintained among all five participants, despite each participant being at a different geographic location. -
FIG. 10 illustrates an example in which two multimedia communication devices are combined to provide a larger multimedia communication device or system 1010. Each of the multimedia communication devices or systems may be configured much as described for the multimedia communication devices in FIGS. 1-9. The second multimedia communication device 1040 may be dynamically combined, including during an ongoing video conferencing session, with the first multimedia communication device 1020 to provide the larger multimedia communication device 1010. The two multimedia communication devices 1020 and 1040 then operate as the system 1010, which is configured to make use of the RGB cameras, depth cameras, and display devices of both devices. -
FIGS. 1-10 are presented herein for illustration of the disclosure and its benefits. Such examples of use should not be construed to be limitations on the logical process implementations of the disclosure, nor should variations of user interface methods from those described herein be considered outside the scope of the present disclosure. In some implementations, various features described inFIGS. 1-10 are implemented in respective modules, which may also be referred to as, and/or include, logic, components, units, and/or mechanisms. Modules may constitute either software modules (for example, code embodied on a machine-readable medium) or hardware modules. - In some examples, a hardware module may be implemented mechanically, electronically, or with any suitable combination thereof. For example, a hardware module may include dedicated circuitry or logic that is configured to perform certain operations. For example, a hardware module may include a special-purpose processor, such as a field-programmable gate array (FPGA) or an Application Specific Integrated Circuit (ASIC). A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations, and may include a portion of machine-readable medium data and/or instructions for such configuration. For example, a hardware module may include software encompassed within a programmable processor configured to execute a set of software instructions. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (for example, configured by software) may be driven by cost, time, support, and engineering considerations.
- Accordingly, the phrase “hardware module” should be understood to encompass a tangible entity capable of performing certain operations and may be configured or arranged in a certain physical manner, be that an entity that is physically constructed, permanently configured (for example, hardwired), and/or temporarily configured (for example, programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering examples in which hardware modules are temporarily configured (for example, programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where a hardware module includes a programmable processor configured by software to become a special-purpose processor, the programmable processor may be configured as respectively different special-purpose processors (for example, including different hardware modules) at different times. Software may accordingly configure a particular processor or processors, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time. A hardware module implemented using one or more processors may be referred to as being “processor implemented” or “computer implemented.”
- Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (for example, over appropriate circuits and buses) between or among two or more of the hardware modules. In implementations in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory devices to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output in a memory device, and another hardware module may then access the memory device to retrieve and process the stored output.
- In some examples, at least some of the operations of a method may be performed by one or more processors or processor-implemented modules. Moreover, the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by, and/or among, multiple computers (as examples of machines including processors), with these operations being accessible via a network (for example, the Internet) and/or via one or more software interfaces (for example, an application program interface (API)). The performance of certain of the operations may be distributed among the processors, not only residing within a single machine, but deployed across a number of machines. Processors or processor-implemented modules may be located in a single geographic location (for example, within a home or office environment, or a server farm), or may be distributed across multiple geographic locations.
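The distribution of a method's operations among multiple processors can be sketched with Python's standard `concurrent.futures` pool. The squaring operation is a placeholder assumption, standing in for whatever operation the method distributes:

```python
# Sketch: a method's operations distributed across worker threads, in the
# spirit of the processor-implemented modules described above.
from concurrent.futures import ThreadPoolExecutor

def operation(n):
    # Placeholder for one operation of the method.
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(operation, range(5)))
# results == [0, 1, 4, 9, 16]
```

A process pool or a networked service behind an API could substitute for the thread pool without changing the calling pattern.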
-
FIG. 11 is a block diagram 1100 illustrating an example software architecture 1102, various portions of which may be used in conjunction with various hardware architectures herein described, which may implement any of the above-described features. FIG. 11 is a non-limiting example of a software architecture and it will be appreciated that many other architectures may be implemented to facilitate the functionality described herein. The software architecture 1102 may execute on hardware such as a device 120 of FIG. 1A that includes, among other things, document storage 1170, processors, memory, and input/output (I/O) components. A representative hardware layer 1104 is illustrated and can represent, for example, the device 120 of FIG. 1A. The representative hardware layer 1104 includes a processing unit 1106 and associated executable instructions 1108. The executable instructions 1108 represent executable instructions of the software architecture 1102, including implementation of the methods, modules and so forth described herein. The hardware layer 1104 also includes a memory/storage 1110, which also includes the executable instructions 1108 and accompanying data. The hardware layer 1104 may also include other hardware modules 1112. Instructions 1108 held by processing unit 1106 may be portions of instructions 1108 held by the memory/storage 1110. - The
example software architecture 1102 may be conceptualized as layers, each providing various functionality. For example, the software architecture 1102 may include layers and components such as an operating system (OS) 1114, libraries 1116, frameworks 1118, applications 1120, and a presentation layer 1124. Operationally, the applications 1120 and/or other components within the layers may invoke API calls 1124 to other layers and receive corresponding results 1126. The layers illustrated are representative in nature and other software architectures may include additional or different layers. For example, some mobile or special purpose operating systems may not provide the frameworks/middleware 1118. - The
OS 1114 may manage hardware resources and provide common services. The OS 1114 may include, for example, a kernel 1128, services 1130, and drivers 1132. The kernel 1128 may act as an abstraction layer between the hardware layer 1104 and other software layers. For example, the kernel 1128 may be responsible for memory management, processor management (for example, scheduling), component management, networking, security settings, and so on. The services 1130 may provide other common services for the other software layers. The drivers 1132 may be responsible for controlling or interfacing with the underlying hardware layer 1104. For instance, the drivers 1132 may include display drivers, camera drivers, memory/storage drivers, peripheral device drivers (for example, via Universal Serial Bus (USB)), network and/or wireless communication drivers, audio drivers, and so forth depending on the hardware and/or software configuration. - The
libraries 1116 may provide a common infrastructure that may be used by the applications 1120 and/or other components and/or layers. The libraries 1116 typically provide functionality for use by other software modules to perform tasks, rather than interacting directly with the OS 1114. The libraries 1116 may include system libraries 1134 (for example, C standard library) that may provide functions such as memory allocation, string manipulation, and file operations. In addition, the libraries 1116 may include API libraries 1136 such as media libraries (for example, supporting presentation and manipulation of image, sound, and/or video data formats), graphics libraries (for example, an OpenGL library for rendering 2D and 3D graphics on a display), database libraries (for example, SQLite or other relational database functions), and web libraries (for example, WebKit that may provide web browsing functionality). The libraries 1116 may also include a wide variety of other libraries 1138 to provide many functions for applications 1120 and other software modules. - The frameworks 1118 (also sometimes referred to as middleware) provide a higher-level common infrastructure that may be used by the
applications 1120 and/or other software modules. For example, the frameworks 1118 may provide various graphic user interface (GUI) functions, high-level resource management, or high-level location services. The frameworks 1118 may provide a broad spectrum of other APIs for applications 1120 and/or other software modules. - The
applications 1120 include built-in applications 1120 and/or third-party applications 1122. Examples of built-in applications 1120 may include, but are not limited to, a contacts application, a browser application, a location application, a media application, a messaging application, and/or a game application. Third-party applications 1122 may include any applications developed by an entity other than the vendor of the particular platform. The applications 1120 may use functions available via OS 1114, libraries 1116, frameworks 1118, and presentation layer 1124 to create user interfaces to interact with users. - Some software architectures use virtual machines, as illustrated by a
virtual machine 1128. The virtual machine 1128 provides an execution environment where applications/modules can execute as if they were executing on a hardware machine (such as the machine 1000 of FIG. 10, for example). The virtual machine 1128 may be hosted by a host OS (for example, OS 1114) or hypervisor, and may have a virtual machine monitor 1126 which manages operation of the virtual machine 1128 and interoperation with the host operating system. A software architecture, which may be different from software architecture 1102 outside of the virtual machine, executes within the virtual machine 1128, such as an OS 1150, libraries 1152, frameworks 1154, applications 1156, and/or a presentation layer 1158. -
FIG. 12 is a block diagram illustrating components of an example machine 1200 configured to read instructions from a machine-readable medium (for example, a machine-readable storage medium) and perform any of the features described herein. The example machine 1200 is in a form of a computer system, within which instructions 1216 (for example, in the form of software components) for causing the machine 1200 to perform any of the features described herein may be executed. As such, the instructions 1216 may be used to implement modules or components described herein. The instructions 1216 cause an unprogrammed and/or unconfigured machine 1200 to operate as a particular machine configured to carry out the described features. The machine 1200 may be configured to operate as a standalone device or may be coupled (for example, networked) to other machines. In a networked deployment, the machine 1200 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a node in a peer-to-peer or distributed network environment. Machine 1200 may be embodied as, for example, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a gaming and/or entertainment system, a smart phone, a mobile device, a wearable device (for example, a smart watch), and an Internet of Things (IoT) device. Further, although only a single machine 1200 is illustrated, the term “machine” includes a collection of machines that individually or jointly execute the instructions 1216. - The
machine 1200 may include processors 1210, memory 1230, and I/O components 1250, which may be communicatively coupled via, for example, a bus 1202. The bus 1202 may include multiple buses coupling various elements of machine 1200 via various bus technologies and protocols. In an example, the processors 1210 (including, for example, a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an ASIC, or a suitable combination thereof) may include one or more processors 1212a to 1212n that may execute the instructions 1216 and process data. In some examples, one or more processors 1210 may execute instructions provided or identified by one or more other processors 1210. The term “processor” includes a multi-core processor including cores that may execute instructions contemporaneously. Although FIG. 12 shows multiple processors, the machine 1200 may include a single processor with a single core, a single processor with multiple cores (for example, a multi-core processor), multiple processors each with a single core, multiple processors each with multiple cores, or any combination thereof. In some examples, the machine 1200 may include multiple processors distributed among multiple machines. - The memory/
storage 1230 may include a main memory 1232, a static memory 1234, or other memory, and a storage unit 1236, each accessible to the processors 1210 such as via the bus 1202. The storage unit 1236 and memory store instructions 1216 embodying any one or more of the functions described herein. The memory/storage 1230 may also store temporary, intermediate, and/or long-term data for processors 1210. The instructions 1216 may also reside, completely or partially, within the memory/storage unit 1236, within at least one of the processors 1210 (for example, within a command buffer or cache memory), within memory of at least one of I/O components 1250, or any suitable combination thereof, during execution thereof. Accordingly, the memory/storage unit 1236, memory in processors 1210, and memory in I/O components 1250 are examples of machine-readable media. - As used herein, “machine-readable medium” refers to a device able to temporarily or permanently store instructions and data that cause
machine 1200 to operate in a specific fashion. The term “machine-readable medium,” as used herein, does not encompass transitory electrical or electromagnetic signals per se (such as on a carrier wave propagating through a medium); the term “machine-readable medium” may therefore be considered tangible and non-transitory. Non-limiting examples of a non-transitory, tangible machine-readable medium may include, but are not limited to, nonvolatile memory (such as flash memory or read-only memory (ROM)), volatile memory (such as a static random-access memory (RAM) or a dynamic RAM), buffer memory, cache memory, optical storage media, magnetic storage media and devices, network-accessible or cloud storage, other types of storage, and/or any suitable combination thereof. The term “machine-readable medium” applies to a single medium, or combination of multiple media, used to store instructions (for example, instructions 1216) for execution by a machine 1200 such that the instructions, when executed by one or more processors 1210 of the machine 1200, cause the machine 1200 to perform one or more of the features described herein. Accordingly, a “machine-readable medium” may refer to a single storage device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices. - The I/
O components 1250 may include a wide variety of hardware components adapted to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 1250 included in a particular machine will depend on the type and/or function of the machine. For example, mobile devices such as mobile phones may include a touch input device, whereas a headless server or IoT device may not include such a touch input device. The particular examples of I/O components illustrated in FIG. 12 are in no way limiting, and other types of components may be included in machine 1200. The grouping of I/O components 1250 is merely for simplifying this discussion, and the grouping is in no way limiting. In various examples, the I/O components 1250 may include user output components 1252 and user input components 1254. User output components 1252 may include, for example, display components for displaying information (for example, a liquid crystal display (LCD) or a projector), acoustic components (for example, speakers), haptic components (for example, a vibratory motor or force-feedback device), and/or other signal generators. User input components 1254 may include, for example, alphanumeric input components (for example, a keyboard or a touch screen), pointing components (for example, a mouse device, a touchpad, or another pointing instrument), and/or tactile input components (for example, a physical button or a touch screen that provides location and/or force of touches or touch gestures) configured for receiving various user inputs, such as user commands and/or selections. - In some examples, the I/
O components 1250 may include biometric components 1256 and/or position components 1262, among a wide array of other environmental sensor components. The biometric components 1256 may include, for example, components to detect body expressions (for example, facial expressions, vocal expressions, hand or body gestures, or eye tracking), measure biosignals (for example, heart rate or brain waves), and identify a person (for example, via voice-, retina-, and/or facial-based identification). The position components 1262 may include, for example, location sensors (for example, a Global Positioning System (GPS) receiver), altitude sensors (for example, an air pressure sensor from which altitude may be derived), and/or orientation sensors (for example, magnetometers). - The I/
O components 1250 may include communication components 1264, implementing a wide variety of technologies operable to couple the machine 1200 to network(s) 1270 and/or device(s) 1280 via respective communicative couplings. The communication components 1264 may include one or more network interface components or other suitable devices to interface with the network(s) 1270. The communication components 1264 may include, for example, components adapted to provide wired communication, wireless communication, cellular communication, Near Field Communication (NFC), Bluetooth communication, Wi-Fi, and/or communication via other modalities. The device(s) 1280 may include other machines or various peripheral devices (for example, coupled via USB). - In some examples, the
communication components 1264 may detect identifiers or include components adapted to detect identifiers. For example, the communication components 1264 may include Radio Frequency Identification (RFID) tag readers, NFC detectors, optical sensors (for example, to detect one- or multi-dimensional bar codes or other optical codes), and/or acoustic detectors (for example, microphones to identify tagged audio signals). In some examples, location information may be determined based on information from the communication components 1264, such as, but not limited to, geo-location via Internet Protocol (IP) address, location via Wi-Fi, cellular, NFC, Bluetooth, or other wireless station identification and/or signal triangulation. - While various embodiments have been described, the description is intended to be exemplary, rather than limiting, and it is understood that many more embodiments and implementations are possible that are within the scope of the embodiments. Although many possible combinations of features are shown in the accompanying figures and discussed in this detailed description, many other combinations of the disclosed features are possible. Any feature of any embodiment may be used in combination with or substituted for any other feature or element in any other embodiment unless specifically restricted. Therefore, it will be understood that any of the features shown and/or discussed in the present disclosure may be implemented together in any suitable combination. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents. Also, various modifications and changes may be made within the scope of the attached claims.
- While the foregoing has described what are considered to be the best mode and/or other examples, it is understood that various modifications may be made therein and that the subject matter disclosed herein may be implemented in various forms and examples, and that the teachings may be applied in numerous applications, only some of which have been described herein. It is intended by the following claims to claim any and all applications, modifications and variations that fall within the true scope of the present teachings.
- Unless otherwise stated, all measurements, values, ratings, positions, magnitudes, sizes, and other specifications that are set forth in this specification, including in the claims that follow, are approximate, not exact. They are intended to have a reasonable range that is consistent with the functions to which they relate and with what is customary in the art to which they pertain.
- The scope of protection is limited solely by the claims that now follow. That scope is intended and should be interpreted to be as broad as is consistent with the ordinary meaning of the language that is used in the claims when interpreted in light of this specification and the prosecution history that follows and to encompass all structural and functional equivalents. Notwithstanding, none of the claims are intended to embrace subject matter that fails to satisfy the requirement of
Sections 101, 102, or 103 of the Patent Act, nor should they be interpreted in such a way. Any unintended embracement of such subject matter is hereby disclaimed. - Except as stated immediately above, nothing that has been stated or illustrated is intended or should be interpreted to cause a dedication of any component, step, feature, object, benefit, advantage, or equivalent to the public, regardless of whether it is or is not recited in the claims.
- It will be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein. Relational terms such as first and second and the like may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “a” or “an” does not, without further constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
- The Abstract of the Disclosure is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in various examples for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claims require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed example. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.
Claims (22)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/056,446 US10554921B1 (en) | 2018-08-06 | 2018-08-06 | Gaze-correct video conferencing systems and methods |
EP19744982.0A EP3834408A1 (en) | 2018-08-06 | 2019-06-28 | Gaze-correct video conferencing systems and methods |
PCT/US2019/040009 WO2020033077A1 (en) | 2018-08-06 | 2019-06-28 | Gaze-correct video conferencing systems and methods |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/056,446 US10554921B1 (en) | 2018-08-06 | 2018-08-06 | Gaze-correct video conferencing systems and methods |
Publications (2)
Publication Number | Publication Date |
---|---|
US10554921B1 US10554921B1 (en) | 2020-02-04 |
US20200045261A1 true US20200045261A1 (en) | 2020-02-06 |
Family
ID=67439393
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/056,446 Active US10554921B1 (en) | 2018-08-06 | 2018-08-06 | Gaze-correct video conferencing systems and methods |
Country Status (3)
Country | Link |
---|---|
US (1) | US10554921B1 (en) |
EP (1) | EP3834408A1 (en) |
WO (1) | WO2020033077A1 (en) |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10803321B1 (en) * | 2019-07-30 | 2020-10-13 | Sling Media Pvt Ltd | Visual-based automatic video feed selection for a digital video production system |
US10928904B1 (en) * | 2019-12-31 | 2021-02-23 | Logitech Europe S.A. | User recognition and gaze tracking in a video system |
CN113068003A (en) * | 2021-01-29 | 2021-07-02 | 深兰科技(上海)有限公司 | Data display method and device, intelligent glasses, electronic equipment and storage medium |
US11163995B2 (en) | 2019-12-31 | 2021-11-02 | Logitech Europe S.A. | User recognition and gaze tracking in a video system |
US11410331B2 (en) * | 2019-10-03 | 2022-08-09 | Facebook Technologies, Llc | Systems and methods for video communication using a virtual camera |
WO2022221064A1 (en) * | 2021-04-16 | 2022-10-20 | Microsoft Technology Licensing, Llc | Gaze based video stream processing |
WO2022221532A1 (en) * | 2021-04-15 | 2022-10-20 | View, Inc. | Immersive collaboration of remote participants via media displays |
US20220353438A1 (en) * | 2021-04-16 | 2022-11-03 | Zoom Video Communication, Inc. | Systems and methods for immersive scenes |
US11579571B2 (en) | 2014-03-05 | 2023-02-14 | View, Inc. | Monitoring sites containing switchable optical devices and controllers |
WO2023048631A1 (en) * | 2021-09-24 | 2023-03-30 | Flatfrog Laboratories Ab | A videoconferencing method and system with focus detection of the presenter |
WO2023048632A1 (en) * | 2021-09-24 | 2023-03-30 | Flatfrog Laboratories Ab | A videoconferencing method and system with automatic muting |
WO2023048630A1 (en) * | 2021-09-24 | 2023-03-30 | Flatfrog Laboratories Ab | A videoconferencing system and method with maintained eye contact |
WO2023048731A1 (en) * | 2021-09-27 | 2023-03-30 | Hewlett-Packard Development Company, L.P. | Active image sensors |
US11631493B2 (en) | 2020-05-27 | 2023-04-18 | View Operating Corporation | Systems and methods for managing building wellness |
US20230177879A1 (en) * | 2021-12-06 | 2023-06-08 | Hewlett-Packard Development Company, L.P. | Videoconference iris position adjustments |
US11681197B2 (en) | 2011-03-16 | 2023-06-20 | View, Inc. | Onboard controller for multistate windows |
US11687045B2 (en) | 2012-04-13 | 2023-06-27 | View, Inc. | Monitoring sites containing switchable optical devices and controllers |
US11740948B2 (en) | 2014-12-08 | 2023-08-29 | View, Inc. | Multiple interacting systems at a site |
US11743071B2 (en) | 2018-05-02 | 2023-08-29 | View, Inc. | Sensing and communications unit for optically switchable window systems |
US11750594B2 (en) | 2020-03-26 | 2023-09-05 | View, Inc. | Access and messaging in a multi client network |
US11747696B2 (en) | 2017-04-26 | 2023-09-05 | View, Inc. | Tandem vision window and media display |
US11747698B2 (en) | 2017-04-26 | 2023-09-05 | View, Inc. | Tandem vision window and media display |
US11754902B2 (en) | 2009-12-22 | 2023-09-12 | View, Inc. | Self-contained EC IGU |
US20230319221A1 (en) * | 2022-03-29 | 2023-10-05 | Rovi Guides, Inc. | Systems and methods for enabling user-controlled extended reality |
US11868103B2 (en) | 2014-03-05 | 2024-01-09 | View, Inc. | Site monitoring system |
US11886089B2 (en) | 2017-04-26 | 2024-01-30 | View, Inc. | Displays for tintable windows |
US11892738B2 (en) | 2017-04-26 | 2024-02-06 | View, Inc. | Tandem vision window and media display |
US11892737B2 (en) | 2014-06-30 | 2024-02-06 | View, Inc. | Control methods and systems for networks of optically switchable windows during reduced power availability |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11113887B2 (en) * | 2018-01-08 | 2021-09-07 | Verizon Patent And Licensing Inc | Generating three-dimensional content from two-dimensional images |
AT521845B1 (en) * | 2018-09-26 | 2021-05-15 | Waits Martin | Method for adjusting the focus of a film camera |
CN112887654B (en) * | 2021-01-25 | 2022-05-31 | 联想(北京)有限公司 | Conference equipment, conference system and data processing method |
CN113225517A (en) * | 2021-04-14 | 2021-08-06 | 海信集团控股股份有限公司 | Video picture determining method and communication equipment during multi-party video call |
US11561686B2 (en) | 2021-05-11 | 2023-01-24 | Microsoft Technology Licensing, Llc | Intelligent content display for network-based communications |
US20230045610A1 (en) * | 2021-08-03 | 2023-02-09 | Dell Products, L.P. | Eye contact correction in a communication or collaboration session using a platform framework |
US11720237B2 (en) | 2021-08-05 | 2023-08-08 | Motorola Mobility Llc | Input session between devices based on an input trigger |
US11583760B1 (en) | 2021-08-09 | 2023-02-21 | Motorola Mobility Llc | Controller mode for a mobile device |
US11694419B2 (en) * | 2021-09-06 | 2023-07-04 | Kickback Space Inc. | Image analysis and gaze redirection using characteristics of the eye |
US11641440B2 (en) * | 2021-09-13 | 2023-05-02 | Motorola Mobility Llc | Video content based on multiple capture devices |
JP7225440B1 (en) * | 2022-01-05 | 2023-02-20 | キヤノン株式会社 | IMAGING DEVICE, IMAGING SYSTEM, IMAGING DEVICE CONTROL METHOD AND PROGRAM |
US11914858B1 (en) * | 2022-12-09 | 2024-02-27 | Helen Hyun-Min Song | Window replacement display device and control method thereof |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8199185B2 (en) * | 1995-09-20 | 2012-06-12 | Videotronic Systems | Reflected camera image eye contact terminal |
US5953052A (en) * | 1995-09-20 | 1999-09-14 | Videotronic Systems | Reflected display teleconferencing eye contact terminal |
US5963250A (en) * | 1995-10-20 | 1999-10-05 | Parkervision, Inc. | System and method for controlling the field of view of a camera |
US8106924B2 (en) | 2008-07-31 | 2012-01-31 | Stmicroelectronics S.R.L. | Method and system for video rendering, computer program product therefor |
US8384760B2 (en) | 2009-10-29 | 2013-02-26 | Hewlett-Packard Development Company, L.P. | Systems for establishing eye contact through a display |
WO2012008972A1 (en) | 2010-07-16 | 2012-01-19 | Hewlett-Packard Development Company, L.P. | Methods and systems for establishing eye contact and accurate gaze in remote collaboration |
US8896655B2 (en) | 2010-08-31 | 2014-11-25 | Cisco Technology, Inc. | System and method for providing depth adaptive video conferencing |
US8432432B2 (en) | 2010-12-03 | 2013-04-30 | Microsoft Corporation | Eye gaze reduction |
US9160966B2 (en) | 2011-05-11 | 2015-10-13 | Microsoft Technology Licensing, Llc | Imaging through a display screen |
US9684953B2 (en) * | 2012-02-27 | 2017-06-20 | Eth Zurich | Method and system for image processing in video conferencing |
US9088693B2 (en) | 2012-09-28 | 2015-07-21 | Polycom, Inc. | Providing direct eye contact videoconferencing |
US9424467B2 (en) * | 2013-03-14 | 2016-08-23 | Disney Enterprises, Inc. | Gaze tracking and recognition with image location |
US9232183B2 (en) | 2013-04-19 | 2016-01-05 | At&T Intellectual Property I, Lp | System and method for providing separate communication zones in a large format videoconference |
US9898856B2 (en) | 2013-09-27 | 2018-02-20 | Fotonation Cayman Limited | Systems and methods for depth-assisted perspective distortion correction |
US10075656B2 (en) * | 2013-10-30 | 2018-09-11 | At&T Intellectual Property I, L.P. | Methods, systems, and products for telepresence visualizations |
US10165176B2 (en) | 2013-10-31 | 2018-12-25 | The University Of North Carolina At Chapel Hill | Methods, systems, and computer readable media for leveraging user gaze in user monitoring subregion selection systems |
EP3100135A4 (en) | 2014-01-31 | 2017-08-30 | Hewlett-Packard Development Company, L.P. | Camera included in display |
GB2524473A (en) | 2014-02-28 | 2015-09-30 | Microsoft Technology Licensing Llc | Controlling a computing-based device using gestures |
US9485414B2 (en) * | 2014-06-20 | 2016-11-01 | John Visosky | Eye contact enabling device for video conferencing |
GB201507224D0 (en) | 2015-04-28 | 2015-06-10 | Microsoft Technology Licensing Llc | Eye gaze correction |
GB201507210D0 (en) * | 2015-04-28 | 2015-06-10 | Microsoft Technology Licensing Llc | Eye gaze correction |
US10423830B2 (en) * | 2016-04-22 | 2019-09-24 | Intel Corporation | Eye contact correction in real time using neural network based machine learning |
- 2018-08-06: US application US16/056,446 filed; patent US10554921B1 (status: active)
- 2019-06-28: EP application EP19744982.0A; publication EP3834408A1 (status: not active, withdrawn)
- 2019-06-28: WO application PCT/US2019/040009; publication WO2020033077A1 (status: unknown)
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11822159B2 (en) | 2009-12-22 | 2023-11-21 | View, Inc. | Self-contained EC IGU |
US11754902B2 (en) | 2009-12-22 | 2023-09-12 | View, Inc. | Self-contained EC IGU |
US11681197B2 (en) | 2011-03-16 | 2023-06-20 | View, Inc. | Onboard controller for multistate windows |
US11687045B2 (en) | 2012-04-13 | 2023-06-27 | View, Inc. | Monitoring sites containing switchable optical devices and controllers |
US11868103B2 (en) | 2014-03-05 | 2024-01-09 | View, Inc. | Site monitoring system |
US11733660B2 (en) | 2014-03-05 | 2023-08-22 | View, Inc. | Monitoring sites containing switchable optical devices and controllers |
US11579571B2 (en) | 2014-03-05 | 2023-02-14 | View, Inc. | Monitoring sites containing switchable optical devices and controllers |
US11892737B2 (en) | 2014-06-30 | 2024-02-06 | View, Inc. | Control methods and systems for networks of optically switchable windows during reduced power availability |
US11740948B2 (en) | 2014-12-08 | 2023-08-29 | View, Inc. | Multiple interacting systems at a site |
US11948015B2 (en) | 2014-12-08 | 2024-04-02 | View, Inc. | Multiple interacting systems at a site |
US11892738B2 (en) | 2017-04-26 | 2024-02-06 | View, Inc. | Tandem vision window and media display |
US11868019B2 (en) | 2017-04-26 | 2024-01-09 | View, Inc. | Tandem vision window and media display |
US11886089B2 (en) | 2017-04-26 | 2024-01-30 | View, Inc. | Displays for tintable windows |
US11747698B2 (en) | 2017-04-26 | 2023-09-05 | View, Inc. | Tandem vision window and media display |
US11747696B2 (en) | 2017-04-26 | 2023-09-05 | View, Inc. | Tandem vision window and media display |
US11743071B2 (en) | 2018-05-02 | 2023-08-29 | View, Inc. | Sensing and communications unit for optically switchable window systems |
US10803321B1 (en) * | 2019-07-30 | 2020-10-13 | Sling Media Pvt Ltd | Visual-based automatic video feed selection for a digital video production system |
US10965963B2 (en) | 2019-07-30 | 2021-03-30 | Sling Media Pvt Ltd | Audio-based automatic video feed selection for a digital video production system |
US11410331B2 (en) * | 2019-10-03 | 2022-08-09 | Facebook Technologies, Llc | Systems and methods for video communication using a virtual camera |
US11693475B2 (en) | 2019-12-31 | 2023-07-04 | Logitech Europe S.A. | User recognition and gaze tracking in a video system |
US11163995B2 (en) | 2019-12-31 | 2021-11-02 | Logitech Europe S.A. | User recognition and gaze tracking in a video system |
US10928904B1 (en) * | 2019-12-31 | 2021-02-23 | Logitech Europe S.A. | User recognition and gaze tracking in a video system |
US11882111B2 (en) | 2020-03-26 | 2024-01-23 | View, Inc. | Access and messaging in a multi client network |
US11750594B2 (en) | 2020-03-26 | 2023-09-05 | View, Inc. | Access and messaging in a multi client network |
US11631493B2 (en) | 2020-05-27 | 2023-04-18 | View Operating Corporation | Systems and methods for managing building wellness |
CN113068003A (en) * | 2021-01-29 | 2021-07-02 | 深兰科技(上海)有限公司 | Data display method and device, intelligent glasses, electronic equipment and storage medium |
WO2022221532A1 (en) * | 2021-04-15 | 2022-10-20 | View, Inc. | Immersive collaboration of remote participants via media displays |
WO2022221064A1 (en) * | 2021-04-16 | 2022-10-20 | Microsoft Technology Licensing, Llc | Gaze based video stream processing |
US11558563B2 (en) * | 2021-04-16 | 2023-01-17 | Zoom Video Communications, Inc. | Systems and methods for immersive scenes |
US11956561B2 (en) | 2021-04-16 | 2024-04-09 | Zoom Video Communications, Inc. | Immersive scenes |
US11740693B2 (en) | 2021-04-16 | 2023-08-29 | Microsoft Technology Licensing, Llc | Gaze based video stream processing |
US20220353438A1 (en) * | 2021-04-16 | 2022-11-03 | Zoom Video Communications, Inc. | Systems and methods for immersive scenes |
WO2023048632A1 (en) * | 2021-09-24 | 2023-03-30 | Flatfrog Laboratories Ab | A videoconferencing method and system with automatic muting |
WO2023048630A1 (en) * | 2021-09-24 | 2023-03-30 | Flatfrog Laboratories Ab | A videoconferencing system and method with maintained eye contact |
WO2023048631A1 (en) * | 2021-09-24 | 2023-03-30 | Flatfrog Laboratories Ab | A videoconferencing method and system with focus detection of the presenter |
WO2023048731A1 (en) * | 2021-09-27 | 2023-03-30 | Hewlett-Packard Development Company, L.P. | Active image sensors |
US20230177879A1 (en) * | 2021-12-06 | 2023-06-08 | Hewlett-Packard Development Company, L.P. | Videoconference iris position adjustments |
US20230319221A1 (en) * | 2022-03-29 | 2023-10-05 | Rovi Guides, Inc. | Systems and methods for enabling user-controlled extended reality |
Also Published As
Publication number | Publication date |
---|---|
EP3834408A1 (en) | 2021-06-16 |
US10554921B1 (en) | 2020-02-04 |
WO2020033077A1 (en) | 2020-02-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10554921B1 (en) | | Gaze-correct video conferencing systems and methods |
US10694146B2 (en) | | Video capture systems and methods |
US11023093B2 (en) | | Human-computer interface for computationally efficient placement and sizing of virtual objects in a three-dimensional representation of a real-world environment |
US9883138B2 (en) | | Telepresence experience |
US11729342B2 (en) | | Designated view within a multi-view composited webcam signal |
US10089769B2 (en) | | Augmented display of information in a device view of a display screen |
US9538133B2 (en) | | Conveying gaze information in virtual conference |
US9800831B2 (en) | | Conveying attention information in virtual conference |
US8698874B2 (en) | | Techniques for multiple video source stitching in a conference room |
US9369628B2 (en) | | Utilizing a smart camera system for immersive telepresence |
US20150334348A1 (en) | | Privacy camera |
JP2017108366A (en) | | Method of controlling video conference, system, and program |
CN112243583A (en) | | Multi-endpoint mixed reality conference |
US20200304713A1 (en) | | Intelligent Video Presentation System |
US20220400228A1 (en) | | Adjusting participant gaze in video conferences |
US9065972B1 (en) | | User face capture in projection-based systems |
Hsu et al. | | Look at me! correcting eye gaze in live video communication |
EP2953351B1 (en) | | Method and apparatus for eye-line augmentation during a video conference |
US11972505B2 (en) | | Augmented image overlay on external panel |
WO2023009124A1 (en) | | Tactile copresence |
KR101540110B1 (en) | | System, method and computer-readable recording media for eye contact among users |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: LIM, SE HOON; LARGE, TIMOTHY ANDREW; REEL/FRAME: 046567/0291; Effective date: 20180806 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 4 |