WO2023048630A1 - A videoconferencing system and method with maintained eye contact


Info

Publication number
WO2023048630A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
display
camera
image data
image
Application number
PCT/SE2022/050846
Other languages
French (fr)
Inventor
Gunnar Weibull
Ola Wassvik
Håkan Bergström
Karolina DOROZYNSKA
Original Assignee
Flatfrog Laboratories Ab
Application filed by Flatfrog Laboratories Ab filed Critical Flatfrog Laboratories Ab
Publication of WO2023048630A1 publication Critical patent/WO2023048630A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N 7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/14 Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G06F 3/1423 Digital output to display device; Cooperation and interconnection of the display device with other functional units controlling a plurality of local displays, e.g. CRT and flat panel display
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/14 Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G06F 3/1454 Digital output to display device; Cooperation and interconnection of the display device with other functional units involving copying of the display data of a local workstation or window to a remote workstation or window so that an actual copy of the data is displayed simultaneously on two or more displays, e.g. teledisplay
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G 2320/00 Control of display operating conditions
    • G09G 2320/02 Improving the quality of display appearance
    • G09G 2320/0261 Improving the quality of display appearance in the context of movement of objects on the screen or movement of the observer relative to the screen
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G 2354/00 Aspects of interface with display user

Definitions

  • the present disclosure relates to a method of videoconferencing and a videoconferencing system.
  • the present disclosure relates to a method of videoconferencing which modifies the image data received from a plurality of cameras.
  • One problem is how to maintain eye contact with participants during a videoconference to provide a better experience for the participants.
  • Examples of the present disclosure aim to address the aforementioned problems.
  • a method of videoconferencing between a first user and a second user comprising: receiving at least one first image data of the first user from a first camera mounted behind a first display, the first display configured to be optically transmissive with respect to the first camera; receiving at least one second image data of the first user from a second camera; generating modified image data based on the first image data and the second image data; and sending a display signal comprising the modified image data configured to be displayed for the second user on a second display.
  • the first camera is a near-infrared camera.
  • the second camera is a visible spectrum camera.
  • the modifying comprises recolouring the first image data based on the second image data.
  • the recolouring of the first image data is based on one or more of image artifacts, colour space, brightness, contrast, position, perspective in the second image data.
  • the method comprising calibrating with initial data sets received respectively from the first camera and the second camera.
  • the method comprises training a neural network with the at least one second image data.
  • the method comprises recolouring the first received image data with the neural network trained with the at least one second image data.
  • the method comprises illuminating the at least one first user with one or more near-infrared lights.
  • the one or more near-infrared lights is a ring light mounted behind the display.
  • the second camera is mounted on a side of the display.
  • the at least one first camera comprises a plurality of different first cameras mounted at different locations behind the display.
  • At least one second image data from the second camera is only received during a set-up operation.
  • the first camera is mounted in a display stack of the first display.
  • the first camera comprises a higher resolution than the second camera.
  • both the first and second cameras are integrated into the display.
  • the first and/or the second camera is mounted in a non-illuminating portion of the display.
  • a videoconferencing device for videoconferencing between a first user and a second user comprising: a display configured to display an image of the second user; a first camera configured to capture at least one first image data of the first user, the first camera mounted behind the display, the display configured to be optically transmissive with respect to the first camera; a second camera configured to receive at least one second image data of the first user; and a controller comprising a processor, the controller configured to generate modified image data based on the first and second image data; and send a display signal comprising the modified image data configured to be displayed.
  • a video conferencing device for video conferencing between a local user and a plurality of remote users comprising: a display configured to display a first remote user image and a second remote user image; a first camera mounted behind the display configured to capture at least one first image of the local user; a second camera mounted behind the display configured to capture at least one second image of the local user; wherein the display is configured to be optically transmissive with respect to the first camera and the second camera; and a controller, having a processor, the controller being configured to send a display signal to the display to position the first remote user image over the first camera; detect a change in the gaze direction of the local user from the first remote user image to the second remote user image based on the first images and the second images of the local user; and position the second remote user image over the second camera in dependence of detecting the change in gaze direction from the first remote user image to the second remote user image.
  • the at least one first image of the local user is displayed on a first remote user display and the at least one second image of the local user is displayed on a second remote user display.
  • a method of video conferencing between a local user and a plurality of remote users comprising: displaying a first remote user image and a second remote user image on a display; receiving at least one first image data of the local user from a first camera mounted behind a first display, the first display configured to be optically transmissive with respect to the first camera; receiving at least one second image data of the local user from a second camera; sending a display signal for displaying the first remote user image over the first camera; detecting a change in the gaze direction of the local user from the first remote user image to the second remote user image based on the first images and the second images of the local user; and positioning the second remote user image over the second camera in dependence of detecting the change in gaze direction from the first remote user image to the second remote user image.
  • Figure 1 shows a schematic representation of a videoconferencing terminal according to an example
  • Figure 2 shows a schematic representation of a videoconferencing system according to an example
  • Figure 3 shows a side schematic representation of a videoconferencing terminal according to an example
  • Figures 4 to 8 show schematic representations of a videoconferencing system according to an example
  • Figure 9 shows a flow diagram of the videoconferencing method according to an example
  • Figures 10a, 10b and 11 show a schematic representation of a videoconferencing terminal comprising a touch sensing apparatus according to an example
  • Figures 12 and 13 show flow diagrams of the videoconferencing method according to an example.
  • Figure 1 shows a schematic view of a first videoconferencing terminal 100 according to some examples.
  • the first videoconferencing terminal 100 is configured to be in communication with a second videoconferencing terminal 202 (as shown in Figure 2).
  • the first videoconferencing terminal 100 comprises a camera module 102 and a first display 104.
  • the first videoconferencing terminal 100 selectively controls the activation of the camera module 102 and the first display 104.
  • the camera module 102 and the first display 104 are controlled by a camera controller 106 and a display controller 108 respectively.
  • the camera module 102 comprises one or more cameras 210, 212, 224.
  • the first videoconferencing terminal 100 comprises a videoconferencing controller 110.
  • the videoconferencing controller 110, the camera controller 106 and the display controller 108 may be configured as separate units, or they may be incorporated in a single unit.
  • the videoconferencing controller 110 comprises a plurality of modules for processing the videos and images received remotely via an interface 112 and the videos and images captured locally.
  • the interface 112 and the method of transmitting and receiving videoconferencing data are known and will not be discussed any further.
  • the videoconferencing controller 110 comprises a face detection module 114 for detecting facial features and an image processing module 116 for modifying a first display image 220 to be displayed (as shown in Figure 2) on the first display 104.
  • the videoconferencing controller 110 comprises an eye tracking module 118.
  • the eye tracking module 118 can be part of the face detection module 114 or alternatively, the eye tracking module 118 can be a separate module from the face detection module 114.
  • the face detection module 114, the image processing module 116, and the eye tracking module 118 will be discussed in further detail below.
  • One or all of the videoconferencing controller 110, the camera controller 106 and the display controller 108 may be at least partially implemented by software executed by a processing unit 120.
  • the face detection module 114, the image processing module 116, and the eye-tracking module 118 may be configured as separate units, or they may be incorporated in a single unit.
  • One or all of the face detection module 114, the image processing module 116, and the eye-tracking module 118 may be at least partially implemented by software executed by the processing unit 120.
  • the processing unit 120 may be implemented by special-purpose software (or firmware) run on one or more general-purpose or special-purpose computing devices.
  • each "element” or “means” of such a computing device refers to a conceptual equivalent of a method step; there is not always a one-to-one correspondence between elements/means and particular pieces of hardware or software routines.
  • One piece of hardware sometimes comprises different means/elements.
  • a processing unit 120 may serve as one element/means when executing one instruction but serve as another element/means when executing another instruction.
  • one element/means may be implemented by one instruction in some cases, but by a plurality of instructions in some other cases.
  • one or more elements (means) are implemented entirely by analogue hardware components.
  • the processing unit 120 may include one or more processing units, e.g. a CPU ("Central Processing Unit"), a DSP ("Digital Signal Processor"), an ASIC ("Application- Specific Integrated Circuit"), discrete analogue and/or digital components, or some other programmable logical device, such as an FPGA ("Field Programmable Gate Array”).
  • the processing unit 120 may further include a system memory and a system bus that couples various system components including the system memory to the processing unit.
  • the system bus may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures.
  • the system memory may include computer storage media in the form of volatile and/or non-volatile memory such as read only memory (ROM), random access memory (RAM) and flash memory.
  • the special-purpose software and associated control parameter values may be stored in the system memory, or on other removable/non-removable volatile/non-volatile computer storage media which is included in or accessible to the computing device, such as magnetic media, optical media, flash memory cards, digital tape, solid state RAM, solid state ROM, etc.
  • the processing unit 120 may include one or more communication interfaces, such as a serial interface, a parallel interface, a USB interface, a wireless interface, a network adapter, etc, as well as one or more data acquisition devices, such as an A/D converter.
  • the special-purpose software may be provided to the processing unit 120 on any suitable computer-readable medium.
  • the first videoconferencing terminal 100 will be used together with another remote second videoconferencing terminal 202.
  • the first videoconferencing terminal 100 can be used with a plurality of second videoconferencing terminals 202.
  • the first videoconferencing terminal 100 is a presenter videoconferencing terminal 100, but the user of the first videoconferencing terminal 100 may not necessarily be designated as a presenter in the videoconference. Indeed, any of the remote second users 206 can present material in the videoconference.
  • FIG. 2 shows a schematic representation of a videoconferencing system 200.
  • the first videoconferencing terminal 100 as shown in Figure 2 is the same as described in reference to Figure 1.
  • the second videoconferencing terminal 202 is the same as described in reference to Figure 1.
  • the second videoconferencing terminal 202 comprises a remote second display 218 which is the same as the first display 104.
  • Other components of the second videoconferencing terminal 202 are the same as the first videoconferencing terminal 100.
  • Figure 2 only shows one remote second user 206.
  • the first videoconferencing terminal 100 comprises additional functionality compared to the remote second videoconferencing terminals 202.
  • the first videoconferencing terminal 100 can be a large touch screen e.g. a first display 104 comprising a touch sensing apparatus 1000 (as shown in Figures 10a, 10b, 11 ).
  • the remote second videoconferencing terminals 202 can be a laptop, desktop computer, tablet, smartphone, or any other suitable device.
  • the first videoconferencing terminal 100 does not comprise a touch sensing apparatus 1000 and is e.g. a laptop.
  • the first user 204 can present to both local participants 208 in the same room as the first user 204 and the remote second users 206 not in the same room as the first user 204.
  • the touch sensing apparatus 1000 is optional, and the first user 204 may present to both local participants 208 and remote second users 206 without the touch sensing apparatus 1000.
  • Figures 10a and 10b illustrate an optional example of a touch sensing apparatus 1000 known as ‘above surface optical touch systems’.
  • the first videoconferencing terminal 100 comprises the touch sensing apparatus 1000. Whilst the touch sensing apparatus 1000 as shown and discussed in reference to Figures 10a and 10b can be an above surface optical touch system, alternative touch sensing technology can be used.
  • the touch sensing apparatus 1000 can use one or more of the following including: frustrated total internal reflection (FTIR), resistive, surface acoustic wave, capacitive, surface capacitance, projected capacitance, above surface optical touch, dispersive signal technology and acoustic pulse recognition type touch systems.
  • the touch sensing apparatus 1000 can be any suitable apparatus for detecting touch input from a human interface device.
  • FIG. 10a shows a schematic side view of a touch sensing apparatus 1000.
  • Figure 10b shows a schematic top view of a touch sensing apparatus 1000.
  • the touch sensing apparatus 1000 comprises a set of optical emitters 1004 which are arranged around the periphery of a touch surface 1008.
  • the optical emitters 1004 are configured to emit light that is reflected to travel above a touch surface 1008.
  • a set of optical detectors 1006 are also arranged around the periphery of the touch surface 1008 to receive light from the set of optical emitters 1004 from above the touch surface 1008.
  • An object 1012 that touches the touch surface 1008 will attenuate the light on one or more propagation paths D of the light and cause a change in the light received by one or more of the optical detectors 1006.
  • the location (coordinates), shape or area of the object 1012 may be determined by analysing the received light at the detectors.
  • the optical emitters 1004 are optionally arranged on a substrate 1034 such as a printed circuit board, and light from the optical emitters 1004 travels above the touch surface 1008 of a touch panel 1002 via reflection or scattering on an edge reflector / diffusor 1020. The emitted light may propagate through an optional light transmissive sealing window 1024.
  • the optional light transmissive sealing window 1024 allows light to propagate therethrough but prevents ingress of dirt into a frame 1036 where the electronics and other components are mounted. The light will then continue until deflected by a corresponding edge reflector / diffuser 1020 at an opposing edge of the touch panel 1002, where the light will be scattered back down around the touch panel 1002 and onto the optical detectors 1006.
  • the touch panel 1002 can be a light transmissive panel for allowing light from the first display 104 propagating therethrough.
  • the touch panel 1002 is a sheet of glass.
  • the touch panel 1002 is a sheet of any suitable light transmissive material such as polymethyl methacrylate, or any other suitable light transmissive plastic material.
  • the touch sensing apparatus 1000 comprising the light transmissive touch panel 1002 may be designed to be overlaid on or integrated into the first display 104. This means that the first display 104 can be viewed through the touch panel 1002 when the touch panel 1002 is overlaid on the first display 104.
  • the touch sensing apparatus 1000 allows an object 1012 that is brought into close vicinity of, or in contact with, the touch surface 1008 to interact with the propagating light at the point of touch.
  • the object 1012 is a user’s hand, but additionally or alternatively is e.g. a pen (not shown).
  • part of the light may be scattered by the object 1012, part of the light may be absorbed by the object 1012, and part of the light may continue to propagate in its original direction over the touch panel 1002.
  • the optical detectors 1006 collectively provide an output signal, which is received and sampled by the processor unit 120.
  • the output signal may contain a number of subsignals, also denoted "projection signals", each representing the energy of light emitted by a certain optical emitter 1004 and received by a certain optical detector 1006. It is realized that the touching object 1012 results in a decrease (attenuation) of the received energy on one or more detection lines D as determined by the processor unit 120.
  • the processor unit 120 may be configured to process the projection signals so as to determine a distribution of signal strength values (for simplicity, referred to as a "touch surface pattern") across the touch surface 1008, where each signal strength value represents a local attenuation of light.
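  • As an illustration of the attenuation principle described above, the following Python sketch computes a per-detection-line attenuation value from a reference frame and a current frame of projection signals; the array layout, threshold value and function names are illustrative assumptions rather than the patent's actual implementation.

```python
import numpy as np

def attenuation_map(reference: np.ndarray, sample: np.ndarray) -> np.ndarray:
    """Relative attenuation per detection line (rows: emitters, columns: detectors).

    `reference` holds the received energy per detection line with no touch
    present; `sample` holds the energy for the current frame. Values near 0
    mean no attenuation, values near 1 mean the line is almost fully blocked.
    """
    eps = 1e-9
    return 1.0 - np.clip(sample / (reference + eps), 0.0, 1.0)

def attenuated_lines(reference, sample, threshold=0.1):
    """(emitter, detector) index pairs whose attenuation exceeds a threshold."""
    return np.argwhere(attenuation_map(reference, sample) > threshold)

# Toy example: 16 emitters x 16 detectors, one line attenuated by a touching object
ref = np.ones((16, 16))
cur = ref.copy()
cur[3, 7] *= 0.4  # the object 1012 blocks part of the light on line (3, 7)
print(attenuated_lines(ref, cur))  # -> [[3 7]]
```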
  • the processor unit 120 is configured to carry out a plurality of different signal processing steps in order to extract touch data for at least one object. Additional signal processing steps may involve filtering, back projection, smoothing, and other post-processing techniques as described in WO 2011/139213, which is incorporated herein by reference.
  • the filtering and smoothing of the reconstructed touch data is carried out by a filtering module 1120 as shown in Figure 11.
  • the signal processing is known and will not be discussed in any further detail for the purposes of brevity.
  • the touch sensing apparatus 1000 also includes a controller 1016 which is connected to selectively control the activation of the optical emitters 1004 and, possibly, the readout of data from the optical detectors 1006.
  • the processor unit 120 and the controller 1016 may be configured as separate units, or they may be incorporated in a single unit.
  • the processing unit 120 can be a touch controller.
  • the reconstruction and filtering modules 1118, 1120 of the processor unit 120 may be configured as separate units, or they may be incorporated in a single unit.
  • One or both of the reconstruction and filtering modules 1118, 1120 may be at least partially implemented by software executed by the processing unit 120.
  • Figure 11 shows a schematic representation of a first videoconferencing terminal 100 comprising the touch sensing apparatus 1000.
  • the display controller 108 can be separate from the first display 104. In some examples, the display controller 108 can be incorporated into the processing unit 120.
  • the first display 104 can be any suitable device for visual output for a user such as a monitor.
  • the first display 104 is controlled by the display controller 108.
  • the first display 104 and display controller 108 are known and will not be discussed in any further depth for the purposes of expediency.
  • the first display 104 comprises a display stack comprising a plurality of layers such as filters, diffusers, backlights, and liquid crystals. Additional or alternative components can be provided in the plurality of layers depending on the type of first display 104.
  • the display device is an LCD, a quantum dot display, an LED backlit LCD, a WLCD, an OLCD, a plasma display, an OLED, a transparent OLED, a POLED, an AMOLED and/or a Micro LED.
  • any other suitable first display 104 can be used in the first videoconferencing terminal 100.
  • the host control device 1102 may be connectively coupled to the touch sensing apparatus 1000.
  • the host control device 1102 receives output from the touch sensing apparatus 1000.
  • the host control device 1102 and the touch sensing apparatus 1000 are connectively coupled via a data connection 1112 such as a USB connection.
  • other wired or wireless data connection 1112 can be provided to permit data transfer between the host control device 1102 and the touch sensing apparatus 1000.
  • the data connection 1112 can be ethernet, firewire, Bluetooth, Wi-Fi, universal asynchronous receiver-transmitter (UART), or any other suitable data connection.
  • the touch sensing apparatus 1000 detects a touch object when a physical object is brought into sufficient proximity to a touch surface 1008 so as to be detected by one or more optical detectors 1006 in the touch sensing apparatus 1000.
  • the physical object may be animate or inanimate.
  • the data connection 1112 is a human interface device (HID) USB channel.
  • the data connection 1112 can be a logical or physical connection.
  • the touch sensing apparatus 1000, the host control device 1102 and the first display 104 are integrated into the same first videoconferencing terminal 100 such as a laptop, tablet, smart phone, monitor or screen.
  • the touch sensing apparatus 1000, the host control device 1102 and the first display 104 are separate components.
  • the touch sensing apparatus 1000 can be a separate component mountable on a display screen.
  • the host control device 1102 may comprise an operating system 1108 and one or more applications 1110 that are operable on the operating system 1108.
  • the one or more applications 1110 are configured to allow the user to interact with the touch sensing apparatus 1000 and the first display 104.
  • the operating system 1108 is configured to run one or more applications 1110 and send output information to the display controller 108 for displaying on the first display 104.
  • the applications 1110 can be drawing applications or whiteboard applications for visualising user input. In other examples the applications 1110 can be any suitable application or software for receiving and displaying user input.
  • the camera module 102 comprises a first camera 210 and a second camera 224.
  • the first videoconferencing terminal 100 comprises at least one first camera 210 configured to capture first image data of the first user 204.
  • the first videoconferencing terminal 100 further comprises at least one second camera 224 configured to capture at least one second image data of the first user 204.
  • the first videoconferencing terminal 100 comprises a first camera 210 and a second camera 224 mounted to the first display 104.
  • the first camera 210 is connected to the videoconferencing controller 110 and is configured to send the first image data of the first user 204 to the videoconferencing controller 110.
  • the second camera 224 is connected to the videoconferencing controller 110 and is configured to send the second image data of the first user 204 to the videoconferencing controller 110.
  • Both the first and second cameras 210, 224 are configured to capture images of the first user 204.
  • the captured images of the first user 204 are used for generating modified image data of the first user 204 when displayed as a remote second display image 216 on the remote second display 218 at the second videoconferencing terminal 202. Modification of the first image data and/or the second image data of the first user 204 will be discussed in further detail below.
  • the first camera 210 is a near-infrared (NIR) camera and the second camera 224 is an RGB camera.
  • the first image data received from the first camera 210 comprises infrared information only and does not comprise colour information.
  • the second camera 224 is an RGB camera and the second image data comprises colour information in the visible spectrum.
  • the second camera 224 in some examples can be any suitable camera configured to detect colour information.
  • the second camera 224 is configured to detect colour information in any colour space.
  • the second camera 224 is configured to send the second image data in one or more of the following colour spaces: RGB, CMYK (cyan, magenta, yellow, black), YIQ, YUV, YCbCr, HSV (hue, saturation, value), HSL (hue, saturation, luminance) or any other suitable colour space.
  • Figure 3 shows a schematic side view of the first videoconferencing terminal 100.
  • the first camera 210 is mounted behind the first display 104.
  • the second camera 224 as shown in Figure 3 is mounted on the top of the first display 104.
  • the second camera 224 can be mounted in any suitable position or orientation with respect to the first display 104 and the first user 204.
  • the second camera 224 is mounted on the side of the first display 104.
  • the second camera 224 can be remote from the first display 104 e.g. in front of the first display 104 as a desktop camera.
  • the first camera 210 and the second camera 224 are integrated into the same component forming a compound eye camera component (not shown). In some other examples as shown in the Figures, the first camera 210 and the second camera 224 are separate components.
  • the first display 104 is optically transmissive to the near-infrared light and the first camera 210 is configured to detect infrared light reflected from the first user 204 through the first display 104.
  • the modification of the first image data based on the second image data will be discussed in further detail below.
  • the first camera 210 is mounted in the centre of the first display 104. This means that if an image of the second user 206 is positioned over the first camera 210, the second user 206 will perceive that the first user 204 is looking directly at the second user 206. This is represented in Figure 3, where the gaze direction G of the first user 204 is aligned with an optical axis of the first camera 210.
  • the first camera 210 is configured to capture first image data of the first user 204 for a video stream in the videoconference between the first videoconferencing terminal 100 and the second videoconferencing terminal 202. In this way, the first camera 210 is configured to be the primary camera for capturing the video stream of the first user 204 during the videoconference.
  • the first videoconferencing terminal 100 comprises a third camera 212, identical to the first camera 210 as shown in Figures 4, 5 or 6.
  • the third camera 212 is configured to capture third image data and send the third image data to the videoconferencing controller 110.
  • the videoconferencing controller 110 is configured to use the third image data to augment the first image data.
  • the videoconferencing controller 110 is configured to use the third image data together with the first image data for other functionality such as face detection and/or eye tracking as discussed below in more detail.
  • the third image data captured by the third camera 212 is configured to be used for the purposes of maintaining eye contact during the videoconference which is also discussed in further detail below.
  • the first and third cameras 210, 212 are mounted remote from each other behind the first display 104.
  • Figure 2 shows the first and third cameras 210, 212 mounted behind the first display 104 on different sides of the first display 104.
  • the first and third cameras 210, 212 can be mounted at any position on the first display 104.
  • the first and third cameras 210, 212 can be mounted remote from the first display 104.
  • the first and third cameras 210, 212 can be mounted on the ceiling or the wall near the first display 104.
  • the first and third cameras 210, 212 are mounted behind the first display 104.
  • the first and third cameras 210, 212 are near-infrared cameras and the first display 104 is optically transmissive to the near- infrared light.
  • the first and third cameras 210, 212 comprise a first illumination source 222 of near-infrared light for illuminating the first user 204.
  • the first illumination source 222 is mounted on the top of the first display 104, but this first illumination source 222 can be mounted in any suitable position.
  • the source of illumination can be a near-infrared light source such as an LED mounted to the first and/or the third camera 210, 212.
  • the first illumination source 222 is mounted on or integrated with the first display 104 remote from the first and third cameras 210, 212.
  • Figure 4 shows alternative arrangements to the first illumination source 222.
  • the first illumination source 222 is positioned behind the first display 104.
  • the first user 204 can be optionally illuminated with other near-infrared sources.
  • a second illumination source 400 is provided.
  • the second illumination source 400 is a ring light mounted behind the first display 104 comprising a plurality of infrared LEDs 402.
  • the first illumination source 222 and/or the second illumination source 400 are integrally mounted into the first display 104 e.g. integrated into a layer of the display stack of the first display 104.
  • Providing a ring light 400 is advantageous because the first user 204 is better illuminated during the videoconference and the second user 206 can more easily see the first user 204.
  • the first illumination source 222 and/or the second illumination source 400 are incorporated into a backlight layer of the display stack of the first display 104.
  • the videoconferencing controller 110 receives the first image data from the first camera 210 as shown in step 900 of Figure 9.
  • Figure 9 shows a flow diagram of a method according to an example.
  • the first image data comprises black and white image data because the first image data has been captured by the first camera 210 which is a NIR camera.
  • the modified image data is generated based on the first image data and the second image data, e.g. the first image data needs to be colourised.
  • the videoconferencing controller 110 receives the second image data from the second camera 224 as shown in step 902 of Figure 9.
  • the second image data received from the second camera 224 comprises colour information.
  • the videoconferencing controller 110 sends a signal to the image processing module 116 to generate modified image data of the first user 204.
  • the image processing module 116 only modifies the first image data of the first user 204 and not the background of the first user 204.
  • the image processing module 116 is configured to modify the first image data of the first user 204 and the background of the first user 204.
  • the image processing module 116 is configured to colourise the first image data based on one or more techniques that will be explained below.
  • the image processing module 116 modifies the first image data based on the second image data using one or more heuristic algorithms.
  • the image processing module 116 is configured to receive a signal comprising the second image data and the second image data comprises colour information independent from other image data.
  • the second image data is received comprising HSL or HSV colour space data.
  • the image processing module 116 receives a signal comprising the first image data.
  • the image processing module 116 is configured to map corresponding pixel areas of the first image data and the second image data to determine corresponding parts of the first user 204 in both the first image data and the second image data.
  • the image processing module 116 performs edge detection to determine the part of the first and second image data which comprise the image of the first user 204.
  • the image processing module 116 determines a first pixel map of the first user 204 of the first image data and a second pixel map of the second image data.
  • the image processing module 116 is configured to correlate the pixels of the first pixel map and the second pixel map.
  • the image processing module 116 may need to translate the second image data on to the first image data because the fields of view of the first and second cameras 210, 224 are different.
  • the image processing module 116 is configured to correlate individual pixels of the first and second pixel maps. Additionally or alternatively, the image processing module 116 is configured to correlate different parts of the first and second pixel maps.
  • the first camera 210 may be a high-resolution camera to capture the detail of the first user 204 whilst the second camera 224 may be a low-resolution camera for only capturing colour information. In this way, the image processing module 116 may need to interpolate colour information from one pixel on the second pixel map to a plurality of pixels on the first pixel map.
  • the image processing module 116 then generates a modified image data by adding colour information from the second image data to the black and white first image data as shown in step 904 in Figure 9.
  • the modified signal comprises the hue and saturation values from the second image data and the luminance value from the first image data.
  • Colourising a NIR image such as the first image data with colour information is known, e.g. "Colouring the Near-Infrared" by Clement Fredembach and Sabine Susstrunk, which is incorporated herein by reference.
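  • As a concrete illustration of combining luminance from the first camera 210 with hue and saturation from the second camera 224, the following Python sketch uses OpenCV; the simple resize stands in for the pixel-map correlation and perspective correction described above, and the file names are hypothetical.

```python
import cv2
import numpy as np

def recolour_nir(nir_gray: np.ndarray, rgb_bgr: np.ndarray) -> np.ndarray:
    """Combine luminance from the NIR camera with hue/saturation from the RGB camera.

    nir_gray: high-resolution single-channel NIR frame (uint8).
    rgb_bgr:  lower-resolution colour frame from the second camera (uint8 BGR).
    Returns a colourised BGR image at the NIR resolution.
    """
    h, w = nir_gray.shape[:2]
    # Stand-in for the pixel-map correlation: scale the colour frame onto the
    # NIR frame's pixel grid (a real system would also correct perspective).
    rgb_up = cv2.resize(rgb_bgr, (w, h), interpolation=cv2.INTER_LINEAR)
    hls = cv2.cvtColor(rgb_up, cv2.COLOR_BGR2HLS)
    hls[:, :, 1] = nir_gray  # replace the L channel with the NIR luminance
    return cv2.cvtColor(hls, cv2.COLOR_HLS2BGR)

# Hypothetical usage with frames grabbed from the two cameras
nir_frame = cv2.imread("first_camera_nir.png", cv2.IMREAD_GRAYSCALE)
rgb_frame = cv2.imread("second_camera_rgb.png", cv2.IMREAD_COLOR)
modified = recolour_nir(nir_frame, rgb_frame)
cv2.imwrite("modified_image.png", modified)
```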
  • the image processing module 116 is configured to modify the first image data based on one or more of the following in the second image data: image artifacts, colour space, brightness, contrast, position and perspective.
  • the second camera 224 is a high-resolution colour camera mounted on top of the first display 104.
  • the first camera 210 is a near-infrared camera mounted behind the first display 104.
  • the second image data comprises high resolution colour image data.
  • the image processing module 116 is configured to generate the modified image data by modifying the perspective of the second image data based on the perspective of the first image data.
  • the image processing module 116 then sends a signal comprising the modified first image data to the videoconferencing controller 110.
  • the videoconferencing controller 110 is configured to send the modified first image data to the remote second display 218 of the second videoconferencing terminal 202 as shown in step 906.
  • the remote second display 218 then displays the modified e.g. colourised image on the remote second display 218 as shown in step 908.
  • the modified first image data is also displayed on the first display 104 so that the first user 204 can see the modified image.
  • the image processing module 116 is configured to perform the steps as shown in Figure 9 continuously. In this way, the image processing module 116 is modifying the first image data received from the first camera 210 throughout the videoconference.
  • the image processing module 116 is alternatively configured to perform the steps as shown in Figure 9 a single time during an initial calibration step.
  • the first user 204 may use the second camera 224 as described above.
  • the first user 204 may use a smartphone, tablet, or similar device with a camera as the second camera 224.
  • the smartphone may be configured to send the second image data to the videoconferencing controller 110.
  • the image processing module 116 then performs the modification of the first image data as previously discussed.
  • the image processing module 116 is configured to automatically colourise the first image data based on one or more machine learning methods.
  • Machine learning methods for automatic image colourisation are known, e.g. "Machine Learning Methods for Automatic Image Colorization" in Computational Photography: Methods and Applications by Guillaume Charpiat et al., which is incorporated by reference herein.
  • the image processing module 116 receives a signal comprising the second image data from the second camera 224.
  • the second image data comprises the colour information of the first user 204 and the second image data is used to train a neural network for performing the automatic image colourisation.
  • the training step can be performed before the videoconference session based on a sequence of images of the first user 204 captured by the second camera 224.
  • once the neural network is trained, the first user 204 does not need to train the neural network any further.
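  • A compact PyTorch sketch of such a training step is shown below; the network architecture, channel choices and calibration data are illustrative assumptions, not the specific neural network of the present disclosure. The network learns to predict two chrominance channels from the NIR luminance captured by the first camera 210, using paired frames of the first user 204 captured by the second camera 224.

```python
import torch
import torch.nn as nn

class ColourisationNet(nn.Module):
    """Tiny fully convolutional net: NIR luminance in, two chrominance channels out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1), nn.Sigmoid(),  # normalised hue/saturation
        )

    def forward(self, nir):
        return self.net(nir)

def train(model, pairs, epochs=10, lr=1e-3):
    """pairs: iterable of (nir, chroma) tensors, shapes (1,H,W) and (2,H,W), values in [0,1]."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for nir, chroma in pairs:
            pred = model(nir.unsqueeze(0))
            loss = loss_fn(pred, chroma.unsqueeze(0))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Hypothetical calibration set: a short sequence of aligned frame pairs from both cameras
calib = [(torch.rand(1, 120, 160), torch.rand(2, 120, 160)) for _ in range(8)]
model = train(ColourisationNet(), calib)
# At runtime, the predicted chrominance is merged with the live NIR luminance frame.
```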
  • the image processing module 116 can perform one or more modifications alternatively or additionally to the colourisation.
  • the image processing module 116 can automatically lighten the first image data based on the second image data. The lightening can be performed to provide modified first image data which appears to have been illuminated during the videoconference.
  • the image processing module 116 is configured to enhance the lighting based on machine learning. Using machine learning to lighten an image is known, e.g. https://ai.googleblog.com/2020/12/portrait-light-enhancing-portrait.html which is incorporated herein by reference. Automatically lightening the first image data can be advantageous because the first videoconferencing terminal 100 does not need to comprise the first or second illumination sources 222, 400 as previously discussed. Accordingly, when the first user 204 is using a smaller device such as a laptop or smartphone, the image of the first user 204 in the video stream can still be lightened for a better appearance.
  • the first videoconferencing terminal 100 also comprises a touch sensing apparatus 1000.
  • the touch sensing apparatus 1000 is configured to determine presenter image data comprising spatial position information of the position of the first user 204 relative to the touch sensing apparatus 1000 as shown in step 1200 of Figure 12.
  • the touch sensing apparatus 1000 is configured to issue one or more signals to the videoconferencing controller 110 relating to a touch event and the spatial position information.
  • the first camera 210 and the third camera 212 are located behind the first display 104 and accordingly will capture different views of the first user 204 or other objects in front of the first display 104.
  • the spatial relationship between the first camera 210, the third camera 212 and the first display 104 is known.
  • the videoconferencing controller 110 can determine the distance of an object in front of the display with trigonometric algorithms. Accordingly, the videoconferencing controller 110 can determine when an object e.g. the first user 204 is within a predetermined distance of the first display 104. For example, the videoconferencing controller 110 determines that the first user 204 is hovering their finger above the touch sensing apparatus 1000 and the first display 104.
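  • A minimal Python sketch of such a trigonometric distance estimate is given below, assuming each behind-display camera reports a bearing angle to the same feature (e.g. the first user's fingertip) and that the camera baseline is known; the angle values and hover threshold are placeholders.

```python
import math

def object_depth(theta1_deg, theta2_deg, baseline_m):
    """Perpendicular distance of an object from the display plane.

    theta1_deg / theta2_deg: bearing of the object measured by the first and
    third cameras, as the angle from each camera's optical axis (positive to
    the right). baseline_m: horizontal separation of the two cameras.
    """
    t1 = math.tan(math.radians(theta1_deg))
    t2 = math.tan(math.radians(theta2_deg))
    if math.isclose(t1, t2):
        raise ValueError("parallel bearings: object too far to triangulate")
    return baseline_m / abs(t1 - t2)

def is_hovering(depth_m, threshold_m=0.05):
    """True when e.g. a fingertip is within the predetermined distance of the display."""
    return depth_m <= threshold_m

# Cameras 1.0 m apart; a fingertip seen at +84.3 deg by one camera and -84.3 deg
# by the other sits roughly 5 cm in front of the display.
d = object_depth(84.3, -84.3, baseline_m=1.0)
print(round(d, 3), is_hovering(d))
```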
  • the videoconferencing controller 110 is configured to send a control signal to the image processing module 116 to modify the first image data.
  • the videoconferencing controller 110 is configured to issue a control signal to the image processing module 116 to modify the image to create a silhouette of a hand of the first user 204 on the first display image 220.
  • the image processing module 116 is configured to generate a first user representation based on the spatial position information as shown in step 1202.
  • the image processing module 116 is configured to modify the remote second display image 216 to also provide a silhouette of the first user 204.
  • the videoconferencing controller 110 is then configured to output a display signal to the remote second display 218 to display the first user representation as shown in step 1204 of Figure 12. Additionally or alternatively, videoconferencing controller 110 is configured to output a display signal to modify the first display image 220 on the first display 104 to display the first user representation as shown in step 1206 of Figure 12.
  • the first user representation is a silhouette of a hand, pointer, stylus, or other indicator object.
  • the first and third cameras 210, 212 are configured to capture images of the first user 204.
  • Figures 5, 6 and 7 show schematic representations of the first videoconferencing terminal 100. The captured images of the first user 204 are used for determining a gaze direction G and other facial features of the first user 204 with respect to the first display image 220 on the first display 104.
  • the video stream is displayed on the first display image 220 as shown in step 1300 in Figure 13.
  • Figure 13 shows a flow diagram of a method of videoconferencing according to some examples.
  • Figure 5 shows two video streams in a first application window 500 and a second application window 502.
  • the first application window 500 shows a video stream of a second user 206 and the second application window 502 shows a video stream of a third user 504.
  • the first application window 500 is positioned in front of the first camera 210 and the second application window 502 is positioned over the third camera 212. Accordingly, the image of the second user 206 is positioned in front of the first camera 210 and the image of the third user 504 is positioned in front of the third camera 212.
  • the first user 204 is looking at the image of the second user 206 and the first user gaze direction G is directed towards the image of the second user 206.
  • Figure 6 is the same as Figure 5, except that the representations of the second user 206 and the third user 504 have been omitted for the purposes of clarity.
  • Figures 5, 6, and 7 show a plurality of cameras (e.g. the two cameras 210, 212) for determining a gaze direction G and other facial features of the first user 204.
  • the first videoconferencing terminal 100 can comprise a single camera 210 for determining a gaze direction G and the facial features of the first user 204.
  • the videoconferencing controller 110 determines where on the first display image 220 the first user 204 is looking.
  • the videoconferencing controller 110 can use one or more different algorithms for calculating the first user gaze direction G which will be discussed now.
  • the videoconferencing controller 110 receives one or more images of the first user 204 from the first and/or the third cameras 210, 212 as shown in step 1302 in Figure 13.
  • the videoconferencing controller 110 then sends the one or more images to the face detection module 114.
  • the face detection module 114 determines the orientation and position of the face of the first user 204 based on feature detection.
  • the face detection module 114 detects the position of the eyes 506 of the first user 204 in a received image. In this way, the face detection module 114 determines the facial gestures and the gaze direction of the first user 204 as shown in step 1304 of Figure 13.
  • the face detection module 114 uses feature detection on an image of the first user 204 to detect where the eyes 506 and the face of the first user 204 are with respect to the first display 104. For example, the face detection module 114 may determine that only one eye or no eyes 506 of the first user 204 are observable.
  • the face detection module 114 then sends a face detection signal or face position and/or face orientation information of the first user 204 to the videoconferencing controller 110.
  • the videoconferencing controller 110 determines whether the first user 204 is looking at the first display 104 based on the received signal from the face detection module 114. If the videoconference controller 110 does not receive a face detection signal from the face detection module 114, then the videoconference controller 110 determines that the first user 204 is not looking at the first display 104.
  • the videoconferencing controller 110 is able to determine the general gaze direction G of the first user 204 based on a detection of the face of the first user 204. In other words, the videoconferencing controller 110 determines that the gaze direction G in that the first user 204 is looking at the first display 104 or not.
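  • The following Python sketch illustrates this coarse decision, assuming a hypothetical face-detection result containing the number of observable eyes and a head-yaw estimate; the data structure and threshold are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FaceDetection:
    """Hypothetical output of the face detection module 114 for one frame."""
    eyes_visible: int   # 0, 1 or 2 eyes observable in the image
    yaw_deg: float      # head rotation away from the display normal

def looking_at_display(detection: Optional[FaceDetection],
                       max_yaw_deg: float = 30.0) -> bool:
    """Coarse gaze decision: no face detection signal means 'not looking'."""
    if detection is None:
        return False
    return detection.eyes_visible == 2 and abs(detection.yaw_deg) <= max_yaw_deg

print(looking_at_display(FaceDetection(eyes_visible=2, yaw_deg=12.0)))  # True
print(looking_at_display(FaceDetection(eyes_visible=1, yaw_deg=55.0)))  # False
print(looking_at_display(None))                                         # False
```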
  • When the videoconferencing controller 110 determines that the first user 204 is not looking at the first or second application window 500, 502, the videoconferencing controller 110 issues a signal to the image processing module 116 to modify the first display image 220 to indicate to the second and third users 206, 504 that the first user 204 has broken eye contact.
  • the image processing module 116 is configured to move the first and second application windows 500, 502 so that they are not in front of the first and third cameras 210, 212. This will appear to the second user 206 and the third user 504 that the first user 204 is not making eye contact with them.
  • the image processing module 116 adds an indicator flag to the first display image 220 and the remote images. The indicator flag shows the second and third users 206, 504 that the first user 204 is not looking at them.
  • When the videoconferencing controller 110 determines that the first user 204 is not looking at the first display 104 or e.g. the first or second application windows 500, 502, the videoconferencing controller 110 sends a control signal to the image processing module 116 to modify the image of the first user 204 so that they appear to be looking at the first display 104 and engaged in the videoconference. For example, when the first user 204 looks down at their phone, this is not shown to the other participants of the videoconference.
  • the videoconferencing controller 110 determines a more precise presenter gaze direction G. This will now be discussed in further detail.
  • the videoconferencing terminal comprises a first illumination source 222 of near-infrared light configured to illuminate the first user 204.
  • the infrared light is transmitted to the first user 204 and the infrared light is reflected from the first user eyes 506.
  • the first and third cameras 210, 212 detect the reflected light from the presenter eyes 506.
  • the first and third cameras 210, 212 are configured to send one or more image signals to the videoconferencing controller 110 as shown in step 1302.
  • the videoconferencing controller 110 sends the image signals to the eye tracking module 118. Since the placement of the first and third cameras 210, 212 and the first illumination source 222 are known, the eye tracking module 118 determines through trigonometry the gaze direction G of the first user 204 as shown in step 1304. Determining the first user gaze direction G from detection of reflected light from the eye 506 of the first user is known e.g. as discussed in US 6,659,661 which is incorporated by reference herein.
  • the videoconferencing controller 110 determines the direction of the face of the first user 204 based on feature detection. For example, the eye tracking module 118 determines the location of the eyes 506 of the first user 204 with respect to the nose 230 from the received image signals. In this way, the eye tracking module 118 determines the first user gaze direction G as shown in step 1304. Determining the first user gaze direction G from facial features is known, e.g. as discussed in "Determining the Gaze of Faces in Images", A. H. Gee and R. Cipolla, 1994, which is incorporated by reference herein.
  • the eye tracking module 118 determines the first user gaze direction G based on a trained neural network classifying the direction of the first user eyes 506 processing the received one or more image signals from the first and third cameras 210, 212 as shown in step 1304.
  • Classifying the first user gaze direction G with a convolutional neural network is known, e.g. as discussed in "Realtime Eye Gaze Direction Classification Using Convolutional Neural Network", Anjith George and Aurobinda Routray, 2016, which is incorporated herein by reference.
  • the eye tracking module 118 determines the first user gaze direction G and sends a signal to the videoconferencing controller 110 comprising information relating to the first user gaze direction G.
  • the videoconferencing controller 110 is able to better determine where on the first display image 220 the first user 204 is looking.
  • the videoconferencing controller 110 determines which part of the first display image 220 on the first display 104 that the first user gaze direction G intersects. Accordingly, the videoconferencing controller 110 determines an intersection point 508 between the first user gaze direction G and the first display image 220.
  • the videoconferencing controller 110 determines the first user gaze object as shown in Figure 13, step 1306. In this way, the videoconferencing controller 110 determines what the first user 204 is looking at on the first display 104. In Figure 5, the videoconferencing controller 110 determines that the first user gaze direction G is directed at the first application window 500. The videoconferencing controller 110 then determines that, since the first user 204 is looking at the video stream of the second user 206, the first camera 210 is used for capturing the images of the first user 204 that are transmitted in the videoconference. The images of the first user 204 captured by the first camera 210 are transmitted to a remote display 218 at the second videoconferencing terminal 202 to be viewed by the second user 206.
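  • The determination of the intersection point 508 and of the gazed-at application window can be illustrated with the following Python sketch, which intersects the gaze ray with the display plane z = 0 and performs a rectangle hit test; the coordinate convention, window layout and camera mapping are illustrative assumptions.

```python
import numpy as np

def gaze_intersection(eye_xyz, gaze_dir, eps=1e-9):
    """Intersection point 508 of the gaze ray with the display plane z = 0.

    eye_xyz: 3D eye position in display coordinates (z > 0 is in front).
    gaze_dir: 3D gaze direction G (need not be normalised).
    Returns (x, y) on the display, or None if the user looks away from it.
    """
    eye, d = np.asarray(eye_xyz, float), np.asarray(gaze_dir, float)
    if abs(d[2]) < eps or (t := -eye[2] / d[2]) < 0:
        return None
    p = eye + t * d
    return float(p[0]), float(p[1])

def gazed_window(point, windows):
    """Which application window rectangle (x0, y0, x1, y1) the gaze point falls in."""
    if point is None:
        return None
    x, y = point
    for name, (x0, y0, x1, y1) in windows.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None

# Hypothetical layout: window 500 over camera 210, window 502 over camera 212
windows = {"window_500": (0.1, 0.2, 0.7, 0.8), "window_502": (0.9, 0.2, 1.5, 0.8)}
cameras = {"window_500": "camera_210", "window_502": "camera_212"}

target = gazed_window(gaze_intersection((0.8, 0.5, 0.6), (-0.5, 0.0, -1.0)), windows)
print(target, "->", cameras.get(target))  # stream is captured by the matching camera
```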
  • the images of the first user 204 captured by the third camera 212 are transmitted to a remote display (not shown) at another videoconferencing terminal (not shown) to be viewed by the third user 504. This means that the first user 204 will appear to look at the second user 206 and not the third user 504. Other users on the videoconference will receive the images of the first user 204 captured by the third camera 212 because there will be no eye contact.
  • Figure 7 is the same as Figure 6 except that the first user 204 is looking at a different part of the first display image 220.
  • the videoconferencing controller 110 determines that the first user gaze direction G is directed at the second application window 502.
  • the videoconferencing controller 110 determines that since the first user 204 is looking at the video stream of the third user 504, the third camera 212 is used for capturing the images of the first user 204 and transmitting in the videoconference. This means that the first user 204 will appear to look at the third user 504.
  • When the videoconferencing controller 110 determines that the first user 204 is looking at the third user 504, the videoconferencing controller 110 issues a control signal to the image processing module 116 to move the second application window 502 over the third camera 212 as shown in step 1308 in Figure 13.
  • Figure 7 shows the previous position of the second application window 502' and the second application window 502 position has been translated to a position in front of the third camera 212.
  • the first videoconferencing terminal 100 can comprise an array of NIR cameras identical to the first and third cameras 210, 212 positioned behind the first display 104.
  • An application window comprising a video stream to different participants of the video conference can be positioned in front of each different NIR camera.
  • the videoconferencing controller 110 can issue a control instruction to position the video stream of the most recent or the most frequent speaker in the videoconference over the first or third cameras 210, 212.
  • Figure 8 shows a first videoconference terminal 100 according to an example.
  • the first videoconference terminal 100 is the same as shown in the previous Figures except that the first display 104 comprises a central strip 800 to disguise the first camera 210 and the first illumination source 222.
  • the first display 104 as shown in Figure 8 is a transparent OLED (TOLED). This means that in some modes of the first display 104, the first user 204 is able to look through the first display 104. For example, the first display 104 can be mounted in a window. However, in other modes (as shown in Figure 8), the first user 204 is able to use the first display 104 as an eye-contact display.
  • the first display 104 as shown in Figure 8 comprises a central strip 800.
  • the central strip 800 is located in the middle of the first display 104. This hides the first camera 210 and cables (not shown) connected to the first camera 210.
  • the central strip 800 can be located on the first display 104 in any suitable position, orientation, and shape. Aesthetically, the first user 204 will not mind the central strip 800 in the first display 104 because the first display 104 will have the pleasing appearance of a window frame.
  • the first display 104 comprises an electrochromic layer (not shown) mounted on the back of the first display 104.
  • the videoconferencing controller 110 is configured to issue a control signal to the electrochromic layer to turn opaque, providing e.g. a black or mirrored appearance. This means that the first display 104 can be transparent when not being used as a first videoconferencing terminal 100, but opaque during the videoconference. This furthermore gives a better videoconferencing experience to the first user 204 because the first user 204 is not able to see through the first display 104 when the electrochromic layer is opaque.
  • both the first videoconferencing terminal 100 and the second videoconferencing terminal 202 are identical.
  • the first and second videoconferencing terminals 100, 202 respectively determine the gaze direction G of the first user 204 and the second user 206.
  • the first and third cameras 210, 212 capture different image data of the first user 204.
  • the videoconferencing controller 110 sends a signal to the image processing module 116 to combine the different image data of the first user 204 to generate a 3D representation of the first user 204.
  • the videoconferencing controller 110 transmits the 3D representation to the second videoconferencing terminal 202. Since the second video conferencing terminal 202 determines the gaze direction G of the second user 206, the 3D representation of the first user 204 on the second display 218 can be modified depending on where the second user 206 is looking. This means that a 3D effect can be enabled without providing a 3D screen at each videoconferencing terminal 100, 202.
  • the 3D representation of the first user 204 on the second display 218 means that the first user 204 does not have to stand exactly in front of one of the cameras 210, 212 for maintaining eye contact.
  • the controller 110 has determined the gaze direction as previously discussed. In some examples, the controller 110 uses the gaze direction and the position of the remote users 206 to generate an image of the 3D representation of the first user 204. In this way a plurality of different images of the 3D representation of the first user 204 is sent to the remote users 206, each in dependence on where the displayed image of that remote user 206 is on the first display 104 (a minimal sketch of this per-user view generation follows this list). This means eye contact can be maintained between the first user 204 and each remote user 206.
  • Eye contact is typically an indication of an ongoing discussion between e.g. two persons, such as the first user 204 and one of the remote users 206. If the speaker, e.g. the first user 204, is seeking to switch the conversation to include or put a question to a new person, this is typically "indicated" by the first user 204 changing their gaze towards the new person.
  • the 3D representation is accurately centred. In some examples, the 3D representation is centred to within a milliradian.
  • the videoconferencing controller 110 is configured to use an AI solution based on speech in order to determine the meaning of the conversation, body language, social markers, and adaptation of the participants. This means that the videoconferencing controller 110 is configured to determine how the participants react during the videoconference, e.g. how the participants react to the 3D representation. By modifying the 3D representation with respect to the reaction of the participants, the videoconferencing controller 110 may be used to overcome some of the “uncanny” feeling for the participants.
  • the videoconferencing controller 110 can create an avatar for the first and second users 204, 206 with smart minimalistic artistic representations of the participants, which can further reduce the “uncanny feeling”.
  • in some examples, there can be more than one person in front of the first videoconferencing terminal 100.
  • two people can stand side by side in front of the first videoconferencing terminal 100.
  • separate cameras, e.g. the first, second, or third cameras 210, 224, 212, are each used for maintaining eye contact with a different person.
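The per-remote-user view generation outlined above can be illustrated with a minimal Python sketch (not taken from the disclosure). It assumes the 3D representation is available as an N x 3 point cloud, that each remote user's window position on the first display 104 is known in metres, and that a simple pinhole projection is sufficient; a practical implementation would render a textured model with calibrated camera parameters.

    import numpy as np

    def render_view(points, yaw_rad, focal_px=800.0, size=(480, 640), z_offset=2.0):
        """Project the 3D representation after rotating it about the vertical axis."""
        c, s = np.cos(yaw_rad), np.sin(yaw_rad)
        rotation = np.array([[c, 0.0, s],
                             [0.0, 1.0, 0.0],
                             [-s, 0.0, c]])
        p = np.asarray(points, float) @ rotation.T
        p[:, 2] += z_offset                      # place the cloud in front of the virtual camera
        h, w = size
        u = (focal_px * p[:, 0] / p[:, 2] + w / 2).astype(int)
        v = (focal_px * p[:, 1] / p[:, 2] + h / 2).astype(int)
        image = np.zeros((h, w), dtype=np.uint8)
        keep = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        image[v[keep], u[keep]] = 255            # crude point-splat rendering
        return image

    def views_for_remote_users(points, gaze_x_m, window_x_m, yaw_per_metre=0.5):
        """One image per remote user, rotated according to the horizontal offset between
        the local gaze point and that remote user's window on the first display."""
        return {user: render_view(points, yaw_per_metre * (x - gaze_x_m))
                for user, x in window_x_m.items()}

For example, views_for_remote_users(cloud, gaze_x_m=0.4, window_x_m={"second user 206": 0.4, "third user 504": 1.2}) would return an unrotated view for the user being looked at and a rotated view for the other remote user, so that only the former perceives eye contact.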

Abstract

A method of videoconferencing between a first user and a second user comprises receiving at least one first image data of the first user from a first camera mounted behind a first display. The first display is configured to be optically transmissive with respect to the first camera. The method also comprises receiving at least one second image data of the first user from a second camera. The method further comprises generating modified image data based on the first image data and the second image data. The method further comprises sending a display signal comprising the modified image data configured to be displayed for the second user on a second display.

Description

A VIDEOCONFERENCING SYSTEM AND METHOD WITH MAINTAINED EYE CONTACT
Technical Field
The present disclosure relates to a method of videoconferencing and a videoconferencing system. In particular, the present disclosure relates to a method of videoconferencing which modifies the image data received from a plurality of cameras.
Background
Remote working is becoming increasingly important to employers and employees. For example, there is an increasing demand to avoid travel, and face-to-face meetings are being replaced with alternatives such as videoconferencing.
One problem is how to maintain eye contact with participants during a videoconference to provide a better experience for the participants.
Summary
Examples of the present disclosure aim to address the aforementioned problems.
According to an aspect of the present disclosure there is a method of videoconferencing between a first user and a second user comprising: receiving at least one first image data of the first user from a first camera mounted behind a first display, the first display configured to be optically transmissive with respect to the first camera; receiving at least one second image data of the first user from a second camera; generating modified image data based on the first image data and the second image data; and sending a display signal comprising the modified image data configured to be displayed for the second user on a second display.
Optionally, the first camera is a near-infrared camera.
Optionally, the second camera is a visible spectrum camera.
Optionally, the modifying comprises recolouring the first image data based on the second image data. Optionally, the recolouring of the first image data is based on one or more of: image artifacts, colour space, brightness, contrast, position, or perspective in the second image data.
Optionally, the method comprises calibrating with initial data sets received respectively from the first camera and the second camera.
Optionally, the method comprises training a neural network with the at least one second image data.
Optionally, the method comprises recolouring the first received image data with the neural network trained with the at least one second image data.
Optionally, the method comprises illuminating the at least one first user with one or more near-infrared lights.
Optionally, the one or more near-infrared lights is a ring light mounted behind the display.
Optionally, the second camera is mounted on a side of the display.
Optionally, the at least one first camera comprises a plurality of different first cameras mounted at different locations behind the display.
Optionally, at least one second image data from the second camera is only received during a set-up operation.
Optionally, the first camera is mounted in a display stack of the first display.
Optionally, the first camera comprises a higher resolution than the second camera.
Optionally, both the first and second cameras are integrated into the display. Optionally, the first and/or the second camera are mounted in a non-illuminating portion of the display.
According to another aspect of the present disclosure there is a videoconferencing device for videoconferencing between a first user and a second user comprising: a display configured to display an image of the second user; a first camera configured to capture at least one first image data of the first user, the first camera mounted behind the display, the display configured to be optically transmissive with respect to the first camera; a second camera configured to receive at least one second image data of the first user; and a controller comprising a processor, the controller configured to generate modified image data based on the first and second image data; and send a display signal comprising the modified image data configured to be displayed.
According to an aspect of the present disclosure there is a video conferencing device for video conferencing between a local user and a plurality of remote users comprising: a display configured to display a first remote user image and a second remote user image; a first camera mounted behind the display configured to capture at least one first image of the local user; a second camera mounted behind the display configured to capture at least one second image of the local user; wherein the display is configured to be optically transmissive with respect to the first camera and the second camera; and a controller, having a processor, the controller being configured to send a display signal to the display to position the first remote user image over the first camera; detect a change in the gaze direction of the local user from the first remote user image to the second remote user image based on the first images and the second images of the local user; and position the second remote user image over the second camera in dependence on detecting the change in gaze direction from the first remote user image to the second remote user image.
Optionally, the at least one first image of the local user is displayed on a first remote user display and the at least one second image of the local user is displayed on a second remote user display.
According to yet another aspect of the present disclosure there is a method of video conferencing between a local user and a plurality of remote users comprising: displaying a first remote user image and a second remote user image on a display; receiving at least one first image data of the local user from a first camera mounted behind a first display, the first display configured to be optically transmissive with respect to the first camera; receiving at least one second image data of the local user from a second camera; sending a display signal for displaying the first remote user image over the first camera; detecting a change in the gaze direction of the local user from the first remote user image to the second remote user image based on the first images and the second images of the local user; and positioning the second remote user image over the second camera in dependence on detecting the change in gaze direction from the first remote user image to the second remote user image.
Brief Description of the Drawings
Various other aspects and further examples are also described in the following detailed description and in the attached claims with reference to the accompanying drawings, in which:
Figure 1 shows a schematic representation of a videoconferencing terminal according to an example;
Figure 2 shows a schematic representation of a videoconferencing system according to an example;
Figure 3 shows a side schematic representation of videoconferencing terminal according to an example;
Figures 4 to 8 show schematic representations of a videoconferencing system according to an example;
Figure 9 shows a flow diagram of the videoconferencing method according to an example;
Figures 10a, 10b and 11 show a schematic representation of a videoconferencing terminal comprising a touch sensing apparatus according to an example; and
Figures 12 and 13 show flow diagrams of the videoconferencing method according to an example.
Detailed Description
Figure 1 shows a schematic view of a first videoconferencing terminal 100 according to some examples. The first videoconferencing terminal 100 is configured to be in communication with a second videoconferencing terminal 202 (as shown in Figure 2).
The first videoconferencing terminal 100 comprises a camera module 102 and a first display 104. The first videoconferencing terminal 100 selectively controls the activation of the camera module 102 and the first display 104. As shown in Figure 1, the camera module 102 and the first display 104 are controlled by a camera controller 106 and a display controller 108 respectively. As discussed in more detail below, the camera module 102 comprises one or more cameras 210, 212, 224.
The first videoconferencing terminal 100 comprises a videoconferencing controller 110. The videoconferencing controller 110, the camera controller 106 and the display controller 108 may be configured as separate units, or they may be incorporated in a single unit.
The videoconferencing controller 110 comprises a plurality of modules for processing the videos and images received remotely via an interface 112 and the videos and images captured locally. The interface 112 and the method of transmitting and receiving videoconferencing data are known and will not be discussed any further. In some examples, the videoconferencing controller 110 comprises a face detection module 114 for detecting facial features and an image processing module 116 for modifying a first display image 220 to be displayed (as shown in Figure 2) on the first display 104.
In some examples, the videoconferencing controller 110 comprises an eye tracking module 118. The eye tracking module 118 can be part of the face detection module 114 or alternatively, the eye tracking module 118 can be a separate module from the face detection module 114. The face detection module 114, the image processing module 116, and the eye tracking module 118 will be discussed in further detail below.
One or all of the videoconferencing controller 110, the camera controller 106 and the display controller 108 may be at least partially implemented by software executed by a processing unit 120. The face detection module 114, the image processing module 116, and the eye-tracking module 118 may be configured as separate units, or they may be incorporated in a single unit. One or all of the face detection module 114, the image processing module 116, and the eye-tracking module 118 may be at least partially implemented by software executed by the processing unit 120.
The processing unit 120 may be implemented by special-purpose software (or firmware) run on one or more general-purpose or special-purpose computing devices. In this context, it is to be understood that each "element" or "means" of such a computing device refers to a conceptual equivalent of a method step; there is not always a one-to-one correspondence between elements/means and particular pieces of hardware or software routines. One piece of hardware sometimes comprises different means/elements. For example, a processing unit 120 may serve as one element/means when executing one instruction but serve as another element/means when executing another instruction. In addition, one element/means may be implemented by one instruction in some cases, but by a plurality of instructions in some other cases. Naturally, it is conceivable that one or more elements (means) are implemented entirely by analogue hardware components.
The processing unit 120 may include one or more processing units, e.g. a CPU ("Central Processing Unit"), a DSP ("Digital Signal Processor"), an ASIC ("Application-Specific Integrated Circuit"), discrete analogue and/or digital components, or some other programmable logical device, such as an FPGA ("Field Programmable Gate Array"). The processing unit 120 may further include a system memory and a system bus that couples various system components including the system memory to the processing unit. The system bus may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory may include computer storage media in the form of volatile and/or non-volatile memory such as read only memory (ROM), random access memory (RAM) and flash memory. The special-purpose software and associated control parameter values may be stored in the system memory, or on other removable/non-removable volatile/non-volatile computer storage media which is included in or accessible to the computing device, such as magnetic media, optical media, flash memory cards, digital tape, solid state RAM, solid state ROM, etc. The processing unit 120 may include one or more communication interfaces, such as a serial interface, a parallel interface, a USB interface, a wireless interface, a network adapter, etc., as well as one or more data acquisition devices, such as an A/D converter. The special-purpose software may be provided to the processing unit 120 on any suitable computer-readable medium, including a record medium, and a read-only memory.
As mentioned above, the first videoconferencing terminal 100 will be used together with another remote second videoconferencing terminal 202. In some examples, the first videoconferencing terminal 100 can be used with a plurality of second videoconferencing terminals 202.
In some examples, the first videoconferencing terminal 100 is a presenter videoconferencing terminal 100, but the user of the first videoconferencing terminal 100 may not necessarily be designated as a presenter in the videoconference. Indeed, any of the remote second users 206 can present material in the videoconference.
The first videoconferencing terminal 100 will now be discussed in more detail with respect to Figure 2. Figure 2 shows a schematic representation of a videoconferencing system 200. The first videoconferencing terminal 100 as shown in Figure 2 is the same as described in reference to Figure 1. In some examples, the second videoconferencing terminal 202 is the same as described in reference to Figure 1. As shown in Figure 2, the second videoconferencing terminal 202 comprises a remote second display 218 which is the same as the first display 104. Other components of the second videoconferencing terminal 202 are the same as the first videoconferencing terminal 100.
In this way, a first user 204 can present to one or more remote second users 206. For the purposes of clarity, Figure 2 only shows one remote second user 206. However, in some examples there can be any number of remote second users 206. In other examples, there can also be any number of remote second videoconferencing terminals 202.
Optionally, the first videoconferencing terminal 100 comprises additional functionality compared to the remote second videoconferencing terminals 202. For example, the first videoconferencing terminal 100 can be a large touch screen, e.g. a first display 104 comprising a touch sensing apparatus 1000 (as shown in Figures 10a, 10b and 11). In some examples, the remote second videoconferencing terminals 202 can be a laptop, desktop computer, tablet, smartphone, or any other suitable device. In some other examples, the first videoconferencing terminal 100 does not comprise a touch sensing apparatus 1000 and is e.g. a laptop.
If the first videoconferencing terminal 100 is a large touch screen, the first user 204 can present to both local participants 208 in the same room as the first user 204 and the remote second users 206 not in the same room as the first user 204. As mentioned above, the touch sensing apparatus 1000 is optional, and the first user 204 may present to both local participants 208 and remote second users 206 without the touch sensing apparatus 1000.
The example where the first videoconferencing terminal 100 is a touch screen will now be discussed in further detail in reference to Figures 10a, 10b and 11. Figures 10a and 10b illustrate an optional example of a touch sensing apparatus 1000 known as an ‘above surface optical touch system’. In some examples, the first videoconferencing terminal 100 comprises the touch sensing apparatus 1000. Whilst the touch sensing apparatus 1000 as shown and discussed in reference to Figures 10a and 10b can be an above surface optical touch system, alternative touch sensing technology can be used.
For example, the examples discussed with reference to the Figures 10a, 10b and 11 can be applied to any other above surface optical touch system configuration as well as non-above surface optical touch system types which perform touch detection in frames.
In some examples the touch sensing apparatus 1000 can use one or more of the following including: frustrated total internal reflection (FTIR), resistive, surface acoustic wave, capacitive, surface capacitance, projected capacitance, above surface optical touch, dispersive signal technology and acoustic pulse recognition type touch systems. The touch sensing apparatus 1000 can be any suitable apparatus for detecting touch input from a human interface device.
The touch sensing apparatus 1000 will now be discussed in reference to Figure 10a and Figure 10b. Figure 10a shows a schematic side view of a touch sensing apparatus 1000. Figure 10b shows a schematic top view of a touch sensing apparatus 1000.
The touch sensing apparatus 1000 comprises a set of optical emitters 1004 which are arranged around the periphery of a touch surface 1008. The optical emitters 1004 are configured to emit light that is reflected to travel above a touch surface 1008. A set of optical detectors 1006 are also arranged around the periphery of the touch surface 1008 to receive light from the set of optical emitters 1004 from above the touch surface 1008. An object 1012 that touches the touch surface 1008 will attenuate the light on one or more propagation paths D of the light and cause a change in the light received by one or more of the optical detectors 1006. The location (coordinates), shape or area of the object 1012 may be determined by analysing the received light at the detectors.
In some examples, the optical emitters 1004 are optionally arranged on a substrate 1034 such as a printed circuit board, and light from the optical emitters 1004 travels above the touch surface 1008 of a touch panel 1002 via reflection or scattering on an edge reflector / diffusor 1020. The emitted light may propagate through an optional light transmissive sealing window 1024.
The optional light transmissive sealing window 1024 allows light to propagate therethrough but prevents ingress of dirt into a frame 1036 where the electronics and other components are mounted. The light will then continue until deflected by a corresponding edge reflector / diffuser 1020 at an opposing edge of the touch panel 1002, where the light will be scattered back down around the touch panel 1002 and onto the optical detectors 1006. The touch panel 1002 can be a light transmissive panel for allowing light from the first display 104 to propagate therethrough.
In some examples the touch panel 1002 is a sheet of glass. Alternatively, in some other examples, the touch panel 1002 is a sheet of any suitable light transmissive material such as polymethyl methacrylate, or any other suitable light transmissive plastic material.
In this way, the touch sensing apparatus 1000 comprising the light transmissive touch panel 1002 may be designed to be overlaid on or integrated into the first display 104. This means that the first display 104 can be viewed through the touch panel 1002 when the touch panel 1002 is overlaid on the first display 104.
The touch sensing apparatus 1000 allows an object 1012 that is brought into close vicinity of, or in contact with, the touch surface 1008 to interact with the propagating light at the point of touch. In Figure 10a, the object 1012 is a user’s hand, but additionally or alternatively is e.g. a pen (not shown). In this interaction, part of the light may be scattered by the object 1012, part of the light may be absorbed by the object 1012, and part of the light may continue to propagate in its original direction over the touch panel 1002.
The optical detectors 1006 collectively provide an output signal, which is received and sampled by the processor unit 120. The output signal may contain a number of subsignals, also denoted "projection signals", each representing the energy of light emitted by a certain optical emitter 1004 and received by a certain optical detector 1006. It is realized that the touching object 1012 results in a decrease (attenuation) of the received energy on one or more detection lines D as determined by the processor unit 120.
In addition to the processes mentioned above in reference to Figures 1 and 2, the processor unit 120 may be configured to process the projection signals so as to determine a distribution of signal strength values (for simplicity, referred to as a "touch surface pattern") across the touch surface 1008, where each signal strength value represents a local attenuation of light. The processor unit 120 is configured to carry out a plurality of different signal processing steps in order to extract touch data for at least one object. Additional signal processing steps may involve filtering, back projection, smoothing, and other post-processing techniques as described in WO 2011/139213, which is incorporated herein by reference. In some examples the filtering and smoothing of the reconstructed touch data is carried out by a filtering module 1120 as shown in Figure 11. The signal processing is known and will not be discussed in any further detail for the purposes of brevity.
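As an illustration only, a heavily simplified version of this processing might look like the Python sketch below: it thresholds the per-line attenuation against a no-touch baseline and takes a weighted centroid of the midpoints of the attenuated detection lines. The real reconstruction (back projection, filtering and the other steps of WO 2011/139213) is considerably more elaborate, and the line geometry and threshold are assumptions.

    import numpy as np

    def detect_touch(signals, baseline, emitters, detectors, threshold=0.2):
        """signals, baseline: received energy per detection line (same ordering).
        emitters, detectors: (N, 2) endpoint coordinates of each detection line.
        Returns an estimated (x, y) touch position, or None if nothing is attenuated."""
        attenuation = 1.0 - np.asarray(signals, float) / np.asarray(baseline, float)
        touched = attenuation > threshold
        if not np.any(touched):
            return None
        midpoints = (np.asarray(emitters, float)[touched] + np.asarray(detectors, float)[touched]) / 2.0
        return tuple(np.average(midpoints, axis=0, weights=attenuation[touched]))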
Turning back to Figure 10b, in the illustrated example the touch sensing apparatus 1000 also includes a controller 1016 which is connected to selectively control the activation of the optical emitters 1004 and, possibly, the readout of data from the optical detectors 1006. The processor unit 120 and the controller 1016 may be configured as separate units, or they may be incorporated in a single unit. In some examples the processing unit 120 can be a touch controller. The reconstruction and filtering modules 1118, 1120 of the processor unit 120 may be configured as separate units, or they may be incorporated in a single unit. One or both of the reconstruction and filtering modules 1118, 1120 may be at least partially implemented by software executed by the processing unit 120.
The relationship between the touch sensing apparatus 1000 and the first display 104 will now be discussed in reference to Figure 11. Figure 11 shows a schematic representation of a first videoconferencing terminal 100 comprising the touch sensing apparatus 1000.
In some examples, the display controller 108 can be separate from the first display 104. In some examples, the display controller 108 can be incorporated into the processing unit 120.
The first display 104 can be any suitable device for visual output for a user such as a monitor. The first display 104 is controlled by the display controller 108. The first display 104 and display controller 108 are known and will not be discussed in any further depth for the purposes of expediency. In some examples, the first display 104 comprises a display stack comprising a plurality of layers such as filters, diffusers, backlights, and liquid crystals. Additional or alternative components can be provided in the plurality of layers depending on the type of first display 104. In some examples, the display device is an LCD, a quantum dot display, an LED backlit LCD, a WLCD, an OLCD, a plasma display, an OLED, a transparent OLED, a POLED, an AMOLED and / or a Micro LED. In other examples, any other suitable first display 104 can be used in the first videoconferencing terminal 100.
The host control device 1102 may be connectively coupled to the touch sensing apparatus 1000. The host control device 1102 receives output from the touch sensing apparatus 1000. In some examples the host control device 1102 and the touch sensing apparatus 1000 are connectively coupled via a data connection 1112 such as a USB connection. In other examples, other wired or wireless data connections 1112 can be provided to permit data transfer between the host control device 1102 and the touch sensing apparatus 1000. For example, the data connection 1112 can be Ethernet, FireWire, Bluetooth, Wi-Fi, universal asynchronous receiver-transmitter (UART), or any other suitable data connection. In some examples there can be a plurality of data connections between the host control device 1102 and the touch sensing apparatus 1000 for transmitting different types of data. The touch sensing apparatus 1000 detects a touch object when a physical object is brought into sufficient proximity to a touch surface 1008 so as to be detected by one or more optical detectors 1006 in the touch sensing apparatus 1000. The physical object may be animate or inanimate. In preferred examples the data connection 1112 is a human interface device (HID) USB channel. The data connection 1112 can be a logical or physical connection.
In some examples the touch sensing apparatus 1000, the host control device 1102 and the first display 104 are integrated into the same first videoconferencing terminal 100 such as a laptop, tablet, smart phone, monitor or screen. In other examples, the touch sensing apparatus 1000, the host control device 1102 and the first display 104 are separate components. For example, the touch sensing apparatus 1000 can be a separate component mountable on a display screen. The host control device 1102 may comprise an operating system 1108 and one or more applications 1110 that are operable on the operating system 1108. The one or more applications 1110 are configured to allow the user to interact with the touch sensing apparatus 1000 and the first display 104. The operating system 1108 is configured to run one or more applications 1110 and send output information to the display controller 108 for displaying on the first display 104. The applications 1110 can be drawing applications or whiteboard applications for visualising user input. In other examples the applications 1110 can be any suitable application or software for receiving and displaying user input.
Turning back to Figure 2, the first videoconferencing terminal 100 will now be discussed in further detail. In some examples the camera module 102 comprises a first camera 210 and a second camera 224. The first videoconferencing terminal 100 comprises at least one first camera 210 configured to capture first image data of the first user 204. The first videoconferencing terminal 100 further comprises at least one second camera 224 configured to capture at least one second image data of the first user 204.
As shown in Figure 2, the first videoconferencing terminal 100 comprises a first camera 210 and a second camera 224 mounted to the first display 104. The first camera 210 is connected to the videoconferencing controller 110 and is configured to send the first image data of the first user 204 to the videoconferencing controller 110. The second camera 224 is connected to the videoconferencing controller 110 and is configured to send the second image data of the first user 204 to the videoconferencing controller 110.
Both the first and second cameras 210, 224 are configured to capture images of the first user 204. The captured images of the first user 204 are used for generating modified image data of the first user 204 when displayed as a remote second display image 216 on the remote second display 218 at the second videoconferencing terminal 202. Modification of the first image data and/or the second image data of the first user 204 will be discussed in further detail below. In some examples, the first camera 210 is a near-infrared (NIR) camera and the second camera 224 is an RGB camera.
This means that the first image data received from the first camera 210 comprises infrared information only and does not comprise colour information. In contrast, the second camera 224 is an RGB camera and the second image data comprises colour information in the visible spectrum. The second camera 224 in some examples can be any suitable camera configured to detect colour information. In some examples, the second camera 224 is configured to detect colour information in any colour space. For example, the second camera 224 is configured to send the second image data in one or more of the following colour spaces: RGB, CMYK (cyan, magenta, yellow, black), YIQ, YUV, YCbCr, HSV (hue, saturation, value), HSL (hue, saturation, luminance) or any other suitable colour space.
Reference will be briefly made to Figure 3. Figure 3 shows a schematic side view of the first videoconferencing terminal 100. The first camera 210 is mounted behind the first display 104. The second camera 224 as shown in Figure 3 is mounted on the top of the first display 104. In other examples, the second camera 224 can be mounted in any suitable position or orientation with respect to the first display 104 and the first user 204. For example, the second camera 224 is mounted on the side of the first display 104. In further examples, the second camera 224 can be remote from the first display 104 e.g. in front of the first display 104 as a desktop camera.
In some examples, the first camera 210 and the second camera 224 are integrated into the same component forming a compound eye camera component (not shown). In some other examples as shown in the Figures, the first camera 210 and the second camera 224 are separate components.
The first display 104 is optically transmissive to the near-infrared light and the first camera 210 is configured to detect infrared light reflected from the first user 204 through the first display 104. The modification of the first image data based on the second image data will be discussed in further detail below. As shown in Figures 2 and 3, the first camera 210 is mounted in the centre of the first display 104. This means that if an image of the second user 206 is positioned over the first camera 210, the second user 206 will perceive that the first user 204 is looking directly at the second user 206. This is represented in Figure 3, in which the gaze direction G of the first user 204 is aligned with an optical axis of the first camera 210.
In some examples, the first camera 210 is configured to capture first image data of the first user 204 for a video stream in the videoconference between the first videoconferencing terminal 100 and the second videoconferencing terminal 202. In this way, the first camera 210 is configured to be the primary camera for capturing the video stream of the first user 204 during the videoconference.
Optionally the first videoconferencing terminal 100 comprises a third camera 212, identical to the first camera 210 as shown in Figures 4, 5 or 6. The third camera 212 is configured to capture third image data and send the third image data to the videoconferencing controller 110. The videoconferencing controller 110 is configured to use the third image data to augment the first image data. In some other examples, the videoconferencing controller 110 is configured to use third image data together with the first image data for other functionality such as face detection and/or eye tracking as discussed below in more detail. In some other examples, the third image data captured by the third camera 212 is configured to be used for the purposes of maintaining eye contact during the videoconference which is also discussed in further detail below.
Turning to Figure 6, in some examples the first and third cameras 210, 212 are mounted remote from each other behind the first display 104. Figure 2 shows the first and third cameras 210, 212 mounted behind the first display 104 on different sides of the first display 104. However, the first and third cameras 210, 212 can be mounted at any position on the first display 104. In other less preferred examples the first and third cameras 210, 212 can be mounted remote from the first display 104. For example, the first and third cameras 210, 212 can be mounted on the ceiling or the wall near the first display 104.
In some more preferred examples, the first and third cameras 210, 212 are mounted behind the first display 104. As mentioned, the first and third cameras 210, 212 are near-infrared cameras and the first display 104 is optically transmissive to the near-infrared light. In some examples, the first and third cameras 210, 212 comprise a first illumination source 222 of near-infrared light for illuminating the first user 204. As shown in Figure 2 the first illumination source 222 is mounted on the top of the first display 104, but this first illumination source 222 can be mounted in any suitable position. The source of illumination can be a near-infrared light source such as an LED mounted to the first and/or the third camera 210, 212. Alternatively, the first illumination source 222 is mounted on or integrated with the first display 104 remote from the first and third cameras 210, 212.
Figure 4 shows alternative arrangements to the first illumination source 222. In Figure 4 the first illumination source 222 is positioned behind the first display 104. Additionally or alternatively the first user 204 can be optionally illuminated with other near-infrared sources. For example, in Figure 4 a second illumination source 400 is provided. The second illumination source 400 is a ring light mounted behind the first display 104 comprising a plurality of infrared LEDs 402. In some examples, the first illumination source 222 and/or the second illumination source 400 are integrally mounted into the first display 104 e.g. integrated into a layer of the display stack of the first display 104. Providing a ring light 400 is advantageous because the first user 204 is better illuminated during the videoconference and the second user 206 can more easily see the first user 204.
In some examples, the first illumination source 222 and/or the second illumination source 400 are incorporated into a backlight layer of the display stack of the first display 104.
Modification of the first image data based on the second image data will now be discussed. The videoconferencing controller 110 receives the first image data from the first camera 210 as shown in step 900 of Figure 9. Figure 9 shows a flow diagram of a method according to an example. The first image data comprises black and white image data because the first image data has been captured by the first camera 210 which is a NIR camera.
This means that if the unmodified first image data is used for the video stream, the first user 204 can only be viewed in black and white. In order to view the first user 204 in colour, the modified image data is generated based on the first image data and the second image data, e.g. the first image data needs to be colourised.
The videoconferencing controller 110 receives the second image data from the second camera 224 as shown in step 902 of Figure 9. As mentioned above, the second image data received from the second camera 224 comprises colour information.
Once the videoconferencing controller 110 receives the first and second image data, the videoconferencing controller 110 sends a signal to the image processing module 116 to generate modified image data of the first user 204. In some preferred examples, the image processing module 116 only modifies the first image data of the first user 204 and not the background of the first user 204. In other examples, the image processing module 116 is configured to modify the first image data of the first user 204 and the background of the first user 204.
The image processing module 116 is configured to colourise the first image data based on one or more techniques that will be explained below.
In a first example the image processing module 116 modifies the first image data based on the second image data using one or more heuristic algorithms. The image processing module 116 is configured to receive a signal comprising the second image data and the second image data comprises colour information independent from other image data. For example, the second image data is received comprising HSL or HSV colour space data.
The image processing module 116 receives a signal comprising the first image data. The image processing module 116 is configured to map corresponding pixel areas of the second image data and the first image data to determine corresponding parts of the first user 204 in both the first image data and the second image data. In some examples, the image processing module 116 performs edge detection to determine the part of the first and second image data which comprises the image of the first user 204. In some examples, the image processing module 116 determines a first pixel map of the first user 204 of the first image data and a second pixel map of the second image data. The image processing module 116 is configured to correlate the pixels of the first pixel map and the second pixel map. Depending on the location of the first camera 210 and the second camera 224, the image processing module 116 may need to translate the second image data onto the first image data because the fields of view of the first and second cameras 210, 224 are different.
In some examples, the image processing module 116 is configured to correlate individual pixels of the first and second pixel maps. Additionally or alternatively, the image processing module 116 is configured to correlate different parts of the first and second pixel maps. For example, the first camera 210 may be a high-resolution camera to capture the detail of the first user 204 whilst the second camera 224 may be a low-resolution camera for only capturing colour information. In this way, the image processing module 116 may need to interpolate colour information from one pixel on the second pixel map to a plurality of pixels on the first pixel map.
The image processing module 116 then generates modified image data by adding colour information from the second image data to the black and white first image data as shown in step 904 in Figure 9. In some examples, the modified signal comprises the hue and saturation values from the second image data and the luminance value from the first image data. Colourising a NIR image such as the first image data with colour information is known, e.g. from Colouring the Near-Infrared by Clement Fredembach and Sabine Susstrunk, which is incorporated herein by reference.
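A minimal Python sketch of this recolouring step is shown below, assuming the first image data is a single-channel NIR frame and the second image data an RGB frame, both as floating-point arrays in [0, 1]. Hue and saturation are taken from the colour image and the value channel from the NIR image; nearest-neighbour resampling stands in for the pixel-map correlation described above. The shapes, value ranges and choice of the HSV colour space are assumptions, not details from the disclosure.

    import numpy as np
    from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

    def colourise_nir(nir, rgb):
        """nir: (H1, W1) NIR frame from the first camera, values in [0, 1].
        rgb: (H2, W2, 3) colour frame from the second camera, values in [0, 1].
        Returns an (H1, W1, 3) colourised image."""
        h1, w1 = nir.shape
        h2, w2 = rgb.shape[:2]
        # Crude pixel-map correlation: nearest-neighbour resampling of the colour
        # frame onto the (typically higher resolution) NIR pixel grid.
        rows = np.round(np.linspace(0, h2 - 1, h1)).astype(int)
        cols = np.round(np.linspace(0, w2 - 1, w1)).astype(int)
        hsv = rgb_to_hsv(rgb[rows][:, cols])
        hsv[..., 2] = nir        # hue/saturation from the second image data, luminance from the first
        return hsv_to_rgb(hsv)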
In some other examples, the image processing module 116 is configured to modify the first image data based on one or more of the following: image artifacts, colour space, brightness, contrast, position, or perspective in the second image data.
In another example, the second camera 224 is a high-resolution colour camera mounted on top of the first display 104. The first camera 210 is a near-infrared camera mounted behind the first display 104. In this case, the second image data comprises high resolution colour image data. The image processing module 116 is configured to generate the modified image data by modifying the perspective of the second image data based on the perspective of the first image data.
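Such a perspective adjustment can be approximated by warping the second image data with a planar homography into the first camera's view, as in the sketch below. How the 3x3 matrix H is obtained (for example from matched facial features during calibration) is not specified in the disclosure and is assumed here.

    import numpy as np

    def warp_perspective(image, H, out_shape):
        """Warp `image` with the 3x3 homography H into an output of shape out_shape,
        using inverse mapping and nearest-neighbour sampling."""
        h_out, w_out = out_shape
        ys, xs = np.mgrid[0:h_out, 0:w_out]
        dst = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])   # 3 x N homogeneous pixels
        src = np.linalg.inv(H) @ dst                                 # back-project into the source image
        sx = np.clip(np.round(src[0] / src[2]).astype(int), 0, image.shape[1] - 1)
        sy = np.clip(np.round(src[1] / src[2]).astype(int), 0, image.shape[0] - 1)
        return image[sy, sx].reshape(h_out, w_out, *image.shape[2:])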
The image processing module 116 then sends a signal comprising the modified first image data to the videoconferencing controller 110. The videoconferencing controller 110 is configured to send the modified first image data to the remote second display 218 of the second videoconferencing terminal 202 as shown in step 906. The remote second display 218 then displays the modified, e.g. colourised, image on the remote second display 218 as shown in step 908. In some examples, the modified first image data is also displayed on the first display 104 so that the first user 204 can see the modified image.
In some examples, the image processing module 116 is configured to perform the steps as shown in Figure 9 continuously. In this way, the image processing module 116 is modifying the first image data received from the first camera 210 throughout the videoconference.
In some other examples the image processing module 116 is alternatively configured to perform the steps as shown in Figure 9 a single time during an initial calibration step. The first user 204 may use the second camera 224 as described above. However, alternatively, the first user 204 may use a smartphone, tablet, or similar device with a camera as the second camera 224. The smartphone may be configured to send the second image data to the videoconferencing controller 110. The image processing module 116 then performs the modification of the first image data as previously discussed.
In another example, the image processing module 116 is configured to automatically colourise the first image data based on one or more machine learning methods. Machine learning methods for automatic image colourisation are known, e.g. Machine Learning Methods for Automatic Image Colorization, Computational Photography: Methods and Applications, by Guillaume Charpiat et al., which is incorporated by reference herein. The image processing module 116 receives a signal comprising the second image data from the second camera 224. The second image data comprises the colour information of the first user 204 and the second image data is used to train a neural network for performing the automatic image colourisation. In some examples, the training step can be performed before the videoconference session based on a sequence of images of the first user 204 captured by the second camera 224. In other examples, the neural network is already trained and the first user 204 does not need to train it any further.
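To make the training step concrete, a minimal PyTorch sketch is given below. It fits a small convolutional network to predict a colour frame from a single NIR channel using registered image pairs; the network size, loss, optimiser and the random placeholder tensors are illustrative assumptions, and a practical colourisation model would be larger and trained on far more data (for example the calibration sequence captured by the second camera 224).

    import torch
    import torch.nn as nn

    # Tiny NIR-to-RGB colourisation network (illustrative only).
    model = nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Sigmoid(),
    )
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # Placeholders standing in for registered NIR frames from the first camera
    # and colour frames from the second camera.
    nir_batch = torch.rand(8, 1, 128, 128)
    rgb_batch = torch.rand(8, 3, 128, 128)

    for step in range(100):
        optimiser.zero_grad()
        loss = loss_fn(model(nir_batch), rgb_batch)
        loss.backward()
        optimiser.step()

    # At run time the trained network recolours new first image data.
    with torch.no_grad():
        colourised = model(nir_batch[:1])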
In some examples, the image processing module 116 can perform one or more modifications alternatively or additionally to the colourisation. In one example, the image processing module 116 can automatically lighten the first image data based on the second image data. The lightening can be performed to provide modified first image data which appears to have been illuminated during the videoconference. In some examples, the image processing module 116 is configured to enhance the lighting based on machine learning. Using machine learning to lighten an image is known, e.g. https://ai.googleblog.com/2020/12/portrait-light-enhancing-portrait.html, which is incorporated herein by reference. Automatically lightening the first image data can be advantageous because the first videoconferencing terminal 100 does not need to comprise the first or second illumination sources 222, 400 as previously discussed. Accordingly, when the first user 204 is using a smaller device such as a laptop or smartphone, the image of the first user 204 in the video stream can still be lightened for a better appearance.
In some examples as mentioned above with reference to Figures 10a, 10b and 11, the first videoconferencing terminal 100 also comprises a touch sensing apparatus 1000. The touch sensing apparatus 1000 is configured to determine presenter image data comprising spatial position information of the position of the first user 204 relative to the touch sensing apparatus 1000 as shown in step 1200 of Figure 12.
In this case, the touch sensing apparatus 1000 is configured to issue one or more signals to the videoconferencing controller 110 relating to a touch event and the spatial position information. The first camera 210 and the third camera 212 are located behind the first display 104 and accordingly will capture different views of the first user 204 or other objects in front of the first display 104. The spatial relationship between the first camera 210, the third camera 212 and the first display 104 is known. The videoconferencing controller 110 can determine the distance of an object in front of the display with trigonometric algorithms. Accordingly, the videoconferencing controller 110 can determine when an object e.g. the first user 204 is within a predetermined distance of the first display 104. For example, the videoconferencing controller 110 determines that the first user 204 is hovering their finger above the touch sensing apparatus 1000 and the first display 104.
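One way such a trigonometric distance estimate could look is the stereo-triangulation sketch below, which uses the standard relation depth = focal length x baseline / disparity for a feature (e.g. a fingertip) located in both the first and third camera images. The focal length, baseline and hover threshold are placeholder values, not parameters taken from the disclosure.

    def object_distance(x_first_px, x_third_px, focal_px=1000.0, baseline_m=0.6):
        """Distance (in metres, from the camera plane) to a feature seen at horizontal
        pixel positions x_first_px and x_third_px in rectified first/third camera images."""
        disparity = abs(x_first_px - x_third_px)
        if disparity == 0:
            return float("inf")
        return focal_px * baseline_m / disparity

    def is_hovering(x_first_px, x_third_px, max_distance_m=0.05):
        """True when the feature (e.g. a fingertip) is within max_distance_m of the first display."""
        return object_distance(x_first_px, x_third_px) < max_distance_m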
In some examples, the videoconferencing controller 110 is configured to send a control signal to the image processing module 116 to modify the first image data. The videoconferencing controller 110 is configured to issue a control signal to the image processing module 116 to modify the image to create a silhouette of a hand of the first user 204 on the first display image 220. The image processing module 116 is configured to generate a first user representation based on the spatial position information as shown in step 1202. The image processing module 116 is configured to modify the remote second display image 216 to also provide a silhouette of the first user 204. The videoconferencing controller 110 is then configured to output a display signal to the remote second display 218 to display the first user representation as shown in step 1204 of Figure 12. Additionally or alternatively, videoconferencing controller 110 is configured to output a display signal to modify the first display image 220 on the first display 104 to display the first user representation as shown in step 1206 of Figure 12.
In some examples, the first user representation is a silhouette of a hand, pointer, stylus, or other indicator object.
Determining the silhouette is described in detail in SE2130042-1 which is incorporated herein by reference.
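Purely as an illustration (the actual silhouette generation is the one described in SE2130042-1), the overlay could be approximated by alpha-blending a dark blob at the projected hand position into the first display image 220 and the remote second display image 216; the circular shape, radius and opacity below are assumptions.

    import numpy as np

    def overlay_silhouette(image, centre_xy, radius_px=40, opacity=0.5):
        """Blend a dark circular 'hand' silhouette into an (H, W, 3) float image in [0, 1]."""
        h, w = image.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        mask = (xs - centre_xy[0]) ** 2 + (ys - centre_xy[1]) ** 2 <= radius_px ** 2
        out = image.copy()
        out[mask] = out[mask] * (1.0 - opacity)     # darken the silhouette region
        return out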
In some examples as shown in Figures 5, 6, and 7, the first and third cameras 210, 212 are configured to capture images of the first user 204. Figures 5, 6 and 7 show schematic representations of the first videoconferencing terminal 100. The captured images of the first user 204 are used for determining a gaze direction G and other facial features of the first user 204 with respect to the first display image 220 on the first display 104. As mentioned above, the video stream is displayed on the first display image 220 as shown in step 1300 in Figure 13. Figure 13 shows a flow diagram of a method of videoconferencing according to some examples.
Figure 5 shows two video streams in a first application window 500 and a second application window 502. The first application window 500 shows a video stream of a second user 206 and the second application window 502 shows a video stream of a third user 504. The first application window 500 is positioned in front of the first camera 210 and the second application window 502 is positioned over the third camera 212. Accordingly, the image of the second user 206 is positioned in front of the first camera 210 and the image of the third user 504 is positioned in front of the third camera 212.
As shown in Figure 5, the first user 204 is looking at the image of the second user 206 and the first user gaze direction G is directed towards the image of the second user 206. Figure 6 is the same as Figure 5, except that the representations of the second user 206 and the third user 504 have been omitted for the purposes of clarity.
Whilst Figures 5, 6, and 7 show a plurality of (e.g. two) cameras 210, 212 for determining a gaze direction G and other facial features of the first user 204, the first videoconferencing terminal 100 can comprise a single camera 210 for determining a gaze direction G and the facial features of the first user 204.
The videoconferencing controller 110 determines where on the first display image 220 the first user 204 is looking. The videoconferencing controller 110 can use one or more different algorithms for calculating the first user gaze direction G, which will be discussed now.
The videoconferencing controller 110 receives one or more images of the first user 204 from the first and/or the third cameras 210, 212 as shown in step 1302 in Figure 13. The videoconferencing controller 110 then sends the one or more images to the face detection module 114. In one example, the face detection module 114 determines the orientation and position of the face of the first user 204 based on feature detection. The face detection module 114 detects the position of the eyes 506 of the first user 204 in a received image. In this way, the face detection module 114 determines the facial gestures and the gaze direction of the first user 204 as shown in step 1304 of Figure 13. The face detection module 114 uses feature detection on an image of the first user 204 to detect where the eyes 506 and the face of the first user 204 are with respect to the first display 104. For example, the face detection module 114 may determine that only one eye or no eyes 506 of the first user 204 are observable.
The face detection module 114 then sends a face detection signal or face position and/or face orientation information of the first user 204 to the videoconferencing controller 110.
The videoconferencing controller 110 then determines whether the first user 204 is looking at the first display 104 based on the received signal from the face detection module 114. If the videoconference controller 110 does not receive a face detection signal from the face detection module 114, then the videoconference controller 110 determines that the first user 204 is not looking at the first display 104.
In this way, the videoconferencing controller 110 is able to determine the general gaze direction G of the first user 204 based on a detection of the face of the first user 204. In other words, the videoconferencing controller 110 determines from the gaze direction G whether the first user 204 is looking at the first display 104 or not.
When the videoconferencing controller 110 determines that the first user 204 is not looking at the first or second application window 500, 502, the videoconferencing controller 110 issues a signal to the image processing module 116 to modify the first display image 220 to indicate to the second and third users 206, 504 that the first user 204 has broken eye contact. In some examples, the image processing module 116 is configured to move the first and second application windows 500, 502 so that they are not in front of the first and third cameras 210, 212. It will then appear to the second user 206 and the third user 504 that the first user 204 is not making eye contact with them. Additionally or alternatively, the image processing module 116 adds an indicator flag to the first display image 220 and the remote images. The indicator flag shows the second and third users 206, 504 that the first user 204 is not looking at them.
In other examples, when the videoconferencing controller 110 determines that the first user 204 is not looking at the first display 104 or e.g. the first or second application windows 500, 502, the videoconferencing controller 110 sends a control signal to the image processing module 116 to modify the image of the first user 204 so that they appear to be looking at the first display 104 and engaged in the videoconference. For example, when the first user 204 looks down at their phone, this is not shown to the other participants of the videoconference.
In some examples, the videoconferencing controller 110 determines a more precise presenter gaze direction G. This will now be discussed in further detail.
As mentioned previously, the videoconferencing terminal comprises a first illumination source 222 of near-infrared light configured to illuminate the first user 204. The infrared light is transmitted towards the first user 204 and is reflected from the eyes 506 of the first user. The first and third cameras 210, 212 detect the light reflected from the eyes 506.
The first and third cameras 210, 212 are configured to send one or more image signals to the videoconferencing controller 110, as shown in step 1302. The videoconferencing controller 110 sends the image signals to the eye tracking module 118. Since the placements of the first and third cameras 210, 212 and of the first illumination source 222 are known, the eye tracking module 118 determines the gaze direction G of the first user 204 through trigonometry, as shown in step 1304. Determining the first user gaze direction G from detection of light reflected from the eyes 506 of the first user is known, e.g. as discussed in US 6,659,661, which is incorporated by reference herein.
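A full pupil-centre / corneal-reflection solver resolves the three-dimensional geometry between camera, illumination source and cornea; the sketch below is a deliberately simplified, calibrated linear approximation of that principle, included only to illustrate the data flow. The constant GLINT_GAIN and the pixel coordinates are illustrative values, not figures from the disclosure or from US 6,659,661.

import math

GLINT_GAIN = 0.009   # radians of gaze per pixel of pupil-glint offset (from calibration)

def gaze_from_glint(pupil_px, glint_px):
    """Return (yaw, pitch) of the gaze in radians from 2D image coordinates."""
    dx = pupil_px[0] - glint_px[0]
    dy = pupil_px[1] - glint_px[1]
    return GLINT_GAIN * dx, GLINT_GAIN * dy

# Example: pupil 12 px to the left of the glint -> gaze roughly 6 degrees to the left.
yaw, pitch = gaze_from_glint(pupil_px=(308, 240), glint_px=(320, 240))
print(math.degrees(yaw), math.degrees(pitch))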
Alternatively, in some examples, the videoconferencing controller 110 determines the direction of the face of the first user 204 based on feature detection. For example, the eye tracking module 118 determines the location of the eyes 506 of the first user 204 with respect to the nose 230 from the received image signals. In this way, the eye tracking module 118 determines the first user gaze direction G, as shown in step 1304. Determining the first user gaze direction G from facial features is known, e.g. as discussed in "Determining the Gaze of Faces in Images", A. H. Gee and R. Cipolla, 1994, which is incorporated by reference herein.
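One common way to turn such facial-feature positions into a facing direction is to fit a generic 3D face model to the detected 2D landmarks and recover the head pose, as sketched below with OpenCV's solvePnP. The 3D model points and the rough camera intrinsics are generic illustrative values and are not taken from the patent or from Gee and Cipolla.

import numpy as np
import cv2

MODEL_POINTS = np.array([          # generic 3D face model, millimetres
    (0.0, 0.0, 0.0),               # nose tip
    (0.0, -330.0, -65.0),          # chin
    (-225.0, 170.0, -135.0),       # left eye, outer corner
    (225.0, 170.0, -135.0),        # right eye, outer corner
    (-150.0, -150.0, -125.0),      # left mouth corner
    (150.0, -150.0, -125.0),       # right mouth corner
], dtype=np.float64)

def face_direction(image_points, frame_width, frame_height):
    """Return the unit vector along which the face points, in camera coordinates.

    image_points: six (x, y) pixel positions in the same order as MODEL_POINTS.
    """
    camera_matrix = np.array([[frame_width, 0, frame_width / 2],
                              [0, frame_width, frame_height / 2],
                              [0, 0, 1]], dtype=np.float64)
    ok, rvec, _tvec = cv2.solvePnP(MODEL_POINTS,
                                   np.asarray(image_points, dtype=np.float64),
                                   camera_matrix, np.zeros((4, 1)))
    rotation, _ = cv2.Rodrigues(rvec)
    direction = rotation @ np.array([0.0, 0.0, -1.0])   # model looks along -z
    return direction / np.linalg.norm(direction)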
Alternatively, in some other examples, the eye tracking module 118 determines the first user gaze direction G based on a trained neural network that classifies the direction of the eyes 506 of the first user 204 by processing the one or more image signals received from the first and third cameras 210, 212, as shown in step 1304. Classifying the first user gaze direction G with a convolutional neural network is known, e.g. as discussed in "Real-time Eye Gaze Direction Classification Using Convolutional Neural Network", Anjith George and Aurobinda Routray, 2016, which is incorporated herein by reference.
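The following is a minimal sketch of such a classifier in PyTorch: a small convolutional network that maps a grayscale eye-region crop to a coarse gaze class. The architecture, the input size and the class labels are illustrative assumptions and are not taken from the cited paper or from the disclosure.

import torch
import torch.nn as nn

GAZE_CLASSES = ["left", "centre", "right", "up", "down"]   # assumed labels

class GazeCNN(nn.Module):
    def __init__(self, n_classes=len(GAZE_CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 36x60 -> 18x30
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 18x30 -> 9x15
        )
        self.classifier = nn.Linear(32 * 9 * 15, n_classes)

    def forward(self, x):              # x: (batch, 1, 36, 60) grayscale eye crops
        return self.classifier(self.features(x).flatten(1))

model = GazeCNN().eval()               # an untrained network, shown for illustration only
with torch.no_grad():
    logits = model(torch.randn(1, 1, 36, 60))
    print(GAZE_CLASSES[logits.argmax(dim=1).item()])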
The eye tracking module 118 determines the first user gaze direction G and sends a signal to the videoconferencing controller 110 comprising information relating to the first user gaze direction G. By using a more accurate detection of the first user gaze direction G, the videoconferencing controller 110 is able to better determine where on the first display image 220 the first user 204 is looking.
Once the videoconferencing controller 110 receives the information relating to the first user gaze direction G, the videoconferencing controller 110 determines which part of the first display image 220 on the first display 104 the first user gaze direction G intersects. Accordingly, the videoconferencing controller 110 determines an intersection point 508 between the first user gaze direction G and the first display image 220.
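Geometrically, the intersection point 508 can be found by intersecting the gaze ray with the display plane. The sketch below assumes a coordinate system in which the first display 104 lies in the plane z = 0; the coordinate convention and variable names are assumptions made for the example.

import numpy as np

def gaze_display_intersection(eye_pos, gaze_dir):
    """eye_pos, gaze_dir: 3-vectors in display coordinates (display plane at z = 0)."""
    eye_pos = np.asarray(eye_pos, dtype=float)
    gaze_dir = np.asarray(gaze_dir, dtype=float)
    if abs(gaze_dir[2]) < 1e-9:
        return None                  # gaze parallel to the display plane
    t = -eye_pos[2] / gaze_dir[2]
    if t <= 0:
        return None                  # user looking away from the display
    return (eye_pos + t * gaze_dir)[:2]   # (x, y) on the display surface

# Example: eyes 600 mm in front of the display, looking slightly down and to the left.
print(gaze_display_intersection([100.0, 350.0, 600.0], [-0.1, -0.2, -1.0]))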
In the scenario shown in Figure 5, the videoconferencing controller 110 determines the first user gaze object, as shown in Figure 13, step 1306. In this way, the videoconferencing controller 110 determines what the first user 204 is looking at on the first display 104. In Figure 5, the videoconferencing controller 110 determines that the first user gaze direction G is directed at the first application window 500. The videoconferencing controller 110 then determines that, since the first user 204 is looking at the video stream of the second user 206, the first camera 210 is used for capturing the images of the first user 204 that are transmitted in the videoconference. The images of the first user 204 captured by the first camera 210 are transmitted to a remote display 218 at the second videoconferencing terminal 202 to be viewed by the second user 206. The images of the first user 204 captured by the third camera 212 are transmitted to a remote display (not shown) at another videoconferencing terminal (not shown) to be viewed by the third user 504. This means that the first user 204 will appear to look at the second user 206 and not at the third user 504. Other users in the videoconference will receive the images of the first user 204 captured by the third camera 212, so there will be no eye contact with those users.
Turning now to Figure 7, another scenario will be discussed. Figure 7 is the same as Figure 6 except that the first user 204 is looking at a different part of the first display image 220. In the scenario shown in Figure 7, the videoconferencing controller 110 determines that the first user gaze direction G is directed at the second application window 502. The videoconferencing controller 110 then determines that, since the first user 204 is looking at the video stream of the third user 504, the third camera 212 is used for capturing the images of the first user 204 that are transmitted in the videoconference. This means that the first user 204 will appear to look at the third user 504. In some examples, when the videoconferencing controller 110 determines that the first user 204 is looking at the third user 504, the videoconferencing controller 110 issues a control signal to the image processing module 116 to move the second application window 502 over the third camera 212, as shown in step 1308 in Figure 13. Figure 7 shows the previous position 502' of the second application window; the second application window 502 has been translated to a position in front of the third camera 212.
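A possible sketch of steps 1306 and 1308 is shown below: the camera associated with the application window that contains the gaze intersection point is selected, and that window is then re-centred over the selected camera. The window and camera data structures are hypothetical and introduced only for this example.

def select_camera(gaze_point, windows):
    """windows: list of dicts {'camera_id', 'x', 'y', 'w', 'h'}, one per remote stream."""
    gx, gy = gaze_point
    for win in windows:
        if win['x'] <= gx <= win['x'] + win['w'] and win['y'] <= gy <= win['y'] + win['h']:
            return win['camera_id'], win    # camera behind the window being looked at
    return None, None                       # gaze is not on any remote-user window

def centre_window_over_camera(win, camera_xy):
    """Translate the window so that its centre coincides with the camera position."""
    cx, cy = camera_xy
    win['x'] = cx - win['w'] // 2
    win['y'] = cy - win['h'] // 2
    return win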
The first videoconferencing terminal 100 can comprise an array of NIR cameras identical to the first and third cameras 210, 212 positioned behind the first display 104. An application window comprising the video stream of a different participant of the videoconference can be positioned in front of each NIR camera. The videoconferencing controller 110 can issue a control instruction to position the video stream of the most recent or the most frequent speaker in the videoconference over the first or third cameras 210, 212.
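The placement of the most recent or most frequent speakers over the available cameras could, for instance, be driven by simple speaking-time bookkeeping as sketched below; the SpeakerPlacement class is an assumption made for illustration only.

from collections import Counter

class SpeakerPlacement:
    def __init__(self, camera_ids):
        self.camera_ids = camera_ids        # e.g. the cameras of the NIR array
        self.speaking_time = Counter()      # accumulated seconds of speech per participant

    def on_speech(self, participant_id, seconds):
        self.speaking_time[participant_id] += seconds

    def assign_windows(self):
        """Map the most active participants to the available camera positions."""
        ranked = [p for p, _ in self.speaking_time.most_common(len(self.camera_ids))]
        return dict(zip(self.camera_ids, ranked))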
Another example will now be discussed in reference to Figure 8. Figure 8 shows a first videoconference terminal 100 according to an example. The first videoconference terminal 100 is the same as shown in the previous Figures except that the first display 104 comprises a central strip 800 to disguise the first camera 210 and the first illumination source 222.
The first display 104 as shown in Figure 8 is a transparent OLED (TOLED). This means that in some modes of the first display 104, the first user 204 is able to look through the first display 104. For example, the first display 104 may be mounted in a window. However, in other modes (as shown in Figure 8), the first user 204 is able to use the first display 104 as an eye-contact display.
The first display 104 as shown in Figure 8 comprises a central strip 800. The central strip 800 is located in the middle of the first display 104. This hides the first camera 210 and cables (not shown) connected to the first camera 210. In some other examples, the central strip 800 can be located on the first display 104 with any suitable position, orientation, and shape. Aesthetically, the first user 204 will not object to the central strip 800 in the first display 104 because the first display 104 will have the pleasing appearance of a window frame.
In some examples, the first display 104 comprises an electrochromic layer (not shown) mounted on the back of the first display 104. The videoconferencing controller 110 is configured to issue a control signal to the electrochromic layer to turn opaque, providing e.g. a black or mirrored appearance. This means that the first display 104 can be transparent when not being used as a videoconferencing terminal, but opaque during the videoconference. This furthermore gives a better videoconferencing experience to the first user 204 because the first user 204 is not able to see through the first display 104 when the electrochromic layer is opaque.
Additionally, in another example both the first videoconferencing terminal 100 and the second videoconferencing terminal 202 are identical. The first and second videoconferencing terminals 100, 202 respectively determine the gaze direction G of the first user 204 and the second user 206. The first and third cameras 210, 212 capture different image data of the first user 204. The videoconferencing controller 110 sends a signal to the image processing module 116 to combine the different image data of the first user 204 to generate a 3D representation of the first user 204. The videoconferencing controller 110 transmits the 3D representation to the second videoconferencing terminal 202. Since the second video conferencing terminal 202 determines the gaze direction G of the second user 206, the 3D representation of the first user 204 on the second display 218 can be modified depending on where the second user 206 is looking. This means that a 3D effect can be enabled without providing a 3D screen at each videoconferencing terminal 100, 202.
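Real view synthesis from two cameras involves depth estimation and re-projection; the sketch below only illustrates the data flow with a crude stand-in that blends the two camera images with a weight derived from the remote viewer's horizontal head position, so that the rendered view shifts as the second user 206 moves. It is not the disclosed 3D representation.

import numpy as np

def viewpoint_blend(img_first_cam, img_third_cam, viewer_x_norm):
    """viewer_x_norm in [0, 1]: remote viewer's head position across their display."""
    w = float(np.clip(viewer_x_norm, 0.0, 1.0))
    blended = (1.0 - w) * img_first_cam.astype(np.float32) \
              + w * img_third_cam.astype(np.float32)
    return blended.astype(np.uint8)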
Furthermore, in some examples, the 3D representation of the first user 204 on the second display 218 means that the first user 204 does not have to stand exactly in front of one of the cameras 210, 212 for maintaining eye contact.
This means that a 3D representation of the first user 204 in front of the first display 104 is generated. The controller 110 has determined the gaze direction as previously discussed. In some examples, the controller 110 uses the gaze direction and the positions of the remote users 206 to generate an image of the 3D representation of the first user 204. In this way, a different image of the 3D representation of the first user 204 is sent to each remote user 206, in dependence on where the displayed image of that remote user 206 is on the first display 104. This means eye contact can be maintained between the first user 204 and each remote user 206.
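The per-user rendering described above could be organised as sketched below: for each remote user, a virtual camera is aimed at the display position of that user's window, and one image of the 3D representation is rendered per remote user. The render_view callable stands in for whatever renderer is used and, like the other names here, is an assumption rather than the disclosed implementation.

import math

def virtual_camera_yaw(window_centre_x, display_width_mm, user_distance_mm):
    """Yaw (radians) from the local user towards the centre of a remote user's window."""
    offset = window_centre_x - display_width_mm / 2.0
    return math.atan2(offset, user_distance_mm)

def render_per_user_views(representation_3d, remote_windows, display_width_mm,
                          user_distance_mm, render_view):
    views = {}
    for user_id, win in remote_windows.items():      # win: {'x': ..., 'w': ...} in mm
        yaw = virtual_camera_yaw(win['x'] + win['w'] / 2.0,
                                 display_width_mm, user_distance_mm)
        views[user_id] = render_view(representation_3d, yaw)   # one image per remote user
    return views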
In some examples, an improvement to maintaining eye contact between the first user 204 and each remote user 206 can be provided. Eye contact is typically an indication of an ongoing discussion between two persons, in this case the first user 204 and a remote user 206. If the speaker, e.g. the first user 204, is seeking to switch the conversation to include or put a question to a new person, this is typically "indicated" by the first user 204 changing their gaze towards the new person.
In some examples, in order to provide improved eye contact and not make the first and second users 204, 206 feel uncomfortable (e.g. feel "uncanny"), the 3D representation is very precisely centred. In some examples, the 3D representation is centred to within a milliradian.
Depending on the culture and social structure of the participants, keeping eye contact "too long" between the first user 204 and the second user 206 may feel uncomfortable. In some examples, the videoconferencing controller 110 is configured to use an AI solution based on speech in order to determine the meaning of the conversation, body language, social markers, and the adaptation of the participants. This means that the videoconferencing controller 110 is configured to determine how the participants react during the videoconference, e.g. how the participants react to the 3D representation. By modifying the 3D representation in response to the reactions of the participants, the videoconferencing controller 110 may overcome some of the "uncanny" feeling for the participants.
In some alternative examples, the videoconferencing controller 110 can create an avatar for the first and second users 204, 206 with smart, minimalistic artistic representations of the participants, which can further reduce the "uncanny" feeling.
In some examples, there can be more than one person in front of the first videoconferencing terminal 100. For example, two people can stand side by side in front of the first videoconferencing terminal 100. In this case, separate cameras, e.g. the first, second or third cameras 210, 212, 224, are each used for maintaining eye contact with a different person. In some further examples, there can be first and second arrays (not shown) of first, second and third cameras 210, 212, 224, and each of the first and second arrays is configured to operate separately as discussed with reference to the previously described examples.
In another example, two or more examples are combined. Features of one example can be combined with features of other examples.
Examples of the present disclosure have been discussed with particular reference to the examples illustrated. However, it will be appreciated that variations and modifications may be made to the examples described within the scope of the disclosure.

Claims

1. A method of videoconferencing between a first user and a second user comprising: receiving at least one first image data of the first user from a first camera mounted behind a first display, the first display configured to be optically transmissive with respect to the first camera; receiving at least one second image data of the first user from a second camera; generating modified image data based on the first image data and the second image data; and sending a display signal comprising the modified image data configured to be displayed for the second user on a second display.
2. A method according to claim 1 wherein the first camera is a near-infrared camera.
3. A method according to claims 1 or 2 wherein the second camera is a visible spectrum camera.
4. A method according to any of claims 1 to 3 wherein the modifying comprises recolouring the first image data based on the second image data.
5. A method according to any of claims 1 to 4 wherein the recolouring of the first image data is based on one or more of image artifacts, colour space, brightness, contrast, position, perspective in the second image data.
6. A method according to any of the preceding claims wherein the method comprises calibrating with initial data sets received respectively from the first camera and the second camera.
7. A method according to any of the preceding claims wherein the method comprises training a neural network with the at least one second image data.
8. A method according to claim 7 wherein the method comprises recolouring the first received image data with the neural network trained with the at least one second image data.
9. A method according to any of the preceding claims wherein the method comprises illuminating the at least one first user with one or more near-infrared lights.
10. A method according to claim 9 wherein the one or more near-infrared lights is a ring light mounted behind the display.
11. A method according to any of the preceding claims wherein the second camera is mounted on a side of the display.
12. A method according to any of the preceding claims wherein the at least one first camera comprises a plurality of different first cameras mounted at different locations behind the display.
13. A method according to any of the preceding claims wherein at least one second image data from the second camera is only received during a set-up operation.
14. A method according to any of the preceding claims wherein the first camera is mounted in a display stack of the first display.
15. A method according to any of the preceding claims wherein the first camera comprises a higher resolution than the second camera.
16. A method according to any of the preceding claims wherein both the first and second cameras are integrated into the display.
17. A method according to any of the preceding claims wherein the first and / or the second camera are mounted in a non-illuminating portion of the display.
18. A videoconferencing device for videoconferencing between a first user and a second user comprising: a display configured to display an image of the second user; a first camera configured to capture at least one first image data of the first user, the first camera mounted behind the display, the display configured to be optically transmissive with respect to the first camera; a second camera configured to receive at least one second image data of the first user; and a controller comprising a processor, the controller configured to generate modified image data based on the first and second image data; and send a display signal comprising the modified image data configured to be displayed.
19. A video conferencing device for video conferencing between a local user and a plurality of remote users comprising: a display configured to display a first remote user image and a second remote user image; a first camera mounted behind the display configured to capture at least one first image of the local user; a second camera mounted behind the display configured to capture at least one second image of the local user; wherein the display is configured to be optically transmissive with respect to the first camera and the second camera; and a controller, having a processor, the controller being configured to send a display signal to the display to position the first remote user image over the first camera; detect a change in the gaze direction of the local user from the first remote user image to the second remote user image based on the first images and the second images of the local user; and position the second remote user image over the second camera in dependence of detecting the change in gaze direction from the first remote user image to the second remote user image.
20. A video conferencing device according to claim 19 wherein the at least one first image of the local user is displayed on a first remote user display and the at least one second image of the local user is displayed on a second remote user display.
21. A method of video conferencing between a local user and a plurality of remote users comprising: displaying a first remote user image and a second remote user image on a display; receiving at least one first image data of the local user from a first camera mounted behind a first display, the first display configured to be optically transmissive with respect to the first camera; receiving at least one second image data of the local user from a second camera; sending a display signal for displaying the first remote user image over the first camera; detecting a change in the gaze direction of the local user from the first remote user image to the second remote user image based on the first images and the second images of the local user; and positioning the second remote user image over the second camera in dependence of detecting the change in gaze direction from the first remote user image to the second remote user image.
PCT/SE2022/050846 2021-09-24 2022-09-23 A videoconferencing system and method with maintained eye contact WO2023048630A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE2130257 2021-09-24
SE2130257-5 2021-09-24

Publications (1)

Publication Number Publication Date
WO2023048630A1 (en) 2023-03-30

Family

ID=85719575

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2022/050846 WO2023048630A1 (en) 2021-09-24 2022-09-23 A videoconferencing system and method with maintained eye contact

Country Status (1)

Country Link
WO (1) WO2023048630A1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080297589A1 (en) * 2007-05-31 2008-12-04 Kurtz Andrew F Eye gazing imaging for video communications
WO2012008972A1 (en) * 2010-07-16 2012-01-19 Hewlett-Packard Development Company, L.P. Methods and systems for establishing eye contact and accurate gaze in remote collaboration
US20120169838A1 (en) * 2011-01-05 2012-07-05 Hitoshi Sekine Three-dimensional video conferencing system with eye contact
ES2399263A1 (en) * 2010-10-22 2013-03-27 Ignacio Mondine Natucci Videoconference system for remote bidirectional collaboration on a computer screen. (Machine-translation by Google Translate, not legally binding)
US20140313277A1 (en) * 2013-04-19 2014-10-23 At&T Intellectual Property I, Lp System and method for providing separate communication zones in a large format videoconference
WO2014190221A1 (en) * 2013-05-24 2014-11-27 Microsoft Corporation Object display with visual verisimilitude
US20170264864A1 (en) * 2006-03-18 2017-09-14 Steve H MCNELLEY Advanced telepresence environments
US20200045261A1 (en) * 2018-08-06 2020-02-06 Microsoft Technology Licensing, Llc Gaze-correct video conferencing systems and methods
WO2020153890A1 (en) * 2019-01-25 2020-07-30 Flatfrog Laboratories Ab A videoconferencing terminal and method of operating the same
WO2021011083A1 (en) * 2019-07-18 2021-01-21 Microsoft Technology Licensing, Llc Dynamic detection and correction of light field camera array miscalibration

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22873281

Country of ref document: EP

Kind code of ref document: A1