GB2594333A - A videoconferencing system - Google Patents


Info

Publication number
GB2594333A
Authority
GB
United Kingdom
Prior art keywords
media stream
videoconferencing
local
stream
endpoint device
Prior art date
Legal status
Pending
Application number
GB2006073.7A
Other versions
GB202006073D0 (en)
Inventor
Joseph Nicolson Timothy
Martin Klingberg Tor
Current Assignee
Starleaf Ltd
Original Assignee
Starleaf Ltd
Priority date
Filing date
Publication date
Application filed by Starleaf Ltd filed Critical Starleaf Ltd
Priority to GB2006073.7A priority Critical patent/GB2594333A/en
Publication of GB202006073D0 publication Critical patent/GB202006073D0/en
Publication of GB2594333A publication Critical patent/GB2594333A/en
Pending legal-status Critical Current

Classifications

    • H04L 65/4038: Arrangements for multi-party communication, e.g. for conferences, with floor control
    • H04N 7/15: Conference systems
    • H04L 65/1066: Session management
    • H04L 65/403: Arrangements for multi-party communication, e.g. for conferences
    • H04L 65/764: Media network packet handling at the destination
    • H04L 67/561: Adding application-functional data or data for application control, e.g. adding metadata
    • H04M 3/567: Multimedia conference systems
    • H04N 7/147: Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H04N 7/152: Multipoint control units therefor
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04M 2201/38: Displays


Abstract

A videoconferencing system 10 comprises a first device 101 (e.g. a videoconferencing bridge) to provide a media stream (or plural streams), and a telecommunication endpoint device 3 to receive the media stream over the internet from the first device 101. The first device 101 further sends a control signal (e.g. a display change instruction, metadata) to the telecommunication endpoint device 3. The control signal comprises display configuration data related to a display configuration of the media stream to be displayed by the telecommunication endpoint device 3, the display configuration requiring video processing, e.g. combining plural streams or display animations. The telecommunication endpoint device 3 carries out at least a portion of the video processing of the display configuration, e.g. positioning and/or sizing media stream(s). The endpoint device 3 may further display a local media stream, e.g. a local camera or content stream. Control signal metadata may comprise a media stream identifier, a stream display position or size, a rendering order or Z index, a stream opacity, and/or a media stream brightness. Device synchronisation may be performed based on measured latency. The control signal may be based on audio energy detection, a control interface, and/or a media stream quality rating.

Description

A VIDEOCONFERENCING SYSTEM
FIELD OF THE INVENTION
The present invention relates to a videoconferencing system and method.
BACKGROUND OF THE INVENTION
Video processing in the context of videoconferencing has typically been considered a central processing unit (CPU) intensive operation. The video processing has typically been reserved for a central device, such as a videoconferencing bridge, in a telecommunication network. In such systems, telecommunication endpoints provide only video encoding and decoding but do not carry out video processing themselves.
Video protocols have been built around this model. In particular, content streams for a videoconference, such as the screen output of a laptop computer or a document camera, are kept separate from a video camera output stream. The output from a video camera is delivered to a receiving endpoint or party separately from a content stream output. The intent is that dedicated hardware in the endpoints can take the stream from a video camera and render it on one display, and take the content stream output and render that on another display. In the case of single-display endpoints, the endpoint may switch between rendering the output from a video camera and rendering the content stream.
Figure 1, showing an example prior art arrangement, illustrates a known videoconferencing system 1. The videoconferencing system 1 comprises a videoconferencing bridge 101 implemented in a cloud service. The videoconferencing bridge 101 comprises the capability for video processing using a video mixer 9. The system 1 also comprises a local telecommunication endpoint device 3, in connection over a wide area network with the videoconferencing bridge 101. In connection with the local telecommunication endpoint device 3 is a local camera device 4, and two displays including a local camera display 6 and a local content display 7.
The videoconferencing bridge 101 is also in connection with remote endpoints 2, 22 over the wide area network. In this prior art example, a remote endpoint 2 is connected over a local connection to a remote camera device 14 and a remote content device 15.
Media streams are sent from the remote endpoints 2, 22 to the videoconferencing bridge 101. The video mixer 9 of the videoconferencing bridge 101 combines one or more of these media streams into a combined media stream. The combined media stream is then sent to the local telecommunication endpoint device 3. The local telecommunication endpoint device 3 then displays the media stream or streams received from the videoconferencing bridge 101, along with any local media stream from local devices such as the local camera device 4, on the displays 6, 7.
SUMMARY OF THE INVENTION
The invention is defined by the independent claims to which reference should now be made. Optional features are set forth in the dependent claims.
The inventors of the arrangements described herein have appreciated that, with the increasing speeds of general purpose processors and graphics processing units (GPUs), endpoints have become increasingly capable of providing at least some video processing themselves. In addition, the inventors of the present invention have not only appreciated that it is possible for the endpoints themselves to provide at least some of the video processing, but have also appreciated further benefits made possible by such a system.
By sharing the video processing tasks between a centralised videoconferencing bridge comprising media stream mixing capabilities and one or more endpoint devices also comprising media stream mixing capabilities, flexible video stream organisation, layout, animations and effects for a videoconferencing system may be provided.
Arrangements of the present disclosure provide videoconferencing between a plurality of endpoints, each of which may provide one or more media streams to a videoconferencing bridge. Both the videoconferencing bridge and the endpoints utilize their own media stream mixing capabilities. By utilizing the video processing capabilities of the endpoint devices themselves, an efficient, flexible, and responsive user experience is provided. As such, the arrangements of the present disclosure achieve an enhanced user experience of videoconferencing by providing a combination of centralized and localized media stream mixing, such as video stream mixing.
According to an aspect of the present invention, there is provided a videoconferencing system, the videoconferencing system comprising: a first device configured to provide a media stream; and a telecommunication endpoint device configured to receive, over the internet from the first device, the media stream, wherein: the first device is further configured to send a control signal to the telecommunication endpoint device, the control signal comprising display configuration data related to a display configuration of the media stream to be displayed by the telecommunication endpoint device, the display configuration requiring video processing; and the telecommunication endpoint device is configured to carry out at least a portion of the video processing of the display configuration. By providing a telecommunication endpoint device which is configured to carry out at least a portion of the video processing of the display configuration, a flexible video stream may be provided, such as flexible or smart video stream organisation, layout, and animations.
The user experience is significantly improved when the telecommunication endpoint device provides at least some of the video processing. For example, a user interface at the telecommunication endpoint device may provide a greater range of customization, including use of animation effects to help a user track changes during a video conference.
The first device may be a device central to a videoconferencing network. For example, the first device may be a videoconferencing bridge. A videoconferencing bridge may be a device through which telecommunications are connected or routed between two or more telecommunication endpoint devices. The videoconferencing bridge may be implemented in a cloud service, where a cloud service comprises a plurality of servers, in particular servers remote from the telecommunication endpoint device and located on a different site.
The media stream may comprise a plurality of streams. The video processing may comprise combining at least some of the plurality of streams. That is, at least some of the plurality of media streams may be combined to form a combined media stream. This may be carried out in part by the first device before sending said combined media stream to the telecommunication endpoint device. In addition, the first device may send a plurality of separate, or un-combined, media streams to the telecommunication endpoint device, and the telecommunication endpoint device may then combine at least some of the plurality of media streams into a combined media stream. In one embodiment, both the first device and the telecommunication device each combine at least some of the plurality of media streams into combined media streams, resulting in the telecommunication device managing and displaying one or more combined or non-combined media streams.
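The combining described above can be illustrated with a minimal sketch. The patent does not prescribe any particular layout policy; the `grid_layout` helper below is a hypothetical example of how a mixer (in the first device or the endpoint) might assign one display rectangle per incoming stream when tiling them into a combined view.

```python
import math

def grid_layout(num_streams, canvas_w, canvas_h):
    """Compute a simple tiled layout: one (x, y, w, h) rectangle per stream.

    Hypothetical helper for illustration only; real mixers may use
    importance-weighted layouts rather than a uniform grid.
    """
    cols = math.ceil(math.sqrt(num_streams))   # near-square grid
    rows = math.ceil(num_streams / cols)
    tile_w, tile_h = canvas_w // cols, canvas_h // rows
    return [
        (i % cols * tile_w, i // cols * tile_h, tile_w, tile_h)
        for i in range(num_streams)
    ]
```

For four streams on a 1920x1080 canvas this yields a 2x2 grid of 960x540 tiles, which the mixer would fill by scaling each decoded stream into its rectangle.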
The media stream or plurality of media streams may comprise a remote stream from one or more of a remote endpoint device or remote content device. That is, the first device may receive media streams from one or more remote endpoint devices or remote content devices. Said media streams may then be handled, routed, or combined as above. For example, a remote user may initiate a video telecommunication from a remote endpoint device. The remote endpoint device may provide a media stream including a video stream from a remote video camera. Instead, or in addition, the user may provide a remote content stream via a remote content device, such as a desktop computer, a laptop computer or a tablet device.
The telecommunication endpoint device may also be configured to display a local media stream. The local media stream may comprise one or more of a local camera stream or a local content stream. A local camera stream may be a media stream from a local video camera device used by a user at the telecommunication endpoint device. A local content stream may be a media stream from a local computer device such as a desktop computer, a laptop computer or tablet device. Local devices, such as the local camera device or local content device, may be in connection with the telecommunication endpoint device by a local connection. The local connection may include a wired connection, a connection over a wireless local area network (WLAN) such as a Wi-Fi network, or a Bluetooth connection.
The display configuration may comprise one or more of: a display position of the media stream and/or local media stream; or a display size of the media stream and/or local media stream. In other words, the display configuration may comprise information defining the size and/or position of each media stream to be displayed by the telecommunication endpoint device.
The video processing may comprise one or more animations for displaying the media stream and/or the local media stream. Animations, or media animations, or effect animations, may be used to change the display configuration. Such animations may include, for example, on screen transitions. Such animations may initiate when the telecommunication endpoint device receives the control signal from the first device, and the display configuration data includes instructions or data to change the display of a video camera stream to a content stream. The animations may include animations or transitions that are instantaneous. The animations may be used to move, remove, add, overlay, resize, rearrange, or change in any other way, the display configuration including the display of media streams displayed by the telecommunication endpoint device.
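As a sketch of how an endpoint might execute such a transition, the hypothetical `animate_rect` helper below interpolates a stream's display rectangle frame by frame between a starting and ending position/size; the endpoint's GPU would draw the stream into each intermediate rectangle in turn. The function name and frame-count parameter are illustrative assumptions, not part of the patent.

```python
def animate_rect(start, end, num_frames):
    """Linearly interpolate a display rectangle (x, y, w, h) over num_frames.

    Illustrative only: one way an endpoint could realise a move/resize
    transition requested by the control signal.
    """
    frames = []
    for i in range(num_frames):
        # t runs from 0.0 (start rect) to 1.0 (end rect)
        t = i / (num_frames - 1) if num_frames > 1 else 1.0
        frames.append(tuple(round(a + (b - a) * t) for a, b in zip(start, end)))
    return frames
```

An instantaneous transition, as mentioned above, is simply the degenerate case `num_frames == 1`, which jumps straight to the final rectangle.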
The first device may be configured to carry out at least a portion of the video processing.
That is, the first device and the telecommunication endpoint device each carry out at least a portion of the video processing. For example, as explained above, both the first device and the telecommunication device may combine one or more of a plurality of media streams into combined media streams. The telecommunication endpoint device may also carry out video processing including providing animations and/or effects to the media stream(s).
The control signal may comprise an instruction based on changes made to the display configuration. Changes to the display configuration may comprise a change in the media streams, such as the addition, combination, resizing, repositioning, and/or removal of a media stream. Changes to the display configuration may also include a user manually making a change to the display configuration. Such changes may be made via a user input at or on the telecommunication endpoint device. For example, a user may manually arrange a display configuration, such as the position and/or size of media streams shown on the user's display. For example, a user may reposition or resize media streams shown on their display. This may then trigger an animation to execute such a change to the display configuration.
The first device or videoconferencing bridge may make control decisions in the system, such as by issuing high-level commands with the control signal to the telecommunication device, instructing how each media stream or window is to be displayed. This is advantageous as the videoconferencing bridge will be well aware of all aspects of the video conference, such as the current speaker at a respective endpoint device based on audio energy detection, the control interface for a managed conference, and the video stream quality for each stream. By allowing the videoconferencing bridge to provide high-level commands in the form of the control signal, or make decisions or issue instructions about the display configuration or layout of the media streams, it also allows the video processing to provide proper interaction between different animation effects applied to the media streams.
The media stream may comprise a plurality of frames. In an alternative embodiment to that in which the videoconferencing bridge issues high-level commands in the form of the control signal directly instructing the telecommunication endpoint device how the media streams should be displayed, the control signal may comprise metadata relating to each of the plurality of frames. The first device may further be configured to send the metadata related to each frame of the plurality of frames with each frame of the plurality of frames to the telecommunication endpoint device. The metadata may comprise display configuration data for each frame of the plurality of frames. For example, the metadata sent with each frame may comprise information or instructions as to how that frame should be displayed at the telecommunication endpoint device and/or each remote endpoint. The metadata may comprise one or more of: an identifier of the media stream, a display position of the media stream (such as the position at which it is to be displayed on a display), a size of the media stream (such as the data size and/or the display size at which it is to be displayed on a display), a rendering order or Z index of the media stream, an opacity of the media stream, or a brightness of the media stream.
The Z index of the media stream comprises a stack order of the media stream, where elements with greater stack order appear in front of those with a lower stack order. The metadata may also comprise other visual effects found in user interfaces, such as effects described by the Hypertext Markup Language (HTML) Cascading Style Sheets (CSS) filter property, that could be applied to the media stream such as blurring or tinting.
The metadata may also comprise images, or rich text information to be presented on the screen, such as labels identifying the participants, or an indicator for a current speaker. By specifying the composition metadata in every frame, the telecommunication endpoint device can render a wide array of different animations without the need for additional coding, for example in the client, because the metadata describes frame by frame how each animation should progress. The metadata may additionally be used to modify the user interface, such as changing the appearance of a self-view button when self-view of a camera device is enabled or disabled.
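A minimal sketch of such per-frame metadata is shown below. The field names are illustrative assumptions (the claims only list the kinds of information carried, not a schema); the `render_order` helper shows how an endpoint could use the Z index to draw streams back to front so that higher stack-order streams appear in front.

```python
from dataclasses import dataclass

@dataclass
class FrameMetadata:
    """Per-frame display metadata for one media stream (illustrative schema)."""
    stream_id: str      # identifier of the media stream
    x: int              # display position, pixels
    y: int
    width: int          # display size, pixels
    height: int
    z_index: int        # stack order: higher values are drawn in front
    opacity: float      # 0.0 (transparent) .. 1.0 (opaque)
    brightness: float = 1.0

def render_order(metadata_list):
    """Sort streams back-to-front so the highest z-index stream is drawn last."""
    return sorted(metadata_list, key=lambda m: m.z_index)
```

Because a fresh `FrameMetadata` arrives with every frame, the bridge can drive an animation simply by varying these values frame by frame, with no animation logic in the endpoint itself.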
The inventors have appreciated that having the telecommunication endpoint device carry out at least a portion of the video processing introduces further ways in which arrangements of the present invention may be further improved. For example, it would be advantageous for the first device and telecommunication endpoint device to be synchronised. In particular, it would be beneficial for at least the GPU of the telecommunication endpoint device to be synchronised with the first device. For example, where the telecommunication endpoint device provides video processing such as animation effects, if the GPU of the telecommunication endpoint device fades in an overlay (as an example of an effect) too soon, then the stream will look disorganised to a user. In contrast, if an overlay is faded in too late, the stream may appear sluggish. However, the media stream received at the telecommunication endpoint device from the first device will arrive with some latency, and thus the latency should be taken into account when calculating the time at which video processing, for example animations shown locally at the telecommunication endpoint device, is started. There are various ways in which synchronisation may be achieved.
Network time protocol (NTP) calculations may be performed, or a measured current latency from the first device to the telecommunication endpoint device may be retrieved from a video codec and the measured current latency may be added to a start time of the media stream provided by the first device. For example, the current latency from the first device to the telecommunication endpoint device may be measured, and then the latency be taken into account before carrying out any video processing such as an animation or effect used to begin the stream from the first device.
Alternatively, frame indices of the media stream may be used to identify a frame of the media stream. This may be used to indicate when video processing, such as an animation or effect should start. This may be used instead of an absolute time map.
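Both synchronisation schemes above can be sketched in a few lines. The helper names and the use of integer millisecond timestamps are assumptions for illustration; a real system would refresh the latency estimate continuously (e.g. from the video codec, as described above).

```python
def animation_start_time(stream_start_ms, measured_latency_ms):
    """Latency-compensated scheme: local wall-clock time (ms) at which the
    endpoint should begin an animation, given the stream start time reported
    by the first device and the measured transport latency."""
    return stream_start_ms + measured_latency_ms

def should_start(current_frame_index, trigger_frame_index):
    """Frame-index scheme: start the effect when the tagged frame arrives,
    avoiding any absolute time mapping between the two devices."""
    return current_frame_index >= trigger_frame_index
```

The frame-index variant is attractive because it keys the effect to the media itself: however late the frames arrive, the animation stays aligned with the pictures it accompanies.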
The control signal and display configuration enable the telecommunication endpoint device to present the media streams to a user in a layout or configuration which is convenient and customizable, and which can accommodate various types of media streams. Different media streams comprise different properties. For example, remote camera streams typically need to be displayed with different prominence according to importance. This importance may be determined by, for example, "last speaker" or by manual selection, and such information may be included in the control signal. Local camera streams are preferably displayed with very low latency to avoid the distraction that comes from a user seeing a delayed image of themselves being displayed. Local content streams from local devices are preferably also displayed with low latency as well as maximum bandwidth, so that the look and feel of the videoconference is that the local content device is connected directly to the local display. Remote content streams from remote content devices preferably have high resolution such that small text can be read, but frame rate may be compromised in order to save bandwidth. Combined media streams received from the first device may change layout, which may be dependent on the changing nature of which media stream is of highest importance and is currently active. All such information may be included in the control signal sent from the first device to the telecommunication endpoint device.
In addition, the control signal and display configuration data enable the system to adapt to displaying the media streams when changes in the media streams occur. For example, changes could include: a change in the arrangement of remote camera; activation or deactivation of a local or remote content source; activation or deactivation of a local camera device; a change in position of a self-view of a local camera device; a change in the arrangement of a remote content device and remote camera views; activation, change, or deactivation of other windows in the system such as a chat session or information window. The display configuration data comprised in the control signal enables the system to conveniently and efficiently display any changes made, thus providing to a user a flexible or smart layout for the videoconferencing system.
The control signal may be based on one or more of: audio energy detection, a control interface, or a quality rating of the media stream. The audio energy detection may be the audio level of each media stream. For example, the control signal may be provided such that the media stream providing the highest level of audio (potentially indicating that a user in that media stream is speaking) is displayed most prominently on a display. That is, the display configuration may be adjusted based on these parameters to make said media stream largest on the display. Alternatively or in addition to the audio energy detection, the control signal may be based on a control interface. For example, a control interface or user interface may be used by a user to input or designate a desired display configuration. The control signal may then be based on the user's input into the user interface. Alternatively, or in addition, the control signal may also be based on a quality rating of the media stream. For example, similarly to the audio energy detection, the control signal may be provided such that the media stream with a highest quality is displayed most prominently, such as largest.
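The audio-energy-driven selection described above reduces to choosing the stream with the greatest measured energy. The sketch below is a hypothetical illustration of that policy (the patent leaves the exact selection and smoothing logic open; a practical system would also debounce rapid speaker changes).

```python
def most_prominent_stream(audio_levels):
    """Return the id of the stream to display most prominently, chosen as
    the one with the highest measured audio energy.

    audio_levels: dict mapping stream id -> audio energy measurement.
    Illustrative only; ties and hysteresis are not handled here.
    """
    return max(audio_levels, key=audio_levels.get)
```

The same pattern applies to the quality-rating criterion: substitute per-stream quality scores for audio energies and the highest-rated stream is displayed largest.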
In one embodiment, a videoconference may be established comprising two telecommunication endpoint devices on the same local network, such as two different parties in the same organisation. In this embodiment, local media streams may be set up between the two local telecommunication endpoint devices on the same local network. This allows these local media streams to operate on a high bandwidth, advantageously higher than if the media streams were sent via the first device over a wide area network connection. The video processing carried out by the local telecommunication endpoint devices may include overlaying this decoded local media stream in preference to using a lower quality stream sent via the first device, such as via the videoconferencing bridge in the cloud. This therefore improves the video quality (latency, frame rate, resolution) for the users of the local telecommunication endpoints. When the first device provides the local telecommunication endpoint devices with one or more media streams, the first device may exclude the local media streams which are already being overlaid locally by the local telecommunication endpoint devices. This advantageously saves server processing power and bandwidth.
According to another aspect of the present invention, there is provided a videoconferencing method, the method comprising: a first device providing a media stream; a telecommunication device receiving the media stream over the internet from the first device; the first device sending a control signal to the telecommunication endpoint device, the control signal comprising display configuration data related to a display configuration of the media stream and the display configuration requiring video processing; the telecommunication endpoint device carrying out at least a portion of the video processing of the display configuration; and displaying the media stream according to the display configuration data.
The media stream may comprise a plurality of streams. The video processing may comprise combining at least some of the plurality of streams. The media stream may comprise a remote stream from one or more of a remote endpoint device or remote content device.
The method may further comprise the step of the telecommunication endpoint device displaying a local media stream. The local media stream may comprise one or more of a local camera stream or a local content stream.
The display configuration may comprise one or more of: a display position of the media stream and/or local media stream; or a display size of the media stream and/or local media stream. The display position and display size being the position and size as seen on a display. The video processing may comprise one or more animations for displaying the media stream and/or the local media stream. The method may further comprise the step of the first device carrying out at least a portion of the video processing. The control signal may comprise an instruction based on changes made to the display configuration.
The media stream may comprise a plurality of frames and the control signal may comprise metadata relating to each frame of the plurality of frames. The first device may be further configured to send the metadata related to each frame of the plurality of frames with each frame of the plurality of frames to the telecommunication endpoint device. The metadata may comprise display configuration data for each frame of the plurality of frames. The metadata may comprise one or more of: an identifier of the media stream, a display position of the media stream, a size of the media stream, a rendering order or Z index of the media stream, an opacity of the media stream, or a brightness of the media stream.
The method may further comprise the step of synchronising the first device and the telecommunication endpoint device by: performing network time protocol calculations or retrieving from a video codec a measured current latency from the first device to the telecommunication endpoint device and adding the measured current latency to a start time of the media stream provided by the first device; or using frame indices of the media stream to identify a frame.
The control signal may be at least partially based on one or more of: audio energy detection, a control interface, or a quality rating of the media stream.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be described in more detail, by way of example, with reference to the accompanying drawings, in which:
Figure 1 is a schematic drawing of a videoconferencing system according to the prior art;
Figure 2 is a schematic drawing of a videoconferencing system embodying an aspect of the present invention;
Figure 3a is a schematic drawing illustrating a display configuration embodying an aspect of the present invention;
Figure 3b is a schematic drawing illustrating a display configuration embodying an aspect of the present invention; and
Figure 3c is a schematic drawing illustrating a display configuration embodying an aspect of the present invention.
DETAILED DESCRIPTION
An example videoconferencing system will now be described with reference to Figures 2 to 3c. The example system of Figure 2 is similar, in many respects, to the prior art system illustrated in Figure 1 described above and like features have been given like reference numerals. However, as will be described in more detail, the systems differ significantly in that in the example of the present invention illustrated by Figure 2, the telecommunication endpoint device 3 carries out at least some of the video processing.
Figure 2 illustrates an example videoconferencing system 10 comprising a first device, which in this example is a videoconferencing bridge 101. The system 10 also comprises a telecommunication endpoint device 3 which in this example will be referred to as the local endpoint device 3. The system further comprises a local camera device 4, and two displays including a local camera display 6 and a local content display 7. The local endpoint device 3 comprises the capability to perform video processing using, in this example, a GPU 10.
The videoconferencing bridge 101 is implemented in a cloud server system and comprises the capability for video processing using, in this example, a video mixer 9. In this example, the videoconferencing bridge 101 is in connection with remote endpoints 2, 22, and the remote endpoint 2 is in connection with a remote camera device 14 and a remote content device 15. The remote camera device 14 and remote content device 15 may be in connection with the remote endpoint 2 by any local connection but here they are connected over a wireless local area network connection.
The endpoints 2, 22 and the local endpoint device 3 are in connection with the videoconferencing bridge 101 via wide area network connections, which in this example are internet connections.
The local endpoint device 3 is in connection with the local camera device 4, the local camera display 6, and the local content display 7 by local wired connections. It will be appreciated that the connections may also be over a wireless local area network.
During use, a user of the local endpoint device 3 wishes to take part in a videoconference with endpoints 2, 22. The user may initiate a videoconference via a user interface (not shown) at the local endpoint device 3. The local endpoint device 3 connects to the videoconferencing bridge 101 via the internet. The videoconferencing bridge 101 also connects to remote endpoints 2, 22 via the internet. Each of the local endpoint device 3 and the remote endpoints 2, 22 provides media streams for the videoconference. Media streams from the remote endpoints 2, 22 are sent via the internet to the videoconferencing bridge 101. The videoconferencing bridge 101 has the capability to combine one or more of the media streams received at the videoconferencing bridge 101 using the video mixer 9. In this example, the video mixer 9 combines a remote camera stream 11 from the remote endpoint 22 with a remote camera stream 11 from the remote camera device 14 sent through the other remote endpoint 2 to form a combined media stream 18. Having combined these remote camera streams 11, the videoconferencing bridge 101 sends the combined media stream 18 to the local endpoint device 3. The videoconferencing bridge 101 also receives a remote content stream 19 from the remote content device 15 via the remote endpoint 2, and sends this remote content stream 19 to the local endpoint device 3 in addition to the combined media stream 18. The remote content stream 19 is not combined by the video mixer 9. The videoconferencing bridge 101 also receives a local camera stream 13 from the local camera device 4, sent via the local endpoint device 3, and then returns the local camera stream 13 to the local endpoint device 3 over the internet.
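The patent describes the video mixer 9 only as "combining" remote camera streams into the combined media stream 18, without specifying how. As a toy illustration only (pure Python, frames represented as nested lists of pixel values, and function names invented here rather than taken from the patent), a side-by-side tiling of two streams might look like:

```python
def mix_side_by_side(left_row: list, right_row: list) -> list:
    """Tile one row of two camera streams side by side; a toy stand-in
    for one possible behaviour of the bridge's video mixer (9)."""
    return left_row + right_row

def combine_streams(stream_a: list, stream_b: list) -> list:
    """Combine two frames (lists of pixel rows) into one combined frame,
    analogous to forming the combined media stream (18)."""
    return [mix_side_by_side(a, b) for a, b in zip(stream_a, stream_b)]

frame_a = [[1, 1], [1, 1]]   # 2x2 frame, e.g. from remote endpoint 22
frame_b = [[2, 2], [2, 2]]   # 2x2 frame, e.g. from remote camera device 14
print(combine_streams(frame_a, frame_b))  # → [[1, 1, 2, 2], [1, 1, 2, 2]]
```

A real mixer would of course operate on decoded video surfaces and re-encode the result; this sketch shows only the layout operation.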
The local endpoint device 3 is configured to process and display the media streams, including the combined media stream 18, the remote content stream 19, and the local camera stream 13. These streams are then displayed according to a specified display configuration.
The videoconferencing bridge 101 sends a control signal to the local endpoint device 3. The control signal comprises display configuration data related to a display configuration of the media streams to be displayed by the local endpoint device 3. The display configuration requires video processing. The display configuration of the media streams includes how the media streams will be displayed by the local endpoint device 3 on the displays 6, 7, such as the physical position and size of the media streams on a display.
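The patent does not define a wire format for the control signal; a minimal sketch of the display configuration data it might carry is shown below. All field names (`stream_id`, `x`, `z_index`, and so on) are assumptions for illustration, chosen to mirror the position, size, rendering order, and opacity attributes the description mentions.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical control-signal payload; field names are illustrative,
# not taken from the patent, which leaves the encoding unspecified.
@dataclass
class StreamLayout:
    stream_id: str       # identifier of the media stream
    x: int               # display position, pixels from left
    y: int               # display position, pixels from top
    width: int           # display size of the stream
    height: int
    z_index: int = 0     # rendering order
    opacity: float = 1.0

@dataclass
class ControlSignal:
    display_id: str      # which local display the layout targets
    layouts: list        # list of StreamLayout entries

def encode(signal: ControlSignal) -> str:
    """Serialise a control signal for transmission to the endpoint."""
    return json.dumps({
        "display_id": signal.display_id,
        "layouts": [asdict(layout) for layout in signal.layouts],
    })

signal = ControlSignal(
    display_id="local_camera_display",
    layouts=[StreamLayout("combined_18", 0, 0, 1280, 720),
             StreamLayout("self_view_37", 960, 540, 320, 180, z_index=1)],
)
print(encode(signal))
```

The endpoint would decode such a message and hand the layout to its GPU-side renderer; JSON is used here purely for readability.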
Significantly, the local endpoint device 3 has the ability to provide at least some of the video processing of the media streams using the GPU 10. The video processing includes one or more animations for displaying the media streams, and also includes mixing or combining one or more of the media streams. In this example, the GPU 10 combines or mixes media streams including the received combined media stream 18, and the local camera stream 13. The local endpoint device 3 then displays the media streams according to the control signal and display configuration data received from the videoconferencing bridge 101. In this example, the video processing provided by the local endpoint device 3 includes fade animations for the media streams. The GPU 10 of the local endpoint device 3 provides video processing to fade in the combined media stream 18 and the local camera stream 13 from the local camera device 4 and displays these media streams on the local camera display 6, and provides video processing to fade in and display the remote content stream 19 onto the local content display 7.
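One way the endpoint's fade-in could be realised is linear alpha blending of the incoming stream over the current display contents. The sketch below uses plain Python on single colour channels purely to show the arithmetic; on the GPU 10 the same blend would run as a shader over whole frames, and the 0.5-second duration is an invented example value.

```python
def fade_alpha(t: float, duration: float) -> float:
    """Linear fade-in coefficient in [0, 1] at time t seconds into the fade."""
    return max(0.0, min(1.0, t / duration))

def blend_pixel(under: int, over: int, alpha: float) -> int:
    """Alpha-blend one 8-bit channel of the fading stream over the background."""
    return round(over * alpha + under * (1.0 - alpha))

def composite_row(background: list, overlay: list, alpha: float) -> list:
    """Blend a row of pixel values; on hardware this would be a GPU shader pass."""
    return [blend_pixel(b, o, alpha) for b, o in zip(background, overlay)]

# Half-way through a 0.5 s fade, the overlaid stream is at 50% opacity.
alpha = fade_alpha(0.25, 0.5)
print(composite_row([0, 0, 0], [200, 100, 50], alpha))  # → [100, 50, 25]
```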
The local endpoint device 3 continues to provide video processing to apply any changes made to the display configuration during the video conference.
Figures 3a to 3c illustrate an example scenario in which a videoconference is taking place between remote endpoints 2, 22 and local endpoint device 3, and media streams are received at the local endpoint device 3 as set out above. The local endpoint device 3 displays the media streams in a display layout according to the control signal sent by the videoconferencing bridge 101 which comprises the display data.
Figure 3a illustrates a first instance in which a remote camera stream 11 from remote camera device 14 showing a current speaker (indicated by highest audio energy output) is displayed largest, relative to other streams, on the local camera display 6. A thumbnail or icon 33 of the other remote endpoint device 22 is also displayed. A user then requests a "self-view" of the local camera device 4 by input on the local endpoint device 3 (not shown). The local endpoint device 3 sends a request via the internet to the video mixer 9 of the videoconferencing bridge 101 requesting to provide a self-view. As illustrated in Figure 3b, the videoconferencing bridge 101 begins video processing including an animation on the combined media stream being sent to the local endpoint device 3 to rearrange the thumbnail 33 on the screen to make space 35 on the display 6 for the self-view 37 of the local camera device. The videoconferencing bridge 101 sends the control signal to the local endpoint device 3 instructing the local endpoint device 3 to begin video processing including initiating an animation, such as a fade animation, of the self-view 37 and specifying the screen position as the space 35 on the display opened up by the previous animation. As illustrated in Figure 3c, the local endpoint device 3 uses its GPU 10 to perform the animation and fade in an overlay of the local camera stream to show the self-view 37 at the screen position specified.
The animations and display of the media streams are most advantageous when there is synchronisation between the videoconferencing bridge 101 and the local endpoint device 3. In this example, synchronisation between the two components of the system is achieved by retrieving from the video codec the measured current latency from the videoconferencing bridge 101 (specifically, from the video mixer 9) to the local endpoint device 3, and adding this measured latency to the start times of the media streams provided by the videoconferencing bridge 101 to the local endpoint device 3. However, it will be appreciated that in other embodiments, the synchronisation may also be achieved by performing NTP calculations, or by using frame indices of the media streams sent from the videoconferencing bridge 101 to the local endpoint device 3 to identify the frame at which an animation should be initiated.
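The two synchronisation schemes described above reduce to simple arithmetic. The sketch below illustrates both under assumed units (seconds for times and latency, frames per second for rates); the function names and example values are invented for illustration, and a real implementation would take the latency figure from the video codec.

```python
def scheduled_start(bridge_send_time_s: float, measured_latency_s: float) -> float:
    """Latency-based scheme: the time at which the endpoint should start a
    media stream (or an animation on it) is the bridge's send time plus the
    one-way latency reported by the video codec."""
    return bridge_send_time_s + measured_latency_s

def frame_for_event(event_time_s: float, stream_start_s: float, fps: float) -> int:
    """Frame-index scheme: identify the frame at which an animation should
    begin, so both ends agree without sharing a synchronised clock."""
    return round((event_time_s - stream_start_s) * fps)

# A bridge sending at t = 100.000 s with 80 ms measured latency schedules
# the endpoint start for t = 100.080 s; at 30 fps, an event 2 s into the
# stream lands on frame 60.
print(scheduled_start(100.0, 0.080))        # → 100.08
print(frame_for_event(102.0, 100.0, 30.0))  # → 60
```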
Embodiments of the invention have been described. It will be appreciated that variations and modifications may be made to the described embodiments within the scope of the present invention. For example, instead of the videoconferencing bridge 101 issuing high-level commands with the control signal such as triggering animations that are processed by the GPU 10 of the local endpoint device 3, the videoconferencing bridge 101 sends metadata embedded in each frame of the media stream. That is, the media streams comprise a plurality of frames. The control signal comprises metadata related to each frame of the plurality of frames. The videoconferencing bridge 101 is further configured to send the metadata related to each frame of the plurality of frames with each frame of the plurality of frames to the local endpoint device 3. The metadata comprises display configuration data for each frame of the plurality of frames. In such an example, the metadata also comprises an identifier of the media stream to which that metadata relates, and a size of the media stream. However, in other embodiments, the metadata may also comprise a display position of the media stream to which the metadata relates, a rendering order or Z index of the media stream, an opacity of the media stream, and a brightness of the media stream.
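The per-frame metadata variant above can be sketched as follows. The patent lists the candidate metadata fields (stream identifier, position, size, Z index, opacity, brightness) but defines no container format, so the length-prefixed JSON header used here, and all field names, are assumptions for illustration only.

```python
import json
from dataclasses import dataclass, asdict

# Illustrative per-frame metadata record; field names are assumptions,
# mirroring the options the description lists, not a defined encoding.
@dataclass
class FrameMetadata:
    stream_id: str          # identifier of the media stream
    frame_index: int
    x: int                  # display position
    y: int
    width: int              # size of the media stream on screen
    height: int
    z_index: int = 0        # rendering order
    opacity: float = 1.0
    brightness: float = 1.0

def attach_metadata(frame_payload: bytes, meta: FrameMetadata) -> bytes:
    """Prefix a frame with length-delimited metadata so the endpoint can
    apply the layout for exactly this frame (a sketch, not a real container)."""
    header = json.dumps(asdict(meta)).encode()
    return len(header).to_bytes(4, "big") + header + frame_payload

packet = attach_metadata(b"\x00" * 16, FrameMetadata("combined_18", 42, 0, 0, 1280, 720))
header_len = int.from_bytes(packet[:4], "big")
print(json.loads(packet[4:4 + header_len])["frame_index"])  # → 42
```

In practice such metadata would more likely travel as a codec-level or transport-level extension alongside each encoded frame rather than as an in-band prefix.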

Claims (32)

  1. A videoconferencing system, the videoconferencing system comprising: a first device configured to provide a media stream; and a telecommunication endpoint device configured to receive, over the internet from the first device, the media stream, wherein: the first device is further configured to send a control signal to the telecommunication endpoint device, the control signal comprising display configuration data related to a display configuration of the media stream to be displayed by the telecommunication endpoint device, the display configuration requiring video processing; and the telecommunication endpoint device is configured to carry out at least a portion of the video processing of the display configuration.
  2. A videoconferencing system according to claim 1 wherein the media stream comprises a plurality of streams.
  3. A videoconferencing system according to claim 2 wherein the video processing comprises combining at least some of the plurality of streams.
  4. A videoconferencing system according to any preceding claim wherein the media stream comprises a remote stream from one or more of a remote endpoint device or remote content device.
  5. A videoconferencing system according to any preceding claim wherein the telecommunication endpoint device is further configured to display a local media stream.
  6. A videoconferencing system according to claim 5 wherein the local media stream comprises one or more of a local camera stream or a local content stream.
  7. A videoconferencing system according to claim 5 or claim 6 wherein the display configuration comprises one or more of: a display position of the media stream and/or local media stream; or a display size of the media stream and/or local media stream.
  8. A videoconferencing system according to any of claims 5 to 7 wherein the video processing comprises one or more animations for displaying the media stream and/or the local media stream.
  9. A videoconferencing system according to any preceding claim wherein the first device is further configured to carry out at least a portion of the video processing.
  10. A videoconferencing system according to any preceding claim wherein the first device is a videoconferencing bridge.
  11. A videoconferencing system according to any preceding claim wherein the control signal comprises an instruction based on changes made to the display configuration.
  12. A videoconferencing system according to any of claims 1 to 10 wherein the media stream comprises a plurality of frames and the control signal comprises metadata relating to each frame of the plurality of frames, the first device being further configured to send the metadata related to each frame of the plurality of frames with each frame of the plurality of frames to the telecommunication endpoint device.
  13. A videoconferencing system according to claim 12 wherein the metadata comprises display configuration data for each frame of the plurality of frames.
  14. A videoconferencing system according to claim 12 or claim 13 wherein the metadata comprises one or more of: an identifier of the media stream, a display position of the media stream, a size of the media stream, a rendering order or Z index of the media stream, an opacity of the media stream, or a brightness of the media stream.
  15. A videoconferencing system according to any preceding claim wherein the first device and the telecommunication endpoint device are configured to be synchronised by: performing network time protocol calculations or retrieving from a video codec a measured current latency from the first device to the telecommunication endpoint device and adding the measured current latency to a start time of the media stream provided by the first device; or using frame indices of the media stream to identify a frame of the media stream.
  16. A videoconferencing system according to any preceding claim wherein the control signal is at least partially based on one or more of: audio energy detection, a control interface, or a quality rating of the media stream.
  17. A videoconferencing method, the method comprising: a first device providing a media stream; a telecommunication device receiving the media stream over the internet from the first device; the first device sending a control signal to the telecommunication endpoint device, the control signal comprising display configuration data related to a display configuration of the media stream and the display configuration requiring video processing; the telecommunication endpoint device carrying out at least a portion of the video processing of the display configuration; and displaying the media stream according to the display configuration data.
  18. A videoconferencing method according to claim 17 wherein the media stream comprises a plurality of streams.
  19. A videoconferencing method according to claim 18 wherein the video processing comprises combining at least some of the plurality of streams.
  20. A videoconferencing method according to any of claims 17 to 19 wherein the media stream comprises a remote stream from one or more of a remote endpoint device or remote content device.
  21. A videoconferencing method according to any of claims 17 to 20 further comprising the step of the telecommunication endpoint device displaying a local media stream.
  22. A videoconferencing method according to claim 21 wherein the local media stream comprises one or more of a local camera stream or a local content stream.
  23. A videoconferencing method according to claim 21 or claim 22 wherein the display configuration comprises one or more of: a display position of the media stream and/or local media stream; or a display size of the media stream and/or local media stream.
  24. A videoconferencing method according to any of claims 21 to 23 wherein the video processing comprises one or more animations for displaying the media stream and/or the local media stream.
  25. A videoconferencing method according to any of claims 17 to 24 further comprising the step of the first device carrying out at least a portion of the video processing.
  26. A videoconferencing method according to any of claims 17 to 25 wherein the first device is a videoconferencing bridge.
  27. A videoconferencing method according to any of claims 17 to 26 wherein the control signal comprises an instruction based on changes made to the display configuration.
  28. A videoconferencing method according to any of claims 17 to 26 wherein the media stream comprises a plurality of frames and the control signal comprises metadata relating to each frame of the plurality of frames, the first device being further configured to send the metadata related to each frame of the plurality of frames with each frame of the plurality of frames to the telecommunication endpoint device.
  29. A videoconferencing method according to claim 28 wherein the metadata comprises display configuration data for each frame of the plurality of frames.
  30. A videoconferencing method according to claim 28 or claim 29 wherein the metadata comprises one or more of: an identifier of the media stream, a display position of the media stream, a size of the media stream, a rendering order or Z index of the media stream, an opacity of the media stream, or a brightness of the media stream.
  31. A videoconferencing method according to any of claims 17 to 30 further comprising the step of synchronising the first device and the telecommunication endpoint device by: performing network time protocol calculations or retrieving from a video codec a measured current latency from the first device to the telecommunication endpoint device and adding the measured current latency to a start time of the media stream provided by the first device; or using frame indices of the media stream to identify a frame.
  32. A videoconferencing method according to any of claims 17 to 31 wherein the control signal is based on one or more of: audio energy detection, a control interface, or a quality rating of the media stream.
GB2006073.7A 2020-04-24 2020-04-24 A videoconferencing system Pending GB2594333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2006073.7A GB2594333A (en) 2020-04-24 2020-04-24 A videoconferencing system


Publications (2)

Publication Number Publication Date
GB202006073D0 (en) 2020-06-10
GB2594333A (en) 2021-10-27



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050248652A1 (en) * 2003-10-08 2005-11-10 Cisco Technology, Inc., A California Corporation System and method for performing distributed video conferencing
US7627629B1 (en) * 2002-10-30 2009-12-01 Cisco Technology, Inc. Method and apparatus for multipoint conferencing
US20140307042A1 (en) * 2013-03-14 2014-10-16 Starleaf Telecommunication network
US20150116451A1 (en) * 2013-10-29 2015-04-30 Cisco Technology, Inc. Panoramic Video Conference
US20150288926A1 (en) * 2014-04-03 2015-10-08 CafeX Communications Inc. Framework to support a hybrid of meshed endpoints with non-meshed endpoints


