GB2587500A

GB2587500A - Audio - visual conferencing systems

Info

Publication number: GB2587500A
Application number: GB2014834.2A
Authority: GB
Inventors: John Gwilt David; Edward Nancekievill Alexander; James Rutledge Rupert
Original assignee: Starleaf Ltd
Current assignee: Starleaf Ltd
Priority date: 2015-01-20
Filing date: 2015-01-20
Publication date: 2021-03-31
Anticipated expiration: 2035-01-20
Also published as: GB202014834D0; GB2587500B

Abstract

A method of establishing or joining an audio-visual (AV) conference call comprises: sending a first password for the call to a first conferencing equipment 152 for output to a first participant on the call; receiving a second password from the first participant via a second conferencing equipment 170; checking whether the second password corresponds to said first password; and initiating an AV conference call with the first participant in response to determining that the first and second passwords match. The first password is sent from a control system 162, 164 to the first conferencing equipment. The second password is received by the control system. The password verification method is also carried out by the control system. The first conferencing equipment is used for capture and display of a video stream of the video call. The second conferencing equipment is used for the receipt and sending of an audio stream of the video conference call. The first and second conferencing equipment are used simultaneously by the first participant within the AV conference call. The control system may comprise a control server 162 and a multipoint control unit (MCU) 164.

Description

Audio-Visual Conferencing Systems

FIELD OF THE INVENTION

This invention relates to systems, methods, apparatus and computer program code for establishing and conducting audio-visual (AV) conference calls. More particularly we will describe techniques where one or more of the participants uses separate pieces of equipment, such as a laptop and phone, to join or continue an audio-visual (AV) conference call.

BACKGROUND TO THE INVENTION

We have previously described, in WO2012/172310, systems for automatically setting up an AV conference call. Broadly speaking, in these systems video is automatically added to a pre-existing telephone call by employing a phone tap at each endpoint which recognises a connection to a teleconference service and automatically establishes a video connection between the participants. Here, and in the following description, a conference call may have only two participants or it may have three or more participants.

The techniques described previously employ phone tap hardware at each endpoint, and it would be advantageous to avoid the need for such hardware. An AV call, such as a Skype (RTM) call, can be made using just a laptop, but the quality of such calls can often be poor, with users experiencing loss of audio.

There is therefore a need for improved audio-visual (AV) conferencing techniques.

SUMMARY OF THE INVENTION

The invention is set out later under the heading "Call initiation".

By way of background and context we first describe a method of conducting an audiovisual (AV) conference call comprising at least two participants, the method comprising, having established said audio-visual (AV) conference call linking a first call endpoint comprising a first audio-visual (AV) conferencing device and a second call endpoint: providing an AV pairing code to said first AV conferencing device using said AV conference call, for output to a user at said first endpoint; receiving an audio call from a second device at said first endpoint; receiving an audio pairing code over said audio call; linking said audio call with said first call endpoint of said AV conference call, using said audio pairing code received over said audio call and said AV pairing code sent to said first AV conferencing device using said AV conference call; and replacing an audible audio stream in said AV conference call by replacing said audible audio stream from said first AV conferencing device with an audible audio stream from said second device.

Broadly speaking, in embodiments a participant is able to switch over from a combined audio and visual conference call to use a more reliable network, such as a telephone network, for the audio part of the call when the quality of the combined AV call is degraded. The switchover may be automatic, for example on detection of a reduction in quality of the AV call, or of quality below a threshold level, in particular of the audio component of the AV call to the first endpoint. Alternatively the switchover may be at the request of the user at the relevant endpoint. Although reference has been made to switchover from an audio stream associated with the first AV conferencing device to an audio stream associated with the second device (typically a fixed-line or mobile phone), this may be accomplished by muting the audio stream in the AV conference call and adding an audio stream derived from the second device or phone. Muting may be carried out either at the endpoint or at the Multipoint Control Unit (MCU), for either the outbound or the inbound audio. Thus, although the effect for the user is that one audible audio stream has been replaced by another, this may, but need not, be accomplished by replacing one audio data stream with another.
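The automatic switchover amounts to a threshold test on the quality of the audio component of the AV call, with a manual override for the user. The following is a minimal sketch only, assuming the endpoint or MCU can sample a per-interval packet-loss fraction for that audio; the metric, the 5% threshold and the "three bad intervals in a row" rule are hypothetical choices, not values taken from this document, and `SwitchoverMonitor` is an illustrative name.

```python
from collections import deque


class SwitchoverMonitor:
    """Hypothetical monitor deciding when to move audio to the phone network.

    Assumes a per-interval packet-loss fraction (0.0 to 1.0) for the audio
    component of the AV call; threshold and window are illustrative only.
    """

    def __init__(self, loss_threshold=0.05, consecutive_required=3):
        self.loss_threshold = loss_threshold
        self.consecutive_required = consecutive_required
        self.recent = deque(maxlen=consecutive_required)  # True = bad interval
        self.switched = False

    def report(self, loss_fraction):
        """Record one measurement; return True exactly once, when switchover should start."""
        self.recent.append(loss_fraction > self.loss_threshold)
        # Require several bad intervals in a row so that a single transient
        # burst of loss does not trigger a switchover.
        if (not self.switched
                and len(self.recent) == self.consecutive_required
                and all(self.recent)):
            self.switched = True
            return True
        return False

    def user_request(self):
        """Switchover requested manually by the user at the endpoint."""
        if self.switched:
            return False
        self.switched = True
        return True
```

For example, feeding the loss fractions 0.01, 0.08, 0.09, 0.12, 0.2 returns True only on the fourth report, the first point at which three consecutive intervals exceed the threshold. Requiring persistence trades a slightly slower reaction for fewer spurious switchovers.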

As described above, embodiments of the method determine that the audio pairing code received over the audio call (from the phone) corresponds to or matches the AV pairing code sent to the first AV conferencing device. Typically the first AV conferencing device will display the AV pairing code on a screen, but in principle an audio output may additionally or alternatively be used. Typically the user will re-enter the displayed code, so the code received over the phone network may simply be matched to that sent (over an IP network) to the first AV conferencing device. Potentially, however, the user may enter a modified version of the displayed AV pairing code, in which case the system may instead determine that the two codes correspond. The skilled person will appreciate that the audio call may be initiated either before or after the AV conference call, and either before or after the AV pairing code is provided to the first AV conferencing device.
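The matching step can be pictured as a lookup in a table kept by the control (rendezvous) server, with each outstanding AV pairing code stored together with the conference and endpoint it was generated for. The sketch below is illustrative only and rests on assumptions of our own: six-digit numeric codes, a five-minute lifetime, single use, and "correspondence" taken to mean equality after stripping separators such as spaces and dashes. `PairingRegistry` and its methods are hypothetical names.

```python
import secrets
import time


class PairingRegistry:
    """Hypothetical rendezvous-server table of outstanding AV pairing codes."""

    def __init__(self, digits=6, ttl_seconds=300.0, clock=time.monotonic):
        self.digits = digits
        self.ttl = ttl_seconds
        self.clock = clock
        self._codes = {}  # code -> (conference_id, endpoint_id, expiry time)

    def issue(self, conference_id, endpoint_id):
        """Generate a code unique among currently outstanding codes."""
        self._expire()
        while True:
            code = "".join(secrets.choice("0123456789") for _ in range(self.digits))
            if code not in self._codes:
                break
        self._codes[code] = (conference_id, endpoint_id, self.clock() + self.ttl)
        return code

    @staticmethod
    def normalise(entered):
        """Treat e.g. '482-913' or '482 913' as corresponding to '482913'."""
        return "".join(ch for ch in entered if ch in "0123456789")

    def redeem(self, entered):
        """Return (conference_id, endpoint_id) for a corresponding code, else None.

        Codes are single-use: a code is removed once it has been matched.
        """
        self._expire()
        entry = self._codes.pop(self.normalise(entered), None)
        return None if entry is None else entry[:2]

    def _expire(self):
        now = self.clock()
        for code in [c for c, (_, _, exp) in self._codes.items() if exp <= now]:
            del self._codes[code]
```

The returned pair is what lets the server link the audio call to a particular endpoint within a particular conference; an expiry time keeps the code "unique in time" without needing a large code space.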

In embodiments the pairing code identifies an endpoint of the AV conference call. In principle the pairing code may identify the AV conference call itself as well as the endpoint within the call but in other, generally more preferable embodiments, an additional AV conference identifier is employed (also referred to later as a (virtual) meeting room). In such embodiments preferably the AV pairing code is generated in association with an AV conference identifier for the AV conference call including the endpoint to which the AV pairing code is to be sent. The method, more particularly a control or "rendezvous" server, is thus able to link the audio pairing code received via the audio call with an AV conference identifier associated with the matching AV pairing code. This AV conference identifier may then be provided to a gateway for the audio call into a network handling the AV conference call (and/or to a multipoint control unit of the system), to facilitate replacing the relevant audible audio stream in the AV conference call with the audible audio stream from the audio call. More particularly in embodiments the gateway may make a request to the MCU that the audio call be permitted to join the AV conference call identified by the AV conference identifier. The MCU is then able to replace the audible audio stream in the AV conference call from the AV conferencing device at the first endpoint with audio from the second device (phone) at the first endpoint. As previously mentioned replacing the audible audio may comprise a process of adding in the audio from the second device and muting the audio from the first, AV conferencing device. In preferred embodiments the audio stream from the second device is synchronised with the audio stream from the first AV conferencing device so that the replaced audible audio is synchronised with the corresponding video; since both audio streams are present at the MCU this synchronisation is straightforward. 
For example, the synchronisation may be performed using any convenient correlation mechanism between the audio received from the AV conferencing device (which is itself synchronised to the video) and the audio received from the second device.
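One convenient correlation mechanism is a brute-force cross-correlation over a window of candidate lags: the lag at which the phone audio best matches the AV device's audio is the offset the MCU would need to compensate. This is a minimal sketch assuming both streams are already decoded to the same sample rate; `estimate_lag` is a hypothetical helper, and a practical implementation would more likely normalise the scores and use an FFT-based correlation.

```python
def estimate_lag(reference, delayed, max_lag):
    """Return the lag (in samples) at which `delayed` best matches `reference`.

    reference: audio from the AV conferencing device (already in sync with video)
    delayed:   audio of the same speech arriving via the phone and gateway
    A positive result means the phone audio lags the AV device's audio.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Unnormalised correlation over the overlapping samples at this lag.
        score = 0.0
        for i, x in enumerate(reference):
            j = i + lag
            if 0 <= j < len(delayed):
                score += x * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

With the lag known, the MCU could, for instance, delay the video of that endpoint by the corresponding time so that it lines up with the phone audio that replaces the AV device's audio.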

In preferred embodiments, as previously mentioned, the AV conference call is established over a data network which includes (and may consist of) an IP (internet protocol) network provided with a gateway to a phone network. Thus, for example, a connection to the first AV conferencing device, typically a static or mobile computing or similar device, may be either wired or wireless and, when wireless, may employ any of a range of wireless technologies including, but not limited to, WiFi (RTM) and 3G and 4G phone network data connections. In embodiments, however, a connection to the data network may employ IP.

This gateway may receive the audio call from the user and, optionally, is provided with an IVR (interactive virtual receptionist) or similar voice-driven prompt system to provide an audio user interface for prompting the user to enter the audio pairing code and/or for providing connection successful/unsuccessful and other messages, examples of which are described later. In embodiments the audio pairing code is sent over the phone network as DTMF (dual-tone multi-frequency) tones, but in other embodiments other techniques, such as voice recognition, may be used. In embodiments the gateway forwards the audio pairing code over the IP network to the AV conference controller or rendezvous server, which links the audio and AV pairing codes and, optionally, provides an AV conference identifier as previously described. In embodiments the gateway also forwards the audio stream from the second device (phone) to the MCU, the MCU then replacing the audible audio stream as previously described. The skilled person will recognise, however, that it is not necessary to have a separate AV conference controller (rendezvous server) and MCU; the functions of these two devices may be combined. Further, as the skilled person will appreciate, a single physical computer may implement both the rendezvous server and an MCU.
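The division of labour between gateway, control (rendezvous) server and MCU can be sketched as an in-process flow. This is an illustration, not a network implementation: the DTMF tones are assumed to have already been decoded into characters, '#' is assumed to terminate the code, and `RendezvousServer`, `Mcu`, `collect_dtmf`, `handle_incoming_call` and the IVR messages are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RendezvousServer:
    """Stand-in for the control server: maps pairing codes to conference IDs."""
    pairing_codes: dict  # audio/AV pairing code -> AV conference identifier

    def link(self, audio_pairing_code):
        # pop() makes each pairing code single-use
        return self.pairing_codes.pop(audio_pairing_code, None)


@dataclass
class Mcu:
    """Stand-in for the MCU: records which phone calls joined which conference."""
    joined: dict = field(default_factory=dict)  # call id -> conference id

    def join_audio(self, call_id, conference_id):
        self.joined[call_id] = conference_id


def collect_dtmf(digits):
    """Collect decoded DTMF digits up to a terminating '#' (assumed IVR rule)."""
    code = []
    for d in digits:
        if d == "#":
            return "".join(code)
        code.append(d)
    return None  # caller hung up or timed out before pressing '#'


def handle_incoming_call(call_id, dtmf_digits, rendezvous, mcu):
    """Gateway-side handling of one phone call; returns the IVR message."""
    code = collect_dtmf(dtmf_digits)
    if code is None:
        return "No pairing code received."
    conference_id = rendezvous.link(code)
    if conference_id is None:
        return "Pairing code not recognised."
    # Ask the MCU to admit this audio call to the identified conference; the
    # MCU can then mute the AV device's audio and play the phone audio instead.
    mcu.join_audio(call_id, conference_id)
    return "Connected. Your audio is now using the phone."
```

Keeping the code check in the rendezvous server and the media handling in the MCU mirrors the split described above, but as noted the two roles could equally live in one process.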

The above techniques may straightforwardly be extended to an AV conference call comprising three or more participants where any one or all of the participants may be provided with the facility to swap the audio part of the AV call between a combined AV device such as a laptop and an audio call made over a different network, generally a phone network.

Examples of the techniques switch the audio portion of a call from a data network to a phone network. The skilled person will recognise that the approaches we describe may be generalised, for example additionally or alternatively to an audio stream, to switch a data communication channel or stream from one network to another. Thus, for example, the audio call from the second device may additionally or alternatively comprise a data call from a second device such as a phone and aspects and embodiments of the invention contemplate switching a used data channel between a data channel associated with a first AV conferencing device and a data channel associated with a second device at the same endpoint.

There is also described a control system for conducting an audio-visual (AV) conference call comprising at least two participants, wherein said AV conference call is established linking a first call endpoint comprising a first audio-visual (AV) conferencing device and a second call endpoint, the system comprising processor control code to: provide an AV pairing code to said first AV conferencing device using said AV conference call, for output to a user at said first endpoint; receive an audio pairing code over an audio call from a second device at said first endpoint; link said audio call with said first call endpoint of said AV conference call, using said audio pairing code received over said audio call and said AV pairing code sent to said first AV conferencing device using said AV conference call; and replace an audible audio stream in said AV conference call by replacing said audible audio stream from said first AV conferencing device with an audible audio stream from said second device.

As previously described, in preferred embodiments a control or rendezvous server generates the AV pairing code and links the audio pairing code with this to identify the AV conference call; and in embodiments the MCU replaces one audible audio stream with another.

There is further described a control server for a control system as described above, in particular comprising a network connection, working memory, non-volatile program memory and a processor. The program memory stores processor control code (in any programming language) to implement one or more of the above described functions of the control server.

There is further described a multipoint control unit including at least one network connection, working memory, non-volatile program memory, and a processor. The program memory stores processor control code to implement one or more of the functions of the MCU described above.

There is further described a gateway, likewise including non-volatile program memory and a processor, stored code being configured to implement one or more functions of the above described gateway.

As the skilled person will appreciate, code and/or data to implement the control system/control servers (rendezvous server)/MCU/gateway may be distributed between a plurality of coupled components in communication with one another.

Call Initiation

We now describe techniques for initiating/establishing an audio-visual (AV) conference call in which at least one of the participants uses two different conferencing equipments, a first, such as a combination of a camera and monitor/TV, primarily for displaying video, and a second, such as a phone, primarily for audio communications.

According to a further aspect of the invention there is therefore provided a method of establishing or joining an audio-visual (AV) conference call comprising at least two participants, at least one of the participants using two different conferencing equipments, first conferencing equipment for capturing and displaying video and second conferencing equipment for receiving and sending audio, the method comprising: sending a first password from a control system for the AV conference call to said first conferencing equipment for output to a first participant on the AV conference call via said first conferencing equipment; receiving, in said control system, a second password from said first participant via said second conferencing equipment; checking, using said control system, whether said second password corresponds to said first password; and initiating an AV conference call with said first participant in response to determining that said first and second passwords correspond, wherein said AV conference call with said first participant uses said first conferencing equipment for a video stream of said AV conference call and uses said second conferencing equipment for an audio stream of said AV conference call.
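The password steps of this method can be sketched as follows. This is a minimal sketch under stated assumptions: the first password is a six-digit one-time password, each password allows a single attempt, and the endpoint has already been identified (for example from a physical location tag). `CallInitiationControl`, its methods and the `display`/`start_call` callbacks are hypothetical names; only `secrets.randbelow` and `hmac.compare_digest` are standard-library calls.

```python
import hmac
import secrets


class CallInitiationControl:
    """Hypothetical control-system logic for the two-equipment join check."""

    def __init__(self):
        self._pending = {}  # video endpoint id (e.g. camera/TV) -> first password

    def send_first_password(self, endpoint_id, display):
        """Create a one-time password and send it to the video equipment."""
        password = f"{secrets.randbelow(10**6):06d}"
        self._pending[endpoint_id] = password
        display(endpoint_id, password)  # e.g. shown on the TV or monitor

    def receive_second_password(self, endpoint_id, audio_call_id, entered, start_call):
        """Check the password entered on the phone; start the call if it corresponds."""
        expected = self._pending.pop(endpoint_id, None)  # one attempt per OTP
        # Constant-time comparison avoids leaking how many characters matched.
        if expected is None or not hmac.compare_digest(expected.encode(), entered.encode()):
            return False
        # Video via the first equipment, audio via the second equipment.
        start_call(video_endpoint=endpoint_id, audio_call=audio_call_id)
        return True
```

Because the password only ever appears on the video equipment's screen, a correct entry over the phone is evidence that the same participant is using both pieces of equipment.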

Broadly speaking, embodiments of the method employ a password, in embodiments a one-time password (OTP), to ensure that the same participant is using both the first and second conferencing equipment. This problem does not arise in embodiments of the previously described aspects of the invention because the pairing code sent to the first AV conferencing device can, in effect, serve as a password.

In some embodiments a phone number may be associated with an or each item of conferencing equipment or with each endpoint. A user may then dial the number associated with an item of equipment to access the equipment (the equipment may be labelled with its phone number). However it is preferable to be able to employ just one, common phone number for the conference call. Thus in some preferred embodiments of the above described method of establishing an AV conference call the system employs a physical location tag, also referred to later as a static pairing code.

In preferred embodiments the physical location tag identifies either the first conferencing equipment itself or a physical location, such as a room, in which the first conferencing equipment is located. In embodiments the physical location tag may take the form of a label associated with the equipment and/or room containing the equipment, in embodiments a physical label bearing an identification number or code which may be attached to the equipment and/or room. Thus in embodiments the physical location tag tags the equipment/room rather than specifying a location per se of the equipment/room. The method may then further comprise receiving the physical location tag at the control system via the second conferencing equipment. This may be achieved, for example, by the first participant speaking or entering an identification code for the first video conferencing equipment or physical meeting room into the second conferencing equipment, for example a phone. In this way the second conferencing equipment is identified as being located at the physical location of the first conferencing equipment.

In embodiments the physical location tag permits the use of a single phone number to access the conferencing service. In embodiments the combination of the phone number dialled and the location tag (which may be NULL) uniquely identifies an endpoint so that an OTP challenge can be presented to the correct TV or other screen.

Moreover, even if a malicious user were to obtain the physical location tag (static pairing code), they would still not be able to spoof an AV conference call as the first participant, because they would need to be in the physical location of the first conferencing equipment to receive the first password from the control system.

In preferred implementations of the method the control system, more particularly the control (rendezvous) server, checks that the received physical location tag corresponds to a previously stored physical location tag. The control server inhibits initiating (joining) of the AV conference call contingent upon a result of the checking, more particularly if the received physical location tag (data) is not recognised. Alternatively, rather than being checked against a previously stored physical location tag, the received tag (data) may be validated in another way, for example by contacting the user (if previously registered), or by validating with a third party, or the like.
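The tag handling above reduces to a keyed lookup: the pair (number dialled, location tag) selects the endpoint whose screen receives the OTP challenge, and an unrecognised pair inhibits joining unless some alternative validation succeeds. The table contents below are invented for illustration (the numbers are from the UK range reserved for drama use), and `ENDPOINTS`, `resolve_endpoint` and the `validator` callback are hypothetical names.

```python
# Hypothetical provisioning data: (dialled number, location tag) -> endpoint.
# A tag of None (NULL) covers a number dedicated to a single endpoint.
ENDPOINTS = {
    ("+44 20 7946 0000", "4711"): "meeting-room-3-tv",
    ("+44 20 7946 0000", "4712"): "meeting-room-4-tv",
    ("+44 20 7946 0001", None): "boardroom-tv",
}


def resolve_endpoint(dialled_number, location_tag, endpoints=ENDPOINTS, validator=None):
    """Return the endpoint to send the OTP challenge to, or None to refuse.

    `validator` is an optional fallback, e.g. checking with a registered user
    or a third party, used only when the tag is not in the stored table.
    """
    endpoint = endpoints.get((dialled_number, location_tag))
    if endpoint is None and validator is not None:
        endpoint = validator(dialled_number, location_tag)
    return endpoint  # None -> the control server inhibits joining the call
```

Knowing a tag is not enough to join as someone else: resolving it only decides which screen shows the OTP, and the caller still has to read that password from the screen.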

As previously described, in embodiments the second conferencing equipment comprises a phone coupled to a phone network, and the AV conference call employs an IP network for the video stream, with a gateway being provided between the phone network and the IP network. Again, a connection to the first (video) conferencing equipment may be either wired or wireless and, when wireless, may employ any of a range of wireless technologies including, for example, 3G and 4G phone network data connections. In embodiments the audio call may be received at the gateway. The physical location tag (static pairing code) may also be received at the gateway and forwarded to the control system, more particularly to the rendezvous server, and the gateway may be used to communicate the audio stream to/from the phone from/to the IP network. Optionally, as previously described, the gateway may be associated with a voice-driven user interface such as an IVR (interactive virtual receptionist); interactions with the audio arm of the system may employ, for example, DTMF tones or voice recognition.

As the skilled person will appreciate, a gateway as described above may be implemented as two units managed by different entities, typically physically separate and connected by a network such as an IP network. This facilitates, for example, out-sourced provision of the telephony-IP translation service (although for simplicity the figures described later show the gateway as a single unit).

In embodiments the control (rendezvous) server generates the first password and checks that the second (received) password corresponds to this. Typically, but not necessarily, the received password is matched to the sent password. In embodiments the control server is coupled to an MCU (on the IP network), and the MCU coordinates the video stream of the first conferencing equipment with the audio stream of the second conferencing equipment, preferably but not essentially synchronising these.

In preferred embodiments an AV identifier (or "virtual meeting room" identifier) is associated with the AV conference call. In embodiments the AV identifier is used to identify at the MCU which endpoints are in conference with each other, and therefore which received audio and video streams to mix together. The AV identifier may be received at the control server from the first participant, typically via the second (audio) conferencing equipment, for example as DTMF tones. In embodiments the gateway receives the AV identifier and forwards this to the control server, which checks the passwords and then provides the AV identifier back to the gateway for the gateway to initiate or maintain the audio link to the second conferencing equipment. Alternatively, however, the gateway may retain the AV identifier locally. In preferred embodiments the gateway also initiates the audio link to the MCU, although potentially this could be initiated by another element in the system. The control (rendezvous) server preferably communicates the AV identifier to the first conferencing equipment to initiate a video link with the MCU, in particular with the conference identified by the AV identifier. Alternatively, however, the video link may be initiated more directly by the control/rendezvous server.
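At the MCU, the AV identifier acts as a grouping key: the audio link set up by the gateway and the video link set up by the camera both arrive tagged with it, and streams sharing a key are mixed together. The sketch below assumes that each link also carries a participant label, so that the phone audio and camera video of one person can be paired; that label, `ConferenceMcu` and its methods are hypothetical and not taken from this document.

```python
from collections import defaultdict


class ConferenceMcu:
    """Hypothetical MCU bookkeeping: streams are grouped by AV identifier."""

    def __init__(self):
        # AV identifier -> participant -> {"video": stream id, "audio": stream id}
        self.rooms = defaultdict(dict)

    def attach(self, av_id, participant, kind, stream_id):
        """Register an audio or video link that arrived for this AV identifier."""
        self.rooms[av_id].setdefault(participant, {})[kind] = stream_id

    def mix_for(self, av_id, participant):
        """Streams to send to `participant`: everyone else in the same room."""
        return {
            other: streams
            for other, streams in self.rooms[av_id].items()
            if other != participant
        }
```

In this model the participant using two pieces of equipment simply has an audio entry from the gateway (phone) and a video entry from the camera under the same AV identifier, while participants in other virtual meeting rooms are never mixed in.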

In some preferred embodiments of the system/method the first conferencing equipment comprises a video camera coupled to a TV or monitor (here a monitor being any suitable display screen). In embodiments the video camera includes the camera itself, coupled to a processor, as well as working memory and non-volatile program memory, and a network connection. The program memory includes processor control code to communicate with the AV conference control system, in particular for displaying the first password to the first participant on the TV or monitor. In embodiments the video camera also includes processor control code for communicating with the control/rendezvous server to receive the AV identifier, and code for communicating with the MCU to initiate a video link with the MCU using the AV identifier and, optionally, code for displaying the password. The skilled person will appreciate that where the first conferencing equipment comprises a video camera coupled to a TV this functionality may be provided by a "Smart TV" incorporating a camera such as a webcam or the like. Such a camera may be either built into the TV so that the camera and TV are packaged together in a single device, or the camera may be, or may be part of, an accessory, connected to the Smart TV.

In a further aspect the invention provides a control system for establishing or joining an audio-visual (AV) conference call comprising at least two participants, at least one of the participants using two different conferencing equipments, first conferencing equipment for capturing and displaying video and second conferencing equipment for receiving and sending audio, the system comprising processor control code to: send a first password from a control system for the AV conference call to said first conferencing equipment for output to a first participant on the AV conference call via said first conferencing equipment; receive, in said control system, a second password from said first participant via said second conferencing equipment; check, using said control system, whether said second password corresponds to said first password; and initiate an AV conference call with said first participant in response to determining that said first and second passwords correspond, wherein said AV conference call with said first participant uses said first conferencing equipment for a video stream of said AV conference call and uses said second conferencing equipment for an audio stream of said AV conference call.

Preferably the control system also includes code for receiving and processing physical location tag data as previously described.

In other aspects the invention includes a control/rendezvous server for the above described control system, for establishing/joining an AV conference call; and an MCU for the above described control system. Each of these may comprise working memory, non-volatile program memory, and a processor, and may be configured by stored processor control code to implement one or more of the previously described functions of the control/rendezvous server and MCU.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects of the invention will now be further described, by way of example only, with reference to the accompanying figures in which:

Figures 1a and 1b show a conceptual illustration of switchover of an audio channel of an AV conference call onto a phone network according to an embodiment of the invention, and Figure 1c shows a block diagram of a control system for conducting (switching over) an AV conference call according to an embodiment of the invention;

Figure 2 shows message passing within the system of Figure 1c for establishing and switching over an AV conference call according to an embodiment of the invention;

Figure 3 illustrates a block diagram of a control system for establishing or joining an AV conference call according to an embodiment of the invention;

Figure 4 shows message passing within the system of Figure 3 for establishing or joining an AV conference call in a method according to an embodiment of the invention; and

Figure 5 shows a flow diagram illustrating functionality of a user interface suitable for implementing the systems/methods illustrated in Figures 1 to 4.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

We will describe a system which allows a participant in a video conference to separate their combined video and audio stream travelling across a single network into two separate streams, audio and video, travelling across two separate networks.

Embodiments of this system effectively allow users to transfer their audio from a combined video and data stream travelling over an IP network to a separate audio stream travelling over an alternative network such as, for instance, a POTS (Plain Old Telephone Service) or GSM (Global System for Mobile Communications) network, and a video stream travelling over an IP network. We will describe not only techniques for achieving this with video conference calls which are already in progress, but also techniques for initiating video calls where the streams will travel across separate networks from the outset.

During such a video conference meeting between two parties, the deterioration in the received and sent audio is much more critical than the same deterioration in the video. In a situation where the audio persists and the video fails, communication is relatively unaffected. If, however, there is good video but a participant cannot hear clearly it is much more difficult to have a meaningful discussion. If the parties are unable to hear each other communication breaks down completely, whereas if they are no longer able to see each other, communication can still continue.

Therefore separating out the video and audio streams conveys a benefit when the audio can be carried over a network which is reliable. IP networks without Quality of Service or similar higher-level services are 'best effort', typically having high bandwidth capability but no real-time guarantees. By contrast, telephone networks have usually been designed to provide a guaranteed real-time service at lower bandwidth. As an example, when an IP network being used to transfer the combined video and audio streams becomes congested, the audio quality of the associated call can deteriorate. By transferring the audio to a real-time telephone network, the audio can remain uninterrupted, maintaining a reliable audio communication with the other video conferencing participant(s).

If a user were to make a video conferencing call to another user or to multiple users, for example through a Multipoint Control Unit (MCU), their audio and video would both travel over an IP network. If congestion arises during the call, along either of the paths taken by the users, not only can the video of the call become corrupted and stutter, but the audio can exhibit drop-outs and artefacts associated with losing data in the audio stream.

We will describe techniques in which a user who is in a video conference over an IP network with other users through a Multipoint Control Unit (MCU) can transfer their audio from the IP network to a telephone network using telephony equipment that is available to them locally. In broad terms the MCU is a device used to bridge videoconferencing connections. In one example the system comprises a video endpoint, capable of initiating a video conference and sending audio and video across an IP network, a Central Control Server (Rendezvous Server), a Multipoint Control Unit (MCU), existing telephony equipment (for instance a telephone connected to a POTS or GSM network), and a Gateway, which acts to transfer the audio from the telephone network to the IP network.

Figures 1a and 1b illustrate, conceptually, operation of a system according to an embodiment of the invention. Thus in Figure 1a first and second video conference endpoints 100, 102 are connected via MCU 104. An endpoint may comprise, for example, a laptop computer equipped with a camera; each of these endpoints has a combined audio and video connection with the MCU 104 over an IP network.

In Figure 1b, due to a poor-quality audio signal, the video and audio streams have been separated at endpoint 100. The video signal is still sent from/to a laptop 100a but the audio stream is sent from/to a phone 100b such as a mobile phone.

Figure 1c shows more details of an embodiment of a video conference control system 150 for implementing the conceptual switchover illustrated in Figures 1a and 1b. Thus the system comprises a first audio-visual device 152 at a first audio-visual endpoint, comprising information technology associated with video conferencing, typically a video camera for capturing images, a microphone for detecting sound, a display for showing received static images or video, and a speaker for playing received audio. A second AV device 154, at a second audio-visual endpoint, is similar. The AV devices 152, 154 are connected to an IP network 156 such as the public Internet via respective consumer grade IP connections 158, 160, such as a "last mile" connection to the public internet.

The IP network 156 is coupled to a control or rendezvous (RV) server 162, to a multipoint control unit (MCU) 164, and to a gateway 166 through low congestion IP connections. Each of these entities typically comprises a processor operating under control of stored software (not shown) to implement the functions described later. In addition to the particular functions described later the MCU operates to manage a multiple endpoint video conference in a manner which will be understood by those skilled in the art, typically providing the capability to mix/switch audio and/or video streams, to control bandwidth (to adapt the bandwidth of an endpoint), and potentially to perform other functions such as picture-in-picture effects, transcoding and the like.

A second device comprising audio telephony equipment 170 is co-located with AV device 152 at the first AV call endpoint, and is connected to a telephony network 172 such as a public POTS telephone network or cellular network through a low-bandwidth, high-reliability connection (such as a POTS line). Telephony network 172 is coupled to IP network 156 by gateway 166; gateway 166 is coupled to IP network 156 through a low congestion IP connection. Thus a telephony network connection can be transferred to an IP network connection. The gateway 166 may comprise two (or more) physically separate units where, for example, an interface is managed by a third party; this is schematically illustrated by dashed line 166a. In preferred embodiments gateway 166 includes, or is associated with, an IVR (interactive virtual receptionist) 174. In principle the functions of the first and second devices at the first AV call endpoint may be combined in a single unit (for example a smart phone could have separate but simultaneous telephone audio and video data connections), but in general they will be separate devices.

The example system of figure 1c includes an MCU, which is convenient for a centralised AV conference call. However the skilled person will appreciate that the operation of the embodiments of the invention described below does not require an MCU as such, but merely the functionality described below (that of replacing the audio to/from the first AV endpoint). This function may be performed, for example, by server 162 or some other element within system 150. The skilled person will appreciate that the functions described below, whilst conveniently implemented in a system as shown in figure 1c, may be distributed in any desired manner.

In broad terms, after initiating the video conference with the MCU, the video endpoint displays a unique-in-time Pairing Code on the screen, which has been communicated to it by the MCU via the IP network. The user then has the option to dial a telephone number, where an Interactive Virtual Receptionist (IVR) prompts the user to enter the displayed Pairing Code. On entering the Pairing Code, the user's audio as heard over the phone line will be sent to the MCU. The MCU then alters the audio streams being sent to other participants in the meeting, muting the audio as heard by the video endpoint in the room, and playing the audio as sent by the telephone of the user who entered the pairing code. This has the overall effect of transferring the audio from the local IP network to the local telephony network. Synchronisation of the audio and video streams arriving from that user can then be achieved using pattern matching at the MCU between the two audio streams, the first from the telephone and the second from the video endpoint over the IP network.
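To make "unique-in-time" Pairing Codes concrete, the following is a minimal Python sketch of a registry such as the Rendezvous Server might keep. The class name `PairingCodeRegistry`, the six-digit numeric format and the one-hour lifetime are assumptions made for illustration; the source only requires that, while it is live, a code identifies one endpoint in one meeting.

```python
import secrets
import time
from typing import Callable, Dict, List, Optional, Tuple


class PairingCodeRegistry:
    """Hypothetical store mapping short-lived Pairing Codes to meetings.

    "Unique in time" is modelled as: no two live (unexpired) codes are equal.
    """

    def __init__(self, digits: int = 6, lifetime_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.digits = digits
        self.lifetime_s = lifetime_s
        self.clock = clock
        self._codes: Dict[str, Tuple[str, float]] = {}  # code -> (meeting, expiry)

    def _purge_expired(self) -> None:
        now = self.clock()
        for code in [c for c, (_, exp) in self._codes.items() if exp <= now]:
            del self._codes[code]

    def allocate(self, meeting_id: str, count: int = 1) -> List[str]:
        """Issue `count` fresh codes for `meeting_id` (e.g. a batch for the MCU)."""
        self._purge_expired()
        if len(self._codes) + count > 10 ** self.digits:
            raise RuntimeError("code space exhausted")
        issued: List[str] = []
        while len(issued) < count:
            code = f"{secrets.randbelow(10 ** self.digits):0{self.digits}d}"
            if code in self._codes:
                continue  # clashes with a live code: draw again
            self._codes[code] = (meeting_id, self.clock() + self.lifetime_s)
            issued.append(code)
        return issued

    def lookup(self, code: str) -> Optional[str]:
        """Return the meeting a live code belongs to, or None."""
        self._purge_expired()
        entry = self._codes.get(code)
        return entry[0] if entry else None
```

Allocating several codes in one call mirrors an MCU requesting a group of Pairing Codes in advance; expired codes are purged lazily, so a code can be reissued once its lifetime has passed.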

The skilled person will appreciate that, even though this description is applied to a video conference that is facilitated by an MCU, a similar approach for transferring the audio to a separate local network could be used if the video endpoints were configured to exchange messages with the Rendezvous Server directly.

Figures 2 and 4 show message passing between the different elements of embodiments of the invention, for the cases of transferring audio on an existing video conference call and initiating a call with separate audio and video paths respectively. The messages detailed hereafter are interpreted by software running on the different elements.

In Figure 2, we describe the messages sent when a user has already initiated a video conference and now wishes to transfer their audio from the video endpoint's combined audio and video stream to separate streams: the video stream will be sent over the local IP network and the audio stream over the local telephony network. The user rings a predetermined phone number, where they are asked to enter their Pairing Code using DTMF. The entered code is then passed between the components before an onward call is started with the MCU. The MCU then alters the audio sent to all other participants in the meeting with respect to this one user, muting the audio stream from the user's video endpoint and replacing it with the audio stream from the user's telephone, as received from the Gateway.
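The audio substitution performed by the MCU can be pictured as a mix-minus in which each participant contributes exactly one "active" audio source. The sketch below is hypothetical: the `Participant` and `Mcu` names and the model of audio as plain lists of samples are assumptions. It shows only that, once a telephone leg is attached under a Pairing Code, every other listener hears the phone audio instead of the endpoint's IP audio.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Participant:
    pairing_code: str
    endpoint_audio: List[float]                 # audio received from the video endpoint over IP
    phone_audio: Optional[List[float]] = None   # audio received via the Gateway, once paired

    def active_audio(self) -> List[float]:
        # After pairing, the endpoint's own audio is muted and the phone audio used instead.
        return self.phone_audio if self.phone_audio is not None else self.endpoint_audio


class Mcu:
    def __init__(self) -> None:
        self.participants: Dict[str, Participant] = {}

    def add(self, participant: Participant) -> None:
        self.participants[participant.pairing_code] = participant

    def attach_phone_audio(self, pairing_code: str, samples: List[float]) -> None:
        """Called when the Gateway joins as the audio stream for a given Pairing Code."""
        self.participants[pairing_code].phone_audio = samples

    def mix_for(self, listener_code: str) -> List[float]:
        """Mix-minus: sum every other participant's active audio, sample by sample."""
        sources = [p.active_audio() for code, p in self.participants.items()
                   if code != listener_code]
        length = max((len(s) for s in sources), default=0)
        return [sum(s[i] for s in sources if i < len(s)) for i in range(length)]
```

A real MCU mixes continuous, time-stamped streams rather than finished buffers; the point here is only where the substitution happens, namely in the choice of each participant's active source.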

In more detail, the protocol of figure 2 is as set out below (the vertical dashed line for the Gateway represents the audio connection to the Gateway, the solid line the IP connection).

1. The video endpoint sends a request to the MCU to start a meeting with a specific AV conference identification code or in a specific virtual meeting room, say AV conference ID/meeting room X.

2. The MCU requests a group of Pairing Codes from the Rendezvous Server, which will be assigned to video endpoints of AV conference ID X when they join AV conference ID/meeting room X.

3. The Rendezvous Server generates unique-in-time Pairing Codes and associates these with AV conference ID/meeting room X. It then passes this list of Pairing Codes on to the MCU.

4. Upon successful receipt of these Pairing Codes from the Rendezvous Server, the MCU accepts the incoming video conference call from the video endpoint, and associates with it a Pairing Code, say Y, that it received from the Rendezvous Server. That video endpoint can now be uniquely identified by the MCU using this particular Pairing Code and AV conference ID/meeting room combination.

5. Either immediately, or upon detection of poor audio over the IP network, the MCU sends Pairing Code Y to the video endpoint. The video endpoint may display it on the screen so that the user can, if they choose, enter it to transfer their audio stream to the telephone.

6. Some time later, the Gateway receives an incoming phone call, which initiates the IVR, offering the user a transfer of the audio from the existing link to the telephone network. The user interface flow diagram of Figure 5 also refers.

7. Through the IVR, the Gateway requests a pairing code from the user.

8. The user enters Pairing Code Y, which is shown on the video endpoint's screen (as communicated by the MCU to the video endpoint in step 5), by sending (for example) DTMF tones down the telephone line to the Gateway.

9. The Gateway interprets the received DTMF tones and forwards a numeric representation of these on to the Rendezvous Server.

10. The Rendezvous Server responds to the Gateway, informing it that AV conference ID/meeting room X is associated with Pairing Code Y.

11. The Gateway makes a request to the MCU to join AV conference ID/meeting room X, indicating in its initiation process that it is the audio stream associated with Pairing Code Y. The MCU now mutes outgoing audio from the video endpoint it locally associates with Pairing Code Y, and adds the audio stream from the Gateway into AV conference ID/meeting room X. Synchronisation of the new audio and video stream pair can be performed by comparing the two sources of audio from the user: the audio that was originally being sent from the video endpoint and the audio that is arriving through the transferred audio channel. The MCU pattern-matches the waveforms of these two audio streams to determine the difference (if any) in their arrival times. This difference is then used as the basis for a delay applied either to the new, transferred audio stream or to the video stream incoming from the video endpoint, in order to synchronise the two streams. The MCU applies this calculated delay by choosing when to begin playing out one or other of the two streams.
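The source says only that the MCU pattern-matches the two audio waveforms. One common way to do that is cross-correlation, sketched here in plain Python as an assumption rather than as the patented method. `reference` would be the audio that arrived with the video over IP and `delayed` the audio arriving through the Gateway, both assumed to be already decoded to the same sample rate.

```python
from typing import Sequence


def estimate_lag(reference: Sequence[float], delayed: Sequence[float], max_lag: int) -> int:
    """Lag k (in samples) maximising sum(reference[n] * delayed[n + k]).

    Brute-force cross-correlation over k in [-max_lag, max_lag]; a positive
    result means `delayed` arrives k samples after `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(reference):
            m = n + k
            if 0 <= m < len(delayed):
                score += x * delayed[m]
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag
```

At 8 kHz, for example, a result of 400 samples corresponds to 50 ms, so the MCU would start playing out the video stream 50 ms later to line it up with the telephone audio. The brute-force loop costs O(N * max_lag) operations; an FFT-based correlation is the usual choice for long buffers.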

We next describe the systems/methods for establishing or joining an AV conference call, in particular where separate audio and video streams are used from the outset. Such an approach may be employed with the system shown in figure 1c. In other implementations, however, a system of the type illustrated in figure 3 is employed (in which like elements of figure 1c are indicated by like reference numerals).

In figure 3 the AV device 152 is replaced by video apparatus 300 comprising a video camera 302 having a video output to a TV or monitor 304. Conveniently, in embodiments, the TV/monitor 304 is of the type which is able to automatically switch on in response to a signal from camera 302. Camera 302 preferably incorporates some intelligence, that is it includes a processor and stored control code to control camera video input from the camera, video output to the TV/monitor and communications over IP connection 158. More particularly the stored code preferably includes code for outputting video to display a password to the user, for communicating with rendezvous server 162, for receiving an AV conference identifier, and for communicating with the MCU for initiating a video link with the MCU.

Figure 4 details the messaging when a user wishes to initiate a call with separate audio and video streams. In this scenario, the user ringing into the Gateway enters a static identifier for the video endpoint in the room, preferably globally unique, which serves to identify to the Rendezvous Server the video endpoint that should start video with the MCU. Here the MCU does not need to alter the audio streams being sent to the other participants in the meeting, as it is aware at the time the video call is initiated with the video endpoint that the audio is arriving from another source. As such, it need not negotiate an audio channel for this video endpoint.

The static identifier may, for example, be printed on a label attached to the video endpoint 300. The static identifier is protected by the physical security of the building housing the video endpoint but additional security is preferable. Thus the system employs a further step of authentication to ensure that the user is in the room with the endpoint that will start sending video to the MCU. This authentication step, which involves a password such as a One Time Password (OTP), inhibits a malicious user from starting video with an arbitrary video endpoint were the malicious user to discover the static identifier. This issue does not arise in the previous scenario as the pairing code sent from the MCU to the video endpoint serves as this OTP authentication.
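The OTP step can be sketched as a small store keyed by the static identifier (physical location tag). This is a hypothetical illustration: the `OtpIssuer` name, the four-digit code length and the rule that an OTP is consumed on its first successful use are assumptions, not details given in the source. `secrets` supplies a cryptographically strong random source and `hmac.compare_digest` a constant-time comparison.

```python
import hmac
import secrets
from typing import Dict, Iterable, Optional


class OtpIssuer:
    """Hypothetical one-time-password store keyed by static location tag."""

    def __init__(self, known_tags: Iterable[str], digits: int = 4):
        self.known_tags = set(known_tags)
        self.digits = digits
        self._pending: Dict[str, str] = {}  # tag -> OTP awaiting confirmation

    def issue(self, tag: str) -> Optional[str]:
        """Create an OTP for the endpoint identified by `tag`, or None if the tag is unknown."""
        if tag not in self.known_tags:
            return None
        otp = f"{secrets.randbelow(10 ** self.digits):0{self.digits}d}"
        self._pending[tag] = otp  # replaces any earlier OTP for this tag
        return otp  # sent over IP to the endpoint, for display on its screen only

    def verify(self, tag: str, entered: str) -> bool:
        """Check the code typed on the phone; a correct OTP is consumed."""
        expected = self._pending.get(tag)
        if expected is None or not hmac.compare_digest(expected, entered):
            return False
        del self._pending[tag]
        return True
```

Knowing the tag alone is not enough to call `verify` successfully: the caller must also read the OTP off the screen in the room, which is exactly the property the authentication step relies on.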

In more detail, the protocol of figure 4 is as set out below (the vertical dashed line for the Gateway represents the audio connection to the Gateway, the solid line the IP connection).

1. An incoming call is received by the Gateway. This initiates the Gateway's IVR.

2. Through the IVR, the Gateway requests a Pairing Code from the user.

3. The user enters the static Pairing Code (physical location tag) made available to the user in the room, for example printed on paper, by sending DTMF tones down the telephone line to the Gateway.

4. The Gateway interprets the received DTMF tones and sends a representation of these on to the Rendezvous Server.

5. The Rendezvous Server detects that this is a known static Pairing Code (physical location tag), and informs the Gateway that it should prompt the user to follow instructions shown on the video endpoint's screen to continue to their meeting.

6. The Rendezvous Server communicates the OTP code to the video endpoint (camera 302 and TV 304), and instructs it to display on its screen the instructions on how to join the meeting, which incorporates how to input the OTP code for authentication.

7. Through the IVR, the Gateway prompts the user to follow the on-screen instructions to continue to their meeting.

8. Over DTMF the user enters their AV conference ID/virtual meeting room number, and the OTP code that is displayed on the video endpoint's screen.

9. The Gateway interprets the received DTMF tones and sends a representation of these on to the Rendezvous Server.

10. The Rendezvous Server performs validation on the information that was entered by the user. It checks that the OTP code that was entered matches the one that was sent to the video endpoint earlier, thus authenticating that the user is in the room with the video endpoint. Confirmation that the user is allowed to proceed is then passed back to the Gateway, along with the AV conference ID/virtual meeting room number that should be joined. The Gateway then initiates an audio-only call 12a to this AV conference ID/virtual meeting room on the MCU.

11. After passing authentication, the Rendezvous Server tells the video endpoint (camera 302) to start a video-only call 12b with the MCU using the AV conference ID/virtual meeting room number that was part of the user input. The entire call is now established with separate local networks for the audio and video streams.
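The routing decision at the end of this protocol can be summarised as follows. This hypothetical sketch takes the OTP check as a caller-supplied function (for instance one backed by a store of pending OTPs), and the function name and message strings are invented for illustration. It shows the Gateway and the video endpoint each being told to join the same meeting, one with audio only and the other with video only.

```python
from typing import Callable, Dict, Optional, Set


def rendezvous_complete_join(
    verify_otp: Callable[[str, str], bool],  # (static tag, entered OTP) -> authenticated?
    known_meetings: Set[str],
    tag: str,
    meeting_id: str,
    entered_otp: str,
) -> Optional[Dict[str, str]]:
    """Hypothetical handling of steps 10-11: returns the instruction each party
    receives, or None if the user must be re-prompted."""
    # Check the meeting number first so that a mistyped number does not
    # consume a single-use OTP.
    if meeting_id not in known_meetings:
        return None
    if not verify_otp(tag, entered_otp):
        return None
    return {
        # Step 10: the Gateway places an audio-only call into the meeting.
        "gateway": f"audio-only call to MCU meeting {meeting_id}",
        # Step 11: the endpoint identified by the tag starts a video-only call,
        # so no audio channel is negotiated for it.
        f"endpoint:{tag}": f"video-only call to MCU meeting {meeting_id}",
    }
```

Because both legs carry the same meeting identifier, the MCU can treat them as one participant without any later audio substitution.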

The human brain is relatively tolerant of a small lack of synchronisation between audio and video, and thus in embodiments synchronisation can be achieved simply by starting the video and audio streams at the same time. In other embodiments, however, synchronisation is achieved by continuing to send a representation of the audio stream over the IP network in synchrony with the video stream, and then correlating the received, reliable audio with this representation. For example, an envelope of the audio may be sent; this may be captured, for example, using a microphone associated with the video camera. Where the video is sent using H.264 (MPEG-4), the audio data may be sent as user data in the Network Abstraction Layer.
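As an illustration of the envelope approach, the sketch below reduces audio to a per-frame RMS level and then finds the frame offset at which the envelope received over IP best matches the envelope of the telephone audio. Both the RMS envelope and the mean-absolute-difference matching criterion are assumptions chosen for simplicity; the source does not specify how the envelope is computed or correlated.

```python
import math
from typing import List


def envelope(samples: List[float], frame_len: int) -> List[float]:
    """Per-frame RMS level; a trailing partial frame is dropped."""
    env = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return env


def best_frame_offset(ip_env: List[float], phone_env: List[float], max_offset: int) -> int:
    """Offset k (in frames) such that phone_env[i + k] best matches ip_env[i].

    Matching minimises the mean absolute difference over the overlap; offsets
    whose overlap is shorter than half of ip_env are ignored to avoid trivially
    matching short stretches of silence. Positive k: the phone audio is later.
    """
    best_k, best_cost = 0, float("inf")
    for k in range(-max_offset, max_offset + 1):
        pairs = [(ip_env[i], phone_env[i + k])
                 for i in range(len(ip_env)) if 0 <= i + k < len(phone_env)]
        if len(pairs) < max(1, len(ip_env) // 2):
            continue
        cost = sum(abs(a - b) for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

An envelope of one value per frame is far smaller than the audio itself, which is what makes it practical to carry alongside the video; the price is that the offset is resolved only to the nearest frame.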

Figure 5 shows a flow diagram of a user interface suitable for implementing the methods of both figure 2 and figure 4 described above. Thus at step 500 the user calls into the system and at step 502 the system plays a first message, for example "Please enter your pairing code, followed by hash. If you are dialling in audio only, just press hash". If the user immediately enters #, the call is an audio-only call and the procedure moves to step 504 and plays message 3, "Please enter your meeting number (AV conference ID), followed by hash". If this number/ID is not validated (506), the procedure plays message 4, "We didn't recognise your meeting number" (510), up to three times (508). If validation fails three times (508), message 5 is played, "I am afraid something has gone wrong" (512), and the procedure hangs up (514).

If a pairing code is entered following step 502, the system checks the validity of the pairing code (516). If the code is not a valid pairing code, the system checks whether the entered code is an AV conference ID/meeting number (518) and, if so, plays message 6, "You seem to have entered a meeting number. Press hash now to join the meeting as an audio-only participant; otherwise, just re-enter your pairing code" (520). If # is entered, the procedure rejoins the audio-only procedure flow (522) and plays message 7, "Please wait while we connect your call" (524). The procedure then initiates an audio call with the appropriate MCU AV conference ID/meeting room (526).

If at step 518 it is determined that the entered number is neither a pairing code nor an AV conference ID/meeting number, the procedure checks whether an incorrect code has been entered fewer than three times (528). If an incorrect code has been entered three times, the procedure joins the "failure" flow path and plays message 5 at step 512. Otherwise the procedure plays message 2, "I'm afraid we couldn't match up the pairing code you just entered. The static pairing code should be displayed on your video endpoint if you aren't already in a call, or at the bottom of the screen if you've already joined. If you want to join audio-only, just press hash" (530). If # is entered, the procedure moves to the audio-only flow and plays message 3 (504). If a pairing code is entered, the procedure returns to the validation check at step 516.

If a valid pairing code was entered, the procedure checks at step 532 whether the pairing code was a static pairing code (physical location tag) or a dynamic pairing code (generated by the rendezvous server for the AV conference ID/meeting in progress). If the code was a dynamic pairing code, the procedure moves to step 524, playing message 7, "Please wait while we connect your call". The procedure then initiates the audio call with the appropriate AV conference (526), and then replaces the audio from the AV endpoint with the audio from the telephone call at that endpoint.

If the entered code was a static pairing code (physical location tag), the procedure plays message 8, "Please enter your AV conference ID/meeting room number followed by the code on the TV screen" (534). If the code on the TV screen (the user's one-time password) is validated (536), the procedure initiates the audio call (526) as previously described. If the validation at step 536 fails, the procedure checks whether there have been three successive failures (538). If not, the procedure plays message 9, 10 or 4 (step 540) as appropriate: message 9 states, "Failed to match your security code"; message 10 states, "Please enter your AV conference ID/meeting number followed by hash"; message 4 states, "We didn't recognise your AV conference ID/meeting number". The procedure then loops back to step 534. If the validation of step 536 fails three times, the procedure plays message 5, "I'm afraid something has gone wrong" (540), and the system hangs up (step 514).
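The entry-classification logic of Figure 5 can be summarised as a small dispatcher. This is a hypothetical Python sketch: the outcome labels, the `classify_entry`/`pairing_prompt` names and the assumption that dynamic codes, static tags and meeting numbers form disjoint sets are illustrative choices. The retry loop models only the pairing-code prompt, not the separate meeting-number and OTP loops.

```python
from typing import Iterable, Set


def classify_entry(entry: str, dynamic_codes: Set[str],
                   static_tags: Set[str], meetings: Set[str]) -> str:
    """Pick the next branch for one DTMF entry (the digits typed before '#')."""
    if entry == "":
        return "audio_only"           # bare '#': ask for a meeting number (504)
    if entry in dynamic_codes:
        return "connect_paired"       # dynamic pairing code (532 -> 524)
    if entry in static_tags:
        return "ask_meeting_and_otp"  # static pairing code / location tag (532 -> 534)
    if entry in meetings:
        return "offer_audio_only"     # a meeting number was typed instead (518 -> 520)
    return "invalid"                  # neither a code nor a meeting number (528)


def pairing_prompt(entries: Iterable[str], dynamic_codes: Set[str],
                   static_tags: Set[str], meetings: Set[str],
                   max_failures: int = 3) -> str:
    """Consume successive entries until one is usable or the retry limit is hit."""
    failures = 0
    for entry in entries:
        outcome = classify_entry(entry, dynamic_codes, static_tags, meetings)
        if outcome != "invalid":
            return outcome
        failures += 1
        if failures >= max_failures:
            break                     # third failure: go to the failure path
        # otherwise message 2 is played and the caller is re-prompted (530)
    return "hang_up"                  # message 5, then hang up (512 -> 514)
```

Separating the per-entry classification from the retry counting keeps each branch of the flow diagram testable on its own; a real IVR would also apply timeouts between DTMF digits.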

In embodiments the procedure of figure 5 is traversed by IVR software running on the gateway 166 of figures 1c and 3. It will be appreciated, however, that the user interface of figure 5 may be implemented in other ways, potentially for example on a display screen of the phone or a display screen of the video endpoint. Broadly speaking figure 5 describes example logic for a participant to join an AV conference/meeting room on the MCU without video being transferred, and also illustrates the optional processing of various errors and exceptions. The skilled person will appreciate that many variations to the approach of figure 5 are possible.

Broadly speaking we have described techniques which, in embodiments, enable a video stream from a laptop or other device to be paired with an audio stream from a phone. In embodiments the laptop or other device may be in a multi-way video conference with both video and audio carried over the internet. The user dials a telephone number using any telephone and, in embodiments, an "operator" asks them for a code which is displayed on their computer's screen. On entering this code the user's audio is transferred from the computer to the telephone, so that audio travels over the telephone network whilst the video is still transmitted over the internet. One or more other participants in the call now only hear the audio from the user's telephone, not from the user's computer (which is automatically muted).

Together with or separately from this, other embodiments of the systems/methods we describe pair a video stream from an intelligent video camera to an audio stream from a phone. For example the user may be in a room with the video camera and may then ring a telephone number where an operator asks them to enter a number which is displayed with the camera in the room. On entering this number a TV/monitor attached to the camera in the room then switches on and displays a challenge. The user then enters their response to the challenge using the telephone. This response may also incorporate information about the multi-way video conference they wish to join (this information may be incorporated directly or indirectly). The intelligent camera then joins the identified multi-way video conference, showing the video feeds of the other participant(s). The other participant(s) in the meeting now hear the audio from the user's telephone and see the video transmitted by the camera in the room with the user.

No doubt many other effective alternatives will occur to the skilled person. For example, the techniques we have described may also be employed where there is no video camera at one endpoint (or the video camera is not used) and a (computer) screen display is shared instead.

Aspects of the system described herein are set out in the following clauses:

1. A method of conducting an audio-visual (AV) conference call comprising at least two participants, the method comprising, having established said audio-visual (AV) conference call linking a first call endpoint comprising a first audio-visual (AV) conferencing device and a second call endpoint: providing an AV pairing code to said first AV conferencing device using said AV conference call, for output to a user at said first endpoint; receiving an audio call from a second device at said first endpoint; receiving an audio pairing code over said audio call; linking said audio call with said first call endpoint of said AV conference call, using said audio pairing code received over said audio call and said AV pairing code sent to said first AV conferencing device using said AV conference call; and replacing an audible audio stream in said AV conference call by replacing said audible audio stream from said first AV conferencing device with an audible audio stream from said second device.

2. A method as defined in clause 1 further comprising determining that said user at said first endpoint is to have said audible audio stream from said first AV conferencing device replaced by said audible audio stream from said second device.

3. A method as defined in clause 2 wherein said determining comprises receiving an audio replace request from said user via said audio call.

4. A method as defined in any preceding clause further comprising generating said AV pairing code in association with an AV conference identifier for said AV conference call; and wherein the method further comprises: retrieving said AV conference identifier for said AV conference call with said first call endpoint by matching said audio pairing code with said AV pairing code and then identifying said AV conference identifier associated with said AV pairing code; and wherein said replacing uses said AV conference identifier to identify said AV conference call in which said audible audio stream is to be replaced and said replacing uses said AV pairing code to identify said first AV conferencing device for which said audible audio stream is to be replaced.

5. A method as defined in any preceding clause wherein said AV conference call is established over a data network including an IP network; the method further comprising: providing a gateway between said IP network and a phone network; receiving said audio call at said gateway over said phone network; using said gateway to request said audio pairing code from said user; receiving said audio pairing code at said gateway; forwarding said audio pairing code over said IP network from said gateway to an AV conference controller for said linking; and forwarding said audio stream from said second device to a multipoint control unit for said replacing of said audible audio stream in said AV conference call.

6. A method as defined in any preceding clause wherein said first AV conferencing device comprises a computing device having a display screen and an audio input and output; wherein said second device comprises a phone; and wherein said AV pairing code is output to said user by displaying said AV pairing code on said display screen.

7. A method as defined in any preceding clause wherein said AV conference call comprises three or more participants; the method further comprising combining at least audio data streams of said participants at a multipoint control unit, and using said multipoint control unit for said replacing of said audible audio stream.

8. A control system for conducting an audio-visual (AV) conference call comprising at least two participants, wherein said AV conference call is established linking a first call endpoint comprising a first audio-visual (AV) conferencing device and a second call endpoint, the system comprising processor control code to: provide an AV pairing code to said first AV conferencing device using said AV conference call, for output to a user at said first endpoint; receive an audio pairing code over an audio call from a second device at said first endpoint; link said audio call with said first call endpoint of said AV conference call, using said audio pairing code received over said audio call and said AV pairing code sent to said first AV conferencing device using said AV conference call; and replace an audible audio stream in said AV conference call by replacing said audible audio stream from said first AV conferencing device with an audible audio stream from said second device.

9. A conferencing system as defined in clause 8 comprising a control server and a multipoint control unit (MCU); wherein said control server is configured to generate said AV pairing code and to link said audio pairing code and said AV pairing code to identify said AV conference call; and wherein said MCU is configured to replace said audible audio stream in said AV conference call by replacing said audible audio stream from said first AV conferencing device with an audible audio stream from said first endpoint derived from said second device.

10. A control server or multipoint control unit as recited in clause 9.

11. A method of establishing or joining an audio-visual (AV) conference call comprising at least two participants, at least one of the participants using two different conferencing equipments, first conferencing equipment for capturing and displaying video and second conferencing equipment for receiving and sending audio, the method comprising: sending a first password from a control system for the AV conference call to said first conferencing equipment for output to a first participant on the AV conference call via said first conferencing equipment; receiving, in said control system, a second password from said first participant via said second conferencing equipment; checking, using said control system, whether said second password corresponds to said first password; and initiating an AV conference call with said first participant in response to determining that said first and second passwords correspond, wherein said AV conference call with said first participant uses said first conferencing equipment for a video stream of said AV conference call and uses said second conferencing equipment for an audio stream of said AV conference call.

12. A method as defined in clause 11 further comprising: providing a physical location tag for a first endpoint of said AV conference call, wherein said physical location tag identifies said first conferencing equipment or a physical location of said first conferencing equipment; and receiving said physical location tag, at said control system, via said second conferencing equipment, such that said second conferencing equipment is identified as being located at a physical location of said first conferencing equipment.

13. A method as defined in clause 12 further comprising checking, using said control system, that said received physical location tag corresponds to a previously stored physical location tag, and wherein said initiating of said AV conference call is contingent upon a result of said checking.

14. A method as defined in clause 12 or 13 wherein said second conferencing equipment comprises a phone coupled to a phone network, wherein said AV conference call uses an IP network for said video stream, and wherein a gateway is provided between said phone network, and said IP network, the method further comprising receiving an audio call at said gateway from said phone.

15. A method as defined in clause 14, the method further comprising receiving said physical location tag at said control system, from said phone, via said phone network; receiving said second password at said control system, from said phone, via said phone network; and using said gateway to communicate said audio stream of said call to said IP network.

16. A method as defined in any one of clauses 11 to 15 wherein said control system comprises a control server to generate said first password and to check that said first and second passwords correspond, coupled to a multipoint control unit (MCU) to coordinate said video stream and said audio stream of said first participant.

17. A method as defined in clause 16 wherein said second conferencing equipment comprises a phone coupled to a phone network, wherein said AV conference call uses an IP network for said video stream, and wherein a gateway is provided between said phone network, and said IP network, the method further comprising: receiving an AV identifier for said AV conference call, from said first participant, at said control server; communicating said AV identifier from said control server to said gateway for said gateway to initiate an audio link to said MCU identified by said AV identifier; and communicating said AV identifier to said first conferencing equipment for said first conferencing equipment to initiate a video link to said MCU identified by said AV identifier.

18. A method as defined in any one of clauses 11 to 17 the method further comprising providing a video camera configured to communicate with said control system; forming said first conferencing equipment by coupling said camera to a TV or monitor; and displaying said first password to said first participant using said video camera.

19. A control system for establishing or joining an audio-visual (AV) conference call comprising at least two participants, at least one of the participants using two different conferencing equipments, first conferencing equipment for capturing and displaying video and second conferencing equipment for receiving and sending audio, the system comprising processor control code to: send a first password from a control system for the AV conference call to said first conferencing equipment for output to a first participant on the AV conference call via said first conferencing equipment; receive, in said control system, a second password from said first participant via said second conferencing equipment; check, using said control system, whether said second password corresponds to said first password; and initiate an AV conference call with said first participant in response to determining that said first and second passwords correspond, wherein said AV conference call with said first participant uses said first conferencing equipment for a video stream of said AV conference call and uses said second conferencing equipment for an audio stream of said AV conference call.

20. A control system as defined in clause 19 further comprising code to receive a physical location tag, said physical location tag identifying said first conferencing equipment or a physical location of said first conferencing equipment; and check that said received physical location tag corresponds to a previously stored physical location tag, and wherein said initiating of said AV conference call is contingent upon a result of said checking.

21. A control system as defined in clause 20 comprising a control server to generate said first password and to check that said first and second passwords correspond, coupled to a multipoint control unit (MCU) to coordinate said video stream and said audio stream of said first participant.

22. A video camera for the control system of clause 19, 20 or 21, wherein said first conferencing equipment comprises said video camera coupled to a TV or monitor; and wherein said video camera is configured to communicate with said control system to display said first password to said first participant on said TV or monitor.

It will be understood that the invention is not limited to the described embodiments and encompasses modifications apparent to those skilled in the art lying within the scope of the claims appended hereto.

Claims

CLAIMS: 1. A method of establishing or joining an audio-visual (AV) conference call comprising at least two participants, at least one of the participants using two different conferencing equipments, first conferencing equipment for capturing and displaying video and second conferencing equipment for receiving and sending audio, the method comprising: sending a first password from a control system for the AV conference call to said first conferencing equipment for output to a first participant on the AV conference call via said first conferencing equipment; receiving, in said control system, a second password from said first participant via said second conferencing equipment; checking, using said control system, whether said second password corresponds to said first password; and initiating an AV conference call with said first participant in response to determining that said first and second passwords correspond, wherein said AV conference call with said first participant uses said first conferencing equipment for a video stream of said AV conference call and uses said second conferencing equipment for an audio stream of said AV conference call.
2. A method as claimed in claim 1 further comprising: providing a physical location tag for a first endpoint of said AV conference call, wherein said physical location tag identifies said first conferencing equipment or a physical location of said first conferencing equipment; and receiving said physical location tag, at said control system, via said second conferencing equipment, such that said second conferencing equipment is identified as being located at a physical location of said first conferencing equipment.
3. A method as claimed in claim 2 further comprising checking, using said control system, that said received physical location tag corresponds to a previously stored physical location tag, and wherein said initiating of said AV conference call is contingent upon a result of said checking.
4. A method as claimed in claim 2 or 3 wherein said second conferencing equipment comprises a phone coupled to a phone network, wherein said AV conference call uses an IP network for said video stream, and wherein a gateway is provided between said phone network and said IP network, the method further comprising receiving an audio call at said gateway from said phone.
5. A method as claimed in claim 4, the method further comprising receiving said physical location tag at said control system, from said phone, via said phone network; receiving said second password at said control system, from said phone, via said phone network; and using said gateway to communicate said audio stream of said call to said IP network.
6. A method as claimed in any one of claims 1 to 5 wherein said control system comprises a control server to generate said first password and to check that said first and second passwords correspond, coupled to a multipoint control unit (MCU) to coordinate said video stream and said audio stream of said first participant.
7. A method as claimed in claim 6 wherein said second conferencing equipment comprises a phone coupled to a phone network, wherein said AV conference call uses an IP network for said video stream, and wherein a gateway is provided between said phone network and said IP network, the method further comprising: receiving an AV identifier for said AV conference call, from said first participant, at said control server; communicating said AV identifier from said control server to said gateway for said gateway to initiate an audio link to said MCU identified by said AV identifier; and communicating said AV identifier to said first conferencing equipment for said first conferencing equipment to initiate a video link to said MCU identified by said AV identifier.
8. A method as claimed in any one of claims 1 to 7 the method further comprising providing a video camera configured to communicate with said control system; forming said first conferencing equipment by coupling said camera to a TV or monitor; and displaying said first password to said first participant using said video camera.
9. A control system for establishing or joining an audio-visual (AV) conference call comprising at least two participants, at least one of the participants using two different conferencing equipments, first conferencing equipment for capturing and displaying video and second conferencing equipment for receiving and sending audio, the system comprising processor control code to: send a first password from a control system for the AV conference call to said first conferencing equipment for output to a first participant on the AV conference call via said first conferencing equipment; receive, in said control system, a second password from said first participant via said second conferencing equipment; check, using said control system, whether said second password corresponds to said first password; and initiate an AV conference call with said first participant in response to determining that said first and second passwords correspond, wherein said AV conference call with said first participant uses said first conferencing equipment for a video stream of said AV conference call and uses said second conferencing equipment for an audio stream of said AV conference call.
10. A control system as claimed in claim 9 further comprising code to receive a physical location tag, said physical location tag identifying said first conferencing equipment or a physical location of said first conferencing equipment; and check that said received physical location tag corresponds to a previously stored physical location tag, and wherein said initiating of said AV conference call is contingent upon a result of said checking.
11. A control system as claimed in claim 10 comprising a control server to generate said first password and to check that said first and second passwords correspond, coupled to a multipoint control unit (MCU) to coordinate said video stream and said audio stream of said first participant.
12. A video camera for the control system of claim 9, 10 or 11, wherein said first conferencing equipment comprises said video camera coupled to a TV or monitor; and wherein said video camera is configured to communicate with said control system to display said first password to said first participant on said TV or monitor.
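A similarly hedged sketch of the checks in claims 3 and 7 follows. The control server accepts a join from a phone only when the received physical location tag matches a previously stored tag and the password entered on the phone matches the one displayed on the video equipment registered at that location; it then has an audio link (via the gateway) and a video link set up to the MCU named by the AV identifier. `Mcu`, `JoinController`, `handle_phone_join` and the dictionary-based lookups are hypothetical, and the gateway and the video equipment are reduced to a single `attach` call on the MCU for brevity.

```python
import hmac
from dataclasses import dataclass, field


@dataclass
class Mcu:
    """Hypothetical MCU that records which links have been attached to it."""
    name: str
    links: list = field(default_factory=list)

    def attach(self, kind: str, endpoint: str) -> None:
        self.links.append((kind, endpoint))


@dataclass
class JoinController:
    """Illustrative control-server logic for claims 3 and 7; names are hypothetical."""
    location_tags: dict     # stored location tag -> video endpoint at that location
    issued_passwords: dict  # video endpoint -> password currently displayed on it
    mcus: dict              # AV identifier -> Mcu

    def handle_phone_join(self, location_tag: str, password: str,
                          av_identifier: str, phone: str) -> bool:
        # Claim 3: proceed only if the tag matches a previously stored tag.
        video_endpoint = self.location_tags.get(location_tag)
        if video_endpoint is None:
            return False
        # Claim 1: the password keyed in on the phone must match the displayed one.
        expected = self.issued_passwords.get(video_endpoint)
        if expected is None or not hmac.compare_digest(expected.encode(), password.encode()):
            return False
        mcu = self.mcus.get(av_identifier)
        if mcu is None:
            return False
        # Claim 7: the AV identifier is passed to the gateway (audio link) and to the
        # first conferencing equipment (video link); both links terminate on this MCU.
        mcu.attach("audio", phone)
        mcu.attach("video", video_endpoint)
        del self.issued_passwords[video_endpoint]  # single use in this sketch
        return True


if __name__ == "__main__":
    mcu = Mcu("mcu-164")
    ctl = JoinController({"ROOM-7": "camera-152"}, {"camera-152": "123456"}, {"AV-42": mcu})
    print(ctl.handle_phone_join("ROOM-7", "123456", "AV-42", "phone-170"))
    print(mcu.links)
```

Checking the location tag before the password means that, in this sketch, a phone that cannot name a registered location never reaches the password comparison at all.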