WO2012155660A1 - Telepresence method, terminal and system - Google Patents

Telepresence method, terminal and system

Info

Publication number
WO2012155660A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
channel
video
input
remote
Prior art date
Application number
PCT/CN2012/072751
Other languages
English (en)
French (fr)
Inventor
叶小阳
王东
吴永明
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to EP12784928.9A (EP2731330A4)
Priority to US14/130,475 (US9172912B2)
Publication of WO2012155660A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/152 Multipoint control units therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H04L65/1069 Session establishment or de-establishment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/403 Arrangements for multi-party communication, e.g. for conferences
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Definitions

  • The present invention relates to telepresence techniques, and more particularly to a telepresence method, terminal and system.
  • Background
  • Telepresence is an advanced remote video conferencing system, valued by high-end users for its sense of real presence. In a telepresence system, locating a speaker by sound, life-size display, and eye-to-eye communication directly determine whether the user has an immersive experience, and they are important technical indicators for evaluating telepresence systems.
  • A conventional video conferencing terminal, apart from auxiliary-stream video, usually has the following functions: encoding and transmitting one channel of audio and/or one channel of video, and receiving, decoding and outputting one channel of audio and/or video. Because there is only one audio input and one audio output, the user cannot tell from which direction in the venue a sound comes. Because there is only one video input and one video output, the local camera has to capture the whole site in a single picture; in a multipoint conference, the terminal can only select the picture of one remote site or a stitched picture of several remote sites. Neither the sent nor the received video can meet the life-size requirement.
  • The user experience required of a telepresence system calls for multiple audio and video code streams. Orientation information is provided for each audio channel so that speakers can be located by sound, and the image of each remote participant is displayed at a calculated 1:1 scale, so a single venue often requires multiple video inputs and multiple video outputs.
  • Some existing telepresence terminals are built by integrating conventional video conferencing terminals: multiple video conferencing terminals are deployed in a single conference site, each connected to its own audio and video input/output devices, and the sound-localization and life-size effects are then basically achieved through the deployment and assembly of the audio and video input/output devices.
  • The main purpose of the present invention is to provide a telepresence method, terminal and system that solve the problems of existing telepresence systems: complicated deployment, and the difficulty of achieving a single-conference-number call and code stream synchronization.
  • The present invention provides a telepresence system comprising: a telepresence terminal; multiple audio input/output devices and/or multiple video input/output devices connected to the telepresence terminal; and a remote endpoint that interworks with the telepresence terminal;
  • the telepresence terminal has multi-channel audio and video input/output interfaces for connecting the multiple audio input/output devices and/or multiple video input/output devices, and is configured to establish a session with the remote endpoint, exchange input/output location information for the multiple audio and/or video code streams, negotiate media capabilities, and establish a media logical channel; it is also configured to encode the input code streams of the multiple audio input devices and/or multiple video input devices and, based on the established media logical channel, send them to the remote endpoint according to the input/output location corresponding to each code stream; and to receive multiple audio and/or video code streams from the remote endpoint, decode them, and pass them, according to the input/output location corresponding to each code stream, to the audio output device and/or video output device at the corresponding position for playback;
  • the multi-channel audio input device is configured to input the collected audio data into the telepresence terminal;
  • the multi-channel video input device is configured to input the collected video data into the telepresence terminal;
  • the multi-channel audio output device is configured to output audio data decoded by the remote presentation terminal;
  • the multi-channel video output device is configured to output video data decoded by the remote presentation terminal;
  • the remote endpoint is configured to perform multi-channel audio and/or video stream input/output location information interaction and media capability negotiation with the telepresence terminal, establish a media logical channel, and exchange audio and/or video code streams with the telepresence terminal based on the established media logical channel.
  • Performing the multi-channel audio and/or video stream input/output location information interaction and media capability negotiation includes:
  • the telepresence terminal sends the capability set of the local end to the remote endpoint, the capability set including the media codec capability of the local end and the audio and video stream input/output location information of the local telepresence terminal; and
  • the telepresence terminal receives the capability set of the remote endpoint, which includes the remote media codec capability and the remote audio and video stream input/output location information.
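The capability exchange described above can be pictured with a small data model. The following is only a minimal sketch under stated assumptions: the class names `StreamLocation` and `CapabilitySet`, the position labels, the codec names, and the rule of choosing codecs by simple intersection are all illustrative and are not taken from the patent text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class StreamLocation:
    """Hypothetical record of where one stream is captured or played."""
    media: str      # "audio" or "video"
    direction: str  # "input" or "output"
    position: str   # illustrative label, e.g. "left", "center", "right"


@dataclass
class CapabilitySet:
    """Hypothetical capability set: codec capability plus stream location info."""
    codecs: List[str]
    locations: List[StreamLocation] = field(default_factory=list)


def common_codecs(local: CapabilitySet, remote: CapabilitySet) -> List[str]:
    """Assumed negotiation rule: keep codecs both ends support, in local order."""
    return [c for c in local.codecs if c in remote.codecs]


# Illustrative capability sets for a three-camera site and its peer.
local = CapabilitySet(
    codecs=["H.264", "H.263", "G.711"],
    locations=[StreamLocation("video", "input", p) for p in ("left", "center", "right")],
)
remote = CapabilitySet(
    codecs=["G.711", "H.264"],
    locations=[StreamLocation("video", "output", p) for p in ("left", "center", "right")],
)
```

Each end would send its own `CapabilitySet` and keep the one it receives, so that both the usable codecs and the peer's input/output positions are known before any channel is opened.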
  • The media logical channel includes a transmitting channel and a receiving channel, where
  • the multiple audio streams are transmitted through one transmission channel and received through one receiving channel; and/or the multiple video streams are transmitted through one transmission channel and received through one receiving channel; the individual audio and/or video streams are distinguished by packet header information.
  • The packet header information includes: a code stream type, input location information corresponding to the code stream, and output location information.
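When several streams share one channel, the header fields listed above are what allow the receiver to separate them again. The sketch below assumes a hypothetical 5-byte header layout (one byte each for stream type, input location and output location, followed by a two-byte payload length); the patent text does not specify a wire format, so the layout and the names `pack_packet` and `unpack_packets` are illustrative only.

```python
import struct

# Hypothetical header: stream type, input location, output location, payload length.
HEADER = struct.Struct("!BBBH")
AUDIO, VIDEO = 0, 1


def pack_packet(stream_type: int, in_loc: int, out_loc: int, payload: bytes) -> bytes:
    """Prefix the payload with the assumed header."""
    return HEADER.pack(stream_type, in_loc, out_loc, len(payload)) + payload


def unpack_packets(data: bytes):
    """Yield (stream_type, in_loc, out_loc, payload) for each packet in data."""
    offset = 0
    while offset < len(data):
        stream_type, in_loc, out_loc, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        yield stream_type, in_loc, out_loc, data[offset:offset + length]
        offset += length


# Two streams multiplexed onto one channel, then separated by their headers.
wire = pack_packet(VIDEO, 0, 2, b"frame") + pack_packet(AUDIO, 1, 1, b"pcm")
packets = list(unpack_packets(wire))
```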
  • Alternatively, the media logical channel includes a transmitting channel and a receiving channel, where
  • the multiple audio streams are transmitted through different transmission channels and received through different receiving channels; and/or the multiple video streams are transmitted through different transmission channels and received through different receiving channels; each transmission channel corresponds to the type and input/output location information of an audio and/or video stream, and each receiving channel likewise corresponds to the type and input/output location information of an audio and/or video stream.
  • The telepresence terminal is further configured to:
  • establish a transmission channel according to the local sending address, the remote receiving address, the remote audio or video stream output location information corresponding to the channel to be established, and the local audio or video stream input location information; and
  • establish a receiving channel according to the remote sending address, the local receiving address, the remote audio or video stream input location information corresponding to the channel to be established, and the local audio or video stream output location information.
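The four pieces of information that define each channel can be collected into one record per channel. This is a minimal sketch, assuming a hypothetical `LogicalChannel` record and string position labels; the addresses use documentation-only IP ranges and none of the names come from the patent.

```python
from dataclasses import dataclass
from typing import Tuple

Address = Tuple[str, int]  # (IP address, port number)


@dataclass(frozen=True)
class LogicalChannel:
    """Hypothetical description of one media logical channel."""
    direction: str        # "send" or "receive", from the local terminal's view
    media: str            # "audio" or "video"
    sender: Address
    receiver: Address
    input_location: str   # position of the capturing device at the sending side
    output_location: str  # position of the playback device at the receiving side


def make_send_channel(media, local_send, remote_recv, local_input, remote_output):
    """Transmission channel: local input device feeds a remote output device."""
    return LogicalChannel("send", media, local_send, remote_recv, local_input, remote_output)


def make_receive_channel(media, remote_send, local_recv, remote_input, local_output):
    """Receiving channel: remote input device feeds a local output device."""
    return LogicalChannel("receive", media, remote_send, local_recv, remote_input, local_output)


tx = make_send_channel("video", ("192.0.2.10", 5004), ("198.51.100.20", 6004), "left", "left")
rx = make_receive_channel("video", ("198.51.100.20", 5006), ("192.0.2.10", 6006), "right", "right")
```

Keeping one record per stream makes the correspondence between a channel and its stream type and positions explicit, which is what the multi-channel variant relies on instead of packet headers.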
  • the telepresence terminal is further configured to synchronize the transmission and/or reception of multiple audio and/or video streams.
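The text does not say how the streams are synchronized. One common approach, shown here purely as an assumption, is to buffer frames by capture timestamp and release a group only once every expected stream has delivered its frame for that timestamp. The class name `StreamSynchronizer` and its interface are hypothetical.

```python
from collections import defaultdict


class StreamSynchronizer:
    """Assumed timestamp-based alignment of several streams (illustrative only)."""

    def __init__(self, stream_ids):
        self.stream_ids = set(stream_ids)
        self.pending = defaultdict(dict)  # timestamp -> {stream_id: frame}

    def push(self, stream_id, timestamp, frame):
        """Buffer one frame; return the full group for this timestamp once
        every stream has contributed, otherwise return None."""
        slot = self.pending[timestamp]
        slot[stream_id] = frame
        if set(slot) == self.stream_ids:
            return self.pending.pop(timestamp)
        return None


sync = StreamSynchronizer(["left", "center", "right"])
first = sync.push("left", 1000, b"L")
second = sync.push("center", 1000, b"C")
group = sync.push("right", 1000, b"R")
```

A real terminal would also need a policy for late or lost frames, which this sketch leaves out.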
  • the remote endpoint is a multipoint control unit (MCU) or a remote telepresence terminal.
  • The present invention also provides a telepresence terminal having multi-channel audio and video input/output interfaces for connecting multiple audio input/output devices and/or multiple video input/output devices, the terminal comprising: a protocol signaling processing module, a media codec module, and a media delivery module; wherein
  • the protocol signaling processing module is configured to establish a session with the remote endpoint, perform multi-channel audio and/or video stream input/output location information interaction, and media capability negotiation, and establish a media logical channel;
  • the media codec module is configured to encode the input code streams of the multiple audio input devices and/or multiple video input devices and provide them to the media delivery module; and to decode the audio and/or video code streams from the remote endpoint and pass them, according to the input/output location corresponding to each code stream, to the audio output device and/or video output device at the corresponding position for playback;
  • the media delivery module is configured to send the code streams to the remote endpoint according to their corresponding input/output locations; and to receive audio and/or video code streams from the remote endpoint and provide them to the media codec module for decoding according to the input/output location corresponding to each code stream.
  • the performing multi-channel audio and/or video stream input/output location information interaction and media capability negotiation includes:
  • the protocol signaling processing module sends the capability set of the local telepresence terminal to the remote endpoint, the capability set including the media codec capability of the local end and the audio and video stream input/output location information of the local telepresence terminal; and
  • the protocol signaling processing module receives the capability set of the remote endpoint, which includes the remote media codec capability and the remote audio and video stream input/output location information.
  • The media logical channel includes a transmitting channel and a receiving channel, where
  • the multiple audio streams are transmitted through one transmission channel and received through one receiving channel; and/or the multiple video streams are transmitted through one transmission channel and received through one receiving channel; the individual audio and/or video streams are distinguished by packet header information.
  • The packet header information includes: a code stream type, input location information corresponding to the code stream, and output location information.
  • Alternatively, the media logical channel includes a transmitting channel and a receiving channel, where
  • the multiple audio streams are sent through different transmission channels and received through different receiving channels; and/or the multiple video streams are sent through different transmission channels and received through different receiving channels;
  • each transmission channel corresponds to the type and input/output location information of an audio and/or video stream, and each receiving channel likewise corresponds to the type and input/output location information of an audio and/or video stream.
  • The protocol signaling processing module is further configured to:
  • establish a transmission channel according to the local sending address, the remote receiving address, the remote audio or video stream output location information corresponding to the channel to be established, and the local audio or video stream input location information; and
  • establish a receiving channel according to the remote sending address, the local receiving address, the remote audio or video stream input location information corresponding to the channel to be established, and the local audio or video stream output location information.
  • the media delivery module is further configured to synchronize the transmission and/or reception of multiple audio and/or video streams.
  • The present invention also provides a telepresence method for a telepresence terminal having multi-channel audio and video input/output interfaces for connecting multiple audio input/output devices and/or multiple video input/output devices, the method comprising:
  • the telepresence terminal establishes a session with the remote endpoint, performs multi-channel audio and/or video stream input/output location information interaction and media capability negotiation, and establishes a media logical channel;
  • the telepresence terminal encodes the input code streams of the multiple audio input devices and/or multiple video input devices and, based on the established media logical channel, sends them to the remote endpoint according to their corresponding input/output locations; and receives multiple audio and/or video code streams from the remote endpoint, decodes them, and passes them, according to the input/output location corresponding to each code stream, to the audio output device and/or video output device at the corresponding position for playback.
  • the performing multi-channel audio and/or video stream input/output location information interaction and media capability negotiation includes:
  • the telepresence terminal sends the capability set of the local end to the remote endpoint, the capability set including the media codec capability of the local end and the audio and video stream input/output location information of the local telepresence terminal; and
  • the telepresence terminal receives the capability set of the remote endpoint, which includes the remote media codec capability and the remote audio and video stream input/output location information.
  • The media logical channel includes a transmitting channel and a receiving channel, where
  • the multiple audio streams are transmitted through one transmission channel and received through one receiving channel; and/or the multiple video streams are transmitted through one transmission channel and received through one receiving channel; the individual audio and/or video streams are distinguished by packet header information.
  • The packet header information includes: a code stream type, input location information corresponding to the code stream, and output location information.
  • Alternatively, the media logical channel includes a transmitting channel and a receiving channel, where
  • the multiple audio streams are sent through different transmission channels and received through different receiving channels; and/or the multiple video streams are sent through different transmission channels and received through different receiving channels;
  • each transmission channel corresponds to the type and input/output location information of an audio and/or video stream, and each receiving channel likewise corresponds to the type and input/output location information of an audio and/or video stream.
  • Establishing the media logical channel specifically includes:
  • establishing a transmission channel according to the local sending address, the remote receiving address, the remote audio or video stream output location information corresponding to the channel to be established, and the local audio or video stream input location information; and
  • establishing a receiving channel according to the remote sending address, the local receiving address, the remote audio or video stream input location information corresponding to the channel to be established, and the local audio or video stream output location information.
  • the method also includes the remote presentation terminal synchronizing the plurality of audio and/or video code streams transmitted and/or received.
  • the remote endpoint is an MCU or a remote telepresence terminal.
  • The invention provides a telepresence method, terminal and system. Because a single telepresence terminal has multiple audio input/output interfaces and multiple video input/output interfaces, it can connect multiple audio and video input/output devices. A single site therefore needs only one telepresence terminal to handle its multi-channel audio and video streams, which makes deployment simple and makes a single-conference-number call possible. Because the multi-channel audio and video data of a single site is collected by one terminal, the data source is more accurate; and because each audio and video input device captures a relatively fixed range, the sound-localization and life-size effects expected of a telepresence system can be achieved.
  • Drawings
  • FIG. 1 is a schematic structural diagram of a remote presentation system according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a code stream receiving process of a remote presentation method according to Embodiment 1 of the present invention
  • FIG. 3 is a schematic diagram of a code stream sending process of a remote presentation method according to Embodiment 2 of the present invention
  • FIG. 5 is a schematic diagram of a code stream sending process of a remote presentation method according to Embodiment 4 of the present invention
  • FIG. 6 is a schematic diagram of a code stream sending process according to an embodiment of the present invention
  • A telepresence system provided by the present invention mainly includes: a telepresence terminal; multiple audio input/output devices and/or multiple video input/output devices connected to the telepresence terminal; and a remote endpoint that interworks with the telepresence terminal;
  • the telepresence terminal has multi-channel audio and video input/output interfaces for connecting the multiple audio input/output devices and/or multiple video input/output devices, and is capable of collecting and inputting multi-channel audio and video streams, encoding and decoding them, outputting them for playback, synchronizing them, and so on. It is used to establish a session with the remote endpoint, perform multi-channel audio and/or video code stream input/output location information interaction and media capability negotiation, and establish a media logical channel. It is also used to encode the input code streams of the multiple audio input devices and/or multiple video input devices and, based on the established media logical channel, send them to the remote endpoint according to the input/output location corresponding to each code stream; and to receive multiple audio and/or video code streams from the remote endpoint, decode them, and pass them, according to the input/output location corresponding to each code stream, to the audio output device and/or video output device at the corresponding position for playback;
  • a multi-channel audio input device configured to input audio data collected by the audio collection terminal into the remote presentation terminal;
  • a multi-channel video input device configured to input video data collected by the video collection terminal into the remote presentation terminal
  • a multi-channel audio output device configured to output audio data decoded by the remote rendering terminal to a corresponding audio device for playing
  • a multi-channel video output device configured to output video data decoded by the remote rendering terminal to a corresponding video device for playing
  • a remote endpoint, used to perform multi-channel audio and/or video stream input/output location information interaction and media capability negotiation with the telepresence terminal, establish a media logical channel, and exchange audio and/or video code streams with the telepresence terminal based on the established media logical channel.
  • Performing the multi-channel audio and/or video stream input/output location information interaction and media capability negotiation includes: the telepresence terminal sends the local capability set to the remote endpoint, which includes the local media codec capability and the audio and video stream input/output location information of the local telepresence terminal; and receives the capability set of the remote endpoint, which includes the remote media codec capability and the remote audio and video stream input/output location information.
  • The media logical channel includes a transmitting channel and a receiving channel. The multiple audio streams may be transmitted through one transmission channel and received through one receiving channel, and/or the multiple video streams may be transmitted through one transmission channel and received through one receiving channel; the individual audio and/or video streams are then distinguished by packet header information, which includes: a code stream type, input location information corresponding to the code stream, and output location information.
  • Alternatively, the multiple audio streams may be sent through different transmission channels and received through different receiving channels; and/or the multiple video streams may be sent through different transmission channels and received through different receiving channels;
  • each transmission channel corresponds to the type and input/output location information of an audio and/or video stream, and each receiving channel likewise corresponds to the type and input/output location information of an audio and/or video stream.
  • The telepresence terminal can also be used to:
  • establish a transmission channel according to the local sending address, the remote receiving address, the remote audio or video stream output location information corresponding to the channel to be established, and the local audio or video stream input location information; and
  • establish a receiving channel according to the remote sending address, the local receiving address, the remote audio or video stream input location information corresponding to the channel to be established, and the local audio or video stream output location information.
  • the telepresence terminal is also used to synchronize the transmission and/or reception of multiple audio and/or video streams.
  • the remote presentation terminal may further include: a protocol signaling processing module, a media codec module, and a media delivery module;
  • a protocol signaling processing module configured to establish a session with the remote endpoint, perform multi-channel audio and/or video stream input/output location information interaction, and media capability negotiation, and establish a media logical channel;
  • a media codec module, configured to encode the input code streams of the multiple audio input devices and/or multiple video input devices and provide them to the media delivery module; and to decode the audio and/or video code streams from the remote endpoint provided by the media delivery module and pass them, according to the input/output location corresponding to each code stream, to the audio output device and/or video output device at the corresponding position for playback;
  • a media delivery module, configured to receive and transmit multiple audio and/or video streams: it sends the code streams to the remote endpoint according to their corresponding input/output locations, and it receives audio and/or video code streams from the remote endpoint and provides them to the media codec module for decoding according to the input/output location corresponding to each code stream.
  • Performing the multi-channel audio and/or video code stream input/output location information interaction and media capability negotiation includes: the protocol signaling processing module sends the capability set of the local telepresence terminal to the remote endpoint, which includes the local media codec capability and the audio and video stream input/output location information of the local telepresence terminal; and receives the capability set of the remote endpoint, which includes the remote media codec capability and the remote audio and video stream input/output location information.
  • The protocol signaling processing module is further configured to:
  • establish a transmission channel according to the local sending address, the remote receiving address, the remote audio or video stream output location information corresponding to the channel to be established, and the local audio or video stream input location information; and
  • establish a receiving channel according to the remote sending address, the local receiving address, the remote audio or video stream input location information corresponding to the channel to be established, and the local audio or video stream output location information.
  • the media delivery module is also operative to synchronize the transmission and/or reception of multiple audio and/or video streams.
  • The foregoing media codec module may be deployed as an internal component of the telepresence terminal or as an external component of it; regardless of the deployment mode, the functions implemented by the media codec module are the same.
  • the remote endpoint may be a Multipoint Control Unit (MCU) or a remote telepresence terminal.
  • Whether the telepresence terminal interacts with an MCU or with a remote telepresence terminal as the remote endpoint, the functions implemented by the telepresence terminal are the same.
  • the system shown in FIG. 1 further includes: a central control system connected to the remote presentation terminal, the central control system is configured to provide a user operation interface (initiating a call, etc.) to implement interaction with the user.
  • Taking as an example the case where multiple audio streams are received through different receiving channels and multiple video streams are also received through different receiving channels, the code stream receiving process of the telepresence method according to Embodiment 1 of the present invention is described in detail below. As shown in FIG. 2, the process mainly includes:
  • Step 201: the telepresence terminal establishes a call with the remote endpoint. The protocol signaling processing module is responsible for signaling interaction, multi-channel audio and/or video code stream input/output location information interaction, and media capability negotiation, and establishes a receiving channel according to the negotiated media codec capability, the remote sending address, the local receiving address, the remote audio or video stream input location information corresponding to the channel to be established, and the local audio or video stream output location information.
  • A telepresence system comprises at least one telepresence terminal and multiple audio and video input/output devices; the telepresence terminal has multiple audio input interfaces and multiple audio output interfaces, as well as multiple video input interfaces and multiple video output interfaces.
  • The audio and video input/output devices are connected to the corresponding interfaces of the telepresence terminal, where "corresponding" means that both the media type (audio/video) and the location information of the device match.
  • The telepresence terminal is connected to the network and registered with the gatekeeper.
  • A gatekeeper is a softswitch server that handles signaling exchange and control functions on a VoIP network.
  • The telepresence terminal establishes a connection with a remote endpoint (which may be an MCU or a remote telepresence terminal), and may set up a point-to-point conference or a multipoint conference.
  • The telepresence terminal may initiate the call, or it may accept a call from the remote endpoint.
  • The call connection includes: session establishment, telepresence terminal information interaction, and media capability (codec capability) negotiation.
  • The media logical channel includes a transmitting channel and a receiving channel; in this embodiment it specifically refers to the receiving channel.
  • The process of establishing the receiving channel includes: the remote endpoint sends a message to open the logical channel to the local terminal, carrying the remote sending address (IP address and port number), the negotiated media codec capability, and the location information of the input device;
  • the telepresence terminal replies to the remote endpoint with a confirmation message that carries the local receiving address (IP address and port number) and the location information of the output device.
  • The channel information includes the sending and receiving addresses corresponding to the media logical channel, the media codec capability, the location information of the audio and/or video input device, the location information of the audio and/or video output device, a transmission channel identifier, and the like.
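The two-message exchange above, and the channel information that results from it, can be sketched as follows. The message shapes, field names and function names here are hypothetical and chosen only for illustration; the patent text lists what each message carries but not how it is encoded.

```python
def open_logical_channel(send_address, codec, input_location):
    """Hypothetical request sent by the remote endpoint to open a channel."""
    return {
        "msg": "open_logical_channel",
        "send_address": send_address,      # (IP address, port number)
        "codec": codec,                    # negotiated media codec capability
        "input_location": input_location,  # location of the remote input device
    }


def confirm_logical_channel(receive_address, output_location):
    """Hypothetical confirmation returned by the local telepresence terminal."""
    return {
        "msg": "open_logical_channel_ack",
        "receive_address": receive_address,  # (IP address, port number)
        "output_location": output_location,  # location of the local output device
    }


def receiving_channel_info(request, ack, channel_id):
    """Combine both messages into the stored channel information."""
    return {
        "channel_id": channel_id,
        "send_address": request["send_address"],
        "receive_address": ack["receive_address"],
        "codec": request["codec"],
        "input_location": request["input_location"],
        "output_location": ack["output_location"],
    }


request = open_logical_channel(("198.51.100.20", 5004), "H.264", "left")
ack = confirm_logical_channel(("192.0.2.10", 6004), "left")
info = receiving_channel_info(request, ack, channel_id=1)
```

Running this exchange once per stream would yield one channel record per input/output position, matching the one-channel-per-stream arrangement used in this embodiment.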
  • The multiple audio streams are sent through different sending channels and received through different receiving channels; the multiple video streams are likewise sent through different sending channels and received through different receiving channels;
  • each sending channel corresponds to the type and input/output location information of an audio and/or video stream, and each receiving channel likewise corresponds to the type and input/output location information of an audio and/or video stream.
  • Step 202: the media delivery module of the telepresence terminal receives the multiple code streams from the remote endpoint through the established receiving channels, parses the location information of the output device corresponding to each code stream channel, and forwards the code streams to the media codec module for decoding.
  • Specifically, the media delivery module of the telepresence terminal receives the code streams sent by the remote endpoint through the receiving logical channels established above, optionally parses the code stream classification information (such as the code stream type and location information) as needed, determines the output device location information corresponding to each code stream channel, and forwards the code streams to the media codec module for decoding.
  • Step 203 The media codec module of the remote presentation terminal separately decodes the received multiple audio and/or video code streams, and outputs the corresponding audio/video playback device according to the location information thereof.
  • With reference to the system shown in FIG. 1, the code stream sending process of the telepresence method is described in detail, taking as an example the case in which multiple audio code streams are each sent through a different sending channel and multiple video code streams are likewise each sent through a different sending channel. As shown in FIG. 3, the process mainly includes:
  • Step 301: The telepresence terminal establishes a call with the remote endpoint. The protocol signaling processing module handles signaling interaction, the exchange of input/output location information for the multiple audio and/or video code streams, and media capability negotiation, and establishes a sending channel according to the negotiated media codec capability, the local sending address and the remote receiving address, the remote audio or video code stream output location information corresponding to the channel to be established, and the local audio or video code stream input location information.
  • The specific operations are similar to those in step 201 and are not repeated here.
  • The media logical channel in this embodiment refers specifically to a sending channel.
  • The process of establishing a media logical channel includes:
  • the local telepresence terminal sends the remote endpoint its sending address (IP address and port number), the negotiated media codec capability, and the location information of the input device; the remote endpoint replies to the local end with the remote receiving address (IP address and port number) and the location information of the output device.
  • The channel information includes the sending and receiving addresses of the media logical channel, the media codec capability, the location information of the audio and/or video input device, the location information of the audio and/or video output device, the channel identifier, and so on.
  • Step 302: The audio and video input devices connected to the telepresence terminal each capture media data, which is submitted, according to the location information of the input device, to the corresponding encoder of the media codec module for encoding and then forwarded to the media delivery module.
  • The external audio and/or video devices connected to the telepresence terminal capture the audio and video code streams, which the media codec module encodes according to the negotiated media capability and forwards to the media delivery module for sending over the corresponding media sending logical channel established above, carrying stream-distinguishing information (such as code stream type and location information) where needed.
  • Step 303: The media delivery module of the telepresence terminal sends each of the encoded code streams through the corresponding sending channel according to its location information.
  • Based on the established correspondence between the different sending channels and the types and input/output location information of the audio and/or video code streams, the media delivery module of the telepresence terminal selects the sending channel that matches the location information of the input device corresponding to the code stream and sends the code stream over it.
  • At the end of the conference, each media logical channel is closed first, and the telepresence terminal then completes deletion of the session with the remote endpoint.
  • Step 401: The telepresence terminal establishes a call with the remote endpoint. The protocol signaling processing module handles signaling interaction, the exchange of input/output location information for the multiple audio and/or video code streams, and media capability negotiation, and establishes a receiving channel according to the negotiated media codec capability.
  • In this embodiment, only one sending channel and one receiving channel are established between the telepresence terminal and the remote endpoint for audio code streams, and only one sending channel and one receiving channel for video code streams.
  • Step 402: The telepresence terminal receives the code stream from the remote endpoint through the established receiving channel, and the media delivery module parses the data packet header information in the code stream to obtain the code stream type and the input location information and output location information corresponding to the code stream.
  • Step 403: The media codec module of the telepresence terminal decodes each of the received audio and/or video code streams and outputs it, according to its location information, to the corresponding audio or video playback device for playback.
  • Step 501: The telepresence terminal establishes a call with the remote endpoint. The protocol signaling processing module handles signaling interaction, the exchange of input/output location information for the multiple audio and/or video code streams, and media capability negotiation, and establishes a sending channel according to the negotiated media codec capability.
  • In this embodiment, only one sending channel and one receiving channel are established between the telepresence terminal and the remote endpoint for audio code streams, and only one sending channel and one receiving channel for video code streams.
  • Step 502: The audio and video input devices connected to the telepresence terminal each capture media data, which is encoded by the media codec module and then forwarded to the media delivery module.
  • Step 503: The media delivery module of the telepresence terminal attaches data packet header information to the encoded code stream and then sends it through the established sending channel.
  • The header information includes at least: the code stream type, and the input location information and output location information corresponding to the code stream.
  • It should be noted that embodiments of the invention may also cover the case in which multiple audio code streams are mixed into one stream that is sent through one sending channel and received through one receiving channel, while multiple video code streams are sent separately through multiple sending channels and received separately through multiple receiving channels; and the case in which multiple video code streams are mixed into one stream that is sent through one sending channel and received through one receiving channel, while multiple audio code streams are sent separately through multiple sending channels and received separately through multiple receiving channels. Sending through one sending channel and receiving through one receiving channel is implemented in a manner similar to the operations shown in FIG. 5 and FIG. 4 above; sending separately through multiple sending channels and receiving separately through multiple receiving channels is implemented in a manner similar to the operations shown in FIG. 3 and FIG. 2 above. The details are not repeated here.
  • The telepresence method of the present invention is described in further detail below, taking three audio and three video input/output interfaces as an example.
  • The telepresence terminal system of this embodiment is set up with at least one telepresence terminal and a plurality of audio and video input/output devices. The telepresence terminal has three audio input interfaces and three audio output interfaces, as well as three video input interfaces and three video output interfaces, and each audio or video input/output device is connected to the interface at the correct position on the telepresence terminal.
  • In this embodiment, the multiple audio and video code streams are each sent through a different sending channel and received through a different receiving channel.
  • The specific process includes:
  • Step 601: The telepresence terminal (that is, the local terminal) connects to the network, registers with the registration server (Gatekeeper) through the H.225 RAS protocol, and exposes the registered H.323 ID number or IP address.
  • Step 602: The telepresence terminal establishes a connection with the remote endpoint (which may be an MCU or a telepresence terminal) through the H.225 protocol; the conference may be point-to-point or multipoint.
  • The telepresence terminal may initiate the call, or it may accept a call from the remote endpoint.
  • The IP address and/or H.323 ID number of the telepresence terminal is carried in the call signaling.
  • Step 603: After the local terminal establishes an H.225 call connection with the remote endpoint, the local terminal constructs its local capability set, sends the capability set to the remote endpoint, and receives the remote end's feedback.
  • The capability set includes the decoding capabilities and parameters of the three output audio streams and the interface positions connected to the external audio output devices, for example 1, 2 and 3 denoting the left, middle and right audio outputs respectively; and the decoding capabilities and parameters of the three output video streams and the interface positions connected to the external video output devices.
  • For example, descriptors for the left, middle and right audio and for the left, middle and right video are added to the H.245 terminal capability set (terminalCapabilitySet) message structure.
  • Step 604: The local terminal receives the capability set of the remote endpoint and responds.
  • For example, the remote endpoint supports three video decoding outputs, with H.264 and H.263 decoding, and three audio outputs, with G.711 and G.728 decoding. The capability negotiation result is determined from the interface positions of the external audio and/or video output devices carried in the remote endpoint's capability set and the interface positions of the external audio and/or video input devices connected to the local terminal.
  • After negotiation, the code stream from the local left audio input interface is encoded with G.711 and sent for output on the left audio interface of the remote endpoint, and the code stream from the local left video input interface is encoded with H.264 and sent for output on the left video interface of the remote endpoint. The correspondences between the local middle and right audio and video inputs and the output interface positions of the remote endpoint are established in the same way.
  • Step 605: The local terminal establishes a sending channel to the remote endpoint.
  • Based on the capability set sent by the remote endpoint and the local capability set, including the media codec capabilities, the remote media output position corresponding to the channel to be established, and the media input position of the local terminal, the local terminal determines the channel's sending address, channel identifier and media output location information.
  • The media logical channel is opened with an H.245 openLogicalChannel message, which carries at least the channel's sending address (IP address and port number, such as 10.11.12.13:10200), the encoding type and parameters (such as G.711a audio), the channel identification number (for example, channel number 2 identifies the left audio sending channel), and the local media input position (for example, position 1 denotes the left audio input).
  • After receiving the message, the remote endpoint replies with an H.245 openLogicalChannelAck message, which carries at least the identification number of the channel, the receiving address (IP address and port number, such as 10.11.12.14:5058), and the interface position identifier of the corresponding audio output device (for example, position 7 denotes the left audio output).
  • Step 606: The local terminal establishes a media logical channel for receiving the remote endpoint's code stream.
  • The local terminal receives the H.245 openLogicalChannel message from the remote endpoint and, according to the media capability (such as H.264) and the input location information (for example, position 4 denotes the left video input) in the channel information, together with the local media output location information, determines the receiving address of the local terminal and returns an openLogicalChannelAck message, which includes at least that receiving address (such as 10.11.12.13:10206) and the local media output location information (for example, position 10 denotes the left video output).
  • The local terminal also records the channel information, which includes at least the channel identifier, the media capability, and the media input and output location information.
  • Step 607: The local terminal and the remote endpoint transmit the multiple code streams over the media logical channels established above.
  • Sending: the audio or video input devices connected to the local terminal each capture audio or video data, which is encoded by the media codec module and passed to the media delivery module; the code streams are then sent over the corresponding media logical channels established above, according to the correspondence between device interface positions and media logical channels.
  • Receiving: the media delivery module of the local terminal receives a code stream sent by the remote endpoint and, according to the output device location information associated with the media logical channel, passes it to the corresponding decoder of the media codec module for decoding; the result is output to the external audio or video output device at the corresponding interface position for playback.
  • Step 608: When the conference ends, the local terminal first closes the media logical channels and stops sending and receiving media, and finally completes deletion of the session.
  • When the media logical channels are established, one sending channel and one receiving channel for audio code streams and one sending channel and one receiving channel for video code streams are established according to the negotiated media codec capability. During code stream transmission, all audio code streams are carried over the same media logical channel and all video code streams are carried over the same media logical channel, and the individual audio and/or video code streams are distinguished by the data packet header information.
  • After receiving a code stream, the local telepresence terminal and the remote endpoint parse the data packet header information and send the corresponding code stream to the audio output device and/or video output device at the corresponding position for playback.
  • In the present invention, one telepresence terminal has multiple audio input/output interfaces and multiple video input/output interfaces and can therefore be connected to multiple audio and video input/output devices. A single conference site thus needs only one telepresence terminal to process multiple audio and video code streams, which makes deployment simple and allows the site to be called with a single conference number. Because multiple channels of audio and video data are captured for a single site, the data source is more precise; and because each audio or video input device captures data from a relatively fixed range, the sound-localization and life-size effects required of a telepresence system can be achieved.

Description

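A Telepresence Method, Terminal and System

Technical Field

The present invention relates to telepresence technology, and in particular to a telepresence method, terminal and system.

Background Art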
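Telepresence is an advanced form of remote video conferencing. It is popular with high-end users for its realistic sense of being on site. In a telepresence system, sound localization, life-size display and eye contact directly determine whether users feel that they are really there, and they are therefore very important technical criteria for evaluating a telepresence system.

In a traditional video conferencing system, a video conference terminal, apart from auxiliary-stream video, usually has the following functions: encoding and sending one audio stream and/or one video stream, and receiving, decoding and outputting one audio stream and/or one video stream. Because there is only one sound input source and one sound output, users cannot tell from which direction in the conference room a sound comes. Because there is only one video input source and one video output, the picture captured and encoded at the local end has to cover the whole conference room; in a multipoint conference, one can only choose to view a single remote site or a composite picture of several remote sites. Neither the sent nor the received video can meet the life-size requirement.

The user experience required of a telepresence system, by contrast, involves multiple audio and video code streams. Direction information is provided for each audio stream so that sound can be localized, and since the images of remote participants are to be displayed at 1:1 scale, a conference site usually needs multiple video inputs and multiple video outputs. Some existing telepresence terminals are built by integrating traditional video conference terminals: several video conference terminals are deployed at a single site, each connected to one audio/video input/output device, and the sound-localization and life-size effects are roughly achieved through the way those devices are deployed and assembled. However, this way of integrating multiple video conference terminals (typically several terminals at a single site, each of which has to be called separately) has considerable difficulty with single-conference-number calling, code stream synchronization and similar issues. More importantly, integrating multiple terminals makes system deployment very complex: it can only be done by professional integration and deployment staff, and even minor problems during use require on-site maintenance by professionals, which is a major obstacle to the adoption of a high-end application such as telepresence. Moreover, because some functions of the video conference terminals are not fully used in the integrated system, resources are wasted to some extent. In addition, because integration schemes are complex and non-standardized, interworking between telepresence systems deployed by different vendors also becomes extremely difficult.

Summary of the Invention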
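In view of this, the main object of the present invention is to provide a telepresence method, terminal and system, so as to solve the problems that existing telepresence systems are complex to deploy and have considerable difficulty with single-conference-number calling and code stream synchronization.

To achieve the above object, the technical solution of the present invention is implemented as follows:

The present invention provides a telepresence system. The system comprises: a telepresence terminal, multiple audio input/output devices and/or multiple video input/output devices connected to the telepresence terminal, and a remote endpoint that interworks with the telepresence terminal;

the telepresence terminal has multiple audio and video input/output interfaces and is connected to the multiple audio input/output devices and/or multiple video input/output devices; it is configured to establish a session with the remote endpoint, exchange input/output location information for the multiple audio and/or video code streams, negotiate media capabilities, and establish media logical channels; it is further configured to encode the input code streams from the multiple audio input devices and/or multiple video input devices and, over the established media logical channels, send them to the remote endpoint according to the input/output locations corresponding to the code streams, and to receive multiple audio and/or video code streams from the remote endpoint, decode them, and pass them, according to the input/output locations corresponding to the code streams, to its own audio output devices and/or video output devices at the corresponding locations for playback;

the multiple audio input devices are configured to feed captured audio data into the telepresence terminal;

the multiple video input devices are configured to feed captured video data into the telepresence terminal;

the multiple audio output devices are configured to output the audio data decoded by the telepresence terminal;

the multiple video output devices are configured to output the video data decoded by the telepresence terminal;

the remote endpoint is configured to exchange input/output location information for the multiple audio and/or video code streams and negotiate media capabilities with the telepresence terminal, establish media logical channels, and exchange audio and/or video code streams with the telepresence terminal over the established media logical channels.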
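Exchanging the input/output location information for the multiple audio and/or video code streams and negotiating media capabilities comprises:

the telepresence terminal sends its local capability set to the remote endpoint, the capability set including the local media codec capabilities and the audio and video code stream input/output location information of the local telepresence terminal; and receives the capability set of the remote endpoint, which includes the remote media codec capabilities and audio and video code stream input/output location information.

The media logical channels comprise sending channels and receiving channels, wherein:

multiple audio code streams are sent through one sending channel and received through one receiving channel; and/or multiple video code streams are sent through one sending channel and received through one receiving channel; the individual audio and/or video code streams are distinguished by data packet header information, the header information including: the code stream type, and the input location information and output location information corresponding to the code stream.

Alternatively, the media logical channels comprise sending channels and receiving channels, wherein:

multiple audio code streams are each sent through a different sending channel and received through a different receiving channel; and/or multiple video code streams are each sent through a different sending channel and received through a different receiving channel; the different sending channels are each associated with the type and the input/output location information of a respective audio and/or video code stream; and the different receiving channels are each associated with the type and the input/output location information of a respective audio and/or video code stream.

The telepresence terminal is further configured to:

establish a sending channel according to the negotiated media codec capability, the local sending address and the remote receiving address, the remote audio or video code stream output location information corresponding to the channel to be established, and the local audio or video code stream input location information;

establish a receiving channel according to the negotiated media codec capability, the remote sending address and the local receiving address, the remote audio or video code stream input location information corresponding to the channel to be established, and the local audio or video code stream output location information.

The telepresence terminal is further configured to synchronize the multiple audio and/or video code streams that are sent and/or received.

The remote endpoint is a multipoint control unit (MCU) or a remote telepresence terminal.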
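The present invention further provides a telepresence terminal that has multiple audio and video input/output interfaces for connecting multiple audio input/output devices and/or multiple video input/output devices. The terminal comprises a protocol signaling processing module, a media codec module and a media delivery module, wherein:

the protocol signaling processing module is configured to establish a session with a remote endpoint, exchange input/output location information for the multiple audio and/or video code streams, negotiate media capabilities, and establish media logical channels;

the media codec module is configured to encode the input code streams from the multiple audio input devices and/or multiple video input devices and provide them to the media delivery module; and to decode the audio and/or video code streams from the remote endpoint supplied by the media delivery module and pass them, according to the input/output locations corresponding to the code streams, to the audio output devices and/or video output devices at the corresponding locations for playback;

the media delivery module is configured to send the code streams to the remote endpoint according to the corresponding input/output locations; and to receive audio and/or video code streams from the remote endpoint and provide them, according to the input/output locations corresponding to the code streams, to the media codec module for decoding.

Exchanging the input/output location information for the multiple audio and/or video code streams and negotiating media capabilities comprises:

the protocol signaling processing module sends the capability set of the local telepresence terminal to the remote endpoint, the capability set including the local media codec capabilities and the audio and video code stream input/output location information of the local telepresence terminal; and receives the capability set of the remote endpoint, which includes the remote media codec capabilities and audio and video code stream input/output location information.

The media logical channels comprise sending channels and receiving channels, wherein:

multiple audio code streams are sent through one sending channel and received through one receiving channel; and/or multiple video code streams are sent through one sending channel and received through one receiving channel; the individual audio and/or video code streams are distinguished by data packet header information, the header information including: the code stream type, and the input location information and output location information corresponding to the code stream.

Alternatively, the media logical channels comprise sending channels and receiving channels, wherein:

multiple audio code streams are each sent through a different sending channel and received through a different receiving channel; and/or multiple video code streams are each sent through a different sending channel and received through a different receiving channel;

the different sending channels are each associated with the type and the input/output location information of a respective audio and/or video code stream; and the different receiving channels are each associated with the type and the input/output location information of a respective audio and/or video code stream.

The protocol signaling processing module is further configured to:

establish a sending channel according to the negotiated media codec capability, the local sending address and the remote receiving address, the remote audio or video code stream output location information corresponding to the channel to be established, and the local audio or video code stream input location information;

establish a receiving channel according to the negotiated media codec capability, the remote sending address and the local receiving address, the remote audio or video code stream input location information corresponding to the channel to be established, and the local audio or video code stream output location information.

The media delivery module is further configured to synchronize the multiple audio and/or video code streams that are sent and/or received.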
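The present invention further provides a telepresence method, in which a telepresence terminal has multiple audio and video input/output interfaces for connecting multiple audio input/output devices and/or multiple video input/output devices. The method comprises:

the telepresence terminal establishes a session with a remote endpoint, exchanges input/output location information for the multiple audio and/or video code streams, negotiates media capabilities, and establishes media logical channels;

the telepresence terminal encodes the input code streams from the multiple audio input devices and/or multiple video input devices and, over the established media logical channels, sends them to the remote endpoint according to the input/output locations corresponding to the code streams; it receives multiple audio and/or video code streams from the remote endpoint, decodes them, and passes them, according to the input/output locations corresponding to the code streams, to its own audio output devices and/or video output devices at the corresponding locations for playback.

Exchanging the input/output location information for the multiple audio and/or video code streams and negotiating media capabilities comprises:

the telepresence terminal sends its local capability set to the remote endpoint, the capability set including the local media codec capabilities and the audio and video code stream input/output location information of the local telepresence terminal; and receives the capability set of the remote endpoint, which includes the remote media codec capabilities and audio and video code stream input/output location information.

The media logical channels comprise sending channels and receiving channels, wherein:

multiple audio code streams are sent through one sending channel and received through one receiving channel; and/or multiple video code streams are sent through one sending channel and received through one receiving channel; the individual audio and/or video code streams are distinguished by data packet header information, the header information including: the code stream type, and the input location information and output location information corresponding to the code stream.

Alternatively, the media logical channels comprise sending channels and receiving channels, wherein:

multiple audio code streams are each sent through a different sending channel and received through a different receiving channel; and/or multiple video code streams are each sent through a different sending channel and received through a different receiving channel;

the different sending channels are each associated with the type and the input/output location information of a respective audio and/or video code stream; and the different receiving channels are each associated with the type and the input/output location information of a respective audio and/or video code stream.

Establishing the media logical channels specifically comprises:

establishing a sending channel according to the negotiated media codec capability, the local sending address and the remote receiving address, the remote audio or video code stream output location information corresponding to the channel to be established, and the local audio or video code stream input location information;

establishing a receiving channel according to the negotiated media codec capability, the remote sending address and the local receiving address, the remote audio or video code stream input location information corresponding to the channel to be established, and the local audio or video code stream output location information.

The method further comprises: the telepresence terminal synchronizes the multiple audio and/or video code streams that are sent and/or received.

The remote endpoint is an MCU or a remote telepresence terminal.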
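In the telepresence method, terminal and system provided by the present invention, one telepresence terminal has multiple audio input/output interfaces and multiple video input/output interfaces and can therefore be connected to multiple audio and video input/output devices. A single conference site thus needs only one telepresence terminal to process multiple audio and video code streams, which makes deployment simple and allows the site to be called with a single conference number. Because multiple channels of audio and video data are captured for a single site, the data source is more precise; and because each audio or video input device captures data from a relatively fixed range, the sound-localization and life-size effects required of a telepresence system can be achieved.

Brief Description of the Drawings

FIG. 1 is a schematic structural diagram of a telepresence system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the code stream receiving process of a telepresence method according to Embodiment 1 of the present invention;

FIG. 3 is a schematic diagram of the code stream sending process of a telepresence method according to Embodiment 2 of the present invention;

FIG. 4 is a schematic diagram of the code stream receiving process of a telepresence method according to Embodiment 3 of the present invention;

FIG. 5 is a schematic diagram of the code stream sending process of a telepresence method according to Embodiment 4 of the present invention;

FIG. 6 is a flowchart of a telepresence method according to an embodiment of the present invention.

Detailed Description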
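The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

A telepresence system provided by the present invention, as shown in FIG. 1, mainly comprises: a telepresence terminal, multiple audio input/output devices and/or multiple video input/output devices connected to the telepresence terminal, and a remote endpoint that interworks with the telepresence terminal;

the telepresence terminal has multiple audio and video input/output interfaces, is connected to the multiple audio input/output devices and/or multiple video input/output devices, and can perform capture and input, encoding and decoding, output and playback, synchronization and other processing of multiple audio and video code streams; it is configured to establish a session with the remote endpoint, exchange input/output location information for the multiple audio and/or video code streams, negotiate media capabilities, and establish media logical channels; it is further configured to encode the input code streams from the multiple audio input devices and/or multiple video input devices and, over the established media logical channels, send them to the remote endpoint according to the input/output locations corresponding to the code streams, and to receive multiple audio and/or video code streams from the remote endpoint, decode them, and pass them, according to the input/output locations corresponding to the code streams, to its own audio output devices and/or video output devices at the corresponding locations for playback;

the multiple audio input devices are configured to feed the audio data captured by the audio capture terminals into the telepresence terminal;

the multiple video input devices are configured to feed the video data captured by the video capture terminals into the telepresence terminal;

the multiple audio output devices are configured to output the audio data decoded by the telepresence terminal to the corresponding audio devices for playback;

the multiple video output devices are configured to output the video data decoded by the telepresence terminal to the corresponding video devices for playback;

the remote endpoint is configured to exchange input/output location information for the multiple audio and/or video code streams and negotiate media capabilities with the telepresence terminal, establish media logical channels, and exchange audio and/or video code streams with the telepresence terminal over the established media logical channels.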
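Exchanging the input/output location information for the multiple audio and/or video code streams and negotiating media capabilities comprises: the telepresence terminal sends its local capability set to the remote endpoint, the capability set including the local media codec capabilities and the audio and video code stream input/output location information of the local telepresence terminal; and receives the capability set of the remote endpoint, which includes the remote media codec capabilities and audio and video code stream input/output location information.

The media logical channels comprise sending channels and receiving channels, wherein:

multiple audio code streams may be sent through one sending channel and received through one receiving channel; and/or multiple video code streams may be sent through one sending channel and received through one receiving channel; the individual audio and/or video code streams are distinguished by data packet header information, the header information including at least: the code stream type, and the input location information and output location information corresponding to the code stream.

Alternatively, multiple audio code streams may each be sent through a different sending channel and received through a different receiving channel; and/or multiple video code streams may each be sent through a different sending channel and received through a different receiving channel;

the different sending channels are each associated with the type and the input/output location information of a respective audio and/or video code stream; and the different receiving channels are each associated with the type and the input/output location information of a respective audio and/or video code stream.

The telepresence terminal may further be configured to:

establish a sending channel according to the negotiated media codec capability, the local sending address and the remote receiving address, the remote audio or video code stream output location information corresponding to the channel to be established, and the local audio or video code stream input location information;

establish a receiving channel according to the negotiated media codec capability, the remote sending address and the local receiving address, the remote audio or video code stream input location information corresponding to the channel to be established, and the local audio or video code stream output location information.

The telepresence terminal is further configured to synchronize the multiple audio and/or video code streams that are sent and/or received.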
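Preferably, the telepresence terminal may further comprise a protocol signaling processing module, a media codec module and a media delivery module, wherein:

the protocol signaling processing module is configured to establish a session with the remote endpoint, exchange input/output location information for the multiple audio and/or video code streams, negotiate media capabilities, and establish media logical channels;

the media codec module is configured to encode the input code streams from the multiple audio input devices and/or multiple video input devices and provide them to the media delivery module; and to decode the audio and/or video code streams from the remote endpoint supplied by the media delivery module and pass them, according to the input/output locations corresponding to the code streams, to the audio output devices and/or video output devices at the corresponding locations for playback;

the media delivery module is responsible for receiving and sending the multiple audio and/or video code streams; it is configured to send the code streams to the remote endpoint according to the corresponding input/output locations, and to receive audio and/or video code streams from the remote endpoint and provide them, according to the input/output locations corresponding to the code streams, to the media codec module for decoding.

Exchanging the input/output location information for the multiple audio and/or video code streams and negotiating media capabilities comprises: the protocol signaling processing module sends the capability set of the local telepresence terminal to the remote endpoint, the capability set including the local media codec capabilities and the audio and video code stream input/output location information of the local telepresence terminal; and receives the capability set of the remote endpoint, which includes the remote media codec capabilities and audio and video code stream input/output location information.

The protocol signaling processing module is further configured to:

establish a sending channel according to the negotiated media codec capability, the local sending address and the remote receiving address, the remote audio or video code stream output location information corresponding to the channel to be established, and the local audio or video code stream input location information;

establish a receiving channel according to the negotiated media codec capability, the remote sending address and the local receiving address, the remote audio or video code stream input location information corresponding to the channel to be established, and the local audio or video code stream output location information.

The media delivery module is further configured to synchronize the multiple audio and/or video code streams that are sent and/or received.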
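It should be noted that, in a specific implementation, the media codec module may be deployed either as an internal component of the telepresence terminal or as an external component attached to it; whichever deployment is used, the functions implemented by the media codec module are the same.

In addition, the remote endpoint may be a multipoint control unit (MCU) or a remote telepresence terminal. The functions performed by the telepresence terminal are the same whether it interacts with an MCU acting as the remote endpoint or with a remote telepresence terminal acting as the remote endpoint.

Preferably, the system shown in FIG. 1 may further include a central control system connected to the telepresence terminal. The central control system provides a user interface (for initiating calls and so on) and handles interaction with the user.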
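With reference to the system shown in FIG. 1, the code stream receiving process of a telepresence method according to Embodiment 1 of the present invention is described in detail below, taking as an example the case in which multiple audio code streams are each received through a different receiving channel and multiple video code streams are likewise each received through a different receiving channel. As shown in FIG. 2, the process mainly includes:

Step 201: The telepresence terminal establishes a call with the remote endpoint. The protocol signaling processing module handles signaling interaction, the exchange of input/output location information for the multiple audio and/or video code streams, and media capability negotiation, and establishes a receiving channel according to the negotiated media codec capability, the remote sending address and the local receiving address, the remote audio or video code stream input location information corresponding to the channel to be established, and the local audio or video code stream output location information.

First, a telepresence system is set up, comprising at least one telepresence terminal and a plurality of audio and video input/output devices. The telepresence terminal has multiple audio input interfaces and multiple audio output interfaces, as well as multiple video input interfaces and multiple video output interfaces. Each audio or video input/output device is connected to the corresponding interface of the telepresence terminal, where "corresponding" means that both the media type (audio or video) and the location information of the device match. The telepresence terminal connects to the network, registers with a gatekeeper (Gatekeeper), and exposes the registered endpoint ID number or IP address. A gatekeeper is a switching server based on softswitch technology that is responsible for signal switching and control functions on a VoIP network.

The telepresence terminal establishes a connection with the remote endpoint (which may be an MCU or a remote telepresence terminal) to form a point-to-point or multipoint conference. The telepresence terminal may initiate the call, or it may accept a call from the remote endpoint. The call connection includes session establishment, exchange of telepresence terminal information, and negotiation of media capabilities (codec capabilities).

The media logical channels comprise sending channels and receiving channels; in this embodiment the term refers specifically to receiving channels. The process of establishing a receiving channel includes: the remote endpoint sends an open-logical-channel message to the local terminal, carrying the remote sending address (IP address and port number), the negotiated media codec capability, and the location information of the input device; the local telepresence terminal replies to the remote endpoint with a confirmation message carrying the local receiving address (IP address and port number) and the location information of the output device. The channel information includes the sending and receiving addresses of the media logical channel, the media codec capability, the location information of the audio and/or video input device, the location information of the audio and/or video output device, the channel identifier, and so on.

Multiple audio code streams are each sent through a different sending channel and received through a different receiving channel; multiple video code streams are each sent through a different sending channel and received through a different receiving channel;

the different sending channels are each associated with the type and the input/output location information of a respective audio and/or video code stream; and the different receiving channels are each associated with the type and the input/output location information of a respective audio and/or video code stream.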
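Step 202: The media delivery module of the telepresence terminal receives the multiple code streams from the remote endpoint through the established receiving channels, parses the location information of the output device corresponding to each code stream channel, and forwards the code streams to the media codec module for decoding.

The media delivery module of the telepresence terminal receives the code streams sent by the remote endpoint through the media receiving logical channels established above, parses the stream-distinguishing information (such as code stream type and location information) where needed, parses the location information of the output device corresponding to each code stream channel, and forwards the code streams to the media codec module for decoding.

Step 203: The media codec module of the telepresence terminal decodes each of the received audio and/or video code streams and outputs it, according to its location information, to the corresponding audio or video playback device.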
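With reference again to the system shown in FIG. 1, the code stream sending process of a telepresence method according to Embodiment 2 of the present invention is described in detail below, taking as an example the case in which multiple audio code streams are each sent through a different sending channel and multiple video code streams are likewise each sent through a different sending channel. As shown in FIG. 3, the process mainly includes:

Step 301: The telepresence terminal establishes a call with the remote endpoint. The protocol signaling processing module handles signaling interaction, the exchange of input/output location information for the multiple audio and/or video code streams, and media capability negotiation, and establishes a sending channel according to the negotiated media codec capability, the local sending address and the remote receiving address, the remote audio or video code stream output location information corresponding to the channel to be established, and the local audio or video code stream input location information.

The specific operations are similar to those in step 201 and are not repeated here. The media logical channel in this embodiment refers specifically to a sending channel. The process of establishing a media logical channel includes:

the local telepresence terminal sends the remote endpoint its sending address (IP address and port number), the negotiated media codec capability, and the location information of the input device; the remote endpoint replies to the local end with the remote receiving address (IP address and port number) and the location information of the output device. The channel information includes the sending and receiving addresses of the media logical channel, the media codec capability, the location information of the audio and/or video input device, the location information of the audio and/or video output device, the channel identifier, and so on.

Step 302: The audio and video input devices connected to the telepresence terminal each capture media data, which is submitted, according to the location information of the input device, to the corresponding encoder of the media codec module for encoding and then forwarded to the media delivery module.

The external audio and/or video devices connected to the telepresence terminal capture the audio and video code streams, which the media codec module encodes according to the negotiated media capability and forwards to the media delivery module for sending over the corresponding media sending logical channel established above, carrying stream-distinguishing information (such as code stream type and location information) where needed.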
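Step 303: The media delivery module of the telepresence terminal sends each of the encoded code streams through the corresponding sending channel according to its location information.

Based on the established correspondence between the different sending channels and the types and input/output location information of the audio and/or video code streams, the media delivery module of the telepresence terminal selects the sending channel that matches the location information of the input device corresponding to the code stream and sends the code stream over it.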
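The channel selection in step 303 amounts to a table lookup keyed by media type and input position. The following Python sketch is illustrative only: the class and field names are hypothetical, and the channel numbers and position values are sample values, not something the specification mandates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SendChannel:
    channel_id: int       # identifier of the sending channel
    media_type: str       # "audio" or "video"
    input_position: int   # local capture interface feeding this channel
    output_position: int  # remote playback interface agreed during setup


class SendChannelTable:
    """Maps (media type, input position) to the channel set up for it."""

    def __init__(self):
        self._channels = {}

    def register(self, channel: SendChannel) -> None:
        key = (channel.media_type, channel.input_position)
        self._channels[key] = channel

    def select(self, media_type: str, input_position: int) -> SendChannel:
        # Raises KeyError if no channel was established for this input.
        return self._channels[(media_type, input_position)]


table = SendChannelTable()
table.register(SendChannel(2, "audio", input_position=1, output_position=7))
table.register(SendChannel(5, "video", input_position=4, output_position=10))

# An encoded frame from the left audio input goes out on channel 2.
left_audio = table.select("audio", 1)
```

A receiving side could keep the mirror-image table, keyed by channel identifier, to find the output position for each incoming stream.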
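At the end of the conference, each media logical channel is closed first, and the telepresence terminal then completes deletion of the session with the remote endpoint.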
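With reference again to the system shown in FIG. 1, the code stream receiving process of a telepresence method according to Embodiment 3 of the present invention is described in detail below, taking as an example the case in which multiple audio code streams are received through one receiving channel and multiple video code streams are received through one receiving channel. As shown in FIG. 4, the process mainly includes:

Step 401: The telepresence terminal establishes a call with the remote endpoint. The protocol signaling processing module handles signaling interaction, the exchange of input/output location information for the multiple audio and/or video code streams, and media capability negotiation, and establishes a receiving channel according to the negotiated media codec capability.

In this embodiment, only one sending channel and one receiving channel are established between the telepresence terminal and the remote endpoint for audio code streams, and only one sending channel and one receiving channel for video code streams.

Step 402: The telepresence terminal receives the code stream from the remote endpoint through the established receiving channel, and the media delivery module parses the data packet header information in the code stream to obtain the code stream type and the input location information and output location information corresponding to the code stream.

Step 403: The media codec module of the telepresence terminal decodes each of the received audio and/or video code streams and outputs it, according to its location information, to the corresponding audio or video playback device for playback.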
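With reference again to the system shown in FIG. 1, the code stream sending process of a telepresence method according to Embodiment 4 of the present invention is described in detail below, taking as an example the case in which multiple audio code streams are sent through one sending channel and multiple video code streams are sent through one sending channel. As shown in FIG. 5, the process mainly includes:

Step 501: The telepresence terminal establishes a call with the remote endpoint. The protocol signaling processing module handles signaling interaction, the exchange of input/output location information for the multiple audio and/or video code streams, and media capability negotiation, and establishes a sending channel according to the negotiated media codec capability.

In this embodiment, only one sending channel and one receiving channel are established between the telepresence terminal and the remote endpoint for audio code streams, and only one sending channel and one receiving channel for video code streams.

Step 502: The audio and video input devices connected to the telepresence terminal each capture media data, which is encoded by the media codec module and then forwarded to the media delivery module.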
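Step 503: The media delivery module of the telepresence terminal attaches data packet header information to the encoded code stream and then sends it through the established sending channel.

The header information includes at least: the code stream type, and the input location information and output location information corresponding to the code stream.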
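The specification lists the header fields but not how they are encoded. As a minimal sketch, assuming a hypothetical fixed three-byte header (one byte each for stream type, input position and output position), tagging and untagging streams that share a single channel could look like this in Python:

```python
import struct

# Hypothetical layout: stream type, input position, output position,
# one unsigned byte each, network byte order. Not defined by the patent.
HEADER = struct.Struct("!BBB")
AUDIO, VIDEO = 0, 1


def wrap(stream_type: int, input_pos: int, output_pos: int,
         payload: bytes) -> bytes:
    """Prefix an encoded frame with the distinguishing header."""
    return HEADER.pack(stream_type, input_pos, output_pos) + payload


def unwrap(packet: bytes):
    """Split a packet into its header fields and the encoded payload."""
    stream_type, input_pos, output_pos = HEADER.unpack_from(packet)
    return stream_type, input_pos, output_pos, packet[HEADER.size:]


packet = wrap(VIDEO, 4, 10, b"frame")
fields = unwrap(packet)
```

The receiver would use the output position returned by `unwrap` to pick the decoder and playback device, which is what step 402 describes for the receiving direction.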
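It should be noted that embodiments of the present invention may also cover the case in which multiple audio code streams are mixed into one stream that is sent through one sending channel and received through one receiving channel, while multiple video code streams are sent separately through multiple sending channels and received separately through multiple receiving channels; and the case in which multiple video code streams are mixed into one stream that is sent through one sending channel and received through one receiving channel, while multiple audio code streams are sent separately through multiple sending channels and received separately through multiple receiving channels. Sending through one sending channel and receiving through one receiving channel is implemented in a manner similar to the operations shown in FIG. 5 and FIG. 4 above; sending separately through multiple sending channels and receiving separately through multiple receiving channels is implemented in a manner similar to the operations shown in FIG. 3 and FIG. 2 above. The details are not repeated here.

The telepresence method of the present invention is described in further detail below, taking three audio and three video input/output interfaces as an example. The telepresence terminal system of this embodiment is set up with at least one telepresence terminal and a plurality of audio and video input/output devices. The telepresence terminal has three audio input interfaces and three audio output interfaces, as well as three video input interfaces and three video output interfaces, and each audio or video input/output device is connected to the interface at the correct position on the telepresence terminal. In this embodiment, the multiple audio and video code streams are each sent through a different sending channel and received through a different receiving channel. As shown in FIG. 6, the specific process includes: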
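Step 601: The telepresence terminal (that is, the local terminal) connects to the network, registers with the registration server (Gatekeeper) through the H.225 RAS protocol, and exposes the registered H.323 ID number or IP address.

Step 602: The telepresence terminal establishes a connection with the remote endpoint (which may be an MCU or a telepresence terminal) through the H.225 protocol; the conference may be point-to-point or multipoint. The telepresence terminal may initiate the call, or it may accept a call from the remote endpoint. The IP address and/or H.323 ID number of the telepresence terminal is carried in the call signaling.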
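Step 603: After the local terminal establishes an H.225 call connection with the remote endpoint, the local terminal constructs its local capability set, sends the capability set to the remote endpoint, and receives the remote end's feedback. The capability set includes the decoding capabilities and parameters of the three output audio streams and the interface positions connected to the external audio output devices, for example 1, 2 and 3 denoting the left, middle and right audio outputs respectively; and the decoding capabilities and parameters of the three output video streams and the interface positions connected to the external video output devices. For example, descriptors for the left, middle and right audio and for the left, middle and right video are added to the H.245 terminal capability set (terminalCapabilitySet) message structure, with different values agreed to denote different types and positions: 1, 2 and 3 denote the left, middle and right audio, and 4, 5 and 6 denote the left, middle and right video.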
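The descriptor numbering in step 603 can be pictured as a small table. This is a minimal Python sketch that uses a plain data structure rather than the actual ASN.1 encoding of H.245; the names are hypothetical, and only the numbering 1 to 6 comes from the text.

```python
from dataclasses import dataclass
from typing import List

# Descriptor values as agreed in the example: 1-3 audio, 4-6 video.
POSITION_CODES = {
    1: ("audio", "left"), 2: ("audio", "middle"), 3: ("audio", "right"),
    4: ("video", "left"), 5: ("video", "middle"), 6: ("video", "right"),
}


@dataclass
class CapabilityEntry:
    position_code: int  # which interface this entry describes
    codecs: List[str]   # codecs the terminal can decode on that interface


def build_capability_set(audio_codecs, video_codecs):
    """Build one capability entry per output interface position."""
    entries = []
    for code, (media, _side) in sorted(POSITION_CODES.items()):
        codecs = audio_codecs if media == "audio" else video_codecs
        entries.append(CapabilityEntry(code, list(codecs)))
    return entries


capability_set = build_capability_set(["G.711", "G.728"], ["H.264", "H.263"])
```

A real terminal would serialize such entries into the capability set message; the sketch only shows how position and codec information travel together.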
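Step 604: The local terminal receives the capability set of the remote endpoint and responds. For example, the remote endpoint supports three video decoding outputs, with H.264 and H.263 decoding, and three audio outputs, with G.711 and G.728 decoding. The capability negotiation result is determined from the interface positions of the external audio and/or video output devices carried in the remote endpoint's capability set and the interface positions of the external audio and/or video input devices connected to the local terminal. After negotiation, the code stream from the local left audio input interface is encoded with G.711 and sent for output on the left audio interface of the remote endpoint, and the code stream from the local left video input interface is encoded with H.264 and sent for output on the left video interface of the remote endpoint. The correspondences between the local middle and right audio and video inputs and the output interface positions of the remote endpoint are established in the same way.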
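As a rough Python sketch of the negotiation in step 604, assume each side lists its codecs in preference order and identifies its interfaces by side (left, middle, right). The function name, the pairing rule and the position numbers are illustrative assumptions, not the procedure defined by the specification.

```python
def negotiate(local_inputs, remote_outputs, local_codecs, remote_codecs):
    """Pair local inputs with remote outputs and pick a common codec.

    local_inputs / remote_outputs: dict mapping side -> interface position.
    local_codecs / remote_codecs: codec names, most preferred first.
    Returns side -> (local input position, remote output position, codec).
    """
    # First codec in the local preference list that the remote can decode.
    codec = next((c for c in local_codecs if c in remote_codecs), None)
    if codec is None:
        raise ValueError("no codec supported by both ends")
    return {
        side: (position, remote_outputs[side], codec)
        for side, position in local_inputs.items()
        if side in remote_outputs
    }


audio_plan = negotiate(
    local_inputs={"left": 1, "middle": 2, "right": 3},
    remote_outputs={"left": 7, "middle": 8, "right": 9},  # sample positions
    local_codecs=["G.711", "G.728"],
    remote_codecs=["G.728", "G.711"],
)
```

Here the left audio input ends up paired with the remote left audio output and encoded with G.711, matching the outcome described in the example.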
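Step 605: The local terminal establishes a sending channel to the remote endpoint. Based on the capability set sent by the remote endpoint and the local capability set, including the media codec capabilities, the remote media output position corresponding to the channel to be established, and the media input position of the local terminal, the local terminal determines the channel's sending address, channel identifier and media output location information, and opens the media logical channel with an H.245 openLogicalChannel message. The message carries at least the channel's sending address (IP address and port number, such as 10.11.12.13:10200), the encoding type and parameters (such as G.711a audio), the channel identification number (for example, channel number 2 identifies the left audio sending channel), and the local media input position (for example, position 1 denotes the left audio input). After receiving the message, the remote endpoint replies with an H.245 openLogicalChannelAck message, which carries at least the identification number of the channel, the receiving address (IP address and port number, such as 10.11.12.14:5058), and the interface position identifier of the corresponding audio output device (for example, position 7 denotes the left audio output).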
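The fields carried by the two messages in step 605 can be modelled as simple records. This Python sketch uses hypothetical class names and is not the H.245 message syntax; the sample values are the ones quoted in the step.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class OpenLogicalChannel:
    channel_id: int
    send_address: Tuple[str, int]  # (IP address, port) of the sender
    codec: str
    input_position: int            # local capture interface position


@dataclass
class OpenLogicalChannelAck:
    channel_id: int
    receive_address: Tuple[str, int]  # (IP address, port) of the receiver
    output_position: int              # remote playback interface position


def record_channel(request: OpenLogicalChannel,
                   ack: OpenLogicalChannelAck) -> dict:
    """Combine request and acknowledgement into one channel record."""
    if request.channel_id != ack.channel_id:
        raise ValueError("acknowledgement refers to a different channel")
    return {
        "channel_id": request.channel_id,
        "codec": request.codec,
        "send_address": request.send_address,
        "receive_address": ack.receive_address,
        "input_position": request.input_position,
        "output_position": ack.output_position,
    }


request = OpenLogicalChannel(2, ("10.11.12.13", 10200), "G.711a", 1)
ack = OpenLogicalChannelAck(2, ("10.11.12.14", 5058), 7)
channel = record_channel(request, ack)
```

The resulting record holds the channel information that the terminal keeps for routing: both addresses, the codec, and the input and output positions.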
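Step 606: The local terminal establishes a media logical channel for receiving the remote endpoint's code stream. The local terminal receives the H.245 openLogicalChannel message from the remote endpoint and, according to the media capability (such as H.264) and the input location information (for example, position 4 denotes the left video input) in the channel information, together with the local media output location information, determines the receiving address of the local terminal and returns an openLogicalChannelAck message, which includes at least that receiving address (such as 10.11.12.13:10206) and the local media output location information (for example, position 10 denotes the left video output). The local terminal also records the channel information, which includes at least the channel identifier, the media capability, and the media input and output location information.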
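Step 607: The local terminal and the remote endpoint transmit the multiple code streams over the media logical channels established above.

Sending: the audio or video input devices connected to the local terminal each capture audio or video data, which is encoded by the media codec module and passed to the media delivery module; the code streams are then sent over the corresponding media logical channels established above, according to the correspondence between device interface positions and media logical channels.

Receiving: the media delivery module of the local terminal receives a code stream sent by the remote endpoint and, according to the output device location information associated with the media logical channel, passes it to the corresponding decoder of the media codec module for decoding; the result is output to the external audio or video output device at the corresponding interface position for playback.

Step 608: When the conference ends, the local terminal first closes the media logical channels and stops sending and receiving media, and finally completes deletion of the session.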
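For the case in which multiple audio code streams are sent through one sending channel and received through one receiving channel, and multiple video code streams are sent through one sending channel and received through one receiving channel, the telepresence method proceeds in a manner similar to the process shown in FIG. 6. The differences are as follows. When the media logical channels are established, one sending channel and one receiving channel for audio code streams and one sending channel and one receiving channel for video code streams are established according to the negotiated media codec capability. During code stream transmission, all audio code streams are carried over the same media logical channel and all video code streams are carried over the same media logical channel, and the individual audio and/or video code streams are distinguished by the data packet header information. After receiving a code stream, the local telepresence terminal and the remote endpoint parse the data packet header information and send the corresponding code stream to the audio output device and/or video output device at the corresponding position for playback.

In summary, in the present invention one telepresence terminal has multiple audio input/output interfaces and multiple video input/output interfaces and can therefore be connected to multiple audio and video input/output devices. A single conference site thus needs only one telepresence terminal to process multiple audio and video code streams, which makes deployment simple and allows the site to be called with a single conference number. Because multiple channels of audio and video data are captured for a single site, the data source is more precise; and because each audio or video input device captures data from a relatively fixed range, the sound-localization and life-size effects required of a telepresence system can be achieved.

The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection.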

Claims

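1. A telepresence system, characterized in that the system comprises: a telepresence terminal, multiple audio input/output devices and/or multiple video input/output devices connected to the telepresence terminal, and a remote endpoint that interworks with the telepresence terminal;

the telepresence terminal has multiple audio and video input/output interfaces and is connected to the multiple audio input/output devices and/or multiple video input/output devices; it is configured to establish a session with the remote endpoint, exchange input/output location information for the multiple audio and/or video code streams, negotiate media capabilities, and establish media logical channels; it is further configured to encode the input code streams from the multiple audio input devices and/or multiple video input devices and, over the established media logical channels, send them to the remote endpoint according to the input/output locations corresponding to the code streams, and to receive multiple audio and/or video code streams from the remote endpoint, decode them, and pass them, according to the input/output locations corresponding to the code streams, to its own audio output devices and/or video output devices at the corresponding locations for playback;

the multiple audio input devices are configured to feed captured audio data into the telepresence terminal;

the multiple video input devices are configured to feed captured video data into the telepresence terminal;

the multiple audio output devices are configured to output the audio data decoded by the telepresence terminal;

the multiple video output devices are configured to output the video data decoded by the telepresence terminal;

the remote endpoint is configured to exchange input/output location information for the multiple audio and/or video code streams and negotiate media capabilities with the telepresence terminal, establish media logical channels, and exchange audio and/or video code streams with the telepresence terminal over the established media logical channels.

2. The telepresence system according to claim 1, characterized in that exchanging the input/output location information for the multiple audio and/or video code streams and negotiating media capabilities comprises:

the telepresence terminal sends its local capability set to the remote endpoint, the capability set including the local media codec capabilities and the audio and video code stream input/output location information of the local telepresence terminal; and receives the capability set of the remote endpoint, which includes the remote media codec capabilities and audio and video code stream input/output location information.

3. The telepresence system according to claim 1, characterized in that the media logical channels comprise sending channels and receiving channels, wherein:

multiple audio code streams are sent through one sending channel and received through one receiving channel; and/or multiple video code streams are sent through one sending channel and received through one receiving channel;

the individual audio and/or video code streams are distinguished by data packet header information, the header information including: the code stream type, and the input location information and output location information corresponding to the code stream.

4. The telepresence system according to claim 1, characterized in that the media logical channels comprise sending channels and receiving channels, wherein:

multiple audio code streams are each sent through a different sending channel and received through a different receiving channel; and/or multiple video code streams are each sent through a different sending channel and received through a different receiving channel;

the different sending channels are each associated with the type and the input/output location information of a respective audio and/or video code stream; and the different receiving channels are each associated with the type and the input/output location information of a respective audio and/or video code stream.

5. The telepresence system according to claim 4, characterized in that the telepresence terminal is further configured to:

establish a sending channel according to the negotiated media codec capability, the local sending address and the remote receiving address, the remote audio or video code stream output location information corresponding to the channel to be established, and the local audio or video code stream input location information;

establish a receiving channel according to the negotiated media codec capability, the remote sending address and the local receiving address, the remote audio or video code stream input location information corresponding to the channel to be established, and the local audio or video code stream output location information.

6. The telepresence system according to claim 1, characterized in that the telepresence terminal is further configured to synchronize the multiple audio and/or video code streams that are sent and/or received.

7. The telepresence system according to any one of claims 1 to 6, characterized in that the remote endpoint is a multipoint control unit (MCU) or a remote telepresence terminal.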
8、 一种远程呈现终端, 其特征在于, 具有多路音、 视频输入 /输出接 口, 用于连接多路音频输入 /输出设备和 /或多路视频输入 /输出设备,该终 端包括: 协议信令处理模块、 媒体编解码模块和媒体传送模块; 其中, 所述协议信令处理模块, 用于与远端端点之间建立会话, 进行多路 音频和 /或视频码流输入 /输出位置信息交互以及媒体能力协商, 并建立媒 体逻辑通道;
所述媒体编解码模块, 用于对所述多路音频输入设备和 /或多路视频 输入设备的输入码流进行编码, 并提供给所述媒体传送模块; 对所述媒 体传送模块提供的来自远端端点的音频和 /或视频码流进行解码, 并根据 所述码流对应的输入 /输出位置转给对应位置的音频输出设备和 /或视频 输出设备进行播放;
所述媒体传送模块, 用于将所述码流根据对应的输入 /输出位置发送 给所述远端端点; 接收来自所述远端端点的音频和 /或视频码流, 根据所 述码流对应的输入 /输出位置, 提供给所述媒体编解码模块进行解码。
9、 根据权利要求 8所述远程呈现终端, 其特征在于, 所述进行多路 音频和 /或视频码流输入 /输出位置信息交互以及媒体能力协商包括: 所述协议信令处理模块向所述远端端点发送本端远程呈现终端的能 力集, 其中包括本端的媒体编解码能力和本端远程呈现终端的音、 视频 码流输入 /输出位置信息; 接收所述远端端点的能力集, 其中包括远端的 媒体编解码能力和音、 视频码流输入 /输出位置信息。
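10. The telepresence terminal according to claim 8, wherein the media logical channels comprise a sending channel and a receiving channel, wherein
the multiple audio streams are sent through one sending channel and received through one receiving channel; and/or the multiple video streams are sent through one sending channel and received through one receiving channel; and the individual audio and/or video streams are distinguished by packet header information, the header information including: the stream type, and the input position information and output position information corresponding to the stream.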
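11. The telepresence terminal according to claim 8, wherein the media logical channels comprise sending channels and receiving channels, wherein
the multiple audio streams are sent through different sending channels and received through different receiving channels; and/or the multiple video streams are sent through different sending channels and received through different receiving channels;
the different sending channels are each placed in correspondence with the type and the input/output position information of a respective audio and/or video stream; and the different receiving channels are each placed in correspondence with the type and the input/output position information of a respective audio and/or video stream.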
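12. The telepresence terminal according to claim 11, wherein the protocol signaling processing module is further configured to:
establish a sending channel according to the negotiated media codec capabilities, the local sending address and the remote receiving address, the remote audio or video stream output position information corresponding to the channel to be established, and the local audio or video stream input position information; and
establish a receiving channel according to the negotiated media codec capabilities, the remote sending address and the local receiving address, the remote audio or video stream input position information corresponding to the channel to be established, and the local audio or video stream output position information.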
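13. The telepresence terminal according to claim 8, wherein the media transport module is further configured to synchronize the multiple audio and/or video streams that are sent and/or received.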
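14. A telepresence method, characterized in that a telepresence terminal has multiple audio and video input/output interfaces for connecting multi-channel audio input/output devices and/or multi-channel video input/output devices, the method comprising:
the telepresence terminal establishing a session with a remote endpoint, exchanging input/output position information of multiple audio and/or video streams and negotiating media capabilities, and establishing media logical channels;
the telepresence terminal encoding the input streams of the multi-channel audio input devices and/or multi-channel video input devices and, based on the established media logical channels, sending them to the remote endpoint according to the input/output positions corresponding to the streams; and receiving multiple audio and/or video streams from the remote endpoint, decoding them, and forwarding them, according to the input/output positions corresponding to the streams, to the audio output devices and/or video output devices at its own corresponding positions for playback.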
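15. The telepresence method according to claim 14, wherein exchanging the input/output position information of the multiple audio and/or video streams and negotiating media capabilities comprises:
the telepresence terminal sending its local capability set to the remote endpoint, the capability set including the local media codec capabilities and the audio and video stream input/output position information of the local telepresence terminal; and receiving the capability set of the remote endpoint, which includes the remote media codec capabilities and the remote audio and video stream input/output position information.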
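16. The telepresence method according to claim 14, wherein the media logical channels comprise a sending channel and a receiving channel, wherein
the multiple audio streams are sent through one sending channel and received through one receiving channel; and/or the multiple video streams are sent through one sending channel and received through one receiving channel; and the individual audio and/or video streams are distinguished by packet header information, the header information including: the stream type, and the input position information and output position information corresponding to the stream.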
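17. The telepresence method according to claim 14, wherein the media logical channels comprise sending channels and receiving channels, wherein
the multiple audio streams are sent through different sending channels and received through different receiving channels; and/or the multiple video streams are sent through different sending channels and received through different receiving channels;
the different sending channels are each placed in correspondence with the type and the input/output position information of a respective audio and/or video stream; and the different receiving channels are each placed in correspondence with the type and the input/output position information of a respective audio and/or video stream.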
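18. The telepresence method according to claim 17, wherein establishing the media logical channels specifically comprises:
establishing a sending channel according to the negotiated media codec capabilities, the local sending address and the remote receiving address, the remote audio or video stream output position information corresponding to the channel to be established, and the local audio or video stream input position information; and
establishing a receiving channel according to the negotiated media codec capabilities, the remote sending address and the local receiving address, the remote audio or video stream input position information corresponding to the channel to be established, and the local audio or video stream output position information.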
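19. The telepresence method according to claim 14, further comprising: the telepresence terminal synchronizing the multiple audio and/or video streams that are sent and/or received.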
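20. The telepresence method according to any one of claims 14 to 19, wherein the remote endpoint is a multipoint control unit (MCU) or a remote telepresence terminal.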
PCT/CN2012/072751 2011-07-08 2012-03-21 Telepresence method, terminal and system WO2012155660A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP12784928.9A EP2731330A4 (en) 2011-07-08 2012-03-21 TELEPRESENCE METHOD AND DEVICE AND SYSTEM THEREFOR
US14/130,475 US9172912B2 (en) 2011-07-08 2012-03-21 Telepresence method, terminal and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110191493.8A CN102868873B (zh) 2011-07-08 2011-07-08 Telepresence method, terminal and system
CN201110191493.8 2011-07-08

Publications (1)

Publication Number Publication Date
WO2012155660A1 (zh) 2012-11-22

Family

ID=47176253

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/072751 WO2012155660A1 (zh) 2011-07-08 2012-03-21 Telepresence method, terminal and system

Country Status (4)

Country Link
US (1) US9172912B2 (zh)
EP (1) EP2731330A4 (zh)
CN (1) CN102868873B (zh)
WO (1) WO2012155660A1 (zh)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
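CN102868873B (zh) * 2011-07-08 2017-10-17 中兴通讯股份有限公司 Telepresence method, terminal and system
US9215406B2 (en) 2013-03-14 2015-12-15 Polycom, Inc. Immersive telepresence anywhere
CN104219483B (zh) * 2013-06-01 2019-10-25 中兴通讯股份有限公司 Capability exchange method and apparatus for telepresence endpoints
CN104219486B (zh) * 2013-06-01 2019-05-17 中兴通讯股份有限公司 Capability exchange method and apparatus for telepresence endpoints
CN104219487B (zh) * 2013-06-01 2019-05-07 中兴通讯股份有限公司 Capability exchange method and apparatus for telepresence endpoints, and data stream
CN103369292B (zh) * 2013-07-03 2016-09-14 华为技术有限公司 Call processing method and gateway
CN104519023B (zh) * 2013-09-29 2019-08-27 中兴通讯股份有限公司 Capability negotiation processing method and apparatus, and telepresence endpoint
CN104519305A (zh) * 2013-09-29 2015-04-15 中兴通讯股份有限公司 Endpoint information exchange processing method and apparatus, and telepresence endpoint
CN105530223B (zh) * 2014-09-29 2019-08-30 中兴通讯股份有限公司 Method and apparatus for configuring an endpoint capability set
CN105530450A (zh) * 2014-10-24 2016-04-27 上海良相智能化工程有限公司 Intelligent videophone camera
CN106303551A (zh) * 2015-05-18 2017-01-04 阿里巴巴集团控股有限公司 Video capture method and related device and system
US11064453B2 (en) * 2016-11-18 2021-07-13 Nokia Technologies Oy Position stream session negotiation for spatial audio applications
CN106454276A (zh) * 2016-11-30 2017-02-22 拾联(厦门)信息科技有限公司 Audio/video integration device and integrated video surveillance system
CN107911361B (zh) * 2017-11-14 2020-05-08 网易(杭州)网络有限公司 Voice management method, apparatus, terminal device and storage medium supporting multiple sessions
JP2022051975A (ja) * 2019-02-12 2022-04-04 ソニーグループ株式会社 Information processing device and information processing method
CN112689118B (zh) * 2020-12-29 2023-12-08 厦门亿联网络技术股份有限公司 Data transmission method and apparatus for a multi-screen telepresence terminal
CN113890659A (zh) * 2021-03-17 2022-01-04 广州市保伦电子有限公司 Pipe-based audio broadcasting method
CN113079357A (zh) * 2021-04-08 2021-07-06 天地伟业技术有限公司 Encoder and system for implementing audio/video interaction
CN113259690A (zh) * 2021-07-05 2021-08-13 人民法院信息技术服务中心 Cross-network real-time online audio/video collaboration system and method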

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1170315A (zh) * 1996-04-05 1998-01-14 索尼公司 Video conference system and method thereof
US20080246834A1 (en) * 2007-03-16 2008-10-09 Tandberg Telecom As Telepresence system, method and computer program product
CN101911667A (zh) * 2007-12-27 2010-12-08 松下电器产业株式会社 Connection device and connection method

Family Cites Families (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5335011A (en) * 1993-01-12 1994-08-02 Bell Communications Research, Inc. Sound localization system for teleconferencing using self-steering microphone arrays
SE521936C2 (sv) * 1996-06-26 2003-12-23 Telia Ab Method for efficiently using bandwidth when providing services via a digital cellular radio communication system
JP2001111627A (ja) * 1999-10-04 2001-04-20 Toshiba Corp Communication system and communication path combining method
US6590604B1 (en) * 2000-04-07 2003-07-08 Polycom, Inc. Personal videoconferencing system having distributed processing architecture
US8948059B2 (en) * 2000-12-26 2015-02-03 Polycom, Inc. Conference endpoint controlling audio volume of a remote device
US7765302B2 (en) * 2003-06-30 2010-07-27 Nortel Networks Limited Distributed call server supporting communication sessions in a communication system and method
US7606181B1 (en) * 2003-06-30 2009-10-20 Nortel Networks Limited Apparatus, method, and computer program for processing audio information of a communication session
DE602004003070T2 (de) * 2003-07-21 2007-05-31 France Telecom Access control for a multimedia session according to network resource availability
US8819128B2 (en) * 2003-09-30 2014-08-26 Apple Inc. Apparatus, method, and computer program for providing instant messages related to a conference call
NO318911B1 (no) * 2003-11-14 2005-05-23 Tandberg Telecom As Distributed composition of real-time media
US8996619B1 (en) * 2004-03-31 2015-03-31 Apple Inc. Apparatus, method, and computer program for controlling a target device using instant messages
US8599239B2 (en) * 2004-04-21 2013-12-03 Telepresence Technologies, Llc Telepresence systems and methods therefore
US7612793B2 (en) 2005-09-07 2009-11-03 Polycom, Inc. Spatially correlated audio in multipoint videoconferencing
WO2007055206A1 (ja) * 2005-11-08 2007-05-18 Sharp Kabushiki Kaisha Communication device, communication method, communication system, program, and computer-readable recording medium
US8760485B2 (en) 2006-03-02 2014-06-24 Cisco Technology, Inc. System and method for displaying participants in a videoconference between locations
US8072481B1 (en) * 2006-03-18 2011-12-06 Videotronic Systems Telepresence communication system
US7679639B2 (en) * 2006-04-20 2010-03-16 Cisco Technology, Inc. System and method for enhancing eye gaze in a telepresence system
US7710448B2 (en) * 2006-04-20 2010-05-04 Cisco Technology, Inc. System and method for preventing movement in a telepresence system
US7692680B2 (en) * 2006-04-20 2010-04-06 Cisco Technology, Inc. System and method for providing location specific sound in a telepresence system
US20070250567A1 (en) * 2006-04-20 2007-10-25 Graham Philip R System and method for controlling a telepresence system
EP2151122B1 (en) * 2007-02-14 2014-01-22 Teliris, Inc. Telepresence conference room layout, dynamic scenario manager, diagnostics and control system and method
US20080273078A1 (en) * 2007-05-01 2008-11-06 Scott Grasley Videoconferencing audio distribution
US8237769B2 (en) * 2007-09-21 2012-08-07 Motorola Mobility Llc System and method of videotelephony with detection of a visual token in the videotelephony image for electronic control of the field of view
US8289362B2 (en) 2007-09-26 2012-10-16 Cisco Technology, Inc. Audio directionality control for a multi-display switched video conferencing system
US8577011B1 (en) * 2008-01-09 2013-11-05 Shoretel, Inc. Distributed call pickup group for VoIP system
GB0905317D0 (en) * 2008-07-14 2009-05-13 Musion Ip Ltd Video processing and telepresence system and method
US8649426B2 (en) * 2008-09-18 2014-02-11 Magor Communications Corporation Low latency high resolution video encoding
NO332009B1 (no) * 2008-12-12 2012-05-21 Cisco Systems Int Sarl Method for initiating communication connections
NO329739B1 (no) * 2008-12-23 2010-12-13 Tandberg Telecom As Method, device and computer program for processing images in a conference between a plurality of video conferencing terminals
CN102549608A (zh) * 2009-09-24 2012-07-04 盖特赫尔公司 Collaboration and travel ecosystem
CN102223201B (zh) * 2010-04-15 2014-01-01 中兴通讯股份有限公司 Codec capability negotiation method and terminal
CN101883317A (zh) * 2010-05-18 2010-11-10 中兴通讯股份有限公司 Mobile terminal and user location notification method
US8928659B2 (en) * 2010-06-23 2015-01-06 Microsoft Corporation Telepresence systems with viewer perspective adjustment
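CN102655584B (zh) * 2011-03-04 2017-11-24 中兴通讯股份有限公司 Method and system for sending and playing media data in telepresence technology
CN102868880B (zh) * 2011-07-08 2017-09-05 中兴通讯股份有限公司 Telepresence-based media transmission method and system
CN102868873B (zh) * 2011-07-08 2017-10-17 中兴通讯股份有限公司 Telepresence method, terminal and system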

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2731330A4 *

Also Published As

Publication number Publication date
CN102868873B (zh) 2017-10-17
EP2731330A1 (en) 2014-05-14
CN102868873A (zh) 2013-01-09
US9172912B2 (en) 2015-10-27
EP2731330A4 (en) 2015-04-15
US20140146129A1 (en) 2014-05-29

Similar Documents

Publication Publication Date Title
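WO2012155660A1 (zh) Telepresence method, terminal and system
KR100880150B1 (ko) Multipoint video conference system and corresponding media processing method
CN108076306B (zh) Conference implementation method, apparatus, device and system, and computer-readable storage medium
JP5320406B2 (ja) Audio processing method, system, and control server
CN104365088B (zh) Method, system and medium for video image sharing and control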
US6457043B1 (en) Speaker identifier for multi-party conference
WO2012155659A1 (zh) Telepresence-based media transmission method and system
EP2154885B1 (en) A caption display method and a video communication control device
US9888046B2 (en) Systems, methods and media for identifying and associating user devices with media cues
WO2010034254A1 (zh) Video and audio processing method, multipoint control unit and video conference system
WO2008040258A1 (en) System and method for realizing multi-language conference
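WO2011015136A1 (zh) Conference control method, apparatus and system
WO2012041117A1 (zh) Method and system for centralized monitoring of video conference terminals, and related apparatus
WO2015127799A1 (zh) Method and device for negotiating media capabilities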
US9088690B2 (en) Video conference system
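WO2012175025A1 (zh) Telepresence conference system, and method for recording and playing back a telepresence conference
CN107040458B (zh) Method and system for implementing video conference interworking
CN102438119B (zh) Audio/video communication system for digital television
WO2011023024A1 (zh) Method and system for transmitting monitoring information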
Romanow et al. Requirements for Telepresence Multistreams
CN104270655A (zh) Multipoint video aggregation system
Romanow et al. RFC 7262: Requirements for Telepresence Multistreams

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 12784928; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: 14130475; Country of ref document: US)
NENP Non-entry into the national phase (Ref country code: DE)
WWE WIPO information: entry into national phase (Ref document number: 2012784928; Country of ref document: EP)