WO2010094213A1 - Method, apparatus and system for transmitting and receiving multiplex media data stream - Google Patents

Publication number
WO2010094213A1
Authority
WO
WIPO (PCT)
Prior art keywords
media data
data stream
media
streams
packet description
Prior art date
Application number
PCT/CN2010/070180
Other languages
French (fr)
Chinese (zh)
Inventor
苏红宏
Original Assignee
华为终端有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为终端有限公司
Publication of WO2010094213A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets

Definitions

  • The present invention relates to the field of network technologies, and in particular to a method, device and system for transmitting and receiving multiple media streams in multimedia communication.

Background

  • The ITU-T H.239 standard defines the transmission of multiple media streams in conference television. Multiple media streams can be transmitted between terminals, and between terminals and MCUs (Multipoint Control Units). However, H.239 only defines how to send an auxiliary video stream and only defines the role of each media stream. This role can be moving image or slide content, but it does not indicate the relationship between the multiple media streams, so after receiving the multi-channel media data the receiving end has to process it before it can be output.
  • FIG. 1 is a schematic diagram of a method for media streaming in a prior art conference television.
  • Two conference television terminals are placed in each of conference site A and conference site B.
  • Each terminal is used by one participant (or one group of participants).
  • Each terminal includes a camera, a microphone, a display device, and a speaker.
  • The two sites, A and B, connect to the network through switches.
  • Terminal 1 of site A establishes a connection with terminal 3 of site B.
  • Terminal 2 of site A establishes a connection with terminal 4 of site B, so that terminal 1 of site A presents the audio, video and data collected by terminal 3 of site B.
  • Terminal 2 of site A presents the audio, video and data collected by terminal 4 of site B.
  • Terminal 3 of site B presents the audio, video and data of terminal 1 of site A.
  • Terminal 4 of site B presents the audio, video and data of terminal 2 of site A.
  • Embodiments of the present invention provide a method, apparatus, and system for transmitting and receiving multiple media streams in multimedia communication, which can solve the problem that multiple media streams cannot be freely combined and output in multimedia communication.
  • a method for transmitting a multi-channel media data stream in multimedia communication comprising:
  • An embodiment of the present invention further provides a method for receiving and processing a multi-channel media data stream in a multimedia communication, including:
  • An embodiment of the present invention further provides a transmission device for a multi-channel media data stream in multimedia communication, including:
  • a grouping unit, configured to group the multi-channel media data streams to be transmitted; a packet description unit, configured to perform packet description on the grouped multi-channel media data streams; and a data transmission unit, configured to transmit the grouped multi-channel media data streams and the packet description corresponding to the media data streams.
  • An embodiment of the present invention further provides a receiving and processing apparatus for a multi-channel media data stream in a multimedia communication, including:
  • a receiving unit, configured to receive multi-channel media data streams with a packet description and the packet description corresponding to the media data streams; and an output unit, configured to output the media data streams in groups according to the packet description of the received multi-channel media data streams.
  • An embodiment of the present invention further provides a conference terminal, including:
  • a sending device, configured to acquire and group multi-channel media data streams to be transmitted, perform packet description on the grouped multi-channel media data streams, and transmit the grouped multi-channel media data streams and the packet description corresponding to the media data streams; and/or a receiving processing device, configured to receive multi-channel media data streams with a packet description and the packet description corresponding to the media data streams, and output the media data streams in groups according to the packet description of the received multi-channel media data streams.
  • An embodiment of the present invention further provides a system for multi-channel media data transmission in multimedia communication, including:
  • a data acquisition device, including a video acquisition device, an audio acquisition device, or a data acquisition device, for acquiring multi-channel media data streams;
  • a transmitting device, configured to group the multi-channel media data streams acquired by the data acquisition device, perform packet description on the grouped multi-channel media data streams, and transmit the grouped multi-channel media data streams and the packet description corresponding to the media data streams;
  • a receiving processing device, configured to receive multi-channel media data streams with a packet description and the packet description corresponding to the media data streams, and output the media data streams in groups according to the packet description of the received multi-channel media data streams; and
  • a data output device, configured to output the grouped media data streams through a video output device, an audio output device, or a data output device.
  • The method, device and system for transmitting and receiving multiple media streams in multimedia communication group the multi-channel media data, describe each group with a packet description, and transmit the grouped media data streams together with the packet description. After the media data streams carrying the packet description, and the packet description corresponding to them, are received, the media data streams are output in groups according to the packet description, thereby implementing free combination and output of the multiple media streams.
  • FIG. 1 is a schematic diagram of a method for multi-channel media streaming in a conference television of the prior art
  • FIG. 2 is a flowchart of a method for transmitting multi-channel media data stream in multimedia communication according to an embodiment of the present invention
  • FIG. 3 is a flowchart of a method for outputting multi-channel media data stream in multimedia communication according to an embodiment of the present invention
  • FIG. 5 is a flowchart of a method according to Embodiment 1 of the present invention.
  • FIG. 6 is a schematic diagram of a multi-channel media stream grouping at an input end according to an embodiment of the present invention
  • FIG. 7 is a schematic diagram of a multi-channel media stream grouping at an output end according to an embodiment of the present invention
  • FIG. 8 is a schematic diagram of a method for transmitting and outputting multi-channel media streams according to Embodiment 2 of the present invention
  • FIG. 9 is a flowchart of a method for transmitting and outputting multiple media data streams in multimedia communication according to Embodiment 3 of the present invention.
  • FIG. 10 is a schematic diagram of an apparatus for transmitting a multi-channel media data stream in a multimedia communication according to an embodiment of the present invention
  • FIG. 11 is a schematic diagram of a device for receiving a multi-channel media data stream in a multimedia communication according to an embodiment of the present invention
  • FIG. 12 is a schematic diagram of a system for transmitting multi-channel media data streams in multimedia communication according to an embodiment of the present invention.

Detailed description

  • a method for transmitting multiple media streams in multimedia communication includes:
  • The packet description can be carried in the media data stream or in a bearer channel corresponding to the media data stream; or carried in a control protocol related to the media data stream; or carried in a control message that is added to the media data stream for the corresponding media data. Any of these implements the packet description of the grouped multi-channel media data streams.
  • By grouping multiple media data streams and adding a packet description to the grouped media data streams, the transmission method provided by this embodiment of the present invention assigns a group feature to each transmitted media data stream. The receiving terminal can perform the corresponding output processing according to the packet description, thereby realizing free combination of multiple media streams at different terminals.
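The sending side described above (group, describe, transmit) can be sketched roughly as follows. This is only an illustration: the `MediaStream` class, the `describe_groups` helper and the group labels `L1` and `L2` are hypothetical names chosen for the sketch, not structures defined by the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MediaStream:
    name: str                        # e.g. "video media 1"
    kind: str                        # "audio", "video" or "data"
    group_id: Optional[str] = None   # packet (group) description, set before sending


def describe_groups(groups: Dict[str, List[MediaStream]]) -> List[MediaStream]:
    """Attach each group's identifier to its streams and return the flat send list."""
    outgoing: List[MediaStream] = []
    for group_id, streams in groups.items():
        for stream in streams:
            stream.group_id = group_id   # the description travels with the stream
            outgoing.append(stream)
    return outgoing


# Hypothetical grouping of four streams into two groups.
groups = {
    "L1": [MediaStream("video media 1", "video"), MediaStream("audio media 1", "audio")],
    "L2": [MediaStream("video media 2", "video"), MediaStream("audio media 2", "audio")],
}
outgoing = describe_groups(groups)
```

Each entry of `outgoing` now carries its own group identifier, which is the property the receiving terminal relies on.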
  • a method for receiving and processing a multi-channel media stream in a multimedia communication includes:
  • S301 Receive multi-channel media data streams with a packet description and the packet description corresponding to the media data streams.
  • The packet description may be carried in a media data stream or a bearer channel corresponding to the media data stream; or carried in a control protocol associated with the media data stream; or carried in a control message added in the media data stream.
  • The media data streams are output in groups according to the packet description of the received multi-channel media data streams.
  • By receiving multi-channel media data streams with a packet description and the packet description corresponding to the media data streams, and outputting the media data in groups according to that description, this embodiment of the present invention enables multi-channel media streams to be freely combined and output at different terminals.
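On the receiving side, the key step is collecting streams that share a packet description before they are output. A minimal sketch, assuming each received stream arrives as a hypothetical `(stream_name, group_id)` pair:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def regroup(received: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect received stream names under their group identifier."""
    by_group: Dict[str, List[str]] = defaultdict(list)
    for stream_name, group_id in received:
        by_group[group_id].append(stream_name)
    return dict(by_group)


# Streams may arrive interleaved; the description restores the grouping.
received = [("video 1", "L1"), ("audio 2", "L2"), ("audio 1", "L1"), ("video 2", "L2")]
by_group = regroup(received)
```

The result maps each group to its streams, regardless of arrival order, so an output stage can route whole groups at once.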
  • FIG. 4 is a schematic diagram of a multi-channel media stream transmission and output scenario according to this embodiment.
  • The multi-channel media streams are sent by site A and received by site B.
  • In practice, site A and site B generally each include both transmitting and receiving devices, and these devices are included in the terminal.
  • The embodiment is simplified here for convenience of description.
  • Site A includes participants P1 and P2, and terminal 1.
  • Participant P1 has a corresponding microphone 1, camera 1 and data processing device 1 (such as a computer); participant P2 has a corresponding microphone 2, camera 2 and data processing device 2 (such as a computer).
  • Site B includes terminal 2, a plurality of participants P3, display 1 and speaker 1 located at position 1, display 3 and speaker 3 located at position 3, and display 2 located at position 2.
  • the terminal 1 and the terminal 2 are connected through a network.
  • the basic processing is as follows:
  • Terminal 1 collects six channels of media data and groups them.
  • The six channels of media data are: data service media 1, data service media 2, video media 1, video media 2, audio media 1, audio media 2.
  • The grouping may be performed according to the positional relationship between the interfaces of the media data streams, that is, the media data streams of the same physical interface, or of physical interfaces at similar locations, are placed in one group.
  • The six media streams are divided into two groups, L1 and L2. The media data of group L1 corresponds to participant P1 and includes data service media 1, video media 1 and audio media 1. The media data of group L2 corresponds to participant P2 and includes data service media 2, video media 2 and audio media 2.
  • The above determines the grouping from the interface relationship of terminal 1: the data received by the video input interface, audio input interface and data service input interface corresponding to participant P1 forms group L1, and the data received by the video input interface, audio input interface and data service input interface corresponding to participant P2 forms group L2.
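Grouping by interface amounts to a static lookup from input interface to group. The sketch below assumes a fixed table; the interface names such as `video_in_1` are hypothetical and stand in for whatever identifiers a real terminal exposes.

```python
from typing import Dict, List

# Hypothetical interface names; the table mirrors the P1 -> L1, P2 -> L2 example.
INTERFACE_TO_GROUP: Dict[str, str] = {
    "video_in_1": "L1", "audio_in_1": "L1", "data_in_1": "L1",
    "video_in_2": "L2", "audio_in_2": "L2", "data_in_2": "L2",
}


def group_by_interface(active_interfaces: List[str]) -> Dict[str, List[str]]:
    """Place every active input interface in the group its location implies."""
    groups: Dict[str, List[str]] = {}
    for interface in active_interfaces:
        groups.setdefault(INTERFACE_TO_GROUP[interface], []).append(interface)
    return groups


interface_groups = group_by_interface(
    ["video_in_1", "video_in_2", "audio_in_1", "data_in_2"]
)
```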
  • The video interface may be a CVBS (Composite Video Broadcast Signal, or Composite Video Blanking and Sync) interface, an S-Video (Separate Video) interface, a VGA (Video Graphics Array) interface, a DVI (Digital Visual Interface) interface, and so on.
  • The audio interface can be any form of analog or digital interface, such as MIC (microphone input) or Line IN (line input).
  • The data service interface can be a network port, a USB (Universal Serial Bus) interface, or a VGA interface (where the data content has already been converted to video output on a PC), and the data service content includes files or slide shows. These interfaces may also be physically merged.
  • For example, the video interface and the audio interface may be combined into an HDMI (High Definition Multimedia Interface) interface or an IEEE 1394 interface, and video, audio and data services may even all be connected through a USB interface. In this case, video, audio and data services can also be transmitted through logical channels.
  • The present invention is not limited thereto: each group may contain multiple video, audio and data service streams.
  • For example, group L1 may also have two audio inputs, or two video inputs.
  • The grouping relationship may also be determined from the spatial locations of the audio and/or video: the spatial position of the image is obtained by the camera, and/or the spatial position of the sound is obtained through the microphone, and video and/or audio whose image position and/or sound position are the same or similar are placed in one group.
  • The grouping relationship may further be determined by using an array microphone to pick up multiple sound source positions, forming multiple independent audio streams, and grouping those audio streams; or by obtaining a panoramic image through a wide-angle camera, splitting the panoramic image into multiple images, and grouping the resulting video streams.
  • Corresponding video media and audio media are then classified into one group; for example, the video media and audio media can be grouped according to their positional relationship. That is, the media streams being grouped may be constructed by other means and are not necessarily obtained directly from an interface.
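Grouping by spatial location can be pictured as clustering sources whose positions are close to each other. The following sketch assumes each source reports a single horizontal angle in degrees and that a fixed `tolerance` decides what counts as "similar"; both are assumptions made for illustration, since the text does not specify how positions are measured or compared.

```python
from typing import List, Tuple


def group_by_position(sources: List[Tuple[str, float]], tolerance: float) -> List[List[str]]:
    """Group sources whose position lies within `tolerance` of a group's first member."""
    groups: List[Tuple[float, List[str]]] = []   # (anchor position, member names)
    for name, position in sorted(sources, key=lambda s: s[1]):
        if groups and abs(position - groups[-1][0]) <= tolerance:
            groups[-1][1].append(name)
        else:
            groups.append((position, [name]))
    return [names for _, names in groups]


# Hypothetical positions: one camera/microphone pair on each side of the room.
sources = [("camera 1", -30.0), ("mic 1", -28.0), ("camera 2", 25.0), ("mic 2", 27.5)]
position_groups = group_by_position(sources, tolerance=5.0)
```

With these values, each camera ends up in the same group as the microphone nearest to it.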
  • the terminal 1 compresses and encodes the grouped media data stream.
  • Mode 1 carries the packet description in the media data stream or in a bearer channel corresponding to the media data stream.
  • For example, an extension header of RTP (Real-time Transport Protocol), or the media data stream itself, may carry the packet description.
  • Specifically, the group feature may be filled into an RTP extension header, or into an extension field or a specific field in the media data stream.
  • For example, the RTP header extension carries a packet description field, and RTP streams with the same packet description belong to the same group.
  • Alternatively, the packet description may be identified in the corresponding bearer channel, for example by extending a field in the RTP data (payload) and adding the packet description there.
  • In SIP/SDP (Session Initiation Protocol / Session Description Protocol) communication, the packet description can likewise be carried in RTP.
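As an illustration of Mode 1, the sketch below builds an RTP packet whose header extension carries a group identifier, using the generic extension layout from RFC 3550 (a 16-bit profile-defined value, a 16-bit length in 32-bit words, then the extension data). The profile value `0x1001` and the choice of a single 32-bit group number are assumptions made for this example; the application does not specify the field layout.

```python
import struct
from typing import Optional

GROUP_EXT_PROFILE = 0x1001   # hypothetical "defined by profile" identifier


def build_rtp_packet(seq: int, timestamp: int, ssrc: int,
                     payload_type: int, group_id: int, payload: bytes) -> bytes:
    """Build an RTP packet with a one-word header extension holding the group id."""
    byte0 = (2 << 6) | (1 << 4)          # version 2, no padding, X bit set, 0 CSRCs
    byte1 = payload_type & 0x7F          # marker bit cleared
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    ext_data = struct.pack("!I", group_id)
    extension = struct.pack("!HH", GROUP_EXT_PROFILE, len(ext_data) // 4) + ext_data
    return header + extension + payload


def read_group_id(packet: bytes) -> Optional[int]:
    """Return the group id from the header extension, or None if it is absent."""
    if not (packet[0] >> 4) & 1:         # X bit not set: no extension present
        return None
    offset = 12 + 4 * (packet[0] & 0x0F) # skip fixed header and any CSRC entries
    profile, length_words = struct.unpack_from("!HH", packet, offset)
    if profile != GROUP_EXT_PROFILE or length_words < 1:
        return None
    (group_id,) = struct.unpack_from("!I", packet, offset + 4)
    return group_id


packet = build_rtp_packet(seq=1, timestamp=1000, ssrc=0x1234,
                          payload_type=96, group_id=1, payload=b"abc")
```

A receiver that calls `read_group_id` on packets from several RTP streams can treat streams returning the same value as one group.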
  • Mode 2 carries the packet description information in a control protocol related to the media stream.
  • Specifically, the group feature can be filled into an extension field in the control protocol.
  • Each media stream is assigned a corresponding session ID (sessionID), so related sessions can be associated by specifying a packet description for each session ID. For example, a group description field is added to the OpenLogicalChannel message and used to carry the group description, as defined below:
  • the groupID is an extended packet description field used to populate the packet description.
  • Mode 3 adds, to the media data stream, a control message carrying the packet description of the corresponding media data.
  • Specifically, a group feature is defined in each control message, and the message body of the control message includes at least one media stream belonging to the same group.
  • For example, the message contains logical channel numbers 1, 2 and 3, so the receiving end determines from the message that the media data of logical channels 1, 2 and 3 belongs to the same group.
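Mode 3 can be pictured as a small control message that names a group and lists the logical channels belonging to it. The `GroupControlMessage` type and `apply_group_message` helper below are hypothetical; they only show how a receiver might record the channel-to-group association such a message conveys.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class GroupControlMessage:
    group_id: str
    logical_channels: Tuple[int, ...]   # channels declared to be in the same group


def apply_group_message(channel_groups: Dict[int, str],
                        message: GroupControlMessage) -> Dict[int, str]:
    """Return a new channel -> group table updated with the message's contents."""
    updated = dict(channel_groups)
    for channel in message.logical_channels:
        updated[channel] = message.group_id
    return updated


# A message declaring that logical channels 1, 2 and 3 form one group.
channel_groups = apply_group_message({}, GroupControlMessage("L1", (1, 2, 3)))
```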
  • the packet description can also be carried by using multiple sessions.
  • For example, session P1 corresponds to group L1 and session P2 corresponds to group L2, and each session contains audio, video and application data streams.
  • g 49171 51373 32417
  • group1 and group2 in the media description are used to describe the group number corresponding to the media stream, and media streams with the same group number are in the same group.
  • group1 and group2 can be numeric or character identifiers.
  • the terminal 1 sends the multi-channel media data to the terminal 2.
  • S505 The terminal 2 receives the multi-path media data with the packet description, and performs decoding and decompression processing.
  • According to the packet description, video data 1 and audio data 1 of packet L1 are output to display 1 and speaker 1 at position 1, video data 2 and audio data 2 of packet L2 are output to display 3 and speaker 3 at position 3, and data service media 1 and data service media 2 of packets L1 and L2 are output to display 2 at position 2 as needed.
  • The embodiment of the present invention can also switch the data service output interface among the data services obtained from multiple senders as needed. For example, when participant P1 at the sending end explains the slide show, the data service output interface is switched to the data service of L1, and when P2 explains the slide show, the data service output is switched to L2.
  • The video, audio and data service output interfaces here can take the same variety of forms and combinations as the input interfaces described above.
  • Embodiments of the present invention may further switch the video data, audio data, or data service data output by an output interface according to the packet description of the received media data stream.
  • For example, according to the group description, when participant P1 explains the slides, terminal 2 can automatically switch the output of both the video output 1 and video output 2 interfaces to the video of group L1, and switch audio output 1 and audio output 2 to the audio of group L1, so that the participants at site B all see the image of P1 on their respective displays and hear the sound of P1 on their respective speakers.
  • the receiving end can notify the transmitting end to pause the video and audio collection of the L2 group or suspend the transmission of the video and audio of the L2 group to reduce the occupation of the bandwidth.
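The switching and suspension behaviour can be sketched as a routing table from output interfaces to groups: switching rewrites entries, and any group that no output is using becomes a candidate for suspension. The function names and output labels below are hypothetical, and how the receiving end actually notifies the transmitting end is not modelled.

```python
from typing import Dict, Iterable, List


def switch_outputs(routing: Dict[str, str], outputs: Iterable[str],
                   group_id: str) -> Dict[str, str]:
    """Return a new routing table with the given outputs switched to `group_id`."""
    updated = dict(routing)
    for output in outputs:
        updated[output] = group_id
    return updated


def groups_to_suspend(routing: Dict[str, str], all_groups: Iterable[str]) -> List[str]:
    """Groups that no output currently uses, so their capture or sending could pause."""
    in_use = set(routing.values())
    return sorted(group for group in all_groups if group not in in_use)


routing = {"video output 1": "L1", "audio output 1": "L1",
           "video output 2": "L2", "audio output 2": "L2"}
# P1 starts presenting: every output follows group L1.
routing_p1 = switch_outputs(routing, ["video output 2", "audio output 2"], "L1")
suspended = groups_to_suspend(routing_p1, ["L1", "L2"])
```

Before the switch no group is idle; afterwards only `L2` is, which matches the bandwidth-saving case described above.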
  • the network connecting terminal 1 and terminal 2 may be a circuit domain network (such as E1/SDH/ISDN) or a packet network (such as an IP network), and the communication protocol may use H.320/H.323/H.324/ SIP, etc.
  • Placing video media 1, audio media 1 and data service media 1 in one group is a common application, but the actual grouping can be very flexible. For example, video media 2 may form a group on its own, and audio media 2 and video media 1 may be grouped together.
  • The embodiment of the present invention transmits and outputs multi-channel media data in groups, so that the receiving end can freely combine and output the multi-channel media streams. It also solves the transmission and control of the data service corresponding to the video and audio.
  • the embodiment of the invention also implements the use of a single terminal in a conference site, thereby solving the problem of excessive cost caused by using multiple terminals in the prior art.
  • Embodiment 2
  • the method for transmitting and outputting the multi-channel media data stream in this embodiment may also be processed by an intermediate device (for example, a conference control device, specifically, a multi-point control unit MCU).
  • After receiving the media data streams carrying the packet description, and the corresponding packet description, the MCU obtains the packet description of each media data stream and can re-determine the grouping relationship of each media data stream according to the output requirement.
  • When forwarding the media data streams, the multipoint control unit (MCU) may change the grouping relationship between the media data streams, or regenerate the grouping relationship of the multiple media data streams.
  • For example, the video media stream and the audio media stream sent by terminal 1 and the data service media stream sent by terminal 3 are combined into a new group L at the MCU and then sent to terminal 2.
  • the specific combination is not limited to the above embodiment.
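The regrouping step at the MCU can be sketched as selecting streams from different senders and relabelling them with a new group. The tuple layout `(terminal, kind, group)` and the function `regroup_at_mcu` are assumptions made for this illustration only.

```python
from typing import List, Set, Tuple

Stream = Tuple[str, str, str]   # (sending terminal, media kind, group id)


def regroup_at_mcu(incoming: List[Stream], selection: Set[Tuple[str, str]],
                   new_group: str) -> List[Stream]:
    """Pick the selected (terminal, kind) streams and place them in `new_group`."""
    return [(terminal, kind, new_group)
            for terminal, kind, _old_group in incoming
            if (terminal, kind) in selection]


incoming = [
    ("terminal 1", "video", "L1"),
    ("terminal 1", "audio", "L1"),
    ("terminal 1", "data", "L1"),
    ("terminal 3", "data", "L3"),
]
# Video and audio from terminal 1 plus data from terminal 3 become group "L".
forwarded = regroup_at_mcu(
    incoming,
    {("terminal 1", "video"), ("terminal 1", "audio"), ("terminal 3", "data")},
    "L",
)
```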
  • The conference control device mainly performs media stream control. In this way, the corresponding media streams (audio media streams, video media streams and data media streams) can be flexibly combined at the request of particular conference terminals, and the combinations can be varied to meet the needs of users.
  • The embodiment of the invention transmits and outputs multiple media streams in groups, so that the receiving terminal can freely combine and output the multiple media streams. It also solves the transmission and control of the data service media stream corresponding to the video and audio.
  • This embodiment is a method by which terminal 2 outputs the multi-channel media streams, building on the transmission of the multi-channel media data streams in the foregoing embodiments. It specifically includes the following steps:
  • the terminal 2 acquires spatial location information of each display in the site B.
  • the video data 1 in the received packet L1 is output through the display 1 of the position 1, and the video data 2 in the packet L2 is output through the display 3 of the position 3.
  • The audio data 1 of the received packet L1 is output through at least one speaker in conference site B, so that the perceived spatial position of the sound is the same as or close to that of display 1 at position 1.
  • The audio data 2 of the received packet L2 is output through at least one speaker in conference site B, so that the perceived spatial position of the sound is the same as or close to that of display 3 at position 3.
  • Steps S904 and S905 are not limited to the above order; step S905 may precede step S904.
  • In this way, the sounds heard by the participants come from the same or a similar position as the corresponding video, even with multiple displays in the venue, which makes the conference television more realistic.
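The text does not say how the speakers are driven so that a sound appears to come from a given display. One common way to do this with two speakers is constant-power panning, sketched below purely as an assumed technique: `position` is a hypothetical normalised coordinate of the target display, from -1.0 (far left) to 1.0 (far right).

```python
import math
from typing import Tuple


def pan_gains(position: float) -> Tuple[float, float]:
    """Constant-power (left, right) speaker gains for a normalised display position."""
    clamped = max(-1.0, min(1.0, position))
    angle = (clamped + 1.0) * math.pi / 4.0   # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


# Audio of a group shown on the leftmost display goes entirely to the left speaker.
left_gain, right_gain = pan_gains(-1.0)
```

Because the gains are the cosine and sine of the same angle, their squares always sum to one, so the overall loudness stays roughly constant as the apparent position moves.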
  • Embodiments of the present invention also provide an apparatus for multi-channel media stream transmission, which can solve the problem that a multi-channel media stream can only be transmitted between specific terminals.
  • an apparatus for transmitting a plurality of media data streams in a multimedia communication includes:
  • a grouping unit 101, configured to group the multiple media data streams to be transmitted;
  • a packet description unit 102, configured to perform packet description on the grouped multi-channel media data streams; and a data transmission unit 103, configured to transmit the grouped multi-channel media data streams and the packet description corresponding to the media data streams.
  • The grouping unit 101 may group the media data streams by any one of the following methods: grouping according to the positional relationship between the interfaces of the media data streams, that is, placing the media data streams of the same physical interface, or of physical interfaces at similar locations, in one group; or grouping by the spatial locations of audio and/or video, that is, obtaining the spatial position of the image by the camera and/or the spatial position of the sound by the microphone, and grouping video and/or audio whose positions are the same or similar; or picking up multiple sound source positions with an array microphone to form multiple independent audio streams and grouping those audio streams; or obtaining a panoramic image through a wide-angle camera, splitting it into multiple images, and grouping the resulting video streams.
  • The packet description unit 102 may describe the grouped media data streams in any of the following manners: carrying the packet description in the media data stream or in the bearer channel corresponding to the media data stream; or carrying the packet description in a control protocol related to the media data stream; or adding to the media data stream a control message carrying the packet description of the corresponding media data.
  • the media data stream in this embodiment may specifically be one or more of the following: an audio stream, a video stream, and a data stream.
  • In the apparatus for transmitting multi-channel media data streams in multimedia communication provided by this embodiment, the grouping unit groups the multi-channel media data and the packet description unit describes the groups. The multi-channel media streams can therefore be transmitted through one terminal, and the receiving terminal can output each media stream in its group according to the packet description, realizing free combination of the multi-channel media data at different terminals.
  • the invention also provides a device for receiving and processing multi-channel media data streams in multimedia communication, which can solve the problem that multi-channel media streams cannot be freely combined and output.
  • An apparatus for receiving and processing multi-channel media streams in multimedia communication includes a receiving unit 111 and an output unit 112.
  • The receiving unit 111 is configured to receive multi-channel media data streams with a packet description and the packet description corresponding to the media data streams.
  • the packet description may be carried in a media data stream or a bearer channel corresponding to the media data stream; or carried in a control protocol related to the media data stream; or carried in a control message added in the media data stream.
  • The output unit 112 is configured to output the media data streams in groups according to the packet description of the received multi-channel media data streams.
  • the apparatus may further include: a switching unit, configured to switch, according to a packet description of the media data stream received by the receiving unit, the media data stream output by the output unit.
  • the apparatus can also include a suspending unit for suspending the collection or transmission of media data streams of other groups than the designated group when all of the output ports are switched to the media stream of the specified group.
  • On receiving multi-channel media data streams with a packet description and the packet description corresponding to the media data streams, this embodiment of the present invention outputs the media data streams in groups according to the packet description, so that the multi-channel media streams can be freely combined and output at different terminals.
  • the switching unit is also capable of switching the transmission of media data of a specific group, and the suspending unit can suspend the collection or transmission of media data of certain groups, thereby reducing network pressure.
  • For the apparatus for transmitting multiple media streams in multimedia communication and the apparatus for receiving and processing them provided by the embodiments of the present invention, reference may be made to Embodiments 1, 2 and 3 of the foregoing method for how free combination and output of the multi-channel media streams in multimedia communication is realized.
  • Embodiments of the present invention also provide a conference terminal, including a transmitting device and a receiving processing device.
  • The transmitting device is configured to acquire and group multi-channel media data streams to be transmitted, perform packet description on the grouped multi-channel media data streams, and transmit the grouped multi-channel media data streams and the packet description corresponding to the media data streams; and/or
  • the receiving processing device is configured to receive multi-channel media data streams with a packet description and the packet description corresponding to the media data streams, and output the media data streams in groups according to the packet description of the received multi-channel media data streams.
  • For a conference terminal, it is preferable to have both a transmitting device and a receiving processing device.
  • the sending device further includes:
  • a grouping unit, configured to group the multi-channel media data streams to be transmitted, including grouping according to the positional relationship between the interfaces of the media data streams; or grouping by the spatial locations of audio and/or video; or picking up multiple audio streams with an array microphone and grouping the multi-channel audio streams; or acquiring a panoramic image through a wide-angle camera, dividing the panoramic image into multiple separate images, and grouping the resulting video streams;
  • a packet description unit, configured to perform packet description on the grouped multi-channel media data streams, including carrying the packet description in the media data stream or in the bearer channel corresponding to the media data stream; or carrying the packet description in a control protocol related to the media data stream; or adding to the media data stream a control message carrying the packet description of the corresponding media data;
  • a data transmission unit configured to transmit the packetized multi-media media data stream and a packet description corresponding to the media data stream.
  • the conference terminal of the embodiment of the present invention can group the media data streams to be transmitted and carry the grouping description, so that multiple media data streams can be sent in groups; after receiving the grouped media data streams and the corresponding grouping description, it can output the media data streams according to the grouping description, thereby achieving free combination and output of multiple channels of media data.
  • the embodiment of the invention also provides a system for multi-channel media stream transmission in multimedia communication, which can solve the problem that multiple media streams cannot be freely combined and output.
  • a system for multi-channel media stream transmission in multimedia communication includes: a data acquisition device 121, a transmitting device 122, a receiving processing device 123, and a data output device 124, where:
  • the data acquisition device 121 includes a video acquisition device, an audio acquisition device, or a data acquisition device, and is used to acquire multiple media data streams.
  • the transmitting device 122 is configured to group the multiple media data streams acquired by the data acquisition device, generate a grouping description for the grouped media data streams, and transmit the grouped media data streams and the grouping description corresponding to the media data streams.
  • the receiving processing device 123 is configured to receive multiple media data streams with a grouping description and the grouping description corresponding to the media data streams, and to output the media data streams in groups according to the grouping description of the received media data streams.
  • the data output device 124 is configured to output the grouped media data streams through a video output device, an audio output device, or a data output device.
  • the transmitting device may further include:
  • a grouping unit, configured to group the multiple media data streams to be transmitted, where the grouping includes: grouping according to the positional relationship between the interfaces of the media data streams; or grouping by the spatial positions of the audio and/or video; or using an array microphone to pick up multiple sound source positions, forming the corresponding independent audio streams, and grouping those audio streams; or acquiring a panoramic image with a wide-angle camera, splitting the panoramic image into multiple separate images, and grouping the resulting video streams;
  • a grouping description unit, configured to generate a grouping description for the grouped media data streams, where this includes: carrying the grouping description in the media data stream or in the bearer channel corresponding to the media data stream; or carrying the grouping description in a control protocol related to the media data stream; or adding to the media data stream a control message that carries the grouping description corresponding to the media data;
  • a data transmission unit, configured to transmit the grouped media data streams and the grouping description corresponding to the media data streams.
  • the system may further include:
  • a conference control device, configured to change the grouping relationship when forwarding the grouping relationship of the media data streams, or to group multiple media data streams from different sites.
  • the conference control device can be an MCU.
  • the system for transmitting multiple media streams in multimedia communication provided by the embodiment of the present invention groups the multiple channels of media data, generates a grouping description for the grouped media data, and transmits the media data together with the corresponding grouping description; after receiving the media data and the grouping description, the output end outputs the media data in groups according to the grouping description, so that free combination and output of multiple media streams can be achieved. The MCU that is preferably included in the system can also change the grouping relationship of the media data streams when forwarding it, which likewise enables free combination and output of the multiple media streams.
  • for how the system implements the transmission, reception and output processing of multiple media streams in multimedia communication, refer to Embodiments 1, 2 and 3 of the foregoing method.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
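
The conference control device described above, which changes the grouping relationship when forwarding or groups streams from different sites, can be sketched roughly in Python as follows. This is only an illustration under stated assumptions: the function names, the dictionary layout and the `site:group` label format are hypothetical and are not defined by the source.

```python
def merge_site_groupings(site_groupings):
    """Combine per-site grouping descriptions into one description.

    site_groupings: dict mapping a site name to {stream id: group label}.
    Each new label is prefixed with the site name (an assumed convention)
    so that groups from different sites stay distinct after forwarding.
    """
    merged = {}
    for site, grouping in site_groupings.items():
        for stream_id, group in grouping.items():
            merged[f"{site}/{stream_id}"] = f"{site}:{group}"
    return merged


def regroup(grouping, new_groups):
    """Change the grouping relationship by overriding selected labels."""
    updated = dict(grouping)
    updated.update(new_groups)
    return updated


merged = merge_site_groupings({
    "siteA": {"video1": "L1", "audio1": "L1"},
    "siteB": {"video1": "L1"},
})
# The control device decides that siteB's video joins siteA's group L1.
forwarded = regroup(merged, {"siteB/video1": "siteA:L1"})
print(forwarded)
```

Keeping the original description untouched and returning a modified copy lets the control device forward different grouping relationships to different receivers.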

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method, apparatus and system for transmitting and receiving multiplex media data streams are provided in the embodiments of the present invention. They relate to the field of network technology and can solve the problem that multiplex media streams cannot be freely combined and output. The transmission method comprises: grouping the multiplex media data streams to be transmitted; describing the grouped multiplex media data streams; and transmitting the grouped multiplex media data streams and the grouping description corresponding to the media data streams. The receiving method comprises: receiving the multiplex media data streams with the grouping description, wherein the grouping description corresponds to the media data streams; and outputting the media data streams in groups according to the grouping description of the received multiplex media data streams. The embodiments of the present invention are applied to multiplex media stream transmission and output in multimedia communication.

Description

Method, device and system for multi-channel media stream transmission and reception
Technical Field
The present invention relates to the field of network technologies, and in particular to a method, device and system for transmitting and receiving multi-channel media streams in multimedia communication.
Background
To further enhance the sense of presence in a conference, conference television often needs to transmit the video, audio and data of multiple participants (or groups of participants) at one site simultaneously. The standard based on ITU-T H.239 defines the transmission of multiple media streams in conference television: multiple media streams can be transmitted between terminals, and between a terminal and an MCU (Multipoint Control Unit). At present the standard only defines how to transmit one auxiliary video stream. H.239 also only defines the role of each media stream, which can be moving images or slide content, but this does not indicate the relationship among the multiple media streams. Therefore, after the receiving end obtains the multiple channels of media data, it must process them itself before outputting them.
Figure 1 is a schematic diagram of a prior-art method for media stream transmission in conference television. Two sets of conference television terminals are placed at each of site A and site B, and each terminal is used by one participant (or one group of participants). Each terminal includes devices such as a camera, a microphone, a display device and a loudspeaker. The two sites A and B are connected to the network through switches. Terminal 1 at site A establishes a connection with terminal 3 at site B, and terminal 2 at site A establishes a connection with terminal 4 at site B. Terminal 1 at site A thus presents the sound, images and data captured by terminal 3 at site B, and terminal 2 at site A presents those captured by terminal 4 at site B. Conversely, terminal 3 at site B presents the sound, images and data captured by terminal 1 at site A, and terminal 4 at site B presents those captured by terminal 2 at site A. With this technical solution, two channels of video and audio streams are transmitted between the two sites.
The prior art also includes another method, which generates audio positions corresponding to a video layout. In this method the MCU generates a multi-picture video that composites the images of multiple endpoints, together with sound position information corresponding to the multi-picture layout, and sends multiple audio streams, the multi-picture video and the sound position information to the receiving end according to the receiving end's loudspeaker configuration. The receiving end reproduces the sound from the multiple audio streams according to the position information, so that the position of each endpoint's sound as heard by participants is close to the position of that endpoint in the multi-picture layout. For a participant, the audio played at a loudspeaker at the far end of the screen may be attenuated or delayed compared with the audio played at a loudspeaker at the near end of the display. This solution can enhance the sense of presence of the conference. However, the sound position is determined by the MCU and merely executed by the terminal, which lacks flexibility: the terminal cannot adjust the sound position on its own. The prior art cannot achieve free combination and output of multiple media streams in multimedia communication, and it does not consider the transmission and processing of data services.
Summary of the Invention
Embodiments of the present invention provide a method, device and system for transmitting and receiving multi-channel media streams in multimedia communication, which can solve the problem that multiple media streams in multimedia communication cannot be freely combined and output.
Embodiments of the present invention adopt the following technical solutions:
A method for transmitting multiple media data streams in multimedia communication, including:
grouping the multiple media data streams to be transmitted; generating a grouping description for the grouped media data streams; and transmitting the grouped media data streams and the grouping description corresponding to the media data streams.
An embodiment of the present invention further provides a method for receiving and processing multiple media data streams in multimedia communication, including:
receiving multiple media data streams with a grouping description and the grouping description corresponding to the media data streams; and outputting the media data streams in groups according to the grouping description of the received media data streams.
An embodiment of the present invention further provides a device for transmitting multiple media data streams in multimedia communication, including:
a grouping unit, configured to group the multiple media data streams to be transmitted; a grouping description unit, configured to generate a grouping description for the grouped media data streams; and a data transmission unit, configured to transmit the grouped media data streams and the grouping description corresponding to the media data streams.
An embodiment of the present invention further provides a device for receiving and processing multiple media data streams in multimedia communication, including:
a receiving unit, configured to receive multiple media data streams with a grouping description and the grouping description corresponding to the media data streams; and an output unit, configured to output the media data streams in groups according to the grouping description of the received media data streams.
An embodiment of the present invention further provides a conference terminal, including:
a transmitting device, configured to acquire and group the multiple media data streams to be transmitted, generate a grouping description for the grouped media data streams, and transmit the grouped media data streams and the grouping description corresponding to the media data streams; and/or a receiving processing device, configured to receive multiple media data streams with a grouping description and the grouping description corresponding to the media data streams, and to output the media data streams in groups according to the grouping description of the received media data streams.
An embodiment of the present invention further provides a system for transmitting multiple media data streams in multimedia communication, including:
a data acquisition device, including a video acquisition device, an audio acquisition device or a data acquisition device, configured to acquire multiple media data streams;
a transmitting device, configured to group the multiple media data streams acquired by the data acquisition device, generate a grouping description for the grouped media data streams, and transmit the grouped media data streams and the grouping description corresponding to the media data streams;
a receiving processing device, configured to receive multiple media data streams with a grouping description and the grouping description corresponding to the media data streams, and to output the media data streams in groups according to the grouping description of the received media data streams;
a data output device, configured to output the grouped media data streams through a video output device, an audio output device or a data output device.
The method, device and system for transmitting and receiving multi-channel media streams in multimedia communication provided by the embodiments of the present invention can group multiple channels of media data, generate a grouping description for each group, and send the grouped media data streams together with the grouping description. After the media data streams carrying the grouping description and the grouping description corresponding to them are received, the media data streams are output in groups according to the grouping description, which achieves free combination and output of multiple media streams.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1 is a schematic diagram of a prior-art method for multi-channel media stream transmission in conference television;
Figure 2 is a flowchart of a method for transmitting multiple media data streams in multimedia communication according to an embodiment of the present invention;
Figure 3 is a flowchart of a method for outputting multiple media data streams in multimedia communication according to an embodiment of the present invention;
Figure 4 is a schematic diagram of a method for transmitting and outputting multiple media data streams in multimedia communication according to Embodiment 1 of the present invention;
Figure 5 is a flowchart of the method of Embodiment 1 of the present invention;
Figure 6 is a schematic diagram of grouping multiple media streams at the input end in Embodiment 1 of the present invention;
Figure 7 is a schematic diagram of grouping multiple media streams at the output end in Embodiment 1 of the present invention;
Figure 8 is a schematic diagram of a method for transmitting and outputting multiple media data streams in multimedia communication according to Embodiment 2 of the present invention;
Figure 9 is a flowchart of a method for transmitting and outputting multiple media data streams in multimedia communication according to Embodiment 3 of the present invention;
Figure 10 is a schematic diagram of a device for transmitting multiple media data streams in multimedia communication according to an embodiment of the present invention;
Figure 11 is a schematic diagram of a device for receiving and processing multiple media data streams in multimedia communication according to an embodiment of the present invention;
Figure 12 is a schematic diagram of a system for transmitting multiple media data streams in multimedia communication according to an embodiment of the present invention.
Detailed Description
The method, device and system for transmitting and receiving multi-channel media streams in multimedia communication according to the embodiments of the present invention are described in detail below with reference to the drawings.
It should be clear that the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
As shown in Figure 2, a method for transmitting multiple media streams in multimedia communication according to an embodiment of the present invention includes:
S201. Group the multiple media data streams to be transmitted.
The multiple media data streams can be grouped in any of the following ways: grouping according to the positional relationship between the interfaces of the media data streams, that is, placing the media data streams of the same physical interface, or of physical interfaces located close together, in one group; or grouping by the spatial positions of the audio and/or video, that is, obtaining the spatial position of the image from the camera and the spatial position of the sound from the microphone, and placing video and/or audio whose image position and/or sound position are the same or close in one group; or using an array microphone to pick up multiple sound source positions, forming the corresponding independent audio streams, and grouping those audio streams; or acquiring a panoramic image with a wide-angle camera, splitting the panoramic image into multiple separate images, and grouping the resulting video streams.
S202. Generate a grouping description for the grouped media data streams.
Specifically, the grouping description can be carried in the media data stream or in the bearer channel corresponding to the media data stream; or carried in a control protocol related to the media data stream; or carried in a control message, added to the media data stream, that holds the grouping description corresponding to the media data. In any of these ways the grouped media data streams are given a grouping description.
S203. Transmit the grouped media data streams and the grouping description corresponding to the media data streams.
By grouping the multiple media data streams and adding a grouping description to the grouped streams, the multi-channel media stream transmission method provided by the embodiment of the present invention can assign a group feature to each transmitted media data stream, and the receiving terminal can perform the corresponding output processing according to the grouping description, thereby achieving free combined transmission of multiple media streams between different terminals.
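
Steps S201 to S203 can be illustrated with a minimal Python sketch. The stream dictionaries, the `position` field used as the grouping key and the `send` callback are assumptions made for illustration only; the source does not prescribe any data layout or transport.

```python
def group_streams(streams, key):
    """S201: group streams by a caller-supplied key function."""
    groups = {}
    for stream in streams:
        groups.setdefault(key(stream), []).append(stream["id"])
    return groups


def describe_groups(groups):
    """S202: build the grouping description as {stream id: group label}."""
    return {sid: label for label, ids in groups.items() for sid in ids}


def transmit(streams, key, send):
    """S203: send every stream together with its grouping description."""
    description = describe_groups(group_streams(streams, key))
    for stream in streams:
        send({"id": stream["id"], "group": description[stream["id"]],
              "payload": stream["payload"]})
    return description


streams = [
    {"id": "video1", "position": "left", "payload": b"v1"},
    {"id": "audio1", "position": "left", "payload": b"a1"},
    {"id": "video2", "position": "right", "payload": b"v2"},
]
sent = []  # stands in for the network in this sketch
description = transmit(streams, key=lambda s: s["position"], send=sent.append)
print(description)
```

Passing the grouping rule in as `key` keeps the sketch independent of which of the grouping methods listed under S201 is used.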
As shown in Figure 3, a method for receiving and processing multiple media streams in multimedia communication according to an embodiment of the present invention includes:
S301. Receive multiple media data streams with a grouping description, and the grouping description corresponding to the media data streams.
Specifically, the grouping description may be carried in the media data stream or in the bearer channel corresponding to the media data stream; or carried in a control protocol related to the media data stream; or carried in a control message added to the media data stream.
S302. According to the grouping description of the received media data streams, output in groups the media data streams and the grouping description corresponding to the media data streams.
The method for receiving and processing multiple media streams in multimedia communication according to the embodiment of the present invention receives multiple media data streams with a grouping description, together with the grouping description corresponding to the media data streams, and then outputs the media data in groups according to the grouping description, so that multiple media streams can be freely combined and output at different terminals.
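
The receiving side (S301 and S302) can be sketched in the same spirit. In this Python sketch the `group` key stands in for the grouping description, and the mapping from group label to local output position is a hypothetical choice made by the receiving terminal; neither name comes from the source.

```python
from collections import defaultdict


def output_by_group(received, outputs):
    """Route each received stream to the output position chosen for its group.

    received: list of dicts with "id" and "group" keys.
    outputs:  dict mapping a group label to a local output position; streams
              whose group has no entry go to the "default" position.
    """
    routed = defaultdict(list)
    for item in received:
        position = outputs.get(item["group"], "default")
        routed[position].append(item["id"])
    return dict(routed)


received = [
    {"id": "video1", "group": "L1"},
    {"id": "audio1", "group": "L1"},
    {"id": "video2", "group": "L2"},
]
# The receiver, not the sender, decides where each group is rendered.
routing = output_by_group(received, {"L1": "position1", "L2": "position3"})
print(routing)
```

Because the routing table lives at the receiver, streams of one group stay together while the terminal remains free to choose where that group is presented.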
Specific implementations of the present invention are described below through practical applications of the transmission and output of multiple media streams in multimedia communication in different scenarios.
Embodiment 1
Figure 4 is a schematic diagram of the multi-channel media stream transmission and output scenario of this embodiment. This embodiment takes as an example site A sending multiple media streams and site B receiving them. It must be noted that in practice site A and site B generally each include both sending and receiving equipment, and each terminal likewise includes both. The embodiment is simplified for ease of description.
In this embodiment, site A includes participants P1 and P2 and terminal 1. Participant P1 has a corresponding microphone 1, camera 1 and data processing device 1 (for example, a computer), and participant P2 has a corresponding microphone 2, camera 2 and data processing device 2 (for example, a computer). Site B includes terminal 2, multiple participants P3, display 1 and loudspeaker 1 at position 1, display 3 and loudspeaker 3 at position 3, and display 2 at position 2. Terminal 1 and terminal 2 are connected through the network. As shown in Figure 5, the basic procedure is as follows:
S501. Terminal 1 captures six channels of media data and groups them. The six channels of media data are: data service media 1, data service media 2, video media 1, video media 2, audio media 1 and audio media 2.
The grouping can be based on the positional relationship between the interfaces of the media data streams, that is, the media data streams of the same physical interface, or of physical interfaces located close together, are placed in one group. As shown in Figure 6, the six media streams are divided into two groups, L1 and L2. The media data of group L1 corresponds to participant P1 and includes data service media 1, video media 1 and audio media 1; the media streams of group L2 correspond to participant P2 and include data service media 2, video media 2 and audio media 2.
The above is the method by which terminal 1 determines the grouping from the interface relationship: the data received on the video input interface, audio input interface and data service input interface corresponding to participant P1 form group L1, and the data received on the video input interface, audio input interface and data service input interface corresponding to participant P2 form group L2. The video interface may be a CVBS (Composite Video Broadcast Signal, or Composite Video Blanking and Sync) interface, an S-Video (Separate Video) interface, a VGA (Video Graphics Array) interface, a DVI (Digital Visual Interface) interface, and so on. The audio interface may be any form of MIC (Medium Interface Connector) or Line IN (input signal) analog or digital interface. The data service interface may be a network port or a USB (Universal Serial Bus) interface, or a VGA interface (where the data content has already been converted to video output on the PC); data service content includes files or slide shows. These interfaces may also be physically combined: for example, the video and audio interfaces may be combined into an HDMI (High-Definition Multimedia Interface) or IEEE 1394 interface, and video, audio and data services may even all be connected through a USB interface, in which case the video, audio and data services can also be transmitted over logical channels.
The present invention is not limited to this. Each group may also contain multiple video, audio or data service streams; for example, group L1 may have two audio inputs or two video inputs.
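
The interface-based grouping of Embodiment 1 can be written out as a small worked example in Python. The group labels L1 and L2 and the six stream names come from the text above; the interface names such as `video_in_1` and the wiring table are hypothetical.

```python
# Hypothetical wiring table: which input interface belongs to which group.
INTERFACE_TO_GROUP = {
    "video_in_1": "L1", "audio_in_1": "L1", "data_in_1": "L1",
    "video_in_2": "L2", "audio_in_2": "L2", "data_in_2": "L2",
}

# (stream name, interface it arrived on)
STREAMS = [
    ("data service media 1", "data_in_1"),
    ("data service media 2", "data_in_2"),
    ("video media 1", "video_in_1"),
    ("video media 2", "video_in_2"),
    ("audio media 1", "audio_in_1"),
    ("audio media 2", "audio_in_2"),
]


def group_by_interface(streams, interface_to_group):
    """Place each stream in the group assigned to its input interface."""
    groups = {}
    for name, interface in streams:
        groups.setdefault(interface_to_group[interface], []).append(name)
    return groups


groups = group_by_interface(STREAMS, INTERFACE_TO_GROUP)
for label in sorted(groups):
    print(label, groups[label])
```

A group with two audio or two video inputs, as mentioned above, only needs extra rows in the wiring table.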
The grouping relationship can also be determined by the spatial positions of the audio and/or video: the spatial position of the image is obtained from the camera and/or the spatial position of the sound is obtained from the microphone, and video and/or audio whose image position and/or sound position are the same or close are placed in one group. The grouping relationship can also be determined by using an array microphone to pick up multiple sound source positions, forming the corresponding independent audio streams and grouping those audio streams; or by acquiring a panoramic image with a wide-angle camera, splitting the panoramic image into multiple separate images and grouping the resulting video streams. The corresponding video media and audio media are then classified into the same group; for example, the video media and audio media can be grouped according to their positional relationship. In other words, the media streams being grouped may be constructed by other means and need not be obtained directly from an interface.
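
Grouping by spatial position can be sketched as a nearest-position match. In this Python sketch the one-dimensional positions, the metre unit and the `tolerance` threshold are all assumptions for illustration; the source only states that streams with the same or similar positions form one group.

```python
def group_by_position(videos, audios, tolerance=0.5):
    """Attach each audio stream to the video stream whose position is closest.

    videos, audios: dicts mapping stream name -> horizontal position
    (metres, an assumed unit). An audio stream further than `tolerance`
    from every video stream forms a group of its own.
    """
    groups = {name: [name] for name in videos}
    for audio, pos in audios.items():
        nearest = min(videos, key=lambda v: abs(videos[v] - pos), default=None)
        if nearest is not None and abs(videos[nearest] - pos) <= tolerance:
            groups[nearest].append(audio)
        else:
            groups[audio] = [audio]
    return groups


videos = {"video1": 0.0, "video2": 2.0}
audios = {"audio1": 0.1, "audio2": 1.9, "audio3": 5.0}
groups = group_by_position(videos, audios)
print(groups)
```

The same matching applies whether the positions come from separate cameras and microphones, from sources located by an array microphone, or from the slices of a panoramic image.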
S502. Terminal 1 compresses and encodes the grouped media data streams.
S503. Generate a grouping description for the grouped media data streams.
The grouping description for the multiple media data streams can be provided in the following three ways:
Mode 1: the group description is carried in the media data stream or in the bearer channel corresponding to the media data stream. For example, the group description can be carried in the extension header of RTP (Real-time Transport Protocol) or in the media data stream itself. Specifically, a group identifier can be written into the RTP extension header, or into an extension field or a specific field of the media data stream.
For example, a group description field is filled in the RTP header extension, and RTP streams with the same group description belong to the same group. Alternatively, the group description can be indicated in the corresponding bearer channel, for example by extending the RTP payload with a field that holds the group description.
In addition, a custom RTP extension header can be defined and the group description carried in it.
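As an illustration of Mode 1, the sketch below builds an RTP packet whose header extension carries a group identifier, using the general header-extension layout of RFC 3550 (X bit set, a 16-bit profile-defined value, a 16-bit length counted in 32-bit words, then the data). The value `GROUP_EXT_PROFILE` and the choice of a 32-bit group ID are assumptions made for this example; the text does not fix either.

```python
import struct

GROUP_EXT_PROFILE = 0x1000  # hypothetical "defined by profile" value for this example

def build_rtp_packet(seq, timestamp, ssrc, payload_type, group_id, payload=b""):
    """Build an RTP packet whose header extension holds a 32-bit group ID."""
    version, padding, extension, csrc_count = 2, 0, 1, 0
    byte0 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
    byte1 = payload_type & 0x7F  # marker bit left at 0
    fixed = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    # Extension: profile-defined id, length in 32-bit words, then the data word.
    ext = struct.pack("!HHI", GROUP_EXT_PROFILE, 1, group_id)
    return fixed + ext + payload

def read_group_id(packet):
    """Return the group ID from the header extension, or None if it is absent."""
    byte0 = packet[0]
    if not (byte0 >> 4) & 1:  # X bit clear: no header extension
        return None
    offset = 12 + 4 * (byte0 & 0x0F)  # skip the fixed header and the CSRC list
    profile, length = struct.unpack_from("!HH", packet, offset)
    if profile != GROUP_EXT_PROFILE or length < 1:
        return None
    return struct.unpack_from("!I", packet, offset + 4)[0]

pkt = build_rtp_packet(seq=1, timestamp=160, ssrc=0x1234, payload_type=0, group_id=7)
print(read_group_id(pkt))
```

A receiver would treat all RTP streams that report the same group ID as members of one group.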
In SIP/SDP (Session Initiation Protocol / Session Description Protocol) communication, the group description can likewise be carried by marking it in RTP.
Mode 2: the group description information is carried in a control protocol associated with the media streams. Specifically, a group identifier can be written into an extension field of the control protocol.
Taking the H.323 standard as an example, each media stream is assigned a session ID (sessionID), so related sessions can be associated by specifying a group description for each session ID. For instance, the OpenLogicalChannel message can be extended with a group description field that holds the group description, as in the following definition:
OpenLogicalChannel ::=SEQUENCE
{
    forwardLogicalChannelNumber LogicalChannelNumber,
    forwardLogicalChannelParameters SEQUENCE
    {
        portNumber INTEGER (0..65535) OPTIONAL,
        dataType DataType,
        multiplexParameters CHOICE
        {
            h222LogicalChannelParameters H222LogicalChannelParameters,
            h223LogicalChannelParameters H223LogicalChannelParameters,
            v76LogicalChannelParameters V76LogicalChannelParameters,
            h2250LogicalChannelParameters H2250LogicalChannelParameters,
            none NULL -- for use with Separate Stack when
                      -- multiplexParameters are not required
                      -- or appropriate
        },
        forwardLogicalChannelDependency LogicalChannelNumber OPTIONAL,
            -- also used to refer to the primary logical channel when using video redundancy coding
        replacementFor LogicalChannelNumber OPTIONAL,
        groupID INTEGER (0..65535) OPTIONAL
    },
    -- Used to specify the reverse channel for bi-directional open request
    reverseLogicalChannelParameters SEQUENCE
    {
        dataType DataType,
        multiplexParameters CHOICE
        {
            -- H.222 parameters are never present in reverse direction
            h223LogicalChannelParameters H223LogicalChannelParameters,
            v76LogicalChannelParameters V76LogicalChannelParameters,
            h2250LogicalChannelParameters H2250LogicalChannelParameters
        } OPTIONAL, -- Not present for H.222
        reverseLogicalChannelDependency LogicalChannelNumber OPTIONAL,
            -- also used to refer to the primary logical channel when using video redundancy coding
        replacementFor LogicalChannelNumber OPTIONAL
    } OPTIONAL, -- Not present for uni-directional channel request
    separateStack NetworkAccessParameters OPTIONAL,
        -- for Open responder to establish the stack
    encryptionSync EncryptionSync OPTIONAL -- used only by Master
}
Here groupID is the added group description field, which holds the group description.
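On the receiving side, the groupID field could be used roughly as sketched below. The sketch does not decode ASN.1; it assumes the OpenLogicalChannel messages have already been decoded into plain Python dictionaries whose keys mirror the field names of the definition, which is an assumption of this example.

```python
from collections import defaultdict

def channels_by_group(open_logical_channel_msgs):
    """Map each groupID to the forward logical channel numbers that carry it."""
    groups = defaultdict(list)
    for msg in open_logical_channel_msgs:
        group_id = msg["forwardLogicalChannelParameters"].get("groupID")
        if group_id is not None:  # groupID is OPTIONAL, so it may be missing
            groups[group_id].append(msg["forwardLogicalChannelNumber"])
    return dict(groups)

# Decoded messages represented as dictionaries (hypothetical sample data).
msgs = [
    {"forwardLogicalChannelNumber": 1,
     "forwardLogicalChannelParameters": {"dataType": "video", "groupID": 1}},
    {"forwardLogicalChannelNumber": 2,
     "forwardLogicalChannelParameters": {"dataType": "audio", "groupID": 1}},
    {"forwardLogicalChannelNumber": 3,
     "forwardLogicalChannelParameters": {"dataType": "video", "groupID": 2}},
    {"forwardLogicalChannelNumber": 4,
     "forwardLogicalChannelParameters": {"dataType": "data"}},
]
print(channels_by_group(msgs))
```

Channels opened without a groupID are simply left ungrouped in this sketch.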
Mode 3: a control message carrying the group description of the corresponding media data is added to the media data stream. Specifically, each such control message defines a group identifier, and its message body lists at least one media stream belonging to that group.
For example, a control message is added to tell the peer which media data belong to the same group. Suppose the session logical channel numbers of the video, audio, and data service data of group L1 are 1, 2, and 3. A GroupIndication message containing logical channel numbers 1, 2, and 3 is then sent, and from this message the receiver determines that the media data on logical channels 1, 2, and 3 belong to the same group.
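A GroupIndication message of this kind might be serialised as in the sketch below. The text names the message but does not define its encoding, so the wire format used here (a one-byte count followed by 16-bit logical channel numbers in network byte order) is purely an assumption for illustration.

```python
import struct

def encode_group_indication(channel_numbers):
    """Pack the logical channel numbers of one group (hypothetical wire format)."""
    count = len(channel_numbers)
    return struct.pack(f"!B{count}H", count, *channel_numbers)

def decode_group_indication(data):
    """Recover the list of logical channel numbers from an encoded message."""
    count = data[0]
    return list(struct.unpack_from(f"!{count}H", data, 1))

message = encode_group_indication([1, 2, 3])
print(decode_group_indication(message))
```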
When SIP/SDP is used for multimedia communication, the group description can also be carried by using multiple sessions. For example:
    v=0
    o=mhandley 2890844526 2890842807 IN IP4 126.16.64.4
    s=P1
    m=audio 49170 RTP/AVP 0
    m=video 51372 RTP/AVP 31
    m=application 32416 udp wb
    v=0
    o=mhandley 2890844526 2890842807 IN IP4 126.16.64.4
    s=P2
    m=audio 49171 RTP/AVP 0
    m=video 51373 RTP/AVP 31
    m=application 32417 udp wb
Here session P1 (s=P1) corresponds to group L1 and session P2 (s=P2) corresponds to group L2; each contains an audio, a video, and an application data stream.
Alternatively, an attribute can be added to indicate the grouping, for example:
    v=0
    m=audio 49170 RTP/AVP 0
    m=video 51372 RTP/AVP 31
    m=application 32416 udp wb
    m=audio 49171 RTP/AVP 0
    m=video 51373 RTP/AVP 31
    m=application 32417 udp wb
    g=49170 51372 32416
    g=49171 51373 32417
The line g=49170 51372 32416 places the media data on ports 49170, 51372, and 32416 in one group, and the line g=49171 51373 32417 places the media data on ports 49171, 51373, and 32417 in another group.
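A receiver could recover the groups from such a description roughly as follows. Note that `g=` is the attribute proposed in this text rather than a standard SDP field, and the function name and return shape are assumptions of this sketch.

```python
def parse_port_groups(sdp_text):
    """Return, for each g= line, the (media, port) pairs it groups together."""
    media_by_port = {}
    groups = []
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("m="):
            media, port = line[2:].split()[:2]
            media_by_port[int(port)] = media
        elif line.startswith("g="):
            groups.append([int(p) for p in line[2:].split()])
    # Resolve ports once the whole description is read; unknown ports are ignored.
    return [[(media_by_port[p], p) for p in g if p in media_by_port] for g in groups]

SDP = """v=0
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
m=application 32416 udp wb
m=audio 49171 RTP/AVP 0
m=video 51373 RTP/AVP 31
m=application 32417 udp wb
g=49170 51372 32416
g=49171 51373 32417
"""
for group in parse_port_groups(SDP):
    print(group)
```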
In addition, the grouping can be expressed by extending the media description directly with a group label, for example:
    v=0
    m=audio 49170 RTP/AVP 0 group1
    m=video 51372 RTP/AVP 31 group1
    m=application 32416 udp wb group1
    m=audio 49171 RTP/AVP 0 group2
    m=video 51373 RTP/AVP 31 group2
    m=application 32417 udp wb group2
Here group1 and group2 in the media descriptions give the group number of each media stream, and media streams with the same group number belong to the same group. group1 and group2 may be numeric or character identifiers.
S504. Terminal 1 sends the multiple media data streams to terminal 2.
S505. Terminal 2 receives the multiple media data streams with their group descriptions and decodes and decompresses them.
S506. The multiple media data streams are output according to the group description.
As shown in FIG. 4 and FIG. 7, this means that, according to the grouping relationship, video data 1 and audio data 1 of group L1 are output to display 1 and speaker 1 at position 1, video data 2 and audio data 2 of group L2 are output to display 3 and speaker 3 at position 3, and data service media 1 and data service media 2 of groups L1 and L2 are output on demand to display 2 at position 2.
Embodiments of the present invention also allow the data service output interface to switch between the data services received from multiple senders, that is, to switch on demand. For example, when participant P1 at the sending end presents slides, the data service output interface is switched to the data service of L1; when P2 presents slides, it is switched to the data service of L2. The video, audio, and data service output interfaces here may take the same variety of forms and combinations as the input interfaces described earlier.
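The output rule of step S506, together with the on-demand switching of the shared data display, can be sketched as a small routing function. The device names follow the example in the text; the `OUTPUTS` table, the `route` function, and the `active_data_group` parameter are assumptions of this sketch.

```python
# Assumed static mapping from group to output devices, following the example.
OUTPUTS = {
    "L1": {"video": "display 1", "audio": "speaker 1"},
    "L2": {"video": "display 3", "audio": "speaker 3"},
}
DATA_OUTPUT = "display 2"  # shared by the data service streams of both groups

def route(stream_kind, group, active_data_group="L1"):
    """Return the output device for a stream, or None if it is not shown now."""
    if stream_kind == "data":
        # Only the presenting group's data service reaches the shared display.
        return DATA_OUTPUT if group == active_data_group else None
    return OUTPUTS[group][stream_kind]

print(route("video", "L1"))
print(route("audio", "L2"))
print(route("data", "L2", active_data_group="L2"))
```

Switching the presenter from P1 to P2 then amounts to changing `active_data_group` from "L1" to "L2".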
Embodiments of the present invention can also switch the video data, audio data, or data service data output by the output interfaces according to the group description of the received media data streams. For example, terminal 2 can use the group description to automatically switch, while participant P1 presents slides, the output of both video output 1 and video output 2 to the video of group L1, and the output of both audio output 1 and audio output 2 to the audio of group L1. The participants at site B can then all see P1's image on their own displays and hear P1's voice from their own speakers. Furthermore, the receiver can notify the sender to pause the capture or the transmission of the video and audio of group L2, which reduces bandwidth usage.
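The switching behaviour can be sketched as follows. The function returns the new output assignment and the groups that no output uses any longer, which are the candidates for a pause request to the sender. The function name, the port labels, and the way the pause candidates are derived are assumptions of this sketch; the text does not specify the signalling.

```python
def switch_to_presenter(outputs, presenter_group, all_groups):
    """Point every output at the presenter's group; list groups that may be paused."""
    new_outputs = {port: presenter_group for port in outputs}
    in_use = set(new_outputs.values())
    can_pause = [g for g in all_groups if g not in in_use]
    return new_outputs, can_pause

outputs = {
    "video output 1": "L1", "video output 2": "L2",
    "audio output 1": "L1", "audio output 2": "L2",
}
new_outputs, can_pause = switch_to_presenter(outputs, "L1", ["L1", "L2"])
print(new_outputs)
print(can_pause)
```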
The network connecting terminal 1 and terminal 2 may be a circuit-switched network (such as E1/SDH/ISDN) or a packet network (such as an IP network), and the communication protocol may be H.320, H.323, H.324, SIP, or the like.
In the above embodiment, video media 1, audio media 1, and data service media 1 form one group, which is a common case, but the actual grouping can be very flexible: for example, video media 2 may form a group by itself, and audio media 2 and video media 1 may form another group.
Through the above steps, this embodiment transmits and outputs multiple media data streams by group, so that the receiver can freely combine the media streams for output; it also handles the transmission and control of the data service associated with the video and audio. Because a single terminal suffices at each site, the embodiment also avoids the high cost incurred in the prior art by using multiple terminals.

Embodiment 2
Building on Embodiment 1, the transmission and output of multiple media data streams in this embodiment can also be handled through an intermediate device (for example a conference control device, which may be a multipoint control unit, MCU).
After receiving the media data streams and their group descriptions, the MCU obtains the group description of each media data stream and can redefine the grouping relationship of the streams according to output requirements. In this embodiment, the MCU can change the grouping relationship among the media data streams when forwarding it, or generate a new grouping relationship for the multiple media data streams.
As shown in FIG. 8, the video media stream and audio media stream sent by terminal 1 and the data service media stream sent by terminal 3 are combined at the MCU into a new group L, which is then sent to terminal 2. The possible combinations are not limited to this example.
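The regrouping at the MCU can be sketched as below. Streams are represented as dictionaries, and the incoming group labels "L1" and "L3" are hypothetical values chosen for the example; only the new group "L" and the terminals involved come from the text.

```python
def regroup(incoming, selection, new_group):
    """Return copies of the selected streams, relabelled with `new_group`."""
    return [
        {**s, "group": new_group}
        for s in incoming
        if (s["terminal"], s["kind"]) in selection
    ]

incoming = [
    {"terminal": 1, "kind": "video", "group": "L1"},
    {"terminal": 1, "kind": "audio", "group": "L1"},
    {"terminal": 3, "kind": "data", "group": "L3"},
    {"terminal": 3, "kind": "video", "group": "L3"},
]
# Video and audio from terminal 1 plus the data service from terminal 3.
selection = {(1, "video"), (1, "audio"), (3, "data")}
to_terminal_2 = regroup(incoming, selection, "L")
for stream in to_terminal_2:
    print(stream)
```

The incoming records are copied rather than modified, so the MCU can build a different grouping for each receiving terminal from the same input.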
In a video conference the conference control device is mainly responsible for media stream control, so this approach can flexibly combine the relevant media streams (audio, video, and data media streams) at the request of conference terminals and thus meet varied user needs.
This embodiment transmits and outputs multiple media streams by group, so that the receiver can freely combine them for output, and it also handles the transmission and control of the data service media stream associated with the video and audio.

Embodiment 3
Building on the transmission of multiple media data streams in Embodiment 1, this embodiment describes how terminal 2 outputs the media streams. It comprises the following steps:
S901. Multiple speakers are arranged in site B.
S902. Terminal 2 obtains the spatial position information of each display in site B.
S903. Video data 1 of the received group L1 is output on display 1 at position 1, and video data 2 of group L2 is output on display 3 at position 3.
S904. Audio data 1 of the received group L1 is output through at least one speaker in site B so that its perceived spatial position is the same as or close to that of display 1 at position 1.
S905. Audio data 2 of the received group L2 is output through at least one speaker in site B so that its perceived spatial position is the same as or close to that of display 3 at position 3.
Steps S904 and S905 need not be performed in this order; step S905 may precede step S904.
This embodiment lets participants in a multi-site conference hear each sound from the same position as, or a position close to, the corresponding video, which makes the video conference more realistic.
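Steps S904 and S905 do not say how the audio is placed. One common technique that could serve is constant-power panning between a left and a right speaker, sketched below. The use of this technique, the two-speaker layout, and the normalised display positions are all assumptions of the example and are not taken from the text.

```python
import math

def pan_gains(position):
    """Constant-power (left, right) gains for a source at `position` in [0, 1]."""
    position = min(max(position, 0.0), 1.0)  # clamp to the speaker span
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)

# Assumed normalised positions of the displays across the front of site B.
DISPLAY_POSITION = {"display 1": 0.0, "display 3": 1.0}

for display, position in DISPLAY_POSITION.items():
    left, right = pan_gains(position)
    print(display, round(left, 3), round(right, 3))
```

Audio data 1 would be scaled by the gains computed for display 1 and audio data 2 by those for display 3, so each voice appears to come from the side where the speaker's image is shown.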
An embodiment of the present invention further provides an apparatus for transmitting multiple media streams, which solves the problem that multiple media streams can only be transmitted between specific terminals.
As shown in FIG. 10, an apparatus for transmitting multiple media data streams in multimedia communication according to an embodiment of the present invention includes:
a grouping unit 101, configured to group the multiple media data streams to be transmitted;
a group description unit 102, configured to describe the grouping of the grouped media data streams; and
a data transmission unit 103, configured to transmit the grouped media data streams and the group descriptions corresponding to them.
Further, in a preferred embodiment of the present invention, the grouping unit 101 may group the media data streams in any of the following ways: by the positional relationship between the interfaces of the media data streams, that is, media data streams from the same physical interface or from physically adjacent interfaces form one group; or by the spatial positions of the audio and/or video, that is, the camera supplies the spatial position of each image and the microphone supplies the spatial position of each sound, and video and/or audio whose image and/or sound positions are identical or close form one group; or by using an array microphone to pick up multiple sound-source positions, forming the corresponding independent audio streams, and grouping those audio streams; or by capturing a panoramic image with a wide-angle camera, splitting it into several separate images, and grouping the resulting video streams.
The group description unit 102 may describe the grouping of the grouped media data streams in any of the following ways: carrying the group description in the media data stream or in the bearer channel corresponding to the media data stream; or carrying the group description in a control protocol associated with the media data stream; or adding to the media data stream a control message that carries the group description corresponding to the media data.
In this embodiment the media data streams may be one or more of the following: audio streams, video streams, and data streams.
In this apparatus, the grouping unit groups the multiple media data streams and the group description unit describes the grouping. Multiple media streams can therefore be transmitted through a single terminal, the receiving terminal can output each media data stream by group according to the group description, and the media data streams can be freely combined and transmitted between different terminals.
The present invention further provides an apparatus for receiving and processing multiple media data streams in multimedia communication, which solves the problem that multiple media streams cannot be freely combined for output.
As shown in FIG. 11, an apparatus for receiving and processing multiple media data streams in multimedia communication according to an embodiment of the present invention includes a receiving unit 111 and an output unit 112.
The receiving unit 111 is configured to receive the multiple media data streams with group descriptions and the group descriptions corresponding to them. The group description may be carried in the media data stream or in the bearer channel corresponding to the media data stream; or in a control protocol associated with the media data stream; or in a control message added to the media data stream.
The output unit 112 is configured to output the media data streams by group according to the group descriptions of the received media data streams.
Further, the apparatus may include a switching unit, configured to switch the media data streams output by the output unit according to the group descriptions of the media data streams received by the receiving unit. The apparatus may also include a pause unit, configured to pause the capture or transmission of the media data streams of all groups other than a designated group when every output port has been switched to the media data streams of the designated group.
On receiving the media data streams and their group descriptions, the apparatus can output the media data streams by group according to the group description, so that multiple media streams can be freely output at different terminals. In addition, the switching unit can switch the transmission of the media data of a particular group, and the pause unit can pause the capture or transmission of the media data of certain groups, which relieves load on the network.
The transmitting apparatus and the receiving and processing apparatus provided by the embodiments of the present invention can, with reference to method Embodiments 1, 2, and 3 above, achieve freely combined output of multiple media streams in multimedia communication.
An embodiment of the present invention further provides a conference terminal, including a sending device and a receiving and processing device.
The sending device is configured to obtain and group the multiple media data streams to be transmitted, describe the grouping of the grouped media data streams, and transmit the grouped media data streams together with the corresponding group descriptions; and/or
the receiving and processing device is configured to receive the multiple media data streams with group descriptions and the group descriptions corresponding to them, and to output the media data streams by group according to the group descriptions of the received media data streams.
A conference terminal preferably has both a sending device and a receiving and processing device.
On the basis of the above solution, the sending device further includes:
a grouping unit, configured to group the multiple media data streams to be transmitted, including grouping by the positional relationship between the interfaces of the media data streams; or grouping by the spatial positions of the audio and/or video; or using an array microphone to pick up multiple sound-source positions, forming the corresponding independent audio streams, and grouping those audio streams; or capturing a panoramic image with a wide-angle camera, splitting it into several separate images, and grouping the resulting video streams;
a group description unit, configured to describe the grouping of the grouped media data streams, including carrying the group description in the media data stream or in the bearer channel corresponding to the media data stream; or carrying the group description in a control protocol associated with the media data stream; or adding to the media data stream a control message that carries the group description corresponding to the media data; and
a data transmission unit, configured to transmit the grouped media data streams and the group descriptions corresponding to them.
The conference terminal of this embodiment can group the media data streams to be sent and carry the group descriptions, so that multiple media data streams can be sent by group. After receiving media data streams sent by group together with the corresponding group descriptions, it can output each media data stream according to the group description, so that multiple media data streams can be freely combined for output.
An embodiment of the present invention further provides a system for transmitting multiple media streams in multimedia communication, which solves the problem that multiple media streams cannot be freely combined for output.
This embodiment can be described in conjunction with the embodiment of FIG. 4. As shown in FIG. 12, a system for transmitting multiple media streams in multimedia communication according to an embodiment of the present invention includes a data acquisition device 121, a transmission device 122, a receiving and processing device 123, and a data output device 124, where:
the data acquisition device 121 includes a video acquisition device, an audio acquisition device, or a data acquisition device, and is configured to acquire multiple media data streams;
the transmission device 122 is configured to group the multiple media data streams acquired by the data acquisition device, describe the grouping of the grouped media data streams, and transmit the grouped media data streams together with the corresponding group descriptions;
the receiving and processing device 123 is configured to receive the multiple media data streams with group descriptions and the group descriptions corresponding to them, and to output the media data streams by group according to the group descriptions of the received media data streams; and
the data output device 124 is configured to output the grouped media data streams through a video output device, an audio output device, or a data output device.
On the basis of the above solution, the transmission device may further include:
a grouping unit, configured to group the multiple media data streams to be transmitted, including grouping by the positional relationship between the interfaces of the media data streams; or grouping by the spatial positions of the audio and/or video; or using an array microphone to pick up multiple sound-source positions, forming the corresponding independent audio streams, and grouping those audio streams; or capturing a panoramic image with a wide-angle camera, splitting it into several separate images, and grouping the resulting video streams;
a group description unit, configured to describe the grouping of the grouped media data streams, including carrying the group description in the media data stream or in the bearer channel corresponding to the media data stream; or carrying the group description in a control protocol associated with the media data stream; or adding to the media data stream a control message that carries the group description corresponding to the media data; and
a data transmission unit, configured to transmit the grouped media data streams and the group descriptions corresponding to them.
In a preferred embodiment of the present invention, the system may further include:
a conference control device, configured to change the grouping relationship by forwarding the grouping relationship of the media data streams, or to group the multiple media data streams of different conference sites. The conference control device may be an MCU.
In the system for transmitting multiple media streams in multimedia communication provided by this embodiment of the present invention, the transmission device groups the multiple media data streams, describes the grouping of the grouped streams, and sends the media data carrying the grouping description together with the corresponding grouping description. After receiving the media data streams and the grouping descriptions, the output device outputs the media data in groups according to the grouping descriptions, so that the multiple media streams can be freely combined for output. The preferred MCU in the system can additionally change the grouping relationship of the media data, or change it by forwarding the grouping relationship of the media data streams, which likewise enables free combination of the multiple media streams for output.
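As a rough illustration of the MCU's role, the sketch below forwards the grouping relationships reported by several conference sites and optionally remaps them into conference-wide groups. The function name `mcu_regroup`, the `remap` table, and the `site:stream` naming are assumptions made for this example only.

```python
def mcu_regroup(site_groups, remap=None):
    """Forward per-site groups, optionally remapping them to new groups.

    site_groups: {site: {group_name: [stream_id, ...]}}
    remap:       {(site, group_name): target_group}; groups without an
                 entry are forwarded unchanged under the key "site/group_name".
    """
    remap = remap or {}
    result = {}
    for site, groups in site_groups.items():
        for group_name, stream_ids in groups.items():
            target = remap.get((site, group_name), f"{site}/{group_name}")
            # Prefix stream ids with the site so ids from different sites never collide.
            result.setdefault(target, []).extend(
                f"{site}:{stream_id}" for stream_id in stream_ids
            )
    return result
```

With an empty `remap` the MCU simply forwards each site's grouping; mapping groups from two sites to the same target combines streams of different conference sites into one group.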
The system for transmitting multiple media streams in multimedia communication provided by the embodiments of the present invention may implement the transmission, reception, and output processing of multiple media streams in multimedia communication with reference to method Embodiments 1, 2, and 3 above.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing is merely a specific implementation of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for transmitting multiple media data streams in multimedia communication, characterized by comprising:
grouping the multiple media data streams to be transmitted;
describing the grouping of the grouped media data streams; and
transmitting the grouped media data streams and the grouping descriptions corresponding to the media data streams.
2. The method according to claim 1, characterized in that grouping the multiple media data streams specifically comprises:
grouping according to the positional relationship between the interfaces of the media data streams, that is, placing the media data streams of the same physical interface, or of physical interfaces located close to one another, in one group; or
grouping by the spatial position of the audio and/or video, that is, obtaining the spatial position of an image through a camera and the spatial position of a sound through a microphone, and placing video and/or audio whose image spatial positions and/or sound positions are identical or close in one group; or
using an array microphone to pick up the positions of multiple sound sources, forming the corresponding independent audio streams, and then grouping the audio streams; or
acquiring a panoramic image with a wide-angle camera, splitting the panoramic image into multiple separate images, and then grouping the resulting video streams.
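The last alternative of claim 2 (splitting a panoramic image and grouping the resulting video streams) can be sketched as follows. The sketch assumes a single frame held as a list of pixel rows and equal-width vertical tiles; `split_panorama`, `group_tiles`, and the `video-N` stream names are hypothetical and chosen only for illustration.

```python
def split_panorama(image, n_tiles):
    """Split one panoramic frame (a list of pixel rows) into equal vertical tiles."""
    width = len(image[0])
    if width % n_tiles != 0:
        raise ValueError("frame width must be divisible by the number of tiles")
    tile_width = width // n_tiles
    return [
        [row[i * tile_width:(i + 1) * tile_width] for row in image]
        for i in range(n_tiles)
    ]


def group_tiles(tiles, group_id="panorama-0"):
    """Put all tile streams in one group, recording their left-to-right order."""
    return {
        "group": group_id,
        "order": [f"video-{i}" for i in range(len(tiles))],
    }
```

Recording the left-to-right order in the group lets a receiver place the tiles on adjacent displays in the same arrangement as the original panorama.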
3. The method according to claim 1 or 2, characterized in that describing the grouping of the grouped media data streams comprises:
carrying the grouping description in the media data stream or in the bearer channel corresponding to the media data stream; or
carrying the grouping description in a control protocol related to the media data stream; or
adding to the media data stream a control message that carries the grouping description corresponding to the media data.
4. The method according to claim 1 or 2, characterized in that grouping the multiple media data streams to be transmitted comprises:
changing, by a conference control device, the grouping relationship by forwarding the grouping relationship of the multiple media data streams; or
grouping, by the conference control device, the multiple media data streams of different conference sites.
5. The method according to claim 1 or 2, characterized in that the media data streams are specifically one or more of the following: an audio stream, a video stream, and a data stream.
6. A method for receiving and processing multiple media data streams in multimedia communication, characterized by comprising:
receiving multiple media data streams carrying grouping descriptions, together with the grouping descriptions corresponding to the media data streams; and
outputting the media data streams in groups according to the grouping descriptions of the received media data streams.
7. The method according to claim 6, characterized in that outputting the media data streams in groups according to the grouping descriptions of the received media data streams comprises:
outputting the received video data to the corresponding video output interface according to the grouping description of the received media data streams; and/or
outputting the received audio data to the corresponding audio data interface; and/or
outputting the received data service data to the corresponding data service data output interface.
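The routing step of claim 7 can be sketched as a lookup from (media kind, group) to an output interface. The tuple layout of `received`, the `interfaces` table, and the interface names are assumptions for this example; the claims do not prescribe a data format.

```python
def route_streams(received, interfaces):
    """Map each received stream to an output interface using its grouping description.

    received:   list of (stream_id, kind, group) tuples, where `group` is
                taken from the grouping description of the stream.
    interfaces: {(kind, group): interface_name} configured at the receiver.
    """
    routing = {}
    for stream_id, kind, group in received:
        target = interfaces.get((kind, group))
        if target is None:
            continue  # no interface configured for this kind and group
        routing.setdefault(target, []).append(stream_id)
    return routing
```

For instance, a receiver with two screens might map `("video", 0)` to `"hdmi-left"` and `("video", 1)` to `"hdmi-right"`, so that each group's video appears on its own display.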
8. The method according to claim 7, characterized in that outputting the received data service data to the corresponding data service data output interface comprises:
switching, by the data service data output interface, the output among the data service data received from multiple senders.
9. The method according to claim 6, characterized in that outputting the media data streams in groups according to the grouping descriptions of the received media data streams comprises:
outputting the video data to the corresponding display device according to the grouping descriptions of the received media data streams, obtaining the spatial position information of the display device, and outputting the audio data related to the video data through at least one loudspeaker device, so that the spatial position of the output audio is the same as or close to the spatial position of the display device.
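One simple way to satisfy the "same or close spatial position" condition of claim 9 is to play each group's audio on the loudspeaker nearest to the display that shows the group's video. The sketch below assumes positions are one-dimensional horizontal coordinates and that each group names one video stream, one audio stream, and one display; all names and the data layout are hypothetical.

```python
def assign_audio(groups, display_positions, speaker_positions):
    """Choose, for each group's audio stream, the loudspeaker nearest its display.

    groups:            {group: {"video": id, "audio": id, "display": name}}
    display_positions: {display_name: x}  (assumed 1-D horizontal position)
    speaker_positions: {speaker_name: x}
    """
    plan = {}
    for group in groups.values():
        display_x = display_positions[group["display"]]
        nearest = min(
            speaker_positions,
            key=lambda name: abs(speaker_positions[name] - display_x),
        )
        plan[group["audio"]] = nearest
    return plan
```

A real receiver could instead pan the audio across several loudspeakers; nearest-speaker selection is only the simplest instance of the idea.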
10. The method according to claim 6, characterized by further comprising:
switching the video data, audio data, or data service data output by an output interface according to the grouping description of the received media data streams.
11. An apparatus for transmitting multiple media data streams in multimedia communication, characterized by comprising:
a grouping unit, configured to group the multiple media data streams to be transmitted;
a grouping description unit, configured to describe the grouping of the grouped media data streams; and
a data transmission unit, configured to transmit the grouped media data streams and the grouping descriptions corresponding to the media data streams.
12. The apparatus according to claim 11, characterized in that the grouping unit groups the multiple media data streams to be transmitted in any one of the following ways:
grouping according to the positional relationship between the interfaces of the media data streams, that is, placing the media data streams of the same physical interface, or of physical interfaces located close to one another, in one group; or
grouping by the spatial position of the audio and/or video, that is, obtaining the spatial position of an image through a camera and the spatial position of a sound through a microphone, and placing video and/or audio whose image spatial positions and/or sound positions are identical or close in one group; or
using an array microphone to pick up the positions of multiple sound sources, forming the corresponding independent audio streams, and then grouping the audio streams; or
acquiring a panoramic image with a wide-angle camera, splitting the panoramic image into multiple separate images, and then grouping the resulting video streams.
13. The apparatus according to claim 11, characterized in that the grouping description unit describes the grouping of the grouped media data streams in any one of the following ways:
carrying the grouping description in the media data stream or in the bearer channel corresponding to the media data stream; or
carrying the grouping description in a control protocol related to the media data stream; or
adding to the media data stream a control message that carries the grouping description corresponding to the media data.
14. The apparatus according to any one of claims 11 to 13, characterized in that
the media data streams are specifically one or more of the following: an audio stream, a video stream, and a data stream.
15. An apparatus for receiving and processing multiple media data streams in multimedia communication, characterized by comprising:
a receiving unit, configured to receive multiple media data streams carrying grouping descriptions, together with the grouping descriptions corresponding to the media data streams; and
an output unit, configured to output the media data streams in groups according to the grouping descriptions of the received media data streams.
16. The apparatus according to claim 15, characterized by further comprising:
a switching unit, configured to switch the media data streams output by the output unit according to the grouping descriptions of the media data streams received by the receiving unit.
17. A conference terminal, characterized by comprising:
a sending device, configured to obtain and group the multiple media data streams to be transmitted, describe the grouping of the grouped media data streams, and transmit the grouped media data streams and the grouping descriptions corresponding to the media data streams; and/or
a receiving and processing device, configured to receive multiple media data streams carrying grouping descriptions, together with the grouping descriptions corresponding to the media data streams, and to output the media data streams in groups according to the grouping descriptions of the received media data streams.
18. The conference terminal according to claim 17, characterized in that the sending device comprises:
a grouping unit, configured to group the multiple media data streams to be transmitted, which includes: grouping according to the positional relationship between the interfaces of the media data streams; or grouping by the spatial position of the audio and/or video; or using an array microphone to pick up the positions of multiple sound sources, forming the corresponding independent audio streams, and then grouping the audio streams; or acquiring a panoramic image with a wide-angle camera, splitting the panoramic image into multiple separate images, and then grouping the resulting video streams;
a grouping description unit, configured to describe the grouping of the grouped media data streams, which includes: carrying the grouping description in the media data stream or in the bearer channel corresponding to the media data stream; or carrying the grouping description in a control protocol related to the media data stream; or adding to the media data stream a control message that carries the grouping description corresponding to the media data; and
a data transmission unit, configured to transmit the grouped media data streams and the grouping descriptions corresponding to the media data streams.
19. A system for transmitting multiple media data streams in multimedia communication, characterized by comprising:
a data acquisition device, comprising a video acquisition device, an audio acquisition device, or a data acquisition device, and configured to acquire multiple media data streams;
a transmission device, configured to group the multiple media data streams acquired by the data acquisition device, describe the grouping of the grouped media data streams, and transmit the grouped media data streams and the grouping descriptions corresponding to the media data streams;
a receiving and processing device, configured to receive multiple media data streams carrying grouping descriptions, together with the grouping descriptions corresponding to the media data streams, and to output the media data streams in groups according to the grouping descriptions of the received media data streams; and
a data output device, configured to output the grouped media data streams through a video output device, an audio output device, or a data output device.
20. The system according to claim 19, characterized in that the transmission device comprises:
a grouping unit, configured to group the multiple media data streams to be transmitted, which includes: grouping according to the positional relationship between the interfaces of the media data streams; or grouping by the spatial position of the audio and/or video; or using an array microphone to pick up the positions of multiple sound sources, forming the corresponding independent audio streams, and then grouping the audio streams; or acquiring a panoramic image with a wide-angle camera, splitting the panoramic image into multiple separate images, and then grouping the resulting video streams;
a grouping description unit, configured to describe the grouping of the grouped media data streams, which includes: carrying the grouping description in the media data stream or in the bearer channel corresponding to the media data stream; or carrying the grouping description in a control protocol related to the media data stream; or adding to the media data stream a control message that carries the grouping description corresponding to the media data; and
a data transmission unit, configured to transmit the grouped media data streams and the grouping descriptions corresponding to the media data streams.
21. The system according to claim 19 or 20, characterized by further comprising:
a conference control device, configured to change the grouping relationship by forwarding the grouping relationship of the media data streams, or to group the multiple media data streams of different conference sites.
PCT/CN2010/070180 2009-02-20 2010-01-14 Method, apparatus and system for transmitting and receiving multiplex media data stream WO2010094213A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200910008299.4 2009-02-20
CN200910008299.4A CN101489090B (en) 2009-02-20 2009-02-20 Method, apparatus and system for multipath media stream transmission and reception

Publications (1)

Publication Number Publication Date
WO2010094213A1 true WO2010094213A1 (en) 2010-08-26

Family

ID=40891737

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/070180 WO2010094213A1 (en) 2009-02-20 2010-01-14 Method, apparatus and system for transmitting and receiving multiplex media data stream

Country Status (2)

Country Link
CN (1) CN101489090B (en)
WO (1) WO2010094213A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101489090B (en) * 2009-02-20 2014-01-08 华为终端有限公司 Method, apparatus and system for multipath media stream transmission and reception
CN106162038A (en) * 2015-03-25 2016-11-23 中兴通讯股份有限公司 A kind of audio frequency sending method and device
CN108667891B (en) * 2018-03-05 2020-11-06 集思谱(北京)科技有限公司 Independent unit combined multimedia information spreading method and system
CN110349584A (en) * 2019-07-31 2019-10-18 北京声智科技有限公司 A kind of audio data transmission method, device and speech recognition system
CN113542688B (en) * 2021-07-14 2023-03-28 杭州海康威视数字技术股份有限公司 Audio and video monitoring method, device, equipment, storage medium and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1128921A (en) * 1995-02-09 1996-08-14 三菱电机株式会社 Multi medium information treatment system
CN1929593A (en) * 2005-09-07 2007-03-14 宝利通公司 Spatially correlated audio in multipoint videoconferencing
CN101489090A (en) * 2009-02-20 2009-07-22 深圳华为通信技术有限公司 Method, apparatus and system for multipath media stream transmission and reception

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5548346A (en) * 1993-11-05 1996-08-20 Hitachi, Ltd. Apparatus for integrally controlling audio and video signals in real time and multi-site communication control method
CN1210956C (en) * 2002-06-19 2005-07-13 华为技术有限公司 Real time receiving and sxtorage method for video signal conference stream medium
JP2007072739A (en) * 2005-09-07 2007-03-22 Hitachi Communication Technologies Ltd Multipoint conference system, multipoint conference apparatus, and client terminal
CN101217658A (en) * 2008-01-09 2008-07-09 杭州华三通信技术有限公司 A media transmission method, system and device

Also Published As

Publication number Publication date
CN101489090B (en) 2014-01-08
CN101489090A (en) 2009-07-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10743394

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10743394

Country of ref document: EP

Kind code of ref document: A1