WO2012175228A1 - Methods and apparatus for advertising endpoint device capabilities for sending/receiving simultaneous media streams - Google Patents


Info

Publication number
WO2012175228A1
Authority
WO
WIPO (PCT)
Prior art keywords
endpoint device
media streams
receiver
sender
advertising
Prior art date
Application number
PCT/EP2012/053861
Other languages
French (fr)
Inventor
Bo Burman
Magnus Westerlund
Original Assignee
Telefonaktiebolaget L M Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget L M Ericsson (Publ) filed Critical Telefonaktiebolaget L M Ericsson (Publ)
Publication of WO2012175228A1 publication Critical patent/WO2012175228A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/40 Support for services or applications
    • H04L 65/401 Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time sensitive sessions, e.g. white board sharing or spawning of a subconference
    • H04L 65/4015 Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time sensitive sessions, e.g. white board sharing or spawning of a subconference where at least one of the additional parallel sessions is real time or time sensitive, e.g. white board sharing, collaboration or spawning of a subconference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/60 Network streaming of media packets
    • H04L 65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/60 Network streaming of media packets
    • H04L 65/75 Media network packet handling
    • H04L 65/756 Media network packet handling adapting media to device capabilities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L 65/80 Responding to QoS

Definitions

  • the present invention relates to communications networks. More particularly, and not by way of limitation, the present invention is directed to systems and methods of establishing and controlling Real Time Transport Protocol (RTP) media streams through a communications network between endpoint devices.
  • RTP protocol supports multiple endpoint device participants each sending their own media streams.
  • client system implementations providing video conference functionality typically require the use of a central mixer that only delivers a single media stream per media type.
  • any application that wants to allow for more advanced usage where multiple media streams are sent and received by an endpoint device would have an incompatibility problem with legacy systems.
  • Some embodiments of the present invention are directed to a method of operating a receiver endpoint device that communicates with a sender endpoint device.
  • the method includes advertising capability information to the sender endpoint device that defines a capability of the receiver endpoint device to simultaneously receive a plurality of media streams.
  • the method further includes receiving the plurality of media streams at a same time from the sender endpoint device during the communication session based on the advertised capability information.
  • session negotiation information is exchanged with the sender endpoint device to setup a session prior to receiving the plurality of media streams, and the capability information is communicated to the sender endpoint device as part of the session negotiation information.
  • the advertised capability information may indicate a maximum number of media streams that the receiver endpoint device is presently capable of simultaneously receiving from the sender endpoint device, which maximum may depend upon the sender endpoint device using a defined coding to encode data carried by the media streams. The advertised capability information may additionally or alternatively indicate a maximum combined bandwidth for all of the media streams, and/or a maximum per-stream bandwidth, that the receiver endpoint device is presently capable of simultaneously receiving from the sender endpoint device; these limits may likewise depend upon the sender endpoint device using a defined coding to encode data carried by the media streams.
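As a non-normative illustration of how a sender might honor such advertised limits, the following sketch clamps a requested set of stream bandwidths to an advertised maximum stream count, per-stream bandwidth, and combined bandwidth. All field names and values here are illustrative and are not taken from the claims or any standard signaling.

```python
def plan_streams(requested, capability):
    """Return per-stream bandwidths (bps) honoring advertised limits.

    requested:  list of desired per-stream bandwidths, highest priority first.
    capability: dict with illustrative keys 'max_streams',
                'max_total_bw', and 'max_stream_bw'.
    """
    plan = []
    total = 0
    for bw in requested:
        if len(plan) >= capability["max_streams"]:
            break  # advertised simultaneous-stream count reached
        bw = min(bw, capability["max_stream_bw"])  # per-stream cap
        if total + bw > capability["max_total_bw"]:
            # Fit a reduced-rate version into the remaining combined budget.
            bw = capability["max_total_bw"] - total
            if bw <= 0:
                break
        plan.append(bw)
        total += bw
    return plan
```

For example, a receiver advertising at most 2 streams, 1.5 Mbps combined, and 1 Mbps per stream would cause a sender requesting 2 Mbps, 800 kbps, and 300 kbps versions to send only two streams at 1 Mbps and 500 kbps.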
  • the advertised capability information may indicate which of a plurality of defined coding parameters that the sender endpoint device should use to encode data carried by particular ones of the media streams to be communicated to the receiver endpoint device, and/or may indicate a token rate and a bucket size for a token bucket algorithm that will be performed by the receiver endpoint device to constrain a data rate of media streams that will be simultaneously received from the sender endpoint device.
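The token bucket constraint mentioned above can be sketched as follows. This is an illustrative implementation of the generic token bucket algorithm only; the token rate and bucket size would come from the advertised capability information, and the values used below are made up.

```python
class TokenBucket:
    """Constrain an aggregate data rate using a token bucket."""

    def __init__(self, rate_bps, bucket_bytes):
        self.rate = rate_bps / 8.0     # refill rate in bytes per second
        self.capacity = bucket_bytes   # maximum burst size in bytes
        self.tokens = bucket_bytes     # bucket starts full
        self.last = 0.0                # time of last refill

    def allow(self, packet_bytes, now):
        # Refill tokens for the elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False
```

A receiver configured with an advertised token rate of 8 kbps and a 1000-byte bucket would admit bursts up to 1000 bytes and thereafter admit data at roughly 1000 bytes per second.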
  • Some other embodiments of the present invention are directed to a method of operating a sender endpoint device that communicates with a receiver endpoint device.
  • the method includes advertising capability information to the receiver endpoint device that defines a capability of the sender endpoint device to simultaneously communicate a plurality of media streams.
  • the method further includes communicating the plurality of media streams at a same time toward the receiver endpoint device based on the advertised capability information.
  • Figure 1 is a block diagram of a communication system that is configured according to some embodiments
  • Figures 2 and 3 are block diagrams of a UE and a base station, respectively, configured according to some embodiments;
  • Figures 4-9 are flow charts that illustrate operations and methods that can be performed by sender endpoint devices to identify simultaneously communicated media streams as having related media data in accordance with some embodiments;
  • Figures 11-20 are flow charts that illustrate operations and methods that can be performed by receiver endpoint devices to simultaneously receive media streams and identify which media streams have related media data in accordance with some embodiments;
  • Figures 21-32 are flow charts that illustrate operations and methods that can be performed by receiver endpoint devices to advertise their capability for simultaneously receiving media streams in accordance with some embodiments.
  • Figures 33-41 are flow charts that illustrate operations and methods that can be performed by sender endpoint devices to advertise their capability for simultaneously communicating media streams in accordance with some embodiments.
  • Various embodiments are directed to sender endpoint devices and receiver endpoint devices, and associated methods, that simultaneously send and receive, respectively, a plurality of media streams through at least one RTP session.
  • the media streams are communicated through Real Time Transport Protocol (RTP), where there are multiple media streams that are sent over an RTP session.
  • a sender endpoint device may establish a RTP session with a receiver endpoint device and may then simultaneously communicate a plurality of media streams through the RTP session.
  • the sender endpoint device may establish a plurality of RTP sessions with the receiver endpoint device and communicate a plurality of media streams through one or more of the plurality of RTP sessions.
  • a sender endpoint device communicates information toward the receiver endpoint device that identifies which of the media streams contain related media data.
  • the receiver endpoint device uses that information to, for example, select among the simultaneously received media streams to output/use a selected media stream and/or may combine two or more of the selected media streams to output/use a combined media stream.
  • a sender endpoint device can communicate the information identifying the related media stream through additional uses of existing signaling provided by RTP and/or Real-time Transport Control Protocol (RTCP), and/or by generating signaling extensions to RTP and/or RTCP.
  • the sender endpoint device can communicate the information directly to a receiver endpoint device and/or can communicate information to a central node for forwarding to the receiver endpoint device.
  • the related media data information can be used to provide improved handling of simulcasted media streams, such as when multiple encodings or representations of the same media source are sent from a same sender endpoint device to a receiver endpoint device.
  • RTCP is a sister protocol of Real-time Transport Protocol (RTP) which is widely used for real time data transport.
  • media stream refers to a stream of data (e.g., a video data stream and/or an audio data stream) that is sent from one endpoint device and originates from a media source (such as a microphone for an audio data stream and/or a video camera for a video data stream).
  • endpoint device also referred to as an endpoint refers to a communication device that handles media by originating one or more media streams (e.g., originating audio and/or video streams using a microphone and/or video camera) and/or terminating one or more media streams (e.g., generating audio and/or video data stream output) received from one or more other endpoint devices.
  • each endpoint device of a RTP session may be both a sender endpoint device generating one or more media data streams for communication to other endpoint devices (acting as receiver endpoint devices), and a receiver endpoint device receiving media data streams as input.
  • an RTP Mixer may be considered as an endpoint.
  • FIG. 1 is a block diagram of a communication system that is configured according to some embodiments.
  • the communication system includes a plurality of endpoint devices 111-1 to 111-n that are communicatively connected through one or more networks 101 (e.g., public networks, such as the Internet, and/or private networks).
  • One or more of the endpoint devices may operate as a sender that simultaneously communicates a plurality of media streams (such as audio data and/or video data) toward a receiver endpoint device participating in a streaming communication session (such as a video conferencing session) through network 101 (e.g., the Internet) according to some embodiments.
  • although five endpoint devices 111 are shown in Figure 1 by way of example, embodiments of the present invention may be implemented using any number of two or more endpoint devices.
  • One or more RTP sessions are established between two or more of endpoint devices 111.
  • Each endpoint device 111 included in the RTP session(s) may act as a sender endpoint device to generate a plurality of media streams that can be communicated directly to a receiver endpoint device or may be communicated to a central node 112 (e.g., a RTP mixer node) for possible forwarding to the receiver endpoint device.
  • Each endpoint device 111 may also act as a receiver endpoint device to receive a plurality of media streams.
  • the central node 112 may select among a plurality of received media streams for forwarding to a receiver endpoint device and/or it may combine or otherwise manipulate (e.g., perform transcoding between defined data encoding formats) one or more received media streams before forwarding to the receiver endpoint device. More particularly, the central node 112 may select a media stream to be sent to a receiver endpoint device 111 responsive to input from the receiver endpoint device 111. For example, each endpoint device 111 of a conference session may select a media stream or a plurality of media streams of the conference session to be presented at that endpoint device 111. Two endpoint devices in a peer to peer embodiment, for example, may each send and receive a plurality of media streams, and each of the endpoint devices may use functionality of
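A hypothetical sketch of such receiver-driven selection at the central node follows: each receiver indicates which media source it wants and its bandwidth budget, and the mixer forwards the best-fitting version it has received for that source. The data layout is invented for illustration and is not part of the claims.

```python
def select_stream(received, wanted_source, max_bw):
    """Select a received stream for forwarding to a receiver.

    received:      list of dicts with keys 'source', 'ssrc', 'bitrate'
                   (one entry per received media stream version).
    wanted_source: media source the receiver asked for.
    max_bw:        receiver's bandwidth budget in bps.

    Returns the highest-bitrate version of wanted_source that fits
    within max_bw, or None if no version fits.
    """
    candidates = [s for s in received
                  if s["source"] == wanted_source and s["bitrate"] <= max_bw]
    return max(candidates, key=lambda s: s["bitrate"], default=None)
```

In practice the mixer would also be able to transcode when no received version fits, as described above; this sketch covers only the pure selection case.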
  • FIG 2 is a block diagram illustrating an endpoint device 111 of Figure 1 according to some embodiments.
  • Endpoint device 111 may include a processor 131 coupled to a display 121 (e.g., a liquid crystal display screen providing a video output) or display output, a user input interface 129 (e.g., including a keypad, a touch sensitive surface of display 121, etc.), a speaker 123 or speaker output, one or more video cameras 125 or video camera input(s), and one or more microphones 127 or microphone input(s).
  • Inputs/outputs discussed above may be interfaces (e.g., couplings, jacks, etc.) for wired inputs/outputs and/or wireless interfaces (e.g., Bluetooth, WiFi, etc.).
  • a network interface 133 may provide a data/communications coupling between processor 131 and network 101.
  • the coupling between network interface 133 and network 101 may be provided over a wired coupling (e.g., using a digital subscriber line modem, a cable modem, etc.), over a wireless coupling (e.g., over a 3G/4G wireless network, over a WiFi link, etc.), or over a combination thereof.
  • Endpoint device 111 may be a smartphone, a tablet computer, a netbook computer, a laptop computer, a desktop computer, a video camera, a digital microphone, or a hub that combines audio/video streams from a plurality of video cameras and/or digital microphones.
  • the processor 131 may include one or more data processing circuits, such as a general purpose and/or special purpose processor (e.g., microprocessor and/or digital signal processor), which may be collocated or distributed across a communication network.
  • the processor 131 is configured to execute computer program instructions from memory device(s) (e.g., internal or external memory), described below as a computer readable medium, to perform at least some of the operations and methods described herein as being performed by an endpoint device in accordance with one or more embodiments of the present invention.
  • the endpoint device 111 is used for video conferencing and is configured to receive audio/video streams from a plurality of external cameras and associated microphones that may be positioned around a room (e.g., for video conferencing) or positioned in a plurality of rooms (e.g., for security monitoring) and coupled to the processor 131 through video/microphone inputs 125/127.
  • the processor 131 may encode data carried by the media streams, and may be configured to simultaneously output a plurality of different encoded types of a same input media content.
  • the processor 131 may receive a video or audio stream from a camera or microphone, and apply different spatial sampling and/or temporal sampling to the media stream to simultaneously output a plurality of different versions of the media stream.
  • the processor 131 may output a full resolution video/audio stream and one or more reduced resolution video/audio streams for communication toward a receiver endpoint device 111.
  • the processor 131 may, in another embodiment, output different bitrate streams using variable lossy coding, such as by controllably discarding different rates of bits from an input media stream to output different lossy quality versions (e.g., different levels of video coarseness) of the input media stream.
  • a sender endpoint device 111 may thus simultaneously output a plurality of media streams that contain related media data (such as different spatial samplings, temporal samplings, and/or data encodings of a same media content stream).
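The different spatial and temporal samplings described above can be sketched as a simple computation of the simulcast versions a sender might produce from one source: a full-resolution stream plus reduced versions. The sampling factors below are example values only.

```python
def simulcast_versions(width, height, fps, factors=((1, 1), (2, 1), (2, 2))):
    """Compute illustrative simulcast versions of one video source.

    Each factor is (spatial_divisor, temporal_divisor): the spatial
    divisor reduces both width and height (image resolution), and the
    temporal divisor reduces the frame rate.
    """
    return [{"width": width // s, "height": height // s, "fps": fps // t}
            for s, t in factors]
```

A 1280x720 source at 30 fps with the default factors would yield a full-resolution version, a half-resolution version at full frame rate, and a half-resolution version at half frame rate.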
  • FIG 4 is a flowchart of operations and methods that can be performed by a sender endpoint device 111-1 to simultaneously communicate a plurality of media streams toward a receiver endpoint device 111-2 through a session, in accordance with some embodiments which will be explained in further detail below.
  • the sender endpoint device 111-1 can be configured to communicate (block 400) a plurality of media streams, simultaneously in time, for a session (e.g., a RTP session) toward a receiver endpoint device 111-2.
  • the sender endpoint device 111-1 also communicates (block 402) information toward the receiver endpoint device 111-2 that identifies which of the media streams contain related media data (content).
  • the receiver endpoint device 111-2 can use the information to identify which of the media streams contain related media data, and may further identify, using the information, differences between the media data (e.g., which media streams have been coded using which algorithms, which media streams have been sampled using which rates, which media streams have which video pixel/line resolutions, etc.).
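A hypothetical receiver-side use of this related-media information is sketched below: among the versions identified as carrying the same content, the receiver picks the one whose resolution best matches its display. The stream representation is invented for illustration.

```python
def pick_version(related_streams, display_height):
    """Pick one of several related versions of the same media content.

    related_streams: versions of one source, as dicts with 'ssrc' and
                     'height' (vertical video resolution in lines).
    display_height:  lines available on the local display.

    Chooses the smallest version that still covers the display (to avoid
    wasted decoding), or the largest available if none covers it.
    """
    covering = [s for s in related_streams if s["height"] >= display_height]
    if covering:
        return min(covering, key=lambda s: s["height"])
    return max(related_streams, key=lambda s: s["height"])
```

Without the related-media information, the receiver could not safely apply such a policy, since it would not know which streams are alternative representations of the same content.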
  • FIG. 3 is a block diagram illustrating the central node 112 of Figure 1 according to some embodiments.
  • central node 112 may include a processor 231 and a network interface 233, with a network interface 233 providing a data/communications coupling between the processor 231 and the network 101.
  • the processor 231 may include one or more data processing circuits, such as a general purpose and/or special purpose processor (e.g., microprocessor and/or digital signal processor), which may be collocated or distributed across a communication network.
  • the processor 231 is configured to execute computer program instructions from memory device(s) (e.g., internal or external memory), described below as a computer readable medium, to perform at least some of the operations and methods described herein as being performed by a central node (e.g., RTP mixer node) in accordance with one or more embodiments of the present invention.
  • Processor 231 may receive one or more media streams and associated information from each sender endpoint device 111-1, and may select among received media streams for forwarding to a receiver endpoint device 111-2 and/or it may combine or otherwise manipulate (e.g., perform transcoding between defined data encoding formats) one or more received media streams before forwarding to the receiver endpoint device 111-2.
  • the central node 112 may communicate to the receiver endpoint device 111-2 information identifying characteristics of each of the media streams to enable the receiver endpoint device to select one or more of the available media streams for communication from the central node 112 to the receiver endpoint device 111-2.
  • the processor 231 may select among the received media streams for forwarding to the receiver endpoint device 111-2 and/or combine or otherwise manipulate one or more received media streams responsive to an instruction received from the receiver endpoint device 111-2.
  • Synchronization Source identifiers (SSRCs) are a fundamental part of RTP sessions.
  • the SSRC uniquely identifies real time media streams within a communication session.
  • the SSRC space can encompass a number of network nodes and interconnecting transport flows between these nodes.
  • Each node may have zero, one, or more source identifiers (SSRCs), used either to source a real media source such as a camera or a microphone, to source a conceptual source such as the most active speaker selected by an RTP mixer that switches between incoming media streams based on the media stream or additional information, or simply as an identifier for a receiver that provides feedback and reports on reception.
  • There are also RTP nodes, like translators, that manipulate data, transport, or session state without making their presence known to the other session participants.
  • RTP was designed with multiple participants in a session from the beginning. This was not restricted to multicast, but instead also includes unicast using either multiple transport flows below RTP or a network node that redistributes the RTP packets, either unchanged in the form of a transport translator (relay) or modified in an RTP mixer.
  • a single endpoint device may have multiple media sources of the same media type, like cameras or microphones.
  • an endpoint device can be configured to advertise its capabilities to other session participant(s), so that endpoint device limitations are exposed and can be compensated for by the other session participant(s) (e.g., sender endpoint device).
  • an endpoint device is configured to signal whether it intends to produce one or more media streams.
  • conventional Session Description Protocol (SDP) signaling is limited to communicating a directionality attribute which indicates whether an endpoint device intends to send media or not. Conventional SDP signaling communicates no indication of how many media streams an endpoint device intends to send, which is now addressed by some embodiments of the present invention.
  • The invention is not limited thereto and may be used for any communication devices that simultaneously send and/or simultaneously receive a plurality of media streams.
  • Simulcast is the act of simultaneously sending multiple different versions of source media content. This can be done in several ways and for different purposes. Various example embodiments are described herein in the context of the case where an endpoint device 111 provides multiple different encodings towards a central node device 112 (intermediary device) so that the central node device 112 can select which version to forward to other endpoint device participants 111 in a RTP session. Various different ways of performing simulcast are further described below in Section 3, entitled "Simulcast Usage and Applicability." The different versions of source media content can be simulcasted by varying one or more of the following characteristics of the source media content:
  • Bit-rate: The primary difference is the amount of bits used to encode source media content into a media stream, and thus primarily affects the media Signal to Noise Ratio (SNR);
  • Codec: Different media codecs are used, for example, to ensure that different receivers that do not have a common set of decoders can decode at least one of the versions of the encoded source media content. This includes codec configuration options that aren't compatible, like video encoder profiles, or the capability of receiving the transport packetization; and
  • Sampling: Different sampling of the source media content, in the spatial domain and/or in the temporal domain, may be used to suit different rendering capabilities or needs at receiving endpoint devices, as well as a method to achieve different bit-rates.
  • For video, spatial sampling affects image resolution and temporal sampling affects video frame rate; for audio, spatial sampling relates to the number of audio channels and temporal sampling affects audio bandwidth.
  • different lossy quality versions of the input media stream may be generated by discarding a controllable rate of bits of the input media stream to provide, for example, different levels of video coarseness.
  • Different applications in one or more endpoint devices can have different reasons for simulcasting a plurality of different versions of media content from a media source.
  • the need for simulcasting different versions of a media stream can arise even when media codecs used by a sending/receiving endpoint device have scalability features that enable them to solve a set of coding variations.
  • encoding refers to a media encoder (codec) that has been used to compress a media stream and/or the fidelity of encoding that has been used to encode a media stream through the choice of sampling, bit-rate, and/or other configuration parameters.
  • the phrase "different encodings" refers to the use of one or more different parameters that characterize the encoding of a particular media source. Such changes can include, but are not limited to, one or more of the following parameters: codec; codec configuration; bit-rate; and/or sampling.
  • Some embodiments are directed to a multi-party session that is communicated through one or more RTP Mixers 112 to facilitate the media transport between the session participants (endpoint devices 111-1...111-n).
  • the RTP topology can include that defined in
  • an RTP mixer 112 can be controlled to select the most active speaker (e.g., endpoint device 111-1) and send that participant's media stream as a high resolution stream to a receiver (e.g., endpoint device 111-2), and in addition can simultaneously send a number of small resolution video streams of any additional (e.g., non-selected) participants, such as endpoint devices 111-3...111-n.
  • Simulcast: For example, the client endpoint device 111 sends one stream for the low resolution and another for the high resolution.
  • Scalable video coding: the client endpoint device 111 uses a video encoder that can provide one media stream that both provides the high resolution and enables the RTP mixer 112 to extract a lower bit-rate than the full stream version, for the low resolution.
  • Transcoding: the client endpoint device 111 sends a high resolution stream to the RTP Mixer 112, which performs a transcoding to a lower resolution version of the video stream that is forwarded to other endpoint devices that need it.
  • the transcoding alternative may require that the RTP Mixer 112 has sufficient amounts of transcoding resources to produce the number of low resolution versions that are required.
  • the worst case loading for resources may correspond to when all participants' streams need transcoding. If the resources are not available, a different solution needs to be chosen.
  • the RTP mixer 112 can advertise its resource capabilities to the sender endpoint device 111 and/or to the receiver endpoint device 111. The sender endpoint device 111 and/or the receiver endpoint device 111 can then consider the resource capabilities of the RTP mixer 112 when configuring a session and/or one or more media streams that will be communicated through the RTP mixer 112.
  • the scalable video encoding alternative may necessitate a more complex encoder compared to non-scalable encoding.
  • the scalable codec may be only marginally more bandwidth efficient, between the encoding client endpoint device 111 and the RTP mixer 112, than a simulcast that sends the resolutions in separate streams, assuming equivalent video quality.
  • the transmission of all but the lowest resolution will consume more bandwidth from the RTP mixer 112 to the other participant endpoint devices 111 than a non-scalable encoding, again assuming equivalent video quality.
  • Simulcasting has the benefit that it is conceptually simple. It enables use of any media codec that the participants 111/112 agree on, allowing the RTP mixer 112 to be codec-agnostic. Considering today's video encoders, it is less bit-rate efficient in the path from the sending client endpoint device 111 to the RTP mixer 112 but more efficient in the RTP mixer 112 to receiver path compared to Scalable Video Coding.
  • Another RTP topology is the transport translator (Trn-Translator) [RFC5117], which may be included within the central node 112.
  • the transport translator functions as a relay and transmits all the streams received from one participant endpoint device 111-1 to selected other participant endpoint devices 111-2... 111-n.
  • All receiver endpoint devices may be configured to receive all media stream versions of a same source media content through simulcast thereof. However, this approach increases the bit-rate consumed on the paths to the receiver endpoint devices.
  • a benefit for the receiver client endpoint devices is reduced decoding complexity when there is a need to only display a low resolution version. Otherwise a single stream application which only transmits the high resolution stream would allow the receiver endpoint device 111 to decode it and then scale it down to the needed resolution.
  • Another use of simulcast is where one encoding is sent to multiple receiver endpoint devices 111. This may be supported in RTP by copying all outgoing RTP and RTCP traffic to several transport destinations, as long as the intention is to create a common RTP session. As long as all participants do the same, a full mesh is constructed and everyone in the multi-party session has a similar view of the joint RTP session. This is similar to an Any Source Multicast (ASM) session but without the traffic optimization, as multiple copies of the same content are likely to pass over the same link.
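The copy-to-several-destinations approach described above can be sketched minimally: the same RTP packet is sent unchanged to every peer in the mesh. RTP packetization and session setup are out of scope here; the snippet only illustrates the fan-out over unicast transport.

```python
import socket

def send_to_mesh(sock, rtp_packet, destinations):
    """Send one RTP packet (as bytes) unchanged to every peer.

    sock:         a UDP socket.
    destinations: iterable of (host, port) transport addresses.
    """
    for dest in destinations:
        sock.sendto(rtp_packet, dest)

# Usage sketch (addresses are illustrative):
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# send_to_mesh(sock, packet, [("10.0.0.2", 5004), ("10.0.0.3", 5004)])
```

As the surrounding text notes, this fan-out carries multiple copies of the same content over shared links, which is exactly the traffic optimization that multicast would otherwise provide.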
  • Another alternative implementation of simulcast is where multiple destination endpoint devices 111 each receive a specifically tailored version, but where the destination endpoint devices 111 are independent.
  • a typical example for this would be a streaming server (e.g., source endpoint device 111) distributing the same live session to a number of receiver endpoint devices 111, while adapting the quality and resolution of the multi-media session to each receiver endpoint devices' 111 capability and available bit-rate.
  • multiple independent RTP sessions are established between the sender endpoint device 111 and the receiver endpoint devices 111.
  • existing RTP features can be further tasked to support an RTP session that contains more than two Synchronization Source identifiers (SSRCs).
  • The RTP extension mechanism has already required RTP stacks to handle additional SSRCs, such as SSRC-multiplexed RTP retransmission [RFC4588]. However, that still only requires handling a single media decoding chain.
  • the receiving client needs to be configured to handle receiving more than one stream simultaneously rather than replacing the already existing stream with the new one.
  • the receiving client needs to be configured to decode multiple streams simultaneously
  • the receiving client needs to be configured to render multiple streams simultaneously
  • A participant in an RTP session needs to have sufficient resources to receive and process all the incoming streams. It is extremely likely that no receiver endpoint device 111/central node 112 is capable of handling the theoretical upper limit of an RTP session, which is more than 4 billion media sources. Instead, one or more limitations will exist on the endpoint device 111/central node 112 resource capabilities to handle simultaneous media streams. These resource limitations can include, for example, memory, processing, network bandwidth, memory bandwidth, or rendering real estate.
  • Another resource limitation of a receiver endpoint device 111/central node 112 is how many simultaneously active sources it can receive/send. Non-actively sending SSRCs may not result in significant resource consumption and, therefore, may not need to be limited.
  • A potential issue arises when a limited set of simultaneously active sources varies within a larger set of session members. As each media decoding chain may contain state, it is important that this type of usage ensures that a receiver endpoint device 111/central node 112 can flush the decoding state for an inactive source and that, if that source becomes active again, it does not assume that this previous state exists.
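The flush-on-inactivity behavior can be sketched as follows. This is an illustrative sketch only: the class name, the use of a wall-clock inactivity threshold as a stand-in for RTCP reporting intervals, and the placeholder decoder object are all assumptions, not part of the embodiments.

```python
import time

class DecoderPool:
    """Tracks per-SSRC decoder state and flushes it when a source goes inactive.

    A re-activated source always starts from a fresh decoder, so it cannot
    assume that its previously flushed state still exists.
    """

    def __init__(self, inactivity_timeout=10.0):
        self.inactivity_timeout = inactivity_timeout
        self.decoders = {}   # ssrc -> opaque decoder state
        self.last_seen = {}  # ssrc -> time of last RTP packet

    def on_rtp_packet(self, ssrc, now=None):
        now = time.monotonic() if now is None else now
        if ssrc not in self.decoders:
            # (Re)activated source: allocate fresh decoder state.
            self.decoders[ssrc] = object()  # placeholder for a real decoder
        self.last_seen[ssrc] = now
        return self.decoders[ssrc]

    def flush_inactive(self, now=None):
        now = time.monotonic() if now is None else now
        for ssrc in [s for s, t in self.last_seen.items()
                     if now - t > self.inactivity_timeout]:
            # Reclaim the decoding-chain resources of the inactive source.
            del self.decoders[ssrc]
            del self.last_seen[ssrc]
```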
  • signaling is provided that allows a receiver endpoint device 111/central node 112 to indicate its upper limit in terms of capability to handle simultaneous media streams.
  • Applications may need to be configured to consider how they use codecs.
  • Because an endpoint device 111 may either be a legacy device or have an explicit upper limit on the number of simultaneous streams, one will encounter situations where the endpoint device 111 will not receive all simultaneously active streams in the session. Instead, the endpoint device 111 or central nodes 112, like RTP mixers, will be configured to provide the endpoint device 111 with a selected set of streams based on various metrics, such as most active, most interesting, or user selected. In addition, the central node 112 may combine multiple media streams, using mixing or composition, into a new media stream to enable an endpoint device 111 to get sufficient source coverage in the session, despite existing limitations.
  • This section details a few RTP and Real-time Transport Control Protocol (RTCP) issues and related embodiments for supporting multiple streams.
  • RTCP bandwidth may be derived from the session bandwidth. It is important that all endpoint devices have a common view of what the RTCP bandwidth is. Otherwise, if the bandwidth values differ by more than a factor of 5, an endpoint device with a high bandwidth value may time out an endpoint device that has a low value, as the latter's minimal reporting interval can become more than 5 times longer than for the other nodes.
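The timeout risk above can be illustrated with a simplified calculation. The formula below omits RFC 3550's randomization and sender/receiver bandwidth split, and the numeric inputs (member count, average RTCP packet size) are illustrative assumptions.

```python
def rtcp_interval(session_bw_bps, members, avg_rtcp_bits=800.0,
                  rtcp_fraction=0.05, t_min=5.0):
    """Simplified RTCP reporting interval: the RTCP bandwidth is a fraction
    (typically 5%) of the session bandwidth, shared among all members."""
    rtcp_bw_bps = rtcp_fraction * session_bw_bps
    return max(t_min, members * avg_rtcp_bits / rtcp_bw_bps)

# Endpoint A assumes a 1 Mbps session; endpoint B assumes only 64 kbps.
interval_a = rtcp_interval(1_000_000, members=200)  # 5.0 s (at the floor)
interval_b = rtcp_interval(64_000, members=200)     # 50.0 s

# A times out a member it has not heard from within 5 of its own intervals,
# so B's reports arrive too rarely from A's point of view.
times_out = interval_b > 5 * interval_a             # True
```

With a 1 Mbps vs. 64 kbps view of the session bandwidth, B's interval (50 s) exceeds A's 25 s timeout horizon, illustrating why a common bandwidth view matters.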
  • A sender endpoint device according to some embodiments is able to adapt the number of streams that it will output responsive to the known capabilities of the RTP receiver endpoint device.
  • max-send-ssrc and max-recv-ssrc are defined, which can be used independently to establish a limit to the number of simultaneously active SSRCs for the send and receive directions, respectively.
  • Active SSRCs are the ones counted as senders according to RFC3550, i.e. they have sent RTP packets during the last two regular RTCP reporting intervals.
  • A payload-agnostic upper limit to the total number of simultaneous SSRCs that can be sent or received in this RTP session is signaled with a * payload type.
  • A value of 0 may be used as the maximum number of SSRCs, but it is then recommended that this also be reflected using the sendonly or recvonly attribute.
  • A payload-specific upper limit to the number of simultaneous SSRCs in the RTP session with that specific payload type is signaled with a defined payload type (static, or dynamic through rtpmap).
  • Multiple lines with max-send-ssrc or max-recv-ssrc attributes specifying a single payload type may be used, each line providing a limitation for that specific payload type. Payload types that are not defined in the media block should be ignored.
  • the sender or receiver endpoint devices 111 should be able to handle any combination of the SSRCs with different payload types that fulfill all of the payload specific limitations, with a total number of SSRCs up to the payload-agnostic limit.
  • If max-send-ssrc or max-recv-ssrc are not included in the SDP, this can be interpreted by an endpoint device as equivalent to a limit of one, unless sendonly or recvonly attributes are specified, in which case the limit is implicitly zero for the corresponding unused direction.
  • the specified limit in max-send-ssrc indicates the maximum number of simultaneous streams of the specified payload types that the configured endpoint device 111 may send at any single point in time.
  • max-recv-ssrc indicates the maximum number of simultaneous streams of the specified payload types that may be sent to the configured endpoint device 111.
  • Payload-agnostic limits can be used with or without additional payload-specific limits.
  • the specified limits indicate the agent endpoint device's 111 capabilities.
  • A sender endpoint device 111 can respond by not sending more simultaneous streams of the specified payload type than the receiver endpoint device 111 has indicated the ability to receive, taking into account also any payload-agnostic limit.
  • the agent endpoint device 111 is then known to only be capable of supporting a single stream in the direction for which attributes are missing. If the offer lacks attributes it must be assumed that the offerer only supports a single stream in each direction.
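As an illustration, an SDP media block using these attributes might look as follows. The exact value syntax (a payload type or *, followed by a count) is a hypothetical rendering of the rules above, not a normative format.

```
m=video 49170 RTP/AVP 96 97
a=rtpmap:96 H264/90000
a=rtpmap:97 H263-1998/90000
a=max-send-ssrc:* 1
a=max-recv-ssrc:* 5
a=max-recv-ssrc:96 3
```

Read under the rules above, this agent sends at most one simultaneously active SSRC, receives at most five in total, and accepts at most three of those using payload type 96.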
  • SDP bandwidth attribute
  • A common problem for current SDP bandwidth modifiers is that they use a single bandwidth value without a clear specification. Uncertainty in how the bandwidth value is derived creates uncertainty about how bursty a media source can be.
  • the SDP bandwidth attribute can indicate different sets of attribute values depending on direction;
  • Bandwidth specification method: to indicate what bit-rate values mean, an endpoint device can communicate token bucket parameters that indicate, for example, bucket depth and bucket fill rate. If single values are to be specified, a clear definition of how one derives that value must be specified, including averaging intervals, etc.
  • This attribute is structured as follows, in accordance with some embodiments. After the attribute name there is a directionality parameter, a scope parameter, and a semantics parameter. The semantics parameter provides an indication that is useful for interpreting the parameter values.
  • the attribute is designed so that multiple instances of the line will be necessary to express the various bandwidth related configurations that are desired.
  • A required-prefix indicator ("!") can be added prior to any scope or semantics parameter.
  • scope = payloadType / scope-ext
  • PT-value-list = PT-value *(";" PT-value)
  • semantics-ext = token ; as defined in RFC 4566
  • scope-ext = 1*VCHAR ; as defined in RFC 4566
  • sendrecv: The provided bandwidth values apply equally in the send and recv directions, i.e., the values configure the directions symmetrically.
  • Payload Type The bandwidth configuration applies to one or more specific payload type values.
  • PT *: Applies independently of which payload type is being used.
  • This specification defines two semantics which are related.
  • The token bucket values are the token bucket rate and the token bucket size, represented as two floating-point numbers.
  • SMT: The maximum intended or allowed bandwidth usage for each individual source (SSRC) in an RTP session, as specified by a token bucket.
  • AMT: The maximum intended or allowed bandwidth usage for the sum of all sources (SSRCs) in an RTP session according to the specified directionality, as specified by a token bucket.
  • the token bucket values are the token rate in bits per second and the bucket size in bytes.
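A token bucket with these units (rate in bits per second, bucket size in bytes) can be sketched as a simple conformance check; this is an illustrative sketch, not part of the embodiments.

```python
class TokenBucket:
    """Token bucket as used by the SMT/AMT semantics: a packet conforms if
    the accumulated tokens can cover its size.

    rate_bps is in bits per second and size_bytes is the bucket depth in
    bytes, matching the units given for the bandwidth attribute above.
    """

    def __init__(self, rate_bps, size_bytes):
        self.rate = rate_bps / 8.0       # fill rate in bytes per second
        self.size = float(size_bytes)
        self.tokens = float(size_bytes)  # start with a full bucket
        self.last = 0.0

    def conforms(self, packet_bytes, now):
        # Refill proportionally to elapsed time, capped at the bucket size.
        self.tokens = min(self.size,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False
```

The bucket size bounds how bursty a conforming source can be, which is exactly the ambiguity the single-value bandwidth modifiers leave open.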
  • An agent responding to an offer will need to consider the directionality and reverse it when responding for media streams using unicast. If the transport is multicast, the directionality is not affected.
  • the SDP attribute is interpreted from the perspective of the endpoint device being configured by the particular SDP.
  • The outgoing single stream is limited to a bucket rate of 1.2 Mbps and a bucket size of 16384 bytes.
  • The up to 5 incoming streams can in aggregate use a maximum bucket rate of 8 Mbps with a bucket size of 65535 bytes.
  • Each individual stream's maximum rate depends on its payload type.
  • Payload type 96 (H.264) is limited to 1.5 Mbps with a bucket size of 16384 bytes.
  • Payload type 97 (H.263) may use up to 2.5 Mbps with a bucket size of 16384 bytes.
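The example could be written out with SDP lines like the following; the attribute name bwp and the exact parameter ordering (directionality, scope, semantics, bucket rate in bits per second, bucket size in bytes) are hypothetical placeholders for the structure described above.

```
m=video 49170 RTP/AVP 96 97
a=bwp:send * SMT 1200000 16384
a=bwp:recv * AMT 8000000 65535
a=bwp:recv 96 SMT 1500000 16384
a=bwp:recv 97 SMT 2500000 16384
```

Multiple instances of the attribute line express the different directional, aggregate, and per-payload-type configurations, as noted above.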
  • Simulcast is the act of sending multiple alternative encodings of the same underlying media source.
  • the below subsections describe potential ways of achieving flow de-multiplexing and identification of which streams are alternative encodings of the same source.
  • multiple SSRCs may occur for various reasons such as multiple participant endpoint devices in multipoint topologies such as multicast, transport relays or full mesh transport simulcasting, multiple source devices, such as multiple cameras or microphones at one endpoint, or RTP mechanisms in use, such as RTP Retransmission [RFC4588].
  • Payload multiplexing uses only the RTP payload type to identify the different alternatives. Thus, all streams would be sent in the same RTP session using only a single SSRC per actual media source. So, when having multiple SSRCs, each SSRC would be a unique media source or an RTP mechanism-related SSRC. Each RTP payload type would then need to both indicate the particular encoding and its configuration, in addition to being a stream identifier. When considering mechanisms like RTP retransmission using SSRC multiplexing, an SSRC may either be a media source with multiple encodings as provided by the payload type, or a retransmission packet as identified also by the payload type.
  • a sender endpoint device 111 of a payload type multiplexed simulcast will need to send multiple different packets with one version in each packet or sequence of packets.
  • the SSRC multiplexing idea is based on using a unique SSRC for each alternative encoding of one actual media source within the same RTP session.
  • the identification of how flows are considered to be alternative needs an additional mechanism, for example using SSRC grouping [RFC5576] with a semantics that indicate them as alternatives.
  • each media source will use a number of SSRCs to represent the different alternatives it produces. For example, if all actual media sources are similar and produce the same number of simulcast versions, one will have n*m SSRCs in use in the RTP session, where n is the number of actual media sources and m the number of simulcast versions they can produce.
  • Each SSRC can use any of the configured payload types for this RTP session.
  • Session multiplexing means that each different version of an actual media source is transmitted in a different RTP session, using whatever session identifiers are available to de-multiplex the different versions.
  • This solution can then use the same SSRC in all the different sessions to indicate that they are alternatives, or it can use explicit session grouping [RFC5888] with a semantics that indicate them as alternatives (preferably with the same semantics identifier as in Section 6.2 above).
  • Each RTP session will have its own set of configured RTP payload types, where each SSRC in that session can use any of the configured ones.
  • the simulcast solution should ensure that any negative impact on RTP/RTCP is minimal and that all the features of RTP/RTCP and its extensions can be used.
  • Payload type multiplexing for purposes like simulcast has well known negative effects on RTP.
  • the basic issue is that all the different versions are being sent on the same SSRC, thus using the same timestamp and sequence number space. This has many effects:
  • Some media formats require uninterrupted sequence number space between media parts. These are media formats where any missing RTP sequence number will result in decoding failure or invoking of a repair mechanism within a single media context.
  • The text/t140 payload format [RFC4103] is an example of such a format. These formats may not be possible to simulcast using payload multiplexing.
  • RTCP feedback mechanisms are built around providing feedback on media streams based on stream ID (SSRC), packets (sequence number) and time interval (RTP Timestamps). There is almost never a field for indicating which payload type one is reporting on. Thus giving version specific feedback is difficult.
  • RTCP media control messages [RFC5104] are oriented around controlling particular media flows, i.e., requests are made at the RTCP level. Thus, such mechanisms need to be redefined to support payload type multiplexing.
  • SSRC multiplexing of simulcast versions may require a receiver endpoint device to know that it is expected to decode only one of the versions and need not decode all of them simultaneously. This is currently missing functionality, as SDES CNAME cannot be used.
  • The same CNAME has to be used for all flows connected to the same endpoint and location. A clear example of this could be a video conference where an endpoint has 3 video cameras plus an audio mix being captured in the same room. As the media has a common timeline, it is important to be able to indicate that through the CNAME. Thus, an endpoint device cannot use CNAME to indicate that multiple SSRCs with the same CNAME are different versions of the same source.
  • session multiplexing does not have any of the negative effects that payload type multiplexing has (Section 7.1.1). As each flow is uniquely identified by RTP Session and SSRC, one can control and report on each flow explicitly.
  • The method of multiplexing has significant impact on signaling functionality and how to perform it, especially if SDP [RFC4566] and SDP Offer/Answer [RFC3264] are used.
  • an endpoint device will need to indicate that different RTP payload types are intended as different simulcast versions.
  • the endpoint device likely has standalone SDP attributes that indicate the relation between the payload types.
  • This increases the number of payload types needed within an RTP session. In the worst case, this may become a restriction, as only 128 payload types are possible.
  • This limitation is exacerbated if an endpoint device uses solutions like RTP and RTCP multiplexing [RFC5761] where a number of payload types are blocked due to the overlap between RTP and RTCP.
  • SSRC multiplexing will likely use a standalone attribute to indicate the usage of simulcast.
  • the first part is non-controversial.
  • the second one has significant impact on the signaling load in sessions with dynamic session participation.
  • As each new participant endpoint device joins a multiparty session, the existing participant endpoint devices that need to know the binding will need to receive an updated list of bindings. If that is done in SIP and SDP offer/answer, a SIP re-Invite is required for each such transaction.
  • the signaling channel may introduce additional delay before the receiver endpoint device can decode the media.
  • Session multiplexing results in one media description per version. It will be necessary to indicate which RTP sessions are in fact simulcast versions. For example, using a Media grouping semantics specific for this. Each of these sessions will be focused on the particular version they intended to transport.
  • Legacy fallback also needs to be considered, i.e., the impact on an endpoint that is not simulcast-enabled.
  • a legacy endpoint that doesn't understand the indication that different RTP payload types are for different purpose may be slightly confused by the large amount of possibly overlapping or identical RTP payload types.
  • a legacy endpoint device will not understand the grouping semantic. It might either understand the grouping framework and thus determine that they are grouped for some purpose or not understand grouping at all as the offer simply looks like several different media sessions.
  • the payload type multiplexed session cannot negotiate bandwidth for the individual versions without extensions.
  • the regular SDP bandwidth attributes can only negotiate the overall bandwidth that all versions will consume. This makes it difficult to determine that one should drop one or more versions due to lack of bandwidth between the peers.
  • SSRC multiplexing suffers the same issues as payload type multiplexing, unless additional signaling (SSRC level attributes) is added.
  • Session multiplexing can negotiate bandwidth for each individual version and determine to exclude a particular version, with full knowledge of what it excludes, to avoid consuming a certain amount of bandwidth.
  • For the negotiation and setting of the media codec, e.g., the codec parameters and RTP payload parameters, for payload type multiplexing it is possible for each version to be individually negotiated and set, by communications between the sender endpoint device and receiver endpoint device, because each version has unique payload types. The same is true for session multiplexing, where each version is negotiated by setting the parameters in the context of each RTP session. An endpoint device can be configured to provide additional signaling for the SSRC multiplexed version to enable a binding between the payload types and the versions for which they are used. Otherwise, the RTP payload types are negotiated without any context of which version intends to use which payload type.
  • When an endpoint device negotiates or configures RTP and RTCP extensions, this can be done either at session level or in direct relation to one or several RTP payload types. Extensions are not negotiated in the context of an SSRC. Thus, payload type multiplexing will need to negotiate any session-level extensions for all the versions without version-specific consideration, unless extensions are deployed. It can, however, negotiate payload-specific extensions at an individual version level. SSRC multiplexing cannot negotiate any extension related to a certain version without extensions. Session multiplexing has full freedom to negotiate extensions for each version individually, without any additional extensions.
  • QoS mechanisms are either flow-based, such as RSVP, or marking-based, such as Diff-Serv [RFC2474]. If an endpoint device uses a marking-based scheme, the method of multiplexing will not affect the possibility to use QoS. However, if an endpoint device uses a flow-based scheme, there is a clear difference between the methods of multiplexing.
  • Both payload and SSRC multiplexing will have only one RTP session, not introducing any additional NAT traversal complexities compared to not using simulcast and having only a single version.
  • the session multiplexing is using one RTP session per simulcast version.
  • additional NAT/FW pinholes will be required.
  • As sessions using simulcast will use multiple media, more than a single pinhole (or pair of pinholes) will already be needed anyway.
  • The additional pinholes will result in some extra delay in traversal mechanisms such as ICE; however, for mechanisms that perform explicit control, such as UPnP, NAT-PMP, or PCP, such requests are expected to be parallelizable. Establishing additional pinholes will result in a slightly higher risk of NAT/FW traversal failure.
  • To enable the usage of simulcast using session multiplexing, some minimal signaling support is required to be provided by endpoint devices 111. That support is discussed in this section. First of all, there is a need for a mechanism performed by endpoint devices 111 to identify to each other the RTP sessions carrying simulcast alternatives. Secondly, a receiver endpoint device needs to be able to identify the SSRCs in the different sessions that are of the same media source but in different encodings.
  • When a sender endpoint device 111-1 performs simulcast, it will, for each actual media source, have one SSRC in each session in which it currently provides an encoding alternative. As a receiver endpoint device 111-2 or a mixer 112 will receive one or more of these, it is important that any RTP session participant beyond the sender endpoint device 111-1 can explicitly identify which SSRCs in the set of RTP sessions providing a simulcast service represent the same media source. Two extensions to RTP are explained below for how to accomplish this in accordance with some embodiments.
  • Source Descriptions are an approach that should work with all RTP topologies (assuming that any intermediary node (e.g., Central node 112) is supporting this item) and existing RTP extensions.
  • A new SDES item called SRCNAME identifies, with a unique identifier, a single media source, such as a camera. That way, even if multiple encodings or representations are produced, any endpoint device receiving the SDES information from a set of interlinked RTP sessions can determine which streams are the same source.
  • The SRCNAME may commonly be a per-session unique random identifier generated according to "Guidelines for Choosing RTP Control Protocol (RTCP) Canonical Names (CNAMEs)" [RFC6222].
  • This SRCNAME's relation to CNAME is the following.
  • CNAME represents an endpoint device 111 and a synchronization context. If the different representations should be played out synchronized, and without overlap when switching between them, they need to be in the same synchronization context.
  • all SSRCs with the same SRCNAME will have the same CNAME.
  • a given CNAME may also contain multiple sets of sources using different SRCNAMEs.
  • Source-Specific Media Attributes in the Session Description Protocol [RFC5576] defines a way of declaring attributes for an SSRC in each session in SDP.
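Combining the SRCNAME item with source-specific SDP attributes, two sessions carrying alternative encodings of the same camera could be described roughly as follows; the srcname attribute spelling and the identifier values are assumptions for illustration.

```
m=video 49170 RTP/AVP 96
a=ssrc:314159 cname:user@example.com
a=ssrc:314159 srcname:cam1-f3a9
m=video 49172 RTP/AVP 97
a=ssrc:271828 cname:user@example.com
a=ssrc:271828 srcname:cam1-f3a9
```

The shared CNAME places both SSRCs in the same synchronization context, while the shared SRCNAME marks them as encodings of the same source, consistent with the CNAME relation described above.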
  • An RTP middlebox can signal to the client encoding and transmitting the streams whether a particular stream is currently needed or not. This needs to be a quick and media-plane oriented solution, as it changes based on, for example, the user's speech activity or the user's selection in the user interface.
  • One option is to use TMMBR from [RFC5104] with a bandwidth value of 0 to temporarily pause a certain SSRC, re-establishing transmission through a TMMBR with a non-zero value.
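The pause/resume scheme can be sketched as follows; this only computes the (SSRC, bitrate) pairs a middlebox would request, and building the actual RTCP TMMBR packets per RFC 5104 is outside the sketch. The function name and parameters are illustrative.

```python
def tmmbr_requests(needed_ssrcs, all_ssrcs, resume_bitrate):
    """Emit a TMMBR bitrate of 0 for streams that are currently not needed
    (temporary pause), and a non-zero bitrate to re-establish transmission.

    Returns a list of (ssrc, bitrate_bps) pairs to send as TMMBR requests.
    """
    requests = []
    for ssrc in all_ssrcs:
        if ssrc in needed_ssrcs:
            requests.append((ssrc, resume_bitrate))  # resume / keep active
        else:
            requests.append((ssrc, 0))               # temporarily pause
    return requests
```

A middlebox would re-run this whenever, for example, speech activity or the users' interface selections change which streams are needed.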
  • Session multiplexing may have some additional overhead in the key-management but also provide the flexibility to exclude certain users from certain versions by using session specific keys and not allow all users access in the key-management. But this may have minimal benefit.
  • The multi-stream signaling has, like other SDP-based signaling, issues with man-in-the-middle attackers that may modify the SDP as an attack on either the service in general or a particular endpoint. This can, as usual, be resolved by a security mechanism that provides integrity and source authentication between the signaling peers.
  • SDES SRCNAME items, being opaque identifiers, could potentially carry additional meanings or function as a covert channel. If the SRCNAME were permanent between sessions, it would have the potential to compromise users' privacy, as users could be tracked between sessions. See [RFC6222] for more discussion.
  • Figures 4-9 are flow charts that illustrate operations and methods that can be performed by sender endpoint devices to identify simultaneously communicated media streams as having related media data in accordance with some embodiments.
  • Figures 11-21 are flow charts that illustrate corresponding operations and methods that can be performed by receiver endpoint devices to identify and use the related media streams in accordance with some embodiments.
  • Figure 4 illustrates operations and methods that can be performed by a sender endpoint device 111-1 to communicate (block 400) a plurality of media streams, simultaneously in time, for a session (e.g., a RTP session) toward a receiver endpoint device 111-2.
  • the sender endpoint device 111-1 also communicates (block 402) information toward the receiver endpoint device 111-2 that identifies which of the media streams contain related media data (content).
  • the sender endpoint device 111-1 can communicate (block 500) the media streams of simulcast media streams.
  • the sender endpoint device 111-1 can also communicate (block 502) information identifying which of the media streams contain different encoded versions of the same media content.
  • the sender endpoint device 111-1 can communicate (block 600) the media streams of simulcast media streams.
  • the sender endpoint device 111-1 can also communicate (block 602) information identifying which of the media streams contain different spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source.
  • the sender endpoint device 111-1 can communicate (block 700) the media streams of simulcast media streams.
  • the sender endpoint device 111-1 can also communicate (block 702) information identifying which of the media streams are from the sender endpoint device 111-1.
  • SSRCs and stream identifiers can be communicated to the receiver endpoint device 111-2 to identify groups of the media streams containing related media data across a plurality of RTP sessions.
  • the sender endpoint device 111-1 can communicate (block 800) the media streams toward the receiver endpoint device 111-2 through a plurality of different RTP sessions.
  • the sender endpoint device 111-1 also communicates (block 802) information, using RTCP, including a plurality of SSRCs that each uniquely identify a different one of the RTP sessions, and further including at least one stream identifier defined to uniquely identify each group of the media streams containing related media content.
  • the SSRCs can be communicated through RTCP and the stream identifiers can be communicated through RTP as part of the media streams.
  • the sender endpoint device 111-1 can communicate (block 900) the media streams toward the receiver endpoint device 111-2 through a plurality of RTP sessions.
  • the SSRCs are communicated (block 902) via RTCP toward the receiver endpoint device 111-2.
  • Information including at least one stream identifier defined to uniquely identify each group of the media streams containing related media content is communicated (block 904) toward the receiver endpoint device (111-2).
  • the SSRCs and stream identifiers are communicated through SDP communications during setup of RTP sessions.
  • the sender endpoint device 111-1 can communicate (block 1000) the media streams toward the receiver endpoint device 111-2 through a plurality of RTP sessions.
  • Information can be communicated (block 1002) through SDP during setup of the RTP sessions, where the information includes a plurality of SSRCs that each uniquely identify a different one of the RTP sessions, and further includes at least one stream identifier defined to uniquely identify each group of the media streams containing related media content.
  • a receiver endpoint device 111-2 can perform the corresponding operations and methods of Figure 11 to receive, identify, and use the related media streams.
  • The receiver endpoint device 111-2 may be an RTP mixer or other endpoint device.
  • the receiver endpoint device 111-2 receives (block 1100) a plurality of media streams, simultaneously in time, from the sender endpoint device 111-1.
  • Information is also received (block 1102) from the sender endpoint device 111-1 that identifies which of the media streams contain related media data.
  • the receiver endpoint device 111-2 selects (block 1104) among the media streams responsive to the received information.
  • the receiver endpoint device 111-2 operates as an RTP mixer (central node) 112 that selectively forwards one or more media streams, such as described above in section 3.1.
  • the RTP mixer 112 selects (block 1200) at least one of the media streams responsive to the received information, and forwards (block 1202) the selected at least one media stream to another receiver endpoint device 111-3.
  • The RTP mixer 112 may select among received media streams for output to the receiver endpoint device 111-3 responsive to user input that is received, for example, from a Graphical User Interface associated with the receiver endpoint device 111-3.
  • a responsive message can be communicated to the RTP mixer 112 that causes the RTP mixer to start sending a media stream to a newly displayed video window, stop sending a media stream to a newly hidden video window, and/or perform other operations which select among the received media streams for forwarding to the receiver endpoint device.
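The window-driven start/stop behavior can be sketched as a simple set difference over the receiver's displayed video windows; the function name and stream identifiers are illustrative, not from the embodiments.

```python
def mixer_commands(previously_forwarded, displayed_windows):
    """Turn a change in the receiver's displayed video windows into
    start/stop forwarding commands for the RTP mixer: start streams for
    newly displayed windows, stop streams for newly hidden ones.

    Both arguments are sets of stream identifiers; returns (action, stream)
    pairs for the mixer to apply.
    """
    start = displayed_windows - previously_forwarded
    stop = previously_forwarded - displayed_windows
    return ([("start", s) for s in sorted(start)] +
            [("stop", s) for s in sorted(stop)])
```

Streams that remain displayed produce no command, so only the delta of the user's selection is signaled to the mixer.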
  • the receiver endpoint device 111-2 may perform the operations and methods of Figure 12.
  • the receiver endpoint device 111-2 selects (block 1200) at least one of the media streams responsive to the received information, and forwards (block 1202) the selected at least one media stream to the other receiver endpoint device 111-3.
  • the receiver endpoint device 111-2 may perform the operations and methods of Figure 13.
  • the sender endpoint device 111-1 may be configured to operate a scalable encoder to output a media stream, which may be sent as a single SSRC, to the receiver endpoint device 111-2.
  • the receiver endpoint device 111-2 may, in one embodiment, control what scalable coding is used by the sender endpoint device 111-1 to output the media stream.
  • the receiver endpoint device 111-2 is configured to select the media stream from the sender endpoint device 111-1 responsive to the received information, and to perform (block 1300) scalable video coding and/or transcoding on the media stream to generate at least one media stream, and forward (block 1302) the generated at least one media stream to the other receiver endpoint device 111-3.
  • the receiver endpoint device 111-2 operates as an RTP mixer (central node) 112 that combines media content from at least two of the media streams, such as described above in section 3.1.
  • the receiver endpoint device 111-2 may perform the operations and methods of Figure 14.
  • the receiver endpoint device 111-2 selects (block 1400) at least two of the media streams responsive to the received information.
  • the selected at least two media streams are combined (block 1402) to generate a combined media stream.
  • the receiver endpoint device 111-2 forwards (block 1404) the combined media stream to the other receiver endpoint device 111-3.
  • the receiver endpoint device 111-2 can be configured to perform the operations and methods of Figures 15-17 which correspond to the sender endpoint device operations and methods of Figures 4-6.
  • the receiver endpoint device 111-2 receives (block 1500) the media streams as simulcast media streams from the sender endpoint device 111-1.
  • the receiver endpoint device 111-2 uses the received information to identify (block 1502) which of the media streams contain different encoded types of the same media content.
  • the receiver endpoint device 111-2 receives (block 1600) the media streams as simulcast media streams from the sender endpoint device 111-1, and uses the received information to identify (block 1602) which of the media streams contain different spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source.
  • the receiver endpoint device 111-2 receives (block 1700) the media streams as simulcast media streams from the sender endpoint device 111-1, and uses the received information to identify (block 1702) which of the media streams are from the sender endpoint device.
  • the SSRCs and stream identifiers may be used by a receiver endpoint device to identify groups of the media streams containing related media data across a plurality of RTP sessions.
  • a receiver endpoint device 111-2 simultaneously receives (block 1800) the media streams through a plurality of RTP sessions.
  • the received information includes a plurality of SSRCs that are each used by the receiver endpoint device 111-2 to uniquely identify (block 1802) a different one of the RTP sessions.
  • the received information further includes at least one stream identifier that is defined to enable the receiver endpoint device 111-2 to uniquely identify (block 1802) each group of the media streams containing related media content.
  • the SSRCs can be received through RTCP and the stream identifiers can be communicated through RTP as part of the media streams.
  • the receiver endpoint device 111-2 receives (block 1900) the media streams as simulcast media streams from the sender endpoint device 111-1.
  • the receiver endpoint device 111-2 also receives (block 1902) information, using RTCP, that includes a plurality of SSRCs that are each used to uniquely identify a different one of the RTP sessions.
  • the receiver endpoint device 111-2 also receives (block 1904) information, using RTP, that includes at least one stream identifier that is used to uniquely identify each group of the media streams containing related media content.
  • the SSRCs and stream identifiers can be received through SDP communications during setup of RTP sessions.
  • the receiver endpoint device 111-2 receives (block 2000) the media streams as simulcast media streams from the sender endpoint device 111-1.
  • the receiver endpoint device 111-2 also receives (block 2002) information through SDP during setup of the RTP sessions, where the information includes a plurality of SSRCs that are each used to uniquely identify a different one of the RTP sessions, and further includes at least one stream identifier that is used to uniquely identify each group of the media streams containing related media content.
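The SSRC and stream-identifier signalling in the bullets above can be illustrated with a short SDP-building sketch. The `a=ssrc` and `a=ssrc-group` attribute syntax follows RFC 5576; the `SIM` grouping semantic, the CNAME value, and the payload type below are illustrative assumptions, not values taken from this disclosure:

```python
def build_simulcast_sdp(media_port, ssrcs, group_semantics="SIM"):
    """Compose SDP media-section lines advertising several RTP streams.

    Each SSRC identifies one simultaneously sent stream; the ssrc-group
    line ties together the streams that carry related media content.
    Attribute syntax follows RFC 5576; the "SIM" semantic is a
    hypothetical grouping tag used only for illustration.
    """
    lines = [f"m=video {media_port} RTP/AVP 96",
             "a=rtpmap:96 H264/90000"]
    for ssrc in ssrcs:
        # One a=ssrc line per stream, all sharing one (example) CNAME.
        lines.append(f"a=ssrc:{ssrc} cname:endpoint-111-1@example.com")
    lines.append("a=ssrc-group:%s %s" % (group_semantics,
                                         " ".join(str(s) for s in ssrcs)))
    return "\r\n".join(lines) + "\r\n"
```

A receiver parsing such an offer could use the `ssrc-group` line to identify which of the simultaneously received streams belong together.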
  • the endpoint device 111 of Figure 2 can be configured to perform the operations and methods of one or more of the embodiments of the sender endpoint device 111-1 of Figures 4-20. Accordingly, the network interface 133 of the sender endpoint device 111-1 can be configured to communicate over the network 101 with a receiver endpoint device 111-2.
  • the processor 131 of the endpoint device 111-1 can be configured to simultaneously communicate a plurality of media streams through at least one RTP session toward the receiver endpoint device 111-2, and to communicate information toward the receiver endpoint device 111-2 that identifies which of the media streams contain related media data.
  • the processor 131 of the sender endpoint device 111-1 is further configured to communicate the plurality of media streams as simulcast media streams toward the receiver endpoint device 111-2, and the communicated information identifies which of the media streams contain different encoded types of the same media content and/or identifies which of the media streams contain different spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source.
  • the processor 131 of the sender endpoint device 111-1 is further configured to communicate the plurality of media streams toward the receiver endpoint device 111-2 through a plurality of RTP sessions, and the information communicated toward the receiver endpoint device 111-2 includes a plurality of SSRCs that each uniquely identify a different one of the RTP sessions, and the information further includes at least one stream identifier defined to uniquely identify each group of the media streams containing related media content.
  • the endpoint device 111 of Figure 2 can alternatively or additionally be configured to perform the operations and methods of one or more of the embodiments of the receiver endpoint device 111-2 of Figures 4-20.
  • the network interface 133 of a receiver endpoint device 111-2 can be configured to communicate over the network 101 with a sender endpoint device 111-1.
  • the processor 131 of the receiver endpoint device 111-2 can be configured to simultaneously receive a plurality of media streams for at least one RTP session from the sender endpoint device 111-1, receive information from the sender endpoint device 111-1 that identifies which of the media streams contain related media data, and select among the media streams responsive to the received information.
  • the processor 131 of the receiver endpoint device 111-2 is further configured to receive the plurality of media streams as simulcast media streams from the sender endpoint device 111-1, and the received information identifies which of the media streams contain different encoded types of the same media content and/or identifies which of the media streams contain different spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source.
  • the processor 131 of the receiver endpoint device 111-2 is further configured to receive the plurality of media streams through a plurality of RTP sessions, and the received information includes a plurality of Synchronization Source Identifiers, SSRCs, that each uniquely identify a different one of the RTP sessions, and the received information further includes at least one stream identifier defined to uniquely identify each group of the media streams containing related media content.
  • a receiver endpoint device 111-2 can advertise its capabilities for simultaneously receiving a plurality of media streams.
  • the receiver endpoint device 111-2 advertises (block 2100) its capability information to a sender endpoint device 111-1 that defines a capability of the receiver endpoint device 111-2 to simultaneously receive media streams.
  • the receiver endpoint device 111-2 receives (block 2102) the media streams at a same time (simultaneously) from the sender endpoint device 111-1 based on the advertised capability information.
  • the simultaneous media streams may, in some embodiments, be sent through a same RTP session.
  • the receiver endpoint device 111-2 exchanges (block 2200) session negotiation information with the sender endpoint device 111-1 to setup the communication session prior to receiving the plurality of media streams.
  • the receiver endpoint device 111-2 communicates the capability information to the sender endpoint device 111-1 as part of the session negotiation information.
  • the receiver endpoint device 111-2 communicates (block 2300) the capability information to the sender endpoint device 111-1 using SDP as part of re-negotiation communications with the sender endpoint device 111-1 to re-negotiate the session.
  • the receiver endpoint device 111-2 waits (block 2400) for receipt of an acknowledgement message from the sender endpoint device 111-1 indicating agreement to constrain its communication of the media streams to the receiver endpoint device 111-2 according to the session negotiation information offered by the receiver endpoint device 111-2.
  • when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 advertises (block 2500) a maximum number of media streams that the receiver endpoint device 111-2 is presently capable of simultaneously receiving, and which is dependent upon the sender endpoint device 111-1 using a defined coding to encode data carried by the media streams.
  • when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 advertises (block 2600) a maximum combined bandwidth for all of the media streams and/or a maximum per-stream bandwidth that the receiver endpoint device 111-2 is presently capable of simultaneously receiving from the sender endpoint device 111-1.
  • the maximum combined bandwidth for all of the media streams and/or the maximum per-stream bandwidth can be dependent upon the sender endpoint device 111-1 using a defined coding to encode data carried by the media streams.
  • when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 can advertise (block 2700) which of a plurality of defined coding parameters the sender endpoint device 111-1 should use to encode data carried by particular ones of the media streams to be communicated to the receiver endpoint device 111-2.
  • when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 can advertise (block 2800) a token rate and a bucket size for a token bucket algorithm that will be performed by the receiver endpoint device 111-2 to constrain a data rate of media streams that will be simultaneously received from the sender endpoint device 111-1.
  • when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 can advertise (block 2900) a number of media streams containing related media data that can be simultaneously received by the receiver endpoint device 111-2.
  • when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 can advertise (block 3000) a number of media streams containing different encoded types of the same media content that can be simultaneously received.
  • the receiver endpoint device 111-2 can advertise (block 3100) a number of media streams containing different spatial sampling versions, temporal sampling versions, and/or lossy quality versions of a same media source that can be simultaneously received.
  • the receiver endpoint device 111-2 can advertise (block 3200) which of a plurality of different defined spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source that the sender endpoint device 111-1 should communicate as particular ones of the media streams to the receiver endpoint device 111-2.
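The token bucket referenced in block 2800 above can be sketched as follows. This is a minimal illustration of the standard token-bucket algorithm, with `rate` and `capacity` standing in for the advertised token rate and bucket size; it is a sketch under those assumptions, not an implementation taken from this disclosure:

```python
import time


class TokenBucket:
    """Minimal token-bucket rate limiter.

    A receiver could advertise `rate` (tokens added per second, e.g.
    bytes per second) and `capacity` (bucket size in tokens) so that a
    sender can constrain the combined data rate of the media streams it
    simultaneously communicates.
    """

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = float(rate)          # token refill rate per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # bucket starts full
        self.now = now                   # injectable clock for testing
        self.last = now()

    def allow(self, amount):
        """Return True if `amount` tokens (e.g. packet bytes) may be sent now."""
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.rate)
        self.last = t
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False
```

A sender honoring the advertisement would call `allow(len(packet))` before transmitting and delay or drop packets when it returns `False`.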
  • a sender endpoint device 111-1 can advertise its capabilities for simultaneously sending a plurality of media streams.
  • the sender endpoint device 111-1 advertises (block 3300) its capability information to a receiver endpoint device 111-2 that defines a capability of the sender endpoint device 111-1 to simultaneously communicate a plurality of media streams.
  • the sender endpoint device 111-1 also communicates (block 3302) the plurality of media streams at a same time toward the receiver endpoint device 111-2 based on the advertised capability information.
  • the simultaneous media streams may, in some embodiments, be received through a same RTP session.
  • the sender endpoint device 111-1 exchanges (block 3400) session negotiation information with the receiver endpoint device 111-2 to setup the communication session prior to communicating the plurality of media streams.
  • the sender endpoint device 111-1 communicates the capability information to the receiver endpoint device 111-2 as part of the session negotiation information.
  • the sender endpoint device 111-1 communicates (block 3500) the capability information to the receiver endpoint device 111-2 using SDP as part of re-negotiation communications with the receiver endpoint device 111-2 to renegotiate the session.
  • when advertising (block 3300) its capability information to the receiver endpoint device 111-2, the sender endpoint device 111-1 can advertise (block 3600) a maximum number of media streams that the sender endpoint device 111-1 is presently capable of simultaneously communicating to the receiver endpoint device 111-2.
  • the sender endpoint device 111-1 may advertise (block 3600) that the maximum number of media streams is dependent upon the receiver endpoint device 111-2 being capable of using a defined coding to decode data carried by the media streams.
  • when advertising (block 3300) its capability information to the receiver endpoint device 111-2, the sender endpoint device 111-1 can advertise (block 3700) a maximum per-stream bandwidth that the sender endpoint device 111-1 is presently capable of simultaneously sending to the receiver endpoint device 111-2.
  • the sender endpoint device 111-1 may advertise (block 3700) that the maximum per-stream bandwidth is dependent upon the receiver endpoint device 111-2 being capable of using a defined coding to decode data carried by the media streams.
  • when advertising (block 3300) its capability information to the receiver endpoint device 111-2, the sender endpoint device 111-1 can advertise (block 3800) which of a plurality of defined coding parameters the sender endpoint device 111-1 will use to encode data carried by particular ones of the media streams to be communicated to the receiver endpoint device 111-2.
  • before communicating (block 3302) the plurality of media streams at a same time toward the receiver endpoint device 111-2, the sender endpoint device 111-1 can receive (block 3900) an advertised token rate and an advertised bucket size for a token bucket algorithm that will be performed by the receiver endpoint device 111-2 to receive the media streams. The sender endpoint device 111-1 can then constrain (block 3902) a data rate of the media streams that it simultaneously communicates to the receiver endpoint device 111-2 in response to the advertised token rate and the advertised bucket size.
  • when advertising (block 3300) its capability information to the receiver endpoint device 111-2, the sender endpoint device 111-1 can advertise (block 4000) a number of media streams containing different spatial sampling versions, temporal sampling versions, and/or lossy quality versions of a same media source that are available to be simultaneously communicated to the receiver endpoint device 111-2.
  • when advertising (block 3300) its capability information to the receiver endpoint device 111-2, the sender endpoint device 111-1 can advertise (block 4100) which of a plurality of different defined spatial sampling versions, temporal sampling versions, and/or lossy quality versions of a same media source are available for communication from the sender endpoint device 111-1 as particular ones of the media streams to the receiver endpoint device 111-2.
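A sender's capability advertisement of the kind described in the bullets above could, for illustration, be carried in SDP along the following lines. The `b=AS` bandwidth line is standard SDP (RFC 4566, in kbit/s); the `a=max-send-stream-*` attribute names are hypothetical placeholders for the advertised maximum stream count and per-stream bandwidth, not attributes defined by this disclosure:

```python
def build_sender_capability_sdp(max_streams, total_bw_kbps, per_stream_bw_kbps):
    """Sketch of an SDP media-section fragment advertising sender
    simulcast capability.

    b=AS (RFC 4566) carries a maximum combined bandwidth in kbit/s;
    the a=max-send-stream-* attributes are hypothetical names used
    here only to illustrate the advertised parameters.
    """
    return "\r\n".join([
        "m=video 49170 RTP/AVP 96",
        f"b=AS:{total_bw_kbps}",                      # combined bandwidth
        f"a=max-send-stream-count:{max_streams}",     # hypothetical attribute
        f"a=max-send-stream-bw:{per_stream_bw_kbps}", # hypothetical attribute
    ]) + "\r\n"
```

The receiver would echo or adjust these values in its SDP answer during session negotiation or re-negotiation.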
  • When a node is referred to as being “connected”, “coupled”, “responsive”, or variants thereof to another node, it can be directly connected, coupled, or responsive to the other node, or intervening nodes may be present. In contrast, when a node is referred to as being “directly connected”, “directly coupled”, “directly responsive”, or variants thereof to another node, there are no intervening nodes present.
  • Like numbers refer to like nodes throughout.
  • “coupled”, “connected”, “responsive”, or variants thereof as used herein may include wirelessly coupled, connected, or responsive.
  • the singular forms "a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Well-known functions or constructions may not be described in detail for brevity and/or clarity.
  • the term “and/or”, abbreviated “/”, includes any and all combinations of one or more of the associated listed items.
  • the terms “comprise”, “comprising”, “comprises”, “include”, “including”, “includes”, “have”, “has”, “having”, or variants thereof are open-ended, and include one or more stated features, integers, nodes, steps, components or functions, but do not preclude the presence or addition of one or more other features, integers, nodes, steps, components, functions or groups thereof.
  • the common abbreviation “e.g.” which derives from the Latin phrase “exempli gratia,” may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item.
  • the common abbreviation “i.e.”, which derives from the Latin phrase “id est,” may be used to specify a particular item from a more general recitation.
  • Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits.
  • These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).
  • a tangible, non-transitory computer-readable medium may include an electronic, magnetic, optical, electromagnetic, or semiconductor data storage system, apparatus, or device.
  • a portable computer diskette, a random access memory (RAM) circuit, a read-only memory (ROM) circuit, an erasable programmable read-only memory (EPROM or Flash memory) circuit, a portable compact disc read-only memory (CD-ROM), and a portable digital video disc read-only memory (DVD/Blu-ray).
  • the computer program instructions may also be loaded onto a computer and/or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer and/or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
  • embodiments of the present invention may be embodied in hardware and/or in software (including firmware, resident software, microcode, etc.) that runs on a processor such as a digital signal processor, which may collectively be referred to as "circuitry," "a module” or variants thereof.

Abstract

Methods are disclosed for operating a receiver endpoint device that communicates with a sender endpoint device. The method includes advertising (2100) capability information to the sender endpoint device that defines a capability of the receiver endpoint device to simultaneously receive a plurality of media streams. The method further includes receiving (2102) the plurality of media streams at a same time from the sender endpoint device based on the advertised capability information. Related methods are disclosed for operating a sender endpoint device. Related sender endpoint devices and receiver endpoint devices are disclosed.

Description

METHODS AND APPARATUS FOR ADVERTISING ENDPOINT DEVICE CAPABILITIES FOR SENDING/RECEIVING SIMULTANEOUS MEDIA STREAMS
CROSS-REFERENCE TO RELATED APPLICATION
[0001] The present non-provisional application claims priority to U.S. Provisional
Application No. 61/500,333, filed June 23, 2011, the disclosure of which is incorporated herein by reference as if set forth fully herein.
TECHNICAL FIELD
[0002] The present invention relates to communications networks. More particularly, and not by way of limitation, the present invention is directed to systems and methods of establishing and controlling Real Time Transport Protocol (RTP) media streams through a communications network between endpoint devices.
BACKGROUND
[0003] The RTP protocol supports multiple endpoint device participants, each sending their own media streams. Unfortunately, many implementations are aimed only at point-to-point Voice over IP (VoIP) with a single source device at each endpoint. For example, client system implementations providing video conference functionality typically require the use of a central mixer that delivers only a single media stream per media type. Thus, any application that wants to allow for more advanced usage, where multiple media streams are sent and received by an endpoint device, would have an incompatibility problem with legacy systems.
SUMMARY
[0004] It is therefore an object to address at least some of the above mentioned disadvantages and/or to improve performance in a wireless communication system.
[0005] Some embodiments of the present invention are directed to a method of operating a receiver endpoint device that communicates with a sender endpoint device. The method includes advertising capability information to the sender endpoint device that defines a capability of the receiver endpoint device to simultaneously receive a plurality of media streams. The method further includes receiving the plurality of media streams at a same time from the sender endpoint device during the communication session based on the advertised capability information. [0006] In some further embodiments, session negotiation information is exchanged with the sender endpoint device to setup a session prior to receiving the plurality of media streams, and the capability information is communicated to the sender endpoint device as part of the session negotiation information. The advertised capability information may indicate a maximum number of media streams that the receiver endpoint device is presently capable of simultaneously receiving from the sender endpoint device, which may be dependent upon the sender endpoint device using a defined coding to encode data carried by the media streams, and/or may indicate a maximum combined bandwidth for all of the media streams and/or a maximum per-stream bandwidth that the receiver endpoint device is presently capable of simultaneously receiving from the sender endpoint device, which again may be dependent upon the sender endpoint device using a defined coding to encode data carried by the media streams. 
The advertised capability information may indicate which of a plurality of defined coding parameters the sender endpoint device should use to encode data carried by particular ones of the media streams to be communicated to the receiver endpoint device, and/or may indicate a token rate and a bucket size for a token bucket algorithm that will be performed by the receiver endpoint device to constrain a data rate of media streams that will be simultaneously received from the sender endpoint device.
[0007] Some other embodiments of the present invention are directed to a method of operating a sender endpoint device that communicates with a receiver endpoint device. The method includes advertising capability information to the receiver endpoint device that defines a capability of the sender endpoint device to simultaneously communicate a plurality of media streams. The method further includes communicating the plurality of media streams at a same time toward the receiver endpoint device based on the advertised capability information.
[0008] Other methods, sender endpoint devices, and receiver endpoint devices according to embodiments of the invention will be or become apparent to one with skill in the art upon review of the following drawings and detailed description. It is intended that all such additional methods, sender endpoint devices, and receiver endpoint devices be included within this description, be within the scope of the present invention, and be protected by the accompanying claims. Moreover, it is intended that all embodiments disclosed herein can be implemented separately or combined in any way and/or combination.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The accompanying drawings, which are included to provide a further
understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate certain non-limiting embodiment(s) of the invention. In the drawings: [0010] Figure 1 is a block diagram of a communication system that is configured according to some embodiments;
[0011] Figures 2 and 3 are block diagrams of a UE and a base station, respectively, configured according to some embodiments;
[0012] Figures 4-9 are flow charts that illustrate operations and methods that can be performed by sender endpoint devices to identify simultaneously communicated media streams as having related media data in accordance with some embodiments;
[0013] Figures 11-20 are flow charts that illustrate operations and methods that can be performed by receiver endpoint devices to simultaneously receive media streams and identify which media streams have related media data in accordance with some embodiments;
[0014] Figures 21-32 are flow charts that illustrate operations and methods that can be performed by receiver endpoint devices to advertise their capability for simultaneously receiving media streams in accordance with some embodiments; and
[0015] Figures 33-41 are flow charts that illustrate operations and methods that can be performed by sender endpoint devices to advertise their capability for simultaneously
communicating media streams in accordance with some embodiments.
DETAILED DESCRIPTION
[0016] The invention will now be described more fully hereinafter with reference to the accompanying drawings, in which examples of embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. It should also be noted that these embodiments are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present/used in another embodiment.
[0017] The terms "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" may be used herein with respect to particular embodiments without limiting the scope of the present invention. Stated in other words by way of example, an element(s), operation(s), step(s), etc., may be required with respect to a particular embodiment without being required for all embodiments. Accordingly, these terms should not be considered limiting with respect to the present application and claims omitting the referenced element(s), operation(s), step(s), etc.
1.1 Introduction
[0018] Various embodiments are directed to sender endpoint devices and receiver endpoint devices, and associated methods, that simultaneously send and receive, respectively, a plurality of media streams through at least one RTP session. In some embodiments, the media streams are communicated through Real Time Transport Protocol (RTP), where there are multiple media streams that are sent over an RTP session. A sender endpoint device may establish a RTP session with a receiver endpoint device and may then simultaneously communicate a plurality of media streams through the RTP session. Additionally, the sender endpoint device may establish a plurality of RTP sessions with the receiver endpoint device and communicate a plurality of media streams through one or more of the plurality of RTP sessions.
[0019] In accordance with various embodiments, a sender endpoint device communicates information toward the receiver endpoint device that identifies which of the media streams contain related media data. The receiver endpoint device uses that information to, for example, select among the simultaneously received media streams to output/use a selected media stream and/or may combine two or more of the selected media streams to output/use a combined media stream.
[0020] A sender endpoint device can communicate the information identifying the related media stream through additional uses of existing signaling provided by RTP and/or Real-time Transport Control Protocol (RTCP), and/or by generating signaling extensions to RTP and/or RTCP. The sender endpoint device can communicate the information directly to a receiver endpoint device and/or can communicate information to a central node for forwarding to the receiver endpoint device. In some embodiments, the related media data information can be used to provide improved handling of simulcasted media streams, such as when multiple encodings or representations of the same media source are sent from a same sender endpoint device to a receiver endpoint device. RTCP is a sister protocol of Real-time Transport Protocol (RTP) which is widely used for real time data transport. Providing additional uses for existing signaling aspects of RTP and/or RTCP and/or providing extensions to the signaling of RTP and/or RTCP may provide improved handling of simulcast media streams within legacy systems that are not configured to fully support multiple media streams.
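One existing RTCP mechanism relevant to the signaling described above is the SDES CNAME item, which lets a receiver relate SSRCs that originate from the same sender endpoint. A minimal SDES packet (RFC 3550, packet type 202) might be packed as in the sketch below; the SSRC and CNAME values used with it are caller-supplied examples:

```python
import struct


def build_rtcp_sdes(ssrc, cname):
    """Pack a minimal RTCP SDES packet (RFC 3550, PT=202) carrying one
    chunk with a single CNAME item. A receiver can use a shared CNAME
    to identify SSRCs that belong to the same sender endpoint."""
    name = cname.encode("ascii")
    # CNAME item: type=1, length, text; then the END item (a zero byte).
    items = struct.pack("!BB", 1, len(name)) + name + b"\x00"
    items += b"\x00" * (-len(items) % 4)       # pad chunk to 32-bit boundary
    chunk = struct.pack("!I", ssrc) + items
    # Common RTCP header: V=2, P=0, source count=1, PT=202,
    # length in 32-bit words minus one.
    length_words = (4 + len(chunk)) // 4 - 1
    header = struct.pack("!BBH", (2 << 6) | 1, 202, length_words)
    return header + chunk
```

In the simulcast setting, pairing such RTCP-delivered SSRC information with RTP-carried stream identifiers (as in blocks 1902-1904 above) lets the receiver group related streams.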
[0021] As used herein, the term media stream (or media) refers to a stream of data (e.g., video data stream and/or audio data stream) that is sent from one endpoint device (such as a microphone for audio data stream and/or a video camera for video data stream). The term endpoint device (also referred to as an endpoint) refers to a communication device that handles media by originating one or more media streams (e.g., originating audio and/or video streams using a microphone and/or video camera) and/or terminating one or more media streams (e.g., generating audio and/or video data stream output) received from one or more other endpoint devices. Moreover, each endpoint device of a RTP session may be both a sender endpoint device generating one or more media data streams for communication to other endpoint devices (acting as receiver endpoint devices), and a receiver endpoint device receiving media data streams as input. By way of example, an RTP Mixer may be considered as an endpoint.
[0022] Figure 1 is a block diagram of a communication system that is configured according to some embodiments. The communication system includes a plurality of endpoint devices 111-1 to 111-n that are communicatively connected through one or more networks 101 (e.g., public networks, such as the Internet, and/or private networks). One or more of the endpoint devices may operate as a sender that simultaneously communicates a plurality of media streams (such as audio data and/or video data) toward a receiver endpoint device participating in a streaming communication session (such as a video conferencing session) through network 101 (e.g., the Internet) according to some embodiments. While at least five endpoint devices 111 are shown in Figure 1 by way of example, embodiments of the present invention may be implemented using any number of two or more endpoint devices.
[0023] One or more RTP sessions are established between two or more of endpoint devices 111. Each endpoint device 111 included in the RTP session(s) may act as a sender endpoint device to generate a plurality of media streams that can be communicated directly to a receiver endpoint device or may be communicated to a central node 112 (e.g., a RTP mixer node) for possible forwarding to the receiver endpoint device. Each endpoint device 111 may also act as a receiver endpoint device to receive a plurality of media streams.
[0024] As will be explained further below, the central node 112 may select among a plurality of received media streams for forwarding to a receiver endpoint device and/or it may combine or otherwise manipulate (e.g., perform transcoding between defined data encoding formats) one or more received media streams before forwarding to the receiver endpoint device. More particularly, the central node 112 may select a media stream to be sent to a receiver endpoint device 111 responsive to input from the receiver endpoint device 111. For example, each endpoint device 111 of a conference session may select a media stream or a plurality of media streams of the conference session to be presented at that endpoint device 111. Two endpoint devices in a peer to peer embodiment, for example, may each send and receive a plurality of media streams, and each of the endpoint devices may use functionality of
embodiments described herein to control which of the streams are received from the other endpoint device.
[0025] Figure 2 is a block diagram illustrating an endpoint device 111 of Figure 1 according to some embodiments. Endpoint device 111, for example, may include a processor 131 coupled to a display 121 (e.g., a liquid crystal display screen providing a video output) or display output, a user input interface 129 (e.g., including a keypad, a touch sensitive surface of display 121, etc.), a speaker 123 or speaker output, one or more video cameras 125 or video camera input(s), and one or more microphones 127 or microphone input(s). Inputs/outputs discussed above may be interfaces (e.g., couplings, jacks, etc.) for wired inputs/outputs and/or wireless interfaces (e.g., Bluetooth, WiFi, etc.). In addition, a network interface 133 may provide a data/communications coupling between processor 131 and network 101. The coupling between network interface 133 and network 101 may be provided over a wired coupling (e.g., using a digital subscriber line modem, a cable modem, etc.), over a wireless coupling (e.g., over a 3G/4G wireless network, over a WiFi link, etc.), or over a combination thereof. Endpoint device 111, for example, may be a smartphone, a tablet computer, a netbook computer, a laptop computer, a desktop computer, a video camera, a digital microphone, a hub that combines audio/video streams from a plurality of video cameras and/or digital microphones.
[0026] The processor 131 may include one or more data processing circuits, such as a general purpose and/or special purpose processor (e.g., microprocessor and/or digital signal processor), which may be collocated or distributed across a communication network. The processor 131 is configured to execute computer program instructions from memory device(s) (e.g., internal or external memory), described below as a computer readable medium, to perform at least some of the operations and methods described herein as being performed by an endpoint device in accordance with one or more embodiments of the present invention.
[0027] In one embodiment, the endpoint device 111 is used for video conferencing and is configured to receive audio/video streams from a plurality of external cameras and associated microphones that may be positioned around a room (e.g., for video conferencing) or positioned in a plurality of rooms (e.g., for security monitoring) and coupled to the processor 131 through video/microphone inputs 125/127. The processor 131 may encode data carried by the media streams, and may be configured to simultaneously output a plurality of different encoded types of a same input media content. For example, the processor 131 may receive a video or audio stream from a camera or microphone, and apply different spatial sampling and/or temporal sampling to the media stream to simultaneously output a plurality of different versions of the media stream. By further example, the processor 131 may output a full resolution video/audio stream and one or more reduced resolution video/audio streams for communication toward a receiver endpoint device 111. The processor 131 may, in another embodiment, output different bitrate streams using variable lossy coding, such as by controllably discarding different rates of bits from an input media stream to output different lossy quality versions (e.g., different levels of video coarseness) of the input media stream. A sender endpoint device 111 may thus simultaneously output a plurality of media streams that contain related media data (such as different spatial samplings, temporal samplings, and/or data encodings of a same media content stream).
[0028] Figure 4 is a flowchart of operations and methods that can be performed by a sender endpoint device 111-1 to simultaneously communicate a plurality of media streams toward a receiver endpoint device 111-2 through a session, in accordance with some embodiments which will be explained in further detail below.
Referring to Figure 4, the sender endpoint device 111-1 can be configured to communicate (block 400) a plurality of media streams, simultaneously in time, for a session (e.g., a RTP session) toward a receiver endpoint device 111-2. The sender endpoint device 111-1 also communicates (block 402) information toward the receiver endpoint device 111-2 that identifies which of the media streams contain related media data (content).
[0029] Accordingly, the receiver endpoint device 111-2 can use the information to identify which of the media streams contain related media data, and may further identify, using the information, differences between the media data (e.g., which media streams have been coded using which algorithms, which media streams have been sampled using which rates, which media streams have which video pixel/line resolutions, etc.).
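As a concrete illustration of the block-402 information, the sketch below shows one hypothetical shape of the "related streams" metadata a sender might communicate and how a receiver could use it. The field names, values, and function names are assumptions for illustration, not a wire format defined in the text.

```python
# Hypothetical, illustrative shape of the block-402 "related streams"
# information: several SSRCs carry different versions of one media source.
related_streams_info = {
    "source": "camera-1",
    "streams": [
        {"ssrc": 0x1111, "resolution": "1920x1080", "framerate": 30},
        {"ssrc": 0x2222, "resolution": "640x360", "framerate": 30},
        {"ssrc": 0x3333, "resolution": "1920x1080", "framerate": 15},
    ],
}

def related_ssrcs(info, source):
    """Receiver side: list the SSRCs carrying versions of a given source."""
    if info["source"] != source:
        return []
    return [s["ssrc"] for s in info["streams"]]
```

With such information, a receiver can both identify which streams are related and compare their differences (here, resolution and frame rate) before choosing which versions to request or render.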
[0030] Figure 3 is a block diagram illustrating the central node 112 of Figure 1 according to some embodiments. As shown in Figure 3, central node 112 may include a processor 231 and a network interface 233, with the network interface 233 providing a data/communications coupling between the processor 231 and the network 101.
[0031] The processor 231 may include one or more data processing circuits, such as a general purpose and/or special purpose processor (e.g., microprocessor and/or digital signal processor), which may be collocated or distributed across a communication network. The processor 231 is configured to execute computer program instructions from memory device(s) (e.g., internal or external memory), described below as a computer readable medium, to perform at least some of the operations and methods described herein as being performed by a central node (e.g., RTP mixer node) in accordance with one or more embodiments of the present invention.
[0032] Processor 231 may receive one or more media streams and associated information from each sender endpoint device 111-1, and may select among received media streams for forwarding to a receiver endpoint device 111-2 and/or it may combine or otherwise manipulate (e.g., perform transcoding between defined data encoding formats) one or more received media streams before forwarding to the receiver endpoint device 111-2.
[0033] The central node 112 may communicate to the receiver endpoint device 111-2 information identifying characteristics of each of the media streams to enable the receiver endpoint device to select one or more of the available media streams for communication from the central node 112 to the receiver endpoint device 111-2. The processor 231 may select among the received media streams for forwarding to the receiver endpoint device 111-2 and/or combine or otherwise manipulate one or more received media streams responsive to an instruction received from the receiver endpoint device 111-2.
[0034] These and other embodiments are explained in further detail below.
1.1 Multiple Streams
[0035] RTP sessions are a fundamental part of a Synchronization Source identifier (SSRC) space. The SSRC uniquely identifies real-time media streams within a communication session. The SSRC space can encompass a number of network nodes and the interconnecting transport flows between these nodes. Each node may have zero, one, or more source identifiers (SSRCs), used either to source a real media source such as a camera or a microphone; to source a conceptual source, like the most active speaker selected by an RTP mixer that switches between incoming media streams based on the media streams or additional information; or simply as an identifier for a receiver that provides feedback and reports on reception. There are also RTP nodes, like translators, that manipulate data, transport, or session state without making their presence known to the other session participants.
[0036] RTP was designed with multiple participants in a session from the beginning. This was not restricted to multicast, but instead also includes unicast using either multiple transport flows below RTP or a network node that redistributes the RTP packets, either unchanged in the form of a transport translator (relay) or modified in an RTP mixer. In addition, a single endpoint device may have multiple media sources of the same media type, like cameras or microphones.
[0037] However, the most common use cases have been point-to-point Voice over IP (VoIP) or streaming applications, where there has commonly not been more than one media source per endpoint. Even in conferencing applications, especially voice-only ones, the conference focus or bridge has provided each participant with a single stream that is a mix of the other participants. Thus the relevant industry has perceived little need for handling multiple SSRCs in implementations. This has resulted in an installed legacy base that isn't fully compliant with the RTP specification and will exhibit different issues if it receives multiple SSRCs of media, either simultaneously or in sequence. These issues manifest themselves in various ways: software crashes, or simply limited functionality, like decoding and playing back only the first or latest SSRC received and discarding any other SSRCs.
[0038] Conventional signaling solutions around RTP, especially Session Description Protocol (SDP) based signaling, haven't considered the fundamental issues around an RTP session's theoretical support for more than four billion sources, all sending media. No endpoint has infinite processing resources to decode and mix an arbitrary number of media sources. In addition, the memory for storing related state, especially decoder state, is limited, and the network bandwidth available to receive multiple streams is also limited. Presently, the most likely limitations are processing and network bandwidth, although for some use cases memory or other limitations may exist. Thus, a given endpoint will have some limit on the number of streams it can simultaneously receive, decode, and play back.
[0039] Therefore, in accordance with some embodiments, an endpoint device can be configured to advertise its capabilities to other session participant(s), so that endpoint device limitations are exposed and can be compensated for by the other session participant(s) (e.g., a sender endpoint device).
[0040] According to some other embodiments, an endpoint device is configured to signal whether it intends to produce one or more media streams. In contrast, conventional SDP signaling is limited to communicating a directionality attribute which indicates whether an endpoint device intends to send media or not. No indication of how many media streams an endpoint device intends to send is communicated by conventional SDP signaling; this shortcoming is addressed by some embodiments of the present invention.
[0041] Accordingly, there exists a clear need to enable the usage of multiple simultaneous media streams within an RTP session in a way that allows a system to take legacy implementations into account, in addition to negotiating the actual capabilities around the multiple streams in an RTP session.
[0042] In addition to the above issues, other issues related to simultaneous communication of multiple streams are identified and solved in accordance with some further embodiments. Some such issues relate to obscurities in the RTP specification and shortcomings in various signaling mechanisms that are exposed by multi-stream use cases.
[0043] Although various embodiments are discussed in the context of simulcast, the invention is not limited thereto and may be used with any communication devices that simultaneously send and/or simultaneously receive a plurality of media streams.
1.2. Simulcast
[0044] Simulcast is the act of simultaneously sending multiple different versions of source media content. This can be done in several ways and for different purposes. Various example embodiments are described herein in the context of the case where an endpoint device 111 provides multiple different encodings towards a central node device 112 (intermediary device) so that the central node device 112 can select which version to forward to other endpoint device participants 111 in an RTP session. Various different ways of performing simulcast are further described below in Section 3, entitled "Simulcast Usage and Applicability."
[0045] The different versions of source media content can be simulcast by varying one or more of the following characteristics of the source media content:
1) Bit-rate: The primary difference is the amount of bits used to encode source media content into a media stream, and thus primarily affects the media Signal to Noise Ratio (SNR);
2) Codec: Different media codecs are used, for example, to ensure that different receivers that do not have a common set of decoders can decode at least one of the versions of the encoded source media content. This includes codec configuration options that aren't compatible, like video encoder profiles, or the capability of receiving the transport packetization; and
3) Sampling: Different sampling of the source media content, in the spatial domain and/or in the temporal domain, may be used to suit different rendering capabilities or needs at receiving endpoint devices, as well as being a method to achieve different bit-rates. For video streams, spatial sampling affects image resolution, and temporal sampling affects video frame rate. For audio, spatial sampling relates to the number of audio channels, and temporal sampling affects audio bandwidth. In another embodiment, different lossy quality versions of the input media stream may be generated by discarding a controllable rate of bits of the input media stream to provide, for example, different levels of video coarseness.
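The three characteristics above can be combined per version. The sketch below builds an illustrative set of descriptors for versions of one source that differ in bit-rate and in spatial/temporal sampling; the codec name and the concrete ladder values are assumptions, not taken from the text.

```python
from dataclasses import dataclass

@dataclass
class EncodingVersion:
    codec: str
    width: int
    height: int
    framerate: int
    bitrate_kbps: int

def simulcast_versions(width: int = 1920, height: int = 1080,
                       framerate: int = 30) -> list:
    """Illustrative set of simultaneously sent versions of one source,
    varying bit-rate and spatial/temporal sampling."""
    return [
        # Full resolution, full frame rate, highest bit-rate:
        EncodingVersion("H.264", width, height, framerate, 2500),
        # Spatial downsampling (quarter width/height), low bit-rate:
        EncodingVersion("H.264", width // 4, height // 4, framerate, 300),
        # Spatial plus temporal downsampling (halved frame rate):
        EncodingVersion("H.264", width // 2, height // 2, framerate // 2, 800),
    ]
```

A sender endpoint would encode the same capture once per descriptor and emit each result as its own media stream, which is the simulcast arrangement the central node selects among.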
[0046] Different applications (in one or more endpoint devices) can have different reasons for simulcasting a plurality of different versions of media content from a media source. The need for simulcasting different versions of a media stream can arise even when media codecs used by a sending/receiving endpoint device have scalability features that enable them to solve a set of coding variations.
[0047] Various solutions are explained below for non-trivial variants of simulcast. An analysis of different ways of multiplexing the different encodings is discussed in Section 6. Following the presentation of the alternatives, an analysis is explained in Section 7 for how different aspects like RTP mechanisms, signaling possibilities, and network features are affected by these alternatives.
[0048] A recommendation is then provided for which solution may be most suitable, and changes to RTP that can be utilized according to one or more embodiments are explained.
2. Definitions
2.1. Requirements Language
[0049] To the extent that terms such as "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" are used in the following disclosure, these terms may be interpreted in accordance with RFC (Request For Comments) 2119 by S. Bradner, entitled "Key words for use in RFCs to Indicate Requirement Levels" (BCP 14, RFC 2119, March 1997).
2.2. Terminology
[0050] In addition to definitions provided in the present disclosure, definitions which do not contradict these definitions herein may also be used from the documents listed in the
Normative References section below (RFC2119; RFC3550; RFC5234; RFC6222) and listed in the Information References section below (RFC2205; RFC2474; RFC3264; RFC4103; RFC4566; RFC4588; RFC5104; RFC5117; RFC5576; RFC5761; RFC5888).
[0051] As used herein, "encoding" refers to a media encoder (codec) that has been used to compress a media stream and/or the fidelity of encoding that has been used to encode a media stream through the choice of sampling, bit-rate, and/or other configuration parameters.
[0052] Moreover, the phrase "different encodings" refers to the use of one or more different parameters that characterize the encoding of a particular media source. Such changes can include, but are not limited to, one or more of the following parameters: codec; codec configuration; bit-rate; and/or sampling.
3. Simulcast Usage and Applicability
[0053] This section discusses different scenarios in which simulcast is used.
3.1. Simulcasting to RTP Mixer (Central Node 112)
[0054] Some embodiments are directed to a multi-party session that is communicated through one or more RTP Mixers 112 to facilitate the media transport between the session participants (endpoint devices 111-1...111-n). The RTP topology can include the RTP Mixer topology defined in [RFC5117] (Section 3.4: Topo-Mixer).
[0055] Simulcasting different media encodings of video that have both different resolution and bit-rate is highly applicable to video conferencing scenarios. For example, an RTP mixer 112 can be controlled to select the most active speaker (e.g., endpoint device 111-1) and send that participant's (e.g., selected) media stream as a high resolution stream to a receiver (e.g., endpoint device 111-2), and in addition can simultaneously send a number of small resolution video streams of any additional participants (e.g., non-selected, such as endpoint devices 111-3...111-n), thus enabling the receiver (e.g., endpoint device 111-2) to both see the current speaker (e.g., endpoint device 111-1) in high quality and monitor the other participants (e.g., endpoint devices 111-3...111-n). Thus, there can be several different combinations of high resolution and low resolution in use simultaneously, requiring both a high and a low resolution from a source at the same time.
[0056] To provide both high and low resolution from an RTP Mixer 112 there are at least these potential alternatives:
1) Simulcast: For example, the client endpoint device 111 sends one stream for the low resolution and another for the high resolution.
2) Scalable Video Coding: For example, the client endpoint device 111 uses a video encoder that can provide one media stream that is both providing the high resolution and enables the RTP mixer 112 to extract a lower bit-rate than the full stream version, for the low resolution.
3) Transcoding in the Mixer: For example, the client endpoint device 111 sends a high resolution stream to the RTP Mixer 112, which performs a transcoding to a lower resolution version of the video stream that is forwarded to other endpoint devices that need it.
[0057] The transcoding alternative may require that the RTP Mixer 112 has sufficient transcoding resources to produce the number of low resolution versions that are required. The worst case resource loading may correspond to when all participants' streams need transcoding. If the resources are not available, a different solution needs to be chosen. In accordance with some embodiments, the RTP mixer 112 can advertise its resource capabilities to the sender endpoint device 111 and/or to the receiver endpoint device 111. The sender endpoint device 111 and/or the receiver endpoint device 111 can then consider the resource capabilities of the RTP mixer 112 when configuring a session and/or one or more media streams that will be communicated through the RTP mixer 112.
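A session configurator that knows the mixer's advertised capabilities could pick among the three alternatives roughly as sketched below. The preference order, strategy names, and the notion of "transcode slots" are illustrative assumptions; the worst-case check reflects the case where every participant's stream would need transcoding.

```python
def choose_low_res_strategy(sender_simulcast: bool, sender_svc: bool,
                            mixer_transcode_slots: int,
                            participant_streams: int) -> str:
    """Pick one of the three alternatives for providing both high and low
    resolution versions from the mixer, given the mixer's advertised
    transcoding capacity (illustrative decision logic only)."""
    if sender_simulcast:
        # Sender can emit both resolutions itself; mixer stays codec-agnostic.
        return "simulcast"
    if sender_svc:
        # One scalable stream; mixer extracts a lower bit-rate layer.
        return "scalable video coding"
    if mixer_transcode_slots >= participant_streams:
        # Worst case: every participant's stream needs its own transcode.
        return "transcoding in the mixer"
    raise RuntimeError("insufficient mixer resources; renegotiate the session")
```

If none of the alternatives is feasible, the session would have to be renegotiated (e.g., with fewer streams or a lower quality target).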
[0058] The scalable video encoding alternative may necessitate a more complex encoder compared to non-scalable encoding. Also, if the resolution difference is sufficiently large, the scalable codec may be only marginally more bandwidth efficient, between the encoding client endpoint device 111 and the RTP mixer 112, than a simulcast that sends the resolutions in separate streams, assuming equivalent video quality. Furthermore, with scalable video encoding, the transmission of all but the lowest resolution will consume more bandwidth from the RTP mixer 112 to the other participant endpoint devices 111 than a non-scalable encoding, again assuming equivalent video quality.
[0059] Simulcasting has the benefit that it is conceptually simple. It enables use of any media codec that the participants 111/112 agree on, allowing the RTP mixer 112 to be codec-agnostic. Considering today's video encoders, it is less bit-rate efficient in the path from the sending client endpoint device 111 to the RTP mixer 112, but more efficient in the RTP mixer 112 to receiver path, compared to Scalable Video Coding.
3.2. Simulcasting to a Consuming Endpoint Device
[0060] Some other embodiments utilize an RTP Transport Translator (Section 3.3: Topo-Trn-Translator) [RFC5117], which may be included within the central node 112. The transport translator functions as a relay and transmits all the streams received from one participant endpoint device 111-1 to selected other participant endpoint devices 111-2...111-n. All receiver endpoint devices may be configured to receive all media stream versions of a same source media content through simulcast thereof. However, this approach increases the bit-rate consumed on the paths to the receiver endpoint devices. A benefit for the receiver client endpoint devices is reduced decoding complexity when there is a need to display only a low resolution version. Otherwise, a single-stream application which only transmits the high resolution stream would require the receiver endpoint device 111 to decode it and then scale it down to the needed resolution.
[0061] The usage of a transport translator and simulcast becomes efficient if each receiving client endpoint device 111 is configured to control the relay to indicate which version it wants to receive. However, such a usage of RTP has some potential issues with the Real-time Transport Control Protocol (RTCP). To the sending endpoint device 111 it will look like the transmitted stream isn't received by a receiver endpoint device 111 that is known to receive other streams from the sending endpoint device 111. Thus some consideration and mechanism is needed to support such a use case so that it doesn't break RTCP reception reporting.
3.3. Same Encoding to Multiple Destinations
[0062] One implementation of simulcast is where one encoding is sent to multiple receiver endpoint devices 111. This may be supported in RTP by copying all outgoing RTP and RTCP traffic to several transport destinations, as long as the intention is to create a common RTP session. As long as all participants do the same, a full mesh is constructed and everyone in the multi-party session has a similar view of the joint RTP session. This is similar to an Any Source Multicast (ASM) session but without the traffic optimization, as multiple copies of the same content are likely to have to pass over the same link.
3.4. Different Encoding to Independent Destinations
[0063] Another alternative implementation of simulcast is where multiple destination endpoint devices 111 each receive a specifically tailored version, but where the destination endpoint devices 111 are independent. A typical example for this would be a streaming server (e.g., source endpoint device 111) distributing the same live session to a number of receiver endpoint devices 111, while adapting the quality and resolution of the multi-media session to each receiver endpoint device's 111 capability and available bit-rate. In one approach, multiple independent RTP sessions are established between the sender endpoint device 111 and the receiver endpoint devices 111.
4. Multiple Streams Issues
[0064] Some further embodiments are now explained that use multiple media streams in an RTP session. Although multi-stream applications can in theory be built on RTP as-is, it can be advantageous to define extensions to RTP for further signaling, as explained below. Alternatively, existing RTP features can be further tasked to support an RTP session that contains more than two Synchronization Source identifiers (SSRCs).
4.1. Legacy behaviors
[0065] It is a common assumption among many applications using RTP that they don't have a need to support more than one incoming and one outgoing media stream per RTP session. For a number of applications this assumption has been correct. For VoIP and Streaming applications it has been easiest to ensure that a given endpoint only receives and/or sends a single stream.
[0066] Some RTP extension mechanisms have required RTP stacks to handle additional SSRCs, like SSRC-multiplexed RTP retransmission [RFC4588]. However, that still has required handling only a single media decoding chain.
[0067] However, there are applications that can benefit from receiving and using multiple media streams simultaneously. A very basic case would be T.140 conversational text, which is both low bandwidth and lacks a logical method for mixing multiple sources of text. An RTP session that contains more than two SSRCs actively sending media streams has the potential to confuse a legacy client in various ways:
1. The receiving client needs to be configured to handle receiving more than one stream simultaneously, rather than replacing the already existing stream with the new one.
2. The receiving client needs to be configured to decode multiple streams simultaneously.
3. The receiving client needs to be configured to render multiple streams simultaneously.
[0068] These applications may be based, at the signaling level, on existing single-media-stream applications. To avoid connecting two different implementations, one that is built to support multiple streams and one that isn't, it is important that the capabilities are signaled. It is also this legacy base that motivates a basic assumption in the solution: any endpoint device 111 that doesn't explicitly indicate a capability to receive multiple media streams is assumed, in accordance with one embodiment, to be capable of handling only a single media stream, to avoid affecting the legacy devices.
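The legacy-safety assumption above can be captured in one small helper; the function name and the use of None to mean "nothing advertised" are illustrative assumptions.

```python
from typing import Optional

def effective_stream_limit(advertised_limit: Optional[int]) -> int:
    """Apply the legacy-safety assumption: a peer that has not explicitly
    advertised a multi-stream receive capability (advertised_limit is None)
    is treated as able to handle only a single media stream."""
    return advertised_limit if advertised_limit is not None else 1
```

A sender would consult this value before activating additional simultaneous SSRCs toward a given peer.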
4.2. Receiver Limitations
[0069] An endpoint device 111/central node 112 that intends to process the media in an RTP session needs to have sufficient resources to receive and process all the incoming streams. It is extremely unlikely that any receiver endpoint device 111/central node 112 is capable of handling the theoretical upper limit of an RTP session of more than 4 billion media sources. Instead, one or more limitations will exist in the endpoint device's 111/central node's 112 resource capabilities to handle simultaneous media streams. These resource limitations can include, for example, memory, processing, network bandwidth, memory bandwidth, or rendering real estate.
[0070] Another resource limitation of a receiver endpoint device 111/central node 112 is how many simultaneous non-active sources it can receive/send. SSRCs that are not actively sending media may not result in significant resource consumption and, therefore, may not need to be limited.
[0071] A potential issue that is noted is where a limited set of simultaneously active sources varies within a larger set of session members. As each media decoding chain may contain state, it is important that this type of usage ensures that a receiver endpoint device 111/central node 112 can flush the decoding state for an inactive source and, if that source becomes active again, does not assume that this previous state exists.
[0072] In accordance with some embodiments, signaling is provided that allows a receiver endpoint device 111/central node 112 to indicate its upper limit in terms of capability to handle simultaneous media streams. There may not be an upper limit on the number of RTP session members. Applications may need to be configured to consider how they use codecs.
4.3. Transmission Declarations
[0073] In an RTP based system where an endpoint device 111 may either be legacy or have an explicit upper limit on the number of simultaneous streams, one will encounter situations where the endpoint device 111 will not receive all simultaneously active streams in the session. Instead, the endpoint device 111 or central nodes 112, like RTP mixers, will be configured to provide the endpoint device 111 with a selected set of streams based on various metrics, such as most active, most interesting, or user selected. In addition, the central node 112 may combine multiple media streams, using mixing or composition, into a new media stream to enable an endpoint device 111 to get sufficient source coverage in the session despite existing limitations.
[0074] For such a system to be able to correctly determine the need for central processing, the capabilities needed for such a central node 112, and the potential need for an endpoint device 111 to apply sender-side limitations, it is necessary for an endpoint device 111 to declare how many simultaneous streams it may send, thereby enabling negotiation of the number of streams an endpoint device 111 sends.
4.4. RTP and RTCP Issues
[0075] This section details a few RTP and Real-time Transport Control Protocol (RTCP) issues and related embodiments for supporting multiple streams.
4.4.1. Multiple Sender Reports in Compound
[0076] One potential interoperability issue is the inclusion of multiple Sender Report blocks in the same RTCP compound packet. One potential problem is that some RTCP receivers might not correctly handle such packets. There is also uncertainty under the RTP standards as to how an endpoint device should calculate the RTCP transmission intervals in such cases.
4.4.2. Cross reporting within an endpoint
[0077] When an endpoint device 111 has more than one SSRC and sends media using them, a question arises whether the different SSRCs need to report on each other despite being local, which may be needed for any external observer (e.g., another endpoint device) to determine that separate media streams are actually sent from the same endpoint device 111. Thus, by reporting on each other there are no holes in the connectivity matrix between all sending SSRCs and all known SSRCs.
4.4.3. Which SSRC is providing feedback
[0078] When an endpoint device 111 has multiple SSRCs and needs to send RTCP feedback messages, some consideration may be needed as to which SSRC is used as the source and whether that SSRC is used consistently.
4.5. SDP Signaling Issues
[0079] An existing issue with SDP is that the bandwidth parameters aren't specified to take asymmetric conditions into account. This becomes especially evident when using multiple streams in an RTP session. Such a use case can result in an endpoint receiving, for example, five streams of Full High Definition (HD) video while only sending one Standard Definition (SD) video stream, which can result in a 10:1 asymmetry in bit-rate.
[0080] If one uses the current SDP bandwidth parameters, then one likely needs to set the session bandwidth to the sum of the most consuming direction. This can mean that there is no way of negotiating an upper bound for the lower-bandwidth direction's media stream(s). In addition, an endpoint device 111 may conclude that it can't support the bit-rate despite actually being capable of receiving the media streams being sent.
[0081] In cases where there is Quality of Service (QoS), either by endpoint device reservation or provided by systems like IMS, the bandwidth requested based on the signaled value will not represent what is actually needed.
[0082] Asymmetry in itself also creates an issue, as RTCP bandwidth may be derived from the session bandwidth. It is important that all endpoint devices have a common view of what the RTCP bandwidth is. Otherwise, if the bandwidth values differ by more than a factor of 5, an endpoint device with the high bandwidth value may time out an endpoint device that has a low value, as the latter's minimum reporting interval can become more than 5 times longer than for the other nodes.
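The timeout risk described above can be illustrated with a simplified, deterministic model of the RFC 3550 reporting interval. The 5% RTCP share, the member count, and the average RTCP packet size below are illustrative assumptions, not values from this disclosure:

```python
def rtcp_interval(session_bw_bps, members=2, avg_rtcp_size=120):
    """Simplified deterministic RTCP reporting interval (RFC 3550 style):
    RTCP gets 5% of the session bandwidth, shared among all members."""
    rtcp_bw = 0.05 * session_bw_bps / 8.0  # RTCP share in bytes per second
    return members * avg_rtcp_size / rtcp_bw

# Two endpoints whose views of the session bandwidth differ by more than 5x:
low = rtcp_interval(500_000)     # endpoint assuming 500 kbps
high = rtcp_interval(4_000_000)  # endpoint assuming 4 Mbps

# An endpoint times out a peer after 5 of its own reporting intervals;
# the low-bandwidth endpoint's interval exceeds that window.
timeout_window = 5 * high
print(low > timeout_window)  # True: the slow reporter would be timed out
```

With an 8:1 difference in assumed session bandwidth, the low-bandwidth endpoint's interval (about 77 ms in this sketch) exceeds the high-bandwidth endpoint's five-interval timeout window (48 ms), so the latter would wrongly declare the former inactive.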
5. Multi-Stream Extensions
5.1. Signaling Support for Multi-Stream
[0083] There is a need to signal between RTP sender and receiver endpoint devices how many simultaneous RTP streams can be handled. The number of RTP streams that can be sent from a client does not have to match the number of streams that can be received by the same client. A multi-stream capable RTP sender endpoint device, in accordance with some
embodiments, is able to adapt the number of streams that it will output responsive to the known capabilities of the RTP receiver endpoint device.
[0084] For this purpose and for use in SDP, two new media-level SDP attributes are defined, max-send-ssrc and max-recv-ssrc, which can be used independently to establish a limit to the number of simultaneously active SSRCs for the send and receive directions, respectively. Active SSRCs are the ones counted as senders according to RFC3550, i.e. they have sent RTP packets during the last two regular RTCP reporting intervals.
[0085] The syntax for the attributes is given in ABNF [RFC5234]:
max-ssrc = ("max-send-ssrc:" / "max-recv-ssrc:") PT 1*WSP limit
PT = "*" / 1*3DIGIT
limit = 1*8DIGIT ; WSP and DIGIT defined in [RFC5234]
[0086] A payload-agnostic upper limit to the total number of simultaneous SSRCs that can be sent or received in this RTP session is signaled with a * payload type. A value of 0 may be used as the maximum number of SSRCs, but it is then recommended that this is also reflected using the sendonly or recvonly attribute. There should also be at most one payload-agnostic limit specified in each direction.
[0087] A payload-specific upper limit to the total number of simultaneous SSRC in the
RTP session with that specific payload type is signaled with a defined payload type (static, or dynamic through rtpmap). Multiple lines with max-send-ssrc or max-recv-ssrc attributes specifying a single payload type may be used, each line providing a limitation for that specific payload type. Payload types that are not defined in the media block should be ignored.
[0088] If a payload-agnostic limit is present in combination with one or more payload-specific ones, the total number of payload-specific SSRCs is additionally limited by the payload-agnostic number. When there are multiple lines with payload-specific limits, the sender or receiver endpoint devices 111 should be able to handle any combination of the SSRCs with different payload types that fulfills all of the payload-specific limitations, with a total number of SSRCs up to the payload-agnostic limit.
[0089] For example, assume the payload-agnostic limit is 5, the limit for payload type 96 is 2, the limit for payload type 97 is 3, and there exists an additional payload type 98. The limitation is then that one can have 0-2 SSRCs with payload type 96, 0-3 with payload type 97, and 0-5 with payload type 98, and the total number of SSRCs simultaneously in use must always be no more than 5. Possible combinations include 2 with PT=96 and 3 with PT=97, or 1 with PT=96, 2 with PT=97, and 2 with PT=98.
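The combination rule above can be sketched as a small check. The helper below is illustrative only and not part of the signaled attributes:

```python
def allowed(combo, agnostic_limit, pt_limits):
    """Return True if a combination of simultaneous SSRCs satisfies both the
    payload-specific limits and the payload-agnostic limit.
    combo maps payload type -> number of simultaneous SSRCs using it."""
    if sum(combo.values()) > agnostic_limit:
        return False
    return all(n <= pt_limits.get(pt, agnostic_limit) for pt, n in combo.items())

# The example from the text: agnostic limit 5, PT 96 limited to 2, PT 97 to 3,
# PT 98 bounded only by the agnostic limit.
limits = {96: 2, 97: 3}
print(allowed({96: 2, 97: 3}, 5, limits))         # True
print(allowed({96: 1, 97: 2, 98: 2}, 5, limits))  # True
print(allowed({96: 3, 97: 1}, 5, limits))         # False: PT 96 exceeds its limit of 2
print(allowed({98: 6}, 5, limits))                # False: total exceeds 5
```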
[0090] When max-send-ssrc or max-recv-ssrc are not included in the SDP, it can be interpreted by an endpoint device as equivalent to a limit of one, unless sendonly or recvonly attributes are specified, in which case the limit is implicitly zero for the corresponding unused direction.
5.1.1. Declarative Use
[0091] When used as a declarative media description, the specified limit in max-send-ssrc indicates the maximum number of simultaneous streams of the specified payload types that the configured endpoint device 111 may send at any single point in time. Similarly, max-recv-ssrc indicates the maximum number of simultaneous streams of the specified payload types that may be sent to the configured endpoint device 111. Payload-agnostic limits can be used with or without additional payload-specific limits.
5.1.2. Use in Offer/Answer
[0092] When used in an offer, the specified limits indicate the agent endpoint device's 111 intent of sending and/or capability of receiving that number of simultaneous SSRCs. The answerer can decrease the offered limit in the answer to suit the answering client endpoint device's 111 capability. A sender endpoint device 111 can respond by not sending more simultaneous streams of the specified payload type than the receiver endpoint device 111 has indicated ability to receive, taking into account also any payload-agnostic limit.
[0093] If an answer fails to include any of the limitation attributes, the agent endpoint device 111 is then known to only be capable of supporting a single stream in the direction for which attributes are missing. If the offer lacks attributes, it must be assumed that the offerer only supports a single stream in each direction.
5.1.3. Examples
[0094] The SDP examples below are provided for illustration of operations and methods, and only relevant parts have been included.
m=video 49200 RTP/AVP 99
a=rtpmap:99 H264/90000
a=max-send-ssrc:* 2
a=max-recv-ssrc:* 4
[0095] An offer with a stated intention of sending 2 simultaneous SSRCs and a capability to receive 4 simultaneous SSRCs.
m=video 50324 RTP/AVP 96 97
a=rtpmap:96 H264/90000
a=rtpmap:97 H263-2000/90000
a=max-recv-ssrc:96 2
a=max-recv-ssrc:97 5
a=max-recv-ssrc:* 5
[0096] An offer to receive at most 5 SSRCs, at most 2 of which may use payload type 96, with the rest using payload type 97. By not including "max-send-ssrc", the value is implicitly set to 1.
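A hypothetical parser for these attribute lines can make the examples concrete. The attribute names are from this disclosure; the helper, its regular expression, and the default-of-one handling are illustrative assumptions:

```python
import re

# Matches "a=max-send-ssrc:<PT> <limit>" / "a=max-recv-ssrc:<PT> <limit>".
ATTR = re.compile(r"a=max-(send|recv)-ssrc:(\*|\d{1,3})\s+(\d{1,8})$")

def ssrc_limits(sdp_lines):
    """Collect per-direction SSRC limits; the payload-agnostic limit
    defaults to one when no attribute is present for a direction."""
    limits = {"send": {}, "recv": {}}
    for line in sdp_lines:
        m = ATTR.match(line.strip())
        if m:
            direction, pt, limit = m.groups()
            key = "*" if pt == "*" else int(pt)
            limits[direction][key] = int(limit)
    for direction in ("send", "recv"):
        limits[direction].setdefault("*", 1)
    return limits

offer = [
    "m=video 50324 RTP/AVP 96 97",
    "a=rtpmap:96 H264/90000",
    "a=rtpmap:97 H263-2000/90000",
    "a=max-recv-ssrc:96 2",
    "a=max-recv-ssrc:97 5",
    "a=max-recv-ssrc:* 5",
]
print(ssrc_limits(offer))
```

For the offer above this yields recv limits of 2 for PT 96, 5 for PT 97, and 5 payload-agnostic, with the send direction falling back to the implicit limit of one.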
5.2. Asymmetric SDP Bandwidth Modifiers
[0097] To resolve the issues around bandwidth, a new SDP bandwidth modifier (SDP bandwidth attribute) can be used that supports directionality, the possibility of payload-specific values, and clear semantics. A common problem for current SDP bandwidth modifiers is that they use a single bandwidth value without a clear specification. Uncertainty in how the bandwidth value is derived creates uncertainty about how bursty a media source can be.
5.2.1. Design Criteria
[0098] The current b= SDP bandwidth syntax is very limited and only allows the following format:
bandwidth-fields = *(%x62 "=" bwtype ":" bandwidth CRLF)
bwtype = token
bandwidth = 1*DIGIT
[0099] Thus there is a need to specify a new SDP bandwidth attribute that can be communicated by an endpoint device to support syntax needed for more advanced functionality. The functionality that can be provided by the new bandwidth attribute can include the following:
[00100] 1) Directionality: The SDP bandwidth attribute can indicate different sets of attribute values depending on direction;
[00101] 2) Bandwidth semantics: The SDP bandwidth attribute can include a semantics identifier so that new semantics can be defined in the future for other needs. This part of the b= syntax has been a very successful design feature. There may be a need for both single-stream limitations and limitations for the aggregate of all streams in one direction;
[00102] 3) Payload specific: Different bandwidth values may be specified for different RTP payload types. Some codecs have different characteristics and, consequently, an endpoint device may want to limit a specific codec and payload configuration to a particular bandwidth. Especially when combined with codec negotiation, there is a need to express intentions and limitations on usage for that particular codec. In addition, payload-agnostic information is also needed; and/or
[00103] 4) Bandwidth specification method: To indicate what bit-rate values mean, an endpoint device can communicate token bucket parameters that indicate, for example, bucket depth and bucket fill rate. If single values are to be specified, a clear definition of how one derives that value must be specified, including averaging intervals etc.
5.2.2. Attribute Specification
[00104] A new SDP attribute ("a=") can be generated by an endpoint device, which can be represented as an "a=bw" attribute. This attribute is structured as follows, in accordance with some embodiments. After the attribute name there is a directionality parameter, a scope parameter, and a semantics parameter. The semantics provides an indication that is useful for interpreting the parameter values.
[00105] The attribute is designed so that multiple instances of the line will be necessary to express the various bandwidth related configurations that are desired.
[00106] To ensure that an endpoint device using SDP either in Offer/Answer or declaratively truly understands these extensions, a required-prefix indicator ("!") can be added prior to any scope or semantics parameter.
5.2.2.1. Attribute Definition
[00107] The ABNF [RFC5234] for this attribute is the following:
bw-attrib = "a=bw:" direction SP [req] scope SP [req] semantics ":" values
direction = "send" / "recv" / "sendrecv"
scope = payloadType / scope-ext
payloadType = "PT=" ("*" / PT-value-list)
PT-value-list = PT-value *(";" PT-value)
PT-value = 1*3DIGIT
req = "!"
semantics = "SMT" / "AMT" / semantics-ext
values = token-bucket / value-ext
token-bucket = "tb=" br-value ":" bs-value
br-value = 1*15DIGIT ; Bucket Rate
bs-value = 1*15DIGIT ; Bucket Size
semantics-ext = token ; As defined in RFC 4566
scope-ext = 1*VCHAR ; As defined in RFC 4566
value-ext = 0*(WSP / VCHAR)
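The grammar above can be exercised with a small parser sketch. The regular expression is an illustrative assumption, covering only the SMT/AMT token-bucket semantics defined here, and it treats the "PT=" scope keyword case-insensitively since the examples later in this disclosure write it as "pt=":

```python
import re

# Hypothetical parser for a=bw lines with token-bucket (SMT/AMT) semantics.
BW = re.compile(
    r"a=bw:(send|recv|sendrecv)\s+"
    r"(!?)PT=(\*|\d{1,3}(?:;\d{1,3})*)\s+"
    r"(!?)(SMT|AMT):tb=(\d{1,15}):(\d{1,15})$",
    re.IGNORECASE,
)

def parse_bw(line):
    m = BW.match(line.strip())
    if not m:
        return None
    direction, req_scope, pts, req_sem, sem, rate, size = m.groups()
    return {
        "direction": direction.lower(),
        "pts": "*" if pts == "*" else [int(p) for p in pts.split(";")],
        "semantics": sem.upper(),
        "bucket_rate": int(rate),  # token rate in bits per second
        "bucket_size": int(size),  # bucket size in bytes
    }

print(parse_bw("a=bw:recv pt=* AMT:tb=8000000:65535"))
```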
[00108] The a=bw attribute defines three possible directionalities:
[00109] 1) send: In the send direction for the SDP Offer/Answer agent (e.g., endpoint device) providing the SDP, or in case of declarative use, in relation to the device that is being configured by the SDP.
[00110] 2) recv: In the receiving direction for the SDP Offer/Answer agent (e.g., endpoint device) providing the SDP, or in case of declarative use, in relation to the device that is being configured by the SDP.
[00111] 3) sendrecv: The provided bandwidth values apply equally in the send and recv directions, i.e. the values configure the directions symmetrically.
[00112] The Scope indicates what is being configured by the bandwidth semantics of this attribute line. This parameter is extensible; initially, two different scopes based on payload type are defined:
Payload Type: The bandwidth configuration applies to one or more specific payload type values.
PT=*: Applies independently of which payload type is being used.
[00113] This specification defines two semantics which are related. The Stream Maximum Token bucket based value (SMT) and the Aggregate Maximum Token bucket based value (AMT). Both semantics represent the bandwidth consumption of the stream or the aggregate as a token bucket. The token bucket values are the Token bucket rate and the token bucket size and represented as two floating-point numbers.
[00114] The semantics are defined in more detail as follows:
[00115] SMT: The maximum intended or allowed bandwidth usage for each individual source (SSRC) in an RTP session as specified by a token bucket. The token bucket values are the token rate in bits per second and the bucket size in bytes. This semantics may be used either symmetrically or in a particular direction. It can be used either to express the maximum for a particular payload type or for any payload type (PT=*).
[00116] AMT: The maximum intended or allowed bandwidth usage for the sum of all sources (SSRC) in an RTP session according to the specified directionality, as specified by a token bucket. The token bucket values are the token rate in bits per second and the bucket size in bytes. Thus, when using the sendrecv directionality parameter, both send and receive streams are included in the generated aggregate. If the directionality is only send or recv, then only the streams present in that direction are included in the aggregate. It can be used either to express the maximum for a particular payload type or for any payload type (PT=*).
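The token-bucket semantics can be made concrete with a small conformance check. The function and the packet trace below are illustrative assumptions, using the 1.5 Mbps / 16384-byte SMT figures that appear in the example of Section 5.2.2.4:

```python
def conforms(packets, rate_bps, bucket_bytes):
    """Check a packet trace against a token bucket: tokens accrue at
    rate_bps/8 bytes per second, capped at bucket_bytes; each packet
    must find enough tokens. packets = [(time_seconds, size_bytes), ...]."""
    tokens, last = bucket_bytes, 0.0
    for t, size in packets:
        tokens = min(bucket_bytes, tokens + (t - last) * rate_bps / 8.0)
        last = t
        if size > tokens:
            return False
        tokens -= size
    return True

# 6000-byte frames at 30 fps (~1.44 Mbps) fit a 1.5 Mbps / 16384-byte bucket:
ok = [(i / 30, 6000) for i in range(30)]
print(conforms(ok, 1_500_000, 16384))              # True
# A single 20000-byte burst exceeds the bucket size and does not conform:
print(conforms([(0.0, 20000)], 1_500_000, 16384))  # False
```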
5.2.2.2. Offer/Answer Usage
[00117] The offer/answer negotiation is done for each bw attribute line individually, with the scope and semantics immutable. If an answerer would like to add additional bw configurations using other directionality, scope, and semantics combinations, it may add them.
[00118] An agent responding to an offer will need to consider the directionality and reverse it when responding for media streams using unicast. If the transport is multicast, the directionality is not affected.
[00119] For media stream offers over unicast with directionality send, the answerer will reverse the directionality and indicate its reception bandwidth capability, which may be lower or higher than what the sender has indicated as its intended maximum.
[00120] For media stream offers over unicast with directionality recv, which indicate an upper limit, the answerer (e.g., receiver endpoint device 111) will reverse the directionality and may only reduce the bandwidth when producing the answer, indicating the answerer's intended maximum.
5.2.2.3. Declarative Usage
[00121] In declarative usage, the SDP attribute is interpreted from the perspective of the endpoint device being configured by the particular SDP. An interpreter may ignore a=bw attribute lines that contain an unknown scope or semantics that does not start with the required ("!") prefix. If a "required" prefix is present at an unknown scope or semantics, the interpreter SHALL NOT use this SDP to configure the endpoint.
5.2.2.4. Example
[00122] Declarative example with stream asymmetry.
m=video 50324 RTP/AVP 96 97
a=rtpmap:96 H264/90000
a=rtpmap:97 H263-2000/90000
a=max-recv-ssrc:96 2
a=max-recv-ssrc:97 5
a=max-recv-ssrc:* 5
a=bw:send pt=* SMT:tb=1200000:16384
a=bw:recv pt=96 SMT:tb=1500000:16384
a=bw:recv pt=97 SMT:tb=2500000:16384
a=bw:recv pt=* AMT:tb=8000000:65535
[00123] In the above example, the outgoing single stream is limited to a bucket rate of 1.2 Mbps and a bucket size of 16384 bytes. The up to 5 incoming streams can in aggregate use a maximum bucket rate of 8 Mbps with a bucket size of 65535 bytes. However, the maximum rate of the individual streams depends on payload type. Payload type 96 (H.264) is limited to 1.5 Mbps with a bucket size of 16384 bytes, while payload type 97 (H.263) may use up to 2.5 Mbps with a bucket size of 16384 bytes.
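The interplay of the per-stream (SMT) and aggregate (AMT) limits in this example can be checked with simple arithmetic. The sketch below, using the values from the SDP above, shows that the aggregate limit is the binding constraint when all five allowed PT=97 streams run at their per-stream ceiling:

```python
# Per-stream SMT bucket rates (bps) and stream-count limits from the SDP above.
per_stream_rate = {96: 1_500_000, 97: 2_500_000}
max_streams = {96: 2, 97: 5}   # total simultaneous SSRCs also capped at 5
amt_rate = 8_000_000           # aggregate (AMT) bucket rate, recv direction

# Worst case: five PT=97 streams, each at its SMT ceiling.
worst = max_streams[97] * per_stream_rate[97]
print(worst)             # 12500000
print(worst > amt_rate)  # True: the AMT aggregate, not the SMTs, binds
```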
6. Simulcast Alternatives
[00124] Simulcast is the act of sending multiple alternative encodings of the same underlying media source. When transmitting multiple independent flows that originate from the same source, it could potentially be done in several different ways in RTP. The below subsections describe potential ways of achieving flow de-multiplexing and identification of which streams are alternative encodings of the same source.
[00125] In the below descriptions, we also include how this interacts with multiple sources (SSRCs) in the same RTP session for other reasons than simulcast. Multiple SSRCs may occur for various reasons, such as multiple participant endpoint devices in multipoint topologies such as multicast, transport relays or full mesh transport simulcasting; multiple source devices, such as multiple cameras or microphones at one endpoint; or RTP mechanisms in use, such as RTP Retransmission [RFC4588].
6.1. Payload Type Multiplexing
[00126] Payload type multiplexing uses only the RTP payload type to identify the different alternatives. Thus all streams would be sent in the same RTP session using only a single SSRC per actual media source. So when having multiple SSRCs, each SSRC would be a unique media source or an RTP mechanism-related SSRC. Each RTP payload type would then need to indicate both the particular encoding and its configuration, in addition to being a stream identifier. When considering mechanisms like RTP retransmission using SSRC multiplexing, an SSRC may either be a media source with multiple encodings as provided by the payload type, or a retransmission packet as identified also by the payload type.
[00127] As some encoders, such as video encoders, produce large payloads, one cannot expect that multiple payload encodings can fit in the same RTP packet payload. Instead, a sender endpoint device 111 of a payload type multiplexed simulcast will need to send multiple different packets, with one version in each packet or sequence of packets.
6.2. SSRC Multiplexing
[00128] The SSRC multiplexing idea is based on using a unique SSRC for each alternative encoding of one actual media source within the same RTP session. The identification of which flows are considered to be alternatives needs an additional mechanism, for example using SSRC grouping [RFC5576] with a semantics that indicates them as alternatives. When one has multiple actual media sources in a session, each media source will use a number of SSRCs to represent the different alternatives it produces. For example, if all actual media sources are similar and produce the same number of simulcast versions, one will have n*m SSRCs in use in the RTP session, where n is the number of actual media sources and m the number of simulcast versions they can produce. Each SSRC can use any of the configured payload types for this RTP session.
6.3. Session Multiplexing
[00129] Session multiplexing means that each different version of an actual media source is transmitted in a different RTP session, using the session identifier to de-multiplex the different versions. This solution can then use the same SSRC in all the different sessions to indicate that they are alternatives, or it can use explicit session grouping [RFC5888] with a semantics that indicates them as alternatives (preferably with the same semantics identifier as in Section 6.2 above). When there are multiple actual media sources in use, the SSRC representing a particular source will be present in the sessions for which it produces a simulcast version. Each RTP session will have its own set of configured RTP payload types, where each SSRC in that session can use any of the configured ones.
7. Simulcast Evaluation
[00130] The below sub-sections describe an evaluation of various multiplexing strategies described herein.
7.1. Effects on RTP/RTCP
[00131] This section will be oriented around the different multiplexing mechanisms.
7.1.1. Payload Type Multiplexing
[00132] The simulcast solution should ensure that any negative impact on RTP/RTCP is minimal and that all the features of RTP/RTCP and its extensions can be used.
[00133] Payload type multiplexing for purposes like simulcast has well-known negative effects on RTP. The basic issue is that all the different versions are being sent on the same SSRC, thus using the same timestamp and sequence number space. This has many effects:
[00134] 1. Putting constraints between media encoding versions. For example, media encodings that use different RTP timestamp rates cannot be combined, as the timestamp values need to be the same across all versions of the same media frame. Thus they are forced to use the same rate. When this is not possible, payload type multiplexing cannot be used.
[00135] 2. Most RTP payload formats that may fragment a media object over multiple packets, like parts of a video frame, need to determine the order of the fragments to correctly decode them. Thus it is important that an endpoint device 111 operates to ensure that all fragments related to a frame or a similar media object are transmitted in sequence and without interruptions within the object. This can be solved relatively simply by ensuring that each version is sent in sequence.
[00136] 3. Some media formats require an uninterrupted sequence number space between media parts. These are media formats where any missing RTP sequence number will result in decoding failure or invocation of a repair mechanism within a single media context. The text/T140 payload format [RFC4103] is an example of such a format. These formats may not be possible to simulcast using payload type multiplexing.
[00137] 4. Sending multiple versions in the same sequence number space makes it more difficult to determine which version a packet loss may relate to. If an endpoint device uses RTP Retransmission [RFC4588], it can ask for the missing packet. However, if the missing packet(s) do not belong to the version one is interested in, the retransmission request was in fact unnecessary.
[00138] 5. The current RTCP feedback mechanisms are built around providing feedback on media streams based on stream ID (SSRC), packets (sequence number), and time interval (RTP timestamps). There is almost never a field for indicating which payload type one is reporting on. Thus giving version-specific feedback is difficult.
[00139] 6. The current RTCP media control messages [RFC5104] are oriented around controlling particular media flows, i.e. requests are done on the RTCP level. Thus such mechanisms need to be redefined to support payload type multiplexing.
[00140] 7. The number of payload types is inherently limited. Accordingly, using payload type multiplexing limits the number of simulcast streams and does not scale.
7.1.2. SSRC Multiplexing
[00141] As each version of the source has its own SSRC and thus explicitly unique flows, the negative effects above (Section 7.1.1) are not present for SSRC multiplexed simulcast.
[00142] The SSRC multiplexing of simulcast versions may require a receiver endpoint device to know that it is expected to decode only one of the versions and need not decode all of them simultaneously. This is currently missing functionality, as SDES CNAME cannot be used. The same CNAME has to be used for all flows connected to the same endpoint and location. A clear example of this could be a video conference where an endpoint has 3 video cameras plus an audio mix being captured in the same room. As the media has a common timeline, it is important to be able to indicate that through the CNAME. Thus an endpoint device cannot use CNAME to indicate that multiple SSRCs with the same CNAME are different versions of the same source.
[00143] When a sender endpoint device 111-1 has all the versions in the same RTP session going to an RTP mixer 112 and the mixer 112 chooses to switch from forwarding one of the versions to forwarding another version, this creates uncertainty as to which SSRC one should use in the CSRC field (if used). As the sender endpoint device 111-1 is still delivering the same original source, such a switch appears questionable to a receiver endpoint device 111-2 that has not enabled simulcast in the direction to itself. Depending on what solution one chooses, one gets different effects here. If the CSRC is changed, then any message ensuring binding will need to be forwarded by the mixer 112, creating legacy issues. It has not been determined if there are downsides to not showing such a switch.
[00144] The impact of SSRC collisions on SSRC multiplexing will be highly dependent on what method is used to bind the SSRCs that provide different versions. Upon a collision and a forced change of the SSRC, a sender endpoint device 111-1 will need to re-establish the binding to the other versions. By doing that, it will also likely be explicit about what the change was.
7.1.3. Session Multiplexing
[00145] Session multiplexing also does not have any of the negative effects that payload type multiplexing has (Section 7.1.1). As each flow is uniquely identified by RTP session and SSRC, one can control and report on each flow explicitly.
[00146] One potential downside of session multiplexing is that it can become impossible, without defining new RTCP message types, to do truly synchronized media requests where one request goes to version A of a source and another to version B of the same source. Due to the RTP session separation, an endpoint device will be forced to send different RTCP packets to the different RTP session contexts, thus losing the ability to send two different RTCP packets in the same compound packet and RTP session context. This can be a minor inconvenience.
[00147] Using the same SSRC in all the RTP sessions allows for quick binding between the different versions. It also enables an RTP mixer 112 that forwards one version to seamlessly decide to forward another version in an RTP session to a session participant endpoint device 111 that is not using simulcast in the direction from the RTP mixer to the participant endpoint device 111.
[00148] An SSRC collision forces a sender to change its SSRC in all sessions. Thus the collision-induced SSRC change may have a bigger impact, as it affects all versions rather than a single version. On the positive side, however, the binding between the versions will be immediate, rather than requiring additional signaling.
7.2. Signaling Impact
[00149] The method of multiplexing has significant impact on signaling functionality and how to perform it, especially if SDP [RFC4566] and SDP Offer/Answer [RFC3264] are used.
7.2.1. Negotiating the use of Simulcast
[00150] There will be a need for negotiating the usage of simulcast in general. For payload type multiplexing, an endpoint device will need to indicate that different RTP payload types are intended as different simulcast versions. The endpoint device likely uses standalone SDP attributes that indicate the relation between the payload types. As one needs unique payload type numbers for the different versions, this increases the number of payload types needed within an RTP session. In the worst case this may become a restriction, as only 128 payload types are possible. This limitation is exacerbated if an endpoint device uses solutions like RTP and RTCP multiplexing [RFC5761], where a number of payload types are blocked due to the overlap between RTP and RTCP.
[00151] SSRC multiplexing will likely use a standalone attribute to indicate the usage of simulcast. In addition, it may be possible to use a mechanism in SDP that binds the different SSRCs together. The first part is non-controversial. However, the second one has significant impact on the signaling load in sessions with dynamic session participation. As each new participant endpoint device joins a multiparty session, the existing participant endpoint devices that need to know the binding will need to receive an updated list of bindings. If that is done in SIP and SDP offer/answer, a SIP re-Invite is required for each such transaction, thus invoking all the SIP nodes related to invites, and in systems like IMS also a number of policy nodes. If a receiver endpoint device is required, which is likely, to receive the SSRC bindings prior to being able to decode any new source, then the signaling channel may introduce additional delay before the receiver endpoint device can decode the media.
[00152] Session multiplexing results in one media description per version. It will be necessary to indicate which RTP sessions are in fact simulcast versions, for example using a media grouping semantics specific for this purpose. Each of these sessions will be focused on the particular version it is intended to transport.
[00153] Legacy fallback also needs to be considered, i.e., the impact on an endpoint that is not simulcast enabled. For a payload type solution, a legacy endpoint that does not understand the indication that different RTP payload types are for different purposes may be slightly confused by the large number of possibly overlapping or identical RTP payload types.
[00154] For an SSRC multiplexed session, a legacy endpoint device will ignore the SSRC binding signaling and from its perspective this session will look like an ordinary session but setup to handle all the versions simultaneously.
[00155] For session multiplexing, a legacy endpoint device will not understand the grouping semantics. It might either understand the grouping framework, and thus determine that the sessions are grouped for some purpose, or not understand grouping at all, as the offer simply looks like several different media sessions.
7.2.2. Bandwidth Negotiation
[00156] The payload type multiplexed session cannot negotiate bandwidth for the individual versions without extensions. The regular SDP bandwidth attributes can only negotiate the overall bandwidth that all versions will consume. This makes it difficult to determine whether one should drop one or more versions due to lack of bandwidth between the peers. SSRC multiplexing suffers the same issues as payload type multiplexing, unless additional signaling (SSRC level attributes) is added.
[00157] Session multiplexing can negotiate bandwidth for each individual version and determine to exclude a particular version, with full knowledge of what it excludes to avoid consuming a certain amount of bandwidth.
7.2.3. Negotiation of Media Parameters
[00158] For the negotiation and setting of the media codec, e.g., the codec parameters and RTP payload parameters for payload type multiplexing, it is possible for each version to be individually negotiated and set, by communications between the sender endpoint device and receiver endpoint device, because each version has unique payload types. The same is true for session multiplexing, where each version is negotiated by setting the parameters in the context of each RTP session. An endpoint device can be configured to provide additional signaling for the SSRC multiplexed version to enable a binding between the payload types and the versions for which they are used. Otherwise, the RTP payload types are negotiated without any context of which version intends to use which payload type.
7.2.4. Negotiation of RTP/RTCP Extensions
[00159] When an endpoint device negotiates or configures RTP and RTCP extensions, this can be done either on session level or in direct relation to one or several RTP payload types. They are not negotiated in the context of an SSRC. Thus payload type multiplexing will need to negotiate any session level extensions for all the versions without version-specific consideration, unless extensions are deployed. It can also negotiate payload-specific extensions at a version-individual level. SSRC multiplexing cannot negotiate any extension related to a certain version without extensions. Session multiplexing will have full freedom to negotiate extensions for each version individually without any additional extensions.
7.3. Network Aspects
[00160] The multiplexing choice has an impact on network level mechanisms.
7.3.1. Quality of Service
[00161] When it comes to Quality of Service mechanisms (e.g., signaling between sender and receiver endpoint devices), they are either flow based or marking based. RSVP [RFC2205] is an example of a flow based mechanism, while Diff-Serv [RFC2474] is an example of a marking based one. If an endpoint device uses a marking based scheme, the choice of multiplexing will not affect the possibility to use QoS. However, if an endpoint device uses a flow based scheme, there is a clear difference between the multiplexing alternatives. Both payload type and SSRC multiplexing will result in all versions being part of the same 5-tuple (protocol, source address, destination address, source port, destination port), which is the most common selector for flow based QoS. Thus, separation of the level of QoS between versions is not possible. That is, however, possible if the endpoint device uses session based multiplexing, where each different version will be in a different RTP context and thus commonly be sent over a different 5-tuple.
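The 5-tuple distinction above can be sketched directly. The addresses and ports below are hypothetical; the point is only that payload type and SSRC multiplexing share one flow selector while session multiplexing yields one per version:

```python
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "proto src_addr src_port dst_addr dst_port")

# Payload-type or SSRC multiplexing: three versions over one transport flow.
ssrc_mux = [FiveTuple("UDP", "10.0.0.1", 49200, "10.0.0.2", 49300)] * 3

# Session multiplexing: one RTP session (port pair) per simulcast version.
session_mux = [
    FiveTuple("UDP", "10.0.0.1", 49200 + 2 * i, "10.0.0.2", 49300 + 2 * i)
    for i in range(3)
]

print(len(set(ssrc_mux)))     # 1: all versions share one flow selector
print(len(set(session_mux)))  # 3: each version is a distinct QoS-selectable flow
```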
7.3.2. NAT Traversal
[00162] Both payload type and SSRC multiplexing will have only one RTP session, not introducing any additional NAT traversal complexities compared to not using simulcast and having only a single version. Session multiplexing uses one RTP session per simulcast version, so additional NAT/FW pinholes will be required. As it is expected that sessions using simulcast will use multiple media, more than a single pair of pinholes will already be needed anyway. The additional pinholes will result in some extra delay in traversal mechanisms such as ICE; however, for mechanisms that perform explicit control, such as UPnP, NAT-PMP, or PCP, such requests are expected to be parallelizable. Establishing additional pinholes will result in a slightly higher risk of NAT/FW traversal failure.
[00163] As most simulcast solutions will in any case not use a very large number of versions, due to the cost in encoding resources and the like, one can question whether the extra pinholes are a significant cost. If the conclusion is that they are, a more generalized mechanism for multiplexing RTP sessions onto the same underlying transport flow should be considered.
7.4. Summary
[00164] It is quite clear from the analysis that payload type multiplexing is not a realistic option for simulcast. Both SSRC and session multiplexing are realistic to use. However, session multiplexing provides increased flexibility in usage, better support for network QoS, and greater signaling flexibility compared to SSRC multiplexing, without requiring additional extensions to be defined. Session multiplexing does, however, require additional NAT/FW pinholes to be opened.
[00165] Session multiplexing appears to be the best choice and is therefore recommended to be pursued as the single solution for simulcast.
8. Simulcast Extensions
[00166] This section discusses various extensions that either are required or could provide system performance gains if they were specified.
8.1. Signaling Support for Simulcast
[00167] To enable the use of simulcast with session multiplexing, some minimal signaling support is required to be provided by endpoint devices 111. That support is discussed in this section. First, there is a need for a mechanism performed by endpoint devices 111 to identify the RTP sessions carrying simulcast alternatives to each other. Second, a receiver endpoint device needs to be able to identify the SSRCs in the different sessions that belong to the same media source but in different encodings.
8.1.1. Grouping Simulcast RTP Sessions
[00168] The proposal is to define a new grouping semantics for the session grouping framework [RFC5888]. This semantics, using the tag "Simulcast ID" (SID), would both act as an indicator that session level simulcast is occurring and identify which sets of RTP sessions carry simulcast alternatives to each other.
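A sketch of what such a grouping line might look like, following the `a=group:<semantics> <tag> ...` pattern of the [RFC5888] framework. The "SID" tag comes from the text, but the exact attribute syntax and the mid values here are illustrative assumptions, not a normative definition:

```python
def make_sid_group(mids):
    """Build an SDP session-level group attribute tying together the media
    descriptions (identified by their mid values) that carry simulcast
    alternatives of each other."""
    return "a=group:SID " + " ".join(mids)

def parse_sid_group(line):
    """Return the list of mids in a SID group line, or None if the line is
    not a SID group attribute."""
    prefix = "a=group:SID "
    return line[len(prefix):].split() if line.startswith(prefix) else None

line = make_sid_group(["video-high", "video-low"])
print(line)                   # a=group:SID video-high video-low
print(parse_sid_group(line))  # ['video-high', 'video-low']
```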
8.1.2. Binding SSRCs Across RTP Sessions
[00169] When a sender endpoint device 111-1 performs simulcast, it will, for each actual media source, have one SSRC in each session for which it currently provides an encoding alternative in. As a receiver endpoint device 111-2 or a mixer 112 will receive one or more of these, it is important that any RTP session participant beyond the sender endpoint device 111-1 can explicitly identify which SSRCs in the set of RTP sessions providing a simulcast service that are the same media source. Two extensions to RTP are explained below for how to accomplish this in accordance with some embodiments.
8.1.2.1. SDES Item SRCNAME
[00170] Source Descriptions are an approach that should work with all RTP topologies (assuming that any intermediary node (e.g., central node 112) supports this item) and existing RTP extensions. Thus, we propose defining a new SDES item, called SRCNAME, which identifies a single media source, such as a camera, with a unique identifier. That way, even if multiple encodings or representations are produced, any endpoint device receiving the SDES information from a set of interlinked RTP sessions can determine which are the same source. The SRCNAMEs may commonly be per-session unique random identifiers generated according to "Guidelines for Choosing RTP Control Protocol (RTCP) Canonical Names (CNAMEs)" [RFC6222].
[00171] The SRCNAME's relation to the CNAME is the following. A CNAME represents an endpoint device 111 and a synchronization context. If the different representations should be played out synchronized, and without overlap when switching between them, then they need to be in the same synchronization context. Thus, in almost all cases, all SSRCs with the same SRCNAME will have the same CNAME. A given CNAME may also contain multiple sets of sources using different SRCNAMEs.
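The binding described above can be sketched as a simple grouping of SDES information. The field names and example values are assumptions for illustration: SSRCs sharing a SRCNAME originate from the same media source, and in almost all cases also share a CNAME, i.e., a synchronization context.

```python
from collections import defaultdict

def bind_sources(sdes_items):
    """Map (cname, srcname) -> list of SSRCs observed across a set of
    interlinked RTP sessions, so a receiver can tell which SSRCs carry
    different encodings of the same media source."""
    sources = defaultdict(list)
    for item in sdes_items:
        sources[(item["cname"], item["srcname"])].append(item["ssrc"])
    return dict(sources)

sdes = [
    {"ssrc": 0x1111, "cname": "user@host", "srcname": "cam-1"},  # high encoding
    {"ssrc": 0x2222, "cname": "user@host", "srcname": "cam-1"},  # low encoding
    {"ssrc": 0x3333, "cname": "user@host", "srcname": "cam-2"},  # second camera
]

print(bind_sources(sdes)[("user@host", "cam-1")])  # [4369, 8738]
```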
8.1.2.2. SDP SSRC Grouping Semantics for SRCNAME
[00172] Source-Specific Media Attributes in the Session Description Protocol (SDP) defines a way of declaring attributes for each SSRC in each session in SDP. With a new SDES item, one can use this framework to define how the SRCNAME can also be provided for each SSRC in each RTP session, thus enabling an endpoint device 111 to declare and learn the simulcast bindings ahead of receiving RTP/RTCP packets.
8.2. Mixer Requests of Client streams
[00173] To increase the efficiency of simulcast systems, it is highly desirable that an RTP middlebox can signal to the client encoding and transmitting the streams whether a particular stream is currently needed. This needs to be a quick, media plane oriented solution, as the need changes based on, for example, the user's speech activity or the user's selection in the user interface. Although several SIP and SDP based methods would be possible, the required responsiveness suggests using TMMBR from [RFC5104] with a bandwidth value of 0 to temporarily pause a certain SSRC, and re-establishing transmission through a TMMBR with a non-zero value.
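The pause/resume idea can be sketched as follows. The class and method names are assumptions for illustration; a real implementation would emit and parse RTCP TMMBR packets as defined in [RFC5104], but the control logic is the same: a cap of 0 pauses the SSRC, a non-zero cap resumes it.

```python
class SimulcastSender:
    """Minimal sender model that honors per-SSRC TMMBR bandwidth values."""

    def __init__(self, ssrcs):
        self.max_bitrate = {ssrc: None for ssrc in ssrcs}  # None = unconstrained

    def on_tmmbr(self, ssrc, bitrate_bps):
        """Record the bit rate cap requested by the middlebox for one SSRC."""
        self.max_bitrate[ssrc] = bitrate_bps

    def active_ssrcs(self):
        # A stream is paused exactly while its TMMBR-imposed cap is 0.
        return [s for s, b in self.max_bitrate.items() if b != 0]

sender = SimulcastSender([0xAAAA, 0xBBBB])
sender.on_tmmbr(0xBBBB, 0)          # mixer no longer needs the low version
print(sender.active_ssrcs())        # [43690]
sender.on_tmmbr(0xBBBB, 256_000)    # user selected it again: resume at 256 kbps
print(sender.active_ssrcs())        # [43690, 48059]
```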
8.3. Multiplexing Multiple RTP Sessions on Single Flow
[00174] It would be beneficial for RTP, in non-legacy cases, if multiple RTP sessions could be multiplexed in a standardized way on top of a single transport layer flow. That way, the cost of opening additional transport flows and the associated NAT/FW traversal would be avoided. This would, however, impact use cases that use a flow based QoS mechanism and need differentiated service levels between sessions.
[00175] Such a mechanism should thus be optional to use, but there is likely a general interest in such a mechanism.
9. Internet Assigned Numbers Authority (IANA) Considerations
[00176] Following the guidelines in [RFC4566], in [RFC5888], and in [RFC3550], the IANA is requested to register:
1. The SID grouping tag to be used with the grouping framework, as defined in Section 8.1.1
2. A new SDES Item named SRCNAME, as defined in Section 8.1.2.1
3. The max-send-ssrc and max-recv-ssrc SDP attributes as defined in Section 5.1
4. The bw attribute as defined in Section 5.2
5. The bw attribute scope registry rules
6. The bw attribute semantics registry rules
10. Security Considerations
[00177] There is minimal difference in security between the simulcast solutions. Session multiplexing may have some additional overhead in key management, but it also provides the flexibility to exclude certain users from certain versions by using session specific keys and not allowing all users access in the key management. This may, however, be of minimal benefit.
[00178] The multi-stream signaling has, like other SDP based signaling, issues with man-in-the-middle attackers that may modify the SDP as an attack on either the service in general or a particular endpoint. This can, as usual, be resolved by a security mechanism that provides integrity and source authentication between the signaling peers.
[00179] The SDES SRCNAMEs, being opaque identifiers, could potentially carry additional meanings or function as a covert channel. If the SRCNAMEs were permanent between sessions, they would have the potential to compromise the users' privacy, as the users could be tracked between sessions. See RFC6222 for more discussion.
11. Additional Sender Endpoint Device Operations and Methods for Identifying Related Media Streams
[00180] Figures 4-9 are flow charts that illustrate operations and methods that can be performed by sender endpoint devices to identify simultaneously communicated media streams as having related media data in accordance with some embodiments. Figures 11-21 are flow charts that illustrate corresponding operations and methods that can be performed by receiver endpoint devices to identify and use the related media streams in accordance with some embodiments.
[00181] As explained above, Figure 4 illustrates operations and methods that can be performed by a sender endpoint device 111-1 to communicate (block 400) a plurality of media streams, simultaneously in time, for a session (e.g., a RTP session) toward a receiver endpoint device 111-2. The sender endpoint device 111-1 also communicates (block 402) information toward the receiver endpoint device 111-2 that identifies which of the media streams contain related media data (content).
[00182] Referring to the further embodiments of Figure 5, the sender endpoint device 111-1 can communicate (block 500) the media streams as simulcast media streams. The sender endpoint device 111-1 can also communicate (block 502) information identifying which of the media streams contain different encoded versions of the same media content.
[00183] Referring to the further embodiment of Figure 6, the sender endpoint device 111-1 can communicate (block 600) the media streams as simulcast media streams. The sender endpoint device 111-1 can also communicate (block 602) information identifying which of the media streams contain different spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source.
[00184] Referring to the further embodiment of Figure 7, the sender endpoint device 111-1 can communicate (block 700) the media streams as simulcast media streams. The sender endpoint device 111-1 can also communicate (block 702) information identifying which of the media streams are from the sender endpoint device 111-1.
[00185] SSRCs and stream identifiers can be communicated to the receiver endpoint device 111-2 to identify groups of the media streams containing related media data across a plurality of RTP sessions. Referring to the embodiment of Figure 8, the sender endpoint device 111-1 can communicate (block 800) the media streams toward the receiver endpoint device 111-2 through a plurality of different RTP sessions. The sender endpoint device 111-1 also communicates (block 802) information, using RTCP, including a plurality of SSRCs that each uniquely identify a different one of the RTP sessions, and further including at least one stream identifier defined to uniquely identify each group of the media streams containing related media content.
[00186] Alternatively, the SSRCs can be communicated through RTCP and the stream identifiers can be communicated through RTP as part of the media streams. Referring to the embodiment of Figure 9, the sender endpoint device 111-1 can communicate (block 900) the media streams toward the receiver endpoint device 111-2 through a plurality of RTP sessions. The SSRCs are communicated (block 902) via RTCP toward the receiver endpoint device 111-2. Information including at least one stream identifier defined to uniquely identify each group of the media streams containing related media content is communicated (block 904), via RTP as part of the media streams, toward the receiver endpoint device 111-2.
[00187] Alternatively, the SSRCs and stream identifiers are communicated through SDP communications during setup of RTP sessions. Referring to the embodiment of Figure 10, the sender endpoint device 111-1 can communicate (block 1000) the media streams toward the receiver endpoint device 111-2 through a plurality of RTP sessions. Information can be communicated (block 1002) through SDP during setup of the RTP sessions, where the
information includes a plurality of SSRCs that each uniquely identify a different one of the RTP sessions, and further includes at least one stream identifier defined to uniquely identify each group of the media streams containing related media content.
12. Additional Receiver Endpoint Device Operations and Methods for Identifying Related Media Streams
[00188] A receiver endpoint device 111-2 can perform the corresponding operations and methods of Figure 11 to receive, identify, and use the related media streams. As explained above, the receiver endpoint device 111-2 may be a RTP mixer or other endpoint device. Referring to Figure 11, the receiver endpoint device 111-2 receives (block 1100) a plurality of media streams, simultaneously in time, from the sender endpoint device 111-1. Information is also received (block 1102) from the sender endpoint device 111-1 that identifies which of the media streams contain related media data. The receiver endpoint device 111-2 then selects (block 1104) among the media streams responsive to the received information.
[00189] In one embodiment, the receiver endpoint device 111-2 operates as a RTP mixer (central node) 112 that selectively forwards a media stream(s), such as described above in section 3.1. Referring to the embodiment of Figure 12, the RTP mixer 112 selects (block 1200) at least one of the media streams responsive to the received information, and forwards (block 1202) the selected at least one media stream to another receiver endpoint device 111-3. The RTP mixer 112 may select among received media streams for output to the receiver endpoint device 111-3 responsive to user input that is received, for example, from a Graphical User Interface associated with the receiver endpoint device 111-3. For example, when a user selects a video display window or deselects (e.g., hides) a video display window on a display device, a responsive message can be communicated to the RTP mixer 112 that causes the RTP mixer to start sending a media stream for a newly displayed video window, stop sending a media stream for a newly hidden video window, and/or perform other operations which select among the received media streams for forwarding to the receiver endpoint device.
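The forwarding decision described above can be sketched as follows. The class, method, and identifier names are assumptions for illustration: the mixer tracks which sources each receiver currently displays and forwards only the matching streams.

```python
class SelectiveForwarder:
    """Minimal model of an RTP mixer that forwards only the streams whose
    video windows a given receiver currently displays."""

    def __init__(self):
        self.visible = {}  # receiver id -> set of displayed source names

    def on_window_shown(self, receiver, source):
        self.visible.setdefault(receiver, set()).add(source)

    def on_window_hidden(self, receiver, source):
        self.visible.get(receiver, set()).discard(source)

    def streams_to_forward(self, receiver, available):
        """Select, among the received streams, those the receiver displays."""
        return [s for s in available if s in self.visible.get(receiver, set())]

fwd = SelectiveForwarder()
fwd.on_window_shown("recv-3", "alice-cam")
fwd.on_window_shown("recv-3", "bob-cam")
fwd.on_window_hidden("recv-3", "bob-cam")   # user hid Bob's window
print(fwd.streams_to_forward("recv-3", ["alice-cam", "bob-cam", "carol-cam"]))
# ['alice-cam']
```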
[00190] When performing the selection (block 1104 of Figure 11) among the media streams responsive to the received information, the receiver endpoint device 111-2 may perform the operations and methods of Figure 12. The receiver endpoint device 111-2 selects (block 1200) at least one of the media streams responsive to the received information, and forwards (block 1202) the selected at least one media stream to the other receiver endpoint device 111-3.
[00191] Alternatively, when performing the selection (block 1104 of Figure 11) among the media streams responsive to the received information, the receiver endpoint device 111-2 may perform the operations and methods of Figure 13. The sender endpoint device 111-1 may be configured to operate a scalable encoder to output a media stream, which may be sent as a single SSRC, to the receiver endpoint device 111-2. The receiver endpoint device 111-2 may, in one embodiment, control what scalable coding is used by the sender endpoint device 111-1 to output the media stream. In another embodiment, the receiver endpoint device 111-2 is configured to select the media stream from the sender endpoint device 111-1 responsive to the received information, and to perform (block 1300) scalable video coding and/or transcoding on the media stream to generate at least one media stream, and forward (block 1302) the generated at least one media stream to the other receiver endpoint device 111-3.
[00192] In some embodiments, the receiver endpoint device 111-2 operates as a RTP mixer (central node) 112 that combines media content from at least two of the media streams, such as described above in section 3.1. When performing the selection (block 1104 of Figure 11) among the media streams responsive to the received information, the receiver endpoint device 111-2 may perform the operations and methods of Figure 14. The receiver endpoint device 111-2 selects (block 1400) at least two of the media streams responsive to the received information. The selected at least two media streams are combined (block 1402) to generate a combined media stream. The receiver endpoint device 111-2 forwards (block 1404) the combined media stream to the other receiver endpoint device 111-3.
[00193] The receiver endpoint device 111-2 can be configured to perform the operations and methods of Figures 15-17, which correspond to the sender endpoint device operations and methods of Figures 4-6. Referring to the embodiment of Figure 15, the receiver endpoint device 111-2 receives (block 1500) the media streams as simulcast media streams from the sender endpoint device 111-1. The receiver endpoint device 111-2 uses the received information to identify (block 1502) which of the media streams contain different encoded types of the same media content.
[00194] Referring to the embodiment of Figure 16, the receiver endpoint device 111-2 receives (block 1600) the media streams as simulcast media streams from the sender endpoint device 111-1, and uses the received information to identify (block 1602) which of the media streams contain different spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source.
[00195] Referring to the embodiment of Figure 17, the receiver endpoint device 111-2 receives (block 1700) the media streams as simulcast media streams from the sender endpoint device 111-1, and uses the received information to identify (block 1702) which of the media streams are from the sender endpoint device.
[00196] The SSRCs and stream identifiers may be used by a receiver endpoint device to identify groups of the media streams containing related media data across a plurality of RTP sessions. Referring to the embodiment of Figure 18, a receiver endpoint device 111-2 simultaneously receives (block 1800) the media streams through a plurality of RTP sessions. The received information includes a plurality of SSRCs that are each used by the receiver endpoint device 111-2 to uniquely identify (block 1802) a different one of the RTP sessions. The received information further includes at least one stream identifier that is defined to enable the receiver endpoint device 111-2 to uniquely identify (block 1802) each group of the media streams containing related media content.
[00197] The SSRCs can be received through RTCP and the stream identifiers can be communicated through RTP as part of the media streams. Referring to the example embodiment of Figure 19, the receiver endpoint device 111-2 receives (block 1900) the media streams as simulcast media streams from the sender endpoint device 111-1. The receiver endpoint device 111-2 also receives (block 1902) information, using RTCP, that includes a plurality of SSRCs that are each used to uniquely identify a different one of the RTP sessions. The receiver endpoint device 111-2 also receives (block 1904) information, using RTP, that includes at least one stream identifier that is used to uniquely identify each group of the media streams containing related media content.
[00198] The SSRCs and stream identifiers can be received through SDP communications during setup of RTP sessions. Referring to the example embodiment of Figure 20, the receiver endpoint device 111-2 receives (block 2000) the media streams as simulcast media streams from the sender endpoint device 111-1. The receiver endpoint device 111-2 also receives (block 2002) information through SDP during setup of the RTP sessions, where the information includes a plurality of SSRCs that are each used to uniquely identify a different one of the RTP sessions, and further includes at least one stream identifier that is used to uniquely identify each group of the media streams containing related media content.
12. Additional Operations and Methods for the Endpoint Device of Figure 2
[00199] The endpoint device 111 of Figure 2 can be configured to perform the operations and methods of one or more of the embodiments of the sender endpoint device 111-1 of Figures 4-20. Accordingly, the network interface 133 of the sender endpoint device 111-1 can be configured to communicate over the network 101 with a receiver endpoint device 111-2. The processor 131 of the endpoint device 111-1 can be configured to simultaneously communicate a plurality of media streams through at least one RTP session toward the receiver endpoint device 111-2, and to communicate information toward the receiver endpoint device 111-2 that identifies which of the media streams contain related media data. [00200] In a further embodiment, the processor 131 of the sender endpoint device 111-1 is further configured to communicate the plurality of media streams as simulcast media streams toward the receiver endpoint device 111-2, and the communicated information identifies which of the media streams contain different encoded types of the same media content and/or identifies which of the media streams contain different spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source.
[00201] In a further embodiment, the processor 131 of the sender endpoint device 111-1 is further configured to communicate the plurality of media streams toward the receiver endpoint device 111-2 through a plurality of RTP sessions, and the information communicated toward the receiver endpoint device 111-2 includes a plurality of SSRCs that each uniquely identify a different one of the RTP sessions, and the information further includes at least one stream identifier defined to uniquely identify each group of the media streams containing related media content.
[00203] The endpoint device 111 of Figure 2 can alternatively or additionally be configured to perform the operations and methods of one or more of the embodiments of the receiver endpoint device 111-2 of Figures 4-20. Accordingly, the network interface 133 of a receiver endpoint device 111-2 can be configured to communicate over the network 101 with a sender endpoint device 111-1. The processor 131 of the receiver endpoint device 111-2 can be configured to simultaneously receive a plurality of media streams for at least one RTP session from the sender endpoint device 111-1, receive information from the sender endpoint device 111-1 that identifies which of the media streams contain related media data, and select among the media streams responsive to the received information.
[00204] In a further embodiment, the processor 131 of the receiver endpoint device 111-2 is further configured to receive the plurality of media streams as simulcast media streams from the sender endpoint device 111-1, and the received information identifies which of the media streams contain different encoded types of the same media content and/or identifies which of the media streams contain different spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source. [00205] In a further embodiment, the processor 131 of the receiver endpoint device 111-2 is further configured to receive the plurality of media streams through a plurality of RTP sessions, and the received information includes a plurality of Synchronization Source Identifiers, SSRCs, that each uniquely identify a different one of the RTP sessions, and the received information further includes at least one stream identifier defined to uniquely identify each group of the media streams containing related media content.
13. Additional Receiver Endpoint Device Operations and Methods for Advertising its Capabilities for Simultaneously Receiving a Plurality of Media Streams During Negotiations with a Sender Endpoint Device
[00206] A receiver endpoint device 111-2 can advertise its capabilities for simultaneously receiving a plurality of media streams. Referring to the embodiment of Figure 21, the receiver endpoint device 111-2 advertises (block 2100), to a sender endpoint device 111-1, capability information that defines a capability of the receiver endpoint device 111-2 to simultaneously receive media streams. The receiver endpoint device 111-2 receives (block 2102) the media streams at a same time (simultaneously) from the sender endpoint device 111-1 based on the advertised capability information. The simultaneous media streams may, in some embodiments, be sent through a same RTP session.
[00207] In the further embodiment of Figure 22, the receiver endpoint device 111-2 exchanges (block 2200) session negotiation information with the sender endpoint device 111-1 to set up the communication session prior to receiving the plurality of media streams. The receiver endpoint device 111-2 communicates the capability information to the sender endpoint device 111-1 as part of the session negotiation information.
[00208] In the further embodiment of Figure 23, the receiver endpoint device 111-2 communicates (block 2300) the capability information to the sender endpoint device 111-1 using SDP as part of re-negotiation communications with the sender endpoint device 111-1 to re-negotiate the session.
[00209] In the further embodiment of Figure 24, before receiving the media streams, the receiver endpoint device 111-2 waits (block 2400) for receipt of an acknowledgement message from the sender endpoint device 111-1 indicating agreement to constrain its communication of the media streams to the receiver endpoint device 111-2 according to the session negotiation information offered by the receiver endpoint device 111-2.
[00210] In the further embodiment of Figure 25, when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 advertises (block 2500) a maximum number of media streams that the receiver endpoint device 111-2 is presently capable of simultaneously receiving, and which is dependent upon the sender endpoint device 111-1 using a defined coding to encode data carried by the media streams.
[00211] In the further embodiment of Figure 26, when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 advertises (block 2600) a maximum combined bandwidth for all of the media streams and/or a maximum per-stream bandwidth that the receiver endpoint device 111-2 is presently capable of simultaneously receiving from the sender endpoint device 111-1. The maximum combined bandwidth for all of the media streams and/or the maximum per-stream bandwidth can be dependent upon the sender endpoint device 111-1 using a defined coding to encode data carried by the media streams.
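A sketch of how a sender might validate a proposed simulcast plan against such advertised limits. The capability names (max_recv_ssrc, max_total_bw, max_stream_bw) mirror the quantities described in the text but are illustrative, not the registered SDP attribute syntax:

```python
def plan_fits(caps, stream_bitrates_bps):
    """True if the number of streams and their bandwidths respect what the
    receiver advertised it can simultaneously receive."""
    if len(stream_bitrates_bps) > caps["max_recv_ssrc"]:
        return False                       # too many simultaneous streams
    if sum(stream_bitrates_bps) > caps["max_total_bw"]:
        return False                       # combined bandwidth cap exceeded
    return all(b <= caps["max_stream_bw"] for b in stream_bitrates_bps)

# Hypothetical advertised capabilities: up to 3 streams, 4 Mbps combined,
# 2.5 Mbps per stream.
caps = {"max_recv_ssrc": 3, "max_total_bw": 4_000_000, "max_stream_bw": 2_500_000}

print(plan_fits(caps, [2_000_000, 500_000]))   # True
print(plan_fits(caps, [3_000_000, 500_000]))   # False: per-stream cap exceeded
print(plan_fits(caps, [1_000_000] * 4))        # False: too many streams
```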
[00212] As shown in the embodiment of Figure 27, when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 can advertise (block 2700) which of a plurality of defined coding parameters that the sender endpoint device 111-1 should use to encode data carried by particular ones of the media streams to be communicated to the receiver endpoint device 111-2.
[00213] As shown in the embodiment of Figure 28, when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 can advertise (block 2800) a token rate and a bucket size for a token bucket algorithm that will be performed by the receiver endpoint device 111-2 to constrain a data rate of media streams that will be simultaneously received from the sender endpoint device 111-1.
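A minimal sketch of the token bucket constraint the receiver advertises: tokens accumulate at the advertised rate up to the advertised bucket size, and a burst of data conforms only while enough tokens remain. Parameter names and values are illustrative assumptions:

```python
class TokenBucket:
    """Token bucket with fill rate `rate` (bytes/second) and burst capacity
    `size` (bytes)."""

    def __init__(self, rate, size):
        self.rate = rate
        self.size = size
        self.tokens = size   # start with a full bucket
        self.last = 0.0

    def allow(self, nbytes, now):
        """Refill for the elapsed time, then consume if enough tokens remain."""
        self.tokens = min(self.size, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

bucket = TokenBucket(rate=125_000, size=10_000)  # ~1 Mbps, 10 kB burst
print(bucket.allow(8_000, now=0.0))  # True: within the initial burst
print(bucket.allow(8_000, now=0.0))  # False: only 2 000 tokens remain
print(bucket.allow(8_000, now=0.1))  # True: 0.1 s refills 12 500, capped at 10 000
```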
[00214] Referring to the embodiment of Figure 29, when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 can advertise (block 2900) a number of media streams containing related media data that can be simultaneously received by the receiver endpoint device 111-2.
[00215] Referring to the embodiment of Figure 30, when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 can advertise (block 3000) a number of media streams containing different encoded types of the same media content that can be simultaneously received.
[00216] Referring to the embodiment of Figure 31, when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 can advertise (block 3100) a number of media streams containing different spatial sampling versions, temporal sampling versions, and/or lossy quality versions of a same media source that can be simultaneously received.
[00217] As shown in the embodiment of Figure 32, when advertising (block 2100 of Figure 21) its capability information to the sender endpoint device 111-1, the receiver endpoint device 111-2 can advertise (block 3200) which of a plurality of different defined spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source that the sender endpoint device 111-1 should communicate as particular ones of the media streams to the receiver endpoint device 111-2.
14. Additional Sender Endpoint Device Operations and Methods for Advertising its Capabilities for Simultaneously Sending a Plurality of Media Streams During Negotiations with a Receiver Endpoint Device
[00218] A sender endpoint device 111-1 can advertise its capabilities for simultaneously sending a plurality of media streams. Referring to the embodiment of Figure 33, the sender endpoint device 111-1 advertises (block 3300), to a receiver endpoint device 111-2, capability information that defines a capability of the sender endpoint device 111-1 to simultaneously communicate a plurality of media streams. The sender endpoint device 111-1 also communicates (block 3302) the plurality of media streams at a same time toward the receiver endpoint device 111-2 based on the advertised capability information. The simultaneous media streams may, in some embodiments, be communicated through a same RTP session.
[00219] In the further embodiment of Figure 34, the sender endpoint device 111-1 exchanges (block 3400) session negotiation information with the receiver endpoint device 111-2 to set up the communication session prior to communicating the plurality of media streams. The sender endpoint device 111-1 communicates the capability information to the receiver endpoint device 111-2 as part of the session negotiation information.
[00220] In the further embodiment of Figure 35, the sender endpoint device 111-1 communicates (block 3500) the capability information to the receiver endpoint device 111-2 using SDP as part of re-negotiation communications with the receiver endpoint device 111-2 to renegotiate the session.
[00221] In the further embodiment of Figure 36, when advertising (block 3300) its capability information to the receiver endpoint device 111-2, the sender endpoint device 111-1 can advertise (block 3600) a maximum number of media streams that the sender endpoint device 111-1 is presently capable of simultaneously communicating to the receiver endpoint device 111-2. The sender endpoint device 111-1 may advertise (block 3600) that the maximum number of media streams is dependent upon the receiver endpoint device 111-2 being capable of using a defined coding to decode data carried by the media streams.
[00222] In the further embodiment of Figure 37, when advertising (block 3300) its capability information to the receiver endpoint device 111-2, the sender endpoint device 111-1 can advertise (block 3700) a maximum per-stream bandwidth that the sender endpoint device 111-1 is presently capable of simultaneously sending to the receiver endpoint device 111-2. The sender endpoint device 111-1 may advertise (block 3700) that the maximum per-stream bandwidth is dependent upon the receiver endpoint device 111-2 being capable of using a defined coding to decode data carried by the media streams.
[00223] Referring to the further embodiment of Figure 38, when advertising (block 3300) its capability information to the receiver endpoint device 111-2, the sender endpoint device 111-1 can advertise (block 3800) which of a plurality of defined coding parameters that the sender endpoint device 111-1 will use to encode data carried by particular ones of the media streams to be communicated to the receiver endpoint device 111-2.
[00224] Referring to the further embodiment of Figure 39, before communicating (block 3302) the plurality of media streams at a same time toward the receiver endpoint device 111-2, the sender endpoint device 111-1 can receive (block 3900) an advertised token rate and an advertised bucket size for a token bucket algorithm that will be performed by the receiver endpoint device 111-2 to receive the media streams. The sender endpoint device 111-1 can then constrain (block 3902) a data rate of the media streams that it simultaneously communicates to the receiver endpoint device 111-2 in response to the advertised token rate and the advertised bucket size.
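By way of non-limiting illustration, constraining the sender's aggregate data rate to a receiver-advertised token rate and bucket size (blocks 3900 and 3902) can be sketched as follows; this is an illustrative implementation, and the rate and size values are arbitrary examples.

```python
class TokenBucket:
    """Token bucket: tokens accrue at `rate` per second up to `size`;
    a packet of n bytes may be sent only if n tokens are available."""

    def __init__(self, rate: float, size: float):
        self.rate = rate          # advertised token rate (bytes/second)
        self.size = size          # advertised bucket size (bytes)
        self.tokens = size        # bucket starts full
        self.last = 0.0           # timestamp of last update (seconds)

    def try_send(self, nbytes: float, now: float) -> bool:
        # Accrue tokens for elapsed time, capped at the bucket size.
        self.tokens = min(self.size,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True           # packet conforms; sender may transmit
        return False              # packet would exceed the advertised rate


# Example values only: 1 Mbit/s token rate, 10 kB bucket.
bucket = TokenBucket(rate=125_000, size=10_000)
assert bucket.try_send(10_000, now=0.0)    # full burst allowed
assert not bucket.try_send(1, now=0.0)     # bucket now empty
assert bucket.try_send(1_000, now=0.01)    # 10 ms -> 1250 tokens accrued
```

Packets that fail the check are delayed or dropped by the sender so that the combined media streams never exceed what the receiver advertised it can absorb.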
[00225] Referring to the further embodiment of Figure 40, when advertising (block 3300) its capability information to the receiver endpoint device 111-2, the sender endpoint device 111-1 can advertise (block 4000) a number of media streams containing different spatial sampling versions, temporal sampling versions, and/or lossy quality versions of a same media source that are available to be simultaneously communicated to the receiver endpoint device 111-2.
[00226] In the further embodiment of Figure 41, when advertising (block 3300) its capability information to the receiver endpoint device 111-2, the sender endpoint device 111-1 can advertise (block 4100) which of a plurality of different defined spatial sampling versions, temporal sampling versions, and/or lossy quality versions of a same media source that are available for communication from the sender endpoint device 111-1 as particular ones of the media streams to the receiver endpoint device 111-2.
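By way of non-limiting illustration, advertising the number of simulcast versions of one media source (block 4000) and identifying particular versions for communication (block 4100) can be sketched as follows. The version identifiers, resolutions, and frame rates are hypothetical example values.

```python
# Hypothetical sketch: a sender enumerates the spatial, temporal,
# and/or lossy-quality versions of one media source available for
# simultaneous transmission, and the receiver selects particular ones.

versions = [
    {"id": "hi",  "spatial": (1280, 720), "fps": 30, "quality": "high"},
    {"id": "mid", "spatial": (640, 360),  "fps": 30, "quality": "high"},
    {"id": "lo",  "spatial": (320, 180),  "fps": 15, "quality": "low"},
]

def advertise_count(versions: list) -> int:
    # Block 4000: number of alternative versions of the same source
    # available to be communicated at the same time.
    return len(versions)

def select(versions: list, wanted_ids: set) -> list:
    # Block 4100: particular versions identified for communication.
    return [v for v in versions if v["id"] in wanted_ids]


assert advertise_count(versions) == 3
assert [v["id"] for v in select(versions, {"hi", "lo"})] == ["hi", "lo"]
```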
15. Abbreviations
[00227] Some of the abbreviations used herein are defined below:
FW - Firewall
IANA - Internet Assigned Numbers Authority
NAT - Network Address Translation
QoS - Quality of Service
RTP - Real-time Transport Protocol
RTCP - Real-time Transport Control Protocol
SDES - Source DEScription
SDP - Session Description Protocol
SID - Simulcast IDentifier
SSRC - Synchronization Source identifier
SRCNAME - Source Name
VoIP - Voice over IP
16. Further Definitions and Embodiments
[00228] When a node is referred to as being "connected", "coupled", "responsive", or variants thereof to another node, it can be directly connected, coupled, or responsive to the other node or intervening nodes may be present. In contrast, when a node is referred to as being "directly connected", "directly coupled", "directly responsive", or variants thereof to another node, there are no intervening nodes present. Like numbers refer to like nodes throughout. Furthermore, "coupled", "connected", "responsive", or variants thereof as used herein may include wirelessly coupled, connected, or responsive. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Well-known functions or constructions may not be described in detail for brevity and/or clarity. The term "and/or", abbreviated "/", includes any and all combinations of one or more of the associated listed items.
[00229] As used herein, the terms "comprise", "comprising", "comprises", "include", "including", "includes", "have", "has", "having", or variants thereof are open-ended, and include one or more stated features, integers, nodes, steps, components or functions but do not preclude the presence or addition of one or more other features, integers, nodes, steps, components, functions or groups thereof. Furthermore, as used herein, the common abbreviation "e.g.", which derives from the Latin phrase "exempli gratia," may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item. The common abbreviation "i.e.", which derives from the Latin phrase "id est," may be used to specify a particular item from a more general recitation.
[00230] Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).
[00231] These computer program instructions may also be stored in a tangible computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks.

[00232] A tangible, non-transitory computer-readable medium may include an electronic, magnetic, optical, electromagnetic, or semiconductor data storage system, apparatus, or device. More specific examples of the computer-readable medium would include the following: a portable computer diskette, a random access memory (RAM) circuit, a read-only memory (ROM) circuit, an erasable programmable read-only memory (EPROM or Flash memory) circuit, a portable compact disc read-only memory (CD-ROM), and a portable digital video disc read-only memory (DVD/Blu-ray).
[00233] The computer program instructions may also be loaded onto a computer and/or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer and/or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks. Accordingly, embodiments of the present invention may be embodied in hardware and/or in software (including firmware, resident software, microcode, etc.) that runs on a processor such as a digital signal processor, which may collectively be referred to as "circuitry," "a module" or variants thereof.
[00234] It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated. Moreover, although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.

[00235] Many different embodiments have been disclosed herein, in connection with the above description and the drawings. It will be understood that it would be unduly repetitious and obfuscating to literally describe and illustrate every combination and subcombination of these embodiments. Accordingly, the present specification, including the drawings, shall be construed to constitute a complete written description of various example combinations and
subcombinations of embodiments and of the manner and process of making and using them, and shall support claims to any such combination or subcombination.
[00236] Other network nodes, UEs, and/or methods according to embodiments of the invention will be or become apparent to one with skill in the art upon review of the present drawings and description. It is intended that all such additional network nodes, UEs, and/or methods be included within this description, be within the scope of the present invention, and be protected by the accompanying claims. Moreover, it is intended that all embodiments disclosed herein can be implemented separately or combined in any way and/or combination.
17. References
17.1. Normative References
[00237] [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[00239] [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.
[00240] [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, RFC 5234, January 2008.
[00241] [RFC6222] Begen, A., Perkins, C., and D. Wing, "Guidelines for Choosing RTP Control Protocol (RTCP) Canonical Names (CNAMEs)", RFC 6222, April 2011.
17.2. Informative References
[00242] [RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S. Jamin,
"Resource ReSerVation Protocol (RSVP)— Version 1 Functional Specification", RFC 2205, September 1997.
[00243] [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.
[00244] [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
Session Description Protocol (SDP)", RFC 3264, June 2002.
[00246] [RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text Conversation", RFC 4103, June 2005.
[00247] [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.
[00248] [RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. Hakenberg, "RTP Retransmission Payload Format", RFC 4588, July 2006.
[00249] [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, "Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)", RFC 5104, February 2008.
[00250] [RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, January 2008.
[00251] [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media Attributes in the Session Description Protocol (SDP)", RFC 5576, June 2009.
[00252] [RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and Control Packets on a Single Port", RFC 5761, April 2010.

[00253] [RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description Protocol (SDP) Grouping Framework", RFC 5888, June 2010.


CLAIMS:
1. A method of operating a receiver endpoint device (111-2) that is configured to communicate with a sender endpoint device (111-1), the method comprising:
advertising (2100) capability information to the sender endpoint device (111-1) that defines a capability of the receiver endpoint device (111-2) to simultaneously receive a plurality of media streams; and
receiving (2102) the plurality of media streams at a same time from the sender endpoint device (111-1) during a communication session based on the advertised capability information.
2. The method of Claim 1, further comprising:
exchanging (2200) session negotiation information with the sender endpoint device (111-1) to set up a session prior to receiving the plurality of media streams, wherein the capability information is communicated to the sender endpoint device (111-1) as part of the session negotiation information.
3. The method of Claim 2, further comprising:
communicating (2300) the capability information to the sender endpoint device (111-1) using Session Description Protocol, SDP, as part of re-negotiation communications with the sender endpoint device (111-1) to re-negotiate the session.
4. The method of Claim 2, further comprising:
before receiving the media streams, waiting (2400) for receipt of an acknowledgement message from the sender endpoint device (111-1) agreeing to constrain its communication of the media streams to the receiver endpoint device (111-2) according to the session negotiation information offered by the receiver endpoint device (111-2).
5. The method of Claim 1, wherein advertising (2100) capability information to the sender endpoint device (111-1) that defines a capability of the receiver endpoint device (111-2) to simultaneously receive a plurality of media streams comprises:
advertising (2500) a maximum number of media streams that the receiver endpoint device (111-2) is presently capable of simultaneously receiving from the sender endpoint device (111-1).
6. The method of Claim 5, wherein advertising (2500) the maximum number of media streams that the receiver endpoint device (111-2) is presently capable of simultaneously receiving from the sender endpoint device (111-1) comprises:
advertising (2500) that the maximum number of media streams is dependent upon the sender endpoint device (111-1) using a defined coding to encode data carried by the media streams.
7. The method of Claim 1, wherein advertising (2100) capability information to the sender endpoint device (111-1) that defines a capability of the receiver endpoint device (111-2) to simultaneously receive a plurality of media streams comprises:
advertising (2600) a maximum combined bandwidth for all of the media streams and/or a maximum per-stream bandwidth that the receiver endpoint device (111-2) is presently capable of simultaneously receiving from the sender endpoint device (111-1).
8. The method of Claim 7, wherein advertising (2600) the maximum combined bandwidth for all media streams and/or the maximum per-stream bandwidth that the receiver endpoint device (111-2) is presently capable of simultaneously receiving from the sender endpoint device (111-1) comprises:
advertising (2600) that the maximum combined bandwidth for all of the media streams and/or the maximum per-stream bandwidth is dependent upon the sender endpoint device (111-1) using a defined coding to encode data carried by the media streams.
9. The method of Claim 1, wherein advertising (2100) capability information to the sender endpoint device (111-1) that defines a capability of the receiver endpoint device (111-2) to simultaneously receive a plurality of media streams comprises:
advertising (2700) which of a plurality of defined coding parameters that the sender endpoint device (111-1) should use to encode data carried by particular ones of the media streams to be communicated to the receiver endpoint device (111-2).
10. The method of Claim 1, wherein advertising (2100) capability information to the sender endpoint device (111-1) that defines a capability of the receiver endpoint device (111-2) to simultaneously receive a plurality of media streams comprises:
advertising (2800) a token rate and a bucket size for a token bucket algorithm that will be performed by the receiver endpoint device (111-2) to constrain a data rate of media streams that will be simultaneously received from the sender endpoint device (111-1).
11. The method of Claim 1, wherein advertising (2100) capability information to the sender endpoint device (111-1) that defines a capability of the receiver endpoint device (111-2) to simultaneously receive a plurality of media streams comprises:
advertising (2900) a number of media streams containing related media data that can be simultaneously received by the receiver endpoint device (111-2).
12. The method of Claim 11, wherein advertising (2100) a number of media streams containing related media data that can be simultaneously received by the receiver endpoint device (111-2) comprises:
advertising (3000) a number of media streams containing different encoded types of the same media content that can be simultaneously received.
13. The method of Claim 11, wherein advertising (2100) a number of media streams containing related media data that can be simultaneously received by the receiver endpoint device (111-2) comprises:
advertising (3100) a number of media streams containing different spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source that can be simultaneously received.
14. The method of Claim 1, wherein advertising (2100) capability information to the sender endpoint device (111-1) that defines a capability of the receiver endpoint device (111-2) to simultaneously receive a plurality of media streams comprises:
advertising (3200) which of a plurality of different defined spatial sampled versions, temporal sampled versions, and/or lossy quality versions of a same media source that the sender endpoint device (111-1) should communicate as particular ones of the media streams to the receiver endpoint device (111-2).
15. A method of operating a sender endpoint device (111-1) that is configured to communicate with a receiver endpoint device (111-2), the method comprising:
advertising (3300) capability information to the receiver endpoint device (111-2) that defines a capability of the sender endpoint device (111-1) to simultaneously communicate a plurality of media streams; and
communicating (3302) the plurality of media streams at a same time toward the receiver endpoint device (111-2) based on the advertised capability information.
16. The method of Claim 15, further comprising:
exchanging (3400) session negotiation information with the receiver endpoint device (111-2) to set up a session prior to communicating the plurality of media streams, wherein the capability information is communicated to the receiver endpoint device (111-2) as part of the session negotiation information.
17. The method of Claim 16, further comprising:
communicating (3500) the capability information from the sender endpoint device (111-1) to the receiver endpoint device (111-2) using Session Description Protocol, SDP, as part of re-negotiation communications with the receiver endpoint device (111-2) to re-negotiate the session.
18. The method of Claim 15, wherein advertising (3300) capability information to the receiver endpoint device (111-2) that defines a capability of the sender endpoint device (111-1) to simultaneously communicate a plurality of media streams comprises:
advertising (3600) a maximum number of media streams that the sender endpoint device (111-1) is presently capable of simultaneously communicating to the receiver endpoint device (111-2).
19. The method of Claim 18, wherein advertising (3300) the maximum number of media streams that the sender endpoint device (111-1) is presently capable of simultaneously
communicating to the receiver endpoint device (111-2) comprises:
advertising (3600) that the maximum number of media streams is dependent upon the receiver endpoint device (111-2) being capable of using a defined coding to decode data carried by the media streams.
20. The method of Claim 15, wherein advertising (3300) capability information to the receiver endpoint device (111-2) that defines a capability of the sender endpoint device (111-1) to simultaneously communicate a plurality of media streams comprises:
advertising (3700) a maximum per-stream bandwidth that the sender endpoint device (111-1) is presently capable of simultaneously sending to the receiver endpoint device (111-2).
21. The method of Claim 20, wherein advertising (3300) the maximum per-stream bandwidth that the sender endpoint device (111-1) is presently capable of simultaneously sending to the receiver endpoint device (111-2) comprises:
advertising (3700) that the maximum per-stream bandwidth is dependent upon the receiver endpoint device (111-2) being capable of using a defined coding to decode data carried by the media streams.
22. The method of Claim 15, wherein advertising (3300) capability information to the receiver endpoint device (111-2) that defines a capability of the sender endpoint device (111-1) to simultaneously communicate a plurality of media streams comprises:
advertising (3800) which of a plurality of defined coding parameters that the sender endpoint device (111-1) will use to encode data carried by particular ones of the media streams to be communicated to the receiver endpoint device (111-2).
23. The method of Claim 15, wherein communicating (3302) the plurality of media streams at a same time toward the receiver endpoint device (111-2) comprises:
receiving (3900) an advertised token rate and an advertised bucket size for a token bucket algorithm that will be performed by the receiver endpoint device (111-2) to receive the media streams; and
constraining (3902) a data rate of media streams simultaneously communicated to the receiver endpoint device (111-2) in response to the advertised token rate and the advertised bucket size.
24. The method of Claim 15, wherein advertising (3300) capability information to the receiver endpoint device (111-2) that defines a capability of the sender endpoint device (111-1) to simultaneously communicate a plurality of media streams comprises:
advertising (4000) a number of media streams containing different spatial sampling versions, temporal sampling versions, and/or lossy quality versions of a same media source that are available to be simultaneously communicated to the receiver endpoint device (111-2).
25. The method of Claim 15, wherein advertising (3300) capability information to the receiver endpoint device (111-2) that defines a capability of the sender endpoint device (111-1) to simultaneously communicate a plurality of media streams comprises:
advertising (4100) which of a plurality of different defined spatial sampling versions, temporal sampling versions, and/or lossy quality versions of a same media source that are available for communication from the sender endpoint device (111-1) as particular ones of the media streams to the receiver endpoint device (111-2).
PCT/EP2012/053861 2011-06-23 2012-03-07 Methods and apparatus for advertising endpoint device capabilities for sending/receiving simultaneous media streams WO2012175228A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161500333P 2011-06-23 2011-06-23
US61/500,333 2011-06-23

Publications (1)

Publication Number Publication Date
WO2012175228A1 (en) 2012-12-27

Family

ID=45815536

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/EP2012/053860 WO2012175227A1 (en) 2011-06-23 2012-03-07 Methods and apparatus for identifying rtp media streams containing related media data
PCT/EP2012/053861 WO2012175228A1 (en) 2011-06-23 2012-03-07 Methods and apparatus for advertising endpoint device capabilities for sending/receiving simultaneous media streams

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/EP2012/053860 WO2012175227A1 (en) 2011-06-23 2012-03-07 Methods and apparatus for identifying rtp media streams containing related media data

Country Status (1)

Country Link
WO (2) WO2012175227A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104683312A (en) * 2013-11-27 2015-06-03 华为技术有限公司 Method and device for negotiating media multiplexing
CN108134915B (en) 2014-03-31 2020-07-28 宝利通公司 Method and system for a hybrid topology media conferencing system
CN116980657B (en) * 2023-09-25 2023-12-26 北京数盾信息科技有限公司 Video data transmission processing method, device and equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
EP1758334A1 (en) * 2005-08-26 2007-02-28 Matsushita Electric Industrial Co., Ltd. Establishment of media sessions with media adaptation
US20090298525A1 (en) * 2008-05-27 2009-12-03 Cisco Technology, Inc. Interoperability and communications system dynamic media proxy based on capability negotiation
US20100039963A1 (en) * 2006-12-22 2010-02-18 France Telecom Hybrid conference bridge
US20100142413A1 (en) * 2007-03-29 2010-06-10 Anders Eriksson Media Stream Setup in a Group Communication System

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
EP1639852A1 (en) * 2003-06-27 2006-03-29 Nokia Corporation Method and system for resource reservation in a wireless communication network
JP5086285B2 (en) * 2009-01-22 2012-11-28 株式会社日立製作所 Video distribution system, video distribution apparatus, and synchronization correction processing apparatus



Cited By (4)

Publication number Priority date Publication date Assignee Title
US20140294000A1 (en) * 2013-03-27 2014-10-02 Unify Gmbh & Co. Kg Method and system for negotiation of media between communication devices for multiplexing multiple media types
US10027732B2 (en) 2013-03-27 2018-07-17 Unify Gmbh & Co. Kg Method and system for negotiation of media between communication devices for multiplexing multiple media types
US10375138B2 (en) 2013-03-27 2019-08-06 Unify Gmbh & Co. Kg Method and system for negotiation of media between communication devices for multiplexing multiple media types
US10819765B2 (en) 2013-03-27 2020-10-27 Ringcentral, Inc. Method and system for negotiation of media between communication devices for multiplexing multiple media types

Also Published As

Publication number Publication date
WO2012175227A1 (en) 2012-12-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12708301

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12708301

Country of ref document: EP

Kind code of ref document: A1