KR20100021435A

KR20100021435A - Active speaker identification

Info

Publication number: KR20100021435A
Application number: KR1020097025464A
Authority: KR
Inventors: Regis J. Crinon; Humayun M. Khan; Dalibor Kukoleca
Original assignee: Microsoft Corporation
Priority date: 2007-06-12
Filing date: 2008-05-30
Publication date: 2010-02-24
Also published as: US20080312923A1; US8385233B2; RU2483452C2; KR101486607B1; CN101689998A; EP2163035A1; US9160775B2; EP2163035A4; EP2163035B1; WO2008157005A1; US20140177482A1; RU2009146029A; BRPI0812128A2; JP2010529814A; JP5579598B2; US20130138740A1; US8717949B2

Abstract

Procedures for identifying clients in an audio event are described. In an example, a media server may order clients providing audio based on the input level. An identifier may be associated with the client for identifying the client providing input within the event. The ordered clients may be included in a list which may be inserted into a packet header carrying the audio content.

Description

Active speaker identification

Participants in a media conference may have difficulty identifying the other participants. A participant may be unfamiliar with a speaker's voice or face, or the audio exchange itself may confuse the listener. In the latter case, a listener, whether speaking or not, may become confused when several participants speak at the same time or when there is a rapid exchange among multiple participants. In some cases, speakers may state their name, as in "This is Bob...", or a listener may ask who the previous speaker was. These issues can become more complex as the number of participants speaking or contributing audio input increases. Although a listener may be able to infer a speaker's identity from contextual clues in the conversation, in some cases participants may not be able to tell which participants are providing audio input.

It may also be desirable to minimize the bandwidth, or data throughput, consumed in delivering information. For example, a physical connection used to transfer data may have spare throughput, but consuming communication link resources can reduce the throughput available for other data transfers, or can affect the delivery of conference audio data when a user has limited network bandwidth.

Adoption of a media conferencing improvement may be limited if the improvement is not "backwards compatible". For example, if a modification is inconsistent with existing protocols and versions, a user would have to obtain an updated version and/or seek organizational approval in order to communicate with a participant who has implemented the modified version. This situation can hinder adoption of the modified technology.

Procedures for identifying clients in an audio or audio/video event are described. In one example, a media server may order the clients providing audio based on input level. An identifier (ID) may be associated with a client to identify the client providing input within the event. The ordered clients may be included in a list that can be inserted into the header of a packet carrying audio content.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

The detailed description is given with reference to the accompanying figures. In the figures, the leftmost digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different instances in the description and the figures may indicate similar or identical items.

FIG. 1 illustrates an environment in an example implementation that may use techniques for enabling active speaker identification.

FIG. 2 illustrates a real-time protocol data packet that includes an ordered/reordered list of active clients in the list of contributing sources (CSRC) field.

FIG. 3 is a flow diagram illustrating a procedure in an example implementation for identifying active clients.

FIG. 4 is a flow diagram illustrating a procedure in an example implementation for identifying active clients in a real-time protocol conference.

Overview

Techniques for identifying active audio contributors in a media event are described. In implementations, a list of contributing or participating audio clients may be constructed based on each client's contribution to the session. An ID may be associated with each participating client so that clients can identify which client(s) are actively contributing to the event. The constructed list may be inserted into data stream packet headers for delivery to the conference clients. In implementations, identification information may be included in the control packets used in connection with data transmission. The techniques described herein can provide speaker information while consuming minimal network resources and without introducing synchronization problems.

In other implementations, a media server for switching/mixing audio streams may be configured so that an ordered list of active clients is inserted into data packet headers. For example, the media server may include a list of active speakers, ordered according to who is currently the active speaker, so that clients are provided with information about which clients are actively speaking. The list can be provided without increasing the media transport overhead on the network.

Example environment

FIG. 1 illustrates an environment 100 of example implementations operable to use active speaker identification. For example, a media server 102 may mix and switch between the audio streams provided by clients in a media event while identifying the active audio clients. Although audio data processing is described, the media server 102 may process other types of media data, including video, depending on the conference and the capabilities of the client devices. For example, the media server 102 may handle audio/video data for some clients while delivering audio data to clients that lack video capability.

For example, a media server processor 104 may determine which client or clients are actively contributing audio content when mixing/switching audio streams for the clients. The media server processor 104 may determine which clients are actively providing audio data based on the mixing/switching algorithm or technique the processor employs to generate the send media streams. This determination may be used to order the list of clients contributing to the outbound media streams from the media server 102, that is, the clients that contributed to the media server output.

In an audio event that includes clients "A" 106, "B" 108, "C" 110, "D" 112 and "E" 114, in which clients "A" 106 and "E" 114 contribute audio input (for example, clients "A" 106 and "E" 114 are carrying on a conversation), the non-active clients "B" 108, "C" 110 and "D" 112 may be provided with an "A+E" send stream from the media server 102, that is, a combination of the two speakers, while clients "A" 106 and "E" 114 each receive the other's send stream from the media server 102 (for example, client A 106 receives the client E send stream while client E 114 receives the client A send stream). Suitable client devices include, but are not limited to, voice over internet protocol (VoIP) phones, computing devices with audio capability, public switched telephone network (PSTN) phones connected to the digital audio session through a gateway, and so on.

In some implementations, an active speaker may not be provided with a signal that includes the speaker's own send stream, in order to avoid feedback or echo (for example, client A 106 may not be sent an audio stream that includes client A audio). Several common identification scenarios may be considered: client A 106 may "talk over" client E (for example, the participant associated with client A 106 speaks loudly while participant "E", associated with client E 114, converses in a relatively normal voice); participants "A" and "E" may engage in a rapid exchange in which the current speaker changes back and forth between the two; or participant "A" may lead the conversation while participant "E" provides relatively little input. An example of the last situation is a participant adding minor acknowledgements to a main speaker's dominant monologue.
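The routing described above (listeners receive the "A+E" mix, while each active speaker receives the mix without its own audio) can be sketched as follows. This is a minimal illustration, not the media server's actual mixing logic; the function name `route_send_streams` and the use of plain client labels are assumptions made for the example.

```python
def route_send_streams(all_clients, active_clients):
    """Map each client to the active sources mixed into the stream sent to it."""
    routing = {}
    for client in all_clients:
        # Leave out the client's own audio to avoid feedback or echo.
        routing[client] = [src for src in active_clients if src != client]
    return routing


# Clients A and E are talking; B, C and D are only listening.
routing = route_send_streams(["A", "B", "C", "D", "E"], ["A", "E"])
for client, sources in routing.items():
    print(client, "receives", "+".join(sources))
```

With this sketch, clients B, C and D are each routed both A and E, while A is routed only E and E is routed only A.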

In implementations, the media server 102 may determine the dominant client (that is, speaker) based on the number of packets received from a client, when the audio content is received, packet size, audio energy level, and so on. Thus, although two or more clients may be contributing content at the same time, one active client may be designated as the dominant client (that is, speaker) based on these factors. For example, in the course of mixing and/or switching between the inputs received from different clients, the media server 102 may determine the currently active client (and the associated speaker) based on which current data packets contain audio content received from an active client. For example, if client E is not contributing data packets at the moment, the media server 102 may designate client A 106 as the current "active" client. In another example, if both client A 106 and client E 114 are active, but client A 106 is contributing audio content at a higher energy level than client E 114 (that is, participant A is speaking loudly and E is speaking in a low tone), client A 106 may be designated as the dominant, active speaker. Clients may then be provided with an active client list that begins with client A 106. This type of determination may be made when mixing/switching client input audio streams for one or more ongoing conferences. For example, the processor of the media server 102 may distinguish between active clients when applying a mixing algorithm, while an identification module 116 may be used to insert this information into the applicable data packets.
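One way to picture the dominance decision is as a sort over per-client measurements. The sketch below is an assumption-laden illustration: the `ClientStats` fields, the silence threshold, and the choice to rank by energy first and packet count second are all hypothetical, since the text lists the factors but does not specify how they are combined.

```python
from dataclasses import dataclass


@dataclass
class ClientStats:
    name: str
    energy: float  # hypothetical average audio energy over the current window
    packets: int   # audio packets received from the client in that window


def order_active_clients(stats, silence_threshold=0.01):
    """Return active client names, dominant speaker first."""
    # A client counts as active only if it sent audio above the threshold.
    active = [s for s in stats if s.packets > 0 and s.energy > silence_threshold]
    # Rank by energy, then by packet count (an assumed tie-breaker).
    active.sort(key=lambda s: (s.energy, s.packets), reverse=True)
    return [s.name for s in active]


window = [
    ClientStats("A", energy=0.80, packets=50),  # speaking loudly
    ClientStats("B", energy=0.00, packets=0),   # listening only
    ClientStats("E", energy=0.30, packets=48),  # speaking in a low tone
]
ordered = order_active_clients(window)
```

Here `ordered` starts with client A, followed by client E, and client B is left out because it contributed no audio in the window.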

Referring generally to FIG. 2, in implementations that use the real-time transport protocol (RTP) and its associated real-time control protocol (RTCP), the media server 102 may identify active clients, that is, active speakers, by examining the data in the streams sent from the clients, including the data transport and signaling streams. In the case of client A 106, the media server 102 may identify that an audio client send stream originated from client A 106 by examining the synchronization source (SSRC) field in the RTP packets, or from the canonical name (CNAME) and the client SSRC (the client's ID within the session) included in an RTCP report. Other information may also be examined. The SSRC may also be obtained from the RTP packet header. For example, the SSRC may be mapped to the CNAME in the RTCP report.

RTCP signaling may be used to identify lost packets, to monitor data transmission quality, and so on, and an RTCP report may be obtained from RTCP out-of-band signals. For example, an RTCP report may include a randomly generated client SSRC that is mapped to the client CNAME. In general, a CNAME is an ID/record associated with the aliases used for a client device. In some examples, the CNAME is a string of digits or the like. In implementations, the media server 102 may itself be assigned an SSRC within the session. In some examples, the SSRC of a client in the session may change. For example, a client SSRC may change when the client is cut off (for example, it rejoins after a long pause) or when client SSRCs collide (the same SSRC is issued to more than one client). In this way, an incoming data stream can be identified from the SSRC in the data stream or from RTCP signaling. The media server 102 may also obtain the CNAME from RTCP signaling for use in client identification.
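Because an SSRC can change while the CNAME stays with the client, a server can key its bookkeeping on the CNAME and treat the SSRC as a replaceable handle. The following sketch assumes the (SSRC, CNAME) pair has already been extracted from an RTCP report; the class `SourceRegistry`, its method names, and the sample CNAME strings are hypothetical.

```python
class SourceRegistry:
    """Track the current SSRC <-> CNAME binding for each client in a session."""

    def __init__(self):
        self._cname_by_ssrc = {}
        self._ssrc_by_cname = {}

    def on_rtcp_report(self, ssrc, cname):
        # The client rejoined or resolved a collision: retire its old SSRC.
        old_ssrc = self._ssrc_by_cname.get(cname)
        if old_ssrc is not None and old_ssrc != ssrc:
            self._cname_by_ssrc.pop(old_ssrc, None)
        # The SSRC previously belonged to another client: drop that binding.
        displaced = self._cname_by_ssrc.get(ssrc)
        if displaced is not None and displaced != cname:
            self._ssrc_by_cname.pop(displaced, None)
        self._cname_by_ssrc[ssrc] = cname
        self._ssrc_by_cname[cname] = ssrc

    def cname_for(self, ssrc):
        """Return the CNAME bound to an SSRC, or None if it is unknown."""
        return self._cname_by_ssrc.get(ssrc)


registry = SourceRegistry()
registry.on_rtcp_report(0x1111, "client-a@example")
registry.on_rtcp_report(0x2222, "client-e@example")
registry.on_rtcp_report(0x3333, "client-a@example")  # client A rejoined with a new SSRC
```

After the third report, the stale SSRC `0x1111` no longer resolves, while `0x3333` resolves to client A's CNAME.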

When generating a send stream (containing audio output), the media server 102 may identify, from the CNAMEs and SSRCs obtained from the active clients, which clients are contributing audio input to the session. For example, the media server 102 may associate the SSRC, and the CNAME carried in RTCP packets, with the audio content send stream (that is, the media server output stream(s) carrying audio data). Returning to the earlier example session among clients "A" 106, "B" 108, "C" 110, "D" 112 and "E" 114, for the mixed signal "A+E" the media server 102 may order clients "A" and "E" according to which client is active right now, which is active and dominant in the session, and so on. This ordering may change depending on which client is providing audio input. In this case, if client A is currently providing input, or if client A dominates the conversation, the list may begin with the ID of client A 106 and also include client E 114. In situations in which audio is exchanged between client A 106 and client E 114, the ordering may change based on the participant currently speaking, as reflected on a per-packet basis.

Referring to FIG. 2, in an RTP configuration, the media server identification module 116 may insert this ordered list of SSRCs into the RTP packet headers of the output stream. For example, the ordered IDs are inserted into the list of contributing sources (CSRC) field 204 of the header of the packets sent in the data stream. If client A and client E are currently trading the active role, the arrangement of the SSRCs may change from "client A, client E..." 204(a) to "client E, client A..." 204(b). In this manner, the clients receiving the data stream (the listening clients or participants in the session) can be informed of which client is providing input, the relative contributions, and so on, while avoiding synchronization problems and the additional signaling that adds network overhead. For example, the CSRC field may be permitted to contain up to 15 IDs of 32 bits each, with the remainder of the header as defined by the specification. A client that does not operate in accordance with the techniques discussed herein may still participate, without the benefits discussed herein. Thus, systems and techniques built this way can be backwards compatible.
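The header layout can be illustrated with a short packing routine. This is a sketch under the standard RTP fixed-header layout (version 2, a 4-bit CSRC count, a 12-byte fixed part, then one 32-bit word per CSRC); the function name `build_rtp_header`, the payload type, and the sample SSRC values are assumptions chosen for the example, and padding, extension and marker bits are left clear for simplicity.

```python
import struct

MAX_CSRC = 15  # the CSRC count is a 4-bit field, so at most 15 IDs fit


def build_rtp_header(payload_type, seq, timestamp, ssrc, ordered_csrcs):
    """Pack an RTP header whose CSRC list preserves the given speaker order."""
    csrcs = list(ordered_csrcs)[:MAX_CSRC]
    byte0 = (2 << 6) | len(csrcs)  # version 2, no padding, no extension, CC
    byte1 = payload_type & 0x7F    # marker bit clear
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc)
    for csrc in csrcs:
        header += struct.pack("!I", csrc)  # one 32-bit ID per contributor
    return header


SSRC_SERVER, SSRC_A, SSRC_E = 0x0000AAAA, 0x11111111, 0x22222222
# Client A is the current speaker, then the roles swap on a later packet.
a_first = build_rtp_header(0, 1, 160, SSRC_SERVER, [SSRC_A, SSRC_E])
e_first = build_rtp_header(0, 2, 320, SSRC_SERVER, [SSRC_E, SSRC_A])
```

Both headers have the same length, which reflects the point made above: reordering the CSRC entries conveys who is currently speaking without adding bytes or extra signaling.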

Although an SSRC can identify an active client, relying on the SSRC alone can be problematic: an SSRC may be randomly assigned, may change as a result of a collision with another client that has the same SSRC, may be reassigned after the client drops out of the session and rejoins, and so on.

The media server 102 may insert the active client CNAME(s) into the RTCP packets delivered to the clients (for example, so that the other, "listening" clients can learn the CNAME and SSRC of an active client). For example, the media server identification module 116 may "fan out" the active client IDs to the "listening clients" in media server RTCP packets. For example, if several active clients are contributing to the conference, the media server may insert the obtained client IDs, at designated intervals, into the RTCP packets sent in connection with the media server data stream. Each RTCP packet may include a CNAME. The CNAME(s) may be spread across the RTCP packets delivered to the listening clients in order to minimize transmission overhead. Clients that receive the media server RTCP data, including the active client IDs, may store the data in local memory so that a CNAME can be associated with data packets when audio content is received. For example, the CNAME, the mapped SSRC and other related information may be stored in a look-up table or the like. Whereas the audio content in the data stream is generally sent continuously, RTCP signaling may occur only intermittently, for example at specified intervals (such as every 5 or 10 seconds). Thus, a client receiving a data packet can associate an SSRC in the CSRC list with a previously received CNAME. In implementations, a globally routable user agent uniform resource identifier (GRUU) may be used to identify a particular client.
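On the receiving side, the look-up described above amounts to reading the CSRC list from each incoming RTP packet and resolving every entry against the table built from earlier RTCP packets. The sketch below assumes the standard RTP header layout (CSRC count in the low four bits of the first byte, CSRC list starting at byte 12); the function names, the `"unknown"` placeholder, and the sample table are hypothetical.

```python
import struct


def parse_csrc_list(packet):
    """Return the CSRC IDs from an RTP packet, in the order they were sent."""
    count = packet[0] & 0x0F  # CSRC count: low 4 bits of the first byte
    return list(struct.unpack("!%dI" % count, packet[12:12 + 4 * count]))


def resolve_speakers(packet, cname_by_ssrc):
    """Map each CSRC to a CNAME learned earlier from RTCP, dominant first."""
    # An SSRC can appear in media before its CNAME arrives over RTCP.
    return [cname_by_ssrc.get(ssrc, "unknown") for ssrc in parse_csrc_list(packet)]


# Table a listening client might have stored from earlier RTCP packets.
lookup = {0x11111111: "client-a@example", 0x22222222: "client-e@example"}

# A hand-built packet: 12-byte fixed header, two CSRCs (E first), then payload.
packet = (bytes([0x82, 0x00, 0x00, 0x01])
          + struct.pack("!II", 160, 0x0000AAAA)
          + struct.pack("!II", 0x22222222, 0x11111111)
          + b"audio")
speakers = resolve_speakers(packet, lookup)
```

Because the table is only refreshed when RTCP arrives, the `"unknown"` fallback covers the window in which a new contributor is already audible but not yet named.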

In implementations, an active client may be notified that it is the active party in the conference. For example, a participant (associated with an active client) may want to know that he or she is not "talking over" another participant. Returning to the session among clients "A" 106, "B" 108, "C" 110, "D" 112 and "E" 114, if, for example, client A 106 is active but clients "B" 108, "C" 110, "D" 112 and "E" 114 are not, this can be identified through the RTCP signaling delivered to client A. Thus, the media server 102 may generate the send media stream for clients "B", "C", "D" and "E" by passing through the client A send stream, while client A 106 can identify from the CSRC/RTCP packets that no other client is active and that the others are "listening" clients or members of the session.

In other implementations, human-understandable information is associated with the SSRC and CNAME. For example, a user may want a picture of the speaking participant to be displayed on an associated monitor while that participant is speaking. In implementations, human-understandable client information may be exchanged between clients. For example, the data may typically be exchanged when an event or session is initiated.

Although the Internet (World Wide Web, WWW) may be used to connect the clients and the other components, other networks and various links are also suitable. For example, the network connecting the media server 102 to the clients may include a wide area network (WAN), a local area network (LAN), a wireless network, a public telephone network, an intranet, and so on. The network may be configured to include multiple sub-networks.

Example procedures

The following discussion describes techniques that may be implemented using the previously described systems and devices. Aspects of each of the procedures may be implemented in hardware, firmware, software, or a combination thereof. The procedures are shown as a set of blocks that specify operations performed by one or more devices, and are not necessarily limited to the orders shown for performing the operations of the respective blocks. A variety of other examples are also contemplated.

FIG. 3 describes example procedures for identifying active audio input clients in a media session. For example, the techniques may be used in a media conference or teleconference in which some clients do not have video capability.

In implementations, a media server acting as a host or central point may determine the audio input clients with respect to the input provided by each active client (302). For example, the determination may be made as part of mixing and/or switching the audio client input. Thus, client A may be designated as the primary active client until another client provides audio input. In another example, if client A and client E are both contributing but client A's audio has a higher energy level, client A may be selected. A client's audio may have a higher energy level if the participant associated with the client is speaking loudly, or is speaking more continuously, as when that client dominates the audio input.

An audio input client may be identified as the "top" client if, for example, the client is currently active or dominates the conversation. In RTP/RTCP systems functioning in accordance with the present techniques, the media server may obtain the client input streams and the associated RTCP packets (for example, the RTCP packets sent from the clients) (304), where the RTCP packets include the SSRC mapped to the CNAME of the particular client generating the stream that contains the audio content. For example, the media server may obtain the SSRC and CNAME for a client. The CNAME identifies the client in relation to the SSRC. The media server may order the input client SSRCs according to which clients are currently providing audio input, which dominate the conversation, and so on (306). For example, the media server may order the active client SSRC IDs in descending order starting from the currently active "speaker", that is, the active client providing input. In an example, RTP may permit the identification of 15 active speakers, using a 32-bit ID for each active client included in the CSRC.

The media server may associate an ID with an audio input client. For example, the media server may obtain the SSRC and CNAME from the audio input client's RTCP packets. The SSRC may be used in the CSRC field of the media server output stream to identify the audio input client.

Clients may associate other data with the receiving or audio input clients. For example, a receiving client (a listening client, or a client in the media event) may hold human-understandable information associated with a CNAME. For example, the client may hold a photograph of the participant, the participant's name, or other information associated with the client CNAME/SSRC.

The ordered audio input client IDs may be inserted (308) as a list into a packet header. For example, if clients "A" and "E" are providing audio input (with client A being the currently active client), the CSRC field of the RTP header may contain SSRCs in a list that begins with the SSRC of client "A". In this way, listening clients (which may include an audio input client that receives audio input from another active client) can be notified of the speaker's identity within the content stream. In another example, the sort order of the audio input clients in the list may be based at least in part on which audio input client is dominant in the media session. Considerations for dominance may include the energy level of the audio input, the duration of the input, the duration of silence periods, packet size, and so on. For example, the list may begin with client A because client A is currently active and client A's transmitted stream exhibits a higher energy level than one or more other audio input clients.
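The insertion step (308) can be illustrated with a short sketch that packs an RTP fixed header followed by the ordered CSRC list. The field layout (version, CSRC count, sequence number, timestamp, SSRC, then 32-bit CSRC entries) follows the standard RTP header; the function name and the SSRC values for clients "A" and "E" and for the mixer are hypothetical.

```python
import struct


def build_rtp_header(seq, timestamp, mixer_ssrc, csrcs, payload_type=0):
    """Pack a 12-byte RTP fixed header followed by the ordered CSRC list."""
    if len(csrcs) > 15:
        raise ValueError("CSRC list holds at most 15 entries")
    byte0 = (2 << 6) | len(csrcs)  # version 2, no padding, no extension, CC
    byte1 = payload_type & 0x7F    # marker bit clear
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, mixer_ssrc)
    return header + b"".join(struct.pack("!I", s) for s in csrcs)


# Hypothetical SSRCs; client A is the current speaker, so it is listed first.
SSRC_A, SSRC_E = 0x1111AAAA, 0x2222EEEE
header = build_rtp_header(seq=1, timestamp=160, mixer_ssrc=0x99990000,
                          csrcs=[SSRC_A, SSRC_E])
```

Because the list order is preserved on the wire, a receiver reading the first CSRC entry sees the client the server ranked highest.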

The media server may send (310) the SSRC and CNAME to listening clients (session clients) in the media server's transmitted stream(s), such as in RTCP packets sent in connection with the content transfer. The SSRCs of the audio input clients may also be placed in the CSRC field of the RTP data packet headers. For example, in a five-client media event in which three participants are speaking, the SSRCs and CNAMEs associated with the audio input clients may be included in media server RTCP packets (sent to the listening clients) that are associated with the RTP packets carrying the audio content. In this way, the media server may send to the clients the SSRCs and CNAMEs that identify the active audio clients. A listening client can therefore identify the originating source of the audio content by reference to the SSRC(s) and CNAME(s) associated with the RTP packets. A client's SSRC may be updated if it collides with an SSRC issued to another client, or if the client changes its source transport address for some other reason. The SSRC(s) and CNAME(s) may be stored (312) in local memory so that the listening client can access the information throughout the media event.
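A listening client's local store (312) might look like the following sketch, which keeps an SSRC-to-CNAME table and rebinds a CNAME when its SSRC changes. The class and method names are assumptions made for illustration; the text does not prescribe a data structure.

```python
class SpeakerDirectory:
    """Local SSRC -> CNAME table kept by a listening client (hypothetical)."""

    def __init__(self):
        self._cname_by_ssrc = {}

    def update(self, ssrc, cname):
        # Drop any stale SSRC previously bound to this CNAME, for example
        # after an SSRC collision or a change of source transport address.
        stale = [s for s, c in self._cname_by_ssrc.items()
                 if c == cname and s != ssrc]
        for s in stale:
            del self._cname_by_ssrc[s]
        self._cname_by_ssrc[ssrc] = cname

    def lookup(self, ssrc):
        """Return the CNAME for an SSRC, or None if it is unknown."""
        return self._cname_by_ssrc.get(ssrc)


directory = SpeakerDirectory()
directory.update(0x1111AAAA, "alice@example.invalid")
directory.update(0x3333CCCC, "alice@example.invalid")  # SSRC was reissued
```

Keying the table by SSRC keeps the per-packet lookup cheap, while the rebinding in `update` keeps a reissued SSRC from leaving a stale entry behind.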

FIG. 4 illustrates exemplary techniques for identifying active clients in a media conference. For example, the techniques may be used during a media conference in which some of the clients lack video, or in an audio teleconference.

In these implementations, the media server may receive (402) the IDs of the active clients as well as the active client input (audio content). For example, a client contributing to an audio conference may send an SSRC and a CNAME that identify the client. For example, the SSRC may be included in the RTP packet headers of the data stream and, together with the CNAME, in RTCP packets.

An ordered list of one or more active clients in the conference is generated (404). For example, a media server that mixes/switches the audio input streams may assemble a list of active clients (SSRC IDs in RTP/RTCP), i.e., the clients providing input to the conference or session. For example, the media server may be an audio/video mixing server (AVMCU) that obtains SSRC identifiers from the active client transmit streams, each of which may include a data portion and an associated signaling portion. The AVMCU may then determine the relative arrangement of the active client SSRCs, or of other client IDs in the session. An SSRC may be identified from an RTCP report, through which it can be mapped to a client CNAME. For example, the ranking may be based on which client is currently active. In other implementations, factors such as energy level, number of data packets provided, duration of silence periods, packet size, and the like may be considered. For example, the ordered list may begin with the active client that dominates the session by number of packets provided, while a second, concurrently active client is assigned a lesser relative status.
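One way to combine the listed factors into a ranking is a weighted score, sketched below. The text names the factors but not how they are combined, so the `StreamStats` fields, the linear formula, and the weights are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class StreamStats:
    ssrc: int
    energy: float      # mean energy of recent audio, arbitrary units
    packets: int       # packets contributed in the measurement window
    silence_s: float   # seconds of silence in the measurement window


def dominance(stats, w_energy=1.0, w_packets=0.1, w_silence=0.5):
    """Hypothetical score: louder and busier streams rank higher,
    long silences rank lower."""
    return (w_energy * stats.energy
            + w_packets * stats.packets
            - w_silence * stats.silence_s)


def rank_by_dominance(all_stats):
    """Return SSRCs ordered from most to least dominant."""
    return [s.ssrc for s in sorted(all_stats, key=dominance, reverse=True)]


ranking = rank_by_dominance([
    StreamStats(ssrc=0xE, energy=3.0, packets=20, silence_s=2.0),
    StreamStats(ssrc=0xA, energy=5.0, packets=50, silence_s=0.5),
])
```

With these sample numbers the stream for `0xA` scores higher on every factor, so it heads the list and `0xE` receives the lesser relative status.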

The ordered list may be inserted (406) into the CSRC list field contained in the packet headers of the media server's outgoing data stream flows. For example, the media server output includes the audio provided by the active clients, with the CSRC field containing an ordered list of the SSRC IDs of those active clients. As a result, a listening client, i.e., a client receiving the audio content stream, can be informed of which clients are active and of the relative relationship among those active clients. Additionally, the SSRCs and CNAMEs may be included in RTCP packets sent by the media server.
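On the receiving side, a listening client could recover the ordered list by reading the CSRC count from the first header byte and unpacking that many 32-bit entries after the 12-byte fixed header. The sketch below assumes a standard RTP header layout; the function name, the sample packet, and the `names` table are hypothetical.

```python
import struct


def parse_csrc_list(packet):
    """Return the ordered CSRC entries found in an RTP packet header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the RTP fixed header")
    cc = packet[0] & 0x0F          # CSRC count is the low 4 bits of byte 0
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    return list(struct.unpack("!%dI" % cc, packet[12:end]))


# Hypothetical packet from the media server: two CSRC entries, no payload.
packet = (bytes([0x82, 0x00, 0x00, 0x01])
          + (160).to_bytes(4, "big")
          + (0x99990000).to_bytes(4, "big")
          + (0x1111AAAA).to_bytes(4, "big")
          + (0x2222EEEE).to_bytes(4, "big"))

names = {0x1111AAAA: "A", 0x2222EEEE: "E"}  # assumed local display names
speakers = [names.get(s, "unknown") for s in parse_csrc_list(packet)]
```

The first element of `speakers` corresponds to the client the server placed at the head of the list.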

The SSRCs may be associated (408) with the CNAMEs of the active audio clients. For example, the media server may send RTCP packets containing the client CNAME related to an SSRC included in the CSRC field of the RTP packet header. The CNAME may be obtained from RTCP packets.

Human-understandable information may also be associated with the CNAME and/or the audio input client SSRC. For example, a photograph or name may be associated with a client CNAME so that the participant's photograph or name appears while the associated client is providing audio content. This information may be conveyed within the conference, or a client may enter such human-understandable information itself.

In other implementations, a GRUU may be associated with the SSRC of an active client. In some cases in which one client is active but the other clients are not, the media server may provide (410) an indication to the active client notifying it that no other client is active, while the active client's own transmitted stream is not returned to it. In this way, the active client can learn that its participant is not "talking over" another participant.
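The per-recipient behavior around step 410 can be sketched as a small helper: the recipient's own SSRC is left out of what is sent back, and a flag records that the recipient is the only active client. How the indication is actually signaled is not specified in the text, so the boolean return value and the function name are purely illustrative assumptions.

```python
def csrcs_for_recipient(ordered_active, recipient_ssrc):
    """Return (csrc_list, alone) for one recipient.

    csrc_list is the ordered active list without the recipient's own SSRC,
    since a client's own stream is not returned to it. alone is True when
    the recipient is active and no other client is.
    """
    others = [s for s in ordered_active if s != recipient_ssrc]
    alone = recipient_ssrc in ordered_active and not others
    return others, alone


# Hypothetical SSRCs: 0xA is the only active client in the first call.
solo = csrcs_for_recipient([0xA], recipient_ssrc=0xA)
shared = csrcs_for_recipient([0xA, 0xE], recipient_ssrc=0xA)
```

In the first call the helper yields an empty list with the flag set, which is the situation in which the server would notify the active client that nobody else is speaking.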

Although RTP and RTCP have been described, the techniques and implementations of this disclosure may be applied to other protocols and data transport mechanisms.

Conclusion

Although the invention has been described in language specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claimed invention.

Claims

A method comprising:

ordering (306) one or more audio input clients according to input provided by each respective audio input client included in the one or more audio input clients;

associating (304) an identifier (ID) with each respective audio input client; and

inserting (308) an ordered list of the one or more audio input client IDs into a packet header.

The method of claim 1,

wherein the list is inserted into a real-time transport protocol (RTP) contributing source (CSRC) list in the packet header.

The method of claim 1,

wherein the ordering is determined by a host mixing the audio streams, such that the list descends from the currently active audio input client.

The method of claim 1,

further comprising sending (310), for each client, a canonical name (CNAME) and a synchronization source (SSRC) identifier mapped to the CNAME.

The method of claim 4,

wherein the CNAME and associated SSRC are obtained from a real-time control protocol (RTCP) record for the respective client.

The method of claim 5,

wherein the CNAME and associated SSRC are sent in an RTCP packet to a listening client.

The method of claim 1,

further comprising storing (312) the CNAME and SSRC in a local memory of a listening client.

The method of claim 1,

wherein a dominant client is determined based on at least one of an energy level, a duration of a silence period, a duration, or a packet size.

The method of claim 1,

further comprising updating the SSRC identifier associated with the CNAME when the client changes its source transport address in the session.

A method comprising:

ordering (404) a list of one or more active audio clients in a conference based on the individual active audio client participants in the conference, wherein each respective active audio client is associated with a CNAME and an SSRC identifier; and

inserting (408) the ordered list into an RTP CSRC list field in one or more audio streams.

The method of claim 10,

wherein the ordered list begins with a dominant, active client.

The method of claim 11,

further comprising determining a dominant client based on at least one of an energy level, a duration of a silence period, a duration, or a packet size.

The method of claim 11,

wherein the SSRC identifier is mapped to a CNAME obtained from a control packet for one or more received audio streams.

The method of claim 13,

wherein the control packet is compatible with RTCP.

The method of claim 13,

wherein the SSRC and the CNAME are included in an RTCP packet.

The method of claim 10,

further comprising providing (410), to a dominant and active audio client included in the one or more active clients, an indication that no other audio client is active.

A media server (102) that generates one or more outgoing media streams from input received from active clients, wherein the media server inserts an ordered list of one or more active clients into the one or more media streams.

The media server of claim 17,

wherein the media server inserts the ordered list into an RTP CSRC list in a packet header.

The media server of claim 17,

wherein the ordered list is indicative of a dominant client, determined based on at least one of an energy level, a duration of a silence period, a duration, or a packet size associated with a media stream received from the dominant client.

The media server of claim 17,

wherein the media server transmits, in RTCP packets to clients receiving the transmitted media streams, active client IDs that include a CNAME for each active client and an SSRC identifier mapped to the CNAME.