WO2012083799A1 - Method, apparatus and system for processing a cascade conference site in a cascade conference - Google Patents

Method, apparatus and system for processing a cascade conference site in a cascade conference

Info

Publication number
WO2012083799A1
Authority
WO
WIPO (PCT)
Prior art keywords
cascading
audio
site
audio data
conference
Application number
PCT/CN2011/083806
Other languages
English (en)
French (fr)
Inventor
梁丽燕 (Liang Liyan)
Original Assignee
华为终端有限公司 (Huawei Device Co., Ltd.)
Application filed by 华为终端有限公司 (Huawei Device Co., Ltd.)
Priority to EP11851966.9A (patent EP2574051B1)
Priority to ES11851966.9T (patent ES2585003T3)
Publication of WO2012083799A1
Priority to US13/715,436 (patent US8836753B2)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/152 Multipoint control units therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Definitions

  • The present invention relates to the field of communications technologies, and in particular, to a method, a device, and a system for processing a cascade site in a cascade conference.
Background
  • Conventionally, a conference is held among ordinary sites under a single MCU (Multipoint Control Unit); that is, all ordinary sites in the conference are connected to the same MCU.
  • Sometimes a cascade conference must be held: not only do the sites of each MCU join, but multiple MCUs are also connected to one another through cascade sites, so that the sites of multiple MCUs take part in one conference.
  • For example, an organization that needs to hold a national conference has MCUs and conference rooms in Beijing and in the provincial capitals, cities, and counties. A nationwide cascade conference can then be held by deploying MCUs in Beijing, the provincial capitals, and the cities.
  • Each site connects to its own MCU. Because the sites are numerous and geographically scattered, cascading lets each site connect only to its nearest MCU, which reduces the demands on the network.
  • For example, MCU1 is connected to three sites: telepresence sites T1 and T3 and ordinary site T2. Telepresence site T1 has three screens, T1L, T1C, and T1R, and telepresence site T3 has three screens, T3L, T3C, and T3R. MCU2 is connected to three sites: telepresence sites T4 and T6 and ordinary site T5. Telepresence site T4 has three screens, T4L, T4C, and T4R, and telepresence site T6 has three screens, T6L, T6C, and T6R.
  • Each MCU supports mixing audio from at most two parties: among all sites connected to the MCU (ordinary, telepresence, and cascade sites), it selects the audio data of the two loudest sites and mixes them. If fewer than two sites are connected, the audio data of the connected site is mixed.
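The prior-art mixing rule above (mix the two loudest sites, or every site if fewer are connected) can be sketched as follows. This is a minimal illustration under assumptions, not the MCU's actual implementation: it assumes decoded mono PCM frames held as lists of floats and uses mean-square energy as a stand-in for "loudest"; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def frame_energy(samples: List[float]) -> float:
    """Mean-square energy of one frame; a stand-in for how 'loud' a site is."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def mix_loudest(frames: Dict[str, List[float]], max_parties: int = 2) -> Tuple[List[str], List[float]]:
    """Select up to max_parties loudest sites and sum their frames into one mono mix.

    If fewer sites than max_parties are connected, every site is mixed.
    """
    chosen = sorted(frames, key=lambda site: frame_energy(frames[site]), reverse=True)[:max_parties]
    length = max((len(frames[site]) for site in chosen), default=0)
    mixed = [0.0] * length
    for site in chosen:
        for i, sample in enumerate(frames[site]):
            mixed[i] += sample
    return chosen, mixed


# Hypothetical one-stream-per-site frames for the three sites on MCU1.
chosen, t12 = mix_loudest({"T1": [0.5, -0.5], "T2": [0.25, 0.25], "T3": [0.125, 0.0]})
print(chosen, t12)  # ['T1', 'T2'] [0.75, -0.25]
```

Once T1 and T2 are summed into the cascade stream T12, nothing downstream can tell their samples apart, which is the root of the orientation problem this background goes on to describe.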
  • The cascade audio channel between MCU1 and MCU2 is denoted T12; on MCU2 it appears as cascade site T12.
  • The cascade video channel carries a single video stream: the center screen T1C of site T1.
  • Conference mixing then proceeds as follows. On MCU1, assume the cascade channel carries a mix of at most two parties and that the two loudest sites on MCU1 are T1 and T2; MCU1 therefore outputs the mix of T1 and T2 to MCU2 over cascade audio channel T12. On MCU2, assume the two loudest parties at that moment are cascade site T12 and site T5.
  • Sites T4 and T6 hear the following: each hears T12+T5, that is, T1+T2+T5. The images displayed on their three screens include T1C and T5, and T4 and T6 both show T1C, but at different screen positions.
  • For T4, MCU2 must process the sound T4 hears: it adjusts the direction of each site's sound to match the position of that site's image, then mixes the result and outputs it to T4, so that the direction of each sound T4 hears corresponds to the position of the corresponding image.
  • MCU2 can process audio data directly to adapt it to the orientations required at T4 and T6. However, T12 is a cascade site: its audio data is the mixing result of the upper-level MCU, that is, the sum of the data of sites T1 and T2. Both T4 and T6 display the image of T1C, but at different positions. If the audio orientation of T1 is adjusted according to where each site displays T1's image, the orientation of T2 is adjusted at the same time, because the data of T1 and T2 cannot be separated. Since the two sites see T1's image at different positions, T4 and T6 inevitably hear T2 from different directions, and the image orientation and sound orientation of each site in the cascade conference cannot be made to correspond.
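The conflict described above can be reproduced with a toy stereo model. This sketch assumes a simple linear pan law (real systems often use constant-power panning) and a hypothetical layout in which T4 shows T1C on its left screen and T6 shows it on its right screen; the helper names `pan` and `add` are illustrative only.

```python
from typing import List, Tuple

Stereo = Tuple[List[float], List[float]]


def pan(mono: List[float], position: float) -> Stereo:
    """Linear stereo pan: -1.0 is hard left, 0.0 is centre, +1.0 is hard right."""
    left_gain = (1.0 - position) / 2.0
    right_gain = (1.0 + position) / 2.0
    return [s * left_gain for s in mono], [s * right_gain for s in mono]


def add(a: Stereo, b: Stereo) -> Stereo:
    """Sample-wise sum of two stereo signals of equal length."""
    return [x + y for x, y in zip(a[0], b[0])], [x + y for x, y in zip(a[1], b[1])]


t1, t2 = [1.0], [1.0]                  # one sample each, for readability
t12 = [a + b for a, b in zip(t1, t2)]  # what the upper MCU sends: already mixed

# Hypothetical layout: T4 shows T1C on the left, T6 shows it on the right.
at_t4 = pan(t12, -1.0)  # T2 is dragged hard left along with T1...
at_t6 = pan(t12, +1.0)  # ...and hard right here, so T4 and T6 disagree about T2

# With separate streams, T1 follows its image while T2 stays at a fixed (centre) position.
separate_t4 = add(pan(t1, -1.0), pan(t2, 0.0))
print(at_t4, at_t6, separate_t4)  # ([2.0], [0.0]) ([0.0], [2.0]) ([1.5], [0.5])
```

Panning the pre-mixed T12 can only move T1 and T2 together; keeping each site in its own stream is what allows T2 to stay put while T1 follows its image.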
  • In short, the audio data of a cascade site is the mixing result of the upper-level MCU, and its sound orientation is often inconsistent with the position of the image displayed for that site. Because a mixing result cannot be separated, adjusting the audio orientation for different display screens cannot be applied only to the audio of each screen: the whole mix is adjusted uniformly, so audio whose orientation should not change is adjusted as well. The image orientation and sound orientation of each site in the cascade conference therefore cannot be made to correspond, which degrades the participants' experience.
Summary of the invention
  • Embodiments of the present invention provide a method, a device, and a system for processing a cascade site in a cascade conference, so that the image orientation and sound orientation of each site in the cascade conference correspond, improving the participants' experience.
  • A method for processing a cascade site in a cascade conference includes: receiving an audio code stream sent by the cascade site, in which different sites occupy different audio channels or audio cascade channels; receiving an audio code stream sent by a non-cascade site; selecting, from the candidate audio data, the audio data that meets a preset condition, where the candidate audio data includes the received audio code streams sent by the cascade site and by the non-cascade site; and adjusting the orientation order of the audio data that meets the preset condition.
  • Another method for processing a cascade site in a cascade conference includes: receiving an audio code stream sent by a non-cascade site; selecting, from the candidate audio data, the audio data that meets a preset condition, where the candidate audio data includes at least the received audio code stream sent by the non-cascade site; processing the audio data that meets the preset condition so that different sites occupy different audio channels or audio cascade channels, to obtain cascade-site audio data that the first cascade site can identify; encoding the cascade-site audio data to obtain an audio code stream; and sending the audio code stream to the first cascade site.
  • A device for processing a cascade site in a cascade conference includes: a receiving unit, configured to receive an audio code stream sent by the cascade site, in which different sites occupy different audio channels or audio cascade channels, and to receive an audio code stream sent by a non-cascade site; a selecting unit, configured to select, from the candidate audio data, the audio data that meets a preset condition, where the candidate audio data includes the received audio code streams sent by the cascade site and by the non-cascade site; and an order adjusting unit, configured to adjust the orientation order of the audio data that meets the preset condition.
  • Another device for processing a cascade site in a cascade conference includes: a receiving unit, configured to receive an audio code stream sent by a non-cascade site; a selecting unit, configured to select, from the candidate audio data, the audio data that meets a preset condition, where the candidate audio data includes at least the received audio code stream sent by the non-cascade site; a processing unit, configured to process the audio data that meets the preset condition so that different sites occupy different audio channels or audio cascade channels, to obtain cascade-site audio data that the first cascade site can identify; an encoding unit, configured to encode the cascade-site audio data to obtain an audio code stream; and a sending unit, configured to send the audio code stream to the first cascade site.
  • The present invention further provides a cascade conference system.
  • The embodiments of the present invention have the following advantages:
  • When the processing device of the cascade site in the cascade conference needs to adjust the orientation of audio data, it can adjust the audio data concerned directly. Adjusting the orientation of a single piece of audio data does not affect the other audio data, so the image orientation and sound orientation of each site in the cascade conference correspond one to one, improving the participants' experience.
  • FIG. 1 is a structural diagram of cascade sites in a cascade conference in the prior art;
  • FIG. 6 is a structural diagram of a cascade conference site in a cascade conference according to an embodiment of the present invention.
  • FIG. 7 is a structural diagram of cascade sites in another cascade conference according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a processing device of a cascade conference site in a cascade conference according to an embodiment of the present invention
  • FIG. 9 is a schematic diagram of a processing device of a cascade conference site in another cascade conference according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for processing a cascade site in a cascade conference according to an embodiment of the present invention. This embodiment describes the processing flow with the processing device of the cascade site acting as the receiving end.
  • The processing device of the cascade site is connected to the cascade site and to non-cascade sites, such as ordinary sites and/or telepresence sites.
  • The embodiment of the present invention includes the following steps:
  • Receive the audio code stream sent by the cascade site. In this stream, different sites occupy different audio channels or audio cascade channels; this differs from the prior art, in which the cascade site mixes the audio before sending it.
  • Specifically, only one audio cascade channel may be configured, containing two or more audio channels. Different sites occupy different audio channels, so the audio code streams of different sites are transmitted in different audio channels.
  • Alternatively, two or more audio cascade channels may be configured. Different sites occupy different audio cascade channels, so each audio cascade channel carries the audio code stream of one site.
  • When the audio code stream consists of different audio cascade channels, this embodiment further includes receiving audio cascade channel composition information sent by the cascade site. The composition information is the number of audio cascade channels established by the cascade site, from which the receiver learns how many audio cascade channels the received audio code stream occupies.
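A minimal sketch of the single-cascade-channel variant: each site's audio is carried as a separate channel of one interleaved multichannel frame, so the receiver can pull each site back out without un-mixing anything. The interleaved PCM layout and the `channel_map` list (telling the receiver which channel belongs to which site) are assumptions for illustration; the text only states that the receiver learns the number of audio cascade channels from the composition information.

```python
from typing import Dict, List, Tuple


def pack_sites(frames: Dict[str, List[float]]) -> Tuple[List[str], List[float]]:
    """Interleave equal-length mono frames, one audio channel per site.

    Returns (channel_map, interleaved), where channel_map[c] names the site on channel c.
    """
    channel_map = list(frames)  # hypothetical rule: channel index = insertion order
    if not channel_map:
        return [], []
    n = len(frames[channel_map[0]])
    if any(len(frames[site]) != n for site in channel_map):
        raise ValueError("all site frames must have the same length")
    interleaved = [frames[site][i] for i in range(n) for site in channel_map]
    return channel_map, interleaved


def unpack_sites(channel_map: List[str], interleaved: List[float]) -> Dict[str, List[float]]:
    """Recover each site's mono frame from an interleaved multichannel frame."""
    k = len(channel_map)
    return {site: interleaved[c::k] for c, site in enumerate(channel_map)}


channel_map, frame = pack_sites({"T1": [0.1, 0.2], "T2": [0.3, 0.4]})
print(len(channel_map), frame)  # 2 [0.1, 0.3, 0.2, 0.4]
```

Because every site keeps its own channel, the receiver can later pan or reorder one site's samples without touching the others.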
  • Receive the audio code streams sent by the non-cascade sites: the audio code stream of each ordinary site connected to the processing device of the cascade site, and/or the audio code stream corresponding to each screen of each telepresence site connected to it.
  • The audio code streams corresponding to the screens of a telepresence site are each input as a separate audio code stream; these streams are independent of one another, and the telepresence site transmits them independently.
  • Step 202 may be performed after step 201, before step 201, or at the same time as step 201; the order of steps 201 and 202 is not limited herein.
  • Select, from the candidate audio data, the audio data that meets a preset condition. The audio data of each site takes part in this test as one piece of audio data. The candidate audio data includes the received audio code stream sent by the cascade site and the audio code streams sent by the non-cascade sites.
  • The number of pieces of audio data selected is less than or equal to a predetermined number, which is preset according to the preset condition.
  • In this embodiment, the preset condition may be to retain the audio data of the N loudest parties among the cascade and non-cascade sites. For example, when the three loudest parties are retained, the three pieces of audio data with the highest volume are selected from all the audio data. In this way the audio data meeting the preset condition is filtered out of the candidate audio data, and the number selected does not exceed the predetermined number.
  • Alternatively, the preset condition may be to retain the audio data of preset sites. The sites may be preset; in this embodiment they may be one or more sites specified by the user.
  • For example, suppose the candidate audio data includes the audio data of ordinary sites T1 and T2 and of telepresence sites T3 and T4. If the preset condition is to retain only the audio data of T1 and T2, screening by the preset condition retains the audio data of T1 and T2, while the audio data of T3 and T4 is not retained because they are not sites specified by the user.
  • The preset condition may also be some other condition, as long as it can filter the audio data of different sites; this is not limited herein.
  • The predetermined number is preset according to how many loudest parties the processing device of the cascade site supports retaining: if it supports retaining the two loudest parties, the predetermined number is 2; if it supports retaining the three loudest parties, the predetermined number is 3. If the device supports three parties but fewer than three sites (counting ordinary, telepresence, and cascade sites) are connected to it, the audio data of all connected sites can be selected, so fewer than three pieces of audio data are selected.
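The two preset conditions described above (keep the N loudest parties, or keep user-specified sites), together with the cap at the predetermined number, can be sketched as one small selection function. Volumes are assumed to be precomputed per site, and the function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def select_candidates(volumes: Dict[str, float], predetermined: int,
                      preset_sites: Optional[Iterable[str]] = None) -> List[str]:
    """Return at most `predetermined` sites whose audio meets the preset condition.

    volumes: candidate audio keyed by site (cascade and non-cascade alike), valued by volume.
    preset_sites: if given, keep only these user-specified sites; otherwise keep the loudest.
    """
    if preset_sites is not None:
        wanted = set(preset_sites)
        chosen = [site for site in volumes if site in wanted]
    else:
        chosen = sorted(volumes, key=lambda site: volumes[site], reverse=True)
    return chosen[:predetermined]  # fewer sites than `predetermined` -> all of them


volumes = {"T1": 0.2, "T2": 0.9, "T3": 0.5, "T4": 0.1}
print(select_candidates(volumes, 3))                # ['T2', 'T3', 'T1']
print(select_candidates(volumes, 2, ["T1", "T2"]))  # ['T1', 'T2']
```

Slicing to `predetermined` handles the "fewer sites than supported" case for free: with only two connected sites and a limit of 3, both are returned.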
  • The processing device of the cascade site then adjusts the orientation order of the audio data that meets the preset condition, for example as follows:
  • If only one screen of a non-cascade site's video source is displayed, the output orientation order of the audio data of all screens of that video source is the same as the orientation of the displayed screen among the multiple screens, or the position of its picture in the multi-picture layout. For example, telepresence site T1 has three screens, T1L, T1C, and T1R; if only T1L is displayed, the audio data of all three screens is output with the same orientation as T1L among the screens, or the same position as T1L's picture in the multi-picture layout.
  • If two or more screens of a video source are displayed, the output orientation order of their audio data follows the orientation of the displayed screens.
  • The audio data of screens of that video source that are not displayed takes the same output orientation as one of the displayed screens. For example, telepresence site T1 has three screens, T1L, T1C, and T1R; if T1L and T1C are displayed and T1R is not, the audio data of T1L and T1C follows the orientation order of the displayed T1L and T1C, and the audio data of T1R can take the same orientation as either T1L or T1C.
  • If a screen of a video source is displayed in more than one place, the output orientation of its audio data is chosen with the following priority, from high to low: the orientation of an independent screen; the orientation of the large picture in a multi-picture layout; then the displayed orientation, with center preferred to left and left preferred to right.
  • For example, telepresence site T1 has three screens, T1L, T1C, and T1R, and T1L is displayed in several multi-picture layouts, or in both a multi-picture layout and an independent screen. The output orientation of T1L's audio data is then chosen in this priority order: independent screen, large picture in a multi-picture layout, then center, left, and right.
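The orientation rules above for one video source can be expressed as a lookup. In this hedged sketch, each placement of a displayed screen is modelled as a hypothetical `(kind, position)` pair, where `kind` is `"independent"`, `"large"` (the large picture of a multi-picture layout), or `"small"` (any other picture), and `position` is `"center"`, `"left"`, or `"right"`. For undisplayed screens it reuses the first displayed screen's position, which is one of the choices the text allows.

```python
from typing import Dict, List, Tuple

# Hypothetical ranking that mirrors the priority order in the text.
KIND_RANK = {"independent": 0, "large": 1, "small": 2}  # independent > large picture > other
POSITION_RANK = {"center": 0, "left": 1, "right": 2}    # centre > left > right

Placement = Tuple[str, str]  # (kind, position)


def best_position(placements: List[Placement]) -> str:
    """Position of the highest-priority placement of one displayed screen."""
    return min(placements, key=lambda p: (KIND_RANK[p[0]], POSITION_RANK[p[1]]))[1]


def audio_positions(screens: List[str], shown: Dict[str, List[Placement]]) -> Dict[str, str]:
    """Output position for the audio of every screen of one video source.

    Displayed screens follow their own picture; undisplayed screens reuse the position of
    the first displayed screen. Returns {} when nothing of the source is displayed.
    """
    displayed = [s for s in screens if shown.get(s)]
    if not displayed:
        return {}
    positions = {s: best_position(shown[s]) for s in displayed}
    fallback = positions[displayed[0]]
    return {s: positions.get(s, fallback) for s in screens}


t1 = ["T1L", "T1C", "T1R"]
print(audio_positions(t1, {"T1L": [("small", "left")]}))
# {'T1L': 'left', 'T1C': 'left', 'T1R': 'left'}
print(audio_positions(t1, {"T1L": [("small", "left")], "T1C": [("small", "center")]}))
# {'T1L': 'left', 'T1C': 'center', 'T1R': 'left'}
```

The empty result for an undisplayed source leaves room for the separate fallback strategies (keep the site's own order, or fix a position) that the text describes next.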
  • The foregoing is only one way of adjusting the orientation of audio data according to the orientation order of the video source. Other implementations may be used, as long as the orientation order of the output audio data satisfies a defined ordering requirement.
  • The adjustment strategy may also be to keep the orientation order of the site's own audio data, or to fix the audio data at a set position, such as the middle or either side; when the site's image is not visible, its audio may also be fixed at some position outside the screens.
  • In this embodiment, the processing device of the cascade site receives an audio code stream from the cascade site in which different sites occupy different audio channels or audio cascade channels. When the orientation of the cascade audio needs to be adjusted, the audio data concerned can be adjusted directly: the device's adjustment of a single piece of audio data does not affect the other audio data, so the image orientation and sound orientation of each site in the cascade conference correspond one to one, improving the participants' experience.
  • After step 202, the method may further include decoding the audio code stream sent by the cascade site and the audio code streams sent by the non-cascade sites to obtain the candidate audio data.
  • The above describes how the processing device of the cascade site receives the audio code streams of the cascade site and the non-cascade sites and then adjusts the orientation of the audio data. The following embodiment describes the processing device from the perspective of sending an audio code stream to the first cascade site.
  • FIG. 3 is a flowchart of a method for processing a cascade site in a cascade conference according to an embodiment of the present invention.
  • This embodiment describes the processing flow of the processing device of the cascade site as the sending end. The processing device is connected to the first cascade site and also to non-cascade sites, such as ordinary sites and/or telepresence sites.
  • The embodiment of the present invention includes the following steps:
  • Receive the audio code streams sent by the non-cascade sites: the audio code stream of each ordinary site connected to the processing device of the cascade site, and/or the audio code stream corresponding to each screen of each telepresence site connected to it.
  • The audio code streams corresponding to the screens of a telepresence site are each input as a separate audio code stream; these streams are independent of one another, and the telepresence site transmits them independently.
  • The candidate audio data includes at least the audio code streams sent by the non-cascade sites. The audio data meeting the preset condition is selected from the candidate audio data, and the number selected is less than or equal to a predetermined number, which is preset according to the preset condition.
  • For the preset condition, refer to the description of step 203 in FIG. 2.
  • Process the audio data meeting the preset condition so that different sites occupy different audio channels or audio cascade channels.
  • After selecting the audio data that meets the preset condition, the processing device of the cascade site places the audio of different sites in different audio channels or audio cascade channels to obtain the cascade-site audio data, so that the first cascade site can identify the audio data of each site.
  • Because the audio data is processed separately per audio channel or audio cascade channel, processing a single piece of audio data does not affect the others.
  • Specific processing methods are described in the subsequent embodiments.
  • Encode the cascade-site audio data obtained in the above step; the encoded result is the audio code stream.
  • Send the audio code stream to the first cascade site.
  • The first cascade site is the cascade site directly connected to the processing device of the cascade site.
  • In this embodiment, the processing device of the cascade site places the audio data of the different sites that meets the preset condition in different audio channels or audio cascade channels to form the cascade-site audio data. The first cascade site, as the receiving end, can therefore adjust each piece of audio data directly when its orientation order needs to be adjusted.
  • The present invention further provides another embodiment of a method for processing a cascade site in a cascade conference.
  • In this embodiment, the processing device of the cascade site is connected to the first cascade site and may also be connected to a second cascade site.
  • Only one audio cascade channel is configured; it contains two or more audio channels, each of which carries an audio code stream.
  • This embodiment includes the following steps:
  • Receive the audio code streams sent by the non-cascade sites: the audio code stream of each ordinary site connected to the processing device of the cascade site, and/or the audio code stream corresponding to each screen of each telepresence site connected to it.
  • The audio code streams corresponding to the screens of a telepresence site are each input as a separate audio code stream; these streams are independent of one another, and the telepresence site transmits them independently.
  • When the processing device of the cascade site is connected to the second cascade site, it also receives the audio code stream sent by the second cascade site.
  • The second cascade site is a cascade site directly connected to the processing device of the cascade site.
  • The candidate audio data includes at least the audio code streams sent by the non-cascade sites.
  • In this embodiment, the preset condition may be to retain the audio data of the N loudest parties among the cascade and non-cascade sites. Following this principle, audio data is filtered out of the candidate audio data, and the number selected is less than or equal to the predetermined number.
  • The audio code streams of the non-cascade sites connected to the processing device of the cascade site are compared with the audio code stream sent by the second cascade site, which takes part in the comparison as the audio of one site.
  • The audio code streams of the second cascade site can be superimposed into a single audio stream for the comparison.
  • Alternatively, audio envelope information for the audio code stream of the second cascade site can be sent to the processing device of the cascade site using the Real-time Transport Protocol (RTP). After receiving the audio envelope information, the processing device uses it in the comparison that retains the N loudest parties.
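The first option above, superimposing the second cascade site's streams so that they compete as a single site, might look like the sketch below. It assumes decoded mono frames per channel and mean-square energy as the loudness measure; the site names `T7` and `cascade-2` are hypothetical. The RTP envelope variant is not shown because the text does not specify its format.

```python
from typing import List


def superimpose(channels: List[List[float]]) -> List[float]:
    """Sum several mono frames sample by sample (shorter frames are zero-padded)."""
    length = max((len(c) for c in channels), default=0)
    return [sum(c[i] for c in channels if i < len(c)) for i in range(length)]


def energy(samples: List[float]) -> float:
    """Mean-square energy, used here as the loudness measure."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


# Hypothetical inputs: two local non-cascade sites plus the channels of a second cascade site.
local = {"T5": [0.5, 0.5], "T7": [0.25, -0.25]}
second_cascade_channels = [[0.25, 0.25], [0.25, 0.0]]

candidates = {name: energy(frame) for name, frame in local.items()}
candidates["cascade-2"] = energy(superimpose(second_cascade_channels))
print(sorted(candidates, key=candidates.get, reverse=True))  # ['T5', 'cascade-2', 'T7']
```

The superimposed signal is used only for ranking; the per-channel streams themselves stay separate, so nothing is lost for later orientation adjustment.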
  • If the audio data meeting the preset condition corresponds to a screen of a telepresence site, that audio data is treated as the audio data of one site.
  • Each such screen is transmitted as a separate site through its own audio channel, and a cascade site is likewise transmitted as one site through its own audio channel.
  • If the audio data meeting the preset condition belongs to an ordinary site whose audio is not mono, the audio data of that ordinary site is mixed down to mono.
  • The resulting mono audio data is transmitted through one audio channel.
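The mono downmix above can be as simple as averaging the interleaved channels of each sample frame. Averaging rather than summing is an assumption made here to avoid clipping; the text does not say how the mix-down is performed, and the function name is hypothetical.

```python
from typing import List


def downmix_to_mono(interleaved: List[float], channels: int) -> List[float]:
    """Average each group of `channels` interleaved samples into one mono sample."""
    if channels < 1 or len(interleaved) % channels:
        raise ValueError("sample count must be a multiple of the channel count")
    return [sum(interleaved[i:i + channels]) / channels
            for i in range(0, len(interleaved), channels)]


print(downmix_to_mono([1.0, 0.5, -0.5, -1.0], channels=2))  # [0.75, -0.75]
```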
  • The video code streams to be sent are some or all of the video code streams received by the processing device of the cascade site. Which video code streams are sent to the first cascade site may be selected by the user or decided by the processing device; this is not limited herein.
  • The audio data meeting the preset condition, after its orientation order is adjusted, is used as the cascade-site audio data.
  • Because the audio data meeting the preset condition is adjusted separately for each site, the first cascade site, as the receiving end, can identify its orientation order from the orientation order of the images in the video code stream.
  • The orientation order of the audio data meeting the preset condition is adjusted to match that of the video code streams to be sent. If the video source of a piece of audio data does not appear among the video code streams to be sent, its orientation order is adjusted according to the strategy described for step 204 in FIG. 2. For example, suppose the video code streams to be sent are T2, T1C, and T3R, from left to right, and the audio data meeting the preset condition is T3R, T2, and T1L.
  • Audio data T2 matches video stream T2, so it is placed in the same position, on the left. Audio data T1L has no matching video stream, but video stream T1C comes from the same site, T1, so T1L is placed at the position of the displayed T1C, in the middle. Audio data T3R matches video stream T3R, so it is placed on the right, in the same position as T3R. After adjustment, the audio data is therefore ordered T2, T1L, T3R.
  • step 405 can also be replaced by the following steps:
  • the audio data that meets the preset condition is sorted according to the different audio channels of the different sites, and the sorted audio data is used as the cascading site audio data. The sorting may, for example, be in descending order of loudness, but other orders may be used, which is not limited herein.
  • audio site location information is generated, that is, information on the position order of the audio data that meets the preset condition.
  • the generated audio site location information is sent to the first cascading site; in a specific implementation it may be carried in the padding of RTP packets, although other implementations may be used, which is not limited herein.
  • the cascading site audio data obtained in the above steps is encoded to obtain the audio stream.
  • the audio stream is sent to the first cascade site.
  • the first cascading site is another cascading site directly connected to the processing device of the cascading conference in the cascading conference.
  • the audio data of the corresponding cascading site is also adjusted in orientation order, so that the first cascading site, as the receiving end, can identify the orientation order of the audio data that meets the preset condition from the orientation order of the video stream, and can therefore adjust each piece of audio data individually.
  • the processing device of the cascading site in the cascade conference in this embodiment processes the audio data that meets the preset condition so that different sites occupy different audio channels or audio cascade channels, obtaining the cascading site audio data; the first cascading site, as the receiving end, can then adjust the audio data individually whenever its orientation order needs adjusting.
  • the present invention provides another embodiment of a method for processing a cascading site in a cascade conference.
  • the processing device is also connected to non-cascading sites.
  • two or more audio cascade channels are set up, unlike the prior art, which has only one; that is, audio streams are transmitted separately on each audio cascade channel.
  • Embodiments of the invention include:
  • step 401: the content of this step is the same as that of step 401 in the previous embodiment; see that step for details, which are not repeated here.
  • after the audio stream sent by the non-cascading site and the audio stream sent by the second cascading site are obtained, the audio streams can be decoded. It should be noted that decoding is an optional implementation.
  • the audio data to be selected then specifically includes the results of decoding the audio streams sent by the non-cascading site and by the second cascading site.
  • audio data may be selected from the audio data to be selected according to the principle of retaining the audio data of the loudest parties; the number of pieces selected is at most a predetermined number.
  • the audio data that meets the preset condition is processed so that different sites occupy different audio cascade channels, to obtain the cascading site audio data.
  • that is, each audio cascade channel carries the audio data of only one site.
  • the audio data that meets the preset condition is used as the cascading site audio data.
  • each site that meets the preset condition is treated as a separate site whose audio data is transmitted on its own audio cascade channel; because the cascading site is provided with multiple audio cascade channels in this embodiment,
  • the audio data of each site can be processed per audio cascade channel, and each audio cascade channel may be mono, two-channel, three-channel, or have more channels, which is not limited herein.
  • audio cascade channel composition information is generated; it indicates the number of audio cascade channels established by the cascading site, so that the receiving end can obtain the number of audio cascade channels occupied by the received audio stream.
  • the audio data that meets the preset condition in the above steps is encoded into an audio stream,
  • which is sent to the first cascading site.
  • because the processing device is provided with multiple audio cascade channels, and which sites' audio meets the preset condition can change at any moment, in this embodiment
  • the audio cascade channel composition information needs to be sent to the first cascading site.
  • the processing device in this embodiment sends the audio data that meets the preset condition on different audio cascade channels for different sites, so that
  • the processing device at the receiving end can adjust audio data individually when its orientation needs adjusting.
  • FIG. 6 shows the structure of cascading sites in a cascade conference according to an embodiment of the present invention, in which one audio cascade channel is provided that contains, for example, two or more audio channels:
  • the cascade conference has two MCUs, MCU1 and MCU2, with MCU1 connected to MCU2.
  • MCU1 is connected to one ordinary site and two telepresence sites; FIG. 6 shows the state before the audio stream order is adjusted.
  • the ordinary site is T2.
  • the two telepresence sites are T1 and T3.
  • telepresence sites T1 and T3 each have three screens, T1L, T1C, T1R and T3L, T3C, T3R respectively. MCU2, as the cascading site of MCU1, is connected to MCU1.
  • MCU2 is connected to two ordinary sites and one telepresence site. As shown in FIG. 6, the ordinary sites are T5 and T6, and the telepresence site is T4, which has three screens: T4L, T4C, and T4R.
  • each MCU supports retaining the audio of at most three sites.
  • each MCU selects the audio data of the three loudest sites from all connected sites (including ordinary sites, telepresence sites, and cascading sites).
  • MCU1 can receive the audio stream sent by T1L, T1C, T1R, T2 and T3L, T3C and T3R.
  • MCU2 can receive the audio streams sent by T4L, T4C, T4R, T5, and T6, as well as the cascading site audio stream and cascading site video stream sent by MCU1, as shown in FIG. 6.
  • the cascading site video code stream sent by the MCU1 to the MCU2 is T2, T1C, and T3R.
  • this embodiment does not describe how MCU1 sends cascading site media data to T1L, T1C, T1R, T2, and T3L, T3C, T3R; it describes only how MCU1 sends the cascading site audio stream to MCU2.
  • MCU1 decodes the audio streams to obtain the site audio data of T1L, T1C, T1R, T2, and T3L, T3C, T3R, which is used as the audio data to be selected.
  • audio data is filtered from the audio data to be selected by retaining the three loudest parties; assume the filtered audio data is T1C, T2, T3R.
  • the sources of the cascading site video stream and of the selected audio data are identical: the video streams T2, T1C, T3R and the filtered audio data T1C, T2, T3R come from the same sources.
  • the orientation order of the selected audio data can therefore be adjusted to follow the order of the cascading site video stream; after adjustment, the orientation order of the selected audio data is the same as that of the cascading site video stream.
  • the adjusted audio data is used as the cascading site audio data and encoded to obtain MCU1's audio stream carrying T1C, T2, and T3R, which MCU2 can identify.
  • if the audio data filtered by retaining the three loudest parties is not T1C, T2, T3R, that is, the sources of the filtered audio data differ from those of the cascading video stream, a site whose audio is selected but whose video is not sent is defined as an invisible site, and its audio data can be adjusted according to the policy described for step 204 in FIG. 2. From the perspective of MCU2 as the receiving end: MCU2 receives the audio stream sent by MCU1 and the audio streams sent by the non-cascading sites T4, T5, and T6 connected to it, and then selects the audio data that meets the preset condition from the audio data to be selected.
  • the selection proceeds in the same way as in MCU1 and is not described again here. Finally, the orientation order of the selected audio data can be adjusted.
  • the specific adjustment strategy has been described in the embodiment of FIG. 2 and is not repeated here.
  • adjusting the orientation of one piece of audio data does not affect other audio data, so a one-to-one correspondence between the image orientation and the sound orientation of each site in the cascade conference is achieved, improving the participants' user experience.
  • another specific example follows. FIG. 7 shows the structure of cascading sites in a cascade conference according to another embodiment of the present invention. In this embodiment, two or more audio cascade channels are set up, that is, audio streams are transmitted separately on each audio cascade channel, as shown in FIG. 7:
  • the cascade conference has two MCUs, MCU1 and MCU2; MCU1 is connected to MCU2 by four audio cascade channels and four video cascade channels.
  • MCU1 is connected to two ordinary sites and two telepresence sites. As shown in FIG. 7, the ordinary sites are T2 and T7, and the telepresence sites are T1 and T3.
  • MCU2, as the cascading site of MCU1, is connected to MCU1.
  • MCU2 is connected to two ordinary sites and one telepresence site. As shown in FIG. 7, the ordinary sites are T5 and T6, and the telepresence site is T4.
  • the cascaded video sources between MCU1 and MCU2 are T2, T1C, T3R, and T7.
  • each MCU supports retaining the audio of at most four sites.
  • each MCU selects the audio data of the four loudest sites from all connected sites (including ordinary sites, telepresence sites, and cascading sites).
  • there are multiple audio cascade channels between MCU1 and MCU2; their number may be determined by the requirements of the cascade conference.
  • the audio data for the audio cascade channels is also filtered by retaining the loudest parties.
  • here there are four audio cascade channels, and each may be mono, two-channel, three-channel, or have more channels, which is not limited herein.
  • the audio of a telepresence site can be treated as the audio data of one site, with its audio cascade channel set to two or three channels so that one audio cascade channel can carry all the audio of the telepresence site.
  • MCU1 can receive the audio streams sent by T1, T2, T3, and T7, and MCU2 can receive the audio streams sent by T4, T5, and T6, as well as the cascading site audio stream and cascading site video stream sent by MCU1, as shown in FIG. 7. In this embodiment, the cascading site video stream that MCU1 sends to MCU2 is T2, T1C, T3R, T7. This embodiment does not describe how MCU1 sends cascading site media data to T1, T2, T3, and T7; it describes only how MCU1 sends the cascading site audio stream to MCU2.
  • MCU1 decodes the audio streams to obtain the site audio data of T1, T2, T3, and T7 and uses it as the audio data to be selected; by retaining the audio data of at most four sites, it selects T1, T2, T3, and T7 as the cascading site audio data. The four pieces of audio data are then loaded into their respective audio cascade channels; if a selected piece is the audio data of a telepresence site, its audio channels are loaded into a multichannel audio cascade channel.
  • the cascading site audio data is encoded to obtain the cascading site audio stream, which is sent to cascading site MCU2 together with the cascading site video stream.
  • the audio cascade channel composition information, from which the receiving end obtains the number of audio cascade channels occupied by the received audio stream, may be sent as RTP padding information, but the method is not limited thereto.
  • after MCU2 receives the data of the loudest sites on the cascade audio channels and adds the audio data of the sites directly connected to it, MCU2 effectively receives independent data for each of the sites T7, T1, T2, T3, T4, T5, and T6.
  • according to the video streams to be displayed by the MCU, the orientation of each site's audio stream is adjusted so that the orientation order of each site's video stream and that of its audio stream correspond one-to-one.
  • for the processing, refer to the embodiment shown in FIG. 2; details are not repeated here.
  • the processing device of the cascading conference site in the cascading conference provided by the embodiment of the present invention can be used as an MCU.
  • an example of a processing device of a cascade conference site in a cascade conference includes:
  • the receiving unit 801 is configured to receive the audio stream sent by a cascading site, in which different sites occupy different audio channels or audio cascade channels, and is further configured to receive the audio stream sent by a non-cascading site;
  • the selecting unit 802 is configured to select the audio data that meets the preset condition from the audio data to be selected, which includes the audio streams sent by the cascading site and by the non-cascading site;
  • the order adjustment unit 803 is configured to adjust the orientation order of the audio data selected by the selecting unit 802.
  • because the audio stream received by the processing device carries different sites on different audio channels or audio cascade channels, when the device needs to adjust the orientation order of the audio data it can adjust the audio data concerned individually; adjusting the orientation of one piece of audio data does not affect other audio data, so the image orientation and sound orientation of each site in the cascade conference correspond one-to-one, improving the participants' user experience.
  • when different sites occupy different audio cascade channels in the audio stream sent by the cascading site, the receiving unit 801 is further configured to receive the audio cascade channel composition information sent by the cascading site, which indicates the number of audio cascade channels established by the cascading site, so as to obtain the number of audio cascade channels occupied by the received audio stream.
  • the processing device of the cascading conference in the cascading conference may further include: a decoding unit, configured to decode the audio code stream sent by the cascading venue and the audio code stream sent by the non-cascading venue.
  • the above describes how the processing device, after receiving the audio stream from the cascading site, adjusts the orientation of the audio data that meets the preset condition.
  • the following describes the processing device from the perspective of sending an audio stream to a cascading site; see FIG. 9:
  • the receiving unit 901 is configured to receive the audio stream sent by a non-cascading site, and is further configured to receive the audio stream sent by the second cascading site;
  • the decoding unit 902 is configured to decode the audio code stream received by the receiving unit 901.
  • the selecting unit 903 is configured to select audio data that meets the preset condition from the audio data to be selected, where the audio data to be selected specifically includes the decoding result of the decoding unit 902;
  • the processing unit 904 is configured to process the audio data selected by the selecting unit 903 so that different sites occupy different audio channels or audio cascade channels, to obtain cascading site audio data that the first cascading site can identify;
  • the encoding unit 905 is configured to encode the processing result of the processing unit 904 to obtain an audio stream;
  • the sending unit 906 is configured to send the audio code stream to the first cascading venue.
  • the processing device in this embodiment processes the audio data that meets the preset condition so that different sites occupy different audio channels or audio cascade channels, obtaining the cascading site audio data; the first cascading site, as the receiving end, can then adjust the audio data individually whenever its orientation order needs adjusting.
  • as shown in FIG. 10, the processing device of the cascading site in the cascade conference in this embodiment includes: a receiving unit 1001, configured to receive the audio stream sent by a non-cascading site and the audio stream sent by the second cascading site;
  • the decoding unit 1002 is configured to decode the audio code stream received by the receiving unit 1001.
  • the selecting unit 1003 is configured to select audio data that meets the preset condition from the audio data to be selected, where the audio data to be selected specifically includes the decoding result of the decoding unit 1002;
  • the processing unit 1004 includes: a site identification module 10041, configured to use the audio data corresponding to a screen of a telepresence site as the audio data of a separate site when the audio data that meets the preset condition is such screen audio; an audio module 10042, configured to mix the audio data of an ordinary site into mono audio data when the audio data that meets the preset condition belongs to an ordinary site whose audio is not mono; and an association module 10043, configured to adjust, according to the orientation order of the video stream to be sent, the orientation order of the audio data that meets the preset condition for the different audio channels of the different sites, and to use the adjusted audio data as the cascading site audio data;
  • the encoding unit 1005 is configured to encode the cascading venue audio data acquired by the processing unit 1004 to obtain an audio code stream;
  • the sending unit 1006 is configured to send an audio code stream to the first cascading venue.
  • the processing unit 1004 may omit the association module 10043 and instead include: a sorting module, configured to sort the audio data that meets the preset condition according to the different audio channels of the different sites and use the sorted audio data as the cascading site audio data; and a generating module, configured to generate audio site location information, that is, the position order of the audio data that meets the preset condition.
  • in that case, the sending unit 1006 is further configured to send the audio site location information to the first cascading site.
  • two or more audio cascade channels may be set up, that is, the audio streams are transmitted separately on each audio cascade channel; in that case
  • the processing device of the cascading site in the cascade conference includes the following units:
  • a unit configured to generate audio cascade channel composition information, which indicates the number of audio cascade channels established by the cascading site, so that the receiving end can obtain the number of audio cascade channels occupied by the received audio stream;
  • the sending unit 1006 in this embodiment, which is further configured to send the audio cascade channel composition information to the first cascading site.
  • the processing device processes the audio data that meets the preset condition so that different sites occupy different audio channels or audio cascade channels, obtaining the cascading site audio data; this enables the first cascading site, as the receiving end, to adjust the audio data individually whenever its orientation order needs adjusting.
  • the embodiment of the invention further provides a cascade conference system, including: a processing device of the cascade conference site in the cascade conference shown in FIG. 8;
  • the audio streams received by the cascade conference system carry different sites on different audio channels or audio cascade channels, so when the system needs to adjust the orientation order of audio data, it can adjust the audio data concerned individually; adjusting the orientation of one piece of audio data does not affect other audio data, so the image orientation and sound orientation of each site in the cascade conference correspond one-to-one, which improves the participants' user experience.
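The alternative to step 405 in the list above (sort the selected audio, generate audio site location information, send it to the first cascading site) can be sketched in Python. This is a minimal illustration under stated assumptions: the `SiteAudio` type, the loudness field, and both function names are hypothetical, and the document does not define a data format for the location information. Here entry *i* of the list simply names the site carried on audio channel *i*.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SiteAudio:
    site_id: str     # hypothetical label such as "T2" or "T1L"
    loudness: float  # any monotonic loudness measure (assumed, e.g. RMS level)


def build_location_info(selected: list[SiteAudio]) -> list[str]:
    """Sender side: sort loudest-first; entry i names the site on audio channel i."""
    ordered = sorted(selected, key=lambda a: a.loudness, reverse=True)
    return [a.site_id for a in ordered]


def attach_channels(channels: list[list[float]], location_info: list[str]) -> dict[str, list[float]]:
    """Receiver side: map each decoded channel back to the site it belongs to."""
    if len(channels) != len(location_info):
        raise ValueError("channel count does not match the location information")
    return dict(zip(location_info, channels))


if __name__ == "__main__":
    info = build_location_info([SiteAudio("T3R", 0.2), SiteAudio("T2", 0.9), SiteAudio("T1L", 0.5)])
    print(info)  # ['T2', 'T1L', 'T3R']
```

Because the receiver gets an explicit channel-to-site map, it can re-pan any single site without touching the others.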

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Description

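Method, Apparatus and System for Processing a Cascading Site in a Cascade Conference

This application claims priority to Chinese Patent Application No. 201010605183.1, filed with the Chinese Patent Office on December 24, 2010 and entitled "Method, Apparatus and System for Processing a Cascading Site in a Cascade Conference", which is incorporated herein by reference in its entirety.

TECHNICAL FIELD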
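The present invention relates to the field of communications technologies, and in particular, to a method, an apparatus, and a system for processing a cascading site in a cascade conference.

BACKGROUND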
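In a typical video conference, the meeting takes place among ordinary sites under one multipoint control unit (MCU, Multipoint Control Unit); that is, all ordinary sites in the conference are connected to the same MCU. However, as conference capacity grows and networking becomes more complex, a cascade conference is needed: not only do the sites under each MCU join the conference, but the conferences on multiple MCUs are also joined into one conference through cascading sites between the MCUs, so that the sites of multiple MCUs can meet together. For example, if an organization needs to hold a national conference and has MCUs and sites in Beijing, every provincial capital, every city, and every county, it can hold a nationwide cascade conference with MCUs deployed in Beijing, the provincial capitals, and the cities, and each site simply connects to the MCU it belongs to. Because there are many participating sites scattered across different places, a cascade conference lets each site connect only to its nearest MCU, which reduces the demands on the network.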
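The following specific example describes an existing method for processing a cascading site in a cascade conference. As shown in FIG. 1, in a cascade conference that includes telepresence sites, MCU1 is connected to three sites: telepresence sites T1 and T3 and ordinary site T2. Telepresence site T1 has three screens, T1L, T1C, and T1R, and telepresence site T3 has three screens, T3L, T3C, and T3R. MCU2 is connected to three sites: telepresence sites T4 and T6 and ordinary site T5. Telepresence site T4 has three screens, T4L, T4C, and T4R, and telepresence site T6 has three screens, T6L, T6C, and T6R.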
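Assume that each MCU supports retaining the audio data of at most the two loudest parties; that is, from all connected sites (including ordinary sites, telepresence sites, and cascading sites), the MCU selects the audio data of at most the two loudest sites for mixing. If fewer than two sites are connected, the data of all connected sites is selected for mixing.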
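Suppose MCU1 and MCU2 are cascaded to hold a conference, the cascade audio channel is T12, and the cascade video channel carries one video stream, the center screen T1C of site T1. Conference mixing proceeds as follows. In MCU1, assume the cascade channel carries a mix of at most two parties and that at that moment the two loudest parties on MCU1 are sites T1 and T2; the mixed stream that MCU1 outputs on the cascade audio channel to MCU2 is then T12 = T1 + T2. In MCU2, assume that at that moment the two loudest parties are the cascading site T12 and T5. If telepresence site T4 displays the images T1C, T5, and T6R, and telepresence site T6 displays T4L, T1C, and T5, then sites T4 and T6 hear the following: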
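Site T4 hears T12 + T5, that is, T1 + T2 + T5. Because T4 is a telepresence site, its three screens show the center screen of site T1 (T1C), T5, and the right screen of site T6 (T6R). Users want the direction of the sound heard at T4 to match the direction of the images seen there: T1 should be heard on the left, T5 in the center, and T6 on the right. Because each site's own sound has a certain direction that does not necessarily match the direction in which its image is displayed, MCU2 must process the sound heard by T4: it adjusts the direction of each site's sound to the direction of the corresponding image and then mixes the result for output to site T4. In this way, the direction of the sound heard at T4 matches the direction of the images.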
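Telepresence site T6 has the same problem as T4: the direction of the sound it hears (T12 + T5) must be adjusted to match the direction of the images it sees. Because sites T4, T5, and T6 are directly connected to MCU2, MCU2 can process their audio data directly and adjust it separately for sites T4 and T6.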
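In the above prior-art solution, T12 is a cascading site whose audio data is the mixing result of the upper-level MCU, that is, the sum of the data of sites T1 and T2. Both T4 and T6 display the image of T1C, but at different positions. If the audio direction of T1 is adjusted according to the position at which each site displays its image, the direction of T2's sound is adjusted at the same time, because the data of T1 and T2 cannot be separated. Since the two sites see T1's image at different positions, T4 and T6 inevitably hear T2 from different directions, so a one-to-one correspondence between image direction and sound direction at each site of the cascade conference cannot be achieved.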
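The problem can be made concrete with a small numeric sketch. It assumes constant-power stereo panning (the document does not name a panning law) and works on single made-up samples: once T1 and T2 are summed into T12, panning T12 toward T1's image position drags T2 along, whereas separate streams let each source be placed on its own.

```python
import math


def pan(sample: float, position: float) -> tuple[float, float]:
    """Constant-power pan: position -1.0 = hard left, 0.0 = centre, +1.0 = hard right."""
    angle = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return sample * math.cos(angle), sample * math.sin(angle)


def mix(*stereo: tuple[float, float]) -> tuple[float, float]:
    """Sum several stereo samples channel by channel."""
    return sum(s[0] for s in stereo), sum(s[1] for s in stereo)


t1, t2 = 1.0, 0.5  # one sample from site T1 and one from site T2 (made-up values)

# Prior art: only the mix T12 = T1 + T2 exists, so panning it left moves T2 too.
mixed = pan(t1 + t2, -1.0)

# Separate streams: T1 is panned left while T2 stays centred.
separate = mix(pan(t1, -1.0), pan(t2, 0.0))

print(mixed)     # right channel is 0.0: T2 has been forced hard left
print(separate)  # right channel is about 0.354: T2 is still heard in the centre
```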
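As can be seen from the above, the audio data of a cascading site is the mixing result of the upper-level MCU, and its sound direction often does not match the position of the displayed site image. Because the mixed data cannot be separated, when the audio direction is adjusted for different display screens, the audio corresponding to a given screen cannot be adjusted individually; instead, the whole mix is adjusted uniformly, so audio whose direction should not change is adjusted as well. The image direction and sound direction of each site in the cascade conference therefore cannot be matched one-to-one, which degrades the participants' user experience.

SUMMARY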
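Embodiments of the present invention provide a method, an apparatus, and a system for processing a cascading site in a cascade conference, which achieve a one-to-one correspondence between the image orientation and the sound orientation of each site in the cascade conference and improve the participants' user experience.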
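A method for processing a cascading site in a cascade conference provided by an embodiment of the present invention includes: receiving an audio stream sent by a cascading site, where different sites occupy different audio channels or audio cascade channels in that stream; receiving an audio stream sent by a non-cascading site; selecting, from audio data to be selected, audio data that meets a preset condition, where the audio data to be selected includes the received audio streams sent by the cascading site and by the non-cascading site; and adjusting the orientation order of the audio data that meets the preset condition.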
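Another method for processing a cascading site in a cascade conference provided by an embodiment of the present invention includes: receiving an audio stream sent by a non-cascading site; selecting, from audio data to be selected, audio data that meets a preset condition, where the audio data to be selected includes at least the received audio stream sent by the non-cascading site; processing the audio data that meets the preset condition so that different sites occupy different audio channels or audio cascade channels, to obtain cascading site audio data that a first cascading site can identify; encoding the cascading site audio data to obtain an audio stream; and sending the audio stream to the first cascading site.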
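An apparatus for processing a cascading site in a cascade conference provided by an embodiment of the present invention includes: a receiving unit, configured to receive an audio stream sent by a cascading site, where different sites occupy different audio channels or audio cascade channels in that stream, and further configured to receive an audio stream sent by a non-cascading site; a selecting unit, configured to select, from audio data to be selected, audio data that meets a preset condition, where the audio data to be selected includes the received audio streams sent by the cascading site and by the non-cascading site; and an order adjustment unit, configured to adjust the orientation order of the audio data that meets the preset condition.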
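Another apparatus for processing a cascading site in a cascade conference provided by an embodiment of the present invention includes: a receiving unit, configured to receive an audio stream sent by a non-cascading site; a selecting unit, configured to select, from audio data to be selected, audio data that meets a preset condition, where the audio data to be selected includes at least the received audio stream sent by the non-cascading site; a processing unit, configured to process the audio data that meets the preset condition so that different sites occupy different audio channels or audio cascade channels, to obtain cascading site audio data that a first cascading site can identify; an encoding unit, configured to encode the cascading site audio data to obtain an audio stream; and a sending unit, configured to send the audio stream to the first cascading site.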
The present invention further provides a cascade conference system.
The foregoing technical solutions show that the embodiments of the present invention have the following advantages:
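Because the audio stream that the processing device receives from the cascading site carries different sites on different audio channels or audio cascade channels, when the processing device needs to adjust the orientation order of the audio data, it can adjust the orientation of the audio data concerned individually; that is, adjusting the orientation of one piece of audio data does not affect other audio data. This achieves a one-to-one correspondence between the image orientation and the sound orientation of each site in the cascade conference and improves the participants' user experience.

BRIEF DESCRIPTION OF DRAWINGS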
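To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and persons skilled in the art may derive other drawings from them.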
FIG. 1 is a structural diagram of a cascading site in a cascade conference in the prior art;
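FIG. 6 is a structural diagram of a cascading site in a cascade conference according to an embodiment of the present invention;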
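FIG. 7 is a structural diagram of a cascading site in another cascade conference according to an embodiment of the present invention;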
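FIG. 8 is a schematic diagram of an apparatus for processing a cascading site in a cascade conference according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an apparatus for processing a cascading site in another cascade conference according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of an apparatus for processing a cascading site in another cascade conference according to an embodiment of the present invention.

DETAILED DESCRIPTION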
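Embodiments of the present invention provide a method, an apparatus, and a system for processing a cascading site in a cascade conference, which achieve a one-to-one correspondence between the image orientation and the sound orientation of each site in the cascade conference and improve the participants' user experience.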
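To make the objectives, features, and advantages of the present invention clearer and easier to understand, the following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by persons skilled in the art based on the embodiments of the present invention shall fall within the protection scope of the present invention. FIG. 2 shows the flow of a method for processing a cascading site in a cascade conference according to an embodiment of the present invention. This embodiment describes the processing flow of the processing device acting as the receiving end. The processing device is connected to a cascading site and also to non-cascading sites, such as ordinary sites and/or telepresence sites. This embodiment includes the following steps: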
201. Receive the audio stream sent by the cascading site.
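Specifically, the processing device may receive the audio stream sent by a cascading site connected to it. In this embodiment, the received audio stream carries different sites on different audio channels or audio cascade channels, unlike the prior art, in which the cascading site mixes the audio streams before sending them.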
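In this embodiment, only one audio cascade channel may be set up, containing two or more audio channels; in that case, different sites occupy different audio channels, that is, the audio streams of different sites are transmitted on different audio channels.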
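Alternatively, two or more audio cascade channels may be set up; in that case, different sites occupy different audio cascade channels, that is, the audio stream of a different site is transmitted on each audio cascade channel. When the audio streams are sent this way, this embodiment further includes: receiving audio cascade channel composition information sent by the cascading site, where the composition information indicates the number of audio cascade channels established by the cascading site, so that the number of audio cascade channels occupied by the received audio stream can be obtained.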
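The second variant of step 201, one site per audio cascade channel plus composition information announcing how many channels exist, might be modelled as follows. This is a hedged sketch: `CascadeChannel`, the per-site channel-count mapping, and the function names are illustrative assumptions, and the site IDs follow the FIG. 7 example in this document (telepresence sites T1 and T3 on three-channel cascade channels, ordinary sites T2 and T7 on mono ones).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CascadeChannel:
    index: int           # position of the audio cascade channel (hypothetical numbering)
    site_id: str         # the single site whose audio this channel carries
    audio_channels: int  # 1 = mono, 2 = stereo, 3 = e.g. one per telepresence screen


def assign_cascade_channels(selected: dict[str, int]) -> list[CascadeChannel]:
    """Sender side: give every selected site its own audio cascade channel.

    `selected` maps a site ID to the number of audio channels its audio needs.
    """
    return [CascadeChannel(i, site, n) for i, (site, n) in enumerate(selected.items())]


def composition_info(channels: list[CascadeChannel]) -> int:
    """Composition information as the text defines it: the number of audio cascade channels."""
    return len(channels)


def check_received(streams: list[bytes], announced_count: int) -> list[bytes]:
    """Receiver side: verify the received streams match the announced channel count."""
    if len(streams) != announced_count:
        raise ValueError(f"expected {announced_count} audio cascade channels, got {len(streams)}")
    return streams


channels = assign_cascade_channels({"T1": 3, "T2": 1, "T3": 3, "T7": 1})
print(composition_info(channels))  # 4
```

Since the set of loudest sites can change at any moment, the sender would re-run the assignment and resend the composition information whenever it changes.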
202. Receive the audio stream sent by the non-cascading site.
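Specifically, the processing device may receive the audio stream sent by an ordinary site connected to it, and/or the audio streams corresponding to the individual screens of a telepresence site connected to it. In this embodiment, the audio streams corresponding to the individual screens of a telepresence site are each input as a single audio stream; these streams are independent of each other, and the telepresence site sends them independently.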
It should be noted that steps 201 and 202 have no fixed order in this embodiment: step 201 may be performed before step 202, step 202 before step 201, or both simultaneously. This is not limited herein.
203. Select, from the audio data to be selected, audio data that meets the preset condition.
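Audio data that meets the preset condition is selected from the audio data to be selected; each site's audio data takes part in this selection as one piece of audio data. The audio data to be selected includes the received audio streams sent by the cascading site and by the non-cascading site. The number of selected pieces of audio data is less than or equal to a predetermined number, which is set in advance according to the preset condition. In this embodiment, the preset condition may specifically be to retain the audio data of the several loudest parties among the cascading and non-cascading sites. For example, when the audio data of the three loudest parties is retained, the first three pieces of audio data in descending order of volume are selected from all the audio data. According to this principle, the audio data that meets the preset condition is filtered out of the audio data, and the number of pieces selected is at most the predetermined number.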
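The preset condition may also be to retain the audio data of preset sites, for example, specific sites set in advance, such as one or more sites specified by the user. In that case, meeting the preset condition means being the audio data of a preset site. For example, suppose the audio data to be selected includes the audio data of ordinary site T1, ordinary site T2, telepresence site T3, and telepresence site T4, and the preset condition is to retain only the audio data of the user-specified sites T1 and T2. After filtering, the audio data of T1 and T2 is retained, while that of T3 and T4 is not, because they are not user-specified sites.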
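It should be noted that in practical applications the preset condition may be another condition, as long as it can filter the audio data of different sites. This is not limited herein.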
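In this embodiment, the predetermined number is set in advance according to how many of the loudest parties the processing device retains: if it supports retaining the audio data of at most two parties, the predetermined number is 2; if at most three, it is 3. For example, when the processing device supports retaining at most three parties and three or more sites (including ordinary sites, telepresence sites, and cascading sites) are connected to it, the three loudest pieces of audio data may be selected as the cascading site audio data; if fewer than three sites are connected, fewer than three pieces can be selected.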
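The two preset conditions of step 203 (loudest parties, or user-specified sites) and the cap set by the predetermined number can be sketched in Python. The RMS loudness measure and the function names are assumptions for illustration; the document only says "loudest parties" and does not define how loudness is measured.

```python
from __future__ import annotations

import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level, used here as an assumed loudness measure."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def select_loudest(levels: dict[str, float], max_parties: int) -> list[str]:
    """Keep at most `max_parties` sites, loudest first (fewer if fewer sites are connected)."""
    ranked = sorted(levels, key=lambda site: levels[site], reverse=True)
    return ranked[:max_parties]


def select_preset(levels: dict[str, float], preset_sites: set[str]) -> list[str]:
    """Keep only the sites the user specified in advance, in input order."""
    return [site for site in levels if site in preset_sites]


levels = {"T1": 0.3, "T2": 0.8, "T3": 0.1, "T4": 0.5}
print(select_loudest(levels, 3))             # ['T2', 'T4', 'T1']
print(select_preset(levels, {"T1", "T2"}))   # ['T1', 'T2']
```

Slicing the ranked list naturally yields fewer than `max_parties` entries when fewer sites are connected, which matches the behaviour the text describes.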
204. Adjust the orientation order of the audio data that meets the preset condition.
After selecting the audio data that meets the preset condition, the processing device adjusts its orientation order. This may specifically be implemented as follows:
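If only one screen of a non-cascading site's video source is displayed, either on one of several screens or as the picture at one position in a multi-picture layout, the audio data corresponding to all screens of that video source is output in the same orientation as the displayed screen among the several screens, or at the position of that picture in the multi-picture layout. For example, telepresence site T1 has three screens, T1L, T1C, and T1R, and only T1L is displayed; the audio data of all three screens is then output in the same orientation as T1L among the several screens, or at the position of T1L's picture in the multi-picture layout.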
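If two or more screens of a non-cascading site's video source are displayed, the audio data of the displayed screens is output in an orientation order that corresponds one-to-one to the orientation order of those screens, and the audio data of each undisplayed screen is output in the same orientation as one of the displayed screens. For example, telepresence site T1 has three screens, T1L, T1C, and T1R; only T1L and T1C are displayed and T1R is not. The audio of T1L and T1C is then output in the same orientation order as the displayed T1L and T1C, and the audio of the undisplayed T1R may take the orientation of either of the displayed screens T1L and T1C.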
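The first two rules of step 204, together with the invisible-site fallback described later in this step, can be sketched as a mapping from audio sources to display positions. The `(site, screen)` tuple encoding, the choice of the leftmost displayed screen for an undisplayed one (the text allows any displayed screen of the same site), and the function name are assumptions. The usage reproduces the example in the bullet list earlier in this document: video T2, T1C, T3R and audio T3R, T2, T1L give the audio order T2, T1L, T3R.

```python
from __future__ import annotations

from typing import Optional

# (site ID, screen label); the label is None for an ordinary single-screen site.
Source = tuple[str, Optional[str]]


def audio_positions(
    video_layout: list[Source],
    audio_sources: list[Source],
    invisible_position: Optional[int] = None,
) -> dict[Source, Optional[int]]:
    """Map each audio source to a display position (0 = leftmost).

    A source whose own screen is shown takes that position; a source whose site
    has another screen shown takes that screen's position (the leftmost one here,
    an arbitrary choice); any other source belongs to an invisible site and gets
    `invisible_position` (None meaning "keep its own orientation").
    """
    screen_pos: dict[Source, int] = {}
    site_pos: dict[str, int] = {}
    for pos, src in enumerate(video_layout):
        screen_pos.setdefault(src, pos)
        site_pos.setdefault(src[0], pos)

    result: dict[Source, Optional[int]] = {}
    for src in audio_sources:
        if src in screen_pos:
            result[src] = screen_pos[src]
        elif src[0] in site_pos:
            result[src] = site_pos[src[0]]
        else:
            result[src] = invisible_position
    return result


layout = [("T2", None), ("T1", "C"), ("T3", "R")]
positions = audio_positions(layout, [("T3", "R"), ("T2", None), ("T1", "L")])
print(sorted(positions, key=lambda s: positions[s]))  # [('T2', None), ('T1', 'L'), ('T3', 'R')]
```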
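If one screen of a non-cascading site's video source is displayed simultaneously in several multi-picture layouts, or in a multi-picture layout and on an independent screen, the priority of the output orientation of its audio data is, from highest to lowest: the orientation of the independent screen, the orientation of the screen with the larger sub-picture, and then the orientation of the screen by center, left, right priority. For example, telepresence site T1 has three screens, T1L, T1C, and T1R, and screen T1L is displayed in several multi-picture layouts, or in a multi-picture layout and on an independent screen; the priority of the output orientation of T1L's audio data is then: the independent screen, the screen with the larger sub-picture, and then center, left, right.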
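The third rule's priority order maps naturally onto a sort key. In this sketch, `Appearance`, its fields, and the relative-area measure of a sub-picture are assumptions; the document only states the priority order itself.

```python
from __future__ import annotations

from dataclasses import dataclass

# Final tie-break from the text: centre first, then left, then right.
POSITION_RANK = {"center": 0, "left": 1, "right": 2}


@dataclass(frozen=True)
class Appearance:
    position: str           # "left", "center" or "right"
    independent: bool       # True if shown alone on an independent screen
    subpicture_area: float  # relative size of the sub-picture in a multi-picture layout


def output_position(appearances: list[Appearance]) -> str:
    """Pick the audio orientation for a screen that is shown in several places.

    Priority, highest first: independent screen, larger sub-picture,
    then centre > left > right.
    """
    if not appearances:
        raise ValueError("the screen is not displayed anywhere")
    best = min(
        appearances,
        key=lambda a: (not a.independent, -a.subpicture_area, POSITION_RANK[a.position]),
    )
    return best.position


print(output_position([Appearance("left", False, 0.25), Appearance("right", True, 1.0)]))  # right
```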
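It should be noted that the foregoing is only one way of adjusting the orientation of audio data according to the orientation order of the video sources; other implementations may be used, as long as the output audio data follows a certain orientation order. For example, if the audio data of a non-cascading site meets the preset condition and is retained but the site's image is not being viewed, that is, the site is an invisible site, the adjustment strategy may be to keep the site's own orientation order, or to place its audio at a fixed position, such as the center or both sides; and because its image is not seen, its audio may also be fixed at some position off-screen.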
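In this embodiment, because the audio stream received from the cascading site carries different sites on different audio channels or audio cascade channels, when the processing device needs to adjust the orientation order of the audio data, it can adjust the audio data concerned individually; that is, adjusting the orientation of one piece of audio data does not affect other audio data. This achieves a one-to-one correspondence between the image orientation and the sound orientation of each site in the cascade conference and improves the participants' user experience.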
需要说明的是, 在图 2所示的实施例中, 步骤 202之后还可以包括: 对 级联会场发送的音频码流和非级联会场发送的音频码流进行解码, 则待选择 的音频数据具体包括: 对级联会场发送的音频码流和非级联会场发送的音频 码流进行解码的结果。
上述实施例中描述的是级联会议中级联会场的处理装置接收到级联会场 发送的音频码流以及非级联会场发送的音频码流后, 对音频数据进行方位顺 序的调整。 下面对级联会议中级联会场的处理装置向第一级联会场发送音频 码流的角度出发进行描述, 请参阅如下实施例。 图 3描述的本发明一个实施例提供的级联会议中级联会场的处理方法的流程, 该实施例描述的是级联会议中级联会场的处理装置的处理流程, 该级联会议 中级联会场的处理装置与第一级联会场连接, 还与非级联会场如: 普通会场 和 /或远程呈现会场相连, 本发明实施例包括如下步骤:
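301. Receive the audio streams sent by non-cascade sites.
Specifically, the processing apparatus may receive the audio stream sent by an ordinary site connected to it, and/or the audio streams corresponding to the individual screens of a telepresence site connected to it. In this embodiment, the audio streams corresponding to the screens of a telepresence site are each input as a separate audio stream; they are independent of one another, and the telepresence site sends them independently.
302. Select, from the audio data to be selected, the audio data that meets the preset condition.
The audio data to be selected includes at least the audio streams sent by the non-cascade sites. The number of selected audio data items is less than or equal to a predetermined number, which is set in advance according to the preset condition. For a description of the preset condition, see step 203 in FIG. 2.
303. Process the audio data that meets the preset condition with different sites occupying different audio channels or audio cascade channels.
After selecting the audio data that meets the preset condition, the processing apparatus may process it with different sites occupying different audio channels or audio cascade channels to obtain cascade-site audio data, so that the first cascade site can identify the cascade-site audio data.
It should be noted that in this embodiment, audio data is processed per site, separately on each audio channel or audio cascade channel, which ensures that processing one audio data item does not affect other audio data. The specific processing is described in later embodiments.
304. Encode the cascade-site audio data to obtain an audio stream.
Specifically, the cascade-site audio data obtained in the previous step is encoded, and the encoding result is used as the audio stream.
305. Send the audio stream to the first cascade site.
After the audio stream is obtained by encoding, it is sent to the first cascade site. It should be noted that in this embodiment, the first cascade site is a cascade site directly connected to the processing apparatus.
As this embodiment shows, because the processing apparatus processes the audio data that meets the preset condition with different sites occupying different audio channels or audio cascade channels to obtain the cascade-site audio data, the first cascade site, as the receiving end, can adjust individual audio data directly whenever it needs to adjust the azimuth order.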
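Further, the present invention provides another embodiment of a method for processing cascade sites in a cascade conference; see FIG. 4. In this embodiment, the processing apparatus is connected to a first cascade site and a second cascade site, and also to non-cascade sites. Only one audio cascade channel is set up, and it contains two or more audio channels; that is, audio streams are transmitted separately in each audio channel. This embodiment includes the following steps:
401. Receive the audio streams sent by non-cascade sites and the audio stream sent by the second cascade site.
Specifically, the processing apparatus may receive the audio stream sent by an ordinary site connected to it, and/or the audio streams corresponding to the individual screens of a telepresence site connected to it. In this embodiment, the audio streams corresponding to the screens of a telepresence site are each input as a separate audio stream; they are independent of one another, and the telepresence site sends them independently.
When the processing apparatus is connected to a second cascade site, it also receives the audio stream sent by the second cascade site. It should be noted that in this embodiment, the second cascade site is a cascade site directly connected to the processing apparatus.
402. Select, from the audio data to be selected, the audio data that meets the preset condition.
The audio data to be selected includes at least the audio streams sent by the non-cascade sites. In this embodiment, the preset condition may be to retain the audio data of the loudest parties among the cascade sites and non-cascade sites. Audio data is filtered from the audio data to be selected on this principle, and the number of selected audio data items is less than or equal to the predetermined number.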
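In step 402 the second cascade site competes as a single party. One option the specification mentions is to superimpose that site's audio streams into one stream before comparing. The minimal sketch below assumes decoded PCM frames as float lists in [-1.0, 1.0], uses mean squared amplitude as the loudness measure, and introduces the placeholder ID `"CASCADE-2"`; all of these are assumptions. The alternative of comparing audio envelope information carried in RTP padding is not shown.

```python
from typing import Dict, List, Sequence


def superimpose(streams: Sequence[Sequence[float]]) -> List[float]:
    """Sample-wise sum of equally long PCM frames, clipped to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, sum(samples))) for samples in zip(*streams)]


def energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude, used here as the loudness measure."""
    return sum(x * x for x in frame) / len(frame) if frame else 0.0


def loudest_parties(local: Dict[str, Sequence[float]],
                    cascade_streams: Sequence[Sequence[float]],
                    max_parties: int,
                    cascade_id: str = "CASCADE-2") -> List[str]:
    """Rank local sites plus the second cascade site, counted as one party."""
    candidates: Dict[str, Sequence[float]] = dict(local)
    if cascade_streams:
        candidates[cascade_id] = superimpose(cascade_streams)
    ranked = sorted(candidates, key=lambda k: energy(candidates[k]), reverse=True)
    return ranked[:max_parties]


local = {"T1": [0.1, 0.1], "T2": [0.5, -0.5]}
remote = [[0.3, 0.3], [0.3, 0.3]]  # two streams from the second cascade site
print(loudest_parties(local, remote, 2))  # ['CASCADE-2', 'T2']
```

Each remote stream on its own would lose to T2; only their superposition makes the cascade site one of the loudest parties, which is the point of counting it as one site.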
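In this embodiment, the audio streams that take part in the loudest-parties comparison are those sent by the non-cascade sites connected to the processing apparatus and the audio stream sent by the second cascade site. The second cascade site's audio stream takes part in the comparison as the audio stream of a single site. In step 402, the audio streams of the second cascade site may be superimposed into one audio stream for the comparison. In practical applications, other methods may also be used. For example, the envelope information of the second cascade site's audio stream may be sent to the processing apparatus as Real-time Transport Protocol (RTP) padding; after receiving this audio envelope information, the processing apparatus uses it in the loudest-parties comparison.
403. When the audio data that meets the preset condition corresponds to a screen of a telepresence site, use that audio data as the audio data of a separate site.
In this embodiment, each telepresence-site screen that meets the preset condition is transmitted as a separate site over its own audio channel, and the cascade site is likewise transmitted as one site over its own audio channel.
404. Alternatively, when the audio data that meets the preset condition corresponds to an ordinary site whose audio is not mono, downmix that ordinary site's audio data into mono audio data.
In this embodiment, the audio data of an ordinary site that meets the preset condition and is not mono is downmixed into mono audio data and transmitted over one audio channel.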
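Steps 403 and 404 together give every selected site exactly one audio channel inside the single audio cascade channel. The minimal sketch below assumes decoded float PCM frames. It assumes that the audio of one telepresence screen is already a single channel, and that downmixing is done by plain averaging; the specification does not fix a downmix rule. The names `build_channel_layout` and `downmix_to_mono` are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[float]  # one channel of one PCM frame


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> Frame:
    """Average the channels sample by sample (a simple, assumed downmix rule)."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]


def build_channel_layout(selected: List[Tuple[str, str, Sequence[Sequence[float]]]]
                         ) -> List[Tuple[str, Frame]]:
    """One mono entry per site, i.e. one audio channel per site.

    selected: (label, kind, channels); kind is "screen" for one telepresence
    screen, whose audio is assumed to be a single channel already, or
    "ordinary" for an ordinary site with one or more channels.
    """
    layout: List[Tuple[str, Frame]] = []
    for label, kind, channels in selected:
        if kind == "screen":
            layout.append((label, list(channels[0])))   # step 403: a site of its own
        elif kind == "ordinary":
            mono = channels[0] if len(channels) == 1 else downmix_to_mono(channels)
            layout.append((label, list(mono)))          # step 404: mono downmix
        else:
            raise ValueError(f"unknown kind: {kind!r}")
    return layout


print(build_channel_layout([("T1C", "screen", [[0.2, 0.2]]),
                            ("T2", "ordinary", [[1.0, 0.0], [0.0, 1.0]])]))
# [('T1C', [0.2, 0.2]), ('T2', [0.5, 0.5])]
```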
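405. According to the azimuth order of the video streams to be sent, adjust the azimuth order of the audio data that meets the preset condition separately for each site.
The video streams to be sent are some or all of the video streams received by the processing apparatus. Which video streams are sent to the first cascade site may be chosen by the user or decided by the processing apparatus; no limitation is imposed here.
In this step, the audio data that meets the preset condition, with its azimuth order adjusted, is used as the cascade-site audio data.
In this embodiment, the azimuth order of the audio data that meets the preset condition is adjusted per site according to the azimuth order of the video streams to be sent, so that the first cascade site, as the receiving end, can identify the azimuth order of that audio data from the azimuth order of the video streams.
In practical applications, if the video source of a selected audio data item is the same as one of the video streams to be sent, the azimuth of that audio data is adjusted to match that video stream. If it differs from all the video streams to be sent, the azimuth of that audio data is adjusted according to the policy described in step 204 of FIG. 2. For example, suppose the video streams to be sent are T2, T1C, T3R, and the selected audio data is T3R, T2, T1L. Audio data T2 matches video stream T2, so it is moved to the same position as that video stream, namely the left. Audio data T1L matches none of the video streams, but the video streams include T1C; because T1C and T1L are two streams of the same site T1, audio data T1L takes the azimuth of the displayed video stream T1C, namely the center. Audio data T3R matches video stream T3R, so it is moved to the right, the same position as T3R. After the adjustment, the audio data order is T2, T1L, T3R.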
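The worked example above can be reproduced with a small matching routine. It takes an exact stream match first, then another displayed stream of the same site, and treats anything else as invisible. This is a sketch under an assumed naming convention (screen streams end in L, C, or R and share the site prefix, as in T1L/T1C); `site_of`, `video_slot`, and `reorder_audio` are illustrative names, and invisible streams are simply left out here rather than handled by the step 204 policy.

```python
from typing import Dict, List, Optional


def site_of(stream: str) -> str:
    """Hypothetical naming: screen streams end in L/C/R (T1L -> T1), ordinary sites do not."""
    return stream[:-1] if stream[-1] in "LCR" else stream


def video_slot(video: List[str], audio: str) -> Optional[int]:
    """Slot (0 = leftmost) whose azimuth this audio stream should take."""
    if audio in video:                      # same stream is displayed
        return video.index(audio)
    for i, v in enumerate(video):           # another stream of the same site is displayed
        if site_of(v) == site_of(audio):
            return i
    return None                             # invisible: fall back to the step 204 policy


def reorder_audio(video: List[str], audio: List[str]) -> List[str]:
    """Order the displayed audio left to right; invisible streams are left out here."""
    slots: Dict[str, Optional[int]] = {a: video_slot(video, a) for a in audio}
    placed = sorted((slot, a) for a, slot in slots.items() if slot is not None)
    return [a for _, a in placed]


print(reorder_audio(["T2", "T1C", "T3R"], ["T3R", "T2", "T1L"]))  # ['T2', 'T1L', 'T3R']
```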
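It should be noted that step 405 may alternatively be replaced by the following steps:
First, sort the audio data that meets the preset condition, with different sites occupying different audio channels, and use the sorted audio data as the cascade-site audio data. The sorting may follow the loudness order of the loudest parties, or some other order; no limitation is imposed here.
Then, generate audio site position information, which is the position ordering information of the audio data that meets the preset condition.
Finally, send the generated audio site position information to the first cascade site. In a specific implementation, it may be sent to the first cascade site as RTP padding data; other implementations may also be used, and no limitation is imposed here.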
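RFC 3550 defines RTP padding as follows. When the P bit in the header is set, the packet ends with padding octets, and the last of these octets holds the total padding length, including that octet itself. The specification does not define how the position information is laid out inside the padding. The sketch below therefore uses a purely hypothetical layout: a count octet, then one-octet site IDs from left to right, then the mandatory length octet. The one-octet site IDs (an index into a site table both ends share) are also an assumption.

```python
import struct
from typing import List

P_BIT = 0x20  # padding flag in the first RTP header octet (RFC 3550)


def build_position_padding(site_ids: List[int]) -> bytes:
    """Hypothetical layout: [count][site IDs left to right][padding length].

    RFC 3550 only fixes the last octet, which must hold the total number of
    padding octets including itself; everything before it is our assumption.
    """
    if len(site_ids) > 253 or any(not 0 <= s <= 255 for s in site_ids):
        raise ValueError("this layout carries at most 253 one-octet site IDs")
    body = bytes([len(site_ids)]) + bytes(site_ids)
    return body + bytes([len(body) + 1])


def make_rtp_packet(payload: bytes, site_ids: List[int], seq: int = 1,
                    timestamp: int = 0, ssrc: int = 0x1234, pt: int = 96) -> bytes:
    # V=2, P=1, X=0, CC=0; M=0 and a dynamic payload type.
    header = struct.pack("!BBHII", 0x80 | P_BIT, pt & 0x7F, seq, timestamp, ssrc)
    return header + payload + build_position_padding(site_ids)


def read_position_padding(packet: bytes) -> List[int]:
    if not packet[0] & P_BIT:
        return []                      # no padding, so no position information
    pad_len = packet[-1]
    padding = packet[len(packet) - pad_len:]
    return list(padding[1:1 + padding[0]])


pkt = make_rtp_packet(b"\x00" * 20, [2, 1, 3])
print(len(pkt), read_position_padding(pkt))  # 37 [2, 1, 3]
```

A receiver that ignores padding, as plain RTP stacks do, will simply discard this information. Both ends must therefore agree on the layout in advance.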
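406. Encode the cascade-site audio data to obtain an audio stream.
Specifically, the audio stream is obtained by encoding the cascade-site audio data produced in the previous steps.
407. Send the audio stream to the first cascade site.
After the audio stream is obtained by encoding, it is sent to the first cascade site. It should be noted that in this embodiment, the first cascade site is another cascade site directly connected to the processing apparatus.
In step 405 of this embodiment, the azimuth order of the corresponding cascade-site audio data is adjusted according to the azimuth order of the video streams to be sent. The first cascade site, as the receiving end, can therefore identify the azimuth order of the audio data that meets the preset condition from the azimuth order of the video streams, and can then adjust individual audio data.
As this embodiment shows, because the processing apparatus processes the audio data that meets the preset condition with different sites occupying different audio channels or audio cascade channels to obtain the cascade-site audio data, the first cascade site, as the receiving end, can adjust individual audio data directly whenever it needs to adjust the azimuth order.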
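Optionally, the present invention provides another embodiment of a method for processing cascade sites in a cascade conference; see FIG. 5. In this embodiment, the processing apparatus is connected to a first cascade site and a second cascade site, and also to non-cascade sites. Two or more audio cascade channels are set up, unlike the prior art, which has only one audio cascade channel; that is, audio streams are transmitted separately in each audio cascade channel. This embodiment includes the following steps:
501. Receive the audio streams sent by non-cascade sites and the audio stream sent by the second cascade site.
This step is the same as step 401 in the previous embodiment; see that step for details, which are not repeated here.
502. Decode the audio streams sent by the non-cascade sites and the audio stream sent by the second cascade site.
After the audio streams sent by the non-cascade sites and by the second cascade site are obtained, they may be decoded. It should be noted that decoding the audio streams is an optional implementation.
503. Select, from the audio data to be selected, the audio data that meets the preset condition.
The audio data to be selected specifically consists of the results of decoding the audio streams sent by the non-cascade sites and by the second cascade site. In this embodiment, audio data may be selected on the principle of retaining the audio data of the loudest parties, and the number of selected audio data items is less than or equal to the predetermined number.
504. Process the audio data that meets the preset condition with different sites occupying different audio cascade channels to obtain cascade-site audio data.
In this step, the audio data that meets the preset condition is processed with different sites occupying different audio cascade channels, that is, each audio cascade channel carries the audio data of only one site, and the audio data that meets the preset condition is used as the cascade-site audio data. In this embodiment, each site that meets the preset condition is treated as a separate site whose audio data is transmitted over an audio cascade channel. Because multiple audio cascade channels are set up for the cascade site, the audio data of each site can be processed per audio cascade channel. Each audio cascade channel may be mono, two-channel, three-channel, or have more channels; no limitation is imposed here.
505. Generate audio cascade channel composition information. This is information about the number of audio cascade channels established by the cascade site, from which the receiver obtains the number of audio cascade channels occupied by the audio stream it receives from the cascade site.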
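Steps 504 and 505 can be sketched as a one-site-per-channel assignment, followed by a count of the channels in use. Several details are assumptions: the `CascadeChannel` record, the per-site channel counts (three for a telepresence site, one otherwise), and the FIG. 7 figures used in the example, where T1 and T3 are taken to have three screens each. The composition information is reduced to exactly what the text defines, namely the number of audio cascade channels.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CascadeChannel:
    index: int     # audio cascade channel number
    site: str      # the single site this channel carries
    channels: int  # audio channels inside it: 1 = mono, 3 = e.g. L/C/R


def assign_cascade_channels(selected: List[str], site_channels: Dict[str, int],
                            available: int) -> List[CascadeChannel]:
    """Step 504: give each selected site an audio cascade channel of its own."""
    if len(selected) > available:
        raise ValueError("more selected sites than audio cascade channels")
    return [CascadeChannel(i, site, site_channels.get(site, 1))
            for i, site in enumerate(selected)]


def composition_info(assignment: List[CascadeChannel]) -> int:
    """Step 505, as the text defines it: the number of audio cascade channels."""
    return len(assignment)


# FIG. 7 numbers: four channels; T1 and T3 are telepresence sites, assumed 3 screens each.
plan = assign_cascade_channels(["T1", "T2", "T3", "T7"], {"T1": 3, "T3": 3}, 4)
print([c.channels for c in plan], composition_info(plan))  # [3, 1, 3, 1] 4
```

Because the set of loudest sites changes over time, the assignment and the composition information would be recomputed as the selection changes.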
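506. Encode the cascade-site audio data to obtain an audio stream.
Specifically, the audio data that meets the preset condition, as produced in the previous steps, is encoded to form the audio stream.
507. Send the audio stream to the first cascade site, and send the audio cascade channel composition information to the first cascade site.
After the audio stream to be sent is obtained by encoding, it is sent to the first cascade site. Because the processing apparatus in this embodiment has multiple audio cascade channels, and the sites that meet the preset condition on those channels change constantly, the audio cascade channel composition information must also be sent to the first cascade site.
As this embodiment shows, because the processing apparatus sends the selected audio data that meets the preset condition separately on different audio cascade channels, the processing apparatus at the receiving end can adjust individual audio data directly whenever it needs to adjust the azimuth order.
The following describes the embodiments with specific examples. FIG. 6 shows the structure of cascade sites in a cascade conference according to an embodiment of the present invention, taking as an example a single audio cascade channel that contains two or more audio channels. See FIG. 6:
The cascade conference has 2 MCUs, MCU1 and MCU2, and MCU1 is connected to MCU2.
MCU1 is connected to 1 ordinary site and 2 telepresence sites. FIG. 6 shows the state before the order of the audio streams is adjusted. In FIG. 6, the ordinary site is T2, and the two telepresence sites are T1 and T3, each with three screens: T1L, T1C, T1R and T3L, T3C, T3R. MCU2, as a cascade site of MCU1, is also connected to MCU1. MCU2 is connected to 2 ordinary sites and 1 telepresence site: as shown in FIG. 6, the ordinary sites are T5 and T6, and the telepresence site is T4, with three screens T4L, T4C, T4R. Each MCU supports retaining the audio of at most 3 parties; that is, each MCU selects and encodes the audio data of the 3 loudest sites among all sites connected to it (including ordinary sites, telepresence sites, and cascade sites).
MCU1 can receive the audio streams sent by T1L, T1C, T1R, T2, T3L, T3C, and T3R. MCU2 can receive the audio streams sent by T4L, T4C, T4R, T5, and T6, as well as the cascade-site audio stream and cascade-site video streams sent by MCU1. As shown in FIG. 6, in this embodiment the cascade-site video streams sent by MCU1 to MCU2 are T2, T1C, T3R. This embodiment does not describe how MCU1 sends cascade-site media data to T1L, T1C, T1R, T2, T3L, T3C, and T3R; it describes only how MCU1 sends the cascade-site audio stream to MCU2.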
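MCU1 decodes each audio stream to obtain the site audio data of T1L, T1C, T1R, T2, T3L, T3C, and T3R, uses this site audio data as the audio data to be selected, and filters it on the principle of retaining the audio data of the 3 loudest parties. Suppose the filtered audio data is T1C, T2, T3R. The sources of the cascade-site video streams and of the filtered audio data are then exactly the same: video streams T2, T1C, T3R and filtered audio data T1C, T2, T3R. The azimuth order of the filtered audio data can therefore be adjusted to follow the order of the cascade-site video streams. After the adjustment, the azimuth order of the filtered audio data matches that of the cascade-site video streams. The reordered audio data is used as the cascade-site audio data and encoded, yielding MCU1's audio stream in the order T2, T1C, T3R, which MCU2 can identify.
If the audio data filtered on the principle of retaining the 3 loudest parties is not T1C, T2, T3R, that is, the sources of the filtered audio data differ from those of the cascade-site video streams, each site whose filtered audio data has no displayed video is defined as an invisible site, and the audio data of invisible sites may be adjusted according to the policy described in step 204 of FIG. 2.
The following describes the process from the perspective of MCU2 as the receiving end. MCU2 first receives the audio stream sent by MCU1 and the audio streams sent by the non-cascade sites T4, T5, and T6 connected to it, and then selects the audio data that meets the preset condition from the audio data to be selected. The selection is the same as in MCU1 and is not repeated here. Finally, the azimuth order of the selected audio data may be adjusted; the adjustment policy is described in the embodiment of FIG. 2 and is not repeated here. In this embodiment, adjusting the azimuth of one audio data item does not affect other audio data, so the image azimuth and sound azimuth of each site in the cascade conference can correspond one-to-one, improving the participants' user experience.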
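Another specific example follows. FIG. 7 shows the structure of cascade sites in a cascade conference according to another embodiment of the present invention, in which two or more audio cascade channels are set up; that is, audio streams are transmitted separately in each audio cascade channel. See FIG. 7:
The cascade conference has 2 MCUs, MCU1 and MCU2, connected by four audio cascade channels and four video cascade channels. MCU1 is connected to two ordinary sites and two telepresence sites: as shown in FIG. 7, the ordinary sites are T2 and T7, and the telepresence sites are T1 and T3. MCU2, as a cascade site of MCU1, is also connected to MCU1, and is connected to two ordinary sites and one telepresence site: as shown in FIG. 7, the ordinary sites are T5 and T6, and the telepresence site is T4. The cascaded video sources between MCU1 and MCU2 are T2, T1C, T3R, and T7. Each MCU supports retaining the audio of at most 4 parties; that is, each MCU selects the audio data of the 4 loudest sites among all sites connected to it (including ordinary sites, telepresence sites, and cascade sites).
In this embodiment, there are multiple audio cascade channels between MCU1 and MCU2, their number depending on the requirements of the cascade conference, and the audio data carried on them is likewise filtered on the principle of retaining the loudest parties. In this embodiment there are 4 audio cascade channels, and each may be mono, two-channel, three-channel, or have more channels; no limitation is imposed here. Because there are multiple audio cascade channels, the audio data of a telepresence site can be treated as the audio data of one site, with its audio cascade channel set to two or three channels so that a single audio cascade channel can carry all the audio of one telepresence site.
MCU1 can receive the audio streams sent by T1, T2, T3, and T7. MCU2 can receive the audio streams sent by T4, T5, and T6, as well as the cascade-site audio stream and cascade-site video streams sent by MCU1. As shown in FIG. 7, in this embodiment the cascade-site video streams sent by MCU1 to MCU2 are T2, T1C, T3R, and T7. This embodiment does not describe how MCU1 sends cascade-site media data to T1, T2, T3, and T7; it describes only how MCU1 sends the cascade-site audio stream to MCU2.
MCU1 decodes each audio stream to obtain the site audio data of T1, T2, T3, and T7, uses it as the audio data to be selected, and, on the principle of retaining the audio data of the 4 loudest sites, selects T1, T2, T3, and T7 as the cascade-site audio data. The four audio data items are then loaded onto their respective audio cascade channels; audio data from a telepresence site is loaded onto a multichannel audio cascade channel.
After obtaining the cascade-site audio data, MCU1 encodes it to obtain the cascade-site audio stream, and then sends the cascade-site audio stream and the cascade-site video streams to the cascade site MCU2.
Because the loudest parties on the cascade channels change constantly, the audio cascade channel composition information must also be sent to MCU2 along with the stream on the audio cascade channels. The audio cascade channel composition information is information about the number of audio cascade channels established by the cascade site, from which the receiver obtains the number of audio cascade channels occupied by the received audio stream. It may be sent as RTP padding, but is not limited to this method; no limitation is imposed here.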
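After receiving the data of the loudest sites on the audio cascade channels, and adding the audio data of the sites directly connected to MCU2, MCU2 effectively holds independent data for each of the sites T7, T1, T2, T3, T4, T5, and T6. It adjusts the azimuth of each site's audio stream according to the video streams that the sites on this MCU need to display, so that the azimuth order of each site's video stream corresponds one-to-one to that of its audio stream. When the video streams and audio streams are not completely the same, the processing in the embodiment shown in FIG. 2 applies; details are not repeated here.
The methods having been described above, the processing apparatus for cascade sites in a cascade conference is described below. The processing apparatus provided in the embodiments of the present invention may be used as an MCU.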
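Referring to FIG. 8, an example of the processing apparatus for cascade sites in a cascade conference according to an embodiment of the present invention includes:
a receiving unit 801, configured to receive the audio stream sent by a cascade site, where that audio stream is sent with different sites occupying different audio channels or audio cascade channels, and further configured to receive the audio streams sent by non-cascade sites;
a selecting unit 802, configured to select, from the audio data to be selected, the audio data that meets the preset condition, where the audio data to be selected includes the audio stream sent by the cascade site and the audio streams sent by the non-cascade sites; and
an order adjusting unit 803, configured to adjust the azimuth order of the audio data selected by the selecting unit 802.
In this embodiment of the present invention, the audio stream that the processing apparatus receives from the cascade site is sent with different sites occupying different audio channels or audio cascade channels. Therefore, when the processing apparatus needs to adjust the azimuth order of the audio data, it can adjust the azimuth of the audio data concerned directly and individually; that is, adjusting the azimuth of one audio data item does not affect other audio data. The image azimuth and sound azimuth of each site in the cascade conference can thus correspond one-to-one, improving the participants' user experience.
It should be noted that in the embodiment shown in FIG. 8, when the audio stream sent by the cascade site is sent with different sites occupying different audio cascade channels, the receiving unit 801 is further configured to receive the audio cascade channel composition information sent by the cascade site. This information is the number of audio cascade channels established by the cascade site, from which the number of audio cascade channels occupied by the received audio stream is obtained.
The processing apparatus may further include a decoding unit, configured to decode the audio stream sent by the cascade site and the audio streams sent by the non-cascade sites.
The foregoing embodiment describes the processing apparatus receiving the cascade site's audio stream and adjusting the azimuth order of the audio data that meets the preset condition. The following describes the processing apparatus from the perspective of sending an audio stream to a cascade site; see FIG. 9:
a receiving unit 901, configured to receive the audio streams sent by non-cascade sites, and further configured to receive the audio stream sent by a second cascade site;
a decoding unit 902, configured to decode the audio streams received by the receiving unit 901;
a selecting unit 903, configured to select, from the audio data to be selected, the audio data that meets the preset condition, where the audio data to be selected specifically consists of the decoding results of the decoding unit 902;
a processing unit 904, configured to process the audio data selected by the selecting unit 903 with different sites occupying different audio channels or audio cascade channels to obtain cascade-site audio data, so that the first cascade site can identify the cascade-site audio data;
an encoding unit 905, configured to encode the processing result of the processing unit 904 to obtain an audio stream; and
a sending unit 906, configured to send the audio stream to the first cascade site.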
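The FIG. 9 units form a straight pipeline: receive, decode, select, process per site, encode, send. The minimal sketch below shows that wiring. It assumes that codec and transport are injected as plain callables and that loudness is mean squared amplitude. The class name `CascadeSender` and the toy "codec" in the example are hypothetical stand-ins, not a real audio codec.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class CascadeSender:
    """Hypothetical wiring of units 901 to 906; codec and transport are injected."""
    decode: Callable[[bytes], List[float]]            # decoding unit 902
    encode: Callable[[List[List[float]]], bytes]      # encoding unit 905
    send: Callable[[bytes], None]                     # sending unit 906
    max_parties: int = 3

    @staticmethod
    def level(pcm: Sequence[float]) -> float:
        return sum(x * x for x in pcm) / len(pcm) if pcm else 0.0

    def handle(self, streams: Dict[str, bytes]) -> List[str]:
        """streams: what receiving unit 901 collected, keyed by site."""
        pcm = {site: self.decode(data) for site, data in streams.items()}
        chosen = sorted(pcm, key=lambda s: self.level(pcm[s]), reverse=True)
        chosen = chosen[: self.max_parties]                       # selecting unit 903
        per_site = [pcm[s] for s in chosen]                       # unit 904: one channel per site
        self.send(self.encode(per_site))
        return chosen


sent: List[bytes] = []
sender = CascadeSender(decode=lambda b: [x / 255 for x in b],
                       encode=lambda chans: bytes(len(chans)),  # stand-in "codec"
                       send=sent.append)
print(sender.handle({"T1": bytes([10, 10]), "T2": bytes([200, 200]),
                     "T3": bytes([100, 100]), "T4": bytes([0, 0])}))
# ['T2', 'T3', 'T1']
```

Injecting the codec keeps the selection and per-site processing logic testable without a real audio encoder.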
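As this embodiment shows, because the processing apparatus processes the audio data that meets the preset condition with different sites occupying different audio channels or audio cascade channels to obtain the cascade-site audio data, the first cascade site, as the receiving end, can adjust individual audio data directly whenever it needs to adjust the azimuth order.
Further, referring to FIG. 10, in this embodiment only one audio cascade channel is set up, and it contains two or more audio channels; that is, audio streams are transmitted separately in each audio channel. Another example of the processing apparatus according to an embodiment of the present invention includes:
a receiving unit 1001, configured to receive the audio streams sent by non-cascade sites, and further configured to receive the audio stream sent by a second cascade site;
a decoding unit 1002, configured to decode the audio streams received by the receiving unit 1001;
a selecting unit 1003, configured to select, from the audio data to be selected, the audio data that meets the preset condition, where the audio data to be selected specifically consists of the decoding results of the decoding unit 1002;
a processing unit 1004, including: a site identification module 10041, configured to, if the audio data that meets the preset condition corresponds to a screen of a telepresence site, use that audio data as the audio data of a separate site; a mixing module 10042, configured to, if the audio data that meets the preset condition corresponds to an ordinary site whose audio is not mono, downmix that ordinary site's audio data into mono audio data; and an association module 10043, configured to adjust, according to the azimuth order of the video streams to be sent, the azimuth order of the audio data that meets the preset condition, with different sites occupying different audio channels, and to use the adjusted audio data as the cascade-site audio data;
an encoding unit 1005, configured to encode the cascade-site audio data obtained by the processing unit 1004 to obtain an audio stream; and
a sending unit 1006, configured to send the audio stream to the first cascade site.
It should be noted that in this embodiment, when the processing unit 1004 includes the site identification module 10041 and the mixing module 10042, it may include the following modules instead of the association module 10043: a sorting module, configured to sort the audio data that meets the preset condition, with different sites occupying different audio channels, and to use the sorted audio data as the cascade-site audio data; and a generating module, configured to generate audio site position information, which is the position ordering information of the audio data that meets the preset condition. In this case, the sending unit 1006 of the processing apparatus is further configured to send the audio site position information to the first cascade site.
In this embodiment, two or more audio cascade channels may also be set up, that is, audio streams are transmitted separately in each audio cascade channel. The processing apparatus then includes a generating unit, configured to generate audio cascade channel composition information, which is information about the number of audio cascade channels established by the cascade site, from which the receiver obtains the number of audio cascade channels occupied by the audio stream received from the cascade site. The sending unit 1006 of the processing apparatus is further configured to send the audio cascade channel composition information to the first cascade site.
As this embodiment shows, because the processing apparatus processes the audio data that meets the preset condition with different sites occupying different audio channels or audio cascade channels to obtain the cascade-site audio data, the first cascade site, as the receiving end, can adjust individual audio data directly whenever it needs to adjust the azimuth order.
An embodiment of the present invention further provides a cascade conference system, including: the processing apparatus for cascade sites in a cascade conference shown in FIG. 8;
and
the processing apparatus for cascade sites in a cascade conference shown in FIG. 9 or FIG. 10.
As this embodiment shows, the audio streams received by the cascade conference system are sent with different sites occupying different audio channels or audio cascade channels. Therefore, when the cascade conference system needs to adjust the azimuth order of audio data, it can adjust the audio data concerned directly and individually; that is, adjusting the azimuth of one audio data item does not affect other audio data. The image azimuth and sound azimuth of each site in the cascade conference can thus correspond one-to-one, improving the participants' user experience.
The information exchange and execution processes between the modules of the foregoing apparatus and system are based on the same concept as the method embodiments of the present invention; for details, see the description of the method embodiments, which is not repeated here.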
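A person of ordinary skill in the art can understand that all or some of the steps of the methods in the foregoing embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The method, apparatus, and system for processing cascade sites in a cascade conference provided in the embodiments of the present invention have been described in detail above. A person of ordinary skill in the art may make changes to the specific implementations and scope of application based on the ideas of the embodiments of the present invention. In conclusion, the content of this specification shall not be construed as a limitation on the present invention.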

Claims

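1. A method for processing cascade sites in a cascade conference, comprising:
receiving an audio stream sent by a cascade site, wherein the audio stream sent by the cascade site is sent with different sites occupying different audio channels or audio cascade channels;
receiving audio streams sent by non-cascade sites;
selecting, from audio data to be selected, audio data that meets a preset condition, wherein the audio data to be selected comprises the received audio stream sent by the cascade site and the audio streams sent by the non-cascade sites; and
adjusting an azimuth order of the audio data that meets the preset condition.
2. The method for processing cascade sites in a cascade conference according to claim 1, wherein the non-cascade sites comprise ordinary sites and/or telepresence sites.
3. The method for processing cascade sites in a cascade conference according to claim 1, wherein when the audio stream sent by the cascade site is sent with different sites occupying different audio cascade channels, the method further comprises: receiving audio cascade channel composition information sent by the cascade site, wherein the audio cascade channel composition information is information about the number of audio cascade channels established by the cascade site, so that the number of audio cascade channels occupied by the received audio stream sent by the cascade site is obtained.
4. The method for processing cascade sites in a cascade conference according to claim 1, wherein the adjusting an azimuth order of the audio data that meets the preset condition comprises:
if only one screen of a video source of a non-cascade site is displayed, on one of multiple screens or as the picture at one position in a multi-picture layout, outputting the audio data corresponding to all screens of the video source at the display azimuth of the displayed screen among the multiple screens, or at the azimuth of the picture at that position in the multi-picture layout;
if two or more of the multiple screens of a video source of a non-cascade site are displayed, making the output azimuth order of the audio data corresponding to the two or more displayed screens correspond one-to-one to the azimuth order of the two or more displayed screens, and outputting the audio data corresponding to each undisplayed screen of the video source at the same azimuth as one of the displayed screens of the video source; and
if one screen of a video source of a non-cascade site is displayed simultaneously in multiple multi-picture layouts, or in a multi-picture layout and on a standalone screen, determining the output azimuth of the audio data corresponding to the video source with the following priority, from high to low: the azimuth of the standalone screen, the azimuth of the screen with the larger sub-picture, and the azimuth of screens displayed in center, left, right priority.
5. The method for processing cascade sites in a cascade conference according to claim 1, wherein the preset condition is to retain the audio data of the loudest parties among the cascade sites and the non-cascade sites.
6. The method for processing cascade sites in a cascade conference according to claim 1, further comprising, after the receiving audio streams sent by non-cascade sites:
decoding the received audio stream sent by the cascade site and the audio streams sent by the non-cascade sites;
wherein the audio data to be selected specifically comprises results of decoding the received audio stream sent by the cascade site and the audio streams sent by the non-cascade sites.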
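7. A method for processing cascade sites in a cascade conference, comprising:
receiving audio streams sent by non-cascade sites;
selecting, from audio data to be selected, audio data that meets a preset condition, wherein the audio data to be selected comprises at least the received audio streams sent by the non-cascade sites;
processing the audio data that meets the preset condition with different sites occupying different audio channels or audio cascade channels to obtain cascade-site audio data, so that a first cascade site can identify the cascade-site audio data;
encoding the cascade-site audio data to obtain an audio stream; and
sending the audio stream to the first cascade site.
8. The method for processing cascade sites in a cascade conference according to claim 7, wherein the non-cascade sites comprise ordinary sites and/or telepresence sites.
9. The method for processing cascade sites in a cascade conference according to claim 7, wherein the processing the audio data that meets the preset condition with different sites occupying different audio channels to obtain cascade-site audio data comprises:
if the audio data that meets the preset condition is audio data corresponding to a screen of a telepresence site, using the audio data corresponding to the screen of the telepresence site as audio data corresponding to a separate site;
if the audio data that meets the preset condition is audio data corresponding to an ordinary site and the ordinary site's audio is not mono, downmixing the audio data of the ordinary site into mono audio data; and
adjusting, according to an azimuth order of video streams to be sent, the azimuth order of the audio data that meets the preset condition, with different sites occupying different audio channels, and using the adjusted audio data that meets the preset condition as the cascade-site audio data.
10. The method for processing cascade sites in a cascade conference according to claim 7, wherein the processing the audio data that meets the preset condition with different sites occupying different audio channels to obtain cascade-site audio data comprises:
if the audio data that meets the preset condition is audio data corresponding to a screen of a telepresence site, using the audio data corresponding to the screen of the telepresence site as audio data corresponding to a separate site;
if the audio data that meets the preset condition is audio data corresponding to an ordinary site and the ordinary site's audio is not mono, downmixing the audio data of the ordinary site into mono audio data;
sorting the audio data that meets the preset condition, with different sites occupying different audio channels, and using the sorted audio data that meets the preset condition as the cascade-site audio data; and
generating audio site position information, wherein the audio site position information is position ordering information of the audio data that meets the preset condition;
wherein after the processing the audio data that meets the preset condition with different sites occupying different audio channels or audio cascade channels to obtain cascade-site audio data, the method comprises: sending the audio site position information to the first cascade site.
11. The method for processing cascade sites in a cascade conference according to claim 7, further comprising, after the processing the audio data that meets the preset condition with different sites occupying different audio cascade channels to obtain cascade-site audio data:
generating audio cascade channel composition information, wherein the audio cascade channel composition information is information about the number of audio cascade channels established by the cascade site, so that the number of audio cascade channels occupied by the received audio stream sent by the cascade site is obtained;
wherein after the processing the audio data that meets the preset condition with different sites occupying different audio channels or audio cascade channels to obtain cascade-site audio data, the method comprises: sending the audio cascade channel composition information to the first cascade site.
12. The method for processing cascade sites in a cascade conference according to claim 7, wherein the preset condition is to retain the audio data of the loudest parties among the cascade sites and the non-cascade sites.
13. The method for processing cascade sites in a cascade conference according to claim 7, further comprising:
receiving an audio stream sent by a second cascade site;
wherein the audio data to be selected further comprises the audio stream sent by the second cascade site.
14. The method for processing cascade sites in a cascade conference according to claim 13, further comprising, after the receiving audio streams sent by non-cascade sites:
decoding the audio streams sent by the non-cascade sites and the audio stream sent by the second cascade site;
wherein the audio data to be selected specifically comprises results of decoding the audio streams sent by the non-cascade sites and the audio stream sent by the second cascade site.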
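15. An apparatus for processing cascade sites in a cascade conference, comprising:
a receiving unit, configured to receive an audio stream sent by a cascade site, wherein the audio stream sent by the cascade site is sent with different sites occupying different audio channels or audio cascade channels;
the receiving unit being further configured to receive audio streams sent by non-cascade sites;
a selecting unit, configured to select, from audio data to be selected, audio data that meets a preset condition, wherein the audio data to be selected comprises the received audio stream sent by the cascade site and the audio streams sent by the non-cascade sites; and
an order adjusting unit, configured to adjust an azimuth order of the audio data that meets the preset condition.
16. The apparatus for processing cascade sites in a cascade conference according to claim 15, wherein when the audio stream sent by the cascade site is sent with different sites occupying different audio cascade channels,
the receiving unit is further configured to receive audio cascade channel composition information sent by the cascade site, wherein the audio cascade channel composition information is information about the number of audio cascade channels established by the cascade site, so that the number of audio cascade channels occupied by the received audio stream is obtained.
17. The apparatus for processing cascade sites in a cascade conference according to claim 15, further comprising:
a decoding unit, configured to decode the audio stream sent by the cascade site and the audio streams sent by the non-cascade sites.
18. An apparatus for processing cascade sites in a cascade conference, comprising:
a receiving unit, configured to receive audio streams sent by non-cascade sites;
a selecting unit, configured to select, from audio data to be selected, audio data that meets a preset condition, wherein the audio data to be selected comprises at least the audio streams sent by the non-cascade sites;
a processing unit, configured to process the audio data that meets the preset condition with different sites occupying different audio channels or audio cascade channels to obtain cascade-site audio data, so that a first cascade site can identify the cascade-site audio data;
an encoding unit, configured to encode the cascade-site audio data to obtain an audio stream; and
a sending unit, configured to send the audio stream to the first cascade site.
19. The apparatus for processing cascade sites in a cascade conference according to claim 18, wherein when the audio data that meets the preset condition is processed with different sites occupying different audio channels to obtain cascade-site audio data, the processing unit comprises:
a site identification module, configured to, if the audio data that meets the preset condition is audio data corresponding to a screen of a telepresence site, use the audio data corresponding to the screen of the telepresence site as audio data corresponding to a separate site;
a mixing module, configured to, if the audio data that meets the preset condition is audio data corresponding to an ordinary site and the ordinary site's audio is not mono, downmix the audio data of the ordinary site into mono audio data; and
an association module, configured to adjust, according to the azimuth order of the video streams to be sent, the azimuth order of the audio data that meets the preset condition, with different sites occupying different audio channels, and to use the adjusted audio data that meets the preset condition as the cascade-site audio data.
20. The apparatus for processing cascade sites in a cascade conference according to claim 18, wherein when the audio data that meets the preset condition is processed with different sites occupying different audio channels to obtain cascade-site audio data, the processing unit comprises:
a site identification module, configured to, if the audio data that meets the preset condition is audio data corresponding to a screen of a telepresence site, use the audio data corresponding to the screen of the telepresence site as audio data corresponding to a separate site;
a mixing module, configured to, if the audio data that meets the preset condition is audio data corresponding to an ordinary site and the ordinary site's audio is not mono, downmix the audio data of the ordinary site into mono audio data;
a sorting module, configured to sort the audio data that meets the preset condition, with different sites occupying different audio channels, and to use the sorted audio data that meets the preset condition as the cascade-site audio data; and
a generating module, configured to generate audio site position information, wherein the audio site position information is position ordering information of the audio data that meets the preset condition;
wherein the sending unit is further configured to send the audio site position information to the first cascade site.
21. The apparatus for processing cascade sites in a cascade conference according to claim 18, wherein when the audio data that meets the preset condition is processed with different sites occupying different audio cascade channels to obtain cascade-site audio data, the apparatus further comprises:
a generating unit, configured to generate audio cascade channel composition information, wherein the audio cascade channel composition information is information about the number of audio cascade channels established by the cascade site, so that the number of audio cascade channels occupied by the received audio stream sent by the cascade site is obtained;
wherein the sending unit is further configured to send the audio cascade channel composition information to the first cascade site.
22. The apparatus for processing cascade sites in a cascade conference according to claim 18, wherein the receiving unit is further configured to receive an audio stream sent by a second cascade site.
23. The apparatus for processing cascade sites in a cascade conference according to claim 22, further comprising: a decoding unit, configured to decode the audio streams sent by the non-cascade sites and the audio stream sent by the second cascade site.
24. A cascade conference system, comprising:
the apparatus for processing cascade sites in a cascade conference according to claim 15 or 17;
and
the apparatus for processing cascade sites in a cascade conference according to any one of claims 18 to 23.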
PCT/CN2011/083806 2010-12-24 2011-12-12 Method, apparatus, and system for processing cascade sites in a cascade conference WO2012083799A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP11851966.9A EP2574051B1 (en) 2010-12-24 2011-12-12 Method, device and system for processing cascade conference sites in cascade conference
ES11851966.9T ES2585003T3 (es) 2010-12-24 2011-12-12 Método, aparato y sistema para procesar sitios de conferencia en cascada en una conferencia en cascada
US13/715,436 US8836753B2 (en) 2010-12-24 2012-12-14 Method, apparatus, and system for processing cascade conference sites in cascade conference

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010605183.1A CN102547210B (zh) 2010-12-24 2010-12-24 Method, apparatus, and system for processing cascade sites in a cascade conference
CN201010605183.1 2010-12-24

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/715,436 Continuation US8836753B2 (en) 2010-12-24 2012-12-14 Method, apparatus, and system for processing cascade conference sites in cascade conference

Publications (1)

Publication Number Publication Date
WO2012083799A1 true WO2012083799A1 (zh) 2012-06-28

Family

ID=46313146

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/083806 WO2012083799A1 (zh) 2010-12-24 2011-12-12 Method, apparatus, and system for processing cascade sites in a cascade conference

Country Status (5)

Country Link
US (1) US8836753B2 (zh)
EP (1) EP2574051B1 (zh)
CN (1) CN102547210B (zh)
ES (1) ES2585003T3 (zh)
WO (1) WO2012083799A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111083423A (zh) * 2019-11-18 2020-04-28 视联动力信息技术股份有限公司 Multi-conference speaking method, apparatus, and readable storage medium

Families Citing this family (8)

Publication number Priority date Publication date Assignee Title
NO336217B1 (no) * 2012-12-21 2015-06-15 Pexip AS Method, computer program and system for handling media streams in video conferences
NO341411B1 (no) 2013-03-04 2017-10-30 Cisco Tech Inc Virtual endpoints in video conferences
US9210379B2 (en) * 2014-02-27 2015-12-08 Google Inc. Displaying a presenter during a video conference
CN104469262A (zh) * 2014-12-26 2015-03-25 国家电网公司 Integrated video conference system
CN106685975B (zh) * 2016-12-30 2022-08-12 深圳市潮流网络技术有限公司 Cascade control method for audio conference equipment
JP6931815B2 (ja) * 2018-02-27 2021-09-08 パナソニックIpマネジメント株式会社 Video conference apparatus
CN113467741A (zh) * 2018-04-18 2021-10-01 海信视像科技股份有限公司 Screen casting method, display device, and screen casting system thereof
CN110708432B (zh) * 2019-10-12 2021-01-12 浙江大华技术股份有限公司 Method, system, device, and storage medium for audio output in an audio conference

Citations (3)

Publication number Priority date Publication date Assignee Title
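CN1885785A (zh) * 2006-07-04 2006-12-27 华为技术有限公司 MCU cascade system and methods for creating the system and communicating in it
CN1953537A (zh) * 2006-11-23 2007-04-25 北京航空航天大学 Audio mixing method in a multi-MCU video conference system
CN101132516A (zh) * 2007-09-28 2008-02-27 深圳华为通信技术有限公司 Video communication method and system, and apparatus for video communication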

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
JP3390394B2 (ja) * 1999-12-22 2003-03-24 エヌイーシーネットワーク・センサ株式会社 TV conference system
US7492730B2 (en) * 2005-04-19 2009-02-17 Polycom, Inc. Multi-site conferencing system and method
US7800642B2 (en) * 2006-03-01 2010-09-21 Polycom, Inc. Method and system for providing continuous presence video in a cascading conference
CN101427232B (zh) 2006-04-20 2015-05-13 思科技术公司 System and method for controlling a telepresence system
US7990899B2 (en) * 2006-07-07 2011-08-02 Avaya Inc. Method and apparatus for expanding conference sizes
US8300556B2 (en) * 2007-04-27 2012-10-30 Cisco Technology, Inc. Optimizing bandwidth in a multipoint video conference
US20080273078A1 (en) * 2007-05-01 2008-11-06 Scott Grasley Videoconferencing audio distribution
US8289362B2 (en) * 2007-09-26 2012-10-16 Cisco Technology, Inc. Audio directionality control for a multi-display switched video conferencing system
US9369294B2 (en) * 2007-12-14 2016-06-14 Telecommunication Systems, Inc. Reverse 911 using multicast session internet protocol (SIP) conferencing of voice over internet protocol (VoIP) users
US20090281803A1 (en) * 2008-05-12 2009-11-12 Broadcom Corporation Dispersion filtering for speech intelligibility enhancement
CN101540872B (zh) * 2009-02-23 2012-07-04 华为终端有限公司 Control method, apparatus, and system for multi-channel cascading of media control servers

Non-Patent Citations (1)

Title
See also references of EP2574051A4 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
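CN111083423A (zh) * 2019-11-18 2020-04-28 视联动力信息技术股份有限公司 Multi-conference speaking method, apparatus, and readable storage medium
CN111083423B (zh) * 2019-11-18 2022-12-13 视联动力信息技术股份有限公司 Multi-conference speaking method, apparatus, and readable storage medium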

Also Published As

Publication number Publication date
EP2574051B1 (en) 2016-05-04
EP2574051A4 (en) 2014-06-11
EP2574051A1 (en) 2013-03-27
US20130100239A1 (en) 2013-04-25
CN102547210A (zh) 2012-07-04
ES2585003T3 (es) 2016-10-03
CN102547210B (zh) 2014-09-17
US8836753B2 (en) 2014-09-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11851966

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2011851966

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE