WO2014026478A1 - Video conference signal processing method, video conference server and video conference system - Google Patents

Video conference signal processing method, video conference server and video conference system

Info

Publication number
WO2014026478A1
Authority
WO
WIPO (PCT)
Prior art keywords
site
sites
audio stream
conference
terminal
Prior art date
Application number
PCT/CN2013/072264
Other languages
French (fr)
Chinese (zh)
Inventor
郑瑞琴
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2014026478A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/242 Synchronization processes, e.g. processing of PCR [Program Clock References]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • H04N7/152 Multipoint control units therefor

Definitions

  • Video conferencing systems generally include a conference server and site terminals. FIG. 1 shows an example in which a multipoint control unit (MCU) serves as the conference server.
  • MCU multipoint control unit
  • Each site in the video conferencing system has at least one site terminal.
  • Each site terminal captures the sound and image of its site, encodes them, and sends them to the MCU.
  • The MCU processes the sound and image according to a given processing method, for example mixing the sound and either forwarding the image or composing it into a multi-picture view, and then sends the processed sound and image to the other site terminals in the video conference.
  • The other site terminals decode and output the sound and image of the remote sites, which implements remote video communication.
  • In the prior art, the MCU mixes the audio of the N sites with the highest volume in the conference, so every participating terminal hears the same sound.
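  • The prior-art selection rule described above can be sketched as follows. This is a minimal illustration and not the MCU's actual implementation: the function name `loudest_sites` and the per-site volume values are assumptions introduced here.

```python
from typing import Dict, List


def loudest_sites(volumes: Dict[str, float], n: int) -> List[str]:
    """Return the names of the n sites with the highest volume, loudest first."""
    return sorted(volumes, key=lambda site: volumes[site], reverse=True)[:n]


# Hypothetical volume levels for five sites (not taken from the source).
volumes = {"A": 0.2, "B": 0.9, "C": 0.7, "D": 0.5, "E": 0.1}

# With N = 3, every terminal receives the same mix of sites B, C and D,
# so a terminal watching site E would not hear site E.
first_mix = loudest_sites(volumes, 3)
print(first_mix)  # ['B', 'C', 'D']
```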
  • As a result, the following situation sometimes occurs in current video conferencing systems: a site terminal is watching the video of site A, but because the volume of site A is not among the N highest, the sound of site A is not in the mix and cannot be heard. This mismatch between audio and video degrades the video conferencing experience.
  • Embodiments of the present invention provide a video conference signal processing method, a video conference server, and a system that improve the matching of audio and video in a video conference, so that the sound heard at a site follows the video seen at that site, which improves the conference experience.
  • An embodiment of the present invention provides a video conference signal processing method, including:
  • the conference server receives a site selection command sent by a first site terminal;
  • the conference server sends a video stream to the first site terminal according to the site selection command, where the video stream includes the video stream corresponding to the site that the first site terminal selected to view;
  • if the first mixed audio stream currently being played by the first site terminal does not include the audio stream corresponding to the site that the first site terminal selected to view, the conference server generates a second mixed audio stream and sends it to the first site terminal, where the second mixed audio stream includes some or all of the audio streams corresponding to the selected sites.
  • An embodiment of the present invention further provides a video conference server, including:
  • a receiving module, configured to receive a site selection command sent by a first site terminal and to pass the command to the video stream sending module and the audio stream sending module;
  • a video stream sending module, configured to send a video stream to the first site terminal, where the video stream includes the video stream corresponding to the site that the first site terminal selected to view;
  • an audio stream sending module, configured to generate a second mixed audio stream if the first mixed audio stream currently being played does not include the audio stream corresponding to the site selected by the first site terminal, and to send the second mixed audio stream to the first site terminal, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal selected to view.
  • An embodiment of the present invention further provides a video conference system, including:
  • a conference server, configured to receive a site selection command sent by a first site terminal; send the first site terminal the video stream corresponding to the site it selected to view; and, if the first mixed audio stream currently being played by the first site terminal does not include the audio stream corresponding to the selected site, generate a second mixed audio stream and send it to the first site terminal, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal selected to view;
  • a first site terminal, configured to send the site selection command to the conference server; receive from the conference server the video stream corresponding to the selected site and the second mixed audio stream, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal selected to view; and play the video stream and the second mixed audio stream.
  • Because the audio streams of some or all of the sites selected by the first site terminal are added to the second mixed audio stream sent to that terminal, the problem of audio and video being out of sync is mitigated to some extent, which improves the user experience.
  • FIG. 1 is a schematic diagram of a video conference system of the prior art
  • FIG. 2 is a schematic flowchart of a method for processing a video conference signal according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a first application scenario of a video conference signal processing method according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a second type of application scenario of a video conference signal processing method according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a conference server according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of another conference server according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a video conference system according to an embodiment of the present invention.
  • The embodiments of the present invention provide a video conference signal processing method, a video conference server, and a system that improve the matching of audio and video in a video conference and thereby improve the video conference experience.
  • An embodiment of the present invention provides a video conference signal processing method, including:
  • the conference server receives a site selection command sent by a first site terminal;
  • the conference server sends a video stream to the first site terminal according to the site selection command, where the video stream includes the video stream corresponding to the site that the first site terminal selected to view;
  • if the first mixed audio stream currently being played by the first site terminal does not include the audio stream corresponding to the site that the first site terminal selected to view, the conference server generates a second mixed audio stream and sends it to the first site terminal, where the second mixed audio stream includes some or all of the audio streams corresponding to the selected sites.
  • An embodiment of the present invention further provides a video conference signal processing method. Referring to FIG. 2, the method includes the following steps:
  • The conference server receives a command selecting N1 sites sent by the first site terminal. The site selection command may be an instruction to select a single site or an instruction to select multiple sites. The first site terminal may be a communication terminal such as a PC or a mobile phone, and N1 is any integer greater than or equal to 1.
  • The conference server sends a video stream to the first site terminal according to the command selecting the N1 sites.
  • The video stream includes the video streams corresponding to the N1 sites selected by the first site terminal.
  • Before sending, the conference server may splice the video streams according to a built-in video processing policy and then send the spliced video stream to the first site terminal.
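  • The source does not say what the built-in video processing policy is. As one hypothetical possibility only, splicing could place the N1 selected videos in a near-square grid inside the output frame. The function `grid_layout`, the `Rect` alias, and the frame size below are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) of one tile


def grid_layout(n1: int, frame_w: int, frame_h: int) -> List[Rect]:
    """Return one tile per selected site, filled row by row in a near-square grid."""
    if n1 < 1:
        raise ValueError("N1 must be at least 1")
    cols = math.ceil(math.sqrt(n1))
    rows = math.ceil(n1 / cols)
    tile_w, tile_h = frame_w // cols, frame_h // rows
    return [
        ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i in range(n1)
    ]


# Two selected sites share a 1280x720 frame side by side.
print(grid_layout(2, 1280, 720))  # [(0, 0, 640, 720), (640, 0, 640, 720)]
```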
  • If the first mixed audio stream currently being played by the first site terminal does not include the audio stream corresponding to the site that the first site terminal selected to view, the conference server generates a second mixed audio stream and sends it to the first site terminal, where the second mixed audio stream includes some or all of the audio streams corresponding to the selected sites.
  • That is, the conference server determines whether the first mixed audio stream currently being played includes the audio stream corresponding to the selected site. If it does not, the conference server sends the first site terminal a second mixed audio stream that includes the audio stream corresponding to the selected site. If it does, no second mixed audio stream needs to be sent.
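  • The check described above can be sketched as a small function. This is an illustration under stated assumptions: the name `second_mix_sites` is hypothetical, sites are identified by plain strings, and a new mix is assumed to be needed whenever any selected site is missing from the first mix (the source does not spell out the partially covered case).

```python
from typing import Iterable, List, Optional


def second_mix_sites(first_mix: Iterable[str], selected: Iterable[str]) -> Optional[List[str]]:
    """Return the selected sites to mix, or None if the first mix already covers them."""
    playing = set(first_mix)
    wanted = list(selected)
    if all(site in playing for site in wanted):
        return None  # the terminal keeps playing the first mixed audio stream
    return wanted    # the server builds and sends a second mixed audio stream


print(second_mix_sites(["B", "D"], ["E"]))       # ['E']
print(second_mix_sites(["B", "C", "D"], ["C"]))  # None
```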
  • The conference server either transmits all audio streams contained in the second mixed audio stream to the first site terminal in the same channel, or transmits the audio stream corresponding to the selected site and the other audio streams in the second mixed audio stream to the first site terminal in different channels.
  • The conference server may send the first site terminal a second mixed audio stream that contains the audio streams of the N4 sites with the highest volume among the selected sites.
  • The conference server may also transmit the audio streams of the N4 sites with the highest volume among the selected sites together with the other audio streams in the second mixed audio stream to the first site terminal, where N4 is less than or equal to the number of sites selected for viewing.
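  • The two channel layouts can be illustrated with a toy mixer. The source does not specify the mixing arithmetic, so this sketch assumes decoded samples are floats in [-1.0, 1.0] and that mixing is a sample-wise sum with clipping. The names `mix`, `same_channel`, and `separate_channels` are hypothetical.

```python
from typing import List, Sequence, Tuple

Samples = Sequence[float]


def mix(streams: Sequence[Samples]) -> List[float]:
    """Sum streams sample by sample and clip the result to [-1.0, 1.0]."""
    if not streams:
        return []
    length = min(len(s) for s in streams)
    return [max(-1.0, min(1.0, sum(s[i] for s in streams))) for i in range(length)]


def same_channel(selected: Sequence[Samples], others: Sequence[Samples]) -> List[float]:
    """Selected-site audio and the other audio share one channel."""
    return mix(list(selected) + list(others))


def separate_channels(selected: Sequence[Samples], others: Sequence[Samples]) -> Tuple[List[float], List[float]]:
    """Selected-site audio goes to one channel, the other audio to a second channel."""
    return mix(selected), mix(others)


site_e = [0.5, -0.5]
site_b = [0.25, 0.25]
site_c = [0.5, 0.5]
print(same_channel([site_e], [site_b, site_c]))       # [1.0, 0.25]
print(separate_channels([site_e], [site_b, site_c]))  # ([0.5, -0.5], [0.75, 0.75])
```

  • Keeping the selected sites in their own channel lets the terminal balance them against the rest of the conference locally, at the cost of a second channel.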
  • The first mixed audio stream is the audio stream that the first site terminal was playing before the current site selection command was sent; it is obtained by mixing N3 audio streams.
  • The audio stream played by the first site terminal then becomes the second mixed audio stream; that is, the second mixed audio stream replaces the first mixed audio stream as the audio stream of that terminal.
  • Because the audio streams of some or all of the sites selected by the first site terminal are added to the second mixed audio stream sent to that terminal, the problem of audio and video being out of sync is mitigated to some extent and the user experience is improved.
  • The conference server can generate the second mixed audio stream using any of several strategies. Strategies 1 to 4 below are examples.
  • Strategy 1: the conference server mixes the N4 audio streams corresponding to N4 of the N1 sites selected by the first site terminal to obtain the second mixed audio stream, where N1 and N4 are both integers greater than or equal to 1 and N4 is less than or equal to N1.
  • Strategy 2: the second mixed audio stream sent by the conference server to the first site terminal includes not only some or all of the audio streams of the selected sites but also the audio streams corresponding to the N2 sites with the highest volume in the conference, and the number of audio streams in the second mixed audio stream equals the number in the first mixed audio stream.
  • Strategy 3: the second mixed audio stream sent by the conference server to the first site terminal includes not only some or all of the audio streams of the selected sites but also the audio streams contained in the first mixed audio stream. In this case, the second mixed audio stream contains more audio streams than the first mixed audio stream, as described below.
  • The conference server mixes the N4 audio streams corresponding to the N4 sites with the highest volume among the N1 sites selected by the first site terminal together with the N3 audio streams of the first mixed audio stream to obtain the second mixed audio stream.
  • Strategy 4: the second mixed audio stream sent by the conference server to the first site terminal includes not only some or all of the audio streams of the selected sites but also the audio streams corresponding to the N3 sites with the highest volume in the conference, where N3 equals the number of audio streams in the first mixed audio stream. The second mixed audio stream contains more audio streams than the first mixed audio stream, as described below.
  • The conference server mixes the N4 audio streams corresponding to the N4 sites with the highest volume among the N1 sites selected by the first site terminal together with the N3 audio streams corresponding to the N3 sites with the highest volume in the conference to obtain the second mixed audio stream.
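  • The four strategies differ only in which sites feed the second mix, which the following sketch makes explicit. It is an illustration under assumptions: the function names and volume values are hypothetical, a site that qualifies twice is mixed once, and Strategy 2 fills the remaining slots with the loudest sites that were not already chosen. The sketch does not exclude the listening terminal's own site.

```python
from typing import Dict, List, Sequence


def top(volumes: Dict[str, float], sites: Sequence[str], n: int) -> List[str]:
    """The n loudest sites among `sites`, loudest first."""
    return sorted(sites, key=lambda s: volumes[s], reverse=True)[:n]


def merge(*groups: Sequence[str]) -> List[str]:
    """Concatenate groups, keeping the first occurrence of each site."""
    result: List[str] = []
    for group in groups:
        for site in group:
            if site not in result:
                result.append(site)
    return result


def strategy1(volumes, selected, n4):
    """Only the N4 loudest selected sites."""
    return top(volumes, selected, n4)


def strategy2(volumes, selected, n4, first_mix):
    """Selected sites plus N2 loudest other sites; same stream count as the first mix."""
    chosen = top(volumes, selected, n4)
    n2 = max(len(first_mix) - len(chosen), 0)
    others = [s for s in volumes if s not in chosen]
    return merge(chosen, top(volumes, others, n2))


def strategy3(volumes, selected, n4, first_mix):
    """Selected sites plus every stream of the first mix."""
    return merge(top(volumes, selected, n4), first_mix)


def strategy4(volumes, selected, n4, first_mix):
    """Selected sites plus the N3 loudest sites, with N3 = size of the first mix."""
    return merge(top(volumes, selected, n4), top(volumes, list(volumes), len(first_mix)))


# Hypothetical volumes giving the order B > C > D > A > E > F.
volumes = {"B": 0.9, "C": 0.8, "D": 0.7, "A": 0.6, "E": 0.5, "F": 0.4}
print(strategy4(volumes, ["C", "E"], 2, ["B", "D", "F"]))  # ['C', 'E', 'B', 'D']
```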
  • N2, N3, and N4 can be configured in the background.
  • The information sent by the site terminal to the conference server includes, but is not limited to, the number of sites, the audio data of each site, and the mixing mode.
  • The mixing mode may be either to mix the sound of the selected sites directly with the conference sound, or to place the conference sound and the sound of the selected sites in different channels for output.
  • The audio data of each site includes audio stream parameters such as the site number, the audio stream identifier, and the audio gain.
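  • The fields listed above could be carried in a structure like the one below. The source names the fields but not their types or encoding, so the class names, field names, and types here are assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class MixingMode(Enum):
    SAME_CHANNEL = "same_channel"            # selected sites mixed directly with the conference sound
    SEPARATE_CHANNELS = "separate_channels"  # selected sites and conference sound in different channels


@dataclass
class SiteAudio:
    site_number: int   # site number
    stream_id: str     # audio stream identifier
    gain: float = 1.0  # audio gain (linear factor, assumed)


@dataclass
class SelectionInfo:
    mixing_mode: MixingMode
    sites: List[SiteAudio] = field(default_factory=list)

    @property
    def site_count(self) -> int:
        """Number of sites, derived from the per-site audio data."""
        return len(self.sites)


info = SelectionInfo(
    MixingMode.SEPARATE_CHANNELS,
    [SiteAudio(3, "audio-C"), SiteAudio(5, "audio-E", gain=1.5)],
)
print(info.site_count)  # 2
```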
  • The embodiments of the present invention cover two types of application scenarios for the video conference signal processing method.
  • In the first type, the conference server receives an instruction to select a single site sent by the site terminal, as shown in FIG. 3; this type includes application scenario 1, application scenario 2, application scenario 3, and application scenario 4.
  • In the second type, the conference server receives an instruction to select multiple sites sent by the site terminal, as shown in FIG. 4; this type includes application scenario 5, application scenario 6, and further application scenarios.
  • In this scenario, the conference server sends only the audio stream corresponding to the single site selected by the first site terminal to the first site terminal.
  • The sites in the current video conferencing system are site A, site B, site C, site D, and site E.
  • The first mixed audio stream currently being played is mixed from the audio streams corresponding to site B and site D.
  • After the MCU acting as the conference server receives the command selecting site E sent by site A, the MCU sends the video stream corresponding to site E to site A. In this case, the first mixed audio stream does not include the audio stream corresponding to site E.
  • The conference server therefore sends the audio stream corresponding to site E to site A.
  • Site A hears the sound of site E while seeing the video of site E, so the sound follows the image.
  • In this scenario, the conference server mixes the audio stream corresponding to the single site selected by the first site terminal with the N2 audio streams corresponding to the N2 sites with the highest volume in the conference and sends the result to the first site terminal, where N2+1 equals the number of audio streams in the first mixed audio stream.
  • The sites in descending order of volume are site B, site C, site D, site A, and site E.
  • The first mixed audio stream currently being played is mixed from the audio streams corresponding to three sites: site B, site C, and site D.
  • After the MCU acting as the conference server receives the command selecting site E sent by the site terminal of site A, the MCU sends the video stream corresponding to site E to site A.
  • The first mixed audio stream does not contain the audio stream corresponding to the selected site E.
  • The MCU therefore mixes the audio streams of the two sites with the highest volume in the conference (site B and site C) with the audio stream of the selected site E to obtain a second mixed audio stream and sends it to site A. The second mixed audio stream contains the same number of audio streams as the first mixed audio stream, namely three.
  • Site A sees the images of site B, site C, and site E and hears the sounds of site B, site C, and site E, so the sound follows the image.
  • In this scenario, the conference server mixes the audio stream corresponding to the single site selected by the first site terminal with the audio streams contained in the first mixed audio stream and sends the result to the first site terminal.
  • The sites in descending order of volume are site B, site C, site D, site A, and site E.
  • The first mixed audio stream currently being played is mixed from the audio streams corresponding to two sites: site B and site C.
  • After the MCU acting as the conference server receives the command selecting site E sent by the site terminal of site A, the MCU sends the video stream corresponding to site E to site A.
  • The first mixed audio stream does not include the audio stream corresponding to the selected site E.
  • The MCU therefore mixes the audio stream corresponding to site E with the audio streams contained in the first mixed audio stream, that is, it mixes the audio streams of site E, site B, and site C, to obtain the second mixed audio stream and sends it to site A.
  • Site A sees the images of site B, site C, and site E and hears the sounds of site B, site C, and site E, so the sound follows the image.
  • In this scenario, the conference server mixes the audio stream corresponding to the single site selected by the first site terminal with the N3 audio streams corresponding to the N3 sites with the highest volume in the conference and sends the result to the first site terminal, where N3 equals the number of audio streams in the first mixed audio stream.
  • The sites in descending order of volume are site B, site C, site D, site A, and site E.
  • The first mixed audio stream currently being played is mixed from the audio streams corresponding to two sites: site B and site D.
  • After the MCU acting as the conference server receives the command selecting site E sent by the site terminal of site A, the MCU sends the video stream corresponding to site E to site A.
  • Because the first mixed audio stream does not include the audio stream of the selected site E, the MCU mixes the audio streams of the two sites with the highest volume in the conference (two being the number of audio streams in the first mixed audio stream), that is, site B and site C, with the audio stream of the selected site E.
  • The resulting second mixed audio stream is sent to site A.
  • Site A sees the images of site B, site C, and site E and hears the sounds of site B, site C, and site E, so the sound follows the image.
  • In this scenario, the conference server sends the audio streams corresponding to the multiple sites selected by the first site terminal to the first site terminal.
  • The sites in the current video conferencing system are site A, site B, site C, site D, site E, and site F.
  • The sites in descending order of volume are site B, site C, site D, site A, site E, and site F.
  • The first mixed audio stream currently being played is mixed from the audio streams corresponding to two sites: site B and site D.
  • After the MCU acting as the conference server receives the command selecting site C and site E sent by site A, the MCU sends the corresponding video streams to site A.
  • The first mixed audio stream does not include the audio streams corresponding to site C and site E, so the MCU mixes the audio streams of site C and site E to obtain a second mixed audio stream and sends it to site A.
  • Site A sees the images of site C and site E and hears the sounds of site C and site E, so the sound follows the image.
  • In this scenario, the conference server mixes the N1 audio streams corresponding to the N1 sites selected by the first site terminal with the N2 audio streams corresponding to the N2 sites with the highest volume in the conference and sends the result to the first site terminal, where N1+N2 equals the number of audio streams in the first mixed audio stream.
  • The sites in the current video conferencing system are site A, site B, site C, site D, site E, and site F.
  • The sites in descending order of volume are site B, site C, site D, site A, site E, and site F.
  • The first mixed audio stream currently being played is mixed from the audio streams corresponding to three sites: site B, site D, and site F.
  • After the MCU acting as the conference server receives the command selecting site C and site E sent by site A, the MCU splices the video streams corresponding to site C and site E and sends the spliced video stream to site A.
  • The first mixed audio stream does not include the audio streams corresponding to site C and site E. The MCU therefore mixes the audio stream of the site with the highest volume in the conference, site B, with the audio streams of the selected site C and site E to obtain a second mixed audio stream and sends it to site A. The second mixed audio stream contains the same number of audio streams as the first mixed audio stream, namely three.
  • Site A sees the images of site B, site C, and site E and hears the sounds of site B, site C, and site E, so the sound follows the image.
  • In this scenario, the conference server mixes the audio streams corresponding to the multiple sites selected by the first site terminal with the audio streams contained in the first mixed audio stream and sends the result to the first site terminal.
  • The sites in the current video conferencing system are site A, site B, site C, site D, site E, and site F.
  • The sites in descending order of volume are site B, site C, site D, site A, site E, and site F.
  • The first mixed audio stream currently being played is mixed from the audio streams corresponding to three sites: site B, site D, and site F.
  • After the MCU acting as the conference server receives the command selecting site C and site E sent by site A, the MCU splices the video streams corresponding to site C and site E and sends the spliced video stream to site A.
  • The first mixed audio stream does not include the audio streams corresponding to the selected site C and site E. The MCU therefore mixes the audio streams contained in the first mixed audio stream with the audio streams of the selected sites, that is, it mixes the audio streams of site B, site D, site F, site C, and site E, to obtain a second mixed audio stream and sends it to site A.
  • Site A sees the images of site B, site D, site F, site C, and site E and hears the sounds of site B, site D, site F, site C, and site E, so the sound follows the image.
  • In this scenario, the conference server mixes the audio streams corresponding to the multiple sites selected by the first site terminal with the N3 audio streams corresponding to the N3 sites with the highest volume in the conference and sends the result to the first site terminal, where N3 equals the number of audio streams in the first mixed audio stream.
  • The sites in the current video conferencing system are site A, site B, site C, site D, site E, and site F.
  • The sites in descending order of volume are site B, site C, site D, site A, site E, and site F.
  • The first mixed audio stream currently being played is mixed from the audio streams corresponding to three sites: site B, site D, and site F.
  • After the MCU acting as the conference server receives the command selecting site C and site E sent by site A, the MCU splices the video streams corresponding to site C and site E and sends the spliced video stream to site A.
  • The first mixed audio stream does not include the audio streams corresponding to site C and site E. The MCU therefore mixes the audio streams of the three sites with the highest volume in the conference, that is, site B, site C, and site D, with the audio streams of the selected site C and site E. Because site C appears in both groups, the audio streams of site B, site C, site D, and site E are mixed.
  • The resulting second mixed audio stream is sent to site A.
  • Site A sees the images of site B, site C, site D, and site E and hears the sounds of site B, site C, site D, and site E, so the sound follows the image.
  • In this scenario, the conference server mixes the N4 audio streams corresponding to the N4 sites with the highest volume among the N1 sites selected by the first site terminal with the audio streams corresponding to the N2 sites with the highest volume in the conference and sends the result to the first site terminal.
  • The sites in the current video conferencing system are site A, site B, site C, site D, site E, and site F.
  • The sites in descending order of volume are site B, site C, site D, site A, site E, and site F.
  • The first mixed audio stream currently being played is mixed from the audio streams corresponding to site B and site C.
  • After the MCU acting as the conference server receives the command selecting site D, site E, and site F sent by the site terminal of site A, the MCU splices the video streams corresponding to site D, site E, and site F and sends the spliced video stream to site A.
  • The first mixed audio stream does not include the audio streams corresponding to the selected site D, site E, and site F. The MCU therefore takes the audio streams of the two loudest selected sites, that is, site D and site E, and the audio streams of the two loudest sites in the conference, that is, site B and site C, and mixes the audio streams of site B, site C, site D, and site E to obtain a second mixed audio stream, which it then sends to site A.
  • Site A sees the images of site C, site D, and site E.
  • An audio processing strategy may additionally be applied: different strategies can be used to adjust the audio gain of specific sites, so that the sites interfere with each other less and the user can hear the sound of the sites the user cares about.
  • The audio gain processing uses a configurable strategy that adjusts the audio gain according to factors such as the resolution, bandwidth, frame rate, importance, and volume of a site. Specifically, the audio gain can be adjusted using the following solutions.
  • Solution 1: The conference server obtains the N4 audio streams corresponding to the N4 sites with the highest volume among the N1 sites selected by the first site terminal and the N2 audio streams corresponding to the N2 sites with the highest volume in the conference. It then increases the gain of the audio streams of one or more of the N4 sites, or reduces the gain of the audio streams of one or more of the N2 sites, or does both, so that the sound of the sites the first site terminal selected to view is louder than the sound of the N2 sites with the highest volume in the conference.
  • the conference server obtains the N4 with the highest volume in the N1 sites selected by the first site terminal.
  • the N4 audio streams corresponding to the site and the audio streams corresponding to the N3 sites corresponding to the first mixed audio stream are increased, and the audio stream corresponding to one or more of the N4 sites in the selected site is increased.
  • the gain of the audio stream corresponding to one or more sites in the N3 sites corresponding to the obtained first mixed audio stream, or the audio corresponding to one or more sites in the acquired N4 sites The gain of the stream and the gain of the audio stream corresponding to one or more of the N3 sites are obtained, so that the sound of the first site terminal selecting to view the site is greater than the sound of the N3 audio streams corresponding to the first mixed audio stream.
  • N4 is less than or equal to N1;
  • Solution 3 The conference server obtains N3 audio streams corresponding to the N4 sites with the highest volume in the N1 sites selected by the first site terminal, and N3 audio streams corresponding to the N3 sites with the highest volume in the conference site.
  • the audio stream corresponding to one or more sites in the N4 sites with the highest volume in the conference site is selected or the audio stream corresponding to one or more sites in the N3 sites with the highest volume in the acquired conference site is selected.
  • Gain or increase the gain of the audio stream corresponding to one or more sites in the acquired N4 sites and reduce the gain of the audio stream corresponding to one or more sites in the acquired N3 sites, so that the first site
  • the sound that the terminal chooses to view the site is greater than the sound corresponding to the N3 sites with the highest volume in the conference site, where N4 is less than or equal to N1.
  • Optionally, the sound of the selected sites in the second mixed audio stream played by the first site terminal may be adjusted to 1.2 to 1.5 times the sound of the other sites in the second mixed audio stream.
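The three ways of reaching the target loudness ratio (raise the selected sites, lower the other sites, or do both) can be illustrated with a small Python sketch. It rests on assumptions that the text does not state: levels are treated as linear amplitude measures, the function name `gains_for_ratio` and the mode names are hypothetical, and the split used for "both" (equal factors in each direction) is just one possible choice.

```python
def gains_for_ratio(selected_level, other_level, ratio=1.3, mode="boost"):
    """Return (selected_gain, other_gain) so that, after applying the gains,
    the selected sites sound `ratio` times as loud as the other sites.

    The text mentions a ratio of 1.2 to 1.5; 1.3 is an arbitrary default.
    """
    if selected_level <= 0 or other_level <= 0 or ratio <= 0:
        raise ValueError("levels and ratio must be positive")
    # Total correction needed on the selected/other level ratio.
    correction = ratio * other_level / selected_level
    if mode == "boost":   # change only the selected sites' gain
        return correction, 1.0
    if mode == "cut":     # change only the other sites' gain
        return 1.0, 1.0 / correction
    if mode == "both":    # split the correction evenly between the two
        g = correction ** 0.5
        return g, 1.0 / g
    raise ValueError("mode must be 'boost', 'cut' or 'both'")


sel_gain, other_gain = gains_for_ratio(0.2, 0.5, ratio=1.25, mode="both")
print(round((0.2 * sel_gain) / (0.5 * other_gain), 3))  # prints 1.25
```

If the selected sites are already louder than the target ratio requires, the "boost" factor comes out below 1. A real implementation would probably also clamp the gains to avoid clipping.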
  • Embodiments of the present invention further provide a video conference server and a video conference system for implementing the foregoing solutions.
  • An embodiment of the present invention provides a video conference server, including:
  • a receiving module configured to receive a site selection command sent by the first site terminal, and transmit the instruction to the video stream sending module and the audio stream sending module, respectively;
  • a video stream sending module, configured to send a video stream to the first site terminal, where the video stream includes the video stream corresponding to the site selected by the first site terminal;
  • an audio stream sending module, configured to: if the first mixed audio stream currently played by the first site terminal does not include the audio stream corresponding to the site selected by the first site terminal, generate a second mixed audio stream and send the second mixed audio stream to the first site terminal, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal has selected to view.
  • The audio stream sending module may transmit all the audio streams included in the generated second mixed audio stream to the first site terminal in the same channel; alternatively, it may transmit the audio streams corresponding to the selected sites and the other audio streams in the second mixed audio stream to the first site terminal in different channels.
  • The audio stream played by the first site terminal thus follows the video stream, which mitigates the mismatch between audio and video to a certain extent and enhances the user experience.
  • Transmitting the audio streams corresponding to the selected sites and the other audio streams in the second mixed audio stream in different channels reduces mutual interference between the sounds of the sites and improves the quality of the audio played by the first site terminal.
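The two transmission options (a single shared channel, or one channel for the selected sites and another for the rest) could be modelled as follows. This is a minimal sketch under assumptions: the function name `assign_channels` and the use of channel indices 0 and 1 are hypothetical, and the text does not say how channels are actually identified.

```python
def assign_channels(mix_sites, selected, separate=False):
    """Map channel index -> list of sites whose audio goes into that channel."""
    if not separate:
        # Option 1: everything in the second mix shares one channel.
        return {0: list(mix_sites)}
    # Option 2: selected sites in one channel, all other sites in another.
    chosen = [s for s in mix_sites if s in selected]
    others = [s for s in mix_sites if s not in selected]
    return {0: chosen, 1: others}


print(assign_channels(["B", "C", "D", "E"], {"D", "E", "F"}, separate=True))
# prints {0: ['D', 'E'], 1: ['B', 'C']}
```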
  • An embodiment of the present invention further provides a video conference server, as shown in FIG. 5, including:
  • the receiving module 501 is configured to receive a site selection command sent by the first site terminal, and send the instruction to the video stream sending module 502 and the audio stream sending module 503, respectively;
  • the video stream sending module 502 is configured to send a video stream to the first site terminal, where the video stream includes the video stream corresponding to the site selected by the first site terminal;
  • the audio stream sending module 503 is configured to: if the first mixed audio stream currently played by the first site terminal does not include the audio stream corresponding to the site selected by the first site terminal, generate a second mixed audio stream and send the second mixed audio stream to the first site terminal, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal has selected to view.
  • The audio stream sending module 503 is configured to transmit all the audio streams included in the generated second mixed audio stream to the first site terminal in the same channel, or to transmit the audio streams corresponding to the selected sites and the other audio streams in the second mixed audio stream to the first site terminal in different channels.
  • Because the audio stream sending module 503 in the video conference server adds the audio streams of some or all of the sites selected by the first site terminal to the second mixed audio stream sent to the first site terminal, the played audio stream follows the video stream, which mitigates the mismatch between audio and video to a certain extent and enhances the user experience.
  • Transmitting the audio streams corresponding to the selected sites and the other audio streams in the second mixed audio stream in different channels reduces mutual interference between the sounds of the sites and improves the quality of the audio played by the first site terminal.
  • An embodiment of the present invention further provides a video conference server. Referring to FIG. 6, the server includes:
  • the receiving module 601 is configured to receive the site selection command sent by the first site terminal, and send the instruction to the video stream sending module 602 and the audio stream sending module 603, respectively;
  • the video stream sending module 602 is configured to send a video stream to the first site terminal, where the video stream includes the video stream corresponding to the site selected by the first site terminal;
  • the audio stream sending module 603 includes an acquiring module 603a and an audio gain processing module 603b.
  • the acquiring module 603a is configured to obtain the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal, together with one of the following: the N2 audio streams corresponding to the N2 loudest sites in the conference; the N3 audio streams that make up the first mixed audio stream; or the N3 audio streams corresponding to the N3 loudest sites in the conference; where N4 is less than or equal to N1;
  • the audio gain processing module 603b is configured to increase the gain of the audio streams corresponding to one or more of the obtained N4 sites, or reduce the gain of the audio streams corresponding to one or more of the obtained N2 loudest sites in the conference, or do both, so that the sound of the sites selected by the first site terminal is louder than the sound of the N2 loudest sites in the conference;
  • or to increase the gain of the audio streams corresponding to one or more of the obtained N4 sites, or reduce the gain of the audio streams corresponding to one or more of the N3 sites of the obtained first mixed audio stream, or do both, so that the sound of the sites selected by the first site terminal is louder than the sound of the N3 sites of the first mixed audio stream;
  • or to increase the gain of the audio streams corresponding to one or more of the obtained N4 sites, or reduce the gain of the audio streams corresponding to one or more of the N3 loudest sites in the conference, or do both, so that the sound of the sites selected by the first site terminal is louder than the sound of the N3 loudest sites in the conference.
  • The audio gain processing module 603b may adjust the sound of the N4 loudest sites among the N1 sites selected by the first site terminal to 1.2 to 1.5 times the sound of the N2 loudest sites in the conference;
  • or adjust the sound of those N4 sites to 1.2 to 1.5 times the sound of the N3 sites of the first mixed audio stream;
  • or adjust the sound of those N4 sites to 1.2 to 1.5 times the sound of the N3 loudest sites in the conference.
  • An embodiment of the present invention provides a video conference system, including:
  • a conference server, configured to: receive a site selection command sent by the first site terminal; send, to the first site terminal, the video stream corresponding to the site selected by the first site terminal; and, if the first mixed audio stream currently played by the first site terminal does not include the audio stream corresponding to the selected site, generate a second mixed audio stream and send the second mixed audio stream to the first site terminal, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal has selected to view;
  • a first site terminal, configured to: send the site selection command to the conference server; receive, from the conference server, the video stream corresponding to the selected site and the second mixed audio stream, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal has selected to view; and play the video stream and the second mixed audio stream.
  • The conference server may transmit all the audio streams included in the generated second mixed audio stream to the first site terminal in the same channel; alternatively, it may transmit the audio streams corresponding to the selected sites and the other audio streams in the second mixed audio stream to the first site terminal in different channels.
  • Because the conference server in the video conference system provided by this embodiment adds the audio streams of some or all of the sites selected by the first site terminal to the second mixed audio stream sent to the first site terminal, the audio stream played by the site terminal follows the video stream, which mitigates the mismatch between audio and video to a certain extent and enhances the user experience.
  • Transmitting the audio streams corresponding to the selected sites and the other audio streams in the second mixed audio stream in different channels reduces mutual interference between the sounds of the sites and improves the quality of the audio played by the first site terminal.
  • An embodiment of the present invention further provides another video conference system, as shown in the accompanying figure, including:
  • a conference server 701, configured to: receive a site selection command sent by the first site terminal 702; send, to the first site terminal 702, the video stream corresponding to the site selected by the first site terminal 702; and, if the first mixed audio stream currently played by the first site terminal 702 does not include the audio stream corresponding to the site selected by the first site terminal 702, send a second mixed audio stream to the first site terminal 702, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal 702 has selected to view;
  • a first site terminal 702, configured to: send the site selection command to the conference server 701; receive, from the conference server 701, the video stream corresponding to the selected site and the second mixed audio stream, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal 702 has selected to view; and play the video stream and the second mixed audio stream.
  • The conference server 701 may generate the second mixed audio stream in any of the following ways:
  • the conference server 701 mixes the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal to obtain the second mixed audio stream, where N4 is less than or equal to N1; or
  • the conference server 701 mixes the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal and the N2 audio streams corresponding to the N2 loudest sites in the conference to obtain the second mixed audio stream, where N4 is less than or equal to N1, the first mixed audio stream is obtained by mixing N3 audio streams, and N4 plus N2 is equal to N3; or
  • the conference server 701 mixes the audio streams in a first audio stream set to obtain the second mixed audio stream, where the first audio stream set includes either the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal together with the N3 audio streams that were mixed to obtain the first mixed audio stream, or those N4 audio streams together with the N3 audio streams corresponding to the N3 loudest sites in the conference; in both cases N4 is less than or equal to N1.
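The alternatives for composing the second mixed audio stream can be compared side by side in a short Python sketch. The strategy names, the function `plan_second_mix`, and the volume figures are assumptions introduced for illustration. The sketch only decides which sites are mixed, not how samples are combined, and it covers the first variant of the third alternative (reusing the N3 streams of the first mix).

```python
def loudest(sites, volumes, n):
    """Return the n loudest of `sites` according to `volumes`."""
    return sorted(sites, key=lambda s: volumes[s], reverse=True)[:n]


def plan_second_mix(strategy, selected, first_mix, volumes, n4):
    """Return the list of sites to mix, given N4 <= N1 = len(selected)."""
    if not 1 <= n4 <= len(selected):
        raise ValueError("N4 must satisfy 1 <= N4 <= N1")
    chosen = loudest(selected, volumes, n4)
    n3 = len(first_mix)  # the first mix was built from N3 streams
    if strategy == "selected_only":
        # Only the N4 loudest selected sites.
        return chosen
    if strategy == "same_size":
        # N4 selected sites plus N2 loudest sites overall, with N4 + N2 = N3.
        n2 = n3 - n4
        if n2 < 0:
            raise ValueError("N4 must not exceed N3 for this strategy")
        rest = [s for s in volumes if s not in chosen]
        return chosen + loudest(rest, volumes, n2)
    if strategy == "extend":
        # N4 selected sites plus all N3 sites of the first mix.
        return chosen + [s for s in first_mix if s not in chosen]
    raise ValueError("unknown strategy")


volumes = {"B": 0.9, "C": 0.8, "G": 0.7, "D": 0.5, "E": 0.4, "F": 0.2}
first_mix = ["B", "C", "G"]
for name in ("selected_only", "same_size", "extend"):
    print(name, plan_second_mix(name, ["D", "E", "F"], first_mix, volumes, n4=1))
# prints:
# selected_only ['D']
# same_size ['D', 'B', 'C']
# extend ['D', 'B', 'C', 'G']
```

The "same_size" strategy keeps the number of mixed streams unchanged, while "extend" keeps everything the terminal was already hearing at the cost of mixing more streams.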
  • Because the conference server 701 in the video conference system adds the audio streams of some or all of the sites selected by the first site terminal to the second mixed audio stream sent to the first site terminal, the audio stream played by the first site terminal follows the video stream, which mitigates the mismatch between audio and video to a certain extent and enhances the user experience.
  • Transmitting the audio streams corresponding to the selected sites and the other audio streams in the second mixed audio stream in different channels reduces mutual interference between the sounds of the sites and improves the quality of the audio played by the first site terminal.
  • An embodiment of the present invention further provides a video conference system that likewise includes a conference server and a first site terminal.
  • In addition to the functions described above, the conference server in this system can increase the gain of the audio streams corresponding to one or more of the N4 loudest sites among the N1 sites selected by the first site terminal, or reduce the gain of the audio streams corresponding to one or more of the N2 loudest sites in the conference, or do both, so that the sound of the sites selected by the first site terminal is louder than the sound of the N2 loudest sites in the conference, where N4 is less than or equal to N1; or
  • increase the gain of the audio streams corresponding to one or more of the N4 loudest sites among the N1 sites selected by the first site terminal, or reduce the gain of the audio streams corresponding to one or more of the N3 sites of the first mixed audio stream, or do both, so that the sound of the selected sites is louder than the sound of the N3 sites of the first mixed audio stream, where N4 is less than or equal to N1; or
  • increase the gain of the audio streams corresponding to one or more of the N4 loudest sites among the N1 sites selected by the first site terminal, or reduce the gain of the audio streams corresponding to one or more of the N3 loudest sites in the conference, or do both, so that the sound of the selected sites is louder than the sound of the N3 loudest sites in the conference, where N4 is less than or equal to N1.
  • The conference server 701 in this embodiment may be the conference server in the foregoing method embodiments. The functions of its functional modules may be implemented according to the methods in the foregoing method embodiments; for the specific implementation process, refer to the related description of the foregoing method embodiments, which is not repeated here.

Abstract

Disclosed are a video conference signal processing method, a video conference server and a video conference system. The method comprises: sending a video stream of a selected conference hall to a first conference hall terminal according to a selection instruction sent by the first conference hall terminal; determining whether a first mixed audio stream currently played by the first conference hall terminal comprises the audio stream of the selected conference hall; and if not, generating a second mixed audio stream and sending the second mixed audio stream to the first conference hall terminal, the second mixed audio stream comprising part or all of the audio stream of the selected conference hall. The audio stream played by the first conference hall terminal therefore follows the video stream, so that the mismatch between audio and video is mitigated to a certain degree and user experience is improved.

Description

Method for processing video conference signal, video conference server and system
This application claims priority to Chinese Patent Application No. 201210292177.4, filed with the Chinese Patent Office on August 16, 2012 and entitled "Method for processing video conference signal, video conference server and system", the content of which is incorporated herein by reference.
Technical field
The present invention relates to the field of communications technologies, and in particular, to a method for processing a video conference signal, a video conference server, and a system.
Background
Currently, a video conference system generally includes a conference server and site terminals. Referring to FIG. 1, a multipoint control unit (MCU) is used as an example of the conference server. Each site in the video conference system has at least one site terminal. Each site terminal captures the sound and images of its own site, encodes them, and sends them to the MCU. The MCU processes the sound and images in a given manner, for example by mixing the sound and by forwarding the images or composing them into a multi-picture layout, and sends the processed sound and images to the other site terminals in the video conference. Each of the other site terminals decodes and outputs the sound and images of the remote sites, thereby implementing remote video communication.
In current video conferences, the MCU mixes the audio of the N loudest sites among the terminals that have joined the conference, so all participating terminals hear the same sound. However, the following situation sometimes arises in existing video conference systems: a site terminal is currently watching the video of site A, but because the sound of site A is not among the N loudest, the terminal cannot hear site A. What is seen is not what is heard, and this mismatch between audio and video degrades the video conference experience.
Summary
Embodiments of the present invention provide a method for processing a video conference signal, a video conference server, and a system, to improve the matching of audio and video in a video conference so that the sound heard at a site follows the video seen at that site, thereby improving the video conference experience.
To solve the foregoing technical problem, embodiments of the present invention provide the following technical solutions.
An embodiment of the present invention provides a method for processing a video conference signal, including:
a conference server receives a site selection command from a first site terminal;
the conference server sends a video stream to the first site terminal according to the site selection command, where the video stream includes the video stream corresponding to the site selected by the first site terminal;
if the first mixed audio stream currently played by the first site terminal does not include the audio stream corresponding to the site selected by the first site terminal, the conference server generates a second mixed audio stream and sends the second mixed audio stream to the first site terminal, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal has selected to view.
An embodiment of the present invention further provides a video conference server, including:
a receiving module, configured to receive a site selection command sent by a first site terminal and pass the command to a video stream sending module and an audio stream sending module;
the video stream sending module, configured to send a video stream to the first site terminal, where the video stream includes the video stream corresponding to the site selected by the first site terminal;
the audio stream sending module, configured to: if the currently played first mixed audio stream does not include the audio stream corresponding to the site selected by the first site terminal, generate a second mixed audio stream and send the second mixed audio stream to the first site terminal, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal has selected to view.
An embodiment of the present invention further provides a video conference system, including:
a conference server, configured to: receive a site selection command sent by a first site terminal; send, to the first site terminal, the video stream corresponding to the site selected by the first site terminal; and, if the first mixed audio stream currently played by the first site terminal does not include the audio stream corresponding to the selected site, generate a second mixed audio stream and send the second mixed audio stream to the first site terminal, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal has selected to view;
the first site terminal, configured to: send the site selection command to the conference server; receive, from the conference server, the video stream corresponding to the selected site and the second mixed audio stream, where the second mixed audio stream includes some or all of the audio streams corresponding to the sites that the first site terminal has selected to view; and play the video stream and the second mixed audio stream.
As can be seen from the above, because the audio streams of some or all of the sites selected by the first site terminal are added to the second mixed audio stream sent to the first site terminal, the mismatch between audio and video is mitigated to a certain extent and the user experience is enhanced.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a video conference system in the prior art;
FIG. 2 is a schematic flowchart of a method for processing a video conference signal according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a first type of application scenario of a method for processing a video conference signal according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second type of application scenario of a method for processing a video conference signal according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a conference server according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another conference server according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a video conference system according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention provide a method for processing a video conference signal, a video conference server, and a system, with the aim of improving the matching of audio and video in a video conference and improving the video conference experience.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. The described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
本发明实施例提供一种视频会议信号处理的方法, 包括:  An embodiment of the present invention provides a method for processing a video conference signal, including:
会议服务器接收第一会场终端发送的会场选看指令;  The conference server receives the site selection command sent by the first site terminal;
会议服务器根据会场选看指令向第一会场终端发送视频流; 其中, 视频 流包括第一会场终端选看会场对应的视频流;  The conference server sends a video stream to the first site terminal according to the site selection command; wherein the video stream includes the first site terminal to select a video stream corresponding to the site;
若第一会场终端当前播放的第一混合音频流不包含第一会场终端选看会 场对应的音频流, 则会议服务器生成第二混合音频流, 并向第一会场终端发 送所述第二混合音频流, 其中, 所述第二混合音频流包含第一会场终端选择 观看会场对应的部分或者全部音频流。 If the first mixed audio stream currently being played by the first site terminal does not include the audio stream corresponding to the first site terminal, the conference server generates a second mixed audio stream, and sends the second mixed audio to the first site terminal. a stream, where the second mixed audio stream includes a first venue terminal selection View some or all of the audio streams corresponding to the site.
由上可见, 本实施例由于在第一会场终端当前播放的音频流中加入了选 择观看会场对应的部分或者全部音频流, 使得第一会场终端播放的音频流跟 随视频流, 这在一定程度上改善了音频和视频不同步的问题, 增强了用户体 验。 本发明实施例还提供一种频会议信号处理的方法, 参见图 2, 该方法包括 以下内容:  It can be seen that, in this embodiment, part or all of the audio streams corresponding to the viewing site are added to the audio stream currently played by the first site terminal, so that the audio stream played by the first site terminal follows the video stream, which is to some extent Improved audio and video out of sync issues and enhanced user experience. The embodiment of the present invention further provides a method for frequency conference signal processing. Referring to FIG. 2, the method includes the following content:
S101. The conference server receives an instruction, sent by the first site terminal, to select N1 sites for viewing.
The site selection instruction sent by the first site may be an instruction to select a single site or an instruction to select multiple sites. The first site terminal may be a communication terminal such as a PC or a mobile phone. N1 is any integer greater than or equal to 1.
S102. The conference server sends a video stream to the first site terminal according to the instruction to select N1 sites, where the video stream includes the video streams corresponding to the N1 sites selected by the first site terminal.
Before sending the video stream to the first site terminal, the conference server may stitch the video streams together using a built-in video processing policy and then send the stitched video stream to the first site terminal.
S103. If the first mixed audio stream currently played by the first site terminal does not contain the audio stream corresponding to the site selected by the first site terminal, the conference server generates a second mixed audio stream and sends it to the first site terminal, where the second mixed audio stream contains some or all of the audio streams corresponding to the sites that the first site terminal has selected for viewing.
It should be noted that before the first site terminal issues the selection instruction, it is playing the first mixed audio stream. After the first site terminal issues the selection instruction, the conference server determines whether the first mixed audio stream currently being played contains the audio streams corresponding to the selected sites. If it does not, the conference server sends the first site terminal a second mixed audio stream containing some or all of the audio streams corresponding to the selected sites. If the first mixed audio stream played by the first site terminal already contains the audio streams corresponding to the selected sites, no second mixed audio stream needs to be sent.
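The check in step S103 can be sketched in a few lines of Python. This is only an illustration: the function name `needs_second_mix` and the representation of a mix as a list of site names are hypothetical, and the text does not say what happens when only some of the selected sites are missing from the first mix. The sketch assumes that a single missing site is enough to trigger a new mix.

```python
def needs_second_mix(first_mix_sites, selected_sites):
    """Return True when the mix being played lacks a selected site's audio.

    first_mix_sites -- sites whose audio is in the first mixed audio stream
    selected_sites  -- sites the first site terminal has selected for viewing
    """
    playing = set(first_mix_sites)
    # Assumption: one missing selected site is enough to require a new mix.
    return any(site not in playing for site in selected_sites)


if __name__ == "__main__":
    # The first mix holds B and D; the terminal selects E, so a new mix is needed.
    print(needs_second_mix(["B", "D"], ["E"]))
```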
If the first site terminal sends the conference server an instruction to select a single site, the conference server either transmits all the audio streams contained in the second mixed audio stream to the first site terminal in the same channel, or places the audio stream of the selected site and the other audio streams in the second mixed audio stream in different channels and transmits them to the first site terminal.
If the first site terminal sends the conference server an instruction to select multiple sites, the conference server sends the first site terminal a second mixed audio stream that contains the audio streams of the N4 loudest sites among the selected sites. The conference server may also place the audio streams of the N4 loudest selected sites and the other audio streams in the second mixed audio stream in different channels and transmit them to the first site terminal, where N4 is less than or equal to the number of selected sites.
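The two transmission options (one shared channel, or separate channels for the selected sites and the rest) can be illustrated with a small sketch. Everything here is an assumption made for illustration: audio is modelled as equal-length lists of float samples in [-1.0, 1.0], mixing is a clamped sum, and the selected sites are arbitrarily assigned to the first of two channels. The names `mix` and `layout_channels` are hypothetical.

```python
def mix(streams, length):
    """Sum equal-length sample lists and clamp each sample to [-1.0, 1.0]."""
    out = []
    for i in range(length):
        total = sum((stream[i] for stream in streams), 0.0)
        out.append(max(-1.0, min(1.0, total)))
    return out


def layout_channels(selected, others, separate):
    """Return (channel_1, channel_2) for the second mixed audio stream.

    selected -- sample lists of the selected sites
    others   -- sample lists of the remaining streams in the mix
    separate -- True: selected sites in channel 1, the rest in channel 2
                False: everything mixed together and sent in both channels
    """
    all_streams = selected + others
    length = len(all_streams[0]) if all_streams else 0
    if separate:
        return mix(selected, length), mix(others, length)
    combined = mix(all_streams, length)
    return combined, list(combined)


if __name__ == "__main__":
    site_e = [0.5, -0.25]
    site_b = [0.25, 0.25]
    site_c = [0.5, 0.5]
    print(layout_channels([site_e], [site_b, site_c], separate=True))
    print(layout_channels([site_e], [site_b, site_c], separate=False))
```

Keeping the selected sites in their own channel lets the receiving terminal balance them against the rest of the conference, at the cost of giving up ordinary stereo imaging.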
Here, the first mixed audio stream is the audio stream that the first site terminal was playing before it sent the current site selection instruction; it is obtained by mixing N3 audio streams.
After the conference server sends the second mixed audio stream to the first site terminal, the audio stream played by the first site terminal becomes the second mixed audio stream; that is, the second mixed audio stream replaces the first mixed audio stream as the audio stream currently played by the first site terminal.
As can be seen from the above, because the audio streams of some or all of the sites selected by the first site terminal are added to the second mixed audio stream sent to the first site terminal, the problem of audio and video being out of sync is mitigated to some extent and the user experience is improved.
The conference server may use any of several strategies to generate the second mixed audio stream. Strategies 1, 2, 3 and 4 are described below as examples.
Strategy 1: The second mixed audio stream that the conference server sends to the first site terminal contains only some or all of the audio streams of the selected sites, as follows.
The conference server mixes the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal to obtain the second mixed audio stream, where N1 and N4 are both integers greater than or equal to 1 and N4 is less than or equal to N1.
Strategy 2: The second mixed audio stream that the conference server sends to the first site terminal contains not only some or all of the audio streams of the selected sites but also the audio streams corresponding to the N2 loudest sites among the sites that have joined the conference, and the number of audio streams contained in the second mixed audio stream equals the number contained in the first mixed audio stream, as follows.
The conference server mixes the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal with the N2 audio streams corresponding to the N2 loudest joined sites to obtain the second mixed audio stream. The first mixed audio stream that the first site terminal was playing before it sent the site selection instruction is obtained by mixing N3 audio streams, and N3 = N4 + N2.
Strategy 3: The second mixed audio stream that the conference server sends to the first site terminal contains not only some or all of the audio streams of the selected sites but also the audio streams contained in the first mixed audio stream. In this case the second mixed audio stream contains more audio streams than the first mixed audio stream, as follows.
The conference server mixes the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal with the N3 audio streams that were mixed to form the first mixed audio stream to obtain the second mixed audio stream.
Strategy 4: The second mixed audio stream that the conference server sends to the first site terminal contains not only some or all of the audio streams of the selected sites but also the audio streams corresponding to the N3 loudest joined sites, where N3 equals the number of audio streams contained in the first mixed audio stream. In this case the second mixed audio stream contains more audio streams than the first mixed audio stream, as follows.
The conference server mixes the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal with the N3 audio streams corresponding to the N3 loudest joined sites to obtain the second mixed audio stream.
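The four strategies differ only in which extra streams are mixed in alongside the N4 loudest selected sites. The sketch below makes that explicit. It is an illustration under stated assumptions, not the patented implementation: the function name `second_mix_sites`, the representation of sites as strings, and the convention of passing the joined sites as a list ordered from loudest to quietest are all hypothetical, and sites that appear twice are simply merged.

```python
def second_mix_sites(ranking, first_mix, selected, strategy, n4=None):
    """Return the sites whose audio goes into the second mix, loudest first.

    ranking   -- joined sites ordered from loudest to quietest
    first_mix -- sites in the mix being played (N3 = len(first_mix))
    selected  -- the N1 sites chosen for viewing
    strategy  -- 1, 2, 3 or 4, numbered as in the text
    n4        -- how many selected sites to keep; defaults to all N1
    """
    n4 = len(selected) if n4 is None else n4
    n3 = len(first_mix)
    # The N4 loudest of the selected sites.
    chosen = [site for site in ranking if site in selected][:n4]
    if strategy == 1:
        extra = []                          # selected sites only
    elif strategy == 2:
        extra = ranking[:max(n3 - n4, 0)]   # N2 = N3 - N4 loudest joined sites
    elif strategy == 3:
        extra = list(first_mix)             # everything in the first mix
    elif strategy == 4:
        extra = ranking[:n3]                # N3 loudest joined sites
    else:
        raise ValueError("strategy must be 1, 2, 3 or 4")
    wanted = set(chosen) | set(extra)       # merge duplicates
    return [site for site in ranking if site in wanted]


if __name__ == "__main__":
    ranking = ["B", "C", "D", "A", "E"]
    for strategy in (1, 2, 3, 4):
        print(strategy, second_mix_sites(ranking, ["B", "D"], ["E"], strategy))
```

Returning site names rather than mixed samples keeps the selection rule separate from the actual mixing step, which can then be applied to whichever streams the chosen strategy yields.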
In practice, N2, N3 and N4 can be configured in the back end. The information that a site terminal sends to the conference server includes, but is not limited to, the number of sites, the audio data of each site, and the mixing mode. The mixing mode may be either to mix the site audio directly for output or to place the conference audio and the audio of the selected sites in different channels for output. The audio data of each site includes audio stream parameters such as the site number, the audio stream identifier and the audio gain.
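One possible shape for this information is sketched below as Python dataclasses. The text lists only the kinds of fields involved, so every class name, field name, type and default here (`MixMode`, `SiteAudio`, `SelectionInfo`, a default gain of 1.0) is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class MixMode(Enum):
    """The two mixing modes named in the text."""
    DIRECT = "direct"                        # mix all site audio together
    SEPARATE_CHANNELS = "separate_channels"  # conference vs. selected sites


@dataclass
class SiteAudio:
    """Per-site audio stream parameters (field names are hypothetical)."""
    site_number: str
    stream_id: int
    gain: float = 1.0


@dataclass
class SelectionInfo:
    """Information a site terminal sends to the conference server."""
    sites: List[SiteAudio]
    mix_mode: MixMode = MixMode.DIRECT

    @property
    def site_count(self) -> int:
        # The number of sites is derived from the per-site entries.
        return len(self.sites)


if __name__ == "__main__":
    info = SelectionInfo([SiteAudio("E", stream_id=5)])
    print(info.site_count, info.mix_mode.value)
```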
To aid understanding of the above solution, the embodiments of the present invention provide two classes of application scenarios for the video conference signal processing method. In the first class, the conference server receives an instruction from a conference terminal to select a single site; see FIG. 3. This class comprises application scenarios 1, 2, 3 and 4. In the second class, the conference server receives an instruction from a conference terminal to select multiple sites; see FIG. 4. This class comprises application scenarios 5, 6, 7, 8 and 9.
Application scenario 1:
The conference server sends the first site terminal only the audio stream corresponding to the single site selected by the first site terminal.
Assume that sites A, B, C, D and E have joined the current video conference, that the sites ranked from loudest to quietest are B, C, D, A and E, and that the first mixed audio stream currently being played is a mix of the audio streams of sites B and D.
When the MCU of the conference server receives the instruction sent by site A to select site E, the MCU sends the video stream of site E to site A. In this case the first mixed audio stream does not contain the audio stream of site E, so the conference server sends the audio stream of site E to site A.
Site A then hears the audio of site E while watching the video of site E, so the audio follows the picture.
Application scenario 2:
The conference server sends the first site terminal the audio stream of the single selected site together with the N2 audio streams corresponding to the N2 loudest joined sites, where N2 + 1 equals the number of audio streams in the first mixed audio stream.
Assume that sites A, B, C, D and E have joined the current video conference, that the sites ranked from loudest to quietest are B, C, D, A and E, and that the first mixed audio stream currently being played is a mix of the audio streams of the three sites B, C and D.
When the MCU of the conference server receives the instruction, sent by the conference terminal of site A, to select site E, the MCU sends the video stream of site E to site A. The first mixed audio stream does not contain the audio stream of the selected site E, so the MCU mixes the audio streams of the two loudest sites, B and C, with the audio stream of the selected site E to obtain the second mixed audio stream and sends it to site A. The second mixed audio stream and the first mixed audio stream are mixed from the same number of audio streams, namely three.
Site A then sees the pictures of sites B, C and E and hears the audio of sites B, C and E, so the audio follows the picture.
Application scenario 3:
The conference server sends the first site terminal the audio stream of the single selected site plus the audio streams that make up the first mixed audio stream.
Assume that sites A, B, C, D and E have joined the current video conference, that the sites ranked from loudest to quietest are B, C, D, A and E, and that the first mixed audio stream currently being played is a mix of the audio streams of the two sites B and C.
When the MCU of the conference server receives the instruction, sent by the conference terminal of site A, to select site E, the MCU sends the video stream of site E to site A. The first mixed audio stream does not contain the audio stream of the selected site E, so the MCU mixes the audio stream of the selected site E with the audio streams that make up the first mixed audio stream, that is, it mixes the audio streams of sites E, B and C, to obtain the second mixed audio stream and sends it to site A.
Site A then sees the pictures of sites B, C and E and hears the audio of sites B, C and E, so the audio follows the picture.
Application scenario 4:
The conference server sends the first site terminal the audio stream of the single selected site plus the N3 audio streams corresponding to the N3 loudest joined sites, where N3 equals the number of audio streams contained in the first mixed audio stream.
Assume that sites A, B, C, D and E have joined the current video conference, that the sites ranked from loudest to quietest are B, C, D, A and E, and that the first mixed audio stream currently being played is a mix of the audio streams of the two sites B and D.
When the MCU of the conference server receives the instruction, sent by the conference terminal of site A, to select site E, the MCU sends the video stream of site E to site A. The first mixed audio stream does not contain the audio stream of the selected site E, so the MCU mixes the audio streams of the two loudest joined sites (two being the number of audio streams in the first mixed audio stream), namely sites B and C, with the audio stream of the selected site E to obtain the second mixed audio stream and sends the second mixed audio stream to site A.
Site A then sees the pictures of sites B, C and E and hears the audio of sites B, C and E, so the audio follows the picture.
Application scenario 5:
The conference server sends the first site terminal the audio streams corresponding to the multiple sites selected by the first site terminal.
Assume that sites A, B, C, D, E and F have joined the current video conference, that the sites ranked from loudest to quietest are B, C, D, A, E and F, and that the first mixed audio stream currently being played is a mix of the audio streams of the two sites B and D.
When the MCU of the conference server receives the multi-site selection instruction sent by site A to select sites C and E, the MCU stitches the video streams of sites C and E together and sends the stitched video stream to site A. Because the first mixed audio stream does not contain the audio streams of the selected sites C and E, the MCU mixes the audio streams of sites C and E to obtain the second mixed audio stream and sends the second mixed audio stream to site A.
Site A then sees the pictures of sites C and E and hears the audio of sites C and E, so the audio follows the picture.
Application scenario 6:
The conference server sends the first site terminal the N1 audio streams corresponding to the N1 sites selected by the first site terminal plus the N2 audio streams corresponding to the N2 loudest joined sites, where N1 + N2 equals the number of audio streams in the first mixed audio stream.
Assume that sites A, B, C, D, E and F have joined the current video conference, that the sites ranked from loudest to quietest are B, C, D, A, E and F, and that the first mixed audio stream currently being played is a mix of the audio streams of the three sites B, D and F.
When the MCU of the conference server receives the instruction sent by site A to select sites C and E, the MCU stitches the video streams of sites C and E together and sends the stitched video stream to site A. Because the first mixed audio stream does not contain the audio streams of the selected sites C and E, the MCU mixes the audio stream of the loudest joined site, site B, with the audio streams of the selected sites C and E to obtain the second mixed audio stream and sends the second mixed audio stream to site A. The second mixed audio stream and the first mixed audio stream contain the same number of audio streams, namely three.
Site A then sees the pictures of sites B, C and E and hears the audio of sites B, C and E, so the audio follows the picture.
Application scenario 7:
The conference server sends the first site terminal the audio streams of the multiple selected sites plus the audio streams that make up the first mixed audio stream.
Assume that sites A, B, C, D, E and F have joined the current video conference, that the sites ranked from loudest to quietest are B, C, D, A, E and F, and that the first mixed audio stream currently being played is a mix of the audio streams of the three sites B, D and F.
When the MCU of the conference server receives the instruction sent by site A to select sites C and E, the MCU stitches the video streams of sites C and E together and sends the stitched video stream to site A. Because the first mixed audio stream does not contain the audio streams of the selected sites C and E, the MCU mixes the audio streams that make up the first mixed audio stream with the audio streams of the selected sites, that is, it mixes the audio streams of sites B, D, F, C and E, to obtain the second mixed audio stream and sends the second mixed audio stream to site A.
Site A then sees the pictures of sites B, D, F, C and E and hears the audio of sites B, D, F, C and E, so the audio follows the picture.
Application scenario 8:
The conference server sends the first site terminal the audio streams of the multiple selected sites plus the N3 audio streams corresponding to the N3 loudest joined sites, where N3 equals the number of audio streams contained in the first mixed audio stream.
Assume that sites A, B, C, D, E and F have joined the current video conference, that the sites ranked from loudest to quietest are B, C, D, A, E and F, and that the first mixed audio stream currently being played is a mix of the audio streams of the three sites B, D and F.
When the MCU of the conference server receives the instruction sent by site A to select sites C and E, the MCU stitches the video streams of sites C and E together and sends the stitched video stream to site A. Because the first mixed audio stream does not contain the audio streams of the selected sites C and E, the MCU takes the audio streams of the three loudest joined sites, namely B, C and D, and the audio streams of the selected sites, namely C and E. Site C appears in both groups, so after the duplicate is removed the MCU mixes the audio streams of sites B, C, D and E to obtain the second mixed audio stream and sends the second mixed audio stream to site A.
Site A then sees the pictures of sites B, C, D and E and hears the audio of sites B, C, D and E, so the audio follows the picture.
Application scenario 9:
The conference server mixes the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal with the audio streams corresponding to the N2 loudest joined sites and sends the result to the first site terminal.
Assume that sites A, B, C, D, E and F have joined the current video conference, that the sites ranked from loudest to quietest are B, C, D, A, E and F, and that the first mixed audio stream currently being played is a mix of the audio streams of sites B and C.
When the MCU of the conference server receives the instruction, sent by the conference terminal of site A, to select sites D, E and F, the MCU stitches the video streams of sites D, E and F together and sends the stitched video stream to site A. Because the first mixed audio stream does not contain the audio streams of the selected sites D, E and F, the MCU takes the audio streams of the two loudest selected sites, namely D and E, and the audio streams of the two loudest joined sites, namely B and C, and mixes them. That is, it mixes the audio streams of sites B, C, D and E to obtain the second mixed audio stream and then sends the second mixed audio stream to site A.
Site A then sees the pictures of sites B, C, D and E and hears the audio of sites B, C, D and E, so the audio follows the picture.
In addition, in current multipoint video conferences, the conference audio and the audio of the viewed sites, once mixed together, may interfere with each other and become hard to make out. An audio processing policy can therefore be added that applies different strategies to adjust the audio gain of specific sites, so that the sites' audio interferes less and the user can hear the site audio that the user cares about.
The audio gain processing uses a configurable policy and can adjust the audio gain according to the resolution, bandwidth, frame rate, importance, volume and so on of the sites. Specifically, the following schemes can be used to adjust the audio gain.
Scheme 1: The conference server obtains the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal and the N2 audio streams corresponding to the N2 loudest joined sites. It then increases the gain of the audio streams of one or more of the N4 loudest selected sites, or decreases the gain of the audio streams of one or more of the N2 loudest joined sites, or does both, so that the audio of the sites selected by the first site terminal is louder than the audio of the N2 loudest joined sites.
Scheme 2: The conference server obtains the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal and the audio streams of the N3 sites that make up the first mixed audio stream. It then increases the gain of the audio streams of one or more of the N4 loudest selected sites, or decreases the gain of the audio streams of one or more of the N3 sites in the first mixed audio stream, or does both, so that the audio of the sites selected by the first site terminal is louder than the N3 audio streams of the first mixed audio stream, where N4 is less than or equal to N1.
Scheme 3: The conference server obtains the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected by the first site terminal and the N3 audio streams corresponding to the N3 loudest joined sites. It then increases the gain of the audio streams of one or more of the N4 loudest selected sites, or decreases the gain of the audio streams of one or more of the N3 loudest joined sites, or does both, so that the audio of the sites selected by the first site terminal is louder than the audio of the N3 loudest joined sites, where N4 is less than or equal to N1.
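All three schemes offer the same three choices: raise the gain of the selected sites, lower the gain of the other streams, or do both. A minimal sketch of computing such gains follows. It rests on several assumptions that the text does not settle: loudness is treated as a single linear level per group (for example an RMS value), the target is expressed as a ratio between the two levels, the default ratio of 1.3 is arbitrary, and gains are left untouched when the selected sites already meet the target. The name `emphasis_gains` and its `mode` values are hypothetical.

```python
import math


def emphasis_gains(selected_level, other_level, ratio=1.3, mode="boost"):
    """Return (selected_gain, other_gain) so selected ends up ratio x other.

    selected_level -- linear level of the selected sites' audio
    other_level    -- linear level of the other streams in the mix
    ratio          -- target selected/other level ratio, must exceed 1
    mode           -- "boost" (raise selected), "cut" (lower others), "both"
    """
    if ratio <= 1.0:
        raise ValueError("ratio must be greater than 1")
    if selected_level <= 0 or other_level <= 0:
        return 1.0, 1.0  # nothing to balance against
    factor = ratio * other_level / selected_level
    if factor <= 1.0:
        return 1.0, 1.0  # assumption: already loud enough, leave unchanged
    if mode == "boost":
        return factor, 1.0
    if mode == "cut":
        return 1.0, 1.0 / factor
    if mode == "both":
        gain = math.sqrt(factor)  # split the correction evenly
        return gain, 1.0 / gain
    raise ValueError("mode must be 'boost', 'cut' or 'both'")


if __name__ == "__main__":
    for mode in ("boost", "cut", "both"):
        print(mode, emphasis_gains(0.25, 0.5, ratio=1.5, mode=mode))
```

Cutting the other streams avoids pushing the selected audio towards clipping, whereas boosting keeps the rest of the conference at its original level; the "both" mode splits the difference.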
Further, audio gain processing may be applied so that, in the second mixed audio stream as played, the sound of the sites selected by the first site terminal is 1.2 to 1.5 times as loud as the other sounds in the second mixed audio stream.
To facilitate implementation of the technical solutions of the embodiments of the present invention, the embodiments further provide a video conference server and a video conference system for implementing the foregoing solutions.
An embodiment of the present invention provides a video conference server, including:
a receiving module, configured to receive a site selection instruction sent by the first site terminal and to forward the instruction to the video stream sending module and the audio stream sending module;
a video stream sending module, configured to send a video stream to the first site terminal, where the video stream includes the video stream of the site selected by the first site terminal;
an audio stream sending module, configured to generate a second mixed audio stream if the first mixed audio stream currently being played does not contain the audio stream of the site selected by the first site terminal, and to send the second mixed audio stream to the first site terminal, where the second mixed audio stream contains some or all of the audio streams of the sites selected by the first site terminal.
The audio stream sending module may transmit the audio streams contained in the generated second mixed audio stream to the first site terminal in a single channel; alternatively, it may place the audio streams of the selected sites and the other audio streams of the second mixed audio stream in different channels for transmission to the first site terminal.
As can be seen from the above, because the audio stream sending module in the video conference server adds the audio streams of some or all of the sites selected by the first site terminal to the second mixed audio stream sent to that terminal, the audio played by the first site terminal follows the video. This alleviates, to some extent, the problem of site audio and video being out of sync and improves the user experience. In addition, transmitting the audio streams of the selected sites and the other audio streams of the second mixed audio stream in different channels can reduce mutual interference between the sounds of different sites and improve the quality of the audio played by the first site terminal.
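The two transmission options described above (one shared channel, or separate channels for the selected sites and the other sites) can be illustrated with a minimal sketch. The function names, the float sample range of -1.0 to 1.0, the hard clipping, and the choice of left channel for the selected sites are assumptions made for illustration; the description does not specify them.

```python
def mix(frames):
    """Sum equal-length frames sample by sample, clipping to [-1.0, 1.0]."""
    if not frames:
        return []
    mixed = []
    for i in range(len(frames[0])):
        total = sum(frame[i] for frame in frames)
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


def single_channel(selected, others):
    """Option 1: every stream of the second mix shares one channel."""
    return mix(selected + others)


def separate_channels(selected, others):
    """Option 2: selected sites on the left channel, other sites on the right.

    Returns a list of (left, right) sample pairs; a missing side is padded
    with silence so both channels have the same length.
    """
    left = mix(selected)
    right = mix(others)
    length = max(len(left), len(right))
    left = left + [0.0] * (length - len(left))
    right = right + [0.0] * (length - len(right))
    return list(zip(left, right))
```

With separate channels the terminal can render the selected sites apart from the rest, which is the interference reduction the description refers to.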
An embodiment of the present invention further provides a video conference server which, referring to Fig. 5, includes:
a receiving module 501, configured to receive a site selection instruction sent by the first site terminal and to forward the instruction to the video stream sending module 502 and the audio stream sending module 503;
a video stream sending module 502, configured to send a video stream to the first site terminal, where the video stream includes the video stream of the site selected by the first site terminal;
an audio stream sending module 503, configured to generate a second mixed audio stream if the first mixed audio stream currently played by the first site terminal does not contain the audio stream of the site selected by the first site terminal, and to send the second mixed audio stream to the first site terminal, where the second mixed audio stream contains some or all of the audio streams of the sites selected by the first site terminal.
It should be noted that the audio stream sending module 503 may transmit the audio streams contained in the generated second mixed audio stream to the first site terminal in a single channel; alternatively, it may place the audio streams of the selected sites and the other audio streams of the second mixed audio stream in different channels for transmission to the first site terminal.
As can be seen from the above, because the audio stream sending module 503 in the video conference server adds the audio streams of some or all of the sites selected by the first site terminal to the second mixed audio stream sent to that terminal, the audio played by the first site terminal follows the video. This alleviates, to some extent, the problem of site audio and video being out of sync and improves the user experience. In addition, transmitting the audio streams of the selected sites and the other audio streams of the second mixed audio stream in different channels can reduce mutual interference between the sounds of different sites and improve the quality of the audio played by the first site terminal.
An embodiment of the present invention further provides a video conference server which, referring to Fig. 6, includes:
a receiving module 601, configured to receive a site selection instruction sent by the first site terminal and to forward the instruction to the video stream sending module 602 and the audio stream sending module 603;
a video stream sending module 602, configured to send a video stream to the first site terminal, where the video stream includes the video stream of the site selected by the first site terminal;
The audio stream sending module 603 includes an obtaining module 603a and an audio gain processing module 603b, where:
the obtaining module 603a is configured to obtain the N4 audio streams of the N4 loudest sites among the N1 sites selected by the first site terminal together with the N2 audio streams of the N2 loudest sites among the joined sites; or to obtain the N4 audio streams of the N4 loudest sites among the N1 selected sites together with the N3 audio streams of the first mixed audio stream; or to obtain the N4 audio streams of the N4 loudest sites among the N1 selected sites together with the N3 audio streams of the N3 loudest sites among the joined sites, where N4 is less than or equal to N1;
the audio gain processing module 603b is configured to increase the gain of the audio streams of one or more of the N4 loudest sites among the N1 sites selected by the first site terminal, or to decrease the gain of the audio streams of one or more of the N2 loudest joined sites, or to do both at once, so that the sound of the sites selected by the first site terminal is louder than the sound of the N2 loudest joined sites;
or,
configured to decrease the gain of the audio streams of one or more of the N3 sites of the obtained first mixed audio stream, or to simultaneously increase the gain of the audio streams of one or more of the N4 loudest sites among the N1 selected sites and decrease the gain of the audio streams of one or more of the N3 sites of the first mixed audio stream obtained by mixing, so that the sound of the sites selected by the first site terminal is louder than the sound of the N3 sites of the first mixed audio stream;
or,
configured to decrease the gain of the audio streams of one or more of the obtained N3 loudest joined sites, or to simultaneously increase the gain of the audio streams of one or more of the N4 loudest sites among the N1 selected sites and decrease the gain of the audio streams of the N3 loudest joined sites, so that the sound of the sites selected by the first site terminal is louder than the sound of the N3 loudest joined sites.
The audio gain processing module 603b may adjust the sound of the N4 loudest sites among the N1 sites selected by the first site terminal to 1.2 to 1.5 times the sound of the N2 loudest joined sites;
or, the audio gain processing module 603b adjusts the sound of the N4 loudest sites among the N1 selected sites to 1.2 to 1.5 times the sound of the N3 sites of the first mixed audio stream;
or, the audio gain processing module 603b adjusts the sound of the N4 loudest sites among the N1 selected sites to 1.2 to 1.5 times the sound of the N3 loudest joined sites.
An embodiment of the present invention provides a video conference system, including:
a conference server, configured to receive a site selection instruction sent by a first site terminal; to send the video stream of the site selected by the first site terminal to the first site terminal; and, if the first mixed audio stream currently played by the first site terminal does not contain the audio stream of the site selected by the first site terminal, to generate a second mixed audio stream and send it to the first site terminal, where the second mixed audio stream contains some or all of the audio streams of the sites selected by the first site terminal;
a first site terminal, configured to send the site selection instruction to the conference server; to receive from the conference server the video stream of the selected site and the second mixed audio stream, where the second mixed audio stream contains some or all of the audio streams of the sites selected by the first site terminal; and to play the video stream and the second mixed audio stream.
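The server-side decision described above can be sketched as a small function. This is a hedged illustration only: `handle_site_selection` and its return format are hypothetical, and the sketch reads "does not contain the audio stream of the selected site" as "none of the selected sites is currently in the first mix", which is one possible interpretation of the text.

```python
def handle_site_selection(selected_sites, first_mix_sites):
    """Decide what the conference server sends after a site selection instruction.

    selected_sites: site ids the first site terminal chose to view.
    first_mix_sites: site ids whose audio is in the mix currently being played.
    """
    selected = set(selected_sites)
    # Assumed reading: a second mix is needed only when no selected site
    # is already audible in the first mixed audio stream.
    send_second_mix = selected.isdisjoint(first_mix_sites)
    return {
        "video_sites": sorted(selected),     # video of the selected sites is always sent
        "send_second_mix": send_second_mix,  # audio is remixed only when needed
    }
```

For example, if the terminal selects sites B and C while the first mix carries only D and E, the sketch reports that a second mixed audio stream should be generated; if B is already in the mix, the current audio is kept.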
The conference server may transmit the audio streams contained in the generated second mixed audio stream to the first site terminal in a single channel; alternatively, it may place the audio streams of the selected sites and the other audio streams of the second mixed audio stream in different channels for transmission to the first site terminal.
As can be seen from the above, because the conference server in the video conference system provided by this embodiment adds the audio streams of some or all of the sites selected by the first site terminal to the second mixed audio stream sent to that terminal, the audio played by the first site terminal follows the video. This alleviates, to some extent, the problem of site audio and video being out of sync and improves the user experience. In addition, transmitting the audio streams of the selected sites and the other audio streams of the second mixed audio stream in different channels can reduce mutual interference between the sounds of different sites and improve the quality of the audio played by the first site terminal.
An embodiment of the present invention further provides another video conference system which, referring to Fig. 7, includes:
a conference server 701, configured to receive a site selection instruction sent by a first site terminal 702; to send the video stream of the site selected by the first site terminal 702 to the first site terminal 702; and, if the first mixed audio stream currently played by the first site terminal 702 does not contain the audio stream of the site selected by the first site terminal 702, to send a second mixed audio stream to the first site terminal 702, where the second mixed audio stream contains some or all of the audio streams of the sites selected by the first site terminal 702;
a first site terminal 702, configured to send the site selection instruction to the conference server 701; to receive from the conference server 701 the video stream of the site selected by the first site terminal 702 and the second mixed audio stream, where the second mixed audio stream contains some or all of the audio streams of the sites selected by the first site terminal 702; and to play the video stream and the second mixed audio stream.
The generation of the second mixed audio stream by the conference server 701 includes:
the conference server 701 mixes the N4 audio streams of the N4 loudest sites among the N1 sites selected by the first site terminal to obtain the second mixed audio stream, where N4 is less than or equal to N1;
or,
the conference server 701 mixes the N4 audio streams of the N4 loudest sites among the N1 sites selected by the first site terminal with the N2 audio streams of the N2 loudest joined sites to obtain the second mixed audio stream, where N4 is less than or equal to N1, the first mixed audio stream is obtained by mixing N3 audio streams, and N4 plus N2 equals N3;
or,
the conference server 701 mixes the audio streams in a first audio stream set to obtain the second mixed audio stream, where the first audio stream set includes the N4 audio streams of the N4 loudest sites among the N1 sites selected by the first site terminal and the N3 audio streams of the first mixed audio stream obtained by mixing, N4 being less than or equal to N1; or the first audio stream set includes the N4 audio streams of the N4 loudest sites among the N1 selected sites and the N3 audio streams of the N3 loudest joined sites, N4 being less than or equal to N1.
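The selection of source streams for the second mixed audio stream can be sketched as follows, covering the first two options (N4 selected sites only, or N4 selected sites plus N2 joined sites). The function names and the `levels` dictionary of per-site volume measurements are assumptions; the sketch also excludes already-picked sites from the N2 group so that no stream is mixed twice, which the text does not state explicitly.

```python
def loudest(levels, candidates, n):
    """Return up to n site ids from candidates, loudest first."""
    return sorted(candidates, key=lambda site: levels[site], reverse=True)[:n]


def second_mix_sources(levels, selected_sites, joined_sites, n4, n2=0):
    """Pick the sites whose audio goes into the second mixed audio stream.

    levels: dict mapping site id -> measured volume (hypothetical input).
    n4: how many of the selected sites to include (n4 <= len(selected_sites)).
    n2: how many further joined sites to include; 0 gives the first option.
    """
    picked = loudest(levels, selected_sites, n4)
    remaining = [site for site in joined_sites if site not in picked]
    return picked + loudest(levels, remaining, n2)
```

Under the second option, choosing `n2` so that `n4 + n2` equals the number of streams in the first mix (N3) keeps the mixing load unchanged when the terminal switches sites.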
As can be seen from the above, because the conference server 701 in the video conference system provided by this embodiment adds the audio streams of some or all of the sites selected by the first site terminal to the second mixed audio stream sent to that terminal, the audio played by the first site terminal follows the video. This alleviates, to some extent, the problem of site audio and video being out of sync and improves the user experience. In addition, transmitting the audio streams of the selected sites and the other audio streams of the second mixed audio stream in different channels can reduce mutual interference between the sounds of different sites and improve the quality of the audio played by the first site terminal.
Further, an embodiment of the present invention provides a video conference system that also includes a conference server and a first site terminal. It differs from the previous embodiment in that the conference server in this system, in addition to the functions described above, can be configured to increase the gain of the audio streams of one or more of the N4 loudest sites among the N1 sites selected by the first site terminal, or to decrease the gain of the audio streams of one or more of the N2 loudest joined sites, or to do both at once, so that the sound of the sites selected by the first site terminal is louder than the sound of the N2 loudest joined sites, where N4 is less than or equal to N1;
or,
configured to decrease the gain of the audio streams of one or more of the N3 sites of the first mixed audio stream obtained by mixing, or to simultaneously increase the gain of the audio streams of one or more of the N4 loudest sites among the N1 selected sites and decrease the gain of the audio streams of one or more of the N3 sites of the first mixed audio stream, so that the sound of the sites selected by the first site terminal is louder than the sound of the N3 sites of the first mixed audio stream, where N4 is less than or equal to N1;
or,
configured to decrease the gain of the audio streams of one or more of the N3 loudest joined sites, or to simultaneously increase the gain of the audio streams of one or more of the N4 loudest sites among the N1 selected sites and decrease the gain of the audio streams of one or more of the N3 loudest joined sites, so that the sound of the sites selected by the first site terminal is louder than the sound of the N3 loudest joined sites, where N4 is less than or equal to N1.
It can be understood that the conference server 701 in this embodiment may be the conference server of the foregoing method embodiments. The functions of its functional modules may be implemented according to the methods in those embodiments; for the specific implementation process, refer to the related descriptions of the method embodiments, which are not repeated here.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
The foregoing describes the video conference signal processing method, conference server and video conference system provided by the embodiments of the present invention. The description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In conclusion, the content of this specification shall not be construed as a limitation on the present invention.

Claims

1. A video conference signal processing method, characterized by comprising:
receiving, by a conference server, a site selection instruction from a first site terminal;
sending, by the conference server, a video stream to the first site terminal according to the site selection instruction, wherein the video stream includes the video stream of the site selected by the first site terminal; and
if the first mixed audio stream currently played by the first site terminal does not contain the audio stream of the site selected by the first site terminal, generating, by the conference server, a second mixed audio stream and sending the second mixed audio stream to the first site terminal, wherein the second mixed audio stream contains some or all of the audio streams of the sites selected by the first site terminal.
2. The method according to claim 1, characterized in that
the generating, by the conference server, of the second mixed audio stream comprises:
mixing, by the conference server, the N4 audio streams of the N4 loudest sites among the N1 sites selected by the first site terminal to obtain the second mixed audio stream, wherein N4 is less than or equal to N1;
or,
mixing, by the conference server, the N4 audio streams of the N4 loudest sites among the N1 sites selected by the first site terminal with the N2 audio streams of the N2 loudest joined sites to obtain the second mixed audio stream, wherein N4 is less than or equal to N1, the first mixed audio stream is obtained by mixing N3 audio streams, and N4 plus N2 equals N3;
or,
mixing, by the conference server, the audio streams in a first audio stream set to obtain the second mixed audio stream, wherein the first audio stream set includes the N4 audio streams of the N4 loudest sites among the N1 sites selected by the first site terminal and the N3 audio streams of the first mixed audio stream obtained by mixing, N4 being less than or equal to N1; or the first audio stream set includes the N4 audio streams of the N4 loudest sites among the N1 selected sites and the N3 audio streams of the N3 loudest joined sites, N4 being less than or equal to N1.
3. The method according to claim 2, characterized in that the method further comprises: transmitting, by the conference server, the audio streams contained in the second mixed audio stream to the first site terminal in a single channel;
or,
transmitting, by the conference server, the audio streams of the selected sites and the other audio streams of the second mixed audio stream to the first site terminal in different channels.
4. The method according to claim 2, characterized in that the generating, by the conference server, of the second mixed audio stream is specifically:
obtaining, by the conference server, the N4 audio streams of the N4 loudest sites among the N1 sites selected by the first site terminal and the N2 audio streams of the N2 loudest joined sites; increasing the gain of the audio streams of one or more of the obtained N4 loudest sites, or decreasing the gain of the audio streams of one or more of the obtained N2 loudest joined sites, or doing both at once, so that the sound of the sites selected by the first site terminal is louder than the sound of the N2 loudest joined sites, wherein N4 is less than or equal to N1; and mixing the obtained N4 audio streams of the N4 sites with the obtained N2 audio streams of the N2 sites to obtain the second mixed audio stream;
or,
obtaining, by the conference server, the N4 audio streams of the N4 loudest sites among the N1 sites selected by the first site terminal and the N3 audio streams of the first mixed audio stream; increasing the gain of the audio streams of one or more of the obtained N4 loudest sites, or decreasing the gain of the audio streams of one or more of the N3 sites of the obtained first mixed audio stream, or doing both at once, so that the sound of the sites selected by the first site terminal is louder than the sound of the N3 audio streams of the first mixed audio stream, wherein N4 is less than or equal to N1; and mixing the obtained N4 audio streams of the N4 sites with the obtained N3 audio streams of the first mixed audio stream to obtain the second mixed audio stream;
or,
obtaining, by the conference server, the N4 audio streams of the N4 loudest sites among the N1 sites selected by the first site terminal and the N3 audio streams of the N3 loudest joined sites; increasing the gain of the audio streams of one or more of the obtained N4 loudest sites, or decreasing the gain of the audio streams of one or more of the obtained N3 loudest joined sites, or doing both at once, so that the sound of the sites selected by the first site terminal is louder than the sound of the N3 loudest joined sites, wherein N4 is less than or equal to N1; and mixing the obtained N4 audio streams of the N4 sites with the N3 audio streams of the N3 loudest joined sites to obtain the second mixed audio stream.
5、 根据权利要求 4所述的方法, 其特征在于, 播放的第二混合音频流中第 一会场终端选择观看的 N1个会场中音量最大的 N4个会场对应的声音为第二混 合音频流中其它声音的 1.2— 1.5倍。  The method according to claim 4, wherein the sound corresponding to the N4 venues having the highest volume among the N1 conference sites selected by the first venue terminal in the second mixed audio stream that is played is the second mixed audio stream. 1.2-1.5 times of other sounds.
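The gain adjustment and mixing recited in claims 4 and 5 can be sketched roughly as follows. This is an illustration only, not the applicant's implementation: the claims do not say how "sound" is measured, so the sketch assumes RMS level over one frame, assumes equal-length frames of float samples in [-1, 1], and only raises the gain of the selected sites (the claims also allow lowering the other streams). The names `rms`, `sum_streams`, and `mix_with_emphasis` are hypothetical.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples (assumed loudness measure)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0

def sum_streams(streams, length):
    """Sample-wise sum of several equal-length frames."""
    return [sum(stream[i] for stream in streams) for i in range(length)]

def mix_with_emphasis(selected, others, ratio=1.3):
    """Mix selected-site frames with the other frames so that the selected
    part is `ratio` times as loud (by RMS) as the rest. Returns (mixed, gain)."""
    if not 1.2 <= ratio <= 1.5:
        raise ValueError("ratio outside the 1.2 to 1.5 range stated in claim 5")
    length = len(selected[0])
    selected_sum = sum_streams(selected, length)
    others_sum = sum_streams(others, length)
    sel_level, oth_level = rms(selected_sum), rms(others_sum)
    # Leave the gain at 1.0 when either side is silent (nothing to compare).
    if sel_level == 0.0 or oth_level == 0.0:
        gain = 1.0
    else:
        gain = ratio * oth_level / sel_level
    # Clip to the assumed [-1, 1] sample range after summing.
    mixed = [max(-1.0, min(1.0, gain * a + b))
             for a, b in zip(selected_sum, others_sum)]
    return mixed, gain
```

A production mixer would more likely smooth the gain over time rather than recompute it per frame; the sketch omits that for brevity.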
6. A video conference server, comprising:
a receiving module, configured to receive a site selection instruction sent by a first site terminal and to forward the instruction to a video stream sending module and an audio stream sending module respectively;
the video stream sending module, configured to send a video stream to the first site terminal, wherein the video stream comprises the video stream corresponding to the site selected for viewing by the first site terminal;
the audio stream sending module, configured to: if the currently played first mixed audio stream does not contain the audio stream corresponding to the site selected for viewing by the first site terminal, generate a second mixed audio stream and send the second mixed audio stream to the first site terminal, wherein the second mixed audio stream contains some or all of the audio streams corresponding to the sites selected for viewing by the first site terminal.
7. The video conference server according to claim 6, wherein the generating of the second mixed audio stream by the audio stream sending module is specifically:
the audio stream sending module mixes the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal to obtain the second mixed audio stream, where N4 is less than or equal to N1;
or,
the audio stream sending module mixes the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal with the N2 audio streams corresponding to the N2 loudest sites participating in the conference to obtain the second mixed audio stream, where N4 is less than or equal to N1, the first mixed audio stream is obtained by mixing N3 audio streams, and N4 plus N2 equals N3; or,
the audio stream sending module mixes the audio streams in a first audio stream set to obtain the second mixed audio stream, wherein the first audio stream set comprises: the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal, and the N3 audio streams that were mixed to obtain the first mixed audio stream, where N4 is less than or equal to N1; or the first audio stream set comprises: the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal, and the N3 audio streams corresponding to the N3 loudest sites participating in the conference, where N4 is less than or equal to N1.
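The second alternative of claim 7 (N4 loudest selected sites plus N2 loudest participating sites, with N4 + N2 = N3) can be sketched as a stream-selection step. This is a sketch under assumptions: the per-site loudness values in `levels` are hypothetical inputs, and excluding already chosen selected sites from the N2 pick is an assumption the claim does not state. The names `loudest` and `choose_streams` are illustrative.

```python
def loudest(sites, levels, count):
    """Return the `count` loudest sites, loudest first."""
    return sorted(sites, key=lambda site: levels[site], reverse=True)[:count]

def choose_streams(selected_sites, joined_sites, levels, n3, n4):
    """Pick N4 streams from the N1 selected sites and N2 = N3 - N4 streams
    from the remaining participating sites, so the mix keeps N3 inputs."""
    if n4 > len(selected_sites):
        raise ValueError("N4 must be less than or equal to N1")
    if n4 > n3:
        raise ValueError("N4 must not exceed N3 in this alternative")
    n2 = n3 - n4
    chosen_selected = loudest(selected_sites, levels, n4)
    # Assumption: a site already chosen as a selected site is not counted
    # again among the N2 loudest participating sites.
    remaining = [s for s in joined_sites if s not in chosen_selected]
    return chosen_selected + loudest(remaining, levels, n2)
```

For example, with a three-input mixer (N3 = 3) and one selected site kept (N4 = 1), the two loudest of the remaining sites fill the other inputs.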
8. The video conference server according to claim 7, wherein the audio stream sending module places the audio streams contained in the generated second mixed audio stream in the same channel for transmission to the first site terminal;
or,
the audio stream sending module places the audio streams corresponding to the sites selected for viewing and the other audio streams in the generated second mixed audio stream in different channels for transmission to the first site terminal.
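The two transmission options in claim 8 can be pictured with a small sketch. It assumes the selected-site audio and the remaining audio have each already been mixed down to one equal-length frame, and it assumes a two-channel (stereo) layout with the selected sites on the left; the claim only says "different channels", so the layout and the function names are illustrative.

```python
def to_single_channel(selected_mix, others_mix):
    """Option 1: everything shares one channel (sample-wise sum)."""
    return [a + b for a, b in zip(selected_mix, others_mix)]

def to_separate_channels(selected_mix, others_mix):
    """Option 2: selected-site audio and the other audio travel in separate
    channels, here as (left, right) sample pairs (assumed layout)."""
    return list(zip(selected_mix, others_mix))
```

With separate channels, the receiving terminal could in principle adjust the two parts independently, which a single-channel mix does not allow.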
9. The video conference server according to claim 7, wherein the audio stream sending module further comprises an obtaining module and an audio gain processing module;
the obtaining module is configured to obtain the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal and the N2 audio streams corresponding to the N2 loudest sites participating in the conference; or to obtain the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal and the N3 audio streams corresponding to the first mixed audio stream; or to obtain the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal and the N3 audio streams corresponding to the N3 loudest sites participating in the conference, where N4 is less than or equal to N1;
the audio gain processing module is configured to increase the gain of the audio streams corresponding to one or more of the obtained N4 loudest sites among the N1 sites selected for viewing by the first site terminal, or to decrease the gain of the audio streams corresponding to one or more of the obtained N2 loudest sites participating in the conference, or simultaneously to increase the gain of the audio streams corresponding to one or more of the obtained N4 sites and decrease the gain of the audio streams corresponding to one or more of the N2 loudest sites participating in the conference, so that the sound of the sites selected for viewing by the first site terminal is louder than the sound corresponding to the N2 loudest sites participating in the conference;
or, configured to decrease the gain of the audio streams corresponding to one or more of the N3 sites of the obtained first mixed audio stream, or simultaneously to increase the gain of the audio streams corresponding to one or more of the N4 loudest sites among the N1 sites selected for viewing by the first site terminal and decrease the gain of the audio streams of one or more of the N3 sites of the first mixed audio stream obtained by mixing, so that the sound of the sites selected for viewing by the first site terminal is louder than the sound corresponding to the N3 sites of the first mixed audio stream;
or,
configured to decrease the audio gain of one or more of the obtained N3 loudest sites participating in the conference, or simultaneously to increase the gain of the audio streams corresponding to one or more of the N4 loudest sites among the N1 sites selected by the first site terminal and decrease the gain of the audio streams of one or more of the N3 loudest sites participating in the conference, so that the sound of the sites selected for viewing by the first site terminal is louder than the sound corresponding to the N3 loudest sites participating in the conference.
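Claim 9 leaves three ways to reach the target loudness relationship: raise the selected sites, lower the others, or do both. A minimal sketch of how an audio gain processing module might compute the two gain factors is given below. The function name `plan_gains`, the `mode` strings, the use of scalar loudness levels, and the even (geometric) split in the "both" case are all assumptions made for illustration.

```python
import math

def plan_gains(selected_level, others_level, ratio=1.3, mode="both"):
    """Return (selected_gain, others_gain) such that
    selected_gain * selected_level == ratio * others_gain * others_level."""
    if selected_level <= 0 or others_level <= 0:
        return 1.0, 1.0  # nothing meaningful to balance
    needed = ratio * others_level / selected_level
    if needed <= 1.0:
        return 1.0, 1.0  # selected sites already reach the target ratio
    if mode == "raise":   # only increase the selected sites' gain
        return needed, 1.0
    if mode == "lower":   # only decrease the other sites' gain
        return 1.0, 1.0 / needed
    if mode == "both":    # split the correction evenly between both sides
        g = math.sqrt(needed)
        return g, 1.0 / g
    raise ValueError("mode must be 'raise', 'lower', or 'both'")
```

Lowering the other streams avoids clipping when the selected sites are already near full scale, which may be one reason the claims keep all three options open.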
10. The video conference server according to claim 9, wherein the audio gain processing module adjusts the sound corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first terminal to 1.2 to 1.5 times the sound corresponding to the N2 loudest sites participating in the conference;
or, the audio gain processing module adjusts the sound corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first terminal to 1.2 to 1.5 times the sound corresponding to the N3 sites of the first mixed audio stream;
or, the audio gain processing module adjusts the sound corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first terminal to 1.2 to 1.5 times the sound corresponding to the N3 loudest sites participating in the conference.
11. A video conference system, comprising:
a conference server, configured to: receive a site selection instruction sent by a first site terminal; send to the first site terminal the video stream corresponding to the site selected for viewing by the first site terminal; and, if the first mixed audio stream currently played by the first site terminal does not contain the audio stream corresponding to the site selected for viewing by the first site terminal, generate a second mixed audio stream and send the second mixed audio stream to the first site terminal, wherein the second mixed audio stream contains some or all of the audio streams corresponding to the sites selected for viewing by the first site terminal;
the first site terminal, configured to: send the site selection instruction to the conference server; receive from the conference server the video stream corresponding to the site selected for viewing by the first site terminal and the second mixed audio stream, wherein the second mixed audio stream contains some or all of the audio streams corresponding to the sites selected for viewing by the first site terminal; and play the video stream and the second mixed audio stream.
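The server-side decision in claim 11 can be sketched as a small function. It is an illustration under assumptions: sites are represented by plain identifiers, and "does not contain the audio stream corresponding to the selected site" is read as "at least one selected site is missing from the first mix", which is one possible reading when several sites are selected. The function name and the returned dictionary keys are hypothetical.

```python
def handle_site_selection(selected_sites, first_mix_sites):
    """Decide what the conference server sends after a site selection
    instruction: video of the selected sites always, and a second mixed
    audio stream only if the current mix lacks a selected site's audio."""
    missing = [s for s in selected_sites if s not in first_mix_sites]
    return {
        "video_sites": list(selected_sites),
        "send_second_mix": bool(missing),
        "missing_audio_sites": missing,
    }
```

If `send_second_mix` is false, the terminal keeps playing the first mixed audio stream and only the video changes.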
12. The video conference system according to claim 11, wherein the generating of the second mixed audio stream by the conference server comprises:
the conference server mixes the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal to obtain the second mixed audio stream, where N4 is less than or equal to N1;
or,
the conference server mixes the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal with the N2 audio streams corresponding to the N2 loudest sites participating in the conference to obtain the second mixed audio stream, where N4 is less than or equal to N1, the first mixed audio stream is obtained by mixing N3 audio streams, and N4 plus N2 equals N3;
or,
the conference server mixes the audio streams in a first audio stream set to obtain the second mixed audio stream, wherein the first audio stream set comprises: the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal, and the N3 audio streams that were mixed to obtain the first mixed audio stream, where N4 is less than or equal to N1; or the first audio stream set comprises: the N4 audio streams corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal, and the N3 audio streams corresponding to the N3 loudest sites participating in the conference, where N4 is less than or equal to N1.
13. The video conference system according to claim 12, wherein the conference server places the audio streams contained in the generated second mixed audio stream in the same channel for transmission to the first site terminal;
or,
the conference server places the audio streams corresponding to the sites selected for viewing and the other audio streams in the generated second mixed audio stream in different channels for transmission to the first site terminal.
14. The video conference system according to claim 12, wherein the conference server is further configured to increase the gain of the audio streams corresponding to one or more of the N4 loudest sites among the N1 sites selected for viewing by the first site terminal, or to decrease the gain of the audio streams corresponding to one or more of the N2 loudest sites participating in the conference, or simultaneously to increase the gain of the audio streams corresponding to one or more of the N4 loudest sites among the N1 sites selected for viewing by the first site terminal and decrease the gain of the audio streams corresponding to one or more of the N2 loudest sites participating in the conference, so that the sound of the sites selected for viewing by the first site terminal is louder than the sound corresponding to the N2 loudest sites participating in the conference, where N4 is less than or equal to N1;
or,
configured to decrease the gain of the audio streams of one or more of the N3 sites of the first mixed audio stream obtained by mixing, or simultaneously to increase the gain of the audio streams corresponding to one or more of the N4 loudest sites among the N1 sites selected for viewing by the first site terminal and decrease the gain of the audio streams of one or more of the N3 sites of the first mixed audio stream obtained by mixing, so that the sound of the sites selected for viewing by the first site terminal is louder than the sound corresponding to the N3 sites of the first mixed audio stream, where N4 is less than or equal to N1;
or,
configured to decrease the gain of the audio streams of one or more of the N3 loudest sites participating in the conference, or simultaneously to increase the gain of the audio streams corresponding to one or more of the N4 loudest sites among the N1 sites selected for viewing by the first site terminal and decrease the gain of the audio streams corresponding to one or more of the N3 loudest sites participating in the conference, so that the sound of the sites selected for viewing by the first site terminal is louder than the sound corresponding to the N3 loudest sites participating in the conference, where N4 is less than or equal to N1.
15. The video conference system according to claim 14, wherein the conference server is configured to adjust the sound corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal to 1.2 to 1.5 times the sound corresponding to the N2 loudest sites participating in the conference;
or, the conference server is configured to adjust the sound corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal to 1.2 to 1.5 times the sound corresponding to the N3 sites of the first mixed audio stream;
or, the conference server is configured to adjust the sound corresponding to the N4 loudest sites among the N1 sites selected for viewing by the first site terminal to 1.2 to 1.5 times the sound corresponding to the N3 loudest sites participating in the conference.
PCT/CN2013/072264 2012-08-16 2013-03-07 Video conference signal processing method, video conference server and video conference system WO2014026478A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2012102921774A CN102833520A (en) 2012-08-16 2012-08-16 Video conference signal processing method, video conference server and video conference system
CN201210292177.4 2012-08-16

Publications (1)

Publication Number Publication Date
WO2014026478A1 (en) 2014-02-20

Family

ID=47336462

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/072264 WO2014026478A1 (en) 2012-08-16 2013-03-07 Video conference signal processing method, video conference server and video conference system

Country Status (2)

Country Link
CN (1) CN102833520A (en)
WO (1) WO2014026478A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833520A (en) * 2012-08-16 2012-12-19 华为技术有限公司 Video conference signal processing method, video conference server and video conference system
CN106162038A (en) * 2015-03-25 2016-11-23 中兴通讯股份有限公司 A kind of audio frequency sending method and device
CN114553845A (en) * 2020-11-26 2022-05-27 上海博泰悦臻网络技术服务有限公司 Directional communication method, medium, server and communication system for social interaction

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101022481A (en) * 2007-03-21 2007-08-22 华为技术有限公司 Method and device for realizing private conversation in multi-point meeting
US20100315482A1 (en) * 2009-06-15 2010-12-16 Microsoft Corporation Interest Determination For Auditory Enhancement
CN102404542A (en) * 2010-09-09 2012-04-04 华为终端有限公司 Method and device for adjusting display of images of participants in multi-screen video conference
CN102833520A (en) * 2012-08-16 2012-12-19 华为技术有限公司 Video conference signal processing method, video conference server and video conference system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100531382C (en) * 2006-01-18 2009-08-19 华为技术有限公司 Device and method for transmitting visual telephone video-audio signal
CN101179693B (en) * 2007-09-26 2011-02-02 深圳市迪威视讯股份有限公司 Mixed audio processing method of session television system

Also Published As

Publication number Publication date
CN102833520A (en) 2012-12-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13829096

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13829096

Country of ref document: EP

Kind code of ref document: A1