WO2012000297A1 - Multi-point sound mixing and distant view presentation method, apparatus and system - Google Patents

Multi-point sound mixing and distant view presentation method, apparatus and system

Info

Publication number
WO2012000297A1
Authority
WO
WIPO (PCT)
Prior art keywords
conference
audio code
areas
mixed
sites
Prior art date
Application number
PCT/CN2010/080331
Other languages
English (en)
French (fr)
Inventor
吴明亮
孙波
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Priority to EP10854000.6A (EP2590360B1)
Priority to US13/806,275 (US20130103393A1)
Publication of WO2012000297A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L12/1827 Network arrangements for conference optimisation or adaptation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/403 Arrangements for multi-party communication, e.g. for conferences
    • H04L65/4038 Arrangements for multi-party communication, e.g. for conferences with floor control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • H04L65/765 Media network packet handling intermediate
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants

Definitions

  • The present invention relates to the field of communications, and in particular to a multi-point sound mixing telepresence method, apparatus, and system.
  • Background of the invention: telepresence is popular with high-end users for its realistic sense of presence. Sound localization (telling where a voice comes from), life-size images, and eye contact are its key technical indicators.
  • In a traditional conference system, each site has only one or two audio channels.
  • The sound heard at each site is the mix of the three loudest sites in the whole conference.
  • Each site has a single sound input source and a single output, so participants cannot perceive which direction in the site a sound comes from.
  • In a telepresence conference system, each site has a single screen or multiple screens, each screen displays the image of one participant, and each participant correspondingly has one audio input.
  • To achieve sound localization in a multi-screen site, take a three-screen site as an example.
  • When the left seat speaks, participants at the other sites should hear the voice from the left; when the middle seat speaks, they should hear it from the middle; and when the right seat speaks, they should hear it from the right.
  • Whichever screen in the site displays the speaker's image, the sound is emitted from the position of that screen; that is, the sound follows the image.
  • A primary object of the present invention is to provide a multi-point sound mixing telepresence method, apparatus, and system that at least solve the problem that the telepresence conference system described above has difficulty distinguishing sounds from different areas.
  • According to one aspect of the invention, a multi-point sound mixing telepresence method is provided, including: receiving audio code streams from a plurality of conference sites, wherein each site includes one or more conference areas and each conference area corresponds to one audio code stream;
  • mixing the audio code streams of the corresponding conference areas in the sites; and outputting the mixed audio code streams to the corresponding conference areas in the sites.
  • Further, the conference areas correspond to different orientations.
  • In that case, mixing the audio code streams of the corresponding conference areas in the sites includes: mixing the audio code streams of the conference areas that have the same orientation in the sites.
  • Outputting the mixed audio code streams to the corresponding conference areas in the sites then includes: outputting each mixed audio code stream to the conference areas having that same orientation.
  • Further, the audio code stream contains orientation information of its conference area, and the audio code streams of the conference areas having the same orientation in the sites are mixed according to that orientation information.
  • Further, where the plurality of sites includes a first site having one conference area and a second site having multiple conference areas, mixing the audio code streams of the corresponding conference areas includes: mixing the audio code stream of the conference area of the first site with the audio code stream of one of the conference areas of the second site.
  • Outputting the mixed audio code stream then includes: outputting it to the conference area of the first site and to the conference area of the second site whose audio code stream was mixed with that of the first site.
  • The method further includes: mixing the audio code streams of all conference areas in the plurality of sites, and outputting the mixed audio code stream to the first site.
  • One or more of the plurality of sites include three conference areas: left, middle, and right.
  • According to another aspect, a multi-point sound mixing telepresence apparatus is provided, including: a receiving module, configured to receive audio code streams from a plurality of conference sites, wherein each site includes one or more conference areas and each conference area corresponds to one audio code stream; a mixing module, configured to mix the audio code streams of the corresponding conference areas in the sites; and an output module, configured to output the mixed audio code streams to the corresponding conference areas in the sites.
  • According to another aspect, a multi-point sound mixing telepresence system is provided, including: a plurality of conference sites, wherein each site includes one or more conference areas and each conference area corresponds to one audio code stream; and a multi-point sound mixing telepresence apparatus, configured to mix the audio code streams of the corresponding conference areas in the sites and to output the mixed audio code streams to the corresponding conference areas in the sites.
  • With the present invention, audio code streams are received from a plurality of sites, wherein each site includes one or more conference areas and each conference area corresponds to one audio code stream; the audio code streams of the corresponding conference areas in the sites are mixed; and the mixed audio code streams are output to the corresponding conference areas in the sites. This solves the problem that a telepresence conference system has difficulty distinguishing sounds from different areas, and thereby achieves the effect of distinguishing sounds from different areas in such a system.
  • FIG. 1 is a schematic diagram of a multi-point sound mixing telepresence apparatus according to a first embodiment of the present invention;
  • FIG. 2 is a flowchart of a multi-point sound mixing telepresence method according to the first embodiment of the present invention;
  • FIG. 3 is a flowchart of a multi-point sound mixing telepresence method according to a second embodiment of the present invention;
  • FIG. 4 is a schematic diagram of a multi-point sound mixing telepresence apparatus according to the second embodiment of the present invention.
  • Best mode for carrying out the invention: the invention is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, provided there is no conflict, the embodiments in this application and the features in them may be combined with one another.
  • FIG. 1 is a schematic diagram of a multi-point sound mixing telepresence apparatus according to a first embodiment of the present invention. As shown in FIG. 1, the apparatus includes a receiving module 102, a mixing module 104, and an output module 106.
  • The receiving module 102 is configured to receive audio code streams from multiple sites, where each site includes one or more conference areas and each conference area corresponds to one audio code stream; the mixing module 104 is configured to mix the audio code streams of the corresponding conference areas in the sites; the output module 106 is configured to output the mixed audio code streams to the corresponding conference areas in the sites.
  • FIG. 2 is a flowchart of a multi-point sound mixing telepresence method according to the first embodiment of the present invention. The method can be implemented with the apparatus described above. As shown in FIG. 2, the method includes the following steps. Step S202: receive audio code streams from multiple sites, where each site includes one or more conference areas and each conference area corresponds to one audio code stream.
  • For example, all of the sites may include multiple conference areas, or one or more of the sites may include only one conference area.
  • A site may contain one or more screens, each screen corresponding to one conference area.
  • A camera device and an audio device may also be provided for each screen or conference area.
  • Step S204: mix the audio code streams of the corresponding conference areas in the sites.
  • For example, the corresponding areas may be conference areas at the same position, or any corresponding areas designated by configuration.
  • For example, each site may be divided into a left area and a right area.
  • The audio code streams of the left areas of the sites are mixed together, and the audio code streams of the right areas of the sites are mixed together.
  • Step S206: output the mixed audio code streams to the corresponding conference areas in the sites. After the audio code streams of the left areas of the sites are mixed, the mixed stream is output to the left area of each site; after the audio code streams of the right areas are mixed, the mixed stream is output to the right area of each site.
  • Preferably, the conference areas correspond to different orientations.
  • Mixing the audio code streams of the corresponding conference areas in the sites then includes: mixing the audio code streams of the conference areas having the same orientation in the sites.
  • Outputting the mixed audio code streams to the corresponding conference areas in the sites includes: outputting each mixed audio code stream to the conference areas having that same orientation.
  • Preferably, the audio code stream contains orientation information of its conference area, and the audio code streams of the conference areas having the same orientation in the sites are mixed according to that orientation information. With this embodiment, sound localization can be achieved easily.
  • FIG. 3 is an audio data flow diagram of a multi-point sound mixing method according to an embodiment of the present invention.
  • Taking a multi-point conference that combines three-screen and single-screen sites as an example, the method includes the following steps. Step S302: during the conference, each site contains multiple screens and each screen corresponds to one audio input. The audio code streams are mixed separately according to whether each one belongs to the left, middle, or right seat of its site.
  • That is, the left-seat inputs of all sites are mixed and superimposed; the middle-seat inputs of all sites are mixed and superimposed; and the right-seat inputs of all sites are mixed and superimposed. A single-screen site is treated as a special middle seat and takes part in the middle-seat mix. In addition, all inputs of all sites are mixed and superimposed, giving four mix groups in total.
  • Step S304: the conference audio processing module mixes all input code streams and outputs several mixed code streams: a left-seat mix, a middle-seat mix, a right-seat mix, and an all-seat mix.
  • In other words, the conference audio processing module outputs four groups of mixed code streams: the mix of all left seats, the mix of all middle seats, the mix of all right seats, and the mix of all seats.
  • Step S306: depending on the site, different mixed code streams are selected, encoded, and output to different positions of the site. The left-seat inputs are mixed and output to the left seat, and the middle-seat inputs are mixed and output to the middle seat.
  • The right-seat inputs are mixed and output to the right seat, which achieves sound localization.
  • When a single-screen site and a multi-screen site view each other, the all-seat mix is encoded and output to the single-screen site, and the audio input of the single-screen site takes part in the middle-seat mix, which is encoded and output to the middle seat of the multi-screen site.
  • For example, the left-seat mix can be encoded and output to the left seats of A, B, and C; the middle-seat mix can be encoded and output to the middle seats of A, B, and C; the right-seat mix can be encoded and output to the right seats of A, B, and C; and the all-seat mix can be encoded and output to the single-screen site D.
  • The mixing method of this embodiment allows the sound to follow the image in a conference system. Depending on the sites in the conference, both single-seat and multi-seat sites can be mixed effectively without impairing sound localization.
  • FIG. 4 is a schematic diagram of a multi-point sound mixing telepresence apparatus according to a second embodiment of the present invention.
  • The audio processing apparatus may include: an audio acquiring module 402, configured to acquire each audio code stream in a site; an audio processing module 404, configured to process the audio code streams, mix the audio code streams in the conference, and encode and output the mixes according to the orientation of the audio inputs in the site;
  • and an audio transmission module 406, configured to output the mixed and encoded audio to the sites.
  • FIG. 5 is a schematic diagram of a multi-point sound mixing conference system according to an embodiment of the present invention. As shown in FIG. 5, the system may include a multi-point processing module 502, an access module 504, an audio processing module 506, and a media switching module 508.
  • The multi-point processing module 502 controls multi-point access, audio processing, and media switching; the access module 504 receives the multiple audio code streams of all sites in the conference; the audio processing module 506 performs codec conversion on all audio code streams of the sites and encodes the output after mixing; the media switching module 508 switches the code streams output by the audio processing module to each site.
  • According to an embodiment of the invention, a multi-point sound mixing telepresence system is further provided.
  • The system may include: multiple conference sites, where each site includes one or more conference areas and each conference area corresponds to one audio code stream;
  • and a multi-point sound mixing telepresence apparatus, configured to mix the audio code streams of the corresponding conference areas in the sites and to output the mixed audio code streams to the corresponding conference areas in the sites.
  • The apparatus in this system embodiment may be any of the multi-point sound mixing telepresence apparatuses of the embodiments above.
  • The present invention can solve one or more problems of multi-point sound mixing in a telepresence conference system, making it possible to distinguish sounds from different areas and thereby achieve a highly realistic sense of presence with sound localization.
  • The modules or steps of the present invention can be implemented with general-purpose computing devices; they can be concentrated on a single computing device or distributed over a network of multiple computing devices.
  • Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps may be performed in an order different from the one given here.
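
The text above says that audio code streams are "mixed and superimposed" but does not specify the arithmetic. The following is a minimal sketch under stated assumptions: each stream has already been decoded to a frame of 16-bit PCM samples, and superposition is a per-sample sum with saturation. The name `mix_frames` and the sample values are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence

# Assumption: decoded audio is signed 16-bit PCM.
INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: Sequence[Sequence[int]]) -> List[int]:
    """Superimpose equal-length PCM frames by per-sample addition with clipping."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same length")
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        # Saturate instead of wrapping around on overflow.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


# Hypothetical left-area frames from three sites.
left_mix = mix_frames([[100, -200], [50, 25], [30000, -30000]])
print(left_mix)  # [30150, -30175]
```

A real conference bridge would typically also decode, resample, and re-encode each stream; those stages are left out here because the source does not describe them.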

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

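The present invention discloses a multi-point sound mixing telepresence method, apparatus, and system. The method includes: receiving audio code streams from a plurality of conference sites, wherein each site includes one or more conference areas and each conference area corresponds to one audio code stream; mixing the audio code streams of the corresponding conference areas in the sites; and outputting the mixed audio code streams to the corresponding conference areas in the sites. The technical solution provided by the invention makes it possible to distinguish sounds from different areas in a telepresence conference system.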

Description

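Multi-point sound mixing and distant view presentation method, apparatus and system

Technical Field

The present invention relates to the field of communications, and in particular to a multi-point sound mixing telepresence method, apparatus, and system.

Background

Telepresence is popular with high-end users for its realistic sense of presence. Sound localization (telling where a voice comes from), life-size images, and eye contact are its key technical indicators.

In a traditional conference system, each site has only one or two audio channels. The sound heard at each site is the mix of the three loudest sites in the whole conference. Each site has a single sound input source and a single output, so participants cannot perceive which direction in the site a sound comes from.

In a telepresence conference system, each site has a single screen or multiple screens, each screen displays the image of one participant, and each participant correspondingly has one audio input. To achieve sound localization in a multi-screen site, for example a three-screen site: when the left seat speaks, participants at the other sites should hear the sound from the left; when the middle seat speaks, they should hear it from the middle; and when the right seat speaks, they should hear it from the right. Whichever screen in the site displays the speaker's image, the sound is emitted from the position of that screen; that is, the sound follows the image.

In this situation, audio inputs and outputs at different orientations must be treated differently and mixed separately, which the traditional single-channel mixing method clearly cannot do. In addition, in a multi-point conference where single-screen and multi-screen sites interwork, how to mix and output the audio of both kinds of site without impairing sound localization at either is also a problem to be solved.

The inventors found that, in the related art described above, a telepresence conference system has difficulty distinguishing sounds from different areas.

Summary of the Invention

The main object of the present invention is to provide a multi-point sound mixing telepresence method, apparatus, and system that at least solve the above problem that a telepresence conference system has difficulty distinguishing sounds from different areas.

According to one aspect of the present invention, a multi-point sound mixing telepresence method is provided, including: receiving audio code streams from a plurality of conference sites, wherein each site includes one or more conference areas and each conference area corresponds to one audio code stream; mixing the audio code streams of the corresponding conference areas in the sites; and outputting the mixed audio code streams to the corresponding conference areas in the sites.

Further, the conference areas correspond to different orientations. In that case, mixing the audio code streams of the corresponding conference areas in the sites includes: mixing the audio code streams of the conference areas having the same orientation in the sites; and outputting the mixed audio code streams to the corresponding conference areas in the sites includes: outputting each mixed audio code stream to the conference areas having that same orientation.

Further, the audio code stream contains orientation information of its conference area. In that case, mixing the audio code streams of the conference areas having the same orientation in the sites includes: mixing them according to the orientation information.

Further, where the plurality of sites includes a first site having one conference area and a second site having multiple conference areas, mixing the audio code streams of the corresponding conference areas in the sites includes: mixing the audio code stream of the conference area of the first site with the audio code stream of one of the conference areas of the second site.

Further, outputting the mixed audio code streams to the corresponding conference areas in the sites includes: outputting the mixed audio code stream to the conference area of the first site and to the conference area of the second site whose audio code stream was mixed with that of the first site.

Further, the method also includes: mixing the audio code streams of all conference areas in the plurality of sites, and outputting the mixed audio code stream to the first site.

Further, one or more of the plurality of sites include three conference areas: left, middle, and right.

According to another aspect of the present invention, a multi-point sound mixing telepresence apparatus is provided, including: a receiving module, configured to receive audio code streams from a plurality of conference sites, wherein each site includes one or more conference areas and each conference area corresponds to one audio code stream; a mixing module, configured to mix the audio code streams of the corresponding conference areas in the sites; and an output module, configured to output the mixed audio code streams to the corresponding conference areas in the sites.

According to another aspect of the present invention, a multi-point sound mixing telepresence system is provided, including: a plurality of conference sites, wherein each site includes one or more conference areas and each conference area corresponds to one audio code stream; and a multi-point sound mixing telepresence apparatus, configured to mix the audio code streams of the corresponding conference areas in the sites and to output the mixed audio code streams to the corresponding conference areas in the sites.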
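
The orientation-based variant summarized above (each audio code stream carries the orientation information of its conference area, and streams with the same orientation are mixed and sent back to areas with that orientation) can be sketched as a grouping step followed by a routing table. This is only an illustration: the `AreaStream` type, its field names, and the string payloads are hypothetical, and the source does not say how orientation information is encoded in a stream.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AreaStream:
    site: str         # hypothetical site identifier
    orientation: str  # orientation information carried with the stream
    payload: str      # stand-in for the audio data


def group_by_orientation(streams: List[AreaStream]) -> Dict[str, List[AreaStream]]:
    """Collect the streams of all sites that share the same orientation."""
    groups: Dict[str, List[AreaStream]] = defaultdict(list)
    for stream in streams:
        groups[stream.orientation].append(stream)
    return dict(groups)


def route(streams: List[AreaStream]) -> Dict[Tuple[str, str], List[AreaStream]]:
    """Map each (site, orientation) output area to the streams mixed for it.

    As in the text, one mix per orientation is sent to every site; whether a
    site's own input should be excluded is not discussed in the source.
    """
    groups = group_by_orientation(streams)
    return {(s.site, s.orientation): groups[s.orientation] for s in streams}


streams = [
    AreaStream("A", "left", "a-left"),
    AreaStream("A", "right", "a-right"),
    AreaStream("B", "left", "b-left"),
    AreaStream("B", "right", "b-right"),
]
for area, sources in route(streams).items():
    print(area, [s.payload for s in sources])
```

Keeping grouping and routing separate mirrors the split between the mixing module and the output module described above.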
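With the present invention, audio code streams are received from a plurality of sites, wherein each site includes one or more conference areas and each conference area corresponds to one audio code stream; the audio code streams of the corresponding conference areas in the sites are mixed; and the mixed audio code streams are output to the corresponding conference areas in the sites. This solves the problem that a telepresence conference system has difficulty distinguishing sounds from different areas, and thereby achieves the effect of distinguishing sounds from different areas in such a system.

Brief Description of the Drawings

The drawings described here provide a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:

FIG. 1 is a schematic diagram of a multi-point sound mixing telepresence apparatus according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a multi-point sound mixing telepresence method according to the first embodiment of the present invention;
FIG. 3 is a flowchart of a multi-point sound mixing telepresence method according to a second embodiment of the present invention;
FIG. 4 is a schematic diagram of a multi-point sound mixing telepresence apparatus according to the second embodiment of the present invention;
FIG. 5 is a schematic diagram of a multi-point sound mixing conference system according to an embodiment of the present invention.

Detailed Description

The invention is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, provided there is no conflict, the embodiments in this application and the features in them may be combined with one another.

FIG. 1 is a schematic diagram of a multi-point sound mixing telepresence apparatus according to a first embodiment of the present invention. As shown in FIG. 1, the apparatus includes a receiving module 102, a mixing module 104, and an output module 106.

The receiving module 102 is configured to receive audio code streams from multiple sites, where each site includes one or more conference areas and each conference area corresponds to one audio code stream; the mixing module 104 is configured to mix the audio code streams of the corresponding conference areas in the sites; the output module 106 is configured to output the mixed audio code streams to the corresponding conference areas in the sites.

FIG. 2 is a flowchart of a multi-point sound mixing telepresence method according to the first embodiment of the present invention. The method can be implemented with the apparatus described above. As shown in FIG. 2, the method includes the following steps.

Step S202: receive audio code streams from multiple sites, where each site includes one or more conference areas and each conference area corresponds to one audio code stream.

For example, all of the sites may include multiple conference areas, or one or more of the sites may include only one conference area. A site may contain one or more screens, each screen corresponding to one conference area. A camera device and an audio device may also be provided for each screen or conference area.

Step S204: mix the audio code streams of the corresponding conference areas in the sites. For example, the corresponding areas may be conference areas at the same position, or any corresponding areas designated by configuration. For example, each site may be divided into a left area and a right area; the audio code streams of the left areas of the sites are mixed together, and the audio code streams of the right areas of the sites are mixed together.

Step S206: output the mixed audio code streams to the corresponding conference areas in the sites. After the audio code streams of the left areas of the sites are mixed, the mixed stream is output to the left area of each site; after the audio code streams of the right areas are mixed, the mixed stream is output to the right area of each site.

In the above embodiment, mixing the audio code streams of the corresponding conference areas in the sites and outputting the mixed streams to those areas makes it possible to distinguish sounds from different areas in a telepresence conference system, which improves the user experience.

Preferably, the conference areas correspond to different orientations. Mixing the audio code streams of the corresponding conference areas in the sites then includes: mixing the audio code streams of the conference areas having the same orientation in the sites; and outputting the mixed audio code streams includes: outputting each mixed audio code stream to the conference areas having that same orientation. With this embodiment, sound localization can be achieved.

Preferably, the audio code stream contains orientation information of its conference area, and mixing the audio code streams of the conference areas having the same orientation includes: mixing them according to the orientation information. With this embodiment, sound localization can be achieved easily.

Preferably, where the plurality of sites includes a first site having one conference area and a second site having multiple conference areas, mixing the audio code streams of the corresponding conference areas in the sites includes: mixing the audio code stream of the conference area of the first site with the audio code stream of one of the conference areas of the second site.

FIG. 3 is an audio data flow diagram of a multi-point sound mixing method according to an embodiment of the present invention. Taking a multi-point conference that combines three-screen and single-screen sites as an example, as shown in FIG. 3, the method includes the following steps.

Step S302: during the conference, each site contains multiple screens and each screen corresponds to one audio input. The audio code streams are mixed separately according to whether each one belongs to the left, middle, or right seat of its site.

That is, the left-seat inputs of all sites are mixed and superimposed; the middle-seat inputs of all sites are mixed and superimposed; and the right-seat inputs of all sites are mixed and superimposed. A single-screen site is treated as a special middle seat and takes part in the middle-seat mix. In addition, all inputs of all sites are mixed and superimposed, giving four mix groups in total. For example, when three three-screen sites A, B, and C and one single-screen site D hold a multi-point conference, the 3 left-seat inputs of A, B, and C are mixed; the 3 middle-seat inputs of A, B, and C plus the 1 input of D, 4 inputs in total, are mixed; the 3 right-seat inputs of A, B, and C are mixed; and all 10 inputs of A, B, C, and D are mixed.

Step S304: the conference audio processing module mixes all input code streams and outputs several mixed code streams: a left-seat mix, a middle-seat mix, a right-seat mix, and an all-seat mix.

In other words, the conference audio processing module outputs four groups of mixed code streams: the mix of all left seats, the mix of all middle seats, the mix of all right seats, and the mix of all seats.

Step S306: depending on the site, different mixed code streams are selected, encoded, and output to different positions of the site. The left-seat inputs are mixed and output to the left seat, the middle-seat inputs are mixed and output to the middle seat, and the right-seat inputs are mixed and output to the right seat, which achieves sound localization. When a single-screen site and a multi-screen site view each other, the mix of all seat inputs is output to the single-screen site, and the audio input of the single-screen site takes part in the middle-seat mix, which is output to the middle seat of the multi-screen site.

That is, depending on the site, different mixed code streams can be selected, encoded, and output to different positions of the site: the left-seat mix is encoded and output to the left seat, the middle-seat mix is encoded and output to the middle seat, and the right-seat mix is output to the right seat, which achieves sound localization. When a single-screen site and a multi-screen site view each other, the all-seat mix is encoded and output to the single-screen site, and the audio input of the single-screen site takes part in the middle-seat mix, which is encoded and output to the middle seat of the multi-screen site.

For example, the left-seat mix can be encoded and output to the left seats of A, B, and C; the middle-seat mix can be encoded and output to the middle seats of A, B, and C; the right-seat mix can be encoded and output to the right seats of A, B, and C; and the all-seat mix can be encoded and output to the single-screen site D.

The mixing method of this embodiment allows the sound to follow the image in a conference system. Depending on the sites in the conference, both single-seat and multi-seat sites can be mixed effectively without impairing sound localization.

FIG. 4 is a schematic diagram of a multi-point sound mixing telepresence apparatus according to a second embodiment of the present invention.

The audio processing apparatus may include: an audio acquiring module 402, configured to acquire each audio code stream in a site; an audio processing module 404, configured to process the audio code streams, mix the audio code streams in the conference, and encode and output the mixes according to the orientation of the audio inputs in the site; and an audio transmission module 406, configured to output the mixed and encoded audio to the sites.

FIG. 5 is a schematic diagram of a multi-point sound mixing conference system according to an embodiment of the present invention.

As shown in FIG. 5, the multi-point sound mixing conference system may include a multi-point processing module 502, an access module 504, an audio processing module 506, and a media switching module 508.

The multi-point processing module 502 controls multi-point access, audio processing, and media switching; the access module 504 receives the multiple audio code streams of all sites in the conference; the audio processing module 506 performs codec conversion on all audio code streams of the sites and encodes the output after mixing; the media switching module 508 switches the code streams output by the audio processing module to each site.

According to an embodiment of the invention, a multi-point sound mixing telepresence system is further provided. The system may include: multiple conference sites, where each site includes one or more conference areas and each conference area corresponds to one audio code stream; and a multi-point sound mixing telepresence apparatus, configured to mix the audio code streams of the corresponding conference areas in the sites and to output the mixed audio code streams to the corresponding conference areas in the sites.

The apparatus in this system embodiment may be any of the multi-point sound mixing telepresence apparatuses of the embodiments above.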
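
The worked example with three-screen sites A, B, and C and single-screen site D can be checked with a small sketch that builds the four mix groups and the output selection of steps S302 to S306. The function names, the `"single"` seat label, and the dictionary layout are hypothetical conveniences; only the group sizes (3, 4, 3, and 10 inputs) and the routing rules come from the text.

```python
from typing import Dict, List, Tuple

THREE_SCREEN_SITES = ["A", "B", "C"]
SINGLE_SCREEN_SITES = ["D"]
SEATS = ["left", "middle", "right"]

Input = Tuple[str, str]  # (site, seat)


def build_inputs() -> List[Input]:
    """One audio input per screen; a single-screen site has one input."""
    inputs = [(site, seat) for site in THREE_SCREEN_SITES for seat in SEATS]
    inputs += [(site, "single") for site in SINGLE_SCREEN_SITES]
    return inputs


def build_mix_groups(inputs: List[Input]) -> Dict[str, List[Input]]:
    """Step S302/S304: left, middle, right, and all-seat mix groups."""
    groups = {
        "left": [i for i in inputs if i[1] == "left"],
        # The single-screen site joins the middle mix as a special middle seat.
        "middle": [i for i in inputs if i[1] in ("middle", "single")],
        "right": [i for i in inputs if i[1] == "right"],
        "all": list(inputs),
    }
    return groups


def build_outputs() -> Dict[Input, str]:
    """Step S306: which mix group each output position receives."""
    outputs = {(site, seat): seat for site in THREE_SCREEN_SITES for seat in SEATS}
    for site in SINGLE_SCREEN_SITES:
        outputs[(site, "single")] = "all"
    return outputs


groups = build_mix_groups(build_inputs())
print({name: len(members) for name, members in groups.items()})
# {'left': 3, 'middle': 4, 'right': 3, 'all': 10}
```

The counts match the example in the description: 3 left-seat inputs, 4 middle-seat inputs including site D, 3 right-seat inputs, and 10 inputs in the all-seat mix.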
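From the above description, it can be seen that the present invention can solve one or more problems of multi-point sound mixing in a telepresence conference system, making it possible to distinguish sounds from different areas and thereby achieve a highly realistic sense of presence with sound localization.

Obviously, those skilled in the art should understand that the modules or steps of the invention described above can be implemented with general-purpose computing devices; they can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described can be performed in an order different from the one given here; or they can each be made into individual integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.

The above are only preferred embodiments of the invention and are not intended to limit it. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.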

Claims

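1. A multi-point sound mixing telepresence method, characterized by comprising:
receiving audio code streams from a plurality of conference sites, wherein each of the sites includes one or more conference areas and each of the conference areas corresponds to one audio code stream;
mixing the audio code streams of the corresponding conference areas in the sites; and outputting the mixed audio code streams to the corresponding conference areas in the sites.
2. The method according to claim 1, characterized in that the conference areas correspond to different orientations, and mixing the audio code streams of the corresponding conference areas in the sites comprises:
mixing the audio code streams of the conference areas having the same orientation in the sites; and outputting the mixed audio code streams to the corresponding conference areas in the sites comprises:
outputting the mixed audio code stream to the conference areas having the same orientation.
3. The method according to claim 2, characterized in that the audio code stream contains orientation information of the conference area, and mixing the audio code streams of the conference areas having the same orientation in the sites comprises:
mixing, according to the orientation information, the audio code streams of the conference areas having the same orientation in the sites.
4. The method according to claim 1, characterized in that, where the plurality of sites includes a first site having one conference area and a second site having multiple conference areas, mixing the audio code streams of the corresponding conference areas in the sites comprises: mixing the audio code stream of the conference area of the first site with the audio code stream of one of the conference areas of the second site.
5. The method according to claim 4, characterized in that outputting the mixed audio code streams to the corresponding conference areas in the sites comprises: outputting the mixed audio code stream to the conference area of the first site and to the conference area of the second site whose audio code stream was mixed with that of the conference area of the first site.
6. The method according to claim 4, characterized in that the method further comprises:
mixing the audio code streams of all conference areas in the plurality of sites, and outputting the mixed audio code stream to the first site.
7. The method according to any one of claims 1 to 6, characterized in that one or more of the plurality of sites include three conference areas: left, middle, and right.
8. A multi-point sound mixing telepresence apparatus, characterized by comprising: a receiving module, configured to receive audio code streams from a plurality of conference sites, wherein each of the sites includes one or more conference areas and each of the conference areas corresponds to one audio code stream;
a mixing module, configured to mix the audio code streams of the corresponding conference areas in the sites; and
an output module, configured to output the mixed audio code streams to the corresponding conference areas in the sites.
9. A multi-point sound mixing telepresence system, characterized by comprising: a plurality of conference sites, wherein each of the sites includes one or more conference areas and each of the conference areas corresponds to one audio code stream; and
a multi-point sound mixing telepresence apparatus, configured to mix the audio code streams of the corresponding conference areas in the sites and to output the mixed audio code streams to the corresponding conference areas in the sites.
10. The system according to claim 9, characterized in that the corresponding conference areas in the sites are the conference areas having the same orientation information in the sites.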
PCT/CN2010/080331 2010-06-29 2010-12-27 Multi-point sound mixing and distant view presentation method, apparatus and system WO2012000297A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP10854000.6A EP2590360B1 (en) 2010-06-29 2010-12-27 Multi-point sound mixing method, apparatus and system
US13/806,275 US20130103393A1 (en) 2010-06-29 2010-12-27 Multi-point sound mixing and distant view presentation method, apparatus and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010218209.7 2010-06-29
CN201010218209.7A CN101877643B (zh) 2010-06-29 2010-06-29 Multi-point sound mixing and distant view presentation method, apparatus and system

Publications (1)

Publication Number Publication Date
WO2012000297A1 (zh) 2012-01-05

Family

ID=43020114

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/080331 WO2012000297A1 (zh) 2010-06-29 2010-12-27 Multi-point sound mixing and distant view presentation method, apparatus and system

Country Status (4)

Country Link
US (1) US20130103393A1 (zh)
EP (1) EP2590360B1 (zh)
CN (1) CN101877643B (zh)
WO (1) WO2012000297A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
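CN101877643B (zh) * 2010-06-29 2014-12-10 中兴通讯股份有限公司 Multi-point sound mixing and distant view presentation method, apparatus and system
CN103050124B (zh) 2011-10-13 2016-03-30 华为终端有限公司 Sound mixing method, apparatus and system
DK3584394T3 (da) * 2014-08-22 2022-03-21 Schlage Lock Co Llc Shackle lock with rotation protection
CN105280192B (zh) * 2015-11-23 2019-04-05 北京华夏电通科技有限公司 Echo cancellation method and system for three-party remote communication based on multi-channel sound coding
CN105847096B (zh) * 2016-05-12 2018-10-30 腾讯科技(深圳)有限公司 Communication method, apparatus and system involving audio data
JP7176418B2 (ja) * 2019-01-17 2022-11-22 日本電信電話株式会社 Multipoint control method, apparatus and program
CN115550599A (zh) * 2022-09-22 2022-12-30 苏州科达科技股份有限公司 Audio and video output method for a telepresence conference site, electronic device and storage medium
CN115643242B (zh) * 2022-10-13 2023-07-07 北京华建云鼎科技股份公司 Multi-channel audio data processing method and system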

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1917622A (zh) * 2005-08-18 2007-02-21 北京德瑞塔时代网络技术有限公司 Audio and video transmission system and method for broadcast-grade broadband video conferencing
CN101179693A (zh) * 2007-09-26 2008-05-14 深圳市丽视视讯科技有限公司 Sound mixing processing method for a video conference system
US20090088880A1 (en) * 2007-09-30 2009-04-02 Thapa Mukund N Synchronization and Mixing of Audio and Video Streams in Network-Based Video Conferencing Call Systems
CN101877643A (zh) * 2010-06-29 2010-11-03 中兴通讯股份有限公司 Multi-point sound mixing and distant view presentation method, apparatus and system

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5889843A (en) * 1996-03-04 1999-03-30 Interval Research Corporation Methods and systems for creating a spatial auditory environment in an audio conference system
US6683858B1 (en) * 2000-06-28 2004-01-27 Paltalk Holdings, Inc. Hybrid server architecture for mixing and non-mixing client conferencing
ATE484157T1 (de) * 2004-05-13 2010-10-15 Qualcomm Inc Synchronization of audio and video data in a wireless communication system
US8687820B2 (en) * 2004-06-30 2014-04-01 Polycom, Inc. Stereo microphone processing for teleconferencing
US20070008969A1 (en) * 2005-07-05 2007-01-11 Elstermann Erik J Apparatuses and methods for delivering data stream content to consumer devices
RU2460155C2 (ru) * 2006-09-18 2012-08-27 Конинклейке Филипс Электроникс Н.В. Encoding and decoding of audio objects
US8073125B2 (en) * 2007-09-25 2011-12-06 Microsoft Corporation Spatial audio conferencing
US8289362B2 (en) * 2007-09-26 2012-10-16 Cisco Technology, Inc. Audio directionality control for a multi-display switched video conferencing system
US8447809B2 (en) * 2008-02-29 2013-05-21 Via Technologies, Inc. System and method for network conference
US8289367B2 (en) * 2008-03-17 2012-10-16 Cisco Technology, Inc. Conferencing and stage display of distributed conference participants
US8316089B2 (en) * 2008-05-06 2012-11-20 Microsoft Corporation Techniques to manage media content for a multimedia conference event
CN101510988B (zh) * 2009-02-19 2012-03-21 华为终端有限公司 Method and apparatus for processing and playing a voice signal
US8495726B2 (en) * 2009-09-24 2013-07-23 Avaya Inc. Trust based application filtering
US9154730B2 (en) * 2009-10-16 2015-10-06 Hewlett-Packard Development Company, L.P. System and method for determining the active talkers in a video conference
US8442198B2 (en) * 2009-10-20 2013-05-14 Broadcom Corporation Distributed multi-party conferencing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2590360A4 *

Also Published As

Publication number Publication date
US20130103393A1 (en) 2013-04-25
CN101877643A (zh) 2010-11-03
CN101877643B (zh) 2014-12-10
EP2590360B1 (en) 2019-04-17
EP2590360A4 (en) 2014-06-18
EP2590360A1 (en) 2013-05-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10854000

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 13806275

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2010854000

Country of ref document: EP