CN113079267B - Audio conferencing in a room - Google Patents

Audio conferencing in a room

Info

Publication number
CN113079267B
Authority
CN
China
Prior art keywords
computer system
data
microphone
audio
remote
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110012253.0A
Other languages
Chinese (zh)
Other versions
CN113079267A (en)
Inventor
内厄姆·诺姆·威斯曼
马坦·本-阿舍
内塔内尔·埃亚勒
阿米尔·本-奇奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Waves Audio Ltd
Original Assignee
Waves Audio Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/092,339 (US11425258B2)
Application filed by Waves Audio Ltd
Publication of CN113079267A
Application granted
Publication of CN113079267B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M7/00 Arrangements for interconnection between switching centres
    • H04M7/006 Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses audio conferences within a room. The first computer system and the second computer system and their respective first microphone and second microphone receive respective portions of the same audio input signal. The audio buffers received from the first computer system and the second computer system, respectively, include data encoded from respective microphone inputs of the first computer system and the second computer system. The received audio buffers are synchronized and corrected for gain differences between the received audio buffers to produce corrected audio buffers. The corrected audio buffers are mixed into an output buffer. Synchronization reduces echo when the output buffer is played at the remote peer computer system.

Description

Audio conferencing in a room
Background
1. Technical field
The present invention relates to audio conferencing and to improving audio quality during audio conferences.
2. Description of related Art
Voice over Internet Protocol (VoIP) communication includes encoding voice into digital data, encapsulating the digital data into data packets, and transmitting the data packets over a data network. A conference call is a telephone call between two or more participants at geographically dispersed locations that enables each participant to speak to and listen to the other participants simultaneously. A conference call between participants may be conducted via a voice conference bridge or a centralized server, which connects a plurality of endpoint devices (VoIP devices or computer systems) associated with the participants using an appropriate web conferencing communication protocol. Alternatively, the conference call may be mediated peer-to-peer, where audio may be streamed directly between the participants' computer systems without an intermediate server.
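The encapsulation step described above can be illustrated with a minimal sketch. The header layout (a 16-bit sequence number and a 32-bit timestamp), the frame size, and the function names `packetize` and `depacketize` are assumptions chosen for illustration only; they do not reproduce any particular VoIP protocol.

```python
import struct

# Hypothetical packet header: 16-bit sequence number, 32-bit timestamp (network byte order).
HEADER = struct.Struct("!HI")

def packetize(encoded_audio: bytes, frame_bytes: int = 320, samples_per_frame: int = 160):
    """Split encoded audio bytes into packets, each prefixed by a small header."""
    packets = []
    for seq, start in enumerate(range(0, len(encoded_audio), frame_bytes)):
        payload = encoded_audio[start:start + frame_bytes]
        timestamp = seq * samples_per_frame
        packets.append(HEADER.pack(seq & 0xFFFF, timestamp & 0xFFFFFFFF) + payload)
    return packets

def depacketize(packets):
    """Reorder packets by sequence number and concatenate their payloads.

    Sequence-number wraparound is ignored in this sketch.
    """
    decoded = []
    for packet in packets:
        seq, _timestamp = HEADER.unpack_from(packet)
        decoded.append((seq, packet[HEADER.size:]))
    decoded.sort(key=lambda item: item[0])
    return b"".join(payload for _, payload in decoded)
```

The sequence number lets a receiver restore packet order after network reordering, and the timestamp is the kind of information a receiver could use when aligning streams in time.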
Brief summary of the invention
Various systems and methods are disclosed herein for use in a network that includes first and second computer systems and their respective first and second microphones in an acoustic environment. The first microphone and the second microphone receive respective portions of the same audio input signal. Audio buffers received from the first computer system and the second computer system, respectively, include data encoded from respective microphone inputs of the first computer system and the second computer system. The received audio buffers are synchronized and corrected for gain differences between the received audio buffers to produce corrected audio buffers. The corrected audio buffers are mixed into an output buffer. Synchronization reduces echo when the output buffer is played on a remote peer computer system. Mixing the corrected audio buffers may include emphasizing the audio buffer from a computer system currently being used for audio input and reducing audio input from microphones attached to computer systems not currently being used for audio input. The first computer system and/or the second computer system may synchronize and mix the corrected audio buffers. Alternatively, the synchronization and mixing may be performed by a server in the network. Prior to synchronization and mixing, the system or method may identify which of the computer systems have microphones that may receive the same audio input signal. An audio buffer may be received from a remote peer computer system of the network that is external to the acoustic environment. The received audio buffer may be transmitted to the first computer system and the second computer system with corresponding delays such that the received audio buffer is played synchronously on the first computer system and the second computer system. Alternatively, the received audio buffer may be sent to only one of the first computer system and the second computer system.
Various computer-readable media are disclosed that store instructions which, when executed by a processor, cause the processor to perform the methods disclosed herein.
Brief Description of Drawings
The invention is described herein, by way of example only, with reference to the accompanying drawings, in which:
FIG. 1 illustrates a scenario in accordance with features of the present invention;
FIG. 1A schematically illustrates a conventional network connection between computer systems participating in an audio conference;
FIG. 1B illustrates a conventional audio stream during an audio conference;
FIG. 2 schematically illustrates audio streaming during an audio conference according to features of the present invention;
FIG. 3 schematically illustrates audio streaming during an audio conference according to another feature of the invention;
FIG. 4A illustrates an embodiment of audio streaming for audio received during an audio conference in accordance with features of the present invention;
FIG. 4B illustrates an alternative embodiment of audio streaming for audio received during an audio conference in accordance with features of the present invention;
FIG. 5 illustrates a method in accordance with features of the present invention; and
FIG. 6 schematically shows a simplified computer system according to conventional technology.
The foregoing and/or other aspects will become apparent from the following detailed description when considered in conjunction with the accompanying drawings.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.
By way of introduction, aspects of the present invention are directed to systems and methods for reducing audio echo or unwanted reverberation in an audio conference or audio-video conference implemented over a computer network. In particular, during an audio conference using Voice over Internet Protocol (VoIP), for example, a participant may use a computer workstation equipped with a microphone to participate in the conference. Various embodiments of the present invention may be implemented in a VoIP audio conference conducted peer-to-peer, through a VoIP server, or by a mixture thereof.
Referring now to the drawings, FIG. 1 shows a scenario in accordance with features of the present invention. FIG. 1 shows three participants operating three workstations or computer systems 10A, 10B, and 10C, respectively, which include microphones 2A, 2B, and 2C and speakers 3A, 3B, and 3C, respectively. The workstations 10A and 10B are located in a single room. Workstation 10C is a remote peer computer system operating in another room, another city, or another continent. When there are two participants in a single location (e.g., a single room), a participant's voice may be received both by microphone 2A of his own workstation and by microphone 2B of his roommate's workstation. Both workstations 10A and 10B transmit parallel audio streams to the remote participants over a network, and when both audio streams of the participant's voice are played, the remote participants of the conference hear an echo of the same voice. Conventionally, participants sharing a room during a conference may be required to ensure that only one microphone is unmuted in order to preserve sound quality.
Referring now also to FIG. 1A, there is schematically shown a network connection between computer systems 10A, 10B, and 10C and a Voice over Internet Protocol (VoIP) server 13. Computer systems 10A and 10B may be conventionally interconnected by a local area network (LAN), which may be implemented as a wired network (e.g., IEEE 802.3 Ethernet) or a wireless network (e.g., IEEE 802.11 Wi-Fi). Reference is now also made to FIG. 1B, which illustrates, by way of example, conventional peer-to-peer audio streaming during an audio conference. Specifically, computer system 10A communicates audio buffer A to computer systems 10B and 10C, and similarly computer system 10B communicates audio buffer B to computer systems 10A and 10C. In the scenario shown in FIG. 1, the same speech from a participant is encoded into both audio buffers A and B. If the two buffers are offset by a sufficiently long delay (greater than 30 milliseconds) when they are mixed and played at computer system 10C, the listener at computer system 10C may hear an echo or unwanted reverberation.
Reference is now also made to FIG. 2, which schematically illustrates audio streaming during an audio conference according to features of the present invention. Audio buffer B may be transmitted from computer system 10B and received by computer system 10A. At computer system 10A, audio buffers A and B may be synchronized (e.g., to within 30 milliseconds), mixed, and transmitted to VoIP server 13. VoIP server 13 may transmit the synchronized and mixed audio buffer to remote computer system 10C, so that sound is played at remote computer system 10C without echo.
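The synchronization of audio buffers A and B described above can be sketched as follows. The text does not specify how the time offset between the buffers is found; this sketch assumes a brute-force cross-correlation search, which is one common approach and not necessarily the one used in the invention. The function names, the 16 kHz sample rate, and the representation of buffers as lists of float samples are all illustrative assumptions.

```python
SAMPLE_RATE = 16000  # assumed sample rate in Hz

def estimate_lag(a, b, max_lag):
    """Return the lag in samples by which b trails a, searched in [-max_lag, max_lag].

    The lag with the largest cross-correlation score is chosen.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(a)):
            j = i + lag
            if 0 <= j < len(b):
                score += a[i] * b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def align(a, b, lag):
    """Drop leading samples of the later buffer so both buffers start at the same instant."""
    if lag >= 0:
        b = b[lag:]
    else:
        a = a[-lag:]
    n = min(len(a), len(b))
    return a[:n], b[:n]

def mix(a, b):
    """Mix two aligned buffers with equal weights."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]

def lag_to_ms(lag, sample_rate=SAMPLE_RATE):
    """Convert a lag in samples to milliseconds."""
    return 1000.0 * lag / sample_rate
```

At the assumed 16 kHz rate, a lag of 480 samples corresponds to 30 milliseconds, so a search window of at least that many samples would be needed to detect offsets at the threshold mentioned above.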
Reference is now also made to FIG. 3, which schematically illustrates audio streaming during an audio conference according to another feature of the present invention. Computer systems 10A and 10B separately transmit audio buffers A and B, respectively, to VoIP server 13. VoIP server 13 includes a module 14, which can synchronize and mix audio buffers A and B into a synchronized and mixed audio buffer, such that audio is played at computer system 10C without echo.
Referring now also to FIG. 5, a method 50 in accordance with features of the present invention is shown. In step 51, the conferencing application may identify whether two or more computer systems 10 participating in the audio conference have microphones 2 that may receive the same audio input signal. The identification may be performed by prompting a participant to indicate whether another participant of the audio conference is sharing a room with that participant. In step 52, the respective audio buffers A and B are received from computer systems 10A and 10B, and in step 53 audio buffers A and B are synchronized. In step 54, the gain difference between the received audio buffers A and B may be corrected. For example, microphone 2A may be less sensitive than microphone 2B, and/or the signal from microphone 2A may be streamed at a lower level than the signal from microphone 2B, so that gain may be added to the signal from microphone 2A to balance the levels at playback. It may also be desirable to increase the gain of the microphone being used by the participant currently speaking relative to the other unmuted microphones of participants in the conference. In step 55, the audio buffers are mixed into an output buffer, and in step 56 the output buffer is sent to the remote peer computer system 10C. In step 57, echo is reduced when the output buffer is played at the remote computer system 10C.
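Steps 54 and 55 can be sketched for two buffers that have already been synchronized. The text does not state how the gain difference is measured or how the active microphone is emphasized; this sketch assumes RMS level matching (raising the quieter buffer) and a fixed weighting in the mix. The names `correct_gain` and `mix_with_emphasis`, the `active` argument, and the default weight of 0.8 are hypothetical.

```python
import math

def rms(buf):
    """Root-mean-square level of a buffer of float samples."""
    return math.sqrt(sum(x * x for x in buf) / len(buf)) if buf else 0.0

def correct_gain(a, b):
    """Step 54 (sketch): add gain to the quieter buffer so both have the same RMS level."""
    ra, rb = rms(a), rms(b)
    if ra == 0.0 or rb == 0.0:
        return list(a), list(b)  # a silent buffer cannot be level-matched
    target = max(ra, rb)
    return [x * target / ra for x in a], [x * target / rb for x in b]

def mix_with_emphasis(a, b, active="a", emphasis=0.8):
    """Step 55 (sketch): weighted mix that emphasizes the buffer currently used for input.

    Output samples are clipped to the range [-1.0, 1.0].
    """
    wa = emphasis if active == "a" else 1.0 - emphasis
    wb = 1.0 - wa
    return [max(-1.0, min(1.0, wa * x + wb * y)) for x, y in zip(a, b)]
```

How the currently speaking participant is detected is outside this sketch; the caller is assumed to supply it through `active`.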
Referring now to FIG. 4A, there is illustrated audio streaming for audio received during an audio conference in accordance with features of the present invention. Computer system 10A may receive an audio buffer from VoIP server 13, the audio buffer comprising combined audio from a remote peer computer system (not shown). Computer system 10A may transmit the audio locally to computer system 10B so that all computer systems 10 in the same room play the audio synchronously. Referring now also to FIG. 4B, there is shown audio streaming for received audio in an alternative configuration. The synchronized audio buffers are sent directly from VoIP server 13 to computer systems 10A and 10B. The received audio buffers may be sent to the first computer system 10A and the second computer system 10B with corresponding delays such that the received audio buffers are played synchronously at the first computer system 10A and the second computer system 10B. Alternatively, only one speaker 3 in the room may play the audio.
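The "corresponding delays" mentioned above can be illustrated with a small calculation. This sketch assumes that a total latency (network plus output path) has somehow been measured for each computer system in the room; the measurement method, the function name `playback_delays`, and the example latency values are hypothetical.

```python
def playback_delays(latency_ms):
    """Return the extra delay per device so that all devices play at the same moment.

    latency_ms maps a device name to its measured end-to-end latency in milliseconds.
    The slowest device gets no extra delay; faster devices wait for it.
    """
    slowest = max(latency_ms.values())
    return {device: slowest - latency for device, latency in latency_ms.items()}

# Example: computer system 10B is 8 ms slower than 10A, so 10A is delayed by 8 ms.
delays = playback_delays({"10A": 12.0, "10B": 20.0})
```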
Referring now to FIG. 6, a simplified computer system 60 is schematically illustrated in accordance with conventional techniques. The computer system 10 includes a processor 601, a storage mechanism including a memory bus 607 for storing information in a memory 609, and a network interface 605 operatively connected to the processor 601 through a peripheral bus 603. The computer system 10 also includes a data input mechanism 611 (e.g., a disk drive) for a computer readable medium 613 (e.g., an optical disk). The data input mechanism 611 is operatively connected to the processor 601 through the peripheral bus 603. A sound card 614 is operatively connected to the peripheral bus 603. An input of the sound card 614 is operatively connected to the output of the microphone 2, and an output of the sound card 614 is operatively connected to the input of the speaker 3.
In this specification and in the following claims, a "computer system" is defined as one or more software modules, one or more hardware modules, or a combination thereof that work together to perform operations on electronic data. For example, the definition of computer system includes the hardware components of a personal computer as well as software modules, such as the operating system of a personal computer. The physical layout of the modules is not important. The computer system may include one or more computers coupled via a computer network. Likewise, a computer system may include a single physical device (e.g., a mobile phone, a laptop computer, or a tablet computer) with internal modules (e.g., memory and a processor) working together to perform operations on electronic data.
In this specification and in the following claims, a "network" is defined as any architecture in which two or more computer systems may exchange data. The data exchanged may be in the form of electrical signals that are meaningful to two or more computer systems. When data is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer system or computer device, the connection is properly viewed as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media. Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer system or special purpose computer system to perform a certain function or group of functions. The described embodiments may also be embodied as computer readable code on a non-transitory computer readable medium. A non-transitory computer readable medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of the non-transitory computer readable medium include read-only memory, random-access memory, CD-ROM, HDD, DVD, magnetic tape, and optical data storage devices. The non-transitory computer readable medium can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
The various aspects, embodiments, implementations, or features of the described embodiments may be used alone or in any combination. The various aspects of the described embodiments may be implemented in software, hardware, or a combination of hardware and software.
The terms "device," "workstation," and "computer system" are used interchangeably herein.
The term "connected" as used herein refers to both wired and wireless computer connections.
The term "emphasis" as used herein refers to a relative increase in audio gain or audio level.
The term "echo" as used herein refers to what is heard when two audio signals derived from similar or identical audio inputs are played asynchronously with a time delay of greater than about 10-50 milliseconds.
The term "synchronized" or "synchronization" as used herein refers to a time offset of less than about 50 milliseconds. In some cases where the participants are in different locations in a large room, there will be some reverberation depending on the room size. In such cases, the term "synchronized" or "synchronization" may refer to an offset of less than about 30 milliseconds. Alternatively, in some embodiments of the invention, it may be desirable to reduce reverberation even further, so that synchronization to within less than about 20 milliseconds, or less than 10 milliseconds, may be effective.
The transitional term "comprising" as used herein is synonymous with "including" and is broad or open-ended and does not exclude additional, unrecited elements or method steps. The articles "a", "an" (such as "a computer system", "an audio buffer") as used herein have the meaning of "one or more", i.e. "one or more computer systems", "one or more audio buffers".
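The synchronization tolerances given in the definition of "synchronized" above can be expressed as a simple check. The mode names and the function `is_synchronized` are hypothetical labels introduced for illustration; only the millisecond values come from the text, where they are stated as approximate.

```python
# Approximate tolerances from the definition of "synchronized"; mode names are hypothetical.
SYNC_TOLERANCES_MS = {
    "default": 50.0,
    "large_room": 30.0,
    "strict": 20.0,
    "strictest": 10.0,
}

def is_synchronized(offset_ms, mode="default"):
    """Return True if the offset between two buffers is within the tolerance for the mode."""
    return abs(offset_ms) < SYNC_TOLERANCES_MS[mode]
```

For example, an offset of 35 milliseconds would count as synchronized under the default tolerance but not under the large-room tolerance.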
All optional and preferred features and modifications of the described embodiments and the dependent claims may be used in all aspects of the invention taught herein. Furthermore, the various features of the dependent claims, as well as all optional and preferred features and modifications of the described embodiments, are combinable and interchangeable with each other.
While selected features of the invention have been illustrated and described, it should be understood that the invention is not limited to the described features.
While selected embodiments of the present invention have been shown and described, it should be understood that the invention is not limited to the described embodiments. Rather, it should be understood that changes can be made in these embodiments without departing from the scope of the invention as defined in the following claims and their equivalents.

Claims (17)

1. A system operable in a network comprising a first computer system and a second computer system, wherein the first computer system and the second computer system and their respective first microphone and second microphone are in an acoustic environment, wherein the first microphone and the second microphone receive respective portions of the same audio input signal, the system configured to:
receive data from respective audio buffers of the first computer system and the second computer system, wherein the data is encoded from respective microphone inputs of the first computer system and the second computer system;
synchronize the received data from the respective audio buffers and correct a gain difference between said received data of the first microphone input and the second microphone input, thereby producing corrected data; and
mix the corrected data into an output buffer;
wherein the synchronization reduces echo when the corrected data is played at the remote peer computer system.
2. The system of claim 1, wherein mixing the corrected data includes emphasizing data from a computer system currently being used for audio input and reducing input from a microphone attached to a computer system not currently being used for audio input.
3. The system of claim 1, wherein synchronizing and mixing are performed by a computer system selected from the group consisting of the first computer system and the second computer system.
4. The system of claim 1, wherein synchronizing and mixing are performed by a server in the network.
5. The system of claim 1, further configured to:
identify a portion of the first computer system and the second computer system where microphones receive the same audio input signal.
6. The system of claim 1, further configured to:
receive remote data from a remote peer computer system of the network, wherein the remote peer computer system is external to the acoustic environment; and
transmit the remote data to the first computer system and the second computer system with a corresponding delay such that the remote data is played synchronously at the first computer system and the second computer system, or transmit the remote data to one of the first computer system and the second computer system.
7. A computerized method executable in a network, the network comprising a first computer system and a second computer system, wherein the first computer system and the second computer system and their respective first microphone and second microphone are in an acoustic environment, wherein the first microphone and the second microphone receive portions of the same audio input signal, the method comprising:
receiving data from respective audio buffers of the first computer system and the second computer system, wherein the data is encoded from respective microphone inputs of the first computer system and the second computer system;
synchronizing the received data and correcting a gain difference between said received data of the first microphone input and the second microphone input, thereby producing corrected data; and
mixing the corrected data into an output buffer;
wherein the synchronization reduces echo when the corrected data is played at the remote peer computer system.
8. The computerized method of claim 7, further comprising:
transmitting the corrected data to a remote peer computer system of the network external to the acoustic environment.
9. The computerized method of claim 7, wherein the mixing the corrected data includes emphasizing data from a computer system currently being used for audio input and reducing input from a microphone attached to a computer system not currently being used for audio input.
10. The computerized method of claim 7, wherein synchronizing and mixing are performed by a computer system selected from the group consisting of the first computer system and the second computer system.
11. The computerized method of claim 7, wherein synchronizing and mixing are performed by a server in the network.
12. The computerized method of claim 7, further comprising:
identifying a portion of the first computer system and the second computer system where microphones receive the same audio input signal.
13. The computerized method of claim 7, further comprising:
receiving remote data from a remote peer computer system of the network, wherein the remote peer computer system is external to the acoustic environment; and
transmitting the remote data to the first computer system and the second computer system with a corresponding delay such that the remote data is played synchronously at the first computer system and the second computer system, or transmitting the remote data for playing to one of the first computer system and the second computer system.
14. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform a method in a network comprising first and second computer systems, wherein the first and second computer systems and their respective first and second microphones are in an acoustic environment, wherein the first and second microphones receive portions of a same audio input signal, the method comprising:
receiving data from respective audio buffers of the first computer system and the second computer system, wherein the data is encoded from respective microphone inputs of the first computer system and the second computer system;
synchronizing the received data and correcting a gain difference between said received data of the first microphone input and the second microphone input, thereby producing corrected data; and
mixing the corrected data into an output buffer;
wherein the synchronization reduces echo when the corrected data is played at the remote peer computer system.
15. The non-transitory computer readable storage medium of claim 14, wherein the mixing includes emphasizing data from a computer system currently being used for audio input and reducing input from a microphone attached to a computer system not currently being used for audio input.
16. The non-transitory computer readable storage medium of claim 14, further storing instructions that, when executed by a processor, cause the processor to perform:
identifying a portion of the first computer system and the second computer system where microphones receive the same audio input signal.
17. The non-transitory computer readable storage medium of claim 14, further storing instructions that, when executed by a processor, cause the processor to perform:
receiving remote data from a remote peer computer system of the network, wherein the remote peer computer system is external to the acoustic environment; and
transmitting the remote data to the first computer system and the second computer system with a corresponding delay such that the remote data is played synchronously at the first computer system and the second computer system, or transmitting the remote data for playing to one of the first computer system and the second computer system.
CN202110012253.0A 2020-01-06 2021-01-06 Audio conferencing in a room Active CN113079267B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202062957372P 2020-01-06 2020-01-06
US62/957,372 2020-01-06
US17/092,339 2020-11-09
US17/092,339 US11425258B2 (en) 2020-01-06 2020-11-09 Audio conferencing in a room

Publications (2)

Publication Number Publication Date
CN113079267A (en) 2021-07-06
CN113079267B (en) 2023-05-05

Family

ID=76609309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110012253.0A Active CN113079267B (en) 2020-01-06 2021-01-06 Audio conferencing in a room

Country Status (1)

Country Link
CN (1) CN113079267B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102461140A (en) * 2009-04-14 2012-05-16 思杰系统有限公司 Systems and methods for computer and voice conference audio transmission during conference call via voip device
CN102625006A (en) * 2011-01-31 2012-08-01 深圳三石科技有限公司 Method and system for synchronization and alignment of echo cancellation data and audio communication equipment
US8406415B1 (en) * 2007-03-14 2013-03-26 Clearone Communications, Inc. Privacy modes in an open-air multi-port conferencing device
CN103583032A (en) * 2011-05-11 2014-02-12 锐德世加拿大公司 Resource efficient acoustic echo cancellation in IP networks
CN107408395A (en) * 2015-04-05 2017-11-28 高通股份有限公司 Conference audio management

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8243631B2 (en) * 2006-12-27 2012-08-14 Nokia Corporation Detecting devices in overlapping audio space
US8560331B1 (en) * 2010-08-02 2013-10-15 Sony Computer Entertainment America Llc Audio acceleration
US9767784B2 (en) * 2014-07-09 2017-09-19 2236008 Ontario Inc. System and method for acoustic management
GB201414352D0 (en) * 2014-08-13 2014-09-24 Microsoft Corp Reversed echo canceller

Also Published As

Publication number Publication date
CN113079267A (en) 2021-07-06

Similar Documents

Publication Publication Date Title
US11386912B1 (en) Method and computer program product for allowing a plurality of musicians who are in physically separate locations to create a single musical performance using a teleconferencing platform provided by a host server
US11910344B2 (en) Conference audio management
CN113273153B (en) System and method for distributed call processing and audio enhancement in a conference environment
US8606249B1 (en) Methods and systems for enhancing audio quality during teleconferencing
US10732924B2 (en) Teleconference recording management system
US11710488B2 (en) Transcription of communications using multiple speech recognition systems
US8700720B2 (en) System architecture for linking packet-switched and circuit-switched clients
US11782674B2 (en) Centrally controlling communication at a venue
US11521636B1 (en) Method and apparatus for using a test audio pattern to generate an audio signal transform for use in performing acoustic echo cancellation
US11985173B2 (en) Method and electronic device for Bluetooth audio multi-streaming
WO2012055291A1 (en) Method and system for transmitting audio data
CN113079267B (en) Audio conferencing in a room
US11425258B2 (en) Audio conferencing in a room
US11089164B2 (en) Teleconference recording management system
JP2011087074A (en) Output controller of remote conversation system, method thereof, and computer executable program
JP5210788B2 (en) Speech signal communication system, speech synthesizer, speech synthesis processing method, speech synthesis processing program, and recording medium storing the program
JP4522332B2 (en) Audiovisual distribution system, method and program
EP4300918A1 (en) A method for managing sound in a virtual conferencing system, a related system, a related acoustic management module, a related client device
JP2008005028A (en) Video voice conference system and terminal
JP2022108957A (en) Data processing device, data processing system, and voice processing method
JPH02150153A (en) Voice conference system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant