CN113079267B

CN113079267B - Audio conferencing in a room

Info

Publication number: CN113079267B
Application number: CN202110012253.0A
Authority: CN
Inventors: 内厄姆·诺姆·威斯曼; 马坦·本-阿舍; 内塔内尔·埃亚勒; 阿米尔·本-奇奇
Original assignee: Waves Audio Ltd
Current assignee: Waves Audio Ltd
Priority date: 2020-01-06
Filing date: 2021-01-06
Publication date: 2023-05-05
Anticipated expiration: 2041-01-06
Also published as: CN113079267A

Abstract

The application discloses audio conferencing within a room. A first computer system and a second computer system, with respective first and second microphones, are in an acoustic environment, and the first and second microphones receive respective portions of the same audio input signal. Audio buffers received from the first and second computer systems, respectively, include data encoded from the respective microphone inputs of the first and second computer systems. The received audio buffers are synchronized and corrected for gain differences between them to produce corrected audio buffers. The corrected audio buffers are mixed into an output buffer. The synchronization reduces echo when the output buffer is played at a remote peer computer system.

Description

Audio conferencing in a room

Background

1. Technical field

The present invention relates to improving audio quality during audio conferences and audio-video conferences.

2. Description of related Art

Voice over Internet Protocol (VoIP) communication involves encoding voice into digital data, encapsulating the digital data into data packets, and transmitting the data packets over a data network. A conference call is a telephone call among two or more participants at geographically dispersed locations that enables each participant to speak to, and listen to, the other participants simultaneously. A teleconference may be conducted via a voice conference bridge or a centralized server. Teleconferencing connects a plurality of endpoint devices (VoIP devices or computer systems) associated with the participants using an appropriate web conferencing communication protocol. Alternatively, the teleconference may be mediated peer-to-peer, with audio streamed directly between the participants' computer systems without an intermediate server.
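As a rough illustration of the encode-and-encapsulate steps described above, the following Python sketch splits 16-bit PCM samples into fixed-duration frames and wraps each frame in a packet carrying a sequence number and a timestamp. The `Packet` type, the 20 ms frame length, the 16 kHz sample rate, and the use of raw PCM in place of a real codec are all assumptions for illustration; practical VoIP systems typically carry compressed audio in RTP packets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    """Hypothetical VoIP packet: minimal header fields plus an encoded payload."""
    sequence: int   # increments by one per packet so the receiver can detect loss or reordering
    timestamp: int  # index of the first sample in this packet within the stream
    payload: bytes  # "encoded" audio; here simply little-endian signed 16-bit PCM


def packetize(samples: List[int], sample_rate: int = 16000,
              frame_ms: int = 20) -> List[Packet]:
    """Split PCM samples into fixed-duration frames and encapsulate each one."""
    frame_len = sample_rate * frame_ms // 1000  # samples per frame (320 at 16 kHz, 20 ms)
    packets = []
    for seq, start in enumerate(range(0, len(samples), frame_len)):
        frame = samples[start:start + frame_len]
        # Encode each sample as a signed 16-bit little-endian integer.
        payload = b"".join(s.to_bytes(2, "little", signed=True) for s in frame)
        packets.append(Packet(sequence=seq, timestamp=start, payload=payload))
    return packets
```

With these assumed parameters, one second of audio (16,000 samples) yields 50 packets of 320 samples each; the final packet may be shorter if the stream length is not a multiple of the frame length.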

Brief summary of the invention

Various systems and methods are disclosed herein for use in a network that includes first and second computer systems and their respective first and second microphones in an acoustic environment. The first microphone and the second microphone receive respective portions of the same audio input signal. Audio buffers received from the first computer system and the second computer system, respectively, include data encoded from the respective microphone inputs of the first and second computer systems. The received audio buffers are synchronized and corrected for gain differences between them to produce corrected audio buffers. The corrected audio buffers are mixed into an output buffer. The synchronization reduces echo when the output buffer is played on a remote peer computer system. Mixing the corrected audio buffers may include emphasizing the audio buffer from the computer system currently being used for audio input and reducing the input from microphones attached to computer systems not currently being used for audio input. The first computer system and/or the second computer system may perform the synchronization and mix the corrected audio buffers. Alternatively, the synchronization and the mixing of the corrected audio buffers may be performed by a server in the network. Prior to synchronization and mixing, the system or method may identify which of the first and second computer systems have microphones that may receive the same audio input signal. An audio buffer may be received from a remote peer computer system of the network that is external to the acoustic environment. The received audio buffer may be transmitted to the first computer system and the second computer system with respective delays such that the received audio buffer is played synchronously on the first and second computer systems. Alternatively, the received audio buffer may be sent to only one of the first computer system and the second computer system.
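One way the identification step might combine participants' answers is sketched below in Python. It assumes, hypothetically, that each participant is asked whether they share a room with another participant and that each "yes" is recorded as a pair of system identifiers; a union-find structure then merges chains of answers into groups of co-located systems. The function name, the identifiers, and the assumption that room sharing is transitive are illustrative and are not taken from the source.

```python
from typing import Dict, Iterable, List, Tuple


def group_shared_rooms(systems: Iterable[str],
                       shared_pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group conference endpoints whose microphones may pick up the same sound.

    shared_pairs holds (a, b) answers collected by prompting participants,
    e.g. ("10A", "10B") if the user at 10A reports sharing a room with 10B.
    Sharing is treated as transitive, so chains of answers are merged.
    Only groups with two or more members are returned.
    """
    parent: Dict[str, str] = {s: s for s in systems}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in shared_pairs:
        parent[find(a)] = find(b)  # union the two groups

    groups: Dict[str, List[str]] = {}
    for s in parent:
        groups.setdefault(find(s), []).append(s)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

For example, with systems "10A", "10B", and "10C" and the single answer ("10A", "10B"), the only group returned is ["10A", "10B"], so only those two systems' buffers would need to be synchronized and mixed.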

Various computer-readable media are disclosed storing instructions that, when executed by a processor, cause the processor to perform the methods disclosed herein.

Brief Description of Drawings

The invention is described herein, by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 illustrates a scenario in accordance with features of the present invention;

FIG. 1A schematically illustrates a conventional network connection between computer systems participating in an audio conference;

FIG. 1B illustrates a conventional audio stream during an audio conference;

FIG. 2 schematically illustrates audio streaming during an audio conference according to features of the present invention;

FIG. 3 schematically illustrates audio streaming during an audio conference according to another feature of the invention;

FIG. 4A illustrates an embodiment of audio streaming for audio received during an audio conference in accordance with features of the present invention;

FIG. 4B illustrates an alternative embodiment of audio streaming for audio received during an audio conference in accordance with features of the present invention;

FIG. 5 illustrates a method in accordance with features of the present invention; and

FIG. 6 schematically shows a simplified computer system according to conventional techniques.

The foregoing and/or other aspects will become apparent from the following detailed description when considered in conjunction with the accompanying drawings.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below in order to explain the present invention by referring to the figures.

By way of introduction, aspects of the present invention are directed to systems and methods for reducing audio echo or unwanted reverberation in an audio conference or audio-video conference implemented over a computer network. For example, during an audio conference using Voice over Internet Protocol (VoIP), a participant may join the conference from a computer workstation equipped with a microphone. Various embodiments of the present invention may be implemented in a VoIP audio conference conducted peer-to-peer, via a VoIP server, or by a combination thereof.

Referring now to the drawings, and in particular to FIG. 1, there is shown a scenario in accordance with features of the present invention. FIG. 1 shows three participants operating three workstations or computer systems 10A, 10B, and 10C, respectively, which include microphones 2A, 2B, and 2C and speakers 3A, 3B, and 3C, respectively. Workstations 10A and 10B are located in a single room. Workstation 10C is a remote peer computer system operating in another room, another city, or another continent.

When two participants are in a single location (e.g., a single room), a participant's voice may be received both by microphone 2A of his own workstation and by microphone 2B of the workstation of the participant sharing the room. Workstations 10A and 10B both transmit parallel audio streams to the remote participants over a network, and when both audio streams carrying the participant's voice are played, the remote participants hear echoes of the same voice. Conventionally, conference participants sharing a room may be required to ensure that only one microphone is unmuted in order to maintain sound quality.

Referring now also to FIG. 1A, there is schematically shown a network connection between computer systems 10A, 10B, and 10C and a Voice over Internet Protocol (VoIP) server 13. Computer systems 10A and 10B may be conventionally interconnected by a Local Area Network (LAN), which may be implemented as a wired network (e.g., IEEE 802.3 Ethernet) or a wireless network (e.g., IEEE 802.11 Wi-Fi).

Reference is now also made to FIG. 1B, which illustrates, by way of example, conventional peer-to-peer audio streaming during an audio conference. Specifically, computer system 10A communicates audio buffer A to computer systems 10B and 10C, and similarly computer system 10B communicates audio buffer B to computer systems 10A and 10C. In the scenario shown in FIG. 1, when the same speech from a participant is encoded into both audio buffers A and B with a sufficiently long relative delay (greater than 30 milliseconds), and the buffers are then mixed and played at computer system 10C, the listener at computer system 10C may hear the speech with an echo or unwanted reverberation.
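To make the echo mechanism concrete, the following Python sketch mixes a short click with a copy of itself that arrives 40 ms later, without any alignment. The 48 kHz sample rate, the 40 ms delay, and the function names are assumptions chosen for illustration; the point is that the naive mix contains two separate onsets, which a listener perceives as a repetition of the same sound.

```python
def ms_to_samples(delay_ms: float, sample_rate: int = 48000) -> int:
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate / 1000)


def naive_mix(a, b, b_delay: int):
    """Sum buffer a with buffer b arriving b_delay samples later (no alignment)."""
    out = [0.0] * max(len(a), len(b) + b_delay)
    for i, x in enumerate(a):
        out[i] += x
    for i, x in enumerate(b):
        out[i + b_delay] += x
    return out


speech = [1.0] + [0.0] * 99      # a single click standing in for a speech onset
delay = ms_to_samples(40)        # 40 ms at 48 kHz -> 1920 samples
mixed = naive_mix(speech, speech, delay)
peaks = [i for i, x in enumerate(mixed) if x != 0.0]
print(peaks)                     # [0, 1920]: the same onset is heard twice
```

With `b_delay` equal to 0 the two copies coincide and simply add, which is why synchronizing the buffers before mixing avoids the audible repetition.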

Reference is now also made to FIG. 2, which schematically illustrates audio streaming during an audio conference according to features of the present invention. Audio buffer B may be transmitted from computer system 10B and received by computer system 10A. At computer system 10A, audio buffers A and B may be synchronized (e.g., to within 30 milliseconds), mixed, and transmitted to VoIP server 13. VoIP server 13 may then transmit the synchronized and mixed audio buffer to remote computer system 10C, where the sound is played without echo.
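The source does not prescribe how computer system 10A measures the offset between buffers A and B. One common technique, assumed here purely for illustration, is cross-correlation: try each candidate lag within a window and keep the lag at which the two buffers match best, then shift one buffer by that lag. The function names and the brute-force search are hypothetical.

```python
from typing import List, Sequence


def estimate_lag(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Return the lag, in samples, by which `other` trails `ref`.

    Brute-force cross-correlation over lags in [-max_lag, max_lag]. A positive
    result means the same sound appears later in `other` than in `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate ref[i] with other[i + lag] wherever both indices are valid.
        score = sum(x * other[i + lag] for i, x in enumerate(ref)
                    if 0 <= i + lag < len(other))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(other: Sequence[float], lag: int) -> List[float]:
    """Shift `other` earlier by `lag` samples (later if negative), zero-padding
    so the output keeps the same length and lines up with the reference."""
    if lag >= 0:
        return list(other[lag:]) + [0.0] * lag
    return [0.0] * (-lag) + list(other[:lag])
```

At 48 kHz, for example, a search window of plus or minus 50 ms would correspond to `max_lag=2400`. Because this loop is quadratic in buffer length, an FFT-based correlation would likely be preferable for realistic buffer sizes.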

Reference is now also made to FIG. 3, which schematically illustrates audio streaming during an audio conference according to another feature of the present invention. Computer systems 10A and 10B separately transmit audio buffers A and B, respectively, to VoIP server 13. VoIP server 13 includes a module 14 that can synchronize and mix audio buffers A and B into a synchronized, mixed audio buffer, so that the audio is played at computer system 10C without echo.
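The source does not say how module 14 aligns the buffers. One possible approach, sketched here under the assumption that each client tags its buffer with the capture time of its first sample on a shared clock (for example, one disciplined by NTP), is to place all buffers on a common timeline and average them. The function names and the timestamp scheme are hypothetical. Averaging rather than summing is chosen so that mixed samples never exceed the peak magnitude of the inputs.

```python
from typing import Dict, List


def align_by_timestamp(buffers: Dict[str, List[float]],
                       start_times: Dict[str, int]) -> Dict[str, List[float]]:
    """Place each client's buffer on a common timeline.

    start_times holds the capture time of each buffer's first sample, in
    samples on a shared clock (assumed). Each buffer is zero-padded at the
    start by its offset from the earliest buffer and at the end up to the
    latest end time, so all outputs have equal length.
    """
    t0 = min(start_times.values())
    end = max(start_times[k] + len(buf) for k, buf in buffers.items())
    aligned = {}
    for k, buf in buffers.items():
        pre = start_times[k] - t0
        post = end - (start_times[k] + len(buf))
        aligned[k] = [0.0] * pre + list(buf) + [0.0] * post
    return aligned


def mix(aligned: Dict[str, List[float]]) -> List[float]:
    """Average the aligned buffers sample by sample into one output buffer."""
    n = len(aligned)
    return [sum(samples) / n for samples in zip(*aligned.values())]
```

Timestamp alignment of this kind compensates only for network and buffering delays; it could be combined with a content-based alignment if the acoustic path difference inside the room also matters.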

Referring now also to FIG. 5, a method 50 in accordance with features of the present invention is shown. In step 51, the conferencing application may identify whether two or more computer systems 10 participating in the audio conference have microphones 2 that may receive the same audio input signal. The identification may be performed by asking a participant whether another participant of the audio conference is sharing the room with him or her. In step 52, the respective audio buffers may be received from computer systems 10A and 10B, and in step 53 audio buffers A and B are synchronized. In step 54, the gain difference between the received audio buffers A and B may be corrected. For example, microphone 2A may be less sensitive than microphone 2B, and/or the signal from microphone 2A may be streamed at a lower level, so gain may be added to the signal from microphone 2A to balance the levels at playback. It is also desirable to increase the gain of the microphone used by the participant currently speaking relative to the other unmuted microphones of the conference participants. In step 55, the audio buffers are mixed into an output buffer, and in step 56 the output buffer is sent to the remote peer computer system 10C. In step 57, echo is reduced when the output buffer is played at the remote computer system 10C.
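Steps 54 and 55 can be sketched as follows, assuming the buffers have already been synchronized and have equal length. Gain correction here matches each buffer's root-mean-square (RMS) level to that of the loudest buffer, and the mix keeps the active talker's buffer at full weight while attenuating the others. The RMS-matching rule, the `active` index (the source does not specify how the current speaker is detected), and the 0.25 attenuation factor are assumptions; a real system might instead estimate a per-microphone gain over a longer period.

```python
import math
from typing import List, Sequence


def rms(buf: Sequence[float]) -> float:
    """Root-mean-square level of a buffer (0.0 for an empty buffer)."""
    return math.sqrt(sum(x * x for x in buf) / len(buf)) if buf else 0.0


def correct_gain(buffers: List[Sequence[float]]) -> List[List[float]]:
    """Scale each buffer so that all share the loudest buffer's RMS level.

    Compensates, e.g., for a less sensitive microphone 2A (step 54).
    Silent buffers pass through unchanged to avoid dividing by zero.
    """
    target = max(rms(b) for b in buffers)
    out = []
    for b in buffers:
        level = rms(b)
        g = target / level if level > 0 else 1.0
        out.append([x * g for x in b])
    return out


def mix_with_emphasis(buffers: List[List[float]], active: int,
                      duck: float = 0.25) -> List[float]:
    """Mix equal-length buffers into one output buffer (step 55).

    The active talker's buffer keeps weight 1.0, the others are attenuated
    to `duck`, and the weighted sum is normalized by the total weight.
    """
    weights = [1.0 if i == active else duck for i in range(len(buffers))]
    total = sum(weights)
    return [sum(w * b[n] for w, b in zip(weights, buffers)) / total
            for n in range(len(buffers[0]))]
```

Normalizing by the total weight keeps the output level comparable to a single corrected input, regardless of how many co-located microphones are mixed.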

Referring now to FIG. 4A, there is illustrated audio streaming for audio received during an audio conference in accordance with features of the present invention. Computer system 10A may receive an audio buffer from VoIP server 13, the audio buffer comprising combined audio from a remote peer computer system (not shown). Computer system 10A may transmit the audio locally to computer system 10B so that all computer systems 10 in the same room play the audio synchronously. Referring now also to FIG. 4B, there is shown audio streaming for audio received during an audio conference in another configuration. The synchronized audio buffers are sent directly from VoIP server 13 to computer systems 10A and 10B. The received audio buffers may be sent to the first computer system 10A and the second computer system 10B with respective delays such that the received audio buffers are played synchronously at the first computer system 10A and the second computer system 10B. Alternatively, the audio may be played by only one speaker 3 in the room.
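The source does not specify how the respective delays are chosen. A simple rule, assumed here, is to equalize end-to-end latency: the device with the slowest path plays immediately and every other device waits for the difference. The function name and device identifiers are hypothetical, and how the per-device latencies are measured is outside the scope of this sketch.

```python
from typing import Dict


def playback_delays(latency_ms: Dict[str, float]) -> Dict[str, float]:
    """Return the extra delay, in ms, each in-room device should add before playback.

    latency_ms maps a device identifier to its measured latency from the point
    where the remote audio is distributed (e.g., VoIP server 13 or computer
    system 10A) to that device's speaker. The slowest device gets no extra
    delay; every other device is held back by the difference, so all devices
    emit the same audio at the same moment.
    """
    slowest = max(latency_ms.values())
    return {device: slowest - latency for device, latency in latency_ms.items()}
```

For example, if the path to 10A takes 12 ms and the path to 10B takes 35 ms, 10A would delay playback by 23 ms and 10B would add no delay.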

Referring now to FIG. 6, a simplified computer system 60 according to conventional techniques is schematically illustrated. The computer system 10 includes a processor 601, a storage mechanism including a memory bus 607 for storing information in a memory 609, and a network interface 605 operatively connected to the processor 601 through the peripheral bus 603. The computer system 10 also includes a data input mechanism 611, such as a disk drive for a computer-readable medium 613 (e.g., an optical disk). The data input mechanism 611 is operatively coupled to the processor 601 via the peripheral bus 603. A sound card 614 is operatively connected to the peripheral bus 603. The sound card 614 is operatively connected to the output of the microphone 2 and to the input of the speaker 3.

In this specification and in the following claims, a "computer system" is defined as one or more software modules, one or more hardware modules, or a combination thereof that work together to perform operations on electronic data. For example, the definition of computer system includes the hardware components of a personal computer as well as software modules, such as the operating system of a personal computer. The physical layout of the modules is not important. The computer system may include one or more computers coupled via a computer network. Likewise, a computer system may include a single physical device (e.g., a mobile phone, a laptop computer, or a tablet computer) with internal modules (e.g., memory and a processor) working together to perform operations on electronic data.

In this specification and in the following claims, a "network" is defined as any architecture in which two or more computer systems may exchange data. The data exchanged may be in the form of electrical signals that are meaningful to two or more computer systems. When data is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired and wireless) to a computer system or computer device, the connection is properly viewed as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media. Computer-executable instructions comprise, for example, instructions and data which cause a general purpose computer system or special purpose computer system to perform a certain function or group of functions. The described embodiments may also be embodied as computer-readable code on a non-transitory computer-readable medium. A non-transitory computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the non-transitory computer-readable medium include read-only memory, random-access memory, CD-ROM, HDD, DVD, magnetic tape, and optical data storage devices. The non-transitory computer-readable medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.

The various aspects, embodiments, implementations, or features of the described embodiments may be used alone or in any combination. The various aspects of the described embodiments may be implemented in software, hardware, or a combination of hardware and software.

The terms "device," "workstation," and "computer system" are used interchangeably herein.

The term "connected" as used herein refers to both wired and wireless computer connections.

The term "emphasis" as used herein refers to a relative increase in audio gain or audio level.

The term "echo" as used herein refers to the effect heard when two audio signals derived from similar or identical audio inputs are played asynchronously, with a relative time delay greater than about 10 to 50 milliseconds.

The term "synchronized" or "synchronization" as used herein refers to a relative time offset of less than about 50 milliseconds. In some cases where the participants are at different locations in a large room, there will be some reverberation depending on the room size; in such cases, "synchronized" or "synchronization" may refer to an offset of less than about 30 milliseconds. Alternatively, in some embodiments of the invention it may be desirable to reduce reverberation even further, so that synchronization to within less than about 20 milliseconds, or less than 10 milliseconds, may be effective.
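The tolerances in the definitions above can be expressed as a simple check on the measured offset between two buffers and converted into a sample budget at a given sample rate. The dictionary labels and the default 48 kHz sample rate below are assumptions for illustration; the numeric tolerances are the ones stated in the definitions.

```python
# Hypothetical labels for the tolerances named in the definitions (milliseconds).
TOLERANCE_MS = {
    "default": 50.0,         # general definition of "synchronized"
    "large_room": 30.0,      # participants spread out in a large room
    "reduced_reverb": 20.0,  # embodiments reducing reverberation further
    "minimal_reverb": 10.0,
}


def is_synchronized(offset_ms: float,
                    tolerance_ms: float = TOLERANCE_MS["default"]) -> bool:
    """True if the relative offset between two buffers is within the tolerance."""
    return abs(offset_ms) < tolerance_ms


def tolerance_in_samples(tolerance_ms: float, sample_rate: int = 48000) -> int:
    """Tolerance expressed as a whole number of samples (rounded down)."""
    return int(tolerance_ms * sample_rate // 1000)
```

For instance, at 48 kHz the 30 ms large-room tolerance corresponds to 1440 samples, and a measured offset of 40 ms would satisfy the 50 ms definition but not the 30 ms one.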

The transitional term "comprising" as used herein is synonymous with "including" and is broad or open-ended and does not exclude additional, unrecited elements or method steps. The articles "a", "an" (such as "a computer system", "an audio buffer") as used herein have the meaning of "one or more", i.e. "one or more computer systems", "one or more audio buffers".

All optional and preferred features and modifications of the described embodiments and the dependent claims may be used in all aspects of the invention taught herein. Furthermore, the various features of the dependent claims, as well as all optional and preferred features and modifications of the described embodiments, are combinable and interchangeable with each other.

While selected features of the invention have been illustrated and described, it should be understood that the invention is not limited to the described features.

While selected embodiments of the present invention have been shown and described, it should be understood that the invention is not limited to the described embodiments. Rather, it should be understood that changes can be made in these embodiments without departing from the scope of the invention as defined in the following claims and their equivalents.

Claims

1. A system operable in a network comprising a first computer system and a second computer system, wherein the first computer system and the second computer system and their respective first microphone and second microphone are in an acoustic environment, wherein the first microphone and the second microphone receive respective portions of the same audio input signal, the system configured to:

receive data from respective audio buffers of the first computer system and the second computer system, wherein the data is encoded from respective microphone inputs of the first computer system and the second computer system;

synchronize the received data from the respective audio buffers and correct a gain difference between said received data of the first microphone input and the second microphone input, thereby producing corrected data; and

mix the corrected data into an output buffer;

wherein the synchronization reduces echo when the corrected data is played at the remote peer computer system.

2. The system of claim 1, wherein mixing the corrected data includes emphasizing data from a computer system currently being used for audio input and reducing input from a microphone attached to a computer system not currently being used for audio input.

3. The system of claim 1, wherein synchronizing and mixing are performed by a computer system selected from the group consisting of the first computer system and the second computer system.

4. The system of claim 1, wherein synchronizing and mixing are performed by a server in the network.

5. The system of claim 1, further configured to:

identify those of the first computer system and the second computer system whose microphones receive the same audio input signal.

6. The system of claim 1, further configured to:

receive remote data from a remote peer computer system of the network, wherein the remote peer computer system is external to the acoustic environment; and

transmit the remote data to the first computer system and the second computer system with a corresponding delay such that the remote data is played synchronously at the first computer system and the second computer system, or transmit the remote data to one of the first computer system and the second computer system.

7. A computerized method executable in a network, the network comprising a first computer system and a second computer system, wherein the first computer system and the second computer system and their respective first microphone and second microphone are in an acoustic environment, wherein the first microphone and the second microphone receive portions of the same audio input signal, the method comprising:

synchronizing the received data and correcting a gain difference between said received data of the first microphone input and the second microphone input, thereby producing corrected data; and

mixing the corrected data into an output buffer;

8. The computerized method of claim 7, further comprising:

transmitting the corrected data to a remote peer computer system of the network external to the acoustic environment.

9. The computerized method of claim 7, wherein the mixing the corrected data includes emphasizing data from a computer system currently being used for audio input and reducing input from a microphone attached to a computer system not currently being used for audio input.

10. The computerized method of claim 7, wherein synchronizing and mixing are performed by a computer system selected from the group consisting of the first computer system and the second computer system.

11. The computerized method of claim 7, wherein synchronizing and mixing are performed by a server in the network.

12. The computerized method of claim 7, further comprising:

13. The computerized method of claim 7, further comprising:

transmitting the remote data to the first computer system and the second computer system with a corresponding delay such that the remote data is played synchronously at the first computer system and the second computer system, or transmitting the remote data for playing to one of the first computer system and the second computer system.

14. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform a method in a network comprising first and second computer systems, wherein the first and second computer systems and their respective first and second microphones are in an acoustic environment, wherein the first and second microphones receive portions of a same audio input signal, the method comprising:

mixing the corrected data into an output buffer;

15. The non-transitory computer readable storage medium of claim 14, wherein the mixing includes emphasizing data from a computer system currently being used for audio input and reducing input from a microphone attached to a computer system not currently being used for audio input.

16. The non-transitory computer readable storage medium of claim 14, further storing instructions that, when executed by a processor, cause the processor to perform:

17. The non-transitory computer readable storage medium of claim 14, further storing instructions that, when executed by a processor, cause the processor to perform: