WO2018219459A1 - Method and apparatus for enhancing an audio signal captured in an indoor environment - Google Patents

Method and apparatus for enhancing an audio signal captured in an indoor environment

Info

Publication number
WO2018219459A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
idi
audio
primary
segment
Prior art date
Application number
PCT/EP2017/063266
Other languages
English (en)
Inventor
Keven WANG
Elena Fersman
Athanasios KARAPANTELAKIS
Leonid Mokrushin
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2017/063266
Publication of WO2018219459A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L12/1827 Network arrangements for conference optimisation or adaptation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction

Definitions

  • Embodiments of the invention relate to the field of computer augmented hearing; and more specifically, to enhancing an audio signal captured in an indoor environment.
  • Noise cancellation technologies are based on isolating and enhancing one sound while suppressing or attenuating other sounds in an environment.
  • the choice of the sound to be enhanced is determined based on a priori information about the different sounds present in the environment. For example, a noise cancellation system can be pre-trained to classify audio signals corresponding to different sounds based on whether the audio signals can be classified as "echos" or not.
  • a noise cancellation system may be preconfigured to assume that the strongest audio signal received corresponds to the most important sound and it is the audio signal that needs to be enhanced.
  • One general aspect includes a method of enhancing an audio signal captured in an indoor environment, the method including: receiving from a primary audio signal input a first audio signal that is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, where the first audio signal includes a primary audio signal that is to be enhanced; receiving, from a secondary audio signal source, a second input indicative of a second audio signal, where the second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment; determining that the first audio signal includes the second audio signal; modifying, based on the first IDI and the second IDI, the first audio signal to obtain a modified version of the first audio signal, where the modified version of the first audio signal enhances the primary audio signal; and causing the modified version of the first audio signal to be output to a receiver.
  • a machine-readable medium including computer program code which when executed by a computer carries out the method including the operations of receiving from a primary audio signal input a first audio signal that is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, where the first audio signal includes a primary audio signal that is to be enhanced; receiving, from a secondary audio signal source, a second input indicative of a second audio signal, where the second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment; determining that the first audio signal includes the second audio signal; modifying, based on the first IDI and the second IDI, the first audio signal to obtain a modified version of the first audio signal, where the modified version of the first audio signal enhances the primary audio signal; and causing the modified version of the first audio signal to be output to a receiver.
  • One general aspect includes an apparatus for enhancing an audio signal captured in an indoor environment, the apparatus including: an audio signal alteration unit to perform the following operations: receiving from a primary audio signal input a first audio signal that is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, where the first audio signal includes a primary audio signal that is to be enhanced; receiving, from a secondary audio signal source, a second input indicative of a second audio signal, where the second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment; determining that the first audio signal includes the second audio signal; modifying, based on the first IDI and the second IDI, the first audio signal to obtain a modified version of the first audio signal, where the modified version of the first audio signal enhances the primary audio signal; and causing the modified version of the first audio signal to be output to a receiver.
  • Figure 1 illustrates a block diagram of an exemplary audio system 100 for enabling adaptive audio signal alteration, in accordance with some embodiments.
  • Figure 2A illustrates a transactional diagram of exemplary operations for receiving audio signals at an audio signal alteration unit, in accordance with some embodiments.
  • Figure 2B illustrates a transactional diagram of exemplary operations for enhancing an audio signal captured in an indoor environment, in accordance with some embodiments.
  • Figure 3A illustrates exemplary audio samples received at the audio signal alteration unit, in accordance with some embodiments.
  • Figure 3B illustrates exemplary audio samples synchronized at the audio signal alteration unit, in accordance with some embodiments.
  • Figure 4A illustrates a flow diagram of exemplary operations for enhancing an audio signal captured in an indoor environment, in accordance with some embodiments.
  • Figure 4B illustrates a flow diagram of exemplary operations for determining that a first audio signal includes a second audio signal, in accordance with some embodiments.
  • Figure 4C illustrates a flow diagram of exemplary operations for modifying, based on importance designation indicators, one or more audio signals, in accordance with some embodiments.
  • Figure 5 illustrates a block diagram of an exemplary implementation of audio alteration unit, in accordance with some embodiments.
  • partitioning/sharing/duplication implementations types and interrelationships of system components, and logic partitioning/integration choices are set forth in order to provide a more thorough understanding of the present invention. It will be appreciated, however, by one skilled in the art that the invention may be practiced without such specific details. In other instances, control structures, gate level circuits and full software instruction sequences have not been shown in detail in order not to obscure the invention. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
  • references in the specification to "one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to affect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
  • Bracketed text and blocks with dashed borders may be used herein to illustrate optional operations that add additional features to embodiments of the invention. However, such notation should not be taken to mean that these are the only options or optional operations, and/or that blocks with solid borders are not optional in certain embodiments of the invention.
  • Coupled is used to indicate that two or more elements, which may or may not be in direct physical or electrical contact with each other, co-operate or interact with each other.
  • Connected is used to indicate the establishment of communication between two or more elements that are coupled with each other.
  • the embodiments of the present invention present a method, an apparatus and a system for enhancing an audio signal captured in an indoor environment.
  • the system includes multiple sound sources and an audio signal alteration unit.
  • the sound sources are coupled with the audio signal alteration unit and transmit to the audio signal alteration unit audio signals or an indication of an audio signal.
  • the audio signal alteration unit receives information about the generated/captured audio signals from the various sound sources and mixes them such that background noise is altered and a primary sound is enhanced.
  • a first audio signal is received from a primary audio signal input.
  • the first audio signal is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, wherein the first audio signal includes a primary audio signal that is to be enhanced.
  • a second input indicative of a second audio signal is received from a secondary audio signal source.
  • the second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment.
  • a determination that the first audio signal includes the second audio signal is performed.
  • a modification, based on the first IDI and the second IDI, of the first audio signal is performed to obtain a modified version of the first audio signal.
  • the modified version of the first audio signal enhances the primary audio signal.
  • the modified version of the first audio signal is output to a receiver.
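The sequence of operations summarized above (receive two IDI-tagged signals, determine that the first includes the second, modify the first based on the two IDIs) can be sketched as follows. This is a minimal illustration and not the method of the application: the `TaggedSignal` type, the integer IDI encoding, the correlation threshold, and the least-squares subtraction are all assumptions introduced here, and the two signals are assumed to be already time-aligned.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class TaggedSignal:
    """An audio signal plus its importance designation indicator (IDI)."""
    samples: List[float]
    idi: int  # hypothetical encoding: a larger value means more important


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def includes(mixed: List[float], reference: List[float], threshold: float = 0.1) -> bool:
    """Assumed inclusion test: normalised correlation above a threshold."""
    denom = sqrt(dot(mixed, mixed) * dot(reference, reference))
    return denom > 0.0 and abs(dot(mixed, reference)) / denom > threshold


def enhance(first: TaggedSignal, second: TaggedSignal) -> List[float]:
    """Return a modified version of the first signal with the second one reduced."""
    if second.idi >= first.idi or not includes(first.samples, second.samples):
        return list(first.samples)  # nothing to suppress
    # Least-squares estimate of how strongly the second signal appears in the mix.
    gain = dot(first.samples, second.samples) / dot(second.samples, second.samples)
    return [m - gain * r for m, r in zip(first.samples, second.samples)]


# Toy data: a "voice" and an orthogonal "noise" mixed at half amplitude.
voice = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
noise = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
mixed = [v + 0.5 * n for v, n in zip(voice, noise)]
cleaned = enhance(TaggedSignal(mixed, idi=2), TaggedSignal(noise, idi=1))
```

With this toy data the secondary signal is orthogonal to the voice, so the subtraction recovers the voice; real signals would not separate this cleanly.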
  • FIG. 1 illustrates a block diagram of an exemplary audio system 100 for enabling adaptive audio signal alteration, in accordance with some embodiments.
  • the audio system 100 includes an audio input 105, one or more secondary audio signal sources 103, an audio signal alteration unit (AAU) 102, an audio output 107, an audio signal(s) database 104, and optionally a configuration unit 106.
  • the audio system 100 is a telecommunication system that allows a first user of the system to transmit audio to a second user of the audio system 100. The second user is located remotely from the location of the first user.
  • the audio system 100 can be part of an audio/video system enabling audio and video telecommunication.
  • the audio system 100 may be part of a teleconference system enabling the first user to communicate with one or more remote users. While the embodiments below will be described with reference to a first user communicating with a second user through the audio system 100, one of ordinary skill in the art would understand that this is intended to be exemplary only and should not be limiting. For example, any number of users can be located at two or more locations communicating through the audio system 100 without departing from the scope of the present invention.
  • the audio system 100 is operative to receive input audio signal(s), e.g., first and secondary input audio data, and analyze the signal(s) to identify secondary noises/sounds that need to be cancelled or attenuated in order to enhance a primary audio signal.
  • the first user of the audio system 100 may be speaking through the primary audio input 105 in an indoor environment 101, e.g., a teleconference room, a room in a private residence, a collaborative workspace, etc.
  • the indoor environment may include one or more additional audio signal sources, i.e., the secondary audio signal sources 103 that can be referred to as audio sources 103.
  • the additional audio signal sources create or record sounds that occur in the indoor environment.
  • the audio sources 103 can include a washing machine, a television (TV), a microphone recording outdoor noises that enter the room from a window, etc. These sounds are considered to be secondary and will often cause the second user, at the other end of the audio system 100, to receive a sound that mixes the primary audio signal with some or all of the secondary sounds/noises. The mixed sounds/audio signals degrade the experience of the second user.
  • the audio system 100 is operative to receive the first audio signal which includes the primary audio signal, e.g., the voice of the first user, mixed with one or more sounds from the secondary audio signals, e.g., the sound of the washing machine, the sound of the TV, the sound from the outdoor environment, and/or any other sounds that can be heard in the indoor environment where the first user is located.
  • the audio system 100 is operative to alter the first audio signal to obtain a modified version (4) which includes an enhanced primary audio signal consequently improving the experience of the second user.
  • the first audio signal is received at the primary audio input 105.
  • the primary audio input 105 includes a microphone operative to convert incoming sounds into electrical input audio signal(s).
  • the primary audio input 105 may include an Analog-to-Digital Converter (ADC) operative to convert sound(s) into a digital input audio signal.
  • the first audio signal includes a mix of a primary audio signal that is representative of a primary sound such as the voice of the first user and one or more additional audio signals that represent different sounds from the environment of the first user.
  • the additional audio signals are ambient sounds that occur in the environment of the user while the user is speaking via the audio system 100.
  • the first audio signal is associated with a time value indicating the time at which the signal is received at the audio system 100.
  • the audio system 100 may further receive other input audio signals from the secondary audio signal sources 103.
  • Each one of the secondary audio signal sources 103A-C transmits to the audio signal alteration unit 102 data about the sound it produces or captures.
  • Such data may come either in the format of a digital audio stream, referred to as an audio signal, containing the sound itself, or in the form of an identifier, an audio signal ID, that identifies an audio stream at a storage medium, e.g., the audio signal database 104.
  • Each item of data, audio signal or audio signal ID, from the audio signal sources 103 is associated with a time value indicating the time at which the signal is received at the audio system 100.
  • the active audio signal source 103A transmits an audio signal ID (operation 2a). When the AAU 102 receives the ID, it uses the ID (operation 21a) to retrieve the corresponding audio signal from the audio signal database 104 (operation 21b).
  • the audio signal may be a sound emitted by the washing machine when the cleaning cycle is complete. This sound has been previously stored at the audio signal database 104, and when the washing machine emits that sound instead of sending the sound to the audio signal alteration unit 102, the ID associated with that particular sound is transmitted.
  • the ID of the audio signal is determined automatically using acoustic or audio fingerprinting.
  • the ID is a fingerprint of the audio signal, generated either at the active audio signal source and then transmitted, or at the AAU 102 upon reception of the audio signal from the active audio signal source.
  • the fingerprint is typically a few seconds long and used to retrieve the audio signal from the audio signal database 104.
  • the audio signal database 104 has stored a priori fingerprinted and signal versions for all audio streams and on request from AAU 102 matches the generated audio fingerprint with the fingerprints stored, finally returning the audio signal of the closest match.
  • the active audio signal source 103B and the passive audio signal source 103C transmit, at operations 2b and 2c respectively, respective audio signals to the AAU 102. These audio signals include the audio stream and the AAU 102 does not need to access the audio signal database 104 to retrieve the audio signals.
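The ID-based retrieval described above (the database stores fingerprints alongside the signals and returns the audio signal of the closest match) could look like the sketch below. The source does not specify a fingerprint format or a distance measure; representing a fingerprint as an integer bit pattern and comparing by Hamming distance are assumptions made here, as are the class and method names.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints (assumed distance measure)."""
    return bin(a ^ b).count("1")


class AudioSignalDatabase:
    """Hypothetical stand-in for the audio signal database 104."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, List[float]]] = []

    def store(self, fingerprint: int, samples: List[float]) -> None:
        self._entries.append((fingerprint, samples))

    def closest(self, fingerprint: int) -> List[float]:
        """Return the stored signal whose fingerprint is nearest to the query."""
        if not self._entries:
            raise LookupError("database is empty")
        best = min(self._entries, key=lambda entry: hamming(entry[0], fingerprint))
        return best[1]


db = AudioSignalDatabase()
db.store(0b11110000, [0.1, 0.2])  # e.g. end-of-cycle alarm
db.store(0b00001111, [0.9])       # e.g. motor noise
retrieved = db.closest(0b11110001)  # one bit away from the first entry
```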
  • the secondary audio signal sources 103 may include active or passive audio signal sources.
  • An active audio signal source is a device that produces one or multiple sounds by itself.
  • an active audio signal source is also operative to send the same content to the AAU 102 over a communication network, e.g., network 110.
  • a digital TV (active audio signal source 103B) or a connected speaker (not illustrated) is an active sound producer, as it generates sounds.
  • a washing machine (active audio signal source 103A) can also be considered an active sound producer, as it produces sounds during its operation, e.g., relay clicks, motor noises, end-of-cycle alarms, etc.
  • a passive audio signal source is a device operative to record one or more sounds from an entity that is not capable of reporting its sounds.
  • the passive audio signal source is an "agent" of the passive entities and sends captured sounds to the AAU 102 on behalf of the passive entities.
  • a passive audio signal source 103C can be a microphone deployed near a window operative to capture and report, to the AAU 102, sounds from the outdoor environment.
  • the AAU 102 is operative to receive from the secondary audio signal sources and/or from the configuration unit 106 audio signal metadata, which can be referred to as "metadata", associated with different sounds that may be included in the first audio signal.
  • the audio signal metadata is associated with an audio signal and includes an importance designation indicator (IDI).
  • IDI indicates an importance of the audio signal with respect to one or more other audio signals in the indoor environment.
  • the IDI of the first audio signal indicates that the primary audio input 105 is the source of the most important signal.
  • the IDI associated with each one of the secondary audio signal sources 103 indicates that the audio signal received from these sources is of lower importance than the audio signal from audio input 105.
  • the IDI can take one of two Boolean values, i.e., a first value indicating that the corresponding audio signal is "important" and a second value indicating that the corresponding audio signal is "non-important".
  • the degree of alteration of a secondary signal can depend on its importance relative to other secondary audio signals.
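To make the two IDI forms concrete (a Boolean "important"/"non-important" flag, or a graded value where the degree of alteration depends on relative importance), the sketch below maps an IDI to the fraction of a secondary signal that is kept. The linear mapping and the function name `retain_factor` are illustrative assumptions; the source does not define how an IDI translates into a degree of alteration.

```python
from typing import Union


def retain_factor(idi: Union[bool, int, float], primary_idi: Union[bool, int, float]) -> float:
    """Fraction of a signal to keep, given its IDI and the primary signal's IDI.

    Hypothetical mapping: a Boolean IDI keeps or fully removes the signal,
    a numeric IDI keeps it in proportion to its importance.
    """
    if isinstance(idi, bool):  # checked first because bool is a subclass of int
        return 1.0 if idi else 0.0
    if primary_idi <= 0:
        raise ValueError("primary IDI must be positive for the graded form")
    return max(0.0, min(1.0, idi / primary_idi))


tv_gain = retain_factor(2, 10)           # low-importance secondary signal
alarm_gain = retain_factor(False, True)  # Boolean form: non-important
```

Under this mapping a more important secondary signal is attenuated less than a less important one.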
  • an administrator of the audio system 100 can configure, through a communication interface, not shown, each one of the secondary audio signal sources 103 individually with an associated IDI. Additionally or alternatively, the administrator may configure the AAU 102, through the configuration unit 106, to associate each audio signal originating from a given audio source or audio input with an IDI. In these embodiments, the administrator may identify the source of the audio signal and the audio signal with an audio signal descriptor.
  • the metadata also includes the audio signal descriptor.
  • the audio signal descriptor includes one or more parameter values that define the audio source and the audio signal.
  • the audio signal descriptor may include only an identification of the audio source, for example, when the audio source is a source of a single audio signal, e.g., outdoor noise.
  • the audio signal descriptor may include an identification of the audio source and an identification of the audio signal, for example, when the audio source is a source of more than one sound (e.g., the washing machine may generate different noises, such as the sound of the motor and the alarms, and each sound has a corresponding identifier).
  • the audio signal descriptor includes a semantic description of the audio signal.
  • the audio signal descriptor can have several forms, for example, a JSON (JavaScript Object Notation) document can be used.
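As an illustration of a JSON-encoded descriptor, the sketch below builds metadata for one washing-machine sound and round-trips it through JSON. All field names (`source_id`, `signal_id`, `description`, `idi`) and values are hypothetical; the source states only that the descriptor identifies the audio source and, where needed, the audio signal, may carry a semantic description, and that the metadata also includes the IDI.

```python
import json

# Hypothetical audio signal descriptor for one sound of the washing machine.
descriptor = {
    "source_id": "washing-machine-01",
    "signal_id": "end-of-cycle-alarm",
    "description": "alarm emitted when the cleaning cycle is complete",
}

# Metadata as it might be sent to the AAU: the IDI plus the descriptor.
metadata = {"idi": 1, "descriptor": descriptor}

encoded = json.dumps(metadata, indent=2)  # text that could be sent over the network
decoded = json.loads(encoded)             # what the receiving side would parse
```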
  • the audio system 100 also includes an audio output 107 that is coupled with the AAU 102 for outputting the modified version of the first audio signal.
  • the audio output 107 is included in a device at a location remote from the primary audio input 105.
  • the audio output 107 may include speaker(s) and a Digital-to-Analog Converter.
  • Data transfer between the different components of the audio system can be done using wired or wireless networks 110, e.g. 3G, WiFi, etc.
  • the network 110 can be a combination of several networks, e.g., local area networks, wide area networks, cellular networks, etc., coupling the various components.
  • the AAU 102 is part of a network device 202 that also includes the primary audio input 105.
  • the AAU 102 and the primary audio input 105 are located at the same location and are either part of the same physical device or coupled through a local communication link.
  • the AAU 102 can be included within the same network device (ND) 204 as the audio output 107. This ND 204 is remote from the primary audio input 105.
  • the AAU 102 and the audio output 107 are located at the same location and are either part of the same physical device or coupled through a local communication link.
  • the AAU 102 is operative to receive the audio signals from the primary audio input 105 and the secondary audio signal sources 103, to receive the metadata associated with each audio signal, and to create a modified version of the first audio signal that enhances the primary audio signal, e.g., the voice of the first user, based on the metadata.
  • Figure 2A illustrates a transactional diagram of exemplary operations for receiving audio signals at an audio signal alteration unit, in accordance with some embodiments.
  • the operations described with reference to Figures 2A-B occur following the receipt by the AAU 102 of the metadata associated with the audio signals originating from the secondary audio signal sources 103 and from the primary audio input 105.
  • the AAU 102 receives, either from each one of the audio sources 103 and the primary audio input 105, from the configuration unit 106, or from a combination of both, metadata associated with each audio signal that is to originate from the audio sources 103 and the primary audio input 105.
  • This step of receiving the metadata can be performed during a configuration operation of the audio system 100 and is performed independently of the operations described below with reference to Figures 2A-B.
  • the metadata is transmitted to the AAU 102 along with the audio signals when the audio sources and the primary audio input are in operation and actively transmitting the audio signals to the AAU 102. In these embodiments, there is no separate step for transmitting and receiving the metadata.
  • the AAU 102 receives, at operation 201, from the primary audio signal input 105 a first audio signal that is associated with an IDI.
  • the IDI indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment.
  • the IDI indicates that the first audio signal is more important than other audio signals that may be received at the AAU 102.
  • when the IDI is one of two Boolean values, the IDI of the first audio signal indicates that it is important.
  • when the IDI is taken from a range of values including more than two values, the IDI of the first audio signal has the highest value.
  • the first audio signal includes a primary audio signal that is to be enhanced.
  • the first audio signal includes the primary audio signal, e.g., the voice of a first user using the audio system 100, for communicating with a second user at the other end of the communication network.
  • this primary audio signal is mixed with one or more other secondary audio signals, e.g., sounds from the indoor environment that originate from the secondary audio signal sources 103.
  • the AAU 102 receives, from a secondary audio signal source, e.g., active audio signal source 103A, which can be a washing machine, an input indicative of an audio signal.
  • the audio signal is associated with an IDI indicating the importance of the audio signal with respect to one or more other audio signals in the indoor environment.
  • the audio source 103A may also transmit additional information, e.g., the metadata and additional information related to the audio signal, to the AAU 102.
  • the audio source 103A may report an amplitude of the audio signal, the IDI, and information about the source of the audio signal.
  • the audio sources 103 report the audio signal they emit by transmitting an identifier of the audio signal instead of the audio signal.
  • the audio source 103A transmits an audio signal ID to the AAU 102.
  • the IDs uniquely identify the audio signals and enable the AAU 102 to retrieve the actual audio signals from the audio signal database 104.
  • Upon receipt of the audio signal ID, the AAU 102 transmits a request for the audio signal to the audio signal database 104.
  • the audio signal database 104 stores audio signals that can be generated by one or more audio sources.
  • the audio signal database 104 can include multiple audio signals that can be generated by the washing machine, e.g., alarm sounds, motor sounds, etc. These audio signals are identified with an audio signal ID, which can be an alphanumerical value such as a fingerprint or other value that uniquely identifies the audio signal.
  • the audio signal database 104 retrieves the audio signal based on the first audio signal ID.
  • the database 104 may receive the amplitude of the first audio signal, in addition to the ID, from the AAU 102.
  • once the audio signal is retrieved from the database, it can also be transformed, at operation 203d, according to the requested amplitude so that the first audio signal from the audio signal database 104 is, when played back, as close as possible to the actual audio signal being generated at the active audio signal source 103A.
  • the operation 203d is not performed at the audio signal database 104 but instead at the audio signal alteration unit 102, when the audio signal is received.
  • the audio signal retrieved from the database which may have been transformed or not, is transmitted to the AAU 102.
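The amplitude transformation of operation 203d can be pictured as a rescaling of the stored signal to the amplitude reported by the source. The sketch below assumes that "amplitude" means the peak absolute sample value and that a uniform gain is sufficient; neither is stated in the source, and the function name is hypothetical.

```python
from typing import List


def scale_to_amplitude(samples: List[float], target_peak: float) -> List[float]:
    """Rescale a stored signal so its peak matches the reported amplitude."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # a silent or empty signal cannot be rescaled
    factor = target_peak / peak
    return [s * factor for s in samples]


stored = [0.5, -1.0, 0.25]                  # signal as kept in the database
adjusted = scale_to_amplitude(stored, 0.5)  # source reported a peak of 0.5
```

The same function could run either at the database or at the AAU 102, matching the two placements of operation 203d described above.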
  • the AAU 102 receives from the audio source 103B, e.g., a TV, another audio signal.
  • the audio source 103B is a TV located in the indoor environment of the first user
  • the TV continuously sends the sound it is producing in a digital format, digital audio signal, to the AAU 102 in addition to outputting the sound to its speakers.
  • This audio signal is associated with an IDI classifying the sound as less important than the audio signal originating from the audio input 105.
  • the AAU 102 receives from the audio source 103C, e.g., a microphone for recording outdoor noise that is located on or near a window, another audio signal.
  • the microphone continuously captures and sends the sound of an outdoor environment in a digital format, digital audio signal, to the AAU 102.
  • This audio signal is associated with an IDI classifying the sound as less important than the audio signal originating from the audio input 105. While the embodiments herein describe three secondary audio signals this is intended to be exemplary. Any number of secondary audio signal sources can be coupled with the AAU 102 and consequently any number of secondary audio signals can be received at the AAU 102 without departing from the scope of the present invention.
  • Each audio signal received at the AAU 102 may be received at a different time.
  • the difference in the time of receipt between the various audio signals can be caused by the time at which the audio signal is actually produced/recorded by the audio source, as well as by propagation delays caused by the transmission of these audio signals in the network 110 towards the AAU 102.
  • the delays can also be caused by different relative distances and angular shifts between the primary audio input 105 and the secondary audio signal sources, by audio wave reflections, and/or various processing capacity of the electronic devices producing or recording these signals. Therefore while a sound may be produced/recorded at a given time, the audio signal associated with that sound can be received at the AAU 102 with a varying delay.
  • the inputs associated with the audio signals, 201, 203a, 205, and 207 are received at the AAU 102 they are associated with a time of receipt.
  • FIG. 3A illustrates exemplary audio samples received at the audio signal alteration unit, in accordance with some embodiments.
  • the audio signals received at the AAU 102 can be received as a set of successive samples with a determined frequency or alternatively as a single file.
  • the audio sources e.g., audio source 103B-C, and audio input 105
  • the audio signal is received as segments.
  • when the audio signal is received at the AAU 102 as a result of the retrieval from the database based on the ID, the audio signal can be retrieved as a single file, and either the AAU 102 or the database 104 can segment the file into one or more segments. While only a single sample is illustrated in Figure 3A for a given audio source, this is intended to be exemplary only.
  • each audio signal received from an audio source comprises multiple successive segments received at the determined frequency. For example, in the illustrated example, the audio is sampled at 44.1 kHz, i.e., 44,100 samples per second.
  • Each segment is associated with a time at which the audio signal is received at the AAU 102.
  • each segment 302, 304, 306, and 308 has a different time of receipt: t(audio input), t(outdoor noise), t(washing machine), and t(TV) respectively.
  • the difference between the various times of receipt of the signals can be caused by the network propagation delay, e.g., for audio sources 103B-C and audio input 105, and/or by the audio signal identification delay, i.e., the time taken by the receipt of the ID and the retrieval of the audio signal from the database 104, as for audio source 103A.
  • the delays can also be caused by different relative distances and angular shifts between the primary audio input 105 and the secondary audio signal sources, by audio wave reflections, and/or by the varying processing capacities of the electronic devices.
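The segmentation described above can be sketched as follows. This is a minimal illustration only: the `Segment` class, the `split_into_segments` helper, the segment length, and the rule used to assign a receipt time to each segment are all assumptions made for the example and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List

SAMPLE_RATE_HZ = 44_100  # 44.1 kHz, as in the illustrated example


@dataclass
class Segment:
    source: str            # hypothetical label, e.g. "audio source 103A"
    samples: List[float]   # successive samples of this segment
    received_at: float     # time of receipt at the AAU, in seconds


def split_into_segments(source, samples, segment_len, first_receipt_time):
    """Split a signal retrieved as a single file into fixed-length segments.

    Assumption for illustration: each segment's receipt time is the receipt
    time of the file plus the duration of the samples that precede it.
    """
    segments = []
    for start in range(0, len(samples), segment_len):
        chunk = samples[start:start + segment_len]
        received_at = first_receipt_time + start / SAMPLE_RATE_HZ
        segments.append(Segment(source, chunk, received_at))
    return segments


# One second of silence, cut into 100 ms segments.
one_second = [0.0] * SAMPLE_RATE_HZ
segments = split_into_segments("audio source 103A", one_second, 4_410, 10.0)
```

Keeping the receipt time on each segment, rather than on the whole signal, is what later allows segments from different sources to be compared one by one.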
  • FIG. 2B illustrates a transactional diagram of exemplary operations for enhancing an audio signal captured in an indoor environment, in accordance with some embodiments.
  • the AAU 102 is operative to determine, at operation 209, that the first audio signal includes one or more other audio signals, e.g., a second audio signal.
  • the first audio signal includes a mix of the primary audio signal, e.g., the voice of the first user, and at least one other audio signal representative of a sound produced by another source.
  • the AAU 102 determines if the first audio signal includes such a secondary audio signal and identifies which one it is, based on the additional audio signals received from the secondary audio signal sources, at operations 203d, 205, and 207.
  • the operations of determining that the first audio signal includes one or more other audio signals can be performed according to several embodiments.
  • the operation 209 includes synchronizing, at operation 217, a first segment of the first audio signal and a second segment of a second audio signal based on first and second timestamps respectively associated with the first segment and the second segment. The first and the second timestamps are indicative of the time the first segment and the second segment are received at the AAU 102.
  • a reference clock and a network-transmission-delay-aware protocol are used to synchronize the internal clocks of each one of the secondary audio signal sources 103, the primary audio input 105 and the AAU 102.
  • a non-limiting example of a delay-aware protocol that can be used is the Network Time Protocol (NTP).
  • a reference clock is used to synchronize the internal clocks.
  • the approach above may not take into consideration the audio propagation delay from passive sound producers to the AAU 102. However, this propagation delay is quite small compared with the network propagation delay: given that the speed of sound is about 340.9 m/s at sea level, the audio propagation delay over a 10 meter distance is roughly 30 ms. In some embodiments, the audio propagation delay can be included in the total delay considered when synchronizing the audio sources with the AAU 102.
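The delay arithmetic above can be worked through in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, the network delay of each source is assumed to be known (for example from a delay-aware protocol such as NTP), and the receipt times and delays used in the example are invented for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 340.9  # value quoted in the text for sea level
SAMPLE_RATE_HZ = 44_100


def acoustic_delay_s(distance_m: float) -> float:
    """Time sound needs to travel distance_m through air, in seconds."""
    return distance_m / SPEED_OF_SOUND_M_PER_S


def estimated_emission_time(received_at_s, network_delay_s, distance_m=0.0):
    """Subtract the aggregated delay from the time of receipt at the AAU."""
    return received_at_s - network_delay_s - acoustic_delay_s(distance_m)


def offset_in_samples(first_time_s, second_time_s):
    """Shift, in samples, that places the second segment on the first
    segment's timeline."""
    return round((second_time_s - first_time_s) * SAMPLE_RATE_HZ)


delay_10_m = acoustic_delay_s(10.0)  # a little under 30 ms

# Hypothetical receipt times and network delays for two segments.
t_voice = estimated_emission_time(received_at_s=5.120, network_delay_s=0.020)
t_tv = estimated_emission_time(received_at_s=5.700, network_delay_s=0.080,
                               distance_m=10.0)
offset = offset_in_samples(t_voice, t_tv)
```

Expressing the result as a sample offset makes it directly usable for aligning the two segments before any comparison or cancellation.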
  • a pattern matching mechanism is performed, at operation 219, to identify the various secondary audio signals within the first audio signal.
  • the pattern matching mechanism compares the content of the incoming secondary audio signals, e.g., 203d, 205, and 207, with the first audio signal 201 to determine the exact time at which the secondary audio signal occurs within the first audio signal.
  • the pattern matching mechanism can be performed through various methods without departing from the scope of the current invention.
  • the pattern matching mechanism is performed by creating a spectrogram of the first audio signal, which has the three dimensions of time, amplitude, and frequency, e.g., using a short-time Fourier transform, and analyzing occurrences of the secondary audio signal spectrograms in the spectrogram of the first audio signal to determine a match between the two spectrograms in the time dimension.
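One way to picture the spectrogram matching is the toy sketch below. It is not the matching method of the specification, which does not fix one: the naive transform, the dot-product score, the function names, and the synthetic "voice" and "TV" tones are all assumptions chosen to keep the example small and dependency-free.

```python
import cmath
import math


def stft_magnitudes(samples, frame_len, hop):
    """Naive short-time Fourier transform: one magnitude spectrum per frame."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def best_match_frame(mixed_spec, secondary_spec):
    """Frame index in mixed_spec where secondary_spec correlates best."""
    best_index, best_score = 0, float("-inf")
    for offset in range(len(mixed_spec) - len(secondary_spec) + 1):
        score = sum(a * b
                    for i, sec_frame in enumerate(secondary_spec)
                    for a, b in zip(mixed_spec[offset + i], sec_frame))
        if score > best_score:
            best_index, best_score = offset, score
    return best_index


FRAME, HOP = 16, 16
# Synthetic first audio signal: a quiet "voice" tone plus a "TV" tone burst.
voice = [0.5 * math.sin(2 * math.pi * 2 * n / FRAME) for n in range(128)]
tv = [math.sin(2 * math.pi * 5 * n / FRAME) for n in range(32)]
mixed = list(voice)
for n, x in enumerate(tv):
    mixed[48 + n] += x  # the TV sound starts at sample 48

frame_index = best_match_frame(stft_magnitudes(mixed, FRAME, HOP),
                               stft_magnitudes(tv, FRAME, HOP))
start_sample = frame_index * HOP
```

A practical system would more likely use an FFT library, overlapping windowed frames, and a normalized similarity score; the sketch only shows how a match in the time dimension can be read off two spectrograms.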
  • Figure 3B illustrates exemplary audio samples synchronized at the audio signal alteration unit, in accordance with some embodiments.
  • the AAU 102 obtains audio signals that are synchronized and has identified when each secondary audio signal, e.g., audio signal 203d, 205, or 207, occurs with reference to the first audio signal 201, as illustrated in Figure 3B.
  • Figure 3B illustrates the segments 302-308 for which the various delays caused by the system have been accounted for and the segments are aligned based on the actual time they have occurred.
  • the AAU 102 may now modify, based on the respective IDIs of the various audio signals received, the first audio signal to obtain a modified version of the first audio signal.
  • the modified version of the first audio signal enhances the primary audio signal.
  • modifying the first audio signal includes determining that the IDI associated with the first audio signal, i.e., signal received from the primary audio input 105, indicates that this audio signal is of higher importance than the secondary audio signals received from the secondary audio signal sources 103.
  • determining that the first IDI indicates that the first audio signal is of higher importance than the secondary audio signals includes determining that the first IDI is greater than any of the secondary IDIs.
  • Modifying the first audio signal includes performing at least one of cancelling the secondary audio signals from the first audio signal and increasing a volume of the primary audio signal.
  • the AAU 102 may perform any of the two operations, cancelling the secondary audio signals or increasing the volume of the primary audio signal, or a combination of the two operations. In all cases, this causes the primary audio signal to be enhanced.
  • the modified version of the first audio signal is sent to the output from which it can be heard by the second user. For example, when the primary audio signal is the sound of the voice of the first user, the mechanism discussed above causes this audio signal to be received at the output of the audio system 100 with better quality and clarity, without distracting background sounds/noises.
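The two modification operations can be sketched on raw sample lists as follows. This is an idealized illustration with hypothetical function names: it assumes the secondary signal is already aligned and appears in the first audio signal exactly as received from its source, whereas a microphone would pick it up altered by the room, so a practical canceller would likely need adaptive filtering rather than plain subtraction.

```python
def cancel_secondary(first, secondary, offset):
    """Subtract an aligned secondary signal from the first audio signal."""
    out = list(first)
    for i, s in enumerate(secondary):
        j = offset + i
        if 0 <= j < len(out):
            out[j] -= s
    return out


def apply_gain(samples, gain, limit=1.0):
    """Increase the volume, clamping to [-limit, limit] to avoid overflow."""
    return [max(-limit, min(limit, x * gain)) for x in samples]


# Toy data: the TV sound overlaps the voice from sample 1 onwards.
voice = [0.1, 0.2, -0.1, 0.0]
tv = [0.3, 0.3]
first = [0.1, 0.2 + 0.3, -0.1 + 0.3, 0.0]

cleaned = cancel_secondary(first, tv, offset=1)
louder = apply_gain(cleaned, gain=2.0)
```

Either function can be used alone or both can be chained, mirroring the choice between cancelling, increasing the volume, or combining the two.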
  • FIG. 4A illustrates a flow diagram of exemplary operations for enhancing an audio signal captured in an indoor environment, in accordance with some embodiments.
  • a local clock of the network device including the AAU 102 is synchronized with a reference clock. The synchronization causes the local clock of the network device to be in synchronization with a local clock of the secondary audio signal sources 103 and with a local clock of the primary audio input 105.
  • the AAU 102 receives from a primary audio signal input, audio input 105, a first audio signal that is associated with a first importance designation indicator that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment.
  • the first audio signal includes a primary audio signal that is to be enhanced.
  • the AAU 102 receives from a secondary audio signal source, e.g., audio source 103A, 103B, and/or 103C, a second input indicative of a second audio signal.
  • the second audio signal is associated with a second IDI indicating the importance of the second audio signal with respect to one or more other audio signals in the indoor environment.
  • the second input is an identifier of a second audio signal, and the operations move to operation 425 at which the AAU 102 retrieves a second audio signal from a database 104 of stored audio signals based on the identifier of the second audio signal.
  • the second input is a second audio signal and the operation 425 is skipped.
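The branch between the two kinds of second input can be sketched as below. The representation is an assumption made for illustration: identifiers are modelled as strings, audio signals as lists of samples, and the database 104 as an in-memory dictionary with a made-up entry.

```python
from typing import Dict, List, Union

AudioSignal = List[float]


def resolve_second_input(second_input: Union[str, AudioSignal],
                         database: Dict[str, AudioSignal]) -> AudioSignal:
    """Return the second audio signal for either kind of second input."""
    if isinstance(second_input, str):
        # The input is an identifier: look the signal up (operation 425).
        return database[second_input]
    # The input is already an audio signal: operation 425 is skipped.
    return second_input


# Hypothetical stored signal, keyed by a hypothetical identifier.
database = {"washing-machine-spin": [0.2, -0.2, 0.2, -0.2]}

from_id = resolve_second_input("washing-machine-spin", database)
from_signal = resolve_second_input([0.1, 0.0, -0.1], database)
```

In this sketch an unknown identifier raises a `KeyError`; how a real implementation should handle that case is not stated in the text.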
  • the AAU 102 determines that the first audio signal includes the second audio signal.
  • Figure 4B illustrates a flow diagram of exemplary operations for determining that a first audio signal includes a second audio signal, in accordance with some embodiments.
  • determining that the first audio signal includes the second audio signal includes synchronizing, at operation 460, a first segment of the first audio signal and a second segment of the second audio signal based on first and second timestamps respectively associated with the first segment and the second segment. The first and the second timestamps are indicative of the time the first segment and the second segment are received at the AAU 102.
  • determining that the first audio signal includes the second audio signal can also include identifying, at operation 465, based on a pattern matching mechanism whether the first audio signal includes the second audio signal.
  • the first audio signal is modified based on the first IDI and the second IDI to obtain a modified version of the first audio signal.
  • the modified version of the first audio signal enhances the primary audio signal.
  • Figure 4C illustrates a flow diagram of exemplary operations for modifying, based on importance designation indicators, one or more audio signals, in accordance with some embodiments.
  • modifying the first audio signal includes determining, operation 470, that the IDI associated with the first audio signal, i.e., signal received from the primary audio input 105, indicates that this audio signal is of higher importance than the secondary audio signals received from the secondary audio signal sources 103. For example, when the first IDI associated with the first audio signal and the secondary IDIs associated with the secondary audio signals are numerical values, determining, at operation 472, that the first IDI indicates that the first audio signal is of higher importance than the secondary audio signals includes determining that the first IDI is greater than any of the secondary IDIs.
  • Modifying the first audio signal includes performing at least one of cancelling, at operation 475, the secondary audio signals from the first audio signal and increasing, at operation 480, a volume of the primary audio signal.
  • the AAU 102 may perform any of the two operations: cancelling the secondary audio signals or increasing the volume of the primary audio signal; or a combination of the two operations. In all cases, this causes the primary audio signal to be enhanced.
  • the modified version of the first audio signal is caused to be output to a receiver. For example, when the primary audio signal is the sound of the voice of the first user, the mechanism discussed above causes this audio signal to be received at the output of the audio system 100 with better quality and clarity, without distracting background sounds/noises.
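The IDI comparison that gates the modification can be sketched as follows for the case where IDIs are numerical values. The function names and the operation labels are hypothetical, and returning no operation when the first IDI is not the greatest is an assumption of this sketch, since the passage above does not describe that case.

```python
def primary_is_most_important(first_idi, secondary_idis):
    """True when the first IDI is greater than every secondary IDI."""
    return all(first_idi > idi for idi in secondary_idis)


def choose_operations(first_idi, secondary_idis, cancel=True, boost=True):
    """List the modification operations to run on the first audio signal."""
    if not primary_is_most_important(first_idi, secondary_idis):
        return []  # assumption: leave the first audio signal unmodified
    operations = []
    if cancel:
        operations.append("cancel secondary signals (operation 475)")
    if boost:
        operations.append("increase primary volume (operation 480)")
    return operations


both = choose_operations(first_idi=10, secondary_idis=[3, 5, 1])
none = choose_operations(first_idi=2, secondary_idis=[3])
only_boost = choose_operations(10, [3], cancel=False)
```

Because the check iterates over a list, it works unchanged for any number of secondary audio signal sources.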
  • the various embodiments described herein present clear advantages with respect to prior techniques for enhancing an audio signal in an indoor environment.
  • the embodiments herein enable the use of metadata associated with the background sounds, i.e., the secondary audio signals, as well as the receipt of these sounds from the sources creating or recording them in order to modify an audio signal consequently enhancing the primary audio signal. Therefore the embodiments of the present invention enable a superior noise cancellation that improves the experience of the users communicating through the audio system 100.
  • An electronic device stores and transmits, internally and/or with other electronic devices over a network, code, which is composed of software instructions and which is sometimes referred to as computer program code or a computer program, and/or data using machine-readable media (also called computer-readable media), such as machine-readable storage media, e.g., magnetic disks, optical disks, solid state drives, read only memory (ROM), flash memory devices, phase change memory, and machine-readable transmission media (also called a carrier), e.g., electrical, optical, radio, acoustical or other form of propagated signals - such as carrier waves, infrared signals.
  • an electronic device e.g., a computer
  • processors e.g., wherein a processor is a microprocessor, controller, microcontroller, central processing unit, digital signal processor, application specific integrated circuit, field programmable gate array, other electronic circuitry, a combination of one or more of the preceding, coupled to one or more machine-readable storage media to store code for execution on the set of processors and/or to store data.
  • an electronic device may include non-volatile memory containing the code, since the non-volatile memory can persist code/data even when the electronic device is turned off, i.e., when power is removed. While the electronic device is turned on, the part of the code that is to be executed by the processor(s) of that electronic device is typically copied from the slower non-volatile memory into volatile memory, e.g., dynamic random access memory (DRAM) or static random access memory (SRAM), of that electronic device.
  • the audio system 100 can be implemented on one or more electronic devices as will be described with reference to the following figure.
  • Figure 5 illustrates a block diagram of an exemplary implementation of audio alteration unit 102 for enhancing an audio signal captured in an indoor environment, in accordance with some embodiments.
  • the physical, i.e., hardware, Audio Device 500 is an electronic device that can perform some or all of the operations and methods described above for one or more of the embodiments.
  • the physical audio device 500 can include one or more I/O interfaces, processor(s) (“processor circuitry”) 510, and a memory 505.
  • the processor(s) 510 may include one or more data processing circuits, such as a general purpose and/or special purpose processor, e.g., microprocessor and/or digital signal processor.
  • the processor(s) is configured to execute the audio signal alteration unit instance 512.
  • the audio signal alteration unit instance 512, when executed by the processor, is operative to perform the operations described with reference to Figures 1-4C.
  • while the various modules of Fig. 5 are shown as being included in the processor 510, one having ordinary skill in the art will appreciate that the various modules may be stored separately from the processor, for example, in a non-transitory computer readable storage medium.
  • the processor can execute the module stored in the memory, e.g., the audio signal alteration unit code 522, to perform some or all of the operations and methods described above. Accordingly, the processor can be configured by execution of various modules to carry out some or all of the functionality disclosed herein.
  • the audio device 500 further includes an audio signal database 504 stored in the memory 505.
  • the audio device 500 also includes a set of one or more physical Input/Output (I/O) interface(s) to establish connections and communication between the different components of the audio device 500 and with external electronic devices.
  • the set of I/O interfaces can include a microphone and an ADC for receiving input audio signals, speakers and a DAC for outputting audio signals to a user, a secondary audio input for receiving other audio signals, and a communication interface (e.g., BLE) for communicating with external electronic devices.
  • the audio signal alteration unit can be included within the same audio device as the audio input receiving the primary audio signal.
  • audio device 500 may include a microphone and the audio signal alteration unit.
  • the audio signal alteration unit can be included within the same audio device as the audio output receiving the modified audio signal with the enhanced primary audio signal.
  • audio device 500 may include a speaker and the audio signal alteration unit.
  • Each one of the secondary audio signal sources 103 is an electronic device

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A first audio signal is received from a primary audio signal input. The first audio signal is associated with a first importance designation indicator (IDI) that indicates an importance of the first audio signal with respect to one or more other audio signals in the indoor environment, the first audio signal including a primary audio signal that is to be enhanced. A second input indicative of a second audio signal is received from a secondary audio signal source. The second audio signal is associated with a second IDI indicating the importance of the second audio signal. A determination that the first audio signal includes the second audio signal is performed. A modification of the first audio signal is performed based on the first IDI and the second IDI to obtain a modified version of the first audio signal that enhances the primary audio signal.
PCT/EP2017/063266 2017-06-01 2017-06-01 Method and apparatus for enhancing an audio signal captured in an indoor environment WO2018219459A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2017/063266 WO2018219459A1 (fr) 2017-06-01 2017-06-01 Method and apparatus for enhancing an audio signal captured in an indoor environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2017/063266 WO2018219459A1 (fr) 2017-06-01 2017-06-01 Method and apparatus for enhancing an audio signal captured in an indoor environment

Publications (1)

Publication Number Publication Date
WO2018219459A1 true WO2018219459A1 (fr) 2018-12-06

Family

ID=59014614

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/063266 WO2018219459A1 (fr) 2017-06-01 2017-06-01 Method and apparatus for enhancing an audio signal captured in an indoor environment

Country Status (1)

Country Link
WO (1) WO2018219459A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110827845A (zh) * 2019-11-18 2020-02-21 西安闻泰电子科技有限公司 Recording method, apparatus, device and storage medium
RU2755807C1 (ru) * 2019-10-09 2021-09-21 Эковелл Электроник Ко., Лтд. Audio system with a natural sound-field-type sound effect
CN113873325A (zh) * 2021-10-29 2021-12-31 深圳市兆驰股份有限公司 Sound processing method, apparatus, device and computer-readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8606249B1 (en) * 2011-03-07 2013-12-10 Audience, Inc. Methods and systems for enhancing audio quality during teleconferencing
EP2779162A2 (fr) * 2013-03-12 2014-09-17 Comcast Cable Communications, LLC Audio noise removal
US20150195641A1 (en) * 2014-01-06 2015-07-09 Harman International Industries, Inc. System and method for user controllable auditory environment customization

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2755807C1 (ru) * 2019-10-09 2021-09-21 Эковелл Электроник Ко., Лтд. Audio system with a natural sound-field-type sound effect
CN110827845A (zh) * 2019-11-18 2020-02-21 西安闻泰电子科技有限公司 Recording method, apparatus, device and storage medium
CN110827845B (zh) * 2019-11-18 2022-04-22 西安闻泰电子科技有限公司 Recording method, apparatus, device and storage medium
CN113873325A (zh) * 2021-10-29 2021-12-31 深圳市兆驰股份有限公司 Sound processing method, apparatus, device and computer-readable storage medium
CN113873325B (zh) * 2021-10-29 2024-05-17 深圳市兆驰股份有限公司 Sound processing method, apparatus, device and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN105814909B (zh) System and method for feedback detection
US9275625B2 (en) Content based noise suppression
US10359991B2 (en) Apparatus, systems and methods for audio content diagnostics
US11816968B2 (en) Automatic presence simulator for security systems
US20150358768A1 (en) Intelligent device connection for wireless media in an ad hoc acoustic network
JP2018528479A (ja) Adaptive noise suppression for super wideband music
US8472633B2 (en) Detection of device configuration
US10032475B2 (en) Enhancing an audio recording
US20150358767A1 (en) Intelligent device connection for wireless media in an ad hoc acoustic network
WO2018219459A1 (fr) Method and apparatus for enhancing an audio signal captured in an indoor environment
US20170164110A1 (en) Method, device and system for controlling a sound image in an audio zone
JP2018516497A (ja) Calibration of acoustic echo cancellation for multi-channel sound in dynamic acoustic environments
US11089496B2 (en) Obtention of latency information in a wireless audio system
US10397287B2 (en) Audio data transmission using frequency hopping
US9516413B1 (en) Location based storage and upload of acoustic environment related information
US11785405B2 (en) Systems and methods for automatic synchronization of content between a player system and a listener system
JP2024520930A (ja) Audio encryption in a media playback system
GB2575873A (en) Processing audio signals
US11508379B2 (en) Asynchronous ad-hoc distributed microphone array processing in smart home applications using voice biometrics
US8675821B2 (en) Network audio testing system and network audio testing method thereof
JP2016184110A (ja) Multipoint conference device, multipoint conference control program, and multipoint conference control method
US10997984B2 (en) Sounding device, audio transmission system, and audio analysis method thereof
US11792594B2 (en) Simultaneous deconvolution of loudspeaker-room impulse responses with linearly-optimal techniques
US20240094984A1 (en) Synchronization of audio output for noise cancellation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17728157

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17728157

Country of ref document: EP

Kind code of ref document: A1