WO2020025140A1 - Sound processing apparatus and method for sound enhancement - Google Patents

Sound processing apparatus and method for sound enhancement

Info

Publication number
WO2020025140A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
training
sound signal
sound
noise signal
Prior art date
Application number
PCT/EP2018/071070
Other languages
French (fr)
Inventor
Peter GROSCHE
Gil Keren
Jing HAN
Bjoern Schuller
Wenyu Jin
Panji Setiawan
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2018/071070 priority Critical patent/WO2020025140A1/en
Priority to EP18752715.5A priority patent/EP3797415B1/en
Publication of WO2020025140A1 publication Critical patent/WO2020025140A1/en

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 - Processing in the frequency domain
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Definitions

  • the invention relates to the field of sound processing. More specifically, the invention relates to a sound processing apparatus and method for sound, in particular speech enhancement.
  • Sound or audio enhancement conventionally uses only a recording of the speech and environment, i.e. noise for producing the enhanced speech audio.
  • audio enhancement procedures often make use of neural networks, such as the speech enhancement procedure described in the article "A Fully Convolutional Neural Network For Speech Enhancement", Se Rim Park and Jinwon Lee, in Proc. Interspeech 2017, August 20-24, 2017, pages 1993-1997, Stockholm, Sweden.
  • embodiments of the invention are based on the idea to use for a plurality of training sound signals, including a training target signal and a training noise signal, the training noise signal as an additional input for training the neural network of a sound processing apparatus for improving the sound enhancement process.
  • the environment recording i.e. the training noise signal can be fed into a dedicated portion of the neural network that outputs an audio environment representation defined, for instance, by a parameter set.
  • the environment representation in turn, can be fed as an additional input to another portion of the neural network that produces the enhanced sound.
  • by explicitly learning to represent audio environments and using these representations for enhancement, embodiments of the invention make it possible to perform efficient speech enhancement in unpredictable audio environments.
  • the invention relates to a sound, in particular speech processing apparatus configured to process a current noisy sound signal comprising a target signal and a current noise signal into an enhanced, i.e. de-noised sound signal.
  • the apparatus, which could be implemented, for instance, as a loudspeaker, a mobile phone and the like, comprises processing circuitry, in particular one or more processors, configured to provide an adjustable neural network.
  • the adjustable neural network is configured to be trained, i.e. conditioned using as a first input a training noise signal, as a second input a noisy training sound signal comprising a training target signal and the training noise signal and as a third input the training target signal, preferably using a set of training sound signals comprising a plurality of training target signals and a plurality of training noise signals.
  • the adjustable neural network is configured to adjust itself on the basis of the current noise signal and to generate an estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal.
  • the processing circuitry is further configured to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal.
  • an improved sound processing apparatus and method allowing for an improved enhancement of the current noisy sound signal.
  • the neural network can better separate the target signal from the sounds originating in the environment and reverberations of both the target sound and the environment sounds.
  • the processing circuitry is configured to transform the training noise signal, the noisy training sound signal and the training target signal from the time domain into the frequency domain, wherein the adjustable neural network is configured, in the training phase, to be trained using the training noise signal, the noisy training sound signal and the training target signal in the frequency domain.
  • the processing circuitry is configured to process the respective log spectra of the training noise signal, the noisy training sound signal and the training target signal.
  • the processing circuitry is configured to transform the current noise signal and the sound signal from the time domain into the frequency domain, wherein, in the application phase, the adjustable neural network is configured to adjust itself on the basis of the current noise signal in the frequency domain and to generate the estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal in the frequency domain and wherein the processing circuitry is configured to process the sound signal into the enhanced sound signal in the frequency domain on the basis of the estimated noise signal in the frequency domain.
  • the processing circuitry is further configured to transform the enhanced sound signal from the frequency domain into the time domain.
  • the processing circuitry is further configured to extract phase information from the sound signal comprising the target signal and the current noise signal and to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal and the extracted phase information.
  • the sound signal is a multi-channel sound signal, wherein the processing circuitry is configured to select a channel of the multi-channel sound signal and to extract the phase information from the selected channel of the multi-channel sound signal.
  • the neural network may be trained to localize sound sources that belong to the environment, and remove sounds originating from these locations from the current noisy sound signal.
  • in the training phase, the neural network is further configured to generate an estimated training noise signal on the basis of the training sound signal comprising the training target signal and the training noise signal, to process the training sound signal into an enhanced training sound signal on the basis of the estimated training noise signal and to be trained by minimizing a difference measure between the training target signal and the enhanced training sound signal.
  • a gradient-based optimization algorithm can be used for training the neural network.
  • the processing circuitry is configured to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal by subtracting the estimated noise signal from the sound signal.
  • the sound signal is a multi-channel sound signal, wherein the processing circuitry is configured to select a channel of the multi-channel sound signal and to process the multi-channel sound signal into the enhanced sound signal on the basis of the estimated noise signal by subtracting the estimated noise signal from the selected channel of the multi-channel sound signal.
  • the neural network comprises a first neural sub-network and a second neural sub-network, wherein, in the training phase, the first neural sub-network is configured to generate on the basis of the training noise signal a parameter set describing the training noise signal and to provide the parameter set to the second neural sub-network, wherein the second neural sub-network is configured to adjust on the basis of the parameter set provided by the first neural sub-network.
  • in the application phase, the first neural sub-network is configured to generate on the basis of the current noise signal a parameter set describing the current noise signal, i.e. the current environment, and to provide the parameter set to the second neural sub-network, wherein the second neural sub-network is configured to adjust on the basis of the parameter set provided by the first neural sub-network.
  • the neural network can learn how to represent a sound environment not encountered yet, and later use this representation for enhancement.
  • the first neural sub-network and/or the second neural sub-network comprises one or more convolutional layers.
  • the invention relates to a corresponding sound processing method for processing a current noisy sound signal comprising a target signal and a current noise signal into an enhanced, i.e. de-noised sound signal.
  • the method comprises the steps of providing an adjustable neural network; in a training phase, training, i.e. conditioning the adjustable neural network using as a first input a training noise signal, as a second input a noisy training sound signal comprising a training target signal and the training noise signal and as a third input the training target signal; and, in an application phase, adjusting the neural network on the basis of the current noise signal, generating an estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal, and processing the sound signal into the enhanced sound signal on the basis of the estimated noise signal.
  • the sound processing method according to the second aspect of the invention can be performed by the sound processing apparatus according to the first aspect of the invention. Further features of the sound processing method according to the second aspect of the invention result directly from the functionality of the sound processing apparatus according to the first aspect of the invention and its different implementation forms described above and below.
  • the invention relates to a computer program comprising program code for performing the sound processing method according to the second aspect, when executed on a processor or a computer.
  • the invention can be implemented in hardware and/or software.
  • Fig. 1a shows a schematic diagram illustrating an example of processing blocks implemented in a single channel sound processing apparatus according to an embodiment in a training phase
  • Fig. 1b shows a schematic diagram illustrating an example of processing blocks implemented in a single channel sound processing apparatus according to an embodiment in an application phase
  • Fig. 2a shows a schematic diagram illustrating an example of processing blocks implemented in a multi-channel sound processing apparatus according to an embodiment in a training phase
  • Fig. 2b shows a schematic diagram illustrating an example of processing blocks implemented in a multi-channel sound processing apparatus according to an embodiment in an application phase
  • Fig. 3 shows a flow diagram illustrating an example of a sound processing method according to an embodiment.
  • a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa.
  • a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures.
  • the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.
  • Figure 1a shows a schematic diagram illustrating an example of processing blocks implemented in a single channel sound processing apparatus 100 according to an embodiment in a training phase
  • figure 1b shows a schematic diagram illustrating an example of processing blocks implemented in the single channel sound processing apparatus 100 in an application phase.
  • the sound processing apparatus 100 is configured to process a current noisy sound, in particular speech signal comprising a target signal and a current noise signal into an enhanced, i.e. de-noised sound, in particular speech signal.
  • the apparatus 100 which could be implemented, for instance, as a loudspeaker, a mobile phone and the like, comprises processing circuitry, in particular one or more processors, configured to provide, i.e. implement an adjustable neural network.
  • the adjustable neural network comprises a first neural sub-network 103 and a second neural sub-network 107.
  • the first neural sub-network 103 and/or the second neural sub-network 107 (referred to as "Environment Residual Blocks" 103, 107 in the figures) can comprise one or more residual blocks.
  • the first neural sub-network 103 and the second neural sub-network 107 can constitute independent, i.e. separate neural networks.
  • the neural network, the first neural sub-network 103 and/or the second neural sub-network 107 can comprise one or more convolutional layers. More details about possible implementations of the neural network, the first neural sub-network 103 and/or the second neural sub-network 107 can be found, for instance, in the article "A Fully Convolutional Neural Network For Speech Enhancement", Se Rim Park and Jinwon Lee, in Proc. Interspeech 2017, August 20-24, 2017, pages 1993-1997, Stockholm, Sweden, which is fully incorporated by reference herein.
  • the adjustable neural network 103, 107 of the sound processing apparatus 100 is configured to be trained, i.e. conditioned using as a first input a training noise signal (referred to in figure 1a as "Environment Waveform"), as a second input a noisy training sound signal (referred to in figure 1a as "Environment + speech Waveform") comprising a training target signal and the training noise signal and as a third input the training target signal (referred to in figure 1a as "clean Waveform").
  • the training phase involves processing a set of training sound signals comprising a plurality of known training target signals and a plurality of known training noise signals.
  • the adjustable neural network 103, 107 of the sound processing apparatus 100 is configured to adjust itself on the basis of the current noise signal and to generate an estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal.
  • the processing circuitry of the sound processing apparatus is further configured to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal.
  • the processing circuitry of the sound processing apparatus 100 is configured to transform the training noise signal, the noisy training sound signal, the training target signal, the current noise signal and the current sound signal from the time domain into the frequency domain by generating a respective log spectrum thereof.
  • the blocks 101, 105 and 113 can be configured to perform a short-time Fourier transform (STFT) using, for instance, 25 ms frames shifted by 10 ms to extract the spectrum of each signal.
  • the spectrum of the training noise signal (which is provided by block 101 of figure 1 ) is then processed by the first neural sub-network 103.
  • the first neural sub-network 103 comprises a sequence of residual blocks.
  • a respective residual block comprises two parallel paths.
  • the first path can contain two convolutional layers applied one after another, where batch normalization and a rectified-linear non-linearity are applied in between the layers.
  • the second path can contain the identity function.
  • the respective outputs of the two paths can be summed, and a rectified-linear non-linearity can be applied.
  • the output provided by the first neural sub-network 103 is a representation of the environment associated with a respective training noise signal.
  • the first neural sub-network 103 is configured to generate on the basis of the training noise signal provided by block 101 a parameter set, i.e. an environment embedding vector describing the training noise signal, and to provide the parameter set to the second neural sub-network 107, wherein the second neural sub-network 107 is configured to adjust itself on the basis of the parameter set provided by the first neural sub-network 103.
  • the first neural sub-network 103 is configured to generate on the basis of the current noise signal the environment embedding vector describing the current noise signal and to provide the environment embedding vector to the second neural sub-network 107, wherein the second neural sub-network 107 is configured to adjust itself on the basis of the parameter set provided by the first neural sub-network 103.
  • the output of the first neural sub-network 103, i.e. the environment embedding vector describing in the training phase the training noise signal or in the application phase the current noise signal, is used by the second neural sub-network 107 to adjust itself.
  • the parameter set defined by the environment embedding vector is used as an additional input by the second neural sub-network 107 such that the output of the second neural sub-network 107 depends on the environment embedding vector, and is “adjusting” to the noise in that sense.
  • the second neural sub-network 107 comprises a set of residual blocks, each comprised of two convolutional layers.
  • the environment embedding vector is projected (via a linear transformation) to a vector with a dimension equal to the number of feature maps in the convolutional layer. Then, the output of this projection is added to every spatial location in the output map of the convolutional layer.
  • the adjusted second neural sub-network 107 is configured to generate an estimated training noise signal (referred to as "Enhancement Mask" in figure 1a) on the basis of the training sound signal provided by block 105.
  • the adjusted second neural sub-network 107 is configured to generate an estimated noise signal (referred to as "Enhancement Mask" in figure 1b) on the basis of the sound signal provided by block 105.
  • an enhanced training sound signal (referred to as "Enhanced Speech Spectrum" in figure 1a) is generated on the basis of the estimated training noise signal provided by the second neural sub-network 107 and the training sound signal provided by block 105. In an embodiment, this can be done by subtracting the estimated training noise signal from the training sound signal or, alternatively, by adding the negative of the estimated training noise signal to the training sound signal.
  • an enhanced sound signal (referred to as "Enhanced Speech Spectrum" in figure 1b) is generated on the basis of the estimated noise signal provided by the second neural sub-network 107 and the sound signal provided by block 105. In an embodiment, this can be done by subtracting the estimated noise signal from the sound signal or, alternatively, by adding the negative of the estimated noise signal to the sound signal.
  • the output of block 109, i.e. the enhanced training sound signal, is used for training the second neural sub-network 107 by minimizing a difference measure, such as the absolute difference(s), the squared difference(s) and the like, between the training target signal provided by block 113 and the enhanced training sound signal provided by block 109.
  • a gradient-based optimization algorithm can be used for training, i.e. optimizing the model parameters of the second neural sub-network 107.
  • the processing circuitry of the sound processing apparatus 100 can be further configured to extract phase information from the sound signal comprising the target signal and the current noise signal and to transform the spectrum of the enhanced sound signal back into the time domain on the basis of the extracted phase information.
  • the final output of the sound processing apparatus 100 is the enhanced, i.e. de-noised sound signal in the time domain (referred to as "Enhanced Waveform").
  • Figures 2a and 2b show a further embodiment of the sound processing apparatus 100 shown in figures 1a and 1b.
  • the sound processing apparatus 100 is configured to process multi-channel sound signals.
  • in the following, mainly the differences between the embodiment of the sound processing apparatus 100 shown in figures 2a and 2b and the embodiment of the sound processing apparatus 100 shown in figures 1a and 1b will be described.
  • the processing circuitry of the sound processing apparatus 100 can be configured to select a channel of the multi-channel sound signal and to process the multi-channel sound signal into the enhanced sound signal on the basis of the estimated noise signal by subtracting the estimated noise signal from (or adding its negative to) the selected channel of the multi-channel sound signal.
  • the selected channel could be, for instance, the channel closest to the speaker.
  • the enhanced spectrum is considered the output of the beamforming procedure in the multichannel setting.
  • processing circuitry in block 114 of figure 2b can be configured to select a channel of the multi-channel sound signal and to extract the phase information from the selected channel of the multi-channel sound signal.
  • the multiple channels of the noise signal can be used for localizing the sound sources by processing these channels with a time-frequency transformation that is more localized in time, such as a STFT over frames of 10 ms, shifted by 5 ms, or a wavelet transform.
  • FIG. 3 shows a flow diagram illustrating an example of a corresponding sound processing method 300 according to an embodiment.
  • the method 300 comprises the steps of: providing 301 the adjustable neural network 103, 107; in a training phase 303, training, i.e. conditioning the adjustable neural network 103, 107 using as a first input a training noise signal, as a second input a noisy training sound signal comprising a training target signal and the training noise signal and as a third input the training target signal; and, in an application phase 305, adjusting the neural network 107 on the basis of the current noise signal, generating an estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal, and processing the sound signal into the enhanced sound signal on the basis of the estimated noise signal.
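The embedding-based conditioning described above (a linear projection of the environment embedding vector to a vector with one entry per feature map, added at every spatial location of the convolutional output) can be sketched in NumPy. The dimensions and variable names below are purely illustrative assumptions, not the patent's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def condition_feature_map(feature_map, embedding, proj):
    """Project the environment embedding (linear transformation) to a
    vector with one entry per feature map, then add that entry at every
    spatial location of the corresponding map (via broadcasting)."""
    bias = embedding @ proj                 # shape: (n_maps,)
    return feature_map + bias[:, None, None]

# Toy dimensions (hypothetical): 8 feature maps over a 16x10
# time-frequency patch, and a 4-dimensional environment embedding.
fmap = rng.standard_normal((8, 16, 10))
emb = rng.standard_normal(4)
proj = rng.standard_normal((4, 8))          # the linear projection

out = condition_feature_map(fmap, emb, proj)
assert out.shape == fmap.shape
# Each feature map is shifted by one scalar everywhere:
shift = out - fmap
assert np.allclose(shift, shift[:, :1, :1])
```

Because the projected value is constant over time and frequency, the conditioning biases every spatial position of a feature map identically, which is what lets the noise representation steer the enhancement output.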

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The invention relates to a sound processing apparatus (100) configured to process a current noisy sound signal comprising a target signal and a current noise signal into an enhanced sound signal. The apparatus (100) comprises processing circuitry configured to provide an adjustable neural network (103, 107), wherein the adjustable neural network (103, 107) is configured: in a training phase, to be trained using as a first input a training noise signal, as a second input a training sound signal comprising a training target signal and the training noise signal and as a third input the training target signal; and, in an application phase, to adjust on the basis of the current noise signal and to generate an estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal. In the application phase, the processing circuitry is further configured to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal. Moreover, the invention relates to a corresponding sound processing method.

Description

DESCRIPTION
SOUND PROCESSING APPARATUS AND METHOD FOR SOUND ENHANCEMENT
TECHNICAL FIELD
The invention relates to the field of sound processing. More specifically, the invention relates to a sound processing apparatus and method for sound, in particular speech enhancement.
BACKGROUND
Sound or audio enhancement conventionally uses only a recording of the speech and environment, i.e. noise, for producing the enhanced speech audio. Often audio enhancement procedures make use of neural networks, such as the speech enhancement procedure described in the article "A Fully Convolutional Neural Network For Speech Enhancement", Se Rim Park and Jinwon Lee, in Proc. Interspeech 2017, August 20-24, 2017, pages 1993-1997, Stockholm, Sweden.
However, given only one recording that contains both the speech and the noise created by the environment, it can be difficult, in particular for a neural network to ascertain which components of an audio signal originate from the environment, which components are the clean speech or sound, i.e. the target signal and which components are just reverberation effects of both the speech and the environment. Additionally, in multichannel settings, audio localization can be performed, but sound enhancement may have difficulties predicting whether a given sound source is to be attributed to the speech or the environment.
Thus, there is still a need for an improved sound processing apparatus and method allowing for an improved enhancement of a noisy sound signal.
SUMMARY
It is an object of the invention to provide an improved sound processing apparatus and method allowing for an improved enhancement of a noisy sound signal. The foregoing and other objects are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
Generally, embodiments of the invention are based on the idea to use for a plurality of training sound signals, including a training target signal and a training noise signal, the training noise signal as an additional input for training the neural network of a sound processing apparatus for improving the sound enhancement process. In an embodiment, the environment recording, i.e. the training noise signal can be fed into a dedicated portion of the neural network that outputs an audio environment representation defined, for instance, by a parameter set. The environment representation, in turn, can be fed as an additional input to another portion of the neural network that produces the enhanced sound. By explicitly learning to represent audio environments and using these representations for enhancement, embodiments of the invention make it possible to perform efficient speech enhancement in unpredictable audio environments.
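The two-portion idea can be sketched with placeholder networks. The random linear maps, dimensions and function names below are illustrative stand-ins under assumed toy dimensions, not the actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder "networks": random linear maps standing in for the two
# portions of the neural network (purely illustrative).
W_env = rng.standard_normal((64, 8))        # environment portion -> parameter set
W_enh = rng.standard_normal((64 + 8, 64))   # enhancement portion

def environment_portion(noise_spec):
    """Map a noise-only spectrum frame to an environment representation."""
    return noise_spec @ W_env               # (8,) parameter set

def enhancement_portion(noisy_spec, env_params):
    """Estimate the noise in a noisy frame, conditioned on the
    environment representation fed in as an additional input."""
    return np.concatenate([noisy_spec, env_params]) @ W_enh

noise_frame = rng.standard_normal(64)       # separate environment recording
noisy_frame = rng.standard_normal(64)       # speech + noise

env_params = environment_portion(noise_frame)
noise_estimate = enhancement_portion(noisy_frame, env_params)
enhanced = noisy_frame - noise_estimate     # enhancement by subtraction
assert enhanced.shape == (64,)
```

The key structural point is that the enhancement portion receives the environment representation as an extra input, so its output depends on the separate noise recording.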
More specifically, according to a first aspect the invention relates to a sound, in particular speech processing apparatus configured to process a current noisy sound signal comprising a target signal and a current noise signal into an enhanced, i.e. de-noised sound signal. The apparatus, which could be implemented, for instance, as a loudspeaker, a mobile phone and the like, comprises processing circuitry, in particular one or more processors, configured to provide an adjustable neural network. In a training phase, the adjustable neural network is configured to be trained, i.e. conditioned using as a first input a training noise signal, as a second input a noisy training sound signal comprising a training target signal and the training noise signal and as a third input the training target signal, preferably using a set of training sound signals comprising a plurality of training target signals and a plurality of training noise signals. In an application phase, the adjustable neural network is configured to adjust itself on the basis of the current noise signal and to generate an estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal. The processing circuitry is further configured to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal.
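A training example under this three-input scheme can be assembled from a separate environment recording and a clean target. The signals, amplitudes and sampling rate below are synthetic assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 16000                                     # assumed sampling rate

t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 220 * t)           # stand-in clean target signal
noise = 0.3 * rng.standard_normal(fs)          # environment-only recording
noisy = target + noise                         # noisy training sound signal

# The three inputs used in the training phase:
training_example = {
    "first_input_noise": noise,                # training noise signal
    "second_input_noisy": noisy,               # target + noise mixture
    "third_input_target": target,              # training target signal
}
assert np.allclose(training_example["second_input_noisy"],
                   training_example["third_input_target"]
                   + training_example["first_input_noise"])
```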
Thus, an improved sound processing apparatus and method are provided, allowing for an improved enhancement of the current noisy sound signal. By additionally conditioning the neural network on the basis of a separate recording of the environment, i.e. the training noise signal, the neural network can better separate the target signal from the sounds originating in the environment and reverberations of both the target sound and the environment sounds.
In a further possible implementation form of the first aspect, the processing circuitry is configured to transform the training noise signal, the noisy training sound signal and the training target signal from the time domain into the frequency domain, wherein the adjustable neural network is configured, in the training phase, to be trained using the training noise signal, the noisy training sound signal and the training target signal in the frequency domain. In an embodiment, the processing circuitry is configured to process the respective log spectra of the training noise signal, the noisy training sound signal and the training target signal.
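A log-spectrum front end of this kind might look as follows in NumPy. The 16 kHz sampling rate is an assumption; the 25 ms frames shifted by 10 ms follow the framing mentioned for the STFT blocks of the embodiment:

```python
import numpy as np

def log_spectrum(x, fs=16000, frame_ms=25, shift_ms=10, eps=1e-8):
    """Log-magnitude STFT with frame_ms frames shifted by shift_ms.
    eps avoids taking log of zero magnitudes."""
    frame = int(fs * frame_ms / 1000)          # 400 samples at 16 kHz
    shift = int(fs * shift_ms / 1000)          # 160 samples at 16 kHz
    n_frames = 1 + (len(x) - frame) // shift
    window = np.hanning(frame)
    frames = np.stack([x[i * shift : i * shift + frame] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)     # one-sided spectrum
    return np.log(np.abs(spectrum) + eps)

x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s test tone
S = log_spectrum(x)
assert S.shape == (98, 201)   # 98 frames, 201 frequency bins
```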
In a further possible implementation form of the first aspect, the processing circuitry is configured to transform the current noise signal and the sound signal from the time domain into the frequency domain, wherein, in the application phase, the adjustable neural network is configured to adjust itself on the basis of the current noise signal in the frequency domain and to generate the estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal in the frequency domain and wherein the processing circuitry is configured to process the sound signal into the enhanced sound signal in the frequency domain on the basis of the estimated noise signal in the frequency domain.
In a further possible implementation form of the first aspect, the processing circuitry is further configured to transform the enhanced sound signal from the frequency domain into the time domain.
In a further possible implementation form of the first aspect, in the application phase, the processing circuitry is further configured to extract phase information from the sound signal comprising the target signal and the current noise signal and to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal and the extracted phase information.
In a further possible implementation form of the first aspect, the sound signal is a multi-channel sound signal, wherein, in the application phase, the processing circuitry is configured to select a channel of the multi-channel sound signal and to extract the phase information from the selected channel of the multi-channel sound signal. In multi-channel embodiments of the sound processing apparatus, the neural network may be trained to localize sound sources that belong to the environment, and remove sounds originating from these locations from the current noisy sound signal.
In a further possible implementation form of the first aspect, in the training phase, the neural network is further configured to generate an estimated training noise signal on the basis of the training sound signal comprising the training target signal and the training noise signal, to process the training sound signal into an enhanced training sound signal on the basis of the estimated training noise signal and to be trained by minimizing a difference measure between the training target signal and the enhanced training sound signal. In an implementation form, a gradient-based optimization algorithm can be used for training the neural network.
In a further possible implementation form of the first aspect, in the application phase, the processing circuitry is configured to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal by subtracting the estimated noise signal from the sound signal.
In a further possible implementation form of the first aspect, the sound signal is a multi-channel sound signal, wherein, in the application phase, the processing circuitry is configured to select a channel of the multi-channel sound signal and to process the multi-channel sound signal into the enhanced sound signal on the basis of the estimated noise signal by subtracting the estimated noise signal from the selected channel of the multi-channel sound signal.
In a further possible implementation form of the first aspect, the neural network comprises a first neural sub-network and a second neural sub-network, wherein, in the training phase, the first neural sub-network is configured to generate on the basis of the training noise signal a parameter set describing the training noise signal and to provide the parameter set to the second neural sub-network, wherein the second neural sub-network is configured to adjust on the basis of the parameter set provided by the first neural sub-network.
In a further possible implementation form of the first aspect, in the application phase, the first neural sub-network is configured to generate on the basis of the current noise signal a parameter set describing the current noise signal, i.e. the current environment, and to provide the parameter set to the second neural sub-network, wherein the second neural sub-network is configured to adjust on the basis of the parameter set provided by the first neural sub-network.
By explicitly producing an environment representation using, for instance, a parameter set or vector, as implemented in embodiments of the invention, the neural network can learn how to represent a sound environment not encountered yet, and later use this representation for an improved sound enhancement.
In a further possible implementation form of the first aspect, the first neural sub-network and/or the second neural sub-network comprises one or more convolutional layers.
According to a second aspect the invention relates to a corresponding sound processing method for processing a current noisy sound signal comprising a target signal and a current noise signal into an enhanced, i.e. de-noised sound signal. The method comprises the steps of providing an adjustable neural network; in a training phase, training, i.e. conditioning the adjustable neural network using as a first input a training noise signal, as a second input a noisy training sound signal comprising a training target signal and the training noise signal and as a third input the training target signal; and, in an application phase, adjusting the neural network on the basis of the current noise signal, generating an estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal, and processing the sound signal into the enhanced sound signal on the basis of the estimated noise signal.
The sound processing method according to the second aspect of the invention can be performed by the sound processing apparatus according to the first aspect of the invention. Further features of the sound processing method according to the second aspect of the invention result directly from the functionality of the sound processing apparatus according to the first aspect of the invention and its different implementation forms described above and below.
According to a third aspect the invention relates to a computer program comprising program code for performing the sound processing method according to the second aspect, when executed on a processor or a computer. The invention can be implemented in hardware and/or software.
BRIEF DESCRIPTION OF THE DRAWINGS
Further embodiments of the invention will be described with respect to the following figures, wherein:
Fig. 1a shows a schematic diagram illustrating an example of processing blocks implemented in a single-channel sound processing apparatus according to an embodiment in a training phase;
Fig. 1b shows a schematic diagram illustrating an example of processing blocks implemented in a single-channel sound processing apparatus according to an embodiment in an application phase;
Fig. 2a shows a schematic diagram illustrating an example of processing blocks implemented in a multi-channel sound processing apparatus according to an embodiment in a training phase;
Fig. 2b shows a schematic diagram illustrating an example of processing blocks implemented in a multi-channel sound processing apparatus according to an embodiment in an application phase; and
Fig. 3 shows a flow diagram illustrating an example of a sound processing method according to an embodiment.
In the various figures, identical reference signs will be used for identical or at least functionally equivalent features.
DETAILED DESCRIPTION OF EMBODIMENTS
In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and in which are shown, by way of illustration, specific aspects in which the invention may be placed. It is understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the invention. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the invention is defined by the appended claims.
For instance, it is understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.
Figure 1a shows a schematic diagram illustrating an example of processing blocks implemented in a single-channel sound processing apparatus 100 according to an embodiment in a training phase, while figure 1b shows a schematic diagram illustrating an example of processing blocks implemented in the single-channel sound processing apparatus 100 in an application phase.
As will be described in more detail further below, the sound processing apparatus 100 is configured to process a current noisy sound signal, in particular a speech signal, comprising a target signal and a current noise signal into an enhanced, i.e. de-noised, sound signal, in particular a speech signal.
The apparatus 100, which could be implemented, for instance, as a loudspeaker, a mobile phone and the like, comprises processing circuitry, in particular one or more processors, configured to provide, i.e. implement, an adjustable neural network. In the embodiment shown in figures 1a and 1b, the adjustable neural network comprises a first neural sub-network 103 and a second neural sub-network 107. In an embodiment, the first neural sub-network 103 and/or the second neural sub-network 107 (referred to as "Environment Residual Blocks" 103, 107 in the figures) can comprise one or more residual blocks. In further embodiments, the first neural sub-network 103 and the second neural sub-network 107 can constitute independent, i.e. separate, neural networks. In an embodiment, the neural network, the first neural sub-network 103 and/or the second neural sub-network 107 can comprise one or more convolutional layers. More details about possible implementations of the neural network, the first neural sub-network 103 and/or the second neural sub-network 107 can be found, for instance, in the article "A Fully Convolutional Neural Network For Speech Enhancement", Se Rim Park and Jinwon Lee, in Proc. Interspeech 2017, August 20-24, 2017, pages 1993-1997, Stockholm, Sweden, which is fully incorporated by reference herein.
In a training phase, the adjustable neural network 103, 107 of the sound processing apparatus 100 is configured to be trained, i.e. conditioned, using as a first input a training noise signal (referred to in figure 1a as "Environment Waveform"), as a second input a noisy training sound signal (referred to in figure 1a as "Environment + speech Waveform") comprising a training target signal and the training noise signal and as a third input the training target signal (referred to in figure 1a as "clean Waveform"). Usually, the training phase involves processing a set of training sound signals comprising a plurality of known training target signals and a plurality of known training noise signals.
In an application phase, the adjustable neural network 103, 107 of the sound processing apparatus 100 is configured to adjust itself on the basis of the current noise signal and to generate an estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal. The processing circuitry of the sound processing apparatus is further configured to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal.
As illustrated by blocks 101, 105 and 113 in figures 1a and 1b, in an embodiment, the processing circuitry of the sound processing apparatus 100 is configured to transform the training noise signal, the noisy training sound signal, the training target signal, the current noise signal and the current sound signal from the time domain into the frequency domain by generating a respective log spectrum thereof. To this end, the blocks 101, 105 and 113 can be configured to perform a short-time Fourier transform (STFT) using, for instance, 25 ms frames shifted by 10 ms to extract the spectrum of each signal.
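The framing described above (25 ms frames shifted by 10 ms, with a log spectrum extracted per frame) can be sketched in NumPy as follows. The sample rate, the Hann window and the flooring constant `eps` are illustrative assumptions not fixed by the text.

```python
import numpy as np

def log_spectrum(x, sr=16000, frame_ms=25, shift_ms=10, eps=1e-8):
    """Log-magnitude STFT as in blocks 101/105/113: 25 ms frames, 10 ms shift.
    Sample rate, window choice and eps are illustrative assumptions."""
    n = int(sr * frame_ms / 1000)      # samples per frame (400 at 16 kHz)
    hop = int(sr * shift_ms / 1000)    # frame shift (160 at 16 kHz)
    win = np.hanning(n)
    frames = [x[i:i + n] * win for i in range(0, len(x) - n + 1, hop)]
    spec = np.fft.rfft(np.stack(frames), axis=1)       # (frames, n//2+1) bins
    return np.log(np.abs(spec) + eps), np.angle(spec)  # log magnitude, phase

# toy input: 1 s of a 440 Hz tone at 16 kHz
t = np.arange(16000) / 16000.0
mag, phase = log_spectrum(np.sin(2 * np.pi * 440 * t))
```

The phase returned here is the information that block 114 later extracts for waveform reconstruction.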
The spectrum of the training noise signal (which is provided by block 101 of figure 1a) is then processed by the first neural sub-network 103. In an embodiment, the first neural sub-network 103 comprises a sequence of residual blocks. In an embodiment, a respective residual block comprises two parallel paths. The first path can contain two convolutional layers applied one after another, where batch normalization and a rectified-linear non-linearity are applied in between the layers. The second path can contain the identity function. The respective outputs of the two paths can be summed, and a rectified-linear non-linearity can be applied. The output provided by the first neural sub-network 103 is a representation of the environment associated with a respective training noise signal (referred to as "Environment Embedding" in the figures). Thus, in an embodiment, in the training phase (illustrated in figure 1a), the first neural sub-network 103 is configured to generate on the basis of the training noise signal provided by block 101 a parameter set, i.e. an environment embedding vector describing the training noise signal, and to provide the parameter set to the second neural sub-network 107, wherein the second neural sub-network 107 is configured to adjust itself on the basis of the parameter set provided by the first neural sub-network 103. Likewise, in the application phase (illustrated in figure 1b), the first neural sub-network 103 is configured to generate on the basis of the current noise signal the environment embedding vector describing the current noise signal and to provide the environment embedding vector to the second neural sub-network 107, wherein the second neural sub-network 107 is configured to adjust itself on the basis of the parameter set provided by the first neural sub-network 103.
The output of the first neural sub-network 103, i.e. the environment embedding vector describing in the training phase the training noise signal or in the application phase the current noise signal, is used by the second neural sub-network 107 to adjust itself. In other words, the parameter set defined by the environment embedding vector is used as an additional input by the second neural sub-network 107 such that the output of the second neural sub-network 107 depends on the environment embedding vector, and is “adjusting” to the noise in that sense. There can be multiple ways for the second neural sub-network 107 to use this additional input, which also depend on the inner structure of the second neural sub-network 107. In one embodiment, the second neural sub-network 107 comprises a set of residual blocks, each comprised of two convolutional layers. For each convolutional layer, the environment embedding vector is projected (a linear transformation) to a vector with a dimension equal to the number of feature maps in the convolutional layer. Then, the output of this projection is added to every spatial location in the output map of the convolutional layer.
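The conditioning mechanism just described, projecting the environment embedding vector to the number of feature maps of a convolutional layer and adding the result at every spatial location of that layer's output, can be sketched as follows. The tensor dimensions and the projection matrix are illustrative assumptions.

```python
import numpy as np

def condition_on_embedding(feature_maps, embedding, proj):
    """Add a linear projection of the environment embedding to every spatial
    location of a convolutional layer's output map.
    feature_maps: (C, H, W); embedding: (D,); proj: (C, D)."""
    bias = proj @ embedding              # (C,): one offset per feature map
    return feature_maps + bias[:, None, None]

rng = np.random.default_rng(1)
fmap = rng.standard_normal((16, 8, 8))   # 16 feature maps of a conv layer
emb = rng.standard_normal(32)            # environment embedding vector
proj = rng.standard_normal((16, 32)) * 0.1
out = condition_on_embedding(fmap, emb, proj)
```

In this way the second sub-network's output depends on the embedding, which is what lets it "adjust" to the current noise environment.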
In the training phase, the adjusted second neural sub-network 107 is configured to generate an estimated training noise signal (referred to as "Enhancement Mask" in figure 1a) on the basis of the training sound signal provided by block 105. Likewise, in the application phase, the adjusted second neural sub-network 107 is configured to generate an estimated noise signal (referred to as "Enhancement Mask" in figure 1b) on the basis of the sound signal provided by block 105. In the training phase, in block 109 of the sound processing apparatus 100 shown in figure 1a, an enhanced training sound signal (referred to as "Enhanced Speech Spectrum" in figure 1a) is generated on the basis of the estimated training noise signal provided by the second neural sub-network 107 and the training sound signal provided by block 105. In an embodiment, this can be done by subtracting the estimated training noise signal from the training sound signal or, alternatively, by adding the negative of the estimated training noise signal to the training sound signal.
Likewise, in the application phase, in block 109 of the sound processing apparatus 100 shown in figure 1b, an enhanced sound signal (referred to as "Enhanced Speech Spectrum" in figure 1b) is generated on the basis of the estimated noise signal provided by the second neural sub-network 107 and the sound signal provided by block 105. In an embodiment, this can be done by subtracting the estimated noise signal from the sound signal or, alternatively, by adding the negative of the estimated noise signal to the sound signal.
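The combination step in block 109, subtracting the estimated noise spectrum from the noisy spectrum (equivalently, adding its negative), can be sketched as follows. The flooring at zero is an illustrative safeguard against negative magnitudes, not something mandated by the text.

```python
import numpy as np

def enhance(noisy_spec, estimated_noise_spec, floor=0.0):
    """Block 109: subtract the estimated noise spectrum from the noisy
    spectrum. The floor (an assumption) prevents negative magnitudes."""
    return np.maximum(noisy_spec - estimated_noise_spec, floor)

# toy 2-frame, 2-bin magnitude spectra
noisy = np.array([[1.0, 2.0], [0.5, 3.0]])
noise = np.array([[0.4, 0.5], [0.7, 1.0]])
enhanced = enhance(noisy, noise)
```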
In the training phase shown in figure 1a, the output of block 109, i.e. the enhanced training sound signal, is used for training the second neural sub-network 107 by minimizing a difference measure, such as the absolute difference(s), the squared difference(s) and the like, between the training target signal provided by block 113 and the enhanced training sound signal provided by block 109. In an embodiment, a gradient-based optimization algorithm can be used for training, i.e. optimizing, the model parameters of the second neural sub-network 107.
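The training objective above, minimizing a difference measure such as the squared difference between the training target and the enhanced output via gradient-based optimization, can be illustrated with plain gradient descent on a toy scalar "mask" model. The single-parameter model and the learning rate are illustrative stand-ins, not the patent's network.

```python
import numpy as np

def loss_and_grad(w, noisy, target):
    """Squared-difference measure between target and enhanced = noisy - w*noisy.
    The scalar w is a toy stand-in for the second sub-network's parameters."""
    err = (noisy - w * noisy) - target
    loss = np.mean(err ** 2)
    grad = np.mean(2.0 * err * (-noisy))   # d(loss)/d(w)
    return loss, grad

rng = np.random.default_rng(2)
target = rng.standard_normal(256)           # "clean" training target
noisy = target + 0.5 * rng.standard_normal(256)  # target plus noise

w, lr = 0.0, 0.1
losses = []
for _ in range(50):                         # plain gradient descent steps
    loss, grad = loss_and_grad(w, noisy, target)
    losses.append(loss)
    w -= lr * grad
```

In practice the gradient is computed by backpropagation through both sub-networks rather than by this closed-form expression.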
In the application phase shown in figure 1b, in block 112 (referred to as "Waveform Reconstruction" in figure 1b) the spectrum of the enhanced sound signal provided by block 109 is transformed back into the time domain. To this end, as illustrated by block 114 of figure 1b, the processing circuitry of the sound processing apparatus 100 can be further configured to extract phase information from the sound signal comprising the target signal and the current noise signal and to transform the spectrum of the enhanced sound signal back into the time domain on the basis of the extracted phase information. In the application phase, the final output of the sound processing apparatus 100 is the enhanced, i.e. de-noised, sound signal in the time domain (referred to as "Enhanced Waveform").

Figures 2a and 2b show a further embodiment of the sound processing apparatus 100 shown in figures 1a and 1b. In the embodiment shown in figures 2a and 2b, the sound processing apparatus 100 is configured to process multi-channel sound signals. In the following, only the main differences between the embodiment of the sound processing apparatus 100 shown in figures 2a and 2b and the embodiment of the sound processing apparatus 100 shown in figures 1a and 1b will be described.
As can be taken from figure 2b illustrating the application phase, the processing circuitry of the sound processing apparatus 100 can be configured to select a channel of the multi-channel sound signal and to process the multi-channel sound signal into the enhanced sound signal on the basis of the estimated noise signal by subtracting the estimated noise signal from (or adding its negative to) the selected channel of the multi-channel sound signal. The selected channel could be, for instance, the channel closest to the speaker. The enhanced spectrum is considered the output of the beamforming procedure in the multi-channel setting.
Moreover, the processing circuitry in block 114 of figure 2b can be configured to select a channel of the multi-channel sound signal and to extract the phase information from the selected channel of the multi-channel sound signal.
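The reconstruction performed by blocks 112 and 114, combining the enhanced magnitude spectrum with the phase extracted from the (selected channel of the) noisy input and transforming back to the time domain, might look as follows. The simple overlap-add scheme without window-sum normalization is a standard but assumed choice; the frame and hop lengths match the STFT sketch earlier.

```python
import numpy as np

def reconstruct(enhanced_mag, noisy_phase, frame_len=400, hop=160):
    """Block 112: combine enhanced magnitude with the noisy signal's phase
    (block 114), inverse-FFT each frame and overlap-add into a waveform."""
    spec = enhanced_mag * np.exp(1j * noisy_phase)     # complex spectrum
    frames = np.fft.irfft(spec, n=frame_len, axis=1)   # real time-domain frames
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + frame_len] += f          # overlap-add
    return out

# toy enhanced spectrum: 10 frames, 201 bins (frame_len // 2 + 1)
rng = np.random.default_rng(3)
mag = np.abs(rng.standard_normal((10, 201)))
phase = rng.uniform(-np.pi, np.pi, (10, 201))
wave = reconstruct(mag, phase)
```

A production implementation would additionally normalize by the summed analysis/synthesis windows to obtain perfect reconstruction.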
The multiple channels of the noise signal can be used for localizing the sound sources by processing these channels with a time-frequency transformation that is more localized in time, such as an STFT over frames of 10 ms, shifted by 5 ms, or a wavelet transform.
Figure 3 shows a flow diagram illustrating an example of a corresponding sound processing method 300 according to an embodiment. The method 300 comprises the steps of: providing 301 the adjustable neural network 103, 107; in a training phase 303, training, i.e. conditioning, the adjustable neural network 103, 107 using as a first input a training noise signal, as a second input a noisy training sound signal comprising a training target signal and the training noise signal and as a third input the training target signal; and, in an application phase 305, adjusting the neural network 107 on the basis of the current noise signal, generating an estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal, and processing the sound signal into the enhanced sound signal on the basis of the estimated noise signal.

While a particular feature or aspect of the disclosure may have been disclosed with respect to only one of several implementations or embodiments, such feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "include", "have", "with", or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprise". Also, the terms "exemplary", "for example" and "e.g." are merely meant as an example, rather than the best or optimal. The terms "coupled" and "connected", along with derivatives, may have been used. It should be understood that these terms may have been used to indicate that two elements cooperate or interact with each other regardless of whether they are in direct physical or electrical contact, or they are not in direct contact with each other.
Although specific aspects have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific aspects discussed herein.
Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.
Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the invention has been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the invention. It is therefore to be understood that within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.

Claims

1. A sound processing apparatus (100) configured to process a sound signal comprising a target signal and a current noise signal into an enhanced sound signal, wherein the apparatus (100) comprises: processing circuitry configured to provide an adjustable neural network (103, 107), wherein the adjustable neural network (103, 107) is configured: in a training phase, to be trained using as a first input a training noise signal, as a second input a training sound signal comprising a training target signal and the training noise signal and as a third input the training target signal, and in an application phase, to adjust itself on the basis of the current noise signal and to generate an estimated noise signal on the basis of the sound signal; wherein, in the application phase, the processing circuitry is further configured to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal.
2. The apparatus (100) of claim 1, wherein the processing circuitry is configured to transform the training noise signal, the training sound signal and the training target signal from a time domain into a frequency domain and wherein the adjustable neural network is configured, in the training phase, to be trained using the training noise signal, the training sound signal and the training target signal in the frequency domain.
3. The apparatus (100) of claim 1 or 2, wherein the processing circuitry is configured to transform the current noise signal and the sound signal from the time domain into the frequency domain, wherein, in the application phase, the adjustable neural network is configured to adjust itself on the basis of the current noise signal in the frequency domain and to generate the estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal in the frequency domain and wherein the processing circuitry is configured to process the sound signal into the enhanced sound signal in the frequency domain on the basis of the estimated noise signal in the frequency domain.
4. The apparatus (100) of claim 3, wherein the processing circuitry is further configured to transform the enhanced sound signal from the frequency domain into the time domain.
5. The apparatus (100) of any one of the preceding claims, wherein, in the application phase, the processing circuitry is further configured to extract phase information from the sound signal comprising the target signal and the current noise signal and to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal and the extracted phase information.
6. The apparatus (100) of claim 5, wherein the sound signal is a multi-channel sound signal and wherein, in the application phase, the processing circuitry is configured to select a channel of the multi-channel sound signal and to extract the phase information from the selected channel of the multi-channel sound signal.
7. The apparatus (100) of any one of the preceding claims, wherein, in the training phase, the neural network is further configured to generate an estimated training noise signal on the basis of the training sound signal comprising the training target signal and the training noise signal, to process the training sound signal into an enhanced training sound signal on the basis of the estimated training noise signal and to be trained by minimizing a difference measure between the training target signal and the enhanced training sound signal.
8. The apparatus (100) of any one of the preceding claims, wherein, in the application phase, the processing circuitry is configured to process the sound signal into the enhanced sound signal on the basis of the estimated noise signal by subtracting the estimated noise signal from the sound signal.
9. The apparatus (100) of any one of the preceding claims, wherein the sound signal is a multi-channel sound signal and wherein, in the application phase, the processing circuitry is configured to select a channel of the multi-channel sound signal and to process the multi-channel sound signal into the enhanced sound signal on the basis of the estimated noise signal by subtracting the estimated noise signal from the selected channel of the multi-channel sound signal.
10. The apparatus (100) of any one of the preceding claims, wherein the neural network (103, 107) comprises a first neural sub-network (103) and a second neural sub-network (107), wherein, in the training phase, the first neural sub-network (103) is configured to generate on the basis of the training noise signal a parameter set describing the training noise signal and to provide the parameter set to the second neural sub-network (107), wherein the second neural sub-network (107) is configured to adjust on the basis of the parameter set provided by the first neural sub-network (103).
11. The apparatus (100) of any one of the preceding claims, wherein the neural network (103, 107) comprises a first neural sub-network (103) and a second neural sub-network (107), wherein, in the application phase, the first neural sub-network (103) is configured to generate on the basis of the current noise signal a parameter set describing the current noise signal and to provide the parameter set to the second neural sub-network (107), wherein the second neural sub-network (107) is configured to adjust on the basis of the parameter set provided by the first neural sub-network (103).
12. The apparatus (100) of claim 10 or 11, wherein the first neural sub-network (103) and/or the second neural sub-network (107) comprises one or more convolutional layers.
13. A sound processing method (300) for processing a sound signal comprising a target signal and a current noise signal into an enhanced sound signal, wherein the method (300) comprises: providing (301 ) an adjustable neural network (103, 107); in a training phase (303), training the adjustable neural network (103, 107) using as a first input a training noise signal, as a second input a training sound signal comprising a training target signal and the training noise signal and as a third input the training target signal; and in an application phase (305), adjusting the neural network (103, 107) on the basis of the current noise signal, generating an estimated noise signal on the basis of the sound signal comprising the target signal and the current noise signal, and processing the sound signal into the enhanced sound signal on the basis of the estimated noise signal.
14. A computer program comprising program code for performing the method (300) of claim 13, when executed on a computer or a processor.
Non-Patent Citations

ANURAG KUMAR ET AL.: "Speech Enhancement in Multiple-Noise Conditions Using Deep Neural Networks", Proc. Interspeech 2016, 12 September 2016, pages 3738-3742.
CHOI, J. ET AL.: "An auditory-based adaptive speech enhancement system by neural network according to noise intensity", 42nd Midwest Symposium on Circuits and Systems, 8-11 August 1999, vol. 2, pages 993-996.
SE RIM PARK; JINWON LEE: "A Fully Convolutional Neural Network For Speech Enhancement", Proc. Interspeech 2017, 20 August 2017, pages 1993-1997.
YONG XU ET AL.: "Dynamic Noise Aware Training for Speech Enhancement Based on Deep Neural Networks", Interspeech 2014, 14 September 2014, pages 2670-2674.
Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768795A (en) * 2020-07-09 2020-10-13 腾讯科技(深圳)有限公司 Noise suppression method, device, equipment and storage medium for voice signal
CN111933171A (en) * 2020-09-21 2020-11-13 北京达佳互联信息技术有限公司 Noise reduction method and device, electronic equipment and storage medium
CN111933171B (en) * 2020-09-21 2021-01-22 北京达佳互联信息技术有限公司 Noise reduction method and device, electronic equipment and storage medium
CN112767908A (en) * 2020-12-29 2021-05-07 安克创新科技股份有限公司 Active noise reduction method based on key sound recognition, electronic equipment and storage medium
CN112767908B (en) * 2020-12-29 2024-05-21 安克创新科技股份有限公司 Active noise reduction method based on key voice recognition, electronic equipment and storage medium
CN113780107A (en) * 2021-08-24 2021-12-10 电信科学技术第五研究所有限公司 Radio signal detection method based on deep learning dual-input network model
CN113780107B (en) * 2021-08-24 2024-03-01 电信科学技术第五研究所有限公司 Radio signal detection method based on deep learning dual-input network model

Also Published As

Publication number Publication date
EP3797415B1 (en) 2024-06-19
EP3797415A1 (en) 2021-03-31

Similar Documents

Publication Publication Date Title
EP3797415B1 (en) Sound processing apparatus and method for sound enhancement
US9060052B2 (en) Single channel, binaural and multi-channel dereverberation
CN102907120B (en) For the system and method for acoustic processing
US9681246B2 (en) Bionic hearing headset
KR102059486B1 (en) Method and apparatus for decoding stereo loudspeaker signals from a higher-order ambisonics audio signal
JP6198800B2 (en) Apparatus and method for generating an output signal having at least two output channels
JP6377249B2 (en) Apparatus and method for enhancing an audio signal and sound enhancement system
EP3005362B1 (en) Apparatus and method for improving a perception of a sound signal
JP6280983B2 (en) Apparatus and method for center signal scaling and stereophonic enhancement based on signal-to-downmix ratio
JP5906312B2 (en) Method and apparatus for decomposing stereo recordings using frequency domain processing using a spectral weight generator
CN103180752B (en) For resolving equipment and the method for the fuzziness arriving direction estimation
Marquardt et al. Interaural coherence preservation for binaural noise reduction using partial noise estimation and spectral postfiltering
JP6434157B2 (en) Audio signal processing apparatus and method
US20220337952A1 (en) Content based spatial remixing
JP2007047427A (en) Sound processor
KR20170092669A (en) An audio signal processing apparatus and method for modifying a stereo image of a stereo signal
Marelli et al. Efficient approximation of head-related transfer functions in subbands for accurate sound localization
Kabzinski et al. An adaptive crosstalk cancellation system using microphones at the ears
Pirhosseinloo et al. An Interaural Magnification Algorithm for Enhancement of Naturally-Occurring Level Differences.
Cobos et al. Resynthesis of sound scenes on wave-field synthesis from stereo mixtures using sound source separation algorithms
KR102547423B1 (en) Audio signal processor, system and methods for distributing an ambient signal to a plurality of ambient signal channels
JP6832095B2 (en) Channel number converter and its program
Grubesa et al. Speaker Recognition Method combining FFT, Wavelet Functions and Neural Networks
CN117896666A (en) Method for playback of audio data, electronic device and storage medium
Marin-Hurtado et al. Preservation of localization cues in BSS-based noise reduction: Application in binaural hearing aids

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18752715

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018752715

Country of ref document: EP

Effective date: 20201221

NENP Non-entry into the national phase

Ref country code: DE