CN113393850B - Parameterized auditory filter bank for end-to-end time domain sound source separation system - Google Patents
Parameterized auditory filter bank for end-to-end time domain sound source separation system
- Publication number
- CN113393850B (application CN202110569382.XA)
- Authority
- CN
- China
- Prior art keywords
- network
- sound source
- filter bank
- time domain
- filter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
Abstract
The invention provides a parameterized auditory filter bank for an end-to-end time domain sound source separation system. Introducing the parameterized auditory filter bank into the end-to-end time domain separation system establishes a separation model with greater auditory plausibility and improves the separation performance of the network. Compared with a fixed filter bank, the parameters of the parameterized auditory filter bank are obtained through network training, giving it greater flexibility: it adjusts spontaneously to the characteristics of the network and the data to achieve better separation performance. Compared with a free filter bank, the parameterized auditory filter bank supplies the network with auditory prior information through the gammatone filter form, so the network better simulates the human auditory system, improving its separation ability in real scenes and making it more interpretable. Furthermore, only 4 parameters per filter need to be trained, which significantly reduces the number of network parameters compared with a free filter bank, in which all parameters must be trained.
Description
Technical Field
The invention belongs to the field of sound source separation, and particularly relates to a parameterized auditory filter bank for improving the performance of an end-to-end time domain sound source separation system.
Background
In real acoustic scenes, multiple sound sources typically sound simultaneously, so sound source separation has long been an important aspect of computational auditory scene analysis. With the rapid development of deep learning, sound source separation systems have made breakthrough progress. As shown in Fig. 1, most end-to-end time domain sound source separation systems now follow the encoder-separator-decoder framework. The encoder converts the time domain mixture into an intermediate representation, the separator estimates a weighting function (mask) for each sound source, and the mask is multiplied with the intermediate representation of the mixture before the decoder reconstructs the separated sources.
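The encoder-separator-decoder flow described above can be sketched as follows; the module sizes and the one-layer placeholder separator are illustrative assumptions for the sketch, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

N, L = 512, 32                        # number of filters, filter length in samples

encoder = nn.Conv1d(1, N, kernel_size=L, stride=L // 2, bias=False)
decoder = nn.ConvTranspose1d(N, 1, kernel_size=L, stride=L // 2, bias=False)
separator = nn.Sequential(            # one-layer placeholder standing in for the
    nn.Conv1d(N, N, 1), nn.Sigmoid()  # deep dilated-convolution separator
)

mixture = torch.randn(1, 1, 16000)    # 1 s of mixed audio at 16 kHz
rep = encoder(mixture)                # intermediate representation of the mixture
mask = separator(rep)                 # estimated mask for one source
source = decoder(mask * rep)          # separated time domain signal
```

With a stride of half the filter length, the decoder's transposed convolution reconstructs a signal of the same length as the input mixture.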
The encoder is a set of filters that convolve the time signal. It may be fixed (referred to herein as a fixed filter bank), such as a short-time Fourier transform (STFT), a constant-Q transform, or an auditory filter bank with fixed values. Alternatively, the filter bank may be a one-dimensional convolution layer with arbitrary initialization whose parameters are learned spontaneously during network training, referred to in the present invention as a free filter bank.
A fixed filter bank has an intuitive interpretation and embodies prior knowledge, making the network less prone to overfitting, but its performance is difficult to improve. In contrast, a free filter bank has a high degree of freedom and typically performs better, but is susceptible to noisy data during training.
Disclosure of Invention
Technical problem to be solved
The invention mainly addresses the problems that the fixed filter banks adopted by encoders in existing sound source separation systems are difficult to improve in performance, while free filter banks are easily affected by noisy data during training, and seeks a compromise between flexibility and prior information.
The excellent performance of the human ear in auditory scene analysis inspires us to introduce auditory filter banks with physiological and psychoacoustic plausibility into the sound source separation system. In auditory models, the spectral analysis performed by the cochlea is typically modeled by a gammatone filter bank. The parameterized auditory filter bank proposed in the invention is a filter bank with gammatone functional form whose parameters are obtained through network learning; it achieves better separation performance than a fixed filter bank, and better auditory plausibility and interpretability than a free filter bank.
The technical scheme of the invention is as follows:
the parameterized auditory filter bank for the end-to-end time domain sound source separation system adopts gammatine filters, and the number N of the filters is not less than 32; the filter time domain impulse response is pure tone modulated by Gamma distribution:
g(t)=At p-1 e -2πbt cos(2πf c t+φ)
where p is the order, f c Is the center frequency, b is the bandwidth, phi isThe phase, a, is the amplitude, determined by the order p and the bandwidth b.
Further, the amplitude A takes a value determined by the order p and the bandwidth b.
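The impulse response above can be evaluated directly. In this sketch the sample rate and 2 ms support follow the embodiment; the bandwidth, phase, and amplitude values are illustrative placeholders, and the center frequency is taken from Fig. 3(b):

```python
import numpy as np

def gammatone(t, fc, p=4, b=100.0, phi=0.0, A=1.0):
    # g(t) = A * t^(p-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phi)
    return (A * t ** (p - 1) * np.exp(-2 * np.pi * b * t)
            * np.cos(2 * np.pi * fc * t + phi))

fs = 16000                            # sample rate used in the embodiment
t = np.arange(int(0.002 * fs)) / fs   # 2 ms filter support -> 32 samples
g = gammatone(t, fc=1125.0)           # fc from Fig. 3(b); b, phi, A illustrative
```

For any order p > 1 the gamma envelope starts at zero, so the response rises smoothly from g(0) = 0, mimicking the gradual onset of cochlear filters.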
The method for constructing the end-to-end time domain sound source separation system by using the parameterized auditory filter bank comprises the following steps:
step 1: creating a time domain separation network from the framework of encoder-separator-decoder; wherein the encoder is realized by a one-dimensional convolution layer, and the form of a filter bank adopts a parameterized auditory filter bank; a separator for estimating a mask of the sound source; the decoder is a one-dimensional deconvolution layer; the mask estimate value of each sound source from the separator is multiplied with a two-dimensional representation of the mixed sound from the encoder, after which the time domain signal of the separated sound source can be synthesized by the decoder;
step 2: parameter sets for each filter based on a priori knowledge of the auditory system of the human earInitializing a parameter set in a parameterized auditory filter bank>The network training process is variable:
(1) The order p of each filter i Initializing and setting the average fitting value to be 4, wherein the average fitting value corresponds to the filter order in the auditory system of the human ear;
(2) Center frequency of each filterThe initializations are uniformly distributed over the equivalent rectangular bandwidth ERB scale, where the mapping from linear frequency to ERB scale is
(3) Initializing bandwidth b of each filter i From the following componentsAnd an order p i Determining
(4) Initialization phase phi of each filter i Is set asAligning the peak of the pitch with the peak of the Gamma envelope;
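The four initialization steps above can be sketched as follows. Since the extracted text does not reproduce the exact ERB mapping or bandwidth formula, this sketch assumes the widely used Glasberg-Moore ERB-rate scale, the common order-4 bandwidth factor 1.019·ERB(f_c), and a 50 Hz lower band edge:

```python
import numpy as np

N, fs = 512, 16000

def hz_to_erb_rate(f):
    # Glasberg-Moore ERB-rate scale (an assumption; the patent's exact
    # mapping formula is not reproduced in the extracted text)
    return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

p = np.full(N, 4.0)                                  # (1) order initialized to 4
lo, hi = hz_to_erb_rate(50.0), hz_to_erb_rate(fs / 2)
fc = erb_rate_to_hz(np.linspace(lo, hi, N))          # (2) uniform on the ERB scale
b = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)        # (3) b from fc (order-4 factor)
t_peak = (p - 1) / (2 * np.pi * b)                   # gamma-envelope peak time
phi = -2 * np.pi * fc * t_peak                       # (4) tone peak at envelope peak
```

The envelope t^(p−1)·e^(−2πbt) peaks at t = (p−1)/(2πb), so setting φ_i = −2πf_c,i·t_peak places a cosine maximum exactly at the envelope peak.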
step 3: and selecting different sound sources to create a data set according to the separation task, and training a time domain separation network by utilizing the data set to obtain the end-to-end time domain sound source separation system.
Furthermore, the separator adopts a network structure based on depthwise convolution, comprising multiple dilated convolution modules with different dilation factors; each module contains a convolution layer, a rectification layer, a normalization layer, a depthwise convolution layer, and residual and skip connections.
Further, when training the time domain separation network, the negative scale-invariant signal-to-distortion ratio (SI-SDR) between the real and estimated sound sources is minimized as the training objective, and the network is trained with the Adam optimizer until the separation performance no longer improves, yielding the end-to-end time domain sound source separation system.
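A minimal NumPy sketch of the SI-SDR objective, assuming the common zero-mean, projection-based definition (training would minimize its negative):

```python
import numpy as np

def si_sdr(estimate, reference):
    # scale-invariant signal-to-distortion ratio in dB (zero-mean variant)
    est = estimate - estimate.mean()
    ref = reference - reference.mean()
    target = (est @ ref) / (ref @ ref) * ref   # projection of estimate onto reference
    noise = est - target
    return 10.0 * np.log10((target @ target) / (noise @ noise))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
estimate = clean + 0.01 * rng.standard_normal(16000)
score = si_sdr(estimate, clean)                # high value: accurate estimate
```

Because the target term rescales the reference to best match the estimate, multiplying the estimate by any nonzero constant leaves the score unchanged, which is exactly the scale invariance the training objective relies on.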
Advantageous effects
The invention introduces a parameterized auditory filter bank into the end-to-end time domain separation system, establishing a separation model with greater auditory plausibility and improving the separation performance of the network. Compared with a fixed filter bank, the parameters of the parameterized auditory filter bank are obtained through network training, giving it greater flexibility: it adjusts spontaneously to the characteristics of the network and the data to achieve better separation performance. Compared with a free filter bank, the parameterized auditory filter bank supplies the network with auditory prior information through the gammatone filter form, so the network better simulates the human auditory system, improving its separation ability in real scenes and making it more interpretable. Furthermore, only 4 parameters per filter need to be trained, which significantly reduces the number of network parameters compared with a free filter bank, in which all parameters must be trained.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a generic framework for a sound source separation system;
FIG. 2 is a block diagram of an end-to-end time domain convolution separation network;
FIG. 3 (a) frequency responses of the parameterized gammatone filter bank, arranged by center frequency; (b) frequency responses of the 4 gammatone filters with a center frequency of 1.125 kHz.
Detailed Description
The present invention provides a parameterized auditory filter bank that improves the performance of an end-to-end time domain sound source separation network, serving as the network's encoder, and builds a more auditorily plausible end-to-end time domain separation network on the encoder-separator-decoder framework. The encoder takes the form of a bank of gammatone auditory filters whose parameters are learned during network training, which improves the performance of the separation network and lays a foundation for applying machine selective listening in real scenes.
In this embodiment, an end-to-end time domain separation network for separating arbitrary sound sources is trained; its encoder is composed of a bank of gammatone filters whose parameter set is obtained through network learning. The method comprises the following steps:
step 1: and building an end-to-end time domain convolution network. The network is built in terms of the framework of encoder-separator-decoder. The encoder is implemented by a one-dimensional convolution layer whose form of the filter bank is given in step 2. The mask used by the separator to estimate the sound source may have a variety of network forms. The invention provides a network structure based on depth convolution, which is shown in fig. 2, and comprises a plurality of hole convolution modules with different expansion factors, wherein each module comprises a convolution layer, a rectification layer, a normalization layer, a depth convolution layer and a residue and jump structure. In this embodiment the separator is made up of 3 convolution modules, each implemented by 8 perforated convolution blocks with an index of 2 expansion factors. The mask estimate from each sound source of the separator is multiplied with a two-dimensional representation of the mixed sound from the encoder. Finally, the time domain signals of the separated sound sources are synthesized through a decoder (one-dimensional transposed convolution layer).
Step 2: a gammatine filter bank is created and initialized.
The gammatone filter models the human auditory system well; its time domain impulse response can be expressed as a pure tone modulated by a gamma-distribution envelope:

g(t) = A · t^(p−1) · e^(−2πbt) · cos(2πf_c·t + φ)

where p is the order, f_c is the center frequency, b is the bandwidth, φ is the phase, and A is the amplitude.
The encoder in the invention is a bank of gammatone filters, with the number of filters N not less than 32; the parameter set {p_i, f_c,i, b_i, φ_i} of each filter is variable during network training. Suitable initial values facilitate training, so the parameter set of each filter is initialized according to prior knowledge of the human auditory system.
(1) The order p_i of each filter is initialized to 4, corresponding to the average fitted value of the filter order in the human auditory system.
(2) The center frequencies f_c,i are initialized to be uniformly distributed on the equivalent rectangular bandwidth (ERB) scale, according to the mapping from linear frequency to the ERB scale.
(3) The initial bandwidth b_i of each filter is determined by the center frequency f_c,i and the order p_i.
(4) The initial phase φ_i of each filter is set to align the peak of the pure tone with the peak of the gamma envelope.
The encoder in this embodiment consists of 512 gammatone filters of length 2 ms. Each filter's parameter set is initialized as follows: order p_i = 4; center frequency f_c,i set to one of 512 frequency points uniformly distributed on the ERB scale; bandwidth b_i and phase φ_i computed from the corresponding f_c,i and p_i.
Step 3: a dataset is created and a network is trained. Different sound sources are selected to create a dataset according to the separation task. And training a network by using an Adam optimizer with a scale-invariant-to-location ratio (SI-SDR) between a minimized real sound source and an estimated sound source as a training target until the separation performance is not improved any more, so as to obtain a sound source separation model.
To simulate the separation of arbitrary sound sources in real acoustic scenes, this embodiment creates a large multi-class dataset containing ambient sound, speech, and musical tones: ambient sounds (including traffic noise, alarms, dog barks, etc.) from the BBC Sound Effects dataset, speech from the LibriSpeech dataset, and musical tones from the MUSAN dataset. Each sound source is downsampled to 16 kHz. Two different sound sources are randomly selected from the dataset and mixed at a random signal-to-noise ratio between -5 dB and 5 dB. The dataset contains 37.5 hours of audio in total, of which 70% is used for training, 20% for cross-validation, and 10% for testing.
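Mixing two sources at a target signal-to-noise ratio, as in the dataset construction above, can be sketched as follows (the function name and the use of random noise as stand-in sources are illustrative):

```python
import numpy as np

def mix_at_snr(s1, s2, snr_db):
    # scale s2 so the s1-to-s2 power ratio equals snr_db, then sum
    gain = np.sqrt(np.mean(s1 ** 2)
                   / (np.mean(s2 ** 2) * 10.0 ** (snr_db / 10.0)))
    return s1 + gain * s2, s1, gain * s2

rng = np.random.default_rng(0)
snr = rng.uniform(-5.0, 5.0)                   # random SNR in [-5, 5] dB
src_a = rng.standard_normal(16000)             # stand-in for a 1 s source at 16 kHz
src_b = rng.standard_normal(16000)
mixture, ref_a, ref_b = mix_at_snr(src_a, src_b, snr)
```

Returning the scaled references alongside the mixture matters for training: the SI-SDR targets must be the exact signals that were summed into the mixture.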
The SI-SDR improvement (dB) of the network on the test set is shown in Table 1. The parameterized gammatone filter bank improves separation performance by 2.31 dB over the fixed gammatone filter bank, demonstrating that the network can learn a parameter set better suited to the separation system; the parameterized bank thus has greater flexibility and better separation performance than the fixed bank. Compared with the free filter bank, the parameterized gammatone filter bank is more interpretable and still achieves a performance improvement, showing that an auditorily plausible gammatone filter bank introduces prior information beneficial to the separation network.
TABLE 1. SI-SDR improvement (dB) on the test set for sound source separation networks with different encoders
Fig. 3(a) shows the frequency responses of the 512 filters obtained after network training, arranged by center frequency; Fig. 3(b) shows the frequency responses of 4 gammatone filters with a center frequency of 1.125 kHz. The results show that the learned center frequencies remain distributed on the ERB scale, while richer orders p and bandwidths b are learned, indicating that the network is sensitive to the filter parameters. It is difficult to manually determine appropriate fixed parameter values for the filter bank; letting the network learn the parameter values spontaneously during training is a better way to improve performance.
Although embodiments of the present invention have been shown and described above, it will be understood that these embodiments are illustrative and are not to be construed as limiting the invention; those skilled in the art may make changes, modifications, substitutions, and variations to the above embodiments without departing from the principles and spirit of the invention.
Claims (3)
1. A method for constructing an end-to-end time domain sound source separation system using a parameterized auditory filter bank, characterized in that: the parameterized auditory filter bank adopts gammatone filters, with the number of filters N not less than 32; the filter's time domain impulse response is a pure tone modulated by a gamma-distribution envelope:

g(t) = A · t^(p−1) · e^(−2πbt) · cos(2πf_c·t + φ)

where p is the order, f_c is the center frequency, b is the bandwidth, φ is the phase, and A is the amplitude, determined by the order p and the bandwidth b;
The method comprises the following steps:
step 1: creating a time domain separation network from the framework of encoder-separator-decoder; wherein the encoder is realized by a one-dimensional convolution layer, and the filter bank adopts the parameterized auditory filter bank; a separator for estimating a mask of the sound source; the decoder is a one-dimensional deconvolution layer; the mask estimate value of each sound source from the separator is multiplied with a two-dimensional representation of the mixed sound from the encoder, after which the time domain signal of the separated sound source can be synthesized by the decoder;
step 2: parameter sets for each filter based on a priori knowledge of the auditory system of the human earInitializing a parameter set in a parameterized auditory filter bank>The network training process is variable:
(1) The order p of each filter i Initializing and setting the average fitting value to be 4, wherein the average fitting value corresponds to the filter order in the auditory system of the human ear;
(2) Center frequency of each filterThe initializations are uniformly distributed over the equivalent rectangular bandwidth ERB scale, where the mapping from linear frequency to ERB scale is
(3) Initializing bandwidth b of each filter i From the following componentsAnd an order p i Determining
(4) Initialization phase phi of each filter i Is set asAligning the peak of the pitch with the peak of the Gamma envelope;
step 3: and selecting different sound sources to create a data set according to the separation task, and training a time domain separation network by utilizing the data set, wherein the scale-invariant signal distortion ratio between the minimized real sound source and the estimated sound source is used as a training target, and the network is trained by an Adam optimizer until the separation performance is not improved any more, so that the end-to-end time domain sound source separation system is obtained.
2. The method for constructing an end-to-end time domain sound source separation system according to claim 1, characterized in that: the separator adopts a network structure based on depthwise convolution, comprising multiple dilated convolution modules with different dilation factors; each dilated convolution module contains a convolution layer, a rectification layer, a normalization layer, a depthwise convolution layer, and residual and skip connections.
3. An end-to-end time domain sound source separation system, characterized in that: constructed by the method of claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110569382.XA CN113393850B (en) | 2021-05-25 | 2021-05-25 | Parameterized auditory filter bank for end-to-end time domain sound source separation system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113393850A CN113393850A (en) | 2021-09-14 |
CN113393850B true CN113393850B (en) | 2024-01-19 |
Family
ID=77618982
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110569382.XA Active CN113393850B (en) | 2021-05-25 | 2021-05-25 | Parameterized auditory filter bank for end-to-end time domain sound source separation system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113393850B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117711423B (en) * | 2024-02-05 | 2024-05-10 | 西北工业大学 | Mixed underwater sound signal separation method and system combining auditory scene analysis and deep learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103456312A (en) * | 2013-08-29 | 2013-12-18 | 太原理工大学 | Single channel voice blind separation method based on computational auditory scene analysis |
CN103985390A (en) * | 2014-05-20 | 2014-08-13 | 北京安慧音通科技有限责任公司 | Method for extracting phonetic feature parameters based on gammatone relevant images |
CN109256127A (en) * | 2018-11-15 | 2019-01-22 | 江南大学 | A kind of Robust feature extracting method based on non-linear power transformation Gammachirp filter |
CN110010150A (en) * | 2019-04-15 | 2019-07-12 | 吉林大学 | Auditory Perception speech characteristic parameter extracting method based on multiresolution |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9131295B2 (en) * | 2012-08-07 | 2015-09-08 | Microsoft Technology Licensing, Llc | Multi-microphone audio source separation based on combined statistical angle distributions |
US10536775B1 (en) * | 2018-06-21 | 2020-01-14 | Trustees Of Boston University | Auditory signal processor using spiking neural network and stimulus reconstruction with top-down attention control |
- 2021-05-25: application CN202110569382.XA filed; patent CN113393850B granted, status Active
Non-Patent Citations (1)
Title |
---|
Single-channel speech separation based on an improved pitch tracking algorithm; Wang Yu, Lin Jiajun, Yuan Wenhao, Chen Ning; Journal of East China University of Science and Technology (Natural Science Edition), No. 3; full text *
Also Published As
Publication number | Publication date |
---|---|
CN113393850A (en) | 2021-09-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||