CN116935879A - Two-stage network noise reduction and dereverberation method based on deep learning - Google Patents


Info

Publication number: CN116935879A
Application number: CN202210355142.4A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: stage, network, noise reduction, noise, reverberation
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Inventors: 刘宏清 (Liu Hongqing), 夏俊杰 (Xia Junjie)
Current and original assignee: Chongqing University of Posts and Telecommunications
Filing date: 2022-04-06
Publication date: 2023-10-24
Application filed by Chongqing University of Posts and Telecommunications

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0224: Processing in the time domain
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L2021/02082: Noise filtering, the noise being echo or reverberation of the speech
    • Y02T90/00: Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The invention relates to a two-stage network noise reduction and dereverberation method based on deep learning, belonging to the field of speech processing. According to the different natures of the two interfering signals, background noise and room reverberation are handled in separate noise reduction and dereverberation stages. The two single-stage networks are first trained independently; their weight parameters and related configuration are retained and transplanted into the time-domain two-stage network for joint training. The invention processes noise and reverberation in the time domain, requires no extra transformation of the speech signal, and avoids the loss of useful information that signal transformation incurs. Analysis of the experimental data shows that the time-domain two-stage network outperforms both the single-stage networks and the frequency-domain networks.

Description

Two-stage network noise reduction and dereverberation method based on deep learning
Technical Field
The invention belongs to the field of speech processing, and relates to a two-stage network noise reduction and dereverberation method based on deep learning.
Background
In recent years, researchers have done much work on suppressing background noise and room reverberation. For suppressing reverberation alone, inverse filtering is one of the most common methods: an inverse filter that cancels the effect of the room impulse response is estimated, and the reverberant signal is convolved with it to obtain an estimate of the clean speech signal; the difficulty lies in estimating a reasonable inverse filter. Wu Mingyang et al. later proposed a two-stage algorithm for the single-microphone scenario that uses an inverse filter in the first stage and spectral subtraction in the second stage to process early and late reverberation, respectively. Zhao Yan et al. then used a deep neural network (DNN) to learn a frequency-domain spectral mapping from noisy reverberant speech to clean speech, the first study to handle room reverberation and background noise simultaneously with a supervised learning approach. However, background noise and room reverberation differ in nature: the reverberant signal is generated by convolving the clean speech signal with the room impulse response (RIR), whereas the noisy speech signal is the superposition of the clean speech signal and the background noise. Background noise and room reverberation therefore cannot be handled in the same way within one model, and the two interfering signals should be processed separately. In addition, the above algorithms process the speech signal in the frequency domain; before reconstructing the frequency-domain signal into a time-domain waveform, the spectrum of the clean speech signal is usually estimated with the phase of the noisy speech signal, which cannot fully exploit the phase information of the clean speech signal and causes the estimated clean speech signal to deviate from the target speech signal.
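The inverse-filtering idea described above can be sketched as follows: a minimal frequency-domain regularized inversion, assuming the RIR is already known (precisely the estimate that, as noted, is difficult in practice). All names and values here are illustrative, not from the patent.

```python
import numpy as np

def inverse_filter_dereverb(reverberant, rir, n_fft=1024, eps=1e-6):
    """Estimate clean speech by regularized frequency-domain inversion of the RIR."""
    H = np.fft.rfft(rir, n_fft)
    # Regularized inverse avoids dividing by near-zero frequency bins of H
    G = np.conj(H) / (np.abs(H) ** 2 + eps)
    X = np.fft.rfft(reverberant, n_fft)
    return np.fft.irfft(X * G, n_fft)

# Toy example: a two-tap "room" (direct path plus one echo) applied to a short signal
rng = np.random.default_rng(0)
s = rng.standard_normal(256)
rir = np.zeros(64)
rir[0] = 1.0
rir[20] = 0.5
x = np.convolve(s, rir)                       # reverberant observation
s_hat = inverse_filter_dereverb(x, rir)[:256]  # recovered signal
```

Because the toy RIR has no spectral nulls, the regularized inverse recovers the signal almost exactly; with a real room response (near-zero bins, estimation error) the recovery degrades, which motivates the learned approach of the invention.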
Disclosure of Invention
It is therefore an object of the present invention to provide a time-domain two-stage joint network model that processes background noise and room reverberation in separate stages in the time domain. The invention first trains two single-stage networks, transplants the network weight parameters obtained by the independent training into the two-stage joint network model, and uses them as initial values for the joint training. Training and testing are carried out for the frequency-domain single-stage network, the time-domain single-stage network, the frequency-domain two-stage network, and the time-domain two-stage network on the same data set, and the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores of the different networks are compared, verifying that the time-domain two-stage method of the invention has better performance.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a two-stage network noise reduction and dereverberation method based on deep learning, comprising the steps of:
s1: preparing a data set: setting a reverberation environment, synthesizing the reverberation environment with pure voice signals to obtain reverberation signals, and synthesizing the reverberation signals with a training noise data set and a test noise data set respectively to obtain a voice signal training set and a test set which simultaneously contain noise and reverberation;
s2: building a two-stage joint network model based on a cyclic neural network (RNN) and a time domain convolutional network (TCN), wherein the two-stage joint network model comprises a noise reduction stage and a dereverberation stage;
s3: the time domain voice signal is input into a single-stage network for independent training, the input of the noise reduction stage comprises a noise reverberation signal and a noise-free reverberation signal H (t), the noise-free reverberation signal H (t) is used as a learning label, and the output of the noise reduction stage is an estimated noise-free reverberation signalThe loss function will constantly estimate +.>Fitting to the learning label H (t); the input of the dereverberation stage comprises a noise-free reverberant signal and a clean speech signal s (t), and the clean speech signal s (t) is used as a learning label, and the output of the dereverberation stage is an estimated clean speech signal +.>The loss function will constantly estimate +.>Fitting to learning tags s (t);
s4: performing joint training on the two-stage joint network model, and simultaneously inhibiting noise and reverberation; the optimal weight parameters of independent training of the noise reduction stage and the dereverberation stage are reserved and used as initial values of training of a two-stage combined network model; the inputs of the two-stage joint network model include the noise reverberation signal and the clean speech signal s (t), s (t) being the learned label, will be estimatedIs a clean speech signal of (1)Fitting to the tag s (t);
s5: repeating the step S4, and ending training when the loss value reaches the minimum and converges;
s6: and testing the trained two-stage joint network model by using the test set.
Further, in step S1, the setting a reverberant environment is: defining 5 different reverberation times between 0.1s and 0.9s, and the step size is 0.2s; the length and width of the room are arbitrarily valued between 2 meters and 10 meters, and the microphone and sound source positions are randomly arranged inside the room.
Further, in step S1, different signal-to-noise ratios are used in synthesizing the noise reverberation signal, and all the speech data are at the same sampling rate.
Further, the model of the noise reduction stage in step S2 includes an encoder, a noise reduction module, and a decoder, where the noise reduction module includes sequence segmentation, block processing, and overlap-add; the encoder and decoder are configured to convert the speech signal between a time-domain waveform and high-dimensional features; the sequence segmentation is used to split the input feature sequence into overlapping blocks and then stack all the blocks into a three-dimensional tensor; the block processing comprises an intra-block processing module that processes the first and second dimensions of the three-dimensional tensor and an inter-block processing module that processes the first and third dimensions, and the overlap-add is used to synthesize the long speech sequence.
Further, the model of the dereverberation stage in step S2 is used to generate high-dimensional features of the input speech signal, including an encoder, a time domain convolution network, an activation function and a decoder; the decoder output of the noise reduction stage is used as the encoder input of the dereverberation stage, the mask is estimated through the time domain convolution network and the activation function, then the output of the encoder is multiplied with the estimated mask, the high-dimensional characteristics of the estimated pure speech signal are obtained, and finally the decoder is used for converting the estimated high-dimensional characteristics into the time domain speech signal.
Further, the time domain convolution network is composed of stacked one-dimensional dilation convolutions (1-D D-Conv).
Further, in step S3, the loss function formula of the noise reduction stage is as follows:
where s is the target speech signal, ŝ is the estimated speech signal, and ‖·‖₂ denotes the ℓ₂ norm of a vector (so ‖s‖₂² is the inner product of s with itself).
Further, in step S3, the loss function formula of the dereverberation stage is as follows:
further, in step S4, the joint loss function of the two-stage network is as follows:
further, an Adam optimizer is adopted to optimize the joint loss of the two-stage network, the Adam algorithm sets independent adaptive learning rate for different parameters by calculating first moment estimation and second moment estimation of the gradient, the neuron weight is biased by counter propagation, and the weight of the network neuron is continuously updated by calculating an optimal solution.
The invention has the following beneficial effects: it processes noise and reverberation in the time domain, requires no extra transformation of the speech signal, and avoids the loss of useful information that signal transformation incurs. Analysis of the experimental data shows that the time-domain two-stage network outperforms both the single-stage networks and the frequency-domain networks.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of a two-stage joint network model;
FIG. 2 is a schematic diagram of sequence division;
FIG. 3 is a block processing flow diagram;
fig. 4 is a schematic structural diagram of TCN.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. The invention may also be practiced or carried out in other embodiments, and details of the present description may be modified or varied in various respects without departing from the spirit and scope of the invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the invention by way of example, and the following embodiments and their features may be combined with one another where no conflict arises.
The drawings are for illustrative purposes only; they are schematic rather than physical and are not intended to limit the invention. For better illustration of the embodiments, certain elements of the drawings may be omitted, enlarged or reduced, and do not represent the size of the actual product; it will be appreciated by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
The same or similar reference numbers in the drawings of the embodiments correspond to the same or similar components. In the description of the invention, terms such as "upper", "lower", "left", "right", "front" and "rear" indicate orientations or positional relationships based on those shown in the drawings; they are used only for convenience and simplification of description and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation. Such terms are exemplary, should not be construed as limiting the invention, and their specific meaning can be understood by those of ordinary skill in the art according to the specific circumstances.
Referring to fig. 1, a two-stage network noise reduction and dereverberation method based on deep learning mainly comprises the following steps:
step S1: a data set used in the present invention was made. The clean speech signal used was taken from the WSJ0 dataset, the noise dataset for training was taken from ESC-50, and the noise dataset for testing was taken from noise 92. Creating the dataset requires setting different reverberation times, room size, microphone locations and sound source locations to simulate different reverberant environments. First, 5 different reverberation times are defined between 0.1s and 0.9s, and the step size is 0.2s. Secondly, the length and width of the room are arbitrarily valued between 2 meters and 10 meters, and the microphone and sound source positions are randomly arranged inside the room. The clean speech signal from WSJ0 is used to synthesize a different reverberant signal from the randomly modeled reverberant environment. And randomly extracting noise from the ESC-50 and Noisex92 noise data sets and synthesizing the noise and the reverberation signal to obtain a voice signal containing both noise and reverberation. Different signal-to-noise ratios are used in synthesizing the noise reverberation signal, respectively-9 dB, -5dB, 0dB, 5dB, and 9dB. The training set in the final data set was 40 hours, the validation set was 15 hours, the test set was 15 hours, and the sampling rate of all voice data was 16kHz.
Step S2: the invention mainly builds a model based on two networks of RNN and TCN.
1) The noise reduction stage can be divided into three parts: encoder, noise reduction module and decoder. The noise reduction module in turn includes sequence segmentation, block processing, and overlap-add. The codec functions to convert the speech signal back and forth from a time domain waveform to high-dimensional features. As shown in fig. 2, the purpose of the sequence segmentation is to segment the input feature sequence into overlapping blocks, and then stack all the blocks into a three-dimensional tensor, which facilitates the learning of the block processing module. As shown in fig. 3, the block processing includes intra-block processing and inter-block processing, and for the intra-block processing module, it processes the first and second dimension information of the three-dimensional tensor, and the inter-block processing module processes the first and third dimension information of the three-dimensional tensor.
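The sequence segmentation and overlap-add steps of the noise reduction module can be sketched as follows, in NumPy, assuming 50% overlap; the block sizes are illustrative and not taken from the patent.

```python
import numpy as np

def segment(x, block_len, hop):
    """Split a (features, time) sequence into overlapping blocks and stack them
    into a three-dimensional tensor of shape (features, block_len, n_blocks)."""
    n_feat, n_time = x.shape
    n_blocks = (n_time - block_len) // hop + 1
    return np.stack(
        [x[:, i * hop : i * hop + block_len] for i in range(n_blocks)], axis=-1
    )

def overlap_add(blocks, hop):
    """Inverse of segment(): synthesize the long sequence, averaging overlaps."""
    n_feat, block_len, n_blocks = blocks.shape
    n_time = (n_blocks - 1) * hop + block_len
    out = np.zeros((n_feat, n_time))
    win = np.zeros(n_time)
    for i in range(n_blocks):
        out[:, i * hop : i * hop + block_len] += blocks[:, :, i]
        win[i * hop : i * hop + block_len] += 1.0
    return out / win            # divide by overlap count at each time step

feat = np.random.default_rng(2).standard_normal((64, 200))  # hypothetical encoder output
tensor3d = segment(feat, block_len=40, hop=20)              # tensor for block processing
rec = overlap_add(tensor3d, hop=20)                         # long sequence synthesized back
```

The intra-block module then operates along the second axis of `tensor3d` (within each block) and the inter-block module along the third axis (across blocks), matching the two processing directions described above.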
2) The dereverberation stage uses an encoder for generating high-dimensional features of the input speech signal, further multiplies the output of the encoder with the estimated mask to obtain high-dimensional features of the estimated clean speech signal, and finally uses a decoder to convert the estimated features into a time-domain speech signal. As shown in fig. 4, the TCN of stacked 1-D D-Conv is used in estimating the mask.
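As a small illustration of why stacked 1-D dilated convolutions suit mask estimation, the receptive field of such a TCN grows exponentially with depth. The configuration numbers below are hypothetical (the patent does not specify kernel sizes or layer counts); the formula assumes the dilation doubles per layer and resets at the start of each repeated stack, a common TCN arrangement.

```python
def tcn_receptive_field(n_stacks, n_layers, kernel=3):
    """Receptive field, in encoder frames, of a TCN whose dilation doubles per
    layer (1, 2, 4, ...) and resets at the start of each repeated stack."""
    per_layer = [(kernel - 1) * 2 ** layer for layer in range(n_layers)]
    return 1 + n_stacks * sum(per_layer)

# A hypothetical 3-stack, 8-layer configuration covers over 1500 frames of context
print(tcn_receptive_field(n_stacks=3, n_layers=8))  # prints 1531
```

A single non-dilated layer with the same kernel would only see 3 frames, so dilation is what lets the dereverberation stage model the long reverberant tail without a deep plain stack.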
Step S3: the time-domain speech signal is input into each single-stage network for independent training. The purpose of the noise reduction stage is to suppress the noise and obtain a noise-free reverberant signal; its input comprises the noisy reverberant signal and the noise-free reverberant signal H(t), the latter being the learning label. The output of the noise reduction stage is the estimated noise-free reverberant signal Ĥ(t), and the loss function continually fits the estimate Ĥ(t) to the learning label H(t). The loss function formula for the noise reduction stage is as follows:
wherein:
where s is the target speech signal, ŝ is the estimated speech signal, and ‖·‖₂ denotes the ℓ₂ norm of a vector (so ‖s‖₂² is the inner product of s with itself).
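The loss formula itself is rendered as an image in the original and is not reproduced here. The scale-invariant signal-to-noise ratio (SI-SNR), a common time-domain loss that matches the description above (target signal s, estimate ŝ, ℓ₂ norms), can serve as a plausible sketch; this specific form is an assumption, not confirmed by the patent text.

```python
import numpy as np

def neg_si_snr(s_hat, s, eps=1e-8):
    """Negative scale-invariant SNR: a common time-domain training loss.
    Shown as a plausible form only; the patent's formula is not reproduced."""
    # Project the estimate onto the target to get the "clean" component
    s_target = (np.dot(s_hat, s) / (np.dot(s, s) + eps)) * s
    e_noise = s_hat - s_target                      # residual interference
    si_snr = 10 * np.log10(
        np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps)
    )
    return -si_snr                                   # negate: lower loss is better

rng = np.random.default_rng(3)
s = rng.standard_normal(1000)                        # target signal
good = s + 0.01 * rng.standard_normal(1000)          # close estimate
bad = s + 1.0 * rng.standard_normal(1000)            # poor estimate
```

A better estimate yields a lower loss, and rescaling the estimate leaves the loss essentially unchanged, which is the "scale-invariant" property that makes this loss popular for time-domain enhancement.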
The purpose of the dereverberation stage is to recover the clean speech signal from the noise-free reverberant signal. Its inputs comprise the noise-free reverberant signal and the clean speech signal s(t), with s(t) regarded as the learning label. The output of the dereverberation stage is the estimated clean speech signal ŝ(t), and the loss function continually fits the estimate ŝ(t) to the learning label s(t) so as to achieve the expected reverberation-suppression effect. The loss function of the dereverberation stage is formulated as follows:
step S4: the two-stage network is jointly trained, and noise and reverberation are restrained. The invention reserves the optimal weight parameters of independent training of the noise reduction stage and the dereverberation stage and uses the optimal weight parameters as initial values of two-stage joint network training. This not only shortens the training period of the two-phase joint network, but also makes it easier to obtain an optimal two-phase network model. The inputs to the two-stage joint network training include a noise reverberant signal and a clean speech signal s (t), the purpose of the model being to suppress both noise and reverberant to obtain an estimated clean speech signalAnd s (t) is used as a learning label to estimate the pure speech signal +.>Fitting to the tag s (t). The joint loss function of the two-stage network is as follows:
when the loss is large, the network performance is poor and the network is not optimal. In order to minimize loss, an Adam optimizer is adopted for parameter optimization, an Adam algorithm sets independent adaptive learning rate for different parameters by calculating first moment estimation and second moment estimation of gradients, and the neuron weight is biased by counter propagation, so that the weight of the network neuron is continuously updated by calculating an optimal solution.
Step S5: step S4 is repeated, and training ends when the loss value reaches its minimum and converges; the network parameters are then optimal, and the network model is taken as the system model.
Step S6: the trained model is tested with the test dataset synthesized in step S1. Different methods are compared, and the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores of each are obtained, verifying the superior performance of the invention; Table 1 gives the PESQ scores and Table 2 the STOI scores.
TABLE 1
TABLE 2
PESQ scores range from -0.5 to 4.5 and STOI scores from 0 to 1; the higher the score, the better the performance of the network.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (10)

1. A two-stage network noise reduction and dereverberation method based on deep learning is characterized in that: the method comprises the following steps:
s1: preparing a data set: setting a reverberation environment, synthesizing the reverberation environment with pure voice signals to obtain reverberation signals, and synthesizing the reverberation signals with a training noise data set and a test noise data set respectively to obtain a voice signal training set and a test set which simultaneously contain noise and reverberation;
s2: building a two-stage joint network model based on a cyclic neural network (RNN) and a time domain convolutional network (TCN), wherein the two-stage joint network model comprises a noise reduction stage and a dereverberation stage;
s3: the time domain voice signal is input into a single-stage network for independent training, the input of the noise reduction stage comprises a noise reverberation signal and a noise-free reverberation signal H (t), the noise-free reverberation signal H (t) is used as a learning label, and the output of the noise reduction stage is an estimated noise-free reverberation signalThe loss function will constantly estimate +.>Fitting to the learning label H (t); the input of the dereverberation stage comprises a noise-free reverberant signal and a clean speech signal s (t), and the clean speech signal s (t) is used as a learning label, and the output of the dereverberation stage is an estimated clean speech signal +.>The loss function will constantly estimate +.>Fitting to learning tags s (t);
s4: performing joint training on the two-stage joint network model, and simultaneously inhibiting noise and reverberation; the optimal weight parameters of independent training of the noise reduction stage and the dereverberation stage are reserved and used as initial values of training of a two-stage combined network model; the inputs of the two-stage joint network model include the noise reverberation signal and the clean speech signal s (t), s (t) being the learned label, the estimated clean speech signalFitting to the tag s (t);
s5: repeating the step S4, and ending training when the loss value reaches the minimum and converges;
s6: and testing the trained two-stage joint network model by using the test set.
2. The deep learning based two-stage network noise reduction and dereverberation method according to claim 1, wherein: the setting the reverberation environment in step S1 is: defining 5 different reverberation times between 0.1s and 0.9s, and the step size is 0.2s; the length and width of the room are arbitrarily valued between 2 meters and 10 meters, and the microphone and sound source positions are randomly arranged inside the room.
3. The deep learning based two-stage network noise reduction and dereverberation method according to claim 1, wherein: in step S1, different signal-to-noise ratios are used when synthesizing the noise reverberation signal, and all the speech data are at the same sampling rate.
4. The deep learning based two-stage network noise reduction and dereverberation method according to claim 1, wherein: the model of the noise reduction stage in step S2 comprises an encoder, a noise reduction module and a decoder, the noise reduction module comprising sequence segmentation, block processing and overlap-add; the encoder and decoder are configured to convert the speech signal between a time-domain waveform and high-dimensional features; the sequence segmentation is used to split the input feature sequence into overlapping blocks and then stack all the blocks into a three-dimensional tensor; the block processing comprises an intra-block processing module that processes the first and second dimensions of the three-dimensional tensor and an inter-block processing module that processes the first and third dimensions; and the overlap-add is used to synthesize the long speech sequence.
5. The deep learning based two-stage network noise reduction and dereverberation method according to claim 1, wherein: the model of the dereverberation stage in step S2 is used to generate high-dimensional features of the input speech signal, including an encoder, a time domain convolution network, an activation function and a decoder; the decoder output of the noise reduction stage is used as the encoder input of the dereverberation stage, the mask is estimated through the time domain convolution network and the activation function, then the output of the encoder is multiplied with the estimated mask, the high-dimensional characteristics of the estimated pure speech signal are obtained, and finally the decoder is used for converting the estimated high-dimensional characteristics into the time domain speech signal.
6. The deep learning based two-stage network noise reduction and dereverberation method according to claim 5, wherein: the time domain convolution network is composed of stacked one-dimensional dilation convolutions (1-D D-Conv).
7. The deep learning based two-stage network noise reduction and dereverberation method according to claim 1, wherein: in step S3, the loss function formula in the noise reduction stage is as follows:
where s is the target speech signal, ŝ is the estimated speech signal, and ‖·‖₂ denotes the ℓ₂ norm of a vector (so ‖s‖₂² is the inner product of s with itself).
8. The deep learning based two-stage network noise reduction and dereverberation method according to claim 1, wherein: in step S3, the loss function formula of the dereverberation stage is as follows:
9. The deep learning based two-stage network noise reduction and dereverberation method according to claim 1, wherein: in step S4, the joint loss function of the two-stage network is as follows:
10. The deep learning based two-stage network noise reduction and dereverberation method according to claim 1, wherein: an Adam optimizer is adopted to optimize the joint loss of the two-stage network; the Adam algorithm sets an independent adaptive learning rate for each parameter by computing first- and second-moment estimates of the gradient, the partial derivatives of the loss with respect to the neuron weights are obtained by backpropagation, and the network weights are updated iteratively toward the optimal solution.
CN202210355142.4A 2022-04-06 2022-04-06 Two-stage network noise reduction and dereverberation method based on deep learning Pending CN116935879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210355142.4A CN116935879A (en) 2022-04-06 2022-04-06 Two-stage network noise reduction and dereverberation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210355142.4A CN116935879A (en) 2022-04-06 2022-04-06 Two-stage network noise reduction and dereverberation method based on deep learning

Publications (1)

Publication Number Publication Date
CN116935879A true CN116935879A (en) 2023-10-24

Family

ID=88391296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210355142.4A Pending CN116935879A (en) 2022-04-06 2022-04-06 Two-stage network noise reduction and dereverberation method based on deep learning

Country Status (1)

Country Link
CN (1) CN116935879A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117174105A (en) * 2023-11-03 2023-12-05 深圳市龙芯威半导体科技有限公司 Speech noise reduction and dereverberation method based on improved deep convolutional network



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination