CN112885375A - Global signal-to-noise ratio estimation method based on auditory filter bank and convolutional neural network - Google Patents

Global signal-to-noise ratio estimation method based on auditory filter bank and convolutional neural network Download PDF

Info

Publication number
CN112885375A
Authority
CN
China
Prior art keywords
sub-band
noise
energy
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110025619.8A
Other languages
Chinese (zh)
Inventor
王龙标
李楠
党建武
张苏林
于波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202110025619.8A priority Critical patent/CN112885375A/en
Publication of CN112885375A publication Critical patent/CN112885375A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Abstract

The invention discloses a global signal-to-noise ratio (SNR) estimation method based on an auditory filter bank and a convolutional neural network, which comprises the following steps: 1) for noisy speech, dividing the audio into different sub-bands with high-pass and low-pass filters according to the Bark scale, and calculating the energy of each sub-band; 2) constructing a convolutional neural network, calculating the noise proportion in each sub-band, and from it the noise energy within the sub-band; 3) calculating the global SNR. For global SNR estimation in noisy environments, the invention mainly provides a dynamic noise estimation method that combines a human-ear filter bank with a multi-subband convolutional neural network. For the energies of the different sub-bands, a convolutional-neural-network-based noise ratio estimation method is proposed that dynamically estimates the noise energy ratio of each sub-band. The dynamic sub-band noise energies are then fused from the sub-band level to the full band, further improving the accuracy of the global SNR calculation.

Description

Global signal-to-noise ratio estimation method based on auditory filter bank and convolutional neural network
Technical Field
The invention relates to the field of speech signal processing, and in particular to a global signal-to-noise ratio estimation method based on an auditory filter bank and a convolutional neural network, addressing the problem of inaccurate noise estimation in environments with a relatively low signal-to-noise ratio.
Background
In recent years, emerging industries such as smart homes, conversational robots, and smart speakers have developed rapidly, greatly changing people's way of life and the way people interact with machines; speech interaction, as a new interaction mode, has been widely applied in these emerging fields. With the application of deep learning to speech recognition, recognition performance has improved greatly: the recognition rate exceeds 95%, essentially reaching the level of human listening. However, this holds only under near-field conditions, where the noise and room reverberation are very small; achieving a good recognition effect in complex scenes (heavy noise or heavy reverberation) has therefore become critical to the user experience.
Noise estimation is an important research direction for far-field speech recognition. In a noisy environment, the influence of noise on clean speech can generally be expressed as the signal-to-noise ratio (SNR), defined as the ratio of signal power to noise power expressed in decibels (dB). Accurate SNR estimation can help in designing algorithms and systems that compensate for noise effects, such as robust speech recognition systems, speech enhancement, and noise suppression. SNR estimation is nevertheless a challenging task, since we usually do not know how different kinds of noise affect the original audio in a given environment.
Generally, SNR estimation is divided into two categories: local SNR estimation, which usually focuses on frame-level SNR, and global SNR estimation, which typically focuses on the overall distribution of noise over a period of time. In this work we mainly address the global SNR estimation problem.
The main methods for global SNR estimation fall into two types. The first are classical signal-processing methods, a typical example being waveform amplitude distribution analysis (WADA), which assumes that speech and noise follow Gamma and Gaussian distributions, respectively. The problem with this method is its assumption of Gaussian background noise: in daily life there are many different kinds of noise besides Gaussian noise, so it is severely limited in practical applications. The second are deep-learning-based methods, which usually perform noise estimation from various kinds of input speech features, but their performance drops sharply as the SNR decreases in real environments and under the influence of non-stationary noise. Proposing a global SNR estimation method for real scenes therefore remains a challenging topic.
Disclosure of Invention
The invention aims to explore a method that computes the noise energy ratio of each sub-band of an auditory filter bank with a convolutional neural network, so as to improve the accuracy of global SNR estimation.
The technical scheme of the invention is a global signal-to-noise ratio estimation method based on an auditory filter bank and a convolutional neural network, which specifically comprises the following steps: 1) for noisy speech, dividing the audio into different sub-bands with high-pass and low-pass filters according to the Bark scale, and calculating the energy of each sub-band; 2) constructing a convolutional neural network, calculating the noise proportion in each sub-band, and from it the noise energy within the sub-band; 3) calculating the global SNR.
The method comprises the following specific steps:
1) filter bank based on Bark scale
It is difficult to distinguish noise from noisy speech with a global full-band approach. To overcome this, this study uses a multi-subband approach. Noise has a different distribution at different frequencies. Under high-SNR conditions, noise is distributed mainly in the high band, so it can easily be identified from the high-frequency subbands. Under low-SNR conditions it is difficult to separate the energies of speech and noise. Using subbands divides the noise into multiple frequency bands, making the speech and noise portions easier to determine; dividing the noisy speech into subbands of different frequencies thus improves the ability to distinguish noise from speech.
As shown in fig. 1, a listener usually pays more attention to the low and mid frequencies of a speech segment, so in this study a hearing-based filter bank is used to segment the original speech waveform into different subbands. The hearing-based filter bank used here is a Bark-scale filter bank consisting of band-pass filters whose bandwidth is constant on the Bark scale. The cut-off frequencies of the filters were set according to the Bark scale to [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700] Hz, and the sampling frequency of the speech was reduced to 8000 Hz in this experiment. The filtering can be expressed as the following function:
y(k,n) = BFB(y(n))
where n is the sample index, k indexes the k-th of the K subbands into which the audio is split, and BFB represents the Bark filter bank. After splitting into the different subbands, we also need to compute the energy of each subband as follows:
E_total(k,n) = |y(k,n)|²
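As an illustration of this step, the following Python sketch splits a waveform into the Bark-scale subbands listed above and computes the per-sample subband energies. The description does not specify the filter family, so the Butterworth band-pass design and the helper names bark_filter_bank and subband_energy are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import butter, sosfilt

# Bark-scale cut-off frequencies (Hz) from the description; fs = 8000 Hz.
BARK_EDGES = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
              1480, 1720, 2000, 2320, 2700, 3150, 3700]

def bark_filter_bank(y, fs=8000, order=4):
    """Split waveform y(n) into K = 17 subbands y(k, n).

    Each subband is isolated with a Butterworth band-pass filter
    (equivalently a high-pass/low-pass cascade); the last band runs
    from 3700 Hz to just below the Nyquist frequency.
    """
    edges = BARK_EDGES + [fs // 2 - 1]
    subbands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        subbands.append(sosfilt(sos, y))
    return np.stack(subbands)            # shape (K, n)

def subband_energy(subbands):
    """E_total(k, n) = |y(k, n)|^2 for every sample of every subband."""
    return np.abs(subbands) ** 2
```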
2) computation of sub-band noise energy
Fig. 2 shows the proposed subband noise energy estimation method. In the training phase, the subband energies are input into the proposed subband noise estimation network (SNENet) to estimate the subband noise energy ratios; the labels used in training are calculated by the following formula:
r(k) = Σ_{n=1}^{N} |noise(k,n)|² / Σ_{n=1}^{N} |y(k,n)|²
where R = [r(1), r(2), ..., r(K)], N is the total number of samples in a frame of speech, and r(k) is the ratio of the noise energy of the k-th subband to its total energy. In the training process a neural network g_θ is trained so that the error
‖g_θ(E_total) − R‖²
is minimal;
wherein R is the set of noise energy ratios of the subbands and g_θ is the proposed subband noise energy estimation network (SNENet);
in the decoding (estimation) stage, we directly apply the subband energy E of the test datak,totalThe estimated sub-band noise energy ratio can be obtained by inputting the noise energy ratio into the trained network, and the final sub-band noise energy can be obtained by multiplying the sub-band noise energy ratio and the total sub-band energy, as shown in the following formula:
Figure BDA0002890217370000034
wherein the content of the first and second substances,
Figure BDA0002890217370000035
for the estimated noise ratio of the kth subband, ET(k) For the found magnitude of the noise energy in each subband.
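Under the label definition reconstructed above, the training targets and the decoding-stage product can be sketched in a few lines; the helper names noise_ratio_labels and subband_noise_energy are hypothetical.

```python
import numpy as np

def noise_ratio_labels(noise_subbands, noisy_subbands):
    """Training targets r(k): per-subband ratio of noise energy to total energy.

    Both inputs have shape (K, N): K subbands, N samples per frame.
    """
    noise_e = np.sum(np.abs(noise_subbands) ** 2, axis=-1)
    total_e = np.sum(np.abs(noisy_subbands) ** 2, axis=-1)
    return noise_e / np.maximum(total_e, 1e-12)   # R = [r(1), ..., r(K)]

def subband_noise_energy(ratio_hat, e_total):
    """Decoding stage: E_T(k) = r_hat(k) * E_total(k)."""
    return ratio_hat * e_total
```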
Fig. 3 shows the proposed structure of SNENet, in which a CNN encoder-decoder is used to obtain more accurate local context patterns from the given subband speech energies. Beyond fully connected layers, we also use another convolutional network structure, a CNN encoder-decoder (C-ED) network. As shown in FIG. 3, the C-ED consists of convolution, average pooling, batch normalization, and ReLU layers. The numbers of encoder and decoder filters correspond: the number of encoder filters gradually increases while the number of decoder filters gradually decreases. The channels of the convolutional layers correspond to the different subbands, the average pooling layers reduce the number of parameters, and, to improve the generalization ability of the model, convolution kernels of different sizes are used in the CNN model to learn different context patterns.
To estimate the noise more accurately, a fully-connected-layer-based network is also used in SNENet. Its deeper nonlinear operations allow the network to predict more detailed information, which helps in learning the subband noise ratios. This mapping network consists of two fully connected layers with ReLU activation functions; finally, the subband noise energy ratios are obtained through one more fully connected layer whose activation function is a Sigmoid.
3) Calculation of global signal-to-noise ratio
In this method, the power of the speech waveform is calculated as the sum of the powers of all subbands. Since the threshold is designed separately for each subband, the estimates of noise and speech power are much more accurate than a direct estimate in the time domain. The final global SNR is obtained by fusing the powers of all subbands as follows:
SNR = 10 · log10( Σ_{k=1}^{K} P_S(k) / Σ_{k=1}^{K} P_N(k) )
where P_S(k) is the sum of the energies of all clean speech in the k-th subband and P_N(k) is the energy of all noise in the k-th subband; adding up these subband energy sums yields the final estimated global SNR.
P_N(k) is obtained by accumulating the estimated subband noise energies E_T(k) over the L_N frames used for the noise estimate. The estimated noise ratio is not completely correct, so the global SNR is calculated most accurately when the number of speech frames is larger than a certain value, where L is the total number of speech frames. Finally, P_S(k) is obtained by subtracting the total noise energy from the total energy.
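The subband-to-full-band fusion can be sketched as follows; the helper name global_snr_db is hypothetical, and recovering P_S(k) by subtracting the estimated noise energy from the total energy follows the description above.

```python
import numpy as np

def global_snr_db(e_total, e_noise):
    """Fuse per-subband energies into a global SNR in dB.

    e_total: total energy per subband, accumulated over frames, shape (K,)
    e_noise: estimated noise energy per subband E_T(k), shape (K,)
    """
    p_n = np.sum(e_noise)                 # sum of P_N(k) over all subbands
    p_s = np.sum(e_total) - p_n           # P_S(k) = total minus noise, summed
    return 10.0 * np.log10(np.maximum(p_s, 1e-12) / np.maximum(p_n, 1e-12))
```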
Advantageous effects
For global SNR estimation in noisy environments, the invention mainly provides a dynamic noise estimation method that combines a human-ear filter bank with a multi-subband convolutional neural network.
1. Through the human-ear filter bank, the auditory mechanism of the human ear in noisy environments is exploited to divide the noisy speech into several subbands, with higher resolution assigned to the low and mid frequency bands, improving the estimation of subband energy in those bands.
2. For the energies of the different subbands, a convolutional-neural-network-based noise ratio estimation method is proposed that dynamically estimates the noise energy ratio of each subband.
3. Using the dynamic subband noise energies, the subband estimates are further fused into a full-band SNR estimate, further improving the accuracy of the global SNR calculation.
Drawings
FIG. 1 is a flow diagram of a global SNR estimation system;
FIG. 2 is a flow chart of the computation of subband noise energy;
FIG. 3 is a framework of the estimation of subband noise ratios;
FIG. 4 shows the MAE between the estimated global SNR and the true global SNR.
Detailed Description
The operation and effects of the present invention are shown below with reference to the accompanying drawings and tables.
This example gives an embodiment of the invention based on the speech dataset AURORA-2J and the noise dataset NOISEX-92. The overall system flow is shown in fig. 1 and comprises four steps: dataset preparation, subband feature extraction, SNENet model training, and global SNR calculation.
The method comprises the following specific steps:
1) Dataset preparation
We used the AURORA-2J and NOISEX-92 datasets for evaluation. 8440 clean AURORA-2J utterances were selected as the clean speech of the training dataset. From NOISEX-92, white noise, pink noise, factory noise, and babble noise were used as background noise. The SNRs were set to 20, 15, 10, 5, 0, -5, and -10 dB; the noisy signals were generated by adding the noise to the clean speech at each target SNR, and these clean and noisy speech pairs were then used to derive the SNR labels. The sampling frequency is 8 kHz and the number of subbands is 17.
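One common recipe for mixing clean speech and noise at a target SNR is sketched below; the description does not give the exact mixing procedure, so the mix_at_snr helper and its details are assumptions.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that the clean-to-noise power ratio equals
    `snr_db` decibels, then add it to `clean`."""
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]    # match utterance length
    p_clean = np.mean(clean.astype(np.float64) ** 2)
    p_noise = np.mean(noise.astype(np.float64) ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```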
For the test set, we added different noise types at different SNRs to 1001 clean AURORA-2J sentences to test the proposed method.
2) Extraction of sub-band features
We resample all audio to 8 kHz, with the number of subbands set to 17; the subband settings are the same as in the technical scheme above.
3) Training SNENet model
The structure of SNENet is shown in FIG. 3; the network is trained as a CNN encoder-decoder model in TensorFlow on the subband energy features. All hidden layers use ReLU as the activation function, and we use the Adam algorithm as the optimizer. The numbers of convolutional filters are 17, 40, 64, 128, 64, 40, and 17, and the kernel lengths and widths are set to 2, 3, 5, 7, 5, 3, and 2, respectively. In the mapping network, the hidden size is set to 512. After the SNENet model is trained, the subband noise energy calculation is performed, as shown in fig. 2.
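A minimal Keras sketch consistent with these hyperparameters is given below. The use of 1-D convolutions over the frame axis, the placement of pooling in the encoder half only, and the global pooling before the mapping network are assumptions not fixed by the description; build_snenet is a hypothetical name.

```python
import tensorflow as tf
from tensorflow.keras import layers

K_SUBBANDS = 17
FILTERS = [17, 40, 64, 128, 64, 40, 17]   # encoder grows, decoder shrinks
KERNELS = [2, 3, 5, 7, 5, 3, 2]           # per-layer kernel sizes

def build_snenet(n_frames=100):
    """C-ED sketch: conv / batch-norm / ReLU stacks with average pooling
    in the encoder, then a two-layer ReLU mapping network (hidden 512)
    and a Sigmoid output of K per-subband noise energy ratios."""
    x_in = layers.Input(shape=(n_frames, K_SUBBANDS))   # subband energies
    x = x_in
    for i, (f, k) in enumerate(zip(FILTERS, KERNELS)):
        x = layers.Conv1D(f, k, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        if i < len(FILTERS) // 2:          # pool only in the encoder half
            x = layers.AveragePooling1D(2)(x)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dense(512, activation="relu")(x)
    r_hat = layers.Dense(K_SUBBANDS, activation="sigmoid")(x)
    return tf.keras.Model(x_in, r_hat)

model = build_snenet()
model.compile(optimizer="adam", loss="mse")   # Adam, as in the description
```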
4) Global signal-to-noise ratio calculation
After calculating the subband noise energies, we need to estimate the global SNR. We evaluate the proposed method with the mean absolute error (MAE), as shown in the following equation:
MAE = (1/N) Σ_{i=1}^{N} |G_i − R_i|
where G_i is the estimated global SNR, R_i is the true global SNR, and N is the number of test utterances; here N = 1001.
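For completeness, the metric is a one-liner in NumPy:

```python
import numpy as np

def mae(snr_est, snr_true):
    """Mean absolute error between estimated and true global SNRs (dB)."""
    return np.mean(np.abs(np.asarray(snr_est) - np.asarray(snr_true)))
```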
Fig. 4 shows the MAE results, where the previously proposed method is a signal-processing-based multi-subband global SNR estimation method. The results show that under stationary noise conditions the proposed method essentially matches the true values, as shown in panels a and b of fig. 4, but degrades slightly under factory noise and babble noise at low SNR, as shown in panels c and d of fig. 4. This weakness is shared by all methods: under such non-stationary noise conditions the distributions of noise and signal are very similar, and it is difficult to devise a method that estimates the noise perfectly.

Claims (3)

1. The global signal-to-noise ratio estimation method based on the auditory filter bank and the convolutional neural network is characterized by comprising the following steps of:
1) for noisy speech, dividing audio into different sub-bands by utilizing a high-pass filter and a low-pass filter according to a bark scale, and calculating the energy of each sub-band;
2) constructing a convolutional neural network, calculating the noise proportion in each sub-band, and further calculating the noise energy in the sub-band;
3) calculating a global SNR;
the method comprises the following specific steps:
1) filter bank based on Bark scale
Dividing the noisy speech into sub-bands of different frequencies by using a multi-sub-band method;
using a Bark-scale filter bank consisting of band-pass filters whose bandwidth is constant on the Bark scale, the cut-off frequencies of the filters being set according to the Bark scale to [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700] Hz, the sampling frequency of the speech being reduced to 8000 Hz in this experiment, which can be expressed as the function y(k,n) = BFB(y(n)),
wherein n is the sample index, k is the k-th of the K subbands into which the audio is divided, and BFB represents the Bark filter bank;
after division into the different subbands, the energy of each subband needs to be calculated as follows: E_total(k,n) = |y(k,n)|²;
2) Computation of sub-band noise energy
In the training stage, the subband energies are input into the proposed subband noise estimation network to estimate the subband noise energy ratios, and the labels in the training process are calculated by the following formula:
r(k) = Σ_{n=1}^{N} |noise(k,n)|² / Σ_{n=1}^{N} |y(k,n)|²
wherein R = [r(1), r(2), ..., r(K)], N is the total number of sampling points in a frame of speech, and r(k) is the noise energy ratio of the k-th subband; in the training process a neural network g_θ is trained so that the error ‖g_θ(E_total) − R‖² is minimal;
wherein R is the set of noise energy ratios of the subbands and g_θ is the proposed subband noise energy estimation network (SNENet);
in the decoding/estimating stage, the sub-band energy E of the test data is directly usedk,totalThe estimated sub-band noise energy ratio is obtained after the sub-band noise energy ratio is input into a trained network, and the final sub-band noise energy can be obtained by multiplying the sub-band noise energy ratio and the total sub-band energy, wherein the following formula is shown:
Figure FDA0002890217360000022
wherein the content of the first and second substances,
Figure FDA0002890217360000023
for the estimated noise ratio of the kth subband, ET(k) Obtaining the magnitude of noise energy in each sub-band;
3) calculation of global signal-to-noise ratio
the power of the speech waveform is calculated as the sum of the powers of all the subbands, and finally the global SNR is obtained by fusing the powers of all the subbands as follows:
SNR = 10 · log10( Σ_{k=1}^{K} P_S(k) / Σ_{k=1}^{K} P_N(k) )
wherein P_S(k) is the sum of the energies of all clean speech in the k-th subband and P_N(k) is the energy of all noise in the k-th subband, and adding these subband energy sums yields the final estimated global SNR;
wherein P_N(k) is calculated by accumulating the estimated subband noise energies E_T(k) over the L_N frames used for the noise estimate, the global SNR being calculated most accurately when the number of speech frames is larger than a certain value, wherein L is the total number of speech frames;
finally, P_S(k) is obtained by subtracting the total noise energy from the total energy.
2. The auditory filter bank and convolutional neural network based global SNR estimation method according to claim 1, wherein a CNN encoder-decoder is used in SNENet: in addition to the fully connected layers, another convolutional network structure, a CNN encoder-decoder (C-ED) network, is used, wherein the C-ED consists of convolution, average pooling, batch normalization, and ReLU layers;
the numbers of encoder and decoder filters correspond, the number of encoder filters gradually increasing and the number of decoder filters gradually decreasing;
the channels of the convolutional layers correspond to the different subbands, the average pooling layer reduces the number of parameters, and convolution kernels of different sizes are arranged in the CNN model to learn different context patterns.
3. The auditory filter bank and convolutional neural network based global SNR estimation method according to claim 1, wherein a fully-connected-layer-based network is used in SNENet; the mapping network consists of two fully connected layers, wherein the activation function is a ReLU; finally, the final subband noise energy ratios are obtained through one more fully connected layer whose activation function is a Sigmoid.
CN202110025619.8A 2021-01-08 2021-01-08 Global signal-to-noise ratio estimation method based on auditory filter bank and convolutional neural network Pending CN112885375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110025619.8A CN112885375A (en) 2021-01-08 2021-01-08 Global signal-to-noise ratio estimation method based on auditory filter bank and convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110025619.8A CN112885375A (en) 2021-01-08 2021-01-08 Global signal-to-noise ratio estimation method based on auditory filter bank and convolutional neural network

Publications (1)

Publication Number Publication Date
CN112885375A true CN112885375A (en) 2021-06-01

Family

ID=76047452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110025619.8A Pending CN112885375A (en) 2021-01-08 2021-01-08 Global signal-to-noise ratio estimation method based on auditory filter bank and convolutional neural network

Country Status (1)

Country Link
CN (1) CN112885375A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496698A (en) * 2021-08-12 2021-10-12 云知声智能科技股份有限公司 Method, device and equipment for screening training data and storage medium
CN113506581A (en) * 2021-07-08 2021-10-15 京东科技控股股份有限公司 Voice enhancement method and device
CN113555028A (en) * 2021-07-19 2021-10-26 首约科技(北京)有限公司 Processing method for voice noise reduction of Internet of vehicles
CN117198290A (en) * 2023-11-06 2023-12-08 深圳市金鼎胜照明有限公司 Acoustic control-based multi-mode LED intelligent control method and apparatus

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105679330A (en) * 2016-03-16 2016-06-15 南京工程学院 Digital hearing aid noise reduction method based on improved sub-band signal-to-noise ratio estimation
US20190318755A1 (en) * 2018-04-13 2019-10-17 Microsoft Technology Licensing, Llc Systems, methods, and computer-readable media for improved real-time audio processing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105679330A (en) * 2016-03-16 2016-06-15 南京工程学院 Digital hearing aid noise reduction method based on improved sub-band signal-to-noise ratio estimation
US20190318755A1 (en) * 2018-04-13 2019-10-17 Microsoft Technology Licensing, Llc Systems, methods, and computer-readable media for improved real-time audio processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI NAN: "Study on Robust Voice Activity Detection Using CNN Encoder-decoder Based on MTF Concept Under Noisy Conditions", JAIST Repository *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113506581A (en) * 2021-07-08 2021-10-15 京东科技控股股份有限公司 Voice enhancement method and device
CN113506581B (en) * 2021-07-08 2024-04-05 京东科技控股股份有限公司 Voice enhancement method and device
CN113555028A (en) * 2021-07-19 2021-10-26 首约科技(北京)有限公司 Processing method for voice noise reduction of Internet of vehicles
CN113496698A (en) * 2021-08-12 2021-10-12 云知声智能科技股份有限公司 Method, device and equipment for screening training data and storage medium
CN113496698B (en) * 2021-08-12 2024-01-23 云知声智能科技股份有限公司 Training data screening method, device, equipment and storage medium
CN117198290A (en) * 2023-11-06 2023-12-08 深圳市金鼎胜照明有限公司 Acoustic control-based multi-mode LED intelligent control method and apparatus

Similar Documents

Publication Publication Date Title
CN107845389B (en) Speech enhancement method based on multi-resolution auditory cepstrum coefficient and deep convolutional neural network
CN110619885B (en) Method for generating confrontation network voice enhancement based on deep complete convolution neural network
Bhat et al. A real-time convolutional neural network based speech enhancement for hearing impaired listeners using smartphone
CN112885375A (en) Global signal-to-noise ratio estimation method based on auditory filter bank and convolutional neural network
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
US8880396B1 (en) Spectrum reconstruction for automatic speech recognition
CN110428849B (en) Voice enhancement method based on generation countermeasure network
Kleijn et al. Optimizing speech intelligibility in a noisy environment: A unified view
Ma et al. Speech enhancement using a masking threshold constrained Kalman filter and its heuristic implementations
CN111192598A (en) Voice enhancement method for jump connection deep neural network
CN113744749B (en) Speech enhancement method and system based on psychoacoustic domain weighting loss function
Swami et al. Speech enhancement by noise driven adaptation of perceptual scales and thresholds of continuous wavelet transform coefficients
Braun et al. Effect of noise suppression losses on speech distortion and ASR performance
CN112992121A (en) Voice enhancement method based on attention residual error learning
Li et al. Multi-resolution auditory cepstral coefficient and adaptive mask for speech enhancement with deep neural network
CN115424627A (en) Voice enhancement hybrid processing method based on convolution cycle network and WPE algorithm
Pirhosseinloo et al. A new feature set for masking-based monaural speech separation
Elshamy et al. DNN-based cepstral excitation manipulation for speech enhancement
Zhou et al. Speech Enhancement via Residual Dense Generative Adversarial Network.
CN114283835A (en) Voice enhancement and detection method suitable for actual communication condition
Sose et al. Sound Source Separation Using Neural Network
CN114566179A (en) Time delay controllable voice noise reduction method
Sivapatham et al. Gammatone Filter Bank-Deep Neural Network-based Monaural speech enhancement for unseen conditions
Zhao Control system and speech recognition of exhibition hall digital media based on computer technology
Seyedin et al. New features using robust MVDR spectrum of filtered autocorrelation sequence for robust speech recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210601