CN114999519A - Voice real-time noise reduction method and system based on double transformation - Google Patents
- Publication number: CN114999519A (application number CN202210838874.9A)
- Authority: CN (China)
- Prior art keywords: time, signal, domain signal, time domain, speech
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L21/0208 — Speech enhancement, e.g. noise reduction or echo cancellation; noise filtering
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G10L21/0232 — Noise filtering characterised by the method used for estimating noise; processing in the frequency domain
- G10L25/30 — Speech or voice analysis techniques characterised by the analysis technique, using neural networks
Abstract
The invention relates to a method and system for real-time speech noise reduction based on a dual transform. The method comprises: framing the speech signal and applying a short-time Fourier transform to obtain a time-frequency signal; masking the time-frequency signal to enhance it; applying an inverse Fourier transform to obtain a time-domain signal; masking the time-domain signal to enhance it and then applying a one-dimensional convolution; and reconstructing the waveform signal by overlap-add. Through two cascaded transforms, the speech signal is first short-time Fourier transformed into a time-frequency domain signal, which is masked to obtain a clean magnitude spectrum; this signal is then transformed a second time into a time-domain signal and masked again to obtain the final clean speech signal.
Description
Technical Field
The invention relates to the technical field of software development, and in particular to a method and system for real-time speech noise reduction based on a dual transform.
Background
With the continuous development of Internet technology, people can broadcast live, hold conferences, or make calls anytime and anywhere through a mobile phone. In these scenarios speech signals are often disturbed by ambient noise, which degrades signal quality, reduces the intelligibility of the audio, and interferes with everyday communication. To improve speech quality, single-channel speech enhancement is commonly used for noise reduction, but existing noise reduction techniques cannot handle non-stationary noise; moreover, single-channel methods usually process only the magnitude spectrum of the signal and retain the original noisy phase, so the quality of the resulting denoised signal is poor.
Disclosure of Invention
Therefore, there is a need for a method and system for real-time speech noise reduction based on a dual transform that achieves a better noise reduction effect.
An embodiment of the invention provides a method for real-time speech noise reduction based on a dual transform, characterized by comprising the following steps:
S1: framing the speech signal and applying a short-time Fourier transform to obtain a time-frequency signal;
S2: masking the time-frequency signal to obtain an enhanced, cleaner time-frequency signal;
S3: applying an inverse Fourier transform to the enhanced time-frequency signal to obtain a time-domain signal;
S4: masking the time-domain signal to obtain an enhanced, cleaner time-domain signal;
S5: applying a one-dimensional convolution to the enhanced time-domain signal;
S6: reconstructing the waveform signal by overlap-add.
Preferably, in step S1 the speech signal is framed with a frame length of 25-35 ms and a frame shift of 5-10 ms.
Preferably, in step S1 the speech signal is framed with a frame length of 32 ms and a frame shift of 8 ms.
Preferably, the short-time Fourier transform is expressed as

Y · e^(jθ_y) = STFT(y)

where Y denotes the magnitude component of the mixed speech signal y after the short-time Fourier transform, M is a mask applied to Y with values between 0 and 1, and θ_y denotes the phase component after the short-time Fourier transform; clean audio is predicted while the phase of the mixed speech is retained.
Preferably, in step S2 the masking of the time-frequency signal is performed by a first partial encoder as follows: the mixture magnitude spectrum Y is passed through a GRU network followed by a fully connected layer and a Sigmoid layer to obtain a mask M, and M is multiplied element-wise with Y to obtain the estimated magnitude spectrum Ŷ, i.e.

Ŷ = M ⊙ Y.
preferably, in step S3, the estimated magnitude spectrum is analyzedAnd original phaseAnd performing inverse Fourier transform to obtain a time-domain signal, and not synthesizing into a waveform signal.
Preferably, between steps S3 and S4 the following step is further performed: after channel normalization, the time-domain signal is passed through a GRU network followed by a fully connected layer and a Sigmoid layer to obtain a time-domain mask M_t, and M_t is multiplied element-wise with the framed time-domain signal x to obtain the estimated time-domain signal, i.e.

x̂ = M_t ⊙ x.
preferably, in step S5, the number of channels is converted into the length of one frame by using one-dimensional convolution, and then the waveform is reconstructed by using an overlap-add technique; the expression is as follows:
the invention also provides a voice real-time noise reduction system, which comprises:
the framing module is used for framing the voice signals;
the short-time Fourier module is used for obtaining a time-frequency signal;
the first part encoder is used for masking the time-frequency signal to enhance and purify the time-frequency signal;
the inverse Fourier transform module is used for performing inverse Fourier transform on the enhanced time-frequency signal to obtain a time-domain signal;
the time domain signal is masked by the second part encoder, so that the time domain signal is enhanced and purified;
the one-dimensional convolution module is used for performing one-dimensional convolution operation on the enhanced time domain signal;
and the overlap-add module is used for performing overlap-add on the signals to reconstruct the waveform signals.
Preferably, the first partial encoder comprises at least two gated recurrent units, i.e. a two-layer GRU network, followed by a fully connected layer and a Sigmoid layer;
the second partial encoder likewise comprises at least two gated recurrent units, i.e. a two-layer GRU network, followed by a fully connected layer and a Sigmoid layer.
According to the method, through two cascaded transforms, the speech signal is first short-time Fourier transformed to obtain a time-frequency domain signal, which is masked to obtain a clean magnitude spectrum; the signal is then transformed a second time into a time-domain signal and masked again to obtain the final clean speech signal.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings. Like reference numerals refer to like parts throughout the drawings; the drawings are not necessarily drawn to scale, emphasis instead being placed on illustrating the principles of the invention.
FIG. 1 is a flow chart of a method for real-time noise reduction of speech based on dual transformation according to the present invention.
Detailed Description
The technical solutions of the present invention are described in further detail below with reference to the drawings and specific embodiments, so that those skilled in the art can better understand and implement the invention; however, the invention is not limited to these embodiments.
As shown in FIG. 1, an embodiment of the present invention provides a method for real-time speech noise reduction based on a dual transform, comprising the following steps:
S1: framing the speech signal and applying a short-time Fourier transform to obtain a time-frequency signal;
S2: masking the time-frequency signal to obtain an enhanced, cleaner time-frequency signal;
S3: applying an inverse Fourier transform to the enhanced time-frequency signal to obtain a time-domain signal;
S4: masking the time-domain signal to obtain an enhanced, cleaner time-domain signal;
S5: applying a one-dimensional convolution to the enhanced time-domain signal;
S6: reconstructing the waveform signal by overlap-add.
Through the two transforms, the time-frequency domain signal is processed first and the time-domain signal second, so that the noisy speech signal is processed progressively. In the second transform, the model processes the signal frame by frame, so the audio signal can be streamed in real time without any loss of model performance. Processing the magnitude spectrum first and then processing in the time domain also has the effect of implicitly processing the phase, so the processed speech signal is cleaner and clearer.
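As a non-limiting illustration of the frame-by-frame streaming described above, the following Python sketch processes audio in 32 ms frames with an 8 ms shift and rebuilds the waveform by overlap-add. The 16 kHz sampling rate, the Hann window, and the placeholder denoise_frame function (standing in for the dual-transform model) are assumptions and are not specified in this document.

```python
import numpy as np

SR = 16000                       # assumed sampling rate; not specified in this document
FRAME = int(0.032 * SR)          # 32 ms frame length -> 512 samples
HOP = int(0.008 * SR)            # 8 ms frame shift   -> 128 samples
WINDOW = np.hanning(FRAME)       # assumed analysis/synthesis window

def denoise_frame(frame: np.ndarray) -> np.ndarray:
    """Placeholder for the dual-transform model applied to a single frame."""
    return frame                 # identity here; the real model masks in both domains

def stream_denoise(samples: np.ndarray) -> np.ndarray:
    """Process the signal frame by frame and rebuild the waveform by overlap-add."""
    out = np.zeros(len(samples))
    norm = np.zeros(len(samples))
    for start in range(0, len(samples) - FRAME + 1, HOP):
        frame = samples[start:start + FRAME] * WINDOW
        enhanced = denoise_frame(frame)               # one frame in, one frame out
        out[start:start + FRAME] += enhanced * WINDOW
        norm[start:start + FRAME] += WINDOW ** 2
    norm[norm == 0.0] = 1.0                           # avoid division by zero at the edges
    return out / norm
```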
In a preferred embodiment, in step S1 the speech signal is framed with a frame length of 25-35 ms and a frame shift of 5-10 ms.
In a preferred embodiment, in step S1 the speech signal is framed with a frame length of 32 ms and a frame shift of 8 ms.
In a preferred embodiment, the short-time Fourier transform is expressed as

Y · e^(jθ_y) = STFT(y)

where Y denotes the magnitude component of the mixed speech signal y after the short-time Fourier transform, M is a mask applied to Y with values between 0 and 1, and θ_y denotes the phase component after the short-time Fourier transform; clean audio is predicted while the phase of the mixed speech is retained.
In a preferred embodiment, in step S2 the masking of the time-frequency signal is performed by the first partial encoder as follows: the mixture magnitude spectrum Y is passed through a GRU network followed by a fully connected layer and a Sigmoid layer to obtain a mask M, and M is multiplied element-wise with Y to obtain the estimated magnitude spectrum Ŷ, i.e.

Ŷ = M ⊙ Y.
in a preferred embodiment, in step S3, the estimated magnitude spectrum is comparedAnd original phaseAnd performing inverse Fourier transform to obtain a time domain signal, and not combining the time domain signal into a waveform signal.
In a preferred embodiment, between steps S3 and S4 the following step is further performed: after channel normalization, the time-domain signal is passed through a GRU network followed by a fully connected layer and a Sigmoid layer to obtain a time-domain mask M_t, and M_t is multiplied element-wise with the framed time-domain signal x to obtain the estimated time-domain signal, i.e.

x̂ = M_t ⊙ x.
in a preferred embodiment, in step S5, the number of channels is converted into the length of one frame using one-dimensional convolution, and then the waveform is reconstructed using overlap-add technique; the expression is as follows:
the invention also provides a voice real-time noise reduction system, which comprises:
the framing module is used for framing the voice signals;
the short-time Fourier module is used for obtaining a time-frequency signal;
the first part encoder is used for masking the time-frequency signal to enhance and purify the time-frequency signal;
the inverse Fourier transform module is used for performing inverse Fourier transform on the enhanced time-frequency signal to obtain a time-domain signal;
the time domain signal is masked by the second part encoder, so that the time domain signal is enhanced and purified;
the one-dimensional convolution module is used for performing one-dimensional convolution operation on the enhanced time domain signal;
and the overlap-add module is used for performing overlap-add on the signals to reconstruct the waveform signals.
The first partial encoder comprises at least two gated recurrent units, i.e. a two-layer GRU network, followed by a fully connected layer and a Sigmoid layer;
the second partial encoder likewise comprises at least two gated recurrent units, i.e. a two-layer GRU network, followed by a fully connected layer and a Sigmoid layer.
Example 1:
As shown in FIG. 1, in order to further improve the quality of the denoised speech while keeping the computational complexity low, the invention provides a dual-transform noise reduction technique that obtains a clean magnitude spectrum in the time-frequency domain in real time and, after a second transform and further noise reduction, also obtains a clean time-domain signal; in this way the phase signal is modelled as well, yielding a higher-quality speech signal.
A real-time noise reduction method based on a dual transform comprises the following steps:
framing the speech signal with a frame length of 32 ms and a frame shift of 8 ms, applying a short-time Fourier transform to obtain a time-frequency signal, masking the time-frequency signal with a first partial encoder, and applying an inverse Fourier transform to obtain a time-domain signal;
passing the time-domain signal through a second partial encoder to obtain a mask and masking the time-domain signal with it;
applying a one-dimensional convolution to the enhanced time-domain signal and then reconstructing the waveform signal by overlap-add.
As a specific implementation, the masking process performed by the first partial encoder comprises the following steps:
S11: take 32 ms as the frame length of the audio signal, frame it with a frame shift of 8 ms, and apply the short-time Fourier transform

Y · e^(jθ_y) = STFT(y)

where Y denotes the magnitude component of the mixed speech signal y after the short-time Fourier transform, M is a mask applied to Y with values between 0 and 1, and θ_y denotes the phase component after the short-time Fourier transform; clean audio is predicted while the phase of the mixed speech is retained;
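A minimal sketch of step S11 under stated assumptions (16 kHz sampling rate and a Hann analysis window, neither of which is specified here): the mixture is framed with a 32 ms frame length and an 8 ms shift, and a per-frame FFT yields the magnitude Y and phase θ_y.

```python
import numpy as np

SR = 16000                       # assumed sampling rate
FRAME = int(0.032 * SR)          # 32 ms -> 512 samples
HOP = int(0.008 * SR)            # 8 ms  -> 128 samples

def frame_and_stft(y: np.ndarray):
    """Frame the mixture y and return per-frame magnitude Y and phase theta_y."""
    window = np.hanning(FRAME)   # assumed window choice
    n_frames = 1 + (len(y) - FRAME) // HOP
    frames = np.stack([y[i * HOP:i * HOP + FRAME] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=-1)          # short-time Fourier transform
    return np.abs(spec), np.angle(spec)          # magnitude Y, phase theta_y
```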
S12: the mixture magnitude spectrum Y is passed through a two-layer GRU network followed by a fully connected layer and a Sigmoid layer to obtain a mask M, and M is multiplied element-wise with Y to obtain the estimated magnitude spectrum Ŷ = M ⊙ Y;
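A possible PyTorch sketch of the first partial encoder of step S12: a two-layer GRU followed by a fully connected layer and a Sigmoid produces a mask M in [0, 1] that is multiplied element-wise with the mixture magnitude spectrum. The hidden size of 256 and the 257 frequency bins (a 512-point FFT at an assumed 16 kHz) are assumptions; only the two-layer GRU, fully connected layer, and Sigmoid are specified here.

```python
import torch
import torch.nn as nn

class FreqMaskEncoder(nn.Module):
    """First partial encoder (step S12): mask the mixture magnitude spectrum."""
    def __init__(self, n_bins: int = 257, hidden: int = 256):
        super().__init__()
        self.gru = nn.GRU(n_bins, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, n_bins)

    def forward(self, mag: torch.Tensor) -> torch.Tensor:
        # mag: (batch, n_frames, n_bins) mixture magnitude spectrum Y
        h, _ = self.gru(mag)
        mask = torch.sigmoid(self.fc(h))         # mask M with values in [0, 1]
        return mask * mag                        # estimated clean magnitude M * Y
```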
S13: the estimated magnitude spectrum Ŷ and the original phase θ_y are combined and an inverse Fourier transform is applied to obtain a framed time-domain signal, but the frames are not yet combined into a waveform signal.
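Step S13 can be sketched as follows: the estimated magnitude is recombined with the original mixture phase and an inverse FFT is applied per frame; the frames are deliberately not yet overlap-added into a waveform.

```python
import numpy as np

def masked_frames_to_time(est_mag: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Recombine the estimated magnitude with the mixture phase and invert per frame."""
    spec = est_mag * np.exp(1j * theta)          # clean magnitude, original noisy phase
    return np.fft.irfft(spec, axis=-1)           # framed time-domain signal (no overlap-add yet)
```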
S21: the second-stage transform processes the time-domain signal; first, the framed time-domain signal output by the first stage is converted into a 256-channel signal by a one-dimensional convolution;
S22: to facilitate real-time processing and convergence of the deep-learning training, channel normalization is applied first; the time-domain signal is then passed through a two-layer GRU network with the same structure as in the first stage, followed by a fully connected layer and a Sigmoid layer, to obtain a time-domain mask, which is multiplied element-wise with the framed time-domain signal to obtain the estimated time-domain signal;
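A possible PyTorch sketch of steps S21-S22: a per-frame one-dimensional convolution projects each frame to 256 channels, channel normalization is applied, and a two-layer GRU followed by a fully connected layer and a Sigmoid produces a time-domain mask. In this sketch the mask is applied to the 256-channel frame representation, which is consistent with step S31 converting the channel count back to the frame length; the GRU hidden size and the kernel size of 1 are assumptions.

```python
import torch
import torch.nn as nn

class TimeMaskEncoder(nn.Module):
    """Second partial encoder (steps S21-S22): mask the framed time-domain signal."""
    def __init__(self, frame_len: int = 512, channels: int = 256, hidden: int = 256):
        super().__init__()
        # S21: per-frame 1-D convolution from frame samples to 256 channels
        self.encode = nn.Conv1d(frame_len, channels, kernel_size=1)
        self.norm = nn.LayerNorm(channels)       # channel normalization
        self.gru = nn.GRU(channels, hidden, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, channels)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, n_frames, frame_len) time-domain frames from the first stage
        feats = self.encode(frames.transpose(1, 2)).transpose(1, 2)   # (batch, n_frames, channels)
        h, _ = self.gru(self.norm(feats))
        mask = torch.sigmoid(self.fc(h))         # time-domain mask with values in [0, 1]
        return mask * feats                      # masked 256-channel frame representation
```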
S31: a one-dimensional convolution first converts the number of channels back to the length of one frame, and the waveform is then reconstructed using the overlap-add technique.
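Step S31 can be sketched as a one-dimensional convolution mapping the 256 channels back to one frame length, followed by overlap-add; the 512-sample frame and 128-sample hop carry over the earlier 16 kHz assumption, and torch.nn.functional.fold is used here as one way to implement the overlap-add.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OverlapAddDecoder(nn.Module):
    """Step S31: map channels back to frame samples and overlap-add the frames."""
    def __init__(self, channels: int = 256, frame_len: int = 512, hop: int = 128):
        super().__init__()
        self.decode = nn.Conv1d(channels, frame_len, kernel_size=1)
        self.frame_len, self.hop = frame_len, hop

    def forward(self, masked: torch.Tensor) -> torch.Tensor:
        # masked: (batch, n_frames, channels) output of the second partial encoder
        frames = self.decode(masked.transpose(1, 2))       # (batch, frame_len, n_frames)
        n_frames = frames.shape[-1]
        length = (n_frames - 1) * self.hop + self.frame_len
        # F.fold places each frame at its hop position and sums the overlaps
        wav = F.fold(frames, output_size=(1, length),
                     kernel_size=(1, self.frame_len), stride=(1, self.hop))
        return wav.squeeze(1).squeeze(1)                   # (batch, length) waveform
```

Chaining FreqMaskEncoder, masked_frames_to_time, TimeMaskEncoder and OverlapAddDecoder in that order reproduces the S11-S31 pipeline sketched above.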
In this way, through two cascaded transforms, the speech signal is first short-time Fourier transformed to obtain a time-frequency domain signal, which is masked to obtain a clean magnitude spectrum; the signal is then transformed a second time into a time-domain signal and masked again to obtain the final clean speech signal.
The above description is only a preferred embodiment of the present invention and is not intended to limit its scope; any equivalent structural or process modification made using the contents of the present specification and drawings, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of the present invention.
Claims (10)
1. A method for real-time speech noise reduction based on a dual transform, characterized by comprising the following steps:
S1: framing the speech signal and applying a short-time Fourier transform to obtain a time-frequency signal;
S2: masking the time-frequency signal to obtain an enhanced, cleaner time-frequency signal;
S3: applying an inverse Fourier transform to the enhanced time-frequency signal to obtain a time-domain signal;
S4: masking the time-domain signal to obtain an enhanced, cleaner time-domain signal;
S5: applying a one-dimensional convolution to the enhanced time-domain signal;
S6: reconstructing the waveform signal by overlap-add.
2. The method for real-time speech noise reduction based on a dual transform according to claim 1, wherein in step S1 the speech signal is framed with a frame length of 25-35 ms and a frame shift of 5-10 ms.
3. The method for real-time speech noise reduction based on a dual transform according to claim 2, wherein in step S1 the speech signal is framed with a frame length of 32 ms and a frame shift of 8 ms.
4. The method for real-time speech noise reduction based on a dual transform according to claim 1, wherein the short-time Fourier transform is expressed as

Y · e^(jθ_y) = STFT(y)

where Y denotes the magnitude component of the mixed speech signal y after the short-time Fourier transform, M is a mask applied to Y with values between 0 and 1, and θ_y denotes the phase component after the short-time Fourier transform; clean audio is predicted while the phase of the mixed speech is retained.
5. The method for real-time speech noise reduction based on a dual transform according to claim 1, wherein
in step S2 the masking of the time-frequency signal is performed by a first partial encoder as follows: the mixture magnitude spectrum Y is passed through a GRU network followed by a fully connected layer and a Sigmoid layer to obtain a mask M, and M is multiplied element-wise with Y to obtain the estimated magnitude spectrum Ŷ, i.e. Ŷ = M ⊙ Y.
7. The method for real-time speech noise reduction based on a dual transform according to claim 6, wherein between steps S3 and S4 the following step is further performed: after channel normalization, the time-domain signal is passed through a GRU network followed by a fully connected layer and a Sigmoid layer to obtain a time-domain mask M_t, and M_t is multiplied element-wise with the framed time-domain signal x to obtain the estimated time-domain signal, i.e. x̂ = M_t ⊙ x.
8. The method for real-time speech noise reduction based on a dual transform according to claim 1, wherein in step S5 a one-dimensional convolution converts the number of channels back to the length of one frame, and the waveform is then reconstructed using the overlap-add technique, i.e. ŝ(n) = Σ_k x̂_k(n − kH), where x̂_k is the k-th enhanced frame and H is the frame shift.
9. A real-time speech noise reduction system, characterized by comprising:
a framing module for framing the speech signal;
a short-time Fourier module for obtaining the time-frequency signal;
a first partial encoder for masking the time-frequency signal to enhance and clean it;
an inverse Fourier transform module for applying an inverse Fourier transform to the enhanced time-frequency signal to obtain a time-domain signal;
a second partial encoder for masking the time-domain signal to enhance and clean it;
a one-dimensional convolution module for applying a one-dimensional convolution to the enhanced time-domain signal;
and an overlap-add module for overlap-adding the signal to reconstruct the waveform signal.
10. The real-time speech noise reduction system according to claim 9, wherein the first partial encoder comprises at least two gated recurrent units, i.e. a two-layer GRU network, followed by a fully connected layer and a Sigmoid layer;
and the second partial encoder likewise comprises at least two gated recurrent units, i.e. a two-layer GRU network, followed by a fully connected layer and a Sigmoid layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210838874.9A CN114999519A (en) | 2022-07-18 | 2022-07-18 | Voice real-time noise reduction method and system based on double transformation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210838874.9A CN114999519A (en) | 2022-07-18 | 2022-07-18 | Voice real-time noise reduction method and system based on double transformation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114999519A true CN114999519A (en) | 2022-09-02 |
Family
ID=83022341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210838874.9A Pending CN114999519A (en) | 2022-07-18 | 2022-07-18 | Voice real-time noise reduction method and system based on double transformation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114999519A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107845389A (en) * | 2017-12-21 | 2018-03-27 | 北京工业大学 | A kind of sound enhancement method based on multiresolution sense of hearing cepstrum coefficient and depth convolutional neural networks |
US20190172476A1 (en) * | 2017-12-04 | 2019-06-06 | Apple Inc. | Deep learning driven multi-channel filtering for speech enhancement |
CN110211602A (en) * | 2019-05-17 | 2019-09-06 | 北京华控创为南京信息技术有限公司 | Intelligent sound enhances communication means and device |
CN110767244A (en) * | 2018-07-25 | 2020-02-07 | 中国科学技术大学 | Speech enhancement method |
CN111128214A (en) * | 2019-12-19 | 2020-05-08 | 网易(杭州)网络有限公司 | Audio noise reduction method and device, electronic equipment and medium |
CN113593594A (en) * | 2021-09-01 | 2021-11-02 | 北京达佳互联信息技术有限公司 | Training method and device of voice enhancement model and voice enhancement method and device |
CN113611324A (en) * | 2021-06-21 | 2021-11-05 | 上海一谈网络科技有限公司 | Method and device for inhibiting environmental noise in live broadcast, electronic equipment and storage medium |
CN113744749A (en) * | 2021-09-18 | 2021-12-03 | 太原理工大学 | Voice enhancement method and system based on psychoacoustic domain weighting loss function |
US20220044696A1 (en) * | 2020-08-06 | 2022-02-10 | LINE Plus Corporation | Methods and apparatuses for noise reduction based on time and frequency analysis using deep learning |
CN114121029A (en) * | 2021-12-23 | 2022-03-01 | 北京达佳互联信息技术有限公司 | Training method and device of speech enhancement model and speech enhancement method and device |
CN114171038A (en) * | 2021-12-10 | 2022-03-11 | 北京百度网讯科技有限公司 | Voice noise reduction method, device, equipment, storage medium and program product |
CN114694670A (en) * | 2022-04-06 | 2022-07-01 | 华南理工大学 | Multi-task network-based microphone array speech enhancement system and method |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20220902 |