CN114999519A - Voice real-time noise reduction method and system based on double transformation - Google Patents


Info

Publication number
CN114999519A
Authority
CN
China
Legal status
Pending
Application number
CN202210838874.9A
Other languages
Chinese (zh)
Inventor
唐镇坤
潘伟
吴庆耀
钟佳
王琅
Current Assignee
China Post Consumer Finance Co ltd
Original Assignee
China Post Consumer Finance Co ltd
Priority date
2022-07-18
Filing date
2022-07-18
Publication date
2022-09-02
Application filed by China Post Consumer Finance Co ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a voice real-time noise reduction method and system based on double transformation. The method comprises: framing the voice signal and performing a short-time Fourier transform to obtain a time-frequency signal; masking the time-frequency signal to enhance it; performing an inverse Fourier transform to obtain a time-domain signal; masking the time-domain signal to enhance it and then performing a one-dimensional convolution; and reconstructing the waveform signal by overlap-add. Through two cascaded transformations, a short-time Fourier transform is first applied to the voice signal to obtain a time-frequency-domain signal, which is masked to obtain a clean magnitude spectrum; the signal is then transformed a second time into the time domain and masked again to obtain the final clean voice signal.

Description

Voice real-time noise reduction method and system based on double transformation
Technical Field
The invention relates to the technical field of software development, in particular to a voice real-time noise reduction method and system based on double transformation.
Background
With the continuous development of Internet technology, people can live-stream, hold conferences or talk anytime and anywhere through a mobile phone. In the process, the voice signal is often corrupted by ambient noise, which degrades its quality, reduces its intelligibility and affects everyday communication. To improve voice quality, single-channel speech enhancement is generally used for noise reduction. However, existing noise reduction techniques cannot handle non-stationary noise signals, and single-channel noise reduction methods usually process only the magnitude spectrum while retaining the original noisy phase, so the quality of the resulting noise-reduced signal is poor.
Disclosure of Invention
Therefore, it is necessary to provide a method and a system for real-time speech noise reduction based on dual transformation that achieve a better noise reduction effect.
An embodiment of the invention provides a speech real-time noise reduction method based on double transformation, comprising the following steps:
S1: framing the voice signal and performing a short-time Fourier transform to obtain a time-frequency signal;
S2: masking the time-frequency signal to enhance and purify it;
S3: performing an inverse Fourier transform on the enhanced time-frequency signal to obtain a time-domain signal;
S4: masking the time-domain signal to enhance and purify it;
S5: performing a one-dimensional convolution on the enhanced time-domain signal;
S6: reconstructing the waveform signal by overlap-add.
Preferably, in step S1, the speech signal is framed with a frame length of 25-35 ms and a frame shift of 5-10 ms.
Preferably, in step S1, the speech signal is framed with a frame length of 32 ms and a frame shift of 8 ms.
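The framing in step S1 can be sketched in Python with NumPy. The text gives the frame length and shift only in milliseconds; the 16 kHz sample rate below is an assumption, under which 32 ms and 8 ms correspond to 512 and 128 samples. `frame_signal` is a hypothetical helper, not code from the patent.

```python
import numpy as np


def frame_signal(x: np.ndarray, sample_rate: int = 16000,
                 frame_ms: float = 32.0, shift_ms: float = 8.0) -> np.ndarray:
    """Split a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    frame_len = int(round(sample_rate * frame_ms / 1000))  # 512 samples at 16 kHz
    hop = int(round(sample_rate * shift_ms / 1000))        # 128 samples at 16 kHz
    if len(x) < frame_len:                                 # pad very short inputs
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop             # trailing partial frame dropped
    # Row i of idx selects samples [i * hop, i * hop + frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


x = np.random.default_rng(0).standard_normal(16000)  # 1 s of placeholder audio
frames = frame_signal(x)
print(frames.shape)  # (122, 512)
```

With a 32 ms frame and an 8 ms shift, consecutive frames overlap by 75%, so every interior sample belongs to four frames.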
Preferably, the short-time Fourier transform employs the following formula:
[Equation image not recoverable from the extracted text]
where Y denotes the magnitude component of the mixed speech signal after the short-time Fourier transform, M is a mask applied to Y with values in [0, 1], and the phase term (an inline image in the original) represents the phase after the short-time Fourier transform; the clean audio is predicted while preserving the phase of the mixed speech.
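The formula itself survives only as an image, but the surrounding definitions (magnitude Y, mask M in [0, 1], retained mixed-speech phase) suggest the following NumPy sketch. It assumes a plain real FFT per frame with no analysis window, which the text does not specify; `masked_spectrum` is a hypothetical name.

```python
import numpy as np


def masked_spectrum(frames: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Apply a magnitude mask M to each frame's spectrum while keeping the noisy phase.

    frames: (n_frames, frame_len) real-valued time-domain frames
    mask:   (n_frames, frame_len // 2 + 1), values expected in [0, 1]
    """
    spec = np.fft.rfft(frames, axis=-1)         # complex spectrum of each frame
    mag, phase = np.abs(spec), np.angle(spec)   # magnitude Y and mixed-speech phase
    est_mag = np.clip(mask, 0.0, 1.0) * mag     # estimated clean magnitude M * Y
    return est_mag * np.exp(1j * phase)         # recombine with the unchanged phase


frames = np.random.default_rng(0).standard_normal((10, 512))
half = masked_spectrum(frames, np.full((10, 257), 0.5))  # halves every magnitude bin
```

Because only the magnitude is scaled, the phase of the result equals the noisy phase wherever the mask is non-zero.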
Preferably, in step S2, the first-part encoder masks the time-frequency signal as follows: the mixed magnitude spectrum Y is passed through a GRU network, a fully connected layer and a Sigmoid layer to obtain a mask M, and M is multiplied by Y to obtain the estimated magnitude spectrum (an inline image in the original). The expressions are as follows:
[Equation images not recoverable from the extracted text]
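A minimal NumPy sketch of such a mask estimator: two stacked GRU layers, a fully connected layer and a Sigmoid, run one frame at a time so the hidden state carries context between frames. The hidden size of 128, the random (untrained) weights and the class names are assumptions for illustration; the text gives no layer sizes or training details.

```python
import numpy as np


def sigmoid(v: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-v))


class GRULayer:
    """Minimal unidirectional GRU layer (inference only, random untrained weights)."""

    def __init__(self, n_in: int, n_hidden: int, rng: np.random.Generator):
        s = 1.0 / np.sqrt(n_hidden)
        # Index 0: reset gate r, 1: update gate z, 2: candidate state n
        self.W = rng.uniform(-s, s, (3, n_hidden, n_in))
        self.U = rng.uniform(-s, s, (3, n_hidden, n_hidden))
        self.b = np.zeros((3, n_hidden))
        self.n_hidden = n_hidden

    def step(self, x: np.ndarray, h: np.ndarray) -> np.ndarray:
        r = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
        z = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
        n = np.tanh(self.W[2] @ x + r * (self.U[2] @ h) + self.b[2])
        return (1.0 - z) * n + z * h          # new hidden state


class MaskEstimator:
    """Two GRU layers -> fully connected layer -> Sigmoid, producing M in (0, 1)."""

    def __init__(self, n_feat: int, n_hidden: int = 128, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.gru1 = GRULayer(n_feat, n_hidden, rng)
        self.gru2 = GRULayer(n_hidden, n_hidden, rng)
        s = 1.0 / np.sqrt(n_hidden)
        self.fc_w = rng.uniform(-s, s, (n_feat, n_hidden))
        self.fc_b = np.zeros(n_feat)
        self.reset()

    def reset(self) -> None:
        self.h1 = np.zeros(self.gru1.n_hidden)
        self.h2 = np.zeros(self.gru2.n_hidden)

    def __call__(self, feat: np.ndarray) -> np.ndarray:
        # Hidden states persist between calls, so frames can be fed as they arrive
        self.h1 = self.gru1.step(feat, self.h1)
        self.h2 = self.gru2.step(self.h1, self.h2)
        return sigmoid(self.fc_w @ self.h2 + self.fc_b)


estimator = MaskEstimator(n_feat=257)                  # 257 bins for a 512-point FFT
frames = np.random.default_rng(1).standard_normal((5, 512))
mag = np.abs(np.fft.rfft(frames, axis=-1))             # mixed magnitude spectrum Y
est_mag = np.stack([estimator(y) * y for y in mag])    # estimated magnitude M * Y
```

Since the Sigmoid keeps M strictly between 0 and 1, the estimated magnitude can only attenuate, never amplify, each bin of Y.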
Preferably, in step S3, an inverse Fourier transform is applied to the estimated magnitude spectrum and the original phase (both inline images in the original) to obtain a time-domain signal, which is not yet synthesized into a waveform signal.
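Step S3 returns to the time domain frame by frame without overlap-adding. A sketch, assuming 512-sample frames; `to_time_frames` is a hypothetical helper.

```python
import numpy as np


def to_time_frames(est_mag: np.ndarray, noisy_phase: np.ndarray,
                   frame_len: int = 512) -> np.ndarray:
    """Inverse real FFT of (estimated magnitude, noisy phase), one frame per row.

    The frames are returned as-is; they are not overlap-added into a waveform here,
    because the second (time-domain) stage still operates on individual frames.
    """
    spec = est_mag * np.exp(1j * noisy_phase)
    return np.fft.irfft(spec, n=frame_len, axis=-1)   # shape (n_frames, frame_len)


frames = np.random.default_rng(0).standard_normal((4, 512))
spec = np.fft.rfft(frames, axis=-1)
rec = to_time_frames(np.abs(spec), np.angle(spec))   # with M = 1 the frames come back
```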
Preferably, after step S3 and before step S4, the following step is also performed: after channel normalization, the time-domain signal is passed through a GRU network, a fully connected layer and a Sigmoid layer to obtain a time-domain mask M, and M is multiplied by the framed time-domain signal to obtain the estimated time-domain signal. The expressions are as follows:
[Equation images not recoverable from the extracted text]
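The text does not define "channel normalization". One plausible reading, assumed in this sketch, is to normalize each frame across its channels (zero mean, unit variance, optional learned scale and shift) before it enters the mask estimator, while the resulting mask multiplies the un-normalized frames. The frames are assumed to be already encoded into 256 channels, and the sigmoid is only a stand-in for the trained GRU/FC/Sigmoid estimator.

```python
import numpy as np


def channel_norm(feat: np.ndarray, gamma=None, beta=None, eps: float = 1e-7) -> np.ndarray:
    """Normalize each frame over its channel axis; feat has shape (n_frames, n_channels)."""
    mean = feat.mean(axis=-1, keepdims=True)
    var = feat.var(axis=-1, keepdims=True)
    out = (feat - mean) / np.sqrt(var + eps)
    if gamma is not None:        # optional learned per-channel scale
        out = out * gamma
    if beta is not None:         # optional learned per-channel shift
        out = out + beta
    return out


encoded = np.random.default_rng(0).standard_normal((5, 256)) * 3.0 + 1.0
normed = channel_norm(encoded)              # input to the mask estimator
mask = 1.0 / (1.0 + np.exp(-normed))        # stand-in for the GRU/FC/Sigmoid output
enhanced = mask * encoded                   # time-domain mask applied to the frames
```

Because each frame is normalized with its own statistics only, this form of normalization needs no look-ahead and fits frame-by-frame real-time processing.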
Preferably, in step S5, a one-dimensional convolution converts the number of channels into the length of one frame, and the waveform is then reconstructed by overlap-add. The expression is as follows:
[Equation image not recoverable from the extracted text]
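Step S5 and the overlap-add can be sketched as follows, assuming a kernel size of 1 for the one-dimensional convolution (the text does not state it), which reduces to a per-frame matrix product from the channel dimension back to the 512-sample frame length. The decoder weights would come from training; here they are random placeholders and the function names are hypothetical.

```python
import numpy as np


def pointwise_conv1d(feat: np.ndarray, weight: np.ndarray, bias=None) -> np.ndarray:
    """Kernel-size-1 Conv1D: (n_frames, c_in) -> (n_frames, c_out); weight is (c_out, c_in)."""
    out = feat @ weight.T
    return out if bias is None else out + bias


def overlap_add(frames: np.ndarray, hop: int) -> np.ndarray:
    """Sum frames shifted by `hop` samples into one waveform."""
    n_frames, frame_len = frames.shape
    out = np.zeros((n_frames - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop: i * hop + frame_len] += frame
    return out


rng = np.random.default_rng(0)
enhanced = rng.standard_normal((122, 256))                 # masked 256-channel frames
w_dec = rng.standard_normal((512, 256)) / np.sqrt(256)     # placeholder trained weights
waveform = overlap_add(pointwise_conv1d(enhanced, w_dec), hop=128)
print(waveform.shape)  # (16000,)
```

Overlap-adding unmodified frames of a signal reproduces each interior sample four times over (32 ms frames, 8 ms shift), so the decoder weights, or a synthesis window, must account for that factor.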
The invention also provides a voice real-time noise reduction system, comprising:
a framing module, used for framing the voice signal;
a short-time Fourier module, used for obtaining a time-frequency signal;
a first-part encoder, used for masking the time-frequency signal to enhance and purify it;
an inverse Fourier transform module, used for performing an inverse Fourier transform on the enhanced time-frequency signal to obtain a time-domain signal;
a second-part encoder, used for masking the time-domain signal to enhance and purify it;
a one-dimensional convolution module, used for performing a one-dimensional convolution on the enhanced time-domain signal;
and an overlap-add module, used for performing overlap-add on the signals to reconstruct the waveform signal.
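To show how the listed modules chain together, here is a hypothetical end-to-end sketch. The two mask estimators are passed in as callables standing in for the trained encoders, both one-dimensional convolutions are assumed to have kernel size 1, and the 512/128-sample framing assumes a 16 kHz sample rate; none of these details come from the text.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

MaskFn = Callable[[np.ndarray], np.ndarray]


@dataclass
class DualTransformDenoiser:
    stage1_mask: MaskFn        # first-part encoder: magnitude spectrum -> mask in [0, 1]
    stage2_mask: MaskFn        # second-part encoder: encoded frames -> mask in [0, 1]
    encoder_w: np.ndarray      # (n_channels, frame_len) kernel-size-1 Conv1D weights
    decoder_w: np.ndarray      # (frame_len, n_channels) kernel-size-1 Conv1D weights
    frame_len: int = 512       # 32 ms at an assumed 16 kHz
    hop: int = 128             # 8 ms at an assumed 16 kHz

    def _frame(self, x: np.ndarray) -> np.ndarray:                    # framing module
        x = np.pad(x, (0, max(0, self.frame_len - len(x))))
        n = 1 + (len(x) - self.frame_len) // self.hop
        idx = np.arange(self.frame_len)[None, :] + self.hop * np.arange(n)[:, None]
        return x[idx]

    def __call__(self, x: np.ndarray) -> np.ndarray:
        frames = self._frame(x)
        spec = np.fft.rfft(frames, axis=-1)                           # short-time Fourier module
        mag, phase = np.abs(spec), np.angle(spec)
        est_mag = self.stage1_mask(mag) * mag                         # first-part encoder
        t_frames = np.fft.irfft(est_mag * np.exp(1j * phase),
                                n=self.frame_len, axis=-1)            # inverse Fourier module
        encoded = t_frames @ self.encoder_w.T                         # Conv1D to n_channels
        enhanced = self.stage2_mask(encoded) * encoded                # second-part encoder
        out_frames = enhanced @ self.decoder_w.T                      # one-dimensional conv module
        y = np.zeros((len(out_frames) - 1) * self.hop + self.frame_len)
        for i, f in enumerate(out_frames):                            # overlap-add module
            y[i * self.hop: i * self.hop + self.frame_len] += f
        return y


def stand_in_mask(a: np.ndarray) -> np.ndarray:
    """Placeholder for a trained estimator: any map into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))


rng = np.random.default_rng(0)
denoiser = DualTransformDenoiser(
    stage1_mask=stand_in_mask,
    stage2_mask=stand_in_mask,
    encoder_w=rng.standard_normal((256, 512)) / np.sqrt(512),
    decoder_w=rng.standard_normal((512, 256)) / np.sqrt(256),
)
out = denoiser(rng.standard_normal(16000))
```

Keeping the estimators as injected callables separates the signal-flow wiring from the learned components, so the same skeleton can be checked with trivial masks before trained models are plugged in.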
Preferably, the first-part encoder comprises at least a two-layer gated recurrent unit (GRU) network, a fully connected layer and a Sigmoid layer;
the second-part encoder likewise comprises at least a two-layer GRU network, a fully connected layer and a Sigmoid layer.
Through two cascaded transformations, the method first applies a short-time Fourier transform to the voice signal to obtain a time-frequency-domain signal and masks it to obtain a clean magnitude spectrum; it then transforms the signal a second time into the time domain and masks it again to obtain the final clean voice signal.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following more detailed description of preferred embodiments of the invention, as illustrated in the accompanying drawings. Like reference numerals refer to like parts throughout the drawings; the drawings are not necessarily drawn to scale, emphasis instead being placed upon illustrating the principles of the invention.
FIG. 1 is a flow chart of a method for real-time noise reduction of speech based on dual transformation according to the present invention.
Detailed Description
The technical solutions of the present invention are further described in detail with reference to the drawings and specific embodiments so that those skilled in the art can better understand and implement the present invention; the embodiments are not intended to limit the present invention.
As shown in FIG. 1, an embodiment of the present invention provides a method for real-time speech noise reduction based on dual transformation, comprising the following steps:
S1: framing the voice signal and obtaining a time-frequency signal through a short-time Fourier transform;
S2: masking the time-frequency signal to enhance and purify it;
S3: performing an inverse Fourier transform on the enhanced time-frequency signal to obtain a time-domain signal;
S4: masking the time-domain signal to enhance and purify it;
S5: performing a one-dimensional convolution on the enhanced time-domain signal;
S6: reconstructing the waveform signal by overlap-add.
Through the two transformations, the time-frequency-domain signal is processed first and the time-domain signal second, so that the noisy signal is enhanced progressively. In both transformation stages the model processes the signal frame by frame, so the audio can be streamed in real time without sacrificing model performance. Because the magnitude spectrum is processed first and the signal is then processed again in the time domain, the phase is effectively processed as well, and the resulting speech is clearer.
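The frame-by-frame, real-time behaviour can be illustrated with a streaming wrapper: each call consumes one 8 ms hop of new samples (128 at an assumed 16 kHz), runs a per-frame processor on the latest 32 ms window, and emits one hop of finished overlap-added output, so the output lags the input by one frame minus one hop (384 samples, 24 ms). `process_frame` is a hypothetical stand-in for the two masking stages plus the decoder convolution; a stateful model such as a GRU-based estimator would keep its hidden state between calls.

```python
import numpy as np


class StreamingProcessor:
    """Streams audio hop by hop through a per-frame processor with overlap-add output."""

    def __init__(self, process_frame, frame_len: int = 512, hop: int = 128):
        self.process_frame = process_frame      # maps one frame_len frame to one frame
        self.frame_len, self.hop = frame_len, hop
        self.in_buf = np.zeros(frame_len)       # most recent frame_len input samples
        self.out_buf = np.zeros(frame_len)      # overlap-add accumulator

    def push(self, block: np.ndarray) -> np.ndarray:
        """Consume exactly `hop` new samples and return `hop` finished output samples."""
        if len(block) != self.hop:
            raise ValueError(f"expected {self.hop} samples, got {len(block)}")
        self.in_buf = np.concatenate([self.in_buf[self.hop:], block])
        self.out_buf += self.process_frame(self.in_buf.copy())
        ready = self.out_buf[: self.hop].copy()   # no later frame overlaps these samples
        self.out_buf = np.concatenate([self.out_buf[self.hop:], np.zeros(self.hop)])
        return ready


# Identity processing scaled by hop / frame_len, so the four overlapping copies sum to 1
stream = StreamingProcessor(lambda frame: frame / 4)
x = np.random.default_rng(0).standard_normal(128 * 20)
y = np.concatenate([stream.push(block) for block in x.reshape(-1, 128)])
# y[384:] equals x[:-384]: the output lags the input by frame_len - hop samples
```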
In a preferred embodiment, in step S1, the speech signal is framed with a frame length of 25-35 ms and a frame shift of 5-10 ms.
In a preferred embodiment, in step S1, the speech signal is framed with a frame length of 32 ms and a frame shift of 8 ms.
In a preferred embodiment, the short-time Fourier transform employs the following formula:
[Equation image not recoverable from the extracted text]
where Y denotes the magnitude component of the mixed speech signal after the short-time Fourier transform, M is a mask applied to Y with values in [0, 1], and the phase term (an inline image in the original) represents the phase after the short-time Fourier transform; the clean audio is predicted while preserving the phase of the mixed speech.
In a preferred embodiment, in step S2, the first-part encoder masks the time-frequency signal as follows: the mixed magnitude spectrum Y is passed through a GRU network, a fully connected layer and a Sigmoid layer to obtain a mask M, and M is multiplied by Y to obtain the estimated magnitude spectrum (an inline image in the original). The expressions are as follows:
[Equation images not recoverable from the extracted text]
In a preferred embodiment, in step S3, an inverse Fourier transform is applied to the estimated magnitude spectrum and the original phase (both inline images in the original) to obtain a time-domain signal, which is not yet combined into a waveform signal.
In a preferred embodiment, after step S3 and before step S4, the following step is also performed: after channel normalization, the time-domain signal is passed through a GRU network, a fully connected layer and a Sigmoid layer to obtain a time-domain mask M, and M is multiplied by the framed time-domain signal to obtain the estimated time-domain signal. The expressions are as follows:
[Equation images not recoverable from the extracted text]
In a preferred embodiment, in step S5, a one-dimensional convolution converts the number of channels into the length of one frame, and the waveform is then reconstructed by overlap-add. The expression is as follows:
[Equation image not recoverable from the extracted text]
The invention also provides a voice real-time noise reduction system, comprising:
a framing module, used for framing the voice signal;
a short-time Fourier module, used for obtaining a time-frequency signal;
a first-part encoder, used for masking the time-frequency signal to enhance and purify it;
an inverse Fourier transform module, used for performing an inverse Fourier transform on the enhanced time-frequency signal to obtain a time-domain signal;
a second-part encoder, used for masking the time-domain signal to enhance and purify it;
a one-dimensional convolution module, used for performing a one-dimensional convolution on the enhanced time-domain signal;
and an overlap-add module, used for performing overlap-add on the signals to reconstruct the waveform signal.
The first-part encoder comprises at least a two-layer gated recurrent unit (GRU) network, a fully connected layer and a Sigmoid layer;
the second-part encoder likewise comprises at least a two-layer GRU network, a fully connected layer and a Sigmoid layer.
Example 1:
As shown in FIG. 1, to further improve the quality of the noise-reduced speech while keeping the computational complexity low, the present invention provides a dual-transform noise reduction technique. It obtains a clean magnitude spectrum in the time-frequency domain in real time and, after a second transform and further noise reduction, a clean time-domain signal; in this way the method additionally models the phase and obtains a higher-quality speech signal.
A real-time noise reduction method based on double transformation comprises the following steps:
framing the voice signal with a frame length of 32 ms and a frame shift of 8 ms, performing a short-time Fourier transform to obtain a time-frequency signal, masking the time-frequency signal with the first-part encoder, and performing an inverse Fourier transform to obtain a time-domain signal;
passing the time-domain signal through the second-part encoder to obtain a mask, and masking the time-domain signal;
performing a one-dimensional convolution on the enhanced time-domain signal, and then reconstructing the waveform signal by overlap-add.
As a specific implementation, the processing comprises the following steps:
S11: the audio signal is framed with a frame length of 32 ms and a frame shift of 8 ms, and a short-time Fourier transform is performed:
[Equation image not recoverable from the extracted text]
where Y denotes the magnitude component of the mixed speech signal after the short-time Fourier transform, M is a mask applied to Y with values in [0, 1], and the phase term (an inline image in the original) represents the phase after the short-time Fourier transform; the clean audio is predicted while preserving the phase of the mixed speech;
S12: the mixed magnitude spectrum Y is passed through a two-layer GRU network, a fully connected layer and a Sigmoid layer to obtain a mask M, and M is multiplied by Y to obtain the estimated magnitude spectrum:
[Equation images not recoverable from the extracted text]
S13: an inverse Fourier transform is applied to the estimated magnitude spectrum and the original phase (both inline images in the original) to obtain a time-domain signal, which is not yet combined into a waveform signal.
S21: the second transformation stage processes the time-domain signal. First, the framed time-domain signal output by stage S1 is converted into a 256-channel signal by a one-dimensional convolution:
[Equation image not recoverable from the extracted text]
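The 256-channel encoding in S21 can be read as a one-dimensional convolution along the frame axis whose input channels are the 512 samples of each frame. Assuming a kernel size of 1 (not stated in the text), each output step is simply a linear map of one input frame, as this explicit-loop sketch shows; the weights are random placeholders for trained ones and `conv1d_k1` is a hypothetical name.

```python
import numpy as np


def conv1d_k1(frames: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Conv1D with kernel size 1 along the frame (time) axis.

    frames: (n_frames, frame_len), each frame's samples treated as input channels
    weight: (n_channels, frame_len)
    returns (n_frames, n_channels)
    """
    n_frames = frames.shape[0]
    out = np.empty((n_frames, weight.shape[0]))
    for t in range(n_frames):            # a size-1 kernel sees one frame per output step
        out[t] = weight @ frames[t]
    return out


rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 512))                 # time-domain frames from stage S1
w_enc = rng.standard_normal((256, 512)) / np.sqrt(512)  # placeholder for trained weights
encoded = conv1d_k1(frames, w_enc)                      # (10, 256) channel representation
```

The loop is written out for clarity; `frames @ w_enc.T` computes the same result in one vectorized step.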
S22: to facilitate real-time processing and the convergence of deep-learning training, channel normalization is applied first; the time-domain signal is then passed through a two-layer GRU network with the same structure as in S1, a fully connected layer and a Sigmoid layer to obtain a time-domain mask M, and M is multiplied by the framed time-domain signal to obtain the estimated time-domain signal:
[Equation images not recoverable from the extracted text]
S31: stage S3 first converts the number of channels into the length of one frame using a one-dimensional convolution, and then reconstructs the waveform using overlap-add.
Through two cascaded transformations, the method first applies a short-time Fourier transform to the voice signal to obtain a time-frequency-domain signal and masks it to obtain a clean magnitude spectrum; it then transforms the signal a second time into the time domain and masks it again to obtain the final clean voice signal.
The above description is only a preferred embodiment of the present invention and is not intended to limit its scope; all equivalent structural or process modifications made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of the present invention.

Claims (10)

1. A speech real-time noise reduction method based on double transformation, characterized by comprising the following steps:
S1: framing the voice signal and obtaining a time-frequency signal through a short-time Fourier transform;
S2: masking the time-frequency signal to enhance and purify it;
S3: performing an inverse Fourier transform on the enhanced time-frequency signal to obtain a time-domain signal;
S4: masking the time-domain signal to enhance and purify it;
S5: performing a one-dimensional convolution on the enhanced time-domain signal;
S6: reconstructing the waveform signal by overlap-add.
2. The speech real-time noise reduction method based on double transformation according to claim 1, wherein in step S1 the speech signal is framed with a frame length of 25-35 ms and a frame shift of 5-10 ms.
3. The speech real-time noise reduction method based on double transformation according to claim 2, wherein in step S1 the speech signal is framed with a frame length of 32 ms and a frame shift of 8 ms.
4. The speech real-time noise reduction method based on double transformation according to claim 1, wherein the short-time Fourier transform employs the following formula:
[Equation image not recoverable from the extracted text]
where Y denotes the magnitude component of the mixed speech signal after the short-time Fourier transform, M is a mask applied to Y with values in [0, 1], and the phase term (an inline image in the original) represents the phase after the short-time Fourier transform; the clean audio is predicted while preserving the phase of the mixed speech.
5. The speech real-time noise reduction method based on double transformation according to claim 1, wherein in step S2 the first-part encoder masks the time-frequency signal as follows: the mixed magnitude spectrum Y is passed through a GRU network, a fully connected layer and a Sigmoid layer to obtain a mask M, and M is multiplied by Y to obtain the estimated magnitude spectrum (an inline image in the original); the expressions are as follows:
[Equation images not recoverable from the extracted text]
6. The method according to claim 5, wherein in step S3 an inverse Fourier transform is applied to the estimated magnitude spectrum and the original phase (both inline images in the original) to obtain a time-domain signal, which is not combined into a waveform signal.
7. The speech real-time noise reduction method based on double transformation according to claim 6, wherein after step S3 and before step S4 the following step is further performed: after channel normalization, the time-domain signal is passed through a GRU network, a fully connected layer and a Sigmoid layer to obtain a time-domain mask M, and M is multiplied by the framed time-domain signal to obtain the estimated time-domain signal; the expressions are as follows:
[Equation images not recoverable from the extracted text]
8. The speech real-time noise reduction method based on double transformation according to claim 1, wherein in step S5 a one-dimensional convolution converts the number of channels into the length of one frame, and the waveform is then reconstructed by overlap-add; the expression is as follows:
[Equation image not recoverable from the extracted text]
9. A real-time voice noise reduction system, characterized by comprising:
a framing module, used for framing the voice signal;
a short-time Fourier module, used for obtaining a time-frequency signal;
a first-part encoder, used for masking the time-frequency signal to enhance and purify it;
an inverse Fourier transform module, used for performing an inverse Fourier transform on the enhanced time-frequency signal to obtain a time-domain signal;
a second-part encoder, used for masking the time-domain signal to enhance and purify it;
a one-dimensional convolution module, used for performing a one-dimensional convolution on the enhanced time-domain signal;
and an overlap-add module, used for performing overlap-add on the signals to reconstruct the waveform signal.
10. The speech real-time noise reduction system according to claim 9, wherein the first-part encoder comprises at least a two-layer gated recurrent unit (GRU) network, a fully connected layer and a Sigmoid layer;
the second-part encoder likewise comprises at least a two-layer GRU network, a fully connected layer and a Sigmoid layer.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210838874.9A CN114999519A (en) 2022-07-18 2022-07-18 Voice real-time noise reduction method and system based on double transformation

Publications (1)

Publication Number Publication Date
CN114999519A 2022-09-02

Family

ID=83022341

Country Status (1)

Country Link
CN (1) CN114999519A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107845389A (en) * 2017-12-21 2018-03-27 北京工业大学 A kind of sound enhancement method based on multiresolution sense of hearing cepstrum coefficient and depth convolutional neural networks
US20190172476A1 (en) * 2017-12-04 2019-06-06 Apple Inc. Deep learning driven multi-channel filtering for speech enhancement
CN110211602A (en) * 2019-05-17 2019-09-06 北京华控创为南京信息技术有限公司 Intelligent sound enhances communication means and device
CN110767244A (en) * 2018-07-25 2020-02-07 中国科学技术大学 Speech enhancement method
CN111128214A (en) * 2019-12-19 2020-05-08 网易(杭州)网络有限公司 Audio noise reduction method and device, electronic equipment and medium
CN113593594A (en) * 2021-09-01 2021-11-02 北京达佳互联信息技术有限公司 Training method and device of voice enhancement model and voice enhancement method and device
CN113611324A (en) * 2021-06-21 2021-11-05 上海一谈网络科技有限公司 Method and device for inhibiting environmental noise in live broadcast, electronic equipment and storage medium
CN113744749A (en) * 2021-09-18 2021-12-03 太原理工大学 Voice enhancement method and system based on psychoacoustic domain weighting loss function
US20220044696A1 (en) * 2020-08-06 2022-02-10 LINE Plus Corporation Methods and apparatuses for noise reduction based on time and frequency analysis using deep learning
CN114121029A (en) * 2021-12-23 2022-03-01 北京达佳互联信息技术有限公司 Training method and device of speech enhancement model and speech enhancement method and device
CN114171038A (en) * 2021-12-10 2022-03-11 北京百度网讯科技有限公司 Voice noise reduction method, device, equipment, storage medium and program product
CN114694670A (en) * 2022-04-06 2022-07-01 华南理工大学 Multi-task network-based microphone array speech enhancement system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20220902