CN114067825A - Comfort noise generation method based on time-frequency masking estimation and application thereof - Google Patents


Info

Publication number
CN114067825A
Authority
CN
China
Prior art keywords
time
frequency
noise
estimation
comfort noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111360253.6A
Other languages
Chinese (zh)
Inventor
何平
樊晓辉
蒋升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suirui Technology Group Co Ltd
Original Assignee
Suirui Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-17
Filing date
2021-11-17
Publication date
2022-02-18
Application filed by Suirui Technology Group Co Ltd
Priority to CN202111360253.6A
Publication of CN114067825A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G10L21/0232 Processing in the frequency domain
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Quality & Reliability (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention relates to the technical field of noise processing, and in particular discloses a comfort noise generation method based on time-frequency masking estimation and applications thereof. The method comprises the following steps: S1, converting the time-domain signal x(n) picked up by the microphone element into a time-frequency domain signal, and obtaining the spectrum X(l, k) of the microphone signal in the l-th frame and the k-th frequency band; S2, estimating the comfort noise power spectral density; S3, generating comfort noise with the corresponding energy; S4, synthesizing the target speech. On the one hand, the scheme estimates the stationary noise component from time-frequency masking information obtained by deep learning, which prevents speech energy from accumulating into the stationary noise estimate and producing excessive comfort noise; on the other hand, comfort noise is added selectively per time-frequency unit, which avoids introducing noise into speech-dominated time-frequency units.

Description

Comfort noise generation method based on time-frequency masking estimation and application thereof
Technical Field
The present invention relates to the field of noise processing technologies, and in particular, to a comfort noise generation method based on time-frequency masking estimation and an application thereof.
Background
Noise suppression and speech enhancement have long been key techniques for improving speech communication quality in conferencing systems and conferencing equipment. The traditional signal processing approach tracks the noise power spectral density and the speech power spectral density in the signal, constructs a frequency-domain mask with values between 0 and 1 based on Wiener filtering, and applies this mask to the microphone signal to suppress background noise. Its drawbacks are that it cannot effectively handle non-stationary noise in the environment and that speech distortion becomes too large under strong noise. Time-frequency mask estimation based on deep learning is another common noise suppression method: a model is trained on noisy speech paired with clean speech so that it estimates the time-frequency mask directly from the mixed signal. The deep learning method handles non-stationary noise better, but it also suffers from speech distortion caused by over-suppression.
In summary, the main disadvantages of the prior art are as follows.
Methods that track the stationary component of the background noise by signal processing produce excessively loud comfort noise when the ambient noise is strong.
Existing comfort noise generation methods add noise indiscriminately to all time-frequency units, so a certain amount of noise is also added to speech-dominated time-frequency regions.
How to markedly improve perceived listening quality while generating only a moderate amount of comfort noise is therefore a pressing problem.
The existing scheme estimates the stationary component of the ambient noise and then adds white noise of the same energy to the spectrum to reduce the perceptual impact of speech distortion.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention aims to provide a comfort noise generation method based on time-frequency masking estimation, and applications thereof, that improve communication quality in noise suppression and speech enhancement for voice conferencing systems and similar applications.
In order to achieve the above object, the present invention provides a comfort noise generation method based on time-frequency masking estimation, comprising the steps of:
S1, converting the time-domain signal x(n) picked up by the microphone element into a time-frequency domain signal, and obtaining the spectrum X(l, k) of the microphone signal in the l-th frame and the k-th frequency band;
S2, estimating the comfort noise power spectral density;
S3, generating comfort noise with the corresponding energy;
S4, synthesizing the target speech.
In a specific implementation, the first step decomposes the microphone array signal: a Fourier transform converts the time-domain signal into a frequency-domain signal to obtain its spectrum, which facilitates the subsequent noise processing. The second step estimates the comfort noise power spectral density from the frequency-domain signal; it consists of three sub-steps performed in sequence: noise power spectral density estimation, stationary noise power spectral density estimation, and comfort noise energy estimation. The noise power spectral density estimation uses time-frequency masking information obtained with existing techniques. The third step generates a comfort noise spectrum; it links the preceding estimation to the subsequent speech processing and analysis. The fourth step estimates the target speech: the target speech is first estimated in the frequency domain, then an inverse Fourier transform yields the target time-domain signal, i.e. the target speech signal, which is output.
Optionally, the spectrum X(l, k) in S1 is calculated as follows:
[Formula image not reproduced: short-time Fourier transform of x(n) giving X(l, k)]
where N = 512 is the frame length, w(n) is a Hamming window of length 512, n is the time (sample) index, l is the frame index, k is the frequency index, and X(l, k) is the spectrum of the microphone signal in the l-th frame and the k-th frequency band.
Optionally, S2 specifically includes:
S21, obtaining the time-frequency masking value M(l, k) and calculating the ambient noise power spectral density ρ_v(k) for each frequency band k:
[Formula image not reproduced: recursive update of ρ_v(k)]
where |·| denotes the modulus of a complex number, and α is the smoothing factor between adjacent frames, with a value between 0 and 1;
S22, estimating the stationary noise power spectral density ρ_min(k) for each frequency band k:
[Formula image not reproduced: update of ρ_min(k)]
The stationary noise power spectral density represents the minimum component of the tracked noise, i.e. the minimum of the noise component in the signal; α is the same smoothing factor as in step S21, and γ is the stationary noise control factor, whose value is less than 1;
S23, calculating the comfort noise energy ζ:
[Formula image not reproduced: ζ computed from ρ_min(k) over the K frequency bands]
where K equals half the frame length.
Optionally, the smoothing factor α is 0.95.
Optionally, the stationary noise control factor γ is 0.08.
Optionally, S3 specifically includes:
generating a comfort noise power spectrum υ(l, k):
[Formula image not reproduced: comfort noise spectrum υ(l, k)]
where σ(n) is a white noise sequence of unit energy and length 512.
Optionally, S4 specifically includes:
S41, calculating the frequency-domain estimate of the target speech according to the following formula:
[Formula image not reproduced: frequency-domain target speech estimate]
where X(l, k) is the spectrum, M(l, k) is the time-frequency masking value, and υ(l, k) is the comfort noise power spectrum;
S42, performing an inverse Fourier transform to obtain the time-domain estimate of the target speech:
[Formula image not reproduced: inverse Fourier transform giving the time-domain estimate]
where w(k) is the Hamming window of frame length 512.
The invention also provides a comfort noise generation system based on time-frequency masking estimation, which implements the above comfort noise generation method and comprises:
a signal decomposition module, configured to convert the time-domain signal x(n) picked up by the microphone element into a time-frequency domain signal and obtain the spectrum X(l, k) of the microphone signal in the l-th frame and the k-th frequency band;
a comfort noise power spectral density estimation module, configured to estimate the comfort noise power spectral density;
a comfort noise generation module, configured to generate comfort noise with the corresponding energy;
and a target speech synthesis module, configured to synthesize the target speech.
Optionally, the comfort noise power spectral density estimation module includes a noise power spectral density estimation module, a stationary noise power spectral density estimation module, and a comfort noise energy estimation module, which sequentially process the signal.
The invention also provides an electronic device comprising a memory and a processor, wherein the processor implements the steps of the above comfort noise generation method based on time-frequency masking estimation when executing a computer program stored in the memory.
Compared with the prior art, the comfort noise generation method based on time-frequency masking estimation and its applications provided by the invention comprise the following steps: S1, converting the time-domain signal x(n) picked up by the microphone element into a time-frequency domain signal, and obtaining the spectrum X(l, k) of the microphone signal in the l-th frame and the k-th frequency band; S2, estimating the comfort noise power spectral density; S3, generating comfort noise with the corresponding energy; S4, synthesizing the target speech. On the one hand, the scheme estimates the stationary noise component from time-frequency masking information obtained by deep learning, which prevents speech energy from accumulating into the stationary noise estimate and producing excessive comfort noise; on the other hand, comfort noise is added selectively per time-frequency unit, which avoids introducing noise into speech-dominated time-frequency units.
Drawings
Fig. 1 is a functional block diagram of a comfort noise generation system based on time-frequency masking estimation according to the present invention.
Detailed Description
The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.
Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.
Example one
A comfort noise generation method based on time-frequency masking estimation according to the preferred embodiment of the present invention comprises the following steps:
S1, converting the time-domain signal x(n) picked up by the microphone element into a time-frequency domain signal, and obtaining the spectrum X(l, k) of the microphone signal in the l-th frame and the k-th frequency band;
S2, estimating the comfort noise power spectral density;
S3, generating comfort noise with the corresponding energy;
S4, synthesizing the target speech.
In a preferred embodiment, the spectrum X(l, k) in S1 is calculated as follows:
a time-frequency domain representation is obtained by applying a short-time Fourier transform to the time-domain signal x(n):
[Formula image not reproduced: short-time Fourier transform of x(n) giving X(l, k)]
where N = 512 is the frame length, w(n) is a Hamming window of length 512, n is the time (sample) index, l is the frame index, and k is the frequency index. X(l, k) is the spectrum of the microphone signal in the l-th frame and the k-th frequency band. A frequency band refers to the signal component at a given frequency. The Hamming window function gives the window value at each sample time n; the Hamming window is well known and is not described further here.
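The S1 analysis step can be sketched as a framed FFT. This is a minimal pure-Python sketch, not the patent's formula (which is not reproduced above): it assumes a symmetric Hamming window, a hop size of 256 samples (50% overlap, which the text does not state), and a hand-written radix-2 FFT; the names `stft`, `fft`, `HAMMING`, and `HOP` are illustrative.

```python
import cmath
import math

N = 512    # frame length stated in the text
HOP = 256  # ASSUMPTION: 50% overlap; the text does not give the hop size

# ASSUMPTION: symmetric Hamming window w(n), n = 0..N-1
HAMMING = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def stft(x):
    """Return X[l][k]: the N-point spectrum of each Hamming-windowed frame l."""
    frames = []
    for start in range(0, len(x) - N + 1, HOP):
        segment = [HAMMING[n] * x[start + n] for n in range(N)]
        frames.append(fft(segment))
    return frames
```

For a real input only bins k = 0..N/2 are independent; the K = 256 bands mentioned in the text correspond to roughly that half of each 512-point spectrum.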
Example two
This embodiment is the same as the comfort noise generation method based on time-frequency masking estimation in Example one, except for the following:
S2 specifically includes: assuming the time-frequency masking value estimated by deep learning is M(l, k), the ambient noise power spectral density ρ_v(k) is calculated for each frequency band k:
[Formula image not reproduced: recursive update of ρ_v(k)]
where |·| denotes the modulus of a complex number, and α is the smoothing factor between adjacent frames, with a value between 0 and 1. The invention preferably uses α = 0.95: if α is too small, the power spectral density estimate fluctuates too much; if it is too large, the energy estimate is over-smoothed and the ability to model non-stationary noise is reduced. M(l, k) is a masking value estimated by deep learning using existing techniques, with a value between 0 and 1. When M(l, k) is less than 0.5, the time-frequency unit is considered dominated by ambient noise and the noise power spectral density is updated; otherwise, the unit is considered dominated by speech and the update of the noise power spectral density is suspended. The result of this step is used in the stationary noise power spectral density estimation below.
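The mask-gated update can be sketched as follows. The exact formula is not reproduced above, so this assumes the common first-order recursive form ρ_v(k) ← α·ρ_v(k) + (1 - α)·|X(l, k)|², applied only where M(l, k) < 0.5 as the text describes; the function name `update_noise_psd` is hypothetical.

```python
ALPHA = 0.95          # inter-frame smoothing factor (value from the text)
MASK_THRESHOLD = 0.5  # M(l, k) < 0.5 marks a noise-dominated unit (from the text)


def update_noise_psd(rho_v, X_frame, M_frame, alpha=ALPHA):
    """One frame of the gated noise-PSD update (ASSUMED recursive form).

    rho_v:   previous noise PSD per band k
    X_frame: complex spectrum X(l, k) of the current frame
    M_frame: mask values M(l, k) in [0, 1] for the current frame
    """
    updated = list(rho_v)
    for k, (x, m) in enumerate(zip(X_frame, M_frame)):
        if m < MASK_THRESHOLD:
            # Noise-dominated unit: smooth in the new power |X(l, k)|^2
            updated[k] = alpha * rho_v[k] + (1.0 - alpha) * abs(x) ** 2
        # Speech-dominated unit: keep the previous estimate (update suspended)
    return updated
```

For example, with ρ_v = [1.0, 1.0], X = [2, 2] and M = [0.2, 0.9], only the first band is updated, to 0.95 · 1 + 0.05 · 4 = 1.15; the second band stays at 1.0.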
Estimating the stationary noise power spectral density ρ_min(k) for each frequency band k:
[Formula image not reproduced: update of ρ_min(k)]
This power spectral density represents the minimum component of the tracked noise, i.e. the minimum of the noise component in the signal, and is obtained without directly smoothing the microphone signal. Here α is the same smoothing factor as in the previous step. γ is the stationary noise control factor; because stationary noise is only part of the noise, its energy is less than the noise energy, i.e. γ < 1. The control factor is set to γ = 0.08, which yields an appropriate comfort noise energy and prevents the comfort noise from being too loud. The result of this step is used to calculate the comfort noise energy.
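Because the ρ_min(k) formula is not reproduced above, the sketch below substitutes one common minimum-tracking rule as an assumption, not the patent's equation: an internal tracker follows ρ_v(k) down immediately and rises toward it slowly with the same α, and the output is scaled by γ = 0.08 so that the stationary estimate stays below the noise estimate (γ < 1). The names `update_stationary_psd` and `tracker` are hypothetical.

```python
ALPHA = 0.95  # same smoothing factor as the noise-PSD step (from the text)
GAMMA = 0.08  # stationary noise control factor, gamma < 1 (from the text)


def update_stationary_psd(tracker, rho_v, alpha=ALPHA, gamma=GAMMA):
    """ASSUMED minimum tracking of rho_v(k); returns (new_tracker, rho_min).

    tracker: per-band running minimum carried between frames (hypothetical state)
    rho_v:   current noise PSD per band
    """
    new_tracker = []
    for prev, power in zip(tracker, rho_v):
        if power < prev:
            new_tracker.append(power)  # follow decreases immediately
        else:
            new_tracker.append(alpha * prev + (1.0 - alpha) * power)  # rise slowly
    rho_min = [gamma * t for t in new_tracker]
    return new_tracker, rho_min
```

The asymmetric rule keeps the tracker near the floor of ρ_v(k), so short noise bursts raise the stationary estimate only gradually.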
Calculating the comfort noise energy ζ:
[Formula image not reproduced: ζ computed from ρ_min(k)]
Physically, this step averages ρ_min(k) over all frequency bands k, where K is the number of frequency bands. K equals half the frame length, i.e. 256. The result is used as the comfort noise energy in the subsequent steps.
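Read literally, the averaging described above corresponds to the following expression (the sum runs over the K frequency bands; the exact index range used in the original formula is not shown):

```latex
\zeta = \frac{1}{K} \sum_{k} \rho_{\min}(k), \qquad K = \frac{N}{2} = 256
```

For instance, if ρ_min(k) were 0.02 in 128 of the bands and 0.06 in the other 128, then ζ = (128 · 0.02 + 128 · 0.06) / 256 = 0.04.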
Step S3 specifically includes: generating a comfort noise power spectrum υ(l, k):
[Formula image not reproduced: comfort noise spectrum υ(l, k)]
where σ(n) is a white noise sequence of unit energy and length 512. The result is used in the next step to compute the final speech spectrum.
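A minimal sketch of S3 follows. The υ(l, k) formula is not reproduced above, so this assumes that "unit energy" means unit total energy (Σσ(n)² = 1) and that υ(l, k) is the 512-point FFT of σ(n) scaled by √ζ; by Parseval's theorem the mean of |υ(l, k)|² over all 512 bins is then exactly ζ. The names `comfort_noise_spectrum` and `fft` are hypothetical, and the FFT is hand-written to keep the block dependency-free.

```python
import cmath
import math
import random

N = 512  # noise sequence length stated in the text


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def comfort_noise_spectrum(zeta, rng):
    """ASSUMED form: sqrt(zeta) times the FFT of a unit-energy white noise sequence."""
    sigma = [rng.gauss(0.0, 1.0) for _ in range(N)]
    norm = math.sqrt(sum(s * s for s in sigma))
    sigma = [s / norm for s in sigma]  # now sum(sigma[n] ** 2) == 1
    return [math.sqrt(zeta) * c for c in fft(sigma)]
```

A fresh σ(n) would be drawn for every frame l, for example by calling `comfort_noise_spectrum(zeta, rng)` once per frame with a shared `random.Random` instance.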
Step S4 specifically includes: obtaining the frequency-domain estimate of the target speech from the comfort noise power spectrum obtained in step S3:
[Formula image not reproduced: frequency-domain target speech estimate]
That is, when the energy after time-frequency masking is less than the comfort noise energy, comfort noise is added to that time-frequency unit, which avoids distortion caused by local energy loss. Conversely, when the energy after time-frequency masking is greater than the comfort noise energy, no noise is added, which prevents the added noise energy from being too large.
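The selective addition rule described above can be sketched per time-frequency unit. The formula itself is not reproduced, so this reads the prose literally: compare the masked power |M(l, k)·X(l, k)|² with ζ, and add υ(l, k) only when the masked power is smaller. The name `target_spectrum` is hypothetical.

```python
def target_spectrum(X_frame, M_frame, ups_frame, zeta):
    """Masked spectrum with comfort noise added only to low-energy units.

    X_frame:   complex spectrum X(l, k)
    M_frame:   time-frequency mask M(l, k)
    ups_frame: comfort noise spectrum upsilon(l, k)
    zeta:      comfort noise energy
    """
    out = []
    for x, m, v in zip(X_frame, M_frame, ups_frame):
        masked = m * x
        if abs(masked) ** 2 < zeta:
            out.append(masked + v)  # masked energy below zeta: add comfort noise
        else:
            out.append(masked)      # masked energy at or above zeta: add nothing
    return out
```

If the rule is applied to all 512 bins, M(l, k) should be mirrored across the spectrum so that the result stays conjugate-symmetric and the inverse transform is real.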
Performing an inverse Fourier transform to obtain the time-domain estimate of the target speech:
[Formula image not reproduced: inverse Fourier transform giving the time-domain estimate]
where w (k) is the hamming window of frame length 512.
The estimated time-domain signal can be converted directly into a voltage signal by digital-to-analog conversion and played through a loudspeaker as enhanced speech.
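The synthesis step S42 can be sketched as an inverse FFT followed by weighted overlap-add. This is an assumption-laden sketch rather than the patent's formula: it assumes the same symmetric Hamming window for synthesis, a hop of 256 samples, and normalization by the summed squared window, a standard choice that makes analysis followed by synthesis reconstruct the input exactly. The names `istft`, `ifft`, `fft`, `HAMMING`, and `HOP` are hypothetical.

```python
import cmath
import math

N = 512    # frame length stated in the text
HOP = 256  # ASSUMPTION: 50% overlap
HAMMING = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(X):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / N."""
    n = len(X)
    return [c.conjugate() / n for c in fft([v.conjugate() for v in X])]


def istft(frames, length):
    """Weighted overlap-add of inverse-FFT frames, normalized by the sum of w(n)^2."""
    out = [0.0] * length
    norm = [0.0] * length
    for l, spectrum in enumerate(frames):
        start = l * HOP
        frame = ifft(spectrum)
        for n in range(N):
            if start + n < length:
                # Real part: spectra are assumed conjugate-symmetric (real signal)
                out[start + n] += HAMMING[n] * frame[n].real
                norm[start + n] += HAMMING[n] ** 2
    return [o / d if d > 0 else 0.0 for o, d in zip(out, norm)]
```

Because the Hamming window never reaches zero (w(0) = 0.08), every sample covered by at least one frame has a nonzero normalizer, so an unmodified spectrum is reconstructed exactly.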
As shown in Fig. 1, an embodiment of the present invention further provides a comfort noise generation system based on time-frequency masking estimation, configured to implement the comfort noise generation method of the two preceding embodiments. The system comprises:
a signal decomposition module, configured to convert the time-domain signal x(n) picked up by the microphone element into a time-frequency domain signal and obtain the spectrum X(l, k) of the microphone signal in the l-th frame and the k-th frequency band;
a comfort noise power spectral density estimation module, configured to estimate the comfort noise power spectral density;
a comfort noise generation module, configured to generate comfort noise with the corresponding energy;
and a target speech synthesis module, configured to synthesize the target speech.
Specifically, the comfort noise power spectral density estimation module includes a noise power spectral density estimation module, a stationary noise power spectral density estimation module, and a comfort noise energy estimation module, which sequentially process signals.
An embodiment of the present invention further provides an electronic device comprising a memory and a processor, wherein the processor implements the steps of the comfort noise generation method based on time-frequency masking estimation of the two preceding embodiments when executing a computer program stored in the memory.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (10)

1. A comfort noise generation method based on time-frequency masking estimation, characterized by comprising the following steps:
S1, converting the time-domain signal x(n) picked up by the microphone element into a time-frequency domain signal, and obtaining the spectrum X(l, k) of the microphone signal in the l-th frame and the k-th frequency band;
S2, estimating the comfort noise power spectral density;
S3, generating comfort noise with the corresponding energy;
S4, synthesizing the target speech.
2. The comfort noise generation method based on time-frequency masking estimation according to claim 1, characterized in that the spectrum X(l, k) in S1 is calculated as follows:
[Formula image not reproduced: short-time Fourier transform of x(n) giving X(l, k)]
where N = 512 is the frame length, w(n) is a Hamming window of length 512, n is the time (sample) index, l is the frame index, k is the frequency index, and X(l, k) is the spectrum of the microphone signal in the l-th frame and the k-th frequency band.
3. The comfort noise generation method based on time-frequency masking estimation according to claim 2, wherein S2 specifically includes:
S21, obtaining the time-frequency masking value M(l, k) and calculating the ambient noise power spectral density ρ_v(k) for each frequency band k:
[Formula image not reproduced: recursive update of ρ_v(k)]
where |·| denotes the modulus of a complex number, and α is the smoothing factor between adjacent frames, with a value between 0 and 1;
S22, estimating the stationary noise power spectral density ρ_min(k) for each frequency band k:
[Formula image not reproduced: update of ρ_min(k)]
The stationary noise power spectral density represents the minimum component of the tracked noise, i.e. the minimum of the noise component in the signal; α is the same smoothing factor as in step S21, and γ is the stationary noise control factor, whose value is less than 1;
S23, calculating the comfort noise energy ζ:
[Formula image not reproduced: ζ computed from ρ_min(k) over the K frequency bands]
where K equals half the frame length.
4. A comfort noise generation method based on time-frequency masking estimation according to claim 3, characterized in that the smoothing factor α is 0.95.
5. A comfort noise generation method based on time-frequency masking estimation according to claim 3, characterized in that the stationary noise control factor γ is 0.08.
6. The comfort noise generation method based on time-frequency masking estimation according to claim 3, wherein S3 specifically includes:
generating a comfort noise power spectrum υ(l, k):
[Formula image not reproduced: comfort noise spectrum υ(l, k)]
where σ(n) is a white noise sequence of unit energy and length 512.
7. The comfort noise generation method based on time-frequency masking estimation according to claim 6, wherein S4 specifically includes:
S41, calculating the frequency-domain estimate of the target speech according to the following formula:
[Formula image not reproduced: frequency-domain target speech estimate]
where X(l, k) is the spectrum, M(l, k) is the time-frequency masking value, and υ(l, k) is the comfort noise power spectrum;
S42, performing an inverse Fourier transform to obtain the time-domain estimate of the target speech:
[Formula image not reproduced: inverse Fourier transform giving the time-domain estimate]
where w(k) is the Hamming window of frame length 512.
8. A comfort noise generation system based on time-frequency masking estimation, characterized in that the system is configured to implement the comfort noise generation method based on time-frequency masking estimation according to any one of claims 1 to 7, and comprises:
a signal decomposition module, configured to convert the time-domain signal x(n) picked up by the microphone element into a time-frequency domain signal and obtain the spectrum X(l, k) of the microphone signal in the l-th frame and the k-th frequency band;
a comfort noise power spectral density estimation module, configured to estimate the comfort noise power spectral density;
a comfort noise generation module, configured to generate comfort noise with the corresponding energy;
and a target speech synthesis module, configured to synthesize the target speech.
9. The time-frequency mask estimation based comfort noise generation system according to claim 8, wherein the comfort noise power spectral density estimation module comprises a noise power spectral density estimation module, a stationary noise power spectral density estimation module, and a comfort noise energy estimation module, which sequentially process signals.
10. An electronic device, comprising a memory and a processor, wherein the processor implements the steps of the comfort noise generation method based on time-frequency masking estimation according to any one of claims 1-7 when executing a computer program stored in the memory.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111360253.6A CN114067825A (en) 2021-11-17 2021-11-17 Comfort noise generation method based on time-frequency masking estimation and application thereof

Publications (1)

Publication Number Publication Date
CN114067825A true CN114067825A (en) 2022-02-18

Family

ID=80273356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111360253.6A Pending CN114067825A (en) 2021-11-17 2021-11-17 Comfort noise generation method based on time-frequency masking estimation and application thereof

Country Status (1)

Country Link
CN (1) CN114067825A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023226592A1 * 2022-05-25 2023-11-30 Qingdao Haier Technology Co., Ltd. (青岛海尔科技有限公司) Noise signal processing method and apparatus, and storage medium and electronic apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101258057B1 * 2011-11-16 2013-04-24 Korea Advanced Institute of Science and Technology (한국과학기술원) Apparatus and method for auditory masking-based adjusting the amplitude of phone ringing sounds under acoustic noise environments
CN113030862A * 2021-03-12 2021-06-25 Institute of Acoustics, Chinese Academy of Sciences (中国科学院声学研究所) Multi-channel speech enhancement method and device
CN113160845A * 2021-03-29 2021-07-23 Nanjing University of Science and Technology (南京理工大学) Speech enhancement algorithm based on speech existence probability and auditory masking effect

Similar Documents

Publication Publication Date Title
CN108172231B (en) Dereverberation method and system based on Kalman filtering
JP6703525B2 (en) Method and device for enhancing sound source
JP4842583B2 (en) Method and apparatus for multisensory speech enhancement
US10755728B1 (en) Multichannel noise cancellation using frequency domain spectrum masking
Zhang et al. Multi-channel multi-frame ADL-MVDR for target speech separation
CN106572419A (en) Stereo sound effect enhancement system
CN113571047B (en) Audio data processing method, device and equipment
US9530429B2 (en) Reverberation suppression apparatus used for auditory device
CN109841223B (en) Audio signal processing method, intelligent terminal and storage medium
Chao et al. Perceptual contrast stretching on target feature for speech enhancement
CN107045874B (en) Non-linear voice enhancement method based on correlation
CN114067825A (en) Comfort noise generation method based on time-frequency masking estimation and application thereof
Liu et al. Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction
Baby iSEGAN: Improved speech enhancement generative adversarial networks
CN117219102A (en) Low-complexity voice enhancement method based on auditory perception
CN113808608B (en) Method and device for suppressing mono noise based on time-frequency masking smoothing strategy
Xiong et al. Deep Subband Network for Joint Suppression of Echo, Noise and Reverberation in Real-Time Fullband Speech Communication
CN114360560A (en) Speech enhancement post-processing method and device based on harmonic structure prediction
CN111009259A (en) Audio processing method and device
Miyazaki et al. Theoretical analysis of parametric blind spatial subtraction array and its application to speech recognition performance prediction
CN113299308B (en) Voice enhancement method and device, electronic equipment and storage medium
Chun et al. Comparison of CNN-based speech dereverberation using neural vocoder
Xiang et al. Distributed Microphones Speech Separation by Learning Spatial Information With Recurrent Neural Network
Prasad et al. Two microphone technique to improve the speech intelligibility under noisy environment
Muhammed Shifas et al. Speech intelligibility enhancement based on a non-causal WaveNet-like model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination