CN112102818B - Signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation - Google Patents
- Publication number
- CN112102818B (application CN202011297932.9A)
- Authority
- CN
- China
- Prior art keywords
- frame
- energy
- entropy
- activity detection
- voice activity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
  - G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
  - G10L19/02—using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
  - G10L25/03—characterised by the type of extracted parameters
  - G10L25/18—the extracted parameters being spectral information of each sub-band
  - G10L25/45—characterised by the type of analysis window
  - G10L25/48—specially adapted for particular use
  - G10L25/51—for comparison or discrimination
  - G10L25/60—for measuring the quality of voice signals
  - G10L25/78—Detection of presence or absence of voice signals
  - G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
Abstract
The signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation comprises the following steps: S1, processing the input noisy speech frame by frame; S2, setting a sliding window and continuously updating the minimum value of each frequency bin of the spectrum within the window; S3, computing the frame energy frame_energy and the frame spectral entropy frame_entropy of each frame; S4, judging whether the signal is in the voice activity detection state according to whether the frame energy and the frame spectral entropy simultaneously exceed their respective thresholds; and S5, when in the voice activity detection state, computing and updating the frame signal-to-noise ratio. By controlling the update timing of the frame signal-to-noise ratio through the voice activity detection state, the invention judges the instantaneous state of the environment and thus updates the frame signal-to-noise ratio more effectively and accurately.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, relates to speech recognition, and in particular to a signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation.
Background
Speech application scenarios are becoming increasingly rich, and different scenarios are often accompanied by noise. Speech-related applications require tools such as decibel meters, and speech technologies such as speech recognition and array signal processing may also require the signal-to-noise ratio or use it to optimize the user experience. An accurate SNR estimate is therefore needed; obtaining one first requires a reasonably accurate real-time estimate of the background noise, and second a decision on when to update the SNR.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention discloses a signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation.
The invention discloses a signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation, which comprises the following steps of:
S1, processing the input noisy speech frame by frame and applying a short-time Fourier transform to each frame of data to obtain the spectrum Y(k,l), where k is the frequency-bin index and l is the frame index;
s2, setting a sliding window, and continuously updating the minimum value of each frequency point of the frequency spectrum in the window;
the continuous updating method comprises the following specific steps:
Take the sum of the squared spectral magnitudes |Y(k,1)|² over all frequency bins of the first frame as the initial value of the background energy. Starting from the second frame, compare each frequency bin of the current frame with the same bin of all preceding frames inside the window and select the minimum; after the minimum of each individual bin is obtained, update the frame bin by bin to obtain the minimum background energy over the full band of the frame;
S3, computing the frame energy frame_energy and the frame spectral entropy frame_entropy of each frame;
S4, judging whether the signal is in the voice activity detection state according to whether frame_energy and frame_entropy simultaneously exceed their respective thresholds;
and S5, when in the voice activity detection state, computing and updating the frame signal-to-noise ratio.
Preferably: in step S2, the per-bin minimum is updated as min(k,l) = min{min(k,l−1), |Y(k,l)|²}, and the update equation of the minimum value of the background energy is:

back_energy(l) = α · back_energy(l−1) + (1 − α) · Σ_{k=0}^{N/2} min(k,l)

where min(k,l) is the pre-update minimum of frequency bin k and frame_energy is the frame energy; back_energy(l) is the background energy of the l-th frame; α is the background-energy smoothing parameter; and N is the number of Fourier transform points.
Preferably: the frame energy of the l-th frame is

frame_energy(l) = Σ_{k=0}^{N/2} |Y(k,l)|²

and the frame spectral entropy frame_entropy is estimated with the following formula, where N is the number of Fourier transform points:

frame_entropy(l) = − Σ_{k=0}^{N/2} p(k,l) · log p(k,l),  with p(k,l) = |Y(k,l)|² / Σ_{m=0}^{N/2} |Y(m,l)|²

where p(k,l) is the proportion of the power spectrum of bin k in the power spectrum of the whole frame, k is the frequency-bin index and l is the frame index.
Preferably: the thresholds set in step S4 are linearly related to the background spectral entropy. The background spectral entropy back_entropy(l) of the l-th frame is calculated as

back_entropy(l) = β · back_entropy(l−1) + (1 − β) · frame_entropy(l)

where β is the background spectral-entropy smoothing parameter and l is the frame index.
Preferably: the step S4 specifically includes:
when frame_energy and frame_entropy are simultaneously greater than their respective thresholds, the frame is in state 1; otherwise it is in state 2;
in state 1, the voice counter voice_frame is incremented by 1 and the silence counter silence_frame is reset to 0;
in state 2, the silence counter silence_frame is incremented by 1 and the voice counter voice_frame is reset to 0;
only when the number of consecutive occurrences of state 1 reaches the set state-1 count threshold is the voice activity detection state judged to be 1, i.e. the signal is considered to be in the voice activity detection state.
Preferably: when the voice activity detection state is 1, the frame signal-to-noise ratio in step S5 is obtained according to the following formula. The frame SNR of frame l is:

snr(l) = γ · snr(l−1) + (1 − γ) · frame_energy(l) / back_energy(l)

where γ is the frame SNR smoothing parameter, frame_energy(l) is the frame energy of the l-th frame, and back_energy(l) is the background energy of the l-th frame.
By controlling the update timing of the frame signal-to-noise ratio through the voice activity detection state, the invention judges the instantaneous state of the environment and thus updates the frame signal-to-noise ratio more effectively and accurately.
Drawings
Fig. 1 is a schematic flow chart of an embodiment of the signal-to-noise ratio calculation method according to the present invention.
Detailed Description
The following provides a more detailed description of the present invention.
The invention discloses a signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation, which comprises the following steps of:
S1, processing the input noisy speech frame by frame and applying a short-time Fourier transform to each frame of data to obtain the spectrum Y(k,l), where k is the frequency-bin index and l is the frame index;
S2, setting a sliding window and continuously updating the minimum value of each frequency bin of the spectrum within the window;
the continuous updating method comprises the following specific steps:
take the sum of the squared spectral magnitudes |Y(k,1)|² over all frequency bins of the first frame as the initial value of the background energy; starting from the second frame, compare each frequency bin of the current frame with the same bin of all preceding frames inside the window and select the minimum; after the minimum of each individual bin is obtained, update the frame bin by bin to obtain the minimum background energy over the full band of the frame;
S3, computing the frame energy frame_energy and the frame spectral entropy frame_entropy of each frame;
S4, judging whether frame_energy and frame_entropy simultaneously exceed their respective thresholds; if so, the signal is judged to be in the voice activity detection state, otherwise it is judged not to be;
and S5, when in the voice activity detection state, computing and updating the frame signal-to-noise ratio.
As shown in fig. 1, the input noisy speech y is processed frame by frame, the minimum of the background energy is updated through a sliding window, the frame energy and frame spectral entropy of each frame are computed, and the frame signal-to-noise ratio snr is updated according to whether the voice activity detection state is reached.
In the frame-by-frame processing, a short-time Fourier transform is applied to each frame of data to obtain the spectrum Y(k,l) and its magnitude |Y(k,l)|, where k denotes the frequency-bin index and l the frame index.
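As a concrete illustration of this frame-by-frame processing, the sketch below frames a signal and computes |Y(k,l)| with NumPy. The frame length, hop size, and Hann window are assumptions; the text itself specifies none of them.

```python
import numpy as np

def stft_magnitude(y, frame_len=256, hop=128):
    """Frame the noisy input y and return the magnitude spectrum |Y(k, l)|.

    Row l is frame l; column k is frequency bin k = 0 .. frame_len/2.
    Only bins up to N/2 are kept (rfft), matching the conjugate symmetry
    the text invokes for the entropy summation.
    """
    n_frames = 1 + (len(y) - frame_len) // hop
    win = np.hanning(frame_len)  # window choice is an assumption
    frames = np.stack([y[l * hop: l * hop + frame_len] * win
                       for l in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

mag = stft_magnitude(np.zeros(1024))  # 7 frames of 129 bins, all silent
```

With 1024 input samples, a 256-sample frame, and a 128-sample hop, the sketch yields 7 frames of 129 bins each.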
A sliding window is set, and the minimum of each frequency bin of the spectrum is continuously updated through the window; a smoothing strategy keeps the estimated background energy back_energy as smooth as possible, so that it is not changed abruptly by burst noise.
the method can obtain the frame energy frame _ energy and the frame spectrum entropy frame _ entropy at the same time, after obtaining the frame energy and the frame spectrum entropy respectively in each frame, compare the values of the two with respective threshold values,
obtaining the voice activity detection state according to the frame energy frame _ energy and the frame spectrum entropy frame _ entropy, if the frame energy frame _ energy and the frame spectrum entropy are larger than the threshold value, adding 1 to the voice counting frame, otherwise adding 1 to the quiet counting frame,
for example, setting the voice count frame to be greater than 5 or the quiet count frame to be greater than 10 may be used to determine whether to turn on or off the voice, and then output the voice activity detection status 0/1 to determine whether to update the snr.
One specific flow of sliding window noise estimation is given below:
according to the frequency spectrum amplitude of each frequency point in the first frameThe sum of the squared values is used as the background energy initialization value and recorded as the initial background energy data min (k, l). The sliding window frame length may be set to L =80, that is, the length of the frame can be set to cover the pronunciation of the monosyllabic word in chinese, but the length is not limited to L =80 and varies depending on the speech speed and the language type.
That is, each frequency bin of each frame is compared with the same bin of the previous L−1 frames inside the window of length L, the minimum of each bin is selected, and this minimum is written back to the background energy data min(k,l). After the background energy data min(k,l) of each individual bin is obtained, it is updated bin by bin to obtain the minimum background energy over the full band of the frame; subsequent operations use this minimum to update the background energy.
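A minimal sketch of this per-bin minimum tracking, assuming the per-frame power spectra are already available as arrays; the function name and window handling are ours, with L = 80 taken from the example above.

```python
import numpy as np

def min_track(power_frames, window_len=80):
    """Per-bin minimum of |Y(k, l)|^2 over the last `window_len` frames.

    power_frames: list of 1-D arrays of per-bin power for frames 1..l.
    Returns the per-bin minima min(k, l) and their full-band sum.
    """
    window = np.stack(power_frames[-window_len:])  # frames inside the window
    minima = window.min(axis=0)                    # minimum per frequency bin
    return minima, float(minima.sum())

# A loud burst in the newest frame must not raise the tracked floor.
frames = [np.array([1.0, 2.0]), np.array([0.5, 3.0]), np.array([9.0, 9.0])]
minima, floor = min_track(frames)
```

This is what makes the estimate robust to burst noise: a transient raises the current frame's power but not the minimum held inside the window.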
The background energy of the l-th frame is updated as

back_energy(l) = α · back_energy(l−1) + (1 − α) · Σ_{k=0}^{N/2} min(k,l)

where frame_energy is the frame energy value. In the present invention, the background energy smoothing parameter α can be set to 0.9, and l is the frame index.
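The recursive smoothing just described can be sketched as follows; the formula image is not reproduced in this text, so the exact combination of terms (first-order smoothing toward the summed per-bin minima, α = 0.9) is our reading.

```python
def smooth_background(back_prev, min_sum, alpha=0.9):
    """back_energy(l) = alpha * back_energy(l-1) + (1 - alpha) * min_sum,
    where min_sum is the full-band sum of the tracked per-bin minima."""
    return alpha * back_prev + (1.0 - alpha) * min_sum

back = smooth_background(100.0, 50.0)  # moves only 10% of the way per frame
```

With α = 0.9 the estimate moves a tenth of the gap each frame, so a sudden drop or rise in the tracked floor reaches the background estimate only gradually.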
One specific procedure for voice activity detection is given below:
the frame energy frame _ energy of each frame is first obtained according to a time domain method, the specific time domain method provided in the present invention is used for reference, and the obtaining method is not limited to the time domain method, but can also be a frequency domain method. In particular to spectral amplitudeThe square of the frame is added frequency point by frequency point to obtain the frame energy of the l frame:
Next the frame spectral entropy frame_entropy of the frame is computed. The simplest estimate uses the following formula for the spectral entropy of the current frame, where N is the number of Fourier transform points; because of conjugate symmetry the summation runs to N/2:

p(k,l) = |Y(k,l)|² / Σ_{m=0}^{N/2} |Y(m,l)|²,  frame_entropy(l) = − Σ_{k=0}^{N/2} p(k,l) · log p(k,l)
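Under the formulas above, frame energy and spectral entropy can be sketched as below; the FFT size and the small epsilon guarding log(0) are assumptions.

```python
import numpy as np

def frame_features(frame, n_fft=256):
    """Frame energy and frame spectral entropy from one frame of samples."""
    power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2  # |Y(k, l)|^2, k = 0..N/2
    frame_energy = float(power.sum())
    p = power / power.sum()                           # p(k, l): per-bin power share
    frame_entropy = float(-np.sum(p * np.log(p + 1e-12)))  # -sum p log p
    return frame_energy, frame_entropy

# A pure tone concentrates power in one bin (low entropy);
# white noise spreads it across bins (high entropy).
t = np.arange(256) / 8000.0
_, tone_entropy = frame_features(np.sin(2 * np.pi * 1000.0 * t))
_, noise_entropy = frame_features(np.random.default_rng(0).standard_normal(256))
```

The contrast between the two entropies is exactly what the detector exploits: speech and tonal signals give low spectral entropy, broadband noise gives high entropy.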
after the frame energy frame _ energy and the frame spectral entropy frame _ entropy are obtained, a background spectral entropy back _ entropy needs to be obtained, and the update timing of the background spectral entropy is performed according to whether the background spectral entropy is in a voice activity detection state or not, that is, when the background spectral entropy is in the voice activity detection state, the background spectral entropy is updated.
The background spectral entropy back_entropy is smoothed as follows, i.e. the background spectral entropy of the l-th frame is

back_entropy(l) = β · back_entropy(l−1) + (1 − β) · frame_entropy(l)

where the background spectral-entropy smoothing parameter β can be chosen as 0.95 and l is the frame index.
The step S4 may specifically be:
when frame_energy and frame_entropy are simultaneously greater than their respective thresholds, the frame is in state 1; otherwise it is in state 2;
in state 1, the voice counter voice_frame is incremented by 1 and the silence counter silence_frame is reset to 0;
in state 2, the silence counter silence_frame is incremented by 1 and the voice counter voice_frame is reset to 0.
In frame-by-frame detection, when the number of consecutive occurrences of state 1 reaches the set state-1 count threshold, the voice activity detection state is judged to be 1, meaning the signal is in the voice activity detection state; otherwise it is not. If the number of consecutive occurrences of state 2 reaches the set state-2 count threshold, the voice activity detection state is 0; no speech is judged to be present, and the system may enter a power-saving standby mode.
Let th_energy be the threshold of the frame energy and th_entropy the threshold of the frame spectral entropy. The voice-activity-detection thresholds can be set with reference to, but are not limited to, the following; the settings in the present invention are one specific implementation:
when the current frame energy frame_energy and frame spectral entropy frame_entropy are simultaneously greater than their respective thresholds, the voice counter voice_frame is incremented by 1 and the silence counter silence_frame is set to 0;
otherwise, when a silence-like frame appears, the silence counter is incremented by 1 and the voice counter is reset to 0. In other words, the accumulation of the voice counter and the silence counter must not be interrupted: only consecutive occurrences accumulate, and after an interruption the counter is cleared and accumulation restarts from zero.
For example, if the voice counter voice_frame > 5, i.e. the state-1 count threshold is 5, the voice activity detection state vad_state is judged to be 1 and the signal is in the voice activity detection state;
if the silence counter silence_frame > 10, i.e. the state-2 count threshold is 10, the voice activity detection state vad_state is considered to be 0, and the system may enter the power-saving state.
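The counting logic above can be sketched as a small state machine. The dictionary-based state and the function name are ours; the example thresholds 5 and 10 follow the text.

```python
def vad_step(above_both, st):
    """One frame of the counting logic: state 1 when frame energy AND spectral
    entropy both exceed their thresholds, state 2 otherwise. Only uninterrupted
    runs accumulate; an interruption clears the opposing counter."""
    if above_both:                      # state 1: speech-like frame
        st["voice_frame"] += 1
        st["silence_frame"] = 0
    else:                               # state 2: silence-like frame
        st["silence_frame"] += 1
        st["voice_frame"] = 0
    if st["voice_frame"] > 5:           # example state-1 count threshold
        st["vad_state"] = 1
    elif st["silence_frame"] > 10:      # example state-2 count threshold
        st["vad_state"] = 0
    return st["vad_state"]

st = {"voice_frame": 0, "silence_frame": 0, "vad_state": 0}
# Six consecutive speech-like frames switch the detector on; a single
# silence-like frame resets the run but does not switch the detector off.
history = [vad_step(f, st) for f in [True] * 6 + [False] + [True] * 3]
```

The asymmetric thresholds act as hangover: the detector turns on after 6 consecutive speech-like frames but needs 11 consecutive silence-like frames to turn off.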
When the voice activity detection state vad_state is 1, the signal is considered to be in the voice activity detection state, and the frame signal-to-noise ratio can be computed and updated.
When in the voice activity detection state, i.e. when vad_state is 1, the frame signal-to-noise ratio of the l-th frame can be obtained according to the following formula:

snr(l) = γ · snr(l−1) + (1 − γ) · frame_energy(l) / back_energy(l)

When vad_state is 0, the signal is considered not to be in the voice activity detection state and no update is performed. The frame SNR smoothing parameter γ may be set to 0.8; frame_energy(l) and back_energy(l) denote the frame energy and the background energy of the l-th frame, respectively.
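A sketch of this gated, smoothed update. Whether the SNR is expressed as a plain energy ratio or in decibels is not visible in this text, so the plain ratio is used here, with γ = 0.8 as in the example.

```python
def update_snr(prev_snr, frame_energy, back_energy, vad_state, gamma=0.8):
    """snr(l) = gamma * snr(l-1) + (1 - gamma) * frame_energy / back_energy,
    applied only while vad_state == 1; otherwise the previous value is held."""
    if vad_state != 1:
        return prev_snr                 # no update outside speech activity
    return gamma * prev_snr + (1.0 - gamma) * (frame_energy / back_energy)

snr = update_snr(0.0, frame_energy=1000.0, back_energy=10.0, vad_state=1)
held = update_snr(snr, frame_energy=1.0, back_energy=10.0, vad_state=0)
```

Gating on vad_state is the point of the method: during silence the frame energy is itself background noise, and updating the SNR there would drag the estimate toward 1.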
By controlling the update timing of the frame signal-to-noise ratio through the voice activity detection state, the invention judges the instantaneous state of the environment and thus updates the frame signal-to-noise ratio more effectively and accurately.
The foregoing describes preferred embodiments of the present invention. The preferred features of these embodiments may be combined in any manner, provided the combination is not contradictory or mutually exclusive. The specific parameters in the examples and embodiments serve only to illustrate the inventors' verification process and are not intended to limit the scope of patent protection of the present invention, which is defined by the claims; equivalent structural changes made according to the content of this description also fall within the protection scope of the present invention.
Claims (2)
1. The signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation is characterized by comprising the following steps of:
S1, processing the input noisy speech frame by frame and applying a short-time Fourier transform to each frame of data to obtain the spectrum Y(k,l), where k is the frequency-bin index and l is the frame index;
S2, setting a sliding window and continuously updating the minimum value of each frequency bin of the spectrum within the window;
the continuous updating method comprises the following specific steps:
take the sum of the squared spectral magnitudes |Y(k,1)|² over all frequency bins of the first frame as the initial value of the background energy; starting from the second frame, compare each frequency bin of the current frame with the same bin of all preceding frames inside the window and select the minimum; after the minimum of each individual bin is obtained, update the frame bin by bin to obtain the minimum background energy over the full band of the frame;
S3, computing the frame energy frame_energy and the frame spectral entropy frame_entropy of each frame;
S4, judging whether the signal is in the voice activity detection state according to whether frame_energy and frame_entropy simultaneously exceed their respective thresholds;
S5, when in the voice activity detection state, computing and updating the frame signal-to-noise ratio;
in said step S3, the frame energy of the l-th frame is

frame_energy(l) = Σ_{k=0}^{N/2} |Y(k,l)|²

and the frame spectral entropy frame_entropy is estimated with the following formula, where N is the number of Fourier transform points:

frame_entropy(l) = − Σ_{k=0}^{N/2} p(k,l) · log p(k,l)

where p(k,l) = |Y(k,l)|² / Σ_{m=0}^{N/2} |Y(m,l)|² is the proportion of the power spectrum of bin k in the power spectrum of the whole frame, k is the frequency-bin index and l is the frame index;
the thresholds set in said step S4 are linearly related to the background spectral entropy;
the background spectral entropy back_entropy(l) of the l-th frame is calculated as

back_entropy(l) = β · back_entropy(l−1) + (1 − β) · frame_entropy(l)

where β is the background spectral-entropy smoothing parameter and l is the frame index;
the step S4 specifically includes:
when frame_energy and frame_entropy are simultaneously greater than their respective thresholds, the frame is in state 1; otherwise it is in state 2;
in state 1, the voice counter voice_frame is incremented by 1 and the silence counter silence_frame is reset to 0;
in state 2, the silence counter silence_frame is incremented by 1 and the voice counter voice_frame is reset to 0;
only when the number of consecutive occurrences of state 1 reaches the set state-1 count threshold is the voice activity detection state judged to be 1, i.e. the signal is considered to be in the voice activity detection state;
when the voice activity detection state is 1, the frame signal-to-noise ratio in step S5 is obtained according to the following formula; the frame SNR of frame l is

snr(l) = γ · snr(l−1) + (1 − γ) · frame_energy(l) / back_energy(l)

where γ is the frame SNR smoothing parameter, frame_energy(l) is the frame energy of the l-th frame, and back_energy(l) is the background energy of the l-th frame.
2. The signal-to-noise ratio calculation method according to claim 1, characterized in that in step S2 the update equation of the minimum value of the background energy is

back_energy(l) = α · back_energy(l−1) + (1 − α) · Σ_{k=0}^{N/2} min(k,l)

where min(k,l) is the pre-update minimum of frequency bin k and frame_energy is the frame energy; back_energy(l) is the background energy of the l-th frame; α is the background-energy smoothing parameter; and N is the number of Fourier transform points.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011297932.9A CN112102818B (en) | 2020-11-19 | 2020-11-19 | Signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011297932.9A CN112102818B (en) | 2020-11-19 | 2020-11-19 | Signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112102818A CN112102818A (en) | 2020-12-18 |
CN112102818B true CN112102818B (en) | 2021-01-26 |
Family
ID=73785304
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011297932.9A Active CN112102818B (en) | 2020-11-19 | 2020-11-19 | Signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112102818B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115798514B (en) * | 2023-02-06 | 2023-04-21 | 成都启英泰伦科技有限公司 | Knock detection method |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1212603C (en) * | 2003-08-08 | 2005-07-27 | 中国科学院声学研究所 | Non linear spectrum reduction and missing component estimation method |
US7660713B2 (en) * | 2003-10-23 | 2010-02-09 | Microsoft Corporation | Systems and methods that detect a desired signal via a linear discriminative classifier that utilizes an estimated posterior signal-to-noise ratio (SNR) |
CN1322488C (en) * | 2004-04-14 | 2007-06-20 | 华为技术有限公司 | Method for strengthening sound |
CN101802909B (en) * | 2007-09-12 | 2013-07-10 | 杜比实验室特许公司 | Speech enhancement with noise level estimation adjustment |
JP4950930B2 (en) * | 2008-04-03 | 2012-06-13 | 株式会社東芝 | Apparatus, method and program for determining voice / non-voice |
CN102044243B (en) * | 2009-10-15 | 2012-08-29 | 华为技术有限公司 | Method and device for voice activity detection (VAD) and encoder |
CN104021796B (en) * | 2013-02-28 | 2017-06-20 | 华为技术有限公司 | Speech enhan-cement treating method and apparatus |
CN105023572A (en) * | 2014-04-16 | 2015-11-04 | 王景芳 | Noised voice end point robustness detection method |
CN104125579B (en) * | 2014-08-07 | 2017-07-11 | 桂林电子科技大学 | A kind of frequency spectrum sensing method and device based on time domain energy Yu frequency domain spectra entropy |
CN105741849B (en) * | 2016-03-06 | 2019-03-22 | 北京工业大学 | The sound enhancement method of phase estimation and human hearing characteristic is merged in digital deaf-aid |
CN107331393B (en) * | 2017-08-15 | 2020-05-12 | 成都启英泰伦科技有限公司 | Self-adaptive voice activity detection method |
CN110706693B (en) * | 2019-10-18 | 2022-04-19 | 浙江大华技术股份有限公司 | Method and device for determining voice endpoint, storage medium and electronic device |
- 2020-11-19: application CN202011297932.9A filed in CN; patent CN112102818B granted, status Active
Also Published As
Publication number | Publication date |
---|---|
CN112102818A (en) | 2020-12-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lin et al. | Adaptive noise estimation algorithm for speech enhancement | |
US6289309B1 (en) | Noise spectrum tracking for speech enhancement | |
US9142221B2 (en) | Noise reduction | |
CN103456310B (en) | Transient noise suppression method based on spectrum estimation | |
EP1065657B1 (en) | Method for detecting a noise domain | |
US7302388B2 (en) | Method and apparatus for detecting voice activity | |
US9349384B2 (en) | Method and system for object-dependent adjustment of levels of audio objects | |
CN105023572A (en) | Noised voice end point robustness detection method | |
EP1887559B1 (en) | Yule walker based low-complexity voice activity detector in noise suppression systems | |
JP5752324B2 (en) | Single channel suppression of impulsive interference in noisy speech signals. | |
Ma et al. | Perceptual Kalman filtering for speech enhancement in colored noise | |
Nelke et al. | Single microphone wind noise PSD estimation using signal centroids | |
CN103544961A (en) | Voice signal processing method and device | |
CN112102818B (en) | Signal-to-noise ratio calculation method combining voice activity detection and sliding window noise estimation | |
KR101295727B1 (en) | Apparatus and method for adaptive noise estimation | |
US11183172B2 (en) | Detection of fricatives in speech signals | |
EP4128225A1 (en) | Noise supression for speech enhancement | |
KR100784456B1 (en) | Voice Enhancement System using GMM | |
US8788265B2 (en) | System and method for babble noise detection | |
KR102718917B1 (en) | Detection of fricatives in speech signals | |
Hendriks et al. | Speech reinforcement in noisy reverberant conditions under an approximation of the short-time SII | |
CN118398022B (en) | Improved speech enhancement noise reduction method | |
He et al. | Codebook-based speech enhancement using Markov process and speech-presence probability. | |
CN113409812B (en) | Processing method and device of voice noise reduction training data and training method | |
Verteletskaya et al. | Enhanced spectral subtraction method for noise reduction with minimal speech distortion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||