CN104269180B - A kind of quasi- clean speech building method for speech quality objective assessment - Google Patents

A kind of quasi-clean speech building method for speech quality objective assessment

Info

Publication number
CN104269180B
Authority
CN
China
Prior art keywords
speech
voice
noise
distorted
power spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410515374.7A
Other languages
Chinese (zh)
Other versions
CN104269180A (en)
Inventor
贺前华
周伟力
李洪韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201410515374.7A
Publication of CN104269180A
Application granted
Publication of CN104269180B

Abstract

The invention discloses a quasi-clean speech construction method for objective speech quality assessment. The method obtains the quasi-clean speech of a distorted speech signal using an improved minimum-controlled recursive averaging algorithm together with multi-band spectral subtraction, and mainly comprises: (1) distinguishing the non-speech segments and speech segments of the distorted speech; (2) estimating the noise power spectrum of the non-speech segments and of the speech segments separately, according to that division; (3) calculating the quasi-clean speech power spectrum of the distorted speech from the noise spectrum estimates of the non-speech and speech segments. The beneficial effect is that the quasi-clean speech and the distorted speech can serve as the input speech of the PESQ algorithm, yielding an objective evaluation score for the distorted speech.

Description

Quasi-clean voice construction method for voice quality objective evaluation
Technical Field
The invention relates to a voice quality objective evaluation technology, in particular to a quasi-clean voice construction method for voice quality objective evaluation, belonging to the field of reference-source-free (Non-intrusive) voice quality objective evaluation.
Background
Voice quality is one of the important criteria for evaluating a voice communication system. Speech quality assessment is generally divided into subjective and objective methods. Subjective methods rely on listeners' opinions of the voice quality and so directly reflect the user's view of system quality; the MOS (Mean Opinion Score) proposed by ITU-T Recommendation P.830 is a widely used subjective method. However, subjective evaluation has poor repeatability, is difficult to organize and carry out flexibly, is easily influenced by human subjective factors, and is ill-suited to use in production processes and field experiments.
Objective evaluation avoids the possible influence of human factors and uses signal processing, based on specific characteristics of the speech signal, to evaluate voice quality. Objective methods are classified as reference-based (Intrusive) or reference-free (Non-intrusive), according to whether a reference source signal (clean speech) is required. A reference-based method judges voice quality from the error between the input and output signals of the voice system, so it is an error measurement. PESQ (Perceptual Evaluation of Speech Quality), provided by ITU-T Recommendation P.862, is currently one of the better-performing reference-based methods and handles communication delay, environmental noise and errors well. However, PESQ and other reference-based methods need the input speech (clean speech) as a reference and cannot be used where only the distorted signal is available.
ITU-T Recommendation P.563 is the current standard for reference-free objective evaluation. It can be applied to VoIP without reference signals and to monitoring telecommunication network performance, but its computational complexity is high, which hinders real-time evaluation of voice quality, and its evaluation performance is inferior to that of PESQ. The mainstream objective evaluation methods based on statistical models currently rely on Gaussian Mixture Models (GMM) and Vector Quantization (VQ): during training, clean speech is trained into a reference model and a reference codebook; during testing, the distortion between the distorted speech and the reference model and codebook is calculated, and the error is mapped to a final objective quality score. Training such statistical models requires a large amount of clean speech data, and their evaluation performance still falls well short of PESQ.
Quasi-clean speech construction estimates the noise spectrum of the distorted speech with a noise tracking algorithm and removes the noise part of the distorted speech to obtain its quasi-clean speech. Unlike Voice Activity Detection (VAD), which updates the noise power spectrum only in non-speech sections, a noise tracking algorithm keeps estimating the noise during speech activity and is therefore better suited to noisy, non-stationary scenes. The minimum-controlled recursive averaging algorithm can estimate the noise power spectrum in non-stationary noise environments much faster than other noise tracking algorithms (Martin, 2001; Doblinger, 1995). However, when estimating and updating the noise spectrum, the minimum-controlled recursive averaging algorithm treats the whole distorted speech uniformly and does not distinguish speech segments from non-speech segments. As a result, the estimate deviates from the actual noise power spectrum, and the uniform estimation increases computational complexity, reduces the efficiency of the algorithm, and hinders real-time estimation.
Disclosure of Invention
The invention aims to overcome the defects of the reference-source-free objective evaluation method in the prior art and provide a quasi-clean voice construction method for the objective evaluation of voice quality.
The purpose of the invention is realized by the following technical scheme: a quasi-clean speech construction method for objective evaluation of speech quality comprises the following steps:
step 1, an improved minimum control recursive average algorithm distinguishes a non-speech section from a speech section in noise spectrum estimation of distorted speech, and the noise spectrum estimation value of the non-speech section is updated according to the characteristics of the non-speech section;
step 2, when carrying out noise estimation on a voice frame, adopting a new frequency correlation threshold value when the improved minimum control recursive average algorithm determines the existence probability of voice in a voice frame frequency band;
step 3, the improved minimum control recursive average algorithm determines a final noise power spectrum estimation value of the voice with noise according to the noise power spectrum estimation of the non-voice section and the voice section;
step 4, the improved minimum control recursive average algorithm divides the non-speech sections and speech sections by voice activity detection: the zero-crossing rate and short-time energy time-domain features determine the speech section of the distorted speech, and the Sohn algorithm determines the non-speech sections between words within that speech section;
and step 5, respectively calculating the quasi-clean power spectrums of the non-voice sections and the voice sections of the quasi-clean voice by multi-band spectrum subtraction according to the division of the non-voice sections and the corresponding noise spectrum estimation values, thereby obtaining the quasi-clean voice power spectrums of the distorted voice.
In step 1, the improved minimum-controlled recursive averaging algorithm relies on the division into non-speech segments and speech segments. A non-speech segment is treated as noise, so its noise spectrum estimate is $D(\lambda_{uv},k)=|Y(\lambda_{uv},k)|^2$, where $|Y(\lambda_{uv},k)|^2$ is the short-time power spectrum of the non-speech frame, $\lambda_{uv}$ is the frame index of the non-speech segment, and $k$ is the band index.
The division into non-speech segments and speech segments is realized by voice activity detection, namely: the distorted speech is first roughly estimated using time-domain features such as the zero-crossing rate and the short-time energy, which locates the start time and end time of its speech section, excludes the background noise, and determines the whole speech section; the located speech section is then finely estimated with the Sohn voice activity detection algorithm to determine the speech parts and the inter-word non-speech parts within it.
In step 2, when the improved minimum-controlled recursive averaging algorithm performs noise estimation on a speech frame, the frequency-dependent threshold $\delta(k)$ that it adopts is defined as follows:
where LF and MF correspond to the frequency points 1 kHz and 3 kHz respectively, fs is the sampling frequency, and $k$ is the band index.
In step 3, the noise power spectrum estimate $D(\lambda,k)$ of the noisy speech determined by the improved minimum-controlled recursive averaging algorithm is split into a non-speech part and a speech part, and is defined as:
$$D(\lambda,k)=\begin{cases}|Y(\lambda_{uv},k)|^2, & \lambda=\lambda_{uv}\\ \alpha_s(\lambda_v,k)\,D(\lambda_v-1,k)+\left(1-\alpha_s(\lambda_v,k)\right)|Y(\lambda_v,k)|^2, & \lambda=\lambda_v\end{cases}$$
where $\alpha_s(\lambda_v,k)$ is a time-frequency dependent smoothing factor, $|Y(\lambda_v,k)|^2$ is the short-time power spectrum of the speech frame, and $D(\lambda_v-1,k)$ is the noise spectrum estimate of the frame preceding the current speech frame.
In step 5, the quasi-clean speech power spectrum $S(\lambda,k)$ calculated by multi-band spectral subtraction is split into two parts, a non-speech part and a speech part, and its estimate is defined as:
$$S(\lambda,k)=\left(Y(\lambda_v,k)-D(\lambda_v,k)\right)+\left(Y(\lambda_{uv},k)-D(\lambda_{uv},k)\right),$$
where $|Y(\lambda_v,k)|^2$ is the short-time power spectrum of the speech frame, $|Y(\lambda_{uv},k)|^2$ is the short-time power spectrum of the non-speech frame, $D(\lambda_v,k)$ is the noise power spectrum estimate of the speech frame, and $D(\lambda_{uv},k)$ is the noise power spectrum estimate of the non-speech frame.
The specific implementation process of the quasi-clean voice construction method of the invention is as follows:
1. Determine the speech frames and non-speech frames of the distorted speech; Fig. 2 shows this process. First, roughly estimate the speech section of the distorted speech: window and frame the distorted speech and calculate the short-time energy and zero-crossing rate of each frame; set short-time energy and zero-crossing rate thresholds for the speech section, and use these time-domain features to determine the start frame and end frame of the speech section. Then refine the estimate with the Sohn algorithm to determine the inter-word non-speech parts of the speech section. The background noise section and the inter-word non-speech parts are marked as non-speech frames, and the speech parts of the speech section are marked as speech frames.
2. Track the noise of the distorted speech; Fig. 3 shows the noise tracking estimation process. First, apply the Fourier transform to the short-time frames from step 1 and calculate the power spectrum of each frame. Noise tracking uses the improved minimum-controlled recursive averaging algorithm, which estimates and updates the non-speech frames and the speech frames separately, improving both the accuracy and the execution efficiency of the algorithm. A non-speech frame is treated as a noise frame, so its noise spectrum estimate is its short-time power spectrum. For a speech frame, the speech presence probability in each frequency band is obtained by comparing the ratio of the smoothed power spectrum to its local minimum against a new frequency-dependent threshold; the speech presence probability is then smoothed, and the time-frequency dependent smoothing factor is updated from the smoothed probability; the noise spectrum estimate of the speech part is updated with that smoothing factor. Finally, the noise spectrum estimates of the non-speech part and the speech part together form the noise spectrum estimate of the distorted speech.
3. Obtain the quasi-clean speech. Apply multi-band spectral subtraction to the noisy power spectrum of the distorted speech and the noise power spectrum estimated in step 2 to obtain the quasi-clean speech power spectrum, then apply the inverse Fourier transform to the quasi-clean speech power spectrum to obtain the quasi-clean speech time-domain signal.
4. Evaluate the objective quality of the distorted speech. The PESQ algorithm calculates the distortion error between the distorted speech and the quasi-clean speech through a perceptual model, and a cognitive model finally maps that error to the objective quality score of the distorted speech.
The principle of the invention is as follows: the invention adopts an improved minimum control recursive average algorithm and a multi-band spectrum subtraction method to obtain the quasi-clean voice of the distorted voice, and takes the quasi-clean voice and the distorted voice as the input voice of the PESQ algorithm to obtain the objective evaluation score of the distorted voice.
Compared with the prior art, the invention has the following advantages and effects:
1. by constructing quasi-clean speech of distorted speech, the PESQ algorithm can be applied to an objective evaluation application scenario without input speech. Compared with other reference-source-free objective evaluation methods, the method disclosed by the invention has the advantage that higher subjective and objective evaluation correlation degrees are obtained.
2. Compared with a mainstream reference-source-free objective evaluation method based on a statistical model, the method does not need a large amount of clean corpus training statistical models, so that the evaluation algorithm is suitable for the reference-source-free objective evaluation application field lacking of clean corpus.
3. The quasi-clean speech construction method can distinguish the non-speech section and the speech section of the distorted speech, estimate the noise power spectrum of the distorted speech more accurately, eliminate the noise part of the distorted speech to a greater extent, and improve the accuracy of objective quality scoring of the distorted speech.
Drawings
FIG. 1 is a process diagram of a quasi-clean speech construction method for objective assessment of speech quality.
Fig. 2 is a diagram of a process for labeling speech and non-speech frames.
Fig. 3 is a diagram of a noise tracking estimation process for distorted speech.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
A quasi-clean speech construction method for objective evaluation of speech quality comprises the following steps:
1. Frame and window the distorted speech (frame length 30 ms, frame shift 15 ms, Hamming window), and calculate the short-time energy and the zero-crossing rate of each frame. Then calculate the average energy, the upper energy threshold, the lower energy threshold, the average zero-crossing count and the zero-crossing threshold of the distorted speech. The upper energy threshold is 0.05 times the average energy; the lower energy threshold is 0.25 times the upper energy threshold; the zero-crossing threshold is 0.3 times the average zero-crossing count.
2. Determine the start frame and the end frame of the speech section of the distorted speech with a double-threshold method based on energy and zero-crossing rate. Then use the determined speech section as the input to the Sohn voice activity detection algorithm to determine the non-speech parts between words within it.
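The rough estimate in steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the endpoint search is one simple reading of the double-threshold method, and the Sohn refinement of step 2 is not included.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=30, shift_ms=15):
    """Split x into overlapping frames (30 ms long, 15 ms apart by default)."""
    n = int(fs * frame_ms / 1000)
    hop = int(fs * shift_ms / 1000)
    count = 1 + (len(x) - n) // hop
    idx = np.arange(n)[None, :] + hop * np.arange(count)[:, None]
    return x[idx]

def short_time_features(frames):
    """Per-frame energy of the Hamming-windowed frame and zero-crossing count."""
    window = np.hamming(frames.shape[1])
    energy = np.sum((frames * window) ** 2, axis=1)
    signs = np.where(frames >= 0, 1, -1)
    zero_crossings = np.count_nonzero(np.diff(signs, axis=1), axis=1)
    return energy, zero_crossings

def energy_zcr_thresholds(energy, zero_crossings):
    """Thresholds with the factors stated in step 1 (0.05, 0.25, 0.3)."""
    upper = 0.05 * energy.mean()
    lower = 0.25 * upper
    zc_threshold = 0.3 * zero_crossings.mean()
    return upper, lower, zc_threshold

def speech_endpoints(energy, zero_crossings, upper, lower, zc_threshold):
    """Assumed double-threshold search: anchor on frames above the upper
    threshold, then extend outwards while the lower energy threshold or the
    zero-crossing threshold is still exceeded."""
    above = np.flatnonzero(energy > upper)
    if above.size == 0:
        return None
    start, end = above[0], above[-1]
    while start > 0 and (energy[start - 1] > lower
                         or zero_crossings[start - 1] > zc_threshold):
        start -= 1
    while end < len(energy) - 1 and (energy[end + 1] > lower
                                     or zero_crossings[end + 1] > zc_threshold):
        end += 1
    return start, end
```

Frames outside the returned start and end indices would be labeled as background noise; the inter-word non-speech frames inside the section still have to come from the Sohn detector.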
3. The audio frames outside the speech section determined in step 2, together with the inter-word non-speech frames inside that section, are defined as the non-speech part of the distorted speech; the remaining audio frames of the speech section are defined as the speech part of the distorted speech. As shown in Fig. 2, the short-time frames $\lambda$ of the distorted speech are labeled as a non-speech frame part and a speech frame part:
4. As shown in Fig. 3, apply the fast Fourier transform to the short-time frames of the distorted speech and calculate the non-speech frame power spectrum $|Y(\lambda_{uv},k)|^2$ and the speech frame power spectrum $|Y(\lambda_v,k)|^2$, where $k$ is the band index.
5. Estimate the noise power spectrum of the non-speech frames. The non-speech segments are treated as noise, i.e. the noise spectrum estimate is $D(\lambda_{uv},k)=|Y(\lambda_{uv},k)|^2$.
6. Smooth the speech frame power spectrum $|Y(\lambda_v,k)|^2$:
$$P(\lambda_v,k)=\eta P(\lambda_v-1,k)+(1-\eta)|Y(\lambda_v,k)|^2$$
where $P(\lambda_v,k)$ is the smoothed power spectrum of the speech frame, $\lambda_v$ is the speech frame index, $k$ is the band index, and $\eta$ is the smoothing factor (0.7 here).
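The first-order recursion of step 6 can be written in a few lines. In this sketch the function name is hypothetical, `power` is assumed to hold the speech frame power spectra in time order with shape (frames, bins), and initializing the recursion with the first frame is an assumption, since the text does not state a starting value.

```python
import numpy as np

def smooth_power_spectrum(power, eta=0.7):
    """Recursive smoothing P(t) = eta * P(t-1) + (1 - eta) * |Y(t)|^2 per band."""
    smoothed = np.empty_like(power, dtype=float)
    smoothed[0] = power[0]  # assumed initial value, not given in the text
    for t in range(1, len(power)):
        smoothed[t] = eta * smoothed[t - 1] + (1.0 - eta) * power[t]
    return smoothed
```

With $\eta=0.7$, a frame whose power doubles from 1 to 2 moves the smoothed value only to 1.3, which is what keeps short bursts from dominating the later minimum tracking.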
7. Track the local minimum of $P(\lambda_v,k)$ to obtain $P_{\min}(\lambda_v,k)$:
if $P_{\min}(\lambda_v-1,k)<P(\lambda_v,k)$
else
$P_{\min}(\lambda_v,k)=P(\lambda_v,k)$
end
In the formula, $\beta$ is 0.8 and $\gamma$ is 0.998.
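The update used in the `if` branch of step 7 is not reproduced in the text, only its parameters $\beta$ and $\gamma$. The sketch below therefore assumes the continuous minimum-tracking rule commonly attributed to Doblinger (1995), $P_{\min}(\lambda_v,k)=\gamma P_{\min}(\lambda_v-1,k)+\frac{1-\gamma}{1-\beta}\left(P(\lambda_v,k)-\beta P(\lambda_v-1,k)\right)$, which uses the same two parameters. Whether the patent uses exactly this expression is an assumption, as are the function name and the first-frame initialization.

```python
import numpy as np

def track_local_minimum(smoothed, beta=0.8, gamma=0.998):
    """Track the per-band local minimum of a smoothed power spectrum."""
    p_min = np.empty_like(smoothed, dtype=float)
    p_min[0] = smoothed[0]  # assumed initial value
    for t in range(1, len(smoothed)):
        rising = p_min[t - 1] < smoothed[t]
        # Assumed Doblinger-style update for the rising branch.
        slow_update = (gamma * p_min[t - 1]
                       + (1.0 - gamma) / (1.0 - beta)
                       * (smoothed[t] - beta * smoothed[t - 1]))
        # Otherwise the minimum follows the smoothed spectrum, as in the else branch.
        p_min[t] = np.where(rising, slow_update, smoothed[t])
    return p_min
```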
8. Calculate the speech presence probability. First, calculate the ratio $S_r(\lambda_v,k)$ of the smoothed power spectrum of the speech frame to its local minimum:
$$S_r(\lambda_v,k)=\frac{P(\lambda_v,k)}{P_{\min}(\lambda_v,k)}$$
Then determine the speech presence $I(\lambda_v,k)$ in each band of the speech frame from $S_r(\lambda_v,k)$:
if $S_r(\lambda_v,k)>\delta(k)$
$I(\lambda_v,k)=1$ (speech present)
else
$I(\lambda_v,k)=0$ (speech absent)
end
$\delta(k)$ is the band-dependent threshold:
where LF and MF correspond to the frequency points 1 kHz and 3 kHz respectively, fs is the sampling frequency, and $k$ is the band index.
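The decision rule of step 8 can be sketched as below. The threshold values of $\delta(k)$ are not reproduced in the text, so the sketch takes them as caller-supplied parameters; the piecewise-constant shape over the three bands split at 1 kHz and 3 kHz is an assumption inferred from the LF and MF definitions, and the function names are hypothetical.

```python
import numpy as np

def band_threshold(n_fft, fs, delta_low, delta_mid, delta_high):
    """Assumed piecewise-constant delta(k): up to 1 kHz, 1-3 kHz, above 3 kHz.
    The three values are placeholders that the caller must supply."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return np.where(freqs <= 1000.0, delta_low,
                    np.where(freqs <= 3000.0, delta_mid, delta_high))

def speech_presence(smoothed, p_min, delta, eps=1e-12):
    """I = 1 where the ratio P / P_min exceeds delta(k), else 0."""
    ratio = smoothed / np.maximum(p_min, eps)  # eps guards against division by zero
    return (ratio > delta).astype(float)
```

A call such as `band_threshold(512, 16000, a, b, c)` returns one threshold per `rfft` bin, so the result broadcasts directly against arrays of shape (frames, bins).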
9. Smooth the speech presence probability to obtain $p(\lambda_v,k)$:
$$p(\lambda_v,k)=\alpha_p\,p(\lambda_v-1,k)+(1-\alpha_p)I(\lambda_v,k),$$
where $\alpha_p$ is the smoothing factor (0.2 here).
10. Use the smoothed speech presence probability $p(\lambda_v,k)$ to calculate the time-frequency dependent smoothing factor $\alpha_s(\lambda_v,k)$:
$$\alpha_s(\lambda_v,k)=\alpha_d+(1-\alpha_d)\,p(\lambda_v,k),$$
where $\alpha_d$ is a constant (0.85 here).
11. Use the time-frequency dependent smoothing factor $\alpha_s(\lambda_v,k)$ to update the speech frame noise spectrum estimate $D(\lambda_v,k)$:
$$D(\lambda_v,k)=\alpha_s(\lambda_v,k)D(\lambda_v-1,k)+\left(1-\alpha_s(\lambda_v,k)\right)|Y(\lambda_v,k)|^2$$
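Steps 9 to 11 chain three recursions, which the following sketch runs frame by frame. The function name and argument layout are hypothetical; `noise_init` stands for the noise estimate of the frame preceding the first speech frame (for example the power spectrum of the last non-speech frame), and starting the smoothed probability at zero is an assumption.

```python
import numpy as np

def update_speech_noise(power, presence, noise_init, alpha_p=0.2, alpha_d=0.85):
    """Noise spectrum estimate D for consecutive speech frames.

    power, presence: arrays of shape (frames, bins); noise_init: shape (bins,).
    """
    prob = np.zeros(power.shape[1])            # assumed initial p
    prev = np.asarray(noise_init, dtype=float)  # D of the preceding frame
    noise = np.empty_like(power, dtype=float)
    for t in range(len(power)):
        prob = alpha_p * prob + (1.0 - alpha_p) * presence[t]   # step 9
        alpha_s = alpha_d + (1.0 - alpha_d) * prob              # step 10
        prev = alpha_s * prev + (1.0 - alpha_s) * power[t]      # step 11
        noise[t] = prev
    return noise
```

Because $\alpha_s$ approaches 1 where speech is likely present, the estimate is nearly frozen in those bands and adapts faster (with weight $1-\alpha_d=0.15$) where speech is judged absent.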
12. Apply multi-band spectral subtraction to obtain the quasi-clean power spectra of the speech segments and the non-speech segments, then obtain the quasi-clean speech $s(t)$ by inverse Fourier transform:
$$s(t)=\mathrm{IFFT}\left[Y(\lambda_v,k)+Y(\lambda_{uv},k)-\left(D(\lambda_v,k)+D(\lambda_{uv},k)\right)\right].$$
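A simplified view of step 12 is sketched below. Several choices here are assumptions rather than statements from the text: the subtraction is done in the power domain with a non-negative floor, the phase of the distorted speech is reused for resynthesis, frames are recombined by plain overlap-add, and the per-band weighting that multi-band spectral subtraction normally applies is omitted because its values are not given. Function names are hypothetical.

```python
import numpy as np

def quasi_clean_frames(spectrum, noise_power, n_fft, floor=0.0):
    """Subtract the noise power estimate from each frame and return time frames.

    spectrum: complex rfft of each frame, shape (frames, n_fft // 2 + 1).
    noise_power: noise estimate D with the same shape.
    """
    power = np.abs(spectrum) ** 2
    # Assumed flooring so the quasi-clean power never goes negative.
    clean_power = np.maximum(power - noise_power, floor * power)
    magnitude = np.sqrt(clean_power)
    phase = np.angle(spectrum)  # assumed: reuse the distorted-speech phase
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=n_fft, axis=1)

def overlap_add(frames, hop):
    """Recombine frames spaced hop samples apart by summation."""
    n = frames.shape[1]
    out = np.zeros(hop * (len(frames) - 1) + n)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n] += frame
    return out
```

For non-speech frames $D(\lambda_{uv},k)=|Y(\lambda_{uv},k)|^2$, so with `floor=0.0` this sketch sets them to silence. Since the analysis frames are Hamming-windowed with 50% overlap, the summed output generally needs a window-dependent gain correction, which is left out here.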
13. As shown in Fig. 1, calculate the objective quality score of the distorted speech: the PESQ algorithm calculates the distortion error between the distorted speech and the quasi-clean speech, and the cognitive model maps that error to the objective quality score of the distorted speech.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such modifications are intended to be included in the scope of the present invention.

Claims (5)

1. A quasi-clean speech construction method for objectively evaluating speech quality is characterized by comprising the following steps:
step 1, distinguishing a non-speech section from a speech section in noise spectrum estimation of distorted speech by using an improved minimum control recursive average algorithm, and updating the noise spectrum estimation value of the non-speech section according to the characteristics of the non-speech section;
the improved minimum control recursive average algorithm adopts a voice activity detection mode to divide a non-voice section and a voice section, and a Sohn algorithm respectively determines the voice section of the distorted voice and the non-voice section among the voice sections by utilizing the zero crossing rate and the short-time energy time domain characteristics;
step 2, when carrying out noise estimation on a voice frame, using an improved minimum control recursive average algorithm to adopt a new frequency correlation threshold value when determining the existence probability of voice in a voice frame frequency band;
the frequency-dependent threshold $\delta(k)$ is defined as:
wherein LF and MF correspond to the frequency points 1 kHz and 3 kHz respectively, fs is the sampling frequency, and $k$ is the band index;
step 3, determining a final noise power spectrum estimation value of the voice with noise according to the noise power spectrum estimation values of the non-voice section and the voice section by utilizing an improved minimum control recursive average algorithm;
and 4, respectively calculating the quasi-clean power spectrums of the non-voice sections and the voice sections of the quasi-clean voice by utilizing a multi-band spectrum subtraction method according to the division of the non-voice sections and the corresponding noise spectrum estimation values, so as to obtain the quasi-clean voice power spectrums of the distorted voice.
2. The method according to claim 1, wherein in step 1, the improved minimum control recursive average algorithm is based on the division into non-speech segments and speech segments; the non-speech segment is regarded as noise, with noise spectrum estimate $D(\lambda_{uv},k)=|Y(\lambda_{uv},k)|^2$, wherein $|Y(\lambda_{uv},k)|^2$ is the short-time power spectrum of the non-speech frame, $\lambda_{uv}$ is the frame index of the non-speech segment, and $k$ is the band index.
3. The method according to claim 1, wherein in step 3, the noise power spectrum estimate $D(\lambda,k)$ of the final noisy speech determined by the improved minimum control recursive average algorithm is divided into two parts, a non-speech segment and a speech segment, and is defined as:
$$D(\lambda,k)=\begin{cases}|Y(\lambda_{uv},k)|^2, & \lambda=\lambda_{uv}\\ \alpha_s(\lambda_v,k)\,D(\lambda_v-1,k)+\left(1-\alpha_s(\lambda_v,k)\right)|Y(\lambda_v,k)|^2, & \lambda=\lambda_v\end{cases}$$
wherein $\alpha_s(\lambda_v,k)$ is a time-frequency dependent smoothing factor, $|Y(\lambda_v,k)|^2$ is the short-time power spectrum of the speech frame, $D(\lambda_v-1,k)$ is the noise spectrum estimate of the frame preceding the current speech frame, and $k$ is the band index.
4. The method according to claim 2, wherein the non-speech segments are divided from the speech segments by voice activity detection, that is: the distorted speech is roughly estimated using time-domain features such as the zero-crossing rate and the short-time energy, the start time and the end time of the speech section of the distorted speech are found, background noise is excluded, and the whole speech section of the distorted speech is determined; the whole speech section is then finely estimated with a Sohn voice activity detection algorithm to determine the speech parts and the inter-word non-speech parts within the speech section.
5. The method according to claim 1, wherein in step 4, the quasi-clean speech power spectrum $S(\lambda,k)$ calculated by multi-band spectral subtraction is divided into two parts, non-speech and speech, and its estimate is defined as:
$$S(\lambda,k)=\left(Y(\lambda_v,k)-D(\lambda_v,k)\right)+\left(Y(\lambda_{uv},k)-D(\lambda_{uv},k)\right),$$
wherein $|Y(\lambda_v,k)|^2$ is the short-time power spectrum of the speech frame, $|Y(\lambda_{uv},k)|^2$ is the short-time power spectrum of the non-speech frame, $D(\lambda_v,k)$ is the noise power spectrum estimate of the speech frame, $D(\lambda_{uv},k)$ is the noise power spectrum estimate of the non-speech frame, $\lambda_v$ is the frame index of the speech segment, $\lambda_{uv}$ is the frame index of the non-speech segment, and $k$ is the band index.
CN201410515374.7A 2014-09-29 2014-09-29 A kind of quasi- clean speech building method for speech quality objective assessment Active CN104269180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410515374.7A CN104269180B (en) 2014-09-29 2014-09-29 A kind of quasi- clean speech building method for speech quality objective assessment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410515374.7A CN104269180B (en) 2014-09-29 2014-09-29 A kind of quasi- clean speech building method for speech quality objective assessment

Publications (2)

Publication Number Publication Date
CN104269180A CN104269180A (en) 2015-01-07
CN104269180B true CN104269180B (en) 2018-04-13

Family

ID=52160694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410515374.7A Active CN104269180B (en) 2014-09-29 2014-09-29 A kind of quasi- clean speech building method for speech quality objective assessment

Country Status (1)

Country Link
CN (1) CN104269180B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106328151B (en) * 2015-06-30 2020-01-31 芋头科技(杭州)有限公司 ring noise eliminating system and application method thereof
CN106816158B (en) 2015-11-30 2020-08-07 华为技术有限公司 Voice quality assessment method, device and equipment
CN107293286B (en) * 2017-05-27 2020-11-24 华南理工大学 Voice sample collection method based on network dubbing game
CN109308904A (en) * 2018-10-22 2019-02-05 上海声瀚信息科技有限公司 A kind of array voice enhancement algorithm
CN109961799A (en) * 2019-01-31 2019-07-02 杭州惠耳听力技术设备有限公司 A kind of hearing aid multicenter voice enhancing algorithm based on Iterative Wiener Filtering
CN112750456A (en) * 2020-09-11 2021-05-04 腾讯科技(深圳)有限公司 Voice data processing method and device in instant messaging application and electronic equipment
CN113593604A (en) * 2021-07-22 2021-11-02 腾讯音乐娱乐科技(深圳)有限公司 Method, device and storage medium for detecting audio quality
CN114374924B (en) * 2022-01-07 2024-01-19 上海纽泰仑教育科技有限公司 Recording quality detection method and related device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101080765A (en) * 2005-05-09 2007-11-28 株式会社东芝 Voice activity detection apparatus and method
CN102800322A (en) * 2011-05-27 2012-11-28 中国科学院声学研究所 Method for estimating noise power spectrum and voice activity
CN103456310A (en) * 2013-08-28 2013-12-18 大连理工大学 Transient noise suppression method based on spectrum estimation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
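Minimum-controlled recursive averaging speech enhancement algorithm based on a bidirectional search method; 曾毓敏 et al.; 《声学学报》 (Acta Acustica); 20100131; Vol. 35, No. 1; 81-87 *
Voice activity detection algorithm based on sub-band weighting; 张玲 et al.; 《计算机应用》 (Journal of Computer Applications); 20100531; Vol. 30, No. 5; 1262-1265 *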

Also Published As

Publication number Publication date
CN104269180A (en) 2015-01-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant