CN104269180A - Quasi-clean voice construction method for voice quality objective evaluation - Google Patents

Info

Publication number
CN104269180A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410515374.7A
Other languages
Chinese (zh)
Other versions
CN104269180B (en)
Inventor
贺前华
周伟力
李洪韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-09-29
Filing date
2014-09-29
Publication date
2015-01-07
Application filed by South China University of Technology SCUT
Priority to CN201410515374.7A
Publication of CN104269180A
Application granted
Publication of CN104269180B
Current legal status: Expired - Fee Related

Landscapes

  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a quasi-clean speech construction method for objective speech quality evaluation. An improved minima-controlled recursive averaging (MCRA) algorithm and multi-band spectral subtraction are used to obtain the quasi-clean speech of a distorted speech signal. The method mainly comprises the steps of: (1) distinguishing the speech segments and non-speech segments of the distorted speech; (2) estimating the noise power spectra of the speech segments and the non-speech segments separately, according to that division; and (3) calculating the quasi-clean speech power spectrum of the distorted speech from the noise spectrum estimates of the non-speech and speech segments. The advantage of the method is that the quasi-clean speech and the distorted speech serve as the two inputs of the PESQ algorithm, which yields an objective evaluation score for the distorted speech.

Description

A quasi-clean speech construction method for objective speech quality assessment
Technical field
The present invention relates to objective speech quality assessment, and in particular to a quasi-clean speech construction method for objective speech quality assessment. The method belongs to the field of non-intrusive (no-reference) objective speech quality assessment.
Background
Speech quality is one of the main criteria for judging a voice communication system. Speech quality assessment is generally divided into subjective and objective methods. Subjective methods rely on listeners' opinions to judge speech quality and directly reflect the user's view of system quality; the MOS (Mean Opinion Score) proposed in ITU-T Recommendation P.830 is a widely used subjective method. However, subjective methods have poor repeatability, are difficult to organize and carry out, are inflexible, and are easily affected by human subjective factors, which makes them ill-suited to production use and field trials.
Objective methods eliminate the possible influence of human factors: they evaluate speech quality by signal processing, based on specific characteristics of the speech signal. Depending on whether a reference signal (clean speech) is required, objective methods are divided into intrusive (with reference) and non-intrusive (without reference) methods. Intrusive methods judge speech quality from the error between the input and output signals of the speech system, that is, they are an error measure. Among them, PESQ (Perceptual Evaluation of Speech Quality), proposed in ITU-T Recommendation P.862, is currently one of the better-performing intrusive methods and handles communication delay, environmental noise and errors well. However, PESQ and other intrusive methods need the input speech (clean speech) as a reference and cannot be used in applications where only the distorted signal is available.
ITU-T Recommendation P.563 is the current standard for non-intrusive objective assessment. It can be applied to monitoring VoIP and communication network performance without a reference signal, but its computational complexity is high, which hinders real-time quality assessment, and its performance falls short of PESQ. The mainstream objective methods based on statistical models rely mainly on Gaussian mixture models (GMM) and vector quantization (VQ). During training, these methods turn clean speech into a reference model or a reference codebook; during testing, they compute the distortion between the distorted speech and the reference model or codebook, and map the error to a final objective quality score. Statistical-model methods not only need a large amount of clean speech data for training, but their performance also differs considerably from that of PESQ.
The quasi-clean speech construction technique estimates the noise spectrum of the distorted speech with a noise tracking algorithm and removes the noise component, yielding the quasi-clean speech of the distorted speech. Unlike voice activity detection (VAD), which updates the noise power spectrum only in non-speech segments, a noise tracking algorithm keeps estimating the noise during speech activity and is therefore better suited to non-stationary noise. Compared with other noise tracking algorithms (Martin, 2001; Doblinger, 1995; Hirsch and Ehrlicher, 1995; Cohen, 2003), the minima-controlled recursive averaging (MCRA) algorithm estimates the noise power spectrum faster in non-stationary noise. However, MCRA estimates and updates the noise spectrum of the whole distorted signal uniformly, without distinguishing speech segments from non-speech segments. As a result, its estimate deviates from the actual noise power spectrum, and the uniform estimation adds computational complexity, lowers the efficiency of the algorithm and hinders real-time estimation.
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of non-intrusive objective assessment methods in the prior art by providing a quasi-clean speech construction method for objective speech quality assessment. The method constructs the quasi-clean speech of a distorted speech signal by noise tracking and noise removal.
The object of the present invention is achieved by the following technical solution: a quasi-clean speech construction method for objective speech quality assessment, comprising the following steps:
Step 1: in estimating the noise spectrum of the distorted speech, the improved MCRA algorithm distinguishes non-speech segments from speech segments, and updates the noise spectrum estimate of the non-speech segments according to their characteristics;
Step 2: when estimating the noise of speech frames, the improved MCRA algorithm uses a new frequency-dependent threshold to determine the per-band speech presence probability of each speech frame;
Step 3: the improved MCRA algorithm determines the final noise power spectrum estimate of the noisy speech from the noise power spectrum estimates of the non-speech and speech segments;
Step 4: the improved MCRA algorithm divides non-speech and speech segments by voice activity detection: the time-domain features zero-crossing rate and short-time energy determine the speech segment of the distorted speech, and the Sohn algorithm determines the inter-word non-speech segments within that speech segment;
Step 5: multi-band spectral subtraction, using the division into non-speech and speech segments and the corresponding noise spectrum estimates, calculates the quasi-clean power spectra of the non-speech and speech segments separately, thereby obtaining the quasi-clean speech power spectrum of the distorted speech.
In step 1, the improved MCRA algorithm relies on the division into non-speech and speech segments. Non-speech segments are treated as noise, so the noise spectrum estimate is $D(\lambda_{uv}, k) = |Y(\lambda_{uv}, k)|^2$, where $|Y(\lambda_{uv}, k)|^2$ is the short-time power spectrum of a non-speech frame, $\lambda_{uv}$ is the frame index within the non-speech segments, and $k$ is the frequency band index.
The division into non-speech and speech segments is realized by voice activity detection. First, time-domain features such as zero-crossing rate and short-time energy give a coarse estimate: the start and end times of the speech segment of the distorted speech are found and the background noise is excluded, which determines the complete speech segment. Then the Sohn voice activity detection algorithm refines the estimate within that segment, separating the speech portions from the inter-word non-speech portions.
In step 2, when the improved MCRA algorithm estimates the noise of speech frames, the frequency-dependent threshold $\delta(k)$ is defined as:
$$\delta(k) = \begin{cases} 1.5, & 1 \le k \le LF \\ 2.5, & LF \le k \le MF \\ 6.5, & MF \le k \le F_s/2 \end{cases}$$
where $LF$ and $MF$ are the bands corresponding to 1 kHz and 3 kHz respectively, $F_s$ is the sampling frequency, and $k$ is the frequency band index.
In step 3, the noise power spectrum estimate $D(\lambda, k)$ of the noisy speech determined by the improved MCRA algorithm consists of a non-speech part and a speech part, and is defined as:
$$D(\lambda, k) = \begin{cases} |Y(\lambda_{uv}, k)|^2, & \text{non-speech frames} \\ \alpha_s(\lambda_v, k)\, D(\lambda_v - 1, k) + \left(1 - \alpha_s(\lambda_v, k)\right) |Y(\lambda_v, k)|^2, & \text{speech frames} \end{cases}$$
where $\alpha_s(\lambda_v, k)$ is the time-frequency-dependent smoothing factor, $|Y(\lambda_v, k)|^2$ is the short-time power spectrum of a speech frame, and $D(\lambda_v - 1, k)$ is the noise spectrum estimate of the frame preceding the current speech frame.
In step 5, the quasi-clean speech power spectrum $S(\lambda, k)$ calculated by multi-band spectral subtraction consists of a non-speech part and a speech part, and its estimate is defined as:
$$S(\lambda, k) = \left(|Y(\lambda_v, k)|^2 - D(\lambda_v, k)\right) + \left(|Y(\lambda_{uv}, k)|^2 - D(\lambda_{uv}, k)\right)$$
where $|Y(\lambda_v, k)|^2$ is the short-time power spectrum of a speech frame, $|Y(\lambda_{uv}, k)|^2$ is the short-time power spectrum of a non-speech frame, $D(\lambda_v, k)$ is the noise power spectrum estimate of a speech frame, and $D(\lambda_{uv}, k)$ is the noise power spectrum estimate of a non-speech frame.
The quasi-clean speech construction method of the present invention is implemented as follows:
1. Determine the speech frames and non-speech frames of the distorted speech; Fig. 2 shows this process. First, a coarse estimate of the speech segment is made: the distorted speech is windowed and split into frames, and the short-time energy and zero-crossing rate of each frame are computed; thresholds on short-time energy and zero-crossing rate are set, and these time-domain features determine the start and end frames of the speech segment. Then the Sohn algorithm refines the estimate within that speech segment and determines the inter-word non-speech portions. The background noise segments and the inter-word non-speech portions are labeled as non-speech frames, and the speech portions of the speech segment are labeled as speech frames.
2. Track the noise of the distorted speech; Fig. 3 shows the noise tracking and estimation process. First, a Fourier transform is applied to the short-time frames from step 1 and the power spectrum of each frame is computed. Noise tracking uses the improved MCRA algorithm, which estimates and updates non-speech frames and speech frames separately, improving both the accuracy and the efficiency of the algorithm. Non-speech frames are regarded as noise frames, so their noise spectrum estimate is simply their short-time power spectrum. For speech frames, the per-band speech presence decision is obtained by comparing the ratio of the smoothed power spectrum to its local minimum against the new frequency-dependent threshold; the speech presence probability is then smoothed and used to update the time-frequency-dependent smoothing factor; this smoothing factor is used to update the noise spectrum estimate of the speech portion. Finally, the noise spectrum estimate of the distorted speech is formed from the non-speech and speech parts.
3. Obtain the quasi-clean speech. Multi-band spectral subtraction is applied to the noisy power spectrum of the distorted speech and the noise power spectrum estimate from step 2, giving the quasi-clean speech power spectrum. An inverse Fourier transform of the quasi-clean speech power spectrum gives the quasi-clean speech time-domain signal.
4. Evaluate the objective quality of the distorted speech. The PESQ algorithm computes the distortion between the distorted speech and the quasi-clean speech through its perceptual model, and its cognitive model maps the distortion to the objective quality score of the distorted speech.
Principle of the present invention: an improved MCRA algorithm and multi-band spectral subtraction are used to obtain the quasi-clean speech of the distorted speech; this quasi-clean speech and the distorted speech are then used as the inputs of the PESQ algorithm to obtain the objective evaluation score of the distorted speech.
Compared with the prior art, the present invention has the following advantages and effects:
1. By constructing the quasi-clean speech of the distorted speech, the PESQ algorithm can be applied to objective evaluation scenarios in which the input speech is unavailable. Compared with other non-intrusive objective methods, the present invention achieves a higher correlation with subjective evaluation.
2. Unlike mainstream non-intrusive methods based on statistical models, the present invention does not need a large clean corpus to train statistical models, so it suits non-intrusive objective evaluation where clean speech data is scarce.
3. The method distinguishes the non-speech and speech segments of the distorted speech, estimates the noise power spectrum of the distorted speech more accurately, removes most of its noise component, and improves the accuracy of the objective quality score.
Brief description of the drawings
Fig. 1 is a flowchart of the quasi-clean speech construction method for objective speech quality assessment.
Fig. 2 is a flowchart of labeling speech frames and non-speech frames.
Fig. 3 is a flowchart of noise tracking and estimation for the distorted speech.
Detailed description
The present invention is described in further detail below with reference to an embodiment and the drawings, but embodiments of the invention are not limited thereto.
Embodiment
A quasi-clean speech construction method for objective speech quality assessment comprises the following steps:
1. The distorted speech is split into frames and windowed (frame length 30 ms, frame shift 15 ms, Hamming window), and the short-time energy and zero-crossing count of each frame are computed. Then the average energy, the upper energy threshold, the lower energy threshold, the average zero-crossing count and the zero-crossing threshold of the distorted speech are computed. The upper energy threshold is 0.05 times the average energy; the lower energy threshold is 0.25 times the upper energy threshold; the zero-crossing threshold is 0.3 times the average zero-crossing count.
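The framing and threshold rules of this step can be sketched as follows. This is a minimal illustration in plain Python, not the patent's implementation; the function names (`frame_signal`, `short_time_energy`, `zero_crossings`, `thresholds`) and the choice to count a zero crossing as a sign change between adjacent samples are assumptions made for the example.

```python
import math

def frame_signal(x, fs, frame_ms=30, hop_ms=15):
    """Split x into overlapping frames and apply a Hamming window."""
    n = int(fs * frame_ms / 1000)      # samples per frame
    hop = int(fs * hop_ms / 1000)      # frame shift in samples
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    return [[x[start + i] * window[i] for i in range(n)]
            for start in range(0, len(x) - n + 1, hop)]

def short_time_energy(frame):
    """Sum of squared samples of one frame."""
    return sum(s * s for s in frame)

def zero_crossings(frame):
    """Number of sign changes between adjacent samples (assumed definition)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def thresholds(frames):
    """Thresholds with the factors quoted in the text: 0.05, 0.25 and 0.3."""
    energies = [short_time_energy(f) for f in frames]
    zcs = [zero_crossings(f) for f in frames]
    e_high = 0.05 * sum(energies) / len(energies)   # upper energy threshold
    e_low = 0.25 * e_high                           # lower energy threshold
    z_thr = 0.3 * sum(zcs) / len(zcs)               # zero-crossing threshold
    return energies, zcs, e_high, e_low, z_thr
```

At a sampling rate of 8 kHz, the 30 ms frame and 15 ms shift correspond to 240 and 120 samples respectively.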
2. A double-threshold method based on energy and zero-crossing rate determines the start and end frames of the speech segment of the distorted speech. This speech segment is then used as the input of the Sohn voice activity detection algorithm, which determines the inter-word non-speech portions within it.
3. The audio frames outside the speech segment determined in step 2, together with the inter-word non-speech frames inside it, are defined as the non-speech portion of the distorted speech; the frames inside the speech segment other than the inter-word non-speech frames are defined as its speech portion. As shown in Fig. 2, the short-time frame index $\lambda$ of the distorted speech is written $\lambda_{uv}$ for non-speech frames and $\lambda_v$ for speech frames.
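Steps 2 and 3 can be illustrated with the sketch below. The text does not spell out the double-threshold search, so `find_endpoints` follows one common variant (seed the segment with frames above the upper energy threshold, then extend outward while the lower energy threshold or the zero-crossing threshold is exceeded); that variant is an assumption. The Sohn detector is not implemented here: its result is passed in as a hypothetical list `pause_flags`.

```python
def find_endpoints(energies, zcs, e_high, e_low, z_thr):
    """Coarse start/end frame of the speech segment (assumed double-threshold variant)."""
    above = [i for i, e in enumerate(energies) if e > e_high]
    if not above:
        return None                      # no frame is clearly speech
    start, end = above[0], above[-1]
    # Extend outward while a neighbouring frame still looks like speech.
    while start > 0 and (energies[start - 1] > e_low or zcs[start - 1] > z_thr):
        start -= 1
    while end < len(energies) - 1 and (energies[end + 1] > e_low or zcs[end + 1] > z_thr):
        end += 1
    return start, end

def label_frames(n_frames, endpoints, pause_flags):
    """True = speech frame, False = non-speech frame.

    pause_flags[i] is True when a fine-grained VAD (the Sohn algorithm in the
    text) marked frame i as an inter-word pause; it is an input here, not computed.
    """
    labels = [False] * n_frames          # frames outside the segment are non-speech
    if endpoints is None:
        return labels
    start, end = endpoints
    for i in range(start, end + 1):
        labels[i] = not pause_flags[i]
    return labels
```

Frames labeled `False` correspond to the index set $\lambda_{uv}$ and frames labeled `True` to $\lambda_v$.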
4. As shown in Fig. 3, a fast Fourier transform (FFT) is applied to the short-time frames of the distorted speech to obtain the non-speech frame power spectrum $|Y(\lambda_{uv}, k)|^2$ and the speech frame power spectrum $|Y(\lambda_v, k)|^2$, where $k$ is the frequency band index.
5. Estimate the noise power spectrum of non-speech frames. Non-speech segments are regarded as noise, so the noise spectrum estimate is $D(\lambda_{uv}, k) = |Y(\lambda_{uv}, k)|^2$.
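A small sketch of steps 4 and 5 follows. To stay dependency-free it uses a direct DFT as a stand-in for the FFT named in the text (same result, slower); `power_spectrum` and `nonspeech_noise` are illustrative names, and the speech/non-speech labels are assumed to come from the preceding steps.

```python
import cmath

def power_spectrum(frame):
    """|Y(k)|^2 for k = 0..N/2, computed with a direct DFT (stand-in for an FFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        y = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(y) ** 2)
    return spectrum

def nonspeech_noise(frames, is_speech):
    """Noise estimate D for non-speech frames: simply their own power spectrum."""
    return {i: power_spectrum(f)
            for i, (f, speech) in enumerate(zip(frames, is_speech))
            if not speech}
```

Because the noise estimate of a non-speech frame equals its power spectrum, no recursive averaging is needed for those frames.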
6. Smooth the speech frame power spectrum $|Y(\lambda_v, k)|^2$:
$$P(\lambda_v, k) = \eta\, P(\lambda_v - 1, k) + (1 - \eta)\, |Y(\lambda_v, k)|^2$$
where $P(\lambda_v, k)$ is the smoothed power spectrum of the speech frame, $\lambda_v$ is the speech frame index, $k$ is the frequency band index, and $\eta$ is the smoothing parameter (0.7 here).
7. Track the local minimum of $P(\lambda_v, k)$ to obtain $P_{\min}(\lambda_v, k)$:
$$P_{\min}(\lambda_v, k) = \begin{cases} \gamma\, P_{\min}(\lambda_v - 1, k) + \dfrac{1 - \gamma}{1 - \beta} \left( P(\lambda_v, k) - \beta\, P(\lambda_v - 1, k) \right), & P_{\min}(\lambda_v - 1, k) < P(\lambda_v, k) \\ P(\lambda_v, k), & \text{otherwise} \end{cases}$$
where $\beta = 0.8$ and $\gamma = 0.998$.
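The smoothing and minimum-tracking recursions of steps 6 and 7 can be written per frame as below. This is a sketch with assumed function names; each argument is a list holding one value per frequency band, and the default parameters are the values quoted in the text.

```python
def smooth_power(p_prev, power, eta=0.7):
    """P(v,k) = eta * P(v-1,k) + (1 - eta) * |Y(v,k)|^2 for every band k."""
    return [eta * p + (1 - eta) * y for p, y in zip(p_prev, power)]

def track_minimum(p_min_prev, p_prev, p_cur, beta=0.8, gamma=0.998):
    """Local-minimum tracking of the smoothed spectrum, band by band."""
    p_min = []
    for m, pp, pc in zip(p_min_prev, p_prev, p_cur):
        if m < pc:
            # The minimum rises slowly while the smoothed power stays above it.
            p_min.append(gamma * m + (1 - gamma) / (1 - beta) * (pc - beta * pp))
        else:
            # The smoothed power dropped to or below the old minimum: follow it at once.
            p_min.append(pc)
    return p_min
```

For example, with a previous minimum of 1.0 and a smoothed power of 2.0 in both the previous and current frame, the minimum moves only to about 1.002, which shows how slowly it adapts upward compared with the immediate drop in the other branch.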
8. Compute the speech presence probability. First compute the ratio $S_r(\lambda_v, k)$ of the smoothed speech frame power spectrum to its local minimum:
$$S_r(\lambda_v, k) = \frac{P(\lambda_v, k)}{P_{\min}(\lambda_v, k)}$$
Then determine the per-band speech presence indicator $I(\lambda_v, k)$ from $S_r(\lambda_v, k)$:
$$I(\lambda_v, k) = \begin{cases} 1, & S_r(\lambda_v, k) > \delta(k) \quad \text{(speech present)} \\ 0, & \text{otherwise} \quad \text{(speech absent)} \end{cases}$$
$\delta(k)$ is the frequency-dependent threshold:
$$\delta(k) = \begin{cases} 1.5, & 1 \le k \le LF \\ 2.5, & LF \le k \le MF \\ 6.5, & MF \le k \le F_s/2 \end{cases}$$
where $LF$ and $MF$ are the bands corresponding to 1 kHz and 3 kHz respectively, $F_s$ is the sampling frequency, and $k$ is the frequency band index.
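The threshold comparison of step 8 might look like the sketch below. Two points are assumptions of the example rather than statements of the text: band index $k$ is mapped to a frequency as `k * fs / n_fft`, and a band that falls exactly on 1 kHz or 3 kHz is assigned to the lower range, since the source definition overlaps at those boundaries. The small `eps` guard against division by zero is also added for the example.

```python
def delta(freq_hz, lf_hz=1000.0, mf_hz=3000.0):
    """Frequency-dependent threshold: 1.5 up to 1 kHz, 2.5 up to 3 kHz, 6.5 above."""
    if freq_hz <= lf_hz:
        return 1.5
    if freq_hz <= mf_hz:
        return 2.5
    return 6.5

def speech_indicator(p, p_min, fs, n_fft, eps=1e-12):
    """I(v,k) = 1 where P / P_min exceeds delta(k), else 0."""
    indicator = []
    for k, (pk, mk) in enumerate(zip(p, p_min)):
        ratio = pk / max(mk, eps)            # S_r(v,k); eps avoids division by zero
        indicator.append(1 if ratio > delta(k * fs / n_fft) else 0)
    return indicator
```

The higher threshold above 3 kHz makes the detector more reluctant to declare speech present in the upper bands, so the noise estimate keeps adapting there.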
9. Smooth the speech presence probability $p(\lambda_v, k)$:
$$p(\lambda_v, k) = \alpha_p\, p(\lambda_v - 1, k) + (1 - \alpha_p)\, I(\lambda_v, k)$$
where $\alpha_p$ is a smoothing parameter (0.2 here).
10. Use the smoothed speech presence probability $p(\lambda_v, k)$ to compute the time-frequency-dependent smoothing factor $\alpha_s(\lambda_v, k)$:
$$\alpha_s(\lambda_v, k) = \alpha_d + (1 - \alpha_d)\, p(\lambda_v, k)$$
where $\alpha_d$ is a constant (0.85 here).
11. Use the time-frequency-dependent smoothing factor $\alpha_s(\lambda_v, k)$ to update the speech frame noise spectrum estimate $D(\lambda_v, k)$:
$$D(\lambda_v, k) = \alpha_s(\lambda_v, k)\, D(\lambda_v - 1, k) + \left(1 - \alpha_s(\lambda_v, k)\right) |Y(\lambda_v, k)|^2$$
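Steps 9 to 11 chain three one-line recursions, which the following sketch combines into a single per-frame update. The function name `update_speech_noise` is illustrative; all list arguments hold one value per frequency band, and the defaults are the constants quoted in the text.

```python
def update_speech_noise(p_prev, indicator, d_prev, power, alpha_p=0.2, alpha_d=0.85):
    """One speech-frame update of p, alpha_s and the noise estimate D."""
    # Step 9: smooth the speech presence probability.
    p = [alpha_p * pp + (1 - alpha_p) * i for pp, i in zip(p_prev, indicator)]
    # Step 10: time-frequency-dependent smoothing factor, between alpha_d and 1.
    alpha_s = [alpha_d + (1 - alpha_d) * pk for pk in p]
    # Step 11: recursive noise update; alpha_s near 1 freezes D where speech is likely.
    d = [a * dp + (1 - a) * y for a, dp, y in zip(alpha_s, d_prev, power)]
    return p, alpha_s, d
```

As a worked example with a previous noise estimate of 1.0 and a frame power of 5.0: if speech is detected ($I = 1$, previous $p = 0$), then $p = 0.8$, $\alpha_s = 0.97$ and $D = 1.12$; if speech is absent ($I = 0$), then $\alpha_s = 0.85$ and $D = 1.6$. The estimate therefore moves toward the frame power much faster when the band is judged to contain only noise.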
12. Apply multi-band spectral subtraction to obtain the quasi-clean power spectra of the speech and non-speech segments, and obtain the quasi-clean speech $s(t)$ by an inverse Fourier transform:
$$s(t) = \mathrm{IFFT}\left[\, |Y(\lambda_v, k)|^2 + |Y(\lambda_{uv}, k)|^2 - \left( D(\lambda_v, k) + D(\lambda_{uv}, k) \right) \right]$$
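The subtraction itself can be sketched for a single frame as below. The text gives no per-band weights or spectral floor for its multi-band spectral subtraction, so the `weights` and `floor` parameters are hypothetical additions for illustration; with their defaults the function reduces to plain power subtraction clipped at zero. Reconstruction of the time-domain signal is left out.

```python
def quasi_clean_power(power, noise, weights=None, floor=0.0):
    """Per-band power subtraction S = |Y|^2 - w * D, clipped at floor * |Y|^2.

    weights: hypothetical per-band subtraction factors (default 1.0 everywhere).
    floor:   hypothetical spectral floor as a fraction of the noisy power.
    """
    if weights is None:
        weights = [1.0] * len(power)
    return [max(y - w * d, floor * y) for y, d, w in zip(power, noise, weights)]
```

One consequence worth noting: because the noise estimate of a non-speech frame equals its own power spectrum, subtraction leaves such frames at zero power (or at the floor, if one is set), so the quasi-clean signal is silent outside the speech portions.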
13. As shown in Fig. 1, compute the objective quality score of the distorted speech: the PESQ algorithm computes the distortion between the distorted speech and the quasi-clean speech, and its cognitive model maps the distortion to the objective quality score of the distorted speech.
The embodiment described above is a preferred embodiment of the present invention, but embodiments of the invention are not limited to it. Any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims (6)

1. A quasi-clean speech construction method for objective speech quality assessment, characterized in that it comprises the following steps:
Step 1: in estimating the noise spectrum of the distorted speech, the improved MCRA algorithm distinguishes non-speech segments from speech segments, and updates the noise spectrum estimate of the non-speech segments according to their characteristics;
Step 2: when estimating the noise of speech frames, the improved MCRA algorithm uses a new frequency-dependent threshold to determine the per-band speech presence probability of each speech frame;
Step 3: the improved MCRA algorithm determines the final noise power spectrum estimate of the noisy speech from the noise power spectrum estimates of the non-speech and speech segments;
Step 4: the improved MCRA algorithm divides non-speech and speech segments by voice activity detection: the time-domain features zero-crossing rate and short-time energy determine the speech segment of the distorted speech, and the Sohn algorithm determines the inter-word non-speech segments within that speech segment;
Step 5: multi-band spectral subtraction, using the division into non-speech and speech segments and the corresponding noise spectrum estimates, calculates the quasi-clean power spectra of the non-speech and speech segments separately, thereby obtaining the quasi-clean speech power spectrum of the distorted speech.
2. The quasi-clean speech construction method for objective speech quality assessment according to claim 1, characterized in that, in step 1, the improved MCRA algorithm relies on the division into non-speech and speech segments; non-speech segments are treated as noise, so the noise spectrum estimate is $D(\lambda_{uv}, k) = |Y(\lambda_{uv}, k)|^2$, where $|Y(\lambda_{uv}, k)|^2$ is the short-time power spectrum of a non-speech frame, $\lambda_{uv}$ is the frame index within the non-speech segments, and $k$ is the frequency band index.
3. The quasi-clean speech construction method for objective speech quality assessment according to claim 1, characterized in that, in step 2, when the improved MCRA algorithm estimates the noise of speech frames, the frequency-dependent threshold $\delta(k)$ is defined as:
$$\delta(k) = \begin{cases} 1.5, & 1 \le k \le LF \\ 2.5, & LF \le k \le MF \\ 6.5, & MF \le k \le F_s/2 \end{cases}$$
where $LF$ and $MF$ are the bands corresponding to 1 kHz and 3 kHz respectively, $F_s$ is the sampling frequency, and $k$ is the frequency band index.
4. The quasi-clean speech construction method for objective speech quality assessment according to claim 1, characterized in that, in step 3, the noise power spectrum estimate $D(\lambda, k)$ of the noisy speech determined by the improved MCRA algorithm consists of a non-speech part and a speech part, and is defined as:
$$D(\lambda, k) = \begin{cases} |Y(\lambda_{uv}, k)|^2, & \text{non-speech frames} \\ \alpha_s(\lambda_v, k)\, D(\lambda_v - 1, k) + \left(1 - \alpha_s(\lambda_v, k)\right) |Y(\lambda_v, k)|^2, & \text{speech frames} \end{cases}$$
where $\alpha_s(\lambda_v, k)$ is the time-frequency-dependent smoothing factor, $|Y(\lambda_v, k)|^2$ is the short-time power spectrum of a speech frame, and $D(\lambda_v - 1, k)$ is the noise spectrum estimate of the frame preceding the current speech frame.
5. The quasi-clean speech construction method for objective speech quality assessment according to claim 2, characterized in that the division into non-speech and speech segments is realized by voice activity detection, namely: time-domain features such as zero-crossing rate and short-time energy give a coarse estimate of the distorted speech, finding the start and end times of its speech segment and excluding the background noise, which determines the complete speech segment; the Sohn voice activity detection algorithm then refines the estimate within that segment, determining the speech portions and the inter-word non-speech portions.
6. The quasi-clean speech construction method for objective speech quality assessment according to claim 1, characterized in that, in step 5, the quasi-clean speech power spectrum $S(\lambda, k)$ calculated by multi-band spectral subtraction consists of a non-speech part and a speech part, and its estimate is defined as:
$$S(\lambda, k) = \left(|Y(\lambda_v, k)|^2 - D(\lambda_v, k)\right) + \left(|Y(\lambda_{uv}, k)|^2 - D(\lambda_{uv}, k)\right)$$
where $|Y(\lambda_v, k)|^2$ is the short-time power spectrum of a speech frame, $|Y(\lambda_{uv}, k)|^2$ is the short-time power spectrum of a non-speech frame, $D(\lambda_v, k)$ is the noise power spectrum estimate of a speech frame, and $D(\lambda_{uv}, k)$ is the noise power spectrum estimate of a non-speech frame.
CN201410515374.7A 2014-09-29 2014-09-29 Quasi-clean speech construction method for objective speech quality assessment Expired - Fee Related CN104269180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410515374.7A CN104269180B (en) 2014-09-29 2014-09-29 Quasi-clean speech construction method for objective speech quality assessment

Publications (2)

Publication Number Publication Date
CN104269180A (en) 2015-01-07
CN104269180B (en) 2018-04-13

Family

ID=52160694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410515374.7A Expired - Fee Related CN104269180B (en) 2014-09-29 2014-09-29 Quasi-clean speech construction method for objective speech quality assessment

Country Status (1)

Country Link
CN (1) CN104269180B (en)

Also Published As

Publication number Publication date
CN104269180B (en) 2018-04-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180413