CN104269180A - Quasi-clean voice construction method for voice quality objective evaluation - Google Patents
- Publication number
- CN104269180A (application CN201410515374.7A)
- Authority
- CN
- China
- Prior art keywords
- speech
- voice
- segment
- spectrum
- noise
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a quasi-clean speech construction method for objective speech quality evaluation. An improved minima-controlled recursive averaging (MCRA) algorithm and multi-band spectral subtraction are used to obtain the quasi-clean speech of a distorted speech signal. The method mainly comprises the steps of: (1) distinguishing the speech segments and non-speech segments of the distorted speech; (2) estimating the noise power spectra of the speech segments and non-speech segments separately, according to this division; and (3) calculating the quasi-clean speech power spectrum of the distorted speech from the noise spectrum estimates of the non-speech and speech segments. The advantage of the method is that the quasi-clean speech and the distorted speech can serve as the input signals of the PESQ algorithm, which yields an objective evaluation score for the distorted speech.
Description
Technical field
The present invention relates to objective speech quality evaluation, and in particular to a quasi-clean speech construction method for objective speech quality evaluation. The method belongs to the field of non-intrusive (reference-free) objective speech quality evaluation.
Background technology
Speech quality is one of the main criteria for evaluating the quality of a voice communication system. Speech quality assessment is generally divided into subjective and objective evaluation methods. Subjective evaluation relies on listeners' opinions to judge speech quality and directly reflects the user's view of system quality; the MOS (Mean Opinion Score) proposed in ITU-T Recommendation P.830 is a widely used subjective evaluation method. However, subjective evaluation has poor repeatability, is difficult and inflexible to organize and carry out, and is easily affected by human subjective factors, which makes it unsuitable for use in production and field experiments.
Objective evaluation methods eliminate the possible influence of human factors: they target specific characteristics of the speech signal and carry out the quality evaluation by means of signal processing. Depending on whether a reference source signal (clean speech) is needed, objective methods are divided into intrusive (with reference) and non-intrusive (reference-free) methods. Intrusive methods judge speech quality by the size of the error between the input and output signals of the voice system, that is, they are an error measure. Among them, PESQ (Perceptual Evaluation of Speech Quality), proposed in ITU-T Recommendation P.862, is currently a well-performing intrusive method that handles communication delay, environmental noise and errors well. However, PESQ and other intrusive methods need the input speech (clean speech) as a reference and cannot be used in applications where only the distorted signal is available.
ITU-T Recommendation P.563 is the current standard for non-intrusive objective evaluation. It can be applied to VoIP and communication-network performance monitoring where no reference signal is available, but its computational complexity is high, which hinders real-time evaluation of speech quality, and its evaluation performance is not as good as that of PESQ. The current mainstream objective methods based on statistical models mainly rely on Gaussian mixture models (GMM) and vector quantization (VQ). During model training these methods train clean speech into a reference model or a reference codebook; during testing the distortion between the distorted speech and the reference model or codebook is computed, and the resulting error is mapped to the final objective quality score. Statistical-model-based methods not only need a large amount of clean speech data for training, but their evaluation performance also differs considerably from that of PESQ.
Quasi-clean speech construction estimates the noise spectrum of the distorted speech with a noise tracking algorithm and removes the noise component of the distorted speech to obtain its quasi-clean speech. Unlike voice activity detection (VAD), which updates the noise power spectrum only in non-speech segments, a noise tracking algorithm can keep producing good noise estimates during speech activity and is therefore better suited to non-stationary noise. Compared with other noise tracking algorithms (Martin, 2001; Doblinger, 1995; Hirsch and Ehrlicher, 1995; Cohen, 2003), the minima-controlled recursive averaging algorithm can estimate the noise power spectrum more quickly in non-stationary noise environments. However, it estimates and updates the noise spectrum of the distorted speech uniformly, without distinguishing speech segments from non-speech segments. The estimate therefore deviates somewhat from the actual noise power spectrum, and the uniform estimation adds computational complexity, which lowers the efficiency of the algorithm and hinders real-time estimation.
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of non-intrusive objective evaluation methods in the prior art by providing a quasi-clean speech construction method for objective speech quality evaluation, in which the quasi-clean speech of the distorted speech is constructed by noise tracking and noise removal.
The object of the present invention is achieved through the following technical solution: a quasi-clean speech construction method for objective speech quality evaluation, comprising the following steps:
Step 1: the improved minima-controlled recursive averaging algorithm distinguishes non-speech segments from speech segments when estimating the noise spectrum of the distorted speech, and updates the noise spectrum estimate of the non-speech segments according to the characteristics of non-speech segments;
Step 2: when estimating the noise of speech frames, the improved minima-controlled recursive averaging algorithm uses a new frequency-dependent threshold to determine the per-band speech presence probability of each speech frame;
Step 3: the improved minima-controlled recursive averaging algorithm determines the final noise power spectrum estimate of the noisy speech from the noise power spectrum estimates of the non-speech and speech segments;
Step 4: the improved minima-controlled recursive averaging algorithm divides non-speech segments and speech segments by voice activity detection, using the zero-crossing rate and short-time energy time-domain features and the Sohn algorithm to determine, respectively, the speech segments of the distorted speech and the inter-word non-speech segments within them;
Step 5: multi-band spectral subtraction computes, according to the division into non-speech and speech segments and the corresponding noise spectrum estimates, the quasi-clean power spectra of the non-speech segments and of the speech segments, thereby obtaining the quasi-clean speech power spectrum of the distorted speech.
In step 1, the improved minima-controlled recursive averaging algorithm is based on the division into non-speech and speech segments. Non-speech segments are regarded as noise, so the noise spectrum estimate is $D(\lambda_{uv}, k) = |Y(\lambda_{uv}, k)|^2$, where $|Y(\lambda_{uv}, k)|^2$ is the short-time power spectrum of a non-speech frame, $\lambda_{uv}$ is the frame index within the non-speech segments, and $k$ is the band index.
The division into non-speech and speech segments is realized by voice activity detection. Time-domain features such as the zero-crossing rate and the short-time energy give a coarse estimate of the distorted speech: they locate the start and end times of its speech segments and exclude background noise, which determines the utterance segments of the distorted speech. The Sohn voice activity detection algorithm then refines the estimate over the located utterance segments, determining the speech portions and the inter-word non-speech portions within the speech segments.
In step 2, when the improved minima-controlled recursive averaging algorithm estimates the noise of speech frames, the frequency-dependent threshold $\delta(k)$ that it uses is defined as:
where LF and MF correspond to the frequencies 1 kHz and 3 kHz respectively, $F_s$ is the sampling frequency, and $k$ is the band index.
In step 3, the noise power spectrum estimate $D(\lambda, k)$ of the noisy speech determined by the improved minima-controlled recursive averaging algorithm is divided into a non-speech part and a speech part, and is defined as:
$$D(\lambda,k)=\begin{cases}|Y(\lambda_{uv},k)|^2, & \text{non-speech frames}\\ \alpha_s(\lambda_v,k)\,D(\lambda_v-1,k)+\bigl(1-\alpha_s(\lambda_v,k)\bigr)\,|Y(\lambda_v,k)|^2, & \text{speech frames}\end{cases}$$
where $\alpha_s(\lambda_v, k)$ is the time-frequency-dependent smoothing factor, $|Y(\lambda_v, k)|^2$ is the short-time power spectrum of a speech frame, and $D(\lambda_v-1, k)$ is the noise spectrum estimate of the frame preceding the current speech frame.
In step 5, the quasi-clean speech power spectrum $S(\lambda, k)$ computed by multi-band spectral subtraction is divided into a non-speech part and a speech part, and its estimate is defined as:
$$S(\lambda,k)=\bigl(Y(\lambda_v,k)-D(\lambda_v,k)\bigr)+\bigl(Y(\lambda_{uv},k)-D(\lambda_{uv},k)\bigr),$$
where $|Y(\lambda_v, k)|^2$ is the short-time power spectrum of a speech frame, $|Y(\lambda_{uv}, k)|^2$ is the short-time power spectrum of a non-speech frame, $D(\lambda_v, k)$ is the noise power spectrum estimate of a speech frame, and $D(\lambda_{uv}, k)$ is the noise power spectrum estimate of a non-speech frame.
The quasi-clean speech construction method of the present invention is implemented as follows:
1. Determine the speech frames and non-speech frames of the distorted speech; Fig. 2 shows this procedure. First, a coarse estimate of the speech segments is made: the distorted speech is windowed and framed, and the short-time energy and zero-crossing rate of each frame are computed; thresholds for the short-time energy and zero-crossing rate are set, and these time-domain features determine the start frame and end frame of each speech segment. The Sohn algorithm then refines the estimate over these speech segments, determining the inter-word non-speech portions within them. The background-noise segments and the inter-word non-speech portions are labeled as non-speech frames, and the speech portions of the speech segments are labeled as speech frames.
2. Track the noise of the distorted speech; Fig. 3 shows the noise tracking and estimation procedure. First, a Fourier transform is applied to the short-time frames from step 1 and the power spectrum of each frame is computed. Noise tracking uses the improved minima-controlled recursive averaging algorithm, which estimates and updates non-speech frames and speech frames separately, improving the accuracy and execution efficiency of the algorithm. Non-speech frames are treated as noise frames, so their noise spectrum estimate is their short-time power spectrum. For speech frames, the per-band speech presence probability is obtained by comparing the ratio of the smoothed power spectrum to its local minimum against the new frequency-dependent threshold; the speech presence probability is then smoothed, and the time-frequency-dependent smoothing factor is updated from the smoothed probability; this smoothing factor is used to update the noise spectrum estimate of the speech portion. Finally, the noise spectrum estimates of the non-speech and speech portions together form the noise spectrum estimate of the distorted speech.
3. Obtain the quasi-clean speech. Multi-band spectral subtraction is applied to the noisy power spectrum of the distorted speech and the noise power spectrum estimate obtained in step 2, giving the quasi-clean speech power spectrum. An inverse Fourier transform of the quasi-clean speech power spectrum gives the quasi-clean speech time-domain signal.
4. Evaluate the objective quality of the distorted speech. The PESQ algorithm computes the distortion between the distorted speech and the quasi-clean speech through its perceptual model, and its cognitive model finally maps this distortion to the objective quality score of the distorted speech.
Principle of the present invention: an improved minima-controlled recursive averaging algorithm and multi-band spectral subtraction are used to obtain the quasi-clean speech of the distorted speech; the quasi-clean speech and the distorted speech are then used as the input signals of the PESQ algorithm to obtain the objective evaluation score of the distorted speech.
Compared with the prior art, the present invention has the following advantages and effects:
1. By constructing the quasi-clean speech of the distorted speech, the PESQ algorithm can be applied in objective evaluation scenarios where no input (reference) speech is available. Compared with other non-intrusive objective evaluation methods, the present invention achieves a higher correlation between subjective and objective scores.
2. Unlike the mainstream non-intrusive methods based on statistical models, the present invention does not need a large clean-speech corpus to train a statistical model, so the evaluation algorithm is suitable for non-intrusive objective evaluation applications where clean speech material is scarce.
3. The quasi-clean speech construction method distinguishes the non-speech and speech segments of the distorted speech, estimates the noise power spectrum of the distorted speech more accurately, removes the noise component of the distorted speech to a large extent, and improves the accuracy of the objective quality score of the distorted speech.
Brief description of the drawings
Fig. 1 is a flow chart of the quasi-clean speech construction method for objective speech quality evaluation.
Fig. 2 is a flow chart of the labeling of speech frames and non-speech frames.
Fig. 3 is a flow chart of the noise tracking and estimation for the distorted speech.
Detailed description
The present invention is described in further detail below with reference to an embodiment and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment
A quasi-clean speech construction method for objective speech quality evaluation comprises the following steps:
1. Frame and window the distorted speech (frame length 30 ms, frame shift 15 ms, Hamming window) and compute the short-time energy and zero-crossing count of each frame. Then compute the average energy of the distorted speech, the upper energy threshold, the lower energy threshold, the average zero-crossing count, and the zero-crossing threshold. The upper energy threshold is 0.05 times the average energy; the lower energy threshold is 0.25 times the upper energy threshold; the zero-crossing threshold is 0.3 times the average zero-crossing count.
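The framing and threshold computation of this step can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and it assumes the short-time energy is the sum of squared windowed samples and that a zero crossing is a sign change between consecutive samples.

```python
import math

def frame_signal(x, fs, frame_ms=30, hop_ms=15):
    """Split x into Hamming-windowed frames (30 ms frames, 15 ms shift)."""
    n = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    frames = []
    for start in range(0, len(x) - n + 1, hop):
        frames.append([x[start + i] * window[i] for i in range(n)])
    return frames

def short_time_energy(frame):
    """Assumed definition: sum of squared samples in the frame."""
    return sum(v * v for v in frame)

def zero_crossings(frame):
    """Assumed definition: number of sign changes between neighbours."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def thresholds(energies, zc_counts):
    """Thresholds with the factors stated in the text (0.05, 0.25, 0.3)."""
    mean_energy = sum(energies) / len(energies)
    mean_zc = sum(zc_counts) / len(zc_counts)
    energy_upper = 0.05 * mean_energy     # upper energy threshold
    energy_lower = 0.25 * energy_upper    # lower energy threshold
    zc_threshold = 0.3 * mean_zc          # zero-crossing threshold
    return energy_upper, energy_lower, zc_threshold
```

Because the thresholds are derived from the averages of the signal itself, they scale with the recording level and need no absolute calibration.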
2. Determine the start frame and end frame of each speech segment of the distorted speech with the double-threshold method based on energy and zero-crossing rate. The speech segments so determined are the input data of the Sohn voice activity detection algorithm, which determines the inter-word non-speech portions within them.
3. The audio frames outside the speech segments determined in step 2, together with the inter-word non-speech frames inside those segments, are defined as the non-speech portion of the distorted speech; the audio frames inside the speech segments other than the inter-word non-speech frames are defined as its speech portion. As shown in Fig. 2, the short-time frame index $\lambda$ of the distorted speech denotes a non-speech frame part and a speech frame part:
4. As shown in Fig. 3, apply a fast Fourier transform to the short-time frames of the distorted speech and compute the non-speech frame power spectrum $|Y(\lambda_{uv}, k)|^2$ and the speech frame power spectrum $|Y(\lambda_v, k)|^2$, where $k$ is the band index.
5. Estimate the noise power spectrum of non-speech frames. Non-speech segments are regarded as noise, so the noise spectrum estimate is $D(\lambda_{uv}, k) = |Y(\lambda_{uv}, k)|^2$.
6. Smooth the speech frame power spectrum $|Y(\lambda_v, k)|^2$:
$$P(\lambda_v,k)=\eta\,P(\lambda_v-1,k)+(1-\eta)\,|Y(\lambda_v,k)|^2,$$
where $P(\lambda_v, k)$ is the smoothed power spectrum of the speech frame, $\lambda_v$ is the speech frame index, $k$ is the band index, and $\eta$ is the smoothing parameter (0.7 here).
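The recursive smoothing of the speech-frame power spectrum can be sketched as a per-band first-order filter. The function name and the list-based representation of a spectrum are assumptions made for illustration; how the first frame is initialised is not stated in the text and is left to the caller.

```python
ETA = 0.7  # smoothing parameter given in the text

def smooth_power(prev_smooth, power, eta=ETA):
    """Per-band update P(l, k) = eta * P(l-1, k) + (1 - eta) * |Y(l, k)|^2.

    prev_smooth: smoothed power spectrum of the previous speech frame
    power:       short-time power spectrum |Y|^2 of the current speech frame
    """
    return [eta * p + (1.0 - eta) * y for p, y in zip(prev_smooth, power)]
```

With $\eta = 0.7$, each new frame contributes 30% of the smoothed value, so a sudden change in one band takes a few frames to show fully in $P$.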
7. Track the local minimum of $P(\lambda_v, k)$ to obtain $P_{\min}(\lambda_v, k)$:
if $P_{\min}(\lambda_v-1,k) < P(\lambda_v,k)$
else
$P_{\min}(\lambda_v,k)=P(\lambda_v,k)$
end
In the formula, $\beta$ is 0.8 and $\gamma$ is 0.998.
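The update applied in the `if` branch does not survive in the text above. The sketch below fills it with the continuous minimum-tracking rule commonly attributed to Doblinger (1995), which uses the same two constants $\beta$ and $\gamma$; this choice is an assumption for illustration and may differ from the formula in the original patent. The function name is hypothetical.

```python
BETA = 0.8     # value given in the text
GAMMA = 0.998  # value given in the text

def track_minimum(p_min_prev, p_prev, p_cur, beta=BETA, gamma=GAMMA):
    """Per-band local-minimum tracking of the smoothed power spectrum.

    p_min_prev: P_min of the previous speech frame
    p_prev:     smoothed power P of the previous speech frame
    p_cur:      smoothed power P of the current speech frame
    """
    out = []
    for m_prev, s_prev, s_cur in zip(p_min_prev, p_prev, p_cur):
        if m_prev < s_cur:
            # ASSUMED update (Doblinger-style); the source omits this line.
            m = gamma * m_prev + (1.0 - gamma) / (1.0 - beta) * (s_cur - beta * s_prev)
        else:
            # Stated in the text: the minimum follows the power downward.
            m = s_cur
        out.append(m)
    return out
```

Under this assumed rule the minimum rises only slowly while the power stays above it, and drops immediately when the power falls below it.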
8. Compute the speech presence probability. First compute the ratio $S_r(\lambda_v, k)$ of the smoothed speech frame power spectrum to its local minimum:
$$S_r(\lambda_v,k)=\frac{P(\lambda_v,k)}{P_{\min}(\lambda_v,k)}$$
Then determine the per-band speech presence indicator $I(\lambda_v, k)$ from $S_r(\lambda_v, k)$:
if $S_r(\lambda_v,k) > \delta(k)$
$I(\lambda_v,k)=1$ (speech present)
else
$I(\lambda_v,k)=0$ (speech absent)
end
$\delta(k)$ is the band-dependent threshold:
where LF and MF correspond to the frequencies 1 kHz and 3 kHz respectively, $F_s$ is the sampling frequency, and $k$ is the band index.
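The decision rule of this step can be sketched as follows. The threshold formula itself is not reproduced in the text, so the sketch assumes a piecewise-constant $\delta(k)$ over three bands split at 1 kHz and 3 kHz and takes the three values as a caller-supplied parameter `deltas`; the band structure, the bin mapping, and all names here are hypothetical.

```python
def band_threshold(k, n_fft, fs, deltas):
    """Hypothetical piecewise-constant delta(k), one value per band.

    deltas = (low, mid, high): thresholds below 1 kHz, between 1 and 3 kHz,
    and above 3 kHz. The actual values are not given in the text.
    """
    lf = round(1000 * n_fft / fs)  # bin index assumed to correspond to 1 kHz
    mf = round(3000 * n_fft / fs)  # bin index assumed to correspond to 3 kHz
    if k <= lf:
        return deltas[0]
    if k <= mf:
        return deltas[1]
    return deltas[2]

def speech_presence(p_smooth, p_min, n_fft, fs, deltas):
    """Indicator I(l, k): 1 if P / P_min exceeds delta(k), else 0."""
    decisions = []
    for k, (p, m) in enumerate(zip(p_smooth, p_min)):
        ratio = p / m if m > 0 else float("inf")
        decisions.append(1 if ratio > band_threshold(k, n_fft, fs, deltas) else 0)
    return decisions
```

For example, `speech_presence([3.0, 1.0], [1.0, 1.0], 256, 8000, (2.0, 2.0, 5.0))` marks the first band as speech and the second as noise, with the placeholder thresholds chosen only for the illustration.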
9. Smooth the speech presence probability $p(\lambda_v, k)$:
$$p(\lambda_v,k)=\alpha_p\,p(\lambda_v-1,k)+(1-\alpha_p)\,I(\lambda_v,k),$$
where $\alpha_p$ is the smoothing parameter (0.2 here).
10. Use the smoothed speech presence probability $p(\lambda_v, k)$ to compute the time-frequency-dependent smoothing factor $\alpha_s(\lambda_v, k)$:
$$\alpha_s(\lambda_v,k)=\alpha_d+(1-\alpha_d)\,p(\lambda_v,k),$$
where $\alpha_d$ is a constant (0.85 here).
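The smoothing of the speech presence probability and the resulting time-frequency-dependent smoothing factor translate directly into code. A minimal sketch with hypothetical function names:

```python
ALPHA_P = 0.2   # probability smoothing parameter given in the text
ALPHA_D = 0.85  # smoothing constant given in the text

def smooth_presence(p_prev, indicator, alpha_p=ALPHA_P):
    """p(l, k) = alpha_p * p(l-1, k) + (1 - alpha_p) * I(l, k)."""
    return [alpha_p * p + (1.0 - alpha_p) * i for p, i in zip(p_prev, indicator)]

def smoothing_factor(p, alpha_d=ALPHA_D):
    """alpha_s(l, k) = alpha_d + (1 - alpha_d) * p(l, k)."""
    return [alpha_d + (1.0 - alpha_d) * v for v in p]
```

Since $p$ lies in $[0, 1]$, $\alpha_s$ lies in $[0.85, 1]$: the more likely speech is in a band, the closer $\alpha_s$ is to 1 and the less the noise estimate in that band follows the current frame.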
11. Use the time-frequency-dependent smoothing factor $\alpha_s(\lambda_v, k)$ to update the speech frame noise spectrum estimate $D(\lambda_v, k)$:
$$D(\lambda_v,k)=\alpha_s(\lambda_v,k)\,D(\lambda_v-1,k)+\bigl(1-\alpha_s(\lambda_v,k)\bigr)\,|Y(\lambda_v,k)|^2.$$
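The two noise-update rules of the method (non-speech frames take their own power spectrum as the noise estimate; speech frames are updated recursively with $\alpha_s$) can be combined in one helper. This is a sketch under the assumption that spectra are plain lists indexed by band; the function name and the `is_speech` flag are hypothetical.

```python
def update_noise(d_prev, power, alpha_s=None, is_speech=True):
    """Noise power spectrum estimate D for one frame.

    d_prev:    noise estimate of the previous frame
    power:     short-time power spectrum |Y|^2 of the current frame
    alpha_s:   per-band smoothing factor (needed for speech frames only)
    is_speech: frame label from the voice activity detection stage
    """
    if not is_speech:
        # Non-speech frames are regarded as noise: D = |Y|^2.
        return list(power)
    # Speech frames: D(l, k) = a * D(l-1, k) + (1 - a) * |Y(l, k)|^2.
    return [a * d + (1.0 - a) * y for a, d, y in zip(alpha_s, d_prev, power)]
```

A band with $\alpha_s = 1$ keeps the previous noise estimate unchanged, which is the intended behaviour when speech is judged certain in that band.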
12. Apply multi-band spectral subtraction to obtain the quasi-clean power spectra of the speech segments and non-speech segments, and obtain the quasi-clean speech $s(t)$ by inverse Fourier transform:
$$s(t)=\mathrm{IFFT}\bigl[Y(\lambda_v,k)+Y(\lambda_{uv},k)-\bigl(D(\lambda_v,k)+D(\lambda_{uv},k)\bigr)\bigr].$$
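The subtraction and reconstruction for a single frame can be sketched as below. The text gives only the subtraction itself, so three choices here are assumptions: negative differences are floored at zero, the phase of the distorted frame is reused for the quasi-clean frame, and no per-band weighting is applied. A naive DFT keeps the sketch dependency-free; a real implementation would use an FFT library. All names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def quasi_clean_frame(frame, noise_power):
    """Subtract the noise power estimate D from |Y|^2 and resynthesise."""
    spectrum = dft(frame)
    cleaned = []
    for y, d in zip(spectrum, noise_power):
        power = max(abs(y) ** 2 - d, 0.0)  # assumed floor at zero
        # Assumed: keep the phase of the distorted frame.
        cleaned.append(cmath.rect(math.sqrt(power), cmath.phase(y)))
    return idft(cleaned)
```

One consequence of the formulas in the text: because a non-speech frame has $D = |Y|^2$, its subtracted spectrum is zero in every band, so `quasi_clean_frame` returns silence for such frames.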
13. As shown in Fig. 1, compute the objective quality score of the distorted speech: the PESQ algorithm computes the distortion between the distorted speech and the quasi-clean speech, and its cognitive model maps the distortion to the objective quality score of the distorted speech.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited to it. Any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.
Claims (6)
1. A quasi-clean speech construction method for objective speech quality evaluation, characterized by comprising the following steps:
Step 1: the improved minima-controlled recursive averaging algorithm distinguishes non-speech segments from speech segments when estimating the noise spectrum of the distorted speech, and updates the noise spectrum estimate of the non-speech segments according to the characteristics of non-speech segments;
Step 2: when estimating the noise of speech frames, the improved minima-controlled recursive averaging algorithm uses a new frequency-dependent threshold to determine the per-band speech presence probability of each speech frame;
Step 3: the improved minima-controlled recursive averaging algorithm determines the final noise power spectrum estimate of the noisy speech from the noise power spectrum estimates of the non-speech and speech segments;
Step 4: the improved minima-controlled recursive averaging algorithm divides non-speech segments and speech segments by voice activity detection, using the zero-crossing rate and short-time energy time-domain features and the Sohn algorithm to determine, respectively, the speech segments of the distorted speech and the inter-word non-speech segments within them;
Step 5: multi-band spectral subtraction computes, according to the division into non-speech and speech segments and the corresponding noise spectrum estimates, the quasi-clean power spectra of the non-speech segments and of the speech segments, thereby obtaining the quasi-clean speech power spectrum of the distorted speech.
2. The quasi-clean speech construction method for objective speech quality evaluation according to claim 1, characterized in that, in step 1, the improved minima-controlled recursive averaging algorithm is based on the division into non-speech and speech segments; non-speech segments are regarded as noise, so the noise spectrum estimate is $D(\lambda_{uv}, k) = |Y(\lambda_{uv}, k)|^2$, where $|Y(\lambda_{uv}, k)|^2$ is the short-time power spectrum of a non-speech frame, $\lambda_{uv}$ is the frame index within the non-speech segments, and $k$ is the band index.
3. The quasi-clean speech construction method for objective speech quality evaluation according to claim 1, characterized in that, in step 2, when the improved minima-controlled recursive averaging algorithm estimates the noise of speech frames, the frequency-dependent threshold $\delta(k)$ that it uses is defined as:
where LF and MF correspond to the frequencies 1 kHz and 3 kHz respectively, $F_s$ is the sampling frequency, and $k$ is the band index.
4. The quasi-clean speech construction method for objective speech quality evaluation according to claim 1, characterized in that, in step 3, the noise power spectrum estimate $D(\lambda, k)$ of the noisy speech determined by the improved minima-controlled recursive averaging algorithm is divided into a non-speech part and a speech part, and is defined as:
$$D(\lambda,k)=\begin{cases}|Y(\lambda_{uv},k)|^2, & \text{non-speech frames}\\ \alpha_s(\lambda_v,k)\,D(\lambda_v-1,k)+\bigl(1-\alpha_s(\lambda_v,k)\bigr)\,|Y(\lambda_v,k)|^2, & \text{speech frames}\end{cases}$$
where $\alpha_s(\lambda_v, k)$ is the time-frequency-dependent smoothing factor, $|Y(\lambda_v, k)|^2$ is the short-time power spectrum of a speech frame, and $D(\lambda_v-1, k)$ is the noise spectrum estimate of the frame preceding the current speech frame.
5. The quasi-clean speech construction method for objective speech quality evaluation according to claim 2, characterized in that the division into non-speech and speech segments is realized by voice activity detection, namely: time-domain features such as the zero-crossing rate and the short-time energy give a coarse estimate of the distorted speech, locating the start and end times of its speech segments and excluding background noise, which determines the utterance segments of the distorted speech; the Sohn voice activity detection algorithm then refines the estimate over the located utterance segments, determining the speech portions and the inter-word non-speech portions within the speech segments.
6. The quasi-clean speech construction method for objective speech quality evaluation according to claim 1, characterized in that, in step 5, the quasi-clean speech power spectrum $S(\lambda, k)$ computed by multi-band spectral subtraction is divided into a non-speech part and a speech part, and its estimate is defined as:
$$S(\lambda,k)=\bigl(Y(\lambda_v,k)-D(\lambda_v,k)\bigr)+\bigl(Y(\lambda_{uv},k)-D(\lambda_{uv},k)\bigr),$$
where $|Y(\lambda_v, k)|^2$ is the short-time power spectrum of a speech frame, $|Y(\lambda_{uv}, k)|^2$ is the short-time power spectrum of a non-speech frame, $D(\lambda_v, k)$ is the noise power spectrum estimate of a speech frame, and $D(\lambda_{uv}, k)$ is the noise power spectrum estimate of a non-speech frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410515374.7A CN104269180B (en) | 2014-09-29 | 2014-09-29 | A kind of quasi- clean speech building method for speech quality objective assessment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410515374.7A CN104269180B (en) | 2014-09-29 | 2014-09-29 | A kind of quasi- clean speech building method for speech quality objective assessment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104269180A true CN104269180A (en) | 2015-01-07 |
CN104269180B CN104269180B (en) | 2018-04-13 |
Family
ID=52160694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410515374.7A Expired - Fee Related CN104269180B (en) | 2014-09-29 | 2014-09-29 | A kind of quasi- clean speech building method for speech quality objective assessment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104269180B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106328151A (en) * | 2015-06-30 | 2017-01-11 | 芋头科技(杭州)有限公司 | Environment de-noising system and application method |
CN106448661A (en) * | 2016-09-23 | 2017-02-22 | 华南理工大学 | Audio type detection method based on pure voice and background noise two-level modeling |
CN106816158A (en) * | 2015-11-30 | 2017-06-09 | 华为技术有限公司 | A kind of speech quality assessment method, device and equipment |
CN107293286A (en) * | 2017-05-27 | 2017-10-24 | 华南理工大学 | A kind of speech samples collection method that game is dubbed based on network |
CN109308904A (en) * | 2018-10-22 | 2019-02-05 | 上海声瀚信息科技有限公司 | A kind of array voice enhancement algorithm |
CN109961799A (en) * | 2019-01-31 | 2019-07-02 | 杭州惠耳听力技术设备有限公司 | A kind of hearing aid multicenter voice enhancing algorithm based on Iterative Wiener Filtering |
CN106448661B (en) * | 2016-09-23 | 2019-07-16 | 华南理工大学 | Audio types detection method based on clean speech and the modeling of ambient noise the two poles of the earth |
CN112750456A (en) * | 2020-09-11 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Voice data processing method and device in instant messaging application and electronic equipment |
CN113593604A (en) * | 2021-07-22 | 2021-11-02 | 腾讯音乐娱乐科技(深圳)有限公司 | Method, device and storage medium for detecting audio quality |
CN114374924A (en) * | 2022-01-07 | 2022-04-19 | 上海纽泰仑教育科技有限公司 | Recording quality detection method and related device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2426166B (en) * | 2005-05-09 | 2007-10-17 | Toshiba Res Europ Ltd | Voice activity detection apparatus and method |
CN102800322B (en) * | 2011-05-27 | 2014-03-26 | 中国科学院声学研究所 | Method for estimating noise power spectrum and voice activity |
CN103456310B (en) * | 2013-08-28 | 2017-02-22 | 大连理工大学 | Transient noise suppression method based on spectrum estimation |
Non-Patent Citations (2)
Title |
---|
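Zhang Ling et al.: "Voice activity detection algorithm based on sub-band weighting", Journal of Computer Applications * |
Zeng Yumin et al.: "Minima-controlled recursive averaging speech enhancement algorithm based on a bidirectional search method", Acta Acustica * |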
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106328151A (en) * | 2015-06-30 | 2017-01-11 | 芋头科技(杭州)有限公司 | Environment de-noising system and application method |
CN106816158A (en) * | 2015-11-30 | 2017-06-09 | 华为技术有限公司 | A kind of speech quality assessment method, device and equipment |
CN106816158B (en) * | 2015-11-30 | 2020-08-07 | 华为技术有限公司 | Voice quality assessment method, device and equipment |
US10497383B2 (en) | 2015-11-30 | 2019-12-03 | Huawei Technologies Co., Ltd. | Voice quality evaluation method, apparatus, and device |
CN106448661B (en) * | 2016-09-23 | 2019-07-16 | South China University of Technology | Audio type detection method based on pure voice and background noise two-level modeling |
CN106448661A (en) * | 2016-09-23 | 2017-02-22 | South China University of Technology | Audio type detection method based on pure voice and background noise two-level modeling |
CN107293286A (en) * | 2017-05-27 | 2017-10-24 | South China University of Technology | Speech sample collection method based on an online dubbing game |
CN109308904A (en) * | 2018-10-22 | 2019-02-05 | Shanghai Shenghan Information Technology Co., Ltd. | Array speech enhancement algorithm |
CN109961799A (en) * | 2019-01-31 | 2019-07-02 | Hangzhou Huier Hearing Technology Equipment Co., Ltd. | Multi-channel speech enhancement algorithm for hearing aids based on iterative Wiener filtering |
CN112750456A (en) * | 2020-09-11 | 2021-05-04 | Tencent Technology (Shenzhen) Co., Ltd. | Voice data processing method and device in instant messaging application and electronic equipment |
CN113593604A (en) * | 2021-07-22 | 2021-11-02 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Method, device and storage medium for detecting audio quality |
CN113593604B (en) * | 2021-07-22 | 2024-07-19 | Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. | Method, device and storage medium for detecting audio quality |
CN114374924A (en) * | 2022-01-07 | 2022-04-19 | Shanghai Niutailun Education Technology Co., Ltd. | Recording quality detection method and related device |
CN114374924B (en) * | 2022-01-07 | 2024-01-19 | Shanghai Niutailun Education Technology Co., Ltd. | Recording quality detection method and related device |
Also Published As
Publication number | Publication date |
---|---|
CN104269180B (en) | 2018-04-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104269180A (en) | Quasi-clean voice construction method for voice quality objective evaluation | |
CN103854662B (en) | Adaptive voice detection method based on multi-domain joint estimation | |
CN107610715B (en) | Similarity calculation method based on multiple sound characteristics | |
CN108896878B (en) | Partial discharge detection method based on ultrasonic waves | |
CN103440871B (en) | Method for suppressing transient noise in speech | |
CN111128213B (en) | Noise suppression method and system for processing in different frequency bands | |
WO2017092216A1 (en) | Method, device, and equipment for voice quality assessment | |
CN102881289B (en) | Hearing perception characteristic-based objective voice quality evaluation method | |
US8655656B2 (en) | Method and system for assessing intelligibility of speech represented by a speech signal | |
CN105023572A (en) | Robust endpoint detection method for noisy speech | |
Zhang et al. | Effects of telephone transmission on the performance of formant-trajectory-based forensic voice comparison–female voices | |
CN103440872A (en) | Transient state noise removing method | |
Dubey et al. | Non-intrusive speech quality assessment using several combinations of auditory features | |
Schwerin et al. | An improved speech transmission index for intelligibility prediction | |
CN104091603A (en) | Voice activity detection system based on fundamental frequency and calculation method thereof | |
Martín-Doñas et al. | Dual-channel DNN-based speech enhancement for smartphones | |
Li et al. | Non-intrusive quality assessment for enhanced speech signals based on spectro-temporal features | |
Zhang et al. | Fast nonstationary noise tracking based on log-spectral power MMSE estimator and temporal recursive averaging | |
Lu | Noise reduction using three-step gain factor and iterative-directional-median filter | |
Elshamy et al. | An iterative speech model-based a priori SNR estimator | |
Gomez et al. | Improving objective intelligibility prediction by combining correlation and coherence based methods with a measure based on the negative distortion ratio | |
Ijima et al. | Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech Synthesis. | |
CN112489692A (en) | Voice endpoint detection method and device | |
Flynn et al. | Combined speech enhancement and auditory modelling for robust distributed speech recognition | |
Yuan et al. | Noise estimation based on time–frequency correlation for speech enhancement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180413 |