CN100347988C - Broad frequency band voice quality objective evaluation method - Google Patents

Broad frequency band voice quality objective evaluation method

Info

Publication number
CN100347988C
Authority
CN
China
Prior art keywords
frame
voice
speech
loudness
distortion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2003101112735A
Other languages
Chinese (zh)
Other versions
CN1538667A (en)
Inventor
胡瑞敏
艾浩军
涂卫平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CNB2003101112735A priority Critical patent/CN100347988C/en
Publication of CN1538667A publication Critical patent/CN1538667A/en
Application granted granted Critical
Publication of CN100347988C publication Critical patent/CN100347988C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Monitoring And Testing Of Exchanges (AREA)

Abstract

The present invention relates to an objective evaluation method for broadband voice quality. The amplitudes of the test speech and the reference speech are normalized to sequences with a mean of 0 and a standard deviation of 1; the critical-band hearing threshold in the 50 to 7000 Hz band is calculated; the quiet-frame threshold is calculated from the energy of the windowed speech frames; the power spectrum of the normalized signal is calculated; the Bark spectrum is obtained by summation within each critical band; the loudness of the speech frame is calculated from the Bark spectrum; the loudness vector is normalized; a perceptible-distortion flag M(i) is determined from the loudness $\bar{L}_o$ of the original speech, the loudness $\bar{L}_t$ of the coded speech, and the noise masking threshold $Th_n$; and the distortion of each frame is given. These steps are repeated, and the distortion WBSD of the whole speech segment is calculated. Because speech distortion in quiet segments does not affect listening quality, the distortion of the non-quiet frames is accumulated and averaged over the number of non-quiet frames to obtain the WBSD of the whole speech segment. The method maintains good correlation with subjective quality measures and improves accuracy.

Description

Objective evaluation method for broadband voice quality
Technical Field
The invention belongs to the field of voice communication quality evaluation, and particularly relates to an objective quality evaluation method for broadband voice communication on a data network.
Background
When a data network carries voice services, quality of service must be considered. To use bandwidth efficiently, speech coding and voice activity detection are used to implement Discontinuous Transmission (DTX), so the signal received by the listener and the signal sent by the speaker are not strictly synchronized in the time domain. Meanwhile, as demands on call quality rise, broadband (50-7000 Hz) voice communication is likely to be applied more widely because of its higher intelligibility, naturalness and clarity. Existing objective quality evaluation methods for telephone-bandwidth (300-3400 Hz) speech have the following shortcomings: a. they cannot meet the requirements of objective quality evaluation for broadband speech; b. they cannot meet the requirements of objective quality evaluation when discontinuous transmission is used on a packet network.
Disclosure of Invention
The invention aims to provide a method for evaluating broadband voice transmission quality on a packet network, which overcomes the defects of the existing circuit switching network objective voice quality evaluation method.
In order to achieve the purpose, the invention provides a broadband voice quality objective evaluation method, which is characterized by comprising the following steps of:
(1) the voice section comprises a test voice and a reference voice, a voice frame is taken from the voice section for calculation, and the amplitudes of the test voice and the reference voice are normalized into a sequence with an average value of 0 and a standard deviation of 1;
(2) calculating a critical band hearing threshold in a frequency band of 50-7000 Hz;
(3) calculating a quiet frame speech energy threshold based on the energy of the windowed frames of the reference speech; if the energy of a speech frame is less than the quiet frame speech energy threshold, that frame does not participate in the quality assessment; the quiet frame speech energy threshold $En_{SilenceTh}$ is 15 dB below the energy $En_{Max}$ of the maximum-energy frame;
(4) calculating a power spectrum for the normalized signal;
(5) summing in a critical band to obtain a Bark spectrum;
(6) calculating the loudness of the current speech frame from the Bark spectrum, i.e. calculating the loudness $L_t(i)$ of each critical band of the test speech and the loudness $L_o(i)$ of each critical band of the reference speech, where $1 \le i \le K$ and K is the number of critical bands;
(7) calculating the normalized loudness $\bar{L}_t(i)$ of the test speech, where the normalization factor equals the ratio of the sum of the critical-band loudness values $L_o(i)$ of the reference speech to the sum of the critical-band loudness values $L_t(i)$ of the test speech;
$$\bar{L}_t(i) = \frac{\sum_{i=1}^{K} L_o(i)}{\sum_{i=1}^{K} L_t(i)} \, L_t(i), \quad 1 \le i \le K$$
(8) determining a perceptible-distortion flag $M(i)$ from the critical-band loudness $L_o(i)$ of the reference speech, the normalized loudness $\bar{L}_t(i)$ of the test speech, and the noise masking threshold $Th_n(i)$:
$$M(i) = \begin{cases} 1, & L_o(i) - \bar{L}_t(i) > Th_n(i) \\ 0, & \text{else} \end{cases}, \quad 1 \le i \le K$$
(9) the distortion $D(i)$ for each critical band is given by:
$$D(i) = M(i) \, \left| L_o(i) - \bar{L}_t(i) \right|$$
(10) repeating steps (1) to (9) so that the whole speech segment is processed frame by frame, then calculating the distortion WBSD of the whole speech segment: since speech distortion in quiet frames does not affect listening quality, the distortion of each non-quiet frame is accumulated and summed, and the sum is averaged over the number of non-quiet frames to obtain the WBSD of the whole speech segment.
$$WBSD = \frac{1}{N} \sum_{j=1}^{N} \left[ \sum_{i=1}^{K} D^{(j)}(i) \right]$$
Wherein,
N: total number of processed non-quiet frames
K: number of critical bands
$D^{(j)}(i)$: distortion of the i-th critical band of the j-th frame of the reference speech
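Steps (7) to (10) above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes the per-band loudness vectors, the masking thresholds and the quiet-frame flags have already been computed by the earlier steps, and the function names (`normalized_loudness`, `frame_distortion`, `wbsd`) are hypothetical.

```python
def normalized_loudness(l_ref, l_test):
    """Step (7): scale the test loudness so its band total matches the reference."""
    total_test = sum(l_test)
    if total_test == 0:
        return list(l_test)  # nothing to scale; avoid division by zero
    factor = sum(l_ref) / total_test
    return [factor * v for v in l_test]


def frame_distortion(l_ref, l_test, th_n):
    """Steps (8)-(9): D(i) = M(i) * |L_o(i) - Lbar_t(i)| for every critical band."""
    l_norm = normalized_loudness(l_ref, l_test)
    distortion = []
    for lo, lt, th in zip(l_ref, l_norm, th_n):
        flag = 1 if (lo - lt) > th else 0  # perceptible-distortion flag M(i)
        distortion.append(flag * abs(lo - lt))
    return distortion


def wbsd(ref_frames, test_frames, th_n, quiet):
    """Step (10): mean of the summed band distortion over the non-quiet frames."""
    total, count = 0.0, 0
    for l_ref, l_test, is_quiet in zip(ref_frames, test_frames, quiet):
        if is_quiet:
            continue  # quiet frames do not take part in the evaluation
        total += sum(frame_distortion(l_ref, l_test, th_n))
        count += 1
    return total / count if count else 0.0
```

With the arbitrary two-band toy input `wbsd([[4.0, 2.0]], [[1.0, 2.0]], [0.5, 0.5], [False])`, the test loudness is scaled by 6/3 = 2, only the first band exceeds its threshold, and the function returns 2.0.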
Further, in step (10), the linear prediction coefficients (LPC) are calculated from the power spectrum of the reference speech, and the Bark spectral distance of each critical band is weighted according to the LPC spectral envelope before the average is calculated, where the weighting coefficient $W^{(j)}(i)$ is the sum of the LPC filter frequency response values within the i-th critical band of the j-th frame;
$$WBSD = \frac{1}{N} \sum_{j=1}^{N} \left[ \sum_{i=1}^{K} W^{(j)}(i) \, D^{(j)}(i) \right]$$
The invention provides a method for calculating a weighted spectral distance: in each frame, the critical bands whose spectral distance exceeds the masking value are weighted according to the amplitude of the LPC (linear predictive coding) spectrum, and the spectral distance of the frame is then calculated. After the FFT, the autocorrelation coefficients are calculated directly in the frequency domain, and the LPC spectrum is obtained with the Durbin algorithm.
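The frequency-domain route to the LPC coefficients can be illustrated as follows. The sketch assumes the standard Wiener-Khinchin relation (the inverse DFT of the two-sided power spectrum gives the circular autocorrelation, which equals the linear one at small lags when the frame is zero-padded) and the textbook Levinson-Durbin recursion; the function names and the sign convention A(z) = 1 + a1 z^-1 + ... are choices made here, not taken from the source.

```python
import math


def autocorr_from_power_spectrum(power, order):
    """Lags r[0..order] as the inverse DFT of a two-sided power spectrum.

    `power` holds |X[m]|^2 for m = 0..N-1 of an N-point FFT. Because the
    spectrum of a real signal is symmetric, only the cosine terms survive.
    """
    n = len(power)
    return [
        sum(p * math.cos(2.0 * math.pi * k * m / n) for m, p in enumerate(power)) / n
        for k in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Return (a, err) with a[0] = 1 for A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (all-zero) frame: keep the coefficients found so far
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k  # prediction error after stage i
    return a, err
```

As a check on the recursion, the autocorrelation sequence `[1.0, 0.5, 0.25]` of a first-order process yields coefficients close to `[1.0, -0.5, 0.0]` with a residual error of 0.75.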
Furthermore, in step (1) above, a time-level alignment based on voice activity detection is added, and the subsequent analysis is performed after the active speech segments have been time-aligned.
The invention has the following advantages and positive effects:
(1) calculating the speech Bark spectral distance in a wide frequency band as a measure basis, matching with the auditory characteristics of human ears, and keeping good correlation with subjective quality measure;
(2) by adopting a loudness linear interpolation algorithm, the precision is higher than that of a table lookup interpolation calculation method used for calculating the general loudness;
(3) the peak value of the spectrum of the LPC corresponds to the formant of the speech signal, and the frequency band corresponding to the formant has a direct relation with the intelligibility of the speech. The correlation between the method and the subjective quality can be improved by increasing the weight;
(4) due to the action of the voice activity detector, the problem that the reference speech and the detected speech are not synchronous due to the discontinuous transmission in the voice communication of the packet network can be overcome.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a graph of weighting coefficients derived from the LPC filter frequency response for an embodiment of the present invention;
FIG. 3 is a schematic diagram of discontinuous transmission according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings 1 to 3.
The invention provides a broadband voice quality objective evaluation method, which comprises the following steps:
(1) the test voice and the reference voice amplitude are normalized into a sequence with the average value of 0 and the standard deviation of 1;
(2) calculating a critical band hearing threshold in a frequency band of 50-7000 Hz;
(3) a quiet frame threshold is calculated from the energy of the windowed speech frames; if the energy of a speech frame is less than the quiet frame threshold, that frame does not participate in the quality assessment. The quiet frame speech energy threshold $En_{SilenceTh}$ is 15 dB below the energy $En_{Max}$ of the maximum-energy frame;
(4) calculating a power spectrum of the normalized signal;
(5) summing in a critical band to obtain a Bark spectrum;
(6) calculating the loudness of the voice frame according to the Bark spectrum;
(7) normalizing the loudness vector $L_t(i)$, where the normalization factor equals the ratio of the loudness $L_o(j)$ of the reference speech frame to the loudness $L_t(j)$ of the test speech frame, each summed over the critical bands, and K is the number of critical bands;
$$\bar{L}_t(i) = \frac{\sum_{j=1}^{K} L_o(j)}{\sum_{j=1}^{K} L_t(j)} \, L_t(i)$$
(8) determining a perceptible-distortion flag $M(i)$ from the loudness $\bar{L}_o$ of the reference speech, the loudness $\bar{L}_t$ of the test speech, and the noise masking threshold $Th_n$:
$$M(i) = \begin{cases} 1, & \bar{L}_o - \bar{L}_t > Th_n \\ 0, & \text{else} \end{cases}$$
(9) the distortion $D(i)$ of each frame is given by:
$$D(i) = M(i) \, \left| \bar{L}_o - \bar{L}_t \right|$$
(10) repeating steps (1) to (9) and calculating the distortion WBSD of the whole speech segment: since speech distortion in quiet segments does not affect listening quality, the distortion of each non-quiet segment is accumulated and summed, and the sum is averaged over the number of non-quiet frames to obtain the WBSD of the whole speech segment.
$$WBSD = \frac{1}{N} \sum_{j=1}^{N} \left[ \sum_{i=1}^{K} M(i) \, \left| L_o^{(j)}(i) - L_t^{(j)}(i) \right| \right]$$
Wherein,
N: total number of frames processed
K: number of critical bands
$L_o^{(j)}(i)$: Bark spectrum of the j-th frame of the reference speech
$L_t^{(j)}(i)$: Bark spectrum of the j-th frame of the test speech
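Steps (1) and (3) of the list above lend themselves to a short sketch. The text does not name the analysis window or say whether the standard deviation is the population or sample form, so the Hann window and the population standard deviation below are assumptions, as are the function names.

```python
import math


def normalize(signal):
    """Step (1): shift to zero mean and scale to unit (population) standard deviation."""
    n = len(signal)
    mean = sum(signal) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in signal) / n)
    if std == 0:
        return [0.0] * n  # constant input carries no signal energy
    return [(s - mean) / std for s in signal]


def frame_energies(signal, frame_len, hop):
    """Energy of each windowed frame (Hann window assumed for illustration)."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        energies.append(sum((w * signal[start + i]) ** 2
                            for i, w in enumerate(window)))
    return energies


def quiet_flags(energies, drop_db=15.0):
    """Step (3): a frame is quiet if its energy is more than 15 dB below the maximum."""
    threshold = max(energies) * 10.0 ** (-drop_db / 10.0)
    return [e < threshold for e in energies]
```

`frame_len` and `hop` are left as parameters; a 10 ms frame shift corresponds to 160 samples at a 16 kHz sampling rate. Frames flagged `True` by `quiet_flags` would be skipped when the distortion is averaged.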
FIG. 1 is a flow chart of an embodiment of the method. The test speech y(n) and the reference speech x(n) are each input to the BSD preprocessor, which calculates the loudness $L_y(j)$ of each critical band in a frame of test speech and the loudness $L_x(j)$ of each critical band in a frame of reference speech. The speech bandwidth is limited to 50-7000 Hz, which covers the critical bands with Bark numbers 1 to 21, corresponding to 20-7700 Hz; throughout the calculation the loudness model is therefore a 21-dimensional feature vector. The noise threshold calculation module derives the noise masking threshold $Th_n$ and the perceptible-distortion flag M(j) for each critical band. The outputs of the BSD preprocessor and the noise threshold calculation module together give the distortion WBSD of each frame. The input speech signal consists of 16-bit signed integers sampled at 16 kHz. In the BSD preprocessor, the speech signal is first converted from the time domain to the frequency domain with an FFT; the FFT window length is 1024 points, the frame length of each speech frame is 20 ms, corresponding to 640 speech samples, and the frame shift is 10 ms.
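The summation of FFT bins into the 21-band Bark spectrum can be sketched as below. The source gives only the overall span (20-7700 Hz, Bark numbers 1 to 21); the individual band edges used here are the commonly tabulated Zwicker critical-band edges and are an assumption, as is the function name. At 16 kHz a 1024-point FFT gives a bin spacing of 15.625 Hz.

```python
# Assumed critical-band edges in Hz: 22 edges define 21 bands from 20 to 7700 Hz.
BARK_EDGES_HZ = [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700]


def bark_spectrum(power, sample_rate=16000, fft_size=1024):
    """Sum the one-sided power-spectrum bins that fall inside each critical band."""
    bin_hz = sample_rate / fft_size
    bands = [0.0] * (len(BARK_EDGES_HZ) - 1)
    for k, p in enumerate(power):
        freq = k * bin_hz
        for b in range(len(bands)):
            if BARK_EDGES_HZ[b] <= freq < BARK_EDGES_HZ[b + 1]:
                bands[b] += p
                break  # each bin belongs to at most one band
    return bands
```

Bins below 20 Hz or at and above 7700 Hz are simply dropped, so the result is always a 21-element vector, matching the 21-dimensional loudness model described above.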
As shown in FIG. 2, the linear prediction coefficients (LPC) are calculated for the windowed speech signal and the frequency response of the resulting filter is calculated; this frequency response is the broken line in the figure. The peaks of the filter response correspond to the formants of the speech frame. The frequency response values within each critical band are summed and averaged to obtain the weighting coefficient W(i), and the speech distortion WBSD is calculated according to the following formula.
$$WBSD = \frac{1}{N} \sum_{j=1}^{N} \left[ \sum_{i=1}^{K} W(i) \, M(i) \, \left| L_o^{(j)}(i) - L_t^{(j)}(i) \right| \right]$$
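One way to obtain the weights W(i) is to evaluate the all-pole LPC filter 1/A(z) at the FFT bin frequencies and average the magnitudes within each critical band. The sketch below assumes the coefficient convention A(z) = 1 + a1 z^-1 + ..., the commonly tabulated Zwicker band edges, and hypothetical function names; none of these details are specified in the source.

```python
import cmath
import math

# Assumed critical-band edges in Hz (21 bands, 20-7700 Hz).
BARK_EDGES_HZ = [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700]


def lpc_response(a, freq_hz, sample_rate=16000):
    """Magnitude of 1/A(z) on the unit circle at one frequency, a[0] == 1."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    denom = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
    return 1.0 / abs(denom)


def band_weights(a, sample_rate=16000, fft_size=1024):
    """W(i): mean LPC response over the FFT bins inside each critical band."""
    bin_hz = sample_rate / fft_size
    weights = []
    for lo, hi in zip(BARK_EDGES_HZ[:-1], BARK_EDGES_HZ[1:]):
        values = [lpc_response(a, k * bin_hz, sample_rate)
                  for k in range(fft_size // 2 + 1)
                  if lo <= k * bin_hz < hi]
        weights.append(sum(values) / len(values) if values else 0.0)
    return weights


def weighted_frame_distortion(weights, distortion):
    """Inner sum of the weighted WBSD formula for a single frame."""
    return sum(w * d for w, d in zip(weights, distortion))
```

Bands that contain a spectral peak of the LPC envelope receive larger weights, so distortion near formants counts more toward the frame total.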
As shown in FIG. 3, because discontinuous transmission is used in a data network, the speech at the receiver is not time-aligned with the speaker's speech. A voice activity detection method can be used to time-align the active speech segments, which are then analyzed frame by frame to compute the WBSD.
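The source does not describe how the voice activity detector works or how active segments are matched. The following sketch therefore makes two simplifying assumptions: activity is decided by a fixed energy threshold per frame, and the k-th active segment of the reference is paired with the k-th active segment of the test signal. The function names are hypothetical.

```python
def active_segments(energies, threshold):
    """Group consecutive active frames into half-open (start, end) index spans."""
    segments, start = [], None
    for idx, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = idx  # an active run begins
        elif energy < threshold and start is not None:
            segments.append((start, idx))  # the run ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments


def align_segments(ref_energies, test_energies, threshold):
    """Pair frames of the k-th active segment in each signal, in order."""
    pairs = []
    ref_segs = active_segments(ref_energies, threshold)
    test_segs = active_segments(test_energies, threshold)
    for (r_start, r_end), (t_start, t_end) in zip(ref_segs, test_segs):
        length = min(r_end - r_start, t_end - t_start)
        pairs.extend((r_start + n, t_start + n) for n in range(length))
    return pairs
```

Each returned pair `(reference_frame, test_frame)` identifies two frames to compare, so silence gaps of different length in the two signals no longer shift the frame-by-frame analysis.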
Taking G.722.1 coding as an example, the voice quality was calculated under different packet loss rates, and the correlation between the objective results and the subjective test results was not lower than 0.8.

Claims (3)

1. A broadband voice quality objective evaluation method is characterized by comprising the following steps:
(1) the voice section comprises a test voice and a reference voice, a voice frame is taken from the voice section for calculation, and the amplitudes of the test voice and the reference voice are normalized into a sequence with an average value of 0 and a standard deviation of 1;
(2) calculating a critical band hearing threshold in a frequency band of 50-7000 Hz;
(3) calculating a quiet frame speech energy threshold based on the energy of the windowed frames of the reference speech; if the energy of a speech frame is less than the quiet frame speech energy threshold, that frame does not participate in the quality evaluation; the quiet frame speech energy threshold $En_{SilenceTh}$ is 15 dB below the energy $En_{Max}$ of the maximum-energy frame;
(4) calculating a power spectrum for the normalized signal;
(5) summing in a critical band to obtain a Bark spectrum;
(6) calculating the loudness of the current speech frame from the Bark spectrum, i.e. calculating the loudness $L_t(i)$ of each critical band of the test speech and the loudness $L_o(i)$ of each critical band of the reference speech, where $1 \le i \le K$ and K is the number of critical bands;
(7) calculating the normalized loudness $\bar{L}_t(i)$ of the test speech, where the normalization factor equals the ratio of the sum of the critical-band loudness values $L_o(i)$ of the reference speech to the sum of the critical-band loudness values $L_t(i)$ of the test speech;
$$\bar{L}_t(i) = \frac{\sum_{i=1}^{K} L_o(i)}{\sum_{i=1}^{K} L_t(i)} \, L_t(i), \quad 1 \le i \le K$$
(8) determining a perceptible-distortion flag $M(i)$ from the critical-band loudness $L_o(i)$ of the reference speech, the normalized loudness $\bar{L}_t(i)$ of the test speech, and the noise masking threshold $Th_n(i)$:
$$M(i) = \begin{cases} 1, & L_o(i) - \bar{L}_t(i) > Th_n(i) \\ 0, & \text{else} \end{cases}, \quad 1 \le i \le K$$
(9) the distortion $D(i)$ for each critical band is given by:
$$D(i) = M(i) \, \left| L_o(i) - \bar{L}_t(i) \right|$$
(10) repeating steps (1) to (9) so that the whole speech segment is processed frame by frame, then calculating the distortion WBSD of the whole speech segment: since speech distortion in quiet frames does not affect listening quality, the distortion of each non-quiet frame is accumulated and summed, and the sum is averaged over the number of non-quiet frames to obtain the WBSD of the whole speech segment;
$$WBSD = \frac{1}{N} \sum_{j=1}^{N} \left[ \sum_{i=1}^{K} D^{(j)}(i) \right]$$
wherein,
N: total number of processed non-quiet frames
K: number of critical bands
$D^{(j)}(i)$: distortion of the i-th critical band of the j-th frame of the reference speech
2. The objective evaluation method for wideband speech quality according to claim 1, wherein: in step (10), the linear prediction coefficients (LPC) are calculated from the power spectrum of the reference speech, and the Bark spectral distance of each critical band is weighted according to the LPC spectral envelope before the average is calculated, where the weighting coefficient $W^{(j)}(i)$ is the sum of the LPC filter frequency response values within the i-th critical band of the j-th frame;
$$WBSD = \frac{1}{N} \sum_{j=1}^{N} \left[ \sum_{i=1}^{K} W^{(j)}(i) \, D^{(j)}(i) \right]$$
3. The objective evaluation method for wideband speech quality according to claim 1 or 2, characterized in that: in step (1) above, a time-level alignment based on voice activity detection is added, and the subsequent analysis is performed after the active speech segments have been time-aligned.
CNB2003101112735A 2003-10-24 2003-10-24 Broad frequency band voice quality objective evaluation method Expired - Fee Related CN100347988C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2003101112735A CN100347988C (en) 2003-10-24 2003-10-24 Broad frequency band voice quality objective evaluation method

Publications (2)

Publication Number Publication Date
CN1538667A CN1538667A (en) 2004-10-20
CN100347988C true CN100347988C (en) 2007-11-07

Family

ID=34335996

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2003101112735A Expired - Fee Related CN100347988C (en) 2003-10-24 2003-10-24 Broad frequency band voice quality objective evaluation method

Country Status (1)

Country Link
CN (1) CN100347988C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106851713A (en) * 2015-12-07 2017-06-13 中兴通讯股份有限公司 Terminal speech evaluation the quality method and apparatus, switch managing method and device

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100382514C (en) * 2005-01-13 2008-04-16 康全电讯股份有限公司 Method for testing speech quality of network speech apparatus
CN1321390C (en) * 2005-01-18 2007-06-13 中国电子科技集团公司第三十研究所 Establishment of statistics concerned model of acounstic quality normalization
CN1321400C (en) * 2005-01-18 2007-06-13 中国电子科技集团公司第三十研究所 Noise masking threshold algorithm based Barker spectrum distortion measuring method in objective assessment of sound quality
CN101609686B (en) * 2009-07-28 2011-09-14 南京大学 Objective assessment method based on voice enhancement algorithm subjective assessment
CN102231279B (en) * 2011-05-11 2012-09-26 武汉大学 Objective evaluation system and method of voice frequency quality based on hearing attention
CN103632679A (en) * 2012-08-21 2014-03-12 华为技术有限公司 An audio stream quality assessment method and an apparatus
CN103716470B (en) * 2012-09-29 2016-12-07 华为技术有限公司 The method and apparatus of Voice Quality Monitor
EP2922058A1 (en) * 2014-03-20 2015-09-23 Nederlandse Organisatie voor toegepast- natuurwetenschappelijk onderzoek TNO Method of and apparatus for evaluating quality of a degraded speech signal
US9548067B2 (en) 2014-09-30 2017-01-17 Knuedge Incorporated Estimating pitch using symmetry characteristics
US9396740B1 (en) * 2014-09-30 2016-07-19 Knuedge Incorporated Systems and methods for estimating pitch in audio signals based on symmetry characteristics independent of harmonic amplitudes
CN105989853B (en) * 2015-02-28 2020-08-18 科大讯飞股份有限公司 Audio quality evaluation method and system
CN105551496B (en) * 2015-12-30 2020-01-31 哈尔滨海能达科技有限公司 method, device and terminal for judging voice coding and decoding technology
CN105656931B (en) * 2016-03-01 2018-10-30 邦彦技术股份有限公司 Method and device for objectively evaluating and processing voice quality of network telephone
JP6742620B2 (en) * 2016-10-14 2020-08-26 公立大学法人大阪 Swallowing diagnostic device and program

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002073601A1 (en) * 2001-03-13 2002-09-19 Koninklijke Kpn N.V. Method and device for determining the quality of a speech signal

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
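Development and current research status of speech evaluation technology. 景新幸, 沈述明. 电讯技术 (Telecommunication Engineering), Vol. 38, No. 6, 1998 *
Research progress on objective evaluation methods for speech quality. 陈国, 胡修林, 张蕴玉, 朱耀庭. 电子学报 (Acta Electronica Sinica), Vol. 29, No. 4, 2001 *
A one-step strategy for objective evaluation of speech quality. 付强, 易克初, 田斌, 张知易. 电子学报 (Acta Electronica Sinica), Vol. 29, No. 7, 2001 *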

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106851713A (en) * 2015-12-07 2017-06-13 中兴通讯股份有限公司 Terminal speech evaluation the quality method and apparatus, switch managing method and device
CN106851713B (en) * 2015-12-07 2021-11-12 中兴通讯股份有限公司 Terminal voice service quality evaluation method and device, and switching management method and device

Also Published As

Publication number Publication date
CN1538667A (en) 2004-10-20

Similar Documents

Publication Publication Date Title
CN100347988C (en) Broad frequency band voice quality objective evaluation method
EP1738355B1 (en) Signal encoding
CN1320521C (en) Method and device for selecting coding speed in variable speed vocoder
RU2232434C2 (en) Process conducting machine evaluation of quality of audio signals
CN1188835C (en) System and method for reducing noise
US6889187B2 (en) Method and apparatus for improved voice activity detection in a packet voice network
US20050108004A1 (en) Voice activity detector based on spectral flatness of input signal
EP2619753B1 (en) Method and apparatus for adaptively detecting voice activity in input audio signal
US20120130711A1 (en) Speech determination apparatus and speech determination method
US20050091040A1 (en) Preprocessing of digital audio data for improving perceptual sound quality on a mobile phone
CN1675684A (en) Distributed speech recognition with back-end voice activity detection apparatus and method
KR101048278B1 (en) Auditory-articulation analysis for speech quality assessment
Sakhnov et al. Approach for Energy-Based Voice Detector with Adaptive Scaling Factor.
JP2001501790A (en) Method and apparatus for detecting bad data packets received by a mobile telephone using decoded speech parameters
Itoh et al. Environmental noise reduction based on speech/non-speech identification for hearing aids
US20090161882A1 (en) Method of Measuring an Audio Signal Perceived Quality Degraded by a Noise Presence
US20090299740A1 (en) Method of processing audio signals for improving the quality of output audio signal which is transferred to subscriber's terminal over network and audio signal pre-processing apparatus of enabling the method
WO1999012155A1 (en) Channel gain modification system and method for noise reduction in voice communication
Beritelli et al. A psychoacoustic auditory model to evaluate the performance of a voice activity detector
EP1010169B1 (en) Channel gain modification system and method for noise reduction in voice communication
JPH0784596A (en) Method for evaluating quality of encoded speech
KR100399057B1 (en) Apparatus for Voice Activity Detection in Mobile Communication System and Method Thereof
Vini Voice Activity Detection Techniques-A Review
Tarraf et al. Neural network-based voice quality measurement technique
Nam et al. A preprocessing approach to improving the quality of the music decoded by an EVRC codec

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20071107

Termination date: 20131024