CN100347988C - Broad frequency band voice quality objective evaluation method - Google Patents
- Publication number
- CN100347988C
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Monitoring And Testing Of Exchanges (AREA)
Abstract
The present invention relates to an objective evaluation method for broad frequency band (wideband) voice quality. The amplitudes of the test speech and the reference speech are normalized to sequences with a mean of 0 and a standard deviation of 1; the critical-band hearing threshold in the 50 to 7000 Hz band is calculated; the quiet-frame threshold is calculated from the energy of the windowed speech frames; the power spectrum of the normalized signal is calculated; the Bark spectrum is obtained by summation within each critical band; the loudness of the speech frame is calculated from the Bark spectrum; the loudness vector is normalized; a perceptible-distortion flag $M(i)$ is determined from the loudness $L_o$ of the original speech, the loudness $L_t$ of the coded speech, and the noise masking threshold $Th_n$; the distortion of each frame is computed; these steps are repeated and the distortion WBSD of the whole speech segment is calculated. Because speech distortion in quiet segments does not affect perceived quality, only the distortion of the non-quiet segments is accumulated, and the sum is averaged over the number of non-quiet frames to obtain the WBSD of the whole speech segment. The invention maintains good correlation with subjective quality measures and improves precision.
Description
Technical Field
The invention belongs to the field of voice communication quality evaluation, and particularly relates to an objective quality evaluation method for broadband voice communication on a data network.
Background
When a data network carries voice services, quality of service must be considered. To use bandwidth efficiently, speech coding and voice activity detection are used to implement discontinuous transmission (DTX), so the signal received by the listener and the signal sent by the speaker are not strictly synchronized in the time domain. Meanwhile, as users' expectations of call quality rise, wideband (50-7000 Hz) voice communication, with its higher intelligibility, naturalness and clarity, is likely to be applied more widely. The existing objective quality evaluation methods for telephone-bandwidth (300-3400 Hz) speech have the following defects: a. they cannot meet the requirements of objective quality evaluation for wideband speech; b. they cannot meet the requirements of objective quality evaluation when discontinuous transmission is used on a packet network.
Disclosure of Invention
The invention aims to provide a method for evaluating broadband voice transmission quality on a packet network, which overcomes the defects of the existing circuit switching network objective voice quality evaluation method.
In order to achieve the purpose, the invention provides a broadband voice quality objective evaluation method, which is characterized by comprising the following steps of:
(1) the voice section comprises a test voice and a reference voice, a voice frame is taken from the voice section for calculation, and the amplitudes of the test voice and the reference voice are normalized into a sequence with an average value of 0 and a standard deviation of 1;
(2) calculating a critical band hearing threshold in a frequency band of 50-7000 Hz;
(3) calculating a quiet-frame speech energy threshold from the energy of the windowed frames of the reference speech; if the energy of a speech frame is below this threshold, the frame does not participate in the quality assessment; the quiet-frame speech energy threshold $En_{SilenceTh}$ is 15 dB below the energy $En_{Max}$ of the maximum-energy frame;
(4) calculating a power spectrum for the normalized signal;
(5) summing in a critical band to obtain a Bark spectrum;
(6) calculating the loudness of the current speech frame from the Bark spectrum, that is, calculating the loudness $L_t(i)$ of each critical band of the test speech and the loudness $L_o(i)$ of each critical band of the reference speech, where $1 \le i \le K$ and $K$ is the number of critical bands;
(7) calculating the normalized loudness of the test speech, denoted here $\hat{L}_t(i)$; the normalization factor equals the ratio of the sum of the critical-band loudness values $L_o(i)$ of the reference speech to the sum of the critical-band loudness values $L_t(i)$ of the test speech;
(8) determining the perceptible-distortion flag $M(i)$ from the critical-band loudness $L_o(i)$ of the reference speech, the normalized loudness $\hat{L}_t(i)$ of the test speech, and the noise masking threshold $Th_n(i)$:
(9) the distortion $D(i)$ of each critical band is given by:
(10) repeating steps (1) to (9) to process the whole speech segment frame by frame, and then calculating the distortion WBSD of the whole speech segment; because speech distortion in quiet frames does not affect perceived quality, only the distortion of the non-quiet frames is accumulated, and the sum is averaged over the number of non-quiet frames to obtain the WBSD of the whole speech segment:
$$WBSD = \frac{1}{N}\sum_{j=1}^{N}\sum_{i=1}^{K} D^{(j)}(i)$$
Wherein,
$N$: total number of processed non-silent frames
$K$: number of critical bands
$D^{(j)}(i)$: distortion of the i-th critical band of the j-th frame of the reference speech
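Steps (1) and (3) can be illustrated with a minimal sketch. It is not the patented implementation: the function names are hypothetical, the population standard deviation is assumed, frame energy is taken as a plain sum of squares, and the analysis window is omitted for brevity. Only the 15 dB figure comes from the text.

```python
import math

def normalize(samples):
    """Scale a sequence to zero mean and unit (population) standard deviation."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    if std == 0.0:
        raise ValueError("a constant signal cannot be normalized")
    return [(s - mean) / std for s in samples]

def frame_energies(samples, frame_len, hop):
    """Energy (sum of squares) of each full frame; no window is applied here."""
    return [sum(s * s for s in samples[k:k + frame_len])
            for k in range(0, len(samples) - frame_len + 1, hop)]

def non_quiet_frames(energies, drop_db=15.0):
    """Indices of frames at or above a threshold set drop_db below the loudest frame."""
    threshold = max(energies) * 10.0 ** (-drop_db / 10.0)
    return [j for j, e in enumerate(energies) if e >= threshold]
```

Frames whose index is not returned by `non_quiet_frames` would simply be skipped when the per-frame distortions are accumulated.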
Further, in step (10), the linear prediction coefficients (LPC) are calculated from the power spectrum of the reference speech, and the Bark spectral distance of each critical band is weighted by the LPC spectral envelope before the average is calculated, where the weighting coefficient $W^{(j)}(i)$ is the sum of the LPC filter frequency response values within the i-th critical band of the j-th frame.
The invention thus provides a method for calculating a weighted spectral distance: in each frame, the critical bands whose spectral distance exceeds the masking value are weighted according to the amplitude of the LPC (linear predictive coding) spectrum before the spectral distance of the frame is calculated. After the FFT, the autocorrelation coefficients are calculated directly in the frequency domain, and the LPC spectrum is obtained through the Durbin algorithm.
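The frequency-domain route to the LPC coefficients can be sketched as follows. This is an illustration under assumptions, not the patent's code: the autocorrelation is taken as the inverse DFT of a full N-point power spectrum (which yields a circular autocorrelation), the recursion is the standard Levinson-Durbin algorithm, and the function names and the prediction order are hypothetical.

```python
import math

def autocorr_from_power_spectrum(power, max_lag):
    """Autocorrelation r[0..max_lag] as the inverse DFT of a full N-point power spectrum.

    The power spectrum of a real signal is real and symmetric, so only the
    cosine terms of the inverse DFT contribute.
    """
    n = len(power)
    return [sum(p * math.cos(2.0 * math.pi * k * lag / n)
                for k, p in enumerate(power)) / n
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a, err) with a[0] = 1.

    a holds the coefficients of the prediction-error filter
    A(z) = sum_k a[k] z^-k, and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        reflection = -acc / err
        updated = a[:]
        for k in range(1, m):
            updated[k] = a[k] + reflection * a[m - k]
        updated[m] = reflection
        a = updated
        err *= 1.0 - reflection * reflection
    return a, err
```

Zero-padding each frame before the FFT keeps the circular autocorrelation equal to the linear one for lags up to the padding length.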
Furthermore, in step (1) above, a hierarchical time alignment based on voice activity detection is added, and the subsequent analysis is performed after the active speech segments have been time-aligned.
The invention has the following advantages and positive effects:
(1) calculating the speech Bark spectral distance in a wide frequency band as a measure basis, matching with the auditory characteristics of human ears, and keeping good correlation with subjective quality measure;
(2) by adopting a loudness linear interpolation algorithm, the precision is higher than that of a table lookup interpolation calculation method used for calculating the general loudness;
(3) the peak value of the spectrum of the LPC corresponds to the formant of the speech signal, and the frequency band corresponding to the formant has a direct relation with the intelligibility of the speech. The correlation between the method and the subjective quality can be improved by increasing the weight;
(4) due to the action of the voice activity detector, the problem that the reference speech and the detected speech are not synchronous due to the discontinuous transmission in the voice communication of the packet network can be overcome.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a graph of weighting coefficients derived from the LPC filter frequency response for an embodiment of the present invention;
FIG. 3 is a schematic diagram of discontinuous transmission according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings 1 to 3.
The invention provides a broadband voice quality objective evaluation method, which comprises the following steps:
(1) the test voice and the reference voice amplitude are normalized into a sequence with the average value of 0 and the standard deviation of 1;
(2) calculating a critical band hearing threshold in a frequency band of 50-7000 Hz;
(3) calculating a quiet-frame threshold from the energy of the windowed speech frames; if the energy of a speech frame is below the quiet-frame threshold, the frame does not participate in the quality assessment. The quiet-frame speech energy threshold $En_{SilenceTh}$ is 15 dB below the energy $En_{Max}$ of the maximum-energy frame;
(4) calculating a power spectrum of the normalized signal;
(5) summing in a critical band to obtain a Bark spectrum;
(6) calculating the loudness of the voice frame according to the Bark spectrum;
(7) normalizing the loudness vector $L_t(i)$ of the test speech to obtain the normalized loudness, denoted here $\hat{L}_t(i)$; the normalization factor equals the ratio of the summed loudness $\sum_{j=1}^{K} L_o(j)$ of the reference speech frame to the summed loudness $\sum_{j=1}^{K} L_t(j)$ of the test speech frame, where $K$ is the number of critical bands;
(8) determining the perceptible-distortion flag $M(i)$ from the loudness $L_o(i)$ of the reference speech, the normalized loudness $\hat{L}_t(i)$ of the test speech, and the noise masking threshold $Th_n$:
(9) the distortion $D(i)$ of each frame is given by:
(10) repeating steps (1) to (9) and calculating the distortion WBSD of the whole speech segment; because speech distortion in quiet segments does not affect perceived quality, only the non-quiet segments are accumulated, and the sum is averaged over the number of non-quiet frames to obtain the WBSD of the whole speech segment.
Wherein,
$N$: total number of frames processed
$K$: number of critical bands
$L_o^{(j)}(i)$: Bark spectrum of the j-th frame of the reference speech
$L_t^{(j)}(i)$: Bark spectrum of the j-th frame of the test speech
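Steps (7) to (10) can be sketched as below. The formulas for the distortion flag and the per-band distortion are not reproduced in this text, so the sketch assumes a form common to Bark spectral distortion measures: the flag is 1 when the absolute loudness difference exceeds the noise masking threshold, and the distortion is that absolute difference gated by the flag. Both rules, and all function names, are assumptions rather than the patent's definitions.

```python
def normalize_loudness(l_ref, l_test):
    """Scale the test loudness so that its band sum matches the reference band sum."""
    factor = sum(l_ref) / sum(l_test)
    return [factor * v for v in l_test]

def frame_distortion(l_ref, l_test, masking_threshold):
    """Per-band distortion of one frame under the assumed flag rule."""
    l_norm = normalize_loudness(l_ref, l_test)
    distortion = []
    for ref, tst, th in zip(l_ref, l_norm, masking_threshold):
        diff = abs(ref - tst)
        flag = 1 if diff > th else 0      # assumed form of the flag M(i)
        distortion.append(flag * diff)    # assumed form of the distortion D(i)
    return distortion

def wbsd(frames_ref, frames_test, thresholds, non_quiet):
    """Average the summed band distortion over the non-quiet frame indices."""
    if not non_quiet:
        raise ValueError("no non-quiet frames to evaluate")
    total = 0.0
    for j in non_quiet:
        total += sum(frame_distortion(frames_ref[j], frames_test[j], thresholds[j]))
    return total / len(non_quiet)
```

Passing only the non-quiet frame indices to `wbsd` reflects the statement that distortion in quiet frames is excluded from the average.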
FIG. 1 is a flow chart of an embodiment of the method. The test speech y(n) and the reference speech x(n) are each input to a BSD preprocessor, which calculates the loudness $L_y(j)$ of each critical band in a frame of test speech and the loudness $L_x(j)$ of each critical band in a frame of reference speech. The speech bandwidth is limited to 50-7000 Hz, which covers the critical bands with Bark numbers 1 to 21, corresponding to frequencies of 20-7700 Hz; throughout the calculation the loudness model is therefore a 21-dimensional feature vector. The noise threshold calculation section derives the noise masking threshold $Th_n$ and the perceptible-distortion flag $M(j)$ for each critical band. The outputs of the BSD preprocessor and the noise threshold calculation module are combined to give the distortion WBSD of each frame.
The input speech signal consists of 16-bit signed integers sampled at 16 kHz. In the BSD preprocessor, the speech signal is first converted from the time domain to the frequency domain by an FFT with a window length of 1024 points; each speech frame is 20 ms long, corresponding to 640 speech sample points, and the frame shift is 10 ms.
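The front end of the BSD preprocessor (zero-padded FFT, power spectrum, summation into 21 critical bands) might look like the sketch below. The band edges are the standard Zwicker critical-band table, assumed here because it matches the stated 20-7700 Hz range; the text does not list the edges. The frame is passed in as given, so the sketch works for either frame length mentioned above (20 ms is 320 samples at 16 kHz). The function names are hypothetical and windowing is omitted.

```python
import cmath

# Assumed critical-band edges in Hz for Bark bands 1 to 21 (Zwicker's table).
BARK_EDGES_HZ = [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def power_spectrum(frame, n_fft=1024):
    """Zero-pad a frame to n_fft points and return |X[k]|^2 for k = 0 .. n_fft/2."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    return [abs(c) ** 2 for c in fft(padded)[: n_fft // 2 + 1]]

def bark_spectrum(power, sample_rate=16000, n_fft=1024):
    """Sum the power-spectrum bins that fall inside each critical band."""
    bin_hz = sample_rate / n_fft
    return [sum(p for k, p in enumerate(power) if lo <= k * bin_hz < hi)
            for lo, hi in zip(BARK_EDGES_HZ[:-1], BARK_EDGES_HZ[1:])]
```

For a 1000 Hz tone, most of the energy should land in the band spanning 920-1080 Hz, which is a quick sanity check for the band table.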
As shown in FIG. 2, the linear prediction coefficients (LPC) are calculated for the windowed speech signal and the frequency response of the resulting filter is computed; this frequency response is the broken line in the figure. The peaks of the filter response correspond to the formants of the speech frame. The frequency response values within each critical band are summed and averaged to obtain the weighting coefficient W(i), and the speech distortion WBSD is calculated according to the following formula.
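A sketch of the weighting coefficients follows, assuming the LPC synthesis filter $1/A(z)$ is evaluated on the FFT bin frequencies and that the standard Zwicker band edges apply. The text describes both a per-band sum and an averaging step without giving the exact formula, so the `average` switch (dividing by the number of bins in the band) is one possible reading, not a confirmed one. All names are hypothetical.

```python
import cmath

# Assumed critical-band edges in Hz for Bark bands 1 to 21 (Zwicker's table).
BARK_EDGES_HZ = [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700]

def lpc_magnitude_response(a, n_fft=1024):
    """|1 / A(e^jw)| at bins 0 .. n_fft/2 for A(z) = sum_m a[m] z^-m with a[0] = 1."""
    response = []
    for k in range(n_fft // 2 + 1):
        w = 2.0 * cmath.pi * k / n_fft
        denom = sum(c * cmath.exp(-1j * w * m) for m, c in enumerate(a))
        response.append(1.0 / abs(denom))
    return response

def band_weights(a, sample_rate=16000, n_fft=1024, average=False):
    """Per-band sum (or mean, if average=True) of the LPC magnitude response."""
    response = lpc_magnitude_response(a, n_fft)
    bin_hz = sample_rate / n_fft
    weights = []
    for lo, hi in zip(BARK_EDGES_HZ[:-1], BARK_EDGES_HZ[1:]):
        values = [h for k, h in enumerate(response) if lo <= k * bin_hz < hi]
        total = sum(values)
        weights.append(total / len(values) if average and values else total)
    return weights
```

Because the response peaks at the formants, bands containing a formant receive larger weights, which is the effect the description attributes to the LPC weighting.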
As shown in FIG. 3, because discontinuous transmission is used in a data network, the speech received by the listener is not time-aligned with the speech sent by the speaker. A voice activity detection method can be used to time-align the active speech segments, which are then analyzed frame by frame to compute the WBSD.
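The text does not specify the alignment procedure, so the following is only a simplified illustration of the idea: a frame-energy threshold stands in for the voice activity detector, contiguous active frames form segments, and the k-th active segment of the reference is paired with the k-th active segment of the test signal. The function names, the threshold rule, and the pairing rule are all assumptions.

```python
def active_segments(energies, threshold):
    """Return (start, end) frame-index pairs, end exclusive, of runs at or above threshold."""
    segments, start = [], None
    for j, e in enumerate(energies):
        if e >= threshold and start is None:
            start = j
        elif e < threshold and start is not None:
            segments.append((start, j))
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments

def align_segments(ref_energies, test_energies, threshold):
    """Pair active segments in order and trim each pair to the shorter length.

    Returns (ref_start, test_start, length) triples in frame units.
    """
    pairs = []
    ref_segs = active_segments(ref_energies, threshold)
    test_segs = active_segments(test_energies, threshold)
    for (r_start, r_end), (t_start, t_end) in zip(ref_segs, test_segs):
        length = min(r_end - r_start, t_end - t_start)
        pairs.append((r_start, t_start, length))
    return pairs
```

Each returned triple identifies a run of frames that can be compared frame by frame, regardless of the silence gaps that discontinuous transmission stretched or shortened.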
Taking G.722.1 coding as an example, the voice quality was calculated under different packet loss rates; the correlation between the objective results and the subjective test results is not lower than 0.8.
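A correlation figure of this kind is typically a Pearson correlation coefficient between the objective scores and the subjective scores of the same test conditions; the text does not say which coefficient was used, so that is an assumption. A minimal sketch with a hypothetical function name:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

Since a distortion measure rises as perceived quality falls, the raw coefficient against opinion scores would be negative; the reported value presumably refers to its magnitude or to a mapped score.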
Claims (3)
1. A broadband voice quality objective evaluation method is characterized by comprising the following steps:
(1) the voice section comprises a test voice and a reference voice, a voice frame is taken from the voice section for calculation, and the amplitudes of the test voice and the reference voice are normalized into a sequence with an average value of 0 and a standard deviation of 1;
(2) calculating a critical band hearing threshold in a frequency band of 50-7000 Hz;
(3) calculating a quiet-frame speech energy threshold from the energy of the windowed frames of the reference speech; if the energy of a speech frame is below the quiet-frame speech energy threshold, the frame does not participate in the quality evaluation; the quiet-frame speech energy threshold $En_{SilenceTh}$ is 15 dB below the energy $En_{Max}$ of the maximum-energy frame;
(4) calculating a power spectrum for the normalized signal;
(5) summing in a critical band to obtain a Bark spectrum;
(6) calculating the loudness of the current speech frame from the Bark spectrum, that is, calculating the loudness $L_t(i)$ of each critical band of the test speech and the loudness $L_o(i)$ of each critical band of the reference speech, where $1 \le i \le K$ and $K$ is the number of critical bands;
(7) calculating the normalized loudness of the test speech, denoted here $\hat{L}_t(i)$; the normalization factor equals the ratio of the sum of the critical-band loudness values $L_o(i)$ of the reference speech to the sum of the critical-band loudness values $L_t(i)$ of the test speech;
(8) determining the perceptible-distortion flag $M(i)$ from the critical-band loudness $L_o(i)$ of the reference speech, the normalized loudness $\hat{L}_t(i)$ of the test speech, and the noise masking threshold $Th_n(i)$:
(9) the distortion $D(i)$ of each critical band is given by:
(10) repeating steps (1) to (9) to process the whole speech segment frame by frame, and then calculating the distortion WBSD of the whole speech segment; because speech distortion in quiet frames does not affect perceived quality, only the distortion of the non-quiet frames is accumulated, and the sum is averaged over the number of non-quiet frames to obtain the WBSD of the whole speech segment:
$$WBSD = \frac{1}{N}\sum_{j=1}^{N}\sum_{i=1}^{K} D^{(j)}(i)$$
wherein,
$N$: total number of processed non-silent frames
$K$: number of critical bands
$D^{(j)}(i)$: distortion of the i-th critical band of the j-th frame of the reference speech
2. The objective evaluation method for wideband speech quality according to claim 1, wherein: in step (10), the linear prediction coefficients (LPC) are calculated from the power spectrum of the reference speech, and the Bark spectral distance of each critical band is weighted by the LPC spectral envelope before the average is calculated, where the weighting coefficient $W^{(j)}(i)$ is the sum of the LPC filter frequency response values within the i-th critical band of the j-th frame;
3. The objective evaluation method for wideband speech quality according to claim 1 or 2, characterized in that: in step (1) above, a hierarchical time alignment based on voice activity detection is added, and the subsequent analysis is performed after the active speech segments have been time-aligned.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2003101112735A CN100347988C (en) | 2003-10-24 | 2003-10-24 | Broad frequency band voice quality objective evaluation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2003101112735A CN100347988C (en) | 2003-10-24 | 2003-10-24 | Broad frequency band voice quality objective evaluation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1538667A (en) | 2004-10-20 |
CN100347988C (en) | 2007-11-07 |
Family
ID=34335996
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2003101112735A Expired - Fee Related CN100347988C (en) | 2003-10-24 | 2003-10-24 | Broad frequency band voice quality objective evaluation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100347988C (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106851713A (en) * | 2015-12-07 | 2017-06-13 | 中兴通讯股份有限公司 | Terminal speech evaluation the quality method and apparatus, switch managing method and device |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100382514C (en) * | 2005-01-13 | 2008-04-16 | 康全电讯股份有限公司 | Method for testing speech quality of network speech apparatus |
CN1321390C (en) * | 2005-01-18 | 2007-06-13 | 中国电子科技集团公司第三十研究所 | Establishment of statistics concerned model of acounstic quality normalization |
CN1321400C (en) * | 2005-01-18 | 2007-06-13 | 中国电子科技集团公司第三十研究所 | Noise masking threshold algorithm based Barker spectrum distortion measuring method in objective assessment of sound quality |
CN101609686B (en) * | 2009-07-28 | 2011-09-14 | 南京大学 | Objective assessment method based on voice enhancement algorithm subjective assessment |
CN102231279B (en) * | 2011-05-11 | 2012-09-26 | 武汉大学 | Objective evaluation system and method of voice frequency quality based on hearing attention |
CN103632679A (en) * | 2012-08-21 | 2014-03-12 | 华为技术有限公司 | An audio stream quality assessment method and an apparatus |
CN103716470B (en) * | 2012-09-29 | 2016-12-07 | 华为技术有限公司 | The method and apparatus of Voice Quality Monitor |
EP2922058A1 (en) * | 2014-03-20 | 2015-09-23 | Nederlandse Organisatie voor toegepast- natuurwetenschappelijk onderzoek TNO | Method of and apparatus for evaluating quality of a degraded speech signal |
US9548067B2 (en) | 2014-09-30 | 2017-01-17 | Knuedge Incorporated | Estimating pitch using symmetry characteristics |
US9396740B1 (en) * | 2014-09-30 | 2016-07-19 | Knuedge Incorporated | Systems and methods for estimating pitch in audio signals based on symmetry characteristics independent of harmonic amplitudes |
CN105989853B (en) * | 2015-02-28 | 2020-08-18 | 科大讯飞股份有限公司 | Audio quality evaluation method and system |
CN105551496B (en) * | 2015-12-30 | 2020-01-31 | 哈尔滨海能达科技有限公司 | method, device and terminal for judging voice coding and decoding technology |
CN105656931B (en) * | 2016-03-01 | 2018-10-30 | 邦彦技术股份有限公司 | Method and device for objectively evaluating and processing voice quality of network telephone |
JP6742620B2 (en) * | 2016-10-14 | 2020-08-26 | 公立大学法人大阪 | Swallowing diagnostic device and program |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002073601A1 (en) * | 2001-03-13 | 2002-09-19 | Koninklijke Kpn N.V. | Method and device for determining the quality of a speech signal |
Non-Patent Citations (3)
Title |
---|
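Development and research status of speech evaluation technology (语音评价技术的发展与研究现状). Jing Xinxing, Shen Shuming. Telecommunication Engineering (电讯技术), Vol. 38, No. 6, 1998 * |
Research progress on objective evaluation methods for speech quality (语音质量客观评价方法研究进展). Chen Guo, Hu Xiulin, Zhang Yunyu, Zhu Yaoting. Acta Electronica Sinica (电子学报), Vol. 29, No. 4, 2001 * |
A one-step strategy for objective evaluation of speech quality (语音质量客观评价的一步策略). Fu Qiang, Yi Kechu, Tian Bin, Zhang Zhiyi. Acta Electronica Sinica (电子学报), Vol. 29, No. 7, 2001 * |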
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106851713A (en) * | 2015-12-07 | 2017-06-13 | 中兴通讯股份有限公司 | Terminal speech evaluation the quality method and apparatus, switch managing method and device |
CN106851713B (en) * | 2015-12-07 | 2021-11-12 | 中兴通讯股份有限公司 | Terminal voice service quality evaluation method and device, and switching management method and device |
Also Published As
Publication number | Publication date |
---|---|
CN1538667A (en) | 2004-10-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20071107 Termination date: 20131024 |