WO2011129655A2 - Method, apparatus, and program-containing medium for assessment of audio quality - Google Patents


Info

Publication number
WO2011129655A2
WO2011129655A2 PCT/KR2011/002713
Authority
WO
WIPO (PCT)
Prior art keywords
eitddist
signal
under test
reference signal
audio
Prior art date
Application number
PCT/KR2011/002713
Other languages
French (fr)
Other versions
WO2011129655A3 (en)
Inventor
Jeong-Hun Seo
Koeng Mo Sung
Sang-Bae Chon
In-Yong Choi
Original Assignee
Jeong-Hun Seo
Koeng Mo Sung
Sang-Bae Chon
In-Yong Choi
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jeong-Hun Seo, Koeng Mo Sung, Sang-Bae Chon, In-Yong Choi
Publication of WO2011129655A2 publication Critical patent/WO2011129655A2/en
Publication of WO2011129655A3 publication Critical patent/WO2011129655A3/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/69Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for evaluating synthetic or decoded voice signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis

Definitions

  • the present invention relates to an audio quality assessment method, an audio quality assessment apparatus, and a medium containing an audio quality assessment program, particularly for objective audio quality assessment.
  • The need for an objective sound quality assessment technique for multi-channel audio signals has been growing with the development of multi-channel compression techniques and the widespread use of multi-channel systems.
  • One purpose of the present invention is to develop an assessment feature for objectively evaluating a multi-channel audio compression codec, a method for evaluating an audio compression codec using the assessment feature, an apparatus for the same method, and a program-containing medium for conducting the method.
  • the quality prediction model in ITU-R Recommendation BS.1387-1 may be extended to multichannel audio coding systems, showing high performance in the prediction of perceived quality.
  • This extended model may use at least thirteen features: ten timbral features from ITU-R Rec. BS.1387-1 and three additional spatial features called ITDDist (Interaural Time Difference Distortion), ILDDist (Interaural Level Difference Distortion), and IACCDist (Interaural Cross-Correlation Distortion).
  • ITDDist can be used as an important feature for predicting errors in sound localization.
  • ITDDist may be calculated only for low frequency bands, based on the claim that ITDs (Interaural Time Differences) have greater salience in low frequency bands, where interaural phase differences are unambiguous.
  • Envelope ITDDist may be calculated in high frequency bands.
  • ITD is used for recognizing the location of a low frequency sound source.
  • the excitation pattern of a basilar membrane, which is generated by low frequency sound excitation, is delivered to an MSO (Medial Superior Olive).
  • Coincidence detection neurons may process the delivered signal to calculate the ITD.
  • Human brain can recognize a sound location by using ITD.
  • the excitation pattern of a basilar membrane is delivered to an LSO (Lateral Superior Olive). Due to this, different levels of electric signals are produced at the two LSOs (left and right).
  • Human brain can recognize a sound location using this interaural level difference of the electric signals.
  • human brain may also utilize the signal envelope information of high frequency sound for sound localization.
  • neurons located in LSOs are sensitive to high frequency transposed tones.
  • the neuron firing probabilities in auditory nerve fibers (ANFs) for high frequency transposed tones and low frequency tones are similar to each other. Sensitivity to ITDs of the high frequency envelope can be equivalent to that of ITDs in low frequency sound. Based on this phenomenon, it can be thought that EITDs of high frequency components have as much influence on sound localization of human listeners, as ITDs of low frequency sound and ILDs of high frequency sound do.
  • How the human brain processes EITDs of high frequency components is therefore an important issue.
  • the central mechanisms related to sensitivity to envelope-based ITDs are similar to those related to sensitivity to fine-structure-based ITDs. If the central mechanisms of the two cases are similar, EITDs of high frequency components can be computed as if derived by coincidence detection neurons in the MSO, although binaural cues for sound localization of high frequency sounds are extracted in the LSO. Therefore, perceived EITDs of high frequency components can be computed by the cognition model used in the computation of ITDs for low frequency bands.
  • An audio quality measurement method comprises a step for producing a number of model output variables (MOVs) including a variable representing envelope interaural time difference distortion (EITDDist) based on a comparison between a reference signal and a signal under test, and a step for mapping the number of model output variables to an audio quality value.
  • A computer readable medium is provided according to another aspect of the present invention.
  • the computer readable medium is for storing computer instructions executable by a processor for modifying an operation of a device having a processor.
  • the computer readable medium comprises computer code for producing a number of model output variables (MOVs) including a variable representing envelope interaural time difference distortion (EITDDist) based on a comparison between a reference signal and a signal under test, and mapping the number of model output variables to an audio quality value.
  • A computer readable medium is provided according to still another aspect of the present invention.
  • the computer readable medium is for storing a set of computer instructions executable by a processor for modifying another set of computer instructions.
  • the another set of computer instructions is for producing a number of model output variables (MOVs) based on comparisons between a reference signal and a signal under test and mapping the number of model output variables to an audio quality value.
  • the computer readable medium comprises computer code for modifying the another set of computer instructions to have the number of model output variables comprise a variable representing envelope interaural time difference distortion (EITDDist) based on a comparison between the reference signal and the signal under test.
  • An audio quality measurement apparatus comprises a producing means for producing a number of model output variables (MOVs) including a variable representing envelope interaural time difference distortion (EITDDist) based on a comparison between a reference signal and a signal under test, and a mapping means for mapping the number of model output variables to an audio quality value.
  • the producing means and the mapping means may be parts of a processing unit configured to execute a set of instructions for the producing and the mapping.
  • the EITDDist may be of any suitable form, for example EITDDist[k,n].
  • the EITDDist[k,n] may represent a value of envelope interaural time difference distortion obtained by comparing the reference signal and the signal under test at k-th frequency band of n-th time-frame.
  • the EITDDist[k,n] may be given by EITDDist[k,n] = C_test[k,n] · C_ref[k,n] · ΔEITD[k,n], where:
  • the ΔEITD[k,n] may represent a difference between envelope interaural time differences of the reference signal and the signal under test at the k-th frequency band of the n-th time frame,
  • the C_test[k,n] may represent a nonlinearly transformed value of an envelope interaural cross-correlation coefficient (EIACC) of the signal under test at the k-th frequency band of the n-th time frame, and
  • the C_ref[k,n] may represent a nonlinearly transformed value of an envelope interaural cross-correlation coefficient (EIACC) of the reference signal at the k-th frequency band of the n-th time frame.
  • the reference signal may be obtained from a multichannel audio signal
  • the signal under test may be obtained from an output of a device under test through which the multichannel audio signal is inputted.
  • At least one of the number of model output variables may be based on a comparison between excitation patterns of the reference signal and the signal under test.
  • the EITDDist may be obtained by applying the reference signal and the signal under test to a filter bank.
  • the reliability of an objective assessment model for multichannel audio codecs can be increased by use of the variable EITDDist.
  • FIG. 1 is a diagram illustrating a structure of a multi-channel audio reproduction system recommended by ITU-R, to which an embodiment of the present invention can be applied.
  • FIG. 2 is a diagram illustrating a structure of an apparatus for evaluating the audio quality of a multi-channel audio codec in accordance with an embodiment of the present invention.
  • FIG. 3 is a diagram describing an embodiment of sound transfer paths in accordance with an embodiment of the present invention.
  • FIG. 4 is a diagram describing the operation of one example of the preprocessing unit for binaural signal synthesis in accordance with an embodiment of the present invention.
  • FIG. 5 is a flowchart illustrating a method for evaluating an audio quality of a multi-channel audio codec in accordance with another embodiment of the present invention.
  • FIG. 6 is a flow chart for calculating an ILD distortion in accordance with one embodiment of the present invention.
  • FIG. 7 is a flow chart for calculating an EITD distortion in accordance with one embodiment of the present invention.
  • FIG. 8 is a sample envelope of an exemplary sound signal.
  • FIG. 9 shows a more detailed version of the flow chart of FIG. 7 calculating an EITD distortion.
  • ITU-R Recommendation BS.1116-1, "Methods for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems"; ITU-R Recommendation BS.1387-1, "Method for objective measurements of perceived audio quality", International Telecommunication Union, Geneva, Switzerland, 1998.
  • a multi-channel audio signal has six channels (5.1 channels): front speakers (LF (left front) and RF (right front)), a center speaker (C), a low frequency effect (LFE) channel, and rear speakers (LS (left surround) and RS (right surround)).
  • FIG. 1 is a diagram illustrating a structure of a multi-channel audio reproduction system recommended by ITU-R, to which an embodiment of the present invention may apply.
  • the five channel speakers may be arranged on a single circle,
  • the distance between the center speaker C in the front and the listener 10 may be equal to the distance between the listener and the front left and right speakers L and R, and
  • the rear left and right speakers LS and RS may be placed on the concentric circle at 100 to 120 degrees with respect to the front, which is 0 degrees.
  • the reproduction system is arranged to conform to the standard arrangement recommended by the ITU-R because the intended (best) audio quality can be obtained this way, since most sources are edited/recorded based on the standard arrangement.
  • the listener 10 of the multi-channel audio reproduction system recommended by the ITU-R is substituted by an audio quality evaluation apparatus for a multi-channel audio codec, which evaluates the audio quality of the codec by measuring impulse responses of multi-channel audio signals from the five channel speakers L, R, C, LS and RS using a binaural microphone that simulates the body (the head and upper half).
  • FIG. 2 is a diagram illustrating a structure of an apparatus for evaluating an audio quality of a multi-channel audio codec in accordance with an embodiment of the present invention.
  • the audio quality evaluation apparatus 10 of a multi-channel audio codec may include a preprocessing unit 11 for synthesizing binaural signals,
  • an output variable calculator 12 for calculating MOVs (Model Output Variables) including IACCDist (InterAural Cross-correlation Coefficient Distortion), ILDDist (Interaural Level Difference Distortion), and EITDDist (Envelope Interaural Time Difference Distortion), and an artificial neural network circuit 13 for outputting a grade of the audio quality on the basis of the MOVs calculated by the output variable calculator 12.
  • IACC represents the maximum value of the normalized cross correlation function between the left ear input and the right ear input.
  • ILD denotes the ratio of intensity of signals between the left ear input and the right ear input.
  • EITD represents the time difference between the audio signal envelopes inputted through left and right ears, particularly for high frequency band audio signal.
  • a preprocessing unit 11 may convolve head-related impulse responses of the corresponding azimuth angles (which simulate the transfer function of the sound propagation path, including the body (head and torso) of a listener) with the five channel test signals and five channel reference signals, and sum up the results.
  • the total number of the sound transfer paths is ten, due to the five locations of loudspeakers and two ears of a listener, which may be represented by graphs as depicted in FIG. 3.
  • the output variable calculator 12 calculates MOVs including IACCDist, ILDDist, and EITDDist. The two variables IACCDist and ILDDist mirror degradations in the attributes of spatial quality.
  • the calculated MOVs may then be provided to artificial neural network circuit 13.
  • the artificial neural network circuit 13 may output a grade of the audio quality based on the MOVs provided from the output variable calculator 12.
  • the grade of the audio quality may be referred to as ODG (Objective Difference Grade).
  • the output variable calculator 12 may calculate ILDDist from the binaural signals.
  • the ILD of an uncompressed original audio signal may be denoted as ILD_ref, and
  • the ILD of the audio signal which is encoded and decoded by the multi-channel audio codec under test may be denoted as ILD_test.
  • the IACC may be named in a similar way.
  • the binaural signals may be converted into time-frequency segment signals using 75% overlapped time frames (of a length equivalent to 50 ms for IACC and 10 ms for ILD) and a filter bank of 24 auditory critical bands.
  • ILDDist for a k-th frequency band of an n-th time frame may be represented as ILDDist[k,n].
  • ILDDist[k,n] = w[k] · | ILD_test[k,n] − ILD_ref[k,n] |  (Equation 1)
  • ILDDist represents an interaural level difference distortion
  • w[k] represents a weighting function decided depending on the range of the critical band, which reflects the intensity level of a time-frequency segment and the auditory sensitivity to the ILD.
  • To acquire the ILDDist[n] of the entire auditory band in the n-th time frame, an average is taken over all frequency bands, as in Equation 2.
  • ILDDist[n] = (1/K) · Σ_k ILDDist[k,n]  (Equation 2), where K is the number of critical bands
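As an illustrative sketch only (not the patent's implementation), the per-frame computation of Equations 1 and 2 can be written directly: the per-band distortion is the weighted absolute ILD difference, and the per-frame value is its mean over the critical bands. The function name and list-based interface are assumptions.

```python
def ild_dist_frame(ild_test, ild_ref, w):
    """Sketch of Equations 1-2 for one time frame n.

    ild_test / ild_ref: length-K lists of ILD values per critical band k.
    w: length-K list of band weights (their exact values are not given here).
    """
    # Equation 1: weighted absolute ILD difference per critical band
    per_band = [w[k] * abs(ild_test[k] - ild_ref[k]) for k in range(len(w))]
    # Equation 2: average over the K critical bands for this frame
    return sum(per_band) / len(per_band)
```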
  • IACCDist may also be named ICCDist. Since ICCDist and ILDDist have a high correlation with the subjective audio quality evaluation results of the multi-channel audio codec, the output variable calculator 12 can use ICCDist and ILDDist as output variables. Variables including ICCDist and ILDDist may be inputted to the artificial neural network circuit 13, which thereby outputs a one-dimensional grade of audio quality with objectivity and consistency.
  • FIG. 4 is a diagram describing the operation of one example of the preprocessing unit of the audio quality evaluation apparatus in accordance with an embodiment of the present invention.
  • a preprocessing unit 11 of the audio quality evaluation apparatus 10 converts the impulse response of each sound transfer path, measured using a binaural microphone that simulates the body (the head and upper half) in the standard multichannel audio reproduction system recommended by the ITU-R, into a transfer function, and sums up the transfer functions to calculate the binaural input signals.
  • FIG. 5 illustrates a flowchart for a method of evaluating an audio quality of a multi-channel audio codec in accordance with an embodiment of the present invention.
  • a preprocessing unit 11 of the audio quality evaluation apparatus 10 for a multi-channel audio codec converts impulse responses of the sound sources encoded and decoded by the multi-channel audio codec, and of the original sound sources, into transfer functions, and sums up the transfer functions to calculate the binaural input signals (S501).
  • an output variable calculator 12 may calculate MOVs including IACCDist, ILDDist, and EITDDist from the time-frequency segmented binaural signals (S502).
  • the calculated MOVs may be then applied to an artificial neural network circuit 13 (S503).
  • the artificial neural network circuit 13 may output an objective audio quality grade based on the MOVs produced at the output variable calculator 12 (S504).
  • the output variable calculator 12 may further produce EITDs
  • the produced EITDs may be inputted to the artificial neural network circuit 13.
  • Audio quality degradation caused by a change of audio signal location is one of the important evaluation factors.
  • the location of an audio signal can be recognized by the ILD for high frequency components.
  • In addition to the ILD, the EITD of the high frequency components of an audio signal influences the mechanism of recognizing the location of an audio signal.
  • ILD and EITD may be respectively calculated both for a reference signal (i.e., original signal) and a test signal
  • ILDDist or EITDDist may be calculated using cognitive distance or difference between ILDs or EITDs obtained from the reference signal and the test signal, respectively.
  • multichannel audio signals may be synthesized into binaural signals.
  • a HRTF represents an audio signal transfer path from each speaker to left and right ears.
  • ILD and EITD of a high frequency audio signal may be calculated using the synthesized binaural signals.
  • FIG. 6 is a flow chart for calculating an ILD distortion in accordance with one embodiment of the present invention.
  • a binaural synthesis part 601 may produce binaural signals.
  • a peripheral ear model part 602 may produce excitation patterns of the reference signal and of the test signal.
  • an envelope extraction part 603 may produce envelopes of the excitation patterns of the reference signal and of the test signal, respectively.
  • a cognition model part 604 may calculate an ILDDist value of a high frequency band by using the envelopes from the envelope extraction part 603.
  • the binaural synthesis part 601 of FIG. 6 may correspond to the preprocessing unit 11 of FIG. 2.
  • the peripheral ear model part 602, envelope extraction part 603, and cognition model part 604 of FIG. 6 may be included in the output variable calculator 12 of FIG. 2.
  • The ILD may be defined as the energy difference between the signals inputted to the peripheral ear models of the left and right ears, each composed of multiple band-pass filters whose center frequencies are decided by the ERB (Equivalent Rectangular Bandwidth) scale, and may be represented by Equation 3.
  • a peripheral ear model is for calculating excitation patterns at basilar membrane, from audio signals inputted from both left and right ears.
  • Although the energy difference between the signals inputted from the left and right ears can be expressed as Equation 3, the human brain may process a given ILD in a different way. When a non-zero ILD is given, the higher-level signal of the two ear inputs causes more frequent neural spikes in the IC (Inferior Colliculus), which processes the ILD. Because a model for the number of neural spikes occurring in the IC follows a tangential sigmoid function, the calculated ILD value may further be nonlinearly transformed by a tangential sigmoid function, as represented in Equations 4 and 5.
  • the gradient of the tangential sigmoid function has a different sign (positive or negative) according to the energy difference of the ear input signals: if the signal from the left ear is larger than that from the right ear, the gradient may be positive; conversely, if the signal from the right ear is larger, the gradient may be negative.
  • the tangential sigmoid function may have different gradient according to each frequency band.
  • 'Tk' may represent the threshold of the tangential sigmoid function, and
  • 'Tk' may be zero (0) in the case of ILD.
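The nonlinear transform described around Equations 4 and 5 can be sketched with a tanh-shaped function. This is a sketch under assumptions: the text only states that the threshold Tk is zero for ILD and that the parameters vary per band, so the default steepness here is a placeholder, not the patent's value.

```python
import math

def tangential_sigmoid(x, steepness=1.0, threshold=0.0):
    """Tanh-shaped ("tangential sigmoid") nonlinearity with a steepness
    and threshold; maps x to the range (-1, 1), preserving the sign of
    (x - threshold) as described for the ILD case."""
    return math.tanh(steepness * (x - threshold))
```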
  • an ILDDist may be represented as Equation 6 for a time-frequency segmented signal.
  • a resulting ILDDist may be obtained by calculating the mean value of ILDDist[k,n] over all frequency bands and time frames, and may be represented as Equation 7.
  • the resulting ILDDist may be regarded as a cognitive distance due to the ILD between a test signal and a reference signal.
  • An EITDDist represents a cognitive distance of the audio signal location of a test signal and the audio signal location of a reference signal, which arises due to the difference of EITDs of the test and reference signals.
  • EITDDist, along with ILDDist, may be used as a feature for evaluating spatial impression that occurs due to the difference of high frequency audio signal source locations.
  • FIG. 7 is a flow chart for calculating an EITDDist in accordance with an embodiment of the present invention.
  • the binaural synthesis part 701 may produce binaural signals for the test and reference signals.
  • Binaural synthesis part 701 of FIG. 7 may correspond to the preprocessing unit 11 of FIG. 2.
  • multichannel sound sources may be synthesized into binaural signals for both the test and reference signals.
  • Equation 8 may be used to synthesize binaural signals from the five channel signals.
  • the subscripts 'test' and 'ref' represent a test signal and a reference signal, respectively.
  • H_CL, H_LL, H_RL, H_LSL, H_RSL, H_CR, H_LR, H_RR, H_LSR, H_RSR of Equation 8 represent a total of ten BRTFs (Binaural Room Transfer Functions), which represent the acoustic wave paths from each speaker to the left and right ears. Further, L and R of Equation 8 represent the acoustic wave inputs at the left ear and the right ear, respectively.
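The binaural synthesis of Equation 8 can be sketched as a sum of convolutions: each ear input is the sum, over the five loudspeaker channels, of the channel signal convolved with the BRTF from that loudspeaker to the ear. The dict-based interface and function name below are illustrative, not the patent's notation.

```python
import numpy as np

def binaural_synthesis(channels, brtf_left, brtf_right):
    """Sketch of Equation 8.

    channels: dict mapping channel name (e.g. 'C', 'L', 'R', 'LS', 'RS')
              to its signal array.
    brtf_left / brtf_right: dicts mapping the same names to the impulse
              responses from that loudspeaker to the left / right ear.
    """
    # Sum of per-channel convolutions gives each ear's input signal.
    y_left = sum(np.convolve(channels[ch], brtf_left[ch]) for ch in channels)
    y_right = sum(np.convolve(channels[ch], brtf_right[ch]) for ch in channels)
    return y_left, y_right
```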
  • the synthesized binaural signals can be processed by a peripheral ear model. Input signals from the two ears (left and right) are delivered to the middle ears and then processed in the cochleas, and this process can be reproduced by the peripheral ear model.
  • a cochlea simulator in the peripheral ear model may transform the binaural signals into signals which stimulate hair cells at basilar membrane.
  • the cochlea simulator may be regarded as a filter bank which is composed of a total of 24 pass-band filters with a center frequency decided by ERB (Equivalent Rectangular Bandwidth) scale.
  • the signals passed through the cochlea simulator may be transformed into excitation patterns of the signals filtered by respective pass-band filters.
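The center frequencies of such a 24-band ERB-scaled filter bank can be sketched by spacing the bands uniformly on the ERB-rate scale (the Glasberg and Moore formula, ERBrate = 21.4·log10(4.37·f/1000 + 1)). The 24-band count follows the text; the 50 Hz to 18 kHz span is an assumption, as the text does not state the frequency range.

```python
import math

def erb_center_freqs(n_bands=24, f_lo=50.0, f_hi=18000.0):
    """Center frequencies (Hz) equally spaced on the ERB-rate scale."""
    def hz_to_erb(f):
        # Glasberg & Moore ERB-rate scale
        return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)
    def erb_to_hz(e):
        # Inverse of the ERB-rate mapping
        return (10 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    e_lo, e_hi = hz_to_erb(f_lo), hz_to_erb(f_hi)
    return [erb_to_hz(e_lo + (e_hi - e_lo) * i / (n_bands - 1))
            for i in range(n_bands)]
```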
  • the peripheral ear model part 702 of FIG. 7 may produce excitation patterns of the reference signal and of the test signal.
  • the envelope extraction part 703 of FIG. 7 may produce envelopes of the excitation patterns of the reference signal and of the test signal, respectively.
  • Envelopes of the excitation patterns can be extracted by discrete Hilbert Transform.
  • the envelope is obtained as the square root of the sum of the squares of the excitation pattern and its Hilbert-transformed value.
  • FIG. 8 shows an example of an extracted envelope.
  • the solid line represents a full-wave rectified excitation pattern and the dashed line represents the extracted envelope.
  • An EITD can be obtained by calculating a binaural time difference of the extracted envelope.
  • the output signals from ERB-scale auditory filter-bank can be denoted as x[k,n], a time-frequency segmented signal.
  • 'k' and 'n' represent the index numbers of the frequency band and the time frame, respectively.
  • Envelope signal E[k,n] can be computed using the discrete Hilbert transformed signal H{x[k,n]}, as shown in Equation 9: E[k,n] = √(x[k,n]² + H{x[k,n]}²).
  • x[k,n] can also be denoted as r[k,n],
  • and H{x[k,n]} can also be denoted as i[k,n].
  • In Equation 9, 'k' represents a frequency band index segmented by the peripheral ear model, and 'n' represents the index of the time frame being processed.
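The envelope of Equation 9 is the magnitude of the analytic signal, which can be sketched with an FFT-based discrete Hilbert transform: zero the negative frequencies, double the positive ones, and take the magnitude of the inverse transform (equal to √(x² + H{x}²)). This is a standard construction, not code from the patent.

```python
import numpy as np

def envelope(x):
    """E[n] = sqrt(x[n]^2 + H{x}[n]^2) via an FFT-based discrete
    Hilbert transform (analytic-signal magnitude)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.fft(x)
    # Build the analytic-signal spectrum weights: keep DC (and Nyquist),
    # double positive frequencies, zero negative frequencies.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(X * h)   # x + j * H{x}
    return np.abs(analytic)        # sqrt(x^2 + H{x}^2)
```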
  • the cognition model part 704 of FIG. 7 may calculate an EITDDist value of the high frequency band by using the envelopes from the envelope extraction part 703.
  • high frequency EITDs can be computed using the time-segmented normalized cross-correlation function (NCF) as described in Equation 10.
  • the cross-correlation may be calculated with an approximately 10 ms rectangular window, overlapped by 7/8.
  • In Equations 11 and 12, 'N' represents the range of 'd' and corresponds to the maximum theoretically possible ITD value. EITDs and EIACCs (envelope interaural cross-correlation coefficients) are measured for the reference and test signals; the subscripts 'ref' and 'test' denote the corresponding signals, respectively.
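The EITD/EIACC extraction of Equations 10 to 12 can be sketched as follows: evaluate the normalized cross-correlation of the left and right envelope signals over lags d in [−max_lag, max_lag]; the EITD is the lag maximizing it and the EIACC is that maximum. Normalizing by the full-segment energies (rather than the exact windowed form of the text) is a simplifying assumption, as are the names.

```python
import numpy as np

def eitd_eiacc(env_left, env_right, max_lag):
    """Sketch of Equations 10-12: returns (EITD in samples, EIACC)."""
    env_left = np.asarray(env_left, dtype=float)
    env_right = np.asarray(env_right, dtype=float)
    denom = np.sqrt(np.sum(env_left ** 2) * np.sum(env_right ** 2))
    best_lag, best_val = 0, -np.inf
    for d in range(-max_lag, max_lag + 1):
        # Correlate env_left shifted by d samples against env_right.
        if d > 0:
            num = np.sum(env_left[d:] * env_right[:-d])
        elif d < 0:
            num = np.sum(env_left[:d] * env_right[-d:])
        else:
            num = np.sum(env_left * env_right)
        val = num / denom
        if val > best_val:
            best_lag, best_val = d, val
    return best_lag, best_val
```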
  • the EITD difference between a test signal and a reference signal can be computed as shown in Equation 13; that is, as the difference between two vectors whose phase angles correspond to the EITDs.
  • In Equation 13, f_s denotes the sampling rate and N is the maximum ITD in sample numbers.
  • After calculating the EITDs, the next process has to consider that EITD detection may fail in some cases: if the EIACC is too low, the perceived source location is ambiguous. Thus, a decision factor that considers the certainty of the computed EITDs is applied. This certainty can be modeled by a tangential sigmoid function that transforms the EIACC values nonlinearly, as shown in Equations 14 and 15, for both the reference signal and the test signal.
  • the tangential sigmoid function used in this model may have a steepness S of 50, and the threshold Tk may take a different value in each frequency band, since each frequency band has a different sensitivity to ITDs.
  • the EITDDist value can be obtained as shown in Equation 16; that is, the EITD distortion can be computed by applying the nonlinearly transformed EIACC values as certainty factors to the result of Equation 13.
  • the resulting EITDDist may be obtained by averaging the EITDDist[k,n] values over frequency bands and time frames, as expressed in Equation 17.
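The combination described around Equations 16 and 17 can be sketched as below: each time-frequency segment's EITD difference is weighted by the two certainty factors (the nonlinearly transformed EIACCs of the test and reference signals), and the results are averaged over bands and frames. The multiplicative form of the weighting is an assumption inferred from the surrounding description, not a formula quoted from the patent.

```python
def eitd_dist(delta_eitd, c_test, c_ref):
    """Sketch of Equations 16-17.

    delta_eitd, c_test, c_ref: K x N nested lists indexed [k][n],
    holding the per-segment EITD differences and certainty factors.
    """
    total, count = 0.0, 0
    for k in range(len(delta_eitd)):
        for n in range(len(delta_eitd[k])):
            # Eq. 16 (assumed form): certainty-weighted EITD difference
            total += c_test[k][n] * c_ref[k][n] * abs(delta_eitd[k][n])
            count += 1
    # Eq. 17: mean over all frequency bands and time frames
    return total / count
```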
  • peripheral ear model part 702, envelope extraction part 703, and cognition model part 704 of FIG. 7 may be included in the output variable calculator 12 of FIG. 2.
  • FIG. 9 shows a more detailed version of the flow chart of FIG. 7 calculating an EITD distortion.
  • the binaural synthesis part 901 may produce binaural signals from multichannel signals, using Equation 8.
  • the peripheral ear model part 902 may produce excitation patterns of a reference signal and a test signal by using binaural input signals.
  • the envelope extraction part 903 may produce envelopes of the excitation patterns of the reference signal and of the test signal by using Equation 9.
  • the NCF part 904 can calculate EITDs and EIACCs using the obtained envelopes.
  • the EITD Distortion Computation part 905 can calculate an EITDDist value using the EITDs and EIACCs of the test and reference signals.
  • the subscripts 'R', 'L', 'test', 'ref', 'k', and 'n' represent the right channel, left channel, test signal, reference signal, frequency band index, and time frame index, respectively.
  • The EITDDist-obtaining method that uses Equations 13 to 17 can be modified into a method using the following Equations 18 and 19.
  • Before obtaining an EITDDist value from the EITD values calculated by Equation 11, the EIACC values may be non-linearly transformed by applying a tangential sigmoid function, as shown in Equations 14 and 15.
  • the transformed EIACC value can be used as a weighting factor applied to the EITD values. Then, a cognitive EITD distance can be calculated from the weighted EITD values. Since the perceptual change of the source direction can be approximated as the Euclidean distance between two different positions on the unit circle, the EITD difference can be computed as in Equation 18.
  • In Equation 18, f_s denotes the sampling rate and N is the maximum ITD in sample numbers.
  • c_test[k,n] and c_ref[k,n] of Equation 18 may also be denoted as p_test[k,n] and p_ref[k,n].
  • the resulting EITDDist is averaged over frequency bands and time frames, as expressed in Equation 19.
  • the resulting EITDDist can show a mean value of EITD distances, which means a cognitive distance between reference and test signals due to EITD value difference.
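The unit-circle idea behind Equation 18 can be sketched as follows: map each EITD to a phase angle on the unit circle and take the Euclidean distance between the two unit vectors. The normalization theta = π · EITD · f_s / N (using the sampling rate and the maximum ITD in samples) is an assumed mapping chosen so that the full ITD range covers the circle; the patent's exact scaling is not reproduced in this text.

```python
import math

def eitd_unit_circle_distance(eitd_test, eitd_ref, fs, n_max):
    """Sketch of Equation 18's idea: Euclidean distance between two
    unit vectors whose phase angles correspond to the EITDs (seconds).

    fs: sampling rate in Hz; n_max: maximum ITD in samples (assumed mapping).
    """
    th_t = math.pi * eitd_test * fs / n_max
    th_r = math.pi * eitd_ref * fs / n_max
    # Distance between the two points (cos, sin) on the unit circle
    return math.hypot(math.cos(th_t) - math.cos(th_r),
                      math.sin(th_t) - math.sin(th_r))
```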
  • an audio signal evaluation apparatus may comprise a preprocessing means adapted to produce binaural input signals from the multichannel audio signals of each channel (L, R, C, LS, RS) of a multichannel audio reproducing system, an output variable calculating means adapted to output model output values including IACCDist, ILDDist, and EITDDist values, and a neural network circuit means adapted to output an audio quality level based on the model output variables.
  • a listening test database distributed by the ISO/MPEG audio group was used for the model. Subjective listening tests followed the procedures recommended in ITU-R Rec. BS.1534-1, "Multiple Stimulus with Hidden Reference and Anchor (MUSHRA)". 11 different test signals were used in the listening tests. Each test excerpt was encoded and decoded by 11 different multichannel audio coding systems. Consequently, the listening test database contains 121 items.
  • Table 1 shows the correlation coefficients between the subjective listening evaluation result and the 14 evaluation features (MOVs) used for the objective evaluation scheme.
  • Each correlation coefficient ρ(X, Y) can be calculated as in Equation 20.
  • In Equation 20, X and Y represent the MOS and the data of each feature, respectively. Correlation coefficients between the fourteen features (MOVs) and the subjective listening evaluation results were calculated for the 121 signals synthesized from binaural signals. Among the fourteen features, the last four represent the degree of degradation of spatial impression. The ten model output values and the four spatial features are listed in Tables 2 and 3, respectively.
  • Each of the above MOVs can be used as an input variable for a prediction model for an objective audio quality evaluation.
  • An objective audio quality prediction model for a multichannel audio coding system can show better prediction performance when an MOV representing spatial impression distortion, having a high correlation coefficient with the subjective listening evaluation results, is added to the model.
  • EITDDist can be used as a model output variable for evaluating spatial impression distortion in an objective audio quality prediction model. Particularly, because EITDDist has high correlation with a subjective listening evaluation result, one can improve the performance of an objective audio quality prediction model for a multichannel audio coding system by adding EITDDist to the objective audio quality prediction model as an input feature.
  • the performance of an objective audio quality evaluation model can be improved by providing spatial impression features.
  • An evaluation model reflecting cognitive differences can be provided by mathematically modeling the audio signal processing inside the human brain using the spatial features.
  • the present invention is different from a conventional method which simply provides a distortion level between an original signal and its coded/encoded signal at individual frequency bands.
  • the present invention is for obtaining a result that is similar to a statistically processed result of subjective audio quality evaluations in a multichannel audio reproduction environment. According to an embodiment of the present invention, the listening evaluation and statistical processing procedures can be omitted.
  • An embodiment of the present invention can be used for an audio compression codec performance evaluating method/apparatus in order to compare cognitive sound qualities of a reference signal and a test signal (i.e., signal under test) which is coded and decoded from the reference signal using the audio compression codec.
  • a test signal i.e., signal under test
  • the artificial neural network circuit may be substituted by a general digital signal processing unit. That is, the artificial neural network circuit in this document was introduced as an exemplary digital signal filter. Therefore, the scope of the present invention is not limited to the accompanying drawings and their related descriptions.
  • features that influence spatial impression recognition can be obtained based on the psycho-acoustical and physiological research results, and the performance of an objective evaluation model for a multichannel audio codec can be improved by implementing the features by appropriate mathematical models.
  • the method of an embodiment of the present invention as mentioned above may be implemented by a software program that is stored in a computer-readable storage medium such as a CD-ROM, RAM, ROM, floppy disk, hard disk, optical magnetic disk, or the like. This process may be readily carried out by those skilled in the art; therefore, details thereof are omitted here.
  • an embodiment of the present invention can be implemented by various means, such as hardware, firmware, software or the combination thereof.
  • an embodiment of the present invention can be implemented with one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), processors, controllers, microcontrollers, and microprocessors.
  • an embodiment of the present invention can be implemented with modules, procedures, or functions that perform the above-described means or steps.
  • a software code can be saved in a memory unit and run by a processor.
  • the memory unit may be located in or outside the processor, and communicate data with the processor by various conventional communication means.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Stereophonic System (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

An audio quality measurement method is disclosed. The method comprises a step for producing a number of model output variables including a variable representing envelope interaural time difference distortion based on comparison between a reference signal and a signal under test, and a step for mapping the number of model output variables to an audio quality value.

Description

[DESCRIPTION]
[Invention Title]
METHOD, APPARATUS, AND PROGRAM-CONTAINING MEDIUM FOR ASSESSMENT OF AUDIO QUALITY
[Technical Field]
The present invention relates to an audio quality assessment method, an audio quality assessment apparatus, and a medium containing an audio quality assessment program, particularly for objective audio quality assessment.
[Background Art]
Prediction of perceived sound quality, or 'objective' quality assessment, is one of the popular applications in the field of psychoacoustics. Many researchers have introduced various methods to predict perceived quality. Some of those methods have been widely adopted and used for quality assessment of compression coding systems for monaural and stereo audio.
There is a proposal for audio quality assessment of audio signals processed by a single channel audio signal compression codec, which is recommended by the ITU Radiocommunication Sector (see ITU-R Recommendation BS.1387-1, "Method for objective measurements of perceived audio quality", International Telecommunication Union, Geneva, Switzerland, 1998). The proposal, however, has a limitation in that it cannot be used for an intermediate/low performance audio codec or a multi-channel audio codec. Further, the objective assessment of this proposal is mainly focused on applications which are subjectively assessed by applying ITU-R Recommendation BS.1116-1 (see ITU-R BS.1116-1, "Methods for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems").
On the other hand, development of the multi-channel audio codecs that are the object of evaluation is under active discussion in the MPEG standard group, ISO/IEC JTC1/SC29/WG11, and there are publications developed by various institutions. Audio quality evaluation of these codecs has been made by a subjective listening evaluation method based on the 'MUSHRA' technique (ITU-R Recommendation BS.1534-1, "Method for the Subjective Assessment of Intermediate Sound Quality (MUSHRA)", International Telecommunication Union, Geneva, Switzerland, 2001). There are publications on listening evaluation results of diverse codecs employing the above method (see ISO/IEC JTC1/SC29/WG11 (MPEG), N7138, "Report on MPEG Spatial Audio Coding RM0 Listening Tests", and ISO/IEC JTC1/SC29/WG11 (MPEG), N7139, "Spatial Audio Coding RM0 Listening Test Data").
Such a method of evaluating the audio quality of a multi-channel audio codec, however, is highly subjective: a listener directly listens to an audio signal and evaluates its audio quality, and a statistical process is then conducted on the results. Therefore, there is a need for new methods that perform audio quality evaluation through consistent audio quality measurement, or that predict the result of an audio quality evaluation, without the listening evaluation by the listener and the statistical process required for the audio quality evaluation of a multi-channel audio codec.
[Summary of Invention]
[Technical Problem]
The need for an objective sound quality assessment technique for multi-channel audio signals has been growing with the development of multi-channel compression techniques and the common use of multi-channel systems.
One purpose of the present invention is to develop an assessment feature for objectively evaluating a multi-channel audio compression codec, along with a method for evaluating an audio compression codec using the assessment feature, an apparatus for the same, and a program-containing medium for conducting the method.
The scope of the present invention is not limited by the above described purpose of the present invention.
[Technical Solution]
For an objective audio quality assessment of a multi-channel audio signal, the quality prediction model in ITU-R Recommendation BS.1387-1 may be extended to multichannel audio coding systems, showing high performance in the prediction of the perceived quality. This extended model may use at least thirteen features: ten timbral features from ITU-R Rec. BS.1387-1 and three additional spatial features called ITDDist (Interaural Time Difference Distortion), ILDDist (Interaural Level Difference Distortion), and IACCDist (Interaural Cross-Correlation Distortion).
Especially, ITDDist can be used as an important feature for predicting errors in sound localization. ITDDist may be calculated only for low frequency bands, based on the claims that ITDs (Interaural Time Differences) have greater salience in low frequency bands, where interaural phase differences are unambiguous. However, based on many investigations saying that the ITDs in high frequency components are also important for sound localization, especially the ITDs in the temporal envelopes of high frequency signals, an Envelope ITDDist may be calculated in high frequency bands.
Generally, the human brain uses different processes to recognize the locations of low frequency sound and high frequency sound. The ITD is used for recognizing the location of a low frequency sound source. The excitation pattern of the basilar membrane, which is generated by low frequency sound excitation, is delivered to the MSO (Medial Superior Olive). Coincidence detection neurons may process the delivered signal to calculate the ITD. The human brain can then recognize a sound location by using the ITD.
On the other hand, for high frequency sound, the excitation pattern of the basilar membrane is delivered to the LSO (Lateral Superior Olive). Due to this, different levels of electric signals are produced at the two LSOs (left and right). The human brain can recognize a sound location using this interaural level difference of the electric signals. In addition to the interaural level difference, the human brain may also utilize the signal envelope information of high frequency sound for sound localization. Particularly, neurons located in the LSOs are sensitive to high frequency transposed tones. In addition, the neuron firing probabilities in auditory nerve fibers (ANFs) for high frequency transposed tones and low frequency tones are similar to each other. Sensitivity to the ITDs of the high frequency envelope can be equivalent to that to the ITDs in low frequency sound. Based on this phenomenon, it can be thought that the EITDs of high frequency components have as much influence on sound localization by human listeners as the ITDs of low frequency sound and the ILDs of high frequency sound do.
How the human brain processes the EITDs of high frequency components is an important issue. The central mechanisms related to the sensitivity to envelope-based ITDs are similar to those related to the sensitivity to fine-structure-based ITDs. If the central mechanisms of the two different cases share this similarity, the EITDs of high frequency components can be computed as derived by the coincidence detection neurons in the MSO, although the binaural cues for sound localization of high frequency sounds are extracted in the LSO. Therefore, the perceived EITDs of high frequency components can be computed by the cognition model used in the computation of the ITD for low frequency bands.
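As an illustrative sketch of this idea (an assumed procedure, not the patent's exact equations), the EITD and EIACC of one high frequency band can be estimated with a coincidence-style search over interaural lags of the left and right ear envelopes; the lag with the highest normalized cross-correlation gives the EITD, and the peak value itself is the EIACC:

```python
import numpy as np

def eitd_and_eiacc(env_left, env_right, fs, max_lag_samples):
    """Estimate EITD (seconds) and EIACC for one band from the two ear
    envelopes.  fs and max_lag_samples are hypothetical parameters."""
    norm = np.sqrt(np.sum(env_left**2) * np.sum(env_right**2))
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag_samples, max_lag_samples + 1):
        if lag >= 0:
            a, b = env_left[lag:], env_right[:len(env_right) - lag]
        else:
            a, b = env_left[:lag], env_right[-lag:]
        corr = np.dot(a, b) / norm   # normalized cross-correlation at this lag
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag / fs, best_corr  # EITD in seconds, EIACC
```

For identical left and right envelopes the search returns a zero lag and a unit correlation, as expected for a centered source.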
An audio quality measurement method according to one aspect of the present invention comprises a step for producing a number of model output variables (MOVs) including a variable representing envelope interaural time difference distortion (EITDDist) based on a comparison between a reference signal and a signal under test, and a step for mapping the number of model output variables to an audio quality value.
Computer readable medium is provided according to another aspect of the present invention. The computer readable medium is for storing computer instructions executable by a processor for modifying an operation of a device having a processor. The computer readable medium comprises computer code for producing a number of model output variables (MOVs) including a variable representing envelope interaural time difference distortion (EITDDist) based on a comparison between a reference signal and a signal under test, and mapping the number of model output variables to an audio quality value.
Computer readable medium is provided according to still another aspect of the present invention. The computer readable medium is for storing a set of computer instructions executable by a processor for modifying another set of computer instructions. The another set of computer instructions is for producing a number of model output variables (MOVs) based on comparisons between a reference signal and a signal under test and mapping the number of model output variables to an audio quality value. The computer readable medium comprises computer code for modifying the another set of computer instructions to have the number of model output variables comprise a variable representing envelope interaural time difference distortion (EITDDist) based on a comparison between the reference signal and the signal under test.
An audio quality measurement apparatus is provided according to still another aspect of the present invention. The apparatus comprises a producing means for producing a number of model output variables (MOVs) including a variable representing envelope interaural time difference distortion (EITDDist) based on a comparison between a reference signal and a signal under test, and a mapping means for mapping the number of model output variables to an audio quality value. The producing means and the mapping means may be parts of a processing unit configured to execute a set of instructions for the producing and the mapping.
In the above-described various aspects of the present invention, the EITDDist may be given by

EITDDist = (1 / (K · N)) · Σ_k Σ_n EITDDist[k, n]

, and the EITDDist[k, n] may represent a value of envelope interaural time difference distortion obtained by comparing the reference signal and the signal under test at the k-th frequency band of the n-th time frame, where K and N are the numbers of frequency bands and time frames, respectively. In this case, the EITDDist[k, n] may be given by

EITDDist[k, n] = (1/2) · (c_test[k, n] + c_ref[k, n]) · ΔEITD[k, n]

, and the ΔEITD[k, n] may represent the difference between the envelope interaural time differences of the reference signal and the signal under test at the k-th frequency band of the n-th time frame, the c_test[k, n] may represent a nonlinearly transformed value of the envelope interaural cross-correlation coefficient (EIACC) of the signal under test at the k-th frequency band of the n-th time frame, and the c_ref[k, n] may represent a nonlinearly transformed value of the envelope interaural cross-correlation coefficient (EIACC) of the reference signal at the k-th frequency band of the n-th time frame.
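The per-segment distortion and its averaging can be sketched numerically as follows; this is an illustrative reading, assuming ΔEITD is taken as an absolute difference (the text only calls it "a difference") and that the inputs are arrays of K bands by N frames:

```python
import numpy as np

def eitddist(eitd_test, eitd_ref, c_test, c_ref):
    """Sketch of the EITDDist computation: eitd_* are [K x N] EITD
    matrices, c_* the nonlinearly transformed EIACC weights."""
    delta = np.abs(eitd_test - eitd_ref)          # assumed |ΔEITD[k, n]|
    per_segment = 0.5 * (c_test + c_ref) * delta  # EITDDist[k, n]
    return per_segment.mean()                     # average over k and n
```

With unit weights the result is simply the mean EITD deviation, which matches the averaging over frequency bands and time frames described above.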
In the above-described various aspects of the present invention, the reference signal may be obtained from a multichannel audio signal, and the signal under test may be obtained from an output of a device under test to which the multichannel audio signal is inputted.
In the above-described various aspects of the present invention, at least one of the number of model output variables may be based on a comparison between excitation patterns of the reference signal and the signal under test.
In the above-described various aspects of the present invention, the EITDDist may be obtained by applying the reference signal and the signal under test to a filter bank.
[Advantageous Effects]
According to the present invention, the reliability of an objective assessment model for a multichannel audio codec can be increased by the use of the variable EITDDist.
The scope of the present invention is not restricted by the described advantageous effects.
[Description of Drawings]
FIG. 1 is a diagram illustrating a structure of a multi-channel audio reproduction system recommended by ITU-R, to which an embodiment of the present invention can be applied.
FIG. 2 is a diagram illustrating a structure of an apparatus for evaluating the audio quality of a multi-channel audio codec in accordance with an embodiment of the present invention.
FIG. 3 is a diagram describing an embodiment of sound transfer paths in accordance with an embodiment of the present invention.
FIG. 4 is a diagram describing the operation of one example of the preprocessing unit for binaural signal synthesis in accordance with an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method for evaluating an audio quality of a multi-channel audio codec in accordance with another embodiment of the present invention.
FIG. 6 is a flow chart for calculating an ILD distortion in accordance with one embodiment of the present invention.
FIG. 7 is a flow chart for calculating an EITD distortion in accordance with one embodiment of the present invention.
FIG. 8 is a sample envelope of an exemplary sound signal.
FIG. 9 shows a more detailed version of the flow chart of FIG. 7 calculating an EITD distortion.
[Mode for Invention]
The advantages, features and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter, so that a person skilled in the art will easily carry out the invention. Further, in the following description, well-known arts will not be described in detail if it seems that they could obscure the invention in unnecessary detail. Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.
ITU-R Recommendation BS.1116-1, "Methods for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems"; ITU-R Recommendation BS.1387-1, "Method for objective measurements of perceived audio quality", International Telecommunication Union, Geneva, Switzerland, 1998; ITU-R Recommendation BS.1534-1, "Method for the Subjective Assessment of Intermediate Sound Quality (MUSHRA)", International Telecommunication Union, Geneva, Switzerland, 2001; ISO/IEC JTC1/SC29/WG11 (MPEG), N7138, "Report on MPEG Spatial Audio Coding RM0 Listening Tests"; and ISO/IEC JTC1/SC29/WG11 (MPEG), N7139, "Spatial Audio Coding RM0 Listening Test Data" are incorporated herein by reference.
In general, a multi-channel audio has six channels (or 5.1 channels): front speakers (LF (left front) and RF (right front)), a center speaker (C), an intermediate and low sound channel (LFE: low frequency effect), and rear speakers (LS (left surround) and RS (right surround)). Among these, only the five channels of the front speakers (LF and RF), the center speaker (C), and the rear speakers (LS and RS) are used in some embodiments of the present invention, since the LFE is not actually used in many cases.
FIG. 1 is a diagram illustrating a structure of a multi-channel audio reproduction system recommended by ITU-R, to which an embodiment of the present invention may apply.
In the multi-channel audio reproduction system recommended by the ITU-R, as shown in FIG. 1, the five channel speakers may be arranged on the line of one circle centered around a listener 10, wherein the front left and right speakers L and R and the listener 10 form a regular triangle. The distance between the center speaker C in the front and the listener 10 may be equal to that between the front left and right speakers L and R. And the rear left and right speakers LS and RS may be placed on the concentric circle at 100 to 120 degrees with respect to the front, which is 0 degrees.
The reason why the reproduction system is arranged to conform to the standard arrangement recommended by the ITU-R is that the intended audio quality (the best audio quality) can be obtained by doing so, because most sources are edited/recorded based on the standard arrangement.
In some embodiments of the present invention, the listener 10 of the multi-channel audio reproduction system recommended by the ITU-R is substituted by an audio quality evaluation apparatus for a multi-channel audio codec, which evaluates the audio quality of the codec by measuring the impulse responses of multi-channel audio signals from the five channel speakers L, R, C, LS and RS by using a binaural microphone that simulates the body (the head and upper half).
FIG. 2 is a diagram illustrating a structure of an apparatus for evaluating an audio quality of a multi-channel audio codec in accordance with an embodiment of the present invention.
As shown in FIG. 2, the audio quality evaluation apparatus 10 of a multi-channel audio codec may include a preprocessing unit 11 for synthesizing binaural signals L_ref, R_ref, L_test, R_test based on the multi-channel audio signals transmitted through the channels L, R, C, LS and RS of a standard multi-channel audio reproduction system recommended by the ITU-R; an output variable calculator 12 for calculating MOVs (Model Output Variables) including IACCDist (Interaural Cross-correlation Coefficient Distortion), ILDDist (Interaural Level Difference Distortion), and EITDDist (Envelope Interaural Time Difference Distortion); and an artificial neural network circuit 13 for outputting a grade of the audio quality on the basis of the MOVs calculated by the output variable calculator 12.
Here, IACC represents the maximum value of the normalized cross-correlation function between the left ear input and the right ear input. ILD denotes the ratio of the intensities of the signals at the left ear input and the right ear input. And EITD represents the time difference between the audio signal envelopes inputted through the left and right ears, particularly for high frequency band audio signals.
The following is a brief explanation of the operation of each of the components of the audio quality evaluation apparatus of the multi-channel audio codec according to the invention. The five channel signals of sound sources which are encoded and decoded by a multi-channel audio codec to be evaluated are indicated by LF_test, RF_test, C_test, LS_test, RS_test, and the five channel signals of their original sound sources are denoted by LF_ref, RF_ref, C_ref, LS_ref, RS_ref. In this document, the above LF_test, RF_test, LF_ref, RF_ref may also be denoted as L_test, R_test, L_ref, R_ref.
Above all, a total of ten signals, LF_test, RF_test, C_test, LS_test, RS_test, LF_ref, RF_ref, C_ref, LS_ref, RS_ref, may be inputted to the preprocessing unit 11. The preprocessing unit 11 may convolve the head related impulse responses of the corresponding azimuth angles, which simulate the transfer function of the sound propagation path including the body (head and torso) of a listener, with the five channel test signals and the five channel reference signals, and sum up the convolutions, to thereby calculate the binaural signals L_ref, R_ref, L_test, R_test. The purpose of this process is to simulate the acoustical environment in the audio reproduction layouts, and the process is illustrated as a block diagram in FIG. 4.
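The convolve-and-sum operation described above can be sketched as follows; the channel ordering and HRIR arrays are placeholders, and a real system would use measured head related impulse responses for the five azimuths:

```python
import numpy as np

def synthesize_binaural(channels, hrirs_left, hrirs_right):
    """channels: five 1-D arrays (LF, RF, C, LS, RS); hrirs_left/right:
    the matching head-related impulse responses for each azimuth.
    Each channel is convolved with its HRIR and the results are summed
    into one signal per ear."""
    left = sum(np.convolve(x, h) for x, h in zip(channels, hrirs_left))
    right = sum(np.convolve(x, h) for x, h in zip(channels, hrirs_right))
    return left, right
```

Running this for the five test channels and again for the five reference channels yields the four binaural signals used by the later stages.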
In this configuration, the total number of the sound transfer paths is ten, due to the five locations of loudspeakers and two ears of a listener, which may be represented by graphs as depicted in FIG. 3.
The output variable calculator 12 calculates MOVs including IACCDist, ILDDist, and EITDDist. The variables IACCDist and ILDDist mirror degradations in the attributes of spatial quality. The calculated MOVs may then be provided to the artificial neural network circuit 13. The artificial neural network circuit 13 may output a grade of the audio quality based on the MOVs provided from the output variable calculator 12. The grade of the audio quality may be referred to as the ODG (Objective Difference Grade).
Here, the output variable calculator 12 may calculate ILDDist from the binaural signals L_ref, R_ref, L_test, R_test inputted from the preprocessing unit 11, by using the following Equations 1 and 2. The ILD of the uncompressed original audio signal may be denoted as ILD_ref, and the ILD of the audio signal which is encoded and decoded by the multi-channel audio codec under test may be denoted as ILD_test. Also, the IACC may be named in a similar way. For the calculation of IACC and ILD, the binaural signals may be converted into time-frequency segment signals with 75% overlapped time frames (of length equivalent to 50 ms for IACC, and to 10 ms for ILD) and 24 auditory critical band filter-banks. Among these, the ILDDist for the k-th frequency band of the n-th time frame may be represented as ILDDist[k,n].
[Equation 1]
ILDDist[k, n] = w[k] · |ILD_test[k, n] - ILD_ref[k, n]|
In Equation 1, ILDDist represents the interaural level difference distortion, and w[k] represents a weighting function that is decided depending on the range of the critical band, which reflects the intensity level of a time-frequency segment and the auditory sensitivity to the ILD.
Meanwhile, to acquire the ILDDist[n] of the entire auditory band in the n-th time frame, an average is taken over all frequency bands, as in the following Equation 2.
[Equation 2]
ILDDist[n] = (1/24) · Σ_{k=0}^{24-1} ILDDist[k, n]
By averaging the ILDDist[n] again over all time frames, the ILDDist of the multi-channel audio codec can be calculated, and the IACC can also be calculated in the same way. Here, IACCDist may be named ICCDist. Since ICCDist and ILDDist have high correlation with the audio quality evaluation (subjective evaluation) results of the multi-channel audio codec by listeners, the output variable calculator 12 can regard ICCDist and ILDDist as output variables. Variables including ICCDist and ILDDist may be inputted to the artificial neural network circuit 13, to thereby output a one-dimensional grade of the audio quality with objectivity and consistency.
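Equations 1 and 2 together with the final averaging can be sketched as follows; treating the per-band deviation as an absolute difference is an assumption, and the 24-band shape follows the critical band filter-bank described above:

```python
import numpy as np

def ilddist(ild_test, ild_ref, w):
    """ild_test/ild_ref: [24 bands x N frames] ILD matrices; w: per-band
    weighting function w[k].  Returns the codec-level ILDDist."""
    per_segment = w[:, None] * np.abs(ild_test - ild_ref)  # Equation 1
    per_frame = per_segment.mean(axis=0)                   # Equation 2
    return per_frame.mean()                                # average over frames
```

The same two-stage averaging applies to ICCDist, with the IACC deviation substituted for the ILD deviation.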
Details of the method of calculating EITDDist in the output variable calculator 12 will be explained with reference to FIG. 8.
FIG. 4 is a diagram describing the operation of one example of the preprocessing unit of the audio quality evaluation apparatus in accordance with an embodiment of the present invention.
As shown in FIG. 4, the preprocessing unit 11 of the audio quality evaluation apparatus 10 converts the impulse response of each sound transfer path, which is measured by using an interaural microphone that simulates the body (the head and upper half) in the standard multichannel audio reproduction system recommended by the ITU-R, into a transfer function, and sums up the transfer functions, to thereby calculate the interaural input signals L_ref, R_ref, L_test, R_test.
FIG. 5 illustrates a flowchart for a method of evaluating an audio quality of a multi-channel audio codec in accordance with an embodiment of the present invention.
First of all, the preprocessing unit 11 of the audio quality evaluation apparatus 10 for a multi-channel audio codec converts the impulse responses of each of the sound sources which are encoded and decoded by the multi-channel audio codec and of the original sound sources into transfer functions, and sums up the transfer functions, to thereby calculate the interaural input signals L_ref, R_ref, L_test, R_test (S501). Thereafter, the output variable calculator 12 may calculate MOVs including IACCDist, ILDDist, and EITDDist from the time-frequency segments of the binaural signals provided by the preprocessing unit 11 (S502). The calculated MOVs may then be applied to the artificial neural network circuit 13 (S503). Finally, the artificial neural network circuit 13 may output an objective audio quality grade based on the MOVs produced by the output variable calculator 12 (S504).
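Steps S503 and S504 can be sketched as a minimal one-hidden-layer network; the layer sizes and weights here are placeholders, since a usable model would be trained on subjective listening scores:

```python
import numpy as np

def ann_grade(movs, w_hidden, b_hidden, w_out, b_out):
    """Map a vector of MOVs (e.g. IACCDist, ILDDist, EITDDist, ...) to a
    single objective difference grade.  All weights are hypothetical."""
    hidden = np.tanh(movs @ w_hidden + b_hidden)  # sigmoidal hidden layer
    return float(hidden @ w_out + b_out)          # scalar quality grade
```

The choice of tanh activation mirrors the sigmoidal units commonly used in such perceptual mapping networks; any trained regression stage could stand in its place.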
Referring back to FIG. 2, the output variable calculator 12 may further produce EITDs from the binaural signals L_ref, R_ref, L_test, R_test produced by the preprocessing unit 11. The produced EITDs may be inputted to the artificial neural network circuit 13.
Audio quality degradation caused by a change of audio signal location is one of the important evaluation factors. According to the classical Duplex theory, the location of an audio signal can be recognized by the ILD for high frequency components. In addition to the ILD, the EITD of the high frequency components of an audio signal influences the mechanism of recognizing the location of an audio signal.
In one embodiment of the present invention, a method for calculating the ILD and/or EITD of a high frequency component is provided.
For evaluation of the objective performance of multichannel audio, quantitative analysis of the distortion of spatial impression is required as well as quantitative analysis of the distortion of sound tone. Distortion of sound location is one of the important factors for evaluating a distortion of spatial impression. Because the human brain uses the ILD and EITD to recognize the location of high frequency sound, a quantitative evaluation of audio signal quality may use parameters including the ILD and EITD. The ILD and EITD may each be calculated both for a reference signal (i.e., the original signal) and a test signal (i.e., a signal coded and decoded from the reference signal by a codec). ILDDist or EITDDist may be calculated using the cognitive distance or difference between the ILDs or EITDs obtained from the reference signal and the test signal, respectively. In order to calculate the ILD and EITD of a high frequency audio signal, the multichannel audio signals may be synthesized into binaural signals. HRTFs (Head-Related Transfer Functions) may be used for synthesizing the binaural signals; an HRTF represents the audio signal transfer path from each speaker to the left or right ear. The ILD and EITD of a high frequency audio signal may be calculated using the synthesized binaural signals.
FIG. 6 is a flow chart for calculating an ILD distortion in accordance with one embodiment of the present invention.
Referring to FIG. 6, a binaural synthesis part 601 may produce binaural signals L̂ref, R̂ref of a reference signal and binaural signals L̂test, R̂test of a test signal, using the above described input signals LFtest, RFtest, Ctest, LStest, RStest, LFref, RFref, Cref, LSref, RSref. A peripheral ear model part 602 may produce excitation patterns of the reference signal and the test signal by using the binaural signals L̂ref, R̂ref, L̂test, R̂test. An envelope extraction part 603 may produce envelopes of the excitation patterns of the reference signal and envelopes of the excitation patterns of the test signal, respectively. A cognition model part 604 may calculate an ILDDist value of a high frequency band by using the envelopes from the envelope extraction part 603.
The binaural synthesis part 601 of FIG. 6 may correspond to the preprocessing unit 11 of FIG. 2. The peripheral ear model part 602, envelope extraction part 603, and cognition model part 604 of FIG. 6 may be included in the output variable calculator 12 of FIG. 2.
ILD may be defined as an energy difference between the signals inputted to a peripheral ear model of the left and right ears, which is composed of a plurality of band-pass filters whose center frequencies are determined by the ERB (Equivalent Rectangular Bandwidth) scale, and may be represented by Equation 3. The peripheral ear model calculates the excitation patterns at the basilar membrane from the audio signals inputted to the left and right ears.
[Equation 3]

ILD[k, n] = 10·log10( Σ_m x_L²[k, m] / Σ_m x_R²[k, m] )

where x_L and x_R denote the left- and right-ear signals of the k-th frequency band within the n-th time frame.
Although the energy difference between the signals inputted from the left and right ears can be expressed as Equation 3, the human brain may process a given ILD in a different way. When a non-zero ILD is given, the higher level signal among the signals from the left and right ears may cause more frequent neural spikes in the IC (Inferior Colliculus), which processes the ILD. Because a model for the number of neural spikes occurring in the IC follows a tangential sigmoid function, the calculated ILD value may further be nonlinearly transformed by a tangential sigmoid function, as represented in Equations 4 and 5.
[Equation 4]

ILD′_test[k, n] = tansig( S_k · (ILD_test[k, n] − T_k) )

[Equation 5]

ILD′_ref[k, n] = tansig( S_k · (ILD_ref[k, n] − T_k) )

where ILD′ denotes the nonlinearly transformed ILD, S_k the per-band gradient (steepness), and T_k the threshold of the tangential sigmoid function.
The gradient of the tangential sigmoid function has a different sign (positive or negative) according to the energy difference of the ear input signals. If the signal from the left ear is larger than that of the right ear, the gradient may have a positive sign. To the contrary, if the signal from the right ear is larger than that of the left ear, the gradient may have a negative sign. In addition, in order to reflect the sensitivity of the neural spike occurring mechanism in the IC in each frequency band, the tangential sigmoid function may have a different gradient for each frequency band. In Equations 4 and 5, 'T_k' represents the threshold of the tangential sigmoid function, and may be zero (0) in the case of ILD. Then, an ILDDist may be represented as Equation 6 for a time-frequency segmented signal.
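The nonlinear transform described above can be sketched in Python. Note that `tansig` is the MATLAB name for the hyperbolic tangent sigmoid, and the steepness value used in the test below is an assumed placeholder — the text states only that the gradient differs per frequency band and that T_k is zero for ILD.

```python
import numpy as np

def tansig(x):
    # MATLAB-style tangential sigmoid; mathematically identical to tanh(x)
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def transform_ild(ild_db, steepness_k, threshold_k=0.0):
    # Equations 4 and 5: nonlinear transform of an ILD value.
    # threshold_k (T_k) is zero in the ILD case, as stated in the text;
    # steepness_k (S_k) is band-dependent and assumed here.
    return tansig(steepness_k * (ild_db - threshold_k))
```

Because the transform saturates toward ±1, large ILDs of either sign map to values near ±1, mirroring the saturating spike-rate model described above.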
[Equation 6]

ILDDist[k, n] = ILD′_test[k, n] − ILD′_ref[k, n]

where ILD′ denotes the nonlinearly transformed ILD of Equations 4 and 5.
A resulting ILDDist may be obtained by calculating a mean value of the ILDDist[k, n] values over the whole frequency bands and time frames, and may be represented as Equation 7. The resulting ILDDist may be regarded as a cognitive distance between a test signal and a reference signal due to the ILD.
[Equation 7]

ILDDist = (1/N)·Σ_{n=1}^{N} (1/Z)·Σ_{k=1}^{Z} ILDDist[k, n]

where 'N' and 'Z' represent the numbers of time frames and frequency bands, respectively.
An EITDDist represents a cognitive distance between the audio signal location of a test signal and that of a reference signal, which arises due to the difference between the EITDs of the test and reference signals. EITDDist, along with ILDDist, may be used as a feature for evaluating the spatial impression distortion that occurs due to the difference of high frequency audio signal source locations. FIG. 7 is a flow chart for calculating an EITDDist in accordance with an embodiment of the present invention.
Referring to FIG. 7, the binaural synthesis part 701 may produce binaural signals L̂ref, R̂ref of a reference signal and binaural signals L̂test, R̂test of a test signal, using the above described input signals LFtest, RFtest, Ctest, LStest, RStest, LFref, RFref, Cref, LSref, RSref. The binaural synthesis part 701 of FIG. 7 may correspond to the preprocessing unit 11 of FIG. 2.
At the binaural synthesis part 701, multichannel sound sources may be synthesized into binaural signals, which are represented as L̂ and R̂, using HRTFs recorded in a reference listening room as recommended in ITU-R Rec. BS.1116-1. In this case, the LFE channel may be adjusted to have a zero (0) value for every sound source. Equation 8 may be used to synthesize binaural signals from the five channel signals. Herein, the subscripts 'test' and 'ref' represent a test signal and a reference signal, respectively.
[Equation 8]

L̂ = H_C,L * C + H_LF,L * LF + H_RF,L * RF + H_LS,L * LS + H_RS,L * RS
R̂ = H_C,R * C + H_LF,R * LF + H_RF,R * RF + H_LS,R * LS + H_RS,R * RS

where '*' denotes convolution.
H_C,L, H_LF,L, H_RF,L, H_LS,L, H_RS,L, H_C,R, H_LF,R, H_RF,R, H_LS,R, H_RS,R of Equation 8 represent a total of ten BRTFs (Binaural Room Transfer Functions), which represent the acoustic wave paths from each speaker to the left and right ears. Further, L̂ and R̂ of Equation 8 represent the acoustic wave inputs at the left ear and the right ear, respectively.
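The downmix of Equation 8 can be sketched as a convolve-and-sum over the five loudspeaker channels. The impulse responses passed in below are caller-supplied placeholders, not measured BRTFs.

```python
import numpy as np

def binaural_synthesis(channels, brtf_left, brtf_right):
    # Sketch of Equation 8: each loudspeaker channel ('C', 'LF', 'RF',
    # 'LS', 'RS') is convolved with its BRTF toward the left (or right)
    # ear and the results are summed into one ear signal.
    n = len(next(iter(channels.values())))
    left = np.zeros(n)
    right = np.zeros(n)
    for name, sig in channels.items():
        left += np.convolve(sig, brtf_left[name])[:n]
        right += np.convolve(sig, brtf_right[name])[:n]
    return left, right
```

With unit-impulse "BRTFs" the left ear signal reduces to the plain sum of the five channels, which is a convenient sanity check.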
The synthesized binaural signals can be processed by a peripheral ear model. Input signals from the two ears (left and right) are delivered to the middle ears and then processed in the cochleas, and this process can be reproduced by the peripheral ear model. A cochlea simulator in the peripheral ear model may transform the binaural signals into signals which stimulate the hair cells at the basilar membrane. The cochlea simulator may be regarded as a filter bank which is composed of a total of 24 band-pass filters with center frequencies determined by the ERB (Equivalent Rectangular Bandwidth) scale. The signals passed through the cochlea simulator may be transformed into excitation patterns of the signals filtered by the respective band-pass filters. The peripheral ear model part 702 of FIG. 7 may produce excitation patterns of the reference signal and the test signal by using the binaural signals L̂ref, R̂ref, L̂test, R̂test.
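One way such a filter bank could lay out its 24 center frequencies is uniform spacing on the ERB-rate scale, using the well-known Glasberg-Moore formulas. The exact filter shapes and the 50 Hz to 18 kHz range chosen here are assumptions, not taken from this text.

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    # Glasberg & Moore ERB-rate (ERB-number) scale
    return 21.4 * np.log10(4.37e-3 * f_hz + 1.0)

def erb_rate_to_hz(erb):
    # Inverse of the ERB-rate mapping
    return (10.0 ** (erb / 21.4) - 1.0) / 4.37e-3

def erb_center_frequencies(n_bands=24, f_lo=50.0, f_hi=18000.0):
    # Place n_bands center frequencies uniformly on the ERB-rate scale;
    # the frequency range is an assumed design choice.
    erbs = np.linspace(hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi), n_bands)
    return erb_rate_to_hz(erbs)
```

Uniform ERB-rate spacing concentrates bands at low frequencies, roughly matching cochlear frequency resolution.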
The envelope extraction part 703 of FIG. 7 may produce envelopes of the excitation patterns of the reference signal and envelopes of the excitation patterns of the test signal, respectively.
Envelopes of the excitation patterns can be extracted by the discrete Hilbert Transform. The envelope is obtained as the square root of the sum of the squares of the excitation pattern and its Hilbert-transformed value. FIG. 8 shows an example of an extracted envelope, in which the solid line represents a full-wave rectified excitation pattern and the dashed line represents the extracted envelope. An EITD can be obtained by calculating a binaural time difference of the extracted envelope.
As shown in Equation 9, the output signals from the ERB-scale auditory filter bank can be denoted as x[k, n], a time-frequency segmented signal. Here, 'k' and 'n' represent the index numbers of the frequency band and time frame, respectively. The envelope signal E[k, n] can be computed using the discrete Hilbert-transformed signal H{x[k, n]} as shown in Equation 9. In this document, x[k, n] can also be denoted as r[k, n], and H{x[k, n]} can also be denoted as i[k, n].
[Equation 9]
E[k, n] = sqrt( x²[k, n] + H{x[k, n]}² )
In Equation 9, 'k' represents a frequency band index which is segmented by a peripheral ear model, and 'n' represents a time frame index which is being processed.
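Equation 9 can be sketched directly in Python. To keep the sketch self-contained, the discrete Hilbert transform is built from the FFT-based analytic-signal construction rather than from a library call.

```python
import numpy as np

def discrete_hilbert(x):
    # Discrete Hilbert transform via the FFT analytic-signal weights:
    # double positive frequencies, zero negative ones, keep DC/Nyquist.
    n = len(x)
    spec = np.fft.fft(x)
    w = np.zeros(n)
    w[0] = 1.0
    if n % 2 == 0:
        w[1:n // 2] = 2.0
        w[n // 2] = 1.0
    else:
        w[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spec * w)
    return analytic.imag  # H{x}

def envelope(x):
    # Equation 9: E[k, n] = sqrt(x^2 + H{x}^2)
    x = np.asarray(x, dtype=float)
    return np.sqrt(x ** 2 + discrete_hilbert(x) ** 2)
```

For a pure tone with an integer number of cycles in the frame, the extracted envelope is flat at the tone's amplitude, which matches the dashed line behavior described for FIG. 8.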
The cognition model part 704 of FIG. 7 may calculate an EITDDist value of a high frequency band by using the envelopes from the envelope extraction part 703.
For time-frequency segmented left and right ear input signals, high frequency EITDs can be computed using the time-segmented normalized cross-correlation function (NCF) as described in Equation 10.
[Equation 10]

NCF[d, k, n] = Σ_m E_LX,n[k, m]·E_RX,n[k, m + d] / sqrt( (Σ_m E_LX,n²[k, m])·(Σ_m E_RX,n²[k, m + d]) )

where the sums run over the samples m of the n-th time frame.
Here, 'E_LX,n' and 'E_RX,n' represent the envelope signals of the excitation patterns for the left and right ears, respectively. 'd', 'k' and 'n' represent the time lag in sample units, the frequency band index and the time frame index, respectively. In this document, E_LX,n and E_RX,n can also be denoted as X_LX,n and X_RX,n, respectively. The interaural cross-correlation coefficient (IACC) may be defined as the maximum value of the NCF over all d, and the interaural time difference (ITD) as the d value that yields that maximum NCF value. The cross-correlation may be calculated with an approximately 10 ms-length rectangular window, overlapped by 7/8. EITDs and EIACCs (Envelope InterAural Cross-Correlation) can be expressed by Equations 11 and 12 for time-frequency segmented signals.
[Equation 11]

EITD[k, n] = arg max_{|d| ≤ N} NCF[d, k, n]

[Equation 12]

EIACC[k, n] = max_{|d| ≤ N} NCF[d, k, n]
In Equations 11 and 12, 'N' represents the range of 'd', i.e., the maximum theoretically possible ITD value in samples. EITDs and EIACCs are measured for the reference and test signals; the subscripts 'ref' and 'test' represent the corresponding signals, respectively.
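The lag search of Equations 10 to 12 can be sketched as follows. The plain normalized product used here is an assumption, since the exact windowing inside the NCF image is not recoverable from this text.

```python
import numpy as np

def ncf(env_l, env_r, d):
    # Normalized cross-correlation of two envelope frames at lag d
    # (a sketch of Equation 10).
    if d >= 0:
        a, b = env_l[:len(env_l) - d], env_r[d:]
    else:
        a, b = env_l[-d:], env_r[:len(env_r) + d]
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def eitd_eiacc(env_l, env_r, max_lag):
    # Equations 11 and 12: EITD is the lag maximizing the NCF,
    # EIACC is that maximum NCF value.
    lags = list(range(-max_lag, max_lag + 1))
    values = [ncf(env_l, env_r, d) for d in lags]
    best = int(np.argmax(values))
    return lags[best], values[best]
```

Feeding in one envelope and a copy of it delayed by a few samples recovers that delay as the EITD, with an EIACC near 1.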
Since the perceptual change of the source direction can be approximated as a Euclidean distance between two different positions on the unit circle, the EITD difference between a test signal and a reference signal can be computed as shown in Equation 13. That is, the difference between the EITDs of a test signal and a reference signal can be computed as the distance between two unit vectors whose phase angles correspond to the EITDs. In Equation 13, f_s denotes the sampling rate and N is the maximum ITD in sample numbers.
[Equation 13]

ΔEITD[k, n] = sqrt( 2 − 2·cos( π·f_s·(EITD_test[k, n] − EITD_ref[k, n]) / N ) )
After the EITDs are calculated, the next process has to consider that EITD detection may fail in some cases: if the EIACC is too low, the perceived source location is ambiguous. Thus, a decision factor that considers the certainty of the computed EITDs is applied. This certainty can be modeled by a tangential sigmoid function that transforms the EIACCs nonlinearly, as shown in Equations 14 and 15. That is, the EIACCs may be transformed nonlinearly by a tangential sigmoid function in order to account for the case in which the detection of sound locations fails for a too low EIACC. The EIACC values can be nonlinearly transformed by Equations 14 and 15 for a reference signal and a test signal.
[Equation 14]

c_test[k, n] = tansig( S·(EIACC_test[k, n] − T_k) )

[Equation 15]

c_ref[k, n] = tansig( S·(EIACC_ref[k, n] − T_k) )
In Equations 14 and 15, the tangential sigmoid function used in this model may have a steepness S of 50, and the threshold T_k may have a different value in each frequency band, since each frequency band has a different sensitivity to ITDs.
When these certainty factors are applied to ΔEITD[k, n], an EITDDist value can be obtained as shown in Equation 16. That is, the EITD distortion can be computed by applying the nonlinearly transformed EIACC values as certainty factors to Equation 13.
[Equation 16]

EITDDist[k, n] = (1/2)·(c_test[k, n] + c_ref[k, n]) · ΔEITD[k, n]
The resulting EITDDist may be obtained by averaging the EITDDist[k, n] values over frequency bands and time frames, as expressed in Equation 17.
[Equation 17]

EITDDist = (1/N)·Σ_{n=1}^{N} sqrt( (1/Z)·Σ_{k=1}^{Z} EITDDist[k, n] )

where 'N' and 'Z' represent the numbers of time frames and frequency bands, respectively.
The peripheral ear model part 702, envelope extraction part 703, and cognition model part 704 of FIG. 7 may be included in the output variable calculator 12 of FIG. 2.
FIG. 9 shows a more detailed version of the flow chart of FIG. 7 for calculating an EITD distortion.
Referring to FIG. 9, the binaural synthesis part 901 may produce binaural signals from multichannel signals, using Equation 8. The peripheral ear model part 902 may produce excitation patterns of a reference signal and a test signal by using the binaural input signals. The envelope extraction part 903 may produce envelopes of the excitation patterns of the reference signal and envelopes of the excitation patterns of the test signal by using Equation 9, respectively. The NCF part 904 can calculate EITDs and EIACCs using the obtained envelopes. The EITD distortion computation part 905 can calculate an EITDDist value using the EITDs and EIACCs of the test and reference signals. In FIG. 9, the subscripts 'R', 'L', 'test', 'ref', 'k' and 'n' represent the right channel, left channel, test signal, reference signal, frequency band index, and time frame index, respectively.
The EITDDist-obtaining method that uses Equations 13 to 17 can be modified into a method using the following Equations 18 and 19. Before obtaining an EITDDist value from the EITD values calculated by Equation 11, the EIACC values may be nonlinearly transformed by applying a tangential sigmoid function as shown in Equations 14 and 15.
The transformed EIACC value can be used as a weighting factor which can be applied to the EITD values. Then, a cognitive EITD distance can be calculated from the weighted EITD values. Since the perceptual change of the source direction can be approximated as the Euclidean distance between two different positions on the unit circle, the EITD difference can be computed as in Equation 18. In Equation 18, f_s denotes the sampling rate and N is the maximum ITD in sample numbers. In this document, c_test[k, n] and c_ref[k, n] of Equation 18 may also be denoted as ρ_test[k, n] and ρ_ref[k, n].
[Equation 18]

ΔEITD[k, n] = sqrt( 2 − 2·cos( π·f_s·(c_test[k, n]·EITD_test[k, n] − c_ref[k, n]·EITD_ref[k, n]) / N ) )
The resulting EITDDist is averaged over frequency bands and time frames, as expressed in Equation 19. The resulting EITDDist represents a mean value of the EITD distances, i.e., a cognitive distance between the reference and test signals due to the EITD value difference.

[Equation 19]

EITDDist = (1/N)·Σ_{n=1}^{N} sqrt( (1/Z)·Σ_{k=1}^{Z} EITDDist[k, n] )
According to one embodiment of the present invention, an audio signal evaluation apparatus may comprise a preprocessing means adapted to produce binaural input signals from the multichannel audio signals of each channel (L, R, C, LS, RS) of a multichannel audio reproducing system, an output variable calculating means adapted to output model output values including IACCDist, ILDDist and EITDDist values, and a neural network circuit means adapted to output an audio quality level based on the model output variables.
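The three-stage apparatus can be sketched as a minimal pipeline stage. The single linear layer below is an arbitrary stand-in for the trained artificial neural network circuit, whose weights this text does not specify numerically.

```python
import numpy as np

class AudioQualityModel:
    # Sketch of the claimed mapping stage: MOVs in, quality value out.
    # A linear layer with a sigmoid squashing stands in for the
    # (unspecified) trained artificial neural network circuit.

    def __init__(self, weights, bias):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = float(bias)

    def map_to_quality(self, movs):
        # Map model output variables (e.g. IACCDist, ILDDist, EITDDist,
        # plus tone-distortion MOVs) to one bounded audio quality value.
        z = float(np.dot(self.weights, np.asarray(movs, dtype=float))) + self.bias
        return 1.0 / (1.0 + np.exp(-z))
```

With negative weights, larger distortion-feature values push the score down, matching the negative correlations reported in Table 1.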
To train and verify a prediction model used in one embodiment of the present invention, a listening test database distributed by the ISO/MPEG audio group was used. The subjective listening tests followed the procedures recommended in ITU-R Rec. BS.1534-1, "Multiple Stimulus with Hidden Reference and Anchor" (MUSHRA). 11 different test signals were used in the listening tests, and each test excerpt was encoded and decoded by 11 different multichannel audio coding systems. Consequently, the listening test database contains 121 items.
Table 1 shows the correlation coefficients between the subjective listening evaluation result and the 14 evaluation features (MOVs) used for the objective evaluation scheme.
[Table 1]

MOV          correlation coefficient
ADB          -0.68
NMRtoB       -0.51
NLoudB       -0.51
AModDif1B    -0.45
WModDif1B    -0.44
RDF          -0.43
EHS          -0.43
AModDif2B    -0.36
AvgBwRef     -0.06
AvgBwTst     -0.00
ILDD         -0.78
IACCD        -0.62
ITDD         -0.61
EITDD        -0.72
Each correlation coefficient ρ_X,Y can be calculated as in Equation 20.
[Equation 20]

ρ_X,Y = Cov(X, Y) / (σ_X·σ_Y) = ( E[XY] − E[X]·E[Y] ) / (σ_X·σ_Y)
In Equation 20, X and Y represent the MOS data and the data of each feature, respectively. The correlation coefficients between the fourteen features (MOVs) and the subjective listening evaluation result were calculated for the 121 signals synthesized from binaural signals. Among the fourteen features, the last four represent the degree of degradation of spatial impression. The ten model output values and the four spatial features are listed in Tables 2 and 3, respectively.
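Equation 20 translates directly into a few lines of numpy, using population statistics so the covariance and standard deviations share the same normalization.

```python
import numpy as np

def pearson(x, y):
    # Equation 20: rho = (E[XY] - E[X]E[Y]) / (sigma_X * sigma_Y)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean(x * y) - np.mean(x) * np.mean(y)
    return cov / (np.std(x) * np.std(y))
```

A perfectly proportional feature gives +1, and a perfectly inversely proportional one gives -1, which is why the uniformly negative values in Table 1 are compared by absolute magnitude.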
[Table 2]

MOV          Description
AModDif1B    Averaged modulation difference
AModDif2B    Averaged modulation difference with emphasis on introduced modulations and modulation changes where the reference contains little or no modulations
WinModDifB   Windowed averaged difference in modulation (envelopes) between Reference Signal and Signal Under Test
RDF          Relative fraction of frames for which at least one frequency band contains a significant noise component
NLoudB       Rms value of the averaged noise loudness with emphasis on introduced components
[Table 3]

MOV        Description
ITDDist    Cognitive distance for the audio signal direction difference between a test signal and a reference signal, arising due to interaural time difference
ILDDist    Cognitive distance for the audio signal direction difference between a test signal and a reference signal, arising due to interaural level difference
IACCDist   Cognitive distance for the audio signal width and ambience between a test signal and a reference signal, arising due to interaural cross-correlation coefficient difference
EITDDist   Cognitive distance for the audio signal direction difference, arising due to high frequency envelope interaural time difference
Because every MOV has a negative correlation value with respect to the subjective listening evaluation result, a larger absolute value of a correlation coefficient in Table 1 means a better performance for audio quality prediction. As shown in Table 1, EITDDist shows a considerably high absolute coefficient value of 0.72, which is larger than the 0.62 of IACCDist, the 0.61 of ITDDist, and those of the ten audio tone distortion features. From this result, it can be understood that the high frequency envelope information plays an important role in spatial impression and overall audio quality for multichannel audio signals. In addition, compared to the MOVs used in ITU-R Rec. BS.1387-1, the four features in Table 3 have larger or similar correlation coefficient values. Based on this result, it can be understood that spatial impression, as well as audio tone, is very important for multichannel audio quality assessment.
Each of the above MOVs can be used as an input variable for a prediction model for an objective audio quality evaluation. An objective audio quality prediction model for a multichannel audio coding system can show better prediction performance when an MOV representing spatial impression distortion that has a high correlation coefficient with respect to a subjective listening evaluation result is added to the model. EITDDist can be used as a model output variable for evaluating spatial impression distortion in an objective audio quality prediction model. Particularly, because EITDDist has a high correlation with the subjective listening evaluation result, one can improve the performance of an objective audio quality prediction model for a multichannel audio coding system by adding EITDDist to the model as an input feature.
According to one embodiment of the present invention, the performance of an objective audio quality evaluation model can be improved by providing spatial impression features. An evaluation model reflecting cognitive differences can be provided by mathematically modeling the audio signal processing inside the human brain using the spatial features.
The present invention differs from a conventional method that simply provides a distortion level between an original signal and its coded/decoded signal at individual frequency bands. The present invention obtains a result that is similar to a statistically processed result of subjective audio quality evaluations in a multichannel audio reproduction environment. According to an embodiment of the present invention, the listening evaluation and statistical processing procedures can be omitted.
An embodiment of the present invention can be used for an audio compression codec performance evaluating method/apparatus in order to compare cognitive sound qualities of a reference signal and a test signal (i.e., signal under test) which is coded and decoded from the reference signal using the audio compression codec.
In some embodiments of the present invention, the artificial neural network circuit may be substituted by a general digital signal processing unit. That is, the artificial neural network circuit in this document was introduced as an exemplary digital signal filter. Therefore, the scope of the present invention is not limited to the accompanying drawings and their related descriptions.
According to one embodiment of the present invention, features that influence spatial impression recognition can be obtained based on psycho-acoustical and physiological research results, and the performance of an objective evaluation model for a multichannel audio codec can be improved by implementing the features with appropriate mathematical models. The method of an embodiment of the present invention as mentioned above may be implemented by a software program that is stored in a computer-readable storage medium such as a CD-ROM, RAM, ROM, floppy disk, hard disk, optical magnetic disk, or the like. This process may be readily carried out by those skilled in the art; therefore, details thereof are omitted here.
Each of the above described embodiments can be obtained by combining elements and features of the present invention. Each element or feature can be regarded as selectable unless explicitly stated otherwise. Each element or feature can be omitted in some embodiments. The order of steps described above can be interchanged without departing from the spirit and scope of the invention. An element comprised in one embodiment can be comprised in another embodiment, or can be substituted with an element or a feature of another embodiment. At least two of the accompanying claims can be merged to constitute an embodiment.
Some embodiments of the present invention can be implemented by various means, such as hardware, firmware, software or a combination thereof. In the case of a hardware implementation, an embodiment of the present invention can be implemented with one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), processors, controllers, microcontrollers, and microprocessors.
In the case of a firmware or software implementation, an embodiment of the present invention can be implemented with modules, procedures, or functions performing the above described means or steps. A software code can be stored in a memory unit and run by a processor. The memory unit may be located inside or outside the processor, and may communicate data with the processor by various conventional communication means.
While the present invention has been described with respect to the particular embodiment, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.

Claims

[CLAIMS]
[Claim 1 ]
An audio quality measurement method, comprising:
producing a number of model output variables (MOVs) including a variable representing envelope interaural time difference distortion (EITDDist) based on comparison between a reference signal and a signal under test; and
mapping the number of model output variables to an audio quality value.
[Claim 2]
The method of claim 1, wherein the EITDDist is given by

EITDDist = (1/N)·Σ_{n=1}^{N} sqrt( (1/Z)·Σ_{k=1}^{Z} EITDDist[k, n] )

, and wherein the EITDDist[k, n] represents a value of envelope interaural time difference distortion obtained by comparing the reference signal and the signal under test at the k-th frequency band of the n-th time-frame.
[Claim 3]
The method of claim 2,

wherein the EITDDist[k, n] is given by

EITDDist[k, n] = (1/2)·(c_test[k, n] + c_ref[k, n]) · ΔEITD[k, n]

wherein the ΔEITD[k, n] represents a difference between envelope interaural time differences of the reference signal and the signal under test at the k-th frequency band of the n-th time-frame,

wherein the c_test[k, n] represents a nonlinearly transformed value of an envelope interaural cross-correlation coefficient (EIACC) of the signal under test at the k-th frequency band of the n-th time-frame, and

wherein the c_ref[k, n] represents a nonlinearly transformed value of an envelope interaural cross-correlation coefficient (EIACC) of the reference signal at the k-th frequency band of the n-th time-frame.
[Claim 4]
The method of claim 1, wherein the reference signal is obtained from a multichannel audio signal, and the signal under test is obtained from an output of a device under test to which the multichannel audio signal is inputted.
[Claim 5]
The method of claim 1, wherein at least one of the number of model output variables is based on comparison between excitation patterns of the reference signal and the signal under test.
[Claim 6]
The method of claim 1, wherein the EITDDist is obtained by applying the reference signal and the signal under test to a filter bank.
[Claim 7]
Computer readable medium for storing computer instructions executable by a processor for modifying an operation of a device having a processor, the computer readable medium comprising:
computer code for producing a number of model output variables (MOVs) including a variable representing envelope interaural time difference distortion (EITDDist) based on a comparison between a reference signal and a signal under test, and mapping the number of model output variables to an audio quality value.
[Claim 8]
The computer readable medium of claim 7, wherein the EITDDist is given by

EITDDist = (1/N)·Σ_{n=1}^{N} sqrt( (1/Z)·Σ_{k=1}^{Z} EITDDist[k, n] )

, and wherein the EITDDist[k, n] represents a value of envelope interaural time difference distortion obtained by comparing the reference signal and the signal under test at the k-th frequency band of the n-th time-frame.

[Claim 9]
Computer readable medium for storing a set of computer instructions executable by a processor for modifying another set of computer instructions for producing a number of model output variables (MOVs) based on comparison between a reference signal and a signal under test and mapping the number of model output variables to an audio quality value, the computer readable medium comprising:
computer code for modifying the another set of computer instructions to have the number of model output variables comprise a variable representing envelope interaural time difference distortion (EITDDist) based on comparison between the reference signal and the signal under test.

[Claim 10]
The computer readable medium of claim 9, wherein the EITDDist is given by

EITDDist = (1/N)·Σ_{n=1}^{N} sqrt( (1/Z)·Σ_{k=1}^{Z} EITDDist[k, n] )

, and wherein the EITDDist[k, n] represents a value of envelope interaural time difference distortion obtained by comparing the reference signal and the signal under test at the k-th frequency band of the n-th time-frame.
[Claim 1 1 ]
An audio quality measurement apparatus, comprising:
a producing means for producing a number of model output variables (MOVs) including a variable representing envelope interaural time difference distortion (EITDDist) based on comparison between a reference signal and a signal under test; and

a mapping means for mapping the number of model output variables to an audio quality value.

[Claim 12]
The audio quality measurement apparatus of claim 11, wherein the producing means and the mapping means are parts of a processing unit configured to execute a set of instructions for the producing and the mapping.

[Claim 13]
The audio quality measurement apparatus of claim 11, wherein the EITDDist is given by

EITDDist = (1/N)·Σ_{n=1}^{N} sqrt( (1/Z)·Σ_{k=1}^{Z} EITDDist[k, n] )

, and wherein the EITDDist[k, n] represents a value of envelope interaural time difference distortion obtained by comparing the reference signal and the signal under test at the k-th frequency band of the n-th time-frame.
PCT/KR2011/002713 2010-04-16 2011-04-15 Method, apparatus, and program-containing medium for assessment of audio quality WO2011129655A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR20100035182 2010-04-16
KR10-2010-0035182 2010-04-16
KR10-2010-0035579 2010-04-17
KR20100035579 2010-04-17

Publications (2)

Publication Number Publication Date
WO2011129655A2 true WO2011129655A2 (en) 2011-10-20
WO2011129655A3 WO2011129655A3 (en) 2012-03-15

Family

ID=44799206

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2011/002713 WO2011129655A2 (en) 2010-04-16 2011-04-15 Method, apparatus, and program-containing medium for assessment of audio quality

Country Status (2)

Country Link
KR (2) KR101170524B1 (en)
WO (1) WO2011129655A2 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108259893B (en) * 2018-03-22 2020-08-18 天津大学 Virtual reality video quality evaluation method based on double-current convolutional neural network

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070079899A (en) * 2006-02-03 2007-08-08 한국전자통신연구원 Apparatus and method for measurement of auditory quality of multichannel audio codec

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8612237B2 (en) 2007-04-04 2013-12-17 Apple Inc. Method and apparatus for determining audio spatial quality
US8233629B2 (en) * 2008-09-04 2012-07-31 Dts, Inc. Interaural time delay restoration system and method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
INYOUNG CHOI ET AL.: 'Objective measurement of perceived auditory quality in multi-channel audio compression coding systems' J. AUDIO ENG. SOC. vol. 5, 2008, page 6 *
INYOUNG CHOI ET AL.: 'Objective measurement of spatial auditory quality for multi-channel audio codecs' IEEK vol. 28, no. 2, 2005, pages 431 - 434 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102857852A (en) * 2012-09-12 2013-01-02 清华大学 Sound-field quantitative regeneration control system and method thereof
CN102857851A (en) * 2012-09-12 2013-01-02 清华大学 Sound and image synchronizing system for sound quality evaluation
CN102857852B (en) * 2012-09-12 2014-10-22 清华大学 Method for processing playback array control signal of loudspeaker of sound-field quantitative regeneration control system
CN102857851B (en) * 2012-09-12 2015-04-15 清华大学 Sound and image synchronizing system for sound quality evaluation
US10362427B2 (en) 2014-09-04 2019-07-23 Dolby Laboratories Licensing Corporation Generating metadata for audio object
CN110211610A (en) * 2019-06-20 2019-09-06 平安科技(深圳)有限公司 Assess the method, apparatus and storage medium of audio signal loss
CN111935624A (en) * 2020-09-27 2020-11-13 广州汽车集团股份有限公司 Objective evaluation method, system, equipment and storage medium for in-vehicle sound space sense
CN111935624B (en) * 2020-09-27 2021-04-06 广州汽车集团股份有限公司 Objective evaluation method, system, equipment and storage medium for in-vehicle sound space sense
WO2022112594A3 (en) * 2020-11-30 2022-07-28 Dolby International Ab Robust intrusive perceptual audio quality assessment based on convolutional neural networks
WO2023018889A1 (en) * 2021-08-13 2023-02-16 Dolby Laboratories Licensing Corporation Management of professionally generated and user-generated audio content
CN115798518A (en) * 2023-01-05 2023-03-14 腾讯科技(深圳)有限公司 Model training method, device, equipment and medium
CN115798518B (en) * 2023-01-05 2023-04-07 腾讯科技(深圳)有限公司 Model training method, device, equipment and medium

Also Published As

Publication number Publication date
KR20110115984A (en) 2011-10-24
KR20120053996A (en) 2012-05-29
KR101170524B1 (en) 2012-08-01
WO2011129655A3 (en) 2012-03-15

Similar Documents

Publication Publication Date Title
WO2011129655A2 (en) Method, apparatus, and program-containing medium for assessment of audio quality
US8238563B2 (en) System, devices and methods for predicting the perceived spatial quality of sound processing and reproducing equipment
EP1979900B1 (en) Apparatus for estimating sound quality of audio codec in multi-channel and method therefor
TWI555011B (en) Method for processing an audio signal, signal processing unit, binaural renderer, audio encoder and audio decoder
Breebaart et al. Spatial audio processing: MPEG surround and other applications
US8612237B2 (en) Method and apparatus for determining audio spatial quality
JP7526173B2 (en) Directional Loudness Map Based Audio Processing
US20090238371A1 (en) System, devices and methods for predicting the perceived spatial quality of sound processing and reproducing equipment
KR20110002491A (en) Decoding of binaural audio signals
BR112013014173B1 APPARATUS AND METHOD FOR DECOMPOSING AN INPUT SIGNAL USING A PRE-CALCULATED REFERENCE CURVE
Narbutt et al. AMBIQUAL: a full reference objective quality metric for ambisonic spatial audio
BRPI0516405B1 (en) INDIVIDUAL CHANNEL CONFORMATION FOR BCC AND SIMILAR SCHEMES
Seo et al. Perceptual objective quality evaluation method for high quality multichannel audio codecs
Choi et al. Objective measurement of perceived auditory quality in multichannel audio compression coding systems
Fleßner et al. Subjective and objective assessment of monaural and binaural aspects of audio quality
Winter et al. Colouration in local wave field synthesis
US9311925B2 (en) Method, apparatus and computer program for processing multi-channel signals
JP2006325162A Device for performing multi-channel spatial audio coding using binaural cues
RU2793703C2 (en) Audio data processing based on a directional volume map
RU2771833C1 (en) Processing of audio data based on a directional loudness map
RU2798019C2 (en) Audio data processing based on a directional volume map
Delgado et al. Energy aware modeling of interchannel level difference distortion impact on spatial audio perception
Cheng Spatial squeezing techniques for low bit-rate multichannel audio coding
Seo et al. An improved method for objective quality assessment of multichannel audio codecs
Jackson et al. Estimates of Perceived Spatial Quality across the Listening Area

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11769121

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04/02/2013)

122 Ep: pct application non-entry in european phase

Ref document number: 11769121

Country of ref document: EP

Kind code of ref document: A2