EP0720146A1 - Method for measuring speech masking properties - Google Patents

Method for measuring speech masking properties

Info

Publication number
EP0720146A1
Authority
EP
European Patent Office
Prior art keywords
signal
noise
subband
power
speech
Prior art date
1994-12-30
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP95309003A
Other languages
English (en)
French (fr)
Inventor
Yair Shoham
Casimir Wierzynski
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1994-12-30
Filing date
1995-12-12
Publication date
1996-07-03
Application filed by AT&T Corp
Publication of EP0720146A1
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • G10L19/0208 Subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

Definitions

  • the invention relates to a method for measuring masking properties of components of a signal and for determining a noise level vector for the signal.
  • ISDN Integrated Services Digital Network
  • an input speech signal, which can be characterized as a continuous function of a continuous time variable, must be converted to a digital signal, i.e. a signal that is discrete in both time and amplitude.
  • the conversion is a two-step process. First, the input speech signal is sampled periodically in time (i.e. at a particular rate) to produce a sequence of samples whose values lie on a continuum. Then the values are quantized to a finite set of values, represented by binary digits (bits), to yield the digital signal.
  • the digital signal is characterized by a bit rate, i.e. a specified number of bits per second that reflects how often the input speech signal was sampled and how many bits were used to quantize the sampled values.
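The two-step conversion and the resulting bit rate can be illustrated with a minimal Python sketch. It assumes a uniform quantizer and an analog signal confined to [-1, 1); the function name `digitize` is hypothetical and not part of the patent.

```python
import numpy as np

def digitize(x_func, fs, duration, n_bits):
    """Sample an 'analog' signal and quantize it uniformly to n_bits.

    x_func: callable returning the signal value at times t (seconds);
            values are assumed to lie in [-1, 1).
    Returns the integer quantizer codes and the resulting bit rate (bit/s).
    """
    # Step 1: sample periodically in time at rate fs (samples per second).
    t = np.arange(int(round(fs * duration))) / fs
    samples = x_func(t)
    # Step 2: map each sample to one of 2**n_bits uniformly spaced levels.
    levels = 2 ** n_bits
    codes = np.floor((samples + 1.0) / 2.0 * levels).astype(int)
    codes = np.clip(codes, 0, levels - 1)
    # Bit rate = samples per second * bits per sample.
    return codes, fs * n_bits

codes, bit_rate = digitize(lambda t: 0.5 * np.sin(2 * np.pi * 440 * t),
                           fs=8000, duration=0.01, n_bits=8)
print(len(codes), bit_rate)  # 80 64000
```

With 8000 samples per second and 8 bits per sample, the bit rate is 64 000 bit/s; halving either the sampling rate or the word length halves it.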
  • Masking is a term describing the phenomenon of human hearing wherein one sound obscures or drowns out another.
  • a common example is where the sound of a car engine is drowned out if the volume of the car radio is high enough.
  • similarly, the sound of a running shower may mask the ring of a telephone; if the shower had not been running, the ring would have been heard.
  • the masking properties of a signal are typically measured as a noise-to-signal ratio determined with respect to a masking criterion.
  • a masking criterion is the just-noticeable-distortion (JND) level, i.e. the noise-to-signal ratio where the noise just becomes audible to a listener.
  • JND just-noticeable-distortion
  • another masking criterion is the audible-but-not-annoying level, i.e. the point where a listener may hear the noise, but the noise level is not sufficiently high as to irritate the listener.
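A masking criterion can be expressed as a threshold on the noise-to-signal ratio (NSR). The sketch below, assuming NumPy and a threshold value supplied by the caller (the patent obtains such thresholds from listening tests, so any number used here is hypothetical), computes the NSR in decibels and checks it against that threshold.

```python
import numpy as np

def nsr_db(noise, signal):
    """Noise-to-signal power ratio in decibels."""
    p_noise = np.mean(np.square(noise))
    p_signal = np.mean(np.square(signal))
    return 10.0 * np.log10(p_noise / p_signal)

def meets_masking_criterion(noise, signal, threshold_db):
    """True if the NSR does not exceed threshold_db.

    threshold_db is a hypothetical per-criterion value (e.g. a JND level or
    an audible-but-not-annoying level) that would come from listening tests.
    """
    return nsr_db(noise, signal) <= threshold_db

signal = np.ones(100)
noise = 0.1 * np.ones(100)
print(round(nsr_db(noise, signal), 6))  # -20.0
```

A JND threshold would typically be lower (stricter) than an audible-but-not-annoying threshold for the same signal, since the latter tolerates audible noise.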
  • the invention provides a method for determining the masking properties of a signal in which the signal is decomposed into a set of subband components, as for example by a filterbank.
  • the noise power spectrum that can be masked by each subband component is identified and the noise spectra are combined to yield the noise power spectrum that can be masked by the signal.
  • output signals are generated based on the power in each subband signal and on a masking matrix. The noise power spectrum that can be masked by the input signal is determined from the output signals.
  • FIG. 1 illustrates a flow chart of the inventive method in which, for a frame (or segment) of an input signal, a noise level vector, i.e. the spectrum of noise which may be added to the frame without exceeding a masking criterion, is determined a priori.
  • the method involves three main steps.
  • in step 120, the input signal frame is broken down, as for example by a filterbank, into subband components whose masking properties are known or can be determined.
  • in step 140, the masking properties for each component are identified or accessed, e.g. from a database or a library, and in step 160 the masking properties are combined to determine the noise level vector, i.e. the spectrum of noise power that can be masked by the input signal.
  • the method represents the frame of the input signal as a sum of subband components each of whose masking properties has already been measured.
  • the masking properties of the components required in step 140 must first be determined. Once the library of component masking properties is determined and advantageously stored in a database, the masking components can always be accessed, and optionally adapted, to determine the noise level vector of any input signal.
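Step 120 needs the power of the frame in each subband. The following sketch is a simplification: instead of the bandpass filterbank the patent uses, it groups FFT bins between consecutive band edges (a swapped-in technique), weighted so that the band powers sum to the mean-square value of the frame (Parseval's theorem). The name `subband_powers` and its arguments are hypothetical.

```python
import numpy as np

def subband_powers(frame, fs, band_edges):
    """Power of `frame` in each band [band_edges[k], band_edges[k+1]) in Hz.

    FFT-bin grouping stands in for a real filterbank (assumption).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    spectrum = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # One-sided spectrum: count interior bins twice so that the total
    # equals mean(frame**2); DC (and Nyquist for even n) counted once.
    weights = np.full(len(spectrum), 2.0)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[-1] = 1.0
    bin_power = weights * np.abs(spectrum) ** 2 / n ** 2
    powers = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        powers.append(bin_power[mask].sum())
    return np.array(powers)

# A 1 kHz sine (power 0.5) sampled at 16 kHz falls entirely in band 1.
x = np.sin(2 * np.pi * 1000 * np.arange(512) / 16000)
p = subband_powers(x, 16000, [0, 500, 1500, 8001])
```

A real filterbank would show some leakage between neighbouring bands; the FFT grouping here has none for a sine that falls exactly on a bin.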
  • the inventive method of FIG. 1 recognizes that the masking property of a speech signal, i.e. the spectrum of noise that the speech signal can mask, can be based on the masking property of components of the speech.
  • a segment or frame of a first speech input signal is split into subband components, as for example by using a filterbank comprising a plurality of subband (bandpass) filters.
  • for each subband component, the spectrum of noise that the component can mask is determined, and then the spectra for all subband components are combined to find the noise level vector for the first speech input signal.
  • a measurement is taken to determine how much narrowband noise in each subband can be masked.
  • the measurement can be summarized as two nested loops: for every subband of speech $i$, and for every subband of white noise $j$, adjust the noise in subband $j$ until just enough noise has been added that the masking criterion is met, and measure the noise-to-signal ratio at that point; then repeat for the next subband $j$, and finally for the next subband $i$.
  • the noise-to-signal measurement $q_{i,j}$ for each combination of $i$ and $j$ is the ratio between the noise power in band $j$ that can be masked by the first speech input signal in band $i$ and the power of that speech in band $i$.
  • the elements $q_{i,j}$ form a matrix $Q$.
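The nested measurement procedure can be sketched as follows. The listening test is replaced by a hypothetical callback `meets_criterion(i, j, nsr_db)`, assumed to return True while the noise is still masked and to be monotonic in the NSR; a bisection search stands in for the manual adjustment of the noise level. Neither detail comes from the patent.

```python
import numpy as np

def measure_q_matrix(n_bands, meets_criterion,
                     lo_db=-60.0, hi_db=20.0, tol_db=0.5):
    """Build Q with q[i, j] = maximum maskable NSR (power ratio) for
    speech band i and noise band j.

    Assumes noise at lo_db is always masked and at hi_db never is.
    """
    q = np.zeros((n_bands, n_bands))
    for i in range(n_bands):          # every subband of speech
        for j in range(n_bands):      # every subband of noise
            lo, hi = lo_db, hi_db     # invariant: lo masked, hi audible
            while hi - lo > tol_db:
                mid = 0.5 * (lo + hi)
                if meets_criterion(i, j, mid):
                    lo = mid          # still masked: more noise allowed
                else:
                    hi = mid          # audible: back off
            q[i, j] = 10.0 ** (lo / 10.0)  # store as a power ratio
    return q

# Simulated listener (hypothetical): masking weakens away from the diagonal.
def thr(i, j):
    return -10.0 - 5.0 * abs(i - j)

q = measure_q_matrix(3, lambda i, j, x: x <= thr(i, j))
print(np.round(10 * np.log10(q), 1))
```

Here `q[i, j]` is indexed by speech band i and noise band j, as in the loops; FIG. 2A displays the same quantities with noise bands as rows and speech bands as columns.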
  • An example of such a Q matrix is illustrated in FIG. 2A where, for convenience, the entries have been converted to decibels.
  • the Q matrix of FIG. 2A illustrates the results of an experiment in which narrowband speech masked narrowband noise.
  • the row numbers correspond to noise bands; the column numbers correspond to speech bands.
  • Each element $q_{i,j}$ represents the maximum power ratio that can be maintained between noise in band $j$ and the first speech input signal in band $i$ so that the noise is masked. Note that not all $q_{i,j}$ have an associated value, i.e.
  • subband 1 covers a frequency range of 80 Hz, from 0 to 80 Hz
  • each $q_{i,j}$ is a power ratio determined for a particular masking criterion.
  • This definition makes sense for stationary stimuli (i.e. signals whose statistical properties are invariant to time translation), but in the case of dynamic stimuli, such as speech, care must be taken in adding noise power to a signal whose level varies rapidly.
  • this problem is advantageously avoided by arranging for the noise power level to vary with the speech power level so that within a given segment or frame, the ratio of speech to noise power is a predetermined constant.
  • the level of the added noise is dynamically adjusted in order to achieve a constant signal-to-noise ratio (SNR) throughout the frame.
  • SNR signal-to-noise ratio
  • Measuring the amount of masking between one subband component of speech and another subband of noise therefore consists of listening to an ensemble of frames of bandpassed speech with a range of segmental SNRs to determine which SNR value meets the masking criterion.
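Keeping the segmental noise-to-signal ratio constant amounts to rescaling the noise in every frame according to the speech power in that frame. This minimal NumPy sketch assumes equal-length speech and noise arrays and a fixed frame length; it ignores the filterbank delay and look-ahead handled by the system of FIG. 3, and the function name is hypothetical.

```python
import numpy as np

def constant_nsr_noise(speech, noise, frame_len, nsr):
    """Scale `noise` frame by frame so that every frame has noise power
    equal to nsr * (speech power in that frame); nsr is linear, not dB.

    Gains change abruptly at frame boundaries (simplification).
    Frames with zero speech or noise power get zero noise.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    out = np.zeros(len(noise))
    for start in range(0, len(speech), frame_len):
        s = speech[start:start + frame_len]
        u = noise[start:start + frame_len]
        p_s = np.mean(s ** 2)
        p_u = np.mean(u ** 2)
        if p_s > 0 and p_u > 0:
            out[start:start + len(u)] = u * np.sqrt(nsr * p_s / p_u)
    return out

rng = np.random.default_rng(0)
# Speech stand-in whose level jumps from frame to frame.
speech = rng.standard_normal(1000) * np.repeat([1.0, 5.0, 0.2, 2.0, 1.0], 200)
noise = rng.standard_normal(1000)
e = constant_nsr_noise(speech, noise, frame_len=200, nsr=0.01)  # -20 dB
```

Listening to `speech + e` for a range of `nsr` values is the kind of ensemble from which the masking criterion would be judged.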
  • Different frame sizes may advantageously be used for different subbands as described below.
  • quasi-critical band filterbank: to split the speech and noise into subbands, a non-uniform, quasi-critical band filterbank is designed.
  • the term quasi-critical is used in recognition that the human cochlea may be represented as a collection of bandpass filters, where the bandwidth of each bandpass filter is termed a critical band. See H. Fletcher, "Auditory Patterns," Rev. Mod. Phys., Vol. 12, pp. 47-65, 1940.
  • the characteristics and parameters of the filters in the filterbank may incorporate knowledge from auditory experiments as, for example, in determining the bandwidth of the filters in the filterbank. Note that it is advantageous that the filterbank used to produce the library of masking properties of components be the same as the filterbank used in step 120 of FIG. 1.
  • each filter should be as rectangular as possible, although significant passband ripple can be tolerated in exchange for greater stopband attenuation. Overlap between adjacent filters should be minimized.
  • the filterbank is not completely faithful to the human ear to the extent that experimentally measured cochlear filter responses are not rectangular and tend to overlap a great deal.
  • the combined output should advantageously be perceptually indistinguishable from the input. This quality of the filterbank may be verified by listening tests.
  • linear phase filters may be used, although it should be noted that because of the asymmetry of forward and backward masking it would be preferable to use minimum phase filters. This last point is illustrated by considering the case when the speech signal consists of a single spike.
  • the combined output of a linear-phase filterbank would consist of the same spike delayed by half of the filter length, but the combined filtered noise would be dispersed equally before and after the spike. Since forward masking extends much farther in time than backward masking, it would be preferable if more noise came after the spike instead of before; this might be achieved with a more complicated minimum-phase filter design.
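The spike example can be checked numerically with a two-band linear-phase filterbank. This sketch swaps in a Hamming-windowed sinc lowpass and its complement (not the Remez filters of the patent) and uses an odd length so that the delay $M = (N-1)/2$ is an integer; with 512 taps the delay would be 255.5 samples. The function name and parameters are illustrative.

```python
import numpy as np

def complementary_pair(num_taps=511, cutoff=0.1):
    """Linear-phase lowpass h_low and highpass h_high = delta(n - M) - h_low.

    cutoff is in cycles/sample; num_taps must be odd so M is an integer.
    The two filters sum to a pure delay of M samples.
    """
    m = (num_taps - 1) // 2
    n = np.arange(num_taps) - m
    h_low = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    h_high = -h_low
    h_high[m] += 1.0
    return h_low, h_high, m

h_low, h_high, m = complementary_pair()
x = np.zeros(2000)
x[100] = 1.0                                      # a single spike
y_low = np.convolve(x, h_low)
y = y_low + np.convolve(x, h_high)                # combined output
print(m, int(np.argmax(np.abs(y))))               # 255 355
```

Summing the two band outputs returns the spike delayed by M samples, while each band's output is symmetric about that point, so filtered noise in the band would be spread equally before and after the spike. A minimum-phase design would instead concentrate the response after the spike.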
  • $N = 20$ total subbands are used, corresponding roughly to the number of critical bands between 0 and 7 kHz as found in prior experimental methods.
  • the bandwidths form an increasing geometric series.
  • $f_{20} = a \, \frac{b^{20} - 1}{b - 1}$, where $a$ is the bandwidth of the first subband and $b$ is the ratio between successive bandwidths.
  • $f_{20}$ is the highest frequency to be included, typically 7 kHz in a speech case.
  • $a$ is set to 100 Hz, corresponding to previous measurements of the first critical band, and the equation is solved for $b$ using Newton's iterative method. This value of $b$ is then used to generate an ideal set of band edges as shown in Table 1.
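Solving for $b$ with Newton's method is short enough to sketch directly. The defaults below ($a = 100$ Hz, $f_{20} = 7000$ Hz, $N = 20$) follow the text; the starting point $b_0 = 1.5$ and the stopping tolerance are assumptions. Because the geometric sum is convex and increasing in $b$, a start above the root converges monotonically and never approaches the removable singularity at $b = 1$.

```python
import numpy as np

def geometric_band_edges(a=100.0, f_top=7000.0, n=20, b0=1.5, iters=100):
    """Band edges whose bandwidths a, a*b, ..., a*b**(n-1) sum to f_top.

    Solves g(b) = a*(b**n - 1)/(b - 1) - f_top = 0 by Newton's method.
    """
    b = b0
    for _ in range(iters):
        g = a * (b ** n - 1) / (b - 1) - f_top
        # Derivative of a*(b**n - 1)/(b - 1) with respect to b.
        dg = a * (n * b ** (n - 1) * (b - 1) - (b ** n - 1)) / (b - 1) ** 2
        step = g / dg
        b -= step
        if abs(step) < 1e-12:
            break
    widths = a * b ** np.arange(n)
    edges = np.concatenate(([0.0], np.cumsum(widths)))
    return b, edges

b, edges = geometric_band_edges()
print(b)
print(np.round(edges))
```

For these defaults the solution lies between 1.11 and 1.12, and the 21 edges run from 0 Hz to 7000 Hz; these are the "ideal" edges, before any adjustment made when the actual filters are designed.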
  • filters may be designed.
  • twenty 512-point min-max optimal filters were designed using the well-known Remez exchange algorithm. Table 2 lists the parameters for each filter.
  • the frame size for each band is advantageously chosen according to the length of the impulse response of the band filter. For higher bands, the energy of the impulse response becomes more concentrated in time, leading to a choice of a smaller frame size.
  • Table 3 shows the relationship between the noise band number and frame size.
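Table 3 is not reproduced here, but the idea of tying frame size to impulse-response length can be sketched. The length measure below (the span holding the central 90% of the impulse-response energy) and the rounding up to a power of two are both hypothetical choices, not taken from the patent.

```python
import numpy as np

def effective_length(h, fraction=0.9):
    """Number of samples spanning the central `fraction` of the energy of h."""
    energy = np.cumsum(np.asarray(h, dtype=float) ** 2)
    energy /= energy[-1]                     # normalised cumulative energy
    lo = int(np.searchsorted(energy, (1.0 - fraction) / 2.0))
    hi = int(np.searchsorted(energy, (1.0 + fraction) / 2.0))
    return hi - lo + 1

def frame_size_for(h, fraction=0.9):
    """Hypothetical rule: round the effective length up to a power of two."""
    return 2 ** int(np.ceil(np.log2(effective_length(h, fraction))))

# Narrow (low) bands have long impulse responses, wide (high) bands short ones.
n = np.arange(511) - 255
narrow = np.sinc(0.02 * n) * np.hamming(511)
wide = np.sinc(0.4 * n) * np.hamming(511)
print(frame_size_for(narrow), frame_size_for(wide))
```

This reproduces only the qualitative trend described above, namely smaller frames for the higher, wider bands.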
  • the volume control may be set to a comfortable level for listening to the full-bandwidth speech and left in the same position when listening to the constituent subbands, which as a result sound much softer than the full speech signal. Listening tests are advantageously carried out in a soundproof booth using headphones, with the same signal presented to both ears.
  • FIG. 3 is a block diagram of a system to achieve this for each frame of speech.
  • FIG. 4 is a flowchart illustrating steps carried out by the system of FIG. 3. The operation of the system of FIG.
  • Filter speech: input the current frame of speech in step 410. In step 415, the speech is filtered through filter $j$ (315) of the filterbank to produce $s_j(n)$.
  • Measure energy of bandpass speech: the output of filter 315 is then passed through delay 317.
  • the delay allows the system of FIG. 3 to "look ahead" to maintain a constant local NSR as described below.
  • Measure look-ahead energy of bandpass speech: because of the inherent delay imposed by the filterbank, adjustments to the noise level at the filter input are not immediately registered at the output.
  • $L = 320$ samples yields the best results for 512-point filters. Note that this problem would be easier to solve if the filters were minimum-phase rather than linear-phase.
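The look-ahead power used to set the noise level can be computed for every sample with a running sum. This sketch assumes a rectangular look-ahead window of $L = 320$ samples over the bandpass speech, truncated at the end of the signal; the exact averaging used in FIG. 3 is not given here, and the function names are hypothetical. The second function multiplies by $q_{ij}$ to form the desired noise power of step 430.

```python
import numpy as np

def lookahead_power(s, L=320):
    """p[n] = mean(s[n:n+L]**2), with the window truncated near the end."""
    s = np.asarray(s, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(s ** 2)))
    n = np.arange(len(s))
    end = np.minimum(n + L, len(s))
    return (csum[end] - csum[n]) / (end - n)

def desired_noise_power(s_j, q_ij, L=320):
    """Desired narrowband noise power: look-ahead speech power times q_ij."""
    return q_ij * lookahead_power(s_j, L)

# Silence followed by speech onset: the look-ahead sees the onset early.
s = np.concatenate((np.zeros(500), np.ones(500)))
p = lookahead_power(s)
print(p[0], p[200], p[999])  # 0.0 0.0625 1.0
```

Because the window looks ahead, the desired noise power starts rising before the onset reaches the filter output, which is what compensates for the filterbank delay.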
  • Compute desired narrowband noise power: in step 430, the speech power $p_j$ is multiplied by the desired noise-to-signal ratio $q_{ij}$ in adaptive controller 330 to yield the desired noise power $p_j q_{ij}$.
  • e ( n ) u ( n ) S ⁇ ⁇ ⁇ i
  • Filter the adjusted noise: the adjusted noise $e(n)$ is filtered through band $i$ using filter 350 to yield $e_i(n)$, and is then applied to delay 355 so that the noise is again synchronous with the input frame of speech.
  • the quadratic equation for B usually has two real solutions; typically the solution that minimized
  • noise level vector for a speech signal, i.e. the spectrum of noise masked by the input signal
  • a noise level vector for a speech signal may be calculated according to a three step process.
  • speech might best be analyzed by splitting it into its constituent critical bands and determining the masking properties of each band.
  • the third step of the process, namely superposing the masking properties of the subbands to form a noise level vector, is discussed.
  • a noise level vector $d = (d_1, \ldots, d_{20})$ can be determined such that noise added at these levels or below does not exceed the masking threshold.
  • the threshold noise power in each band is equal to the product of the signal power and the threshold noise-to-signal ratio.
  • Equation 4.4 thus describes how the noise level vector for a given frame of speech can be determined based on the input power in the speech frame and on the masking properties of speech as represented by the masking matrix Q .
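Equation 4.4 itself does not survive in this text. Assuming the linear superposition rule referred to here, in which speech band $i$ with power $p_i$ contributes $p_i q_{i,j}$ to the maskable noise power in band $j$, the noise level vector is $d_j = \sum_i p_i q_{i,j}$, i.e. a vector-matrix product. The sketch treats missing entries of $Q$ as contributing no masking and includes a helper for Q matrices stored in decibels as in FIG. 2A; both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def q_from_db(q_db):
    """Convert a Q matrix given in dB (NaN = no value) to power ratios."""
    return 10.0 ** (np.asarray(q_db, dtype=float) / 10.0)

def noise_level_vector(p, q):
    """d[j] = sum_i p[i] * q[i, j]  (linear superposition, assumed).

    p: subband speech powers, shape (n,)
    q: masking matrix indexed [speech band i, noise band j], shape (n, n)
    Missing entries (NaN) are treated as zero, i.e. no masking.
    """
    p = np.asarray(p, dtype=float)
    q = np.nan_to_num(np.asarray(q, dtype=float), nan=0.0)
    return p @ q

p = [1.0, 2.0]
q = q_from_db([[0.0, 10.0],
               [np.nan, -10.0]])
d = noise_level_vector(p, q)
print(d)
```

Swapping `p @ q` for another combining function is the single point where a non-linear superposition rule would be introduced.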
  • the above method is flexible in that new knowledge about masking effects in the human auditory system may be readily incorporated.
  • the choice of a linear superposition rule for example, can be easily changed to a more complex function based on future auditory experiments.
  • the values in the Q matrix need not be fixed.
  • Each element in the matrix could be adaptive, e.g. a function of loudness since masking properties have been shown to change at high volume levels. It would also be easy to use different Q matrices depending on whether the current frame of speech consisted of voiced or unvoiced speech.
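Switching between Q matrices by voicing could look like the sketch below. The voicing decision, a zero-crossing-rate test with a threshold of 0.25, is a hypothetical stand-in; the patent does not specify how voiced and unvoiced frames are distinguished.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(np.asarray(frame, dtype=float))
    return np.mean(signs[1:] != signs[:-1])

def select_q(frame, q_voiced, q_unvoiced, zcr_threshold=0.25):
    """Return the masking matrix to use for this frame.

    Hypothetical rule: a low zero-crossing rate is taken to mean voiced
    speech; the threshold value is illustrative only.
    """
    if zero_crossing_rate(frame) < zcr_threshold:
        return q_voiced
    return q_unvoiced

t = np.arange(256) / 8000
voiced_like = np.sin(2 * np.pi * 150 * t)     # low-frequency, periodic
noisy_like = np.tile([1.0, -1.0], 128)        # sign flips every sample
print(select_q(voiced_like, "Q_voiced", "Q_unvoiced"),
      select_q(noisy_like, "Q_voiced", "Q_unvoiced"))
```

A loudness-dependent Q could be handled the same way, selecting or interpolating matrices according to the frame's level.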
  • This disclosure describes a method for measuring the masking properties of components of speech signals and for determining the masking threshold of the speech signals.
  • the method disclosed herein has been described without reference to specific hardware or software. Instead the method has been described in such a manner that those skilled in the art can readily adapt such hardware or software as may be available or preferable.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
EP95309003A 1994-12-30 1995-12-12 Method for measuring speech masking properties Withdrawn EP0720146A1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US36737194A 1994-12-30 1994-12-30
US367371 1994-12-30

Publications (1)

Publication Number Publication Date
EP0720146A1 (de) 1996-07-03

Family

ID=23446902

Family Applications (1)

Application Number Title Priority Date Filing Date
EP95309003A Withdrawn EP0720146A1 (de) 1994-12-30 1995-12-12 Method for measuring speech masking properties

Country Status (3)

Country Link
EP (1) EP0720146A1 (de)
JP (1) JPH08272391A (de)
CA (1) CA2165352A1 (de)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002101727A1 (en) * 2001-06-12 2002-12-19 Globespan Virata Incorporated Method and system for determining filter gain and automatic gain control
CN108806660A (zh) * 2017-04-26 2018-11-13 福特全球技术公司 Active sound desensitization to tonal noise in a vehicle

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107942152A (zh) * 2017-11-15 2018-04-20 中国电子科技集团公司第四十研究所 Noise measurement device and measurement method for a microwave RF front end

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0240330A2 (de) * 1986-04-04 1987-10-07 National Research Development Corporation Noise compensation for speech recognition
EP0240329A2 (de) * 1986-04-04 1987-10-07 National Research Development Corporation Noise compensation for speech recognition
EP0575815A1 (de) * 1992-06-25 1993-12-29 Atr Auditory And Visual Perception Research Laboratories Speech recognition method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002101727A1 (en) * 2001-06-12 2002-12-19 Globespan Virata Incorporated Method and system for determining filter gain and automatic gain control
US7013271B2 (en) 2001-06-12 2006-03-14 Globespanvirata Incorporated Method and system for implementing a low complexity spectrum estimation technique for comfort noise generation
CN108806660A (zh) * 2017-04-26 2018-11-13 福特全球技术公司 Active sound desensitization to tonal noise in a vehicle
CN108806660B (zh) * 2017-04-26 2023-12-01 福特全球技术公司 Active sound desensitization to tonal noise in a vehicle

Also Published As

Publication number Publication date
JPH08272391A (ja) 1996-10-18
CA2165352A1 (en) 1996-07-01

Similar Documents

Publication Publication Date Title
US5623577A (en) Computationally efficient adaptive bit allocation for encoding method and apparatus with allowance for decoder spectral distortions
US5825320A (en) Gain control method for audio encoding device
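KR100913987B1 (ko) Multi-channel synthesizer and method for generating a multi-channel output signal
JP3804968B2 (ja) Adaptive allocation encoding/decoding apparatus and method
US5414795A (en) High efficiency digital data encoding and decoding apparatus
EP1939862B1 (de) Encoding device, decoding device, and method thereof
JP3033156B2 (ja) Digital signal encoding device
EP2991075B1 (de) Speech coding method and speech coding apparatus
DE69633633T2 (de) Multi-channel predictive subband coder with adaptive psychoacoustic bit allocation
KR100295217B1 (ko) Apparatus for compressing a digital input signal with signal-spectrum-dependent and noise-spectrum-dependent quantization bit allocation
US6604069B1 (en) Signals having quantized values and variable length codes
JP3277682B2 (ja) Information encoding method and apparatus, information decoding method and apparatus, information recording medium, and information transmission method
JPH07273657A (ja) Information encoding method and apparatus, information decoding method and apparatus, information transmission method, and information recording medium
US6199038B1 (en) Signal encoding method using first band units as encoding units and second band units for setting an initial value of quantization precision
EP1606797A1 (de) Processing of multi-channel signals
US5303346A (en) Method of coding 32-kb/s audio signals
JP3297050B2 (ja) Computational adaptive bit allocation encoding method and apparatus accommodating decoder spectral distortion
EP0720146A1 (de) Method for measuring speech masking properties
JPH06242797A (ja) Block size determination method for a transform coding apparatus
EP2355094B1 (de) Subband processing complexity reduction
JPH08123488A (ja) High-efficiency encoding method, high-efficiency code recording method, high-efficiency code transmission method, high-efficiency encoding apparatus, and high-efficiency code decoding method
JP3033157B2 (ja) Digital signal encoding device
JP2002189499A (ja) Digital audio signal compression method and compression device
JPH04302533A (ja) High-efficiency encoding method for digital data
JP3070123B2 (ja) Digital signal encoding device and method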

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB IT SE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

17P Request for examination filed

Effective date: 19961211

18W Application withdrawn

Withdrawal date: 19970107