EP0720146A1 - Method for measuring speech masking properties (Verfahren zur Messung von Sprachmaskierungseigenschaften) - Google Patents
Method for measuring speech masking properties
- Publication number
- EP0720146A1 (application EP95309003A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- noise
- subband
- power
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
- G10L19/0208—Subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
Definitions
- the invention relates to a method for measuring masking properties of components of a signal and for determining a noise level vector for the signal.
- ISDN: Integrated Services Digital Network
- an input speech signal, which can be characterized as a continuous function of a continuous time variable, must be converted to a digital signal -- a signal that is discrete in both time and amplitude.
- the conversion is a two step process. First, the input speech signal is sampled periodically in time ( i.e. at a particular rate) to produce a sequence of samples where the samples take on a continuum of values. Then the values are quantized to a finite set of values, represented by binary digits (bits), to yield the digital signal.
- the digital signal is characterized by a bit rate, i.e. a specified number of bits per second that reflects how often the input speech signal was sampled and how many bits were used to quantize the sampled values.
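The two-step conversion described above can be illustrated numerically. This is a sketch, not the patent's implementation; the 8 kHz sampling rate and 8 bits per sample are the common telephony values, assumed here for concreteness.

```python
def bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of the digital signal: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample

def quantize(x, bits):
    """Uniformly quantize a sample x in [-1.0, 1.0) to one of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    idx = min(int((x + 1.0) / step), levels - 1)
    return -1.0 + (idx + 0.5) * step  # mid-point of the chosen level

# Telephone-quality speech: 8000 samples/s at 8 bits per sample -> 64 kb/s.
rate = bit_rate(8000, 8)
```

With these assumed values, `rate` is 64000 bits per second, and each quantized sample differs from the original by at most half a quantization step.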
- Masking is a term describing the phenomenon of human hearing wherein one sound obscures or drowns out another.
- a common example is where the sound of a car engine is drowned out if the volume of the car radio is high enough.
- another example is a telephone ringing while one is showering: the sound of the shower masks the sound of the telephone ring; if the shower had not been running, the ring would have been heard.
- the masking properties of a signal are typically measured as a noise-to-signal ratio determined with respect to a masking criterion.
- a masking criterion is the just-noticeable-distortion (JND) level, i.e. the noise-to-signal ratio where the noise just becomes audible to a listener.
- JND: just-noticeable-distortion
- another masking criterion is the audible-but-not-annoying level, i.e. the point where a listener may hear the noise, but the noise level is not sufficiently high as to irritate the listener.
- the invention provides a method for determining the masking properties of a signal in which the signal is decomposed into a set of subband components, as for example by a filterbank.
- the noise power spectrum that can be masked by each subband component is identified and the noise spectra are combined to yield the noise power spectrum that can be masked by the signal.
- output signals are generated based on the power in each subband signal and on a masking matrix. The noise power spectrum that can be masked by the input signal is determined from the output signals.
- FIG. 1 illustrates a flow chart of the inventive method in which for a frame (or segment) of an input signal, a noise level vector, i.e. the spectrum of noise which may be added to the frame without exceeding a masking criterion, is determined a priori .
- the method involves three main steps.
- step 120 the input signal frame is broken down, as for example by a filterbank, into subband components whose masking properties are known or can be determined.
- the masking properties for each component are identified or accessed, e.g. from a database or a library, and in step 160 the masking properties are combined to determine the noise level vector, i.e. the spectrum of noise power that can be masked by the input signal.
- the method represents the frame of the input signal as a sum of subband components each of whose masking properties has already been measured.
- the masking properties of the components required in step 140 must first be determined. Once the library of component masking properties is determined and advantageously stored in a database, the masking components can always be accessed, and optionally adapted, to determine the noise level vector of any input signal.
- the inventive method of FIG. 1 recognizes that the masking property of a speech signal, i.e. the spectrum of noise that the speech signal can mask, can be based on the masking property of components of the speech.
- a segment or frame of a first speech input signal is split into subband components, as for example by using a filterbank comprising a plurality of subband (bandpass) filters.
- the spectrum of noise that can be masked by each subband component is determined, and then the spectra for all subband components are combined to find the noise level vector for the first speech input signal.
- a measurement is taken to determine how much narrowband noise in each subband can be masked.
- the measurement can be summarized as two nested loops. For every subband of speech i and for every subband of white noise j: adjust the noise in subband j until just enough noise is added that the masking criterion is met; measure the noise-to-signal ratio at this point; repeat for the next subband j; repeat for the next subband i.
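The two nested loops above can be sketched as follows. Here `masked_nsr` is a hypothetical stand-in for the listening experiment: it returns, for speech band i and noise band j, the largest noise-to-signal ratio at which the masking criterion is still met.

```python
def measure_q_matrix(num_bands, masked_nsr):
    """Sweep every (speech band i, noise band j) pair.

    Following the orientation of FIG. 2A, the result is stored with
    rows = noise bands and columns = speech bands: Q[j][i] is the
    noise-to-signal ratio in noise band j masked by speech band i.
    masked_nsr(i, j) stands in for the listening test."""
    Q = [[None] * num_bands for _ in range(num_bands)]
    for i in range(num_bands):        # for every subband of speech i
        for j in range(num_bands):    # for every subband of white noise j
            Q[j][i] = masked_nsr(i, j)
    return Q
```

In practice each `masked_nsr(i, j)` call is an adjustment experiment with human listeners, not a function evaluation; the loop structure is what the text describes.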
- the noise-to-signal measurement for each combination of i and j, q_i,j, represents the ratio of noise power in band j that can be masked by the first speech input signal in band i.
- the elements q_i,j form a matrix Q.
- An example of such a Q matrix is illustrated in FIG. 2A where, for convenience, the entries have been converted to decibels.
- the Q matrix of FIG. 2A illustrates the results of an experiment in which narrowband speech masked narrowband noise.
- the row numbers correspond to noise bands; the column numbers correspond to speech bands.
- Each element q_i,j represents the maximum power ratio that can be maintained between noise in band j and the first speech input signal in band i so that the noise is masked. Note that not all q_i,j have an associated value; some combinations were not measured.
- subband 1 covers a frequency range of 80 Hz, from 0 to 80 Hz
- each q_i,j is a power ratio determined for a particular masking criterion.
- This definition makes sense for stationary stimuli (i.e. signals whose statistical properties are invariant to time translation), but in the case of dynamic stimuli, such as speech, care must be taken in adding noise power to a signal whose level varies rapidly.
- this problem is advantageously avoided by arranging for the noise power level to vary with the speech power level so that within a given segment or frame, the ratio of speech to noise power is a predetermined constant.
- the level of the added noise is dynamically adjusted in order to achieve a constant signal-to-noise ratio (SNR) throughout the frame.
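The constant-segmental-SNR idea can be sketched as a simple per-frame scaling. The frame contents and SNR value below are illustrative assumptions, not the patent's data.

```python
import math

def scale_noise_to_snr(speech_frame, noise_frame, snr_db):
    """Scale a frame of noise so that the frame-local speech-to-noise
    power ratio equals snr_db -- a sketch of keeping the segmental SNR
    at a predetermined constant within each frame."""
    p_s = sum(s * s for s in speech_frame) / len(speech_frame)
    p_n = sum(n * n for n in noise_frame) / len(noise_frame)
    target = p_s / (10.0 ** (snr_db / 10.0))   # desired noise power
    gain = math.sqrt(target / p_n)
    return [gain * n for n in noise_frame]
```

Applying this frame by frame makes the added noise track the rapidly varying speech level, which is the point of the segmental measurement.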
- SNR: signal-to-noise ratio
- Measuring the amount of masking between one subband component of speech and another subband of noise therefore consists of listening to an ensemble of frames of bandpassed speech with a range of segmental SNRs to determine which SNR value meets the masking criterion.
- Different frame sizes may advantageously be used for different subbands as described below.
- To split the speech and noise into subbands, a non-uniform, quasi-critical band filterbank is designed.
- the term quasi-critical is used in recognition that the human cochlea may be represented as a collection of bandpass filters where the bandwidth of each bandpass filter is termed a critical band. See , H. Fletcher, "Auditory Patterns," Rev. Mod. Phy. , Vol. 12, pp. 47-65, 1940.
- the characteristics and parameters of the filters in the filterbank may incorporate knowledge from auditory experiments as, for example, in determining the bandwidth of the filters in the filterbank. Note that it is advantageous that the filterbank used to produce the library of masking properties of components be the same as the filterbank used in step 120 of FIG. 1.
- each filter should be as rectangular as possible, although some passband ripple can be accepted in exchange for greater stopband attenuation. Overlap between adjacent filters should be minimized.
- the filterbank is not completely faithful to the human ear to the extent that experimentally measured cochlear filter responses are not rectangular and tend to overlap a great deal.
- the combined output should advantageously be perceptually indistinguishable from the input. This quality of the filterbank may be verified by listening tests.
- linear phase filters may be used, although it should be noted that because of the asymmetry of forward and backward masking it would be preferable to use minimum phase filters. This last point is illustrated by considering the case when the speech signal consists of a single spike.
- the combined output of a linear-phase filterbank would consist of the same spike delayed by half of the filter length, but the combined filtered noise would be dispersed equally before and after the spike. Since forward masking extends much farther in time than backward masking, it would be preferable if more noise came after the spike instead of before; this might be achieved with a more complicated minimum-phase filter design.
- N = 20 total subbands are used, corresponding roughly to the number of critical bands between 0 and 7 kHz as found in prior experimental work.
- the bandwidths form an increasing geometric series.
- f_20 = a(b^20 - 1)/(b - 1)
- f_20 is the highest frequency to be included, typically 7 kHz for speech.
- Setting a = 100, corresponding to previous measurements of the first critical band, the equation is solved for b using Newton's iterative approximation. This value of b is then used to generate an ideal set of band edges as shown in Table 1.
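The Newton iteration for the band-edge equation can be sketched as follows. The values a = 100 Hz and f_20 = 7000 Hz come from the text; the starting point and tolerance are assumptions of this example.

```python
def solve_band_ratio(a=100.0, f_top=7000.0, n=20, b0=1.2, tol=1e-12):
    """Solve f_top = a*(b**n - 1)/(b - 1) for the geometric ratio b > 1
    by Newton's iterative approximation."""
    b = b0
    for _ in range(200):
        g = a * (b**n - 1.0) / (b - 1.0) - f_top
        # derivative of a*(b**n - 1)/(b - 1) with respect to b
        dg = a * (n * b**(n - 1) * (b - 1.0) - (b**n - 1.0)) / (b - 1.0) ** 2
        step = g / dg
        b -= step
        if abs(step) < tol:
            break
    return b

def band_edges(a=100.0, n=20, b=None):
    """Ideal band edges f_k = a*(b**k - 1)/(b - 1), k = 0..n (cf. Table 1)."""
    if b is None:
        b = solve_band_ratio(a=a)
    return [a * (b**k - 1.0) / (b - 1.0) for k in range(n + 1)]
```

By construction the first band is exactly a = 100 Hz wide and the twentieth edge lands on 7 kHz, so the bandwidths form the increasing geometric series described above.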
- Given these band edges, the filters may be designed.
- Twenty 512-point, min-max optimal filters were designed using the well-known Remez exchange algorithm. Table 2 lists the parameters for each filter.
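A self-contained stand-in for the filter design step: the patent used 512-point min-max optimal (Remez exchange) designs, for which `scipy.signal.remez` is a modern implementation. The Hamming-windowed-sinc design below is a simpler illustration of producing a bandpass filter for given band edges, not the patent's design; the band edges and sampling rate in the usage are assumptions.

```python
import math

def bandpass_fir(f_lo, f_hi, fs, num_taps=511):
    """Hamming-windowed-sinc bandpass FIR for the band [f_lo, f_hi] Hz.
    A simple stand-in for the min-max optimal designs in the text."""
    m = (num_taps - 1) / 2.0
    h = []
    for k in range(num_taps):
        t = k - m
        if t == 0:
            ideal = 2.0 * (f_hi - f_lo) / fs          # sinc limit at t = 0
        else:
            ideal = (math.sin(2.0 * math.pi * f_hi * t / fs)
                     - math.sin(2.0 * math.pi * f_lo * t / fs)) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / (num_taps - 1))
        h.append(ideal * window)
    return h
```

A windowed-sinc design trades a fixed stopband attenuation (about 53 dB for Hamming) for simplicity; the Remez exchange algorithm instead minimizes the maximum ripple, which is why the text calls its filters min-max optimal.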
- the frame size for each band is advantageously chosen according to the length of the impulse response of the band filter. For higher bands, the energy of the impulse response becomes more concentrated in time, leading to a choice of a smaller frame size.
- Table 3 shows the relationship between the noise band number and frame size.
- Listening tests are advantageously carried out in a soundproof booth using headphones, with the same signal presented to both ears. The volume control may be set to a comfortable level for listening to the full-bandwidth speech and left in the same position when listening to the constituent subbands, which as a result sound much softer than the full speech signal.
- FIG. 3 is a block diagram of a system to achieve this for each frame of speech.
- FIG. 4 is a flowchart illustrating steps carried out by the system of FIG. 3. The operation of the system of FIG. 3 is described step by step below.
- Filter speech: input the current frame of speech in step 410. In step 415 the speech is filtered through filter j 315 of the filterbank to produce s_j(n).
- Measure energy of bandpass speech: the output of filter 315 is then passed through delay 317.
- the delay allows the system of FIG. 3 to "look ahead" to maintain a constant local NSR as described below.
- Measure look-ahead energy of bandpass speech: because of the inherent delay imposed by the filterbank, adjustments to the noise level at the filter input are not immediately registered at the output.
- A look-ahead of L = 320 samples yields the best results for 512-point filters. Note that this problem would be easier to solve if the filters were minimum-phase rather than linear-phase.
- Compute desired narrowband noise power: in step 430, multiply the speech power p_j by the desired noise-to-signal ratio q_i,j in adaptive controller 330 to yield a desired noise power σ̂ = p_j · q_i,j.
- Scale the noise: the white noise u(n) is scaled to yield e(n) with power equal to the desired noise power σ̂.
- Filter the adjusted noise The adjusted noise e ( n ) is filtered through band i using filter 350, to yield e i ( n ), and then applied to delay 355 so that the noise is again synchronous with the input frame of speech.
- the quadratic equation for B usually has two real solutions; typically the solution that minimizes the change in the noise scaling is chosen.
- a noise level vector for a speech signal, i.e. the spectrum of noise masked by the input signal, may be calculated according to a three-step process.
- the first two steps recognize that speech might best be analyzed in terms of its constituent critical bands and determine the masking properties of each band.
- the third step of the process, namely superposing the masking properties of the subbands to form a noise level vector, is discussed next.
- a noise level vector d = (d_1, ..., d_20) can be determined such that noise added at these levels or below does not exceed the masking threshold.
- the threshold noise power in each band is equal to the product of the signal power and the threshold noise-to-signal ratio.
- Equation 4.4 thus describes how the noise level vector for a given frame of speech can be determined based on the input power in the speech frame and on the masking properties of speech as represented by the masking matrix Q .
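Assuming the linear superposition rule described in the text (and with Q oriented as in FIG. 2A, rows = noise bands, columns = speech bands), the per-frame computation can be sketched as:

```python
def noise_level_vector(Q, p):
    """Noise level vector d for one frame: d[j] = sum_i Q[j][i] * p[i],
    where Q[j][i] is the (linear, not dB) noise-to-signal ratio maskable
    in noise band j by speech band i, and p[i] is the measured power of
    speech subband i in this frame.  Unmeasured entries (None) count as 0,
    an assumption of this sketch."""
    return [sum((q or 0.0) * p_i for q, p_i in zip(row, p)) for row in Q]
```

Treating missing q_i,j entries as zero is conservative; the superposition rule itself is the simple linear choice the text notes could later be replaced by a more complex function.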
- the above method is flexible in that new knowledge about masking effects in the human auditory system may be readily incorporated.
- the choice of a linear superposition rule for example, can be easily changed to a more complex function based on future auditory experiments.
- the values in the Q matrix need not be fixed.
- Each element in the matrix could be adaptive, e.g. a function of loudness since masking properties have been shown to change at high volume levels. It would also be easy to use different Q matrices depending on whether the current frame of speech consisted of voiced or unvoiced speech.
- This disclosure describes a method for measuring the masking properties of components of speech signals and for determining the masking threshold of the speech signals.
- the method disclosed herein has been described without reference to specific hardware or software. Instead the method has been described in such a manner that those skilled in the art can readily adapt such hardware or software as may be available or preferable.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US36737194A | 1994-12-30 | 1994-12-30 | |
US367371 | 1994-12-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
EP0720146A1 (de) | 1996-07-03 |
Family
ID=23446902
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP95309003A Withdrawn EP0720146A1 (de) | 1994-12-30 | 1995-12-12 | Verfahren zur Messung von Sprachmaskierungseigenschaften |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0720146A1 (de) |
JP (1) | JPH08272391A (de) |
CA (1) | CA2165352A1 (de) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002101727A1 (en) * | 2001-06-12 | 2002-12-19 | Globespan Virata Incorporated | Method and system for determining filter gain and automatic gain control |
CN108806660A (zh) * | 2017-04-26 | 2018-11-13 | Ford Global Technologies | Active sound desensitization to tonal noise in a vehicle |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107942152A (zh) * | 2017-11-15 | 2018-04-20 | The 40th Research Institute of China Electronics Technology Group Corporation | Noise measurement apparatus and method for a microwave RF front end |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0240330A2 (de) * | 1986-04-04 | 1987-10-07 | National Research Development Corporation | Noise compensation for speech recognition |
EP0240329A2 (de) * | 1986-04-04 | 1987-10-07 | National Research Development Corporation | Noise compensation for speech recognition |
EP0575815A1 (de) * | 1992-06-25 | 1993-12-29 | Atr Auditory And Visual Perception Research Laboratories | Speech recognition method |
-
1995
- 1995-12-12 EP EP95309003A patent/EP0720146A1/de not_active Withdrawn
- 1995-12-15 CA CA 2165352 patent/CA2165352A1/en not_active Abandoned
-
1996
- 1996-01-04 JP JP6096A patent/JPH08272391A/ja active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002101727A1 (en) * | 2001-06-12 | 2002-12-19 | Globespan Virata Incorporated | Method and system for determining filter gain and automatic gain control |
US7013271B2 (en) | 2001-06-12 | 2006-03-14 | Globespanvirata Incorporated | Method and system for implementing a low complexity spectrum estimation technique for comfort noise generation |
CN108806660A (zh) * | 2017-04-26 | 2018-11-13 | Ford Global Technologies | Active sound desensitization to tonal noise in a vehicle |
CN108806660B (zh) * | 2017-04-26 | 2023-12-01 | Ford Global Technologies | Active sound desensitization to tonal noise in a vehicle |
Also Published As
Publication number | Publication date |
---|---|
JPH08272391A (ja) | 1996-10-18 |
CA2165352A1 (en) | 1996-07-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): DE FR GB IT SE |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
|
17P | Request for examination filed |
Effective date: 19961211 |
|
18W | Application withdrawn |
Withdrawal date: 19970107 |