WO1996010870A1 - Psychoacoustic audio-signal coding system and method - Google Patents
- Publication number: WO1996010870A1 (application PCT/EP1995/003866)
- Authority: WO, WIPO (PCT)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B1/00—Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
- H04B1/66—Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signals; for improving efficiency of transmission
- H04B1/665—Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signals; for improving efficiency of transmission using psychoacoustic properties of the ear, e.g. masking effect
Abstract
The present invention relates to a psychoacoustic method and system for encoding an audio signal that exploit the perceptual characteristics of the human auditory system. In particular, the system is based on a method of analysis that highlights the characteristics of the audio signal that are perceptually significant. According to one aspect of the invention, this information can be used to compress the information contained in the signal so that it can be transmitted efficiently.
Description
Psychoacoustic Audio-Signal Coding System and Method
The present invention relates to a psychoacoustic method and system for audio-signal coding that exploit the perceptual characteristics of the human auditory system.
The mechanisms by which humans perceive sound stimuli are described in the literature, e.g. in the book by E. Zwicker and R. Feldtkeller entitled "Das Ohr als Nachrichtenempfaenger" (Hirzel, Stuttgart, 1967).
The discipline that studies these phenomena is called psychoacoustics.
In general, audio-signal encoders use the results of the psychoacoustic analysis to determine which characteristics of the signal must be kept unchanged during the coding process in order to maintain perceptual transparency, i.e. the same subjective quality of the signal.
The analysis defines the maximum noise spectrum (noise mask) that the coding may introduce and below which the system is perceptually transparent; in other words, it defines the maximum amount of noise that the ear is unable to perceive.
A vector of signal-to-mask ratios, one for each of the frequency intervals into which the audible spectrum has been subdivided, is then calculated.
In the psychoacoustic model described above, a frequency analysis of the signal to be coded yields a signal-to-mask ratio that exploits the concealment (masking) phenomenon produced by a single frequency or, more generally, by a narrow frequency band over the entire spectrum of the signal.
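As a minimal illustration of the vector of signal-to-mask ratios mentioned above, the sketch below divides the signal power in each frequency interval by the corresponding noise-mask power. The function names, the list-based inputs and the optional conversion to decibels are assumptions made for illustration; the text does not say how the band powers or the mask are obtained, nor whether SM is expressed as a linear ratio or in dB.

```python
import math


def signal_to_mask_ratios(band_signal_power, band_mask_power):
    """Return one linear signal-to-mask ratio per frequency band.

    Both arguments are hypothetical inputs: the signal power and the
    maximum inaudible noise power (noise mask) in each band.
    """
    if len(band_signal_power) != len(band_mask_power):
        raise ValueError("one mask value is needed per band")
    return [s / m for s, m in zip(band_signal_power, band_mask_power)]


def to_decibels(ratios):
    """Convert linear power ratios to dB (assumed convention: 10*log10)."""
    return [10.0 * math.log10(r) for r in ratios]


if __name__ == "__main__":
    sm = signal_to_mask_ratios([100.0, 4.0, 1.0], [1.0, 2.0, 1.0])
    print(sm)
    print(to_decibels(sm))
```

A band whose ratio is close to 1 (0 dB) has a signal barely above its mask, so an encoder can tolerate relatively more coding noise there than in a band with a large ratio.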
In the field of audio coding, psychoacoustic analysis has proved very effective in obtaining high compression ratios. It is an object of the present invention to improve on the known art by obtaining a higher compression ratio. This object is achieved through a method as set forth in claim 1 and through a system as set forth in claim 6.
Further advantageous aspects of the present invention are set forth in the dependent claims. The present invention uses the time concealment (temporal masking) phenomenon to obtain a psychoacoustic analysis that is better matched to the mechanisms of auditory perception.
According to one aspect of the invention, it is possible to calculate concealment curves which allow a higher signal compression while maintaining the same subjective quality of the reconstructed signal. The invention provides a method usable in any coding context in which a good compression ratio is desired. Moreover, it can be used in compliance with ISO standard No. 11172-3, in which the use of the signal-to-mask ratio is recommended.
It is therefore possible to use the improvement provided by the invention while still observing the constraints described in the standard.
The invention will now be described in greater detail with reference to the attached drawings wherein:
- Fig. 1 is a block diagram of a generic psychoacoustic coding system,
- Fig. 2 represents an implementation of the subsystem which carries out the modification of the signal-to-mask ratio.
Fig. 1 shows a generic encoder 1 that receives a signal from an input line and information on how to encode the signal from a psychoacoustic analyzer 2.
This schematic is equivalent to the structure used in ISO standard No. 11172-3 and also serves as a basis for the invention described hereinafter.
Time concealment (temporal masking) is a change in the ability to detect a noise accompanying a tone when the tone power varies. With reference to ISO standard 11172-3, the signal-to-mask ratio SM and the procedure for calculating it are considered. In the standard, the values of SM are calculated on the basis of frequency concealment.
The basic idea of time concealment is to increase the SM values in those situations where the listener's sensitivity to noise is greater and to decrease them when the sensitivity goes down. In order to quantify the effects of the system introduced in the present invention, it is useful to introduce the concept of perceptual entropy. The perceptual entropy is the minimum information rate (bit/s) that must be transmitted to ensure that the received signal is perceptually indistinguishable from the original one. Introducing time concealment in the coding according to the invention does not necessarily lower the average perceptual entropy, but it does change how the bits are distributed among the sub-bands.
On the basis of experimental observations (see e.g. James O. Pickles, "An Introduction to the Physiology of Hearing", London, Academic Press, 1982, pages 10-99), it was possible to establish that the nerve fibers leaving the hair cells respond to a tone with a stepped envelope with a peak of activity whose amplitude is about twice the stationary activity (peaks/s) and whose duration is about 20 ms.
A doubled rate of peaks per second conveys more information to the brain about the tone onset than is conveyed subsequently in an equal amount of time. It has further been found that the ratio between the onset peak and the stationary average rate of peaks per second is not independent of the signal amplitude: as the amplitude increases, the peak increases more than the stationary value does. Moreover, the description of a second tone reaching the brain is much less detailed if it is preceded by a strong tone. This phenomenon is called postconcealment (post-masking). For postconcealment, the insensitivity to noise is greatest in the vicinity of the fall in tone power, and the power of the concealed noise decreases with a time constant of about 20 ms, independently of the tone frequency.
In order to exploit the phenomena described above, according to one aspect of the invention the schematic shown in Fig. 2 is used. A signal corresponding to the instantaneous power of the signal to be encoded is applied to terminal 10. Terminal 10 is connected to block 12, which calculates and updates the average power, and to block 13, which forms the ratio of the instantaneous power to the average power. Block 13 is connected to block 14, which calculates the function f() of the argument output by block 13. Block 15 multiplies the result of block 14 by the signal-to-mask ratio received at terminal 11. The modified signal-to-mask ratio is then obtained at terminal 16.
Consider a generic subdivision of the signal to be coded into time blocks: the variation of power in band k and in block n is indicated by the ratio

$$\frac{P_k(n)}{\tilde{P}_k(n)}$$

of the instantaneous power in band k and block n to the average power in the same block. The average power is evaluated by applying a suitable function.
According to one aspect of the present invention, a first-order low-pass filter is used with a time constant equal to the time constant governing the evolution of the time-concealment phenomena. The average power update equation is:

$$\tilde{P}_k(n) = \rho\,\tilde{P}_k(n-1) + (1-\rho)\,P_k(n-1)$$

with $\rho$ depending on the ratio $T/\tau$, where $\tau$ is the time constant of the moving-average filter and $T$ is the power updating time.
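The average-power recursion can be sketched as follows. The update itself follows the formula as it appears in claim 4 (the new average mixes the previous average with the previous block's instantaneous power). The expression for the coefficient is not legible in the source, so `smoothing_coefficient` below assumes the usual first-order low-pass discretisation, exp(-T/tau); that choice, the function names and the zero initial average are all illustrative assumptions.

```python
import math


def smoothing_coefficient(update_time_s, time_constant_s):
    """Assumed mapping from T and tau to the filter coefficient.

    exp(-T / tau) is the common discretisation of a first-order
    low-pass filter; the source's own expression is not recoverable.
    """
    return math.exp(-update_time_s / time_constant_s)


def track_average_power(block_powers, rho, initial_average=0.0):
    """Return the average power for every block of one frequency band.

    avg[n] = rho * avg[n-1] + (1 - rho) * block_powers[n-1]
    The first block simply uses `initial_average` (hypothetical choice).
    """
    averages = [initial_average]
    for n in range(1, len(block_powers)):
        previous = rho * averages[n - 1] + (1.0 - rho) * block_powers[n - 1]
        averages.append(previous)
    return averages


if __name__ == "__main__":
    # Example: blocks updated every 5 ms, 20 ms time constant (illustrative).
    rho = smoothing_coefficient(0.005, 0.020)
    print(track_average_power([1.0, 1.0, 1.0, 0.0, 0.0], rho))
```

With a constant input the average rises gradually towards the input power, and after the input drops it decays at a rate set by the coefficient, which is the behaviour the time constant of about 20 ms is meant to capture.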
Clearly, the parameters are to be tuned in order to maximize performance. In order to conveniently exploit the analysis carried out according to one aspect of the invention, an increasing function f() is applied to the ratio between the powers, and the value

$$f\!\left(\frac{P_k(n)}{\tilde{P}_k(n)}\right)$$

is used for modifying the signal-to-mask ratios of each block n in accordance with the following formula

$$\mathrm{SM}' = f\!\left(\frac{P_k(n)}{\tilde{P}_k(n)}\right)\cdot \mathrm{SM}$$

According to one embodiment of the invention, an increasing function of the type [formula missing from the source text] is used, leading to a new definition of SM.
Parameter "a" controls the adaptation rate of function f() and is to be determined so as to maximize performance.
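The modification of the signal-to-mask ratio can be sketched as below. The embodiment's actual function f() is not reproduced in the source, so the power law used here is a hypothetical stand-in chosen only because it is increasing and has a single rate parameter `a`; the function names, the default value of `a` and the small floor that avoids division by zero are likewise assumptions.

```python
def power_ratio(instantaneous_power, average_power, floor=1e-12):
    """Ratio of instantaneous to average power in one band and block.

    `floor` is a hypothetical guard against a zero average power.
    """
    return instantaneous_power / max(average_power, floor)


def increasing_function(ratio, a=0.5):
    """Hypothetical increasing function f(); NOT the patent's formula.

    With a > 0, f(1) = 1, f grows for ratios above 1 and shrinks below 1.
    """
    return ratio ** a


def modify_signal_to_mask(sm, instantaneous_power, average_power, a=0.5):
    """Scale SM by f(instantaneous / average), as in blocks 13 to 15."""
    ratio = power_ratio(instantaneous_power, average_power)
    return increasing_function(ratio, a) * sm


if __name__ == "__main__":
    print(modify_signal_to_mask(10.0, 4.0, 1.0))   # power rising: SM raised
    print(modify_signal_to_mask(10.0, 1.0, 1.0))   # steady power: SM unchanged
    print(modify_signal_to_mask(10.0, 0.25, 1.0))  # power falling: SM lowered
```

Under these assumptions a tone onset (instantaneous power above the running average) raises SM, while the interval just after a strong tone (instantaneous power below the average) lowers it, which matches the stated aim of following the listener's changing sensitivity to noise.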
Claims
1. Psychoacoustic method of coding an audio signal including the steps of:
- subdividing said audio signal into frequency bands;
- subdividing said audio signal of each band into time blocks;
- determining for each block a signal-to-mask ratio;
- coding said blocks in accordance with said signal-to-mask ratio;
characterized in that, prior to encoding said blocks, said signal-to-mask ratio is modified by using the time concealment phenomenon.
2. Method according to claim 1, characterized in that said modification of said signal-to-mask ratio is carried out by multiplying said signal-to-mask ratio by a function of the power in said blocks.
3. Method according to claim 2, characterized in that the argument of said function is given by the ratio of the instantaneous power to the average power in said block.
4. Method according to claim 3, characterized in that said average power is evaluated through the formula:

$$\tilde{P}_k(n) = \rho\,\tilde{P}_k(n-1) + (1-\rho)\,P_k(n-1)$$

5. Method according to claim 2, characterized in that said function is an increasing function.
6. Psychoacoustic system for coding an audio signal designed to implement the method of claim 1 including:
- means for subdividing said audio signal into frequency bands;
- means for subdividing said audio signal of each band into time blocks;
- means for determining a signal-to-mask ratio for each block;
- means for coding said blocks according to said signal-to-mask ratio;
characterized by further comprising means for modifying said signal-to-mask ratio, prior to encoding said blocks, by using the time-concealment phenomenon.
7. System according to claim 6, characterized in that said means for modifying said signal-to-mask ratio comprise:
- means for evaluating the average power;
- means for executing the ratio of said average power to the instantaneous power;
- means for calculating a function having said ratio as a variable;
- means for multiplying said function with said signal-to-mask ratio.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU38027/95A AU3802795A (en) | 1994-10-04 | 1995-09-29 | Psychoacoustic audio-signal coding system and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
ITMI94A002015 | 1994-10-04 | ||
ITMI942015A IT1271240B (en) | 1994-10-04 | 1994-10-04 | AUDIO SIGNAL CODING METHOD AND SYSTEM |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1996010870A1 (en) | 1996-04-11 |
Family ID: 11369642
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP1995/003866 WO1996010870A1 (en) | 1994-10-04 | 1995-09-29 | Psychoacoustic audio-signal coding system and method |
Country Status (3)
Country | Link |
---|---|
AU (1) | AU3802795A (en) |
IT (1) | IT1271240B (en) |
WO (1) | WO1996010870A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0458645A2 (en) * | 1990-05-25 | 1991-11-27 | Sony Corporation | Subband digital signal encoding apparatus |
EP0610007A2 (en) * | 1993-02-02 | 1994-08-10 | Sony Corporation | High efficiency encoding and decoding |
- 1994-10-04: IT application ITMI942015A, patent IT1271240B (en), status: active, IP Right Grant
- 1995-09-29: WO application PCT/EP1995/003866, patent WO1996010870A1 (en), status: active, Application Filing
- 1995-09-29: AU application AU38027/95A, patent AU3802795A (en), status: not active, Abandoned
Also Published As
Publication number | Publication date |
---|---|
ITMI942015A0 (en) | 1994-10-04 |
IT1271240B (en) | 1997-05-27 |
AU3802795A (en) | 1996-04-26 |
ITMI942015A1 (en) | 1996-04-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3297051B2 (en) | Apparatus and method for adaptive bit allocation encoding | |
US8913754B2 (en) | System for dynamic spectral correction of audio signals to compensate for ambient noise | |
KR100299528B1 (en) | Apparatus and method for encoding / decoding audio signal using intensity-stereo process and prediction process | |
RU2381571C2 (en) | Synthesisation of monophonic sound signal based on encoded multichannel sound signal | |
US8805696B2 (en) | Quality improvement techniques in an audio encoder | |
JP5539203B2 (en) | Improved transform coding of speech and audio signals | |
DE69633633T2 (en) | MULTI-CHANNEL PREDICTIVE SUBBAND CODIER WITH ADAPTIVE, PSYCHOACOUS BOOK ASSIGNMENT | |
KR100551862B1 (en) | Enhancing the performance of coding systems that use high frequency reconstruction methods | |
KR100242864B1 (en) | Digital signal coder and the method | |
JP3343962B2 (en) | High efficiency coding method and apparatus | |
EP3598442B1 (en) | Systems and methods for modifying an audio signal using custom psychoacoustic models | |
HU213963B (en) | High-activity coder and decoder for digital data | |
EP0740428A1 (en) | Tonality for perceptual audio compression based on loudness uncertainty | |
DE69932861T2 (en) | METHOD FOR CODING AN AUDIO SIGNAL WITH A QUALITY VALUE FOR BIT ASSIGNMENT | |
CA2118916C (en) | Process for reducing data in the transmission and/or storage of digital signals from several dependent channels | |
JPH1028057A (en) | Audio decoder and audio encoding/decoding system | |
EP0525774B1 (en) | Digital audio signal coding system and method therefor | |
JPH0653911A (en) | Method and device for encoding voice data | |
Davidson et al. | Parametric bit allocation in a perceptual audio coder | |
KR20020044416A (en) | Personal wireless communication apparatus and method having a hearing compensation facility | |
WO1996010870A1 (en) | Psychoacoustic audio-signal coding system and method | |
Davidson | Digital audio coding: Dolby AC-3 | |
JP2000148161A (en) | Method and device for automatically controlling sound quality and volume | |
JP3134363B2 (en) | Quantization method | |
Lanciani | Auditory perception and the MPEG audio standard |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AK | Designated states | Kind code of ref document: A1; Designated state(s): AU CA CN FI JP KR MX NZ US |
| AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE |
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
| NENP | Non-entry into the national phase | Ref country code: CA |
| 122 | EP: PCT application non-entry in European phase | |