WO1996010870A1

WO1996010870A1 - Psychoacoustic audio-signal coding system and method

Info

Publication number: WO1996010870A1
Application number: PCT/EP1995/003866
Authority: WO
Inventors: Giorgio Parladori; Gain Antonio Mian; Renato Andreola
Original assignee: Alcatel Italia S.P.A.; Alcatel N.V.
Priority date: 1994-10-04
Filing date: 1995-09-29
Publication date: 1996-04-11
Also published as: ITMI942015A0; IT1271240B; AU3802795A; ITMI942015A1

Abstract

The present invention relates to a psychoacoustic method and system for encoding an audio signal that exploit the perceptual characteristics of the human auditory system. In particular, the system is based on a method of analysis that highlights the characteristics of the audio signal that are perceptually significant. According to one aspect of the invention, this information can be used to compress the information contained in the signal so that it can be transmitted efficiently.

Description

Psychoacoustic Audio-Signal Coding System and Method

The present invention relates to a psychoacoustic method and system for audio-signal coding that exploit the perceptual characteristics of the human auditory system.

The mechanisms by which humans perceive sound stimuli are known in the literature; see, e.g., the book by E. Zwicker and R. Feldtkeller entitled "Das Ohr als Nachrichtenempfaenger" (Hirzel, Stuttgart, 1967).

The discipline that studies these phenomena is named psychoacoustics.

In general, audio-signal encoders use the results of psychoacoustic analysis to determine which characteristics of the signal must be kept unchanged during coding in order to maintain perceptual transparency, that is, the same subjective quality of the signal.

The analysis defines the maximum noise spectrum (noise mask) that the coding may introduce while the system remains perceptually transparent; in other words, it defines the maximum amount of noise that the ear is not able to perceive.

A vector of signal-to-mask ratios is then calculated, one for each of the frequency intervals into which the audible spectrum has been subdivided. In the psychoacoustic model described above, starting from a frequency analysis of the signal to be coded, one can define a signal-to-mask ratio that exploits the concealment produced by a single frequency or, more generally, by a narrow frequency band over the entire spectrum of the signal.

In the field of audio coding, psychoacoustic analysis has proved very effective in obtaining high compression ratios. It is an object of the present invention to improve on the known art by obtaining a higher compression ratio. This object is achieved through a method as set forth in claim 1 and through a system as set forth in claim 6.

Further advantageous aspects of the present invention are set forth in the dependent claims. The present invention exploits the time concealment phenomenon to obtain a psychoacoustic analysis that is better matched to the mechanisms of auditory perception.

According to one aspect of the invention, it is possible to calculate concealment curves that allow higher signal compression while maintaining the same subjective quality of the reconstructed signal. The invention provides a method usable in any coding context in which a good compression ratio is desired; moreover, it can be used in compliance with ISO standard No. 11172-3, which recommends the use of the signal-to-mask ratio. The improvement provided by the invention can therefore be used while still observing the constraints described in the standard.

The invention will now be described in greater detail with reference to the attached drawings wherein:

- Fig. 1 is a block diagram of a generic psychoacoustic coding system,

- Fig. 2 represents an implementation of the subsystem that modifies the signal-to-mask ratio.

Fig. 1 shows a generic encoder 1 that receives a signal from an input line and, from a psychoacoustic analyzer 2, information on how to encode the signal.

This scheme is equivalent to the structure used in ISO standard No. 11172-3 and also serves as the basis for the invention described hereinafter.

Time concealment is a change in the ability to detect a noise accompanying a tone when the tone power varies. With reference to ISO standard 11172-3, consider the signal-to-mask ratio SM and the procedure for calculating it. In the standard, the values of SM are calculated on the basis of frequency concealment.

The basic idea of time concealment is to increase the SM values in those situations where the listener's sensitivity to noise is greater and to decrease them when the sensitivity goes down. To quantify the effects of the system introduced in the present invention, it is useful to introduce the concept of perceptual entropy: the minimum information rate (bit/s) that must be transmitted to ensure that the received signal is perceptually indistinguishable from the original. Introducing time concealment into the coding according to the invention does not necessarily lower the average perceptual entropy, but the bits are certainly distributed differently among the sub-bands.

Experimental observations (see, e.g., James O. Pickles, "An Introduction to the Physiology of Hearing", London, Academic Press, 1982, pages 10-99) have established that the nerve fibers leaving the hair cells respond to a tone with a stepped envelope with a peak of activity whose amplitude is about twice the stationary activity (peaks/s) and whose duration is about 20 ms.

A doubled number of peaks per second conveys more information to the brain about the beginning of the tone than is conveyed afterwards in an equal amount of time. It has further been found that the ratio between the initial peak and the stationary average number of peaks per second depends on the amplitude of the signal: as the amplitude increases, the peak increases more than the stationary value does. Moreover, the description of a second tone reaching the brain is much less detailed if it is preceded by a strong tone. This phenomenon is called postconcealment. In postconcealment, the insensitivity to a noise is greatest in the vicinity of the fall in tone power, and the power of the concealed noise decreases with a time constant of about 20 ms, independently of the tone frequency.

To exploit the phenomena described above, according to one aspect of the invention the scheme shown in Fig. 2 is used. A signal corresponding to the instantaneous power of the signal to be encoded is applied to terminal 10. Terminal 10 is connected to block 12, which calculates and updates the average power, and to block 13, which forms the ratio of the instantaneous power to the average power. Block 13 is connected to block 14, which evaluates the function f() on the output of block 13. Block 15 multiplies the result of block 14 by the signal-to-mask ratio received at terminal 11. The modified signal-to-mask ratio is then available at terminal 16.
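The signal flow of Fig. 2 from block 13 to terminal 16 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `EPS` guard against division by zero, and the choice to pass the average power in as an argument (rather than computing it in block 12) are assumptions. Block 13 is taken to form the ratio of instantaneous to average power, as in the definition given in the paragraphs that follow, and the function f() is supplied by the caller.

```python
from typing import Callable

EPS = 1e-12  # assumed guard so that a silent block does not divide by zero


def block_13_ratio(inst_power: float, avg_power: float) -> float:
    """Block 13: ratio of instantaneous power (terminal 10) to average power (block 12)."""
    return inst_power / max(avg_power, EPS)


def modify_sm(inst_power: float, avg_power: float, sm: float,
              f: Callable[[float], float]) -> float:
    """Blocks 13-15: return the modified signal-to-mask ratio seen at terminal 16."""
    ratio = block_13_ratio(inst_power, avg_power)  # block 13
    gain = f(ratio)                                # block 14: apply f()
    return gain * sm                               # block 15: multiply by SM from terminal 11


if __name__ == "__main__":
    identity = lambda x: x  # placeholder for f(); any increasing function can be passed
    print(modify_sm(inst_power=4.0, avg_power=2.0, sm=10.0, f=identity))
```

When the instantaneous power equals the average power the ratio is 1, so any f() with f(1) = 1 leaves the signal-to-mask ratio unchanged for a stationary signal.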

Consider a generic subdivision of the signal to be coded into time blocks: the variation of power in band $k$ and in block $n$ is indicated by the ratio

$$\frac{P_k(n)}{\tilde{P}_k(n)}$$

of the instantaneous power in band $k$ and in block $n$ to the average power in the same block. The average power is evaluated by applying a suitable function. According to one aspect of the present invention, a first-order low-pass filter is used with a time constant equal to the time constant governing the evolution of the time-concealment phenomena. The equation for updating the average power is:

$$\tilde{P}_k(n) = \rho\,\tilde{P}_k(n-1) + (1-\rho)\,P_k(n-1)$$

with $\rho$ determined by $T$ and $\tau$, where $\tau$ is the time constant of the moving-average filter and $T$ is the power updating time.
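The average-power update can be sketched as a one-pole recursive filter. The sketch below is illustrative only: the mapping from the updating time T and the time constant to the smoothing coefficient is assumed here to be exp(-T/tau), a common choice for a first-order low-pass filter that the text does not confirm, and the function names, the zero initial average, and the 4 ms updating time are hypothetical.

```python
import math


def smoothing_coefficient(T: float, tau: float) -> float:
    """Assumed mapping rho = exp(-T / tau); the source does not state this form."""
    if T <= 0 or tau <= 0:
        raise ValueError("T and tau must be positive")
    return math.exp(-T / tau)


def update_average_power(avg_prev: float, inst_prev: float, rho: float) -> float:
    """One step of avg(n) = rho * avg(n-1) + (1 - rho) * inst(n-1)."""
    return rho * avg_prev + (1.0 - rho) * inst_prev


def average_power_track(inst_powers, rho, initial_avg=0.0):
    """Return the average power for every block, given the instantaneous powers."""
    if not inst_powers:
        return []
    track = [initial_avg]
    # The average for block n depends only on block n-1, so the last
    # instantaneous power is not needed to produce the track.
    for inst_prev in inst_powers[:-1]:
        track.append(update_average_power(track[-1], inst_prev, rho))
    return track


if __name__ == "__main__":
    rho = smoothing_coefficient(T=0.004, tau=0.020)  # hypothetical 4 ms update, 20 ms constant
    for n, avg in enumerate(average_power_track([1.0] * 6, rho)):
        print(n, round(avg, 3))
```

Under the assumed mapping, a unit power step starting from a zero average reaches 1 - exp(-1), about 63% of its final value, after tau/T = 5 updates, which is the usual meaning of a 20 ms time constant.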

Clearly, the parameters must be tuned to maximize performance. To exploit the analysis carried out according to one aspect of the invention, an increasing function f() is applied to the ratio between the powers, and the value

$$f\left(\frac{P_k(n)}{\tilde{P}_k(n)}\right)$$

is used to modify the signal-to-mask ratios of each block $n$ in accordance with the following formula:

$$SM = f\left(\frac{P_k(n)}{\tilde{P}_k(n)}\right) \cdot SM$$

According to one embodiment, the invention uses an increasing function of the type


leading to a new definition of SM.

Parameter "a" controls the adaptation rate of function f() and is to be determined so as to maximize performance.
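The effect of the modification on a tone burst can be illustrated with a short simulation. The exact form of f() is not reproduced in the text, so the sketch assumes a power law with exponent a purely for illustration; this function, the name `run_band`, the power floor, the initialisation of the average, and all numeric values are hypothetical and are not taken from the invention.

```python
def power_law(a):
    """Hypothetical increasing function f(x) = x**a with a > 0 (not the source's formula)."""
    if a <= 0:
        raise ValueError("a must be positive for f to be increasing")

    def f(x):
        return x ** a

    return f


def run_band(inst_powers, sm_values, rho, f, floor=1e-9):
    """Modify the SM value of each block in one band: SM <- f(P / P_avg) * SM."""
    if not inst_powers:
        return []
    # Assumed start-up: the average begins at the first block's power.
    avg = prev = max(inst_powers[0], floor)
    modified = []
    for p, sm in zip(inst_powers, sm_values):
        p = max(p, floor)                       # keep the ratio finite in silent blocks
        avg = rho * avg + (1.0 - rho) * prev    # average-power update from the previous block
        modified.append(f(p / avg) * sm)        # ratio -> f() -> multiply by SM
        prev = p
    return modified


if __name__ == "__main__":
    powers = [1.0, 1.0, 100.0, 100.0, 100.0, 1.0, 1.0]  # quiet, tone burst, quiet
    for p, sm in zip(powers, run_band(powers, [10.0] * 7, rho=0.5, f=power_law(0.5))):
        print(p, round(sm, 2))
```

In this run the SM value rises sharply in the first block of the burst and relaxes towards its original value while the tone is steady; after the tone ends it drops below the original value and then recovers, which mirrors the onset sensitivity and the postconcealment described above.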

Claims

1. Psychoacoustic method of coding an audio signal including the steps of:

- subdividing said audio signal into frequency bands;

- subdividing said audio signal of each band into time blocks;

- determining for each block a signal-to-mask ratio;

- coding said blocks in accordance with said signal-to-mask ratio;

characterized in that, prior to encoding said blocks, said signal-to-mask ratio is modified by using the time concealment phenomenon.

2. Method according to claim 1, characterized in that said modification of said signal-to-mask ratio is carried out by multiplying said signal-to-mask ratio by a function of the power in said blocks.

3. Method according to claim 2, characterized in that the argument of said function is given by the ratio of instantaneous power to the average power in said block.

4. Method according to claim 3, characterized in that said average power is evaluated through the formula:

$$\tilde{P}_k(n) = \rho\,\tilde{P}_k(n-1) + (1-\rho)\,P_k(n-1)$$

5. Method according to claim 2, characterized in that said function is an increasing function.

6. Psychoacoustic system for coding an audio signal designed to implement the method of claim 1 including:

- means for subdividing said audio signal into frequency bands;

- means for subdividing said audio signal of each band into time blocks;

- means for determining a signal-to-mask ratio for each block;

- means for coding said blocks according to said signal-to-mask ratio;

characterized by further comprising means for modifying said signal-to-mask ratio, prior to encoding said blocks, by using the time-concealment phenomenon.

7. System according to claim 6, characterized in that said means for modifying said signal-to-mask ratio comprise:

- means for evaluating the average power;

- means for executing the ratio of said average power to the instantaneous power;

- means for calculating a function having said ratio as a variable;

- means for multiplying said function with said signal-to-mask ratio.