CN101308659B

CN101308659B - Psychoacoustic model processing method based on an advanced audio encoder

Info

Publication number: CN101308659B
Application number: CN2007101276606A
Authority: CN
Inventors: 吴晟; 邱小军; 黎家力; 陈强
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2007-05-16
Filing date: 2007-06-20
Publication date: 2011-11-30
Anticipated expiration: 2027-06-20
Also published as: CN101308659A

Abstract

The invention discloses a processing method for a psychoacoustic model based on an advanced audio encoder. The method comprises the following steps: A, the perceptual entropy and the masking threshold of each coding sub-band are obtained from the spectral energy of the psychoacoustic sub-bands of the bit stream to be encoded, by means of a masking diffusion matrix; B, the predicted bit consumption of each sub-band is calculated from the perceptual entropy and the masking threshold of the coding sub-band, applying a time-frequency masking correction and a pre-echo correction; C, the psychoacoustic model outputs the predicted sub-band bit consumption, which serves as the parameter of rate-distortion control during encoding. Because the sub-band bit consumption is predicted more accurately from the perceptual entropy, and the encoder uses the prediction as its rate-distortion control parameter, the efficiency and quality of quantization and coding are greatly improved.

Description

A processing method for a psychoacoustic model based on an advanced audio encoder

Technical field

The present invention relates to advanced audio encoders, and in particular to a processing method for a psychoacoustic model based on an advanced audio encoder.

Background technology

Advanced Audio Coding (AAC) is a transform-domain lossy perceptual audio coding scheme. Lossy perceptual audio coding achieves very high compression ratios, but its coding error (quantization noise) is unavoidably high. To reduce the audible effect of quantization noise, lossy perceptual coders exploit the psychoacoustic properties of the human ear to control how the coding error is distributed, so that the noise produced by quantization error is hard to perceive. In lossy perceptual coding this task is performed by the psychoacoustic model.

The psychoacoustic model controls the distribution of quantization error by exploiting the auditory masking phenomenon of the human ear. Masking is a common psychoacoustic phenomenon determined by the ear's frequency-resolution and time-resolution mechanisms: near a stronger sound, a weaker sound goes unnoticed, that is, it is masked by the stronger one. The stronger sound is called the masker and the weaker one the maskee. Masking is divided into simultaneous masking (Simultaneous Masking, SM) and non-simultaneous masking (Heterochronous Masking, HM). Simultaneous masking occurs while the masker and the maskee are present at the same time and is also called frequency-domain masking; non-simultaneous masking occurs when the masker and the maskee are not present at the same time and is also called time-domain masking. Depending on its temporal order relative to the masker, non-simultaneous masking is further divided into pre-masking (Forward Masking, FM) and post-masking (Backward Masking, BM): masking that occurs during a certain time before the masker begins is pre-masking, and masking that occurs afterwards is post-masking.

The traditional psychoacoustic model gives the encoder two important parameters. The first is the perceptual entropy, which represents the amount of information left in the signal once the ear's masking effect has been taken into account and the perceptual redundancy removed; it can be used to estimate the bit allocation and to decide the coding block type. The second is the encoder threshold, the maximum tolerable noise in each coding subband, which can be used for distortion control in the quantizer. AAC encoders built on the traditional psychoacoustic model generally quantize with a rate-distortion (Rate-Distortion, R-D) control algorithm driven by the encoder threshold. Such algorithms include the two nested loop search (Two Loop Search, TLS), the trellis-based algorithm (Trellis-Based) and the cascaded trellis-based algorithm (Cascaded Trellis-Based); the latter two are derived from the two nested loop search. The quantizer in an AAC encoder is non-uniform, and its entropy coding is variable-length Huffman coding. Because the quantizer is non-uniform, the encoder cannot derive sufficiently optimized coding parameters directly from the perceptually tolerable noise; and because the entropy code is variable-length, the bit consumption can only be computed from the quantized result. As a consequence, the parameters supplied by the traditional psychoacoustic model do not control quantization and coding well, which makes current rate-distortion control algorithms complex and inefficient.

Abandoning the traditional two-level nested iteration for bit allocation and distortion control, and instead using a prediction of each subband's share of the bits to perform rate control and distortion control in a single rate-distortion control step, yields higher computational efficiency; the resulting audio quality then depends on how well the subband bit-share prediction is optimized. The predicted subband bit consumption can be obtained from the formula: predicted subband bit consumption = subband perceptual entropy × bits available to the current frame / sum of the perceptual entropies of all subbands. For constant-bit-rate (CBR) coding, the number of bits available to the current frame is a fixed value equal to bit rate × 1024 / sampling rate; if it varies with usage, the coding is variable-bit-rate (VBR), and the number of bits available to the current frame is generally supplied by an inter-frame bit control algorithm. As can be seen, the predicted subband bit consumption is merely the product of the normalized perceptual entropy and the number of bits available to the current frame; its accuracy is low, which in turn harms the efficiency of rate-distortion control. Moreover, because the traditional psychoacoustic model considers only simultaneous masking and ignores non-simultaneous masking, the encoder cannot exploit non-simultaneous masking to improve coding quality: once pre-masking fails, the quantization noise is no longer masked, pre-echo occurs, and audio quality drops markedly. Although the AAC standard provides Temporal Noise Shaping (TNS) to weaken pre-echo, practical tests show that using this module makes the audio quality even worse.
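The traditional prediction described above can be sketched in a few lines of Python. This is only an illustration of the two formulas in the paragraph; the function names `frame_bits_cbr` and `naive_subband_bits` are hypothetical and do not come from the patent.

```python
def frame_bits_cbr(bit_rate, sample_rate, frame_len=1024):
    """Bits available to one frame at a constant bit rate."""
    return bit_rate * frame_len / sample_rate


def naive_subband_bits(pe_sfb, frame_bits):
    """Share frame_bits among subbands in proportion to perceptual entropy."""
    total = sum(pe_sfb)
    if total <= 0:
        # Assumption: with no perceptual entropy at all, predict zero bits.
        return [0.0 for _ in pe_sfb]
    return [frame_bits * pe / total for pe in pe_sfb]


# 128 kbps at 44.1 kHz gives roughly 2972 bits per 1024-sample frame.
bits = frame_bits_cbr(128_000, 44_100)
print(round(bits, 1), naive_subband_bits([1.0, 3.0], 400.0))
```

The prediction depends only on the normalized perceptual entropy, which is the limitation the invention sets out to address.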

Summary of the invention

The present invention addresses the technical problems described above by providing a processing method for a psychoacoustic model based on an advanced audio encoder. The method takes both time-domain masking and frequency-domain masking fully into account, and therefore outputs an accurate predicted bit consumption for each coding subband, improving the quality and efficiency of the encoder's quantization and coding.

To achieve this, the present invention adopts the following technical scheme:

A processing method for a psychoacoustic model based on an advanced audio encoder, comprising the following steps:

A, from the spectral energy of the psychoacoustic subbands of the stream to be encoded, calculate the perceptual entropy and the masking threshold of each coding subband by means of a masking diffusion matrix;

B, from the perceptual entropy and the masking threshold of the coding subband, apply the time-frequency masking correction and the pre-echo correction, and calculate the predicted subband bit consumption;

C, the psychoacoustic model outputs the predicted subband bit consumption as the parameter of rate-distortion control for encoding.

Step B comprises the following sub-steps:

B1, compare the current masking threshold of the coding subband with its long-term average masking threshold to obtain the time-frequency masking correction factor;

B2, use time-domain masking to judge whether the pre-echo has lost its masking; if so, modify the time-frequency masking correction factor;

B3, use the time-frequency masking correction factor to correct the perceptual entropy and calculate the predicted subband bit consumption.

The long-term average masking threshold in step B1 is obtained by the following formula:

{Argmask}_{sfb}(k) = \alpha \, {Argmask}'_{sfb}(k) + (1 - \alpha) \, {mask}_{sfb}(k),

where Argmask'_sfb(k) is the long-term average masking threshold of coding subband k in the previous frame, Argmask_sfb(k) is the long-term average masking threshold of coding subband k in the current frame, mask_sfb(k) is the masking threshold of coding subband k in the current frame, and α is the decay factor;

The time-frequency masking correction factor is obtained by the following formulas:

chk = \frac{{mask}_{sfb}(k)}{{Argmask}_{sfb}(k)},

If chk > 4,

{brust}_{sfb}(k) = \min\left(1.5, \frac{\log_{2}(chk)}{2}\right),

α = 0.98;

If 0.5 ≤ chk ≤ 4, then brust_sfb(k) = 0.95, α = 0.4;

If chk < 0.5, then brust_sfb(k) = 0.90, α = 0.4;

where chk is the energy ratio and brust_sfb(k) is the time-domain masking correction factor.
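A minimal Python sketch of step B1 for one coding subband follows. It rests on two assumptions that the text does not settle: chk is computed against the long-term average carried over from the previous frame (the average of the current frame cannot be known before α is chosen), and that average is strictly positive. The function name `masking_correction` is hypothetical.

```python
import math


def masking_correction(mask_now, avg_prev):
    """Return (brust, alpha, avg_new) for one coding subband.

    mask_now: masking threshold of the current frame
    avg_prev: long-term average masking threshold up to the previous frame
              (assumed > 0)
    """
    chk = mask_now / avg_prev              # energy ratio
    if chk > 4:                            # sudden rise of the threshold
        brust = min(1.5, math.log2(chk) / 2)
        alpha = 0.98
    elif chk >= 0.5:                       # threshold close to its average
        brust, alpha = 0.95, 0.4
    else:                                  # sudden drop of the threshold
        brust, alpha = 0.90, 0.4
    # Recursive update of the long-term average with the chosen decay factor.
    avg_new = alpha * avg_prev + (1 - alpha) * mask_now
    return brust, alpha, avg_new


print(masking_correction(16.0, 1.0))
```

With α = 0.98 the long-term average barely moves during a burst, so the burst keeps being recognized for several frames; with α = 0.4 the average tracks the current threshold quickly.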

Judging by time-domain masking whether the pre-echo has lost its masking, as in step B2, comprises the following steps:

B21, divide one frame of the time-domain signal into 8 segments, compute the sum of the absolute amplitudes of each segment, and store the sums in the middle 8 elements of the segment absolute-amplitude vector abamp:

abamp(m + 1) = \sum_{n = 256(m - 1) + 1}^{256 m} | x_{i}(n) |, \quad m = 1, 2, \cdots, 8

where abamp is a 10 × 1 vector. Its first element abamp(1) inherits the square root of the sum of the squared amplitudes of the 8 segments of the previous frame,

{abamp}_{i}(1) = \sqrt{\sum_{m = 2}^{9} {abamp}_{i - 1}(m)^{2}},

and its last element repeats the absolute amplitude of the last segment of the current frame, abamp(10) = abamp(9);

B22, from the segment absolute amplitudes obtained in step B21, calculate the time-domain mask Tmask(m) by the following formula:

Tmask(m) = Tnorm(m) \sum_{n = 1}^{m + 2} abamp(n) \, {Rate}_{Tmask}(m - n + 3)

where the time-domain diffusion attenuation coefficients Rate_Tmask are

{Rate}_{Tmask} = [0.1, 0.9^{0}, 0.9^{1}, 0.9^{2}, 0.9^{3}, 0.9^{4}, 0.9^{5}, 0.9^{6}, 0.9^{7}, 0.9^{8}]

and the time-domain diffusion normalization coefficient Tnorm(m) is

Tnorm(m) = \frac{1}{\sum_{n = 1}^{m + 2} {Rate}_{Tmask}(n)}, \quad m = 1, 2, \cdots, 8

B23, when 1.3 Tmask(1) < Tmask(8) and Tmask(8) > 2000, the pre-echo is judged to have lost its masking.

When the pre-echo is judged to have lost its masking, the time-frequency masking correction factors of two consecutive frames are modified as follows: brust'_sfb(k) = brust_sfb(k)^chnBrust, where brust'_sfb(k) is the time-frequency masking correction factor after pre-echo correction, brust_sfb(k) is the original time-domain masking correction factor, chnBrust = 3 when correcting the first frame and chnBrust = 2 when correcting the second frame.
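Steps B21 to B23 can be sketched in Python as follows. The sketch assumes a 2048-sample frame (8 segments of 256 samples, as the summation limits imply), uses 0-based lists where the text uses 1-based indices, and sets abamp(1) to 0 when no previous frame exists; the function names are hypothetical.

```python
# Rate_Tmask: [0.1, 0.9^0, 0.9^1, ..., 0.9^8], ten coefficients in total.
RATE_TMASK = [0.1] + [0.9 ** p for p in range(9)]


def segment_amplitudes(frame, prev_abamp=None):
    """Build the 10-element abamp vector (list index 0 is abamp(1))."""
    assert len(frame) == 2048
    seg = [sum(abs(x) for x in frame[256 * m:256 * (m + 1)]) for m in range(8)]
    if prev_abamp is None:
        first = 0.0  # assumption: no history before the first frame
    else:
        first = sum(a * a for a in prev_abamp[1:9]) ** 0.5
    return [first] + seg + [seg[-1]]


def time_mask(abamp):
    """Tmask(m) for m = 1..8, returned as an 8-element list."""
    tmask = []
    for m in range(1, 9):
        # 1-based n runs over 1..m+2; Rate_Tmask(m-n+3) is list index m-n+2.
        acc = sum(abamp[n - 1] * RATE_TMASK[m - n + 2] for n in range(1, m + 3))
        tmask.append(acc / sum(RATE_TMASK[:m + 2]))
    return tmask


def pre_echo_unmasked(tmask):
    """Step B23: has the pre-echo lost its masking?"""
    return 1.3 * tmask[0] < tmask[7] and tmask[7] > 2000


def pre_echo_correct(brust, first_frame):
    """Raise the correction factor to chnBrust = 3 (first frame) or 2 (second)."""
    return brust ** (3 if first_frame else 2)


onset = [0.0] * 1792 + [100.0] * 256        # silence followed by a loud tail
print(pre_echo_unmasked(time_mask(segment_amplitudes(onset))))
```

A frame that is silent until its last segment trips the test, while a steady signal preceded by an equally loud frame does not.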

Step B3 is implemented as follows:

B31, use the time-frequency masking correction factor to correct the perceptual entropy and obtain the predicted subband bit-consumption ratio;

B32, apply inter-frame negative-feedback bit control based on the actual bit consumption to obtain the number of bits available to the current frame;

B33, calculate the predicted subband bit consumption from the predicted subband bit-consumption ratio and the number of bits available to the current frame.

The predicted subband bit-consumption ratio in step B31 is obtained by the following formula:

sfbBitRatio(k) = \frac{{PE}_{sfb}(k)}{\sum_{k = 1}^{49} {PE}_{sfb}(k)} \, {brust}'_{sfb}(k),

where sfbBitRatio(k) is the predicted subband bit-consumption ratio, brust'_sfb(k) is the time-frequency masking correction factor (after pre-echo correction, where it applies), and PE_sfb(k) is the perceptual entropy of the coding subband.

The number of bits available to the current frame in step B32 is obtained by the following formula:

bitAvailable(i) = controlRatio \cdot (bitAverage + bitAvailable(i - 1) - bitUsed),

where controlRatio is the inter-frame correction factor, bitAverage is the average number of bits available per frame according to the average bit rate, bitAvailable(i-1) is the number of bits available to the previous frame, and bitUsed is the number of bits actually consumed by the previous frame. The inter-frame correction factor is determined by the following rules:

If bitRatio > 1.06,

controlRatio = \frac{1}{bitRatio + 0.2},

If 1.06 ≥ bitRatio > 1.05, controlRatio = 0.9,

If 1.05 ≥ bitRatio > 1.02, controlRatio = 0.95,

If 1.02 ≥ bitRatio ≥ 0.98, controlRatio = 1,

If bitRatio < 0.98, controlRatio = 1.2, where

bitRatio = \frac{bitAll}{K \cdot bitAverage}

is the ratio of the current average number of bits per frame, bitAll/K (bitAll being the total number of bits used so far and K the number of frames processed so far), to the average number of bits available per frame.

The predicted subband bit consumption in step B33 is obtained by the following formula:

sfbBits(k) = bitAvailable(i) \cdot sfbBitRatio(k),

where sfbBits(k) is the predicted subband bit consumption, bitAvailable(i) is the number of bits available to the current frame, and sfbBitRatio(k) is the predicted subband bit-consumption ratio.
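Steps B31 to B33 amount to a small amount of arithmetic per frame. The Python sketch below follows the formulas above directly; the function names are hypothetical, and the perceptual entropies are assumed to sum to a positive value.

```python
def control_ratio(bit_ratio):
    """Inter-frame correction factor as a function of bitRatio."""
    if bit_ratio > 1.06:
        return 1.0 / (bit_ratio + 0.2)
    if bit_ratio > 1.05:
        return 0.9
    if bit_ratio > 1.02:
        return 0.95
    if bit_ratio >= 0.98:
        return 1.0
    return 1.2


def available_bits(bit_average, prev_available, bit_used, bit_all, frames_done):
    """bitAvailable(i): negative feedback on the bits spent so far."""
    bit_ratio = bit_all / (frames_done * bit_average)
    return control_ratio(bit_ratio) * (bit_average + prev_available - bit_used)


def subband_bits(pe_sfb, brust, bit_available):
    """sfbBits(k) = bitAvailable(i) * sfbBitRatio(k) for every coding subband."""
    total_pe = sum(pe_sfb)  # assumed > 0
    return [bit_available * pe / total_pe * b for pe, b in zip(pe_sfb, brust)]


# An encoder that has spent exactly its budget keeps controlRatio at 1.
print(available_bits(3000.0, 3000.0, 3000.0, 30000.0, 10))
```

When the running average exceeds the target by more than 2%, controlRatio drops below 1 and shrinks the next frame's budget; when the encoder underspends by more than 2%, the budget grows by 20%.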

Step A comprises the following steps:

A1, sum the spectral energy of each psychoacoustic subband of the stream to be encoded to obtain the psychoacoustic subband energy;

A2, calculate the subband energy peak-to-valley value from the psychoacoustic subband energy;

A3, map the subband energy peak-to-valley value to a mask-to-signal ratio by a quadratic equation;

A4, calculate the self-masking energy of the subband from the mask-to-signal ratio and the psychoacoustic subband energy;

A5, obtain the masking threshold of the psychoacoustic subband from the self-masking energy by means of the diffusion matrix;

A6, calculate the perceptual entropy of the psychoacoustic subband from the psychoacoustic subband energy and the masking threshold;

A7, map the perceptual entropy and the masking threshold of the psychoacoustic subbands onto the perceptual entropy and the masking threshold of the coding subbands, respectively.

The diffusion matrix in step A5 is a sparse diffusion matrix. It is made sparse by setting to 0 every element of the normalized diffusion matrix that is smaller than a predetermined decibel threshold. The normalization factor of the normalized diffusion matrix is obtained by the following formula:

sprdngN(b) = \sum_{bb = 1}^{70} sprdngf[bval(b) - bval(bb)],

where sprdngN(b) is the normalization factor, bval(b) and bval(bb) are Bark frequencies, and sprdngf is the diffusion function;

The diffusion function is determined as follows:

spr = sprdngf(\Delta f_{c}) =
\begin{cases}
0, & \Delta f_{c} \le -3.3333 \\
10^{\frac{15.811389 + 7.5 (1.5 \Delta f_{c} + 0.474) - 17.5 \sqrt{1 + (1.5 \Delta f_{c} + 0.474)^{2}}}{10}}, & -3.3333 < \Delta f_{c} \le 0 \\
10^{\frac{15.811389 + 7.5 (3 \Delta f_{c} + 0.474) - 17.5 \sqrt{1 + (3 \Delta f_{c} + 0.474)^{2}}}{10}}, & 0 < \Delta f_{c} \le 0.5 \\
10^{\frac{8 [(3 \Delta f_{c} - 1.5)^{2} - 1] + 15.811389 + 7.5 (3 \Delta f_{c} + 0.474) - 17.5 \sqrt{1 + (3 \Delta f_{c} + 0.474)^{2}}}{10}}, & 0.5 < \Delta f_{c} \le 2.5 \\
10^{\frac{15.811389 + 7.5 (3 \Delta f_{c} + 0.474) - 17.5 \sqrt{1 + (3 \Delta f_{c} + 0.474)^{2}}}{10}}, & 2.5 < \Delta f_{c} \le 7.3333 \\
0, & \Delta f_{c} > 7.3333
\end{cases}

where spr is the value of the diffusion function.

The subband energy peak-to-valley value in step A2 is obtained by the following formula:

ppRate(b) = \frac{E_{psy}(\epsilon)}{E_{psy}(b)} = \frac{\min(E_{psy}(b - 1), E_{psy}(b + 1))}{E_{psy}(b)},

where ppRate(b) is the subband energy peak-to-valley value, E_psy(b) is the energy of the current psychoacoustic subband, and E_psy(b-1) and E_psy(b+1) are the energies of the previous and the next psychoacoustic subband, respectively.

The quadratic equation in step A3 is:

{MSR}_{psy}(b) = 0.17453 \, ppRate(b)^{2} + 0.08325 \, ppRate(b),

where MSR_psy(b) is the mask-to-signal ratio and ppRate(b) is the subband energy peak-to-valley value.

The self-masking energy in step A4 is obtained by the following formula:

E_{selfmask}(b) = E_{psy}(b) \, {MSR}_{psy}(b),

where E_selfmask(b) is the self-masking energy, E_psy(b) is the psychoacoustic subband energy, and MSR_psy(b) is the mask-to-signal ratio.
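Steps A2 to A4 chain together as in the following Python sketch. The treatment of the first and last subband, which have only one neighbour, is an assumption (the single existing neighbour is used); the function names are hypothetical, and all subband energies are assumed to be positive.

```python
def pp_rate(e_psy):
    """Peak-to-valley value: smaller neighbouring energy over own energy."""
    n = len(e_psy)
    out = []
    for b in range(n):
        # Assumption: edge subbands use their only existing neighbour.
        neighbours = [e_psy[j] for j in (b - 1, b + 1) if 0 <= j < n]
        out.append(min(neighbours) / e_psy[b])
    return out


def msr(pp):
    """Quadratic mapping from peak-to-valley value to mask-to-signal ratio."""
    return 0.17453 * pp * pp + 0.08325 * pp


def self_mask_energy(e_psy):
    """E_selfmask(b) = E_psy(b) * MSR_psy(b)."""
    return [e * msr(p) for e, p in zip(e_psy, pp_rate(e_psy))]


print(self_mask_energy([4.0, 2.0, 8.0]))
```

A subband that is a spectral peak (both neighbours much weaker) gets a small peak-to-valley value and hence a small mask-to-signal ratio, as expected for a tonal component.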

The masking threshold in step A5 is obtained by the following formula:

{mask}_{psy}(b) = E_{selfmask} * sprdngMN,

where mask_psy(b) is the masking threshold of the psychoacoustic subband and sprdngMN is the diffusion matrix.

The perceptual entropy of the psychoacoustic subband in step A6 is obtained by the following formula:

{PE}_{psy}(b) = {bw}_{psy}(b) \log_{10}\left[\frac{E_{psy}(b)}{{mask}_{psy}(b)}\right],

where PE_psy(b) is the perceptual entropy of the psychoacoustic subband, bw_psy(b) is the bandwidth of the psychoacoustic subband, E_psy(b) is the psychoacoustic subband energy, and mask_psy(b) is the masking threshold of the psychoacoustic subband.

The perceptual entropy of the psychoacoustic subbands in step A7 is mapped onto the perceptual entropy of the coding subbands by the following formula:

{PE}_{sfb}(k) = \sum_{w = sfbLow(k)}^{sfbHigh(k)} {PE}_{spec}(w),

where PE_sfb(k) is the perceptual entropy of coding subband k; psyLow(b) ≤ w ≤ psyHigh(b), with psyHigh(b) and psyLow(b) the upper and lower bounds of psychoacoustic subband b; sfbHigh(k) and sfbLow(k) are the upper and lower bounds of coding subband k;

{PE}_{spec}(w) = \frac{{PE}_{psy}(b)}{{bw}_{psy}(b)},

bw_psy(b) is the bandwidth of the psychoacoustic subband and PE_psy(b) is its perceptual entropy;

The masking threshold of the psychoacoustic subbands is mapped onto the masking threshold of the coding subbands by the following formula:

{mask}_{sfb}(k) = {bw}_{sfb}(k) \min({mask}_{apsy}(b)), \quad b1 \le b \le b2,

where mask_sfb(k) is the masking threshold of coding subband k, b1 satisfies psyLow(b1) ≤ sfbLow(k) ≤ psyHigh(b1), b2 satisfies psyLow(b2) ≤ sfbHigh(k) ≤ psyHigh(b2),

{mask}_{apsy}(b) = \frac{{mask}_{psy}(b)}{{bw}_{psy}(b)},

mask_psy(b) is the masking threshold of the psychoacoustic subband; psyHigh(b1) and psyLow(b1) are the upper and lower bounds of psychoacoustic subband b1; psyHigh(b2) and psyLow(b2) are the upper and lower bounds of psychoacoustic subband b2; sfbHigh(k) and sfbLow(k) are the upper and lower bounds of coding subband k; and bw_sfb(k) is the bandwidth of coding subband k.

By comparing the parameters of the current frame with their long-term averages over past frames, and by using time-domain masking to detect and correct pre-echo, the present invention realizes a psychoacoustic model processing method that takes full account of both time-domain masking and frequency-domain masking (time-frequency masking). The predicted subband bit consumption is therefore obtained from the perceptual entropy more accurately, and using this prediction as the encoder's rate-distortion control parameter greatly improves the efficiency and quality of quantization and coding. The perceptual entropy is obtained through the masking diffusion matrix, which is made sparse during the calculation, so the perceptual entropy is obtained faster and with fewer operations.

Description of drawings

Fig. 1 is a structural block diagram of the Megal AAC encoder that uses an embodiment of the present invention;

Fig. 2 is a flow chart of the processing method of the embodiment of the present invention;

Fig. 3 is a schematic diagram of the mask-to-signal ratio on the different subbands, together with its upper and lower constraint bounds;

Fig. 4 is a schematic diagram of the judgment that the pre-echo has lost its masking;

Fig. 5 is a schematic comparison of the ODG scores of several encoders;

Fig. 6 is a schematic comparison of the NMR scores of several encoders;

Fig. 7 is a schematic diagram of the ODG distribution of several encoders;

Fig. 8 is a schematic diagram of the NMR distribution of several encoders.

Embodiment

The specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

An embodiment of the processing method of the present invention is shown in Fig. 2; its processing steps are as follows:

1, from the spectral energy of the psychoacoustic subbands of the stream to be encoded, calculate the perceptual entropy and the masking threshold of the coding subbands by means of the masking diffusion matrix

1a) Sum the MDCT (modified discrete cosine transform) spectral energy of each psychoacoustic subband of the current frame to obtain the psychoacoustic subband energy E_psy

1b) Calculate the subband energy peak-to-valley value ppRate(b)

ppRate(b) = \frac{E_{psy}(\epsilon)}{E_{psy}(b)} = \frac{\min(E_{psy}(b - 1), E_{psy}(b + 1))}{E_{psy}(b)} \qquad (1)

where b is the index of the current subband, and b-1 and b+1 denote the previous and the next subband, respectively.

After the subband energy peak-to-valley value has been obtained, it is constrained to the interval [lower(b), upper(b)]:

If ppRate(b) > upper(b), ppRate(b) = upper(b)

If ppRate(b) < lower(b), ppRate(b) = lower(b)

That is, ppRate(b) = max(lower(b), min(upper(b), ppRate(b))), where

lower(b) = \tan\left(\left| 1.5 \frac{b - 2}{67} - 0.5 \right|^{4}\right), \quad b = 2, \cdots, 69

lower(1) = lower(2) + 0.1, \quad lower(70) = lower(69) \qquad (2)

upper(b) = lower(b) + 0.7
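The constraint bounds of formula (2) and the clamping of ppRate(b) can be tabulated as in this Python sketch. It keeps the 1-based subband numbering of the text by leaving list index 0 unused; the function names are hypothetical.

```python
import math


def lower_bounds():
    """lower(b) for the 70 psychoacoustic subbands; index 0 is unused."""
    low = [0.0] * 71
    for b in range(2, 70):                       # b = 2, ..., 69
        low[b] = math.tan(abs(1.5 * (b - 2) / 67 - 0.5) ** 4)
    low[1] = low[2] + 0.1
    low[70] = low[69]
    return low


def clamp_pp_rate(pp, b, low):
    """Constrain ppRate(b) to [lower(b), upper(b)], upper(b) = lower(b) + 0.7."""
    return max(low[b], min(low[b] + 0.7, pp))


low = lower_bounds()
print(round(low[2], 5), round(low[69], 5), clamp_pp_rate(5.0, 24, low))
```

The lower bound is nearly zero around subband 24 and rises towards both ends of the spectrum, reaching tan(1) at subband 69.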

1c) Map the subband energy peak-to-valley value to the mask-to-signal ratio MSR_psy(b) by a quadratic equation

{MSR}_{psy}(b) = 0.17453 \, ppRate(b)^{2} + 0.08325 \, ppRate(b) \qquad (3)

where the linear and quadratic coefficients are preferred values obtained through extensive testing. Fig. 3 shows how the mask-to-signal ratio is constrained on the different psychoacoustic subbands; as the figure shows, the mask-to-signal ratio lies between the upper and lower constraint bounds.

1d) From the psychoacoustic subband energy and the mask-to-signal ratio, calculate the self-masking energy E_selfmask(b) of the subband

E_{selfmask}(b) = E_{psy}(b) \, {MSR}_{psy}(b) \qquad (4)

1e) Use the normalized diffusion matrix to calculate the masking threshold mask_psy(b)

{mask}_{psy}(b) = E_{selfmask} * sprdngMN \qquad (5)

where the normalized diffusion matrix sprdngMN is determined by the following formulas

sprdngN(b) = \sum_{bb = 1}^{70} sprdngf[bval(b) - bval(bb)]

sprdngMN = [matrix expression not preserved in the source text] \qquad (6)

In formula (6), bval() is the function that maps a subband index to its Bark frequency. The Bark scale is a frequency partition that models the characteristics of human hearing: the range from 20 Hz to 20000 Hz is divided non-uniformly into 25 Bark. The mapping from frequency to Bark is usually expressed by a complicated non-linear function, so in practice the finite set of Bark values that is needed is precomputed into a table and looked up to simplify the calculation. bval() is that look-up table, and the normalization factor sprdngN(b) is computed in advance from it.

sprdngf() is the diffusion function; its value is given by the following formula:

spr = sprdngf(\Delta f_{c}) =
\begin{cases}
0, & \Delta f_{c} \le -3.3333 \\
10^{\frac{15.811389 + 7.5 (1.5 \Delta f_{c} + 0.474) - 17.5 \sqrt{1 + (1.5 \Delta f_{c} + 0.474)^{2}}}{10}}, & -3.3333 < \Delta f_{c} \le 0 \\
10^{\frac{15.811389 + 7.5 (3 \Delta f_{c} + 0.474) - 17.5 \sqrt{1 + (3 \Delta f_{c} + 0.474)^{2}}}{10}}, & 0 < \Delta f_{c} \le 0.5 \\
10^{\frac{8 [(3 \Delta f_{c} - 1.5)^{2} - 1] + 15.811389 + 7.5 (3 \Delta f_{c} + 0.474) - 17.5 \sqrt{1 + (3 \Delta f_{c} + 0.474)^{2}}}{10}}, & 0.5 < \Delta f_{c} \le 2.5 \\
10^{\frac{15.811389 + 7.5 (3 \Delta f_{c} + 0.474) - 17.5 \sqrt{1 + (3 \Delta f_{c} + 0.474)^{2}}}{10}}, & 2.5 < \Delta f_{c} \le 7.3333 \\
0, & \Delta f_{c} > 7.3333
\end{cases} \qquad (7)

Every element of sprdngMN that is smaller than -100 dB is set to 0, so that sprdngMN becomes a sparse diffusion matrix whose non-zero entries are

[formula (8) not preserved in the source text]

sprdngMN has 672 non-zero entries in total, so the masking threshold can be computed with 672 multiply-add operations.

After the masking threshold has been calculated, it is constrained so that it does not fall below the threshold in quiet, as follows:

{mask}_{psy}(b) = \max[{mask}_{psy}(b), qthr(b)] \qquad (9)

where qthr(b) is the threshold in quiet.
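The sparse evaluation of formula (5) followed by the floor of formula (9) might look like the Python sketch below. It makes three assumptions: the decibel value of a matrix element is 10·log10 of the element (consistent with spr being defined as 10 raised to a tenth of a dB value), E_selfmask acts as a row vector multiplying the matrix from the left, and the small matrix in the usage line is made-up toy data rather than the real 70 × 70 matrix. The function names are hypothetical.

```python
def sparsify(matrix, floor_db=-100.0):
    """Keep only entries at or above floor_db, as (row, col, value) triples."""
    floor = 10.0 ** (floor_db / 10.0)
    return [(r, c, v)
            for r, row in enumerate(matrix)
            for c, v in enumerate(row)
            if v >= floor]


def masking_threshold(e_selfmask, sparse, qthr):
    """mask_psy = max(E_selfmask * sprdngMN, qthr), one multiply-add per entry."""
    mask = [0.0] * len(qthr)
    for r, c, v in sparse:
        mask[c] += e_selfmask[r] * v
    return [max(m, q) for m, q in zip(mask, qthr)]


toy = [[0.8, 0.2, 1e-12],
       [0.1, 0.9, 0.0],
       [0.0, 1e-11, 1.0]]
entries = sparsify(toy)
print(len(entries), masking_threshold([10.0, 20.0, 0.001], entries, [1.0, 1.0, 1.0]))
```

Storing only the surviving entries is what lets the cost scale with the number of non-zero terms (672 in the patent) instead of with the full matrix size.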

1f) Calculate the perceptual entropy PE_psy(b) from the psychoacoustic subband energy and the masking threshold

{PE}_{psy}(b) = {bw}_{psy}(b) \log_{10}\left[\frac{E_{psy}(b)}{{mask}_{psy}(b)}\right] \qquad (10)

where bw_psy(b) is the bandwidth of the psychoacoustic subband.

1g) Map the perceptual entropy and the masking threshold onto the coding subbands

Calculate the perceptual entropy of each spectral line in the psychoacoustic subband

{PE}_{spec}(w) = \frac{{PE}_{psy}(b)}{{bw}_{psy}(b)} \qquad (11)

and map it onto the coding subband

{PE}_{sfb}(k) = \sum_{w = sfbLow(k)}^{sfbHigh(k)} {PE}_{spec}(w) \qquad (12)

where psyLow(b) ≤ w ≤ psyHigh(b); psyHigh(b) and psyLow(b) are the upper and lower bounds of psychoacoustic subband b, and sfbHigh(k) and sfbLow(k) are the upper and lower bounds of coding subband k.

Calculate the masking threshold of each spectral line in the psychoacoustic subband

{mask}_{apsy}(b) = \frac{{mask}_{psy}(b)}{{bw}_{psy}(b)} \qquad (13)

and map it onto the coding subband

{mask}_{sfb}(k) = {bw}_{sfb}(k) \min({mask}_{apsy}(b)), \quad b1 \le b \le b2 \qquad (14)

where b1 satisfies

psyLow(b1) \le sfbLow(k) \le psyHigh(b1) \qquad (15)

and b2 satisfies

psyLow(b2) \le sfbHigh(k) \le psyHigh(b2) \qquad (16)

psyHigh(b1) and psyLow(b1) are the upper and lower bounds of psychoacoustic subband b1; psyHigh(b2) and psyLow(b2) are the upper and lower bounds of psychoacoustic subband b2; sfbHigh(k) and sfbLow(k) are the upper and lower bounds of coding subband k; and bw_sfb(k) is the bandwidth of coding subband k.
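The mapping of step 1g can be sketched in Python by first spreading each psychoacoustic subband value evenly over its spectral lines and then regrouping the lines into coding subbands. The sketch assumes 0-based inclusive line indices, contiguous psychoacoustic subbands that cover every line used by the coding subbands, and a bandwidth equal to the number of lines; the function name is hypothetical.

```python
def map_to_coding_subbands(pe_psy, mask_psy, psy_bounds, sfb_bounds):
    """Return (pe_sfb, mask_sfb).

    psy_bounds, sfb_bounds: lists of inclusive (low, high) spectral-line indices.
    """
    n_lines = psy_bounds[-1][1] + 1
    pe_spec = [0.0] * n_lines      # per-line perceptual entropy, formula (11)
    mask_line = [0.0] * n_lines    # per-line masking threshold, formula (13)
    for (lo, hi), pe, mk in zip(psy_bounds, pe_psy, mask_psy):
        bw = hi - lo + 1
        for w in range(lo, hi + 1):
            pe_spec[w] = pe / bw
            mask_line[w] = mk / bw
    pe_sfb, mask_sfb = [], []
    for lo, hi in sfb_bounds:
        bw = hi - lo + 1
        pe_sfb.append(sum(pe_spec[lo:hi + 1]))              # formula (12)
        # The minimum over the lines equals the minimum over subbands b1..b2.
        mask_sfb.append(bw * min(mask_line[lo:hi + 1]))     # formula (14)
    return pe_sfb, mask_sfb


print(map_to_coding_subbands([8.0, 4.0], [4.0, 8.0],
                             [(0, 3), (4, 7)], [(0, 1), (2, 5), (6, 7)]))
```

Taking the minimum per-line threshold is the conservative choice: a coding subband that straddles two psychoacoustic subbands inherits the stricter of the two noise limits.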

2, compare the current masking threshold with the long-term average masking threshold to obtain the time-frequency masking correction factor

Update the long-term average masking threshold of each coding subband from the coding subband masking threshold of the current frame

{Argmask}_{sfb}(k) = \alpha \, {Argmask}'_{sfb}(k) + (1 - \alpha) \, {mask}_{sfb}(k) \qquad (17)

where Argmask'_sfb(k) is the long-term average masking threshold of coding subband k in the previous frame, Argmask_sfb(k) is the long-term average masking threshold of coding subband k in the current frame, mask_sfb(k) is the masking threshold of coding subband k in the current frame, and α is the decay factor, which differs according to the masking situation; its value is determined by formula (18).

Compare the masking energy of the coding subband with its long-term average masking energy to obtain the energy ratio

chk = \frac{{mask}_{sfb}(k)}{{Argmask}_{sfb}(k)} \qquad (18)

Comparing chk with fixed thresholds gives the correction factor brust_sfb(k) and the decay factor α: if chk > 4, brust_sfb(k) = min(1.5, log2(chk)/2) and α = 0.98; if 0.5 ≤ chk ≤ 4, brust_sfb(k) = 0.95 and α = 0.4; if chk < 0.5, brust_sfb(k) = 0.90 and α = 0.4.

3, judge pre-echo by time-domain masking and modify the time-frequency masking correction factor

Time-domain masking can be used to judge whether the pre-echo has lost its masking; if it has, the time-domain masking correction factor is modified, which further improves the accuracy of the subsequent steps that use the time-frequency masking correction factor. The specific steps are:

Divide one frame of the time-domain signal into 8 segments, compute the sum of the absolute amplitudes of each segment, and store the sums in the middle 8 elements of the segment absolute-amplitude vector abamp

abamp(m + 1) = \sum_{n = 256(m - 1) + 1}^{256 m} | x_{i}(n) |, \quad m = 1, 2, \cdots, 8 \qquad (19)

abamp is a 10 × 1 vector. Its first element abamp(1) inherits the square root of the sum of the squared amplitudes of the 8 segments of the previous frame

{abamp}_{i}(1) = \sqrt{\sum_{m = 2}^{9} {abamp}_{i - 1}(m)^{2}} \qquad (20)

and its last element repeats the absolute amplitude of the last segment of the current frame, abamp(10) = abamp(9). The time-domain mask Tmask(m) is an 8 × 1 vector calculated by the following formula

Tmask(m) = Tnorm(m) \sum_{n = 1}^{m + 2} abamp(n) \, {Rate}_{Tmask}(m - n + 3) \qquad (21)

where the time-domain diffusion attenuation coefficients Rate_Tmask are

{Rate}_{Tmask} = [0.1, 0.9^{0}, 0.9^{1}, 0.9^{2}, 0.9^{3}, 0.9^{4}, 0.9^{5}, 0.9^{6}, 0.9^{7}, 0.9^{8}] \qquad (22)

and the time-domain diffusion normalization coefficient Tnorm(m) is

Tnorm(m) = \frac{1}{\sum_{n = 1}^{m + 2} {Rate}_{Tmask}(n)}, \quad m = 1, 2, \cdots, 8 \qquad (23)

When 1.3 Tmask(1) < Tmask(8) and Tmask(8) > 2000, the pre-echo is judged to have lost its masking; the effect of this judgment is shown in Fig. 4. When the pre-echo is judged to have lost its masking, the time-frequency masking correction factors of two consecutive frames are given the pre-echo correction:

{brust}'_{sfb}(k) = {brust}_{sfb}(k)^{chnBrust} \qquad (24)

where brust'_sfb(k) is the time-frequency masking correction factor after pre-echo correction, chnBrust = 3 when correcting the first frame and chnBrust = 2 when correcting the second frame.

4, use the time-frequency masking correction factor to correct the perceptual entropy and obtain the predicted subband bit-consumption ratio sfbBitRatio(k)

sfbBitRatio(k) = \frac{{PE}_{sfb}(k)}{\sum_{k = 1}^{49} {PE}_{sfb}(k)} \, {brust}'_{sfb}(k) \qquad (25)

5, apply inter-frame negative-feedback bit control based on the actual bit consumption, and calculate the predicted coding subband bit consumption from the predicted subband bit-consumption ratio. The specific steps are:

5a) Negative-feedback inter-frame bit correction

Let bitAll be the total number of bits used so far, K the number of frames processed so far, bitUsed the number of bits actually consumed by the previous frame, bitAverage the average number of bits available per frame according to the average bit rate, and bitAvailable(i-1) the number of bits available to the previous frame. The current average number of bits per frame is bitAll/K, and its ratio to the average number of bits is

bitRatio = \frac{bitAll}{K \cdot bitAverage}.

The number of bits available to the current frame, bitAvailable(i), is

bitAvailable(i) = controlRatio \cdot (bitAverage + bitAvailable(i - 1) - bitUsed) \qquad (26)

and is constrained to a certain range:

\alpha \cdot bitAverage \le bitAvailable(i) \le \beta \cdot bitAverage \qquad (27)

where 0 < α < 1 and β > 1; α = 0.95 and β = 1.2 are generally suitable.
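The range constraint on bitAvailable(i) can be written as a one-line clamp. This Python sketch assumes that α·bitAverage is the lower bound and β·bitAverage the upper bound, which is the only ordering that gives a non-empty range when 0 < α < 1 < β; the function name is hypothetical.

```python
def clamp_available_bits(bit_available, bit_average, alpha=0.95, beta=1.2):
    """Keep the frame budget within [alpha, beta] times the average budget."""
    return max(alpha * bit_average, min(beta * bit_average, bit_available))


# A frame budget far above the average is cut back to 1.2 times the average.
print(clamp_available_bits(5000.0, 3000.0))
```

The clamp keeps the feedback loop of formula (26) from starving a frame after a run of expensive frames, and from handing a single frame an outsized budget after a run of cheap ones.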

5b) calculation code subband bit consumption prediction number sfbBits (k)

sfbBits(k)＝bitAvailable(i)sfbBitRatio(k) (28)

6, psychoacoustic model output subband bit consumption prediction number as the parameter of conducting code rate distortion control to carry out encoding process

Behind the subband bit consumption that obtains the encoding prediction number, psychoacoustic model is exported the parameter of this prediction number as conducting code rate distortion control, and conducting code rate distortion control carries out entropy coding and code stream is synthetic, finishes encoding process.

The thresholds, parameters and coefficients given in this embodiment are preferred values obtained by experiment. The present invention is not limited to the values disclosed above; those skilled in the art will understand that, within the concept of the present invention, these values may be adjusted according to actual conditions to achieve better results.

The psychoacoustic model of the present invention is called the entropy-allocation psychoacoustic model (EAPAM). It is compared here with the traditional psychoacoustic model 2 (PAM II) provided by the MPEG-4 AAC standard, which is used in many audio codecs such as MP3. Megal AAC Encoder (Megal) is an AAC encoder that uses subband bit-proportion prediction to guide rate-distortion control; its structure is shown in Fig. 1. Algorithm complexity is assessed by comparing Free Advanced Audio Coder (FAAC), which uses PAM II, with Megal AAC Encoder using EAPAM, when encoding stereo audio sampled at 44100 Hz with 16-bit quantization at an average bit rate of 128 kbps. The metric is millions of operations per second (MOPS).

Table 1 Computational cost of the psychoacoustic model types and of the encoding algorithms

*1 Implemented with a look-up table

*2 Uses the sparse diffusion matrix

As Table 1 shows, the computational cost of the EAPAM algorithm is 48.478 MOPS lower than that of PAM II, and the share of the total computation taken by this module falls from 57% to 17%. Because the R-D algorithm is guided by the subband bit-proportion prediction, its computational cost falls from 35 MOPS to 12.8 MOPS. The overall computational cost falls by 69.6 MOPS, a reduction of 76.7%.

The audio quality of the encoders is assessed with EAQUAL 1.3, an objective evaluation program that implements the Perceptual Evaluation of Audio Quality (PEAQ) standard. The names and descriptions of the quality indices provided by PEAQ are listed in Table 2.

Table 2 Evaluation indices output by the EAQUAL software and their meanings

Index name	Meaning
ODG	Objective difference grade
DIX	Distortion index
BandwidthTest	Bandwidth of the reference signal
NMR	Noise-to-mask ratio
WinModDiff1	Windowed modulation difference (average)
ADB	Average distorted block
EHS	Error of harmonic structure
AvgModDiff1	Average modulation difference 1
AvgModDiff2	Average modulation difference 2
MFPD	Maximum filtered probability of detection
RDF	Relative disturbed frames

Here the overall index (ODG) and two important individual indices (BandwidthTest and NMR) are selected as the main reference indices. The audio quality assessment compares four encoder configurations side by side: Megal using the EAPAM model of the present invention, Megal using the traditional PAM II model, NCTU AAC Encoder (hereinafter NCTU), and FAAC. NCTU is the AAC encoder developed by the perceptual audio group of National Chiao Tung University, Taiwan. FAAC is the AAC encoder developed by Fraunhofer IIS of Germany; Fraunhofer IIS is a principal author of the MPEG standard, and its FAAC encoder is the verification encoder of the AAC standard. The test material comes from the first and second volumes of the audio test disc British Music on Lyrita from Quad provided by the U.S. company Hui Wei. After duplicate songs were removed, 37 music excerpts were chosen; these excerpts cover the basic types of music, and their titles and descriptions are listed in Table 3.

Table 3 Test songs

No.	Song	Type	Duration (s)
1	Snowflake flies upward	Electronic synthesizer, high pre-echo probability	84.07
2	Female voice is sung opera arias	Female a cappella, English female voice	59.30
3	shaniaFuain	Pop, English female voice, high pre-echo probability	88.68
4	The ferry	Pop, Chinese female voice, high pre-echo probability	72.77
5	Da Ban city Miss	Men's chorus	68.38
6	Hotel california	Eagles	119.98
7	The drum poem	XRCD, high pre-echo probability	65.32
8	The red light note	Beijing opera female voice	53.43
9	Zhang San's song	Pop, Chinese male voice	57.77
10	The bass king	Contrabass	87.49
11	Denon	Orchestral music	61.21
12	The POLO guitar	Instrumental music	59.98
13	Chinese lute is to the saxophone	Instrumental music	84.08
14	The water of the Yellow River has been done	Folk, Chinese male voice	69.85
15	Mut's violin	Solo	61.63
16	OneIlove	Female a cappella, English female voice	74.51
17	High mountain and great rivers	Zheng	53.96
18	Liang shanbo and Zhu yingtai	Violin concerto	50.36
19	Fever is classical	Symphony	76.63
20	Seven-stringed plucked instrument in some ways similar to the zither is to suona horn	National musical instruments	77.25
21	The hunting polka	Symphony	59.98
22	Wilfully like you	Pop, Cantonese male voice	89.05
23	2001 A Space Odysseys	Symphony	99.45
24	The Song of Joy	Women's chorus, sound literary composition female voice	128.64
25	Bubukao	Zheng	63.58
26	The song in the four seasons	Viol	68.71
27	The toll bar prelude	Trumpet	67.43
28	See off	Women's chorus, Chinese young girl's sound	106.23
29	Knock toll bar	Percussion, high pre-echo probability	68.66
30	Carmina Burana	Chorus, English poem	151.46
31	The fiddler on the Roof	Violin, solo	72.98
32	Dear father	Soprano	76.14
33	Tonight is unmanned sleeping	Tenor, opera	175.94
34	Voice	National ecosystem, female voice	61.99
35	The F-16 fighter plane	Sound effect	43.89
36	Twister	Sound effect, natural phonation	64.29
37	The rocket lift-off	Sound effect	39.99

The test results are shown in Table 4.

Table 4 Test results

26	-0.59	20173	-8.4287	-0.44	20409	-9.5179	-0.44	20627	-9.8411	-0.47	19948	-9.5655
27	-1.25	20167	-7.1056	-0.71	20352	-8.5245	-0.7	20593	-9.1391	-0.82	19832	-8.7548
28	-0.74	20097	-8.3041	-0.4	20930	-10.178	-0.4	20649	-10.254	-0.54	19895	-9.7134
29	-1.05	20258	-8.1341	-0.83	20064	-9.1348	-0.64	20432	-9.2984	-0.82	19910	-8.3131
30	-0.58	20108	-8.1582	-0.57	19449	-8.9738	-0.34	20654	-9.8612	-0.5	19965	-7.4497
31	-0.84	20152	-9.102	-0.65	20739	-9.6848	-0.69	20651	-10.339	-0.94	19940	-8.3607
32	-1.42	20044	-7.2542	-0.76	20001	-9.1252	-0.67	19858	-9.6689	-0.78	19773	-9.4107
33	-0.86	20050	-8.4676	-0.5	19327	-10.514	-0.53	19726	-10.665	-0.59	19832	-9.9142
34	-0.86	20128	-7.6612	-0.52	20373	-9.6314	-0.41	20602	-9.9943	-0.56	19897	-9.0474
35	-0.63	20158	-8.6926	-0.66	20317	-9.1048	-0.33	20541	-10.686	-0.45	19899	-9.9646
36	-0.61	20076	-7.9394	-1.1	19055	-8.1089	-0.35	20527	-9.4003	-0.5	19826	-8.7527
37	-0.39	20574	-8.7607	-0.67	20773	-7.5065	-0.35	20738	-8.8435	-0.45	19963	-8.1802
Worst	-2.02	20044	-6.6257	-1.1	19055	-7.2061	-0.91	19726	-7.7108	-1.8	19770	6.8442
Best	-0.36	20574	-10.022	-0.37	21221	-11.012	-0.2	20738	-16.799	-0.45	20038	-10.442
Average	-0.829	20200	-8.263	-0.666	20295	-9.321	-0.47919	20531.95	-10.3993	-0.845	19894	-6.32534

As Fig. 5 and Fig. 6 show, the average ODG of NCTU is 0.163 better than that of FAAC, and the average ODG of Megal using the present invention is a further 0.187 better than that of NCTU, while Megal using the PAM II method is roughly on a par with FAAC. The average NMR of NCTU is 1.06 dB lower than that of FAAC, and the average NMR of Megal using the present invention is 1.08 dB lower than that of NCTU, while the average NMR of Megal using the PAM II method is higher than that of FAAC. Similar conclusions can be drawn from the ODG distribution of the test clips in Fig. 7 and their NMR distribution in Fig. 8. Both the computational cost assessment and the objective quality assessment show that the present invention enables an AAC encoder to achieve significantly better audio quality at a significantly lower computational cost.
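The quoted differences can be reproduced from the averages in the last row of Table 4. The Python sketch below does the arithmetic; the assignment of the four column groups to FAAC, NCTU, Megal with EAPAM and Megal with PAM II (in that order) is an inference from the differences quoted in the text, since the table header is not reproduced here, and the dictionary keys are illustrative names.

```python
# Averages taken from the last row of Table 4 (assumed column order).
avg_odg = {"FAAC": -0.829, "NCTU": -0.666,
           "Megal_EAPAM": -0.47919, "Megal_PAMII": -0.845}
avg_nmr = {"FAAC": -8.263, "NCTU": -9.321,
           "Megal_EAPAM": -10.3993, "Megal_PAMII": -6.32534}

# A higher (less negative) ODG is better; a lower NMR is better.
odg_nctu_vs_faac = avg_odg["NCTU"] - avg_odg["FAAC"]
odg_megal_vs_nctu = avg_odg["Megal_EAPAM"] - avg_odg["NCTU"]
nmr_nctu_vs_faac = avg_nmr["NCTU"] - avg_nmr["FAAC"]
nmr_megal_vs_nctu = avg_nmr["Megal_EAPAM"] - avg_nmr["NCTU"]

print(round(odg_nctu_vs_faac, 3), round(odg_megal_vs_nctu, 3))
print(round(nmr_nctu_vs_faac, 2), round(nmr_megal_vs_nctu, 2))
```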

By comparing the parameters of the current frame with their long-term averages over past frames, and by making a time-domain pre-echo decision, the present invention realizes a psychoacoustic model that takes full account of both time-domain masking and frequency-domain masking (time-frequency masking). It finally outputs an accurate prediction of the bit allocation proportion of the coding subbands, which can improve the coding quality of the quantization and coding algorithm; at the same time, its operation count is significantly lower than that of the traditional psychoacoustic model algorithm.

Claims

1. An audio processing method based on a psychoacoustic model of an advanced audio encoder, characterized in that it comprises the following processing steps:

B. from the perceptual entropy and the masking threshold of the coding subbands, applying time-frequency masking correction and pre-echo correction to calculate the predicted bit consumption of the coding subbands, specifically:

B1. comparing the current masking threshold of each coding subband with its long-term average masking threshold to obtain a time-frequency masking correction factor,

B2. determining by time-domain masking whether a pre-echo loses its masking and, if so, correcting the time-frequency masking correction factor over two consecutive frames according to brust'_sfb(k) = brust_sfb(k)^chnBrust, where brust'_sfb(k) is the time-frequency masking correction factor after pre-echo correction, brust_sfb(k) is the original time-frequency masking correction factor, k denotes the k-th coding subband, chnBrust = 3 for the first corrected frame, and chnBrust = 2 for the second corrected frame,

B3. correcting the perceptual entropy with the time-frequency masking correction factor and calculating the predicted bit consumption of the coding subbands, specifically:

B31. correcting the perceptual entropy of the coding subbands with the time-frequency masking correction factor to obtain the coding subband bit consumption prediction ratio,

B32. applying inter-frame negative-feedback bit control according to the actual bit consumption to obtain the number of bits available to the current frame,

B33. calculating the predicted bit consumption of the coding subbands from the coding subband bit consumption prediction ratio and the number of bits available to the current frame,

wherein the coding subband bit consumption prediction ratio in step B31 is obtained by the following formula:

sfbBitRatio (k) = \frac{{PE}_{sfb} (k)}{Σ_{k = 1}^{49} {PE}_{sfb} (k)} {brust}^{'}_{sfb} (k),

where sfbBitRatio(k) is the coding subband bit consumption prediction ratio, brust'_sfb(k) is the time-frequency masking correction factor, PE_sfb(k) is the perceptual entropy of the coding subband, and k denotes the k-th coding subband;

C. the psychoacoustic model outputs the predicted bit consumption of the coding subbands as a parameter for rate-distortion control, so as to carry out the encoding.

2. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to claim 1, characterized in that the long-term average masking threshold in step B1 is obtained by the following formula: Argmask_sfb(k) = α·Argmask'_sfb(k) + (1-α)·mask_sfb(k)

where Argmask'_sfb(k) is the long-term average masking threshold of the coding subband in the previous frame, Argmask_sfb(k) is the long-term average masking threshold of the coding subband in the current frame, mask_sfb(k) is the masking threshold of the coding subband in the current frame, α is the decay coefficient, and k denotes the k-th coding subband;

chk = \frac{{mask}_{sfb} (k)}{{Argmask}_{sfb} (k)},

if chk > 4, then

{brust}_{sfb} (k) = \min (1.5, \frac{\log_{2} (chk)}{2}),

α = 0.98;

if 4 ≥ chk ≥ 0.5, then brust_sfb(k) = 0.95 and α = 0.4;

if chk < 0.5, then brust_sfb(k) = 0.90 and α = 0.4;

where chk is the energy ratio and brust_sfb(k) is the time-frequency masking correction factor of the current coding subband.
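The rules of this claim can be sketched for a single coding subband in Python. Because α depends on chk and the current-frame average depends on α, the sketch assumes that chk is evaluated against the previous frame's long-term average before that average is updated; this ordering, the function name and the requirement that the previous average be positive are assumptions rather than statements of the claim.

```python
import math


def update_subband(mask, avg_prev):
    """Return (brust, alpha, avg_new) for one coding subband.

    mask is the current masking threshold and avg_prev the long-term average
    of the previous frame (assumed > 0 and used as the denominator of chk).
    """
    chk = mask / avg_prev
    if chk > 4:
        brust, alpha = min(1.5, math.log2(chk) / 2), 0.98
    elif chk >= 0.5:
        brust, alpha = 0.95, 0.4
    else:
        brust, alpha = 0.90, 0.4
    # Long-term average masking threshold of the current frame.
    avg_new = alpha * avg_prev + (1 - alpha) * mask
    return brust, alpha, avg_new


jump = update_subband(16.0, 1.0)   # sudden rise of the masking threshold
steady = update_subband(1.0, 1.0)  # threshold close to its average
drop = update_subband(0.1, 1.0)    # sudden fall of the masking threshold
```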

3. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to claim 1, characterized in that determining by time-domain masking whether a pre-echo loses its masking in step B2 comprises the following steps:

abamp (m + 1) = Σ_{n = 256 (m - 1) + 1}^{256 m} | x_{i} (n) |, m = 1, 2, \ldots, 8

the last element inherits the absolute amplitude of the final segment of the current frame, abamp(10) = abamp(9); where i denotes the current frame, i-1 denotes the previous frame, and x_i(n) is the n-th time-domain sample of the current frame;

Tmask (m) = Tnorm (m) Σ_{t = 1}^{m + 2} abamp (t) {Rate}_{Tmask} (m - t + 3)

where the time-domain diffusion attenuation coefficients Rate_Tmask are

Rate_Tmask = [0.1, 0.9^0, 0.9^1, 0.9^2, 0.9^3, 0.9^4, 0.9^5, 0.9^6, 0.9^7, 0.9^8]

and the time-domain diffusion normalization coefficient Tnorm(m) is

Tnorm (m) = \frac{1}{Σ_{t = 1}^{m + 2} {Rate}_{Tmask} (t)}, m = 1, 2, \ldots, 8;
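The amplitude and time-masking formulas of this claim can be sketched in Python as follows. The definition of abamp(1) and the decision rule applied to Tmask are not reproduced in this claim text, so the sketch takes abamp(1) as a caller-supplied value (`prev_amp`, a hypothetical name) and stops at computing Tmask(1) to Tmask(8). A frame of 2048 samples is assumed because the sums cover n = 1 to 2048.

```python
RATE_TMASK = [0.1] + [0.9 ** p for p in range(9)]  # 10 attenuation coefficients


def block_amplitudes(frame, prev_amp):
    """Return abamp(1..10) as a 0-based list of length 10.

    prev_amp stands in for abamp(1), whose definition is not given here.
    """
    if len(frame) != 2048:
        raise ValueError("expected a frame of 2048 samples")
    amps = [prev_amp]
    for m in range(8):
        # abamp(m + 2) in 1-based terms: absolute sum over one 256-sample block.
        amps.append(sum(abs(x) for x in frame[256 * m:256 * (m + 1)]))
    amps.append(amps[-1])  # abamp(10) = abamp(9)
    return amps


def time_masking(abamp):
    """Return Tmask(1..8) as a 0-based list of length 8."""
    tmask = []
    for m in range(1, 9):
        tnorm = 1.0 / sum(RATE_TMASK[:m + 2])
        # Rate_Tmask(m - t + 3) is RATE_TMASK[m - t + 2] with 0-based indexing.
        acc = sum(abamp[t - 1] * RATE_TMASK[m - t + 2] for t in range(1, m + 3))
        tmask.append(tnorm * acc)
    return tmask


amps = block_amplitudes([1.0] * 2048, 256.0)
tmask = time_masking(amps)
```

With a constant-amplitude input every block sum equals 256, and the normalization by Tnorm(m) makes each Tmask(m) a weighted average of the block amplitudes, so the result stays at 256.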

4. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to claim 1, characterized in that the number of bits available to the current frame in step B32 is obtained by the following formula: bitAvailable(i) = controlRatio · (bitAverage + bitAvailable(i-1) - bitUsed), where controlRatio is the inter-frame correction factor, bitAverage is the average number of bits available per frame according to the average bit rate, bitAvailable(i-1) is the number of bits available to the previous frame, and bitUsed is the number of bits actually consumed by the previous frame; the inter-frame correction factor is determined as follows:

if bitRatio > 1.06,

controlRatio = \frac{1}{bitRatio + 0.2},

if 1.06 ≥ bitRatio > 1.05, controlRatio = 0.9,

if 1.05 ≥ bitRatio > 1.02, controlRatio = 0.95,

if 1.02 ≥ bitRatio ≥ 0.98, controlRatio = 1,

if bitRatio < 0.98, controlRatio = 1.2, where

bitRatio = \frac{bitAll}{K \cdot bitAverage}

is the ratio of the current average number of bits per frame, bitAll/K, to the average number of bits available; i denotes the current frame, K is the number of frames processed so far, and bitAll is the total number of bits used so far.
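The piecewise rule for the inter-frame correction factor can be sketched in Python as follows; the function names are illustrative.

```python
def bit_ratio(bit_all, frames_done, bit_average):
    """Ratio of the average bits spent per frame so far to the per-frame budget."""
    return bit_all / (frames_done * bit_average)


def control_ratio(ratio):
    """Inter-frame correction factor chosen from bitRatio."""
    if ratio > 1.06:
        return 1.0 / (ratio + 0.2)
    if ratio > 1.05:
        return 0.9
    if ratio > 1.02:
        return 0.95
    if ratio >= 0.98:
        return 1.0
    return 1.2


overspent = control_ratio(bit_ratio(130000, 100, 1000))  # bitRatio = 1.3
on_target = control_ratio(bit_ratio(100000, 100, 1000))  # bitRatio = 1.0
underspent = control_ratio(bit_ratio(90000, 100, 1000))  # bitRatio = 0.9
```

The factor is below 1 when the encoder has spent more than its budget and above 1 when it has spent less, which is what makes the control loop a negative feedback.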

5. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to claim 1, characterized in that the predicted bit consumption of the coding subbands in step B33 is obtained by the following formula:

sfbBits(k) = bitAvailable(i) · sfbBitRatio(k), where sfbBits(k) is the predicted bit consumption of the coding subband, bitAvailable(i) is the number of bits available to the current frame, sfbBitRatio(k) is the coding subband bit consumption prediction ratio, and k denotes the k-th coding subband.

6. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to any one of claims 1 to 5, characterized in that step A comprises the following steps:

A2. calculating the peak-to-valley value of the psychoacoustic subband energy from the psychoacoustic subband energy;

A3. mapping the peak-to-valley value of the psychoacoustic subband energy to a masking-to-signal ratio by a second-order linear equation;

A4. calculating the self-masking energy of the psychoacoustic subband from the masking-to-signal ratio and the psychoacoustic subband energy;

7. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to claim 6, characterized in that the diffusion matrix in step A5 is a sparse diffusion matrix; the diffusion matrix is made sparse by setting to 0 those elements of the normalized diffusion matrix that are smaller than a predetermined decibel threshold, and the normalization factor of the normalized diffusion matrix is obtained by the following formula:

sprdngN (b) = Σ_{bb = 1}^{70} sprdngf [bval (b) - bval (bb)],

where sprdngN(b) is the normalization factor, bval(b) and bval(bb) are Bark frequencies, and sprdngf() is the diffusion function; the diffusion function is determined as follows:

spr = sprdngf ({Δf}_{c}) = \begin{cases} 0, & Δ f_{c} \le - 3.3333 \\ 10^{\frac{15.811389 + 7.5 (1.5 Δ f_{c} + 0.474) - 17.5 \sqrt{1 + {(1.5 Δ f_{c} + 0.474)}^{2}}}{10}}, & - 3.3333 < Δ f_{c} \le 0 \\ 10^{\frac{15.811389 + 7.5 (3 Δ f_{c} + 0.474) - 17.5 \sqrt{1 + {(3 Δ f_{c} + 0.474)}^{2}}}{10}}, & 0 < Δ f_{c} \le 0.5 \\ 10^{\frac{8 [{(3 Δ f_{c} - 1.5)}^{2} - 1] + 15.811389 + 7.5 (3 Δ f_{c} + 0.474) - 17.5 \sqrt{1 + {(3 Δ f_{c} + 0.474)}^{2}}}{10}}, & 0.5 < Δ f_{c} \le 2.5 \\ 10^{\frac{15.811389 + 7.5 (3 Δ f_{c} + 0.474) - 17.5 \sqrt{1 + {(3 Δ f_{c} + 0.474)}^{2}}}{10}}, & 2.5 < Δ f_{c} \le 7.3333 \\ 0, & Δ f_{c} > 7.3333 \end{cases}

where spr is the value of the diffusion function, b denotes the b-th psychoacoustic subband, bb denotes the bb-th psychoacoustic subband, and Δf_c denotes bval(b) - bval(bb).
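The normalization and sparsification described in this claim can be sketched generically in Python. The sketch makes several assumptions that the claim does not settle: each row is divided by its normalization factor sprdngN(b), the decibel threshold is converted to a power ratio with 10^(dB/10), and a toy diffusion function stands in for sprdngf. All names and the threshold value are illustrative.

```python
def sparse_normalized_matrix(bval, sprdngf, threshold_db):
    """Build a row-normalized diffusion matrix and zero its small elements."""
    floor = 10 ** (threshold_db / 10)  # assumed conversion from dB to a power ratio
    matrix = []
    for b in range(len(bval)):
        row = [sprdngf(bval[b] - bval[bb]) for bb in range(len(bval))]
        norm = sum(row)  # sprdngN(b)
        row = [value / norm for value in row]
        # Sparsification: drop elements below the threshold.
        matrix.append([value if value >= floor else 0.0 for value in row])
    return matrix


def toy_spread(delta):
    """Stand-in diffusion function (hypothetical): 10 dB of decay per Bark."""
    return 10 ** (-abs(delta))


sparse = sparse_normalized_matrix([0.0, 1.0, 2.0, 3.0], toy_spread, -15.0)
zeros = sum(value == 0.0 for row in sparse for value in row)
```

Zeroed elements can be skipped when the matrix is applied to the subband energies, which is where a sparse matrix saves computation.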

8. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to claim 6, characterized in that the peak-to-valley value of the psychoacoustic subband energy in step A2 is obtained by the following formula:

ppRate (b) = \frac{E_{psy} (ε)}{E_{psy} (b)} = \frac{\min (E_{psy} (b - 1), E_{psy} (b + 1))}{E_{psy} (b)},

where ppRate(b) is the peak-to-valley value of the psychoacoustic subband energy, E_psy(b) is the energy of the current psychoacoustic subband, E_psy(b-1) and E_psy(b+1) are the energies of the previous and the next psychoacoustic subband respectively, b denotes the b-th psychoacoustic subband, and ε is the (b-1)-th or the (b+1)-th psychoacoustic subband.

9. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to claim 6, characterized in that the second-order linear equation in step A3 is:

MSR_psy(b) = 0.17453·ppRate(b)² + 0.08325·ppRate(b), where MSR_psy(b) is the masking-to-signal ratio, ppRate(b) is the peak-to-valley value of the psychoacoustic subband energy, and b denotes the b-th psychoacoustic subband.

10. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to claim 6, characterized in that the self-masking energy in step A4 is obtained by the following formula:

E_selfmask(b) = E_psy(b)·MSR_psy(b), where E_selfmask(b) is the self-masking energy, E_psy(b) is the psychoacoustic subband energy, MSR_psy(b) is the masking-to-signal ratio, and b denotes the b-th psychoacoustic subband.
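Steps A2 to A4 (claims 8 to 10) chain together naturally, as the Python sketch below illustrates. The claims do not say how the first and last subbands are handled, so the sketch assumes that a subband with only one neighbour uses that neighbour alone; it also assumes at least two subbands with positive energy. The function name is illustrative.

```python
def self_masking_energy(e_psy):
    """Return E_selfmask(b) for every psychoacoustic subband."""
    result = []
    last = len(e_psy) - 1
    for b, energy in enumerate(e_psy):
        # Assumed edge handling: use whichever neighbours exist.
        neighbours = []
        if b > 0:
            neighbours.append(e_psy[b - 1])
        if b < last:
            neighbours.append(e_psy[b + 1])
        pp_rate = min(neighbours) / energy               # step A2
        msr = 0.17453 * pp_rate ** 2 + 0.08325 * pp_rate  # step A3
        result.append(energy * msr)                       # step A4
    return result


e_selfmask = self_masking_energy([1.0, 4.0, 2.0])
```

In this example the middle subband is a spectral peak (its neighbours are weaker), so its peak-to-valley value and hence its masking-to-signal ratio are small.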

11. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to claim 6, characterized in that the masking threshold in step A5 is obtained by the following formula:

mask_psy(b) = E_selfmask * sprdngMN, where mask_psy(b) is the masking threshold of the psychoacoustic subband, sprdngMN is the diffusion matrix, E_selfmask is the self-masking energy, and b denotes the b-th psychoacoustic subband.

12. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to claim 6, characterized in that the perceptual entropy of the psychoacoustic subband in step A6 is obtained by the following formula:

{PE}_{psy} (b) = {bw}_{psy} (b) \log_{10} [\frac{E_{psy} (b)}{{mask}_{psy} (b)}],

where PE_psy(b) is the perceptual entropy of the psychoacoustic subband, bw_psy(b) is the bandwidth of the psychoacoustic subband, E_psy(b) is the psychoacoustic subband energy, mask_psy(b) is the masking threshold of the psychoacoustic subband, and b denotes the b-th psychoacoustic subband.
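The perceptual entropy formula of this claim is short enough to sketch directly in Python. The function name is illustrative, and positive energy and threshold values are assumed. The claim does not say how a subband whose energy lies below its masking threshold is treated, so the sketch returns the negative value unchanged.

```python
import math


def perceptual_entropy(bandwidth, energy, mask):
    """PE_psy(b) = bw_psy(b) * log10(E_psy(b) / mask_psy(b))."""
    return bandwidth * math.log10(energy / mask)


audible = perceptual_entropy(4, 100.0, 1.0)  # energy 20 dB above the threshold
masked = perceptual_entropy(4, 1.0, 10.0)    # energy 10 dB below the threshold
```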

13. The audio processing method based on a psychoacoustic model of an advanced audio encoder according to claim 6, characterized in that the perceptual entropy of the psychoacoustic subbands in step A7 is mapped to the perceptual entropy of the coding subbands by the following formula:

{PE}_{sfb} (k) = Σ_{w = sfbLow (k)}^{sfbHigh (k)} {PE}_{spec} (w),

where PE_sfb(k) is the perceptual entropy of the coding subband; sfbLow(k) and sfbHigh(k) are the lower and upper bounds of coding subband k, respectively;

bw_psy(b) is the bandwidth of the psychoacoustic subband, PE_psy(b) is the perceptual entropy of the psychoacoustic subband, psyLow(b) ≤ w ≤ psyHigh(b), where psyHigh(b) and psyLow(b) are the upper and lower bounds of psychoacoustic subband b, respectively; w denotes the w-th spectral line in the b-th psychoacoustic subband;

mask_psy(b) is the masking threshold of the psychoacoustic subband; psyHigh(b1) and psyLow(b1) are the upper and lower bounds of psychoacoustic subband b1, respectively; psyHigh(b2) and psyLow(b2) are the upper and lower bounds of psychoacoustic subband b2, respectively; bw_sfb(k) is the bandwidth of the coding subband.