CN101308659B - Psychoacoustics model processing method based on advanced audio decoder - Google Patents

Psychoacoustics model processing method based on advanced audio decoder

Info

Publication number
CN101308659B
Authority
CN
China
Prior art keywords
subband
psychologic acoustics
psy
sfb
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2007101276606A
Other languages
Chinese (zh)
Other versions
CN101308659A (en
Inventor
吴晟
邱小军
黎家力
陈强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN2007101276606A priority Critical patent/CN101308659B/en
Publication of CN101308659A publication Critical patent/CN101308659A/en
Application granted granted Critical
Publication of CN101308659B publication Critical patent/CN101308659B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a psychoacoustic model processing method based on an advanced audio coder. The method comprises the following steps: A, from the spectral energy of the psychoacoustic subbands of the stream to be encoded, the perceptual entropy and masking threshold of each coding subband are computed using a masking spreading matrix; B, from the perceptual entropy and masking threshold of the coding subbands, a predicted subband bit consumption is computed by applying a time-frequency masking correction and a pre-echo correction; C, the psychoacoustic model outputs the predicted subband bit consumption as a parameter of rate-distortion control for the encoding process. The method derives the subband bit consumption from the perceptual entropy more accurately, and the encoder uses the predicted value as its rate-distortion control parameter, which greatly improves the efficiency and quality of quantization and coding.

Description

A processing method for a psychoacoustic model based on an advanced audio coder
Technical field
The present invention relates to advanced audio coders, and in particular to a processing method for a psychoacoustic model based on an advanced audio coder.
Background
Advanced Audio Coding (AAC) is a transform-domain lossy perceptual audio coding scheme. Lossy perceptual audio coding achieves very high compression ratios, but its coding error (quantization noise) is unavoidably high. To reduce the influence of quantization noise, lossy perceptual audio coding draws on the psychoacoustics of the human ear to control how the coding error is distributed, so that the noise produced by quantization error is hard to perceive. In lossy perceptual coding this is done by the psychoacoustic model.
The psychoacoustic model controls the distribution of quantization error by exploiting the auditory masking phenomenon of the human ear. Masking is a common psychoacoustic phenomenon determined by the ear's frequency-resolution and time-resolution mechanisms: near a stronger sound, a relatively weaker sound is not perceived, that is, it is masked by the stronger one. The stronger sound is called the masker and the weaker sound the maskee. Masking is divided into simultaneous masking (Simultaneous Masking, SM) and non-simultaneous masking (Heterochronous Masking, HM). Simultaneous masking occurs when the masker and maskee are present at the same time and is also called frequency-domain masking; non-simultaneous masking occurs when they are not present at the same time and is also called time-domain masking. According to the order of occurrence relative to the masker, non-simultaneous masking is further divided into pre-masking (Forward Masking, FM) and post-masking (Backward Masking, BM): if the masking effect occurs during a period before the masker begins it is pre-masking, and if it occurs afterwards it is post-masking.
The traditional psychoacoustic model provides the encoder with two important parameters. One is the perceptual entropy, which represents the amount of information left in the signal once the auditory masking effect is taken into account and perceptual redundancy is removed; it can be used to estimate the bit allocation for coding and to decide the coding block type. The other is the encoder threshold, which is the maximum tolerable noise of each coding subband and can be used for distortion control in the quantizer. An AAC encoder using the traditional psychoacoustic model generally quantizes with a rate-distortion (Rate-Distortion, R-D) control algorithm based on the encoder threshold. Such algorithms include the two-nested-loop search (Two Loop Search, TLS), the trellis-based algorithm (Trellis-Based) and the cascaded trellis-based algorithm (Cascaded Trellis-Based); the latter two are derived from the two-nested-loop search. The quantizer in an AAC encoder is non-uniform, and its entropy coding is variable-length Huffman coding. Because the quantizer is non-uniform, the encoder cannot determine sufficiently optimized coding parameters directly from the perceptually tolerable noise; and because the entropy coding is variable-length, the number of bits consumed can only be computed from the quantization result. As a result, the parameters provided by the traditional psychoacoustic model do not control quantization and coding well, which makes current rate-distortion control algorithms complex and inefficient.
Discarding the traditional two-level nested iterative bit-allocation and distortion-control algorithm, and instead using a prediction of the subband bit-allocation proportions to perform rate control and distortion control in a single rate-distortion control step, gives higher computational efficiency; the resulting audio quality depends on how well the subband bit-allocation proportions are predicted. The predicted subband bit consumption can be obtained from the formula: predicted subband bit consumption = subband perceptual entropy × bits available to the current frame / sum of the perceptual entropies of all subbands. For constant-bit-rate coding (CBR), the number of bits available to the current frame is a fixed value equal to bit rate × 1024 / sampling rate; if it changes with usage, the coding is variable-bit-rate (VBR), and the number of bits available to the current frame is generally supplied by an inter-frame bit control algorithm. The predicted subband bit consumption is thus merely the product of the normalized perceptual entropy and the bits available to the current frame; its accuracy is low, which in turn reduces the efficiency of rate-distortion control. Moreover, the traditional psychoacoustic model considers only simultaneous masking and ignores non-simultaneous masking, so the encoder cannot exploit non-simultaneous masking to improve coding quality. Once pre-masking fails, quantization noise is no longer masked and a pre-echo occurs, and audio quality drops markedly. Although the AAC standard provides temporal noise shaping (Temporal Noise Shaping, TNS) to weaken the influence of pre-echo, actual tests show that using this module can make audio quality worse.
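The baseline prediction described in this paragraph can be sketched in a few lines. This is only an illustration of the stated formula; the function names `frame_bits_cbr` and `baseline_subband_bits` are hypothetical and do not come from the patent, and the handling of an all-zero entropy vector is an assumption.

```python
def frame_bits_cbr(bit_rate, sample_rate, frame_len=1024):
    """Bits available per frame at constant bit rate: bit rate x 1024 / sampling rate."""
    return bit_rate * frame_len / sample_rate


def baseline_subband_bits(pe_sfb, available_bits):
    """Split the frame's bits across subbands in proportion to perceptual entropy."""
    total = sum(pe_sfb)
    if total <= 0:
        # Assumption: with no positive entropy, no bits are predicted.
        return [0.0 for _ in pe_sfb]
    return [available_bits * pe / total for pe in pe_sfb]


if __name__ == "__main__":
    bits = frame_bits_cbr(128000, 44100)
    print(baseline_subband_bits([1.0, 3.0, 4.0], bits))
```

Because the prediction is a plain normalization, it reacts neither to transients nor to how many bits earlier frames actually used, which is the weakness the rest of the document addresses.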
Summary of the invention
To solve the above technical problems, the present invention proposes a processing method for a psychoacoustic model based on an advanced audio coder. It takes both time-domain and frequency-domain masking fully into account and outputs an accurate predicted bit consumption for each coding subband, thereby improving the quality and efficiency of quantization and coding in the encoder.
To achieve this, the present invention adopts the following technical solution:
A processing method for a psychoacoustic model based on an advanced audio coder, comprising the following steps:
A. from the psychoacoustic subband spectral energy of the stream to be encoded, compute the perceptual entropy and masking threshold of the coding subbands using a masking spreading matrix;
B. from the perceptual entropy and masking threshold of the coding subbands, apply a time-frequency masking correction and a pre-echo correction to compute the predicted subband bit consumption;
C. the psychoacoustic model outputs the predicted subband bit consumption as a parameter of rate-distortion control for the encoding process.
Step B comprises the following steps:
B1. compare the current masking threshold of each coding subband with its long-term average masking threshold to obtain a time-frequency masking correction factor;
B2. use time-domain masking to judge whether a pre-echo has lost its masking and, if so, adjust the time-frequency masking correction factor;
B3. use the time-frequency masking correction factor to correct the perceptual entropy and compute the predicted subband bit consumption.
The long-term average masking threshold in step B1 is obtained by the following formula:
$$\mathrm{Argmask}_{sfb}(k) = \alpha\,\mathrm{Argmask}'_{sfb}(k) + (1-\alpha)\,\mathrm{mask}_{sfb}(k)$$
where $\mathrm{Argmask}'_{sfb}(k)$ is the long-term average masking threshold of the coding subband in the previous frame, $\mathrm{Argmask}_{sfb}(k)$ is the long-term average masking threshold of the coding subband in the current frame, $\mathrm{mask}_{sfb}(k)$ is the coding subband masking threshold of the current frame, and $\alpha$ is the decay factor;
The time-frequency masking correction factor is obtained from the energy ratio
$$chk = \frac{\mathrm{mask}_{sfb}(k)}{\mathrm{Argmask}_{sfb}(k)}$$
as follows:
if $chk > 4$, then $\mathrm{brust}_{sfb}(k) = \min\left(1.5, \frac{\log_2(chk)}{2}\right)$ and $\alpha = 0.98$;
if $0.5 \le chk \le 4$, then $\mathrm{brust}_{sfb}(k) = 0.95$ and $\alpha = 0.4$;
if $chk < 0.5$, then $\mathrm{brust}_{sfb}(k) = 0.90$ and $\alpha = 0.4$;
where $chk$ is the energy ratio and $\mathrm{brust}_{sfb}(k)$ is the time-domain masking correction factor.
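The rule of step B1 can be illustrated with a short sketch for one coding subband. Two points are assumptions rather than statements of the patent: the ratio chk is taken against the previous frame's long-term average (the text does not say in which order the average and the ratio are updated), and the first branch is read as log2(chk)/2. The function name `time_frequency_factor` is hypothetical.

```python
import math


def time_frequency_factor(mask_sfb, avg_mask_prev):
    """Return (brust, alpha, new long-term average) for one coding subband."""
    # Assumption: compare against the previous frame's long-term average.
    chk = mask_sfb / avg_mask_prev if avg_mask_prev > 0 else 1.0
    if chk > 4:
        brust = min(1.5, math.log2(chk) / 2)  # assumed reading of the garbled formula
        alpha = 0.98
    elif chk >= 0.5:
        brust, alpha = 0.95, 0.4
    else:
        brust, alpha = 0.90, 0.4
    # Recursive update of the long-term average masking threshold.
    avg_mask_new = alpha * avg_mask_prev + (1 - alpha) * mask_sfb
    return brust, alpha, avg_mask_new


if __name__ == "__main__":
    print(time_frequency_factor(16.0, 1.0))  # sudden rise of the masking threshold
    print(time_frequency_factor(1.0, 1.0))   # stationary signal
```

A large alpha after a burst keeps the long-term average from jumping to the new level at once, so the burst stays detectable for more than one frame.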
In step B2, judging by time-domain masking whether a pre-echo has lost its masking comprises the following steps:
B21. divide one frame of the time-domain signal into 8 segments, compute the sum of the absolute time-domain amplitudes of each segment, and place the sums in the middle 8 elements of the segment absolute-amplitude vector abamp:
$$\mathrm{abamp}(m+1) = \sum_{n=256(m-1)+1}^{256m} |x_i(n)|, \quad m = 1, 2, \cdots, 8$$
where abamp is a 10 × 1 vector; its first element abamp(1) inherits the sum of the squared amplitudes of the 8 segments of the previous frame, $\mathrm{abamp}_i(1) = \sum_{m=2}^{9} \mathrm{abamp}_{i-1}(m)^2$, and its last element inherits the absolute amplitude of the last segment of the current frame, abamp(10) = abamp(9);
B22. from the segment absolute amplitudes obtained in step B21, compute the time-domain mask Tmask(m) by the following formula:
$$\mathrm{Tmask}(m) = \mathrm{Tnorm}(m) \sum_{n=1}^{m+2} \mathrm{abamp}(n)\,\mathrm{Rate}_{Tmask}(m-n+3)$$
where the time-domain spreading attenuation coefficient $\mathrm{Rate}_{Tmask}$ is
$$\mathrm{Rate}_{Tmask} = [\,0.1,\ 0.9^0,\ 0.9^1,\ 0.9^2,\ 0.9^3,\ 0.9^4,\ 0.9^5,\ 0.9^6,\ 0.9^7,\ 0.9^8\,]$$
and the time-domain spreading normalization coefficient Tnorm(m) is
$$\mathrm{Tnorm}(m) = \frac{1}{\sum_{n=1}^{m+2} \mathrm{Rate}_{Tmask}(n)}, \quad m = 1, 2, \cdots, 8$$
B23. when $1.3\,\mathrm{Tmask}(1) < \mathrm{Tmask}(8)$ and $\mathrm{Tmask}(8) > 2000$, judge that a pre-echo has lost its masking.
When a pre-echo is judged to have lost its masking, the time-frequency masking correction factors of two consecutive frames are corrected according to $\mathrm{brust}'_{sfb}(k) = \mathrm{brust}_{sfb}(k)^{\mathrm{chnBrust}}$, where $\mathrm{brust}'_{sfb}(k)$ is the time-frequency masking correction factor after pre-echo correction and $\mathrm{brust}_{sfb}(k)$ is the original time-domain masking correction factor; chnBrust = 3 when correcting the first frame and chnBrust = 2 when correcting the second frame.
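The detection in steps B21 to B23 can be sketched as follows. The sketch assumes a frame of 2048 time samples (8 segments of 256) and takes the inherited first element of abamp as a caller-supplied argument, because the text is ambiguous about how that element is formed. The names `segment_amplitudes`, `temporal_mask` and `pre_echo_unmasked` are hypothetical.

```python
# Time-domain spreading coefficients: 0.1 followed by 0.9**0 .. 0.9**8 (10 values).
RATE_TMASK = [0.1] + [0.9 ** p for p in range(9)]


def segment_amplitudes(frame, prev_element):
    """Build the 10-element abamp vector from one frame of 2048 samples."""
    if len(frame) != 2048:
        raise ValueError("expected 2048 samples (8 segments of 256)")
    segs = [sum(abs(x) for x in frame[256 * m:256 * (m + 1)]) for m in range(8)]
    # First element is inherited from the previous frame (supplied by the caller);
    # last element repeats the final segment of this frame.
    return [prev_element] + segs + [segs[-1]]


def temporal_mask(abamp):
    """Tmask(m) for m = 1..8, written with the 1-based indices of the text."""
    tmask = []
    for m in range(1, 9):
        idx = range(1, m + 3)
        num = sum(abamp[n - 1] * RATE_TMASK[m - n + 3 - 1] for n in idx)
        norm = sum(RATE_TMASK[n - 1] for n in idx)
        tmask.append(num / norm)
    return tmask


def pre_echo_unmasked(tmask):
    """Criterion B23: 1.3*Tmask(1) < Tmask(8) and Tmask(8) > 2000."""
    return 1.3 * tmask[0] < tmask[7] and tmask[7] > 2000


if __name__ == "__main__":
    attack = [0.0] * 1792 + [100.0] * 256  # silence followed by a sudden onset
    print(pre_echo_unmasked(temporal_mask(segment_amplitudes(attack, 0.0))))
```

The weight 0.1 applies to the segment after the current one and the powers of 0.9 to the current and earlier segments, so the mask mostly reflects energy that has already occurred.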
Step B3 is carried out as follows:
B31. use the time-frequency masking correction factor to correct the perceptual entropy and obtain the predicted subband bit-consumption ratio;
B32. apply inter-frame negative-feedback bit control based on the actual bit consumption to obtain the number of bits available to the current frame;
B33. compute the predicted subband bit consumption from the predicted subband bit-consumption ratio and the number of bits available to the current frame.
The predicted subband bit-consumption ratio in step B31 is obtained by the following formula:
$$\mathrm{sfbBitRatio}(k) = \frac{PE_{sfb}(k)}{\sum_{k=1}^{49} PE_{sfb}(k)}\,\mathrm{brust}'_{sfb}(k)$$
where sfbBitRatio(k) is the predicted subband bit-consumption ratio, $\mathrm{brust}'_{sfb}(k)$ is the time-domain masking correction factor, and $PE_{sfb}(k)$ is the perceptual entropy of the coding subband.
The number of bits available to the current frame in step B32 is obtained by the following formula:
$$\mathrm{bitAvailable}(i) = \mathrm{controlRatio} \cdot \big(\mathrm{bitAverage} + \mathrm{bitAvailable}(i-1) - \mathrm{bitUsed}\big)$$
where controlRatio is the inter-frame correction factor, bitAverage is the average number of bits available per frame according to the average bit rate, bitAvailable(i-1) is the number of bits that was available to the previous frame, and bitUsed is the number of bits the previous frame actually consumed. The inter-frame correction factor is determined as follows:
if $\mathrm{bitRatio} > 1.06$, $\mathrm{controlRatio} = \frac{1}{\mathrm{bitRatio} + 0.2}$;
if $1.06 \ge \mathrm{bitRatio} > 1.05$, controlRatio = 0.9;
if $1.05 \ge \mathrm{bitRatio} > 1.02$, controlRatio = 0.95;
if $1.02 \ge \mathrm{bitRatio} \ge 0.98$, controlRatio = 1;
if $\mathrm{bitRatio} < 0.98$, controlRatio = 1.2;
where $\mathrm{bitRatio} = \frac{\mathrm{bitAll}}{K \cdot \mathrm{bitAverage}}$ is the ratio of the current average number of bits per frame, bitAll/K, to the average number of bits available per frame.
The predicted subband bit consumption in step B33 is obtained by the following formula:
$$\mathrm{sfbBits}(k) = \mathrm{bitAvailable}(i) \cdot \mathrm{sfbBitRatio}(k)$$
where sfbBits(k) is the predicted subband bit consumption, bitAvailable(i) is the number of bits available to the current frame, and sfbBitRatio(k) is the predicted subband bit-consumption ratio.
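Steps B32 and B33 amount to a small feedback controller followed by a scaling. The sketch below follows the thresholds listed above; the first branch is read as 1/(bitRatio + 0.2), which is an assumption about the flattened formula, and the function names are hypothetical.

```python
def control_ratio(bit_ratio):
    """Inter-frame correction factor chosen from the running bit-usage ratio."""
    if bit_ratio > 1.06:
        return 1.0 / (bit_ratio + 0.2)  # assumed reading of the garbled fraction
    if bit_ratio > 1.05:
        return 0.9
    if bit_ratio > 1.02:
        return 0.95
    if bit_ratio >= 0.98:
        return 1.0
    return 1.2


def available_bits(bit_all, frames_done, bit_average, prev_available, bit_used):
    """bitAvailable(i) from the totals so far and the previous frame's balance."""
    bit_ratio = bit_all / (frames_done * bit_average)
    return control_ratio(bit_ratio) * (bit_average + prev_available - bit_used)


def subband_bits(available, sfb_bit_ratio):
    """sfbBits(k) = bitAvailable(i) * sfbBitRatio(k) for every coding subband."""
    return [available * r for r in sfb_bit_ratio]


if __name__ == "__main__":
    avail = available_bits(9300.0, 3, 3000.0, 3000.0, 3300.0)
    print(avail, subband_bits(avail, [0.25, 0.75]))
```

When the encoder has overspent (bitRatio above 1) the factor drops below 1 and the next frame gets fewer bits; when it has underspent the factor rises to 1.2.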
Step A comprises the following steps:
A1. sum the spectral energy of each psychoacoustic subband of the stream to be encoded to obtain the psychoacoustic subband energy;
A2. compute the subband energy peak-valley ratio from the psychoacoustic subband energy;
A3. map the subband energy peak-valley ratio to a masking signal ratio using a quadratic equation;
A4. compute the self-masking energy of the subband from the masking signal ratio and the psychoacoustic subband energy;
A5. obtain the masking threshold of the psychoacoustic subband from the self-masking energy using the spreading matrix;
A6. compute the perceptual entropy of the psychoacoustic subband from the psychoacoustic subband energy and the masking threshold;
A7. map the perceptual entropy and masking threshold of the psychoacoustic subbands to the perceptual entropy and masking threshold of the coding subbands.
The spreading matrix in step A5 is a sparse spreading matrix. It is made sparse by setting to 0 every element of the normalized spreading matrix that is smaller than a predetermined decibel threshold. The normalization factor of the normalized spreading matrix is obtained by the following formula:
$$\mathrm{sprdngN}(b) = \sum_{bb=1}^{70} \mathrm{sprdngf}\big[\mathrm{bavl}(b) - \mathrm{bval}(bb)\big]$$
where sprdngN(b) is the normalization factor, bavl(b) and bval(bb) are Bark frequencies, and sprdngf is the spreading function;
The spreading function is determined as follows:
$$spr = \mathrm{sprdngf}(\Delta f_c) = \begin{cases}
0, & \Delta f_c \le -3.3333 \\
10^{\frac{15.811389 + 7.5(1.5\Delta f_c + 0.474) - 17.5\sqrt{1 + (1.5\Delta f_c + 0.474)^2}}{10}}, & -3.3333 < \Delta f_c \le 0 \\
10^{\frac{15.811389 + 7.5(3\Delta f_c + 0.474) - 17.5\sqrt{1 + (3\Delta f_c + 0.474)^2}}{10}}, & 0 < \Delta f_c \le 0.5 \\
10^{\frac{8[(3\Delta f_c - 1.5)^2 - 1] + 15.811389 + 7.5(3\Delta f_c + 0.474) - 17.5\sqrt{1 + (3\Delta f_c + 0.474)^2}}{10}}, & 0.5 < \Delta f_c \le 2.5 \\
10^{\frac{15.811389 + 7.5(3\Delta f_c + 0.474) - 17.5\sqrt{1 + (3\Delta f_c + 0.474)^2}}{10}}, & 2.5 < \Delta f_c \le 7.3333 \\
0, & \Delta f_c > 7.3333
\end{cases}$$
where spr is the value of the spreading function.
The subband energy peak-valley ratio in step A2 is obtained by the following formula:
$$\mathrm{ppRate}(b) = \frac{E_{psy}(\varepsilon)}{E_{psy}(b)} = \frac{\min\big(E_{psy}(b-1), E_{psy}(b+1)\big)}{E_{psy}(b)}$$
where ppRate(b) is the subband energy peak-valley ratio, $E_{psy}(b)$ is the energy of the current psychoacoustic subband, and $E_{psy}(b-1)$ and $E_{psy}(b+1)$ are the energies of the previous and next psychoacoustic subbands respectively.
The quadratic equation in step A3 is:
$$MSR_{psy}(b) = 0.17453\,\mathrm{ppRate}(b)^2 + 0.08325\,\mathrm{ppRate}(b)$$
where $MSR_{psy}(b)$ is the masking signal ratio and ppRate(b) is the subband energy peak-valley ratio.
The self-masking energy in step A4 is obtained by the following formula:
$$E_{selfmask}(b) = E_{psy}(b)\,MSR_{psy}(b)$$
where $E_{selfmask}(b)$ is the self-masking energy, $E_{psy}(b)$ is the psychoacoustic subband energy, and $MSR_{psy}(b)$ is the masking signal ratio.
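Steps A2 to A4 chain together naturally, as the following sketch shows. The treatment of the first and last subband (using the single existing neighbour) and of zero-energy subbands is an assumption, since the formulas do not cover those cases, and any bounding of ppRate is left out. The function names are hypothetical.

```python
def peak_valley_ratio(e_psy):
    """ppRate(b) = min(E(b-1), E(b+1)) / E(b) for each psychoacoustic subband."""
    n = len(e_psy)
    rates = []
    for b in range(n):
        neighbours = []
        if b > 0:
            neighbours.append(e_psy[b - 1])
        if b < n - 1:
            neighbours.append(e_psy[b + 1])
        # Assumption: edge subbands use their only neighbour.
        valley = min(neighbours) if neighbours else e_psy[b]
        # Assumption: a zero-energy subband gets ratio 0 instead of dividing by zero.
        rates.append(valley / e_psy[b] if e_psy[b] > 0 else 0.0)
    return rates


def masking_signal_ratio(pp_rate):
    """Quadratic mapping from peak-valley ratio to masking signal ratio."""
    return 0.17453 * pp_rate ** 2 + 0.08325 * pp_rate


def self_masking_energy(e_psy):
    """E_selfmask(b) = E_psy(b) * MSR_psy(b)."""
    return [e * masking_signal_ratio(p)
            for e, p in zip(e_psy, peak_valley_ratio(e_psy))]


if __name__ == "__main__":
    print(self_masking_energy([4.0, 2.0, 8.0]))
```

A subband that towers over its neighbours (small ppRate, tone-like) gets a small masking signal ratio, while a subband sitting in a flat or rising spectrum (large ppRate, noise-like) is allowed to mask more.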
The masking threshold in step A5 is obtained by the following formula:
$$\mathrm{mask}_{psy}(b) = E_{selfmask} \cdot \mathrm{sprdngMN}$$
where $\mathrm{mask}_{psy}(b)$ is the masking threshold of the psychoacoustic subband and sprdngMN is the spreading matrix.
The perceptual entropy of the psychoacoustic subband in step A6 is obtained by the following formula:
$$PE_{psy}(b) = bw_{psy}(b)\,\log_{10}\left[\frac{E_{psy}(b)}{\mathrm{mask}_{psy}(b)}\right]$$
where $PE_{psy}(b)$ is the perceptual entropy of the psychoacoustic subband, $bw_{psy}(b)$ is the bandwidth of the psychoacoustic subband, $E_{psy}(b)$ is the psychoacoustic subband energy, and $\mathrm{mask}_{psy}(b)$ is the masking threshold of the psychoacoustic subband.
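The perceptual entropy formula of step A6 is a one-liner per subband. The sketch below applies it to lists of subband values; returning 0 for non-positive energy or threshold is an assumption made only to keep the logarithm defined, and the function name is hypothetical.

```python
import math


def perceptual_entropy(e_psy, mask_psy, bw_psy):
    """PE_psy(b) = bw_psy(b) * log10(E_psy(b) / mask_psy(b)) for every subband."""
    pe = []
    for e, m, bw in zip(e_psy, mask_psy, bw_psy):
        if e <= 0 or m <= 0:
            pe.append(0.0)  # assumption: undefined logarithm treated as no entropy
        else:
            pe.append(bw * math.log10(e / m))
    return pe


if __name__ == "__main__":
    # A 4-line subband whose energy is 20 dB above its masking threshold.
    print(perceptual_entropy([100.0], [1.0], [4]))
```

The value grows with both the bandwidth and the signal-to-mask ratio, so wide subbands far above their threshold demand the most bits.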
The perceptual entropy of the psychoacoustic subbands in step A7 is mapped to the perceptual entropy of the coding subbands by the following formula:
$$PE_{sfb}(k) = \sum_{w=\mathrm{sfbLow}(k)}^{\mathrm{sfbHigh}(k)} PE_{spec}(w)$$
where $PE_{sfb}(k)$ is the perceptual entropy of the coding subband; $\mathrm{psyLow}(b) \le w \le \mathrm{psyHigh}(b)$, with psyHigh(b) and psyLow(b) the upper and lower bounds of psychoacoustic subband b; sfbLow(k) and sfbHigh(k) are the lower and upper bounds of coding subband k; $PE_{spec}(w) = \frac{PE_{psy}(b)}{bw_{psy}(b)}$, where $bw_{psy}(b)$ is the bandwidth of the psychoacoustic subband and $PE_{psy}(b)$ is its perceptual entropy;
The masking threshold of the psychoacoustic subbands is mapped to the masking threshold of the coding subbands by the following formula:
$$\mathrm{mask}_{sfb}(k) = bw_{sfb}(k)\,\min\big(\mathrm{mask}_{apsy}(b)\big), \quad b1 \le b \le b2$$
where $\mathrm{mask}_{sfb}(k)$ is the masking threshold of the coding subband; b1 satisfies $\mathrm{psyLow}(b1) \le \mathrm{sfbLow}(k) \le \mathrm{psyHigh}(b1)$ and b2 satisfies $\mathrm{psyLow}(b2) \le \mathrm{sfbHigh}(k) \le \mathrm{psyHigh}(b2)$; $\mathrm{mask}_{apsy}(b) = \frac{\mathrm{mask}_{psy}(b)}{bw_{psy}(b)}$, where $\mathrm{mask}_{psy}(b)$ is the masking threshold of the psychoacoustic subband; psyHigh(b1) and psyLow(b1) are the upper and lower bounds of psychoacoustic subband b1; psyHigh(b2) and psyLow(b2) are the upper and lower bounds of psychoacoustic subband b2; sfbLow(k) and sfbHigh(k) are the lower and upper bounds of coding subband k; and $bw_{sfb}(k)$ is the bandwidth of the coding subband.
By comparing the parameters of the current frame with their long-term averages over past frames, and by using time-domain masking to detect and correct for pre-echo, the present invention provides a psychoacoustic model processing method that takes full account of both time-domain and frequency-domain masking (time-frequency masking). The predicted subband bit consumption is thus obtained from the perceptual entropy more accurately, and the encoder uses this prediction as its rate-distortion control parameter, which greatly improves the efficiency and quality of quantization and coding. The perceptual entropy is computed with a masking spreading matrix that is made sparse during the computation, so the perceptual entropy is obtained faster and with fewer operations.
Description of drawings
Fig. 1 is a structural block diagram of the Megal AAC encoder using an embodiment of the invention;
Fig. 2 is a flowchart of the processing method of the embodiment of the invention;
Fig. 3 is a schematic diagram of the upper and lower constraint bounds of the masking signal ratio on different subbands;
Fig. 4 is a schematic diagram of the judgment that a pre-echo has lost its masking;
Fig. 5 is a comparison of the ODG scores of several encoders;
Fig. 6 is a comparison of the NMR scores of several encoders;
Fig. 7 is a schematic diagram of the ODG distribution of several encoders;
Fig. 8 is a schematic diagram of the NMR distribution of several encoders.
Embodiment
The specific embodiment of the present invention is described in detail below with reference to the drawings.
An embodiment of the processing method of the invention is shown in Fig. 2. Its processing steps are as follows:
1. From the psychoacoustic subband spectral energy of the stream to be encoded, compute the perceptual entropy and masking threshold of the coding subbands using the masking spreading matrix
1a) Sum the MDCT (modified discrete cosine transform) spectral energy of each psychoacoustic subband of the current frame to obtain the psychoacoustic subband energy $E_{psy}$
1b) Compute the subband energy peak-valley ratio ppRate(b)
$$\mathrm{ppRate}(b) = \frac{E_{psy}(\varepsilon)}{E_{psy}(b)} = \frac{\min\big(E_{psy}(b-1), E_{psy}(b+1)\big)}{E_{psy}(b)} \tag{1}$$
where b is the index of the current subband, and b-1 and b+1 denote the previous and next subbands respectively.
After the subband energy peak-valley ratio is obtained, it is constrained to the interval [lower(b), upper(b)]:
if ppRate(b) > upper(b), ppRate(b) = upper(b)
if ppRate(b) < lower(b), ppRate(b) = lower(b)
that is, ppRate(b) = max(lower(b), min(upper(b), ppRate(b))), where
$$\mathrm{lower}(b) = \tan\left(\frac{\left|\frac{1.5(b-2)}{67} - 0.5\right|}{4}\right), \quad b = 2, \cdots, 69$$
$$\mathrm{lower}(1) = \mathrm{lower}(2) + 0.1, \quad \mathrm{lower}(70) = \mathrm{lower}(69) \tag{2}$$
$$\mathrm{upper}(b) = \mathrm{lower}(b) + 0.7$$
1c) Map the subband energy peak-valley ratio to the masking signal ratio $MSR_{psy}(b)$ using a quadratic equation
$$MSR_{psy}(b) = 0.17453\,\mathrm{ppRate}(b)^2 + 0.08325\,\mathrm{ppRate}(b) \tag{3}$$
where the linear and quadratic coefficients are preferred values obtained through extensive testing. The constraint on the masking signal ratio in the different psychoacoustic subbands is shown in Fig. 3: the masking signal ratio lies between the upper and lower constraint bounds.
1d) Compute the self-masking energy $E_{selfmask}(b)$ of the subband from the psychoacoustic subband energy and the masking signal ratio
$$E_{selfmask}(b) = E_{psy}(b)\,MSR_{psy}(b) \tag{4}$$
1e) Compute the masking threshold $\mathrm{mask}_{psy}(b)$ using the normalized spreading matrix
$$\mathrm{mask}_{psy}(b) = E_{selfmask} \cdot \mathrm{sprdngMN} \tag{5}$$
where the normalized spreading matrix sprdngMN is determined by the following formulas:
$$\mathrm{sprdngN}(b) = \sum_{bb=1}^{70} \mathrm{sprdngf}\big[\mathrm{bavl}(b) - \mathrm{bval}(bb)\big]$$
sprdngMN =
Figure G071C7660620070716D000093
In formula (6), bavl() is the mapping from subband index to Bark frequency. The Bark scale is a frequency partition that models the characteristics of human hearing: the range from 20 Hz to 20000 Hz is divided non-uniformly into 25 Barks. The mapping from frequency to Bark is usually expressed by a complicated nonlinear function, so in practice a limited set of Bark values is precomputed into a table that is looked up to simplify the computation. bavl() is such a lookup table, and the normalization factor sprdngN(b) is computed in advance from it.
sprdngf() is the spreading function; its value is obtained by the following formula:
$$spr = \mathrm{sprdngf}(\Delta f_c) = \begin{cases}
0, & \Delta f_c \le -3.3333 \\
10^{\frac{15.811389 + 7.5(1.5\Delta f_c + 0.474) - 17.5\sqrt{1 + (1.5\Delta f_c + 0.474)^2}}{10}}, & -3.3333 < \Delta f_c \le 0 \\
10^{\frac{15.811389 + 7.5(3\Delta f_c + 0.474) - 17.5\sqrt{1 + (3\Delta f_c + 0.474)^2}}{10}}, & 0 < \Delta f_c \le 0.5 \\
10^{\frac{8[(3\Delta f_c - 1.5)^2 - 1] + 15.811389 + 7.5(3\Delta f_c + 0.474) - 17.5\sqrt{1 + (3\Delta f_c + 0.474)^2}}{10}}, & 0.5 < \Delta f_c \le 2.5 \\
10^{\frac{15.811389 + 7.5(3\Delta f_c + 0.474) - 17.5\sqrt{1 + (3\Delta f_c + 0.474)^2}}{10}}, & 2.5 < \Delta f_c \le 7.3333 \\
0, & \Delta f_c > 7.3333
\end{cases} \tag{7}$$
All elements of sprdngMN smaller than -100 dB are set to 0, so that sprdngMN becomes a sparse spreading matrix whose nonzero entries are
Figure G071C7660620070716D000101
sprdngMN has 672 nonzero entries in total, so the masking threshold can be computed with 672 multiply-add operations.
After the masking threshold is computed, it is constrained so that it does not fall below the threshold of audibility in quiet:
$$\mathrm{mask}_{psy}(b) = \max\big[\mathrm{mask}_{psy}(b),\ \mathrm{qthr}(b)\big] \tag{9}$$
where qthr(b) is the threshold of audibility in quiet.
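The saving from a sparse spreading matrix can be seen in a small sketch that stores only the entries at or above -100 dB and performs one multiply-add per stored entry, followed by the floor of formula (9). The orientation of the product (a row vector of self-masking energies times the matrix) and the triple-list storage are assumptions, and the function names are hypothetical.

```python
def sparsify(matrix, floor_db=-100.0):
    """Keep entries at or above floor_db (10*log10 scale) as (row, col, value) triples."""
    thresh = 10 ** (floor_db / 10)
    return [(r, c, v)
            for r, row in enumerate(matrix)
            for c, v in enumerate(row)
            if v >= thresh]


def masking_threshold(e_selfmask, triples, qthr):
    """mask(b) = sum_r E_selfmask(r) * M[r][b], floored at the quiet threshold."""
    mask = [0.0] * len(qthr)
    for r, c, v in triples:
        # Assumption: E_selfmask is a row vector multiplying the matrix from the left.
        mask[c] += e_selfmask[r] * v  # one multiply-add per nonzero entry
    return [max(m, q) for m, q in zip(mask, qthr)]


if __name__ == "__main__":
    toy = [[1.0, 0.5], [1e-12, 1.0]]  # toy 2x2 matrix, not the patent's 70x70 one
    nz = sparsify(toy)
    print(len(nz), masking_threshold([2.0, 4.0], nz, [0.0, 10.0]))
```

With the 70 × 70 matrix of the embodiment, a dense product would take 4900 multiply-adds, against the 672 quoted for the sparse form.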
1f) Compute the perceptual entropy $PE_{psy}(b)$ from the psychoacoustic subband energy and the masking threshold
$$PE_{psy}(b) = bw_{psy}(b)\,\log_{10}\left[\frac{E_{psy}(b)}{\mathrm{mask}_{psy}(b)}\right] \tag{10}$$
where $bw_{psy}(b)$ is the bandwidth of the psychoacoustic subband.
1g) Map the perceptual entropy and masking threshold to the coding subbands
Compute the perceptual entropy of each spectral line in the psychoacoustic subband:
$$PE_{spec}(w) = \frac{PE_{psy}(b)}{bw_{psy}(b)} \tag{11}$$
and map it to the coding subband:
$$PE_{sfb}(k) = \sum_{w=\mathrm{sfbLow}(k)}^{\mathrm{sfbHigh}(k)} PE_{spec}(w) \tag{12}$$
where $\mathrm{psyLow}(b) \le w \le \mathrm{psyHigh}(b)$; psyHigh(b) and psyLow(b) are the upper and lower bounds of psychoacoustic subband b; sfbLow(k) and sfbHigh(k) are the lower and upper bounds of coding subband k.
Compute the masking threshold of each spectral line in the psychoacoustic subband:
$$\mathrm{mask}_{apsy}(b) = \frac{\mathrm{mask}_{psy}(b)}{bw_{psy}(b)} \tag{13}$$
and map it to the coding subband:
$$\mathrm{mask}_{sfb}(k) = bw_{sfb}(k)\,\min\big(\mathrm{mask}_{apsy}(b)\big), \quad b1 \le b \le b2 \tag{14}$$
where b1 satisfies
$$\mathrm{psyLow}(b1) \le \mathrm{sfbLow}(k) \le \mathrm{psyHigh}(b1) \tag{15}$$
and b2 satisfies
$$\mathrm{psyLow}(b2) \le \mathrm{sfbHigh}(k) \le \mathrm{psyHigh}(b2) \tag{16}$$
psyHigh(b1) and psyLow(b1) are the upper and lower bounds of psychoacoustic subband b1; psyHigh(b2) and psyLow(b2) are the upper and lower bounds of psychoacoustic subband b2; sfbLow(k) and sfbHigh(k) are the lower and upper bounds of coding subband k; and $bw_{sfb}(k)$ is the bandwidth of the coding subband.
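The mapping of formulas (11) to (16) can be sketched by first spreading each psychoacoustic subband's values evenly over its spectral lines and then regrouping the lines into coding subbands. The sketch assumes that subband bounds are inclusive line indices, that the psychoacoustic subbands tile the spectrum contiguously from line 0, and that bandwidth means the number of lines; the function name is hypothetical.

```python
def map_to_coding_subbands(pe_psy, mask_psy, psy_bounds, sfb_bounds):
    """Return (PE_sfb, mask_sfb) given inclusive (low, high) line bounds per subband."""
    n_lines = psy_bounds[-1][1] + 1
    pe_spec = [0.0] * n_lines    # per-line perceptual entropy, formula (11)
    mask_line = [0.0] * n_lines  # per-line masking threshold, formula (13)
    for (lo, hi), pe, mk in zip(psy_bounds, pe_psy, mask_psy):
        bw = hi - lo + 1
        for w in range(lo, hi + 1):
            pe_spec[w] = pe / bw
            mask_line[w] = mk / bw
    pe_sfb, mask_sfb = [], []
    for lo, hi in sfb_bounds:
        bw = hi - lo + 1
        pe_sfb.append(sum(pe_spec[lo:hi + 1]))           # formula (12)
        # The minimum over the lines equals the minimum over the psychoacoustic
        # subbands b1..b2 that overlap this coding subband, formula (14).
        mask_sfb.append(bw * min(mask_line[lo:hi + 1]))
    return pe_sfb, mask_sfb


if __name__ == "__main__":
    print(map_to_coding_subbands([2.0, 8.0], [4.0, 4.0],
                                 [(0, 1), (2, 5)], [(0, 2), (3, 5)]))
```

Taking the minimum per-line threshold is the conservative choice: a coding subband that straddles two psychoacoustic subbands inherits the stricter of the two noise limits.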
2. Compare the current masking threshold with the long-term average masking threshold to obtain the time-frequency masking correction factor
Update the long-term average masking threshold of each coding subband from the coding subband masking threshold of the current frame:
$$\mathrm{Argmask}_{sfb}(k) = \alpha\,\mathrm{Argmask}'_{sfb}(k) + (1-\alpha)\,\mathrm{mask}_{sfb}(k) \tag{17}$$
where $\mathrm{Argmask}'_{sfb}(k)$ is the long-term average masking threshold of the coding subband in the previous frame, $\mathrm{Argmask}_{sfb}(k)$ is the long-term average masking threshold of the coding subband in the current frame, and $\mathrm{mask}_{sfb}(k)$ is the coding subband masking threshold of the current frame. $\alpha$ is the decay factor; its value depends on the masking situation and is determined by formula (18).
Compare the masking energy of the coding subband with its long-term average masking energy to obtain the energy ratio
$$chk = \frac{\mathrm{mask}_{sfb}(k)}{\mathrm{Argmask}_{sfb}(k)} \tag{18}$$
and compare it as follows:
Figure G071C7660620070716D000113
3. Judge pre-echo by time-domain masking and adjust the time-frequency masking correction factor
Time-domain masking can be used to judge whether a pre-echo has lost its masking. If it has, the time-domain masking correction factor is corrected, which further improves the accuracy of the later steps that use the time-frequency masking correction factor. The steps are as follows:
Divide one frame of the time-domain signal into 8 segments, compute the sum of the absolute time-domain amplitudes of each segment, and place the sums in the middle 8 elements of the segment absolute-amplitude vector abamp:
$$\mathrm{abamp}(m+1) = \sum_{n=256(m-1)+1}^{256m} |x_i(n)|, \quad m = 1, 2, \cdots, 8 \tag{19}$$
abamp is a 10 × 1 vector. Its first element abamp(1) inherits the sum of the squared amplitudes of the 8 segments of the previous frame:
$$\mathrm{abamp}_i(1) = \sum_{m=2}^{9} \mathrm{abamp}_{i-1}(m)^2 \tag{20}$$
The last element inherits the absolute amplitude of the last segment of the current frame, abamp(10) = abamp(9). The time-domain mask Tmask(m) is an 8 × 1 vector computed by the following formula:
$$\mathrm{Tmask}(m) = \mathrm{Tnorm}(m) \sum_{n=1}^{m+2} \mathrm{abamp}(n)\,\mathrm{Rate}_{Tmask}(m-n+3) \tag{21}$$
where the time-domain spreading attenuation coefficient $\mathrm{Rate}_{Tmask}$ is
$$\mathrm{Rate}_{Tmask} = [\,0.1,\ 0.9^0,\ 0.9^1,\ 0.9^2,\ 0.9^3,\ 0.9^4,\ 0.9^5,\ 0.9^6,\ 0.9^7,\ 0.9^8\,] \tag{22}$$
and the time-domain spreading normalization coefficient Tnorm(m) is
$$\mathrm{Tnorm}(m) = \frac{1}{\sum_{n=1}^{m+2} \mathrm{Rate}_{Tmask}(n)}, \quad m = 1, 2, \cdots, 8 \tag{23}$$
When $1.3\,\mathrm{Tmask}(1) < \mathrm{Tmask}(8)$ and $\mathrm{Tmask}(8) > 2000$, a pre-echo is judged to have lost its masking; the effect of this judgment is shown in Fig. 4. When this is the case, the time-frequency masking correction factors of two consecutive frames are given a pre-echo correction:
$$\mathrm{brust}'_{sfb}(k) = \mathrm{brust}_{sfb}(k)^{\mathrm{chnBrust}} \tag{24}$$
where $\mathrm{brust}'_{sfb}(k)$ is the time-frequency masking correction factor after pre-echo correction; chnBrust = 3 when correcting the first frame and chnBrust = 2 when correcting the second frame.
4. Use the time-frequency masking correction factor to correct the perceptual entropy and obtain the predicted subband bit-consumption ratio sfbBitRatio(k)
$$\mathrm{sfbBitRatio}(k) = \frac{PE_{sfb}(k)}{\sum_{k=1}^{49} PE_{sfb}(k)}\,\mathrm{brust}'_{sfb}(k) \tag{25}$$
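A minimal sketch of step 4 follows. The flattened source formula is ambiguous about where the correction factor sits; the sketch assumes it multiplies the normalized perceptual entropy of each subband, so the ratios need not sum to 1. The function name `subband_bit_ratio` is hypothetical.

```python
def subband_bit_ratio(pe_sfb, brust):
    """sfbBitRatio(k): normalized perceptual entropy scaled by the correction factor."""
    total = sum(pe_sfb)
    if total <= 0:
        # Assumption: no positive entropy means no bits are requested.
        return [0.0 for _ in pe_sfb]
    # Assumed reading: (PE_sfb(k) / sum of PE_sfb) * brust'_sfb(k).
    return [pe / total * f for pe, f in zip(pe_sfb, brust)]


if __name__ == "__main__":
    # Second subband is in a burst (factor 1.5), first is stationary (0.95).
    print(subband_bit_ratio([1.0, 3.0], [0.95, 1.5]))
```

Under this reading a subband whose masking threshold has just jumped receives up to 1.5 times its entropy-proportional share, while stationary subbands receive slightly less than their share.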
5. Apply inter-frame negative-feedback bit control based on the actual bit consumption, and compute the predicted bit consumption of the coding subbands from the predicted subband bit-consumption ratio. The steps are:
5a) Negative-feedback inter-frame bit correction
Let the total number of bits used so far be bitAll, the number of frames processed so far be K, the number of bits actually consumed by the previous frame be bitUsed, the average number of bits available per frame according to the average bit rate be bitAverage, and the number of bits that was available to the previous frame be bitAvailable(i-1). The current average number of bits per frame is bitAll/K, and its ratio to the average number of bits is $\mathrm{bitRatio} = \frac{\mathrm{bitAll}}{K \cdot \mathrm{bitAverage}}$.
Figure G071C7660620070716D000132
The available bit number bitAvailable of present frame (i) is
(bitAverage+bitAvailable (i-1)-bitUsed) (26) constrains in it in certain scope bitAvailable (i)=controlRatio
β·bitAverage≤bitAvailable(i)≤α·bitAverage (27)
Wherein, 0<α<1, α=0.95 is generally established in β>1, and β=1.2 are proper.
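The negative-feedback step can be sketched as below. This is an illustration under stated assumptions: the piecewise rule for controlRatio is taken from claim 4, with its first branch read as 1/(bitRatio + 0.2); the range constraint is read as 0.95·bitAverage as the lower bound and 1.2·bitAverage as the upper bound; the function names and the numbers in the example are hypothetical.

```python
def control_ratio(bit_ratio):
    """Inter-frame correction factor, following the piecewise rule recited in claim 4."""
    if bit_ratio > 1.06:
        return 1.0 / (bit_ratio + 0.2)  # assumed reading of the garbled fraction
    if bit_ratio > 1.05:
        return 0.9
    if bit_ratio > 1.02:
        return 0.95
    if bit_ratio >= 0.98:
        return 1.0
    return 1.2


def bits_available(bit_all, frames_done, bit_average, prev_available, bit_used,
                   lower=0.95, upper=1.2):
    """Equations (26) and (27): feedback-corrected budget, clamped around bitAverage."""
    bit_ratio = bit_all / (frames_done * bit_average)
    raw = control_ratio(bit_ratio) * (bit_average + prev_available - bit_used)
    return min(max(raw, lower * bit_average), upper * bit_average)


# On budget so far: the unused 100 bits of the previous frame carry over.
on_budget = bits_available(30000, 10, 3000, prev_available=3000, bit_used=2900)
# Heavy overspending: the raw budget collapses and is clamped to the lower bound.
overspent = bits_available(40000, 10, 3000, prev_available=3000, bit_used=4000)
```

The clamp keeps a single expensive frame from starving the next one, while the correction factor pulls the running average back toward the target bit rate.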
5b) Calculate the coding subband bit-consumption prediction number sfbBits(k):
$$\mathrm{sfbBits}(k) = \mathrm{bitAvailable}(i) \cdot \mathrm{sfbBitRatio}(k) \tag{28}$$
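As a small worked example of equation (28), the sketch below scales a frame budget by per-subband ratios. The function name, the budget of 3000 bits and the four ratios are hypothetical values chosen only for illustration.

```python
def sfb_bits(bit_available, ratios):
    """Equation (28): predicted bits per coding subband."""
    return [bit_available * r for r in ratios]


# Illustrative frame budget and per-subband prediction ratios.
predicted = sfb_bits(3000, [0.25, 0.25, 0.75, 0.0])
total_predicted = sum(predicted)
```

With ratios that include a correction factor above 1, the predicted total (3750 here) can exceed the frame budget; the values are predictions handed to the rate-distortion control, not a final allocation.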
6. The psychoacoustic model outputs the subband bit-consumption prediction numbers as parameters of the rate-distortion control, which then carries out the encoding.
After the coding subband bit-consumption prediction numbers are obtained, the psychoacoustic model outputs them as parameters of the rate-distortion control; the rate-distortion control then performs entropy coding and bitstream synthesis, completing the encoding process.
The thresholds, parameters, and coefficients given in the present embodiment are preferred values obtained by experiment. The invention is not limited to the disclosed values; those skilled in the art will understand that, within the concept of the invention, these values may be adjusted according to actual conditions to achieve better results.
The psychoacoustic model of the present invention is called the entropy-allocation psychoacoustic model (EAPAM). It is compared with the conventional psychoacoustic model 2 (PAM II) provided by the MPEG-4 AAC standard and adopted in many audio codecs such as MP3. Megal AAC Encoder (Megal) is an AAC encoder that uses subband bit-proportion prediction to guide rate-distortion control; its structure is shown in Fig. 1. Algorithm complexity is assessed by comparing the Free Advanced Audio Coder (FAAC), which uses PAM II, with the Megal AAC Encoder using EAPAM. The comparison encodes stereo audio sampled at 44100 Hz with 16-bit quantization at an average bit rate of 128 kbps, and the reference metric is millions of operations per second (MOPS).
Table 1. Computational cost by psychoacoustic model type and encoding algorithm
[Table 1 image not reproduced]
*1 implemented with a look-up table
*2 using a sparse diffusion matrix
As Table 1 shows, the computational cost of the EAPAM algorithm is 48.478 MOPS lower than that of PAM II, and the module's share of the total computation falls from 57% to 17%. Because the R-D algorithm is guided by the subband bit-proportion prediction, its cost falls from 35 MOPS to 12.8 MOPS. The overall cost is reduced by 69.6 MOPS, a reduction of 76.7%.
The audio quality of the encoders is assessed with EAQUAL 1.3, an objective evaluation program that implements the perceptual audio quality evaluation standard PEAQ. The evaluation indices provided by PEAQ and their descriptions are listed in Table 2.
Table 2. Evaluation indices output by the EAQUAL software and their meanings
Index name | Meaning
ODG | Objective difference grade
DIX | Distortion index
BandwidthTest | Bandwidth of the reference signal
NMR | Noise-to-mask ratio
WinModDiff1 | Windowed modulation difference, average
ADB | Average distorted block
EHS | Harmonic structure of the error
AvgModDiff1 | Modulation difference, time average 1
AvgModDiff2 | Modulation difference, time average 2
MFPD | Maximum filtered probability of detection
RDF | Relative disturbed frames
Here the overall index (ODG) and two important individual indices (BandwidthTest and NMR) are selected as the main reference indices. The audio quality assessment compares four encoders side by side: Megal using the EAPAM model of the present invention, Megal using the conventional PAM II model, NCTU AAC Encoder (hereinafter NCTU), and FAAC. NCTU is the AAC encoder developed by the perceptual audio group of National Chiao Tung University in Taiwan. FAAC is the AAC encoder developed by Fraunhofer IIS of Germany; Fraunhofer IIS is a principal author of the MPEG standards, and its FAAC encoder is the verification encoder of the AAC standard. The test material is taken from the first and second discs of the audio set British Music on Lyrita from Quad provided by the U.S. company Hui Wei. After duplicate songs were removed, 37 music clips were chosen; these clips cover the basic types of music, and their titles and descriptions are listed in Table 3.
Table 3. Test songs
No. | Song | Type description | Duration (s)
1 | Snowflake flies upward | Electronic synthesizer, high pre-echo probability | 84.07
2 | Female voice is sung opera arias | Female a cappella, English female voice | 59.30
3 | shaniaFuain | Pop, English female voice, high pre-echo probability | 88.68
4 | The ferry | Pop, Chinese female voice, high pre-echo probability | 72.77
5 | Da Ban city Miss | Men's chorus | 68.38
6 | Hotel california | Eagles | 119.98
7 | The drum poem | XRCD, high pre-echo probability | 65.32
8 | The red light note | Beijing opera female voice | 53.43
9 | Zhang San's song | Pop, Chinese male voice | 57.77
10 | The bass king | Contrabass | 87.49
11 | Denon | Orchestral music | 61.21
12 | The POLO guitar | Instrumental music | 59.98
13 | Chinese lute is to the saxophone | Instrumental music | 84.08
14 | The water of the Yellow River has been done | Nationality, Chinese male voice | 69.85
15 | Mut's violin | Solo | 61.63
16 | OneIlove | Female a cappella, English female voice | 74.51
17 | High mountain and great rivers | Zheng | 53.96
18 | Liang shanbo and Zhu yingtai | Violin concerto | 50.36
19 | Fever is classical | Symphony | 76.63
20 | Seven-stringed plucked instrument in some ways similar to the zither is to suona horn | National musical instruments | 77.25
21 | The hunting polka | Symphony | 59.98
22 | Wilfully like you | Pop, Cantonese male voice | 89.05
23 | 2001 A Space Odysseys | Symphony | 99.45
24 | The Song of Joy | Women's chorus, sound literary composition female voice | 128.64
25 | Bubukao | Zheng | 63.58
26 | The song in the four seasons | Viol | 68.71
27 | The toll bar prelude | Trumpet | 67.43
28 | See off | Women's chorus, Chinese girls' voices | 106.23
29 | Knock toll bar | Percussion, high pre-echo probability | 68.66
30 | Carmina Burana | Chorus, English poem | 151.46
31 | The fiddler on the Roof | Violin, solo | 72.98
32 | Dear father | Soprano | 76.14
33 | Tonight is unmanned sleeping | Tenor, opera | 175.94
34 | Voice | National ecosystem, female voice | 61.99
35 | The F-16 fighter plane | Sound effect | 43.89
36 | Twister | Sound effect, natural sound | 64.29
37 | The rocket lift-off | Sound effect | 39.99
The test results are listed in Table 4.
Table 4. Test results
[First part of Table 4: image not reproduced]
No. | FAAC ODG | FAAC BandwidthTest | FAAC NMR | NCTU ODG | NCTU BandwidthTest | NCTU NMR | Megal (EAPAM) ODG | Megal (EAPAM) BandwidthTest | Megal (EAPAM) NMR | Megal (PAM II) ODG | Megal (PAM II) BandwidthTest | Megal (PAM II) NMR
26 | -0.59 | 20173 | -8.4287 | -0.44 | 20409 | -9.5179 | -0.44 | 20627 | -9.8411 | -0.47 | 19948 | -9.5655
27 | -1.25 | 20167 | -7.1056 | -0.71 | 20352 | -8.5245 | -0.7 | 20593 | -9.1391 | -0.82 | 19832 | -8.7548
28 | -0.74 | 20097 | -8.3041 | -0.4 | 20930 | -10.178 | -0.4 | 20649 | -10.254 | -0.54 | 19895 | -9.7134
29 | -1.05 | 20258 | -8.1341 | -0.83 | 20064 | -9.1348 | -0.64 | 20432 | -9.2984 | -0.82 | 19910 | -8.3131
30 | -0.58 | 20108 | -8.1582 | -0.57 | 19449 | -8.9738 | -0.34 | 20654 | -9.8612 | -0.5 | 19965 | -7.4497
31 | -0.84 | 20152 | -9.102 | -0.65 | 20739 | -9.6848 | -0.69 | 20651 | -10.339 | -0.94 | 19940 | -8.3607
32 | -1.42 | 20044 | -7.2542 | -0.76 | 20001 | -9.1252 | -0.67 | 19858 | -9.6689 | -0.78 | 19773 | -9.4107
33 | -0.86 | 20050 | -8.4676 | -0.5 | 19327 | -10.514 | -0.53 | 19726 | -10.665 | -0.59 | 19832 | -9.9142
34 | -0.86 | 20128 | -7.6612 | -0.52 | 20373 | -9.6314 | -0.41 | 20602 | -9.9943 | -0.56 | 19897 | -9.0474
35 | -0.63 | 20158 | -8.6926 | -0.66 | 20317 | -9.1048 | -0.33 | 20541 | -10.686 | -0.45 | 19899 | -9.9646
36 | -0.61 | 20076 | -7.9394 | -1.1 | 19055 | -8.1089 | -0.35 | 20527 | -9.4003 | -0.5 | 19826 | -8.7527
37 | -0.39 | 20574 | -8.7607 | -0.67 | 20773 | -7.5065 | -0.35 | 20738 | -8.8435 | -0.45 | 19963 | -8.1802
Worst | -2.02 | 20044 | -6.6257 | -1.1 | 19055 | -7.2061 | -0.91 | 19726 | -7.7108 | -1.8 | 19770 | 6.8442
Best | -0.36 | 20574 | -10.022 | -0.37 | 21221 | -11.012 | -0.2 | 20738 | -16.799 | -0.45 | 20038 | -10.442
Average | -0.829 | 20200 | -8.263 | -0.666 | 20295 | -9.321 | -0.47919 | 20531.95 | -10.3993 | -0.845 | 19894 | -6.32534
As Fig. 5 and Fig. 6 show, the average ODG of NCTU is 0.163 better than that of FAAC, and the average ODG of Megal using the present invention is 0.187 better than that of NCTU, while Megal using the PAM II method is roughly comparable to FAAC. The average NMR of NCTU is 1.06 dB lower than that of FAAC, and the average NMR of Megal using the present invention is 1.08 dB lower than that of NCTU, while the average NMR of Megal using the PAM II method is higher than that of FAAC. The same conclusions can be drawn from the per-clip ODG distribution in Fig. 7 and the NMR distribution in Fig. 8. Both the computational cost assessment and the objective quality assessment show that the present invention allows an AAC encoder to achieve markedly better audio quality at a markedly lower computational cost.
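The quoted differences can be checked against the averages in the last row of Table 4. The short Python sketch below does so, assuming the first three column groups of that row belong, in order, to FAAC, NCTU, and Megal with EAPAM; this column assignment is an inference from the figures quoted in the text, and the variable names are illustrative.

```python
# (average ODG, average NMR in dB) from the last row of Table 4; column order assumed.
avg = {
    "FAAC": (-0.829, -8.263),
    "NCTU": (-0.666, -9.321),
    "Megal (EAPAM)": (-0.47919, -10.3993),
}

# ODG: higher (closer to 0) is better, so improvement = later minus earlier.
odg_nctu_vs_faac = avg["NCTU"][0] - avg["FAAC"][0]
odg_megal_vs_nctu = avg["Megal (EAPAM)"][0] - avg["NCTU"][0]

# NMR: lower is better, so reduction = earlier minus later.
nmr_nctu_vs_faac = avg["FAAC"][1] - avg["NCTU"][1]
nmr_megal_vs_nctu = avg["NCTU"][1] - avg["Megal (EAPAM)"][1]

print(round(odg_nctu_vs_faac, 3), round(odg_megal_vs_nctu, 3))
print(round(nmr_nctu_vs_faac, 2), round(nmr_megal_vs_nctu, 2))
```

Under that column assignment the averages reproduce the stated ODG gains of 0.163 and 0.187 and the stated NMR reductions of 1.06 dB and 1.08 dB after rounding.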
By comparing the parameters of the current frame with their long-term averages over past frames, and by detecting pre-echo in the time domain, the present invention realizes a psychoacoustic model that takes both time-domain masking and frequency-domain masking (time-frequency masking) fully into account. It outputs an accurate prediction of the coding subband bit-allocation proportions, which improves the quality of the quantization and coding algorithm, while the number of operations is also markedly lower than that of conventional psychoacoustic model algorithms.

Claims (13)

1. An audio processing method based on a psychoacoustic model of an advanced audio coder, characterized in that it comprises the following process:
A. from the psychoacoustic subband spectral energy of the bitstream to be encoded, calculating the perceptual entropy and the masking threshold of the coding subbands by means of a masking diffusion matrix;
B. from the perceptual entropy and the masking threshold of the coding subbands, applying a time-frequency masking correction and a pre-echo correction, and calculating the coding subband bit-consumption prediction number, the specific process being:
B1. comparing the current masking threshold of a coding subband with its long-term average masking threshold to obtain a time-frequency masking correction factor,
B2. determining by time-domain masking whether pre-echo masking is lost and, if so, correcting the time-frequency masking correction factors of two consecutive frames according to $\mathrm{brust}'_{sfb}(k) = \mathrm{brust}_{sfb}(k)^{\mathrm{chnBrust}}$, where brust'_sfb(k) is the time-frequency masking correction factor after pre-echo correction, brust_sfb(k) is the original time-frequency masking correction factor, k denotes the k-th coding subband, chnBrust = 3 for the first corrected frame, and chnBrust = 2 for the second corrected frame,
B3. using the time-frequency masking correction factor to correct the perceptual entropy and calculating the coding subband bit-consumption prediction number, the specific steps being:
B31. using the time-frequency masking correction factor to correct the perceptual entropy of the coding subbands and obtain the coding subband bit-consumption prediction ratio,
B32. performing inter-frame negative-feedback bit control according to the actual bit consumption to obtain the number of bits available to the current frame,
B33. calculating the coding subband bit-consumption prediction number from the coding subband bit-consumption prediction ratio and the number of bits available to the current frame,
wherein the coding subband bit-consumption prediction ratio of step B31 is obtained by:
$\mathrm{sfbBitRatio}(k) = \dfrac{PE_{sfb}(k)}{\sum_{k=1}^{49} PE_{sfb}(k)}\,\mathrm{brust}'_{sfb}(k)$, where sfbBitRatio(k) is the coding subband bit-consumption prediction ratio, brust'_sfb(k) is the time-frequency masking correction factor, PE_sfb(k) is the perceptual entropy of the coding subband, and k denotes the k-th coding subband;
C. the psychoacoustic model outputting the coding subband bit-consumption prediction number as a parameter of the rate-distortion control in order to carry out the encoding.
2. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 1, characterized in that the long-term average masking threshold of step B1 is obtained by: $\mathrm{Argmask}_{sfb}(k) = \alpha\,\mathrm{Argmask}'_{sfb}(k) + (1-\alpha)\,\mathrm{mask}_{sfb}(k)$
where Argmask'_sfb(k) is the long-term average masking threshold of the coding subband in the previous frame, Argmask_sfb(k) is the long-term average masking threshold of the coding subband in the current frame, mask_sfb(k) is the masking threshold of the coding subband in the current frame, α is the decay coefficient, and k denotes the k-th coding subband;
the time-frequency masking correction factor is obtained as follows:
$\mathrm{chk} = \dfrac{\mathrm{mask}_{sfb}(k)}{\mathrm{Argmask}_{sfb}(k)}$; if chk > 4, then $\mathrm{brust}_{sfb}(k) = \min\!\left(1.5,\ \dfrac{\log_2(\mathrm{chk})}{2}\right)$ and α = 0.98;
if 4 ≥ chk ≥ 0.5, then brust_sfb(k) = 0.95 and α = 0.4;
if chk < 0.5, then brust_sfb(k) = 0.90 and α = 0.4;
where chk is the energy ratio and brust_sfb(k) is the time-frequency masking correction factor of the current coding subband.
3. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 1, characterized in that determining by time-domain masking whether pre-echo masking is lost, as recited in step B2, comprises the following steps:
B21. dividing one frame of the time-domain signal into 8 segments, computing the sum of the absolute time-domain amplitudes of each segment, and placing the sums in the middle 8 elements of the segment absolute-amplitude vector abamp:
$\mathrm{abamp}(m+1) = \sum_{n=256(m-1)+1}^{256m} |x_i(n)|, \quad m = 1, 2, \dots, 8$
where abamp is a 10 × 1 vector whose first element abamp(1) inherits the mean-square amplitude sum of the 8 segments of the previous frame
[Formula image not reproduced]
and whose last element inherits the absolute amplitude of the final segment of the current frame, abamp(10) = abamp(9); i denotes the current frame and i-1 the previous frame; x_i(n) is the n-th time-domain sample of the current frame;
B22. from the segment absolute amplitudes obtained in step B21, calculating the time-domain mask Tmask(m) by:
$\mathrm{Tmask}(m) = \mathrm{Tnorm}(m) \sum_{t=1}^{m+2} \mathrm{abamp}(t)\,\mathrm{Rate}_{Tmask}(m-t+3)$
where the time-domain diffusion attenuation coefficients Rate_Tmask are
$\mathrm{Rate}_{Tmask} = [\,0.1,\ 0.9^{0},\ 0.9^{1},\ 0.9^{2},\ 0.9^{3},\ 0.9^{4},\ 0.9^{5},\ 0.9^{6},\ 0.9^{7},\ 0.9^{8}\,]$
and the time-domain diffusion normalization coefficient Tnorm(m) is
$\mathrm{Tnorm}(m) = \dfrac{1}{\sum_{t=1}^{m+2} \mathrm{Rate}_{Tmask}(t)}, \quad m = 1, 2, \dots, 8$;
B23. when 1.3·Tmask(1) < Tmask(8) and Tmask(8) > 2000, judging that pre-echo masking is lost.
4. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 1, characterized in that the number of bits available to the current frame in step B32 is obtained by: bitAvailable(i) = controlRatio · (bitAverage + bitAvailable(i-1) - bitUsed), where controlRatio is the inter-frame correction factor, bitAverage is the average number of bits available per frame according to the average bit rate, bitAvailable(i-1) is the number of bits available to the previous frame, and bitUsed is the number of bits actually consumed by the previous frame; the inter-frame correction factor is determined as follows:
if bitRatio > 1.06, $\mathrm{controlRatio} = \dfrac{1}{\mathrm{bitRatio} + 0.2}$,
if 1.06 ≥ bitRatio > 1.05, controlRatio = 0.9,
if 1.05 ≥ bitRatio > 1.02, controlRatio = 0.95,
if 1.02 ≥ bitRatio ≥ 0.98, controlRatio = 1,
if bitRatio < 0.98, controlRatio = 1.2, where
$\mathrm{bitRatio} = \dfrac{\mathrm{bitAll}}{K \cdot \mathrm{bitAverage}}$
is the ratio of the current average number of bits per frame, bitAll/K, to the available average number of bits; i denotes the current frame, K is the number of frames processed so far, and bitAll is the total number of bits used so far.
5. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 1, characterized in that the coding subband bit-consumption prediction number of step B33 is obtained by:
sfbBits(k) = bitAvailable(i) · sfbBitRatio(k), where sfbBits(k) is the coding subband bit-consumption prediction number, bitAvailable(i) is the number of bits available to the current frame, sfbBitRatio(k) is the coding subband bit-consumption prediction ratio, and k denotes the k-th coding subband.
6. The audio processing method based on a psychoacoustic model of an advanced audio coder according to any one of claims 1 to 5, characterized in that step A comprises the following steps:
A1. obtaining the psychoacoustic subband energy by summing the spectral energy of the psychoacoustic subband of the bitstream to be encoded;
A2. calculating the psychoacoustic subband energy peak-to-valley value from the psychoacoustic subband energy;
A3. mapping the psychoacoustic subband energy peak-to-valley value to a masking signal ratio by a second-order equation;
A4. calculating the self-masking energy of the psychoacoustic subband from the masking signal ratio and the psychoacoustic subband energy;
A5. obtaining the masking threshold of the psychoacoustic subband from the self-masking energy by means of the diffusion matrix;
A6. calculating the perceptual entropy of the psychoacoustic subband from the psychoacoustic subband energy and the masking threshold;
A7. mapping the perceptual entropy and the masking threshold of the psychoacoustic subbands to the perceptual entropy and the masking threshold of the coding subbands, respectively.
7. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the diffusion matrix of step A5 is a sparse diffusion matrix; the diffusion matrix is made sparse by setting to 0 those elements of the normalized diffusion matrix that are below a predetermined decibel threshold, and the normalization factor of the normalized diffusion matrix is obtained by:
$\mathrm{sprdngN}(b) = \sum_{bb=1}^{70} \mathrm{sprdngf}[\mathrm{bval}(b) - \mathrm{bval}(bb)]$, where sprdngN(b) is the normalization factor, bval(b) and bval(bb) are Bark frequencies, and sprdngf() is the diffusion function; the diffusion function is determined as follows:
$$\mathrm{spr} = \mathrm{sprdngf}(\Delta f_c) = \begin{cases}
0, & \Delta f_c \le -3.3333 \\
10^{\left[15.811389 + 7.5(1.5\Delta f_c + 0.474) - 17.5\sqrt{1 + (1.5\Delta f_c + 0.474)^2}\right]/10}, & -3.3333 < \Delta f_c \le 0 \\
10^{\left[15.811389 + 7.5(3\Delta f_c + 0.474) - 17.5\sqrt{1 + (3\Delta f_c + 0.474)^2}\right]/10}, & 0 < \Delta f_c \le 0.5 \\
10^{\left[8\left((3\Delta f_c - 1.5)^2 - 1\right) + 15.811389 + 7.5(3\Delta f_c + 0.474) - 17.5\sqrt{1 + (3\Delta f_c + 0.474)^2}\right]/10}, & 0.5 < \Delta f_c \le 2.5 \\
10^{\left[15.811389 + 7.5(3\Delta f_c + 0.474) - 17.5\sqrt{1 + (3\Delta f_c + 0.474)^2}\right]/10}, & 2.5 < \Delta f_c \le 7.3333 \\
0, & \Delta f_c > 7.3333
\end{cases}$$
where spr is the value of the diffusion function, b denotes the b-th psychoacoustic subband, bb denotes the bb-th psychoacoustic subband, and Δf_c denotes bval(b) - bval(bb).
8. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the psychoacoustic subband energy peak-to-valley value of step A2 is obtained by:
$\mathrm{ppRate}(b) = \dfrac{E_{psy}(\varepsilon)}{E_{psy}(b)} = \dfrac{\min\left(E_{psy}(b-1),\ E_{psy}(b+1)\right)}{E_{psy}(b)}$, where ppRate(b) is the psychoacoustic subband energy peak-to-valley value, E_psy(b) is the energy of the current psychoacoustic subband, E_psy(b-1) and E_psy(b+1) are the energies of the previous and the next psychoacoustic subband respectively, b denotes the b-th psychoacoustic subband, and ε is the (b-1)-th or the (b+1)-th psychoacoustic subband.
9. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the second-order equation of step A3 is:
$MSR_{psy}(b) = 0.17453\,\mathrm{ppRate}(b)^2 + 0.08325\,\mathrm{ppRate}(b)$, where MSR_psy(b) is the masking signal ratio, ppRate(b) is the psychoacoustic subband energy peak-to-valley value, and b denotes the b-th psychoacoustic subband.
10. The processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the self-masking energy of step A4 is obtained by:
$E_{selfmask}(b) = E_{psy}(b)\,MSR_{psy}(b)$, where E_selfmask(b) is the self-masking energy, E_psy(b) is the psychoacoustic subband energy, MSR_psy(b) is the masking signal ratio, and b denotes the b-th psychoacoustic subband.
11. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the masking threshold of step A5 is obtained by:
$\mathrm{mask}_{psy}(b) = E_{selfmask} * \mathrm{sprdngMN}$, where mask_psy(b) is the masking threshold of the psychoacoustic subband, sprdngMN is the diffusion matrix, E_selfmask is the self-masking energy, and b denotes the b-th psychoacoustic subband.
12. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the perceptual entropy of the psychoacoustic subband of step A6 is obtained by: $PE_{psy}(b) = bw_{psy}(b)\,\log_{10}\!\left[\dfrac{E_{psy}(b)}{\mathrm{mask}_{psy}(b)}\right]$, where PE_psy(b) is the perceptual entropy of the psychoacoustic subband, bw_psy(b) is the bandwidth of the psychoacoustic subband, E_psy(b) is the psychoacoustic subband energy, mask_psy(b) is the masking threshold of the psychoacoustic subband, and b denotes the b-th psychoacoustic subband.
13. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that in step A7 the perceptual entropy of the psychoacoustic subbands is mapped to the perceptual entropy of the coding subbands by:
$PE_{sfb}(k) = \sum_{w=\mathrm{sfbLow}(k)}^{\mathrm{sfbHigh}(k)} PE_{spec}(w)$, where PE_sfb(k) is the perceptual entropy of the coding subband; sfbLow(k) and sfbHigh(k) are the lower and upper bounds of coding subband k, respectively;
[Formula image defining PE_spec(w) not reproduced]
bw_psy(b) is the bandwidth of the psychoacoustic subband, PE_psy(b) is the perceptual entropy of the psychoacoustic subband, psyLow(b) ≤ w ≤ psyHigh(b), and psyHigh(b) and psyLow(b) are the upper and lower bounds of psychoacoustic subband b, respectively; w denotes the w-th spectral line in the b-th psychoacoustic subband;
the masking threshold of the psychoacoustic subbands is mapped to the masking threshold of the coding subbands by:
$\mathrm{mask}_{sfb}(k) = bw_{sfb}(k)\,\min\left(\mathrm{mask}_{Apsy}(b)\right),\ b1 \le b \le b2$, where mask_sfb(k) is the masking threshold of the coding subband, b1 satisfies psyLow(b1) ≤ sfbLow(k) ≤ psyHigh(b1), and b2 satisfies psyLow(b2) ≤ sfbHigh(k) ≤ psyHigh(b2),
[Formula image defining mask_Apsy(b) not reproduced]
mask_psy(b) is the masking threshold of the psychoacoustic subband; psyHigh(b1) and psyLow(b1) are the upper and lower bounds of psychoacoustic subband b1, respectively; psyHigh(b2) and psyLow(b2) are the upper and lower bounds of psychoacoustic subband b2, respectively; bw_sfb(k) is the bandwidth of the coding subband.
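The per-subband computations recited in claims 8 to 12 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: all function names are hypothetical; at the first and last subband only the one existing neighbour is used (the claims do not say how edges are handled); the product in claim 11 is read as a row vector times the diffusion matrix; and the diffusion matrix is passed in as data (an identity matrix in the toy example) rather than built from the diffusion function of claim 7.

```python
import math


def pp_rate(e_psy, b):
    """Claim 8: smaller neighbouring subband energy divided by the current energy."""
    # Assumption: at the first/last subband only the single existing neighbour is used.
    neighbours = [e_psy[j] for j in (b - 1, b + 1) if 0 <= j < len(e_psy)]
    return min(neighbours) / e_psy[b]


def masking_signal_ratio(pp):
    """Claim 9: MSR = 0.17453 * ppRate^2 + 0.08325 * ppRate."""
    return 0.17453 * pp * pp + 0.08325 * pp


def self_masking_energy(e_psy):
    """Claim 10: E_selfmask(b) = E_psy(b) * MSR_psy(b). Needs at least two subbands."""
    return [e_psy[b] * masking_signal_ratio(pp_rate(e_psy, b))
            for b in range(len(e_psy))]


def masking_threshold(e_selfmask, sprdng_mn):
    """Claim 11, read as a row vector times the diffusion matrix (orientation assumed)."""
    n = len(e_selfmask)
    return [sum(e_selfmask[bb] * sprdng_mn[bb][b] for bb in range(n))
            for b in range(n)]


def perceptual_entropy(e_psy, mask_psy, bw_psy):
    """Claim 12: PE_psy(b) = bw_psy(b) * log10(E_psy(b) / mask_psy(b))."""
    return [bw * math.log10(e / m) for e, m, bw in zip(e_psy, mask_psy, bw_psy)]


# Toy input: three subbands with a spectral peak in the middle one.
e_psy = [100.0, 400.0, 100.0]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
e_self = self_masking_energy(e_psy)
mask = masking_threshold(e_self, identity)
pe = perceptual_entropy(e_psy, mask, bw_psy=[4, 4, 4])
```

In this toy input the peak subband has a low peak-to-valley value, hence a low masking threshold relative to its energy and a positive perceptual entropy, while the valley subbands come out with a threshold above their own energy and a negative value from the bare formula.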
CN2007101276606A 2007-05-16 2007-06-20 Psychoacoustics model processing method based on advanced audio decoder Active CN101308659B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2007101276606A CN101308659B (en) 2007-05-16 2007-06-20 Psychoacoustics model processing method based on advanced audio decoder

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN200710074481 2007-05-16
CN200710074481.0 2007-05-16
CN2007101276606A CN101308659B (en) 2007-05-16 2007-06-20 Psychoacoustics model processing method based on advanced audio decoder

Publications (2)

Publication Number Publication Date
CN101308659A CN101308659A (en) 2008-11-19
CN101308659B true CN101308659B (en) 2011-11-30

Family

ID=40125072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007101276606A Active CN101308659B (en) 2007-05-16 2007-06-20 Psychoacoustics model processing method based on advanced audio decoder

Country Status (1)

Country Link
CN (1) CN101308659B (en)

Also Published As

Publication number Publication date
CN101308659A (en) 2008-11-19

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant