CN101308659B - Psychoacoustics model processing method based on advanced audio decoder - Google Patents
- Publication number
- CN101308659B (application CN200710127660A, CN2007101276606A)
- Authority
- CN
- China
- Prior art keywords
- subband
- psychologic acoustics
- psy
- sfb
- coding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a psychoacoustic model processing method based on an advanced audio encoder. The method includes the following steps: A, the perceptual entropy and the masking threshold of each coding sub-band are obtained from the spectral energy of the psychoacoustic sub-bands of the stream to be encoded by a masking spreading matrix calculation; B, the predicted bit consumption of each sub-band is calculated from the perceptual entropy and the masking threshold of the coding sub-band, applying a time-frequency masking correction and a pre-echo correction; C, the psychoacoustic model outputs the predicted sub-band bit consumption, which then serves as a parameter for rate-distortion control in the encoding process. The method derives the sub-band bit consumption from the perceptual entropy more accurately, and the encoder uses the predicted value as the rate-distortion control parameter, which greatly improves encoding efficiency and quality during quantization and coding.
Description
Technical field
The present invention relates to advanced audio encoders, and in particular to a processing method for a psychoacoustic model based on an advanced audio encoder.
Background technology
Advanced Audio Coding (AAC) is a transform-domain lossy perceptual audio coding scheme. Lossy perceptual audio coding can achieve a very high compression ratio, but its coding error (quantization noise) is unavoidably high. To reduce the influence of quantization noise, lossy perceptual audio coding studies the psychoacoustic behaviour of the human ear and controls the distribution of the coding error accordingly, so that the noise produced by quantization error is difficult to perceive. In lossy perceptual coding this process is carried out by the psychoacoustic model.
The psychoacoustic model controls the distribution of quantization error by exploiting the auditory masking of the human ear. Masking is a common psychoacoustic phenomenon determined by the ear's frequency-resolution and time-resolution mechanisms: near a stronger sound, a relatively weaker sound is not perceived, that is, it is masked by the stronger sound. The stronger sound is called the masker and the weaker sound the maskee. Masking is divided into simultaneous masking (Simultaneous Masking, SM) and non-simultaneous masking (Heterochronous Masking, HM). Simultaneous masking occurs when the masker and the maskee are present at the same time and is also called frequency-domain masking; non-simultaneous masking occurs when they are not present at the same time and is also called time-domain masking. Non-simultaneous masking is further divided, according to its order relative to the masker, into pre-masking (Forward Masking, FM) and post-masking (Backward Masking, BM). If the masking effect occurs during a certain time before the masker begins, it is pre-masking; if it occurs afterwards, it is post-masking.
A traditional psychoacoustic model provides two important parameters to the encoder. One is the perceptual entropy, which represents the amount of information left in the signal after the auditory masking effect is taken into account and the perceptual redundancy is removed; it can be used to estimate the bit allocation for coding and to decide the coding block type. The other is the encoder threshold, the maximum tolerable noise of each coding subband, which can be used for distortion control of the quantizer. AAC encoders that use a traditional psychoacoustic model generally quantize with a rate-distortion (R-D) control algorithm based on the encoder threshold. Such algorithms include the two-loop search (Two Loop Search, TLS), the trellis-based algorithm and the cascaded trellis-based algorithm, the latter two being derived from the two-loop search. The quantizer in an AAC encoder is non-uniform, and its entropy coding is variable-length Huffman coding. Because the quantizer is non-uniform, the encoder cannot derive sufficiently optimized coding parameters directly from the perceptually tolerable noise; and because the entropy coding is variable-length, the number of bits consumed can only be computed from the quantization result. As a result, the parameters provided by a traditional psychoacoustic model cannot control quantization and coding well, which makes current rate-distortion control algorithms complex and inefficient.
If the traditional two-layer nested iteration for bit allocation and distortion control is abandoned, and rate control and distortion control are instead performed together using a predicted subband bit allocation ratio, higher computational efficiency can be obtained; the coded sound quality then depends on how well the subband bit allocation prediction is optimized. The predicted subband bit consumption can be obtained by the formula: predicted subband bit consumption = subband perceptual entropy × bits available to the current frame / sum of the perceptual entropy of all subbands. For constant bit rate (CBR) coding, the number of bits available to the current frame is a fixed value equal to bit rate × 1024 / sampling rate; if it varies with usage, the coding is variable bit rate (VBR), in which case the number of bits available to the current frame is generally supplied by an inter-frame bit control algorithm. As can be seen, this predicted subband bit consumption is simply the product of the normalized perceptual entropy and the bits available to the current frame; its accuracy is low, which in turn affects the efficiency of rate-distortion control. Moreover, because the traditional psychoacoustic model considers only the simultaneous masking of the human ear and ignores non-simultaneous masking, the encoder cannot exploit non-simultaneous masking to improve coding quality: once pre-masking fails, the quantization noise is no longer masked, a pre-echo occurs, and sound quality drops markedly. Although the AAC standard provides temporal noise shaping (Temporal Noise Shaping, TNS) to weaken the influence of pre-echo, practical tests show that using this module can degrade sound quality further.
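The baseline prediction described in this paragraph can be sketched in a few lines of Python. This is only an illustration of the two formulas quoted in the text; the function names `cbr_frame_bits` and `predict_subband_bits` and the zero-entropy fallback are assumptions made for the example, not part of the patent.

```python
def cbr_frame_bits(bit_rate, sample_rate, frame_len=1024):
    """Bits available per frame for CBR coding: bit rate x 1024 / sampling rate."""
    return bit_rate * frame_len / sample_rate


def predict_subband_bits(subband_pe, frame_bits):
    """Split frame_bits across subbands in proportion to perceptual entropy."""
    total_pe = sum(subband_pe)
    if total_pe <= 0:
        # Assumed fallback for a silent frame: no bits are predicted.
        return [0.0 for _ in subband_pe]
    return [pe * frame_bits / total_pe for pe in subband_pe]


# Example: 128 kbps at 44.1 kHz, three subbands with relative entropies 1:2:1.
frame_bits = cbr_frame_bits(128000, 44100)
print(predict_subband_bits([1.0, 2.0, 1.0], frame_bits))
```

Because the split depends only on the normalized perceptual entropy, two frames with the same entropy profile always receive the same allocation, which is the limitation the invention sets out to address.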
Summary of the invention
The present invention addresses the technical problems above and proposes a processing method for a psychoacoustic model based on an advanced audio encoder. It takes both time-domain masking and frequency-domain masking fully into account and therefore outputs an accurate predicted bit consumption for each coding subband, improving the quality and efficiency of quantization and coding in the encoder.
To achieve this, the present invention adopts the following technical solution:
A processing method for a psychoacoustic model based on an advanced audio encoder, comprising the following steps:
A. From the spectral energy of the psychoacoustic subbands of the stream to be encoded, compute the perceptual entropy and masking threshold of the coding subbands by means of a masking spreading matrix;
B. From the perceptual entropy and masking threshold of the coding subbands, apply a time-frequency masking correction and a pre-echo correction, and compute the predicted subband bit consumption;
C. The psychoacoustic model outputs the predicted subband bit consumption as the parameter for rate-distortion control in the encoding process.
Step B comprises the following steps:
B1. Compare the current masking threshold of the coding subband with the long-term average masking threshold to obtain the time-frequency masking correction factor;
B2. Use time-domain masking to judge whether the pre-echo has lost masking; if so, revise the time-frequency masking correction factor;
B3. Use the time-frequency masking correction factor to correct the perceptual entropy and compute the predicted subband bit consumption.
The long-term average masking threshold in step B1 is obtained by the following formula:
$$\mathrm{Argmask}_{sfb}(k) = \alpha\,\mathrm{Argmask}'_{sfb}(k) + (1-\alpha)\,\mathrm{mask}_{sfb}(k)$$
where $\mathrm{Argmask}'_{sfb}(k)$ is the long-term average masking threshold of coding subband k in the previous frame, $\mathrm{Argmask}_{sfb}(k)$ is the long-term average masking threshold of coding subband k in the current frame, $\mathrm{mask}_{sfb}(k)$ is the masking threshold of coding subband k in the current frame, and α is the attenuation factor;
The time-frequency masking correction factor is obtained as follows:
If chk > 4,
α = 0.98;
If chk ≥ 0.5, then $\mathrm{brust}_{sfb}(k) = 0.95$, α = 0.4;
If chk < 0.5, then $\mathrm{brust}_{sfb}(k) = 0.90$, α = 0.4;
where chk is the energy ratio and $\mathrm{brust}_{sfb}(k)$ is the time-domain masking correction factor.
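The long-term average update above is a first-order recursive (exponentially weighted) average. A minimal Python sketch follows; the function name `update_long_term_mask` and the use of plain lists are assumptions for illustration, and α is passed in by the caller because its value depends on the energy ratio chk as listed above.

```python
def update_long_term_mask(prev_avg, current_mask, alpha):
    """Argmask_sfb(k) = alpha * Argmask'_sfb(k) + (1 - alpha) * mask_sfb(k)."""
    return [alpha * a + (1.0 - alpha) * m
            for a, m in zip(prev_avg, current_mask)]


# With alpha = 0.98 the average barely moves; with alpha = 0.4 it follows
# the current frame much more closely.
prev = [100.0, 100.0]
cur = [0.0, 200.0]
print(update_long_term_mask(prev, cur, 0.98))
print(update_long_term_mask(prev, cur, 0.4))
```

A large α therefore keeps the history dominant, while a small α lets the average adapt quickly to the current frame.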
Judging by time-domain masking whether the pre-echo has lost masking, as described in step B2, comprises the following steps:
B21. Divide one frame of the time-domain signal into 8 segments, compute the sum of absolute time-domain amplitudes of each segment, and place the results in the middle 8 elements of the segmented absolute amplitude vector abamp:
where abamp is a 10 × 1 vector; its first element abamp(1) inherits the mean-square amplitude sum of the 8th segment of the previous frame,
and its last element inherits the absolute amplitude of the last segment of the current frame, abamp(10) = abamp(9);
B22. From the segmented absolute amplitudes obtained in step B21, compute the time-domain mask Tmask(m) by the following formula:
where the time-domain spreading attenuation coefficient $\mathrm{Rate}_{Tmask}$ is
$$\mathrm{Rate}_{Tmask} = [\,0.1 \;\; 0.9^{0} \;\; 0.9^{1} \;\; 0.9^{2} \;\; 0.9^{3} \;\; 0.9^{4} \;\; 0.9^{5} \;\; 0.9^{6} \;\; 0.9^{7} \;\; 0.9^{8}\,]$$
and the time-domain spreading normalization coefficient Tnorm(m) is
B23. When 1.3·Tmask(1) < Tmask(8) and Tmask(8) > 2000, the pre-echo is judged to have lost masking.
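Steps B21 and B23 can be sketched as follows. The formula for Tmask(m) in step B22 is not reproduced here, so the sketch takes the time-domain mask as an input and only illustrates the segmentation and the decision rule. The function names, the 0-based indexing (Tmask(1) is `tmask[0]`, Tmask(8) is `tmask[7]`) and the assumption that the frame length is a multiple of 8 are all illustrative choices.

```python
def segment_abs_sums(frame, n_segments=8):
    """Sum of absolute amplitudes of each of n_segments equal-length segments."""
    seg_len = len(frame) // n_segments
    return [sum(abs(x) for x in frame[i * seg_len:(i + 1) * seg_len])
            for i in range(n_segments)]


def build_abamp(frame, prev_last_segment_sum):
    """10-element vector: inherited value, 8 segment sums, repeated last sum."""
    middle = segment_abs_sums(frame)
    return [prev_last_segment_sum] + middle + [middle[-1]]


def pre_echo_unmasked(tmask, ratio=1.3, floor=2000.0):
    """Decision rule B23: 1.3 * Tmask(1) < Tmask(8) and Tmask(8) > 2000."""
    return ratio * tmask[0] < tmask[7] and tmask[7] > floor


# A quiet frame that ends with a strong attack triggers the rule.
print(pre_echo_unmasked([100.0] * 7 + [3000.0]))
```

The second condition keeps the rule from firing on low-level signals whose relative energy rise is large but whose absolute level is small.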
When the pre-echo is judged to have lost masking, the time-frequency masking correction factor of two consecutive frames is corrected as follows:
$$\mathrm{brust}'_{sfb}(k) = \mathrm{brust}_{sfb}(k)^{\,\mathrm{chnBrust}}$$
where $\mathrm{brust}'_{sfb}(k)$ is the time-frequency masking correction factor after pre-echo correction, $\mathrm{brust}_{sfb}(k)$ is the original time-domain masking correction factor, chnBrust = 3 for the correction of the first frame, and chnBrust = 2 for the correction of the second frame.
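A small sketch of this two-frame correction is given below, reading chnBrust as an exponent applied to the correction factor. The function name `pre_echo_correct`, the `frames_since_detection` counter and the choice to leave the factor unchanged after the second frame are assumptions for illustration.

```python
def pre_echo_correct(brust, frames_since_detection):
    """Raise each correction factor to chnBrust: 3 on the first frame, 2 on the second."""
    # Assumption: from the third frame on, no correction is applied (exponent 1).
    chn_brust = {0: 3, 1: 2}.get(frames_since_detection, 1)
    return [b ** chn_brust for b in brust]


brust = [0.95, 0.90]
print(pre_echo_correct(brust, 0))  # first frame after detection
print(pre_echo_correct(brust, 1))  # second frame after detection
```

Since the factors are below 1, a larger exponent makes the correction stronger, so the first frame after detection is corrected more heavily than the second.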
Step B3 is carried out as follows:
B31. Use the time-frequency masking correction factor to correct the perceptual entropy and obtain the predicted subband bit consumption ratio;
B32. Perform inter-frame negative-feedback bit control according to the actual bit consumption to obtain the number of bits available to the current frame;
B33. Compute the predicted subband bit consumption from the predicted subband bit consumption ratio and the number of bits available to the current frame.
The predicted subband bit consumption ratio in step B31 is obtained by the following formula:
The number of bits available to the current frame in step B32 is obtained by the following formula:
$$\mathrm{bitAvailable}(i) = \mathrm{controlRatio}\cdot\bigl(\mathrm{bitAverage} + \mathrm{bitAvailable}(i-1) - \mathrm{bitUsed}\bigr)$$
where controlRatio is the inter-frame correction factor, bitAverage is the average number of bits available per frame according to the average bit rate, bitAvailable(i-1) is the number of bits available to the previous frame, and bitUsed is the number of bits actually consumed by the previous frame. The inter-frame correction factor is determined as follows:
If bitRatio > 1.06,
If 1.06 ≥ bitRatio > 1.05, controlRatio = 0.9,
If 1.05 ≥ bitRatio > 1.02, controlRatio = 0.95,
If 1.02 ≥ bitRatio ≥ 0.98, controlRatio = 1,
If bitRatio < 0.98, controlRatio = 1.2,
where bitRatio is the ratio of the current average number of bits per frame, bitAll/K, to the available average number of bits.
The predicted subband bit consumption in step B33 is obtained by the following formula:
$$\mathrm{sfbBits}(k) = \mathrm{bitAvailable}(i)\cdot\mathrm{sfbBitRatio}(k)$$
where sfbBits(k) is the predicted subband bit consumption, bitAvailable(i) is the number of bits available to the current frame, and sfbBitRatio(k) is the predicted subband bit consumption ratio.
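Steps B32 and B33 can be illustrated with the sketch below. The controlRatio value for bitRatio > 1.06 is not given in the text, so the sketch takes it as an explicit argument (`above_106`, a hypothetical parameter); the function names are likewise assumptions.

```python
def control_ratio(bit_ratio, above_106):
    """Inter-frame correction factor selected from the bitRatio ranges in the text."""
    if bit_ratio > 1.06:
        return above_106  # value not stated in the text; supplied by the caller
    if bit_ratio > 1.05:
        return 0.9
    if bit_ratio > 1.02:
        return 0.95
    if bit_ratio >= 0.98:
        return 1.0
    return 1.2


def available_bits(ctrl, bit_average, prev_available, bit_used):
    """bitAvailable(i) = controlRatio * (bitAverage + bitAvailable(i-1) - bitUsed)."""
    return ctrl * (bit_average + prev_available - bit_used)


def subband_bits(bit_available, sfb_bit_ratio):
    """sfbBits(k) = bitAvailable(i) * sfbBitRatio(k)."""
    return [bit_available * r for r in sfb_bit_ratio]


# Bits left unused by the previous frame are carried into the current budget.
budget = available_bits(control_ratio(1.0, 0.8), 3000, 3000, 2900)
print(budget, subband_bits(budget, [0.25, 0.75]))
```

The feedback is negative in the sense that a running average above the target (bitRatio > 1) selects a factor below 1 and shrinks the next budget, while a running average below the target enlarges it.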
Step A comprises the following steps:
A1. Sum the spectral energy of each psychoacoustic subband of the stream to be encoded to obtain the psychoacoustic subband energy;
A2. Compute the subband energy peak-to-valley value from the psychoacoustic subband energy;
A3. Map the subband energy peak-to-valley value to a masking-to-signal ratio by a second-order equation;
A4. Compute the self-masking energy of the subband from the masking-to-signal ratio and the psychoacoustic subband energy;
A5. Obtain the masking threshold of the psychoacoustic subband from the self-masking energy by means of the spreading matrix;
A6. Compute the perceptual entropy of the psychoacoustic subband from the psychoacoustic subband energy and the masking threshold;
A7. Map the perceptual entropy and the masking threshold of the psychoacoustic subbands to the perceptual entropy and masking threshold of the coding subbands, respectively.
The spreading matrix in step A5 is a sparse spreading matrix. It is made sparse by setting to 0 every element of the normalized spreading matrix that is below a predetermined decibel threshold. The normalization factor of the normalized spreading matrix is obtained by the following formula:
where sprdngN(b) is the normalization factor, bavl(b) and bavl(bb) are Bark frequencies, and sprdngf is the spreading function;
The spreading function is determined as follows:
The subband energy peak-to-valley value in step A2 is obtained by the following formula:
where ppRate(b) is the subband energy peak-to-valley value, $E_{psy}(b)$ is the energy of the current psychoacoustic subband, and $E_{psy}(b-1)$ and $E_{psy}(b+1)$ are the energies of the previous and next psychoacoustic subbands, respectively.
The second-order equation in step A3 is:
$$\mathrm{MSR}_{psy}(b) = 0.17453\,\mathrm{ppRate}(b)^{2} + 0.08325\,\mathrm{ppRate}(b)$$
where $\mathrm{MSR}_{psy}(b)$ is the masking-to-signal ratio and ppRate(b) is the subband energy peak-to-valley value.
The self-masking energy in step A4 is obtained by the following formula:
$$E_{selfmask}(b) = E_{psy}(b)\,\mathrm{MSR}_{psy}(b)$$
where $E_{selfmask}(b)$ is the self-masking energy, $E_{psy}(b)$ is the psychoacoustic subband energy, and $\mathrm{MSR}_{psy}(b)$ is the masking-to-signal ratio.
The masking threshold in step A5 is obtained by the following formula:
$$\mathrm{mask}_{psy}(b) = E_{selfmask} * \mathrm{sprdngMN}$$
where $\mathrm{mask}_{psy}(b)$ is the masking threshold of the psychoacoustic subband and sprdngMN is the spreading matrix.
The perceptual entropy of the psychoacoustic subband in step A6 is obtained by the following formula:
The perceptual entropy of the psychoacoustic subband in step A7 is mapped to the perceptual entropy of the coding subband by the following formula:
The masking threshold of the psychoacoustic subband is mapped to the masking threshold of the coding subband by the following formula:
$$\mathrm{mask}_{sfb}(k) = bw_{sfb}(k)\,\min\bigl(\mathrm{mask}_{apsy}(b)\bigr),\quad b1 \le b \le b2$$
where $\mathrm{mask}_{sfb}(k)$ is the masking threshold of coding subband k; b1 satisfies psyLow(b1) ≤ sfblow(k) ≤ psyHigh(b1) and b2 satisfies psyLow(b2) ≤ sfbhigh(k) ≤ psyHigh(b2); $\mathrm{mask}_{psy}(b)$ is the psychoacoustic subband masking threshold; psyHigh(b1) and psyLow(b1) are the upper and lower bounds of psychoacoustic subband b1; psyHigh(b2) and psyLow(b2) are the upper and lower bounds of psychoacoustic subband b2; sfblow(k) and sfbhigh(k) are the lower and upper bounds of coding subband k; and $bw_{sfb}(k)$ is the bandwidth of coding subband k.
By comparing the parameters of the current frame with their long-term average over past frames, and by applying a pre-echo correction judged from time-domain masking, the present invention realizes a psychoacoustic model processing method that takes both time-domain masking and frequency-domain masking (time-frequency masking) fully into account. The predicted subband bit consumption is thus obtained from the perceptual entropy more accurately, and the encoder uses this prediction as the parameter for rate-distortion control, which greatly improves coding efficiency and quality during quantization and coding. The perceptual entropy is obtained through the masking spreading matrix, and the matrix is made sparse during the computation, so the perceptual entropy is obtained faster and with fewer operations.
Description of drawings
Fig. 1 is a structural block diagram of the Megal AAC encoder using the embodiment of the invention;
Fig. 2 is a flow chart of the processing method of the embodiment of the invention;
Fig. 3 is a schematic diagram of the masking-to-signal ratio with its constraint upper bound and constraint lower bound on different subbands;
Fig. 4 is a schematic diagram of judging that the pre-echo has lost masking;
Fig. 5 is a comparison of the ODG index of several encoders;
Fig. 6 is a comparison of the NMR index of several encoders;
Fig. 7 is the ODG distribution of several encoders;
Fig. 8 is the NMR distribution of several encoders.
Embodiment
The specific embodiment of the present invention is described in detail below with reference to the accompanying drawings.
The embodiment of the processing method of the present invention is shown in Fig. 2; its specific processing steps are as follows:
1. From the spectral energy of the psychoacoustic subbands of the stream to be encoded, compute the perceptual entropy and masking threshold of the coding subbands by means of the masking spreading matrix
1a) Sum the MDCT (modified discrete cosine transform) spectral energy of each psychoacoustic subband of the current frame to obtain the psychoacoustic subband energy $E_{psy}$
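Step 1a can be sketched as below. The sketch assumes that the spectral energy of a line is the square of its MDCT coefficient and that subbands are described by a list of band edges (`band_edges`, a hypothetical layout in which band b covers lines `band_edges[b]` up to but not including `band_edges[b + 1]`).

```python
def subband_energies(mdct, band_edges):
    """E_psy(b): sum of squared MDCT coefficients over the lines of each subband."""
    return [sum(c * c for c in mdct[lo:hi])
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]


# Four spectral lines split into two subbands of two lines each.
print(subband_energies([1.0, 2.0, 3.0, 4.0], [0, 2, 4]))
```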
1b) Compute the subband energy peak-to-valley value ppRate(b)
where b is the index of the current subband, and b-1 and b+1 denote the previous and next subbands, respectively.
After the subband energy peak-to-valley value is obtained, it is constrained to the interval [lower(b), upper(b)]:
If ppRate(b) > upper(b), ppRate(b) = upper(b)
If ppRate(b) < lower(b), ppRate(b) = lower(b)
that is, ppRate(b) = max(lower(b), min(upper(b), ppRate(b))), where
lower(1)=lower(2)+0.1,lower(70)=lower(69) (2)
upper(b)=lower(b)+0.7
1c) Map the subband energy peak-to-valley value to the masking-to-signal ratio $\mathrm{MSR}_{psy}(b)$ by the second-order equation
$$\mathrm{MSR}_{psy}(b) = 0.17453\,\mathrm{ppRate}(b)^{2} + 0.08325\,\mathrm{ppRate}(b) \qquad (3)$$
The first- and second-order coefficients of the equation are preferred values obtained through extensive testing. The constraint on the masking-to-signal ratio over the psychoacoustic subbands is shown in Fig. 3; as the figure shows, the masking-to-signal ratio lies between the constraint upper bound and the constraint lower bound.
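The clamping in step 1b and the mapping in step 1c can be sketched together. The values of lower(b) are not listed in the text, so the sketch takes the lower bound as an argument; the function names are assumptions for illustration.

```python
def clamp_pp_rate(pp_rate, lower, upper=None):
    """Constrain ppRate(b) to [lower(b), upper(b)], with upper(b) = lower(b) + 0.7."""
    if upper is None:
        upper = lower + 0.7
    return max(lower, min(upper, pp_rate))


def masking_signal_ratio(pp_rate):
    """MSR_psy(b) = 0.17453 * ppRate(b)^2 + 0.08325 * ppRate(b), formula (3)."""
    return 0.17453 * pp_rate * pp_rate + 0.08325 * pp_rate


# A raw peak-to-valley value of 2.0 with lower(b) = 0.5 is clamped before mapping.
pp = clamp_pp_rate(2.0, 0.5)
print(pp, masking_signal_ratio(pp))
```

Because the mapping is increasing for non-negative ppRate, clamping ppRate also bounds the masking-to-signal ratio, which is what Fig. 3 is described as showing.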
1d) Compute the self-masking energy $E_{selfmask}(b)$ of the subband from the psychoacoustic subband energy and the masking-to-signal ratio
$$E_{selfmask}(b) = E_{psy}(b)\,\mathrm{MSR}_{psy}(b) \qquad (4)$$
1e) Compute the masking threshold $\mathrm{mask}_{psy}(b)$ using the normalized spreading matrix
$$\mathrm{mask}_{psy}(b) = E_{selfmask} * \mathrm{sprdngMN} \qquad (5)$$
where the normalized spreading matrix sprdngMN is determined by the following formula
In formula (6), bavl() is the mapping function from subband index to Bark frequency. The Bark scale is a frequency division that models human hearing: the range from 20 Hz to 20000 Hz is divided non-uniformly into 25 Bark. The mapping from frequency to Bark is usually expressed by a complicated nonlinear function, so in practice the limited set of Bark values needed is precomputed into a table and looked up to simplify the calculation. bavl() is such a simplified lookup table, and the normalization factor sprdngN(b) is computed in advance from the Bark frequency lookup table.
sprdngf() is the spreading function; its value is given by the following formula:
Every element of sprdngMN below -100 dB is set to 0, so that sprdngMN becomes a sparse spreading matrix whose nonzero entries are
sprdngMN has 672 nonzero entries in total, so the masking threshold can be computed with 672 multiply-add operations.
After the masking threshold is computed, it is constrained to lie at or above the threshold in quiet, as follows:
$$\mathrm{mask}_{psy}(b) = \max\bigl[\mathrm{mask}_{psy}(b),\ \mathrm{qthr}(b)\bigr] \qquad (9)$$
where qthr(b) is the threshold in quiet.
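The sparse evaluation of formula (5) followed by the constraint of formula (9) can be sketched as follows. The entries of sprdngMN are not reproduced in the text, so the matrix below is a toy example. The sketch assumes linear power values (so -100 dB corresponds to 1e-10) and assumes that row bb of the matrix spreads the energy of subband bb onto subband b; both are illustrative assumptions, as are the function names.

```python
def sparsify(matrix, floor_db=-100.0):
    """Keep only entries at or above floor_db, stored as (column, value) pairs."""
    floor = 10.0 ** (floor_db / 10.0)  # assumes power-domain decibels
    return [[(j, v) for j, v in enumerate(row) if v >= floor] for row in matrix]


def masking_threshold(e_selfmask, sparse_rows, qthr):
    """mask_psy = E_selfmask * sprdngMN, then max with the threshold in quiet."""
    mask = [0.0] * len(qthr)
    for bb, row in enumerate(sparse_rows):
        for b, weight in row:  # one multiply-add per stored nonzero entry
            mask[b] += e_selfmask[bb] * weight
    return [max(m, q) for m, q in zip(mask, qthr)]


toy = [[1.0, 0.5, 1e-12],
       [0.5, 1.0, 0.5],
       [1e-12, 0.5, 1.0]]
rows = sparsify(toy)
print(sum(len(r) for r in rows))  # number of multiply-adds needed
print(masking_threshold([2.0, 0.0, 0.0], rows, [0.1, 0.1, 0.1]))
```

Storing only the surviving entries makes the cost proportional to the number of nonzero entries rather than to the full matrix size, which is the source of the 672 multiply-add figure quoted above.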
1f) Compute the perceptual entropy $PE_{psy}(b)$ from the psychoacoustic subband energy and the masking threshold
where $bw_{psy}(b)$ is the bandwidth of the psychoacoustic subband.
1g) Map the perceptual entropy and the masking threshold to the coding subbands
Compute the perceptual entropy of each spectral line in the psychoacoustic subband
Map it to the coding subband
where psyLow(b) ≤ w ≤ psyHigh(b); psyHigh(b) and psyLow(b) are the upper and lower bounds of psychoacoustic subband b; sfblow(b) and sfbhigh(b) are the lower and upper bounds of coding subband b.
Compute the masking threshold of each spectral line in the psychoacoustic subband
Map it to the coding subband
$$\mathrm{mask}_{sfb}(k) = bw_{sfb}(k)\,\min\bigl(\mathrm{mask}_{apsy}(b)\bigr),\quad b1 \le b \le b2 \qquad (14)$$
where b1 satisfies
psyLow(b1)≤sfblow(k)≤psyHigh(b1) (15)
and b2 satisfies
psyLow(b2)≤sfbhigh(k)≤psyHigh(b2) (16)
psyHigh(b1) and psyLow(b1) are the upper and lower bounds of psychoacoustic subband b1; psyHigh(b2) and psyLow(b2) are the upper and lower bounds of psychoacoustic subband b2; sfblow(k) and sfbhigh(k) are the lower and upper bounds of coding subband k; and $bw_{sfb}(k)$ is the bandwidth of coding subband k.
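The mapping of formula (14) with the conditions (15) and (16) can be sketched as below. The sketch assumes that band bounds are inclusive spectral-line indices, so that the bandwidth of coding subband k is `sfb_high[k] - sfb_low[k] + 1`, and that `mask_apsy` already holds the per-line masking threshold of each psychoacoustic subband; the helper names are hypothetical.

```python
def find_band(psy_low, psy_high, line):
    """Index b of the psychoacoustic subband with psyLow(b) <= line <= psyHigh(b)."""
    for b, (lo, hi) in enumerate(zip(psy_low, psy_high)):
        if lo <= line <= hi:
            return b
    raise ValueError("spectral line outside all psychoacoustic subbands")


def map_mask_to_sfb(mask_apsy, psy_low, psy_high, sfb_low, sfb_high):
    """mask_sfb(k) = bw_sfb(k) * min(mask_apsy(b)) for b1 <= b <= b2."""
    result = []
    for lo, hi in zip(sfb_low, sfb_high):
        b1 = find_band(psy_low, psy_high, lo)   # condition (15)
        b2 = find_band(psy_low, psy_high, hi)   # condition (16)
        bandwidth = hi - lo + 1                 # assumed inclusive bounds
        result.append(bandwidth * min(mask_apsy[b1:b2 + 1]))
    return result


# Three psychoacoustic subbands of four lines, two coding subbands of six lines.
print(map_mask_to_sfb([1.0, 0.5, 2.0], [0, 4, 8], [3, 7, 11], [0, 6], [5, 11]))
```

Taking the minimum over the overlapping psychoacoustic subbands is the conservative choice: the coding subband inherits the strictest per-line threshold among them.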
2. Compare the current masking threshold with the long-term average masking threshold to obtain the time-frequency masking correction factor
Update the long-term average masking threshold of each coding subband from the coding subband masking threshold of the current frame
$$\mathrm{Argmask}_{sfb}(k) = \alpha\,\mathrm{Argmask}'_{sfb}(k) + (1-\alpha)\,\mathrm{mask}_{sfb}(k) \qquad (17)$$
$\mathrm{Argmask}'_{sfb}(k)$ is the long-term average masking threshold of coding subband k in the previous frame, $\mathrm{Argmask}_{sfb}(k)$ is the long-term average masking threshold of coding subband k in the current frame, and $\mathrm{mask}_{sfb}(k)$ is the masking threshold of coding subband k in the current frame. α is the attenuation factor; its value depends on the masking situation and is determined by formula (18).
Compare the masking energy of the coding subband with its long-term average masking energy to obtain the energy ratio
3. Judge the pre-echo by time-domain masking and revise the time-frequency masking correction factor
Time-domain masking can be used to judge whether the pre-echo has lost masking. If it has, the time-domain masking correction factor is revised, which further improves the accuracy of the subsequent processing steps that use the time-frequency masking correction factor. The specific steps are:
Divide one frame of the time-domain signal into 8 segments, compute the sum of absolute time-domain amplitudes of each segment, and place the results in the middle 8 elements of the segmented absolute amplitude vector abamp
abamp is a 10 × 1 vector; its first element abamp(1) inherits the mean-square amplitude sum of the 8th segment of the previous frame,
and its last element inherits the absolute amplitude of the last segment of the current frame, abamp(10) = abamp(9). The time-domain mask Tmask(m) is an 8 × 1 vector computed by the following formula
where the time-domain spreading attenuation coefficient $\mathrm{Rate}_{Tmask}$ is
$$\mathrm{Rate}_{Tmask} = [\,0.1 \;\; 0.9^{0} \;\; 0.9^{1} \;\; 0.9^{2} \;\; 0.9^{3} \;\; 0.9^{4} \;\; 0.9^{5} \;\; 0.9^{6} \;\; 0.9^{7} \;\; 0.9^{8}\,] \qquad (22)$$
and the time-domain spreading normalization coefficient Tnorm(m) is
When 1.3·Tmask(1) < Tmask(8) and Tmask(8) > 2000, the pre-echo is judged to have lost masking; the effect of this judgement is shown in Fig. 4. When the pre-echo is judged to have lost masking, a pre-echo correction is applied to the time-frequency masking correction factor of two consecutive frames:
$$\mathrm{brust}'_{sfb}(k) = \mathrm{brust}_{sfb}(k)^{\,\mathrm{chnBrust}} \qquad (24)$$
where $\mathrm{brust}'_{sfb}(k)$ is the time-frequency masking correction factor after pre-echo correction, chnBrust = 3 for the correction of the first frame, and chnBrust = 2 for the correction of the second frame.
4. Use the time-frequency masking correction factor to correct the perceptual entropy and obtain the predicted subband bit consumption ratio sfbBitRatio(k)
5. Perform inter-frame negative-feedback bit control according to the actual bit consumption, and compute the predicted coding subband bit consumption from the predicted subband bit consumption ratio. The specific steps are:
5a) Negative-feedback inter-frame bit correction
Let the total number of bits used so far be bitAll, the number of frames processed so far be K, the number of bits actually consumed by the previous frame be bitUsed, the average number of bits available per frame according to the average bit rate be bitAverage, and the number of bits available to the previous frame be bitAvailable(i-1). The current average number of bits per frame is bitAll/K, and its ratio to the average number of bits is
The number of bits available to the current frame, bitAvailable(i), is
$$\mathrm{bitAvailable}(i) = \mathrm{controlRatio}\cdot\bigl(\mathrm{bitAverage} + \mathrm{bitAvailable}(i-1) - \mathrm{bitUsed}\bigr) \qquad (26)$$
It is constrained to a certain range:
$$\alpha\cdot\mathrm{bitAverage} \le \mathrm{bitAvailable}(i) \le \beta\cdot\mathrm{bitAverage} \qquad (27)$$
where 0 < α < 1 and β > 1; α = 0.95 and β = 1.2 are generally suitable.
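The range constraint on the frame budget can be sketched as a simple clamp. Since 0 < α < 1 and β > 1, the sketch takes α·bitAverage as the lower limit and β·bitAverage as the upper limit; this reading of the bounds and the function name `clamp_frame_bits` are assumptions for illustration.

```python
def clamp_frame_bits(bit_available, bit_average, alpha=0.95, beta=1.2):
    """Keep bitAvailable(i) within [alpha * bitAverage, beta * bitAverage]."""
    lower = alpha * bit_average
    upper = beta * bit_average
    return max(lower, min(upper, bit_available))


# With bitAverage = 3000 the budget is kept between about 2850 and 3600 bits.
for raw in (2000, 3000, 5000):
    print(raw, clamp_frame_bits(raw, 3000))
```

The clamp keeps a single unusually cheap or expensive frame from pushing the next frame's budget far from the average.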
5b) Compute the predicted coding subband bit consumption sfbBits(k)
sfbBits(k)=bitAvailable(i)sfbBitRatio(k) (28)
6. The psychoacoustic model outputs the predicted subband bit consumption as the parameter for rate-distortion control in the encoding process
After the predicted coding subband bit consumption is obtained, the psychoacoustic model outputs it as the parameter for rate-distortion control; rate-distortion control then carries out entropy coding and bitstream assembly, completing the encoding process.
The thresholds, parameters and coefficients given in this embodiment are preferred values obtained experimentally. The present invention is not limited to these disclosed values; those skilled in the art will understand that, within the concept of the invention, the values may be adjusted according to the actual signal conditions to achieve better results.
The psychoacoustic model of the present invention is called the entropy-allocation psychoacoustic model (Entropy-allocation psychoacoustic model, EAPAM). It is compared with the traditional psychoacoustic model 2 (PAM II) provided by the MPEG-4 AAC standard and adopted in many audio codecs such as MP3. Megal AAC Encoder (Megal) is an AAC encoder that uses subband bit-ratio prediction to guide rate-distortion control; its structure is shown in Fig. 1. Algorithm complexity is assessed by comparing the Free Advanced Audio Coder (FAAC), which uses PAM II, with the Megal AAC Encoder, which uses EAPAM. The assessment encodes stereo audio sampled at 44100 Hz with 16-bit quantization at an average bit rate of 128 kbps, and the reference metric is millions of operations per second (MOPS).
Table 1: Computational cost of the psychoacoustic model types and of the encoding algorithm
*1 implemented with look-up tables
*2 uses the sparse spreading matrix
As Table 1 shows, the EAPAM algorithm requires 48.478 MOPS less than PAM II, and the share of this module in the total computation falls from 57% to 17%. Because the R-D algorithm is guided by the subband bit-ratio prediction, its cost falls from 35 MOPS to 12.8 MOPS. The overall cost is reduced by 69.6 MOPS, a reduction of 76.7%.
The sound quality of the encoders is assessed with EAQUAL 1.3, an objective evaluation program that implements the perceptual audio objective evaluation standard PEAQ. The names and descriptions of the quality indices that PEAQ provides are listed in Table 2.
Table 2: Evaluation indices output by the EAQUAL software and their meaning
Index name | Meaning |
ODG | Objective difference grade |
DIX | Distortion index |
BandwidthTest | Bandwidth of the reference signal |
NMR | Noise-to-mask ratio |
WinModDiff1 | Windowed modulation difference (average) |
ADB | Average distorted block |
EHS | Error of harmonic structure |
AvgModDiff1 | Average modulation difference 1 |
AvgModDiff2 | Average modulation difference 2 |
MFPD | Maximum filtered probability of detection |
RDF | Relative disturbed frames |
The overall index (ODG) and two important individual indices (BandwidthTest and NMR) are selected here as the main reference indices. The sound quality assessment compares four encoders side by side: Megal using the EAPAM model of the present invention, Megal using the traditional PAM II model, NCTU AAC Encoder (hereinafter NCTU), and FAAC. NCTU is the AAC encoder developed by the perceptual audio group of National Chiao Tung University, Taiwan. FAAC is the AAC encoder developed by Fraunhofer IIS of Germany; Fraunhofer IIS is a principal author of the MPEG standards, and its FAAC encoder is the verification encoder of the AAC standard. The test material is taken from the first and second discs of the audiophile recording British Music on Lyrita from Quad, supplied by the U.S. company Hui Wei; duplicate tracks were removed and 37 music excerpts were chosen. These excerpts cover the basic types of music; their titles and descriptions are listed in Table 3.
Table 3: Test songs
No. | Song | Type | Duration (s) |
1 | Snowflake flies upward | Electronic synthesizer, high pre-echo probability | 84.07 |
2 | Female a cappella | Female a cappella, English female voice | 59.30 |
3 | shaniaFuain | Pop, English female voice, high pre-echo probability | 88.68 |
4 | The ferry | Pop, Chinese female voice, high pre-echo probability | 72.77 |
5 | Da Ban city Miss | Men's chorus | 68.38 |
6 | Hotel california | Eagles | 119.98 |
7 | The drum poem | XRCD, Pre echoes probability height | 65.32 |
8 | The red light note | Beijing opera female voice | 53.43 |
9 | Zhang San's song | Popular, Chinese male voice | 57.77 |
10 | The bass king | Contrabass | 87.49 |
11 | Denon | Orchestral music | 61.21 |
12 | The POLO guitar | Instrumental music | 59.98 |
13 | Chinese lute is to the saxophone | Instrumental music | 84.08 |
14 | The water of the Yellow River has been done | Nationality, Chinese male voice | 69.85 |
15 | Mut's violin | Solo | 61.63 |
16 | OneIlove | Female voice is sung opera arias, English female voice | 74.51 |
17 | High mountain and great rivers | Zheng | 53.96 |
18 | Liang shanbo and Zhu yingtai | Violin association plays | 50.36 |
19 | Fever is classical | Symphony | 76.63 |
20 | Seven-stringed plucked instrument in some ways similar to the zither is to suona horn | National musical instruments | 77.25 |
21 | The hunting polka | Symphony | 59.98 |
22 | Wilfully like you | Popular, the Guangdong language male voice | 89.05 |
23 | 2001 A Space Odysseys | Symphony | 99.45 |
24 | The Song of Joy | Women's chorus, sound literary composition female voice | 128.64 |
25 | Bubukao | Zheng | 63.58 |
26 | The song in the four seasons | Viol | 68.71 |
27 | The toll bar prelude | Small size | 67.43 |
28 | See off | Women's chorus, Chinese young girl's sound | 106.23 |
29 | Knock toll bar | Knock pleasure, Pre echoes probability height | 68.66 |
30 | Carmina Burana | Chorus, English poem | 151.46 |
31 | The fiddler on the Roof | Violin, solo | 72.98 |
32 | Dear father | The soprano | 76.14 |
33 | Tonight is unmanned sleeping | Tenor, opera | 175.94 |
34 | Voice | National ecosystem, female voice | 61.99 |
35 | The F-16 fighter plane | Effect | 43.89 |
36 | Twister | Effect, natural phonation | 64.29 |
37 | The rocket lift-off | Effect | 39.99 |
The test results are shown in Table 4.
Table 4: Test results
No. | FAAC ODG | FAAC BandwidthTest | FAAC NMR | NCTU ODG | NCTU BandwidthTest | NCTU NMR | Megal (EAPAM) ODG | Megal (EAPAM) BandwidthTest | Megal (EAPAM) NMR | Megal (PAM II) ODG | Megal (PAM II) BandwidthTest | Megal (PAM II) NMR |
26 | -0.59 | 20173 | -8.4287 | -0.44 | 20409 | -9.5179 | -0.44 | 20627 | -9.8411 | -0.47 | 19948 | -9.5655 |
27 | -1.25 | 20167 | -7.1056 | -0.71 | 20352 | -8.5245 | -0.7 | 20593 | -9.1391 | -0.82 | 19832 | -8.7548 |
28 | -0.74 | 20097 | -8.3041 | -0.4 | 20930 | -10.178 | -0.4 | 20649 | -10.254 | -0.54 | 19895 | -9.7134 |
29 | -1.05 | 20258 | -8.1341 | -0.83 | 20064 | -9.1348 | -0.64 | 20432 | -9.2984 | -0.82 | 19910 | -8.3131 |
30 | -0.58 | 20108 | -8.1582 | -0.57 | 19449 | -8.9738 | -0.34 | 20654 | -9.8612 | -0.5 | 19965 | -7.4497 |
31 | -0.84 | 20152 | -9.102 | -0.65 | 20739 | -9.6848 | -0.69 | 20651 | -10.339 | -0.94 | 19940 | -8.3607 |
32 | -1.42 | 20044 | -7.2542 | -0.76 | 20001 | -9.1252 | -0.67 | 19858 | -9.6689 | -0.78 | 19773 | -9.4107 |
33 | -0.86 | 20050 | -8.4676 | -0.5 | 19327 | -10.514 | -0.53 | 19726 | -10.665 | -0.59 | 19832 | -9.9142 |
34 | -0.86 | 20128 | -7.6612 | -0.52 | 20373 | -9.6314 | -0.41 | 20602 | -9.9943 | -0.56 | 19897 | -9.0474 |
35 | -0.63 | 20158 | -8.6926 | -0.66 | 20317 | -9.1048 | -0.33 | 20541 | -10.686 | -0.45 | 19899 | -9.9646 |
36 | -0.61 | 20076 | -7.9394 | -1.1 | 19055 | -8.1089 | -0.35 | 20527 | -9.4003 | -0.5 | 19826 | -8.7527 |
37 | -0.39 | 20574 | -8.7607 | -0.67 | 20773 | -7.5065 | -0.35 | 20738 | -8.8435 | -0.45 | 19963 | -8.1802 |
Worst | -2.02 | 20044 | -6.6257 | -1.1 | 19055 | -7.2061 | -0.91 | 19726 | -7.7108 | -1.8 | 19770 | 6.8442 |
Best | -0.36 | 20574 | -10.022 | -0.37 | 21221 | -11.012 | -0.2 | 20738 | -16.799 | -0.45 | 20038 | -10.442 |
Average | -0.829 | 20200 | -8.263 | -0.666 | 20295 | -9.321 | -0.47919 | 20531.95 | -10.3993 | -0.845 | 19894 | -6.32534 |
As Fig. 5 and Fig. 6 show, the average ODG of NCTU is 0.163 better than that of FAAC, and the average ODG of Megal using the present invention is 0.187 better than that of NCTU, while Megal using the PAM II method is roughly on par with FAAC. The average NMR of NCTU is 1.06 dB lower than that of FAAC, and the average NMR of Megal using the present invention is 1.08 dB lower than that of NCTU, while the average NMR of Megal using the PAM II method is higher than that of FAAC. Similar conclusions can be drawn from the ODG distribution of the test clips in Fig. 7 and the NMR distribution in Fig. 8. Both the computational-load assessment and the objective sound quality assessment show that the present invention lets an AAC encoder achieve significantly better sound quality with significantly less computation.
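The differences quoted above can be re-derived from the "Average" row of Table 4. The short check below is only a sketch: it assumes that the four column groups of Table 4 are, in order, FAAC, NCTU, Megal with the invention (EAPAM) and Megal with PAM II, and the dictionary keys and the helper name `gain` are illustrative.

```python
# Average ODG and NMR values copied from the last row of Table 4.
# Assumption: column groups are FAAC, NCTU, Megal (EAPAM), Megal (PAM II).
avg_odg = {"FAAC": -0.829, "NCTU": -0.666, "EAPAM": -0.47919, "PAM II": -0.845}
avg_nmr = {"FAAC": -8.263, "NCTU": -9.321, "EAPAM": -10.3993, "PAM II": -6.32534}


def gain(table, encoder, baseline):
    """Return the average of `encoder` minus the average of `baseline`."""
    return table[encoder] - table[baseline]


# ODG is better when it is closer to 0, so a positive difference is an improvement.
odg_nctu_vs_faac = gain(avg_odg, "NCTU", "FAAC")
odg_eapam_vs_nctu = gain(avg_odg, "EAPAM", "NCTU")

# NMR is better when it is lower, so a negative difference is an improvement.
nmr_nctu_vs_faac = gain(avg_nmr, "NCTU", "FAAC")
nmr_eapam_vs_nctu = gain(avg_nmr, "EAPAM", "NCTU")

print(round(odg_nctu_vs_faac, 3), round(odg_eapam_vs_nctu, 3))
print(round(nmr_nctu_vs_faac, 2), round(nmr_eapam_vs_nctu, 2))
```

Under that column assignment the rounded differences agree with the 0.163, 0.187, 1.06 dB and 1.08 dB figures in the text.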
The present invention compares the parameters of the current frame with their long-term average over past frames and performs a time-domain pre-echo judgment, thereby realizing a psychoacoustic model that takes full account of both time-domain masking and frequency-domain masking (time-frequency masking). The model finally outputs an accurate prediction of the bit allocation ratio of each coding subband, which improves the coding quality of the quantization and coding algorithm, while the amount of computation is also significantly lower than that of traditional psychoacoustic model algorithms.
Claims (13)
1. An audio processing method based on a psychoacoustic model of an advanced audio coder, characterized in that it comprises the following processing steps:
A. from the spectral energy of the psychoacoustic subbands of the bitstream to be encoded, computing the perceptual entropy and the masking threshold of each coding subband by means of a masking spreading matrix;
B. from the perceptual entropy and the masking threshold of the coding subbands, applying a time-frequency masking correction and a pre-echo correction to compute the predicted bit consumption of each coding subband, the specific procedure being:
B1. comparing the current masking threshold of the coding subband with its long-term average masking threshold to obtain a time-frequency masking correction factor,
B2. judging by time-domain masking whether pre-echo loses masking and, if so, correcting the time-frequency masking correction factor for two consecutive frames according to $brust'_{sfb}(k) = brust_{sfb}(k)^{chnBrust}$, where $brust'_{sfb}(k)$ is the time-frequency masking correction factor after the pre-echo correction, $brust_{sfb}(k)$ is the original time-frequency masking correction factor, k denotes the k-th coding subband, chnBrust = 3 for the correction of the first frame, and chnBrust = 2 for the correction of the second frame,
B3. correcting the perceptual entropy with the time-frequency masking correction factor and computing the predicted bit consumption of each coding subband, the specific steps being:
B31. correcting the perceptual entropy of the coding subband with the time-frequency masking correction factor to obtain the predicted bit consumption ratio of the coding subband,
B32. performing inter-frame negative-feedback bit control according to the actual bit consumption to obtain the number of bits available to the current frame,
B33. computing the predicted bit consumption of the coding subband from its predicted bit consumption ratio and the number of bits available to the current frame,
wherein the predicted bit consumption ratio of the coding subband in step B31 is obtained by the following formula:
C. the psychoacoustic model outputs the predicted bit consumption of each coding subband as a parameter for rate-distortion control, so that the encoding process can be carried out.
2. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 1, characterized in that the long-term average masking threshold in step B1 is obtained by the following formula:
$Argmask_{sfb}(k) = \alpha \cdot Argmask'_{sfb}(k) + (1 - \alpha) \cdot mask_{sfb}(k)$
where $Argmask'_{sfb}(k)$ is the long-term average masking threshold of the coding subband in the previous frame, $Argmask_{sfb}(k)$ is the long-term average masking threshold of the coding subband in the current frame, $mask_{sfb}(k)$ is the masking threshold of the coding subband in the current frame, $\alpha$ is the decay factor, and k denotes the k-th coding subband;
the time-frequency masking correction factor is obtained as follows:
if $4 \ge chk \ge 0.5$, then $brust_{sfb}(k) = 0.95$ and $\alpha = 0.4$;
if $chk < 0.5$, then $brust_{sfb}(k) = 0.90$ and $\alpha = 0.4$;
where chk is an energy ratio and $brust_{sfb}(k)$ is the time-frequency masking correction factor of the current coding subband.
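The recursion and the threshold rule of claim 2 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, chk is taken as an input because its defining formula is not reproduced in the text, and the case chk > 4 is not specified in the claim, so the sketch returns `None` there instead of guessing a value.

```python
ALPHA = 0.4  # decay factor given in claim 2 for both listed cases


def update_long_term_mask(prev_avg_mask, current_mask, alpha=ALPHA):
    """One step of the long-term average masking threshold of a coding subband.

    prev_avg_mask corresponds to Argmask'_sfb(k) (previous frame),
    current_mask to mask_sfb(k) (current frame).
    """
    return alpha * prev_avg_mask + (1.0 - alpha) * current_mask


def time_frequency_factor(chk):
    """Time-frequency masking correction factor brust_sfb(k) for energy ratio chk.

    Only the two ranges listed in claim 2 are covered; any other range
    returns None because the text does not state a value for it.
    """
    if 0.5 <= chk <= 4.0:
        return 0.95
    if chk < 0.5:
        return 0.90
    return None


# Example: the running average moves 60% of the way toward the new threshold.
avg = update_long_term_mask(prev_avg_mask=100.0, current_mask=200.0)
factor_steady = time_frequency_factor(1.0)
factor_drop = time_frequency_factor(0.3)
```

With α = 0.4 the current frame carries a weight of 0.6, so the long-term average tracks changes in the masking threshold fairly quickly.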
3. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 1, characterized in that judging by time-domain masking whether pre-echo loses masking in step B2 comprises the following steps:
B21. dividing one frame of the time-domain signal into 8 segments, computing the sum of the absolute time-domain amplitudes of each segment, and placing these sums in the middle 8 elements of the segment absolute amplitude vector abamp:
where abamp is a 10 × 1 vector whose first element abamp(1) inherits the amplitude sum of the 8th segment of the previous frame
and whose last element inherits the absolute amplitude of the last segment of the current frame, abamp(10) = abamp(9); i denotes the current frame and i-1 the previous frame; $x_i(n)$ is the n-th time-domain sample of the current frame;
B22. computing the time-domain mask Tmask(m) from the segment absolute amplitudes obtained in step B21 by the following formula:
where the time-domain spreading attenuation coefficient $Rate_{Tmask}$ is
$Rate_{Tmask} = [0.1,\ 0.9^0,\ 0.9^1,\ 0.9^2,\ 0.9^3,\ 0.9^4,\ 0.9^5,\ 0.9^6,\ 0.9^7,\ 0.9^8]$
and the time-domain spreading normalization coefficient Tnorm(m) is
B23. when $1.3 \cdot Tmask(1) < Tmask(8)$ and $Tmask(8) > 2000$, judging that pre-echo loses masking.
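Steps B21 and B23 can be illustrated as follows. This is a sketch under assumptions: the formula for Tmask(m) in step B22 is not reproduced in the text, so the time-domain mask is passed in as an input instead of being computed; the function names are hypothetical; the frame length is assumed to be divisible by 8; and the 1-based indices of the claim are shifted to 0-based Python indices.

```python
def segment_abs_amplitude(frame, prev_last_segment_sum):
    """Build the 10-element vector abamp of step B21.

    frame: time-domain samples of the current frame (length divisible by 8).
    prev_last_segment_sum: amplitude sum of segment 8 of the previous frame.
    """
    n_segments = 8
    seg_len = len(frame) // n_segments
    sums = [
        sum(abs(x) for x in frame[s * seg_len:(s + 1) * seg_len])
        for s in range(n_segments)
    ]
    # abamp(1) comes from the previous frame, abamp(2..9) from the current
    # frame, and abamp(10) repeats abamp(9).
    return [prev_last_segment_sum] + sums + [sums[-1]]


def pre_echo_loses_masking(tmask):
    """Decision rule of step B23; tmask[0] is Tmask(1), tmask[7] is Tmask(8)."""
    return 1.3 * tmask[0] < tmask[7] and tmask[7] > 2000


# Example: a 16-sample frame with two samples per segment.
abamp = segment_abs_amplitude([1, -1] * 8, prev_last_segment_sum=5)
attack = pre_echo_loses_masking([100] * 7 + [3000])
steady = pre_echo_loses_masking([100] * 7 + [1500])
```

The rule flags a frame only when the last segment's mask is both clearly larger than the first segment's mask and above the absolute level of 2000, which matches a sharp attack near the end of the frame.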
4. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 1, characterized in that the number of bits available to the current frame in step B32 is obtained by the following formula: $bitAvailable(i) = controlRatio \cdot (bitAverage + bitAvailable(i-1) - bitUsed)$, where controlRatio is the inter-frame correction factor, bitAverage is the average number of bits available per frame according to the average bit rate, bitAvailable(i-1) is the number of bits available to the previous frame, and bitUsed is the number of bits actually consumed by the previous frame; the inter-frame correction factor is determined as follows:
if bitRatio > 1.06,
if 1.06 ≥ bitRatio > 1.05, controlRatio = 0.9,
if 1.05 ≥ bitRatio > 1.02, controlRatio = 0.95,
if 1.02 ≥ bitRatio ≥ 0.98, controlRatio = 1,
5. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 1, characterized in that the predicted bit consumption of the coding subband in step B33 is obtained by the following formula:
$sfbBits(k) = bitAvailable(i) \cdot sfbBitRatio(k)$, where sfbBits(k) is the predicted bit consumption of the coding subband, bitAvailable(i) is the number of bits available to the current frame, sfbBitRatio(k) is the predicted bit consumption ratio of the coding subband, and k denotes the k-th coding subband.
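The bit control of claims 4 and 5 can be sketched as below. The sketch makes several assumptions explicit: the function names are hypothetical, bitRatio is taken as an input because the text does not define it here, and only the three bitRatio ranges whose controlRatio is stated are implemented; for any other range the function raises an error instead of inventing a value.

```python
def control_ratio(bit_ratio):
    """Inter-frame correction factor for the ranges stated in claim 4."""
    if 1.05 < bit_ratio <= 1.06:
        return 0.9
    if 1.02 < bit_ratio <= 1.05:
        return 0.95
    if 0.98 <= bit_ratio <= 1.02:
        return 1.0
    raise ValueError("controlRatio is not stated in the text for this bitRatio")


def bits_available(ctrl_ratio, bit_average, prev_available, bit_used):
    """bitAvailable(i) = controlRatio * (bitAverage + bitAvailable(i-1) - bitUsed)."""
    return ctrl_ratio * (bit_average + prev_available - bit_used)


def sfb_bits(bit_available, sfb_bit_ratio):
    """sfbBits(k) = bitAvailable(i) * sfbBitRatio(k) for every coding subband k."""
    return [bit_available * ratio for ratio in sfb_bit_ratio]


# Example: the previous frame had 200 bits available and used 900; 1000 bits
# arrive per frame on average, and the encoder is within 2% of its target.
available = bits_available(control_ratio(1.0), 1000, 200, 900)
per_band = sfb_bits(available, [0.5, 0.25, 0.25])
```

Because unused bits of the previous frame are carried over and then scaled by controlRatio, the scheme acts as a negative-feedback loop on the bit rate.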
6. The audio processing method based on a psychoacoustic model of an advanced audio coder according to any one of claims 1 to 5, characterized in that step A comprises the following steps:
A1. obtaining the psychoacoustic subband energy by summing the spectral energy of the psychoacoustic subband of the bitstream to be encoded;
A2. computing the peak-to-valley value of the psychoacoustic subband energy from the psychoacoustic subband energy;
A3. mapping the peak-to-valley value of the psychoacoustic subband energy to a masking-to-signal ratio by a second-order equation;
A4. computing the self-masking energy of the psychoacoustic subband from the masking-to-signal ratio and the psychoacoustic subband energy;
A5. obtaining the masking threshold of the psychoacoustic subband from the self-masking energy by means of a spreading matrix;
A6. computing the perceptual entropy of the psychoacoustic subband from the psychoacoustic subband energy and the masking threshold;
A7. mapping the perceptual entropy and the masking threshold of the psychoacoustic subbands to the perceptual entropy and the masking threshold of the coding subbands, respectively.
7. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the spreading matrix in step A5 is a sparse spreading matrix; the spreading matrix is made sparse by setting to 0 those elements of the normalized spreading matrix that are smaller than a predetermined decibel threshold, and the normalization factor of the normalized spreading matrix is obtained by the following formula:
where spr is the value of the spreading function, b denotes the b-th psychoacoustic subband, bb denotes the bb-th psychoacoustic subband, and Δfc denotes bval(b) - bval(bb).
8. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the peak-to-valley value of the psychoacoustic subband energy in step A2 is obtained by the following formula:
9. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the second-order equation in step A3 is:
$MSR_{psy}(b) = 0.17453 \cdot ppRate(b)^2 + 0.08325 \cdot ppRate(b)$, where $MSR_{psy}(b)$ is the masking-to-signal ratio, ppRate(b) is the peak-to-valley value of the psychoacoustic subband energy, and b denotes the b-th psychoacoustic subband.
10. The processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the self-masking energy in step A4 is obtained by the following formula:
$E_{selfmask}(b) = E_{psy}(b) \cdot MSR_{psy}(b)$, where $E_{selfmask}(b)$ is the self-masking energy, $E_{psy}(b)$ is the psychoacoustic subband energy, $MSR_{psy}(b)$ is the masking-to-signal ratio, and b denotes the b-th psychoacoustic subband.
11. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the masking threshold in step A5 is obtained by the following formula:
$mask_{psy}(b) = E_{selfmask} * sprdngMN$, where $mask_{psy}(b)$ is the masking threshold of the psychoacoustic subband, sprdngMN is the spreading matrix, $E_{selfmask}$ is the self-masking energy, and b denotes the b-th psychoacoustic subband.
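Steps A3 to A5 (claims 9 to 11) chain together naturally, as the sketch below shows. Two points are assumptions, not statements of the text: the `*` in claim 11 is read as a row-vector times matrix product, and `spread[bb][b]` is taken to be the contribution of masker subband bb to subband b. The function names are hypothetical.

```python
def masking_signal_ratio(pp_rate):
    """MSR_psy(b) = 0.17453 * ppRate(b)^2 + 0.08325 * ppRate(b) (claim 9)."""
    return 0.17453 * pp_rate ** 2 + 0.08325 * pp_rate


def self_masking_energy(e_psy, pp_rate):
    """E_selfmask(b) = E_psy(b) * MSR_psy(b) for every psychoacoustic subband."""
    return [e * masking_signal_ratio(p) for e, p in zip(e_psy, pp_rate)]


def masking_threshold(e_selfmask, spread):
    """mask_psy(b) as a vector-matrix product (assumed reading of claim 11)."""
    n = len(e_selfmask)
    return [
        sum(e_selfmask[bb] * spread[bb][b] for bb in range(n))
        for b in range(n)
    ]


# Example with two subbands and a small illustrative spreading matrix.
e_self = self_masking_energy([100.0, 200.0], [1.0, 0.0])
mask = masking_threshold(e_self, [[1.0, 0.5], [0.5, 1.0]])
```

A subband with a peak-to-valley value of 0 has no self-masking energy of its own in this sketch, so its threshold comes entirely from energy spread in from neighbouring subbands.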
12. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the perceptual entropy of the psychoacoustic subband in step A6 is obtained by the following formula:
where $PE_{psy}(b)$ is the perceptual entropy of the psychoacoustic subband, $bw_{psy}(b)$ is the bandwidth of the psychoacoustic subband, $E_{psy}(b)$ is the psychoacoustic subband energy, $mask_{psy}(b)$ is the masking threshold of the psychoacoustic subband, and b denotes the b-th psychoacoustic subband.
13. The audio processing method based on a psychoacoustic model of an advanced audio coder according to claim 6, characterized in that the perceptual entropy of the psychoacoustic subbands in step A7 is mapped to the perceptual entropy of the coding subbands by the following formula:
and the masking threshold of the psychoacoustic subbands is mapped to the masking threshold of the coding subbands by the following formula:
$mask_{sfb}(k) = bw_{sfb}(k) \cdot \min(mask_{Apsy}(b)),\ b1 \le b \le b2$, where $mask_{sfb}(k)$ is the masking threshold of the coding subband, b1 satisfies $psyLow(b1) \le sfbLow(k) \le psyHigh(b1)$, and b2 satisfies $psyLow(b2) \le sfbHigh(k) \le psyHigh(b2)$;
$mask_{psy}(b)$ is the masking threshold of the psychoacoustic subband; psyHigh(b1) and psyLow(b1) are the upper and lower bounds of psychoacoustic subband b1, respectively; psyHigh(b2) and psyLow(b2) are the upper and lower bounds of psychoacoustic subband b2, respectively; and $bw_{sfb}(k)$ is the bandwidth of the coding subband.
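The masking-threshold mapping of claim 13 can be sketched as a search for the psychoacoustic subbands that contain the two edges of a coding subband, followed by a minimum over that range. The sketch is illustrative only: the function names are hypothetical, subband bounds are assumed to be inclusive spectral-line indices, and the per-subband values passed in stand for the quantity inside the min(...) of the claim.

```python
def find_psy_band(psy_low, psy_high, line):
    """Index b of the psychoacoustic subband with psyLow(b) <= line <= psyHigh(b)."""
    for b, (low, high) in enumerate(zip(psy_low, psy_high)):
        if low <= line <= high:
            return b
    raise ValueError("spectral line is outside every psychoacoustic subband")


def map_mask_to_sfb(mask_values, psy_low, psy_high, sfb_low, sfb_high, bw_sfb):
    """mask_sfb(k) = bw_sfb(k) * min over b1 <= b <= b2 of the per-subband values."""
    b1 = find_psy_band(psy_low, psy_high, sfb_low)
    b2 = find_psy_band(psy_low, psy_high, sfb_high)
    return bw_sfb * min(mask_values[b1:b2 + 1])


# Example: three psychoacoustic subbands covering lines 0-3, 4-7 and 8-11;
# the coding subband spans lines 2-6 (bandwidth 5) and overlaps the first two.
mask_sfb = map_mask_to_sfb(
    mask_values=[5.0, 2.0, 9.0],
    psy_low=[0, 4, 8],
    psy_high=[3, 7, 11],
    sfb_low=2,
    sfb_high=6,
    bw_sfb=5,
)
```

Taking the minimum over the overlapped psychoacoustic subbands is the conservative choice: the coding subband inherits the strictest threshold of any subband it touches.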
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2007101276606A CN101308659B (en) | 2007-05-16 | 2007-06-20 | Psychoacoustics model processing method based on advanced audio decoder |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200710074481 | 2007-05-16 | ||
CN200710074481.0 | 2007-05-16 | ||
CN2007101276606A CN101308659B (en) | 2007-05-16 | 2007-06-20 | Psychoacoustics model processing method based on advanced audio decoder |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101308659A CN101308659A (en) | 2008-11-19 |
CN101308659B true CN101308659B (en) | 2011-11-30 |
Family
ID=40125072
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2007101276606A Active CN101308659B (en) | 2007-05-16 | 2007-06-20 | Psychoacoustics model processing method based on advanced audio decoder |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101308659B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101826327B (en) * | 2009-03-03 | 2013-06-05 | 中兴通讯股份有限公司 | Method and system for judging transient state based on time domain masking |
CN102804263A (en) * | 2009-06-23 | 2012-11-28 | 日本电信电话株式会社 | Coding method, decoding method, and device and program using the methods |
CN102714040A (en) * | 2010-01-14 | 2012-10-03 | 松下电器产业株式会社 | Encoding device, decoding device, spectrum fluctuation calculation method, and spectrum amplitude adjustment method |
CN106373583B (en) * | 2016-09-28 | 2019-05-21 | 北京大学 | Multi-audio-frequency object coding and decoding method based on ideal soft-threshold mask IRM |
CN112530444B (en) * | 2019-09-18 | 2023-10-03 | 华为技术有限公司 | Audio coding method and device |
CN111243568B (en) * | 2020-01-15 | 2022-04-26 | 西南交通大学 | Convex constraint self-adaptive echo cancellation method |
CN112599140A (en) * | 2020-12-23 | 2021-04-02 | 北京百瑞互联技术有限公司 | Method, device and storage medium for optimizing speech coding rate and operand |
CN112599139B (en) * | 2020-12-24 | 2023-11-24 | 维沃移动通信有限公司 | Encoding method, encoding device, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1735925A (en) * | 2003-01-02 | 2006-02-15 | 杜比实验室特许公司 | Reducing scale factor transmission cost for MPEG-2 AAC using a lattice |
US7020603B2 (en) * | 2002-02-07 | 2006-03-28 | Intel Corporation | Audio coding and transcoding using perceptual distortion templates |
CN1841938A (en) * | 2005-03-31 | 2006-10-04 | Lg电子株式会社 | Method and apparatus for coding audio signal |
Non-Patent Citations (3)
Title |
---|
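JP Laid-Open No. 2001-154695 A, 2001.06.08 |
Hu Duochuan (胡多传). Research and Implementation of MPEG-2 AAC Audio Encoding and Decoding. China Excellent Doctoral and Master's Dissertations Full-text Database (Master's). 2005, (No. 5), pp. 11-20. * |
Huang Chunming (黄春明) et al. The Psychoacoustic Model and Its Application in MPEG-2 AAC. Audio Engineering (电声技术). 2004, (No. 11), pp. 44-47. * |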
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |