
CN101388213B - Pre-echo control method

Info

Publication number: CN101388213B
Application number: CN2008100537463A
Authority: CN
Inventors: 张涛; 王伟; 杨东明; 李海
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2008-07-03
Filing date: 2008-07-03
Publication date: 2012-02-22
Anticipated expiration: 2028-07-03
Also published as: CN101388213A

Abstract

The invention discloses a pre-echo control method that applies time-domain planarization (flattening) to the original signal by detecting the positions and intensities of the transients in it. The method proceeds as follows: the audio frame data to be coded is divided into a plurality of data blocks; the transient intensity of each data block is calculated; each data block whose transient intensity exceeds a threshold is marked as a transient data block; the transient start block, i.e. the transient start position, is marked; and redundant transient positions are eliminated according to the masking effect of the human ear. A time-domain planarization curve is then drawn according to the transient intensity; if there are several transient positions, their planarization curves are combined; the curve is aligned with the transient position; and the frame signal is windowed with the curve to flatten it. Compared with traditional schemes, the method suppresses noise directly at the transient position and thereby controls the pre-echo phenomenon effectively.

Description

A pre-echo control method

Technical field

The invention belongs to the technical field of digital audio processing, and specifically relates to a new pre-echo control method and device.

Background technology

In audio coding, pre-echo distortion has always been a rather stubborn problem; it becomes more noticeable and more severe at low bit rates, that is, at high compression ratios. The main cause of pre-echo distortion is insufficient temporal resolution, which lets quantization noise spread in the time domain. In particular, when a transient signal is transformed (or filtered) block by block and quantized in the frequency domain, the quantization noise spreads over the whole transform block (or filter-bank span); wherever the signal cannot mask it, pre-echo appears. Fig. 1 shows the waveform of a signal distorted by pre-echo: clear quantization noise appears before the onset of the burst, and the human ear is very sensitive to this kind of distortion.

Besides the strength of the quantization noise, the temporal masking of the human ear also largely determines how much pre-echo distortion affects sound quality. Temporal masking takes two forms, forward masking and backward masking. As the terms are used here, forward masking (pre-masking) acts on sound that precedes the masking signal: it can last up to 20 ms, but is usually considered effective only within 0.5 to 2 ms. Backward masking (post-masking) acts on sound that follows the masking signal: it lasts longer, up to about 200 ms, and is usually considered effective within 10 to 50 ms. Because backward masking lasts longer, quantization noise after the transient is generally masked well and does not affect subjective sound quality, so perceptual audio coders rarely consider this case. Forward masking is much weaker, so the time-domain shape of the quantization noise must be designed carefully to keep it below the pre-masking level of the human ear; the pre-echo distortion then remains inaudible, which ensures perceptually transparent coding.

To suppress quantization noise, nearly all modern mainstream audio coding standards handle transient signals by switching between long and short windows: a shorter transform block length is used for transient signals, which raises the temporal resolution and limits the time-domain spread of frequency-domain quantization noise. In addition, each standard adopts further measures against pre-echo: MPEG-1 Layer 3 and MPEG-2 use bit-rate control; MPEG-2 AAC uses temporal noise shaping and gain control; AC-3 uses exponent-mantissa coding and channel coupling.

The AVS audio standard does not use long/short window switching but applies a long-window transform throughout. The difference is that, when the signal is transient, a frequency-domain multiresolution analysis is additionally applied to the coefficients obtained from the time-frequency transform; that is, a hybrid filter bank is used to suppress pre-echo and improve coding efficiency. However, this method places high demands on computational complexity and precision, which seriously slows the codec and in particular impairs the real-time performance of the decoder.

Current mainstream encoders process transient signals block by block. Although these methods can detect that a signal is transient, none of them can accurately locate the position and intensity of the transient, nor do they process the signal according to that position and intensity. Obvious pre-echo can therefore still occur in some cases.

Summary of the invention

In view of the above technical problem, the present invention proposes a new pre-echo control method. It builds on the basic cause of pre-echo (a time-domain transient signal needs high temporal resolution) and on the fact that the human ear is insensitive to the fine detail of high-frequency signals (so part of the high-frequency detail can be discarded without being perceived). By locating the transient position of the time-domain transient signal and flattening the signal in the time domain according to the intensity of that transient, the method suppresses noise directly at the transient position and thereby controls the pre-echo phenomenon effectively.

The method divides the audio frame data to be encoded into a plurality of data blocks. In the AVS audio coding standard, one frame is 1024 samples long, which lasts 23.22 ms at a 44.1 kHz sampling rate; with 32 samples per data block, each data block lasts about 0.7 ms;

Calculate the transient intensity of each data block;

Mark each data block whose transient intensity exceeds the threshold as a transient data block;

Mark the transient start block, i.e. the transient start position, and reject redundant transient positions according to the masking effect of the human ear;

Draw the time-domain planarization curve according to the transient intensity, which comprises the following steps:

According to the transient intensity C(k) obtained above, determine the minimum value C_{\min} to which the planarization curve decays, where T_c is the transient intensity threshold used for transient block detection;

When T_c < C(k) < 4, C_{\min} = 1/2

When 4 ≤ C(k) < 8, C_{\min} = 1/4

When 8 ≤ C(k) < 16, C_{\min} = 1/8

When 16 ≤ C(k) < 32, C_{\min} = 1/16

When 32 ≤ C(k), C_{\min} = 1/32

If the k-th data block is the first transient block in the frame, the planarization curve y(x) is calculated as follows:

y(x) = 1, x = 0, 1, ..., 32(k-1) - 1

y(x) = 1 - \frac{1 - C_{\min}}{32} (x - 32(k-1)), x = 32(k-1), 32(k-1) + 1, ..., 32k - 1

y(x) = C_{\min}, x = 32k, 32k + 1, ..., 1024 - 1

The start of the planarization curve y(x) is aligned with the start position of the transient block; that is, the planarization curve begins at the start of the transient block;

If there are several transient positions, the time-domain planarization curves are combined into one;

The planarization curve is aligned with the transient position, and the frame signal is windowed with it to flatten the signal.

The step of marking each data block whose transient intensity exceeds the threshold as a transient data block further comprises: marking the frame as a transient frame.

Compared with existing mainstream encoders, the present invention can accurately locate the position and intensity of a transient and process the signal according to that position and intensity. The new pre-echo control method builds on the basic cause of pre-echo (a time-domain transient signal needs high temporal resolution) and on the fact that the human ear is insensitive to the fine detail of high-frequency signals (so part of the high-frequency detail can be discarded without being perceived). By locating the transient position of the time-domain transient signal and flattening the signal in the time domain according to the intensity of that transient, the method suppresses noise directly at the transient position and thereby controls the pre-echo phenomenon effectively.

Description of drawings

Fig. 1 shows the pre-echo phenomenon;

Fig. 2 is a flow chart of the present invention;

Fig. 3 is the waveform of the original castanets sequence;

Fig. 4 is the waveform of the castanets sequence after processing with the multiresolution analysis method on the AVS audio codec platform;

Fig. 5 is the waveform of the castanets sequence after processing with the new method on the AVS audio codec platform.

Embodiment

The technical solution of the present invention is further described below with reference to the accompanying drawings and specific embodiments:

Embodiment 1: Using the AVS audio codec platform, the pre-echo control achieved by the present invention was compared experimentally with that of the hybrid filter bank method adopted by the AVS audio standard, on a typical castanets sequence (mono, 44.1 kHz sampling frequency, 16-bit sample precision, coding bit rate 32 kbps/ch).

The embodiment is illustrated as follows. In the AVS audio encoder, for example, one frame is 1024 samples long (PCM[i], i = 0, 1, ..., 1023), which lasts about 23.22 ms at a 44.1 kHz sampling rate. With 32 samples per data block, each data block lasts about 0.7 ms and each frame contains 32 data blocks. Let the total power of the samples in each data block be P(k), k = 1, 2, ..., 32; then:

Definition 1: the ratio of the power of the samples in each data block to that of the preceding data block,

C(k) = \frac{P(k)}{P(k-1)}, P(k-1) \ne 0, k = 2, 3, ..., 32

is called the transient intensity of that data block.

Definition 2: a data block whose transient intensity C(k) exceeds a certain threshold T_c is called a transient block.

Step 1: data block division. Following the definition above, the signal in one frame is divided so that every 32 consecutive samples form one data block.
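As an illustration only, the block division and the durations quoted above can be sketched in Python. The names `split_into_blocks` and `duration_ms` and the all-zero placeholder frame are assumptions introduced for this sketch; they do not come from the patent text.

```python
SAMPLE_RATE_HZ = 44_100   # sampling rate used in the embodiment
FRAME_LEN = 1024          # samples per AVS frame
BLOCK_LEN = 32            # samples per data block


def split_into_blocks(frame, block_len=BLOCK_LEN):
    """Split one frame into consecutive, non-overlapping data blocks."""
    if len(frame) % block_len != 0:
        raise ValueError("frame length must be a multiple of the block length")
    return [frame[i:i + block_len] for i in range(0, len(frame), block_len)]


def duration_ms(num_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of num_samples samples in milliseconds."""
    return 1000.0 * num_samples / sample_rate_hz


frame = [0.0] * FRAME_LEN                    # placeholder PCM frame
blocks = split_into_blocks(frame)
print(len(blocks))                           # 32 blocks per frame
print(round(duration_ms(FRAME_LEN), 2))      # 23.22 (ms per frame)
print(round(duration_ms(BLOCK_LEN), 2))      # 0.73 (ms per block, "about 0.7 ms")
```

The list index of a block is zero-based, so the block called k in the text is `blocks[k - 1]` in this sketch.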

Step 2: calculate and check the transient intensity of each data block in the frame.

The transient intensity C(k) of each data block is calculated in turn and compared with the threshold T_c. When C(k) is greater than T_c (here T_c = 2), the data block is considered a transient block.
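The detection in this step might be sketched as follows. Taking the total power P(k) as the sum of squared samples is an assumption (the text only speaks of the total power of the samples in a block), and the function names and the toy signal are hypothetical.

```python
T_C = 2.0  # detection threshold used in the embodiment


def block_power(block):
    """Total power of a block, assumed here to be the sum of squared samples."""
    return sum(s * s for s in block)


def transient_intensity(blocks):
    """Return {k: C(k)} for k = 2..len(blocks), with 1-based block indices.

    C(k) = P(k) / P(k-1). Blocks whose predecessor has zero power are skipped,
    which mirrors the P(k-1) != 0 condition of Definition 1.
    """
    powers = [block_power(b) for b in blocks]
    intensity = {}
    for k in range(2, len(powers) + 1):
        prev = powers[k - 2]              # P(k-1) in 1-based numbering
        if prev != 0:
            intensity[k] = powers[k - 1] / prev
    return intensity


def transient_blocks(intensity, threshold=T_C):
    """1-based indices of the blocks whose intensity exceeds the threshold."""
    return [k for k, c in sorted(intensity.items()) if c > threshold]


# Toy frame: nine quiet blocks (amplitude 0.1), then a burst (amplitude 1.0)
# that starts at block 10 and lasts to the end of the frame.
blocks = [[0.1] * 32 for _ in range(9)] + [[1.0] * 32 for _ in range(23)]
c = transient_intensity(blocks)
print(transient_blocks(c))   # [10]
```

In the toy frame only block 10 is flagged: the power jumps by a factor of about 100 there, while every later block has the same power as its predecessor and so has C(k) = 1.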

Step 3: mark all transient blocks in the frame as transient positions, and mark the frame as a transient frame.

Step 4: reject redundant transient positions.

Forward masking generally lasts up to 20 ms but is usually considered effective only within 0.5 to 2 ms. A transient that occurs in one of the three data blocks k = 2, 3, 4 therefore need not be considered, because only about 2 ms of signal precedes the transient position and it is masked by the forward masking effect. Backward masking lasts longer, about 200 ms, with an effective time generally taken as about 20 ms, so once the first transient position has been detected, the remaining transient positions can be rejected.
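One possible reading of this rejection rule is sketched below: transients in blocks 2 to 4 are ignored, and of the remaining ones only the earliest is kept. The function name, the `min_k` parameter and the choice to keep a later transient when an early one was ignored are assumptions of this sketch; the text does not spell out that case.

```python
def select_first_transient(transient_ks, min_k=5):
    """Return the single retained transient block index (1-based), or None.

    transient_ks: indices of all blocks flagged as transient in one frame.
    min_k: first block index that is not covered by forward masking; 5 follows
           from the statement that blocks k = 2, 3, 4 need not be considered.
    """
    for k in sorted(transient_ks):
        if k >= min_k:
            return k        # later transients are treated as masked and dropped
    return None             # every transient falls inside the masked region


print(select_first_transient([3, 10, 20]))   # 10
print(select_first_transient([2, 4]))        # None
```

With 0.7 ms blocks, the retained transient is followed by at most about 20 ms of signal within the same 23.22 ms frame, which is consistent with the effective backward masking time quoted in the text.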

Step 5: time-domain planarization.

According to the transient intensity C(k) obtained above, determine the minimum value C_{\min} to which the planarization curve decays. The specific method is as follows, where T_c is the transient intensity threshold used for transient block detection:

When T_c < C(k) < 4, C_{\min} = 1/2

When 4 ≤ C(k) < 8, C_{\min} = 1/4

When 8 ≤ C(k) < 16, C_{\min} = 1/8

When 16 ≤ C(k) < 32, C_{\min} = 1/16

When 32 ≤ C(k), C_{\min} = 1/32
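The table above translates directly into a small lookup. The function name is hypothetical, and returning 1.0 for a block that is not a transient block is an assumption, since the table only covers C(k) > T_c.

```python
def c_min_for_intensity(c_k, t_c=2.0):
    """Map the transient intensity C(k) to the decay floor C_min of the curve."""
    if c_k <= t_c:
        # Not a transient block. Returning 1.0 (no attenuation) is an
        # assumption; the text only defines C_min for C(k) > T_c.
        return 1.0
    if c_k < 4:
        return 1 / 2
    if c_k < 8:
        return 1 / 4
    if c_k < 16:
        return 1 / 8
    if c_k < 32:
        return 1 / 16
    return 1 / 32


for c_k in (3, 4, 10, 20, 100):
    print(c_k, c_min_for_intensity(c_k))
```

With T_c = 2 the table amounts to halving C_min each time C(k) doubles, with the attenuation capped at 1/32.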

The planarization curve y(x) is calculated as follows (assuming the k-th data block is the first transient block in the frame):

y(x) = 1, x = 0, 1, ..., 32(k-1) - 1

y(x) = 1 - \frac{1 - C_{\min}}{32} (x - 32(k-1)), x = 32(k-1), 32(k-1) + 1, ..., 32k - 1

y(x) = C_{\min}, x = 32k, 32k + 1, ..., 1024 - 1

The frame signal is windowed with y(x), i.e. multiplied by it sample by sample, which suppresses the amplitude of the transient part of the signal and flattens the signal.
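A minimal sketch of the curve and the windowing, assuming a frame of 1024 samples and blocks of 32 samples as in this embodiment. The function names and the constant test frame are introduced for illustration only.

```python
def flattening_curve(k, c_min, frame_len=1024, block_len=32):
    """Planarization curve y(x), x = 0..frame_len-1.

    k is the 1-based index of the first transient block; the curve is 1 before
    that block, decays linearly across it, and stays at c_min afterwards.
    """
    start = block_len * (k - 1)     # first sample of the transient block
    end = block_len * k             # first sample after the transient block
    curve = []
    for x in range(frame_len):
        if x < start:
            curve.append(1.0)
        elif x < end:
            curve.append(1.0 - (1.0 - c_min) / block_len * (x - start))
        else:
            curve.append(c_min)
    return curve


def apply_curve(frame, curve):
    """Window the frame: multiply each sample by the curve value."""
    return [s * y for s, y in zip(frame, curve)]


curve = flattening_curve(k=10, c_min=1 / 32)
flattened = apply_curve([2.0] * 1024, curve)
print(curve[287], curve[288], curve[320])   # 1.0 1.0 0.03125
print(flattened[0], flattened[1023])        # 2.0 0.0625
```

For k = 10 the decay runs over samples 288 to 319, so everything before the transient block passes unchanged and everything after it is scaled by C_min.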

Step 6: pack the transient information, such as the transient frame flag, the transient position and the transient intensity, into the AVS audio bitstream.

All other processing is the same as for steady-state signals. At the decoder, the decoded signal is restored to the original data by applying the inverse operation according to the transient information in the bitstream.
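The encoder and decoder sides of this step could be sketched as follows. The `TransientInfo` container is hypothetical and is not the AVS bitstream syntax; it carries C_min directly instead of the transient intensity to keep the example short, and the inverse operation is assumed to be a sample-by-sample division by y(x).

```python
from dataclasses import dataclass


@dataclass
class TransientInfo:
    """Hypothetical side information for one frame (not the AVS syntax)."""
    is_transient: bool
    position: int = 0      # k, 1-based index of the first transient block
    c_min: float = 1.0     # decay floor derived from the transient intensity


def curve_from_info(info, frame_len=1024, block_len=32):
    """Rebuild the planarization curve y(x) from the side information."""
    if not info.is_transient:
        return [1.0] * frame_len
    start = block_len * (info.position - 1)
    end = block_len * info.position
    slope = (1.0 - info.c_min) / block_len
    curve = []
    for x in range(frame_len):
        if x < start:
            curve.append(1.0)
        elif x < end:
            curve.append(1.0 - slope * (x - start))
        else:
            curve.append(info.c_min)
    return curve


def encoder_flatten(frame, info):
    """Encoder side: window the frame with y(x) before coding."""
    return [s * y for s, y in zip(frame, curve_from_info(info, len(frame)))]


def decoder_restore(decoded, info):
    """Decoder side: undo the windowing by dividing by y(x) (never zero)."""
    return [s / y for s, y in zip(decoded, curve_from_info(info, len(decoded)))]


frame = [float(i % 7) for i in range(1024)]
info = TransientInfo(is_transient=True, position=10, c_min=1 / 32)
restored = decoder_restore(encoder_flatten(frame, info), info)
print(max(abs(a - b) for a, b in zip(frame, restored)))
```

The round trip here skips quantization, so the restored frame matches the input up to floating-point rounding; in a real codec the quantization error would sit between the two calls.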

The test results are shown in Figs. 3 to 5. Fig. 3 is the waveform of the original castanets sequence; the transient is pronounced and the noise before it is very low. Fig. 4 is the waveform of the castanets sequence processed with the multiresolution analysis method on the AVS audio codec platform; it shows that the coded and decoded signal retains fairly good time-domain resolution and that the noise before the transient is suppressed, with no obvious noise appearing. Fig. 5 is the waveform of the castanets sequence processed with the method of the present invention on the AVS audio codec platform; it shows that the coded and decoded signal also retains good time-domain resolution, that the noise before the transient is effectively suppressed, and that the noise is clearly lower than with the multiresolution analysis method.

Embodiment 2: Using the Dolby AC-3 audio codec platform, the method is applied to a typical castanets sequence (mono, 44.1 kHz sampling frequency, 16-bit sample precision, coding bit rate 64 kbps/ch).

The embodiment is illustrated as follows. In the AC-3 audio encoder, for example, one frame is 512 samples long (PCM[i], i = 0, 1, ..., 511), which lasts about 11.6 ms at a 44.1 kHz sampling rate. With 32 samples per data block, each data block lasts about 0.7 ms and each frame contains 16 data blocks. Let the total power of the samples in each data block be P(k), k = 1, 2, ..., 16; then

C(k) = \frac{P(k)}{P(k-1)}, P(k-1) \ne 0, k = 2, 3, ..., 16

is called the transient intensity of that data block.

Step 4: reject redundant transient positions.

Step 5: time-domain planarization.

When T_c < C(k) < 4, C_{\min} = 1/2

When 4 ≤ C(k) < 8, C_{\min} = 1/4

When 8 ≤ C(k) < 16, C_{\min} = 1/8

When 16 ≤ C(k) < 32, C_{\min} = 1/16

When 32 ≤ C(k), C_{\min} = 1/32

y(x) = 1, x = 0, 1, ..., 32(k-1) - 1

y(x) = 1 - \frac{1 - C_{\min}}{32} (x - 32(k-1)), x = 32(k-1), 32(k-1) + 1, ..., 32k - 1

y(x) = C_{\min}, x = 32k, 32k + 1, ..., 512 - 1

Claims

1. A pre-echo control method that applies time-domain planarization to an original signal by detecting the transient position and the transient intensity in a transient signal, the method comprising the following process:

The audio frame data to be encoded is divided into a plurality of data blocks; in the AVS audio coding standard, one frame is 1024 samples long, which lasts 23.22 ms at a 44.1 kHz sampling rate, and 32 samples form one data block, so that each data block lasts about 0.7 ms;

The transient intensity of each data block is calculated;

When T_c < C(k) < 4, C_{\min} = 1/2

When 4 ≤ C(k) < 8, C_{\min} = 1/4

When 8 ≤ C(k) < 16, C_{\min} = 1/8

When 16 ≤ C(k) < 32, C_{\min} = 1/16

When 32 ≤ C(k), C_{\min} = 1/32

y(x) = 1, x = 0, 1, ..., 32(k-1) - 1

y(x) = 1 - \frac{1 - C_{\min}}{32} (x - 32(k-1)), x = 32(k-1), 32(k-1) + 1, ..., 32k - 1

y(x) = C_{\min}, x = 32k, 32k + 1, ..., 1024 - 1

The starting point of the planarization curve y(x) is aligned with the start position of the audio frame;

The frame signal is windowed with the planarization curve to flatten it.

2. The pre-echo control method according to claim 1, characterized in that the step of marking each data block whose transient intensity exceeds the threshold as a transient data block further comprises: marking the frame as a transient frame.