CN101388213B - Preecho control method - Google Patents

Info

Publication number
CN101388213B
Authority
CN
China
Prior art keywords
transient
planarization
data block
transition
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008100537463A
Other languages
Chinese (zh)
Other versions
CN101388213A (en)
Inventor
张涛
王伟
杨东明
李海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN2008100537463A priority Critical patent/CN101388213B/en
Publication of CN101388213A publication Critical patent/CN101388213A/en
Application granted granted Critical
Publication of CN101388213B publication Critical patent/CN101388213B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a pre-echo control method that applies time-domain planarization to the original signal by detecting the transient positions and transient intensities in a transient signal. The method comprises the following process: the audio frame data to be coded are divided into a plurality of data blocks; the transient intensity of each data block is calculated; each data block whose transient intensity exceeds a threshold is marked as a transient data block; the transient start data block, i.e. the transient start position, is marked; and redundant transient positions are eliminated according to the masking effect of the human ear. A time-domain planarization curve is drawn according to the transient intensity; if there are several transient positions, the time-domain planarization curves are synthesized; the planarization curve is aligned with the transient position; and the frame signal is windowed and planarized. Compared with traditional schemes, the method suppresses noise directly at the transient positions, thereby effectively controlling the pre-echo phenomenon.

Description

A pre-echo control method
Technical field
The invention belongs to the technical field of digital audio processing and specifically relates to a new pre-echo control method and device.
Background art
In audio coding, pre-echo distortion has always been a rather stubborn problem. It becomes more noticeable and more severe at low bit rates, that is, at high compression ratios. The main cause of pre-echo distortion is insufficient temporal resolution, which lets quantization noise spread in the time domain. In particular, when a transient signal is transformed (or filtered) block by block into the frequency domain and quantized there, the quantization noise spreads over the whole transform block (or filter bank) span; if the signal cannot mask it, pre-echo appears. Fig. 1 shows the waveform of a signal distorted by pre-echo: obvious quantization noise appears before the onset of the burst, and the human ear is very sensitive to this kind of distortion.
Besides the intensity of the quantization noise, temporal masking in the human ear also plays an important role in how much pre-echo distortion affects sound quality. Temporal masking takes two forms: pre-masking (masking of sound that occurs before the masker) and post-masking (masking of sound that occurs after it). Pre-masking can act for up to 20 ms, but is generally considered effective only within 0.5 to 2 ms. Post-masking lasts longer, up to about 200 ms, and is generally considered effective within 10 to 50 ms. Because post-masking lasts longer, quantization noise that follows a transient is usually masked well and does not affect subjective sound quality, so perceptual audio coders seldom consider this case. Pre-masking is weaker, so the temporal profile of the quantization noise must be designed carefully to keep it below the pre-masking level of the human ear. Pre-echo distortion then goes undetected, which ensures transparent coding quality.
To suppress quantization noise, nearly all modern mainstream audio coding standards handle transient signals by switching between long and short windows: a shorter transform block length is used for transient signals, which improves temporal resolution and limits the time-domain spread of frequency-domain quantization noise. Each standard also adopts further measures against pre-echo. For example, MPEG-1 Layer 3 and MPEG-2 both use bit-rate control; MPEG-2 AAC uses temporal noise shaping and gain control; and AC-3 uses exponent-mantissa coding, channel coupling and so on.
The AVS audio standard does not use long/short window switching but uses a long-window transform throughout. The difference is that, when the signal is transient, a frequency-domain multiresolution analysis is additionally applied to the coefficients after the time-frequency transform; that is, a hybrid filter bank is used to suppress pre-echo and improve coding efficiency. However, this method demands high computational complexity and precision, which seriously slows the codec and in particular harms the real-time performance of the decoder.
Moreover, current mainstream encoders process transient signals in units of blocks. Although these methods can detect that a signal is transient, none of them can accurately locate the position and intensity of the transient, nor do they apply processing matched to that position and intensity. Obvious pre-echo can therefore still occur in some cases.
Summary of the invention
In view of the above technical problems, the invention proposes a new pre-echo control method. It builds on the basic cause of pre-echo (a time-domain transient signal needs high temporal resolution) and on the human ear's insensitivity to the details of high-frequency signals (so some high-frequency detail can be discarded without being perceived). By locating the transient position of the time-domain transient signal and applying time-domain planarization to the signal according to the intensity of that transient, the method suppresses noise directly at the transient position and thereby effectively controls the pre-echo phenomenon.
In the method disclosed by the invention, the audio frame data to be encoded are divided into a plurality of data blocks. In the AVS audio coding standard, one frame is 1024 samples long, which lasts 23.22 ms at a 44.1 kHz sampling rate; with 32 samples per data block, each data block lasts approximately 0.7 ms;
The transient intensity of each data block is calculated;
Each data block whose transient intensity exceeds a threshold is marked as a transient data block;
The transient start data block, i.e. the transient start position, is marked, and redundant transient positions are rejected according to the masking effect of the human ear;
A time-domain planarization curve is drawn according to the transient intensity, comprising the following steps:
From the transient intensity $C(k)$ obtained above, determine the minimum value $C_{\min}$ to which the planarization curve decays, where $T_c$ is the transient intensity threshold used to detect transient blocks:
When $T_c < C(k) < 4$, $C_{\min} = 1/2$
When $4 \le C(k) < 8$, $C_{\min} = 1/4$
When $8 \le C(k) < 16$, $C_{\min} = 1/8$
When $16 \le C(k) < 32$, $C_{\min} = 1/16$
When $32 \le C(k)$, $C_{\min} = 1/32$
If the k-th data block is the first transient block in the frame, the planarization curve $y(x)$ is calculated as follows:
$$y(x) = 1, \quad x = 0, 1, \ldots, 32(k-1)-1$$
$$y(x) = 1 - \frac{1 - C_{\min}}{32}\,\bigl(x - 32(k-1)\bigr), \quad x = 32(k-1),\ 32(k-1)+1,\ \ldots,\ 32k-1$$
$$y(x) = C_{\min}, \quad x = 32k,\ 32k+1,\ \ldots,\ 1024-1$$
The starting point of the planarization curve y(x) is aligned with the start position of the transient block, i.e. the planarization curve begins at the start of the transient block;
If there are several transient positions, the time-domain planarization curves are synthesized;
The planarization curve is aligned with the transient position, and the frame signal is windowed and planarized.
The step of marking each data block whose transient intensity exceeds the threshold as a transient data block further comprises the following step: marking the frame as a transient frame.
Compared with existing mainstream encoders, the invention can accurately locate the position and intensity of a transient and apply processing matched to that position and intensity. The new pre-echo control method builds on the basic cause of pre-echo (a time-domain transient signal needs high temporal resolution) and on the human ear's insensitivity to the details of high-frequency signals (so some high-frequency detail can be discarded without being perceived). By locating the transient position of the time-domain transient signal and applying time-domain planarization according to the intensity of that transient, it suppresses noise directly at the transient position and thereby effectively controls the pre-echo phenomenon.
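The mapping from transient intensity to the curve floor described above can be sketched as a small lookup. This is only an illustration: the function name `c_min_for` is hypothetical, and the default threshold T_c = 2 is the value the embodiments use, not something this summary fixes.

```python
def c_min_for(c_k, t_c=2.0):
    """Return the floor C_min of the planarization curve for intensity C(k).

    Assumption: t_c defaults to 2, the threshold used in the embodiments.
    """
    if c_k <= t_c:
        # Blocks at or below the threshold are not transient blocks.
        raise ValueError("C(k) must exceed T_c for a transient block")
    if c_k < 4:
        return 1 / 2
    if c_k < 8:
        return 1 / 4
    if c_k < 16:
        return 1 / 8
    if c_k < 32:
        return 1 / 16
    return 1 / 32
```

Stronger transients thus get a lower floor, i.e. the signal after the transient onset is attenuated more.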
Description of drawings
Fig. 1 shows the pre-echo phenomenon;
Fig. 2 is a flow diagram of the invention;
Fig. 3 is the waveform of the original castanets sequence;
Fig. 4 is the waveform of the castanets sequence after processing with the multiresolution analysis method on the AVS audio codec platform;
Fig. 5 is the waveform of the castanets sequence after processing with the new method on the AVS audio codec platform.
Embodiments
The technical scheme of the invention is further described below with reference to the accompanying drawings and specific embodiments:
Embodiment 1: using the AVS audio codec platform, tests on a typical castanets sequence (mono, 44.1 kHz sampling rate, 16-bit sample precision, coding bit rate 32 kbps/ch) compared how well the invention and the hybrid filter bank method adopted by the AVS audio standard control pre-echo distortion.
The embodiment is illustrated as follows. For example, in the AVS audio encoder, one frame is 1024 samples long (PCM[i], i = 0, 1, ..., 1023), which lasts about 23.22 ms at a 44.1 kHz sampling rate. With 32 samples per data block, each data block lasts about 0.7 ms and each frame contains 32 data blocks. Let P(k), k = 1, 2, ..., 32, be the total power of the samples in each data block; then
[Equation image GSB00000576623500041: definition of P(k)]
Definition 1: the ratio of the power of the samples in each data block to that of the preceding data block,
$$C(k) = \frac{P(k)}{P(k-1)}, \quad P(k-1) \neq 0,\ k = 2, 3, \ldots, 32,$$
is called the transient intensity of that data block.
Definition 2: a data block whose transient intensity $C(k)$ exceeds a threshold $T_c$ is called a transient block.
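The two definitions can be sketched in a few lines of Python. The formula for P(k) survives only as an image reference in this text, so the sketch assumes the usual sum of squared samples per block; that assumption and the helper names `block_powers` and `transient_intensities` are illustrative, not taken from the patent.

```python
def block_powers(frame, block_len=32):
    """Return [P(1), P(2), ...] for one frame.

    Assumption: P(k) is the sum of squared samples in block k.
    """
    if len(frame) % block_len != 0:
        raise ValueError("frame length must be a multiple of block_len")
    return [
        sum(s * s for s in frame[i:i + block_len])
        for i in range(0, len(frame), block_len)
    ]


def transient_intensities(powers):
    """Map block number k (1-based, k >= 2) to C(k) = P(k) / P(k-1).

    Blocks whose predecessor has zero power are skipped, following the
    condition P(k-1) != 0 in Definition 1.
    """
    intensities = {}
    for k in range(2, len(powers) + 1):
        previous = powers[k - 2]
        if previous != 0:
            intensities[k] = powers[k - 1] / previous
    return intensities
```

Returning a dictionary keyed by the 1-based block number keeps the indices aligned with the k used in the text.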
Step 1: divide the data into blocks. Following the definition above, every 32 consecutive samples of the signal in a frame form one data block.
Step 2: calculate and check the transient intensity of each data block in the frame.
Calculate the transient intensity C(k) of each data block in turn and compare it with the threshold T_c. When C(k) is greater than T_c (here T_c = 2), the data block is regarded as a transient block.
Step 3: mark all transient blocks in the frame as transient positions, and mark the frame as a transient frame.
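Steps 2 and 3 amount to a threshold test over the per-block intensities. A minimal sketch, assuming the intensities are held in a dictionary keyed by the 1-based block number k (the function names are hypothetical):

```python
def find_transient_blocks(intensities, t_c=2.0):
    """Return the sorted block numbers k whose C(k) is strictly above T_c."""
    return sorted(k for k, c in intensities.items() if c > t_c)


def is_transient_frame(intensities, t_c=2.0):
    """A frame counts as transient when it holds at least one transient block."""
    return bool(find_transient_blocks(intensities, t_c))
```

The comparison is strict, matching "C(k) greater than T_c" in the text, so a block with C(k) exactly equal to 2 is not marked.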
Step 4: reject redundant transient positions.
Pre-masking (masking of sound that occurs before the masker) generally lasts up to 20 ms, but is usually considered effective only within 0.5 to 2 ms. A transient that occurs in data blocks k = 2, 3 or 4 therefore need not be considered, because only about 2 ms of signal precedes the transient position and it is masked by the pre-masking effect. Post-masking lasts longer, about 200 ms, with an effective time usually taken as about 20 ms, so once the first transient position has been detected the remaining transient positions can be rejected.
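One possible reading of this rejection rule is sketched below: transients in blocks 2 to 4 are dropped first, and only the earliest remaining transient is kept. The text does not say whether a transient in blocks 2 to 4 also suppresses later ones, so this ordering is an assumption, as is the function name.

```python
def first_relevant_transient(transient_blocks, premasked=(2, 3, 4)):
    """Return the single transient block to process, or None.

    Assumption: blocks listed in `premasked` are ignored before the
    "keep only the first transient" rule is applied.
    """
    for k in sorted(transient_blocks):
        if k not in premasked:
            return k
    return None
```

For example, with transient blocks 3, 6 and 9, block 3 is ignored because too little signal precedes it, block 6 is kept, and block 9 is rejected as covered by post-masking.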
Step 5: time-domain planarization.
From the transient intensity $C(k)$ obtained above, determine the minimum value $C_{\min}$ to which the planarization curve decays. The method is as follows, where $T_c$ is the transient intensity threshold used to detect transient blocks:
When $T_c < C(k) < 4$, $C_{\min} = 1/2$
When $4 \le C(k) < 8$, $C_{\min} = 1/4$
When $8 \le C(k) < 16$, $C_{\min} = 1/8$
When $16 \le C(k) < 32$, $C_{\min} = 1/16$
When $32 \le C(k)$, $C_{\min} = 1/32$
The planarization curve $y(x)$ is calculated as follows (let the k-th data block be the first transient block in the frame):
$$y(x) = 1, \quad x = 0, 1, \ldots, 32(k-1)-1$$
$$y(x) = 1 - \frac{1 - C_{\min}}{32}\,\bigl(x - 32(k-1)\bigr), \quad x = 32(k-1),\ 32(k-1)+1,\ \ldots,\ 32k-1$$
$$y(x) = C_{\min}, \quad x = 32k,\ 32k+1,\ \ldots,\ 1024-1$$
The frame signal is windowed with y(x), which suppresses the amplitude of the transient part of the signal and flattens the signal.
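The piecewise curve and the windowing step can be sketched as follows. The function names and the `frame_len` and `block_len` parameters are illustrative; their defaults are the AVS values of this embodiment.

```python
def planarization_curve(k, c_min, frame_len=1024, block_len=32):
    """Build y(x) for a frame whose first transient block is block k (1-based)."""
    if not 1 <= k <= frame_len // block_len:
        raise ValueError("k is outside the frame")
    start = block_len * (k - 1)          # first sample of the transient block
    slope = (1.0 - c_min) / block_len    # linear decay across that block
    curve = []
    for x in range(frame_len):
        if x < start:
            curve.append(1.0)                          # before the transient
        elif x < start + block_len:
            curve.append(1.0 - slope * (x - start))    # ramp from 1 toward C_min
        else:
            curve.append(c_min)                        # held at the floor
    return curve


def apply_curve(frame, curve):
    """Window the frame sample by sample with the planarization curve."""
    return [s * g for s, g in zip(frame, curve)]
```

With k = 5 and C_min = 1/4, for instance, samples 0 to 127 pass unchanged, samples 128 to 159 are scaled by a gain that ramps down linearly, and samples 160 onward are scaled by 1/4.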
Step 6: pack the transient information, such as the transient frame flag, the transient position and the transient intensity, into the AVS audio bitstream.
All other processing is the same as for steady-state signals. At the decoder, the decoded signal is restored to the original data by applying the inverse operation according to the transient information in the bitstream.
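The text does not spell out the inverse operation. One natural reading, assumed here, is that the decoder rebuilds y(x) from the transmitted transient position and intensity and divides the decoded samples by it. The sketch below follows that assumption; `curve_value` and `undo_planarization` are hypothetical names.

```python
def curve_value(x, k, c_min, block_len=32):
    """Evaluate the planarization curve y(x) for transient block k (1-based)."""
    start = block_len * (k - 1)
    if x < start:
        return 1.0
    if x < start + block_len:
        return 1.0 - (1.0 - c_min) / block_len * (x - start)
    return c_min


def undo_planarization(decoded, k, c_min, block_len=32):
    """Assumed decoder step: divide each decoded sample by y(x).

    c_min is always positive (at least 1/32), so the division is safe.
    """
    return [
        s / curve_value(x, k, c_min, block_len)
        for x, s in enumerate(decoded)
    ]
```

Under this reading, any coding noise in the region before the transient is left untouched (gain 1), while the attenuated samples after the onset are scaled back up to their original level.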
The test results are shown in Fig. 3 to Fig. 5. Fig. 3 is the waveform of the original castanets sequence; the transient is clearly visible and the noise before the transient signal is very low. Fig. 4 is the waveform of the castanets sequence after processing with the multiresolution analysis method on the AVS audio codec platform; it shows that the coded and decoded signal keeps good time-domain resolution, and the noise before the transient signal is suppressed, with no obvious noise appearing. Fig. 5 is the waveform of the castanets sequence after processing with the method of the invention on the AVS audio codec platform; it shows that the coded and decoded signal also keeps good time-domain resolution, the noise before the transient signal is effectively suppressed, and the noise is clearly lower than with the multiresolution analysis method.
Embodiment 2: an embodiment using the Dolby AC-3 audio codec platform on a typical castanets sequence (mono, 44.1 kHz sampling rate, 16-bit sample precision, coding bit rate 64 kbps/ch).
The embodiment is illustrated as follows. For example, in the AC-3 audio encoder, one frame is 512 samples long (PCM[i], i = 0, 1, ..., 511), which lasts about 11.6 ms at a 44.1 kHz sampling rate. With 32 samples per data block, each data block lasts about 0.7 ms and each frame contains 16 data blocks. Let P(k), k = 1, 2, ..., 16, be the total power of the samples in each data block; then
[Equation image GSB00000576623500052: definition of P(k)]
Definition 1: the ratio of the power of the samples in each data block to that of the preceding data block,
$$C(k) = \frac{P(k)}{P(k-1)}, \quad P(k-1) \neq 0,\ k = 2, 3, \ldots, 15,$$
is called the transient intensity of that data block.
Definition 2: a data block whose transient intensity $C(k)$ exceeds a threshold $T_c$ is called a transient block.
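The frame and block figures quoted for the two platforms follow directly from the frame length, block length and sampling rate. A small sketch of that arithmetic (the function name `frame_layout` is illustrative):

```python
def frame_layout(frame_len, sample_rate_hz=44100, block_len=32):
    """Return block count and durations (in milliseconds) for one frame."""
    return {
        "blocks_per_frame": frame_len // block_len,
        "frame_ms": 1000.0 * frame_len / sample_rate_hz,
        "block_ms": 1000.0 * block_len / sample_rate_hz,
    }
```

Calling `frame_layout(512)` gives 16 blocks per frame for the AC-3 case, and `frame_layout(1024)` gives 32 blocks per frame for the AVS case; the block duration is the same in both because it depends only on the block length and the sampling rate.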
Step 1: divide the data into blocks. Following the definition above, every 32 consecutive samples of the signal in a frame form one data block.
Step 2: calculate and check the transient intensity of each data block in the frame.
Calculate the transient intensity C(k) of each data block in turn and compare it with the threshold T_c. When C(k) is greater than T_c (here T_c = 2), the data block is regarded as a transient block.
Step 3: mark all transient blocks in the frame as transient positions, and mark the frame as a transient frame.
Step 4: reject redundant transient positions.
Pre-masking (masking of sound that occurs before the masker) generally lasts up to 20 ms, but is usually considered effective only within 0.5 to 2 ms. A transient that occurs in data blocks k = 2, 3 or 4 therefore need not be considered, because only about 2 ms of signal precedes the transient position and it is masked by the pre-masking effect. Post-masking lasts longer, about 200 ms, with an effective time usually taken as about 20 ms, so once the first transient position has been detected the remaining transient positions can be rejected.
Step 5: time-domain planarization.
From the transient intensity $C(k)$ obtained above, determine the minimum value $C_{\min}$ to which the planarization curve decays. The method is as follows, where $T_c$ is the transient intensity threshold used to detect transient blocks:
When $T_c < C(k) < 4$, $C_{\min} = 1/2$
When $4 \le C(k) < 8$, $C_{\min} = 1/4$
When $8 \le C(k) < 16$, $C_{\min} = 1/8$
When $16 \le C(k) < 32$, $C_{\min} = 1/16$
When $32 \le C(k)$, $C_{\min} = 1/32$
The planarization curve $y(x)$ is calculated as follows (let the k-th data block be the first transient block in the frame):
$$y(x) = 1, \quad x = 0, 1, \ldots, 32(k-1)-1$$
$$y(x) = 1 - \frac{1 - C_{\min}}{32}\,\bigl(x - 32(k-1)\bigr), \quad x = 32(k-1),\ 32(k-1)+1,\ \ldots,\ 32k-1$$
$$y(x) = C_{\min}, \quad x = 32k,\ 32k+1,\ \ldots,\ 512-1$$
The frame signal is windowed with y(x), which suppresses the amplitude of the transient part of the signal and flattens the signal.
Step 6: pack the transient information, such as the transient frame flag, the transient position and the transient intensity, into the AVS audio bitstream.
All other processing is the same as for steady-state signals. At the decoder, the decoded signal is restored to the original data by applying the inverse operation according to the transient information in the bitstream.

Claims (2)

1. A pre-echo control method that applies time-domain planarization to the original signal by detecting the transient position and transient intensity in a transient signal, the method comprising the following process:
The audio frame data to be encoded are divided into a plurality of data blocks; in the AVS audio coding standard, one frame is 1024 samples long, which lasts 23.22 ms at a 44.1 kHz sampling rate; with 32 samples per data block, each data block lasts about 0.7 ms;
The transient intensity of each data block is calculated;
Each data block whose transient intensity exceeds a threshold is marked as a transient data block;
The transient start data block, i.e. the transient start position, is marked, and redundant transient positions are rejected according to the masking effect of the human ear;
A time-domain planarization curve is drawn according to the transient intensity, comprising the following steps:
From the transient intensity $C(k)$ obtained above, determine the minimum value $C_{\min}$ to which the planarization curve decays, where $T_c$ is the transient intensity threshold used to detect transient blocks:
When $T_c < C(k) < 4$, $C_{\min} = 1/2$
When $4 \le C(k) < 8$, $C_{\min} = 1/4$
When $8 \le C(k) < 16$, $C_{\min} = 1/8$
When $16 \le C(k) < 32$, $C_{\min} = 1/16$
When $32 \le C(k)$, $C_{\min} = 1/32$
If the k-th data block is the first transient block in the frame, the planarization curve $y(x)$ is calculated as follows:
$$y(x) = 1, \quad x = 0, 1, \ldots, 32(k-1)-1$$
$$y(x) = 1 - \frac{1 - C_{\min}}{32}\,\bigl(x - 32(k-1)\bigr), \quad x = 32(k-1),\ 32(k-1)+1,\ \ldots,\ 32k-1$$
$$y(x) = C_{\min}, \quad x = 32k,\ 32k+1,\ \ldots,\ 1024-1$$
The starting point of the planarization curve y(x) is aligned with the start position of the audio frame;
If there are several transient positions, the time-domain planarization curves are synthesized;
The frame signal is windowed and planarized using the planarization curve.
2. The pre-echo control method as claimed in claim 1, characterized in that the step of marking each data block whose transient intensity exceeds the threshold as a transient data block further comprises the following step: marking the frame as a transient frame.
CN2008100537463A 2008-07-03 2008-07-03 Preecho control method Expired - Fee Related CN101388213B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008100537463A CN101388213B (en) 2008-07-03 2008-07-03 Preecho control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008100537463A CN101388213B (en) 2008-07-03 2008-07-03 Preecho control method

Publications (2)

Publication Number Publication Date
CN101388213A CN101388213A (en) 2009-03-18
CN101388213B true CN101388213B (en) 2012-02-22

Family

ID=40477583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008100537463A Expired - Fee Related CN101388213B (en) 2008-07-03 2008-07-03 Preecho control method

Country Status (1)

Country Link
CN (1) CN101388213B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120222

Termination date: 20120703