CN1460992A - Low-time-delay adaptive multi-resolution filter group for perception voice coding/decoding - Google Patents

Info

Publication number
CN1460992A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN03148514A
Other languages
Chinese (zh)
Inventor
潘兴德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING FUGUO DIGITAL TECHN Co Ltd
Original Assignee
BEIJING FUGUO DIGITAL TECHN Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING FUGUO DIGITAL TECHN Co Ltd filed Critical BEIJING FUGUO DIGITAL TECHN Co Ltd
Priority to CN03148514A priority Critical patent/CN1460992A/en
Publication of CN1460992A publication Critical patent/CN1460992A/en
Pending legal-status Critical Current

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to a filter bank for data compression and signal processing. More specifically, it provides a method and apparatus that decorrelate an audio signal in order to remove redundancy. In addition, based on a psychoacoustic model, the invention can be used to separate an audio signal into components of differing perceptual importance. It uses cosine-modulated filtering to build filter bank structures with several different time-frequency partitions, and the structure can be switched adaptively to the signal in real time, so that high coding efficiency is obtained while coding delay is reduced.

Description

Low-delay, adaptive multiresolution filter bank for perceptual audio coding/decoding
Technical field
The present invention relates to a filter bank for data compression and signal processing. More specifically, it is used to decorrelate audio signals, thereby providing a method and apparatus for removing redundancy. In addition, based on a psychoacoustic model, the invention can be used to separate signal components of differing importance.
Background art
Digital audio compression coding can achieve high-quality coding at a low bit rate. Its basic principles are: 1) remove as much redundancy from the audio signal as possible; 2) make full use of the characteristics of human hearing.
It is well known that some linear transforms produce high-frequency coefficients close to zero. In other words, most of the information in a time-domain signal can be transformed into, or concentrated in, a subset of frequency-domain or time-frequency coefficients. Signal compression techniques therefore widely adopt various filter structures as a means of improving coding efficiency.
In psychoacoustics, a pure tone can be masked by continuous noise of a certain bandwidth centered on it. If the noise power in that band equals the power of the pure tone, the tone is at the threshold of being just audible, and the band is called the critical bandwidth (the unit is the Bark). The human ear analyzes audio signals on the basis of critical bands, acting like a non-uniform filter bank whose subband widths differ widely. Critical bands are therefore the psychoacoustic basis for subband division in coding. In perceptual audio coding, the subband division should match the critical bandwidths of the ear as closely as possible so as to suit its auditory characteristics. In practical coders, however, implementation cost means that this requirement cannot be met completely, because designing a non-uniform filter bank that approximates the ear's auditory characteristics, together with the associated psychoacoustic analysis and quantization design, is technically difficult.
Usually, a basic operation of a perceptual audio encoder is to map the input audio signal from the time domain to the frequency domain or the time-frequency domain. The basic idea is as follows: the signal is decomposed into components in individual frequency bands; once the input signal is expressed in the frequency domain, a psychoacoustic model can be used to remove perceptually irrelevant information; the components in each band are then grouped; finally, bits are allocated sensibly to represent each group of frequency parameters. Because audio signals show strong quasi-periodicity, this process can greatly reduce the data volume and improve coding efficiency.
In recent years, a series of time-frequency mapping algorithms (also called transforms or filtering) have been developed for separating signal components and extracting redundancy. These methods, which differ in performance, include:
(1) the discrete Fourier transform (DFT);
(2) the discrete cosine transform (DCT);
(3) quadrature mirror filters (QMF);
(4) pseudo quadrature mirror filters (Pseudo QMF, PQMF);
(5) cosine-modulated filters (CMF), including the modified discrete cosine transform (MDCT);
(6) the discrete wavelet (packet) transform (DW(P)T).
These transforms have different strengths and weaknesses, and different systems choose a suitable transform as the basis of their filter bank according to their needs.
MPEG-1/2 Layers I and II use a PQMF as the filter bank. Its advantages are a relatively simple structure and good time resolution. Its drawbacks are: there is noticeable frequency overlap between adjacent subbands, so a change in a single-frequency signal can affect two adjacent subbands; below 2000 Hz the subband width is much larger than the psychoacoustic (critical) bandwidth, so bits cannot be allocated optimally; and the real-time computational load is rather high.
MPEG-1/2 Layer III uses a cascade of a PQMF and an MDCT as its filter bank. Although introducing the MDCT raises the frequency resolution and thus the coding efficiency, the frequency overlap of the PQMF between adjacent subbands still causes aliasing, and frequency-domain quantization noise spreads rather severely in the time domain.
MPEG-2/4 AAC uses the MDCT as its filter bank (stationary signals: 1024-point MDCT; transient signals: 128-point MDCT), with two overlapping window shapes, sine and KBD. Its advantage is good frequency resolution; its drawback is rather low time resolution.
The filter bank of MPEG-4 TwinVQ is similar to that of MPEG-2/4 AAC. In addition, it uses a linear filter bank to whiten the spectral coefficients and normalizes them before the quantization stage.
The AC-3 filter bank uses a 256-point MDCT for stationary signals and a 128-point MDCT for transient signals. Its block-length selection mechanism is fairly simple, and the resulting selection is suboptimal.
The systems above either use a single transform configuration to compress an input signal frame, or use a filter bank or transform with a shorter time-domain analysis interval to compress rapidly changing (fast-varying) signals so as to suppress the effect of pre-echo on the decoded signal. When a signal frame contains components with different transient characteristics, a single transform configuration cannot meet the basic need of the different subframes for optimal compression. Simply using a filter bank or transform with a short time-domain support to handle fast-varying signals, on the other hand, gives coefficients with low frequency resolution, so that the band resolved by each coefficient in the low-frequency part is much wider than the critical bands of the human ear, which seriously harms coding efficiency.
The ATRAC filter bank is a cascade of pre-echo gain control, QMF and MDCT. It also uses a window-switching mechanism to adjust the time-frequency resolution according to the characteristics of the input signal.
The DTS filter bank consists of a 512-tap, 32-subband PQMF. To extract further redundancy, a linear filter bank can be cascaded after the PQMF.
Deepen Sinha and J. D. Johnston proposed a coding technique based on signal-adaptive switching between the MDCT and the wavelet transform (Deepen Sinha and J. D. Johnston, "Audio compression at low bit rates using a signal adaptive switched filterbank", in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, volume 2, pages 1053-1056, Atlanta, USA, 1996). For slowly varying signals it uses the MDCT, which has higher frequency resolution; for rapidly changing signals it uses the wavelet transform. Higher coding efficiency was obtained.
Marcus Purat and Peter Noll, by filtering the output of a cosine-modulated filter bank a second time, provided a new multiresolution filtering technique for audio coding (Marcus Purat and Peter Noll, "A new orthonormal wavelet packet decomposition for audio coding using frequency-varying modulated lapped transforms", IEEE 1995 Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, N.Y. (USA), 1995), which also achieved higher coding efficiency.
Summary of the invention
To improve the quality of audio coding, the statistical redundancy and the perceptually irrelevant components in the coded signal must be removed effectively. A filter bank provides a way to remove statistical and perceptual redundancy and to reduce coding side information. In terms of function, the design goals of the filtering include:
(1) For different signal types, adjust the time and frequency resolution of the filter bank so as to best separate signal components with different perceptual characteristics.
(2) Use modified-cosine basis functions that are as long as possible, to remove or reduce the statistical redundancy in the audio signal effectively.
(3) Through adaptive switching of the time-frequency resolution of the filter bank and overlapped windowing between adjacent frames, reduce as far as possible the pre-echo noise and the audible blocking effect caused by discontinuities at block boundaries.
(4) By effectively removing the statistical redundancy and perceptual irrelevancy of the audio signal, improve its compression efficiency while preserving audio quality.
(5) Use a filtering technique that introduces only a small encoding/decoding delay.
(6) Use fast algorithms so that the computational load is small.
To achieve these goals, the present invention uses cosine-modulated filter bank techniques to design a set of filter bank structures that are switched according to a transient measure of the input audio signal, so that statistical redundancy between symbols is removed or reduced while the characteristics of human hearing are fully exploited, thereby improving coding efficiency.
The low-delay, adaptive multiresolution filter bank structure proposed by the invention is a technique that, in audio coding, dynamically adjusts the filter structure according to the type of the signal currently being coded. The time-frequency resolution of the filter bank is adjusted dynamically according to the properties of the signal to obtain an optimized filtering and time-frequency representation of the signal, so as to minimize the coding bit rate or, at a given bit rate, to obtain the highest possible subjective coding quality.
The signal-adaptive adjustment of the multiresolution filter structure of the invention is realized by applying a wavelet transform to the frequency coefficients obtained from cosine-modulated filtering. The input signal is analyzed by a transient measurement module and classified as a slowly varying signal or a fast-varying signal (fast-varying signals can be further subdivided into type I, type II, and so on). Signals of different types are then filtered with different filter structures to obtain the required time-frequency filter coefficients.
In the audio coding process, the signal is first divided into frames; a transient measure is then computed for each frame, the type of the current signal is determined, and the corresponding filter structure is selected. Specifically, filtering of a fast-varying signal consists of two steps: 1. filtering with the uniform (equal-bandwidth) filter bank; 2. multiresolution analysis of the resulting filter coefficients. Different fast-varying signal types use different multiresolution analysis structures, to improve coding efficiency.
When the filter bank of the invention is used in an audio codec, very high coding performance is obtained without a significant increase in the required computation.
Description of drawings
A series of schematic diagrams is used in describing the invention. These diagrams should not be understood as limiting the invention, because those skilled in the art can produce a similar implementation from the method set out here. The diagrams are:
Fig. 1 is a block diagram of the analysis and synthesis filter banks of a cosine-modulated filter bank.
Fig. 2 is a block diagram of the operating principle of the filter bank of the invention.
Fig. 3 is a schematic diagram of the filter structure that applies a Haar wavelet transform to part of the MDCT coefficients.
Fig. 4 is a schematic diagram of the time-frequency partition obtained by applying a Haar wavelet transform to part of the MDCT coefficients.
Fig. 5 is a flowchart of the operation of the filter bank of the invention.
Fig. 6 is a flowchart of a typical audio encoder that uses the filtering technique of the invention.
Fig. 7 is a flowchart of a typical audio decoder that uses the filtering technique of the invention.
Detailed description
The low-delay, adaptive multiresolution filter bank structure proposed by the invention is a technique that, in audio coding, dynamically adjusts the filter structure according to the type of the signal currently being coded. Unlike the long/short MDCT block-switching strategy of AAC, the invention adjusts the time-frequency resolution of the filter bank dynamically according to the properties of the signal to obtain an optimized filtering and time-frequency representation of the signal, so as to minimize the coding bit rate or, at a given bit rate, to obtain the highest possible subjective coding quality.
The signal-adaptive adjustment of the multiresolution filter structure of the invention is realized by applying a wavelet transform to the frequency coefficients obtained from cosine-modulated filtering. The operating principle of the proposed signal-adaptive filtering technique is shown in Fig. 2: the input signal is analyzed by a transient measurement module and classified as a slowly varying signal or a fast-varying signal (fast-varying signals can be further subdivided into type I, type II, and so on); signals of different types are then filtered with different filter structures to obtain the required time-frequency filter coefficients.
The transient measure can be computed from the statistical properties, masking properties and/or time-frequency characteristics of the current signal.
The workflow of the signal-adaptive filter bank technique of the invention is shown in Fig. 5. Its steps are as follows:
(1) divide the audio signal into frames and feed them into the processing flow;
(2) select the transient measurement method;
(3) compute the transient measure of the current signal frame;
(4) determine the type of the current signal;
(5) select the filter structure for the current frame;
(6) perform cosine-modulated filtering;
(7) organize the filter coefficients in time and frequency;
(8) output the filtered result.
For convenience of description, this application introduces two concepts: the "slowly varying signal" and the "fast-varying signal". Because audio signals are time-varying, the current frame is classified as a "slowly varying signal" or a "fast-varying signal" according to its characteristics, such as the degree of change of its statistics, the flatness of its time-domain or frequency-domain waveform, and the signal's own temporal masking ability (that is, whether pre-echo can arise). Note that the "slowly varying signal" defined here differs from what is usually called a "quasi-stationary" or "slowly time-varying" signal, and the "fast-varying signal" likewise differs from what is usually called a "non-stationary" or "transient" signal.
Implementing the filter bank of the invention requires a simple and effective signal-type decision mechanism, which can be chosen according to the actual coding application.
In the present invention, the transient measure of the audio signal is defined as
$$Z = \frac{\sum_{j=1}^{N}\left|s_j - \frac{1}{N}\sum_{i=1}^{N} s_i\right|^2 + \lambda}{\sum_{j=1}^{N}\left|s_j\right|^2 + \lambda}$$
where $s_j$ is the $j$-th signal sample of the current frame, $N$ is the frame length, and $\lambda$ is a real number greater than zero and less than 1. $\lambda$ is introduced to emphasize the importance of the variation.
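The transient measure can be sketched in a few lines of Python. This is only an illustration under one reading of the formula, namely that λ is added to both the numerator and the denominator; the function name `transient_measure` and the two test frames are hypothetical, and the default λ = 0.618 is the value used later in the example section.

```python
def transient_measure(frame, lam=0.618):
    """Z = (sum |s_j - mean|^2 + lam) / (sum |s_j|^2 + lam).

    Assumption: lam is added to numerator and denominator alike.
    """
    n = len(frame)
    mean = sum(frame) / n
    numerator = sum((s - mean) ** 2 for s in frame) + lam
    denominator = sum(s * s for s in frame) + lam
    return numerator / denominator


# Two illustrative frames of N = 1024 samples (hypothetical test data).
flat = [1.0] * 1024                                   # constant level
alternating = [1.0 if j % 2 == 0 else -1.0 for j in range(1024)]

z_flat = transient_measure(flat)            # close to 0: no variation
z_alternating = transient_measure(alternating)  # equals 1: all energy is variation
```

Under this reading, Z compares the energy of the deviation from the frame mean with the total frame energy, so it lies between 0 and 1 for any real-valued frame.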
When Z is below a threshold $X_1$, the signal is classified as a slowly varying signal; otherwise, if Z is below a further threshold $X_2$, it is a fast-varying signal of type $K_1$. In this way a series of fast-varying signal types can be defined. If there are K signal types in total, the thresholds $X_i$ ($i = 1, \ldots, K$) can adapt to changes in the signal. K and the thresholds $X_i$ are determined as follows: if the filter-structure information of each frame is to occupy at most L bits, then $K \le 2^L$; the distribution function of the transient measure is estimated statistically, and the range of the transient measure is divided into K intervals of equal probability.
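The equal-probability rule for choosing thresholds can be illustrated with empirical quantiles. The sketch below is an assumption-laden reading, not the patent's procedure: it takes K = 2^L (the largest value allowed), derives the K-1 interior boundaries from a list of previously observed Z values, and maps class 0 to the slowly varying type. The names `equal_probability_thresholds`, `classify` and `z_history` are hypothetical.

```python
import bisect


def equal_probability_thresholds(z_values, num_bits):
    """Interior boundaries that split observed Z values into K equally likely bins."""
    k = 2 ** num_bits                 # largest K satisfying K <= 2**L
    ordered = sorted(z_values)
    n = len(ordered)
    # Boundary i sits at the i/K empirical quantile, i = 1 .. K-1.
    return [ordered[(i * n) // k] for i in range(1, k)]


def classify(z, thresholds):
    """0 = slowly varying; 1, 2, ... = fast-varying types of increasing Z."""
    return bisect.bisect_right(thresholds, z)


# Hypothetical history of transient measures, uniformly spread over [0, 1).
z_history = [i / 100 for i in range(100)]
thresholds = equal_probability_thresholds(z_history, num_bits=2)
counts = [0, 0, 0, 0]
for z in z_history:
    counts[classify(z, thresholds)] += 1
```

With L = 2 bits the history is split into four classes of 25 frames each, which is the equal-probability property the text asks for.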
In the present invention, a slowly varying signal is filtered with a uniform (equal-bandwidth) cosine-modulated filter bank. For a fast-varying signal, coefficients are first obtained with the same uniform cosine-modulated filter bank as for the slowly varying signal, and multiresolution analysis is then applied to them, thereby adjusting the time-frequency resolution non-uniformly. At transitions between slowly varying and fast-varying signals no special transition processing is needed, and the perfect-reconstruction property of the system is preserved. This kind of time-frequency partition matches the distribution of the critical bands of the human auditory system. Moreover, because the fast-varying components of a signal appear mainly in the mid and high frequencies, such a filter structure is, for audio coding, better than filter banks with a single structure or with simple switching.
In the present invention, a number of parameters and mechanisms must be chosen sensibly. They include:
(a) the choice of multiresolution filter structure applied to the cosine-modulated filter coefficients;
(b) the shape of the overlapping window of the cosine-modulated filter bank;
(c) the length of the overlapping window of the cosine-modulated filter bank.
As stated above, the filtering of both slowly varying and fast-varying signals is based on uniform cosine-modulated filter bank techniques, where the cosine-modulated filter bank covers two forms of filtering: the traditional cosine-modulated filtering technique and the MDCT. A source coding/decoding system based on cosine-modulated filtering is shown in Fig. 1. At the encoder, the analysis filter bank decomposes the input signal into M subbands, and the subband coefficients are quantized and entropy coded. At the decoder, the subband coefficients are recovered by entropy decoding and inverse quantization and are passed through the synthesis filter bank to reconstruct the audio signal.
The impulse responses of the traditional cosine-modulated filtering technique are:
$$h_k(n) = 2\,p_a(n)\cos\!\left(\frac{\pi}{M}(k+0.5)\left(n-\frac{D}{2}\right)+\theta_k\right), \quad n = 0, 1, \ldots, N_h-1 \qquad (1)$$
$$f_k(n) = 2\,p_s(n)\cos\!\left(\frac{\pi}{M}(k+0.5)\left(n-\frac{D}{2}\right)+\theta_k\right), \quad n = 0, 1, \ldots, N_f-1 \qquad (2)$$
where $0 \le k \le M-1$, $0 \le n \le 2KM-1$, $K$ is an integer greater than zero, and $\theta_k = (-1)^k\,\frac{\pi}{4}$.
Here, let the analysis window (analysis prototype filter) $p_a(n)$ of the M-subband cosine-modulated filter bank have impulse response length $N_a$, and let the synthesis window (synthesis prototype filter) $p_s(n)$ have impulse response length $N_s$. The delay $D$ of the overall system can then be constrained to the range $[M-1,\ N_s+N_a-M+1]$, with system delay $D = 2sM + d$ ($0 \le d \le 2M-1$).
When the analysis window and the synthesis window are equal, that is, when
$$p_a(n) = p_s(n) \quad \text{and} \quad N_a = N_s, \qquad (3)$$
the cosine-modulated filter bank given by (1) and (2) is an orthogonal filter bank, and the matrices $H$ and $F$ (with $[H]_{n,k} = h_k(n)$ and $[F]_{n,k} = f_k(n)$) are orthogonal transform matrices. To obtain a linear-phase filter bank, the window is further required to be symmetric:
$$p_a(2KM-1-n) = p_a(n). \qquad (4)$$
The conditions that the window function must satisfy to guarantee perfect reconstruction for orthogonal and biorthogonal systems are given in the literature (P. P. Vaidyanathan, "Multirate Systems and Filter Banks", Prentice Hall, Englewood Cliffs, NJ, 1993).
The other form of filtering is the MDCT (Modified Discrete Cosine Transform), also called the TDAC (Time Domain Aliasing Cancellation) cosine-modulated filter bank. Its impulse responses are:
$$h_k(n) = p_a(n)\sqrt{\frac{2}{M}}\cos\!\left(\frac{\pi}{M}(k+0.5)\left(n+\frac{M+1}{2}\right)\right) \qquad (5)$$
$$f_k(n) = p_s(n)\sqrt{\frac{2}{M}}\cos\!\left(\frac{\pi}{M}(k+0.5)\left(n+\frac{M+1}{2}\right)\right) \qquad (6)$$
where $0 \le k \le M-1$, $0 \le n \le 2KM-1$, and $K$ is an integer greater than zero. Here $p_a(n)$ and $p_s(n)$ are the analysis window (analysis prototype filter) and the synthesis window (synthesis prototype filter), respectively.
Likewise, when the analysis window and the synthesis window are equal, that is, when
$$p_a(n) = p_s(n), \qquad (7)$$
the cosine-modulated filter bank given by (5) and (6) is an orthogonal filter bank, and the matrices $H$ and $F$ (with $[H]_{n,k} = h_k(n)$ and $[F]_{n,k} = f_k(n)$) are orthogonal transform matrices. To obtain a linear-phase filter bank, the window is further required to be symmetric:
$$p_a(2KM-1-n) = p_a(n). \qquad (8)$$
For perfect reconstruction, the analysis and synthesis windows must then satisfy
$$\sum_{m=0}^{2K-1-2s} p_a(mM+n)\,p_a\big((m+2s)M+n\big) = \delta(s) \qquad (9)$$
where $s = 0, \ldots, K-1$ and $n = 0, \ldots, \frac{M}{2}-1$.
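Condition (9) can be checked numerically for a concrete window. The sketch below does this for the sine window with K = 1 (window length 2M), where the condition reduces to p(n)^2 + p(n+M)^2 = 1. The particular sine-window formula and the names `sine_window` and `pr_sum` are illustrative choices; the text names the sine window without giving its formula.

```python
import math


def sine_window(m):
    """Sine window of length 2M: p(n) = sin(pi * (n + 0.5) / (2M))."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]


def pr_sum(p, m, k_overlap, s, n):
    """Left-hand side of condition (9) for a window p of length 2*K*M."""
    return sum(p[mm * m + n] * p[(mm + 2 * s) * m + n]
               for mm in range(2 * k_overlap - 2 * s))


M = 8
window = sine_window(M)

# For K = 1 only s = 0 occurs, and delta(0) = 1.
lhs_values = [pr_sum(window, M, 1, 0, n) for n in range(M // 2)]

# Symmetry condition (8): p(2KM - 1 - n) = p(n).
symmetric = all(abs(window[2 * M - 1 - n] - window[n]) < 1e-12
                for n in range(2 * M))
```

Every entry of `lhs_values` equals 1 up to rounding, because the second half of the sine window is the cosine counterpart of the first half.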
Relaxing the constraint of (7), that is, removing the requirement that the analysis and synthesis windows be equal, makes the cosine-modulated filter bank a biorthogonal modulated filter bank. Although the biorthogonal modulated filter bank loses the orthogonality of the transform, it can gain other properties of greater practical value. Time-domain analysis has verified that the biorthogonal modulated filter bank given by (5) and (6) still achieves perfect reconstruction provided that
$$\sum_{m=0}^{2K-1-2s} p_s(mM+n)\,p_a\big((m+2s)M+n\big) = \delta(s) \qquad (10)$$
$$\sum_{m=0}^{2K-1-2s} (-1)^m\,p_s(mM+n)\,p_a\big((m+2s)M+(M-n-1)\big) = 0 \qquad (11)$$
where $s = 0, \ldots, K-1$ and $n = 0, \ldots, M-1$.
The analysis and synthesis windows used for filtering in the invention can take any window shape that satisfies the perfect reconstruction condition of the filter bank, such as the sine and KBD windows commonly used in audio coding.
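A small numerical experiment can make the perfect-reconstruction property concrete. The sketch below implements the MDCT of equations (5) and (6) directly (no fast algorithm) with K = 1, identical sine windows for analysis and synthesis, and a hop of M samples with overlap-add. The function names, the block size M = 8 and the test signal are hypothetical; a real codec would use a fast transform and much longer blocks.

```python
import math


def sine_window(m):
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]


def mdct(block, window, m):
    """M coefficients from 2M windowed samples, following equation (5)."""
    scale = math.sqrt(2.0 / m)
    return [scale * sum(window[n] * block[n]
                        * math.cos(math.pi / m * (k + 0.5) * (n + (m + 1) / 2))
                        for n in range(2 * m))
            for k in range(m)]


def imdct(coeffs, window, m):
    """2M windowed output samples from M coefficients, following equation (6)."""
    scale = math.sqrt(2.0 / m)
    return [scale * window[n] * sum(coeffs[k]
                        * math.cos(math.pi / m * (k + 0.5) * (n + (m + 1) / 2))
                        for k in range(m))
            for n in range(2 * m)]


def analysis_synthesis(signal, m):
    """Transform overlapping blocks (hop M) and overlap-add the inverse blocks."""
    window = sine_window(m)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        coeffs = mdct(signal[start:start + 2 * m], window, m)
        for n, value in enumerate(imdct(coeffs, window, m)):
            out[start + n] += value
    return out


M = 8
signal = [math.sin(0.3 * i) + 0.1 * (i % 5) for i in range(4 * M)]
rebuilt = analysis_synthesis(signal, M)
# Only samples covered by two overlapping blocks are fully reconstructed.
max_err = max(abs(a - b) for a, b in zip(signal[M:3 * M], rebuilt[M:3 * M]))
```

The first and last M samples are covered by a single block only, so they still contain time-domain aliasing; in the interior the aliasing terms of neighbouring blocks cancel and `max_err` is at the level of floating-point rounding.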
Next we describe how to apply a wavelet transform to the MDCT coefficients when the signal is processed frame by frame.
Suppose that a time sequence $x(i)$, $i = 0, 1, \ldots, 2M-1$, yields the MDCT coefficients $X(k)$, $k = 0, 1, \ldots, M-1$, after the MDCT. Without loss of generality, $M$ can be assumed to be even. In a practical wavelet or wavelet packet transform the wavelet basis can be either fixed or adaptive, so different wavelet transform techniques can be used.
For a wavelet transform with a fixed wavelet basis, the following overlapped wavelet transform can be applied to the MDCT coefficients. If the coefficients at the block boundaries of each frame still take the values they would have in an unbounded-length wavelet transform, then, by digital filtering principles, the wavelet transform is equivalent to overlapping $N-1$ samples between transform blocks. The wavelet transform matrix is then the $M \times (M+N-1)$ matrix $H_{sub}$, and the wavelet transform can be written as
$$\vec{y} = H_{sub}\cdot\vec{x} \qquad (12)$$
where
$$\vec{x} = [X(0), X(1), X(2), \ldots, X(M+N-1)]^T \qquad (13)$$
$$\vec{y} = [Y_0(0), Y_1(0), Y_0(1), Y_1(1), \ldots, Y_0(M/2-1), Y_1(M/2-1)]^T \qquad (14)$$
A wavelet or wavelet packet transform can be realized by nesting the multiresolution filter bank level by level on the low-frequency and/or high-frequency subbands of the M subbands. The above transform is applied at every node of the nested wavelet decomposition, so that the number of wavelet coefficients in each frame equals the number of time-domain samples. If the wavelet transform is nested $i$ times, then, taking resampling into account, the effective number of overlapped samples is $(2^{i+1}-1)\times(N-1)$.
A similar transform is used in wavelet reconstruction. Let $K_{sub}$ be the finite-length inverse wavelet transform matrix and let $N_1$ be odd; the dimension of $K_{sub}$ is then $\left(M+\frac{N_1-1}{2}\right)\times M$. The inverse wavelet transform can then be written as
$$\hat{\vec{x}} = K_{sub}^T\cdot\vec{y}_{ext} \qquad (16)$$
where
[Figure A0314851400144: equation image, not recoverable from the extracted text]
$$\vec{y}_{ext} = \left[y_0(0), y_1(0), y_0(1), y_1(1), \ldots, y_0\!\left(\tfrac{M}{2}+\tfrac{N_1-1}{4}\right), y_1\!\left(\tfrac{M}{2}+\tfrac{N_1-1}{4}\right)\right]^T \qquad (19)$$
If $N_1$ is even, a similar matrix form is obtained. A multi-level inverse wavelet transform is realized by applying the above transform at each reconstruction node.
For a wavelet transform that adapts in the frequency domain, a biorthogonal wavelet basis with symmetric extension of the data can be used to solve the problem of filtering finite-length data. When an orthogonal wavelet basis is used, overlapped windowing of the data can solve the finite-length filtering problem, at the cost of some increase in data volume. Alternatively, boundary filters can be designed to achieve perfect reconstruction when the filters are switched, but the complexity grows rapidly with the adaptivity of the filters.
Below, the simplest case, the Haar wavelet basis, is taken as an example to describe a concrete method of multiresolution analysis of the MDCT coefficients.
The scaling coefficients of the Haar wavelet basis are $\left[\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right]$ and the wavelet coefficients are $\left[\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right]$. Fig. 3 shows a Haar wavelet transform applied to part of the MDCT coefficients.
There, the MDCT coefficients are split into two parts: the low- and mid-frequency part $X_1(k)$, $k = 0, \ldots, k_1$ (no wavelet transform is applied), and the high-frequency part (to which the Haar wavelet transform is applied). After the wavelet transform, the coefficients $X_2(k)$, $X_3(k)$, $X_4(k)$, $X_5(k)$, $X_6(k)$ and $X_7(k)$ in different time-frequency regions are obtained. In the figure, $H_0$ denotes low-pass filtering (filter coefficients $\left[\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right]$), $H_1$ denotes high-pass filtering (filter coefficients $\left[\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right]$), and "↓2" denotes downsampling by a factor of 2. The corresponding time-frequency plane partition is shown in Fig. 4.
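The partial Haar analysis can be sketched as follows. Because Fig. 3 is not available in the text, the tree used here is an assumption: the coefficients below index k1 are left untouched, and the remaining high-frequency coefficients pass through a fixed number of Haar stages, each stage re-splitting the low-pass branch. The orthonormal 1/sqrt(2) Haar filters are assumed, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(values):
    """One Haar stage: filter with H0/H1 and downsample by 2."""
    low = [(values[i] + values[i + 1]) / SQRT2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / SQRT2 for i in range(0, len(values), 2)]
    return low, high


def inverse_haar_step(low, high):
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) / SQRT2)
        out.append((lo - hi) / SQRT2)
    return out


def analyze(coeffs, k1, levels):
    """Keep coeffs[:k1]; apply `levels` Haar stages to coeffs[k1:].

    The length of coeffs[k1:] must be divisible by 2**levels.
    """
    kept, band = coeffs[:k1], coeffs[k1:]
    details = []
    for _ in range(levels):
        band, high = haar_step(band)
        details.append(high)
    return kept, band, details


def synthesize(kept, band, details):
    for high in reversed(details):
        band = inverse_haar_step(band, high)
    return kept + band


# Hypothetical frame of 16 MDCT coefficients; the upper 8 are analyzed.
coeffs = [math.cos(0.7 * k) for k in range(16)]
kept, band, details = analyze(coeffs, k1=8, levels=2)
rebuilt = synthesize(kept, band, details)
energy_in = sum(c * c for c in coeffs[8:])
energy_out = sum(c * c for c in band) + sum(c * c for d in details for c in d)
```

Since the Haar stage is orthonormal, the round trip is exact up to rounding and the energy of the high-frequency part is preserved, which is why no energy correction is needed in this floating-point form.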
To improve computational efficiency, the Haar wavelet transform of the filter coefficients can be carried out in the encoder with additions and subtractions only, and the inverse transform in the decoder with subtractions, additions and shift operations. In this case the signal energy is amplified during encoding, and quantization must take the amplified signal energy into account. Conversely, the inverse Haar wavelet transform can be carried out in the decoder with additions and subtractions only, and the forward Haar wavelet transform of the filter coefficients in the encoder with subtractions, additions and shift operations; the signal energy is then reduced, and quantization must take the reduced signal energy into account. For other wavelet bases, a lifting-based wavelet computation can be used to implement the wavelet transform in integer arithmetic and reduce the computational complexity.
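The first variant (add/subtract in the encoder, add/subtract plus shift in the decoder) can be illustrated on one pair of integer coefficients. This is a minimal sketch assuming integer-valued inputs; the function names are hypothetical.

```python
def haar_encode_int(a, b):
    """Unnormalized Haar pair: only an addition and a subtraction."""
    return a + b, a - b


def haar_decode_int(s, d):
    """Inverse: s + d = 2a and s - d = 2b, so a 1-bit right shift recovers a, b."""
    return (s + d) >> 1, (s - d) >> 1


a, b = -3, 5
s, d = haar_encode_int(a, b)
decoded = haar_decode_int(s, d)

# The encoder output carries twice the energy of the input pair,
# which is the amplification that quantization has to account for.
energy_ratio = (s * s + d * d) / (a * a + b * b)
```

Because s + d and s - d are always even, the shift is exact, also for negative values; the factor of 2 in energy corresponds to omitting the 1/sqrt(2) normalization of the orthonormal Haar filters.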
With a different wavelet transform structure, other, similar time-frequency plane partitions are obtained. The time-frequency plane partition used in signal analysis can thus be adjusted as required to meet different time- and frequency-resolution requirements.
Example
The following example illustrates a specific implementation of the invention and does not limit the scope of the claims, since skilled researchers or engineers can realize similar designs on the basis of the invention.
The encoding platform of the invention is shown in Fig. 6. An input audio signal is sampled at 44.1 kHz and divided into frames of 1024 samples (about 23.22 ms). The coding block type of the current frame is first determined from its transient measure (601), and a different filter bank structure is used for each block type. According to the selected filter bank configuration, a psychoacoustic model (603) exploits the masking effect of the human auditory system to remove imperceptible content from the input frame and, at the same time, determines the bit budget (609) for coding the current frame. The filter bank then performs the time-frequency mapping (605, 607). Finally, the preprocessed data are quantized (611) and coded (613), with the quantization and coding methods matched to the selected transform configuration, and the index values and side information are packed into the bit stream (613). Details of the filter bank implementation and the switching method are described in the following steps:
Step 1: decompose the input audio data into frames (1024 samples each);
Step 2: evaluate the transient measure of the current input signal frame:
$$Z = \frac{\sum_{j=1}^{1024}\left|s_j - \frac{1}{1024}\sum_{j=1}^{1024} s_j\right|^2 + 0.618}{\sum_{j=1}^{1024}\left|s_j\right|^2 + 0.618}$$
Step 3: determine the filter bank structure of the current signal frame according to indicators such as the Z value, historical information and coding gain;
Step 4: filter the input signal frame with the equal-bandwidth cosine-modulated filter bank;
Step 5: if the current signal is a fast-changing signal, perform multiresolution analysis with the Haar wavelet to adjust the time-frequency resolution of the coefficients.
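Steps 1 and 2 can be sketched as follows. This is a hedged illustration only: the names `split_frames` and `transient_measure` are hypothetical, discarding an incomplete trailing frame is an assumption, and the constant 0.618 is read here as being added to both the numerator and the denominator of Z. The decision rule of step 3 is not specified numerically in the text, so it is not modelled.

```python
def split_frames(samples, frame_len=1024):
    """Step 1: cut the signal into full frames (assumption: the tail is dropped)."""
    usable = len(samples) - len(samples) % frame_len
    return [samples[i:i + frame_len] for i in range(0, usable, frame_len)]


def transient_measure(frame, lam=0.618):
    """Step 2: Z = (mean-removed energy + lam) / (total energy + lam)."""
    mean = sum(frame) / len(frame)
    mean_removed_energy = sum((s - mean) ** 2 for s in frame)
    total_energy = sum(s * s for s in frame)
    return (mean_removed_energy + lam) / (total_energy + lam)


frames = split_frames(list(range(2500)))
z_constant = transient_measure([1.0] * 1024)         # pure DC frame
z_alternating = transient_measure([1.0, -1.0] * 512)  # zero-mean frame
```

Under this reading, a frame that is pure DC gives a Z close to zero, while a zero-mean frame gives Z equal to 1.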
To further improve coding efficiency, N selectable wavelet basis structures can be implemented, and during encoding the structure of the filter bank is selected according to the Z value and historical information. For example, for a slowly varying signal, no multiresolution analysis is performed after cosine-modulated filtering. For a fast-changing signal, different wavelet basis structures can be cascaded after cosine-modulated filtering, according to indicators such as the position and intensity of the signal's time-frequency changes and the masking characteristics of the human ear.
Alternatively, an optimization strategy can be used during encoding: the coding gain is computed in real time, and the wavelet basis structure with the largest coding gain is selected.
The decoding implementation platform of the present invention is shown in Figure 7. The compressed bit stream is processed by Huffman decoding 701, inverse quantization 703, multiresolution inverse filtering 705 and the IMDCT 707 to obtain the decoded audio signal output.
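Selecting the structure with the largest coding gain might look like the sketch below. The text does not define coding gain, so this example assumes a common transform-coding measure (arithmetic mean over geometric mean of the coefficient energies); the two candidate structures and all names (`coding_gain`, `haar_level`, `pick_structure`, `CANDIDATES`) are hypothetical.

```python
import math


def coding_gain(coeffs, eps=1e-12):
    """Assumed measure: arithmetic mean / geometric mean of coefficient energies."""
    energies = [c * c + eps for c in coeffs]
    arithmetic = sum(energies) / len(energies)
    geometric = math.exp(sum(math.log(e) for e in energies) / len(energies))
    return arithmetic / geometric


def haar_level(coeffs):
    """One orthonormal Haar level over adjacent coefficient pairs."""
    r = 1.0 / math.sqrt(2.0)
    approx = [(coeffs[i] + coeffs[i + 1]) * r for i in range(0, len(coeffs), 2)]
    detail = [(coeffs[i] - coeffs[i + 1]) * r for i in range(0, len(coeffs), 2)]
    return approx + detail


# Candidate structures: no multiresolution stage, or one Haar level.
CANDIDATES = {"none": lambda c: list(c), "haar": haar_level}


def pick_structure(coeffs, candidates=CANDIDATES):
    """Return the name of the candidate whose output has the largest gain."""
    return max(candidates, key=lambda name: coding_gain(candidates[name](coeffs)))


flat_choice = pick_structure([1.0] * 8)
impulse_choice = pick_structure([1.0] + [0.0] * 7)
```

With this measure, a flat coefficient vector is compacted by the Haar stage, whereas a single isolated coefficient is already compact and is better left untransformed.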

Claims (11)

1. A low-delay, adaptive multiresolution filter bank for perceptual audio coding/decoding, characterized in that cosine-modulated filtering and multiresolution analysis are used to construct filter structures with several different time-frequency divisions, and the filter structure can be switched adaptively in real time according to changes in the signal currently being encoded.
2. The filter bank according to claim 1, characterized in that the filter bank structure used for encoding is switched adaptively according to the transient measure of the current signal frame,
$$Z = \frac{\sum_{j=1}^{N}\left|s_j - \frac{1}{N}\sum_{j=1}^{N} s_j\right|^2 + \lambda}{\sum_{j=1}^{N}\left|s_j\right|^2 + \lambda}$$
wherein:
for a slowly varying signal, an equal-bandwidth cosine-modulated filter bank is used;
for a fast-changing signal, the coefficients obtained by filtering with the same equal-bandwidth cosine-modulated filter bank as for slowly varying signals are further subjected to multiresolution analysis, thereby adjusting the time-frequency resolution of the different components.
3. The filter bank according to claim 2, characterized in that the cosine-modulation-based multiresolution time-frequency division filter bank can, according to
$$h_k(n) = p_a(n)\sqrt{\frac{2}{M}}\cos\left(\frac{\pi}{M}(k+0.5)\left(n+\frac{M+1}{2}\right)\right) \tag{5}$$
$$f_k(n) = p_s(n)\sqrt{\frac{2}{M}}\cos\left(\frac{\pi}{M}(k+0.5)\left(n+\frac{M+1}{2}\right)\right) \tag{6}$$
where 0≤k<M-1, 0≤n<2KM-1, and K is an integer greater than zero, construct multiresolution filter structures that satisfy different performance requirements.
4. The filter bank according to claim 2, characterized in that the signal-adaptive adjustment of the multiresolution filter structure is realized by applying a wavelet transform to the frequency-domain coefficients obtained by cosine-modulated filtering.
5. The filter bank according to claim 2, characterized in that the multiresolution filter structure satisfying different performance requirements can be adjusted adaptively according to the statistical properties, masking characteristics and/or time-frequency characteristics of the current signal.
6. The filter bank according to claim 2, characterized in that the cosine-modulation-based multiresolution time-frequency division filter bank, when transforming/filtering the input signal, uses cosine-modulated filter banks of different time-frequency resolution in different frequency intervals to obtain a multiresolution time-frequency division, and makes the system satisfy perfect reconstruction, the perfect-reconstruction conditions being:
$$\sum_{m=0}^{2K-1-2s} p_s(mM+n)\, p_a\big((m+2s)M+n\big) = \delta(s) \tag{10}$$
$$\sum_{m=0}^{2K-1-2s} (-1)^m\, p_s(mM+n)\, p_a\big((m+2s)M+(M-n-1)\big) = 0 \tag{11}$$
where s = 0, …, K-1 and n = 0, …, M-1.
7. The filter bank according to claim 2, characterized in that the cosine-modulation-based multiresolution time-frequency division filter bank uses the cosine-modulated filter bank technique
$$h_k(n) = 2\,p_a(n)\cos\left(\frac{\pi}{M}(k+0.5)\left(n-\frac{D}{2}\right)+\theta_k\right), \quad n = 0, 1, \ldots, N_h-1 \tag{1}$$
$$f_k(n) = 2\,p_s(n)\cos\left(\frac{\pi}{M}(k+0.5)\left(n-\frac{D}{2}\right)+\theta_k\right), \quad n = 0, 1, \ldots, N_f-1 \tag{2}$$
where 0≤k<M-1, 0≤n<2KM-1, K is an integer greater than zero, and $\theta_k = (-1)^k \frac{\pi}{4}$, to construct a specific multiresolution analysis structure that meets the requirements of compressing audio signals by exploiting the statistical redundancy of the signal and the masking characteristics of the human auditory system.
8. The filter bank according to claim 2, characterized in that the multiresolution filter bank based on the cosine-modulation technique applies a wavelet transform to the cosine-modulated filter coefficients to adjust the time-frequency resolution of the system, so that low-frequency components have higher frequency resolution, high-frequency components have higher time resolution, and the time-frequency resolution of the different signal components is finely adjustable; its inverse wavelet transform can be expressed as:
$$\hat{\vec{x}} = K_{sub}^{T} \cdot \vec{y}_{ext} \tag{16}$$
where
$$\vec{y}_{ext} = \left[\, y_0(0),\ y_1(0),\ y_0(1),\ y_1(1),\ \cdots,\ y_0(M/2 + N_1 - 1/4),\ y_1(M/2 + N_1 - 1/4) \,\right]^{T} \tag{19}$$
If $N_1$ is even, a similar matrix form can be obtained. A multi-level inverse wavelet transform can be realized by carrying out the above transform at each reconstruction node.
9. The filter bank according to claim 2 or 8, characterized in that, in the multiresolution filter bank technique composed of cosine-modulated filter banks of different resolutions, filtering with higher time resolution is realized by applying a wavelet transform to the outputs of the filters with higher frequency resolution, while ensuring that the perfect-reconstruction property of the system is not destroyed.
10. The filter bank according to claim 9, characterized in that the Haar wavelet transform applied to the outputs of the filters with higher frequency resolution can be realized by simple addition/subtraction and shift operations, and the transform between different resolutions is finely adjustable according to the properties of the current signal.
11. The filter bank according to claim 10, characterized in that the perfect-reconstruction property of the system's filter structure is guaranteed not to be destroyed when switching between different resolutions.
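The perfect-reconstruction conditions (10) and (11) of claim 6 can be checked numerically for a given pair of prototype windows. The sketch below is an illustration under stated assumptions, not part of the claims: it takes K = 1 and uses the sine window as both the analysis prototype p_a and the synthesis prototype p_s, a choice the text does not prescribe. The names `sine_window` and `pr_residual` are hypothetical.

```python
import math


def sine_window(M):
    """Sine prototype of length 2M (assumed example window, K = 1)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]


def pr_residual(p_a, p_s, M, K=1):
    """Largest absolute deviation from conditions (10) and (11)."""
    worst = 0.0
    for s in range(K):
        terms = range(2 * K - 2 * s)  # m = 0, ..., 2K - 1 - 2s
        for n in range(M):
            c10 = sum(p_s[m * M + n] * p_a[(m + 2 * s) * M + n] for m in terms)
            c11 = sum((-1) ** m * p_s[m * M + n]
                      * p_a[(m + 2 * s) * M + (M - n - 1)] for m in terms)
            target = 1.0 if s == 0 else 0.0  # delta(s)
            worst = max(worst, abs(c10 - target), abs(c11))
    return worst


M = 16
w = sine_window(M)
sine_residual = pr_residual(w, w, M)
rect_residual = pr_residual([1.0] * (2 * M), [1.0] * (2 * M), M)
```

For the sine window the residual is at the level of floating-point rounding, since condition (10) reduces to sin² + cos² = 1 and condition (11) to sin·cos − cos·sin = 0. A rectangular window of ones violates condition (10), which shows up as a residual of 1.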
CN03148514A 2003-07-01 2003-07-01 Low-time-delay adaptive multi-resolution filter group for perception voice coding/decoding Pending CN1460992A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN03148514A CN1460992A (en) 2003-07-01 2003-07-01 Low-time-delay adaptive multi-resolution filter group for perception voice coding/decoding

Publications (1)

Publication Number Publication Date
CN1460992A true CN1460992A (en) 2003-12-10

Family

ID=29591432

Country Status (1)

Country Link
CN (1) CN1460992A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101010723B (en) * 2004-08-25 2011-05-18 杜比实验室特许公司 Audio frequency signal processing method and device
CN101241701B (en) * 2004-09-17 2012-06-27 广州广晟数码技术有限公司 Method and equipment used for audio signal decoding
CN101853660B (en) * 2004-10-20 2013-07-03 弗劳恩霍夫应用研究促进协会 Diffuse sound envelope shaping for binaural cue coding schemes and the like
CN101853660A (en) * 2004-10-20 2010-10-06 弗劳恩霍夫应用研究促进协会 The diffuse sound shaping that is used for two-channel keying encoding scheme and similar scheme
CN101325060B (en) * 2007-06-14 2012-10-31 汤姆逊许可公司 Method and apparatus for encoding and decoding an audio signal using adaptively switched temporal resolution in the spectral domain
CN101878504B (en) * 2007-08-27 2013-12-04 爱立信电话股份有限公司 Low-complexity spectral analysis/synthesis using selectable time resolution
CN101609684B (en) * 2008-06-19 2012-06-06 展讯通信(上海)有限公司 Post-processing filter for decoding voice signal
CN102265338A (en) * 2009-03-24 2011-11-30 华为技术有限公司 Method and device for switching signal delay
WO2010108315A1 (en) * 2009-03-24 2010-09-30 华为技术有限公司 Method and device for switching a signal delay
CN105261373A (en) * 2015-09-16 2016-01-20 深圳广晟信源技术有限公司 Self-adaptive grid construction method and device used for bandwidth extended coding
CN109863555A (en) * 2016-07-29 2019-06-07 弗劳恩霍夫应用研究促进协会 It is reduced before partially synthetic using the Time-domain aliasing of the non-homogeneous filter group of spectrum analysis
CN109863555B (en) * 2016-07-29 2023-09-08 弗劳恩霍夫应用研究促进协会 Method for processing an audio signal and audio processor
CN112037759A (en) * 2020-07-16 2020-12-04 武汉大学 Anti-noise perception sensitivity curve establishing and voice synthesizing method
CN112037759B (en) * 2020-07-16 2022-08-30 武汉大学 Anti-noise perception sensitivity curve establishment and voice synthesis method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20031210

C20 Patent right or utility model deemed to be abandoned or is abandoned