
CN105261373B - Adaptive grid configuration method and apparatus for bandwidth extension encoding

Info

Publication number: CN105261373B
Application number: CN201510590039.8A
Authority: CN
Inventors: 闫建新; 王磊
Original assignee: SHENZHEN GUANGSHENG XINYUAN TECHNOLOGY Co Ltd
Current assignee: Guangdong Guangsheng Research And Development Institute Co ltd
Priority date: 2015-09-16
Filing date: 2015-09-16
Publication date: 2019-01-08
Anticipated expiration: 2035-09-16
Also published as: CN105261373A

Abstract

The present invention relates to an adaptive grid configuration method and apparatus for bandwidth extension encoding. The method includes: S1, selecting a frequency resolution based on a transient analysis of the input mono audio signal and applying adaptive multi-resolution filtering to that signal to obtain an optimal time-frequency filtered signal; S2, performing transient detection and localization on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, carrying out adaptive grid configuration in the frequency and time directions to obtain the best time-frequency grid at the current bit rate. The invention further relates to bandwidth extension encoding and decoding methods based on this method. By performing adaptive multi-resolution filtering and adaptive time-frequency grid construction according to the characteristics of the audio signal and the bit rate available for the high-frequency signal, the invention can significantly improve both the coding efficiency of the high-frequency part of a digital audio signal and the sound quality of that part.

Description

Adaptive grid configuration method and apparatus for bandwidth extension encoding

Technical field

The present invention relates to digital audio encoding and decoding technology, and more specifically to an adaptive grid configuration method and apparatus for bandwidth extension encoding, and to bandwidth extension encoding and decoding methods that use this method.

Background art

Traditional perceptual audio coding techniques (DRA, AAC, MP3, etc.) typically operate at 96 to 128 kbps for stereo, and their coding quality shows clearly audible distortion at 64 kbps stereo and below. FM broadcasting applications typically use coding bit rates of 48 to 64 kbps for stereo, at which the subjective sound quality of traditional perceptual audio coding cannot meet FM broadcasting requirements.

Bandwidth extension (BandWidth Extension, BWE) coding of digital audio signals was proposed to address this. Many bandwidth extension coding techniques exist today, and their performance varies widely. The bandwidth extension coding techniques that have been published and adopted in international standards mainly comprise the following two coding algorithms:

The first is Spectral Band Replication (SBR) coding, described in ISO/IEC 14496-3 (MPEG-4). Fig. 1 shows a block diagram of the principle of SBR coding. SBR is a frequency-domain algorithm. Its coding principle is as follows: each frame of the signal passes through a 64-subband quadrature mirror filter (QMF) bank, yielding 64 uniform subbands of 32 samples each; a reasonable time-frequency grid is derived from the transient characteristics of the current signal; one energy value is computed for each grid cell and Huffman coded. The algorithm also includes tonality detection and transmits additional parameters for individual sinusoidal components. At the decoder, a normalized low-frequency part is first selected and copied to the high band as the high-frequency detail; the high-frequency envelope is then adjusted using the transmitted energy information; the sinusoidal components are then added; finally the restored high-frequency signal is filtered. The low and high bands then pass through a 64-band synthesis QMF bank to reconstruct the PCM digital audio signal.

The main drawback of MPEG SBR coding is its relatively fixed time-frequency segmentation. At a 48 kHz sampling rate, the 64-band QMF gives a maximum frequency resolution of 375 Hz (24 kHz/64); with 2048 samples per frame, the maximum time resolution is about 1.3 ms (64/48000 s). Because audio signals are highly complex, this algorithm sometimes cannot meet the precision required for signal analysis.
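The figures quoted for the 64-band QMF can be checked with a few lines of arithmetic. The sketch below assumes a uniform, critically sampled QMF bank; the function name `qmf_resolution` is illustrative and does not come from the SBR standard.

```python
def qmf_resolution(sample_rate_hz: float, num_subbands: int) -> tuple[float, float]:
    """Return (subband bandwidth in Hz, time step in ms) of a uniform QMF bank.

    Assumes the bank splits 0..Fs/2 into equal subbands and produces one
    sample per subband for every `num_subbands` input samples.
    """
    bandwidth_hz = (sample_rate_hz / 2) / num_subbands
    time_step_ms = 1000.0 * num_subbands / sample_rate_hz
    return bandwidth_hz, time_step_ms


bandwidth_hz, time_step_ms = qmf_resolution(48_000, 64)
samples_per_subband = 2048 // 64  # samples per subband in one 2048-sample frame
print(bandwidth_hz, round(time_step_ms, 2), samples_per_subband)  # 375.0 1.33 32
```

Both resolutions are fixed once the number of subbands is fixed, which is the limitation the paragraph above describes.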

The second is a simple bandwidth extension technique included in the 3GPP AMR-WB+ coding method. It is a time-domain algorithm whose main coding principle is as follows: the input signal is split into low-frequency and high-frequency time-domain signals of equal bandwidth; the low-frequency (LF) part is passed through LPC analysis filtering to obtain the residual of the low-frequency signal, which is then passed through the high-frequency LPC synthesis filter to simulate the high-frequency detail signal; this is compared with the actual high-frequency signal S_HF(n) to obtain a gain vector for the high-frequency envelope (energy), with one gain value per subframe; finally, the gain vector is further corrected for gain consistency at the junction between the low and high bands, and the corrected gain vector is encoded. What is transmitted to the decoder therefore comprises the corrected gain vector and the high-frequency LPC coefficients. AMR-WB+ high-frequency decoding is essentially the inverse of the encoding process.

The bandwidth extension coding of 3GPP AMR-WB+ has the following problems: (1) high-frequency coding is performed in the time domain, so a higher frequency resolution cannot be obtained; in effect the method has only a single high-frequency region; (2) the start frequency of high-frequency coding is fixed at Fs/4, so for a 48 kHz sampling frequency the high-frequency coding starts at 12 kHz; (3) harmonic signals in the high band cannot be restored accurately; (4) the envelope of the high-frequency signal is not restored accurately enough.

There are also bandwidth extension coding techniques whose time-frequency transform is a conventional FFT. The high band is divided into several regions in the frequency domain and the spectral energy of each region is encoded, so each frame offers only a single time resolution with multiple frequency regions. Such FFT-based high-frequency reconstruction has high frequency resolution but too low a time resolution; when the input is a rapidly varying audio signal, the reconstructed high-frequency signal cannot track the changes in the original audio signal well.

Intensity stereo coding in digital audio coding can also be regarded as a special bandwidth extension technique. It exploits the insensitivity of the human auditory system to high-frequency detail: the high-frequency parts of the channels of a stereo or 5.1 surround signal are downmixed into a single channel, which after normalization serves as the high-frequency detail signal for all channels, while the envelope of each channel's high-frequency signal (the energy in the high-frequency critical bands) must still be encoded and transmitted.

Summary of the invention

The technical problem to be solved by the present invention is, in view of the above drawbacks of the prior art, to provide an adaptive grid configuration method and apparatus for bandwidth extension encoding, together with bandwidth extension encoding and decoding methods using this method, so as to improve the coding efficiency of the high-frequency part of digital audio signals and the sound quality of the high-frequency signal.

The technical solution adopted by the invention to solve this problem is an adaptive grid configuration method for bandwidth extension encoding, comprising the following steps:

S1, selecting a frequency resolution based on a transient analysis of the input mono audio signal, and applying adaptive multi-resolution QMF filtering to the input mono audio signal to obtain an optimal time-frequency filtered signal;

S2, performing transient detection and localization on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, carrying out adaptive grid configuration in the frequency and time directions to obtain the best time-frequency grid at the current bit rate.

According to one embodiment of the present invention, step S1 further comprises:

For transient signals, selecting a coarse frequency resolution and a high time resolution for filtering;

For steady-state signals, selecting a fine frequency resolution and a low time resolution for filtering;

For other audio signals, selecting an intermediate frequency resolution and an intermediate time resolution for filtering.

When the high-band coding bit rate is low, the selected frequency resolution is reduced.

According to one embodiment of the present invention, the adaptive grid configuration in the frequency direction in step S2 further comprises selecting different grid configurations according to the frequency characteristics of the high-band part of the input mono audio signal, specifically:

For general audio signals, the frequency grid gradually reduces its frequency resolution as frequency rises within the high band, so that the frequency grid is consistent with the critical bands of the human ear;

When the high band contains tonal signals, the frequency resolution of the grid is increased appropriately relative to the general audio signal case, while still taking the critical bands into account;

The adaptive grid configuration in the time direction in step S2 further comprises: according to the positions at which one or more transients occur in the input mono audio signal and the transient characteristics of each subband signal, dividing the time direction into multiple time intervals, each interval representing one grid cell.

According to one embodiment of the present invention, step S2 further comprises:

Correcting the grid construction obtained in the frequency and time directions based on the configured high-band coding bit rate, to obtain the best time-frequency grid at the current bit rate.

To solve its technical problem, the present invention also proposes an adaptive grid configuration apparatus for bandwidth extension encoding, comprising:

An adaptive multi-resolution filtering module, configured to select a frequency resolution based on a transient analysis of the input mono audio signal and to apply adaptive multi-resolution QMF filtering to the input mono audio signal to obtain an optimal time-frequency filtered signal;

A time-frequency grid construction module, configured to perform transient detection and localization on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, to carry out adaptive grid configuration in the frequency and time directions to obtain the best time-frequency grid at the current bit rate.

According to one embodiment of the present invention, the adaptive multi-resolution filtering module is further configured to:

According to one embodiment of the present invention, the adaptive grid configuration in the frequency direction performed by the time-frequency grid construction module further comprises selecting different grid configurations according to the frequency characteristics of the high-band part of the input mono audio signal, specifically:

The adaptive grid configuration in the time direction performed by the time-frequency grid construction module further comprises: according to the positions at which one or more transients occur in the input mono audio signal and the transient characteristics of each subband signal, dividing the time direction into multiple time intervals, each interval representing one grid cell.

To solve its technical problem, the present invention also proposes a bandwidth extension encoding method, comprising the following steps:

S2, performing transient detection and localization on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, carrying out adaptive grid configuration in the frequency and time directions to obtain the best time-frequency grid at the current bit rate;

S3, performing high-frequency parameter coding with the time-frequency grid cell as the unit;

S4, multiplexing the coding parameters and outputting the high-band bitstream, the coding parameters including the multi-resolution filter selection parameter and the time-frequency grid parameters.

To solve its technical problem, the present invention also proposes a bandwidth extension decoding method, comprising the following steps:

S1, demultiplexing the input high-band bitstream produced by the aforementioned bandwidth extension encoding method to obtain the coding parameters, the coding parameters including the multi-resolution filter selection parameter and the time-frequency grid parameters;

S2, performing high-frequency parameter decoding based on the time-frequency grid to restore the high-frequency subband signals;

S3, according to the multi-resolution filter selection parameter, applying to the high-frequency subband signals and the decoded low-frequency subband signals the multi-resolution synthesis corresponding to the encoder side, and outputting the full-band mono audio signal.

The adaptive grid configuration method and apparatus for bandwidth extension encoding of the present invention perform adaptive multi-resolution filtering and adaptive time-frequency grid construction (Adaptive multi-resolution Filtering & Adaptive time-frequency Gridding, AFAG) according to the characteristics of the audio signal and the bit rate available for the high-frequency signal. The resulting best time-frequency grid benefits subsequent bandwidth extension modules, such as high-frequency parameter coding performed per grid cell, and noticeably improves the coding efficiency of the high-frequency part of digital audio signals and the sound quality of the high-frequency signal. Digital audio codecs built on the bandwidth extension encoding and decoding methods of the invention can be used in low-bit-rate audio coding fields such as audio for satellite HDTV and high-quality audio broadcasting.

Brief description of the drawings

The invention is further described below with reference to the drawings and embodiments. In the drawings:

Fig. 1 is a block diagram of the existing SBR coding method;

Fig. 2 is a flow chart of the adaptive grid configuration method for bandwidth extension encoding according to one embodiment of the invention;

Fig. 3 is a logical block diagram of the adaptive grid configuration apparatus for bandwidth extension encoding according to one embodiment of the invention;

Fig. 4 is a block diagram of the bandwidth extension encoding method according to one embodiment of the invention;

Fig. 5 is a block diagram of the bandwidth extension decoding method according to one embodiment of the invention;

Fig. 6 is a block diagram of the AFAG-based SBR encoding application example;

Fig. 7 is a block diagram of the AFAG-based SBR decoding application example;

Fig. 8 is a schematic diagram of the time-frequency resolution of the 128-subband QMF;

Fig. 9 is a schematic diagram of the time-frequency resolution of the 64-subband QMF;

Fig. 10 is a schematic diagram of the time-frequency resolution of the 32-subband QMF;

Fig. 11 is a schematic diagram of the time-frequency resolution of the 16-subband QMF;

Fig. 12 is a schematic diagram of the general frequency grid construction of the 64-subband QMF;

Fig. 13 is a schematic diagram of the frequency grid construction of the 64-subband QMF when subbands 16 to 23 contain tonal components;

Fig. 14 is a schematic diagram of the general time grid construction of the 64-subband QMF;

Fig. 15 is a schematic diagram of the best time-frequency grid construction of the 64-subband QMF under a constrained bit rate.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it.

Fig. 2 shows a flow chart of an adaptive grid configuration method 100 for bandwidth extension encoding according to one embodiment of the invention. As shown in Fig. 2, the method 100 includes the following steps:

Step S110: select a frequency resolution based on a transient analysis of the input mono audio signal, and apply adaptive multi-resolution QMF filtering to the input mono audio signal to obtain an optimal time-frequency filtered signal.

In this step, the transient behavior of the input mono audio signal is first analyzed in real time. A frequency resolution is then selected according to the steady-state/transient characteristics found, so that the best filter bank (QMF) among the multi-resolution filter banks is chosen to filter the audio signal and output the optimal time-frequency filtered signal. In general, the strategy for selecting the adaptive multi-resolution filtering from the transient behavior of the input mono audio signal is as follows:

For transient signals, a coarse frequency resolution and a high time resolution may be selected for filtering;

For steady-state signals, a fine frequency resolution and a low time resolution may be selected for filtering;

For other audio signals, an intermediate frequency resolution and an intermediate time resolution may be selected for filtering.

Further, the limited bit rate of high-frequency bandwidth extension coding must be considered. If the total bit rate for coding the audio signal is low, so that the bit rate for coding the high-band signal is low (that is, few bits are available for the high band), the frequency resolution of the high-band filtering should be reduced appropriately. In other words, the frequency resolution determined from the transient characteristics of the input audio signal alone may be further reduced as appropriate.
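The patent does not say how the transient analysis of step S110 is carried out. One common approach, shown here only as a minimal sketch, compares short-time energies within a frame. The function name `classify_frame`, the number of blocks and both thresholds are assumptions chosen for illustration.

```python
def classify_frame(samples: list[float], num_blocks: int = 8,
                   transient_ratio: float = 8.0, steady_ratio: float = 2.0) -> str:
    """Label a frame 'transient', 'steady' or 'other' from its energy profile.

    Hypothetical detector: thresholds are illustrative, not from the patent.
    """
    block_len = len(samples) // num_blocks
    # Short-time energy of each block; the small offset avoids division by zero.
    energies = [
        sum(x * x for x in samples[i * block_len:(i + 1) * block_len]) + 1e-12
        for i in range(num_blocks)
    ]
    # A sharp attack shows up as a large energy jump between adjacent blocks.
    max_jump = max(energies[i + 1] / energies[i] for i in range(num_blocks - 1))
    if max_jump >= transient_ratio:
        return "transient"
    # A nearly flat energy profile is treated as steady state.
    if max(energies) / min(energies) <= steady_ratio:
        return "steady"
    return "other"


print(classify_frame([0.0] * 32 + [1.0] * 32))  # transient
print(classify_frame([1.0] * 64))               # steady
```

The returned label would then drive the choice among coarse, intermediate and fine frequency resolution described in the strategy above.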

Step S120: perform transient detection and localization on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, carry out adaptive grid configuration in the frequency and time directions to obtain the best time-frequency grid at the current bit rate.

The construction of the time-frequency grid depends on the specific positions of transients within a frame, and even on the transient analysis of each filtered subband signal. It must also take into account the bit rate allocated to the high band and the critical-band characteristics of the human ear. The basic strategy for adaptive time-frequency grid construction therefore has two parts: grid configuration in the time direction, i.e. combining samples within the same frequency subband; and grid configuration in the frequency direction, i.e. combining different frequency subbands.

The adaptive grid configuration strategy in the frequency direction selects different grid configurations mainly according to the frequency characteristics of the high-band part of the input mono audio signal. Specifically: for general audio signals, the frequency grid gradually reduces its frequency resolution as frequency rises within the high band, so that the frequency grid is consistent with the critical bands of the human ear; when the high band contains tonal signals, the frequency resolution of the grid should be increased appropriately relative to the general audio signal case, while still taking the critical bands into account. The adaptive grid configuration in the time direction divides the time direction into multiple time intervals, mainly according to the positions at which one or more transients occur in the input mono audio signal and the transient characteristics of each subband signal; each interval represents one grid cell.
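The time-direction part can be illustrated with a minimal sketch that places a grid border at each detected transient. The function name `time_grid` and the use of half-open slot intervals are assumptions; the patent only states that the frame is divided into intervals according to the transient positions.

```python
def time_grid(num_slots: int, transient_slots: list[int]) -> list[tuple[int, int]]:
    """Split time slots [0, num_slots) into half-open intervals.

    A new interval starts at every transient position (hypothetical rule);
    positions outside the frame interior are ignored.
    """
    borders = sorted({0, num_slots, *(t for t in transient_slots if 0 < t < num_slots)})
    return list(zip(borders[:-1], borders[1:]))


print(time_grid(32, [10, 20]))  # [(0, 10), (10, 20), (20, 32)]
print(time_grid(32, []))        # [(0, 32)]
```

A frame with no transient collapses to a single interval, which matches the low time resolution that steady-state signals need.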

In addition, the time-frequency grid construction computed above from the characteristics of the current high-band signal is also constrained by the bit rate of high-frequency bandwidth extension coding. The grid construction obtained in the frequency and time directions must therefore be corrected according to that bit rate to obtain the best time-frequency grid at the current bit rate.
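The patent does not specify the correction rule. As one possible reading, the sketch below coarsens the grid until its estimated cost fits a bit budget. The fixed cost of `bits_per_cell`, the merge order (highest frequency bands first, then the latest time intervals) and the name `fit_grid_to_budget` are all assumptions.

```python
def fit_grid_to_budget(freq_borders: list[int], time_borders: list[int],
                       bit_budget: int, bits_per_cell: int = 6) -> tuple[list[int], list[int]]:
    """Coarsen a grid until (number of cells) * bits_per_cell fits the budget.

    Hypothetical policy: merge the two highest frequency bands first, since
    the ear is least selective there, then merge the last time intervals.
    """
    freq = list(freq_borders)
    time = list(time_borders)

    def cost() -> int:
        return (len(freq) - 1) * (len(time) - 1) * bits_per_cell

    while cost() > bit_budget:
        if len(freq) > 2:
            del freq[-2]   # drop the border between the two highest bands
        elif len(time) > 2:
            del time[-2]   # drop the border between the last two intervals
        else:
            break          # a single cell is the coarsest possible grid
    return freq, time


print(fit_grid_to_budget([32, 34, 36, 40, 48, 64], [0, 10, 32], bit_budget=36))
# ([32, 34, 36, 64], [0, 10, 32])
```

A real encoder would more likely weigh the perceptual cost of each merge; the fixed order here only keeps the example short.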

The adaptive grid configuration method for bandwidth extension encoding described above, referred to as the AFAG algorithm, obtains the best time-frequency grid through adaptive multi-resolution filtering and adaptive time-frequency grid construction. This benefits the subsequent high-frequency parameter coding of bandwidth extension encoding and can significantly improve the coding efficiency of the high-frequency part of digital audio signals and the sound quality of the high-frequency signal.

Based on the adaptive grid configuration method described above, the present invention also proposes an adaptive grid configuration apparatus for bandwidth extension encoding. Fig. 3 shows a logical block diagram of an adaptive grid configuration apparatus 200 for bandwidth extension encoding according to one embodiment of the invention. As shown in Fig. 3, the apparatus 200 includes an adaptive multi-resolution filtering module 210 and a time-frequency grid construction module 220. The adaptive multi-resolution filtering module 210 selects a frequency resolution based on a transient analysis of the input mono audio signal and applies adaptive multi-resolution QMF filtering to that signal to obtain an optimal time-frequency filtered signal. The time-frequency grid construction module 220 performs transient detection and localization on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, carries out adaptive grid configuration in the frequency and time directions to obtain the best time-frequency grid at the current bit rate.

Specifically, as shown in Fig. 3, the adaptive multi-resolution filtering module 210 further comprises an audio signal analysis unit 211 and a multi-resolution filter bank set 212. The audio signal analysis unit 211 first analyzes the transient behavior of the input mono audio signal in real time and then selects a frequency resolution according to the steady-state/transient characteristics found, so as to choose the best filter bank (for example a quadrature mirror filter bank, QMF) in the multi-resolution filter bank set 212 to filter the audio signal. The multi-resolution filter bank set 212 performs the adaptive multi-resolution QMF filtering according to the selection made by the audio signal analysis unit 211 and outputs the optimal time-frequency filtered signal. In general, the frequency resolution selection strategy of the audio signal analysis unit 211 is as follows:

For transient signals, a filter bank in the multi-resolution filter bank set 212 with a coarse frequency resolution and a high time resolution may be selected for filtering;

For steady-state signals, a filter bank in the multi-resolution filter bank set 212 with a fine frequency resolution and a low time resolution may be selected for filtering;

For other audio signals, a filter bank in the multi-resolution filter bank set 212 with an intermediate frequency resolution and an intermediate time resolution may be selected for filtering.

Further, considering the limited bit rate of high-frequency bandwidth extension coding, the frequency resolution determined above from the transient characteristics of the input audio signal alone may be further reduced as appropriate.

Specifically, as shown in Fig. 3, the time-frequency grid construction module 220 further comprises a transient detection unit 221 and a time-frequency grid division unit 222. The transient detection unit 221 performs transient detection and localization on each subband signal output by the adaptive multi-resolution filtering module 210. The time-frequency grid division unit 222 carries out adaptive grid configuration in the frequency and time directions according to the transient behavior of each filtered subband signal as analyzed by the transient detection unit 221, the bit rate allocated to the high band, and the critical-band characteristics of the human ear, and outputs the best time-frequency grid at the current bit rate. The adaptive time-frequency grid construction strategy of the time-frequency grid division unit 222 is as described for method 100 above and is not repeated here.

Fig. 4 is a block diagram of a general bandwidth extension encoding method 300 based on the AFAG algorithm of the invention described above. As shown in Fig. 4, the AFAG-based bandwidth extension encoding method 300 includes the following steps:

In step 310, a frequency resolution is selected based on a transient analysis of the input mono audio signal, and adaptive multi-resolution QMF filtering is applied to the input mono audio signal to obtain an optimal time-frequency filtered signal.

In step 320, transient detection and localization are performed on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, adaptive grid configuration is carried out in the frequency and time directions to obtain the best time-frequency grid at the current bit rate.

In step 330, high-frequency parameter coding is performed with the time-frequency grid cell as the unit. High-frequency parameter coding can be implemented with any suitable prior-art coding method and is not the main concern of the invention, so it is not described in detail here.

In step 340, coding parameters such as the time-frequency grid parameters and the multi-resolution filter selection parameter are multiplexed, and the high-band bitstream is output.
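The bitstream syntax is not given in the patent. Purely to illustrate what multiplexing the filter selection and grid parameters could look like, the sketch below uses a hypothetical byte-oriented layout: one byte for the filter index (assuming the four QMF sizes of the SBR application example), followed by length-prefixed lists of time and frequency borders. The names `mux` and `demux` and the layout itself are assumptions.

```python
QMF_SIZES = (16, 32, 64, 128)  # candidate filter banks, assumed for this sketch


def mux(qmf_size: int, time_borders: list[int], freq_borders: list[int]) -> bytes:
    """Pack side information into a hypothetical layout (all values must be < 256)."""
    return bytes([
        QMF_SIZES.index(qmf_size),            # multi-resolution filter selection
        len(time_borders), *time_borders,     # time-direction grid borders
        len(freq_borders), *freq_borders,     # frequency-direction grid borders
    ])


def demux(payload: bytes) -> tuple[int, list[int], list[int]]:
    """Inverse of mux(): recover the filter selection and both border lists."""
    qmf_size = QMF_SIZES[payload[0]]
    n_time = payload[1]
    time_borders = list(payload[2:2 + n_time])
    n_freq = payload[2 + n_time]
    freq_borders = list(payload[3 + n_time:3 + n_time + n_freq])
    return qmf_size, time_borders, freq_borders


payload = mux(64, [0, 10, 32], [32, 34, 36, 64])
print(demux(payload))  # (64, [0, 10, 32], [32, 34, 36, 64])
```

A practical codec would pack these fields at bit granularity and entropy code the envelope data; whole bytes are used here only for readability.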

For a mono digital audio signal input in real time, encoding its high-frequency part with the AFAG-based bandwidth extension encoding method described above achieves the best time-frequency grid construction under the configured coding bit rate, and can improve the coding efficiency of the high-frequency part of the digital audio signal or the sound quality of the high-frequency signal.

Fig. 5 is a block diagram of a general bandwidth extension decoding method 400 based on the AFAG algorithm of the invention described above. The bandwidth extension decoding method 400 decodes the bitstream output by the bandwidth extension encoding method 300 shown in Fig. 4 to obtain the mono audio signal. As shown in Fig. 5, the AFAG-based bandwidth extension decoding method 400 includes the following steps:

In step 410, the input high-band bitstream is demultiplexed to obtain coding parameters such as the multi-resolution filter selection parameter and the time-frequency grid parameters.

In step 420, high-frequency parameter decoding is performed based on the time-frequency grid to restore the high-frequency subband signals. High-frequency parameter decoding can be implemented with a suitable decoding method corresponding to the high-frequency parameter coding method above and is not the main concern of the invention, so it is not described in detail here.

In step 430, the decoded low-frequency signal is analyzed by a QMF bank to obtain the low-frequency subband signals. This is also prior art and is not described further here.

In step 440, according to the multi-resolution filter selection parameter, the high-frequency subband signals obtained in step 420 and the low-frequency subband signals obtained in step 430 undergo the multi-resolution synthesis corresponding to the encoder side (see bandwidth extension encoding method 300 in Fig. 4), and the full-band mono audio signal is output.

Fig. 6 shows the SBR coding application example based on AFAG, and basic process is: input Whole frequency band audio signal High frequency section carries out adaptive T-F-grid construction by AFAG, is then encoded using the high-frequency parameter coding method of SBR, Form BWE code stream.It is specific as shown in fig. 6, adaptive multiresolutional filter in the AFAG algorithm is by from 16 subband QMF, 32 Subband QMF, 64 subband QMF and 128 subband QMF tetra- are most matched with selecting in the QMF filter groups of different frequency bands bandwidth One QMF filter group realizes that each QMF filter group provides the frequency resolution and temporal resolution of a kind of fixation, Respectively as shown in Fig. 8,9,10 and 11, wherein each grey fritter 501 indicates the smallest time frequency unit.Adaptive multiresolution filter The strategy of wave selection is:

To transient signal, 16 subband QMF filtering may be selected.16 subband QMF have most thick frequency resolution, can be by audio Signal is configured to 16 uniform frequency bands, and the bandwidth of each frequency band is Fs/32 (Fs is audio sampling frequency), and the corresponding time Resolution ratio highest is 16/Fs.Thus 16 subband QMF are filtered into each subband signal and provide maximum time resolution ratio, after being conducive to Continuous reasonable T-F-grid construction, can improve the high-frequency signal sound quality of transient signal BWE parameter coding.

To steady-state signal, 128 subband QMF filtering may be selected.128 subband QMF have most thin frequency resolution, Mei Gepin The bandwidth of band is Fs/256, and corresponding temporal resolution is minimum, is 128/Fs.Characteristics of signals in combination with each subband is The steady-state signals such as tone type provide maximum frequency resolution ratio, to improve the code efficiency of steady-state signal high frequency section.

For other, atypical audio signals, one of the two intermediate QMF filter banks (32 subbands or 64 subbands) may be selected according to the signal characteristics. This yields a higher frequency resolution than the 16-subband QMF and a higher time resolution than the 128-subband QMF, which facilitates subsequent time-frequency grid configuration and ultimately gives a more balanced high-frequency coding result.

Influence of the BWE encoding bit rate of the high-frequency signal: if the total bit rate of the audio signal coding is relatively low, so that the bit rate for coding the high-frequency signal is relatively low (that is, fewer bits are available for coding the high-frequency part), the frequency resolution of the filtering applied to the high-frequency signal should be reduced appropriately. In other words, the QMF filtering frequency resolution that would be chosen from the input audio signal characteristics alone is lowered appropriately.
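The selection strategy above can be sketched in a few lines of Python. This is only an illustrative sketch: the class labels ("transient", "steady", anything else), the `favor_frequency` and `low_bitrate` flags, and the rule of stepping down exactly one filter bank size at low bit rate are assumptions made for the example; the text only states that the frequency resolution is reduced "appropriately". The bandwidth Fs/(2M) and time resolution M/Fs follow the figures given above for M = 16 and M = 128.

```python
from dataclasses import dataclass

QMF_SIZES = (16, 32, 64, 128)  # subband counts listed in the text


@dataclass(frozen=True)
class QmfResolution:
    subbands: int
    bandwidth_hz: float  # Fs / (2 * M), per the text
    time_step_s: float   # M / Fs, per the text


def qmf_resolution(subbands: int, fs: float) -> QmfResolution:
    """Fixed frequency and time resolution of an M-subband QMF bank."""
    return QmfResolution(subbands, fs / (2 * subbands), subbands / fs)


def select_qmf(signal_class: str, favor_frequency: bool = False,
               low_bitrate: bool = False) -> int:
    """Pick a QMF size; class labels and flags are hypothetical inputs."""
    if signal_class == "transient":
        size = 16
    elif signal_class == "steady":
        size = 128
    else:
        # Atypical signals: one of the two intermediate banks.
        size = 64 if favor_frequency else 32
    if low_bitrate:
        # Assumption: "appropriately reduce" is modelled as one step down.
        index = QMF_SIZES.index(size)
        size = QMF_SIZES[max(index - 1, 0)]
    return size
```

For example, at Fs = 48 kHz, `qmf_resolution(16, 48000.0)` gives 1500 Hz wide bands, while `qmf_resolution(128, 48000.0)` gives 187.5 Hz wide bands at an eight times coarser time step.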

The AFAG algorithm adaptively selects one best multiresolution QMF by steady-state/transient analysis of a frame of signal. The configuration of the time-frequency grid, in turn, depends on the specific positions of transient signals within the frame, and even on the transient analysis of the individual QMF subband signals; it must also take into account the available bit rate allocated to the high-frequency subband part and the critical band characteristics of the human ear. The basic strategy of adaptive time-frequency grid configuration therefore consists of two parts: grid configuration in the time direction (combining samples within a QMF subband) and grid configuration in the frequency direction (combining QMF subbands). In the application example of SBR encoding based on AFAG, the two directions are configured as follows:

Once a particular multiresolution QMF filter bank has been selected, the output signal has a fixed frequency resolution and time resolution. The adaptive grid configuration in the frequency direction therefore selects different grid configurations according to the frequency characteristics of the high-frequency part of the audio signal that is coded by BWE. For general audio signals, the basic grid configuration method in the frequency direction is as follows: in view of the critical band characteristics of human hearing, the frequency resolution of the ear declines as frequency increases, so the frequency grid should reduce its frequency resolution step by step as frequency increases; that is, several QMF subbands are merged into one frequency grid unit. The specific merging should combine the QMF frequency resolution with the critical bandwidth of the human ear, so that the frequency grid is broadly consistent with the critical bands, as shown in Fig. 12. When the high-frequency signal coded by BWE contains rich tonal signals, the frequency resolution of the grid should, while still taking the critical bands into account, be increased appropriately compared with general audio signals; that is, fewer QMF subbands form one grid in the frequency direction, for example 1/3 of a critical bandwidth. Fig. 13 shows the grid configuration when subbands 16~23 of the 64-subband QMF contain tonal components.
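As a rough illustration of merging QMF subbands into frequency grid units that follow the critical bands, consider the following Python sketch. The text does not specify how the critical bandwidth is computed; the Zwicker-Terhardt approximation used here, the function names, and the greedy grouping rule are all assumptions chosen for the example. The parameter `cb_fraction` stands in for the choice between a full critical band (general audio) and a fraction such as 1/3 (tonal content).

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Zwicker-Terhardt critical bandwidth approximation (an assumption here)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def frequency_grid(first_subband: int, last_subband: int, num_subbands: int,
                   fs: float, cb_fraction: float = 1.0) -> list:
    """Group QMF subbands first..last (inclusive) into frequency grid units.

    Each unit spans roughly cb_fraction of the critical bandwidth at its
    lower edge, and never less than one QMF subband.
    """
    subband_bw = fs / (2 * num_subbands)  # bandwidth of one QMF subband
    cells = []
    start = first_subband
    while start <= last_subband:
        target_hz = cb_fraction * critical_bandwidth_hz(start * subband_bw)
        width = max(1, round(target_hz / subband_bw))
        end = min(start + width - 1, last_subband)
        cells.append((start, end))
        start = end + 1
    return cells


# General audio versus tonal high band, 64-subband QMF at Fs = 48 kHz.
general = frequency_grid(32, 63, 64, 48000.0)
tonal = frequency_grid(32, 63, 64, 48000.0, cb_fraction=1.0 / 3.0)
```

With these assumed settings the tonal configuration produces more, narrower grid units than the general one, which mirrors the intent described above.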

The adaptive grid configuration in the time direction depends on the positions at which one or more transient signals occur within a frame of signal, while also taking the transient characteristics of each subband signal into account. The time direction is divided into multiple time-domain intervals (for example, 8 intervals of unequal width), and each interval represents one grid. In general, the time resolution required for the signals of higher-frequency QMF subbands is lower, so the number of grids can be reduced appropriately, as shown in Fig. 14.
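One possible way to turn transient positions into unequal time intervals is sketched below in Python. The text does not give a concrete segmentation rule, so everything here is hypothetical: the parameter `attack_len` (length of the short interval placed at each transient), the parameter `max_len` (longest interval allowed elsewhere), and the even splitting of the remaining stretches.

```python
def time_grid(num_slots: int, transient_slots: list,
              attack_len: int = 2, max_len: int = 8) -> list:
    """Return sorted interval edges covering subband samples 0..num_slots.

    A short interval of attack_len samples starts at every transient; the
    remaining stretches are split evenly so none exceeds max_len samples.
    """
    cuts = {0, num_slots}
    for p in transient_slots:
        if not 0 <= p < num_slots:
            continue  # ignore positions outside the frame
        cuts.add(p)
        cuts.add(min(p + attack_len, num_slots))
    edges = sorted(cuts)
    result = [edges[0]]
    for a, b in zip(edges, edges[1:]):
        pieces = -(-(b - a) // max_len)  # ceiling division
        for k in range(1, pieces + 1):
            result.append(a + (b - a) * k // pieces)
    return result


# One transient at subband sample 13 in a 32-sample frame.
edges = time_grid(32, [13])
```

A higher-frequency subband could be given a larger `max_len` so that it ends up with fewer intervals, in line with the lower time resolution requirement mentioned above; this mapping is again only an assumption.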

The grid information of the adaptive time-frequency grid configuration describes how the time-frequency grid of the BWE-coded high-frequency subbands is configured. It must be encoded into the BWE bitstream and passed to the decoding side, where it is used to decode the grid-based coding parameters. The grid information can therefore be regarded as side information of the BWE high-frequency signal coding and should not occupy too many bits (that is, too much bit rate). The BWE encoding bit rate of the high-frequency signal (or the number of bits allocated to it) is determined from the total bit rate of the low-bit-rate audio coding and the complexity of the low-frequency and high-frequency signals. Consequently, in addition to the best time-frequency grid configuration computed from the characteristics of the current high-frequency signal, the bit rate limit must also be respected; a combined correction then yields the best time-frequency grid at the current bit rate, as shown in Fig. 15. The basic correction methods include:

(1) Reduce the frequency resolution of the grid: in the frequency direction, the width of each grid is increased. For example, a width of 1/3 of a critical band is changed to 1/2 of a critical band, or the lower subbands among the QMF high-frequency subbands use 1/3 of a critical band while the remainder use 1/2 of a critical band.

(2) Optimize the time-direction grid configuration across the different QMF high-frequency subbands: if the grid is configured from the transient characteristics of each individual subband signal, different QMF subbands may have different numbers of grids, and the start and end samples of each grid may differ, so more information has to be transmitted. The grid intervals of the QMF subbands can therefore be adjusted as a whole so that the description information of the grid boundaries (intervals) is shared or reduced. For example, all BWE high-frequency subbands use the same time-direction grid configuration, which minimizes the side information. As another example, all BWE high-frequency subbands use n kinds of time-frequency grid (for example, n < 4), where higher BWE subbands have fewer grids: the number of grids is 1/2 that of the preceding subband, and each grid is aligned with two grids of the preceding subband.

(3) Reduce the time resolution of the grid: in the time domain of the QMF subband signal, the width of the grids is increased (each grid contains more subband samples). For example, if the time direction was originally divided into 16 uniform intervals, these can be merged pairwise into 8 uniform intervals, or partially merged into 12 intervals (the grids in the region where the subband signal transient is strong are left unchanged, while the grids before and after it are merged appropriately).
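The time-resolution reduction in correction (3), and the halving of grid counts mentioned in correction (2), can be illustrated with a short Python sketch operating on lists of interval edges. The function names and the representation of a grid as a sorted list of edges are assumptions made for the example, and the protected region is assumed to start and end on existing edges.

```python
def merge_pairs(edges: list) -> list:
    """Merge neighbouring intervals two by two (halves the interval count)."""
    merged = edges[::2]
    if (len(edges) - 1) % 2 == 1:
        merged = merged + [edges[-1]]  # odd count: last interval stays single
    return merged


def merge_outside(edges: list, keep_from: int, keep_to: int) -> list:
    """Merge pairwise before keep_from and after keep_to only.

    Intervals between keep_from and keep_to (the strong-transient region)
    keep their original resolution.
    """
    before = [e for e in edges if e <= keep_from]
    kept = [e for e in edges if keep_from < e < keep_to]
    after = [e for e in edges if e >= keep_to]
    return merge_pairs(before) + kept + merge_pairs(after)


# 16 uniform intervals over 32 subband samples.
uniform16 = list(range(0, 33, 2))
uniform8 = merge_pairs(uniform16)             # 8 uniform intervals
partial12 = merge_outside(uniform16, 8, 24)   # 12 intervals, middle untouched
```

Because every edge of `uniform8` is also an edge of `uniform16`, each merged grid is aligned with two grids of the finer configuration, so only the coarser or the finer set of boundaries needs to be described explicitly in such a scheme.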

In the application example of SBR encoding based on AFAG shown in Fig. 6, after the best time-frequency grid has been obtained by the AFAG algorithm of the present invention, SBR high-frequency parameter coding is performed on the basis of that grid, which can significantly improve the coding efficiency of the high-frequency part of the digital audio signal and the sound quality of the high-frequency part. SBR high-frequency parameter coding specifically includes tonality detection, tonal parameter coding, high-frequency detail parameter coding and high-frequency envelope coding; these belong to the prior art and are not described in detail here. Finally, the BWE parameters are multiplexed and the high-band bitstream is output.

Fig. 7 shows an application example of SBR decoding based on AFAG. The decoding process is as follows: the BWE bitstream is demultiplexed to obtain the high-frequency parameters; the low-frequency part of the audio signal obtained by the general perceptual decoder is QMF-filtered to obtain the low-frequency subband signals; the low-frequency subband signals and the high-frequency parameters are passed through SBR high-frequency parameter decoding to obtain the high-frequency subband signals; the high-frequency subband signals and the low-frequency subband signals undergo multiresolution synthesis through the adaptive multiresolution synthesis filter corresponding to that of the encoding side, and the full-band mono audio signal is output.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An adaptive grid configuration method for bandwidth extension encoding, characterized by comprising the following steps:

S1, selecting a frequency resolution based on transient analysis of an input mono audio signal, and performing adaptive multiresolution QMF filtering on the input mono audio signal to obtain an optimal time-frequency filtered signal;

S2, performing transient detection and localization on each subband signal output by the filtering, and, according to the transient analysis of each subband signal and taking into account the set high-band encoding bit rate and the critical band characteristics of the human ear, performing adaptive grid configuration in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate;

wherein step S1 further comprises:

for other audio signals, selecting an intermediate frequency resolution and an intermediate time resolution for filtering;

the adaptive grid configuration in the frequency direction in step S2 further comprises: selecting different grid configurations according to the frequency characteristics of the high-band part of the input mono audio signal, specifically:

for the case where the high-band part contains tonal signals, appropriately increasing the frequency resolution of the grid compared with the case of the general audio signal, while taking the critical bands into account;

the adaptive grid configuration in the time direction in step S2 further comprises: configuring a plurality of time-domain intervals in the time direction according to the positions at which one or more transient signals occur in the input mono audio signal and the transient characteristics of each subband signal, each interval representing one grid.

2. The method according to claim 1, characterized in that step S1 further comprises:

3. The method according to claim 1, characterized in that step S2 further comprises:

correcting, based on the set high-band encoding bit rate, the grid configuration obtained in the frequency direction and the time direction, to obtain the best time-frequency grid at the current bit rate.

4. An adaptive grid configuration apparatus for bandwidth extension encoding, characterized by comprising:

an adaptive multiresolution filtering module, configured to select a frequency resolution based on transient analysis of an input mono audio signal and to perform adaptive multiresolution QMF filtering on the input mono audio signal to obtain an optimal time-frequency filtered signal;

a time-frequency grid configuration module, configured to perform transient detection and localization on each subband signal output by the filtering, and, according to the transient analysis of each subband signal and taking into account the set high-band encoding bit rate and the critical band characteristics of the human ear, to perform adaptive grid configuration in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate;

wherein the adaptive multiresolution filtering module is further configured to:

the adaptive grid configuration in the frequency direction performed by the time-frequency grid configuration module further comprises: selecting different grid configurations according to the frequency characteristics of the high-band part of the input mono audio signal, specifically:

the adaptive grid configuration in the time direction performed by the time-frequency grid configuration module further comprises: configuring a plurality of time-domain intervals in the time direction according to the positions at which one or more transient signals occur in the input mono audio signal and the transient characteristics of each subband signal, each interval representing one grid.

5. A bandwidth extension encoding method, characterized by comprising the following steps:

S3, performing high-frequency parameter coding in units of the time-frequency grid;

S4, multiplexing the coding parameters and outputting a high-band bitstream, the coding parameters including a multiresolution filter selection parameter and time-frequency grid parameters;

wherein step S1 further comprises:

6. A bandwidth extension decoding method, characterized by comprising the following steps:

S1, demultiplexing an input high-band bitstream obtained by the bandwidth extension encoding method according to claim 5 to obtain coding parameters, the coding parameters including a multiresolution filter selection parameter and time-frequency grid parameters;

S3, according to the multiresolution filter selection parameter, performing multiresolution synthesis corresponding to that of the encoding side on the high-frequency subband signals and the low-frequency subband signals obtained by decoding, and outputting a full-band mono audio signal.