CN105261373B - Adaptive grid configuration method and apparatus for bandwidth extension encoding
Publication number: CN105261373B (application number CN201510590039.8; authority: China)
Abstract
The present invention relates to an adaptive grid configuration method and apparatus for bandwidth extension encoding. The method comprises: S1, selecting a frequency resolution based on a transient analysis of an input monophonic audio signal, and applying adaptive multi-resolution filtering to the input monophonic audio signal to obtain an optimal time-frequency filtered signal; S2, performing transient detection and localization on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, performing adaptive grid configuration in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate. The invention further relates to bandwidth extension encoding and decoding methods based on this method. Based on the characteristics of the audio signal and the bit rate available to the high-frequency signal, the invention performs adaptive multi-resolution filtering and adaptive time-frequency grid configuration, which can significantly improve the coding efficiency of the high-frequency part of a digital audio signal and the sound quality of the high-frequency signal.
Description
Technical field
The present invention relates to digital audio encoding and decoding technology, and more specifically to an adaptive grid configuration method and apparatus for bandwidth extension encoding, and to bandwidth extension encoding and decoding methods using this method.
Background
Traditional perceptual audio coding technologies (DRA, AAC, MP3, etc.) typically operate at 96 to 128 kbps for stereo, and their coding quality shows obvious subjective distortion below 64 kbps for stereo. Typical coding bit rates for FM broadcasting applications are 48 to 64 kbps for stereo, and at these rates the subjective sound quality of traditional perceptual audio coding does not meet FM broadcasting requirements.
To address this, bandwidth extension (BandWidth Extension, BWE) coding technology for digital audio signals has been proposed. Many bandwidth extension coding techniques currently exist, and their performance varies widely. The bandwidth extension coding techniques that have been published and adopted in international standards mainly comprise the following two coding algorithms:
The first is Spectral Band Replication (SBR) coding, described in ISO/IEC 14496-3 (MPEG-4). Fig. 1 shows a block diagram of the principle of SBR coding. SBR is a frequency-domain algorithm. Its coding principle is as follows: each frame of the signal passes through a 64-subband quadrature mirror filter (QMF) bank to obtain 64 uniform subbands, each containing 32 samples; a reasonable time-frequency grid is divided according to the transient characteristics of the current signal; one energy value is computed for each grid and Huffman coded. The algorithm also includes tonality detection and transmits additional parameters for individual sinusoids. At the decoder, a low-frequency part is first selected, normalized, and copied to the high band as the high-frequency detail; the high-frequency envelope is then corrected using the transmitted energy information; the sinusoids are then added; finally, the recovered high-frequency signal is filtered. The low and high bands then pass through a 64-band synthesis QMF bank to reconstruct the PCM digital audio signal.
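As an illustration of the per-grid energy computation described above, the following Python sketch averages the energy of QMF subband samples over each tile of a time-frequency grid. The function name `grid_energies`, the border-list representation of the grid, and the use of mean squared magnitude are assumptions made for this example; they are not taken from ISO/IEC 14496-3, and the Huffman coding stage is omitted.

```python
def grid_energies(subbands, time_borders, freq_borders):
    """Return the mean energy of each (time segment, frequency band) tile.

    subbands[k][n] is sample n of subband k (real or complex).
    time_borders and freq_borders are ascending lists of tile edges
    (a hypothetical representation chosen for this sketch).
    """
    energies = []
    for t0, t1 in zip(time_borders, time_borders[1:]):
        row = []
        for f0, f1 in zip(freq_borders, freq_borders[1:]):
            total = sum(abs(subbands[k][n]) ** 2
                        for k in range(f0, f1)
                        for n in range(t0, t1))
            row.append(total / ((f1 - f0) * (t1 - t0)))
        energies.append(row)
    return energies


# 64 subbands x 32 time slots of constant amplitude 2: every tile has energy 4.
frame = [[2.0] * 32 for _ in range(64)]
env = grid_energies(frame, [0, 8, 32], [32, 40, 64])
```

Here the high band (subbands 32 to 63) is split into two frequency bands and two time segments, giving a 2 x 2 envelope.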
The main drawback of MPEG SBR coding is its relatively fixed time-frequency segmentation. At a 48 kHz sampling rate, because a 64-band QMF is used, the finest frequency resolution is 375 Hz (24 kHz / 64); with 2048 samples per frame, the finest time resolution is about 1.3 ms (64 / 48000 s). Because audio signals are extremely complex, this algorithm sometimes cannot meet the precision required for signal analysis.
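The figures quoted above follow directly from the filter bank size. The short Python sketch below reproduces the arithmetic, assuming a uniform, critically sampled QMF bank (one subband sample per `num_bands` input samples); the function name `qmf_resolution` is introduced here for illustration only.

```python
def qmf_resolution(fs_hz, num_bands):
    """Return (bandwidth per subband in Hz, duration of one subband sample in s)."""
    bandwidth_hz = (fs_hz / 2) / num_bands  # Nyquist range split into uniform bands
    slot_s = num_bands / fs_hz              # one subband sample per num_bands input samples
    return bandwidth_hz, slot_s


bw, slot = qmf_resolution(48000, 64)  # 64-band QMF at 48 kHz
slots_per_frame = 2048 // 64          # subband samples per 2048-sample frame
```

For the 64-band case this gives 375 Hz per subband, roughly 1.33 ms per subband sample, and 32 subband samples per frame, matching the values in the text.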
The second is a simple bandwidth extension technique included in the 3GPP AMR-WB+ coding method. It is a time-domain algorithm whose main coding principle is as follows: the input signal is split into low-frequency and high-frequency time-domain signals of equal bandwidth; the low-frequency (LF) part is filtered by LPC analysis to obtain the low-frequency residual signal, which is then passed through the high-frequency LPC synthesis filter to simulate the high-frequency detail signal; the simulated signal is compared with the actual high-frequency signal S_HF(n) to obtain a gain vector for the high-frequency envelope (energy), with one gain value per subframe; finally, the gain vector is further corrected for gain consistency at the junction between the low and high bands, and the corrected gain vector is encoded. What is transmitted to the decoder therefore comprises the corrected gain vector and the high-frequency LPC coefficients. The high-frequency decoding process of AMR-WB+ is essentially the inverse of the encoding process.
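The "one gain value per subframe" idea can be illustrated with a simple energy-ratio calculation. The Python sketch below is not the AMR-WB+ procedure itself, whose exact gain computation and junction correction are defined in the 3GPP specification; the function name `subframe_gains` and the square-root energy-ratio formula are assumptions used only to show the principle.

```python
import math


def subframe_gains(actual_hf, simulated_hf, subframe_len):
    """Return one amplitude gain per subframe (hypothetical energy-ratio rule)."""
    gains = []
    for start in range(0, len(actual_hf), subframe_len):
        actual = actual_hf[start:start + subframe_len]
        simulated = simulated_hf[start:start + subframe_len]
        e_actual = sum(x * x for x in actual)
        e_sim = sum(x * x for x in simulated)
        # Gain that scales the simulated signal to the actual signal's energy.
        gains.append(math.sqrt(e_actual / e_sim) if e_sim > 0 else 0.0)
    return gains


gains = subframe_gains([2.0, -2.0, 4.0, 4.0], [1.0, 1.0, 1.0, -1.0], 2)
```

With two subframes of two samples each, the simulated signal must be scaled by 2 in the first subframe and by 4 in the second to match the energy of the actual high-frequency signal.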
The bandwidth extension coding technique of 3GPP AMR-WB+ has the following problems: (1) high-frequency coding is performed in the time domain, so a higher frequency resolution cannot be obtained, because the method in effect treats the high band as a single region; (2) the starting frequency of high-frequency coding is fixed at Fs/4, so for a 48 kHz sampling frequency it starts at 12 kHz; (3) harmonic signals in the high band cannot be restored accurately; (4) the envelope of the high-frequency signal is not restored accurately enough.
There are also other bandwidth extension coding techniques whose time-frequency transform unit uses a conventional FFT. The high band is divided into several regions in the frequency domain, and the spectral energy of each region is encoded, so each frame offers only one time resolution together with multiple frequency resolutions. Such FFT-based high-frequency reconstruction has high frequency resolution but too low a time resolution; when the input is a rapidly varying audio signal, the reconstructed high-frequency signal cannot track the changes of the original audio signal well.
Intensity stereo coding in digital audio coding can also be regarded as a special bandwidth extension coding technique. It exploits the insensitivity of the human auditory system to high-frequency detail: the high-frequency parts of the channels of a stereo or 5.1 surround signal are downmixed into one channel, which after normalization serves as the high-frequency detail signal for all channels, while the envelope of each channel's high-frequency signal (the energy in the high-frequency critical bands) must still be encoded and transmitted.
Summary of the invention
The technical problem to be solved by the present invention is, in view of the above drawbacks of the prior art, to provide an adaptive grid configuration method and apparatus for bandwidth extension encoding, together with bandwidth extension encoding and decoding methods using this method, so as to improve the coding efficiency of the high-frequency part of digital audio signals and the sound quality of the high-frequency signal.
The technical solution adopted by the present invention to solve this technical problem is an adaptive grid configuration method for bandwidth extension encoding, comprising the following steps:
S1, selecting a frequency resolution based on a transient analysis of the input monophonic audio signal, and applying adaptive multi-resolution QMF filtering to the input monophonic audio signal to obtain an optimal time-frequency filtered signal;
S2, performing transient detection and localization on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, performing adaptive grid configuration in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate.
According to one embodiment of the present invention, step S1 further comprises:
For transient signals, selecting a coarse frequency resolution and a high time resolution for filtering;
For steady-state signals, selecting a fine frequency resolution and a low time resolution for filtering;
For other audio signals, selecting an intermediate frequency resolution and an intermediate time resolution for filtering.
According to one embodiment of the present invention, step S1 further comprises:
When the high-band coding bit rate is low, reducing the selected frequency resolution.
According to one embodiment of the present invention, the adaptive grid configuration in the frequency direction in step S2 further comprises selecting different grid configurations according to the frequency characteristics of the high-band part of the input monophonic audio signal, specifically:
For general audio signals, the frequency resolution of the frequency grid is gradually reduced as frequency increases within the high band, so that the frequency grid is consistent with the critical bands of the human ear;
When the high band contains tonal signals, the frequency resolution of the grid is appropriately increased relative to the general audio signal case, while still taking the critical bands into account;
The adaptive grid configuration in the time direction in step S2 further comprises: constructing multiple time intervals in the time direction according to the positions at which one or more transients occur in the input monophonic audio signal and the transient characteristics of each subband signal, each interval representing one grid.
According to one embodiment of the present invention, step S2 further comprises:
Correcting the grid configuration obtained in the frequency direction and the time direction based on the configured high-band coding bit rate, to obtain the best time-frequency grid at the current bit rate.
To solve this technical problem, the present invention also proposes an adaptive grid configuration apparatus for bandwidth extension encoding, comprising:
An adaptive multi-resolution filtering module, configured to select a frequency resolution based on a transient analysis of the input monophonic audio signal and to apply adaptive multi-resolution QMF filtering to the input monophonic audio signal to obtain an optimal time-frequency filtered signal;
A time-frequency grid construction module, configured to perform transient detection and localization on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, to perform adaptive grid configuration in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate.
According to one embodiment of the present invention, the adaptive multi-resolution filtering module is further configured to:
For transient signals, select a coarse frequency resolution and a high time resolution for filtering;
For steady-state signals, select a fine frequency resolution and a low time resolution for filtering;
For other audio signals, select an intermediate frequency resolution and an intermediate time resolution for filtering.
According to one embodiment of the present invention, the adaptive grid configuration in the frequency direction performed by the time-frequency grid construction module further comprises selecting different grid configurations according to the frequency characteristics of the high-band part of the input monophonic audio signal, specifically:
For general audio signals, the frequency resolution of the frequency grid is gradually reduced as frequency increases within the high band, so that the frequency grid is consistent with the critical bands of the human ear;
When the high band contains tonal signals, the frequency resolution of the grid is appropriately increased relative to the general audio signal case, while still taking the critical bands into account;
The adaptive grid configuration in the time direction performed by the time-frequency grid construction module further comprises: constructing multiple time intervals in the time direction according to the positions at which one or more transients occur in the input monophonic audio signal and the transient characteristics of each subband signal, each interval representing one grid.
To solve this technical problem, the present invention also proposes a bandwidth extension encoding method, comprising the following steps:
S1, selecting a frequency resolution based on a transient analysis of the input monophonic audio signal, and applying adaptive multi-resolution QMF filtering to the input monophonic audio signal to obtain an optimal time-frequency filtered signal;
S2, performing transient detection and localization on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, performing adaptive grid configuration in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate;
S3, performing high-frequency parameter coding in units of the time-frequency grid;
S4, multiplexing the coding parameters and outputting a high-band bitstream, the coding parameters including a multi-resolution filtering selection parameter and time-frequency grid parameters.
To solve this technical problem, the present invention also proposes a bandwidth extension decoding method, comprising the following steps:
S1, demultiplexing an input high-band bitstream produced by the aforementioned bandwidth extension encoding method to obtain the coding parameters, the coding parameters including a multi-resolution filtering selection parameter and time-frequency grid parameters;
S2, performing high-frequency parameter decoding based on the time-frequency grid to restore the high-frequency subband signals;
S3, according to the multi-resolution filtering selection parameter, performing multi-resolution synthesis corresponding to that of the encoder on the high-frequency subband signals and the decoded low-frequency subband signals, and outputting a full-band monophonic audio signal.
The adaptive grid configuration method and apparatus for bandwidth extension encoding of the present invention perform adaptive multi-resolution filtering and adaptive time-frequency grid configuration (Adaptive multi-resolution Filtering & Adaptive time-frequency Gridding, AFAG) based on the characteristics of the audio signal and the bit rate available to the high-frequency signal, and obtain the best time-frequency grid. This benefits subsequent bandwidth extension processing, such as high-frequency parameter coding in units of the time-frequency grid, and clearly improves the coding efficiency of the high-frequency part of digital audio signals and the sound quality of the high-frequency signal. A digital audio codec implemented with the bandwidth extension encoding and decoding methods of the present invention can be used in low-bit-rate audio coding fields such as audio for satellite HDTV and high-quality audio broadcasting.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawings and embodiments. In the drawings:
Fig. 1 is a functional block diagram of the existing SBR coding method;
Fig. 2 is a flowchart of the adaptive grid configuration method for bandwidth extension encoding according to an embodiment of the present invention;
Fig. 3 is a logical block diagram of the adaptive grid configuration apparatus for bandwidth extension encoding according to an embodiment of the present invention;
Fig. 4 is a functional block diagram of the bandwidth extension encoding method according to an embodiment of the present invention;
Fig. 5 is a functional block diagram of the bandwidth extension decoding method according to an embodiment of the present invention;
Fig. 6 is a functional block diagram of an AFAG-based SBR encoding application example;
Fig. 7 is a functional block diagram of an AFAG-based SBR decoding application example;
Fig. 8 is a schematic diagram of the time-frequency resolution of the 128-subband QMF;
Fig. 9 is a schematic diagram of the time-frequency resolution of the 64-subband QMF;
Fig. 10 is a schematic diagram of the time-frequency resolution of the 32-subband QMF;
Fig. 11 is a schematic diagram of the time-frequency resolution of the 16-subband QMF;
Fig. 12 is a schematic diagram of the general frequency grid configuration for the 64-subband QMF;
Fig. 13 is a schematic diagram of the frequency grid configuration for the 64-subband QMF when subbands 16 to 23 contain tonal components;
Fig. 14 is a schematic diagram of the general time grid configuration for the 64-subband QMF;
Fig. 15 is a schematic diagram of the best time-frequency grid configuration for the 64-subband QMF under a constrained bit rate.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
Fig. 2 shows a flowchart of an adaptive grid configuration method 100 for bandwidth extension encoding according to an embodiment of the present invention. As shown in Fig. 2, the method 100 comprises the following steps:
Step S110: a frequency resolution is selected based on a transient analysis of the input monophonic audio signal, and adaptive multi-resolution QMF filtering is applied to the input monophonic audio signal to obtain an optimal time-frequency filtered signal.
In this step, the transient behavior of the input monophonic audio signal is first analyzed in real time. A frequency resolution is then selected according to the steady-state or transient characteristics found by the analysis, so that the best-matching filter bank (QMF) among the multi-resolution filter banks is chosen to filter the audio signal and output an optimal time-frequency filtered signal. In general, the strategy for selecting the adaptive multi-resolution filtering based on the transient behavior of the input monophonic audio signal is as follows:
For transient signals, a coarse frequency resolution and a high time resolution may be selected for filtering;
For steady-state signals, a fine frequency resolution and a low time resolution may be selected for filtering;
For other audio signals, an intermediate frequency resolution and an intermediate time resolution may be selected for filtering.
Furthermore, the limited bit rate available for bandwidth extension coding of the high-frequency signal must be taken into account. If the total bit rate of the audio coding is low, so that the bit rate for coding the high-band signal is low (that is, few bits are available for coding the high band), the frequency resolution of the high-band filtering should be appropriately reduced. In other words, starting from the filtering resolution determined from the transient characteristics of the input audio signal alone, the selected frequency resolution may be further reduced as appropriate.
Step S120: transient detection and localization are performed on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, adaptive grid configuration is performed in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate.
The configuration of the time-frequency grid depends on the specific position of transients within a frame, and even on the transient analysis of each filtered subband signal; it must also take into account the bit rate allocated to the high band and the critical-band characteristics of the human ear. The basic strategy of adaptive time-frequency grid configuration therefore consists of two parts: grid configuration in the time direction, that is, combining samples within the same frequency subband; and grid configuration in the frequency direction, that is, combining different frequency subbands.
The adaptive grid configuration strategy in the frequency direction selects different grid configurations mainly according to the frequency characteristics of the high-band part of the input monophonic audio signal. Specifically: for general audio signals, the frequency resolution of the frequency grid is gradually reduced as frequency increases within the high band, so that the frequency grid is consistent with the critical bands of the human ear; when the high band contains tonal signals, the frequency resolution of the grid should be appropriately increased relative to the general audio signal case, while still taking the critical bands into account. The adaptive grid configuration in the time direction constructs multiple time intervals in the time direction, mainly according to the positions at which one or more transients occur in the input monophonic audio signal and the transient characteristics of each subband signal; each interval represents one grid.
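The two grid directions described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: a linearly growing band width stands in for a real critical-band mapping, a band containing a tonal subband is simply split in half, and time borders are placed at the detected transient positions. The function names `frequency_grid` and `time_grid` and these rules are hypothetical, not the invention's actual procedure.

```python
def frequency_grid(start, stop, tonal_subbands=()):
    """Group subbands [start, stop) into bands that widen with frequency."""
    borders = [start]
    width = 1
    while borders[-1] < stop:
        borders.append(min(borders[-1] + width, stop))
        width += 1  # assumed growth rule, a crude stand-in for critical bands
    refined = [borders[0]]
    for lo, hi in zip(borders, borders[1:]):
        # Assumed rule: increase resolution where a tonal component lies.
        if hi - lo > 1 and any(lo <= k < hi for k in tonal_subbands):
            refined.append(lo + (hi - lo) // 2)
        refined.append(hi)
    return refined


def time_grid(num_slots, transient_slots):
    """Split [0, num_slots) into intervals bounded by transient positions."""
    borders = {0, num_slots}
    borders.update(t for t in transient_slots if 0 < t < num_slots)
    return sorted(borders)


general = frequency_grid(32, 64)
tonal = frequency_grid(32, 64, tonal_subbands={40})
segments = time_grid(32, [10, 20])
```

For a high band covering subbands 32 to 63, the general grid has eight bands of increasing width; marking subband 40 as tonal splits the band that contains it, and two transients yield three time intervals.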
In addition, the time-frequency grid configuration computed above from the characteristics of the current high-band signal is also constrained by the bit rate available for bandwidth extension coding of the high-frequency signal. The grid configuration obtained in the frequency direction and the time direction must therefore be corrected based on that bit rate, to obtain the best time-frequency grid at the current bit rate.
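How the grid is corrected for the available bit rate is not detailed in the text. One simple possibility, shown in the Python sketch below, is to cap the number of grid tiles and merge bands until the cap is met. The function name `fit_grid_to_budget`, the tile-count budget, and the merge order (highest-frequency bands first, then the last time intervals) are assumptions made for illustration only.

```python
def fit_grid_to_budget(time_borders, freq_borders, max_tiles):
    """Coarsen a time-frequency grid until it has at most max_tiles tiles."""
    t = list(time_borders)
    f = list(freq_borders)

    def tiles():
        return (len(t) - 1) * (len(f) - 1)

    # Assumed policy: merge the two highest-frequency bands first.
    while tiles() > max_tiles and len(f) > 2:
        del f[-2]
    # If that is not enough, merge the last two time intervals.
    while tiles() > max_tiles and len(t) > 2:
        del t[-2]
    return t, f


t_fit, f_fit = fit_grid_to_budget(
    [0, 10, 20, 32], [32, 33, 35, 38, 42, 47, 53, 60, 64], max_tiles=12)
```

In this example the original 3 x 8 grid (24 tiles) is reduced to 3 x 4 by merging the upper frequency bands, leaving the time segmentation around the transients untouched.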
The adaptive grid configuration method for bandwidth extension encoding described above, referred to as the AFAG algorithm, obtains the best time-frequency grid through adaptive multi-resolution filtering and adaptive time-frequency grid configuration. This benefits the subsequent high-frequency parameter coding of bandwidth extension encoding and can significantly improve the coding efficiency of the high-frequency part of digital audio signals and the sound quality of the high-frequency signal.
Based on the adaptive grid configuration method for bandwidth extension encoding described above, the present invention also proposes an adaptive grid configuration apparatus for bandwidth extension encoding. Fig. 3 shows a logical block diagram of an adaptive grid configuration apparatus 200 for bandwidth extension encoding according to an embodiment of the present invention. As shown in Fig. 3, the apparatus 200 comprises an adaptive multi-resolution filtering module 210 and a time-frequency grid construction module 220. The adaptive multi-resolution filtering module 210 selects a frequency resolution based on a transient analysis of the input monophonic audio signal and applies adaptive multi-resolution QMF filtering to the input monophonic audio signal to obtain an optimal time-frequency filtered signal. The time-frequency grid construction module 220 performs transient detection and localization on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, performs adaptive grid configuration in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate.
Specifically, as shown in Fig. 3, the adaptive multi-resolution filtering module 210 further comprises an audio signal analysis unit 211 and a multi-resolution filter bank set 212. The audio signal analysis unit 211 first analyzes the transient behavior of the input monophonic audio signal in real time, and then selects a frequency resolution according to the steady-state or transient characteristics found by the analysis, so as to choose the best-matching filter bank (for example, a quadrature mirror filter bank, QMF) in the multi-resolution filter bank set 212 to filter the audio signal. The multi-resolution filter bank set 212 performs adaptive multi-resolution QMF filtering based on the selection made by the audio signal analysis unit 211 and outputs an optimal time-frequency filtered signal. In general, the frequency resolution selection strategy of the audio signal analysis unit 211 is as follows:
For transient signals, a filter bank in the multi-resolution filter bank set 212 with a coarse frequency resolution and a high time resolution may be selected for filtering;
For steady-state signals, a filter bank in the multi-resolution filter bank set 212 with a fine frequency resolution and a low time resolution may be selected for filtering;
For other audio signals, a filter bank in the multi-resolution filter bank set 212 with an intermediate frequency resolution and an intermediate time resolution may be selected for filtering.
Furthermore, taking into account the limited bit rate available for bandwidth extension coding of the high-frequency signal, the frequency resolution selected above from the transient characteristics of the input audio signal alone may be further reduced as appropriate.
Specifically, as shown in Fig. 3, the time-frequency grid construction module 220 further comprises a transient detection unit 221 and a time-frequency grid division unit 222. The transient detection unit 221 performs transient detection and localization on each subband signal output by the adaptive multi-resolution filtering module 210. The time-frequency grid division unit 222 performs adaptive grid configuration in the frequency direction and the time direction, based on the transient behavior of each filtered subband signal as analyzed by the transient detection unit 221, the bit rate allocated to the high band, and the critical-band characteristics of the human ear, and outputs the best time-frequency grid at the current bit rate. For the adaptive time-frequency grid configuration strategy of the time-frequency grid division unit 222, refer to the related description of method 100 above; it is not repeated here.
Fig. 4 is a functional block diagram of a general bandwidth extension encoding method 300 based on the AFAG algorithm of the present invention described above. As shown in Fig. 4, the AFAG-based bandwidth extension encoding method 300 comprises the following steps:
In step 310, a frequency resolution is selected based on a transient analysis of the input monophonic audio signal, and adaptive multi-resolution QMF filtering is applied to the input monophonic audio signal to obtain an optimal time-frequency filtered signal.
In step 320, transient detection and localization are performed on each subband signal output by the filtering and, based on the transient analysis of each subband signal and taking into account the configured high-band coding bit rate and the critical-band characteristics of the human ear, adaptive grid configuration is performed in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate.
In step 330, high-frequency parameter coding is performed in units of the time-frequency grid. High-frequency parameter coding can be implemented with any suitable coding method in the prior art and is not the main concern of the present invention, so it is not described in detail here.
In step 340, coding parameters such as the time-frequency grid parameters and the multi-resolution filtering selection parameter are multiplexed, and a high-band bitstream is output.
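The text does not define a bitstream syntax for the multiplexed parameters. Purely as an illustration, the Python sketch below packs a filter bank selection index and the grid borders into bytes and unpacks them again. The byte layout and the function names `mux_params` and `demux_params` are invented for this example; a real codec would use a compact, entropy-coded syntax and would also carry the coded high-frequency parameters.

```python
def mux_params(filter_index, time_borders, freq_borders):
    """Pack side information into bytes (hypothetical layout, values 0..255)."""
    payload = bytes([filter_index, len(time_borders)]) + bytes(time_borders)
    payload += bytes([len(freq_borders)]) + bytes(freq_borders)
    return payload


def demux_params(payload):
    """Inverse of mux_params: recover the selection index and grid borders."""
    filter_index = payload[0]
    n_t = payload[1]
    time_borders = list(payload[2:2 + n_t])
    n_f = payload[2 + n_t]
    freq_borders = list(payload[3 + n_t:3 + n_t + n_f])
    return filter_index, time_borders, freq_borders


stream = mux_params(2, [0, 10, 20, 32], [32, 33, 35, 38, 64])
decoded = demux_params(stream)
```

The round trip returns exactly the parameters that were packed, which is the property a decoder relies on to rebuild the same grid and choose the matching synthesis filter bank.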
For a monophonic digital audio signal input in real time, encoding its high-frequency part with the AFAG-based bandwidth extension encoding method described above achieves the best time-frequency grid configuration within the configured coding bit rate limit, and can improve the coding efficiency of the high-frequency part of the digital audio signal or the sound quality of the high-frequency signal.
Fig. 5 is a functional block diagram of a general bandwidth extension decoding method 400 based on the AFAG algorithm of the present invention described above. The bandwidth extension decoding method 400 decodes the bitstream output by the bandwidth extension encoding method 300 shown in Fig. 4 to obtain a monophonic audio signal. As shown in Fig. 5, the AFAG-based bandwidth extension decoding method 400 comprises the following steps:
In step 410, the input high-band bitstream is demultiplexed to obtain coding parameters such as the multi-resolution filtering selection parameter and the time-frequency grid parameters.
In step 420, high-frequency parameter decoding is performed based on the time-frequency grid to restore the high-frequency subband signals. High-frequency parameter decoding can be implemented with a suitable decoding method corresponding to the high-frequency parameter coding method described above and is not the main concern of the present invention, so it is not described in detail here.
In step 430, the decoded low-frequency signal is analyzed by a QMF to obtain the low-frequency subband signals. This is also prior art and is not described further here.
In step 440, according to the multi-resolution filtering selection parameter, multi-resolution synthesis corresponding to that of the encoder (see the bandwidth extension encoding method 300 in Fig. 4) is performed on the high-frequency subband signals obtained in step 420 and the low-frequency subband signals obtained in step 430, and a full-band monophonic audio signal is output.
Fig. 6 shows an AFAG-based SBR encoding application example. The basic process is as follows: the high-frequency part of the input full-band audio signal undergoes adaptive time-frequency grid configuration by AFAG and is then encoded with the SBR high-frequency parameter coding method to form the BWE bitstream. Specifically, as shown in Fig. 6, the adaptive multi-resolution filtering in the AFAG algorithm is implemented by selecting the best-matching QMF filter bank from four QMF filter banks with different subband bandwidths: the 16-subband, 32-subband, 64-subband, and 128-subband QMFs. Each QMF filter bank provides one fixed frequency resolution and time resolution, as shown in Figs. 11, 10, 9, and 8 respectively, where each small gray block 501 represents the smallest time-frequency unit. The selection strategy for adaptive multi-resolution filtering is as follows:
For transient signals, 16-subband QMF filtering may be selected. The 16-subband QMF has the coarsest frequency resolution: it splits the audio signal into 16 uniform frequency bands, each with bandwidth Fs/32 (Fs is the audio sampling frequency), and its time resolution is the highest, 16/Fs. 16-subband QMF filtering therefore gives each subband signal the maximum time resolution, which facilitates the subsequent construction of a reasonable time-frequency grid and can improve the high-frequency sound quality of BWE parametric coding for transient signals.
For steady-state signals, 128-subband QMF filtering may be selected. The 128-subband QMF has the finest frequency resolution: the bandwidth of each band is Fs/256, and the corresponding time resolution is the lowest, 128/Fs. Combined with the signal characteristics of each subband, it provides the maximum frequency resolution for steady-state signals such as tonal signals, improving the coding efficiency of the high-frequency part of steady-state signals.
For other, atypical audio signals, one of the two intermediate QMF filters (32-subband or 64-subband) may be selected according to the signal characteristics. This gives a higher frequency resolution than the 16-subband QMF and a higher time resolution than the 128-subband QMF, facilitates subsequent time-frequency grid construction, and ultimately yields a more balanced high-frequency coding result.
Influence of the high-frequency BWE coding bit rate: if the total bit rate of audio coding is low, so that the bit rate for high-frequency coding is low (that is, few bits are available for the high-frequency part), the frequency resolution of the high-frequency filtering should be reduced appropriately; that is, the QMF frequency resolution determined from the input signal characteristics alone is lowered appropriately.
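The selection strategy above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class labels `"transient"`, `"steady"`, and `"other"`, the `prefer_frequency` flag used to choose between the 32- and 64-subband banks, and the rule of stepping down exactly one bank at low bit rate are all assumptions introduced here. Only the four bank sizes and the resolution formulas (bandwidth Fs/(2M), time step M/Fs for an M-subband QMF) come from the text.

```python
from dataclasses import dataclass

QMF_SIZES = (16, 32, 64, 128)  # the four filter banks named in the text


@dataclass(frozen=True)
class QmfResolution:
    subbands: int
    bandwidth_hz: float  # Fs / (2 * M), e.g. Fs/32 for M = 16
    time_step_s: float   # M / Fs, e.g. 16/Fs for M = 16


def qmf_resolution(subbands: int, fs: float) -> QmfResolution:
    """Fixed frequency/time resolution of an M-subband QMF at sampling rate fs."""
    return QmfResolution(subbands, fs / (2 * subbands), subbands / fs)


def select_qmf(signal_class: str,
               prefer_frequency: bool = False,
               low_bitrate: bool = False) -> int:
    """Pick a QMF size from a frame classification (labels are hypothetical)."""
    if signal_class == "transient":
        m = 16            # coarsest frequency, finest time resolution
    elif signal_class == "steady":
        m = 128           # finest frequency, coarsest time resolution
    else:
        # Assumption: a single flag decides between the two intermediate banks.
        m = 64 if prefer_frequency else 32
    if low_bitrate and m > QMF_SIZES[0]:
        # Assumption: "reduce appropriately" is modelled as one step coarser.
        m = QMF_SIZES[QMF_SIZES.index(m) - 1]
    return m
```

For example, at Fs = 48 kHz `qmf_resolution(16, 48000.0)` gives a 1500 Hz band and a 16/48000 s time step, while `qmf_resolution(128, 48000.0)` gives 187.5 Hz bands.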
The AFAG algorithm adaptively selects the best multiresolution QMF by analyzing the steady-state/transient nature of a frame. Construction of the time-frequency grid, in turn, depends on the exact positions of transients within the frame, and even on transient analysis of the individual QMF subband signals; it must also take into account the bit rate allocated to the high-frequency subbands and the critical-band characteristics of the human ear. The basic strategy for adaptive time-frequency grid construction therefore has two parts: grid configuration in the time direction (combining samples within a QMF subband) and grid configuration in the frequency direction (combining QMF subbands). In the AFAG-based SBR encoding application example, the two directions are handled as follows:
Once a particular multiresolution QMF has been selected, the output signal has a fixed frequency resolution and time resolution. Adaptive grid configuration in the frequency direction therefore selects different grid configurations according to the frequency characteristics of the high-frequency part of the audio signal to be BWE-coded. For general audio signals, the basic method in the frequency direction is as follows: given the critical-band characteristics of human hearing, the frequency resolution of the ear falls as frequency rises, so the frequency grid should reduce its frequency resolution step by step as frequency increases; that is, several QMF subbands are merged into one frequency grid unit. The specific merging takes into account both the QMF frequency resolution and the critical bandwidth of the ear, so that the frequency grid is broadly consistent with the critical bands, as shown in Fig. 12. If the high-frequency signal to be BWE-coded is rich in tonal components, then, still taking the critical bands into account, the frequency resolution of the grid should be increased appropriately relative to the general case; that is, fewer QMF subbands form one frequency-direction grid unit, for example 1/3 of a critical bandwidth. Fig. 13 shows the grid configuration when subbands 16 to 23 of a 64-subband QMF contain tonal components.
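The merging of QMF subbands into critical-band-like grid units can be sketched as below. The patent does not give a critical-band formula; this sketch assumes the widely used Zwicker-Terhardt Bark approximation, and the function names, the `bark_step` parameter, and the greedy grouping rule are hypothetical. A `bark_step` of 1.0 stands for the general case and 1/3 for the tonal case mentioned in the text.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker-Terhardt approximation of the Bark scale (an assumption here)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def frequency_grid(fs: float, num_subbands: int, first_hf_subband: int,
                   bark_step: float = 1.0) -> list[tuple[int, int]]:
    """Group high-frequency QMF subbands into grid units of about bark_step Bark.

    Returns (start, end) subband index pairs, end exclusive. Every unit holds
    at least one subband, so the grid can never be finer than the QMF itself.
    """
    bandwidth = fs / (2 * num_subbands)  # width of one QMF subband in Hz
    units = []
    start = first_hf_subband
    while start < num_subbands:
        target = hz_to_bark(start * bandwidth) + bark_step
        end = start + 1
        # Extend the unit until its upper edge reaches the target Bark value.
        while end < num_subbands and hz_to_bark(end * bandwidth) < target:
            end += 1
        units.append((start, end))
        start = end
    return units
```

Calling `frequency_grid(48000.0, 64, 16)` and `frequency_grid(48000.0, 64, 16, bark_step=1/3)` shows the intended effect: the second call yields more, narrower units over the same high-frequency range.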
Adaptive grid configuration in the time direction depends on the positions at which one or more transients occur within a frame, also takes into account the transient characteristics of each subband signal, and divides the time direction into multiple time-domain intervals (for example, 8 intervals of unequal width), each interval representing one grid cell. In general, QMF subbands at higher frequencies require lower time resolution, so their number of grid cells can be reduced appropriately, as shown in Fig. 14.
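A time-direction grid driven by transient positions can be sketched as follows. This is an illustration under assumptions: transient positions are given as subband-sample (slot) indices, each transient gets a short cell of `attack_len` slots, and transients that would exceed the cell budget `max_cells` are simply ignored. None of these rules or names come from the patent, which only states that the frame is split into unequal intervals according to where transients occur.

```python
def time_grid(num_slots: int, transient_slots: list[int],
              attack_len: int = 2, max_cells: int = 8) -> list[int]:
    """Return sorted cell borders over [0, num_slots].

    Consecutive borders delimit one grid cell. Each transient opens a short
    cell at its onset so that the attack is coded with fine time resolution.
    """
    borders = {0, num_slots}
    for t in sorted(set(transient_slots)):
        if not 0 <= t < num_slots:
            continue  # ignore positions outside the frame
        new = {t, min(t + attack_len, num_slots)} - borders
        # Each new border adds one cell; stop once the budget would be exceeded.
        if len(borders) - 1 + len(new) > max_cells:
            break
        borders |= new
    return sorted(borders)


def cells(borders: list[int]) -> list[tuple[int, int]]:
    """Turn a border list into (start, end) cells, end exclusive."""
    return list(zip(borders[:-1], borders[1:]))
```

With 32 slots per frame and a transient at slot 10, `cells(time_grid(32, [10]))` returns `[(0, 10), (10, 12), (12, 32)]`. A smaller `max_cells` could be passed for higher QMF subbands to reflect their lower time-resolution requirement.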
The grid information from adaptive time-frequency grid construction describes how the time-frequency grid of the BWE-coded high-frequency subbands is constructed. It must be encoded into the BWE bitstream and passed to the decoder, which uses it to decode the grid-based coding parameters. The grid information is thus side information of BWE high-frequency coding and should not consume too many bits (too much bit rate). The BWE coding bit rate (or number of allocated bits) for the high-frequency signal is determined from the total bit rate of the low-bit-rate audio coding and the complexity of the low-frequency and high-frequency signals. Consequently, in addition to the best time-frequency grid computed from the current high-frequency signal characteristics, the bit-rate limit must be considered, and a combined correction yields the best time-frequency grid at the current bit rate, as shown in Fig. 15. The basic correction methods include:
(1) Reduce the frequency resolution of the grid: in the frequency direction, the width of each grid cell is increased. For example, a width of 1/3 critical band is changed to 1/2 critical band, or the lower subbands among the QMF high-frequency subbands use 1/3 critical band and the remainder use 1/2 critical band.
(2) Optimize the time-direction grid construction across the QMF high-frequency subbands: if the grid is constructed from the transient characteristics of each subband signal, different QMF subbands may have different numbers of grid cells, with different start and end samples for each cell, so more information must be transmitted. The grid intervals of the QMF subbands can therefore be adjusted as a whole to share or reduce the description of cell boundaries (intervals). For example, all BWE high-frequency subbands use the same time grid configuration, which minimizes the side information. As another example, the BWE high-frequency subbands have n time-frequency grids (for example, n < 4), where each higher BWE subband has fewer cells: half as many as the previous subband, with each cell aligned to two cells of the previous subband.
(3) Reduce the time resolution of the grid: in the time domain of the QMF subband signals, the width of the grid cells is increased (each cell contains more subband samples). For example, if the time direction was originally divided into 16 uniform intervals, these can be merged pairwise into 8 uniform intervals, or partially merged into 12 intervals (the cells in the region where the subband signal's transient is strong are left unchanged, and the parts before and after it are merged appropriately).
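Correction methods (2) and (3) can be illustrated with a short sketch. It assumes a grid is represented as a sorted list of cell borders in subband samples; the function names `merge_pairs` and `hierarchical_grids` are hypothetical, and the partial-merge variant (16 to 12 intervals) is not shown.

```python
def merge_pairs(borders: list[int]) -> list[int]:
    """Merge neighbouring cells two by two by keeping every second border.

    16 uniform cells become 8. With an odd cell count the last cell is left
    unmerged, and the final border is always kept.
    """
    merged = borders[::2]
    if merged[-1] != borders[-1]:
        merged.append(borders[-1])
    return merged


def hierarchical_grids(base_borders: list[int], num_bands: int) -> list[list[int]]:
    """Derive one time grid per band, halving the cell count for each higher band.

    Every cell of a higher band spans two cells of the band below it, so only
    the lowest band's borders need to be described in the side information.
    """
    grids = [list(base_borders)]
    for _ in range(num_bands - 1):
        grids.append(merge_pairs(grids[-1]))
    return grids
```

For a frame of 32 subband samples, `merge_pairs(list(range(0, 33, 2)))` turns 16 uniform cells into 8, and `hierarchical_grids([0, 8, 16, 24, 32], 3)` returns `[[0, 8, 16, 24, 32], [0, 16, 32], [0, 32]]`.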
In the AFAG-based SBR encoding application example shown in Fig. 6, after the best time-frequency grid is obtained with the AFAG algorithm of the present invention, SBR high-frequency parameter coding is performed based on that grid, which can significantly improve the coding efficiency and the sound quality of the high-frequency part of digital audio signals. SBR high-frequency parameter coding specifically includes tonality detection, tonal parameter coding, high-frequency detail parameter coding, and high-frequency envelope coding; this is prior art and is not detailed here. Finally, the BWE parameters are multiplexed and the high-band bitstream is output.
Fig. 7 shows an application example of SBR decoding based on AFAG. The decoding process is as follows: the BWE bitstream is demultiplexed to obtain the high-frequency parameters; the low-frequency part of the audio signal obtained by the general perceptual decoder is QMF-filtered to obtain the low-frequency subband signals; the high-frequency subband signals are obtained from the low-frequency subband signals and the high-frequency parameters by SBR high-frequency parameter decoding; and the high-frequency and low-frequency subband signals undergo multiresolution synthesis in an adaptive multiresolution synthesis filter matching that of the encoder, which outputs the full-band monophonic audio signal.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (6)
1. An adaptive grid configuration method for bandwidth extension encoding, characterized by comprising the following steps:
S1: selecting a frequency resolution based on transient analysis of an input monophonic audio signal, and applying adaptive multiresolution QMF filtering to the input monophonic audio signal to obtain an optimal time-frequency filtered signal;
S2: performing transient detection and localization on each subband signal output by the filtering, and, according to the transient analysis of each subband signal and taking into account the set high-band coding bit rate and the critical-band characteristics of the human ear, performing adaptive grid configuration in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate;
wherein step S1 further comprises:
for transient signals, selecting a coarse frequency resolution and a high time resolution for filtering;
for steady-state signals, selecting a fine frequency resolution and a low time resolution for filtering;
for other audio signals, selecting an intermediate frequency resolution and an intermediate time resolution for filtering;
the adaptive grid configuration in the frequency direction in step S2 further comprises: selecting different grid configurations according to the frequency characteristics of the high-band part of the input monophonic audio signal, specifically:
for general audio signals, the frequency grid reduces its frequency resolution step by step as frequency rises within the high-band part, so that the frequency grid is consistent with the critical bands of the human ear;
where the high-band part contains tonal signals, the frequency resolution of the grid is increased appropriately relative to the general audio signal case, while still taking the critical bands into account;
the adaptive grid configuration in the time direction in step S2 further comprises: dividing the time direction into multiple time-domain intervals according to the positions at which one or more transients occur in the input monophonic audio signal and the transient characteristics of each subband signal, each interval representing one grid cell.
2. The method according to claim 1, characterized in that step S1 further comprises:
where the high-band coding bit rate is low, reducing the selected frequency resolution.
3. The method according to claim 1, characterized in that step S2 further comprises:
correcting, based on the set high-band coding bit rate, the grid construction obtained in the frequency direction and the time direction, to obtain the best time-frequency grid at the current bit rate.
4. An adaptive grid configuration device for bandwidth extension encoding, characterized by comprising:
an adaptive multiresolution filtering module, configured to select a frequency resolution based on transient analysis of an input monophonic audio signal and to apply adaptive multiresolution QMF filtering to the input monophonic audio signal to obtain an optimal time-frequency filtered signal;
a time-frequency grid construction module, configured to perform transient detection and localization on each subband signal output by the filtering and, according to the transient analysis of each subband signal and taking into account the set high-band coding bit rate and the critical-band characteristics of the human ear, to perform adaptive grid configuration in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate;
wherein the adaptive multiresolution filtering module is further configured to:
for transient signals, select a coarse frequency resolution and a high time resolution for filtering;
for steady-state signals, select a fine frequency resolution and a low time resolution for filtering;
for other audio signals, select an intermediate frequency resolution and an intermediate time resolution for filtering;
the adaptive grid configuration in the frequency direction performed by the time-frequency grid construction module further comprises: selecting different grid configurations according to the frequency characteristics of the high-band part of the input monophonic audio signal, specifically:
for general audio signals, the frequency grid reduces its frequency resolution step by step as frequency rises within the high-band part, so that the frequency grid is consistent with the critical bands of the human ear;
where the high-band part contains tonal signals, the frequency resolution of the grid is increased appropriately relative to the general audio signal case, while still taking the critical bands into account;
the adaptive grid configuration in the time direction performed by the time-frequency grid construction module further comprises: dividing the time direction into multiple time-domain intervals according to the positions at which one or more transients occur in the input monophonic audio signal and the transient characteristics of each subband signal, each interval representing one grid cell.
5. A bandwidth extension encoding method, characterized by comprising the following steps:
S1: selecting a frequency resolution based on transient analysis of an input monophonic audio signal, and applying adaptive multiresolution QMF filtering to the input monophonic audio signal to obtain an optimal time-frequency filtered signal;
S2: performing transient detection and localization on each subband signal output by the filtering, and, according to the transient analysis of each subband signal and taking into account the set high-band coding bit rate and the critical-band characteristics of the human ear, performing adaptive grid configuration in the frequency direction and the time direction to obtain the best time-frequency grid at the current bit rate;
S3: performing high-frequency parameter coding with the time-frequency grid as the unit;
S4: multiplexing the coding parameters and outputting a high-band bitstream, the coding parameters including a multiresolution filter selection parameter and time-frequency grid parameters;
wherein step S1 further comprises:
for transient signals, selecting a coarse frequency resolution and a high time resolution for filtering;
for steady-state signals, selecting a fine frequency resolution and a low time resolution for filtering;
for other audio signals, selecting an intermediate frequency resolution and an intermediate time resolution for filtering;
the adaptive grid configuration in the frequency direction in step S2 further comprises: selecting different grid configurations according to the frequency characteristics of the high-band part of the input monophonic audio signal, specifically:
for general audio signals, the frequency grid reduces its frequency resolution step by step as frequency rises within the high-band part, so that the frequency grid is consistent with the critical bands of the human ear;
where the high-band part contains tonal signals, the frequency resolution of the grid is increased appropriately relative to the general audio signal case, while still taking the critical bands into account;
the adaptive grid configuration in the time direction in step S2 further comprises: dividing the time direction into multiple time-domain intervals according to the positions at which one or more transients occur in the input monophonic audio signal and the transient characteristics of each subband signal, each interval representing one grid cell.
6. A bandwidth extension decoding method, characterized by comprising the following steps:
S1: demultiplexing an input high-band bitstream obtained by the bandwidth extension encoding method according to claim 5 to obtain coding parameters, the coding parameters including a multiresolution filter selection parameter and time-frequency grid parameters;
S2: performing high-frequency parameter decoding based on the time-frequency grid to recover high-frequency subband signals;
S3: according to the multiresolution filter selection parameter, performing multiresolution synthesis matching that of the encoder on the high-frequency subband signals and the decoded low-frequency subband signals, and outputting a full-band monophonic audio signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510590039.8A CN105261373B (en) | 2015-09-16 | 2015-09-16 | Adaptive grid configuration method and apparatus for bandwidth extension encoding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105261373A (en) | 2016-01-20 |
CN105261373B (en) | 2019-01-08 |
Family
ID=55101029
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201510590039.8A CN105261373B (en) | Active | 2015-09-16 | 2015-09-16 | Adaptive grid configuration method and apparatus for bandwidth extension encoding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105261373B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20220527 Address after: 510530 No. 10, Nanxiang 2nd Road, Science City, Luogang District, Guangzhou, Guangdong Patentee after: Guangdong Guangsheng research and Development Institute Co.,Ltd. Address before: 518057 6th floor, software building, No. 9, Gaoxin Zhongyi Road, high tech Zone, Nanshan District, Shenzhen, Guangdong Province Patentee before: SHENZHEN RISING SOURCE TECHNOLOGY Co.,Ltd. |