CN105261373A - Self-adaptive grid construction method and device used for bandwidth extended coding - Google Patents

Publication number
CN105261373A
Authority
CN
China
Prior art keywords
frequency
grid
signal
self
resolution
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510590039.8A
Other languages
Chinese (zh)
Other versions
CN105261373B (en)
Inventor
闫建新
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Guangsheng Research And Development Institute Co ltd
Original Assignee
SHENZHEN GUANGSHENG XINYUAN TECHNOLOGY Co Ltd
Application filed by SHENZHEN GUANGSHENG XINYUAN TECHNOLOGY Co Ltd
Priority to CN201510590039.8A

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a self-adaptive grid construction method and device used for bandwidth extended coding. The method comprises the steps of: S1, based on transient property analysis of input monophonic audio signals, carrying out frequency resolution selection, carrying out self-adaptive multi-resolution filtering on the input monophonic audio signals, and obtaining an optimum time frequency filtering signal; and S2, carrying out transient detection and positioning on each sub-band signal output after the filtering, carrying out self-adaptive grid construction respectively in a frequency direction and a time direction according to the transient property analysis of each sub-band signal and by considering set high frequency band code rates and ear critical frequency band characteristics, and obtaining an optimum time frequency grid under a current code rate. The invention further relates to bandwidth extended coding and decoding methods based on the method. According to the invention, based on audio signal characteristics and high frequency signal available code rate limits, multi-resolution filtering and self-adaptive time frequency grid construction are carried out, and the coding efficiency of high frequency parts of digital audio signals and the sound quality of the high frequency part signals are obviously improved.

Description

Adaptive grid construction method and device for bandwidth extension coding
Technical field
The present invention relates to digital audio coding and decoding technology, and more particularly to an adaptive grid construction method and device for bandwidth extension coding, and to bandwidth extension encoding and decoding methods that use this method.
Background
Traditional perceptual audio coding technologies (DRA, AAC, MP3, etc.) typically operate at 96 to 128 kbps for stereo. Below 64 kbps for stereo, the coded audio shows obvious, subjectively perceptible distortion. The typical coding bit rate for FM broadcasting applications is 48 to 64 kbps for stereo, and at these rates the subjective sound quality of traditional perceptual audio coding cannot meet FM broadcasting requirements.
For this reason, bandwidth extension (BandWidth Extension, BWE for short) coding techniques for digital audio signals were proposed. Many BWE coding techniques exist today, and their performance varies. The BWE coding techniques published in international standards are mainly the following two algorithms:
The first is Spectral Band Replication (SBR) coding, described in ISO/IEC 14496-3 (MPEG-4). Fig. 1 shows a block diagram of SBR coding. SBR is a frequency-domain algorithm. Its coding principle is as follows: each frame is passed through a 64-subband quadrature mirror filter (Quadrature Mirror Filter, QMF for short) bank to obtain 64 uniform subbands of 32 samples each; a suitable time-frequency grid is chosen according to the transient characteristics of the current signal; and the energy of each grid cell is computed and Huffman coded. The algorithm also includes tonality detection and transmits parameters for additional individual sinusoids. At the decoder, the normalized low-frequency part is first selected and copied to the high band as the high-frequency detail; the high-frequency envelope is then adjusted using the transmitted energy information; individual sinusoids are then added; and finally the recovered high-frequency signal is filtered. The low band and the high band are then passed through the 64-band synthesis QMF to reconstruct the PCM digital audio signal.
The main drawback of MPEG SBR coding is its relatively fixed time-frequency segmentation. At a 48 kHz sampling rate with a 64-band QMF, the finest frequency resolution is 375 Hz (24 kHz / 64); with 2048 samples per frame, the finest time resolution is about 1.3 ms (64 / 48000 s). Because audio signals are extremely complex, this algorithm sometimes cannot meet the accuracy required for signal analysis.
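The resolution figures quoted above follow directly from the sampling rate and the number of QMF subbands. The short Python sketch below reproduces that arithmetic; the function name `qmf_resolution` is only an illustrative choice and does not come from the SBR specification, and a uniform, critically sampled filter bank is assumed.

```python
def qmf_resolution(sample_rate_hz, num_subbands):
    """Return (subband bandwidth in Hz, subband sample spacing in ms)
    for a uniform, critically sampled QMF bank (assumed model)."""
    bandwidth_hz = (sample_rate_hz / 2) / num_subbands
    time_step_ms = 1000.0 * num_subbands / sample_rate_hz
    return bandwidth_hz, time_step_ms


bw, dt = qmf_resolution(48_000, 64)
print(f"{bw:.0f} Hz per subband, {dt:.2f} ms per subband sample")
# prints "375 Hz per subband, 1.33 ms per subband sample"

# A 2048-sample frame yields 2048 / 64 = 32 samples in each subband.
samples_per_subband = 2048 // 64
```

Halving the number of subbands halves the time step and doubles the subband bandwidth, which is the trade-off that a fixed 64-band bank cannot adjust.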
The second is a simple bandwidth extension technique included in the 3GPP AMR-WB+ coding method. It is a time-domain algorithm. Its main coding principle is as follows: the input signal is split into a low-frequency and a high-frequency time-domain signal of equal bandwidth; the low-frequency (LF) part is passed through LPC analysis filtering to obtain the low-frequency residual signal, which is then passed through high-frequency LPC synthesis filtering to simulate the high-frequency detail signal; this simulated signal is compared with the actual high-frequency signal S_HF(n) to obtain a gain vector (one gain value per subframe) for the high-frequency envelope (energy); finally, the gain vector is further corrected for gain consistency at the junction between the low and high bands and is then encoded. The corrected gain vector and the high-frequency LPC coefficients are thus transmitted to the decoder. High-frequency decoding in AMR-WB+ is essentially the inverse of encoding.
The bandwidth extension coding technique of 3GPP AMR-WB+ has the following problems: (1) high-frequency coding is performed in the time domain, so a higher frequency resolution cannot be obtained, since the method can be regarded as having only a single high-frequency region; (2) the start frequency of high-frequency coding is fixed at Fs/4, so for a 48 kHz sampling frequency it is 12 kHz; (3) harmonic signals in the high band cannot be recovered accurately; (4) the envelope of the high-frequency signal is not recovered accurately enough.
There are also other bandwidth extension coding techniques in which the time-frequency transform uses a conventional FFT, the high band is then divided into several regions in the frequency domain, and the spectral energy of each region is encoded. Each frame can therefore provide only one time resolution with multiple frequency resolutions. Such FFT-based high-frequency reconstruction has a high frequency resolution but too low a time resolution, so when the input audio signal changes rapidly, the reconstructed high-frequency signal cannot track the changes of the original audio signal well.
Intensity stereo coding in digital audio coding can also be regarded as a special bandwidth extension coding technique. It exploits the insensitivity of human hearing to high-frequency detail: the high-frequency parts of the channels of a stereo or 5.1 surround signal are downmixed into a single channel which, after normalization, serves as the high-frequency detail signal of all channels. The high-frequency envelope of each channel (the energy in the high-frequency critical bands) must still be encoded and transmitted.
Summary of the invention
The technical problem to be solved by the present invention is to address the above defects of the prior art by providing an adaptive grid construction method and device for bandwidth extension coding, together with bandwidth extension encoding and decoding methods that use this method, so as to improve the coding efficiency of the high-frequency part of digital audio signals and the sound quality of the high-frequency signal.
The technical solution adopted by the present invention is an adaptive grid construction method for bandwidth extension coding, comprising the following steps:
S1: selecting a frequency resolution based on transient analysis of the input mono audio signal, and applying adaptive multi-resolution filtering to the input mono audio signal to obtain an optimal time-frequency filtered signal;
S2: performing transient detection and localization on each subband signal output by the filtering, and, according to the transient analysis of each subband signal and taking into account the set high-band coding bit rate and the critical-band characteristics of the human ear, performing adaptive grid construction in the frequency direction and the time direction to obtain the optimal time-frequency grid at the current bit rate.
According to one embodiment of the present invention, step S1 further comprises:
for transient signals, selecting a coarse frequency resolution and a high time resolution for filtering;
for steady-state signals, selecting a fine frequency resolution and a low time resolution for filtering;
for other audio signals, selecting an intermediate frequency resolution and an intermediate time resolution for filtering.
According to one embodiment of the present invention, step S1 further comprises:
when the high-band coding bit rate is low, reducing the selected frequency resolution.
According to one embodiment of the present invention, the adaptive grid construction in the frequency direction in step S2 further comprises selecting different grid constructions according to the frequency characteristics of the high-band part of the input mono audio signal, specifically:
for general audio signals, the frequency grid progressively reduces its frequency resolution as frequency rises within the high band, so that the frequency grid is consistent with the critical bands of the human ear;
when the high-band part contains tonal components, the frequency resolution of the grid is suitably increased compared with the case of general audio signals, while still taking the critical bands into account.
The adaptive grid construction in the time direction in step S2 further comprises: according to the positions at which one or more transients occur in the input mono audio signal and the transient characteristics of each subband signal, dividing the time direction into multiple time intervals, each interval representing one grid cell.
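The time-direction construction can be illustrated with a minimal Python sketch. It assumes that the frame is already expressed as a number of subband time slots and that the transient positions are given as slot indices; the function name `time_grid` and the rule "start a new interval at every transient" are illustrative assumptions rather than the exact procedure of the invention.

```python
def time_grid(num_slots, transient_slots):
    """Split slots 0..num_slots-1 into consecutive intervals, starting a
    new interval at every transient position (assumed rule).
    Returns a list of (start, end) pairs with end exclusive."""
    interior = {t for t in transient_slots if 0 < t < num_slots}
    borders = sorted({0, num_slots} | interior)
    return list(zip(borders[:-1], borders[1:]))


print(time_grid(32, [10, 22]))
# prints [(0, 10), (10, 22), (22, 32)]
```

A frame without transients yields a single interval covering the whole frame, which corresponds to the lowest time resolution.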
According to one embodiment of the present invention, step S2 further comprises:
correcting the grid construction obtained in the frequency direction and the time direction based on the set high-band coding bit rate, to obtain the optimal time-frequency grid at the current bit rate.
To solve its technical problem, the present invention also proposes an adaptive grid construction device for bandwidth extension coding, comprising:
an adaptive multi-resolution filtering module, configured to select a frequency resolution based on transient analysis of the input mono audio signal and to apply adaptive multi-resolution filtering to the input mono audio signal to obtain an optimal time-frequency filtered signal;
a time-frequency grid construction module, configured to perform transient detection and localization on each subband signal output by the filtering and, according to the transient analysis of each subband signal and taking into account the set high-band coding bit rate and the critical-band characteristics of the human ear, to perform adaptive grid construction in the frequency direction and the time direction to obtain the optimal time-frequency grid at the current bit rate.
According to one embodiment of the present invention, the adaptive multi-resolution filtering module is further configured to:
for transient signals, select a coarse frequency resolution and a high time resolution for filtering;
for steady-state signals, select a fine frequency resolution and a low time resolution for filtering;
for other audio signals, select an intermediate frequency resolution and an intermediate time resolution for filtering.
According to one embodiment of the present invention, the adaptive grid construction in the frequency direction performed by the time-frequency grid construction module further comprises selecting different grid constructions according to the frequency characteristics of the high-band part of the input mono audio signal, specifically:
for general audio signals, the frequency grid progressively reduces its frequency resolution as frequency rises within the high band, so that the frequency grid is consistent with the critical bands of the human ear;
when the high-band part contains tonal components, the frequency resolution of the grid is suitably increased compared with the case of general audio signals, while still taking the critical bands into account.
The adaptive grid construction in the time direction performed by the time-frequency grid construction module further comprises: according to the positions at which one or more transients occur in the input mono audio signal and the transient characteristics of each subband signal, dividing the time direction into multiple time intervals, each interval representing one grid cell.
To solve its technical problem, the present invention also proposes a bandwidth extension encoding method, comprising the following steps:
S1: selecting a frequency resolution based on transient analysis of the input mono audio signal, and applying adaptive multi-resolution filtering to the input mono audio signal to obtain an optimal time-frequency filtered signal;
S2: performing transient detection and localization on each subband signal output by the filtering, and, according to the transient analysis of each subband signal and taking into account the set high-band coding bit rate and the critical-band characteristics of the human ear, performing adaptive grid construction in the frequency direction and the time direction to obtain the optimal time-frequency grid at the current bit rate;
S3: performing high-frequency parameter coding in units of the time-frequency grid;
S4: multiplexing the coding parameters and outputting the high-band bitstream, the coding parameters including the multi-resolution filter selection parameter and the time-frequency grid parameters.
To solve its technical problem, the present invention also proposes a bandwidth extension decoding method, comprising the following steps:
S1: demultiplexing the input high-band bitstream to obtain the coding parameters, the coding parameters including the multi-resolution filter selection parameter and the time-frequency grid parameters;
S2: performing high-frequency parameter decoding based on the time-frequency grid to recover the high-frequency subband signals;
S3: according to the multi-resolution filter selection parameter, applying to the high-frequency subband signals and the decoded low-frequency subband signals the multi-resolution synthesis corresponding to the encoder, and outputting the full-band mono audio signal.
The adaptive grid construction method and device for bandwidth extension coding of the present invention can perform adaptive multi-resolution filtering and adaptive time-frequency grid construction (Adaptive multi-resolution Filtering & Adaptive time-frequency Griding, AFAG for short) based on the characteristics of the audio signal and the limit on the bit rate available for the high-frequency signal. The resulting optimal time-frequency grid benefits the subsequent modules of bandwidth extension coding, such as high-frequency parameter coding in units of the time-frequency grid, and significantly improves the coding efficiency of the high-frequency part of digital audio signals and the sound quality of the high-frequency signal. A digital audio codec implemented with the bandwidth extension encoding and decoding methods of the present invention can be used in low-bit-rate audio coding fields such as sound processing for satellite HDTV and high-quality audio broadcasting.
Brief description of the drawings
The invention is further described below with reference to the drawings and embodiments. In the drawings:
Fig. 1 is a block diagram of the existing SBR coding method;
Fig. 2 is a flowchart of the adaptive grid construction method for bandwidth extension coding according to one embodiment of the invention;
Fig. 3 is a logical block diagram of the adaptive grid construction device for bandwidth extension coding according to one embodiment of the invention;
Fig. 4 is a block diagram of the bandwidth extension encoding method according to one embodiment of the invention;
Fig. 5 is a block diagram of the bandwidth extension decoding method according to one embodiment of the invention;
Fig. 6 is a block diagram of an SBR encoding application example based on AFAG;
Fig. 7 is a block diagram of an SBR decoding application example based on AFAG;
Fig. 8 is a schematic diagram of the time-frequency resolution of the 128-subband QMF;
Fig. 9 is a schematic diagram of the time-frequency resolution of the 64-subband QMF;
Fig. 10 is a schematic diagram of the time-frequency resolution of the 32-subband QMF;
Fig. 11 is a schematic diagram of the time-frequency resolution of the 16-subband QMF;
Fig. 12 is a schematic diagram of the general frequency grid construction for the 64-subband QMF;
Fig. 13 is a schematic diagram of the frequency grid construction for the 64-subband QMF when subbands 16 to 23 contain tonal components;
Fig. 14 is a schematic diagram of the general time grid construction for the 64-subband QMF;
Fig. 15 is a schematic diagram of the optimal time-frequency grid construction for the 64-subband QMF under a constrained bit rate.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
Fig. 2 shows a flowchart of the adaptive grid construction method 100 for bandwidth extension coding according to an embodiment of the invention. As shown in Fig. 2, the method 100 comprises the following steps:
Step S110: select a frequency resolution based on transient analysis of the input mono audio signal, and apply adaptive multi-resolution filtering to the input mono audio signal to obtain an optimal time-frequency filtered signal.
In this step, the transient behavior of the input mono audio signal is first analyzed in real time. A frequency resolution is then selected according to the steady-state or transient characteristics found by the analysis, so that the best-suited multi-resolution filter bank (QMF) is chosen to filter the audio signal and output the optimal time-frequency filtered signal. In general, the selection strategy for adaptive multi-resolution filtering, based on the transient behavior of the input mono audio signal, is as follows:
For transient signals, a coarse frequency resolution and a high time resolution can be selected for filtering;
for steady-state signals, a fine frequency resolution and a low time resolution can be selected for filtering;
for other audio signals, an intermediate frequency resolution and an intermediate time resolution can be selected for filtering.
In addition, the limited bit rate of high-frequency bandwidth extension coding must be taken into account. If the total bit rate of the audio coding is low, the bit rate for coding the high-band signal is also low (that is, fewer bits are available for coding the high-band part), and the frequency resolution used to filter the high-band signal should be suitably reduced. In other words, the frequency resolution determined from the transient characteristics of the input audio signal alone can be further reduced as appropriate.
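The text does not specify how the transient analysis itself is carried out. As a minimal sketch under stated assumptions, the following Python function labels a frame from the largest rise in short-time energy between consecutive blocks. The block length and both thresholds are hypothetical values chosen for illustration, and a practical analyzer would likely also use a tonality measure to identify steady-state signals.

```python
def classify_frame(samples, block=64, hi=8.0, lo=2.0):
    """Label a frame 'transient', 'steady' or 'other' from block energies.
    block, hi and lo are illustrative assumptions, not values from the text."""
    energies = [
        sum(x * x for x in samples[i:i + block]) + 1e-12  # avoid division by zero
        for i in range(0, len(samples) - block + 1, block)
    ]
    # Largest energy rise between two consecutive blocks (attack detection only).
    jump = max((b / a for a, b in zip(energies, energies[1:])), default=1.0)
    if jump >= hi:
        return "transient"
    if jump <= lo:
        return "steady"
    return "other"


frame = [0.0] * 256 + [1.0] * 256   # silence followed by a sudden onset
label = classify_frame(frame)       # "transient"
```

The three labels correspond to the three cases of the selection strategy above: coarse, fine and intermediate frequency resolution.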
Step S120: perform transient detection and localization on each subband signal output by the filtering and, according to the transient analysis of each subband signal and taking into account the set high-band coding bit rate and the critical-band characteristics of the human ear, perform adaptive grid construction in the frequency direction and the time direction to obtain the optimal time-frequency grid at the current bit rate.
The construction of the time-frequency grid depends on the specific positions of transients within a frame, and even on the transient analysis of each filtered subband signal. It must also take into account the bit rate allocated to the high-band part and the critical-band characteristics of the human ear. The basic strategy of adaptive time-frequency grid construction therefore consists mainly of two parts: grid construction in the time direction, that is, combining samples within the same frequency subband; and grid construction in the frequency direction, that is, combining different frequency subbands.
The adaptive grid construction strategy in the frequency direction selects different grid constructions mainly according to the frequency characteristics of the high-band part of the input mono audio signal. Specifically: for general audio signals, the frequency grid progressively reduces its frequency resolution as frequency rises within the high band, so that the frequency grid is consistent with the critical bands of the human ear; when the high-band part contains tonal components, the frequency resolution of the grid should be suitably increased compared with the case of general audio signals, while still taking the critical bands into account. The adaptive grid construction in the time direction is based mainly on the positions at which one or more transients occur in the input mono audio signal and on the transient characteristics of each subband signal: the time direction is divided into multiple time intervals, each interval representing one grid cell.
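One way to make a frequency grid follow the critical bands is to group the uniform QMF subbands by their position on the Bark scale. The Python sketch below does this with the Zwicker-Terhardt approximation of the Bark scale. Both the use of this particular formula and the `bands_per_bark` parameter (raised above 1 to refine the grid for tonal content) are assumptions made for illustration; the text does not state how the critical-band mapping is computed, and the approximation is extrapolated here above the range for which the Bark scale is usually tabulated.

```python
import math


def bark(freq_hz):
    """Zwicker-Terhardt approximation of the Bark scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def frequency_grid(sample_rate_hz, num_subbands, first_hf_subband, bands_per_bark=1):
    """Group high-band subbands so each group spans about 1/bands_per_bark Bark.
    Returns a list of (first_subband, end_subband) pairs with end exclusive."""
    if first_hf_subband >= num_subbands:
        return []
    width_hz = sample_rate_hz / (2.0 * num_subbands)
    groups, start, key = [], first_hf_subband, None
    for k in range(first_hf_subband, num_subbands):
        centre_hz = (k + 0.5) * width_hz
        k_key = math.floor(bark(centre_hz) * bands_per_bark)
        if key is not None and k_key != key:
            groups.append((start, k))   # close the previous group
            start = k
        key = k_key
    groups.append((start, num_subbands))
    return groups


general = frequency_grid(48_000, 64, 16)                  # general audio signal
tonal = frequency_grid(48_000, 64, 16, bands_per_bark=2)  # finer grid for tonal content
```

Because the Bark scale compresses high frequencies, the groups generally become wider toward the top of the band, and the finer setting only adds borders to the general grid.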
In addition, the time-frequency grid construction computed above from the characteristics of the current high-band signal is also limited by the bit rate of high-frequency bandwidth extension coding. The grid construction obtained in the frequency direction and the time direction must therefore be corrected according to that bit rate, yielding the optimal time-frequency grid at the current bit rate.
The adaptive grid construction method for bandwidth extension coding introduced above, referred to as the AFAG algorithm, obtains the optimal time-frequency grid through adaptive multi-resolution filtering and adaptive time-frequency grid construction. This benefits the subsequent high-frequency parameter coding of bandwidth extension coding and can significantly improve the coding efficiency of the high-frequency part of digital audio signals and the sound quality of the high-frequency signal.
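The text does not say how the bit-rate correction is performed. A minimal Python sketch, under the assumption that the number of grid cells is a usable proxy for the bit cost, is to merge adjacent intervals until the cell count fits a budget. The names `merge_narrowest` and `fit_grid_to_budget`, the `max_cells` budget and the merge order are all hypothetical.

```python
def merge_narrowest(intervals):
    """Merge the adjacent pair of (start, end) intervals whose union is narrowest."""
    i = min(range(len(intervals) - 1),
            key=lambda j: intervals[j + 1][1] - intervals[j][0])
    merged = (intervals[i][0], intervals[i + 1][1])
    return intervals[:i] + [merged] + intervals[i + 2:]


def fit_grid_to_budget(time_ivs, freq_ivs, max_cells):
    """Coarsen the grid until len(time_ivs) * len(freq_ivs) <= max_cells,
    merging along whichever direction currently has more intervals."""
    time_ivs, freq_ivs = list(time_ivs), list(freq_ivs)
    while len(time_ivs) * len(freq_ivs) > max_cells:
        if len(freq_ivs) >= len(time_ivs) and len(freq_ivs) > 1:
            freq_ivs = merge_narrowest(freq_ivs)
        elif len(time_ivs) > 1:
            time_ivs = merge_narrowest(time_ivs)
        else:
            break  # a single cell cannot be coarsened further
    return time_ivs, freq_ivs


t, f = fit_grid_to_budget(
    [(0, 10), (10, 22), (22, 32)],
    [(16, 20), (20, 24), (24, 32), (32, 64)],
    max_cells=6,
)
# t is unchanged; f becomes [(16, 32), (32, 64)]
```

An actual encoder would more likely weigh the perceptual cost of each merge, for example by preferring to keep time borders around transients.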
Based on the adaptive grid construction method for bandwidth extension coding introduced above, the present invention also proposes an adaptive grid construction device for bandwidth extension coding. Fig. 3 shows a logical block diagram of the adaptive grid construction device 200 for bandwidth extension coding according to an embodiment of the invention. As shown in Fig. 3, the device 200 comprises an adaptive multi-resolution filtering module 210 and a time-frequency grid construction module 220. The adaptive multi-resolution filtering module 210 selects a frequency resolution based on transient analysis of the input mono audio signal and applies adaptive multi-resolution filtering to the input mono audio signal to obtain an optimal time-frequency filtered signal. The time-frequency grid construction module 220 performs transient detection and localization on each subband signal output by the filtering and, according to the transient analysis of each subband signal and taking into account the set high-band coding bit rate and the critical-band characteristics of the human ear, performs adaptive grid construction in the frequency direction and the time direction to obtain the optimal time-frequency grid at the current bit rate.
Specifically, as shown in Fig. 3, the adaptive multi-resolution filtering module 210 further comprises an audio signal analysis unit 211 and a multi-resolution filter bank 212. The audio signal analysis unit 211 first analyzes the transient behavior of the input mono audio signal in real time and then selects a frequency resolution according to the steady-state or transient characteristics found by the analysis, so that the best-suited filter bank in the multi-resolution filter bank 212 (for example, a quadrature mirror filter bank, QMF) is chosen to filter the audio signal. The multi-resolution filter bank 212 performs adaptive multi-resolution filtering according to the selection made by the audio signal analysis unit 211 and outputs the optimal time-frequency filtered signal. In general, the frequency resolution selection strategy of the audio signal analysis unit 211 is as follows:
For transient signals, a filter bank in the multi-resolution filter bank 212 with a coarse frequency resolution and a high time resolution can be selected for filtering;
for steady-state signals, a filter bank in the multi-resolution filter bank 212 with a fine frequency resolution and a low time resolution can be selected for filtering;
for other audio signals, a filter bank in the multi-resolution filter bank 212 with an intermediate frequency resolution and an intermediate time resolution can be selected for filtering.
In addition, to account for the limited bit rate of high-frequency bandwidth extension coding, the frequency resolution determined above from the transient characteristics of the input audio signal alone can be further reduced as appropriate.
Specifically, as shown in Fig. 3, the time-frequency grid construction module 220 further comprises a transient detection unit 221 and a time-frequency grid division unit 222. The transient detection unit 221 performs transient detection and localization on each subband signal output by the adaptive multi-resolution filtering module 210. The time-frequency grid division unit 222 performs adaptive grid construction in the frequency direction and the time direction, according to the transient behavior of each filtered subband signal found by the transient detection unit 221, the bit rate allocated to the high-band part, and the critical-band characteristics of the human ear, and outputs the optimal time-frequency grid at the current bit rate. For the adaptive time-frequency grid construction strategy of the time-frequency grid division unit 222, see the corresponding description of method 100 above; it is not repeated here.
Fig. 4 is a block diagram of a general bandwidth extension encoding method 300 based on the AFAG algorithm of the present invention described above. As shown in Fig. 4, the AFAG-based bandwidth extension encoding method 300 comprises the following steps:
In step 310, a frequency resolution is selected based on transient analysis of the input mono audio signal, and adaptive multi-resolution filtering is applied to the input mono audio signal to obtain an optimal time-frequency filtered signal.
In step 320, transient detection and localization are performed on each subband signal output by the filtering and, according to the transient analysis of each subband signal and taking into account the set high-band coding bit rate and the critical-band characteristics of the human ear, adaptive grid construction is performed in the frequency direction and the time direction to obtain the optimal time-frequency grid at the current bit rate.
In step 330, high-frequency parameter coding is performed in units of the time-frequency grid. High-frequency parameter coding can be implemented with any suitable coding method from the prior art; it is not the main concern of the present invention and is therefore not described in detail here.
In step 340, the coding parameters, such as the time-frequency grid parameters and the multi-resolution filter selection parameter, are multiplexed and the high-band bitstream is output.
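The bitstream syntax for these parameters is not given in the text. Purely as an illustration of what the multiplexing step has to carry, the Python sketch below serializes a filter selection index and two lists of grid borders into bytes and reads them back. The byte layout, the `QMF_SIZES` tuple and both function names are hypothetical; a real encoder would use a bit-packed, entropy-coded syntax and would also carry the coded high-frequency parameters.

```python
QMF_SIZES = (16, 32, 64, 128)  # candidate banks; the index serves as the selection parameter


def pack_params(qmf_size, time_borders, freq_borders):
    """Serialize the filter choice and grid borders (hypothetical byte layout).
    Every border value and list length must fit in one byte (0..255)."""
    out = bytearray([QMF_SIZES.index(qmf_size)])
    for borders in (time_borders, freq_borders):
        out.append(len(borders))
        out.extend(borders)
    return bytes(out)


def unpack_params(data):
    """Inverse of pack_params, as a decoder-side demultiplexer would need."""
    qmf_size = QMF_SIZES[data[0]]
    pos, lists = 1, []
    for _ in range(2):
        n = data[pos]
        lists.append(list(data[pos + 1:pos + 1 + n]))
        pos += 1 + n
    return qmf_size, lists[0], lists[1]


stream = pack_params(64, [10, 22], [20, 24, 32])
assert unpack_params(stream) == (64, [10, 22], [20, 24, 32])
```

The decoder needs the filter selection before anything else, since it determines how the grid borders are to be interpreted, which is why the sketch places it first.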
For a mono digital audio signal input in real time, coding its high-frequency part with the AFAG-based bandwidth extension encoding method described above achieves the optimal time-frequency grid construction under the set coding bit rate limit, and can improve the coding efficiency of the high-frequency part of the digital audio signal or the sound quality of the high-frequency signal.
Fig. 5 is a block diagram of a general bandwidth extension decoding method 400 based on the AFAG algorithm of the present invention described above. The bandwidth extension decoding method 400 decodes the bitstream output by the bandwidth extension encoding method 300 shown in Fig. 4 to obtain a mono audio signal. As shown in Fig. 5, the AFAG-based bandwidth extension decoding method 400 comprises the following steps:
In step 410, the input high-band bitstream is demultiplexed to obtain the coding parameters, such as the multi-resolution filter selection parameter and the time-frequency grid parameters.
In step 420, high-frequency parameter decoding is performed based on the time-frequency grid to recover the high-frequency subband signals. High-frequency parameter decoding can be implemented with a suitable decoding method corresponding to the high-frequency parameter coding method mentioned above; it is not the main concern of the present invention and is therefore not described in detail here.
In step 430, the low-frequency signal recovered by decoding is analyzed by a QMF to obtain the low-frequency subband signals. This is also prior art and is not described further here.
In step 440, according to the multi-resolution filter selection parameter, the high-frequency subband signals obtained in step 420 and the low-frequency subband signals obtained in step 430 undergo the multi-resolution synthesis corresponding to the encoder (see the bandwidth extension encoding method 300 of Fig. 4), and the full-band mono audio signal is output.
Fig. 6 shows an SBR encoding application example based on AFAG. The basic process is as follows: the high-frequency part of the input full-band audio signal undergoes adaptive time-frequency grid construction by AFAG and is then coded with the high-frequency parameter coding method of SBR to form the BWE bitstream. Specifically, as shown in Fig. 6, the adaptive multi-resolution filtering in this AFAG algorithm is implemented by selecting the best-matching bank from four QMF banks with different subband bandwidths: a 16-subband QMF, a 32-subband QMF, a 64-subband QMF and a 128-subband QMF. Each QMF bank provides one fixed frequency resolution and time resolution, as shown in Figs. 8, 9, 10 and 11, where each small gray block 501 represents the smallest time-frequency unit. The selection strategy for adaptive multi-resolution filtering is:
For transient signals, 16-subband QMF filtering can be selected. The 16-subband QMF has the coarsest frequency resolution: it divides the audio signal into 16 uniform frequency bands, each with a bandwidth of Fs/32 (Fs is the audio sampling frequency), and has the highest corresponding temporal resolution, 16/Fs. 16-subband QMF filtering thus provides the maximum temporal resolution for each subband signal, which facilitates the subsequent construction of a reasonable time-frequency grid and can improve the high-frequency sound quality of BWE parameter coding for transient signals.
For steady-state signals, 128-subband QMF filtering can be selected. The 128-subband QMF has the finest frequency resolution, each band having a bandwidth of Fs/256, and the lowest corresponding temporal resolution, 128/Fs. Combined with the signal characteristics of each subband, it provides the maximum frequency resolution for steady-state signals such as tonal signals and improves the coding efficiency of the high-frequency part of steady-state signals.
For other, atypical audio signals, one of the two intermediate QMF filter banks (32 subbands or 64 subbands) can be selected according to the signal characteristics. This gives a higher frequency resolution than the 16-subband QMF and a higher temporal resolution than the 128-subband QMF, which facilitates the subsequent time-frequency grid construction and finally yields a more balanced high-frequency coding result.
Influence of the BWE coding bit rate of the high-frequency signal: if the total bit rate of the audio signal coding is low, so that the bit rate for coding the high-frequency signal is also low (that is, fewer bits are available for coding the high-frequency part), the frequency resolution of the high-frequency signal filtering should be suitably reduced, i.e. the QMF filtering frequency resolution determined from the input audio signal characteristics alone should be suitably lowered.
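The selection strategy above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the signal class labels, the `tonal_leaning` flag used to choose between the 32-subband and 64-subband banks, and the rule of stepping down exactly one bank size at low bit rate are all assumptions made for illustration. Only the four bank sizes and the Fs/(2M) and M/Fs resolution figures come from the text.

```python
from dataclasses import dataclass

# The four QMF filter bank sizes named in the description.
QMF_SIZES = (16, 32, 64, 128)


@dataclass(frozen=True)
class QmfResolution:
    num_bands: int
    bandwidth_hz: float  # width of one subband: Fs / (2 * M)
    time_step_s: float   # spacing of subband samples: M / Fs


def qmf_resolution(num_bands: int, fs: float) -> QmfResolution:
    """Resolution of an M-band uniform QMF bank at sampling rate fs."""
    return QmfResolution(num_bands, fs / (2 * num_bands), num_bands / fs)


def select_qmf_bands(signal_class: str,
                     tonal_leaning: bool = False,
                     low_bitrate: bool = False) -> int:
    """Pick a bank size from the signal class (hypothetical labels)."""
    if signal_class == "transient":
        bands = 16       # coarsest frequency, finest time resolution
    elif signal_class == "steady":
        bands = 128      # finest frequency, coarsest time resolution
    else:
        # Assumption: a single flag decides between the two middle banks.
        bands = 64 if tonal_leaning else 32
    if low_bitrate and bands > QMF_SIZES[0]:
        # Assumption: "suitably reduce" is modelled as one step down.
        bands = QMF_SIZES[QMF_SIZES.index(bands) - 1]
    return bands
```

At Fs = 48 kHz, for instance, `qmf_resolution(16, 48000.0)` gives a subband width of 1500 Hz, while `qmf_resolution(128, 48000.0)` gives 187.5 Hz, which matches the Fs/32 and Fs/256 figures in the text.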
This AFAG algorithm adaptively selects the best multi-resolution QMF by steady-state/transient analysis of a frame of signal. The construction of the time-frequency grid then depends on the specific positions of transient signals within the frame, and even on the transient analysis of the QMF subband signals, while also taking into account the available bit rate allocated to the high-frequency subband part and the critical band characteristics of the human ear. The basic strategy of adaptive time-frequency grid construction therefore consists of two parts: the grid construction in the time direction (combination of samples within a QMF subband) and the grid construction in the frequency direction (combination across QMF subbands). In the SBR coding application example based on AFAG, the two directions are combined as follows:
Once a particular multi-resolution QMF filter has been selected, the output signal has a fixed frequency resolution and temporal resolution. The adaptive grid construction in the frequency direction therefore selects different grid constructions according to the frequency characteristics of the high-frequency part of the audio signal coded by BWE. For general audio signals, the basic grid construction method in the frequency direction is as follows: considering the critical band characteristics of the human auditory system, the frequency resolution of the human ear declines as frequency increases, so the frequency grid should progressively reduce its frequency resolution as frequency rises, i.e. several QMF subbands are merged into one frequency grid unit. The specific merging should take into account both the frequency resolution of the QMF and the critical bandwidth of the human ear, so that the frequency grid is roughly consistent with the critical bands of the human ear, as shown in Fig. 12. If the high-frequency signal coded by BWE contains abundant tonal signals, then, while still considering the critical bands, the frequency resolution of the grid should be suitably increased compared with general audio signals, i.e. fewer QMF subbands form one grid in the frequency direction, for example 1/3 of a critical bandwidth. Fig. 13 gives the grid construction when tonal components are present in subbands 16 to 23 of the 64-subband QMF.
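The merging of QMF subbands into frequency grid units can be sketched as below. This is only an illustration under stated assumptions: the text does not say how critical bandwidth is computed, so the sketch uses Traunmüller's well-known approximation of the Bark scale, and the function names and the `bark_fraction` parameter (1.0 for a full critical band, 1/3 for the finer tonal case) are hypothetical.

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmueller's approximation of the Bark (critical band) scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def frequency_grid(num_bands: int, fs: float,
                   first_band: int, last_band: int,
                   bark_fraction: float = 1.0):
    """Group QMF subbands first_band..last_band (inclusive) into grid units.

    A unit is closed as soon as it spans at least bark_fraction of a
    critical band, so units contain more subbands as frequency rises.
    Returns a list of (start_band, end_band) pairs, both inclusive.
    """
    band_hz = fs / (2 * num_bands)  # width of one QMF subband
    cells = []
    start = first_band
    for k in range(first_band, last_band + 1):
        low_edge = hz_to_bark(start * band_hz)
        high_edge = hz_to_bark((k + 1) * band_hz)
        # Close the unit when it is wide enough, or at the top of the range.
        if high_edge - low_edge >= bark_fraction or k == last_band:
            cells.append((start, k))
            start = k + 1
    return cells
```

Calling `frequency_grid(64, 48000.0, 32, 63, 1/3)` yields more and narrower units than the same call with `bark_fraction=1.0`, which mirrors the finer grid suggested for tonal high-frequency content.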
The adaptive grid construction in the time direction depends on the positions at which one or more transient signals occur within a frame of signal, while also considering the transient characteristics of each subband signal. The time direction is divided into multiple time intervals (for example 8 intervals of unequal width), each interval representing one grid. In general, QMF subbands at higher frequencies require lower temporal resolution, so the number of grids can be suitably reduced, as shown in Fig. 14.
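A minimal sketch of such a time-direction construction follows. The text only states that the intervals depend on the transient positions and may have unequal widths; the concrete rule used here (narrow intervals within a guard region around each transient, wider intervals elsewhere) and all names and default widths are assumptions for illustration.

```python
def time_grid(num_slots: int, transients,
              fine_width: int = 1, coarse_width: int = 4, guard: int = 1):
    """Split one frame of QMF subband samples into unequal time intervals.

    num_slots  : number of subband samples (time slots) in the frame
    transients : slot indices at which a transient was located
    Returns half-open intervals (start, end) that cover [0, num_slots).
    """
    # Mark the slots that need fine temporal resolution.
    fine = [False] * num_slots
    for t in transients:
        for s in range(max(0, t - guard), min(num_slots, t + guard + 1)):
            fine[s] = True

    cells = []
    start = 0
    while start < num_slots:
        width = fine_width if fine[start] else coarse_width
        end = min(start + width, num_slots)
        if not fine[start]:
            # A coarse interval must stop where a fine region begins.
            for s in range(start + 1, end):
                if fine[s]:
                    end = s
                    break
        cells.append((start, end))
        start = end
    return cells
```

With 16 slots and no transient the sketch returns four equal intervals; with a transient at slot 6 it places single-slot intervals around the transient and keeps wider intervals before and after it.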
The grid information of the adaptive time-frequency grid construction describes the time-frequency grid structure of the high-frequency subbands coded by BWE. It must be encoded into the BWE code stream and transmitted to the decoder, where it is used to decode the grid-based coding parameters. The grid information can therefore be regarded as side information of BWE high-frequency signal coding and should not occupy too many bits (or too much bit rate). The coding bit rate (or allocated number of bits) of the high-frequency signal BWE is determined from the total bit rate of the low-bit-rate audio coding and the complexity of the low-frequency and high-frequency signals. Thus, in addition to the best time-frequency grid construction computed from the current high-frequency signal characteristics, the bit rate constraint must also be taken into account, and a combined correction yields the best time-frequency grid at the current bit rate, as shown in Fig. 15. The basic correction methods include:
(1) Reducing the frequency resolution of the grid: in the frequency direction, the width of each grid is increased, for example from the original 1/3 critical bandwidth to 1/2 critical bandwidth, or 1/3 critical bandwidth is applied to the lower subbands among the QMF high-frequency subbands and 1/2 critical bandwidth to the rest.
(2) Optimizing the time-direction grid construction across the different QMF high-frequency subbands: if the grid is constructed from the transient characteristics of each subband signal, different QMF subbands may have different numbers of grids, and the start and end samples of each grid may differ, so more information has to be transmitted. The grid intervals of the QMF subbands can therefore be adjusted as a whole so that the grid boundary (interval) description information is shared or reduced. For example, all BWE high-frequency subbands use the same time-frequency grid construction, which minimizes the side information. As another example, all BWE high-frequency subbands have n (for example, n<4) time-frequency grids, and higher BWE subbands have fewer grids: the number of grids is 1/2 that of the previous subband, and each grid is aligned with two grids of the previous subband.
(3) Reducing the temporal resolution of the grid: in the time domain of the QMF subband signals, the width of each grid is increased (i.e. each grid contains more subband samples). For example, if 16 uniform intervals were originally constructed in the time direction, they can be merged in pairs into 8 uniform intervals, or partially merged into 12 intervals (the grids in the region where the subband signal transient is strong remain unchanged, and the parts before and after are suitably merged).
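The time-direction corrections in (2) and (3) can be sketched as simple interval merges. The helper names and the way the transient region is passed in (`keep_from`, `keep_to`) are hypothetical; the 16 to 8 and 16 to 12 interval counts follow the example in the text.

```python
def uniform_cells(num_slots: int, count: int):
    """Split [0, num_slots) into `count` equal half-open intervals."""
    width = num_slots // count
    return [(i * width, (i + 1) * width) for i in range(count)]


def merge_pairwise(cells):
    """Halve the temporal resolution by joining neighbouring intervals in pairs."""
    merged = []
    for i in range(0, len(cells), 2):
        group = cells[i:i + 2]
        merged.append((group[0][0], group[-1][1]))
    return merged


def merge_outside(cells, keep_from: int, keep_to: int):
    """Partial merge: intervals overlapping [keep_from, keep_to) stay as they
    are (the strongly transient region); those before and after are merged
    in pairs."""
    before = [c for c in cells if c[1] <= keep_from]
    kept = [c for c in cells if c[1] > keep_from and c[0] < keep_to]
    after = [c for c in cells if c[0] >= keep_to]
    return merge_pairwise(before) + kept + merge_pairwise(after)


# 16 uniform intervals over a 32-slot frame, as in the example of item (3).
cells16 = uniform_cells(32, 16)
cells8 = merge_pairwise(cells16)            # 8 uniform intervals
cells12 = merge_outside(cells16, 8, 24)     # 12 intervals, middle untouched
```

Applying `merge_pairwise` once per step up in frequency would also give the arrangement described in item (2), where a higher subband has half as many grids as the previous one and each of its grids is aligned with two grids of that subband.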
In the SBR coding application example based on AFAG shown in Fig. 6, after the best time-frequency grid has been obtained by the AFAG algorithm of the present invention, SBR high-frequency parameter coding is performed on the basis of this time-frequency grid, which can significantly improve the coding efficiency of the high-frequency part of the digital audio signal and the sound quality of the high-frequency part. SBR high-frequency parameter coding specifically includes tonality detection, tonal parameter coding, high-frequency detail parameter coding and high-frequency envelope coding; this belongs to the prior art and is not described in detail here. Finally, the BWE parameters are multiplexed and the high-band code stream is output.
Fig. 7 shows an SBR decoding application example based on AFAG. The decoding process is: the BWE code stream is demultiplexed to obtain the high-frequency parameters; the low-frequency part of the audio signal obtained by a general perceptual decoder is filtered by QMF to obtain the low-frequency subband signals; the high-frequency subband signals are obtained from the low-frequency subband signals and the high-frequency parameters by SBR high-frequency parameter decoding; and the high-frequency subband signals and the low-frequency subband signals undergo multi-resolution synthesis by the adaptive multi-resolution synthesis filter corresponding to the encoding side, and a full-band monophonic audio signal is output.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An adaptive grid construction method for bandwidth extension encoding, characterized by comprising the following steps:
S1, selecting a frequency resolution based on transient analysis of the input monophonic audio signal, and performing adaptive multi-resolution filtering on the input monophonic audio signal to obtain the best time-frequency filtered signal;
S2, performing transient detection and localization on each subband signal output by the filtering, and performing adaptive grid construction in the frequency direction and the time direction according to the transient analysis of each subband signal and in consideration of the set high-band coding bit rate and the critical band characteristics of the human ear, to obtain the best time-frequency grid at the current bit rate.
2. The method according to claim 1, characterized in that step S1 further comprises:
for transient signals, selecting a coarse frequency resolution and a high temporal resolution for filtering;
for steady-state signals, selecting a fine frequency resolution and a low temporal resolution for filtering;
for other audio signals, selecting an intermediate frequency resolution and an intermediate temporal resolution for filtering.
3. The method according to claim 2, characterized in that step S1 further comprises:
when the high-band coding bit rate is low, reducing the selected frequency resolution.
4. The method according to claim 1, characterized in that the adaptive grid construction in the frequency direction in step S2 further comprises: selecting different grid constructions according to the frequency characteristics of the high-band part of the input monophonic audio signal, specifically:
for general audio signals, the frequency grid progressively reduces its frequency resolution as the frequency of the high-band part rises, so that the frequency grid is consistent with the critical bands of the human ear;
when the high-band part contains tonal signals, the frequency resolution of the grid is suitably increased compared with the case of said general audio signals, while still considering the critical bands;
the adaptive grid construction in the time direction in step S2 further comprises: dividing the time direction into multiple time intervals according to the positions at which one or more transient signals occur in the input monophonic audio signal and the transient characteristics of each subband signal, each interval representing one grid.
5. The method according to claim 4, characterized in that step S2 further comprises:
correcting the grid constructions obtained in the frequency direction and the time direction based on the set high-band coding bit rate, to obtain the best time-frequency grid at the current bit rate.
6. An adaptive grid construction device for bandwidth extension encoding, characterized by comprising:
an adaptive multi-resolution filtering module, configured to select a frequency resolution based on transient analysis of the input monophonic audio signal and to perform adaptive multi-resolution filtering on the input monophonic audio signal to obtain the best time-frequency filtered signal;
a time-frequency grid construction module, configured to perform transient detection and localization on each subband signal output by the filtering, and to perform adaptive grid construction in the frequency direction and the time direction according to the transient analysis of each subband signal and in consideration of the set high-band coding bit rate and the critical band characteristics of the human ear, to obtain the best time-frequency grid at the current bit rate.
7. The device according to claim 6, characterized in that the adaptive multi-resolution filtering module is further configured to:
for transient signals, select a coarse frequency resolution and a high temporal resolution for filtering;
for steady-state signals, select a fine frequency resolution and a low temporal resolution for filtering;
for other audio signals, select an intermediate frequency resolution and an intermediate temporal resolution for filtering.
8. The device according to claim 6, characterized in that the adaptive grid construction in the frequency direction performed by the time-frequency grid construction module further comprises: selecting different grid constructions according to the frequency characteristics of the high-band part of the input monophonic audio signal, specifically:
for general audio signals, the frequency grid progressively reduces its frequency resolution as the frequency of the high-band part rises, so that the frequency grid is consistent with the critical bands of the human ear;
when the high-band part contains tonal signals, the frequency resolution of the grid is suitably increased compared with the case of said general audio signals, while still considering the critical bands;
the adaptive grid construction in the time direction performed by the time-frequency grid construction module further comprises: dividing the time direction into multiple time intervals according to the positions at which one or more transient signals occur in the input monophonic audio signal and the transient characteristics of each subband signal, each interval representing one grid.
9. A bandwidth extension encoding method, characterized by comprising the following steps:
S1, selecting a frequency resolution based on transient analysis of the input monophonic audio signal, and performing adaptive multi-resolution filtering on the input monophonic audio signal to obtain the best time-frequency filtered signal;
S2, performing transient detection and localization on each subband signal output by the filtering, and performing adaptive grid construction in the frequency direction and the time direction according to the transient analysis of each subband signal and in consideration of the set high-band coding bit rate and the critical band characteristics of the human ear, to obtain the best time-frequency grid at the current bit rate;
S3, performing high-frequency parameter coding in units of said time-frequency grid;
S4, multiplexing the coding parameters and outputting the high-band code stream, said coding parameters comprising the multi-resolution filter selection parameter and the time-frequency grid parameters.
10. A bandwidth extension decoding method, characterized by comprising the following steps:
S1, demultiplexing the input high-band code stream to obtain the coding parameters, said coding parameters comprising the multi-resolution filter selection parameter and the time-frequency grid parameters;
S2, performing high-frequency parameter decoding based on the time-frequency grid to recover the high-frequency subband signals;
S3, according to the multi-resolution filter selection parameter, performing multi-resolution synthesis corresponding to the encoding side on said high-frequency subband signals and the low-frequency subband signals obtained by decoding, and outputting a full-band monophonic audio signal.
CN201510590039.8A 2015-09-16 2015-09-16 Adaptive grid configuration method and apparatus for bandwidth extension encoding Active CN105261373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510590039.8A CN105261373B (en) 2015-09-16 2015-09-16 Adaptive grid configuration method and apparatus for bandwidth extension encoding

Publications (2)

Publication Number Publication Date
CN105261373A true CN105261373A (en) 2016-01-20
CN105261373B CN105261373B (en) 2019-01-08

Family

ID=55101029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510590039.8A Active CN105261373B (en) 2015-09-16 2015-09-16 Adaptive grid configuration method and apparatus for bandwidth extension encoding

Country Status (1)

Country Link
CN (1) CN105261373B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108398691A (en) * 2018-05-25 2018-08-14 中国工程物理研究院流体物理研究所 A kind of difference frequency signal generation device and method
CN111210831A (en) * 2018-11-22 2020-05-29 广州广晟数码技术有限公司 Bandwidth extension audio coding and decoding method and device based on spectrum stretching

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5886276A (en) * 1997-01-16 1999-03-23 The Board Of Trustees Of The Leland Stanford Junior University System and method for multiresolution scalable audio signal encoding
CN1377499A (en) * 1999-10-01 2002-10-30 编码技术瑞典股份公司 Efficient spectral envelope coding using variable time/frequency resolution and time/frequency switching
CN1460992A (en) * 2003-07-01 2003-12-10 北京阜国数字技术有限公司 Low-time-delay adaptive multi-resolution filter group for perception voice coding/decoding
CN101312041A (en) * 2004-09-17 2008-11-26 广州广晟数码技术有限公司 Apparatus and methods for multichannel digital audio coding
CN102016983A (en) * 2008-03-04 2011-04-13 弗劳恩霍夫应用研究促进协会 Apparatus for mixing a plurality of input data streams
CN104584124A (en) * 2013-01-22 2015-04-29 松下电器产业株式会社 Bandwidth expansion parameter-generator, encoder, decoder, bandwidth expansion parameter-generating method, encoding method, and decoding method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108398691A (en) * 2018-05-25 2018-08-14 中国工程物理研究院流体物理研究所 A kind of difference frequency signal generation device and method
CN108398691B (en) * 2018-05-25 2023-10-17 中国工程物理研究院流体物理研究所 Difference frequency signal generating device and method
CN111210831A (en) * 2018-11-22 2020-05-29 广州广晟数码技术有限公司 Bandwidth extension audio coding and decoding method and device based on spectrum stretching

Also Published As

Publication number Publication date
CN105261373B (en) 2019-01-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220527

Address after: 510530 No. 10, Nanxiang 2nd Road, Science City, Luogang District, Guangzhou, Guangdong

Patentee after: Guangdong Guangsheng research and Development Institute Co.,Ltd.

Address before: 518057 6th floor, software building, No. 9, Gaoxin Zhongyi Road, high tech Zone, Nanshan District, Shenzhen, Guangdong Province

Patentee before: SHENZHEN RISING SOURCE TECHNOLOGY Co.,Ltd.
