CN102194457A - Audio encoding and decoding method, system and noise level estimation method - Google Patents

Publication number
CN102194457A
Authority
CN
China
Prior art keywords
frequency
subband
noise
encoded
zero bits
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010191850619A
Other languages
Chinese (zh)
Other versions
CN102194457B (en
Inventor
江东平
袁浩
彭科
陈国明
黎家力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN2010191850619A
Publication of CN102194457A
Application granted
Publication of CN102194457B
Legal status: Active
Anticipated expiration

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to an audio encoding and decoding method, a system, and a noise level estimation method. The noise level estimation method comprises the following steps: estimating a power spectrum of the audio signal to be encoded from the frequency-domain coefficients of that signal; and estimating the noise level of the audio signal in a zero-bit coding subband from the calculated power spectrum, wherein the noise level is used during decoding to control the ratio of the energy of noise filling to the energy of spectral band replication, and a zero-bit coding subband is a coding subband to which zero bits have been allocated. With this method, frequency-domain coefficients that were not encoded can be reconstructed well.

Description

Audio encoding and decoding method, system and noise level estimation method
Technical field
The present invention relates to audio encoding and decoding technology, and in particular to an audio encoding and decoding method, a system, and a noise level estimation method that perform spectrum reconstruction on coding subbands that were not encoded.
Background art
Audio coding is at the core of multimedia applications such as digital audio broadcasting, music distribution over the Internet, and voice communication, all of which benefit greatly from improved compression performance in audio encoders. Perceptual audio encoders, a form of lossy transform-domain coding, are the mainstream of modern audio encoders. Because the coding bit rate is limited, some frequency-domain coefficients or frequency components usually cannot be encoded. To better recover the spectral components of the uncoded subbands, existing audio codecs usually reconstruct them by noise filling or by spectral band replication. G.722.1C uses noise filling, HE-AAC-V1 uses spectral band replication, and G.719 combines noise filling with simple band replication. Noise filling alone cannot recover well the spectral envelope of an uncoded subband or the tonal and noise components inside it. The spectral band replication method of HE-AAC-V1 must perform spectrum analysis on the audio signal before encoding, estimate the tonality and noise of the high-frequency components, extract parameters, and encode the downsampled audio signal with an AAC encoder. Its computational complexity is high, it must transmit more parameter information to the decoder and therefore consumes more coded bits, and it also increases coding delay. The replication scheme of G.719, in turn, is too simple to recover well the spectral envelope of an uncoded subband or the tonal and noise components inside it.
Summary of the invention
The technical problem to be solved by the present invention is to provide an audio encoding and decoding method, a system, and a noise level estimation method that reconstruct well the frequency-domain coefficients that were not encoded.
To solve the above technical problem, the invention provides a noise level estimation method, comprising:
Estimating the power spectrum of the audio signal to be encoded from the frequency-domain coefficients of that signal;
Estimating the noise level of the audio signal in a zero-bit coding subband from the calculated power spectrum, the noise level being used during decoding to control the ratio of the energy of noise filling to the energy of spectral band replication; wherein a zero-bit coding subband is a coding subband to which zero bits have been allocated.
Further, the noise level of the audio signal in a zero-bit coding subband is the ratio of the noise component power estimated within that subband to the tonal component power estimated within that subband.
Further,
The power spectrum of the audio signal to be encoded is estimated from its MDCT frequency-domain coefficients. The power of frequency bin k in frame i is calculated as follows:
$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda) X_i(k)^2$, where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ is the estimated power of the k-th frequency bin of the i-th frame; $X_i(k)$ is the MDCT coefficient of the k-th frequency bin of the i-th frame; and $\lambda$ is the filter coefficient of the one-pole smoothing filter.
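The recursion above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `update_power_spectrum`, the list-based representation, and the default value of `lam` (standing for λ) are assumptions, since the text gives no value for the filter coefficient.

```python
def update_power_spectrum(prev_power, mdct_coeffs, lam=0.9):
    """One-pole smoothing of the per-bin power across frames.

    prev_power  -- P_{i-1}(k) for every bin k, or None for the first frame (i = 0)
    mdct_coeffs -- X_i(k), the MDCT coefficients of the current frame
    lam         -- filter coefficient lambda (0.9 is an assumed value)
    """
    if prev_power is None:
        # The text defines P_{i-1}(k) = 0 when i = 0.
        prev_power = [0.0] * len(mdct_coeffs)
    return [lam * p + (1.0 - lam) * x * x
            for p, x in zip(prev_power, mdct_coeffs)]


# Two consecutive frames with two bins each.
p0 = update_power_spectrum(None, [2.0, 0.0], lam=0.5)
p1 = update_power_spectrum(p0, [0.0, 2.0], lam=0.5)
```

A larger `lam` weights the history more heavily, so the estimate reacts more slowly to a single frame.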
Further,
The frequency-domain coefficients of the audio signal to be encoded are divided into one or several noise filling subbands. Calculating the noise level of a given effective noise filling subband from the estimated power spectrum of the audio signal to be encoded comprises:
Calculating the mean power of all frequency-domain coefficients in all or some of the zero-bit coding subbands within this effective noise filling subband, to obtain the average power P_aveg(j);
Calculating the mean power of those frequency-domain coefficients, in all or some of the zero-bit coding subbands within this effective noise filling subband, whose power $P_i(k)$ is greater than the average power P_aveg(j), to obtain the tonal component average power P_signal_aveg(j) of the zero-bit coding subbands within this effective noise filling subband;
Calculating the mean power of those frequency-domain coefficients, in all or some of the zero-bit coding subbands within this effective noise filling subband, whose power $P_i(k)$ is less than or equal to the average power P_aveg(j), to obtain the noise component average power P_noise_aveg(j) of the zero-bit coding subbands within this effective noise filling subband;
Calculating the ratio P_noise_rate(j) of the noise component average power P_noise_aveg(j) to the tonal component average power P_signal_aveg(j), to obtain the noise level of this effective noise filling subband.
Here, an effective noise filling subband is a noise filling subband that contains at least one zero-bit coding subband.
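The four steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `noise_level` is hypothetical, the input is taken to be the list of smoothed powers $P_i(k)$ of the zero-bit bins of one effective noise filling subband, and the return value of 1.0 for a flat spectrum is an assumption, because the text does not say what happens when no bin exceeds the average.

```python
def noise_level(powers):
    """Estimate P_noise_rate(j) from the bin powers of the zero-bit
    coding subbands inside one effective noise filling subband."""
    if not powers:
        raise ValueError("no zero-bit bins in this noise filling subband")
    p_aveg = sum(powers) / len(powers)             # P_aveg(j)
    tonal = [p for p in powers if p > p_aveg]      # bins above the average
    noise = [p for p in powers if p <= p_aveg]     # bins at or below it
    if not tonal or not noise:
        # Assumption: a flat spectrum is treated as pure noise.
        return 1.0
    p_signal_aveg = sum(tonal) / len(tonal)        # P_signal_aveg(j)
    p_noise_aveg = sum(noise) / len(noise)         # P_noise_aveg(j)
    return p_noise_aveg / p_signal_aveg            # P_noise_rate(j)


# One strong tonal bin among three weak bins gives a low noise level.
level = noise_level([1.0, 1.0, 1.0, 9.0])
```

Because the tonal set contains only bins above the average and the noise set only bins at or below it, the ratio never exceeds 1 in this sketch.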
To solve the above technical problem, the invention also provides an audio encoding method, comprising:
A. Dividing the MDCT frequency-domain coefficients of the audio signal to be encoded into several coding subbands, and quantizing and encoding the amplitude envelope value of each coding subband to obtain amplitude envelope coded bits;
B. Allocating bits to each coding subband, and quantizing and encoding the non-zero-bit coding subbands to obtain MDCT frequency-domain coefficient coded bits;
C. Estimating the power spectrum of the audio signal to be encoded from its MDCT frequency-domain coefficients, estimating from it the noise level of the audio signal in the zero-bit coding subbands, and quantizing and encoding the noise level to obtain noise level coded bits; wherein the noise level is used during decoding to control the ratio of the energy of noise filling to the energy of spectral band replication, and a zero-bit coding subband is a coding subband to which zero bits have been allocated;
D. Multiplexing and packing the amplitude envelope coded bits of each coding subband, the frequency-domain coefficient coded bits, and the noise level coded bits, and sending them to the decoder.
Further, in step C, the noise level of the audio signal in a zero-bit coding subband is the ratio of the noise component power estimated within that subband to the tonal component power estimated within that subband.
Further,
The power spectrum of the audio signal to be encoded is estimated from its MDCT frequency-domain coefficients. The power of frequency bin k in frame i is estimated as follows:
$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda) X_i(k)^2$, where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ is the estimated power of the k-th frequency bin of the i-th frame; $X_i(k)$ is the MDCT coefficient of the k-th frequency bin of the i-th frame; and $\lambda$ is the filter coefficient of the one-pole smoothing filter.
Further, in step B, the frequency-domain coefficients of the audio signal to be encoded are divided into one or several noise filling subbands, and after bits have been allocated to each coding subband, bits are allocated to the effective noise filling subbands. In step C, calculating the noise level of a given effective noise filling subband from the estimated power spectrum of the audio signal to be encoded comprises:
Calculating the mean power of all frequency-domain coefficients in all or some of the zero-bit coding subbands within this effective noise filling subband, to obtain the average power P_aveg(j);
Calculating the mean power of those frequency-domain coefficients, in all or some of the zero-bit coding subbands within this effective noise filling subband, whose power $P_i(k)$ is greater than the average power P_aveg(j), to obtain the tonal component average power P_signal_aveg(j) of the zero-bit coding subbands within this effective noise filling subband;
Calculating the mean power of those frequency-domain coefficients, in all or some of the zero-bit coding subbands within this effective noise filling subband, whose power $P_i(k)$ is less than or equal to the average power P_aveg(j), to obtain the noise component average power P_noise_aveg(j) of the zero-bit coding subbands within this effective noise filling subband;
Calculating the ratio P_noise_rate(j) of the noise component average power P_noise_aveg(j) to the tonal component average power P_signal_aveg(j), to obtain the noise level of this effective noise filling subband.
Here, an effective noise filling subband is a noise filling subband that contains at least one zero-bit coding subband.
Further, the noise filling subbands are divided either uniformly or non-uniformly according to the characteristics of human hearing, and one noise filling subband comprises one or more coding subbands.
Further, in step B, bits are allocated either to all effective noise filling subbands, or one or several low-frequency effective noise filling subbands are skipped and bits are allocated to the subsequent higher-frequency effective noise filling subbands; in step C, the noise level is calculated for the effective noise filling subbands that have been allocated bits; in step D, the allocated bits are used when multiplexing and packing the noise level coded bits.
Further, each effective noise filling subband is allocated the same number of bits, or different numbers of bits are allocated according to auditory characteristics.
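The grouping and bit allocation described above can be sketched in Python as follows. This is a hedged illustration only: the function names, the `group_sizes` representation of the subband division, and the default of 2 bits per noise level are assumptions that do not come from the text.

```python
def effective_noise_fill_subbands(bits_per_coding_subband, group_sizes):
    """Return the indices j of the noise filling subbands that contain
    at least one zero-bit coding subband.

    group_sizes[j] is the number of consecutive coding subbands that
    make up noise filling subband j (uniform or non-uniform division).
    """
    effective = []
    start = 0
    for j, size in enumerate(group_sizes):
        group = bits_per_coding_subband[start:start + size]
        if any(bits == 0 for bits in group):
            effective.append(j)
        start += size
    return effective


def allocate_noise_level_bits(effective, skip_low=0, bits_each=2):
    """Give every effective noise filling subband the same number of bits,
    optionally skipping the first skip_low (lowest-frequency) ones."""
    return {j: bits_each for j in effective[skip_low:]}


# Six coding subbands grouped uniformly into three noise filling subbands.
effective = effective_noise_fill_subbands([3, 2, 0, 1, 0, 0], [2, 2, 2])
allocation = allocate_noise_level_bits(effective, skip_low=1)
```

In this example the first noise filling subband has no zero-bit coding subband, so it is not effective and needs no noise level at all.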
To solve the above technical problem, the invention also provides an audio decoding method, comprising:
A2. Decoding and inverse-quantizing each amplitude envelope coded bit in the bit stream to be decoded, to obtain the amplitude envelope of each coding subband;
B2. Allocating bits to each coding subband, decoding and inverse-quantizing the noise level coded bits to obtain the noise level of the zero-bit coding subbands, and decoding and inverse-quantizing the frequency-domain coefficient coded bits to obtain the frequency-domain coefficients of the non-zero-bit coding subbands;
C2. Performing spectral band replication on the zero-bit coding subbands, controlling the overall filling energy level of each such coding subband according to its amplitude envelope, and controlling the ratio of the energy of noise filling to the energy of spectral band replication according to the noise level of that zero-bit coding subband, to obtain the reconstructed frequency-domain coefficients of the zero-bit coding subbands;
D2. Applying the inverse modified discrete cosine transform (IMDCT) to the frequency-domain coefficients of the non-zero-bit coding subbands and the reconstructed frequency-domain coefficients of the zero-bit coding subbands, to obtain the final audio signal.
Further, in step C2, spectral band replication proceeds as follows: the position of a tone of the audio signal is searched for in the MDCT frequency-domain coefficients; the bandwidth from frequency bin 0 to the bin at the tone position is taken as the spectral band replication period; the band running from bin 0 shifted up by copyband_offset bins to the tone-position bin shifted up by the same copyband_offset bins is taken as the source band; and the zero-bit coding subband is filled by spectral band replication from that source band. If the highest frequency bin inside a zero-bit coding subband is lower than the bin of the tone found, that zero-bit coding subband is reconstructed using noise filling only.
Further, in step C2,
The absolute values or the squared values of the frequency-domain coefficients of a first band are taken and smoothed by filtering;
According to the smoothing result, the position of the maximum extreme value of the filter output in the first band is searched for, and that position is taken as the position of the tone.
Further, the absolute values of the frequency-domain coefficients of the first band are smoothed according to the following formula:
$X\_amp_i(k) = \mu X\_amp_{i-1}(k) + (1-\mu) |\bar{X}_i(k)|$
Alternatively, the squared values of the frequency-domain coefficients of the first band are smoothed according to the following formula:
$X\_amp_i(k) = \mu X\_amp_{i-1}(k) + (1-\mu) \bar{X}_i(k)^2$
where $\mu$ is the smoothing filter coefficient, $X\_amp_i(k)$ is the filter output for the k-th frequency bin of the i-th frame, $\bar{X}_i(k)$ is the decoded MDCT coefficient of the k-th frequency bin of the i-th frame, and $X\_amp_{i-1}(k) = 0$ when $i = 0$.
Further, the first band is a low-frequency band in which energy is relatively concentrated, determined from the statistical characteristics of the spectrum, where low frequency refers to spectral components below one half of the total signal bandwidth.
Further, the maximum extreme value of the filter output is determined as follows: the initial maximum is searched for directly among the filter outputs of the frequency-domain coefficients of the first band, and this maximum is taken as the maximum extreme value of the first-band filter output.
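The smoothing of absolute values and the direct maximum search described above can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation: the function names, the inclusive band bounds `band_start` and `band_end`, and the default value of `mu` (standing for μ) are assumptions.

```python
def smooth_amplitude(prev_amp, decoded_coeffs, mu=0.8):
    """Smooth |X_i(k)| across frames for every bin k.

    prev_amp       -- X_amp_{i-1}(k), or None for the first frame (i = 0)
    decoded_coeffs -- decoded MDCT coefficients of the current frame
    mu             -- smoothing filter coefficient (0.8 is an assumed value)
    """
    if prev_amp is None:
        # The text defines X_amp_{i-1}(k) = 0 when i = 0.
        prev_amp = [0.0] * len(decoded_coeffs)
    return [mu * a + (1.0 - mu) * abs(x)
            for a, x in zip(prev_amp, decoded_coeffs)]


def find_tone_position(amp, band_start, band_end):
    """Return the bin index of the largest filter output inside the
    first band, amp[band_start] .. amp[band_end] inclusive."""
    return max(range(band_start, band_end + 1), key=lambda k: amp[k])


amp = smooth_amplitude(None, [0.0, -4.0, 1.0, 2.0], mu=0.5)
tonal_pos = find_tone_position(amp, 0, 3)
```

Smoothing across frames makes the detected tone position less sensitive to a single frame in which a different bin happens to peak.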
Further, the maximum extreme value of the filter output is determined as follows:
One section of the first band is taken as a second band. The initial maximum is searched for among the filter outputs of the frequency-domain coefficients of the second band, and processing then depends on the position of the frequency-domain coefficient to which this initial maximum corresponds:
a. If the initial maximum is the filter output of the lowest-frequency coefficient of the second band, that output is compared with the filter output of the preceding, lower-frequency coefficient in the first band, and the comparison continues toward lower frequencies. When the filter output of the current coefficient is greater than that of the preceding coefficient, the filter output of the current coefficient is the final maximum extreme value. Otherwise the comparison continues until the lowest-frequency coefficient of the first band is reached and its filter output is found to be greater than that of the following coefficient, in which case the filter output of the lowest-frequency coefficient of the first band is the final maximum extreme value;
b. If the initial maximum is the filter output of the highest-frequency coefficient of the second band, that output is compared with the filter output of the following, higher-frequency coefficient in the first band, and the comparison continues toward higher frequencies. When the filter output of the current coefficient is greater than that of the following coefficient, the filter output of the current coefficient is the final maximum extreme value. Otherwise the comparison continues until the highest-frequency coefficient of the first band is reached and its filter output is found to be greater than that of the preceding coefficient, in which case the filter output of the highest-frequency coefficient of the first band is the final maximum extreme value;
c. If the initial maximum is the filter output of a coefficient lying between the lowest and highest frequencies of the second band, the frequency-domain coefficient corresponding to this initial maximum is the position of the tone; that is, the initial maximum is the final maximum extreme value.
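Cases a to c above can be sketched in Python as follows. This is an illustrative reading of the text rather than a definitive implementation: the function name and the inclusive band bounds are assumptions, and ties are resolved by continuing the walk, since the text only says the walk stops when the current output is greater than its neighbour.

```python
def find_tone_with_extension(amp, first_lo, first_hi, second_lo, second_hi):
    """Search the second band for the largest filter output and, if it sits
    on an edge of the second band, follow the slope into the first band."""
    k = max(range(second_lo, second_hi + 1), key=lambda n: amp[n])
    if k == second_lo:
        # Case a: walk toward lower frequencies while the neighbour is not smaller.
        while k > first_lo and amp[k] <= amp[k - 1]:
            k -= 1
    elif k == second_hi:
        # Case b: walk toward higher frequencies while the neighbour is not smaller.
        while k < first_hi and amp[k] <= amp[k + 1]:
            k += 1
    # Case c: an interior maximum is used directly.
    return k


# The second band (bins 2..4) peaks on its lower edge, but the true
# local peak lies just outside it at bin 1.
pos = find_tone_with_extension([1, 5, 4, 3, 2, 1, 0], 0, 6, 2, 4)
```

The edge cases exist because a maximum on the boundary of the second band may only be the flank of a peak that lies just outside it.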
Further, in step C2, when spectral band replication is performed on a zero-bit coding subband, the source-band copy start index for that subband is first calculated from the source band and the start index of the zero-bit coding subband to be filled; then, with the spectral band replication period as the period, the frequency-domain coefficients of the source band are copied periodically into the zero-bit coding subband, starting from the source-band copy start index.
Further, in step C2, the source-band copy start index of the zero-bit coding subband is calculated as follows:
The index of the first MDCT frequency bin of the zero-bit coding subband whose frequency-domain coefficients are to be reconstructed is obtained and denoted fillband_start_freq. The index of the frequency bin corresponding to the tone is denoted Tonal_pos, and Tonal_pos plus 1 gives the replication period copy_period. The spectral band replication offset is denoted copyband_offset. copy_period is subtracted repeatedly from fillband_start_freq until the value falls within the index range of the source band; that value is the source-band copy start index, denoted copy_pos_mod.
Further, in step C2, the frequency-domain coefficients of the source band are copied periodically into the zero-bit coding subband, starting from the source-band copy start index and with the spectral band replication period as the period, as follows:
The frequency-domain coefficients starting at the source-band copy start index are copied in order, toward higher frequencies, into the zero-bit coding subband starting at fillband_start_freq. When the source bin being copied reaches bin Tonal_pos + copyband_offset, copying continues into the zero-bit coding subband from the coefficient at bin copyband_offset, and so on, until all frequency-domain coefficients of the current zero-bit coding subband have been filled by spectral band replication.
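The start-index calculation and periodic copy described above can be sketched in Python as follows. This is a sketch under stated assumptions: the function name `replicate_band`, the inclusive `fill_end` bound, and the requirement that the subband to fill starts above the source band are not from the text, while `tonal_pos`, `copyband_offset`, `copy_period`, and `copy_pos_mod` follow the names Tonal_pos, copyband_offset, copy_period, and copy_pos_mod used there.

```python
def replicate_band(spectrum, tonal_pos, copyband_offset, fill_start, fill_end):
    """Return the coefficients copied into bins fill_start..fill_end (inclusive).

    Returns None when the subband lies entirely below the tone, in which
    case the text says only noise filling is used.
    """
    if fill_end < tonal_pos:
        return None
    copy_period = tonal_pos + 1
    src_lo = copyband_offset                # first bin of the source band
    src_hi = tonal_pos + copyband_offset    # last bin of the source band
    if fill_start <= src_hi:
        # Assumption of this sketch: the subband starts above the source band.
        raise ValueError("subband overlaps the source band")
    # Subtract copy_period until the index falls inside the source band.
    copy_pos_mod = fill_start
    while copy_pos_mod > src_hi:
        copy_pos_mod -= copy_period
    copied = []
    pos = copy_pos_mod
    for _ in range(fill_start, fill_end + 1):
        copied.append(spectrum[pos])
        # After bin Tonal_pos + copyband_offset, wrap to bin copyband_offset.
        pos = src_lo if pos == src_hi else pos + 1
    return copied


# Each coefficient equals its own bin index, so the output shows
# which source bins were used.
spectrum = [float(k) for k in range(20)]
filled = replicate_band(spectrum, tonal_pos=3, copyband_offset=2,
                        fill_start=10, fill_end=15)
```

Deriving the start index by repeated subtraction keeps the copied pattern phase-aligned with the period, so adjacent zero-bit subbands continue the same periodic structure.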
Further, in step C2, the energy of the frequency-domain coefficients obtained by replication into a zero-bit coding subband is adjusted as follows:
The amplitude envelope of the frequency-domain coefficients obtained after spectral band replication into the zero-bit coding subband is calculated and denoted sbr_rms(r);
The energy of the replicated frequency-domain coefficients is adjusted by the formula:
$\overline{X\_sbr}(r) = X\_sbr(r) \cdot sbr\_lev\_scale(r) \cdot rms(r) / sbr\_rms(r)$
where $\overline{X\_sbr}(r)$ denotes the energy-adjusted frequency-domain coefficients of zero-bit coding subband r; X_sbr(r) denotes the frequency-domain coefficients of zero-bit coding subband r obtained by replication; sbr_rms(r) is the amplitude envelope of the replicated coefficients X_sbr(r); rms(r) is the amplitude envelope of the frequency-domain coefficients of zero-bit coding subband r before encoding, obtained by inverse-quantizing the amplitude envelope quantization index; and sbr_lev_scale(r) is the energy control scale factor of the spectral band replication of zero-bit coding subband r, whose value is determined by the noise level of the noise filling subband containing zero-bit coding subband r and is calculated as follows:
$sbr\_lev\_scale(r) = (1 - \overline{P\_noise\_rate}(j)) \cdot fill\_energy\_saclefactor$
fill_energy_saclefactor is the filling energy scale factor, used to adjust the gain of the overall filling energy, with a value in the range (0, 1); $\overline{P\_noise\_rate}(j)$ is the noise level of noise filling subband j obtained by decoding and inverse quantization, where j is the index of the noise filling subband containing zero-bit coding subband r.
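The energy adjustment above can be sketched in Python as follows. This is a hedged illustration: computing the amplitude envelope sbr_rms(r) as a root mean square is an assumption suggested by the name, the zero-energy guard and the default value 0.5 for `fill_energy_saclefactor` are also assumptions, and the function names are hypothetical.

```python
import math


def rms(values):
    """Root mean square, used here as the amplitude envelope (assumption)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def adjust_replicated_energy(x_sbr, rms_r, noise_rate_j,
                             fill_energy_saclefactor=0.5):
    """Scale the replicated coefficients X_sbr(r) of one zero-bit subband.

    rms_r        -- decoded amplitude envelope rms(r) of the subband
    noise_rate_j -- decoded noise level of the enclosing noise filling subband j
    """
    sbr_rms = rms(x_sbr)
    if sbr_rms == 0.0:
        # Assumption: an all-zero copy stays zero instead of dividing by zero.
        return [0.0] * len(x_sbr)
    sbr_lev_scale = (1.0 - noise_rate_j) * fill_energy_saclefactor
    gain = sbr_lev_scale * rms_r / sbr_rms
    return [x * gain for x in x_sbr]


adjusted = adjust_replicated_energy([3.0, -3.0, 3.0, -3.0],
                                    rms_r=2.0, noise_rate_j=0.25)
```

Dividing by sbr_rms(r) first normalizes the copied coefficients, so the final level depends only on the decoded envelope and the noise level, not on how loud the source band happened to be.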
Further, in step C2, noise filling is applied to the energy-adjusted frequency-domain coefficients according to the following formula:
$\bar{X}(r) = \overline{X\_sbr}(r) + rms(r) \cdot noise\_lev\_scale(r) \cdot random()$
where $\bar{X}(r)$ denotes the reconstructed frequency-domain coefficients of zero-bit coding subband r; $\overline{X\_sbr}(r)$ denotes the energy-adjusted frequency-domain coefficients of zero-bit coding subband r; rms(r) is the amplitude envelope of the frequency-domain coefficients of zero-bit coding subband r before encoding, obtained by inverse-quantizing the amplitude envelope quantization index; random() is a random phase generator that produces a random phase value and returns +1 or -1; and noise_lev_scale(r) is the noise level control scale factor of zero-bit coding subband r, whose value is determined by the noise level of the noise filling subband containing zero-bit coding subband r and is calculated as follows:
$noise\_lev\_scale(r) = \overline{P\_noise\_rate}(j) \cdot fill\_energy\_saclefactor$
where fill_energy_saclefactor is the filling energy scale factor, used to adjust the gain of the overall filling energy, with a value in the range (0, 1); $\overline{P\_noise\_rate}(j)$ is the noise level of noise filling subband j obtained by decoding and inverse quantization, where j is the index of the noise filling subband containing zero-bit coding subband r.
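The noise filling step above can be sketched in Python as follows. This is a minimal sketch, assuming the energy-adjusted coefficients are already available as a list: the function name `add_noise_fill`, the optional `rng` argument, and the default value 0.5 for `fill_energy_saclefactor` are assumptions, and `random.Random.choice` stands in for the random phase generator random().

```python
import random


def add_noise_fill(x_sbr_adjusted, rms_r, noise_rate_j,
                   fill_energy_saclefactor=0.5, rng=None):
    """Add random-sign noise to the energy-adjusted coefficients of one
    zero-bit coding subband and return the reconstructed coefficients."""
    if rng is None:
        rng = random.Random()
    noise_lev_scale = noise_rate_j * fill_energy_saclefactor
    noise_amp = rms_r * noise_lev_scale
    # random() in the text returns +1 or -1; rng.choice plays that role here.
    return [x + noise_amp * rng.choice((1.0, -1.0)) for x in x_sbr_adjusted]


# With no replicated content, every output bin has magnitude
# rms_r * noise_rate_j * fill_energy_saclefactor.
reconstructed = add_noise_fill([0.0] * 8, rms_r=2.0, noise_rate_j=0.5,
                               rng=random.Random(0))
```

A noise level near 1 shifts most of the filling energy to noise, while a noise level near 0 leaves the replicated tonal content almost untouched.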
Further, in step B2, after bits have been allocated to each coding subband, the coding subbands are divided into several noise filling subbands and bits are allocated to the effective noise filling subbands. In step C2, for the zero-bit coding subbands in effective noise filling subbands that have been allocated bits, spectral band replication is performed and the energy level of the replicated frequency-domain coefficients and the energy level of the noise filling are controlled; for the zero-bit coding subbands in effective noise filling subbands that have not been allocated bits, noise filling is performed. Here, an effective noise filling subband is a noise filling subband that contains at least one zero-bit coding subband.
To solve the above technical problem, the invention also provides an audio encoding system comprising a modified discrete cosine transform (MDCT) unit, an amplitude envelope calculation unit, an amplitude envelope quantization and coding unit, a bit allocation unit, a frequency-domain coefficient coding unit, and a bit stream multiplexer (MUX). The system further comprises a noise level estimation unit, wherein:
The MDCT unit is configured to apply the modified discrete cosine transform to the audio signal to generate frequency-domain coefficients;
The amplitude envelope calculation unit is connected to the MDCT unit and is configured to divide the frequency-domain coefficients generated by the MDCT unit into several coding subbands and to calculate the amplitude envelope value of each coding subband;
The amplitude envelope quantization and coding unit is connected to the amplitude envelope calculation unit and is configured to quantize and encode the amplitude envelope value of each coding subband, generating the coded bits of each coding subband's amplitude envelope;
The bit allocation unit is connected to the amplitude envelope quantization and coding unit and is configured to allocate bits to each coding subband;
The frequency-domain coefficient quantization and coding unit is connected to the MDCT unit, the bit allocation unit, and the amplitude envelope quantization and coding unit, and is configured to normalize, quantize, and encode all frequency-domain coefficients of each coding subband, generating the frequency-domain coefficient coded bits;
The noise level estimation unit is connected to the MDCT unit and the bit allocation unit and is configured to estimate the power spectrum of the audio signal to be encoded from its MDCT frequency-domain coefficients, to estimate from it the noise level of the audio signal in the zero-bit coding subbands, and to quantize and encode the noise level to obtain noise level coded bits; wherein the noise level is used during decoding to control the ratio of the energy of noise filling to the energy of spectral band replication;
The bit stream multiplexer (MUX) is connected to the amplitude envelope quantization and coding unit, the frequency-domain coefficient coding unit, and the noise level estimation unit, and is configured to multiplex the coded bits of each coding subband and the coded bits of the frequency-domain coefficients and send them to the decoder.
Further, the noise level of the audio signal in a zero-bit coding subband is the ratio of the noise component power estimated within that subband to the tonal component power estimated within that subband.
Further, the noise level estimation unit comprises:
A power spectrum estimation module, configured to estimate the power spectrum of the audio signal to be encoded from its MDCT frequency-domain coefficients;
A noise level calculation module, connected to the power spectrum estimation module and configured to estimate the noise level of the audio signal in the zero-bit coding subbands from the power spectrum estimated by the power spectrum estimation module;
A noise level coding module, connected to the noise level calculation module and configured to quantize and encode the noise level calculated by the noise level calculation module, to obtain the noise level coded bits.
Further, the power spectrum estimation module estimates the power of frequency bin k in frame i using the following formula:
$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda) X_i(k)^2$, where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ is the estimated power of the k-th frequency bin of the i-th frame; $X_i(k)$ is the MDCT coefficient of the k-th frequency bin of the i-th frame; and $\lambda$ is the filter coefficient of the one-pole smoothing filter.
Further,
The frequency-domain coefficients of the audio signal to be encoded are divided into one or several noise filling subbands. For a given effective noise filling subband, the noise level calculation module is configured to: calculate the mean power of all frequency-domain coefficients in all or some of the zero-bit coding subbands within this effective noise filling subband, to obtain the average power P_aveg(j); calculate the mean power of those frequency-domain coefficients, in all or some of the zero-bit coding subbands within this effective noise filling subband, whose power $P_i(k)$ is greater than the average power P_aveg(j), to obtain the tonal component average power P_signal_aveg(j) of the zero-bit coding subbands within this effective noise filling subband; calculate the mean power of those frequency-domain coefficients, in all or some of the zero-bit coding subbands within this effective noise filling subband, whose power $P_i(k)$ is less than or equal to the average power P_aveg(j), to obtain the noise component average power P_noise_aveg(j) of the zero-bit coding subbands within this effective noise filling subband; and calculate the ratio of the noise component average power P_noise_aveg(j) to the tonal component average power P_signal_aveg(j), to obtain the noise level of this effective noise filling subband;
Here, an effective noise filling subband is a noise filling subband that contains at least one zero-bit coding subband.
Further, the noise level estimation unit also comprises a bit allocation module connected to the noise level calculation module and the noise level coding module. The bit allocation module is configured either to allocate bits to all effective noise filling subbands, or to skip one or several low-frequency effective noise filling subbands and allocate bits to the subsequent higher-frequency effective noise filling subbands, and to notify the noise level calculation module and the noise level coding module. The noise level calculation module calculates the noise level only for the noise filling subbands that have been allocated bits. The noise level coding module uses the bits allocated by the bit allocation module to quantize and encode the noise level.
To solve the above technical problem, the present invention also provides an audio decoding system comprising a bit stream demultiplexer (DeMUX), a coding subband amplitude envelope decoding unit, a bit allocation unit, a frequency-domain coefficient decoding unit, a spectrum reconstruction unit, and an inverse modified discrete cosine transform (IMDCT) unit, wherein:
The DeMUX is used to separate the amplitude envelope coded bits, the frequency-domain coefficient coded bits and the noise level coded bits from the bit stream to be decoded;
The amplitude envelope decoding unit is connected to the DeMUX and is used to decode the amplitude envelope coded bits output by the bit stream demultiplexer, obtaining the amplitude envelope quantization index of each coding subband;
The bit allocation unit is connected to the amplitude envelope decoding unit and is used to perform bit allocation, obtaining the number of coded bits allocated to each frequency-domain coefficient in each coding subband;
The frequency-domain coefficient decoding unit is connected to the amplitude envelope decoding unit and the bit allocation unit, and is used to decode, inverse-quantize and denormalize the coding subbands to obtain the frequency-domain coefficients;
The noise level decoding unit is connected to the bit stream demultiplexer and the bit allocation unit, and is used to decode and inverse-quantize the noise level coded bits to obtain the noise level;
The spectrum reconstruction unit is connected to the noise level decoding unit, the frequency-domain coefficient decoding unit, the amplitude envelope decoding unit and the bit allocation unit. It is used to perform spectral band replication on the zero-bit coding subbands, to control the overall energy fill level of each such coding subband according to the amplitude envelope output by the amplitude envelope decoding unit, and to control the ratio between the energy of noise filling and the energy of spectral band replication according to the noise level output by the noise level decoding unit, obtaining the reconstructed frequency-domain coefficients of the zero-bit coding subbands;
The IMDCT unit is connected to the spectrum reconstruction unit and is used to apply the IMDCT to the frequency-domain coefficients after spectrum reconstruction of the zero-bit coding subbands has been completed, obtaining the audio signal.
Further, the spectrum reconstruction unit comprises a spectral band replication subunit, an energy adjustment subunit and a noise filling subunit connected in sequence, wherein:
The spectral band replication subunit is used to perform spectral band replication on the zero-bit coding subbands;
The energy adjustment subunit is used to calculate the amplitude envelope of the frequency-domain coefficients obtained after spectral band replication of a zero-bit coding subband, denoted sbr_rms(r), and to adjust the energy of the replicated coefficients according to the noise level output by the noise level decoding unit. The energy adjustment formula is:
$$\overline{X\_sbr}(r) = X\_sbr(r) \cdot sbr\_lev\_scale(r) \cdot \frac{rms(r)}{sbr\_rms(r)}$$
where $\overline{X\_sbr}(r)$ denotes the energy-adjusted frequency-domain coefficients of zero-bit coding subband r; X_sbr(r) denotes the frequency-domain coefficients obtained by replication for zero-bit coding subband r; sbr_rms(r) is the amplitude envelope of the replicated coefficients X_sbr(r); rms(r) is the amplitude envelope of the frequency-domain coefficients of zero-bit coding subband r before encoding, obtained by inverse quantization of the amplitude envelope quantization index; and sbr_lev_scale(r) is the energy control scale factor for spectral band replication of zero-bit coding subband r. Its value is determined by the noise level of the noise-filling subband that contains zero-bit coding subband r, according to the following formula:
$$sbr\_lev\_scale(r) = \left(1 - \overline{P\_noise\_rate}(j)\right) \cdot fill\_energy\_saclefactor$$
Here fill_energy_saclefactor is the fill energy scale factor, which adjusts the overall gain of the filled energy and takes values in (0, 1); $\overline{P\_noise\_rate}(j)$ is the noise level of noise-filling subband j obtained by decoding and inverse quantization, where j is the index of the noise-filling subband that contains zero-bit coding subband r;
The noise filling subunit is used to perform noise filling on the energy-adjusted frequency-domain coefficients according to the noise level output by the noise level decoding unit. The noise filling formula is:
$$\overline{X}(r) = \overline{X\_sbr}(r) + rms(r) \cdot noise\_lev\_scale(r) \cdot random()$$
where $\overline{X}(r)$ denotes the reconstructed frequency-domain coefficients of zero-bit coding subband r; $\overline{X\_sbr}(r)$ denotes the energy-adjusted replicated coefficients of zero-bit coding subband r; rms(r) is the amplitude envelope of the frequency-domain coefficients of zero-bit coding subband r before encoding, obtained by inverse quantization of the amplitude envelope quantization index; random() is a random phase generator whose return value is +1 or -1; and noise_lev_scale(r) is the noise level control scale factor of zero-bit coding subband r. Its value is determined by the noise level of the noise-filling subband that contains zero-bit coding subband r, according to the following formula:
$$noise\_lev\_scale(r) = \overline{P\_noise\_rate}(j) \cdot fill\_energy\_saclefactor$$
where fill_energy_saclefactor is the fill energy scale factor, which adjusts the overall gain of the filled energy and takes values in (0, 1), and $\overline{P\_noise\_rate}(j)$ is the noise level of noise-filling subband j obtained by decoding and inverse quantization, where j is the index of the noise-filling subband that contains zero-bit coding subband r.
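The energy adjustment and noise filling formulas above can be illustrated with a short sketch. The function names are hypothetical, the default value 0.5 for fill_energy_saclefactor is only an example within the stated range (0, 1), and computing sbr_rms(r) as the root mean square of the replicated coefficients is an assumption based on the name of the quantity.

```python
import math
import random


def rms_of(values):
    """Root mean square of a list of coefficients (assumed meaning of sbr_rms)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def reconstruct_zero_bit_subband(x_sbr, rms_r, p_noise_rate,
                                 fill_energy_saclefactor=0.5, rng=None):
    """Energy-adjust replicated coefficients, then add +/-1 scaled noise.

    x_sbr: coefficients copied into zero-bit coding subband r.
    rms_r: dequantized amplitude envelope rms(r) of that subband.
    p_noise_rate: decoded noise level of the enclosing noise-filling subband.
    """
    rng = rng or random.Random()
    sbr_rms = rms_of(x_sbr)
    sbr_lev_scale = (1.0 - p_noise_rate) * fill_energy_saclefactor
    noise_lev_scale = p_noise_rate * fill_energy_saclefactor
    # Guard against an all-zero source band (assumption, not in the text).
    gain = sbr_lev_scale * rms_r / sbr_rms if sbr_rms > 0.0 else 0.0
    out = []
    for x in x_sbr:
        sign = 1.0 if rng.random() < 0.5 else -1.0   # random() returning +1 or -1
        out.append(x * gain + rms_r * noise_lev_scale * sign)
    return out
```

With a noise level of 0 the output is the replicated spectrum scaled to the target envelope; with a noise level of 1 it is pure random-sign noise at amplitude rms(r) times the fill factor.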
Further, the spectral band replication subunit comprises a tone position search module, a period and source band calculation module, a source band copy start index calculation module, and a spectral band replication module connected in sequence, wherein:
The tone position search module is used to search the MDCT frequency-domain coefficients for the position of a tone in the audio signal;
The period and source band calculation module is used to determine, from the tone position, the spectral band replication period and the source band used for replication. The replication period is the bandwidth from frequency bin 0 to the bin of the tone position. The source band is the band that runs from bin 0 shifted upward by the replication offset copyband_offset to the bin of the tone position shifted upward by the same copyband_offset;
The source band copy start index calculation module is used to calculate the source band copy start index for a zero-bit coding subband from the source band and from the start index of the zero-bit coding subband that requires spectral band replication;
The spectral band replication module is used to copy the frequency-domain coefficients of the source band periodically into the zero-bit coding subband, starting from the source band copy start index and using the replication period as the period. If the highest frequency inside the zero-bit coding subband is lower than the frequency of the tone found by the search, that subband is reconstructed by noise filling only.
Further, the tone position search module searches for the tone position as follows: take the absolute value or the square of the MDCT frequency-domain coefficients of a first band and apply smoothing filtering; then, based on the smoothing result, search for the position of the largest extremum of the filter output in the first band. That position is the tone position.
Further, the smoothing filter applied by the tone position search module to the absolute values of the MDCT frequency-domain coefficients of the first band is:
$$X\_amp_i(k) = \mu \, X\_amp_{i-1}(k) + (1 - \mu) \left| \overline{X}_i(k) \right|$$
or, when the squares of the frequency-domain coefficients of the first band are smoothed:
$$X\_amp_i(k) = \mu \, X\_amp_{i-1}(k) + (1 - \mu) \, \overline{X}_i(k)^2$$
where μ is the smoothing filter coefficient, X_amp_i(k) is the filter output for the k-th frequency bin of the i-th frame, $\overline{X}_i(k)$ is the decoded MDCT coefficient of the k-th frequency bin of the i-th frame, and X_amp_{i-1}(k) = 0 when i = 0.
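The inter-frame smoothing can be sketched as a one-pole recursion per frequency bin. The function name `smooth_spectrum` and the default μ = 0.9 are illustrative assumptions; the text does not give a value for the smoothing coefficient.

```python
def smooth_spectrum(prev_amp, coeffs, mu=0.9, use_square=False):
    """One smoothing step: X_amp_i(k) = mu * X_amp_{i-1}(k) + (1 - mu) * m(k).

    prev_amp: filter outputs of the previous frame, or None for the first frame
              (X_amp_{-1}(k) = 0).
    coeffs:   decoded MDCT coefficients of the current frame (first band only).
    m(k) is |X(k)| by default, or X(k)^2 when use_square is True.
    """
    if prev_amp is None:
        prev_amp = [0.0] * len(coeffs)
    return [
        mu * p + (1.0 - mu) * (c * c if use_square else abs(c))
        for p, c in zip(prev_amp, coeffs)
    ]
```

The output of one frame is passed back in as `prev_amp` for the next frame, so stable tonal peaks build up while frame-to-frame fluctuations are averaged out.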
Further, the first band is a low-frequency band in which energy is relatively concentrated, determined from the statistical properties of the spectrum, where low frequency refers to the spectral components below one half of the total signal bandwidth.
Further, the tone position search module searches directly for the initial maximum among the filter outputs of the frequency-domain coefficients of the first band and takes that maximum as the largest extremum of the first band filter output.
Further, when determining the largest extremum of the filter output, the tone position search module takes a section of the first band as a second band, first searches for the initial maximum among the filter outputs of the frequency-domain coefficients of the second band, and then proceeds according to the position of the coefficient corresponding to that initial maximum:
a. If the initial maximum is the filter output of the lowest-frequency coefficient of the second band, compare that filter output with the filter output of the preceding, lower-frequency coefficient in the first band, and continue comparing toward lower frequencies. As soon as the filter output of the current coefficient is greater than that of the preceding coefficient, the filter output of the current coefficient is the final largest extremum. Otherwise, if the comparison reaches the lowest-frequency coefficient of the first band and its filter output is greater than that of the following coefficient, the filter output of the lowest-frequency coefficient of the first band is the final largest extremum;
b. If the initial maximum is the filter output of the highest-frequency coefficient of the second band, compare that filter output with the filter output of the following, higher-frequency coefficient in the first band, and continue comparing toward higher frequencies. As soon as the filter output of the current coefficient is greater than that of the following coefficient, the filter output of the current coefficient is the final largest extremum. Otherwise, if the comparison reaches the highest-frequency coefficient of the first band and its filter output is greater than that of the preceding coefficient, the filter output of the highest-frequency coefficient of the first band is the final largest extremum;
c. If the initial maximum is the filter output of a coefficient between the lowest and highest frequencies of the second band, the coefficient corresponding to the initial maximum is the tone position; that is, the initial maximum is the final largest extremum.
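Cases a to c amount to finding the maximum inside the second band and, when it sits on an edge of that band, walking outward through the first band until a local peak is reached. The sketch below assumes inclusive bin indices and a second band contained in the first; the function name and parameter names are hypothetical.

```python
def find_tone_position(amp, first_lo, first_hi, second_lo, second_hi):
    """Return the bin index of the tone, following cases a, b and c.

    amp: smoothed filter outputs X_amp(k), indexed by frequency bin.
    [first_lo, first_hi]:   first band (inclusive).
    [second_lo, second_hi]: second band (inclusive), a section of the first band.
    """
    pos = max(range(second_lo, second_hi + 1), key=lambda k: amp[k])
    if pos == second_lo:
        # Case a: move to lower bins until the current value exceeds the previous one.
        while pos > first_lo and amp[pos - 1] >= amp[pos]:
            pos -= 1
    elif pos == second_hi:
        # Case b: move to higher bins until the current value exceeds the next one.
        while pos < first_hi and amp[pos + 1] >= amp[pos]:
            pos += 1
    # Case c: an interior maximum is already the tone position.
    return pos
```

Searching only the second band first keeps the common case cheap, while the edge walk avoids reporting a band boundary that merely lies on the slope of a peak outside the second band.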
Further, the source band copy start index calculation module calculates the source band copy start index of a zero-bit coding subband that requires spectral band replication as follows: obtain the index of the starting frequency bin of the current zero-bit coding subband whose frequency-domain coefficients are to be reconstructed, denoted fillband_start_freq; denote the index of the frequency bin corresponding to the tone as Tonal_pos, and add 1 to Tonal_pos to obtain the replication period copy_period; denote the start index of the source band as copyband_offset; repeatedly subtract copy_period from the value of fillband_start_freq until the value falls within the index range of the source band. This value is the source band copy start index, denoted copy_pos_mod.
Further, when the spectral band replication module performs spectral band replication, the frequency-domain coefficients starting at the source band copy start index are copied in increasing order into the zero-bit coding subband that starts at fillband_start_freq. When the source position reaches bin Tonal_pos + copyband_offset, copying continues into the zero-bit coding subband from the coefficient at bin copyband_offset, and so on, until all frequency-domain coefficients of the current zero-bit coding subband have been copied.
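The start index calculation and the periodic copy can be sketched as follows. Variable names follow the text (fillband_start_freq, Tonal_pos, copyband_offset, copy_period, copy_pos_mod); the function names, the inclusive treatment of bin Tonal_pos + copyband_offset, and the error raised for a subband below the source band are assumptions of this sketch.

```python
def source_start_index(fillband_start_freq, tonal_pos, copyband_offset):
    """Compute copy_pos_mod by repeatedly subtracting copy_period."""
    copy_period = tonal_pos + 1
    lo, hi = copyband_offset, copyband_offset + tonal_pos   # source band, inclusive
    pos = fillband_start_freq
    while pos > hi:
        pos -= copy_period
    if pos < lo:
        # Assumption: such subbands are handled by noise filling only.
        raise ValueError("zero-bit subband starts below the source band")
    return pos


def replicate_band(coeffs, fillband_start_freq, fillband_width,
                   tonal_pos, copyband_offset):
    """Copy source-band coefficients periodically into one zero-bit subband."""
    src = source_start_index(fillband_start_freq, tonal_pos, copyband_offset)
    hi = copyband_offset + tonal_pos
    out = list(coeffs)
    for n in range(fillband_width):
        out[fillband_start_freq + n] = coeffs[src]
        src = copyband_offset if src == hi else src + 1      # wrap to band start
    return out
```

For example, with Tonal_pos = 3 and copyband_offset = 2 the source band is bins 2 to 5 and the period is 4, so a zero-bit subband starting at bin 12 starts copying from bin 4 (12 - 4 - 4) and then wraps through bins 5, 2, 3, 4, 5.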
Further, the bit allocation unit is also used to allocate bits to all effective noise-filling subbands, or to skip one or more low-frequency effective noise-filling subbands and allocate bits to the subsequent higher-frequency effective noise-filling subbands. The energy adjustment subunit adjusts the energy of the frequency-domain coefficients obtained after spectral band replication. The noise filling subunit performs noise filling on the energy-adjusted frequency-domain coefficients and on the zero-bit coding subbands in noise-filling subbands that have not been allocated bits.
In the present invention, the encoder estimates the power spectrum of the audio signal to be encoded from the MDCT frequency-domain coefficients, estimates the noise level of the audio signal in the zero-bit coding subbands from that power spectrum, and encodes the noise level information and sends it to the decoder, where it controls the ratio between the energy of noise filling and the energy of spectral band replication. After the decoder has decoded the encoded MDCT frequency-domain coefficients, it reconstructs the frequency-domain coefficients of the uncoded coding subbands by spectral band replication and noise filling, with the ratio between noise filling energy and spectral band replication energy controlled by the noise level coded bits sent by the encoder. This method recovers the spectral envelope and the internal tonal and noise components of the uncoded coding subbands well and achieves good subjective listening quality.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the audio encoding method of the present invention.
Fig. 2 is a flow chart of obtaining the noise level coded bits for the zero-bit coding subbands inside a noise-filling subband according to the present invention.
Fig. 3 is a flow chart of the noise level calculation of the present invention.
Fig. 4 is a schematic diagram of the audio decoding method of the present invention.
Fig. 5 is a flow chart of the spectrum reconstruction of the present invention.
Fig. 6 is a structural diagram of the audio encoding system of the present invention.
Fig. 7 is a diagram of the module structure of the noise level estimation unit of the present invention.
Fig. 8 is a structural diagram of the audio decoding system of the present invention.
Fig. 9 is a diagram of the module structure of the spectrum reconstruction unit of the present invention.
Fig. 10 is a schematic diagram of the bitstream structure of an embodiment of the present invention.
Detailed Description
The core idea of the present invention is as follows. The encoder estimates the power spectrum of the audio signal to be encoded from the MDCT frequency-domain coefficients, estimates the noise level of the audio signal in the zero-bit coding subbands from that power spectrum, and encodes the noise level information and sends it to the decoder, where it controls the ratio between the energy of noise filling and the energy of spectral band replication. After the decoder has decoded the encoded MDCT frequency-domain coefficients, it reconstructs the frequency-domain coefficients of the uncoded coding subbands by spectral band replication and noise filling, with the ratio between noise filling energy and spectral band replication energy controlled by the noise level coded bits sent by the encoder. This method recovers the spectral envelope and the internal tonal and noise components of the uncoded coding subbands well and achieves good subjective listening quality.
In the present invention, the term frequency-domain coefficient always refers to an MDCT frequency-domain coefficient.
The invention is described in detail below in four parts: the encoding method, the decoding method, the encoding system and the decoding system.
I. Encoding method
The audio encoding method of the present invention comprises the following steps:
A. Divide the MDCT frequency-domain coefficients of the audio signal to be encoded into several coding subbands, and quantize and encode the amplitude envelope value of each coding subband to obtain the amplitude envelope coded bits;
When dividing the coding subbands, the frequency-domain coefficients after the MDCT are divided either into several equally spaced coding subbands or into several non-uniform coding subbands according to auditory perception characteristics.
B. Perform bit allocation for each coding subband, and quantize and encode the non-zero-bit coding subbands to obtain the MDCT frequency-domain coefficient coded bits;
After bit allocation, if the number of bits allocated to a coding subband is zero, that coding subband is not quantized or encoded. Such a coding subband is referred to here as a zero-bit coding subband or an uncoded coding subband; the other coding subbands are referred to as non-zero-bit coding subbands.
The particular method used to normalize, quantize and encode each coding subband is not the focus of the present invention.
C. Estimate the power spectrum of the audio signal to be encoded from its MDCT frequency-domain coefficients, then estimate the noise level of the audio signal in the zero-bit coding subbands, and quantize and encode it to obtain the noise level coded bits. The noise level coded bits are used during decoding to control the ratio between the energy of noise filling and the energy of spectral band replication;
The noise level of the audio signal in the zero-bit coding subbands is the ratio of the estimated noise-component power to the estimated tonal-component power in the zero-bit coding subbands.
D. Multiplex and pack the amplitude envelope coded bits of each coding subband, the frequency-domain coefficient coded bits and the noise level coded bits, and send them to the decoder.
The audio encoding method of the present invention is described in detail below with reference to the accompanying drawings:
Embodiment 1: Encoding method
Fig. 1 is a schematic diagram of an audio encoding method according to an embodiment of the present invention. This embodiment uses an audio stream with a frame length of 20 ms and a sampling rate of 32 kHz as an example. The method applies equally under other frame lengths and sampling rates. As shown in Fig. 1, the method comprises:
101: Apply the MDCT (Modified Discrete Cosine Transform) to the audio stream to be encoded, obtaining the frequency-domain coefficients at N frequency-domain sample points;
This step can be implemented as follows:
The N-point time-domain sampled signal x(n) of the current frame and the N-point time-domain sampled signal x_old(n) of the previous frame are combined into a 2N-point time-domain sampled signal $\bar{x}(n)$, which can be expressed as:
$$\bar{x}(n) = \begin{cases} x_{old}(n), & n = 0, 1, \ldots, N-1 \\ x(n-N), & n = N, N+1, \ldots, 2N-1 \end{cases} \tag{1}$$
Applying the MDCT to $\bar{x}(n)$ gives the following frequency-domain coefficients:
$$X(k) = \sum_{n=0}^{2N-1} \bar{x}(n)\, w(n) \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad k = 0, \ldots, N-1 \tag{2}$$
where w(n) is the sine window function:
$$w(n) = \sin\left[\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right], \quad n = 0, \ldots, 2N-1 \tag{3}$$
With a frame length of 20 ms and a sampling rate of 32 kHz, this yields 640 frequency-domain coefficients. The corresponding number of frequency-domain coefficients N can be calculated in the same way for other frame lengths and sampling rates.
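Equations (1) to (3) can be evaluated directly, as in the sketch below. This is a straightforward O(N²) evaluation for illustration only; a practical encoder would use a fast algorithm. The function name `mdct_frame` is hypothetical.

```python
import math


def mdct_frame(x_old, x_cur):
    """Direct evaluation of equations (1)-(3) for one frame.

    x_old: N time-domain samples of the previous frame.
    x_cur: N time-domain samples of the current frame.
    Returns the N MDCT coefficients X(0..N-1).
    """
    n_coef = len(x_cur)
    if len(x_old) != n_coef:
        raise ValueError("both frames must contain N samples")
    x_bar = list(x_old) + list(x_cur)                        # equation (1)
    window = [math.sin(math.pi / (2 * n_coef) * (n + 0.5))   # equation (3)
              for n in range(2 * n_coef)]
    return [
        sum(x_bar[n] * window[n]
            * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for n in range(2 * n_coef))                      # equation (2)
        for k in range(n_coef)
    ]


# Number of coefficients per frame: 20 ms at 32 kHz gives N = 640.
SAMPLES_PER_FRAME = 32000 * 20 // 1000
```

Each frame of N new samples therefore produces N coefficients, with the previous frame supplying the 50 % overlap.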
102: Divide the N frequency-domain coefficients into several coding subbands and calculate the amplitude envelope of each coding subband;
This embodiment uses non-uniform subband division and calculates the frequency-domain amplitude envelope (amplitude envelope for short) of each subband.
This step can be implemented with the following substeps:
102a: Divide the frequency-domain coefficients within the frequency band to be processed into L subbands (referred to as coding subbands);
In this embodiment, the frequency band to be processed is 0 to 13.6 kHz, and non-uniform subband division can be performed according to the perceptual characteristics of the human ear. Table 1 gives one specific division.
In Table 1, the frequency-domain coefficients in the 0 to 13.6 kHz band are divided into 28 coding subbands, that is, L = 28; the frequency-domain coefficients above 13.6 kHz are set to 0.
102b: Calculate the amplitude envelope of each coding subband according to the following formula:
$$Th(j) = \sqrt{\frac{1}{HIndex(j) - LIndex(j) + 1} \sum_{k = LIndex(j)}^{HIndex(j)} X(k)\, X(k)}, \quad j = 0, 1, \ldots, L-1 \tag{4}$$
where LIndex(j) and HIndex(j) denote the start frequency bin and the end frequency bin of the j-th coding subband, respectively; their values are given in Table 1.
Table 1: Example of non-uniform subband division in the frequency domain
Subband index   Start bin (LIndex)   End bin (HIndex)   Subband width (BandWidth)
0               0                    7                  8
1               8                    15                 8
2               16                   23                 8
3               24                   31                 8
4               32                   47                 16
5               48                   63                 16
6               64                   79                 16
7               80                   95                 16
8               96                   111                16
9               112                  127                16
10              128                  143                16
11              144                  159                16
12              160                  183                24
13              184                  207                24
14              208                  231                24
15              232                  255                24
16              256                  279                24
17              280                  303                24
18              304                  327                24
19              328                  351                24
20              352                  375                24
21              376                  399                24
22              400                  423                24
23              424                  447                24
24              448                  471                24
25              472                  495                24
26              496                  519                24
27              520                  543                24
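The subband layout of Table 1 and the per-subband amplitude envelope can be sketched as follows. The function names are hypothetical, and the envelope is computed as the root mean square of the coefficients in each subband, which is an assumption consistent with the name rms(r) used for the decoded envelope.

```python
import math


def table1_bounds():
    """Return (LIndex, HIndex) pairs for the 28 coding subbands of Table 1."""
    widths = [8] * 4 + [16] * 8 + [24] * 16
    bounds, start = [], 0
    for width in widths:
        bounds.append((start, start + width - 1))
        start += width
    return bounds


def amplitude_envelopes(coeffs, bounds):
    """Amplitude envelope Th(j) of each coding subband (RMS of its coefficients)."""
    return [
        math.sqrt(sum(coeffs[k] * coeffs[k] for k in range(lo, hi + 1))
                  / (hi - lo + 1))
        for lo, hi in bounds
    ]
```

The 28 subbands cover bins 0 to 543; at 25 Hz per bin (16 kHz spread over 640 bins) this corresponds to the 0 to 13.6 kHz band.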
103: Quantize and encode the amplitude envelope of each coding subband, obtaining the amplitude envelope quantization indices and the coded bits of those indices (that is, the amplitude envelope coded bits);
The amplitude envelope of each coding subband calculated by formula (4) is quantized using formula (5), giving the amplitude envelope quantization index of each coding subband:
[Formula (5): image GSA00000030054100201 in the original; not recoverable from the extracted text]
where ⌊·⌋ denotes rounding down, and Th_q(0) is the amplitude envelope quantization index of the first coding subband, whose range is limited to [-5, 34]: when Th_q(0) < -5, set Th_q(0) = -5; when Th_q(0) > 34, set Th_q(0) = 34.
The quantized amplitude envelope reconstructed from the quantization index is given by a formula that appears only as an image in the original (GSA00000030054100203).
The amplitude envelope quantization index of the first coding subband is encoded with 6 bits, that is, it consumes 6 bits.
The difference between the amplitude envelope quantization indices of adjacent coding subbands is calculated as:
$$\Delta Th_q(j) = Th_q(j+1) - Th_q(j), \quad j = 0, \ldots, L-2 \tag{6}$$
The amplitude envelope can be corrected as follows to guarantee that ΔTh_q(j) lies within [-15, 16]:
If ΔTh_q(j) < -15, set ΔTh_q(j) = -15 and Th_q(j) = Th_q(j+1) + 15, for j = L-2, ..., 0;
If ΔTh_q(j) > 16, set ΔTh_q(j) = 16 and Th_q(j+1) = Th_q(j) + 16, for j = 0, ..., L-2;
Apply Huffman coding to ΔTh_q(j), j = 0, ..., L-2, and count the number of bits consumed (referred to as the Huffman coded bits). If the number of Huffman coded bits is greater than or equal to the number of bits of the fixed allocation (in this embodiment, greater than (L-1) × 5), encode ΔTh_q(j), j = 0, ..., L-2, with natural coding and set the amplitude envelope Huffman coding flag Flag_huff_rms = 0; otherwise encode ΔTh_q(j), j = 0, ..., L-2, with Huffman coding and set Flag_huff_rms = 1. The coded bits of the amplitude envelope quantization indices (that is, the coded bits of the amplitude envelope differences) and the amplitude envelope Huffman coding flag are sent to the MUX.
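The correction that keeps every index difference within [-15, 16] can be sketched as two passes over the quantization indices. Running the downward pass first and the upward pass second is an assumption about the order in which the two rules are applied; the function name is hypothetical.

```python
def clamp_envelope_differences(th_q):
    """Adjust quantization indices so every difference lies in [-15, 16].

    th_q: amplitude envelope quantization indices Th_q(0..L-1).
    Returns (corrected indices, list of differences per equation (6)).
    """
    th = list(th_q)
    count = len(th)
    # Rule 1, j = L-2, ..., 0: a drop larger than 15 lowers Th_q(j).
    for j in range(count - 2, -1, -1):
        if th[j + 1] - th[j] < -15:
            th[j] = th[j + 1] + 15
    # Rule 2, j = 0, ..., L-2: a rise larger than 16 lowers Th_q(j+1).
    for j in range(count - 1):
        if th[j + 1] - th[j] > 16:
            th[j + 1] = th[j] + 16
    deltas = [th[j + 1] - th[j] for j in range(count - 1)]
    return th, deltas
```

For the indices 30, 0, 34 the first pass lowers the first index to 15 and the second pass lowers the last index to 16, giving differences of -15 and 16.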
104: Perform bit allocation for each coding subband according to its importance;
First calculate the initial importance value of each coding subband from rate-distortion theory and the amplitude envelope information of the coding subbands, then allocate bits to each subband according to its importance. This step can be implemented with the following substeps:
104a: Calculate the average bit consumption of a single frequency-domain coefficient;
From the total number of bits bits_available available for a 20 ms frame, subtract the number of bits bit_sides consumed by the side information, the bits bits_noiseband reserved for the noise level information of the noise-filling subbands, and the number of bits bits_Th consumed by the coding subband amplitude envelopes. This gives the number of remaining bits bits_left available for encoding the frequency-domain coefficients:
bits_left = bits_available - bit_sides - bits_Th - bits_noiseband    (7)
The reserved bits bits_noiseband are the bits reserved for the noise level coded bits of the noise-filling subbands. After bit allocation for the noise-filling subbands has been completed, any bits left over from bits_noiseband are used for the bit allocation correction.
The side information comprises the bits of the amplitude envelope Huffman coding flag Flag_huff_rms, the frequency-domain coefficient Huffman coding flag Flag_huff_plvq, and the iteration count count. Flag_huff_rms indicates whether Huffman coding was used for the subband amplitude envelopes; Flag_huff_plvq indicates whether Huffman coding was used when vector-quantizing and encoding the frequency-domain coefficients; and count indicates the number of iterations in the bit allocation correction (see the description of the subsequent steps).
104b: Calculate the initial importance value of each coding subband for bit allocation:
Here rk(j) denotes the importance of the j-th coding subband for bit allocation.
104c: Perform bit allocation for each coding subband according to its importance;
The procedure is as follows:
First find the coding subband with the largest rk(j), and let its index be j_k. Then increase the number of coded bits of each frequency-domain coefficient in that coding subband and reduce the importance of that coding subband. At the same time calculate the total number of bits bit_band_used(j_k) consumed by encoding that subband. Finally calculate the sum of the bits consumed by all coding subbands, sum(bit_band_used(j)), j = 0, ..., L-1. Repeat this process until the total number of bits consumed reaches the largest value permitted by the bit budget.
The bit allocation number is the number of bits allocated to a single frequency-domain coefficient in a coding subband. The number of bits consumed by a coding subband is the number of bits allocated to a single frequency-domain coefficient in that subband multiplied by the number of frequency-domain coefficients it contains.
In this embodiment, a coding subband whose bit allocation number is 0 is allocated bits in steps of 1 bit, and its importance is reduced by 1 after allocation. A coding subband whose bit allocation number is greater than 0 and less than the threshold 5 receives additional bits in steps of 0.5 bit, and its importance is reduced by 0.5 after each additional allocation. A coding subband whose bit allocation number is greater than or equal to the threshold 5 receives additional bits in steps of 1 bit, and its importance is reduced by 1 after each additional allocation.
The bit allocation method of this step can be expressed by the following pseudocode:
Set region_bit(j) = 0, j = 0, 1, ..., L-1;
For coding subbands 0, 1, ..., L-1:
{
    Find j_k = argmax over j = 0, ..., L-1 of rk(j);
    If region_bit(j_k) < 5
    {
        If region_bit(j_k) = 0
            Set region_bit(j_k) = region_bit(j_k) + 1;
            Calculate bit_band_used(j_k) = region_bit(j_k) * BandWidth(j_k);
            Set rk(j_k) = rk(j_k) - 1;
        Else if region_bit(j_k) >= 1
            Set region_bit(j_k) = region_bit(j_k) + 0.5;
            Calculate bit_band_used(j_k) = region_bit(j_k) * BandWidth(j_k) * 0.5;
            Set rk(j_k) = rk(j_k) - 0.5;
    }
    Else if region_bit(j_k) >= 5
    {
        Set region_bit(j_k) = region_bit(j_k) + 1;
        Set rk(j_k) = rk(j_k) - 1 if region_bit(j_k) < MaxBit, otherwise set rk(j_k) = -100;
        Calculate bit_band_used(j_k) = region_bit(j_k) * BandWidth(j_k);
    }
    Calculate bit_used_all = sum(bit_band_used(j)), j = 0, 1, ..., L-1;
    If bit_used_all < bits_left - 24, return to finding j_k among the coding subbands and continue the bit allocation loop, where 24 is the maximum coding subband width;
    Otherwise end the loop and output the current bit allocation values.
}
Finally, the remaining bits, fewer than 24, are distributed according to the importance of the coding subbands to the coding subbands that meet the following requirements: preferentially allocate 0.5 bit to each frequency-domain coefficient in a coding subband whose bit allocation is 1, and reduce the importance of that coding subband by 0.5; otherwise allocate 1 bit to each frequency-domain coefficient in a subband whose bit allocation is 0, and reduce the importance of that coding subband by 1. Bit allocation ends when bits_left - bit_used_all < 4.
Here MaxBit is the maximum number of coded bits that can be allocated to a single frequency-domain coefficient in a coding subband. This embodiment uses MaxBit = 9; the value can be adjusted as appropriate for the coding bit rate of the codec. region_bit(j) is the number of bits allocated to a single frequency-domain coefficient in the j-th coding subband.
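The main allocation loop can be sketched as a greedy procedure. This is an illustration under several assumptions: the initial importance values rk(j) are taken as input because their formula is not reproduced in the text; the bits consumed by a subband are computed as bits per coefficient times subband width, following the definition given in the prose; the final distribution of the remaining bits is omitted; and a guard against all subbands reaching MaxBit is added. The function name is hypothetical.

```python
def allocate_bits(rk, band_width, bits_left, max_bit=9, threshold=5):
    """Greedy per-coefficient bit allocation driven by subband importance.

    rk:         initial importance of each coding subband (assumed given).
    band_width: number of coefficients in each coding subband.
    bits_left:  bits available for coding the frequency-domain coefficients.
    Returns region_bit, the bits per coefficient for each subband.
    """
    rk = list(rk)
    count = len(rk)
    region_bit = [0.0] * count
    max_width = max(band_width)            # 24 in the embodiment's Table 1
    while True:
        used = sum(b * w for b, w in zip(region_bit, band_width))
        if used >= bits_left - max_width:
            break                          # budget nearly exhausted
        if all(b >= max_bit for b in region_bit):
            break                          # guard (assumption): nothing left to raise
        jk = max(range(count), key=lambda j: rk[j])
        # Step 1 bit from zero or above the threshold, 0.5 bit in between.
        step = 1.0 if region_bit[jk] == 0 or region_bit[jk] >= threshold else 0.5
        region_bit[jk] += step
        if region_bit[jk] >= max_bit:
            rk[jk] = -100.0                # saturated subband is no longer selected
        else:
            rk[jk] -= step
    return region_bit
```

With two 8-coefficient subbands of importance 3 and 1 and a budget of 40 bits, the loop stops once 32 bits are committed, leaving the first subband with 3 bits and the second with 1 bit per coefficient.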
105: According to the bit allocation result of step 104, allocate bits to the effective noise filling subbands, that is, the noise filling subbands that contain zero-bit coding subbands; estimate the power spectrum of the audio signal from the MDCT frequency-domain coefficients, and estimate the noise level of each effective noise filling subband from the estimated power spectrum; quantize and encode this noise level information to obtain the noise level coded bits of the noise filling subbands;
The N MDCT frequency-domain coefficients can be treated as a single noise filling subband, or divided into several noise filling subbands either uniformly or according to human auditory characteristics. A noise filling subband comprises one or more coding subbands.
In the present invention, a noise filling subband that contains a zero-bit coding subband is called an effective noise filling subband.
When allocating bits to noise filling subbands, bits can be allocated to all effective noise filling subbands, or one or several low-frequency effective noise filling subbands can be skipped and bits allocated only to the subsequent higher-frequency effective noise filling subbands. Correspondingly, at decoding time, the zero-bit coding subbands inside the low-frequency effective noise filling subbands that received no bits are spectrally reconstructed by white noise filling.
Each effective noise filling subband is allocated the same number of bits, or a different number of bits according to the auditory characteristics of the human ear in each subband. Subsequently, once the noise level coded bits of the effective noise filling subbands have been obtained, these bits are multiplexed and packed.
106: Vector-quantize and encode the non-zero-bit coding subbands to obtain the coded bits of the frequency-domain coefficients;
107: Construct the encoded bitstream.
Figure 10 is a schematic diagram of the bitstream composition of an embodiment of the invention. First, the side information is written to the bitstream multiplexer MUX in the following order: Flag_huff_rms, Flag_huff_plvq and count. The coded bits of the coding subband amplitude envelope are then written to the MUX, followed by the noise level coded bits and then the coded bits of the frequency-domain coefficients. Finally, the bitstream written in this order is sent to the decoder.
Step 105 is described in detail below with reference to the drawings, taking as an example the case in which the N MDCT frequency-domain coefficients are divided into multiple noise filling subbands and bits are allocated starting from the second effective noise filling subband.
As shown in Figure 2, the process of obtaining the noise level coded bits of the zero-bit coding subbands inside the noise filling subbands specifically comprises:
201: Group the coding subbands into several noise filling subbands and, according to the coding subband bit allocation result, allocate bits to the effective noise filling subbands;
The frequency-domain coefficients in the frequency range to be processed are divided non-uniformly, according to human auditory characteristics, into several subbands called noise filling subbands; a noise filling subband comprises one or more coding subbands;
One concrete example of the division is shown in Table 2:
Table 2. Example of non-uniform division into noise filling subbands

| Noise filling subband index | Start coding subband index (NLIndex) | End coding subband index (NHIndex) | Number of coding subbands (SubBandNum) |
|---|---|---|---|
| 0 | 0 | 11 | 12 |
| 1 | 12 | 13 | 2 |
| 2 | 14 | 16 | 3 |
| 3 | 17 | 20 | 4 |
| 4 | 21 | 28 | 8 |
In Table 2 above, the noise filling subbands are arranged in order of coding subband frequency from low to high.
Suppose the bits reserved for noise filling subband noise level information are two bits for each noise filling subband except index 0; the total number of reserved bits then equals the number of noise filling subbands minus 1, multiplied by 2.
During bit allocation, no bits are allocated to the noise filling subband with index 0, so it consumes no coded bits. Correspondingly, at decoding time, if the noise filling subband with index 0 contains zero-bit coding subbands, the frequency-domain coefficients of those zero-bit coding subbands are spectrally reconstructed by the white noise filling method; see step 504 for details. Starting from the noise filling subband with index 1, each noise filling subband is checked for zero-bit coding subbands. If a noise filling subband contains a zero-bit coding subband, 2 bits are allocated to it to represent the noise level information of its zero-bit coding subbands, and the reserved noise level information bits bits_noiseband are reduced by 2. After bit allocation for all noise filling subbands is complete, the remaining reserved bits bits_noiseband are used for bit allocation correction.
The bit allocation method for noise filling subbands in this step can be expressed by the following pseudocode:
Nregion_bitflag(j-1) is the bit allocation flag of noise filling subband j: 1 means bits have been allocated; 0 means no bits have been allocated.
Set Nregion_bitflag(j-1) = 0, j = 1, 2, ..., L_noise-1;
Set the remaining bits of noise filling subband bit allocation noiseband_remain_bits = 0;
For noise filling subbands j = 1, 2, ..., L_noise-1
{
    Let region = NLIndex(j), NLIndex(j)+1, ..., NHIndex(j);
    For all region
    {
        If region_bit(region) equals 0
        {
            Set Nregion_bitflag(j-1) = 1;
            bits_noiseband = bits_noiseband - 2;
            Break out of the current loop;
        }
    }
}
noiseband_remain_bits = bits_noiseband;
The bits allocated to the noise filling subbands, arranged in order, are the noise level coded bits referred to below.
The above is the process of allocating bits to noise filling subbands; alternatively, a fixed number of bits (for example 2 bits) can simply be reserved for every noise filling subband.
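The allocation rule for noise filling subbands can be sketched as follows. This is an illustrative Python version under stated assumptions: the names `NOISE_BANDS` and `allocate_noise_band_bits` are hypothetical, the subband layout is the Table 2 example, and 2 bits are reserved for every noise filling subband except index 0.

```python
# (NLIndex, NHIndex) of noise filling subbands 0..4, taken from the Table 2 example.
NOISE_BANDS = [(0, 11), (12, 13), (14, 16), (17, 20), (21, 28)]


def allocate_noise_band_bits(region_bit, noise_bands=NOISE_BANDS, bits_per_band=2):
    """Flag noise filling subbands 1..L_noise-1 that contain a zero-bit coding subband.

    region_bit -- bits per coefficient of every coding subband
    Returns (flags, remaining_bits): flags[j-1] is 1 if noise filling subband j
    was given bits; remaining_bits is what is left of the reserved budget.
    """
    # Two bits are reserved for every noise filling subband except index 0.
    bits_noiseband = (len(noise_bands) - 1) * bits_per_band
    flags = []
    for low, high in noise_bands[1:]:
        # A single zero-bit coding subband makes the noise filling subband effective.
        has_zero = any(region_bit[r] == 0 for r in range(low, high + 1))
        flags.append(1 if has_zero else 0)
        if has_zero:
            bits_noiseband -= bits_per_band
    return flags, bits_noiseband
```

With 29 coding subbands of which only subbands 13 and 25 received zero bits, the call returns flags `[1, 0, 0, 1]` and 4 remaining bits, which would then be available for bit allocation correction.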
202: Based on the noise filling subband division of Table 2, estimate from the MDCT coefficients the power spectrum of the signal in the 4 noise filling subbands with indices 1, 2, 3 and 4;
The power of frequency bin k in frame i is estimated by formula (13):
$$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda)\,X_i(k)^2 \tag{13}$$
where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ is the estimated power of frequency bin k in frame i; $X_i(k)$ is the MDCT coefficient of frequency bin k in frame i; and $\lambda$ is the coefficient of the one-pole smoothing filter, for example $\lambda = 0.875$;
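A minimal Python sketch of the recursion in formula (13) is given here for illustration; the function name `update_power_spectrum` is hypothetical, and the default smoothing coefficient is the example value 0.875.

```python
def update_power_spectrum(prev_power, mdct_coeffs, lam=0.875):
    """One-pole smoothing of squared MDCT coefficients, per formula (13).

    prev_power  -- P_{i-1}(k) for every bin k (all zeros before the first frame)
    mdct_coeffs -- X_i(k), the MDCT coefficients of the current frame
    Returns P_i(k) for every bin k.
    """
    return [lam * p + (1.0 - lam) * x * x
            for p, x in zip(prev_power, mdct_coeffs)]
```

The estimate is updated once per frame, so the encoder only needs to keep the previous power vector and introduces no look-ahead delay.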
The principle of power spectrum estimation from the MDCT is derived as follows:
The following formula gives the discrete-time Fourier transform (DTFT), at angular frequency $\omega$, of a signal x of length 2M:
$$X_{DTFT}(\omega) = \sum_{n=0}^{2M-1} x(n)\, e^{-j\omega n} \tag{14}$$
The DTFT is sampled at 2M uniformly spaced frequencies between 0 and $2\pi$. This sampled transform is called the discrete Fourier transform (DFT), and the following formula gives the DFT at frequency bin k:
$$X_{DFT}(k) = X_{DTFT}(2\pi k/2M) = \sum_{n=0}^{2M-1} x(n)\, e^{-j\frac{2\pi k n}{2M}} \tag{15}$$
Sampling the DTFT with an offset of half a frequency bin yields the shifted discrete Fourier transform (SDFT):
$$X_{SDFT}(k) = X_{DTFT}(2\pi (k+1/2)/2M) = \sum_{n=0}^{2M-1} x(n)\, e^{-j\frac{2\pi (k+1/2) n}{2M}} \tag{16}$$
The SDFT of the real signal x(n) after windowing is:
$$X_{SDFT}(k) = \sum_{n=0}^{2M-1} w(n)\, x(n)\, e^{-j\frac{2\pi (k+1/2) n}{2M}} \tag{16}$$
According to formula (2), denote the MDCT frequency-domain coefficient X(k) as $X_{MDCT}(k)$ and let M = N; formula (2) is then rewritten as follows:
$$X_{MDCT}(k) = \sum_{n=0}^{2M-1} \bar{x}(n)\, w(n) \cos\!\left(\frac{\pi}{M}\left(n + \frac{1}{2} + \frac{M}{2}\right)\left(k + \frac{1}{2}\right)\right), \quad k = 0, \cdots, M-1 \tag{17}$$
The SDFT and the MDCT use the same window, and we let $x(n) = \bar{x}(n)$;
The relation between the MDCT and the SDFT of the real signal x(n) can be expressed by the following formula:
$$X_{MDCT}(k) = \left|X_{SDFT}(k)\right| \cos\!\left(\angle X_{SDFT}(k) - \frac{\pi}{M}\left(\frac{1}{2} + \frac{M}{2}\right)\left(k + \frac{1}{2}\right)\right) \tag{18}$$
That is, the MDCT can be expressed as the magnitude of the SDFT modulated by a cosine whose argument is a function of the SDFT phase angle.
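Relation (18) can be checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, and each function takes an already windowed frame w(n)x(n) of length 2M. It evaluates the windowed SDFT, the MDCT of formula (17), and the right-hand side of formula (18) so that the last two can be compared.

```python
import cmath
import math


def sdft(frame, k):
    """Windowed SDFT of a frame w(n)x(n) of length 2M at bin k."""
    two_m = len(frame)
    return sum(v * cmath.exp(-2j * math.pi * (k + 0.5) * n / two_m)
               for n, v in enumerate(frame))


def mdct(frame, k):
    """MDCT coefficient k of the same windowed frame, per formula (17)."""
    m = len(frame) // 2
    return sum(v * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
               for n, v in enumerate(frame))


def mdct_from_sdft(frame, k):
    """Right-hand side of formula (18): SDFT magnitude times a phase cosine."""
    m = len(frame) // 2
    s = sdft(frame, k)
    shift = math.pi / m * (0.5 + m / 2) * (k + 0.5)
    return abs(s) * math.cos(cmath.phase(s) - shift)
```

For any real frame, `mdct(frame, k)` and `mdct_from_sdft(frame, k)` should agree up to floating-point rounding, since the MDCT is the real part of the SDFT after a bin-dependent phase rotation.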
The power spectrum of the signal is estimated from the SDFT of consecutive overlapping windowed blocks of the audio signal. Assuming the transform length of the signal x is 2M, the following formula gives the short-time shifted discrete Fourier transform (STSDFT) at frequency bin k and block t:
$$X_{STSDFT}(k,t) = \sum_{n=0}^{2M-1} w(n)\, x(n + Ht)\, e^{-j\frac{2\pi (k+1/2) n}{2M}} \tag{19}$$
H is the hop size of the blocks. If H = M, the STSDFT has the same hop size as the MDCT.
Using the STSDFT, the power spectrum of the signal is estimated by averaging the squared magnitude of $X_{STSDFT}(k,t)$ over many blocks t. The following formula computes a moving average over T blocks, producing a time-varying estimate of the power spectrum:
$$P_{STSDFT}(k,t) = \frac{1}{T} \sum_{\eta=0}^{T-1} \left|X_{STSDFT}(k, t-\eta)\right|^2 \tag{20}$$
From the relation between the MDCT and the SDFT, $P_{STSDFT}(k,t)$ can be approximated from $X_{MDCT}(k,t)$ under certain assumptions. Define:
$$P_{MDCT}(k,t) = \frac{1}{T} \sum_{\eta=0}^{T-1} \left|X_{MDCT}(k, t-\eta)\right|^2 \tag{21}$$
From formula (18):
$$P_{MDCT}(k,t) = \frac{1}{T} \sum_{\eta=0}^{T-1} \left|X_{STSDFT}(k, t-\eta)\right|^2 \cos^2\!\left(\angle X_{STSDFT}(k, t-\eta) - \frac{\pi}{M}\left(\frac{1}{2} + \frac{M}{2}\right)\left(k + \frac{1}{2}\right)\right) \tag{22}$$
If it is assumed that $|X_{STSDFT}(k, t-\eta)|$ and $\angle X_{STSDFT}(k, t-\eta)$ vary relatively independently across blocks (an assumption that holds for most audio signals), then:
$$P_{MDCT}(k,t) \cong \left(\frac{1}{T} \sum_{\eta=0}^{T-1} \left|X_{STSDFT}(k, t-\eta)\right|^2\right) \left(\frac{1}{T} \sum_{\eta=0}^{T-1} \cos^2\!\left(\angle X_{STSDFT}(k, t-\eta) - \frac{\pi}{M}\left(\frac{1}{2} + \frac{M}{2}\right)\left(k + \frac{1}{2}\right)\right)\right) \tag{23}$$
If it is further assumed that $\angle X_{STSDFT}(k, t-\eta)$ is in general uniformly distributed between 0 and $2\pi$ over the T blocks and that T is relatively large, then, because the expected value of the squared cosine of a uniformly distributed phase angle is one half:
$$P_{MDCT}(k,t) \cong \frac{1}{2} \left(\frac{1}{T} \sum_{\eta=0}^{T-1} \left|X_{STSDFT}(k, t-\eta)\right|^2\right) = \frac{1}{2} P_{STSDFT}(k,t) \tag{24}$$
Therefore, the power spectrum estimated from the MDCT is approximately half of the power spectrum estimated from the STSDFT.
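The factor of one half can be illustrated with a small simulation. The Python sketch below is an illustration under explicit assumptions rather than part of the described method: block magnitudes and phases are drawn independently at random, the phases are uniform on [0, 2π), and the function name, magnitude range, phase offset and seed are all hypothetical choices.

```python
import math
import random


def mdct_to_stsdft_power_ratio(num_blocks, phase_offset=0.3, seed=0):
    """Simulate P_MDCT / P_STSDFT for one bin over num_blocks blocks.

    Each block has an independent magnitude and a uniformly distributed
    phase; phase_offset stands in for the fixed bin-dependent shift in (18).
    """
    rng = random.Random(seed)
    p_mdct = 0.0
    p_stsdft = 0.0
    for _ in range(num_blocks):
        magnitude = rng.uniform(0.5, 1.5)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        p_stsdft += magnitude ** 2
        # Formula (18): MDCT value = magnitude * cos(phase - shift).
        p_mdct += (magnitude * math.cos(phase - phase_offset)) ** 2
    return p_mdct / p_stsdft
```

With a large number of blocks the returned ratio settles close to 0.5, in line with formula (24); with only a few blocks it can deviate noticeably, which is why the derivation requires T to be relatively large.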
Because encoding requires low delay, a one-pole smoothing filter is chosen for the power spectrum estimation. In $P_{MDCT}(k,t)$ the block t is denoted by i and written as a subscript, so $P_{MDCT}(k,t)$ can be written as $P_i(k)$. The block length is set to the length of one frame of the audio signal, so i denotes the frame number. This yields the final estimation algorithm of formula (13), which is the algorithm used for power spectrum estimation in the present invention.
203: From the power spectrum estimated by formula (13), calculate the noise level of the zero-bit coding subbands in each noise filling subband that was allocated bits.
As shown in Figure 3, the noise level is calculated as follows:
Step 301: Calculate the mean power of all frequency-domain coefficients in all or some of the zero-bit coding subbands of this noise filling subband, obtaining the average power P_aveg(j);
Step 302: Treat the frequency-domain coefficients in all or some of the zero-bit coding subbands of this noise filling subband whose power is greater than the average power as the tonal component of this noise filling subband. Calculate the mean power of all such coefficients whose power is greater than P_aveg(j), obtaining the tonal component average power P_signal_aveg(j) of the zero-bit coding subbands in this effective noise filling subband;
Step 303: Treat the frequency-domain coefficients in all or some of the zero-bit coding subbands of this noise filling subband whose power is less than or equal to the average power as the noise component of this noise filling subband. Calculate the mean power of all such coefficients whose power is less than or equal to P_aveg(j), obtaining the noise component average power P_noise_aveg(j) of the zero-bit coding subbands in this effective noise filling subband;
Step 304: Calculate the ratio P_noise_rate(j) of the noise component average power P_noise_aveg(j) to the tonal component average power P_signal_aveg(j); this value is the noise level of this effective noise filling subband.
The noise level is then quantized and encoded to obtain the noise level coded bits;
P_noise_rate(j) is quantized and encoded to give P_noise_rate_bits(j). After the noise level has been quantized and encoded, the noise level coded bits of the noise filling subbands that were allocated bits are arranged in ascending order of subband index, giving the noise level coded bits of all effective noise filling subbands.
An example using non-uniform quantization is shown in Table 3:
Table 3. Example of non-uniform quantization of the noise-to-signal ratio

| P_noise_rate(j) | P_noise_rate_bits(j) |
|---|---|
| [0, 0.04) | 00 |
| [0.04, 0.08) | 01 |
| [0.08, 0.16) | 10 |
| [0.16, 1) | 11 |
The noise level of an effective noise filling subband is also the noise level of the zero-bit coding subbands inside it. Besides P_noise_rate(j), this noise level can also be represented by the ratio of the tonal component average power P_signal_aveg(j) to the noise component average power P_noise_aveg(j).
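Steps 301 to 304 and the Table 3 quantizer can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, the input is the list of estimated powers of the coefficients in the zero-bit coding subbands of one effective noise filling subband, and two cases the text leaves open are handled by assumption (no coefficient above the average yields a ratio of 1.0, and ratios of 1 or more map to code 11).

```python
def noise_level(powers):
    """Noise-to-tonal power ratio of one effective noise filling subband.

    powers -- estimated power P_i(k) of each coefficient in the zero-bit
              coding subbands of this noise filling subband
    """
    p_aveg = sum(powers) / len(powers)              # step 301
    tonal = [p for p in powers if p > p_aveg]       # step 302
    noise = [p for p in powers if p <= p_aveg]      # step 303
    if not tonal:
        # Assumption: no coefficient exceeds the average, treat as pure noise.
        return 1.0
    p_signal_aveg = sum(tonal) / len(tonal)
    p_noise_aveg = sum(noise) / len(noise)
    return p_noise_aveg / p_signal_aveg             # step 304


def quantize_noise_level(rate):
    """Map the ratio to the 2-bit code of the Table 3 example."""
    # Count how many interval boundaries the ratio has reached.
    index = sum(rate >= threshold for threshold in (0.04, 0.08, 0.16))
    return format(index, "02b")
```

For powers `[1.0, 1.0, 1.0, 1.0, 16.0]` the average is 4.0, the single tonal coefficient has power 16.0 and the noise coefficients average 1.0, so the ratio is 0.0625 and the code is `01`.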
Two, decoding method
The audio decoding method of the present invention is the inverse of the encoding method and comprises:
A. Decode each amplitude envelope coded bit in the bitstream to be decoded, obtaining the amplitude envelope quantization index of each coding subband;
B. Perform bit allocation for each coding subband; dequantize the decoded noise level coded bits to obtain the noise level of the zero-bit coding subbands; dequantize the decoded frequency-domain coefficient coded bits to obtain the frequency-domain coefficients of the non-zero-bit coding subbands;
C. Perform spectral band replication on the zero-bit coding subbands; control the overall fill energy level of each zero-bit coding subband according to its amplitude envelope in the bitstream to be decoded, and control the ratio of noise filling energy to spectral band replication energy according to the noise level of that zero-bit coding subband, obtaining the reconstructed frequency-domain coefficients of the zero-bit coding subbands;
D. Apply the inverse modified discrete cosine transform (IMDCT) to the frequency-domain coefficients of the non-zero-bit coding subbands and the reconstructed frequency-domain coefficients of the zero-bit coding subbands, obtaining the final audio signal.
Figure 4 is a schematic diagram of an audio decoding method according to an embodiment of the invention. As shown in Figure 4, the method comprises:
401: Decode each amplitude envelope coded bit to obtain the amplitude envelope quantization index of each coding subband;
Extract the coded bits of one frame from the coded bitstream sent by the encoder (that is, from the bitstream demultiplexer DeMUX). After extraction, first decode the side information; then, according to the value of the amplitude envelope Huffman coding flag Flag_huff_rms, apply Huffman decoding or direct decoding to each amplitude envelope coded bit in the frame, obtaining the amplitude envelope quantization index Th_q(j), j = 0, ..., L-1, of each coding subband.
402: Perform bit allocation for each coding subband, and perform bit allocation for the effective noise filling subbands;
Calculate the initial importance of each coding subband from its amplitude envelope quantization index, and use the coding subband importance to allocate bits to each coding subband, obtaining the number of bits allocated to each coding subband. The bit allocation method at the decoder is identical to that at the encoder. During bit allocation, the bit allocation step size and the step size by which coding subband importance is reduced after allocation are variable.
After the above bit allocation process, bit allocation correction is performed count more times on the coding subbands, according to the encoder's bit allocation correction iteration count value count and the importance of each coding subband; the whole bit allocation process then ends.
The step sizes used during bit allocation and correction are as follows. For a coding subband whose bit allocation is 0, the bit allocation step and the correction step are both 1 bit, and the importance is reduced by 1 after allocation or correction. For a coding subband whose bit allocation is greater than 0 and less than a certain threshold, the step for appending bits and the correction step are both 0.5 bit, and the importance is reduced by 0.5 after allocation or correction. For a coding subband whose bit allocation is greater than or equal to this threshold, the step for appending bits and the correction step are both 1 bit, and the importance is reduced by 1 after allocation or correction;
Group the coding subbands into several noise filling subbands and, according to the coding subband bit allocation result, allocate bits to the effective noise filling subbands. The division into noise filling subbands and the bit allocation method for noise filling subbands are the same as in the encoding method and are not repeated here.
403: Dequantize the decoded noise level coded bits to obtain the noise level of the zero-bit coding subbands, and dequantize the decoded frequency-domain coefficient coded bits to obtain the MDCT frequency-domain coefficients;
404: Perform spectral band replication on the zero-bit coding subbands; control the overall fill energy level of each such coding subband according to its amplitude envelope, and, according to the noise level of the noise filling subband containing it, control the ratio of spectral band replication energy to noise filling energy in each zero-bit coding subband, obtaining the reconstructed frequency-domain coefficients of the zero-bit coding subbands;
The detailed process of this step is described below with reference to Figure 5.
For zero-bit coding subbands in effective noise filling subbands that were allocated bits, spectral band replication is performed and the energy level of the replicated frequency-domain coefficients and the energy level of the noise filling are controlled; for zero-bit coding subbands in effective noise filling subbands that were not allocated bits, noise filling is performed.
405: Apply the IMDCT (Inverse Modified Discrete Cosine Transform) to the frequency-domain coefficients after spectral reconstruction, obtaining the final audio output signal.
Step 404 is described in detail below with reference to Figure 5:
As shown in Figure 5, step 404 specifically comprises:
Step 501: Perform spectral band replication on the zero-bit coding subbands of the effective noise filling subbands;
Search the MDCT frequency-domain coefficients for the position of a tone of the audio signal. Take the bandwidth from bin 0 to the bin at the tone position as the spectral band replication period, and take as the source band the range from bin 0 shifted backward by copyband_offset bins to the tone position bin shifted backward by the same copyband_offset bins; then perform spectral band replication on the zero-bit coding subbands. If the highest frequency inside a zero-bit coding subband that requires spectral band replication is lower than the frequency of the tone found, those bins are spectrally reconstructed using noise filling only.
The frequency-domain coefficients are arranged in order of frequency from low to high; shifting backward means shifting toward higher frequencies.
This spectral band replication method is described in detail below:
A. Search the MDCT frequency-domain coefficients for the position of a tone of the audio signal;
The preferred method of the present invention for finding the tone position is to smooth the MDCT frequency-domain coefficients: take the absolute value or the square of the MDCT frequency-domain coefficients in a particular low-frequency band and apply smoothing filtering; then, from the result of the smoothing, search for the position of the maximum extremum of the filter output and take that position as the position of the tone;
In the present invention, the tone of the audio signal refers to the fundamental of the audio signal or one of its harmonics.
The particular band mentioned here can be a band in which the energy is relatively concentrated, determined from the spectral characteristics; it is called the first band. Low frequency here refers to spectral components below half of the total signal bandwidth.
The frequency-domain coefficients here are the MDCT frequency-domain coefficients decoded in step 403, arranged from low to high frequency.
The formula for smoothing the absolute value of the frequency-domain coefficients of the first band is:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k) + (1-\mu)\left|\bar{X}_i(k)\right|$$
Alternatively, the formula for smoothing the squared value of the frequency-domain coefficients of the first band is:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k) + (1-\mu)\,\bar{X}_i(k)^2$$
where $\mu$ is the smoothing filter coefficient, with value range (0, 1), for example 0.125; $X\_amp_i(k)$ is the filter output for frequency bin k of frame i; $\bar{X}_i(k)$ is the decoded MDCT coefficient of frequency bin k of frame i; and $X\_amp_{i-1}(k) = 0$ when $i = 0$.
There are two methods for finding the position of the maximum extremum of the first band's filter output:
(1) Search directly for the initial maximum among the filter outputs of the frequency-domain coefficients of the first band; take this maximum as the maximum extremum of the first band's filter output, and take the index of the corresponding bin as the position of the maximum extremum (that is, of the tone);
(2) When searching for the maximum extremum, take a section of the first band as a second band, search for the initial maximum among the filter outputs of the frequency-domain coefficients of the second band, take this initial maximum as the maximum extremum of the first band's filter output, and take the index of the corresponding bin as the position of the maximum extremum (that is, of the tone).
The start of the second band is higher than the start of the first band, and the end of the second band is lower than the end of the first band; preferably, the first band and the second band each contain no fewer than 8 frequency-domain coefficients.
To guard against the case where the frequency-domain coefficient corresponding to the initial maximum is not the position of the tone of the audio signal, the tone position search first finds the initial maximum among the filter outputs of the second band and then proceeds differently depending on the position of the corresponding frequency-domain coefficient:
(a) If the initial maximum is the filter output of the lowest-frequency coefficient of the second band, compare it with the filter output of the preceding, lower-frequency coefficient in the first band, and continue comparing toward lower frequencies in turn. When the filter output of the current coefficient is greater than that of the preceding coefficient, the current coefficient is taken as the tone position, that is, its filter output is the final maximum extremum. Alternatively, if the comparison reaches the lowest-frequency coefficient of the first band and its filter output is greater than that of the following coefficient, the lowest-frequency coefficient of the first band is taken as the tone position, that is, its filter output is the final maximum extremum;
(b) If the initial maximum is the filter output of the highest-frequency coefficient of the second band, compare it with the filter output of the following, higher-frequency coefficient in the first band, and continue comparing toward higher frequencies in turn. When the filter output of the current coefficient is greater than that of the following coefficient, the current coefficient is taken as the tone position, that is, its filter output is the final maximum extremum. Alternatively, if the comparison reaches the highest-frequency coefficient of the first band and its filter output is greater than that of the preceding coefficient, the highest-frequency coefficient of the first band is taken as the tone position, that is, its filter output is the final maximum extremum;
(c) If the initial maximum is the filter output of a coefficient between the lowest and highest frequencies of the second band, the coefficient corresponding to the initial maximum is the tone position, that is, the initial maximum is the final maximum extremum.
The method for determining the tone position is illustrated below, taking the 24th to 64th MDCT frequency-domain coefficients as the first band and the 33rd to 56th MDCT frequency-domain coefficients as the second band:
Search for the maximum among the filter outputs of the 33rd to 56th MDCT frequency-domain coefficients. If the maximum corresponds to the 33rd coefficient, check whether the filter output of the 32nd coefficient is greater than that of the 33rd; if so, continue toward lower frequencies and check whether the filter output of the 31st coefficient is greater than that of the 32nd, and so on, until the filter output of the current coefficient is greater than that of the preceding one, or until the filter output of the 24th coefficient is found to be greater than that of the 25th. The current coefficient, or the 24th coefficient, is then the tone position;
If the maximum corresponds to the 56th coefficient, search toward higher frequencies in the same way until the filter output of the current coefficient is greater than that of the following one, in which case the current coefficient is the tone position, or until the filter output of the 64th coefficient is found to be greater than that of the 63rd, in which case the 64th coefficient is the tone position;
If the maximum lies between the 33rd and 56th coefficients, the coefficient corresponding to the maximum is the tone position.
The value of this position is denoted Tonal_pos; it is the index of the bin corresponding to the maximum extremum.
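The smoothing and the two-band tone search can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, the band limits are the example values and are used directly as list indices, and ties between neighbouring filter outputs end the walk because the text does not specify how they are handled.

```python
def smooth(prev_amp, coeffs, mu=0.125):
    """Smooth the absolute value of the decoded MDCT coefficients per bin."""
    return [mu * a + (1.0 - mu) * abs(x) for a, x in zip(prev_amp, coeffs)]


def find_tone_position(amp, first=(24, 64), second=(33, 56)):
    """Locate the tone bin from smoothed outputs amp (search method 2).

    first, second -- inclusive index ranges of the first and second band
    Returns Tonal_pos, the index of the final maximum extremum.
    """
    low1, high1 = first
    low2, high2 = second
    # Initial maximum inside the second band (first index wins on ties).
    pos = max(range(low2, high2 + 1), key=lambda k: amp[k])
    if pos == low2:
        # Case (a): keep moving to lower bins while they are larger.
        while pos > low1 and amp[pos - 1] > amp[pos]:
            pos -= 1
    elif pos == high2:
        # Case (b): keep moving to higher bins while they are larger.
        while pos < high1 and amp[pos + 1] > amp[pos]:
            pos += 1
    # Case (c): a maximum strictly inside the second band is used as is.
    return pos
```

Restricting the initial search to the second band and then walking outward keeps the result from stopping at a band edge that merely lies on the slope of a peak located just outside the second band.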
B. With the bandwidth from bin 0 to the tone position bin as the period, take as the source band the range from bin 0 shifted backward by copyband_offset bins to the tone position bin shifted backward by the same copyband_offset bins, and perform spectral band replication on the zero-bit coding subbands;
That is, the start index of the source band is copyband_offset and the end index is copyband_offset + Tonal_pos.
In the present invention, the spectral band replication offset (denoted copyband_offset) is preset, with copyband_offset ≥ 0. When copyband_offset = 0, the source band is the range from bin 0 to the tone position bin. To reduce spectral discontinuities in the replicated band, copyband_offset is set greater than zero; the source band is then the MDCT frequency-domain coefficients in the range from bin 0 shifted backward by a small amount (copyband_offset) to the bin of the maximum extremum shifted backward by the same amount (copyband_offset). The spectrum of the zero-bit coding subbands inside the effective noise filling subbands above a certain frequency (for example those with indices 1, 2, 3 and 4) is filled entirely by replication from the source band.
Corresponding to the flow of Figure 2, the zero-bit coding subbands of the first noise filling subband are spectrally reconstructed by random noise filling, while the zero-bit coding subbands of the noise filling subbands with indices 1, 2, 3 and 4 are spectrally reconstructed by frequency-domain coefficient replication combined with noise filling;
When performing spectral band replication, first calculate the source band copy start index for a zero-bit coding subband from the source band and the start index of the zero-bit coding subband to be replicated; then, with the spectral band replication period as the period, copy the frequency-domain coefficients of the source band periodically into the zero-bit coding subband, starting from the source band copy start index.
The source band copy start index is determined as follows:
First, starting from the first zero-bit coding subband to be replicated, obtain the bin index of the first MDCT frequency-domain coefficient of the zero-bit coding subband whose frequency-domain coefficients are to be reconstructed, denoted fillband_start_freq. The index of the bin corresponding to the tone is denoted Tonal_pos, and the replication period of the band is denoted copy_period; copy_period equals Tonal_pos plus 1. If the highest frequency inside a zero-bit coding subband that requires spectral band replication is lower than the frequency of the tone found, those bins are spectrally reconstructed using noise filling only, without spectral band replication. With the spectral band replication offset denoted copyband_offset, copy_period is subtracted repeatedly from the value of fillband_start_freq until the value falls within the index range of the source band; this value is the source band copy start index, denoted copy_pos_mod.
The source-band copy start index copy_pos_mod can be obtained with the following pseudocode:
Let copy_pos_mod = fillband_start_freq;
while copy_pos_mod > (Tonal_pos + copyband_offset)
{
copy_pos_mod = copy_pos_mod - copy_period;
}
When the loop finishes, copy_pos_mod is the source-band copy start index.
During replication, the coefficients starting at the source-band copy start index are copied in ascending order into the zero-bit coding subband whose start position is fillband_start_freq. When the source position reaches bin Tonal_pos+copyband_offset, copying continues from the coefficient at bin copyband_offset, and so on, until all coefficients of the current zero-bit coding subband have been filled by band replication.
For example, with copyband_offset set to 10, the band starting at copy_pos_mod is copied in ascending frequency order into the zero-bit coding subband starting at fillband_start_freq; after bin Tonal_pos+10, copying resumes from the coefficient at bin 10, and so on. All coefficients of this zero-bit coding subband are thus copied from the coefficients of bins 10 to Tonal_pos+10, which form the source band of the replication.
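The copy-start calculation and the periodic copy described above can be sketched as follows. This is a minimal illustration in Python, not the patent's reference implementation; the function names source_copy_start and replicate_band and the width parameter are hypothetical, and the sketch assumes fillband_start_freq is not below copyband_offset.

```python
def source_copy_start(fillband_start_freq, tonal_pos, copyband_offset):
    """Return copy_pos_mod, the source-band index at which copying starts."""
    copy_period = tonal_pos + 1  # the source band holds Tonal_pos + 1 bins
    copy_pos_mod = fillband_start_freq
    # Step back one period at a time until the index lies inside the source band.
    while copy_pos_mod > tonal_pos + copyband_offset:
        copy_pos_mod -= copy_period
    return copy_pos_mod


def replicate_band(coeffs, fillband_start_freq, width, tonal_pos, copyband_offset):
    """Build `width` replicated coefficients for a zero-bit subband.

    `width` (number of bins in the subband) is a hypothetical parameter.
    """
    src = source_copy_start(fillband_start_freq, tonal_pos, copyband_offset)
    out = []
    for _ in range(width):
        out.append(coeffs[src])
        src += 1
        # After bin Tonal_pos + copyband_offset, wrap to bin copyband_offset.
        if src > tonal_pos + copyband_offset:
            src = copyband_offset
    return out
```

With Tonal_pos = 5 and copyband_offset = 10, the source band is bins 10 to 15, so a subband starting at bin 42 begins copying at bin 12 and wraps from bin 15 back to bin 10.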
The above method is used to replicate the spectrum of all zero-bit coding subbands in the noise-filling subbands with indices 1, 2, 3, 4.
Besides the above method, other spectral band replication methods are equally applicable to the present invention and do not affect its implementation.
Step 502: according to the decoded noise level, perform energy adjustment on the frequency-domain coefficients obtained by replication for the zero-bit coding subbands inside each noise-filling subband;
Compute the amplitude envelope of the coefficients obtained by replication for the zero-bit coding subband, denoted sbr_rms(r). The energy-adjusted coefficients are computed as:
$\overline{X\_sbr}(r) = X\_sbr(r) \cdot sbr\_lev\_scale(r) \cdot rms(r) / sbr\_rms(r)$
where $\overline{X\_sbr}(r)$ denotes the energy-adjusted coefficients of zero-bit coding subband r; X_sbr(r) denotes the coefficients obtained by replication for zero-bit coding subband r; sbr_rms(r) is the amplitude envelope (that is, the root mean square) of X_sbr(r); rms(r) is the amplitude envelope of the coefficients of zero-bit coding subband r before encoding, obtained by dequantizing the amplitude envelope quantization index; and sbr_lev_scale(r) is the energy control scale factor of the band replication for zero-bit coding subband r, whose value is determined by the noise level of the noise-filling subband containing zero-bit coding subband r, as follows:
$sbr\_lev\_scale(r) = (1 - \overline{P\_noise\_rate}(j)) \cdot fill\_energy\_saclefactor$
fill_energy_saclefactor is the filling energy scale factor, which adjusts the overall gain of the filling energy; its range is (0, 1), and its value in this example is 0.2.
$\overline{P\_noise\_rate}(j)$ is the decoded, dequantized noise level of noise-filling subband j; its dequantized value can be obtained from the quantization range of Table 3 according to the noise level coded bits. One implementation in this example is shown in Table 10:
Table 10. Noise level dequantization values (table not reproduced)
Here j is the index of the noise-filling subband containing zero-bit coding subband r.
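The energy adjustment of Step 502 can be sketched as below. This is an illustrative Python sketch only: the function names and the zero-envelope guard are assumptions, and the parameter name fill_energy_saclefactor follows the spelling used in the text.

```python
import math


def rms(values):
    """Root mean square of a list of coefficients (the amplitude envelope)."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def energy_adjust(x_sbr, rms_r, p_noise_rate, fill_energy_saclefactor=0.2):
    """Scale replicated coefficients x_sbr of one zero-bit subband.

    rms_r is the dequantized amplitude envelope of the subband and
    p_noise_rate is the dequantized noise level of its noise-filling subband.
    """
    sbr_lev_scale = (1.0 - p_noise_rate) * fill_energy_saclefactor
    sbr_rms = rms(x_sbr)
    if sbr_rms == 0.0:
        # Assumption: an all-zero replica is left at zero to avoid dividing by zero.
        return [0.0] * len(x_sbr)
    gain = sbr_lev_scale * rms_r / sbr_rms
    return [v * gain for v in x_sbr]
```

After the adjustment, the root mean square of the returned coefficients equals sbr_lev_scale(r) * rms(r), so a higher noise level leaves less energy to the replicated component.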
Step 503: add white noise to the energy-adjusted coefficients to form the final reconstructed frequency-domain coefficients.
After the energy adjustment of the replicated coefficients is complete, white noise is added to the energy-adjusted coefficients to form the final reconstructed coefficients $\overline{X}(r)$:
$\overline{X}(r) = \overline{X\_sbr}(r) + rms(r) \cdot noise\_lev\_scale(r) \cdot random()$
where $\overline{X}(r)$ denotes the reconstructed coefficients of zero-bit coding subband r; $\overline{X\_sbr}(r)$ denotes the energy-adjusted coefficients of zero-bit coding subband r; rms(r) is the amplitude envelope of the coefficients of zero-bit coding subband r before encoding, obtained by dequantizing the amplitude envelope quantization index; random() is a random phase generator whose return value is +1 or -1; and noise_lev_scale(r) is the noise level control scale factor of zero-bit coding subband r, whose value is determined by the noise level of the noise-filling subband containing zero-bit coding subband r, as follows:
$noise\_lev\_scale(r) = \overline{P\_noise\_rate}(j) \cdot fill\_energy\_saclefactor$
where fill_energy_saclefactor is the filling energy scale factor, which adjusts the overall gain of the filling energy; its range is (0, 1), and its value in this example is 0.2. $\overline{P\_noise\_rate}(j)$ is the decoded, dequantized noise level of noise-filling subband j, where j is the index of the noise-filling subband containing zero-bit coding subband r.
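The noise addition of Step 503 can be sketched as follows. This is a hedged Python sketch: the function name noise_fill and the optional rng argument (used only to make runs repeatable) are assumptions, and Python's random.Random stands in for the random phase generator random() of the text.

```python
import random


def noise_fill(x_sbr_adj, rms_r, p_noise_rate, fill_energy_saclefactor=0.2, rng=None):
    """Add +/-1 random-phase noise to energy-adjusted coefficients.

    x_sbr_adj holds the energy-adjusted coefficients of one zero-bit subband,
    rms_r is its dequantized amplitude envelope, and p_noise_rate is the
    dequantized noise level of the enclosing noise-filling subband.
    """
    rng = rng or random.Random()
    noise_lev_scale = p_noise_rate * fill_energy_saclefactor
    amplitude = rms_r * noise_lev_scale
    # rng.choice plays the role of random(): it returns +1 or -1.
    return [v + amplitude * rng.choice((1.0, -1.0)) for v in x_sbr_adj]
```

Passing a list of zeros as x_sbr_adj gives pure noise filling, which corresponds to the treatment of zero-bit coding subbands that receive no band replication.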
Of course, the zero-bit coding subbands in effective noise-filling subbands that were not allocated bits (for example, the noise-filling subband with index 0) are also filled with white noise to reconstruct their frequency-domain coefficients; this is not described again here.
The present invention also provides a noise level estimation method, comprising:
Estimating the power spectrum of the audio signal to be encoded from its frequency-domain coefficients;
Estimating the noise level of the audio signal in the zero-bit coding subbands from the estimated power spectrum, the noise level being used during decoding to control the ratio between the energy of noise filling and the energy of spectral band replication; a zero-bit coding subband is a coding subband that has been allocated zero bits. A noise level may be computed for each zero-bit coding subband, or a single shared noise level may be computed for several noise subbands.
Further, the noise level of the audio signal in the zero-bit coding subbands is the ratio of the estimated noise-component power to the estimated tonal-component power in the zero-bit coding subbands.
Further, the power spectrum of the audio signal to be encoded is estimated from its MDCT frequency-domain coefficients; the power of bin k in frame i is estimated as:
$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda) X_i(k)^2$, where $P_{i-1}(k) = 0$ when i equals 0; $P_i(k)$ is the estimated power of bin k in frame i; $X_i(k)$ is the MDCT coefficient of bin k in frame i; and λ is the filter coefficient of a one-pole smoothing filter.
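The recursion above can be sketched in Python as follows. The function name update_power_spectrum and the convention of passing None for the frame before frame 0 are assumptions, and the value of λ is not given in this passage, so it is left as a parameter.

```python
def update_power_spectrum(prev_power, mdct_coeffs, lam):
    """One step of P_i(k) = lam * P_{i-1}(k) + (1 - lam) * X_i(k)^2.

    prev_power is the list P_{i-1}(k), or None for frame 0, in which
    case P_{i-1}(k) is taken as 0 for every bin.
    """
    if prev_power is None:
        prev_power = [0.0] * len(mdct_coeffs)
    return [lam * p + (1.0 - lam) * x * x for p, x in zip(prev_power, mdct_coeffs)]
```

A caller would keep the returned list and pass it back in as prev_power for the next frame.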
Further, the frequency-domain coefficients of the audio signal to be encoded are divided into one or several noise-filling subbands, and the noise level of an effective noise-filling subband is computed from the estimated power spectrum of the audio signal to be encoded as follows:
Compute the mean power of all frequency-domain coefficients in all or some of the zero-bit coding subbands of this effective noise-filling subband, obtaining the average power P_aveg(j);
Compute the mean power of all coefficients in all or some of the zero-bit coding subbands of this effective noise-filling subband whose power $P_i(k)$ is greater than the average power P_aveg(j), obtaining the tonal-component average power P_signal_aveg(j) of the zero-bit coding subbands in this effective noise-filling subband;
Compute the mean power of all coefficients in all or some of the zero-bit coding subbands of this effective noise-filling subband whose power $P_i(k)$ is less than or equal to the average power P_aveg(j), obtaining the noise-component average power P_noise_aveg(j) of the zero-bit coding subbands in this effective noise-filling subband;
Compute the ratio P_noise_rate(j) of the noise-component average power P_noise_aveg(j) to the tonal-component average power P_signal_aveg(j), obtaining the noise level of this effective noise-filling subband.
Here, an effective noise-filling subband is a noise-filling subband that contains zero-bit coding subbands.
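The four steps above can be sketched as a single Python function. The function name noise_level is hypothetical, the input is assumed to be a non-empty list of the powers P_i(k) for the bins of the zero-bit coding subbands in one effective noise-filling subband, and returning 1.0 for a perfectly flat spectrum is an assumption, since the text does not cover that case.

```python
def noise_level(power):
    """Return P_noise_rate(j) for the bins of one noise-filling subband."""
    p_aveg = sum(power) / len(power)
    tonal = [p for p in power if p > p_aveg]   # bins above the average power
    noise = [p for p in power if p <= p_aveg]  # bins at or below the average power
    if not tonal:
        # Assumption: with no bin above the average, treat the band as pure noise.
        return 1.0
    p_signal_aveg = sum(tonal) / len(tonal)
    p_noise_aveg = sum(noise) / len(noise)
    return p_noise_aveg / p_signal_aveg
```

For the powers [1, 1, 1, 9] the average is 3, the tonal average is 9 and the noise average is 1, giving a noise level of 1/9; a strongly tonal band therefore yields a small value.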
Three, encoding system
To implement the above encoding method, the present invention also provides an audio encoding system. As shown in Fig. 6, the system comprises a modified discrete cosine transform (MDCT) unit, an amplitude envelope computation unit, an amplitude envelope quantization and encoding unit, a bit allocation unit, a frequency-domain coefficient encoding unit, a noise level estimation unit, and a bit stream multiplexer (MUX), wherein:
The MDCT unit performs the modified discrete cosine transform on the audio signal to generate frequency-domain coefficients;
The amplitude envelope computation unit is connected to the MDCT unit; it divides the frequency-domain coefficients generated by the MDCT into several coding subbands and computes the amplitude envelope of each coding subband;
When dividing the coding subbands, the amplitude envelope computation unit divides the MDCT coefficients into several equally spaced coding subbands, or into several non-uniform coding subbands according to auditory perception characteristics.
The amplitude envelope quantization and encoding unit is connected to the amplitude envelope computation unit; it quantizes and encodes the amplitude envelope value of each coding subband, generating the coded bits of the amplitude envelope of each coding subband;
The bit allocation unit is connected to the amplitude envelope quantization and encoding unit; it performs bit allocation to obtain the number of coded bits allocated to each frequency-domain coefficient in each coding subband;
Specifically, the bit allocation unit comprises an importance computation module and a bit allocation module connected to each other, wherein:
The importance computation module computes the initial importance value of each coding subband from the coding subband amplitude envelope values;
The bit allocation module allocates bits to each frequency-domain coefficient in each coding subband according to the importance of the coding subbands; during allocation, both the bit allocation step and the step by which importance is reduced after allocation are variable.
The initial importance value is computed from the optimal bit value under the maximum quantization signal-to-noise-ratio gain condition and a scale factor matching the perceptual characteristics of the human ear, or is the quantization index $Th_q(j)$ of the amplitude envelope of each coding subband, or is given by an expression (shown as a figure in the original and not reproduced here) in which μ > 0 and μ and ν are real numbers.
When computing the initial importance value, the importance computation module first computes the average bit consumption of a single frequency-domain coefficient; it then computes, according to rate-distortion theory, the optimal bit value under the maximum quantization signal-to-noise-ratio gain condition; finally, it computes the initial importance value of each coding subband for bit allocation from the average bit consumption and the optimal bit value;
The bit allocation module allocates bits to the coding subbands according to their importance: it increases the number of coded bits of each frequency-domain coefficient in the coding subband with the greatest importance and reduces the importance of that coding subband, repeating until the total number of bits consumed by all coding subbands reaches the maximum that the bit limit allows.
When the bit allocation module performs bit allocation, the bit allocation step and the post-allocation importance reduction step of low-bit coding subbands are smaller than those of zero-bit coding subbands and high-bit coding subbands. For example, the bit allocation step and the post-allocation importance reduction step of low-bit coding subbands are both 0.5, while those of zero-bit coding subbands and high-bit coding subbands are both 1.
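The greedy allocation loop with variable step sizes might look like the following Python sketch. Several details are assumptions that this passage does not specify: the threshold low_bit_limit that separates low-bit from high-bit coding subbands, the widths argument (number of coefficients per subband), and the stopping rule, which here drops a subband from consideration once its next step no longer fits in the budget. Subband widths are assumed to be positive.

```python
def allocate_bits(importance, widths, bit_budget, low_bit_limit=5.0):
    """Greedy per-coefficient bit allocation across coding subbands.

    importance: initial importance value of each coding subband
    widths:     number of coefficients in each coding subband (hypothetical input)
    Returns the bits per coefficient allocated to each subband.
    """
    bits = [0.0] * len(importance)
    imp = list(importance)
    used = 0.0
    active = list(range(len(imp)))  # subbands that may still receive bits
    while active:
        j = max(active, key=lambda i: imp[i])  # most important subband
        # Step 0.5 for low-bit subbands, 1 for zero-bit and high-bit subbands.
        step = 0.5 if 0.0 < bits[j] < low_bit_limit else 1.0
        cost = step * widths[j]
        if used + cost > bit_budget:
            active.remove(j)  # assumption: skip subbands that no longer fit
            continue
        bits[j] += step
        imp[j] -= step  # importance drops by the same step
        used += cost
    return bits
```

The importance reduction is tied to the allocation step here because the example in the text uses the same value for both; an implementation with separate tables for the two steps would need a small change.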
The frequency-domain coefficient quantization and encoding unit is connected to the MDCT unit, the bit allocation unit, and the amplitude envelope quantization and encoding unit; it normalizes, quantizes, and encodes all frequency-domain coefficients in each coding subband, generating the frequency-domain coefficient coded bits;
The noise level estimation unit is connected to the MDCT unit and the bit allocation unit; it estimates the power spectrum of the audio signal to be encoded from its MDCT frequency-domain coefficients, then estimates the noise level of the audio signal in the zero-bit coding subbands, and quantizes and encodes it to obtain the noise level coded bits. This noise level is used during decoding to control the ratio between the energy of noise filling and the energy of spectral band replication; see Fig. 7 for details;
The noise level of the audio signal in the zero-bit coding subbands is the ratio of the estimated noise-component power to the estimated tonal-component power in the zero-bit coding subbands.
The bit stream multiplexer (MUX) is connected to the amplitude envelope quantization and encoding unit, the frequency-domain coefficient encoding unit, and the noise level estimation unit; it multiplexes the coded bits of each coding subband, the frequency-domain coefficient coded bits, and the noise level coded bits, and sends them to the decoder.
The bit stream multiplexer packs the encoded bits in the following order: the amplitude envelope Huffman coding flag, the frequency-domain coefficient Huffman coding flag, the number of bit allocation correction iterations, the amplitude envelope coded bits, the frequency-domain coefficient coded bits, and the noise level coded bits.
As shown in Fig. 7, the noise level estimation unit specifically comprises:
A power spectrum estimation module, which estimates the power spectrum of the audio signal to be encoded from its MDCT frequency-domain coefficients;
The power spectrum estimation module estimates the power of bin k in frame i as:
$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda) X_i(k)^2$, where $P_{i-1}(k) = 0$ when i equals 0; $P_i(k)$ is the estimated power of bin k in frame i; $X_i(k)$ is the MDCT coefficient of bin k in frame i; and λ is the filter coefficient of a one-pole smoothing filter.
A noise level computation module, connected to the power spectrum estimation module, which estimates, from the power spectrum estimated by the power spectrum estimation module, the noise level of the audio signal in the noise-filling subbands that have been allocated bits;
A noise level encoding module, connected to the noise level computation module, which quantizes and encodes the noise level computed by the noise level computation module to obtain the noise level coded bits.
Further, the frequency-domain coefficients of the audio signal to be encoded are divided into one or several noise-filling subbands, and the noise level computation module specifically: computes the mean power of all frequency-domain coefficients in all or some of the zero-bit coding subbands of an effective noise-filling subband, obtaining the average power P_aveg(j); computes the mean power of all coefficients in all or some of the zero-bit coding subbands of this effective noise-filling subband whose power $P_i(k)$ is greater than P_aveg(j), obtaining the tonal-component average power P_signal_aveg(j) of the zero-bit coding subbands in this effective noise-filling subband; computes the mean power of all coefficients in all or some of the zero-bit coding subbands of this effective noise-filling subband whose power $P_i(k)$ is less than or equal to P_aveg(j), obtaining the noise-component average power P_noise_aveg(j) of the zero-bit coding subbands in this effective noise-filling subband; and computes the ratio of P_noise_aveg(j) to P_signal_aveg(j), obtaining the noise level of this effective noise-filling subband;
Here, an effective noise-filling subband is a noise-filling subband that contains zero-bit coding subbands.
The noise level estimation unit further comprises a bit allocation module connected to the noise level computation module and the noise level encoding module. This module allocates bits to all effective noise-filling subbands, or skips one or several low-frequency effective noise-filling subbands and allocates bits to the subsequent higher-frequency effective noise-filling subbands, and notifies the noise level computation module and the noise level encoding module. The noise level computation module computes the noise level only for noise-filling subbands that have been allocated bits; the noise level encoding module quantizes and encodes the noise level using the bits allocated by the bit allocation module.
Four, decoding system
To implement the above decoding method, the present invention also provides an audio decoding system. As shown in Fig. 8, the system comprises a bit stream demultiplexer (DeMUX), a coding subband amplitude envelope decoding unit, a bit allocation unit, a frequency-domain coefficient decoding unit, a spectrum reconstruction unit, and an inverse modified discrete cosine transform (IMDCT) unit, wherein:
The bit stream demultiplexer (DeMUX) separates the amplitude envelope coded bits, the frequency-domain coefficient coded bits, and the noise level coded bits from the bit stream to be decoded;
The amplitude envelope decoding unit is connected to the bit stream demultiplexer; it decodes the amplitude envelope coded bits output by the bit stream demultiplexer to obtain the amplitude envelope quantization index of each coding subband;
The bit allocation unit is connected to the amplitude envelope decoding unit; it allocates bits to each coding subband and allocates bits to the noise-filling subbands that contain zero-bit coding subbands;
The bit allocation unit comprises an importance computation module, a bit allocation module, and a bit allocation correction module, wherein:
The importance computation module computes the initial importance value of each coding subband from the coding subband amplitude envelope values;
The bit allocation module allocates bits to each frequency-domain coefficient in each coding subband according to the initial importance value of the coding subbands; during allocation, both the bit allocation step and the step by which importance is reduced after allocation are variable;
The bit allocation correction module, after bit allocation has been performed, applies count further rounds of bit allocation correction to the coding subbands according to the encoder's bit allocation correction iteration count (count) and the importance of each coding subband.
When the bit allocation module performs bit allocation, the bit allocation step and the post-allocation importance reduction step of low-bit coding subbands are smaller than those of zero-bit coding subbands and high-bit coding subbands.
When the bit allocation correction module performs bit correction, the bit correction step and the post-correction importance reduction step of low-bit coding subbands are smaller than those of zero-bit coding subbands and high-bit coding subbands.
When the bit allocation unit allocates bits to the noise-filling subbands, it follows the encoder's allocation method: it either allocates bits to all effective noise-filling subbands, or skips one or several low-frequency effective noise-filling subbands and allocates bits to the subsequent higher-frequency effective noise-filling subbands.
The frequency-domain coefficient decoding unit is connected to the amplitude envelope decoding unit and the bit allocation unit; it decodes, dequantizes, and denormalizes the coding subbands to obtain the frequency-domain coefficients;
The noise level decoding unit is connected to the bit stream demultiplexer and the bit allocation unit; it decodes and dequantizes the noise level coded bits to obtain the noise level;
The spectrum reconstruction unit is connected to the noise level decoding unit, the frequency-domain coefficient decoding unit, the amplitude envelope decoding unit, and the bit allocation unit; it performs spectral band replication on the zero-bit coding subbands, controls the overall filling energy level of each such coding subband according to the amplitude envelope output by the amplitude envelope decoding unit, and controls the ratio between the energy of noise filling and the energy of spectral band replication according to the noise level output by the noise level decoding unit, obtaining the reconstructed frequency-domain coefficients of the zero-bit coding subbands;
The inverse modified discrete cosine transform (IMDCT) unit is connected to the spectrum reconstruction unit; it performs the IMDCT on the frequency-domain coefficients after spectrum reconstruction of the zero-bit coding subbands, obtaining the audio signal.
As shown in Fig. 9, the spectrum reconstruction unit specifically comprises a spectral band replication subunit, an energy adjustment subunit, and a noise filling subunit connected in sequence, wherein:
The spectral band replication subunit performs spectral band replication on the zero-bit coding subbands;
The energy adjustment subunit computes the amplitude envelope of the coefficients obtained by band replication for the zero-bit coding subband, denoted sbr_rms(r), and performs energy adjustment on the replicated coefficients according to the noise level output by the noise level decoding unit. The energy-adjusted coefficients are:
$\overline{X\_sbr}(r) = X\_sbr(r) \cdot sbr\_lev\_scale(r) \cdot rms(r) / sbr\_rms(r)$
where $\overline{X\_sbr}(r)$ denotes the energy-adjusted coefficients of zero-bit coding subband r; X_sbr(r) denotes the coefficients obtained by replication for zero-bit coding subband r; sbr_rms(r) is the amplitude envelope of X_sbr(r); rms(r) is the amplitude envelope of the coefficients of zero-bit coding subband r before encoding, obtained by dequantizing the amplitude envelope quantization index; and sbr_lev_scale(r) is the energy control scale factor of the band replication for zero-bit coding subband r, whose value is determined by the noise level of the noise-filling subband containing zero-bit coding subband r, as follows:
$sbr\_lev\_scale(r) = (1 - \overline{P\_noise\_rate}(j)) \cdot fill\_energy\_saclefactor$
fill_energy_saclefactor is the filling energy scale factor, which adjusts the overall gain of the filling energy; its range is (0, 1). $\overline{P\_noise\_rate}(j)$ is the decoded, dequantized noise level of noise-filling subband j, where j is the index of the noise-filling subband containing zero-bit coding subband r.
The noise filling subunit performs noise filling on the energy-adjusted coefficients according to the noise level output by the noise level decoding unit, using the formula:
$\overline{X}(r) = \overline{X\_sbr}(r) + rms(r) \cdot noise\_lev\_scale(r) \cdot random()$
where $\overline{X}(r)$ denotes the reconstructed coefficients of zero-bit coding subband r; $\overline{X\_sbr}(r)$ denotes the energy-adjusted coefficients of zero-bit coding subband r; rms(r) is the amplitude envelope of the coefficients of zero-bit coding subband r before encoding, obtained by dequantizing the amplitude envelope quantization index; random() is a random phase generator whose return value is +1 or -1; and noise_lev_scale(r) is the noise level control scale factor of zero-bit coding subband r, whose value is determined by the noise level of the noise-filling subband containing zero-bit coding subband r, as follows:
$noise\_lev\_scale(r) = \overline{P\_noise\_rate}(j) \cdot fill\_energy\_saclefactor$
where fill_energy_saclefactor is the filling energy scale factor, which adjusts the overall gain of the filling energy; its range is (0, 1), and its value in this example is 0.2. $\overline{P\_noise\_rate}(j)$ is the decoded, dequantized noise level of noise-filling subband j, where j is the index of the noise-filling subband containing zero-bit coding subband r.
The zero bits of encoded subband that described spectral band replication subelement is filled in the subband the noise that has distributed bit according to the bit allocation result of Bit Allocation in Discrete unit carries out spectral band replication; Described energy is adjusted the frequency coefficient that subelement obtains after to spectral band replication and is carried out the energy adjustment; Described noise is filled subelement the zero bits of encoded subband in the noise filling subband of energy adjustment frequency coefficient and unallocated bit is carried out the noise filling.
Further, as shown in Figure 9, described spectral band replication subelement comprises tone locations search module, cycle and source frequency range computing module, source frequency range replication initiation sequence number computing module and the spectral band replication module that connects successively, wherein:
The tone locations search module is used in the position at certain tone place of MDCT frequency coefficient search sound signal, and specifically comprise: the MDCT frequency coefficient to first frequency range takes absolute value or square value, and carries out smothing filtering; According to the result of smothing filtering, search for the position at the maximum extreme value place of the first frequency range filtering output value, the position at this maximum extreme value place is the position at tone place;
Cycle and source frequency range computing module, be used for the spectral band replication cycle and the source frequency range that are identified for duplicating according to the tone position, this spectral band replication cycle is 0 frequency to the bandwidth of the frequency of tone locations, and described source frequency range is frequency that 0 frequency is offset spectral band replication skew copyband_offset backward is offset the frequency of described copyband_offset backward to the frequency of tone locations a frequency range;
If the sequence number of the frequency of tone locations is designated as Tonal_pos, preestablish the spectral band replication skew and be designated as copyband_offset, the start sequence number copyband_offset of the frequency coefficient of source frequency range then, the end sequence number is copyband_offset+Tonal_pos.
The source band replication start index calculation module is configured to calculate the source band replication start index for a zero-bit coding subband from the source band and the start index of the zero-bit coding subband that requires spectral band replication.
The spectral band replication module is configured to copy the frequency-domain coefficients of the source band periodically into the zero-bit coding subband, using the spectral band replication period as the period and starting from the source band replication start index;
If the highest frequency within a zero-bit coding subband that requires spectral band replication is lower than the frequency of the tone found by the search, the spectrum of that subband is reconstructed by noise filling only, and no spectral band replication is performed.
The tone position search module searches for the tone position as follows: it takes the absolute value or the squared value of the MDCT frequency-domain coefficients of the first band and applies smoothing filtering; then, from the smoothing result, it searches for the position of the maximum extremum of the filter output in the first band, and this position is the position of the tone;
Further,
The tone position search module smooths the absolute values of the MDCT frequency-domain coefficients of the first band according to:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k) + (1-\mu)\,\lvert \bar{X}_i(k) \rvert$$
or smooths the squared values of the frequency-domain coefficients of the first band according to:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k-1) + (1-\mu)\, \bar{X}_i(k)^2$$
where $\mu$ is the smoothing filter coefficient, taken as 0.125 in the embodiment; $X\_amp_i(k)$ is the filter output for the k-th frequency bin of the i-th frame; $\bar{X}_i(k)$ is the decoded MDCT coefficient of the k-th frequency bin of the i-th frame; and $X\_amp_{i-1}(k) = 0$ when $i = 0$.
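As an illustration only, the absolute-value form of the smoothing recursion can be sketched in Python as follows. The function name `smooth_magnitudes` and the use of plain lists are assumptions made for this sketch and do not come from the patent; the squared-value form is not shown.

```python
def smooth_magnitudes(prev_amp, coeffs, mu=0.125):
    """One frame of X_amp_i(k) = mu * X_amp_{i-1}(k) + (1 - mu) * |X_i(k)|.

    prev_amp: filter outputs of the previous frame, or None for the first
              frame (i = 0), in which case the previous outputs are zero.
    coeffs:   decoded MDCT coefficients of the first band for this frame.
    mu:       smoothing coefficient (0.125 in the embodiment).
    """
    if prev_amp is None:
        prev_amp = [0.0] * len(coeffs)
    return [mu * p + (1.0 - mu) * abs(x) for p, x in zip(prev_amp, coeffs)]


# Hypothetical two-frame example.
frame0 = smooth_magnitudes(None, [1.0, -2.0, 4.0])
frame1 = smooth_magnitudes(frame0, [0.0, 0.0, 0.0])
```

The tone position is then searched in the smoothed output rather than in the raw coefficients, which makes the search less sensitive to frame-to-frame fluctuations.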
Further, the first band is a low-frequency band in which energy is relatively concentrated, determined from the statistical properties of the spectrum, where low frequency refers to spectral components below one half of the total signal bandwidth.
Further, the tone position search module searches directly for the initial maximum among the filter outputs of the frequency-domain coefficients of the first band, and takes this maximum as the maximum extremum of the filter output in the first band.
Further, when determining the maximum extremum of the filter output, the tone position search module takes a section of the first band as a second band, first searches for the initial maximum among the filter outputs of the frequency-domain coefficients of the second band, and then proceeds differently depending on the position of the frequency-domain coefficient corresponding to this initial maximum:
a. If the initial maximum is the filter output of the lowest-frequency coefficient of the second band, that output is compared with the filter output of the preceding, lower-frequency coefficient in the first band, and the comparison continues toward lower frequencies. As soon as the filter output of the current coefficient is greater than that of the preceding coefficient, the filter output of the current coefficient is the final maximum extremum. Otherwise, if the comparison reaches the lowest-frequency coefficient of the first band and its filter output is greater than that of the following coefficient, the filter output of the lowest-frequency coefficient of the first band is the final maximum extremum;
b. If the initial maximum is the filter output of the highest-frequency coefficient of the second band, that output is compared with the filter output of the following, higher-frequency coefficient in the first band, and the comparison continues toward higher frequencies. As soon as the filter output of the current coefficient is greater than that of the following coefficient, the filter output of the current coefficient is the final maximum extremum. Otherwise, if the comparison reaches the highest-frequency coefficient of the first band and its filter output is greater than that of the preceding coefficient, the filter output of the highest-frequency coefficient of the first band is the final maximum extremum;
c. If the initial maximum is the filter output of a coefficient lying between the lowest and highest frequencies of the second band, the frequency-domain coefficient corresponding to this initial maximum is the position of the tone; that is, the initial maximum is the final maximum extremum.
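Cases a to c can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function name `find_tone_position`, the representation of bands as inclusive index pairs, and the treatment of equal neighbouring values (the walk continues on a tie) are choices made here, not details given in the text.

```python
def find_tone_position(amp, band1, band2):
    """Return the index of the final maximum extremum of the filter output.

    amp:   smoothed filter outputs, indexed by frequency bin.
    band1: (lo1, hi1) inclusive bin range of the first band.
    band2: (lo2, hi2) inclusive bin range of the second band, inside band1.
    """
    lo1, hi1 = band1
    lo2, hi2 = band2
    # Initial maximum within the second band.
    pos = max(range(lo2, hi2 + 1), key=lambda k: amp[k])
    if pos == lo2:
        # Case a: walk toward lower frequencies until the current bin
        # exceeds its lower neighbour or the first band's lower edge is hit.
        while pos > lo1 and amp[pos] <= amp[pos - 1]:
            pos -= 1
    elif pos == hi2:
        # Case b: walk toward higher frequencies symmetrically.
        while pos < hi1 and amp[pos] <= amp[pos + 1]:
            pos += 1
    # Case c: an interior maximum is already the final extremum.
    return pos


# Hypothetical filter outputs; the second band covers bins 2..4.
tone_b = find_tone_position([1, 5, 3, 2, 4, 6, 7, 2], (0, 7), (2, 4))
tone_a = find_tone_position([1, 5, 3, 2, 1, 0, 0, 0], (0, 7), (2, 4))
```

Restricting the first search to the second band and then extending it at the edges keeps the search cheap while still finding a true local peak when the maximum sits on a band boundary.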
Further, the source band replication start index calculation module calculates the source band replication start index of a zero-bit coding subband that requires spectral band replication as follows: obtain the index of the starting frequency bin of the current zero-bit coding subband whose frequency-domain coefficients are to be reconstructed, denoted fillband_start_freq; denote the index of the frequency bin corresponding to the tone as Tonal_pos; add 1 to Tonal_pos to obtain the replication period copy_period; denote the start index of the source band as copyband_offset; repeatedly subtract copy_period from the value of fillband_start_freq until the value falls within the index range of the source band. This value, denoted copy_pos_mod, is the source band replication start index.
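The repeated subtraction can be sketched in Python as follows. The function name `source_copy_start` is an assumption of this sketch, as is the explicit check that the subband starts at or above copyband_offset; the text does not say what happens otherwise.

```python
def source_copy_start(fillband_start_freq, tonal_pos, copyband_offset):
    """Compute copy_pos_mod, the source band replication start index.

    The source band spans indices copyband_offset .. copyband_offset + tonal_pos,
    which is exactly copy_period = tonal_pos + 1 bins wide.
    """
    if fillband_start_freq < copyband_offset:
        # Assumption: such a subband is not handled by this sketch.
        raise ValueError("subband starts below the source band")
    copy_period = tonal_pos + 1
    copy_pos_mod = fillband_start_freq
    # Subtract whole periods until the index lands inside the source band.
    while copy_pos_mod > copyband_offset + tonal_pos:
        copy_pos_mod -= copy_period
    return copy_pos_mod


# Hypothetical values: tone at bin 9, offset 2, so the source band is 2..11.
start = source_copy_start(35, tonal_pos=9, copyband_offset=2)
```

Because the source band is exactly one period wide, the loop always ends on an index inside the band, so the copied spectrum keeps the harmonic spacing set by the tone position.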
Further, the spectral band replication module performs spectral band replication as follows:
The frequency-domain coefficients starting from the source band replication start index are copied in ascending order into the zero-bit coding subband whose starting position is fillband_start_freq. Once the source frequency bin being copied reaches bin Tonal_pos+copyband_offset, copying into the zero-bit coding subband continues from the frequency-domain coefficient at bin copyband_offset, and so on, until all frequency-domain coefficients of the current zero-bit coding subband have been copied.
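A minimal Python sketch of this wrap-around copy is given below. The function name `replicate_into_subband`, the inclusive end index `fillband_end_freq`, and returning a new list instead of writing in place are assumptions of this sketch.

```python
def replicate_into_subband(spectrum, fillband_start_freq, fillband_end_freq,
                           copy_pos_mod, tonal_pos, copyband_offset):
    """Periodically copy source band coefficients into one zero-bit subband.

    spectrum:     list of frequency-domain coefficients indexed by bin.
    copy_pos_mod: source band replication start index for this subband.
    Returns a copy of the spectrum with bins
    fillband_start_freq..fillband_end_freq (inclusive) filled.
    """
    out = list(spectrum)
    src = copy_pos_mod
    for dst in range(fillband_start_freq, fillband_end_freq + 1):
        out[dst] = spectrum[src]
        src += 1
        # After the last source bin, wrap back to the start of the source band.
        if src > tonal_pos + copyband_offset:
            src = copyband_offset
    return out


# Hypothetical example: source band is bins 1..4, subband is bins 10..15.
spec = [float(k) for k in range(20)]
filled = replicate_into_subband(spec, 10, 15, copy_pos_mod=2,
                                tonal_pos=3, copyband_offset=1)
```

Here `copy_pos_mod=2` is the value obtained by subtracting the period 4 from 10 twice, as described in the preceding paragraph.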

Claims (41)

1. A noise level estimation method, characterized in that the method comprises:
Estimating the power spectrum of the audio signal to be encoded from the frequency-domain coefficients of the audio signal to be encoded;
Estimating the noise level of the audio signal in a zero-bit coding subband from the calculated power spectrum, the noise level being used during decoding to control the ratio between the energy of noise filling and the energy of spectral band replication; wherein a zero-bit coding subband is a coding subband to which zero bits have been allocated.
2. The method of claim 1, characterized in that: the noise level of the audio signal in the zero-bit coding subband is the ratio of the noise component power estimated within the zero-bit coding subband to the tonal component power estimated within the zero-bit coding subband.
3. the method for claim 1 is characterized in that:
Estimate the power spectrum of sound signal to be encoded according to the MDCT frequency coefficient of sound signal to be encoded, the rating formula of the frequency k of i frame is as follows:
P i(k)=λ P I-1(k)+(1-λ) X i(k) 2, P when i equals 0 wherein I-1(k)=0; P i(k) k the performance number that the frequency estimation obtains of expression i frame; X i(k) the MDCT coefficient of k frequency of expression i frame, λ is the filter factor of one pole smoothing filter.
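For illustration, the one-pole recursion of claim 3 might be written in Python as below. The function name `update_power_spectrum` and the value 0.5 used for λ in the example are assumptions of this sketch; the claim does not fix a value for λ.

```python
def update_power_spectrum(prev_power, mdct, lam):
    """One frame of P_i(k) = lam * P_{i-1}(k) + (1 - lam) * X_i(k)**2.

    prev_power: power estimates of the previous frame, or None for i = 0
                (the previous estimates are then taken as zero).
    mdct:       MDCT coefficients X_i(k) of the current frame.
    lam:        coefficient of the one-pole smoothing filter.
    """
    if prev_power is None:
        prev_power = [0.0] * len(mdct)
    return [lam * p + (1.0 - lam) * x * x for p, x in zip(prev_power, mdct)]


# Hypothetical two-frame example with lam = 0.5.
p0 = update_power_spectrum(None, [2.0, -4.0], lam=0.5)
p1 = update_power_spectrum(p0, [0.0, 2.0], lam=0.5)
```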
4. the method for claim 1 is characterized in that:
The frequency coefficient of sound signal to be encoded is divided into one or several noises fills subband, calculate certain effective noise according to the power spectrum of the sound signal of estimating to be encoded and fill the process of the noise level of subband and specifically comprise:
Calculate this effective noise and fill the mean value of the power of all frequency coefficients of all or part zero bits of encoded subband in the subband, obtain average power P_aveg (j);
Calculate this effective noise and fill power P in the subband all or part zero bits of encoded subband i(k) greater than the mean value of the power of all frequency coefficients of average power P_aveg (j), obtain the tonal content average power P_signal_aveg (j) that this effective noise is filled zero bits of encoded subband in subband;
Calculate this effective noise and fill power P in the subband all or part zero bits of encoded subband i(k) be less than or equal to the power P of all frequency coefficients of average power P_aveg (j) i(k) mean value obtains the noise contribution average power P_noise_aveg (j) that this effective noise is filled zero bits of encoded subband in the subband;
The ratio P_noise_rate (j) of calculating noise composition average power P_noise_aveg (j) and tonal content average power P_signal_aveg (j) obtains the noise level that this effective noise is filled subband.
Wherein, effective noise is filled the noise filling subband that subband refers to contain zero bits of encoded subband.
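The four steps of claim 4 can be sketched in Python as follows. The function name `noise_level` is hypothetical, and so is the handling of the degenerate case in which no bin exceeds the average (the sketch returns 1.0, treating the subband as pure noise); the claim does not cover that case.

```python
def noise_level(powers):
    """Estimate P_noise_rate(j) for one effective noise-filling subband.

    powers: estimated powers P_i(k) of the bins that belong to the
            zero-bit coding subbands of this noise-filling subband.
    """
    p_aveg = sum(powers) / len(powers)
    tonal = [p for p in powers if p > p_aveg]    # bins above the average
    noise = [p for p in powers if p <= p_aveg]   # bins at or below it
    if not tonal:
        # Assumption: all bins are equal, so treat the band as pure noise.
        return 1.0
    p_signal_aveg = sum(tonal) / len(tonal)
    p_noise_aveg = sum(noise) / len(noise)
    return p_noise_aveg / p_signal_aveg


# Hypothetical example: one strong tonal bin over a flat noise floor.
rate = noise_level([1.0, 1.0, 1.0, 1.0, 16.0])
```

A value near 0 indicates a strongly tonal subband, and a value near 1 indicates a noise-like subband.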
5. An audio encoding method, characterized in that the method comprises:
A. dividing the MDCT frequency-domain coefficients of the audio signal to be encoded into several coding subbands, and quantizing and encoding the amplitude envelope value of each coding subband to obtain amplitude envelope coded bits;
B. allocating bits to each coding subband, and quantizing and encoding the non-zero-bit coding subbands to obtain MDCT frequency-domain coefficient coded bits;
C. estimating the power spectrum of the audio signal to be encoded from the MDCT frequency-domain coefficients of the audio signal to be encoded, then estimating the noise level of the audio signal in the zero-bit coding subbands, and quantizing and encoding it to obtain noise level coded bits; wherein the noise level is used during decoding to control the ratio between the energy of noise filling and the energy of spectral band replication, and a zero-bit coding subband is a coding subband to which zero bits have been allocated;
D. multiplexing and packing the amplitude envelope coded bits of each coding subband, the frequency-domain coefficient coded bits and the noise level coded bits, and sending them to the decoding end.
6. The method of claim 5, characterized in that: in step C, the noise level of the audio signal in the zero-bit coding subband is the ratio of the noise component power estimated within the zero-bit coding subband to the tonal component power estimated within the zero-bit coding subband.
7. The method of claim 5, characterized in that:
The power spectrum of the audio signal to be encoded is estimated from the MDCT frequency-domain coefficients of the audio signal to be encoded, and the power of frequency bin k of the i-th frame is estimated as follows:
$$P_i(k) = \lambda\, P_{i-1}(k) + (1-\lambda)\, X_i(k)^2$$
where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ is the estimated power of the k-th frequency bin of the i-th frame; $X_i(k)$ is the MDCT coefficient of the k-th frequency bin of the i-th frame; and $\lambda$ is the filter coefficient of the one-pole smoothing filter.
8. The method of claim 5, characterized in that, in step B, the frequency-domain coefficients of the audio signal to be encoded are divided into one or several noise-filling subbands, and after bits have been allocated to each coding subband, bits are allocated to the effective noise-filling subbands; in step C, the process of calculating the noise level of an effective noise-filling subband from the estimated power spectrum of the audio signal to be encoded comprises:
Calculating the mean power of all frequency-domain coefficients in all or some of the zero-bit coding subbands within the effective noise-filling subband, to obtain the average power P_aveg(j);
Calculating the mean power of all frequency-domain coefficients, in all or some of the zero-bit coding subbands within the effective noise-filling subband, whose power P_i(k) is greater than the average power P_aveg(j), to obtain the tonal component average power P_signal_aveg(j) of the zero-bit coding subbands within the effective noise-filling subband;
Calculating the mean power P_i(k) of all frequency-domain coefficients, in all or some of the zero-bit coding subbands within the effective noise-filling subband, whose power P_i(k) is less than or equal to the average power P_aveg(j), to obtain the noise component average power P_noise_aveg(j) of the zero-bit coding subbands within the effective noise-filling subband;
Calculating the ratio P_noise_rate(j) of the noise component average power P_noise_aveg(j) to the tonal component average power P_signal_aveg(j), to obtain the noise level of the effective noise-filling subband.
Wherein an effective noise-filling subband is a noise-filling subband that contains a zero-bit coding subband.
9. The method of claim 8, characterized in that:
The noise-filling subbands are divided either uniformly or non-uniformly according to the characteristics of human hearing, and one noise-filling subband comprises one or more coding subbands.
10. The method of claim 8, characterized in that: in step B, bits are allocated to all effective noise-filling subbands, or one or several low-frequency effective noise-filling subbands are skipped and bits are allocated to the subsequent higher-frequency effective noise-filling subbands; in step C, the noise level is calculated for the effective noise-filling subbands to which bits have been allocated; in step D, the allocated bits are used when multiplexing and packing the noise level coded bits.
11. The method of claim 8, characterized in that: each effective noise-filling subband is allocated the same number of bits, or different numbers of bits are allocated according to auditory characteristics.
12. An audio decoding method, characterized in that the method comprises:
A2. decoding and inverse-quantizing each amplitude envelope coded bit in the bitstream to be decoded, to obtain the amplitude envelope of each coding subband;
B2. allocating bits to each coding subband, decoding and inverse-quantizing the noise level coded bits to obtain the noise level of the zero-bit coding subbands, and decoding and inverse-quantizing the frequency-domain coefficient coded bits to obtain the frequency-domain coefficients of the non-zero-bit coding subbands;
C2. performing spectral band replication on each zero-bit coding subband, controlling the overall filling energy level of the coding subband according to the amplitude envelope of the zero-bit coding subband, and controlling the ratio between the energy of noise filling and the energy of spectral band replication according to the noise level of the zero-bit coding subband, to obtain the reconstructed frequency-domain coefficients of the zero-bit coding subband;
D2. applying the inverse modified discrete cosine transform (IMDCT) to the frequency-domain coefficients of the non-zero-bit coding subbands and the reconstructed frequency-domain coefficients of the zero-bit coding subbands, to obtain the final audio signal.
13. The method of claim 12, characterized in that, in step C2, during spectral band replication, the position of a tone of the audio signal is searched for in the MDCT frequency-domain coefficients; the bandwidth from frequency bin 0 to the frequency bin of the tone position is taken as the spectral band replication period; the band that runs from frequency bin 0 shifted upward by copyband_offset bins to the frequency bin of the tone position shifted upward by the same copyband_offset bins is taken as the source band; and spectral band replication is performed on the zero-bit coding subband; if the highest frequency within the zero-bit coding subband is lower than the frequency of the tone found by the search, the spectrum of that zero-bit coding subband is reconstructed by noise filling only.
14. The method of claim 12, characterized in that, in step C2,
The absolute value or the squared value of the frequency-domain coefficients of a first band is taken and smoothing filtering is applied;
From the smoothing result, the position of the maximum extremum of the filter output in the first band is searched for, and this position is taken as the position of a tone.
15. The method of claim 14, characterized in that:
The absolute values of the frequency-domain coefficients of the first band are smoothed according to:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k) + (1-\mu)\,\lvert \bar{X}_i(k) \rvert$$
or the squared values of the frequency-domain coefficients of the first band are smoothed according to:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k-1) + (1-\mu)\, \bar{X}_i(k)^2$$
where $\mu$ is the smoothing filter coefficient; $X\_amp_i(k)$ is the filter output for the k-th frequency bin of the i-th frame; $\bar{X}_i(k)$ is the decoded MDCT coefficient of the k-th frequency bin of the i-th frame; and $X\_amp_{i-1}(k) = 0$ when $i = 0$.
16. The method of claim 14, characterized in that the first band is a low-frequency band in which energy is relatively concentrated, determined from the statistical properties of the spectrum, where low frequency refers to spectral components below one half of the total signal bandwidth.
17. The method of claim 14, characterized in that the maximum extremum of the filter output is determined as follows: the initial maximum is searched for directly among the filter outputs of the frequency-domain coefficients of the first band, and this maximum is taken as the maximum extremum of the filter output in the first band.
18. The method of claim 14, characterized in that the maximum extremum of the filter output is determined as follows:
A section of the first band is taken as a second band, the initial maximum is searched for among the filter outputs of the frequency-domain coefficients of the second band, and processing then differs according to the position of the frequency-domain coefficient corresponding to this initial maximum:
a. If the initial maximum is the filter output of the lowest-frequency coefficient of the second band, that output is compared with the filter output of the preceding, lower-frequency coefficient in the first band, and the comparison continues toward lower frequencies. As soon as the filter output of the current coefficient is greater than that of the preceding coefficient, the filter output of the current coefficient is the final maximum extremum. Otherwise, if the comparison reaches the lowest-frequency coefficient of the first band and its filter output is greater than that of the following coefficient, the filter output of the lowest-frequency coefficient of the first band is the final maximum extremum;
b. If the initial maximum is the filter output of the highest-frequency coefficient of the second band, that output is compared with the filter output of the following, higher-frequency coefficient in the first band, and the comparison continues toward higher frequencies. As soon as the filter output of the current coefficient is greater than that of the following coefficient, the filter output of the current coefficient is the final maximum extremum. Otherwise, if the comparison reaches the highest-frequency coefficient of the first band and its filter output is greater than that of the preceding coefficient, the filter output of the highest-frequency coefficient of the first band is the final maximum extremum;
c. If the initial maximum is the filter output of a coefficient lying between the lowest and highest frequencies of the second band, the frequency-domain coefficient corresponding to this initial maximum is the position of the tone; that is, the initial maximum is the final maximum extremum.
19. The method of claim 13, characterized in that, in step C2, when spectral band replication is performed on a zero-bit coding subband, the source band replication start index of the zero-bit coding subband is first calculated from the source band and the start index of the zero-bit coding subband that requires spectral band replication; then, using the spectral band replication period as the period, the frequency-domain coefficients of the source band are copied periodically into the zero-bit coding subband, starting from the source band replication start index.
20. The method of claim 19, characterized in that the source band replication start index of the zero-bit coding subband is calculated in step C2 as follows:
Obtain the index of the frequency bin of the first MDCT frequency-domain coefficient of the zero-bit coding subband whose frequency-domain coefficients are to be reconstructed, denoted fillband_start_freq; denote the index of the frequency bin corresponding to the tone as Tonal_pos; add 1 to Tonal_pos to obtain the replication period copy_period; denote the spectral band replication offset as copyband_offset; repeatedly subtract copy_period from the value of fillband_start_freq until the value falls within the index range of the source band. This value, denoted copy_pos_mod, is the source band replication start index.
21. The method of claim 19, characterized in that, in step C2, the frequency-domain coefficients of the source band are copied periodically into the zero-bit coding subband, with the spectral band replication period as the period and starting from the source band replication start index, as follows:
The frequency-domain coefficients starting from the source band replication start index are copied in ascending order into the zero-bit coding subband whose starting position is fillband_start_freq. Once the source frequency bin being copied reaches bin Tonal_pos+copyband_offset, copying into the zero-bit coding subband continues from the frequency-domain coefficient at bin copyband_offset, and so on, until spectral band replication of all frequency-domain coefficients of the current zero-bit coding subband is complete.
22. The method of claim 12, characterized in that, in step C2, the energy of the frequency-domain coefficients obtained by replication into a zero-bit coding subband is adjusted as follows:
The amplitude envelope of the frequency-domain coefficients obtained after spectral band replication of the zero-bit coding subband is calculated and denoted sbr_rms(r);
The energy of the replicated frequency-domain coefficients is adjusted according to:
$$\overline{X\_sbr}(r) = X\_sbr(r) \cdot sbr\_lev\_scale(r) \cdot rms(r) / sbr\_rms(r)$$
where $\overline{X\_sbr}(r)$ denotes the energy-adjusted frequency-domain coefficients of zero-bit coding subband r; X_sbr(r) denotes the frequency-domain coefficients of zero-bit coding subband r obtained by replication; sbr_rms(r) is the amplitude envelope of the replicated frequency-domain coefficients X_sbr(r) of zero-bit coding subband r; rms(r) is the amplitude envelope of the frequency-domain coefficients of zero-bit coding subband r before encoding, obtained by inverse quantization of the amplitude envelope quantization index; and sbr_lev_scale(r) is the energy control scale factor for the spectral band replication of zero-bit coding subband r, whose value is determined by the noise level of the noise-filling subband containing zero-bit coding subband r, calculated as follows:
$$sbr\_lev\_scale(r) = \left(1 - \overline{P\_noise\_rate}(j)\right) \cdot fill\_energy\_saclefactor$$
where fill_energy_saclefactor is the filling energy scale factor, used to adjust the gain of the overall filling energy, with a value in the range (0, 1); $\overline{P\_noise\_rate}(j)$ is the noise level of noise-filling subband j obtained by decoding and inverse quantization; and j is the index of the noise-filling subband containing zero-bit coding subband r.
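The energy adjustment of claim 22 can be sketched in Python as below. This is a sketch under assumptions: the amplitude envelope sbr_rms(r) is computed here as the root mean square of the replicated coefficients, which the name suggests but this passage does not define, and the function name `adjust_replicated_energy` and the zero-envelope guard are hypothetical.

```python
import math


def adjust_replicated_energy(x_sbr, rms_r, p_noise_rate, fill_energy_saclefactor):
    """Scale replicated coefficients X_sbr(r) of one zero-bit coding subband.

    x_sbr:        replicated frequency-domain coefficients of subband r.
    rms_r:        decoded amplitude envelope rms(r) of subband r.
    p_noise_rate: decoded noise level of the enclosing noise-filling subband j.
    fill_energy_saclefactor: filling energy scale factor in (0, 1).
    """
    # Assumption: the amplitude envelope is the RMS of the coefficients.
    sbr_rms = math.sqrt(sum(x * x for x in x_sbr) / len(x_sbr))
    if sbr_rms == 0.0:
        # Assumption: nothing to scale when the replicated band is silent.
        return [0.0] * len(x_sbr)
    sbr_lev_scale = (1.0 - p_noise_rate) * fill_energy_saclefactor
    gain = sbr_lev_scale * rms_r / sbr_rms
    return [x * gain for x in x_sbr]


# Hypothetical values for one subband.
adjusted = adjust_replicated_energy([3.0, -3.0, 3.0, -3.0], rms_r=6.0,
                                    p_noise_rate=0.25,
                                    fill_energy_saclefactor=0.5)
```

The lower the decoded noise level, the larger the share of the subband's energy that is given to the replicated, tonal content.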
23. The method of claim 12, characterized in that, in step C2, noise filling is applied to the energy-adjusted frequency-domain coefficients according to:
$$\bar{X}(r) = \overline{X\_sbr}(r) + rms(r) \cdot noise\_lev\_scale(r) \cdot random()$$
where $\bar{X}(r)$ denotes the reconstructed frequency-domain coefficients of zero-bit coding subband r; $\overline{X\_sbr}(r)$ denotes the energy-adjusted frequency-domain coefficients of zero-bit coding subband r; rms(r) is the amplitude envelope of the frequency-domain coefficients of zero-bit coding subband r before encoding, obtained by inverse quantization of the amplitude envelope quantization index; random() is a random phase generator that produces a random phase value and returns +1 or -1; and noise_lev_scale(r) is the noise level control scale factor of zero-bit coding subband r, whose value is determined by the noise level of the noise-filling subband containing zero-bit coding subband r, calculated as follows:
$$noise\_lev\_scale(r) = \overline{P\_noise\_rate}(j) \cdot fill\_energy\_saclefactor$$
where fill_energy_saclefactor is the filling energy scale factor, used to adjust the gain of the overall filling energy, with a value in the range (0, 1); $\overline{P\_noise\_rate}(j)$ is the noise level of noise-filling subband j obtained by decoding and inverse quantization; and j is the index of the noise-filling subband containing zero-bit coding subband r.
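As a rough illustration of claim 23, the noise-filling step might look like this in Python. The function name `fill_noise` and the use of a seeded `random.Random` instance as the random phase generator are assumptions of this sketch; the claim only requires that random() return +1 or -1.

```python
import random


def fill_noise(x_sbr_adj, rms_r, p_noise_rate, fill_energy_saclefactor, rng):
    """Add random-sign noise to energy-adjusted coefficients of subband r.

    x_sbr_adj:    energy-adjusted replicated coefficients of subband r.
    rms_r:        decoded amplitude envelope rms(r) of subband r.
    p_noise_rate: decoded noise level of the enclosing noise-filling subband j.
    rng:          random.Random instance standing in for random().
    """
    noise_lev_scale = p_noise_rate * fill_energy_saclefactor
    amplitude = rms_r * noise_lev_scale
    # Each bin receives the same noise amplitude with a random sign.
    return [x + amplitude * rng.choice((1.0, -1.0)) for x in x_sbr_adj]


# Hypothetical example: a silent replicated band, so only noise remains.
rng = random.Random(0)
reconstructed = fill_noise([0.0] * 8, rms_r=4.0, p_noise_rate=0.5,
                           fill_energy_saclefactor=0.5, rng=rng)
```

Because the replication scale uses the complementary factor to the noise scale, a single decoded noise level divides the filling energy between replicated and random components.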
24. The method of claim 13, characterized in that: in step B2, after bits have been allocated to each coding subband, the coding subbands are divided into several noise-filling subbands, and bits are allocated to the effective noise-filling subbands; in step C2, for the zero-bit coding subbands within effective noise-filling subbands to which bits have been allocated, spectral band replication is performed and the energy level of the replicated frequency-domain coefficients and the energy level of noise filling are controlled; for the zero-bit coding subbands within effective noise-filling subbands to which no bits have been allocated, noise filling is performed; wherein an effective noise-filling subband is a noise-filling subband that contains a zero-bit coding subband.
25. An audio encoding system, comprising a modified discrete cosine transform (MDCT) unit, an amplitude envelope calculation unit, an amplitude envelope quantization and coding unit, a bit allocation unit, a frequency-domain coefficient coding unit and a bitstream multiplexer (MUX), characterized in that the system further comprises a noise level estimation unit, wherein:
The MDCT unit is configured to apply the modified discrete cosine transform to the audio signal to generate frequency-domain coefficients;
The amplitude envelope calculation unit is connected to the MDCT unit and is configured to divide the frequency-domain coefficients generated by the MDCT unit into several coding subbands and to calculate the amplitude envelope value of each coding subband;
The amplitude envelope quantization and coding unit is connected to the amplitude envelope calculation unit and is configured to quantize and encode the amplitude envelope value of each coding subband, generating the coded bits of the amplitude envelope of each coding subband;
The bit allocation unit is connected to the amplitude envelope quantization and coding unit and is configured to allocate bits to each coding subband;
The frequency-domain coefficient coding unit is connected to the MDCT unit, the bit allocation unit and the amplitude envelope quantization and coding unit, and is configured to normalize, quantize and encode all frequency-domain coefficients of each coding subband, generating frequency-domain coefficient coded bits;
The noise level estimation unit is connected to the MDCT unit and the bit allocation unit and is configured to estimate the power spectrum of the audio signal to be encoded from the MDCT frequency-domain coefficients of the audio signal to be encoded, then estimate the noise level of the audio signal in the zero-bit coding subbands, and quantize and encode it to obtain noise level coded bits; wherein the noise level is used during decoding to control the ratio between the energy of noise filling and the energy of spectral band replication;
The bitstream multiplexer (MUX) is connected to the amplitude envelope quantization and coding unit, the frequency-domain coefficient coding unit and the noise level estimation unit, and is configured to multiplex the coded bits of each coding subband with the coded bits of the frequency-domain coefficients and send them to the decoding end.
26. The system of claim 25, characterized in that: the noise level of the audio signal in the zero-bit coding subband is the ratio of the noise component power estimated within the zero-bit coding subband to the tonal component power estimated within the zero-bit coding subband.
27. The system of claim 25, characterized in that the noise level estimation unit comprises:
A power spectrum estimation module, configured to estimate the power spectrum of the audio signal to be encoded from the MDCT frequency-domain coefficients of the audio signal to be encoded;
A noise level calculation module, connected to the power spectrum estimation module and configured to estimate the noise level of the audio signal in the zero-bit coding subbands from the power spectrum estimated by the power spectrum estimation module;
A noise level coding module, connected to the noise level calculation module and configured to quantize and encode the noise level calculated by the noise level calculation module, to obtain the noise level coded bits.
28. The system of claim 27, characterized in that the power spectrum estimation module estimates the power of frequency bin k of frame i using the following formula:
$$P_i(k) = \lambda P_{i-1}(k) + (1-\lambda)\, X_i(k)^2$$
where $P_{i-1}(k) = 0$ when $i = 0$; $P_i(k)$ denotes the estimated power of the k-th frequency bin of frame i; $X_i(k)$ denotes the MDCT coefficient of the k-th frequency bin of frame i; and $\lambda$ is the filter coefficient of a one-pole smoothing filter.
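The recursion in claim 28 can be sketched as a per-frame update. This is only an illustrative sketch: the function name `update_power_spectrum`, the default value of `lam` and the sample coefficients are assumptions, not values taken from the patent.

```python
def update_power_spectrum(prev_power, mdct_coeffs, lam=0.9):
    """One step of P_i(k) = lam * P_{i-1}(k) + (1 - lam) * X_i(k)^2.

    prev_power is the list P_{i-1}(k), or None for frame i = 0,
    in which case P_{-1}(k) is taken as 0 as the claim specifies.
    """
    if prev_power is None:
        prev_power = [0.0] * len(mdct_coeffs)
    return [lam * p + (1.0 - lam) * x * x
            for p, x in zip(prev_power, mdct_coeffs)]


# Two frames of hypothetical MDCT coefficients (illustrative values only).
p0 = update_power_spectrum(None, [1.0, 2.0, 0.0], lam=0.5)
p1 = update_power_spectrum(p0, [1.0, 0.0, 2.0], lam=0.5)
```

A larger `lam` weights the history more heavily, so the estimate reacts more slowly to a single loud frame.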
29. The system of claim 27, characterized in that:
The frequency-domain coefficients of the audio signal to be encoded are divided into one or more noise filling subbands, and the noise level computing module is specifically configured to:
compute the mean power of all frequency-domain coefficients in all or some of the zero-bit coding subbands of an effective noise filling subband, obtaining the average power P_aveg(j);
compute the mean power of all frequency-domain coefficients, in all or some of the zero-bit coding subbands of that effective noise filling subband, whose power P_i(k) is greater than the average power P_aveg(j), obtaining the tonal component average power P_signal_aveg(j) of the zero-bit coding subbands in that effective noise filling subband;
compute the mean power of all frequency-domain coefficients, in all or some of the zero-bit coding subbands of that effective noise filling subband, whose power P_i(k) is less than or equal to the average power P_aveg(j), obtaining the noise component average power P_noise_aveg(j) of the zero-bit coding subbands in that effective noise filling subband;
compute the ratio of the noise component average power P_noise_aveg(j) to the tonal component average power P_signal_aveg(j), obtaining the noise level of that effective noise filling subband;
Wherein an effective noise filling subband is a noise filling subband that contains at least one zero-bit coding subband.
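The noise level computation of claim 29 can be illustrated with a short sketch. The function name `noise_level` is hypothetical, and the handling of a perfectly flat spectrum (no coefficient above the mean) is an assumption, since the claim does not say what happens in that case.

```python
def noise_level(powers):
    """Ratio P_noise_aveg(j) / P_signal_aveg(j) for one effective
    noise filling subband j.

    powers holds the estimated power P_i(k) of every coefficient in
    the zero-bit coding subbands considered for subband j.
    """
    p_aveg = sum(powers) / len(powers)
    tonal = [p for p in powers if p > p_aveg]    # above-average bins
    noise = [p for p in powers if p <= p_aveg]   # remaining bins
    if not tonal:
        # Assumption: a flat spectrum is treated as pure noise.
        return 1.0
    p_signal_aveg = sum(tonal) / len(tonal)
    p_noise_aveg = sum(noise) / len(noise)
    return p_noise_aveg / p_signal_aveg


# One strong tonal bin among weak bins gives a small noise level.
level = noise_level([1.0, 1.0, 1.0, 9.0])
```

Because the noise bins are by construction no stronger than the mean and the tonal bins are stronger, the ratio stays within (0, 1] whenever the powers are positive.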
30. The system of claim 27, characterized in that: the noise level estimation unit further comprises a bit allocation module connected to the noise level computing module and the noise level encoding module, configured to allocate bits to all effective noise filling subbands, or to skip one or more low-frequency effective noise filling subbands and allocate bits to the subsequent higher-frequency effective noise filling subbands, and to notify the noise level computing module and the noise level encoding module; the noise level computing module computes the noise level only for noise filling subbands to which bits have been allocated; the noise level encoding module quantizes and encodes the noise level using the bits allocated by the bit allocation module.
31. An audio decoding system, comprising a bit stream demultiplexer (DeMUX), a coding subband amplitude envelope decoding unit, a bit allocation unit, a frequency-domain coefficient decoding unit, a spectrum reconstruction unit and an inverse modified discrete cosine transform (IMDCT) unit, characterized in that:
The DeMUX is configured to separate amplitude envelope coded bits, frequency-domain coefficient coded bits and noise level coded bits from the bit stream to be decoded;
The amplitude envelope decoding unit, connected to the DeMUX, is configured to decode the amplitude envelope coded bits output by the bit stream demultiplexer to obtain the amplitude envelope quantization index of each coding subband;
The bit allocation unit, connected to the amplitude envelope decoding unit, is configured to perform bit allocation to obtain the number of coded bits allocated to each frequency-domain coefficient in each coding subband;
The frequency-domain coefficient decoding unit, connected to the amplitude envelope decoding unit and the bit allocation unit, is configured to decode, inverse-quantize and denormalize the coding subbands to obtain the frequency-domain coefficients;
A noise level decoding unit, connected to the bit stream demultiplexer and the bit allocation unit, is configured to decode and inverse-quantize the noise level coded bits to obtain the noise level;
The spectrum reconstruction unit, connected to the noise level decoding unit, the frequency-domain coefficient decoding unit, the amplitude envelope decoding unit and the bit allocation unit, is configured to perform spectral band replication on the zero-bit coding subbands, to control the overall filling energy level of each such coding subband according to the amplitude envelope output by the amplitude envelope decoding unit, and to control the ratio between the energy of noise filling and the energy of spectral band replication according to the noise level output by the noise level decoding unit, obtaining the reconstructed frequency-domain coefficients of the zero-bit coding subbands;
The IMDCT unit, connected to the spectrum reconstruction unit, is configured to perform an IMDCT on the frequency-domain coefficients after spectrum reconstruction of the zero-bit coding subbands has been completed, obtaining the audio signal.
32. The system of claim 31, characterized in that:
The spectrum reconstruction unit comprises a spectral band replication subunit, an energy adjustment subunit and a noise filling subunit connected in sequence, wherein:
The spectral band replication subunit is configured to perform spectral band replication on the zero-bit coding subbands;
The energy adjustment subunit is configured to compute the amplitude envelope of the frequency-domain coefficients obtained after spectral band replication of a zero-bit coding subband, denoted sbr_rms(r), and to adjust the energy of the replicated frequency-domain coefficients according to the noise level output by the noise level decoding unit, using the formula:
$$\overline{X\_sbr}(r) = X\_sbr(r) \cdot sbr\_lev\_scale(r) \cdot rms(r) / sbr\_rms(r)$$
where $\overline{X\_sbr}(r)$ denotes the energy-adjusted frequency-domain coefficients of zero-bit coding subband r; X_sbr(r) denotes the frequency-domain coefficients obtained by replication for zero-bit coding subband r; sbr_rms(r) is the amplitude envelope of the replicated coefficients X_sbr(r); rms(r) is the amplitude envelope of the frequency-domain coefficients of zero-bit coding subband r before encoding, obtained by inverse quantization of the amplitude envelope quantization index; and sbr_lev_scale(r) is the energy control scale factor for spectral band replication in zero-bit coding subband r. Its value is determined by the noise level of the noise filling subband that contains zero-bit coding subband r, as follows:
$$sbr\_lev\_scale(r) = \left(1 - \overline{P\_noise\_rate}(j)\right) \cdot fill\_energy\_saclefactor$$
where fill_energy_saclefactor is the filling energy scale factor, used to adjust the gain of the overall filling energy, with a value range of (0, 1), and $\overline{P\_noise\_rate}(j)$ is the noise level of noise filling subband j obtained by decoding and inverse quantization, j being the index of the noise filling subband that contains zero-bit coding subband r;
The noise filling subunit is configured to perform noise filling on the energy-adjusted frequency-domain coefficients according to the noise level output by the noise level decoding unit, using the formula:
$$\overline{X}(r) = \overline{X\_sbr}(r) + rms(r) \cdot noise\_lev\_scale(r) \cdot random()$$
where $\overline{X}(r)$ denotes the reconstructed frequency-domain coefficients of zero-bit coding subband r; $\overline{X\_sbr}(r)$ denotes the energy-adjusted replicated frequency-domain coefficients of zero-bit coding subband r; rms(r) is the amplitude envelope of the frequency-domain coefficients of zero-bit coding subband r before encoding, obtained by inverse quantization of the amplitude envelope quantization index; random() is a random phase generator that produces a random phase value, returning +1 or -1; and noise_lev_scale(r) is the noise level control scale factor of zero-bit coding subband r. Its value is determined by the noise level of the noise filling subband that contains zero-bit coding subband r, as follows:
$$noise\_lev\_scale(r) = \overline{P\_noise\_rate}(j) \cdot fill\_energy\_saclefactor$$
where fill_energy_saclefactor is the filling energy scale factor, used to adjust the gain of the overall filling energy, with a value range of (0, 1), and $\overline{P\_noise\_rate}(j)$ is the noise level of noise filling subband j obtained by decoding and inverse quantization, j being the index of the noise filling subband that contains zero-bit coding subband r.
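The two steps of claim 32 (energy adjustment of the replicated coefficients, then noise filling) can be sketched together. The scale factors follow the formulas as they appear in the claim text. Several things here are assumptions for illustration only: the function name `reconstruct_zero_bit_subband`, computing sbr_rms(r) as the root mean square of the replicated coefficients, the guard for an all-zero replicated band, and the default value of `fill_energy_saclefactor`.

```python
import math
import random


def reconstruct_zero_bit_subband(x_sbr, rms, noise_rate,
                                 fill_energy_saclefactor=0.5, rng=random):
    """Mix replicated coefficients with +/-1 noise for one subband r.

    x_sbr      : replicated coefficients X_sbr(r)
    rms        : decoded amplitude envelope rms(r)
    noise_rate : decoded noise level of the enclosing noise filling subband
    """
    # Assumption: the amplitude envelope is the RMS of the coefficients.
    sbr_rms = math.sqrt(sum(x * x for x in x_sbr) / len(x_sbr))
    sbr_lev_scale = (1.0 - noise_rate) * fill_energy_saclefactor
    noise_lev_scale = noise_rate * fill_energy_saclefactor
    # Assumption: an all-zero replicated band contributes nothing.
    gain = sbr_lev_scale * rms / sbr_rms if sbr_rms > 0.0 else 0.0
    out = []
    for x in x_sbr:
        sign = rng.choice((1.0, -1.0))   # random() returning +1 or -1
        out.append(x * gain + rms * noise_lev_scale * sign)
    return out


# noise_rate = 0 keeps only the replicated band; 1 keeps only noise.
tonal_only = reconstruct_zero_bit_subband([2.0, -2.0], 4.0, 0.0)
noise_only = reconstruct_zero_bit_subband([2.0, -2.0], 4.0, 1.0)
```

The two scale factors sum to `fill_energy_saclefactor`, so the noise level only shifts the balance between replicated content and noise while the overall gain stays tied to rms(r).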
33. The system of claim 31, characterized in that: the spectral band replication subunit comprises a tone position search module, a period and source band computing module, a source band copy start index computing module and a spectral band replication module connected in sequence, wherein:
The tone position search module is configured to search the MDCT frequency-domain coefficients for the position of a tone of the audio signal;
The period and source band computing module is configured to determine, from the tone position, the spectral band replication period and the source band used for replication; the spectral band replication period is the bandwidth from frequency bin 0 to the frequency bin of the tone position, and the source band is the band from the frequency bin obtained by shifting frequency bin 0 upward by the spectral band replication offset copyband_offset to the frequency bin obtained by shifting the tone position upward by copyband_offset;
The source band copy start index computing module is configured to compute the source band copy start index for a zero-bit coding subband from the source band and the start index of the zero-bit coding subband that requires spectral band replication;
The spectral band replication module is configured to copy the frequency-domain coefficients of the source band periodically into the zero-bit coding subband, with the spectral band replication period as the period and starting from the source band copy start index; if the highest frequency of the frequency bins inside a zero-bit coding subband is lower than the frequency of the tone found by the search, those frequency bins are reconstructed using noise filling only.
34. The system of claim 33, characterized in that: the tone position search module searches for the tone position as follows: it takes the absolute value or the square of the MDCT frequency-domain coefficients of a first band and applies smoothing filtering; based on the result of the smoothing filtering, it searches for the position of the maximum extremum of the filter output within the first band, and that position is the position of the tone.
35. The system of claim 33, characterized in that:
The tone position search module applies smoothing filtering to the absolute value of the MDCT frequency-domain coefficients of the first band using the formula:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k) + (1-\mu)\, \left|\overline{X}_i(k)\right|$$
or applies smoothing filtering to the square of the frequency-domain coefficients of the first band using the formula:
$$X\_amp_i(k) = \mu\, X\_amp_{i-1}(k) + (1-\mu)\, \overline{X}_i(k)^2$$
where $\mu$ is the smoothing filter coefficient, $X\_amp_i(k)$ denotes the filter output for the k-th frequency bin of frame i, $\overline{X}_i(k)$ is the decoded MDCT coefficient of the k-th frequency bin of frame i, and $X\_amp_{i-1}(k) = 0$ when $i = 0$.
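The smoothing and maximum search of claims 34 and 35 can be sketched as follows. The function names, the default `mu` and the sample coefficients are hypothetical; the search shown is the simple variant that takes the largest filter output in the first band directly.

```python
def smooth_amplitude(prev_amp, coeffs, mu=0.5, use_square=False):
    """X_amp_i(k) = mu * X_amp_{i-1}(k) + (1 - mu) * |X_i(k)| (or X_i(k)^2).

    prev_amp is None for frame i = 0, meaning X_amp_{-1}(k) = 0.
    """
    if prev_amp is None:
        prev_amp = [0.0] * len(coeffs)
    mag = [(x * x if use_square else abs(x)) for x in coeffs]
    return [mu * a + (1.0 - mu) * m for a, m in zip(prev_amp, mag)]


def find_tone_position(amp, band_start, band_end):
    """Index of the largest filter output in [band_start, band_end]."""
    return max(range(band_start, band_end + 1), key=lambda k: amp[k])


# Hypothetical decoded MDCT coefficients for the first frame.
amp = smooth_amplitude(None, [0.0, -4.0, 1.0, 8.0], mu=0.5)
tone = find_tone_position(amp, 0, 2)   # first band = bins 0..2
```

Restricting the search to a low-frequency first band means the strong bin at index 3 is ignored here, so the tone is located at index 1.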
36. The system of claim 33, characterized in that the first band is a low-frequency band in which energy is relatively concentrated, determined from the statistical properties of the spectrum, where low frequency refers to spectral components below one half of the total signal bandwidth.
37. The system of claim 33, characterized in that: the tone position search module searches directly for the initial maximum among the filter outputs of the frequency-domain coefficients of the first band, and takes this maximum as the maximum extremum of the first-band filter output.
38. The system of claim 33, characterized in that: when determining the maximum extremum of the filter output, the tone position search module takes one section of the first band as a second band, first searches for the initial maximum among the filter outputs of the frequency-domain coefficients of the second band, and then proceeds according to the position of the frequency-domain coefficient corresponding to this initial maximum:
a. If the initial maximum is the filter output of the frequency-domain coefficient at the lowest frequency of the second band, that filter output is compared with the filter output of the preceding, lower-frequency coefficient in the first band, and the comparison continues toward lower frequencies; when the filter output of the current coefficient is greater than that of the preceding coefficient, the filter output of the current coefficient is the final maximum extremum; or, if the comparison reaches the lowest frequency of the first band and the filter output of that coefficient is greater than that of the following coefficient, the filter output of the coefficient at the lowest frequency of the first band is the final maximum extremum;
b. If the initial maximum is the filter output of the frequency-domain coefficient at the highest frequency of the second band, that filter output is compared with the filter output of the following, higher-frequency coefficient in the first band, and the comparison continues toward higher frequencies; when the filter output of the current coefficient is greater than that of the following coefficient, the filter output of the current coefficient is the final maximum extremum; or, if the comparison reaches the highest frequency of the first band and the filter output of that coefficient is greater than that of the preceding coefficient, the filter output of the coefficient at the highest frequency of the first band is the final maximum extremum;
c. If the initial maximum is the filter output of a frequency-domain coefficient between the lowest and the highest frequency of the second band, the frequency-domain coefficient corresponding to this initial maximum is the position of the tone; that is, the initial maximum is the final maximum extremum.
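Cases a, b and c of claim 38 amount to a hill climb that starts at the edge of the second band and stops at the first local peak or at the edge of the first band. A minimal sketch, assuming inclusive index ranges and that ties keep climbing (the claim does not specify tie handling); `find_tone_extremum` is a hypothetical name.

```python
def find_tone_extremum(amp, first, second):
    """Return the index of the final maximum extremum.

    amp    : smoothed filter outputs, indexed by frequency bin
    first  : (low, high) inclusive bounds of the first band
    second : (low, high) inclusive bounds of the second band,
             which lies inside the first band
    """
    f_lo, f_hi = first
    s_lo, s_hi = second
    k = max(range(s_lo, s_hi + 1), key=lambda n: amp[n])
    if k == s_lo:
        # Case a: climb toward lower frequencies.
        while k > f_lo and amp[k] <= amp[k - 1]:
            k -= 1
    elif k == s_hi:
        # Case b: climb toward higher frequencies.
        while k < f_hi and amp[k] <= amp[k + 1]:
            k += 1
    # Case c: an interior maximum is already the final extremum.
    return k


case_a = find_tone_extremum([1, 5, 3, 2, 2.5, 1], (0, 5), (2, 4))
case_b = find_tone_extremum([1, 1, 2, 3, 4, 6], (0, 5), (1, 4))
case_c = find_tone_extremum([1, 2, 3, 9, 3, 1], (0, 5), (2, 4))
```

The climb matters because a maximum sitting on the boundary of the second band may only be the shoulder of a peak whose top lies just outside it.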
39. The system of claim 33, characterized in that:
The source band copy start index computing module computes the source band copy start index of a zero-bit coding subband that requires spectral band replication as follows: the index of the starting frequency bin of the current zero-bit coding subband whose frequency-domain coefficients need to be reconstructed is obtained and denoted fillband_start_freq; the index of the frequency bin corresponding to the tone is denoted Tonal_pos; adding 1 to Tonal_pos gives the replication period copy_period; the start index of the source band is denoted copyband_offset; copy_period is repeatedly subtracted from the value of fillband_start_freq until the result falls within the index range of the source band; this result is the source band copy start index, denoted copy_pos_mod.
40. The system of claim 33, characterized in that: when performing spectral band replication, the spectral band replication module copies the frequency-domain coefficients, starting from the source band copy start index, one after another into the zero-bit coding subband whose start position is fillband_start_freq; once the frequency bin being copied from the source band reaches bin Tonal_pos+copyband_offset, copying continues into the zero-bit coding subband from the frequency-domain coefficient at bin copyband_offset, and so on, until all frequency-domain coefficients of the current zero-bit coding subband have been replicated.
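The index arithmetic of claims 39 and 40 can be sketched with the identifiers used in the claims (fillband_start_freq, Tonal_pos, copyband_offset, copy_period, copy_pos_mod). The function names and the sample spectrum are hypothetical, and the sketch assumes the zero-bit coding subband starts above the source band.

```python
def source_copy_start(fillband_start_freq, tonal_pos, copyband_offset):
    """Compute copy_pos_mod by repeatedly subtracting copy_period."""
    copy_period = tonal_pos + 1
    src_end = tonal_pos + copyband_offset     # last bin of the source band
    copy_pos_mod = fillband_start_freq
    # Assumption: fillband_start_freq lies at or above copyband_offset.
    while copy_pos_mod > src_end:
        copy_pos_mod -= copy_period
    return copy_pos_mod


def replicate_band(coeffs, fillband_start_freq, fillband_len,
                   tonal_pos, copyband_offset):
    """Copy source-band coefficients cyclically into one zero-bit subband."""
    src_end = tonal_pos + copyband_offset
    src = source_copy_start(fillband_start_freq, tonal_pos, copyband_offset)
    out = []
    for _ in range(fillband_len):
        out.append(coeffs[src])
        # Wrap to the start of the source band after its last bin.
        src = copyband_offset if src == src_end else src + 1
    return out


# Source band = bins 2..5 (Tonal_pos = 3, copyband_offset = 2).
spectrum = list(range(20))            # coefficient value equals its index
start = source_copy_start(12, 3, 2)
filled = replicate_band(spectrum, 12, 6, 3, 2)
```

Starting at copy_pos_mod rather than at the beginning of the source band keeps the copied pattern aligned with the replication period, so adjacent zero-bit subbands continue one periodic sequence instead of restarting it.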
41. The system of claim 31, characterized in that:
The bit allocation unit is further configured to allocate bits to all effective noise filling subbands, or to skip one or more low-frequency effective noise filling subbands and allocate bits to the subsequent higher-frequency effective noise filling subbands; the energy adjustment subunit performs energy adjustment on the frequency-domain coefficients obtained after spectral band replication; the noise filling subunit performs noise filling on the energy-adjusted frequency-domain coefficients and on the zero-bit coding subbands in noise filling subbands to which no bits were allocated.
CN2010191850619A 2010-03-02 2010-03-02 Audio encoding and decoding method, system and noise level estimation method Active CN102194457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010191850619A CN102194457B (en) 2010-03-02 2010-03-02 Audio encoding and decoding method, system and noise level estimation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010191850619A CN102194457B (en) 2010-03-02 2010-03-02 Audio encoding and decoding method, system and noise level estimation method

Publications (2)

Publication Number Publication Date
CN102194457A true CN102194457A (en) 2011-09-21
CN102194457B CN102194457B (en) 2013-02-27

Family

ID=44602411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010191850619A Active CN102194457B (en) 2010-03-02 2010-03-02 Audio encoding and decoding method, system and noise level estimation method

Country Status (1)

Country Link
CN (1) CN102194457B (en)



Also Published As

Publication number Publication date
CN102194457B (en) 2013-02-27

Similar Documents

Publication Publication Date Title
CN102194457B (en) Audio encoding and decoding method, system and noise level estimation method
KR101809592B1 (en) Audio encoder, audio decoder and related methods using two-channel processing within an intelligent gap filling framework
CN101878504B (en) Low-complexity spectral analysis/synthesis using selectable time resolution
CN101276587B (en) Audio encoding apparatus and method thereof, audio decoding device and method thereof
JP4950210B2 (en) Audio compression
CN103366755B (en) Method and apparatus for encoding and decoding an audio signal
KR101238239B1 (en) An encoder
US10311884B2 (en) Advanced quantizer
CN102089808A (en) Audio encoder, audio decoder, methods for encoding and decoding an audio signal, audio stream and computer program
CN103366749B (en) Audio codec device and method thereof
JP2013015598A (en) Audio coding/decoding method, system and noise level estimation method
CN101662288A (en) Method, device and system for encoding and decoding audio
CN102194458B (en) Spectral band replication method and device and audio decoding method and system
EP2814028B1 (en) Audio and speech coding device, audio and speech decoding device, method for coding audio and speech, and method for decoding audio and speech
CN103366750A (en) Sound coding and decoding apparatus and sound coding and decoding method
EP2227682A1 (en) An encoder
CN103366751B (en) Audio codec device and method thereof
KR20080034817A (en) Apparatus and method for encoding and decoding signal
KR20160098597A (en) Apparatus and method for codec signal in a communication system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant