Background Art
The full name of the "DRA (Digital Rise Audio) audio standard" is the "Multi-channel digital audio coding and decoding technology standard". It was formally promulgated by the Standardization Administration of China as a Chinese national standard (standard No. GB/T 22726-2008) on December 22, 2008. The organizations that drafted the standard include the applicant of the present application, Digital Wave (Beijing) Co., Ltd., among others. Applications of the DRA audio standard include, but are not limited to, digital television, digital audio broadcasting, digital cinema, VCD players, network streaming media, IPTV, and mobile multimedia (for example, China Mobile Multimedia Broadcasting (CMMB)).
The DRA audio coding framework 100 is shown in Fig. 1. One path of the input PCM samples is fed to the transient detection module 120, which determines whether the current frame is a steady-state signal or a transient signal. The result is passed to the variable-resolution filter bank module 122: if the frame is a steady-state signal, module 122 performs time-frequency analysis with a long block length to improve coding efficiency; otherwise it performs time-frequency analysis with several short block lengths to control pre-echo. To improve the coding efficiency of transient signals, the cross-recombination (interleaving) module interleaves the frequency-domain data output by the variable-resolution filter bank module 122; if the frame is a steady-state signal, the frequency-domain data passes through the cross-recombination module unchanged. To remove redundancy between channels, the optional sum/difference coding module exploits the correlation between the two channels of a channel pair, which reduces the bit rate and improves coding efficiency. The optional joint intensity coding module applies joint intensity coding to the signal. For high-frequency signals (for example, above 2 kHz), the perceived direction of a sound depends on the variation of the signal intensity (the signal envelope) and not on the signal waveform; that is, a constant-envelope signal has no effect on perceived direction. This property, together with the correlation between channels, can therefore be used to merge several channels into one common channel for coding, which improves multichannel coding efficiency.
The other path of the input PCM samples undergoes psychoacoustic analysis in the human auditory model module 140. Module 140 computes the masking curve of the current frame from the characteristics of human hearing and, from that curve, the masking threshold and perceptual entropy of specific time-frequency regions, which guide the quantization of the current frame. The global bit allocation module 142 allocates bit resources to each quantization unit so that the quantization noise power stays below the masking threshold (in DRA, each quantization unit is the rectangular region jointly defined by one transient segment and one critical band; see clause 3.3.16 in Chapter 3 of the DRA audio standard). The linear scalar quantization module 130 uses the quantization step sizes supplied by the global bit allocation module 142 to quantize the subband samples in each quantization unit (yielding the quantization indices), and the differences between the indices of consecutive quantization step sizes are Huffman coded. The quantization index coding module 132 uses the appropriate Huffman codebooks, and their ranges of application, selected by the codebook selection module 134 to Huffman-code all quantization indices. Finally, the multiplexing module 150 packs the Huffman codes of all quantization indices, together with the side information, into a complete DRA bitstream.
When an audio codec such as DRA is used for digital audio coding, the encoder spends fewer bits on a segment of data whenever the signal being coded contains a speech pause or a spectral hole (shown shaded in Fig. 2; a spectral hole corresponds to all quantized coefficients in a quantization unit being zero). As a result, at the decoder the noise that was transmitted along with the speech also disappears when the speech pauses, which makes the background noise discontinuous. When the noise level is high, this discontinuity seriously degrades speech quality.
For this reason, several noise filling methods for audio decoding have been proposed in the prior art. For example, AMR-WB+ (see 3GPP TS 26.290) uses non-blind noise filling: at the transmitter, the algorithm analyzes the audio information of the current frame and the 7 preceding frames, computes the energy and spectral parameters for the silence descriptor frame of the current background noise, encodes this information, and transmits it to the decoder. The decoder generates comfort noise from the received energy and spectral parameters.
As another example, Dolby AC-3 also includes a non-blind noise filling module. The encoder transmits the envelope of the noise signal using dedicated parameters; when the decoder finds a mantissa whose bit allocation is zero, it multiplies a generated random number by the envelope to obtain the spectral coefficient, thereby filling in comfort noise.
However, each of these two algorithms has its own drawbacks:
(1) The comfort noise filling algorithm in AMR-WB+ addresses the degradation in speech quality perceived by the user at the receiver. However, the algorithm must search at the encoder for frames whose spectra differ greatly and compute the mean of the immittance spectral frequencies (ISF), and the receiver must also compute the ISF mean, which greatly increases the computational load of the overall system.
(2) The noise filling algorithm in Dolby AC-3 must transmit a spectral envelope index for each spectral coefficient, and the encoder must be modified; this approach has a considerable impact on the bit rate.
This document aims to provide a blind noise filling method and device for audio decoding that solve the above and other problems. In particular, it also aims to provide a blind noise filling method and device dedicated to the DRA algorithm that likewise solve the above and other problems.
Detailed Description of the Embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. In the following description, unnecessary details and functions or structures that are already prior art are not described in detail, because they would obscure the description of the invention. In addition, identical reference numerals in the specification and the drawings denote the same device or the same step.
Fig. 2 shows a series of quantization units in which several spectral holes, shown shaded, may occur; the remaining units are non-hole units labeled with their quantization step size indices. Those skilled in the art will understand that the positions of the spectral holes and the quantization step size index values of the non-hole units are merely illustrative. The blind noise filling method and device according to embodiments of the invention, described below, can be used to fill spectral holes such as those in Fig. 2.
Referring to Fig. 3, a blind noise filling method 10 according to an embodiment of the invention is described in detail. When the decoder detects a spectral hole in some quantization unit (step 11; an exemplary detection method is described below), step 12 first analyzes the tonal characteristics of the current audio frame in the frequency domain and determines the corresponding decay factor; an exemplary way of determining it is set out below. Meanwhile, step 13 predicts the energy of the zero quantization unit in the current frame, for example by analyzing over time the corresponding quantization unit in several frames preceding the current frame and the non-hole quantization units on either side of the zero quantization unit; an exemplary prediction method is set out below. Finally, step 14 synthesizes the comfort noise signal for the detected zero quantization unit; an exemplary synthesis method is set out below.
It should be noted that the order of steps 12 and 13 is not fixed: step 12 may precede step 13, step 13 may precede step 12, or both may be performed simultaneously, without departing from the scope of the invention. Similarly, the order in which steps 11 and 12 are performed does not affect the implementation of the invention. Steps 11 to 14 are described in detail below by way of example with reference to Fig. 3.
First, step 11 determines whether a spectral hole is present. Because a zero region in the frequency domain usually comprises one or more quantization units, the index of the corresponding quantization unit must be recorded when the first zero quantization unit is detected.
The steady-state case is used below to illustrate how the index of the first zero quantization unit is determined. According to the DRA standard, each frame of a steady-state signal has only one transient segment, so quantization subbands and quantization units coincide. Starting from the $Q_t$-th quantization unit ($Q_t$ is a preset value, preferably an integer between 5 and 20, and may be called the "cutoff quantization band index"), the sum of the absolute values of the spectral coefficients $X(k)$ in each quantization unit is computed. If a quantization unit contains N spectral lines (k runs from 0 to N-1), the sum can be expressed as:
$$S_q = \sum_{k=0}^{N-1} |X(k)| \qquad (1)$$
If, for some index $Q_{T1}$, $S_q$ does not exceed a predetermined threshold (preferably 0), the current quantization unit is determined to be the first zero quantization unit, and its index $Q_{T1}$ is recorded.
Because $Q_{T1}$ is defined as the index of the first zero quantization unit, noise is added starting from $Q_{T1}$; quantization units with indices below $Q_{T1}$ are not noise-filled. If no $Q_{T1}$ is found, this indicates that the current frame is a silent frame, and no noise filling is performed on the frame.
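The steady-state detection described for step 11 can be sketched in Python as follows. This is a minimal illustration, not the decoder's actual code: the function name `first_zero_unit`, the parameter `lines_per_unit`, and the simplification that every quantization unit has the same width are assumptions made for the sketch, and the comparison `<=` reflects the reading that a threshold of 0 should flag units whose coefficients are all zero.

```python
from typing import Optional, Sequence


def first_zero_unit(coeffs: Sequence[float], lines_per_unit: int,
                    q_t: int, threshold: float = 0.0) -> Optional[int]:
    """Return the index of the first zero quantization unit at or above q_t.

    coeffs holds the decoded spectral coefficients X(k) of one steady-state
    frame, split into consecutive units of lines_per_unit lines each
    (an assumption of this sketch; real DRA units differ in width).
    """
    num_units = len(coeffs) // lines_per_unit
    for q in range(q_t, num_units):
        start = q * lines_per_unit
        # S_q: sum of absolute values of the spectral lines in unit q.
        s_q = sum(abs(x) for x in coeffs[start:start + lines_per_unit])
        if s_q <= threshold:
            return q
    return None  # no zero unit found at or above q_t


# Four hypothetical units of two lines each; unit 2 is a spectral hole.
frame = [3.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 2.0]
print(first_zero_unit(frame, lines_per_unit=2, q_t=1))
```

Returning `None` lets the caller skip noise filling for the whole frame when no zero quantization unit is found.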
For a transient signal, step 11 is more involved: the calculation of formula (1) must be performed for every quantization unit of each quantization subband. Because each quantization subband (that is, critical band) contains more than one quantization unit in the transient case (see Tables 33 and 34 of the DRA standard), formula (1) and the subsequent comparison must be evaluated several times per quantization subband. Those skilled in the art will appreciate, however, that the way the index of the first zero quantization unit is determined is otherwise unchanged for transient signals.
After the zero quantization unit detection of step 11, processing proceeds to step 12, where the decay factor is calculated. Because the spectral content of an audio signal changes over time, a fixed gain (decay factor) cannot keep the filled noise within an acceptable range. In particular, when the audio signal is rich in tonal content, an inappropriately chosen gain (decay factor) is likely to damage the sound quality of the original signal.
Figs. 4A to 4C show the effect of the decay factor on the spectrum. Fig. 4A shows the spectrum of the original audio signal; Fig. 4B shows the zero quantization unit regions filled with random noise using a decay factor of 0.25; and Fig. 4C shows them filled with random noise using a decay factor of 0.03. If the decay factor chosen for noise filling is too large, the generated noise may severely damage the tonality of the original signal (see, for example, the middle of Fig. 4B); reducing the decay factor may correspondingly preserve tonality better (see, for example, the middle of Fig. 4C). A suitable gain must therefore be selected according to the tonal characteristics of the audio signal before noise filling is performed.
According to an embodiment of the invention, a method of selecting a suitable gain from the tonal characteristics of the audio signal is further proposed (step 12 in Fig. 3). First, in the frequency domain, the ratio of the arithmetic mean to the geometric mean of the energy spectra of all quantization units in a frame (for a steady-state signal, the quantization units are the quantization subbands) is computed, which gives the flatness of each audio frame. A suitable gain (decay factor) is then selected for each audio frame according to this flatness.
Again taking the steady-state signal as an example, the spectrum of the current frame is divided into M quantization units (which are quantization subbands in this case), each containing N spectral lines. An exemplary calculation procedure is as follows:
12.1. Calculate the energy spectrum of all MDCT coefficients:
$$P(k) = X^2(k) \qquad (2)$$
where $k = 0, 1, 2, \ldots, MN-1$.
12.2. Calculate the average energy spectrum of the b-th quantization unit (a quantization subband in the steady-state case):
where b is the index of the quantization unit within the frame and runs from 0 to M-1.
12.3. Determine the flatness F of the frame from the ratio of the arithmetic mean to the geometric mean of the average energy spectra of all quantization units in the frame:
12.4. Determine the value of the gain (decay factor) $g_p$ of the frame from the flatness F:
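Steps 12.1 to 12.3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the formulas of the original: the function names are hypothetical, every quantization unit is assumed to hold the same number of lines, the flatness is taken as geometric mean over arithmetic mean (the conventional spectral flatness measure, which lies between 0 and 1; its reciprocal carries the same information), and a small floor `eps` is added so that the logarithm stays defined for zero quantization units. The mapping from flatness to the gain $g_p$ in step 12.4 is not modeled.

```python
import math
from typing import List, Sequence


def unit_average_energies(coeffs: Sequence[float],
                          lines_per_unit: int) -> List[float]:
    """Steps 12.1-12.2: P(k) = X(k)^2, averaged over each unit's lines."""
    energies = [x * x for x in coeffs]
    num_units = len(coeffs) // lines_per_unit
    return [
        sum(energies[b * lines_per_unit:(b + 1) * lines_per_unit]) / lines_per_unit
        for b in range(num_units)
    ]


def flatness(unit_energies: Sequence[float], eps: float = 1e-12) -> float:
    """Step 12.3 (assumed direction): geometric mean / arithmetic mean."""
    # Floor each energy so that log() is defined for all-zero units.
    floored = [max(e, eps) for e in unit_energies]
    arithmetic = sum(floored) / len(floored)
    geometric = math.exp(sum(math.log(e) for e in floored) / len(floored))
    return geometric / arithmetic


flat_frame = [1.0, 1.0, 1.0, 1.0]    # noise-like: equal energy in every unit
tonal_frame = [4.0, 4.0, 0.0, 0.0]   # tonal: energy concentrated in one unit
print(flatness(unit_average_energies(flat_frame, 2)))
print(flatness(unit_average_energies(tonal_frame, 2)))
```

With this convention a noise-like frame gives a flatness near 1 and a strongly tonal frame gives a value near 0, which is the property a gain selection rule can key on.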
In step 13, the energy of the zero quantization unit (for a steady-state signal, a quantization unit is a quantization subband) can additionally be predicted in the time domain, so that the decoder keeps the filled comfort noise within an acceptable range more effectively. For example, in a method according to an embodiment of the invention, the energy of the current zero quantization unit is predicted by analyzing the energy of the corresponding quantization unit in the two preceding audio frames and the energy of the non-zero quantization units on either side of the current zero quantization unit. Exemplary steps are as follows:
13.1. Decode the data of frame n. If frequency-domain detection finds that this audio frame contains a zero quantization unit, record the index $M_Z$ of the detected zero quantization unit.
13.2. In frames (n-2) and (n-1), locate the quantization unit with index $M_Z$ and compute its average energy in each frame, denoted $E_{(n-2)}$ and $E_{(n-1)}$.
13.3. From the average energies $E_{(n-2)}$ and $E_{(n-1)}$ computed in step 13.2, linearly predict the energy of the zero quantization unit of the current frame, which yields the predicted energy. When linear interpolation is used for the prediction, the interpolation coefficients applied to the two average energies $E_{(n-2)}$ and $E_{(n-1)}$ are both preferably 0.5.
13.4. Compute the average energies of the non-zero quantization units on either side of the zero quantization unit in the current frame (that is, frame n), then interpolate between them to obtain the interpolated energy $E'_{(n)}$. The interpolation coefficients applied to the average energies of the two neighbouring non-zero quantization units are both preferably 0.5.
13.5. If the predicted energy obtained in step 13.3 is greater than the interpolated energy $E'_{(n)}$ obtained in step 13.4, replace the predicted energy with the interpolated energy $E'_{(n)}$; otherwise keep the predicted energy from step 13.3.
Those skilled in the art will understand that the order and manner of the prediction in step 13.3, the interpolation coefficients in step 13.4, and so on may all vary without departing from the scope of the invention.
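Steps 13.3 to 13.5 reduce to two weighted averages and a minimum, which the following Python sketch illustrates. The function and parameter names are hypothetical, and the sketch assumes that the two coefficients of each interpolation sum to 1 (which holds for the preferred value of 0.5).

```python
def predict_zero_unit_energy(e_prev2: float, e_prev1: float,
                             e_below: float, e_above: float,
                             w_time: float = 0.5,
                             w_freq: float = 0.5) -> float:
    """Energy estimate for a zero quantization unit of frame n.

    e_prev2, e_prev1: average energies of the same unit in frames n-2, n-1.
    e_below, e_above: average energies of the neighbouring non-zero units
    on either side of the zero unit in frame n.
    """
    # Step 13.3: time-domain prediction from the two preceding frames.
    predicted = w_time * e_prev2 + (1.0 - w_time) * e_prev1
    # Step 13.4: interpolation between the neighbouring non-zero units.
    interpolated = w_freq * e_below + (1.0 - w_freq) * e_above
    # Step 13.5: a prediction larger than the interpolated energy is
    # replaced by the interpolated energy.
    return min(predicted, interpolated)


# Hypothetical energies: the prediction (3.0) exceeds the interpolation (2.0).
print(predict_zero_unit_energy(4.0, 2.0, 1.0, 3.0))
```

Taking the minimum means the filled noise is never given more energy than the spectral neighbourhood of the current frame suggests, even if the unit was loud in earlier frames.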
After the decay factor calculation and the zero quantization unit energy prediction, processing proceeds to step 14, where comfort noise filling is performed using the parameters obtained in steps 11 to 13. Specifically:
14.1. Before filling noise, generate a set of uniformly distributed random numbers r(k) with values between -1 and 1; the number of random numbers equals the total number of spectral lines in a frame.
14.2. According to the spectral line numbers covered by each zero quantization unit, allocate within the random number set r(k) (where $k = 0, 1, 2, \ldots, MN-1$, with M and N as defined above) the data interval $R_z$ corresponding to each zero quantization unit. For example, if a zero quantization unit covers sub-band samples (spectral line numbers) x to y, then $R_z$ consists of the x-th to y-th random numbers in the array r(k).
14.3. Using the decay factor obtained in step 12, calculate the average energy $E_z$ of the generated random numbers in each interval $R_z$:
14.4. By formula
it follows that, once a zero quantization unit has been detected and the corresponding noise filling performed, the energy of that quantization unit is given by:
Therefore, by combining $E_z$ obtained in step 14.3 with the zero quantization unit energy obtained in step 13, the formula above can be used to back-calculate the quantization step size of the zero quantization unit for the predicted energy.
14.5. Use the quantization step size of the zero quantization unit obtained in step 14.4 to noise-fill the zero quantization unit:
Formula (9) gives the content to be filled at spectral line position k.
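The overall shape of steps 14.1 to 14.5 can be sketched in Python as below. This is only one plausible reading, offered as an assumption: the sketch draws uniform random numbers for the whole frame, takes the interval belonging to the zero quantization unit, and scales it so that the average energy of the filled lines equals the squared decay factor times the predicted energy. The function name `fill_zero_unit`, its parameters, the half-open range `start:stop`, and this particular scaling rule are all hypothetical and are not taken from the formulas of the original.

```python
import math
import random
from typing import List, Optional


def fill_zero_unit(coeffs: List[float], start: int, stop: int,
                   target_energy: float, gain: float,
                   rng: Optional[random.Random] = None) -> None:
    """Overwrite coeffs[start:stop] with scaled uniform noise, in place."""
    if rng is None:
        rng = random.Random()
    # Step 14.1: one uniform random number in [-1, 1] per spectral line.
    r = [rng.uniform(-1.0, 1.0) for _ in range(len(coeffs))]
    # Step 14.2: the interval R_z that belongs to this zero unit.
    r_z = r[start:stop]
    # Step 14.3 (assumed form): average energy of the random interval.
    e_z = sum(x * x for x in r_z) / len(r_z)
    if e_z == 0.0 or target_energy <= 0.0:
        return  # nothing sensible to scale; leave the hole untouched
    # Steps 14.4-14.5 (assumed scaling): choose the step size so that the
    # filled lines have average energy gain**2 * target_energy.
    step = gain * math.sqrt(target_energy / e_z)
    for offset, x in enumerate(r_z):
        coeffs[start + offset] = step * x


spectrum = [1.0, 2.0, 0.0, 0.0, 0.0, 0.0]
fill_zero_unit(spectrum, 2, 6, target_energy=4.0, gain=0.5,
               rng=random.Random(0))
print(spectrum[:2], sum(x * x for x in spectrum[2:]) / 4)
```

Passing a seeded `random.Random` makes the fill reproducible for testing; a decoder would normally use a free-running generator so that the noise differs from frame to frame.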
As an example, Fig. 5 shows in detail a blind noise filling flow 20 for the DRA algorithm. Specifically:
20.1. Processing steps 21 to 26 perform the inverse operations of the main processing modules of the DRA coding framework shown in Fig. 1; the output of step 26 is the spectral coefficients, frame by frame (for example, 1024 per frame).
20.2. Determine the gain (decay factor) $g_p$ of the current frame as described above for step 12.
20.3. Determine, as in step 11 above, whether the current frame contains a zero quantization unit (in the steady-state case, a zero subband). If step 11 finds none, apply IMDCT processing 27 directly to the spectral coefficients of the current frame and output the required PCM data; otherwise proceed to step 13, described next.
20.4. Step 13 is the zero quantization unit energy prediction step described above. Preferably, it comprises an energy calculation step 13A for the two preceding frames (taking the current frame to be frame n) and an energy prediction step 13B for the current zero quantization unit; the prediction proceeds as described in steps 13.1 to 13.5 above.
20.5. After step 13, noise filling is performed in step 14 using the current-frame decay factor calculated in step 12, as described in steps 14.1 to 14.5 above.
20.6. The audio data noise-filled in step 14 then undergoes IMDCT processing 27, and the required PCM data is output.
Those skilled in the art will understand that, although step 12 is placed before step 11 in this embodiment, this does not affect the implementation of the invention. In other words, steps 11 to 14 may be ordered in at least the following ways without departing from the spirit and scope of the invention: (1) step 11 -> step 12 -> step 13 -> step 14; (2) step 11 -> step 13 -> step 12 -> step 14; (3) step 11 -> (step 12 and step 13 simultaneously) -> step 14; (4) step 12 -> step 11 -> step 13 -> step 14.
A blind noise filling device 30 according to another embodiment of the invention is described in detail below with reference to Fig. 6. It is preferably used in an audio decoding system, and more preferably in a DRA audio decoder. The blind noise filling device 30 comprises:
● a zero quantization unit detection module 31, which receives the decoded frequency-domain data and detects, by the method described for step 11 above, whether the current frame contains a zero quantization unit;
● a decay factor calculation module 32, which receives the frequency-domain data and calculates the decay factor of the current frame according to step 12 above;
● a zero quantization unit energy prediction module 33, which receives the detection result of the zero quantization unit detection module 31 (not shown) and the frequency-domain data, and calculates the predicted energy of the current zero quantization unit according to step 13 above; and
● a comfort noise filling module 34, which performs comfort noise filling on the frequency-domain data according to step 14 above, based on the results of the zero quantization unit detection module 31, the decay factor calculation module 32, and the zero quantization unit energy prediction module 33.
Those skilled in the art will appreciate that, within the spirit and scope of the invention, arranging modules 31 to 34 in any of the following orders does not affect the implementation of the invention: (1) module 31 -> module 32 -> module 33 -> module 34; (2) module 31 -> module 33 -> module 32 -> module 34; (3) module 31 -> (module 32 and module 33 simultaneously) -> module 34; (4) module 32 -> module 31 -> module 33 -> module 34.
Simulation Results
Because one of the most important measures of an audio coder's quality is the subjective sound quality of the coded audio signal, the following simulation results are provided for reference.
Table 1 gives the results of subjective tests of a DRA decoder using the blind comfort noise filling method described here, on bitstreams coded at 48 kbps and 128 kbps. The test material is the 12 audio excerpts used by MPEG, all two-channel and sampled at 48 kHz. The three test grades are denoted -1, 0, and 1: -1 means the quality is worse after noise filling, 0 means there is no difference before and after noise filling, and 1 means the quality is clearly better after noise filling.
Table 1. Subjective test comparison before and after blind noise filling
Table 1 shows that at a bit rate of 128 kbps the effect of noise filling is not pronounced. At 48 kbps, the test results show that noise filling clearly improves sound quality for speech bitstreams and also gives satisfactory results for music bitstreams. The method (device) of the invention can therefore improve subjective sound quality at low bit rates, particularly for speech signals.
Although the invention has been described in connection with what are presently considered the most practical and preferred embodiments, those skilled in the art will understand that the invention is not limited to the disclosed embodiments; on the contrary, it is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the claims. Those skilled in the art will also understand that various variations and/or improvements may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The embodiments herein are therefore to be considered in all respects illustrative and not restrictive.