Method for discriminating transient audio signals
Technical field
The present invention relates to a method for discriminating transient audio signals, and in particular to a method for discriminating transient audio signals during audio coding.
Background art
Perceptual audio coding is a lossy, entropy-constrained transform-domain coding scheme. The time-domain digital audio signal is first grouped into frames (vectors of a fixed length), and each frame is fed both to the analysis filterbank and to the psychoacoustic model. The analysis filterbank applies a window function of a given length and shape to the signal (that is, multiplies it point by point with a vector of that length) and then performs a domain transform of a given block length, which yields the transform-domain spectrum of the audio signal. The psychoacoustic model produces the information used to control coding. The transform-domain spectrum is passed to the quantizer, which performs entropy-constrained quantization according to the coding control information. The quantized spectrum and the control information are then combined by the bitstream multiplexer and packed into the required format, which completes the coding of one frame.
In perceptual audio coding, the amount of information in the audio signal is reduced in the quantization stage. The encoder quantizes the different frequency bands of the transform-domain signal with different precision, which reduces the overall amount of information while introducing a different amount of quantization noise into each band. Guided by the psychoacoustic model, the introduced quantization noise can be kept below the level that humans can perceive, so that the audio reconstructed after coding shows no significant audible degradation.
In the analysis filterbank, applying window functions of different shapes and lengths produces transform-domain spectra with different temporal and spectral resolutions, and hence with different coding efficiencies. Long-block coding (a long window function) generally gives higher spectral resolution and therefore higher coding quality. Because its temporal resolution is low, however, the quantization noise is spread in time over the whole transform block. For a transient signal, this spread noise easily rises above the low-energy portions of the signal and becomes audible, causing transient distortion. To eliminate this effect, a block-switching mechanism is introduced into the audio encoder: it allows the encoder to apply different windows and to choose long-block or short-block coding, so as to meet the differing requirements on temporal and spectral resolution in different situations. Although block switching increases coding delay and complexity, it is effective at suppressing quantization-noise spreading and eliminating transient distortion, so mainstream audio coding standards, including Advanced Audio Coding (AAC) and MPEG Audio Layer III (MP3), all offer it as an option.
For efficient coding, the block-switching decision must adapt to the input signal. Block-switching decisions fall into two main classes: a posteriori methods based on quantization and a priori methods based on signal analysis. An a posteriori method performs quantization and coding with both block lengths and compares their efficiency; the window type applied to the signal thus becomes part of the quantization parameters in a multidimensional optimization, which is handled by the quantization-distortion control algorithm. This decision method has the highest theoretical performance, but it greatly increases the complexity of the coding algorithm and is almost never used in practical encoders. An a priori method analyzes the input signal before coding and makes the block-length decision using an established criterion; it naturally has lower complexity and is widely used. US Patent 5,285,498 provides a block-switching decision method based on perceptual entropy: the perceptual entropy output by the psychoacoustic model is used as the main comparison parameter, and a threshold is set to decide whether the signal is transient. Some MP3 and AAC encoders currently use this algorithm, which is integrated into psychoacoustic model PAM II. US Patent 5,701,389 provides another discrimination method, which uses the ratio of the high-frequency energy to the low-frequency energy of the signal spectrum as the main comparison parameter and sets a threshold to decide whether the signal is transient. In addition, transient signals can be detected by time-domain energy detection.
A block-switching decision algorithm is essentially a judgment of the transient character of the signal, and any such method has a certain miss rate and false-alarm rate. The miss rate is the probability that a transient signal is not detected; the false-alarm rate is the probability that a non-transient signal is classified as transient. Misses easily cause noticeable transient distortion, while false alarms reduce coding quality to some extent. The perceptual-entropy method and the high/low-frequency energy-ratio method do not detect transient signals well in practice, so enabling block switching improves the audio quality of the encoder only slightly. Time-domain energy detection uses only time-domain information: it detects transients with a significant change in time-domain energy well, but it cannot detect a signal whose energy stays stable while its spectrum changes abruptly.
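The limitation of time-domain energy detection can be illustrated with a small sketch. The detector below is a hypothetical, simplified energy-jump detector (not the one of any cited patent): it flags a transient when the energy of a block exceeds `ratio` times that of the previous block. The block length, ratio and test frequencies are assumptions chosen so that every block holds a whole number of sine periods.

```python
import math

def block_energies(x, block_len):
    """Energy of each non-overlapping block of length block_len."""
    return [sum(s * s for s in x[i:i + block_len])
            for i in range(0, len(x) - block_len + 1, block_len)]

def energy_transient(x, block_len=256, ratio=4.0):
    """Hypothetical detector: transient if any block's energy exceeds
    `ratio` times the energy of the block before it."""
    e = block_energies(x, block_len)
    return any(cur > ratio * max(prev, 1e-12) for prev, cur in zip(e, e[1:]))

fs, n = 48000, 2048
# Frequency hop 375 Hz -> 3000 Hz at constant amplitude: a purely spectral change.
freq_jump = [math.sin(2 * math.pi * (375 if t < n // 2 else 3000) * t / fs)
             for t in range(n)]
# Same 375 Hz tone becoming 10x louder halfway: a purely energetic change.
level_jump = [(0.1 if t < n // 2 else 1.0) * math.sin(2 * math.pi * 375 * t / fs)
              for t in range(n)]

print(energy_transient(freq_jump))   # False: block energies stay equal
print(energy_transient(level_jump))  # True: energy rises by a factor of 100
```

Because each 256-sample block contains whole periods of either tone, every block of `freq_jump` has the same energy, so the frequency hop is invisible to the detector.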
Summary of the invention
The object of the present invention is to provide a method for discriminating transient audio signals that solves the prior-art problem that transient signals cannot be detected accurately, or are detected with too high an error rate, which degrades audio coding quality.
To solve the above problem, the method for discriminating transient audio signals provided by the invention comprises the following steps:
Step 1: obtain the log-domain time-frequency component matrix y;
Step 2: based on the principle of projection, compute the minimum area $D_{Area}$ of the view-plane projection or of the sight-line projection of the time-frequency component matrix;
Step 3: using this minimum area $D_{Area}$, distinguish transient signals from steady-state signals with the discrimination formula $AC\times D_{Area}>DC\times Thr$, where DC is the energy or amplitude of the DC component, AC is the energy or amplitude of the AC component, and Thr is the discrimination threshold. If the formula holds, the signal is a transient signal; otherwise it is a steady-state signal.
Preferably, in the above method the audio signal is first converted into a two-dimensional time-frequency signal. The log-domain time-frequency component matrix y is obtained either by directly taking the logarithm of the non-uniform time-frequency components (absolute values or energies) output by a non-uniform short-time time-frequency transform or subband filterbank, or by applying a mapping transform to the uniform time-frequency component matrix obtained from a uniform short-time time-frequency transform or subband filterbank. The log-domain time-frequency component matrix y is:
$$y=\begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,N}\\ y_{2,1} & y_{2,2} & \cdots & y_{2,N}\\ \vdots & \vdots & \ddots & \vdots\\ y_{M,1} & y_{M,2} & \cdots & y_{M,N}\end{bmatrix}$$
where $m=1,2,\ldots,M$ indexes the m-th frequency band and $n=1,2,\ldots,N$ indexes the n-th time block; M is the number of frequency bands and N is the temporal resolution (the number of time blocks). The time axis of y must be uniform, i.e. $y_{m,n-1}$, $y_{m,n}$ and $y_{m,n+1}$ are separated by equal time intervals, while the frequency axis is non-uniform: the bandwidths represented by $y_{m-1,n}$, $y_{m,n}$ and $y_{m+1,n}$ increase progressively.
Preferably, the log-domain time-frequency component matrix y is obtained by applying a mapping transform and a logarithm to the uniform time-frequency component matrix X obtained from a uniform short-time time-frequency transform or subband filterbank, where X is:
$$X=\begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,L}\\ \vdots & \vdots & \ddots & \vdots\\ x_{K,1} & x_{K,2} & \cdots & x_{K,L}\end{bmatrix}$$
X has size K × L. K is the number of frequency lines, corresponding to the K spectral lines of the short-time transform or the K subbands of the subband filter; L is the temporal resolution, corresponding to the L blocks of the short-time transform or the L outputs of the subband filter.
Preferably, the mapping transform merges a block $x_{sub}$ of the uniform time-frequency component matrix X into one element $y_{m,n}$, where $x_{sub}$ is:
$$x_{sub}=\left[x_{k,l}\right],\qquad b_m\le k\le b_{m+1}-1,\qquad (n-1)T+1\le l\le nT$$
Here $T=L/N$, and L is an integer multiple of N. The K frequency lines are divided from low to high into M bands containing $[w_1,w_2,w_3,\ldots,w_M]$ lines, where $w_1\le w_2\le w_3\le\cdots\le w_M$; the corresponding band borders are $[b_1,b_2,b_3,\ldots,b_{M+1}]$, where $b_1=1$ (so that $b_{m+1}=b_m+w_m$). The merge uses one of energy sum, mean energy, absolute-amplitude sum, mean absolute amplitude and maximum absolute amplitude, or several of these methods applied alternately along rows and columns.
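Preferably, if all projection lines have the same slope, the minimum area of the view-plane projection is computed as follows:
Let $L_m$ be a family of projection lines with the same slope, written $ax+y+b_m=0$, where a is the slope coefficient and $b_m$ is the offset. The sum of squared distances from the points $(n,y_{m,n})$ to $L_m$ is:
$$D_{Area}(a,b_m)=\sum_{m=1}^{M}\sum_{n=1}^{N}\frac{\left(an+y_{m,n}+b_m\right)^2}{1+a^2}$$
This sum of squared distances is minimized.
The minimum of $D_{Area}(a,b_m)$ occurs where its partial derivatives vanish, i.e.
$$\frac{\partial D_{Area}}{\partial b_m}=0\ (m=1,\ldots,M),\qquad \frac{\partial D_{Area}}{\partial a}=0$$
The first condition gives $b_m=-(a\bar n+\bar y_m)$; with the coefficients
$$\bar n=\frac{N+1}{2},\quad \bar y_m=\frac{1}{N}\sum_{n=1}^{N}y_{m,n},\quad A=M\sum_{n=1}^{N}(n-\bar n)^2,\quad B=\sum_{m=1}^{M}\sum_{n=1}^{N}(n-\bar n)(y_{m,n}-\bar y_m),\quad C=\sum_{m=1}^{M}\sum_{n=1}^{N}(y_{m,n}-\bar y_m)^2$$
the sum becomes $D_{Area}(a)=\dfrac{Aa^2+2Ba+C}{1+a^2}$, and the second condition gives $Ba^2+(C-A)a-B=0$. The slope therefore has two possible values:
$$a_{1,2}=\frac{A-C}{2B}\pm\sqrt{\left(\frac{A-C}{2B}\right)^2+1}$$
If $B<0$, the minimum is attained at $a_1$, and the discriminant parameter is:
$$D_{Area}=\frac{Aa_1^2+2Ba_1+C}{1+a_1^2}$$
If $B>0$, the minimum is attained at $a_2$, and the discriminant parameter is:
$$D_{Area}=\frac{Aa_2^2+2Ba_2+C}{1+a_2^2}$$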
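The common-slope computation can be sketched in a few lines. This is a minimal sketch, assuming the coefficient definitions $A$, $B$, $C$ of the reconstructed derivation (time index n = 1..N, offsets $b_m$ eliminated by centring each band); the function name is hypothetical. It follows the two-branch rule, picking the root whose sign is opposite to B, and adds a guard for B = 0, which the two-branch rule does not cover.

```python
import math

def view_plane_area(y):
    """Minimum of D_Area(a, b_m) over parallel lines a*x + y + b_m = 0
    fitted to the points (n, y[m][n]), n = 1..N, with a common slope a."""
    M, N = len(y), len(y[0])
    n_bar = (N + 1) / 2
    A = B = C = 0.0
    for row in y:
        y_bar = sum(row) / N            # optimal b_m = -(a * n_bar + y_bar)
        for n, v in enumerate(row, start=1):
            dn, dy = n - n_bar, v - y_bar
            A += dn * dn
            B += dn * dy
            C += dy * dy
    if B == 0.0:
        # Degenerate case: D(a) equals C at a = 0 and tends to A as a grows.
        return min(A, C)
    q = (A - C) / (2 * B)
    r = math.sqrt(q * q + 1)
    a = q + r if B < 0 else q - r       # root with sign opposite to B
    return (A * a * a + 2 * B * a + C) / (1 + a * a)

flat = [[1, 1, 1, 1], [5, 5, 5, 5]]      # stationary bands
ramp = [[1, 2, 3, 4], [3, 4, 5, 6]]      # parallel linear ramps
burst = [[0, 0, 0, 8], [0, 0, 0, 8]]     # sudden onset in the last block
print(view_plane_area(flat), view_plane_area(ramp), view_plane_area(burst))
```

The flat and ramp inputs lie exactly on parallel lines and give 0, while the onset gives about 3.76, i.e. a larger projection for the transient-like input.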
Preferably, if each projection line has its own slope, the minimum area of the sight-line projection is computed as follows:
Let $L_m$ be a family of projection lines with different slopes. For the N points $(n,y_{m,n})$ of the m-th frequency band, the best-fit line $y=f(x)$ minimizes the squared error $\sum_{n}\left|f(n)-y_{m,n}\right|^2$; its equation is:
$$f(x)=\alpha_m x+\beta_m$$
where, with $\bar n=(N+1)/2$ and $\bar y_m=\frac{1}{N}\sum_{n=1}^{N}y_{m,n}$,
$$\alpha_m=\frac{\sum_{n=1}^{N}(n-\bar n)(y_{m,n}-\bar y_m)}{\sum_{n=1}^{N}(n-\bar n)^2},\qquad \beta_m=\bar y_m-\alpha_m\bar n$$
The sum of squared distances from the points $(n,y_{m,n})$ to $L_m$ is:
$$D_m=c_m\sum_{n=1}^{N}\left(y_{m,n}-\alpha_m n-\beta_m\right)^2$$
with the coefficient
$$c_m=\frac{1}{1+\alpha_m^2}$$
The values $D_m$ of all subbands are summed into $D_{Area}$ for the decision, with each band weighted by its maximum value, which gives the discriminant parameter:
$$D_{Area}=\sum_{m=1}^{M}\hat y_m D_m,\qquad \hat y_m=\max_{n}y_{m,n}$$
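A per-band sketch of the sight-line variant follows. It fits each band with ordinary least squares, converts the vertical residuals into perpendicular squared distances through $c_m=1/(1+\alpha_m^2)$, and weights each $D_m$ by the band's largest log-domain value. The weighting is an assumed reading of the text ("the maximum value on the frequency axis"), and the function name is hypothetical; N must be at least 2.

```python
def sight_line_area(y):
    """Weighted sum of per-band squared perpendicular distances to the
    least-squares line y = alpha_m * n + beta_m, n = 1..N (N >= 2)."""
    N = len(y[0])
    n_bar = (N + 1) / 2
    s_nn = sum((n - n_bar) ** 2 for n in range(1, N + 1))
    total = 0.0
    for row in y:
        y_bar = sum(row) / N
        alpha = sum((n - n_bar) * (v - y_bar)
                    for n, v in enumerate(row, start=1)) / s_nn
        beta = y_bar - alpha * n_bar
        resid = sum((v - alpha * n - beta) ** 2
                    for n, v in enumerate(row, start=1))
        d_m = resid / (1 + alpha * alpha)   # c_m * squared residuals
        total += max(row) * d_m             # assumed weight: band maximum
    return total

ramp = [[1, 2, 3, 4], [3, 4, 5, 6]]   # each band lies on its own line
burst = [[0, 0, 0, 8]]                # sudden onset in one band
print(sight_line_area(ramp), sight_line_area(burst))
```

Because each band gets its own slope, any band that varies linearly in time contributes nothing, while an abrupt onset leaves a large residual (about 22.7 here).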
Compared with the prior art, the present invention uses time-frequency detection: the audio signal is converted into a two-dimensional time-frequency signal, the minimum area of its view-plane or sight-line projection is computed, and a decision formula classifies the signal as transient or steady-state. This method detects transient signals more accurately and can therefore be combined with various audio encoders to improve audio coding quality.
Description of drawings
Fig. 1 shows a typical signal analysis, where (a)-(f) are signal waveforms, (g)-(l) are the short-time spectrum surfaces of the signals, and (m)-(r) are schematic diagrams of the minimum projections of these short-time spectrum surfaces.
Fig. 2 is a flowchart of the method for discriminating transient audio signals according to the present invention.
Fig. 3 is an analysis diagram of the projection problem for the m-th frequency band.
Fig. 4 is the waveform of the test signal.
Fig. 5 is the spectrogram of the test signal.
Fig. 6 compares the per-frame objective audio quality results obtained with the three coding configurations.
Embodiment
Specific embodiments of the invention are described below with reference to the accompanying drawings.
Fig. 1 shows a typical signal analysis. Signal analysis shows that it is very difficult to determine a priori whether long-block or short-block coding will be more efficient, but for signals in certain situations short-block coding always gives higher efficiency. For Fig. 1, 2048 samples were extracted from each of several audio signals and, following the AAC short-block partitioning, the central 1152 samples were divided into 8 short blocks (temporal resolution 8). Spectral analysis of these blocks gives the short-time spectrum surface of each signal, i.e. its time-frequency surface (log2-domain subband energies of the short blocks, 13 bands). The signals in Fig. 1(a) and (d) are easily identified as transient from their short-time energy, but those in Fig. 1(b) and (c) cannot be identified in this way; Fig. 1(e) is a typical steady-state signal, and Fig. 1(f) is very hard to classify. If the short-time spectrum is viewed along the time axis at a suitable angle, it projects onto a plane; choosing the angle that minimizes the projected area gives the projections shown in Fig. 1(m) to (r). Transient signals clearly have larger projections, whereas the projections of steady-state signals are very small.
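The short-block layout used for Fig. 1 can be reproduced with simple index arithmetic. This sketch assumes the standard AAC short-window geometry (256-sample windows with 50 % overlap, i.e. a 128-sample hop, centred in the 2048-sample frame); the constant names are illustrative.

```python
FRAME = 2048          # samples extracted per analysis frame
SHORT = 256           # AAC short-window length
HOP = SHORT // 2      # 50 % overlap between consecutive short windows
NUM_SHORT = 8         # eight short blocks -> temporal resolution 8

span = (NUM_SHORT - 1) * HOP + SHORT    # 7 * 128 + 256 = 1152 samples
start = (FRAME - span) // 2             # centred in the frame: 448
blocks = [(start + i * HOP, start + i * HOP + SHORT) for i in range(NUM_SHORT)]
print(span, blocks[0], blocks[-1])      # 1152 (448, 704) (1344, 1600)
```

The eight overlapping windows together cover exactly the central 1152 samples mentioned above.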
Fig. 2 is a flowchart of the method for discriminating transient audio signals according to the present invention. The method comprises the following steps:
Step 201: obtain the log-domain time-frequency component matrix y of size M × N:
$$y=\begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,N}\\ y_{2,1} & y_{2,2} & \cdots & y_{2,N}\\ \vdots & \vdots & \ddots & \vdots\\ y_{M,1} & y_{M,2} & \cdots & y_{M,N}\end{bmatrix}\quad\text{(formula 1)}$$
where $m=1,2,\ldots,M$ indexes the m-th frequency band and $n=1,2,\ldots,N$ indexes the n-th time block; M is the number of frequency bands and N is the temporal resolution (the number of time blocks). The time axis of y must be uniform, i.e. $y_{m,n-1}$, $y_{m,n}$ and $y_{m,n+1}$ are separated by equal time intervals, while its frequency axis is non-uniform: the bandwidths represented by $y_{m-1,n}$, $y_{m,n}$ and $y_{m+1,n}$ increase progressively. Low-index bands m thus have higher spectral resolution (narrower bandwidth) and high-index bands have lower spectral resolution (wider bandwidth). Spectral resolution is determined by the sampling frequency and by the length and shape of the spectral-analysis window; in general, spectral resolution = sampling frequency / number of spectral lines / 2 × (a coefficient smaller than 1), where the coefficient is determined by the window shape.
The log-domain time-frequency component matrix y can be obtained directly by taking the logarithm of the non-uniform time-frequency components (absolute values or energies) output by a non-uniform short-time time-frequency transform or subband filterbank, or by applying a mapping transform and a logarithm to the uniform time-frequency component matrix X obtained from a uniform short-time time-frequency transform or subband filterbank. The uniform time-frequency component matrix is
$$X=\begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,L}\\ \vdots & \vdots & \ddots & \vdots\\ x_{K,1} & x_{K,2} & \cdots & x_{K,L}\end{bmatrix}\quad\text{(formula 2)}$$
X has size K × L. K is the number of frequency lines, corresponding to the K spectral lines of the short-time transform or the K subbands of the subband filter; L is the temporal resolution, corresponding to the L blocks of the short-time transform or the L outputs of the subband filter. The mapping transform is uniform along the time axis, which requires L to be an integer multiple of N; let $T=L/N$. It is non-uniform along the frequency axis: the K frequency lines are divided from low to high into M bands containing $[w_1,w_2,w_3,\ldots,w_M]$ lines, where $w_1\le w_2\le w_3\le\cdots\le w_M$, and the corresponding band borders are $[b_1,b_2,b_3,\ldots,b_{M+1}]$, where $b_1=1$.
The mapping transform merges the block
$$x_{sub}=\left[x_{k,l}\right],\qquad b_m\le k\le b_{m+1}-1,\qquad (n-1)T+1\le l\le nT\quad\text{(formula 3)}$$
into one element $y_{m,n}$. The merge can use one of the following methods: energy sum, mean energy, absolute-amplitude sum, mean absolute amplitude or maximum absolute amplitude. These methods can also be applied alternately along rows and columns, for example by taking the maximum along each row and then averaging the results. The merged value is finally converted to the log domain.
Step 202: based on the principle of projection, compute the minimum area of the view-plane projection or of the sight-line projection of the log-domain time-frequency component matrix y.
Fig. 3 shows the projection problem for the m-th frequency band. Consider the projection of the N points of band m. These points lie in the plane spanned by the time axis and the amplitude axis, and in this plane there is a projection line on both sides of which the N points are distributed. The slope and offset of the line are adjusted so that the maximum distance of the points on one side plus the maximum distance of the points on the other side is minimized. Taking this projection line as the normal gives the projection plane, and the projected length of the N points on this plane (the largest distance between projected points) is then also minimal. Because this minimum distance is very difficult to solve for exactly, the index is approximated by the sum of squared distances from the points to the projection line (least squares). The M bands have M projection lines. Depending on their slopes, two cases arise: if all projection lines must have the same slope, the minimum area of the view-plane projection is computed; if each projection line has its own slope, the minimum area of the sight-line projection is computed. The two cases are discussed below:
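(1) Computing the minimum area of the view-plane projection:
Let $L_m$ be a family of projection lines with the same slope, written $ax+y+b_m=0$, where a is the slope coefficient and $b_m$ is the offset. The sum of squared distances from the points $(n,y_{m,n})$ to $L_m$ is
$$D_{Area}(a,b_m)=\sum_{m=1}^{M}\sum_{n=1}^{N}\frac{\left(an+y_{m,n}+b_m\right)^2}{1+a^2}\quad\text{(formula 4)}$$
This sum of squared distances is minimized.
The minimum of $D_{Area}(a,b_m)$ occurs where its partial derivatives vanish, i.e.
$$\frac{\partial D_{Area}}{\partial b_m}=0\ (m=1,\ldots,M),\qquad \frac{\partial D_{Area}}{\partial a}=0\quad\text{(formula 5)}$$
The first condition gives $b_m=-(a\bar n+\bar y_m)$, which leads to the coefficients
$$\bar n=\frac{N+1}{2},\quad \bar y_m=\frac{1}{N}\sum_{n=1}^{N}y_{m,n},\quad A=M\sum_{n=1}^{N}(n-\bar n)^2,\quad B=\sum_{m=1}^{M}\sum_{n=1}^{N}(n-\bar n)(y_{m,n}-\bar y_m),\quad C=\sum_{m=1}^{M}\sum_{n=1}^{N}(y_{m,n}-\bar y_m)^2\quad\text{(formula 6)}$$
so that $D_{Area}(a)=\dfrac{Aa^2+2Ba+C}{1+a^2}$, and the second condition gives $Ba^2+(C-A)a-B=0$. The slope therefore has two possible values:
$$a_{1,2}=\frac{A-C}{2B}\pm\sqrt{\left(\frac{A-C}{2B}\right)^2+1}\quad\text{(formula 7)}$$
If $B<0$, the minimum is attained at $a_1$, and the minimum sum of squared distances is
$$D_{Area}=\frac{Aa_1^2+2Ba_1+C}{1+a_1^2}\quad\text{(formula 8)}$$
If $B>0$, the minimum is attained at $a_2$, and the minimum sum of squared distances is
$$D_{Area}=\frac{Aa_2^2+2Ba_2+C}{1+a_2^2}\quad\text{(formula 9)}$$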
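The two branches for $B<0$ and $B>0$ lead to the same closed form, which is convenient in an implementation. A short derivation follows, using $A=\sum_{m,n}(n-\bar n)^2$, $B=\sum_{m,n}(n-\bar n)(y_{m,n}-\bar y_m)$ and $C=\sum_{m,n}(y_{m,n}-\bar y_m)^2$ (the sums remaining after the offsets $b_m$ are eliminated) and assuming $B\neq0$. The numerical example uses two hypothetical bands, each equal to $[0,0,0,8]$ over $N=4$ blocks (a sudden onset), for which $A=10$, $B=24$, $C=96$, and two steady linear ramps $[1,2,3,4]$ and $[3,4,5,6]$, for which $A=B=C=10$:

$$
\begin{aligned}
D_{Area}(a)&=\frac{Aa^2+2Ba+C}{1+a^2}=\frac{\mathbf v^{\mathsf T}\mathbf S\,\mathbf v}{\mathbf v^{\mathsf T}\mathbf v},\qquad \mathbf v=\begin{bmatrix}a\\1\end{bmatrix},\quad \mathbf S=\begin{bmatrix}A&B\\B&C\end{bmatrix},\\
\min_a D_{Area}(a)&=\lambda_{\min}(\mathbf S)=\frac{A+C}{2}-\sqrt{\left(\frac{A-C}{2}\right)^2+B^2},\\
\text{onset: }\ D_{Area}&=53-\sqrt{43^2+24^2}=53-\sqrt{2425}\approx3.76,\\
\text{ramps: }\ D_{Area}&=10-\sqrt{0^2+10^2}=0.
\end{aligned}
$$

Because $D_{Area}(a)$ is a Rayleigh quotient of the symmetric matrix $\mathbf S$, its minimum over a is the smaller eigenvalue of $\mathbf S$, attained at the root of formula 7 whose sign is opposite to that of B.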
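(2) Computing the minimum area of the sight-line projection:
Let $L_m$ be a family of projection lines with different slopes. For the N points $(n,y_{m,n})$ of the m-th frequency band, the best-fit line $y=f(x)$ minimizes the squared error $\sum_{n}\left|f(n)-y_{m,n}\right|^2$; its equation is
$$f(x)=\alpha_m x+\beta_m\quad\text{(formula 10)}$$
where, with $\bar n=(N+1)/2$ and $\bar y_m=\frac{1}{N}\sum_{n=1}^{N}y_{m,n}$,
$$\alpha_m=\frac{\sum_{n=1}^{N}(n-\bar n)(y_{m,n}-\bar y_m)}{\sum_{n=1}^{N}(n-\bar n)^2}\quad\text{(formula 11)}$$
$$\beta_m=\bar y_m-\alpha_m\bar n\quad\text{(formula 12)}$$
The sum of squared distances from the points $(n,y_{m,n})$ to $L_m$ is
$$D_m=c_m\sum_{n=1}^{N}\left(y_{m,n}-\alpha_m n-\beta_m\right)^2\quad\text{(formula 13)}$$
with the coefficient
$$c_m=\frac{1}{1+\alpha_m^2}\quad\text{(formula 14)}$$
The values $D_m$ of all subbands are summed into $D_{Area}$, which serves as the discriminant parameter. At the same time an energy or amplitude weighting is applied: each band is weighted by its maximum energy or amplitude value (in the log domain), giving the discriminant parameter
$$D_{Area}=\sum_{m=1}^{M}\hat y_m D_m,\qquad \hat y_m=\max_{n}y_{m,n}\quad\text{(formula 15)}$$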
If the merging method used in the mapping transform of the uniform time-frequency component matrix in step 201 is energy sum or mean energy, the weighting in formula 15 uses energy, and the maximum refers to the maximum log-domain energy value. If the merging method is absolute-amplitude sum, mean absolute amplitude or maximum absolute amplitude, the weighting in formula 15 uses amplitude, and the maximum refers to the maximum log-domain amplitude value.
Step 203: use the discrimination formula to distinguish transient signals from steady-state signals:
Because short-block coding is very inefficient for signals whose low-frequency components fluctuate strongly, the discrimination of transient audio signals must take the ratio of the DC component to the AC component into account. Let DC be the energy or amplitude of the DC component and AC the energy or amplitude of the AC component; the discrimination formula is then:
$$AC\times D_{Area}>DC\times Thr\quad\text{(formula 16)}$$
If formula 16 is satisfied, the signal is a transient signal; otherwise it is a steady-state signal. The discrimination threshold Thr can be a preset value, or it can be determined jointly from M, N and the logarithm base used when taking the logarithm of the time-frequency signal y in formula (1). In general, the long-term average $y_a$ of the sum of squares of all elements of y can be computed (that is, the sums of squares of all elements of y obtained over many frames are averaged). For fixed M and N, a fixed logarithm base and a fixed amplitude range of the input signal, $y_a$ is approximately fixed and can be treated as a constant. The decision threshold is $Thr=\gamma\,y_a$, where the coefficient γ is generally set between 1% and 5% and fine-tuned as required; γ represents the upper limit, as a percentage of the overall energy, of the fluctuation energy allowed for a steady-state signal.
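The threshold rule $Thr=\gamma\,y_a$ and the decision of formula 16 can be combined in a small stateful sketch. The class name and the exponential moving average used as the "long-term average" of $y_a$ are assumptions (the text only says the per-frame sums of squares are averaged over many frames); γ defaults to a value inside the 1% to 5% range given above. $D_{Area}$, AC and DC are taken as inputs, computed elsewhere.

```python
class TransientDetector:
    """Sketch of step 203: AC * D_Area > DC * Thr with Thr = gamma * y_a,
    where y_a is a long-term average of the sum of squares of all
    elements of the log-domain matrix y."""

    def __init__(self, gamma=0.03, smoothing=0.99):
        self.gamma = gamma          # 1 % to 5 % according to the text
        self.smoothing = smoothing  # hypothetical moving-average factor
        self.y_a = None

    def update_threshold(self, y):
        frame_sq = sum(v * v for row in y for v in row)
        if self.y_a is None:
            self.y_a = frame_sq     # first frame initialises the average
        else:
            self.y_a = self.smoothing * self.y_a + (1 - self.smoothing) * frame_sq
        return self.gamma * self.y_a

    def is_transient(self, y, d_area, ac, dc):
        thr = self.update_threshold(y)
        return ac * d_area > dc * thr

det = TransientDetector(gamma=0.05)
y = [[1.0, 1.0], [1.0, 1.0]]        # sum of squares 4 -> Thr = 0.2
print(det.is_transient(y, d_area=1.0, ac=1.0, dc=1.0))   # 1.0 > 0.2
print(det.is_transient(y, d_area=0.1, ac=1.0, dc=1.0))   # 0.1 < 0.2
```

Where $y_a$ is treated as a constant, as the text allows, the moving average can simply be replaced by a fixed value.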
If the merging method used in the mapping transform of the uniform time-frequency component matrix in step 201 is energy sum or mean energy, formula 16 uses the energies of the DC and AC components; if the merging method is absolute-amplitude sum, mean absolute amplitude or maximum absolute amplitude, formula 16 uses the amplitudes of the DC and AC components.
To measure the improvement provided by the transient discrimination method of the invention, an objective quality evaluation of an encoder was carried out. An AAC encoder was used, with the block-switching decision made either by perceptual entropy or by the method of the invention. The test tool was Perceptual Evaluation of Audio Quality (PEAQ), described in International Telecommunication Union Recommendation ITU-R BS.1387. The PEAQ measure used is the noise-to-mask ratio (NMR), expressed in dB. NMR is a noise measure that takes human auditory physiology and psychoacoustics into account; it is a weighted noise measure representing the average distance of the encoder's quantization noise from the masking curve. The better the coding quality, the smaller the NMR; poor coding gives a large NMR.
The test signal was the typical item "gspi35_2" taken from the Sound Quality Assessment Material database published by the European Broadcasting Union. This item contains a large number of transients: some are energy jumps in the time domain (see the audio waveform in Fig. 4) and some are frequency jumps (see the audio spectrum in Fig. 5), so the situation is very complex. Three configurations were tested. Configuration one uses no transient detection, so the encoder always uses long-block coding; configuration two uses perceptual entropy to detect transients and performs block switching accordingly; configuration three uses the transient discrimination method provided by the invention and performs block switching accordingly. The per-frame objective quality results (NMR) obtained with the three configurations are shown in Fig. 6, where panels (a), (b) and (c) correspond to configurations one, two and three. The test results clearly show that without block switching the audio quality is very poor: at the instants where transients occur, NMR is very large and mostly exceeds 0 dB. With perceptual-entropy decisions, quality improves somewhat and some transients are detected, but detection accuracy is low and the false-alarm rate is high; although NMR improves overall, it actually increases at some falsely detected positions, and a large number of frames still have NMR above 0 dB. With the transient discrimination method of the invention, detection accuracy improves markedly, block switching accordingly improves the coding quality of the encoder significantly, and NMR is below 0 dB at all instants.
It should be understood that those of ordinary skill in the art may make equivalent substitutions or changes according to the technical solution and inventive concept of the present invention, and all such changes or substitutions shall fall within the protection scope of the appended claims.