CN102419977B - Method for discriminating transient audio signals - Google Patents

Publication number
CN102419977B
CN102419977B CN2011100070617A CN201110007061A CN102419977B CN 102419977 B CN102419977 B CN 102419977B CN 2011100070617 A CN2011100070617 A CN 2011100070617A CN 201110007061 A CN201110007061 A CN 201110007061A CN 102419977 B CN102419977 B CN 102419977B
Authority
CN
China
Prior art keywords
sigma
discrimination
frequency
projection
audio signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2011100070617A
Other languages
Chinese (zh)
Other versions
CN102419977A (en)
Inventor
吴晟
张本好
林福辉
李昙
徐晶明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd filed Critical Spreadtrum Communications Shanghai Co Ltd

Abstract

The invention discloses a method for discriminating transient audio signals during audio coding. The key points of the technical scheme are as follows: the audio signal is converted into a two-dimensional time-frequency signal; the parameter used for discrimination is extracted by calculating the minimum view-plane projection or sight-line projection; and the transient audio signal is then discriminated. Because it discriminates transient signals more accurately, the method can be used with a variety of audio encoders to improve audio coding quality.

Description

Method for discriminating transient audio signals
Technical field
The present invention relates to a method for discriminating transient audio signals, and in particular to a method for discriminating transient audio signals during audio coding.
Background art
Perceptual audio coding is a lossy, entropy-constrained transform-domain coding scheme. The time-domain digital audio signal is grouped into frames (vectors of a certain length), and each frame is fed to both an analysis filter bank and a psychoacoustic model. The analysis filter bank applies a window function of a certain length and shape to the signal (a pointwise multiplication with a vector of that length) and then performs a domain transform of a certain block length, which yields the transform-domain spectrum of the audio signal. The psychoacoustic model produces the information used to control the encoding. The transform-domain spectrum is passed to the quantizer, where it is quantized under an entropy constraint according to the coding control information. The quantized spectrum and the control information are then multiplexed into a bitstream and packed into the required format, which completes the encoding of one frame.
The amount of information in the audio signal is reduced in the quantization stage of perceptual audio coding. The encoder applies different quantization precision to the different frequency bands of the transform-domain signal, which reduces the overall amount of information but also introduces quantization noise of different magnitude in each band. Guided by the psychoacoustic model, the introduced quantization noise can be kept below the level that listeners can perceive, so that the audio reconstructed after coding shows no noticeable audible degradation.
In the analysis filter bank, applying window functions of different shapes and lengths to the audio signal yields transform-domain spectra with different time resolution and frequency resolution, and therefore different coding efficiency. In general, long-block coding (a long window function) gives higher frequency resolution and higher audio coding quality. However, because the time resolution of the spectrum is lower, the quantization noise after coding spreads in the time domain over the whole transform block. For a transient signal, this spread quantization noise easily covers the lower-energy part of the signal and causes transient distortion. To eliminate this effect, a block-switching mechanism was introduced into audio encoders. It allows the encoder to apply different windows to the signal and to choose long-block or short-block coding according to the time-resolution and frequency-resolution demands of each situation. Although block switching increases coding delay and complexity, it is effective at suppressing the spread of quantization noise and eliminating transient distortion, so mainstream audio coding standards, including Advanced Audio Coding (AAC) and MPEG Audio Layer III (MP3), all provide this optional mechanism.
To code efficiently, the block-switching decision must adapt to the input signal. There are two main classes of decision: a posteriori decisions based on quantization, and a priori decisions based on signal analysis. The a posteriori method quantizes and encodes with both block lengths and compares their efficiency. The type of window applied to the signal then becomes one of the quantization parameters in a multi-dimensional optimization, which is handled by the quantization-distortion control algorithm. This decision algorithm has the highest theoretical performance, but it greatly increases the complexity of the encoding algorithm and is almost never used in practical encoders. The a priori method analyzes the input signal before coding and decides the block length by applying a criterion; it has much lower complexity and is widely used. US Patent 5,285,498 provides a block-switching decision method based on perceptual entropy: it takes the perceptual entropy output by the psychoacoustic model as the main comparison parameter and sets a threshold to decide whether the signal is transient. Some MP3 and AAC encoders currently use this algorithm, which is integrated into the psychoacoustic model PAMII. US Patent 5,701,389 provides another discrimination method, which takes the ratio of high-frequency energy to low-frequency energy in the signal spectrum as the main comparison parameter and sets a threshold to decide whether the signal is transient. Transient signals can also be detected from the time-domain energy.
A block-switching decision algorithm is in essence a judgment of the transient character of the signal, and any such method has a certain miss rate and false-alarm rate. The miss rate is the probability that a transient signal goes undetected, and the false-alarm rate is the probability that a non-transient signal is judged to be transient. The former readily causes large transient distortion, while the latter reduces audio coding quality to some degree. The perceptual-entropy method and the high-to-low-frequency energy-ratio method do not detect transient signals well in practice, so enabling block switching improves the audio quality of the encoder only slightly. The method based on time-domain energy detection uses only time-domain information. It works well for transient signals whose time-domain energy changes markedly, but it cannot detect a signal whose energy stays stable while its frequency content changes sharply.
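The limitation of time-domain energy detection can be illustrated with a small sketch. The detector below, its block count and its ratio threshold are assumptions chosen for illustration and are not taken from any of the cited methods. It flags a transient when the energy of one block exceeds that of the previous block by a fixed factor.

```python
import numpy as np

def block_energies(signal, num_blocks=8):
    """Split a 1-D signal into equal-length blocks and return each block's energy."""
    blocks = np.array_split(np.asarray(signal, dtype=float), num_blocks)
    return np.array([np.sum(b * b) for b in blocks])

def energy_detector(signal, num_blocks=8, ratio=4.0):
    """Hypothetical detector: True if some block has more than `ratio` times
    the energy of the block before it."""
    e = block_energies(signal, num_blocks) + 1e-12  # small floor keeps silent blocks harmless
    return bool(np.any(e[1:] > ratio * e[:-1]))

fs = 48000                                   # assumed sampling rate
t = np.arange(512) / fs
low = np.sin(2 * np.pi * 1500 * t)           # 1.5 kHz tone
high = np.sin(2 * np.pi * 6000 * t)          # 6 kHz tone, same amplitude

attack = np.concatenate([0.01 * low, low])   # energy jump in the middle of the frame
hop = np.concatenate([low, high])            # frequency jump, energy unchanged

print("attack detected:", energy_detector(attack))
print("frequency hop detected:", energy_detector(hop))
```

The attack is flagged because its block energy jumps, while the frequency hop keeps the block energy constant and passes unnoticed, which is the case the time-frequency approach is meant to cover.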
Summary of the invention
The object of the present invention is to provide a method for discriminating transient audio signals, so as to solve the problem in the prior art that audio coding quality is reduced because transient signals cannot be detected accurately or the detection error rate is too high.
To solve the above problem, the method for discriminating transient audio signals provided by the present invention comprises the following steps:
Step 1: obtain the log-domain time-frequency component matrix y;
Step 2: according to the projection principle, calculate the minimum area $D_{area}$ of the view-plane projection or sight-line projection of the time-frequency component matrix;
Step 3: using this minimum area $D_{area}$, discriminate between transient and steady-state signals with the discrimination formula $AC \cdot D_{area} > DC \cdot Thr$, where DC is the energy value or amplitude value of the DC component, AC is the energy value or amplitude value of the AC component, and Thr is the discrimination threshold. If the discrimination formula holds, the signal is a transient signal; otherwise it is a steady-state signal.
Preferably, in the above method the audio signal is first converted into a two-dimensional time-frequency signal. The log-domain time-frequency component matrix y is obtained either by directly taking the logarithm of the non-uniform time-frequency components (absolute value or energy) output by a non-uniform short-time time-frequency transform or sub-band filter bank, or by applying a mapping transform to the uniform time-frequency component matrix obtained from a uniform short-time time-frequency transform or sub-band filter bank. The log-domain time-frequency component matrix y is expressed as:
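$$y = \begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,N} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ y_{M,1} & y_{M,2} & \cdots & y_{M,N} \end{bmatrix}$$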
where $m = 1, 2, \ldots, M$ indexes the frequency band and $n = 1, 2, \ldots, N$ indexes the time block; M is the number of frequency bands and N is the time resolution in lines. The time axis of the log-domain time-frequency component matrix y is required to be uniform, i.e. $y_{m,n-1}$, $y_{m,n}$, $y_{m,n+1}$ are equally spaced in time. The frequency axis is non-uniform: the bandwidths represented by $y_{m-1,n}$, $y_{m,n}$, $y_{m+1,n}$ are required to increase gradually.
Preferably, the log-domain time-frequency component matrix y is obtained by applying a mapping transform and a logarithm to the uniform time-frequency component matrix X produced by a uniform short-time time-frequency transform or sub-band filter bank, where X is:
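$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,L} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,L} \\ \vdots & \vdots & \ddots & \vdots \\ x_{K,1} & x_{K,2} & \cdots & x_{K,L} \end{bmatrix}$$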
X has size K × L. K is the number of frequency lines, corresponding to the K spectral lines of the short-time transform or the K sub-bands of the sub-band filter. L is the time resolution in lines, corresponding to the L blocks of the short-time transform or the L outputs of the sub-band filter.
Preferably, the mapping transform merges a portion $x_{sub}$ of the uniform time-frequency component matrix X into a single $y_{m,n}$, where $x_{sub}$ is:
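$$x_{sub} = \begin{bmatrix} x_{b_m,(n-1)T+1} & \cdots & x_{b_m,nT} \\ \vdots & \ddots & \vdots \\ x_{b_{m+1}-1,(n-1)T+1} & \cdots & x_{b_{m+1}-1,nT} \end{bmatrix}$$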
where $T = L/N$ and L is an integer multiple of N. The K frequency lines are divided from low to high into M bands containing $[w_1, w_2, w_3, \ldots, w_M]$ lines each, with $w_1 \le w_2 \le w_3 \le \cdots \le w_M$; the corresponding band boundaries are $[b_1, b_2, b_3, \ldots, b_{M+1}]$ with $b_1 = 1$. The merging method is one of energy sum, energy mean, absolute-amplitude sum, absolute-amplitude mean and absolute-amplitude maximum, or several of these applied alternately across rows and columns.
Preferably, if all projection lines have the same slope, the minimum area of the view-plane projection is calculated as follows:
Let $L_m$ be a family of projection lines with the same slope, written $ax + y + b_m = 0$, where $a$ is the slope and $b_m$ is the offset. The sum of squared distances from the points $(n, y_{m,n})$ to $L_m$ is:
$$D_{area}(a, b_m) = \sum_{m=1}^{M}\sum_{n=1}^{N}\frac{(an + y_{m,n} + b_m)^2}{a^2+1} = \frac{Aa^2 + Ba + C}{a^2+1}$$
The minimum of this sum of squared distances is then sought.
The minimum of $D_{area}(a, b_m)$ occurs where its partial derivatives are zero, that is,
$$\frac{\partial D_{area}(a, b_m)}{\partial b_m} = 0, \qquad \frac{\partial D_{area}(a, b_m)}{\partial a} = 0$$
which gives the coefficients
$$A = \sum_{m=1}^{M}\sum_{n=1}^{N} n^2 - \frac{1}{N}\sum_{m=1}^{M}\left(\sum_{n=1}^{N} n\right)^2$$
$$B = 2\sum_{m=1}^{M}\sum_{n=1}^{N} n\,y_{m,n} - \frac{2}{N}\sum_{m=1}^{M}\left[\left(\sum_{n=1}^{N} n\right)\left(\sum_{n=1}^{N} y_{m,n}\right)\right]$$
$$C = \sum_{m=1}^{M}\sum_{n=1}^{N} y_{m,n}^2 - \frac{1}{N}\sum_{m=1}^{M}\left(\sum_{n=1}^{N} y_{m,n}\right)^2$$
There are two possible slopes:
$$a_1 = \frac{1}{B}\left(A - C + \sqrt{(A-C)^2 + B^2}\right)$$
$$a_2 = \frac{1}{B}\left(A - C - \sqrt{(A-C)^2 + B^2}\right)$$
If $B < 0$, the minimum sum of squared distances is calculated and the discrimination parameter is:
$$D_{area} = \frac{Aa_1^2 + Ba_1 + C}{a_1^2 + 1}$$
If $B > 0$, the minimum sum of squared distances is calculated and the discrimination parameter is:
$$D_{area} = \frac{Aa_2^2 + Ba_2 + C}{a_2^2 + 1}.$$
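The step from the stationarity conditions to the coefficients can be spelled out by eliminating the offsets first. The following is a sketch of that derivation; the shorthand $\bar{n}$ and $\bar{y}_m$ is introduced here for brevity and does not appear in the original text.

```latex
\begin{aligned}
\frac{\partial D_{area}}{\partial b_m} = 0
  &\;\Rightarrow\; b_m = -\left(a\,\bar{n} + \bar{y}_m\right),
  \qquad \bar{n} = \frac{1}{N}\sum_{n=1}^{N} n,
  \quad \bar{y}_m = \frac{1}{N}\sum_{n=1}^{N} y_{m,n} \\
D_{area}(a)
  &= \frac{1}{a^2+1}\sum_{m=1}^{M}\sum_{n=1}^{N}
     \bigl(a\,(n-\bar{n}) + (y_{m,n}-\bar{y}_m)\bigr)^2
   = \frac{Aa^2 + Ba + C}{a^2+1} \\
A &= \sum_{m=1}^{M}\sum_{n=1}^{N} (n-\bar{n})^2, \quad
B  = 2\sum_{m=1}^{M}\sum_{n=1}^{N} (n-\bar{n})(y_{m,n}-\bar{y}_m), \quad
C  = \sum_{m=1}^{M}\sum_{n=1}^{N} (y_{m,n}-\bar{y}_m)^2 \\
\frac{\partial D_{area}}{\partial a} = 0
  &\;\Rightarrow\; B a^2 - 2(A-C)\,a - B = 0
  \;\Rightarrow\; a = \frac{(A-C) \pm \sqrt{(A-C)^2 + B^2}}{B}
\end{aligned}
```

Expanding the centred sums, for example $\sum_n (n-\bar{n})^2 = \sum_n n^2 - \frac{1}{N}\left(\sum_n n\right)^2$, gives the uncentred form of the coefficients, and the two roots of the quadratic are the two candidate slopes.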
Preferably, if the projection lines have different slopes, the minimum area of the sight-line projection is calculated as follows:
Let $L_m$ be a family of projection lines with different slopes. For the N points $(n, y_{m,n})$ on the m-th frequency band, the best-fit line $y = f(x)$ minimizes the sum of squared distances from the points to $L_m$, i.e. the squared error $|f(n) - y_{m,n}|^2$. The equation of this line is:
$$y = \bar{y}_m + a_m\,(x - \bar{x})$$
where
$$\bar{y}_m = \frac{1}{N}\sum_{n=1}^{N} y_{m,n}, \qquad \bar{x} = \frac{1}{N}\sum_{n=1}^{N} n$$
$$a_m = \frac{\overline{xy}_m - \bar{x}\,\bar{y}_m}{\overline{x^2} - \bar{x}^2} = \frac{N\left(\sum_{n=1}^{N} n\,y_{m,n}\right) - \left(\sum_{n=1}^{N} n\right)\left(\sum_{n=1}^{N} y_{m,n}\right)}{N\left(\sum_{n=1}^{N} n^2\right) - \left(\sum_{n=1}^{N} n\right)^2}$$
The sum of squared distances from the points $(n, y_{m,n})$ to $L_m$ is:
$$D_m = \sum_{n=1}^{N}\bigl(\bar{y}_m + a_m (n - \bar{x}) - y_{m,n}\bigr)^2 = A_m a_m^2 + B_m a_m + C_m$$
with coefficients
$$A_m = \sum_{n=1}^{N}\left(n - \frac{1}{N}\sum_{n=1}^{N} n\right)^2$$
$$B_m = 2\sum_{n=1}^{N}\left[\left(\frac{1}{N}\sum_{n=1}^{N} y_{m,n} - y_{m,n}\right)\left(n - \frac{1}{N}\sum_{n=1}^{N} n\right)\right]$$
$$C_m = \sum_{n=1}^{N}\left(\frac{1}{N}\sum_{n=1}^{N} y_{m,n} - y_{m,n}\right)^2$$
The $D_m$ of all sub-bands are summed into $D_{area}$ for the decision, with weighting applied at the same time: each band is weighted by the maximum value within that band, which gives the discrimination parameter:
$$D_{area} = \sum_{m=1}^{M}\left(D_m \max_{1 \le n \le N} y_{m,n}\right).$$
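Because $a_m$ is the least-squares slope, the per-band residual $D_m$ also has a compact closed form. The derivation below is a sketch; the centred sums $S_{xx}$, $S_{xy,m}$ and $S_{yy,m}$ are shorthand introduced here and are not notation from the original text.

```latex
\begin{aligned}
S_{xx} &= \sum_{n=1}^{N} (n-\bar{x})^2, \qquad
S_{xy,m} = \sum_{n=1}^{N} (n-\bar{x})(y_{m,n}-\bar{y}_m), \qquad
S_{yy,m} = \sum_{n=1}^{N} (y_{m,n}-\bar{y}_m)^2 \\
a_m &= \frac{S_{xy,m}}{S_{xx}} \\
D_m &= \sum_{n=1}^{N}\bigl(a_m (n-\bar{x}) - (y_{m,n}-\bar{y}_m)\bigr)^2
     = a_m^2 S_{xx} - 2 a_m S_{xy,m} + S_{yy,m}
     = S_{yy,m} - \frac{S_{xy,m}^2}{S_{xx}}
\end{aligned}
```

Read this way, $D_m$ is the part of the band's variation over time that a straight-line trend cannot explain, so a band whose log-domain level rises or falls steadily contributes little, while a band with an abrupt step contributes a lot.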
Compared with the prior art, the present invention uses time-frequency detection: the audio signal is converted into a two-dimensional time-frequency signal, the minimum area of the view-plane projection or sight-line projection is calculated, and a discrimination formula is used to decide whether the signal is transient or steady-state. This method detects transient signals more accurately and can therefore be used with a variety of audio encoders to improve audio coding quality.
Description of drawings
Fig. 1 shows typical signal analysis plots, where (a)-(f) are signal waveforms, (g)-(l) are the short-time spectrum surfaces of the signals, and (m)-(r) are schematic views of the minimum projection of the short-time spectrum surfaces.
Fig. 2 is a flowchart of the method for discriminating transient audio signals according to the present invention.
Fig. 3 is an analysis diagram of the projection problem for the m-th frequency band.
Fig. 4 is the waveform of the test signal.
Fig. 5 is the spectrogram of the test signal.
Fig. 6 compares the per-frame perceptual evaluation results of the audio obtained with the three encoder configurations.
Embodiment
Specific embodiments of the present invention are described below with reference to the accompanying drawings.
Please refer to Fig. 1, which shows typical signal analysis plots. It is difficult to decide a priori, by signal analysis, whether long-block or short-block coding is more efficient, but for signals in certain situations short-block coding always gives high efficiency. Fig. 1 extracts 2048 samples from each of several audio signals and, following the short-block partitioning of AAC, divides the central 1152 samples into 8 short blocks (a time resolution of 8). Spectrum analysis of these blocks gives the short-time spectrum surface of each signal, i.e. the surface of the time-frequency signal (log-domain short-block sub-band energy with base 2, with 13 frequency bands). The signals in (a) and (d) of Fig. 1 are easily identified as transient from their short-time energy, but (b) and (c) cannot be analyzed in this way; (e) is a typical steady-state signal, and (f) is hard to classify. When the signal is transformed into a short-time spectrum and viewed along the time axis at a certain angle, the short-time spectrum is projected onto a plane, and the angle is chosen so that the visible projected area is minimal. This projection is shown in (m) to (r) of Fig. 1. It is easy to see that transient signals have a large projection, whereas the projection of a steady-state signal is very small.
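The short-block segmentation described for Fig. 1 can be sketched as follows. This is an illustration under stated assumptions: it uses a sine window and an FFT power spectrum, whereas an AAC encoder uses an MDCT, and the function name `short_block_spectra` is hypothetical. With 256-sample blocks and a hop of 128 samples, 8 blocks span exactly the central 1152 samples of a 2048-sample frame.

```python
import numpy as np

def short_block_spectra(frame, block_len=256, hop=128, num_blocks=8):
    """Return a (block_len // 2, num_blocks) matrix of short-time power spectra
    taken from the centre of `frame` (hypothetical helper, FFT-based)."""
    frame = np.asarray(frame, dtype=float)
    span = hop * (num_blocks - 1) + block_len        # 1152 samples with the defaults
    if len(frame) < span:
        raise ValueError("frame is shorter than the analysed span")
    start = (len(frame) - span) // 2                 # 448 for a 2048-sample frame
    # Sine window (an assumption; the text does not name the window shape).
    window = np.sin(np.pi * (np.arange(block_len) + 0.5) / block_len)
    columns = []
    for i in range(num_blocks):
        begin = start + i * hop
        segment = frame[begin:begin + block_len] * window
        spectrum = np.fft.rfft(segment)[: block_len // 2]   # drop the Nyquist bin
        columns.append(np.abs(spectrum) ** 2)
    return np.stack(columns, axis=1)                 # rows: frequency lines, columns: time blocks

rng = np.random.default_rng(0)
X = short_block_spectra(rng.standard_normal(2048))
print(X.shape)
```

The result is a uniform time-frequency matrix with 128 frequency lines and 8 time blocks; grouping the lines into 13 bands and taking the base-2 logarithm is a separate step.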
Please refer to Fig. 2, the flowchart of the method for discriminating transient audio signals according to the present invention. The method comprises the following steps:
Step 201: obtain the log-domain time-frequency component matrix y of size M × N,
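$$y = \begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,N} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ y_{M,1} & y_{M,2} & \cdots & y_{M,N} \end{bmatrix}$$
(formula 1)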
where $m = 1, 2, \ldots, M$ indexes the frequency band and $n = 1, 2, \ldots, N$ indexes the time block; M is the number of frequency bands and N is the time resolution in lines. The time axis of the log-domain time-frequency component matrix y is required to be uniform, i.e. $y_{m,n-1}$, $y_{m,n}$, $y_{m,n+1}$ are equally spaced in time. Its frequency axis is non-uniform: the bandwidths represented by $y_{m-1,n}$, $y_{m,n}$, $y_{m+1,n}$ are required to increase gradually, so that low indices m have higher frequency resolution (narrow bands) and high indices m have lower frequency resolution (wide bands). The frequency resolution is determined by the sampling frequency of the signal and by the length and shape of the spectrum-analysis window. In general, frequency resolution = sampling frequency / number of spectral lines / 2 × (a coefficient less than 1), where the coefficient less than 1 is determined by the window shape.
The log-domain time-frequency component matrix y can be obtained directly by taking the logarithm of the non-uniform time-frequency components (absolute value or energy) output by a non-uniform short-time time-frequency transform or sub-band filter bank. It can also be obtained by applying a mapping transform and a logarithm to the uniform time-frequency component matrix X produced by a uniform short-time time-frequency transform or sub-band filter bank. The uniform time-frequency component matrix obtained from the uniform short-time time-frequency transform or sub-band filter bank is
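$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,L} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,L} \\ \vdots & \vdots & \ddots & \vdots \\ x_{K,1} & x_{K,2} & \cdots & x_{K,L} \end{bmatrix}$$
(formula 2)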
X has size K × L. K is the number of frequency lines, corresponding to the K spectral lines of the short-time transform or the K sub-bands of the sub-band filter; L is the time resolution in lines, corresponding to the L blocks of the short-time transform or the L outputs of the sub-band filter. The mapping transform is uniform along the time axis, which requires L to be an integer multiple of N; let $T = L/N$. The mapping transform is non-uniform along the frequency axis: it divides the K frequency lines from low to high into M bands containing $[w_1, w_2, w_3, \ldots, w_M]$ lines each, with $w_1 \le w_2 \le w_3 \le \cdots \le w_M$; the corresponding band boundaries are $[b_1, b_2, b_3, \ldots, b_{M+1}]$ with $b_1 = 1$.
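$$x_{sub} = \begin{bmatrix} x_{b_m,(n-1)T+1} & \cdots & x_{b_m,nT} \\ \vdots & \ddots & \vdots \\ x_{b_{m+1}-1,(n-1)T+1} & \cdots & x_{b_{m+1}-1,nT} \end{bmatrix}$$
(formula 3)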
The mapping transform merges the $x_{sub}$ of formula 3 into a single $y_{m,n}$. The merging method can be any one of the following: energy sum, energy mean, absolute-amplitude sum, absolute-amplitude mean, or absolute-amplitude maximum. These methods can also be applied alternately across rows and columns, for example by taking the maximum along each row and then averaging the results. Finally, the merged value is converted to the log domain.
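The mapping transform can be sketched in a few lines of NumPy. The function name `map_to_log_bands`, the string labels for the merging methods and the small floor `eps` that guards the logarithm against zero are assumptions made for this illustration; band boundaries are kept 0-based internally, so `edges[m]` plays the role of $b_{m+1} - 1$.

```python
import numpy as np

def map_to_log_bands(X, band_widths, N, merge="energy_sum", base=2.0, eps=1e-12):
    """Merge a uniform K x L matrix X into an M x N log-domain matrix y."""
    X = np.asarray(X, dtype=float)
    K, L = X.shape
    if L % N != 0:
        raise ValueError("L must be an integer multiple of N")
    if sum(band_widths) != K:
        raise ValueError("band widths must add up to K")
    if any(w2 < w1 for w1, w2 in zip(band_widths, band_widths[1:])):
        raise ValueError("band widths must be non-decreasing")
    T = L // N                                            # time lines merged per output block
    edges = np.concatenate(([0], np.cumsum(band_widths)))  # 0-based band boundaries
    merges = {
        "energy_sum": lambda s: np.sum(s ** 2),
        "energy_mean": lambda s: np.mean(s ** 2),
        "abs_sum": lambda s: np.sum(np.abs(s)),
        "abs_mean": lambda s: np.mean(np.abs(s)),
        "abs_max": lambda s: np.max(np.abs(s)),
    }
    if merge not in merges:
        raise ValueError("unknown merge method: " + merge)
    M = len(band_widths)
    y = np.empty((M, N))
    for m in range(M):
        for n in range(N):
            x_sub = X[edges[m]:edges[m + 1], n * T:(n + 1) * T]
            # eps is an assumed floor so that silent regions do not produce log(0)
            y[m, n] = np.log(merges[merge](x_sub) + eps) / np.log(base)
    return y

y = map_to_log_bands(np.ones((6, 4)), band_widths=[1, 2, 3], N=2)
print(y)
```

Here six frequency lines are grouped into bands of 1, 2 and 3 lines and four time lines into two blocks, so the energy sums are 2, 4 and 6 before the base-2 logarithm is taken.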
Step 202: according to the projection principle, calculate the minimum area of the view-plane projection or sight-line projection of the time-frequency component matrix y.
Please refer to Fig. 3, the analysis diagram of the projection problem for the m-th frequency band. Consider the projection of the N points of the m-th band. These points lie in the plane formed by the time axis and the amplitude axis, and there is a projection line in this plane with the N points distributed on both sides of it. The slope and offset of the line are adjusted so that the maximum distance from the line to a point on one side, plus the maximum distance to a point on the other side, is minimal. The projection plane is the plane whose normal is this projection line, and the projected length of the N points on that plane (the longest distance between projected points) is then also minimal. Because this minimum distance is difficult to solve for, the index is approximated by the least sum of squared distances from the points to the projection line. The M frequency bands have M projection lines. Depending on the slopes of the projection lines, there are two cases: if all projection lines are required to have the same slope, the minimum area of the view-plane projection is calculated; if each projection line has its own slope, the minimum area of the sight-line projection is calculated. The two cases are discussed in turn:
(1) Calculating the minimum area of the view-plane projection:
Let $L_m$ be a family of projection lines with the same slope, written $ax + y + b_m = 0$, where $a$ is the slope and $b_m$ is the offset. The sum of squared distances from the points $(n, y_{m,n})$ to $L_m$ is
$$D_{area}(a, b_m) = \sum_{m=1}^{M}\sum_{n=1}^{N}\frac{(an + y_{m,n} + b_m)^2}{a^2+1} = \frac{Aa^2 + Ba + C}{a^2+1}$$
(formula 4)
The minimum of this sum of squared distances is then sought.
The minimum of $D_{area}(a, b_m)$ occurs where its partial derivatives are zero, that is,
$$\frac{\partial D_{area}(a, b_m)}{\partial b_m} = 0, \qquad \frac{\partial D_{area}(a, b_m)}{\partial a} = 0$$
(formula 5)
which gives the coefficients
$$A = \sum_{m=1}^{M}\sum_{n=1}^{N} n^2 - \frac{1}{N}\sum_{m=1}^{M}\left(\sum_{n=1}^{N} n\right)^2$$
$$B = 2\sum_{m=1}^{M}\sum_{n=1}^{N} n\,y_{m,n} - \frac{2}{N}\sum_{m=1}^{M}\left[\left(\sum_{n=1}^{N} n\right)\left(\sum_{n=1}^{N} y_{m,n}\right)\right]$$
$$C = \sum_{m=1}^{M}\sum_{n=1}^{N} y_{m,n}^2 - \frac{1}{N}\sum_{m=1}^{M}\left(\sum_{n=1}^{N} y_{m,n}\right)^2$$
(formula 6)
There are two possible slopes:
$$a_1 = \frac{1}{B}\left(A - C + \sqrt{(A-C)^2 + B^2}\right)$$
$$a_2 = \frac{1}{B}\left(A - C - \sqrt{(A-C)^2 + B^2}\right)$$
(formula 7)
If $B < 0$, the minimum sum of squared distances is
$$D_{area} = \frac{Aa_1^2 + Ba_1 + C}{a_1^2 + 1}$$
(formula 8)
If $B > 0$, the minimum sum of squared distances is
$$D_{area} = \frac{Aa_2^2 + Ba_2 + C}{a_2^2 + 1}$$
(formula 9)
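The view-plane computation can be sketched as below. The function names are hypothetical, and the coefficients are computed from centred sums, which is algebraically the same as subtracting the $\frac{1}{N}(\cdot)$ terms in formula 6 once A is read as a sum of squared time indices. Instead of branching on the sign of B, the sketch evaluates both stationary slopes of formula 7 and keeps the smaller value; that is a choice made here so that the returned number is the minimum whichever root applies.

```python
import numpy as np

def projection_coefficients(y):
    """Coefficients A, B, C of (A a^2 + B a + C) / (a^2 + 1) for an M x N matrix y."""
    y = np.asarray(y, dtype=float)
    M, N = y.shape
    n = np.arange(1, N + 1, dtype=float)
    nc = n - n.mean()                        # centred time index
    yc = y - y.mean(axis=1, keepdims=True)   # centring each band eliminates the offset b_m
    A = M * np.sum(nc ** 2)
    B = 2.0 * np.sum(yc * nc)
    C = np.sum(yc ** 2)
    return A, B, C

def view_plane_min_area(y):
    """Smallest sum of squared distances to parallel lines a*x + y + b_m = 0."""
    A, B, C = projection_coefficients(y)
    if B == 0.0:
        # Degenerate case: the objective runs between C (a = 0) and A (a -> infinity).
        return min(A, C)
    root = np.hypot(A - C, B)
    candidates = [(A - C + root) / B, (A - C - root) / B]   # a_1 and a_2
    return min((A * a * a + B * a + C) / (a * a + 1.0) for a in candidates)

steady = np.tile(0.5 * np.arange(1, 9), (3, 1))        # every band follows the same ramp
step = np.tile([0, 0, 0, 0, 4, 4, 4, 4], (3, 1))       # abrupt jump in every band
print(view_plane_min_area(steady), view_plane_min_area(step))
```

A common ramp lies exactly on a set of parallel lines and gives a value of about zero, while the step cannot be fitted by any straight line and gives a clearly positive value.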
(2) Calculating the minimum area of the sight-line projection:
Let $L_m$ be a family of projection lines with different slopes. For the N points $(n, y_{m,n})$ on the m-th frequency band, the best-fit line $y = f(x)$ minimizes the sum of squared distances from the points to $L_m$, i.e. the squared error $|f(n) - y_{m,n}|^2$. The equation of this line is
$$y = \bar{y}_m + a_m\,(x - \bar{x})$$
(formula 10)
where
$$\bar{y}_m = \frac{1}{N}\sum_{n=1}^{N} y_{m,n}, \qquad \bar{x} = \frac{1}{N}\sum_{n=1}^{N} n$$
(formula 11)
$$a_m = \frac{\overline{xy}_m - \bar{x}\,\bar{y}_m}{\overline{x^2} - \bar{x}^2} = \frac{N\left(\sum_{n=1}^{N} n\,y_{m,n}\right) - \left(\sum_{n=1}^{N} n\right)\left(\sum_{n=1}^{N} y_{m,n}\right)}{N\left(\sum_{n=1}^{N} n^2\right) - \left(\sum_{n=1}^{N} n\right)^2}$$
(formula 12)
The sum of squared distances from the points $(n, y_{m,n})$ to $L_m$ is
$$D_m = \sum_{n=1}^{N}\bigl(\bar{y}_m + a_m (n - \bar{x}) - y_{m,n}\bigr)^2 = A_m a_m^2 + B_m a_m + C_m$$
(formula 13)
with coefficients
$$A_m = \sum_{n=1}^{N}\left(n - \frac{1}{N}\sum_{n=1}^{N} n\right)^2$$
$$B_m = 2\sum_{n=1}^{N}\left[\left(\frac{1}{N}\sum_{n=1}^{N} y_{m,n} - y_{m,n}\right)\left(n - \frac{1}{N}\sum_{n=1}^{N} n\right)\right]$$
$$C_m = \sum_{n=1}^{N}\left(\frac{1}{N}\sum_{n=1}^{N} y_{m,n} - y_{m,n}\right)^2$$
(formula 14)
The $D_m$ of all sub-bands are summed into $D_{area}$ as the discrimination parameter, with energy or amplitude weighting applied at the same time: each band is weighted by the maximum (log-domain) energy value or amplitude value within that band, which gives the discrimination parameter:
$$D_{area} = \sum_{m=1}^{M}\left(D_m \max_{1 \le n \le N} y_{m,n}\right)$$
(formula 15)
If the merging method used in the mapping transform of the uniform time-frequency component matrix in step 201 is energy sum or energy mean, the weighting in formula 15 is applied to energy, and the maximum refers to the maximum log-domain energy value. If the merging method used in step 201 is absolute-amplitude sum, absolute-amplitude mean or absolute-amplitude maximum, the weighting in formula 15 is applied to amplitude, and the maximum refers to the maximum log-domain amplitude value.
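The sight-line case fits one least-squares line per band, so it can be sketched with a few vectorized NumPy operations. The function name is hypothetical, and the weighting is read here as a product of $D_m$ and the band maximum, which is an interpretation of formula 15. Note that log-domain maxima can be negative for very quiet bands; the text does not say how such weights should be handled, and the sketch leaves them unchanged.

```python
import numpy as np

def sight_line_min_area(y):
    """Return (D_area, D_m) for an M x N log-domain matrix y, fitting one
    least-squares line per frequency band (requires N >= 2)."""
    y = np.asarray(y, dtype=float)
    M, N = y.shape
    n = np.arange(1, N + 1, dtype=float)
    nc = n - n.mean()                                  # n - x_bar
    yc = y - y.mean(axis=1, keepdims=True)             # y_{m,n} - y_bar_m
    slopes = (yc @ nc) / np.sum(nc ** 2)               # a_m for every band
    residual = yc - slopes[:, None] * nc[None, :]      # deviation from the fitted line
    D_m = np.sum(residual ** 2, axis=1)
    weights = np.max(y, axis=1)                        # maximum log-domain value per band
    return float(np.sum(D_m * weights)), D_m

y_demo = np.array([[1.0, 2.0, 3.0, 4.0],    # steady ramp: fitted exactly
                   [0.0, 0.0, 4.0, 4.0]])   # step: cannot be fitted by a line
d_area, d_m = sight_line_min_area(y_demo)
print(d_area, d_m)
```

For the demo matrix the ramp contributes nothing, and the step band has a residual of 3.2 weighted by its maximum of 4, for a total of 12.8.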
Step 203: use the discrimination formula to distinguish transient signals from steady-state signals:
Because short-block coding is very inefficient for signals whose low-frequency components fluctuate strongly, the discrimination of transient audio signals needs to take the ratio of the DC component to the AC component into account. Let DC be the energy or amplitude of the DC component and AC the energy or amplitude of the AC component. The discrimination formula is then:
$$AC \cdot D_{area} > DC \cdot Thr$$
(formula 16)
If formula 16 is satisfied, the signal is a transient signal; otherwise it is a steady-state signal. The discrimination threshold Thr can be a preset value, or it can be determined jointly from M, N and the logarithm base used when taking the logarithm of the time-frequency signal y in formula 1. In general, one can compute the long-term average $y_a$ of the sum of squares of all elements of y (i.e. the average, over multiple frames, of the sum of squares of all elements of y). For fixed M and N, a fixed logarithm base and a fixed amplitude range of the input signal, $y_a$ is approximately fixed and can be treated as a constant. The threshold is then $Thr = \gamma\,y_a$, where the coefficient $\gamma$ is generally set between 1% and 5% and fine-tuned as needed; $\gamma$ represents the upper limit, as a percentage of the overall energy, on the fluctuation energy that a steady-state signal is allowed to have.
If the merging method used in the mapping transform of the uniform time-frequency component matrix in step 201 is energy sum or energy mean, formula 16 uses the energy values of the DC and AC components. If the merging method used in step 201 is absolute-amplitude sum, absolute-amplitude mean or absolute-amplitude maximum, formula 16 uses the amplitude values of the DC and AC components.
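The decision of formula 16 together with the adaptive threshold can be sketched as a small helper class. The class name, its methods and the use of a plain running mean for $y_a$ are assumptions made for this illustration. The text does not specify how DC and AC are measured, so they are left as inputs supplied by the caller.

```python
import numpy as np

class TransientDiscriminator:
    """Hypothetical helper applying AC * D_area > DC * Thr with Thr = gamma * y_a."""

    def __init__(self, gamma=0.03):
        if not 0.0 < gamma < 1.0:
            raise ValueError("gamma is expected to be a small fraction, e.g. 0.01 to 0.05")
        self.gamma = gamma
        self._total = 0.0     # running sum of per-frame sums of squares
        self._frames = 0      # number of frames seen so far

    def update_average(self, y):
        """Fold one frame's matrix y into the long-term average y_a and return it."""
        self._total += float(np.sum(np.asarray(y, dtype=float) ** 2))
        self._frames += 1
        return self.y_a

    @property
    def y_a(self):
        if self._frames == 0:
            raise ValueError("no frames seen yet; call update_average first")
        return self._total / self._frames

    def is_transient(self, ac, dc, d_area):
        """Evaluate formula 16 for caller-supplied AC, DC and D_area."""
        threshold = self.gamma * self.y_a
        return bool(ac * d_area > dc * threshold)

disc = TransientDiscriminator(gamma=0.05)
disc.update_average(np.ones((2, 2)))          # sum of squares = 4
disc.update_average(2 * np.ones((2, 2)))      # sum of squares = 16, so y_a = 10
print(disc.y_a, disc.is_transient(ac=1.0, dc=1.0, d_area=1.0))
```

With $\gamma = 0.05$ and $y_a = 10$ the threshold is 0.5, so a frame with AC = DC and $D_{area} = 1$ is classified as transient, whereas a frame whose DC component is four times larger is not.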
To measure the improvement brought by the transient discrimination method of the present invention, an objective encoder quality evaluation was carried out. The test uses an AAC encoder whose block-switching decision is made either by perceptual entropy or by the method of the present invention. The test tool is based on the perceptual evaluation of audio quality (PEAQ) described in the International Telecommunication Union (ITU) standard ITU-R BS.1387. The noise-to-mask ratio (NMR) index provided by PEAQ is measured on a dB scale. NMR is a noise measure that takes human auditory physiology and psychoacoustics into account; it is a weighted noise measure that represents the average distance between the encoder's quantization noise and the masking curve. If the coding quality is good and the sound quality is high, NMR is small; otherwise it is large.
The test signal is the typical song "gspi35_2" taken from the sound quality assessment material database compiled by the European Broadcasting Union. The song contains a large number of transient signals. Some are energy jumps in the time domain (see the audio waveform in Fig. 4) and some are frequency hops (see the audio spectrum in Fig. 5), so the situation is complex. Three configurations are tested. The first does not use transient detection, and the encoder uses long-block coding throughout. The second uses perceptual entropy to detect transients and perform block switching. The third uses the transient discrimination method provided by the present invention to perform block switching. The per-frame perceptual evaluation results (NMR) of the audio obtained with the three configurations are shown in Fig. 6, where (a), (b) and (c) correspond to configurations one, two and three respectively. The test results show clearly that for the encoder without block switching the audio quality is poor: at the moments when transient signals occur, the NMR is large and exceeds 0 dB most of the time. With the perceptual-entropy decision the audio quality improves and transients are detected at some moments, but the accuracy is not high and the false-alarm rate is also high. Although NMR improves, it actually rises where misjudgments occur, and overall a large number of frames still have an NMR above 0 dB. With the transient discrimination method provided by the present invention, the detection accuracy improves markedly, block switching brings a clear gain in the coding quality of the encoder, and the NMR is below 0 dB at all times.
Compared with the prior art, the present invention uses time-frequency detection: the audio signal is converted into a two-dimensional time-frequency signal, the minimum area of the view-plane projection or sight-line projection is calculated, and a discrimination formula is used to decide whether the signal is transient or steady-state. This method detects transient signals more accurately and can therefore be used with a variety of audio encoders to improve audio coding quality.
It will be understood that those of ordinary skill in the art may make equivalent substitutions or changes according to the technical solution and inventive concept of the present invention, and all such changes or substitutions shall fall within the scope of protection of the appended claims.

Claims (18)

1. A method for discriminating transient audio signals, characterized in that the method comprises the following steps:
Step 1: obtain the log-domain time-frequency component matrix y;
Step 2: according to the principle of projection, calculate the minimum area $D_{area}$ of the view plane projection or sight line projection of the time-frequency component matrix;
Step 3: using the minimum area $D_{area}$ of the view plane projection or sight line projection, distinguish transient signals from steady-state signals with the discrimination formula $AC \times D_{area} > DC \times Thr$, where DC is the energy value or amplitude value of the DC component, AC is the energy value or amplitude value of the AC component, and Thr is the discrimination threshold; if the discrimination formula holds, the signal is a transient signal, otherwise it is a steady-state signal.
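For illustration, once AC, DC, $D_{area}$ and Thr are available, step 3 reduces to a single comparison. The following is a minimal Python sketch, assuming the comparison in the discrimination formula is "greater than"; the function name `is_transient` and its argument names are hypothetical and do not appear in the patent.

```python
def is_transient(ac: float, dc: float, d_area: float, thr: float) -> bool:
    """Return True when AC * D_area > DC * Thr (assumed form of the rule).

    ac, dc : energy or amplitude values of the AC and DC components
    d_area : minimum projection area from step 2
    thr    : discrimination threshold
    """
    return ac * d_area > dc * thr
```

A frame for which the function returns `False` would be treated as steady-state.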
2. The method for discriminating transient audio signals according to claim 1, characterized in that the log-domain time-frequency component matrix y is given by:
Figure FDA00003138789000011
where m = 1, 2, ..., M denotes the m-th frequency band and n = 1, 2, ..., N denotes the n-th time block; M is the number of frequency bands and N is the time resolution, i.e. the number of time lines. The time axis of the log-domain time-frequency component matrix y is required to be uniform, i.e. $y_{m,n-1}$, $y_{m,n}$, $y_{m,n+1}$ are separated by equal time intervals; the frequency axis is non-uniform, and the bandwidths of the bands represented by $y_{m-1,n}$, $y_{m,n}$, $y_{m+1,n}$ are required to increase progressively.
3. The method for discriminating transient audio signals according to claim 1, characterized in that the log-domain time-frequency component matrix y is obtained directly by taking the logarithm of the absolute value or the energy of the non-uniform time-frequency components output by a non-uniform short-time time-frequency transform or subband filter bank.
4. The method for discriminating transient audio signals according to claim 1, characterized in that the log-domain time-frequency component matrix y is obtained by applying a mapping transformation and a logarithmic transformation to the uniform time-frequency component matrix x obtained from a uniform short-time time-frequency transform or subband filter bank, the uniform time-frequency component matrix x being:
Figure FDA00003138789000021
The size of x is K × L, where K is the number of frequency lines, corresponding to the K spectral lines of the short-time transform or the K subbands of the subband filter, and L is the time resolution, i.e. the number of time lines, corresponding to the L blocks of the short-time transform or the L outputs of the subband filter.
5. The method for discriminating transient audio signals according to claim 4, characterized in that the mapping transformation is uniform along the time axis and non-uniform along the frequency axis.
6. The method for discriminating transient audio signals according to claim 4, characterized in that the mapping transformation merges a part $x_{sub}$ of the uniform time-frequency component matrix x into a single $y_{m,n}$, where $x_{sub}$ is given by:
Figure FDA00003138789000022
where T = L/N and L is an integer multiple of N; the K frequency lines are divided into M bands from low to high, the numbers of frequency lines in the bands are $[w_1, w_2, w_3, \ldots, w_M]$ with $w_1 \le w_2 \le w_3 \le \ldots \le w_M$, and the corresponding band boundaries are $[b_1, b_2, b_3, \ldots, b_{M+1}]$ with $b_1 = 1$.
7. The method for discriminating transient audio signals according to claim 6, characterized in that the merging method is one of energy sum, energy mean, absolute amplitude sum, absolute amplitude mean and absolute amplitude maximum, or several of these methods used alternately between rows and columns.
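For illustration, the mapping of claims 4 to 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the sub-block $x_{sub}$ is assumed to span the frequency lines of band m and the T consecutive time columns of block n (the defining figure is not reproduced here), the logarithm base 10 and the small offset `eps` are arbitrary choices, and the function name `map_to_log_tf` and the merge-method labels are hypothetical. Alternating merge methods between rows and columns is not covered.

```python
import math

def map_to_log_tf(x, widths, n_blocks, merge="energy_sum", eps=1e-12):
    """Merge a uniform K x L matrix x (list of rows) into a log-domain M x N matrix y."""
    k_lines, l_cols = len(x), len(x[0])
    if any(w < 1 for w in widths) or sum(widths) != k_lines:
        raise ValueError("band widths must be positive and sum to K")
    if any(a > b for a, b in zip(widths, widths[1:])):
        raise ValueError("band widths must be non-decreasing (w_1 <= ... <= w_M)")
    if l_cols % n_blocks != 0:
        raise ValueError("L must be an integer multiple of N")
    t = l_cols // n_blocks  # T = L / N columns per time block

    mergers = {
        "energy_sum": lambda v: sum(a * a for a in v),
        "energy_mean": lambda v: sum(a * a for a in v) / len(v),
        "abs_sum": sum,
        "abs_mean": lambda v: sum(v) / len(v),
        "abs_max": max,
    }
    combine = mergers[merge]

    y, low = [], 0
    for w in widths:                      # one output row per frequency band
        rows = x[low:low + w]
        out_row = []
        for n in range(n_blocks):         # one output column per time block
            # Assumed x_sub: band rows crossed with T consecutive columns.
            block = [abs(v) for r in rows for v in r[n * t:(n + 1) * t]]
            out_row.append(math.log10(combine(block) + eps))
        y.append(out_row)
        low += w
    return y
```

For example, a 3 × 4 matrix of ones merged with `widths=[1, 2]` and `n_blocks=2` gives a 2 × 2 matrix whose rows hold approximately log10(2) and log10(4) under the energy-sum method.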
8. The method for discriminating transient audio signals according to claim 1, characterized in that the minimum area of the view plane projection or sight line projection of the time-frequency component matrix y is calculated using, as an approximation, the least sum of squared distances from the points to the projection lines.
9. The method for discriminating transient audio signals according to claim 1, characterized in that, if all projection lines have the same slope, the minimum area of the view plane projection is calculated.
10. The method for discriminating transient audio signals according to claim 1, characterized in that, if the projection lines each have a different slope, the minimum area of the sight line projection is calculated.
11. The method for discriminating transient audio signals according to claim 9, characterized in that the minimum area of the view plane projection is calculated as follows:
$L_m$ is a series of projection lines with the same slope, expressed as $ax + y + b_m = 0$, where a is the slope and $b_m$ is the offset; the sum of squared distances from the points $(n, y_{m,n})$ to $L_m$ is:
$$D_{area}(a, b_m) = \sum_{m=1}^{M} \sum_{n=1}^{N} \frac{(an + y_{m,n} + b_m)^2}{a^2 + 1} = \frac{Aa^2 + Ba + C}{a^2 + 1}$$
where m = 1, 2, ..., M denotes the m-th frequency band and n = 1, 2, ..., N denotes the n-th time block; M is the number of frequency bands and N is the time resolution, i.e. the number of time lines;
the minimum of this sum of squared distances is then solved for;
the minimum of $D_{area}(a, b_m)$ occurs where its partial derivatives are zero, i.e.
$$\frac{\partial D_{area}(a, b_m)}{\partial b_m} = 0, \qquad \frac{\partial D_{area}(a, b_m)}{\partial a} = 0$$
which gives the coefficients
$$A = \sum_{m=1}^{M} \sum_{n=1}^{N} n^2 - \frac{1}{N} \sum_{m=1}^{M} \left( \sum_{n=1}^{N} n \right)^2$$
$$B = 2 \sum_{m=1}^{M} \sum_{n=1}^{N} n\, y_{m,n} - \frac{2}{N} \sum_{m=1}^{M} \left[ \left( \sum_{n=1}^{N} n \right) \left( \sum_{n=1}^{N} y_{m,n} \right) \right]$$
$$C = \sum_{m=1}^{M} \sum_{n=1}^{N} y_{m,n}^2 - \frac{1}{N} \sum_{m=1}^{M} \left( \sum_{n=1}^{N} y_{m,n} \right)^2$$
The slope has two possible values:
$$a_1 = \frac{1}{B} \left( A - C + \sqrt{(A - C)^2 + B^2} \right)$$
$$a_2 = \frac{1}{B} \left( A - C - \sqrt{(A - C)^2 + B^2} \right)$$
If B < 0, the minimum sum of squared distances is calculated, and the minimum area of the view plane projection is:
$$D_{area} = \frac{A a_1^2 + B a_1 + C}{a_1^2 + 1}$$
If B > 0, the minimum sum of squared distances is calculated, and the minimum area of the view plane projection is:
$$D_{area} = \frac{A a_2^2 + B a_2 + C}{a_2^2 + 1}.$$
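For illustration, the view plane computation of claim 11 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the function name `view_plane_min_area` is hypothetical, the first term of A is taken as the sum of $n^2$ (which is what expanding the squared distance gives), and instead of branching on the sign of B the sketch evaluates both candidate slopes and keeps the smaller result. The case B = 0, where neither slope formula is defined, is handled separately as an assumption.

```python
import math

def view_plane_min_area(y):
    """Minimum sum of squared distances from the points (n, y[m][n-1])
    to parallel lines a*x + y + b_m = 0, one line per band m."""
    n_blocks = len(y[0])
    times = range(1, n_blocks + 1)
    sum_n = sum(times)
    sum_n2 = sum(n * n for n in times)

    a_coef = b_coef = c_coef = 0.0
    for row in y:  # accumulate A, B, C band by band
        sum_y = sum(row)
        a_coef += sum_n2 - sum_n ** 2 / n_blocks
        b_coef += 2.0 * (sum(n * v for n, v in zip(times, row))
                         - sum_n * sum_y / n_blocks)
        c_coef += sum(v * v for v in row) - sum_y ** 2 / n_blocks

    def area(a):
        return (a_coef * a * a + b_coef * a + c_coef) / (a * a + 1.0)

    if b_coef == 0.0:
        # Assumption: without a cross term the extremes are a = 0 (value C)
        # and |a| -> infinity (value A).
        return min(a_coef, c_coef)

    root = math.hypot(a_coef - c_coef, b_coef)  # sqrt((A - C)^2 + B^2)
    candidates = ((a_coef - c_coef + root) / b_coef,
                  (a_coef - c_coef - root) / b_coef)
    return min(area(a) for a in candidates)
```

Bands whose log-domain values follow parallel straight lines over time give an area of zero, while an isolated jump in one band gives a strictly positive area.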
12. The method for discriminating transient audio signals according to claim 10, characterized in that the minimum area of the sight line projection is calculated as follows:
$L_m$ is a series of projection lines with different slopes; for the N points $(n, y_{m,n})$ in the m-th frequency band, the best-fit line $y = f(x)$ minimizes the sum of squared distances from the points to $L_m$, i.e. the squared error $|f(n) - y_{m,n}|^2$; the algebraic equation of this line is:
$$y = \bar{y}_m + a_m (x - \bar{x})$$
where
$$\bar{y}_m = \frac{1}{N} \sum_{n=1}^{N} y_{m,n}, \qquad \bar{x} = \frac{1}{N} \sum_{n=1}^{N} n$$
$$a_m = \frac{\overline{xy}_m - \bar{x}\,\bar{y}_m}{\overline{x^2} - \bar{x}^2} = \frac{N \left( \sum_{n=1}^{N} n\, y_{m,n} \right) - \left( \sum_{n=1}^{N} n \right) \left( \sum_{n=1}^{N} y_{m,n} \right)}{N \left( \sum_{n=1}^{N} n^2 \right) - \left( \sum_{n=1}^{N} n \right)^2}$$
The sum of squared distances from the points $(n, y_{m,n})$ to $L_m$ is:
$$D_m = \sum_{n=1}^{N} \left( \bar{y}_m + a_m (n - \bar{x}) - y_{m,n} \right)^2$$
$$= A_m a_m^2 + B_m a_m + C_m$$
with the coefficients
$$A_m = \sum_{n=1}^{N} \left( n - \frac{1}{N} \sum_{n=1}^{N} n \right)^2$$
$$B_m = 2 \sum_{n=1}^{N} \left[ \left( \frac{1}{N} \sum_{n=1}^{N} y_{m,n} - y_{m,n} \right) \left( n - \frac{1}{N} \sum_{n=1}^{N} n \right) \right]$$
$$C_m = \sum_{n=1}^{N} \left( \frac{1}{N} \sum_{n=1}^{N} y_{m,n} - y_{m,n} \right)^2$$
The $D_m$ of all subbands are summed into $D_{area}$ for use in the decision and are weighted at the same time, each band being weighted by the maximum value of the log-domain time-frequency component matrix y in that band on the frequency axis, so that the minimum area of the sight line projection is:
$$D_{area} = \sum_{m=1}^{M} \left( D_m \max_{n=1,\ldots,N} \left( y_{m,n} \right) \right).$$
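For illustration, the sight line computation of claim 12 can be sketched in Python as follows. This is a minimal sketch: the function name `sight_line_min_area` is hypothetical, and $D_m$ is computed directly as the sum of squared residuals of the per-band least-squares line rather than through the coefficients $A_m$, $B_m$, $C_m$.

```python
def sight_line_min_area(y):
    """Sum over bands of the least-squares residual D_m, each weighted by
    the largest log-domain value in that band."""
    n_blocks = len(y[0])
    times = range(1, n_blocks + 1)
    x_mean = sum(times) / n_blocks
    sxx = sum((n - x_mean) ** 2 for n in times)

    total = 0.0
    for row in y:
        y_mean = sum(row) / n_blocks
        sxy = sum((n - x_mean) * (v - y_mean) for n, v in zip(times, row))
        slope = sxy / sxx if sxx > 0 else 0.0  # a_m; zero when N == 1
        d_m = sum((y_mean + slope * (n - x_mean) - v) ** 2
                  for n, v in zip(times, row))
        total += d_m * max(row)                # weight by max over n of y[m][n]
    return total
```

Each band is fitted with its own slope, so bands that rise or fall at different but constant rates still contribute nothing, whereas a band with an abrupt jump contributes its residual scaled by its peak value.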
13. The method for discriminating transient audio signals according to claim 12, characterized in that, if the merging method used when applying the mapping transformation to the uniform time-frequency component matrix is energy sum or energy mean, the weighting is applied to energy and the maximum value is the maximum log-domain energy value.
14. The method for discriminating transient audio signals according to claim 12, characterized in that, if the merging method used when applying the mapping transformation to the uniform time-frequency component matrix is absolute amplitude sum, absolute amplitude mean or absolute amplitude maximum, the weighting is applied to amplitude and the maximum value is the maximum log-domain amplitude value.
15. The method for discriminating transient audio signals according to claim 1, characterized in that the discrimination threshold Thr is a predefined value.
16. The method for discriminating transient audio signals according to claim 2, characterized in that the discrimination threshold Thr is set with reference to the sizes of M and N and the base of the logarithm applied to the time-frequency signal, and is calculated as follows: the long-term average $y_a$ of the sum of squares of all elements of the time-frequency signal matrix y is calculated, i.e. the sums of squares of all elements obtained from multiple frames of y are averaged; for fixed M and N, $y_a$ is jointly constrained by the sizes of M and N, the logarithm base and the amplitude range of the input signal, and is regarded as a constant, so that the discrimination threshold is $Thr = \gamma\, y_a$; the coefficient $\gamma$ is set between 1% and 5% and fine-tuned according to actual needs, and represents the upper limit, as a percentage of the overall energy, on the fluctuation energy allowed for a steady-state signal.
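For illustration, the threshold of claim 16 can be sketched in Python as follows. This is a minimal sketch: the function names and the default value of 0.03 for $\gamma$ are hypothetical, chosen only because it lies inside the stated 1% to 5% range, and the frames are assumed to be supplied as a list of matrices y.

```python
def long_term_square_sum(frames):
    """y_a: the sum of squared elements of each matrix y, averaged over frames."""
    sums = [sum(v * v for row in y for v in row) for y in frames]
    return sum(sums) / len(sums)

def discrimination_threshold(frames, gamma=0.03):
    """Thr = gamma * y_a, with gamma expected to lie between 0.01 and 0.05."""
    return gamma * long_term_square_sum(frames)
```

Because $y_a$ is treated as a constant for a given configuration, it could be computed once offline from representative material and stored alongside $\gamma$.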
17. The method for discriminating transient audio signals according to claim 4, characterized in that, if the merging method used when applying the mapping transformation to the uniform time-frequency component matrix is energy sum or energy mean, the discrimination formula uses the energy values of the DC component and the AC component.
18. The method for discriminating transient audio signals according to claim 4, characterized in that, if the merging method used when applying the mapping transformation to the uniform time-frequency component matrix is absolute amplitude sum, absolute amplitude mean or absolute amplitude maximum, the discrimination formula uses the amplitude values of the DC component and the AC component.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100070617A CN102419977B (en) 2011-01-14 2011-01-14 Method for discriminating transient audio signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100070617A CN102419977B (en) 2011-01-14 2011-01-14 Method for discriminating transient audio signals

Publications (2)

Publication Number Publication Date
CN102419977A CN102419977A (en) 2012-04-18
CN102419977B true CN102419977B (en) 2013-10-02

Family

ID=45944359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100070617A Active CN102419977B (en) 2011-01-14 2011-01-14 Method for discriminating transient audio signals

Country Status (1)

Country Link
CN (1) CN102419977B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014118179A1 (en) * 2013-01-29 2014-08-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoders, audio decoders, systems, methods and computer programs using an increased temporal resolution in temporal proximity of onsets or offsets of fricatives or affricates
CN104575513B (en) * 2013-10-24 2017-11-21 展讯通信(上海)有限公司 The processing system of burst noise, the detection of burst noise and suppressing method and device
CN104167209B (en) * 2014-08-06 2017-06-13 华为软件技术有限公司 The detection method and device of a kind of audio distortion
CN104599677B (en) * 2014-12-29 2018-03-09 中国科学院上海高等研究院 Transient noise suppressing method based on speech reconstructing
CN106067819B (en) * 2016-06-23 2021-11-26 广州市迪声音响有限公司 Signal processing system based on component type matrix algorithm
CN111933181B (en) * 2020-07-10 2022-05-17 北京理工大学 Snore feature extraction and detection method and device based on complex order derivative processing

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101488344A (en) * 2008-01-16 2009-07-22 华为技术有限公司 Quantitative noise leakage control method and apparatus
EP2054881B1 (en) * 2006-08-18 2010-10-27 Digital Rise Technology Co., Ltd. Audio decoding

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001069593A1 (en) * 2000-03-15 2001-09-20 Koninklijke Philips Electronics N.V. Laguerre fonction for audio coding
RU2443028C2 (en) * 2008-07-11 2012-02-20 Фраунхофер-Гезелльшафт цур Фёрдерунг дер ангевандтен Apparatus and method for calculating bandwidth extension data using a spectral tilt controlled framing

Also Published As

Publication number Publication date
CN102419977A (en) 2012-04-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20170210

Address after: Room 32, building 3205F, No. 707, Zhang Yang Road, free trade zone,, China (Shanghai)

Patentee after: Xin Xin Finance Leasing Co.,Ltd.

Address before: 201203 Shanghai city Zuchongzhi road Pudong Zhangjiang hi tech park, Spreadtrum Center Building 1, Lane 2288

Patentee before: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20170707

Address after: 100033 room 2062, Wenstin Executive Apartment, 9 Financial Street, Beijing, Xicheng District

Patentee after: Xin Xin finance leasing (Beijing) Co.,Ltd.

Address before: Room 32, building 707, Zhang Yang Road, China (Shanghai) free trade zone, 3205F

Patentee before: Xin Xin Finance Leasing Co.,Ltd.

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20120418

Assignee: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Assignor: Xin Xin finance leasing (Beijing) Co.,Ltd.

Contract record no.: 2018990000163

Denomination of invention: Method for discriminating transient audio signals

Granted publication date: 20131002

License type: Exclusive License

Record date: 20180626

EE01 Entry into force of recordation of patent licensing contract
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20200305

Address after: 201203 Zuchongzhi Road, China (Shanghai) pilot Free Trade Zone 2288

Patentee after: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Address before: 100033 room 2062, Wenstin administrative apartments, 9 Financial Street B, Xicheng District, Beijing.

Patentee before: Xin Xin finance leasing (Beijing) Co.,Ltd.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20200601

Address after: 361012 unit 05, 8th floor, building D, Xiamen international shipping center, No.97 Xiangyu Road, Xiamen area, China (Fujian) free trade zone

Patentee after: Xinxin Finance Leasing (Xiamen) Co.,Ltd.

Address before: 2288 Zuchongzhi Road, China (Shanghai) pilot Free Trade Zone

Patentee before: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

EC01 Cancellation of recordation of patent licensing contract
EC01 Cancellation of recordation of patent licensing contract

Assignee: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Assignor: Xin Xin finance leasing (Beijing) Co.,Ltd.

Contract record no.: 2018990000163

Date of cancellation: 20210301

EE01 Entry into force of recordation of patent licensing contract
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20120418

Assignee: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Assignor: Xinxin Finance Leasing (Xiamen) Co.,Ltd.

Contract record no.: X2021110000010

Denomination of invention: Discrimination method of transient audio signal

Granted publication date: 20131002

License type: Exclusive License

Record date: 20210317

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230710

Address after: 201203 Shanghai city Zuchongzhi road Pudong New Area Zhangjiang hi tech park, Spreadtrum Center Building 1, Lane 2288

Patentee after: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Address before: Unit 05, 8 / F, building D, Xiamen international shipping center, 97 Xiangyu Road, Xiamen area, 361012 China (Fujian) pilot Free Trade Zone

Patentee before: Xinxin Finance Leasing (Xiamen) Co.,Ltd.