WO2006000215A1 - Method of evaluating perception intensity of an audio signal and a method of controlling an input audio signal on the basis of the evaluation - Google Patents
- Publication number
- WO2006000215A1 (PCT/DK2004/000458)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- input signal
- audio input
- perception intensity
- evaluating
- intensity
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Definitions
- the invention relates to a method of evaluating perception intensity of an audio signal as stated in claim 1.
- loudness estimates which relate to the different listeners' perception of how loud a present signal is.
- An automated loudness estimation of audio signals is highly needed for different purposes, such as automatic gain control in relation to broadcasting or, e.g., reproduction of audio signals in a car.
- a problem related to the measurement of loudness is that it has for many years been well accepted that the loudness perception of an audio signal cannot be obtained by just a straightforward measurement and subsequent processing of the audio signal to be evaluated.
- a more advanced example of loudness estimation is disclosed in US 2004/0044525 A1, where loudness estimation is based on the assumption that loudness of speech must be evaluated differently than that of other audio signal components.
- a problem of the disclosed method is that a signal to be evaluated initially must be processed for the purpose of identifying and separating speech components, which is a relatively complicated and processing-intensive affair.
- the invention relates to a method of evaluating perception intensity of an audio input signal (IS) comprising the steps of
- TVDF time variant distribution function
- a perception intensity has been obtained on the basis of a time variant distribution function, thereby obtaining an advantageous universal and flexible determination of a perception intensity.
- the universal applicability is basically obtained due to the fact that a distribution function may match and describe audio input signals of very different natures.
- speech, music and noise may be evaluated on the basis of a distribution function.
- said estimation of a time variant distribution function refers to the audio input signal (IS).
- the estimation of a time variant distribution function should, preferably, be performed on the basis of the input signal; in other words, a feed-forward implementation of the invention.
- the estimation according to the invention may also be performed on the basis of the output signal
- TVDF time variant distribution function
- MI modified audio input signal
- a time variant distribution function should, preferably, be performed on the basis of the actually modified audio input signal; in other words, a feed-back implementation of the invention.
- said audio input signal comprises a sequence of input samples (IS).
- establishment of one perception intensity estimate in the form of a sample should be made on the basis of several audio input signal representative samples, preferably at least two, in order to benefit from the signal history.
- said perception intensity estimate comprises an output sample.
- said time variant distribution function is estimated by a shape description of a distribution function.
- a shape should facilitate utilization of not just a simple representation or single point of such a distribution but rather a representation of the variation of the distribution function.
- variation should not be regarded as a strict mathematical expression, e.g. only variance, but rather reflect the fact that the shape of a distribution function may vary and that this variation may be estimated for the purpose of obtaining an advantageous evaluation of perception intensity.
- a shape description may also comprise parameters or measures, which may not specifically relate to a specific point of the distribution function. On the other hand, such parameters or measures should of course be derived from the distribution function.
- the shape refers to a time variant distribution function and thus also comprises a location and a scale. Consequently, the shape may form a basis for derivation or direct extraction of relevant feature parameters of the time variant distribution function.
- said time variant distribution comprises an amplitude distribution function.
- said time variant distribution comprises a power distribution function.
- said time variant distribution comprises a sound intensity distribution function.
- said time variant distribution comprises a two- dimensional distribution function.
- the determining of the perception intensity estimate (PIE ) is made on the basis of at least two time variant distribution functions (TVDF) estimated at at least two different times.
- the determining of the perception intensity representative output samples (OS) is on the basis of a weighted accumulation of at least two time variant distribution functions (TVDF) estimated at at least two different times.
- TVDF time variant distribution functions
- the estimated time variant distribution function should be weighted over time in order to facilitate the desired derivation of perception intensity. This feature is particularly strong when the perception intensity to be determined relates to a loudness estimate.
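The weighted accumulation of distribution functions estimated at different times can be sketched as follows. This is a minimal Python illustration, assuming the distribution functions are represented as equal-length histograms and that an exponential decay weighting (the `half_life` parameter) is acceptable; the patent does not prescribe a particular weight profile, so both are assumptions of this sketch.

```python
def accumulate_histograms(histograms, half_life=3.0):
    """Combine per-frame amplitude histograms into one weighted
    distribution estimate, giving recent frames more influence.

    `histograms` is a list of equal-length bin-count lists, ordered
    oldest to newest; `half_life` (in frames) sets the exponential
    decay applied to older frames.  Both names are illustrative.
    """
    n_bins = len(histograms[0])
    combined = [0.0] * n_bins
    total_weight = 0.0
    newest = len(histograms) - 1
    for age, hist in enumerate(histograms):
        # Older frames (lower index) receive exponentially smaller weight.
        weight = 0.5 ** ((newest - age) / half_life)
        total_weight += weight
        for b in range(n_bins):
            combined[b] += weight * hist[b]
    # Normalise so the result is comparable across window lengths.
    return [c / total_weight for c in combined]
```

A usage example: with two single-peak histograms and `half_life=1.0`, the newer frame contributes twice the weight of the older one.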
- an output sample is determined on the basis of at least two audio input samples (IS)
- an output sample should, preferably, be based on at least two input samples, thereby obtaining an advantageous description of an input signal, which may broadly be applied for the derivation of a perceptual intensity of representations of audio signals of very different nature.
- the determining of the perception intensity on the basis of said estimated time variant distribution function (TVDF) is done according to at least one non-linear function (NLF).
- a loudness estimate is based on the determination of at least two different statistical functions characterising the evaluated input signal on the basis of non-linear signal processing.
- a typical modification would be applied for the purpose of obtaining automatic equalisation of loudness, although other types of gain control may be applied within the scope of the invention.
- a non-linearity may form a necessary and advantageous way of deriving a representative loudness estimate.
- said at least one non-linear function is established by an artificial neural network (ANN: artificial neural network).
- ANN artificial neural network
- said artificial neural network comprises a multilayer perceptron.
- said at least one non-linear function is established by means of polynomial fitting.
- said at least one non-linear function is established by means of splining.
- the evaluation is established by a serial arrangement, a parallel arrangement, or a combination thereof, of at least two non-linear functions (NLF).
- an overall desired evaluation may advantageously be split up in several different non-linear signal processing steps.
- splitting may, e.g., comprise a pre-processing of an input signal performed by at least one non-linear function in one or several bands or partial representations of the input signal prior to a non-linear processing of the individual or combined representations obtained by the pre-processing.
- An example of such pre-processing may, e.g., be the establishment of typically well-known non-linear statistical functions representing the input signal in one or several bands according to predetermined signal processing, subsequently performing a signal processing of the combined signals on the basis of one or several non-linear functions.
- the subsequent one or several non-linear functions will typically be non-linear functions adapted specifically for the purpose of bringing the result of the established pre-filtering into an estimate of perception intensity.
- said perception intensity comprises loudness
- said perception intensity comprises sharpness, annoyance, airiness, punchiness, brilliance, presence, fatness, deepness and edginess or any combination thereof.
- TVDF time variant distribution function
- At least one of said at least two different characterizing functions comprises a time variant statistical function.
- two statistical functions are applied as a combined representation of the desired time variant distribution function.
- At least one of said feature characterizing parameters comprises a central value over time, such as a mean value, an average value and/or a median. In an embodiment of the invention at least one of said feature characterizing parameters comprises a measure of the spread over time, standard deviation, variance or inter quartile range.
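As an illustration, the central-value and spread parameters named above can be computed directly with Python's standard `statistics` module; the sample values below are arbitrary placeholders, not data from the patent.

```python
import statistics

# Two feature-characterizing parameters of a sample window:
# a central value (median) and a spread measure (inter-quartile range).
window = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.7, 0.2]

central = statistics.median(window)
q1, _, q3 = statistics.quantiles(window, n=4)   # quartile cut points
spread = q3 - q1                                # inter-quartile range

# Alternative spread measures mentioned in the text:
std_dev = statistics.stdev(window)
variance = statistics.variance(window)
```

Note that `statistics.quantiles` defaults to the "exclusive" method; other quantile conventions give slightly different cut points for small windows.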
- preprocessing of the audio input signal is done prior to the establishment of said at least two feature characterizing parameters.
- said time variant distribution function is determined in a time window.
- the time variant distribution function should be determined as a function of time and in a time window of the input signal. In this way, a runtime updating of the perception intensity may be obtained and, moreover, when applying a time window, a memory in the method with respect to previous behavior of the input signal.
- Examples of a runtime window would range from, e.g., approximately 1/10 second and, e.g., up to 30 seconds.
- the window may in principle be much larger than 30 seconds, solely depending on the input signal to be evaluated and the intentions of the user.
- An overall evaluation of perception intensity of an audio signal, e.g. an audio track of a CD of several minutes, may thus be evaluated according to the invention if so desired.
- At least two different partial representations (PR1, PR2, ... PRn) of the audio input signal (IS) are established, at least two different statistical functions (SF1, SF2, ... SFn) are established on the basis of at least one of said different partial representations (PR1, PR2, ... PRn) of said audio input signal (IS), and said determined statistical functions are combined into a loudness representation by means of at least one non-linear signal processing.
- the loudness estimation is initially performed on the basis of an (initial) individual analysis of different bands of the complete audio input signals, which are subsequently combined into at least one, preferably one, combined loudness estimate.
- said audio input signal is modified on the basis of said evaluated perception intensity.
- the evaluated perception intensity should preferably form the basis of a modification of the input signal or an input signal corresponding thereto.
- the modification should preferably be automatic by means of signal processing hardware.
- said modifying of the audio input signal is performed as a gain control of the complete or a part of the audio input signal (IS).
- different controlling of the input signal may be performed on the basis of the determined loudness estimate although a simple straightforward gain control may typically be quite sufficient in order to establish, e.g., a somewhat smoothed loudness between different input signals.
- a gain control may, e.g., be narrowed to a certain band or certain bands, e.g. by a boosting or a damping of parts or a part of the input signal.
- said audio input signal comprises a multichannel signal.
- a multichannel signal may, e.g., comprise a stereo signal, a five- or six-channel surround sound signal format, etc., all representing an audio representation which may be evaluated advantageously into one or a number of perception intensity representations.
- One of these may, e.g., be an overall loudness perception intensity of the complete multi- channel signal.
- the perception intensity refers to one shared parameter evaluation of the audio input signal or a derivative thereof.
- the audio input signal or a derivative thereof is evaluated with respect to two or more different types of perception intensity and combinations thereof.
- the perception intensity of an audio input signal may comprise sharpness, annoyance, airiness, punchiness, brilliance, presence, fatness, deepness and edginess or any combination thereof.
- an example of a more complex evaluation of an input signal would be an evaluation of a 5.1 audio input signal with respect to loudness and annoyance.
- said method is implemented in signal processing hardware, such as a digital signal processor and optional supporting electrical circuitry.
- non-linear function is established on the basis of adaptation data (AD).
- Adaptation data AD could, e.g., be a registration of the user behavior of a signal processing device, e.g. a consumer amplifier, the performed signal processing being modified accordingly.
- a specific example of such embodiment may be an amplifier, which may be used in a "learn-mode" by a user and combined with a registered user behavior - e.g. a registering of the user settings, modifying the function of the block ASP.
- This embodiment is in particular advantageous when applying a non-linear transfer function established by a neural network, as the learn mode may be activated on a run-time basis if so desired.
- Adaptation data AD could also be a previously collected data set.
- the invention relates to a perception intensity estimating device comprising signal processing means performing the method according to any of the claims 1-34.
- the device comprises monitoring means for displaying the estimated perception intensity.
- the device comprises control means for controlling connected electronic circuitry in response to the established perception intensity.
- fig. 1 illustrates an exemplary audio signal to be evaluated according to an embodiment of the invention,
- fig. 2 illustrates specific applicable distribution function characterizing features
- fig. 3 illustrates the distribution of amplitude of the first two second segment of fig. 1,
- fig. 4 illustrates the distribution of amplitude of the second two second segment of fig. 1,
- fig. 5 illustrates the distribution of amplitude of the third two second segment of fig. 1,
- fig. 6 illustrates the distribution of amplitude of the fourth two second segment of fig. 1,
- fig. 7 illustrates the distribution of amplitude of the fifth two second segment of fig. 1,
- fig. 8 illustrates the distribution of amplitude of the sixth two second segment of fig. 1,
- fig. 9 illustrates the extracted feature parameters of fig.1 as a function of time
- fig. 10 illustrates the resultant obtained loudness estimates related to the audio signal of fig. 1
- figs. 11-13 illustrate a further embodiment of the invention applying a multiband evaluation of the input signal of fig. 1,
- fig. 14 illustrates a more general evaluation principle of the invention,
- figs. 15A and 15B illustrate two examples of evaluation principles of the invention,
- fig. 16 illustrates a more general control principle of the invention
- fig. 17 illustrates a flow chart of an applicable evaluation and control algorithm according to an embodiment of the invention
- figs. 18A-18D illustrate different examples of distribution function characterizing parameters
- fig. 19 illustrates a hardware implemented preferred device according to an embodiment of the invention.
Detailed description
Initially, an embodiment of the invention will be described specifically with reference to a specific time varying audio sequence and related to loudness evaluation.
- Fig. 1 illustrates a time domain amplitude representation of a twelve second audio signal as a function of time.
- the illustrated audio signal was constructed to represent six different audio signals, each forming a two second sound segment window, from each of the following sound segments: A) 1 kHz tone, B) Pink noise, C) Reference female speech, D) Rock music, E) Big band jazz, F) Clarinet duet
- an audio input signal, preferably in the form of one or a number of sample streams, should initially be processed in order to extract the necessary and sufficient input signal characterizing features.
- time variant characterizing features are inter quartile range, median, sum of squares, percentiles, average, maximum, minimum, standard deviation, sum, variance and combinations thereof.
- the combination of these characterizing features should, according to the invention, characterize the distribution function of the audio input signals.
- the necessary exactness of the time varying functions may vary depending on the desired type of evaluation and the type of input signal. It is generally desired that a two-dimensional representation of the time varying distribution function representing the input signal is obtained.
- Fig. 2 illustrates the principles of some specific time variant distribution function characterizing features applied according to a specific embodiment of the invention. It is noted that several other time variant features may, of course, be applied for the purpose.
- the specifically chosen and illustrated parameters are statistical parameters such as maximum, median and inter quartile range (IQR), defined as the distance between the first and third quartile of a specific statistical representation of an input audio signal.
- IQR inter quartile range
- each of the abovementioned six two-second segments will be analyzed individually, without overlap, in a single frequency band.
- the two calculated signal features are: the median and the inter-quartile range (IQR) of the dB magnitude of the signal. These two functions are commonly used in descriptive statistics as robust measurements of the central tendency and the spread of a distribution, respectively.
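The two signal features named above can be sketched in Python as follows; this is a minimal illustration assuming the signal block is given as a list of linear amplitude samples, and the small `floor` constant guarding against log of zero is an assumption of the sketch, not a value from the patent.

```python
import math
import statistics

def median_and_iqr_db(samples, floor=1e-9):
    """Median and inter-quartile range (IQR) of the dB magnitude of a
    signal block: robust measures of central tendency and spread."""
    # Convert linear amplitudes to dB magnitude, clamping near-zero
    # samples so log10 stays defined.
    db = [20.0 * math.log10(max(abs(s), floor)) for s in samples]
    central = statistics.median(db)             # central tendency
    q1, _, q3 = statistics.quantiles(db, n=4)   # quartile cut points
    return central, q3 - q1                     # (median, IQR)
```

For a constant-amplitude block (e.g. a steady tone) the IQR is zero, while for a block alternating between two levels 20 dB apart it is 20 dB, matching the intuition that IQR measures spread.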
- Fig. 3 illustrates the distribution of amplitude of the first two second segment A, namely the 1 kHz tone.
- the 1st quartile, 3rd quartile and the median are marked up as 1Q, 3Q and M, respectively.
- Fig. 4 illustrates the distribution of amplitude of the second two second segment B, namely the pink noise.
- the 1st quartile, 3rd quartile and the median are marked up as 1Q, 3Q and M, respectively.
- Fig. 5 illustrates the distribution of amplitude of the third two second segment C, namely the speech signal.
- the 1st quartile, 3rd quartile and the median are marked up as 1Q, 3Q and M, respectively.
- Fig. 6 illustrates the distribution of amplitude of the fourth two second segment D, namely the rock music signal.
- the 1st quartile, 3rd quartile and the median are marked up as 1Q, 3Q and M, respectively.
- Fig. 7 illustrates the distribution of amplitude of the fifth two second segment E namely the big band signal.
- the 1st quartile, 3rd quartile and the median are marked up as 1Q, 3Q and M, respectively.
- Fig. 8 illustrates the distribution of amplitude of the sixth two second segment F, namely the clarinet duo signal.
- the 1st quartile, 3rd quartile and the median are marked up as 1Q, 3Q and M, respectively.
- the non-linear function may, e.g., be provided by an artificial neural network trained by data representing different tests performed by test persons.
- the input audio signal is initially divided into nine octave bands Bl to B9.
- the magnitude in each octave frequency band Bl to B9 is illustrated in fig. 11 as a function of time.
- the evaluated input signal corresponds to the already described twelve second signal of fig. 1.
- Fig. 14 illustrates a more general evaluation principle of the invention
- An audio input signal representation IS is input to a block FPE performing feature parameter extraction.
- the feature parameter extraction has the purpose of representing the input signal IS suitably for the further evaluation of the signal.
- the audio representative input signal must be represented in a certain way to facilitate the desired evaluation of perception intensity.
- an at least two- dimensional statistical description over time of the input signal must be estimated for the purpose of evaluating perception intensity according to the invention. More specifically such a two-dimensional description of the input signal is referred to as a distribution function of the input signal.
- Several different statistical functions may be applied within the scope of the invention. Examples of such functions are inter quartile range, median, sum of squares, percentiles, average, maximum, minimum, standard deviation, sum and variance.
- the description of the shape of the distribution function may be obtained in several different ways, e.g. by means of at least two at least partly linear independent functions.
- further descriptive parameters, i.e. a further-dimensional description serving the purpose of providing a more detailed description of the distribution function, may be applied according to the invention.
- a partial description of the distribution function of the input signal according to the invention may also be obtained by more conventional filtering typically not regarded as a statistical function.
- An example of such is a mean value over a time interval, which may, e.g., be obtained by a conventional integrating filter.
- shape of a distribution function preferably refers to a shape of a function which has been fixed with respect to the axis of the distribution function.
- Another example is an initial band-pass filtering of an input audio signal into two or several bands for the purpose of individual handling of the different bands prior to the estimation of perception intensity.
- Such initial splitting of the input signal into different bands may, e.g., ease the process of establishing a non-linear function fitting a relevant perception intensity reference database.
- Such preprocessing is preferred for the purpose of reducing the complexity of the subsequent establishment of a perception intensity estimate.
- the length of the time intervals of the input signal applied for extraction of feature parameters may vary from application to application. Likewise, the interval between the evaluation of a new perception intensity estimate may vary. The two mentioned intervals do not necessarily need to be identical.
- the invention although very advantageous with respect to loudness as explained above, may be utilized for evaluation of very different types of perception intensity such as sharpness, annoyance, and airiness.
- the invention features a very advantageous adaptation to each purpose, as the invention basically only needs to adapt one non-linear function to the purpose; the rest of the processing equipment and critical settings may be fixed or principally fixed.
- an initial setting of a non-linear function may be changed over time, e.g. on the basis of user behavior.
- the signal processing performed in the block SP is based on a non-linear transfer function.
- the preferred processing of the estimated distribution function is non-linear as the available non-linear processing is very advantageous in connection with complex evaluation of two or several input parameters.
- a non-linear function may be established on the basis of a multidimensional input by machine - learning, e.g. by means of a neural network.
- Preferred descriptive parameters comprise two substantially orthogonal or linearly independent descriptive parameters expressing a central tendency and a spread of distribution of preferably the amplitude of an input signal.
- the resulting perception intensity estimate PIE may, e.g., be fed to a perception intensity metering for a run-time monitoring of the perception intensity of the input signal IS.
- An example of such a meter may be a loudness meter.
- Preprocessing would, e.g., serve the purpose of reducing complexity of the audio input signal and, thereby, facilitate a more efficient establishment of a distribution function.
- Fig. 15A illustrates an example of a general control principle of the invention based on the embodiment illustrated in the above fig.14.
- an input signal IS is feature extracted in a feature extraction block FPE and perception intensity estimate is subsequently established on the basis of the distribution function established by block FPE.
- the input signal IS is bypassed to a signal processing block SPA and the input signal IS may then be processed according to the perception intensity estimate PIE established by the block SP.
- the resulting modified audio signal MIS is subsequently output.
- a real-life example of such an embodiment is an automatic gain control of an input signal IS.
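The fig. 15A feed-forward principle can be sketched as below: a perception-intensity estimate is derived from the input signal and a gain correction is then applied to the bypassed input. The callable `estimate_pie`, the dB convention, and the function names are illustrative assumptions of this sketch, not the patent's specific implementation.

```python
def feed_forward_gain_control(input_samples, estimate_pie, target_pie_db):
    """Feed-forward control (fig. 15A style): the perception intensity
    estimate PIE is derived from the *input* signal IS, and IS is then
    scaled toward a target value.

    `estimate_pie` is an assumed callable returning an estimate in dB
    for a block of samples; `target_pie_db` is the desired level.
    """
    pie_db = estimate_pie(input_samples)
    gain_db = target_pie_db - pie_db          # simple gain correction in dB
    gain = 10.0 ** (gain_db / 20.0)           # dB -> linear amplitude gain
    return [gain * s for s in input_samples]  # modified signal MIS
```

With a (dummy) estimator reporting 0 dB and a target of 20 dB, every sample is scaled by a factor of 10.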
- Fig. 15B illustrates a further example of a control principle of the invention based on the embodiment illustrated in the above fig. 14; basically a variant of fig. 15A.
- An input signal IS is fed to a signal processing block SPA and the input signal IS may then be processed according to the perception intensity estimate PIE established by the block SP.
- the resulting modified audio signal MIS is subsequently output.
- a real-life example of such an embodiment is an automatic gain control of an input signal IS. According to this embodiment, however, the feature extraction is performed on the resulting modified output signal.
- Fig. 16 illustrates a further embodiment of the invention basically corresponding to the above-illustrated embodiment, but now the signal processing block SP of fig. 15A or 15B has been exchanged with an adaptive signal processing block ASP.
- the adaptive signal processing block is adapted for adaptation data AD.
- Adaptation data AD could, e.g., be a registration of the user behavior of a signal processing device, e.g. a consumer amplifier, the performed signal processing being modified accordingly.
- a specific example of such embodiment may be an amplifier, which may be used in a "learn-mode" by a user and combined with a registered user behavior - e.g. a registering of the user settings, modifying the function of the block ASP.
- This embodiment is in particular advantageous when applying a non-linear transfer function established by a neural network, as the learn mode may be activated on a run-time basis if so desired.
- Adaptation data AD could also be a previously collected data set.
- Fig. 17 illustrates a flow chart of an applicable evaluation and control algorithm according to an embodiment of the invention.
- the described flow chart may, e.g., be implemented in a signal processing device or signal processing circuitry described in principles according to fig. 19 and applied on the signals described with reference to fig. 1.
- step 100 an audio signal representation is provided, typically in the form of a digital audio signal.
- an analog program material may be applied although an initial A/D conversion would be strongly preferred for the purpose of a subsequent streamlined and efficient signal processing.
- a time window is applied to the provided audio signal representation.
- the selected window is chosen to be the individual sound segments; that is, the six different audio signals as explained with reference to fig. 1.
- the use of such discrete non-overlapping sound segments is here applied, as only a single number representing the relative loudness of each segment is desired.
- other windowing approaches may include a complete audio track or, e.g., a true sliding window comprising a dynamically sliding audio window having a certain, typically fixed, time length.
- the time length may, e.g., be a 1.5 second window.
- step 102 the input audio signal is normalised in level in order to optimize use of the dynamic range of the following steps.
- the normalization is performed by using a weighted RMS measurement. This level normalisation is compensated at the end of the measurement procedure.
- a broadband crest parameter is calculated as the ratio between the overall unweighted RMS value and a pseudo peak value (attack time 1 ms). This value, Crest, is converted into dB.
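The broadband crest parameter can be sketched as follows. Crest factors are conventionally expressed as peak over RMS; the patent's wording leaves the order of the ratio ambiguous, so that convention is an assumption here, and the 1 ms attack pseudo-peak detector is not modelled (the pseudo peak is simply passed in).

```python
import math

def broadband_crest_db(samples, pseudo_peak):
    """Crest parameter in dB: ratio between a pseudo peak value and the
    overall unweighted RMS value of the block.  Assumes a non-silent
    block (RMS > 0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(pseudo_peak / rms)
```

A full-scale square wave has a crest of 0 dB (peak equals RMS); a signal whose peak is twice its RMS has a crest of about 6 dB.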
- a filterbank is applied as a rough approximation of the frequency analysis in the human ear.
- the applied filters are octave wide, and an overall bandwidth limitation is also applied.
- step 104 a full wave rectification is applied to the processed signal.
- the output of each band is passed through an abs() function. This implies that the loudness measurement method is insensitive to the absolute phase of the input signal.
- the BandCrest is the maximum value divided by the overall RMS value per band. This value is converted into dB.
- the BandCrest vector contains one value for each frequency band.
- each of the rectified filter output signals is filtered with a first order low pass filter with asymmetric time constants to extract the short-term envelope of each band.
- the asymmetric time constants - natural logarithm based - are 20 ms and 50 ms, respectively.
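The asymmetric envelope extraction can be sketched as a one-pole low-pass filter whose coefficient depends on whether the rectified input is above or below the current envelope. The 20 ms / 50 ms values follow the text; which of them applies to rising versus falling input, and the 48 kHz sample rate, are assumptions of this sketch.

```python
import math

def envelope(rectified, fs=48000.0, rise_s=0.020, fall_s=0.050):
    """First-order low-pass with asymmetric time constants applied to a
    full-wave rectified band signal, extracting its short-term envelope."""
    # One-pole coefficients from the (natural-logarithm based) time
    # constants: a = 1 - exp(-1 / (fs * tau)).
    a_rise = 1.0 - math.exp(-1.0 / (fs * rise_s))   # used when input > env
    a_fall = 1.0 - math.exp(-1.0 / (fs * fall_s))   # used when input < env
    env, out = 0.0, []
    for x in rectified:
        a = a_rise if x > env else a_fall
        env += a * (x - env)   # standard leaky-integrator update
        out.append(env)
    return out
```

For a constant unit input the envelope starts near zero and converges toward 1 with the rise time constant, illustrating the smoothing behaviour.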
- step 106 the level of the processed signal is converted to level in dB by taking 20 times the logarithm (base 10) of the envelope.
- step 107 for each band, two percentiles are calculated: The 50th percentile (corresponding to the median) and the 90th percentile (corresponding to the value which 10% of the values are above). These two latter statistics are referred to as the lower and the upper percentiles, respectively.
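The per-band percentile features of step 107 can be sketched with the standard `statistics` module; the exact percentile convention is not fixed by the text, so the default "exclusive" method used here is an assumption.

```python
import statistics

def band_percentiles(band_env_db):
    """Lower (50th) and upper (90th) percentile of one band's dB
    envelope, as used in step 107."""
    lower = statistics.median(band_env_db)            # 50th percentile
    deciles = statistics.quantiles(band_env_db, n=10) # 9 decile cut points
    upper = deciles[8]                                # 9th cut = 90th pct.
    return lower, upper
```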
- step 108 a feature vector is constructed from the following parameters:
- Each of the linear combinations is implemented by first subtracting a constant value from each contributing parameter, and then multiplying the result by another constant value.
- lincom = Σ_{i=1}^{N} (parameter_i − a_i) · w_i, where a_i is the constant subtracted from each contributing parameter and w_i the constant the result is multiplied by.
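The linear combination described in step 108 - subtracting a constant from each contributing parameter, multiplying by another constant, and summing - can be sketched as:

```python
def lincom(parameters, offsets, weights):
    """Linear combination of feature parameters (step 108): for each
    contributing parameter, subtract its constant offset, multiply by
    its constant weight, and sum the results."""
    return sum((p - a) * w for p, a, w in zip(parameters, offsets, weights))
```

The constant offset and weight values themselves are trained model constants not reproduced in this text, so the arguments are placeholders.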
- step 109 the non-linear function is established for the purpose of mapping the feature parameters into a loudness estimate.
- the applied network comprises a multi-layer perceptron type having a tan-sigmoid activation function for the units in the single hidden layer and, moreover, it comprises a single output unit with a linear activation function.
- the tan-sigmoid activation function is expressed as: f(x) = 2/(1 + e^(−2x)) − 1, which is equivalent to tanh(x).
- the topology of the neural network is as follows: There are thirteen input units (normalised features). The first nine represent bands 1-9 from the reference signal, the last 2 plus 2 are the percentile difference and crest features, respectively. These thirteen input units are connected to hidden-layer units of the ANN, and the hidden-layer units are in turn connected to the single output unit.
- the input to the neural network thus consists of the 9+2+2 feature parameters, normalised by addition of real-valued constants in the range [-50,50], and multiplication by real-valued constants in the range [0,10].
- the weights connecting the units of the network are optimised to predict the perceived loudness.
- the neural network weights are real-valued constants in the range [-16,16], and the bias values are real-valued constants in the range [-3,71].
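The forward pass of the described network - one hidden layer with tan-sigmoid (tanh) units feeding a single linear output unit - can be sketched as below. The trained weight and bias constants themselves are not reproduced in this text, so the arguments here are placeholders.

```python
import math

def mlp_loudness(features, w_hidden, b_hidden, w_out, b_out):
    """One forward pass of a multilayer perceptron with a tanh hidden
    layer and a single linear output unit.

    `features` is the normalised 9+2+2 feature vector; `w_hidden` is a
    list of per-unit weight rows, `b_hidden` the hidden biases,
    `w_out`/`b_out` the output weights and bias."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(w_row, features)) + b)
        for w_row, b in zip(w_hidden, b_hidden)
    ]
    # Linear output unit: weighted sum of hidden activations plus bias.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```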
- step 110 a loudness estimate is determined on the basis of the above-described non-linear function provided according to the previous step.
- the last step in computing the relative loudness level value consists of de-normalising the output of the neural network. This may be done by adding the weighted level measured at the start in step 102 to the output of the neural network.
- step 115 the loudness of a reference signal is provided.
- the loudness of a reference signal is estimated corresponding to the output of block 110. This value is kept as a constant within the model in order to enable calculation of gain correction values.
- the model itself does not assume any particular relationship between digital levels and playback SPL, but a practical value for some purposes would be 100 dB SPL for digital full scale. With this assumption, the loudness level estimate of the specific reference signal used is 72.2 dB (phon).
- In steps 111 and 112 a gain correction is computed.
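Given the constant reference loudness from step 115 and the estimate from step 110, a gain correction amounts to a level difference in dB. A trivial sketch (the function name is my own; the patent does not spell out this exact expression):

```python
def gain_correction_db(reference_loudness_db, estimated_loudness_db):
    # Gain, in dB, that would bring the evaluated signal
    # to the loudness of the reference signal.
    return reference_loudness_db - estimated_loudness_db
```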
- Figs. 18A to 18D illustrate different combinations of distribution-characterizing parameters applicable within the scope of the invention.
- the distribution-characterizing parameters, i.e. shape-defining parameters, are applied to the same distribution function TVDF.
- the distribution function TVDF is mapped as the number of signal samples per time unit NSS as a function of the amplitude A of an audio input signal.
- the distribution function TVDF of an input signal is characterized by two shape-defining parameters, namely interquartile range IQR and median M.
- the distribution function of an input signal is characterized by three shape-defining parameters, namely distribution range DR, a minimum amplitude value MIN and a maximum amplitude value MAX.
- the shape of distribution may, basically, be said to be represented completely by two distribution characterizing parameters, namely the distribution range DR and one of the amplitude values MIN or MAX.
- the distribution function may be estimated by more than two characterizing parameters, e.g. four, namely a combination of the illustrated parameters of figs. 18A and 18B, i.e. median, interquartile range, max value and distribution range.
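The shape-defining parameters of Figs. 18A and 18B can be computed directly from the signal samples. A minimal sketch (function name and dictionary layout are my own; `np.percentile` uses linear interpolation by default, matching the interpolated-percentile idea described in this section):

```python
import numpy as np

def shape_parameters(x):
    # Distribution-characterizing (shape-defining) parameters of the
    # sample amplitude distribution: median, inter-quartile range,
    # minimum, maximum and distribution range.
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return {
        "median": float(median),
        "iqr": float(q3 - q1),
        "min": float(np.min(x)),
        "max": float(np.max(x)),
        "range": float(np.max(x) - np.min(x)),
    }
```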
- Min, Max, Range and Mid Range: Min and Max are the minimum and maximum values of the data, respectively, and Range = Max - Min.
- the mid range is MidRange = (Min + Max)/2.
- Percentile: the r-th percentile of x is the value such that r percent of the data in x falls at or below that value.
- Interpolated percentile: interpolation, such as linear interpolation, may be used in the calculation of the percentile, which makes the percentile parameter 'smoother', in particular in cases with small sample sizes.
- Median and Quartiles: the median is the value such that half of the data in x falls below that value and half above.
- the first, second and third quartiles are:
- Q1: the median of the data that falls below the median; this is also the 25th percentile.
- Q2: the median, i.e. the 50th percentile.
- Q3: the median of the data that falls above the median; this is also the 75th percentile.
- Inter-quartile range and Mid mean: the inter-quartile range (IQR) is IQR = Q3 - Q1.
- the mid mean (MidMean) is the mean of the data between the 25th and 75th percentiles.
- Trimmed Mean and Winsorized Mean: the trimmed mean is similar to the mid mean except that different percentile values are used. A common choice is to trim 5% of the data in both the lower and upper tails of the distribution, i.e. the trimmed mean is the mean of the data between the 5th and 95th percentiles.
- the winsorized mean is similar to the trimmed mean. However, instead of trimming the extreme data samples, they are set to the lowest (or highest) value. For example, all data below the 5th percentile is set equal to the value of the 5th percentile, and all data greater than the 95th percentile is set equal to the 95th percentile.
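The trimmed and winsorized means just described can be sketched with NumPy, using the 5th/95th percentiles of the example above (function names are my own):

```python
import numpy as np

def trimmed_mean(x, lower=5, upper=95):
    # Mean of the data between the lower and upper percentiles;
    # samples in the tails are discarded.
    lo, hi = np.percentile(x, [lower, upper])
    x = np.asarray(x, dtype=float)
    return x[(x >= lo) & (x <= hi)].mean()

def winsorized_mean(x, lower=5, upper=95):
    # Tail samples are not discarded but clamped to the percentile values.
    lo, hi = np.percentile(x, [lower, upper])
    return np.clip(np.asarray(x, dtype=float), lo, hi).mean()
```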
- Mode: the value of the data sample that occurs with the greatest frequency.
- the mode may be defined as the midpoint of the histogram-interval with the highest peak.
- Skewness: the skewness measures the amount of asymmetry of the distribution.
- Outlier-detectors: A) the proportion of the data samples that is higher than m standard deviations above, or lower than m standard deviations below, the mean value.
- B) WeightedDeviation_(3/2) = Σ |x - x̄|^(3/2), the sum over the data of absolute deviations from the mean raised to the power 3/2.
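The proportion-based outlier detector A) can be sketched as follows. The function name and the use of the population standard deviation are my own assumptions:

```python
import numpy as np

def outlier_proportion(x, m=2.0):
    # Fraction of samples lying more than m standard deviations
    # above or below the mean value.
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    return float(np.mean(np.abs(x - mu) > m * sd))
```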
- Fig. 19 illustrates a hardware implemented preferred device according to an embodiment of the invention.
- the perception intensity evaluator comprises an input block BP containing a filter bank of band-pass filters, e.g. octave filters, adapted in a conventional manner to divide an incoming audio signal into a parallel representation.
- the parallel representations are fed to an analyzer block DFC.
- the analyzer block DFC is adapted for extraction of feature parameters of the input signal. Such feature parameters have also been referred to above as distribution function characterizing parameters.
- when the distribution functions of the individual bands have been established, they are fed to a processing block NF performing a non-linear processing of the parallel signal. The result of this processing is transformed into one expression of the overall perception intensity in the block PIE.
- the processing block NF may be adapted by means of adaptation data AD as previously described with reference to fig. 16.
- the established evaluation is fed to a block ACE performing a monitoring of the evaluated perception intensity and/or performing an automatic control of the signal on the basis thereof.
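The processing chain of Fig. 19 (BP filter bank, DFC feature extraction, NF non-linear processing, PIE combination into one value) can be sketched schematically. All callables here are stand-ins supplied by the caller, not the actual filter or network implementations from the patent:

```python
def perception_intensity(signal, band_filters, extract_features, nonlinear_map):
    # BP: split the input signal into parallel band signals.
    bands = [bp(signal) for bp in band_filters]
    # DFC: distribution-function-characterizing features per band.
    features = [extract_features(band) for band in bands]
    # NF/PIE: non-linear mapping of all band features to a single
    # perception-intensity value.
    return nonlinear_map(features)
```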
- the illustrated hardware may, e.g., be implemented in a Motorola DSP 56303 and optional supporting circuitry.
- the illustrated device may comprise monitoring means (not shown) for displaying the estimated perception intensity.
- the illustrated device may comprise control means for controlling connected electronic circuitry in response to the established perception intensity (not shown).
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/630,727 US8175282B2 (en) | 2004-06-25 | 2004-06-25 | Method of evaluating perception intensity of an audio signal and a method of controlling an input audio signal on the basis of the evaluation |
EP04738955A EP1766610A1 (en) | 2004-06-25 | 2004-06-25 | Method of evaluating perception intensity of an audio signal and a method of controlling an input audio signal on the basis of the evaluation |
PCT/DK2004/000458 WO2006000215A1 (en) | 2004-06-25 | 2004-06-25 | Method of evaluating perception intensity of an audio signal and a method of controlling an input audio signal on the basis of the evaluation |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006000215A1 true WO2006000215A1 (en) | 2006-01-05 |
Family
ID=34957819
Country Status (3)
Country | Link |
---|---|
US (1) | US8175282B2 (en) |
EP (1) | EP1766610A1 (en) |
WO (1) | WO2006000215A1 (en) |
2004
- 2004-06-25: EP patent application EP04738955 filed (EP1766610A1, withdrawn)
- 2004-06-25: PCT application PCT/DK2004/000458 filed (WO2006000215A1, application discontinued)
- 2004-06-25: US application 11/630,727 filed (US8175282B2, active)
Also Published As
Publication number | Publication date |
---|---|
EP1766610A1 (en) | 2007-03-28 |
US8175282B2 (en) | 2012-05-08 |
US20080031464A1 (en) | 2008-02-07 |