GB2514422A

GB2514422A - Improvements in audio systems

Info

Publication number: GB2514422A
Application number: GB1309435.4A
Authority: GB
Inventors: Clyde Witchard
Original assignee: ALIEN AUDIO Ltd
Current assignee: ALIEN AUDIO Ltd
Priority date: 2013-05-24
Filing date: 2013-05-24
Publication date: 2014-11-26
Also published as: GB201309435D0

Abstract

A fading curve generator 1 includes a curve rendering engine (CRE) 2 which outputs a curve of fader gain versus time. The curve may be used to control a fader (fig.3). The shape of the fading curve may be controlled by variable parameters supplied to the CRE 2 by an optimiser 5 which takes as inputs: a measure of rate of change of loudness as determined by a loudness rate model 3, and a measure of spectral leakage as determined by a spectral leakage model 4. In turn, the loudness rate model 3 and the spectral leakage model 4 receive the curve generated by the CRE 2. The fading curve generator 1 iterates until optimization is complete, whereupon an optimized fading curve is available at the generator's output. Such a fading curve may be used within an audio fader to provide a fade with low perceptual salience, even when the available time for fading is short. The invention finds application in the shaping of audio trigger signals that may be generated to reduce tinnitus, utilising the phenomenon of residual inhibition.

Description

Improvements in Audio Systems

The present invention relates to improvements in audio systems. The invention relates to a fading curve generator, particularly, but not exclusively, a fading curve generator for modifying an audio signal.

Many types of final audio content (e.g. music tracks, radio programs, or video sound tracks) are produced or mixed from one or more original audio sources. Examples of sources include pre-recorded or pre-generated audio, or live audio signals. Sometimes there is a requirement for the final audio mix to contain a transition, for a particular original audio source (or sources), from being silent in the mix, to being played at some sound level; or vice versa.

These transitions are usually achieved by somehow gradually changing the sound level of that source from it being silent in the mix to its final intended level, or vice versa. This type of sound level transition is generally known as fading, and can be accomplished by a sub-system called a fader. Depending on its design, a fader may be controlled entirely manually, by a person; or it may be partly or wholly controlled by an automatic system.

In order to review the prior art, it is useful to briefly define the term "fading curve". In general, a fader takes an input signal and amplifies it with a particular gain. During the act of fading, clearly the gain is varied over some period of time. The function of gain versus time during this period (or conceptually, its graph) is termed here the fading curve. Many existing automatic faders employ a "straight-line fade". This is a fading curve that is simply a straight-line graph when the gain is expressed as a raw amplitude multiple (as opposed to decibel form, for example).

A problem with this type of fading curve is that, during the fade, the perceived loudness changes much more rapidly around the regions of the fading curve that are close to zero gain. This rapid change of perceived loudness increases the perceptual salience (i.e. the obtrusiveness) of the transition, as is argued further later. There can be spectral problems with a straight-line fade, too. Whenever a fader's gain is changed, in terms of signal processing mathematics this amounts to a form of modulation or windowing. It is thus mathematically inevitable that there will be modification of the original input frequency spectrum. This is seen in terms of extra frequency components that appear in the fader's output signal; components that are not present in its input signal. In the engineering terminology of modulation systems, these extra frequency components would be termed sidebands. In the terminology of signal windowing, they would be termed spectral leakage.
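A minimal sketch of why the straight-line fade changes loudness fastest near zero gain. It assumes the simple model mentioned later in this description (loudness proportional to the logarithm of amplitude), treats -60 dB as silence, and uses hypothetical function names. Every segment raises the gain by the same raw amount, yet the change in dB per segment is far larger near zero:

```python
import math

def straight_line_fade(num_points):
    """Fade-in gains rising linearly (as raw amplitude multiples) from 0 to 1."""
    return [i / (num_points - 1) for i in range(num_points)]

def db_steps(gains, floor_db=-60.0):
    """Change in gain, in dB, between successive points.

    Zero gain is clamped to floor_db so the logarithm stays finite
    (assumption: -60 dB is treated as silence).
    """
    def to_db(g):
        return max(20.0 * math.log10(g), floor_db) if g > 0 else floor_db

    levels = [to_db(g) for g in gains]
    return [b - a for a, b in zip(levels, levels[1:])]

for k, step in enumerate(db_steps(straight_line_fade(11))):
    print(f"segment {k}: {step:+.2f} dB")
```

With 11 points, the first segment jumps by 40 dB while the last rises by less than 1 dB.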

In this specification, the term spectral leakage is used to denote these extra frequency components.

Spectral leakage can be a problem, not only for reasons of general reduced audio fidelity, but also because the spectral leakage can, in some instances, spill from frequencies to which human hearing is less sensitive, into frequencies to which human hearing is more sensitive. For a given shape of fading curve, spectral leakage becomes more pronounced as the total time allowed for the curve is reduced, in that the new spectral components become ever more displaced in frequency from the spectral components of the original signal.

Another known example of a fading curve is one in which the gain changes by a fixed decibel value per unit time. In other words, the fading curve graph has the form of a straight line when the gain is expressed logarithmically. Fader gain is routinely expressed and controlled in decibel form, so this would seem a convenient type of fading curve to use. However, it can cause problems in terms of obtrusiveness. Although the rate of change of perceived loudness is, with limitations, more uniform than the raw amplitude straight-line fade discussed above, spectral leakage is substantially more of a problem. It is found that significant frequency components of spectral leakage occur at a greater frequency displacement from the frequency components of the original signal. This is due to a sharp edge that occurs in the graph of fader gain versus time, when the gain is viewed linearly (i.e. as a raw amplitude multiple). When viewed in such a way, the fading portions of the graph are exponential in form, with a sharp corner where the tops of the exponential curves meet the final fader gain value. It is known that sudden (or narrow) features in a time domain signal usually equate to wide features in its frequency domain representation. This is the mathematical basis of the wide spreading of spectral leakage components caused by this form of fading curve. The problem is compounded when the fade must be completed in a short period of time.
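The sharp corner can be seen numerically. This sketch assumes a fade-in that rises at a fixed dB rate from -60 dB (a fixed-dB fade never reaches true silence) to 0 dB over a fading period normalised to one second, after which the fader holds a gain of 1. Viewed as a raw amplitude multiple, the slope just before the end is large, then drops to zero the moment the gain is held:

```python
def fixed_db_fade(num_points, start_db=-60.0):
    """Fade-in rising by a fixed number of dB per step from start_db to 0 dB,
    returned as raw amplitude multiples (gain 1 at the final point)."""
    return [10.0 ** ((start_db - start_db * i / (num_points - 1)) / 20.0)
            for i in range(num_points)]

def slopes(values, dt):
    """Backward-difference slope of the raw gain between successive points."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]

n = 101
dt = 1.0 / (n - 1)            # fading period normalised to 1 second
fade = fixed_db_fade(n)
held = fade + [1.0] * 10      # the fader then holds its final gain of 1
s = slopes(held, dt)
print(f"slope entering the end of the fade: {s[n - 2]:.2f} per second")
print(f"slope once the gain is held:        {s[n - 1]:.2f} per second")
```

The continuous-time slope at the end of this fade is 3 ln(10), about 6.9 per second, so the first derivative jumps by that amount at the corner.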

Another well-known type of audio fading is cross-fading. In cross-fading, one original audio source is faded out, whilst another is simultaneously faded in. Many cross-fading systems use a "constant power" approach, in which the power of the signal output is kept nominally constant throughout the fade. (These systems often assume that the two original source signals have the same signal power, and that they are uncorrelated.) However, this approach has limitations. Firstly, the approach only works in the situation where a cross-fade is required, i.e. a transition from one original audio source to another. (It does not work for transitions to or from silence, for instance.) Secondly, maintaining constant output power throughout the cross-fade does not imply any particular shape (or rate) of fading curve. For example, any fading curve shape may be chosen for one source; then the fading curve for the other can simply be calculated so as to maintain constant power. So, for the special case of a cross-fade, although attempts have been made to keep perceived loudness constant, the spectral leakage remains undefined, with consequent possible problems as outlined earlier.
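The point that constant power leaves the fading curve shape undetermined can be checked numerically. This sketch assumes equal-power, uncorrelated sources, as in the parenthetical above, and derives the fade-in partner for two different fade-out shapes (the function name is hypothetical). Both pairs hold the total power at 1, yet the curves differ:

```python
import math

def constant_power_partner(fade_out):
    """Given any fade-out curve (raw gains in [0, 1]), return the fade-in curve
    with g_out**2 + g_in**2 == 1 at every point, i.e. constant total power for
    equal-power, uncorrelated sources."""
    return [math.sqrt(max(0.0, 1.0 - g * g)) for g in fade_out]

n = 9
# Two arbitrary fade-out shapes: straight-line and raised-cosine.
linear_out = [1.0 - i / (n - 1) for i in range(n)]
cosine_out = [0.5 * (1.0 + math.cos(math.pi * i / (n - 1))) for i in range(n)]

for name, out in (("linear", linear_out), ("cosine", cosine_out)):
    fade_in = constant_power_partner(out)
    power = [a * a + b * b for a, b in zip(out, fade_in)]
    print(name, [round(p, 6) for p in power])
```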

The development of the present invention was motivated by a specific application: to provide relief from chronic tinnitus, using a well-known effect called residual inhibition. In this application, alternating periods of a certain type of sound (termed trigger sound) and silence are played to the user. It is known that for most people with tinnitus, their tinnitus can be reduced, or even silenced, for a temporary period after playing an appropriate trigger sound. It is also known that by playing repeating trigger sounds, where each trigger sound is followed by an appropriate period of silence, tinnitus suppression can be maintained for as long as these sounds are played.

However, for the technique to be practically useful, the repeating trigger sound must be less annoying to the user than their tinnitus. A key factor in achieving this is minimizing the perceptual salience of the fading between the periods of trigger sound and the periods of silence. However, there is design pressure to keep the fading period short, as the longer the sound level is kept at its intended full level, the more effective the residual inhibition in terms of its depth and duration. Previously published work on repeating trigger sounds has simply used an instant transition, i.e. no fading period, between the periods of trigger sound and silence. This abrupt form of transition is highly obtrusive in terms of two aspects of perceptual salience.

Firstly, the perceived loudness changes very quickly, making this perceptual aspect particularly attention-grabbing. Secondly, the transition also gives rise to very high levels of spectral leakage. This spreads energy from the intended frequency region, which is generally in a less sensitive region of the user's hearing, into frequencies for which the user's hearing has greater sensitivity, with consequent increased obtrusiveness. In published annoyance studies, groups of people rated this type of sound as being highly annoying, as might be expected from the forerunning arguments.

According to a first aspect of the present invention, there is provided a fading curve generator for modifying an audio signal, the generator including a loudness rate model which provides a measure of rate of change of loudness and a spectral leakage model which provides a measure of spectral leakage, wherein the generator generates a fading curve which optimizes the rate of change of loudness measure and the spectral leakage measure.

According to a second aspect of the present invention, there is provided an audio signal fader for modifying an audio signal, the fader comprising a variable gain element, the variable gain element having an input which is a fading curve, wherein the fading curve optimizes a measure of a rate of change of loudness and a measure of spectral leakage.

Possibly, the audio signal fader includes a fading curve generator which generates the fading curve. Alternatively, the fader includes a store, which stores one or more fading curves generated by a fading curve generator.

Possibly, the generator includes a loudness rate model which provides a measure of rate of change of loudness and a spectral leakage model which provides a measure of spectral leakage.

Possibly, the measure of rate of change of loudness is derived from psychoacoustic data.

Possibly, the fading curve generator includes an optimizer, which optimizes the rate of change of loudness measure and the spectral leakage measure. Possibly, the optimizer utilizes psychoacoustic data.

According to a third aspect of the present invention, there is provided an audio system, the audio system including an audio signal fader, the fader comprising a variable gain element, the variable gain element having an input which is a fading curve, wherein the fading curve optimizes a measure of a rate of change of loudness and a measure of spectral leakage.

Possibly, the system includes one or more sound-emitting transducers that are fed, directly or indirectly, by an output of the audio signal fader.

Possibly, the fader includes an input which is an audio signal.

Possibly, the audio input has a fading period, which is the time period over which a fade (i.e. a change in loudness) occurs, and a direction (i.e. rising or falling), and the fading curve is dependent on the fading period and the direction of the audio input. Possibly, the fader includes an output which is a modified audio signal, in which the perceptual salience of the output audio signal has been minimised. Possibly, only the perceptual salience of any fading performed by the fader is minimised. Possibly, the perceptual salience is minimised in accordance with psychoacoustic data.

Possibly, the audio system includes transmission and/or distribution apparatus to relay the modified audio signal to a remote user.

According to a fourth aspect of the present invention, there is provided a method of generating a fading curve so as to optimize a measure of rate of change of loudness and a measure of spectral leakage.

According to a fifth aspect of the present invention, there is provided a method of modifying an audio signal, the method including generating a fading curve so as to optimize a measure of rate of change of loudness and a measure of spectral leakage.

Possibly, the method includes any of the steps or features described above.

According to a sixth aspect of the present invention, there is provided a computer program comprising computer program code means adapted to perform the method described in any of the preceding paragraphs when the program is run on a computer, a mobile device, a computer network or a telephony network.

Possibly, the computer program is embodied on a computer readable medium.

Embodiments of the present invention will now be described, by way of example only, and with reference to the accompanying drawings, in which:-
Figure 1 is a block schematic diagram of a fading curve generator;
Figure 2 is a block schematic diagram, illustrating a component of the fading curve generator;
Figure 3 is a block schematic diagram of an audio signal fader that includes the fading curve generator;
Figure 4 is a block schematic diagram of an audio signal fader that includes a store of one or more fading curves generated by the fading curve generator; and
Figure 5 is a block schematic diagram of a system that includes the audio signal fader of Figure 3 or 4, and that has one or more sound-emitting transducers.

In accordance with an embodiment of the present invention, in Figure 1 a fading curve generator 1 comprises a number of functional sub-blocks that are described as follows. The output of any of the sub-blocks numbered 2, 3, 4, 5 or 6 may be in the form of a signal, or data placed into a data store (e.g. electronic memory, as might be used in a microprocessor-based implementation), or in any format that allows compatible information transfer.

A curve rendering engine (CRE) 2 outputs a curve (equivalently, a mathematical function) of fader gain versus time. The curve represents fader gain over the time period of a single fade, either for fading in from silence to a full level, or for fading out from a full level to silence. The CRE 2 generates (or renders) the curve in accordance with a variable geometry, which is described in more detail later. The particular shape of the curve generated by the variable geometry is determined by various input parameters. One or more of the input parameters may be fixed throughout the operation of the fading curve generator 1, in which case such input parameters are supplied to the CRE 2 by a store of fixed CRE inputs 6. One or more other input parameters can vary throughout the operation of the fading curve generator 1, and these are supplied to the CRE 2 by an optimizer 5. The optimizer 5 decides the values of the variable CRE inputs using a mathematical optimization procedure, which is discussed in more detail later. The optimizer 5 takes as inputs: a measure of rate of change of loudness as determined by a loudness rate model 3, and a measure of spectral leakage as determined by a spectral leakage model 4. These models are described in more detail below. In turn, the loudness rate model 3 and the spectral leakage model 4 take as input the curve generated by the CRE 2.

The entire system within the fading curve generator 1 operates in an iterative manner. Before running the iterative procedure, the optimizer 5 should preferably be initialized such that its outputs cause the CRE 2 to produce a valid curve of fader gain versus time, i.e. such that the time ranges over a specified fading period, and such that the fader gain (when expressed as a raw amplitude multiple) ranges between zero and one. The system within the fading curve generator 1 then iterates for a number of iterations until a terminating condition (discussed further below) is reached. During this iterative procedure, the optimizer 5 seeks to minimize both of its inputs, i.e. the measure of rate of change of loudness and the measure of spectral leakage. After the iterative procedure has terminated, a duly optimized fading curve is available at the output of the CRE 2 for external use outside of the fading curve generator 1 via an output 8, e.g. for use in a fader system. The details of the sub-blocks within the fading curve generator 1 are now discussed further.
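The iterative structure of Figure 1 can be sketched as follows. Every component here is a hypothetical toy, not the models this specification describes: a piecewise-linear CRE, a dB-step loudness rate measure, a second-difference proxy for spectral leakage, and a random-perturbation hill climber standing in for a conventional optimizer. The weights are arbitrary. Only the control flow (initialise to a valid curve, evaluate both models, update, stop when progress stalls) follows the description above.

```python
import math
import random

# Toy, hypothetical stand-ins for the sub-blocks of Figure 1.

def render_curve(interior_gains, samples_per_segment=10):
    """Toy CRE 2: piecewise-linear curve through equi-spaced knots whose end
    gains are fixed at 0 (silence) and 1 (full level), i.e. a fade-in."""
    knots = [0.0] + list(interior_gains) + [1.0]
    curve = []
    for a, b in zip(knots, knots[1:]):
        curve.extend(a + (b - a) * k / samples_per_segment
                     for k in range(samples_per_segment))
    curve.append(knots[-1])
    return curve

def loudness_rate_measure(curve, floor_db=-60.0):
    """Toy loudness rate model 3: largest step in dB between samples."""
    db = [max(20.0 * math.log10(g), floor_db) if g > 0 else floor_db
          for g in curve]
    return max(abs(b - a) for a, b in zip(db, db[1:]))

def leakage_measure(curve):
    """Toy spectral leakage model 4: energy of the second difference, which
    grows at sharp corners (a proxy, not a spectral computation)."""
    return sum((c - 2.0 * b + a) ** 2
               for a, b, c in zip(curve, curve[1:], curve[2:]))

def combined_cost(params, w_loud=1.0, w_leak=1000.0):
    curve = render_curve(params)
    return w_loud * loudness_rate_measure(curve) + w_leak * leakage_measure(curve)

def generate_fading_curve(num_interior=5, step=0.05, patience=100, seed=0):
    """Toy optimizer 5 (random-perturbation hill climbing) driving the loop."""
    rng = random.Random(seed)
    # Initialise so that the CRE produces a valid curve (gains within [0, 1]).
    params = [(i + 1) / (num_interior + 1) for i in range(num_interior)]
    best = combined_cost(params)
    stalled = 0
    while stalled < patience:  # terminating condition: optimization stalls
        trial = [min(1.0, max(0.0, p + rng.gauss(0.0, step))) for p in params]
        trial_cost = combined_cost(trial)
        if trial_cost < best - 1e-9:
            params, best, stalled = trial, trial_cost, 0
        else:
            stalled += 1
    return render_curve(params), best

curve, cost = generate_fading_curve()
print(f"{len(curve)} samples, final combined cost {cost:.4f}")
```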

Internally, the CRE 2 may use any one of a number of possible geometric models to define the curve of fader gain versus time. These models take various inputs as parameters, from the optimizer 5, and optionally from the fixed CRE inputs 6. The models are referred to here simply as variable geometries. The following list gives some examples of possible types of variable geometry that may be used: piecewise linear (also called a first-order spline), higher-order piecewise polynomial (also called a higher-order spline), single polynomial, additive combination of trigonometric functions (including the Fourier transform, or its inverse), or simple selection from a number of pre-existing curves.

Each item in the preceding list is a known type of mathematical function, and it will be apparent to those skilled in the art that there are other known types that could be used too. However, for this application there are a number of factors that favour the use of certain types of function over others.

Spline functions (continuous piecewise polynomials), by definition, have no jump discontinuities within the function itself. However, by repeatedly differentiating the function, up to some order, jump discontinuities are generally found. As is known, for a spline function of order n, the derivatives of order n-1 and lower are guaranteed to be free from jump discontinuities. However, the derivatives of order n and higher generally contain jump discontinuities. For any function, if there is a jump discontinuity in a derivative of any order, the Fourier transform (i.e. the frequency spectrum) of that function will be infinitely wide. (In other words, the amplitude of the Fourier transform does not vanish above any finite frequency.) This is an important consideration for the present application, because if the frequency spectrum of the fading curve is broad, then the resulting spectral leakage frequency components (due to the action of fading) will generally be widely displaced from the frequency components of the original signal, and will therefore be more problematic. However, the effect of a jump discontinuity depends strongly on the order (n) of the derivative in which it occurs: a jump in the n-th derivative makes the spectrum's amplitude fall off roughly in proportion to 1/f^(n+1), where f is frequency. In general, therefore, the larger the value of n, the sooner the fading curve's spectrum decays (with increasing frequency) to near-zero amplitude; although in theory it never gets to zero exactly.
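The effect of derivative order can be illustrated with a toy experiment. This sketch assumes a single trigger-sound gain envelope (silence, fade in, full level, fade out, silence) of 512 samples with 32-sample fades, built from three ramp shapes whose first jump occurs in the 1st, 2nd and 3rd derivative respectively. As a stand-in for a spectral leakage measure, it reports the fraction of the envelope's DFT energy at or above bin 64 (four times the inverse fade length); the names, sizes and cutoff are illustrative assumptions.

```python
import cmath
import math

def ramp(shape, r):
    """r samples of a rising ramp f(t), t = 0, 1/r, ..., (r-1)/r, with
    f(0) = 0 and f(1) = 1. Comments give the lowest derivative order with a
    jump where the ramp meets silence and full level."""
    out = []
    for i in range(r):
        t = i / r
        if shape == "linear":          # jump in the 1st derivative
            out.append(t)
        elif shape == "cosine":        # jump in the 2nd derivative
            out.append(0.5 - 0.5 * math.cos(math.pi * t))
        elif shape == "smootherstep":  # jump in the 3rd derivative
            out.append(t ** 3 * (t * (6.0 * t - 15.0) + 10.0))
        else:
            raise ValueError(f"unknown shape: {shape}")
    return out

def burst_envelope(shape, n=512, r=32, hold=128):
    """Silence, fade in, full level, fade out (time-reversed fade in), silence."""
    up = ramp(shape, r)
    body = up + [1.0] * hold + up[::-1]
    pad = n - len(body)
    return [0.0] * (pad // 2) + body + [0.0] * (pad - pad // 2)

def high_freq_fraction(x, cutoff_bin):
    """Fraction of the one-sided DFT energy at or above cutoff_bin
    (plain O(n^2) DFT, written out for clarity rather than speed)."""
    n = len(x)
    energy = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(x))
        energy.append(abs(coeff) ** 2)
    return sum(energy[cutoff_bin:]) / sum(energy)

for shape in ("linear", "cosine", "smootherstep"):
    print(f"{shape:>12}: {high_freq_fraction(burst_envelope(shape), 64):.3e}")
```

The high-frequency fraction falls as the first discontinuity moves to a higher derivative, which is the behaviour the preceding paragraph relies on when favouring higher-order splines.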

For practical purposes, spectral leakage can therefore be constrained more narrowly around the frequency components of the original signal by using a higher-order spline for the fading curve, than by using a lower-order spline. The use of 3rd order (cubic) splines is almost ubiquitous in the practical application of spline functions. However, for the present application, a 4th order (quartic) spline is preferred. It provides a good balance between minimizing the spectral leakage issues just described, and known issues with higher-order splines (including oscillatory problems related to Runge's phenomenon, and general implementation complexity).

A single polynomial is also listed above as a possible type of variable geometry. Whilst single polynomials do have some appealing characteristics in this application, such as being totally continuous within the fading period for any order of derivative, they have some unappealing characteristics too. It is known that a polynomial curve can be specified by giving a number of points through which the curve must pass; a technique known as polynomial interpolation. However, with more than a few points, Runge's phenomenon can lead to large and problematic oscillations. If specified in other ways, such as by direct specification of the polynomial coefficients, oscillatory problems can still occur, and the control of the CRE 2 can become less tractable.

The additive combination of trigonometric functions, e.g. the use of an inverse Fourier transform, is another possibility. Again, this produces a curve that is totally continuous within the fading period for any order of derivative.

However, there can be problems with jump discontinuities at the start and end of the curve, i.e. where the fader's gain transitions between silence and the fading curve, or between the final full fader level and the fading curve.

Whereas a spline function of order n is guaranteed, for its derivatives up to order n-1, to have no jump discontinuities at the curve's start and end, there is in general no such guarantee with an additive combination of trigonometric functions. This can lead to significant spectral leakage.

Simple selection of one curve from a number of pre-existing curves is a perfectly valid form of operation for the CRE 2. It allows the system of the fading curve generator 1 to identify the best of a number of given candidate fading curves, in terms of minimizing both of the inputs of the optimizer 5, i.e. the measure of rate of change of loudness and the measure of spectral leakage. In this mode of usage, where the candidate fading curves have no particular ordering of variable features, it makes sense for the optimizer 5 to perform a simple brute-force search through all of the candidates. Of course, this method of using the fading curve generator 1 requires that a number of candidate fading curves have somehow been pre-prepared. Due to the inherently large number of possible fading curves, and the inherently slow nature of brute-force searching, in practice this method can only search a relatively small set of possibilities.
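Brute-force selection is simple to sketch. The two measures below are hypothetical toy stand-ins (largest raw gain step for the loudness rate, second-difference energy for spectral leakage), and each candidate is padded with the preceding silence and following full level so that corners at the ends are counted. The example also shows that the winning candidate depends on the weighting between the two measures:

```python
import math

def max_step(curve):
    # Toy stand-in for the measure of rate of change of loudness.
    return max(abs(b - a) for a, b in zip(curve, curve[1:]))

def corner_energy(curve):
    # Toy stand-in for the measure of spectral leakage.
    return sum((c - 2.0 * b + a) ** 2
               for a, b, c in zip(curve, curve[1:], curve[2:]))

def select_best(candidates, w_loud=1.0, w_leak=1.0):
    """Brute-force search: evaluate every candidate once and return the
    index of the one with the lowest weighted combined cost."""
    costs = [w_loud * max_step(c) + w_leak * corner_energy(c) for c in candidates]
    return min(range(len(candidates)), key=costs.__getitem__)

def with_context(fade):
    # Preceding silence and following full level for a fade-in.
    return [0.0] + fade + [1.0]

n = 11
candidates = [
    with_context([0.0] * 5 + [1.0] * 6),                        # 0: instant
    with_context([i / (n - 1) for i in range(n)]),              # 1: straight line
    with_context([0.5 - 0.5 * math.cos(math.pi * i / (n - 1))
                  for i in range(n)]),                          # 2: raised cosine
]
print(select_best(candidates, w_leak=1.0))   # → 1
print(select_best(candidates, w_leak=10.0))  # → 2
```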

For the reasons discussed, it is therefore preferred that the CRE 2 uses a quartic spline function for its variable geometry. Alternatively, other functions can be used, as noted above.

The store of fixed CRE inputs 6 can hold a number of fixed parameters for the CRE 2, if required. In general, optimizers work more quickly, and sometimes more accurately, when constrained so as to avoid exercising unnecessary degrees (or amounts) of freedom. For example, the preferred quartic spline variable geometry takes as input a set of points (in the dimensions of gain and time) through which the fading curve must pass. It has been found that the time values of this set can be fixed, e.g. at equi-spaced instants throughout the fading period, and that this speeds the operation of the optimizer 5. (In other words, only the gain dimension is varied by the optimizer 5.) Therefore, for the case of a quartic spline variable geometry, it is preferred that the store of fixed CRE inputs 6 holds such a set of time values.
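A minimal sketch of how the fixed, equi-spaced time values might be combined with the optimizer's variable gain values into the knot points that a CRE would interpolate. The function name and the pinning of the end gains to the required starting and finishing gains (0 and 1 for a fade-in, full level taken as 1) are illustrative assumptions; the interpolation itself is not shown.

```python
def knot_points(variable_gains, fading_period, fading_in=True):
    """Return (time, gain) knots. Times are fixed and equi-spaced over the
    fading period (the role of the store of fixed CRE inputs); only the
    interior gains come from the optimizer."""
    start, finish = (0.0, 1.0) if fading_in else (1.0, 0.0)
    gains = [start] + list(variable_gains) + [finish]
    n = len(gains)
    times = [fading_period * i / (n - 1) for i in range(n)]
    return list(zip(times, gains))

# The optimizer searches over just three numbers here:
print(knot_points([0.1, 0.5, 0.9], fading_period=0.04))
```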

It is well known that many optimization algorithms (also known simply as optimizers) have been developed for the general purpose of finding optimal values for the arguments of a function. In the field of optimization, such a function is often called a cost function, or an objective function. By convention, the task of the optimizer is usually to minimize, rather than maximize, the value of a cost function. Thus the term "to optimize" usually means "to minimize a cost function value". (If an original problem requires a maximum value to be found, this can trivially be converted from a maximization problem to a minimization problem, e.g. just by changing the sign of the cost function value.) Optimizers generally work best when the graph representing the cost function is a smooth curve or surface of some form. Often this surface is in multiple dimensions: one dimension for the cost function value, and a dimension for each of the arguments of the function. This is the situation we have with the optimizer 5 presently considered. Each of the outputs of the optimizer 5 represents an argument of the cost function. The cost function itself is defined by the specific actions of the CRE 2, the loudness rate model 3, the spectral leakage model 4, and a means within optimizer 5 for dealing with its two inputs, shortly to be discussed. The multidimensional surface of this cost function is generally smooth, if the preferred schemes described herein are used. Therefore, the problem of minimizing the cost function value is a good candidate for a general-purpose optimizer. It will be apparent to the skilled person that there are many available optimizers that are adept at performing this kind of optimization, to good accuracy, and within a practically acceptable run time. Therefore, within the optimizer 5, the specific optimization algorithm is preferably one such conventional optimizer, and the algorithm's internal details are not discussed further here. 
However, there are considerations that need to be addressed regarding the handling of the two inputs to the optimizer 5.

The general optimization approach just outlined assumes that there is a single cost function value to be minimized. However, the optimizer 5 has two inputs, both of which are required to be minimized in some form. Two approaches for handling this are now discussed. The first approach is to combine the two inputs into a single measure. The second is to use the known method of multi-objective optimization. The first approach is preferred.

It will be apparent that there are many possible ways of combining the two inputs of the optimizer 5 into a single measure. A key consideration is how the two inputs are weighted, one versus the other. For example, each input can be multiplied by a constant particular to that input to produce a scaled result. (The constants are termed weights.) The two scaled results can then be added together to form the single measure. This is a simple but effective weighting method. At one extreme, the single measure can be weighted to include just the measure of spectral leakage. In this case, fading curves similar to those of the classic known window functions can be generated. At the other extreme, the single measure can be weighted to include just the measure of rate of change of loudness. In this case, the fading curve limits the rate of change of loudness, but spectral leakage issues can occur, especially due to sharp corners where the tops of the fading curves meet the final fader gain value, as discussed earlier. Between these two extremes, a range of novel fading curves can be generated, in which both the spectral leakage and the rate of change of loudness are optimized, in some balanced way.
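In symbols, using notation that does not appear in the original (p for the vector of optimizer outputs, g_p for the curve the CRE 2 renders from them, R and S for the outputs of the loudness rate model 3 and the spectral leakage model 4), the weighted-sum measure is:

```latex
J(\mathbf{p}) = w_R \, R\bigl(g_{\mathbf{p}}\bigr) + w_S \, S\bigl(g_{\mathbf{p}}\bigr),
\qquad w_R,\ w_S \ge 0
```

Setting w_R = 0 gives the leakage-only extreme and w_S = 0 the loudness-rate-only extreme. Only the ratio w_R / w_S matters, since scaling both weights by the same positive factor scales J without moving its minimum.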

Preferably, the specific ratio of the two weight values is decided from measured psychoacoustic data. Thus the optimizer 5 utilizes psychoacoustic data 18. In other words, the results of listening tests on a group of people generate the psychoacoustic data, which are used to determine the perceptual salience due to rate of change of loudness, and separately due to spectral leakage (using the same scale of salience). From this, an appropriate ratio for a single combined salience metric can be determined.

Multi-objective optimization is another possible approach, in which more than one cost function, sharing the same arguments, can be taken into consideration by the optimizer. With this approach, each of the two inputs to the optimizer 5 can be used directly as a cost function value. However, inherent to this approach is the problem that the optimizer has no information on how to weight one cost function value versus the other. To handle this problem, one approach is to use a weighted addition of the two inputs to the optimizer 5 as just described above, thereby converting the problem into a single-objective optimization. The optimizer can then be run multiple times, using different ratios between the two input weights. This produces a set of pairs of cost function values (along with the associated optimal cost function arguments) that is termed a Pareto set. (Strictly, the Pareto set is the set of all possible such pairs of values.) When the Pareto set is plotted graphically, as one cost function value versus the other, the curve is known as a Pareto frontier. The Pareto frontier shows how one cost function value can be traded off against the other, all points on the curve being optimal for a particular weighting ratio. Effectively, the design choice of which weighting ratio to use is simply deferred by this approach: the choice must be made at some point. However, the Pareto frontier curve may be useful if other factors need to be considered in making the decision. Again, the decision should preferably be made based on the measured psychoacoustic data 18, as discussed above.
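The weight sweep can be sketched with a hypothetical one-parameter family of fade-in curves (a blend of a straight-line and a raised-cosine fade) and the same kind of toy measures used as stand-ins elsewhere: largest raw gain step for the loudness rate and second-difference energy for spectral leakage. A grid search replaces a real optimizer. Each weight yields one point of an approximate Pareto set; as the leakage weight grows, the leakage value cannot increase and the loudness-rate value cannot decrease, which is a general property of weighted-sum minimisation.

```python
import math

def blended_fade(a, n=11):
    """Hypothetical family: a = 0 is a straight-line fade-in, a = 1 a
    raised-cosine fade-in, padded with preceding silence and full level."""
    fade = [(1.0 - a) * i / (n - 1)
            + a * (0.5 - 0.5 * math.cos(math.pi * i / (n - 1)))
            for i in range(n)]
    return [0.0] + fade + [1.0]

def max_step(curve):
    # Toy stand-in for the measure of rate of change of loudness.
    return max(abs(b - a) for a, b in zip(curve, curve[1:]))

def corner_energy(curve):
    # Toy stand-in for the measure of spectral leakage.
    return sum((c - 2.0 * b + a) ** 2
               for a, b, c in zip(curve, curve[1:], curve[2:]))

def pareto_sweep(leak_weights, grid):
    """Solve one weighted single-objective problem per leakage weight
    (loudness weight fixed at 1) and collect the pairs of cost values."""
    points = []
    for w in leak_weights:
        best_a = min(grid, key=lambda x: max_step(blended_fade(x))
                     + w * corner_energy(blended_fade(x)))
        curve = blended_fade(best_a)
        points.append((w, best_a, max_step(curve), corner_energy(curve)))
    return points

grid = [k / 20 for k in range(21)]
for w, a, r, s in pareto_sweep([0.0, 1.0, 3.0, 10.0, 100.0], grid):
    print(f"w_leak={w:6.1f}  a={a:.2f}  rate={r:.4f}  leakage={s:.5f}")
```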

Optimization algorithms are usually iterative, and will generally run forever if some terminating condition is not detected and the procedure stopped accordingly. There are many known general types of terminating condition, including terminating on achieving a target cost function value, a computation run-time limit, an iteration count limit, or the occurrence of optimization "stalling" (i.e. when no further significant reduction in cost function value is observed throughout some timeout period). Any of these types, or others, can be used for the present application, for either single-objective or multi-objective optimization. The detection of optimization stalling is the preferred type of terminating condition.
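The preferred stall-based terminating condition could be sketched as below. It assumes the timeout period is counted in iterations (a wall-clock timeout would be an equally valid reading) and that "significant" means an improvement larger than a fixed threshold; the class and parameter names are hypothetical.

```python
import math

class StallDetector:
    """Stop when the best cost has not improved by more than min_improvement
    for `patience` consecutive iterations."""

    def __init__(self, patience=50, min_improvement=1e-6):
        self.patience = patience
        self.min_improvement = min_improvement
        self.best = math.inf
        self.since_improvement = 0

    def update(self, cost):
        """Record one iteration's cost; return True when the optimizer should stop."""
        if cost < self.best - self.min_improvement:
            self.best = cost
            self.since_improvement = 0
        else:
            self.since_improvement += 1
        return self.since_improvement >= self.patience

# Usage with a toy cost sequence that converges towards 1:
detector = StallDetector(patience=20, min_improvement=1e-6)
k = 0
while not detector.update(1.0 + 0.5 ** k):
    k += 1
print(f"stopped after {k + 1} iterations, best cost {detector.best:.7f}")
```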

Figure 2 illustrates the loudness rate model 3. The loudness rate model 3 comprises a number of functional sub-blocks that are now described briefly, and then more fully in the following text. An endpoint appender 32 takes a fading curve and appends new points to the start and end of it. The endpoint appender 32 passes the resulting appended curve, termed here the "appended fading curve", to a loudness model 33. The loudness model 33 computes a measure of loudness versus time, and outputs it to a differentiator 34. The differentiator 34 differentiates the measure of loudness versus time, and passes a resulting measure of rate of change of loudness versus time to a maximum finder 35. The maximum finder 35 locates the maximum value of the measure of rate of change of loudness versus time, and passes the value of the maximum out of the loudness rate model 3, where it may be used as a measure of rate of change of loudness.

The following describes in more detail how the loudness rate model 3 is used.

In each iteration of the fading curve generator 1, the endpoint appender 32 takes a fading curve produced by the CRE 2 as a loudness rate model input 20. At the very start of each fading curve, the endpoint appender 32 appends at least one new point, where the gain of the new point is set to the required starting gain. Likewise, at the very end of each fading curve, the endpoint appender 32 appends at least one new point, where the gain of this new point is set to the required finishing gain. (So, for the case of fading in, the starting gain is set to zero, and the finishing gain is set to the fader's required full level. For fading out, the settings are reversed.) The loudness model 33 takes the appended fading curve from the endpoint appender 32 and computes a measure of loudness versus time. In the field of psychoacoustics, a number of loudness models are known that are suitable for the present application. These models have been developed to agree with psychoacoustic data, from listening tests on groups of people, on the relative perceived loudness of different sounds. Thus the loudness model 33 is derived from psychoacoustic data 28. Although all psychoacoustic tests require human judgement on the part of the test subjects, they are generally designed and executed in a scientific manner. For instance, test subjects may be required to match the loudness of a reference sound against some different test sound. Results are only accepted as a general characterization of human hearing when they are repeatedly and consistently seen, with similar results obtained from person to person. Some of the simpler known loudness models do not take proper account of time-varying sounds. For example, one simple known model takes loudness to be proportional to the logarithm of the amplitude or power of the signal.
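The first two stages of Figure 2 might look like this in a minimal sketch. It uses the simple logarithmic loudness model just mentioned, not the preferred time-varying models, which are beyond a short example. It also takes the full level as a gain of 1 and clamps silence at -60 dB; these choices and the function names are assumptions.

```python
import math

def append_endpoints(fading_curve, fading_in=True, extra=1):
    """Endpoint appender 32: add `extra` points at each end holding the
    required starting and finishing gains (0 then full level for a fade-in,
    reversed for a fade-out). Full level is taken as 1.0 (assumption)."""
    start, finish = (0.0, 1.0) if fading_in else (1.0, 0.0)
    return [start] * extra + list(fading_curve) + [finish] * extra

def simple_loudness(gains, floor_db=-60.0):
    """Simple, non-time-varying loudness: proportional to the logarithm of
    amplitude, expressed in dB and clamped at floor_db so zero gain is finite.
    The gain curve is fed in directly, with no audio signal or frequency
    weighting, as the description permits."""
    return [max(20.0 * math.log10(g), floor_db) if g > 0 else floor_db
            for g in gains]

appended = append_endpoints([0.0, 0.25, 0.5, 0.75, 1.0])
print(appended)
print([round(v, 2) for v in simple_loudness(appended)])
```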

However, more sophisticated known loudness models do take account of a sound's duration, or the way in which a sound varies, prior to the time instant for which the loudness is computed. Since the fading curve is time varying, these more sophisticated models are preferred. For such models, the curve generated by the fading curve generator 1 for fading out may not simply be a time-reversed version of the curve generated for fading in. Therefore the fading curve generator 1 may have to be run separately for the fading in and fading out cases, rather than running it for one case and simply time-reversing the result for the other.
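The consequence for fading out can be illustrated with two stand-in loudness models. Both are assumptions for illustration only: `simple_loudness` is the memoryless logarithmic model mentioned earlier, while `integrated_loudness` is a toy first-order smoothing of power, not any published psychoacoustic model. With the memoryless model, the loudness of a time-reversed curve is exactly the time reverse of the original loudness; once the model has memory, that symmetry is lost.

```python
import numpy as np


def simple_loudness(gain, floor=1e-10):
    """Memoryless stand-in: level in dB of the gain's power (10*log10(g**2)),
    floored so that zero gain gives a finite value."""
    power = np.asarray(gain, dtype=float) ** 2
    return 10.0 * np.log10(np.maximum(power, floor))


def integrated_loudness(gain, alpha=0.5, floor=1e-10):
    """Toy time-dependent stand-in (NOT a published loudness model): power is
    smoothed by a first-order leaky integrator before taking the log, so each
    output value depends on what came before it."""
    power = np.asarray(gain, dtype=float) ** 2
    smoothed = np.empty_like(power)
    state = power[0]  # assume steady state before the first sample
    for n, p in enumerate(power):
        state = alpha * state + (1.0 - alpha) * p
        smoothed[n] = state
    return 10.0 * np.log10(np.maximum(smoothed, floor))


curve = np.array([0.0, 0.0, 0.5, 1.0, 1.0])  # an appended fade-in curve
reversed_curve = curve[::-1]                 # the naive fade-out

# Memoryless model: reversing the curve simply reverses the loudness.
print(np.allclose(simple_loudness(reversed_curve), simple_loudness(curve)[::-1]))  # True
# Model with memory: the symmetry no longer holds.
print(np.allclose(integrated_loudness(reversed_curve), integrated_loudness(curve)[::-1]))  # False
```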

Loudness models often take an audio signal as input, and they often include frequency domain processing. The frequency domain processing usually includes a method of weighting some frequencies more than others, to take account of the fact that human hearing is more sensitive to some frequencies than others. However, both the audio signal input and the frequency domain processing (including any frequency weighting) may be omitted in the loudness model 33. This is preferred, as it avoids the complexity of providing (within the loudness model 33) an audio signal source (providing simulated or actual audio), a variable gain element (to simulate a fader), and the frequency domain processing that is conventionally present within a loudness model. The omission is justified on the grounds that the fader gain, at any instant, is the same at all frequencies. Therefore the rate of change of loudness is the same, to a good approximation, at all frequencies too. Instead of taking an audio signal as input, the otherwise conventional loudness model can take the appended fading curve as input, supplied directly to a later stage in the conventional processing of the model. The details of this interfacing will be apparent to those skilled in the art.

The differentiator 34 takes the measure of loudness versus time, as output by the loudness model 33, and differentiates it, yielding a measure of rate of change of loudness versus time. Regarding perception, it is generally the case that rapid changes in sensory input, or rapid changes in perceptual magnitudes, demand high levels of attention; i.e. such changes have high perceptual salience. This is the basis for the differentiation performed by the differentiator 34: it gives rapid changes in loudness a high cost function value, so the optimizer 5 will work to avoid them. If the measure output by the loudness model 33, and supplied to the differentiator 34, is represented as a discrete uniformly-sampled signal, as is preferred, then the differentiation can be performed simply by subtracting the previous sample from the current sample, optionally followed by scaling of the result.
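A minimal sketch of the differentiator 34, assuming the loudness measure is a uniformly sampled NumPy array. Multiplying by a hypothetical `sample_rate` is one possible choice for the optional scaling step; it expresses the result in loudness units per second.

```python
import numpy as np


def differentiate(loudness, sample_rate):
    """First difference of a uniformly sampled loudness sequence (current
    sample minus previous sample), scaled by the sample rate."""
    loudness = np.asarray(loudness, dtype=float)
    return np.diff(loudness) * sample_rate


rate = differentiate([0.0, 1.0, 3.0, 6.0], sample_rate=2.0)
# rate -> [2.0, 4.0, 6.0]; the output has one value fewer than the input
```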

The differentiator 34 outputs a measure of rate of change of loudness versus time, i.e. a sequence of values of rate of change of loudness, each value corresponding to a different instant in time during the period of the fading curve. However, it is preferred that only one value, summarizing the rate of change of loudness across the whole fade, is passed out of the loudness rate model 3 to the optimizer 5, for the reasons discussed earlier in this text.

There are a number of possible functions that can map the entire sequence of values of rate of change of loudness to a single value, such as taking the mean value of the sequence. However, it is preferred, for psychoacoustic reasons, to evaluate the maximum value of the sequence. This is the function of the maximum finder 35: it takes the output from the differentiator 34, locates the maximum value of the sequence, and passes the value of the maximum to the optimizer 5. This value provides a measure of rate of change of loudness 22.
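The maximum finder 35 can be sketched as below. Here the magnitude of each value is taken before the maximum is found, so that a fade out (where loudness falls) is scored in the same way as a fade in; that use of magnitude is an assumption, since the text speaks only of the maximum value. The example numbers are invented, and show how a mean dilutes a single abrupt step that the maximum retains.

```python
import numpy as np


def maximum_finder(rate):
    """Reduce a sequence of loudness-rate values to a single number.
    Magnitudes are used so falling loudness counts like rising loudness
    (an assumption; the source says only 'maximum value')."""
    return float(np.max(np.abs(np.asarray(rate, dtype=float))))


rate = np.array([0.5, 0.5, 4.0, 0.5, 0.5])  # one abrupt step in an otherwise gentle fade
peak = maximum_finder(rate)                  # 4.0: the abrupt step dominates
mean = float(np.mean(np.abs(rate)))          # 1.2: the step is diluted
```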

The purpose of the spectral leakage model 4 is to evaluate the extent of the spectral leakage caused by the fading curve output by the CRE 2, as this is one of the possible factors affecting the overall perceptual salience of the fading curve. In a preferred form, the spectral leakage model 4 makes use of a fast Fourier transform (FFT) to map between a time domain representation of the fading curve and a frequency domain representation of the fading curve. Prior to running the FFT, the spectral leakage model 4 performs some pre-processing on the time domain representation. After running the FFT, it performs some post-processing on the frequency domain representation, in order to yield a single-valued measure 26 of spectral leakage. The measure 26 is then passed to the optimizer 5. These operations are now discussed in more detail.

As was discussed above regarding the loudness rate model 3, it would be possible within the spectral leakage model 4 to design a system that provides an audio signal source, along with a variable gain element to simulate a fader. The faded audio could then be analyzed for the effects of spectral leakage, compared to the particular original (i.e. pre-fader) audio used. However, again, it is not necessary to involve an actual audio signal.

Fading amounts to multiplication of the original audio signal by another time-varying signal. Mathematically, the operation of multiplication in the time domain is equivalent to the operation of convolution in the frequency domain.

Thus, in the frequency domain, the effect of fading is represented by the convolution of the Fourier transform of the fading curve with the Fourier transform of the original audio signal. This is the mathematical basis of the spectral leakage due to fading. It can be seen that, for any frequency component within the original audio signal, the same spectral leakage pattern occurs, relative to the frequency of that component. This frequency invariance is the reason why it is not necessary to consider a particular audio signal.

Rather, the fading curve itself can be considered to be equivalent to the signal output by a fader whose input is simply a fixed direct current (DC) level. Thus, directly taking the Fourier transform of the fading curve itself yields the spectral leakage pattern, where that pattern is centred around DC (i.e. zero hertz). This is the mathematical basis of the preferred method for evaluating spectral leakage now described.
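The frequency invariance argued above can be checked numerically with the discrete Fourier transform. This sketch, using NumPy, fades a single complex exponential at an assumed bin `m` with an illustrative curve, and confirms that the spectrum of the faded tone is exactly the spectrum of the curve alone (the fader driven by DC), circularly shifted from DC to bin `m`. A complex exponential is used so that the tone occupies a single bin; a real sinusoid would produce the same pattern twice, around +m and -m.

```python
import numpy as np

N = 64
n = np.arange(N)
# Illustrative fading curve: raised cosine over the first half, then full gain.
curve = np.ones(N)
curve[:32] = 0.5 - 0.5 * np.cos(np.pi * n[:32] / 32)

m = 10                                   # hypothetical tone frequency, in FFT bins
tone = np.exp(2j * np.pi * m * n / N)    # single complex exponential at bin m

faded_spectrum = np.fft.fft(curve * tone)  # spectrum of the faded tone
curve_spectrum = np.fft.fft(curve)         # spectrum of the curve alone (input = DC)
shifted = np.roll(curve_spectrum, m)       # same pattern moved from DC to bin m

print(np.allclose(faded_spectrum, shifted))  # True
```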

In standard mathematical form, the Fourier transform takes a time domain function that is defined over an infinite period of time, from a time value of minus infinity to a time value of plus infinity. For example, if we consider a single fading curve for fading in (from silence to the fader's required full level), then this would be represented as the fading period itself, preceded by an infinite period of zero gain and followed by an infinite period at which the gain is at the fader's required full level (termed here "full gain"). In mathematical principle, a Fourier transform can yield a perfect frequency domain representation of such a time domain function. However, practical numerical Fourier transform algorithms, such as the FFT, can only take and process a time domain signal that exists over a finite period of time (called the window period). These algorithms usually inherently assume that the signal has zero amplitude outside of the window period. As is known, problems can arise when this assumption is not actually true, i.e. when the signal is not actually zero outside of the window period. In the case of the infinite-time fading curve just described, the fading period itself should be placed within the window period. Clearly, the two appended infinite periods (one of zero gain, and one of full gain) cannot entirely fit inside the finite window period. In the case of the infinite period of zero gain, its inevitable truncation does not matter, as the FFT correctly assumes that the removed section has zero gain. In the case of the infinite period of full gain, however, its truncation causes spectral inaccuracy (in fact, another form of spectral leakage), because the FFT incorrectly assumes that the removed section has zero gain rather than full gain.

For a standard FFT algorithm, this spectral inaccuracy is mathematically inevitable: it cannot be completely removed. However, it can be made small in practice. One approach, which works to some extent, is to place a time-reversed copy of the fading curve just before the window period finishes, in order to bring the full gain level back down to zero gain within the window period. The spectral leakage due to the time-reversed copy has the same amplitude versus frequency as the spectral leakage due to the original fading curve, but it has different phase versus frequency. The two spectral leakage spectra add together, but the different phase causes the two spectra to sum in a non-ideal way. A preferred approach is to make the window period at least several times longer than the fading period, and to apply a window function over that window period. (There are many commonly known window functions that are suitable for this.) The longer window function will have a spectral leakage pattern that is narrower (in frequency) than the spectral leakage pattern of the fading curve under test, hence keeping modification of the latter pattern to a minimum. (The case of fading in has just been described, but it can be seen that the case of fading out follows straightforwardly.)

Considering the above discussion, the internal operation of the spectral leakage model 4 proceeds as follows. In each iteration of the fading curve generator 1, the spectral leakage model 4 takes a fading curve produced by the CRE 2 as a spectral leakage model input 24. The spectral leakage model 4 then prepares a time domain window for processing by a k-point FFT, where k is at least several times (for example, at least 4 times) the number of samples in the fading curve. The fading curve is placed in the centre of the time domain window. For the case of fading in, all the samples before the fading curve are set to zero, and all the samples after the fading curve are set to full gain. For the case of fading out, all the samples before the fading curve are set to full gain, and all the samples after the fading curve are set to zero.

All of the samples of the time domain window are then multiplied by a known window function. The spectral leakage model 4 then runs the k-point FFT, using the time domain window just prepared as the FFT's input. When the FFT operation completes, it outputs k frequency domain bins, each bin holding a complex number.
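The window preparation and FFT steps can be sketched as follows. This is an illustration under stated assumptions, not the source's implementation: the Hann window (`np.hanning`) stands in for the unspecified window function, full gain is taken as 1.0, `factor=4` matches the "at least 4 times" example, and the function names are hypothetical. The raised-cosine curve is only a placeholder for a curve produced by the CRE 2.

```python
import numpy as np


def prepare_frame(curve, fade_in=True, factor=4, full_gain=1.0):
    """Place the fading curve in the centre of a k-point frame
    (k = factor * len(curve)), pad with zero gain on the silent side and full
    gain on the other, then apply a window function (Hann, as an example)."""
    curve = np.asarray(curve, dtype=float)
    n = len(curve)
    k = factor * n
    before = (k - n) // 2
    after = k - n - before
    first, last = (0.0, full_gain) if fade_in else (full_gain, 0.0)
    frame = np.concatenate([np.full(before, first), curve, np.full(after, last)])
    return frame * np.hanning(k)


def leakage_spectrum(curve, fade_in=True, factor=4):
    """Run the k-point FFT on the prepared frame; returns k complex bins with
    the DC component in bin 0."""
    return np.fft.fft(prepare_frame(curve, fade_in, factor))


n = 64
# Placeholder raised-cosine fade in (not an optimized curve).
fade_in_curve = 0.5 - 0.5 * np.cos(np.pi * (np.arange(n) + 0.5) / n)
bins = leakage_spectrum(fade_in_curve, fade_in=True, factor=4)
```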

The k frequency domain bins include a DC bin (conventionally in bin zero) whose value represents an original, and therefore wanted, component of the audio signal. The values held in all the other bins represent unwanted spectral leakage due to the fading curve (and, to a much smaller extent, due to the window function applied to the time domain window prior to running the FFT). It is therefore desirable to minimize the values in these other bins, compared to the value in the DC bin. As discussed above regarding the loudness rate model 3, it is preferred that just a single value is passed to the optimizer 5, as the measure of spectral leakage. It will be apparent that there are many possible functions that can map the k frequency domain bins to a single such value. These functions include various possible measures of unwanted versus wanted spectral energy, which may involve defining spectral masks that demarcate regions in which energy is wanted or unwanted (or even transition regions in which the energy is counted as neither wanted nor unwanted). It is clear that such spectral masks could be simple or complicated, to any degree.

A preferred single-valued measure of spectral leakage can be calculated as follows. A small positive frequency, close to DC, is chosen as a boundary frequency, above which any spectral energy is regarded as unwanted. Although close to DC, the boundary frequency is placed high enough to avoid any main spectral lobe (as will usually be present) in the spectral leakage pattern due to the window function that was applied to the time domain window prior to running the FFT. (This minimizes the effect of the window function on the final measure.) The measure of spectral leakage is calculated by dividing the total energy in the bins above the boundary frequency by the energy in the DC bin. (The negative frequency bins may be ignored: because the fading curve is real-valued, the amplitude of its spectrum is symmetrical around DC.) For the purpose of this calculation, the energy in each bin may be calculated as the sum of the squares of the real and imaginary components of the complex amplitude value of the bin. The measure of spectral leakage 26 thus calculated is passed to the optimizer 5.
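The single-valued measure can be sketched as below, repeating a compact version of the frame preparation so the block runs on its own. The boundary bin of 4 is a hypothetical choice placed just outside the main lobe of the Hann window (which spans about two bins either side of DC), and the step and ramp curves are illustrative test inputs only. In this example, an abrupt switch-on scores higher than a linear ramp over the same period.

```python
import numpy as np


def frame_spectrum(curve, factor=4):
    """Fade-in frame: zeros, then the curve, then full gain (1.0), Hann
    windowed, followed by a k-point FFT."""
    curve = np.asarray(curve, dtype=float)
    n = len(curve)
    k = factor * n
    before = (k - n) // 2
    frame = np.concatenate([np.zeros(before), curve, np.ones(k - n - before)])
    return np.fft.fft(frame * np.hanning(k))


def spectral_leakage(spectrum, boundary_bin):
    """Energy in positive-frequency bins above boundary_bin divided by the
    energy in the DC bin. Negative-frequency bins are ignored because the
    fading curve is real, so its amplitude spectrum is symmetric."""
    energy = spectrum.real ** 2 + spectrum.imag ** 2  # |X[k]|^2 per bin
    half = len(spectrum) // 2
    unwanted = energy[boundary_bin + 1 : half + 1].sum()
    return float(unwanted / energy[0])


n = 64
step = np.r_[np.zeros(n // 2), np.ones(n // 2)]  # abrupt switch-on
ramp = np.linspace(0.0, 1.0, n)                   # linear fade in
step_leak = spectral_leakage(frame_spectrum(step), boundary_bin=4)
ramp_leak = spectral_leakage(frame_spectrum(ramp), boundary_bin=4)
```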

The method of the spectral leakage model 4 just described ignores the fact that human hearing is more sensitive to some frequencies than others. It is possible to include a frequency-weighting curve to account for this (or even other psychoacoustic magnitudes, such as frequency-dependent sharpness).

For example, the frequency domain bins output by the FFT could be multiplied by such a frequency-weighting curve, where the curve is based on psychoacoustic data from human listening tests. However, this would require assumptions to be made about the specific frequency of the wanted audio component represented by the DC bin. The method described above is free from such assumptions, and it allows the fading curve generator 1 to generate general fading curves that can be applied to any audio signal. For these reasons, and because a good result can be obtained without frequency weighting, it is preferred that such weighting is not performed.

Regarding the overall operation of the fading curve generator 1, it should be noted that the longer the fading period allowed, the smaller both the rate of change of loudness and the spectral leakage can be made.

For very slow fades, across a long fading period, in practice it is not too critical exactly what fading curve shape is used: as long as it is generally smooth, low values of rate of change of loudness and spectral leakage are generally obtained. The utility of the present invention is most markedly seen when the available fading period is short. In this situation, using the present invention, a minimally salient fade can be achieved despite the short time available.

Figure 3 illustrates an audio signal fader 71 having an audio signal input 12, for audio that is to be faded, and an audio signal output 14, that provides a faded version of the signal supplied to the audio signal input. The audio signal fader 71 includes the fading curve generator 1 and a variable gain element 73. The variable gain element 73 has two inputs: the audio input 12, for the audio that is to be faded; and a gain input 16, whose value controls the gain of the element. The variable gain element 73 can simply be a multiplier of some form, digital or analog, in which one operand takes the audio input 12, and the other operand takes the gain input 16. The output 8 of the fading curve generator 1 supplies the gain input 16 of the variable gain element 73. The audio input 12 of the variable gain element 73 is provided by the audio signal input 12 of the audio signal fader 71. The output 14 of the variable gain element 73 provides the audio signal output 14 of the audio signal fader 71. Thus the audio signal fader 11, 71 is put into effect using the fading curve generator 1 of Figure 1.
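The variable gain element 73 is simply a multiplier. A minimal sketch, assuming digital audio held in a NumPy array, a full gain of 1.0, and a hypothetical `start` index at which the fade begins; the gain is held at its starting value before the fade and at its finishing value afterwards.

```python
import numpy as np


def apply_fade(audio, curve, start, fade_in=True, full_gain=1.0):
    """Variable gain element as a sample-by-sample multiplier of the audio
    input by a gain signal built from the fading curve."""
    audio = np.asarray(audio, dtype=float)
    curve = np.asarray(curve, dtype=float)
    end = start + len(curve)
    if start < 0 or end > len(audio):
        raise ValueError("fade does not fit inside the audio buffer")
    before, after = (0.0, full_gain) if fade_in else (full_gain, 0.0)
    gain = np.full(len(audio), before)  # gain before the fade starts
    gain[start:end] = curve             # the fading curve itself
    gain[end:] = after                  # gain after the fade finishes
    return audio * gain


out = apply_fade(np.ones(10), [0.25, 0.5, 0.75], start=2)
# out -> [0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1, 1]
```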

It will be apparent that substantial computation is required within the fading curve generator 1 in order to generate a fading curve. Also, the preferred form of the fading curve generator 1 in Figure 1 (as used for the fading curve generator 1 in Figure 3) generates fading curves that are independent of any particular audio signal. For a particular fading period and fading direction (i.e. fading in or fading out), the same curve will always be generated. Therefore, it may be preferable to store a fading curve, or a set of fading curves (e.g. for different fading periods and directions), that have been produced by the fading curve generator 1. Figure 4 illustrates an audio signal fader 81 that uses such a scheme. It operates in a manner similar to the audio signal fader 71 of Figure 3, except that the fading curve generator 1 is replaced by a fading curve store 82. The fading curve store 82 contains one or more fading curves generated by the fading curve generator 1. As required, the fading curve store 82 plays back a pre-stored fading curve to the gain input of a variable gain element 73. Other than this, the audio signal fader 81 operates in the same way as the audio signal fader 71. Thus an audio signal fader is put into effect without the computational overhead of re-running the fading curve generator 1 for each fade.
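The fading curve store 82 can be modelled as a lookup keyed on fading period (here, length in samples) and direction. The class below is a hypothetical sketch; the curve values shown are placeholders, not outputs of the fading curve generator 1. Fade-in and fade-out curves are kept separately, consistent with the earlier point that one need not be the time reverse of the other.

```python
from typing import Dict, Tuple

import numpy as np


class FadingCurveStore:
    """Hypothetical store of pre-computed fading curves, keyed by
    (length in samples, direction) and played back on demand."""

    def __init__(self) -> None:
        self._curves: Dict[Tuple[int, str], np.ndarray] = {}

    def add(self, direction: str, curve) -> None:
        if direction not in ("in", "out"):
            raise ValueError("direction must be 'in' or 'out'")
        curve = np.asarray(curve, dtype=float)
        self._curves[(len(curve), direction)] = curve.copy()

    def play(self, n_samples: int, direction: str) -> np.ndarray:
        key = (n_samples, direction)
        if key not in self._curves:
            raise KeyError(f"no stored curve for {key}")
        return self._curves[key].copy()  # copy so callers cannot alter the store


store = FadingCurveStore()
store.add("in", [0.1, 0.5, 0.9])    # placeholder values, not generator output
store.add("out", [0.8, 0.4, 0.05])
gain = store.play(3, "in")
```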

It can be seen that there is therefore possible utility in storing or transmitting the output of the fading curve generator 1, or in storing or transmitting the output of an audio signal fader that uses a fading curve produced by the fading curve generator 1.

There is thus described a fading curve generator that can be used to control the gain of an audio signal fader so as to produce an audio fade with low perceptual salience, even when the available time for fading is short.

Fig. 5 shows an audio system 10. The audio system 10 includes the audio signal fader 11, an audio device 92 and one or more sound emitting transducers 93 in the form of loudspeakers.

In use, the audio input signal 12 is input to the fader 11, which, as has been described above, modifies the audio signal 12 and outputs the output audio signal 14 for which the perceptual salience has been minimised. In the situation in which the input audio signal 12 includes other content for which fading is not required and/or desired, the perceptual salience is minimised only in respect of any fading performed by the fader 11.

The audio device 92 could be a device such as an MP3 player, or could be part of a computer. The audio system 10 could include transmission and/or distribution apparatus (not shown) to relay the modified audio signal 14 to a remote user. For example, the audio signal fader 11 could be hosted on a remote server, with the output audio signal 14 being transmitted via the internet to a user's local computer. The device 92 could include, alternatively or additionally, a microprocessor, a central processing unit and/or a mobile device such as a telephone or a tablet.

The present invention has been found to be particularly beneficial in modifying audio signals which are generated to utilize the phenomenon of residual inhibition for tinnitus sufferers, as described earlier.

It will be apparent to the skilled person that the above described schemes are not exhaustive and that variations may be employed to achieve a similar result, whilst using the same inventive concept. For example, for ease of explanation, the fading requirement has been presented as transitioning between zero gain and some final gain. However, clearly the same inventive concept can be applied to a transition between any two gain levels, neither of which need be zero gain. Also, the preferred embodiment described above produces fading curves that are independent of the specific audio signal being faded. As has been noted above, in certain examples it is instead possible to use the inventive concept in such a way that it considers the particular audio that is being faded. Such a form of use may be of practical value, for example, for the post-processing of audio sources, where such processing need not be carried out in real time. In addition to the examples given above, it will be apparent to the skilled person that there are other ways that the inventive concept can be used to consider particular audio signals.

Any of the features, steps, methods or apparatus of any of the embodiments shown or described could be combined in any suitable way, within the scope of the overall disclosure of this document.

A novel means of producing an audio fade with low perceptual salience is thus presented. The audio fader of the invention provides an automated fade between silence and the final level, or vice versa, in such a way as to minimize the amount of attention demanded by the transition. In other words, in the terminology of psychology or psychoacoustics, the invention minimizes the perceptual salience of the transition. In a preferred embodiment, this is achieved by applying psychoacoustic data in some novel ways. The systems and methods presented are of particular advantage when only a short time is available in which to complete the fade, and yet the fade is required to be unobtrusive.

Claims

1. A fading curve generator for modifying an audio signal, the generator including a loudness rate model which provides a measure of rate of change of loudness and a spectral leakage model which provides a measure of spectral leakage, wherein the generator generates a fading curve which optimizes the rate of change of loudness measure and the spectral leakage measure.
2. A generator according to claim 1, in which the measure of rate of change of loudness is derived from psychoacoustic data.
3. A generator according to claims 1 or 2, in which the generator includes an optimizer, which optimizes the rate of change of loudness measure and the spectral leakage measure.
4. A generator according to claim 3, in which the optimizer utilizes psychoacoustic data.
5. An audio signal fader for modifying an audio signal, the fader comprising a variable gain element, the variable gain element having an audio signal input and an input which is a fading curve, wherein the fading curve optimizes a measure of a rate of change of loudness and a measure of spectral leakage.
6. A fader according to claim 5, in which the fader includes a fading curve generator which generates the fading curve.
7. A fader according to claim 5, in which the fader includes a store, which stores one or more fading curves generated by a fading curve generator.
8. A fader according to claims 6 or 7, in which the fading curve generator includes any of the features defined in any of claims 1 to 4.
9. A fader according to any of claims 5 to 8, in which the audio input has a fading period, which is the time period over which a fade (ie a change in loudness) occurs, and a direction (ie rising or falling), and the fading curve is dependent on the fading period and the direction of the audio input.
10. A fader according to any of claims 5 to 9, in which the fader includes an output which is a modified audio signal, in which the perceptual salience of the output audio signal has been minimised.
11. A fader according to claim 10, in which the perceptual salience is minimised in accordance with psychoacoustic data.
12. An audio system including an audio signal fader, the fader comprising a variable gain element, the variable gain element having an audio signal input and an input which is a fading curve, wherein the fading curve optimizes a measure of a rate of change of loudness and a measure of spectral leakage.
13. An audio system according to claim 12, in which the fader includes any of the features defined in any of claims 5 to 11.
14. An audio system according to claims 12 or 13, in which the system includes one or more sound-emitting transducers that are fed, directly or indirectly, by an output of the audio signal fader.
15. An audio system according to any of claims 12 to 14, in which the audio system includes transmission and/or distribution apparatus to relay the modified audio signal to a remote user.
16. A method of generating a fading curve so as to optimize a measure of rate of change of loudness and a measure of spectral leakage.
17. A method of modifying an audio signal, the method including generating a fading curve so as to optimize a measure of rate of change of loudness and a measure of spectral leakage.
18. A method according to claims 16 or 17, in which the method includes any of the steps or features defined in any of claims 1 to 15.
19. A computer program comprising computer program code means adapted to perform the method defined in any of claims 16 to 18 when the program is run on a microprocessor, a central processing unit, a computer, a mobile device, a computer network or a telephony network.
20. A computer program according to claim 19, in which the computer program is embodied on a computer readable medium.
21. A fading curve generator substantially as hereinbefore described and/or with reference to any of the accompanying drawings.
22. An audio signal fader substantially as hereinbefore described and/or with reference to any of the accompanying drawings.
23. An audio system substantially as hereinbefore described and/or with reference to any of the accompanying drawings.
24. A method of generating a fading curve substantially as hereinbefore described and/or with reference to any of the accompanying drawings.
25. A method of modifying an audio signal substantially as hereinbefore described and/or with reference to any of the accompanying drawings.
26. A computer program substantially as hereinbefore described and/or with reference to any of the accompanying drawings.