GB2490877A - Processing audio data for producing metadata and determining a confidence value based on a major or minor key - Google Patents

Processing audio data for producing metadata and determining a confidence value based on a major or minor key

Info

Publication number
GB2490877A
GB2490877A GB1107903.5A GB201107903A GB2490877A GB 2490877 A GB2490877 A GB 2490877A GB 201107903 A GB201107903 A GB 201107903A GB 2490877 A GB2490877 A GB 2490877A
Authority
GB
United Kingdom
Prior art keywords
tonality
audio data
transition
major
chord
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1107903.5A
Other versions
GB201107903D0 (en)
GB2490877B (en)
Inventor
Mark Mann
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Broadcasting Corp
Original Assignee
British Broadcasting Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Broadcasting Corp filed Critical British Broadcasting Corp
Priority to GB1807498.9A priority Critical patent/GB2560458B/en
Priority to GB1107903.5A priority patent/GB2490877B/en
Priority to GB1807502.8A priority patent/GB2560459B/en
Publication of GB201107903D0 publication Critical patent/GB201107903D0/en
Publication of GB2490877A publication Critical patent/GB2490877A/en
Application granted granted Critical
Publication of GB2490877B publication Critical patent/GB2490877B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/076Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of timing, tempo; Beat detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/081Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for automatic key or tonality recognition, e.g. using musical rules or a knowledge base
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/075Musical metadata derived from musical analysis or for use in electrophonic musical instruments
    • G10H2240/085Mood, i.e. generation, detection or selection of a particular emotional content or atmosphere in a musical piece

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Acoustics & Sound (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

Systems and methods for producing a measure of tonality operate by sampling audio data and determining the difference between a confidence value for a major key and a confidence value for a minor key. A summing function produces a weighted confidence value. Systems and methods also provide an indication of the rate of change or amount of change between major and minor tonality within music. The weighted measure may also be determined using chords. A signal asserted from a system embodying the invention is intended to be used to control studio equipment, such as lighting or camera equipment, according to the weighted tonality.

Description

PROCESSING AUDIO DATA FOR PRODUCING METADATA
BACKGROUND OF THE INVENTION
This invention relates to systems and methods for processing audio data for producing metadata that can be used by a subsequent process. The invention particularly applies to audio data containing music.
Metadata relating to music may be used in a variety of systems and processes.
Such metadata may include, for example, data relating to the tone, key, tempo, volume, dynamic range or other attribute of a musical piece. Such metadata may be asserted as an output signal for controlling a process or system. Studio systems, for example, control lighting effects to match the beat and volume of music. Vision mixers can select between different cameras based on metadata such as the tempo of music. Archive systems may determine and store metadata extracted from audio data so as to allow subsequent efficient retrieval.
In all such systems and processes, there is a need to derive metadata from the underlying audio data representing an audio track containing music.
Attributes that may be extracted from music data include tonality (whether a major or a minor key), the tempo (beats per minute of the fundamental timing of the music) and similar attributes derived from either the key or tempo. In order to better understand these concepts, some basics of music will first be described with respect to Figures 1 and 2.
Much of modern western music is still based on compositional methods of the common practice period, which were employed almost exclusively in Europe between the Renaissance and the Twentieth Century. Harmony in this system is based upon the diatonic scale, in which an octave (in which the top note is double the frequency of the bottom note) is sub-divided into 12 intervals such that each note has a frequency approximately 2^(1/12) (or 1.059) times the one below it. Thus the thirteenth note will be (2^(1/12))^12 = 2 times the frequency of the first, hence recovering the octave. The notes are arranged in a scale denoted by seven letters from A to G, with the remaining five intermediate notes inserted between them.
If, for instance, the intermediate note lies between D and E, it can be referred to as D sharp (D#) because it is sharper or higher in pitch than the note D, or as E flat (Eb) because it is flatter or lower in pitch than the note E. Single intervals are referred to as semitones, whilst two intervals (such as from C to D) are referred to as tones. This can be best explained by representing these notes on a keyboard (as shown in Figure 1), where the five intermediate notes are coloured in black and offset from the other seven.
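As a quick worked illustration of this equal-tempered relationship (not taken from the patent; the A = 440 Hz reference and the starting note are arbitrary conventional choices), the short Python sketch below prints the frequencies of the twelve semitones above A and confirms that the thirteenth note is exactly double the first:

```python
# Worked example: each semitone is 2**(1/12) times the previous note, so the
# thirteenth note recovers the octave at exactly twice the base frequency.
NOTE_NAMES = ["A", "A#/Bb", "B", "C", "C#/Db", "D", "D#/Eb",
              "E", "F", "F#/Gb", "G", "G#/Ab", "A (octave)"]

def semitone_frequencies(base_hz: float = 440.0) -> list[float]:
    """Return the 13 frequencies from the base note up to and including its octave."""
    return [base_hz * 2 ** (n / 12) for n in range(13)]

for name, freq in zip(NOTE_NAMES, semitone_frequencies()):
    print(f"{name:10s} {freq:7.2f} Hz")   # ends with: A (octave)  880.00 Hz
```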
Whilst it is possible to use any of the 12 notes during a piece of music, a basic melody very rarely does. Melodies tend to follow a scale, which is made from a standard combination of 7 of the 12 notes available. The two most common combinations are major scales and minor scales (of which there are three valid versions where the 6th and 7th notes can be altered). The main difference between the two scales is the third note, which for a major key is a semitone higher than that of the minor. Major chords are generally perceived to be lighter, whilst minor chords are perceived to be darker and heavier.
Chords (or the occurrence of multiple notes concurrently) can be made up from any combination of the notes. Basic chords, whether major or minor, incorporate the first tone of the scale (or tonic -this note lends its name to the chord), the third tone of the scale (referred to as the mediant which denotes whether the chord is major or minor) and the fifth tone (referred to as the dominant). Figure 2 illustrates the difference between a C major chord and a C minor chord.
The tempo of a piece of music is a measure of the rate at which beats are struck and is usually given in beats per minute. As with tonality (major or minor), tempo is a form of metadata which can be extracted from an audio signal.
Whilst methods and systems exist for producing metadata from audio data, these can be improved.
SUMMARY OF THE INVENTION
We have appreciated the need to provide improved systems and methods for processing audio data to produce metadata. The invention resides in four aspects.
A first aspect of the invention relates to systems and methods for producing a weighted tonality measure by sampling audio data at intervals, determining the difference between a confidence value for a major key and a confidence value for a minor key at each interval, and summing a function such as the product of this difference value multiplied by the peak confidence value at each interval to produce a weighted confidence value. This first aspect provides a clearer emphasis to one tonality where the opposite tonality has a low confidence measure.
A second aspect of the invention resides in systems and methods for determining a weighted tonality differential indicative of the rate or amount of change between a major and minor tonality within music by sampling audio data at transitions of tonality, determining the difference in confidence measure between major and minor tonality both before and after each transition, summing a function such as the difference in confidence measure before each transition with the difference in confidence measure after the transition to produce a summed difference, and summing the summed difference values for the transitions. This aspect improves the quality of metadata which may be asserted as an output signal because the weighting of each tonality transition ensures that the summation better represents the true amount of tonality change.
A third aspect of the invention, similar to the second aspect, determines a weighted chord differential. In this aspect, confidence values of dominant chords before and after a chord transition are determined, a difference in confidence value before a chord transition and a difference in confidence value after the transition are determined, a function including the difference in confidence values is summed thereby emphasising where there is a high degree of change of confidence that a chord transition has occurred, but reducing the effect of changes where there is a low degree of confidence of a chord transition.
In a fourth aspect of the invention, a tempo extraction system and method filters an audio signal into separate signals for different frequency bands, produces a beat rate for each band, orders and groups the beat rates and determines which group has the highest number of members. This shows the beat that is present in most frequencies, but this is not necessarily the tempo. To determine the tempo, this aspect then determines whether any frequencies have a beat rate approximately half the rate of the group with the highest number of members. If so, this is determined as the tempo. If no frequency has a beat rate of half the beat rate present in most frequencies, an analysis is undertaken to check for any signals at a third of the beat rate of the highest group and, if any are found, this is determined to be the tempo of the music. In this way, it is more likely that the actual tempo of the music will be correctly determined, rather than a multiple such as half beats or one third beats.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will now be described in more detail by way of example with reference to the drawings, in which:
Figure 1 is a diagram of the diatonic scale labelled with notes and approximate associated frequencies;
Figure 2 is a diagram showing that the C Major chord is made up of C, E and G, whilst the C Minor chord is made up of C, Eb and G;
Figure 3 shows the main functional components of a system embodying tonality aspects of the invention;
Figure 4 shows a graph of confidence values for each of the 12 possible major chords and 12 possible minor chords within a given piece of music;
Figure 5 shows peak and weighted tonality values for various pieces of music;
Figure 6 shows a system embodying tempo extraction aspects of the invention;
Figure 7 shows the frequency bands used to filter the audio; and
Figure 8 shows beat signals for three different pieces of music in each of 10 frequency bands.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The invention may be embodied in a method and system for processing audio data, particularly audio data representing an audio signal derived solely or in part from a piece of music. Whilst all aspects of the invention may be embodied in a single system, for convenience of understanding, the tonality and tempo aspects will be described as separate embodiments. Furthermore, the embodiments may be dedicated hardware or suitably programmed general purpose processors.
A system embodying the tonality aspects of the invention is shown in Figure 3.
An input 2 receives an audio signal and provides this in an appropriate form to a sampler 4. The input audio signal could be an analogue signal direct from a live feed or an already processed digital signal from an audio archive. The input 2 provides audio data. The sampler 4 samples the audio data at selected time intervals and provides the samples to a tonality processor which determines a measure of the confidence of tonality (major or minor) for each sample. This is provided to a weight sum processor 8. The processing within the weight sum processor provides the advantages of improved tonality metadata production as will be described presently.
The sampler 4 and tonality amplitude processor 6 may be combined together in a single process and use a number of known techniques. One such technique is implemented within a known software tool named Matlab and in particular within an open source toolbox for Matlab named MIR toolbox. This toolbox may be used to implement various functions on audio data to determine energy, tempo, tonality and key and is just one example of known techniques for operating such functions.
The MIR toolbox in Matlab incorporates a function called mirkeystrength. There are 12 possible major chords and 12 possible minor chords. The function calculates and assigns a probability or confidence value to each of the possible 24 chords at a sample rate that can be controlled with the function. In this embodiment, half second intervals are used. The function calls another MIR toolbox function, mirchromagram, which calculates the energy distribution for each note in the diatonic scales. The pitches are then concatenated into one octave and normalized. Next, mirkeystrength cross-correlates the chromagram with the chromagram one would expect for each of the 24 chords and assigns a probability or confidence value to each chord, where a probability of +1 for the tested chord would indicate a definite match whilst -1 would indicate a definite mismatch. In practice neither +1 nor -1 is obtained because of the presence of other characteristic frequencies of musical instruments and synthesisers. The confidence value may be on any arbitrary scale that indicates a measure of the confidence with which a given chord is found. The output is accessible either as raw data or as a graph in which the keystrength of the music is represented by a colour spectrum, with red indicating a clear match and blue a clear mismatch.
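The sketch below is not the MIR toolbox implementation; it merely illustrates, under simplifying assumptions, the kind of template correlation that mirkeystrength performs. Binary triad templates (tonic, mediant and dominant only) stand in for the toolbox's key profiles, and the chromagram is assumed to be an already computed, normalised 12-bin vector:

```python
import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def triad_template(root: int, minor: bool) -> np.ndarray:
    """Binary 12-bin template containing the tonic, mediant and dominant."""
    template = np.zeros(12)
    third = 3 if minor else 4                      # minor or major third
    template[[root % 12, (root + third) % 12, (root + 7) % 12]] = 1.0
    return template

def chord_confidences(chroma: np.ndarray) -> dict:
    """Pearson correlation of a 12-bin chroma vector with all 24 triads."""
    scores = {}
    for root in range(12):
        for minor in (False, True):
            label = NOTES[root] + ("m" if minor else "M")
            scores[label] = float(np.corrcoef(chroma, triad_template(root, minor))[0, 1])
    return scores

# Idealised C major chord: all the energy on C, E and G.
scores = chord_confidences(triad_template(0, minor=False))
print(max(scores, key=scores.get))                      # -> CM
print(round(scores["CM"], 2), round(scores["Cm"], 2))   # -> 1.0 0.56
```

As in the toolbox output, +1 indicates a definite match and lower values indicate progressively poorer matches; real music never reaches the extremes.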
Figure 4 shows such a colour-coded graph for the music "Last of the Summer Wine".
As can be seen in Figure 4, a graphical representation of the possible keys used in "Last of the Summer Wine" over time is shown. Major keys are denoted with a capital M, minor keys with a small m. The red (darker) colours denote a high degree of matching; the green and blue (lighter) colours denote progressively lower degrees of matching. Consequently, this piece is predominantly in C major, as shown by the bottom line, though C minor also gives quite a strong match.
The premise of tonality calculations is that the exact key in which the music is composed does not matter. It has been proposed by musicologists that some major keys sound brighter than others and some minor keys more mournful than others, but we consider the differences too subtle for all but the most advanced musicians. In fact, music is very rarely made up of one chord, with chords often progressing to different base notes within the scale, so such considerations are somewhat nugatory. Whilst major and minor chords are the basic construct of a piece of Western-style music, other chord types such as dominant 7th, diminished 7th, extended, added tone and dissonant chords are used to great effect in music to elicit different emotions. However, by their nature, they are more complex and hence difficult to detect and can often be confused with major and minor chords. Consequently, when such chords are present, one would expect the key clarity to diminish. This can be seen again in Figure 4 at around the 3 second mark, where an added tone chord of C, D, F and A is played. The confidence value does not adequately differentiate between D minor, A minor and F major chords as a consequence, with no assigned probability particularly high.
We have appreciated that the tonality calculations should be weighted. In Figure 4, shortly after 10 seconds, a strong C major chord is present. The next highest probability is considerably less. This should therefore count more towards the tonality calculation than the uncertainty at the 3 second mark. The weight sum processor 8 shown in Figure 3 implements a process to determine a weighted tonality. The weighted tonality is calculated by determining the probability amplitude which may also be referred to as a confidence measure for each of the (i) peak major tonality, (ii) peak minor tonality and (iii) peak tonality irrespective of whether major or minor. The difference between the peak confidence measure of major tonality and peak confidence measure of minor tonality is multiplied by the confidence measure of tonality at each interval. The resulting value at each interval is then summed for all intervals and divided by the number of intervals.
The resulting weighted tonality gives an indication of the measure of confidence of tonality on a positive to negative scale, positive values indicating major tonality and negative values indicating minor tonality. In the example given, each confidence measure varies from +1 to -1 and so summing n such confidence measures and dividing by n will give a weighted tonality which also varies from +1 to -1, but of course is more accurate in representing the tonality of the whole piece of music.
The weighted tonality, W, is defined as:

W = \frac{\sum_{j=1}^{n} \left[ K_{max} \left( K_{maj} - K_{min} \right) \right]_j}{n}        (Equation 1)

where:
K_max = peak confidence measure of tonality (irrespective of whether major or minor)
K_maj = peak confidence measure of major tonality
K_min = peak confidence measure of minor tonality
n = the number of time intervals used to classify the sample of music.
The product K_max(K_maj - K_min) is summed over all n intervals and divided by n. Minor keys will therefore give a negative W. This measure gives a much clearer representation of the tonality of the music under consideration because it emphasises certainty where it exists and minimizes uncertain contributions. Consequently, it gives a clearer comparison between similar music, spreading the data and enabling subsequent processes such as a classifier to determine music classification with greater clarity.
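A minimal sketch of Equation 1, assuming the per-interval confidence values have already been extracted (for example by a keystrength-style analysis), is given below; the example numbers are purely hypothetical:

```python
import numpy as np

def weighted_tonality(k_max, k_maj, k_min):
    """Equation 1: W = (1/n) * sum of K_max * (K_maj - K_min) over all intervals."""
    k_max = np.asarray(k_max, dtype=float)
    k_maj = np.asarray(k_maj, dtype=float)
    k_min = np.asarray(k_min, dtype=float)
    return float(np.mean(k_max * (k_maj - k_min)))

# Three hypothetical half-second intervals: two confidently major, one minor.
print(round(weighted_tonality(k_max=[0.90, 0.80, 0.85],
                              k_maj=[0.90, 0.30, 0.85],
                              k_min=[0.40, 0.80, 0.30]), 2))   # -> 0.17 (net major)
```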
Eleven theme tunes were tested for peak tonality and weighted tonality and the results are shown in Figure 5. Weighted tonality offers a better tonality comparison than subtracting the peak major from the peak minor.
As well as the overall nature of the tonality in the music, it is also useful to know the frequency with which tonality changes. Attributes such as exciting and dramatic can be associated with a more frequent change of tonality and key and used to assert an output accordingly. Consequently, two further measures are made in this embodiment. The first is a weighted tonality differential, which detects the rate at which the tonality changes per second during the course of the music. The second measure determined is a weighted chord differential, which detects the rate at which the dominant chord changes in the piece; the dominant chord may change but this doesn't necessarily mean a change in tonality (for instance the chord can change from an A major to an F major chord).
The weighted tonality differential will be described first. This measure uses the same data taken from calculating the weighted tonality as described above. It detects where the tonality changes (i.e. whether from major to minor or vice versa) and weights it with the certainty that the key change has happened. It does this by finding the transitions and calculating the sum of the certainties associated with the chords before and after the transition and optionally multiplying by the peak confidence measure of tonality. The difference in certainty/confidence is given by |K_maj - K_min|_j + |K_maj - K_min|_{j+1} (where j corresponds to the certainty before the transition and j+1 to the certainty after).
It will only do this at transition locations. Where there is not a transition, the differential will be 0 (see Table 1 below). This is then averaged over the number of time intervals, n. Again, because this weights the transitions with a certainty that the tonality change has happened, it gives greater emphasis to clearer transitions, thus filtering out noise.
T = \frac{\sum_{j} \left( \left| K_{maj} - K_{min} \right|_{j} + \left| K_{maj} - K_{min} \right|_{j+1} \right)}{n}        (Equation 2)

where, for all j, the expression contained within the brackets is only summed where a transition occurs between intervals j and j+1 and is otherwise equal to zero.
As described by the equation above, the weighted tonality differential emphasises those transitions that have a high confidence measure of one tonality and low confidence measure of another before a transition and a low confidence measure of the tonality and a high confidence measure of the other tonality after the transition. On the other hand, transitions where there is less of a difference between the confidence measures of each tonality either before or after (or both) a detected transition are given less weight in the summation. The tonality differential may be multiplied by the peak tonality, or other measure, to further weight the function before summing.
In the example of Table 1, the overall weighted tonality differential for the data is 0.335/n = 0.112. Note that the total number of time intervals is n = 3 in this case, and that |K_maj - K_min|_j + |K_maj - K_min|_{j+1} is not calculated if the tonality does not change.
Time interval   K_maj   K_min   K_maj - K_min   |K_maj - K_min|_j + |K_maj - K_min|_{j+1}
i               0.583   0.674   -0.092          0.335
i+1             0.786   0.543    0.243          0.000
i+2             0.777   0.562    0.215

Table 1
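A sketch of Equation 2 is given below; it assumes per-interval K_maj and K_min arrays as input, treats whichever confidence is larger as the prevailing tonality at that interval, and reproduces the Table 1 worked example up to the rounding of the tabulated values:

```python
import numpy as np

def weighted_tonality_differential(k_maj, k_min):
    """Equation 2: at each tonality transition, sum |K_maj - K_min| for the
    intervals before and after the transition, then divide by n."""
    k_maj = np.asarray(k_maj, dtype=float)
    k_min = np.asarray(k_min, dtype=float)
    n = len(k_maj)
    is_major = k_maj > k_min                    # prevailing tonality per interval
    total = 0.0
    for j in range(n - 1):
        if is_major[j] != is_major[j + 1]:      # transition between j and j+1
            total += abs(k_maj[j] - k_min[j]) + abs(k_maj[j + 1] - k_min[j + 1])
    return total / n

# Table 1 values: one minor-to-major transition between the first two intervals.
print(round(weighted_tonality_differential([0.583, 0.786, 0.777],
                                           [0.674, 0.543, 0.562]), 3))
# -> 0.111 (Table 1's 0.112 reflects rounding of the tabulated values)
```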
There are other possible ways in which the transition can be weighted. For instance, one could look one or two time intervals beyond the transition to give an overall transition probability. It may be that, simply because of how the music is divided up, a transition occurs at a boundary between time intervals, thus introducing uncertainty into a tonal change which is actually clear. However, work carried out so far doesn't indicate that this is a significant problem.
The weighted chord differential will now be described. This measure again uses the same data taken from calculating the weighted tonality. It searches for the dominant chord, and detects transitions of the dominant chord. The transition is weighted with a chord transition certainty, which is calculated by looking at the change in certainty of the two chords in question before and after the transition.
In this aspect, the tonality processor of Figure 3 is replaced by a chord processor.
Let us define K_i as the confidence value of the maximum certainty chord before the transition and K_{i+1} as the confidence value of this chord after the transition.
Likewise, L_i is defined as the confidence value of the new dominant chord before the transition and L_{i+1} as the confidence value of the new dominant chord after it. The transition is weighted by the factor (K_i - L_i) + (L_{i+1} - K_{i+1}). The effect of this again is to give added weight to more certain transitions and to decrease the effect of less certain transitions. Again, where a transition doesn't occur, the differential will be 0. Table 2 gives an example of a weighted chord differential calculation.
The weighting of the chord differential can be understood by the full summation equation shown below. By taking the sum of the differences in confidence value of the two dominant chords involved in a transition, before and after the transition, greater weight is given to those transitions where there is a large change in confidence value. By summing the weighted values at all such transitions, a resulting single weighted chord differential value gives a measure of confidence as to the relative rate of chord changes within a given piece of music.
\frac{\sum_{i=1}^{n} \left( \left| K_i - L_i \right| + \left| K_{i+1} - L_{i+1} \right| \right)}{n}        (Equation 3)

where, for all i, the expression contained within the brackets is only counted where a transition occurs and is otherwise equal to zero.
The chord differential may be multiplied by the peak chord confidence, or other measure, to further weight the function before summing.
Time interval   K       L       (K_i - L_i) + (L_{i+1} - K_{i+1})
i               0.773   0.356   0.675
i+1             0.567   0.825   0.000
i+2             0.587   0.796

Table 2
As shown in Table 2, the overall weighted chord differential for the above data is 0.675/n = 0.225. Note that n = 3 in this case and that (K_i - L_i) + (L_{i+1} - K_{i+1}) is not calculated if the dominant chord does not change.
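The weighted chord differential of Equation 3 can be sketched in the same way; the code below assumes a matrix of per-interval chord confidence values (one column per candidate chord, 24 in the embodiment) and reproduces the Table 2 example with a two-chord toy matrix:

```python
import numpy as np

def weighted_chord_differential(chord_conf):
    """Equation 3: at each change of dominant chord, sum the confidence
    differences of the outgoing and incoming chords before and after the
    transition, then divide by the number of intervals n."""
    chord_conf = np.asarray(chord_conf, dtype=float)
    n = len(chord_conf)
    dominant = chord_conf.argmax(axis=1)            # strongest chord per interval
    total = 0.0
    for i in range(n - 1):
        if dominant[i] != dominant[i + 1]:          # dominant chord transition
            k_i, k_i1 = chord_conf[i, dominant[i]], chord_conf[i + 1, dominant[i]]
            l_i, l_i1 = chord_conf[i, dominant[i + 1]], chord_conf[i + 1, dominant[i + 1]]
            total += abs(k_i - l_i) + abs(k_i1 - l_i1)
    return total / n

# Table 2 values: column 0 is the outgoing chord K, column 1 the incoming chord L.
conf = np.array([[0.773, 0.356],
                 [0.567, 0.825],
                 [0.587, 0.796]])
print(round(weighted_chord_differential(conf), 3))   # -> 0.225, as in Table 2
```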
A second embodiment of the invention relates to accurate determination of tempo as will now be described in relation to Figure 6. The system embodying the invention has an input 12 for receiving an audio signal and presenting this in an appropriate format to a sampler 14 which samples the audio data at a chosen interval and provides this to a beat processor 16. The sampler 14 and beat processor 16 may be provided by known techniques such as the Matlab MIR toolbox.
Tempo is a measure of the rate at which beats are struck and is usually given in beats per minute (bpm). A set of typical tempo values is listed in Table 3.
Tempo marking     Tempo range (bpm)
Larghissimo       0-40
Lento/Largo       40-60
Larghetto/Grave   60-66
Adagio            66-76
Andante           76-108
Moderato          108-120
Allegro           120-168
Presto            168-200
Prestissimo       200+

Table 3
Most tempi tend to be in the range 60-180 beats per minute. Other tempi do occur but are much less frequent.
A distinction may be drawn, though, between the beats discovered within a piece of music by a beat processor and the actual tempo of the overall piece of music.
The reason for this is the extensive use of percussion or plucked instruments in music, which have sharp, high frequency onsets which the software picks up.
Consequently, instruments such as the guitar, which often plays on the half beat unless strummed, will produce sharp, high frequency onsets on the half beat.
The techniques for extracting the audio features are accurate, but it is the way they are interpreted which causes the problem. If the maximum rate of beats is simply counted, then the tempo of a piece will often erroneously be determined to be twice or three times the actual tempo of the music. It is for this reason that a tempo processor 18 is configured to provide a metadata output which may be asserted on output 20.
The tempo processor 18 operates by taking beat signals from the beat processor for each of a number of different frequencies, grouping the tempos indicated by each of the beat signals, determining the most frequently occurring range of beats per minute and then determining whether there are beat signals having one half or one third of this number of beats per minute. In either case, this would indicate that the number of beats per minute occurring most often is not actually the tempo of the piece, but is instead a multiple showing one half or one third beats. Each of the steps undertaken by the tempo processor will now be described in turn.
Figure 7 illustrates the 10 frequency filters, and Figure 8 shows three theme tunes filtered into ten roughly logarithmically equal frequency bands, using the filters of Figure 7, which roughly correlate with octaves. As can be seen, when this is done, the beat is clearly visible in certain frequency bands, but the bands in which it occurs are not necessarily the same each time. The assumption made by many automated beat extractors is that the beat is present in the lower frequencies, but this is a generalisation too far.
Figure 8 shows the filtered waveforms of three theme tunes, with the frequency bands ascending in pitch from bottom to top: left, Postman Pat; centre, Dad's Army; and right, Eastenders. Red (mid grey) waveforms indicate the bands in which an autocorrelation of onsets returns the musically correct tempo, green (light grey) indicates where the function returns double the tempo, and blue (dark grey) indicates a spurious result.
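The patent does not specify how the ten bands are produced; the sketch below uses Butterworth band-pass filters over octave-spaced edges purely as a plausible stand-in for the filtering of Figure 7 (the 27.5 Hz starting edge and filter order are assumptions, not values from the text):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def octave_filterbank(x, sr, low=27.5, n_bands=10, order=4):
    """Split signal x into up to n_bands roughly octave-wide bands from `low` Hz."""
    edges = low * 2.0 ** np.arange(n_bands + 1)      # octave-spaced band edges
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        hi = min(hi, 0.99 * sr / 2)                  # keep below the Nyquist limit
        if lo >= hi:
            break
        sos = butter(order, [lo, hi], btype="bandpass", fs=sr, output="sos")
        bands.append(sosfiltfilt(sos, x))
    return bands

# Example: a 220 Hz tone gated twice per second (120 bpm), split into ten bands.
sr = 44100
t = np.arange(0, 4.0, 1 / sr)
x = np.sin(2 * np.pi * 220 * t) * (np.sin(2 * np.pi * 2 * t) > 0.99)
print(len(octave_filterbank(x, sr)))                 # -> 10
```

An onset/beat detector can then be run on each band independently, as the beat processor 16 does.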
Tempo calculations are carried out on the filtered waveforms of Figure 8 using the mirtempo function. The function mirtempo calculates the tempo by picking the highest peaks in the autocorrelation function of onset detection and computing the time between such peaks. The red bands indicate where the tempo was correctly identified, the green where a tempo twice that of the correct tempo was calculated. Note that in no instance has a tempo half that of the correct tempo been found and that the correct beat can be clearly identified in the red waveforms.
The operation of the tempo processor embodying the invention is as follows. The mirtempo function is applied to each of the ten filtered waveforms. The extracted tempi are arranged in ascending order, the difference in tempo between adjacent values is noted, and g, the number of times the difference exceeds a threshold chosen to be 10 beats per minute, is noted. A statistical function, kmeans, is used to cluster the data. Other grouping functions may be used. Extracted tempi are likely to vary by one or two beats per minute within a cluster, and g+1 clusters are found (but with g limited to 4 so that tracks with no clear tempo do not give an arbitrary result). The standard deviation of the beats per minute inside each cluster is also noted to give a measure of how precise the extracted tempo is. It is therefore possible to return an unspecified tempo should this value go above a certain threshold.
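The clustering step can be sketched as follows; rather than running k-means with g+1 clusters as the text describes, this deterministic illustration simply splits the sorted tempi at gaps larger than the 10 bpm threshold, which yields the same grouping for well-separated one-dimensional data (the cap of g at 4 is omitted). The example input is the set of per-band tempi from the Table 4 worked example below:

```python
import numpy as np

def cluster_tempi(tempi, gap_bpm=10.0):
    """Sort per-band tempi and split wherever adjacent values differ by more
    than gap_bpm; each resulting group corresponds to one cluster."""
    tempi = np.sort(np.asarray(tempi, dtype=float))
    gaps = np.where(np.diff(tempi) > gap_bpm)[0]
    return np.split(tempi, gaps + 1)

tempi = [145.27, 168.53, 169.84, 127.66, 174.01,
         165.41, 81.062, 90.951, 168.99, 174.31]
for cluster in cluster_tempi(tempi):
    print([round(v, 2) for v in cluster], "mean =", round(float(np.mean(cluster)), 2))
# Cluster means: 86.01, 127.66, 145.27 and 170.18 (compare Table 4)
```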
When the data is clustered, the largest cluster (or mode) is found. The processor then searches for a tempo within a tolerance (which may be varied) chosen to be 15% of half the value of the mode tempo. If this exists, the slower tempo is chosen as the correct tempo. The processor then searches for a tempo within 15% of a third of the value of the mode tempo. Again, if this exists, this slower tempo is chosen as the correct tempo. If neither a half nor a third tempo is found, the mean value of the modal cluster is chosen as the tempo. Table 4 shows a worked example of this for Last of the Summer Wine.
Extracted tempi   Sorted and clustered   Mean of each cluster   Mode?   Half factor?   Third factor?   Tempo
145.27            81.062                 86.007                 No      Yes            No              86.007
168.53            90.951
169.84            127.66                 127.66                 No      No             No
127.66            145.27                 145.27                 No      No             No
174.01            165.41                 170.18                 Yes     No             No
165.41            168.53
81.062            168.99
90.951            169.84
168.99            174.01
174.31            174.31

Table 4
As shown in Table 4, the steps in calculating the tempo of Last of the Summer Wine flow from left to right. Tempi are extracted for each of the ten bands; they are then sorted and clustered. The mean is taken of each cluster, the mode is determined and each cluster is then tested to see whether it is a half factor or third factor of the mode. If either exists, the factor is chosen as the tempo; if not, the mode is chosen.
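A sketch of the half-factor/third-factor decision applied to the Table 4 clusters might look like this; the 15% tolerance is the value quoted in the text, and the cluster means are those computed above:

```python
def choose_tempo(clusters, tolerance=0.15):
    """Take the mean of the largest (modal) cluster, then prefer any cluster
    mean lying within `tolerance` of half, or failing that a third, of it."""
    means = [sum(c) / len(c) for c in clusters]
    mode_mean = means[max(range(len(clusters)), key=lambda k: len(clusters[k]))]
    for divisor in (2.0, 3.0):
        target = mode_mean / divisor
        for m in means:
            if abs(m - target) <= tolerance * target:
                return m                         # the slower related tempo wins
    return mode_mean                             # no half or third factor found

clusters = [[81.062, 90.951], [127.66], [145.27],
            [165.41, 168.53, 168.99, 169.84, 174.01, 174.31]]
print(round(choose_tempo(clusters), 2))          # -> 86.01 bpm, the half-tempo cluster
```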
For completeness, applications of the embodiments of the invention will be now discussed. A first application is in studio equipment in controlling devices such as lighting equipment, camera shot changes or the like. A signal asserted from a system embodying the invention can be provided as an input and used to control such devices based on the weighted tonality, tonality differential, chord differential or tempo value.
A further application is in devising metadata for audio data extracted from a music archive.
The weighted tonality and a new tempo extractor can be used to improve music mood classification. Weighted tonality gives greater emphasis to musical passages of clear tonality, whilst ignoring more ambiguous passages. The tempo extractor outperforms commonly available software.
Using weighted tonality as a dimension for support vector machine classification enhances the classification of the moods happy, light and heavy to over 80%, amongst the best reported success rates in the field. The improved tempo extractor will improve the success rate of moods such as exciting and dramatic.

Claims (23)

  1. 1. A system for processing audio data relating to music to produce metadata, comprising: -an input for providing audio data; -a sampler arranged to sample the audio data at time intervals; -a tonality processor arranged to produce a major tonality confidence measure, a minor tonality confidence measure and a peak tonality confidence measure for each sample; -a weight sum processor arranged to produce a weighted tonality confidence measure for the audio data by summing for all the samples a function of a difference in major and minor confidence measures for each of the samples; and -an output for asserting the weighted tonality confidence measure.
  2. 2. A method of processing audio data relating to music to produce metadata, comprising: -providing audio data; -sampling the audio data at time intervals; -producing a major tonality confidence measure, a minor tonality confidence measure and a peak tonality confidence measure for each sample; -producing a weighted tonality confidence measure for the music by summing for all the samples a function of a difference in major and minor confidence measures for each of the samples; and -asserting the weighted tonality confidence measure.
  3. 3. A system for processing audio data relating to music to produce metadata, comprising: -an input for providing audio data; -a sampler arranged to sample the audio data; -a tonality processor arranged to process the samples and to detect transitions in tonality and to produce a major and minor confidence measure before each transition and a major and minor confidence measure after each transition; -a weight sum processor arranged to produce a weighted tonality differential by summing for all the transitions a function of the difference in major and minor confidence measures both before and after each transition; and -an output for asserting the weighted tonality differential.
  4. 4. A method of processing audio data relating to music to produce metadata, comprising: -providing audio data; -sampling the audio data; -detecting transitions in tonality and producing a major and minor confidence measure before each transition and a major and minor confidence measure after each transition; -producing a weighted tonality differential by summing for all the transitions a function of the difference in major and minor confidence measures both before and after each transition; and -asserting the weighted tonality differential.
  5. 5. A system according to claim 1 or 3, wherein the function comprises multiplying by peak tonality for each respective sample.
  6. 6. A system according to claim 1 or 3, further configured to divide the summed function by the number of samples.
  7. 7. A system according to claim 5, wherein the peak tonality confidence measure is a measure of the maximum tonality confidence irrespective of whether major or minor.
  8. 8. A system according to any of claims 1, 3, 5 to 7, wherein the major tonality confidence measure is a measure of the confidence of a major key.
  9. 9. A system according to any of claims 1, 3, 5 to 7, wherein the minor tonality confidence measure is a measure of the confidence of a minor key.
  10. 10. A system according to any of claims 1, 3, 5 to 7, wherein the sampler is configurable to vary the time intervals.
  11. 11. A system according to claim 10, wherein the sampler is configurable based on a tempo input from a tempo processor.
  12. 12. A system according to any of claims 1, 3, 5 to 11, wherein the output comprises a control for controlling audio or video equipment according to the weighted tonality measure.
  13. 13. A system for processing audio data relating to music to produce metadata, comprising: -an input for providing audio data; -a sampler arranged to sample the audio data; -a chord processor arranged to process the samples and to detect chord transitions and to produce a confidence measure of a first chord that is dominant before each transition and of that first chord after the transition and a confidence measure of a second chord that is dominant after each transition and of that second chord before the transition; -a weight sum processor arranged to produce a weighted chord differential by summing for all the transitions a function of the difference in confidence measure of the first and second chords before and after each transition; and -an output for asserting the weighted chord differential.
  14. 14. A method of processing audio data relating to music to produce metadata, comprising: -providing audio data; -sampling the audio data; -detecting chord transitions and producing a confidence measure of a first chord that is dominant before each transition and of that first chord after the transition and a confidence measure of a second chord that is dominant after each transition and of that second chord before the transition; and -producing a weighted chord differential by summing for all the transitions a function of the difference in confidence measure of the first and second chords before and after each transition.
  15. 15. A system according to claim 13, wherein the function comprises multiplying by a peak chord measure for each respective sample.
  16. 16. A system according to claim 13, further configured to divide the summed function by the number of samples.
  17. 17. A system for processing audio data relating to music to produce metadata, comprising: -an input for providing audio data; -a filter arranged to filter the audio data into a plurality of frequency bands; -a beat processor arranged to produce a beat rate for each frequency band; and -a tempo processor arranged to determine and output a tempo value by: -receiving the beat rates, -grouping the beat rates into groups of similar beat rates, -determining the group containing the most members, -comparing a notional beat rate of the group containing most beat rates to the notional beat rates of other groups, and -outputting the notional beat rate of any group having, within a given tolerance, half the notional beat rate of the group containing most members if such a group exists, or outputting the notional beat rate of the group containing most members if such a group does not exist.
  18. 18. A system according to claim 17, wherein the tempo processor is further arranged to determine and output a tempo value by outputting the notional beat rate of any group having, within a given tolerance, one third the notional beat rate of the group containing most members if such a group exists.
  19. 19. A system according to claim 17 or 18, wherein the notional beat rate of each group is one of the mean, median or mode, maximum or minimum beat rate of the rates in the group.
  20. 20. A system according to claim 17 or 18, wherein the tolerance is a percentage.
  21. 21. A system according to claim 20, wherein the tolerance is 15%.
  22. 22. A method of processing audio data relating to music to produce metadata, comprising: -receiving audio data; -filtering the audio data into a plurality of frequency bands; -producing a beat rate for each frequency band; and -determining a tempo value by: -receiving the beat rates, -grouping the beat rates into groups of similar beat rates, -determining the group containing the most members, -comparing a notional beat rate of the group containing most members to the notional beat rates of other groups, and -outputting the notional beat rate of any group having within a given tolerance half the notional beat rate of the group containing most members if such a group exists, or outputting the notional beat rate of the group containing most members if such a group does not exist.
  23. 23. A system arranged to produce metadata according to any of equations 1, 2 or 3 described herein.
GB1107903.5A 2011-05-11 2011-05-11 Processing audio data for producing metadata Active GB2490877B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB1807498.9A GB2560458B (en) 2011-05-11 2011-05-11 Processing audio data for producing metadata
GB1107903.5A GB2490877B (en) 2011-05-11 2011-05-11 Processing audio data for producing metadata
GB1807502.8A GB2560459B (en) 2011-05-11 2011-05-11 Processing audio data for producing metadata

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1107903.5A GB2490877B (en) 2011-05-11 2011-05-11 Processing audio data for producing metadata

Publications (3)

Publication Number Publication Date
GB201107903D0 GB201107903D0 (en) 2011-06-22
GB2490877A true GB2490877A (en) 2012-11-21
GB2490877B GB2490877B (en) 2018-07-18

Family

ID=44243972

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1107903.5A Active GB2490877B (en) 2011-05-11 2011-05-11 Processing audio data for producing metadata

Country Status (1)

Country Link
GB (1) GB2490877B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1426921A1 (en) * 2002-12-04 2004-06-09 Pioneer Corporation Music searching apparatus and method
GB2427291A (en) * 2005-06-17 2006-12-20 Queen Mary & Westfield College A method of analysing audio, music or video data
US20070261535A1 (en) * 2006-05-01 2007-11-15 Microsoft Corporation Metadata-based song creation and editing
US20080097633A1 (en) * 2006-09-29 2008-04-24 Texas Instruments Incorporated Beat matching systems
CN101399035A (en) * 2007-09-27 2009-04-01 三星电子株式会社 Method and equipment for extracting beat from audio file
EP2068255A2 (en) * 2007-12-07 2009-06-10 Magix Ag System and method for efficient generation and management of similarity playlists on portable devices
EP2204774A2 (en) * 2008-12-05 2010-07-07 Sony Corporation Information processing apparatus, information processing method, and program
WO2010129693A1 (en) * 2009-05-06 2010-11-11 Gracenote, Inc. Apparatus and method for determining a prominent tempo of an audio work
US20120060667A1 (en) * 2010-09-15 2012-03-15 Yamaha Corporation Chord detection apparatus, chord detection method, and program therefor

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488782A (en) * 2013-09-30 2014-01-01 华北电力大学 Method for recognizing musical emotion through lyrics
CN103488782B (en) * 2013-09-30 2016-07-27 华北电力大学 A kind of method utilizing lyrics identification music emotion

Also Published As

Publication number Publication date
GB201107903D0 (en) 2011-06-22
GB2490877B (en) 2018-07-18

Similar Documents

Publication Publication Date Title
Dixon Onset detection revisited
Klapuri Sound onset detection by applying psychoacoustic knowledge
Grosche et al. What Makes Beat Tracking Difficult? A Case Study on Chopin Mazurkas.
US7301092B1 (en) Method and apparatus for synchronizing audio and video components of multimedia presentations by identifying beats in a music signal
JP3433818B2 (en) Music search device
Mauch et al. Timbre and Melody Features for the Recognition of Vocal Activity and Instrumental Solos in Polyphonic Music.
US20080300702A1 (en) Music similarity systems and methods using descriptors
CN101189610B (en) Method and electronic device for determining a characteristic of a content item
MXPA01004281A (en) Fast find fundamental method.
Yoshii et al. Automatic Drum Sound Description for Real-World Music Using Template Adaptation and Matching Methods.
Cambouropoulos From MIDI to traditional musical notation
Belle et al. Raga identification by using swara intonation
US20060075883A1 (en) Audio signal analysing method and apparatus
Dittmar et al. Automated Estimation of Ride Cymbal Swing Ratios in Jazz Recordings.
Uhle et al. Estimation of tempo, micro time and time signature from percussive music
Paiva et al. On the Detection of Melody Notes in Polyphonic Audio.
Zhang et al. Main melody extraction from polyphonic music based on modified Euclidean algorithm
Grosche et al. Automatic transcription of recorded music
Mann et al. Music Mood Classification of Television Theme Tunes.
GB2490877A (en) Processing audio data for producing metadata and determining a confidence value based on a major or minor key
JP6263382B2 (en) Audio signal processing apparatus, audio signal processing apparatus control method, and program
Gao et al. Vocal melody extraction via dnn-based pitch estimation and salience-based pitch refinement
JP6263383B2 (en) Audio signal processing apparatus, audio signal processing apparatus control method, and program
Shao et al. Automatic music summarization based on music structure analysis
Politis et al. Determining the chromatic index of music