GB2490877A - Processing audio data for producing metadata and determining a confidence value based on a major or minor key - Google Patents
- Publication number
- GB2490877A GB2490877A GB1107903.5A GB201107903A GB2490877A GB 2490877 A GB2490877 A GB 2490877A GB 201107903 A GB201107903 A GB 201107903A GB 2490877 A GB2490877 A GB 2490877A
- Authority
- GB
- United Kingdom
- Prior art keywords
- tonality
- audio data
- transition
- major
- chord
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 claims abstract description 28
- 238000005070 sampling Methods 0.000 claims abstract description 6
- 230000007704 transition Effects 0.000 claims description 67
- 230000008569 process Effects 0.000 claims description 9
- 238000001914 filtration Methods 0.000 claims description 3
- 239000000945 filler Substances 0.000 claims 1
- 230000008859 change Effects 0.000 abstract description 16
- 230000005236 sound signal Effects 0.000 description 6
- 238000004364 calculation method Methods 0.000 description 5
- 230000000694 effects Effects 0.000 description 5
- 239000011295 pitch Substances 0.000 description 4
- 230000036651 mood Effects 0.000 description 3
- 230000001174 ascending effect Effects 0.000 description 2
- 239000003086 colorant Substances 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 238000005311 autocorrelation function Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000003292 diminished effect Effects 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 230000008451 emotion Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000009527 percussion Methods 0.000 description 1
- 230000002250 progressing effect Effects 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 230000007480 spreading Effects 0.000 description 1
- 238000003892 spreading Methods 0.000 description 1
- 238000012706 support-vector machine Methods 0.000 description 1
- 230000001256 tonic effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/076—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of timing, tempo; Beat detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/081—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for automatic key or tonality recognition, e.g. using musical rules or a knowledge base
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/075—Musical metadata derived from musical analysis or for use in electrophonic musical instruments
- G10H2240/085—Mood, i.e. generation, detection or selection of a particular emotional content or atmosphere in a musical piece
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Library & Information Science (AREA)
- Acoustics & Sound (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Auxiliary Devices For Music (AREA)
Abstract
Systems and methods for producing a measure of tonality operate by sampling audio data and determining the difference between a confidence value for a major key and a confidence value for a minor key. A summing function produces a weighted confidence value. Systems and methods also provide an indication of the rate or amount of change between major and minor tonality within music. The weighted tonality may alternatively be determined using chords. A signal asserted from a system embodying the invention may be used to control studio equipment, such as lighting or camera equipment, according to the weighted tonality.
Description
PROCESSING AUDIO DATA FOR PRODUCING METADATA
BACKGROUND OF THE INVENTION
This invention relates to systems and methods for processing audio data for producing metadata that can be used by a subsequent process. The invention particularly applies to audio data containing music.
Metadata relating to music may be used in a variety of systems and processes.
Such metadata may include, for example, data relating to the tone, key, tempo, volume, dynamic range or other attribute of a musical piece. Such metadata may be asserted as an output signal for controlling a process or system. Studio systems, for example, control lighting effects to match the beat and volume of music. Vision mixers can select between different cameras based on metadata such as the tempo of music. Archive systems may determine and store metadata extracted from audio data so as to allow subsequent efficient retrieval.
In all such systems and processes, there is a need to derive metadata from the underlying audio data representing an audio track containing music.
Attributes that may be extracted from music data include tonality (whether a major or a minor key), the tempo (beats per minute of the fundamental timing of the music) and similar attributes derived from either the key or tempo. In order to better understand these concepts, some basics of music will first be described with respect to Figures 1 and 2.
Much of modern western music is still based on compositional methods of the common practice period, which were employed almost exclusively in Europe between the Renaissance and the Twentieth Century. Harmony in this system is based upon the diatonic scale, in which an octave (in which the top note is double the frequency of the bottom note) is sub-divided into 12 intervals in which each note has a frequency approximately 2^(1/12) (or 1.059) times the one below it. Thus the thirteenth note will be (2^(1/12))^12 = 2 times the frequency of the first, hence recovering the octave. The notes are arranged in a scale denoted by seven letters from A to G, with the remaining five intermediate notes inserted between.
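As an illustration of the equal temperament arithmetic described above, the sketch below computes note frequencies from the 2^(1/12) semitone ratio. The A4 = 440 Hz tuning reference is an assumption for the example and is not stated in the text:

```python
# Equal temperament: each semitone is a factor of 2**(1/12) above the last.
# A4 = 440 Hz is the conventional tuning reference (an assumption here).
RATIO = 2 ** (1 / 12)

def note_frequency(semitones_above_a4):
    """Frequency of the note a given number of semitones above A4."""
    return 440.0 * RATIO ** semitones_above_a4

# Twelve semitones recover the octave: A5 is twice the frequency of A4.
print(round(RATIO, 3))               # 1.059
print(round(note_frequency(12), 1))  # 880.0
```

This confirms the 1.059 ratio quoted above and that twelve such steps double the frequency.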
If, for instance, the intermediate note lies between D and E, it can be referred to as D sharp (D#) because it is sharper or higher in pitch than the note D, or as E flat (Eb) because it is flatter or lower in pitch than the note E. Single intervals are referred to as semitones, whilst two intervals (such as from C to D) are referred to as tones. This can be best explained by representing these notes on a keyboard (as shown in Figure 1), where the five intermediate notes are coloured in black and offset from the other seven.
Whilst it is possible to use any of the notes during a piece of music, for a basic melody they very rarely are. Melodies tend to follow a scale, which is made from a standard combination of 7 of the 12 notes available. The two most common combinations are major scales and minor scales (of which there are three valid versions, where the 6th and 7th notes can be altered). The main difference between the two scales is the third note, which for a major key is a semitone higher than that of the minor. Major chords are generally perceived to be lighter, whilst minor chords are perceived to be darker and heavier.
Chords (or the occurrence of multiple notes concurrently) can be made up from any combination of the notes. Basic chords, whether major or minor, incorporate the first tone of the scale (or tonic -this note lends its name to the chord), the third tone of the scale (referred to as the mediant which denotes whether the chord is major or minor) and the fifth tone (referred to as the dominant). Figure 2 illustrates the difference between a C major chord and a C minor chord.
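The triad construction described above (tonic, mediant, dominant) can be sketched in a few lines; the note names and semitone offsets are standard music theory rather than anything specific to this document:

```python
# Building major and minor triads from semitone offsets above the tonic.
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def triad(root, minor=False):
    """Return the tonic, mediant and dominant of the named chord."""
    i = NOTES.index(root)
    third = 3 if minor else 4  # the mediant: minor third vs major third
    return [NOTES[(i + offset) % 12] for offset in (0, third, 7)]

print(triad('C'))               # ['C', 'E', 'G']
print(triad('C', minor=True))   # ['C', 'D#', 'G']  (D# = Eb)
```

This reproduces the Figure 2 example: C major is C-E-G, while C minor lowers the mediant to Eb.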
The tempo of a piece of music is a measure of the rate at which beats are struck and is usually given in beats per minute. As with tonality (major or minor), tempo is a form of metadata which can be extracted from an audio signal.
Whilst methods and systems exist for producing metadata from audio data, these can be improved.
SUMMARY OF THE INVENTION
We have appreciated the need to provide improved systems and methods for processing audio data to produce metadata. The invention resides in four aspects.
A first aspect of the invention relates to systems and methods for producing a weighted tonality measure by sampling audio data at intervals, determining the difference between a confidence value for a major key and a confidence value for a minor key at each interval, and summing a function such as the product of this difference value multiplied by the peak confidence value at each interval to produce a weighted confidence value. This first aspect provides a clearer emphasis to one tonality where the opposite tonality has a low confidence measure.
A second aspect of the invention resides in systems and methods for determining a weighted tonality differential indicative of the rate or amount of change between a major and minor tonality within music by sampling audio data at transitions of tonality, determining the difference in confidence measure between major and minor tonality both before and after each transition, summing a function such as the difference in confidence measure before each transition with the difference in confidence measure after the transition to produce a summed difference, and summing the summed difference values for the transitions. This aspect improves the quality of metadata which may be asserted as an output signal because the weighting of each tonality transition ensures that the summation better represents the true amount of tonality change.
A third aspect of the invention, similar to the second aspect, determines a weighted chord differential. In this aspect, confidence values of dominant chords before and after a chord transition are determined, a difference in confidence value before a chord transition and a difference in confidence value after the transition are determined, a function including the difference in confidence values is summed thereby emphasising where there is a high degree of change of confidence that a chord transition has occurred, but reducing the effect of changes where there is a low degree of confidence of a chord transition.
In a fourth aspect of the invention, a tempo extraction system and method filters an audio signal into separate signals for different frequency bands, produces a beat rate for each band, orders and groups the beat rates and determines which group has the highest number of members. This shows the beat that is present in most frequency bands, but this is not necessarily the tempo. To determine the tempo, this aspect then determines whether any frequency bands have a beat rate approximately half that of the group with the highest number of members. If so, this is determined to be the tempo. If no band has a beat rate of approximately half the most common rate, an analysis is undertaken to check for any signals at a third of the beat rate of the largest group and, if any are found, this is determined to be the tempo of the music. In this way, it is more likely that the actual tempo of the music will be correctly determined, rather than a multiple such as half beats or one third beats.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will now be described in more detail by way of example with reference to the drawings, in which:
Figure 1 is a diagram of the diatonic scale labelled with notes and approximate associated frequencies;
Figure 2 is a diagram showing that the C major chord is made up of C, E and G, whilst the C minor chord is made up of C, Eb and G;
Figure 3 shows the main functional components of a system embodying tonality aspects of the invention;
Figure 4 shows a graph of confidence values for each of the 12 possible major chords and 12 possible minor chords within a given piece of music;
Figure 5 shows peak and weighted tonality values for various pieces of music;
Figure 6 shows a system embodying tempo extraction aspects of the invention;
Figure 7 shows the frequency bands used to filter audio; and
Figure 8 shows beat signals for three different pieces of music in each of 10 frequency bands.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The invention may be embodied in a method and system for processing audio data, particularly audio data representing an audio signal derived solely or in part from a piece of music. Whilst all aspects of the invention may be embodied in a single system, for convenience of understanding, the tonality and tempo aspects will be described as separate embodiments. Furthermore, the embodiments may be dedicated hardware or suitably programmed general purpose processors.
A system embodying the tonality aspects of the invention is shown in Figure 3.
An input 2 receives an audio signal and provides this in an appropriate form to a sampler 4. The input audio signal could be an analogue signal direct from a live feed or an already processed digital signal from an audio archive. The input 2 provides audio data. The sampler 4 samples the audio data at selected time intervals and provides the samples to a tonality processor which determines a measure of the confidence of tonality (major or minor) for each sample. This is provided to a weight sum processor 8. The processing within the weight sum processor provides the advantages of improved tonality metadata production as will be described presently.
The sampler 4 and tonality amplitude processor 6 may be combined together in a single process and use a number of known techniques. One such technique is implemented within a known software tool named Matlab and in particular within an open source toolbox for Matlab named MIR toolbox. This toolbox may be used to implement various functions on audio data to determine energy, tempo, tonality and key and is just one example of known techniques for operating such functions.
The MIR toolbox in Matlab incorporates a function called mirkeystrength. There are 12 possible major chords and 12 possible minor chords. The function calculates and assigns a probability or confidence value to each of the possible 24 chords at a sample rate that can be controlled with the function. In this embodiment, half second intervals are used. The function calls another MIR toolbox function, mirchromagram, which calculates the energy distribution for each note in the diatonic scales. The pitches are then concatenated into one octave and normalized. Next, mirkeystrength cross-correlates the chromagram with the chromagram one would expect for each of the 24 chords and assigns a probability or confidence value to each chord, where a probability of +1 for the tested chord would indicate a definite match whilst -1 would indicate a definite mismatch. In practice neither +1 nor -1 is obtained because of the presence of other characteristic frequencies of musical instruments and synthesisers. The confidence value may be on any arbitrary scale that indicates a measure of the confidence with which a given chord is found. The output is accessible either as raw data or as a graph in which the keystrength of the music is represented by a colour spectrum, with red indicating a clear match and blue a clear mismatch.
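A minimal sketch of the keystrength idea described above, assuming simple binary triad templates rather than the refined key profiles that mirkeystrength actually uses, correlates a normalised 12-bin chromagram against a template for each of the 24 chords:

```python
# Sketch: keystrength by correlating a 12-bin chromagram with 24 chord
# templates. Binary triad templates are a simplifying assumption.
def chord_templates():
    notes = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    templates = {}
    for i, root in enumerate(notes):
        for name, third in (('M', 4), ('m', 3)):  # major / minor mediant
            t = [0.0] * 12
            for offset in (0, third, 7):          # tonic, mediant, dominant
                t[(i + offset) % 12] = 1.0
            templates[root + name] = t
    return templates

def keystrength(chroma):
    """Confidence value in [-1, 1] for each chord: +1 is a definite match,
    -1 a definite mismatch (Pearson correlation against the template)."""
    def corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a) ** 0.5
        vb = sum((y - mb) ** 2 for y in b) ** 0.5
        return cov / (va * vb)
    return {chord: corr(chroma, t) for chord, t in chord_templates().items()}

# A pure C-E-G chromagram matches the C major template exactly.
strengths = keystrength([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0])
print(max(strengths, key=strengths.get))  # CM
```

Real instrument spectra contain harmonics of many pitch classes, which is why, as noted above, confidence values of exactly +1 or -1 are not obtained in practice.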
Figure 4 shows such a graph for the music "Last of the Summer Wine".
As can be seen in Figure 4, a graphical representation of the possible keys used in "Last of the Summer Wine" over time is shown. Major keys are denoted with a capital M, minor keys with a small m. The red (darker) colours denote a high degree of matching; the green and blue (lighter) colours denote progressively lower degrees of matching. Consequently, this piece is predominantly in C major, as shown by the bottom line, though C minor also gives quite a strong match.
The premise of tonality calculations is that the exact key in which the music is composed does not matter. It has been proposed by musicologists that some major keys sound brighter than others and some minor keys more mournful than others, but we consider the differences too subtle for all but the most advanced musicians. In fact, music is very rarely made up of one chord, with chords often progressing to different base notes within the scale, so such considerations are somewhat nugatory. Whilst major and minor chords are the basic construct of a piece of Western-style music, other chord types such as dominant 7th, diminished 7th, extended, added tone and dissonant chords are used to great effect in music to elicit different emotions. However, by their nature, they are more complex and hence difficult to detect, and can often be confused with major and minor chords. Consequently, when such chords are present, one would expect the key clarity to diminish. This can be seen again in Figure 4 at around the 3 second mark, where an added tone chord of C, D, F and A is played. The confidence value does not adequately differentiate between the D minor, A minor and F major chords as a consequence, with no assigned probability being particularly high.
We have appreciated that the tonality calculations should be weighted. In Figure 4, shortly after 10 seconds, a strong C major chord is present. The next highest probability is considerably less. This should therefore count more towards the tonality calculation than the uncertainty at the 3 second mark. The weight sum processor 8 shown in Figure 3 implements a process to determine a weighted tonality. The weighted tonality is calculated by determining the probability amplitude which may also be referred to as a confidence measure for each of the (i) peak major tonality, (ii) peak minor tonality and (iii) peak tonality irrespective of whether major or minor. The difference between the peak confidence measure of major tonality and peak confidence measure of minor tonality is multiplied by the confidence measure of tonality at each interval. The resulting value at each interval is then summed for all intervals and divided by the number of intervals.
The resulting weighted tonality gives an indication of the measure of confidence of tonality on a positive to negative scale, positive values indicating major tonality and negative values indicating minor tonality. In the example given, each confidence measure varies from +1 to -1 and so summing n such confidence measures and dividing by n will give a weighted tonality which also varies from +1 to -1, but of course is more accurate in representing the tonality of the whole piece of music.
The weighted tonality, W, is defined as:

W = ( Σ Kmax (Kmaj - Kmin) ) / n    (Equation 1)

where:
Kmax = peak confidence measure of tonality (irrespective of whether major or minor)
Kmaj = peak confidence measure of major tonality
Kmin = peak confidence measure of minor tonality
n = the number of time intervals used to classify the sample of music.
The product Kmax(Kmaj - Kmin) is summed over all n intervals and divided by n. Predominantly minor music will therefore give a negative W. This measure gives a much clearer representation of the tonality of the music under consideration because it emphasises certainty where it exists and minimises uncertain contributions. Consequently, it gives a clearer comparison between similar music, spreading the data and enabling subsequent processes such as a classifier to determine music classification with greater clarity.
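Equation 1 can be sketched as follows; the per-interval confidence values used in the example are illustrative, not taken from the patent:

```python
# Sketch of Equation 1: weighted tonality W.
def weighted_tonality(samples):
    """samples: list of (k_max, k_maj, k_min) tuples, one per time interval,
    where k_max is the peak confidence irrespective of major/minor."""
    return sum(k_max * (k_maj - k_min)
               for k_max, k_maj, k_min in samples) / len(samples)

# A confidently major piece yields a positive W (illustrative values).
major_piece = [(0.8, 0.8, 0.3), (0.9, 0.9, 0.2), (0.7, 0.7, 0.4)]
print(round(weighted_tonality(major_piece), 3))  # 0.413
```

Because each confidence measure lies in [-1, +1], W also lies in [-1, +1], with positive values indicating major and negative values minor tonality, as the text describes.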
Eleven theme tunes were tested for peak tonality and weighted tonality, and the results are shown in Figure 5. Weighted tonality offers a better tonality comparison than simply subtracting the peak major from the peak minor.
As well as the overall nature of the tonality in the music, it is also useful to know the frequency with which tonality changes. Attributes such as exciting and dramatic can be associated with a more frequent change of tonality and key and used to assert an output accordingly. Consequently, two further measures are made in this embodiment. The first is a weighted tonality differential, which detects the rate at which the tonality changes per second during the course of the music. The second measure determined is a weighted chord differential, which detects the rate at which the dominant chord changes in the piece; the dominant chord may change but this doesn't necessarily mean a change in tonality (for instance the chord can change from an A major to an F major chord).
The weighted tonality differential will be described first. This measure uses the same data taken from calculating the weighted tonality as described above. It detects where the tonality changes (i.e. whether from major to minor or vice versa) and weights it with the certainty that the key change has happened. It does this by finding the transitions and calculating the sum of the certainties associated with the chords before and after the transition, optionally multiplying by the peak confidence measure of tonality. The difference in certainty/confidence is given by: |Kmaj - Kmin|j + |Kmaj - Kmin|j+1 (where the subscript j corresponds to the certainty before the transition and j+1 to the certainty after).
It will only do this at transition locations. Where there is not a transition, the differential will be 0 (see Table 1 below). This is then averaged over the number of time intervals, n. Again, because this weights the transitions with a certainty that the tonality change has happened, it gives greater emphasis to clearer transitions, thus filtering out noise.
T = ( Σj ( |Kmaj - Kmin|j + |Kmaj - Kmin|j+1 ) ) / n    (Equation 2)

where, for all j, the expression contained within the brackets is only summed where a transition occurs and is otherwise equal to zero.
As described by the equation above, the weighted tonality differential emphasises those transitions that have a high confidence measure of one tonality and low confidence measure of another before a transition and a low confidence measure of the tonality and a high confidence measure of the other tonality after the transition. On the other hand, transitions where there is less of a difference between the confidence measures of each tonality either before or after (or both) a detected transition are given less weight in the summation. The tonality differential may be multiplied by the peak tonality, or other measure, to further weight the function before summing.
In the example of Table 1, the overall weighted tonality differential for the data is 0.335/n = 0.112. Note that the total number of time intervals is n = 3 in this case, and that |Kmaj - Kmin|j + |Kmaj - Kmin|j+1 is not calculated if the tonality does not change.
Time interval    Kmaj     Kmin     Kmaj - Kmin    |Kmaj - Kmin|j + |Kmaj - Kmin|j+1
i                0.583    0.675    -0.092         0.335
i+1              0.786    0.543     0.243         0.000
i+2              0.777    0.562     0.215         -

Table 1
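As a sketch of Equation 2, the following reproduces the Table 1 calculation from the per-interval Kmaj - Kmin differences; a sign flip between adjacent intervals marks a tonality transition:

```python
# Sketch of Equation 2: weighted tonality differential T.
def weighted_tonality_differential(diffs):
    """diffs: list of (Kmaj - Kmin) values, one per time interval.
    Only intervals where the sign flips (a tonality transition)
    contribute to the sum."""
    total = 0.0
    for j in range(len(diffs) - 1):
        if (diffs[j] > 0) != (diffs[j + 1] > 0):  # major/minor transition
            total += abs(diffs[j]) + abs(diffs[j + 1])
    return total / len(diffs)

# Table 1: interval i is marginally minor, i+1 and i+2 are major, so only
# the single transition between i and i+1 contributes.
print(round(weighted_tonality_differential([-0.092, 0.243, 0.215]), 3))  # 0.112
```

The uncertain transition contributes 0.092 + 0.243 = 0.335, and dividing by n = 3 recovers the 0.112 figure quoted above.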
There are other possible ways in which the transition can be weighted. For instance, one could look one or two time intervals beyond the transition to give an overall transition probability. It may be, simply because of how the music is divided up, that a transition falls at a boundary between time intervals, giving uncertainty to a tonal change which is actually clear. However, work carried out so far does not indicate that this is a significant problem.
The weighted chord differential will now be described. This measure again uses the same data taken from calculating the weighted tonality. It searches for the dominant chord, and detects transitions of the dominant chord. The transition is weighted with a chord transition certainty, which is calculated by looking at the change in certainty of the two chords in question before and after the transition.
In this aspect, the tonality processor of Figure 3 is replaced by a chord processor.
Let us define Ki as the confidence value for the maximum certainty chord before the transition and Ki+1 as the confidence value of this chord after the transition. Likewise, Li is defined as the confidence value of the new dominant chord before the transition and Li+1 as the confidence value of the new dominant chord after it. The transition is weighted by the factor |Ki - Li| + |Ki+1 - Li+1|. The effect of this again is to give added weight to more certain transitions and to decrease the effect of less certain transitions. Again, where a transition does not occur, the differential will be 0. Table 2 gives an example of a weighted chord differential calculation.
The weighting of the chord differential can be understood by the full summation equation shown below. By taking the sum of the differences in confidence and value of the two dominant chords involved in a transition before and after the transition, greater weight is given to those transitions where there is a large change in confidence value. By summing the weighted values at all such transitions, a resulting single weighted chord differential value gives a measure of confidence as to the relative rate of chord changes within a given piece of music.
C = ( Σi ( |Ki - Li| + |Ki+1 - Li+1| ) ) / n    (Equation 3)

where, for all i, the expression contained within the brackets is only counted where a transition occurs and is otherwise equal to zero.
The chord differential may be multiplied by the peak chord confidence, or other measure, to further weight the function before summing.
Time interval    K        L        |Ki - Li| + |Ki+1 - Li+1|
i                0.773    0.356    0.675
i+1              0.567    0.825    0.000
i+2              0.587    0.796    -

Table 2
As shown in Table 2, the overall weighted chord differential for the above data is 0.675/n = 0.225. Note that n = 3 in this case and that |Ki - Li| + |Ki+1 - Li+1| is not calculated if the dominant chord does not change.
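Equation 3 can be sketched using the Table 2 figures. The transition index is supplied explicitly here as a simplification; a full implementation would itself detect where the dominant chord changes:

```python
# Sketch of Equation 3: weighted chord differential C.
def weighted_chord_differential(rows, transitions):
    """rows: list of (K, L) per interval, where K is the confidence of the
    old dominant chord and L that of the new dominant chord.
    transitions: indices i at which the dominant chord changes
    between interval i and i+1."""
    total = 0.0
    for i in transitions:
        k_i, l_i = rows[i]
        k_next, l_next = rows[i + 1]
        total += abs(k_i - l_i) + abs(k_next - l_next)
    return total / len(rows)

# Table 2: a single chord transition, between intervals i and i+1.
table2 = [(0.773, 0.356), (0.567, 0.825), (0.587, 0.796)]
print(round(weighted_chord_differential(table2, [0]), 3))  # 0.225
```

The transition contributes |0.773 - 0.356| + |0.567 - 0.825| = 0.675, which divided by n = 3 gives the 0.225 figure above.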
A second embodiment of the invention relates to accurate determination of tempo as will now be described in relation to Figure 6. The system embodying the invention has an input 12 for receiving an audio signal and presenting this in an appropriate format to a sampler 14 which samples the audio data at a chosen interval and provides this to a beat processor 16. The sampler 14 and beat processor 16 may be provided by known techniques such as the Matlab MIR toolbox.
Tempo is a measure of the rate at which beats are struck and is usually given in beats per minute (bpm). A set of typical tempo values is listed in Table 3.
Tempo marking      Tempo range (bpm)
Larghissimo        0-40
Lento/Largo        40-60
Larghetto/Grave    60-66
Adagio             66-76
Andante            76-108
Moderato           108-120
Allegro            120-168
Presto             168-200
Prestissimo        200+

Table 3
Most tempi tend to be in the range 60-180 beats per minute. Other tempi do occur but are much less frequent.
A distinction may be drawn, though, between the beats discovered within a piece of music by a beat processor and the actual tempo of the overall piece of music.
The reason for this is the extensive use of percussion or plucked instruments in music, which have sharp, high frequency onsets which the software picks up.
Consequently, instruments such as the guitar, which often plays on the half beat unless strummed, will produce sharp, high frequency onsets on the half beat.
The techniques for extracting the audio features are accurate, but it is the way they are interpreted which causes the problem. If the maximum rate of beats is simply counted, then the tempo of a piece will often erroneously be determined to be twice or three times the actual tempo of the music. It is for this reason that a tempo processor 18 is configured to provide a metadata output which may be asserted on output 20.
The tempo processor 18 operates by taking beat signals from the beat processor for each of a number of different frequencies, grouping the tempos indicated by each of the beat signals, determining the most frequently occurring range of beats per minute and then determining whether there are beat signals having one half or one third of this number of beats per minute. In either case, this would indicate that the number of beats per minute occurring most often is not actually the tempo of the piece, but is instead a multiple showing one half or one third beats. Each of the steps undertaken by the tempo processor will now be described in turn.
Figure 7 illustrates 10 frequency filters and Figure 8 shows three theme tunes filtered into ten roughly logarithmically equal frequency bands, using the filters of Figure 7, which roughly correlate with octaves. As can be seen, when this is done, the beat is clearly visible in certain frequency bands, but the band in which it occurs is not necessarily the same each time. The assumption for many automated beat extractors is that the beat is present in the lower frequencies, but this is a generalisation too far.
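The patent does not specify the band edges, so the sketch below simply assumes ten exactly octave-wide bands starting at 20 Hz, which is one way of realising "roughly logarithmically equal frequency bands ... which roughly correlate with octaves"; the specific edge frequencies are my own illustrative choice.

```python
# Ten octave-wide bands: each upper edge is double the lower edge,
# so the spacing is exactly logarithmic.
edges = [20 * 2 ** k for k in range(11)]   # 20, 40, 80, ..., 20480 Hz
bands = list(zip(edges[:-1], edges[1:]))   # ten (low, high) pairs
for lo, hi in bands:
    print(f"{lo}-{hi} Hz")
```

In practice each band would then be realised as a bandpass filter (e.g. a Butterworth filter) applied to the audio before onset detection.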
As shown in Figure 8, the filtered waveforms of three theme tunes, with the frequency bands ascending in pitch from bottom to top, are: left Postman Pat, centre Dad's Army and right Eastenders. Red (mid grey) waveforms indicate the bands in which an autocorrelation of onsets returns the musically correct tempo, green (light grey) indicates where the function returns double the tempo, and blue (dark grey) indicates a spurious result.
Tempo calculations are carried out on the filtered waveforms in Figure 8 using the mirtempo function. The function mirtempo calculates the tempo by picking the highest peaks in the autocorrelation function of onset detection and computing the time between such peaks. The red bands indicate where the tempo was correctly identified, the green where a tempo twice that of the correct tempo was calculated. Note that in no instance has a tempo half that of the correct tempo been found and that the correct beat can be clearly identified in the red waveforms.
The operation of the tempo processor embodying the invention is as follows. The mirtempo function is applied to each of the ten filtered waveforms. The extracted tempi are arranged in ascending order, the difference in tempo between adjacent values noted, and g, the number of times the difference exceeds a threshold chosen to be 10 beats per minute, noted. A statistical function, k-means, is used to cluster the data; other grouping functions may be used. Extracted tempi are likely to vary by one or two beats per minute within a cluster, and g+1 clusters are found (but with g limited to 4 so that tracks with no clear tempo do not give an arbitrary result). The standard deviation of the beats per minute inside each cluster is also noted to give a measure of how precise the extracted tempo is. It is therefore possible to return an unspecified tempo should this value go above a certain threshold.
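The sorting and clustering step can be sketched as below. For simplicity this uses a one-dimensional gap-splitting rule (start a new cluster wherever adjacent sorted tempi differ by more than the 10 bpm threshold) as a stand-in for the k-means function the text describes; the tempi are illustrative values similar to the Table 4 worked example, and the function name is my own.

```python
from statistics import mean, pstdev

def cluster_tempi(tempi, gap_bpm=10.0, max_splits=4):
    """Sort extracted tempi and split wherever adjacent values differ by
    more than gap_bpm, giving g+1 clusters with g capped at max_splits
    so that tracks with no clear tempo do not give an arbitrary result."""
    s = sorted(tempi)
    clusters, current, splits = [], [s[0]], 0
    for prev, cur in zip(s, s[1:]):
        if cur - prev > gap_bpm and splits < max_splits:
            clusters.append(current)
            current, splits = [cur], splits + 1
        else:
            current.append(cur)
    clusters.append(current)
    return clusters

# Illustrative tempi for ten bands (values in the spirit of Table 4).
tempi = [145.27, 168.53, 169.84, 127.66, 174.01,
         81.062, 90.951, 165.41, 168.99, 170.18]
for c in cluster_tempi(tempi):
    # The per-cluster standard deviation gives the precision measure.
    print(f"mean {mean(c):7.3f}  sd {pstdev(c):6.3f}  members {len(c)}")
```

With these values four clusters emerge, the largest containing the six tempi around 169 bpm.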
When the data is clustered, the largest cluster (or mode) is found. The processor then searches for a tempo within a tolerance (which may be varied) chosen to be 15% of half the value of the mode tempo. If this exists, the slower tempo is chosen as the correct tempo. The processor then searches for a tempo within 15% of a third of the value of the mode tempo. Again, if this exists, this slower tempo is chosen as the correct tempo. If neither a half nor a third tempo is found, the mean value of the modal cluster is chosen as the tempo. Table 4 shows a worked example of this for Last of the Summer Wine.
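The half/third search can be sketched as follows. This is a hedged illustration of the selection rule just described, not the patent's code; the function name and the cluster values (shaped like the Last of the Summer Wine example) are my own.

```python
from statistics import mean

def choose_tempo(clusters, tol=0.15):
    """Take the mean tempo of the modal (largest) cluster, then prefer
    any cluster mean within tol (15%) of half, or failing that a third,
    of it; that slower tempo is taken as the true musical tempo."""
    means = [mean(c) for c in clusters]
    mode_mean = means[max(range(len(clusters)), key=lambda i: len(clusters[i]))]
    for divisor in (2, 3):
        target = mode_mean / divisor
        for m in means:
            if abs(m - target) <= tol * target:
                return m
    return mode_mean

# Clusters shaped like the worked example: the ~169 bpm cluster is the
# mode, but the ~86 bpm cluster is within 15% of half its value.
clusters = [
    [81.062, 90.951],
    [127.66],
    [145.27],
    [165.41, 168.53, 168.99, 169.84, 170.18, 174.01],
]
print(choose_tempo(clusters))  # roughly 86 bpm, half the ~169 bpm mode
```

If no half or third candidate exists, the function falls back to the modal cluster's mean, as the text specifies.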
Extracted tempi | Sorted and clustered | Mean of each cluster | Mode? | Half factor? | Third factor? | Tempo
---|---|---|---|---|---|---
145.27 | 81.062 | 86.007 | No | Yes | No | 86.007
168.53 | 90.951 | | | | |
169.84 | 127.66 | 127.66 | No | No | No |
127.66 | 145.27 | 145.27 | No | No | No |
174.01 | 165.41 | | | | |
81.062 | 168.53 | | | | |
90.951 | 168.99 | 170.18 | Yes | No | No |
165.41 | 169.84 | | | | |
168.99 | 170.18 | | | | |
170.18 | 174.01 | | | | |
174.31 | 174.31 | | | | |
Table 4
As shown in Table 4, the steps in calculating the tempo of Last of the Summer Wine flow from left to right. Tempi are extracted for each of the ten bands; they are then sorted and clustered. The mean is taken of each cluster, the mode determined and each cluster tested to see whether it is a half factor or third factor of the mode. If either exists, the factor is chosen as the tempo; if not, the mode is chosen.
For completeness, applications of the embodiments of the invention will now be discussed. A first application is in studio equipment, in controlling devices such as lighting equipment, camera shot changes or the like. A signal asserted from a system embodying the invention can be provided as an input and used to control such devices based on the weighted tonality, tonality differential, chord differential or tempo value.
A further application is in devising metadata for audio data extracted from a music archive.
The weighted tonality and a new tempo extractor can be used to improve music mood classification. Weighted tonality gives greater emphasis to musical passages of clear tonality, whilst ignoring more ambiguous passages. The tempo extractor outperforms commonly available software.
Using weighted tonality as a dimension for support vector machine classification enhances the classification of the moods happy, light and heavy to over 80%, amongst the best reported success rates in the field. The improved tempo extractor will improve the success rate of moods such as exciting and dramatic.
Claims (23)
- CLAIMS
- 1. A system for processing audio data relating to music to produce metadata, comprising: -an input for providing audio data; -a sampler arranged to sample the audio data at time intervals; -a tonality processor arranged to produce a major tonality confidence measure, a minor tonality confidence measure and a peak tonality confidence measure for each sample; -a weight sum processor arranged to produce a weighted tonality confidence measure for the audio data by summing for all the samples a function of a difference in major and minor confidence measures for each of the samples; and -an output for asserting the weighted tonality confidence measure.
- 2. A method of processing audio data relating to music to produce metadata, comprising: -providing audio data; -sampling the audio data at time intervals; -producing a major tonality confidence measure, a minor tonality confidence measure and a peak tonality confidence measure for each sample; -producing a weighted tonality confidence measure for the music by summing for all the samples a function of a difference in major and minor confidence measures for each of the samples; and -asserting the weighted tonality confidence measure.
- 3. A system for processing audio data relating to music to produce metadata, comprising: -an input for providing audio data; -a sampler arranged to sample the audio data; -a tonality processor arranged to process the samples and to detect transitions in tonality and to produce a major and minor confidence measure before each transition and a major and minor confidence measure after each transition; -a weight sum processor arranged to produce a weighted tonality differential by summing for all the transitions a function of the difference in major and minor confidence measures both before and after each transition; and -an output for asserting the weighted tonality differential.
- 4. A method of processing audio data relating to music to produce metadata, comprising: -providing audio data; -sampling the audio data; -detecting transitions in tonality and producing a major and minor confidence measure before each transition and a major and minor confidence measure after each transition; -producing a weighted tonality differential by summing for all the transitions a function of the difference in major and minor confidence measures both before and after each transition; and -asserting the weighted tonality differential.
- 5. A system according to claim 1 or 3, wherein the function comprises multiplying by peak tonality for each respective sample.
- 6. A system according to claim 1 or 3, further configured to divide the summed function by the number of samples.
- 7. A system according to claim 5, wherein the peak tonality confidence measure is a measure of the maximum tonality confidence irrespective of whether major or minor.
- 8. A system according to any of claims 1, 3, 5 to 7, wherein the major tonality confidence measure is a measure of the confidence of a major key.
- 9. A system according to any of claims 1, 3, 5 to 7, wherein the minor tonality confidence measure is a measure of the confidence of a minor key.
- 10. A system according to any of claims 1, 3, 5 to 7, wherein the sampler is configurable to vary the time intervals.
- 11. A system according to claim 10, wherein the sampler is configurable based on a tempo input from a tempo processor.
- 12. A system according to any of claims 1, 3, 5 to 11, wherein the output comprises a control for controlling audio or video equipment according to the weighted tonality measure.
- 13. A system for processing audio data relating to music to produce metadata, comprising: -an input for providing audio data; -a sampler arranged to sample the audio data; -a chord processor arranged to process the samples and to detect chord transitions and to produce a confidence measure of a first chord that is dominant before each transition and of that first chord after the transition and a confidence measure of a second chord that is dominant after each transition and of that second chord before the transition; -a weight sum processor arranged to produce a weighted chord differential by summing for all the transitions a function of the difference in confidence measure of the first and second chords before and after each transition; and -an output for asserting the weighted chord differential.
- 14. A method of processing audio data relating to music to produce metadata, comprising: -providing audio data; -sampling the audio data; -detecting chord transitions and producing a confidence measure of a first chord that is dominant before each transition and of that first chord after the transition and a confidence measure of a second chord that is dominant after each transition and of that second chord before the transition; and -producing a weighted chord differential by summing for all the transitions a function of the difference in confidence measure of the first and second chords before and after each transition.
- 15. A system according to claim 13, wherein the function comprises multiplying by a peak chord measure for each respective sample.
- 16. A system according to claim 13, further configured to divide the summed function by the number of samples.
- 17. A system for processing audio data relating to music to produce metadata, comprising: -an input for providing audio data; -a filter arranged to filter the audio data into a plurality of frequency bands; -a beat processor arranged to produce a beat rate for each frequency band; and -a tempo processor arranged to determine and output a tempo value by: -receiving the beat rates, -grouping the beat rates into groups of similar beat rates, -determining the group containing the most members, -comparing a notional beat rate of the group containing the most members to the notional beat rates of other groups, and -outputting the notional beat rate of any group having, within a given tolerance, half the notional beat rate of the group containing most members if such a group exists, or outputting the notional beat rate of the group containing most members if such a group does not exist.
- 18. A system according to claim 17, wherein the tempo processor is further arranged to determine and output a tempo value by outputting the notional beat rate of any group having, within a given tolerance, one third the notional beat rate of the group containing most members if such a group exists.
- 19. A system according to claim 17 or 18, wherein the notional beat rate of each group is one of the mean, median or mode, maximum or minimum beat rate of the rates in the group.
- 20. A system according to claim 17 or 18, wherein the tolerance is a percentage.
- 21. A system according to claim 20, wherein the tolerance is 15%.
- 22. A method of processing audio data relating to music to produce metadata, comprising: -receiving audio data; -filtering the audio data into a plurality of frequency bands; -producing a beat rate for each frequency band; and -determining a tempo value by: -receiving the beat rates, -grouping the beat rates into groups of similar beat rates, -determining the group containing the most members, -comparing a notional beat rate of the group containing most members to the notional beat rates of other groups, and -outputting the notional beat rate of any group having within a given tolerance half the notional beat rate of the group containing most members if such a group exists, or outputting the notional beat rate of the group containing most members if such a group does not exist.
- 23. A system arranged to produce metadata according to any of equations 1, 2 or 3 described herein.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1807498.9A GB2560458B (en) | 2011-05-11 | 2011-05-11 | Processing audio data for producing metadata |
GB1107903.5A GB2490877B (en) | 2011-05-11 | 2011-05-11 | Processing audio data for producing metadata |
GB1807502.8A GB2560459B (en) | 2011-05-11 | 2011-05-11 | Processing audio data for producing metadata |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1107903.5A GB2490877B (en) | 2011-05-11 | 2011-05-11 | Processing audio data for producing metadata |
Publications (3)
Publication Number | Publication Date |
---|---|
GB201107903D0 GB201107903D0 (en) | 2011-06-22 |
GB2490877A true GB2490877A (en) | 2012-11-21 |
GB2490877B GB2490877B (en) | 2018-07-18 |
Family
ID=44243972
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB1107903.5A Active GB2490877B (en) | 2011-05-11 | 2011-05-11 | Processing audio data for producing metadata |
Country Status (1)
Country | Link |
---|---|
GB (1) | GB2490877B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488782A (en) * | 2013-09-30 | 2014-01-01 | 华北电力大学 | Method for recognizing musical emotion through lyrics |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1426921A1 (en) * | 2002-12-04 | 2004-06-09 | Pioneer Corporation | Music searching apparatus and method |
GB2427291A (en) * | 2005-06-17 | 2006-12-20 | Queen Mary & Westfield College | A method of analysing audio, music or video data |
US20070261535A1 (en) * | 2006-05-01 | 2007-11-15 | Microsoft Corporation | Metadata-based song creation and editing |
US20080097633A1 (en) * | 2006-09-29 | 2008-04-24 | Texas Instruments Incorporated | Beat matching systems |
CN101399035A (en) * | 2007-09-27 | 2009-04-01 | 三星电子株式会社 | Method and equipment for extracting beat from audio file |
EP2068255A2 (en) * | 2007-12-07 | 2009-06-10 | Magix Ag | System and method for efficient generation and management of similarity playlists on portable devices |
EP2204774A2 (en) * | 2008-12-05 | 2010-07-07 | Sony Corporation | Information processing apparatus, information processing method, and program |
WO2010129693A1 (en) * | 2009-05-06 | 2010-11-11 | Gracenote, Inc. | Apparatus and method for determining a prominent tempo of an audio work |
US20120060667A1 (en) * | 2010-09-15 | 2012-03-15 | Yamaha Corporation | Chord detection apparatus, chord detection method, and program therefor |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Dixon | Onset detection revisited | |
Klapuri | Sound onset detection by applying psychoacoustic knowledge | |
Grosche et al. | What Makes Beat Tracking Difficult? A Case Study on Chopin Mazurkas. | |
US7301092B1 (en) | Method and apparatus for synchronizing audio and video components of multimedia presentations by identifying beats in a music signal | |
JP3433818B2 (en) | Music search device | |
Mauch et al. | Timbre and Melody Features for the Recognition of Vocal Activity and Instrumental Solos in Polyphonic Music. | |
US20080300702A1 (en) | Music similarity systems and methods using descriptors | |
CN101189610B (en) | Method and electronic device for determining a characteristic of a content item | |
MXPA01004281A (en) | Fast find fundamental method. | |
Yoshii et al. | Automatic Drum Sound Description for Real-World Music Using Template Adaptation and Matching Methods. | |
Cambouropoulos | From MIDI to traditional musical notation | |
Belle et al. | Raga identification by using swara intonation | |
US20060075883A1 (en) | Audio signal analysing method and apparatus | |
Dittmar et al. | Automated Estimation of Ride Cymbal Swing Ratios in Jazz Recordings. | |
Uhle et al. | Estimation of tempo, micro time and time signature from percussive music | |
Paiva et al. | On the Detection of Melody Notes in Polyphonic Audio. | |
Zhang et al. | Main melody extraction from polyphonic music based on modified Euclidean algorithm | |
Grosche et al. | Automatic transcription of recorded music | |
Mann et al. | Music Mood Classification of Television Theme Tunes. | |
GB2490877A (en) | Processing audio data for producing metadata and determining aconfidence value based on a major or minor key | |
JP6263382B2 (en) | Audio signal processing apparatus, audio signal processing apparatus control method, and program | |
Gao et al. | Vocal melody extraction via dnn-based pitch estimation and salience-based pitch refinement | |
JP6263383B2 (en) | Audio signal processing apparatus, audio signal processing apparatus control method, and program | |
Shao et al. | Automatic music summarization based on music structure analysis | |
Politis et al. | Determining the chromatic index of music |