WO2007072394A2 - Audio structure analysis - Google Patents

Audio structure analysis

Info

Publication number
WO2007072394A2
WO2007072394A2 (PCT/IB2006/054915)
Authority
WO
WIPO (PCT)
Prior art keywords
energy
similarity
determining
music signal
beat
Prior art date
Application number
PCT/IB2006/054915
Other languages
French (fr)
Other versions
WO2007072394A3 (en)
Inventor
Aweke N. Lemma
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V.
Publication of WO2007072394A2
Publication of WO2007072394A3

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/36: Accompaniment arrangements
    • G10H1/40: Rhythm
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/076: Musical analysis for extraction of timing, tempo; Beat detection
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2220/00: Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H2220/021: Indicator, i.e. non-screen output user interfacing, e.g. visual or tactile instrument status or guidance information using lights, LEDs, seven-segment displays
    • G10H2220/081: Beat indicator, e.g. marks or flashing LEDs to indicate tempo or beat positions
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00: Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/131: Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H2250/135: Autocorrelation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Auxiliary Devices For Music (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A device (1) for determining accented beats in a music signal (x[n]) comprises: energy determination means (12) for determining the energy (E[n]; E[k]) of the music signal (x[n]), segmentation means (15) for segmenting the energy on the basis of a tempo estimate (T), similarity determination means (17) for determining the similarity between the energy (E[n]; E[k]) of segments, and selecting means (18) for selecting the segment having the smallest similarity as the segment containing an accented beat. The tempo estimate (T) may be determined by external means. The energy may be determined in the time domain or in a transform domain. The similarity may be determined using cross-correlation, entropy or a distance measure. The device (1) may advantageously be used in AutoDJ apparatus.

Description

Audio structure analysis
The present invention relates to audio structure analysis. More in particular, the present invention relates to a device for and method of determining accented beats in a music signal.
It is well known to analyze the structure of audio signals, in particular music signals, both manually and automatically. In order to compare music pieces or sound tracks, several characteristics of the music may be determined, such as the meter of the music, including the beat and the bar boundaries. When automatically processing music, for example in AutoDJ (Automatic Disc Jockey) applications, it is necessary to match the meter of successive music pieces. When mixing songs, it is highly desirable to synchronize the beat of the songs, in particular the accented beats (downbeats). Although many different methods of beat detection are known, very few prior art documents deal with detecting the accented beat.
United States Patent US 6 542 869 (Foote) discloses a method of determining points of change in an audio signal by measuring the self-similarity of components of the audio signal. The self-similarity as well as cross-similarity between each of a set of signal parameterization values is determined for all past and future time window regions. A significant point of change will have a high self-similarity in the past and future, and a low cross-similarity. This known method may be used for beat tracking, including finding the tempo and location of downbeats in music.
This known method has several disadvantages. A self-similarity matrix is very complex and its compilation is computationally very demanding, while requiring a large amount of memory. This makes the known method less suitable for consumer devices, which typically have relatively little computational power and a limited amount of memory. In addition, the known method suffers from a high degree of ambiguity, as the nature of the detected points of change has to be derived from their frequency of occurrence, which may only be determined accurately if a sufficiently high resolution is used. It has been found that this method is less suitable for accurately determining accented beats in music. It is an object of the present invention to overcome these and other problems of the prior art and to provide a device for and method of determining accented beats in a music signal which are simple yet provide sufficient accuracy.
Accordingly, the present invention provides a device for determining accented beats in a music signal, the device comprising: energy determination means for determining the energy of the music signal, segmentation means for segmenting the energy on the basis of a tempo estimate, similarity determination means for determining the similarity between the energy of segments, and selecting means for selecting the segment having the smallest similarity as the segment containing an accented beat. By determining the similarity between the energy of signal segments, a very simple yet effective way of detecting accented beats is obtained, as the accented beat will be dissimilar from the other beats. The present invention uses a one-dimensional approach (comparing consecutive signal segments), which is computationally far less demanding and requires far less memory than the two-dimensional approach of the Foote patent mentioned above.
A tempo estimate is used to aid the segmentation of the calculated energy. This tempo estimate may be produced using any known method and may also involve detecting beat onsets, although this is not essential. It is preferred that the segmentation substantially corresponds with the beat onsets (the beginning of each beat), but this is not essential.
It is noted that instead of the energy of the music signal, any other equivalent property may be determined, such as its magnitude.
The similarity determination means may be arranged for carrying out a cross-correlation, an autocorrelation, a distance measurement, an information measurement and/or a pattern match. A cross-correlation is preferred, but other (dis)similarity measures may also be used.
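As an illustration of two such (dis)similarity measures (a minimal sketch, not taken from the patent; the function names and the use of NumPy are assumptions), a zero-lag normalized cross-correlation and a Euclidean distance between two equal-length energy segments could be computed as:

```python
import numpy as np

def normalized_xcorr(a, b):
    """Zero-lag normalized cross-correlation: 1.0 for identical
    shapes, lower (down to -1.0) for dissimilar shapes."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def euclidean_distance(a, b):
    """A simple distance measure; larger means more dissimilar."""
    return float(np.linalg.norm(a - b))
```

Identical segments then yield a cross-correlation of 1.0 and a distance of 0.0, while a segment with a clearly different shape scores much lower on the correlation measure.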
As mentioned above, the segmentation means are preferably arranged for segmenting the energy on beat onset positions. Additionally, or alternatively, the segmentation means are preferably arranged for providing the segments in parallel so as to allow a simple, essentially one-dimensional comparison.
The device of the present invention may further comprise tempo estimation means for estimating the tempo of the music signal. However, such tempo estimation means, which may also determine beat onsets, may also be external to the device.
The energy determination means may be arranged for determining the time domain energy. However, in a preferred embodiment the device of the present invention further comprises a transform means for transforming the music signal to a transform domain, while the energy determination means are arranged for determining the transform domain energy, said transform domain preferably being the frequency domain. Accordingly, the transform means are preferably arranged for performing a Fast Fourier Transform (FFT).
The device of the present invention may further comprise a frame compilation means for compiling frames of the music signal, and/or an energy buffer means for buffering the (time and/or transform domain) energy. The device of the present invention may advantageously further comprise a filter means arranged between the segmentation means and the similarity determination means for filtering the energy segments prior to determining their similarity. The filter means serve to reduce any influence of transients and improve the reliability of the accented beat estimates. A music system, such as an AutoDJ system, according to the present invention comprises an accented beat determination device as defined above.
The present invention also provides a method of determining accented beats in a music signal, the method comprising the steps of: determining the energy of the music signal, segmenting the energy on the basis of a tempo estimate, determining the similarity between the energy of segments, and selecting the segment having the smallest similarity as the segment containing an accented beat.
The method of the present invention may advantageously be used for detecting bar boundaries, as a bar typically starts with an accented beat. Accordingly, the present invention also provides a method of detecting bar boundaries in a music signal, the method comprising the steps of: determining the energy of the music signal, segmenting the energy on the basis of a tempo estimate, determining the similarity between the energy of segments, selecting the segment having the smallest similarity as the segment containing an accented beat, and equating the bar boundary with the beat onset of the accented beat. Further advantageous embodiments of the inventive device and methods will become apparent from the description below.
The present invention additionally provides a computer program product for carrying out the method as defined above. A computer program product may comprise a set of computer executable instructions stored on a data carrier, such as a CD or a DVD. The set of computer executable instructions, which allow a programmable computer to carry out the method as defined above, may also be available for downloading from a remote server, for example via the Internet.
The present invention will further be explained below with reference to exemplary embodiments illustrated in the accompanying drawings, in which:
Fig. 1 schematically shows the energy of signal segments as processed according to the present invention,
Fig. 2 schematically shows a first embodiment of an accented beat detection device according to the present invention,
Fig. 3 schematically shows a second embodiment of an accented beat detection device according to the present invention,
Fig. 4 schematically shows an AutoDJ system in which the invention may advantageously be utilized.
The energy of a music signal as a function of time is schematically illustrated in Fig. 1. The energy E illustrated in Fig. 1 may be determined by an accented beat detection device of the present invention, which will be discussed later with reference to Figs. 2 and 3. The top diagram of Fig. 1 shows the energy E of a music signal as a function of time (sample number n) or frequency (frequency bin k). In the following discussion, it will be assumed that the energy E is a function of time and that the music has four beats per measure, although the invention is not so limited. The music signal is segmented into segments or beat periods BP. In the example shown, the segment boundaries are at the peaks of the energy signal E. Assuming (or knowing) that the music signal has four beats per measure, the segments can be labeled I, II, III and IV so as to correspond with the four respective beats. It is noted that at this stage, the accented beat and the beginning of the measure are not yet known and the label I is essentially arbitrary.
Although it is possible to use only a single copy of each segment I, II, III and IV, it is preferred to use multiple copies of each segment so as to average out any noise. Accordingly, the energy E of all first segments I (of a certain time period or time frame) are concatenated, resulting in the energy signal E labeled I in the leftmost lower diagram of Fig. 1. It is to be understood that the lower diagram labeled I contains a succession of segments I of the top diagram. Similarly, the second segments II are concatenated so as to produce the succession of segments illustrated in the lower diagram labeled II, while the same action is repeated for the segments III and IV. As can be seen, the successions of segments I, II and III are very similar and a similarity measure (such as cross-correlation) would yield a high degree of similarity. The segments IV, however, have a different shape and are therefore less similar. It can therefore be concluded that the segments IV represent the accented beats (downbeats), as they are the most dissimilar. The (dis)similarity can be determined in various ways, for example by determining the cross-correlation of each succession I, II, III and IV with each of the other successions, the succession having the lowest aggregate cross-correlation with the other successions representing the accented beat. Additionally, or alternatively, the autocorrelation of each succession may be determined, the most dissimilar autocorrelation value indicating the accented beat. In other embodiments, the shape and/or amplitude of the successions may be involved using pattern matching techniques or distance measures. It will be understood that the particular technique of determining the (dis)similarity of the successions is not essential.
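The procedure walked through above for Fig. 1 can be sketched as follows (a hedged illustration, not the patent's implementation: the function name `find_accented_beat` and the use of NumPy are assumptions, and the cross-correlation is taken at zero lag only):

```python
import numpy as np

def find_accented_beat(energy, beat_onsets, beats_per_measure):
    """Group beat-period energy segments by their position in the
    measure, concatenate each group into a succession (I, II, ...),
    and return the position whose succession is the outlier."""
    # Segment the energy at the beat onsets (Fig. 1, top diagram).
    segments = [np.asarray(energy[a:b], dtype=float)
                for a, b in zip(beat_onsets[:-1], beat_onsets[1:])]
    n = min(len(s) for s in segments)          # common segment length
    # Concatenate the segments of each beat position (lower diagrams).
    successions = [np.concatenate([s[:n] for s in segments[i::beats_per_measure]])
                   for i in range(beats_per_measure)]
    m = min(len(s) for s in successions)
    successions = [s[:m] for s in successions]

    def ncc(a, b):                             # zero-lag normalized xcorr
        a, b = a - a.mean(), b - b.mean()
        d = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b) / d if d else 0.0

    # Aggregate similarity of each succession with all the others; the
    # smallest aggregate marks the accented beat (the outlier).
    scores = [sum(ncc(s, t) for j, t in enumerate(successions) if j != i)
              for i, s in enumerate(successions)]
    return int(np.argmin(scores))
```

With four beats per measure, a return value of 0 would correspond to succession I, 1 to succession II, and so on.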
A first embodiment of an accented beat detection device 1 according to the present invention is schematically illustrated in Fig. 2. The device 1 shown merely by way of non-limiting example in Fig. 2 is arranged for time domain similarity determination and comprises an energy calculation unit 12, a segmentation unit 15, a similarity determination unit 17 and a selecting unit 18. The energy calculation unit 12 receives a (digital) music signal x[n] and determines its energy (or any other suitable parameter), for example the signal energy E[n] (energy per sample n). This energy signal E[n] is fed to the segmentation unit 15, which acts as a demultiplexer (DMux). The segmentation unit receives tempo (beat and/or beat onset) information T and beats-per-measure information M and segments the energy E[n] accordingly (see also Fig. 1). The segmented energy is fed per segment number (I - IV in Fig. 1) to the similarity (SIM) determination unit 17. As a result, the similarity determination unit 17 receives the successions I - IV (Fig. 1) at each of its inputs.
The similarity determination unit 17 then determines the similarity between its input signals, essentially as indicated above. Similarity information relating to each of its inputs is produced at its respective outputs and fed to a selecting unit 18. This selecting unit 18 is, in the present example, arranged for outlier selection (OS) so as to determine which of its input signals is the most dissimilar, that is, is the outlier. Information identifying the outlier, and hence the corresponding segment (of the segments I - IV of Fig. 1) is output as accented beat information abi.
The embodiment of Fig. 3 is very similar to the embodiment of Fig. 2 but is arranged for operating in the frequency domain. The device 1 of Fig. 3 comprises a frame compilation (FC) unit 10 for compiling frames of the input time domain music signal x[n]. It will be understood that when the music signal x[n] is input in frame format, the frame compilation unit 10 may be dispensed with.
The music signal frames containing time domain signal data are fed to a transform unit 11 which in the present embodiment is arranged for carrying out a Fast Fourier Transform (FFT). It will be understood that other transforms, such as a Discrete Cosine Transform (DCT), may be used instead. The transform domain signal data produced by the transform unit 11 are fed to the energy calculation unit 12, which calculates the energy of each frame using the transform domain data. The resulting transform domain energy E[k], with k indicating the frequency bin number, is fed to the segmentation means 15 via an energy buffer (EB) 13. The embodiment shown also comprises a tempo estimator (TE) unit 14, which also receives the transform domain energy E[k] so as to derive the beat and optionally also the beat onsets.
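A minimal sketch of the frame compilation and transform-domain energy computation described above (assumed, not the patent's code; NumPy's real FFT stands in for the transform unit):

```python
import numpy as np

def frame_energies(x, frame_len, hop):
    """Split x into frames (frame compilation), FFT each frame, and
    return the spectral energy per frame as the sum of squared
    magnitudes of the transform coefficients."""
    x = np.asarray(x, dtype=float)
    energies = []
    for start in range(0, len(x) - frame_len + 1, hop):
        spectrum = np.fft.rfft(x[start:start + frame_len])
        energies.append(float(np.sum(np.abs(spectrum) ** 2)))
    return np.array(energies)
```

A louder frame yields a proportionally larger transform-domain energy, so the resulting sequence E[k] can be buffered and segmented exactly like the time-domain energy E[n].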
This tempo information T produced by the tempo estimator unit 14 is fed to the segmentation means 15, which also receive the beats-per-measure information M as in the embodiment of Fig. 2. The energy E[k] is then processed by the segmentation unit 15 essentially as in the embodiment of Fig. 2. In the embodiment shown in Fig. 3, a low-pass filter (LPF) 16 is arranged between the segmentation unit 15 and the similarity determination unit 17 so as to remove any undesired frequency components, such as noise components.
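The patent does not specify the low-pass filter 16; as one simple possibility, a moving-average filter applied to each energy segment before the similarity measurement could look like this (all names hypothetical):

```python
import numpy as np

def smooth_segments(segments, width=5):
    """Sketch of LPF 16: moving-average low-pass filter applied to each
    energy segment to suppress noise before similarity determination."""
    kernel = np.ones(width) / width
    return [np.convolve(seg, kernel, mode="same") for seg in segments]
```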
It is possible to use additional filters to process the music signal x[n] or its energy E[n] or E[k] per sub-band. In the case of the time domain energy E[n], the energy computation is preceded by a filter bank that splits the incoming signal into a number of sub-bands (m = 1 ... M). For each sub-band m, the energy function Em[n] is computed. The similarity is then determined for each sub-band independently. The selection of the accented beat is based upon the weighted sum of the similarity values of the sub-bands. Similarly, in the case of the transform domain energy E[k], the transform domain spectrum is first divided into a number of sub-bands. Then the (spectral) weighted energy is computed by taking the weighted sum of the transform domain coefficients (in the example shown: FFT coefficients) of the respective sub-bands.
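For the transform domain variant, the weighted sub-band energy described above can be sketched as follows; the band edges, the function name, and the uniform default weights are assumptions for illustration:

```python
import numpy as np

def subband_energies(spectrum, band_edges, weights=None):
    """Sketch of the spectral weighting: split a magnitude-squared
    spectrum E[k] into sub-bands and return the weighted energy of each.
    band_edges gives the bin index where each band starts/ends,
    e.g. [0, 8, 32, 128] yields three bands."""
    bands = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        coeffs = spectrum[lo:hi]
        w = np.ones_like(coeffs) if weights is None else weights[lo:hi]
        bands.append(float(np.sum(w * coeffs)))  # weighted band energy
    return bands
```

The per-band energies would then feed independent similarity measurements whose weighted sum drives the accented beat selection.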
The AutoDJ system 5 illustrated merely by way of non-limiting example in Fig. 4 comprises a song database (SDB) 51 coupled to a player device (PD) 50, a playlist generator (PG) 54 and an audio analyzer (AA) 52. The player device may be a home music (e.g. 5.1) set, an MP3 player, a computer sound card, or any other device capable of playing music, and is coupled to a loudspeaker 56. The playlist generator 54 selects songs from the song database 51 and compiles playlists in accordance with user preferences. The audio analyzer 52 comprises an accented beat determination device 1 according to the present invention and supplies audio analysis information, including the positions of the accented beats, to a feature database (FDB) 53. A playlist recorder (PLR) 55 uses information provided by both the playlist generator 54 and the feature database 53 to record a playlist, and feeds this playlist (or playlists) to the player device 50. Using the accented beat information, smooth transitions between various songs can be achieved.
The method of the present invention may advantageously be used for detecting bar boundaries, as a bar typically starts with an accented beat.
The present invention is based upon the insight that accented beats may be detected on the basis of their (dis)similarity with the unaccented beats. The present invention benefits from the further insight that an accented beat typically indicates the beginning of a measure.
It is noted that any terms used in this document should not be construed so as to limit the scope of the present invention. In particular, the words "comprise(s)" and "comprising" are not meant to exclude any elements not specifically stated. Single (circuit) elements may be substituted with multiple (circuit) elements or with their equivalents.
It will be understood by those skilled in the art that the present invention is not limited to the embodiments illustrated above and that many modifications and additions may be made without departing from the scope of the invention as defined in the appended claims.

Claims

1. A device (1) for determining accented beats in a music signal, the device comprising:
- energy determination means (12) for determining the energy (E[n]; E[k]) of the music signal (x[n]),
- segmentation means (15) for segmenting the energy on the basis of a tempo estimate (T),
- similarity determination means (17) for determining the similarity between the energy (E[n]; E[k]) of segments, and
- selecting means (18) for selecting the segment having the smallest similarity as the segment containing an accented beat.
2. The device according to claim 1, wherein the similarity determination means (17) are arranged for carrying out a cross-correlation, an autocorrelation, a distance measurement, an information measurement and/or a pattern match.
3. The device according to claim 1, wherein the segmentation means (15) are arranged for segmenting the energy on beat onset positions.
4. The device according to claim 1, wherein the segmentation means (15) are arranged for providing the segments in parallel.
5. The device according to claim 1, further comprising tempo estimation means (14) for estimating the tempo of the music signal (x[n]).
6. The device according to claim 1, wherein the energy determination means (12) are arranged for determining the time domain energy (E[n]).
7. The device according to claim 1, further comprising a transform means (11) for transforming the music signal x[n] to a transform domain, wherein the energy determination means (12) are arranged for determining the transform domain energy (E[k]), said transform domain preferably being the frequency domain.
8. The device according to claim 1, further comprising a frame compilation means (10) for compiling frames of the music signal, and/or an energy buffer means (13) for buffering the energy (E[n]; E[k]).
9. The device according to claim 1, further comprising a filter means (16) arranged between the segmentation means (15) and the similarity determination means (17) for filtering the energy segments prior to determining their similarity.
10. An AutoDJ system (5), comprising a device (1; 52) according to claim 1.
11. A method of determining accented beats in a music signal (x[n]), the method comprising the steps of:
- determining the energy (E[n]; E[k]) of the music signal (x[n]),
- segmenting the energy on the basis of a tempo estimate (T),
- determining the similarity between the energy (E[n]; E[k]) of segments, and
- selecting the segment having the smallest similarity as the segment containing an accented beat.
12. A method of detecting bar boundaries in a music signal (x[n]), the method comprising the steps of:
- determining the energy (E[n]; E[k]) of the music signal (x[n]),
- segmenting the energy on the basis of a tempo estimate (T),
- determining the similarity between the energy (E[n]; E[k]) of segments,
- selecting the segment having the smallest similarity as the segment containing an accented beat, and
- equating the bar boundary with the beat onset of the accented beat.
13. A computer program product for carrying out the method according to claim 11 and/or 12.
PCT/IB2006/054915 2005-12-22 2006-12-18 Audio structure analysis WO2007072394A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP05112778 2005-12-22
EP05112778.5 2005-12-22

Publications (2)

Publication Number Publication Date
WO2007072394A2 true WO2007072394A2 (en) 2007-06-28
WO2007072394A3 WO2007072394A3 (en) 2007-10-18

Family

ID=38137441

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/054915 WO2007072394A2 (en) 2005-12-22 2006-12-18 Audio structure analysis

Country Status (1)

Country Link
WO (1) WO2007072394A2 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5625235B2 (en) * 2008-11-21 2014-11-19 ソニー株式会社 Information processing apparatus, voice analysis method, and program
JP5463655B2 (en) * 2008-11-21 2014-04-09 ソニー株式会社 Information processing apparatus, voice analysis method, and program


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6316712B1 (en) * 1999-01-25 2001-11-13 Creative Technology Ltd. Method and apparatus for tempo and downbeat detection and alteration of rhythm in a musical segment
US6542869B1 (en) * 2000-05-11 2003-04-01 Fuji Xerox Co., Ltd. Method for automatic analysis of audio including music and speech
US20050211072A1 (en) * 2004-03-25 2005-09-29 Microsoft Corporation Beat analysis of musical signals

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GOTO M ET AL: "Real-time beat tracking for drumless audio signals: Chord change detection for musical decisions" SPEECH COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 27, no. 3-4, April 1999 (1999-04), pages 311-335, XP004163257 ISSN: 0167-6393 *
SCHEIRER ERIC D: "Tempo and beat analysis of acoustic musical signals" JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, AIP / ACOUSTICAL SOCIETY OF AMERICA, MELVILLE, NY, US, vol. 103, no. 1, January 1998 (1998-01), pages 588-601, XP012000051 ISSN: 0001-4966 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007036846A2 (en) * 2005-09-30 2007-04-05 Koninklijke Philips Electronics N.V. Method and apparatus for automatic structure analysis of music
WO2007036846A3 (en) * 2005-09-30 2007-11-29 Koninkl Philips Electronics Nv Method and apparatus for automatic structure analysis of music
DE102009031673A1 (en) * 2009-02-13 2010-08-26 Kajetan Dvoracek Method for determining clock speed of electrical signals for e.g. pulse frequency-oriented sports activity, involves supplementing maximum value if necessary, and determining clock speed of piece of music from period values
US9830896B2 (en) 2013-05-31 2017-11-28 Dolby Laboratories Licensing Corporation Audio processing method and audio processing apparatus, and training method
US20200357369A1 (en) * 2018-01-09 2020-11-12 Guangzhou Baiguoyuan Information Technology Co., Ltd. Music classification method and beat point detection method, storage device and computer device
US11715446B2 (en) 2018-01-09 2023-08-01 Bigo Technology Pte, Ltd. Music classification method and beat point detection method, storage device and computer device

Also Published As

Publication number Publication date
WO2007072394A3 (en) 2007-10-18

Similar Documents

Publication Publication Date Title
JP5362178B2 (en) Extracting and matching characteristic fingerprints from audio signals
JP4900960B2 (en) Apparatus and method for analyzing information signals
US8586847B2 (en) Musical fingerprinting based on onset intervals
US7085613B2 (en) System for monitoring audio content in a video broadcast
US7386357B2 (en) System and method for generating an audio thumbnail of an audio track
JP4949687B2 (en) Beat extraction apparatus and beat extraction method
US6604072B2 (en) Feature-based audio content identification
US7500176B2 (en) Method and apparatus for automatically creating a movie
US20020116195A1 (en) System for selling a product utilizing audio content identification
JP4650662B2 (en) Signal processing apparatus, signal processing method, program, and recording medium
EP1579419B1 (en) Audio signal analysing method and apparatus
GB2518663A (en) Audio analysis apparatus
MX2007002071A (en) Methods and apparatus for generating signatures.
WO2007072394A2 (en) Audio structure analysis
US20110067555A1 (en) Tempo detecting device and tempo detecting program
US8983082B2 (en) Detecting musical structures
Zhou et al. Music onset detection based on resonator time frequency image
EP2022041A1 (en) Selection of tonal components in an audio spectrum for harmonic and key analysis
US9767846B2 (en) Systems and methods for analyzing audio characteristics and generating a uniform soundtrack from multiple sources
EP1497935B1 (en) Feature-based audio content identification
JP5395399B2 (en) Mobile terminal, beat position estimating method and beat position estimating program
JP2005292207A (en) Method of music analysis
CN112687247A (en) Audio alignment method and device, electronic equipment and storage medium
EP2355104A1 (en) Apparatus and method for processing audio data
AU2002249371B2 (en) Method and apparatus for identifying electronic files

Legal Events

Date Code Title Description
NENP Non-entry into the national phase in:

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 06842576

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 06842576

Country of ref document: EP

Kind code of ref document: A2