WO2007072394A2 - Audio structure analysis - Google Patents
- Publication number
- WO2007072394A2 (PCT/IB2006/054915)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- energy
- similarity
- determining
- music signal
- beat
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
- G10H1/40—Rhythm
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/076—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of timing, tempo; Beat detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/021—Indicator, i.e. non-screen output user interfacing, e.g. visual or tactile instrument status or guidance information using lights, LEDs, seven segments displays
- G10H2220/081—Beat indicator, e.g. marks or flashing LEDs to indicate tempo or beat positions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/131—Mathematical functions for musical analysis, processing, synthesis or composition
- G10H2250/135—Autocorrelation
Abstract
A device (1) for determining accented beats in a music signal (x[n]) comprises: energy determination means (12) for determining the energy (E[n]; E[k]) of the music signal (x[n]), segmentation means (15) for segmenting the energy on the basis of a tempo estimate (T), similarity determination means (17) for determining the similarity between the energy (E[n]; E[k]) of segments, and selecting means (18) for selecting the segment having the smallest similarity as the segment containing an accented beat. The tempo estimate (T) may be determined by external means. The energy may be determined in the time domain or in a transform domain. The similarity may be determined using cross-correlation, entropy or a distance measure. The device (1) may advantageously be used in AutoDJ apparatus.
Description
Audio structure analysis
The present invention relates to audio structure analysis. More in particular, the present invention relates to a device for and method of determining accented beats in a music signal.
It is well known to analyze the structure of audio signals, in particular music signals, both by hand and automatically. In order to compare music pieces or sound tracks, several characteristics of the music may be determined, such as the meter of the music, including the beat and the bar boundaries. When automatically processing music, for example in AutoDJ (Automatic Disc Jockey) applications, it is necessary to match the meter of successive music pieces. When mixing songs, it is highly desirable to synchronize the beat of the songs, in particular the accented beats (downbeats). Although many different methods of beat detection are known, very few Prior Art documents deal with detecting the accented beat. United States Patent US 6 542 869 (Foote) discloses a method of determining points of change in an audio signal by measuring the self-similarity of components of the audio signal. The self-similarity as well as cross-similarity between each of a set of signal parameterization values is determined for all past and future time window regions. A significant point of change will have a high self-similarity in the past and future, and a low cross-similarity. This known method may be used for beat tracking, including finding the tempo and location of downbeats in music.
This known method has several disadvantages. A self-similarity matrix is very complex and its compilation is computationally very demanding, while requiring a large amount of memory. This makes the known method less suitable for consumer devices, which typically have relatively little computational power and a limited amount of memory. In addition, the known method suffers from a high degree of ambiguity, as the nature of the detected points of change has to be derived from their frequency of occurrence, which may only be determined accurately if a sufficiently high resolution is used. It has been found that this method is less suitable for accurately determining accented beats in music.
It is an object of the present invention to overcome these and other problems of the Prior Art and to provide a device for and method of determining accented beats in a music signal which are simple yet provide a sufficient accuracy.
Accordingly, the present invention provides a device for determining accented beats in a music signal, the device comprising: energy determination means for determining the energy of the music signal, segmentation means for segmenting the energy on the basis of a tempo estimate, similarity determination means for determining the similarity between the energy of segments, and selecting means for selecting the segment having the smallest similarity as the segment containing an accented beat. By determining the similarity between the energy of signal segments, a very simple yet effective way of detecting accented beats is obtained, as the accented beat will be dissimilar from the other beats. The present invention uses a one-dimensional approach (comparing consecutive signal segments), which is computationally far less demanding and requires far less memory than the two-dimensional approach of the Foote patent mentioned above.
A tempo estimate is used to aid the segmentation of the calculated energy. This tempo estimate may be produced using any known method and may also involve detecting beat onsets, although this is not essential. It is preferred that the segmentation substantially corresponds with the beat onsets (the beginning of each beat), but this is not essential.
It is noted that instead of the energy of the music signal, any other equivalent property may be determined, such as its magnitude.
The similarity determination means may be arranged for carrying out a cross-correlation, an autocorrelation, a distance measurement, an information measurement and/or a pattern match. A cross-correlation is preferred, but other (dis)similarity measures may also be used.
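The preferred cross-correlation measure can be sketched as a zero-lag normalized cross-correlation between two energy successions. This is a minimal illustration, not the patent's own implementation; the function name and the mean-removal normalization are assumptions:

```python
import numpy as np

def xcorr_similarity(a, b):
    """Zero-lag normalized cross-correlation between two energy successions.

    Returns a value in [-1, 1]; 1 means the successions have identical shape.
    (Illustrative helper; the patent leaves the exact measure open.)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Remove the mean so the measure compares shape rather than level.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # a constant succession carries no shape information
    return float(np.dot(a, b) / denom)
```

Any of the other listed measures (a distance, an information measure, a pattern match) could be substituted here, as long as smaller similarity marks the outlier.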
As mentioned above, the segmentation means are preferably arranged for segmenting the energy on beat onset positions. Additionally, or alternatively, the segmentation means are preferably arranged for providing the segments in parallel so as to allow a simple, essentially one-dimensional comparison.
The device of the present invention may further comprise tempo estimation means for estimating the tempo of the music signal. However, such tempo estimation means, which may also determine beat onsets, may also be external to the device.
The energy determination means may be arranged for determining the time domain energy. However, in a preferred embodiment the device of the present invention further comprises a transform means for transforming the music signal to a transform domain, while the energy determination means are arranged for determining the transform domain energy, said transform domain preferably being the frequency domain. Accordingly, the transform means are preferably arranged for performing a Fast Fourier Transform (FFT).
The device of the present invention may further comprise a frame compilation means for compiling frames of the music signal, and/or an energy buffer means for buffering the (time and/or transform domain) energy. The device of the present invention may advantageously further comprise a filter means arranged between the segmentation means and the similarity determination means for filtering the energy segments prior to determining their similarity. The filter means serve to reduce any influence of transients and improve the reliability of the accented beat estimates. A music system, such as an AutoDJ system, according to the present invention comprises an accented beat determination device as defined above.
The present invention also provides a method of determining accented beats in a music signal, the method comprising the steps of: determining the energy of the music signal, segmenting the energy on the basis of a tempo estimate, determining the similarity between the energy of segments, and selecting the segment having the smallest similarity as the segment containing an accented beat.
The method of the present invention may advantageously be used for detecting bar boundaries, as a bar typically starts with an accented beat. Accordingly, the present invention also provides a method of detecting bar boundaries in a music signal, the method comprising the steps of: determining the energy of the music signal, segmenting the energy on the basis of a tempo estimate, determining the similarity between the energy of segments, selecting the segment having the smallest similarity as the segment containing an accented beat, and equating the bar boundary with the beat onset of the accented beat. Further advantageous embodiments of the inventive device and methods will become apparent from the description below.
The present invention additionally provides a computer program product for carrying out the method as defined above. A computer program product may comprise a set of computer executable instructions stored on a data carrier, such as a CD or a DVD. The set of computer executable instructions, which allow a programmable computer to carry out the method as defined above, may also be available for downloading from a remote server, for example via the Internet.
The present invention will further be explained below with reference to exemplary embodiments illustrated in the accompanying drawings, in which:
Fig. 1 schematically shows the energy of signal segments as processed according to the present invention,
Fig. 2 schematically shows a first embodiment of an accented beat detection device according to the present invention,
Fig. 3 schematically shows a second embodiment of an accented beat detection device according to the present invention,
Fig. 4 schematically shows an AutoDJ system in which the invention may advantageously be utilized.
The energy of a music signal as a function of time is schematically illustrated in Fig. 1. The energy E illustrated in Fig. 1 may be determined by an accented beat detection device of the present invention, which will be discussed later with reference to Figs. 2 and 3. The top diagram of Fig. 1 shows the energy E of a music signal as a function of time (sample number n) or frequency (frequency bin k). In the following discussion, it will be assumed that the energy E is a function of time and that the music has four beats per measure, although the invention is not so limited.
The music signal is segmented into segments or beat periods BP. In the example shown, the segment boundaries are at the peaks of the energy signal E. Assuming (or knowing) that the music signal has four beats per measure, the segments can be labeled I, II, III and IV so as to correspond with the four respective beats. It is noted that at this stage, the accented beat and the beginning of the measure are not yet known and the label I is essentially arbitrary.
Although it is possible to use only a single copy of each segment I, II, III and IV, it is preferred to use multiple copies of each segment so as to average out any noise. Accordingly, the energy E of all first segments I (of a certain time period or time frame) are concatenated, resulting in the energy signal E labeled I in the leftmost lower diagram of Fig. 1. It is to be understood that the lower diagram labeled I contains a succession of segments I of the top diagram. Similarly, the second segments II are concatenated so as to produce the succession of segments illustrated in the lower diagram labeled II, while the same action is repeated for the segments III and IV. As can be seen, the successions of segments I, II and III are very similar and a similarity measure (such as cross-correlation) would yield a high degree of similarity. The segments IV, however, have a different shape and are therefore less similar. It can therefore be concluded that the segments IV represent the accented beats (downbeats), as they are the most dissimilar. The (dis)similarity can be determined in various ways, for example by determining the cross-correlation of each succession I, II, III and IV with each of the other successions, the succession having the lowest aggregate cross-correlation with the other successions representing the accented beat. Additionally, or alternatively, the autocorrelation of each succession may be determined, the most dissimilar autocorrelation value indicating the accented beat. In other embodiments, the shape and/or amplitude of the successions may be involved using pattern matching techniques or distance measures. It will be understood that the particular technique of determining the (dis)similarity of the successions is not essential.
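The procedure of Fig. 1 — segment at beat-period boundaries, concatenate every M-th segment into one succession per beat position, cross-correlate the successions pairwise, and pick the outlier — can be sketched as follows. This is an assumed reconstruction under simplifying assumptions (a fixed beat period in samples, segmentation starting at sample 0); the function name is hypothetical:

```python
import numpy as np

def find_accented_beat(energy, beat_period, beats_per_measure):
    """Return the beat position (0-based) whose energy succession is the
    outlier, i.e. the candidate accented beat. Illustrative sketch only."""
    energy = np.asarray(energy, dtype=float)
    n_beats = len(energy) // beat_period
    # Drop any trailing partial beat and split into per-beat segments.
    segments = energy[:n_beats * beat_period].reshape(n_beats, beat_period)
    # One succession per beat position within the measure (labels I..IV in Fig. 1).
    successions = [segments[i::beats_per_measure].ravel()
                   for i in range(beats_per_measure)]
    # Trim to equal length in case n_beats is not a multiple of M.
    shortest = min(len(s) for s in successions)
    successions = [s[:shortest] for s in successions]

    def sim(a, b):
        # Zero-lag normalized cross-correlation (shape similarity).
        a, b = a - a.mean(), b - b.mean()
        d = np.linalg.norm(a) * np.linalg.norm(b)
        return np.dot(a, b) / d if d else 0.0

    # Aggregate similarity of each succession with all the others;
    # the accented beat is the one with the lowest aggregate (the outlier).
    agg = [sum(sim(successions[i], successions[j])
               for j in range(beats_per_measure) if j != i)
           for i in range(beats_per_measure)]
    return int(np.argmin(agg))
```

With four triangular unaccented beats per measure and a differently shaped fourth beat, the function selects position 3, matching the segments IV of Fig. 1.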
A first embodiment of an accented beat detection device 1 according to the present invention is schematically illustrated in Fig. 2. The device 1 shown merely by way of non-limiting example in Fig. 2 is arranged for time domain similarity determination and comprises an energy calculation unit 12, a segmentation unit 15, a similarity determination unit 17 and a selecting unit 18.
The energy calculation unit 12 receives a (digital) music signal x[n] and determines its energy (or any other suitable parameter), for example the signal energy E[n] (energy per sample n). This energy signal E[n] is fed to the segmentation unit 15, which acts as a demultiplexer (DMux). The segmentation unit receives tempo (beat and/or beat onset) information T and beats-per-measure information M and segments the energy E[n] accordingly (see also Fig. 1). The segmented energy is fed per segment number (I - IV in Fig. 1) to the similarity (SIM) determination unit 17. As a result, the similarity determination unit 17 receives the successions I - IV (Fig. 1) at each of its inputs.
The similarity determination unit 17 then determines the similarity between its input signals, essentially as indicated above. Similarity information relating to each of its inputs is produced at its respective outputs and fed to a selecting unit 18. This selecting unit 18 is, in the present example, arranged for outlier selection (OS) so as to determine which of its input signals is the most dissimilar, that is, is the outlier. Information identifying the outlier, and hence the corresponding segment (of the segments I - IV of Fig. 1) is output as accented beat information abi.
The embodiment of Fig. 3 is very similar to the embodiment of Fig. 2 but is arranged for operating in the frequency domain. The device 1 of Fig. 3 comprises a frame compilation (FC) unit 10 for compiling frames of the input time domain music signal x[n]. It will be understood that when the music signal x[n] is input in frame format, the frame compilation unit 10 may be dispensed with.
The music signal frames containing time domain signal data are fed to a transform unit 11 which in the present embodiment is arranged for carrying out a Fast Fourier Transform (FFT). It will be understood that other transforms, such as a Discrete Cosine Transform (DCT), may be used instead. The transform domain signal data produced by the transform unit 11 are fed to the energy calculation unit 12, which calculates the energy of each frame using the transform domain data. The resulting transform domain energy E[k], with k indicating the frequency bin number, is fed to the segmentation means 15 via an energy buffer (EB) 13. The embodiment shown also comprises a tempo estimator (TE) unit 14, which also receives the transform domain energy E[k] so as to derive the beat and optionally also the beat onsets.
This tempo information T produced by the tempo estimator unit 14 is fed to the segmentation means 15, which also receive the beat-per-measure information M as in the embodiment of Fig. 2. The energy E[k] is then processed by the segmentation unit 15 essentially as in the embodiment of Fig. 2.
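The frame compilation, transform and energy computation of units 10, 11 and 12 could be sketched as follows (illustrative only; the frame length, hop size and the use of `numpy.fft.rfft` are choices made here, not prescribed by the application):

```python
import numpy as np

def frame_energy_fft(x, frame_len=256, hop=256):
    """Compile non-overlapping frames, apply an FFT per frame and
    return the energy per frequency bin, E[k], for every frame."""
    frames = np.asarray([x[i:i + frame_len]
                         for i in range(0, len(x) - frame_len + 1, hop)])
    spectra = np.fft.rfft(frames, axis=1)
    return np.abs(spectra) ** 2        # shape (num_frames, num_bins)

# A pure tone at 0.125 cycles/sample lands exactly in bin 32 of a
# 256-point frame (0.125 * 256 = 32).
x = np.sin(2 * np.pi * 0.125 * np.arange(1024))
E = frame_energy_fft(x)
k_peak = int(np.argmax(E[0]))          # -> 32
```

A DCT could be substituted for the FFT, as the description notes, by replacing the transform call.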
In the embodiment shown in Fig. 3, a low-pass filter (LPF) 16 is arranged between the segmentation unit 15 and the similarity determination unit 17 so as to remove any undesired frequency components, such as noise components.
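A moving-average filter is one simple realization of such a low-pass filter (an assumption for illustration; the application does not prescribe a particular filter design):

```python
import numpy as np

def smooth_energy(segment, width=5):
    """Moving-average low-pass filter applied to an energy segment
    before the similarity determination, suppressing noise components."""
    kernel = np.ones(width) / width
    return np.convolve(segment, kernel, mode="same")

noisy = np.ones(64) + 0.2 * np.random.default_rng(1).standard_normal(64)
smoothed = smooth_energy(noisy)
# The interior of the smoothed segment stays close to the underlying
# level while the sample-to-sample noise is attenuated.
```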
It is possible to use additional filters to process the music signal x[n] or its energy E[n] or E[k] per sub-band. In the case of time domain energy E[n], the energy computation is preceded by a filter bank that splits the incoming signal into a number of sub-bands (m = 1 ... M). For each sub-band m, the energy function Em[n] is computed. The similarity is then determined for each sub-band independently. The selection of the accented beat is based upon the weighted sum of the similarity values of the sub-bands. Similarly, in the case of transform domain energy E[k], the transform domain spectrum is first divided into a number of sub-bands. Then the (spectral) weighted energy is computed by taking the weighted sum of the transform domain coefficients (in the example shown: FFT coefficients) of the respective sub-bands.
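The per-sub-band similarity with a weighted sum, as described above, might be sketched like this (illustrative only; the particular weights and the zero-lag normalized correlation as the similarity value are choices made here):

```python
import numpy as np

def weighted_subband_dissimilarity(successions_per_band, weights):
    """Select the accented beat from per-sub-band energy successions.

    successions_per_band[m][b] is the energy succession of beat
    position b in sub-band m.  Similarity (zero-lag normalized
    correlation) is computed per band, summed with the given weights,
    and the beat position with the smallest total is returned."""
    n_pos = len(successions_per_band[0])
    total = np.zeros(n_pos)
    for w, band in zip(weights, successions_per_band):
        z = []
        for s in band:
            c = s - s.mean()
            z.append(c / (np.linalg.norm(c) + 1e-12))
        for i in range(n_pos):
            total[i] += w * sum(float(np.dot(z[i], z[j]))
                                for j in range(n_pos) if j != i)
    return int(np.argmin(total))

# Toy example with two sub-bands; beat position 2 is the outlier in both.
shape = np.exp(-np.linspace(0.0, 4.0, 16))
band = [shape, shape, shape[::-1], shape]
accented = weighted_subband_dissimilarity([band, band], [0.7, 0.3])  # -> 2
```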
The AutoDJ system 5 illustrated merely by way of non-limiting example in Fig. 4 comprises a song database (SDB) 51 coupled to a player device (PD) 50, a playlist generator (PG) 54 and an audio analyzer (AA) 52. The player device may be a home music (e.g. 5.1) set, an MP3 player, a computer sound card, or any other device capable of playing music, and is coupled to a loudspeaker 56. The playlist generator 54 selects songs from the song database 51 and compiles playlists in accordance with user preferences. The audio analyzer 52 comprises an accented beat determination device 1 according to the present invention and supplies audio analysis information, including the positions of the accented beats, to a feature database (FDB) 53. A playlist recorder (PLR) 55 uses information provided by both the playlist generator 54 and the feature database 53 to record a playlist, and feeds this playlist (or playlists) to the player device 50. Using the accented beat information, smooth transitions between various songs can be achieved.
The method of the present invention may advantageously be used for detecting bar boundaries, as a bar typically starts with an accented beat.
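Under this insight, bar boundaries follow directly from the accented-beat position (a minimal sketch; the function name and the assumption that the first listed onset falls on measure position 0 are illustrative):

```python
def bar_boundaries(beat_onsets, accented_position, beats_per_measure):
    """Return the beat onsets that start a bar: every beat whose
    position within the measure equals the accented-beat position."""
    return [onset for i, onset in enumerate(beat_onsets)
            if i % beats_per_measure == accented_position]

onsets = [0, 50, 100, 150, 200, 250, 300, 350]   # toy onsets in samples
bars = bar_boundaries(onsets, accented_position=0, beats_per_measure=4)
# -> [0, 200]: in 4/4 with the accented beat at position 0,
#    bars start at the first and fifth beat onsets.
```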
The present invention is based upon the insight that accented beats may be detected on the basis of their (dis)similarity with the unaccented beats. The present invention benefits from the further insight that an accented beat typically indicates the beginning of a measure.
It is noted that any terms used in this document should not be construed so as to limit the scope of the present invention. In particular, the words "comprise(s)" and "comprising" are not meant to exclude any elements not specifically stated. Single (circuit) elements may be substituted with multiple (circuit) elements or with their equivalents.
It will be understood by those skilled in the art that the present invention is not limited to the embodiments illustrated above and that many modifications and additions may be made without departing from the scope of the invention as defined in the appended claims.
Claims
1. A device (1) for determining accented beats in a music signal, the device comprising:
- energy determination means (12) for determining the energy (E[n]; E[k]) of the music signal (x[n]),
- segmentation means (15) for segmenting the energy on the basis of a tempo estimate (T),
- similarity determination means (17) for determining the similarity between the energy (E[n]; E[k]) of segments, and
- selecting means (18) for selecting the segment having the smallest similarity as the segment containing an accented beat.
2. The device according to claim 1, wherein the similarity determination means (17) are arranged for carrying out a cross-correlation, an autocorrelation, a distance measurement, an information measurement and/or a pattern match.
3. The device according to claim 1, wherein the segmentation means (15) are arranged for segmenting the energy on beat onset positions.
4. The device according to claim 1, wherein the segmentation means (15) are arranged for providing the segments in parallel.
5. The device according to claim 1, further comprising tempo estimation means (14) for estimating the tempo of the music signal (x[n]).
6. The device according to claim 1, wherein the energy determination means (12) are arranged for determining the time domain energy (E[n]).
7. The device according to claim 1, further comprising a transform means (11) for transforming the music signal x[n] to a transform domain, wherein the energy determination means (12) are arranged for determining the transform domain energy (E[k]), said transform domain preferably being the frequency domain.
8. The device according to claim 1, further comprising a frame compilation means (10) for compiling frames of the music signal, and/or an energy buffer means (13) for buffering the energy (E[n]; E[k]).
9. The device according to claim 1, further comprising a filter means (16) arranged between the segmentation means (15) and the similarity determination means (17) for filtering the energy segments prior to determining their similarity.
10. An AutoDJ system (5), comprising a device (1; 52) according to claim 1.
11. A method of determining accented beats in a music signal (x[n]), the method comprising the steps of:
- determining the energy (E[n]; E[k]) of the music signal (x[n]),
- segmenting the energy on the basis of a tempo estimate (T),
- determining the similarity between the energy (E[n]; E[k]) of segments, and
- selecting the segment having the smallest similarity as the segment containing an accented beat.
12. A method of detecting bar boundaries in a music signal (x[n]), the method comprising the steps of:
- determining the energy (E[n]; E[k]) of the music signal (x[n]),
- segmenting the energy on the basis of a tempo estimate (T),
- determining the similarity between the energy (E[n]; E[k]) of segments,
- selecting the segment having the smallest similarity as the segment containing an accented beat, and
- equating the bar boundary with the beat onset of the accented beat.
13. A computer program product for carrying out the method according to claim 11 and/or 12.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05112778 | 2005-12-22 | ||
EP05112778.5 | 2005-12-22 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2007072394A2 (en) | 2007-06-28 |
WO2007072394A3 (en) | 2007-10-18 |
Family
ID=38137441
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2006/054915 WO2007072394A2 (en) | 2005-12-22 | 2006-12-18 | Audio structure analysis |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2007072394A2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5625235B2 (en) * | 2008-11-21 | 2014-11-19 | ソニー株式会社 | Information processing apparatus, voice analysis method, and program |
JP5463655B2 (en) * | 2008-11-21 | 2014-04-09 | ソニー株式会社 | Information processing apparatus, voice analysis method, and program |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6316712B1 (en) * | 1999-01-25 | 2001-11-13 | Creative Technology Ltd. | Method and apparatus for tempo and downbeat detection and alteration of rhythm in a musical segment |
US6542869B1 (en) * | 2000-05-11 | 2003-04-01 | Fuji Xerox Co., Ltd. | Method for automatic analysis of audio including music and speech |
US20050211072A1 (en) * | 2004-03-25 | 2005-09-29 | Microsoft Corporation | Beat analysis of musical signals |
Non-Patent Citations (2)
Title |
---|
GOTO M ET AL: "Real-time beat tracking for drumless audio signals: Chord change detection for musical decisions" SPEECH COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 27, no. 3-4, April 1999 (1999-04), pages 311-335, XP004163257 ISSN: 0167-6393 * |
SCHEIRER ERIC D: "Tempo and beat analysis of acoustic musical signals" JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, AIP / ACOUSTICAL SOCIETY OF AMERICA, MELVILLE, NY, US, vol. 103, no. 1, January 1998 (1998-01), pages 588-601, XP012000051 ISSN: 0001-4966 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007036846A2 (en) * | 2005-09-30 | 2007-04-05 | Koninklijke Philips Electronics N.V. | Method and apparatus for automatic structure analysis of music |
WO2007036846A3 (en) * | 2005-09-30 | 2007-11-29 | Koninkl Philips Electronics Nv | Method and apparatus for automatic structure analysis of music |
DE102009031673A1 (en) * | 2009-02-13 | 2010-08-26 | Kajetan Dvoracek | Method for determining clock speed of electrical signals for e.g. pulse frequency-oriented sports activity, involves supplementing maximum value if necessary, and determining clock speed of piece of music from period values |
US9830896B2 (en) | 2013-05-31 | 2017-11-28 | Dolby Laboratories Licensing Corporation | Audio processing method and audio processing apparatus, and training method |
US20200357369A1 (en) * | 2018-01-09 | 2020-11-12 | Guangzhou Baiguoyuan Information Technology Co., Ltd. | Music classification method and beat point detection method, storage device and computer device |
US11715446B2 (en) | 2018-01-09 | 2023-08-01 | Bigo Technology Pte, Ltd. | Music classification method and beat point detection method, storage device and computer device |
Legal Events
Date | Code | Title | Description
---|---|---|---
| NENP | Non-entry into the national phase | Ref country code: DE
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 06842576; Country of ref document: EP; Kind code of ref document: A2
| 122 | EP: PCT application non-entry in European phase | Ref document number: 06842576; Country of ref document: EP; Kind code of ref document: A2