EP1671315B1 - Procede et dispositif pour caracteriser un signal audio - Google Patents
Procede et dispositif pour caracteriser un signal audio
- Publication number
- EP1671315B1 EP1671315B1 EP05735854A EP05735854A EP1671315B1 EP 1671315 B1 EP1671315 B1 EP 1671315B1 EP 05735854 A EP05735854 A EP 05735854A EP 05735854 A EP05735854 A EP 05735854A EP 1671315 B1 EP1671315 B1 EP 1671315B1
- Authority
- EP
- European Patent Office
- Prior art keywords
- sequence
- implemented
- tone
- sub
- order
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
- 230000005236 sound signal Effects 0.000 title claims abstract description 40
- 238000000034 method Methods 0.000 title claims description 25
- 230000008569 process Effects 0.000 title description 4
- 230000001020 rhythmical effect Effects 0.000 claims description 12
- 238000013139 quantization Methods 0.000 claims description 10
- 238000004590 computer program Methods 0.000 claims description 7
- 238000003909 pattern recognition Methods 0.000 claims 1
- 230000033764 rhythmic process Effects 0.000 abstract description 11
- 238000011002 quantification Methods 0.000 abstract 1
- 238000004458 analytical method Methods 0.000 description 13
- 238000000605 extraction Methods 0.000 description 10
- 239000011159 matrix material Substances 0.000 description 9
- 238000012880 independent component analysis Methods 0.000 description 7
- 239000013598 vector Substances 0.000 description 7
- 238000012545 processing Methods 0.000 description 6
- 238000011524 similarity measure Methods 0.000 description 6
- 238000009527 percussion Methods 0.000 description 5
- 238000013459 approach Methods 0.000 description 4
- 238000001514 detection method Methods 0.000 description 4
- 230000000737 periodic effect Effects 0.000 description 3
- 239000011435 rock Substances 0.000 description 3
- 238000000926 separation method Methods 0.000 description 3
- 230000002459 sustained effect Effects 0.000 description 3
- 238000005311 autocorrelation function Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 239000012634 fragment Substances 0.000 description 2
- 238000003780 insertion Methods 0.000 description 2
- 230000037431 insertion Effects 0.000 description 2
- 238000011835 investigation Methods 0.000 description 2
- 230000001537 neural effect Effects 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000015572 biosynthetic process Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000006073 displacement reaction Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000002349 favourable effect Effects 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 229910001385 heavy metal Inorganic materials 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 230000000306 recurrent effect Effects 0.000 description 1
- 238000010845 search algorithm Methods 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 238000013518 transcription Methods 0.000 description 1
- 230000035897 transcription Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
- 230000017105 transposition Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/071—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for rhythm pattern analysis or rhythm style recognition
Definitions
- the present invention relates to the analysis of audio signals, and more particularly to the analysis of audio signals for purposes of classifying and identifying audio signals to characterize the audio signals.
- the aim is also to "enrich" audio data with metadata, for example in order to retrieve a piece of music on the basis of fingerprint metadata.
- the "fingerprint” should on the one hand be meaningful, and on the other hand be as short and concise as possible. "Fingerprint” thus refers to a compressed generated from a music signal Information signal, which does not contain the metadata, but for referencing to the metadata eg by searching in a database is used, for example in a system for the identification of audio material ("AudioID").
- music data consists of the superposition of sub-signals from single sources. While pop music typically has relatively few individual sources, namely the singer, the guitar, the bass guitar, the drums, and a keyboard, the number of sources for an orchestral piece can become very large.
- An orchestral piece and a pop music piece for example, consist of a superposition of the tones emitted by the individual instruments.
- An orchestral piece or piece of music thus represents a superimposition of partial signals from individual sources, the partial signals being the sounds produced by the individual instruments of the orchestra or pop music ensemble, and the individual instruments being individual sources.
- groups of original sources can also be considered as individual sources, so that at least two individual sources can be assigned to one signal.
- An analysis of a general information signal is shown below by way of example only with reference to an orchestra signal.
- the analysis of an orchestra signal can be done in many ways.
- Other possibilities of analysis consist in extracting a dominant rhythm, where rhythm extraction on the basis of the percussion instruments works better than on the basis of the tonal instruments, which are also referred to as harmonically sustained or "harmonic sustained" instruments.
- While percussion instruments typically include timpani, drums, rattles or other percussion instruments, the harmonically sustained instruments include all other instruments such as violins, wind instruments, etc.
- the percussion instruments include all those acoustic or synthetic tone generators that contribute to the rhythm section due to their sound characteristics (e.g., rhythm guitar).
- for the rhythm extraction of a piece of music, it would be desirable to extract only the percussive parts from the entire piece of music and then perform rhythm recognition on the basis of these percussive parts, without the rhythm recognition being "disturbed" by signals from the harmonically sustained instruments.
- In contrast to the rhythmic structure, melodic fragments usually do not appear periodically in the customary structure of occidental music. For this reason, many methods of searching for melodic fragments are limited to finding their individual occurrences. In the field of rhythmic analysis, by contrast, the interest lies primarily in finding periodic structures.
- Methods for identifying melodic themes are only of limited suitability for identifying periodicities present in a sound signal since, as stated, musical themes are recurrent but do not so much reflect a basic periodicity of a piece of music; at most they carry higher-level periodicity information.
- methods for the identification of melodic themes are very complex, since in the search for melodic themes the different variations of the themes must be considered. It is known from the music world that themes are usually varied, for example by transposition, mirroring, etc.
- the object of the present invention is to provide an efficient and reliable concept for characterizing a sound signal.
- This object is achieved by a device for characterizing a sound signal according to claim 1, a method for characterizing a sound signal according to claim 20 or a computer program according to claim 21.
- the present invention is based on the finding that a characteristic of a sound signal which can be calculated efficiently and is informative in many respects can be determined from a sequence of onset times by determining a period length, dividing the sequence into subsequences and combining these into a combined subsequence which serves as the characteristic.
- each sequence of onset times is subdivided into respective subsequences, a length of a subsequence being equal to the common period length.
- the characteristic extraction then takes place on the basis of a combination of the subsequences for the first sound source into a first combined subsequence and of the subsequences for the second sound source into a second combined subsequence, the combined subsequences serving as the characteristic of the sound signal and being usable for further processing, such as for extracting semantically meaningful information about the entire piece of music, e.g. genre, tempo, time signature or similarity to other pieces of music.
- the combined subsequence for the first sound source and the combined subsequence for the second sound source thus form a drum pattern of the sound signal if the two sound sources whose sequences of onset times have been taken into account are percussive sound sources, such as drums or other percussive instruments, which are characterized by the fact that not their pitch but their characteristic spectrum and the attack and decay of the emitted sound are of higher musical importance.
- the procedure according to the invention thus serves for the automatic extraction of preferably drum patterns from a preferably transcribed, i.e. note-based, representation of a music signal. This representation may be in MIDI format or may be determined automatically from an audio signal using digital signal processing techniques.
- ICA Independent Component Analysis
- BSS Blind Source Separation
- recognition of the note onsets, i.e. start times, for each different instrument, and of the pitch for tonal instruments, is first performed.
- a reading out of a score can take place, this reading consisting in reading in a MIDI file, in scanning and image-processing a musical notation, or in accepting manually typed notes.
- a raster is determined according to which the onset times are quantized, after which the onset times are then quantized to this raster.
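By way of illustration only, the quantization onto such a raster could look like the following sketch; the grid spacing, the offset handling and the example onset times are assumptions and not taken from the patent.

```python
# Minimal sketch: snap onset times (in seconds) to a uniform quantization raster.
# The raster spacing and the example onsets are illustrative assumptions.

def quantize_onsets(onsets, grid_spacing, grid_offset=0.0):
    """Return the grid index and the quantized time for each onset."""
    quantized = []
    for t in onsets:
        index = round((t - grid_offset) / grid_spacing)
        quantized.append((index, grid_offset + index * grid_spacing))
    return quantized

onsets = [0.02, 0.26, 0.49, 0.76, 1.01]            # detected onset times in seconds
print(quantize_onsets(onsets, grid_spacing=0.25))  # -> indices 0..4 on a 250 ms raster
```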
- the length of the drum pattern is then determined as the length of a musical measure, as an integral multiple of the length of a musical measure, or as an integral multiple of the length of a musical beat.
- the pattern histogram can be processed as such.
- the pattern histogram is also a compressed representation of the musical events, i.e. the score, and contains information on the degree of variation and on preferred beats, a flat histogram indicating a large variation, while a very "mountainous" histogram indicates a highly stationary signal in the sense of self-similarity.
- it is preferred to first perform a preprocessing in order to subdivide a signal into characteristic, mutually similar regions and to determine one drum pattern for mutually similar regions in the signal and another drum pattern for other characteristic regions in the signal.
- the present invention is advantageous in that a robust and efficient way of calculating a characteristic of a sound signal is obtained, in particular due to the subdivision carried out according to the period length, which subdivision is very robust and equally feasible for all signals, the period length also being determinable by statistical methods.
- the concept according to the invention is scalable in that its meaningfulness and accuracy can readily be increased, at the price of a higher computing time, by including the sequences of onset times of more and more different sound sources, i.e. instruments, in the determination of the common period length and in the determination of the drum pattern, so that the calculation of the combined subsequences becomes correspondingly more complex.
- an alternative form of scalability consists in calculating a certain number of combined subsequences for a certain number of sound sources and then, depending on the further processing interest, post-processing the resulting combined subsequences and thus reducing their explanatory power as needed. Histogram entries below a certain threshold may, for example, be ignored. Histogram entries can also be quantized, or simply binarized on the basis of the threshold decision, so that the histogram merely states whether or not an entry is present in the combined subsequence at a specific point in time.
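The thresholding and binarizing options mentioned above could be sketched as follows; the histogram values and the threshold are illustrative assumptions.

```python
# Sketch: reduce a pattern histogram by thresholding or binarizing its entries.
# The histogram values and the threshold of 3 are illustrative assumptions.

def threshold_histogram(histogram, threshold):
    """Zero out all bins whose count is below the threshold."""
    return [count if count >= threshold else 0 for count in histogram]

def binarize_histogram(histogram, threshold):
    """Keep only the statement 'entry present or not' per bin."""
    return [1 if count >= threshold else 0 for count in histogram]

pattern_histogram = [5, 0, 1, 4, 0, 2, 3, 0]
print(threshold_histogram(pattern_histogram, 3))  # [5, 0, 0, 4, 0, 0, 3, 0]
print(binarize_histogram(pattern_histogram, 3))   # [1, 0, 0, 1, 0, 0, 1, 0]
```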
- the concept according to the invention is a robust method due to the fact that many subsequences are "merged" into a combined subsequence, but nevertheless can be executed efficiently since no numerically intensive processing steps are required.
- percussive instruments without pitch which are also called drums in the following, play an essential role, especially in popular music.
- a lot of information about rhythm and musical genre is contained in the "notes" played by the drums, which could, for example, be used in an intelligent and intuitive search in music archives in order to perform classifications or at least pre-classifications.
- the notes played by drums often form recurring patterns, also known as drum patterns.
- a drum pattern can serve as a compressed representation of the played notes, by extracting a score of the length of one drum pattern from a longer note image. Semantically meaningful information about the entire piece of music, such as genre, tempo, time signature or similarity to other pieces of music, can then be extracted from this drum pattern.
- FIG. 1 shows an inventive device for characterizing a sound signal.
- the device of FIG. 1 includes a means 10 for providing, over time, a sequence of onset times for each sound source of at least two sound sources.
- the onset times are preferably already quantized onset times which are present on a quantization grid.
- FIG. 2 shows a sequence of onset times of notes from different sound sources, i.e. instruments 1, 2, ..., n, the onset times being designated by "x" in FIG. 2.
- FIG. 3 shows a quantized sequence of onset times, arranged on a raster, for each sound source, i.e. for each instrument 1, 2, ..., n.
- FIG. 3 simultaneously represents a matrix or list of onset times, a column in FIG. 3 corresponding to a distance between two grid points or grid lines and thus representing a time interval in which, depending on the sequence of onset times, a note onset is or is not present.
- there is a note onset from instrument 1, and this also applies to instrument 2, as indicated by the "x" in the two lines associated with instruments 1 and 2 in FIG. 3.
- instrument n, however, has no note onset in the time interval designated by the reference numeral 30.
- the plurality of sequences of preferably quantized onset times are supplied by the means 10 to a means 12 for determining a common period length.
- the means 12 for determining a common period length is designed not to determine a separate period length for each sequence of onset times, but to find a common period length which most probably underlies the at least two sound sources. This is based on the observation that even if, for example, several percussion instruments play in one piece, they all play more or less the same rhythm, so that a common period length must exist to which virtually all instruments contributing to the audio signal, i.e. all sound sources, adhere.
- the common period length is then supplied to a means 14 for dividing each sequence of onset times in order to obtain, on the output side, a set of subsequences for each sound source.
- a common period length 40 has been found for all of the instruments 1, 2, ..., n, and the means 14 for dividing is arranged to divide all sequences of onset times into subsequences of the length of the common period length 40.
- the sequence of onset times for instrument 1 would then, as shown in FIG. 4, be divided into a first subsequence 41, a subsequent second subsequence 42 and a further subsequent subsequence 43, so that in the example of FIG. 4 three subsequences are obtained for the sequence of instrument 1.
- the other sequences for the instruments 2, ..., n are likewise divided into corresponding contiguous subsequences, as illustrated for the sequence of onset times of instrument 1.
- the sets of subsequences for the sound sources are then supplied to a means 16 for combining for each sound source to obtain a combined subsequence for the first sound source and a combined subsequence for the second sound source as a characteristic for the sound signal.
- the summary preferably takes place in the form of a pattern histogram.
- the subsequences for the first instrument are superimposed aligned with each other such that the first interval of each subsequence is effectively "above" the first interval of each other subsequence.
- the entries in each slot of a combined subsequence, i.e. in each histogram bin of the pattern histogram, are counted.
- in the example shown in FIG. 5, the combined subsequence for the first sound source would thus be the first line 50 of the pattern histogram.
- the combined subsequence for the second sound source would be the second line 52 of the pattern histogram, etc.
- the pattern histogram in Fig. 5 thus represents the characteristic for the sound signal, which can then be used for various other purposes.
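The division into subsequences (FIG. 4) and their combination into a pattern histogram (FIG. 5) can be illustrated with a small sketch; the quantized onset rows and the period length of four slots are assumptions chosen only for the example.

```python
# Sketch of the core steps: divide each instrument's quantized onset sequence into
# subsequences of the common period length and sum them column-wise into a
# pattern histogram (one row per instrument). Example data is an assumption.

def pattern_histogram(onset_rows, period_length):
    histogram = []
    for row in onset_rows:
        usable = (len(row) // period_length) * period_length   # drop an incomplete tail
        bins = [0] * period_length
        for start in range(0, usable, period_length):          # one subsequence per pattern
            for slot in range(period_length):
                bins[slot] += row[start + slot]
        histogram.append(bins)
    return histogram

# Two instruments, 12 grid slots, common period length of 4 slots (three subsequences each).
instrument_1 = [1, 0, 1, 0,  1, 0, 1, 0,  1, 0, 0, 0]
instrument_2 = [0, 0, 1, 0,  0, 1, 1, 0,  0, 0, 1, 0]
print(pattern_histogram([instrument_1, instrument_2], period_length=4))
# -> [[3, 0, 2, 0], [0, 1, 3, 0]]
```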
- the finding of the pattern length can be realized in various ways, for example by an a-priori criterion which immediately provides an estimate of the periodicity / pattern length from the available note information, or alternatively by a preferably iterative search algorithm which assumes a number of hypotheses for the pattern length and checks their plausibility on the basis of the resulting results. This may, for example, again be done by evaluating a pattern histogram, as is also preferably implemented by the merge means 16, or by using other self-similarity measures.
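One simple way such a hypothesis test could be realized is sketched below: each candidate period length is scored with a crude self-similarity measure and the best-scoring candidate is kept. The scoring rule and the example data are assumptions, not the patent's actual criterion.

```python
# Sketch: test several hypotheses for the pattern length and keep the most plausible one,
# using a simple self-similarity score (fraction of onsets that recur one period later).
# The scoring rule and example data are illustrative assumptions.

def period_score(onsets, period):
    """Fraction of onset slots that are also onsets one period later."""
    matches = sum(1 for i in range(len(onsets) - period) if onsets[i] and onsets[i + period])
    total = sum(onsets[:len(onsets) - period]) or 1
    return matches / total

def estimate_period(onsets, candidates):
    return max(candidates, key=lambda p: period_score(onsets, p))

onsets = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(estimate_period(onsets, candidates=[3, 4, 5, 8]))  # -> 4 (4 and 8 both score 1.0; max keeps the first)
```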
- the pattern histogram may be generated by the merge means 16.
- the pattern histogram may also consider the intensities of the individual notes in order to weight the notes according to their relevance.
- the histogram may only contain information as to whether there is a sound in a subsequence or in a bin or time slot of a subsequence, or not. In this case, a weighting of the individual notes would not be included in the histogram in terms of their relevance.
- the characteristic shown in FIG. 5, which here is preferably a pattern histogram, is further processed.
- note selection can be made on the basis of a criterion, for example by comparing the frequency or the combined intensity values with a threshold value. This threshold may also depend on the type of instrument or the flatness of the histogram, among other things.
- the drum pattern entries can be Boolean values, a "1" standing for the fact that a note occurs, while a "0" stands for the fact that no note occurs.
- an entry in the histogram can also be a measure of how high the intensity (loudness) or relevance of the note occurring in this time slot is considered to be over the music signal.
- the threshold was chosen so as to mark with an "x" all time slots or bins in the pattern histogram, for each instrument, in which the number of entries is greater than or equal to 3.
- all bins in which the number of entries is less than 3, i.e. amounts to 2 or 1, are deleted.
- a musical "result" or score is generated from percussive instruments that are not or not significantly characterized by a pitch.
- a musical event is defined as the occurrence of a sound of a musical instrument.
- Events are detected in the audio signal and classified into instrument classes, with the timing of events being quantized on a quantization grid, also referred to as a tatum grid.
- the musical measure, i.e. the length of one bar, is calculated in milliseconds or as a number of quantization intervals, and furthermore bars are preferably also identified. The identification of rhythmic structures, based on the frequency of occurrence of musical events at certain positions in the drum pattern, allows a robust identification of the tempo and provides valuable information for positioning the bar lines when musical background knowledge is also used.
- the musical score or characteristic preferably comprises the rhythmic information such as start time and duration.
- this metric information comprises, in particular, a time signature.
- an automatic transcription process can be divided into two tasks, namely the detection and classification of the musical events, ie notes, and the generation of a musical score from the detected notes, ie the drum pattern, as has already been explained above.
- the metric structure of the music is preferably estimated, whereby a quantization of the temporal positions of the detected notes as well as a detection of upbeats and a determination of the positions of the bar lines can be carried out.
- the detection and classification of the events is preferably performed by the method of independent subspace analysis.
- the means 10 for providing sequences of onset times for a plurality of sound sources performs the quantization.
- the detected events are preferably quantized in the Tatum grid.
- the tatum grid is estimated using the note onset times of the detected events together with onset times determined using conventional onset detection techniques.
- the creation of the tatum grid based on the detected percussive events works reliably and robustly. It should be noted that the distance between two grid points in a piece of music usually corresponds to the fastest note played. Thus, if a piece of music contains at most sixteenth notes and none faster than sixteenth notes, the distance between two points of the tatum grid is equal to the length of a sixteenth note of the tone signal.
- the distance between two grid points corresponds to the largest note value needed to represent all occurring note values or time periods by forming integer multiples of this note value.
- the grid spacing is thus the greatest common divisor of all occurring note durations / period lengths, etc.
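For idealized, already-quantized durations this greatest common divisor can be computed directly, as in the following sketch; real recordings would require an approximate GCD with a tolerance, and the example durations are assumptions.

```python
# Sketch: for idealized, already-quantized note durations (in milliseconds), the tatum
# spacing is the greatest common divisor of all occurring durations. Example values
# are illustrative assumptions; real onset data would require an approximate GCD.
from functools import reduce
from math import gcd

durations_ms = [250, 500, 750, 1000, 250, 500]   # e.g. sixteenth = 250 ms at 60 bpm quarters
tatum_ms = reduce(gcd, durations_ms)
print(tatum_ms)  # 250
```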
- the tatum grid is determined using a two-way mismatch procedure (TWM).
- TWM 2-way mismatch procedure
- a series of trial values for the tatum period, i.e. the spacing of two grid points, is derived from a histogram of inter-onset intervals (IOI).
- IOI inter-onset interval
- the calculation of the IOI is not limited to consecutive onsets but extends to virtually all pairs of onsets within a time frame.
- Tatum candidates are calculated as integer fractions of the most common IOI. The candidate is selected that best predicts the harmonic structure of the IOI according to the 2-way mismatch error function.
- the estimated tatum period is subsequently refined by calculating the error function between a comb grid, which is derived from the tatum period, and the onset times of the signal.
- the histogram of the IOI is generated and smoothed by means of an FIR low-pass filter.
- Tatum candidates are obtained by dividing the IOI values corresponding to the peaks in the IOI histogram by a set of values between, for example, 1 and 4.
- a raw estimate of the tatum period is derived from the IOI histogram after applying the TWM. Thereafter, the phase of the tatum grid and an exact estimate of the tatum period are calculated by means of the TWM between the note onsets and several tatum grids with periods close to the previously estimated tatum period.
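Roughly, the IOI-histogram stage could be sketched as follows: inter-onset intervals over onset pairs are histogrammed, candidates are formed as integer fractions of the most common IOI, and the candidate that best explains all IOIs is kept. The simple error used here is only a stand-in for the two-way mismatch error function, and all values are assumptions.

```python
# Rough sketch of the tatum-period candidate search described above: inter-onset intervals
# (IOI) over onset pairs are collected, candidates are formed as integer fractions of the
# most common IOI, and the candidate that best explains all IOIs is kept. The error
# function below is only a crude stand-in for the two-way mismatch (TWM) error; the
# penalty term mimics its bias against overly fine grids. All values are assumptions.
from collections import Counter

def ioi_values(onsets, max_span=8):
    return [b - a for i, a in enumerate(onsets) for b in onsets[i + 1:i + 1 + max_span]]

def estimate_tatum(onsets):
    iois = ioi_values(onsets)
    most_common_ioi, _ = Counter(iois).most_common(1)[0]
    candidates = [most_common_ioi / k for k in range(1, 5)]        # integer fractions 1..4
    def mismatch(period):
        deviation = sum(abs(ioi / period - round(ioi / period)) for ioi in iois)
        return deviation + 0.001 * (most_common_ioi / period)      # penalize overly fine grids
    return min(candidates, key=mismatch)

onsets = [0, 250, 500, 750, 1000, 1500, 1750, 2000]                # onset times in ms
print(estimate_tatum(onsets))                                      # -> 250.0
```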
- the second method refines the tatum grid and determines its phase by computing the best match between the note onset vector and the tatum grid, using a correlation coefficient R_xy between the note onset vector x and the tatum grid y.
- R_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
- the tatum grid is estimated for adjacent frames with a length of, for example, 2.5 seconds.
- the transitions between the tatum grids of adjacent frames are smoothed by low-pass filtering the IOI vector of the tatum grid points, and the tatum grid is restored from the smoothed IOI vector. Then each event is assigned to its closest grid position. This is a kind of quantization.
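A much simplified sketch of the phase refinement by correlation follows (frame-wise processing and IOI smoothing omitted); the onset vector and period are assumptions, and statistics.correlation requires Python 3.10 or later.

```python
# Simplified sketch of the phase refinement from the preceding paragraphs: candidate tatum
# grids with different phase offsets are correlated with the note onset vector, and the
# offset giving the largest correlation coefficient R_xy is kept. Frame-wise processing
# and IOI smoothing are omitted; the example vectors are illustrative assumptions.
from statistics import correlation  # Python 3.10+: Pearson correlation coefficient

def grid_vector(length, period, phase):
    return [1.0 if (i - phase) % period == 0 else 0.0 for i in range(length)]

def best_phase(onset_vector, period):
    return max(range(period),
               key=lambda p: correlation(onset_vector, grid_vector(len(onset_vector), period, p)))

onset_vector = [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # onsets fall on slots 1, 5, 9
print(best_phase(onset_vector, period=4))              # -> 1
```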
- the intensity of the detected events can either be discarded or retained, resulting in a Boolean matrix or in a matrix of intensity values, respectively.
- the quantized representation of the percussive events provides valuable information for the estimation of the musical measure or a periodicity that underlies the playing of the sound sources.
- the periodicity at the bar level, for example, is determined in two stages: first a periodicity function is calculated, and then the bar length is estimated from it.
- the periodicity functions used are the autocorrelation function (ACF) or the average magnitude difference function (AMDF), as shown in the following equations.
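The equations themselves are not reproduced in this text; for reference, the standard textbook forms of these two periodicity functions for a discrete sequence x(n) of length N (not necessarily the patent's exact notation) are:

```latex
\mathrm{ACF}(\tau)  = \sum_{n} x(n)\, x(n+\tau), \qquad
\mathrm{AMDF}(\tau) = \frac{1}{N} \sum_{n} \left| x(n) - x(n+\tau) \right|
```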
- the AMDF is also used to estimate the fundamental frequency for music and speech signals and to estimate the musical measure.
- a suitable extension for the comparison of the rhythmic structures results from the different weighting of similar hits and rest periods.
- the similarity B between two sections of a score T 1 and T 2 is then calculated by weighted summation of the Boolean operations, as shown below.
- B = a \cdot (T_1 \wedge T_2) + b \cdot (\neg T_1 \wedge \neg T_2) - c \cdot (T_1 \oplus T_2)
- the similarity measure M is obtained by summing the elements of B, as set forth below.
- the similarity measures for Boolean matrices can be extended by weighting B with the average of T 1 and T 2 to account for intensity values. Distances or dissimilarities are regarded as negative similarities.
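A sketch of this weighted Boolean comparison, with the similarity measure M obtained by summing the elements of B; the weights a, b, c and the example score sections are illustrative assumptions.

```python
# Sketch: weighted Boolean similarity between two score sections T1 and T2 (same shape).
# Coinciding hits are rewarded with a, coinciding rests with b, mismatches penalized with c.
# M is the sum over all elements. Weights and example patterns are illustrative assumptions.

def similarity(T1, T2, a=2.0, b=0.5, c=1.0):
    M = 0.0
    for row1, row2 in zip(T1, T2):
        for x, y in zip(row1, row2):
            if x and y:
                M += a            # both have a hit
            elif not x and not y:
                M += b            # both have a rest
            else:
                M -= c            # hit in one, rest in the other
    return M

T1 = [[1, 0, 1, 0], [0, 0, 1, 0]]
T2 = [[1, 0, 1, 0], [0, 1, 1, 0]]
print(similarity(T1, T2))         # 2 + 0.5 + 2 + 0.5 + 0.5 - 1 + 2 + 0.5 = 7.0
```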
- the time signature is determined by comparing P with a number of metric models.
- the implemented metric models Q consist of a train of spikes with typical accent positions for different time signatures and micro times.
- a micro-time is the integer ratio between the duration of a musical beat, i.e. the note value that determines the musical tempo (e.g. a quarter note), and the duration of a tatum period.
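Purely as an illustration of such a comparison with metric models, the sketch below correlates an observed accent profile per tatum position with simple accent templates for two candidate time signatures; the templates and the profile are assumptions and not the models Q of the patent.

```python
# Illustrative sketch of comparing an accent profile (how often events fall on each tatum
# position within a hypothesised bar) with simple metric templates for candidate time
# signatures. The templates and the profile are assumptions, not the patent's models Q.

def score(profile, template):
    """Dot product between the observed accent profile and an accent template."""
    return sum(p * t for p, t in zip(profile, template))

templates = {
    "4/4, micro-time 2": [3, 0, 1, 0, 2, 0, 1, 0],   # accents on beats 1 and 3 of 8 tatums
    "3/4, micro-time 2": [3, 0, 1, 0, 1, 0],          # accent on beat 1 of 6 tatums
}

accent_profile = [9, 1, 4, 0, 7, 1, 3, 0]             # observed hit counts per tatum position
best = max(templates,
           key=lambda name: score(accent_profile[:len(templates[name])], templates[name]))
print(best)                                            # -> "4/4, micro-time 2"
```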
- T ' is referred to as a score histogram or a pattern histogram.
- Drum patterns are obtained from the score histogram T' by searching for score elements T' i,j with large histogram values. Patterns longer than one bar are obtained by repeating the procedure described above for integer multiples of the bar length. The pattern length with the most hits, relative to the pattern length itself, is selected in order to obtain a maximally representative pattern as a further or alternative characteristic for the sound signal.
- the identified rhythmic patterns are interpreted using a set of rules derived from musical knowledge.
- equidistant occurrences of individual instruments are identified and evaluated with reference to the instrument class. This leads to an identification of playing styles that frequently occur in popular music.
- An example is the very frequent use of the snare drum, tambourines or hand claps on the second and fourth beats of a 4/4 measure.
- This serves as an indicator of the position of the bar lines. If there is a backbeat pattern, a measure starts between two snare drum attacks.
- another indication of the positioning of the bar lines is the occurrence of kick drum events, i.e. events of a typically foot-operated bass drum.
- a classification of different playing styles is performed, each of which is assigned to individual instruments.
- one playing style is that events occur only on every quarter note.
- An associated instrument for this playing style is the kick drum, i.e. the foot-operated bass drum of the drum kit.
- This style of playing is abbreviated FS.
- an alternative playing style is that events occur on the second and fourth quarter notes of a 4/4 measure. This is mainly played by the snare drum and by tambourines or hand claps.
- This style of play is abbreviated as BS.
- other exemplary playing styles are that notes often appear on the first and third notes of a triplet. This is abbreviated as SP and is often observed for a hi-hat or cymbal.
- the first feature FS is a Boolean value and is true if kick drum events occur only on quarter notes. For certain other features, no Boolean variables are calculated; instead, numerical values are determined, such as the ratio between the number of off-beat events and the number of on-beat events, for example from a hi-hat, a shaker or a tambourine.
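These two kinds of features could be computed from a binarized drum pattern roughly as follows; the number of tatums per beat and the example rows are assumptions.

```python
# Sketch: a Boolean playing-style feature (FS: kick drum only on quarter-note positions)
# and a numeric feature (off-beat / on-beat event ratio), computed from a binarized
# drum pattern row. Tatums per quarter note and example data are assumptions.

def feature_fs(kick_row, tatums_per_beat=4):
    """True if kick events occur only on quarter-note (on-beat) positions."""
    return all(i % tatums_per_beat == 0 for i, hit in enumerate(kick_row) if hit)

def offbeat_ratio(row, tatums_per_beat=4):
    on = sum(hit for i, hit in enumerate(row) if i % tatums_per_beat == 0)
    off = sum(hit for i, hit in enumerate(row) if i % tatums_per_beat != 0)
    return off / on if on else float("inf")

kick   = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]   # one bar of 4/4, 16 tatums
hi_hat = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(feature_fs(kick))        # True
print(offbeat_ratio(hi_hat))   # 1.0  (four on-beat and four off-beat hits)
```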
- drum instruments are classified into one of the various drum set types, such as rock, jazz, latin, disco, and techno, to provide another feature for genre classification.
- the classification of the drum set is not derived using the instrument sounds, but by generally examining the occurrence of drum instruments in various pieces belonging to each genre.
- the drum set type Rock is characterized by a kick drum, a snare drum, a hi-hat and a cymbal.
- the type "Latin" a bongo, a conga, claves and shakers.
- rhythmic features are derived from the drum score or drum pattern. These features include musical tempo, time signature, micro-time, etc.
- a measure of the variation in the occurrence of kick drum notes is obtained by counting the number of different IOI that occur in the drum pattern.
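A minimal sketch of this variation measure, assuming a binarized kick drum row on a tatum grid:

```python
# Minimal sketch: measure the variation of kick drum notes by counting the number of
# distinct inter-onset intervals (IOI) within the drum pattern. Example data is assumed.

def ioi_variation(kick_row):
    positions = [i for i, hit in enumerate(kick_row) if hit]
    iois = {b - a for a, b in zip(positions, positions[1:])}   # consecutive IOIs only
    return len(iois)

print(ioi_variation([1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]))  # 1 (no variation)
print(ioi_variation([1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]))  # 3 (three different IOIs)
```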
- the classification of the musical genre using the drum pattern is performed using a rule-based decision network. Potential genre candidates will be rewarded if they fulfill a hypothesis currently under investigation and will be "punished” if they do not fulfill aspects of a hypothesis that is currently under investigation. This process results in the selection of favorable feature combinations for each genre.
- the rules for the decision are derived from observations of representative pieces and from musical knowledge as such. Values for reward and punishment are set empirically, taking into account the robustness of the extraction concept. The resulting decision for a particular musical genre is made for the genre candidate that has the maximum number of rewards.
- the disco genre is recognized when the drum set type is disco, when the tempo is in the range of 115 to 132 bpm, when the time signature is 4/4 and when the micro-time is equal to 2.
- further criteria are, for example, that a playing style FS is present and that, for example, yet another playing style is present, namely that events occur on every off-beat position. Similar criteria can be applied to other genres such as hip-hop, soul/funk, drum and bass, jazz/swing, rock/pop, heavy metal, latin, waltz, polka/punk or techno.
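The reward-and-punish selection could be sketched as a small rule table as below; the rules, weights and feature values are illustrative assumptions and not the patent's actual rule set.

```python
# Sketch of a rule-based decision network for genre classification: every genre candidate
# collects rewards for fulfilled rules and punishments for violated ones, and the candidate
# with the maximum score wins. Rules, weights and the feature vector are assumptions.

features = {"drum_set": "disco", "tempo_bpm": 124, "time_signature": "4/4",
            "micro_time": 2, "style_FS": True, "offbeat_events": True}

rules = {
    "disco": [lambda f: f["drum_set"] == "disco",
              lambda f: 115 <= f["tempo_bpm"] <= 132,
              lambda f: f["time_signature"] == "4/4" and f["micro_time"] == 2,
              lambda f: f["style_FS"] and f["offbeat_events"]],
    "waltz": [lambda f: f["time_signature"] == "3/4",
              lambda f: f["tempo_bpm"] < 100],
}

REWARD, PUNISH = 1.0, 1.0
scores = {genre: sum(REWARD if rule(features) else -PUNISH for rule in genre_rules)
          for genre, genre_rules in rules.items()}
print(max(scores, key=scores.get), scores)   # -> disco {'disco': 4.0, 'waltz': -2.0}
```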
- the inventive method for characterizing a sound signal can be implemented in hardware or in software.
- the implementation may be on a digital storage medium, in particular a floppy disk or CD with electronically readable control signals, which may interact with a programmable computer system such that the method is executed.
- the invention thus also consists in a computer program product with a program code stored on a machine-readable carrier for carrying out the method when the computer program product runs on a computer.
- the invention can thus be realized as a computer program with a program code for carrying out the method when the computer program runs on a computer.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Auxiliary Devices For Music (AREA)
Abstract
Claims (21)
- Device for characterizing a sound signal, comprising: a means (10) for providing a sequence of onset times of tones for at least one sound source; a means (12) for determining, using the at least one sequence of onset times, a common period length underlying the at least one sound source; a means (14) for dividing the at least one sequence of onset times into respective subsequences, a length of a subsequence being equal to the common period length or being derived from the common period length; and a means (16) for combining the subsequences for the at least one sound source in order to obtain a combined subsequence, the combined subsequence representing a characteristic of the sound signal.
- Device according to claim 1,
wherein the means (10) for providing is implemented to provide at least two sequences of onset times for at least two sound sources,
wherein the means (12) for determining is implemented to determine the common period length for the at least two sound sources,
wherein the means (14) for dividing is implemented to divide the at least two sequences of onset times according to the common period length, and
wherein the means (16) for combining is implemented to combine the subsequences for the second sound source in order to obtain a second combined subsequence, the first combined subsequence and the second combined subsequence representing the characteristic of the sound signal. - Device according to claim 1, wherein the means (10) for providing is implemented to provide, for each of the at least two sound sources, a sequence of quantized onset times, the onset times being quantized with respect to a quantization grid, a grid point spacing between two grid points being equal to a smallest spacing between two tones in the sound signal or equal to the greatest common divisor of the tone durations in the music signal.
- Device according to claim 1, 2 or 3, wherein the means (10) for providing is implemented to provide the onset times of percussive instruments but not the onset times of harmonic instruments.
- Device according to one of the preceding claims, wherein the means (12) for determining is implemented to
determine, for each of a plurality of hypothetical common period lengths, a probability measure, and
select, as the common period length, that hypothetical common period length among the plurality of hypothetical common period lengths whose probability measure indicates that the hypothetical common period length is the common period length for the at least two sound sources. - Device according to claim 5, wherein the means (12) for determining is implemented to determine the probability measure on the basis of a first probability measure for the first sound source and on the basis of a second probability measure for the second sound source.
- Device according to claim 5 or 6, wherein the means (12) for determining is implemented to calculate the probability measures by comparing the sequence of onset times with a shifted sequence of onset times.
- Device according to one of the preceding claims, wherein the means (14) for dividing is implemented to generate a list for each subsequence, the list indicating, for each grid point and for each sound source, associated information as to whether or not an onset time of a tone is present at the grid point.
- Device according to one of the preceding claims, wherein the means (10) for providing is implemented to generate a list for each sound source, the list indicating, for each point of a grid, associated information as to whether or not an onset time of a tone is present at the grid point.
- Device according to one of the preceding claims, wherein the means (16) for combining is implemented to generate a histogram as the combined subsequence.
- Device according to claim 10, wherein the means (16) for combining is implemented to generate the histogram such that each point of a tone grid of the combined subsequence represents a histogram bin.
- Device according to claim 10 or 11, wherein the means (16) for combining is implemented to increment, for each subsequence for a sound source, a count value for an associated bin in the histogram when an entry is detected, or to increment it by adding a measure determined by the entry, the entry being a measure of an intensity of a tone having an onset at the onset time.
- Device according to one of the preceding claims, wherein the means (16) for combining is implemented to output, in the first combined subsequence and the second combined subsequence, as the characteristic, only those values of the subsequences which lie above a threshold.
- Device according to one of the preceding claims, wherein the means (16) for combining is implemented to normalize the subsequences with respect to the common length, or to normalize the first combined subsequence or the second combined subsequence with respect to the common length.
- Device according to one of the preceding claims, wherein the means (10) for providing is implemented to generate, from an audio signal, segments having a uniform rhythmic structure, and
the means (16) for combining is implemented to generate the characteristic for a segment having a uniform rhythmic structure. - Device according to one of the preceding claims, further comprising: a means for extracting a feature from the characteristic for the sound signal; and a means for determining, using the feature, a musical genre to which the sound signal belongs.
- Device according to claim 16, wherein the means for determining is implemented to use a rule-based decision network, a pattern recognition device or a classifier.
- Device according to one of the preceding claims, further comprising a means for extracting a tempo from the characteristic.
- Device according to claim 18, wherein the means for extracting is implemented to determine the tempo on the basis of the common period length.
- Method for characterizing a sound signal, comprising the following steps: providing (10) a sequence of onset times of tones for at least one sound source; determining (12), using the at least one sequence of onset times, a common period length underlying the at least one sound source; dividing (14) the at least one sequence of onset times into respective subsequences, a length of a subsequence being equal to the common period length or being derived from the common period length; and combining (16) the subsequences for the at least one sound source in order to obtain a combined subsequence, the combined subsequence representing a characteristic of the sound signal.
- Computer program with a program code for performing the method according to claim 20 when the program runs on a computer.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE200410022659 DE102004022659B3 (de) | 2004-05-07 | 2004-05-07 | Vorrichtung zum Charakterisieren eines Tonsignals |
PCT/EP2005/004517 WO2005114650A1 (fr) | 2004-05-07 | 2005-04-27 | Procede et dispositif pour caracteriser un signal audio |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1671315A1 EP1671315A1 (fr) | 2006-06-21 |
EP1671315B1 true EP1671315B1 (fr) | 2007-05-02 |
Family
ID=34965834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP05735854A Ceased EP1671315B1 (fr) | 2004-05-07 | 2005-04-27 | Procede et dispositif pour caracteriser un signal audio |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1671315B1 (fr) |
JP (1) | JP4926044B2 (fr) |
DE (2) | DE102004022659B3 (fr) |
WO (1) | WO2005114650A1 (fr) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019026236A1 (fr) * | 2017-08-03 | 2019-02-07 | Pioneer DJ株式会社 | Dispositif d'analyse de composition musicale et programme d'analyse de composition musicale |
JP6920445B2 (ja) * | 2017-08-29 | 2021-08-18 | AlphaTheta株式会社 | 楽曲解析装置および楽曲解析プログラム |
CN108257588B (zh) * | 2018-01-22 | 2022-03-01 | 姜峰 | 一种谱曲方法及装置 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6201176B1 (en) * | 1998-05-07 | 2001-03-13 | Canon Kabushiki Kaisha | System and method for querying a music database |
US6990453B2 (en) * | 2000-07-31 | 2006-01-24 | Landmark Digital Services Llc | System and methods for recognizing sound and music signals in high noise and distortion |
DE10157454B4 (de) * | 2001-11-23 | 2005-07-07 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Verfahren und Vorrichtung zum Erzeugen einer Kennung für ein Audiosignal, Verfahren und Vorrichtung zum Aufbauen einer Instrumentendatenbank und Verfahren und Vorrichtung zum Bestimmen der Art eines Instruments |
JP2004029274A (ja) * | 2002-06-25 | 2004-01-29 | Fuji Xerox Co Ltd | 信号パターン評価装置、信号パターン評価方法及び信号パターン評価プログラム |
-
2004
- 2004-05-07 DE DE200410022659 patent/DE102004022659B3/de not_active Expired - Fee Related
-
2005
- 2005-04-27 WO PCT/EP2005/004517 patent/WO2005114650A1/fr active IP Right Grant
- 2005-04-27 DE DE502005000658T patent/DE502005000658D1/de active Active
- 2005-04-27 EP EP05735854A patent/EP1671315B1/fr not_active Ceased
- 2005-04-27 JP JP2007511960A patent/JP4926044B2/ja not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
EP1671315A1 (fr) | 2006-06-21 |
JP4926044B2 (ja) | 2012-05-09 |
JP2007536586A (ja) | 2007-12-13 |
WO2005114650A1 (fr) | 2005-12-01 |
DE102004022659B3 (de) | 2005-10-13 |
DE502005000658D1 (de) | 2007-06-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7273978B2 (en) | Device and method for characterizing a tone signal | |
Mitrović et al. | Features for content-based audio retrieval | |
EP1797552B1 (fr) | Procede et dispositif pour extraire une melodie servant de base a un signal audio | |
EP1371055B1 (fr) | Dispositif pour l'analyse d'un signal audio concernant des informations de rythme de ce signal a l'aide d'une fonction d'auto-correlation | |
EP2351017B1 (fr) | Procédé permettant de détecter des motifs de notes dans des pièces musicales | |
CN102770856B (zh) | 用于精确波形测量的域识别和分离 | |
Tzanetakis et al. | Human perception and computer extraction of musical beat strength | |
DE10123366C1 (de) | Vorrichtung zum Analysieren eines Audiosignals hinsichtlich von Rhythmusinformationen | |
WO2006039995A1 (fr) | Procede et dispositif pour le traitement harmonique d'une ligne melodique | |
WO2002084641A1 (fr) | Procede pour convertir un signal musical en une description fondee sur des notes et pour referencer un signal musical dans une base de donnees | |
DE102004028693B4 (de) | Vorrichtung und Verfahren zum Bestimmen eines Akkordtyps, der einem Testsignal zugrunde liegt | |
EP1671315B1 (fr) | Procede et dispositif pour caracteriser un signal audio | |
DE102004028694B3 (de) | Vorrichtung und Verfahren zum Umsetzen eines Informationssignals in eine Spektraldarstellung mit variabler Auflösung | |
CN102930865A (zh) | 一种波形音乐粗情感软切割分类方法 | |
Smith et al. | Using quadratic programming to estimate feature relevance in structural analyses of music | |
EP1377924B1 (fr) | Procede et dispositif permettant d'extraire une identification de signaux, procede et dispositif permettant de creer une banque de donnees a partir d'identifications de signaux, et procede et dispositif permettant de se referencer a un signal temps de recherche | |
Bader | Neural coincidence detection strategies during perception of multi-pitch musical tones | |
DE112020002116T5 (de) | Informationsverarbeitungsvorrichtung und Verfahren und Programm | |
EP1743324B1 (fr) | Dispositif et procede pour analyser un signal d'information | |
Boonmatham et al. | Musical-scale characteristics for traditional Thai music genre classification | |
Tjahyanto et al. | Gamelan instrument sound recognition using spectral and facial features of the first harmonic frequency | |
Wang et al. | The analysis and comparison of vital acoustic features in content-based classification of music genre | |
Morman et al. | A system for the automatic segmentation and classification of chord sequences | |
Raposo | Improving Acoustic Features for Structural Segmentation of Music | |
Pérez Fernández et al. | A comparison of pitch chroma extraction algorithms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20060420 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR |
|
AX | Request for extension of the european patent |
Extension state: AL BA HR LV MK YU |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 11/00 20060101AFI20060811BHEP |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: FRAUNHOFER-GESELLSCHAFT ZUR FOERDERUNG DER ANGEWAN |
|
RIN1 | Information on inventor provided before grant (corrected) |
Inventor name: UHLE, CHRISTIAN Inventor name: CREMER, MARKUS |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): DE FR GB |
|
DAX | Request for extension of the european patent (deleted) | ||
RBV | Designated contracting states (corrected) |
Designated state(s): DE FR GB |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D Free format text: NOT ENGLISH |
|
GBT | Gb: translation of ep patent filed (gb section 77(6)(a)/1977) |
Effective date: 20070510 |
|
REF | Corresponds to: |
Ref document number: 502005000658 Country of ref document: DE Date of ref document: 20070614 Kind code of ref document: P |
|
ET | Fr: translation filed | ||
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20080205 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 12 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 13 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 14 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20180423 Year of fee payment: 14 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20180424 Year of fee payment: 14 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20180403 Year of fee payment: 14 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R119 Ref document number: 502005000658 Country of ref document: DE |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20190427 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20191101 Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20190427 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20190430 |