EP1797552A2 - Method and device for extracting a melody underlying an audio signal - Google Patents

Method and device for extracting a melody underlying an audio signal

Info

Publication number
EP1797552A2
Authority
EP
European Patent Office
Prior art keywords
segment
time
spectral
melody
predetermined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP05790019A
Other languages
German (de)
English (en)
Other versions
EP1797552B1 (fr)
Inventor
Frank Streitenberger
Martin Weis
Claas Derboven
Markus Cremer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gracenote Inc
Original Assignee
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Publication of EP1797552A2
Application granted
Publication of EP1797552B1
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H3/00 Instruments in which the tones are generated by electromechanical means
    • G10H3/12 Instruments in which the tones are generated by electromechanical means using mechanical resonant generators, e.g. strings or percussive instruments, the tones of which are picked up by electromechanical transducers, the electrical signals being further manipulated or amplified and subsequently converted to sound by a loudspeaker or equivalent instrument
    • G10H3/125 Extracting or recognising the pitch or fundamental frequency of the picked up signal
    • G10H1/00 Details of electrophonic musical instruments
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00 Acoustics not otherwise provided for
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/066 Musical analysis for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; Pitch recognition, e.g. in polyphonic sounds; Estimation or use of missing fundamental
    • G10H2210/086 Musical analysis for transcription of raw audio or music data to a displayed or printed staff representation or to displayable MIDI-like note-oriented data, e.g. in pianoroll format
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/025 Envelope processing of music signals in, e.g. time domain, transform domain or cepstrum domain
    • G10H2250/031 Spectrum envelope processing
    • G10H2250/131 Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H2250/161 Logarithmic functions, scaling or conversion, e.g. to reflect human auditory perception of loudness or frequency
    • G10H2250/215 Transforms, i.e. mathematical transforms into domains appropriate for musical signal processing, coding or compression
    • G10H2250/235 Fourier transform; Discrete Fourier Transform [DFT]; Fast Fourier Transform [FFT]

Definitions

  • the present invention relates to the extraction of a melody underlying an audio signal.
  • Such extraction may be used, for example, to obtain a transcribed representation of a melody underlying a monophonic or polyphonic audio signal, which may also be in an analog form or in a digital sampled form.
  • Melody extraction, for example, enables the generation of ringtones for mobile phones from any audio signal, e.g. singing, humming, whistling or the like.
  • Such combinability of ready-made melody and accompaniment patterns is implemented, for example, in the Sony-Ericsson T610 phone.
  • Otherwise, the user must rely on the purchase of commercially available pre-made ringtones.
  • In Klapuri, A.P.: Signal Processing Methods for the Automatic Transcription of Music, Tampere University of Technology, Summary Diss., December 2003; Klapuri, A.P.: Signal Processing Methods for the Automatic Transcription of Music, Tampere University of Technology, Diss.;
  • Klapuri, A.P.: Multiple Fundamental Frequency Estimation based on Harmonicity and Spectral Smoothness;
  • and Klapuri, A.P., Eronen, A.J., and Astola, J.T.: Automatic Estimation of the Meter of Acoustic Musical Signals, Tampere University of Technology, Institute of Signal Processing, Report 1-2004, Tampere, Finland, 2004, ISSN 1459-4595, ISBN 952-15-1149-4, various procedures for the automatic transcription of music are described.
  • Paiva, R.P. et al. also deal with melody detection in: A Methodology for Detection of Melody in Polyphonic Musical Signals, 116th AES Convention, Berlin, May 2004. There, too, it is proposed to follow the path of trajectory tracking in the time/spectral representation. The document also deals with the segmentation of the individual trajectories until they are processed into a sequence of notes.
  • A robust transcription could, of course, also be used as a recording front end. It would also be possible to use automatic transcription as an addition to an audio ID system, i.e. a system that recognizes audio files by a fingerprint contained in them: if an incoming audio file is not recognized by the audio ID system, for example due to a missing fingerprint, automatic transcription could alternatively be used to evaluate it.
  • A stably functioning automatic transcription would also allow similarity relationships to be established on the basis of other musical features, e.g. key, harmony and rhythm, such as for a "recommendation engine" or "suggestion engine".
  • a stable automatic transcription could create new views and lead to a re-examination of judgments on older music.
  • In further fields as well, an automatic transcription that is stable in use could be employed.
  • melody recognition or auto-transcription is not limited to the generation of ringtones for mobile phones mentioned above, but can generally serve as a support for musicians and those interested in music.
  • the object of the present invention is to provide a more stable melody recognition scheme which works correctly for a wider variety of audio signals. This object is achieved by a device according to claim 1 and a method according to claim 33.
  • The finding of the present invention is that melody extraction or automatic transcription can be made much more stable, and possibly even less expensive, if the assumption is sufficiently taken into account that the main melody is that portion of a piece of music that a person perceives as loudest and most concise.
  • In accordance therewith, the time/spectral representation or spectrogram of an audio signal of interest is scaled using the equal-loudness curves reflecting human loudness perception, in order to then determine the melody of the audio signal on the basis of the resulting perception-related time/spectral representation.
  • the above musicological statement that the main melody is that portion of a piece of music that the person perceives most loudly and succinctly is taken into account in two ways.
  • A melody line extending through the time/spectral representation is first determined, by assigning to each time segment or frame exactly one spectral component or frequency bin of the time/spectral representation, namely the one that leads to the sound result with the maximum intensity.
  • the spectrogram of the audio signal is first logarithmized, so that the logarithmized spectral values indicate the sound pressure level.
  • The logarithmized spectral values of the logarithmized spectrogram are then mapped, depending on their respective value and on the spectral component to which they belong, to perception-related spectral values.
  • For this purpose, functions are used that represent the equal-loudness curves as sound pressure level as a function of the spectral components or of the frequency, each curve being assigned to a different volume level.
  • Fig. 1 is a block diagram of an apparatus for generating a polyphonic melody;
  • Fig. 2 is a flowchart illustrating the operation of the extraction device of Fig. 1;
  • Fig. 3 is a more detailed flowchart illustrating the operation of the extraction device of the apparatus of Fig. 1 in the case of a polyphonic audio input signal;
  • Fig. 4 is an exemplary spectrogram of an audio signal, as obtained by the frequency analysis of Fig. 3;
  • Fig. 5 is a logarithmized spectrogram, as obtained by the logarithmization of Fig. 3;
  • Fig. 6 is a graph of the equal-loudness curves underlying the evaluation of the spectrum in Fig. 3;
  • Fig. 7 is a graph of a reference audio signal used before the actual logarithmization in Fig. 3 to obtain a reference value for the logarithmization;
  • Fig. 8 is a perception-related spectrogram, as obtained after the evaluation of the spectrogram of Fig. 5 in Fig. 3;
  • Fig. 9 is the melody line resulting from the perception-related spectrum of Fig. 8 through the melody line determination of Fig. 3, plotted in the time/spectral domain;
  • Fig. 10 is a flowchart illustrating the general segmentation of Fig. 3;
  • Fig. 11 is a schematic representation of an exemplary melody line progression in the time/spectral domain;
  • Fig. 12 is a schematic representation of a section of the melody line progression diagram of Fig. 11, illustrating the mode of operation of the filtering in the general segmentation of Fig. 10;
  • Fig. 13 is the melody line progression of Fig. 9 after the frequency range confinement in the general segmentation of Fig. 10;
  • Fig. 14 is a schematic drawing showing a section of a melody line, for illustrating the separation at abrupt frequency jumps in the general segmentation of Fig. 10;
  • Fig. 15 is a schematic drawing of a portion of a melody line illustrating the operation of the division into segments in the general segmentation of Fig. 10;
  • Fig. 16 is a flowchart illustrating the gap closing in Fig. 3;
  • Fig. 17 is a schematic drawing illustrating the procedure for determining the semitone vector in Fig. 3;
  • Fig. 18 is a schematic drawing illustrating the gap closing of Fig. 16;
  • Fig. 19 is a flowchart illustrating the harmony mapping in Fig. 3;
  • Fig. 20 is a schematic representation of a detail from the melody line progression, illustrating the mode of operation of the harmony mapping according to Fig. 19;
  • Fig. 21 is a flowchart illustrating the
  • Fig. 22 shows a schematic representation of a segment profile for illustrating the procedure according to Fig.
  • Fig. 23 shows a schematic illustration of a detail from the melody line progression to illustrate the procedure for the statistical correction in Fig. 3;
  • Fig. 24 is a flowchart illustrating the procedure of the onset detection and correction in Fig. 3;
  • Fig. 25 is a graph illustrating an exemplary filter transfer function for use in the onset detection of Fig. 24;
  • Fig. 26 shows a schematic progression of a two-way rectified filtered audio signal and the envelope thereof, as used for the onset detection of Fig. 24;
  • Fig. 27 is a flowchart illustrating the operation of the extraction device of Fig. 1 in the case of monophonic audio input signals;
  • Fig. 28 is a flowchart illustrating the sound separation in Fig. 27;
  • Fig. 29 is a schematic representation of a section of the amplitude curve of the spectrogram of an audio signal along a segment, to illustrate the sound separation of Fig. 28;
  • Figs. 30a and 30b are schematic representations of a portion of the amplitude profile of the spectrogram of an audio signal along a segment, to illustrate the operation of the sound separation of Fig. 28;
  • Fig. 31 is a flowchart illustrating the tone smoothing in Fig. 27;
  • Fig. 32 is a schematic representation of a segment of the melody line course, to illustrate the tone smoothing of Fig. 31;
  • Fig. 35 shows a detail of a two-way rectified filtered audio signal and its interpolation in the case of a potential segment extension.
  • The present invention is described below merely by way of example with reference to a specific application, namely the generation of a polyphonic ringtone melody from an audio signal.
  • However, a melody extraction or automatic transcription according to the invention can also be used elsewhere, e.g. to facilitate searching in a database, to merely recognize pieces of music, to enable copyright protection by objectively comparing pieces of music, or simply to transcribe audio signals in order to display the transcription result to a musician.
  • FIG. 1 shows an embodiment of a device for generating a polyphonic melody from an audio signal containing a desired tune.
  • FIG. 1 shows a device for the rhythmic and harmonic conditioning and re-instrumenting of a melody-representing audio signal and for supplementing the resulting melody with a suitable accompaniment.
  • the apparatus of FIG. 1, indicated generally at 300, includes an input 302 for receiving the audio signal.
  • At the input 302, the audio signal is expected in a time-sampled representation, such as a WAV file.
  • the audio signal could also be present in other form at input 302, such as in uncompressed or compressed form or in a frequency band representation.
  • Between the input 302 and the output of the device 300, an extraction device 304, a rhythm device 306, a key device 308, a harmony device 310 and a synthesis device 312 are connected in series in this order.
  • the device 300 comprises a melody memory 314.
  • An output of the key device 308 is connected not only to an input of the subsequent harmony device 310, but also to an input of the melody memory 314.
  • Moreover, the input of the harmony device 310 is connected not only to the output of the preceding device but also to an output of the melody memory 314.
  • Another input of the melody memory 314 is provided to receive a provision identification number ID.
  • Another input of the synthesizer 312 is configured to receive style information. The meaning of the style information and the provision identification number is shown in the following functional description. Extractor 304 and rhythm means 306 together form a rhythm editor 316.
  • The extraction device 304 is designed to subject the audio signal received at the input 302 to a note extraction or note recognition in order to extract a note sequence from the audio signal.
  • The note sequence 318, which the extraction device 304 forwards to the rhythm device 306, is in the present exemplary embodiment in a form in which, for each note n, there is indicated: a note start time t_n marking the beginning of the note, for example in seconds; a note duration τ_n indicating the duration of the note, for example in seconds; a quantized pitch, i.e. C, F sharp or the like, for example as a MIDI note; a volume L_n of the note; and an exact frequency f_n of the note, where n is an index of the respective note in the note sequence which increases with the order of the successive notes, i.e. indicates the position of the respective note in the note sequence.
  • the note sequence 318 still represents the melody as it was also represented by the audio signal 302.
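  • As a rough sketch, such a note record could be held in a small data structure like the following (Python; the field names are illustrative assumptions, not the patent's nomenclature):

      from dataclasses import dataclass

      @dataclass
      class Note:
          onset_s: float      # note start time t_n in seconds
          duration_s: float   # note duration tau_n in seconds
          midi_pitch: int     # quantized pitch, e.g. 60 for C4
          volume: float       # volume L_n of the note
          freq_hz: float      # exact frequency f_n of the note

      # a note sequence 318 is then simply an ordered list of such records
      melody: list[Note] = []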
  • the note sequence 318 is now fed to the rhythm device 306.
  • The rhythm means 306 is arranged to analyze the supplied note sequence in order to determine a bar length and an upbeat, i.e. a bar grid, for the note sequence, and thereby to assign to the individual notes of the note sequence appropriate bar-quantized lengths, such as whole, half, quarter or eighth notes, etc., for the particular bar, and to fit the note onsets to the bar grid.
  • the note sequence that the rhythm device 306 outputs thus represents a rhythmically processed note sequence 324.
  • The key device 308 performs a key determination and possibly a key correction. More specifically, the means 308 determines, based on the note sequence 324, a main key of the user melody represented by the note sequence 324 and the audio signal 302, including its mode, i.e. major or minor, of the, for example, sung piece. Thereafter, it recognizes notes in the note sequence 324 that do not sound harmonic at this point and corrects them, in order to arrive at a harmonic-sounding final result: a rhythmically processed and pitch-corrected note sequence 700, which is forwarded to the harmony device 310 and represents the melody desired by the user in a key-corrected form.
  • The functioning of the device 308 with regard to the determination of the key can be carried out in various ways.
  • the key determination can, for example, to those in the
  • the harmony device 310 is configured to receive the note sequence 700 from the device 308 and to find a suitable accompaniment for the tune represented by this note sequence 700.
  • The device 310 operates measure by measure.
  • More specifically, the means 310 operates on each measure, as determined by the bar grid set by the rhythm means 306, by compiling statistics of the pitches of the notes occurring in the respective measure.
  • The statistics of the occurring pitches are then compared with the possible chords of the scale of the key determined by the key device 308.
  • Among the possible chords, means 310 selects in particular that chord whose tones best match the notes occurring in the respective measure, as indicated by the statistics; in other words, means 310 determines the chord that best fits the notes in the respective measure.
  • In this way, the means 310 allocates chord degrees of the main key to the individual measures, so that a chord progression is formed over the course of the melody. Consequently, the device 310 outputs to the synthesis device 312, in addition to the rhythmically processed and key-corrected note sequence, a chord degree specification for each measure.
  • Synthesizer 312 uses style information that can be entered by a user as indicated by case 702 to perform the synthesis, ie, artificially generate the eventually resulting polyphonic melody.
  • style information allows a user to select from four different styles in which the polyphonic melody can be generated, namely Pop, Techno, Latin or Reggae.
  • For each style, one or more accompaniment patterns are stored in the synthesis device 312.
  • To generate the accompaniment, the synthesis device 312 now uses the accompaniment pattern(s) indicated by the style information 702, concatenating the accompaniment patterns measure by measure.
  • For each accompaniment measure, the synthesizer 312 simply selects the accompaniment pattern of the current style corresponding to the chord determined for that measure. However, if, for a particular measure, the chord determined by the means 310 is not one for which an accompaniment pattern is stored in the means 312, the synthesizer 312 shifts the notes of the accompaniment pattern by the corresponding number of semitones and, in the case of a different chord type, changes the sixth and fifth by a semitone.
  • Furthermore, the synthesizer 312 orchestrates the melody represented by the note sequence 700 passed from the harmony device 310, to obtain a main melody, and then combines accompaniment and main melody into a polyphonic melody, which it outputs, here by way of example as a MIDI file, at the output 304.
  • The key device 308 is further configured to store the note sequence 700 in the melody memory 314 under a provision identification number. If the user is dissatisfied with the polyphonic melody obtained at the output 304, he may re-enter the provision identification number together with new style information into the apparatus of Fig. 1, whereupon the melody memory 314 forwards the note sequence 700 stored under this identification number to the harmony device 310, which again determines the chords as described above, whereupon the synthesis device 312 generates a new accompaniment using the new style information depending on the chords and a new main melody depending on the note sequence 700, and joins them together to form a new polyphonic melody at the output 304.
  • Fig. 2 first shows the rough procedure of the melody extraction or auto-transcription. The starting point is the provision of the audio signal in a step 750; as described above, it may be present as a WAV file.
  • the device 304 then performs a frequency analysis on the audio file in a step 752 to thereby provide a time / frequency representation or spectrogram of the audio signal contained in the file.
  • step 752 comprises a decomposition of the audio signal into frequency bands.
  • the audio signal is subdivided into preferably time-overlapping time segments which are then spectrally decomposed in each case in order to obtain a spectral value for each of a set of spectral components for each time interval or each frame.
  • the set of spectral components depends on the choice of the transformation underlying the frequency analysis 752, a particular embodiment of which will be explained below with reference to FIG. 4.
  • Following the frequency analysis 752, means 304 determines a weighted amplitude spectrum or perception-related spectrogram in a step 754.
  • the detailed procedure for determining the perceptual spectrogram will now be described in greater detail with reference to FIGS. 3-8.
  • The result of step 754 is a rescaling of the spectrogram obtained from the frequency analysis 752 using the equal-loudness curves that reflect human loudness perception, in order to adapt the spectrogram to human perception.
  • The processing 756 subsequent to step 754 uses, among other things, the perception-related spectrogram obtained from step 754 to finally obtain the melody of the audio signal in the form of a segmented melody line, i.e. in a form in which groups of consecutive frames are each assigned one and the same pitch, these groups possibly being spaced apart from one another by one or more frames.
  • the processing 756 is decomposed into three substeps 758, 760 and 762.
  • First, the perception-related spectrogram is used to obtain a time/fundamental-frequency representation from it, and this time/fundamental-frequency representation is in turn used to determine a melody line in such a way that exactly one spectral component or frequency bin is assigned to each frame.
  • The time/fundamental-frequency representation accounts for the division of sounds into partial tones by first delogarithmizing the perception-related spectrogram from step 754 and then summing, for each frame and for each frequency bin, the delogarithmized perception-related spectral values at that frequency bin and at the overtones of the respective frequency bin. The result is a sound spectrum per frame.
  • the determination of the melody line is carried out by selecting for each frame the fundamental tone or the frequency or that frequency bin at which the sound spectrum has its maximum.
  • the result of step 758 is thus, so to speak, a melody line function which uniquely assigns exactly one frequency bin to each frame.
  • This melody line function in turn defines a melody line progression in the time/frequency domain, i.e. a two-dimensional melody matrix spanned by the possible spectral components or bins on one side and the possible frames on the other side.
  • the following substeps 760 and 762 are provided to segment the continuous melody line, thus giving single notes.
  • The segmentation is divided into two sub-steps 760 and 762, depending on whether the segmentation takes place in the input frequency resolution, i.e. in frequency bins, or in semitone resolution, i.e. after quantizing the frequencies to semitone frequencies.
  • the result of processing 756 is processed in step 764 to generate a sequence of notes from the melody line segments, each note being assigned a note start time, a note duration, a quantized pitch, an exact pitch, and so on.
  • Fig. 3 is consistent with Fig. 2, i.e. an audio signal is first provided (750) and then subjected to a frequency analysis (752).
  • The WAV file is, for example, in a format in which the individual audio samples have been sampled at a sampling frequency of 16 kHz.
  • the individual samples are present, for example, in a 16-bit format.
  • the audio signal is present as a mono-file.
  • the frequency analysis 752 can then be carried out, for example, by means of a warped filter bank and an FFT (Fast Fourier Transformation).
  • For the FFT, the sequence of audio values is first windowed with a window length of 512 samples, the windows being shifted relative to one another by 128 samples, so that a time frame corresponds to a duration of 8 milliseconds.
  • According to a special embodiment, the warped filter bank is used for the frequency range up to about 1,550 Hz. This is necessary to achieve a sufficiently good resolution for low frequencies, such that each frequency band can be assigned to a semitone.
  • the FFT is used for the frequency range up to 8 kHz.
  • The frequency resolution of the FFT is sufficient from about 1,550 Hz for a good semitone representation.
  • Two to six frequency bands correspond to one semitone.
  • the transient response of the warped filter bank must be taken into account.
  • a temporal synchronization in the combination of the two transformations is made.
  • To this end, the first 16 frames of the filter bank output are discarded, and likewise the last 16 frames of the FFT output spectrum are disregarded.
  • the amplitude level of the filter bank and FFT is identical and requires no adaptation.
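  • A minimal sketch of the FFT branch of such a frequency analysis (the warped filter bank for the band below about 1,550 Hz is omitted), assuming a 16 kHz mono signal, 512-sample windows and an 8 ms (128-sample) hop:

      import numpy as np

      def stft_magnitude(x, win_len=512, hop=128):
          """Magnitude spectrogram: one column per 8 ms frame at 16 kHz."""
          w = np.hanning(win_len)
          n_frames = 1 + (len(x) - win_len) // hop
          frames = np.stack([x[i * hop : i * hop + win_len] * w
                             for i in range(n_frames)])
          return np.abs(np.fft.rfft(frames, axis=1)).T  # bins x frames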
  • FIG. 4 shows by way of example an amplitude spectrum or a time / frequency representation or a spectrogram of an audio signal, as obtained by the preceding exemplary embodiment of a combination of a warped filter bank and an FFT.
  • Along the horizontal axis, the time t is plotted in seconds, while along the vertical axis the frequency f is plotted in Hz.
  • The magnitude of the individual spectral values is indicated by gray scale.
  • In other words, the time/frequency representation of an audio signal is a two-dimensional field spanned by the possible frequency bins or spectral components on one side (vertical axis) and the time segments or frames on the other side (horizontal axis), each position of this field, i.e. each tuple of frame and frequency bin, being assigned a spectral value or an amplitude.
  • the amplitudes in the spectrum of Fig. 4 are still post-processed in the frequency analysis 752, since the amplitudes calculated by the warped filterbank may sometimes not be accurate enough for subsequent processing.
  • the frequencies that are not exactly at the center frequency of a frequency band have a lower amplitude value than frequencies that correspond exactly to the center frequency of a frequency band.
  • a crosstalk occurs on adjacent frequency bands, which are also referred to as bins or frequency bins.
  • The analysis result of the frequency analysis 752 is a matrix of spectral values which represent the volume by their amplitude. Human volume perception, however, is logarithmic. It is therefore useful to logarithmize the amplitude spectrum.
  • In the logarithmization 770, all spectral values are accordingly logarithmized to sound pressure levels, which corresponds to the logarithmic loudness perception of humans. More specifically, in the logarithmization 770, each spectral value p of the spectrogram, as obtained from the frequency analysis 752, is mapped to a sound pressure level value or logarithmized spectral value L according to L = 20 · log10(p / p0).
  • Here, p0 denotes the reference sound pressure, i.e. the sound pressure of the smallest perceptible sound at 1000 Hz.
  • FIG. 7 shows the sample audio signal 772 over the time t, the amplitude A being plotted in the Y direction in the smallest representable digital units.
  • the sample audio signal or reference signal 772 is present with an amplitude value of one LSB or with the smallest representable digital value.
  • the amplitude of the reference signal 772 only oscillates by one bit.
  • The frequency of the reference signal 772 corresponds to the frequency at which the human resting hearing threshold is most sensitive.
  • other determinations of the benchmark may be more beneficial on a case-by-case basis.
  • In Fig. 5, the result of the logarithmization 770 of the spectrogram from Fig. 4 is shown by way of example. If, due to the logarithmization, a part of the logarithmized spectrogram lies in the negative value range, these negative spectral or amplitude values are set to 0 dB, in order to avoid non-meaningful results in the further processing and to obtain positive values over the entire frequency range.
  • The logarithmized spectral values are shown in the same manner as in Fig. 4, i.e. arranged in a matrix spanned by the time t and the frequency f and gray-scaled depending on the value, the darker the greater the respective spectral value.
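  • A sketch of the logarithmization 770, assuming the reference value p0 has been obtained, as described above, from the spectrogram of the 1-LSB reference signal; negative levels are clamped to 0 dB as stated in the text:

      import numpy as np

      def log_spectrogram(S, p0):
          """Map linear magnitudes to sound pressure levels in dB,
          setting negative results to 0 dB."""
          L = 20.0 * np.log10(np.maximum(S, 1e-12) / p0)
          return np.maximum(L, 0.0)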
  • Human loudness evaluation is frequency-dependent. Therefore, the logarithmized spectrum resulting from the logarithmization 770 must be evaluated in a subsequent step 772 in order to account for this frequency-dependent evaluation by humans. For this purpose, the curves of equal loudness 774 are used.
  • The evaluation 772 is necessary in order to match the different amplitude evaluation of musical sounds across the frequency scale to human perception, since, according to human perception, amplitude values at low frequencies are weighted less strongly than amplitudes at higher frequencies.
  • For example, the equal-loudness curves 774 are present in the device 304 in analytic form; of course, it would also be possible to provide a look-up table which assigns a volume level value to each pair of frequency bin and sound pressure level quantization value.
  • For the curve with the lowest volume level, for example, a formula for the resting hearing threshold may be used, whose function parameters can be adapted so that the curve matches the lowest equal-loudness curve of the above-mentioned DIN standard.
  • In step 772, means 304 thus maps each logarithmized spectral value of Fig. 5, depending on the frequency f or frequency bin to which it belongs and on its value, which represents the sound pressure level, to a perception-related spectral value representing the volume level.
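  • A sketch of the evaluation 772; phon_lookup is a stand-in for the analytic equal-loudness curves or the look-up table mentioned above and is assumed to interpolate a volume level from frequency and sound pressure level:

      import numpy as np

      def perceptual_spectrogram(L_db, bin_freqs_hz, phon_lookup):
          """Map each SPL value to a loudness-level value, bin by bin."""
          P = np.empty_like(L_db)
          for k, f in enumerate(bin_freqs_hz):
              for t in range(L_db.shape[1]):
                  P[k, t] = phon_lookup(f, L_db[k, t])
          return P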
  • steps 770-774 illustrate possible substeps of step 754 of FIG. 2.
  • After the evaluation 772 of the spectrum, the method of Fig. 3 proceeds in a step 776 with a fundamental frequency determination, i.e. with the calculation of the total intensity of each sound in the audio signal.
  • In step 776, the intensities of each fundamental tone and the associated harmonics are added up.
  • A sound consists of a fundamental tone and the corresponding partial tones, the partial tones being integer multiples of the fundamental frequency of the sound.
  • The partial tones or overtones are also called harmonics.
  • A harmonic grid 778 is used in step 776 in order to search, for each possible fundamental tone, i.e. each frequency bin, for overtones that are an integer multiple of the respective fundamental tone.
  • In other words, to each frequency bin of a fundamental, further frequency bins corresponding to an integer multiple of that frequency bin are assigned as harmonic frequencies.
  • In step 776, the intensities in the spectrogram of the audio signal at the respective fundamental tone and its harmonics are then added up for all possible fundamental tone frequencies.
  • In doing so, a weighting of the individual intensity values is carried out, since, due to several sounds occurring simultaneously in a piece of music, there is the possibility that the fundamental tone of one sound is obscured by an overtone of another sound with a lower-frequency fundamental tone; overtones of one sound can likewise be obscured by overtones of another sound.
  • For this purpose, a tone model based on the principle of the model of Masataka Goto and adapted to the spectral resolution of the frequency analysis 752 is used in step 776; the tone model of Goto is described in Goto, M.: A Robust Predominant-F0 Estimation Method for Real-time Detection of Melody and Bass Lines in CD Recordings, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Istanbul, Turkey, 2000.
  • The harmonic grid 778 assigns the respective harmonic frequencies to each frequency band or frequency bin.
  • Overtones are sought only for fundamental frequencies in a specific frequency range, for example from 80 Hz to 4,100 Hz, and harmonics are considered only up to the 15th order.
  • the overtones of different sounds can be assigned to the sound model of several fundamental frequencies.
  • the amplitude ratio of a sought sound can be changed significantly.
  • For this reason, the amplitudes of the partial tones are weighted with a halved Gaussian filter, the fundamental tone receiving the highest weight.
  • The summation of the intensity values is not performed directly on the perception-related spectrum from step 772. Rather, in step 776, the perception-related spectrum of Fig. 8 is first delogarithmized with the aid of the reference value from step 770. The result is a delogarithmized perception-related spectrum, i.e. an array of delogarithmized perception-related spectral values, one for each tuple of frequency bin and frame.
  • the result of step 776 is a sound spectrogram, step 776 itself corresponding to a level addition within the spectrogram of the audio signal.
  • The result of step 776 is entered, for example, in a new matrix having one row for each frequency bin within the frequency range of possible fundamental frequencies and one column for each frame, wherein in each matrix element, i.e. at each crossing of column and row, the result of the summation for the corresponding frequency bin as fundamental tone is entered.
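  • A sketch of the sound summation of step 776 on the delogarithmized perception-related spectrum S_lin; the exact shape of the halved Gaussian weighting over the harmonic order is an assumption, since the text only states that the fundamental tone receives the highest weight:

      import numpy as np

      def sound_spectrogram(S_lin, bin_freqs_hz, f_min=80.0, f_max=4100.0,
                            max_order=15, sigma=4.0):
          """Sum, per frame, the weighted intensities of each candidate
          fundamental and its harmonics (harmonic grid 778)."""
          bin_freqs_hz = np.asarray(bin_freqs_hz)
          weights = np.exp(-0.5 * (np.arange(max_order) / sigma) ** 2)
          out = np.zeros_like(S_lin)
          for k, f0 in enumerate(bin_freqs_hz):
              if not (f_min <= f0 <= f_max):
                  continue
              for m, w in enumerate(weights, start=1):
                  if m * f0 > bin_freqs_hz[-1]:
                      break
                  hb = np.argmin(np.abs(bin_freqs_hz - m * f0))
                  out[k] += w * S_lin[hb]
          return out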
  • In a step 780, a preliminary determination of a potential melody line is then made.
  • The melody line corresponds to a function over time, namely a function that assigns exactly one frequency band or frequency bin to each frame.
  • The melody line determined in step 780 thus defines a track along the domain of definition of the sound spectrogram or matrix of step 776, the track never being ambiguous along the frequency axis.
  • The determination in step 780 is made such that, for each frame, the maximum amplitude over the entire frequency range of the sound spectrogram, i.e. the largest summation value, is determined.
  • The result, i.e. the melody line, largely corresponds to the basic course of the melody of the piece of music underlying the audio signal 302.
  • The evaluation of the spectrogram with the equal-loudness curves in step 772 and the search for the maximum-intensity sound in step 780 take into account the musicological statement that the main melody is that portion of a piece of music that a person perceives as loudest and most concise.
  • steps 776-780 represent possible substeps of step 758 of FIG. 2.
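  • Step 780 then reduces to a per-frame argmax over the sound spectrogram; a minimal sketch:

      import numpy as np

      def potential_melody_line(sound_spec):
          """One frequency bin per frame: the bin with the maximum
          summed sound intensity (step 780)."""
          return np.argmax(sound_spec, axis=0)   # shape: (n_frames,)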
  • A general segmentation is first performed in a step 782, which ensures that parts of the potential melody line which prima facie cannot belong to the actual melody line are eliminated.
  • In Fig. 9, the result of the melody line determination of step 780 is shown by way of example for the case of the perception-related spectrum of Fig. 8. Fig. 9 shows the melody line plotted over the time t, i.e. over the sequence of frames, along the x-axis, with the frequency f and the frequency bins indicated along the y-axis.
  • the melody line of step 780 is represented in the form of a binary image array, which is also sometimes referred to as a melody matrix and has one row for each frequency bin and one column for each frame.
  • All points of the array where the melody line is not located have a value of 0 or are white, while the points of the array where the melody line is located have a value of 1 or are black. These points are thus located on frequency bin and frame tuples, which are associated with each other by the melody line function of step 780.
  • The general segmentation 782 begins in a step 786 with a filtering of the melody line 784 in the frequency/time domain, in a representation in which the melody line 784, as shown in Fig. 9, is represented as a binary track in an array spanned by the frequency bins on the one side and the frames on the other side.
  • Let the pixel array of Fig. 9 be an x-by-y pixel array, where x is the number of frames and y is the number of frequency bins.
  • Step 786 is now intended to remove smaller outliers or artifacts in the melody line.
  • 11 shows by way of example in schematic form a possible course of a melody line 784 in a representation according to FIG. 9.
  • The pixel array shows regions 788 in which there are isolated black pixel elements; these sections of the potential melody line 784, due to their temporal brevity, certainly do not belong to the actual melody and are therefore to be removed.
  • To this end, a second pixel array is first generated from the pixel array of Fig. 9 or Fig. 11, in which the melody line is represented in binary form, by entering for each pixel a value that results from the summation of the binary values at the corresponding pixel and at the pixels adjacent to that pixel.
  • Fig. 12a shows an exemplary section of the course of a melody line in the binary image of Fig. 9 or Fig. 11.
  • The exemplary section of Fig. 12a comprises five rows, corresponding to different frequency bins 1-5, and five columns A-E, corresponding to different adjacent frames.
  • For example, the frequency bin 4 is assigned to the frame B by the melody line, the frequency bin 3 to the frame C, etc.
  • The frame A is also assigned a frequency bin by the melody line, but this bin is not among the five frequency bins of the section of Fig. 12a.
  • As already mentioned, for each pixel 790, the binary value of the pixel itself as well as the binary values of the neighboring pixels are first summed up.
  • This is exemplified in Fig. 12a for the pixel 792, where at 794 a square is drawn which surrounds the pixels adjacent to the pixel 792 as well as the pixel 792 itself.
  • For the pixel 792, a summation value of 2 results, because in the area 794 around the pixel 792 there are only two pixels belonging to the melody line, namely the pixel 792 itself and the pixel C3, i.e. at frame C and bin 3.
  • This summation is repeated by shifting region 794 for all other pixels, resulting in a second pixel image, also sometimes referred to hereinafter as an intermediate matrix.
  • This second pixel image is then subjected to a pixel-by-pixel mapping, in which all summation values of 0 or 1 are mapped to zero and all summation values greater than or equal to 2 are mapped to one.
  • The result of this mapping is shown for the exemplary case of Fig. 12a by the numbers "0" and "1" in the individual pixels 790.
  • The first pixel image, that is to say the one represented by the hatched pixels in Fig. 12a, is then multiplied pixel by pixel by the second pixel array, i.e. the one illustrated by the zeros and ones in Fig. 12a.
  • This multiplication prevents a low-pass filtering of the melody line by the filtering 786 and ensures that the uniqueness of the assignment of frequency bins to frames is preserved.
  • FIG. 12b therefore shows a further exemplary section from the melody matrix of FIG. 9 or FIG. 11.
  • In the case of Fig. 12b, the combination of summation and threshold value mapping results in an intermediate matrix in which two individual pixels P4 and R2 have a binary value of 0, although at these pixel positions the melody matrix has a binary value of 1, as can be seen from the hatching in Fig. 12b, which is intended to illustrate that the melody line passes through these pixel positions.
  • These isolated "outliers" of the melody line are therefore removed by the filtering in step 786 after multiplication.
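  • A sketch of the filtering 786, assuming SciPy is available for the neighborhood summation; M is the binary melody matrix (bins x frames):

      import numpy as np
      from scipy.ndimage import convolve

      def filter_melody_matrix(M):
          """Step 786: remove isolated melody line pixels."""
          sums = convolve(M, np.ones((3, 3)), mode='constant')  # 2nd image
          keep = (sums >= 2).astype(M.dtype)   # threshold mapping
          return M * keep                      # pixel-wise multiplication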
  • Step 786 is followed by a step 796, in which portions of the melody line 784 are removed by neglecting those portions of the melody line that are not within a predetermined frequency range.
  • In other words, the value range of the melody line function from step 780 is restricted to the predetermined frequency range.
  • Still in other words, all the pixels of the melody matrix of Fig. 9 or Fig. 11 which lie outside the predetermined frequency range are set to zero.
  • In the case of a polyphonic analysis, such a frequency range is, for example, 100-200 Hz to 1,000-1,100 Hz, and preferably 150-1,050 Hz.
  • In the case of a monophonic analysis, as described with reference to Figs. 27 ff., such a frequency range is, for example, 50-150 Hz to 1,000-1,100 Hz, and preferably 80-1,050 Hz. Limiting the frequency range to this bandwidth accommodates the observation that melodies in popular music are usually carried by vocals, which, like human speech, lie in this frequency range.
  • Fig. 13 shows the melody line filtered by step 786 and clipped by step 796, denoted by reference numeral 802 in Fig. 13 for discrimination.
  • In a subsequent step 804, the extraction device 304 uses the logarithmized spectrum of Fig. 5 from step 770. More specifically, for each tuple of frequency bin and frame through which the melody line 802 passes, the extraction device 304 looks up the corresponding logarithmized spectral value in the logarithmized spectrum of Fig. 5 and determines whether this spectral value is less than a predetermined percentage of the maximum amplitude, i.e. of the maximum logarithmized spectral value, in the logarithmized spectrum of Fig. 5.
  • In the case of a polyphonic analysis, this percentage is preferably between 50 and 70%, and preferably 60%, while in the case of a monophonic analysis it is preferably between 20 and 40%, and preferably 30%.
  • Parts of the melody line 802 for which this is the case are neglected. This approach takes into account the fact that a melody usually has approximately the same volume, and that sudden extreme volume fluctuations are not to be expected.
  • In other words, in step 804 all those pixels of the melody matrix of Fig. 9 or Fig. 11 are set to zero at which the logarithmized spectral values are less than the predetermined percentage of the maximum logarithmized spectral value.
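  • A sketch of steps 796 and 804 combined, using the polyphonic defaults named above; marking dropped frames with -1 is an implementation convention assumed here, not taken from the text:

      import numpy as np

      def clip_and_threshold(line, bin_freqs_hz, L_db,
                             f_lo=150.0, f_hi=1050.0, rel=0.60):
          """Drop melody points outside the frequency range (step 796) or
          quieter than rel * global maximum log value (step 804)."""
          out = np.asarray(line).copy()
          L_max = L_db.max()
          for t, b in enumerate(out):
              if b < 0:
                  continue
              f = bin_freqs_hz[b]
              if not (f_lo <= f <= f_hi) or L_db[b, t] < rel * L_max:
                  out[t] = -1
          return out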
  • Step 804 is followed, in a step 806, by a separation of those sections of the remaining melody line at which the course of the melody line changes abruptly in the frequency direction, enclosing only a short, supposedly uniform melody section. This, too, is illustrated below.
  • Fig. 14 shows a portion of the melody matrix across consecutive frames A-M, with the frames arranged in columns and the frequency increasing from bottom to top along the column direction.
  • the frequency bin resolution is not shown in FIG. 14 for the sake of clarity.
  • the melody line is indicated by the reference numeral 808 in FIG. 14 by way of example.
  • The melody line 808 remains constant at one frequency bin in the frames A-D, and then shows, between the frames D and E, a frequency jump that is greater than a semitone distance HT.
  • In the frames E-H, the melody line 808 then again remains constant at one frequency bin, in order then to fall back from frame H to frame I by more than one semitone distance HT.
  • Such a frequency jump greater than a semitone distance HT also occurs between the frames J and K.
  • the melody line 808 between frames J and M again remains constant on a frequency bin.
  • The device 304 now scans the melody line frame by frame, for example from front to back. For each frame, the device 304 checks whether a frequency jump greater than the semitone distance HT takes place between this frame and the subsequent frame. If so, means 304 marks this frame. In Fig. 14, the result of this marking is illustrated by the corresponding frames being surrounded by a circle, here the frames D, H and J. In a second step, the means 304 checks between which of the marked frames fewer than a predetermined number of frames are arranged, the predetermined number in the present case preferably being three.
  • In this way, portions of the melody line 808 are found which are bounded on both sides by frequency jumps of more than a semitone and are only a few frames long. In the present exemplary case, there are three frames between the marked frames D and H; this means nothing other than that the melody line 808 does not jump by more than a semitone over the frames E-H. However, there is only one frame between the marked frames H and J; this means that in the area of the frames I and J the melody line 808 jumps by more than one semitone both forward and backward in the time direction. This section of the melody line 808, namely in the area of the frames I and J, is therefore neglected in the subsequent processing of the melody line.
  • In the melody matrix, the corresponding melody line elements at the frames I and J are accordingly set to zero, i.e. they turn white.
  • This exclusion can therefore comprise at most three consecutive frames, which corresponds to 24 ms.
  • Since tones shorter than 30 ms rarely occur in today's music, the exclusion in step 806 does not lead to a deterioration of the transcription result.
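  • A sketch of step 806, where jump_bins is the number of frequency bins spanning one semitone distance HT and, as above, -1 marks frames without a melody point:

      import numpy as np

      def separate_jumps(line, jump_bins, min_gap=3):
          """Discard short sections enclosed between two frequency jumps
          of more than a semitone (step 806)."""
          marks = [t for t in range(len(line) - 1)
                   if line[t] >= 0 and line[t + 1] >= 0
                   and abs(line[t + 1] - line[t]) > jump_bins]
          out = np.asarray(line).copy()
          for a, b in zip(marks, marks[1:]):
              if b - a - 1 < min_gap:       # fewer than 3 frames between
                  out[a + 1 : b + 1] = -1   # drop the enclosed section
          return out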
  • After step 806, processing within the general segmentation 782 proceeds to a step 810, where the device 304 divides the remaining remnants of the former potential melody line from step 780 into a sequence of segments. In this division into segments, all elements in the melody matrix that are directly adjacent to one another are combined into one segment or trajectory. To illustrate this, Fig. 15 shows a section of the melody line 812 as it results after step 806.
  • Only the individual matrix elements 814 from the melody matrix along which the melody line 812 extends are shown in Fig. 15. To determine which matrix elements 814 are to be combined into segments, the device 304 scans them in the following manner. The device 304 first checks whether the melody matrix has a marked matrix element 814 at all for a first frame. If not, the device 304 proceeds to the next frame and again checks for the presence of a corresponding matrix element. Otherwise, i.e. if there is a matrix element that is part of the melody line 812, the device 304 checks the next frame for the presence of a matrix element that is part of the melody line 812. If so, means 304 further checks whether that matrix element is directly adjacent to the matrix element of the previous frame.
  • If it is, means 304 checks for the presence of a neighborhood relationship also for the next frame. Otherwise, i.e. in the absence of a neighborhood relationship, the currently recognized segment ends at the previous frame, and a new segment begins at the current frame.
  • The section of the melody line 812 shown in Fig. 15 represents a single segment (shown incompletely), in which all the matrix elements 814 that are part of the melody line, or along which it runs, are immediately adjacent to one another.
  • the segments found in this way are numbered consecutively, resulting in a sequence of segments.
  • the result of the general segmentation 782 is thus a sequence of melody segments, each melody segment covering a sequence of immediately adjacent frames.
  • Within each segment, the melody line jumps from frame to frame by at most a predetermined number of frequency bins, in the present exemplary embodiment by at most one frequency bin.
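  • A sketch of the division into segments of step 810, grouping directly adjacent melody points into trajectories:

      def segments(line, max_bin_step=1):
          """Split the melody line into trajectories; consecutive frames
          belong to one segment while their bins differ by at most
          max_bin_step. Frames marked -1 carry no melody point."""
          segs, cur = [], []
          for t, b in enumerate(line):
              if b < 0:
                  if cur:
                      segs.append(cur)
                      cur = []
              elif cur and abs(b - line[t - 1]) > max_bin_step:
                  segs.append(cur)
                  cur = [(t, b)]
              else:
                  cur.append((t, b))
          if cur:
              segs.append(cur)
          return segs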
  • The next step 816 serves to close gaps between adjacent segments, in order to address the case that, due to, for example, percussive events, other sound components were inadvertently detected in the melody line determination of step 780 and subsequently filtered out in the general segmentation 782.
  • The gap closing 816 will be explained in greater detail with reference to Fig. 16; it relies on a semitone vector, which is determined in a step 818, the determination of the semitone vector being explained in more detail with reference to Fig. 17.
  • Fig. 17 shows the gappy melody line 812 resulting from the general segmentation 782, plotted in the melody matrix.
  • For the determination of the semitone vector in step 818, the device 304 first determines which frequency bins the melody line 812 passes through, and how often, i.e. in how many frames.
  • The result of this approach, illustrated at 820, is a histogram 822 indicating, for each frequency bin f, how often it is traversed by the melody line 812, i.e. how many matrix elements of the melody matrix that are part of the melody line 812 are arranged at the respective frequency bin.
  • From this histogram, device 304 determines in a step 824 the frequency bin with the maximum frequency of occurrence; this is indicated by an arrow 826 in Fig. 17. Starting from this frequency bin 826 with the frequency f0, the device 304 then determines a vector of frequencies fi whose spacings from one another, and above all from the frequency f0, correspond to integer multiples of a semitone distance HT.
  • The frequencies in the semitone vector will hereinafter be referred to as semitone frequencies.
  • Reference will also be made below to semitone boundary frequencies; these are located exactly between adjacent semitone frequencies, i.e. exactly centered between them.
  • A semitone interval, as is customary in music, is defined as a frequency ratio of 2^(1/12) relative to the reference frequency f0.
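  • A sketch of the determination of the semitone vector of step 818: a histogram over the visited bins yields f0, and the semitone frequencies and semitone boundary frequencies follow from powers of 2^(1/12):

      import numpy as np

      def semitone_vector(line, bin_freqs_hz, n_down=24, n_up=24):
          """Semitone frequencies around the most visited frequency f0;
          boundary frequencies lie a quarter tone below each of them."""
          visited = [b for b in line if b >= 0]
          f0 = bin_freqs_hz[np.bincount(visited).argmax()]
          k = np.arange(-n_down, n_up + 1)
          semitones = f0 * 2.0 ** (k / 12.0)
          boundaries = semitones * 2.0 ** (-1.0 / 24.0)
          return semitones, boundaries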
  • FIG. 18 shows a section of the melody matrix with a section of the melody line 812.
  • the melody line 812 has a gap 832 between two segments 812a and 812b, of which the segment 812a is the aforementioned reference segment.
  • the gap in the exemplary case of FIG. 18 is six frames.
  • In a step 836, it is checked whether the facing segment ends of the reference segment 812a and the successor segment 812b, i.e. the end of the segment 812a and the beginning of the successor segment 812b, lie in the same semitone area or in semitone areas adjacent to each other.
  • In Fig. 18, the frequency axis f is divided into the semitone areas determined in step 818; as can be seen, in the case of Fig. 18 the facing segment ends of the segments 812a and 812b lie in one and the same semitone area 838.
  • In a step 840, means 304 looks up, in the perception-related spectrum of step 772, the respective perception-related spectral values at the positions of the end of the segment 812a and the beginning of the segment 812b and determines the absolute value of the difference of the two spectral values. Further, means 304 determines in step 840 whether this difference is greater than a predetermined threshold r, which is preferably 20-40%, and preferably 30%, of the perception-related spectral value at the end of the reference segment 812a.
  • If this is not the case, means 304 determines, in a step 842, a gap closure line 844 in the melody matrix which directly connects the end of the reference segment 812a and the beginning of the successor segment 812b.
  • The gap closure line is preferably rectilinear, as also shown in Fig. 18. More specifically, the connecting line 844 is a function over the frames across which the gap 832 extends, the function assigning a frequency bin to each of these frames so that the desired connecting line 844 results in the melody matrix.
  • Along this gap closure line 844, means 304 determines the corresponding perception-related spectral values from the perception-related spectrum of step 772 by looking up the respective frequency bin and frame tuples of the gap closure line 844. Over these perception-related spectral values along the gap closure line, means 304 forms the mean value and compares it, in the context of step 842, with the corresponding mean values of the perception-related spectral values along the reference segment 812a and the successor segment 812b.
  • If this comparison turns out positively, the gap 832 is closed in a step 846 by entering the gap closure line 844 into the melody matrix, i.e. by setting the corresponding matrix elements thereof to 1.
  • In addition, in step 846 the list of segments is changed in order to merge the segments 812a and 812b into a common segment, whereupon the gap closing for the reference segment and the successor segment is completed.
  • A gap closure along the gap closure line 844 also occurs if it is determined at step 830 that the gap 832 is less than 4 frames long.
  • In this case 848, the gap 832 is closed, as in the case of step 846, along a direct and preferably straight gap closure line 844 connecting the facing ends of the segments 812a and 812b, whereupon the gap closing for the two segments is finished and processing continues with the subsequent segment, as far as one exists.
  • Optionally, the gap closure in step 848 is additionally made dependent on a condition corresponding to that of step 836, i.e. that the two mutually facing segment ends lie in the same or in adjacent semitone areas.
  • The result of the gap closing 816 is thus a possibly shortened list of segments, i.e. a melody line which may contain gap closure lines in the melody matrix in some places. As was apparent from the previous discussion, at a gap of less than 4 frames a connection between adjacent segments in the same or in adjacent semitone areas is always made.
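  • A sketch of the gap closing 816 for one pair of adjacent segments; area_of_bin(b) is assumed to return the index of the semitone area containing bin b, and the exact form of the mean-value criterion of step 842 is an assumption, since the text only states that the mean along the closure line is compared with the segment means:

      def try_close_gap(seg_a, seg_b, P, area_of_bin, r=0.30, short=4):
          """seg_a/seg_b: lists of (frame, bin); P: perception-related
          spectrum (bins x frames). Returns merged segment or None."""
          (ta, ba), (tb, bb) = seg_a[-1], seg_b[0]
          if abs(area_of_bin(ba) - area_of_bin(bb)) > 1:
              return None                      # step 836 fails
          line = [(t, round(ba + (bb - ba) * (t - ta) / (tb - ta)))
                  for t in range(ta + 1, tb)]  # straight closure line 844
          if tb - ta - 1 < short:              # step 830: short gap, close
              return seg_a + line + seg_b
          if abs(P[ba, ta] - P[bb, tb]) > r * P[ba, ta]:
              return None                      # step 840: loudness jump
          mean = lambda s: sum(P[b, t] for t, b in s) / len(s)
          if line and mean(line) < 0.8 * min(mean(seg_a), mean(seg_b)):
              return None                      # assumed form of step 842
          return seg_a + line + seg_b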
The gap closure 816 is followed by a harmony mapping 850, which is intended to eliminate errors in the melody line that have arisen, in the determination of the potential melody line 780, by determining the wrong root note of a sound.
The harmony mapping 850 operates segment by segment in order to shift individual segments of the melody line resulting from the gap closure 816 by an octave, a fifth or a major third, as will be described in more detail below. As the following description will show, the conditions for this are strict, in order not to erroneously shift a segment to a wrong frequency.
The harmony mapping 850 will now be described in more detail with reference to FIGS. 19 and 20.
FIG. 20 shows by way of example a section of the melody line as it appears after the gap closure 816. This melody line is provided with the reference numeral 852 in FIG. 20; in the section shown, three segments of the melody line 852 can be seen, namely the segments 852a-c.
The representation of the melody line again takes place as a trace in the melody matrix, it being recalled, however, that the melody line 852 is a function which uniquely assigns a single frequency bin to individual frames, though not necessarily to all of them, so that the traces shown in FIG. 20 result.
The segment 852b located between the segments 852a and 852c appears to be cut out of the melody line progression that would result from the segments 852a and 852c. In the example, the segment 852b follows the reference segment 852a without a frame gap, as indicated by a dashed line 854. Likewise, the time range covered by the segment 852b is directly adjacent to the time range covered by the segment 852c, as indicated by a dashed line 856.
In FIG. 20, further lines are shown dashed, dash-dotted and dash-dot-dotted in the melody matrix, i.e. in the time/frequency representation, which result from a parallel displacement of the segment 852b along the frequency axis f. The dashed line 858b corresponds to the segment 852b shifted down by twelve semitones in the frequency direction f, i.e. by one octave. Relative to this line 858b, a third line 858c is shown dash-dotted, and a fifth line 858d, i.e. a line shifted by seven semitones towards higher frequencies relative to the line 858b, is shown as a dash-dot-dot line.
In FIG. 20, the segment 852b appears to have been erroneously detected in the context of the melody line determination 780, since it would fit far less abruptly between the adjacent segments 852a and 852c if shifted one octave lower.
The task of the harmony mapping 850 is therefore to check whether such "outliers" should be shifted or not, since such frequency jumps occur only rarely in a melody.
The harmony mapping 850 begins in a step 860 with the determination of a melody centroid line by means of a mean value filter. Step 860 comprises calculating a moving average of the melody line 852 in the time direction t over the segments, with a window length of 80-120 frames, and preferably 100 frames, at the above-mentioned frame length of 8 ms, i.e. a correspondingly different number of frames at a different frame length.
To this end, a window with a length of, for example, 100 frames is shifted frame by frame along the time axis t. All frequency bins assigned by the melody line 852 to frames within the filter window are averaged, and the average value is entered for the frame in the middle of the filter window, successive filter windows thus overlapping. The result is a melody centroid line 862, i.e. a function that uniquely assigns a frequency to the individual frames.
The melody centroid line 862 may extend over the entire time range of the audio signal, in which case the filter window must be "squashed" accordingly at the beginning and end of the piece, or only over a range which is spaced from the beginning and the end of the audio piece by half the filter window width.
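A moving-average filter of this kind is simple to state. The sketch below assumes the melody line is given as a per-frame array of frequency bins, with NaN for frames not covered by any segment, and uses the illustrative window of 100 frames; the function and variable names are assumptions.

import numpy as np

def melody_centroid_line(melody_bins, window=100):
    """Hedged sketch of step 860: moving average over the melody line."""
    n, half = len(melody_bins), window // 2
    centroid = np.full(n, np.nan)
    for t in range(n):
        # "Squash" the window at the borders of the piece.
        vals = melody_bins[max(0, t - half):min(n, t + half)]
        vals = vals[~np.isnan(vals)]
        if vals.size:
            centroid[t] = vals.mean()       # mean bin within the filter window
    return centroid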
In a subsequent step 864, the device 304 checks whether the reference segment 852a adjoins the successor segment 852b along the time axis t. If this is not the case, the processing is performed again with the subsequent segment as the reference segment (866). Otherwise, in a step 868, the successor segment 852b is virtually shifted in order to obtain the octave, fifth and/or third lines 858a-d.
The selection of major thirds, fifths and octaves is advantageous in pop music, since a major chord is usually used there, in which the highest and the lowest tone of a chord have a spacing of a major third plus a minor third, i.e. of a fifth. The above procedure is, of course, also applicable to minor keys, in which chords of a minor third followed by a major third occur.
In a step 870, means 304 looks up, in the spectrum evaluated with curves of equal loudness, i.e. the perceptual spectrum from step 772, the minimum perceptual spectral value along the reference segment 852a and along each of the octave, fifth and/or third lines 858a-d. These minimum values are used in the subsequent step 872 to select one or none of the octave, fifth and/or third shift lines 858a-d, depending on whether the minimum value of the respective line has a predetermined relation to the minimum value of the reference segment.
In particular, the octave line 858b is selected among the lines 858a-858d if its minimum value is at most 30% smaller than the minimum value for the reference segment 852a. The fifth line 858d is selected if the minimum value determined for it is at most 2.5% smaller than the minimum value of the reference segment 852a. The third line 858c is used if the corresponding minimum value for that line is at least 10% greater than the minimum value for the reference segment 852a.
In a step 874, the device 304 then shifts the segment 852b to the selected line 858a-858d, if one was selected in step 872, provided that the shift points in the direction of the melody centroid line 862 as viewed from the successor segment 852b.
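The selection rules of steps 870-874 can be summarized as follows. The priority order octave before fifth before third is taken from the remark on priority rankings further below; everything else, including the dictionary interface, is an illustrative assumption.

def select_harmony_shift(ref_min, line_mins):
    """Hedged sketch of step 872: choose the octave, fifth or third line.

    ref_min:   minimum perceptual value along the reference segment.
    line_mins: dict mapping 'octave', 'fifth', 'third' to the minimum
               perceptual value along the respective shifted line.
    Returns the chosen line name, or None if no shift is allowed.
    """
    if line_mins.get('octave', float('-inf')) >= 0.7 * ref_min:
        return 'octave'        # at most 30% smaller than the reference minimum
    if line_mins.get('fifth', float('-inf')) >= 0.975 * ref_min:
        return 'fifth'         # at most 2.5% smaller
    if line_mins.get('third', float('-inf')) >= 1.1 * ref_min:
        return 'third'         # at least 10% greater
    return None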
Following the harmony mapping 850, a vibrato detection and a vibrato compensation take place in a step 876, the mode of operation of which is explained in more detail with reference to FIGS. 21 and 22. Step 876 is performed segment by segment for each segment 878 of the melody line as it results after the harmony mapping 850. In FIG. 22, an exemplary segment 878 is shown enlarged, in a representation in which the horizontal axis corresponds to the time axis and the vertical axis to the frequency axis, as was the case in the previous figures.
In the context of the vibrato detection 876, the reference segment 878 is first examined for local extrema. The melody line function, and thus also the part of it corresponding to the segment of interest, unambiguously maps the frames over this segment onto frequency bins, thereby forming the segment 878, and it is this segment function that is examined for local extrema.
More precisely, in step 880 the reference segment 878 is examined for those locations at which it has local extrema along the time axis with respect to the frequency direction, i.e. locations at which the slope of the melody line function is zero. These locations are indicated in FIG. 22 by way of example with vertical lines 882.
In a subsequent step 884 it is checked whether the extremal locations 882 are arranged such that temporally adjacent local extrema 882 are located at frequency bins whose frequency spacing is greater than or equal to a predetermined number of bins, for example 15 to 25, but preferably 22 bins, for the implementation of the frequency analysis described with reference to FIG. 4, i.e. a number of bins per semitone area of about 2 to 6.
In FIG. 22, the length of 22 frequency bins is shown by way of example with a double arrow 886. In the example of FIG. 22, the extrema 882 satisfy the criterion of step 884.
In a subsequent step 888, the device 304 checks whether the time interval between adjacent extrema 882 is always less than or equal to a predetermined number of time frames, the predetermined number being, for example, 21.
If the check in step 888 is positive, as is the case in the example of FIG. 22, as indicated by the double arrow 890, which corresponds to the length of 21 frames, it is checked in a step 892 whether the number of extrema 882 is greater than or equal to a predetermined number, which is preferably 5 in the present case. In the example of FIG. 22, this is the case. If the check in step 892 is also positive, then in a subsequent step 894 the reference segment 878, i.e. the detected vibrato, is replaced by its mean value. The result of step 894 is indicated at 896 in FIG. 22.
In particular, in step 894 the reference segment 878 is removed from the current melody line and replaced by a reference segment 896 which extends over the same frames as the reference segment 878, but along a constant frequency bin corresponding to the average of the frequency bins through which the replaced reference segment 878 passed. If the result of one of the tests 884, 888 and 892 is negative, the vibrato detection and compensation ends for the relevant reference segment.
In other words, the vibrato detection and vibrato compensation of FIG. 21 perform vibrato detection by stepwise feature extraction, searching for local extrema, namely local minima and maxima, with a limitation on the number of allowed frequency bins of the modulation and a limitation on the modulation rate in the time direction.
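The three tests 884, 888 and 892 and the replacement 894 translate into a compact per-segment routine. The sketch below treats extrema as strict sign changes of the slope and is an assumption-laden reading, not the patent's exact implementation.

import numpy as np

def vibrato_compensate(seg_bins, min_bin_swing=22, max_frame_dist=21,
                       min_extrema=5):
    """Hedged sketch of steps 880-894 for one segment (per-frame bins)."""
    # Step 880: local extrema as strict sign changes of the slope.
    extrema = [i for i in range(1, len(seg_bins) - 1)
               if (seg_bins[i] - seg_bins[i - 1])
                  * (seg_bins[i + 1] - seg_bins[i]) < 0]
    # Step 892: enough extrema must exist.
    if len(extrema) < min_extrema:
        return seg_bins
    for a, b in zip(extrema, extrema[1:]):
        # Step 884: adjacent extrema at least min_bin_swing bins apart.
        if abs(int(seg_bins[a]) - int(seg_bins[b])) < min_bin_swing:
            return seg_bins
        # Step 888: adjacent extrema at most max_frame_dist frames apart.
        if b - a > max_frame_dist:
            return seg_bins
    # Step 894: replace the detected vibrato by its mean frequency bin.
    mean_bin = int(round(float(np.mean(seg_bins))))
    return np.full(len(seg_bins), mean_bin, dtype=int)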
After the vibrato detection and compensation in step 876, a statistical correction is performed in step 898, which takes account of the observation that short and extreme pitch fluctuations are not to be expected in a melody.
The statistical correction of step 898 is explained in more detail with reference to FIG. 23, which shows, by way of example, a detail of a melody line 900 as may arise after the vibrato detection 876. Again, the course of the melody line 900 is shown entered in the melody matrix, which is spanned by the frequency axis f and the time axis t.
In the statistical correction 898, a melody centroid line for the melody line 900 is first determined, similar to step 860 of the harmony mapping. In particular, a window 902 of predetermined time length, e.g. again 100 frames, is shifted frame by frame along the time axis t in order to compute, frame by frame, the average of the frequency bins through which the melody line 900 passes within the window 902, the mean value being assigned, as a frequency bin, to the frame in the middle of the window 902; this yields a point 904 of the melody centroid line to be determined. The resulting melody centroid line is indicated by reference numeral 906 in FIG. 23.
Subsequently, a second window, having, for example, a window length of 170 frames, is shifted frame by frame along the time axis t. Within this window, the standard deviation of the melody line 900 relative to the melody centroid line 906 is determined.
The resulting standard deviation for each frame is multiplied by 2 and increased by 1 bin. This value is then added to, and subtracted from, the frequency bin through which the melody centroid line 906 passes at the respective frame, in order to obtain an upper and a lower standard deviation line 908a and 908b, respectively.
The two standard deviation lines 908a and 908b define an allowed area 910 between them. All segments of the melody line 900 that lie completely outside this allowed area 910 are now removed, so that the result of the statistical correction 898 is a reduction in the number of segments.
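Reusing the melody_centroid_line sketch from above, the statistical correction could look as follows. The 170-frame window, the factor 2 and the 1-bin margin are from the text; the handling of NaN frames and window borders is an assumption.

import numpy as np

def statistical_correction(segments, melody_bins, center_window=100,
                           std_window=170):
    """Hedged sketch of step 898: drop segments far from the centroid line.

    segments:    list of segments, each a list of (frame, bin) tuples.
    melody_bins: per-frame bin array of the whole melody line (NaN = none).
    """
    center = melody_centroid_line(melody_bins, window=center_window)
    n, half = len(melody_bins), std_window // 2
    upper = np.full(n, np.nan)
    lower = np.full(n, np.nan)
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half)
        dev = melody_bins[lo:hi] - center[lo:hi]
        dev = dev[~np.isnan(dev)]
        if dev.size:
            margin = 2.0 * dev.std() + 1.0   # 2 * standard deviation + 1 bin
            upper[t] = center[t] + margin
            lower[t] = center[t] - margin
    # Keep a segment unless it lies completely outside the allowed area 910.
    return [seg for seg in segments
            if any(not np.isnan(lower[f]) and lower[f] <= b <= upper[f]
                   for f, b in seg)]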
Step 898 is followed by a semitone mapping 912. The semitone mapping is performed frame by frame by resorting to the semitone vector determined in step 818, which defines the semitone frequencies.
In particular, the semitone mapping 912 functions such that, for each frame at which the melody line resulting from step 898 is present, it is checked in which of the semitone areas the frequency bin lies onto which the melody line function maps the respective frame. The melody line is then changed such that, in the respective frame, it is set to the frequency value corresponding to the semitone frequency of that semitone area.
Alternatively, a segment-wise semitone quantization can also be carried out, for example by assigning only the frequency mean value per segment to one of the semitone areas, and thus to the corresponding semitone area frequency, in the previously described manner; this frequency is then used over the entire time length of the corresponding segment.
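A minimal sketch of the frame-wise semitone quantization, assuming the semitone vector of step 818 is given as an ascending array of semitone frequencies and that nearest-center assignment is equivalent to the area-boundary test used in the patent:

import numpy as np

def semitone_map(seg_bins, bin_to_freq, semitone_freqs):
    """Hedged sketch of step 912: quantize each frame to a semitone frequency."""
    semitone_freqs = np.asarray(semitone_freqs, dtype=float)
    quantized = np.empty(len(seg_bins))
    for i, b in enumerate(seg_bins):
        f = bin_to_freq(b)                       # frequency of the melody bin
        # Nearest semitone frequency; assumed equivalent to the area test.
        quantized[i] = semitone_freqs[np.argmin(np.abs(semitone_freqs - f))]
    return quantized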
Steps 782, 816, 818, 850, 876, 898 and 912 thus correspond to step 760 of FIG. 2.
The goal of the onset detection and correction 914 is to correct, with respect to their starting times, the individual segments of the melody line resulting from the semitone mapping 912, which correspond more and more to the individual notes of the searched melody.
For this purpose, the incoming audio signal 302, as provided in step 750, is used again, as will be described in more detail below.
In a step 916, the audio signal 302 is first filtered with a bandpass filter corresponding to the semitone frequency to which the respective reference segment was quantized in step 912, i.e. with a bandpass filter having cutoff frequencies between which the quantized semitone frequency of the respective segment lies. Preferably, a bandpass filter is used whose cutoff frequencies correspond to the semitone cutoff frequencies fu and fo of the semitone area in which the considered segment is located. In particular, an IIR bandpass filter having the cutoff frequencies fu and fo of the respective semitone area as filter cutoff frequencies is used, for example a Butterworth bandpass filter whose transfer function is shown in FIG.
In a subsequent step 918, a two-way rectification of the audio signal filtered in step 916 is performed. Thereupon, in a step 920, the time signal obtained in step 918 is interpolated, and the interpolated time signal is convolved with a Hamming window, whereby an envelope of the two-way rectified, filtered audio signal is determined.
Steps 916-920 are illustrated once again with reference to FIG. 26, which shows at 922 the two-way rectified audio signal as it results after step 918, in a graph plotting the time t horizontally in arbitrary units and the amplitude A of the audio signal vertically in arbitrary units. The graph also shows the envelope 924 resulting in step 920. Steps 916-920 are, of course, only one way of generating the envelope 924 and can be varied.
Such envelopes 924 are generated from the audio signal for all those semitone frequencies, i.e. semitone areas, in which segments, i.e. note segments, of the current melody line are arranged. For each such envelope 924, the following steps of FIG. 24 are then performed.
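Under the assumption that SciPy is available, steps 916-920 could be sketched as follows; the filter order and the Hamming window length are illustrative choices, the patent fixes neither here.

import numpy as np
from scipy.signal import butter, lfilter

def note_envelope(audio, sr, f_lo, f_hi, order=4, hamming_len=1024):
    """Hedged sketch of steps 916-920: envelope of the bandpassed signal.

    audio:      1-D array of audio samples.
    sr:         sample rate in Hz.
    f_lo, f_hi: cutoff frequencies fu and fo of the semitone area.
    """
    # Step 916: IIR (Butterworth) bandpass around the quantized semitone.
    b, a = butter(order, [f_lo / (sr / 2.0), f_hi / (sr / 2.0)], btype='band')
    filtered = lfilter(b, a, audio)
    # Step 918: two-way (full-wave) rectification.
    rectified = np.abs(filtered)
    # Step 920: smoothing with a normalized Hamming window yields the envelope.
    window = np.hamming(hamming_len)
    return np.convolve(rectified, window / window.sum(), mode='same')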
In a step 926, potential start times are first determined as the locations of locally maximal rise of the envelope 924; in other words, inflection points of the envelope 924 are determined in step 926. In the case of FIG. 26, the timings of the inflection points are illustrated with vertical bars 928.
For the following evaluation of the determined potential start times, i.e. potential rises, a downsampling to the time resolution of the preprocessing is carried out, possibly in the context of step 926, which is not shown in FIG. 24. It should be noted that in step 926 not all potential start times, i.e. not all inflection points, have to be determined, nor must all determined potential start times be supplied to the subsequent processing. Rather, it is possible to determine, or to further process, only those inflection points as potential start times which are arranged in temporal proximity before, or within, the time range of one of the segments of the melody line located in the semitone area underlying the determination of the envelope 924.
In a step 928 it is now checked, for a potential start time, whether it lies before the segment start of the corresponding segment. If so, processing continues at step 930. Otherwise, i.e. if the potential start time lies after the existing start of the segment, step 928 is repeated for a next potential start time, or step 926 for a next envelope determined for another semitone area, or the onset detection and correction is performed for a next segment.
In step 930, a check is made as to whether the potential start time lies more than x frames before the beginning of the corresponding segment, where x is, for example, between 8 and 12, and preferably 10 at a frame length of 8 ms, the values changing correspondingly for other frame lengths. If this is not the case, i.e. if the potential start time lies at most 10 frames before the segment of interest, the gap between the potential start time and the previous segment start is closed, i.e. the previous start of the segment is corrected to the potential start time, in a step 932. If necessary, the predecessor segment is correspondingly shortened, i.e. its segment end is changed to the frame before the potential start time. In other words, step 932 comprises extending the reference segment forward to the potential start time and, if necessary, shortening the predecessor segment at its end in order to avoid an overlap of the two segments.
Otherwise, it is checked in a step 934 whether step 934 is being traversed for the first time for this potential start time. If this is not the case, the processing for this potential start time and the relevant segment ends here, and the onset detection continues in step 928 for another potential start time, or in step 926 for a further envelope.
Otherwise, in a step 936, the previous segment start of the segment of interest is virtually moved forward. In doing so, the perceptual spectral values located at the virtually shifted segment start times are looked up in the perceptual spectrum of step 772. If the drop of these perceptual spectral values exceeds a certain value, the frame at which this drop has taken place is provisionally used as the segment start of the reference segment, and step 930 is repeated. If the potential start time then lies not more than x frames before the segment start determined in step 936, the gap is likewise closed in step 932, as described above.
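The decision logic of steps 928-936 can be condensed as in the following sketch. The drop threshold of step 936 is only "a certain value" in the text, so the concrete factor here is an assumption, and the shortening of the predecessor segment is omitted for brevity.

def correct_onset(seg, potential_starts, perceptual_spec, x=10, drop=0.5):
    """Hedged sketch of steps 928-936 for one segment.

    seg:              time-ordered list of (frame, bin) tuples.
    potential_starts: frames of locally maximal envelope rise (step 926).
    perceptual_spec:  2-D array indexed as [bin, frame].
    """
    seg_start, seg_bin = seg[0]
    for p in sorted(potential_starts):
        if p >= seg_start:
            continue                        # step 928: must lie before the start
        if seg_start - p <= x:              # step 930: close enough
            return [(f, seg_bin) for f in range(p, seg_start)] + seg  # step 932
        # Step 936: virtually move the start forward, frame by frame, and
        # look for a sufficient drop of the perceptual spectral value.
        ref = perceptual_spec[seg_bin, seg_start]
        for f in range(seg_start - 1, p - 1, -1):
            if perceptual_spec[seg_bin, f] < drop * ref:
                if f - p <= x:              # step 930, second pass
                    return [(g, seg_bin) for g in range(p, seg_start)] + seg
                break                       # step 934: only one retry
    return seg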
The effect of the onset detection and correction 914 is thus that individual segments of the current melody line are changed in their time extent, namely extended towards the front or shortened at the back.
In the subsequent length segmentation 938, all segments of the melody line, which after the semitone mapping 912 appear in the melody matrix as horizontal lines lying at semitone frequencies, are scanned through, and those segments which are shorter than a predetermined length are removed from the melody line. For example, segments are removed that are less than 10-14 frames long, preferably 12 frames and less, again assuming the above frame length of 8 ms, or adjusting the numbers of frames accordingly. 12 frames at 8 ms correspond to 96 ms, which is less than about a 1/64 note.
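As a sketch, assuming segments are given as per-frame lists, the length segmentation is a single filter:

def length_segmentation(segments, min_frames=13):
    # Keep only segments of at least min_frames frames; the default removes
    # segments of 12 frames and less (96 ms at a frame length of 8 ms).
    return [seg for seg in segments if len(seg) >= min_frames]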
Steps 914 and 938 thus correspond to step 762 of FIG. 2.
The melody line obtained in step 938 then consists of a somewhat reduced number of segments, each having one and the same semitone frequency over a certain number of consecutive frames. These segments can uniquely be assigned to musical notes. This melody line is then converted into a note representation, i.e. into a MIDI file, in a step 940 corresponding to the above-described step 764 of FIG. 2.
In particular, each segment still contained in the melody line after the length segmentation 938 is examined to find the first frame of the respective segment; this frame determines the note start time of the note corresponding to that segment. The note length is then determined from the number of frames over which the corresponding segment extends. The quantized pitch of the note results from the semitone frequency which, owing to step 912, is constant within each segment.
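The conversion of step 940 then reads off one note per segment; the Note container and the frame length used here are assumptions.

from dataclasses import dataclass

@dataclass
class Note:                      # illustrative container, not from the patent
    onset_s: float               # note start time in seconds
    duration_s: float            # note length in seconds
    freq_hz: float               # quantized semitone frequency

def segments_to_notes(segments, frame_len_s=0.008):
    """Hedged sketch of step 940: one note per semitone-constant segment.

    segments: list of (first_frame, n_frames, semitone_freq_hz) triples.
    """
    return [Note(first * frame_len_s, n * frame_len_s, f)
            for first, n, f in segments]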
The MIDI output 940 by means 304 then provides the note sequence on the basis of which the rhythm means 306 performs the operations described above.
If the audio signal 302 is of a monophonic type, as is the case, for example, when a melody is whistled or hummed in advance for the generation of ringtones, as described above, a procedure slightly modified with respect to that of FIG. 3 may be preferred, since it can avoid errors which the approach of FIG. 3 may produce due to musical inadequacies in the monophonic audio signal 302. FIG. 27 shows this alternative mode of operation of the device 304, which is preferable to the approach of FIG. 3 for monophonic audio signals, but which would in principle also be applicable to polyphonic audio signals.
In the procedure of FIG. 27, a tone separation is first performed in step 950.
The reason for performing the tone separation in step 950 can be illustrated with reference to FIG. 29, which shows a section of the frequency/time space of the spectrogram of the audio signal.
In FIG. 29, the exemplary segment 952 has been shifted along the frequency direction f by integer multiples of its respective frequency in order to determine harmonic lines.
FIG. 29 shows only those parts of the reference segment 952 and of the corresponding overtone lines 954a-g at which the spectrogram from step 752 has spectral values exceeding an exemplary value. As can be seen, the amplitude of the fundamental of the reference segment 952, obtained in the general segmentation 782, lies consistently above this exemplary value; only the overtones arranged above it indicate an interruption approximately in the middle of the segment. The continuity of the fundamental has caused the segment not to be split into two notes in the general segmentation 782, although a note boundary is likely to exist at about the middle of the segment 952. Errors of this kind occur mainly in monophonic music, for which reason the tone separation is performed only in the case of FIG. 27.
The tone separation 950 begins in a step 958, starting from the melody line obtained in step 782, with the search for that one of the overtone lines 954a-954g along which the spectrogram obtained by the frequency analysis 752 has the most dynamic amplitude response.
FIG. 30a shows, by way of example, such an amplitude curve 960 for one of the overtone lines 954a-954g, in a graph in which the x-axis corresponds to the time axis t and the y-axis to the amplitude, i.e. the value of the spectrogram. The dynamics of the amplitude curve 960 is determined as the difference between the maximum spectral value and the minimum spectral value within the curve 960. In the following, FIG. 30a is assumed to show the amplitude curve of the spectrogram along that one of the overtone lines 954a-954g which has the greatest dynamics among all these amplitude curves.
In step 958, preferably only the overtones of the 4th through 15th order are considered.
Thereupon, in the amplitude curve with the greatest dynamics, those locations at which a local amplitude minimum falls below a predetermined threshold value are identified as potential separation points.
In the case of FIGS. 30a and 30b, only the absolute minimum 964, which of course also represents a local minimum, falls below the threshold value, illustrated in FIG. 30b by way of example with the dashed line 966.
In a step 968, among the possibly multiple potential separation points, those are then sorted out which are located in a boundary area 970 around the segment start 972 or in a boundary area 974 around the segment end 976.
In a step 978, the difference between the amplitude minimum at the minimum 964 and the mean of the amplitudes of the local maxima 980 and 982 adjacent to the minimum 964 in the amplitude curve 960 is formed. This difference is illustrated in FIG. 30b with a double arrow 984.
In a subsequent step 986 it is checked whether the difference 984 is greater than a predetermined threshold value. If this is the case, the reference segment is separated, in a step 988, at the potential separation point, i.e. the minimum 964, into two segments, one of which extends from the segment start 972 to the frame of the minimum 964 and the other between the frame following the minimum 964 and the segment end 976; the list of segments is extended accordingly. Another way of performing the separation 988 is to provide a gap between the two emerging segments, for example in the area in which the amplitude curve 960 lies below the threshold value.
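The tone separation of steps 958-988 could be sketched as follows. Note the assumptions: overtone bins are computed as integer multiples of the fundamental bin, which presumes a linear-frequency bin axis; the maxima of the two sides are used instead of the local maxima directly adjacent to the minimum; and all numeric parameters are illustrative, since the text does not fix them here.

import numpy as np

def tone_separation(seg, spectrogram, overtones=range(4, 16),
                    threshold_ratio=0.3, border=5, dip_ratio=0.2):
    """Hedged sketch of the tone separation 950 for one reference segment.

    seg:         time-ordered list of (frame, bin) tuples.
    spectrogram: 2-D array indexed as [bin, frame].
    Returns a list with one segment (no separation) or two segments.
    """
    n_bins = spectrogram.shape[0]
    # Step 958: overtone line with the most dynamic amplitude response.
    best, best_dyn = None, -1.0
    for k in overtones:
        line = np.array([spectrogram[min(b * k, n_bins - 1), f]
                         for f, b in seg])
        if line.max() - line.min() > best_dyn:
            best, best_dyn = line, line.max() - line.min()
    # Steps 962/968: local minimum below the threshold, away from the borders.
    thr = threshold_ratio * best.max()
    for i in range(max(1, border), len(best) - border):
        if best[i] <= best[i - 1] and best[i] <= best[i + 1] and best[i] < thr:
            # Steps 978/986: require a sufficient dip next to the minimum.
            side_mean = (best[:i].max() + best[i + 1:].max()) / 2.0
            if side_mean - best[i] > dip_ratio * best.max():
                return [seg[:i + 1], seg[i + 1:]]      # step 988: separate
    return [seg]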
Another problem, occurring primarily in monophonic music, is that the individual notes are subject to frequency fluctuations which complicate the subsequent segmentation. Therefore, following the tone separation 950, a tone smoothing is performed in a step 992, which is explained in more detail with reference to FIGS. 31 and 32.
FIG. 32 schematically shows, in high magnification, a segment 994 as it exists in the melody line resulting from the tone separation 950. The illustration in FIG. 32 is such that, for each frequency bin/frame tuple traversed by the segment 994, a digit is entered at the corresponding tuple; the assignment of these digits will be explained in more detail below with reference to FIG. 31. As can be seen, the segment 994 in the exemplary case of FIG. 32 varies across four frequency bins and spans 27 frames.
The purpose of the tone smoothing is now to select, among the frequency bins between which the segment 994 oscillates, the one which is to be assigned to the segment 994 constantly for all of its frames.
The tone smoothing begins in a step 996 with the initialization of a counter variable i to 1; in addition, a counter value z is initialized to 1 in a step 998. The counter variable i numbers the frames of the segment 994 from left to right in FIG. 32, while the counter value z counts over how many consecutive frames the segment 994 remains in one and the same frequency bin. The value of z for the individual frames is shown in FIG. 32 in the form of the digits representing the course of the segment 994.
In a step 1000, the counter value z is then accumulated to a sum for the frequency bin of the i-th frame of the segment; for each frequency bin in which the segment 994 oscillates, there exists such a sum, i.e. an accumulation value.
In this accumulation, the counter value could, according to a variant embodiment, be weighted, e.g. with a factor f(i), where f(i) is a function that increases monotonically with i, in order to weight more strongly the summands towards the end of a segment, since there the voice, for example, has already settled on the note, compared to the transient response at the beginning of a segment. Below the horizontal time axis, an example of such a function f(i) is shown in square brackets in FIG. 32, where i increases along the time axis and indicates which position a particular frame occupies among the frames of the considered segment. The successive values which the exemplary function assumes for successive sections, which in turn are indicated by small vertical bars along the time axis, are shown as numbers in these square brackets; the exemplary weighting function increases with i from 1 to 2.2.
In a step 1002 it is checked whether the i-th frame is the last frame of the segment 994. If this is not the case, the counter variable i is incremented in a step 1004, i.e. processing moves to the next frame. In a subsequent step 1006 it is checked whether the segment 994 lies, in the current i-th frame, in the same frequency bin as in the (i-1)-th frame. If so, the counter value z is incremented in a step 1008, whereupon processing continues again at step 1000. If, however, the segment 994 does not lie in the same frequency bin in the i-th and the (i-1)-th frame, processing continues with the re-initialization of the counter value z to 1 in step 998.
If step 1002 determines that the i-th frame is the last frame of the segment 994, an accumulated sum exists for each frequency bin in which the segment 994 resides, as shown at 1010 in FIG. 32. Upon this determination, in a step 1012, the frequency bin for which the accumulated sum 1010 is greatest is selected; in the exemplary case of FIG. 32, this is the second lowest of the four frequency bins in which the segment 994 resides. The reference segment 994 is then smoothed by replacing it with a segment in which each of the frames at which the segment 994 was located is assigned the selected frequency bin. The tone smoothing of FIG. 31 is repeated segment by segment for all segments.
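The run-length accumulation of FIG. 31 is compact in code; a minimal sketch, with the optional weighting function f(i) passed in as a parameter:

from collections import defaultdict

def tone_smooth(seg_bins, weight=lambda i: 1.0):
    """Hedged sketch of the tone smoothing of FIG. 31 for one segment."""
    sums = defaultdict(float)
    z = 1                                   # steps 996/998: run counter
    for i, b in enumerate(seg_bins):
        if i > 0 and b == seg_bins[i - 1]:
            z += 1                          # step 1008: same bin, count up
        else:
            z = 1                           # step 998: bin changed, restart
        sums[b] += z * weight(i)            # step 1000: accumulate per bin
    best_bin = max(sums, key=sums.get)      # step 1012: largest sum wins
    return [best_bin] * len(seg_bins)       # flatten to the winning bin

For example, tone_smooth([3, 3, 4, 4, 4, 3]) accumulates 1+2 plus a final 1 (sum 4) for bin 3 and 1+2+3 = 6 for bin 4, so the segment is flattened to bin 4.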
In other words, the tone smoothing serves to equalize the sliding into and out of notes from lower or higher frequencies, and accomplishes this by finding, over the time course of a tone, a value corresponding to the frequency of the settled tone.
To this end, all elements along a frequency bin are counted up, after which all the incremented elements of a frequency bin which are located on the note segment are added up; the tone is then entered, over the time extent of the note segment, in the frequency bin with the largest sum.
Following the tone smoothing 992, a statistical correction 1016 is carried out, the implementation of which corresponds to that of FIG. 3, namely in particular to step 898. The statistical correction 1016 is followed by a semitone mapping 1018, which corresponds to the semitone mapping 912 of FIG. 3 and likewise uses a semitone vector determined in a semitone vector determination 1020, which corresponds to step 818.
Steps 950, 992, 1016, 1018 and 1020 thus correspond to step 760 of FIG. 2.
The semitone mapping 1018 is followed by an onset detection and correction 1022, which essentially corresponds to that of FIG. 3, namely to step 914. Only in step 932 is it preferably prevented that gaps, i.e. separations imposed by the tone separation 950, are closed again.
The onset detection and correction 1022 is followed by an offset detection and correction 1024, which is explained in more detail below with reference to FIGS. 33-35. The offset detection and correction is used to correct the note ends; in particular, the offset detection 1024 serves to prevent the decay or reverberation in monophonic pieces of music from corrupting the note ends.
In a step 1026, the audio signal is first filtered with a bandpass filter corresponding to the semitone frequency of the respective reference segment, whereupon, in a step 1028 corresponding to step 918, the filtered audio signal is two-way rectified. Further, in step 1028, an interpolation of the rectified time signal is performed. This approach suffices in the case of the offset detection and correction to determine the envelope approximately, so that the more complicated step 920 of the onset detection can be dispensed with.
FIG. 34 shows, in a graph in which the time t is plotted in arbitrary units along the x-axis and the amplitude A in arbitrary units along the y-axis, the interpolated time signal with reference numeral 1030 and, for comparison, the envelope as it would be determined by the onset detection in step 920, with reference numeral 1032.
In a step 1034, a maximum 1040 of the interpolated time signal 1030 is determined, in particular the value of the interpolated time signal 1030 at the maximum 1040. In a subsequent step 1042, a potential note end time is then determined as the time at which the rectified audio signal, after the maximum 1040, has dropped to a predetermined percentage of the value at the maximum 1040, the percentage in step 1042 preferably being 15%. The potential note end is illustrated in FIG. 34 with a dashed line 1044.
In a subsequent step 1046 it is then checked whether the potential note end 1044 lies behind the segment end 1048 in time. If this is not the case, as shown by way of example in FIG. 34, the reference segment is shortened by the time range 1036 so as to end at the potential note end 1044. If, however, the potential note end lies behind the segment end, as shown by way of example in FIG. 35, it is checked in a step 1050 whether the time interval between the potential note end 1044 and the segment end 1048 is less than a predetermined percentage of the current segment length a, the predetermined percentage in step 1050 preferably being 25%.
If the check in step 1050 is negative, no offset correction is performed, and step 1034 and the following steps are repeated for another reference segment of the same semitone frequency, or processing is continued with step 1026 for other semitone frequencies.
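Under the reading of steps 1046-1050 given above (shorten when the 15%-decay point lies before the segment end, extend only when it lies at most 25% of the segment length behind it), the correction reduces to a few lines; all names are assumptions.

def correct_offset(seg_start, seg_end, note_end, max_rel_ext=0.25):
    """Hedged sketch of steps 1046-1050; all times are frame indices,
    note_end is the 15%-decay time determined in steps 1034-1042."""
    if note_end <= seg_end:
        return note_end                     # FIG. 34 case: cut the decaying tail
    if note_end - seg_end < max_rel_ext * (seg_end - seg_start):
        return note_end                     # FIG. 35 case: extend slightly
    return seg_end                          # step 1050 negative: no correction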
Following the offset detection and correction 1024, a length segmentation 1052 corresponding to step 938 of FIG. 3 is performed, followed by a MIDI output 1054 corresponding to step 940 of FIG. 3. Steps 1022, 1024 and 1052 thus correspond to step 762 of FIG. 2.
With regard to the preceding description, it is pointed out that it would be possible to dispense with steps 770-774, or only with steps 772 and 774, which, however, should lead to a deterioration of the melody line determination in step 780 and thus to a deterioration of the overall result of the melody extraction method as a whole.
In the fundamental frequency determination 776, a tone model according to Goto was used. Other tone models, or other weightings of the overtone components, would also be possible and could be adapted, for example, to the origin of the audio signal, as far as this origin is known, such as when, in the ringtone generation embodiment, the user is restricted to humming the melody in advance.
Furthermore, the determination of the potential melody line 780 could include assigning multiple frequency bins to the same frame, followed by a finding of multiple trajectories; this would mean allowing a selection of multiple fundamental frequencies, i.e. multiple tones, for each frame. Of course, the subsequent segmentation would then have to be carried out differently and would, in particular, be somewhat more complicated, since several trajectories or segments would have to be considered and found.
Step 806 could be transferred to the case where the melody line consists of time-overlapping trajectories if this step took place after the trajectories were identified. The identification of trajectories could proceed similarly to step 810, but modifications would have to be made such that multiple trajectories overlapping in time can also be tracked.
The gap closure could also be carried out in a similar way for those trajectories between which there is no gap in time, and the harmony mapping, too, could be performed between two temporally consecutive trajectories. Vibrato detection and/or vibrato compensation could readily be applied to a single trajectory, just as to the previously mentioned non-overlapping melody line segments. Onset detection and correction could also be applied to trajectories, and the same holds for tone separation and tone smoothing, offset detection and correction, statistical correction and length segmentation. However, allowing temporally overlapping trajectories of the melody line in the determination of step 780 would at least require that the temporal overlap of trajectories be eliminated at some point before the actual note sequence output.
An advantage of determining the potential melody line in the manner described above with reference to FIGS. 3 and 27 is that the number of segments to be examined after the general segmentation is limited in advance to the most essential ones, and that the melody line determination in step 780 itself is extremely simple and nevertheless leads to a good melody extraction, i.e. note generation or transcription.
The implementation of the general segmentation described above does not have to comprise all of the substeps 786, 796, 804 and 806, but may also comprise only a selection thereof.
In the above embodiments, the perceptual spectrum was used in steps 840 and 842. In principle, however, the logarithmized spectrum, or the spectrogram obtained directly from the frequency analysis, could also be used in these steps; the use of the perceptual spectrum has, however, given the best results in terms of melody extraction.
The same applies to step 870 of the harmony mapping. With regard to the harmony mapping, it is further pointed out that it could be provided there to apply the displacement 868 of the successor segment only in the direction of the melody centroid line, so that the second condition in step 874 could be omitted.
Uniqueness in the selection among the various octave, fifth and/or third lines could be achieved by creating a priority ranking among them, such as the octave line before the fifth line before the third line and, among lines of the same type (octave, fifth or third line), preferring the one which is closer to the original position of the successor segment.
The determination of the envelope, or of the interpolated time signal used instead in the case of the offset detection, could also be carried out differently. It is only essential that, in the onset and offset detection, recourse is had to the audio signal filtered by a bandpass filter with a transmission characteristic around the respective semitone frequency, in order to recognize the note start time on the basis of the rise of the envelope of the resulting filtered signal, and the note end on the basis of the drop of the envelope.
It is further pointed out that the flowcharts of FIGS. 8-41 illustrate the operation of the melody extractor 304, and that each of the steps represented by a block in these flowcharts may be implemented in a corresponding part of the device 304. The implementation of the individual steps can be realized in hardware, e.g. as an ASIC circuit part, or in software, as a subroutine. The inscriptions in the blocks roughly indicate the respective step to which the respective block corresponds, while the arrows between the blocks illustrate the sequence of the steps in the operation of the device 304.
Depending on the circumstances, the inventive scheme can also be implemented in software. The implementation may be on a digital storage medium, in particular a floppy disk or a CD with electronically readable control signals, which can cooperate with a programmable computer system such that the corresponding method is executed. Generally, the invention thus also consists in a computer program product with program code, stored on a machine-readable carrier, for carrying out the method according to the invention when the computer program product runs on a computer. In other words, the invention can thus be realized as a computer program with a program code for carrying out the method when the computer program runs on a computer.

Abstract

The invention is based on the finding that melody extraction, i.e. automatic transcription, can be made considerably more stable and, where applicable, even less complex if sufficient account is taken of the assumption that the main melody is that part of a piece of music which a human being perceives as the loudest and most prominent. According to the invention, the time/spectral representation, i.e. the spectrogram, of an audio signal of interest is scaled by means of curves of equal loudness (774), which reflect human loudness perception, in order to determine the melody of the audio signal on the basis of the resulting perception-related time/spectral representation.
EP05790019A 2004-10-11 2005-09-23 Procede et dispositif pour extraire une melodie servant de base a un signal audio Active EP1797552B1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102004049457A DE102004049457B3 (de) 2004-10-11 2004-10-11 Verfahren und Vorrichtung zur Extraktion einer einem Audiosignal zu Grunde liegenden Melodie
PCT/EP2005/010333 WO2006039994A2 (fr) 2004-10-11 2005-09-23 Procede et dispositif pour extraire une melodie servant de base a un signal audio

Publications (2)

Publication Number Publication Date
EP1797552A2 true EP1797552A2 (fr) 2007-06-20
EP1797552B1 EP1797552B1 (fr) 2010-04-21

Family

ID=35462427

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05790019A Active EP1797552B1 (fr) 2004-10-11 2005-09-23 Procede et dispositif pour extraire une melodie servant de base a un signal audio

Country Status (8)

Country Link
US (1) US20060075884A1 (fr)
EP (1) EP1797552B1 (fr)
JP (1) JP2008516289A (fr)
KR (1) KR20070062550A (fr)
CN (1) CN101076850A (fr)
AT (1) ATE465484T1 (fr)
DE (2) DE102004049457B3 (fr)
WO (1) WO2006039994A2 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10419541B2 (en) 2008-11-26 2019-09-17 Free Stream Media Corp. Remotely control devices over a network without authentication or registration
US10425675B2 (en) 2008-11-26 2019-09-24 Free Stream Media Corp. Discovery, access control, and communication with networked services

Families Citing this family (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1571647A1 (fr) * 2004-02-26 2005-09-07 Lg Electronics Inc. Dispositif et méthode pour le traitement d'une sonnerie
DE102004028693B4 (de) * 2004-06-14 2009-12-31 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und Verfahren zum Bestimmen eines Akkordtyps, der einem Testsignal zugrunde liegt
DE102004049477A1 (de) * 2004-10-11 2006-04-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Verfahren und Vorrichtung zur harmonischen Aufbereitung einer Melodielinie
JP4672474B2 (ja) * 2005-07-22 2011-04-20 株式会社河合楽器製作所 自動採譜装置及びプログラム
JP4948118B2 (ja) * 2005-10-25 2012-06-06 ソニー株式会社 情報処理装置、情報処理方法、およびプログラム
JP4465626B2 (ja) * 2005-11-08 2010-05-19 ソニー株式会社 情報処理装置および方法、並びにプログラム
CN101371569B (zh) * 2006-01-17 2011-07-27 皇家飞利浦电子股份有限公司 使用周期平稳工具箱检测嵌入到噪声中的电视信号的存在
US7705231B2 (en) * 2007-09-07 2010-04-27 Microsoft Corporation Automatic accompaniment for vocal melodies
WO2007119221A2 (fr) * 2006-04-18 2007-10-25 Koninklijke Philips Electronics, N.V. Procédé et appareil destinés à extraire une partition musicale d'un signal musical
US8168877B1 (en) * 2006-10-02 2012-05-01 Harman International Industries Canada Limited Musical harmony generation from polyphonic audio signals
US8283546B2 (en) * 2007-03-28 2012-10-09 Van Os Jan L Melody encoding and searching system
KR100876794B1 (ko) 2007-04-03 2009-01-09 삼성전자주식회사 이동 단말에서 음성의 명료도 향상 장치 및 방법
US8140331B2 (en) * 2007-07-06 2012-03-20 Xia Lou Feature extraction for identification and classification of audio signals
CN101398827B (zh) * 2007-09-28 2013-01-23 三星电子株式会社 用于哼唱检索的方法和装置
US9159325B2 (en) * 2007-12-31 2015-10-13 Adobe Systems Incorporated Pitch shifting frequencies
US20090193959A1 (en) * 2008-02-06 2009-08-06 Jordi Janer Mestres Audio recording analysis and rating
DE102008013172B4 (de) * 2008-03-07 2010-07-08 Neubäcker, Peter Verfahren zur klangobjektorientierten Analyse und zur notenobjektorientierten Bearbeitung polyphoner Klangaufnahmen
JP5188300B2 (ja) * 2008-07-14 2013-04-24 日本電信電話株式会社 基本周波数軌跡モデルパラメータ抽出装置、基本周波数軌跡モデルパラメータ抽出方法、プログラム及び記録媒体
US10567823B2 (en) 2008-11-26 2020-02-18 Free Stream Media Corp. Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device
US9519772B2 (en) 2008-11-26 2016-12-13 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9154942B2 (en) 2008-11-26 2015-10-06 Free Stream Media Corp. Zero configuration communication between a browser and a networked media device
US10334324B2 (en) 2008-11-26 2019-06-25 Free Stream Media Corp. Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device
US10977693B2 (en) 2008-11-26 2021-04-13 Free Stream Media Corp. Association of content identifier of audio-visual data with additional data through capture infrastructure
US8180891B1 (en) 2008-11-26 2012-05-15 Free Stream Media Corp. Discovery, access control, and communication with networked services from within a security sandbox
US9961388B2 (en) 2008-11-26 2018-05-01 David Harrison Exposure of public internet protocol addresses in an advertising exchange server to improve relevancy of advertisements
US10631068B2 (en) 2008-11-26 2020-04-21 Free Stream Media Corp. Content exposure attribution based on renderings of related content across multiple devices
US10880340B2 (en) 2008-11-26 2020-12-29 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9177540B2 (en) 2009-06-01 2015-11-03 Music Mastermind, Inc. System and method for conforming an audio input to a musical key
US8785760B2 (en) 2009-06-01 2014-07-22 Music Mastermind, Inc. System and method for applying a chain of effects to a musical composition
US9251776B2 (en) 2009-06-01 2016-02-02 Zya, Inc. System and method creating harmonizing tracks for an audio input
US8779268B2 (en) 2009-06-01 2014-07-15 Music Mastermind, Inc. System and method for producing a more harmonious musical accompaniment
US9257053B2 (en) 2009-06-01 2016-02-09 Zya, Inc. System and method for providing audio for a requested note using a render cache
US9310959B2 (en) 2009-06-01 2016-04-12 Zya, Inc. System and method for enhancing audio
US9293127B2 (en) * 2009-06-01 2016-03-22 Zya, Inc. System and method for assisting a user to create musical compositions
CN102422531B (zh) * 2009-06-29 2014-09-03 三菱电机株式会社 音频信号处理装置
CN101645268B (zh) * 2009-08-19 2012-03-14 李宋 一种演唱和演奏的计算机实时分析系统
KR101106185B1 (ko) 2010-01-19 2012-01-20 한국과학기술원 여러 음을 가진 오디오 신호에서 하모닉 구조 모델과 유동적인 길이를 갖는 분석 창을 이용한 멜로디 추출 방법 및 시스템
US8710343B2 (en) * 2011-06-09 2014-04-29 Ujam Inc. Music composition automation including song structure
CN103915093B (zh) * 2012-12-31 2019-07-30 科大讯飞股份有限公司 一种实现语音歌唱化的方法和装置
US8927846B2 (en) * 2013-03-15 2015-01-06 Exomens System and method for analysis and creation of music
US10133537B2 (en) 2014-09-25 2018-11-20 Honeywell International Inc. Method of integrating a home entertainment system with life style systems which include searching and playing music using voice commands based upon humming or singing
CN104503758A (zh) * 2014-12-24 2015-04-08 天脉聚源(北京)科技有限公司 一种生成动感音乐光圈的方法及装置
US9501568B2 (en) 2015-01-02 2016-11-22 Gracenote, Inc. Audio matching based on harmonogram
WO2017133213A1 (fr) * 2016-02-01 2017-08-10 北京小米移动软件有限公司 Procédé et dispositif d'identification d'empreinte digitale
CN107203571B (zh) * 2016-03-18 2019-08-06 腾讯科技(深圳)有限公司 歌曲旋律信息处理方法和装置
CN107301857A (zh) * 2016-04-15 2017-10-27 青岛海青科创科技发展有限公司 一种给旋律自动配伴奏的方法及系统
CN107039024A (zh) * 2017-02-10 2017-08-11 美国元源股份有限公司 乐谱数据处理方法及装置
CN107123415B (zh) * 2017-05-04 2020-12-18 吴振国 一种自动编曲方法及系统
US10249209B2 (en) * 2017-06-12 2019-04-02 Harmony Helper, LLC Real-time pitch detection for creating, practicing and sharing of musical harmonies
US11282407B2 (en) 2017-06-12 2022-03-22 Harmony Helper, LLC Teaching vocal harmonies
IL253472B (en) * 2017-07-13 2021-07-29 Melotec Ltd Method and system for performing melody recognition
WO2020094263A1 (fr) * 2018-11-05 2020-05-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Appareil et processeur de signal audio, pour fournir une représentation de signal audio traité, décodeur audio, codeur audio, procédés et programmes informatiques
CN112259063B (zh) * 2020-09-08 2023-06-16 华南理工大学 一种基于音符瞬态字典和稳态字典的多音高估计方法
CN112258932B (zh) * 2020-11-04 2022-07-19 深圳市平均律科技有限公司 一种乐器演奏辅助练习装置、方法及系统
SE544738C2 (en) * 2020-12-22 2022-11-01 Algoriffix Ab Method and system for recognising patterns in sound
CN113537102B (zh) * 2021-07-22 2023-07-07 深圳智微电子科技有限公司 一种微震信号的特征提取方法
CN114007166B (zh) * 2021-09-18 2024-02-27 北京车和家信息技术有限公司 定制声音的方法及装置、电子设备和存储介质
CN115472143A (zh) * 2022-09-13 2022-12-13 天津大学 一种调性音乐音符起始点检测与音符解码方法及装置

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04220880A (ja) * 1990-12-21 1992-08-11 Casio Comput Co Ltd 量子化装置
JP2558997B2 (ja) * 1991-12-03 1996-11-27 松下電器産業株式会社 ディジタルオーディオ信号の符号化方法
DE19526333A1 (de) * 1995-07-17 1997-01-23 Gehrer Eugen Dr Verfahren zur Erzeugung von Musik
DE19710953A1 (de) * 1997-03-17 1997-07-24 Frank Dr Rer Nat Kowalewski Verfahren und Vorrichtung zur Erkennung von Schallsignalen
JP3795201B2 (ja) * 1997-09-19 2006-07-12 大日本印刷株式会社 音響信号の符号化方法およびコンピュータ読み取り可能な記録媒体
JP4037542B2 (ja) * 1998-09-18 2008-01-23 大日本印刷株式会社 音響信号の符号化方法
JP4055336B2 (ja) * 2000-07-05 2008-03-05 日本電気株式会社 音声符号化装置及びそれに用いる音声符号化方法
US6856923B2 (en) * 2000-12-05 2005-02-15 Amusetec Co., Ltd. Method for analyzing music using sounds instruments
WO2002054715A2 (fr) * 2000-12-28 2002-07-11 Koninklijke Philips Electronics N.V. Appareil telephonique muni d'un dispositif de sonnerie
JP2002215142A (ja) * 2001-01-17 2002-07-31 Dainippon Printing Co Ltd 音響信号の符号化方法
JP2004534274A (ja) * 2001-03-23 2004-11-11 インスティチュート・フォー・インフォコム・リサーチ 内容ベースのマルチメディア情報検索で使用するためデジタル表示で音楽情報を表示する方法およびシステム
DE10117870B4 (de) * 2001-04-10 2005-06-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Verfahren und Vorrichtung zum Überführen eines Musiksignals in eine Noten-basierte Beschreibung und Verfahren und Vorrichtung zum Referenzieren eines Musiksignals in einer Datenbank
AU2002346116A1 (en) * 2001-07-20 2003-03-03 Gracenote, Inc. Automatic identification of sound recordings
US8050874B2 (en) * 2004-06-14 2011-11-01 Papadimitriou Wanda G Autonomous remaining useful life estimation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2006039994A2 *


Also Published As

Publication number Publication date
US20060075884A1 (en) 2006-04-13
DE502005009467D1 (de) 2010-06-02
WO2006039994A2 (fr) 2006-04-20
KR20070062550A (ko) 2007-06-15
EP1797552B1 (fr) 2010-04-21
WO2006039994A3 (fr) 2007-04-19
DE102004049457B3 (de) 2006-07-06
JP2008516289A (ja) 2008-05-15
ATE465484T1 (de) 2010-05-15
CN101076850A (zh) 2007-11-21

Similar Documents

Publication Publication Date Title
EP1797552B1 (fr) Procede et dispositif pour extraire une melodie servant de base a un signal audio
WO2006039995A1 (fr) Procede et dispositif pour le traitement harmonique d'une ligne melodique
WO2006039993A1 (fr) Procede et dispositif pour lisser un segment de ligne melodique
WO2006039992A1 (fr) Extraction d'une melodie sous-jacente a un signal audio
DE69907498T2 (de) Verfahren zur schnellen erfassung der tonhöhe
EP2099024B1 (fr) Procédé d'analyse orienté objet sonore et destiné au traitement orienté objet sonore de notes d'enregistrements de sons polyphoniques
WO2002073592A2 (fr) Procede et dispositif de caracterisation d'un signal et procede et dispositif de production d'un signal indexe
EP1407446A1 (fr) Procede et dispositif pour caracteriser un signal et pour produire un signal indexe
WO2005122135A1 (fr) Dispositif et procede de transformation d'un signal d'information en une representation spectrale a resolution variable
EP2180463A1 (fr) Procédé destiné à la reconnaissance de motifs de notes dans des morceaux de musique
DE102004028693B4 (de) Vorrichtung und Verfahren zum Bestimmen eines Akkordtyps, der einem Testsignal zugrunde liegt
DE102012025016B3 (de) Verfahren zur Ermittlung wenigstens zweier Einzelsignale aus wenigstens zwei Ausgangssignalen
DE102004033829B4 (de) Verfahren und Vorrichtung zur Erzeugung einer Polyphonen Melodie
EP1758096A1 (fr) Méthode et appareil pour la reconnaissance de motifs dans des enregistrements accoustiques
DE102004033867B4 (de) Verfahren und Vorrichtung zur rhythmischen Aufbereitung von Audiosignalen
EP1377924B1 (fr) Procede et dispositif permettant d'extraire une identification de signaux, procede et dispositif permettant de creer une banque de donnees a partir d'identifications de signaux, et procede et dispositif permettant de se referencer a un signal temps de recherche
DE102004022659B3 (de) Vorrichtung zum Charakterisieren eines Tonsignals
EP1743324B1 (fr) Dispositif et procede pour analyser un signal d'information
AT521870B1 (de) Verfahren zur Erkennung der Liedtonart
DE102004045097B3 (de) Verfahren zur Extraktion periodischer Signalkomponenten und Vorrichtung hierzu
DE3790442C2 (de) Einrichtung zur Berechnung eines Ähnlichkeitsgrades eines Sprachmusters
DE102019215469A1 (de) Verfahren zur Erweiterung des Tonumfangs eines Handzuginstruments
DD141356A1 (de) Verfahren zur akustischen guetebewertung von musikinstrumenten
WO2005122137A1 (fr) Dispositif et procede permettant de determiner une disposition des canaux radioelectriques a la base d'un signal audio

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20070405

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR MK YU

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20080110

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: GRACENOTE, INC.

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: GERMAN

REF Corresponds to:

Ref document number: 502005009467

Country of ref document: DE

Date of ref document: 20100602

Kind code of ref document: P

REG Reference to a national code

Ref country code: NL

Ref legal event code: VDEP

Effective date: 20100421

LTIE Lt: invalidation of european patent or patent extension

Effective date: 20100421

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100801

REG Reference to a national code

Ref country code: IE

Ref legal event code: FD4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100821

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100722

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100512

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

Ref country code: IE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100823

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

26N No opposition filed

Effective date: 20110124

BERE Be: lapsed

Owner name: GRACENOTE, INC.

Effective date: 20100930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20100930

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20100930

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20100930

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20100930

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20100923

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20100923

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20101022

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100421

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100721

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 11

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 12

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 13

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 14

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20220927

Year of fee payment: 18

Ref country code: DE

Payment date: 20220629

Year of fee payment: 18

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20220926

Year of fee payment: 18