EP1377924B1

EP1377924B1 - Method and device for extracting a signal identifier, method and device for creating a database from signal identifiers and method and device for referencing a search time signal

Info

Publication number: EP1377924B1
Application number: EP02714186A
Authority: EP
Inventors: Frank Klefenz; Karlheinz Brandenburg; Wolfgang Hirsch; Christian Uhle; Christian Richter; Andras Katai; Matthias Kaufmann
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2001-04-10
Filing date: 2002-03-12
Publication date: 2004-09-22
Anticipated expiration: 2022-03-12
Also published as: JP3934556B2; DE10117871C1; ATE277381T1; WO2002084539A3; US20040158437A1; EP1377924A2; HK1059492A1; JP2004531758A; CA2443202A1; DE50201116D1; WO2002084539A2; AU2002246109A1

Abstract

In a method of extracting a signal identifier from a time signal, the temporal occurrence of signal edges in the time signal is detected (12), wherein a signal edge has a specified temporal length. In addition, the temporal interval between two selected detected signal edges is determined (14). From the temporal interval determined, a frequency value is calculated (16), the frequency value being associated with a time of occurrence of the frequency value in the time signal so as to obtain a coordinate tuple from the frequency value and the time of occurrence for this frequency value. A signal identifier is created from a plurality of coordinate tuples (18), each coordinate tuple including a frequency value and a time of occurrence, which is why the signal identifier includes a sequence of signal identifier values reproducing the temporal form of the time signal. The extracted signal identifier is based on signal edges of the time signal and thus reproduces the temporal form of the time signal. The signal identifier is therefore characteristic of the time signal, on the one hand, and robust towards changes in the time signal, on the other hand.

Description

Die vorliegende Erfindung bezieht sich auf die Verarbeitung von Zeitsignalen, die einen harmonischen Anteil haben, und insbesondere auf das Erzeugen einer Signalkennung für ein Zeitsignal, um das Zeitsignal mittels einer Datenbank, in der eine Mehrzahl von Signalkennungen für eine Mehrzahl von Zeitsignalen gespeichert ist, beschreiben zu können.The present invention relates to processing time signals that have a harmonic component, and in particular on the generation of a signal identifier for a Time signal to get the time signal using a database in of a plurality of signal identifiers for a plurality of Time signals is stored to be able to describe.

Konzepte, durch die Zeitsignale mit einem harmonischen Anteil, wie z. B. Audiodaten, identifizierbar und referenzierbar sind, sind für viele Anwender nützlich. Insbesondere in einer Situation, in der ein Audiosignal vorliegt, dessen Titel und Autor unbekannt sind, ist es oftmals wünschenswert, herauszufinden, von wem das entsprechende Lied stammt. Ein Bedarf hierzu besteht beispielsweise, wenn der Wunsch vorhanden ist, z. B. eine CD des betreffenden Interpreten zu erwerben. Wenn das vorliegende Audiosignal lediglich den Zeitsignalinhalt umfaßt, jedoch keinen Namen über den Interpreten, den Musikverlag etc., so ist eine Identifizierung des Ursprungs des Audiosignals bzw. von wem ein Lied stammt, nicht möglich. Die einzige Hoffnung bestand dann darin, das Audiostück samt Referenzdaten bezüglich des Autors oder der Quelle, wo das Audiosignal zu erwerben ist, noch einmal zu hören, um dann den gewünschten Titel beschaffen zu können.Concepts by which time signals with a harmonic component, such as B. audio data, identifiable and referenced are useful for many users. In particular in a situation where there is an audio signal, whose title and author are unknown, it is often desirable find out from whom the corresponding song comes. There is a need for this, for example, if the Wish is present, e.g. B. a CD of the artist in question to acquire. If the present audio signal only includes the time signal content, but no name above the interpreter, the music publisher etc. is an identification the origin of the audio signal or by whom Song comes from, not possible. The only hope was then in it, the audio piece including reference data regarding the Author or the source where the audio signal can be purchased, heard again to get the title you want to be able to.

Im Internet ist es nicht möglich, Audiodaten unter Verwendung herkömmlicher Suchmaschinen zu suchen, da die Suchmaschinen lediglich mit textuellen Daten umgehen können. Audiosignale bzw. allgemeiner gesagt, Zeitsignale, die einen harmonischen Anteil haben, können durch solche Suchmaschinen nicht verarbeitet werden, wenn sie keine textuellen Suchangaben umfassen.On the internet it is not possible to use audio data conventional search engines to look for since the search engines can only deal with textual data. Audio signals or more generally, time signals that one can have a harmonious share through such search engines not be processed if they are not textual Include search information.

Ein realistischer Bestand an Audiodateien liegt bei mehreren tausend gespeicherten Audiodateien bis zu hunderttausenden von Audiodateien. Musikdatenbankinformationen können auf einem zentralen Internet-Server abgelegt sein, und potentielle Suchanfragen könnten über das Internet erfolgen. Alternativ sind bei heutigen Festplattenkapazitäten auch die zentrale Musikdatenbanken auf lokalen Festplattensystemen von Benutzern denkbar. Es ist wünschenswert, solche Musikdatenbanken durchsuchen zu können, um Referenzdaten über eine Audiodatei zu erfahren, von der lediglich die Datei selbst, jedoch keine Referenzdaten bekannt sind.A realistic inventory of audio files is several thousand saved audio files up to hundreds of thousands of audio files. Music database information can be stored on a central Internet server, and potential Search queries could be made over the Internet. Alternatively, with today's hard drive capacities the central music databases on local hard disk systems conceivable by users. It is desirable to have such music databases to be able to search for reference data via to learn an audio file, of which only the file itself, but no reference data are known.

Darüber hinaus ist es gleichermaßen wünschenswert, Musikdatenbanken unter Verwendung vorgegebener Kriterien durchsuchen zu können, die beispielsweise dahingehend lauten, ähnliche Stücke herausfinden zu können. Ähnliche Stücke sind beispielsweise die Stücke mit einer ähnlichen Melodie, einem ähnlichen Instrumentensatz, oder einfach mit ähnlichen Geräuschen, wie z. B. Meeresrauschen, Vogelgezwitscher, männliche Stimmen, weibliche Stimmen, etc.In addition, it is equally desirable to have music databases search using predetermined criteria to be able to, for example, similar To be able to find out pieces. Similar pieces are for example, the pieces with a similar melody, one similar set of instruments, or simply with similar ones Noises such as B. sound of the sea, twittering of birds, male voices, female voices, etc.

Das U.S.-Patent Nr. 5,918,223 offenbart ein Verfahren und eine Vorrichtung für eine Inhalts-basierte Analyse, Speicherung, Wiedergewinnung und Segmentierung von Audioinformationen. Dieses Verfahren beruht darauf, mehrere akustische Merkmale aus einem Audiosignal zu extrahieren. Gemessen werden Lautstärke, Baß, Tonhöhe, Brightness und Melfrequenz-basierte Cepstral-Koeffizienten in einem Zeitfenster bestimmter Länge in periodischen Intervallabständen. Jeder Meßdatensatz besteht aus einer Folge von gemessenen Merkmalsvektoren. Jede Audiodatei ist durch den kompletten Satz der pro Merkmal berechneten Merkmalsfolgen spezifiziert. Ferner werden die ersten Ableitungen für jede Folge von Merkmalsvektoren berechnet. Dann werden statistische Werte wie Mittelwert und Standardabweichung berechnet. Dieser Satz von Werten wird in einem N-Vektor, d. h. einem Vektor mit N Elementen, gespeichert. Diese Vorgehensweise wird auf eine Vielzahl von Audiodateien angewendet, um für jede Audiodatei einen N-Vektor abzuleiten. Damit wird nach und nach eine Datenbank aus einer Vielzahl von N-Vektoren aufgebaut. Aus einer unbekannten Audiodatei wird dann unter Verwendung derselben Vorgehensweise ein Such-N-Vektor extrahiert. Bei einer Suchanfrage wird dann eine Abstandsberechnung des vorgegebenen N-Vektors und der in der Datenbank gespeicherten N-Vektoren ermittelt. Schließlich wird der N-Vektor ausgegeben, der den minimalen Abstand zu dem Such-N-Vektor hat. Dem ausgegebenen N-Vektor sind Daten über den Autor, den Titel, die Beschaffungsquelle etc. zugeordnet, so daß eine Audiodatei hinsichtlich ihres Ursprungs identifiziert werden kann.U.S. Patent No. 5,918,223 discloses a method and a device for content-based analysis, storage, Recovery and segmentation of audio information. This procedure relies on several acoustic Extract features from an audio signal. Measured are volume, bass, pitch, brightness and melency-based Cepstral coefficients in a time window certain length in periodic intervals. Everyone Measurement data set consists of a sequence of measured feature vectors. Each audio file is through the complete sentence the characteristic sequences calculated for each characteristic. Furthermore, the first derivatives for each sequence of Feature vectors calculated. Then statistical values how mean and standard deviation are calculated. This Set of values is in an N vector, i.e. H. a vector with N elements, saved. This procedure is based on a variety of audio files applied to each audio file derive an N vector. This will gradually according to a database made up of a large number of N vectors. An unknown audio file then becomes Extract a search n vector using the same procedure. In the case of a search query, a distance calculation is then made of the given N vector and that in the database stored N vectors determined. Eventually the N vector is output, which is the minimum distance to that Search N vector. The output N vector is data assigned via the author, title, source of supply, etc. so an audio file regarding its origin can be identified.

Dieses Verfahren hat den Nachteil, daß mehrere Merkmale berechnet werden und willkürliche Heuristiken zur Berechnung der Kenngrößen eingeführt werden. Durch Mittelwert- und Standardabweichungsberechnungen über alle Merkmalsvektoren für eine gesamte Audiodatei wird die Information, die durch den zeitlichen Verlauf der Merkmalsvektoren gegeben ist, auf wenige Merkmalsgrößen reduziert. Dies führt zu einem hohen Informationsverlust.This method has the disadvantage that several characteristics are calculated and arbitrary heuristics to calculate of the parameters are introduced. By mean and Standard deviation calculations across all feature vectors for an entire audio file, the information generated by the time course of the feature vectors is given, reduced to a few feature sizes. This leads to one high loss of information.

Die Aufgabe der vorliegenden Erfindung besteht darin, ein Verfahren und eine Vorrichtung zum Extrahieren einer Signalkennung aus einem Zeitsignal zu schaffen, die eine aussagekräftige Kennzeichnung eines Zeitsignals ohne zu großen Informationsverlust ermöglichen.The object of the present invention is a Method and device for extracting a signal identifier to create a time signal that is meaningful Identification of a time signal without too large Enable loss of information.

Diese Aufgabe wird durch ein Verfahren zum Extrahieren einer Signalkennung aus einem Zeitsignal nach Patentanspruch 1 oder durch eine Vorrichtung zum Extrahieren einer Signalkennung aus einem Zeitsignal nach Patentanspruch 19 gelöst.This task is accomplished by a method of extracting a Signal identifier from a time signal according to claim 1 or by a device for extracting a signal identifier solved from a time signal according to claim 19.

Eine weitere Aufgabe der vorliegenden Erfindung besteht darin, ein Verfahren und eine Vorrichtung zum Erzeugen einer Datenbank aus Signalkennungen und ein Verfahren und eine Vorrichtung zum Referenzieren eines Such-Zeitsignals mittels einer solchen Datenbank zu schaffen.Another object of the present invention is therein, a method and an apparatus for generating a Database of signal identifiers and a method and a Device for referencing a search time signal by means of to create such a database.

Diese Aufgabe wird durch ein Verfahren zum Erzeugen einer Datenbank nach Patentanspruch 13, eine Vorrichtung zum Erzeugen einer Datenbank nach Patentanspruch 20, ein Verfahren zum Referenzieren eines Such-Zeitsignals nach Patentanspruch 14 oder eine Vorrichtung zum Referenzieren eines Such-Zeitsignals nach Patentanspruch 21 gelöst.This task is accomplished by a method for generating a Database according to claim 13, a device for generating a database according to claim 20, a method for referencing a search time signal according to claim 14 or a device for referencing a Search time signal according to claim 21 solved.

Der vorliegenden Erfindung liegt die Erkenntnis zugrunde, daß bei Zeitsignalen, die einen harmonischen Anteil haben, der zeitliche Verlauf des Zeitsignals verwendet werden kann, um eine Signalkennung des Zeitsignals aus dem Zeitsignal zu extrahieren, die einerseits einen guten Fingerabdruck für das Zeitsignal liefert, die jedoch andererseits hinsichtlich ihrer Datenmenge überschaubar ist, um ein effizientes Durchsuchen einer Vielzahl von Signalkennungen in einer Datenbank zu ermöglichen. Eine wesentliche Eigenschaft von Zeitsignalen mit einem harmonischen Anteil sind wiederkehrende Signalflanken in dem Zeitsignal, wobei z. B. zwei aufeinanderfolgende Signalflanken mit gleicher bzw. ähnlicher Länge die Angabe einer Periodendauer und damit einer Frequenz in dem Zeitsignal mit hoher zeitlicher und frequenzmäßiger Auflösung ermöglichen, wenn nicht nur das Vorhandensein der Signalflanken an sich, sondern auch das zeitliche Auftreten der Signalflanken in dem Zeitsignal berücksichtigt wird. Somit ist es möglich, eine Beschreibung des Zeitsignals dadurch zu erhalten, daß das Zeitsignal aus zeitlich aufeinanderfolgenden Frequenzen besteht. Am Beispiel eines Audiosignals wird das Audiosignal somit so charakterisiert, daß ein Ton, also eine Frequenz, zu einem bestimmten Zeitpunkt vorhanden ist, und daß diesem Ton, d. h. dieser Frequenz, zu einem späteren Zeitpunkt ein anderer Ton, d. h. eine andere Frequenz, folgt. The present invention is based on the finding that that with time signals that have a harmonic component, the time course of the time signal can be used can to a signal identifier of the time signal from the time signal to extract the one hand a good fingerprint for the time signal, but on the other hand is manageable in terms of their amount of data in order to be an efficient one Search a variety of signal identifiers in to enable a database. An essential quality of time signals with a harmonic component recurring signal edges in the time signal, z. B. two successive signal edges with the same or Similar length, the specification of a period and thus a frequency in the time signal with high temporal and enable frequency resolution, if not only that Presence of the signal edges per se, but also that temporal occurrence of the signal edges in the time signal is taken into account becomes. It is therefore possible to provide a description to obtain the time signal in that the time signal is off consecutive frequencies. Exemplary of an audio signal, the audio signal is thus characterized that a tone, that is, a frequency, at a certain point Time is present, and that this sound, i.e. H. this frequency, another one later Sound, d. H. another frequency follows.

Erfindungsgemäß wird somit von der Beschreibung des Zeitsignals durch eine Folge von zeitlichen Abtastwerten in eine Beschreibung des Zeitsignals durch Koordinaten-Tupel aus Frequenz und Zeitpunkt des Auftretens der Frequenz übergegangen. Die Signalkennung oder anders ausgedrückt der Merkmalsvektor (MV), der zum Beschreiben des Zeitsignals verwendet wird, umfaßt somit eine Folge von Signalkennungswerten, die je nach Ausführungsform mehr oder weniger grob den zeitlichen Verlauf des Zeitsignals wiedergibt. Das Zeitsignal wird somit nicht, wie im Stand der Technik, anhand seiner spektralen Eigenschaften charakterisiert, sondern anhand der zeitlichen Abfolge von Frequenzen in dem Zeitsignal.According to the invention, the description of the time signal by a sequence of temporal samples into one Description of the time signal using coordinate tuples Frequency and time of occurrence of the frequency passed. The signal identifier or, in other words, the feature vector (MV) used to describe the time signal thus comprises a sequence of signal identification values, the more or less roughly depending on the embodiment reproduces the time course of the time signal. The time signal is therefore not based on, as in the prior art characterized its spectral properties, but based on the temporal sequence of frequencies in the time signal.

Zur Berechnung eines Frequenzwerts aus den detektierten Signalflanken werden somit zumindest zwei detektierte Signalflanken benötigt. Die Auswahl dieser zwei Signalflanken aus den gesamten detektierten Signalflanken, auf deren Basis Frequenzwerte berechnet werden, ist vielfältig. Zunächst können zwei aufeinanderfolgende Signalflanken von im wesentlichen gleicher Länge verwendet werden. Der Frequenzwert ist dann der Kehrwert aus dem zeitlichen Abstand dieser Flanken. Alternativ kann eine Auswahl auch nach der Amplitude der detektierten Signalflanken durchgeführt werden. So können auch zwei aufeinanderfolgende Signalflanken gleicher Amplitude genommen werden, um einen Frequenzwert zu ermitteln. Es müssen jedoch nicht immer zwei aufeinanderfolgende Signalflanken genommen werden, sondern z. B. immer die zweite, dritte, vierte, ... Signalflanke gleicher Amplitude oder Länge. Schließlich sei angemerkt, daß auch zwei beliebige Signalflanken genommen werden können, um unter Verwendung statistischer Methoden und auf der Basis der Superpositionsgesetze die Koordinatentupel zu erhalten. Am Beispiel einer Flöte wird deutlich, daß ein Flötenton zwei Signalflanken mit hoher Amplitude liefert, zwischen denen sich ein Wellenberg mit niedrigerer Amplitude befindet. Um den Grundton der Flöte zu ermitteln, könnte beispielsweise eine Auswahl der zwei detektierten Signalflanken nach der Amplitude getroffen werden.To calculate a frequency value from the detected signal edges are at least two detected signal edges needed. The selection of these two signal edges the total detected signal edges, on the basis thereof Frequency values are calculated is varied. First can have two consecutive signal edges of essentially same length can be used. The frequency value is then the reciprocal of the time interval between them Flanks. Alternatively, a selection can also be made according to the amplitude of the detected signal edges. So two successive signal edges can be the same Amplitude can be taken to get a frequency value determine. However, it doesn't always have to be two consecutive Signal edges are taken, but z. B. always the second, third, fourth, ... signal edge of the same amplitude or length. Finally, it should be noted that also any two signal edges can be taken to under Using statistical methods and based on the Superposition laws to get the coordinate tuple. At the Example of a flute, it becomes clear that a flute sound is two Signal edges with high amplitude provides between those there is a wave crest with a lower amplitude. Around for example, to determine the root note of the flute a selection of the two detected signal edges after the Amplitude are taken.

Insbesondere für Audiosignale stellt die zeitliche Abfolge von Tönen die natürlichste Art und Weise der Charakterisierung dar, da, wie es am einfachsten an Musiksignalen erkennbar ist, der Wesensgehalt des Audiosignals eben in der zeitlichen Abfolge von Tönen steckt. Die unmittelbarste Empfindung die ein Hörer von einem Musiksignal erhält, ist die zeitliche Abfolge von Tönen. Nicht nur in der klassischen Musik, bei der sich Werke immer um ein bestimmtes Thema aufbauen, das sich in verschiedenen Abwandlungen durch das ganze Werk zieht, sondern auch bei Liedern der populären oder sonstigen zeitgenössischen Musik existiert eine einprägsame Melodie, die im allgemeinen aus einer Folge von einfachen Tönen besteht, wobei das Thema bzw. die einfache Melodie wesentlich die Wiedererkennungsfähigkeit unabhängig von Rhythmus, der Tonhöhe, einer eventuellen Instrumentenbegleitung etc. prägt.The temporal sequence represents in particular for audio signals of tones the most natural way of characterizing because there is the easiest way to recognize it from music signals is the essence of the audio signal in the chronological sequence of tones. The most immediate Sensation that a listener receives from a music signal is the sequence of tones. Not just in the classic Music where works are always about a certain one Build up subject that is in various modifications through the whole work, but also with songs of popular or other contemporary music exists a catchy melody that generally consists of one episode consists of simple tones, the subject or simple melody essential to the recognition ability regardless of rhythm, pitch, any instrument accompaniment etc. shapes.

Das erfindungsgemäße Konzept basiert auf dieser Erkenntnis und liefert eine Signalkennung, die aus einer zeitlichen Abfolge von Frequenzen besteht oder, je nach Ausführungsform, aus einer zeitlichen Abfolge von Frequenzen, d. h. Tönen, durch statistische Verfahren abgeleitet ist.The concept according to the invention is based on this knowledge and provides a signal identifier that consists of a temporal Sequence of frequencies exists or, depending on the embodiment, from a chronological sequence of frequencies, d. H. Tones derived from statistical methods.

Ein Vorteil der vorliegenden Erfindung besteht darin, daß die Signalkennung als zeitliche Abfolge von Frequenzen einen Fingerabdruck von hohem Informationsgehalt für Zeitsignale mit harmonischem Anteil darstellt und gewissermaßen das wesentliche oder den Kern eines Zeitsignal ausmacht.An advantage of the present invention is that the signal identifier as a chronological sequence of frequencies High information content fingerprint for time signals with a harmonic component and to a certain extent the essential or the core of a time signal.

Ein weiterer Vorteil der vorliegenden Erfindung besteht darin, daß die erfindungsgemäß extrahierte Signalkennung zwar eine starke Komprimierung des Zeitsignals darstellt, jedoch nach wie vor an den zeitlichen Verlauf des Zeitsignals angelehnt ist und damit an die natürliche Auffassung von Zeitsignalen, z. B. Musikstücken, angepasst ist. Another advantage of the present invention is in that the signal identifier extracted according to the invention represents a strong compression of the time signal, however, the timing of the time signal continues is based on the natural view of time signals, e.g. B. pieces of music is adjusted.

Ein weiterer Vorteil der vorliegenden Erfindung besteht darin, daß durch die sequentielle Natur der Signalkennung von den Abstandsberechnungs-Referenzierungsalgorithmen im Stand der Technik weggegangen werden kann und zur Referenzierung des Zeitsignals in einer Datenbank Algorithmen eingesetzt werden können, die aus der DNA-Sequenzierung bekannt sind und daß darüber hinaus auch Ähnlichkeitsberechnungen durchgeführt werden können, indem DNA-Sequenzierungsalgorithmen mit Ersetzen/Einfügen/Löschen-Operationen eingesetzt werden.Another advantage of the present invention is in that due to the sequential nature of the signal identifier from the distance calculation referencing algorithms in State of the art can be gone and for referencing the time signal used in a database algorithms can be known from DNA sequencing are and that in addition also similarity calculations can be performed using DNA sequencing algorithms with replace / insert / delete operations be used.

Ein weiterer Vorteil der vorliegenden Erfindung besteht darin, daß zum Detektieren des zeitlichen Auftretens von Signalflanken in dem Zeitsignal auf günstige Art und Weise die Hough-Transformation eingesetzt werden kann, für die aus der Bildverarbeitung und Bilderkennung effiziente Algorithmen existieren.Another advantage of the present invention is in that to detect the time occurrence of Signal edges in the time signal in a favorable manner the Hough transform can be used for that efficient algorithms from image processing and image recognition exist.

Ein weiterer Vorteil der vorliegenden Erfindung besteht darin, daß die erfindungsgemäß extrahierte Signalkennung eines Zeitsignals unabhängig davon ist, ob die Such-Signalkennung aus dem gesamten Zeitsignal oder nur aus einem Abschnitt des Zeitsignals abgeleitet ist, da gemäß den Algorithmen der DNA-Sequenzierung ein zeitlich schrittweiser Vergleich der Such-Signalkennung mit einer Referenz-Signalkennung durchgeführt werden kann, wobei aufgrund des zeitlich sequentiellen Vergleichs der Abschnitt des zu identifizierenden Zeitsignals gewissermaßen automatisch in dem Referenz-Zeitsignal dort identifiziert wird, wo die höchste Übereinstimmung zwischen Such-Signalkennung und Referenz-Signalkennung existiert.Another advantage of the present invention is in that the signal identifier extracted according to the invention of a time signal is independent of whether the search signal identifier from the entire time signal or only from one Section of the time signal is derived because according to the DNA sequencing algorithms one step at a time Comparison of the search signal identifier with a reference signal identifier can be carried out, due to the temporal sequential comparison of the section of the identifying time signal to a certain extent automatically the reference time signal is identified where the highest match between search signal identifier and Reference signal identifier exists.

Bevorzugte Ausführungsbeispiele der vorliegenden Erfindung werden nachfolgend bezugnehmend auf die beiliegenden Zeichnungen näher erläutert. Es zeigen:

Fig. 1: ein Blockschaltbild der erfindungsgemäßen Vorrichtung zum Extrahieren einer Signalkennung aus einem Zeitsignal;
Fig. 2: ein Blockschaltbild eines bevorzugten Ausführungsbeispiels, in dem eine Vorverarbeitung des Audiosignals dargestellt ist;
Fig. 3: ein Blockschaltbild eines Ausführungsbeispiels für die Signalkennungserzeugung;
Fig. 4: ein Blockschaltbild für eine erfindungsgemäße Vorrichtung zum Erzeugen einer Datenbank und zum Referenzieren eines Such-Zeitsignals in der Datenbank.
Fig. 5: graphische Darstellung eines Ausschnitts von Mozart KV 581 durch Frequenz-Zeit-Koordinaten-Tupel.

Preferred exemplary embodiments of the present invention are explained in more detail below with reference to the accompanying drawings. Show it:

Fig. 1: a block diagram of the inventive device for extracting a signal identifier from a time signal;
Fig. 2: a block diagram of a preferred embodiment, in which a preprocessing of the audio signal is shown;
Fig. 3: a block diagram of an embodiment for the signal identification generation;
Fig. 4: a block diagram for an inventive device for generating a database and for referencing a search time signal in the database.
Fig. 5: graphic representation of a section of Mozart KV 581 by frequency-time-coordinate tuple.

Fig. 1 zeigt ein Blockdiagramm einer Vorrichtung zum Extrahieren einer Signalkennung aus einem Zeitsignal. Die Vorrichtung umfaßt eine Einrichtung 12 zum Durchführen einer Signalflankendetektion, eine Einrichtung 14 zur Abstandsermittlung zwischen zwei ausgewählten detektierten Flanken, eine Einrichtung 16 zur Frequenzberechnung und eine Einrichtung 18 zur Signalkennungserzeugung unter Verwendung von aus der Einrichtung 16 zur Frequenzberechnung ausgegebenen Koordinaten-Tupeln, die jeweils einen Frequenzwert und eine Auftrittszeit für diesen Frequenzwert aufweisen.Fig. 1 shows a block diagram of an extracting device a signal identifier from a time signal. The device comprises a device 12 for performing a Signal edge detection, a device 14 for determining the distance between two selected detected edges, a device 16 for frequency calculation and a device 18 for signal identification generation using from output from the device 16 for frequency calculation Coordinate tuples, each a frequency value and have an appearance time for this frequency value.

An dieser Stelle sei darauf hingewiesen, daß, obgleich im nachfolgenden von einem Audiosignal als Zeitsignal gesprochen wird, das erfindungsgemäße Konzept nicht nur für Audiosignale geeignet ist, sondern für sämtliche Zeitsignale, die einen harmonischen Anteil haben, da die Signalkennung darauf basiert, daß ein Zeitsignal aus einer zeitlichen Abfolge von Frequenzen, am Beispiel des Audiosignals von Tönen, besteht.At this point it should be noted that, although in subsequent spoken of an audio signal as a time signal is, the concept of the invention not only for audio signals is suitable, but for all time signals, which have a harmonious part, because the signal identification is based on the fact that a time signal from a time sequence of frequencies, using the example of the audio signal of tones, consists.

Die Einrichtung 12 zum Erfassen des zeitlichen Auftretens von Signalflanken in dem Zeitsignal führt vorzugsweise eine Hough-Transformation durch.The device 12 for detecting the occurrence of time of signal edges in the time signal preferably carries one Hough transformation through.

Die Hough-Transformation ist in dem U.S.-Patent Nr. 3,069,654 von Paul V. C. Hough beschrieben. Die Hough-Transformation dient zur Erkennung von komplexen Strukturen und insbesondere zur automatischen Erkennung von komplexen Linien in Photographien oder anderen Bilddarstellungen. Die Hough-Transformation ist somit allgemein eine Technik, die verwendet werden kann, um Merkmale mit spezieller Form innerhalb eines Bildes zu extrahieren.The Hough transform is described in U.S. Patent No. 3,069,654 by Paul V. C. Hough. The Hough transformation is used to recognize complex structures and especially for the automatic detection of complex Lines in photographs or other images. The Hough transformation is thus generally a technique that can be used to create features with special shape inside extract an image.

In ihrer Anwendung gemäß der vorliegenden Erfindung wird die Hough-Transformation dazu verwendet, aus dem Zeitsignal Signalflanken mit spezifizierten zeitlichen Längen zu extrahieren. Eine Signalflanke wird zunächst durch ihre zeitliche Länge spezifiziert. Im Idealfall einer Sinuswelle wäre eine Signalflanke durch die ansteigende Flanke der Sinusfunktion von 0 bis 90° definiert. Alternativ könnte eine Signalflanke auch durch den Anstieg der Sinus-Funktion von -90° bis +90° spezifiziert werden.In their application according to the present invention the Hough transform used to do this from the time signal Extract signal edges with specified time lengths. A signal edge is initially determined by its temporal Length specified. Ideally a sine wave would be a signal edge through the rising edge of the sine function defined from 0 to 90 °. Alternatively, one could Signal edge also due to the increase in the sine function of -90 ° to + 90 ° can be specified.

Liegt das Zeitsignal als Folge von zeitlichen Abtastwerten ("Samples") vor, so entspricht die zeitliche Länge einer Signalflanke unter Berücksichtigung der Abtastfrequenz, mit der die Samples erzeugt worden sind, einer bestimmten Anzahl von Abtastwerten. Die Länge einer Signalflanke kann somit ohne weiteres durch die Angabe der Anzahl der Abtastwerte, die die Signalflanke umfassen soll, spezifiziert werden.Is the time signal as a result of time samples ("Samples"), the length of time corresponds to one Signal edge taking into account the sampling frequency, with the samples were generated, a certain number of samples. The length of a signal edge can thus by simply specifying the number of samples, specified to encompass the signal edge become.

Darüber hinaus wird es bevorzugt, eine Signalflanke nur dann als Signalflanke zu detektieren, wenn dieselbe stetig ist und einen überwiegend monotonen Verlauf hat, also im Falle einer positiven Signalflanke einen überwiegend monoton steigenden Verlauf hat. Selbstverständlich können auch negative Signalflanken, also monoton fallende Signalflanken detektiert werden.In addition, it is preferred to have a signal edge only then to be detected as a signal edge if the same is continuous and is predominantly monotonous, i.e. in In the case of a positive signal edge a predominantly monotone has increasing course. Of course you can too negative signal edges, i.e. monotonously falling signal edges can be detected.

Ein weiteres Kriterium zur Klassifizierung von Signalflanken besteht darin, daß eine Signalflanke nur dann als Signalflanke detektiert wird, wenn sie einen bestimmten Pegelbereich übersteigt. Um Rauschstörungen auszublenden, wird es bevorzugt, für eine Signalflanke einen minimalen Pegelbereich oder Amplitudenbereich vorzugeben, wobei monoton steigende Signalflanken unterhalb dieses Pegelbereichs nicht als Signalflanken detektiert werden.Another criterion for the classification of signal edges is that a signal edge is only a signal edge is detected when it has a certain level range exceeds. To hide noise disturbances, it is preferred to use a minimum for a signal edge Specify level range or amplitude range, being monotonous rising signal edges below this level range cannot be detected as signal edges.

Gemäß einem bevorzugten Ausführungsbeispiel der vorliegenden Erfindung wird zur Referenzierung von Audiosignalen eine weitere Einschränkung dahingehend getroffen, daß lediglich Signalflanken gesucht werden, deren spezifizierte zeitliche Länge größer als eine minimale Grenzlänge und kleiner als eine maximale zeitliche Grenzlänge ist. Dies bedeutet in anderen Worten ausgedrückt, daß lediglich Signalflanken gesucht werden, die auf Frequenzen kleiner als eine obere Grenzfrequenz und größer als eine untere Grenzfrequenz hinweisen. Bei Musikstücken wird es bevorzugt, lediglich Signalflanken zu detektieren, die auf Frequenzen im Frequenzbereich von 27,5 Hz (Ton A2) bis 4186 Hz (Ton c5) hinweisen. Dieser Frequenzbereich wird durch die durch ein übliches Klavier zur Verfügung gestellten Töne überstrichen. Für Signalkennungen von Musikstücken hat sich dieser Tonbereich als ausreichend herausgestellt.According to a preferred embodiment of the present Invention is used for referencing audio signals further restriction that only Signal edges are searched, their specified temporal length greater than a minimum limit length and is less than a maximum time limit. This in other words means that only signal edges are searched that are on frequencies less than an upper cutoff frequency and greater than a lower cutoff frequency Clues. It is preferred for pieces of music, just Detect signal edges based on frequencies in the Frequency range from 27.5 Hz (tone A2) to 4186 Hz (tone c5) Clues. This frequency range is defined by a The usual piano provided overtones. This has for signal identifications of pieces of music Sound range highlighted as sufficient.

Die Signalflankendetektionseinheit 12 liefert somit eine Signalflanke und den Zeitpunkt des Auftretens der Signalflanke. Hierbei ist es unerheblich, ob als Signalauftrittszeitpunkt der Signalflanke der Zeitpunkt des ersten Abtastwerts der Signalflanke, der Zeitpunkt des letzten Abtastwerts der Signalflanke oder der Zeitpunkt irgend eines Abtastwerts innerhalb der Signalflanke genommen wird, so lange Signalflanken gleich behandelt werden.The signal edge detection unit 12 thus provides one Signal edge and the time of occurrence of the signal edge. It is irrelevant whether the time of the signal occurrence the signal edge the time of the first sample the signal edge, the time of the last sample the signal edge or the time of any sample is taken within the signal edge as long Signal edges are treated the same.

Die Einrichtung 14 zum Ermitteln eines zeitlichen Abstands zwischen zwei aufeinanderfolgenden Signalflanken, deren zeitliche Längen abgesehen von einem vorbestimmten Toleranzwert gleich sind, untersucht die von der Einrichtung 12 ausgegebenen Signalflanken und extrahiert zwei aufeinanderfolgende Signalflanken, die gleich sind oder innerhalb eines bestimmten vorgegebenen Toleranzwerts im wesentlichen gleich sind. Wenn ein einfacher Sinuston betrachtet wird, so ist eine Periode des Sinustons durch den zeitlichen Abstand zweier aufeinanderfolgender gleich langer z. B. positiver Viertelwellen gegeben. Hierauf beruht die Einrichtung 16 zum Berechnen eines Frequenzwerts aus dem ermittelten zeitlichen Abstand. Der Frequenzwert entspricht dem Inversen des ermittelten zeitlichen Abstands.The device 14 for determining a time interval between two successive signal edges whose time lengths apart from a predetermined tolerance value are the same, examined by the device 12 output signal edges and extracted two consecutive Signal edges that are the same or within one certain predetermined tolerance value essentially are the same. If a simple sinus tone is considered, such is a period of the sine tone through the time interval two successive equally long z. B. more positive Given quarter waves. The facility is based on this 16 for calculating a frequency value from the determined time interval. The frequency value corresponds to the inverse the determined time interval.

Durch diese Vorgehensweise kann mit hoher zeitlicher und gleichzeitig frequenzmäßiger Auflösung eine Darstellung eines Zeitsignals durch Angabe der in dem Zeitsignal vorkommenden Frequenzen und durch Angabe der mit den Frequenzen korrespondierenden Auftrittszeitpunkten geliefert werden. Wenn die Ergebnisse der Einrichtung 16 zur Frequenzberechnung graphisch dargestellt werden, wird ein Diagramm gemäß Fig. 5 erhalten.By doing this, high time and at the same time frequency resolution a representation of a Time signal by specifying those occurring in the time signal Frequencies and by specifying the with the frequencies Corresponding performance times can be delivered. If the results of the device 16 for frequency calculation are graphically represented, is a diagram according to Fig. 5 obtained.

Fig. 5 zeigt einen Ausschnitt mit etwa 13 Sekunden Länge des Klarinettenquintetts A-Dur, Larghetto, KV 581 von Wolfgang Amadeus Mozart, wie es am Ausgang der Einrichtung 16 zur Frequenzberechnung erscheinen würde. In diesem Ausschnitt erklingt eine Klarinette, die eine melodieführende Solostimme spielt sowie ein begleitendes Streichquartett. Es ergeben sich die in Fig. 5 dargestellten Koordinaten-Tupel, wie sie durch die Einrichtung 16 zur Frequenzberechnung erzeugt werden könnten. Fig. 5 shows a section of about 13 seconds in length of the clarinet quintet in A major, Larghetto, KV 581 by Wolfgang Amadeus Mozart as it exit 16 would appear for frequency calculation. In this excerpt a clarinet sounds, which is a melody leading Solo part plays as well as an accompanying string quartet. The coordinate tuples shown in FIG. 5 result, as provided by the device 16 for frequency calculation could be generated.

Die Einrichtung 18 dient schließlich dazu, aus den Ergebnissen der Einrichtung 16 eine Signalkennung zu erzeugen, die für eine Signalkennungsdatenbank günstig und geeignet ist. Die Signalkennung wird allgemein aus einer Mehrzahl von Koordinatentupeln erzeugt, wobei jeder Koordinatentupel einen Frequenzwert und einen Auftrittszeitpunkt umfaßt, so daß die Signalkennung eine Folge von Signalkennungswerten umfaßt, die den zeitlichen Verlauf des Zeitsignals wiedergibt.The device 18 is finally used from the results to generate a signal identifier for the device 16, the cheap and suitable for a signal identification database is. The signal identifier is generally made up of a plurality generated from coordinate tuples, each coordinate tuple includes a frequency value and an occurrence time, so that the signal identifier is a sequence of signal identifier values includes the time course of the time signal.

Wie es später erläutert wird, dient die Einrichtung 18 dazu, aus dem Frequenz-Zeit-Diagramm von Fig. 5, das durch die Einrichtung 16 erzeugt werden könnte, die wesentlichen Informationen zu extrahieren, um einen Fingerabdruck des Zeitsignals zu erzeugen, der einerseits kompakt ist, und der andererseits das Zeitsignal ausreichend genau und unterscheidbar von anderen Zeitsignalen unterscheiden kann.As will be explained later, the device 18 serves to from the frequency-time diagram of Fig. 5, which by the device 16 could be generated, the essential Extract information to a fingerprint of the To generate time signal that is compact on the one hand, and on the other hand, the time signal is sufficiently precise and distinguishable can differ from other time signals.

Fig. 2 zeigt eine erfindungsgemäße Vorrichtung zum Extrahieren einer Signalkennung gemäß einem bevorzugten Ausführungsbeispiel der vorliegenden Erfindung. Als Zeitsignal wird eine Audiodatei 20 in einen Audio-I/O-Handler eingegeben. Der Audio-I/O-Handler 22 liest die Audiodatei beispielsweise von einer Festplatte. Der Audiodatenstrom kann auch direkt über eine Soundkarte eingelesen werden. Nach dem Einlesen eines Abschnitts des Audiodatenstroms schließt die Einrichtung 22 die Audiodatei wieder und lädt die nächste zu bearbeitende Audiodatei oder terminiert den Einlesevorgang. Die Folge von PCM-Abtastwerten (PCM = Puls Code Modulated), wie sie beispielsweise von einer CD erhalten werden, werden dann in eine Einrichtung 24 zur Vorverarbeitung des Audiosignals eingegeben. Die Einrichtung 24 dient einerseits dazu, falls erforderlich eine Abtastratenumwandlung durchzuführen, oder eine Lautstärkemodifikation des Audiosignals zu erreichen. Audiosignale liegen auf verschiedenen Medien in unterschiedlichen Abtastfrequenzen vor. Wie es bereits ausgeführt worden ist, wird jedoch der Zeitpunkt des Auftretens einer Signalflanke in dem Audiosignal zur Beschreibung des Audiosignals verwendet, so daß die Abtastrate bekannt sein muß, um die Auftrittszeitpunkte von Signalflanken korrekt zu detektieren, und um darüber hinaus Frequenzwerte korrekt zu detektieren. Alternativ kann eine Abtastratenumwandlung durch Dezimierung oder Interpolation durchgeführt werden, um die Audiosignale verschiedener Abtastraten auf eine gleiche Abtastrate zu bringen.2 shows an extraction device according to the invention a signal identifier according to a preferred embodiment of the present invention. As a time signal an audio file 20 is input to an audio I / O handler. The audio I / O handler 22 reads the audio file, for example from a hard drive. The audio data stream can can also be read directly via a sound card. To reading a portion of the audio data stream device 22 retrieves the audio file and loads the next one audio file to be edited or terminates the import process. The sequence of PCM samples (PCM = pulse code Modulated), as received for example from a CD are then transferred to a device 24 for preprocessing of the audio signal entered. The device 24 serves on the one hand, if necessary a sample rate conversion perform, or a volume modification of the Audio signal. Audio signals are on different Media in different sampling frequencies in front. However, as has already been stated, the Time of occurrence of a signal edge in the audio signal used to describe the audio signal so that the sampling rate must be known to the times of occurrence of signal edges to be detected correctly and above also correctly detect frequency values. alternative can do a sample rate conversion by decimation or interpolation be performed to the audio signals different Bring sampling rates to the same sampling rate.

Bei einem bevorzugten Ausführungsbeispiel der vorliegenden Erfindung, das für mehrere Abtastraten geeignet sein soll, ist daher die Einrichtung 24 vorgesehen, um eine Abtastrateneinstellung durchzuführen.In a preferred embodiment of the present Invention that is said to be suitable for multiple sampling rates the device 24 is therefore provided for setting a sampling rate perform.

Die PCM-Abtastwerte werden ferner einer automatischen Pegelanpassung unterzogen, die ebenfalls in der Einrichtung 24 vorgesehen ist. In der Einrichtung 24 wird zur automatischen Pegelanpassung in einem Look-Ahead-Buffer die mittlere Signalleistung des Audiosignals bestimmt. Der Audiosignalabschnitt, der zwischen zwei Signalleistungsminima liegt, wird mit einem Skalierungsfaktor multipliziert, der das Produkt aus einem Gewichtungsfaktor und dem Quotienten aus Vollausschlag und maximalem Pegel innerhalb des Segments ist. Die Länge des Look-Ahead-Buffers ist variabel.The PCM samples also become automatic Level adjustment also undergone in the facility 24 is provided. In the device 24 is for automatic Level adjustment in a look-ahead buffer is the middle one Signal power of the audio signal determined. The audio signal section, the between two signal power minima is multiplied by a scaling factor that the product of a weighting factor and the quotient from full scale and maximum level within the segment is. The length of the look-ahead buffer is variable.

Anschließend wird das derart vorverarbeitete Audiosignal in die Einrichtung 12 eingespeist, die eine Signalflankendetektion durchführt, wie sie bezugnehmend auf Fig. 1 beschrieben worden ist. Bevorzugterweise wird hierzu die Hough-Transformation verwendet. Eine schaltungstechnische Realisierung der Hough-Transformation ist in der WO 99/26167 offenbart.The audio signal preprocessed in this way is then converted into the device 12 is fed by a signal edge detection performs as described with reference to FIG. 1 has been. For this purpose, the Hough transform used. A circuit technology Realization of the Hough transformation is in the WO 99/26167.

Die durch die Hough-Transformation ermittelte Amplitude einer Signalflanke und der Detektionszeitpunkt einer Signalflanke werden dann in die Einrichtung 14 von Fig. 1 übergeben. In dieser Einheit werden jeweils zwei aufeinanderfolgende Detektionszeitpunkte voneinander subtrahiert, wobei der Kehrwert der Differenz der Auftrittszeiten als Frequenzwert angenommen wird. Diese Aufgabe wird durch die Einrichtung 16 aus Fig. 1 bewirkt und führt, wenn ein Musikstück entsprechend bearbeitet wird, zu dem Frequenz-Zeit-Diagramm von Fig. 5, in der die erhaltenen Frequenz-Zeit-Koordinaten-Tupel graphisch dargestellt sind, die durch Mozart, Köchel-Verzeichnis 581, erhalten werden.The amplitude of one determined by the Hough transformation Signal edge and the time of detection of a signal edge are then transferred to the device 14 of FIG. 1. In this unit there are two successive ones Subtract detection times from each other, where the reciprocal of the difference in performance times as a frequency value Is accepted. This task is accomplished by the 1 causes and leads when a piece of music is processed accordingly to the frequency-time diagram of Fig. 5, in which the obtained frequency-time coordinate tuple are graphically represented by Mozart, Köchel-Directory 581.

Erfindungsgemäß könnte die Darstellung von Fig. 5 bereits als Signalkennung für das Zeitsignal verwendet werden, da die zeitliche Folge der Koordinaten-Tupel den zeitlichen Verlauf des Zeitsignals wiedergibt.According to the invention, the representation of FIG. 5 could already be can be used as a signal identifier for the time signal since the temporal sequence of the coordinate tuples the temporal Reproduces the course of the time signal.

Bei einem Ausführungsbeispiel wird es jedoch bevorzugt, eine Nachbearbeitung durchzuführen, um aus dem Frequenz-Zeit-Diagramm von Fig. 5 die wesentlichen Informationen zu extrahieren, die für eine Signal-Referenzierung einen möglichst kleinen und dennoch möglichst aussagefähigen Fingerabdruck für das Zeitsignal liefern.In one embodiment, however, it is preferred to use one Postprocess to get out of the frequency-time diagram 5 to extract the essential information, one for signal referencing if possible small and yet meaningful fingerprint deliver for the time signal.

Hierzu kann die Signalkennungserzeugung 18 wie in Fig. 3 dargestellt aufgebaut sein. Die Einrichtung 18 gliedert sich in eine Einrichtung 18a zur Ermittlung der Häufungsgebiete, in eine Einrichtung 18b zur Gruppierung, in eine Einrichtung 18c zur Mittelung über einer Gruppe, in eine Einrichtung 18d zur Intervallfestlegung, in eine Einrichtung zum Quantisieren 18e und schließlich in eine Einrichtung 18f auf, um die Signalkennung für das Zeitsignal zu erhalten.For this purpose, the signal identification generator 18 can be used as in FIG. 3 be constructed shown. The device 18 is structured into a device 18a for determining the cluster areas, into a device 18b for grouping, into a Device 18c for averaging over a group, in a Device 18d for setting intervals, in a device to quantize 18e and finally into a device 18f to switch the signal identifier for the time signal receive.

Wie in Fig. 5 gut erkennbar, werden in der Einrichtung 18a zur Ermittlung der Häufungsgebiete charakteristische Verteilungspunktwolken, die als Haufen oder Cluster bezeichnet werden, herausgearbeitet. Dies geschieht, indem alle isolierten Frequenz-Zeit-Tupel gelöscht werden, die einen vorgegebenen Mindestabstand zum nächsten räumlichen Nachbarn überschreiten. Solche isolierten Frequenz-Zeit-Tupel sind beispielsweise die Punkte in der rechten oberen Ecke des Diagramms von Fig. 5. Dadurch bleibt ein sogenanntes Pitch-Contour-Streifenband übrig, das in Fig. 5 mit dem Bezugszeichen 50 skizziert ist. Das Pitch-Contour-Streifenband besteht aus Clustern bestimmter Frequenzbreite und Länge, wobei diese Cluster von gespielten Tönen hervorgerufen werden. Diese Töne sind in Fig. 5 durch waagrechte Linien, die die Ordinate schneiden, angedeutet (52), wobei bei dem hier gezeigten Beispiel die Töne h1, c2, cis2, d2 und h1 in dem Bereich zwischen etwa 6 und 10 Sekunden in der genannten Folge auftreten. Der Ton a1 hat eine Frequenz von 440 Hz. Der Ton h1 hat eine Frequenz von 494 Hz. der Ton c2 hat eine Frequenz von 523 Hz, der Ton cis2 hat eine Frequenz von 554 Hz, während der Ton d2 eine Frequenz von 587 Hz hat.As can be clearly seen in FIG. 5, the device 18a characteristic distribution point clouds to determine the cluster areas, which are referred to as clusters or clusters are worked out. This is done by isolating everyone Frequency-time tuples are deleted that have a given Minimum distance to the closest spatial neighbor exceed. Such isolated frequency-time tuples are for example, the dots in the top right corner of the 5. This leaves a so-called pitch contour strip band left that in Fig. 5 with the reference symbol 50 is outlined. The Pitch Contour strip tape consists of clusters of certain frequency latitude and longitude, whereby these clusters are caused by played notes. These tones are shown in Fig. 5 by horizontal lines intersect the ordinate, indicated (52), with this one shown example the tones h1, c2, cis2, d2 and h1 in the Range between about 6 and 10 seconds in the above Episode occur. The tone a1 has a frequency of 440 Hz. The tone h1 has a frequency of 494 Hz. The tone c2 has one Frequency of 523 Hz, the tone cis2 has a frequency of 554 Hz, while the tone d2 has a frequency of 587 Hz.

Bei polyphonen Klängen ergeben sich breitere Streifenbänder. Die Streifenbreite bei Einzeltönen hängt darüber hinaus von einem Vibrato des die Einzeltöne erzeugenden Musikinstruments ab.In the case of polyphonic sounds, there are wider stripes. The stripe width for single tones also depends of a vibrato of the musical instrument producing the single tones from.

In der Einrichtung 18b zur Gruppierung oder zur Bildung von Blöcken werden die Koordinaten-Tupel des Pitch-Contour-Streifenbandes in einem Zeitfenster von n Abtastwerten zu einem separat zu bearbeitenden Verarbeitungsblock zusammengefaßt oder gruppiert. Die Blockgröße kann dabei äquidistant oder variabel gewählt werden. Je nach Genauigkeit und zur Verfügung stehendem Speicherplatz für die Signalkennung kann eine relativ grobe Aufteilung gewählt werden, beispielsweise ein Ein-Sekunden-Raster, was über die vorliegende Abtastrate einer bestimmten Anzahl von Abtastwerten pro Block entspricht, oder eine kleinere Einteilung. Alternativ kann, um bei Musikstücken der zugrunde liegenden Notenschreibweise Rechnung zu tragen, das Raster immer so gewählt werden, daß in das Raster ein Ton fällt. Hierzu ist es erforderlich, die Länge eines Tons abzuschätzen, was durch die in Fig. 5 eingezeichnete Polynomfitfunktion 54 möglich ist. Eine Gruppe bzw. ein Block wird dann durch den zeitlichen Abstand zwischen zwei lokalen Extremwerten des Polynoms bestimmt. Diese Vorgehensweise liefert besonders bei relativ monophonen Abschnitten relativ große Gruppen von Abtastwerten, wie sie zwischen 6 und 12 Sekunden auftreten, während bei relativ polyphonen Abständen des Musikstücks, bei denen die Koordinaten-Tupel über einen größen Frequenzbereich verteilt sind, wie z. B. etwa bei 2 Sekunden in Fig. 5 oder bei 12 Sekunden von Fig. 5 kleinere Gruppen ermittelt werden, was wiederum dazu führt, daß die Signalkennung auf der Basis relativ kleiner Gruppen durchgeführt wird, so daß die Informationskompression kleiner als bei einer festen Blockbildung ist.In the device 18b for grouping or for forming Blocks become the coordinate tuples of the pitch contour strip in a time window of n samples summarized in a processing block to be processed separately or grouped. The block size can be equidistant or variable. Depending on accuracy and available storage space for the signal identification a relatively rough division can be chosen, for example a one-second grid thing about the present Sampling rate of a certain number of samples corresponds to per block, or a smaller division. alternative can, in the case of pieces of music the underlying notation To take account of the grid always chosen that a tone falls into the grid. This is it is necessary to estimate the length of a sound, what by the polynomial fit function 54 shown in FIG. 5 is possible. A group or block is then created by the temporal distance between two local extreme values of the Polynomial determined. This approach delivers particularly relatively large groups for relatively monophonic sections samples as they occur between 6 and 12 seconds, while at relatively polyphonic intervals of the piece of music, where the coordinate tuples are larger than one Frequency range are distributed, such as. B. at about 2 seconds 5 in Fig. 5 or smaller at 12 seconds from Fig. 5 Groups are determined, which in turn leads to the fact that the Signal identification carried out on the basis of relatively small groups becomes smaller, so that the information compression than with solid block formation.

In dem Block 18c zur Mittelung über einer Gruppe von Abtastwerten wird je nach Bedarf ein gewichteter Mittelwert über alle in einem Block vorhandenen Koordinaten-Tupel bestimmt. Bei dem bevorzugten Ausführungsbeispiel wurden die Tupel außerhalb des Pitch-Contour-Streifenband bereits vorher "ausgeblendet". Alternativ kann jedoch auch auf dieses Ausblenden verzichtet werden, was dazu führt, daß sämtliche durch die Einrichtung 16 berechneten Koordinaten-Tupel bei der Mittelung, die durch die Einrichtung 18c durchgeführt wird, berücksichtigt werden.In block 18c for averaging over a group of samples becomes a weighted average as needed determined over all coordinate tuples present in a block. In the preferred embodiment, the Tuples outside of the Pitch Contour strip band beforehand "Hidden". Alternatively, however, this can also be done Hide are dispensed with, which leads to all coordinate tuple calculated by the device 16 at the averaging performed by the device 18c will be taken into account.

In der Einrichtung 18d zur Intervallfestlegung wird eine Sprungweite zur Festlegung der Mitte der nächsten, d. h. zeitlich folgenden, Gruppe von Abtastwerten bestimmt.In the device 18d for setting intervals, a Jump distance to determine the middle of the next, d. H. temporally following, group of samples determined.

Es sei darauf hingewiesen, daß in der Einrichtung 18c entweder eine arithmetische, eine geometrische oder eine Median-Mittelung durchgeführt werden kann.It should be noted that in device 18c either an arithmetic, a geometric or a median averaging can be carried out.

In dem Quantisierer 18e wird der Wert, der durch die Einrichtung 18c berechnet worden ist, in nicht äquidistante Rasterwerte quantisiert. Bei Musikstücken wird es bevorzugt, die Unterteilung nach der Tonfrequenzskala durchzuführen, wobei die Tonfrequenzskala, wie es bereits ausgeführt worden ist, gemäß dem Frequenzbereich eingeteilt ist, der durch ein übliches Klavier geliefert wird und sich von 27,5 Hz (Ton A2) bis 4186 Hz (Ton c5) erstreckt und 88 Tonstufen umfaßt. Liegt der gemittelte Wert am Ausgang der Einrichtung 18c zwischen zwei benachbarten Halbtönen, so erhält er den Wert des nächstliegenden Bezugstons.In the quantizer 18e, the value generated by the device 18c has been calculated in non-equidistant Grid values quantized. For pieces of music, it is preferred carry out the division according to the audio frequency scale, being the tone frequency scale, as already stated has been classified according to the frequency range, which is supplied by a standard piano and differs from 27.5 Hz (tone A2) to 4186 Hz (tone c5) and 88 tone levels includes. Is the averaged value at the output of the Device 18c between two adjacent semitones, see above he receives the value of the closest reference tone.

Damit ergibt sich am Ausgang der Einrichtung 18e zum Quantisieren nach und nach eine Folge von quantisierten Werten, welche zusammen die Signalkennung ergeben. Je nach Bedarf können die quantisierten Werte durch die Einrichtung 18f nachverarbeitet werden, wobei eine Nachverarbeitung beispielsweise in einer Tonhöhen-Offset-Korrektur, einer Transposition in eine andere Tonskala, etc. bestehen könnte.This results in quantization at the output of the device 18e gradually a sequence of quantized values, which together make up the signal identifier. As required can the quantized values by the device 18f be post-processed, with post-processing for example in a pitch offset correction, one Transposition into another tone scale, etc. could exist.

Im nachfolgenden wird auf Fig. 4 Bezug genommen. Fig. 4 zeigt schematisch eine Vorrichtung zum Referenzieren eines Such-Zeitsignals in einer Datenbank 40, wobei die Datenbank 40 Signalkennungen einer Mehrzahl von Datenbank-Zeitsignalen Track_1 bis Track_m aufweist, die in einer vorzugsweise von der Datenbank 40 getrennten Bibliothek 42 gespeichert sind.In the following, reference is made to FIG. 4. Fig. 4 shows schematically a device for referencing a Search time signal in a database 40, the database 40 signal identifiers of a plurality of database time signals Track_1 to Track_m, which in one preferably library 42 separate from database 40 are saved.

Um ein Zeitsignal anhand der Datenbank 40 referenzieren zu können, muß die Datenbank zunächst gefüllt werden, was in einem "Lernen"-Modus erreicht werden kann. Hierzu werden Audiodateien 41 nach und nach einem Vektorgenerator 43 zugeführt, der für jede Audiodatei eine Referenz-Kennung aufweist und in der Datenbank so abspeichert, daß erkannt werden kann, zu welcher Audiodatei z. B. in der Bibliothek 42 die Signalkennung gehört.To reference a time signal based on database 40 , the database must first be filled, which in a "learning" mode can be achieved. To do this Audio files 41 are gradually fed to a vector generator 43, which has a reference identifier for each audio file and stored in the database so that they are recognized can to which audio file z. B. in library 42 the signal identifier belongs.

Gemäß der in Fig. 4 gegebenen Zuordnung entspricht die Signalkennung MV11, ...., MV1n dem Zeitsignal Track_1. Die Signal kennung MV21, ..., MV2n gehört zu dem Zeitsignal Track_2. Schließlich gehört die Signalkennung MVm1, ..., MVmn zu dem Zeitsignal Track_m.According to the assignment given in FIG. 4, the signal identifier corresponds MV11, ...., MV1n the time signal Track_1. The Signal identifier MV21, ..., MV2n belongs to the time signal Track_2. Finally, the signal identifier MVm1, ..., MVmn to the time signal Track_m.

Der Vektorgenerator 43 ist ausgebildet, um allgemein die in Fig. 1 dargestellten Funktionen durchzuführen, und ist gemäß einem bevorzugten Ausführungsbeispiel wie in den Fig. 2 und 3 dargestellt implementiert. Im "Lernen"-Modus verarbeitet der Vektorgenerator 43 nach und nach verschiedene Audiodateien (Track_1 bis Track_m), um Signalkennungen für die Zeitsignale in der Datenbank abzuspeichern, d. h. um die Datenbank zu füllen.The vector generator 43 is designed to generally perform the operations shown in FIGS Fig. 1 perform functions, and is according to a preferred embodiment as in FIG. 2nd and 3 implemented. Processed in "Learn" mode the vector generator 43 gradually different Audio files (Track_1 to Track_m) for signal identifiers for save the time signals in the database, d. H. around to fill the database.

Im "Suchen"-Modus soll eine Audiodatei 41 anhand der Datenbank 40 referenziert werden. Hierzu wird das Such-Zeitsignal 41 durch den Vektorgenerator 43 verarbeitet, um eine Such-Kennung 45 zu erzeugen. Die Such-Kennung 45 wird dann in einen DNA-Sequencer 46 eingespeist, um mit den Referenz-Kennungen in der Datenbank 40 verglichen zu werden. Der DNA-Sequencer 46 ist ferner angeordnet, um eine Aussage über das Such-Zeitsignal bezüglich der Mehrzahl von Datenbank-Zeitsignalen aus der Bibliothek 42 zu treffen. Der DNA-Sequencer sucht mit der Such-Kennung 45 die Datenbank 40 auf eine übereinstimmende Referenz-Kennung ab und übergibt einen Zeiger auf das entsprechende mit der Referenzkennung assoziierte Audiofile in der Bibliothek 42.In "search" mode, an audio file 41 is to be based on the database 40 are referenced. For this, the search time signal 41 processed by the vector generator 43 to generate a search identifier 45. The search identifier 45 will then fed into a DNA sequencer 46 to match the reference identifiers to be compared in database 40. The DNA sequencer 46 is also arranged to make a statement about the search time signal with respect to the plurality of database time signals from library 42. The DNA sequencer searches the database with the search identifier 45 40 on a matching reference identifier and passes a pointer to the corresponding one with the reference identifier associated audio files in library 42.

Der DNA-Sequencer 46 führt somit einen Vergleich der Such-Kennung 45 oder Teilen davon mit den Referenz-Kennungen in der Datenbank durch. Bei Vorliegen der vorgegebenen Folge bzw. einer Teilsequenz davon wird das zugehörige Zeitsignal in der Bibliothek 42 referenziert.The DNA sequencer 46 thus makes a comparison of the search identifier 45 or parts thereof with the reference identifiers in the database. If the given sequence is available or a partial sequence thereof becomes the associated time signal referenced in library 42.

Vorzugsweise führt der DNA-Sequencer 46 einen Boyer-Moore-Algorithmus aus, welcher beispielsweise in dem Fachbuch "Algorithms on Strings, Trees and Sequences", Dan Gusfield, Cambridge University Press, 1997, beschrieben ist. Gemäß einer ersten Alternative wird auf exakte Übereinstimmung geprüft. Das Treffen einer Aussage besteht daher darin, daß gesagt wird, daß das Such-Zeitsignal identisch zu einem Zeitsignal in der Bibliothek 42 ist. Alternativ oder zusätzlich kann auch die Ähnlichkeit zweier Sequenzen durch Verwendung von Ersetzen/Einfügen/Löschen-Operationen und einer Pitch-Offset-Korrektur (Tonhöhen-Versatzkorrektur) untersucht werden.DNA sequencer 46 preferably uses a Boyer-Moore algorithm which, for example, in the specialist book "Algorithms on Strings, Trees and Sequences", Dan Gusfield, Cambridge University Press, 1997. According to A first alternative is based on exact match checked. The conclusion of a statement is therefore that it is said that the search time signal is identical to one Time signal in library 42 is. Alternatively or additionally can also see the similarity of two sequences Use of replace / insert / delete operations and a pitch offset correction (pitch offset correction) to be examined.

Vorzugsweise ist die Datenbank 40 so strukturiert, daß sie aus der Verkettung von Signalkennungsfolgen zusammengesetzt ist, wobei das Ende jeder Vektorsignalkennung eines Zeitsignals durch ein Trennzeichen festgelegt wird, damit die Suche nicht über Zeitsignaldateigrenzen fortgesetzt wird. Werden mehrere Übereinstimmungen festgestellt, werden alle referenzierten Zeitsignale angegeben.Preferably, database 40 is structured to: composed of the chaining of signal identification sequences is, the end of each vector signal identifier of a time signal is specified by a separator so the search does not continue beyond time signal file limits. If multiple matches are found, all of them referenced time signals specified.

Durch Nutzung der Operationen Ersetzen/Einfügen/Löschen (Replace/Insert/Delete) kann ein Ähnlichkeitsmaß eingeführt werden, wobei das Zeitsignal in der Bibliothek 42 referenziert wird, das dem Such-Zeitsignal 41 anhand eines vorgegebenen Ähnlichkeitsmaßes am ähnlichsten ist. Ferner wird es bevorzugt, ein Ähnlichkeitsmaß des Such-Audio-Signals zu mehreren Signalen in der Bibliothek zu ermitteln und dann die ähnlichsten n Abschnitte in der Bibliothek 42 in Reihenfolge absteigender Ähnlichkeit auszugeben.By using the replace / insert / delete operations (Replace / Insert / Delete) a similarity measure can be introduced with the time signal referenced in library 42 is that the search time signal 41 based on a predetermined Similarity measure is most similar. Furthermore, it prefers to measure the similarity of the search audio signal and then determine multiple signals in the library the most similar n sections in library 42 in Order of descending similarity.

Claims

Method for extracting a signal identifier from a time signal having a harmonic portion, the method comprising:

detecting (12) the temporal occurrence of signal edges in the time signal;

determining (14) a temporal interval between two selected detected signal edges;

calculating (16) a frequency value from the temporal interval determined, and associating the frequency value with a time of occurrence of the frequency value in the time signal to obtain a coordinate tuple from the frequency value and the time of occurrence for this frequency value; and

creating (18) the signal identifier from a plurality of coordinate tuples, each coordinate tuple including a frequency value and a time of occurrence, whereby the signal identifier includes a sequence of signal-identifier values which reflects the temporal form of the time signal.
Method as claimed in claim 1, wherein in the step of detecting (12), a signal-flank is detected as a signal-flank only if same has, over its specified temporal length, an amplitude larger than a predetermined amplitude threshold value.
Method as claimed in claim 1 or 2,
wherein in the step of detecting (12), a signal-flank is detected as a signal-flank only if its specified temporal length is longer than a minimum cut-off length and shorter than a maximum cut-off length.
Method as claimed in claim 3, wherein the time signal is an audio signal, and wherein the minimum temporal cut-off length is specified by means of a maximum audible cut-off frequency, and the maximum temporal cut-off length is specified by means of a minimum audible cut-off frequency.
Method as claimed in claim 3, wherein the time signal is an audio signal, and wherein the minimum temporal cut-off length is specified by means of a maximum tone frequency that may be created by an instrument, and the maximum temporal cut-off length is specified by means of a minimum tone frequency which may be created by an instrument.
Method as claimed in any one of the previous claims, wherein the step of creating (18) the signal identifier comprises:

eliminating (18a) coordinate tuples spaced apart by more than a predetermined threshold distance from an adjacent coordinate tuple in a frequency-time diagram so as to determine clusters of coordinate tuples.
Method as claimed in claim 5 or 6, wherein the step of creating (18) comprises:

grouping (18b) coordinate tuples in successive temporal intervals into blocks of coordinate tuples.
Method as claimed in claim 7, wherein the successive temporal intervals have a fixed and/or a variable length.
Method as claimed in claim 7 or 8, wherein the step of creating (18) the signal identifier comprises:

averaging (18c) the frequency values of coordinate tuples in the temporal intervals to obtain a sequence of averaged frequency values for a sequence of temporal intervals, the sequence of averaged frequency values representing a feature vector.
Method as claimed in claim 9, wherein step (18) of creating the signal identifier comprises:

quantizing (18e) the feature vector to obtain a quantized feature vector.
Method as claimed in claim 10, wherein the step of quantizing (18e) is performed using non-equidistantly distributed raster points, distances between two adjacent raster points being determined in accordance with a tone-frequency scale.
Method as claimed in any one of the previous claims, wherein in step (12) of detecting signal edges, a Hough transformation is employed.
Method for creating a database (40) from reference signal identifiers for a plurality of time signals, comprising:

extracting a first signal identifier for a first time signal by the method as claimed in any one of claims 1 to 12;

extracting a second signal identifier for a second time signal by means of a method as claimed in any one of claims 1 to 12; and

storing the extracted first signal identifier in association with the first time signal in the database (40); and

storing the extracted second signal identifier in association with the second time signal in the database (40).
Method of referencing a search time signal using a database (40), the database comprising reference signal identifiers of a plurality of database time signals, a reference signal identifier of a database time signal having been determined by a method as claimed in any one of claims 1 to 12, the method comprising:

providing at least one portion of a search time signal (41);

extracting (43) a search signal identifier from the search time signal by a method as claimed in any one of claims 1 to 12; and

comparing (46) the search signal identifier with the plurality of reference signal identifiers, and, in response to the step of comparing, making a statement about the search time signal with regard to the plurality of database time signals.
Method as claimed in claim 14, wherein in the step of making a statement, a search time signal is identified as a reference time signal if the search signal identifier matches at least a portion of a reference signal identifier.
Method as claimed in claim 14, wherein in the step of making a statement, a similarity between a search time signal and a database time signal is established if the search signal identifier and/or at least a portion of database signal identifier may be made to match by means of a reproducible manipulation.
Method as claimed in any one of claims 14 to 16,
wherein the database signal identifier comprises a sequence of database signal identifier values reproducing the temporal form of the database time signal,
wherein the search signal identifier comprises a search sequence of search signal identifier values reproducing the temporal form of the search time signal,
wherein the length of the database sequence is longer than the length of the search sequence, and
wherein the search sequence is sequentially compared to the database sequence.
Method as claimed in claim 17, wherein during the sequential comparing of the search sequence with the database sequence, a correction of the values of the search and/or the database signal identifier is performed by a replace, insert or delete operation of at least one value of the search and/or the database signal identifier to determine a similarity of the search time signal and the database time signal.
Method as claimed in any one of claims 14 to 18,
wherein the step of comparing (46) is performed using a DNA sequencing algorithm and/or using the Boyer-Moore algorithm.
Apparatus for extracting a signal identifier from a time signal having a harmonic portion, the apparatus comprising:

means for detecting (12) the temporal occurrence of signal edges in the time signal;

means for determining (14) a temporal interval between two selected detected signal edges;

means for calculating (16) a frequency value from the temporal interval determined, and for associating the frequency value with a time of occurrence of the frequency value in the time signal to obtain a coordinate tuple from the frequency value and the time of occurrence for this frequency value; and

means for creating (18) the signal identifier from a plurality of coordinate tuples, each coordinate tuple including a frequency value and a time of occurrence, whereby the signal identifier includes a sequence of signal-identifier values which reflects the temporal form of the time signal.
Apparatus for creating a database (40) from reference signal identifiers for a plurality of time signals, comprising:

means for extracting a first signal identifier for a first time signal as claimed in claim 20;

means for extracting a second signal identifier for a second time signal as claimed in claim 20; and

means for storing the extracted first signal identifier in association with the first time signal in the database (40); and

means for storing the extracted second signal identifier in association with the second time signal in the database (40).
Apparatus for referencing a search time signal using a database (40), the database comprising reference signal identifiers of a plurality of database time signals, a reference signal identifier of a database time signal having been determined by a method as claimed in any one of claims 1 to 12, the apparatus comprising:

means for providing at least one portion of a search time signal (41);

means for extracting (43) a search signal identifier as claimed in claim 20; and

means for comparing (46) the search signal identifier with the plurality of reference signal identifiers, and, in response to the step of comparing, making a statement about the search time signal with regard to the plurality of database time signals.