EP1377924A2

EP1377924A2 - Method and device for extracting a signal identifier, method and device for creating a database from signal identifiers and method and device for referencing a search time signal

Info

Publication number: EP1377924A2
Application number: EP02714186A
Authority: EP
Inventors: Frank Klefenz; Karlheinz Brandenburg; Wolfgang Hirsch; Christian Uhle; Christian Richter; Andras Katai; Matthias Kaufmann
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2001-04-10
Filing date: 2002-03-12
Publication date: 2004-01-07
Anticipated expiration: 2022-03-12
Also published as: HK1059492A1; DE10117871C1; DE50201116D1; WO2002084539A2; US20040158437A1; JP3934556B2; WO2002084539A3; ATE277381T1; CA2443202A1; EP1377924B1; JP2004531758A; AU2002246109A1

Abstract

In a method of extracting a signal identifier from a time signal, the temporal occurrence of signal edges in the time signal is detected (12), wherein a signal edge has a specified temporal length. In addition, the temporal interval between two selected detected signal edges is determined (14). From the temporal interval determined, a frequency value is calculated (16), the frequency value being associated with a time of occurrence of the frequency value in the time signal so as to obtain a coordinate tuple from the frequency value and the time of occurrence for this frequency value. A signal identifier is created from a plurality of coordinate tuples (18), each coordinate tuple including a frequency value and a time of occurrence, which is why the signal identifier includes a sequence of signal identifier values reproducing the temporal form of the time signal. The extracted signal identifier is based on signal edges of the time signal and thus reproduces the temporal form of the time signal. The signal identifier is therefore characteristic of the time signal, on the one hand, and robust towards changes in the time signal, on the other hand.

Description

Method and device for extracting a signal identifier, method and device for generating a database from signal identifiers and method and device for referencing a search time signal

The present invention relates to the processing of time signals having a harmonic component, and in particular to the generation of a signal identifier for a time signal, so that the time signal can be described by means of a database in which a plurality of signal identifiers for a plurality of time signals are stored.

Concepts by which time signals with a harmonic component, such as audio data, can be identified and referenced are useful for many users. Particularly in a situation where there is an audio signal whose title and author are unknown, it is often desirable to find out who wrote the song. This need arises, for example, if one wishes to acquire a CD of the artist in question. If the audio signal at hand includes only the time signal content, but no information about the artist, the music publisher, etc., then it is not possible to identify the origin of the audio signal or who wrote the song. The only hope is then to hear the audio piece together with reference data regarding the author or the source from which the audio signal can be obtained, in order to then be able to obtain the desired title.

On the Internet, it is not possible to search for audio data using conventional search engines, since these search engines can only deal with textual data. Audio signals or, more generally, time signals that have a harmonic component cannot be processed by such search engines if they do not include any textual search information.

A realistic inventory of audio files ranges from several thousand to hundreds of thousands of stored audio files. Music database information can be stored on a central Internet server, and searches can be made over the Internet. Alternatively, with today's hard disk capacities, central music databases on users' local hard disk systems are also conceivable. It is desirable to be able to search through such music databases in order to find reference data about an audio file of which only the file itself, but no reference data, is known.

In addition, it is equally desirable to be able to search through music databases using predetermined criteria, for example in order to find similar pieces. Similar pieces are, for example, pieces with a similar melody, a similar set of instruments, or simply with similar sounds, such as the sound of the sea, twittering of birds, male voices, female voices, etc.

US Patent No. 5,918,223 discloses a method and apparatus for content-based analysis, storage, retrieval and segmentation of audio information. This method is based on extracting several acoustic features from an audio signal. Volume, bass, pitch, brightness and mel-frequency cepstral coefficients are measured in a time window of a certain length at periodic intervals. Each measurement data record consists of a sequence of measured feature vectors. Each audio file is specified by the complete set of feature sequences calculated for each feature. Furthermore, the first derivatives are calculated for each sequence of feature vectors. Then statistical values such as mean and standard deviation are calculated. This set of values is stored in an N vector, that is, a vector with N elements. This procedure is applied to a multitude of audio files to derive an N vector for each audio file. A database is thus gradually built up from a large number of N vectors. A search N vector is then extracted from an unknown audio file using the same procedure. In the case of a search query, the distance between the search N vector and the N vectors stored in the database is then calculated. Finally, the N vector which has the minimum distance to the search N vector is output. The output N vector is associated with data about the author, the title, the source of procurement, etc., so that an audio file can be identified with regard to its origin.

The disadvantage of this method is that several characteristics are calculated and arbitrary heuristics for calculating the parameters are introduced. The mean value and standard deviation calculations for all feature vectors for an entire audio file reduce the information given by the time history of the feature vectors to a few feature sizes. This leads to a high loss of information.

The object of the present invention is to provide a method and a device for extracting a signal identifier from a time signal, which enable a meaningful identification of a time signal without excessive loss of information.

This object is achieved by a method for extracting a signal identifier from a time signal according to patent claim 1 or by a device for extracting a signal identifier from a time signal according to patent claim 19.

Another object of the present invention is to provide a method and a device for generating a database from signal identifiers, and a method and a device for referencing a search time signal by means of such a database.

This object is achieved by a method for generating a database according to claim 13, a device for generating a database according to claim 20, a method for referencing a search time signal according to claim 14 or a device for referencing a search time signal according to claim 21.

The present invention is based on the finding that, for time signals that have a harmonic component, the time profile of the time signal can be used to extract a signal identifier from the time signal which, on the one hand, provides a good fingerprint for the time signal and, on the other hand, is manageable with regard to the amount of data, so as to enable an efficient search of a large number of signal identifiers in a database. An essential property of time signals with a harmonic component is that signal edges recur in the time signal. For example, two successive signal edges with the same or a similar length allow the specification of a period, and thus of a frequency in the time signal, with high temporal and frequency resolution, if not only the presence of the signal edges themselves but also the temporal occurrence of the signal edges in the time signal is taken into account. It is thus possible to describe the time signal as consisting of frequencies which succeed one another in time. Using the example of an audio signal, the audio signal is thus characterized in such a way that a tone, i.e. a frequency, is present at a certain point in time, and that this tone, i.e. this frequency, is followed by another tone, i.e. a different frequency, at a later point in time. According to the invention, the description of the time signal by means of a sequence of temporal sample values is therefore replaced by a description of the time signal by coordinate tuples of frequency and time of occurrence of the frequency. The signal identifier, or in other words the feature vector (MV), which is used to describe the time signal, thus comprises a sequence of signal identifier values which, depending on the embodiment, reproduces the time profile of the time signal more or less roughly.
The time signal is therefore not characterized on the basis of its spectral properties, as in the prior art, but on the basis of the chronological sequence of frequencies in the time signal.

At least two detected signal edges are thus required to calculate a frequency value from the detected signal edges. There are various ways of selecting, from all detected signal edges, the two signal edges on the basis of which a frequency value is calculated. First, two successive signal edges of essentially the same length can be used. The frequency value is then the reciprocal of the time interval between these edges. Alternatively, a selection can also be made based on the amplitude of the detected signal edges. In this way, two successive signal edges of the same amplitude can also be taken in order to determine a frequency value. However, it is not always necessary to take two successive signal edges; it is also possible to take, e.g., every second, third, fourth, ... signal edge of the same amplitude or length. Finally, it should be noted that any two signal edges can also be taken in order to obtain the coordinate tuples using statistical methods and on the basis of the superposition laws. The example of a flute illustrates this: a flute tone provides two signal edges with a high amplitude, between which there is a wave crest with a lower amplitude. To determine the fundamental tone of the flute, for example, the two detected signal edges are selected according to their amplitude.
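
The amplitude-based selection can be illustrated with a minimal sketch. It assumes that edges are already available as (occurrence time, amplitude) pairs; the function name `tuples_from_strong_edges` and the threshold parameter `min_amplitude` are hypothetical and not taken from the text.

```python
from typing import List, Tuple

Edge = Tuple[float, float]  # (occurrence time in seconds, amplitude)


def tuples_from_strong_edges(edges: List[Edge],
                             min_amplitude: float) -> List[Tuple[float, float]]:
    """Return (frequency in Hz, occurrence time in s) coordinate tuples.

    Only edges whose amplitude reaches min_amplitude are kept (assumed
    selection rule). Each pair of successive kept edges yields one frequency
    value, the reciprocal of the time interval between them.
    """
    strong_times = [t for t, amplitude in edges if amplitude >= min_amplitude]
    result = []
    for t0, t1 in zip(strong_times, strong_times[1:]):
        interval = t1 - t0
        if interval > 0:
            result.append((1.0 / interval, t0))
    return result
```

With edges alternating between a high and a low amplitude, as in the flute example, only the high-amplitude edges contribute, so the lower crest in between does not double the computed frequency.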

For audio signals in particular, the temporal sequence of tones is the most natural way of characterization, since, as is most easily seen with music signals, the essence of the audio signal lies in the temporal sequence of tones. The most immediate sensation a listener receives from a music signal is the temporal sequence of tones. Not only in classical music, in which works are always based on a certain theme which runs through the whole work in various modifications, but also in songs of popular or other contemporary music, there is a catchy melody which generally consists of a sequence of simple tones, whereby the theme or the simple melody significantly shapes recognizability regardless of rhythm, pitch, any instrumental accompaniment, etc.

The concept according to the invention is based on this finding and provides a signal identifier which consists of a chronological sequence of frequencies or, depending on the embodiment, of a chronological sequence of frequencies, i.e. tones, derived by statistical methods.

An advantage of the present invention is that the signal identifier, as a time sequence of frequencies, represents a fingerprint of high information content for time signals with a harmonic component and, to a certain extent, captures the essence or core of a time signal.

Another advantage of the present invention is that the signal identifier extracted according to the invention represents a strong compression of the time signal, but is still based on the temporal course of the time signal and is thus adapted to the natural understanding of time signals, e.g. pieces of music. Another advantage of the present invention is that, owing to the sequential nature of the signal identifier, the distance-calculation referencing algorithms of the prior art can be dispensed with, and algorithms known from DNA sequencing can be used to reference the time signal in a database; in addition, similarity calculations can also be performed using DNA sequencing algorithms with replace/insert/delete operations.

A further advantage of the present invention is that the Hough transformation, for which efficient algorithms exist from image processing and image recognition, can be used in a favorable manner to detect the time occurrence of signal edges in the time signal.

Another advantage of the present invention is that the signal identifier of a time signal extracted according to the invention is independent of whether the search signal identifier is derived from the entire time signal or only from a section of the time signal, since, according to the algorithms of DNA sequencing, a stepwise comparison in time of the search signal identifier with a reference signal identifier can be carried out. Owing to this temporally sequential comparison, the section of the time signal to be identified is, so to speak, automatically located in the reference time signal where the highest match between the search signal identifier and the reference signal identifier exists.

Preferred exemplary embodiments of the present invention are explained in more detail below with reference to the accompanying drawings, in which:

Fig. 1 shows a block diagram of the device according to the invention for extracting a signal identifier from a time signal;

Fig. 2 shows a block diagram of a preferred exemplary embodiment, in which preprocessing of the audio signal is shown;

Fig. 3 shows a block diagram of an exemplary embodiment for the signal identifier generation;

Fig. 4 shows a block diagram for a device according to the invention for generating a database and for referencing a search time signal in the database; and

Fig. 5 shows a graphical representation of a section of Mozart KV 581 by frequency-time coordinate tuples.

Fig. 1 shows a block diagram of a device for extracting a signal identifier from a time signal. The device comprises a device 12 for performing a signal edge detection, a device 14 for determining the distance between two selected detected edges, a device 16 for frequency calculation and a device 18 for signal identifier generation using the coordinate tuples output by the device 16 for frequency calculation, each of which has a frequency value and an occurrence time for this frequency value.

At this point it should be pointed out that, although an audio signal is referred to below as the time signal, the concept according to the invention is suitable not only for audio signals but for all time signals that have a harmonic component, since the signal identifier is based on the fact that a time signal consists of a time sequence of frequencies or, in the example of the audio signal, of tones.

The device 12 for detecting the temporal occurrence of signal edges in the time signal preferably carries out a Hough transformation.

The Hough transformation is described in U.S. Patent No. 3,069,654 by Paul V. C. Hough. The Hough transformation is used for the detection of complex structures and in particular for the automatic detection of complex lines in photographs or other image representations. The Hough transformation is thus generally a technique that can be used to extract features with a special shape within an image.

In its application according to the present invention, the Hough transformation is used to extract signal edges with specified time lengths from the time signal. A signal edge is initially specified by its length in time. In the ideal case of a sine wave, a signal edge would be defined by the rising edge of the sine function from 0° to 90°. Alternatively, a signal edge could also be specified by the rise of the sine function from -90° to +90°.

If the time signal is available as a sequence of temporal samples (“samples”), the temporal length of a signal edge, taking into account the sampling frequency with which the samples were generated, corresponds to a certain number of samples. The length of a signal edge can thus be easily determined by the specification of the number of samples that the signal edge is to include.

In addition, it is preferred to detect a signal edge as a signal edge only if it is continuous and has a predominantly monotonic course, that is to say, in the case of a positive signal edge, a predominantly monotonically increasing course. Of course, negative signal edges, i.e. monotonically falling signal edges, can also be detected.

Another criterion for the classification of signal edges is that a signal edge is only detected as a signal edge if it covers a certain level range. In order to suppress noise disturbances, it is preferred to specify a minimum level range or amplitude range for a signal edge, wherein monotonically increasing signal edges below this level range are not detected as signal edges.
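
The edge criteria named so far (a length in samples, a monotonic course and a minimum level range) can be made concrete with a simple run-based detector. This is only an illustrative sketch and not the Hough transformation preferred in the text; it also requires strictly increasing samples rather than a "predominantly" monotonic course. The function name and parameters are assumptions.

```python
from typing import List, Sequence, Tuple


def detect_rising_edges(samples: Sequence[float],
                        min_len: int,
                        max_len: int,
                        min_level_range: float) -> List[Tuple[int, int, float]]:
    """Return (start_index, length_in_samples, level_range) for rising runs.

    A run is a maximal stretch of strictly increasing samples. It is reported
    as an edge only if its length lies in [min_len, max_len] and the level it
    covers is at least min_level_range (noise suppression).
    """
    edges = []
    start = 0
    n = len(samples)
    for i in range(1, n + 1):
        # A run ends when the data ends or the signal stops increasing.
        if i == n or samples[i] <= samples[i - 1]:
            length = i - start
            level_range = samples[i - 1] - samples[start]
            if min_len <= length <= max_len and level_range >= min_level_range:
                edges.append((start, length, level_range))
            start = i
    return edges
```

Here the occurrence time of an edge is taken as the index of its first sample, which is one of the equivalent conventions the description allows.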

According to a preferred exemplary embodiment of the present invention, a further restriction is made for referencing audio signals to the effect that only signal edges are sought whose specified time length is greater than a minimum limit length and less than a maximum time limit length. In other words, this means that only signal edges are sought which indicate frequencies lower than an upper limit frequency and higher than a lower limit frequency. In the case of pieces of music, it is preferred to only detect signal edges which indicate frequencies in the frequency range from 27.5 Hz (tone A2) to 4186 Hz (tone c5). This frequency range is covered by the tones provided by a conventional piano. This tone range has been found to be sufficient for signal identifications of pieces of music.
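
The relation between the frequency limits and the limit lengths in samples can be worked out as follows. The sketch assumes that an edge covers a quarter period (the 0° to 90° case of an ideal sine) and a sampling rate of 44.1 kHz; both are illustrative assumptions, as are the function and parameter names.

```python
def edge_length_in_samples(frequency_hz: float,
                           sample_rate_hz: float,
                           edge_fraction: float = 0.25) -> float:
    """Length of one signal edge in samples.

    edge_fraction is the share of a full period covered by the edge:
    0.25 for a rise from 0 to 90 degrees, 0.5 for -90 to +90 degrees.
    """
    return edge_fraction * sample_rate_hz / frequency_hz


# Limit lengths for the piano range at an assumed 44.1 kHz sampling rate.
max_edge_len = edge_length_in_samples(27.5, 44100.0)    # lowest tone, longest edge
min_edge_len = edge_length_in_samples(4186.0, 44100.0)  # highest tone, shortest edge
```

Under these assumptions the longest edge spans roughly 401 samples and the shortest only about 2.6 samples, which suggests that the highest tones are hard to resolve at this sampling rate.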

The signal edge detection unit 12 thus supplies a signal edge and the time of the occurrence of the signal edge. It is irrelevant whether the time of the first sample value of the signal edge, the time of the last sample value of the signal edge or the time of any sample value is taken within the signal edge, as long as signal edges are treated equally.

The device 14 for determining a time interval between two successive signal edges, the lengths of which are the same apart from a predetermined tolerance value, examines the signal edges output by the device 12 and extracts two successive signal edges that are the same or, within a specific predetermined tolerance value, essentially the same. If a simple sine tone is considered, a period of the sine tone is given by the time interval between two successive, e.g. positive, quarter waves. The device 16 for calculating a frequency value from the determined time interval is based on this. The frequency value corresponds to the inverse of the determined time interval.
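
The interplay of devices 14 and 16 can be sketched as below, assuming edges are given as (start index, length in samples) pairs. The names `frequency_time_tuples` and `length_tolerance` are hypothetical, and the default tolerance of one sample is an arbitrary choice.

```python
from typing import List, Tuple


def frequency_time_tuples(edges: List[Tuple[int, int]],
                          sample_rate_hz: float,
                          length_tolerance: int = 1) -> List[Tuple[float, float]]:
    """Return (frequency in Hz, occurrence time in s) coordinate tuples.

    Two successive edges form a pair if their lengths differ by at most
    length_tolerance samples. The frequency is the reciprocal of the time
    interval between the two edge start times.
    """
    tuples = []
    for (start0, len0), (start1, len1) in zip(edges, edges[1:]):
        if abs(len0 - len1) <= length_tolerance and start1 > start0:
            interval_s = (start1 - start0) / sample_rate_hz
            tuples.append((1.0 / interval_s, start0 / sample_rate_hz))
    return tuples
```

For example, two edges of equal length that start 100 samples apart at 44.1 kHz yield a frequency value of 441 Hz.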

With this procedure, a representation of a time signal can be provided with a high temporal and at the same time frequency resolution by specifying the frequencies occurring in the time signal and specifying the times of occurrence corresponding to the frequencies. If the results of the device 16 for frequency calculation are represented graphically, a diagram according to FIG. 5 is obtained.

Fig. 5 shows a section with a length of about 13 seconds of the Clarinet Quintet in A major, Larghetto, KV 581 by Wolfgang Amadeus Mozart, as it would appear at the output of the device 16 for frequency calculation. In this section, a clarinet plays the melody-leading solo part, accompanied by a string quartet. This results in the coordinate tuples shown in FIG. 5, as they could be generated by the device 16 for frequency calculation. Finally, the device 18 serves to generate, from the results of the device 16, a signal identifier that is favorable and suitable for a signal identifier database. The signal identifier is generally generated from a plurality of coordinate tuples, each coordinate tuple comprising a frequency value and an occurrence time, so that the signal identifier comprises a sequence of signal identifier values which reproduce the time profile of the time signal.

As will be explained later, the device 18 serves to extract the essential information from the frequency-time diagram of FIG. 5, which could be generated by the device 16, in order to generate a fingerprint of the time signal which is compact on the one hand and, on the other hand, can distinguish the time signal from other time signals with sufficient accuracy.

Fig. 2 shows a device according to the invention for extracting a signal identifier according to a preferred exemplary embodiment of the present invention. An audio file 20 is input to an audio I/O handler 22 as the time signal. The audio I/O handler 22 reads the audio file from a hard drive, for example. The audio data stream can also be read in directly via a sound card. After reading in a section of the audio data stream, the device 22 closes the audio file again and loads the next audio file to be processed or terminates the reading process. The sequence of PCM samples (PCM = pulse code modulation), as obtained for example from a CD, is then input into a device 24 for preprocessing the audio signal. The device 24 serves to carry out a sampling rate conversion if necessary, or to achieve a volume modification of the audio signal. Audio signals are available on different media at different sampling frequencies. As has already been explained, however, the time of occurrence of a signal edge in the audio signal is used to describe the audio signal, so that the sampling rate must be known in order to correctly detect the occurrence times of signal edges and, in addition, to correctly determine frequency values. Alternatively, a sampling rate conversion can be carried out by decimation or interpolation in order to bring audio signals of different sampling rates to the same sampling rate.

In a preferred exemplary embodiment of the present invention, which is intended to be suitable for a plurality of sampling rates, the device 24 is therefore provided in order to carry out a sampling rate setting.

The PCM samples are also subjected to an automatic level adjustment, which is also provided in the device 24. The average signal power of the audio signal is determined in the device 24 for automatic level adjustment in a look-ahead buffer. The audio signal section which lies between two signal power minima is multiplied by a scaling factor which is the product of a weighting factor and the quotient of the full scale and the maximum level within the segment. The length of the look-ahead buffer is variable.
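
The scaling rule for one segment can be written out as a short sketch. It assumes 16-bit PCM (full scale 32767) and a weighting factor of 0.9; both values are illustrative. Finding the segment boundaries at signal power minima in the look-ahead buffer is left out here.

```python
from typing import List, Sequence


def scale_segment(segment: Sequence[float],
                  full_scale: float = 32767.0,
                  weight: float = 0.9) -> List[float]:
    """Scale one segment by weight * full_scale / peak level of the segment.

    The segment is assumed to be the audio between two signal power minima;
    locating those minima is not part of this sketch.
    """
    peak = max((abs(x) for x in segment), default=0.0)
    if peak == 0:
        return list(segment)  # silence or empty input: nothing to scale
    factor = weight * full_scale / peak
    return [x * factor for x in segment]
```

With a weight of 1.0 the loudest sample of every segment lands exactly on full scale; a smaller weight leaves some headroom.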

The audio signal which has been preprocessed in this way is then fed into the device 12, which carries out a signal edge detection, as has been described with reference to FIG. 1. The Hough transformation is preferably used for this. A circuit implementation of the Hough transformation is disclosed in WO 99/26167.

The amplitude of a signal edge determined by the Hough transformation and the time of detection of a signal edge are then transferred to the device 14 of FIG. 1. In this unit, two successive detection times are subtracted from each other, and the reciprocal of the difference in occurrence times is taken as the frequency value. This task is performed by the device 16 of FIG. 1 and, when a piece of music is processed accordingly, leads to the frequency-time diagram of FIG. 5, in which the frequency-time coordinate tuples obtained for the section of Mozart, Köchel catalogue 581, are shown graphically.

The diagram of FIG. 5 could already be used as a signal identifier for the time signal, since the chronological sequence of the coordinate tuples reproduces the chronological course of the time signal.

In one exemplary embodiment, however, it is preferred to carry out post-processing in order to extract the essential information from the frequency-time diagram of FIG. 5, so as to provide a fingerprint for the time signal that is as small as possible and nevertheless as meaningful as possible for signal referencing.

For this purpose, the signal identifier generator 18 can be constructed as shown in FIG. 3. The device 18 is divided into a device 18a for determining the cluster areas, a device 18b for grouping, a device 18c for averaging over a group, a device 18d for interval setting, a device 18e for quantizing and finally a device 18f for obtaining the signal identifier for the time signal.

In the device 18a for determining the cluster areas, characteristic distribution point clouds, which are referred to as clusters and can be clearly seen in FIG. 5, are worked out. This is done by deleting all isolated frequency-time tuples whose distance from the nearest spatial neighbor exceeds a predetermined minimum distance. Such isolated frequency-time tuples are, for example, the points in the top right corner of FIG. 5. What remains is a so-called pitch contour strip band, which is outlined in FIG. 5 with the reference symbol 50. The pitch contour strip band consists of clusters of a certain frequency width and length, these clusters being caused by played tones. These tones are indicated in FIG. 5 by horizontal lines which intersect the ordinate (52), the tones h1, c2, cis2, d2 and h1 occurring in this order in the example shown here in the range between about 6 and 10 seconds. The tone a1 has a frequency of 440 Hz. The tone h1 has a frequency of 494 Hz. The tone c2 has a frequency of 523 Hz, the tone cis2 has a frequency of 554 Hz, and the tone d2 has a frequency of 587 Hz.
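
Removing isolated tuples can be sketched with a brute-force nearest-neighbor test. The text does not say how distances in the frequency-time plane are measured, so the Euclidean distance on axes normalized by `freq_scale` and `time_scale` used here is an assumption, as are all names.

```python
import math
from typing import List, Tuple


def remove_isolated(tuples: List[Tuple[float, float]],
                    max_distance: float,
                    freq_scale: float = 1.0,
                    time_scale: float = 1.0) -> List[Tuple[float, float]]:
    """Keep only (frequency, time) tuples that have a close neighbor.

    A tuple is dropped if its nearest neighbor is farther away than
    max_distance. Distances are Euclidean after dividing frequency and time
    differences by freq_scale and time_scale (assumed normalization).
    """
    kept = []
    for i, (f, t) in enumerate(tuples):
        nearest = min(
            (math.hypot((f - f2) / freq_scale, (t - t2) / time_scale)
             for j, (f2, t2) in enumerate(tuples) if j != i),
            default=math.inf,
        )
        if nearest <= max_distance:
            kept.append((f, t))
    return kept
```

This version compares every tuple with every other one, which is quadratic in the number of tuples; a spatial index would be the natural choice for long recordings.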

With polyphonic sounds, there are wider stripes. The strip width for single tones also depends on a vibrato of the musical instrument producing the single tones.

In the device 18b for grouping or block formation, the coordinate tuples of the pitch contour strip band that fall within a time window of n samples are combined, or grouped, into a block to be processed separately. The block size can be chosen to be equidistant or variable. Depending on the accuracy and available storage space for the signal identifier, a relatively coarse division can be selected, for example a one-second raster, which corresponds to a certain number of samples per block via the sampling rate at hand, or a finer division. Alternatively, in order to take account of the musical notation underlying pieces of music, the raster can always be chosen so that one tone falls into the raster. For this purpose, it is necessary to estimate the length of a tone, which is possible using the polynomial fit function 54 shown in FIG. 5. A group or block is then determined by the time interval between two local extreme values of the polynomial. Particularly for relatively monophonic sections, such as occur between 6 and 12 seconds, this approach delivers relatively large groups of samples, while for relatively polyphonic intervals of the piece of music, in which the coordinate tuples are distributed over a large frequency range, such as at about 2 seconds or at 12 seconds in FIG. 5, smaller groups are determined. This in turn means that the signal identifier is generated on the basis of relatively small groups, so that the information compression is lower than with fixed block formation.
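
The equidistant variant of the block formation can be sketched as follows; the variable, polynomial-fit-based variant is not covered. The function name and the default raster of one second are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_into_blocks(tuples: List[Tuple[float, float]],
                      block_seconds: float = 1.0) -> List[List[Tuple[float, float]]]:
    """Group (frequency, time) tuples into consecutive fixed-length blocks.

    Blocks are returned in chronological order; time windows that contain
    no tuples are omitted.
    """
    blocks: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for f, t in tuples:
        blocks[int(t // block_seconds)].append((f, t))
    return [blocks[index] for index in sorted(blocks)]
```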

In the device 18c for averaging over a group of samples, a weighted average over all coordinate tuples present in a block is determined as required. In the preferred exemplary embodiment, the tuples outside the pitch contour strip band have already been "masked out" beforehand. Alternatively, however, this masking out can also be dispensed with, so that all coordinate tuples calculated by the device 16 are taken into account in the averaging carried out by the device 18c.

In the device 18d for interval setting, a jump distance is determined for finding the center of the next, i.e. temporally following, group of samples.

It should be noted that either arithmetic, geometric or median averaging can be performed in device 18c.
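
The three averaging options can be illustrated with Python's standard `statistics` module. The sketch is unweighted; any weighting of the tuples, as mentioned for device 18c, is not shown. The function name and the `method` parameter are hypothetical.

```python
import statistics
from typing import List, Tuple


def average_block(block: List[Tuple[float, float]], method: str = "median") -> float:
    """Reduce the frequencies of one block to a single representative value."""
    freqs = [f for f, _ in block]
    if method == "arithmetic":
        return statistics.fmean(freqs)
    if method == "geometric":
        return statistics.geometric_mean(freqs)
    if method == "median":
        return statistics.median(freqs)
    raise ValueError(f"unknown averaging method: {method}")
```

The median is the least sensitive of the three to a few outlying tuples that survive the cluster filtering, while the geometric mean matches the logarithmic nature of musical pitch.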

In the quantizer 18e, the value that has been calculated by the device 18c is quantized into non-equidistant raster values. In the case of pieces of music, it is preferred to carry out the division according to the tone frequency scale, which, as has already been explained, is divided according to the frequency range supplied by a conventional piano, extends from 27.5 Hz (tone A2) to 4186 Hz (tone c5) and includes 88 tone steps. If the averaged value at the output of the device 18c lies between two adjacent semitones, it receives the value of the closest reference tone.
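
A possible quantizer for the 88 piano tones is sketched below. It assumes equal temperament with a1 = 440 Hz as key 49 and interprets "closest" on a logarithmic frequency axis; the text does not fix either choice, and the function name is hypothetical.

```python
import math
from typing import Tuple


def quantize_to_piano_key(freq_hz: float) -> Tuple[int, float]:
    """Return (key number 1..88, reference frequency in Hz) of the nearest tone.

    Key 1 is 27.5 Hz, key 49 is 440 Hz and key 88 is about 4186 Hz.
    Frequencies outside the piano range are clamped to the nearest end.
    """
    key = round(12 * math.log2(freq_hz / 440.0)) + 49
    key = max(1, min(88, key))
    return key, 440.0 * 2 ** ((key - 49) / 12)
```

An averaged value of 450 Hz, for instance, is less than half a semitone above 440 Hz and is therefore mapped to key 49.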

This results in a sequence of quantized values at the output of the device 18e for quantization, which together form the signal identifier. Depending on requirements, the quantized values can be post-processed by the device 18f. Post-processing could consist, for example, of a pitch offset correction, a transposition into another tone scale, etc.
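
The text does not specify how the post-processing in device 18f works. Purely as an assumption, the sketch below shows two simple operations on a sequence of quantized key numbers: a transposition by a fixed number of semitones, and a conversion to successive intervals, which is one conceivable way to make a sequence insensitive to a constant pitch offset.

```python
from typing import List


def transpose(keys: List[int], semitones: int) -> List[int]:
    """Shift every key number by the same number of semitones."""
    return [k + semitones for k in keys]


def to_intervals(keys: List[int]) -> List[int]:
    """Return the differences between successive key numbers.

    Two sequences that differ only by a constant pitch offset produce the
    same interval sequence (assumed form of pitch offset correction).
    """
    return [b - a for a, b in zip(keys, keys[1:])]
```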

In the following, reference is made to FIG. 4. FIG. 4 schematically shows a device for referencing a search time signal in a database 40, the database 40 having signal identifiers of a plurality of database time signals Track_1 to Track_m, which are stored in a library 42, which is preferably separate from the database 40.

In order to be able to reference a time signal on the basis of the database 40, the database must first be filled, which can be achieved in a “learning” mode. For this purpose, audio files 41 are fed one after another to a vector generator 43, which generates a reference identifier for each audio file and stores it in the database in such a way that it can be recognized to which audio file, for example in library 42, the signal identifier belongs.

According to the assignment given in FIG. 4, the signal identifier MV11, ..., MV1n corresponds to the time signal Track_1. The signal identifier MV21, ..., MV2n belongs to the time signal Track_2. Finally, the signal identifier MVm1, ..., MVmn belongs to the time signal Track_m.

The vector generator 43 is designed to generally perform the functions shown in FIG. 1 and, according to a preferred embodiment, is implemented as shown in FIGS. 2 and 3. In the “learning” mode, the vector generator 43 processes various audio files (Track_1 to Track_m) one after another in order to store signal identifiers for the time signals in the database, i.e. to fill the database.

In the “search” mode, an audio file 41 is to be referenced using the database 40. For this purpose, the search time signal 41 is processed by the vector generator 43 in order to generate a search identifier 45. The search identifier 45 is then fed into a DNA sequencer 46 in order to be compared with the reference identifiers in database 40. The DNA sequencer 46 is also arranged to make a statement about the search time signal with regard to the plurality of database time signals from library 42. The DNA sequencer searches the database 40 for a reference identifier matching the search identifier 45 and outputs a pointer to the corresponding audio file in the library 42 associated with the reference identifier.

The DNA sequencer 46 thus compares the search identifier 45 or parts thereof with the reference identifiers in the database. If the specified sequence or a partial sequence thereof is present, the associated time signal is referenced in library 42.

The DNA sequencer 46 preferably executes a Boyer-Moore algorithm, which is described, for example, in the textbook "Algorithms on Strings, Trees and Sequences", Dan Gusfield, Cambridge University Press, 1997. According to a first alternative, exact agreement is sought. Making a statement then consists in saying that the search time signal is identical to a time signal in library 42. Alternatively or in addition, the similarity of two sequences can be examined by using replace/insert/delete operations and a pitch offset correction.
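The exact-agreement search can be illustrated with a short sketch. Note that the code below uses the Boyer-Moore-Horspool simplification (bad-character shifts only), not the full Boyer-Moore algorithm described by Gusfield, and it assumes that a signal identifier is available as a sequence of integers. The function name `horspool_find_all` is hypothetical.

```python
from typing import Dict, List, Sequence


def horspool_find_all(text: Sequence[int], pattern: Sequence[int]) -> List[int]:
    """Return every start index at which `pattern` occurs in `text`.

    Boyer-Moore-Horspool variant: the window is compared from right to
    left and then shifted according to the text symbol aligned with the
    last pattern position.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # For each symbol of the pattern except the last one, store the
    # distance from its rightmost occurrence to the end of the pattern.
    shift: Dict[int, int] = {
        symbol: m - 1 - i for i, symbol in enumerate(pattern[:-1])
    }

    hits: List[int] = []
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)  # whole window matched
        # Symbols absent from the table allow a shift by the full length.
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

For instance, searching the reference sequence `[5, 7, 7, 9, 5, 7, 7, 9, 4]` for the search sequence `[7, 7, 9]` yields the start positions `[1, 5]`.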

The database 40 is preferably structured as a concatenation of signal identifier sequences, the end of the signal identifier of each time signal being marked by a separator, so that the search does not continue across the boundaries between time signal files. If several matches are found, all referenced time signals are output.
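A minimal sketch of such a concatenated database is given below. It assumes that identifier values are non-negative integers, so that `-1` can serve as the separator; the names `SEPARATOR`, `build_concatenated_database` and `find_tracks` are hypothetical. A naive window scan is used for brevity; any exact matcher could be substituted.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

SEPARATOR = -1  # assumption: never occurs as an identifier value


def build_concatenated_database(
    identifiers: Dict[str, Sequence[int]],
) -> Tuple[List[int], List[int], List[str]]:
    """Concatenate all identifiers, closing each one with a separator.

    Returns the concatenated sequence, the start offset of each track
    and the track names in the same order.
    """
    sequence: List[int] = []
    starts: List[int] = []
    names: List[str] = []
    for name, identifier in identifiers.items():
        starts.append(len(sequence))
        names.append(name)
        sequence.extend(identifier)
        sequence.append(SEPARATOR)
    return sequence, starts, names


def find_tracks(
    sequence: Sequence[int],
    starts: Sequence[int],
    names: Sequence[str],
    query: Sequence[int],
) -> List[str]:
    """Return the names of all tracks whose identifier contains `query`.

    Because the query holds no separator, a match can never extend
    across the boundary between two tracks.
    """
    m = len(query)
    if m == 0:
        return []
    wanted = list(query)
    found: List[str] = []
    for pos in range(len(sequence) - m + 1):
        if list(sequence[pos:pos + m]) == wanted:
            # Map the hit position back to the track containing it.
            name = names[bisect_right(starts, pos) - 1]
            if name not in found:
                found.append(name)
    return found
```

With the identifiers `[3, 4, 5]` for Track_1 and `[5, 3, 4]` for Track_2, the query `[3, 4]` references both tracks, whereas the query `[5, 5]`, which would only match across the file boundary, references none.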

A similarity measure can be introduced by using the replace/insert/delete operations, the time signal referenced in the library 42 then being the one that is most similar to the search time signal 41 on the basis of a predetermined similarity measure. It is further preferred to determine a similarity measure of the search audio signal with respect to a plurality of signals in the library and then to output the n most similar sections in the library 42 in descending order of similarity.
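One possible similarity measure of this kind is sketched below, under assumptions that go beyond the text: each replace, insert or delete operation is given a unit cost, the search sequence may match any section of a reference sequence, and the pitch offset correction is modeled by comparing the differences between successive values, which cancels a constant additive offset. The names `to_intervals`, `best_substring_distance` and `rank_tracks` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def to_intervals(values: Sequence[int]) -> List[int]:
    """Differences of successive values; a constant offset added to all
    values (assumed model of a pitch offset) drops out."""
    return [b - a for a, b in zip(values, values[1:])]


def best_substring_distance(query: Sequence[int], reference: Sequence[int]) -> int:
    """Smallest number of replace/insert/delete operations needed to turn
    `query` into some contiguous section of `reference`."""
    # Row 0 is all zeros: a match may start anywhere in the reference.
    prev = [0] * (len(reference) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i]  # aligning i query values with an empty section
        for j, r in enumerate(reference, start=1):
            curr.append(min(
                prev[j] + 1,             # delete q from the query
                curr[j - 1] + 1,         # insert r into the query
                prev[j - 1] + (q != r),  # replace q by r (free if equal)
            ))
        prev = curr
    # A match may also end anywhere in the reference.
    return min(prev)


def rank_tracks(
    query: Sequence[int], database: Dict[str, Sequence[int]], n: int
) -> List[Tuple[int, str]]:
    """Return the n most similar tracks as (distance, name) pairs,
    most similar (smallest distance) first."""
    q = to_intervals(query)
    scored = [
        (best_substring_distance(q, to_intervals(ref)), name)
        for name, ref in database.items()
    ]
    scored.sort()
    return scored[:n]
```

A smaller distance means a greater similarity, so sorting by ascending distance corresponds to output in descending order of similarity.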

Claims

1. A method for extracting a signal identifier from a time signal that has a harmonic component, with the following steps:

Detecting (12) the time occurrence of signal edges in the time signal;

Determining (14) a time interval between two selected detected signal edges;

Calculating (16) a frequency value from the determined time interval and assigning the frequency value to an occurrence time of the frequency value in the time signal in order to obtain a coordinate tuple from the frequency value and the occurrence time for this frequency value; and

Generating (18) the signal identifier from a plurality of coordinate tuples, each coordinate tuple comprising a frequency value and an occurrence time, as a result of which the signal identifier comprises a sequence of signal identifier values which reproduce the time profile of the time signal.

2. The method as claimed in claim 1, in which, in the step of detecting (12), a signal edge is only detected as a signal edge if it has an amplitude over its specified time length which is greater than a predetermined amplitude threshold value.

3. The method of claim 1 or 2,

in which in the step of detecting (12) a signal edge is only detected as a signal edge if its specified time length is greater than a minimum limit length and is less than a maximum limit length.

4. The method according to claim 3, in which the time signal is an audio signal, and in which the minimum time limit is determined on the basis of a maximum audible limit frequency and the maximum time limit is determined on the basis of a minimum audible limit frequency.

5. The method of claim 3, wherein the time signal is an audio signal, and in which the minimum time limit is determined by a maximum sound frequency that can be generated by an instrument and the maximum time limit is determined by a minimum sound frequency that can be generated by an instrument.

6. The method according to any one of the preceding claims, wherein the step of generating (18) the signal identifier comprises the following step:

Eliminating (18a) coordinate tuples that are spaced more than a predetermined threshold distance from an adjacent coordinate tuple in a frequency-time diagram to determine clusters of coordinate tuples.

7. The method of claim 5 or 6, wherein the step of generating (18) comprises the following step:

Grouping (18b) coordinate tuples in successive time intervals into blocks of coordinate tuples.

8. The method as claimed in claim 7, in which the successive time intervals have a fixed and / or a variable length.

9. The method of claim 7 or 8, wherein the step of generating (18) the signal identifier comprises the following step:

Averaging (18c) the frequency values of coordinate tuples in the time intervals to obtain a sequence of averaged frequency values for a sequence of time intervals, the sequence of averaged frequency values representing a feature vector.

10. The method of claim 9, wherein the step (18) of generating the signal identifier comprises the following step:

Quantizing (18e) the feature vector to obtain a quantized feature vector.

11. The method according to claim 10, in which the step of quantizing (18e) is carried out using non-equidistantly distributed raster points, wherein distances between two adjacent raster points are determined according to an audio frequency scale.

12. The method according to any one of the preceding claims, in which a Hough transformation is used in the step (12) of detecting signal edges.

13. A method for generating a database (40) from reference signal identifiers for a plurality of time signals, with the following steps:

Extracting a first signal identifier for a first time signal by the method according to one of claims 1 to 12;

Extracting a second signal identifier for a second time signal by a method according to any one of claims 1 to 12;

Storing the extracted first signal identifier in association with the first time signal in the database (40); and

Storing the extracted second signal identifier in association with the second time signal in the database (40).

14. A method for referencing a search time signal using a database (40), the database having reference signal identifiers of a plurality of database time signals, a reference signal identifier of a database time signal having been determined by a method according to one of claims 1 to 12, with the following steps:

Specifying at least a portion of a search time signal (41);

Extracting (43) a search signal identifier from the search time signal by a method according to one of the claims 1 to 12; and

Comparing (46) the search signal identifier to the plurality of reference signal identifiers and, in response to the step of comparing, making a statement about the search time signal with respect to the plurality of database time signals.

15. The method according to claim 14, in which, in the step of making a statement, a search time signal is identified as a reference time signal if the search signal identifier corresponds at least to a section of a reference signal identifier.

16. The method of claim 14, wherein in the step of making a statement a similarity between the search time signal and a database time signal is determined if the search signal identifier and / or at least a section of a database signal identifier can be brought into agreement by reproducible manipulation.

17. The method according to any one of claims 14 to 16,

in which the database signal identifier has a sequence of database signal identifier values that reflect the time profile of the database time signal,

in which the search signal identifier has a search sequence of search signal identifier values which reproduce the time profile of the search time signal,

in which the length of the database sequence is greater than the length of the search sequence, and

in which the search sequence is compared sequentially with the database sequence.

18. The method according to claim 17, wherein during the sequential comparison of the search sequence with the database sequence, a correction of the values of the search and / or the database signal identifier is carried out by a replace, insert or delete operation on at least one value of the search and / or the database signal identifier in order to determine a similarity of the search time signal and the database time signal.

19. The method according to any one of claims 14 to 18,

in which the step of comparing (46) is carried out using a DNA sequencing algorithm and / or using the Boyer-Moore algorithm.

20. Device for extracting a signal identifier from a time signal, which has a harmonic component, with the following features:

means for detecting (12) the time occurrence of signal edges in the time signal;

a device for determining (14) a time interval between two selected detected signal edges;

means for calculating (16) a frequency value from the determined time interval and assigning the frequency value to an occurrence time of the frequency value in the time signal in order to obtain a coordinate tuple from the frequency value and the occurrence time for this frequency value; and

means for generating (18) the signal identifier from a plurality of coordinate tuples, each coordinate tuple comprising a frequency value and an occurrence time, whereby the signal identifier comprises a sequence of signal identifier values which reproduce the time profile of the time signal.

21. Device for generating a database (40) from reference signal identifiers for a plurality of time signals, with the following features:

means for extracting a first signal identifier for a first time signal by the method according to one of claims 1 to 12;

a device for extracting a second signal identifier for a second time signal by a method according to one of claims 1 to 12;

means for storing the extracted first signal identifier in association with the first time signal in the database (40); and

a device for storing the extracted second signal identifier in association with the second time signal in the database (40).

22. Device for referencing a search time signal using a database (40), the database having reference signal identifiers of a plurality of database time signals, a reference signal identifier of a database time signal having been determined by a method according to one of claims 1 to 12, with the following features:

means for specifying at least a portion of a search time signal (41);

means for extracting (43) a search signal identifier by a method according to one of the claims 1 to 12; and

means for comparing (46) the search signal identifier to the plurality of reference signal identifiers and, in response to the step of comparing, making a statement about the search time signal with respect to the plurality of database time signals.