EP1794745A1

EP1794745A1 - Device and method for changing the segmentation of an audio piece

Info

Publication number: EP1794745A1
Application number: EP05762452A
Authority: EP
Inventors: Markus Van Pinxteren; Michael Saupe; Markus Cremer
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2004-09-28
Filing date: 2005-07-15
Publication date: 2007-06-13
Anticipated expiration: 2025-07-15
Also published as: JP2008515011A; JP5565374B2; ATE390681T1; EP1794745B1; JP2011180610A; WO2006034742A1; US20060080100A1; DE502005003500D1; US7282632B2; DE102004047069A1; US7345233B2; US20060065106A1

Abstract

For grouping temporal segments of an audio piece, which is structured into main parts repeatedly occurring in the audio piece, into various segment classes, at first a similarity representation for the segments is provided, wherein the similarity representation for each segment comprises an associated plurality of similarity values, wherein the similarity values indicate how similar the segment is to every other segment of the audio piece. Hereupon, using the similarity values associated with the segment, a similarity threshold value for a segment is calculated in order to then associate a segment with a segment class when the similarity value of the segment meets a predetermined relation with reference to the similarity threshold value. With this, clustering is achieved, which also works efficiently and correctly where there are segments with strongly different or almost equal combined similarity values.

Description

Device and method for changing a segmentation of an audio piece


The present invention relates to audio segmentation and in particular to the analysis of pieces of music with regard to the individual main parts contained in them, which may occur repeatedly in a piece of music.

Music from the rock and pop genres usually consists of more or less distinct segments, such as intro, verse, chorus, bridge, outro, etc. The goal of audio segmentation is to detect the start and end times of such segments and to group the segments according to their membership in the most important classes (verse and chorus). Correct segmentation, together with identification of the calculated segments, can be put to good use in various areas. For example, pieces of music from online providers such as Amazon, Musicline, etc. can be intelligently "previewed".

Most providers on the Internet limit their listening samples to a short excerpt from the available pieces of music. In this case, it would of course make sense to offer the interested party not just the first 30 seconds or an arbitrary 30 seconds, but an excerpt of the song that is as representative as possible. This could be, for example, the chorus, or else a summary of the song consisting of segments that belong to the various main classes (stanza, chorus, etc.).

Another application example for audio segmentation is the integration of the segmentation/grouping/labeling algorithm into a music player. The information about segment beginnings and segment ends makes it possible to navigate through a piece of music in a targeted manner. Thanks to the class membership of the segments, i.e. whether a segment is a verse, a chorus, etc., it is also possible, for example, to jump directly to the next chorus or the next stanza. Such an application is of interest to large music stores that offer their customers the opportunity to listen to complete albums. The customer is thereby spared the tedious fast-forwarding in search of characteristic passages in the song, which might in the end lead him to actually buy a piece of music.
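The navigation use case described above can be illustrated with a minimal sketch. It assumes that a labeled segmentation is already available as a sorted list of (start, end, label) tuples; the function name `next_segment_start`, the label strings and the example times are hypothetical and not taken from the application.

```python
import bisect

def next_segment_start(segments, position, wanted_label):
    """Return the start time of the next segment with wanted_label after position."""
    starts = [start for start, _end, _label in segments]
    first = bisect.bisect_right(starts, position)  # first segment starting after position
    for start, _end, label in segments[first:]:
        if label == wanted_label:
            return start
    return None  # no further segment of that class

# Hypothetical labeled segmentation: (start, end, class label), sorted by start time.
song = [(0, 15, "intro"), (15, 45, "stanza"), (45, 70, "chorus"),
        (70, 100, "stanza"), (100, 125, "chorus")]
jump_target = next_segment_start(song, 50, "chorus")
```

A player at second 50 (inside the first chorus) would jump to the start of the following chorus; if no later segment of the requested class exists, the sketch returns `None`.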

There are different approaches in the field of audio segmentation. In the following, the approach of Jonathan Foote and Matthew Cooper is presented by way of example. This method is described in FOOTE, J.T. / COOPER, M.L.: Summarizing Popular Music via Structural Similarity Analysis. Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2003, and in FOOTE, J.T. / COOPER, M.L.: Media Segmentation using Self-Similarity Decomposition. Proceedings of SPIE Storage and Retrieval for Multimedia Databases, Vol. 5021, pp. 167-75, January 2003.

The known method of Foote is explained by way of example with reference to the block diagram of Fig. 5. First, a WAV file 500 is provided. In a subsequent extraction block 502, a feature extraction takes place, in which either the spectral coefficients themselves or, alternatively, the mel frequency cepstral coefficients (MFCCs) are extracted as features. Prior to this extraction, a short-time Fourier transform (STFT) with 0.05-second-wide non-overlapping windows is applied to the WAV file. The MFCC features are then extracted in the spectral domain. It should be noted that the parameterization is not optimized for compression, transmission or reconstruction, but for audio analysis. The requirement is that similar audio pieces produce similar features.
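The windowing step can be sketched as follows. This is only an illustration of the short-time Fourier transform with 0.05-second non-overlapping windows and uses plain magnitude spectra as features; the MFCC computation is omitted, and the function name, sample rate and test tone are assumptions made for the example.

```python
import numpy as np

def frame_spectra(samples, sample_rate, window_seconds=0.05):
    """Split a mono signal into non-overlapping windows and return magnitude spectra."""
    window_length = int(round(window_seconds * sample_rate))
    num_windows = len(samples) // window_length
    # Drop the incomplete tail so that every window has the same length.
    frames = np.reshape(samples[:num_windows * window_length],
                        (num_windows, window_length))
    # One row per analysis window, one column per frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))

sample_rate = 8000                                 # hypothetical sample rate
t = np.arange(sample_rate) / sample_rate           # one second of audio
signal = np.sin(2 * np.pi * 440.0 * t)             # hypothetical 440 Hz test tone
features = frame_spectra(signal, sample_rate)
```

The result is a feature matrix with one row per analysis window, which matches the "number of analysis windows by number of feature coefficients" layout used in the rest of the description.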

The extracted features are then stored in a memory 504.

The feature extraction is followed by a segmentation algorithm that results in a similarity matrix, as shown in a block 506. First, however, the feature matrix is read in (508) in order to group feature vectors (510) and then to build, from the grouped feature vectors, a similarity matrix that consists of a distance measurement between every pair of features. In particular, all pairs of audio windows are compared using a quantitative similarity measure, the distance.

The structure of the similarity matrix is shown in FIG. 8. In FIG. 8, the piece of music is represented as a stream 800 of audio samples. The audio piece is windowed, as has been stated, with a first window denoted by i and a second window denoted by j. Overall, the audio piece has, for example, K windows. This means that the similarity matrix has K rows and K columns. Then, for each window i and each window j, a measure of their similarity to each other is calculated, and the calculated similarity measure or distance measure D(i, j) is entered into the similarity matrix at the row and column designated by i and j. One column therefore shows the similarity of the window designated by j to all other audio windows in the piece of music. The similarity of window j to the very first window of the piece would then be found in column j and row 1. The similarity of window j to the second window of the piece would then be found in column j, but now in row 2. By contrast, the similarity of the second window to the first window would be found in the second column and the first row of the matrix.

It can be seen that the matrix is redundant in that it is symmetric about the main diagonal, and that the main diagonal contains the similarity of each window to itself, which is the trivial case of 100% similarity.

An example of a similarity matrix of a piece can be seen in FIG. 6. Here again, the completely symmetrical structure of the matrix with respect to the main diagonal is recognizable, the main diagonal being visible as a light stripe. It should also be noted that, because the window length is small compared to the relatively coarse time resolution of FIG. 6, the main diagonal does not appear as a solid light line but is only approximately recognizable.

Using the similarity matrix as shown, for example, in FIG. 6, a kernel correlation 512 with a kernel matrix 514 is then performed in order to obtain a novelty measure, also known as a novelty score, which may be averaged and smoothed. The smoothing of this novelty score is shown schematically in FIG. 5 by a block 516.

Then, in a block 518, the segment boundaries are read out using the smoothed novelty curve. For this purpose, the local maxima of the smoothed novelty curve are determined and, if necessary, shifted by a constant number of samples caused by the smoothing, in order to actually obtain the correct segment boundaries of the audio piece as absolute or relative time values. Then, as can be seen from the block of FIG. 5 designated "clustering", a so-called segment similarity representation or segment similarity matrix is created. An example of a segment similarity matrix is shown in FIG. 7. The similarity matrix in FIG. 7 is basically similar to the feature similarity matrix of FIG. 6, but now, unlike in FIG. 6, features from whole segments are used rather than features from windows. The segment similarity matrix conveys similar information to the feature similarity matrix, but at a much coarser resolution, which is of course desired given that window lengths are in the range of 0.05 seconds, while reasonably long segments of a piece are in the range of perhaps 10 seconds.

Then, in a block 522, a clustering is performed, i.e. an arrangement of the segments into segment classes (similar segments are placed in the same segment class), in order then to mark the segment classes found in a block 524, which is also referred to as "labeling". Thus, in labeling, it is determined which segment class contains segments that are stanzas, which contains choruses, which contains intros, outros, bridges, etc.

Finally, in a block designated 526 in FIG. 5, a music summary is created, which can, for example, be provided to a user so that he hears, without redundancy, only, for example, one stanza, one chorus and the intro of a piece.

The individual blocks will be discussed in more detail below.

As has already been explained, the actual segmentation of the piece of music does not take place until the feature matrices have been generated and stored (block 504).

Depending on the feature on the basis of which the structure of the piece of music is to be examined, the corresponding feature matrix is read out and loaded into main memory for further processing. The feature matrix has the dimensions number of analysis windows by number of feature coefficients.

The similarity matrix brings the feature progression of a piece into a two-dimensional representation. For each pairwise combination of feature vectors, the distance measure is calculated and recorded in the similarity matrix. There are various possibilities for calculating the distance measure between two vectors, for example the Euclidean distance and the cosine distance. The result D(i, j) for two feature vectors is stored in element (i, j) of the window similarity matrix (block 506). The main diagonal of the similarity matrix represents the progression over the entire piece. Accordingly, the elements of the main diagonal result from comparing each window with itself and always have the value of greatest similarity. For the cosine distance this is the value 1; for the simple scalar difference and the Euclidean distance it is 0.
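As a minimal sketch of the two distance measures named above, the following computes a K x K window similarity matrix from a feature matrix with one row per window. The function names and the tiny feature matrix are assumptions for illustration only.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Return the K x K matrix D(i, j) of cosine similarities between feature vectors."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)   # guard against all-zero windows
    return unit @ unit.T

def euclidean_distance_matrix(features):
    """Return the K x K matrix of Euclidean distances (0 on the main diagonal)."""
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=2))

# Hypothetical feature matrix: three windows, two coefficients each.
features = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
S = cosine_similarity_matrix(features)
D = euclidean_distance_matrix(features)
```

Both matrices are symmetric about the main diagonal; the diagonal holds 1 for the cosine measure and 0 for the Euclidean distance, as stated in the text.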

To visualize a similarity matrix, as shown in FIG. 6, each element (i, j) is assigned a gray value. The gray values are graduated in proportion to the similarity values, so that the maximum similarity (the main diagonal) corresponds to the maximum brightness. With this representation, the structure of a song can already be recognized visually from the matrix. Areas of similar feature characteristics correspond to squares of similar brightness along the main diagonal. Finding the boundaries between these areas is the task of the actual segmentation. The structure of the similarity matrix is important for the novelty measure calculated in the kernel correlation 512. The novelty measure results from correlating a special kernel along the main diagonal of the similarity matrix. An exemplary kernel K is shown in FIG. 5. If this kernel matrix is correlated along the main diagonal of the similarity matrix S, and all products of the superimposed matrix elements are summed for each time point i of the piece, the novelty measure is obtained, which is shown by way of example in smoothed form in FIG. 9. Preferably, the kernel K of FIG. 5 is not used as such, but rather an enlarged kernel, which is additionally overlaid with a Gaussian distribution so that the edges of the matrix tend towards 0.
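The kernel correlation can be sketched as below. The checkerboard form of the kernel follows the cited Foote and Cooper approach; the kernel size, the Gaussian width `sigma`, the zero padding at the edges of the piece and all function names are assumptions made for this illustration.

```python
import numpy as np

def checkerboard_kernel(half_width, sigma=None):
    """Checkerboard kernel, optionally tapered by a Gaussian so its edges tend to 0."""
    offsets = np.arange(-half_width, half_width) + 0.5   # symmetric about the centre
    sign = np.sign(offsets)
    kernel = np.outer(sign, sign)   # +1 within one side, -1 across the boundary
    if sigma is not None:
        taper = np.exp(-0.5 * (offsets / sigma) ** 2)
        kernel = kernel * np.outer(taper, taper)
    return kernel

def novelty_curve(similarity, half_width, sigma=None):
    """Correlate the kernel along the main diagonal of a similarity matrix."""
    kernel = checkerboard_kernel(half_width, sigma)
    padded = np.pad(similarity, half_width, mode="constant")
    num_windows = similarity.shape[0]
    novelty = np.empty(num_windows)
    for i in range(num_windows):
        # Block centred on the boundary between window i - 1 and window i.
        block = padded[i:i + 2 * half_width, i:i + 2 * half_width]
        novelty[i] = np.sum(kernel * block)
    return novelty

# Hypothetical piece: two homogeneous parts of 10 windows each.
labels = np.array([0] * 10 + [1] * 10)
block_similarity = (labels[:, None] == labels[None, :]).astype(float)
novelty = novelty_curve(block_similarity, half_width=4)
tapered = checkerboard_kernel(4, sigma=2.0)
```

For this idealized matrix the novelty value is zero inside a homogeneous part and largest at window 10, where the second part begins. The zero padding also produces a smaller response at the very start and end of the piece.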

The selection of the prominent maxima in the novelty curve is important for the segmentation. Selecting all maxima of the unsmoothed novelty curve would lead to a strong over-segmentation of the audio signal.

Therefore, the novelty measure should be smoothed, for which various filters can be used, such as IIR filters or FIR filters.
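A minimal sketch of smoothing and boundary read-out is given below. It uses a causal moving-average FIR filter as one possible smoothing filter and shifts each detected maximum back by the constant filter delay, as mentioned earlier for block 518. The number of taps, the `min_height` parameter and the function names are assumptions.

```python
import numpy as np

def smooth_fir(novelty, num_taps=5):
    """Smooth with a causal moving-average FIR filter; returns (smoothed, delay)."""
    taps = np.ones(num_taps) / num_taps
    smoothed = np.convolve(novelty, taps)[:len(novelty)]   # causal filter output
    delay = (num_taps - 1) // 2                            # group delay in samples
    return smoothed, delay

def segment_boundaries(novelty, num_taps=5, min_height=0.0):
    """Local maxima of the smoothed curve, shifted back by the filter delay."""
    smoothed, delay = smooth_fir(novelty, num_taps)
    boundaries = []
    for i in range(1, len(smoothed) - 1):
        is_peak = smoothed[i - 1] < smoothed[i] >= smoothed[i + 1]
        if is_peak and smoothed[i] > min_height:
            boundaries.append(i - delay)
    return boundaries

# Hypothetical novelty curve with two triangular peaks.
novelty = np.zeros(30)
novelty[8:13] = [1, 2, 3, 2, 1]     # true peak at index 10
novelty[20:25] = [1, 2, 3, 2, 1]    # true peak at index 22
boundaries = segment_boundaries(novelty)
```

Without the delay correction the boundaries would be reported two windows too late for this five-tap filter. Flat-topped peaks would need additional plateau handling, which the sketch does not attempt.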

Once the segment boundaries of a piece of music have been extracted, similar segments must be identified as such and grouped into classes.

Foote and Cooper describe the computation of a segment-based similarity matrix using a Kullback-Leibler distance. For this purpose, individual segment feature matrices are extracted from the entire feature matrix on the basis of the segment boundaries obtained from the novelty curve, i.e. each of these matrices is a submatrix of the entire feature matrix. The resulting segment similarity matrix 520 is then subjected to a singular value decomposition (SVD), which yields singular values in descending order.
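The following sketch illustrates one way such a segment similarity matrix could be built and decomposed. It is not taken from the cited papers: modelling each segment as a diagonal Gaussian, symmetrising the Kullback-Leibler distance, and mapping distance to similarity with `exp(-d)` are all assumptions made for the example, as are the function names and the random test data.

```python
import numpy as np

def gaussian_stats(segment_features, floor=1e-6):
    """Mean and (floored) variance per feature coefficient of one segment."""
    return segment_features.mean(axis=0), segment_features.var(axis=0) + floor

def symmetric_kl(stats_a, stats_b):
    """Symmetrised Kullback-Leibler distance between two diagonal Gaussians."""
    (mean_a, var_a), (mean_b, var_b) = stats_a, stats_b
    squared_diff = (mean_a - mean_b) ** 2
    kl_ab = 0.5 * np.sum(np.log(var_b / var_a) + (var_a + squared_diff) / var_b - 1.0)
    kl_ba = 0.5 * np.sum(np.log(var_a / var_b) + (var_b + squared_diff) / var_a - 1.0)
    return kl_ab + kl_ba

def segment_similarity_matrix(feature_matrix, boundaries):
    """Cut the feature matrix at the boundaries and compare all segment pairs."""
    edges = [0] + list(boundaries) + [len(feature_matrix)]
    stats = [gaussian_stats(feature_matrix[a:b]) for a, b in zip(edges[:-1], edges[1:])]
    count = len(stats)
    similarity = np.empty((count, count))
    for i in range(count):
        for j in range(count):
            # Assumed mapping from distance to similarity (1 means identical).
            similarity[i, j] = np.exp(-symmetric_kl(stats[i], stats[j]))
    return similarity

# Hypothetical piece with the structure A B A, 20 windows per segment.
rng = np.random.default_rng(0)
feature_matrix = np.vstack([rng.normal(0.0, 1.0, size=(20, 3)),
                            rng.normal(5.0, 1.0, size=(20, 3)),
                            rng.normal(0.0, 1.0, size=(20, 3))])
segment_similarity = segment_similarity_matrix(feature_matrix, boundaries=[20, 40])
singular_values = np.linalg.svd(segment_similarity, compute_uv=False)
```

`np.linalg.svd` returns the singular values sorted in descending order, which corresponds to the ordering mentioned in the text.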

In block 526, an automatic summary of a piece is then carried out on the basis of the segments and clusters of a piece of music. For this purpose, first the two clusters with the largest singular values are selected. Then, the segment with the maximum value of the corresponding cluster indicator is added to this summary. This means that the summary includes a stanza and a chorus. Alternatively, all repeated segments can be removed to ensure that all information of the piece is provided, but always exactly once.

Regarding further techniques for segmentation and music analysis, reference is made to CHU, S. / LOGAN, B.: Music Summary using Key Phrases. Technical Report, Cambridge Research Laboratory 2000, and to BARTSCH, M.A. / WAKEFIELD, G.H.: To Catch a Chorus: Using Chroma-Based Representations for Audio Thumbnailing. Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2001. http://musen.engin.umich.edu/papers/bartsch wakefield waspaaOl final.pdf.

A disadvantage of the known method is the fact that the singular value decomposition (SVD) for segment class formation, that is to say the assignment of segments to clusters, is very computationally intensive and is problematic in the evaluation of the results. Thus, if the singular values are nearly equal, then a possibly wrong decision is made that the two similar singular values actually represent the same segment class and not two different segment classes.

It was also found that the results obtained by singular value decomposition become increasingly problematic when there are strong differences in the similarity values, that is, when a piece contains very similar parts, such as stanza and chorus, but also relatively dissimilar parts, such as intro, outro or bridge.

Also problematic about the known method is that the cluster, among the two clusters with the highest singular values, that contains the first segment in the song is always assumed to be the "stanza" cluster, and the other cluster is assumed to be the "chorus" cluster. This procedure rests on the assumption in the known method that a song always starts with a stanza. Experience has shown that this leads to significant labeling errors. This is problematic insofar as labeling is effectively the "harvest" of the entire process, i.e. what the user experiences directly. However precise and elaborate the preceding steps may have been, everything is put into perspective if the labeling is wrong in the end, because the user could then lose confidence in the concept as a whole.

It should also be noted at this point that there is a particular need for automatic music analysis methods whose results cannot always be checked and, if necessary, corrected. Rather, a method can only be used on the market if it can run automatically without human post-correction.

A further disadvantage of the known concept is that the segmentation calculated by the singular value decomposition is taken as fixed. In other words, both the clustering and the final labeling build on the segmentation determined by singular value decomposition. However, clustering and labeling, and thus also the music summary, which is the actual product of the entire process for the listener, can never be better than the underlying segmentation. If over-segmentation occurs, as is frequently the case for kernel-correlation-based concepts, one ends up with far too many segment classes, which have to be post-processed in order to remove, completely if necessary, disturbing segment classes that do not actually correspond to a main part. This "post-repair" is unfavorable in that it eliminates audio information: a listener who navigates through the audio piece on the basis of the segment classes mentioned cannot hear the entire audio information, since insignificant segments, which do not actually correspond to any main part, have been completely eliminated by this method.

Even more important, however, is the fact that over-segmentation, which can also occur with other segmentation methods, indicates that the original primary segmentation was not correct. The segments of the segment class designated, for example, "chorus" are then of differing quality: a segment for which the segmentation was correct contains a longer chorus, while another segment, for which the segmentation was incorrect, contains a shorter chorus. Working with the segmented representation of the audio piece will then cause synchronization problems and user confusion, which may even go so far that the user loses confidence in the segmentation concept altogether.

The object of the present invention is to achieve a more precise segmentation concept, which should also be compatible with an already existing first segmentation of the audio piece.

This object is achieved by a device for changing a segmentation of an audio piece according to claim 1, a method for changing a segmentation of an audio piece according to claim 19 or a computer program according to claim 20.

The present invention is based on the finding that over-segmentation is effectively counteracted if, after an original segmentation and a subsequent segment class assignment, the already completed original segmentation is corrected. For this purpose, the device according to the invention comprises a segmentation correction device for correcting the segmentation, which is designed to merge a segment whose length is shorter than a predetermined minimum length with a temporally preceding segment or a temporally succeeding segment, in order to obtain a modified segmentation of the audio piece. According to the invention, this post-correction takes place after the first segmentation and after the assignment to segment classes that follows the first segmentation, and thus also after the clustering. This makes it possible not only to merge short segments with a predecessor segment or a successor segment according to certain criteria in order to correct the segmentation, but also to use, for this merger, information about the segment class membership of the predecessor segment, of the successor segment, or of the short segment itself.

However, even without considering the segment class memberships of the short segment, the predecessor segment or the successor segment, simple algorithms can achieve a segment merger on the basis of a check of the novelty values at the segment boundaries, and this already yields an acceptable hit rate.

Preferably, however, the segment merger based on the novelty values at the segment boundaries is carried out only as a last resort, so to speak, namely when a short segment could not be merged by the preceding checks, in which the segment class memberships of the predecessor and successor segments concerned were taken into account.
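The order of checks described above (class membership first, novelty values as a last resort) can be sketched as follows. This is an illustration under stated assumptions, not the claimed procedure: the `Segment` type, the rule that the short segment is dissolved across the boundary with the lower novelty value, and the choice that the merged segment keeps the neighbour's class are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float      # seconds
    end: float        # seconds
    label: int        # segment class assigned by the clustering step

def merge_short_segments(segments, boundary_novelty, min_length):
    """Merge segments shorter than min_length into a neighbouring segment.

    boundary_novelty[k] is the novelty value at the boundary between
    segments[k] and segments[k + 1].
    """
    segments = [Segment(s.start, s.end, s.label) for s in segments]
    novelty = list(boundary_novelty)
    k = 0
    while k < len(segments) and len(segments) > 1:
        seg = segments[k]
        if seg.end - seg.start >= min_length:
            k += 1
            continue
        has_prev, has_next = k > 0, k < len(segments) - 1
        if has_prev and segments[k - 1].label == seg.label:
            into_prev = True                    # same class as predecessor
        elif has_next and segments[k + 1].label == seg.label:
            into_prev = False                   # same class as successor
        elif has_prev and has_next:
            # Last resort (assumed rule): dissolve the weaker boundary.
            into_prev = novelty[k - 1] <= novelty[k]
        else:
            into_prev = has_prev                # only one neighbour exists
        if into_prev:
            segments[k - 1].end = seg.end
            del segments[k], novelty[k - 1]
        else:
            segments[k + 1].start = seg.start
            del segments[k], novelty[k]
    return segments

# Hypothetical over-segmented piece with a minimum segment length of 5 seconds.
original = [Segment(0.0, 20.0, 0), Segment(20.0, 23.0, 1), Segment(23.0, 45.0, 1),
            Segment(45.0, 47.0, 2), Segment(47.0, 70.0, 0)]
merged = merge_short_segments(original, [0.9, 0.2, 0.8, 0.3], min_length=5.0)
```

Because short segments are merged rather than discarded, the segments of the changed segmentation still cover the full length of the piece.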

In a preferred embodiment of the present invention, an adaptive segment assignment is performed on the basis of the primary segmentation, wherein, when segment assignment conflicts occur, segments that are actually assigned to a first segment class receive a tendency toward the other segment class that caused the conflict. If it then turns out that a segment with such a tendency is at the same time a short segment, and if the tendency points to a segment class to which the temporally preceding or the temporally succeeding segment also belongs, then a segment merger is obtained on the basis of this tendency or trend that is consistent with the original similarity representation.

The inventive concept is particularly advantageous in that no portion of the audio piece is completely eliminated. The user, who then navigates through the audio piece when all processing has finished, will find segments which constitute the changed segmentation whose total length is still equal to the original length of the audio piece.

In addition, a number of segment classes is obtained, which is equal to the number of main parts occurring in an audio piece.

Furthermore, the minimum length of a segment can be set variably, solely by specifying a temporal threshold value. This makes it possible, particularly in connection with a music genre identification, to adapt the permissible minimum segment lengths to the music genre, especially since different music genres can entail different segment lengths.

Furthermore, the concept according to the invention also makes it possible to reduce the number of segment classes by assigning short segments solely on the basis of a minimum length threshold value, until an expected number is reached, without the segment representation of the audio piece containing gaps.

In a preferred embodiment, the assignment of a segment to a segment class is based on an adaptive similarity mean value for a segment, such that the similarity mean value takes into account the overall similarity score that the segment has across the whole piece. After such a similarity mean value has been calculated for a segment, for which the number of segments and the similarity values of the plurality of similarity values assigned to the segment are required, the actual assignment of a segment to a segment class, i.e. to a cluster, is carried out on the basis of this similarity mean value. If, for example, the similarity value of a segment to the segment currently under consideration is above the similarity mean value, the segment is assigned to the segment class currently being considered. If, on the other hand, the similarity value of a segment to the segment under consideration is below this similarity mean value, it is not assigned to the segment class.

In other words, the assignment is no longer carried out as a function of the absolute size of the similarity values, but relative to the similarity mean value. This means that for a segment with a relatively low similarity score, for example a segment that is an intro or outro, the similarity mean value will be lower than for a segment that is a stanza or chorus. The strong variations in the similarities of segments within pieces, and in the frequency with which certain segments occur in pieces, are thus taken into account, so that, for example, numerical problems, and hence ambiguities and the associated misassignments, can be avoided.

The concept according to the invention is particularly suitable for pieces of music which consist not only of stanzas and choruses, that is, of segments that belong to segment classes having equally large similarity values, but which also have other parts in addition to stanza and chorus, namely an introduction (intro), an interlude (bridge) or a conclusion (outro).

In a preferred embodiment of the present invention, the calculation of the adaptive similarity mean value and the assignment of a segment are performed iteratively, with the similarity values of already assigned segments being ignored in the next iteration. This results in a new maximum similarity absolute value for the next iteration, i.e. a new maximum of the sum of the similarity values in a column of the similarity matrix, since the similarity values corresponding to the previously assigned segments have been set to zero.
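The iteration can be sketched as follows. This is a minimal illustration, assuming a similarity measure where larger means more similar; keeping the divisor at the number of segments minus one in every pass, and using a strict "above the mean" comparison, are assumptions, as are the function name and the example matrix.

```python
import numpy as np

def cluster_segments(similarity):
    """Return one class index per segment using an adaptive per-column threshold."""
    work = np.array(similarity, dtype=float)
    num_segments = work.shape[0]
    np.fill_diagonal(work, 0.0)        # remove the self-similarity from every column sum
    labels = np.full(num_segments, -1)
    next_class = 0
    while np.any(labels < 0):
        unassigned = labels < 0
        scores = np.where(unassigned, work.sum(axis=0), -np.inf)
        seed = int(np.argmax(scores))                  # column with the largest sum
        threshold = scores[seed] / max(num_segments - 1, 1)
        members = unassigned & (work[:, seed] > threshold)
        members[seed] = True
        labels[members] = next_class
        # Ignore assigned segments in later passes by zeroing their rows and columns.
        work[members, :] = 0.0
        work[:, members] = 0.0
        next_class += 1
    return labels

# Hypothetical piece with the structure A B A B C.
example = np.array([[1.0, 0.2, 0.9, 0.2, 0.1],
                    [0.2, 1.0, 0.2, 0.8, 0.1],
                    [0.9, 0.2, 1.0, 0.2, 0.1],
                    [0.2, 0.8, 0.2, 1.0, 0.1],
                    [0.1, 0.1, 0.1, 0.1, 1.0]])
classes = cluster_segments(example)
```

In this example the two A segments form the first class, the two B segments the second, and the dissimilar C segment ends up in a class of its own even though its absolute similarity values are low.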

According to the invention, a segmentation post-correction is carried out in such a way that, after the segmentation, for example on the basis of the novelty value (the local maxima of the novelty value), and after a subsequent assignment to segment classes, relatively short segments are examined to see whether they can be assigned to the predecessor segment or the successor segment, since segments below a minimum segment length indicate over-segmentation with a high degree of probability. In an alternative preferred embodiment of the present invention, a labeling is carried out after the final segmentation and assignment to the segment classes, specifically using a special selection algorithm in order to obtain the most correct possible labeling of the segment classes as stanza or chorus.

Preferred embodiments of the present invention will be explained in detail below with reference to the accompanying drawings, in which:

FIG. 1 shows a block diagram of the device according to the invention for grouping according to a preferred embodiment of the present invention;

FIG. 2 shows a flow chart illustrating a preferred embodiment of the invention for iterative assignment;

FIG. 3 shows a block diagram of the mode of operation of the segmentation correction device;

FIGS. 4a and 4b show a preferred embodiment of the segment class designation device;

FIG. 5 shows an overall block diagram of an audio analysis tool;

FIG. 6 shows an exemplary feature similarity matrix;

FIG. 7 shows an exemplary segment similarity matrix;

FIG. 8 shows a schematic illustration of the elements in a similarity matrix S; and

FIG. 9 shows a schematic representation of a smoothed novelty value.

FIG. 1 shows a device for grouping temporal segments of a piece of music, which is structured into main parts that occur repeatedly in the piece of music, into different segment classes, one segment class being assigned to one main part. The present invention thus relates particularly to pieces of music which follow a certain structure in which similar sections appear several times and alternate with other sections. Most rock and pop songs have a clear structure in terms of their main parts.

The literature deals with the subject of music analysis mainly on the basis of classical music, but much of it also applies to rock and pop music. The main parts of a piece of music are also called "large form parts". A large form part of a piece is understood to mean a section that has a relatively uniform character with regard to various features, e.g. melody, rhythm, texture, etc. This definition applies generally in music theory.

Large form parts in rock and pop music are, for example, stanza, chorus, bridge and solo. In classical music, an alternation of a refrain with other parts (couplets) of a composition is called a rondo. In general, the couplets contrast with the refrain, for example in melody, rhythm, harmony, key or instrumentation. This can also be transferred to modern popular music. Just as the rondo has different forms (chain rondo, arch rondo, sonata rondo), well-established patterns for constructing a song are also found in rock and pop music. These are, of course, only a few of many possibilities. Ultimately, the composer decides how his piece is constructed. An example of a typical structure of a rock song is the pattern A B A B C D A B, where A is the stanza, B the chorus, C the bridge and D the solo. Often a piece of music is introduced with a prelude (intro). Intros often consist of the same chord progression as the stanza, but with different instrumentation, e.g. without drums, without bass or, in rock songs, without guitar distortion.

The device according to the invention initially comprises a device 10 for providing a similarity representation for the segments, the similarity representation having an associated plurality of similarity values for each segment, the similarity values indicating how similar the segment is to every other segment. The similarity representation is preferably the segment similarity matrix shown in FIG. 7. It has a separate column for each segment (segments 1-10 in FIG. 7), designated by the index "j". Furthermore, the similarity representation has a separate row for each segment, a row being designated by a row index "i". This is explained below using segment 5 as an example. The element (5, 5) on the main diagonal of the matrix of FIG. 7 is the similarity value of segment 5 with itself, i.e. the maximum similarity value. Furthermore, segment 5 is still moderately similar to segment No. 6, as indicated by the element (6, 5) or the element (5, 6) of the matrix in FIG. 7. In addition, segment 5 also has similarities to segments 2 and 3, as shown by the elements (2, 5) or (3, 5) or (5, 2) or (5, 3) in FIG. 7. To the other segments 1, 4, 7, 8, 9 and 10, segment No. 5 has a similarity that is no longer visible in FIG. 7.

A plurality of similarity values assigned to a segment is, for example, a column or a row of the segment similarity matrix in FIG. 7, where the column or row index indicates which segment it refers to, for example the fifth segment, and where this row or column comprises the similarities of the fifth segment to every other segment in the piece. Thus, the plurality of similarity values is, for example, a row of the similarity matrix or, alternatively, a column of the similarity matrix of FIG. 7.

The device for grouping temporal segments of the piece of music further comprises a means 12 for calculating a similarity mean value for a segment, using the segments and the similarity values of the plurality of similarity values assigned to the segment. The device 12 is designed, for example, to calculate a similarity mean value for column 5 in FIG. 7. If the arithmetic mean is used, as in a preferred exemplary embodiment, the device 12 will add the similarity values in the column and divide the sum by the total number of segments. In order to eliminate the self-similarity, the similarity of the segment to itself could also be subtracted from the result of the addition, in which case the division should of course no longer be by the number of all elements but by the number of all elements minus 1.

The means 12 for calculating could alternatively calculate the geometric mean value, that is to say, square each similarity value of a column individually, sum the squared results, and then take the root of the sum, which is then divided by the number of elements in the column (or the number of elements in the column minus 1). Any other mean values, such as the median, may be used as long as the mean value for each column of the similarity matrix is calculated adaptively, that is, as a value calculated using the similarity values of the plurality of similarity values associated with the segment. The adaptively calculated similarity threshold value is then provided to a means 14 for assigning a segment to a segment class. The means 14 for assigning is designed to assign a segment to a segment class if the similarity value of the segment fulfills a predetermined condition with respect to the similarity mean value. For example, if the similarity values are such that a larger value indicates greater similarity and a smaller value indicates lesser similarity, the predetermined condition will be that the similarity value of a segment must be equal to or above the similarity mean value for the segment to be assigned to a segment class.
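The threshold variants mentioned above can be compared in a short sketch. The function name, the `kind` strings and the example column are assumptions. The variant that the text calls the geometric mean is implemented here exactly as it is described (root of the sum of squares, divided by the element count), which differs from the usual textbook definition of a geometric mean.

```python
import numpy as np

def similarity_threshold(column, own_index, kind="arithmetic", exclude_self=True):
    """Adaptive threshold for one column of the segment similarity matrix."""
    values = np.asarray(column, dtype=float)
    if exclude_self:
        values = np.delete(values, own_index)   # divide by all elements minus 1
    if kind == "arithmetic":
        return float(np.mean(values))
    if kind == "root_of_squares":
        # Square, sum, take the root, then divide by the element count.
        return float(np.sqrt(np.sum(values ** 2)) / len(values))
    if kind == "median":
        return float(np.median(values))
    raise ValueError(f"unknown kind: {kind}")

# Hypothetical column for the segment at index 4 (self-similarity 1.0).
column = [0.1, 0.8, 0.1, 0.2, 1.0, 0.6]
threshold = similarity_threshold(column, own_index=4)
# Assign every segment whose similarity is equal to or above the threshold.
assigned = [i for i, value in enumerate(column) if value >= threshold]
```

With the arithmetic mean, the segments at indices 1 and 5 are assigned together with the segment itself, while the weakly similar segments stay unassigned.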

In a preferred exemplary embodiment of the present invention, further devices exist to realize special embodiments, which will be discussed later. These devices are a segment selection device 16, a segment assignment conflict device 18, a segmentation correction device 20 and a segment class designation device 22.

The segment selection device 16 in FIG. 1 is designed to first calculate an overall similarity value V (j) for each column in the matrix of FIG. 7, which is determined as follows:

P is the number of segments. SA is the value of the self-similarity of a segment to itself. Depending on the technique used, this value can be, for example, zero or one. The segment selection device 16 first calculates the value V(j) for each segment and then finds the index i of the element of the vector V with the maximum value. In other words, the column of the matrix is selected which achieves the greatest value or score when the individual similarity values in the column are added up. This segment could, for example, be segment no. 5, i.e. column 5 of the matrix in FIG. 7, since this segment has at least a certain similarity to three other segments. Another candidate in the example of FIG. 7 could be the segment with the number 7, since this segment also has a certain similarity to three other segments, which is moreover greater than the similarity of segment 5 to segments 2 and 3 (higher gray shade in FIG. 7).
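The formula for V(j) is not reproduced in this text. Judging from the definitions of P and SA and from the description of the column sum, a plausible reading is V(j) = S(1,j) + ... + S(P,j) - SA. The following Python sketch rests on that assumption; the function names are hypothetical.

```python
from typing import List, Sequence


def overall_similarity(matrix: Sequence[Sequence[float]],
                       self_similarity: float = 1.0) -> List[float]:
    """Column sums of the similarity matrix minus the self-similarity SA.

    Assumption: V(j) = sum_i S(i, j) - SA, reconstructed from the
    surrounding definitions rather than taken from the original formula.
    """
    p = len(matrix)
    return [
        sum(matrix[i][j] for i in range(p)) - self_similarity
        for j in range(p)
    ]


def select_segment(v: Sequence[float]) -> int:
    """Index of the maximum component of V (first one in case of a tie)."""
    return max(range(len(v)), key=lambda j: v[j])
```

Indices in the sketch start at 0, whereas the description numbers segments from 1.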

For the following example, it is now assumed that the segment selection device 16 selects segment no. 7 because it has the highest similarity score owing to the matrix elements (1,7), (4,7) and (10,7). In other words, the component V(7) has the maximum value among all components of V.

The similarity score of the column for segment no. 7 is now further divided by the number 9 (the ten segments of the example less 1) in order to obtain from the means 12 the similarity threshold value for the segment.

Then, in the segment similarity matrix, it is checked for the seventh row or column which segment similarities are above the calculated threshold value, i.e. with which segments the i-th segment has an above-average similarity. All these segments are now assigned, like the seventh segment, to a first segment class.

For the present example, it is assumed that the similarity of segment 10 to segment 7 is below average, but that the similarities of segment 4 and segment 1 to segment 7 are above average. Therefore, besides segment no. 7, segment no. 4 and segment no. 1 are also classified in the first segment class. By contrast, segment no. 10 is not classified in the first segment class because of its below-average similarity to segment no. 7.

After the assignment, the corresponding vector elements V(j) of all segments which were assigned to a cluster in this threshold value analysis are set to 0. In the example these are, besides V(7), also the components V(4) and V(1). This means directly that the 7th, 4th and 1st columns of the matrix will no longer be available for a later maximum search, since their components are zero and can therefore never be a maximum.

This is roughly equivalent to setting the entries (1,7), (4,7), (7,7) and (10,7) of the segment similarity matrix to 0. The same applies to column 1 (elements (1,1), (4,1) and (7,1)) and column 4 (elements (1,4), (4,4), (7,4) and (10,4)). For the sake of simpler handling, however, the matrix is not changed; instead, the components of V that belong to an assigned segment are ignored in the next maximum search in a later iteration step.

In a next iteration step, a new maximum is sought among the remaining elements of V, that is, among V(2), V(3), V(5), V(6), V(8), V(9) and V(10). Segment no. 5, i.e. V(5), is expected to yield the largest similarity score. The second segment class then receives segments 5 and 6. Because the similarities to segments 2 and 3 are below average, segments 2 and 3 are not placed in the second-order cluster. Following this assignment, the components V(6) and V(5) of the vector V are therefore set to 0, while the components V(2), V(3), V(8), V(9) and V(10) remain for the selection of the third-order cluster. Then a new maximum is again sought among these remaining elements of V. The new maximum could be V(10), i.e. the component of V for segment 10. Segment 10 thus goes into the third-order segment class. In doing so, it could also turn out that segment 7 has an above-average similarity to segment 10, although segment 7 has already been identified as belonging to the first segment class. This results in an assignment conflict, which is resolved by the segment assignment conflict device 18 of FIG. 1.

A simple way of resolving the conflict would be simply not to assign segment 7 to the third segment class and, for example, to assign segment 4 instead, provided that no conflict exists for segment 4 as well.

Preferably, however, in order not to disregard the similarity between the segment 7 and the segment 10, the similarity between 7 and 10 is taken into account in the following algorithm.

In general, the invention is designed not to disregard the similarity between i and k. Therefore, the similarity value S_s(i, k) of segments i and k is compared with the similarity value S_s(i*, k), where i* is the first segment assigned to the cluster C*. The cluster or segment class C* is the cluster to which segment k is already assigned on the basis of a previous examination. The similarity value S_s(i*, k) is the reason why segment k belongs to the cluster C*. If S_s(i*, k) is greater than S_s(i, k), segment k remains in cluster C*. If S_s(i*, k) is smaller than S_s(i, k), segment k is taken out of cluster C* and assigned to cluster C. In the first case, i.e. if segment k does not change its cluster membership, a tendency towards the cluster C* is noted for segment i. Preferably, however, a tendency is also noted when segment k changes its cluster membership. In this case, a tendency of this segment towards the cluster in which it was originally placed is noted. These tendencies may advantageously be used in a segmentation correction performed by the segmentation correction device 20.

Because segment 7 is the "original segment" of the first segment class, the similarity value check will come out in favor of the first segment class. Segment 7 will therefore not change its cluster membership but will remain in the first segment class. This fact is nevertheless taken into account in that segment no. 10 in the third segment class is attested a trend towards the first segment class.

According to the invention, it is thus taken into account that, in particular for segments that have similarities to two different segment classes, these similarities are not ignored but, where appropriate, are taken into account later by way of the trend or tendency.
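The conflict rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the data structures (`cluster_of`, `seed_of`, `tendencies`) and the function name `resolve_conflict` are hypothetical, and the case of exactly equal similarity values, which the description leaves open, is treated here as "segment k stays".

```python
from typing import Dict, Sequence


def resolve_conflict(s: Sequence[Sequence[float]], i: int, k: int,
                     cluster_of: Dict[int, int],
                     seed_of: Dict[int, int],
                     tendencies: Dict[int, int]) -> int:
    """Resolve a conflict for segment k, which already belongs to cluster C*.

    s          -- segment similarity matrix S_s
    i          -- seed segment of the new cluster C
    k          -- segment that is also above average with respect to i
    cluster_of -- maps a segment to its cluster (hypothetical structure)
    seed_of    -- maps a cluster to its first assigned segment i*
    tendencies -- maps a segment to the cluster it tends towards
    Returns the cluster of segment k after the check.
    """
    old_cluster = cluster_of[k]   # C*
    new_cluster = cluster_of[i]   # C
    i_star = seed_of[old_cluster]
    if s[i_star][k] >= s[i][k]:   # ties are an assumption, see lead-in
        # k keeps its membership; segment i is noted as tending to C*.
        tendencies[i] = old_cluster
    else:
        # k moves to C; its tendency to the original cluster is noted.
        cluster_of[k] = new_cluster
        tendencies[k] = old_cluster
    return cluster_of[k]
```

In the example of the description, i would be segment 10, k segment 7 and C* the first segment class, so that segment 7 stays and segment 10 receives a tendency towards the first segment class.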

The procedure continues until all segments in the segment similarity matrix are assigned, which is the case when all elements of vector V are set to zero.

For the example shown in FIG. 7, this would mean that next, for the fourth segment class, the maximum of V(2), V(3), V(8) and V(9) is determined, i.e. segments 2 and 3 are classified, and then, in a fifth segment class, segments 8 and 9, until all segments have been assigned. Thus, the iterative algorithm shown in FIG. 2 is completed. In the following, the preferred implementation of the segmentation correction device 20 will be described in detail with reference to FIG. 3.
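The iterative procedure described above can be summarized in a short Python sketch. It is a simplification under stated assumptions: V(j) is taken as the column sum minus the self-similarity, the threshold as V divided by the number of segments less 1, and segments that are already assigned are simply skipped, so that the tendency bookkeeping for assignment conflicts is left out. The function name `cluster_segments` is hypothetical.

```python
from typing import List, Optional, Sequence


def cluster_segments(s: Sequence[Sequence[float]],
                     self_similarity: float = 1.0) -> List[List[int]]:
    """Group segments into classes in the order in which they are found."""
    p = len(s)
    # Overall similarity per column (assumed form of V(j)).
    v = [sum(s[i][j] for i in range(p)) - self_similarity for j in range(p)]
    assigned: List[Optional[int]] = [None] * p
    classes: List[List[int]] = []
    while any(a is None for a in assigned):
        # Assigned segments are ignored in the maximum search.
        free = [j for j in range(p) if assigned[j] is None]
        seed = max(free, key=lambda j: v[j])
        threshold = v[seed] / (p - 1)
        members = [seed] + [
            i for i in free if i != seed and s[i][seed] >= threshold
        ]
        for m in members:
            assigned[m] = len(classes)
        classes.append(members)
    return classes
```

The loop terminates because every pass assigns at least the seed segment, which corresponds to the condition that all elements of V have eventually been set to zero.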

It has been found that, in the calculation of the segment boundaries by means of the kernel correlation, but also in the calculation of segment boundaries by means of other measures, an over-segmentation of a piece frequently occurs, i.e. too many segment boundaries or generally too short segments are calculated. An over-segmentation, caused for example by an incorrect subdivision of the stanza, is corrected according to the invention on the basis of the segment length and the information about which segment class a predecessor or successor segment has been sorted into. In other words, the correction serves to completely eliminate segments that are too short, i.e. to merge them with adjacent segments, and to subject segments which are short but not too short, i.e. which are short but still longer than the minimum length, to a special examination as to whether they should not nevertheless be merged with a predecessor segment or a successor segment. In principle, according to the invention, successive segments which belong to the same segment class are always merged. If, for example, the scenario yields that segments 2 and 3 fall into the same segment class, they are automatically merged with each other, while the segments in the first segment class, i.e. segments 7, 4 and 1, are spaced apart from one another and therefore (at least initially) cannot be merged. This is indicated in FIG. 3 by a block 30. Now, in a block 31, it is examined whether segments have a segment length that is smaller than a minimum length. Preferably, there are different minimum lengths.
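The merging of successive segments of the same segment class (block 30) could look as follows in Python. The representation of a segment as a tuple of start time, end time and class identifier, and the function name `merge_adjacent`, are assumptions made for this sketch.

```python
from typing import List, Tuple

# (start in seconds, end in seconds, segment class) -- assumed layout
Segment = Tuple[float, float, int]


def merge_adjacent(segments: List[Segment]) -> List[Segment]:
    """Merge temporally successive segments that share a segment class.

    The input is expected in temporal order.
    """
    merged: List[Segment] = []
    for start, end, cls in segments:
        if merged and merged[-1][2] == cls:
            # Extend the previous segment instead of opening a new one.
            prev_start = merged[-1][0]
            merged[-1] = (prev_start, end, cls)
        else:
            merged.append((start, end, cls))
    return merged
```

Segments of the same class that are separated by segments of other classes are left untouched, as described for segments 7, 4 and 1.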

Relatively short segments, namely those shorter than 11 seconds (a first threshold), are the only ones examined at all, while later even shorter segments, shorter than 9 seconds (a second threshold smaller than the first), and later still the remaining segments shorter than 6 seconds (a third threshold smaller than the second), are each treated differently again.

In the preferred embodiment of the present invention in which this staggered length check takes place, the segment length check in block 31 is initially directed at finding the segments shorter than 11 seconds. For segments longer than 11 seconds, no post-processing is performed, as indicated by a "No" at block 31. For segments shorter than 11 seconds, a trend check (block 32) is carried out first. It is thus first examined whether a segment has an associated trend or tendency as a result of the functionality of the segment assignment conflict device 18 of FIG. 1. In the example of FIG. 7, this would be segment 10, which has a trend towards the first segment class. If the tenth segment were shorter than 11 seconds, nothing would happen in the example shown in FIG. 7 even on the basis of the trend check, since the segment under consideration is merged only if it has a tendency not to just any cluster, i.e. to just any segment class, but to the cluster of an adjacent (preceding or following) segment. However, this is not the case for segment 10 in the example shown in FIG. 7.

In order to avoid segments that are too short and have no tendency towards the cluster of an adjacent segment, the procedure is as shown in blocks 33a, 33b, 33c and 33d in FIG. 3. Nothing is done with segments that are longer than 9 seconds but shorter than 11 seconds; they are left as they are. In block 33a, however, segments of the cluster X which are shorter than 9 seconds and for which both the predecessor segment and the successor segment belong to the cluster Y are assigned to the cluster Y, which automatically means that such a segment is merged with both the predecessor segment and the successor segment, so that overall a longer segment is formed which consists of the segment under consideration as well as the predecessor and successor segments. A subsequent merger may thus combine initially separate segments via an intervening segment that is to be merged.

Block 33b sets out what happens to a segment that is shorter than 9 seconds and is the only segment in a segment group. In the example, segment no. 10 is the only segment in the third segment class. If it were shorter than 9 seconds, it would automatically be assigned to the segment class to which segment no. 9 belongs. This automatically leads to a merger of segment 10 with segment 9. If segment 10 is longer than 9 seconds, this merger is not carried out.

In block 33c, an examination is then made for segments shorter than 9 seconds which are not the only segment in their cluster X, i.e. in their segment group. They are subjected to a more detailed check in which a regularity in the cluster sequence is to be ascertained. First, all segments from segment group X that are shorter than the minimum length are sought. It is then checked for each of these segments whether the predecessor and successor segments each belong to a uniform cluster. If all predecessor segments are from a uniform cluster, all segments from cluster X that are too short are assigned to the predecessor cluster. If, on the other hand, all successor segments are from a uniform cluster, the segments from cluster X that are too short are each assigned to the successor cluster. Block 33d sets out what happens if this condition, too, is not fulfilled for segments shorter than 9 seconds. In this case, a novelty value check is performed by resorting to the novelty value curve shown in FIG. 9. In particular, the novelty curve, which results from the kernel correlation, is read out at the locations of the segment boundaries concerned, and the maximum of these values is determined. If the maximum occurs at the beginning of a segment, the segments that are too short are assigned to the cluster of the successor segment. If the maximum occurs at the end of a segment, the segments that are too short are assigned to the cluster of the predecessor segment. If the segment labeled 90 in FIG. 9 were shorter than 9 seconds, the novelty check would yield a higher novelty value 91 at the beginning of segment 90 than at its end, where the novelty value is designated 92. Segment 90 would therefore be assigned to the successor segment, because the novelty value towards the successor segment is lower than the novelty value towards the predecessor segment.
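The novelty value check of block 33d amounts to a comparison of two readings of the novelty curve. A minimal Python sketch, assuming that the curve is available as a sequence of values indexed by frame and that the boundaries are given as frame indices, might read as follows. The function name `merge_direction` is hypothetical, and the case of equal novelty values, which the description does not cover, is resolved here in favor of the predecessor.

```python
from typing import Sequence


def merge_direction(novelty: Sequence[float],
                    start_index: int, end_index: int) -> str:
    """Decide with which neighbour a too-short segment is merged.

    A high novelty value marks a strong boundary. If the stronger
    boundary lies at the beginning of the segment, the segment is
    merged with its successor, otherwise with its predecessor.
    """
    if novelty[start_index] > novelty[end_index]:
        return "successor"
    return "predecessor"
```

The segment is thus always merged across the weaker of its two boundaries.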

If there are still segments that are shorter than 9 seconds and could not yet be merged, a staggered selection is once again carried out among them. In particular, all of the remaining segments that are shorter than 6 seconds are now selected. The segments of this group whose length is between 6 and 9 seconds are left untouched.

However, the segments which are shorter than 6 seconds are now all subjected to the novelty check explained with reference to the elements 90, 91, 92 and assigned to either the predecessor segment or the successor segment, so that at the end of the post-correction shown in FIG. 3, all segments that are too short, namely all segments below a length of 6 seconds, have been merged intelligently with predecessor and successor segments.

This procedure according to the invention has the advantage that no parts of the piece are eliminated, i.e. that segments which are too short are not simply eliminated by setting them to zero, but that the entire piece of music is still represented by the totality of the segments. The segmentation therefore causes no loss of information, which would be the case, however, if, for example, in response to the over-segmentation, all segments that are too short were simply eliminated "without regard to losses".

Referring now to FIG. 4a and FIG. 4b, a preferred implementation of the segment class designation device 22 of FIG. 1 is illustrated. According to the invention, two clusters are assigned the labels "stanza" and "refrain" during labeling.

According to the invention, it is not the cluster associated with the largest singular value of a singular value decomposition that is used as the refrain, and the cluster for the second-largest singular value as the stanza. Furthermore, it is not fundamentally assumed that each song begins with a stanza, i.e. that the cluster with the first segment is the stanza cluster and the other cluster is the refrain cluster. Instead, according to the invention, the cluster in the candidate selection that has the last segment is designated as the refrain, and the other cluster is designated as the stanza.

Thus, for the two clusters ultimately available for the stanza/refrain selection, it is checked (40) which cluster contains the segment that occurs last in the course of the song among the segments of the two segment groups, in order to designate that cluster as the refrain. This last segment may actually be the last segment in the song, or else a segment that occurs later in the song than all segments of the other segment class. If this segment is not the actual last segment in the song, this means that there is still an outro.

This decision is based on the insight that, in the vast majority of cases, the refrain comes after the last stanza in a song, either directly as the last segment of the song, if a piece is, for example, faded out with the refrain, or as the segment before an outro which follows a refrain and with which the piece ends.

If the last segment is from the first segment group, then all segments of this first (highest-order) segment class are designated as refrain, as represented by a block 41 in FIG. 4b. In addition, in this case, all segments of the other segment class in the selection are designated as "stanza", since typically one of the two candidate segment classes will contain the refrain and the other the stanzas.

If, on the other hand, the examination in block 40 as to which segment class in the selection has the last segment in the course of the piece of music reveals that this is the second, i.e. the lower-ranking, segment class, then it is examined in a block 42 whether the second segment class has the first segment in the piece of music. This examination is based on the insight that the probability is very high that a song starts with a stanza, not a refrain.

If the question in block 42 is answered with "no", that is to say if the second segment class does not have the first segment in the piece of music, then the second segment class is designated as refrain and the first segment class as stanza, as indicated in a block 43. If, on the other hand, the query in block 42 is answered with "yes", then, contrary to the rule, the second segment group is designated as stanza and the first segment group as refrain, as indicated in a block 44. The designation in block 44 is made because the probability that the second segment class corresponds to the refrain is then already quite small: if the improbability that a piece of music is introduced with a refrain is added, there is much to suggest a mistake in the clustering, for example that the last segment considered has been wrongly assigned to the second segment class.

FIG. 4b shows how the stanza / refrain determination has been carried out on the basis of two classes of segments available. After this stanza / refrain determination, the remaining segment classes can then be designated in a block 45, where an outro will possibly be the segment class having the last segment of the piece, while an intro will be the segment class which has the first segment of a piece in itself.
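The decision sequence of blocks 40 to 44 can be written down compactly. The following Python sketch assumes that each candidate class is given as a list of segment numbers in song order, numbered from 1 as in the description; the function name `label_classes` and the return format are hypothetical.

```python
from typing import Dict, Sequence


def label_classes(first_class: Sequence[int],
                  second_class: Sequence[int],
                  first_segment: int = 1) -> Dict[str, str]:
    """Assign the labels "refrain" and "stanza" to the two candidates.

    first_class  -- segment numbers of the higher-ranking candidate
    second_class -- segment numbers of the lower-ranking candidate
    """
    # Block 40: which class holds the later of the two last segments?
    if max(first_class) > max(second_class):
        # Block 41: the first class is the refrain.
        return {"refrain": "first", "stanza": "second"}
    # Block 42: does the second class also hold the first segment?
    if first_segment in second_class:
        # Block 44: contrary to the rule, the second class is the stanza.
        return {"refrain": "first", "stanza": "second"}
    # Block 43: the second class is the refrain.
    return {"refrain": "second", "stanza": "first"}
```

With the classes of the running example, `label_classes([1, 4, 7], [5, 6])` designates the first class as the refrain, because segment 7 occurs later than segments 5 and 6.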

It is shown below, with reference to FIG. 4a, how the two segment classes are determined that supply the candidates for the algorithm shown in FIG. 4b.

In general, the labels "stanza" and "refrain" are assigned during labeling, with one segment group being marked as the stanza segment group and the other as the refrain segment group. Basically, this concept is based on the assumption (A1) that the two clusters (segment groups) with the highest similarity values, i.e. cluster 1 and cluster 2, correspond to the refrain and stanza clusters. The one of these two clusters that occurs last is the refrain cluster, on the assumption that the refrain follows the last stanza. Experience from numerous tests has shown that cluster 1 corresponds to the refrain in most cases. For cluster 2, however, the assumption (A1) is often not fulfilled. This situation usually occurs when there is a third, frequently repeating part in the piece, for example a bridge, when intro and outro are highly similar, or in the not infrequent case that a segment in the piece has a high similarity to the refrain, and thus also a high overall similarity, but the similarity to the refrain is just not great enough for it still to belong to cluster 1.

Investigations have shown that this situation often occurs for variations of the refrain at the end of the piece. In order to label refrain and stanza correctly with the greatest possible certainty, the segment selection described in FIG. 4b is improved such that, as illustrated in FIG. 4a, the two candidates for the stanza-refrain selection are determined depending on the segments present in them.

First, in a step 46, the cluster or segment group with the highest similarity value (the value of the component of V that was the maximum when the first segment class was determined, i.e. for segment 7 in the example of FIG. 7), that is, the segment group determined in the first pass of FIG. 1, is included in the stanza-refrain selection as the first candidate.

The question now is which other segment group will be the second participant in the stanza-refrain selection. The most probable candidate is the second-highest segment class, i.e. the segment class found on the second pass through the concept described above. However, this does not always have to be the case. Therefore, it is first checked for the second-highest segment class (segment 5 in the example), i.e. cluster 2, whether this class has only a single segment, or exactly two segments of which one is the first segment and the other the last segment in the song (block 47).

If the answer to this question is "no", i.e. if the second-highest segment class has, for example, at least three segments, or two segments of which one lies within the piece and not at the "edge" of the piece, then the second segment class initially remains in the selection and is henceforth referred to as "second cluster".

If, on the other hand, the question in block 47 is answered with "yes", the second-highest class is eliminated (block 48a) and replaced by the segment class which occurs most frequently in the entire song (in other words, which contains the most segments) and is not the highest segment class (cluster 1). This segment class is henceforth referred to as "second cluster".

As explained below, "second cluster" still has to compete against a third segment class (48b), which is referred to as "third cluster", in order ultimately to survive the selection process as a candidate.

The segment class "third cluster" corresponds to the cluster which occurs most frequently in the entire song but is neither the highest segment class (cluster 1) nor the segment class "second cluster", i.e., so to speak, the next most frequently (often also equally frequently) occurring cluster after cluster 1 and "second cluster".

With regard to the so-called bridge problem, it is now checked for "third cluster" whether it belongs in the stanza-refrain selection more than "second cluster" does. This is done because "second cluster" and "third cluster" often occur the same number of times, so that one of them may represent a bridge or another recurring intermediate part. To ensure that, of the two, the segment class is selected that most likely corresponds to the stanza or the refrain, and not to a bridge or another intermediate part, the examinations shown in blocks 49a, 49b and 49c are performed.

The first examination, in block 49a, checks whether each segment of "third cluster" has a certain minimum length, with a preferred threshold value being, for example, 4% of the entire song length. Other values between 2% and 10% can also lead to meaningful results.

In a block 49b, it is then examined whether ThirdCluster has a greater total share of the song than SecondCluster. For this purpose, the durations of all segments in ThirdCluster are added up and compared with the correspondingly added total duration of all segments in SecondCluster. ThirdCluster has a greater total share of the song than SecondCluster if the added segments in ThirdCluster yield a greater value than the added segments in SecondCluster.

In block 49c, it is finally checked whether the distance of the segments from ThirdCluster to the segments from cluster 1, i.e. the most frequent cluster, is constant, i.e. whether a regularity is evident in the sequence.

If all three of these conditions are answered with "yes", ThirdCluster enters the stanza-refrain selection. If, however, at least one of these conditions is not met, ThirdCluster does not enter the stanza-refrain selection; instead, SecondCluster enters the stanza-refrain selection, as represented by a block 50 in FIG. 4a. This completes the "candidate search" for the stanza-refrain selection, and the algorithm shown in FIG. 4b is started, at the end of which it is established which segment class comprises the stanzas and which segment class comprises the refrain.
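The three examinations of blocks 49a, 49b and 49c could be combined as in the following Python sketch. Several points are assumptions made for illustration only: segments are given as pairs of start and end time in seconds, the "distance" of block 49c is interpreted as the time from the start of a ThirdCluster segment to the start of the next cluster 1 segment, and the `tolerance` parameter, like the function name `third_cluster_enters`, is hypothetical.

```python
from typing import List, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds -- assumed layout


def third_cluster_enters(third: Sequence[Span], second: Sequence[Span],
                         first: Sequence[Span], song_length: float,
                         min_fraction: float = 0.04,
                         tolerance: float = 1.0) -> bool:
    """Return True if ThirdCluster replaces SecondCluster as candidate."""
    # Block 49a: every segment reaches the minimum length (e.g. 4%).
    long_enough = all(
        (end - start) >= min_fraction * song_length for start, end in third
    )
    # Block 49b: greater total share of the song than SecondCluster.
    larger_share = (sum(e - s for s, e in third)
                    > sum(e - s for s, e in second))
    # Block 49c (assumed measure): constant gap to the next cluster 1 segment.
    gaps: List[float] = []
    for start, _ in third:
        following = [s for s, _ in first if s >= start]
        if not following:
            return False
        gaps.append(min(following) - start)
    regular = bool(gaps) and (max(gaps) - min(gaps) <= tolerance)
    return long_enough and larger_share and regular
```

A weighted combination of the three conditions, as mentioned in the description, would replace the final logical conjunction.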

It should be noted at this point that the three conditions in the blocks 49a, 49b, 49c could alternatively also be weighted, so that, for example, a "no" answer in block 49a is "overruled" if both the query in block 49b and the query in block 49c are answered with "yes". Alternatively, one of the three conditions could be emphasized, so that, for example, it is only examined whether there is a regularity of the sequence between the third segment class and the first segment class, while the queries in blocks 49a and 49b are not performed, or are performed only if the query in block 49c is answered with "no" but, for example, a relatively large total share is determined in block 49b and relatively large minimum lengths are determined in block 49a.

Alternative combinations are also possible, and for a low-level examination only the query of one of the blocks 49a, 49b, 49c will be sufficient for certain implementations.

Hereinafter, exemplary implementations of the block 526 for performing a music summary are set forth. So there are different possibilities, which can be stored as music summary. Two of them are described below, namely the option entitled "Refrain" and the option entitled "Medley".

The "refrain" option consists of selecting a version of the refrain as the summary. An attempt is made to choose a refrain version that lasts between 20 and 30 seconds if possible. If the refrain cluster contains no segment of such a length, a version is chosen whose length deviates as little as possible from 25 seconds. If the chosen refrain is longer than 30 seconds, it is faded out beyond 30 seconds in this example, and if it is shorter than 20 seconds, it is extended to 30 seconds with the following segment.
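The choice of the refrain version can be illustrated with a few lines of Python. The sketch assumes that only the lengths of the refrain segments in seconds are needed; the function name `choose_refrain` is hypothetical, and taking the first version that falls within the window is an assumption, since the description does not say which one is preferred when several qualify.

```python
from typing import Sequence


def choose_refrain(lengths: Sequence[float], low: float = 20.0,
                   high: float = 30.0, target: float = 25.0) -> int:
    """Index of the refrain version to be used for the summary."""
    # Prefer a version whose length lies between 20 and 30 seconds.
    for index, length in enumerate(lengths):
        if low <= length <= high:
            return index
    # Otherwise take the version closest to 25 seconds.
    return min(range(len(lengths)), key=lambda i: abs(lengths[i] - target))
```

Fading out a version that is too long, or extending one that is too short with the following segment, would follow as a separate step.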

Storing a medley, the second option, comes closer to an actual summary of a piece of music. In this case, a section of the stanza, a section of the refrain and a section of a third segment are assembled as a medley in their actual chronological order. The third segment is selected from a cluster that has the largest total share of the song and is neither stanza nor refrain.

The following priority is used to search for the most appropriate sequence of segments:

- "third segment"-stanza-refrain;

- stanza-refrain-"third segment"; or

- stanza-"third segment"-refrain.

The selected segments are not inserted into the medley at their full length. The length is preferably set to a fixed 10 seconds per segment, so that a summary of 30 seconds in total is again created. However, alternative values are also readily feasible.

Preferably, after the feature extraction in block 502 or after block 508, a plurality of feature vectors is grouped in block 510 by forming a mean value over the grouped feature vectors. This grouping saves computing time in the next processing step, the calculation of the similarity matrix. To calculate the similarity matrix, a distance is determined between all possible combinations of two feature vectors. For n vectors over the entire piece, this results in n x n calculations. A grouping factor g indicates how many consecutive feature vectors are grouped into one vector by forming the mean value. This reduces the number of calculations.

The grouping also acts as a kind of noise suppression, in which small changes in the feature values of successive vectors are averaged out. This property has a positive effect on finding large song structures.
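The grouping of feature vectors by the factor g can be sketched as follows in Python. The function name `group_feature_vectors` is hypothetical, and averaging a shorter final group over the vectors actually present is an assumption. With n vectors reduced to about n/g vectors, the number of distance calculations falls from n x n to roughly (n/g) x (n/g), i.e. by a factor of about g squared.

```python
from typing import List, Sequence


def group_feature_vectors(vectors: Sequence[Sequence[float]],
                          g: int) -> List[List[float]]:
    """Average every g consecutive feature vectors into one vector."""
    grouped: List[List[float]] = []
    for start in range(0, len(vectors), g):
        chunk = vectors[start:start + g]
        dimension = len(chunk[0])
        # Component-wise mean over the vectors of this group.
        grouped.append([
            sum(vector[d] for vector in chunk) / len(chunk)
            for d in range(dimension)
        ])
    return grouped
```

For example, 1000 feature vectors and g = 10 would leave 100 grouped vectors and thus 10,000 instead of 1,000,000 distance calculations.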

The concept according to the invention makes it possible to navigate through the calculated segments by means of a special music player and to select individual segments in a targeted manner, so that a consumer in a music shop can, for example by pressing a certain key or by activating a specific software command, jump immediately to the refrain of a piece in order to determine whether he likes the refrain, and then perhaps listen to a stanza, so that he can finally make a purchase decision. A prospective buyer can thus easily hear exactly those parts of a piece that interest him particularly, while he can save, for example, the solo or the bridge for listening at home.

The concept according to the invention is also of great advantage for a music shop, since customers can listen in quickly and in a targeted manner and thus ultimately buy sooner, so that customers do not have to wait long to listen in but get their turn quickly. This is because a user does not have to constantly wind back and forth but receives, quickly and in a targeted manner, all the information about the piece that he wants to have.

Furthermore, attention is drawn to a significant advantage of the inventive concept, namely that, in particular because of the post-correction of the segmentation, no information of the piece is lost. Although all segments that are preferably shorter than 6 seconds are merged with the predecessor or successor segment, no segments, however short they are, are eliminated. This has the advantage that the user can in principle listen to everything in the piece. A short but striking section that the user likes very much, and that would have been removed by a segmentation post-correction that actually eliminated sections of the piece, therefore remains available to the user, so that he may perhaps make a well-considered purchase decision precisely because of this short, distinctive section.

However, the present invention is also applicable in other application scenarios, for example in advertising monitoring, i.e. where an advertiser wants to check whether the audio piece for which he has bought advertising time has actually been played over its entire length. An audio piece may include, for example, music segments, speaker segments and noise segments. The segmentation algorithm, i.e. the segmentation and subsequent classification into segment groups, then makes it possible to carry out a fast and considerably less complicated check than a complete sample-wise comparison. Such a check would consist, for example, of a comparison of how many segment classes were found and how many segments are in the individual segment classes with a specification based on the ideal advertising piece. In this way, an advertiser can recognize whether a broadcaster or television station has actually aired all or only part of the commercial.

The present invention is further advantageous in that it can be used for research in large music databases in order, for example, to listen through only the refrains of many pieces of music and then to perform a music program selection. In this case, only individual segments from the segment class marked "refrain" would be selected from many different pieces and provided by a program provider. Alternatively, there could also be an interest in, for example, all the guitar solos of an artist. According to the invention, these can likewise be readily provided by always selecting one or more segments (if present) in the segment class designated "Solo" from a large number of pieces of music, assembling them and providing them, for example, as a file.

Yet other possible applications are to mix stanzas and choruses from different audio pieces, which will be of particular interest to DJs and opens up completely new possibilities of creative music synthesis, which can be carried out precisely and, above all, automatically. The concept according to the invention can be easily automated since it requires no user intervention at any point. This means that users of the inventive concept by no means require special training, apart from, for example, ordinary skill in dealing with normal software user interfaces.

Depending on the practical conditions, the inventive concept can be implemented in hardware or in software. The implementation can be carried out on a digital storage medium, in particular a floppy disk or CD with electronically readable control signals, which can cooperate with a programmable computer system in such a way that the corresponding method is executed. In general, the invention thus also consists in a computer program product with a program code stored on a machine-readable carrier for carrying out the method according to the invention when the computer program product runs on a computer. In other words, the invention thus represents a computer program with a program code for carrying out the method when the computer program runs on a computer.

Claims

1. Device for changing a segmentation of an audio piece into temporal segments, the audio piece being structured into main parts repeatedly occurring in the audio piece, with the following features:

a device (10, 12, 14) for providing a representation of the audio piece in which the segments of the audio piece are assigned to different segment classes, one segment class being assigned to one main part in each case; and

a segment correction device (20) for correcting the segmentation, wherein the segment correction device (20) is designed to merge a short segment, having a length that is shorter than a predetermined minimum length, with a temporal precursor segment or a temporal successor segment in order to obtain a changed segmentation of the audio signal.

2. Apparatus according to claim 1, wherein the segment correction means (20) is adapted to use a segment class membership of the short segment for merging the short segment.

3. Device according to claim 1 or 2, in which the segment correction device (20) is designed to determine as short segments those segments whose temporal length is less than 18 seconds and in particular less than 12 seconds.

4. Device according to one of the preceding claims, in which the segment correction device (20) is designed to merge the short segment with the temporal precursor segment or the temporal successor segment using segment class memberships of a temporal precursor segment, of a temporal successor segment, or of the short segment itself.

5. Apparatus as claimed in any one of the preceding claims, wherein the means (10, 12, 14) for providing is adapted to provide a novelty value for segment boundaries of the short segment, the novelty value indicating how much novelty content the short segment has with respect to a segment adjoining the segment boundary, and wherein the segment correction device (20) is configured to merge the short segment with that segment adjoining the segment boundary of the short segment whose novelty value indicates a lower novelty content compared to a novelty value at another segment boundary of the short segment.

6. Device according to claim 5, in which the segment correction device (20) is designed to perform the merging on the basis of the novelty value only for short segments that are shorter than a predetermined minimum length which is less than 8 seconds and in particular less than 6 seconds.

7. Device according to claim 5 or 6, in which the segment correction means (20) is designed to merge, on the basis of an examination of a novelty value, only those short segments which could not be merged in a preceding examination using information about a segment class membership of the short segment, the temporal precursor segment or the successor segment.

8. Device according to one of the preceding claims, further comprising the following feature: a segment assignment conflict device (18), which is designed, in the case in which a conflict segment is to be assigned to two different segment classes by the device (14), to calculate a first similarity value of the conflict segment with a segment of a first segment class and a second similarity value of the conflict segment with a segment of a second segment class, and

wherein the means (14) is adapted, in the case in which the second similarity value indicates a stronger similarity of the conflict segment to the segment of the second segment class, to remove the conflict segment from the first segment class and assign it to the second segment class.

9. Device according to claim 8, wherein the segment assignment conflict device (18) is designed to assign the segment a tendency toward the first segment class in the case of a removal of the segment from the first segment class, or, in the case in which the segment is not removed, to assign the segment a tendency toward the second segment class.

10. Device according to one of the preceding claims, in which the segmentation correction device (20) is designed to determine, for a segment that is shorter than a predetermined minimum length, whether a tendency of the segment coincides with a segment class to which a temporally preceding segment belongs and, in this case, to merge the segment with the temporally preceding segment, or is designed to determine, for a segment shorter than a predetermined minimum length, whether a tendency of the segment indicates a segment class to which a temporally following segment belongs and, in this case, to merge the segment with the temporally following segment.

11. Device according to one of the preceding claims, wherein the segmentation correction device (20) is designed to merge temporally successive segments belonging to the same segment class.

12. Device according to one of the preceding claims, wherein the segmentation correction device (20) is designed to select for correction only those segments having a temporal segment length which is shorter than a predetermined minimum length.

13. Device according to claim 12, wherein the segmentation correction device (20) is designed to merge a selected segment from a second segment class, whose temporal precursor segment and temporal successor segment belong to a first segment class, with the precursor segment and the successor segment.

14. The apparatus of claim 12 or 13, wherein the segmentation correction device (20) is designed to merge a segment that is in a segment class containing only a single segment with the preceding segment or the following segment.

15. The apparatus of claim 12, 13 or 14, wherein the segmentation correction device (20) is adapted to merge a plurality of selected segments that are in the same segment class each with a temporally preceding segment or each with a temporally following segment if all selected segments of the segment class have precursor segments from one and the same segment class or successor segments from one and the same segment class.

16. Apparatus as claimed in any one of the preceding claims, wherein the segmentation correction device (20) is adapted to determine, for a segment having a temporal length smaller than a predetermined minimum length, a first novelty value at a beginning of the segment and a second novelty value at an end of the segment, and to merge the segment with a temporally following segment when the first novelty value is greater than the second novelty value, or to merge the segment with a temporally preceding segment when the first novelty value is smaller than the second novelty value.

17. Device according to one of the preceding claims, wherein the segmentation correction device (20) is designed to perform different corrective actions depending on different predetermined segment lengths.

18. Device according to one of the preceding claims, wherein the means (10, 12, 14) for providing the representation of the audio piece comprises the following features:

means (10) for providing a similarity representation for the segments, the similarity representation having for each segment an associated plurality of similarity values, the similarity values indicating how similar the segment is to each other segment of the audio piece;

means (12) for calculating a similarity threshold for a segment using the plurality of similarity values associated with the segment; and

means (14) for assigning a segment to a segment class when the similarity value of the segment satisfies a predetermined condition regarding the similarity threshold.

19. A method of changing a segmentation of an audio piece into temporal segments, wherein the audio piece is structured into main parts repeatedly occurring in the audio piece, comprising the steps of:

Providing (10, 12, 14) a representation of the audio piece in which the segments of the audio piece are assigned to different segment classes, wherein one segment class is assigned to one main part in each case; and

Correcting (20) the segmentation by merging a short segment having a length shorter than a predetermined minimum length with a temporal precursor segment or a temporal successor segment to obtain a changed segmentation of the audio signal.

20. Computer program with a program code for carrying out the method for changing a segmentation according to claim 19, when the computer program is executed on a computer.
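The novelty-based choice of the merge partner recited in claims 5 and 16 can be illustrated with a minimal sketch. The function name `merge_by_novelty`, the representation of segments as `(start, end)` tuples, and the handling of ties and of missing neighbours are assumptions that the claims do not specify; the novelty values are taken as given inputs.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def merge_by_novelty(spans: List[Span], index: int,
                     novelty_start: float, novelty_end: float) -> List[Span]:
    """Merge spans[index] across the boundary with the lower novelty value.

    Claim 16: a greater novelty value at the beginning leads to merging
    with the following segment, a smaller one to merging with the
    preceding segment. Assumptions: on a tie the preceding segment is
    used, and if the chosen neighbour does not exist the other one is used.
    """
    has_prev = index > 0
    has_next = index < len(spans) - 1
    if not (has_prev or has_next):
        return list(spans)  # a single segment has nothing to merge with
    to_next = novelty_start > novelty_end
    if to_next and not has_next:
        to_next = False
    elif not to_next and not has_prev:
        to_next = True
    out = list(spans)
    if to_next:
        # Replace the short segment and its successor by one joint span.
        out[index:index + 2] = [(spans[index][0], spans[index + 1][1])]
    else:
        # Replace the predecessor and the short segment by one joint span.
        out[index - 1:index + 1] = [(spans[index - 1][0], spans[index][1])]
    return out
```

The input list is left unchanged and a new list is returned, so the original segmentation remains available for comparison with the changed segmentation.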