EP1743324A1 - Device and method for analysing an information signal - Google Patents
Device and method for analysing an information signal
- Publication number
- EP1743324A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- short
- spectra
- term
- spectrum
- information signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Definitions
- the present invention relates to the analysis of information signals, such as audio signals, and in particular to the analysis of information signals which consist of a superposition of partial signals, wherein a partial signal can originate from a single source or a group of individual sources.
- the extraction of fingerprints is of great importance in particular when analyzing audio signals, that is to say signals which comprise music and / or speech.
- the aim is also to "enrich" audio data with metadata, in order to retrieve metadata for a piece of music, for example, on the basis of a fingerprint.
- the "fingerprint" should be meaningful on the one hand, and be as short and concise as possible on the other. "Fingerprint" thus designates a compact information signal of limited size that is generated from a music signal and that does not contain the metadata itself but is used for referencing the metadata, for example by searching a database in a system for identifying audio material ("AudioID").
- Music data usually consists of a superposition of partial signals from individual sources. While there is typically a relatively small number of individual sources in pop music, namely the singer, the guitar, the bass guitar, the drums and a keyboard, the number of sources for an orchestral piece can be very large.
- An orchestral piece and a pop music piece, for example, each consist of a superposition of the tones emitted by the individual instruments.
- An orchestral piece or any piece of music thus represents a superposition of partial signals from individual sources, the partial signals being the tones generated by the individual instruments of the orchestra or pop music ensemble, and the individual instruments being individual sources.
- groups of original sources can also be understood as individual sources, so that at least two individual sources can be assigned to a signal.
- An analysis of a general information signal is shown below using an orchestral signal as an example.
- An orchestral signal can be analyzed in a number of ways. For example, there may be a desire to recognize the individual instruments, to extract the individual signals of the instruments from the overall signal and, if necessary, to convert them into a musical notation, the musical notation functioning as "metadata". A further possibility of the analysis is to extract a dominant rhythm, with rhythm extraction working better on the basis of the percussion instruments than on the basis of the more tone-giving instruments, which are also referred to as harmonic sustained instruments. While percussion instruments typically include timpani, drums, rattles or other percussion instruments, the harmonic sustained instruments include all other instruments, such as violins, wind instruments, etc.
- the percussion instruments also include all those acoustic or synthetic sound generators that contribute to the rhythm section due to their sound characteristics (e.g. rhythm guitar).
- for the rhythm extraction of a piece of music, it would be desirable to extract only the percussive parts from the entire piece of music and then carry out a rhythm recognition on the basis of these percussive parts, without the rhythm recognition being "disturbed" by signals from the harmonic sustained instruments.
- any analysis with the aim of extracting metadata that only requires information from the harmonic instruments e.g. a harmonic or melodic analysis
- BSS blind source separation
- ICA independent component analysis
- the term BSS encompasses techniques for separating signals from a mix of signals with a minimum of prior knowledge of the nature of the signals and the mixing process.
- the ICA is a process that makes use of the assumption that the sources on which a mix is based are at least to a certain extent statistically independent of one another. Furthermore, the mixing process is assumed to be unchangeable in time and the number of mixed signals observed is not less than the number of source signals on which the mixing is based.
- ISA Independent Subspace Analysis
- [1] shows a procedure for the separation of single sources from mono audio signals.
- an application for a separation into single tracks and then the rhythm analysis is given.
- a component analysis is carried out in order to achieve a separation into percussive and non-percussive sounds of a polyphonic piece.
- the Independent Component Analysis is applied to amplitude bases that are obtained from a spectrogram representation of a drum track using generally calculated frequency bases. This is done for the purpose of transcription.
- this process is extended to polyphonic pieces of music.
- Casey's first publication mentioned above is presented below as an example of the prior art.
- This publication describes a technique for separating mixed audio sources using independent subspace analysis.
- an audio signal is split into individual component signals using BSS techniques.
- To determine which of the individual component signals belong to a multicomponent subspace, a grouping is carried out in such a way that the similarity of the components to one another is represented by a so-called Ixegram.
- the Ixegram is called the cross entropy matrix of the independent components. It is calculated by examining all individual component signals in pairs in a correlation calculation in order to find a measure of how similar two components are.
- the cost function is minimized, so that ultimately there is an assignment of individual components to individual subspaces.
- Applied to a signal that represents a speaker against the background of a continuous waterfall noise, the speaker results as a subspace, the reconstructed information signal of the speaker subspace showing a significant attenuation of the waterfall noise.
- a disadvantage of the concepts described is that the signal components of a source are very likely to come to lie on different component signals. This is the reason why, as has been explained above, a complex and computation-intensive similarity calculation is carried out among all component signals in order to obtain the two-dimensional similarity matrix, on the basis of which the component signals are then ultimately classified into subspaces using a cost function to be minimized.
- a further disadvantage is that in the case where there are several individual sources, i.e. where the output signal is not known a priori, a similarity distribution does exist after a long calculation, but that the similarity distribution itself does not yet provide any actual insight into the actual audio scene.
- the viewer only knows that certain component signals are similar to one another with regard to the minimized cost function. However, he does not know which information these subspaces ultimately received or which original individual source or which group of individual sources are represented by a subspace.
- the Independent Subspace Analysis can thus be used to break down a time-frequency representation, e.g. a spectrogram, of an audio signal into independent component spectra.
- a time-frequency representation eg a spectrogram
- the methods described above rely either on a calculation-intensive determination of frequency and amplitude bases from the entire spectrogram or on a priori defined frequency bases.
- Such a priori defined frequency bases or profile spectra arise, for example, when one assumes that a trumpet is very likely to be present in a piece and then uses a sample spectrum of a trumpet for the signal analysis.
- a spectrogram typically consists of a sequence of individual spectra, a hopping time period being defined between the individual spectra, and a spectrum representing a certain number of samples, so that a spectrum is associated with a certain length of time, i.e. with a block of samples of the signal.
- the duration of the block of samples from which a spectrum is calculated is typically significantly greater than the hopping time period, in order to obtain a spectrogram that is satisfactory with regard to both the required frequency resolution and the required time resolution.
- this spectrogram representation is extremely redundant. If, for example, the case is considered that a hopping time period is 10 ms and that a spectrum is based on a block of samples with a time length of 100 ms, for example, each sample occurs in 10 successive spectra.
- the redundancy generated in this way can drive the computing time requirements to astronomical heights, particularly when a larger number of instruments is sought.
- the approach of working on the basis of the entire spectrogram is disadvantageous in those cases in which not all of the sources contained in a signal are to be extracted, but only, for example, sources of a certain type, that is to say sources that have a specific characteristic.
- a characteristic can relate to percussive sources, ie percussion instruments, or so-called pitched instruments, which are also referred to as harmonic-sustained instruments, which are typical melody instruments such as trumpet, violin, etc.
- a method that works on the basis of all of these sources is then too complex and ultimately not robust enough if, for example, only a few sources, namely the sources that are to fulfill a specific characteristic, are to be extracted.
- the object of the present invention is to create a robust and computationally time-efficient concept for analyzing an information signal.
- This object is achieved by a device for analyzing an information signal according to claim 1, a method for analyzing an information signal according to claim 24 or a computer program according to claim 25.
- the present invention is based on the finding that a robust and efficient information signal analysis is achieved by first extracting significant short-term spectra, or short-term spectra derived from significant short-term spectra, such as difference spectra, from the entire information signal or from the spectrogram of the information signal, with those short-term spectra being extracted that come closer to a specific characteristic than other short-term spectra of the information signal.
- Short-term spectra which have percussive components are preferably extracted, and thus short-term spectra which have harmonic components are not extracted.
- the specific characteristic is a percussive or drum characteristic.
- the extracted short-term spectra or short-term spectra derived from the extracted short-term spectra are then fed to a device for decomposing the short-term spectra into component signal spectra, a component signal spectrum representing a profile spectrum of a sound source, which produces a sound that corresponds to the characteristic sought, and wherein another component signal spectrum represents a different profile spectrum of a sound source that generates a sound that also corresponds to the characteristic sought.
- an amplitude envelope over time is calculated on the basis of the profile spectra of the sound sources, the determined profile spectra as well as the original short-term spectra being used for the calculation of the amplitude envelope over time, so that an amplitude value is obtained for each point in time at which a short-term spectrum was recorded.
- the information obtained in this way, namely different profile spectra and amplitude envelopes for the profile spectra, provides a complete description of the music or information signal with regard to the specified characteristic according to which the extraction was carried out. This information may already be sufficient to make a transcription, that is, to first use concepts of feature extraction and segmentation to determine which instrument "belongs" to a profile spectrum and which rhythm is present, that is to say which rises and falls are present that indicate notes of this instrument played at certain points in time.
- the present invention is advantageous in that the component analysis is not computed on the entire spectrogram but only on extracted short-term spectra, that is to say the calculation of the independent subspace analysis (ISA) takes place only on the basis of a subset of all spectra, so that the computing requirements are lowered. Furthermore, the robustness with regard to finding certain sources is also increased, since other short-term spectra that do not meet the specified characteristic are not present in the component analysis and therefore do not disturb or "blur" the actual spectra.
- ISA independent subspace analysis
- the concept according to the invention is advantageous in that the profile spectra are determined directly from the signal, which avoids the problem of prefabricated profile spectra, which in turn would lead either to inaccurate results or to increased computational effort.
- the concept according to the invention is preferably used for the detection and classification of percussive, non-harmonic instruments in polyphonic audio signals, in order to obtain both profile spectra and amplitude envelopes for the individual profile spectra.
- FIG. 1 shows a block diagram of the device according to the invention for analyzing an information signal
- FIG. 2 shows a block diagram of a preferred embodiment of the device according to the invention for analyzing an information signal
- FIG. 3a shows an example of an amplitude envelope for a percussive source
- FIG. 3b shows an example of a profile spectrum for a percussive source
- FIG. 4a shows an example of an amplitude envelope for a harmonic sustained instrument
- FIG. 1 shows a preferred exemplary embodiment of a device according to the invention for analyzing an information signal which is fed via an input line 10 to a device 12 for providing a sequence of short-term spectra which represent the information signal.
- the information signal can also be supplied, for example in temporal form, to a device 16 for extracting significant short-term spectra or short-term spectra derived from the short-term spectra from the information signal, whereby the extracting device is designed to extract those short-term spectra that come closer to a specific characteristic than other short-term spectra of the information signal.
- the extracted spectra, i.e. the original short-term spectra or the short-term spectra derived from the original short-term spectra, for example by differentiation, by differentiation and rectification, or by other operations, are fed to a device 18 for decomposing the extracted short-term spectra into component signal spectra, where one component signal spectrum represents a profile spectrum of a sound source that produces a sound that corresponds to the characteristic sought, and another component signal spectrum represents a profile spectrum of another sound source that generates a sound that also corresponds to the characteristic sought.
- the profile spectra are finally fed to a device 20 for calculating an amplitude envelope for the one sound source, the amplitude envelope indicating how the profile spectra of a sound source change over time, and in particular how the intensity or weighting of a profile spectrum changes over time.
- the device 20 is designed to work on the basis of the sequence of short-term spectra on the one hand and on the basis of the profile spectra on the other hand, as can be seen from FIG. 1.
- the device 20 for calculating provides amplitude envelopes for the sources, while the device 18 supplies profile spectra for the sound sources.
- the profile spectra and the associated amplitude envelopes provide a complete description of the portion of the information signal that corresponds to the specific characteristic.
- This part is preferably the percussive part of a piece of music.
- this part could also be the harmonic part.
- the device for extracting significant short-term spectra would be designed differently than in the case in which the specific characteristic is a percussive characteristic.
- Detection and classification of percussive, non-harmonic instruments is preferably carried out with the profile spectra F and the amplitude envelopes E, as is also represented by a block 22 in FIG. 2. However, this will be discussed later.
- the device 12 for providing a sequence of short-term spectra is designed to generate an amplitude spectrogram X using a suitable time-frequency transformation.
- the time / frequency device 12 is preferably a device for performing a short-term Fourier transformation with a specific hopping period, or comprises filter banks.
- a phase spectrogram is also obtained as an additional information source, as shown by a phase arrow 13 in FIG. 2.
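The time-frequency step can be illustrated with a minimal sketch of a short-time Fourier transform that returns both a magnitude and a phase spectrogram. The function names `hann` and `stft`, the periodic Hann window, and the toy block and hop sizes are assumptions made for this illustration; the source only states that a short-term Fourier transformation with a specific hopping period (or a filter bank) is preferred. The naive DFT is used for clarity, not speed.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft(samples, block_size, hop_size):
    """Return (magnitude, phase) spectrograms as lists of frames.

    Each frame holds block_size // 2 + 1 frequency bins of one
    short-term spectrum; consecutive frames start hop_size samples apart.
    """
    window = hann(block_size)
    n_bins = block_size // 2 + 1
    magnitude, phase = [], []
    for start in range(0, len(samples) - block_size + 1, hop_size):
        block = [samples[start + i] * window[i] for i in range(block_size)]
        spectrum = [
            sum(block[i] * cmath.exp(-2j * math.pi * k * i / block_size)
                for i in range(block_size))
            for k in range(n_bins)
        ]
        magnitude.append([abs(c) for c in spectrum])
        phase.append([cmath.phase(c) for c in spectrum])
    return magnitude, phase


# Toy input: a sinusoid that falls exactly on bin 4 of a 32-sample block.
block_size, hop_size = 32, 8
signal = [math.sin(2.0 * math.pi * 4 * i / block_size) for i in range(128)]
X, phi = stft(signal, block_size, hop_size)
peak_bin = max(range(len(X[0])), key=lambda k: X[0][k])
```

Here `X` plays the role of the amplitude spectrogram and `phi` that of the additional phase information; a practical implementation would use an FFT routine instead of the explicit sum.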
- the spectrogram is then differentiated along the temporal extent of each individual spectrogram row, that is to say each individual frequency bin.
- the difference spectrogram is fed to a maximum finder 16c, which is designed to search for the times t, that is to say the indices of the corresponding spectrogram columns, at which local maxima occur in a detection function e, which is calculated upstream of the maximum finder 16c.
- the detection function can, for example, be obtained by summing over all frequency bins at each time
- the phase information, which is supplied from block 12 to block 16c via phase line 13, is used as an indicator of the reliability of the maxima found.
- PCA Principal Component Analysis
- the transformation matrix T causes a dimension reduction of X, which results in a reduction in the number of columns in this matrix. Decorrelation and normalization of variance are also achieved.
- a non-negative independent component analysis is then carried out in block 18b.
- the method of non-negative independent component analysis on X shown in [6] for calculating a separation matrix A is carried out. According to the equation below, X is broken down into independent components.
- Independent components F are interpreted as static spectral profiles or profile spectra of the sound sources that occur.
- the amplitude base or the amplitude envelope E is then extracted in a block 20 according to the following equation for the individual sound sources.
- the amplitude base is interpreted as a set of time-varying amplitude envelopes of the corresponding spectral profiles.
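The equation for the amplitude envelope is not reproduced in this text. As a hedged illustration only, the sketch below assumes that the envelope is obtained by projecting each column of the amplitude spectrogram X onto each profile spectrum in F, i.e. E = F · X. This form is an assumption consistent with the statement that both the profile spectra and the original short-term spectra enter the calculation; the function name `amplitude_envelopes` and the toy data are hypothetical.

```python
def amplitude_envelopes(profiles, spectrogram):
    """Project every spectrogram column onto every profile spectrum.

    profiles:    d rows, each a profile spectrum with n frequency bins
    spectrogram: n rows (frequency bins), each with m frames
    returns:     d rows, each an envelope with m amplitude values
    (assumed form E = F * X, not taken from the source)
    """
    n_frames = len(spectrogram[0])
    return [
        [sum(f_k * row[t] for f_k, row in zip(profile, spectrogram))
         for t in range(n_frames)]
        for profile in profiles
    ]


F = [[1.0, 0.0, 0.0],        # hypothetical source with energy in bin 0
     [0.0, 0.0, 1.0]]        # hypothetical source with energy in bin 2
X = [[2.0, 0.0, 1.0, 0.0],   # bin 0 over four frames
     [0.0, 0.0, 0.0, 0.0],   # bin 1
     [0.0, 3.0, 0.0, 0.5]]   # bin 2
E = amplitude_envelopes(F, X)
```

Each row of `E` then gives one amplitude value per frame for the corresponding profile spectrum, which matches the description of a set of time-varying envelopes.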
- the spectral profile is obtained from the music signal itself.
- a feature extraction and a classification operation are then carried out in a block 22.
- the components are differentiated into two subsets, namely first into a subset with the properties non-percussive, ie quasi-harmonic, and into another percussive subset.
- the components with the property percussive / dissonant are further classified into different instrument classes.
- the characteristics of percussiveness or spectral dissonance are used to divide the two subsets.
- instrument classes can be classified, for example:
- the device 16 for extracting significant short-term spectra can be designed to carry out this extraction on the basis of actual short-term spectra, such as those obtained, for example, from a short-term Fourier transformation.
- the specific characteristic is the drum characteristic or the percussive characteristic
- the differentiation leads the sequence of short-term spectra to a sequence of derived or differentiated spectra, each
- the PCA 18a and the non-negative ICA 18b, that is to say, more generally, the decomposition operation for decomposing the extracted short-term spectra in block 18 of FIG. 1, are carried out not with the original short-term spectra but with the derived short-term spectra.
- the differentiated signal is very similar to the original signal before the differentiation, which is particularly the case when there are very rapid changes in a signal. This applies to percussive instruments.
- the device 18 for decomposing, which carries out a PCA 18a with a subsequent non-negative ICA 18b, in any case carries out a weighted linear combination of the extracted spectra supplied by the device 16 in order to determine a profile spectrum.
- the extracted spectra as a whole are subjected to certain weighting factors calculated according to the individual methods and are combined linearly, that is to say by subtraction or addition. Therefore, the effect is observed, at least in part, that the device 18 for decomposing the extracted short-term spectra can have a functionality that counteracts the differentiation, so that the profile spectra that are determined for the sound sources are not differentiated profile spectra but the actual profile spectra.
- the use of differentiated spectra, i.e. of difference spectra from a difference spectrogram, in connection with a decomposition algorithm in the device 18 that is based on a weighted linear combination of the individual extracted spectra, leads to profile spectra of high quality and high selectivity for the individual sound sources.
- the specific characteristic is not a percussive but a harmonic characteristic
- typical digital audio signals are first preprocessed by the preprocessing device 8. Furthermore, it is preferred to supply mono files with a width of 16 bits per sample at a sampling frequency of 44.1 kHz as the PCM audio signal which is input into the preprocessing device 8.
- These audio signals that is to say this stream of audio samples, which can also be a stream of video samples and generally a stream of information samples, are fed to the preprocessing device 8 in order to carry out preprocessing in the time domain using a software-based emu
- preprocessing stage 8 amplifies the high-frequency portion of the audio signal.
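The exact preprocessing used by stage 8 is not specified in this text. As a simple stand-in that shows what amplifying the high-frequency portion in the time domain can look like, the following sketch applies a first-order pre-emphasis filter. The filter type, the function name `pre_emphasis` and the coefficient 0.95 are assumptions chosen for illustration and are not taken from the source.

```python
def pre_emphasis(samples, coefficient=0.95):
    """First-order high-frequency emphasis: y[n] = x[n] - coefficient * x[n - 1]."""
    out = []
    previous = 0.0
    for x in samples:
        out.append(x - coefficient * previous)
        previous = x
    return out


slow = pre_emphasis([1.0, 1.0, 1.0, 1.0])     # constant (low-frequency) input
fast = pre_emphasis([1.0, -1.0, 1.0, -1.0])   # alternating (high-frequency) input
```

After the first sample, the constant input is strongly attenuated while the alternating input is amplified, which is the qualitative behaviour described for the preprocessing stage.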
- STFT Short Time Fourier Transform
- a relatively large block size of preferably 4096 values and a high overlap are preferred for implementing the time / frequency device.
- a good spectral resolution is required for the lower frequency range, ie for the lower spectral coefficient.
- the temporal resolution is increased to a desired accuracy by maintaining a small hop size, that is to say a small hop interval between adjacent blocks.
- 4096 samples per block have been subjected to a short-time Fourier transformation, which corresponds to a temporal block length of 92 ms.
- a value of 10 ms is used as the hop size. This means that each sample value occurs in more than 9 successive short-term spectra.
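The block length and the redundancy quoted above can be checked with a few lines of arithmetic. The sketch assumes a sampling frequency of 44.1 kHz, which is the usual rate for PCM audio and is consistent with the stated block length of 92 ms; the variable names are illustrative.

```python
sample_rate_hz = 44_100   # assumed sampling frequency of the PCM input
block_size = 4096         # samples per short-time Fourier transform block
hop_ms = 10.0             # hop size between adjacent blocks

block_ms = 1000.0 * block_size / sample_rate_hz   # temporal block length in ms
spectra_per_sample = block_ms / hop_ms            # blocks that cover one sample
```

The block length comes out at just under 93 ms, so with a 10 ms hop each sample contributes to a little more than nine consecutive spectra.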
- the device 12 is designed to obtain an amplitude spectrum X.
- the phase information can also be calculated and, as will be explained later, used in the extreme value or maximum finder 16c.
- the magnitude spectrum X now has n frequency bins or frequency coefficients and m columns or frames, ie individual short-term spectra.
- the time-variant changes of each spectral coefficient are differentiated across all frames or individual spectra, specifically by the differentiator 16a, in order to decimate the influence of harmonic sound sources and to simplify the subsequent detection of transients.
- the differentiation, which preferably consists of forming the difference between two short-term spectra of the sequence, can also include certain normalizations.
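A minimal sketch of this step, assuming a plain first-order difference between successive frames followed by half-wave rectification and no additional normalization. The function name `difference_spectrogram` and the toy frames are hypothetical.

```python
def difference_spectrogram(frames):
    """Half-wave rectified first-order difference between successive frames.

    frames: list of short-term magnitude spectra (one list of bins per frame)
    returns one frame fewer than the input
    """
    return [
        [max(cur - prev, 0.0) for prev, cur in zip(frames[i - 1], frames[i])]
        for i in range(1, len(frames))
    ]


frames = [
    [1.0, 0.0],   # sustained tone in bin 0, silence in bin 1
    [1.0, 0.0],
    [1.0, 4.0],   # percussive onset appears in bin 1
    [1.0, 2.0],   # onset decays again
]
X_hat = difference_spectrogram(frames)
```

The constant energy of the sustained tone cancels in the difference, while the sudden rise in bin 1 survives; the decay produces a negative difference that the rectification sets to zero.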
- the maximum searcher 16c carries out an event detection, which will be discussed below.
- the detection of several local extreme values, preferably local maxima, which are assigned to transient onset events in the music signal, is carried out by first defining a time tolerance that separates two successive drum onsets.
- a time of 68 ms is used as a constant value derived from the time resolution and knowledge of the music signal.
- this value determines the minimum number of frames, that is, individual spectra or differentiated individual spectra, that must lie between two successive onsets.
- the use of this minimum distance is also supported by the observation that a sixteenth note lasts 60 ms at an upper tempo limit of 250 bpm, which is a very high tempo.
- a detection function is derived from the differentiated and rectified spectrum, that is to say from the sequence of rectified (difference) short-term spectra, on the basis of which the maximum search can be carried out.
- a sum is simply determined over all frequency coefficients or all spectral bins.
- the function obtained is convolved with a suitable Hann window, so that a relatively smooth function e is obtained.
- a sliding window of the tolerance length is "slid" over the entire course of e in order to obtain one maximum per step.
- the reliability of the maximum search is improved by preferably retaining only those maxima that appear in a window for more than one point in time, since they are very likely the peaks of interest.
- the unwrapped phase information of the original spectrogram is used as a reliability function. It has been found that a significant positively directed phase jump must occur in the phase information close to an estimated onset time t, which prevents small ripples from being incorrectly regarded as onsets.
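The onset detection described in the preceding paragraphs can be sketched as follows: sum each difference spectrum over all bins, smooth the result with a Hann window, slide a window of the tolerance length over the smoothed function, and keep only the maxima that win in more than one window position. The function names, the window lengths and the toy data are assumptions for illustration, and the phase-based reliability check is omitted.

```python
import math


def detection_function(diff_frames):
    """Sum each rectified difference spectrum over all frequency bins."""
    return [sum(frame) for frame in diff_frames]


def smooth(values, width):
    """Convolve with a normalized Hann window of odd length width."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * (i + 1) / (width + 1))
              for i in range(width)]
    total = sum(window)
    half = width // 2
    out = []
    for t in range(len(values)):
        acc = 0.0
        for i, w in enumerate(window):
            j = t + i - half
            if 0 <= j < len(values):
                acc += w * values[j]
        out.append(acc / total)
    return out


def pick_onsets(e, window_len):
    """Slide a window over e; keep maxima that win in more than one position."""
    wins = {}
    for start in range(len(e) - window_len + 1):
        segment = e[start:start + window_len]
        best = start + max(range(window_len), key=lambda i: segment[i])
        wins[best] = wins.get(best, 0) + 1
    return sorted(t for t, count in wins.items() if count > 1)


# Toy difference spectrogram with two bins and onsets at frames 2 and 8.
diff_frames = [[0.0, 0.0]] * 2 + [[2.0, 3.0]] + [[0.0, 0.0]] * 5 \
    + [[3.0, 0.0]] + [[0.0, 0.0]] * 3
e = smooth(detection_function(diff_frames), width=3)
onsets = pick_onsets(e, window_len=4)
```

With the 10 ms hop size and the 68 ms tolerance mentioned in the text, the window would span about seven frames; the value 4 is used here only to keep the toy example short.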
- a small section of the difference spectrogram namely a short-term spectrum created by differentiation, is now extracted and fed to the subsequent decomposition device.
- the functionality of the device 18a for performing a principal component analysis is discussed below.
- the information about the time of occurrence t and the spectral compositions of the onsets, i.e. the extracted short-term spectra X_t, is thus derived from the steps described in the previous section.
- a large number of transient events are typically found within the duration of the piece of music.
- Even a simple example of a piece at a speed of 120 beats per minute (bpm) shows that there can be 480 events in a four-minute section, provided that only quarter notes occur.
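The tempo figures used in this section follow from simple arithmetic, shown here as a small sketch. The helper name `note_duration_ms` is hypothetical, and one beat is taken to be a quarter note.

```python
def note_duration_ms(bpm, notes_per_beat):
    """Duration of one note when a beat (quarter note) is split into notes_per_beat."""
    return 60_000.0 / bpm / notes_per_beat


sixteenth_at_250 = note_duration_ms(250, 4)   # sixteenth note at 250 bpm, in ms
quarter_events = 120 * 4                      # quarter notes in four minutes at 120 bpm
```

The first value reproduces the 60 ms sixteenth note that motivates the minimum onset distance, and the second reproduces the 480 events of the four-minute example.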
- the principal component analysis is used to find only a few significant subspaces or profile spectra
- an eigenvalue decomposition (EVD) of the covariance matrix of the data set is calculated. From the set of eigenvectors, the eigenvectors with the d largest eigenvalues are selected to provide the coefficients for the linear combination of the original vectors according to the following equation:
- T describes a transformation matrix that is actually a subset of the set of eigenvectors.
- the reciprocal values of the eigenvalues are used as scaling factors, which not only leads to a decorrelation but also provides a normalization of variance, which in turn leads to a whitening effect.
- a singular value decomposition (SVD) of X_t can also be used. It has been found that the SVD is equivalent to the PCA with EVD.
- the whitened components X are subsequently fed into the ICA stage 18b, which will be discussed below.
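The PCA stage can be sketched with NumPy as an eigenvalue decomposition of the covariance matrix, followed by selection of the d largest eigenvalues and scaling for unit variance. The function name `pca_whiten`, the mean removal, the random toy data and the choice d = 2 are assumptions for illustration. The sketch scales by the reciprocal square roots of the eigenvalues, which is the usual way to obtain unit variance; how this maps onto the scaling factors mentioned in the text is an assumption.

```python
import numpy as np


def pca_whiten(X_t, d):
    """Reduce the extracted spectra to d decorrelated, unit-variance components.

    X_t: array of shape (n_bins, n_events), one extracted spectrum per column
    returns (X_white, T) with X_white of shape (n_bins, d)
    """
    centered = X_t - X_t.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (X_t.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:d]              # indices of the d largest
    T = eigvecs[:, order] / np.sqrt(eigvals[order])    # scale each column
    return centered @ T, T


# Toy data: 20 onset spectra built from two hypothetical profile spectra.
rng = np.random.default_rng(0)
profiles = rng.random((64, 2))
gains = rng.random((2, 20))
X_t = profiles @ gains + 0.01 * rng.random((64, 20))

X_white, T = pca_whiten(X_t, d=2)

# The same variances follow from the singular values of the centered data.
centered = X_t - X_t.mean(axis=0, keepdims=True)
singular_values = np.linalg.svd(centered, compute_uv=False)
variances_from_svd = singular_values[:2] ** 2 / (X_t.shape[0] - 1)
```

Multiplying by `T` reduces the number of columns from the number of detected events to d, and the squared singular values reproduce the leading eigenvalues, which illustrates the stated equivalence of the SVD and the EVD-based PCA.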
- ICA Independent Component Analysis
- a requirement for optimal behavior of the algorithm, which decomposes the signal into component signals, is the statistical independence of the sources.
- a non-negative ICA is preferably used, which is based on the intuitive concept of optimizing a cost function that describes the non-negativity of the components.
- This cost function is related to a reconstruction error introduced by axis-pair rotations of two or more variables in the positive quadrant of the joint probability density function (PDF).
- PDF joint probability density function
- the first constraint is always fulfilled, since the vectors which are subjected to the ICA result from the differentiated and half-wave-rectified version X of the original spectrogram X, which thus never comprises values less than zero, but may well comprise values equal to zero.
- the second constraint is taken into account when the spectra collected at the onset times are considered to be linear combinations of a small set of original source spectra that characterize instruments. This is of course a fairly rough approximation, but it turns out to be sufficiently good in the majority of cases.
- A denotes a d x d unmixing matrix, which is determined by the ICA process and which actually separates the individual components X.
- Sources F are also referred to as profile spectra in this document.
- each profile spectrum has n frequency bins but is identical for all points in time, apart from an amplitude scaling, i.e. the amplitude envelope. This means that such a profile spectrum only contains the spectral information that relates to an onset spectrum of an instrument.
- a transformation matrix R is used in accordance with the following equation:
- the spectral profiles obtained from the ICA process can be viewed as a transfer function of highly frequency-selective parts in a filter bank, with overlapping passbands leading to crosstalk in the output of the filter bank channels.
- the crosstalk measure between two spectral profiles is calculated according to the following equation.
- i ranges from 1 to d
- j ranges from 1 to d
- j is not equal to i.
- this value is related to the known cross-correlation coefficient, but it uses a different normalization.
- an amplitude envelope determination is now carried out in block 20 of FIG. 2.
- the original spectrogram, i.e. the sequence of short-term spectra obtained, for example, by means 12 of FIG. 1 or by the time/frequency converter 12 of FIG. 2, is used.
- the following equation applies:
- the differentiated version of the amplitude envelopes from the difference spectrogram can also be determined as a second information source according to the following equation:
- the concept according to the invention provides highly specialized spectral profiles that are very close to the spectra of the instruments that actually appear in the signal. Nevertheless, the extracted amplitude envelopes are clean detection functions with sharp peaks only in certain cases, for example for dance-oriented music with very dominant percussive rhythm components. The amplitude envelopes often contain smaller peaks and plateaus, which can result from the above-mentioned crosstalk effects.
- components mean both the spectral profiles and the corresponding amplitude envelopes. If the number d of components extracted is too low, artifacts of the components not taken into account are very likely to occur in other components. On the other hand, if too many components are extracted, the most prominent components are split into several components. This splitting can unfortunately occur even with the correct number of components and can sometimes make it difficult to detect the real components.
- a maximum number d of components is specified in the PCA or ICA process.
- the extracted components are then classified using a set of spectral-based and time-based features.
- the classification is intended to provide two pieces of information. First, components that are recognized as non-percussive with high certainty are to be excluded from further processing. Furthermore, the remaining components should be assigned to predefined instrument classes.
- the amplitude envelope for the trumpet shows a relatively rapid rise, but then a relatively slow decay, as is typical for harmonic sustained instruments.
- the amplitude envelope for a percussive element rises very quickly and very strongly and falls again just as quickly and steeply, since a drum sound, owing to the way it is produced, typically does not linger long but decays quickly.
- the amplitude envelopes can thus be used for classification or feature extraction just as well as the profile spectra explained below, which clearly differ between a percussive source (Fig. 3b; hi-hat) and a harmonic sustained instrument (Fig. 4b; guitar).
- the harmonic sustained instrument shows clearly pronounced harmonics.
- the percussive source has a rather noise-like spectrum without clearly defined harmonics. Overall it has a region in which energy is concentrated, and this region is very broadband.
- a spectral-based measure, i.e. a measure derived from the profile spectra (for example, FIGS. 3b and 4b), is therefore preferably used to separate spectra of harmonic sustained tones from spectra of percussive tones.
- a modified version of the calculation of this measure is used, which shows a tolerance to spectral lag phenomena, a dissonance with all harmonics and a suitable standardization.
- a higher level of computational efficiency is achieved by replacing an original dissonance function with a weighting matrix for frequency pairs.
- the assignment of spectral profiles to a-priori-defined classes of percussive instruments is performed by a simple k-nearest-neighbor classifier, with spectral profiles of individual instruments serving as the training database.
- the distance function is calculated from at least one correlation coefficient between a query profile and a database profile.
- additional features which provide detailed information about the shape of the spectral profile are extracted. These include the individual features already mentioned.
- Drum-like onsets are detected in the amplitude envelopes, such as the amplitude envelope in Fig. 3a, using conventional peak-picking techniques. Only peaks within a tolerance range around the original times t, that is to say the times at which the maximum seeker 16c delivered a result, are primarily regarded as candidates for onsets. Remaining peaks extracted from the amplitude envelopes are initially saved for further consideration. The value of the magnitude of the amplitude envelope at its position is assigned to each onset candidate. If this value does not exceed a predetermined dynamic threshold, the onset is not accepted. The threshold varies with the amount of energy in a larger time region surrounding the onsets.
- automatic detection and preferably also automatic classification of non-pitched percussive instruments in real polyphonic music signals is thus achieved, the starting point for this being the profile spectra on the one hand and the amplitude envelope curve on the other hand.
- the rhythmic information of a piece of music can also be extracted well from the percussive instruments, which in turn should lead to a favorable note-to-note transcription.
- Methods for analyzing an information signal can be implemented in hardware or in software.
- the implementation can take place on a digital storage medium, in particular a floppy disk or CD with electronically readable control signals, which can cooperate with a programmable computer system in such a way that the method is carried out.
- the invention thus also consists in a computer program product with a program code stored on a machine-readable carrier for carrying out the method when the computer program product runs on a computer.
- the invention can thus be implemented as a computer program with a program code for carrying out the method if the computer program runs on a computer.
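
The PCA step described above (eigenvalue decomposition of the covariance matrix, selection of the eigenvectors with the d largest eigenvalues, scaling for a whitening effect, and the equivalent route via an SVD) can be illustrated with a minimal sketch. It is not the patented implementation. The function names `whiten_evd` and `whiten_svd`, the synthetic data `X_t`, and the centring of each frequency bin are assumptions made for illustration. The sketch scales each eigenvector by the reciprocal square root of its eigenvalue, which is the usual way to obtain unit variance; the scaling "by the reciprocals of the eigenvalues" in the text is read in that sense here.

```python
import numpy as np


def whiten_evd(X, d):
    """Project the columns of X onto the d strongest principal axes and whiten them.

    X has shape (n_bins, n_onsets): one extracted onset spectrum per column.
    Returns the whitened components (d, n_onsets) and the matrix T (d, n_bins).
    """
    Xc = X - X.mean(axis=1, keepdims=True)      # assumption: centre each frequency bin
    cov = Xc @ Xc.T / (X.shape[1] - 1)          # covariance matrix of the data set
    eigvals, eigvecs = np.linalg.eigh(cov)      # EVD; eigh returns ascending eigenvalues
    keep = np.argsort(eigvals)[::-1][:d]        # indices of the d largest eigenvalues
    # Scale each kept eigenvector by 1/sqrt(eigenvalue) so every component has unit variance.
    T = (eigvecs[:, keep] / np.sqrt(eigvals[keep])).T
    return T @ Xc, T


def whiten_svd(X, d):
    """Same components (up to the sign of each row) via an SVD of the centred data."""
    Xc = X - X.mean(axis=1, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return np.sqrt(X.shape[1] - 1) * Vt[:d]


# Synthetic stand-in for the onset spectra: 3 hypothetical source profiles, 200 onsets.
rng = np.random.default_rng(0)
X_t = rng.random((64, 3)) @ rng.random((3, 200)) + 0.01 * rng.random((64, 200))

Z, T = whiten_evd(X_t, d=3)
print(np.round(np.cov(Z), 6))  # covariance of the whitened components, close to identity
```

In this sketch the 64 x 200 data set is reduced to three decorrelated, unit-variance components, which is the form in which the components would be handed to a subsequent ICA stage.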
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Auxiliary Devices For Music (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
- Debugging And Monitoring (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102004022660A DE102004022660B4 (en) | 2004-05-07 | 2004-05-07 | Apparatus and method for analyzing an information signal |
PCT/EP2005/004685 WO2005114651A1 (en) | 2004-05-07 | 2005-04-29 | Device and method for analysing an information signal |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1743324A1 (en) | 2007-01-17 |
EP1743324B1 (en) | 2007-10-31 |
Family
ID=34968451
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05744658A Not-in-force EP1743324B1 (en) | 2004-05-07 | 2005-04-29 | Device and method for analysing an information signal |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1743324B1 (en) |
JP (1) | JP2007536587A (en) |
AT (1) | ATE377240T1 (en) |
DE (2) | DE102004022660B4 (en) |
WO (1) | WO2005114651A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113723200B (en) * | 2021-08-03 | 2024-01-12 | 同济大学 | Method for extracting time spectrum structural features of non-stationary signals |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0675562A (en) * | 1992-08-28 | 1994-03-18 | Brother Ind Ltd | Automatic musical note picking-up device |
US6140568A (en) * | 1997-11-06 | 2000-10-31 | Innovative Music Systems, Inc. | System and method for automatically detecting a set of fundamental frequencies simultaneously present in an audio signal |
US6201176B1 (en) * | 1998-05-07 | 2001-03-13 | Canon Kabushiki Kaisha | System and method for querying a music database |
GB2363227B (en) * | 1999-05-21 | 2002-02-20 | Yamaha Corp | Method and system for supplying contents via communication network |
US7117149B1 (en) * | 1999-08-30 | 2006-10-03 | Harman Becker Automotive Systems-Wavemakers, Inc. | Sound source classification |
US6453252B1 (en) * | 2000-05-15 | 2002-09-17 | Creative Technology Ltd. | Process for identifying audio content |
JP2002207482A (en) * | 2000-11-07 | 2002-07-26 | Matsushita Electric Ind Co Ltd | Device and method for automatic performance |
JP2004029274A (en) * | 2002-06-25 | 2004-01-29 | Fuji Xerox Co Ltd | Device and method for evaluating signal pattern, and signal pattern evaluation program |
2004
- 2004-05-07 DE DE102004022660A patent/DE102004022660B4/en not_active Expired - Fee Related
2005
- 2005-04-29 AT AT05744658T patent/ATE377240T1/en not_active IP Right Cessation
- 2005-04-29 JP JP2007511985A patent/JP2007536587A/en not_active Ceased
- 2005-04-29 DE DE502005001838T patent/DE502005001838D1/en active Active
- 2005-04-29 EP EP05744658A patent/EP1743324B1/en not_active Not-in-force
- 2005-04-29 WO PCT/EP2005/004685 patent/WO2005114651A1/en active IP Right Grant
Non-Patent Citations (1)
Title |
---|
See references of WO2005114651A1 * |
Also Published As
Publication number | Publication date |
---|---|
ATE377240T1 (en) | 2007-11-15 |
DE102004022660A1 (en) | 2005-12-15 |
WO2005114651A1 (en) | 2005-12-01 |
DE102004022660B4 (en) | 2006-03-23 |
DE502005001838D1 (en) | 2007-12-13 |
JP2007536587A (en) | 2007-12-13 |
EP1743324B1 (en) | 2007-10-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1606798B1 (en) | Device and method for analysing an audio information signal | |
US7565213B2 (en) | Device and method for analyzing an information signal | |
DE10133333C1 (en) | Producing fingerprint of audio signal involves setting first predefined fingerprint mode from number of modes and computing a fingerprint in accordance with set predefined mode | |
EP1368805B1 (en) | Method and device for characterising a signal and method and device for producing an indexed signal | |
Mitrović et al. | Features for content-based audio retrieval | |
EP1371055B1 (en) | Device for the analysis of an audio signal with regard to the rhythm information in the audio signal using an auto-correlation function | |
EP1407446B1 (en) | Method and device for characterising a signal and for producing an indexed signal | |
DE10123366C1 (en) | Device for analyzing an audio signal for rhythm information | |
WO2006039995A1 (en) | Method and device for harmonic processing of a melodic line | |
WO2006039992A1 (en) | Extraction of a melody on which an audio signal is based | |
DE102004028693B4 (en) | Apparatus and method for determining a chord type underlying a test signal | |
DE102004028694B3 (en) | Apparatus and method for converting an information signal into a variable resolution spectral representation | |
EP1743324B1 (en) | Device and method for analysing an information signal | |
EP1671315B1 (en) | Process and device for characterising an audio signal | |
EP1377924B1 (en) | Method and device for extracting a signal identifier, method and device for creating a database from signal identifiers and method and device for referencing a search time signal | |
Krusche | Visualization and auralization of features learned by neural networks for musical instrument recognition |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20061102 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR |
RIN1 | Information on inventor provided before grant (corrected) | Inventor name: DITTMAR, CHRISTIAN; Inventor name: UHLE, CHRISTIAN; Inventor name: HERRE, JUERGEN |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
DAX | Request for extension of the european patent (deleted) | |
GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: FG4D; Free format text: NOT ENGLISH |
REG | Reference to a national code | Ref country code: IE; Ref legal event code: FG4D; Free format text: LANGUAGE OF EP DOCUMENT: GERMAN |
REG | Reference to a national code | Ref country code: CH; Ref legal event code: EP |
REF | Corresponds to: | Ref document number: 502005001838; Country of ref document: DE; Date of ref document: 20071213; Kind code of ref document: P |
GBT | Gb: translation of ep patent filed (gb section 77(6)(a)/1977) | Effective date: 20071129 |
ET | Fr: translation filed | |
NLV1 | Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act | |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: NL; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031; Ref country code: SE; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20080131; Ref country code: ES; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20080211 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: BG; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20080131; Ref country code: IS; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20080229; Ref country code: SI; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031; Ref country code: LT; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031; Ref country code: PL; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031; Ref country code: PT; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20080331 |
REG | Reference to a national code | Ref country code: IE; Ref legal event code: FD4D |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DK; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031; Ref country code: CZ; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: RO; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031; Ref country code: SK; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031 |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | Effective date: 20080801 |
BERE | Be: lapsed | Owner name: FRAUNHOFER-GESELLSCHAFT ZUR FORDERUNG DER ANGEWAN; Effective date: 20080430 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: IE; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: MC; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20080430 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GR; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20080201; Ref country code: EE; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FI; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: BE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20080430 |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: 732E; Free format text: REGISTERED BETWEEN 20090409 AND 20090415 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: CY; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: AT; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20080429 |
REG | Reference to a national code | Ref country code: CH; Ref legal event code: PL |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LI; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20090430; Ref country code: CH; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20090430 |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: TP |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: TP |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LU; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20080429; Ref country code: HU; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20080501 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR; Payment date: 20100506; Year of fee payment: 6 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: TR; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20071031 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE; Payment date: 20100428; Year of fee payment: 6 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; Payment date: 20100426; Year of fee payment: 6 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: IT; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20080430 |
GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20110429 |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: ST; Effective date: 20111230 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20111101; Ref country code: FR; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20110502 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R119; Ref document number: 502005001838; Country of ref document: DE; Effective date: 20111101 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20110429 |