WO2015111014A1 - A method and a system for decomposition of acoustic signal into sound objects, a sound object and its use - Google Patents
- Publication number
- WO2015111014A1 (PCT/IB2015/050572)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- objects
- signal
- frequency
- sound
- amplitude
- Prior art date
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/0272—Voice signal separating (G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation)
- G10L25/18—Speech or voice analysis techniques characterised by the extracted parameters being spectral information of each sub-band
- G10L25/45—Speech or voice analysis techniques characterised by the type of analysis window
- G10L19/022—Blocking, i.e. grouping of samples in time; choice of analysis windows; overlap factoring
- G10L19/093—Determination or coding of the excitation function and long-term prediction parameters using sinusoidal excitation models
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/06—Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour
- G10H2210/031—Musical analysis: isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/061—Musical analysis for extraction of musical phrases, isolation of musically relevant segments, e.g. musical thumbnail generation, or for temporal structure analysis of a musical piece
- G10H2210/066—Musical analysis for pitch analysis as part of wider processing for musical purposes, e.g. transcription or musical performance evaluation; pitch recognition, e.g. in polyphonic sounds; estimation or use of missing fundamental
- G10H2220/126—Graphical user interface [GUI] for graphical editing of individual notes, parts or phrases represented as variable-length segments on a 2D or 3D representation, e.g. pianoroll representations of MIDI-like files
- G10H2240/145—Sound library, i.e. involving the specific use of a musical database as a sound bank or wavetable; indexing, interfacing, protocols or processing therefor
- G10H2250/235—Fourier transform; Discrete Fourier Transform [DFT]; Fast Fourier Transform [FFT]
Definitions
- The object of the invention is a method and a system for decomposition of acoustic signal into sound objects having the form of signals with slowly varying amplitude and frequency, and sound objects and their use.
- The invention is applicable in the field of analysis and synthesis of acoustic signals, in particular to speech signal synthesis.
- The existing sound analysis systems operate satisfactorily in conditions ensuring one source of the signal. If additional sources of sound appear, such as interference, ambient sounds or consonant sounds of multiple instruments, their spectra overlap, causing the applied mathematical models to fail.
- The algorithms based on the Fourier transform use the amplitude characteristic for the analysis, in particular the maximum of the amplitude spectrum. For sounds with frequencies close to each other this parameter is strongly distorted. In such a case, additional information could be obtained from the phase characteristic, by analysing the signal's phase. However, since the spectrum is analysed in windows shifted e.g. by 256 samples, there is nothing to relate the calculated phase to.
- An object of the invention is to provide a method and a system for decomposition of acoustic signal which make possible an effective analysis of an acoustic signal perceived as a signal incoming simultaneously from a number of sources, while maintaining a very good resolution in time and frequency. More generally, an object of the invention is to improve the reliability and extend the capabilities of sound signal processing systems, including those for analysis and synthesis of speech.
- The essence of the invention is that a method for decomposition of acoustic signal into sound objects having the form of sinusoidal waves with slowly-varying amplitude and frequency, comprising conversion of the signal from analogue to digital form, then splitting the signal into adjacent sub-bands with central frequencies distributed according to a logarithmic scale by feeding samples of the acoustic signal to the digital filter bank's input, is characterised in that
- an active object is created in an active objects database for its tracking
- values of the envelope of amplitude and values of frequency and their corresponding time instants are determined not less frequently than once per period of duration of a given filter's window W(n), so as to create characteristic points describing the slowly-varying sinusoidal waveform of said sound object
- At least one selected closed active object is transferred to a database of sound objects to obtain at least one decomposed sound object, defined by a set of characteristic points with coordinates in time-frequency-amplitude space.
- The essence of the invention is also that a system for decomposition of acoustic signal into sound objects in the form of sinusoidal waveforms with slowly-varying amplitude and frequency, comprising a converter system for analogue-to-digital conversion from which said signal is input to a filter bank having central frequencies of filters distributed according to geometrical distribution, is characterised in that each filter is adapted to determine a real value FC(n) and an imaginary value FS(n) of said filtered signal, said filter bank being connected to a system for tracking objects, wherein the system for tracking objects comprises a spectral analysis system, a voting system adapted to calculate the value of angular frequency weighted by amplitude, a resolution improving system adapted to adjust the filter's window and/or to determine a difference between the spectrum of the input signal and the spectrum at the filter bank's output, and/or to determine a difference between the input signal and the filter bank's output signal, a system for associating objects, and a shape forming system adapted to determine characteristic points describing slowly-varying sinusoidal waveforms of the sound objects.
- the essence of the invention is also that a sound object being a signal having slowly-varying amplitude and frequency is characterised in that it is obtained by the method according to any of claims 1 to 7.
- A sound object being a signal having slowly-varying amplitude and frequency is characterised in that it is defined by characteristic points having three coordinates in the time-amplitude-frequency space, wherein each characteristic point is distant from the next one by not more than 20 periods of the object's frequency.
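The representation described above — a set of characteristic points with time, frequency and amplitude coordinates — can be sketched as a simple data structure. The class and field names are illustrative, and the piecewise-linear interpolation between points is an assumption; the patent only states that the points describe a slowly-varying waveform:

```python
import bisect
from dataclasses import dataclass

@dataclass
class CharacteristicPoint:
    t: float  # time (s)
    f: float  # instantaneous frequency (Hz)
    a: float  # amplitude envelope value

@dataclass
class SoundObject:
    # characteristic points ordered by time; between points the
    # slowly-varying amplitude and frequency are interpolated linearly
    points: list

    def envelope_at(self, t: float):
        """Return the (amplitude, frequency) pair at time t by
        piecewise-linear interpolation between characteristic points."""
        pts = self.points
        if t <= pts[0].t:
            return pts[0].a, pts[0].f
        if t >= pts[-1].t:
            return pts[-1].a, pts[-1].f
        i = bisect.bisect_right([p.t for p in pts], t)
        p0, p1 = pts[i - 1], pts[i]
        u = (t - p0.t) / (p1.t - p0.t)
        return p0.a + u * (p1.a - p0.a), p0.f + u * (p1.f - p0.f)
```

Because the points are at most 20 object periods apart, such an interpolation is dense enough to resynthesise the sinusoidal component from the object alone.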
- the main advantage of the method and the system for decomposition of signal according to the invention is that it is suitable for effective analysis of a real acoustic signal, which usually is composed of signals incoming from a few different sources, e.g. a number of various instruments or a number of talking or singing persons.
- The method and the system according to the invention make it possible to decompose a sound signal into sinusoidal components with slowly varying amplitude and frequency.
- Such a process can be referred to as vectorization of a sound signal, wherein the vectors calculated as a result of the vectorization process can be referred to as sound objects.
- a primary objective of decomposition is to extract at first all the signal's components (sound objects), next to group them according to a determined criterion, and afterwards to determine the information contained therein.
- a spectrum of the audio signal obtained at said filter bank's output comprises information about the current location and variations in the sound objects' signal.
- the task of the system and the method according to the invention is to precisely associate a variation of these parameters with existing objects, to create a new object, if the parameters do not fit to any of the existing objects, or to terminate an object if there are no further parameters for it.
- The number of considered filters is increased and a voting system is used, making it possible to localize the frequencies of the present sounds more precisely. If close frequencies appear, the length of said filters is increased, for example, to improve the frequency-domain resolution, or techniques for suppressing the already recognized sounds are applied so as to better extract newly appearing sound objects.
- The method and the system according to the invention track objects having a frequency variable in time. This means that the system will analyse real phenomena, correctly identifying an object with a new frequency as an already existing object or an object belonging to the same group associated with the same source of signal. Precise localization of the objects' parameters in the amplitude and frequency domain makes it possible to group objects in order to identify their source. Assignment to a given group of objects is possible due to the use of specific relations between the fundamental frequency and its harmonics, determining the timbre of the sound.
- A precise separation of objects gives a chance of further analysis for each group of objects, without interference, by means of already existing systems, which obtain good results for a clean signal (without interference). Possessing precise information about the sound objects present in a signal makes it possible to use them in completely new applications such as, for example, automatic generation of musical notation of individual instruments from an audio signal, or voice control of devices even with high ambient interference.
- FIG.1 is a block diagram of a system for decomposition of audio signal into sound objects
- FIG.2a is a parallel structure of a filter bank according to the first embodiment of the invention.
- FIG.2b is a tree structure of the filter bank according to the second embodiment of the invention.
- FIG.3 shows a general principle of operation of a passive filter bank system
- FIG.4 shows exemplary parameters of filters
- FIG.5 is the impulse response of a filter F(n) having the Blackman window
- FIG.6 is a flowchart of a single filter
- FIGS. 7a and 7c show a part of a spectrum of the filter bank output signal, comprising the real component FC(n), the imaginary component FS(n), the resulting amplitude of the spectrum FA(n) and the phase FF(n)
- FIGS. 7b and 7d show the nominal angular frequency F#(n) of a corresponding filter group and the angular frequency of the spectrum FQ(n).
- FIG.8 is a block diagram of a system for tracking sound objects
- FIGS .9a and 9b show exemplary results of operation of a voting system
- FIG.10 is a flowchart of the system for associating objects
- FIG.11 shows the operation of a frequency resolution improvement system according to an embodiment
- FIG.12 shows the operation of a frequency resolution improvement system according to another embodiment
- FIG.13 shows the operation of a frequency resolution improvement system according to yet another embodiment
- FIGS.14a, 14b, 14c, 14d show examples of representation of sound objects
- FIG.15 shows an exemplary format of notation of information about sound objects
- FIG.16 shows a first example of a sound object requiring correction
- FIG.17 shows a second example of a sound object requiring correction
- FIG.18 shows further examples of sound objects requiring correction
- FIGS.19a, 19b, 19c, 19d, 19e, 19f, 19g, 19h show the process of extracting sound objects from an audio signal and synthesis of an audio signal from sound objects.
- The term "connection", in the context of a connection between any two systems, should be understood in the broadest possible sense as any possible single-path or multipath, as well as direct or indirect, physical or operational connection.
- A system 1 for decomposition of acoustic signal into sound objects according to the invention is shown schematically in FIG.1.
- An audio signal in digital form is fed to its input.
- a digital form of said audio signal is obtained as a result of the application of typical and known A/D conversion techniques.
- the elements used to convert the acoustic signal from analogue to digital form have not been shown herein.
- the system 1 comprises a filter bank 2 with an output connected to a system for tracking objects 3, which is further connected with a correcting system 4. Between the system for tracking objects 3 and the filter bank there exists a feedback connection, used to control the parameters of the filter bank 2.
- the system for tracking objects 3 is connected to the input of the filter bank 2 via a differential system 5, which is an integral component of a frequency resolution improvement system 36 in FIG.8.
- a time-domain and frequency-domain signal analysis has been used.
- Said digital input signal is input to the filter bank 2 sample by sample.
- Said filters are FIR (finite impulse response) filters. FIG.2a shows a typical structure of the filter bank 2, in which individual filters 20 process the same signal in parallel at a given sampling rate. Typically, the sampling rate is at least two times higher than the highest expected component of the audio signal, preferably 44.1 kHz.
- a filter bank tree structure of FIG.2b can be used.
- The filters 20 are grouped according to the input signal sampling rate. For example, the splitting in the tree structure can be done at first for whole octaves. For individual sub-bands with lower frequencies it is possible to cut off high-frequency components using a low-pass filter and to sample them at a lower rate. As a consequence, due to the reduction of the number of samples, a significant increase in processing speed is achieved.
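The gain from the tree structure comes from decimation: before the signal is handed to the filters of the next lower octave, the upper half of the band is cut off and the sampling rate halved. A minimal sketch of one tree level, in which a crude 2-tap moving average stands in for a proper low-pass filter:

```python
def octave_decimate(samples, fp):
    """One level of the filter bank tree: crude low-pass (2-tap moving
    average) to cut off the upper half of the band, then keep every second
    sample, halving the sampling rate for the next lower octave."""
    lowpassed = [(a + b) / 2.0 for a, b in zip(samples, samples[1:])]
    return lowpassed[::2], fp // 2
```

Each level of the tree then processes half as many samples as the level above, which is where the significant increase in processing speed comes from.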
- the filter bank should provide a high frequency-domain resolution, i.e. greater than 2 filters per semitone, making it possible to separate two adjacent semitones. In the presented examples 4 filters per semitone are used.
- A scale corresponding to the human ear's characteristics has been adopted, with logarithmic distribution; however, a person skilled in the art will know that other distributions of the filters' central frequencies are allowed within the scope of the invention.
- a pattern for the distribution of filters' central frequencies is the musical scale, wherein the subsequent octaves begin with a tone 2 times higher than the previous octave.
- the number of filters is greater than 300.
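With 4 filters per semitone on the musical (logarithmic) scale, adjacent central frequencies differ by a constant ratio of 2^(1/48). A sketch of such a distribution; the frequency limits chosen here are illustrative assumptions, not values from the patent:

```python
def center_frequencies(f_min=55.0, f_max=14080.0, filters_per_semitone=4):
    """Central frequencies distributed logarithmically, as on the musical
    scale: 12 semitones per octave, each octave doubling the frequency."""
    step = 2.0 ** (1.0 / (12 * filters_per_semitone))  # ratio between adjacent filters
    freqs = []
    f = f_min
    while f <= f_max * (1 + 1e-12):  # tolerance for float rounding
        freqs.append(f)
        f *= step
    return freqs
```

With these example limits the bank spans 8 octaves and contains well over 300 filters, consistent with the embodiment, and every 48th filter sits exactly one octave above the previous one.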
- A general principle of operation of a passive filter bank is shown in FIG.3.
- the input signal which is fed to each filter 20 of the filter bank 2 is transformed as a result of relevant mathematical operations from the time domain into the frequency domain.
- a response to an excitation signal appears at the output of each filter 20, and the signal's spectrum jointly appears at the filter bank's output.
- FIG.4 shows exemplary parameters of selected filters 20 in the filter bank 2. As can be seen in the table, central frequencies correspond to tones to which a particular music note symbol can be attributed.
- the window width of each filter 20 is given by the relation:
- W(n) = K * fp / FN(n) (1)
- where: W(n) - window width of filter n; FN(n) - nominal (central) frequency of filter n; fp - sampling rate (e.g. 44100 Hz); K - window width coefficient (e.g. 16)
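Numerically, relation (1) with the example values fp = 44100 Hz and K = 16 gives (rounding to whole samples is an assumption):

```python
def window_width(fn, fp=44100, K=16):
    """W(n) = K * fp / FN(n): the window spans K periods of the filter's
    nominal frequency FN(n), expressed in samples at sampling rate fp."""
    return round(K * fp / fn)
```

A filter at 441 Hz gets a 1600-sample window (about 36 ms), while a filter one octave lower gets a window twice as long; this is how the bank trades time resolution for frequency resolution down the scale.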
- An exemplary impulse response of a filter 20 according to the invention is shown in FIG.5.
- The flowchart of each of the filters 20 has been shown in FIG.6.
- First, the parameters of the filter 20 are initialised, the exemplary parameters being the coefficients of particular components of the time window function.
- The current sample P_IN of the input signal, having only a real value, is fed to the input of the filter bank 2.
- Each filter uses a recursive algorithm: it calculates new values of the components FC(n) and FS(n) based on the previous values of the real component FC(n) and the imaginary component FS(n), as well as on the value of the sample P_IN entering the filter and the sample P_OUT leaving the filter's window, which is stored in an internal shift register. Thanks to the use of a recursive algorithm, the number of calculations for each of the filters is constant and does not depend on the filter's window length.
- the executed operations for a cosine window are defined by the formula:
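The patent's cosine-window formula is not reproduced in this text; as a hedged illustration, a plain rectangular-window sliding DFT shows the recursive, constant-cost update pattern being described, with FC(n) and FS(n) as real and imaginary parts and the outgoing sample taken from an internal shift register:

```python
import cmath
import math
from collections import deque

class SlidingDFTFilter:
    """One bank filter as a recursive (sliding) DFT bin: each new sample
    costs the same few operations regardless of the window length W.
    A rectangular window is used here for simplicity, not the patent's
    cosine window."""

    def __init__(self, f_center, fp=44100, K=16):
        self.W = round(K * fp / f_center)                    # window width, relation (1)
        self.rot = cmath.exp(2j * math.pi * f_center / fp)   # per-sample phase rotation
        self.buf = deque([0.0] * self.W)                     # internal shift register
        self.S = 0j                                          # FC(n) + j*FS(n)

    def push(self, p_in):
        p_out = self.buf.popleft()   # sample leaving the filter's window
        self.buf.append(p_in)
        # recursive update: add the incoming sample, drop the outgoing one,
        # rotate the accumulated phasor by one sample period
        self.S = (self.S + p_in - p_out) * self.rot
        return self.S.real, self.S.imag
```

Feeding a unit-amplitude tone at the filter's central frequency drives the magnitude of FC + j·FS to W/2 once the window has filled, while off-centre tones yield a rotating phasor whose phase derivative reveals their actual angular frequency, as exploited by the spectrum analysing system 31.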
- a spectrum analysing system 31 being a component of the system for tracking objects 3 (as shown in FIG.8) calculates individual components of the signal's spectrum at the filter bank output. To illustrate the operation of this system, an acoustic signal with the following components has been subjected to analysis:
- FIGS. 7a and 7b show plots of instantaneous values of quantities obtained at the output of a selected group of filters 20 for said signal, together with values of quantities calculated and analysed by the spectrum analysing system 31.
- the spectrum analysing system 31 collects all the possible information necessary to determine the actual frequency of the sound objects present at a given time instant in the signal, including the information about the angular frequency.
- The correct location of the tones of the component frequencies has been shown in FIG.7b; it lies at the intersection of the nominal angular frequency of the filters F#[n] and the value of the angular frequency at the output of the filters FQ[n], calculated as a derivative of the phase of the spectrum at the output of a particular filter n.
- The spectrum analysing system 31 also analyses the plots of the angular frequencies F#[n] and FQ[n]. In the case of a signal comprising components which are distant from each other, the points determined as a result of the analysis of the angular frequency correspond to the locations of the maxima of the amplitude in FIG. 7a.
- the fundamental task of the system for tracking objects 3, a block diagram of which is shown in FIG.8, is to detect at a given time instant all frequency components present in an input signal.
- the filters adjacent to the input tone have very similar angular frequencies, different from the nominal angular frequencies of those filters.
- This property is used by another subsystem of the system for tracking objects 3, namely the voting system 32.
- To prevent incorrect detection of frequency components, the values of the amplitude spectrum FA(n) and of the angular frequency at the output of the filters FQ(n), calculated by the spectrum analysing system 31, are forwarded to the voting system 32 for calculation of their weighted value and detection of its maxima as a function of the filter number (n).
- FIGS. 9a and 9b illustrate the operation of this system.
- FIG.9a illustrates a relevant case shown in FIGS.7a and 7b
- FIG.9b illustrates a relevant case shown in FIGS.7c and 7d.
- the plot of the signal FG(n) (the weighted value calculated by the voting system 32) has distinct peaks in locations corresponding to tones of frequency components present in the input signal.
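The voting idea can be sketched as follows: each filter m casts a vote of weight FA(m) for the filter whose nominal frequency is closest to the measured FQ(m), so the filters surrounding one tone pile their votes onto a single bin. The exact weighting formula is not given in this text, so the scheme below is an illustrative assumption:

```python
def vote(FA, FQ, F_nom):
    """Amplitude-weighted voting: FG(n) accumulates the amplitudes FA(m) of
    all filters m whose measured angular frequency FQ(m) points at the
    nominal angular frequency F_nom(n); peaks of FG mark the input tones."""
    FG = [0.0] * len(F_nom)
    for a, fq in zip(FA, FQ):
        # the filter index whose nominal frequency best matches the measurement
        n = min(range(len(F_nom)), key=lambda i: abs(F_nom[i] - fq))
        FG[n] += a
    return FG
```

Several neighbouring filters all measuring roughly the same frequency near one tone thus produce a single distinct peak in FG(n), matching the behaviour shown in FIGS. 9a and 9b.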
- the spectrum analysing system 31 and the voting system 32 are connected at their output with a system for associating objects 33.
- The system for associating objects 33 combines these parameters into "elements" and then builds sound objects out of them.
- the frequencies (angular frequencies) detected by the voting system 32, and thus "elements" are identified by the filter number n.
- the system for associating objects 33 is connected to an active objects database 34.
- the active objects database 34 comprises objects arranged in order depending on the frequency value, wherein the objects have not yet been "terminated".
- the term "a terminated object” is to be understood as an object such that at a given time instant no element detected by the spectrum analysing system 31 and the voting system 32 can be associated with it.
- the operation of the system for associating objects 33 has been shown in FIG.10.
- Subsequent elements of the input signal detected by the voting system 32 are associated with selected active objects in the database 34.
- Detected elements of a given frequency are compared only with the corresponding active objects located in a predefined frequency range. At first, the comparison takes into account the angular frequency of an element and of an active object. If there is no object sufficiently close to said element, a new active object is created.
- a matching function is further calculated in the system for associating objects 33, which comprises the following weighted values: amplitude matching, phase matching and object duration time.
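The description names the three contributions of the matching function but not their functional form or weights; the sketch below is therefore only one plausible shape of such a function, with invented weights:

```python
import math

def match_score(element, obj, w_amp=0.4, w_phase=0.3, w_dur=0.3):
    # Amplitude matching: relative amplitude difference mapped to [0, 1].
    amp_match = 1.0 - min(1.0, abs(element["amp"] - obj["amp"]) / max(obj["amp"], 1e-9))
    # Phase matching: phase difference wrapped to [-pi, pi], mapped to [0, 1].
    dphi = abs((element["phase"] - obj["phase"] + math.pi) % (2 * math.pi) - math.pi)
    phase_match = 1.0 - dphi / math.pi
    # Object duration: longer-lived objects are preferred as continuers
    # (saturating at 20 periods -- an arbitrary choice here).
    dur_match = min(1.0, obj["duration"] / 20.0)
    return w_amp * amp_match + w_phase * phase_match + w_dur * dur_match

element = {"amp": 1.0, "phase": 0.1}
obj = {"amp": 1.0, "phase": 0.1, "duration": 25}
score = match_score(element, obj)   # a perfect match scores 1.0
```

An element would then be attached to whichever nearby active object yields the highest score.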
- Such functionality of the system for associating objects 33 according to the invention is of essential importance when a component signal from one and the same source in a real input signal has changed frequency. As a result of such frequency changes, a number of active objects may move closer to each other. Therefore, after calculating the matching function, the system for associating objects 33 checks whether at a given time instant there is a second object in the database 34 sufficiently close to the element, and decides which object will be the continuer of the objects that join together.
- a resolution improvement system 36 cooperates with the active objects database 34 and tracks the mutual frequency-domain distance of the objects present in the signal. If active objects with frequencies too close to each other are detected, the resolution improvement system 36 sends a control signal to start one of three processes improving the frequency-domain resolution. As mentioned previously, when several close frequencies are present, their spectra overlap. To distinguish them the system has to "listen intently" to the sound, which it can achieve by elongating the window in which the filter samples the signal.
- a window adjustment signal 301 is activated, informing the filter bank 2 that the windows in the given range should be elongated. Window elongation impedes the analysis of signal dynamics, therefore if no close objects are detected the resolution improvement system 36 enforces a subsequent shortening of the filter's 20 window.
- a window with a length of 12 to 24 periods of the nominal frequency of the filter 20 is assumed.
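The trade-off can be estimated: a window of K periods of the filter's nominal frequency f lasts T = K/f, so the classical frequency resolution is of the order of 1/T = f/K, i.e. roughly 100/K percent of f. A rough sketch (an order-of-magnitude estimate only, not the exact figures of the table mentioned below):

```python
def window_length_samples(f_nominal, periods, fs=44100):
    # Window spanning `periods` periods of the filter's nominal frequency.
    return round(periods * fs / f_nominal)

def min_separation_percent(periods):
    # Classical resolution estimate: about 100/K percent of the nominal
    # frequency for a K-period window.
    return 100.0 / periods

lengths = [window_length_samples(440.0, k) for k in (12, 24)]
seps = [min_separation_percent(k) for k in (12, 24)]   # roughly 8.3% and 4.2%
```

Doubling the window from 12 to 24 periods thus roughly halves the minimal separable frequency distance, at the cost of slower reaction to signal dynamics.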
- the relation of the frequency-domain resolution with the window's width is shown in FIG.11.
- the table below illustrates the ability of the system to detect and track at least 4 undamaged objects present next to each other, with the minimal distance expressed as a percentage, as a function of the window's width.
- the system "listens intently" to a sound by modifying the filter bank's spectrum, which is schematically illustrated in FIG.12.
- the frequency-domain resolution is improved by subtracting, from the spectrum at the tracking system's 3 input, the expected spectrum of "well localised objects" located in the vicinity of newly appearing objects.
- "Well localised objects" are objects whose amplitude does not vary too quickly (no more than one extremum per window's width) and whose frequency does not drift too quickly (no more than 10% frequency variation per window's width).
- An attempt to subtract the spectrum of objects varying more quickly can lead to phase inversion at the measurement system input and to a positive feedback resulting in generation of an interfering signal.
- the resolution improvement system 36 calculates the expected spectrum 303 based on the known instantaneous frequency, amplitude and phase of an object and subtracts it from the real spectrum, so that the spectra of adjacent elements are disturbed less strongly.
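A toy model of this subtraction; the Gaussian leakage shape and its spread are assumptions made here for illustration, whereas the real system derives the expected spectrum 303 from the known filter characteristics:

```python
import cmath
import math

def expected_spectrum(obj, n_filters, spread=1.5):
    # Expected complex spectrum of one well localised object: its known
    # amplitude and phase, leaked over neighbouring filters with an
    # assumed Gaussian shape centred on its filter index obj["n"].
    phasor = obj["amp"] * cmath.exp(1j * obj["phase"])
    return [phasor * math.exp(-((n - obj["n"]) / spread) ** 2)
            for n in range(n_filters)]

def subtract_objects(measured, objects):
    # Residual spectrum seen by systems 31 and 32 after the expected
    # spectra of the well localised objects are removed.
    residual = list(measured)
    for obj in objects:
        for n, v in enumerate(expected_spectrum(obj, len(measured))):
            residual[n] -= v
    return residual

obj = {"n": 4, "amp": 1.0, "phase": 0.3}
measured = expected_spectrum(obj, 9)
residual = subtract_objects(measured, [obj])   # near zero everywhere
```

When the object model matches the measurement, the residual vanishes, leaving only the adjacent elements visible to the spectrum analysing system 31 and the voting system 32.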
- the spectrum analysing system 31 and the voting system 32 perceive only adjacent elements and a variation of the subtracted object.
- the system for associating objects 33 further takes into account the subtracted parameters while comparing the detected elements with the active objects database 34.
- This method of frequency-domain resolution improvement requires a very large number of computations, and a risk of positive feedback exists.
- the frequency-domain resolution can also be improved by subtracting from the input signal an audio signal generated based on well localised objects. Such operation is shown schematically in FIG.13.
- the resolution improvement system 36 generates an audio signal 302 based on information about the frequency, amplitude and phase of the active objects 34, which is forwarded to a differential system 5 at the filter bank's 2 input.
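A minimal sketch of signal 302 and the differential system 5, assuming constant frequency, amplitude and phase per object (the real generator follows their instantaneous values):

```python
import math

def generate_signal(objects, length, fs=44100):
    # Signal 302: a sum of sinusoids reconstructed from the well
    # localised objects' frequency, amplitude and phase.
    out = [0.0] * length
    for obj in objects:
        w = 2 * math.pi * obj["freq"] / fs
        for i in range(length):
            out[i] += obj["amp"] * math.sin(w * i + obj["phase"])
    return out

def differential_input(samples, objects, fs=44100):
    # Differential system 5: the generated signal is subtracted from the
    # input before it reaches the filter bank 2.
    gen = generate_signal(objects, len(samples), fs)
    return [s - g for s, g in zip(samples, gen)]

objs = [{"freq": 440.0, "amp": 0.5, "phase": 0.0}]
x = generate_signal(objs, 256)
residual = differential_input(x, objs)   # cancels exactly in this toy case
```

In practice the cancellation is only partial, and the additional delay of the filter bank 2 noted below limits how aggressively this subtraction can be applied.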
- the number of required calculations in an operation of this type is smaller than in the embodiment of FIG.12; however, due to an additional delay introduced by the filter bank 2, the risk of system instability and unintended signal generation increases.
- the information contained in the active objects database 34 is also used by a shape forming system 37.
- the expected result of the sound signal decomposition according to the invention is to obtain sound objects in the form of sinusoidal waveforms with slowly varying amplitude envelope and frequency. Therefore, the shape forming system 37 tracks variations of the amplitude envelope and frequency of the active objects in the database 34 and calculates online the subsequent characteristic points of amplitude and frequency: the local maxima, local minima and inflection points. Such information makes it possible to describe the sinusoidal waveforms unambiguously.
- the shape forming system 37 forwards this characteristic information, in the form of points describing an object, online to the active objects database 34. It has been assumed that the distance between the points to be determined should be no less than 20 periods of the object's frequency.
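The extrema part of this tracking can be sketched as follows (inflection points would be found analogously from sign changes of the second difference; the minimum 20-period spacing is omitted for brevity):

```python
def characteristic_points(values):
    # First sample, local maxima/minima, and last sample of a sampled
    # amplitude (or frequency) track of an active object.
    points = [(0, values[0])]
    for i in range(1, len(values) - 1):
        if (values[i - 1] < values[i] > values[i + 1]
                or values[i - 1] > values[i] < values[i + 1]):
            points.append((i, values[i]))
    points.append((len(values) - 1, values[-1]))
    return points

env = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0]
pts = characteristic_points(env)   # endpoints plus the peak and the dip
```

Only these few points, rather than every sample, need to be stored to describe the object's shape.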
- FIG.14a illustrates four objects with frequency varying as a function of time (sample number).
- the same objects have been shown in FIG.14b in the space defined by amplitude and time (sample number).
- the illustrated points indicate local maxima and minima of the amplitude.
- the points are connected by a smooth curve calculated with the use of third order polynomials. Having determined the function of frequency variation and the amplitude envelope, it is possible to determine the audio signal.
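Because the stored points are extrema, a zero slope at each point is a natural assumption; a cubic "smoothstep" segment between consecutive points then gives a smooth envelope from which the audio signal can be synthesized. This is a sketch only: the exact third-order polynomial form used by the invention is not given here.

```python
import math

def smooth_envelope(points, length):
    # Piecewise third-order interpolation with zero slope at the points.
    out = [0.0] * length
    for (i0, v0), (i1, v1) in zip(points, points[1:]):
        for i in range(i0, min(i1 + 1, length)):
            t = (i - i0) / max(i1 - i0, 1)
            s = 3 * t * t - 2 * t ** 3   # cubic, zero derivative at t=0 and t=1
            out[i] = v0 + (v1 - v0) * s
    return out

def synthesize(freq_env, amp_env, fs=44100):
    # Sinusoid with phase accumulated from the frequency envelope.
    phase, out = 0.0, []
    for f, a in zip(freq_env, amp_env):
        out.append(a * math.sin(phase))
        phase += 2 * math.pi * f / fs
    return out

amp = smooth_envelope([(0, 0.0), (4, 1.0)], 5)
audio = synthesize([440.0] * 5, amp)
```

Applying this to the frequency tracks of FIG.14a and the amplitude envelopes of FIG.14b yields an audio signal of the kind shown in FIG.14c.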
- FIG.14c illustrates an audio signal determined based on the shape of the objects defined in FIG.14a and FIG.14b.
- FIG.14d shows an exemplary format of sound objects notation.
- Header The notation starts with a header whose essential element is a header tag comprising a four-byte keyword, informing that we deal with a description of sound objects. Next, two bytes specify the number of channels (tracks), followed by two bytes defining the time unit. The header occurs only once, at the beginning of a file.
- Channel Information about channels (tracks) from this field serves to separate groups of sound objects that are in an essential relation, e.g. the left or right channel in stereo, a vocal track, a percussion instruments track, a recording from a defined microphone, etc.
- the channel field comprises the channel identifier (number), the number of objects in the channel and the position of the channel from the beginning of an audio signal, measured in defined units.
- Identifier "0" denotes a basic unit in the signal record which is the sound object.
- Value "1" can denote a folder containing a group of objects like, for example, basic tone and its harmonics. Other values can be used to define other elements related to objects.
- the description of the fundamental sound object includes the number of points, which does not include the first point, defined by the object itself. Specifying the maximal amplitude in the object's parameters makes it possible to control the simultaneous amplification of all points of the object. In the case of a folder of objects, this affects the amplitude value of all the objects contained in the folder.
- Points are used to describe the shape of the sound object in the time-frequency-amplitude domain. Their values are relative with respect to the parameters defined by the sound object. One byte of amplitude defines which part of the maximal amplitude defined by the object the point has. Similarly, the tone variation defines by what fraction of a tone the frequency has changed. The position of a point is defined relative to the previously defined point in the object.
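A hypothetical packing of these fields with Python's struct module; the four-byte "UH0" tag matches the record name used later in the description, but the exact byte layout, endianness, and the signed one-byte tone field are assumptions beyond what FIG.14d states:

```python
import struct

def pack_header(n_channels, time_unit):
    # 4-byte header tag + 2 bytes channel count + 2 bytes time unit.
    return struct.pack("<4sHH", b"UH0\x00", n_channels, time_unit)

def pack_point(amp_fraction, tone_delta, position):
    # amp_fraction: which part (0..255) of the object's maximal amplitude,
    # tone_delta:   signed fraction-of-tone frequency change (assumed 1 byte),
    # position:     offset from the previous point (assumed 2 bytes).
    return struct.pack("<BbH", amp_fraction, tone_delta, position)

header = pack_header(2, 1)           # 8 bytes
point = pack_point(128, -3, 500)     # 4 bytes per point
```

With roughly four bytes per characteristic point, a few dozen points suffice to describe an object that would otherwise occupy thousands of raw samples.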
- the multilevel structure of the recording and the relative associations between the fields allow very flexible operations on sound objects, making them effective tools for designing and modifying audio signals.
- The condensed recording of information about sound objects according to the invention greatly reduces the size of registered and transferred files. Taking into account that an audio file can be readily played from this format, we can compare the sizes: the file shown in FIG.14c would contain over 2000 bytes in .WAV format, while in the form of the sound objects record "UH0" according to the invention it contains 132 bytes. A better than 15-fold compression is not the limit of what can be achieved: in the case of longer audio signals much better results are obtained. The compression level depends on how much information is contained in the audio signal, i.e. how many objects, and how complex, can be read from the signal.
- Identification of sound objects in an audio signal is not an unambiguous mathematical transformation.
- the audio signal created as a composition of objects obtained as a result of a decomposition differs from the input signal.
- the task of the system and the method according to the invention is to minimize this difference.
- The sources of differences are of two types: part of them is expected and results from the applied technology; others can result from interference or unexpected properties of the input audio signal.
- a correcting system 4 shown in FIG.l is used.
- the system takes parameters of objects from the sound objects database 35 after the object has been terminated and modifies selected parameters of objects and points so as to minimize the expected differences or irregularities localised in these parameters.
- the first type of correction of sound objects according to the invention, performed by the correcting system 4, is shown in FIG.16.
- the distortion at the beginning and at the end of the object is caused by the fact that during transient states, when a signal with a defined frequency appears or fades, filters with a shorter impulse response react to the change more quickly. Therefore, at the beginning the object is bent towards higher frequencies, and at the end it turns towards lower frequencies.
- Correction of an object can be based on deforming the object's frequency at the beginning and at the end in the direction defined by the middle section of the object.
- A further type of correction according to the invention, performed by the correcting system 4, is shown in FIG.17.
- the audio signal samples passing through a filter 20 of the filter bank 2 cause a change at the filter's output, which manifests itself as a signal shift.
- This shift has a regular character and can be predicted. Its magnitude depends on the width of the window K of the filter n, the width being, in accordance with the invention, a function of frequency. This means that each frequency is shifted by a different value, which perceivably affects the sound of the signal.
- the magnitude of the shift is ca. 1/2 of the filter window's width in the area of normal operation of the filter, ca. 1/4 of the window's width in the initial phase, and ca. 3/4 of the window's width at the object's end. Because the magnitude of the shift can be predicted for each frequency, the task of the correcting system 4 is to shift all the points of the object appropriately in the opposite direction, so that the dynamics of the representation of the input signal improves.
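Using these fractions, the correction can be sketched as shifting every characteristic point backwards by the predicted delay. The 1/4, 1/2 and 3/4 factors come from the description; the point representation is a simplifying assumption:

```python
def correct_shift(points, window_width):
    # points: list of (position, value) pairs of one object. The first
    # point gets ~1/4 of the window width, interior points ~1/2, and the
    # last point ~3/4, all shifted against the filter delay.
    last = len(points) - 1
    corrected = []
    for k, (pos, val) in enumerate(points):
        if k == 0:
            shift = window_width // 4
        elif k == last:
            shift = (3 * window_width) // 4
        else:
            shift = window_width // 2
        corrected.append((pos - shift, val))
    return corrected

pts = correct_shift([(100, 1.0), (200, 2.0), (300, 3.0)], 40)
# the three points move back by 10, 20 and 30 samples respectively
```

Since the window width is itself a function of frequency, each object receives a frequency-dependent correction.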
- Yet another type of correction according to the invention, performed by the correcting system 4, is shown in FIG.18a, FIG.18b and FIG.18c.
- the distortion manifests itself as an object splitting into pieces which are independent objects. This splitting can be caused e.g. by a phase fluctuation in an input signal's component, an interference or mutual influence of closely adjacent objects.
- the correction of distortions of this type requires the correcting system 4 to analyse the envelope and frequency functions and to demonstrate that said objects should form a whole.
- the correction itself is simple and is based on combining the identified objects into one object.
- a task of the correcting system 4 is also to remove objects having an insignificant influence on the sound of the audio signal. According to the invention it was decided that such objects are those whose maximal amplitude is lower than 1% of the maximal amplitude present in the whole signal at a given time instant. A change in the signal at the level of −40 dB should not be audible.
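The 1% (−40 dB) criterion can be applied directly to the objects' maximal amplitudes; a sketch (using a per-object global maximum rather than the per-time-instant comparison described above, for brevity):

```python
def prune_insignificant(objects, threshold=0.01):
    # Keep only objects whose maximal amplitude reaches at least 1%
    # (-40 dB) of the largest maximal amplitude in the signal.
    peak = max(o["max_amp"] for o in objects)
    return [o for o in objects if o["max_amp"] >= threshold * peak]

objs = [{"id": 1, "max_amp": 1.0},
        {"id": 2, "max_amp": 0.05},
        {"id": 3, "max_amp": 0.002}]
kept = prune_insignificant(objs)   # object 3 falls below the threshold
```

Pruning these inaudible objects also reduces the size of the stored object record, as noted for FIG.19 below.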
- the correcting system generally removes all irregularities in the shape of sound objects; these operations can be classified as: joining of discontinuous objects, removal of objects' oscillations near adjacent ones, and removal of insignificant or interfering objects, i.e. those lasting too short a time or too weakly audible.
- FIG. 19a illustrates two channels and includes ca. 250000 samples (ca. 5.6 sec.) of the recording.
- FIG. 19b shows a spectrogram resulting from the operation of the filter bank 2 for the audio signal's left channel (upper plot in FIG.19a).
- On the left side of the spectrogram a piano keyboard has been shown as reference points defining the frequency.
- a staff with a bass clef and, above it, a staff with a treble clef have been marked.
- the horizontal axis of the spectrogram corresponds to time instants during the composition, while a darker colour in the spectrogram indicates a higher value of the filtered signal's amplitude.
- FIG. 19c shows the result of operation of the voting system 32. Comparing the spectrogram in FIG. 19b with the spectrogram in FIG.19c, it can be seen that the wide spots representing signal constituent elements have been replaced by distinct lines indicating the precise localisation of said constituent elements of the input signal.
- FIG.19d shows a cross-section of the spectrogram along the A-A line for the 149008th sample and presents the amplitude as a function of frequency.
- the vertical axis in the middle indicates the real and imaginary components and the amplitude of the spectrum.
- the vertical axis at the right side shows peaks of the voting signal, indicating the temporary localisation of audio signal constituent elements.
- FIG. 19e is a cross-section of the spectrogram along the B-B line at the frequency of 226.4 Hz.
- In FIG. 19f the sound objects are shown (without the operation of the correcting system 4).
- the vertical axis indicates the frequency, while the horizontal axis indicates time expressed by the number of the sample.
- To store these objects, ca. 9780 bytes are required.
- the audio signal in FIG. 19a, comprising 250000 samples in the left channel, requires 500000 bytes for direct storage; using the signal decomposition method and sound objects according to the invention thus leads to a compression at the level of 49.
- the use of correcting system 4 further improves the compression level, due to removal of objects having a negligible influence on the signal's sound.
- In FIG.19g the amplitudes of selected sound objects are shown, shaped with the use of the already determined characteristic points by means of smooth curves created from third order polynomials.
- only objects with an amplitude higher than 10% of the amplitude of the object with the highest amplitude are shown.
- the sound objects according to the invention have a number of properties enabling their multiple applications, in particular in processing, analysis and synthesis of sound signals.
- Sound objects can be acquired with the use of the method for signal decomposition according to the invention as a result of an audio signal decomposition. Sound objects can be also formed analytically, by defining values of parameters shown in FIG.14d.
- a sound object database can be formed from sounds taken from the surrounding environment or created artificially. Some advantageous properties of sound objects described by points having three coordinates are listed below:
- One of the parameters which describe sound objects is the time, thanks to which the objects can be shifted, shortened and lengthened in the time domain.
- a second parameter of sound objects is the frequency, thanks to which the objects can be shifted and modified in the frequency domain.
- a next parameter of sound objects is the amplitude, thanks to which envelopes of sound objects can be modified.
- Sound objects can be grouped, by selecting e.g. the ones present in the same time or/and the ones with frequencies being harmonics.
- Grouped objects can be separated from, or appended to, an audio signal. This makes it possible to create a new signal from a number of other signals or to split a single signal into a number of independent signals.
- Grouped objects can be amplified (by increasing their amplitude) or silenced (by decreasing their amplitude).
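The three parameters listed above translate directly into elementary transformations of an object's point list. A sketch over a hypothetical (time, frequency, amplitude) point representation:

```python
def shift_time(obj, dt):
    # Move the whole object along the time axis.
    return {**obj, "points": [(t + dt, f, a) for t, f, a in obj["points"]]}

def shift_pitch(obj, ratio):
    # Scale all frequencies, e.g. ratio=2.0 moves the object up an octave.
    return {**obj, "points": [(t, f * ratio, a) for t, f, a in obj["points"]]}

def amplify(objects, gain):
    # Amplify (gain > 1) or silence (gain < 1) a group of objects.
    return [{**o, "points": [(t, f, a * gain) for t, f, a in o["points"]]}
            for o in objects]

obj = {"points": [(0, 440.0, 1.0), (100, 442.0, 0.8)]}
octave_up = shift_pitch(obj, 2.0)
```

Because each transformation only rewrites point coordinates, grouped objects can be edited and recombined without resynthesizing the audio until the final step.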
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Quality & Reliability (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
- Stereophonic System (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PL406948A PL231399B1 (pl) | 2014-01-27 | 2014-01-27 | Sposób dekompozycji sygnału akustycznego na obiekty dźwiękowe |
PLPL406948 | 2014-01-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015111014A1 true WO2015111014A1 (en) | 2015-07-30 |
Family
ID=52598803
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2015/050572 WO2015111014A1 (en) | 2014-01-27 | 2015-01-26 | A method and a system for decomposition of acoustic signal into sound objects, a sound object and its use |
Country Status (2)
Country | Link |
---|---|
PL (1) | PL231399B1 (pl) |
WO (1) | WO2015111014A1 (pl) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106814670A (zh) * | 2017-03-22 | 2017-06-09 | 重庆高略联信智能技术有限公司 | 一种河道采砂智能监管方法及系统 |
CN107103895A (zh) * | 2017-03-29 | 2017-08-29 | 华东交通大学 | 一种钢琴弹奏音准的检测装置 |
CN107657956A (zh) * | 2017-10-23 | 2018-02-02 | 上海斐讯数据通信技术有限公司 | 一种多媒体设备语音控制系统及方法 |
WO2019229738A1 (en) * | 2018-05-29 | 2019-12-05 | Sound Object Technologies S.A. | System for decomposition of digital sound samples into sound objects |
CN110910895A (zh) * | 2019-08-29 | 2020-03-24 | 腾讯科技(深圳)有限公司 | 一种声音处理的方法、装置、设备和介质 |
CN111856399A (zh) * | 2019-04-26 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | 基于声音的定位识别方法、装置、电子设备及存储介质 |
CN113380258A (zh) * | 2021-04-29 | 2021-09-10 | 国网浙江省电力有限公司嘉兴供电公司 | 一种变电站故障判定声纹识别方法 |
CN117113065A (zh) * | 2023-10-24 | 2023-11-24 | 深圳波洛斯科技有限公司 | 一种基于声音检测的智能灯组数据管理系统及方法 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111640450A (zh) * | 2020-05-13 | 2020-09-08 | 广州国音智能科技有限公司 | 多人声音频处理方法、装置、设备及可读存储介质 |
CN115620706B (zh) * | 2022-11-07 | 2023-03-10 | 之江实验室 | 一种模型训练方法、装置、设备及存储介质 |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5214708A (en) | 1991-12-16 | 1993-05-25 | Mceachern Robert H | Speech information extractor |
-
2014
- 2014-01-27 PL PL406948A patent/PL231399B1/pl unknown
-
2015
- 2015-01-26 WO PCT/IB2015/050572 patent/WO2015111014A1/en active Application Filing
Non-Patent Citations (4)
Title |
---|
MATHIEU LAGRANGE: "Modélisation sinusoïdale des sons polyphoniques", 16 December 2004 (2004-12-16), pages 1 - 220, XP055186478, Retrieved from the Internet <URL:http://recherche.ircam.fr/equipes/analyse-synthese/lagrange/research/papers/lagrangePhd.pdf> [retrieved on 20150428] * |
MEINARD MULLER ET AL: "Signal Processing for Music Analysis", IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, IEEE, US, vol. 5, no. 6, 6 October 2011 (2011-10-06), pages 1088 - 1110, XP011386713, ISSN: 1932-4553, DOI: 10.1109/JSTSP.2011.2112333 * |
TERO TOLONEN: "Methods for Separation of Harmonic Sound Sources using Sinusoidal Modeling", 106TH CONVENTION AES, 8 May 1999 (1999-05-08), XP055186475, Retrieved from the Internet <URL:http://www.aes.org/e-lib/inst/download.cfm/8222.pdf?ID=8222> [retrieved on 20150428] * |
VIRTANEN T ET AL: "Separation of harmonic sound sources using sinusoidal modeling", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2000. ICASSP '00. PROCEEDING S. 2000 IEEE INTERNATIONAL CONFERENCE ON 5-9 JUNE 2000, PISCATAWAY, NJ, USA,IEEE, vol. 2, 5 June 2000 (2000-06-05), pages 765 - 768, XP010504835, ISBN: 978-0-7803-6293-2 * |
Also Published As
Publication number | Publication date |
---|---|
PL231399B1 (pl) | 2019-02-28 |
PL406948A1 (pl) | 2015-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10565970B2 (en) | Method and a system for decomposition of acoustic signal into sound objects, a sound object and its use | |
WO2015111014A1 (en) | A method and a system for decomposition of acoustic signal into sound objects, a sound object and its use | |
Serra et al. | Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition | |
EP2633524B1 (en) | Method, apparatus and machine-readable storage medium for decomposing a multichannel audio signal | |
Holzapfel et al. | Three dimensions of pitched instrument onset detection | |
Nakatani et al. | Robust and accurate fundamental frequency estimation based on dominant harmonic components | |
Ganapathy et al. | Robust feature extraction using modulation filtering of autoregressive models | |
Caetano et al. | Improved estimation of the amplitude envelope of time-domain signals using true envelope cepstral smoothing | |
JP2010055000A (ja) | 信号帯域拡張装置 | |
AU2014204540B1 (en) | Audio Signal Processing Methods and Systems | |
Yang et al. | BaNa: A noise resilient fundamental frequency detection algorithm for speech and music | |
US9305570B2 (en) | Systems, methods, apparatus, and computer-readable media for pitch trajectory analysis | |
Benetos et al. | Auditory spectrum-based pitched instrument onset detection | |
Martin et al. | Cepstral modulation ratio regression (CMRARE) parameters for audio signal analysis and classification | |
Zaunschirm et al. | A sub-band approach to modification of musical transients | |
Coyle et al. | Onset detection using comb filters | |
Meriem et al. | New front end based on multitaper and gammatone filters for robust speaker verification | |
JP3916834B2 (ja) | 雑音が付加された周期波形の基本周期あるいは基本周波数の抽出方法 | |
Dziubiński et al. | High accuracy and octave error immune pitch detection algorithms | |
Wolfel et al. | Warping and scaling of the minimum variance distortionless response | |
JP2011013383A (ja) | オーディオ信号補正装置及びオーディオ信号補正方法 | |
US9307320B2 (en) | Feedback suppression using phase enhanced frequency estimation | |
Danayi et al. | A novel algorithm based on time-frequency analysis for extracting melody from human whistling | |
Wilczyński et al. | Spectral features of the clarinet sound revealed by the set of stft-based parameters | |
CN115295014A (zh) | 一种提高拼音模糊匹配正确率的拼音相似度计算方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
DPE2 | Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15707806 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 15707806 Country of ref document: EP Kind code of ref document: A1 |