EP1918911A1 - Time scale modification of an audio signal - Google Patents

Time scale modification of an audio signal

Info

Publication number
EP1918911A1
EP1918911A1 EP06123394A EP06123394A EP1918911A1 EP 1918911 A1 EP1918911 A1 EP 1918911A1 EP 06123394 A EP06123394 A EP 06123394A EP 06123394 A EP06123394 A EP 06123394A EP 1918911 A1 EP1918911 A1 EP 1918911A1
Authority
EP
European Patent Office
Prior art keywords
spectral
dominant
spectral lines
time windows
lines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP06123394A
Other languages
English (en)
French (fr)
Inventor
Thorsten Karrer
Eric Lee
Jan Oliver Dr. Borchers
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rheinisch-Westfälische Technische Hochschule RWTH
Original Assignee
Rheinisch-Westfälische Technische Hochschule RWTH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rheinisch-Westfälische Technische Hochschule RWTH
Priority to EP06123394A
Publication of EP1918911A1
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion

Definitions

  • One aim is to enhance the control of the speed of time based media in real time, while maintaining the integrity of the content.
  • Such speed control would inter alia allow for cue-play or slow-play without changing the pitch, timbre, micro timing, and transients of the audio signal.
  • Playback speed control has proved to be particularly difficult for certain types of media, especially audio signals.
  • The phase vocoder principle modifies the phase spectrum of subsequent spectral lines within signal blocks, namely time windows, to obtain a continuous signal curve when overlap-adding those blocks with altered time spacing. This transfers the distribution of frequency components that make up the sound over time to a finer or coarser time scale.
  • Although the phase vocoder principle changes the playback speed of pre-recorded audio data without changing its pitch, it introduces considerable artefacts, commonly referred to as "transient smearing" and "reverberation", limiting its use to recordings of instrumental or vocal music.
  • phase vocoder takes the principles of the human auditory system into account.
  • the human ear processes sound by performing a frequency-to-location transformation at the basilar membrane inside the inner ear. Sound perception begins with the sound vibrations being led from tympanum to the cochlea, where the basilar membrane starts to oscillate. These oscillations move along the membrane in a frequency dependent way: low frequencies move the membrane near the helicotrema, high frequencies near the oval window. Therefore, such a frequency-to-location transformation inside the ear is non-linear.
  • The object of the invention is achieved by a method comprising the steps: Receiving the audio signal. Partitioning the audio signal into a plurality of equally spaced, overlapping time windows having a first overlap amount. Partitioning each of the time windows into a plurality of frequency channels, thereby providing a spectral decomposition with spectral lines. Partitioning the frequency channels for each of the time windows into a plurality of spectral bands, each of the spectral bands having an extent depending upon human frequency perception. Identifying dominant spectral lines within each of the spectral bands based on the magnitude of the spectral lines, whereby the number of the dominant spectral lines in each of the spectral bands is equal to or larger than zero.
  • Identifying trajectories by analyzing the identified dominant spectral lines for each of the spectral bands for a subset of time windows comprising at least two consecutive time windows. Determining a phase difference for each of the identified dominant spectral lines lying on a trajectory by a phase vocoder principle using a phase relation of each of the dominant spectral lines lying on a trajectory in consecutive time windows. Determining a phase difference for each of a given number of spectral lines near each of the dominant spectral lines lying on a trajectory within each of the spectral bands as the phase difference of the corresponding dominant spectral line lying on a trajectory.
  • Such a method of time scale modification of an audio signal takes some principles of music and noise into account that can be used to provide an improved quality of time scale modification with phase locking.
  • the principles are called multiresolution peak-picking and trajectory heuristics according to the present invention.
  • the technique of multiresolution peak-picking addresses the above mentioned problem with the non-linear frequency-to-location transformation in the human ear.
  • For peak detection, not an equidistant spectrum but rather a band-based spectrum is used, in which multiple spectral lines are partitioned into a plurality of contiguous spectral bands, particularly based on a Bark-Scale or a Mel-Scale. This makes peak detection frequency dependent, as the width of the bands is frequency dependent.
  • the number of detected peaks in one spectral band is not limited to one, but can - depending on the magnitude of the spectral lines - be equal to or larger than zero, according to the present invention.
  • trajectory heuristics addresses the problems with scaled phase locking and uses the results of the multiresolution peak-picking technique setting the identified dominant spectral lines in temporal relationship. This makes it possible to decide whether a dominant spectral line is the continuation of a preceding dominant spectral line or not. Based on the identified trajectories a phase difference for each of such dominant spectral lines can then be determined by the phase vocoder principle. Trajectory heuristics according to the present invention only take the identified dominant spectral lines into account in order to identify associated trajectories. Such dominant spectral lines regularly constitute the main frequencies providing essentially sound and tone of the audio signal.
  • trajectory heuristics can also be called “sinusoidal trajectory heuristics” or “trajectories-of-sinusoids awareness” or “heuristics to regard the trajectories of sinusoids”.
  • the step of partitioning the audio signal into a plurality of equally spaced, overlapping time windows includes a smooth masking out of a desired part of the audio signal. This avoids artefacts biasing the spectrum when a desired part of the audio signal is cut out.
  • the step of partitioning each of the time windows into a plurality of frequency channels is performed by a filter bank of analogue and/or digital filters. This allows for an analogue implementation of the present invention.
  • the step of partitioning each of the time windows into a plurality of frequency channels is performed by a discrete Fourier transformation. This allows for a digital implementation of the present invention.
  • the number of spectral lines to be identified as dominant spectral lines within each of the spectral bands becomes smaller as the frequencies in the respective spectral band become higher. This identification further takes the principles of the human auditory system combined with the principles of music and speech into account.
  • the step of identifying trajectories includes the step of determination whether one of the dominant spectral lines with a first associated frequency channel in one of the spectral bands in one of the time windows is a continuation of a dominant spectral line with a second associated frequency channel in the same or another spectral band of the preceding time window, or whether the dominant spectral line in one of the spectral bands in one of the time windows is a new dominant spectral line and therefore not a continuation of a dominant spectral line in the same or another spectral band of the preceding time window.
  • the decision whether one dominant spectral line is a continuation of another dominant spectral line or not is one of the advantageous concepts of trajectory heuristics. It allows identifying new transitions in music or speech signals, e.g., new instruments and/or new cues.
  • This frequency dependent decision, whether one dominant spectral line is a continuation of another dominant spectral line or not, enhances the identification of new and old transitions in music or speech signals.
  • the step of synthesizing an audio signal from the plurality of modified equally spaced, overlapping time windows having a second overlap amount employing the determined phase differences for each of the spectral lines is performed by a bank of analogue and/or digital oscillators. This allows for an analogue implementation of the present invention.
  • the step of synthesizing an audio signal from the plurality of modified equally spaced, overlapping time windows having a second overlap amount employing the determined phase differences for each of the spectral lines is performed by an inverse discrete Fourier transformation. This allows for a digital implementation of the present invention.
  • phase relation is reset for all spectral lines when the dynamics in one of the time windows are below a given threshold for a given time.
  • the present invention provides a method of time scale modification of an audio signal and improves the perceived quality of frequency-domain time scale modification of audio signals by employing frequency dependent constraints under which spectral peaks are detected and phase-linked in a phase vocoder.
  • the phase vocoder principle is a well-known technique that allows for time stretching of an analogue and/or digital audio signal without changing its pitch, that is, without changing the frequencies of the signal.
  • the basic principle of a phase vocoder is illustrated in Fig. 1.
  • a signal - here shown with a single frequency to simplify matters - is partitioned into a plurality of equally spaced, overlapping time windows with a first overlap amount.
  • the frequency can be evaluated by means of analogue filter banks or a Fourier Transformation (e.g., a discrete Fast Fourier Transformation).
  • a phase difference can be determined with which the partitioned signal can be shifted.
  • a time stretched version of the audio signal is provided.
  • the second overlap has a ratio to the first overlap to achieve a desired time scale modification.
  • an audio signal is provided that is composed of a plurality of frequencies - which is the normal case if the audio signal is music or speech - this audio signal has to be partitioned into that plurality of frequency channels, thereby providing a spectral decomposition with spectral lines.
  • decomposition can be accomplished by a bank of heterodyning bandpass filters as shown in Fig. 2 if the invention is realized with analogue components, or by a Fast Fourier Transformation as shown in Fig. 3 if the invention is carried out within a digital implementation.
  • Synthesis from the spectral lines phase shifted by using the phase vocoder principle can be accomplished by a bank of oscillators as shown in Fig. 2 or by an inverse Fast Fourier Transformation as shown in Fig. 3.
  • Fig. 4 shows the principle of a preferred digital embodiment according to the present invention.
  • First the audio signal is windowed by partitioning it into a plurality of equally spaced windows having a first overlap amount.
  • a window is used that masks out the desired part of the audio signal smoothly.
  • Hanning or Hamming windows are known for instance.
  • the windowed audio signal is partitioned into a plurality of frequency channels as stated above.
  • the number of frequency channels used within the preferred embodiment of the present invention is 4096, spanning a frequency range from 0 to 22.05 kHz. This allows for a spectral decomposition into 4096 frequency channels, enough for modern high-quality digital audio streams since, according to Nyquist, the frequency analysis of a digital audio stream sampled at 44.1 kHz is limited to frequencies up to 22.05 kHz.
  • the frequency channels are partitioned for each of the time windows into a plurality of, preferably contiguous, spectral bands based on a Bark-Scale or a Mel-Scale, whereby each of the spectral bands has an extent depending upon human frequency perception.
  • Such partitioning of the frequency channels is a frequency-dependent and non-linear grouping of such frequency channels based on the anatomy of the human ear. The lower the frequency, the lower is the number of frequency channels in a spectral band.
  • the frequency channels are partitioned into seven spectral bands:
  • the first spectral band includes the frequency channels from 0 - 16 spanning the frequency range from 0 Hz - 178 Hz.
  • the second spectral band includes the frequency channels from 17 - 32 spanning the frequency range from 178 Hz - 351 Hz.
  • the third spectral band includes the frequency channels from 33 - 64 spanning the frequency range from 351 Hz - 695 Hz.
  • the fourth spectral band includes the frequency channels from 65 - 128 spanning the frequency range from 695 Hz - 1383 Hz.
  • the fifth spectral band includes the frequency channels from 129 - 256 spanning the frequency range from 1383 Hz - 2761 Hz.
  • the sixth spectral band includes the frequency channels from 257 - 512 spanning the frequency range from 2761 Hz - 5518 Hz.
  • the seventh spectral band includes the frequency channels from 513 - 4095 spanning the frequency range from 5518 Hz - 22.05 kHz.
  • the next step, the phase calculation, includes the principles of multiresolution peak-picking and trajectory heuristics according to the present invention.
  • dominant spectral lines within each of the spectral bands are identified based on the magnitude of the spectral lines, that is, the magnitude of their peaks, whereby the number of the dominant spectral lines in each of the spectral bands is not limited to 1 but can be equal to or larger than zero.
  • the number of spectral lines to be identified as dominant spectral lines within each of the spectral bands depends on the respective spectral band in which the dominant spectral lines are to be identified.
  • the identification of dominant spectral lines also becomes frequency dependent. More precisely, the number of spectral lines to be identified as dominant spectral lines within each of the spectral bands becomes smaller as the frequencies in the respective spectral band become higher. However, the number of dominant spectral lines can also be calculated in a non-deterministic way, whereby the probability of a spectral line being a dominant spectral line becomes lower as the respective spectral line lies within a spectral band spanning higher frequencies. Such frequency dependent peak-picking considers the fact that changes and transitions in music or speech are perceived as more dominant when they affect low frequencies.
  • Fig. 5 is an illustration showing the principle of multiresolution peak-picking.
  • a part of an audio signal is shown in a graph with the x axis representing frequency and the y axis representing magnitude.
  • the band below the x axis represents a plurality of spectral bands partitioning the frequencies, that is, the spectral lines.
  • the partitioning of the frequencies is based on the Bark-Scale with seven spectral bands becoming larger as the frequencies spanned become higher.
  • a number of dominant spectral lines is identified, the number of the dominant spectral lines being equal to or larger than zero, here larger than one.
  • this identification of trajectories includes the determination whether one of the dominant spectral lines with a first associated frequency channel in one of the spectral bands in one of the time windows is a continuation of a dominant spectral line with a second associated frequency channel in the same or another spectral band of the preceding time window, or whether the dominant spectral line in one of the spectral bands in one of the time windows is a new dominant spectral line and therefore not a continuation of a dominant spectral line in the same or another spectral band of the preceding time window.
  • a fixed trajectory jump distance can be used that defines the maximum number of frequency channels that can lie between the first dominant spectral line and the consecutive second dominant spectral line. If the actual number of frequency channels between the first and the second dominant spectral lines exceeds this number, the second dominant spectral line is assumed not to be a continuation of the first. Such separated dominant spectral lines then do not lie on a trajectory.
  • a phase difference for each of the identified dominant spectral lines lying on a trajectory is determined, depending on the time scaling factor, by the phase vocoder principle mentioned above, using the phase relation of each such dominant spectral line in consecutive time windows.
  • the phase difference of a given number of spectral lines near the dominant spectral lines lying on a trajectory within each of the spectral bands is determined as the phase difference of the corresponding dominant spectral line lying on a trajectory.
  • the given number of spectral lines that are assumed to be near a corresponding one of the dominant spectral lines lying on a trajectory can be a fixed number or can also be frequency and/or spectral band dependent.
  • the phase differences of all other spectral lines of the spectral bands, that is, lines that are neither dominant spectral lines lying on a trajectory nor near such dominant spectral lines, are determined by a phase vocoder principle, using a phase relation of each of the other spectral lines in consecutive time windows.
  • All the steps of the phase calculation can be performed by state of the art analogue and/or digital electronic devices or any other devices that are suitable to perform these steps. In the preferred embodiment of the present invention these steps are performed by digital equipment, such as a computer.
  • time scale modified, equally spaced, overlapping time windows having a second overlap amount are synthesized to a modified audio signal employing the determined phase differences for each of the spectral lines.
  • the second overlap is selected to have a ratio to the first overlap amount to achieve the desired time scale modification.
  • Such synthesis can be performed - as described above - by a bank of analogue and/or digital oscillators or in the preferred digital embodiment of the present invention by an inverse Fast Fourier Transformation.
  • the modified audio signal is then provided for further modifications, such as A/D converting or amplifying.
  • phase relation can be reset in the preferred embodiment of the present invention for all spectral lines when the dynamics in one of the time windows are below a given threshold for a given time.
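
The multiresolution peak-picking and trajectory steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the applicants' implementation: the band edges are the channel ranges listed above, while the per-band peak budget `MAX_PEAKS`, the magnitude floor `MIN_MAG`, the local-maximum test, the nearest-neighbour matching, and the reading of the trajectory jump distance as a maximum channel-index difference `max_jump` are all hypothetical choices made for this sketch.

```python
import numpy as np

# Channel ranges of the seven spectral bands as listed in the description.
BAND_EDGES = [(0, 16), (17, 32), (33, 64), (65, 128),
              (129, 256), (257, 512), (513, 4095)]
# Hypothetical per-band peak budget: fewer dominant lines at higher frequencies.
MAX_PEAKS = [4, 4, 3, 3, 2, 2, 1]
# Hypothetical magnitude floor below which a line is never called dominant.
MIN_MAG = 1e-3


def pick_peaks(mag, band_edges=BAND_EDGES, max_peaks=MAX_PEAKS, min_mag=MIN_MAG):
    """Return sorted channel indices of dominant lines (zero or more per band)."""
    peaks = []
    for (lo, hi), budget in zip(band_edges, max_peaks):
        # Candidate lines: local maxima of the magnitude spectrum inside the band.
        cands = [k for k in range(max(lo, 1), min(hi, len(mag) - 2) + 1)
                 if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]
                 and mag[k] >= min_mag]
        # Keep only the strongest candidates, up to the band's budget.
        cands.sort(key=lambda k: mag[k], reverse=True)
        peaks.extend(cands[:budget])
    return sorted(peaks)


def link_trajectories(prev_peaks, cur_peaks, max_jump=2):
    """Map each current peak to the previous peak it continues, or None if new."""
    links = {}
    for k in cur_peaks:
        if prev_peaks:
            nearest = min(prev_peaks, key=lambda p: abs(p - k))
            links[k] = nearest if abs(nearest - k) <= max_jump else None
        else:
            links[k] = None  # no preceding window: every peak starts a new line
    return links
```

A peak mapped to `None` is treated as a new dominant spectral line; a peak mapped to a channel index is treated as lying on a trajectory with that line of the preceding time window. A band with no candidate above the floor simply contributes zero peaks.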
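
The phase calculation described above can likewise be sketched in Python. The first function is the textbook phase-vocoder estimate of the per-channel phase advance, and the second copies the phase difference of each dominant line on a trajectory to a given number of neighbouring lines. The names `hop_a` and `hop_s` (analysis and synthesis window spacing, whose ratio is taken here as the time scaling factor), `n_near`, and the decision to ignore band boundaries when selecting neighbours are assumptions of this sketch rather than details stated in the application.

```python
import numpy as np


def princarg(x):
    """Wrap phase values to the interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi


def phase_increments(phi_prev, phi_cur, n_fft, hop_a, hop_s):
    """Per-channel synthesis phase increment from two consecutive analysis phases."""
    k = np.arange(len(phi_cur))
    omega = 2 * np.pi * k / n_fft  # nominal channel frequency in rad/sample
    # Deviation of the measured phase advance from the nominal advance.
    dev = princarg(phi_cur - phi_prev - omega * hop_a)
    # Estimated true frequency, advanced over the synthesis spacing.
    return (omega + dev / hop_a) * hop_s


def lock_to_peaks(incr, tracked_peaks, n_near=2):
    """Give the n_near lines on each side of a tracked peak the peak's increment."""
    locked = incr.copy()
    for p in tracked_peaks:
        lo, hi = max(p - n_near, 0), min(p + n_near, len(incr) - 1)
        locked[lo:hi + 1] = incr[p]
    # Each peak keeps its own increment even if a neighbouring peak overlaps it.
    for p in tracked_peaks:
        locked[p] = incr[p]
    return locked
```

Lines that are neither tracked peaks nor near one keep the increment computed by `phase_increments`, which corresponds to the plain phase vocoder treatment of all other spectral lines.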

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Auxiliary Devices For Music (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
EP06123394A 2006-11-02 2006-11-02 Time scale modification of an audio signal Withdrawn EP1918911A1 (de)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP06123394A EP1918911A1 (de) 2006-11-02 2006-11-02 Time scale modification of an audio signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP06123394A EP1918911A1 (de) 2006-11-02 2006-11-02 Time scale modification of an audio signal

Publications (1)

Publication Number Publication Date
EP1918911A1 (de) 2008-05-07

Family

ID=37813596

Family Applications (1)

Application Number Title Priority Date Filing Date
EP06123394A Withdrawn EP1918911A1 (de) 2006-11-02 2006-11-02 Time scale modification of an audio signal

Country Status (1)

Country Link
EP (1) EP1918911A1 (de)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010086194A3 (en) * 2009-01-30 2011-09-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for manipulating an audio signal comprising a transient event
JP2015508911A (ja) * 2012-02-27 2015-03-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Phase coherence control for harmonic signals in perceptual audio codecs
US9305570B2 (en) 2012-06-13 2016-04-05 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for pitch trajectory analysis
GB2537924A (en) * 2015-04-30 2016-11-02 Toshiba Res Europe Ltd A Speech Processing System and Method
US9508386B2 (en) * 2014-06-27 2016-11-29 Nokia Technologies Oy Method and apparatus for synchronizing audio and video signals
RU2611986C2 (ru) * 2010-03-11 2017-03-01 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window
RU2713094C1 (ru) * 2016-05-20 2020-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for processing a multichannel audio signal

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0766230A2 (de) * 1995-09-28 1997-04-02 Sony Corporation Method and apparatus for speech coding
US20050010397A1 (en) * 2002-11-15 2005-01-13 Atsuhiro Sakurai Phase locking method for frequency domain time scale modification based on a bark-scale spectral partition

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
HERMUS K ET AL: "Perceptual audio modeling with exponentially damped sinusoids", SIGNAL PROCESSING, ELSEVIER SCIENCE PUBLISHERS B.V. AMSTERDAM, NL, vol. 85, no. 1, January 2005 (2005-01-01), pages 163 - 176, XP004656865, ISSN: 0165-1684 *
JEAN LAROCHE ET AL: "Improved Phase Vocoder Time-Scale Modification of Audio", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 7, no. 3, May 1999 (1999-05-01), XP011054370, ISSN: 1063-6676 *
LEE E., KARRER T., BORCHERS J.: "PhaVoRIT: A Phase Vocoder for Real-Time Interactive Time-Stretching", PROCEEDINGS ICMC 2006 (INTERNATIONAL COMPUTER MUSIC CONFERENCE), 6 November 2006 (2006-11-06), XP002424910 *
LEE E., KARRER T., BORCHERS J.: "Toward a Framework for Interactive Systems to Conduct Digital Audio and Video Streams", COMPUTER MUSIC JOURNAL, vol. 30, no. 1, 9 February 2006 (2006-02-09), pages 21 - 36, XP002424909 *
LEVINE S N ET AL: "Multiresolution sinusoidal modeling for wideband audio with modifications", ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 1998. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON SEATTLE, WA, USA 12-15 MAY 1998, NEW YORK, NY, USA,IEEE, US, vol. 6, 12 May 1998 (1998-05-12), pages 3585 - 3588, XP010279556, ISBN: 0-7803-4428-6 *
MCAULAY R J ET AL: "SPEECH ANALYSIS/SYNTHESIS BASED ON A SINUSOIDAL REPRESENTATION", IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, IEEE INC. NEW YORK, US, vol. ASSP-34, no. 4, August 1986 (1986-08-01), pages 744 - 754, XP001002928, ISSN: 0096-3518 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9230557B2 (en) 2009-01-30 2016-01-05 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus, method and computer program for manipulating an audio signal comprising a transient event
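CN102341847A (zh) * 2009-01-30 2012-02-01 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for manipulating an audio signal comprising a transient event
CN102341847B (zh) * 2009-01-30 2014-01-08 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for manipulating an audio signal comprising a transient event
RU2543309C2 (ru) * 2009-01-30 2015-02-27 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for manipulating an audio signal comprising a transient event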
WO2010086194A3 (en) * 2009-01-30 2011-09-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer program for manipulating an audio signal comprising a transient event
RU2611986C2 (ru) * 2010-03-11 2017-03-01 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Signal processor, window provider, encoded media signal, method for processing a signal and method for providing a window
JP2015508911A (ja) * 2012-02-27 2015-03-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Phase coherence control for harmonic signals in perceptual audio codecs
US10818304B2 (en) 2012-02-27 2020-10-27 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Phase coherence control for harmonic signals in perceptual audio codecs
US9305570B2 (en) 2012-06-13 2016-04-05 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for pitch trajectory analysis
US9508386B2 (en) * 2014-06-27 2016-11-29 Nokia Technologies Oy Method and apparatus for synchronizing audio and video signals
GB2537924A (en) * 2015-04-30 2016-11-02 Toshiba Res Europe Ltd A Speech Processing System and Method
GB2537924B (en) * 2015-04-30 2018-12-05 Toshiba Res Europe Limited A Speech Processing System and Method
RU2713094C1 (ru) * 2016-05-20 2020-02-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for processing a multichannel audio signal
US11929089B2 (en) 2016-05-20 2024-03-12 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for processing a multichannel audio signal

Similar Documents

Publication Publication Date Title
Masri et al. Improved Modelling of Attack Transients in Music Analysis-Resynthesis
JP4906230B2 (ja) Method for time aligning audio signals using characterizations based on auditory events
EP1918911A1 (de) Time scale modification of an audio signal
Böck et al. Maximum filter vibrato suppression for onset detection
EP1393300B1 (de) Segmenting audio signals into auditory events
AU2002252143A1 (en) Segmenting audio signals into auditory events
JP3033061B2 (ja) Voice/noise separation device
WO2015092492A1 (en) Audio information processing
Kawamura et al. Differentiable digital signal processing mixture model for synthesis parameter extraction from mixture of harmonic sounds
JP5694745B2 (ja) Concealment signal generation device, concealment signal generation method, and concealment signal generation program
US9240196B2 (en) Apparatus and method for handling transient sound events in audio signals when changing the replay speed or pitch
Quatieri et al. A subband approach to time-scale expansion of complex acoustic signals
Rai et al. Analysis of three pitch-shifting algorithms for different musical instruments
Thirumuru et al. Improved vowel region detection from a continuous speech using post processing of vowel onset points and vowel end-points
Barbedo et al. A robust and computationally efficient speech/music discriminator
de León et al. Blind separation of overlapping partials in harmonic musical notes using amplitude and phase reconstruction
Zivanovic Harmonic bandwidth companding for separation of overlapping harmonics in pitched signals
JP4468506B2 (ja) Voice data creation device and voice quality conversion method
Barbedo et al. Empirical methods to determine the number of sources in single-channel musical signals
JP2001027895A (ja) Signal separation method and apparatus
Gupta et al. Lyrics-to-audio alignment with music-aware acoustic models
Esquef et al. Spectral-based analysis and synthesis of audio signals
Müller et al. Music signal processing
Jain et al. Feature extraction techniques based on human auditory system
Aczél et al. Sound separation of polyphonic music using instrument prints

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR MK RS

AKX Designation fees paid
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20081110

REG Reference to a national code

Ref country code: DE

Ref legal event code: 8566