EP2840570A1 - Enhanced estimation of at least one target signal - Google Patents

Enhanced estimation of at least one target signal

Info

Publication number
EP2840570A1
Authority
EP
European Patent Office
Prior art keywords
signal
phase
amplitude
estimation
discrete
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP13181563.1A
Other languages
English (en)
French (fr)
Inventor
Pejman Mowlaee
Rahim Saeidi
Gernot Kubin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Technische Universitaet Graz
Original Assignee
Technische Universitaet Graz
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-08-23
Filing date
2013-08-23
Publication date
2015-02-25
Application filed by Technische Universitaet Graz
Priority to EP13181563.1A (EP2840570A1)
Priority to PCT/EP2014/067667 (WO2015024940A1)
Priority to EP14753072.9A (EP3036739A1)
Publication of EP2840570A1
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters

Definitions

  • The signal of interest can be any target signal included in the at least one discrete-time signal.
  • This approach according to the invention pushes the limits of the conventional methods by introducing interaction between the amplitude estimation and phase estimation stages.
  • The at least one discrete-time signal can be a biomedical, radar, image or video signal.
  • The complex time-frequency representation X can be either one- or multidimensional.
  • The matrix X is typically composed of frames as its rows and frequency bins as its columns (the number of rows is often larger than the number of columns).
  • For speech signals, it spans a wide dynamic range of values (80 dB).
  • The dynamic range is often much lower, as the signal is sparse in time-frequency.
  • The at least one discrete-time signal can be derived from a multi-channel signal.
  • Additional information provided by at least a second measurement device can be processed to give an extraordinarily accurate estimate of the at least one target signal.
  • The method according to the invention is also suited to estimating two or more target signals.
  • A typical approach to estimating the signal of interest s1(t) consists of transforming the continuous-time signal y(t) into a quantized discrete-time signal y(n) by applying an analog-to-digital converter 1 to the continuous-time signal y(t).
  • A signal estimation device 2 processes the discrete-time signal y(n) using a priori information to provide an estimate ŝ1(n) of at least the signal of interest. In the given example, an estimate ŝ2(n) of the signal representing noise is provided as well.
  • Fig. 3 shows exemplary state-of-the-art modifications of the modification stage of Fig. 2 (if not stated otherwise in the description of the figures, the same reference signs describe the same features).
  • The M discrete-time signals obtained from the M sensors are analyzed in block A, providing N x M samples in a complex format as described in Fig. 2.
  • In block 3, the amplitude part contained in the complex format of the samples is exploited (block 4 exploits the phase part contained in the complex format of the samples).
  • The samples are processed through amplitude or phase enhancement stages, wherein the amplitude enhancement stage is provided with a noise estimate, and are finally synthesized in block S to provide an enhanced signal, in particular an enhanced speech signal.
  • The modifications of the samples can be categorized into four different groups.
  • Fig. 4 shows an exemplary schematic block diagram of a variant of the invention.
  • Block A and block S represent the analysis and synthesis blocks as described in Fig. 2 and 3, wherein block A is provided with at least one discrete-time signal y(n).
  • The conventional enhancement block C represents any phase-unaware amplitude estimator or estimation method (that is, any amplitude estimation performed irrespective of the phase spectrum of at least the signal of interest s1(t)), which separates the signal of interest s1(n) from noise and/or other signals s2(n), for example by applying a frequency-dependent gain function (mask) to the observed noisy amplitude spectrum.
  • Examples of such gain functions are the Wiener filter (as a soft mask) and the binary mask.
  • The noise reduction achievable with conventional methods is limited because they modify only the amplitude or only the phase.
  • Block C and the block "New Enhancement" are each provided with a noise estimate.
  • Fig. 4 shows a block "stopping rule", which provides a criterion to stop the feedback loop.
  • The output of the block "New Enhancement" can be looped back as an input signal s_in(n) (which can be in complex format) for the block "New Enhancement" in a following iteration.
  • The block "New Enhancement" is described in more detail in Fig. 6.
  • Fig. 6 shows a schematic block diagram of the block "New Enhancement" according to the invention shown in Fig. 4 and 5.
  • Fig. 8 shows a schematic block diagram of a typical single-channel separation algorithm based on amplitude estimation on a complex spectrum of a noisy signal, described in appendix AP1, said amplitude estimation being phase-unaware.
  • A signal y comprises two signals s1 and s2 to be separated; the amplitude estimates of s1 and s2 are combined with the noisy phase of y to reconstruct estimates of the clean signals s1 and s2.
  • Fig. 9 shows a schematic block diagram of amplitude-aware phase estimation.
  • The signal reconstruction is provided with phase information corresponding to the signals s1 and s2, respectively.
  • A minimum mean square error (MMSE) phase estimation block is shown, which is provided with the amplitude estimates of s1 and s2 and the signal y, said phase estimation being amplitude-aware and providing phase estimates for s1 and s2.
  • Fig. 10 shows two schematic block diagrams of two different single-channel speech separation algorithms.
  • A typical method to estimate a clean speech amplitude X̂ (corresponding to the amplitude estimate of s1 in Fig. 8 and 9) is shown in (a), wherein the amplitude estimation (within the block "Gain function") is not provided with any phase information.
  • Phase-aware amplitude estimation and amplitude-aware phase estimation are not restricted to speech signals.
  • Phase-aware amplitude estimation and amplitude-aware phase estimation are applicable to a plurality of signals, and the speech signals described in appendix AP2 and AP1 just represent one utilization of phase-aware amplitude estimation and amplitude-aware phase estimation, respectively. Therefore, the invention is not limited to the examples given in this specification and can be adjusted in any manner known to a person skilled in the art.
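
The conventional, phase-unaware stage and the feedback loop with a stopping rule described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the patented method: the function names, the power-subtraction SNR estimate, the binary-mask threshold, and the relative-change stopping rule are all assumptions made for the example, and `refine` is a hypothetical placeholder for whatever enhancement step is looped (the record does not specify the internals of the "New Enhancement" block).

```python
import numpy as np

def wiener_gain(noisy_power, noise_power):
    """Soft mask G = SNR / (1 + SNR); the SNR is estimated by power subtraction (assumption)."""
    snr = np.maximum(noisy_power - noise_power, 0.0) / np.maximum(noise_power, 1e-12)
    return snr / (1.0 + snr)

def binary_mask(noisy_power, noise_power):
    """Hard mask: 1 where the estimated target power exceeds the noise power (assumption)."""
    return (noisy_power - noise_power > noise_power).astype(float)

def phase_unaware_enhance(Y, noise_power, gain_fn=wiener_gain):
    """Scale |Y| by a frequency-dependent gain and reuse the noisy phase of Y."""
    gain = gain_fn(np.abs(Y) ** 2, noise_power)
    amplitude = gain * np.abs(Y)
    return amplitude * np.exp(1j * np.angle(Y))

def feedback_loop(S0, refine, tol=1e-3, max_iter=20):
    """Feed the output of `refine` back as its input until the stopping rule fires."""
    S = S0
    for it in range(1, max_iter + 1):
        S_next = refine(S)
        change = np.linalg.norm(S_next - S) / max(np.linalg.norm(S), 1e-12)
        S = S_next
        if change < tol:  # assumed stopping rule: small relative change between iterations
            break
    return S, it

# Synthetic complex time-frequency matrix: frames as rows, frequency bins as columns.
rng = np.random.default_rng(0)
frames, bins = 50, 8
noise = (rng.standard_normal((frames, bins))
         + 1j * rng.standard_normal((frames, bins))) / np.sqrt(2)
target = np.zeros((frames, bins), dtype=complex)
target[:, 2] = 10.0 * np.exp(1j * 0.3 * np.arange(frames))  # target lives in bin 2
Y = target + noise

noise_power = np.ones(bins)  # noise estimate per bin, assumed known here
S_hat = phase_unaware_enhance(Y, noise_power)
S_bin = phase_unaware_enhance(Y, noise_power, gain_fn=binary_mask)
```

Because the gain is real and non-negative, `S_hat` keeps the phase of `Y` in every time-frequency cell; only the amplitude changes, which is exactly the limitation of the conventional stage that the feedback structure is meant to address. `feedback_loop` accepts any callable as `refine`, so a phase-aware amplitude estimator or an amplitude-aware phase estimator could be plugged in without changing the loop.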

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
EP13181563.1A 2013-08-23 2013-08-23 Enhanced estimation of at least one target signal Withdrawn EP2840570A1 (de)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP13181563.1A EP2840570A1 (de) 2013-08-23 2013-08-23 Enhanced estimation of at least one target signal
PCT/EP2014/067667 WO2015024940A1 (en) 2013-08-23 2014-08-19 Enhanced estimation of at least one target signal
EP14753072.9A EP3036739A1 (de) 2013-08-23 2014-08-19 Enhanced estimation of at least one target signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP13181563.1A EP2840570A1 (de) 2013-08-23 2013-08-23 Enhanced estimation of at least one target signal

Publications (1)

Publication Number Publication Date
EP2840570A1 (de) 2015-02-25

Family

ID=49115345

Family Applications (2)

Application Number Title Priority Date Filing Date
EP13181563.1A Withdrawn EP2840570A1 (de) 2013-08-23 2013-08-23 Enhanced estimation of at least one target signal
EP14753072.9A Withdrawn EP3036739A1 (de) 2013-08-23 2014-08-19 Enhanced estimation of at least one target signal

Family Applications After (1)

Application Number Title Priority Date Filing Date
EP14753072.9A Withdrawn EP3036739A1 (de) 2013-08-23 2014-08-19 Enhanced estimation of at least one target signal

Country Status (2)

Country Link
EP (2) EP2840570A1 (de)
WO (1) WO2015024940A1 (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113903355A (zh) * 2021-12-09 2022-01-07 北京世纪好未来教育科技有限公司 Voice acquisition method and apparatus, electronic device, and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7492814B1 (en) * 2005-06-09 2009-02-17 The U.S. Government As Represented By The Director Of The National Security Agency Method of removing noise and interference from signal using peak picking
US20090163168A1 (en) * 2005-04-26 2009-06-25 Aalborg Universitet Efficient initialization of iterative parameter estimation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090163168A1 (en) * 2005-04-26 2009-06-25 Aalborg Universitet Efficient initialization of iterative parameter estimation
US7492814B1 (en) * 2005-06-09 2009-02-17 The U.S. Government As Represented By The Director Of The National Security Agency Method of removing noise and interference from signal using peak picking

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
EPHRAIM Y ET AL: "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, IEEE INC. NEW YORK, USA, vol. ASSP-32, no. 6, 1 December 1984 (1984-12-01), pages 1109 - 1121, XP002435684, ISSN: 0096-3518, DOI: 10.1109/TASSP.1984.1164453 *
MOWLAEE P ET AL: "On phase importance in parameter estimation in single-channel speech enhancement", 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, PISCATAWAY, NJ, USA, 31 May 2013 (2013-05-31), pages 7462 - 7466, XP002717793, ISBN: 978-1-4799-0356-6, Retrieved from the Internet <URL:http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6639113> *
PEJMAN MOWLAEE ET AL: "Phase estimation for signal reconstruction in single-channel speech separation", INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, 31 January 2013 (2013-01-31), XP055092414 *
TIMO GERKMANN ET AL: "MMSE-Optimal Spectral Amplitude Estimation Given the STFT-Phase", IEEE SIGNAL PROCESSING LETTERS, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 20, no. 2, 1 February 2013 (2013-02-01), pages 129 - 132, XP011482926, ISSN: 1070-9908, DOI: 10.1109/LSP.2012.2233470 *

Also Published As

Publication number Publication date
EP3036739A1 (de) 2016-06-29
WO2015024940A1 (en) 2015-02-26

Similar Documents

Publication Publication Date Title
US9666183B2 (en) Deep neural net based filter prediction for audio event classification and extraction
JP3154487B2 (ja) Method for performing spectral estimation to improve noise robustness in speech recognition
DE112014003337T5 (de) Speech signal separation and synthesis based on auditory scene analysis and speech modeling
JP2005518118A (ja) Filter set for frequency analysis
Vincent et al. Estimation of LF glottal source parameters based on an ARX model.
Ganapathy Multivariate autoregressive spectrogram modeling for noisy speech recognition
Min et al. Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement
Dumortier et al. Blind RT60 estimation robust across room sizes and source distances
Do et al. Speech Separation in the Frequency Domain with Autoencoder.
Islam et al. Supervised single channel speech enhancement based on stationary wavelet transforms and non-negative matrix factorization with concatenated framing process and subband smooth ratio mask
Watanabe et al. Iterative sinusoidal-based partial phase reconstruction in single-channel source separation.
Agcaer et al. Optimization of amplitude modulation features for low-resource acoustic scene classification
EP2840570A1 (de) Enhanced estimation of at least one target signal
Yoshioka et al. Dereverberation by using time-variant nature of speech production system
CN107919136B (zh) Digital speech sampling frequency estimation method based on a Gaussian mixture model
Li et al. Multichannel identification and nonnegative equalization for dereverberation and noise reduction based on convolutive transfer function
Bavkar et al. PCA based single channel speech enhancement method for highly noisy environment
Do et al. A variational autoencoder approach for speech signal separation
Malek Blind compensation of memoryless nonlinear distortions in sparse signals
Rassem et al. Restoring the missing features of the corrupted speech using linear interpolation methods
CN110491408B (zh) Underdetermined blind separation method for mixed music signals based on sparse component analysis
Adrian et al. Synthesis of perceptually plausible multichannel noise signals controlled by real world statistical noise properties
Mallidi et al. Robust speaker recognition using spectro-temporal autoregressive models.
Saleem et al. Regularized sparse decomposition model for speech enhancement via convex distortion measure
JP6849978B2 (ja) Speech intelligibility calculation method, speech intelligibility calculation device, and speech intelligibility calculation program

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20130823

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20150826