EP2840570A1 - Enhanced estimation of at least one target signal - Google Patents
Enhanced estimation of at least one target signal
- Publication number
- EP2840570A1 (application EP13181563.1A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- phase
- amplitude
- estimation
- discrete
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
Definitions
- the signal of interest can be any target signal included in the at least one discrete-time signal.
- This approach according to the invention pushes the limits of the conventional methods by introducing interaction between amplitude estimation and phase estimation stages.
- the at least one discrete-time signal can be a bio-medical, radar, image or video signal.
- the complex time-frequency representation X can be either one- or multidimensional.
- the matrix X is typically composed of frames as its rows and frequency bins as its columns (the number of rows is often larger than the number of columns).
- for speech signals, X spans a wide dynamic range of values (80 dB).
- the dynamic range is often much lower as the signal is sparse in time-frequency.
- the at least one discrete-time signal can be derived from a multi channel signal.
- Additional information provided by at least a second measurement device can be processed to give an extraordinarily accurate estimation of the at least one target signal.
- the method according to the invention is also suited to estimate two or more target signals.
- a typical approach to estimate the signal of interest s1(t) consists of transforming the continuous-time signal y(t) into a quantized discrete-time signal y(n) by applying an analog-to-digital converter 1 to the continuous-time signal y(t).
- a signal estimation device 2 processes the discrete-time signal y(n) using a priori information to provide an estimate ŝ1(n) of at least the signal of interest. In the given example, an estimate ŝ2(n) of the signal representing noise is provided as well.
- Fig. 3 shows exemplary state of the art modifications of the modification stage of Fig. 2 (if not stated otherwise in the description of the figures, same reference signs describe same features).
- the M discrete-time signals obtained from the M sensors are analyzed in block A, providing N x M samples in a complex format as described in Fig. 2.
- in block 3 the amplitude part contained in the complex format of the samples is exploited (block 4 exploits the phase part contained in the complex format of the samples).
- the samples are processed through amplitude or phase enhancement stages, wherein the amplitude enhancement stage is provided with a noise estimate, and finally synthesized in block S to provide an enhanced signal, in particular an enhanced speech signal.
- the modifications of the samples can be categorized in four different groups.
- FIG. 4 shows an exemplary schematic block-diagram of a variant of the invention.
- Block A and block S represent analysis and synthesis blocks as described in Fig. 2 and 3 , wherein block A is provided with at least one discrete-time signal y(n).
- the conventional enhancement block C represents any phase-unaware amplitude estimator or phase-unaware amplitude estimation method (or any amplitude estimation method performed irrespective of the phase spectrum of at least the signal of interest s1(t)), which separates the signal of interest s1(n) from noise and/or other signals s2(n), for example by applying a frequency-dependent gain function (mask) to the observed noisy amplitude spectrum.
- examples of gain functions are the Wiener filter (as a soft mask) and the binary mask.
- Noise reduction capability obtained by conventional methods is limited since they only modify the amplitude or phase individually.
- the block C and the block "New Enhancement" are provided with a noise estimate.
- Fig. 4 shows a block "stopping rule", which provides a criterion to stop the feedback loop.
- the output of the block "New Enhancement" can be looped back as an input signal s_in(n) (the input signal s_in(n) can be in complex format) for the block "New Enhancement" in a following iteration.
- the block “New Enhancement” is described in more detail in Fig. 6 .
- Fig. 6 shows a schematic block-diagram of the block "New Enhancement” according to the invention shown in Fig. 4 and 5 .
- Fig. 8 shows a schematic block-diagram of a typical single-channel separation algorithm based on amplitude estimation on a complex spectrum of a noisy signal described in appendix AP1, said amplitude estimation being performed phase-unaware.
- a signal y comprises two signals s1 and s2 to be separated, wherein the amplitude estimates of s1 and s2 are combined with the noisy phase of the signal y to reconstruct estimates of the clean signals s1 and s2.
- Fig. 9 shows a schematic block-diagram of amplitude-aware phase estimation.
- the signal reconstruction is provided with phase information corresponding to the signals s1 and s2 respectively.
- A minimum mean square error (MMSE) phase estimation block is shown, which is provided with the amplitude estimates of s1 and s2 and the signal y, said phase estimation being amplitude-aware and providing phase estimates for s1 and s2.
- Fig. 10 shows two schematic block-diagrams of two different single-channel speech separation algorithms.
- a typical method to estimate a clean speech amplitude X̂ (corresponding to the amplitude estimate of the first signal in Fig. 8 and 9) is shown in (a), wherein the amplitude estimation (within the block "Gain function") is not provided with any phase information.
- phase-aware amplitude estimation and amplitude-aware phase estimation do not relate to speech signals only.
- phase-aware amplitude estimation and amplitude-aware phase estimation are applicable to a plurality of signals, and the speech signals described in appendix AP2 and AP1 just represent one utilization of phase-aware amplitude estimation and amplitude-aware phase estimation, respectively. Therefore, the invention is not limited to the examples given in this specification and can be adjusted in any manner known to a person skilled in the art.
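
The processing chain outlined in the definitions above (analysis block A, a phase-unaware gain function in block C, a feedback loop with a stopping rule, and synthesis block S) can be illustrated with a minimal Python/NumPy sketch. It is not the method of the application: all names (`FRAME`, `HOP`, `analyze`, `synthesize`, `wiener_gain`, `enhance`) are hypothetical, the noise power spectrum is assumed to be known, and the phase update inside the loop is a Griffin-Lim-style consistency projection used only as a stand-in for an amplitude-aware phase estimator.

```python
import numpy as np

FRAME, HOP = 256, 128  # assumed frame length and hop size (50 % overlap)
# Square-root periodic Hann: analysis times synthesis window overlap-adds to 1.
WINDOW = np.sqrt(np.hanning(FRAME + 1)[:-1])

def analyze(y):
    """Block A: complex time-frequency matrix, frames as rows, bins as columns."""
    n_frames = 1 + (len(y) - FRAME) // HOP
    frames = np.stack([y[i * HOP:i * HOP + FRAME] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def synthesize(X):
    """Block S: inverse transform of each frame followed by overlap-add."""
    frames = np.fft.irfft(X, n=FRAME, axis=1) * WINDOW
    out = np.zeros(HOP * (X.shape[0] - 1) + FRAME)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += frame
    return out

def wiener_gain(Y, noise_psd):
    """Block C: phase-unaware soft mask G = max(1 - noise_psd / |Y|^2, 0)."""
    power = np.maximum(np.abs(Y) ** 2, 1e-12)
    return np.maximum(1.0 - noise_psd / power, 0.0)

def enhance(y, noise_psd, max_iter=20, tol=1e-3):
    """Amplitude estimate first, then iterate a phase update until it settles."""
    Y = analyze(y)
    amp = wiener_gain(Y, noise_psd) * np.abs(Y)
    S = amp * np.exp(1j * np.angle(Y))  # conventional start: reuse the noisy phase
    for _ in range(max_iter):
        # Stand-in phase update: keep the amplitude estimate, take the phase
        # of the re-analyzed time-domain signal (consistency projection).
        phase = np.angle(analyze(synthesize(S)))
        S_new = amp * np.exp(1j * phase)
        change = np.linalg.norm(S_new - S) / max(np.linalg.norm(S), 1e-12)
        S = S_new
        if change < tol:  # stopping rule: relative change between iterations
            break
    return synthesize(S)
```

A call such as `enhance(y, noise_psd)` expects a one-dimensional array `y` of at least `FRAME` samples and a noise power estimate that is either a scalar or an array with one value per frequency bin. The relative-change threshold is only one possible stopping criterion; a fixed iteration count would serve equally well in this sketch.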
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13181563.1A EP2840570A1 (de) | 2013-08-23 | 2013-08-23 | Verbesserte Schätzung von mindestens einem Zielsignal |
PCT/EP2014/067667 WO2015024940A1 (en) | 2013-08-23 | 2014-08-19 | Enhanced estimation of at least one target signal |
EP14753072.9A EP3036739A1 (de) | 2013-08-23 | 2014-08-19 | Verbesserte schätzung von mindestens einem zielsignal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13181563.1A EP2840570A1 (de) | 2013-08-23 | 2013-08-23 | Verbesserte Schätzung von mindestens einem Zielsignal |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2840570A1 (de) | 2015-02-25 |
Family
ID=49115345
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP13181563.1A Withdrawn EP2840570A1 (de) | 2013-08-23 | 2013-08-23 | Verbesserte Schätzung von mindestens einem Zielsignal |
EP14753072.9A Withdrawn EP3036739A1 (de) | 2013-08-23 | 2014-08-19 | Verbesserte schätzung von mindestens einem zielsignal |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP14753072.9A Withdrawn EP3036739A1 (de) | 2013-08-23 | 2014-08-19 | Verbesserte schätzung von mindestens einem zielsignal |
Country Status (2)
Country | Link |
---|---|
EP (2) | EP2840570A1 (de) |
WO (1) | WO2015024940A1 (de) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113903355A (zh) * | 2021-12-09 | 2022-01-07 | 北京世纪好未来教育科技有限公司 | 语音获取方法、装置、电子设备及存储介质 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7492814B1 (en) * | 2005-06-09 | 2009-02-17 | The U.S. Government As Represented By The Director Of The National Security Agency | Method of removing noise and interference from signal using peak picking |
US20090163168A1 (en) * | 2005-04-26 | 2009-06-25 | Aalborg Universitet | Efficient initialization of iterative parameter estimation |
-
2013
- 2013-08-23 EP EP13181563.1A patent/EP2840570A1/de not_active Withdrawn
-
2014
- 2014-08-19 EP EP14753072.9A patent/EP3036739A1/de not_active Withdrawn
- 2014-08-19 WO PCT/EP2014/067667 patent/WO2015024940A1/en active Application Filing
Non-Patent Citations (4)
Title |
---|
EPHRAIM Y ET AL: "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator", IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, IEEE INC. NEW YORK, USA, vol. ASSP-32, no. 6, 1 December 1984 (1984-12-01), pages 1109 - 1121, XP002435684, ISSN: 0096-3518, DOI: 10.1109/TASSP.1984.1164453 * |
MOWLAEE P ET AL: "On phase importance in parameter estimation in single-channel speech enhancement", 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) IEEE PISCATAWAY, NJ, USA, 31 May 2013 (2013-05-31) - 31 May 2013 (2013-05-31), pages 7462 - 7466, XP002717793, ISBN: 978-1-4799-0356-6, Retrieved from the Internet <URL:http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6639113> * |
PEJMAN MOWLAEE ET AL: "Phase estimation for signal reconstruction in single-channel speech separation", INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, 31 January 2013 (2013-01-31), XP055092414 * |
TIMO GERKMANN ET AL: "MMSE-Optimal Spectral Amplitude Estimation Given the STFT-Phase", IEEE SIGNAL PROCESSING LETTERS, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 20, no. 2, 1 February 2013 (2013-02-01), pages 129 - 132, XP011482926, ISSN: 1070-9908, DOI: 10.1109/LSP.2012.2233470 * |
Also Published As
Publication number | Publication date |
---|---|
EP3036739A1 (de) | 2016-06-29 |
WO2015024940A1 (en) | 2015-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9666183B2 (en) | Deep neural net based filter prediction for audio event classification and extraction | |
JP3154487B2 (ja) | 音声認識の際の雑音のロバストネスを改善するためにスペクトル的推定を行う方法 | |
DE112014003337T5 (de) | Sprachsignaltrennung und Synthese basierend auf auditorischer Szenenanalyse und Sprachmodellierung | |
JP2005518118A (ja) | 周波数解析のためのフィルタセット | |
Vincent et al. | Estimation of LF glottal source parameters based on an ARX model. | |
Ganapathy | Multivariate autoregressive spectrogram modeling for noisy speech recognition | |
Min et al. | Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement | |
Dumortier et al. | Blind RT60 estimation robust across room sizes and source distances | |
Do et al. | Speech Separation in the Frequency Domain with Autoencoder. | |
Islam et al. | Supervised single channel speech enhancement based on stationary wavelet transforms and non-negative matrix factorization with concatenated framing process and subband smooth ratio mask | |
Watanabe et al. | Iterative sinusoidal-based partial phase reconstruction in single-channel source separation. | |
Agcaer et al. | Optimization of amplitude modulation features for low-resource acoustic scene classification | |
EP2840570A1 (de) | Verbesserte Schätzung von mindestens einem Zielsignal | |
Yoshioka et al. | Dereverberation by using time-variant nature of speech production system | |
CN107919136B (zh) | 一种基于高斯混合模型的数字语音采样频率估计方法 | |
Li et al. | Multichannel identification and nonnegative equalization for dereverberation and noise reduction based on convolutive transfer function | |
Bavkar et al. | PCA based single channel speech enhancement method for highly noisy environment | |
Do et al. | A variational autoencoder approach for speech signal separation | |
Malek | Blind compensation of memoryless nonlinear distortions in sparse signals | |
Rassem et al. | Restoring the missing features of the corrupted speech using linear interpolation methods | |
CN110491408B (zh) | 一种基于稀疏元分析的音乐信号欠定混叠盲分离方法 | |
Adrian et al. | Synthesis of perceptually plausible multichannel noise signals controlled by real world statistical noise properties | |
Mallidi et al. | Robust speaker recognition using spectro-temporal autoregressive models. | |
Saleem et al. | Regularized sparse decomposition model for speech enhancement via convex distortion measure | |
JP6849978B2 (ja) | 音声明瞭度計算方法、音声明瞭度計算装置及び音声明瞭度計算プログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20130823 |
| AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| AX | Request for extension of the european patent | Extension state: BA ME |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 20150826 |