EP2312579A1 - Speech from noise separation with reference information - Google Patents


Info

Publication number
EP2312579A1
Authority
EP
European Patent Office
Prior art keywords
signal
mixture
reference signal
cues
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP09173163A
Other languages
German (de)
English (en)
French (fr)
Inventor
Martin Heckmann
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honda Research Institute Europe GmbH
Original Assignee
Honda Research Institute Europe GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honda Research Institute Europe GmbH filed Critical Honda Research Institute Europe GmbH
Priority to EP09173163A priority Critical patent/EP2312579A1/en
Priority to JP2010182876A priority patent/JP5377442B2/ja
Publication of EP2312579A1 publication Critical patent/EP2312579A1/en
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02085 Periodic noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal

Definitions

  • the invention generally relates to the processing of acoustically sensed signals.
  • the present invention relates to a system and a method for separating a mixture signal containing a mixture of acoustical target information ("speech") and interfering information ("noise").
  • the European patent application EP 1 879 180 A1 shows a method to reduce the background noise in speech signals with the help of a reference microphone.
  • the main idea behind this method is to estimate the spectrum of the noise based on a reference microphone which captures only the noise and then subtract these spectral components from the microphone signal which captures the mixture of the speech and noise signal.
  • the main disadvantage of this method, however, is that the acoustic environment where the noise is captured normally differs from that where the mixture of the noise and the speech signal is captured (e.g. the engine compartment and the passenger compartment if one wants to reduce the engine noise in a car).
  • the present invention relates to a technique for reducing noise in a mixture signal containing a mixture of noise and speech by means of additional reference information captured e.g. by a second microphone.
  • the present invention does not try to reduce the noise directly in the signal domain but uses techniques inspired by Computational Auditory Scene Analysis (CASA) to reduce the noise.
  • a system for the separation of a mixture signal containing a mixture of target information and interfering information comprises means for receiving the mixture signal, means for receiving a reference signal and a signal processing unit configured to extract cues from the reference signal and to separate the target information from the mixture signal using these cues.
  • a method for separating a mixture signal containing a mixture of target information and interfering information comprises the steps of receiving the mixture signal, receiving a reference signal and extracting cues from the reference signal and separating the target information from the mixture signal using these cues.
  • the means for receiving the mixture signal and the means for receiving the reference signal may each comprise a microphone and a recording unit. When the means for receiving the reference signal are configured to receive interfering information, the microphone for the mixture signal may be positioned close to the origin of the target information and the microphone for the reference signal close to the origin of the interfering information.
  • the interfering information can also be extracted from the speed of an engine.
  • the means for receiving the reference signal may be configured to receive target information, wherein this information can be extracted from a video signal, in particular from the movement of a speaker's body or the speaker's lip movement in the video signal.
  • the signal processing unit of the system may comprise means for splitting the reference signal and the mixture signal into a multitude of frequency channels, means for extracting grouping cues from the reference signal and evaluating the grouping cues in the mixture signal for each frequency channel at each instant in time, and means for allocating each frequency channel of the mixture signal at each instant in time to either the target information or the interfering information, thereby separating the mixture signal into the target information and the interfering information.
  • alternatively, the signal processing unit of the system comprises means for splitting the mixture signal into a multitude of frequency channels, means for extracting grouping cues from the reference signal and evaluating the grouping cues in the mixture signal at each instant in time, and means for allocating each frequency channel of the mixture signal at each instant in time to either the target information or the interfering information, thereby separating the mixture signal into the target information and the interfering information.
  • the grouping cues may be the fundamental frequency or on- or off-sets.
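The fundamental-frequency grouping cue mentioned above could, for instance, be estimated from the (mostly undistorted) reference signal with a simple autocorrelation pitch tracker. The sketch below is illustrative only and is not the patent's implementation; the function name, search range, and test signal are assumptions.

```python
import math

def estimate_f0(samples, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency via autocorrelation: the lag
    that maximises the autocorrelation corresponds to one pitch period."""
    n = len(samples)
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag

# Synthetic "engine-like" reference: a 100 Hz tone plus its second
# harmonic, sampled at 8 kHz (values chosen purely for illustration).
sr = 8000
ref = [math.sin(2 * math.pi * 100 * t / sr)
       + 0.5 * math.sin(2 * math.pi * 200 * t / sr)
       for t in range(sr // 10)]
f0 = estimate_f0(ref, sr)
```

Because the reference microphone sits close to the noise source, such a direct pitch estimate can be made on a nearly clean signal, which is exactly why the cues are extracted there rather than in the mixture.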
  • the target information may be speech and the interfering information may be noise.
  • the system for separating a mixture signal may be included in a motorcycle helmet, wherein the means for receiving the mixture signal are positioned inside the helmet and the means for receiving the reference signal are positioned partly inside the helmet and partly close to the engine of a motorcycle, the parts being connected via a cable or wirelessly.
  • Fig. 1 shows a motorcyclist 7 driving a motorcycle 6 and wearing a helmet 4.
  • the helmet 4 includes a system according to the invention.
  • the system comprises a signal processing unit 1, means for receiving a mixture signal, here a microphone 2, and means for receiving a reference signal, here a microphone 3a, and a receiving unit 3b.
  • the microphone 2 for receiving the mixture signal and the receiving unit 3b are connected to the signal processing unit 1 via a cable.
  • the microphone 3a for receiving the reference signal is, however, not positioned in the helmet 4, but close to the engine 5 of the motorcycle 6 to be at the origin of the interfering signal, which in the shown example may be the harmonic noise generated by the engine 5.
  • the transmission of the reference signal of the microphone 3a to the receiving unit 3b can be for example accomplished via a wireless transmission.
  • the microphone 2 for receiving the mixture signal is positioned to the front of the helmet 4 close to the mouth of the motorcyclist 7.
  • the microphone 2 is therefore positioned close to the origin of the target signal, here the acoustically sensed speech signal of the motorcyclist 7.
  • the microphone 2 also receives noise of the engine 5, due to the fact that the engine 5 of the motorcycle 6 is quite loud and the engine noise is only slightly attenuated by the helmet 4. Therefore the mixture signal received by the microphone 2 contains a mixture of speech and noise.
  • the signal processing unit 1 is configured to extract cues from the reference signal received by the microphone 3a and to separate the speech from the mixture signal received by the microphone 2 using the cues. A detailed description of the extraction and separation is given in combination with the method and Fig. 3.
  • the system according to the invention is therefore able to significantly reduce the engine noise in the mixture signal. As a result of this reduction, telecommunication while riding is improved.
  • another application area for the system according to the invention is shown in Fig. 2, where a car 8 is shown including a system similar to that in Fig. 1.
  • the signal processing unit 1 and the microphone 2 for receiving the mixture signal are not positioned in a helmet, but inside the car 8.
  • the microphone 2 for receiving the mixture signal is positioned in the passenger compartment to be near to the mouth of the driver 7.
  • the microphone 3a for receiving the reference signal is again positioned close to the engine 5.
  • the signal processing unit 1 can be positioned anywhere in the car and has connections to the microphone 2 for receiving the mixture signal and the microphone 3a for receiving the reference signal. Therefore a receiving unit 3b is not needed here.
  • the system according to the invention improves not only headset-free telecommunication but also speech-based operation of devices in a car. The reduction of the harmonic noise generated by the engine is particularly helpful here.
  • Fig. 3 shows a method according to an embodiment of the invention. At the beginning an acoustically sensed mixture signal and a reference signal are received (100, 101).
  • the reference signal is preferably sensed directly at the origin of the noise (e.g. an engine, a fan, ...). In an ideal setup, only the reference signal without any additional signals would be sensed, such that the reference signal is available without distortions. This is best achieved by sensing the reference signal close to its source.
  • the target signal will also preferably be sensed close to its source.
  • in the case of a speech signal of the driver of a car or a motorcycle, sensing close to the driver's mouth would be best.
  • in practice, however, the target signal is commonly sensed at a certain distance from its source.
  • this mixture signal would be a mixture of noise generated by the engine and a speech signal of the driver.
  • the mixture signal and reference signal can for example be sensed by microphones.
  • both signals are split into a multitude of adjacent frequency channels (102, 103).
  • auditory scene analysis cues are extracted from the reference signal ("noise signal") (105).
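The splitting into adjacent frequency channels could be sketched, for example, by grouping DFT bins into equally wide bands. The patent does not prescribe a particular filterbank, so the channel layout below is purely an assumption for illustration.

```python
import cmath
import math

def split_into_channels(frame, num_channels):
    """Split one signal frame into adjacent frequency channels by
    grouping DFT bins into equally wide bands; returns the summed
    spectral magnitude per channel."""
    n = len(frame)
    mags = []
    for k in range(n // 2):  # real input: upper half of spectrum is redundant
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    per_band = len(mags) // num_channels
    return [sum(mags[c * per_band:(c + 1) * per_band])
            for c in range(num_channels)]

# A 64-sample frame of a low-frequency sine: its energy should land
# almost entirely in the lowest of four channels.
frame = [math.sin(2 * math.pi * 2 * t / 64) for t in range(64)]
bands = split_into_channels(frame, 4)
```

A practical system would more likely use an auditory (e.g. gammatone) filterbank, as is common in CASA work, but the band-energy idea is the same.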
  • These cues, which are typically used in Computational Auditory Scene Analysis (CASA) systems for the separation of sources, may for example include the fundamental frequency and on- or off-sets.
  • These auditory cues provide information on the reference signal. Knowing these cues allows the reference signal to be identified in the mixture signal. To do so, the cues are extracted from the reference signal, where the reference signal is mostly undistorted and the cues can easily be extracted, and then evaluated in the mixture signal. As a result of this evaluation, parts, i.e. frequency channels at each instant in time, can be identified in which the reference signal dominates the mixture signal.
  • to extract these cues, the reference signal is transformed into the frequency domain and the cues are determined for each frequency channel at each instant in time.
  • for some cues, e.g. the fundamental frequency, it is also possible to extract them directly in the time domain and then calculate their effect on the different frequency channels (in the case of the fundamental frequency, signal parts will be present at the fundamental frequency and at its harmonics, which can easily be calculated from the fundamental frequency).
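The harmonic calculation described above can be sketched directly: given a fundamental frequency estimated in the time domain, the frequency channels affected by its harmonics follow from the channel layout. The filterbank centres and bandwidth below are illustrative assumptions, not values from the patent.

```python
def harmonic_channels(f0, centers, bandwidth):
    """Return indices of frequency channels whose centre lies within
    half a bandwidth of some harmonic k * f0 of the fundamental."""
    flagged = []
    for i, fc in enumerate(centers):
        k = max(1, round(fc / f0))  # nearest harmonic number
        if abs(k * f0 - fc) <= bandwidth / 2:
            flagged.append(i)
    return flagged

# Hypothetical filterbank: channel centres every 100 Hz, 60 Hz wide,
# and an assumed engine fundamental of 120 Hz.
centers = [50 + 100 * i for i in range(8)]  # 50 .. 750 Hz
channels = harmonic_channels(120.0, centers, 60.0)
```

The flagged channels are those where the harmonic engine noise is expected to contribute energy, and hence where the noise is likely to dominate the mixture.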
  • after extracting the auditory cues from the reference signal, they are evaluated in the mixture signal (comprising e.g. noise and speech).
  • the mixture signal is separated, in discrete time steps, into "speech" frequency channels and "noise" frequency channels (107).
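This allocation step amounts to applying a binary time-frequency mask. The sketch below is a minimal illustration under the assumption that per-channel magnitudes and a precomputed noise-dominance decision (from the cue evaluation) are available; it is not the patent's implementation.

```python
def separate(mixture_tf, noise_dominated):
    """Binary-mask separation: at each time step, allocate every
    frequency channel of the mixture either to the target ("speech")
    or to the interference ("noise")."""
    speech, noise = [], []
    for frame, mask in zip(mixture_tf, noise_dominated):
        speech.append([0.0 if dom else x for x, dom in zip(frame, mask)])
        noise.append([x if dom else 0.0 for x, dom in zip(frame, mask)])
    return speech, noise

# Two time steps, four frequency channels (hypothetical magnitudes),
# and a per-channel noise-dominance decision from the cue evaluation.
mix = [[1.0, 0.2, 3.0, 0.1],
       [0.5, 2.0, 0.3, 0.4]]
dom = [[False, True, False, True],
       [True, False, True, False]]
speech, noise = separate(mix, dom)
```

Resynthesising a time signal from the "speech" channels would then yield the noise-reduced output.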
  • in Figs. 1 and 2, the system according to the invention is included in a motorcycle helmet and in a car.
  • a system may, for example, also be included in a robot. This would help to improve speech recognition systems in robots. Robots or any other technical systems which are controlled by speech or interpret speech can then be used even in loud and noisy environments.
  • Another application area where the system according to the invention can be used is the field of hearing devices.
  • the elimination of noise in a mixture signal that the hearing device is receiving helps the person who uses the hearing device to understand the speech of other persons even better.
  • Figs. 1 and 2 show a system where the reference signal uses noise from an engine.
  • alternatively, the reference signal can contain information on the speech signal, obtained e.g. by using a bone-conduction microphone.
  • the necessary grouping information is extracted from the speech signal and then used to separate the speech signal from the noise signal.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)
EP09173163A 2009-10-15 2009-10-15 Speech from noise separation with reference information Ceased EP2312579A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP09173163A EP2312579A1 (en) 2009-10-15 2009-10-15 Speech from noise separation with reference information
JP2010182876A JP5377442B2 (ja) 2010-08-18 System for separating speech from noise using reference information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP09173163A EP2312579A1 (en) 2009-10-15 2009-10-15 Speech from noise separation with reference information

Publications (1)

Publication Number Publication Date
EP2312579A1 true EP2312579A1 (en) 2011-04-20

Family

ID=41694574

Family Applications (1)

Application Number Title Priority Date Filing Date
EP09173163A Ceased EP2312579A1 (en) 2009-10-15 2009-10-15 Speech from noise separation with reference information

Country Status (2)

Country Link
EP (1) EP2312579A1 (en)
JP (1) JP5377442B2 (ja)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013041942A1 (en) * 2011-09-20 2013-03-28 Toyota Jidosha Kabushiki Kaisha Sound source detecting system and sound source detecting method
EP3116236A1 (de) 2015-07-06 2017-01-11 Sivantos Pte. Ltd. Method for signal processing for a hearing device, hearing device, hearing device system and interference source transmitter for a hearing device system
WO2020012229A1 (en) * 2018-07-12 2020-01-16 Bosch Car Multimedia Portugal, S.A. Selective active noise cancelling system
CN112614501A (zh) * 2020-12-08 2021-04-06 深圳创维-Rgb电子有限公司 Noise reduction method and device, noise canceller, microphone and readable storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US4932063A (en) * 1987-11-01 1990-06-05 Ricoh Company, Ltd. Noise suppression apparatus
WO1998001956A2 (en) * 1996-07-08 1998-01-15 Chiefs Voice Incorporated Microphone noise rejection system
GB2377805A (en) * 2001-07-10 2003-01-22 20 20 Speech Ltd Localisation of a person in a conveyance
EP1879180A1 (en) 2006-07-10 2008-01-16 Harman Becker Automotive Systems GmbH Reduction of background noise in hands-free systems

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
JP3384540B2 (ja) * 1997-03-13 2003-03-10 日本電信電話株式会社 Receiving method, device, and recording medium
JPH11265199A (ja) * 1998-03-18 1999-09-28 Nippon Telegr & Teleph Corp <Ntt> Transmitter
JP4119112B2 (ja) * 2001-11-05 2008-07-16 本田技研工業株式会社 Mixed sound separation device
EP1605439B1 (en) * 2004-06-04 2007-06-27 Honda Research Institute Europe GmbH Unified treatment of resolved and unresolved harmonics
JP2007079389A (ja) * 2005-09-16 2007-03-29 Yamaha Motor Co Ltd Speech analysis method and speech analysis device
US8184827B2 (en) * 2006-11-09 2012-05-22 Panasonic Corporation Sound source position detector
JP4336378B2 (ja) * 2007-04-26 2009-09-30 株式会社神戸製鋼所 Target sound extraction device, target sound extraction program, and target sound extraction method
JP4519901B2 (ja) * 2007-04-26 2010-08-04 株式会社神戸製鋼所 Target sound extraction device, target sound extraction program, and target sound extraction method
JP4493690B2 (ja) * 2007-11-30 2010-06-30 株式会社神戸製鋼所 Target sound extraction device, target sound extraction program, and target sound extraction method

Non-Patent Citations (7)

Title
BREGMAN, A: "Auditory Scene Analysis", 1990, MIT PRESS
BROWN, G. J.; COOKE, M. P.: "Computational Auditory Scene Analysis", Computer Speech and Language, no. 1, 1994, pages 297 - 336
HECKMANN, M.; JOUBLIN, F.; KORNER, E.: "Sound Source Separation for a Robot Based on Pitch", Proc. IEEE/RSJ Int. Conf. on Robots and Intell. Syst., 2005, pages 203 - 208
HU, G.; WANG, D. L.: "Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation", IEEE Trans. Neural Networks, vol. 15, 2004, pages 1135 - 1150
HU, G.; WANG, D.: "Auditory Segmentation Based on Onset and Offset Analysis", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, 2007, pages 396 - 405
KUO S M ET AL: "ACTIVE NOISE CONTROL: A TUTORIAL REVIEW", PROCEEDINGS OF THE IEEE, IEEE. NEW YORK, US, vol. 87, no. 6, 1 June 1999 (1999-06-01), pages 943 - 973, XP011044219, ISSN: 0018-9219 *
PUDER H ET AL: "Improved noise reduction for hands-free car phones utilizing information on vehicle and engine speeds", SIGNAL PROCESSING : THEORIES AND APPLICATIONS, PROCEEDINGS OFEUSIPCO, XX, XX, vol. 3, 1 January 2000 (2000-01-01), pages 1851 - 1854, XP009030255 *

Cited By (8)

Publication number Priority date Publication date Assignee Title
WO2013041942A1 (en) * 2011-09-20 2013-03-28 Toyota Jidosha Kabushiki Kaisha Sound source detecting system and sound source detecting method
CN103930791A (zh) * 2011-09-20 2014-07-16 丰田自动车株式会社 Sound source detecting system and sound source detecting method
CN103930791B (zh) * 2011-09-20 2016-03-02 丰田自动车株式会社 Sound source detecting system and sound source detecting method
US9299334B2 (en) 2011-09-20 2016-03-29 Toyota Jidosha Kabushiki Kaisha Sound source detecting system and sound source detecting method
EP3116236A1 (de) 2015-07-06 2017-01-11 Sivantos Pte. Ltd. Method for signal processing for a hearing device, hearing device, hearing device system and interference source transmitter for a hearing device system
WO2020012229A1 (en) * 2018-07-12 2020-01-16 Bosch Car Multimedia Portugal, S.A. Selective active noise cancelling system
CN112614501A (zh) * 2020-12-08 2021-04-06 深圳创维-Rgb电子有限公司 Noise reduction method and device, noise canceller, microphone and readable storage medium
CN112614501B (zh) * 2020-12-08 2024-07-12 深圳创维-Rgb电子有限公司 Noise reduction method and device, noise canceller, microphone and readable storage medium

Also Published As

Publication number Publication date
JP5377442B2 (ja) 2013-12-25
JP2011085904A (ja) 2011-04-28

Similar Documents

Publication Publication Date Title
US11856379B2 (en) Method, device and electronic device for controlling audio playback of multiple loudspeakers
US20210312915A1 (en) System and method for audio-visual multi-speaker speech separation with location-based selection
US20100185308A1 (en) Sound Signal Processing Device And Playback Device
EP2207168A3 (en) Robust two microphone noise suppression system
JPWO2001095314A1 (ja) Robot hearing device and robot hearing system
WO2001095314A1 (en) Robot acoustic device and robot acoustic system
EP2312579A1 (en) Speech from noise separation with reference information
JP2012189907A (ja) Speech discrimination device, speech discrimination method, and speech discrimination program
JP3632099B2 (ja) Robot audio-visual system
CN110865788B (zh) Vehicle communication system and method of operating a vehicle communication system
WO2013067145A1 (en) Systems and methods for enhancing place-of-articulation features in frequency-lowered speech
CN105049802B (zh) Speech recognition law-enforcement recorder and recognition method thereof
EP2482566A1 (en) Method for generating an audio signal
CN113707156A (zh) In-vehicle speech recognition method and system
CN106328154B (zh) Front-end audio processing system
WO2017000774A1 (zh) System for eliminating a robot's own sound source
KR102208536B1 (ko) Speech recognition device and operating method of speech recognition device
US20250184665A1 (en) Ear-worn device and reproduction method
CN118506805A (zh) Ambient sound passthrough method and device for an intelligent automobile cockpit
Nakadai et al. Humanoid active audition system improved by the cover acoustics
CN110012391A (zh) Surgical consultation system and operating room audio acquisition method
CN120091256B (zh) Method and system for ensuring smooth communication inside a large private car
KR101081972B1 (ko) Hybrid feature vector processing method, and speaker recognition method and apparatus using the same
Marquardt et al. A natural acoustic front-end for Interactive TV in the EU-Project DICIT
CN118942491B (zh) Data processing method, electronic device, storage medium and computer program product

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20100818

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO SE SI SK SM TR

AX Request for extension of the european patent

Extension state: AL BA RS

17Q First examination report despatched

Effective date: 20111012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20121116