WO2014000658A1 - Method and device for eliminating noise, and mobile terminal - Google Patents
Method and device for eliminating noise, and mobile terminal
- Publication number
- WO2014000658A1 PCT/CN2013/078130
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- audio fingerprint
- party
- calling party
- voice
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/018—Audio watermarking, i.e. embedding inaudible data in the audio signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
Definitions
- the present invention relates to computer technology, and more particularly to a method, apparatus, and mobile terminal for eliminating noise.
Background of the invention
- the quality of a call is affected by background noise in the surrounding environment. For example, when a user talks to a friend on a mobile phone in a relatively noisy environment, the voice transmitted through the user's phone may be disturbed by background noise, so the voice the friend receives also contains background noise, which degrades the call quality.
- a dedicated hardware device, that is, a noise canceling hardware device, is additionally added to the mobile terminal to reduce the impact of noise on the call quality.
- the noise canceling hardware device includes a background noise canceling microphone, a noise canceling chip, and a sound emitting device.
- the background noise canceling microphone is separate from the normal talk microphone on the mobile terminal and is used to collect noise sound waves.
- the noise canceling chip is used to generate a sound wave opposite to the noise, based on the noise sound waves collected by the background noise canceling microphone.
- the sound emitting device is configured to emit the sound wave opposite to the noise, using the cancellation principle to eliminate noise during the call and thereby improve the call quality.
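The cancellation principle the sound emitting device relies on can be shown numerically: the emitted anti-noise wave is the sample-wise negation of the captured noise, so the two sum to silence. A toy illustration, assuming ideal, zero-latency capture and playback (real hardware must compensate for acoustic delay):

```python
# Destructive interference: the anti-noise wave is the sample-wise
# negation of the captured noise, so their sum cancels exactly.
# Idealized sketch; real devices must account for latency and gain.
noise = [0.8, -0.3, 0.5, -0.9, 0.2]   # captured noise samples
anti_noise = [-s for s in noise]       # wave "opposite to the noise"
residual = [n + a for n, a in zip(noise, anti_noise)]
print(residual)  # -> [0.0, 0.0, 0.0, 0.0, 0.0]
```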
- Embodiments of the present invention provide a method, apparatus, and mobile terminal for eliminating noise, which can eliminate background noise during a call and avoid adding a noise canceling hardware device to the mobile terminal.
- a method of eliminating noise including:
- extracting and storing in advance an audio fingerprint of the calling party from the voice of the calling party; and, when the calling party and the opposite party are in a call, extracting a sound matching the audio fingerprint from the current call voice according to the audio fingerprint, and sending the sound matching the audio fingerprint to the opposite party through the communication network.
- a device for canceling noise, comprising at least a memory and a processor in communication with the memory, wherein the memory includes an extraction instruction and a transmission instruction executable by the processor:
- the extraction instruction is configured to extract and store an audio fingerprint of the party in advance from a voice of the party;
- the transmission instruction is configured to: when the calling party and the opposite party are in a call, extract a sound matching the audio fingerprint from the current call voice according to the audio fingerprint of the calling party, and send the sound matching the audio fingerprint to the opposite party through the communication network.
- a mobile terminal includes the above noise canceling device.
- in the embodiments of the present invention, the audio fingerprint of the calling party is first extracted from the voice of the calling party. When the calling party and the opposite party are talking, a sound matching the audio fingerprint of the calling party is extracted from the current call voice according to that fingerprint, and the extracted sound is transmitted to the opposite party through the communication network. This ensures that the opposite party hears a clearer and more relevant voice, which improves the quality of the call. Further, since the sound transmitted through the communication network is only the sound actually emitted by the calling party and includes no other noise, the load on the communication network is reduced.
- FIG. 1 is a flowchart of a method for eliminating noise according to an embodiment of the present invention.
- FIG. 2 is another flow chart of a method for eliminating noise according to an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of an apparatus for eliminating noise according to an embodiment of the present invention.
- FIG. 4 is a schematic structural diagram of another apparatus for eliminating noise according to an embodiment of the present invention.
Mode for carrying out the invention
- the method for eliminating noise provided by the embodiments of the present invention can be applied to a mobile terminal, such as a mobile phone, and can also be applied to a fixed hardware device, such as a PC; the embodiments of the present invention are not limited in this respect.
- FIG. 1 is a flowchart of a method for eliminating noise according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps.
- step 101 the audio fingerprint of the calling party is extracted in advance from the voice of the calling party.
- the audio fingerprint indicates the voice attribute of the party, and can be used to identify the voice of the party.
- step 102 when the calling party and the opposite party are talking, a sound matching the audio fingerprint is extracted from the current call voice according to the audio fingerprint of the calling party, and the sound matching the audio fingerprint is sent to the opposite party through the communication network.
- the current call voice may include the actual voice of the party and the noise that affects the actual voice of the party.
- the noise mixes with the actual voice of the calling party to form a mixed voice. If the mobile terminal transmits this mixed voice directly through the communication network, the opposite party will receive both the noise and the actual voice of the calling party, which degrades the call quality.
- in the embodiments of the present invention, the actual voice of the calling party is extracted from the mixed voice, and only the extracted voice is transmitted through the communication network, so that the opposite party receives only the actual voice of the calling party. This ensures that the opposite party hears a clearer and more relevant voice, improving the quality of the call.
- steps 101 to 102 can be implemented by software installed in the mobile terminal. The flow shown in FIG. 1 is described in detail below.
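The two-step flow of FIG. 1 can be sketched in code. The sketch below is purely illustrative: it uses an average magnitude spectrum as a stand-in "fingerprint" and a cosine-similarity threshold as the "match", rather than the classifier-based fingerprint and target-sound prediction the patent actually describes; all function names and the threshold are hypothetical.

```python
import numpy as np

def extract_fingerprint(voice, frame=256, hop=128):
    """Step 101 (sketch): average magnitude spectrum as a stand-in fingerprint."""
    frames = [voice[i:i + frame] for i in range(0, len(voice) - frame, hop)]
    return np.mean([np.abs(np.fft.rfft(f)) for f in frames], axis=0)

def filter_call(call_audio, fingerprint, frame=256, hop=128, threshold=0.5):
    """Step 102 (sketch): keep only frames whose spectrum resembles the fingerprint."""
    kept = []
    for i in range(0, len(call_audio) - frame, hop):
        f = call_audio[i:i + frame]
        spec = np.abs(np.fft.rfft(f))
        # cosine similarity between frame spectrum and stored fingerprint
        sim = np.dot(spec, fingerprint) / (
            np.linalg.norm(spec) * np.linalg.norm(fingerprint) + 1e-9)
        if sim > threshold:  # arbitrary illustrative threshold
            kept.append(f)
    return np.concatenate(kept) if kept else np.array([])

# enrollment on "clean" speech, then filtering of a call
voice = np.sin(2 * np.pi * 440 * np.arange(4000) / 16000.0)  # dummy voice
fp = extract_fingerprint(voice)
clean = filter_call(voice, fp)
```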
- FIG. 2 is a detailed flowchart of a method for eliminating noise according to an embodiment of the present invention.
- the method is applied to a mobile terminal. As shown in FIG. 2, the method includes the following steps.
- step 201 the mobile terminal extracts each user's audio fingerprint in advance from that user's voice.
- the audio fingerprint indicates the voice attribute of the user, and can be used to identify the voice of the user.
- the mobile terminal extracts the audio fingerprint of a user from the voice of the user as follows: dividing the user's voice signal into multiple mutually overlapping frames; performing a feature operation on each frame; and mapping the obtained results into a datum by means of a classifier, the obtained datum being taken as the user's audio fingerprint.
- the user sound signal can be divided into a plurality of mutually overlapping frames in either of the following ways: starting from different start times, the user sound signal is divided into mutually overlapping frames according to a set time interval; or, starting from different start frequencies, the user sound signal is divided into mutually overlapping frames according to a set frequency interval.
- for example, when dividing by time: if the set time interval is 1 ms, the 1 ms segment of the user sound signal starting at 0 ms is taken as one frame, the 1 ms segment starting at 0.5 ms is taken as the next frame, and so on.
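With a set interval of 1 ms and successive frames starting every 0.5 ms, adjacent frames overlap by half. A sketch of this split, assuming a 16 kHz sample rate (so 1 ms = 16 samples) purely for illustration:

```python
# Overlapping frame split: 1 ms frames starting every 0.5 ms.
# Assumes a 16 kHz sample rate, so 1 ms = 16 samples, 0.5 ms = 8 samples.
SAMPLE_RATE = 16000
FRAME = SAMPLE_RATE // 1000        # 16 samples per 1 ms frame
HOP = FRAME // 2                   # a new frame starts every 0.5 ms

def split_frames(signal):
    """Return the list of mutually overlapping 1 ms frames."""
    return [signal[i:i + FRAME]
            for i in range(0, len(signal) - FRAME + 1, HOP)]

signal = list(range(64))           # 4 ms of dummy samples
frames = split_frames(signal)
print(len(frames))                 # 7 frames, starting at samples 0, 8, ..., 48
```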
- the feature operation performed on each frame may be implemented as any one, or any combination, of the following: fast Fourier transform (FFT), wavelet transform (WT), Mel-frequency cepstral coefficients (MFCC), spectral smoothness, spectral sharpness, and linear predictive coding (LPC).
- the classifier in the embodiments of the present invention may be an existing hidden Markov model or a quantization technique; the manner in which the obtained results are mapped into a datum by the classifier is similar to existing hidden Markov model or quantization-technique mapping, and is not described again here.
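The mapping of frame features to "a datum" can be illustrated with a simple vector quantizer: each feature vector is assigned the index of its nearest codeword, and the resulting index sequence serves as the fingerprint datum. The codebook below is hypothetical (in practice it would be trained on speech data), and this is only a stand-in for the hidden Markov model or quantization techniques mentioned above:

```python
import math

# Illustrative vector quantization with a tiny, hypothetical 2-D codebook.
# Each frame's feature vector maps to the index of its nearest codeword;
# the index sequence serves as the fingerprint datum.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

def quantize(feature):
    """Return the index of the codeword nearest to the feature vector."""
    dists = [math.dist(feature, c) for c in CODEBOOK]
    return dists.index(min(dists))

features = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.95)]  # dummy per-frame features
fingerprint = [quantize(f) for f in features]
print(fingerprint)  # -> [0, 1, 2]
```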
- step 202 the mobile terminal stores the audio fingerprint of each user locally.
- step 203 when a user, such as user A, makes a call, the mobile terminal looks up user A's audio fingerprint among the locally stored audio fingerprints.
- the current call voice of the user A includes: the actual sound of the user A and the noise affecting the actual sound of the user A, which may be the background noise around the user A or the like.
- step 204 the mobile terminal extracts a sound matching the audio fingerprint of the user A from the current call voice of the user A by using the audio fingerprint of the user A.
- specifically, a target sound collection and prediction mode is adopted to predict, from user A's current call voice, the sound matching user A's audio fingerprint.
- the predicted sound is then extracted from the current call voice through secondary positioning of the target sound in the time-frequency domain, and the extracted sound is taken as the sound matching user A's audio fingerprint.
- the target sound collection and prediction mode and the secondary positioning of the target sound in the time-frequency domain used in this embodiment are similar to the prior art and are not described again here.
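The "secondary positioning of the target sound in the time-frequency domain" amounts to masking: short-time spectra are computed, the bins attributed to the target speaker are kept, and the rest are zeroed before resynthesis. A minimal single-frame sketch, in which the binary mask is simply assumed (in the patent it would come from the fingerprint-based prediction) and the 64-sample frame is an arbitrary choice:

```python
import numpy as np

# Time-frequency masking sketch: keep spectral bins attributed to the
# target speaker, zero the rest, then invert the transform.
def apply_mask(frame, mask):
    spec = np.fft.rfft(frame)
    return np.fft.irfft(spec * mask, n=len(frame))

n = np.arange(64)
frame = np.sin(2 * np.pi * 4 * n / 64)    # "voice" energy in bin 4
frame = frame + np.sin(2 * np.pi * 20 * n / 64)  # "noise" energy in bin 20
mask = np.zeros(33)                        # rfft of 64 samples -> 33 bins
mask[4] = 1.0                              # assumed: bin 4 belongs to the speaker
clean = apply_mask(frame, mask)            # noise at bin 20 is removed
```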
- step 205 the mobile terminal transmits the voice extracted in step 204 to the opposite party through the communication network.
- in this way, the opposite party hears only the voice actually produced by user A, ensuring the quality of the call between user A and the opposite party; moreover, because the sound transmitted through the communication network is only the sound actually emitted by user A and contains no other noise, the load on the communication network is reduced.
- FIG. 3 is a schematic structural diagram of an apparatus for eliminating noise according to an embodiment of the present invention.
- the apparatus includes an extraction module and a transmission module.
- the extraction module is configured to extract and store the audio fingerprint of the party in advance from the voice of the party.
- the transmission module is configured to: when the calling party and the opposite party are in a call, extract a sound matching the audio fingerprint from the current call voice according to the audio fingerprint of the calling party, and send the sound matching the audio fingerprint to the opposite party through the communication network; the current call voice includes the sound actually emitted by the calling party and the noise affecting that sound.
- the extraction module includes a dividing unit and a mapping unit.
- the dividing unit is configured to divide the voice signal of the party into a plurality of frames overlapping each other.
- the mapping unit is configured to perform a feature operation on each frame, and use the classifier method to map the obtained result into a data, and use the obtained data as an audio fingerprint of the party.
- the dividing unit divides the voice signal of the calling party into a plurality of mutually overlapping frames by: starting from different start times, dividing the voice signal into mutually overlapping frames according to a set time interval; or, starting from different start frequencies, dividing the voice signal into mutually overlapping frames according to a set frequency interval.
- the transmission module extracts a sound matching the audio fingerprint from the current call sound through the prediction unit and the extraction unit.
- the prediction unit is configured to predict the sound matching the audio fingerprint of the party from the current call voice by using the target sound collection prediction mode.
- the extracting unit is configured to extract the predicted sound from the current call sound by using the secondary positioning of the target sound in the time-frequency domain, and use the extracted sound as a sound matching the audio fingerprint of the party.
- FIG. 4 is a schematic structural diagram of another apparatus for eliminating noise according to an embodiment of the present invention.
- the apparatus includes at least a memory and a processor in communication with the memory, wherein the memory includes an extraction instruction and a transmission instruction executable by the processor.
- the extraction instruction is used to extract and store the audio fingerprint of the calling party in advance from the voice of the calling party.
- the transmission instruction is used to, when the calling party and the opposite party are talking, extract a sound matching the audio fingerprint from the current call voice according to the audio fingerprint of the calling party, and send the sound matching the audio fingerprint to the opposite party through the communication network.
- the extraction instruction includes a division sub-instruction and a mapping sub-instruction.
- the dividing sub-instruction is used to divide the voice signal of the party into a plurality of frames overlapping each other.
- the mapping sub-instruction is used to perform a feature operation on each frame, and the obtained result is mapped into a data by using a classifier method, and the obtained data is used as an audio fingerprint of the party.
- the dividing sub-instruction divides the voice signal of the calling party into a plurality of mutually overlapping frames by: starting from different start times, dividing the voice signal into mutually overlapping frames according to a set time interval; or, starting from different start frequencies, dividing the voice signal into mutually overlapping frames according to a set frequency interval.
- the transmission instruction extracts a sound matching the audio fingerprint from the current call sound by using the prediction sub-instruction and the extraction sub-instruction.
- the prediction sub-instruction is used to predict the sound matching the audio fingerprint of the party from the current call voice by using the target sound collection prediction mode.
- the extracting sub-instruction is for extracting the predicted sound from the current call sound by using the secondary positioning of the target sound in the time-frequency domain, and using the extracted sound as a sound matching the audio fingerprint of the party.
- the embodiment of the present invention further provides a mobile terminal, where the mobile terminal may include the apparatus shown in FIG. 3 or FIG. 4.
- the audio fingerprint of the calling party is extracted from the voice of the calling party; when the calling party and the opposite party are talking, a sound matching the calling party's audio fingerprint is extracted from the current call voice according to that fingerprint, and the extracted sound is transmitted to the opposite party through the communication network. Since the current call voice includes both the sound actually emitted by the calling party and the noise affecting it, this ensures that the receiving party hears a clearer and more relevant sound, improving the quality of the call.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/410,602 US20150325252A1 (en) | 2012-06-28 | 2013-06-27 | Method and device for eliminating noise, and mobile terminal |
KR20157001736A KR20150032562A (ko) | 2012-06-28 | 2013-06-27 | 소음을 제거하기 위한 방법, 장치 및 모바일 단말 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210217760.9A CN103514876A (zh) | 2012-06-28 | 2012-06-28 | 噪音消除方法和装置、以及移动终端 |
CN201210217760.9 | 2012-06-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014000658A1 true WO2014000658A1 (zh) | 2014-01-03 |
Family
ID=49782256
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2013/078130 WO2014000658A1 (zh) | 2012-06-28 | 2013-06-27 | 消除噪音的方法和装置、以及移动终端 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20150325252A1 (zh) |
KR (1) | KR20150032562A (zh) |
CN (1) | CN103514876A (zh) |
WO (1) | WO2014000658A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104601825A (zh) * | 2015-02-16 | 2015-05-06 | 联想(北京)有限公司 | 一种控制方法及装置 |
WO2016127506A1 (zh) * | 2015-02-09 | 2016-08-18 | 宇龙计算机通信科技(深圳)有限公司 | 语音处理方法、语音处理装置和终端 |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103871417A (zh) * | 2014-03-25 | 2014-06-18 | 北京工业大学 | 一种移动手机特定连续语音过滤方法及过滤装置 |
CN107094196A (zh) * | 2017-04-21 | 2017-08-25 | 维沃移动通信有限公司 | 一种通话消噪的方法及移动终端 |
CN107172256B (zh) * | 2017-07-27 | 2020-05-05 | Oppo广东移动通信有限公司 | 耳机通话自适应调整方法、装置、移动终端及存储介质 |
CN111696565B (zh) * | 2020-06-05 | 2023-10-10 | 北京搜狗科技发展有限公司 | 语音处理方法、装置和介质 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20000032269A (ko) * | 1998-11-13 | 2000-06-05 | 구자홍 | 음향 기기의 음성인식장치 |
CN101321387A (zh) * | 2008-07-10 | 2008-12-10 | 中国移动通信集团广东有限公司 | 基于通信系统的声纹识别方法及系统 |
CN101345055A (zh) * | 2007-07-11 | 2009-01-14 | 雅马哈株式会社 | 语音处理器和通信终端设备 |
CN102694891A (zh) * | 2011-03-21 | 2012-09-26 | 鸿富锦精密工业(深圳)有限公司 | 通话噪音去除系统及方法 |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070219801A1 (en) * | 2006-03-14 | 2007-09-20 | Prabha Sundaram | System, method and computer program product for updating a biometric model based on changes in a biometric feature of a user |
EP2324475A1 (en) * | 2008-08-26 | 2011-05-25 | Dolby Laboratories Licensing Corporation | Robust media fingerprints |
CN101847409B (zh) * | 2010-03-25 | 2012-01-25 | 北京邮电大学 | 一种基于数字指纹的语音完整性保护方法 |
- 2012-06-28 CN CN201210217760.9A patent/CN103514876A/zh active Pending
- 2013-06-27 WO PCT/CN2013/078130 patent/WO2014000658A1/zh active Application Filing
- 2013-06-27 US US14/410,602 patent/US20150325252A1/en not_active Abandoned
- 2013-06-27 KR KR20157001736A patent/KR20150032562A/ko not_active Application Discontinuation
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20000032269A (ko) * | 1998-11-13 | 2000-06-05 | 구자홍 | 음향 기기의 음성인식장치 |
CN101345055A (zh) * | 2007-07-11 | 2009-01-14 | 雅马哈株式会社 | 语音处理器和通信终端设备 |
CN101321387A (zh) * | 2008-07-10 | 2008-12-10 | 中国移动通信集团广东有限公司 | 基于通信系统的声纹识别方法及系统 |
CN102694891A (zh) * | 2011-03-21 | 2012-09-26 | 鸿富锦精密工业(深圳)有限公司 | 通话噪音去除系统及方法 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016127506A1 (zh) * | 2015-02-09 | 2016-08-18 | 宇龙计算机通信科技(深圳)有限公司 | 语音处理方法、语音处理装置和终端 |
CN104601825A (zh) * | 2015-02-16 | 2015-05-06 | 联想(北京)有限公司 | 一种控制方法及装置 |
Also Published As
Publication number | Publication date |
---|---|
US20150325252A1 (en) | 2015-11-12 |
KR20150032562A (ko) | 2015-03-26 |
CN103514876A (zh) | 2014-01-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13808541, Country of ref document: EP, Kind code of ref document: A1 |
WWE | Wipo information: entry into national phase | Ref document number: 14410602, Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 20157001736, Country of ref document: KR, Kind code of ref document: A |
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205N DATED 29-05-2015) |
122 | Ep: pct application non-entry in european phase | Ref document number: 13808541, Country of ref document: EP, Kind code of ref document: A1 |