JP2020109498A - System and method - Google Patents
System and method
- Publication number
- JP2020109498A (application JP2019220476A)
- Authority
- JP
- Japan
- Prior art keywords
- target
- speech
- stream
- enhancement
- enhanced
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/65—Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02087—Noise filtering the noise being separate speech, e.g. cocktail party
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/40—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
- H04R1/406—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- H04R2430/20—Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
- H04R2430/23—Direction finding using a sum-delay beam-former
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/01—Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Quality & Reliability (AREA)
- Otolaryngology (AREA)
- General Health & Medical Sciences (AREA)
- Computer Networks & Wireless Communication (AREA)
- Circuit For Audible Band Transducer (AREA)
Claims (20)
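- 1. A system comprising:
a target speech enhancement engine operable to analyze a multichannel audio input signal and generate a plurality of enhanced target streams;
a multi-stream target speech detection generator comprising a plurality of target speech detection engines, each operable to determine a quality and/or a confidence of the presence of a specific target speech in the streams, the multi-stream target speech detection generator being operable to determine a plurality of weights for the plurality of enhanced target streams; and
a fusion subsystem operable to apply the plurality of weights to the enhanced target streams to generate a combined enhanced output signal.
- 2. The system of claim 1, further comprising an audio sensor array operable to sense human speech and environmental noise and to generate the corresponding multichannel audio input signal.
- 3. The system of claim 1, wherein the target speech enhancement engine comprises a plurality of speech enhancement modules, each operable to analyze the multichannel audio input signal and output one of the plurality of enhanced target streams.
- 4. The system of claim 3, wherein the plurality of speech enhancement modules comprise an adaptive spatial filtering algorithm, a beamforming algorithm, a blind source separation algorithm, a single-channel enhancement algorithm, and/or a neural network.
- 5. The system of claim 1, wherein the target speech detection engines comprise a Gaussian mixture model, a hidden Markov model, and/or a neural network.
- 6. The system of claim 1, wherein each target speech detection engine is operable to produce a posterior weight correlated with a confidence that an input audio stream contains the specific target speech.
- 7. The system of claim 6, wherein each target speech detection engine is operable to produce a higher posterior for clean speech.
- 8. The system of claim 1, wherein the enhanced output signal is a weighted sum of the enhanced target streams.
- 9. The system of claim 1, wherein the multi-stream target speech detection generator is further operable to determine a combined probability that the specific target speech is detected in the streams, and wherein the target speech is detected when the combined probability exceeds a detection threshold.
- 10. The system of claim 9, further comprising an automatic speech recognition engine or a VoIP application, wherein the enhanced output signal is forwarded to the automatic speech recognition engine or the VoIP application when the target speech is detected.
- 11. A method comprising:
analyzing a multichannel audio input signal using a target speech enhancement engine and generating a plurality of enhanced target streams;
determining, using a multi-stream target speech detection generator, a probability of detecting target speech in the streams;
calculating a weight for each of the plurality of enhanced target streams; and
applying the calculated weights to the plurality of enhanced target streams to generate an enhanced output signal.
- 12. The method of claim 11, further comprising sensing human speech and environmental noise using an audio sensor array and generating the multichannel audio input signal.
- 13. The method of claim 11, wherein analyzing the multichannel audio input signal comprises applying a plurality of speech enhancement modalities, each speech enhancement modality outputting a separate one of the plurality of enhanced target streams.
- 14. The method of claim 13, wherein the plurality of speech enhancement modalities comprise an adaptive spatial filtering algorithm, a beamforming algorithm, a blind source separation algorithm, a single-channel enhancement algorithm, and/or a neural network.
- 15. The method of claim 11, wherein determining the probability of detecting the target speech in the streams comprises applying a Gaussian mixture model, a hidden Markov model, and/or a neural network.
- 16. The method of claim 11, wherein determining the probability of detecting the target speech in the streams comprises producing a posterior weight correlated with a confidence that the input stream contains a keyword.
- 17. The method of claim 16, further comprising producing a higher posterior for clean speech.
- 18. The method of claim 11, wherein the enhanced output signal is a weighted sum of the plurality of enhanced target streams.
- 19. The method of claim 11, further comprising determining a combined probability of detecting the target speech in the plurality of streams, wherein the target speech is detected when the combined probability exceeds a detection threshold.
- 20. The method of claim 19, further comprising performing automatic speech recognition on the enhanced output signal when the target speech is detected.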
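The weighting, fusion, and threshold steps recited in claims 1, 8, 9, 11, 18, and 19 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation described in the patent. The claims do not say how the weights are derived from the detector posteriors or how the combined probability is formed, so normalizing the posteriors to sum to one and taking their weighted average are assumptions here. The function names and the 0.5 threshold are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def fusion_weights(posteriors: Sequence[float]) -> List[float]:
    """Map per-stream detector posteriors to weights that sum to 1 (assumed rule)."""
    if not posteriors:
        raise ValueError("at least one stream is required")
    total = sum(posteriors)
    if total <= 0.0:
        # No detector reports any confidence: fall back to equal weights.
        return [1.0 / len(posteriors)] * len(posteriors)
    return [p / total for p in posteriors]


def fuse_streams(streams: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Weighted sum of the enhanced streams, sample by sample (claims 8 and 18)."""
    if len(streams) != len(weights):
        raise ValueError("one weight per stream is required")
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all streams must have the same length")
    return [sum(w * s[i] for w, s in zip(weights, streams)) for i in range(length)]


def detect_and_fuse(streams: Sequence[Sequence[float]],
                    posteriors: Sequence[float],
                    threshold: float = 0.5) -> Tuple[List[float], float, bool]:
    """Return the fused signal, the combined probability, and the detection flag."""
    weights = fusion_weights(posteriors)
    output = fuse_streams(streams, weights)
    # Assumed combination rule: weighted average of the per-stream posteriors.
    combined = sum(w * p for w, p in zip(weights, posteriors))
    # Claims 9 and 19: the target speech counts as detected above the threshold.
    return output, combined, combined > threshold


if __name__ == "__main__":
    enhanced = [[1.0, 2.0], [3.0, 4.0]]   # two toy enhanced target streams
    detector_posteriors = [0.75, 0.25]    # toy per-stream detector outputs
    print(detect_and_fuse(enhanced, detector_posteriors))
```

In a real system each stream would come from a different enhancement module (claims 3 and 4) and each posterior from a trained detector (claims 5 and 6). The fused output would only be forwarded to speech recognition or VoIP when the detection flag is set (claims 10 and 20).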
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862776422P | 2018-12-06 | 2018-12-06 | |
US62/776,422 | 2018-12-06 |
Publications (3)
Publication Number | Publication Date |
---|---|
JP2020109498A | 2020-07-16 |
JP2020109498A5 | 2022-12-08 |
JP7407580B2 | 2024-01-04 |
Family
ID=70970205
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
JP2019220476A | Active | JP7407580B2 (ja) | 2018-12-06 | 2019-12-05 | System and method |
Country Status (3)
Country | Link |
---|---|
US (2) | US11158333B2 (ja) |
JP (1) | JP7407580B2 (ja) |
CN (1) | CN111370014A (ja) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7407580B2 (ja) | 2018-12-06 | 2024-01-04 | Synaptics Incorporated | System and method |
US11048472B2 (en) | 2019-01-27 | 2021-06-29 | Listen AS | Dynamically adjustable sound parameters |
US11126398B2 (en) * | 2019-03-13 | 2021-09-21 | Listen AS | Smart speaker |
US11551671B2 (en) * | 2019-05-16 | 2023-01-10 | Samsung Electronics Co., Ltd. | Electronic device and method of controlling thereof |
US11557307B2 (en) | 2019-10-20 | 2023-01-17 | Listen AS | User voice control system |
US20210201928A1 (en) * | 2019-12-31 | 2021-07-01 | Knowles Electronics, Llc | Integrated speech enhancement for voice trigger application |
US11064294B1 (en) | 2020-01-10 | 2021-07-13 | Synaptics Incorporated | Multiple-source tracking and voice activity detections for planar microphone arrays |
US11670298B2 (en) | 2020-05-08 | 2023-06-06 | Nuance Communications, Inc. | System and method for data augmentation for multi-microphone signal processing |
US11875797B2 (en) * | 2020-07-23 | 2024-01-16 | Pozotron Inc. | Systems and methods for scripted audio production |
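CN111916106B (zh) * | 2020-08-17 | 2021-06-15 | 牡丹江医学院 | Method for improving pronunciation quality in English teaching |
CN112017686B (zh) * | 2020-09-18 | 2022-03-01 | 中科极限元(杭州)智能科技股份有限公司 | Multichannel speech separation system based on gated recurrent fusion of deep embedded features |
CN112786069B (zh) * | 2020-12-24 | 2023-03-21 | 北京有竹居网络技术有限公司 | Speech extraction method, apparatus, and electronic device |
TWI761018B (zh) * | 2021-01-05 | 2022-04-11 | Realtek Semiconductor Corp | Voice capture method and voice capture system |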
US11823707B2 (en) | 2022-01-10 | 2023-11-21 | Synaptics Incorporated | Sensitivity mode for an audio spotting system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011248025A (ja) * | 2010-05-25 | 2011-12-08 | Nippon Telegr & Teleph Corp <Ntt> | Channel integration method, channel integration device, and program |
JP2016517023A (ja) * | 2013-07-18 | 2016-06-09 | 三菱電機株式会社 | Method for processing acoustic signals |
JP2016524193A (ja) * | 2013-06-27 | 2016-08-12 | ロウルズ リミテッド ライアビリティ カンパニー | Detection of self-generated wake expressions |
US9734822B1 (en) * | 2015-06-01 | 2017-08-15 | Amazon Technologies, Inc. | Feedback based beamformed signal selection |
Family Cites Families (79)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3484112B2 (ja) | 1999-09-27 | 2004-01-06 | 株式会社東芝 | Noise component suppression processing device and noise component suppression processing method |
US6370500B1 (en) | 1999-09-30 | 2002-04-09 | Motorola, Inc. | Method and apparatus for non-speech activity reduction of a low bit rate digital voice message |
AUPS270902A0 (en) | 2002-05-31 | 2002-06-20 | Canon Kabushiki Kaisha | Robust detection and classification of objects in audio using limited training data |
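CN1303582C (zh) | 2003-09-09 | 2007-03-07 | Motorola, Inc. | Automatic speech classification method |
KR100754385B1 (ko) * | 2004-09-30 | 2007-08-31 | Samsung Electronics Co., Ltd. | Apparatus and method for localization, tracking, and separation using audio/video sensors |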
US7464029B2 (en) | 2005-07-22 | 2008-12-09 | Qualcomm Incorporated | Robust separation of speech signals in a noisy environment |
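JP2007047427A (ja) | 2005-08-10 | 2007-02-22 | Hitachi Ltd | Speech processing device |
KR100821177B1 (ko) | 2006-09-29 | 2008-04-14 | 한국전자통신연구원 | Method for estimating a priori speech absence probability based on a statistical model |
KR100964402B1 (ko) | 2006-12-14 | 2010-06-17 | Samsung Electronics Co., Ltd. | Method and apparatus for determining an encoding mode of an audio signal, and method and apparatus for encoding/decoding an audio signal using the same |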
US8005237B2 (en) | 2007-05-17 | 2011-08-23 | Microsoft Corp. | Sensor array beamformer post-processor |
EP2081189B1 (en) | 2008-01-17 | 2010-09-22 | Harman Becker Automotive Systems GmbH | Post-filter for beamforming means |
US9113240B2 (en) * | 2008-03-18 | 2015-08-18 | Qualcomm Incorporated | Speech enhancement using multiple microphones on multiple devices |
KR20100006492A (ko) | 2008-07-09 | 2010-01-19 | Samsung Electronics Co., Ltd. | Method and apparatus for determining an encoding scheme |
EP2146519B1 (en) | 2008-07-16 | 2012-06-06 | Nuance Communications, Inc. | Beamforming pre-processing for speaker localization |
JP2010085733A (ja) * | 2008-09-30 | 2010-04-15 | Equos Research Co Ltd | Speech enhancement system |
US9202456B2 (en) | 2009-04-23 | 2015-12-01 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for automatic control of active noise cancellation |
US20110010172A1 (en) | 2009-07-10 | 2011-01-13 | Alon Konchitsky | Noise reduction system using a sensor based speech detector |
US9037458B2 (en) * | 2011-02-23 | 2015-05-19 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for spatially selective audio augmentation |
CN102956230B (zh) | 2011-08-19 | 2017-03-01 | Dolby Laboratories Licensing Corporation | Method and device for song detection in an audio signal |
EP2791935B1 (en) | 2011-12-12 | 2016-03-09 | Dolby Laboratories Licensing Corporation | Low complexity repetition detection in media data |
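CN103325386B (zh) | 2012-03-23 | 2016-12-21 | Dolby Laboratories Licensing Corporation | Method and system for signal transmission control |
KR101318328B1 (ko) * | 2012-04-12 | 2013-10-15 | 경북대학교 산학협력단 | Speech enhancement method and apparatus using blind signal cancellation through sparsity minimization |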
US9768829B2 (en) | 2012-05-11 | 2017-09-19 | Intel Deutschland Gmbh | Methods for processing audio signals and circuit arrangements therefor |
TWI474317B (zh) | 2012-07-06 | 2015-02-21 | Realtek Semiconductor Corp | Signal processing device and signal processing method |
US10142007B2 (en) | 2012-07-19 | 2018-11-27 | Intel Deutschland Gmbh | Radio communication devices and methods for controlling a radio communication device |
DK2701145T3 (en) | 2012-08-24 | 2017-01-16 | Retune DSP ApS | Noise cancellation for use with noise reduction and echo cancellation in personal communication |
EP2747451A1 (en) | 2012-12-21 | 2014-06-25 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Filter and method for informed spatial filtering using multiple instantaneous direction-of-arrivial estimates |
US9183849B2 (en) | 2012-12-21 | 2015-11-10 | The Nielsen Company (Us), Llc | Audio matching with semantic audio recognition and report generation |
US9158760B2 (en) | 2012-12-21 | 2015-10-13 | The Nielsen Company (Us), Llc | Audio decoding with supplemental semantic audio recognition and report generation |
CN104078050A (zh) | 2013-03-26 | 2014-10-01 | Dolby Laboratories Licensing Corporation | Device and method for audio classification and audio processing |
US9769576B2 (en) | 2013-04-09 | 2017-09-19 | Sonova Ag | Method and system for providing hearing assistance to a user |
CN104217729A (zh) | 2013-05-31 | 2014-12-17 | Dolby Laboratories Licensing Corporation | Audio processing method, audio processing apparatus, and training method |
US9240182B2 (en) | 2013-09-17 | 2016-01-19 | Qualcomm Incorporated | Method and apparatus for adjusting detection threshold for activating voice assistant function |
GB2518663A (en) | 2013-09-27 | 2015-04-01 | Nokia Corp | Audio analysis apparatus |
US9654894B2 (en) * | 2013-10-31 | 2017-05-16 | Conexant Systems, Inc. | Selective audio source enhancement |
US9589560B1 (en) | 2013-12-19 | 2017-03-07 | Amazon Technologies, Inc. | Estimating false rejection rate in a detection system |
EP2916321B1 (en) | 2014-03-07 | 2017-10-25 | Oticon A/s | Processing of a noisy audio signal to estimate target and noise spectral variances |
US9548065B2 (en) | 2014-05-05 | 2017-01-17 | Sensory, Incorporated | Energy post qualification for phrase spotting |
US9484022B2 (en) | 2014-05-23 | 2016-11-01 | Google Inc. | Training multiple neural networks with different accuracy |
US9369113B2 (en) | 2014-06-20 | 2016-06-14 | Steve Yang | Impedance adjusting device |
US10360926B2 (en) | 2014-07-10 | 2019-07-23 | Analog Devices Global Unlimited Company | Low-complexity voice activity detection |
US9432769B1 (en) | 2014-07-30 | 2016-08-30 | Amazon Technologies, Inc. | Method and system for beam selection in microphone array beamformers |
US9953661B2 (en) | 2014-09-26 | 2018-04-24 | Cirrus Logic Inc. | Neural network voice activity detection employing running range normalization |
US9530400B2 (en) | 2014-09-29 | 2016-12-27 | Nuance Communications, Inc. | System and method for compressed domain language identification |
JP6450139B2 (ja) | 2014-10-10 | 2019-01-09 | 株式会社Nttドコモ | Speech recognition device, speech recognition method, and speech recognition program |
US20160275961A1 (en) * | 2015-03-18 | 2016-09-22 | Qualcomm Technologies International, Ltd. | Structure for multi-microphone speech enhancement system |
US10229700B2 (en) | 2015-09-24 | 2019-03-12 | Google Llc | Voice activity detection |
US9668073B2 (en) | 2015-10-07 | 2017-05-30 | Robert Bosch Gmbh | System and method for audio scene understanding of physical object sound sources |
US10347271B2 (en) * | 2015-12-04 | 2019-07-09 | Synaptics Incorporated | Semi-supervised system for multichannel source enhancement through configurable unsupervised adaptive transformations and supervised deep neural network |
US9978397B2 (en) | 2015-12-22 | 2018-05-22 | Intel Corporation | Wearer voice activity detection |
US10090005B2 (en) | 2016-03-10 | 2018-10-02 | Aspinity, Inc. | Analog voice activity detection |
KR102151682B1 (ko) * | 2016-03-23 | 2020-09-04 | Google LLC | Adaptive audio enhancement for multichannel speech recognition |
US9947323B2 (en) | 2016-04-01 | 2018-04-17 | Intel Corporation | Synthetic oversampling to enhance speaker identification or verification |
US11107461B2 (en) | 2016-06-01 | 2021-08-31 | Massachusetts Institute Of Technology | Low-power automatic speech recognition device |
US20180039478A1 (en) | 2016-08-02 | 2018-02-08 | Google Inc. | Voice interaction services |
EP3522152B1 (en) | 2016-09-30 | 2020-02-12 | Sony Corporation | Signal processing device, signal processing method, and program |
US9741360B1 (en) * | 2016-10-09 | 2017-08-22 | Spectimbre Inc. | Speech enhancement for target speakers |
US9881634B1 (en) * | 2016-12-01 | 2018-01-30 | Arm Limited | Multi-microphone speech processing system |
WO2018106971A1 (en) | 2016-12-07 | 2018-06-14 | Interactive Intelligence Group, Inc. | System and method for neural network based speaker classification |
US10546575B2 (en) | 2016-12-14 | 2020-01-28 | International Business Machines Corporation | Using recurrent neural network for partitioning of audio data into segments that each correspond to a speech feature cluster identifier |
US10083689B2 (en) | 2016-12-23 | 2018-09-25 | Intel Corporation | Linear scoring for low power wake on voice |
US10170134B2 (en) | 2017-02-21 | 2019-01-01 | Intel IP Corporation | Method and system of acoustic dereverberation factoring the actual non-ideal acoustic environment |
JP6652519B2 (ja) | 2017-02-28 | 2020-02-26 | Nippon Telegr & Teleph Corp <Ntt> | Steering vector estimation device, steering vector estimation method, and steering vector estimation program |
US10224053B2 (en) * | 2017-03-24 | 2019-03-05 | Hyundai Motor Company | Audio signal quality enhancement based on quantitative SNR analysis and adaptive Wiener filtering |
US10269369B2 (en) | 2017-05-31 | 2019-04-23 | Apple Inc. | System and method of noise reduction for a mobile device |
US10403299B2 (en) * | 2017-06-02 | 2019-09-03 | Apple Inc. | Multi-channel speech signal enhancement for robust voice trigger detection and automatic speech recognition |
US10096328B1 (en) | 2017-10-06 | 2018-10-09 | Intel Corporation | Beamformer system for tracking of speech and noise in a dynamic environment |
US10090000B1 (en) | 2017-11-01 | 2018-10-02 | GM Global Technology Operations LLC | Efficient echo cancellation using transfer function estimation |
US10504539B2 (en) | 2017-12-05 | 2019-12-10 | Synaptics Incorporated | Voice activity detection systems and methods |
US10777189B1 (en) | 2017-12-05 | 2020-09-15 | Amazon Technologies, Inc. | Dynamic wakeword detection |
US10679617B2 (en) | 2017-12-06 | 2020-06-09 | Synaptics Incorporated | Voice enhancement in audio signals through modified generalized eigenvalue beamformer |
WO2019126569A1 (en) | 2017-12-21 | 2019-06-27 | Synaptics Incorporated | Analog voice activity detector systems and methods |
US11062727B2 (en) | 2018-06-13 | 2021-07-13 | Ceva D.S.P Ltd. | System and method for voice activity detection |
JP7407580B2 (ja) | 2018-12-06 | 2024-01-04 | Synaptics Incorporated | System and method |
US11232788B2 (en) | 2018-12-10 | 2022-01-25 | Amazon Technologies, Inc. | Wakeword detection |
US11069353B1 (en) | 2019-05-06 | 2021-07-20 | Amazon Technologies, Inc. | Multilingual wakeword detection |
US11064294B1 (en) | 2020-01-10 | 2021-07-13 | Synaptics Incorporated | Multiple-source tracking and voice activity detections for planar microphone arrays |
US11308959B2 (en) | 2020-02-11 | 2022-04-19 | Spotify Ab | Dynamic adjustment of wake word acceptance tolerance thresholds in voice-controlled devices |
US11769520B2 (en) | 2020-08-17 | 2023-09-26 | EMC IP Holding Company LLC | Communication issue detection using evaluation of multiple machine learning models |
-
2019
- 2019-12-05 JP JP2019220476A patent/JP7407580B2/ja active Active
- 2019-12-06 CN CN201911241535.7A patent/CN111370014A/zh active Pending
- 2019-12-06 US US16/706,519 patent/US11158333B2/en active Active
-
2021
- 2021-09-24 US US17/484,208 patent/US11694710B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
US20220013134A1 (en) | 2022-01-13 |
JP7407580B2 (ja) | 2024-01-04 |
US11158333B2 (en) | 2021-10-26 |
US11694710B2 (en) | 2023-07-04 |
CN111370014A (zh) | 2020-07-03 |
US20200184985A1 (en) | 2020-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7407580B2 (ja) | System and method | |
US9940949B1 (en) | Dynamic adjustment of expression detection criteria | |
WO2020103703A1 (zh) | Audio data processing method, apparatus, device, and storage medium | |
WO2021022094A1 (en) | Per-epoch data augmentation for training acoustic models | |
US11257512B2 (en) | Adaptive spatial VAD and time-frequency mask estimation for highly non-stationary noise sources | |
US10535361B2 (en) | Speech enhancement using clustering of cues | |
JP2021110938A (ja) | Multiple-source tracking and voice activity detection for planar microphone arrays | |
JP2021505933A (ja) | Voice enhancement in audio signals through modified generalized eigenvalue beamformer | |
US11264017B2 (en) | Robust speaker localization in presence of strong noise interference systems and methods | |
EP2745293B1 (en) | Signal noise attenuation | |
US20220148611A1 (en) | Speech enhancement using clustering of cues | |
US20210201928A1 (en) | Integrated speech enhancement for voice trigger application | |
US20170206898A1 (en) | Systems and methods for assisting automatic speech recognition | |
US20220254332A1 (en) | Method and apparatus for normalizing features extracted from audio data for signal recognition or modification | |
JP7279710B2 (ja) | Signal processing device and method, and program | |
US10204638B2 (en) | Integrated sensor-array processor | |
WO2023212690A1 (en) | Audio source feature separation and target audio source generation | |
WO2023183684A1 (en) | Microphone array configuration invariant, streaming, multichannel neural enhancement frontend for automatic speech recognition | |
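JP2023551704A (ja) | Acoustic state estimator based on a subband-domain acoustic echo canceller | |
CN117795597A (zh) | Joint acoustic echo cancellation, speech enhancement, and voice separation for automatic speech recognition |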
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523; Effective date: 20221130 |
 | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621; Effective date: 20221130 |
 | A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007; Effective date: 20231024 |
 | TRDD | Decision of grant or rejection written | |
 | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01; Effective date: 20231206 |
 | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61; Effective date: 20231219 |
 | R150 | Certificate of patent or registration of utility model | Ref document number: 7407580; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |