JP6561219B1 - Speaker verification - Google Patents
Speaker verification
- Publication number
- JP6561219B1 (application number JP2019500442A)
- Authority
- JP
- Japan
- Prior art keywords
- vector
- user
- neural network
- user device
- language
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/22—Interactive procedures; Man-machine interfaces
          - G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
        - G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
        - G10L17/06—Decision making techniques; Pattern matching strategies
          - G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
          - G10L17/14—Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
        - G10L17/18—Artificial neural networks; Connectionist approaches
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Game Theory and Decision Science (AREA)
- Business, Economics & Management (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- User Interface Of Digital Computer (AREA)
- Machine Translation (AREA)
Abstract
Description
105a Hotword
105b Audio
110 User device
111 Microphone
113 "Speaker Identity Verified"
115 Audio greeting
115a Hotword
115b Audio
120 User device
121 Microphone
123 "Shuohuazhe de shenfen yanzheng"
125 Audio greeting
130 Network
140 Server
150 Neural network
180 Speaker verification model
200 System
210 User device
210a, 210b Training utterances
211 Microphone
213a First set of training data
213b Second set of training data
214a Training utterance vector
215a, 215b Language ID
230 Network
240 Server
250 Neural network
252 Input layer
254a, 254b, 254c Hidden layers
256 Output layer
258 Portion
260a Output
260b Output
270 Comparator
272 Output of comparison module
280 Language-independent speaker verification model
305, 310, 315, 320 One-hot language vectors
400 System
402 User
410a Hotword
410b Audio
414 Acoustic feature vector
415 Language ID
430 Reference vector
440 Comparison module
450 Verification module
460 Message
500 Process
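Reference signs 305 to 320 (one-hot language vectors), 414 (acoustic feature vector) and 415 (language ID) suggest that the language identifier reaches the network as a one-hot vector alongside the acoustic features. The following is a minimal sketch of one way to build such an input, assuming (this is not confirmed by the listing) that the two vectors are simply concatenated. The language inventory `LANGUAGES` and both function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical language inventory; the patent listing does not enumerate one.
LANGUAGES = ("en-US", "es-ES", "fr-FR", "zh-CN")


def one_hot_language_vector(language_id: str,
                            languages: Sequence[str] = LANGUAGES) -> List[float]:
    """Return a vector with 1.0 at the position of language_id and 0.0 elsewhere."""
    if language_id not in languages:
        raise ValueError(f"unknown language id: {language_id}")
    return [1.0 if lang == language_id else 0.0 for lang in languages]


def build_input_vector(acoustic_features: Sequence[float],
                       language_id: str) -> List[float]:
    """Concatenate the acoustic feature vector with the one-hot language vector."""
    return list(acoustic_features) + one_hot_language_vector(language_id)


print(build_input_vector([0.25, -0.5, 0.75], "fr-FR"))
# [0.25, -0.5, 0.75, 0.0, 0.0, 1.0, 0.0]
```

Concatenation keeps the input size fixed (feature length plus number of languages), so a single network can serve every language in the inventory.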
Claims (14)
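A system comprising one or more storage devices storing instructions operable to cause performance of operations comprising:
receiving, by a user device, audio data representing an utterance of a user;
providing, to a neural network stored on the user device, a set of input data derived from the audio data and from a language identifier or location identifier associated with the user device, wherein the neural network has parameters trained using speech data representing speech in different languages or different dialects;
generating, based on an output of the neural network produced in response to receiving the set of input data, a speaker representation indicative of characteristics of the voice of the user;
determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user; and
providing user access to the user device based on determining that the utterance is an utterance of the user.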
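The system of claim 2, further comprising:
providing the generated input vector to the neural network; and
generating, based on an output of the neural network produced in response to receiving the input vector, a speaker representation indicative of the characteristics of the voice of the user.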
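The system of claim 2, further comprising:
providing the generated input vector to the neural network; and
generating, based on an output of the neural network produced in response to receiving the input vector, a speaker representation indicative of the characteristics of the voice of the user.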
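The system of claim 2, further comprising:
providing the generated input vector to the neural network; and
generating, based on an output of the neural network produced in response to receiving the input vector, a speaker representation indicative of the characteristics of the voice of the user.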
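These dependent claims feed a generated input vector to the neural network and derive the speaker representation from its output. The following toy forward pass in pure Python illustrates that step under stated assumptions: fully connected layers with ReLU, the last hidden layer's activations L2-normalized and used as the speaker representation, and hand-picked weights. None of these details (layer sizes, activation, normalization, the function names) come from the patent text.

```python
import math


def dense(x, weights, biases):
    """Fully connected layer: returns W*x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def l2_normalize(v):
    """Scale v to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def speaker_representation(input_vector, hidden_layers):
    """Run the input through each (weights, biases) hidden layer with ReLU and
    return the L2-normalized activations of the last hidden layer."""
    h = list(input_vector)
    for weights, biases in hidden_layers:
        if any(len(row) != len(h) for row in weights):
            raise ValueError("weight rows must match the layer input size")
        h = relu(dense(h, weights, biases))
    return l2_normalize(h)


# Toy input: 2 acoustic features followed by a 2-way one-hot language vector.
x = [3.0, 3.0, 1.0, 0.0]
# One hidden layer; the second unit also sees the first language bit.
layers = [([[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 1.0, 0.0]], [0.0, 0.0])]
print(speaker_representation(x, layers))  # [0.6, 0.8]
```

Normalizing to unit length makes the representation's scale irrelevant, which pairs naturally with an angle-based comparison against an enrolled reference.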
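A method comprising:
providing, to a neural network stored on the user device, a set of input data derived from the audio data and from a language identifier or location identifier associated with the user device, wherein the neural network has parameters trained using speech data representing speech in different languages or dialects;
generating, based on an output of the neural network produced in response to receiving the set of input data, a speaker representation indicative of characteristics of the voice of the user;
determining, based on the speaker representation and a second representation, that the utterance is an utterance of the user; and
providing user access to the user device based on determining that the utterance is an utterance of the user.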
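The method of claim 8, further comprising:
providing the generated input vector to the neural network; and
generating, based on an output of the neural network produced in response to receiving the input vector, a speaker representation indicative of the characteristics of the voice of the user.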
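The method of claim 8, further comprising:
providing the generated input vector to the neural network; and
generating, based on an output of the neural network produced in response to receiving the input vector, a speaker representation indicative of the characteristics of the voice of the user.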
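The method of claim 8, further comprising:
providing the generated input vector to the neural network; and
generating, based on an output of the neural network produced in response to receiving the input vector, a speaker representation indicative of the characteristics of the voice of the user.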
The method of any one of claims 7 to 12, comprising determining a distance between the speaker representation and the second representation.
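The last claim decides whether the utterance belongs to the user by computing a distance between the speaker representation and the second (enrolled) representation. The claim text here does not name a metric or a threshold. This sketch assumes cosine distance with a hypothetical acceptance threshold of 0.3 and also shows Euclidean distance as an alternative metric; `THRESHOLD` and `is_same_speaker` are illustrative names only.

```python
import math


def cosine_distance(a, b):
    """1 - cos(angle between a and b); 0.0 when both point the same way."""
    if len(a) != len(b):
        raise ValueError("representations must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length representation")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two representations of equal length."""
    if len(a) != len(b):
        raise ValueError("representations must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical acceptance threshold; in practice it would be tuned on held-out data.
THRESHOLD = 0.3


def is_same_speaker(speaker_rep, reference_rep, threshold=THRESHOLD):
    """Accept the utterance as the enrolled user's if the distance is small enough."""
    return cosine_distance(speaker_rep, reference_rep) <= threshold


print(is_same_speaker([0.6, 0.8], [0.8, 0.6]))   # True  (distance is about 0.04)
print(is_same_speaker([0.6, 0.8], [0.8, -0.6]))  # False (distance is 1.0)
```

A `True` result corresponds to the step of granting the user access to the device; lowering the threshold trades more false rejections for fewer false acceptances.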
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/211,317 | 2016-07-15 | ||
US15/211,317 US20180018973A1 (en) | 2016-07-15 | 2016-07-15 | Speaker verification |
PCT/US2017/040906 WO2018013401A1 (en) | 2016-07-15 | 2017-07-06 | Speaker verification |
Publications (2)
Publication Number | Publication Date |
---|---|
JP6561219B1 | 2019-08-14 |
JP2019530888A | 2019-10-24 |
Family
ID=59366524
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019500442A (Active, JP6561219B1) | 2016-07-15 | 2017-07-06 | Speaker verification |
Country Status (7)
Country | Link |
---|---|
US (4) | US20180018973A1 |
EP (2) | EP3373294B1 |
JP (1) | JP6561219B1 |
KR (1) | KR102109874B1 |
CN (1) | CN108140386B |
RU (1) | RU2697736C1 |
WO (1) | WO2018013401A1 |
Families Citing this family (66)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11676608B2 (en) * | 2021-04-02 | 2023-06-13 | Google Llc | Speaker verification using co-location information |
CN106469040B (zh) * | 2015-08-19 | 2019-06-21 | 华为终端有限公司 | Communication method, server and device |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US20180018973A1 (en) | 2016-07-15 | 2018-01-18 | Google Inc. | Speaker verification |
CN106251859B (zh) * | 2016-07-22 | 2019-05-31 | 百度在线网络技术(北京)有限公司 | Speech recognition processing method and apparatus |
US11545146B2 (en) * | 2016-11-10 | 2023-01-03 | Cerence Operating Company | Techniques for language independent wake-up word detection |
US11276395B1 (en) * | 2017-03-10 | 2022-03-15 | Amazon Technologies, Inc. | Voice-based parameter assignment for voice-capturing devices |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK201770428A1 (en) | 2017-05-12 | 2019-02-18 | Apple Inc. | LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT |
EP3625792B1 (en) * | 2017-07-31 | 2023-05-10 | Beijing Didi Infinity Technology and Development Co., Ltd. | System and method for language-based service hailing |
US11817103B2 (en) * | 2017-09-15 | 2023-11-14 | Nec Corporation | Pattern recognition apparatus, pattern recognition method, and storage medium |
CN108305615B (zh) * | 2017-10-23 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Object recognition method and device, storage medium, and terminal |
US10916252B2 (en) | 2017-11-10 | 2021-02-09 | Nvidia Corporation | Accelerated data transfer for latency reduction and real-time processing |
KR102486395B1 (ko) * | 2017-11-23 | 2023-01-10 | 삼성전자주식회사 | Neural network apparatus for speaker recognition, and operating method thereof |
US10593321B2 (en) * | 2017-12-15 | 2020-03-17 | Mitsubishi Electric Research Laboratories, Inc. | Method and apparatus for multi-lingual end-to-end speech recognition |
US10783873B1 (en) * | 2017-12-15 | 2020-09-22 | Educational Testing Service | Native language identification with time delay deep neural networks trained separately on native and non-native english corpora |
CN111630934B (zh) * | 2018-01-22 | 2023-10-13 | 诺基亚技术有限公司 | Privacy-preserving voiceprint authentication apparatus and method |
CN108597525B (zh) * | 2018-04-25 | 2019-05-03 | 四川远鉴科技有限公司 | Voiceprint modeling method and apparatus for speech |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US11152006B2 (en) * | 2018-05-07 | 2021-10-19 | Microsoft Technology Licensing, Llc | Voice identification enrollment |
GB2573809B (en) | 2018-05-18 | 2020-11-04 | Emotech Ltd | Speaker Recognition |
WO2019227290A1 (en) * | 2018-05-28 | 2019-12-05 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for speech recognition |
JP6980603B2 (ja) * | 2018-06-21 | 2021-12-15 | 株式会社東芝 | Speaker model creation system, recognition system, program, and control device |
US10991379B2 (en) | 2018-06-22 | 2021-04-27 | Babblelabs Llc | Data driven audio enhancement |
CN110634489B (zh) * | 2018-06-25 | 2022-01-14 | 科大讯飞股份有限公司 | Voiceprint confirmation method, apparatus, device, and readable storage medium |
KR20200011796A (ko) * | 2018-07-25 | 2020-02-04 | 엘지전자 주식회사 | Speech recognition system |
CN110874875B (zh) * | 2018-08-13 | 2021-01-29 | 珠海格力电器股份有限公司 | Door lock control method and apparatus |
KR102492783B1 (ko) * | 2018-09-25 | 2023-01-27 | 구글 엘엘씨 | Speaker diarization using speaker embedding(s) and a trained generative model |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
CN110164452B (zh) * | 2018-10-10 | 2023-03-10 | 腾讯科技(深圳)有限公司 | Voiceprint recognition method, model training method, and server |
KR102623246B1 (ko) * | 2018-10-12 | 2024-01-11 | 삼성전자주식회사 | Electronic device, method for controlling electronic device, and computer-readable medium |
US11144542B2 (en) * | 2018-11-01 | 2021-10-12 | Visa International Service Association | Natural language processing system |
US11031017B2 (en) * | 2019-01-08 | 2021-06-08 | Google Llc | Fully supervised speaker diarization |
TW202029181A (zh) * | 2019-01-28 | 2020-08-01 | 正崴精密工業股份有限公司 | Method and apparatus for waking up a specific target using speech recognition |
US10978069B1 (en) * | 2019-03-18 | 2021-04-13 | Amazon Technologies, Inc. | Word selection for natural language interface |
US11948582B2 (en) * | 2019-03-25 | 2024-04-02 | Omilia Natural Language Solutions Ltd. | Systems and methods for speaker verification |
CN113646835A (zh) * | 2019-04-05 | 2021-11-12 | 谷歌有限责任公司 | Joint automatic speech recognition and speaker diarization |
WO2020223122A1 (en) * | 2019-04-30 | 2020-11-05 | Walmart Apollo, Llc | Systems and methods for processing retail facility-related information requests of retail facility workers |
US11158305B2 (en) * | 2019-05-05 | 2021-10-26 | Microsoft Technology Licensing, Llc | Online verification of custom wake word |
US11132992B2 (en) | 2019-05-05 | 2021-09-28 | Microsoft Technology Licensing, Llc | On-device custom wake word detection |
US11222622B2 (en) | 2019-05-05 | 2022-01-11 | Microsoft Technology Licensing, Llc | Wake word selection assistance architectures and methods |
US11468890B2 (en) | 2019-06-01 | 2022-10-11 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11031013B1 (en) | 2019-06-17 | 2021-06-08 | Express Scripts Strategic Development, Inc. | Task completion based on speech analysis |
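CN110400562B (zh) * | 2019-06-24 | 2022-03-22 | 歌尔科技有限公司 | Interaction processing method, apparatus, device, and audio device |
CN110415679B (zh) * | 2019-07-25 | 2021-12-17 | 北京百度网讯科技有限公司 | Speech error correction method, apparatus, device, and storage medium |
CN110379433B (zh) * | 2019-08-02 | 2021-10-08 | 清华大学 | Identity verification method, apparatus, computer device, and storage medium |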
EP4086904A1 (en) * | 2019-12-04 | 2022-11-09 | Google LLC | Speaker awareness using speaker dependent speech model(s) |
RU2723902C1 (ru) * | 2020-02-15 | 2020-06-18 | Илья Владимирович Редкокашин | Method for verifying voice biometric data |
JP7388239B2 (ja) * | 2020-02-21 | 2023-11-29 | 日本電信電話株式会社 | Verification device, verification method, and verification program |
CN111370003B (zh) * | 2020-02-27 | 2023-05-30 | 杭州雄迈集成电路技术股份有限公司 | Voiceprint comparison method based on a Siamese neural network |
US11651767B2 (en) | 2020-03-03 | 2023-05-16 | International Business Machines Corporation | Metric learning of speaker diarization |
US11443748B2 (en) * | 2020-03-03 | 2022-09-13 | International Business Machines Corporation | Metric learning of speaker diarization |
US20210287681A1 (en) * | 2020-03-16 | 2021-09-16 | Fidelity Information Services, Llc | Systems and methods for contactless authentication using voice recognition |
WO2021187146A1 (ja) * | 2020-03-16 | 2021-09-23 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | Information transmission device, information reception device, information transmission method, program, and system |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US11508380B2 (en) * | 2020-05-26 | 2022-11-22 | Apple Inc. | Personalized voices for text messaging |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
KR102277422B1 (ko) * | 2020-07-24 | 2021-07-19 | 이종엽 | Voice verification and restriction method for a voice terminal |
US11676572B2 (en) * | 2021-03-03 | 2023-06-13 | Google Llc | Instantaneous learning in text-to-speech during dialog |
US11776550B2 (en) * | 2021-03-09 | 2023-10-03 | Qualcomm Incorporated | Device operation based on dynamic classifier |
US11798562B2 (en) * | 2021-05-16 | 2023-10-24 | Google Llc | Attentive scoring function for speaker identification |
US20230137652A1 (en) * | 2021-11-01 | 2023-05-04 | Pindrop Security, Inc. | Cross-lingual speaker recognition |
Family Cites Families (160)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4799262A (en) | 1985-06-27 | 1989-01-17 | Kurzweil Applied Intelligence, Inc. | Speech recognition |
US4868867A (en) | 1987-04-06 | 1989-09-19 | Voicecraft Inc. | Vector excitation speech or audio coder for transmission or storage |
JP2733955B2 (ja) | 1988-05-18 | 1998-03-30 | 日本電気株式会社 | Adaptive speech recognition device |
US5465318A (en) | 1991-03-28 | 1995-11-07 | Kurzweil Applied Intelligence, Inc. | Method for generating a speech recognition model for a non-vocabulary utterance |
JP2979711B2 (ja) | 1991-04-24 | 1999-11-15 | 日本電気株式会社 | Pattern recognition scheme and standard pattern learning scheme |
US5680508A (en) | 1991-05-03 | 1997-10-21 | Itt Corporation | Enhancement of speech coding in background noise for low-rate speech coder |
EP0576765A1 (en) | 1992-06-30 | 1994-01-05 | International Business Machines Corporation | Method for coding digital data using vector quantizing techniques and device for implementing said method |
US5636325A (en) | 1992-11-13 | 1997-06-03 | International Business Machines Corporation | Speech synthesis and analysis of dialects |
US5627939A (en) | 1993-09-03 | 1997-05-06 | Microsoft Corporation | Speech recognition system and method employing data compression |
US5689616A (en) * | 1993-11-19 | 1997-11-18 | Itt Corporation | Automatic language identification/verification system |
US5509103A (en) | 1994-06-03 | 1996-04-16 | Motorola, Inc. | Method of training neural networks used for speech recognition |
US5542006A (en) | 1994-06-21 | 1996-07-30 | Eastman Kodak Company | Neural network based character position detector for use in optical character recognition |
US5729656A (en) | 1994-11-30 | 1998-03-17 | International Business Machines Corporation | Reduction of search space in speech recognition using phone boundaries and phone ranking |
US5839103A (en) * | 1995-06-07 | 1998-11-17 | Rutgers, The State University Of New Jersey | Speaker verification system using decision fusion logic |
US6067517A (en) | 1996-02-02 | 2000-05-23 | International Business Machines Corporation | Transcription of speech data with segments from acoustically dissimilar environments |
US5729694A (en) | 1996-02-06 | 1998-03-17 | The Regents Of The University Of California | Speech coding, reconstruction and recognition using acoustics and electromagnetic waves |
US5745872A (en) | 1996-05-07 | 1998-04-28 | Texas Instruments Incorporated | Method and system for compensating speech signals using vector quantization codebook adaptation |
US6038528A (en) | 1996-07-17 | 2000-03-14 | T-Netix, Inc. | Robust speech processing with affine transform replicated data |
EP0954854A4 (en) * | 1996-11-22 | 2000-07-19 | T Netix Inc | PARTIAL VALUE-BASED SPEAKER VERIFICATION BY UNIFYING DIFFERENT CLASSIFIERS USING CHANNEL, ASSOCIATION, MODEL AND THRESHOLD ADAPTATION |
US6260013B1 (en) | 1997-03-14 | 2001-07-10 | Lernout & Hauspie Speech Products N.V. | Speech recognition system employing discriminatively trained models |
KR100238189B1 (ko) | 1997-10-16 | 2000-01-15 | 윤종용 | Multilingual TTS apparatus and multilingual TTS processing method |
AU1305799A (en) * | 1997-11-03 | 1999-05-24 | T-Netix, Inc. | Model adaptation system and method for speaker verification |
US6188982B1 (en) | 1997-12-01 | 2001-02-13 | Industrial Technology Research Institute | On-line background noise adaptation of parallel model combination HMM with discriminative learning using weighted HMM for noisy speech recognition |
US6397179B2 (en) | 1997-12-24 | 2002-05-28 | Nortel Networks Limited | Search optimization system and method for continuous speech recognition |
US6381569B1 (en) | 1998-02-04 | 2002-04-30 | Qualcomm Incorporated | Noise-compensated speech recognition templates |
US6434520B1 (en) | 1999-04-16 | 2002-08-13 | International Business Machines Corporation | System and method for indexing and querying audio archives |
US6665644B1 (en) | 1999-08-10 | 2003-12-16 | International Business Machines Corporation | Conversational data mining |
GB9927528D0 (en) | 1999-11-23 | 2000-01-19 | Ibm | Automatic language identification |
DE10018134A1 (de) | 2000-04-12 | 2001-10-18 | Siemens Ag | Method and device for determining prosodic markers |
US6631348B1 (en) | 2000-08-08 | 2003-10-07 | Intel Corporation | Dynamic speech recognition pattern switching for enhanced speech recognition accuracy |
DE10047172C1 (de) | 2000-09-22 | 2001-11-29 | Siemens Ag | Method for speech processing |
US6876966B1 (en) | 2000-10-16 | 2005-04-05 | Microsoft Corporation | Pattern recognition training method and apparatus using inserted noise followed by noise reduction |
JP4244514B2 (ja) | 2000-10-23 | 2009-03-25 | セイコーエプソン株式会社 | Speech recognition method and speech recognition device |
US7280969B2 (en) | 2000-12-07 | 2007-10-09 | International Business Machines Corporation | Method and apparatus for producing natural sounding pitch contours in a speech synthesizer |
GB2370401A (en) | 2000-12-19 | 2002-06-26 | Nokia Mobile Phones Ltd | Speech recognition |
US7062442B2 (en) | 2001-02-23 | 2006-06-13 | Popcatcher Ab | Method and arrangement for search and recording of media signals |
GB2375673A (en) | 2001-05-14 | 2002-11-20 | Salgen Systems Ltd | Image compression method using a table of hash values corresponding to motion vectors |
GB2375935A (en) | 2001-05-22 | 2002-11-27 | Motorola Inc | Speech quality indication |
GB0113581D0 (en) | 2001-06-04 | 2001-07-25 | Hewlett Packard Co | Speech synthesis apparatus |
US7668718B2 (en) | 2001-07-17 | 2010-02-23 | Custom Speech Usa, Inc. | Synchronized pattern recognition source data processed by manual or automatic means for creation of shared speaker-dependent speech user profile |
US20030033143A1 (en) | 2001-08-13 | 2003-02-13 | Hagai Aronowitz | Decreasing noise sensitivity in speech processing under adverse conditions |
US7571095B2 (en) | 2001-08-15 | 2009-08-04 | Sri International | Method and apparatus for recognizing speech in a noisy environment |
US7043431B2 (en) | 2001-08-31 | 2006-05-09 | Nokia Corporation | Multilingual speech recognition system using text derived recognition models |
US6950796B2 (en) | 2001-11-05 | 2005-09-27 | Motorola, Inc. | Speech recognition by dynamical noise model adaptation |
US7286987B2 (en) | 2002-06-28 | 2007-10-23 | Conceptual Speech Llc | Multi-phoneme streamer and knowledge representation speech recognition system and method |
US7290207B2 (en) | 2002-07-03 | 2007-10-30 | Bbn Technologies Corp. | Systems and methods for providing multimedia information management |
US6756821B2 (en) * | 2002-07-23 | 2004-06-29 | Broadcom | High speed differential signaling logic gate and applications thereof |
JP4352790B2 (ja) | 2002-10-31 | 2009-10-28 | セイコーエプソン株式会社 | Acoustic model creation method, speech recognition device, and vehicle having the speech recognition device |
US20040111272A1 (en) | 2002-12-10 | 2004-06-10 | International Business Machines Corporation | Multimodal speech-to-speech language translation and display |
US7593842B2 (en) | 2002-12-10 | 2009-09-22 | Leslie Rousseau | Device and method for translating language |
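KR100486735B1 (ko) | 2003-02-28 | 2005-05-03 | 삼성전자주식회사 | Method for constructing an optimal-partition classification neural network, and automatic labeling method and apparatus using the optimal-partition classification neural network |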
US7571097B2 (en) | 2003-03-13 | 2009-08-04 | Microsoft Corporation | Method for training of subspace coded gaussian models |
US8849185B2 (en) | 2003-04-15 | 2014-09-30 | Ipventure, Inc. | Hybrid audio delivery system and method therefor |
JP2004325897A (ja) | 2003-04-25 | 2004-11-18 | Pioneer Electronic Corp | Speech recognition device and speech recognition method |
US7275032B2 (en) | 2003-04-25 | 2007-09-25 | Bvoice Corporation | Telephone call handling center where operators utilize synthesized voices generated or modified to exhibit or omit prescribed speech characteristics |
US7499857B2 (en) | 2003-05-15 | 2009-03-03 | Microsoft Corporation | Adaptation of compressed acoustic models |
US20040260550A1 (en) | 2003-06-20 | 2004-12-23 | Burges Chris J.C. | Audio processing system and method for classifying speakers in audio data |
JP4548646B2 (ja) | 2003-09-12 | 2010-09-22 | 株式会社エヌ・ティ・ティ・ドコモ | Noise adaptation system for speech models, noise adaptation method, and speech recognition noise adaptation program |
US20050144003A1 (en) | 2003-12-08 | 2005-06-30 | Nokia Corporation | Multi-lingual speech synthesis |
FR2865846A1 (fr) | 2004-02-02 | 2005-08-05 | France Telecom | Speech synthesis system |
FR2867598B1 (fr) | 2004-03-12 | 2006-05-26 | Thales Sa | Method for real-time automatic identification of languages in an audio signal, and device for implementing it |
US20050228673A1 (en) | 2004-03-30 | 2005-10-13 | Nefian Ara V | Techniques for separating and evaluating audio and video source data |
FR2868586A1 (fr) | 2004-03-31 | 2005-10-07 | France Telecom | Improved method and system for converting a voice signal |
US20050267755A1 (en) | 2004-05-27 | 2005-12-01 | Nokia Corporation | Arrangement for speech recognition |
US7406408B1 (en) | 2004-08-24 | 2008-07-29 | The United States Of America As Represented By The Director, National Security Agency | Method of recognizing phones in speech of any language |
US7418383B2 (en) | 2004-09-03 | 2008-08-26 | Microsoft Corporation | Noise robust speech recognition with a switching linear dynamic model |
US7797156B2 (en) | 2005-02-15 | 2010-09-14 | Raytheon Bbn Technologies Corp. | Speech analyzing system with adaptive noise codebook |
US20060253272A1 (en) | 2005-05-06 | 2006-11-09 | International Business Machines Corporation | Voice prompts for use in speech-to-speech translation system |
CN101176146B (zh) | 2005-05-18 | 2011-05-18 | 松下电器产业株式会社 | Speech synthesis device |
WO2006126216A1 (en) | 2005-05-24 | 2006-11-30 | Loquendo S.P.A. | Automatic text-independent, language-independent speaker voice-print creation and speaker recognition |
US20070088552A1 (en) | 2005-10-17 | 2007-04-19 | Nokia Corporation | Method and a device for speech recognition |
US20070118372A1 (en) | 2005-11-23 | 2007-05-24 | General Electric Company | System and method for generating closed captions |
US7991770B2 (en) | 2005-11-29 | 2011-08-02 | Google Inc. | Detecting repeating content in broadcast media |
US7539616B2 (en) * | 2006-02-20 | 2009-05-26 | Microsoft Corporation | Speaker authentication using adapted background models |
US20080004858A1 (en) | 2006-06-29 | 2008-01-03 | International Business Machines Corporation | Apparatus and method for integrated phrase-based and free-form speech-to-speech translation |
US7996222B2 (en) | 2006-09-29 | 2011-08-09 | Nokia Corporation | Prosody conversion |
CN101166017B (zh) | 2006-10-20 | 2011-12-07 | 松下电器产业株式会社 | Automatic noise compensation method and apparatus for a sound generating device |
US8204739B2 (en) | 2008-04-15 | 2012-06-19 | Mobile Technologies, Llc | System and methods for maintaining speech-to-speech translation in the field |
CA2676380C (en) | 2007-01-23 | 2015-11-24 | Infoture, Inc. | System and method for detection and analysis of speech |
US7848924B2 (en) | 2007-04-17 | 2010-12-07 | Nokia Corporation | Method, apparatus and computer program product for providing voice conversion using temporal dynamic features |
US20080300875A1 (en) | 2007-06-04 | 2008-12-04 | Texas Instruments Incorporated | Efficient Speech Recognition with Cluster Methods |
CN101359473A (zh) | 2007-07-30 | 2009-02-04 | 国际商业机器公司 | Method and apparatus for automatic voice conversion |
GB2453366B (en) | 2007-10-04 | 2011-04-06 | Toshiba Res Europ Ltd | Automatic speech recognition method and apparatus |
JP4944241B2 (ja) * | 2008-03-14 | 2012-05-30 | 名古屋油化株式会社 | Release sheet and molded article |
US8615397B2 (en) | 2008-04-04 | 2013-12-24 | Intuit Inc. | Identifying audio content using distorted target patterns |
CN101562013B (zh) * | 2008-04-15 | 2013-05-22 | 联芯科技有限公司 | Method and apparatus for automatic speech recognition |
US8374873B2 (en) | 2008-08-12 | 2013-02-12 | Morphism, Llc | Training and applying prosody models |
US20100057435A1 (en) | 2008-08-29 | 2010-03-04 | Kent Justin R | System and method for speech-to-speech translation |
US8239195B2 (en) | 2008-09-23 | 2012-08-07 | Microsoft Corporation | Adapting a compressed model for use in speech recognition |
US8332223B2 (en) * | 2008-10-24 | 2012-12-11 | Nuance Communications, Inc. | Speaker verification methods and apparatus |
CA2748695C (en) | 2008-12-31 | 2017-11-07 | Bce Inc. | System and method for unlocking a device |
US20100198577A1 (en) | 2009-02-03 | 2010-08-05 | Microsoft Corporation | State mapping for cross-language speaker adaptation |
US8380507B2 (en) | 2009-03-09 | 2013-02-19 | Apple Inc. | Systems and methods for determining the language to use for speech generated by a text to speech engine |
EP2406787B1 (en) | 2009-03-11 | 2014-05-14 | Google, Inc. | Audio classification for information retrieval using sparse features |
US9009039B2 (en) | 2009-06-12 | 2015-04-14 | Microsoft Technology Licensing, Llc | Noise adaptive training for speech recognition |
US20110238407A1 (en) | 2009-08-31 | 2011-09-29 | O3 Technologies, Llc | Systems and methods for speech-to-speech translation |
US8886531B2 (en) | 2010-01-13 | 2014-11-11 | Rovi Technologies Corporation | Apparatus and method for generating an audio fingerprint and using a two-stage query |
US8700394B2 (en) | 2010-03-24 | 2014-04-15 | Microsoft Corporation | Acoustic model adaptation using splines |
US8234111B2 (en) | 2010-06-14 | 2012-07-31 | Google Inc. | Speech and noise models for speech recognition |
US20110313762A1 (en) | 2010-06-20 | 2011-12-22 | International Business Machines Corporation | Speech output with confidence indication |
US8725506B2 (en) | 2010-06-30 | 2014-05-13 | Intel Corporation | Speech audio processing |
EP2609587B1 (en) | 2010-08-24 | 2015-04-01 | Veovox SA | System and method for recognizing a user voice command in noisy environment |
US8782012B2 (en) | 2010-08-27 | 2014-07-15 | International Business Machines Corporation | Network analysis |
EP2431969B1 (de) | 2010-09-15 | 2013-04-03 | Svox AG | Speech recognition with low computational cost and reduced quantization error |
US8972253B2 (en) | 2010-09-15 | 2015-03-03 | Microsoft Technology Licensing, Llc | Deep belief network for large vocabulary continuous speech recognition |
US9318114B2 (en) | 2010-11-24 | 2016-04-19 | At&T Intellectual Property I, L.P. | System and method for generating challenge utterances for speaker verification |
US20120143604A1 (en) | 2010-12-07 | 2012-06-07 | Rita Singh | Method for Restoring Spectral Components in Denoised Speech Signals |
TWI413105B (zh) | 2010-12-30 | 2013-10-21 | Ind Tech Res Inst | Multilingual text-to-speech synthesis system and method |
US9286886B2 (en) | 2011-01-24 | 2016-03-15 | Nuance Communications, Inc. | Methods and apparatus for predicting prosody in speech synthesis |
US8594993B2 (en) | 2011-04-04 | 2013-11-26 | Microsoft Corporation | Frame mapping approach for cross-lingual voice transformation |
US8260615B1 (en) | 2011-04-25 | 2012-09-04 | Google Inc. | Cross-lingual initialization of language models |
ES2459391T3 (es) | 2011-06-06 | 2014-05-09 | Bridge Mediatech, S.L. | Method and system for achieving channel-invariant audio hashing |
US8768707B2 (en) * | 2011-09-27 | 2014-07-01 | Sensory Incorporated | Background speech recognition assistant using speaker verification |
US9235799B2 (en) | 2011-11-26 | 2016-01-12 | Microsoft Technology Licensing, Llc | Discriminative pretraining of deep neural networks |
CN103562993B (zh) * | 2011-12-16 | 2015-05-27 | 华为技术有限公司 | Speaker recognition method and device |
US9137600B2 (en) | 2012-02-16 | 2015-09-15 | 2236008 Ontario Inc. | System and method for dynamic residual noise shaping |
US9042867B2 (en) | 2012-02-24 | 2015-05-26 | Agnitio S.L. | System and method for speaker recognition on mobile devices |
JP5875414B2 (ja) | 2012-03-07 | 2016-03-02 | International Business Machines Corporation | Noise suppression method, program, and apparatus |
WO2013149123A1 (en) | 2012-03-30 | 2013-10-03 | The Ohio State University | Monaural speech filter |
US9368104B2 (en) | 2012-04-30 | 2016-06-14 | Src, Inc. | System and method for synthesizing human speech using multiple speakers and context |
US20130297299A1 (en) | 2012-05-07 | 2013-11-07 | Board Of Trustees Of Michigan State University | Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech and Speaker Recognition |
US9489950B2 (en) | 2012-05-31 | 2016-11-08 | Agency For Science, Technology And Research | Method and system for dual scoring for text-dependent speaker verification |
US9123338B1 (en) | 2012-06-01 | 2015-09-01 | Google Inc. | Background audio identification for speech disambiguation |
US9704068B2 (en) | 2012-06-22 | 2017-07-11 | Google Inc. | System and method for labelling aerial images |
US9536528B2 (en) * | 2012-07-03 | 2017-01-03 | Google Inc. | Determining hotword suitability |
US9153230B2 (en) * | 2012-10-23 | 2015-10-06 | Google Inc. | Mobile speech recognition hardware accelerator |
US9336771B2 (en) | 2012-11-01 | 2016-05-10 | Google Inc. | Speech recognition using non-parametric models |
US9477925B2 (en) | 2012-11-20 | 2016-10-25 | Microsoft Technology Licensing, Llc | Deep neural networks training for speech and pattern recognition |
US9263036B1 (en) | 2012-11-29 | 2016-02-16 | Google Inc. | System and method for speech recognition using deep recurrent neural networks |
US20140156575A1 (en) | 2012-11-30 | 2014-06-05 | Nuance Communications, Inc. | Method and Apparatus of Processing Data Using Deep Belief Networks Employing Low-Rank Matrix Factorization |
US9230550B2 (en) * | 2013-01-10 | 2016-01-05 | Sensory, Incorporated | Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination |
US9502038B2 (en) * | 2013-01-28 | 2016-11-22 | Tencent Technology (Shenzhen) Company Limited | Method and device for voiceprint recognition |
US9454958B2 (en) | 2013-03-07 | 2016-09-27 | Microsoft Technology Licensing, Llc | Exploiting heterogeneous data in deep neural network-based speech recognition systems |
US9361885B2 (en) | 2013-03-12 | 2016-06-07 | Nuance Communications, Inc. | Methods and apparatus for detecting a voice command |
US9728184B2 (en) | 2013-06-18 | 2017-08-08 | Microsoft Technology Licensing, Llc | Restructuring deep neural network acoustic models |
JP5734354B2 (ja) * | 2013-06-26 | 2015-06-17 | ファナック株式会社 | Tool clamping device |
US9311915B2 (en) | 2013-07-31 | 2016-04-12 | Google Inc. | Context-based speech recognition |
US9679258B2 (en) | 2013-10-08 | 2017-06-13 | Google Inc. | Methods and apparatus for reinforcement learning |
US9401148B2 (en) * | 2013-11-04 | 2016-07-26 | Google Inc. | Speaker verification using neural networks |
US9620145B2 (en) | 2013-11-01 | 2017-04-11 | Google Inc. | Context-dependent state tying using a neural network |
US9715660B2 (en) | 2013-11-04 | 2017-07-25 | Google Inc. | Transfer learning for deep neural network based hotword detection |
US9514753B2 (en) * | 2013-11-04 | 2016-12-06 | Google Inc. | Speaker identification using hash-based indexing |
CN104700831B (zh) * | 2013-12-05 | 2018-03-06 | International Business Machines Corporation | Method and apparatus for analyzing speech features of an audio file |
US8965112B1 (en) | 2013-12-09 | 2015-02-24 | Google Inc. | Sequence transcription with deep neural networks |
US9195656B2 (en) | 2013-12-30 | 2015-11-24 | Google Inc. | Multilingual prosody generation |
US9589564B2 (en) | 2014-02-05 | 2017-03-07 | Google Inc. | Multiple speech locale-specific hotword classifiers for selection of a speech locale |
US20150228277A1 (en) | 2014-02-11 | 2015-08-13 | Malaspina Labs (Barbados), Inc. | Voiced Sound Pattern Detection |
US10102848B2 (en) | 2014-02-28 | 2018-10-16 | Google Llc | Hotwords presentation framework |
US9412358B2 (en) * | 2014-05-13 | 2016-08-09 | At&T Intellectual Property I, L.P. | System and method for data-driven socially customized models for language generation |
US9728185B2 (en) | 2014-05-22 | 2017-08-08 | Google Inc. | Recognizing speech using neural networks |
US20150364129A1 (en) | 2014-06-17 | 2015-12-17 | Google Inc. | Language Identification |
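CN104008751A (zh) * | 2014-06-18 | 2014-08-27 | Zhou Tingting | Speaker recognition method based on a BP neural network |
CN104168270B (zh) * | 2014-07-31 | 2016-01-13 | Tencent Technology (Shenzhen) Co., Ltd. | Identity verification method, server, client, and system |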
US9378731B2 (en) | 2014-09-25 | 2016-06-28 | Google Inc. | Acoustic model training corpus selection |
US9299347B1 (en) | 2014-10-22 | 2016-03-29 | Google Inc. | Speech recognition using associative mapping |
US9418656B2 (en) * | 2014-10-29 | 2016-08-16 | Google Inc. | Multi-stage hotword detection |
CN104732978B (zh) * | 2015-03-12 | 2018-05-08 | Shanghai Jiao Tong University | Text-dependent speaker recognition method based on joint deep learning |
EP3067884B1 (en) * | 2015-03-13 | 2019-05-08 | Samsung Electronics Co., Ltd. | Speech recognition system and speech recognition method thereof |
US9978374B2 (en) * | 2015-09-04 | 2018-05-22 | Google Llc | Neural networks for speaker verification |
US20180018973A1 (en) | 2016-07-15 | 2018-01-18 | Google Inc. | Speaker verification |
- 2016
  - 2016-07-15 US US15/211,317, published as US20180018973A1 (en): not active, abandoned
- 2017
  - 2017-07-06 KR KR1020187009479A, granted as KR102109874B1 (ko): active, IP right grant
  - 2017-07-06 EP EP18165912.9A, granted as EP3373294B1 (en): active
  - 2017-07-06 EP EP17740860.6A, granted as EP3345181B1 (en): active
  - 2017-07-06 JP JP2019500442A, granted as JP6561219B1 (ja): active
  - 2017-07-06 CN CN201780003481.3A, granted as CN108140386B (zh): active
  - 2017-07-06 WO PCT/US2017/040906, published as WO2018013401A1 (en): active, application filing
  - 2017-07-06 RU RU2018112272A, granted as RU2697736C1 (ru): active
- 2018
  - 2018-06-01 US US15/995,480, granted as US10403291B2 (en): active
- 2019
  - 2019-08-30 US US16/557,390, granted as US11017784B2 (en): active
- 2021
  - 2021-05-04 US US17/307,704, granted as US11594230B2 (en): active
Also Published As
Publication number | Publication date |
---|---|
US20190385619A1 (en) | 2019-12-19 |
US20180018973A1 (en) | 2018-01-18 |
EP3345181B1 (en) | 2019-01-09 |
CN108140386A (zh) | 2018-06-08 |
US20180277124A1 (en) | 2018-09-27 |
KR102109874B1 (ko) | 2020-05-12 |
US11017784B2 (en) | 2021-05-25 |
US10403291B2 (en) | 2019-09-03 |
JP2019530888A (ja) | 2019-10-24 |
EP3345181A1 (en) | 2018-07-11 |
EP3373294B1 (en) | 2019-12-18 |
EP3373294A1 (en) | 2018-09-12 |
KR20180050365A (ko) | 2018-05-14 |
WO2018013401A1 (en) | 2018-01-18 |
CN108140386B (zh) | 2021-11-23 |
RU2697736C1 (ru) | 2019-08-19 |
US20210256981A1 (en) | 2021-08-19 |
US11594230B2 (en) | 2023-02-28 |
Similar Documents
Publication | Title |
---|---|
JP6561219B1 (ja) | Speaker verification |
US10255922B1 (en) | Speaker identification using a text-independent model and a text-dependent model |
US10476872B2 (en) | Joint speaker authentication and key phrase identification |
US10446141B2 (en) | Automatic speech recognition based on user feedback |
KR20160011709A (ko) | Method, apparatus, and system for payment verification |
US11416593B2 (en) | Electronic device, control method for electronic device, and control program for electronic device |
JP4143541B2 (ja) | Method and system for non-intrusive speaker verification using behavior models |
JP7339116B2 (ja) | Voice authentication device, voice authentication system, and voice authentication method |
KR20230156145A (ko) | Hybrid multilingual text-dependent and text-independent speaker verification |
JP4245948B2 (ja) | Voice authentication device, voice authentication method, and voice authentication program |
AU2019100034B4 (en) | Improving automatic speech recognition based on user feedback |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2019-02-08 | A521 | Request for written amendment filed | Japanese intermediate code: A523 |
2019-02-08 | A621 | Written request for application examination | Japanese intermediate code: A621 |
2019-02-08 | A871 | Explanation of circumstances concerning accelerated examination | Japanese intermediate code: A871 |
2019-05-27 | A975 | Report on accelerated examination | Japanese intermediate code: A971005 |
| TRDD | Decision of grant or rejection written | |
2019-06-24 | A01 | Written decision to grant a patent or to grant a registration (utility model) | Japanese intermediate code: A01 |
2019-07-22 | A61 | First payment of annual fees (during grant procedure) | Japanese intermediate code: A61 |
| R150 | Certificate of patent or registration of utility model | Ref document number: 6561219; country of ref document: JP; Japanese intermediate code: R150 |
| R250 | Receipt of annual fees | Japanese intermediate code: R250 |
| R250 | Receipt of annual fees | Japanese intermediate code: R250 |