CN116583899A - User speech profile management - Google Patents

User speech profile management

Info

Publication number
CN116583899A
Authority
CN
China
Prior art keywords
audio
feature data
speaker
user voice
audio feature
Prior art date
2020-12-08
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180080295.6A
Other languages
English (en)
Chinese (zh)
Inventor
S·J·朴
S·穆恩
金莱轩
E·维瑟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-08
Filing date
2021-09-28
Publication date
2023-08-11
Application filed by Qualcomm Inc
Publication of CN116583899A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G06F1/3231 Monitoring the presence, absence or movement of users
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)
CN202180080295.6A 2020-12-08 2021-09-28 User speech profile management Pending CN116583899A (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US17/115,158 2020-12-08
US17/115,158 US11626104B2 (en) 2020-12-08 2020-12-08 User speech profile management
PCT/US2021/071617 WO2022126040A1 (en) 2020-12-08 2021-09-28 User speech profile management

Publications (1)

Publication Number Publication Date
CN116583899A (zh) 2023-08-11

Family

Family ID: 78303075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180080295.6A Pending CN116583899A (zh) 2020-12-08 2021-09-28 User speech profile management

Country Status (6)

Country Link
US (1) US11626104B2
EP (1) EP4260314A1
JP (1) JP7753363B2
KR (1) KR20230118089A
CN (1) CN116583899A
WO (1) WO2022126040A1

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11929077B2 (en) * 2019-12-23 2024-03-12 Dts Inc. Multi-stage speaker enrollment in voice authentication and identification
US11462218B1 (en) * 2020-04-29 2022-10-04 Amazon Technologies, Inc. Conserving battery while detecting for human voice
US12198677B2 (en) * 2022-05-27 2025-01-14 Tencent America LLC Techniques for end-to-end speaker diarization with generalized neural speaker clustering
KR102516391B1 (ko) * 2022-09-02 2023-04-03 주식회사 액션파워 Method for detecting speech segments in audio in consideration of speech segment length
CN116364063B (zh) * 2023-06-01 2023-09-05 蔚来汽车科技(安徽)有限公司 Phoneme alignment method, device, driving device, and medium
WO2025254947A1 (en) * 2024-06-04 2025-12-11 Qualcomm Incorporated Speech profile management

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120253811A1 (en) * 2011-03-30 2012-10-04 Kabushiki Kaisha Toshiba Speech processing system and method
US20180004925A1 (en) * 2015-01-13 2018-01-04 Validsoft Uk Limited Authentication method
JP2019008131A (ja) * 2017-06-23 2019-01-17 日本電信電話株式会社 Speaker determination device, speaker determination information generation method, and program
EP3627505A1 (en) * 2018-09-21 2020-03-25 Televic Conference NV Real-time speaker identification with diarization
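CN110998717A (zh) * 2018-04-16 2020-04-10 谷歌有限责任公司 Automatically determining the language for speech recognition of a spoken utterance received via an automated assistant interface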
US20200194006A1 (en) * 2017-09-11 2020-06-18 Telefonaktiebolaget Lm Ericsson (Publ) Voice-Controlled Management of User Profiles

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6424946B1 (en) 1999-04-09 2002-07-23 International Business Machines Corporation Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering
WO2005122141A1 (en) * 2004-06-09 2005-12-22 Canon Kabushiki Kaisha Effective audio segmentation and classification
US7536304B2 (en) * 2005-05-27 2009-05-19 Porticus, Inc. Method and system for bio-metric voice print authentication
US8630854B2 (en) * 2010-08-31 2014-01-14 Fujitsu Limited System and method for generating videoconference transcriptions
US9898723B2 (en) * 2012-12-19 2018-02-20 Visa International Service Association System and method for voice authentication
US9666204B2 (en) * 2014-04-30 2017-05-30 Qualcomm Incorporated Voice profile management and speech signal generation
WO2016022588A1 (en) * 2014-08-04 2016-02-11 Flagler Llc Voice tallying system
US10373612B2 (en) * 2016-03-21 2019-08-06 Amazon Technologies, Inc. Anchored speech detection and speech recognition
US11398218B1 (en) * 2018-04-26 2022-07-26 United Services Automobile Association (Usaa) Dynamic speech output configuration
US10991379B2 (en) * 2018-06-22 2021-04-27 Babblelabs Llc Data driven audio enhancement
US11024291B2 (en) * 2018-11-21 2021-06-01 Sri International Real-time class recognition for an audio stream
US11545156B2 (en) * 2020-05-27 2023-01-03 Microsoft Technology Licensing, Llc Automated meeting minutes generation service

Also Published As

Publication number Publication date
US20220180859A1 (en) 2022-06-09
KR20230118089A (ko) 2023-08-10
WO2022126040A1 (en) 2022-06-16
TW202223877A (zh) 2022-06-16
US11626104B2 (en) 2023-04-11
JP7753363B2 (ja) 2025-10-14
JP2023553867A (ja) 2023-12-26
EP4260314A1 (en) 2023-10-18

Similar Documents

Publication Publication Date Title
US12567435B1 (en) Context driven device arbitration
CN116583899A (zh) User speech profile management
US12125483B1 (en) Determining device groups
US11545147B2 (en) Utterance classifier
US10320780B2 (en) Shared secret voice authentication
CN108351872B (zh) Method and system for responding to user speech
EP3210205B1 (en) Sound sample verification for generating sound detection model
JP2021033051A (ja) Information processing device, information processing method, and program
US20190355352A1 (en) Voice and conversation recognition system
CN105210146A (zh) Method and device for controlling voice activation
US20240212689A1 (en) Speaker-specific speech filtering for multiple users
CN112585674B (zh) Information processing device, information processing method, and storage medium
US12567414B2 (en) System and method for detecting a wakeup command for a voice assistant
CN110024027A (zh) Speaker recognition
US11205433B2 (en) Method and apparatus for activating speech recognition
CN115310066A (zh) Upgrade method and apparatus, and electronic device
TWI918728B (zh) User speech profile management
US20240212669A1 (en) Speech filter for speech processing
US20250372098A1 (en) Speech profile management
US20240419731A1 (en) Knowledge-based audio scene graph
WO2024053915A1 (en) System and method for detecting a wakeup command for a voice assistant
TW202605803A (zh) Speech profile management
EP4728513A1 (en) Knowledge-based audio scene graph
WO2025254947A1 (en) Speech profile management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination