CN116583899A - User speech profile management - Google Patents
- Publication number
- CN116583899A (application CN202180080295.6A)
- Authority
- CN
- China
- Prior art keywords
- audio
- feature data
- speaker
- user voice
- audio feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
- G06F1/3231—Monitoring the presence, absence or movement of users
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/115,158 | 2020-12-08 | ||
| US17/115,158 US11626104B2 (en) | 2020-12-08 | 2020-12-08 | User speech profile management |
| PCT/US2021/071617 WO2022126040A1 (en) | 2020-12-08 | 2021-09-28 | User speech profile management |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116583899A (zh) | 2023-08-11 |
Family
ID=78303075
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202180080295.6A (pending) CN116583899A (zh) | 2020-12-08 | 2021-09-28 | User speech profile management |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US11626104B2 |
| EP (1) | EP4260314A1 |
| JP (1) | JP7753363B2 |
| KR (1) | KR20230118089A |
| CN (1) | CN116583899A |
| WO (1) | WO2022126040A1 |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11929077B2 (en) * | 2019-12-23 | 2024-03-12 | Dts Inc. | Multi-stage speaker enrollment in voice authentication and identification |
| US11462218B1 (en) * | 2020-04-29 | 2022-10-04 | Amazon Technologies, Inc. | Conserving battery while detecting for human voice |
| US12198677B2 (en) * | 2022-05-27 | 2025-01-14 | Tencent America LLC | Techniques for end-to-end speaker diarization with generalized neural speaker clustering |
| KR102516391B1 (ko) * | 2022-09-02 | 2023-04-03 | 주식회사 액션파워 | Method for detecting speech segments in audio in consideration of speech segment length |
| CN116364063B (zh) * | 2023-06-01 | 2023-09-05 | 蔚来汽车科技(安徽)有限公司 | Phoneme alignment method, device, driving device, and medium |
| WO2025254947A1 (en) * | 2024-06-04 | 2025-12-11 | Qualcomm Incorporated | Speech profile management |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120253811A1 (en) * | 2011-03-30 | 2012-10-04 | Kabushiki Kaisha Toshiba | Speech processing system and method |
| US20180004925A1 (en) * | 2015-01-13 | 2018-01-04 | Validsoft Uk Limited | Authentication method |
| JP2019008131A (ja) * | 2017-06-23 | 2019-01-17 | 日本電信電話株式会社 | Speaker determination device, speaker determination information generation method, and program |
| EP3627505A1 (en) * | 2018-09-21 | 2020-03-25 | Televic Conference NV | Real-time speaker identification with diarization |
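| CN110998717A (zh) * | 2018-04-16 | 2020-04-10 | 谷歌有限责任公司 | Automatically determining the language for speech recognition of a spoken utterance received through an automated assistant interface |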
| US20200194006A1 (en) * | 2017-09-11 | 2020-06-18 | Telefonaktiebolaget Lm Ericsson (Publ) | Voice-Controlled Management of User Profiles |
Family Cites Families (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6424946B1 (en) | 1999-04-09 | 2002-07-23 | International Business Machines Corporation | Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering |
| WO2005122141A1 (en) * | 2004-06-09 | 2005-12-22 | Canon Kabushiki Kaisha | Effective audio segmentation and classification |
| US7536304B2 (en) * | 2005-05-27 | 2009-05-19 | Porticus, Inc. | Method and system for bio-metric voice print authentication |
| US8630854B2 (en) * | 2010-08-31 | 2014-01-14 | Fujitsu Limited | System and method for generating videoconference transcriptions |
| US9898723B2 (en) * | 2012-12-19 | 2018-02-20 | Visa International Service Association | System and method for voice authentication |
| US9666204B2 (en) * | 2014-04-30 | 2017-05-30 | Qualcomm Incorporated | Voice profile management and speech signal generation |
| WO2016022588A1 (en) * | 2014-08-04 | 2016-02-11 | Flagler Llc | Voice tallying system |
| US10373612B2 (en) * | 2016-03-21 | 2019-08-06 | Amazon Technologies, Inc. | Anchored speech detection and speech recognition |
| US11398218B1 (en) * | 2018-04-26 | 2022-07-26 | United Services Automobile Association (Usaa) | Dynamic speech output configuration |
| US10991379B2 (en) * | 2018-06-22 | 2021-04-27 | Babblelabs Llc | Data driven audio enhancement |
| US11024291B2 (en) * | 2018-11-21 | 2021-06-01 | Sri International | Real-time class recognition for an audio stream |
| US11545156B2 (en) * | 2020-05-27 | 2023-01-03 | Microsoft Technology Licensing, Llc | Automated meeting minutes generation service |
2020
- 2020-12-08: US, application US17/115,158, publication US11626104B2, status: Active

2021
- 2021-09-28: KR, application KR1020237018503A, publication KR20230118089A, status: Pending
- 2021-09-28: EP, application EP21795235.7A, publication EP4260314A1, status: Pending
- 2021-09-28: JP, application JP2023533713A, publication JP7753363B2, status: Active
- 2021-09-28: WO, application PCT/US2021/071617, publication WO2022126040A1, status: Ceased
- 2021-09-28: CN, application CN202180080295.6A, publication CN116583899A, status: Pending
Also Published As
| Publication number | Publication date |
|---|---|
| US20220180859A1 (en) | 2022-06-09 |
| KR20230118089A (ko) | 2023-08-10 |
| WO2022126040A1 (en) | 2022-06-16 |
| TW202223877A (zh) | 2022-06-16 |
| US11626104B2 (en) | 2023-04-11 |
| JP7753363B2 (ja) | 2025-10-14 |
| JP2023553867A (ja) | 2023-12-26 |
| EP4260314A1 (en) | 2023-10-18 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US12567435B1 (en) | Context driven device arbitration | |
| CN116583899A (zh) | User speech profile management | |
| US12125483B1 (en) | Determining device groups | |
| US11545147B2 (en) | Utterance classifier | |
| US10320780B2 (en) | Shared secret voice authentication | |
| CN108351872B (zh) | Method and system for responding to user speech | |
| EP3210205B1 (en) | Sound sample verification for generating sound detection model | |
| JP2021033051A (ja) | Information processing device, information processing method, and program | |
| US20190355352A1 (en) | Voice and conversation recognition system | |
| CN105210146A (zh) | Method and device for controlling voice activation | |
| US20240212689A1 (en) | Speaker-specific speech filtering for multiple users | |
| CN112585674B (zh) | Information processing device, information processing method, and storage medium | |
| US12567414B2 (en) | System and method for detecting a wakeup command for a voice assistant | |
| CN110024027A (zh) | Speaker recognition | |
| US11205433B2 (en) | Method and apparatus for activating speech recognition | |
| CN115310066A (zh) | Upgrade method and apparatus, and electronic device | |
| TWI918728B (zh) | User speech profile management | |
| US20240212669A1 (en) | Speech filter for speech processing | |
| US20250372098A1 (en) | Speech profile management | |
| US20240419731A1 (en) | Knowledge-based audio scene graph | |
| WO2024053915A1 (en) | System and method for detecting a wakeup command for a voice assistant | |
| TW202605803A (zh) | Speech profile management | |
| EP4728513A1 (en) | Knowledge-based audio scene graph | |
| WO2025254947A1 (en) | Speech profile management |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |