JP7753363B2 - User speech profile management - Google Patents
User speech profile management
- Publication number
- JP7753363B2 (application number JP2023533713A)
- Authority
- JP
- Japan
- Prior art keywords
- audio
- speaker
- audio feature
- user speech
- profile
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
- G06F1/3231—Monitoring the presence, absence or movement of users
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/115,158 | 2020-12-08 | ||
| US17/115,158 US11626104B2 (en) | 2020-12-08 | 2020-12-08 | User speech profile management |
| PCT/US2021/071617 WO2022126040A1 (en) | 2020-12-08 | 2021-09-28 | User speech profile management |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JP2023553867A (ja) | 2023-12-26 |
| JP2023553867A5 | 2024-09-05 |
| JP7753363B2 (ja) | 2025-10-14 |
Family
Family ID: 78303075
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023533713A (Active; published as JP7753363B2, ja) | 2020-12-08 | 2021-09-28 | User speech profile management |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US11626104B2 |
| EP (1) | EP4260314A1 |
| JP (1) | JP7753363B2 |
| KR (1) | KR20230118089A |
| CN (1) | CN116583899A |
| WO (1) | WO2022126040A1 |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11929077B2 (en) * | 2019-12-23 | 2024-03-12 | Dts Inc. | Multi-stage speaker enrollment in voice authentication and identification |
| US11462218B1 (en) * | 2020-04-29 | 2022-10-04 | Amazon Technologies, Inc. | Conserving battery while detecting for human voice |
| US12198677B2 (en) * | 2022-05-27 | 2025-01-14 | Tencent America LLC | Techniques for end-to-end speaker diarization with generalized neural speaker clustering |
| KR102516391B1 (ko) * | 2022-09-02 | 2023-04-03 | 주식회사 액션파워 | Method for detecting speech segments in audio taking speech segment length into account |
| CN116364063B (zh) * | 2023-06-01 | 2023-09-05 | 蔚来汽车科技(安徽)有限公司 | Phoneme alignment method, device, driving device, and medium |
| WO2025254947A1 (en) * | 2024-06-04 | 2025-12-11 | Qualcomm Incorporated | Speech profile management |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120253811A1 (en) | 2011-03-30 | 2012-10-04 | Kabushiki Kaisha Toshiba | Speech processing system and method |
| JP2019008131A (ja) | 2017-06-23 | 2019-01-17 | 日本電信電話株式会社 | Speaker determination device, speaker determination information generation method, and program |
| US20200194006A1 (en) | 2017-09-11 | 2020-06-18 | Telefonaktiebolaget Lm Ericsson (Publ) | Voice-Controlled Management of User Profiles |
Family Cites Families (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6424946B1 (en) | 1999-04-09 | 2002-07-23 | International Business Machines Corporation | Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering |
| WO2005122141A1 (en) * | 2004-06-09 | 2005-12-22 | Canon Kabushiki Kaisha | Effective audio segmentation and classification |
| US7536304B2 (en) * | 2005-05-27 | 2009-05-19 | Porticus, Inc. | Method and system for bio-metric voice print authentication |
| US8630854B2 (en) * | 2010-08-31 | 2014-01-14 | Fujitsu Limited | System and method for generating videoconference transcriptions |
| US9898723B2 (en) * | 2012-12-19 | 2018-02-20 | Visa International Service Association | System and method for voice authentication |
| US9666204B2 (en) * | 2014-04-30 | 2017-05-30 | Qualcomm Incorporated | Voice profile management and speech signal generation |
| WO2016022588A1 (en) * | 2014-08-04 | 2016-02-11 | Flagler Llc | Voice tallying system |
| GB2525464B (en) * | 2015-01-13 | 2016-03-16 | Validsoft Uk Ltd | Authentication method |
| US10373612B2 (en) * | 2016-03-21 | 2019-08-06 | Amazon Technologies, Inc. | Anchored speech detection and speech recognition |
| WO2019203794A1 (en) * | 2018-04-16 | 2019-10-24 | Google Llc | Automatically determining language for speech recognition of spoken utterance received via an automated assistant interface |
| US11398218B1 (en) * | 2018-04-26 | 2022-07-26 | United Services Automobile Association (Usaa) | Dynamic speech output configuration |
| US10991379B2 (en) * | 2018-06-22 | 2021-04-27 | Babblelabs Llc | Data driven audio enhancement |
| EP3627505B1 (en) * | 2018-09-21 | 2023-11-15 | Televic Conference NV | Real-time speaker identification with diarization |
| US11024291B2 (en) * | 2018-11-21 | 2021-06-01 | Sri International | Real-time class recognition for an audio stream |
| US11545156B2 (en) * | 2020-05-27 | 2023-01-03 | Microsoft Technology Licensing, Llc | Automated meeting minutes generation service |
2020
- 2020-12-08: US application US17/115,158, patent US11626104B2 (en), status: Active

2021
- 2021-09-28: KR application KR1020237018503A, patent KR20230118089A (ko), status: Pending
- 2021-09-28: EP application EP21795235.7A, patent EP4260314A1 (en), status: Pending
- 2021-09-28: JP application JP2023533713A, patent JP7753363B2 (ja), status: Active
- 2021-09-28: WO application PCT/US2021/071617, patent WO2022126040A1 (en), status: Ceased
- 2021-09-28: CN application CN202180080295.6A, patent CN116583899A (zh), status: Pending
Also Published As
| Publication number | Publication date |
|---|---|
| US20220180859A1 (en) | 2022-06-09 |
| KR20230118089A (ko) | 2023-08-10 |
| WO2022126040A1 (en) | 2022-06-16 |
| CN116583899A (zh) | 2023-08-11 |
| TW202223877A (zh) | 2022-06-16 |
| US11626104B2 (en) | 2023-04-11 |
| JP2023553867A (ja) | 2023-12-26 |
| EP4260314A1 (en) | 2023-10-18 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| JP7753363B2 (ja) | User speech profile management | |
| US11875820B1 (en) | Context driven device arbitration | |
| US11138977B1 (en) | Determining device groups | |
| US10540970B2 (en) | Architectures and topologies for vehicle-based, voice-controlled devices | |
| US9672812B1 (en) | Qualifying trigger expressions in speech-based systems | |
| EP2994911B1 (en) | Adaptive audio frame processing for keyword detection | |
| US12586580B2 (en) | System for recognizing and responding to environmental noises | |
| US20130211826A1 (en) | Audio Signals as Buffered Streams of Audio Signals and Metadata | |
| US20150302855A1 (en) | Method and apparatus for activating application by speech input | |
| CN105210146A (zh) | Method and apparatus for controlling voice activation | |
| CN107767863A (zh) | Voice wake-up method, system, and intelligent terminal | |
| EP2801092A1 (en) | Methods, apparatuses and computer program products for implementing automatic speech recognition and sentiment detection on a device | |
| WO2019242414A1 (zh) | Voice processing method and apparatus, storage medium, and electronic device | |
| US12567414B2 (en) | System and method for detecting a wakeup command for a voice assistant | |
| US10629199B1 (en) | Architectures and topologies for vehicle-based, voice-controlled devices | |
| WO2019242415A1 (zh) | Position prompt method and apparatus, storage medium, and electronic device | |
| US11699444B1 (en) | Speech recognition using multiple voice-enabled devices | |
| US20210082427A1 (en) | Information processing apparatus and information processing method | |
| TWI918728B (zh) | User speech profile management | |
| CN116153291A (zh) | Speech recognition method and device | |
| US20240419731A1 (en) | Knowledge-based audio scene graph | |
| US20250372098A1 (en) | Speech profile management | |
| WO2024053915A1 (en) | System and method for detecting a wakeup command for a voice assistant | |
| US20260045260A1 (en) | Environment based user model creation and user verification | |
| WO2024258821A1 (en) | Knowledge-based audio scene graph |
Legal Events
| Effective Date | Code | Title | Description |
|---|---|---|---|
| 2024-08-28 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
| 2024-08-28 | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621 |
| 2025-05-27 | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131 |
| 2025-08-04 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
| | TRDD | Decision of grant or rejection written | |
| 2025-09-09 | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 |
| 2025-10-01 | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 7753363; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |