JP7753363B2 - User speech profile management - Google Patents

User speech profile management

Info

Publication number
JP7753363B2
JP7753363B2 JP2023533713A JP2023533713A JP7753363B2 JP 7753363 B2 JP7753363 B2 JP 7753363B2 JP 2023533713 A JP2023533713 A JP 2023533713A JP 2023533713 A JP2023533713 A JP 2023533713A JP 7753363 B2 JP7753363 B2 JP 7753363B2
Authority
JP
Japan
Prior art keywords
audio
speaker
audio feature
user speech
profile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
JP2023533713A
Other languages
English (en)
Japanese (ja)
Other versions
JP2023553867A5
JP2023553867A (ja)
Inventor
パク、ソ・ジン
ムン、ソンクク
キム、レ-フン
ビッサー、エリック
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-08
Filing date
2021-09-28
Publication date
2025-10-14
Application filed by Qualcomm Inc
Publication of JP2023553867A
Publication of JP2023553867A5
Application granted
Publication of JP7753363B2
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G06F1/3231 Monitoring the presence, absence or movement of users
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)
JP2023533713A 2020-12-08 2021-09-28 User speech profile management Active JP7753363B2 (ja)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US17/115,158 2020-12-08
US17/115,158 US11626104B2 (en) 2020-12-08 2020-12-08 User speech profile management
PCT/US2021/071617 WO2022126040A1 (en) 2020-12-08 2021-09-28 User speech profile management

Publications (3)

Publication Number Publication Date
JP2023553867A (ja) 2023-12-26
JP2023553867A5 2024-09-05
JP7753363B2 (ja) 2025-10-14

Family

ID=78303075

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2023533713A Active JP7753363B2 (ja) 2020-12-08 2021-09-28 User speech profile management

Country Status (6)

Country Link
US (1) US11626104B2
EP (1) EP4260314A1
JP (1) JP7753363B2
KR (1) KR20230118089A
CN (1) CN116583899A
WO (1) WO2022126040A1

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11929077B2 (en) * 2019-12-23 2024-03-12 Dts Inc. Multi-stage speaker enrollment in voice authentication and identification
US11462218B1 (en) * 2020-04-29 2022-10-04 Amazon Technologies, Inc. Conserving battery while detecting for human voice
US12198677B2 (en) * 2022-05-27 2025-01-14 Tencent America LLC Techniques for end-to-end speaker diarization with generalized neural speaker clustering
KR102516391B1 (ko) * 2022-09-02 2023-04-03 주식회사 액션파워 Method for detecting speech segments in audio in consideration of speech segment length
CN116364063B (zh) * 2023-06-01 2023-09-05 蔚来汽车科技(安徽)有限公司 Phoneme alignment method, device, driving device, and medium
WO2025254947A1 (en) * 2024-06-04 2025-12-11 Qualcomm Incorporated Speech profile management

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120253811A1 (en) 2011-03-30 2012-10-04 Kabushiki Kaisha Toshiba Speech processing system and method
JP2019008131A (ja) 2017-06-23 2019-01-17 日本電信電話株式会社 Speaker determination device, speaker determination information generation method, and program
US20200194006A1 (en) 2017-09-11 2020-06-18 Telefonaktiebolaget Lm Ericsson (Publ) Voice-Controlled Management of User Profiles

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6424946B1 (en) 1999-04-09 2002-07-23 International Business Machines Corporation Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering
WO2005122141A1 (en) * 2004-06-09 2005-12-22 Canon Kabushiki Kaisha Effective audio segmentation and classification
US7536304B2 (en) * 2005-05-27 2009-05-19 Porticus, Inc. Method and system for bio-metric voice print authentication
US8630854B2 (en) * 2010-08-31 2014-01-14 Fujitsu Limited System and method for generating videoconference transcriptions
US9898723B2 (en) * 2012-12-19 2018-02-20 Visa International Service Association System and method for voice authentication
US9666204B2 (en) * 2014-04-30 2017-05-30 Qualcomm Incorporated Voice profile management and speech signal generation
WO2016022588A1 (en) * 2014-08-04 2016-02-11 Flagler Llc Voice tallying system
GB2525464B (en) * 2015-01-13 2016-03-16 Validsoft Uk Ltd Authentication method
US10373612B2 (en) * 2016-03-21 2019-08-06 Amazon Technologies, Inc. Anchored speech detection and speech recognition
WO2019203794A1 (en) * 2018-04-16 2019-10-24 Google Llc Automatically determining language for speech recognition of spoken utterance received via an automated assistant interface
US11398218B1 (en) * 2018-04-26 2022-07-26 United Services Automobile Association (Usaa) Dynamic speech output configuration
US10991379B2 (en) * 2018-06-22 2021-04-27 Babblelabs Llc Data driven audio enhancement
EP3627505B1 (en) * 2018-09-21 2023-11-15 Televic Conference NV Real-time speaker identification with diarization
US11024291B2 (en) * 2018-11-21 2021-06-01 Sri International Real-time class recognition for an audio stream
US11545156B2 (en) * 2020-05-27 2023-01-03 Microsoft Technology Licensing, Llc Automated meeting minutes generation service

Also Published As

Publication number Publication date
US20220180859A1 (en) 2022-06-09
KR20230118089A (ko) 2023-08-10
WO2022126040A1 (en) 2022-06-16
CN116583899A (zh) 2023-08-11
TW202223877A (zh) 2022-06-16
US11626104B2 (en) 2023-04-11
JP2023553867A (ja) 2023-12-26
EP4260314A1 (en) 2023-10-18

Similar Documents

Publication Publication Date Title
JP7753363B2 (ja) User speech profile management
US11875820B1 (en) Context driven device arbitration
US11138977B1 (en) Determining device groups
US10540970B2 (en) Architectures and topologies for vehicle-based, voice-controlled devices
US9672812B1 (en) Qualifying trigger expressions in speech-based systems
EP2994911B1 (en) Adaptive audio frame processing for keyword detection
US12586580B2 (en) System for recognizing and responding to environmental noises
US20130211826A1 (en) Audio Signals as Buffered Streams of Audio Signals and Metadata
US20150302855A1 (en) Method and apparatus for activating application by speech input
CN105210146A (zh) Method and device for controlling voice activation
CN107767863A (zh) Voice wake-up method, system, and intelligent terminal
EP2801092A1 (en) Methods, apparatuses and computer program products for implementing automatic speech recognition and sentiment detection on a device
WO2019242414A1 (zh) Speech processing method, apparatus, storage medium, and electronic device
US12567414B2 (en) System and method for detecting a wakeup command for a voice assistant
US10629199B1 (en) Architectures and topologies for vehicle-based, voice-controlled devices
WO2019242415A1 (zh) Position prompting method, apparatus, storage medium, and electronic device
US11699444B1 (en) Speech recognition using multiple voice-enabled devices
US20210082427A1 (en) Information processing apparatus and information processing method
TWI918728B (zh) User speech profile management
CN116153291A (zh) Speech recognition method and device
US20240419731A1 (en) Knowledge-based audio scene graph
US20250372098A1 (en) Speech profile management
WO2024053915A1 (en) System and method for detecting a wakeup command for a voice assistant
US20260045260A1 (en) Environment based user model creation and user verification
WO2024258821A1 (en) Knowledge-based audio scene graph

Legal Events

Date Code Title Description
A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20240828

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20240828

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20250527

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20250804

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20250909

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20251001

R150 Certificate of patent or registration of utility model

Ref document number: 7753363

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150