KR20230118089A - User speech profile management - Google Patents

User speech profile management

Info

Publication number
KR20230118089A
Authority
KR
South Korea
Prior art keywords
audio
feature data
user speech
speaker
profile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
KR1020237018503A
Other languages
English (en)
Korean (ko)
Inventor
Soo Jin Park
Sunkuk Moon
Lae-Hoon Kim
Erik Visser
Original Assignee
Qualcomm Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Incorporated
Publication of KR20230118089A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
    • G06F1/3231 Monitoring the presence, absence or movement of users
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)
KR1020237018503A 2020-12-08 2021-09-28 User speech profile management Pending KR20230118089A (ko)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US17/115,158 2020-12-08
US17/115,158 US11626104B2 (en) 2020-12-08 2020-12-08 User speech profile management
PCT/US2021/071617 WO2022126040A1 (en) 2020-12-08 2021-09-28 User speech profile management

Publications (1)

Publication Number Publication Date
KR20230118089A (ko) 2023-08-10

Family

ID=78303075

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020237018503A Pending KR20230118089A (ko) 2020-12-08 2021-09-28 User speech profile management

Country Status (6)

Country Link
US (1) US11626104B2
EP (1) EP4260314A1
JP (1) JP7753363B2
KR (1) KR20230118089A
CN (1) CN116583899A
WO (1) WO2022126040A1

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11929077B2 (en) * 2019-12-23 2024-03-12 Dts Inc. Multi-stage speaker enrollment in voice authentication and identification
US11462218B1 (en) * 2020-04-29 2022-10-04 Amazon Technologies, Inc. Conserving battery while detecting for human voice
US12198677B2 (en) * 2022-05-27 2025-01-14 Tencent America LLC Techniques for end-to-end speaker diarization with generalized neural speaker clustering
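KR102516391B1 (ko) * 2022-09-02 2023-04-03 ActionPower Corp. Method for detecting speech segments in audio in consideration of speech segment length
CN116364063B (zh) * 2023-06-01 2023-09-05 NIO Automobile Technology (Anhui) Co., Ltd. Phoneme alignment method, device, driving device, and medium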
WO2025254947A1 (en) * 2024-06-04 2025-12-11 Qualcomm Incorporated Speech profile management

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6424946B1 (en) 1999-04-09 2002-07-23 International Business Machines Corporation Methods and apparatus for unknown speaker labeling using concurrent speech recognition, segmentation, classification and clustering
WO2005122141A1 (en) * 2004-06-09 2005-12-22 Canon Kabushiki Kaisha Effective audio segmentation and classification
US7536304B2 (en) * 2005-05-27 2009-05-19 Porticus, Inc. Method and system for bio-metric voice print authentication
US8630854B2 (en) * 2010-08-31 2014-01-14 Fujitsu Limited System and method for generating videoconference transcriptions
GB2489489B (en) 2011-03-30 2013-08-21 Toshiba Res Europ Ltd A speech processing system and method
US9898723B2 (en) * 2012-12-19 2018-02-20 Visa International Service Association System and method for voice authentication
US9666204B2 (en) * 2014-04-30 2017-05-30 Qualcomm Incorporated Voice profile management and speech signal generation
WO2016022588A1 (en) * 2014-08-04 2016-02-11 Flagler Llc Voice tallying system
GB2525464B (en) * 2015-01-13 2016-03-16 Validsoft Uk Ltd Authentication method
US10373612B2 (en) * 2016-03-21 2019-08-06 Amazon Technologies, Inc. Anchored speech detection and speech recognition
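JP6676009B2 (ja) 2017-06-23 2020-04-08 Nippon Telegraph and Telephone Corporation Speaker determination device, speaker determination information generation method, and program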
WO2019048062A1 (en) 2017-09-11 2019-03-14 Telefonaktiebolaget Lm Ericsson (Publ) MANAGING USER PROFILES WITH VOICE COMMAND
WO2019203794A1 (en) * 2018-04-16 2019-10-24 Google Llc Automatically determining language for speech recognition of spoken utterance received via an automated assistant interface
US11398218B1 (en) * 2018-04-26 2022-07-26 United Services Automobile Association (Usaa) Dynamic speech output configuration
US10991379B2 (en) * 2018-06-22 2021-04-27 Babblelabs Llc Data driven audio enhancement
EP3627505B1 (en) * 2018-09-21 2023-11-15 Televic Conference NV Real-time speaker identification with diarization
US11024291B2 (en) * 2018-11-21 2021-06-01 Sri International Real-time class recognition for an audio stream
US11545156B2 (en) * 2020-05-27 2023-01-03 Microsoft Technology Licensing, Llc Automated meeting minutes generation service

Also Published As

Publication number Publication date
US20220180859A1 (en) 2022-06-09
WO2022126040A1 (en) 2022-06-16
CN116583899A (zh) 2023-08-11
TW202223877A (zh) 2022-06-16
US11626104B2 (en) 2023-04-11
JP7753363B2 (ja) 2025-10-14
JP2023553867A (ja) 2023-12-26
EP4260314A1 (en) 2023-10-18

Similar Documents

Publication Publication Date Title
KR20230118089A (ko) User speech profile management
US9293133B2 (en) Improving voice communication over a network
US20190355352A1 (en) Voice and conversation recognition system
US20190122661A1 (en) System and method to detect cues in conversational speech
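CN105580071A (zh) Method and apparatus for training a voice recognition model database
CN112585674B (zh) Information processing apparatus, information processing method, and storage medium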
US12567414B2 (en) System and method for detecting a wakeup command for a voice assistant
JP2020095121A (ja) Speech recognition system, method for generating a trained model, method for controlling a speech recognition system, program, and mobile body
US20250058726A1 (en) Voice assistant optimization dependent on vehicle occupancy
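CN120457484A (zh) Speaker-specific speech filtering for multiple users
JP2021047507A (ja) Notification system, notification control device, notification control method, and notification control program
CN115310066A (zh) Upgrade method, apparatus, and electronic device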
US20210082427A1 (en) Information processing apparatus and information processing method
CN112750440B (zh) Information processing method and apparatus
TWI918728B (zh) User speech profile management
CN119317956A (zh) Emotion-aware voice assistant
WO2024053915A1 (en) System and method for detecting a wakeup command for a voice assistant
US20250372098A1 (en) Speech profile management
US20240419731A1 (en) Knowledge-based audio scene graph
EP4728513A1 (en) Knowledge-based audio scene graph
TW202605803A (zh) Speech profile management
KR20230122427A (ko) Vehicle and control method thereof
WO2025254947A1 (en) Speech profile management
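CN121641030A (zh) Text display method, apparatus, and electronic device
CN118433612A (zh) Speaker volume control method and apparatus, in-vehicle voice call system, and medium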

Legal Events

Date Code Title Description
PA0105 International application
Patent event date: 20230531
Patent event code: PA01051R01D
Comment text: International Patent Application

PG1501 Laying open of application

A201 Request for examination

PA0201 Request for examination
Patent event code: PA02012R01D
Patent event date: 20240912
Comment text: Request for Examination of Application