EA202091595A1 - METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER - Google Patents
- Publication number
- EA202091595A1
- Authority
- EA
- Eurasian Patent Office
- Prior art keywords
- voice
- voice model
- speakers
- target
- telephone conversations
- Prior art date
Links
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/75—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Probability & Statistics with Applications (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention relates to the field of voice biometrics, in particular to the task of automatically evaluating speaker voice models from recordings of their telephone conversations, with automatic binding of a speaker's voice model to a telephone number. A method for obtaining a voice model of a target speaker is proposed, in which at least two phonograms of telephone conversations are segmented by speaker voice to obtain speech segments; speaker voice models are built from the obtained speech segments; the constructed speaker voice models are clustered using the metadata of the telephone conversations to obtain clusters; links between the clusters are determined on the basis of the phonograms of the telephone conversations; and the cluster with the greatest number of links is selected as the target speaker's cluster. A device for obtaining a voice model of a target speaker is also proposed.
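The pipeline in the abstract (segment by voice → build per-segment voice models → cluster the models → pick the cluster with the most cross-call links) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes speaker voice models are fixed-length embedding vectors compared by cosine similarity, uses a simple greedy agglomerative clustering in place of the (unspecified) clustering of the claims, and counts "links" as the number of distinct calls a cluster appears in. The function names and the `threshold` parameter are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_speaker_models(models, threshold=0.8):
    """Greedy agglomerative clustering of speaker voice models
    (embeddings): each model joins the most similar existing
    centroid above `threshold`, or starts a new cluster.
    Returns one cluster id per model."""
    labels = [-1] * len(models)
    centroids = []
    for i, m in enumerate(models):
        m = np.asarray(m, dtype=float)
        best, best_sim = None, threshold
        for cid, c in enumerate(centroids):
            sim = cosine(m, c)
            if sim >= best_sim:
                best, best_sim = cid, sim
        if best is None:
            centroids.append(m)
            labels[i] = len(centroids) - 1
        else:
            # update the centroid as a running mean of its members
            n = labels.count(best)
            centroids[best] = (centroids[best] * n + m) / (n + 1)
            labels[i] = best
    return labels

def pick_target_cluster(labels, call_ids):
    """Select the cluster whose models span the most distinct
    calls ('links') as the target speaker's cluster."""
    calls_per_cluster = {}
    for lab, call in zip(labels, call_ids):
        calls_per_cluster.setdefault(lab, set()).add(call)
    return max(calls_per_cluster, key=lambda c: len(calls_per_cluster[c]))

# Toy data: the target voice recurs (scaled copies of one embedding)
# across calls 1, 2 and 3; other voices appear in fewer calls.
target = np.array([1.0, 0.0])
models = [target,              np.array([0.0, 1.0]),   # call 1
          target * 1.1,        np.array([0.5, 0.9]),   # call 2
          target * 0.9]                                # call 3
calls = [1, 1, 2, 2, 3]
labels = cluster_speaker_models(models)
print(labels)                                # [0, 1, 0, 1, 0]
print(pick_target_cluster(labels, calls))    # 0 (spans 3 calls)
```

In this toy run the target's cluster links calls {1, 2, 3} while the other cluster links only {1, 2}, so the cluster with the greatest number of links is returned, mirroring the selection step of the claimed method.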
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/RU2017/000990 WO2019132690A1 (en) | 2017-12-27 | 2017-12-27 | Method and device for building voice model of target speaker |
Publications (1)
Publication Number | Publication Date |
---|---|
EA202091595A1 true EA202091595A1 (en) | 2020-09-18 |
Family
ID=67067964
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EA202091595A EA202091595A1 (en) | 2017-12-27 | 2017-12-27 | METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER |
Country Status (3)
Country | Link |
---|---|
KR (1) | KR20200140235A (en) |
EA (1) | EA202091595A1 (en) |
WO (1) | WO2019132690A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111865926A (en) * | 2020-06-24 | 2020-10-30 | 深圳壹账通智能科技有限公司 | Call channel construction method and device based on double models and computer equipment |
CN111785291A (en) * | 2020-07-02 | 2020-10-16 | 北京捷通华声科技股份有限公司 | Voice separation method and voice separation device |
CN112750440B (en) * | 2020-12-30 | 2023-12-29 | 北京捷通华声科技股份有限公司 | Information processing method and device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3745403B2 (en) * | 1994-04-12 | 2006-02-15 | ゼロックス コーポレイション | Audio data segment clustering method |
EP2499637A1 (en) * | 2009-11-12 | 2012-09-19 | Agnitio S.L. | Speaker recognition from telephone calls |
RU2530314C1 (en) * | 2013-04-23 | 2014-10-10 | Общество с ограниченной ответственностью "ЦРТ-инновации" | Method for hybrid generative-discriminative segmentation of speakers in audio-flow |
- 2017
- 2017-12-27 KR KR1020207021848A patent/KR20200140235A/en not_active Application Discontinuation
- 2017-12-27 EA EA202091595A patent/EA202091595A1/en unknown
- 2017-12-27 WO PCT/RU2017/000990 patent/WO2019132690A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2019132690A1 (en) | 2019-07-04 |
KR20200140235A (en) | 2020-12-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI643184B (en) | Method and apparatus for speaker diarization | |
US11023690B2 (en) | Customized output to optimize for user preference in a distributed system | |
JP6954680B2 (en) | Speaker confirmation method and speaker confirmation device | |
GB2566215A (en) | Voice user interface | |
EP4235646A3 (en) | Adaptive audio enhancement for multichannel speech recognition | |
EP2806425A3 (en) | System and method for speaker verification | |
EP4084000A3 (en) | Neural networks for speaker verification | |
EA202091595A1 (en) | METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER | |
JP2017515140A5 (en) | ||
DE602004023134D1 (en) | LANGUAGE RECOGNITION AND SYSTEM ADAPTED TO THE CHARACTERISTICS OF NON-NUT SPEAKERS | |
JP2014515833A5 (en) | ||
EP3751561A3 (en) | Hotword recognition | |
ATE491202T1 (en) | COMPENSATING BETWEEN-SESSION VARIABILITY TO AUTOMATICALLY EXTRACT INFORMATION FROM SPEECH | |
Sun et al. | Speaker diarization system for RT07 and RT09 meeting room audio | |
EP4343615A3 (en) | Neural speech-to-meaning translation | |
EP2963643A3 (en) | Entity name recognition | |
WO2017172632A3 (en) | Characterizing, selecting and adapting audio and acoustic training data for automatic speech recognition systems | |
WO2014025682A3 (en) | Acoustic data selection for training the parameters of an acoustic model | |
CN103730112B (en) | Multi-channel voice simulation and acquisition method | |
WO2014115115A3 (en) | Determining apnea-hypopnia index ahi from speech | |
WO2021074721A3 (en) | System for automatic assessment of fluency in spoken language and a method thereof | |
RU2015141805A (en) | SIMULATION OF ACOUSTIC PULSE RESPONSE | |
MX2018001996A (en) | Dynamic acoustic model for vehicle. | |
US20160210982A1 (en) | Method and Apparatus to Enhance Speech Understanding | |
WO2021021814A3 (en) | Acoustic zoning with distributed microphones |