EA202091595A1 - METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER - Google Patents

METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER

Info

Publication number
EA202091595A1
Authority
EA
Eurasian Patent Office
Prior art keywords
voice
voice model
speakers
target
telephone conversations
Application number
EA202091595A
Other languages
Russian (ru)
Inventor
Сергей Александрович НОВОСЕЛОВ
Александр Викторович КОЗЛОВ
Дмитрий Александрович РУМЯНЦЕВ
Олег Юрьевич КУДАШЕВ
Original Assignee
Общество с ограниченной ответственностью "Центр речевых технологий"
Priority date
2017-12-27
Filing date
2017-12-27
Publication date
2020-09-18
Application filed by Общество с ограниченной ответственностью "Центр речевых технологий"
Publication of EA202091595A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/75: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/04: Segmentation; Word boundary detection
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/14: Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to voice biometrics, in particular to automatically estimating speakers' voice models from recordings of their telephone conversations and automatically binding each speaker's voice model to a telephone number. A method for obtaining a voice model of a target speaker is proposed, in which: at least two recordings (phonograms) of telephone conversations are segmented by speaker voice to obtain speech segments; speaker voice models are built from the resulting speech segments; the resulting voice models are clustered using the metadata of the telephone conversations to obtain clusters; links between the clusters are determined on the basis of the recordings; and the cluster with the largest number of links is selected as the target speaker's cluster. A device for obtaining a voice model of a target speaker is also proposed.
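
The clustering and selection steps of the abstract can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the claimed implementation: it assumes that segmentation and voice-model building have already produced one fixed-length vector per speaker per call, it uses greedy cosine-similarity clustering with a hypothetical threshold of 0.8, it treats the call identifier as the only metadata (two voices from the same call are never merged), and it counts two clusters as linked when they occur in the same call. The names `cosine`, `cluster_models` and `target_cluster` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_models(models, threshold=0.8):
    """Greedily cluster (call_id, vector) voice models.

    Assumption: models from the same call belong to different speakers,
    so they are never placed in the same cluster.
    """
    clusters = []  # each cluster is a list of (call_id, vector) pairs
    for call_id, vec in models:
        best, best_sim = None, threshold
        for cluster in clusters:
            if any(cid == call_id for cid, _ in cluster):
                continue  # metadata constraint: same call, different speaker
            sim = max(cosine(vec, v) for _, v in cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([(call_id, vec)])  # start a new cluster
        else:
            best.append((call_id, vec))
    return clusters


def target_cluster(clusters):
    """Return the index of the cluster linked to the most other clusters.

    Assumption: two clusters are linked when they share at least one call.
    """
    calls = [{cid for cid, _ in cluster} for cluster in clusters]
    links = [0] * len(clusters)
    for i, j in combinations(range(len(clusters)), 2):
        if calls[i] & calls[j]:
            links[i] += 1
            links[j] += 1
    return max(range(len(clusters)), key=links.__getitem__)


if __name__ == "__main__":
    # Toy data: the phone owner speaks in every call, with two other parties.
    demo = [
        ("call1", [1.0, 0.0, 0.1]), ("call1", [0.0, 1.0, 0.0]),
        ("call2", [0.9, 0.1, 0.1]), ("call2", [0.0, 0.0, 1.0]),
        ("call3", [1.0, 0.1, 0.0]), ("call3", [0.1, 0.9, 0.0]),
    ]
    found = cluster_models(demo)
    print("target cluster index:", target_cluster(found))
```

In this toy setup the owner of the telephone number takes part in every call, so the owner's cluster co-occurs with every other cluster and collects the most links. A production system would replace the toy vectors with real speaker embeddings and could use richer metadata, such as telephone numbers or call direction, in the clustering step.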

EA202091595A 2017-12-27 2017-12-27 METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER EA202091595A1 (en)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
PCT/RU2017/000990 WO2019132690A1 (en) | 2017-12-27 | 2017-12-27 | Method and device for building voice model of target speaker

Publications (1)

Publication Number | Publication Date
EA202091595A1 (en) | 2020-09-18

Family

ID=67067964

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
EA202091595A EA202091595A1 (en) | METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER | 2017-12-27 | 2017-12-27

Country Status (3)

Country | Link
KR (1) | KR20200140235A (en)
EA (1) | EA202091595A1 (en)
WO (1) | WO2019132690A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111865926A (en) * | 2020-06-24 | 2020-10-30 | 深圳壹账通智能科技有限公司 | Call channel construction method and device based on double models and computer equipment
CN111785291A (en) * | 2020-07-02 | 2020-10-16 | 北京捷通华声科技股份有限公司 | Voice separation method and voice separation device
CN112750440B (en) * | 2020-12-30 | 2023-12-29 | 北京捷通华声科技股份有限公司 | Information processing method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP3745403B2 (en) * | 1994-04-12 | 2006-02-15 | ゼロックス コーポレイション | Audio data segment clustering method
EP2499637A1 (en) * | 2009-11-12 | 2012-09-19 | Agnitio S.L. | Speaker recognition from telephone calls
RU2530314C1 (en) * | 2013-04-23 | 2014-10-10 | Общество с ограниченной ответственностью "ЦРТ-инновации" | Method for hybrid generative-discriminative segmentation of speakers in audio-flow

Also Published As

Publication number | Publication date
WO2019132690A1 (en) | 2019-07-04
KR20200140235A (en) | 2020-12-15

Similar Documents

Publication Publication Date Title
TWI643184B (en) Method and apparatus for speaker diarization
US11023690B2 (en) Customized output to optimize for user preference in a distributed system
JP6954680B2 (en) Speaker confirmation method and speaker confirmation device
GB2566215A (en) Voice user interface
EP4235646A3 (en) Adaptive audio enhancement for multichannel speech recognition
EP2806425A3 (en) System and method for speaker verification
EP4084000A3 (en) Neural networks for speaker verification
EA202091595A1 (en) METHOD AND DEVICE FOR BUILDING VOICE MODEL OF A TARGET ANNOUNCER
JP2017515140A5 (en)
DE602004023134D1 (en) LANGUAGE RECOGNITION AND SYSTEM ADAPTED TO THE CHARACTERISTICS OF NON-NUT SPEAKERS
JP2014515833A5 (en)
EP3751561A3 (en) Hotword recognition
ATE491202T1 (en) COMPENSATING BETWEEN-SESSION VARIABILITY TO AUTOMATICALLY EXTRACT INFORMATION FROM SPEECH
Sun et al. Speaker diarization system for RT07 and RT09 meeting room audio
EP4343615A3 (en) Neural speech-to-meaning translation
EP2963643A3 (en) Entity name recognition
WO2017172632A3 (en) Characterizing, selecting and adapting audio and acoustic training data for automatic speech recognition systems
WO2014025682A3 (en) Acoustic data selection for training the parameters of an acoustic model
CN103730112B (en) Multi-channel voice simulation and acquisition method
WO2014115115A3 (en) Determining apnea-hypopnia index ahi from speech
WO2021074721A3 (en) System for automatic assessment of fluency in spoken language and a method thereof
RU2015141805A (en) SIMULATION OF ACOUSTIC PULSE RESPONSE
MX2018001996A (en) Dynamic acoustic model for vehicle.
US20160210982A1 (en) Method and Apparatus to Enhance Speech Understanding
WO2021021814A3 (en) Acoustic zoning with distributed microphones