KR101011713B1 - Speech signal analysis method and system for a compressed representation of speakers - Google Patents
- Publication number
- KR101011713B1 (application number KR1020067000063A)
- Authority
- KR
- South Korea
- Prior art keywords
- speaker
- speech
- speech signal
- similarity
- speakers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Game Theory and Decision Science (AREA)
- Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Circuit For Audible Band Transducer (AREA)
- Stereophonic System (AREA)
Claims (11)
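- A method for analyzing a speech signal of a speaker (λ), characterized in that a probability density is used that represents the similarities between a voice representation of the speaker (λ) in a predetermined model and a predetermined set of voice representations of a number (E) of reference speakers in said predetermined model, and in that said probability density is analyzed in order to extract from it information on the speech signal.
- The method of claim 1, wherein an absolute model (GMM) of dimension D using a mixture of M Gaussians is taken as the predetermined model, and the speaker (λ) is represented by a set of parameters comprising weighting coefficients (αi, i=1 to M) for the mixture of Gaussians in the absolute model (GMM), mean vectors (μi, i=1 to M) of dimension D, and covariance matrices (Σi, i=1 to M) of dimension D×D.
- The method of claim 2, wherein the probability density of the similarities between the representation of the speech signal of the speaker (λ) and the voice representations of the predetermined set of reference speakers is represented by a Gaussian distribution (ψ(μλ, Σλ)) with a mean vector (μλ) of dimension E and a covariance matrix (Σλ) of dimension E×E, estimated in the similarity space of the predetermined set of E reference speakers.
- The method of claim 3, wherein the similarity (ψ(μλ, Σλ)) of the speaker (λ) with respect to the E reference speakers, for whom there exist Nλ segments of speech signals represented by Nλ vectors of the similarity space with respect to the predetermined set of E reference speakers, is defined as a function of a mean vector (μλ) of dimension E and a covariance matrix (Σλ) of the similarities of the speaker (λ) with respect to the E reference speakers.
- A system for analyzing a speech signal of a speaker (λ), comprising a database storing speech signals of a predetermined set of speakers and their associated voice representations in a predetermined model, characterized in that it comprises speech signal analysis means that use a probability density representing the similarities between the voice representation of the speaker (λ) and a predetermined set of voice representations of E reference speakers.
- The system of claim 7, wherein the database further stores the speech signal analysis performed by the analysis means.
- A method of indexing audio documents using the method claimed in any one of claims 1 to 6.
- A method of identifying a speaker using the method claimed in any one of claims 1 to 6.
- A method of verifying a speaker using the method claimed in any one of claims 1 to 6.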
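
The claims above describe each reference speaker by a GMM (weights αi, mean vectors μi, covariance matrices Σi) and the speaker λ by a Gaussian ψ(μλ, Σλ) estimated from Nλ vectors in an E-dimensional similarity space. The following is a minimal Python sketch of that pipeline, not the patented implementation. It rests on several assumptions that the claims do not state: covariances are taken as diagonal (the claims specify full D×D matrices), each component of a similarity vector is the mean per-frame log-likelihood of a segment under one reference GMM, and the covariance is normalized by 1/Nλ. All function names and the toy data are hypothetical.

```python
import math

def gmm_log_likelihood(frame, weights, means, variances):
    """Log-density of one D-dimensional frame under a GMM.
    Assumption: diagonal covariances, given as per-dimension variances."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_det = sum(math.log(2.0 * math.pi * v) for v in var)
        maha = sum((x - m) ** 2 / v for x, m, v in zip(frame, mu, var))
        log_terms.append(math.log(w) - 0.5 * (log_det + maha))
    peak = max(log_terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def similarity_vector(segment, reference_gmms):
    """Map one segment (a list of frames) to an E-dimensional vector.
    Assumption: component j is the mean per-frame log-likelihood
    of the segment under reference GMM j."""
    return [
        sum(gmm_log_likelihood(f, *gmm) for f in segment) / len(segment)
        for gmm in reference_gmms
    ]

def mean_and_covariance(vectors):
    """Mean vector (size E) and covariance matrix (E x E) of N vectors.
    Assumption: covariance is normalized by 1/N."""
    n = len(vectors)
    e = len(vectors[0])
    mu = [sum(v[j] for v in vectors) / n for j in range(e)]
    sigma = [
        [sum((v[a] - mu[a]) * (v[b] - mu[b]) for v in vectors) / n
         for b in range(e)]
        for a in range(e)
    ]
    return mu, sigma

# Two hypothetical reference speakers (E = 2), each a GMM with M = 2, D = 2,
# given as (weights, means, diagonal variances).
reference_gmms = [
    ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    ([0.3, 0.7], [[3.0, 3.0], [4.0, 2.0]], [[1.0, 1.0], [2.0, 2.0]]),
]

# N_lambda = 3 toy segments of speaker lambda, each a short list of frames.
segments = [
    [[0.1, -0.2], [0.9, 1.1]],
    [[0.4, 0.3], [1.2, 0.8]],
    [[-0.3, 0.2], [0.7, 1.4]],
]

vectors = [similarity_vector(s, reference_gmms) for s in segments]
mu_lambda, sigma_lambda = mean_and_covariance(vectors)
print("mu_lambda:", mu_lambda)
print("sigma_lambda:", sigma_lambda)
```

With this layout, speaker λ is summarized by E + E×E numbers rather than by the M·(1 + D + D×D) parameters of a full GMM, which is one way to read the "compressed representation" of the title when E is small relative to M and D.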
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/FR2003/002037 WO2005015547A1 (fr) | 2003-07-01 | 2003-07-01 | Method and system for analyzing voice signals for the compact representation of speakers |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20060041208A (ko) | 2006-05-11 |
KR101011713B1 (ko) | 2011-01-28 |
Family
ID=34130575
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020067000063A (KR101011713B1, ko; Expired - Fee Related) | 2003-07-01 | 2003-07-01 | Speech signal analysis method and system for a compressed representation of speakers |
Country Status (7)
Country | Link |
---|---|
US (1) | US7539617B2 (ko) |
EP (1) | EP1639579A1 (ko) |
JP (1) | JP4652232B2 (ko) |
KR (1) | KR101011713B1 (ko) |
CN (1) | CN1802695A (ko) |
AU (1) | AU2003267504A1 (ko) |
WO (1) | WO2005015547A1 (ko) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7548651B2 (en) * | 2003-10-03 | 2009-06-16 | Asahi Kasei Kabushiki Kaisha | Data process unit and data process unit control program |
US8566093B2 (en) * | 2006-05-16 | 2013-10-22 | Loquendo S.P.A. | Intersession variability compensation for automatic extraction of information from voice |
JP4717872B2 (ja) * | 2006-12-06 | 2011-07-06 | 韓國電子通信研究院 | System and method for acquiring speaker information using speech feature information of the speaker |
AU2007335251B2 (en) | 2006-12-19 | 2014-05-15 | Validvoice, Llc | Confidence levels for speaker recognition |
CN102237084A (zh) * | 2010-04-22 | 2011-11-09 | 松下电器产业株式会社 | Method, apparatus and device for online adaptive adjustment of a sound-space reference model |
US8635067B2 (en) * | 2010-12-09 | 2014-01-21 | International Business Machines Corporation | Model restructuring for client and server based automatic speech recognition |
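CN103229233B (zh) * | 2010-12-10 | 2015-11-25 | 松下电器(美国)知识产权公司 | Modeling device and method for identifying a speaker, and speaker recognition system |
JP6556575B2 (ja) | 2015-09-15 | 2019-08-07 | 株式会社東芝 | Speech processing device, speech processing method, and speech processing program |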
WO2018009969A1 (en) * | 2016-07-11 | 2018-01-18 | Ftr Pty Ltd | Method and system for automatically diarising a sound recording |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6411930B1 (en) | 1998-11-18 | 2002-06-25 | Lucent Technologies Inc. | Discriminative gaussian mixture models for speaker verification |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2105034C (en) * | 1992-10-09 | 1997-12-30 | Biing-Hwang Juang | Speaker verification with cohort normalized scoring |
US5664059A (en) * | 1993-04-29 | 1997-09-02 | Panasonic Technologies, Inc. | Self-learning speaker adaptation based on spectral variation source decomposition |
US5793891A (en) * | 1994-07-07 | 1998-08-11 | Nippon Telegraph And Telephone Corporation | Adaptive training method for pattern recognition |
JPH08110792A (ja) * | 1994-10-12 | 1996-04-30 | Atr Onsei Honyaku Tsushin Kenkyusho:Kk | Speaker adaptation device and speech recognition device |
US5864810A (en) * | 1995-01-20 | 1999-01-26 | Sri International | Method and apparatus for speech recognition adapted to an individual speaker |
US5790758A (en) * | 1995-07-07 | 1998-08-04 | The United States Of America As Represented By The Secretary Of The Navy | Neural network architecture for gaussian components of a mixture density function |
US5835890A (en) * | 1996-08-02 | 1998-11-10 | Nippon Telegraph And Telephone Corporation | Method for speaker adaptation of speech models recognition scheme using the method and recording medium having the speech recognition method recorded thereon |
US6029124A (en) * | 1997-02-21 | 2000-02-22 | Dragon Systems, Inc. | Sequential, nonparametric speech recognition and speaker identification |
US6212498B1 (en) * | 1997-03-28 | 2001-04-03 | Dragon Systems, Inc. | Enrollment in speech recognition |
US6009390A (en) * | 1997-09-11 | 1999-12-28 | Lucent Technologies Inc. | Technique for selective use of Gaussian kernels and mixture component weights of tied-mixture hidden Markov models for speech recognition |
US5946656A (en) * | 1997-11-17 | 1999-08-31 | At & T Corp. | Speech and speaker recognition using factor analysis to model covariance structure of mixture components |
US6141644A (en) * | 1998-09-04 | 2000-10-31 | Matsushita Electric Industrial Co., Ltd. | Speaker verification and speaker identification based on eigenvoices |
US20010044719A1 (en) * | 1999-07-02 | 2001-11-22 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for recognizing, indexing, and searching acoustic signals |
US6954745B2 (en) * | 2000-06-02 | 2005-10-11 | Canon Kabushiki Kaisha | Signal processing system |
US7035790B2 (en) * | 2000-06-02 | 2006-04-25 | Canon Kabushiki Kaisha | Speech processing system |
US6754628B1 (en) * | 2000-06-13 | 2004-06-22 | International Business Machines Corporation | Speaker recognition using cohort-specific feature transforms |
2003
- 2003-07-01 KR: application KR1020067000063A, patent KR101011713B1 (ko); not active (Expired - Fee Related)
- 2003-07-01 WO: application PCT/FR2003/002037, publication WO2005015547A1 (fr); active (Application Filing)
- 2003-07-01 US: application US10/563,065, patent US7539617B2 (en); not active (Expired - Fee Related)
- 2003-07-01 EP: application EP03748194A, publication EP1639579A1 (fr); not active (Withdrawn)
- 2003-07-01 JP: application JP2005507539A, patent JP4652232B2 (ja); not active (Expired - Fee Related)
- 2003-07-01 CN: application CNA038267411, publication CN1802695A (zh); active (Pending)
- 2003-07-01 AU: application AU2003267504A, publication AU2003267504A1 (en); not active (Abandoned)
Non-Patent Citations (1)
Title |
---|
Reynolds et al. 'Speaker verification using adapted Gaussian mixture models', Digital signal processing, Vol.10, Nos.1-3, pp.19-41, January 2000 |
Also Published As
Publication number | Publication date |
---|---|
KR20060041208A (ko) | 2006-05-11 |
CN1802695A (zh) | 2006-07-12 |
US7539617B2 (en) | 2009-05-26 |
US20060253284A1 (en) | 2006-11-09 |
JP4652232B2 (ja) | 2011-03-16 |
AU2003267504A1 (en) | 2005-02-25 |
WO2005015547A1 (fr) | 2005-02-17 |
EP1639579A1 (fr) | 2006-03-29 |
JP2007514959A (ja) | 2007-06-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US9536525B2 (en) | Speaker indexing device and speaker indexing method | |
JP5229478B2 (ja) | Statistical model learning device, statistical model learning method, and program | |
US10008209B1 (en) | Computer-implemented systems and methods for speaker recognition using a neural network | |
JP5059115B2 (ja) | Speech keyword identification method, device, and speech identification system | |
EP2022042B1 (en) | Intersession variability compensation for automatic extraction of information from voice | |
US20160111112A1 (en) | Speaker change detection device and speaker change detection method | |
US20100114572A1 (en) | Speaker selecting device, speaker adaptive model creating device, speaker selecting method, speaker selecting program, and speaker adaptive model making program | |
US20020095287A1 (en) | Method of determining an eigenspace for representing a plurality of training speakers | |
KR102195246B1 (ko) | Emotion classification method using speech signals, and recording medium and device for performing the same | |
CN112017694B (zh) | Method and device for evaluating speech data, storage medium, and electronic device | |
EP1005019B1 (en) | Segment-based similarity measurement method for speech recognition | |
US20020143539A1 (en) | Method of determining an eigenspace for representing a plurality of training speakers | |
KR101011713B1 (ko) | Speech signal analysis method and system for a compressed representation of speakers | |
US7783581B2 (en) | Data learning system for identifying, learning apparatus, identifying apparatus and learning method | |
EP1431959A2 (en) | Gaussian model-based dynamic time warping system and method for speech processing | |
CN1391211A (zh) | Method and system for training parameters in a recognition system | |
KR20060072504A (ko) | Speech recognition method and apparatus | |
US7516071B2 (en) | Method of modeling single-enrollment classes in verification and identification tasks | |
Luettin et al. | Learning to recognise talking faces | |
WO2002029785A1 (en) | Method, apparatus, and system for speaker verification based on orthogonal gaussian mixture model (gmm) | |
JP2008040035A (ja) | Pronunciation rating device and program | |
US11996086B2 (en) | Estimation device, estimation method, and estimation program | |
KR101524848B1 (ko) | Audio type determination device | |
Kamble et al. | Spontaneous emotion recognition for Marathi spoken words | |
JP3036509B2 (ja) | Method and device for determining a threshold in speaker verification | |
Legal Events
Code | Title | Description |
---|---|---|
PA0105 | International application | Patent event date: 20060102; Patent event code: PA01051R01D; Comment text: International Patent Application |
PG1501 | Laying open of application | |
A201 | Request for examination | |
PA0201 | Request for examination | Patent event date: 20080620; Patent event code: PA02012R01D; Comment text: Request for Examination of Application |
E902 | Notification of reason for refusal | |
PE0902 | Notice of grounds for rejection | Patent event date: 20100329; Patent event code: PE09021S01D; Comment text: Notification of reason for refusal |
E701 | Decision to grant or registration of patent right | |
PE0701 | Decision of registration | Patent event date: 20101118; Patent event code: PE07011S01D; Comment text: Decision to Grant Registration |
GRNT | Written decision to grant | |
PR0701 | Registration of establishment | Patent event date: 20110124; Patent event code: PR07011E01D; Comment text: Registration of Establishment |
PR1002 | Payment of registration fee | Payment date: 20110125; Start annual number: 1; End annual number: 3 |
PG1601 | Publication of registration | |
LAPS | Lapse due to unpaid annual fee | |
PC1903 | Unpaid annual fee | |