CN109065059A - Method for identifying a speaker using a voice cluster built from audio-feature principal components - Google Patents

Method for identifying a speaker using a voice cluster built from audio-feature principal components

Info

Publication number
CN109065059A
Authority
CN
China
Prior art keywords
speaker
audio
principal component
new
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811118265.6A
Other languages
Chinese (zh)
Inventor
陈永清
陈东风
王贵珊
李瑞娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Bart (anhui) Intelligent Technology Co Ltd
Original Assignee
New Bart (anhui) Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New Bart (anhui) Intelligent Technology Co Ltd filed Critical New Bart (anhui) Intelligent Technology Co Ltd
Priority to CN201811118265.6A
Publication of CN109065059A
Legal status: Withdrawn (current)

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 - Training, enrolment or model building
    • G10L17/06 - Decision making techniques; Pattern matching strategies
    • G10L17/08 - Use of distortion metrics or a particular distance between probe pattern and reference templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a method for identifying a speaker using a voice cluster built from audio-feature principal components. The method combines principal component analysis with hierarchical clustering on the Euclidean distance between audio features in principal-component space. Specifically: collect a set of different training audio samples; compute the time-domain and frequency-domain audio features of each sample; compute the mean and standard deviation of these features; perform principal component analysis on the training samples using the computed data; represent each audio file by the coordinates of its feature data projected onto the top N principal components; and cluster the speakers with the UPGMA clustering algorithm based on distances in the N-dimensional space. The method is fast and makes it convenient to add a new speaker's voice. Applied to an intelligent language tutoring system, it realizes speaker identification and promptly distinguishes speakers in a conversation among several unknown participants, which supports targeted teaching.

Description

Method for identifying a speaker using a voice cluster built from audio-feature principal components
Technical field:
The invention belongs to the technical field of speaker recognition, and in particular relates to a method for identifying a speaker using a voice cluster built from audio-feature principal components.
Background art:
Speaker recognition is a pattern recognition problem. Techniques for processing and storing voiceprints include frequency estimation, hidden Markov models, Gaussian mixture models, pattern matching algorithms, matrix representations, vector quantization, support vector machines and decision trees; some systems also use "anti-speaker" techniques such as cohort models and world models. In recent years neural networks, in particular deep neural networks and convolutional neural networks, have been widely used in speech recognition with great success, and similar techniques are also applied to speaker recognition. However, existing session-recognition techniques not only require a large amount of voice data but also take a long time to train, which makes them inconvenient for some applications.
At present, service robots are not especially mature either internationally or domestically. A conversational robot must not only understand what you are saying but also follow a conversation among several people at once, which is difficult for a robot: when different voices and intonations are mixed together, the robot cannot keep the dialogue flowing. Session-recognition techniques in the prior art therefore struggle to meet practical application requirements, and to break through this technical barrier the present application provides a method for identifying a speaker using a voice cluster built from audio-feature principal components.
Summary of the invention:
The purpose of the present invention is to provide a method for identifying a speaker using a voice cluster built from audio-feature principal components, so as to realize speaker identification in an intelligent language tutoring system and promptly distinguish speakers in a conversation among several unknown participants.
To achieve the above objective, the present invention adopts the following technical scheme:
The method of the present invention for identifying a speaker using a voice cluster built from audio-feature principal components mainly combines principal component analysis (PCA) with hierarchical clustering on the Euclidean distance between audio features in principal-component space, and specifically includes the following steps:
1) Collect a set of different training audio samples;
2) Compute the time-domain and frequency-domain audio features of each sample with the algorithms provided by the Librosa library; these features mainly include the zero-crossing rate, root-mean-square energy, spectral centroid and bandwidth, Mel-frequency cepstral coefficients (MFCC), and pitch class or chroma.
3) Compute the mean and standard deviation of the above time-domain and frequency-domain audio features;
4) Perform principal component analysis on the training samples using the computed data, and select the top N components that explain 95% of the variance;
5) Represent each audio file by the coordinates of its audio-feature data projected onto the above N principal components;
6) Cluster the speakers with the UPGMA clustering algorithm based on distances in the N-dimensional space.
Clustering based on distances in the N-dimensional space specifically means first merging the closest speakers into a cluster or branch whose coordinates are the average of the speakers, or leaves, it contains, and continuing in this way until all speakers have been added to the cluster, forming a tree.
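A minimal Python sketch of steps 2) to 5) is given below for concreteness; the UPGMA clustering of step 6) is sketched later in the embodiment section. It is an illustration under stated assumptions rather than the patent's reference implementation: the particular Librosa feature functions, the hypothetical file names and the use of scikit-learn's PCA are choices made here.

import numpy as np
import librosa
from sklearn.decomposition import PCA

def feature_vector(path):
    # Mean and standard deviation of each time- and frequency-domain feature of one file.
    y, sr = librosa.load(path, sr=None)
    feats = [
        librosa.feature.zero_crossing_rate(y),           # zero-crossing rate
        librosa.feature.rms(y=y),                         # root-mean-square energy
        librosa.feature.spectral_centroid(y=y, sr=sr),    # spectral centroid
        librosa.feature.spectral_bandwidth(y=y, sr=sr),   # spectral bandwidth
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),      # Mel-frequency cepstral coefficients
        librosa.feature.chroma_stft(y=y, sr=sr),          # pitch class / chroma
    ]
    return np.concatenate([np.hstack([f.mean(axis=1), f.std(axis=1)]) for f in feats])

train_files = ["spk1_a.wav", "spk1_b.wav", "spk2_a.wav"]   # hypothetical training samples
X = np.vstack([feature_vector(p) for p in train_files])

pca = PCA(n_components=0.95).fit(X)    # keep the top N components explaining 95% of the variance
coords = pca.transform(X)              # one row of N-dimensional coordinates per audio file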
Further, a speaker in a new audio recording is identified as follows:
Read or record the new speech, first compute the feature data of the new audio, and convert it into projection coordinates in the N-dimensional principal-component space;
Compare the branches and leaves of the existing cluster tree with the new audio to find the closest speaker, i.e. compute the similarity between the new audio and the closest speaker, specifically:
First compute the distance d, then compute the matching score s by the following equations:
When d ≤ r_ave, s is given by a first equation (shown as an image in the original);
When d > r_ave, s is given by a second equation (shown as an image in the original);
where r_ave and r_sd are the mean and standard deviation of the distances from the closest speaker's audio-feature coordinate samples to their centre, and cdf is the normal cumulative distribution function (a hedged reading of these equations is sketched after this passage).
If the score s is higher than a specified cutoff value, the new audio and the closest speaker are the same speaker; otherwise, the new audio comes from a new speaker.
The coordinates of the new audio obtained above are added to the cluster tree as a new entry so that further voices from this new speaker can be identified; this constitutes a new voice cluster tree.
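The two score equations referred to above appear only as images in the original publication and are not reproduced in this text. The short Python sketch below is therefore one plausible reading consistent with the surrounding definitions (r_ave, r_sd and the normal cumulative distribution function), offered as an assumption rather than the patented formula: the score is perfect when the new audio lies within the average cluster radius, and falls off along the normal tail when it lies outside.

from scipy.stats import norm

def match_score(d, r_ave, r_sd):
    # d: distance from the new audio to the closest speaker's centre in principal-component space.
    # r_ave, r_sd: mean and standard deviation of that speaker's sample-to-centre distances.
    # Assumed form; the patent's own equations are given as images and may differ.
    if d <= r_ave:
        return 1.0                                        # inside the typical radius
    return 2.0 * (1.0 - norm.cdf((d - r_ave) / r_sd))     # two-sided normal tail outside it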
The beneficial effects of the present invention are:
(1) Compared with the prior art, the method of the present invention for identifying a speaker needs only one set of different voice files to train and build an initial cluster tree, and the audio to be identified can be entirely different from these training voices; once the initial cluster tree is built, no further training is needed, new speech can be recognized directly, and a new speaker's voice can be added.
(2) The method of the present invention uses a dedicated algorithm to follow a dialogue clearly in a concise, fast and accurate manner, so it is fast and makes it convenient to add a new speaker's voice.
(3) Applied to an intelligent language tutoring system, the method of the present invention realizes speaker identification and promptly distinguishes speakers in a conversation among several unknown participants, which supports targeted teaching.
Brief description of the drawings:
Fig. 1 is a flow chart of building the speaker voice cluster in a specific embodiment of the invention;
Fig. 2 is a flow chart of identifying a speaker's voice in a specific embodiment of the invention.
Specific embodiment:
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and the embodiments.
Referring to Fig. 1, on the basis of speaker recognition, the present invention first builds the speaker voice cluster by combining principal component analysis (PCA) with hierarchical clustering on the Euclidean distance between audio features in principal-component space. The specific steps are as follows (a code sketch of the clustering step follows the list):
(1) Read the training voice files;
(2) Compute the voice features, i.e. the time-domain and frequency-domain audio features of each training voice file, mainly including the zero-crossing rate, root-mean-square energy, spectral centroid and bandwidth, Mel-frequency cepstral coefficients (MFCC), and pitch class or chroma;
(3) Find the principal components of the voice features, i.e. compute the mean and standard deviation of the above voice features and perform principal component analysis;
(4) Compute the coordinates in the voice-feature principal-component space, i.e. select the top N components that explain 95% of the variance and use the projections onto these N principal components as coordinates;
(5) Cluster the voices based on distances in the principal-component space, and save the trained voice cluster.
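A minimal sketch of step (5), under the assumption that UPGMA can be realised with SciPy's average-linkage hierarchical clustering over the principal-component coordinates. The variables coords, train_files and pca are the hypothetical ones from the earlier sketch, and the file name voice_cluster.pkl is likewise an assumption.

import pickle
from scipy.cluster.hierarchy import linkage

# UPGMA is agglomerative clustering with average linkage on Euclidean distances.
tree = linkage(coords, method="average", metric="euclidean")

# Persist the trained voice cluster (tree, coordinates, PCA model, file list) for later recognition.
with open("voice_cluster.pkl", "wb") as f:
    pickle.dump({"tree": tree, "coords": coords, "files": train_files, "pca": pca}, f)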
A voice cluster library built as above from the principal components of the speakers' speech audio features is illustrated in Table 1 below:
Table 1. Voice cluster library built from the principal components of the speakers' speech audio features
The voice cluster library built in Table 1 above is scored against the parameter set obtained from feature analysis to identify whether a speaker is in the voiceprint model library.
Referring to Fig. 2, the saved voice cluster is processed with the UPGMA clustering algorithm: the closest speakers are merged into a cluster or branch whose coordinates are the average of the speakers, or leaves, it contains, and this continues until all speakers have been added to the cluster, forming a tree. When new speech arrives, the steps for identifying the speaker with the method of the present invention are as follows (an end-to-end sketch follows the steps):
(1) On the basis of the trained voice cluster that has been read in, read or record the new speech;
(2) Compute the features of the new speech;
(3) Compute the coordinates of the new speech features in the principal-component space, i.e. convert the new speech features into projection coordinates in the N-dimensional principal-component space;
(4) Find the voice in the trained voice cluster that is closest to the new speech, i.e. compare the branches and leaves of the existing cluster tree with the new speech to find the closest speaker;
(5) Compute the similarity between the new speech and the closest speaker, specifically:
First compute the distance d, then compute the matching score s by the following equations:
When d ≤ r_ave, s is given by a first equation (shown as an image in the original);
When d > r_ave, s is given by a second equation (shown as an image in the original);
where r_ave and r_sd are the mean and standard deviation of the distances from the closest speaker's audio-feature coordinate samples to their centre, and cdf is the normal cumulative distribution function.
(6) If the score s is greater than or equal to the specified cutoff value, the new speech and the closest voice are the same speaker; otherwise the new speech comes from a new speaker;
(7) Add the new speech as a new entry to the above cluster tree, constituting a new voice cluster tree.
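A minimal end-to-end sketch of steps (1) to (7), reusing the hypothetical helpers introduced above (feature_vector, pca, coords, train_files and match_score). The cutoff value and the radius statistics are assumptions: the patent only states that a specified cutoff is used and that r_ave and r_sd come from the closest speaker's own samples.

import numpy as np

# (1)-(3) Read the new speech and project its features into principal-component space.
new_xy = pca.transform(feature_vector("new_utterance.wav").reshape(1, -1))[0]

# (4) Closest existing entry (leaf) in the cluster, by Euclidean distance.
dists = np.linalg.norm(coords - new_xy, axis=1)
nearest = int(np.argmin(dists))
d = dists[nearest]

# (5) Similarity score; the radius statistics here are crude stand-ins, whereas the method
# takes them from the closest speaker's sample-to-centre distances.
r_ave, r_sd = dists.mean(), dists.std()
s = match_score(d, r_ave, r_sd)

# (6)-(7) Decide and, for a new speaker, extend the cluster with the new coordinates.
CUTOFF = 0.05                                  # hypothetical threshold
if s >= CUTOFF:
    print("Same speaker as", train_files[nearest], "score", round(s, 3))
else:
    print("New speaker, score", round(s, 3))
    coords = np.vstack([coords, new_xy])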

Claims (4)

1. A method for identifying a speaker using a voice cluster built from audio-feature principal components, characterized in that the method combines principal component analysis with hierarchical clustering on the Euclidean distance between audio features in principal-component space, and specifically includes the following steps:
1) Collect a set of different training audio samples;
2) Compute the time-domain and frequency-domain audio features of each sample with the algorithms provided by the Librosa library;
3) Compute the mean and standard deviation of the above time-domain and frequency-domain audio features;
4) Perform principal component analysis on the training samples using the computed data, and select the top N components that explain 95% of the variance;
5) Represent each audio file by the coordinates of its audio-feature data projected onto the above N principal components;
6) Cluster the speakers with the UPGMA clustering algorithm based on distances in the N-dimensional space.
2. The method for identifying a speaker using a voice cluster built from audio-feature principal components according to claim 1, characterized in that the time-domain and frequency-domain audio features of the samples in step 2) include the zero-crossing rate, root-mean-square energy, spectral centroid and bandwidth, Mel-frequency cepstral coefficients, and pitch class or chroma.
3. The method for identifying a speaker using a voice cluster built from audio-feature principal components according to claim 1, characterized in that clustering based on distances in the N-dimensional space in step 6) specifically means first merging the closest speakers into a cluster or branch whose coordinates are the average of the speakers, or leaves, it contains, and continuing in this way until all speakers have been added to the cluster, forming a tree.
4. A method for identifying the speaker in a new audio recording using the method according to any one of claims 1 to 3, characterized in that the method for identifying the speaker in the new audio includes the following steps:
Read or record the new speech, first compute the feature data of the new audio, and convert it into projection coordinates in the N-dimensional principal-component space;
Compare the branches and leaves of the existing cluster tree with the new audio to find the closest speaker, i.e. compute the similarity between the new audio and the closest speaker, specifically:
First compute the distance d, then compute the matching score s by the following equations:
When d ≤ r_ave, s is given by a first equation (shown as an image in the original);
When d > r_ave, s is given by a second equation (shown as an image in the original);
where r_ave and r_sd are the mean and standard deviation of the distances from the closest speaker's audio-feature coordinate samples to their centre, and cdf is the normal cumulative distribution function;
If the score s is higher than a specified cutoff value, the new audio and the closest speaker are the same speaker; otherwise, the new audio comes from a new speaker;
The coordinates of the new audio obtained are added to the above cluster tree as a new entry so that further voices from this new speaker can be identified; this constitutes a new voice cluster tree.
CN201811118265.6A 2018-09-26 2018-09-26 Method for identifying a speaker using a voice cluster built from audio-feature principal components Withdrawn CN109065059A (en)

Priority Applications (1)

Application Number: CN201811118265.6A (published as CN109065059A); Priority Date: 2018-09-26; Filing Date: 2018-09-26; Title: Method for identifying a speaker using a voice cluster built from audio-feature principal components

Applications Claiming Priority (1)

Application Number: CN201811118265.6A (published as CN109065059A); Priority Date: 2018-09-26; Filing Date: 2018-09-26; Title: Method for identifying a speaker using a voice cluster built from audio-feature principal components

Publications (1)

Publication Number: CN109065059A; Publication Date: 2018-12-21

Family

ID=64765876

Family Applications (1)

Application Number: CN201811118265.6A (published as CN109065059A, Withdrawn); Priority Date: 2018-09-26; Filing Date: 2018-09-26; Title: Method for identifying a speaker using a voice cluster built from audio-feature principal components

Country Status (1)

Country Link
CN (1) CN109065059A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1178467A1 (en) * 2000-07-05 2002-02-06 Matsushita Electric Industrial Co., Ltd. Speaker verification and identification in their own spaces
JP2013061402A (en) * 2011-09-12 2013-04-04 Nippon Telegr & Teleph Corp <Ntt> Spoken language estimating device, method, and program
CN103413551A (en) * 2013-07-16 2013-11-27 清华大学 Sparse dimension reduction-based speaker identification method
CN104538035A (en) * 2014-12-19 2015-04-22 深圳先进技术研究院 Speaker recognition method and system based on Fisher supervectors
CN107342077A (en) * 2017-05-27 2017-11-10 国家计算机网络与信息安全管理中心 A kind of speaker segmentation clustering method and system based on factorial analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Wenlin et al.: "Regularization-based eigenvoice speaker adaptation method", Acta Automatica Sinica *
Fang Erqing et al.: "Automatic age estimation method based on audio-visual information", Journal of Software *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020143263A1 (en) * 2019-01-11 2020-07-16 华南理工大学 Speaker identification method based on speech sample feature space trajectory
CN109800299A (en) * 2019-02-01 2019-05-24 浙江核新同花顺网络信息股份有限公司 A kind of speaker clustering method and relevant apparatus
CN110135492A (en) * 2019-05-13 2019-08-16 山东大学 Equipment fault diagnosis and method for detecting abnormality and system based on more Gauss models
CN112019786A (en) * 2020-08-24 2020-12-01 上海松鼠课堂人工智能科技有限公司 Intelligent teaching screen recording method and system

Similar Documents

Publication Publication Date Title
CN109065059A (en) Method for identifying a speaker using a voice cluster built from audio-feature principal components
CN102509547B Method and system for voiceprint recognition based on vector quantization
CN106847292B Voiceprint recognition method and device
CN104036774B (en) Tibetan dialect recognition methods and system
CN108922541B (en) Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models
CN108986824B (en) Playback voice detection method
CN102324232A (en) Method for recognizing sound-groove and system based on gauss hybrid models
WO2019153404A1 (en) Smart classroom voice control system
CN105469784B (en) A kind of speaker clustering method and system based on probability linear discriminant analysis model
CN107342077A (en) A kind of speaker segmentation clustering method and system based on factorial analysis
CN107393554A A feature extraction method fusing inter-class standard deviation in acoustic scene classification
CN103811009A (en) Smart phone customer service system based on speech analysis
CN105261367B A kind of speaker recognition method
CN106128465A (en) A kind of Voiceprint Recognition System and method
CN109215665A (en) A kind of method for recognizing sound-groove based on 3D convolutional neural networks
CN110457432A (en) Interview methods of marking, device, equipment and storage medium
CN1808567A (en) Voice-print authentication device and method of authenticating people presence
CN109346084A Speaker recognition method based on a deep stacked autoencoder network
CN110047504A Speaker recognition method under identity-vector x-vector linear transformation
CN108735200A A kind of automatic speaker labeling method
CN109961794A A kind of hierarchical speaker recognition method based on model clustering
CN110299150A (en) A kind of real-time voice speaker separation method and system
CN107358947A Speaker re-recognition method and system
CN109377981A (en) The method and device of phoneme alignment
CN106898355A A kind of speaker recognition method based on two-stage modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 2018-12-21