CN109545229A - Speaker recognition method based on speech-sample feature-space trajectories - Google Patents

Speaker recognition method based on speech-sample feature-space trajectories

Info

Publication number
CN109545229A
CN109545229A
Authority
CN
China
Prior art keywords
speech
feature space
speaker
feature
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910027145.3A
Other languages
Chinese (zh)
Other versions
CN109545229B (en)
Inventor
He Qianhua (贺前华)
Wu Keqian (吴克乾)
Xie Wei (谢伟)
Pang Wenfeng (庞文丰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910027145.3A priority Critical patent/CN109545229B/en
Publication of CN109545229A publication Critical patent/CN109545229A/en
Priority to PCT/CN2019/111530 priority patent/WO2020143263A1/en
Priority to SG11202103091XA priority patent/SG11202103091XA/en
Application granted granted Critical
Publication of CN109545229B publication Critical patent/CN109545229B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 — Speaker identification or verification techniques
    • G10L17/02 — Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 — Training, enrolment or model building
    • G10L17/06 — Decision making techniques; Pattern matching strategies
    • G10L17/08 — Use of distortion metrics or a particular distance between probe pattern and reference templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a speaker recognition method based on trajectories in the feature space of speech samples. The method comprises: clustering unlabeled speech feature data to obtain a representation of the speech feature space as a set of marker subclasses; enrolling speakers with labeled speech samples to obtain each speaker's distribution information and motion-trajectory information in the speech feature space; and identifying a speech sample to be recognized using the enrolled speakers' feature-space distributions together with the sample's motion-trajectory information. Because the invention locates speakers within a shared speech feature space, the computational complexity of recognition is low, solving the problem of the high computational complexity of GMM-UBM; moreover, a speaker speech feature space built for one language can serve as the speech feature space for speaker recognition in another language, realizing data sharing.

Description

Speaker recognition method based on speech-sample feature-space trajectories
Technical field
The present invention relates to the field of biometric recognition, and in particular to a speaker recognition method based on trajectories in the feature space of speech samples.
Background technique
With the development of artificial intelligence, audio perception has become a focus of audio signal processing research, in which audio classification and audio recognition are the key problems. In engineering applications, audio classification takes the form of speaker recognition, audio event recognition, audio event detection, and so on. Speaker recognition is an identity verification technology, i.e. one kind of biometric recognition. Biometric recognition identifies individuals automatically from their biological characteristics, and includes fingerprint recognition, iris recognition, gene identification, face recognition, and the like. Compared with other identity verification technologies, speaker recognition is more convenient and natural, and is perceived as less invasive by users. Because it works from the voice signal, it offers natural human-computer interaction, easy signal acquisition, and the possibility of remote identification.
An existing speaker recognition system comprises two stages: a training stage and a recognition stage. In the training stage, the system builds a model for each speaker from collected speech; in the recognition stage, the system matches input speech against the speaker models and makes a decision. Such a system needs to extract features that reflect speaker individuality from the voice signal and build a model accurate enough to distinguish each speaker from all others. Current audio classification techniques fall into two broad classes: generative statistical models, such as the Gaussian mixture model (GMM) and the hidden Markov model (HMM), and deep neural network methods such as DNN, RNN, or LSTM. Either way, a large amount of labeled training data is required, and deep neural network methods demand even larger sample sizes to reach good recognition performance. Methods based on GMM or HMM give no special consideration to the discriminative information between audio classes, nor do they account for sharing sample data across classes. For example, the method in the paper by Reynolds et al. of MIT, "Speaker Verification Using Adapted Gaussian Mixture Models" (Digital Signal Processing 10 (2000), 19-41), has high computational complexity. With large training sets, deep neural network methods perform well; for example, Google's paper "End-to-End Text-Dependent Speaker Verification" (2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pages 5115-5119) uses a neural network both to extract features from speech and to train the model. However, training a neural network requires a large amount of labeled speech, the cost of acquiring such samples is very high, and deep neural network methods lack interpretability; they are essentially black boxes.
Existing speaker recognition techniques thus tend to have high computational complexity and to require a large amount of labeled speaker speech to train their models, and collecting large amounts of labeled speech demands enormous effort. A more convenient and effective speaker recognition method and system is therefore needed.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a speaker recognition method based on trajectories in the feature space of speech samples. The speech feature space does not depend on speaker, text, or language, so it can be built from any qualifying voice data, realizing data sharing; and a speaker's voice trajectory can be constructed even from a single sample, so a large amount of labeled voice data is not needed, overcoming the prior art's dependence on collecting large labeled corpora.
The purpose of the present invention can be achieved through the following technical solutions:
A speaker recognition method based on trajectories in the feature space of speech samples, in which a speech sample can be regarded as one movement through the speech feature space, characterized by its range of activity in the space and its trajectory. The method comprises the following steps:
Step 1), constructing the speech feature space Ω: cluster unlabeled speech samples in the feature space with a clustering method, and take a representation of each resulting subclass as the expression of the speech feature space, Ω = {gk, k = 1, 2, ..., K};
Step 2), constructing speaker knowledge: using clean speech samples labeled with speaker identity, obtain each speaker's distribution information and motion-trajectory information on the speech feature space Ω;
Step 3), speaker recognition: for a speech sample to be recognized, first obtain the sample's distribution expression over the speech feature space and its trajectory; then, using the enrolled speakers' feature-space distributions, compute the difference between the sample distribution and each prior distribution, together with the accumulated local distribution difference along the trajectory, and use these as the basis for the speaker recognition decision.
Further, in constructing the speech feature space Ω in step 1), any clean speech samples can be used; no constraint is placed on speaker or language.
Further, in step 1) the speech samples are clustered in the feature space using K-means or another clustering method. In the expression Ω = {gk, k = 1, 2, ..., K}, each gk can be a class data distribution function (e.g. a Gaussian distribution function), a cluster-centre vector (centroid), or a generative model (e.g. a hidden Markov model or a neural network); these are markers with localization ability, referred to as feature-space markers. The number K of marker subclasses used in modelling the speech feature space determines the granularity of the space's expression: the larger K is, the finer the expression. On the other hand, the accuracy of the spatial expression is related to the scale of the data: the richer the data, the more complete the expression; likewise, the more targeted the data used to build the space, the more accurate the expression for a particular problem.
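The clustering in step 1) can be sketched as follows. This is a minimal K-means implementation in NumPy (one of the clustering choices the text names); the function name and the diagonal-variance output are illustrative assumptions, chosen so that each subclass can later serve as a Gaussian marker gk(mk, Uk).

```python
import numpy as np

def build_feature_space(features, K, n_iter=50, seed=0):
    """Cluster unlabeled frame features (T x d) into K marker subclasses.

    A minimal K-means sketch of step 1); the patent equally allows other
    clustering methods or generative models as markers. Returns per-subclass
    means and diagonal variances so each marker can act as a Gaussian g_k.
    """
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # squared Euclidean distance of every frame to every centre
        d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(K):
            member = features[labels == k]
            if len(member) > 0:
                centers[k] = member.mean(axis=0)
    variances = np.stack([
        features[labels == k].var(axis=0) + 1e-6 if np.any(labels == k)
        else np.ones(features.shape[1])
        for k in range(K)
    ])
    return centers, variances
```

The returned (K, d) arrays of means and variances play the role of the marker set Ω; larger K gives a finer-grained expression of the space, as described above.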
Further, in step 2), the speech feature space is labeled using clean speech samples that carry speaker-identity labels. With Gaussian distributions gk(mk,Uk) as the space markers, the speaker's feature-space distribution information is obtained as follows:
One, compute the position association degree of each speech-sample feature ft with each space marker gk(mk,Uk), defined as:
where each space marker is represented by a multidimensional Gaussian distribution, mk is the mean vector of the k-th Gaussian distribution, and Uk is the covariance matrix of the k-th multidimensional Gaussian distribution;
Two, compute the expected value of the position association degree between the speaker's sample set and each space marker gk(mk,Uk):
where the quantity in the formula denotes the association degree between the t-th frame feature of the n-th sample and the space marker gk(mk,Uk);
Three, compute the speaker's feature-space distribution as:
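The formula images for the association degree and its expectation are not reproduced in this text, so the following NumPy sketch rests on an assumption: that the association ctk is the normalized Gaussian posterior (responsibility) of marker gk for frame ft, and that the speaker distribution is its average over all enrollment frames, which matches the surrounding description.

```python
import numpy as np

def association(frames, means, variances):
    """Position association c_tk of each frame f_t with each diagonal
    Gaussian marker g_k(m_k, U_k). The patent's exact formula is an image
    not reproduced in the text; this sketch assumes the normalized Gaussian
    posterior, which sums to 1 over k for every frame."""
    diff2 = (frames[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    log_pdf = -0.5 * (diff2.sum(axis=2) + np.log(2 * np.pi * variances).sum(axis=1))
    log_pdf -= log_pdf.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_pdf)
    return p / p.sum(axis=1, keepdims=True)

def speaker_distribution(enrollment_frames, means, variances):
    """Speaker feature-space distribution: the expected association over
    all frames of all enrollment samples (steps two and three above)."""
    c = np.vstack([association(f, means, variances) for f in enrollment_frames])
    return c.mean(axis=0)
```

Under this assumption the resulting vector sums to 1 and concentrates its mass on the markers the speaker's features frequent.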
Further, in step 2), the temporal motion-trajectory information of a speaker's speech sample on the speech feature space Ω is expressed as the neighborhood sequence Ψ1Ψ2…ΨT of the sample's features, where the δ-neighborhood of feature ft is Ψt = {gk | dtk < δ} and dtk is the Mahalanobis distance between the speech-sample feature and the marker distribution, i.e. dtk = sqrt((ft − mk)ᵀ Uk⁻¹ (ft − mk)).
Further, the decision threshold δ of the δ-neighborhood Ψt = {gk | dtk < δ} of the speech-sample feature ft refers to the properties of the normal distribution, and is chosen with 2 < δ < 3.
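The δ-neighborhood and trajectory construction above can be sketched directly (diagonal-covariance markers assumed, consistent with the earlier sketches; function names are illustrative):

```python
import numpy as np

def delta_neighborhood(frame, means, variances, delta=2.5):
    """Psi_t = {g_k | d_tk < delta}: indices of markers whose Mahalanobis
    distance to the frame is below delta (diagonal-covariance case).
    The text prescribes 2 < delta < 3, by analogy with the 2-to-3-sigma
    coverage of a normal distribution."""
    d = np.sqrt(((frame - means) ** 2 / variances).sum(axis=1))
    return np.flatnonzero(d < delta)

def trajectory(frames, means, variances, delta=2.5):
    """Trajectory of a sample: the neighborhood sequence Psi_1 ... Psi_T."""
    return [delta_neighborhood(f, means, variances, delta) for f in frames]
```

Each Ψt is a (possibly empty) set of marker indices; the sequence of these sets is the sample's trajectory through the space.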
Further, in step 3), the speaker recognition process for a speech sample f = {f1,f2,…,fT} comprises the following steps:
One, compute the distribution P = (p1,p2,…,pK) of the sample f = {f1,f2,…,fT} over the speech feature space Ω, where
Two, determine the motion trajectory Ψ1Ψ2…ΨT of f = {f1,f2,…,fT} in the speech feature space Ω, with Ψt = {gk | dtk < δ};
Three, compute the distance between the sample distribution P = (p1,p2,…,pK) and the prior feature-space distribution of each speaker s, and then screen a candidate solution set Sp that contains the true solution:
Four, compute the distance metric along the trajectory Ψ1Ψ2…ΨT and select the most likely solution from Sp, completing the speaker recognition.
In particular, in step 3), good speaker recognition performance can also be obtained using only the sample's spatial distribution information P = (p1,p2,…,pK) or only the trajectory distance.
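The two-stage decision of step 3) can be sketched as follows. The patent's distance formulas are images not reproduced in this text, so both metrics here are assumptions: an α-norm between distributions for candidate screening (α = 2 in the embodiment), and the same difference accumulated only over the trajectory neighborhoods Ψt for the final choice. Function and parameter names are illustrative.

```python
import numpy as np

def identify_speaker(P, traj, speaker_P, alpha=2, n_candidates=10):
    """Two-stage speaker decision (sketch under assumed metrics).

    P         : sample distribution over the K markers
    traj      : list of index arrays Psi_1 ... Psi_T (the trajectory)
    speaker_P : dict mapping speaker name -> enrolled prior distribution
    """
    def dist(a, b):
        # assumed alpha-norm distance between two distributions
        return (np.abs(a - b) ** alpha).sum() ** (1.0 / alpha)

    # Stage 1: screen candidates by global distribution distance
    names = sorted(speaker_P, key=lambda s: dist(P, speaker_P[s]))
    candidates = names[:n_candidates]

    # Stage 2: accumulate local distribution differences along the trajectory
    def traj_dist(s):
        acc = sum((np.abs(P[psi] - speaker_P[s][psi]) ** alpha).sum()
                  for psi in traj)
        return acc ** (1.0 / alpha)

    return min(candidates, key=traj_dist)
```

As the text notes, either stage alone (distribution distance only, or trajectory distance only) can already give usable recognition performance.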
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. In the speaker recognition method based on speech-sample feature-space trajectories provided by the invention, the speech feature space is built by clustering a large amount of speech features, which requires no labeled data. The samples used to build the space may come from different speakers, with no exact requirements on speech content, speaker age, or language. This overcomes the need of neural-network methods for large amounts of labeled speech, and makes the data acquisition for building the speech space easy to realize.
2. The method is based on the localization and trace information of a speaker's speech features in the speech feature space, which differs from source generative modelling approaches such as the hidden Markov model (HMM): localization is relative, whereas a generative model is absolute. Compared with deep neural network methods, the method is interpretable, since each piece of knowledge has a definite physical meaning; for example, the association-degree distribution P = (p1,p2,…,pK) of the sample features on the space Ω expresses both the sample's range of activity (the space represented by the marker subclasses corresponding to nonzero elements) and its distribution within that space.
3. The essence of the method is the localization of speech features in the space: the features of different speakers are located on the established speech feature space, and the association degrees expressing different speakers' feature locations capture the distinctions between speakers with little computation. Compared with GMM or HMM methods, which must fit a generative model to each individual speaker, the method has lower computational complexity.
4. The set of feature-space marker subclasses is a reference frame for locating speakers' speech features; its relation to the sample to be recognized is relative, with no strict dependency, so the feature space is sharable. An established speech feature space can be transferred to other speaker datasets for recognition; for example, a speaker speech feature space built for one language can serve as the speech feature space for speaker recognition in another language, realizing data sharing.
Detailed description of the invention
Fig. 1 is the overall flow chart of the speaker recognition method in Embodiment 1 of the present invention.
Fig. 2 is a flow chart of the steps for establishing the speech feature space in Embodiment 1 of the present invention.
Fig. 3 is a flow chart of the steps for generating the speakers' feature-space distribution information and trace information in Embodiment 1 of the present invention.
Fig. 4 is a flow chart of the steps for recognizing a speech sample to be identified in Embodiment 1 of the present invention.
Specific embodiment
The present invention will now be described in further detail with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment 1:
This embodiment provides a speaker recognition method based on speech-sample feature-space trajectories, whose overall flow chart is shown in Fig. 1, comprising the following three steps:
1) Establish the speech feature space Ω. Any clean speech samples can be used, without any constraint on speaker, language, or other factors. The speech samples are then clustered in the feature space with a clustering method, and the resulting subclass data representations form the expression of the speech feature space, {gk, k = 1, 2, ..., K};
2) Construct speaker knowledge, comprising two parts: each speaker's distribution information and motion-trajectory information in the speech feature space;
3) For a speech sample to be recognized, perform identification using the speakers' feature-space distribution information and the sample's motion-trajectory information.
With reference to Fig. 2, the flow chart of the steps for establishing the speech feature space in this embodiment: speaker voice data from the aishell Chinese corpus is used as the unlabeled speech sample set. aishell contains 400 speakers in total, and 60 wav files per speaker are selected for training the speech feature space. 12-dimensional MFCC features are extracted from the unlabeled speech sample set X = {x1, x2, ..., xN} to obtain the feature set, whose elements are short-time frame features and whose total size tN is the number of frames over all samples;
Then a GMM with K mixture components is trained on the feature sequences; the GMM's weight information is discarded, and each Gaussian component is retained as a marker subclass of the speech feature space. Here K, the number of audio feature-space markers, is chosen as 4096 in order to describe the audio feature space with higher precision;
The speech feature space markers are expressed as Ω = {gk, k = 1, 2, ..., K}, where g = N(m, U) is a multidimensional Gaussian distribution function;
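The marker training described above (a K-component GMM whose mixture weights are discarded) can be sketched with scikit-learn, assuming it is available; the function name is illustrative, and K = 4096 as in the embodiment is shown smaller here for brevity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_marker_subclasses(features, K, seed=0):
    """Fit a K-component diagonal GMM on unlabeled MFCC frames and keep
    only the component Gaussians as marker subclasses g_k = N(m_k, U_k),
    discarding the mixture weights as the embodiment describes."""
    gmm = GaussianMixture(n_components=K, covariance_type="diag",
                          max_iter=100, random_state=seed)
    gmm.fit(features)
    return gmm.means_, gmm.covariances_  # weights_ intentionally unused
```

With `covariance_type="diag"` both returned arrays have shape (K, d), matching the diagonal-Gaussian markers used in the earlier sketches.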
With reference to Fig. 3, the flow chart of the steps for generating the speakers' feature-space distribution information in this embodiment: for each speaker, 20 wav files from aishell are used to label the speech feature space. The target-speaker speech sample set is Y = {(y1,s1),(y2,s2),…,(yM,sM)}, with si ∈ S = {Sl, l = 1, 2, ..., L} (the speaker set); the samples of speaker Sl are Yl = {ym | sm = Sl, m = 1, 2, ..., M}, from which the audio feature sequence Fl = {f1,f2,…,ftl} is extracted. Compute the position association degree of every sample feature ft with each space marker gk(mk,Uk):
Compute the expected value of the position association degree between the speaker's sample set and each space marker gk(mk,Uk):
where the quantity in the formula is the position association degree between the t-th frame feature of the n-th sample and the marker gk(mk,Uk);
Compute the speaker's feature-space distribution as:
Processing each speaker's enrollment speech in the target-speaker set yields each speaker's speech-feature distribution information.
The temporal motion-trajectory information of a speech sample is expressed as the neighborhood sequence Ψ1Ψ2…ΨT of the sample's features, where the δ-neighborhood of feature ft is Ψt = {gk | dtk < δ} and dtk is the Mahalanobis distance between the feature and the distribution, i.e. dtk = sqrt((ft − mk)ᵀ Uk⁻¹ (ft − mk)).
With reference to Fig. 4, the flow chart of the steps for recognizing a speech sample in this embodiment: the features of the speech sample to be recognized are f = {f1,f2,…,fT}, and the recognition process is as follows:
Compute the distribution P = (p1,p2,…,pK) of f = {f1,f2,…,fT} over the feature space Ω;
where the position association degree of feature ft with the space marker gk(mk,Uk) is:
and the position association degree of the sample to be recognized with the space marker gk(mk,Uk) is:
Determine the trajectory Ψ1Ψ2…ΨT of the speech features f = {f1,f2,…,fT} in the feature space Ω, where Ψt = {gk | dtk < δ};
Compute the distance between the sample distribution P = (p1,p2,…,pK) and the prior feature-space distribution of each speaker s, with α taken as 2; then screen a candidate solution set Sp containing the true solution by selecting the 10 speakers with the smallest distances as candidate recognition results;
Compute the distance metric along the trajectory Ψ1Ψ2…ΨT, with α taken as 2, and from the 10 candidate speakers select the speaker with the smallest trajectory distance as the recognition result.
Embodiment 2:
This embodiment provides a speaker recognition method based on speech-sample feature-space trajectories, comprising the following steps:
Step 1, establish the speech feature space marker subclasses using voice data from the English corpus TIMIT;
Step 2, enroll the target speaker set using voice data from the aishell corpus, as in Embodiment 1;
Step 3, recognize the speech samples to be identified, as in Embodiment 1.
The recognition performance obtained differs only slightly from that of Embodiment 1, which demonstrates that a speaker speech feature space built for one language can serve as the speech feature space for speaker recognition in another language, realizing data sharing.
The above is only a preferred embodiment of the present patent, but the scope of protection of the patent is not limited thereto. Any person skilled in the art who, within the scope disclosed by the patent, makes equivalent substitutions or changes according to the technical solution and inventive concept of the patent falls within the scope of protection of the patent.

Claims (7)

1. A speaker recognition method based on speech-sample feature-space trajectories, in which a speech sample can be regarded as one movement through the speech feature space, characterized by its range of activity in the space and its trajectory, wherein the method comprises the following steps:
Step 1), constructing the speech feature space Ω: cluster unlabeled speech samples in the feature space with a clustering method, and take a representation of each resulting subclass as the expression of the speech feature space, Ω = {gk, k = 1, 2, ..., K};
Step 2), constructing speaker knowledge: using clean speech samples labeled with speaker identity, obtain each speaker's distribution information and motion-trajectory information on the speech feature space Ω;
Step 3), speaker recognition: for a speech sample to be recognized, first obtain the sample's distribution expression over the speech feature space and its trajectory; then, using the enrolled speakers' feature-space distributions, compute the difference between the sample distribution and each prior distribution, together with the accumulated local distribution difference along the trajectory, and use these as the basis for the speaker recognition decision.
2. The speaker recognition method based on speech-sample feature-space trajectories according to claim 1, characterized in that: in constructing the speech feature space Ω in step 1), any clean speech samples can be used, with no constraint on speaker or language.
3. The speaker recognition method based on speech-sample feature-space trajectories according to claim 1, characterized in that: in the expression Ω = {gk, k = 1, 2, ..., K}, each gk can be a class data distribution function, a cluster-centre vector, or a generative model; these markers with localization ability are referred to as feature-space markers, and the number K of marker subclasses used in modelling the speech feature space determines the granularity of the space's expression: the larger K is, the finer the expression.
4. The speaker recognition method based on speech-sample feature-space trajectories according to claim 1, characterized in that: in step 2), the speech feature space is labeled using clean speech samples with speaker-identity labels; with Gaussian distributions gk(mk,Uk) as the space markers, the speaker's feature-space distribution information is obtained as follows:
One, compute the position association degree of each speech-sample feature ft with each space marker gk(mk,Uk), defined as:
where each space marker is represented by a multidimensional Gaussian distribution, mk is the mean vector of the k-th Gaussian distribution, and Uk is the covariance matrix of the k-th multidimensional Gaussian distribution;
Two, compute the expected value of the position association degree between the speaker's sample set and each space marker gk(mk,Uk):
where the quantity in the formula denotes the association degree between the t-th frame feature of the n-th sample and the space marker gk(mk,Uk);
Three, compute the speaker's feature-space distribution as:
5. The speaker recognition method based on speech-sample feature-space trajectories according to claim 4, characterized in that: in step 2), the temporal motion-trajectory information of a speaker's speech sample on the speech feature space Ω is expressed as the neighborhood sequence Ψ1Ψ2…ΨT of the sample's features, where the δ-neighborhood of feature ft is Ψt = {gk | dtk < δ} and dtk is the Mahalanobis distance between the speech-sample feature and the marker distribution, i.e. dtk = sqrt((ft − mk)ᵀ Uk⁻¹ (ft − mk)).
6. The speaker recognition method based on speech-sample feature-space trajectories according to claim 5, characterized in that: the decision threshold δ of the δ-neighborhood Ψt = {gk | dtk < δ} of the speech-sample feature ft refers to the properties of the normal distribution and is chosen with 2 < δ < 3.
7. The speaker recognition method based on speech-sample feature-space trajectories according to claim 5, characterized in that, in step 3), the speaker recognition process for a speech sample f = {f1,f2,…,fT} comprises the following steps:
One, compute the distribution P = (p1,p2,…,pK) of the sample f = {f1,f2,…,fT} over the speech feature space Ω, where
Two, determine the motion trajectory Ψ1Ψ2…ΨT of f = {f1,f2,…,fT} in the speech feature space Ω, with Ψt = {gk | dtk < δ};
Three, compute the distance between the sample distribution P = (p1,p2,…,pK) and the prior feature-space distribution of each speaker s, and then screen a candidate solution set Sp that contains the true solution:
Four, compute the distance metric along the trajectory Ψ1Ψ2…ΨT and select the most likely solution from Sp, completing the speaker recognition.
CN201910027145.3A 2019-01-11 2019-01-11 Speaker recognition method based on voice sample characteristic space track Active CN109545229B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201910027145.3A CN109545229B (en) 2019-01-11 2019-01-11 Speaker recognition method based on voice sample characteristic space track
PCT/CN2019/111530 WO2020143263A1 (en) 2019-01-11 2019-10-16 Speaker identification method based on speech sample feature space trajectory
SG11202103091XA SG11202103091XA (en) 2019-01-11 2019-10-16 A Speaker Recognition Method Based on Trajectories in Feature Spaces of Voice Samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910027145.3A CN109545229B (en) 2019-01-11 2019-01-11 Speaker recognition method based on voice sample characteristic space track

Publications (2)

Publication Number Publication Date
CN109545229A true CN109545229A (en) 2019-03-29
CN109545229B CN109545229B (en) 2023-04-21

Family

ID=65835222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910027145.3A Active CN109545229B (en) 2019-01-11 2019-01-11 Speaker recognition method based on voice sample characteristic space track

Country Status (3)

Country Link
CN (1) CN109545229B (en)
SG (1) SG11202103091XA (en)
WO (1) WO2020143263A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111081261A (en) * 2019-12-25 2020-04-28 华南理工大学 Text-independent voiceprint recognition method based on LDA
CN111128128A (en) * 2019-12-26 2020-05-08 华南理工大学 Voice keyword detection method based on complementary model scoring fusion
WO2020143263A1 (en) * 2019-01-11 2020-07-16 华南理工大学 Speaker identification method based on speech sample feature space trajectory
CN111933156A (en) * 2020-09-25 2020-11-13 广州佰锐网络科技有限公司 High-fidelity audio processing method and device based on multiple feature recognition

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487978B (en) * 2020-11-30 2024-04-16 清华珠三角研究院 Method and device for positioning speaker in video and computer storage medium
CN113611285B (en) * 2021-09-03 2023-11-24 哈尔滨理工大学 Language identification method based on stacked bidirectional time sequence pooling
CN117235435B (en) * 2023-11-15 2024-02-20 世优(北京)科技有限公司 Method and device for determining audio signal loss function

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6067517A (en) * 1996-02-02 2000-05-23 International Business Machines Corporation Transcription of speech data with segments from acoustically dissimilar environments
CN1652206A (en) * 2005-04-01 2005-08-10 郑方 Sound veins identifying method
JP2009063773A (en) * 2007-09-05 2009-03-26 Nippon Telegr & Teleph Corp <Ntt> Speech feature learning device and speech recognition device, and method, program and recording medium thereof
CN102024455A (en) * 2009-09-10 2011-04-20 索尼株式会社 Speaker recognition system and method

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5598507A (en) * 1994-04-12 1997-01-28 Xerox Corporation Method of speaker clustering for unknown speakers in conversational audio data
CN102479511A (en) * 2010-11-23 2012-05-30 盛乐信息技术(上海)有限公司 Large-scale voiceprint authentication method and system
CN105845141A (en) * 2016-03-23 2016-08-10 广州势必可赢网络科技有限公司 Speaker confirmation model, speaker confirmation method and speaker confirmation device based on channel robustness
US10637898B2 (en) * 2017-05-24 2020-04-28 AffectLayer, Inc. Automatic speaker identification in calls
CN109065028B (en) * 2018-06-11 2022-12-30 平安科技(深圳)有限公司 Speaker clustering method, speaker clustering device, computer equipment and storage medium
CN109065059A (en) * 2018-09-26 2018-12-21 新巴特(安徽)智能科技有限公司 The method for identifying speaker with the voice cluster that audio frequency characteristics principal component is established
CN109545229B (en) * 2019-01-11 2023-04-21 华南理工大学 Speaker recognition method based on voice sample characteristic space track

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhou Lei et al., "Research on a Novel Text-Dependent Speaker Recognition Method", Journal of Shanghai Normal University (Natural Science Edition) *
Deng Haojiang et al., "Research on Text-Independent Speaker Recognition Based on Clustering Statistics", Journal of Circuits and Systems *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020143263A1 (en) * 2019-01-11 2020-07-16 华南理工大学 Speaker identification method based on speech sample feature space trajectory
CN111081261A (en) * 2019-12-25 2020-04-28 华南理工大学 Text-independent voiceprint recognition method based on LDA
CN111081261B (en) * 2019-12-25 2023-04-21 华南理工大学 Text-independent voiceprint recognition method based on LDA
CN111128128A (en) * 2019-12-26 2020-05-08 华南理工大学 Voice keyword detection method based on complementary model scoring fusion
CN111128128B (en) * 2019-12-26 2023-05-23 华南理工大学 Voice keyword detection method based on complementary model scoring fusion
CN111933156A (en) * 2020-09-25 2020-11-13 广州佰锐网络科技有限公司 High-fidelity audio processing method and device based on multiple feature recognition

Also Published As

Publication number Publication date
CN109545229B (en) 2023-04-21
WO2020143263A1 (en) 2020-07-16
SG11202103091XA (en) 2021-04-29

Similar Documents

Publication Publication Date Title
CN109545229A (en) Speaker recognition method based on speech sample feature space trajectory
Kamper et al. An embedded segmental k-means model for unsupervised segmentation and clustering of speech
Ryu et al. Out-of-domain detection based on generative adversarial network
CN111597328B (en) New event theme extraction method
CN105810191B (en) Merge the Chinese dialects identification method of prosodic information
CN111128128B (en) Voice keyword detection method based on complementary model scoring fusion
CN111696522B (en) Tibetan language voice recognition method based on HMM and DNN
CN114625879A (en) Short text clustering method based on self-adaptive variational encoder
Debnath et al. RETRACTED ARTICLE: Audio-Visual Automatic Speech Recognition Towards Education for Disabilities
CN106448660B (en) Natural language fuzzy boundary determination method incorporating big data analysis
Yao et al. Real time large vocabulary continuous sign language recognition based on OP/Viterbi algorithm
Rodríguez-Serrano et al. Unsupervised writer adaptation of whole-word HMMs with application to word-spotting
Kesiraju et al. Topic identification of spoken documents using unsupervised acoustic unit discovery
Celikyilmaz et al. Exploiting distance based similarity in topic models for user intent detection
Farooq et al. Mispronunciation detection in articulation points of Arabic letters using machine learning
Huang et al. Generation of phonetic units for mixed-language speech recognition based on acoustic and contextual analysis
Ping English speech recognition method based on HMM technology
CN114943235A (en) Named entity recognition method based on multi-class language model
Martínez-Hinarejos et al. Spanish Sign Language Recognition with Different Topology Hidden Markov Models.
Ge et al. Accent classification with phonetic vowel representation
Xu et al. Research on continuous sign language sentence recognition algorithm based on weighted key-frame
Wujisguleng [Retracted] The Mongolian Vowel Acoustic Model Based on the Clustering Algorithm
Liu Research on Tibetan Speech Endpoint Detection Method Based on Extreme Learning Machine
Sheng Research on English Language Learning Algorithm Based on Speech Recognition Confidence
Krishnaveni et al. Performance evaluation of Statistical classifiers using Indian Sign language datasets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant