CN109545229A - Speaker recognition method based on speech sample feature space trajectories - Google Patents
Speaker recognition method based on speech sample feature space trajectories Download PDF Info
- Publication number
- CN109545229A (application number CN201910027145.3A)
- Authority
- CN
- China
- Prior art keywords
- speech
- feature space
- speaker
- feature
- space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
Abstract
The invention discloses a speaker recognition method based on speech sample feature space trajectories. The method clusters unlabeled speech feature representations to obtain a speech feature space of landmark subclasses; registers speakers using labeled speech samples to obtain each speaker's distribution information and motion trajectory information in the speech feature space; and identifies speech samples to be recognized using the speakers' feature space distribution information and the samples' motion trajectory information. The invention adopts the idea of locating speaker speech features in a feature space, so its speaker recognition has low computational complexity, solving the high-complexity problem of GMM-UBM; moreover, the speaker speech feature space built for one language can serve as the feature space for speaker recognition in another language, realizing data sharing.
Description
Technical field
The present invention relates to the field of biometric recognition, and in particular to a speaker recognition method based on speech sample feature space trajectories.
Background technique
With the development of artificial intelligence, audio perception has become a hot spot of audio signal processing research, and audio classification or audio recognition is its key problem. In engineering applications, audio classification takes the form of speaker recognition, audio event recognition, audio event detection, and so on. Speaker recognition is an identity verification technology, a kind of biometric recognition. Biometric recognition is the automatic identification of individual identity from biological characteristics, and includes fingerprint recognition, iris recognition, gene identification, face recognition, etc. Compared with other identity verification technologies, speaker recognition is more convenient and natural, and is perceived as less invasive by users. It performs identification from the voice signal, offering natural human-computer interaction, easily acquired signals, and the possibility of remote identification.
Existing speaker recognition systems comprise two stages: training and recognition. In the training stage, the system builds a model for each speaker from collected speech; in the recognition stage, it matches input speech against the speaker models to reach a decision. The system must extract features from the voice signal that reflect speaker individuality, and build accurate models that distinguish one speaker from the others. Current audio classification techniques fall into two broad classes: generative statistical models, such as the Gaussian mixture model (GMM) and the hidden Markov model (HMM), and deep neural network methods such as DNN, RNN, and LSTM. Either class requires a large amount of labeled training samples, and to reach good recognition performance deep neural network methods demand even larger sample sizes. GMM- and HMM-based methods give no special treatment to the discriminative information between audio classes and do not consider sharing sample data across classes; for example, the method in the paper "Speaker Verification Using Adapted Gaussian Mixture Models" by Reynolds et al. of MIT (Digital Signal Processing 10 (2000), 19-41) has high computational complexity. With large sample support, deep neural network methods perform well; for example, Google's paper "End-to-End Text-Dependent Speaker Verification" (2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pages 5115-5119) extracts speech features and trains with a neural network. However, training a neural network requires a large amount of labeled speech, whose acquisition cost is very high, and deep neural network methods lack interpretability, being essentially black boxes.
Existing speaker recognition techniques thus tend to have high computational complexity and require large amounts of labeled speaker speech data for training, and collecting such labeled data demands enormous effort. A more convenient and effective speaker recognition method and system is therefore needed.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a speaker recognition method based on speech sample feature space trajectories. The speech feature space does not depend on speaker, text, or language, so it can be built from any qualified speech data, realizing data sharing; and a speaker's speech trajectory can be constructed even from a single sample, so the method does not require large amounts of labeled speech data, overcoming the prior art's defect of needing to collect large labeled corpora.
The purpose of the present invention can be achieved through the following technical solution:
A speaker recognition method based on speech sample feature space trajectories, in which a speech sample is regarded as one movement through the speech feature space, characterized by its activity region and its trajectory within that space. The method comprises the following steps:
Step 1), build the speech feature space Ω: cluster unlabeled speech samples in the feature space using a clustering method; the resulting subclasses, each represented by some expression of its data, form the expression of the speech feature space, Ω = {g_k, k = 1, 2, …, K}.
Step 2), build speaker knowledge: using clean speech samples labeled with speaker identity, obtain each speaker's distribution information and motion trajectory information on the speech feature space Ω.
Step 3), speaker recognition: for a speech sample to be identified, first obtain its feature space distribution expression and trajectory; then, using the speakers' feature space distribution information, compute the difference between the sample distribution and each prior distribution, together with the accumulated local distribution difference along the trajectory, and use these as the basis for the speaker recognition decision.
Further, in building the speech feature space Ω in step 1), any clean speech samples can be used, with no constraint on speaker or language.
Further, in step 1) the speech samples are clustered in the feature space using K-means or another clustering method. The speech feature space expression Ω = {g_k, k = 1, 2, …, K} may consist of class data distribution functions (e.g. Gaussian distribution functions), cluster centroid vectors, or generative models (e.g. hidden Markov models or neural networks), that is, any marker with localization ability, called a feature space landmark. The number K of landmark subclasses used determines the granularity of the speech feature space expression: the larger K, the finer the expression. On the other hand, the accuracy of the space expression is related to the data scale: the richer the data, the more complete the expression; meanwhile, the more targeted the data used to build the space, the more accurate the expression for a particular problem.
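The construction in step 1) can be sketched with plain K-means (one of the clustering methods the text names), keeping each subclass's centroid and a diagonal variance as its landmark. This is an illustrative sketch under assumptions, not the patent's exact procedure; the function name `build_feature_space` and the tiny K are inventions for the example.

```python
import numpy as np

def build_feature_space(features, K, iters=20, seed=0):
    """Cluster unlabeled frame features and return landmark subclasses.

    Each landmark g_k is represented by a mean vector m_k and a diagonal
    variance U_k of its subclass, one of the expressions the text allows
    (centroids, distributions, or generative models)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), K, replace=False)]
    for _ in range(iters):  # plain K-means (Lloyd's algorithm)
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = features[labels == k].mean(axis=0)
    landmarks = []
    for k in range(K):
        members = features[labels == k]
        # variance floor keeps the landmark Gaussian invertible
        var = members.var(axis=0) + 1e-6 if len(members) else np.ones(features.shape[1])
        landmarks.append((centers[k], var))
    return landmarks

# Toy usage: 2-D stand-ins for MFCC frames drawn around two clusters.
rng = np.random.default_rng(1)
frames = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
omega = build_feature_space(frames, K=2)
```

In the patent's embodiment the landmarks are instead the components of a GMM with K = 4096, with the mixture weights discarded; the K-means variant here keeps the example dependency-free.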
Further, in step 2), the speech feature space is labeled using the clean speech samples that carry speaker labels, with Gaussian distributions g_k(m_k, U_k) as the space landmarks, and the speaker feature space distribution information is obtained as follows:
One, compute the position association degree of each speech sample feature f_t with the landmark g_k(m_k, U_k); each landmark is represented by a multidimensional Gaussian distribution, with m_k denoting the mean vector and U_k the covariance matrix of the k-th multidimensional Gaussian.
Two, compute the expected value of the position association degree between the speaker's sample set and the landmark g_k(m_k, U_k), taken over the association degrees of the t-th frame feature of each n-th sample with g_k(m_k, U_k).
Three, compute the speaker's feature space distribution from these expected values.
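The registration computation of step 2) can be sketched as follows. The exact association formula appears only as an image in the source; the sketch assumes (and this is an assumption, not the patent's stated formula) that it is the posterior probability of the frame under equal-weight Gaussian landmarks with diagonal covariances, computed in the log domain for stability.

```python
import numpy as np

def association(frame, landmarks):
    """Position association degree of one frame with every landmark.

    Assumed here to be the posterior under equal-weight diagonal
    Gaussians g_k(m_k, U_k), i.e. a softmax over log-likelihoods."""
    logp = []
    for m, var in landmarks:
        diff = frame - m
        logp.append(-0.5 * np.sum(diff * diff / var + np.log(2 * np.pi * var)))
    logp = np.array(logp)
    w = np.exp(logp - logp.max())  # numerically stable softmax
    return w / w.sum()

def speaker_distribution(frames, landmarks):
    """Expected association over all registration frames -> P = (p_1, ..., p_K)."""
    return np.mean([association(f, landmarks) for f in frames], axis=0)

# Toy landmarks and registration frames split evenly between them.
landmarks = [(np.zeros(2), np.ones(2)), (np.full(2, 3.0), np.ones(2))]
P = speaker_distribution(np.vstack([np.zeros((5, 2)), np.full((5, 2), 3.0)]), landmarks)
```

By symmetry the toy speaker's distribution is close to (0.5, 0.5): half its frames sit on each landmark.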
Further, in step 2), the motion trajectory timing information of a speaker's speech sample on the speech feature space Ω is expressed as the neighborhood sequence Ψ_1 Ψ_2 … Ψ_T of the sample's features, where the δ-neighborhood of feature f_t is Ψ_t = {g_k | d_tk < δ} and d_tk is the Mahalanobis distance between the speech sample feature and the landmark distribution, i.e. d_tk = √((f_t − m_k)ᵀ U_k⁻¹ (f_t − m_k)).
Further, the decision threshold δ of the neighborhood Ψ_t = {g_k | d_tk < δ} of the speech sample feature f_t follows from the properties of the normal distribution; choose 2 < δ < 3.
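The δ-neighborhood trajectory can be sketched directly from the Mahalanobis definition above; diagonal covariances are assumed here to keep the example minimal.

```python
import numpy as np

def trajectory(frames, landmarks, delta=2.5):
    """Neighborhood sequence Psi_1 ... Psi_T: for each frame, the set of
    landmark indices whose Mahalanobis distance d_tk falls below delta
    (2 < delta < 3, per the normal-distribution rule of thumb in the text)."""
    traj = []
    for f in frames:
        psi = set()
        for k, (m, var) in enumerate(landmarks):
            d = np.sqrt(np.sum((f - m) ** 2 / var))  # diagonal Mahalanobis distance
            if d < delta:
                psi.add(k)
        traj.append(psi)
    return traj

# Two well-separated landmarks; each frame lands in exactly one neighborhood.
landmarks = [(np.zeros(2), np.ones(2)), (np.full(2, 4.0), np.ones(2))]
traj = trajectory(np.array([[0.1, 0.1], [4.0, 3.9]]), landmarks)
```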
Further, in step 3), the speaker recognition process for a speech sample f = {f_1, f_2, …, f_T} comprises the following steps:
One, compute the distribution P = (p_1, p_2, …, p_K) of f on the speech feature space Ω, where each p_k is the expected association degree of the sample's frame features with the landmark g_k.
Two, determine the motion trajectory Ψ_1 Ψ_2 … Ψ_T of f in the speech feature space Ω, where Ψ_t = {g_k | d_tk < δ}.
Three, compute the distance between the sample distribution P = (p_1, p_2, …, p_K) and the prior feature space distribution of each speaker s (a Minkowski distance of order α over the component differences), then screen a candidate solution set S_p that contains the true solution.
Four, compute the trajectory distance metric of Ψ_1 Ψ_2 … Ψ_T and select the most likely solution from S_p, completing the speaker recognition.
Specifically, in step 3), using only the spatial distribution information P = (p_1, p_2, …, p_K) or only the trajectory distance can already yield good speaker recognition performance.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. In the provided speaker recognition method based on speech sample feature space trajectories, the speech feature space is established by clustering a large amount of speech features and requires no labeled data; the data samples used to establish it may come from different speakers, with no strict requirements on speech content, speaker age, or language. This overcomes the problem in neural network methods of needing large amounts of labeled speech, and the data acquisition for establishing the speech space is easy to carry out.
2. The method is based on the localization and trajectory information of speaker speech features in the speech feature space, which differs from source-generating modeling approaches such as the hidden Markov model (HMM): localization is relative, whereas a generative model is absolute. Compared with deep neural network methods, it is interpretable: every piece of knowledge has a physical meaning. For example, the association-degree distribution P = (p_1, p_2, …, p_K) of the sample features on the space Ω expresses both the sample's activity region (the space represented by the landmark subclasses corresponding to the nonzero elements) and the distribution within that region.
3. The essence of the method is the localization of speech features in the space: the features of different speakers are located on the established speech feature space, and the association-degree location information of different speakers expresses the distinctions between speakers with a small amount of computation. Compared with GMM or HMM methods, which need to build a generative model for every speaker, it has lower computational complexity.
4. The landmark subclasses of the speech feature space form a reference frame for locating speaker speech features; their relation to the sample to be identified is relative, without strict coupling, so the feature space is shareable, and an established speech feature space can be transferred to other speaker datasets for recognition. For example, the speaker speech feature space of one language can serve as the speech feature space for speaker recognition in another language, realizing data sharing.
Detailed description of the invention
Fig. 1 is the overall flowchart of the speaker recognition method in Embodiment 1 of the present invention.
Fig. 2 is the flowchart of establishing the speech feature space in Embodiment 1 of the present invention.
Fig. 3 is the flowchart of generating the speaker's feature space distribution information and trajectory information in Embodiment 1 of the present invention.
Fig. 4 is the flowchart of identifying speech samples to be recognized in Embodiment 1 of the present invention.
Specific embodiment
The present invention will now be described in further detail with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment 1:
This embodiment provides a speaker recognition method based on speech sample feature space trajectories; the overall flowchart is shown in Fig. 1 and comprises the following three steps:
1) Establish the speech feature space Ω. Any clean speech samples can be used, with no constraint on speaker, language, or other factors; the speech samples are then clustered in the feature space with a clustering method, and the data expression of the resulting subclasses forms the expression of the speech feature space, {g_k, k = 1, 2, …, K}.
2) Build the speaker knowledge, comprising two parts: the speaker's distribution information and motion trajectory information in the speech feature space.
3) For the speech samples to be identified, perform recognition using the speakers' feature space distribution information and the samples' motion trajectory information.
With reference to Fig. 2, the flowchart of establishing the speech feature space in this embodiment: speaker speech data from the aishell Chinese corpus is used as the unlabeled speech sample set. aishell contains 400 speakers in total; 60 wav files per speaker are selected to train the speech feature space. 12-dimensional MFCC features are extracted from the unlabeled speech sample set X = {x_1, x_2, …, x_N} to obtain the feature set F = {f_1, f_2, …, f_{t_N}}, where each f_t is a short-time frame feature and t_N is the total number of frames over all samples.
The feature sequence is then used to train a GMM with K mixture components; the GMM weights are discarded, and each Gaussian component is retained as a landmark subclass of the speech feature space. Here K is the number of audio feature space landmarks, chosen as K = 4096 to describe the audio feature space with high precision.
The speech feature space landmarks are expressed as Ω = {g_k, k = 1, 2, …, K}, where g = N(m, U) is a multidimensional Gaussian distribution function.
With reference to Fig. 3, the flowchart of generating the speaker feature space distribution information in this embodiment: for each speaker, 20 wav files from aishell are used to label the speech feature space. The target speaker sample set is Y = {(y_1, s_1), (y_2, s_2), …, (y_M, s_M)}, s_i ∈ S = {S_l, l = 1, 2, …, L} (the speaker set); the samples of speaker S_l are Y_l = {y_m | s_m = S_l, m = 1, 2, …, M}, from which the audio feature sequence F_l = {f_1, f_2, …, f_{t_l}} is extracted. Compute the position association degree of every speech sample feature f_t with each landmark g_k(m_k, U_k); then compute the expected value of the position association degree between the speaker's sample set and the landmark g_k(m_k, U_k), taken over the associations of the t-th frame feature of each n-th sample with g_k(m_k, U_k); finally compute the speaker's feature space distribution.
Processing the registration speech of every speaker in the target speaker set in this way yields the speech feature distribution information of each speaker.
The motion trajectory timing information of a speech sample is expressed as the neighborhood sequence Ψ_1 Ψ_2 … Ψ_T of the sample's features, where the δ-neighborhood of feature f_t is Ψ_t = {g_k | d_tk < δ} and d_tk is the Mahalanobis distance between the feature and the distribution, i.e. d_tk = √((f_t − m_k)ᵀ U_k⁻¹ (f_t − m_k)).
With reference to Fig. 4, the flowchart of identifying speech samples in this embodiment. The features of the speech sample to be identified are f = {f_1, f_2, …, f_T}, and the recognition process is as follows:
Compute the distribution P = (p_1, p_2, …, p_K) of f on the feature space Ω, from the position association degree of each feature f_t with each landmark g_k(m_k, U_k) and its expected value over the sample to be identified.
Determine the trajectory Ψ_1 Ψ_2 … Ψ_T of the speech features f = {f_1, f_2, …, f_T} in the feature space Ω, where Ψ_t = {g_k | d_tk < δ}.
Compute the distance between the sample distribution P = (p_1, p_2, …, p_K) and the prior feature space distribution of each speaker s, with α taken as 2, and screen the candidate solution set S_p containing the true solution: the 10 speakers with the smallest distance are selected as candidate recognition results.
Compute the trajectory distance metric of Ψ_1 Ψ_2 … Ψ_T, with α taken as 2, and from the 10 candidate speakers select the speaker with the smallest trajectory distance as the recognition result.
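The two-stage decision above can be sketched as follows. The source gives the trajectory distance metric only as an image; this sketch assumes it accumulates the α-norm distribution difference over the landmarks of each neighborhood Ψ_t, which matches the "accumulated local distribution difference along the trajectory" wording but remains an assumption. The names `recognize` and `traj_dist` are inventions for the example.

```python
import numpy as np

def recognize(P, traj, enrolled, alpha=2, top_n=2):
    """Two-stage speaker decision sketch.

    P        : distribution of the test sample over the K landmarks
    traj     : list of neighborhoods Psi_t (sets of landmark indices)
    enrolled : dict mapping speaker -> prior distribution P^s"""
    # Stage 1: screen candidates by alpha-norm distribution distance.
    dist = {s: np.sum(np.abs(P - Ps) ** alpha) ** (1 / alpha)
            for s, Ps in enrolled.items()}
    candidates = sorted(dist, key=dist.get)[:top_n]

    # Stage 2: accumulate local distribution differences along the trajectory
    # (assumed form of the trajectory distance metric).
    def traj_dist(Ps):
        return sum(np.sum(np.abs(P[list(psi)] - Ps[list(psi)]) ** alpha) ** (1 / alpha)
                   for psi in traj if psi)

    return min(candidates, key=lambda s: traj_dist(enrolled[s]))

# Toy enrollment over K = 2 landmarks; the test sample resembles "alice".
enrolled = {"alice": np.array([0.8, 0.2]), "bob": np.array([0.2, 0.8])}
who = recognize(np.array([0.75, 0.25]), [{0}, {0, 1}], enrolled)
```

In the embodiment the screening keeps the 10 nearest speakers (`top_n=10`) before the trajectory metric makes the final choice.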
Embodiment 2:
This embodiment provides a speaker recognition method based on speech sample feature space trajectories, comprising the following steps:
Step 1, establish the speech feature space landmark subclasses using speech data from the English corpus TIMIT;
Step 2, register the target speaker set using speech data from the aishell corpus, as in Embodiment 1;
Step 3, identify the speech samples to be recognized, as in Embodiment 1.
The recognition performance obtained shows only a small gap from that of Embodiment 1, which proves that the speaker speech feature space built from one language can serve as the speech feature space for speaker recognition in another language, realizing data sharing.
The above are only preferred embodiments of the present patent, but the scope of protection of the present patent is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the scope disclosed by the present patent, according to its technical solution and inventive concept, falls within the scope of protection of the present patent.
Claims (7)
1. A speaker recognition method based on speech sample feature space trajectories, wherein a speech sample is regarded as one movement through the speech feature space, characterized by its activity region and its trajectory within that space, the method comprising the following steps:
Step 1), building the speech feature space Ω: clustering unlabeled speech samples in the feature space using a clustering method, the resulting subclass data generating, as some expression of the subclass data, the expression of the speech feature space Ω = {g_k, k = 1, 2, …, K};
Step 2), building speaker knowledge: using clean speech samples labeled with speaker identity to obtain the speaker's distribution information and motion trajectory information on the speech feature space Ω;
Step 3), speaker recognition: for a speech sample to be identified, first obtaining the sample's feature space distribution expression and trajectory, then using the speakers' feature space distribution information to compute the difference between the sample distribution and the prior distribution, together with the accumulated local distribution difference along the trajectory, as the basis for the speaker recognition decision.
2. The speaker recognition method based on speech sample feature space trajectories according to claim 1, wherein in building the speech feature space Ω in step 1), any clean speech samples can be used, with no constraint on speaker or language.
3. The speaker recognition method based on speech sample feature space trajectories according to claim 1, wherein the speech feature space expression Ω = {g_k, k = 1, 2, …, K} may consist of class data distribution functions, cluster centroid vectors, or generative models, that is, markers with localization ability, called feature space landmarks; the number K of landmark subclasses used determines the granularity of the speech feature space expression: the larger K, the finer the expression.
4. The speaker recognition method based on speech sample feature space trajectories according to claim 1, wherein in step 2) the speech feature space is labeled using clean speech samples that carry speaker labels, with Gaussian distributions g_k(m_k, U_k) as the space landmarks, and the speaker feature space distribution information is obtained as follows:
One, computing the position association degree of each speech sample feature f_t with the landmark g_k(m_k, U_k); each landmark is represented by a multidimensional Gaussian, with m_k denoting the mean vector and U_k the covariance matrix of the k-th multidimensional Gaussian;
Two, computing the expected value of the position association degree between the speaker's sample set and the landmark g_k(m_k, U_k), taken over the association degrees of the t-th frame feature of each n-th sample with g_k(m_k, U_k);
Three, computing the speaker's feature space distribution from these expected values.
5. The speaker recognition method based on speech sample feature space trajectories according to claim 4, wherein in step 2) the motion trajectory timing information of the speaker's speech sample on the speech feature space Ω is expressed as the neighborhood sequence Ψ_1 Ψ_2 … Ψ_T of the sample's features, the δ-neighborhood of feature f_t being Ψ_t = {g_k | d_tk < δ}, where d_tk is the Mahalanobis distance between the speech sample feature and the landmark distribution, i.e. d_tk = √((f_t − m_k)ᵀ U_k⁻¹ (f_t − m_k)).
6. The speaker recognition method based on speech sample feature space trajectories according to claim 5, wherein the decision threshold δ of the neighborhood Ψ_t = {g_k | d_tk < δ} of the speech sample feature f_t follows from the properties of the normal distribution, choosing 2 < δ < 3.
7. The speaker recognition method based on speech sample feature space trajectories according to claim 5, wherein in step 3) the speaker recognition process for the speech sample f = {f_1, f_2, …, f_T} comprises the following steps:
One, computing the distribution P = (p_1, p_2, …, p_K) of f on the speech feature space Ω;
Two, determining the motion trajectory Ψ_1 Ψ_2 … Ψ_T of f in the speech feature space Ω, where Ψ_t = {g_k | d_tk < δ};
Three, computing the distance between the sample distribution P = (p_1, p_2, …, p_K) and the prior feature space distribution of each speaker s, then screening the candidate solution set S_p containing the true solution;
Four, computing the trajectory distance metric of Ψ_1 Ψ_2 … Ψ_T and selecting the most likely solution from S_p, completing the speaker recognition.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910027145.3A CN109545229B (en) | 2019-01-11 | 2019-01-11 | Speaker recognition method based on voice sample characteristic space track |
PCT/CN2019/111530 WO2020143263A1 (en) | 2019-01-11 | 2019-10-16 | Speaker identification method based on speech sample feature space trajectory |
SG11202103091XA SG11202103091XA (en) | 2019-01-11 | 2019-10-16 | A Speaker Recognition Method Based on Trajectories in Feature Spaces of Voice Samples |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910027145.3A CN109545229B (en) | 2019-01-11 | 2019-01-11 | Speaker recognition method based on voice sample characteristic space track |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109545229A true CN109545229A (en) | 2019-03-29 |
CN109545229B CN109545229B (en) | 2023-04-21 |
Family
ID=65835222
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910027145.3A Active CN109545229B (en) | 2019-01-11 | 2019-01-11 | Speaker recognition method based on voice sample characteristic space track |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN109545229B (en) |
SG (1) | SG11202103091XA (en) |
WO (1) | WO2020143263A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111081261A (en) * | 2019-12-25 | 2020-04-28 | 华南理工大学 | Text-independent voiceprint recognition method based on LDA |
CN111128128A (en) * | 2019-12-26 | 2020-05-08 | 华南理工大学 | Voice keyword detection method based on complementary model scoring fusion |
WO2020143263A1 (en) * | 2019-01-11 | 2020-07-16 | 华南理工大学 | Speaker identification method based on speech sample feature space trajectory |
CN111933156A (en) * | 2020-09-25 | 2020-11-13 | 广州佰锐网络科技有限公司 | High-fidelity audio processing method and device based on multiple feature recognition |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112487978B (en) * | 2020-11-30 | 2024-04-16 | 清华珠三角研究院 | Method and device for positioning speaker in video and computer storage medium |
CN113611285B (en) * | 2021-09-03 | 2023-11-24 | 哈尔滨理工大学 | Language identification method based on stacked bidirectional time sequence pooling |
CN117235435B (en) * | 2023-11-15 | 2024-02-20 | 世优(北京)科技有限公司 | Method and device for determining audio signal loss function |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6067517A (en) * | 1996-02-02 | 2000-05-23 | International Business Machines Corporation | Transcription of speech data with segments from acoustically dissimilar environments |
CN1652206A (en) * | 2005-04-01 | 2005-08-10 | 郑方 | Sound veins identifying method |
JP2009063773A (en) * | 2007-09-05 | 2009-03-26 | Nippon Telegr & Teleph Corp <Ntt> | Speech feature learning device and speech recognition device, and method, program and recording medium thereof |
CN102024455A (en) * | 2009-09-10 | 2011-04-20 | 索尼株式会社 | Speaker recognition system and method |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5598507A (en) * | 1994-04-12 | 1997-01-28 | Xerox Corporation | Method of speaker clustering for unknown speakers in conversational audio data |
CN102479511A (en) * | 2010-11-23 | 2012-05-30 | 盛乐信息技术(上海)有限公司 | Large-scale voiceprint authentication method and system |
CN105845141A (en) * | 2016-03-23 | 2016-08-10 | 广州势必可赢网络科技有限公司 | Speaker confirmation model, speaker confirmation method and speaker confirmation device based on channel robustness |
US10637898B2 (en) * | 2017-05-24 | 2020-04-28 | AffectLayer, Inc. | Automatic speaker identification in calls |
CN109065028B (en) * | 2018-06-11 | 2022-12-30 | 平安科技(深圳)有限公司 | Speaker clustering method, speaker clustering device, computer equipment and storage medium |
CN109065059A (en) * | 2018-09-26 | 2018-12-21 | 新巴特(安徽)智能科技有限公司 | The method for identifying speaker with the voice cluster that audio frequency characteristics principal component is established |
CN109545229B (en) * | 2019-01-11 | 2023-04-21 | 华南理工大学 | Speaker recognition method based on voice sample characteristic space track |
2019
- 2019-01-11 CN CN201910027145.3A patent/CN109545229B/en active Active
- 2019-10-16 SG SG11202103091XA patent/SG11202103091XA/en unknown
- 2019-10-16 WO PCT/CN2019/111530 patent/WO2020143263A1/en active Application Filing
Non-Patent Citations (2)
Title |
---|
Zhou Lei et al., "Research on a Novel Text-Dependent Speaker Recognition Method", Journal of Shanghai Normal University (Natural Science Edition) * |
Deng Haojiang et al., "Research on Text-Independent Speaker Recognition Based on Cluster Statistics", Journal of Circuits and Systems * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020143263A1 (en) * | 2019-01-11 | 2020-07-16 | 华南理工大学 | Speaker identification method based on speech sample feature space trajectory |
CN111081261A (en) * | 2019-12-25 | 2020-04-28 | 华南理工大学 | Text-independent voiceprint recognition method based on LDA |
CN111081261B (en) * | 2019-12-25 | 2023-04-21 | 华南理工大学 | Text-independent voiceprint recognition method based on LDA |
CN111128128A (en) * | 2019-12-26 | 2020-05-08 | 华南理工大学 | Voice keyword detection method based on complementary model scoring fusion |
CN111128128B (en) * | 2019-12-26 | 2023-05-23 | 华南理工大学 | Voice keyword detection method based on complementary model scoring fusion |
CN111933156A (en) * | 2020-09-25 | 2020-11-13 | 广州佰锐网络科技有限公司 | High-fidelity audio processing method and device based on multiple feature recognition |
Also Published As
Publication number | Publication date |
---|---|
CN109545229B (en) | 2023-04-21 |
WO2020143263A1 (en) | 2020-07-16 |
SG11202103091XA (en) | 2021-04-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109545229A (en) | Speaker recognition method based on speech sample feature space trajectory | |
Kamper et al. | An embedded segmental k-means model for unsupervised segmentation and clustering of speech | |
Ryu et al. | Out-of-domain detection based on generative adversarial network | |
CN111597328B (en) | New event theme extraction method | |
CN105810191B (en) | Merge the Chinese dialects identification method of prosodic information | |
CN111128128B (en) | Voice keyword detection method based on complementary model scoring fusion | |
CN111696522B (en) | Tibetan language voice recognition method based on HMM and DNN | |
CN114625879A (en) | Short text clustering method based on self-adaptive variational encoder | |
Debnath et al. | RETRACTED ARTICLE: Audio-Visual Automatic Speech Recognition Towards Education for Disabilities | |
CN106448660B (en) | It is a kind of introduce big data analysis natural language smeared out boundary determine method | |
Yao et al. | Real time large vocabulary continuous sign language recognition based on OP/Viterbi algorithm | |
Rodríguez-Serrano et al. | Unsupervised writer adaptation of whole-word HMMs with application to word-spotting | |
Kesiraju et al. | Topic identification of spoken documents using unsupervised acoustic unit discovery | |
Celikyilmaz et al. | Exploiting distance based similarity in topic models for user intent detection | |
Farooq et al. | Mispronunciation detection in articulation points of Arabic letters using machine learning | |
Huang et al. | Generation of phonetic units for mixed-language speech recognition based on acoustic and contextual analysis | |
Ping | English speech recognition method based on hmm technology | |
CN114943235A (en) | Named entity recognition method based on multi-class language model | |
Martínez-Hinarejos et al. | Spanish Sign Language Recognition with Different Topology Hidden Markov Models. | |
Ge et al. | Accent classification with phonetic vowel representation | |
Xu et al. | Research on continuous sign language sentence recognition algorithm based on weighted key-frame | |
Wujisguleng | [Retracted] The Mongolian Vowel Acoustic Model Based on the Clustering Algorithm | |
Liu | Research on Tibetan Speech Endpoint Detection Method Based on Extreme Learning Machine | |
Sheng | Research on English Language Learning Algorithm Based on Speech Recognition Confidence | |
Krishnaveni et al. | Performance evaluation of Statistical classifiers using Indian Sign language datasets |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||