CN105931646A - Speaker identification method base on simple direct tolerance learning algorithm - Google Patents

Speaker identification method base on simple direct tolerance learning algorithm

Info

Publication number
CN105931646A
CN105931646A
Authority
CN
China
Prior art keywords
vector
speaker
similar sample
mahalanobis distance
samples
Prior art date
Legal status
Pending
Application number
CN201610281884.1A
Other languages
Chinese (zh)
Inventor
雷震春
杨印根
朱明华
Current Assignee
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date
Filing date
Publication date
Application filed by Jiangxi Normal University filed Critical Jiangxi Normal University
Priority to CN201610281884.1A priority Critical patent/CN105931646A/en
Publication of CN105931646A publication Critical patent/CN105931646A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G10L17/04 - Training, enrolment or model building

Abstract

The invention provides a speaker identification method based on a simple direct metric learning algorithm. The method comprises the following steps: collecting speech samples from multiple speakers, extracting the i-vector of every sample, applying channel compensation with the LDA or WCCN method, performing length normalization, and forming a training sample set; constructing a similar-pair set and a dissimilar-pair set from the i-vectors of the training sample set and the speaker identities; training on the similar-pair set and the dissimilar-pair set with the KISS algorithm to obtain a metric matrix; and, for two new utterances, first extracting their i-vectors, applying LDA or WCCN channel compensation and length normalization, then using the previously computed metric matrix to calculate the Mahalanobis distance between the two i-vectors and comparing it with a threshold, thereby deciding whether the two new utterances belong to the same speaker. The Mahalanobis metric matrix obtained in this way reflects the similarities and distinctions of the sample space more faithfully and thus improves the performance of the speaker identification system.

Description

A speaker recognition method based on a simple direct metric learning algorithm
Technical field
The present invention is a speaker recognition method based on a simple direct metric learning algorithm, and can be widely applied in fields such as speaker recognition, pattern recognition, metric learning and machine learning.
Background technology
Speaker Recognition (SR), also known as voiceprint recognition, is a technology that determines a speaker's identity by processing and analyzing the speaker's voice. How to measure the similarity between speaker speech samples effectively is one of the hot issues in current speaker recognition research. In the field of pattern recognition there are many methods for measuring the similarity between samples; the more common ones are distance scoring methods such as cosine distance scoring and Mahalanobis distance scoring.
Cosine distance scoring measures the similarity between samples by computing the cosine of the angle between sample vectors in the inner-product space. It discriminates only by the difference in vector direction and cannot capture differences in magnitude along the vector dimensions. The cosine distance $d_C(x_i, x_j)$ is computed as:
$d_C(x_i, x_j) = \dfrac{x_i^T x_j}{\sqrt{x_i^T x_i}\,\sqrt{x_j^T x_j}}$
where $x_i$ is the i-vector of the i-th utterance and T denotes transposition.
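For illustration only (not part of the original disclosure), a minimal NumPy sketch of cosine distance scoring between two i-vectors might look as follows; the function and variable names are assumptions:

```python
import numpy as np

def cosine_score(x_i, x_j):
    """Cosine distance score between two i-vectors (larger means more similar)."""
    return float(x_i @ x_j / (np.linalg.norm(x_i) * np.linalg.norm(x_j)))
```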
The Mahalanobis distance $d_M(x_i, x_j)$ between two vectors $x_i$ and $x_j$ is defined as:
$d_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)$
where $x_i$ is the i-vector of the i-th utterance, M is the metric matrix, and T denotes transposition.
Only with a positive semidefinite metric matrix M that truly reflects the similarity of similar samples and the distinctiveness of dissimilar samples in the sample space can the computed Mahalanobis distance effectively measure sample similarity, but the limited amount of training data makes such a metric matrix difficult to obtain.
Metric learning methods typically rely on the class information contained in the training samples and automatically learn a distance metric matrix, which is commonly used to compute the Mahalanobis distance score between target samples and thereby predict the similarity of unknown data. The basic goal of a metric learning algorithm is to exploit the prior information of the training samples and, while satisfying certain conditions as far as possible, find a global, linear-transform distance metric matrix M by optimizing:
$\min_M \; \ell(M) + \lambda R(M)$
where $\ell(M)$ is the loss function, $R(M)$ is a regularization term on the distance metric matrix M that constrains and corrects the loss $\ell(M)$ against over-fitting during training, and $\lambda \ge 0$ is a balance parameter. The metric matrix M is used to compute the Mahalanobis distance between samples $x_i$ and $x_j$:
$d_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)$
where $x_i$ is the i-vector of the i-th utterance.
The number of samples used to train the metric matrix keeps growing, and the huge data volume makes the analysis and processing of large-scale data very troublesome, leading to the so-called "curse of dimensionality". As the data dimension rises, such high-dimensional data often exhibit substantial correlation and redundancy.
Summary of the invention
The object of the present invention is to provide a speaker recognition method based on a simple direct metric learning algorithm. The metric matrix obtained by this method can effectively reflect the similarity and distinctiveness of the speaker space; using this metric matrix in a Mahalanobis distance scoring classifier for testing target speaker speech samples allows the speaker recognition system to achieve a good recognition performance.
To achieve the above object, the present invention adopts the following technical solution:
A speaker recognition method based on the simple direct metric learning algorithm (Keep it simple and straight!, KISS), characterized in that: the KISS algorithm is trained on the processed i-vectors, and the Mahalanobis distance between a speaker test speech sample and a target sample is then computed;
The Keep-it-simple-and-straight (KISS) metric learning algorithm is simple and effective, has a globally optimal solution, and can quickly obtain a metric matrix satisfying the constraints; for training it only needs to know whether a pair of samples belongs to the same class. The metric matrix obtained does not over-fit and is easy to compute. The KISS algorithm scales well: no iterative optimization is required, and only two small covariance matrices need to be computed. The resulting metric matrix can effectively reflect the similarity and distinctiveness of the speaker space; used in a Mahalanobis distance scoring classifier for testing target speaker speech samples, it gives the speaker recognition system a good recognition performance. Its performance is excellent and the metric matrix training process is fast.
The object of the present invention is achieved through the following technical steps:
Collect the speech samples of multiple speakers and extract the i-vector of every sample;
Apply channel compensation to the i-vectors of all samples using the LDA or WCCN method and perform length normalization, forming a training sample set;
Construct a similar-pair set and a dissimilar-pair set from the i-vectors of the training sample set and the speaker identities;
Use the KISS algorithm to train on the similar-pair set and the dissimilar-pair set and obtain a metric matrix;
For two new utterances, extract their i-vectors and apply the above channel compensation and length normalization, then compute the Mahalanobis distance between the two i-vectors using the previously computed metric matrix;
Compare the obtained Mahalanobis distance with a threshold and, based on the comparison, decide whether the two new utterances belong to the same speaker.
Further, applying channel compensation to the i-vectors of all samples using the LDA method specifically includes:
minimizing the within-class (similar-sample) separation and maximizing the between-class (dissimilar-sample) separation through a projection matrix.
Further, applying channel compensation to the i-vectors of all samples using the WCCN method specifically includes:
making the basis of the target sample space as orthogonal as possible.
Further, the method also includes:
performing length normalization on the i-vectors extracted from all samples.
Further, it is characterized in that using the KISS algorithm to train on the similar-pair set and the dissimilar-pair set and obtain the metric matrix specifically includes:
computing the covariance of the similar sample pairs and the covariance of the dissimilar sample pairs in the target samples, respectively;
computing the metric matrix from the covariance of the similar sample pairs and the covariance of the dissimilar sample pairs.
Further, the method also includes:
computing the Mahalanobis distance between two i-vectors according to the obtained metric matrix.
Further, comparing the obtained Mahalanobis distance with a threshold and deciding, based on the comparison, whether the two new utterances belong to the same speaker specifically includes:
if the obtained Mahalanobis distance is greater than the threshold, the two new utterances do not belong to the same speaker;
if the obtained Mahalanobis distance is within the threshold, the two new utterances belong to the same speaker.
The present invention discloses a speaker recognition method based on a simple direct metric learning algorithm. The Keep-it-simple-and-straight (KISS) metric learning algorithm uses the constraint information of paired training samples to train a Mahalanobis distance metric matrix; the pairwise constraints guide the metric learning process, so that when the metric matrix is trained on labelled similar and dissimilar pairs the similarity and dissimilarity information between the training data is fully exploited. The resulting metric matrix reflects the distinctiveness of the speaker space more faithfully, and a Mahalanobis distance scoring classifier can then better predict the similarity between unknown speaker speech samples. During metric matrix training, the covariances of the similar pairs and of the dissimilar pairs are computed, and the difference between their inverses is taken as the Mahalanobis metric matrix; the trained metric matrix achieves a good recognition performance for the speaker recognition system.
Brief description of the drawings
Fig. 1 is a flow chart of an embodiment of a speaker recognition method based on a simple direct metric learning algorithm according to the present invention.
Detailed description of the invention
A speaker recognition method based on a simple direct metric learning algorithm according to an embodiment of the present invention is described in detail below with reference to the accompanying drawing. Fig. 1 shows a flow chart of one embodiment of the method of the present invention; the method comprises the following steps:
In step S110, collect the speech samples of multiple speakers and extract the i-vector of every sample;
In step S120, apply channel compensation to the i-vectors of all samples using the LDA or WCCN method and perform length normalization, forming a training sample set;
In step S130, construct a similar-pair set and a dissimilar-pair set from the i-vectors of the training sample set and the speaker identities (a sketch of this pair-set construction is given after this step list);
In step S140, use the KISS algorithm to train on the similar-pair set and the dissimilar-pair set and obtain a metric matrix;
In step S150, for two new utterances, extract their i-vectors and apply the above channel compensation and length normalization, then compute the Mahalanobis distance between the two i-vectors using the previously computed metric matrix;
In step S160, compare the obtained Mahalanobis distance with a threshold and, based on the comparison, decide whether the two new utterances belong to the same speaker.
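For illustration only (not part of the original disclosure), step S130, building the similar-pair and dissimilar-pair sets from labelled, channel-compensated i-vectors, could be sketched in Python as follows; the function and variable names are assumptions, and the KISS training of step S140 is sketched further below after its formulas:

```python
from itertools import combinations

def build_pair_sets(ivectors, labels):
    """Step S130: split all utterance pairs into a similar-pair set (same speaker)
    and a dissimilar-pair set (different speakers)."""
    same_pairs, diff_pairs = [], []
    for i, j in combinations(range(len(ivectors)), 2):
        pair = (ivectors[i], ivectors[j])
        (same_pairs if labels[i] == labels[j] else diff_pairs).append(pair)
    return same_pairs, diff_pairs
```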
Further, applying channel compensation to the i-vectors of all samples using the Linear Discriminant Analysis (LDA) method specifically includes:
the goal of linear discriminant analysis (LDA) is to minimize the within-class (similar-sample) separation and maximize the between-class (dissimilar-sample) separation through a projection matrix.
Specifically, the between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are defined as
$S_b = \sum_{s=1}^{S} (\bar{x}_s - \bar{x})(\bar{x}_s - \bar{x})^T, \qquad S_w = \sum_{s=1}^{S} \frac{1}{n_s} \sum_{i=1}^{n_s} (x_i^{s} - \bar{x}_s)(x_i^{s} - \bar{x}_s)^T$
where $S_b$ is the speaker between-class scatter matrix, $S_w$ is the speaker within-class scatter matrix, $n_s$ is the number of utterances of speaker s, $\bar{x}$ is the mean of all speakers' i-vectors, and $\bar{x}_s$ is the i-vector mean of the s-th speaker.
The projection matrix A is composed of the eigenvectors corresponding to the eigenvalues $\lambda$ of the generalized eigenvalue problem
$S_b v = \lambda S_w v$
where $S_b$ is the speaker between-class scatter matrix, $S_w$ is the speaker within-class scatter matrix, $\lambda$ is the diagonal matrix of eigenvalues, and v is a projection direction in speaker space.
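As an illustration (not part of the original disclosure), a small NumPy/SciPy sketch of this LDA channel compensation might look as follows; the exact weighting of the scatter matrices and the variable names are assumptions:

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(ivectors, labels, dim):
    """Train an LDA projection A that maximizes between-speaker scatter and
    minimizes within-speaker scatter (Sb v = lambda * Sw v)."""
    ivectors = np.asarray(ivectors)
    labels = np.asarray(labels)
    d = ivectors.shape[1]
    mean_all = ivectors.mean(axis=0)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for spk in np.unique(labels):
        x_s = ivectors[labels == spk]
        mean_s = x_s.mean(axis=0)
        diff = (mean_s - mean_all)[:, None]
        Sb += diff @ diff.T                              # between-class scatter
        Sw += (x_s - mean_s).T @ (x_s - mean_s) / len(x_s)  # within-class scatter
    # generalized eigenproblem Sb v = lambda Sw v; keep the top `dim` directions
    eigvals, eigvecs = eigh(Sb, Sw)
    A = eigvecs[:, np.argsort(eigvals)[::-1][:dim]]
    return A

# usage: project an i-vector with x_lda = A.T @ x
```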
Further, applying channel compensation to the i-vectors of all samples using the Within-Class Covariance Normalization (WCCN) method specifically includes:
the goal of within-class covariance normalization (WCCN) is to make the basis of the sample space as orthogonal as possible.
The within-class covariance matrix W is computed as
$W = \frac{1}{S} \sum_{s=1}^{S} \frac{1}{n_s} \sum_{i=1}^{n_s} (x_i^{s} - \bar{x}_s)(x_i^{s} - \bar{x}_s)^T$
where there are S speakers in total, $n_s$ is the number of utterances of speaker s, $\bar{x}$ is the mean of all speakers' i-vectors, and $\bar{x}_s$ is the i-vector mean of the s-th speaker. The feature vectors are then mapped as $x \mapsto B^T x$, where B is the Cholesky factor of $W^{-1}$, i.e. $W^{-1} = B B^T$.
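A minimal NumPy sketch of WCCN under these definitions might look as follows (illustrative only; the function and variable names are assumptions):

```python
import numpy as np

def wccn_projection(ivectors, labels):
    """Compute the WCCN projection B with W^{-1} = B B^T (Cholesky factor)."""
    ivectors = np.asarray(ivectors)
    labels = np.asarray(labels)
    speakers = np.unique(labels)
    d = ivectors.shape[1]
    W = np.zeros((d, d))
    for spk in speakers:
        x_s = ivectors[labels == spk]
        centered = x_s - x_s.mean(axis=0)
        W += centered.T @ centered / len(x_s)    # per-speaker covariance
    W /= len(speakers)
    B = np.linalg.cholesky(np.linalg.inv(W))     # W^{-1} = B B^T, B lower-triangular
    return B

# usage: x_wccn = B.T @ x
```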
Further, the method also includes:
performing length normalization on the i-vectors extracted from all samples.
Further, using the KISS algorithm to train on the similar-pair set and the dissimilar-pair set and obtain the metric matrix specifically includes:
computing the covariance of the similar sample pairs and the covariance of the dissimilar sample pairs in the target samples, respectively, and computing the metric matrix from the covariance of the similar sample pairs and the covariance of the dissimilar sample pairs.
Specifically, the covariance of all similar sample pairs, $\Sigma_{y_{ij}=1}$, and the covariance of all dissimilar sample pairs, $\Sigma_{y_{ij}=0}$, are first computed:
$\Sigma_{y_{ij}=1} = \sum_{y_{ij}=1} (x_i - x_j)(x_i - x_j)^T$
$\Sigma_{y_{ij}=0} = \sum_{y_{ij}=0} (x_i - x_j)(x_i - x_j)^T$
where $x_i$ denotes the i-vector of the i-th utterance, $y_{ij}=0$ indicates that the i-th and j-th utterances come from different speakers, and $y_{ij}=1$ indicates that they come from the same speaker. The metric matrix M can then be obtained as
$M = \Sigma_{y_{ij}=1}^{-1} - \Sigma_{y_{ij}=0}^{-1}$
where $\Sigma_{y_{ij}=1}$ is the covariance of the similar pairs and $\Sigma_{y_{ij}=0}$ is the covariance of the dissimilar pairs; M is the metric matrix finally sought.
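For illustration only, a compact NumPy sketch of this KISS training step following the formulas above; the function and argument names are assumptions, and the pair covariances are assumed to be invertible:

```python
import numpy as np

def train_kiss_metric(pairs_same, pairs_diff):
    """Train the KISS metric matrix M = inv(Sigma_same) - inv(Sigma_diff).

    pairs_same / pairs_diff: lists of (x_i, x_j) i-vector tuples with
    y_ij = 1 (same speaker) and y_ij = 0 (different speakers), respectively.
    """
    def pair_covariance(pairs):
        diffs = np.array([xi - xj for xi, xj in pairs])
        return diffs.T @ diffs   # sum over pairs of (xi - xj)(xi - xj)^T
    sigma_same = pair_covariance(pairs_same)
    sigma_diff = pair_covariance(pairs_diff)
    return np.linalg.inv(sigma_same) - np.linalg.inv(sigma_diff)
```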
Further, computing the Mahalanobis distance between two i-vectors according to the obtained metric matrix specifically includes: using the previously obtained metric matrix M, the Mahalanobis distance between two i-vectors $x_i$ and $x_j$ is computed as
$d_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)$
where $x_i$ denotes the i-vector of the i-th utterance, M is the metric matrix, and $d_M(x_i, x_j)$ is the Mahalanobis distance between the two i-vectors $x_i$ and $x_j$.
Further, comparing the obtained Mahalanobis distance with a threshold and deciding, based on the comparison, whether the two new utterances belong to the same speaker specifically includes:
computing from the Mahalanobis distance a similarity score between the two i-vectors $x_i$ and $x_j$:
$\mathrm{Score}_M(x_i, x_j) = -(x_i - x_j)^T M (x_i - x_j)$
where $\mathrm{Score}_M(x_i, x_j)$ is the Mahalanobis distance score, M is the metric matrix, and $x_i$ is the i-vector of the i-th utterance.
The obtained score $\mathrm{Score}_M(x_i, x_j)$ is compared with a threshold; equivalently, if the Mahalanobis distance exceeds the threshold, the two new utterances do not belong to the same speaker, and if the Mahalanobis distance is within the threshold, the two new utterances belong to the same speaker.
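A short NumPy sketch of this scoring and decision step (illustrative only; function and variable names are assumptions):

```python
import numpy as np

def mahalanobis_score(x_i, x_j, M):
    """Mahalanobis similarity score: the negated Mahalanobis distance."""
    diff = x_i - x_j
    return -float(diff @ M @ diff)

def same_speaker(x_i, x_j, M, dist_threshold):
    """Accept the pair as the same speaker when the distance is within the threshold."""
    return -mahalanobis_score(x_i, x_j, M) <= dist_threshold
```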
In the present embodiment, S is the number of speakers, $n_s$ is the number of utterances of speaker s, $\bar{x}$ is the mean of all speakers' i-vectors, and $\bar{x}_s$ is the i-vector mean of the s-th speaker.
To facilitate understanding of the technical solution of the present invention, the effect achieved and the practicability of the method provided by the embodiment are illustrated below with a concrete experimental test application scenario:
The experiment was conducted in the MATLAB environment, and the speech data for the speaker test samples come from the core speech corpora of the U.S. National Institute of Standards and Technology (NIST) Speaker Recognition Evaluations (SRE) of 2004, 2005, 2006 and 2008. The speaker recognition system first performs de-redundancy and noise reduction on the collected speech data of the target samples of the multiple speakers, and converts the analog speech signal into a discrete digital speech signal. The speech signal is split into overlapping frames with a 20 ms window (10 ms frame shift). Thirteen Mel-frequency cepstral coefficients (MFCC) are extracted and combined with their first- and second-order differences into 39-dimensional feature vectors representing the speech signal. The NIST SRE04, 05 and 06 speech data sets are used to train a 512-mixture gender-dependent UBM, on the basis of which the i-vectors (400-dimensional) of the target samples of all speakers are trained; the i-vectors are then subjected to robustness processing such as LDA, WCCN and length normalization for subsequent processing. The 2008 speech data serve as the target samples and speaker test samples for the similarity evaluation.
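For illustration only, this front-end feature extraction (13 MFCCs plus first- and second-order deltas, 20 ms windows with a 10 ms shift) could be sketched with librosa roughly as follows; the sample rate and other parameter choices are assumptions, and the remaining components (VAD, UBM, i-vector extractor) are assumed to exist elsewhere:

```python
import numpy as np
import librosa

def mfcc_39(wav_path):
    """13 MFCCs + deltas + delta-deltas -> (num_frames, 39) feature matrix."""
    y, sr = librosa.load(wav_path, sr=8000)             # telephone speech assumed
    win = int(0.020 * sr)                                # 20 ms window
    hop = int(0.010 * sr)                                # 10 ms frame shift
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=win, win_length=win, hop_length=hop)
    delta = librosa.feature.delta(mfcc)
    delta2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, delta, delta2]).T            # (frames, 39)
```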
Before the metric learning experiment, the similar-pair set and the dissimilar-pair set used for training are constructed. The present embodiment uses 6609 utterances from 491 male speakers and 9136 utterances from 703 female speakers in the NIST SRE04, 05 and 06 corpora to construct the similar-pair set S and the dissimilar-pair set D.
The i-vectors extracted from the speech are processed by LDA or WCCN channel compensation, a Mahalanobis distance metric matrix is trained with the KISS algorithm, and the Mahalanobis distance is computed to obtain the similarity score between a target i-vector and a test i-vector.
Let S be the number of speakers, $n_s$ the number of utterances of speaker s, $\bar{x}$ the mean of all speakers' i-vectors, and $\bar{x}_s$ the i-vector mean of each speaker.
The goal of linear discriminant analysis (LDA) is to minimize the within-class (similar-sample) separation and maximize the between-class (dissimilar-sample) separation through a projection. The between-class scatter matrix $S_b$ and the within-class scatter matrix $S_w$ are defined as above, where $S_b$ is the speaker between-class scatter matrix, $S_w$ is the speaker within-class scatter matrix, $n_s$ is the number of utterances of speaker s, $\bar{x}$ is the overall i-vector mean, and $\bar{x}_s$ is the i-vector mean of the s-th speaker.
The projection matrix A is composed of the eigenvectors corresponding to the eigenvalues $\lambda$ of
$S_b v = \lambda S_w v$
where $S_b$ is the speaker between-class scatter matrix, $S_w$ is the speaker within-class scatter matrix, $\lambda$ is the diagonal matrix of eigenvalues, and v is a projection direction in speaker space.
The goal of within-class covariance normalization (WCCN) is to make the basis of the sample space as orthogonal as possible. The within-class covariance matrix W is computed as above, with S speakers in total, $n_s$ utterances for speaker s, $\bar{x}$ the overall i-vector mean, and $\bar{x}_s$ the i-vector mean of the s-th speaker.
The feature vectors are mapped as $x \mapsto B^T x$, where B is the Cholesky factor of $W^{-1}$, i.e. $W^{-1} = B B^T$.
Length normalization of the i-vectors improves system performance.
The KISS algorithm is as follows:
First the covariance of all similar sample pairs, $\Sigma_{y_{ij}=1}$, and the covariance of all dissimilar sample pairs, $\Sigma_{y_{ij}=0}$, are computed:
$\Sigma_{y_{ij}=1} = \sum_{y_{ij}=1} (x_i - x_j)(x_i - x_j)^T$
$\Sigma_{y_{ij}=0} = \sum_{y_{ij}=0} (x_i - x_j)(x_i - x_j)^T$
where $x_i$ denotes the i-vector of the i-th utterance, $y_{ij}=0$ indicates that the i-th and j-th utterances come from different speakers, and $y_{ij}=1$ indicates that they come from the same speaker. The metric matrix M can then be obtained as
$M = \Sigma_{y_{ij}=1}^{-1} - \Sigma_{y_{ij}=0}^{-1}$
where $\Sigma_{y_{ij}=1}$ is the covariance of the similar pairs and $\Sigma_{y_{ij}=0}$ is the covariance of the dissimilar pairs; M is the metric matrix finally sought and is used to compute the Mahalanobis distance between a speaker test sample and a target sample $(x_i, x_j)$:
$d_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)$
where $x_i$ denotes the i-vector of the i-th utterance, M is the metric matrix, and $d_M(x_i, x_j)$ is the Mahalanobis distance between the speaker test sample and the target sample $(x_i, x_j)$.
From this distance the similarity score between the speaker samples $(x_i, x_j)$ is computed:
$\mathrm{Score}_M(x_i, x_j) = -(x_i - x_j)^T M (x_i - x_j)$
where $\mathrm{Score}_M(x_i, x_j)$ is the Mahalanobis distance score, M is the metric matrix, and $x_i$ is the i-vector of the i-th utterance.
With the method provided by this embodiment, the Keep-it-simple-and-straight (KISS) algorithm is simple and effective, has a globally optimal solution, and can quickly obtain a distance metric matrix satisfying the constraints; for training it only needs to know whether a pair of samples belongs to the same class. The metric matrix to be solved does not over-fit and is easy to obtain; the KISS algorithm scales well, requires no iterative optimization, and only needs two small covariance matrices to be computed. The metric matrix effectively reflects the similarity and distinctiveness of the speaker space; used in a Mahalanobis distance scoring classifier for testing target speaker speech samples, it gives the speaker recognition system a good recognition performance. Its performance is close to that of currently popular metric learning algorithms, the metric matrix training process is faster than that of other algorithms, and the trained Mahalanobis metric matrix reflects the similarity and distinctiveness of the sample space more faithfully, thereby improving the performance of the speaker recognition system.
It may be noted that, according to implementation needs, each step described in this application may be split into more steps, and two or more steps or partial operations of steps may be combined into new steps, in order to achieve the object of the present invention.
The above method according to the invention may be implemented in hardware or firmware, or be implemented as software or computer code storable in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk or magneto-optical disk), or be implemented as computer code that is originally stored in a remote recording medium or a non-volatile machine-readable medium, downloaded over a network and stored in a local recording medium, so that the method described here can be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage component (for example RAM, ROM, flash memory, etc.) capable of storing or receiving software or computer code which, when accessed and executed by the computer, processor or hardware, implements the processing method described here. Furthermore, when a general-purpose computer accesses code for implementing the processing shown here, execution of the code converts the general-purpose computer into a special-purpose computer for performing the processing shown here.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the technical field may readily conceive of changes or substitutions within the technical scope disclosed by the invention, and all of these shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the scope of the claims.

Claims (7)

1. A speaker recognition method based on a simple direct metric learning algorithm, characterized in that the method comprises the following steps:
collecting the speech samples of multiple speakers and extracting the i-vector of every sample;
applying channel compensation to the i-vectors of all samples using the LDA or WCCN method and performing length normalization, forming a training sample set;
constructing a similar-pair set and a dissimilar-pair set from the i-vectors of the training sample set and the speaker identities;
using the KISS algorithm to train on the similar-pair set and the dissimilar-pair set and obtain a metric matrix;
for two new utterances, extracting their i-vectors and applying the above channel compensation and length normalization, then computing the Mahalanobis distance between the two i-vectors using the previously computed metric matrix;
comparing the obtained Mahalanobis distance with a threshold and, based on the comparison, deciding whether the two new utterances belong to the same speaker.
2. The method of claim 1, characterized in that applying channel compensation to the i-vectors of all samples using the LDA or WCCN method specifically includes:
minimizing the within-class (similar-sample) separation and maximizing the between-class (dissimilar-sample) separation through a projection matrix.
3. The method of claim 1, characterized in that applying channel compensation to the i-vectors of all samples using the LDA or WCCN method specifically includes:
making the basis of the target sample space as orthogonal as possible.
4. The method of claim 1, characterized in that the method also includes:
performing length normalization on the i-vectors extracted from all samples.
5. The method of claim 1, characterized in that using the KISS algorithm to train on the similar-pair set and the dissimilar-pair set and obtain the metric matrix specifically includes:
computing the covariance of the similar sample pairs and the covariance of the dissimilar sample pairs in all samples, respectively;
computing the metric matrix from the covariance of the similar sample pairs and the covariance of the dissimilar sample pairs.
6. The method of claim 1, characterized in that the method also includes:
computing the Mahalanobis distance between two i-vectors according to the obtained metric matrix.
7. The method of claim 1, characterized in that comparing the obtained Mahalanobis distance with a threshold and, based on the comparison, deciding whether the two new utterances belong to the same speaker specifically includes:
if the obtained Mahalanobis distance is greater than the threshold, the two new utterances do not belong to the same speaker;
if the obtained Mahalanobis distance is within the threshold, the two new utterances belong to the same speaker.
CN201610281884.1A 2016-04-29 2016-04-29 Speaker identification method base on simple direct tolerance learning algorithm Pending CN105931646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610281884.1A CN105931646A (en) 2016-04-29 2016-04-29 Speaker identification method base on simple direct tolerance learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610281884.1A CN105931646A (en) 2016-04-29 2016-04-29 Speaker identification method base on simple direct tolerance learning algorithm

Publications (1)

Publication Number Publication Date
CN105931646A true CN105931646A (en) 2016-09-07

Family

ID=56837754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610281884.1A Pending CN105931646A (en) 2016-04-29 2016-04-29 Speaker identification method base on simple direct tolerance learning algorithm

Country Status (1)

Country Link
CN (1) CN105931646A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103959323A (en) * 2011-09-21 2014-07-30 搜诺思公司 Methods and systems to share media

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ZHENCHUN LEI, JIAN LUO, YANHONG WAN, AND YINGEN YANG: "A Mahalanobis Distance Scoring with KISS Metric", 《BIOMETRIC RECOGNITION》 *
沈媛媛, 严严, 王菡子: "Research progress on supervised distance metric learning algorithms", Acta Automatica Sinica (《自动化学报》) *
钱强, 陈松灿: "A matrix metric learning algorithm based on the likelihood ratio test of matrix normal distributions", Journal of Shandong University (Engineering Science) (《山东大学学报(工学版)》) *
雷震春, 万艳红, 罗剑, 朱明华: "Research on speaker recognition models based on the Mahalanobis distance", Proceedings of the 13th National Conference on Man-Machine Speech Communication (NCMMSC2015) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147799A (en) * 2018-10-18 2019-01-04 广州势必可赢网络科技有限公司 A kind of method, apparatus of speech recognition, equipment and computer storage medium
CN109377984A (en) * 2018-11-22 2019-02-22 北京中科智加科技有限公司 A kind of audio recognition method and device based on ArcFace
CN109377984B (en) * 2018-11-22 2022-05-03 北京中科智加科技有限公司 ArcFace-based voice recognition method and device
CN110188641A (en) * 2019-05-20 2019-08-30 北京迈格威科技有限公司 Image recognition and the training method of neural network model, device and system
CN110188641B (en) * 2019-05-20 2022-02-01 北京迈格威科技有限公司 Image recognition and neural network model training method, device and system
CN111179914A (en) * 2019-12-04 2020-05-19 华南理工大学 Voice sample screening method based on improved dynamic time warping algorithm
CN111179914B (en) * 2019-12-04 2022-12-16 华南理工大学 Voice sample screening method based on improved dynamic time warping algorithm
CN111462762A (en) * 2020-03-25 2020-07-28 清华大学 Speaker vector regularization method and device, electronic equipment and storage medium
CN111462762B (en) * 2020-03-25 2023-02-24 清华大学 Speaker vector regularization method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105931646A (en) Speaker identification method base on simple direct tolerance learning algorithm
CN110120218B (en) Method for identifying highway large-scale vehicles based on GMM-HMM
CN110197502B (en) Multi-target tracking method and system based on identity re-identification
US9355642B2 (en) Speaker recognition method through emotional model synthesis based on neighbors preserving principle
CN103415825A (en) System and method for gesture recognition
CN111128128B (en) Voice keyword detection method based on complementary model scoring fusion
CN105261367A (en) Identification method of speaker
CN109977213B (en) Optimal answer selection method for intelligent question-answering system
Senoussaoui et al. Efficient iterative mean shift based cosine dissimilarity for multi-recording speaker clustering
JP2014026455A (en) Media data analysis device, method and program
Bahari Speaker age estimation using Hidden Markov Model weight supervectors
CN108648760A (en) Real-time sound-groove identification System and method for
CN107945791A (en) A kind of audio recognition method based on deep learning target detection
CN114519351A (en) Subject text rapid detection method based on user intention embedded map learning
Ghaemmaghami et al. A study of speaker clustering for speaker attribution in large telephone conversation datasets
Prasad et al. Improving the performance of speech clustering method
CN112052880A (en) Underwater sound target identification method based on weight updating support vector machine
Ruiz-Muñoz et al. Enhancing the dissimilarity-based classification of birdsong recordings
CN112465054B (en) FCN-based multivariate time series data classification method
Chandrakala et al. Combination of generative models and SVM based classifier for speech emotion recognition
CN114242066A (en) Speech processing method, speech processing model training method, apparatus and medium
Aronowitz Trainable speaker diarization
CN113823326A (en) Method for using training sample of efficient voice keyword detector
CN116230012B (en) Two-stage abnormal sound detection method based on metadata comparison learning pre-training
Pao et al. Audio-visual speech recognition with weighted KNN-based classification in mandarin database

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20160907)