CN108154371A - Electronic device, identity verification method, and storage medium - Google Patents
Electronic device, identity verification method, and storage medium
- Publication number
- CN108154371A (application number CN201810030621.2A)
- Authority
- CN
- China
- Prior art keywords
- voiceprint
- voice data
- training
- triphone
- data sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/38—Payment protocols; Details thereof
- G06Q20/40—Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
- G06Q20/401—Transaction verification
- G06Q20/4014—Identity check for transactions
- G06Q20/40145—Biometric identity checks
Abstract
The present invention relates to an electronic device, an identity verification method, and a storage medium. The method includes: after voice data of a target user is received, extracting a preset type of voiceprint feature from the voice data using a preset filter, and building a voiceprint feature vector for the voice data based on the preset type of voiceprint feature; inputting the voiceprint feature vector into a pre-trained first model to determine the triphone feature corresponding to each frame of the voice data, and constructing a triphone feature vector from all the triphone features of the voice data; inputting the triphone feature vector into a pre-trained second model to construct the current voiceprint discrimination vector of the target user; and calculating the space distance between the current voiceprint discrimination vector and a pre-stored standard voiceprint discrimination vector of the target user, performing identity verification on the user based on the space distance, and generating a verification result. The present invention can improve the accuracy of identity verification and thereby improve financial security.
Description
Technical field
The present invention relates to the field of communication technology, and more particularly to an electronic device, an identity verification method, and a storage medium.
Background technology
At present, the scope of business of many large financial companies covers multiple lines such as insurance, banking, and investment. Almost every line of business requires communicating with the same client, and identity verification or anti-fraud screening must be carried out before that communication to ensure business security. To meet real-time business demands, some financial companies verify a client's identity or screen for fraud by means of speech recognition. In speech recognition, the sound (waveform) of a word is not determined by its phonemes alone: because of coarticulation (a sound being altered by the influence of the sounds immediately before and after it), the perceived phoneme differs from its standard form. The sound of a word therefore also depends on the phoneme context, not just the phonemes themselves. The voiceprint recognition models used in existing speech recognition schemes do not take the phoneme context of the voice data to be recognized into account, so the accuracy of identity verification by speech recognition is low, criminals may exploit this weakness to commit financial fraud, and security suffers.
Invention content
The purpose of the present invention is to provide an electronic device, an identity verification method, and a storage medium, aiming to improve the accuracy of identity verification and thereby improve financial security.
To achieve the above object, the present invention provides an electronic device. The electronic device includes a memory and a processor connected to the memory. The memory stores a processing system that can run on the processor, and the processing system, when executed by the processor, implements the following steps:

an extraction step: after voice data of a target user pending identity verification is received, extracting a preset type of voiceprint feature from the voice data using a preset filter, and building a voiceprint feature vector for the voice data based on the preset type of voiceprint feature;

a first construction step: inputting the voiceprint feature vector into a pre-trained first model to determine the triphone feature corresponding to each frame of the voice data, and constructing a triphone feature vector from all the triphone features of the voice data;

a second construction step: inputting the triphone feature vector into a pre-trained second model to construct the current voiceprint discrimination vector of the target user;

a verification step: calculating the space distance between the current voiceprint discrimination vector and a pre-stored standard voiceprint discrimination vector of the target user, performing identity verification on the user based on the space distance, and generating a verification result.
Preferably, the second model is a Gaussian mixture model, and the training process of the second model includes the following steps:

obtaining a preset number of voice data samples, each voice data sample corresponding to one voiceprint discrimination vector;

extracting the preset type of voiceprint feature corresponding to each voice data sample, and building the voiceprint feature vector of each voice data sample from its preset type of voiceprint feature;

inputting each built voiceprint feature vector into the pre-trained first model, determining the triphone feature corresponding to each frame of each voice data sample, and constructing the triphone feature vector of each voice data sample from all its triphone features;

dividing all the constructed triphone feature vectors into a training set of a first percentage and a validation set of a second percentage, the sum of the first percentage and the second percentage being less than or equal to 1;

training the Gaussian mixture model on the triphone feature vectors in the training set, and after training is complete, verifying the accuracy of the trained Gaussian mixture model on the validation set;

if the accuracy is greater than a preset accuracy, ending training and taking the trained Gaussian mixture model as the second model; otherwise, if the accuracy is less than or equal to the preset accuracy, increasing the number of voice data samples and re-training on the increased samples.
Preferably, the verification step specifically includes:

calculating the cosine distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the target user:

d = 1 - (v_s · v_c) / (||v_s|| · ||v_c||),

where v_s is the standard voiceprint discrimination vector and v_c is the current voiceprint discrimination vector;

if the cosine distance is less than or equal to a preset distance threshold, generating information that the verification has passed;

if the cosine distance is greater than the preset distance threshold, generating information that the verification has failed.
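As a hedged illustration, the threshold comparison in this preferred verification step can be sketched as follows; the example vectors, the 0.3 threshold, and the function names are assumptions for the sketch, not values taken from the patent.

```python
import numpy as np

def cosine_distance(a, b):
    # distance = 1 - cosine similarity; smaller means more similar
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_identity(current_vec, standard_vec, threshold=0.3):
    # Verification passes when the space distance is within the threshold
    return cosine_distance(current_vec, standard_vec) <= threshold

# Illustrative check against a stored "standard" voiceprint discrimination vector
standard = np.array([0.2, 0.9, 0.4])
same_user = standard + 0.01               # nearly identical voiceprint
impostor = np.array([-0.8, 0.1, 0.5])
print(verify_identity(same_user, standard))  # True
print(verify_identity(impostor, standard))   # False
```

The threshold trades off false accepts against false rejects and would normally be tuned on held-out verification pairs.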
Preferably, the first model is a long short-term memory (LSTM) network model, and the training process of the first model includes the following steps:

obtaining a preset number of voice data samples, each voice data sample corresponding to one triphone feature vector;

extracting the preset type of voiceprint feature corresponding to each voice data sample, and building the voiceprint feature vector of each voice data sample from its preset type of voiceprint feature;

dividing all the built voiceprint feature vectors into a training set of a first percentage and a validation set of a second percentage, the sum of the first percentage and the second percentage being less than or equal to 1;

training the LSTM network model on the voiceprint feature vectors in the training set, and after training is complete, verifying the accuracy of the trained LSTM network model on the validation set;

if the accuracy is greater than a preset accuracy, ending training and taking the trained LSTM network model as the first model; otherwise, if the accuracy is less than or equal to the preset accuracy, increasing the number of voice data samples and re-training on the increased samples.
To achieve the above object, the present invention also provides an identity verification method, which includes:

S1: after voice data of a target user pending identity verification is received, extracting a preset type of voiceprint feature from the voice data using a preset filter, and building a voiceprint feature vector for the voice data based on the preset type of voiceprint feature;

S2: inputting the voiceprint feature vector into a pre-trained first model to determine the triphone feature corresponding to each frame of the voice data, and constructing a triphone feature vector from all the triphone features of the voice data;

S3: inputting the triphone feature vector into a pre-trained second model to construct the current voiceprint discrimination vector of the target user;

S4: calculating the space distance between the current voiceprint discrimination vector and a pre-stored standard voiceprint discrimination vector of the target user, performing identity verification on the user based on the space distance, and generating a verification result.
Preferably, the second model is a Gaussian mixture model, and the training process of the second model includes the following steps:

obtaining a preset number of voice data samples, each voice data sample corresponding to one voiceprint discrimination vector;

extracting the preset type of voiceprint feature corresponding to each voice data sample, and building the voiceprint feature vector of each voice data sample from its preset type of voiceprint feature;

inputting each built voiceprint feature vector into the pre-trained first model, determining the triphone feature corresponding to each frame of each voice data sample, and constructing the triphone feature vector of each voice data sample from all its triphone features;

dividing all the constructed triphone feature vectors into a training set of a first percentage and a validation set of a second percentage, the sum of the first percentage and the second percentage being less than or equal to 1;

training the Gaussian mixture model on the triphone feature vectors in the training set, and after training is complete, verifying the accuracy of the trained Gaussian mixture model on the validation set;

if the accuracy is greater than a preset accuracy, ending training and taking the trained Gaussian mixture model as the second model; otherwise, if the accuracy is less than or equal to the preset accuracy, increasing the number of voice data samples and re-training on the increased samples.
Preferably, step S4 specifically includes:

calculating the cosine distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the target user:

d = 1 - (v_s · v_c) / (||v_s|| · ||v_c||),

where v_s is the standard voiceprint discrimination vector and v_c is the current voiceprint discrimination vector;

if the cosine distance is less than or equal to a preset distance threshold, generating information that the verification has passed;

if the cosine distance is greater than the preset distance threshold, generating information that the verification has failed.
Preferably, the first model is a long short-term memory (LSTM) network model, and the training process of the first model includes the following steps:

obtaining a preset number of voice data samples, each voice data sample corresponding to one triphone feature vector;

extracting the preset type of voiceprint feature corresponding to each voice data sample, and building the voiceprint feature vector of each voice data sample from its preset type of voiceprint feature;

dividing all the built voiceprint feature vectors into a training set of a first percentage and a validation set of a second percentage, the sum of the first percentage and the second percentage being less than or equal to 1;

training the LSTM network model on the voiceprint feature vectors in the training set, and after training is complete, verifying the accuracy of the trained LSTM network model on the validation set;

if the accuracy is greater than a preset accuracy, ending training and taking the trained LSTM network model as the first model; otherwise, if the accuracy is less than or equal to the preset accuracy, increasing the number of voice data samples and re-training on the increased samples.
Preferably, step S1 specifically includes:

performing pre-emphasis, framing, and windowing on the voice data, performing a Fourier transform on each windowed frame to obtain the corresponding spectrum, and inputting the spectrum into a Mel filter bank to output a Mel spectrum;

performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstrum coefficients (MFCC), and forming the corresponding voiceprint feature vector from the Mel-frequency cepstrum coefficients.
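The pre-emphasis, framing, windowing, Fourier transform, Mel filter bank, and cepstral analysis chain of step S1 can be sketched in numpy as follows; the sample rate, frame length, hop size, filter count, and coefficient count are illustrative assumptions, not values specified by the patent.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC sketch: pre-emphasis -> framing -> windowing -> FFT
    -> Mel filter bank -> log -> DCT (cepstral analysis)."""
    # Pre-emphasis boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Framing with a Hamming window
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i*hop : i*hop+frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum via the Fourier transform of each windowed frame
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # Triangular Mel filter bank
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    mel_spec = np.log(power @ fbank.T + 1e-10)   # log Mel spectrum
    # DCT-II decorrelates the log Mel energies into cepstral coefficients
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return mel_spec @ basis.T                    # (n_frames, n_ceps) feature matrix

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s test tone
feats = mfcc(sig)
print(feats.shape)  # (98, 13)
```

The resulting per-frame coefficient matrix plays the role of the voiceprint feature vector (feature data matrix) that is fed to the first model.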
The present invention also provides a computer-readable storage medium on which a processing system is stored, and the processing system, when executed by a processor, implements the steps of the identity verification method described above.
The beneficial effects of the present invention are as follows: when performing identity verification or anti-fraud screening, the present invention first extracts the voiceprint feature of the voice data and builds the corresponding voiceprint feature vector; inputs the voiceprint feature vector into the pre-trained first model to determine the triphone features of the voice data and build the corresponding triphone feature vector; inputs the triphone feature vector into the pre-trained second model to obtain the current voiceprint discrimination vector of the target user; and calculates the space distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the target user, so as to verify the user's identity using that space distance. Because this embodiment considers not only the phonemes themselves but also the triphones representing the phoneme context of the voice data when performing identity verification by speech recognition, it can improve the accuracy of identity verification and thereby improve financial security.
Description of the drawings
Fig. 1 is a schematic diagram of the hardware architecture of an embodiment of the electronic device of the present invention;
Fig. 2 is a flow diagram of an embodiment of the identity verification method of the present invention.
Specific embodiment
To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

It should be noted that descriptions involving "first", "second", and the like in the present invention are used for description purposes only, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments can be combined with each other, but only insofar as such a combination can be implemented by those of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, it should be understood that such a combination does not exist and is not within the protection scope claimed by the present invention.
Referring to Fig. 1, Fig. 1 is a schematic diagram of the hardware architecture of an embodiment of the electronic device of the present invention. The electronic device 1 is an apparatus that can automatically perform numerical computation and/or information processing according to instructions set or stored in advance. The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud composed of a large number of hosts or network servers, where cloud computing is a kind of distributed computing: one super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that can communicate with each other through a system bus, the memory 11 storing a processing system that can run on the processor 12. It should be noted that Fig. 1 only shows the electronic device 1 with components 11-13; it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead.
The memory 11 includes an internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as a flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, or optical disk. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as the hard disk of the electronic device 1; in other embodiments, the non-volatile storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card equipped on the electronic device 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and various kinds of application software installed on the electronic device 1, such as the program code of the processing system in an embodiment of the present invention. In addition, the memory 11 can also be used to temporarily store various kinds of data that have been output or will be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 12 is generally used to control the overall operation of the electronic device 1, for example, performing control and processing related to data interaction or communication with other devices. In this embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example, to run the processing system.
The network interface 13 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11. The at least one computer-readable instruction can be executed by the processor 12 to implement the methods of the embodiments of the present application, and can be divided into different logical modules according to the functions realized by its parts.
In one embodiment, the above processing system, when executed by the processor 12, implements the following steps:

an extraction step: after voice data of a target user pending identity verification is received, extracting a preset type of voiceprint feature from the voice data using a preset filter, and building a voiceprint feature vector for the voice data based on the preset type of voiceprint feature;
In this embodiment, the voice data is collected by a voice capture device (for example, a microphone). When collecting voice data, interference from environmental noise and from the voice capture device itself should be prevented as far as possible: the voice capture device should be kept at a suitable distance from the target user, voice capture devices with large distortion should be avoided where possible, the power supply should preferably be mains electricity with a stable current, and a pickup sensor should be used for telephone recording. Before framing and sampling, the voice data may be de-noised to further reduce interference. In order to extract the voiceprint feature of the voice data, the collected voice data must be longer than a preset data length.
Voiceprint features come in multiple types, such as broadband voiceprints, narrowband voiceprints, and amplitude voiceprints. In this embodiment, the preset type of voiceprint feature is preferably the Mel-frequency cepstrum coefficients (MFCC) of the sampled voice data, and the preset filter is a Mel filter bank. When building the corresponding voiceprint feature vector, the voiceprint features of the sampled voice data form a feature data matrix, and this feature data matrix is the voiceprint feature vector of the sampled voice data.
a first construction step: inputting the voiceprint feature vector into the pre-trained first model to determine the triphone feature corresponding to each frame of the voice data, and constructing a triphone feature vector from all the triphone features of the voice data;

In this embodiment, a triphone state represents the state of a speech frame's phoneme itself together with its relationship to the preceding and following phonemes. The first model is preferably a long short-term memory (LSTM) network model; the LSTM network model processes the voiceprint feature vector, obtains the triphone feature corresponding to each frame, and constructs the corresponding triphone feature vector.
The triphone feature is the probability of the triphone state represented by the corresponding speech frame. All the triphone features of the voice data correspond to a triphone feature matrix, which corresponds to the input voiceprint feature vector (i.e. the voiceprint feature matrix); the triphone feature vector constructed from all the triphone features of the voice data is this triphone feature matrix.
In a preferred embodiment, the LSTM network model includes 1 input layer, 3 LSTM layers, and 1 classification layer, as shown in Table 1 below:

Layer Name | Batch Size
Input | 913
LSTM1/HLSTM1 | 1024
LSTM2/HLSTM2 | 1024
LSTM3/HLSTM3 | 1024
Softmax | 4773

Table 1
Here, Layer Name is the name of each layer, and Batch Size is the number of input voice pieces for the current layer. Input denotes the input layer; HLSTM (Highway Long Short-Term Memory) denotes a highway LSTM, a recurrent LSTM network based on memory-cell connections (the recurrent neural network introduces direct connections between adjacent memory cells, so that information in the LSTM network model can be transmitted directly between different layers, which improves the efficiency of speech recognition and the recognition effect); Softmax denotes a Softmax classifier. After the voiceprint feature vector is processed in turn by each of the above layers of the LSTM network model, the triphone feature corresponding to each frame is obtained and the corresponding triphone feature vector is constructed.
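The layer structure of Table 1 can be illustrated with a minimal numpy forward pass. The dimensions below are shrunk from the table, the weights are random rather than trained, and the highway (HLSTM) connections are omitted, so this is only a shape-level sketch of how a voiceprint feature sequence becomes per-frame triphone-state probabilities.

```python
import numpy as np

def lstm_layer(x_seq, W, U, b):
    """One LSTM layer forward pass.
    x_seq: (T, d_in); W: (4*d_h, d_in); U: (4*d_h, d_h); b: (4*d_h,)."""
    d_h = U.shape[1]
    h, c, out = np.zeros(d_h), np.zeros(d_h), []
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x in x_seq:
        z = W @ x + U @ h + b
        i, f, g, o = z[:d_h], z[d_h:2*d_h], z[2*d_h:3*d_h], z[3*d_h:]
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell state update
        h = sigmoid(o) * np.tanh(c)                    # hidden state output
        out.append(h)
    return np.stack(out)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Tiny stand-in dimensions (Table 1 uses 1024-unit layers and 4773 output states)
rng = np.random.default_rng(0)
T, d_in, d_h, n_tri = 5, 8, 16, 12
h = lstm_layer(rng.normal(size=(T, d_in)),
               rng.normal(size=(4*d_h, d_in)) * 0.1,
               rng.normal(size=(4*d_h, d_h)) * 0.1, np.zeros(4*d_h))
for _ in range(2):  # stack to 3 LSTM layers, as in Table 1
    h = lstm_layer(h, rng.normal(size=(4*d_h, d_h)) * 0.1,
                   rng.normal(size=(4*d_h, d_h)) * 0.1, np.zeros(4*d_h))
# Softmax classifier: per-frame triphone-state probability distributions
probs = softmax(h @ rng.normal(size=(d_h, n_tri)))
print(probs.shape)  # (5, 12)
```

Each row of `probs` corresponds to one speech frame and sums to 1, matching the description of the triphone feature as a per-frame triphone-state probability.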
In a preferred embodiment, the training process of the first model, the LSTM network model, includes the following steps: obtaining a preset number of voice data samples (for example, 10), each voice data sample corresponding to one triphone feature vector; extracting the preset type of voiceprint feature corresponding to each voice data sample, and building the voiceprint feature vector of each voice data sample from its preset type of voiceprint feature; dividing all the built voiceprint feature vectors into a training set of a first percentage (for example, 75%) and a validation set of a second percentage (for example, 20%), the sum of the first and second percentages being less than or equal to 1; training the LSTM network model on the voiceprint feature vectors in the training set, and after training is complete, verifying the accuracy of the trained LSTM network model on the validation set (i.e. verifying the accuracy of the triphone feature vectors output by the LSTM network model relative to the triphone feature vectors corresponding to the voice data samples); if the accuracy is greater than a preset accuracy (for example, 0.985), ending training and taking the trained LSTM network model as the first model; otherwise, if the accuracy is less than or equal to the preset accuracy, increasing the number of voice data samples and re-training on the increased samples until the accuracy exceeds the preset accuracy.
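The split/train/verify/enlarge loop described above is model-agnostic, so it can be sketched with toy stand-ins for the model and the sample source; none of these helper names come from the patent, and the toy model exists only to make the loop runnable.

```python
import random

def train_until_accurate(samples, train_model, evaluate, get_more_samples,
                         train_frac=0.75, val_frac=0.20, target_acc=0.985):
    """Split samples, train, verify accuracy on the validation set, and enlarge
    the sample set and re-train whenever the accuracy is not high enough."""
    while True:
        random.shuffle(samples)
        n_train = int(len(samples) * train_frac)
        n_val = int(len(samples) * val_frac)       # train_frac + val_frac <= 1
        model = train_model(samples[:n_train])
        acc = evaluate(model, samples[n_train:n_train + n_val])
        if acc > target_acc:
            return model, acc                      # training ends
        samples = get_more_samples(samples)        # re-train on more samples

# Toy stand-ins: the "model" just predicts the majority label seen in training
def toy_train(train_set):
    labels = [y for _, y in train_set]
    return max(set(labels), key=labels.count)

def toy_eval(model, val_set):
    return sum(y == model for _, y in val_set) / len(val_set) if val_set else 0.0

def toy_more(samples):
    return samples + [(len(samples) + i, 1) for i in range(10)]

data = [(i, 1) for i in range(20)]
model, acc = train_until_accurate(data, toy_train, toy_eval, toy_more)
print(acc > 0.985)  # True
```

The same loop applies unchanged to the Gaussian mixture model training described later; only the train and evaluate callbacks differ.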
a second construction step: inputting the triphone feature vector into the pre-trained second model to construct the current voiceprint discrimination vector of the target user;

In this embodiment, the second model is preferably a Gaussian mixture model. The Gaussian mixture model computes on the triphone feature vector (i.e. the triphone feature matrix) and obtains the corresponding current voiceprint discrimination vector (i.e. the i-vector) of the voice data.
Specifically, the process includes:

1) Selecting Gaussian models: first, using the parameters in the second model, the likelihood log value of each frame of data under the different Gaussian models is calculated; each column of the likelihood log value matrix is sorted in parallel and the top N Gaussian models are chosen, finally obtaining, for each frame of data, a matrix of the top-N Gaussian models in the mixture:

Loglike = E(X) · D(X)^(-1) · X^T − 0.5 · D(X)^(-1) · (X^(·2))^T,

where Loglike is the likelihood log value matrix, E(X) is the mean matrix produced by training the second model, D(X) is the covariance matrix, X is the data matrix, and X^(·2) denotes squaring each element of the matrix.

The likelihood log value of a single frame is calculated as:

loglikes_i = C_i + E_i · Cov_i^(-1) · X_i − X_i^T · X_i · Cov_i^(-1),

where loglikes_i is the i-th row vector of the likelihood log value matrix, C_i is the constant term of the i-th model, E_i is the mean matrix of the i-th model, Cov_i is the covariance matrix of the i-th model, and X_i is the i-th frame of data.
2) Calculating the posterior probability: each frame of data X is multiplied as X · X^T to obtain a symmetric matrix, which is reduced to a lower triangular matrix whose elements are arranged in order into one row, giving one vector per frame whose dimension is the number of elements of the lower triangular matrix; the vectors of all frames are combined into a new data matrix. At the same time, the covariance matrices used for the probability calculation in the second model are likewise reduced to lower triangular matrices and arranged into a matrix of the same kind as the new data matrix. Using the mean matrix and covariance matrix in the second model, the likelihood log value of each frame of data under its selected Gaussian models is calculated; Softmax regression is then performed, and finally a normalization operation is carried out to obtain the posterior probability distribution of each frame over the Gaussian mixture; the probability distribution vectors of all frames form the probability matrix.
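The likelihood-log-value, Softmax, and normalization sequence can be sketched for a diagonal-covariance mixture. This is a generic formulation of the posterior computation, not the patent's lower-triangular-matrix optimization, and all dimensions are illustrative.

```python
import numpy as np

def frame_posteriors(X, means, variances, weights):
    """Posterior probability of each diagonal-covariance Gaussian for each frame.
    X: (T, d); means, variances: (N, d); weights: (N,)."""
    T, d = X.shape
    # log N(x | mu, sigma^2) for every (frame, component) pair
    log_det = np.sum(np.log(variances), axis=1)                    # (N,)
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2 / variances   # (T, N, d)
    loglikes = (np.log(weights) - 0.5 * (d * np.log(2 * np.pi) + log_det)
                - 0.5 * diff2.sum(axis=2))                         # (T, N)
    # Softmax over components normalizes each frame's likelihood log values
    m = loglikes.max(axis=1, keepdims=True)
    p = np.exp(loglikes - m)
    return p / p.sum(axis=1, keepdims=True)        # rows = posterior distributions

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))
post = frame_posteriors(X, means=rng.normal(size=(5, 3)),
                        variances=np.ones((5, 3)), weights=np.full(5, 0.2))
print(post.shape)  # (4, 5)
```

Each row of the returned probability matrix sums to 1, corresponding to one frame's posterior distribution over the Gaussian mixture.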
3) Extracting the current voiceprint discrimination vector: first, the first-order and second-order coefficients are calculated. The first-order coefficients can be obtained by summing the probability matrix over its rows:

Gamma_i = sum_j loglikes_ji,

where Gamma_i is the i-th element of the first-order coefficient vector and loglikes_ji is the element in row j, column i of the likelihood log value matrix.

The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix:

X = Loglike^T · feats,

where X is the second-order coefficient matrix, Loglike is the likelihood log value matrix, and feats is the feature data matrix.

After the first-order and second-order coefficients are calculated, the linear term and the quadratic term are computed in parallel, and the current voiceprint discrimination vector is calculated from them:

i-vector = quadratic^(-1) · linear.
Here the quadratic term is accumulated, for each model i in the second model, from the mean matrix M_i of the i-th model, the covariance matrix Σ_i of the i-th model, and the i-th row vector X_i of the second-order coefficient matrix; the linear term is accumulated from the first-order coefficient vector M together with the mean matrix M_i and covariance matrix Σ_i of each model i in the second model.
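Since the patent's exact quadratic- and linear-term formulas did not survive extraction, the closing step can be illustrated with the standard total-variability i-vector estimate, which has the same i-vector = quadratic^(-1) · linear shape. The total variability matrix `T_matrix` and all dimensions here are assumptions of the sketch, not parameters named by the patent.

```python
import numpy as np

def ivector_from_stats(post, X, means, variances, T_matrix):
    """Accumulate first/second-order terms from the posterior (probability)
    matrix and solve i = quadratic^(-1) * linear, total-variability style.
    post: (T, N); X: (T, d); means, variances: (N, d); T_matrix: (N*d, r)."""
    n_comp, d = means.shape
    r = T_matrix.shape[1]
    gamma = post.sum(axis=0)                       # first-order coefficients Gamma_i
    F = post.T @ X - gamma[:, None] * means        # centered second-order statistics
    quadratic = np.eye(r)
    linear = np.zeros(r)
    for i in range(n_comp):
        Ti = T_matrix[i*d:(i+1)*d, :]              # component block of T
        prec = Ti.T / variances[i]                 # T_i^T Sigma_i^{-1}
        quadratic += gamma[i] * prec @ Ti          # quadratic term accumulation
        linear += prec @ F[i]                      # linear term accumulation
    return np.linalg.solve(quadratic, linear)      # i-vector

rng = np.random.default_rng(2)
T_frames, n_comp, d, r = 6, 4, 3, 2
post = rng.dirichlet(np.ones(n_comp), size=T_frames)   # per-frame posteriors
iv = ivector_from_stats(post, rng.normal(size=(T_frames, d)),
                        rng.normal(size=(n_comp, d)), np.ones((n_comp, d)),
                        rng.normal(size=(n_comp * d, r)))
print(iv.shape)  # (2,)
```

The returned low-dimensional vector plays the role of the current voiceprint discrimination vector that is compared against the stored standard vector in the verification step.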
Preferably, the process of training the Gaussian mixture model includes: obtaining a preset number of voice data samples (for example, 100,000), each voice data sample corresponding to one voiceprint discrimination vector; extracting the preset type of voiceprint feature corresponding to each voice data sample, and building the voiceprint feature vector of each voice data sample from its preset type of voiceprint feature; inputting each built voiceprint feature vector into the pre-trained first model, determining the triphone feature corresponding to each frame of each voice data sample, and constructing the triphone feature vector of each voice data sample from all its triphone features; dividing all the constructed triphone feature vectors into a training set of a first percentage (for example, 75%) and a validation set of a second percentage (for example, 25%), the sum of the first and second percentages being less than or equal to 1; training the Gaussian mixture model on the triphone feature vectors in the training set, and after training is complete, verifying the accuracy of the trained Gaussian mixture model on the validation set (i.e. verifying the accuracy of the current voiceprint discrimination vectors output by the Gaussian mixture model relative to the current voiceprint discrimination vectors corresponding to the voice data samples); if the accuracy is greater than a preset accuracy (for example, 0.98), ending training and taking the trained Gaussian mixture model as the second model; otherwise, if the accuracy is less than or equal to the preset accuracy, increasing the number of voice data samples and re-training on the increased samples until the accuracy exceeds the preset accuracy.
Verification step: calculating the spatial distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the target user, performing identity verification on the user based on the spatial distance, and generating a verification result.
In the present embodiment, there are many possible distances between vectors, including the cosine distance and the Euclidean distance; preferably, the spatial distance of the present embodiment is the cosine distance, which uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals.
The standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance; it is stored together with the identification information of its corresponding user and can accurately represent that user's identity. Before the spatial distance is calculated, the stored standard voiceprint discriminant vector is retrieved according to the identification information provided by the user.
When the calculated spatial distance is less than or equal to a preset distance threshold, the verification passes; otherwise, the verification fails.
In addition, the present embodiment can also be applied to anti-fraud identification, using the calculated spatial distance to identify whether the user is a blacklisted user, thereby improving safety.
Compared with the prior art, when performing identity verification or anti-fraud identification the present embodiment first extracts the voiceprint feature of the voice data and builds the corresponding voiceprint feature vector; inputs the voiceprint feature vector into the first model trained in advance to determine the triphone features of the voice data and build the corresponding triphone feature vector; inputs the triphone feature vector into the second model trained in advance to obtain the current voiceprint discriminant vector of the target user; and calculates the spatial distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the target user, so as to verify the user's identity using that spatial distance. Since the present embodiment considers, in addition to each phoneme itself, the triphones that represent the phoneme context of the voice data when performing identity verification through speech recognition, the accuracy rate of identity verification can be improved, improving financial security.
In a preferred embodiment, on the basis of the embodiment of Fig. 1 above, the verification step specifically includes:
calculating the cosine distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the target user: cos θ = (A · B) / (|A| |B|), where A is the standard voiceprint discriminant vector and B is the current voiceprint discriminant vector; if the cosine distance is less than or equal to a preset distance threshold, generating information that the verification passes; if the cosine distance is greater than the preset distance threshold, generating information that the verification fails.
In the present embodiment, the identification information of the target user can be carried when the standard voiceprint discriminant vector of the target user is stored. When verifying the user's identity, the corresponding standard voiceprint discriminant vector is obtained by matching against the identification information associated with the current voiceprint discriminant vector, and the cosine distance between the current voiceprint discriminant vector and the matched standard voiceprint discriminant vector is calculated; verifying the identity of the target user with the cosine distance improves the accuracy of identity verification.
As shown in Fig. 2, Fig. 2 is a flow diagram of an embodiment of the identity verification method of the present invention. The identity verification method includes the following steps:
Step S1: after the voice data of a target user awaiting identity verification is received, extracting the preset-type voiceprint feature of the voice data using a preset filter, and building the voiceprint feature vector corresponding to the voice data based on the preset-type voiceprint feature.
In the present embodiment, the voice data is collected by a voice capture device (for example, a microphone). When acquiring voice data, interference from ambient noise and from the voice capture device itself should be prevented as far as possible: the voice capture device is kept at a suitable distance from the target user, a capture device with large distortion is avoided where possible, mains power is preferably used and the supply current kept stable, and a sensor should be used when making telephone recordings. Before framing and sampling, the voice data may be subjected to noise processing to further reduce interference. So that the voiceprint feature of the voice data can be extracted, the acquired voice data is voice data of at least a preset data length.
Voiceprint features come in several types, such as broadband voiceprints, narrowband voiceprints and amplitude voiceprints. In the present embodiment, the preset-type voiceprint feature is preferably the Mel Frequency Cepstrum Coefficient (MFCC) of the voice sample data, and the preset filter is a Mel filter. When building the corresponding voiceprint feature vector, the voiceprint features of the voice sample data form a feature data matrix, and this feature data matrix is the voiceprint feature vector of the voice sample data.
Step S2: inputting the voiceprint feature vector into the first model trained in advance, determining the triphone feature corresponding to each frame of voice of the voice data, and constructing the triphone feature vector corresponding to all triphone features of the voice data.
In the present embodiment, a triphone state represents the phoneme of a speech frame itself plus its relationship with the preceding and following phonemes. The first model is preferably a Long Short-Term Memory (LSTM) network model; the LSTM model processes the voiceprint feature vector, obtains the triphone feature corresponding to each frame of voice and constructs the corresponding triphone feature vector.
A triphone feature is the probability of the triphone state represented by the corresponding speech frame. All triphone features of the voice data correspond to one triphone feature matrix; this triphone feature matrix corresponds to the input voiceprint feature vector (i.e. the voiceprint feature matrix) and constitutes the triphone feature vector corresponding to all triphone features of the voice data (i.e. the triphone feature matrix).
In a preferred embodiment, the LSTM model includes one input layer, three LSTM layers and one classification layer, as shown in Table 1 above; details are not repeated here.
In the table, Layer Name denotes the name of each layer and Batch Size denotes the number of input voice segments of the current layer; Input denotes the input layer; HLSTM (Highway Long Short-Term Memory) denotes a highway LSTM, a recurrent neural network based on memory-cell connections that introduces direct connections between adjacent memory cells, so that information inside the LSTM model can be passed directly between different layers, improving the efficiency of speech recognition and its recognition performance; Softmax denotes the Softmax classifier. After the voiceprint feature vector has passed in turn through each of the above layers of the LSTM model, the triphone feature corresponding to each frame of voice is obtained and the corresponding triphone feature vector is constructed.
In a preferred embodiment, the training process of the first model, an LSTM model, includes the following steps:
obtaining a preset number of voice data samples, each voice data sample corresponding to one triphone feature vector; extracting the preset-type voiceprint feature corresponding to each voice data sample, and building each sample's voiceprint feature vector based on that feature; dividing all constructed voiceprint feature vectors into a training set of a first percentage (for example, 75%) and a verification set of a second percentage (for example, 20%), the sum of the first percentage and the second percentage being less than or equal to 1; training the LSTM model with the voiceprint feature vectors in the training set, and after training verifying the accuracy rate of the trained LSTM model on the verification set (i.e. the accuracy of the triphone feature vectors output by the LSTM model relative to the triphone feature vectors corresponding to the voice data samples). If the accuracy rate exceeds a preset accuracy rate (for example, 0.985), training ends and the trained LSTM model is used as the first model; otherwise, if the accuracy rate is less than or equal to the preset accuracy rate, the number of voice data samples is increased and the model is retrained on the enlarged sample set until the accuracy rate exceeds the preset accuracy rate.
Step S3: inputting the triphone feature vector into the second model trained in advance to construct the current voiceprint discriminant vector of the target user.
In the present embodiment, the second model is preferably a Gaussian mixture model, which operates on the triphone feature vector (i.e. the triphone feature matrix) to obtain the corresponding current voiceprint discriminant vector (i.e. the i-vector) of the voice data.
Specifically, the construction process includes:
1) selecting Gaussian models: first, the likelihood logarithm of each frame of data under the different Gaussian models is calculated using the parameters of the second model; each column of the likelihood logarithm matrix is sorted in parallel and the top N Gaussian models are chosen, finally yielding a matrix of the N highest-scoring components of the Gaussian mixture model for each frame of data:
Loglike = E(X) * D(X)^{-1} * X^T − 0.5 * D(X)^{-1} * (X.^2)^T,
where Loglike is the likelihood logarithm matrix, E(X) is the mean matrix produced by training the second model, D(X) is the covariance matrix, X is the data matrix, and X.^2 squares each element of the matrix.
The per-frame likelihood logarithm is calculated as loglikes_i = C_i + E_i * Cov_i^{-1} * X_i − 0.5 * X_i^T * Cov_i^{-1} * X_i, where loglikes_i is the i-th row vector of the likelihood logarithm matrix, C_i is the constant term of the i-th model, E_i is the mean matrix of the i-th model, Cov_i is the covariance matrix of the i-th model, and X_i is the i-th frame of data.
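Step 1) can be sketched in numpy for the diagonal-covariance case. This is a hedged illustration under assumed shapes, not the patent's implementation; `consts` stands in for the per-Gaussian constant terms C_i.

```python
import numpy as np

def loglike_matrix(feats, means, inv_covs, consts):
    """Per-frame log-likelihood of each diagonal-covariance Gaussian.

    feats: (T, D) frames; means: (M, D) Gaussian means; inv_covs: (M, D)
    inverse diagonal covariances; consts: (M,) constant terms C_i.
    Implements loglikes[t, i] = C_i + E_i Cov_i^{-1} x_t - 0.5 x_t^T Cov_i^{-1} x_t.
    """
    quad = -0.5 * (feats ** 2) @ inv_covs.T      # -0.5 * sum_d x_d^2 / sigma_{i,d}^2
    lin = feats @ (means * inv_covs).T           # sum_d mu_{i,d} x_d / sigma_{i,d}^2
    return consts[None, :] + lin + quad          # (T, M) likelihood logarithm matrix

def top_n_gaussians(loglikes, n):
    """Indices of the N best-scoring Gaussians for every frame."""
    return np.argsort(loglikes, axis=1)[:, ::-1][:, :n]

rng = np.random.default_rng(0)
T, D, M = 5, 4, 8
feats = rng.normal(size=(T, D))
ll = loglike_matrix(feats, rng.normal(size=(M, D)), np.ones((M, D)), np.zeros(M))
sel = top_n_gaussians(ll, 3)    # top-3 Gaussian indices per frame
```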
2) calculating posterior probabilities: for each frame of data X, X * X^T is calculated, giving a symmetric matrix; this is reduced to its lower triangular part, whose elements are arranged in order into a single row, so that each frame becomes one vector whose length equals the number of lower-triangular elements. The vectors of all frames are combined into a new data matrix. At the same time, the covariance matrices used in the probability calculation in the second model are likewise reduced to lower triangular form, giving matrices of the same class as the new data matrix. Using the mean matrices and covariance matrices in the second model, the likelihood logarithm value of each frame of data under its selected Gaussian models is calculated; Softmax regression is then applied and a normalization operation is performed, yielding each frame's posterior probability distribution over the Gaussian mixture model. The per-frame probability distribution vectors are assembled into the probability matrix.
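The Softmax-plus-normalization step that turns per-frame log-likelihoods into the probability matrix can be sketched as follows (a minimal numpy illustration, not the patent's implementation):

```python
import numpy as np

def frame_posteriors(loglikes):
    """Softmax over each frame's log-likelihoods.

    loglikes: (T, M) likelihood logarithm matrix. Returns the (T, M)
    probability matrix; each row is one frame's posterior distribution
    over the mixture components and sums to 1 (the normalization step).
    """
    shifted = loglikes - loglikes.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    return probs / probs.sum(axis=1, keepdims=True)

ll = np.array([[0.0, 1.0, 2.0],
               [3.0, 3.0, 3.0]])
post = frame_posteriors(ll)
```

The frame with equal log-likelihoods gets a uniform posterior, while the other frame concentrates mass on its highest-scoring component.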
3) extracting the current voiceprint discriminant vector: first the first-order and second-order coefficients are calculated. The first-order coefficients can be obtained by summing the columns of the probability matrix: Gamma_i = Σ_j loglikes_ji, where Gamma_i is the i-th element of the first-order coefficient vector and loglikes_ji is the element in row j, column i of the likelihood logarithm matrix.
The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix: X = Loglike^T * feats, where X is the second-order coefficient matrix, Loglike^T is the transpose of the probability matrix described above and feats is the feature data matrix.
After the first-order and second-order coefficients have been calculated, the linear term and the quadratic term are computed in parallel, and the current voiceprint discriminant vector is calculated from them: i-vector = quadratic^{-1} * linear.
Here quadratic = I + Σ_i Gamma_i M_i^T Σ_i^{-1} M_i, where M_i is the mean matrix of the i-th model in the second model, Σ_i is the covariance matrix of the i-th model and Gamma_i is the i-th element of the first-order coefficient vector; and linear = Σ_i M_i^T Σ_i^{-1} X_i^T, where X_i is the i-th row vector of the second-order coefficient matrix.
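Step 3) can be sketched end-to-end in numpy. This is a hedged illustration: it assumes diagonal covariances, reads the lost display formulas as the standard i-vector forms quadratic = I + Σ_i Gamma_i M_i^T Σ_i^{-1} M_i and linear = Σ_i M_i^T Σ_i^{-1} X_i, and uses made-up shapes throughout.

```python
import numpy as np

def extract_ivector(post, feats, M_mats, inv_covs):
    """i-vector = quadratic^{-1} * linear, as described in the text.

    post:     (T, G) probability matrix (per-frame posteriors)
    feats:    (T, D) feature data matrix
    M_mats:   (G, D, R) per-Gaussian mean/factor matrices M_i
    inv_covs: (G, D) inverse diagonal covariances Sigma_i^{-1}
    """
    gamma = post.sum(axis=0)                  # first-order coefficients Gamma_i
    X = post.T @ feats                        # second-order coefficient matrix (G, D)
    R = M_mats.shape[2]
    quadratic = np.eye(R)
    linear = np.zeros(R)
    for i, g in enumerate(gamma):
        Si_Mi = inv_covs[i][:, None] * M_mats[i]       # Sigma_i^{-1} M_i
        quadratic += g * (M_mats[i].T @ Si_Mi)         # + Gamma_i M_i^T Sigma_i^{-1} M_i
        linear += M_mats[i].T @ (inv_covs[i] * X[i])   # + M_i^T Sigma_i^{-1} X_i
    return np.linalg.solve(quadratic, linear)          # quadratic^{-1} * linear

rng = np.random.default_rng(1)
T, D, G, R = 20, 4, 3, 2
post = rng.random((T, G))
post /= post.sum(axis=1, keepdims=True)                # rows sum to 1
feats = rng.normal(size=(T, D))
ivec = extract_ivector(post, feats, rng.normal(size=(G, D, R)), np.ones((G, D)))
```

`np.linalg.solve` is used instead of an explicit matrix inverse; the quadratic term is positive definite here, so the solve is well conditioned.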
Preferably, the process of training the Gaussian mixture model includes:
obtaining a preset number (for example, 100,000) of voice data samples, each voice data sample corresponding to one voiceprint discriminant vector; extracting the preset-type voiceprint feature corresponding to each voice data sample, and building each sample's voiceprint feature vector based on that feature; inputting each constructed voiceprint feature vector into the first model trained in advance, determining the triphone feature corresponding to each frame of voice of each sample, and constructing the triphone feature vector corresponding to all triphone features of each sample; dividing all constructed triphone feature vectors into a training set of a first percentage (for example, 75%) and a verification set of a second percentage (for example, 25%), the sum of the first percentage and the second percentage being less than or equal to 1; training the Gaussian mixture model with the triphone feature vectors in the training set, and after training verifying the accuracy rate of the trained Gaussian mixture model on the verification set (i.e. the accuracy of the current voiceprint discriminant vectors output by the Gaussian mixture model relative to the voiceprint discriminant vectors corresponding to the voice data samples). If the accuracy rate exceeds a preset accuracy rate (for example, 0.98), training ends and the trained Gaussian mixture model is used as the second model; otherwise, if the accuracy rate is less than or equal to the preset accuracy rate, the number of voice data samples is increased and the model is retrained on the enlarged sample set until the accuracy rate exceeds the preset accuracy rate.
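The split-train-verify-retrain loop described above can be sketched generically. This is a hedged illustration: `train_fn`, `accuracy_fn` and `get_more` are caller-supplied stand-ins (the patent's actual accuracy check compares output i-vectors against reference voiceprint discriminant vectors, which is not reproduced here).

```python
import numpy as np

def split_train_verify(samples, train_fn, accuracy_fn,
                       first_pct=0.75, target=0.98,
                       get_more=None, max_rounds=5):
    """Split samples into training/verification sets, train, check the
    accuracy rate, and retrain on an enlarged sample set until the
    preset accuracy rate is exceeded (as described in the text)."""
    model = None
    for _ in range(max_rounds):
        n_train = int(len(samples) * first_pct)
        train, val = samples[:n_train], samples[n_train:]
        model = train_fn(train)
        if accuracy_fn(model, val) > target:
            return model                  # accuracy exceeds preset rate: done
        if get_more is None:
            break
        samples = get_more(samples)       # increase the number of samples
    return model

# Dummy stand-ins to exercise the loop: "accuracy" passes once the
# training set reaches 75 samples, forcing one enlargement round.
samples = np.zeros((50, 3))
train_fn = lambda t: ("gmm", len(t))
accuracy_fn = lambda m, v: 0.99 if m[1] >= 75 else 0.5
get_more = lambda s: np.concatenate([s, s])
model = split_train_verify(samples, train_fn, accuracy_fn, get_more=get_more)
```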
Step S4: calculating the spatial distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the target user, performing identity verification on the user based on the spatial distance, and generating a verification result.
In the present embodiment, there are many possible distances between vectors, including the cosine distance and the Euclidean distance; preferably, the spatial distance of the present embodiment is the cosine distance, which uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals.
The standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance; it is stored together with the identification information of its corresponding user and can accurately represent that user's identity. Before the spatial distance is calculated, the stored standard voiceprint discriminant vector is retrieved according to the identification information provided by the user.
When the calculated spatial distance is less than or equal to a preset distance threshold, the verification passes; otherwise, the verification fails.
In addition, the present embodiment can also be applied to anti-fraud identification, using the calculated spatial distance to identify whether the user is a blacklisted user, thereby improving safety.
Compared with the prior art, when performing identity verification or anti-fraud identification the present embodiment first extracts the voiceprint feature of the voice data and builds the corresponding voiceprint feature vector; inputs the voiceprint feature vector into the first model trained in advance to determine the triphone features of the voice data and build the corresponding triphone feature vector; inputs the triphone feature vector into the second model trained in advance to obtain the current voiceprint discriminant vector of the target user; and calculates the spatial distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the target user, so as to verify the user's identity using that spatial distance. Since the present embodiment considers, in addition to each phoneme itself, the triphones that represent the phoneme context of the voice data when performing identity verification through speech recognition, the accuracy rate of identity verification can be improved, improving financial security.
In a preferred embodiment, on the basis of the embodiment of Fig. 2 above, the step S4 specifically includes:
calculating the cosine distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the target user: cos θ = (A · B) / (|A| |B|), where A is the standard voiceprint discriminant vector and B is the current voiceprint discriminant vector; if the cosine distance is less than or equal to a preset distance threshold, generating information that the verification passes; if the cosine distance is greater than the preset distance threshold, generating information that the verification fails.
In the present embodiment, the identification information of the target user can be carried when the standard voiceprint discriminant vector of the target user is stored. When verifying the user's identity, the corresponding standard voiceprint discriminant vector is obtained by matching against the identification information associated with the current voiceprint discriminant vector, and the cosine distance between the current voiceprint discriminant vector and the matched standard voiceprint discriminant vector is calculated; verifying the identity of the target user with the cosine distance improves the accuracy of identity verification.
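The threshold check in step S4 can be sketched as follows. This is a hedged illustration: it computes the cosine similarity from the formula above and treats 1 − cos θ as the "cosine distance" being compared to the threshold (the patent does not spell this convention out), and the threshold value 0.3 is an assumption.

```python
import numpy as np

def verify(current_vec, standard_vec, threshold=0.3):
    """Pass verification when the cosine distance between the current
    and standard voiceprint discriminant vectors is at or below the
    preset distance threshold (threshold value is illustrative)."""
    cos_sim = np.dot(current_vec, standard_vec) / (
        np.linalg.norm(current_vec) * np.linalg.norm(standard_vec))
    cos_dist = 1.0 - cos_sim          # assumed distance convention
    return bool(cos_dist <= threshold)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
same = verify(a, a)       # identical vectors: distance 0, passes
diff = verify(a, b)       # orthogonal vectors: distance 1, fails
```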
In a preferred embodiment, on the basis of the embodiment of Fig. 2 above, the step S1 specifically includes: performing pre-emphasis, framing and windowing on the voice data; performing a Fourier transform on each windowed frame to obtain the corresponding spectrum; inputting the spectrum into a Mel filter, which outputs the Mel spectrum; performing cepstral analysis on the Mel spectrum to obtain the Mel frequency cepstrum coefficients (MFCC); and forming the corresponding voiceprint feature vector based on the MFCC.
In the present embodiment, after the voice data of the user undergoing identity verification is received, the voice data is processed as follows. Pre-emphasis is in fact a high-pass filtering operation that removes low-frequency content so that the high-frequency characteristics of the voice data stand out; specifically, the transfer function of the high-pass filter is H(z) = 1 − αz^{-1}, where z is the voice data and α is a constant factor, preferably with a value of 0.97. Since a speech signal is only stationary over short periods, a segment of speech is divided into N short-time signals (i.e. N frames), and, to avoid losing the continuity characteristics of the sound, adjacent frames overlap in a repeated region that is generally 1/2 of the frame length. After framing, each frame is treated as a stationary signal; however, because of the Gibbs effect, the start and end of each frame are discontinuous, which after framing deviates further from the original speech, so windowing must be applied to the voice data.
Cepstral analysis consists, for example, of taking the logarithm and applying an inverse transform, the inverse transform generally being realized by a DCT (discrete cosine transform); the 2nd to 13th DCT coefficients are taken as the MFCC coefficients. The MFCC is the voiceprint feature of that frame of voice data; the MFCCs of every frame form a feature data matrix, and this feature data matrix is the voiceprint feature vector of the voice data.
The present embodiment forms the corresponding voiceprint feature vector from the Mel frequency cepstrum coefficients of the voice sample data. Because the Mel scale approximates the human auditory system more closely than the linearly spaced frequency bands used in the normal cepstrum, this can improve the accuracy of identity verification.
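The pipeline of step S1 (pre-emphasis, framing with 50% overlap, windowing, FFT, Mel filterbank, log, DCT, coefficients 2–13) can be sketched in numpy/scipy. This is a minimal illustration under assumed parameters (16 kHz sampling, 400-sample frames, 26 filters), not the patent's implementation.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, alpha=0.97, frame_len=400, hop=200,
         n_filters=26, n_coeffs=12):
    """Per-frame MFCCs: pre-emphasis H(z)=1-alpha*z^-1, half-overlapping
    Hamming-windowed frames, power spectrum, Mel filterbank, log, DCT,
    keeping the 2nd..13th coefficients."""
    emph = np.append(signal[0], signal[1:] - alpha * signal[:-1])  # pre-emphasis
    n_frames = 1 + (len(emph) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emph[idx] * np.hamming(frame_len)                     # windowing
    nfft = 512
    spec = np.abs(np.fft.rfft(frames, nfft)) ** 2                  # power spectrum
    # Triangular Mel filterbank between 0 and sr/2
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    mel_spec = np.log(spec @ fbank.T + 1e-10)                      # log Mel spectrum
    # cepstral analysis via DCT; keep coefficients 2..13
    return dct(mel_spec, type=2, axis=1, norm='ortho')[:, 1:1 + n_coeffs]

rng = np.random.default_rng(0)
feats = mfcc(rng.normal(size=16000))    # 1 s of noise -> (frames, 12) matrix
```

Stacking the per-frame rows of `feats` gives the feature data matrix that the text calls the voiceprint feature vector.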
The present invention also provides a computer-readable storage medium on which a processing system is stored; when the processing system is executed by a processor, the steps of the identity verification method described above are realized.
The serial numbers of the embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the embodiments can be realized by software plus the necessary general-purpose hardware platform, and naturally also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical scheme of the present invention, in essence the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk or optical disc) that includes instructions causing a terminal device (which may be a mobile phone, computer, server, air conditioner, network device or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the invention; every equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. An electronic device, characterized in that the electronic device comprises a memory and a processor connected to the memory, the memory storing a processing system operable on the processor, the processing system realizing the following steps when executed by the processor:
an extraction step: after the voice data of a target user awaiting identity verification is received, extracting the preset-type voiceprint feature of the voice data using a preset filter, and building the voiceprint feature vector corresponding to the voice data based on the preset-type voiceprint feature;
a first construction step: inputting the voiceprint feature vector into a first model trained in advance, determining the triphone feature corresponding to each frame of voice of the voice data, and constructing the triphone feature vector corresponding to all triphone features of the voice data;
a second construction step: inputting the triphone feature vector into a second model trained in advance to construct the current voiceprint discriminant vector of the target user;
a verification step: calculating the spatial distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the target user, performing identity verification on the user based on the spatial distance, and generating a verification result.
2. The electronic device according to claim 1, characterized in that the second model is a Gaussian mixture model and the training process of the second model comprises the following steps:
obtaining a preset number of voice data samples, each voice data sample corresponding to one voiceprint discriminant vector;
extracting the preset-type voiceprint feature corresponding to each voice data sample, and building the voiceprint feature vector corresponding to each voice data sample based on that feature;
inputting each constructed voiceprint feature vector into the first model trained in advance, determining the triphone feature corresponding to each frame of voice of each voice data sample, and constructing the triphone feature vector corresponding to all triphone features of each voice data sample;
dividing all constructed triphone feature vectors into a training set of a first percentage and a verification set of a second percentage, the sum of the first percentage and the second percentage being less than or equal to 1;
training the Gaussian mixture model with the triphone feature vectors in the training set, and after training verifying the accuracy rate of the trained Gaussian mixture model on the verification set;
if the accuracy rate is greater than a preset accuracy rate, ending training and using the trained Gaussian mixture model as the second model; or, if the accuracy rate is less than or equal to the preset accuracy rate, increasing the number of voice data samples and retraining based on the increased voice data samples.
3. The electronic device according to claim 1 or 2, characterized in that the verification step specifically comprises:
calculating the cosine distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the target user: cos θ = (A · B) / (|A| |B|), where A is the standard voiceprint discriminant vector and B is the current voiceprint discriminant vector;
if the cosine distance is less than or equal to a preset distance threshold, generating information that the verification passes;
if the cosine distance is greater than the preset distance threshold, generating information that the verification fails.
4. The electronic device according to claim 1 or 2, characterized in that the first model is a long short-term memory network LSTM model and the training process of the first model comprises the following steps:
obtaining a preset number of voice data samples, each voice data sample corresponding to one triphone feature vector;
extracting the preset-type voiceprint feature corresponding to each voice data sample, and building the voiceprint feature vector corresponding to each voice data sample based on that feature;
dividing all constructed voiceprint feature vectors into a training set of a first percentage and a verification set of a second percentage, the sum of the first percentage and the second percentage being less than or equal to 1;
training the long short-term memory network LSTM model with the voiceprint feature vectors in the training set, and after training verifying the accuracy rate of the trained LSTM model on the verification set;
if the accuracy rate is greater than a preset accuracy rate, ending training and using the trained LSTM model as the first model; or, if the accuracy rate is less than or equal to the preset accuracy rate, increasing the number of voice data samples and retraining based on the increased voice data samples.
5. A method of identity verification, characterized in that the method comprises:
S1, after the voice data of a target user awaiting identity verification is received, extracting the preset-type voiceprint feature of the voice data using a preset filter, and building the voiceprint feature vector corresponding to the voice data based on the preset-type voiceprint feature;
S2, inputting the voiceprint feature vector into a first model trained in advance, determining the triphone feature corresponding to each frame of voice of the voice data, and constructing the triphone feature vector corresponding to all triphone features of the voice data;
S3, inputting the triphone feature vector into a second model trained in advance to construct the current voiceprint discriminant vector of the target user;
S4, calculating the spatial distance between the current voiceprint discriminant vector and the pre-stored standard voiceprint discriminant vector of the target user, performing identity verification on the user based on the spatial distance, and generating a verification result.
6. The identity authentication method according to claim 5, characterized in that the second model is a Gaussian mixture model, and the training process of the second model comprises the following steps:
obtaining a preset number of voice data samples, each voice data sample corresponding to one voiceprint identification vector;
extracting the preset-type voiceprint features of each voice data sample, and constructing each sample's corresponding voiceprint feature vector from those features;
inputting each constructed voiceprint feature vector into the pre-trained first model to determine the triphone feature corresponding to each voice frame of each sample, and constructing each sample's triphone feature vector from all of its triphone features;
dividing all constructed triphone feature vectors into a training set of a first percentage and a validation set of a second percentage, the sum of the first percentage and the second percentage being less than or equal to 1;
training the Gaussian mixture model on the triphone feature vectors in the training set, and verifying the accuracy of the trained Gaussian mixture model on the validation set after training is complete;
if the accuracy exceeds a preset accuracy, training ends and the trained Gaussian mixture model is used as the second model; otherwise, if the accuracy is less than or equal to the preset accuracy, the number of voice data samples is increased and the model is retrained on the enlarged sample set.
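The split–train–validate–retrain loop of this claim can be illustrated with a toy stand-in for the Gaussian mixture model. All names and numbers below are assumptions for illustration: `fit` stores one class mean per speaker rather than fitting a real mixture, and the 70/30 split and 0.9 target accuracy are arbitrary choices, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_samples(n_per_class):
    # hypothetical triphone feature vectors for two enrolled speakers,
    # drawn from two well-separated Gaussians
    x = np.concatenate([rng.normal(0.0, 1.0, (n_per_class, 4)),
                        rng.normal(4.0, 1.0, (n_per_class, 4))])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    idx = rng.permutation(len(x))
    return x[idx], y[idx]

def fit(train_x, train_y):
    # stand-in for Gaussian-mixture fitting: one class mean per speaker
    return {c: train_x[train_y == c].mean(axis=0) for c in np.unique(train_y)}

def accuracy(model, x, y):
    pred = np.array([min(model, key=lambda c: np.linalg.norm(v - model[c]))
                     for v in x])
    return float(np.mean(pred == y))

def train_until_accurate(target=0.9, n_per_class=5):
    while True:
        x, y = make_samples(n_per_class)
        cut = int(0.7 * len(x))          # first percentage: 70% training set
        model = fit(x[:cut], y[:cut])    # remaining 30%: validation set
        if accuracy(model, x[cut:], y[cut:]) > target:
            return model, len(x)
        n_per_class *= 2                 # accuracy too low: add samples, retrain

model, n_used = train_until_accurate()
print(len(model), n_used)
```

The key structural point mirrored from the claim is the stopping rule: training only terminates once validation accuracy exceeds the preset target, otherwise the sample set grows and training restarts.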
7. The identity authentication method according to claim 5 or 6, characterized in that step S4 specifically comprises:
calculating the cosine distance between the current voiceprint identification vector and the target user's stored standard voiceprint identification vector, d = 1 − (A · B) / (‖A‖ ‖B‖), where A is the standard voiceprint identification vector and B is the current voiceprint identification vector;
if the cosine distance is less than or equal to a preset distance threshold, generating verification-passed information;
if the cosine distance is greater than the preset distance threshold, generating verification-failed information.
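A minimal sketch of this distance check, assuming the standard 1 − cos θ definition of cosine distance; the 0.25 threshold and the example vectors are arbitrary illustrations, not values from the patent.

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cos(theta): 0 for identical directions, up to 2 for opposite ones
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def check(current, standard, threshold=0.25):
    # distance <= threshold -> verification passed, else verification failed
    return "passed" if cosine_distance(current, standard) <= threshold else "failed"

standard = [0.8, 0.1, 0.3]
print(check([0.79, 0.12, 0.31], standard))  # near-identical vector -> passed
print(check([-0.8, 0.5, -0.1], standard))   # dissimilar vector -> failed
```

Cosine distance compares only the directions of the two identification vectors, which is why it is a common choice for speaker vectors whose magnitudes vary with recording conditions.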
8. The identity authentication method according to claim 5 or 6, characterized in that the first model is a long short-term memory (LSTM) network model, and the training process of the first model comprises the following steps:
obtaining a preset number of voice data samples, each voice data sample corresponding to one triphone feature vector;
extracting the preset-type voiceprint features of each voice data sample, and constructing each sample's corresponding voiceprint feature vector from those features;
dividing all constructed voiceprint feature vectors into a training set of a first percentage and a validation set of a second percentage, the sum of the first percentage and the second percentage being less than or equal to 1;
training the LSTM network model on the voiceprint feature vectors in the training set, and verifying the accuracy of the trained LSTM network model on the validation set after training is complete;
if the accuracy exceeds a preset accuracy, training ends and the trained LSTM network model is used as the first model; otherwise, if the accuracy is less than or equal to the preset accuracy, the number of voice data samples is increased and the model is retrained on the enlarged sample set.
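For intuition about what the first model computes per frame, here is a single LSTM cell step in plain NumPy with random, untrained weights. The 13-dimensional input (matching one MFCC frame) and the 8-unit hidden state are illustrative choices, not values specified by the patent.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    # one LSTM time step: gate activations come from the current frame x
    # and the previous hidden state h; c is the cell (long-term) state
    z = W @ x + U @ h + b               # all four gate pre-activations, stacked
    n = len(h)
    i = sigmoid(z[:n])                  # input gate
    f = sigmoid(z[n:2 * n])             # forget gate
    o = sigmoid(z[2 * n:3 * n])         # output gate
    g = np.tanh(z[3 * n:])              # candidate cell update
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

d_in, d_hid = 13, 8                     # e.g. 13 cepstral features per frame
W = rng.normal(0.0, 0.1, (4 * d_hid, d_in))
U = rng.normal(0.0, 0.1, (4 * d_hid, d_hid))
b = np.zeros(4 * d_hid)

h = np.zeros(d_hid)
c = np.zeros(d_hid)
frames = rng.normal(0.0, 1.0, (20, d_in))   # 20 frames of toy features
for x in frames:
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)
```

In the claimed system the per-frame hidden state would feed a classification layer over triphone labels; training those weights is what the split/validate/retrain loop above governs.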
9. The identity authentication method according to claim 5 or 6, characterized in that step S1 specifically comprises:
performing pre-emphasis, framing, and windowing on the voice data, applying a Fourier transform to each windowed frame to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter bank to output the Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and forming the corresponding voiceprint feature vector from the MFCCs.
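The S1 pipeline of this claim (pre-emphasis, framing, windowing, FFT, Mel filter bank, cepstral analysis) can be sketched end to end in NumPy. The sample rate, frame length, hop, filter count, and coefficient count below are common textbook defaults, not values specified by the patent.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13,
         frame_len=400, hop=160, pre=0.97):
    # pre-emphasis: boost high frequencies
    sig = np.append(signal[0], signal[1:] - pre * signal[:-1])
    # framing + Hamming window
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # power spectrum of each windowed frame
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # triangular Mel filter bank
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mels) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    mel_spec = np.log(spec @ fbank.T + 1e-10)   # log Mel spectrum
    # cepstral analysis: DCT-II of the log Mel energies -> MFCCs
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return mel_spec @ dct.T                     # shape: (frames, n_ceps)

t = np.linspace(0, 1, 16000, endpoint=False)
feats = mfcc(np.sin(2 * np.pi * 440 * t))       # one second of a 440 Hz tone
print(feats.shape)
```

Each row of the result is one frame's MFCC vector; stacking or pooling these rows yields the voiceprint feature vector the claim feeds into the first model.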
10. A computer-readable storage medium, characterized in that a processing system is stored on the computer-readable storage medium, and the processing system, when executed by a processor, implements the steps of the identity authentication method according to any one of claims 5 to 9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810030621.2A CN108154371A (en) | 2018-01-12 | 2018-01-12 | Electronic device, the method for authentication and storage medium |
PCT/CN2018/089461 WO2019136912A1 (en) | 2018-01-12 | 2018-06-01 | Electronic device, identity authentication method and system, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810030621.2A CN108154371A (en) | 2018-01-12 | 2018-01-12 | Electronic device, the method for authentication and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108154371A (en) | 2018-06-12 |
Family
ID=62461520
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810030621.2A Pending CN108154371A (en) | 2018-01-12 | 2018-01-12 | Electronic device, the method for authentication and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108154371A (en) |
WO (1) | WO2019136912A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109360573A (en) * | 2018-11-13 | 2019-02-19 | 平安科技(深圳)有限公司 | Livestock method for recognizing sound-groove, device, terminal device and computer storage medium |
CN109378002A (en) * | 2018-10-11 | 2019-02-22 | 平安科技(深圳)有限公司 | Method, apparatus, computer equipment and the storage medium of voice print verification |
CN109493873A (en) * | 2018-11-13 | 2019-03-19 | 平安科技(深圳)有限公司 | Livestock method for recognizing sound-groove, device, terminal device and computer storage medium |
CN109800309A (en) * | 2019-01-24 | 2019-05-24 | 华中师范大学 | Classroom Discourse genre classification methods and device |
CN109982137A (en) * | 2019-02-22 | 2019-07-05 | 北京奇艺世纪科技有限公司 | Model generating method, video marker method, apparatus, terminal and storage medium |
CN111341325A (en) * | 2020-02-13 | 2020-06-26 | 平安科技(深圳)有限公司 | Voiceprint recognition method and device, storage medium and electronic device |
CN112243487A (en) * | 2018-06-14 | 2021-01-19 | 北京嘀嘀无限科技发展有限公司 | System and method for on-demand services |
CN113421573A (en) * | 2021-06-18 | 2021-09-21 | 马上消费金融股份有限公司 | Identity recognition model training method, identity recognition method and device |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112992156B (en) * | 2021-02-05 | 2022-01-04 | 浙江浙达能源科技有限公司 | Power distribution network dispatching identity authentication system based on voiceprint authentication |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5937381A (en) * | 1996-04-10 | 1999-08-10 | Itt Defense, Inc. | System for voice verification of telephone transactions |
CN106782564A (en) * | 2016-11-18 | 2017-05-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for processing speech data |
CN107068154A (en) * | 2017-03-13 | 2017-08-18 | 平安科技(深圳)有限公司 | The method and system of authentication based on Application on Voiceprint Recognition |
CN107331384A (en) * | 2017-06-12 | 2017-11-07 | 平安科技(深圳)有限公司 | Audio recognition method, device, computer equipment and storage medium |
CN107527620A (en) * | 2017-07-25 | 2017-12-29 | 平安科技(深圳)有限公司 | Electronic installation, the method for authentication and computer-readable recording medium |
History
- 2018-01-12: CN application CN201810030621.2A filed (publication CN108154371A, status pending)
- 2018-06-01: PCT application PCT/CN2018/089461 filed (publication WO2019136912A1)
Cited By (continued)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113421573B (en) * | 2021-06-18 | 2024-03-19 | 马上消费金融股份有限公司 | Identity recognition model training method, identity recognition method and device |
Also Published As
Publication number | Publication date |
---|---|
WO2019136912A1 (en) | 2019-07-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108154371A (en) | Electronic device, the method for authentication and storage medium | |
CN107993071A (en) | Electronic device, auth method and storage medium based on vocal print | |
TWI641965B (en) | Method and system of authentication based on voiceprint recognition | |
CN107610707A (en) | A kind of method for recognizing sound-groove and device | |
US11068571B2 (en) | Electronic device, method and system of identity verification and computer readable storage medium | |
CN102737633B (en) | Method and device for recognizing speaker based on tensor subspace analysis | |
CN108806695A (en) | Anti- fraud method, apparatus, computer equipment and the storage medium of self refresh | |
CN108281158A (en) | Voice biopsy method, server and storage medium based on deep learning | |
CN109378002A (en) | Method, apparatus, computer equipment and the storage medium of voice print verification | |
CN109584884A (en) | A kind of speech identity feature extractor, classifier training method and relevant device | |
CN110473552A (en) | Speech recognition authentication method and system | |
CN109378014A (en) | A kind of mobile device source discrimination and system based on convolutional neural networks | |
CN105096955A (en) | Speaker rapid identification method and system based on growing and clustering algorithm of models | |
DE102020133233A1 (en) | ENVIRONMENTAL CLASSIFIER FOR DETECTING LASER-BASED AUDIO IMPACT ATTACKS | |
CN110556126A (en) | Voice recognition method and device and computer equipment | |
CN108091326A (en) | A kind of method for recognizing sound-groove and system based on linear regression | |
CN106991312B (en) | Internet anti-fraud authentication method based on voiceprint recognition | |
CN108650266B (en) | Server, voiceprint verification method and storage medium | |
CN108694952A (en) | Electronic device, the method for authentication and storage medium | |
CN111161713A (en) | Voice gender identification method and device and computing equipment | |
CN113112992B (en) | Voice recognition method and device, storage medium and server | |
CN108630208A (en) | Server, auth method and storage medium based on vocal print | |
CN116665649A (en) | Synthetic voice detection method based on prosody characteristics | |
Zhang et al. | A highly stealthy adaptive decay attack against speaker recognition | |
CN115631748A (en) | Emotion recognition method and device based on voice conversation, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180612 |