CN108447491A - An intelligent voice recognition method - Google Patents
Info
- Publication number: CN108447491A
- Application number: CN201810224944.5A
- Authority: CN (China)
- Prior art keywords: voice, POS machine, user, intelligent, payment platform
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G10L17/08: Speaker identification or verification techniques; decision making techniques, pattern matching strategies; use of distortion metrics or a particular distance between probe pattern and reference templates
- G06Q20/206: Point-of-sale [POS] network systems comprising security or operator identification provisions, e.g. password entry
- G10L25/21: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being power information
- G10L25/24: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
- G10L25/87: Detection of presence or absence of voice signals; detection of discrete points within a voice signal
- H04L9/3249: Cryptographic mechanisms for verifying the identity or authority of a user or for message authentication, involving digital signatures using RSA or related signature schemes, e.g. Rabin scheme
Abstract
The present invention provides an intelligent voice recognition method comprising: Step 1: using short-time power and zero-crossing rate (ZCR) as features to distinguish silence from speech, and performing endpoint detection; Step 2: dividing the speech signal after endpoint detection into multiple frames of equal length; Step 3: obtaining speech signal features from the dynamic change of audio power; Step 4: authenticating the user's identity on the intelligent POS machine based on the comparison of speech signal features. With this method, the identity authentication data of the intelligent POS terminal is stored, compared, and processed locally; no hardware encryption device needs to be configured and nothing is uploaded to the payment platform, which improves security.
Description
Technical field
The present invention relates to speech recognition, and more particularly to an intelligent voice recognition method.
Background
The network security of point-of-sale terminals, and of intelligent POS machines in particular, currently attracts considerable attention, and the security of information transmitted through intelligent POS machines is an increasing concern. Current intelligent POS applications all authenticate users by user name and password, issue digital certificates to intelligent POS users, and rely on the non-exportability of the terminal private key held in hardware encryption to strengthen the user's identity security. However, a hardware encryption device of any form must be provided as a physical device external to the intelligent POS machine, which reduces the usability of the scheme and increases operational complexity for the user. With prior-art fingerprint recognition, the identification information must be transmitted, which puts its security at risk. If the feature library stored on the payment platform is lost, identity authentication cannot be performed.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes an intelligent voice recognition method, comprising:
Step 1: using short-time power and zero-crossing rate (ZCR) as features to distinguish silence from speech, and performing endpoint detection;
Step 2: dividing the speech signal after endpoint detection into multiple frames of equal length;
Step 3: obtaining speech signal features from the dynamic change of audio power;
Step 4: authenticating the user's identity on the intelligent POS machine based on the comparison of speech signal features.
Preferably, the endpoint detection further comprises:
Before detection, thresholds are first set for short-time power and ZCR; short-time power and ZCR are then computed continuously, the thresholds are adjusted, and state analysis is used to judge whether the silent segment has ended.
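The two features named above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function names, the mean-square definition of power, the fixed thresholds, and the rule "silence when both features are low" are assumptions made for the example, and the threshold adjustment and state analysis are not modelled.

```python
def short_time_power(frame):
    """Mean squared amplitude of one frame (assumed definition of power)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def is_silence(frame, power_threshold, zcr_threshold):
    """Hypothetical rule: a frame is silence when both features are low."""
    return (
        short_time_power(frame) < power_threshold
        and zero_crossing_rate(frame) < zcr_threshold
    )


quiet = [0.01, 0.02, 0.01, 0.02, 0.01, 0.02]
loud = [0.5, -0.6, 0.7, -0.5, 0.6, -0.7]
print(is_silence(quiet, power_threshold=0.01, zcr_threshold=0.3))  # True
print(is_silence(loud, power_threshold=0.01, zcr_threshold=0.3))   # False
```

In practice the thresholds would be estimated from the leading frames of a recording and updated as the signal is processed, as the text describes.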
Preferably, in endpoint detection, the frequency band is divided into 4 segments and the power ratio SE of each subband is calculated according to the following formula:
where $U_i$ and $L_i$ denote the upper and lower frequency limits of subband $i$, $i = 1, 2, 3, 4$, and $X(\omega)$ denotes the amplitude of the signal at frequency $\omega$;
If the power and ZCR of a frame are below their thresholds and the SE values of the 4 subbands are approximately equal, the frame is judged to be silence.
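The formula for SE is not reproduced in this text. One form that is consistent with the symbol definitions above, offered as an assumption rather than as the patent's exact expression, is the power in subband $i$ divided by the total power over all four subbands:

```latex
SE_i = \frac{\displaystyle\int_{L_i}^{U_i} |X(\omega)|^2 \, d\omega}
            {\displaystyle\sum_{j=1}^{4} \int_{L_j}^{U_j} |X(\omega)|^2 \, d\omega},
\qquad i = 1, 2, 3, 4
```

Under this reading the four ratios sum to 1, and for a noise-like frame whose power is spread evenly over four equal-width subbands each $SE_i$ is close to $1/4$, which matches the "approximately equal" test for silence.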
Preferably, Step 2 further comprises:
The speech signal is divided into $R$ non-overlapping frames of equal length, denoted $f_k = \{ f_k(n) \mid n = 1, 2, \ldots, L/R \}$, $k = 1, 2, \ldots, R$, where $L$ is the length of the speech signal, $R$ is the total number of frames, and $f_k(n)$ is the $n$-th sample of the $k$-th frame.
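The framing step can be sketched as follows. The function name is hypothetical, and dropping trailing samples when the signal length is not a multiple of the frame count is an assumption; the text does not say how a remainder is handled.

```python
def split_into_frames(signal, num_frames):
    """Split a signal into num_frames equal, non-overlapping frames.

    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = len(signal) // num_frames
    if frame_len == 0:
        raise ValueError("signal is shorter than the number of frames")
    return [
        signal[k * frame_len:(k + 1) * frame_len] for k in range(num_frames)
    ]


frames = split_into_frames(list(range(10)), num_frames=3)
print(frames)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```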
Preferably, Step 3 further comprises:
The dynamic change of audio power is calculated from the power differences between adjacent frames and between their adjacent subbands: the power difference between adjacent subbands is taken first, the difference of these differential powers between adjacent frames is taken next, and a threshold decision is then made.
Preferably, Step 4 further comprises:
During voice authentication, the similarity of speech signals is measured with the Hamming distance. For two audio fragments $\theta_1$ and $\theta_2$, let $h_1$ be the hash index value of speech signal $\theta_1$ and $h_2$ the hash index value of speech signal $\theta_2$. $D$ denotes the normalized Hamming distance between $h_1$ and $h_2$, i.e. the ratio of the number of differing bits in the hash index values to the total number of bits $N$:
$D(h_1, h_2) = \frac{1}{N} \sum_{i=1}^{N} |h_1(i) - h_2(i)|$
If the features of the two audio fragments $\theta_1$ and $\theta_2$ are identical, then $D \le \tau$; if their features differ, then $D > \tau$, where $\tau$ is the recognition authentication threshold. Accordingly, if the distance satisfies $D \le \tau$, the two audio segments $\theta_1$ and $\theta_2$ are considered to have the same features.
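The distance used in this step, the fraction of bit positions at which two hash sequences differ, is the normalized Hamming distance. The sketch below assumes the hash values are equal-length bit lists; the function names and the threshold value 0.3 are hypothetical, and treating "distance not above the threshold" as a match is an assumption about the comparison.

```python
def normalized_hamming_distance(h1, h2):
    """Fraction of positions at which two equal-length bit sequences differ."""
    if len(h1) != len(h2):
        raise ValueError("hash sequences must have the same length")
    mismatches = sum(1 for a, b in zip(h1, h2) if a != b)
    return mismatches / len(h1)


def features_match(h1, h2, threshold):
    """Assumed decision rule: match when the distance is within the threshold."""
    return normalized_hamming_distance(h1, h2) <= threshold


enrolled = [1, 0, 1, 1, 0, 0, 1, 0]
probe = [1, 0, 1, 0, 0, 0, 1, 1]
print(normalized_hamming_distance(enrolled, probe))  # 0.25
print(features_match(enrolled, probe, threshold=0.3))  # True
```

A lower threshold makes the check stricter: fewer impostors pass, but more genuine attempts recorded in noisy conditions are rejected.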
Compared with the prior art, the present invention has the following advantages:
The present invention proposes an intelligent voice recognition method in which the identity authentication data of the intelligent POS terminal is stored, compared, and processed locally; no hardware encryption device needs to be configured and nothing is uploaded to the payment platform, which improves security.
Brief description of the drawings
Fig. 1 is a flowchart of an intelligent voice recognition method according to an embodiment of the present invention.
Detailed description
A detailed description of one or more embodiments of the invention is provided below together with the accompanying drawing that illustrates the principles of the invention. The invention is described in connection with such embodiments, but is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications, and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for the purpose of example, and the invention may be practiced according to the claims without some or all of them.
One aspect of the present invention provides an intelligent voice recognition method. Fig. 1 is a flowchart of an intelligent voice recognition method according to an embodiment of the present invention.
The intelligent POS machine of the present invention is connected to the payment platform through a secure channel. The intelligent POS machine obtains a previously started speech recognition request from the payment platform. Based on the recognition modes that the intelligent POS machine currently supports, it is determined whether the machine supports speech recognition.
If speech recognition is supported, the intelligent POS client verifies the user's identity using the recognition result of the user's speech.
If verification passes, a random number is encrypted with the private key of the RSA key pair generated when authentication was started, yielding a first encrypted value. The intelligent POS client sends the first encrypted value to the payment platform, so that the payment platform can perform identity authentication based on the first encrypted value and the user public key obtained when authentication was started.
During user identity authentication, the intelligent POS machine downloads from the payment platform the authentication requests started for the current machine, discovers through the intelligent POS client the recognition modes that the current machine supports, filters out the authentication methods available on the current machine according to the started authentication requests and the supported authentication modes, and presents them to the user for selection and verification.
After the user is verified, the authentication module of the intelligent POS machine encrypts a random number with the user private key of the RSA key pair that was generated in the secure environment when speech recognition was started, and returns the encrypted value to the payment platform. The payment platform verifies the validity of the encrypted value using the user public key stored when speech recognition was started.
After the encrypted value is obtained, whether authentication succeeds is determined by whether the encrypted value is valid: if it is valid, authentication succeeds; if it is invalid, authentication fails.
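The exchange described above (the POS side transforms a random number with its RSA private key, and the platform checks the result with the stored public key) can be illustrated with textbook RSA on toy numbers. Everything specific here is an assumption for illustration: the tiny primes, the absence of padding, the function names, and the premise that the platform knows the random number being checked. A real deployment would use a vetted cryptographic library with 2048-bit or larger keys and a padded signature scheme.

```python
import secrets

# Toy parameters for illustration only; not secure.
P, Q = 61, 53
N = P * Q                               # modulus (3233)
E = 17                                  # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (Python 3.8+)


def pos_encrypt_with_private_key(nonce):
    """POS side: transform the random number with the private key."""
    return pow(nonce, D, N)


def platform_verify(nonce, encrypted_value):
    """Platform side: undo the transform with the stored public key."""
    return pow(encrypted_value, E, N) == nonce


nonce = secrets.randbelow(N - 2) + 2    # random number in [2, N - 1]
value = pos_encrypt_with_private_key(nonce)
print(platform_verify(nonce, value))            # True
print(platform_verify(nonce, (value + 1) % N))  # False
```

Because the private key never leaves the POS machine, a valid encrypted value shows the platform that the request came from the device that registered the matching public key.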
Before authentication is started, the intelligent POS machine must negotiate the recognition mode with the payment platform. The authentication start-up process comprises:
The intelligent POS machine obtains the negotiated recognition mode from the payment platform, enumerates the recognition modes it currently supports, and determines whether it supports speech recognition;
If so, the intelligent POS client verifies the user's identity using speech recognition. If verification passes, the authentication module generates an RSA key pair in the secure environment and encrypts the user public key of that key pair with the authentication module private key in the intelligent POS machine, generating a second encrypted value;
Then the authentication module uploads the second encrypted value and the user public key encrypted with the authentication module private key to the payment platform through the intelligent POS client, so that the payment platform can use the authentication module public key to verify whether the second encrypted value is valid.
In this process, the intelligent POS client discovers the recognition modes supported by the current intelligent POS machine, filters out the available authentication methods accordingly, and presents them to the user. After the user is verified, the authentication module of the intelligent POS machine generates an RSA key pair, and the public key and the started authentication request are returned to the authentication management platform for storage.
After speech recognition is started, an RSA key pair is generated in the trusted storage block of the intelligent POS machine, the user public key of the pair is exported, and the user public key is transmitted to the payment platform over an encrypted transmission protocol. The next time the intelligent POS machine is used, after the authentication module completes identity verification, the private key of the RSA key pair stored in the trusted storage block is called directly to encrypt the digest, and the encrypted value is transmitted to the payment platform for verification.
The speech recognition request sent by the intelligent POS client is received through the interface of the trusted storage block; a corresponding recognition process is created according to the received request; and, by executing that process, the authentication module and the voice acquisition module are managed so that they jointly complete recognition.
Specifically, first, when the payment platform receives the speech recognition request sent by the intelligent POS client through the interface of the trusted storage block, the payment platform creates a recognition process according to the request and, by executing that process, sends a call instruction to the authentication module.
Second, after receiving the call instruction sent by the payment platform, the authentication module determines from the call instruction that it should return to the payment platform an acquisition instruction for calling the voice acquisition module, so that the payment platform forwards the acquisition instruction to the voice acquisition module.
Next, according to the acquisition instruction forwarded by the payment platform, the voice acquisition module calls the voice input device of the intelligent POS machine through the interface of the trusted storage block to capture the speech segment, and returns the captured segment to the authentication module via the payment platform.
The authentication module receives the speech segment captured by the voice acquisition module and forwarded by the payment platform. If the call instruction sent by the payment platform carries the identity information to be recognized, the authentication module can create an association between the speech segment and that identity information, and return the speech segment and the identity information to the payment platform as the voice information to be recognized. Alternatively, the authentication module extracts the user voice feature template corresponding to the speech segment according to a preset algorithm, creates an association between that template and the identity information to be recognized, and returns the template and the identity information to the payment platform as the voice information to be recognized.
When the call instruction sent by the payment platform does not carry the identity information to be recognized, the authentication module can return the speech segment directly to the payment platform, or return the extracted user voice feature template to the payment platform. The payment platform thus receives either the voice information to be recognized, or the speech segment or user voice feature template alone. When the payment platform receives the voice information to be recognized, it encrypts that information according to the security rules agreed in advance with the payment platform and returns it to the intelligent POS client through the interface of the trusted storage block. When the payment platform receives the speech segment or the user voice feature template, it can determine the corresponding identity information to be recognized according to the calling service, thereby determine the voice information to be recognized, encrypt it, and return it to the intelligent POS client through the interface of the trusted storage block.
In a preferred embodiment of the invention, verifying the user's identity by the intelligent POS client using the recognition result of the user's speech further includes: verifying the input speech; after verification passes, generating a public/private key pair for the user ID logged in to the bank card reading program, with the private key stored securely in the trusted storage block of the intelligent POS machine; and using the terminal private key built into the trusted storage block to encrypt the user ID, the feature sequence of that user ID's speech, and the public key of the logged-in user ID;
wherein the terminal private key is preset in the secure storage area of the device when the intelligent POS machine leaves the factory, and the public/private key pair of each POS machine is unique;
Encrypting the speech of the logged-in user ID means encrypting the feature sequence of the speech. The feature sequence of the voice information is generated and stored in the trusted storage block of the intelligent POS machine; the feature sequence generation rule can follow any suitable audio database search rule, and the speech segment corresponding to a feature sequence is unique.
The public key, the user ID, and the voice feature sequence, encrypted with the terminal private key, are sent to the payment platform as an authentication request, so that the payment platform verifies the public key after receiving the request and saves the public key, the user ID, and the voice feature sequence.
The terminal private key is preset in the secure storage area of the device when the trusted storage block of the intelligent POS machine is manufactured. The terminal public key on the payment platform can be sent in advance by the intelligent POS terminal to the payment platform for storage, or stored directly on the payment platform; each terminal public key and terminal private key pair is identified by a unique device identifier;
After the payment platform receives the authentication request, since the information in the request is encrypted with the terminal private key of the intelligent POS terminal, the payment platform retrieves the terminal public key corresponding to that terminal private key through the encrypted information and completes verification. After verification passes, the public key, user ID, and voice feature sequence in the authentication request are stored, and the payment platform feeds the recognition result back to the trusted storage block of the intelligent POS machine.
After registration is complete, when the user logs in to the bank card reading program again with the registered user ID, speech is input for verification. The user ID and the feature sequence of the speech are encrypted with the user ID private key stored in the trusted storage block of the intelligent POS machine.
An authentication request containing the user ID and the voice feature sequence is sent to the payment platform, so that the payment platform verifies it after receipt and checks whether the voice feature sequence in the authentication request is consistent with the voice feature sequence associated with the user ID at registration, thereby obtaining the authentication result.
If authentication fails, the trusted storage block of the intelligent POS machine initiates a re-authentication request, and the payment platform can add the voice feature sequence that did not pass authentication to the authentication record, certifying that this voice feature sequence, although inconsistent with the one registered, has permission to use the bank card reading program service.
For a voice feature sequence that is inconsistent in this way, if the initiated re-authentication request provides an execution verification code that authorizes the bank card reading program service, the voice feature sequence in the authentication request is stored in the authentication record and authentication is completed.
Before speech is matched and recognized, pre-emphasis, filtering, windowing and framing, and endpoint detection are performed. Short-time power and ZCR are used as features to distinguish silence from speech. Before detection, thresholds are first set for short-time power and ZCR; short-time power and ZCR are then computed continuously, the thresholds are adjusted, and state analysis is performed to judge whether the silent segment has ended.
In endpoint detection, the frequency band is divided into 4 segments and the power ratio SE of each subband is calculated according to the following formula:
where $U_i$ and $L_i$ denote the upper and lower frequency limits of subband $i$, $i = 1, 2, 3, 4$, and $X(\omega)$ denotes the amplitude of the signal at frequency $\omega$.
If the power and ZCR of a frame are below their thresholds and the SE values of the 4 subbands are approximately equal, the frame is judged to be silence.
Preferably, endpoint detection of the speech signal is implemented with a neural network combined with a particle swarm algorithm:
1: Let the hidden layer of the one-dimensional neural network contain K × L parameters θ and K parameters λ, and the output layer contain K × N parameters θ and N parameters λ, where K is the number of hidden nodes, L is the number of input nodes, N is the number of output-layer nodes, and θ and λ are the phase rotation coefficient and the phase control factor, respectively. Initialize the particle swarm and the relevant parameters of the one-dimensional neural network;
2: Randomly select a signal segment containing both speech and noise. Use short-time power, the circular average magnitude difference function, and frequency-band variance as the inputs of the one-dimensional neural network, and use the labelled start and end of each frame as its output, completing construction of the training samples;
3: Feed the training samples into the one-dimensional neural network for training, optimizing the network with the particle swarm so that the network output and the ideal output meet the pre-designed requirement, thereby completing training. The network parameters are optimized as follows:
1) Initialize the weight parameters to be optimized and learned. Arrange the position and velocity vectors of the optimization particles as matrices, in which rows represent the parameters to be learned and columns represent the moving particles;
2) Compute the output $|Y\rangle_n$ of the whole one-dimensional neural network and define the fitness function as follows:
where $|O\rangle_n$ denotes the target output of the $n$-th output neuron and $|Y\rangle_n$ the actual output of the $n$-th output neuron;
3) Update the current velocity and position of each particle using the particle swarm velocity and position formulas. The velocity update of particle $i$ is simplified as:
$v_i^{t+1} = v_i^t + c_1 r_2 - c_2 x_i^t$
The position update of particle $i$ is simplified as:
$x_i^{t+1} = x_i^t + v_i^{t+1}$
$r_1$ and $r_2$ are independent random numbers in [0, 1]; $c_1$ and $c_2$ are acceleration limiting factors, where $c_1$ adjusts the step by which a particle advances toward its own best position and $c_2$ adjusts the step by which it advances toward the global best position.
4) Compute and evaluate the fitness of each particle, and update the individual extrema and the global extremum accordingly;
5) When the termination condition is met, the optimal values of the hidden-layer and output-layer parameters θ and λ of the one-dimensional neural network have been obtained; these parameters are stored and optimization ends. Otherwise, return to 3) and continue searching;
After training, the original training samples are evaluated with the trained one-dimensional neural network and detection results are output: if the output exceeds the threshold, the current frame is considered a speech frame, otherwise a non-speech frame. The actual output is then compared with the labelled speech frames of the signal; if the training proves ineffective, the network must be retrained;
Perform speech endpoint detection: take a segment of speech signal, extract its feature quantities, and detect it with the trained one-dimensional neural network, finally outputting the endpoint detection result.
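The particle swarm loop in steps 1) to 5) can be sketched as below. This uses the common textbook update, in which each particle is pulled toward its own best position (weighted by c1 and r1) and toward the global best position (weighted by c2 and r2); that form, the inertia weight `w`, the default parameter values, and the stand-in fitness function are all assumptions, since the text gives only a simplified update and does not reproduce its fitness function. In the described method the fitness would measure the mismatch between the network's actual and target outputs.

```python
import random


def pso_minimize(fitness, dim, num_particles=20, iterations=100,
                 c1=1.5, c2=1.5, w=0.7, seed=0):
    """Minimize fitness over a dim-dimensional parameter vector."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)]
           for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    best_pos = [p[:] for p in pos]              # per-particle best position
    best_val = [fitness(p) for p in pos]        # per-particle best fitness
    g = min(range(num_particles), key=lambda i: best_val[i])
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(iterations):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (global_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]          # position update
            val = fitness(pos[i])
            if val < best_val[i]:               # update individual extremum
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < global_val:            # update global extremum
                    global_val, global_pos = val, pos[i][:]
    return global_pos, global_val


def sphere(params):
    """Stand-in fitness: squared distance from the origin."""
    return sum(p * p for p in params)


best_params, best_fitness = pso_minimize(sphere, dim=2)
print(best_params, best_fitness)
```

The termination condition here is a fixed iteration count; stopping once the fitness falls below a target value would be an equally reasonable reading of step 5).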
After completing end-point detection, voice signal is divided into R isometric non-overlapping frames, is denoted as fk={ fk(n) | n=1,
2 ..., L/R;K=1,2 ..., R }, wherein:L is voice signal length;R is totalframes;fk(n) it is n-th of sampling of kth frame
Value.
A short-time Fourier transform is applied to each pre-processed frame, and the subbands are divided according to the following formula:
$B_i = \exp[\lg F_{min} + i(\lg F_{max} - \lg F_{min})/M]$
where i is the subband index, taking the values 1, 2, 3, ..., M; M is the number of subbands; $F_{min}$ and $F_{max}$ are the lower and upper limits of the auditory bandwidth; and the frequency range of subband i is $[B_{i-1}, B_i]$. The subband power is then calculated on each subband, yielding M subband powers.
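A small sketch of the subband boundary formula follows. It assumes that "lg" denotes the natural logarithm, so that `exp` inverts it, and that the boundary for i = 0 equals the lower limit of the auditory bandwidth; the function name `subband_edges` is hypothetical.

```python
import math

def subband_edges(f_min, f_max, m):
    """Return the m + 1 logarithmically spaced boundaries B_0 .. B_m.

    Assumption: 'lg' in the formula is the natural logarithm.
    """
    log_min, log_max = math.log(f_min), math.log(f_max)
    return [math.exp(log_min + i * (log_max - log_min) / m) for i in range(m + 1)]
```

With these assumptions, a band from 100 Hz to 1600 Hz split into 4 subbands doubles at each boundary, so subband i covers the interval between edge i-1 and edge i.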
The dynamic change of audio power is calculated from the power differences between adjacent subbands and between adjacent frames:
$E(k)_n = e(k)_{n+1} - e(k)_n$,
$dE(k)_n = E(k+1)_n - E(k)_n$,
if $dE(k)_n \le 0$, then $F(k)_n = 0$,
if $dE(k)_n > 0$, then $F(k)_n = 1$,
where n = 0, 1, 2, ..., M-1 is the subband index, k is the frame index, and $e(k)_n$ is the power of subband n in frame k.
That is, the power difference $E(k)_n$ between adjacent subbands is computed first, then the difference $dE(k)_n$ of these differential powers between adjacent frames is taken, and a threshold decision is applied to obtain the feature $F(k)_n$.
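The two differencing steps and the threshold decision can be sketched as below. The function name `fingerprint_bits` and the nested-list layout `power[k][n]` (power of subband n in frame k) are assumptions made for illustration.

```python
def fingerprint_bits(power):
    """Compute the binary features F(k)_n from subband powers.

    power[k][n] is the power of subband n in frame k.
    """
    bits = []
    for k in range(len(power) - 1):
        row = []
        for n in range(len(power[k]) - 1):
            e_cur = power[k][n + 1] - power[k][n]            # E(k)_n
            e_next = power[k + 1][n + 1] - power[k + 1][n]   # E(k+1)_n
            row.append(1 if e_next - e_cur > 0 else 0)       # sign of dE(k)_n
        bits.append(row)
    return bits
```

Each pair of adjacent frames yields one row of bits, so R frames with M + 1 subband powers give R - 1 rows of M bits.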
The frequency range $[0, f_s/2]$ is divided into N subbands, and the centroid of the m-th subband is calculated:
$C_m = \int_{l_m}^{h_m} f\,P(f)\,df \Big/ \int_{l_m}^{h_m} P(f)\,df$
where $l_m$ and $h_m$ are the lower and upper limit frequencies of the subband and $P(f)$ is the band power at frequency f;
The subband centroid is then normalized so that its value is not affected by the choice of subband, as follows:
$NC_m = [C_m - (h_m + l_m)]/2(h_m - l_m)$.
where $NC_m$ is the normalized subband centroid.
Using a parameterized hash index table, the original table entries are mapped onto the hash index table, and the fingerprint $F(k)_n$ is given a hash index value:
$H(F(k)_n) = F(k)_n \bmod Maxlen$
where Maxlen is the size of the hash index table and $H(F(k)_n)$ is the hash index value, which lies in the range 0 to Maxlen-1;
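A sketch of the hash index step, under two assumptions: the hash formula is read as a modulo reduction (consistent with the stated range 0 to Maxlen-1), and the fingerprint bits of one frame are packed into a single integer before reduction. The function name `hash_index` is hypothetical.

```python
def hash_index(bits, maxlen):
    """Map one frame's fingerprint bits to an index in 0 .. maxlen - 1.

    Assumptions: bits are packed most-significant first, and the hash is
    a plain modulo by the table size.
    """
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value % maxlen
```

For example, the bits 1, 0, 1, 1 pack to 11, which maps to index 3 in a table of size 8.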
The short-time ZCR of the k-th frame of the voice signal $f_k(n)$ is calculated, and the power ratio of each frame is obtained:
$C_k = B_k/(R_k + b)$,
where b is an anti-overflow constant and $R_k$ is the short-time ZCR of the k-th frame;
This gives the vector of power ratios $H = \{H(F(k)_n)\,C_k \mid k = 1, 2, \ldots, R\}$.
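The per-frame power ratio can be sketched as follows. The text does not define $B_k$ in this passage, so taking it as the mean squared amplitude of the frame is an assumption, as are the sign-change definition of the zero-crossing rate and the default value of the anti-overflow constant `b`.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def power_ratio(frame, b=1e-6):
    """C_k = B_k / (R_k + b), with B_k assumed to be the mean squared amplitude."""
    power = sum(x * x for x in frame) / len(frame)
    return power / (zero_crossing_rate(frame) + b)
```

The constant `b` keeps the ratio finite for frames with no zero crossings, such as a constant signal.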
Next, scrambling encryption is applied to the hash sequence H. First, a pseudo-random sequence $S = [s_1, s_2, \ldots, s_R]$ of the same length as the hash sequence is generated; the hash sequence is then rearranged according to the values of the pseudo-random sequence, giving the encrypted sequence
$h^*(s_i) = h(i)$,
where $h(i)$ is 1 only when $H(i) > H(i-1)$, and 0 otherwise.
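The binarization and position scrambling can be sketched as below. Several details are assumptions: the pseudo-random sequence S is taken to be a permutation of the positions derived from a secret `key`, the first bit (which has no predecessor) is set to 0, and Python's `random.Random` stands in for the generator even though it is not cryptographically secure.

```python
import random

def binarize(values):
    """h(i) = 1 if H(i) > H(i-1), else 0; the first element is assumed to be 0."""
    return [0] + [1 if values[i] > values[i - 1] else 0 for i in range(1, len(values))]

def _permutation(length, key):
    """Key-dependent permutation of 0 .. length - 1 (stand-in for the sequence S)."""
    perm = list(range(length))
    random.Random(key).shuffle(perm)
    return perm

def scramble(bits, key):
    """Place h(i) at position s_i, i.e. h*(s_i) = h(i)."""
    out = [0] * len(bits)
    for i, s_i in enumerate(_permutation(len(bits), key)):
        out[s_i] = bits[i]
    return out

def unscramble(scrambled, key):
    """Invert scramble() with the same key."""
    return [scrambled[s_i] for s_i in _permutation(len(scrambled), key)]
```

Only a party holding the same key can regenerate the permutation and restore the original bit order.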
During voice authentication, the similarity of voice signals is measured with the Blackman distance. For two audio fragments $\theta_1$ and $\theta_2$, $h_1$ denotes the hash index value of voice signal $\theta_1$ and $h_2$ the hash index value of voice signal $\theta_2$. D denotes the normalized Blackman distance between $h_1$ and $h_2$, i.e. the ratio of the number of mismatched bits in the hash index values to the total number of bits.
If the features of the two audio fragments $\theta_1$ and $\theta_2$ are the same, D is below the authentication threshold; if their features differ, D exceeds it. Accordingly, if the distance D is below the threshold, the two audio fragments $\theta_1$ and $\theta_2$ are considered to have the same features and authentication passes; otherwise authentication fails.
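The ratio of mismatched bits to total bits described above is what is usually called the normalized Hamming distance or bit error rate. A minimal sketch follows; the function names and any threshold value passed in are hypothetical, and treating the comparison as strict is an assumption.

```python
def normalized_bit_distance(h1, h2):
    """Ratio of mismatched bits to total bits between two equal-length sequences."""
    if len(h1) != len(h2) or not h1:
        raise ValueError("hash sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)

def authenticate(h1, h2, threshold):
    """Authentication passes when the distance is below the threshold."""
    return normalized_bit_distance(h1, h2) < threshold
```

Two hashes that differ in 2 of 4 bits have distance 0.5, so they pass with a threshold of 0.6 and fail with a threshold of 0.25.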
In a further preferred embodiment, unregistered users can also register with the payment platform by means of a random spoken character string. Specifically, the payment platform generates a random character string and sends it to the intelligent POS machine user; the user records the received random character string as speech and sends the speech to the payment platform; after receiving the user's speech, the payment platform extracts its MFCC features;
According to the MFCC features of the speech, the speech is converted into character string text. If the resulting character string text is identical in content to the previously generated random character string, this segment of speech is marked as valid registration speech; otherwise it is marked as invalid speech;
Correspondingly, in the verification phase: when the intelligent POS machine user issues an identity authentication request, the payment platform first generates a random character string and sends it to the user; the user records the received random character string in the order specified by the payment platform to obtain the authentication speech, and sends the generated authentication speech to the payment platform. If the user fails to record speech within a certain period, the current random character string expires and user verification fails;
After receiving the authentication speech, the payment platform extracts its MFCC features and verifies whether the user features of the authentication speech belong to the current user and whether its content is consistent with the correct character string text, obtaining the voice match value $S_1$ and the text match value $S_2$ respectively;
The voice match value $S_1$ and the text match value $S_2$ are combined by weighted summation into a final score, which is compared with a set threshold to reach a decision: when the final score exceeds the set threshold, the authentication speech is considered to come from a registered user of the intelligent POS machine and the text content of the speech is considered correct, so verification passes; otherwise verification fails;
The final score is calculated as follows:
$S = wS_1 + (1-w)S_2$
where S is the final score and w is the weight, $0 < w < 1$.
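The weighted fusion and the threshold decision can be sketched as follows. The function names are hypothetical, and any weight or threshold passed in is an illustrative value rather than one given by the text.

```python
def final_score(s1, s2, w):
    """S = w * S1 + (1 - w) * S2, with the weight restricted to 0 < w < 1."""
    if not 0.0 < w < 1.0:
        raise ValueError("weight w must satisfy 0 < w < 1")
    return w * s1 + (1.0 - w) * s2

def verify(s1, s2, w, threshold):
    """Verification passes when the final score exceeds the set threshold."""
    return final_score(s1, s2, w) > threshold
```

A larger w gives more influence to the voice match value, a smaller w to the text match value.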
Verifying whether the user features of the authentication speech belong to the current user, and whether its content is consistent with the correct character string text, further comprises:
Building a first HMM according to the order of the correct character string text;
According to the MFCC features of the authentication speech and the first HMM, using the Viterbi algorithm to obtain the mapping between the MFCC features of the authentication speech and the states of the first HMM, such that:
$\Phi^*_t = \arg\max_{\Phi} p(X_t \mid H, \Phi_t)$
where $X_t$ is the MFCC feature set $\{x_t(1), x_t(2), \ldots, x_t(N_t)\}$ of the authentication speech, $N_t$ is the total number of features of the authentication speech, the subscript t denotes the authentication speech segment, H is the first HMM, $\Phi_t$ is a mapping between the MFCC features of the authentication speech and the HMM states, $p(X_t \mid H, \Phi_t)$ is the overall likelihood of the MFCC feature set $X_t$ under the first HMM and the state correspondence $\Phi_t$, and $\Phi^*_t$ is the optimal mapping between the MFCC features of the authentication speech and the first HMM states found by the Viterbi algorithm;
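The Viterbi alignment of frames to HMM states can be illustrated with a simplified sketch. It assumes a left-to-right HMM in which each frame either stays in the current state or advances to the next one, treats all transitions as equally likely (so only the per-frame log-likelihoods matter), and requires at least as many frames as states; the function name `forced_align` and the input layout are hypothetical.

```python
def forced_align(loglik):
    """Best monotone frame-to-state mapping for a left-to-right HMM.

    loglik[n][j] is the log-likelihood of frame n under state j.
    The path starts in state 0 and ends in the last state.
    """
    n_frames, n_states = len(loglik), len(loglik[0])
    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = loglik[0][0]
    for n in range(1, n_frames):
        for j in range(n_states):
            stay = score[n - 1][j]
            move = score[n - 1][j - 1] if j > 0 else neg_inf
            if move > stay:   # advance from the previous state
                score[n][j], back[n][j] = move + loglik[n][j], j - 1
            else:             # remain in the same state
                score[n][j], back[n][j] = stay + loglik[n][j], j
    path = [n_states - 1]
    for n in range(n_frames - 1, 0, -1):   # backtrack from the final state
        path.append(back[n][path[-1]])
    return path[::-1]
```

Since each state belongs to one character of the string, the returned state sequence directly gives a frame-to-character mapping.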
From the mapping between the MFCC features of the authentication speech and the first HMM states, the mapping between the MFCC features and each character is obtained, and the log-likelihood ratio of the authentication speech between the user-specific voice GMM model and the general GMM model is calculated as the voice match value $S_1$. The voice match value $S_1$ is computed from the following quantities: $x_t(n)$, the n-th frame MFCC feature of the authentication speech; the number of MFCC features corresponding to all character text in the authentication speech; $d(n)$, the character corresponding to the n-th frame MFCC feature of the authentication speech under the correct character string text; $\Lambda^0_{d(n)}$ and $\Lambda_{d(n)}$, respectively the user-specific GMM model and the general GMM model corresponding to character $d(n)$; and $p(x_t(n) \mid \Lambda^0_{d(n)})$ and $p(x_t(n) \mid \Lambda_{d(n)})$, the overall likelihoods of $x_t(n)$ under the two GMM models;
The character string content of the authentication speech is recognized, and the recognized string is taken as the optimal character string; a second HMM is built from the optimal character string using the general GMM model;
The mapping between the MFCC features of the authentication speech and the second HMM states is obtained with the Viterbi algorithm, and from it the mapping between the MFCC features of the authentication speech and each character;
From the mappings between the MFCC features of the authentication speech and each character, obtained under the correct character string text and under the optimal character string respectively, the log-likelihood ratio of the authentication speech between the user-specific voice GMM model and the general GMM model is calculated as the text match value $S_2$. The text match value $S_2$ is computed from the following quantities: the number of MFCC features corresponding to the optimal character text in the authentication speech; $d_2(n)$, the character corresponding to the n-th frame MFCC feature of the authentication speech under the optimal character string; the general GMM model corresponding to $d_2(n)$; and the overall likelihood of $x_t(n)$ under the general GMM model of $d_2(n)$.
To eliminate the effect of channel mismatch, the user identification space and the channel space are modeled simultaneously, based on factor analysis, when the user identification model is estimated. A segment of speech is represented by a composite vector; that is, the speech space is composed of the composite vectors of user and channel.
The composite vector M is expressed as follows:
M = s + c
s = m + Vy + Dz
c = Ux
where s is the user feature space vector, c is the channel space vector, m is the general GMM vector, and V, D and U are space matrices. The components of the vector x are called channel factors, the components of y user identification factors, and the components of z residual factors. The factor analysis process consists of estimating the space matrices, establishing the user identification model, and testing.
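The decomposition of the composite vector can be made concrete with a tiny numeric sketch. Matrices are plain nested lists and all dimensions and values are illustrative assumptions; the helper names `matvec` and `composite_vector` are hypothetical.

```python
def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]

def composite_vector(m, V, y, D, z, U, x):
    """M = s + c, with s = m + V y + D z (user part) and c = U x (channel part)."""
    s = [mi + vi + di for mi, vi, di in zip(m, matvec(V, y), matvec(D, z))]
    c = matvec(U, x)
    return [si + ci for si, ci in zip(s, c)]
```

For instance, with m = [1, 2], a one-column V and U, and an identity D, the user part and channel part are computed separately and then added, which is what allows the channel contribution Ux to be estimated apart from the user contribution.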
In the space matrix estimation procedure, given a segment of speech from user s with speech feature vectors $\{x_1, x_2, \ldots, x_T\}$, the following statistics can be obtained, where $m_c$ is the mean subvector of channel c, $\gamma_t(c)$ is the state probability of $x_t$ on each GMM function, and $N_c(s)$, $F_c(s)$, $S_c(s)$ are respectively the zeroth-, first- and second-order statistics of user s on the c-th GMM.
The above statistics are then concatenated: the $N_c(s)$ are concatenated into a CF × CF diagonal matrix N(s), the $F_c(s)$ into a CF × 1 column vector F(s), and the $S_c(s)$ into a CF × CF diagonal matrix S(s), where CF is the dimension of the general GMM vector.
The intermediate variable of each user is then calculated:
$L(s) = V^T \Psi^{-1} N(s) V$,
where Ψ is the covariance matrix of the general GMM;
The first- and second-order expected values of the user identification factor y(s) are calculated using L(s):
$E[y(s)] = L^{-1}(s) V^T \Psi^{-1} F(s)$,
$E[y(s) y^T(s)] = E[y(s)] E[y^T(s)] + L^{-1}(s)$
N(s), F(s) and S(s) are respectively the zeroth-, first- and second-order statistics of the feature space vector of user s;
The user identification space matrix V and the covariance matrix Ψ are updated:
$V = \sum_s F(s) E[y^T(s)] \big/ \left(\sum_s N(s) E[y(s) y^T(s)]\right)$,
$\Psi_{new} = \left[\sum_s N(s)\right]^{-1} \left\{\sum_s S(s) - \mathrm{diag}\left\{\sum_s F(s) E[y^T(s)] V^T\right\}\right\}$.
In summary, the present invention proposes an intelligent voice recognition method in which the identity authentication data of the intelligent POS machine terminal are stored, compared and processed locally, without being uploaded to the payment platform and without the need to configure a hardware encryption device, which improves security.
Obviously, those skilled in the art should understand that each module or step of the invention described above can be implemented with a general-purpose computing system; they can be concentrated in a single computing system or distributed over a network formed by multiple computing systems; optionally, they can be implemented with program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. Thus, the present invention is not limited to any specific combination of hardware and software.
It should be understood that the above specific embodiments of the present invention are used only to illustrate or explain the principles of the present invention and do not limit it. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all variations and modifications that fall within the scope and boundaries of the appended claims, or the equivalents of such scope and boundaries.
Claims (6)
1. An intelligent voice recognition method, characterized by comprising:
Step 1: using short-time power and ZCR as features to distinguish silence from speech, and performing endpoint detection;
Step 2: dividing the voice signal after endpoint detection into multiple frames of equal length;
Step 3: obtaining voice signal features from the dynamic change of audio power;
Step 4: performing user identity authentication of the intelligent POS machine based on the comparison result of the voice signal features.
2. The method according to claim 1, characterized in that the endpoint detection further comprises:
before detection, first determining thresholds for the short-time power and ZCR, then continuously calculating the short-time power and ZCR, adjusting the thresholds, and performing state analysis to judge whether the silent segment has ended.
3. The method according to claim 2, characterized in that:
in endpoint detection, the frequency band is divided into 4 segments, and the power ratio SE of each subband is calculated according to the following formula:
where $U_i$ and $L_i$ denote the upper and lower limit frequencies of subband i respectively, i = 1, 2, 3, 4; $X(\omega)$ denotes the amplitude of the signal at frequency ω;
if the power and ZCR of a frame of the signal are below the thresholds and the SE values of the 4 subbands are approximately equal, the frame is judged to be a silent segment.
4. The method according to claim 1, characterized in that step 2 further comprises:
dividing the voice signal into R non-overlapping frames of equal length, denoted $f_k = \{f_k(n) \mid n = 1, 2, \ldots, L/R;\ k = 1, 2, \ldots, R\}$, where L is the length of the voice signal, R is the total number of frames, and $f_k(n)$ is the n-th sample of the k-th frame.
5. The method according to claim 1, characterized in that step 3 further comprises:
calculating the dynamic change of audio power from the power differences between adjacent frames and between adjacent subbands, including computing the power difference between adjacent subbands, then taking the difference of these differential powers between adjacent frames, and applying a threshold decision.
6. The method according to claim 1, characterized in that step 4 further comprises:
during voice authentication, measuring the similarity of voice signals with the Blackman distance; for two audio fragments $\theta_1$ and $\theta_2$, $h_1$ denotes the hash index value of voice signal $\theta_1$ and $h_2$ the hash index value of voice signal $\theta_2$; D denotes the normalized Blackman distance between $h_1$ and $h_2$, i.e. the ratio of the number of mismatched bits in the hash index values to the total number of bits;
if the features of the two audio fragments $\theta_1$ and $\theta_2$ are the same, D is below the authentication threshold; if their features differ, D exceeds it; accordingly, if the distance D is below the threshold, the two audio fragments $\theta_1$ and $\theta_2$ are considered to have the same features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810224944.5A CN108447491B (en) | 2018-03-19 | 2018-03-19 | Intelligent voice recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810224944.5A CN108447491B (en) | 2018-03-19 | 2018-03-19 | Intelligent voice recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108447491A true CN108447491A (en) | 2018-08-24 |
CN108447491B CN108447491B (en) | 2021-08-10 |
Family
ID=63195147
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810224944.5A Active CN108447491B (en) | 2018-03-19 | 2018-03-19 | Intelligent voice recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108447491B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020048295A1 (en) * | 2018-09-05 | 2020-03-12 | 深圳追一科技有限公司 | Audio tag setting method and device, and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000347686A (en) * | 1999-06-03 | 2000-12-15 | Toshiba Tec Corp | Voice processing device, service quality improvement assisting device using it, and goods sales control device |
CN102023604A (en) * | 2010-11-24 | 2011-04-20 | 陕西电力科学研究院 | Intelligent online monitoring system capable of preventing external damage on transmission line |
CN107104803A (en) * | 2017-03-31 | 2017-08-29 | 清华大学 | It is a kind of to combine the user ID authentication method confirmed with vocal print based on numerical password |
Non-Patent Citations (2)
Title |
---|
汉小欢: "基于功率谱差分和TEO的语音端点检测", 《计算机应用与软件》 * |
韩志艳: "《语音识别及语音可视化技术研究》", 30 January 2017, 东北大学出版社 * |
Also Published As
Publication number | Publication date |
---|---|
CN108447491B (en) | 2021-08-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107104803B (en) | User identity authentication method based on digital password and voiceprint joint confirmation | |
US20220147602A1 (en) | System and methods for implementing private identity | |
US8384516B2 (en) | System and method for radio frequency identifier voice signature | |
CA2549092C (en) | System and method for providing improved claimant authentication | |
CN107517207A (en) | Server, auth method and computer-readable recording medium | |
US20220147607A1 (en) | System and methods for implementing private identity | |
US20060294390A1 (en) | Method and apparatus for sequential authentication using one or more error rates characterizing each security challenge | |
KR100297833B1 (en) | Speaker verification system using continuous digits with flexible figures and method thereof | |
JP2003132023A (en) | Personal authentication method, personal authentication device and personal authentication system | |
EP1962280A1 (en) | Method and network-based biometric system for biometric authentication of an end user | |
Ren et al. | Secure smart home: A voiceprint and internet based authentication system for remote accessing | |
GB2465782A (en) | Biometric identity verification utilising a trained statistical classifier, e.g. a neural network | |
EP3373177B1 (en) | Methods and systems for determining user liveness | |
US10970573B2 (en) | Method and system for free text keystroke biometric authentication | |
CN112751838A (en) | Identity authentication method, device and system | |
CN108416592A (en) | A kind of high speed voice recognition methods | |
KR101424962B1 (en) | Authentication system and method based by voice | |
CN108550368B (en) | Voice data processing method | |
Revathi et al. | Person authentication using speech as a biometric against play back attacks | |
KR20190142056A (en) | Voice recognition otp authentication method using machine learning and system thereof | |
CN108447491A (en) | A kind of Intelligent voice recognition method | |
KR20010019772A (en) | User Password Verification System and Method by Speech for Reinforced Security | |
Saleema et al. | Voice biometrics: the promising future of authentication in the internet of things | |
CN111785280B (en) | Identity authentication method and device, storage medium and electronic equipment | |
US10803873B1 (en) | Systems, devices, software, and methods for identity recognition and verification based on voice spectrum analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||