CN101938489B

CN101938489B - Without-trust third party public key authentication method based on speaker voice print

Info

Publication number: CN101938489B
Application number: CN2010102830136A
Authority: CN
Inventors: 吴震东
Original assignee: Hangzhou Electronic Science and Technology University
Current assignee: Hangzhou Dianzi University; Hangzhou Electronic Science and Technology University
Priority date: 2010-09-14
Filing date: 2010-09-14
Publication date: 2012-09-12
Anticipated expiration: 2030-09-14
Also published as: CN101938489A

Abstract

The invention discloses a public key authentication method without a trusted third party, based on speaker voiceprint, comprising the following steps: 1. the two parties agree on a voice channel; 2. the two parties open the voice channel and each builds a GMM (Gaussian mixture model) of the other party's speaker voiceprint; 3. each party generates a public/private key pair (the subsequent steps are symmetric for the two parties); 4. the public key sender states its public key to the receiver by voice; 5. the public key receiver uses the speaker GMM to perform voiceprint recognition on the voice stream stating the public key and judges whether it was spoken by the sender; if so, step 6 is carried out, and if not, the public key is rejected; 6. the public key receiver extracts the public key from the call speech, generates a random number, encrypts it with the public key, and sends it to the public key sender; and 7. after receiving the random number, the public key sender encrypts it with its private key and sends it to the receiver; the receiver decrypts and verifies it with the public key from step 6; if verification passes, public key authentication succeeds, and otherwise it fails. The invention requires no third party, is convenient and flexible to use, and lets the two communicating parties transfer and authenticate the public key directly through a voice call, thereby effectively preventing attacks by others.

Description

A public key authentication method without a trusted third party, based on speaker voiceprint

Technical field

The invention belongs to the field of network security technology and specifically relates to a public key authentication method without a trusted third party, based on speaker voiceprint. It provides technical support for two parties in instant network communication to open an encrypted session conveniently and securely when they have not agreed on a key in advance.

Background technology

Public key cryptosystems are widely used in all areas of encrypted network communication. Existing public key cryptosystems follow three models: certificate-based, identity-based, and self-certified. All three require a trusted third party. The certificate-based model is built on the public key infrastructure (PKI): public key certificates must be issued by a trusted certificate authority (CA), and users rely on the CA to establish the trustworthiness of a certificate. The identity-based model uses the user's identity information directly as the public key, so no certificate authority is needed to issue public keys, but a trusted key escrow authority is needed to hold the private keys. In the self-certified model, the private key is generated by the user alone or jointly by the user and the CA; the CA no longer knows the user's private key, so there is no key escrow problem. However, the user's self-certified public key must still be generated jointly by the user and the CA, so a trusted third party is still required.

In real network communication environments, two parties may want to start encrypted communication immediately without involving a third party. Existing public key cryptosystems are difficult to apply in this situation.

Summary of the invention

To address this need in encrypted network communication, the present invention proposes a public key authentication method without a trusted third party, based on speaker voiceprint. The method requires no trusted third party to take part in the encrypted communication process; the two parties transmit and authenticate public key information directly through a voice call. Under the assumption that a speaker's voiceprint cannot be forged in real time, the method can prevent man-in-the-middle attacks, and its security is comparable to that of the certificate-based public key cryptography model. The concrete steps are as follows:

Step 1: the two communicating parties agree on a voice channel. It must be possible to perform digital signal processing on the voice streams input to and output from this channel.

Step 2: the two parties start a voice conversation. During the conversation, each party collects the other party's speech and trains a GMM (Gaussian mixture model) of the other party's speaker voiceprint.

Step 3: each party independently generates its own public/private key pair, keeps the private key, and exchanges the public key.

In the following steps the operations of the two parties are symmetric, so for convenience only one side's procedure is described. The party sending a public key is called the public key sender, and the other party is called the public key receiver:

Step 4: the public key sender states the public key to the receiver directly by voice.

Step 5: using the speaker GMM of the public key sender generated in Step 2, the public key receiver performs voiceprint recognition on the voice stream that states the public key and judges whether this voice stream was spoken by the sender. If so, Step 6 is carried out; if not, the public key is rejected.

Step 6: the public key receiver extracts the public key from the call speech and generates a random number together with its digest. The digest is produced by a general-purpose hash algorithm such as MD5 or SHA-1; these algorithms are standard in the field and are not described further. The receiver encrypts the random number and its digest with the public key and sends the result to the public key sender.

Step 7: after receiving the ciphertext of the random number and its digest, the public key sender decrypts it with its own private key and verifies the digest value. If the digest is correct, the sender encrypts the random number and its digest with its own private key and sends the result to the receiver. After receiving this ciphertext, the receiver decrypts it with the public key extracted in Step 6 and verifies the random number and its digest value. If verification passes, public key authentication succeeds; otherwise it fails.

The invention uses a GMM of the speaker's voice to characterize the unique biometric features contained in the speaker's speech. To forge a public key, an attacker would have to replace the voice stream in which the speaker states the public key, and the replacement voice stream would have to pass voiceprint detection. The attacker would therefore need not only to forge a voice stream in real time, but to forge one in real time that matches the speaker's unique biometric features. This is difficult to achieve with existing technology.

Description of drawings

Fig. 1 is a schematic diagram of public key authentication based on speaker voiceprint.

Fig. 2 is a flow chart of GMM modeling of the speaker voiceprint.

Fig. 3 is a flow chart of speaker voiceprint recognition.

Embodiment

The invention is further described below with reference to the accompanying drawings.

A major problem in transmitting a public key over a network is how the public key receiver can confirm the correspondence between the public key and the user who holds the matching private key. In the present invention, this correspondence is confirmed by binding the public key to the unique biometric features contained in the speaker's voice. Confirming that the public key was stated by the speaker confirms the correspondence between the public key and the holder of the matching private key; see Fig. 1. The public key authentication method without a trusted third party, based on speaker voiceprint, proceeds as follows:

Step 1: the two communicating parties agree on a voice channel. It must be possible to perform digital signal processing on the voice streams input to and output from this channel.

Step 2: the two parties start a voice conversation. During the conversation, each party collects the other party's speech and trains a GMM (Gaussian mixture model) of the other party's speaker voiceprint.

Training the speaker voiceprint model takes three steps:

1. Remove silence from the input voice sequence (PCM stream) and split it into frames.

2. Extract and store the MFCC (Mel-frequency cepstral coefficient) parameters of each speech frame.

3. Train the speaker's GMM (Gaussian mixture model) with the MFCC parameters extracted in step 2 to obtain a GMM voiceprint model specific to that speaker.

If the input voice sequence is not a PCM stream, it is first decoded into a PCM stream and then processed. The input voice stream is divided into frames of 256 samples each, with an overlap of 128 samples between adjacent frames.
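The framing rule above (256-sample frames, 128-sample overlap) can be illustrated with a minimal sketch. The function name `split_into_frames` and the decision to drop a trailing partial frame are assumptions made for illustration; the text does not say how the tail of the stream is handled.

```python
def split_into_frames(samples, frame_len=256, overlap=128):
    """Split a PCM sample sequence into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption; the description does not specify tail handling).
    """
    hop = frame_len - overlap  # each frame starts 128 samples after the previous one
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_len")
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


pcm = list(range(1024))            # stand-in for decoded PCM samples
frames = split_into_frames(pcm)
print(len(frames), len(frames[0]), frames[1][0])  # prints: 7 256 128
```

With a hop of 128 samples, every sample except those at the very start and end of the stream appears in two consecutive frames.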

The MFCC parameters of every speech frame are calculated and stored; each frame yields 20 MFCC parameters.

The GMM calculation flow is shown in Fig. 2. The main formulas involved are:

(1) Mixture density

$$p(\vec{x} \mid \lambda) = \sum_{i=1}^{M} p_i \, b_i(\vec{x}) \qquad (1)$$

Here $\vec{x}$ is a D-dimensional random vector, corresponding to the MFCC parameter vector r[]; $b_i(\vec{x})$, $i = 1, \ldots, M$, are M D-dimensional Gaussian probability density functions; and $p_i$, $i = 1, \ldots, M$, are the mixture weights of the M Gaussian components, which satisfy $\sum_{i=1}^{M} p_i = 1$.

(2) D-dimensional Gaussian probability density function

$$b_i(\vec{x}) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\vec{x} - \vec{\mu}_i)' \, \Sigma_i^{-1} \, (\vec{x} - \vec{\mu}_i) \right\}$$

(3) Speaker GMM parameter set

$$\lambda = \{ p_i, \vec{u}_i, \Sigma_i \}, \quad i = 1, \ldots, M$$

where the mean vector written $\vec{\mu}_i$ in formula (2) is written $\vec{u}_i$.

The purpose of GMM training is to obtain the GMM parameter set $\lambda = \{ p_i, \vec{u}_i, \Sigma_i \}$, $i = 1, \ldots, M$, of a specific speaker.
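As a rough illustration, formulas (1) and (2) can be evaluated as follows, assuming diagonal covariance matrices (the simplification this description adopts for training). The function names and the 2-dimensional toy model are hypothetical; the method itself uses D = 20.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log of formula (2) when Sigma_i is diagonal with variances `var`."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)   # log |Sigma_i| = sum of log variances
    maha = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mean, var))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + maha)


def gmm_density(x, weights, means, variances):
    """Formula (1): p(x | lambda) = sum_i p_i * b_i(x)."""
    return sum(w * math.exp(log_gaussian_diag(x, m, v))
               for w, m, v in zip(weights, means, variances))


# Toy 2-dimensional, 2-component model (illustrative values only).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
p_near = gmm_density([0.0, 0.0], weights, means, variances)
p_far = gmm_density([10.0, 10.0], weights, means, variances)
print(p_near > p_far)  # a vector near a component mean is far more likely
```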

The steps are:

1) Read in the MFCC parameter sequence of the training speech, i.e. $\vec{x}_t = r_t[]$, $t = 1, \ldots, T$, where T is the total number of frames of training speech.

2) Set the initial parameter values

$$\lambda_0 = \{ p_{0i}, \vec{u}_{0i}, \Sigma_{0i} \}, \quad i = 1, \ldots, M$$

3) Apply the expectation-maximization (EM) algorithm, iterating until the stopping condition is met and the algorithm stops. The resulting $\lambda$ is the GMM parameter set of the specific speaker.

The specific algorithm for step 2) is:

$$p_{0i} = \frac{1}{M}$$

$\vec{u}_{0i}$ is obtained by the k-means algorithm; the vectors used to train k-means are $\vec{x}_1, \ldots, \vec{x}_T$.

$\Sigma_{0i}$, $i = 1, \ldots, M$, is the covariance matrix, where D is the dimension of the MFCC parameter vector (D = 20). For ease of calculation it is assumed to be a diagonal matrix.

Each covariance matrix is therefore described by one variance vector $\vec{\sigma}_{0i}^{2}$, M vectors in total.

$T_i$ denotes the training speech frames that belong to the group of $\vec{u}_{0i}$ after the k-means algorithm.
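The initialization in step 2) might be sketched as below. This is only an illustration under stated assumptions: the function name `kmeans_init`, the fixed iteration count, the random seeding, and the variance floor of 1e-3 are all hypothetical, and computing each initial variance vector from the frames k-means assigns to that group is an inference from the definition of T_i rather than something the text states explicitly.

```python
import random


def kmeans_init(X, M, iters=20, seed=0):
    """Initial GMM parameters: equal weights, k-means centroids, per-group variances."""
    rng = random.Random(seed)
    D = len(X[0])
    centroids = [list(x) for x in rng.sample(X, M)]
    for _ in range(iters):
        groups = [[] for _ in range(M)]
        for x in X:
            # Assign the frame to the nearest centroid (squared Euclidean distance).
            nearest = min(range(M),
                          key=lambda i: sum((a - c) ** 2 for a, c in zip(x, centroids[i])))
            groups[nearest].append(x)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a group ends up empty
                centroids[i] = [sum(x[d] for x in g) / len(g) for d in range(D)]
    variances = []
    for i, g in enumerate(groups):
        members = g or [centroids[i]]
        variances.append([
            max(sum((x[d] - centroids[i][d]) ** 2 for x in members) / len(members), 1e-3)
            for d in range(D)
        ])
    return [1.0 / M] * M, centroids, variances


# Toy 1-dimensional "frames" forming two obvious groups.
X = [[0.1], [-0.2], [0.0], [4.9], [5.2], [5.0]]
p0, u0, var0 = kmeans_init(X, M=2)
print(p0, sorted(c[0] for c in u0))
```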

The specific algorithm for step 3) is:

a) Prepare the T training vectors, denoted

$$X = \{ \vec{x}_1, \ldots, \vec{x}_T \}$$

b) Calculate the posterior probability

$$p(i \mid \vec{x}_t, \lambda) = \frac{p_i \, b_i(\vec{x}_t)}{\sum_{k=1}^{M} p_k \, b_k(\vec{x}_t)}$$

where $\lambda$ is the GMM parameter set obtained in the previous iteration.

c) Calculate

$$p_i = \frac{1}{T} \sum_{t=1}^{T} p(i \mid \vec{x}_t, \lambda)$$

d) Calculate

$$\vec{u}_i = \frac{\sum_{t=1}^{T_i} p(i \mid \vec{x}_t, \lambda) \, \vec{x}_t}{\sum_{t=1}^{T_i} p(i \mid \vec{x}_t, \lambda)}$$

e) Calculate

$$\vec{\sigma}_i^{2} = \frac{\sum_{t=1}^{T_i} p(i \mid \vec{x}_t, \lambda) \, \vec{x}_t^{2}}{\sum_{t=1}^{T_i} p(i \mid \vec{x}_t, \lambda)} - \vec{u}_i^{2}$$

f) Evaluate the stopping condition. If it is met, iterative training ends and the speaker GMM parameter set is obtained.

g) If not, take the newly calculated parameters as $\lambda$ and return to step b) to continue.

Note:

$$b_i(\vec{x}) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (\vec{x} - \vec{\mu}_i)' \, \Sigma_i^{-1} \, (\vec{x} - \vec{\mu}_i) \right\}, \quad D = 20$$

$$|\Sigma_i| = \sigma_{i1}^{2} \, \sigma_{i2}^{2} \cdots \sigma_{iD}^{2}$$
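One round of the iteration in step 3) could look like the following sketch for a diagonal-covariance GMM. Several points are assumptions rather than the patent's specification: the sums run over all T frames, as in the textbook EM update, whereas the printed formulas for the mean and variance carry the upper limit T_i; the variance floor `var_floor` and the stopping threshold of 1e-6 on the log-likelihood gain are illustrative, since the stopping condition is not given in this text; and the function names are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(x, mean, var))


def em_step(X, weights, means, variances, var_floor=1e-6):
    """One EM iteration; returns updated parameters and the log-likelihood of the input parameters."""
    T, D, M = len(X), len(X[0]), len(weights)
    # E-step: posterior p(i | x_t, lambda) for every frame and component.
    post, loglik = [], 0.0
    for x in X:
        joint = [w * math.exp(log_gauss(x, m, v))
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        loglik += math.log(total)
        post.append([j / total for j in joint])
    # M-step: re-estimate weights, means and variances.
    new_w, new_m, new_v = [], [], []
    for i in range(M):
        n_i = sum(post[t][i] for t in range(T))
        mean = [sum(post[t][i] * X[t][d] for t in range(T)) / n_i for d in range(D)]
        var = [max(sum(post[t][i] * X[t][d] ** 2 for t in range(T)) / n_i - mean[d] ** 2,
                   var_floor)
               for d in range(D)]
        new_w.append(n_i / T)
        new_m.append(mean)
        new_v.append(var)
    return new_w, new_m, new_v, loglik


# Toy 1-dimensional data with two well-separated groups.
X = [[0.1], [-0.2], [0.0], [4.9], [5.2], [5.0]]
w, m, v = [0.5, 0.5], [[1.0], [4.0]], [[1.0], [1.0]]
prev = float("-inf")
for _ in range(50):
    w, m, v, ll = em_step(X, w, m, v)
    if ll - prev < 1e-6:   # assumed stopping rule: negligible log-likelihood gain
        break
    prev = ll
print(w, m)
```

This direct form can underflow for frames far from every component; a production implementation would work in the log domain throughout.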

Step 3: each party independently generates its own public/private key pair, keeps the private key, and exchanges the public key. The key pair is generated with the RSA algorithm, as follows:

1) Choose two different large primes p and q at random and compute their product r = p*q. In general, p and q are required to be larger than 128 bits.

2) Choose a large integer e that is coprime to (p-1)*(q-1); e is used as the encryption key. Note: any prime greater than both p and q can be chosen as e.

3) Determine the decryption key d such that d*e ≡ 1 mod ((p-1)*(q-1)).

4) This yields the key pair: the public key is (r, e) and the private key is (r, d).
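The key generation steps can be traced with a small worked example. The primes 61 and 53 and the exponent 17 are toy values chosen for readability, far below the sizes the description calls for; the function name `make_rsa_keypair` is hypothetical, and the private key is stored here as (r, d), an assumption made so that decryption has the modulus at hand.

```python
import math


def make_rsa_keypair(p, q, e):
    """Derive a textbook RSA key pair from primes p, q and exponent e (toy sizes only)."""
    r = p * q
    phi = (p - 1) * (q - 1)
    if math.gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)*(q-1)")
    d = pow(e, -1, phi)   # modular inverse, so d*e = 1 mod phi (Python 3.8+)
    return (r, e), (r, d)


public_key, private_key = make_rsa_keypair(p=61, q=53, e=17)
r, e = public_key
_, d = private_key

message = 65
cipher = pow(message, e, r)      # encrypt with the public key
recovered = pow(cipher, d, r)    # decrypt with the private key
print(public_key, private_key, recovered)
```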

Step 4: the public key sender states the public key to the receiver directly by voice.

Step 5: using the speaker GMM of the public key sender generated in Step 2, the public key receiver performs voiceprint recognition on the voice stream that states the public key and judges whether this voice stream was spoken by the sender. If so, Step 6 is carried out; if not, the public key is rejected.

Voiceprint recognition process: as shown in Fig. 3, the speaker's voice sequence X (after MFCC parameter extraction) is input to the GMM voiceprint model, and the posterior probability of X under the model is calculated, where the per-frame density $p(\vec{x} \mid \lambda)$ is given by formula (1). If this probability is greater than a certain threshold, the voiceprint match is considered successful.
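The threshold decision might be sketched as follows. The scoring rule used here, the average per-frame log of formula (1), is one common choice and is an assumption, since the exact score formula is not given in this text; the threshold of -5.0, the toy model, and the function names are likewise hypothetical.

```python
import math


def frame_log_density(x, weights, means, variances):
    """log p(x | lambda) from formula (1), assuming diagonal covariances."""
    terms = []
    for w, m, v in zip(weights, means, variances):
        log_b = -0.5 * sum(math.log(2 * math.pi * s) + (a - mu) ** 2 / s
                           for a, mu, s in zip(x, m, v))
        terms.append(math.log(w) + log_b)
    top = max(terms)  # log-sum-exp keeps the sum numerically stable
    return top + math.log(sum(math.exp(t - top) for t in terms))


def voiceprint_matches(frames, model, threshold):
    """Return (decision, score), where score is the mean per-frame log-likelihood."""
    weights, means, variances = model
    score = sum(frame_log_density(x, weights, means, variances)
                for x in frames) / len(frames)
    return score > threshold, score


# Toy 2-dimensional model and feature frames (illustrative values only).
model = ([0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])
same_speaker = [[0.1, -0.1], [2.9, 3.2], [0.3, 0.2]]
other_speaker = [[9.0, 9.0], [10.0, 8.0], [8.0, 11.0]]
ok_same, score_same = voiceprint_matches(same_speaker, model, threshold=-5.0)
ok_other, score_other = voiceprint_matches(other_speaker, model, threshold=-5.0)
print(ok_same, ok_other)  # prints: True False
```

In practice the threshold would be tuned on held-out speech to balance false accepts against false rejects.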

Step 6: the public key receiver extracts the public key from the call speech, generates a random number, encrypts it with the public key, and sends it to the public key sender.

Step 7: after receiving the random number, the public key sender encrypts the random number and its digest with its private key and sends the result to the receiver. After receiving this ciphertext, the receiver decrypts and verifies it with the public key extracted in Step 6. If verification passes, public key authentication succeeds; otherwise it fails.
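The exchange in Steps 6 and 7 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: textbook RSA without padding, a modulus built from the Mersenne primes 2^127 - 1 and 2^89 - 1, a 4-byte random number, SHA-1 as the digest, and the hypothetical function names. A real deployment would use a vetted cryptographic library.

```python
import hashlib
import secrets

# Toy textbook RSA parameters (illustrative only, not secure).
P, Q, E = 2**127 - 1, 2**89 - 1, 65537
R = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))


def pack(nonce: bytes) -> int:
    """Concatenate the random number with its SHA-1 digest and read it as an integer."""
    return int.from_bytes(nonce + hashlib.sha1(nonce).digest(), "big")


def unpack(value: int, nonce_len: int = 4):
    """Split an integer into nonce and digest; return the nonce, or None if the digest is wrong."""
    try:
        raw = value.to_bytes(nonce_len + 20, "big")
    except OverflowError:
        return None
    nonce, digest = raw[:nonce_len], raw[nonce_len:]
    return nonce if hashlib.sha1(nonce).digest() == digest else None


def receiver_challenge(pub):
    """Step 6: encrypt a fresh random number and its digest with the extracted public key."""
    r, e = pub
    nonce = secrets.token_bytes(4)
    return nonce, pow(pack(nonce), e, r)


def sender_respond(cipher, priv):
    """Step 7 (sender): decrypt, check the digest, then encrypt with the private key."""
    r, d = priv
    nonce = unpack(pow(cipher, d, r))
    return None if nonce is None else pow(pack(nonce), d, r)


def receiver_verify(response, nonce, pub):
    """Step 7 (receiver): decrypt with the extracted public key and compare."""
    r, e = pub
    return response is not None and unpack(pow(response, e, r)) == nonce


pub, priv = (R, E), (R, D)
nonce, challenge = receiver_challenge(pub)
ok = receiver_verify(sender_respond(challenge, priv), nonce, pub)

# A receiver holding a key that does not match the sender's private key fails the check.
wrong_pub = (R, 257)
nonce2, challenge2 = receiver_challenge(wrong_pub)
ok_wrong = receiver_verify(sender_respond(challenge2, priv), nonce2, wrong_pub)
print(ok, ok_wrong)
```

The digest lets the sender detect a challenge that was not encrypted under its own public key, which is why a mismatched key leads to a failed authentication rather than a meaningless reply.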

Those of ordinary skill in the art will appreciate that the above embodiment is intended to illustrate the invention and not to limit it. Any variation or modification of the above embodiment that falls within the essential scope of the invention falls within its scope of protection.

Claims

1. A public key authentication method without a trusted third party, based on speaker voiceprint, characterized by the following steps:

Step 1: the two communicating parties agree on a voice channel on whose input and output voice streams digital signal processing can be performed;

Step 2: the two parties start a voice conversation; during the conversation, each party collects the other party's speech and trains a GMM of the other party's speaker voiceprint;

In the following steps the operations of the two parties are symmetric, so only one side's procedure is described; the party sending a public key is called the public key sender, and the other party is called the public key receiver:

Step 5: using the GMM of the speaker voiceprint of the public key sender generated in Step 2, the public key receiver performs voiceprint recognition on the voice stream that states the public key and judges whether this voice stream was spoken by the sender; if so, Step 6 is carried out; if not, the public key is rejected;

Step 6: the public key receiver extracts the public key from the call speech, generates a random number and its digest, encrypts them with the public key, and sends them to the public key sender;

2. The public key authentication method without a trusted third party, based on speaker voiceprint, according to claim 1, characterized in that, in said Step 2, the GMM training process for the speaker voiceprint comprises three steps:

(1) removing silence from the input voice sequence, which is a PCM stream, and splitting it into frames;

(2) extracting and storing the MFCC parameters of each speech frame;

(3) training the GMM of the speaker voiceprint with the MFCC parameters extracted in step (2) to obtain a GMM specific to the speaker's voiceprint.

3. The public key authentication method without a trusted third party, based on speaker voiceprint, according to claim 2, characterized in that, in said step (1), if the input voice sequence is not a PCM stream, it is first decoded into a PCM stream and then processed; the input voice stream is divided into frames of 256 samples each, with an overlap of 128 samples between adjacent frames.

4. The public key authentication method without a trusted third party, based on speaker voiceprint, according to claim 2, characterized in that, in said step (2), the MFCC parameters of every speech frame are calculated and stored, and each frame yields 20 MFCC parameters.