CN102238189A - Voiceprint password authentication method and system

Publication number: CN102238189A (application CN2011102180429A; granted and published as CN102238189B)
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Granted, Active
Inventors: 何婷婷, 胡国平, 胡郁, 王智国, 刘庆峰
Assignee (original and current): iFlytek Co Ltd
Application filed by iFlytek Co Ltd

Landscapes

  • Collating Specific Patterns (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a voiceprint password authentication method and system. The method comprises: receiving a voice signal recorded by a login user; extracting a voiceprint feature sequence from the voice signal; performing speech recognition on the voice signal to obtain the password content of the login user; if the recognized password content differs from the registration password text, determining that the login user is not authenticated; otherwise, computing the likelihood of the voiceprint feature sequence against the speaker voiceprint model corresponding to the login user and against a background model selected for the login user, where the background model comprises a text-independent universal background model and a text-dependent optimized background model; computing a likelihood ratio from the two likelihoods; and if the likelihood ratio is greater than a preset threshold, determining that the login user is successfully authenticated, otherwise determining that the user is not authenticated. The method and system improve the accuracy of voiceprint password authentication.

Description

Voiceprint password authentication method and system
Technical field
The present invention relates to the field of password authentication technology, and in particular to a voiceprint password authentication method and system.
Background art
Voiceprint Recognition (VPR), also referred to as speaker recognition, comes in two types: speaker identification and speaker verification. The former determines which of several candidate speakers uttered a given segment of speech and is a "one-of-many" selection problem; the latter confirms whether a given segment of speech was spoken by a specified person and is a "one-to-one" discrimination problem. Different tasks and applications call for different voiceprint recognition techniques.
Voiceprint verification determines a speaker's identity from a collected voice signal and belongs to the "one-to-one" discrimination problem. Mainstream voiceprint verification systems adopt a hypothesis-testing framework: the likelihoods of the voiceprint signal with respect to a speaker voiceprint model and a background model are computed separately, and the identity decision is made by comparing their likelihood ratio against a threshold set in advance from experience. Clearly, the accuracy of the background model and the speaker voiceprint model directly affects verification performance, and under the data-driven statistical modeling paradigm, the larger the amount of training data, the better the resulting model.
Voiceprint password authentication is a text-dependent speaker identity authentication method. It requires the user to speak a fixed password text and confirms the speaker's identity accordingly. In this application, both registration and authentication use speech input of the fixed password text, so the user's voiceprints tend to be more consistent and better authentication performance can be obtained than with text-independent speaker verification.
In a voiceprint password authentication system, the user replaces the traditional character-string password with a voice input, and the authentication system stores the user's voiceprint password in the form of a speaker voiceprint model. Most existing voiceprint password authentication systems compute the likelihoods of the voiceprint signal with respect to the speaker voiceprint model and a background model, and determine the user's identity by comparing the resulting likelihood ratio against a preset threshold. Therefore, the precision of the background model and the speaker voiceprint model directly affects the effectiveness of voiceprint password authentication.
In the prior art, voiceprint password authentication systems generally adopt a universal background model to simulate text-independent voiceprint characteristics of users; specifically, a single universal background model is trained offline on collected multi-speaker data. Although such a universal background model has good generality, its description is not precise enough and its discriminability is low, which affects the accuracy of password authentication to some extent.
Summary of the invention
Embodiments of the invention provide a voiceprint password authentication method and system to improve the accuracy of identity authentication based on voiceprint passwords.
A voiceprint password authentication method comprises:
receiving a voice signal recorded by a login user;
extracting a voiceprint feature sequence from the voice signal;
performing speech recognition on the voice signal to obtain the password content of the login user;
if the obtained password content differs from the registration password text corresponding to the login user, determining that the login user is not authenticated;
if the obtained password content is identical to the registration password text corresponding to the login user,
determining the background model corresponding to the login user, the background model comprising a text-independent universal background model and a text-dependent optimized background model;
computing, respectively, the likelihood of the voiceprint feature sequence against the speaker voiceprint model corresponding to the login user and the likelihood of the voiceprint feature sequence against the background model;
computing a likelihood ratio from the likelihood of the voiceprint feature sequence against the speaker voiceprint model and the likelihood of the voiceprint feature sequence against the background model;
and if the likelihood ratio is greater than a preset threshold, determining that the login user is successfully authenticated, otherwise determining that the login user is not authenticated.
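A minimal Python sketch of this decision logic is given below, assuming the recognized password text, the two likelihoods, and the threshold have already been computed as described above; the function and variable names are illustrative and not taken from the patent.

```python
def authenticate(recognized_text: str, registered_text: str,
                 lik_speaker: float, lik_background: float,
                 threshold: float) -> bool:
    """Accept the login only if the spoken password text matches the
    registered text AND the likelihood ratio exceeds the preset threshold."""
    if recognized_text != registered_text:
        return False                        # password content differs: reject
    likelihood_ratio = lik_speaker / lik_background
    return likelihood_ratio > threshold     # accept only above the threshold


# Example: matching text but a likelihood ratio below the threshold is still rejected.
print(authenticate("open sesame", "open sesame", 0.9, 1.0, threshold=1.2))  # False
```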
Preferably, determining the background model corresponding to the login user comprises:
if an optimized background model corresponding to the password content of the login user exists, selecting that optimized background model as the background model corresponding to the login user; otherwise, selecting the universal background model as the background model corresponding to the login user.
Preferably, the method further comprises:
writing the voice signal recorded by the login user, or the voiceprint feature sequence extracted from it, into the buffer corresponding to the password content of that voice signal;
receiving a registration voice signal recorded by a registering user;
performing speech recognition on the registration voice signal to obtain the registration password text of the registering user;
writing the registration voice signal, or the voiceprint feature sequence extracted from it, into the buffer corresponding to the password content of that registration voice signal;
training the speaker voiceprint model corresponding to the registering user from the registration voice signal recorded by the registering user;
constructing or updating, in real time, the optimized background model associated with the password content corresponding to each buffer according to the data in that buffer.
Optionally, constructing or updating in real time the optimized background model associated with the password content corresponding to each buffer according to the data in that buffer comprises:
if the amount of data stored in a buffer reaches a first preset value and no optimized background model associated with the password content corresponding to that buffer currently exists, generating, with the universal background model as the initial model, the optimized background model associated with that password content from the data in the buffer, and deleting the data stored in the buffer; if the amount of data stored in a buffer reaches the first preset value and an optimized background model associated with that password content already exists, updating that optimized background model, with it as the initial model, from the data in the buffer, and deleting the data stored in the buffer.
Optionally, constructing or updating in real time the optimized background model associated with the password content corresponding to each buffer according to the data in that buffer comprises:
if the amount of data stored in a buffer reaches an integer multiple of a second preset value, regenerating, with the universal background model as the initial model, the optimized background model associated with the password content corresponding to that buffer from the data in the buffer.
Preferably, the registration voice signal is recorded repeatedly by the registering user;
performing speech recognition on the registration voice signal to obtain the registration password text of the registering user then comprises:
performing speech recognition separately on each recorded registration voice signal to obtain multiple recognition results and a recognition likelihood score corresponding to each recognition result;
selecting the recognition result with the highest likelihood score as the registration password text of the registering user.
A voiceprint password authentication system comprises:
a receiving unit, configured to receive, when a user logs in, the voice signal recorded by the login user;
a voiceprint feature extraction unit, configured to extract a voiceprint feature sequence from the voice signal;
a speech recognition unit, configured to perform speech recognition on the voice signal to obtain the password content of the login user;
a judging unit, configured to judge whether the password content obtained by the speech recognition unit is identical to the registration password corresponding to the login user;
an authentication result unit, configured to determine that the login user is not authenticated when the judging unit determines that the password content obtained by the speech recognition unit differs from the registration password text corresponding to the login user;
a model determining unit, configured to determine the background model corresponding to the login user when the judging unit determines that the password content obtained by the speech recognition unit is identical to the registration password text of the login user, the background model comprising a text-independent universal background model and a text-dependent optimized background model;
a first computing unit, configured to compute, respectively, the likelihood of the voiceprint feature sequence against the speaker voiceprint model corresponding to the login user and the likelihood of the voiceprint feature sequence against the background model determined by the model determining unit;
a second computing unit, configured to compute a likelihood ratio from the likelihood of the voiceprint feature sequence against the speaker voiceprint model and the likelihood of the voiceprint feature sequence against the background model;
the judging unit is further configured to judge whether the likelihood ratio computed by the second computing unit is greater than a preset threshold;
the authentication result unit is further configured to determine that the login user is successfully authenticated when the judging unit determines that the likelihood ratio computed by the second computing unit is greater than the preset threshold, and to determine that the login user is not authenticated otherwise.
Preferably, the system further comprises:
a checking unit, configured to check whether an optimized background model corresponding to the registration password text of the login user exists;
the model determining unit is specifically configured to select that optimized background model as the background model corresponding to the login user when the checking unit finds an optimized background model corresponding to the registration password text of the login user, and to select the universal background model as the background model corresponding to the login user otherwise.
Preferably, the speech recognition unit is further configured to write the voice signal recorded by the login user, or the voiceprint feature sequence extracted from it, into the buffer corresponding to the password content of that voice signal;
the receiving unit is further configured to receive a registration voice signal recorded by a registering user;
the speech recognition unit is further configured to perform speech recognition on the registration voice signal to obtain the registration password text of the registering user;
the system further comprises:
a speaker voiceprint model construction unit, configured to train the speaker voiceprint model corresponding to the registering user from the registration voice signal recorded by the registering user;
a background model construction unit, configured to construct or update, in real time, the optimized background model associated with the password content corresponding to each buffer according to the data in that buffer.
Optionally, the background model construction unit is specifically configured to: when the amount of data stored in a buffer reaches a first preset value and no optimized background model associated with the password content corresponding to that buffer currently exists, generate, with the universal background model as the initial model, the optimized background model associated with that password content from the data in the buffer, and delete the data stored in the buffer; and when the amount of data stored in a buffer reaches the first preset value and an optimized background model associated with that password content already exists, update that optimized background model, with it as the initial model, from the data in the buffer, and delete the data stored in the buffer.
Optionally, the background model construction unit is specifically configured to, when the amount of data stored in a buffer reaches an integer multiple of a second preset value, regenerate, with the universal background model as the initial model, the optimized background model associated with the password content corresponding to that buffer from the data in the buffer.
Preferably, the registration voice signal is recorded repeatedly by the registering user;
the speech recognition unit performs speech recognition separately on each recorded registration voice signal to obtain multiple recognition results and a recognition likelihood score corresponding to each recognition result;
the system further comprises:
a password determining unit, configured to select, from the multiple recognition results obtained by the speech recognition unit, the recognition result with the highest likelihood score as the registration password text of the registering user.
With the voiceprint password authentication method and system provided by the embodiments of the invention, identity verification not only performs speech recognition on the voice signal recorded at login to determine its password content, but also performs voiceprint verification on it. Voiceprint verification is based on multiple background models, namely a text-independent universal background model and text-dependent optimized background models; by selecting a suitable background model, accurate matching is achieved and the accuracy of identity authentication based on voiceprint passwords is effectively improved.
Description of drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the voiceprint password authentication method of an embodiment of the invention;
Fig. 2 is a flowchart of constructing the text-independent universal background model in an embodiment of the invention;
Fig. 3 is a flowchart of one way of constructing the text-dependent optimized background model in an embodiment of the invention;
Fig. 4 is a flowchart of performing speech recognition on the registration voice signal recorded by a registering user in an embodiment of the invention;
Fig. 5 is a schematic structural diagram of a voiceprint password authentication system of an embodiment of the invention;
Fig. 6 is another schematic structural diagram of the voiceprint password authentication system of an embodiment of the invention;
Fig. 7 is another schematic structural diagram of the voiceprint password authentication system of an embodiment of the invention.
Detailed description of the embodiments
The technical solutions of the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the described embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, the voiceprint password authentication method of an embodiment of the invention comprises the following steps:
Step 101: receive the voice signal recorded by the login user.
Step 102: extract the voiceprint feature sequence from the voice signal.
The voiceprint feature sequence comprises a set of voiceprint features that can effectively distinguish different speakers while remaining relatively stable across variations of the same speaker.
For example, common voiceprint features include spectral envelope parameters, pitch contour, formant frequency and bandwidth, linear prediction coefficients, and cepstral coefficients. Considering the quantifiability of these features, the amount of training data, and the evaluation of system performance, MFCC (Mel Frequency Cepstral Coefficient) features may be used: short-time analysis is performed on each frame of speech, with a 25 ms window and a 10 ms frame shift, to obtain the MFCC parameters and their first- and second-order differences, 39 dimensions in total. In this way, each voice signal can be quantized into a 39-dimensional voiceprint feature sequence X.
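As an illustration, the 39-dimensional feature described here (13 MFCCs plus their first- and second-order differences, 25 ms window, 10 ms shift) could be computed roughly as sketched below; librosa is an assumed third-party choice and is not named in the patent.

```python
import numpy as np
import librosa  # assumed feature-extraction library; the patent does not name one

def extract_voiceprint_features(signal: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Return a (T, 39) voiceprint feature sequence: 13 MFCCs plus their
    first- and second-order differences, 25 ms window, 10 ms frame shift."""
    win = int(0.025 * sr)   # 25 ms analysis window
    hop = int(0.010 * sr)   # 10 ms frame shift
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                                n_fft=win, win_length=win, hop_length=hop)
    d1 = librosa.feature.delta(mfcc)            # first-order difference
    d2 = librosa.feature.delta(mfcc, order=2)   # second-order difference
    return np.vstack([mfcc, d1, d2]).T          # shape (T, 39)

# Quick check on one second of synthetic audio.
X = extract_voiceprint_features(np.random.randn(16000).astype(np.float32))
print(X.shape)  # roughly (101, 39)
```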
Step 103: perform speech recognition on the voice signal to obtain the password content of the login user.
Any existing speech recognition method may be used; the details are not described here.
Step 104: judge whether the obtained password content is identical to the registration password text of the current login user; if so, go to step 105; otherwise, go to step 110.
Step 105: determine the background model corresponding to the login user.
Here, the speaker voiceprint model is used to model the pronunciation characteristics of a registered user on the fixed password text, and the background model is used to model the pronunciation characteristics common to many speakers.
In embodiments of the invention, the speaker voiceprint model can be built from the registration voice signal recorded when the user registers, using existing construction methods. The background model comprises a text-independent universal background model and a text-dependent optimized background model, which can be built in two ways: the text-independent universal background model can be trained offline on multi-speaker data collected in advance (the training process can follow existing methods and is not limited by the embodiments of the invention), while the text-dependent optimized background model can be trained online from the voiceprint feature sequences extracted from the voice signals recorded during user registration and login.
Accordingly, in this step, the background model corresponding to the login user can be selected in different ways as required, which is described in detail later.
Step 106: compute, respectively, the likelihood of the voiceprint feature sequence against the speaker voiceprint model corresponding to the login user and the likelihood of the voiceprint feature sequence against the background model.
The speaker voiceprint model can be trained online from the registration voice signal when the user registers. For example, with the universal background model as the initial model, part of the model parameters are adjusted from a small amount of speaker data by various adaptation methods, such as the commonly used maximum a posteriori (MAP) adaptation algorithm, which adapts the common voiceprint characteristics of the universal model toward the individual characteristics of the current speaker. Of course, the speaker voiceprint model may also be trained in other ways, which is not limited by the embodiments of the invention.
Suppose the extracted voiceprint feature sequence X has T frames. Its likelihood with respect to the background model is:
$$p(X \mid \mathrm{UBM}) = \frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M} c_m\, N(X_t; \mu_m, \Sigma_m) \qquad (1)$$
where $c_m$ is the weight coefficient of the m-th Gaussian, satisfying $\sum_{m=1}^{M} c_m = 1$, and $\mu_m$ and $\Sigma_m$ are the mean and variance of the m-th Gaussian, respectively. $N(\cdot)$ denotes the normal distribution and is used to compute the likelihood of the voiceprint feature vector $X_t$ at time t on a single Gaussian component:
$$N(X_t; \mu_m, \Sigma_m) = \frac{1}{\sqrt{(2\pi)^n \lvert\Sigma_m\rvert}}\, e^{-\frac{1}{2}(X_t-\mu_m)'\,\Sigma_m^{-1}(X_t-\mu_m)} \qquad (2)$$
The likelihood of the voiceprint feature sequence X with respect to the speaker voiceprint model is computed in the same way and is not detailed here.
Step 107: compute the likelihood ratio from the likelihood of the voiceprint feature sequence against the speaker voiceprint model and its likelihood against the background model.
The likelihood ratio is:
$$p = \frac{p(X \mid U)}{p(X \mid \mathrm{UBM})} \qquad (3)$$
where $p(X \mid U)$ is the likelihood of the voiceprint feature sequence against the speaker voiceprint model and $p(X \mid \mathrm{UBM})$ is its likelihood against the background model.
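A compact numpy sketch of equations (1) through (3) follows, assuming diagonal covariances and toy random parameters (the patent does not fix a covariance structure, model size, or any concrete values).

```python
import numpy as np

def gaussian_pdf(X, mu, cov):
    """Equation (2) for every frame of X, with a diagonal covariance vector cov."""
    n = X.shape[1]
    diff = X - mu
    exponent = -0.5 * np.sum(diff * diff / cov, axis=1)
    norm = np.sqrt((2 * np.pi) ** n * np.prod(cov))
    return np.exp(exponent) / norm

def gmm_likelihood(X, weights, means, covs):
    """Equation (1): frame-averaged mixture likelihood of the feature sequence X."""
    per_frame = sum(c * gaussian_pdf(X, mu, cov)
                    for c, mu, cov in zip(weights, means, covs))
    return per_frame.mean()

# Toy example: T=50 frames of 3-dimensional features, 2-component models.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
w = np.array([0.4, 0.6])
mu = rng.standard_normal((2, 3))
cov = np.ones((2, 3))                              # diagonal variances
speaker_lik = gmm_likelihood(X, w, mu + 0.1, cov)  # stand-in speaker voiceprint model
ubm_lik = gmm_likelihood(X, w, mu, cov)            # stand-in background model
print("likelihood ratio (eq. 3):", speaker_lik / ubm_lik)
```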
Step 108: judge whether the likelihood ratio is greater than a preset threshold; if so, go to step 109; otherwise, go to step 110.
The threshold can be preset by the system. In general, the larger the threshold, the more sensitive the system: at login the user must pronounce the recorded voice signal (the password) as closely as possible to the way it was pronounced at registration. Conversely, a smaller threshold lowers the sensitivity and tolerates some variation between the pronunciation at login and the pronunciation at registration.
Step 109: determine that the login user is successfully authenticated.
Step 110: determine that the login user is not authenticated.
It should be noted that, to improve the robustness of the system, noise reduction may also be performed on the voice signal between step 101 and step 102. For example, the continuous voice signal is first divided into independent speech segments and non-speech segments by analyzing its short-time energy and short-time zero-crossing rate; front-end noise reduction then reduces the interference of channel noise and background noise, improving the signal-to-noise ratio and providing a clean signal for subsequent processing.
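A rough sketch of the energy and zero-crossing-rate segmentation mentioned above is shown below; the thresholds and the speech/non-speech decision rule are illustrative assumptions, since the patent does not specify them.

```python
import numpy as np

def segment_speech(signal, sr=16000, frame_ms=25, hop_ms=10,
                   energy_ratio=0.1, zcr_thresh=0.25):
    """Label each frame as speech (True) or non-speech (False) from its
    short-time energy and short-time zero-crossing rate."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energies, zcrs = [], []
    for start in range(0, len(signal) - frame + 1, hop):
        w = signal[start:start + frame]
        energies.append(np.sum(w * w))                          # short-time energy
        zcrs.append(np.mean(np.abs(np.diff(np.sign(w))) > 0))   # zero-crossing rate
    energies, zcrs = np.array(energies), np.array(zcrs)
    energy_thresh = energy_ratio * energies.max()
    # Speech: high energy, or moderate energy with a high zero-crossing rate
    # (keeps weaker unvoiced consonants).
    return (energies > energy_thresh) | (
        (energies > 0.3 * energy_thresh) & (zcrs > zcr_thresh))

# Example: half low-level noise, half louder speech-like signal.
sig = np.concatenate([0.01 * np.random.randn(8000), np.random.randn(8000)])
print(int(segment_speech(sig).sum()), "speech frames detected")
```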
As mentioned above, in embodiments of the invention the background model can comprise a text-independent universal background model and a text-dependent optimized background model, and the background model corresponding to the login user can be selected in different ways as required. For example, during a system initialization phase (for instance, a preset period of time), the text-independent universal background model can be selected, to accommodate the various voiceprint passwords recorded by users; as the system runs, the collected user data associated with specific password texts keeps growing, and optimized background models associated with those password texts can be trained from this data. After that, the corresponding background model can be selected according to the password content of the current login user obtained in step 103. Of course, to simplify the implementation, the corresponding background model may also be selected according to the password content of the current login user from system startup.
The text-independent universal background model can be built with existing methods, for example as a Gaussian mixture model with 1024 or more Gaussians; its parameter training process is shown in Fig. 2.
Step 201: extract voiceprint features from the training speech signals of many speakers; each voiceprint feature serves as a feature vector.
Step 202: cluster the feature vectors with a clustering algorithm to obtain the initial means of K Gaussians, where K is the preset number of mixture components.
For example, the traditional LBG (Linde, Buzo, Gray) clustering algorithm can be used, which approximates the optimal reproduction codebook from the training vector set through an iterative algorithm.
Step 203: iteratively update the means, variances, and the weight coefficient corresponding to each Gaussian with the EM (Expectation Maximization) algorithm to obtain the text-independent universal background model.
The concrete iterative update process is the same as in the prior art and is not detailed here.
Of course, the text-independent universal background model may also be built in other ways, which is not limited by the embodiments of the invention.
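For illustration only, a small-scale stand-in for this training procedure using scikit-learn (an assumed tool, not named in the patent) is sketched below; GaussianMixture initializes with k-means rather than LBG, and a real universal background model would use 1024 or more components and far more multi-speaker data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # assumed tooling; the patent describes LBG + EM

def train_ubm(features: np.ndarray, n_components: int = 64) -> GaussianMixture:
    """Train a GMM universal background model on pooled multi-speaker features.
    init_params="kmeans" plays the role of the codebook initialization of
    step 202; the EM iterations of step 203 refine weights, means, variances."""
    ubm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          init_params="kmeans",
                          max_iter=100,
                          random_state=0)
    ubm.fit(features)
    return ubm

# Toy run on random 39-dimensional "features" pooled from many speakers.
pooled = np.random.randn(5000, 39)
ubm = train_ubm(pooled, n_components=8)
print(ubm.weights_.shape, ubm.means_.shape)  # (8,) (8, 39)
```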
In embodiments of the invention, regardless of whether the user is in login mode or registration mode, the voice signal recorded by the user, or the voiceprint features extracted from it, can be written into the buffer corresponding to the password text recognized from that voice signal, and the optimized background model associated with that password text can be built or updated in real time from the data in the buffer. In this way, data relevant to a specific password text can be collected quickly, so that the optimized background model is refined rapidly and the efficiency and accuracy of voiceprint recognition are improved.
Of course, in practical applications, to reduce the computational load of the system, the optimized background model associated with a password text may also be built or updated only in registration mode or only in login mode; this is not limited by the embodiments of the invention.
Therefore, the flow shown in Fig. 1 may further comprise the following steps: writing the voice signal recorded by the login user, or the voiceprint feature sequence extracted from it, into the buffer corresponding to its password content. During registration: receiving the registration voice signal recorded by the registering user; performing speech recognition on the registration voice signal to obtain the registration password text of the registering user; and writing the registration voice signal, or the voiceprint feature sequence extracted from it, into the buffer corresponding to the password content of that registration voice signal. In addition, the speaker voiceprint model corresponding to the registering user needs to be trained from the registration voice signal recorded by the registering user, and the optimized background model associated with the password content corresponding to each buffer needs to be built or updated in real time from the data in that buffer.
In embodiments of the invention, a corresponding buffer can be established for each password text, with different password texts corresponding to different buffers. A buffer stores the voice signals corresponding to the same password text, or the voiceprint feature sequences extracted from those voice signals. These voice signals include not only the voice signals recorded by login users but also the registration voice signals recorded by registering users; naturally, the voice signals from different users stored in one buffer all correspond to the same password text.
When building or updating in real time the optimized background model associated with the password text corresponding to each buffer, the model may be updated every time new data is added to the buffer. Alternatively, to reduce system overhead and computation, the optimized background model may be built or updated from the data in a buffer only when the data stored in that buffer satisfies a predetermined condition. In practice, there are multiple choices for this condition and for the corresponding way of building or updating the optimized background model, for example:
One way: if the amount of data stored in a buffer reaches a first preset value (such as 500 or 600) and no optimized background model associated with the password text corresponding to that buffer currently exists, generate, with the universal background model as the initial model, the optimized background model associated with that password text from the data in the buffer, and delete the data stored in the buffer; if the amount of data stored in a buffer reaches the first preset value and an optimized background model associated with that password text already exists, update that optimized background model, with it as the initial model, from the data in the buffer, and delete the data stored in the buffer.
In this way, the amount of data used for each construction or update of the optimized background model is the same; when the optimized background model is first built, the initial model is the universal background model, and when it is updated, the initial model is the current optimized background model. Moreover, whether an optimized model is being built or the current one is being updated, the data in the corresponding buffer is cleared afterwards so that the next batch of data can be collected. This reduces the demand on buffer storage space.
Another way: if the amount of data stored in a buffer reaches an integer multiple of a second preset value (such as 500 or 600), regenerate, with the universal background model as the initial model, the optimized background model associated with the password text corresponding to that buffer from the data in the buffer.
In this way, the amount of data used differs between successive constructions or updates of the optimized background model, and whether the model is being built or updated, the initial model is always the universal background model. Moreover, the data in the corresponding buffer does not have to be cleared after each construction or update, but the demand on cache space is larger, so this approach suits environments with massive cache space. Of course, a treatment similar to the first way can also be adopted: when the amount of data in the buffer reaches a certain level (such as 50000), the data in the buffer is cleared; to preserve the characteristics of the optimized background model, when the amount of data in the buffer next reaches the second preset value, the update is performed with the current optimized background model, rather than the universal background model, as the initial model; and when the data in the buffer subsequently reaches the update condition again, updates revert to using the universal background model as the initial model.
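The two buffer-triggered policies described above can be sketched as follows; the thresholds are illustrative, and train_fn stands in for the actual model construction or adaptation routine (the MAP-style mean update shown with Fig. 3).

```python
class PasswordBufferManager:
    """Per-password-text buffers with the two update policies described above."""

    def __init__(self, ubm, train_fn, first_preset=500, mode="clear_after_update"):
        self.ubm = ubm                  # text-independent universal background model
        self.train_fn = train_fn        # train_fn(data, initial_model) -> model (stand-in)
        self.first_preset = first_preset
        self.mode = mode
        self.buffers = {}               # password text -> list of feature sequences
        self.optimized = {}             # password text -> optimized background model

    def add(self, password_text, feature_sequence):
        buf = self.buffers.setdefault(password_text, [])
        buf.append(feature_sequence)
        if self.mode == "clear_after_update":
            # Policy 1: every first_preset items, build (from the UBM) or update
            # (from the current optimized model), then clear the buffer.
            if len(buf) >= self.first_preset:
                initial = self.optimized.get(password_text, self.ubm)
                self.optimized[password_text] = self.train_fn(buf, initial)
                buf.clear()
        else:
            # Policy 2: at every integer multiple of first_preset, rebuild from
            # the UBM using all buffered data; the buffer is kept.
            if len(buf) % self.first_preset == 0:
                self.optimized[password_text] = self.train_fn(buf, self.ubm)

    def background_model_for(self, password_text):
        # Selection rule of step 105: optimized model if one exists, else the UBM.
        return self.optimized.get(password_text, self.ubm)


# Usage with trivial stand-ins for the UBM and the training routine.
mgr = PasswordBufferManager(ubm="UBM", first_preset=3,
                            train_fn=lambda data, init: f"model({len(data)})")
for _ in range(4):
    mgr.add("open sesame", feature_sequence=[0.0])
print(mgr.background_model_for("open sesame"))   # model(3): built after 3 items
print(mgr.background_model_for("other text"))    # UBM: no optimized model yet
```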
As shown in Fig. 3, one flow of building or updating the optimized background model in an embodiment of the invention comprises the following steps:
Step 301: adaptively update the means $\mu_m$ of the universal background model's Gaussian mixtures using all the voiceprint feature sequences in the buffer.
Specifically, the new Gaussian mean $\hat{\mu}_m$ is computed as a weighted average of the sample statistics and the original Gaussian mean, that is:
$$\hat{\mu}_m = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}\gamma_m(x_t)\,x_t + \tau\,\mu_m}{\sum_{i=1}^{N}\sum_{t=1}^{T_i}\gamma_m(x_t) + \tau} \qquad (4)$$
where N is the total number of voiceprint feature sequences, $T_i$ is the total frame length of the i-th voiceprint feature sequence, $x_t$ denotes the voiceprint feature of frame t, $\gamma_m(x_t)$ denotes the probability that the voiceprint feature of frame t falls within the m-th Gaussian, and $\tau$ is a forgetting factor that balances the historical mean against the update strength of the new samples. In general, the larger the value of $\tau$, the more the new mean is constrained by the original mean; if $\tau$ is small, the new mean is determined mainly by the sample statistics and better reflects the distribution of the new samples. The value of $\tau$ can be preset by the system, or a parameter value that gradually changes over time can be chosen to continually increase the influence of the new sample data.
Step 302: copy the variances of the universal background model as the variances of the optimized background model associated with the password text.
Step 303: generate the optimized background model associated with the password text.
The process of updating the optimized background model associated with a registration password text from the data in a buffer is similar to the above and is not repeated here.
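A numpy sketch of the mean-only adaptation of equation (4) is given below; the component count, dimensionality, and the value of the forgetting factor tau are illustrative. The same kind of routine also underlies the MAP adaptation of the speaker voiceprint model mentioned under step 106.

```python
import numpy as np

def map_adapt_means(sequences, weights, means, covs, tau=16.0):
    """Equation (4): new means are a weighted average of data statistics and the
    original UBM means; weights and (diagonal) variances are copied unchanged
    (steps 302 and 303)."""
    M, dim = means.shape
    num = tau * means.copy()                 # tau * mu_m
    den = np.full(M, tau)                    # tau
    for X in sequences:                      # each X: (T_i, dim) feature sequence
        # Responsibilities gamma_m(x_t) under the current (UBM) parameters.
        diff = X[:, None, :] - means[None, :, :]                    # (T, M, dim)
        log_gauss = (-0.5 * np.sum(diff * diff / covs, axis=2)
                     - 0.5 * np.sum(np.log(2 * np.pi * covs), axis=1))
        log_post = np.log(weights) + log_gauss
        log_post -= log_post.max(axis=1, keepdims=True)
        gamma = np.exp(log_post)
        gamma /= gamma.sum(axis=1, keepdims=True)                   # (T, M)
        num += gamma.T @ X                   # sum_t gamma_m(x_t) * x_t
        den += gamma.sum(axis=0)             # sum_t gamma_m(x_t)
    return num / den[:, None]

# Toy run: adapt a 4-component, 39-dimensional model toward two buffered sequences.
rng = np.random.default_rng(1)
w = np.full(4, 0.25)
mu = rng.standard_normal((4, 39))
cov = np.ones((4, 39))
new_mu = map_adapt_means([rng.standard_normal((80, 39)) for _ in range(2)], w, mu, cov)
print(new_mu.shape)  # (4, 39)
```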
It should be noted that, in embodiments of the invention, the registration voice signal may be recorded once by the registering user, or recorded repeatedly several times, to ensure the accuracy of the registration password.
If it is recorded repeatedly, then when the registration password text of the registering user is determined by speech recognition, speech recognition can be performed separately on each recorded registration voice signal to obtain multiple recognition results and a recognition likelihood score corresponding to each recognition result; the recognition result with the highest likelihood score is then selected as the registration password text of the registering user.
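This selection step amounts to a simple argmax over the recognized texts, as in the short sketch below (the example texts and scores are made up).

```python
def pick_registration_password(recognition_results):
    """Given (recognized_text, likelihood_score) pairs from the repeated
    enrollment recordings, keep the text with the highest score."""
    text, _ = max(recognition_results, key=lambda r: r[1])
    return text

# Example with three assumed enrollment passes of the same spoken password.
print(pick_registration_password([("open sesame", -512.3),
                                  ("open says me", -530.9),
                                  ("open sesame", -508.7)]))  # "open sesame"
```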
This is briefly explained below in conjunction with the concrete speech recognition process.
Suppose the system supports arbitrary user-defined password content. As shown in Fig. 4, performing speech recognition on the registration voice signal recorded by the registering user comprises the following steps:
Step 401: obtain the voice signal that currently needs to be recognized.
Step 402: extract an acoustic feature sequence from the voice signal.
Step 403: search for the optimal path corresponding to the acoustic feature sequence obtained in step 402 in the search network of large-vocabulary continuous speech recognition, and record the accumulated historical probability of the path (the likelihood score mentioned above). The concrete process is similar to the prior art and is not detailed here.
Considering that there are too many Chinese characters, which easily leads to excessive memory usage, smaller speech units can be chosen for modeling each character, such as the roughly 400 toneless syllables or the roughly 1300 tonal syllables, and the search network is built accordingly.
It should be noted that, in embodiments of the invention, a range of candidate password texts may also be preset, such as common Chinese idioms or common passwords, for users to choose from. In this case, the speech recognition of the registration voice signal can be performed in command-word recognition mode according to the candidate password texts (that is, the above search network is built from them) to improve decoding efficiency.
Of course, in practical applications, the password text may also be selected or customized by the user.
It should also be noted that if the registering user records the registration voice signal several times at registration, each recorded registration voice signal, or the voiceprint feature sequence extracted from it, can also be written into the storage area corresponding to the password text of that voice signal, to increase the user data for that password text and to provide sufficient data for refining the background model associated with that password text.
With the voiceprint password authentication method provided by the embodiments of the invention, identity verification not only performs speech recognition on the voice signal recorded at login to determine its password content, but also performs voiceprint verification on it. Voiceprint verification is based on multiple background models, namely a text-independent universal background model and text-dependent optimized background models; by selecting a suitable background model, accurate matching is achieved and the accuracy of identity authentication based on voiceprint passwords is effectively improved.
In embodiments of the invention, user registration and login data are used to train the optimized background models, so that the system is continually refined from the initial single universal background model into multiple background models corresponding to different password texts. This provides more targeted background models for the different passwords of users, improves the discriminability between models, and thereby improves the accuracy and efficiency of recognition.
Correspondingly, an embodiment of the invention also provides a voiceprint password authentication system; Fig. 5 shows one schematic structural diagram of this system.
In this embodiment, the voiceprint password authentication system comprises:
a receiving unit 501, configured to receive, when a user logs in, the voice signal recorded by the login user;
a voiceprint feature extraction unit 502, configured to extract a voiceprint feature sequence from the voice signal;
the voiceprint feature sequence comprises a set of voiceprint features that can effectively distinguish different speakers while remaining relatively stable across variations of the same speaker. For example, common voiceprint features include spectral envelope parameters, pitch contour, formant frequency and bandwidth, linear prediction coefficients, and cepstral coefficients; considering the quantifiability of these features, the amount of training data, and the evaluation of system performance, MFCC (Mel Frequency Cepstral Coefficient) features may be used: short-time analysis is performed on each frame of speech, with a 25 ms window and a 10 ms frame shift, to obtain the MFCC parameters and their first- and second-order differences, 39 dimensions in total, so that each voice signal can be quantized into a 39-dimensional voiceprint feature sequence X;
a speech recognition unit 503, configured to perform speech recognition on the voice signal to obtain the password content of the login user; any existing speech recognition method may be used and is not detailed here;
a judging unit 504, configured to judge whether the password content obtained by the speech recognition unit 503 is identical to the registration password corresponding to the login user;
an authentication result unit 505, configured to determine that the login user is not authenticated when the judging unit 504 determines that the password content obtained by the speech recognition unit 503 differs from the registration password text of the login user;
a model determining unit 506, configured to determine the background model corresponding to the login user when the judging unit 504 determines that the password content obtained by the speech recognition unit 503 is identical to the registration password text of the login user, the background model comprising a text-independent universal background model and a text-dependent optimized background model; in practical applications, the model determining unit 506 can determine the background model corresponding to the login user in different ways as required, as described above;
a first computing unit 507, configured to compute, respectively, the likelihood of the voiceprint feature sequence against the speaker voiceprint model corresponding to the login user and the likelihood of the voiceprint feature sequence against the background model;
a second computing unit 508, configured to compute a likelihood ratio from the likelihood of the voiceprint feature sequence against the speaker voiceprint model and the likelihood of the voiceprint feature sequence against the background model.
The concrete computation processes of the first computing unit 507 and the second computing unit 508 can refer to the description of the method embodiment above and are not detailed here.
In this embodiment, the judging unit 504 is further configured to judge whether the likelihood ratio computed by the second computing unit 508 is greater than a preset threshold; correspondingly, the authentication result unit 505 is further configured to determine that the login user is successfully authenticated when the judging unit 504 determines that the likelihood ratio computed by the second computing unit 508 is greater than the preset threshold, and to determine that the login user is not authenticated otherwise.
The threshold can be preset by the system. In general, the larger the threshold, the more sensitive the system: at login the user must pronounce the recorded voice signal (the password) as closely as possible to the way it was pronounced at registration. Conversely, a smaller threshold lowers the sensitivity and tolerates some variation between the pronunciation at login and the pronunciation at registration.
Fig. 6 shows another schematic structural diagram of the voiceprint password authentication system of an embodiment of the invention.
Different from the embodiment shown in Fig. 5, in this embodiment the system further comprises:
a checking unit 601, configured to check whether an optimized background model corresponding to the registration password text of the login user exists.
Correspondingly, the model determining unit 506 can select that optimized background model as the background model corresponding to the login user when the checking unit 601 finds an optimized background model corresponding to the registration password text of the login user, and select the universal background model as the background model corresponding to the login user otherwise.
Of course, in the voiceprint password authentication system of an embodiment of the invention, the model determining unit 506 can also select the background model corresponding to the login user in other ways as required. For example, during a system initialization phase (for instance, a preset period of time), the text-independent universal background model can be selected to accommodate the various voiceprint passwords recorded by users; as the system runs, the collected user data associated with a specific password keeps growing, and an optimized background model associated with the text can be trained from this data. This optimized background model is associated with the user's password text; after that, the corresponding background model can be selected according to the password content of the current login user.
Fig. 7 shows another schematic structural diagram of the voiceprint password authentication system of an embodiment of the invention.
Different from the embodiment shown in Fig. 6, in this embodiment the system further comprises a background model construction unit 701 and a speaker voiceprint model construction unit 702.
In addition, in this embodiment the speech recognition unit 503 is further configured to write the voice signal recorded by the login user, or the voiceprint feature sequence extracted from it, into the buffer corresponding to its password content.
The receiving unit 501 is further configured to receive the registration voice signal recorded by the registering user; correspondingly, the speech recognition unit 503 is further configured to perform speech recognition on the registration voice signal to obtain the registration password text of the registering user.
The background model construction unit 701 is configured to construct or update, in real time, the optimized background model associated with the password content corresponding to each buffer according to the data in that buffer.
The speaker voiceprint model construction unit 702 is configured to train the speaker voiceprint model corresponding to the registering user from the registration voice signal recorded by the registering user.
Of course, in practical applications, the voiceprint feature extraction unit 502 may instead write the voice signal (including the voice signal recorded by the login user and the registration voice signal recorded by the registering user) into the buffer corresponding to the password content recognized by the speech recognition unit 503; this is not limited by the embodiments of the invention.
In the system of the embodiments of the invention, a corresponding buffer can be established for each password text, with different password texts corresponding to different buffers. A buffer stores the voice signals corresponding to the same password text, or the voiceprint feature sequences extracted from those voice signals. These voice signals include not only the voice signals recorded by login users but also the registration voice signals recorded by registering users; naturally, the voice signals from different users stored in one buffer all correspond to the same password text.
When the background model construction unit 701 builds or updates in real time the optimized background model associated with the password content corresponding to each buffer from the data in that buffer, it may update the model every time new data is added to the buffer. Alternatively, to reduce system overhead and computation, the corresponding optimized background model may be built or updated from the data in a buffer only after the data stored in that buffer satisfies a predetermined condition. In practice, there are multiple choices for this condition and for the corresponding way of building or updating the optimized background model. For example, in one implementation, the background model construction unit 701 can, when the amount of data stored in a buffer reaches a first preset value and no optimized background model associated with the password content corresponding to that buffer currently exists, generate, with the universal background model as the initial model, the optimized background model associated with that password content from the data in the buffer, and delete the data stored in the buffer; and when the amount of data stored in a buffer reaches the first preset value and an optimized background model associated with that password content already exists, update that optimized background model, with it as the initial model, from the data in the buffer, and delete the data stored in the buffer.
In another implementation, the background model construction unit 701 can, when the amount of data stored in a buffer reaches an integer multiple of a second preset value, regenerate, with the universal background model as the initial model, the optimized background model associated with the password content corresponding to that buffer from the data in the buffer.
The concrete process by which the background model construction unit 701 builds or updates the optimized background model associated with a password text in the above two implementations can refer to the description of the method embodiment above and is not repeated here.
It should be noted that, in practical applications, the registration voice signal may be recorded once by the registering user, or recorded repeatedly several times. If it is recorded repeatedly, the speech recognition unit 503 can perform speech recognition separately on each recorded registration voice signal to obtain multiple recognition results and a recognition likelihood score corresponding to each recognition result.
Correspondingly, the system may further comprise a password determining unit (not shown), configured to select, from the multiple recognition results obtained by the speech recognition unit 503, the recognition result with the highest likelihood score as the registration password text of the registering user. The concrete process can refer to the description above and is not repeated here.
With the voiceprint password authentication system provided by the embodiments of the invention, identity verification not only performs speech recognition on the voice signal recorded at login to determine its password content, but also performs voiceprint verification on it. Voiceprint verification is based on multiple background models, namely a text-independent universal background model and text-dependent optimized background models; by selecting a suitable background model, accurate matching is achieved and the accuracy of identity authentication based on voiceprint passwords is effectively improved.
In embodiments of the invention, user registration and login data are used to train the optimized background models, so that the system is continually refined from the initial single universal background model into multiple background models corresponding to different password texts. This provides more targeted background models for the different passwords of users, improves the discriminability between models, and thereby improves the accuracy and efficiency of recognition.
The embodiments in this specification are described in a progressive manner; for identical or similar parts the embodiments can refer to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts can refer to the description of the method embodiments. The system embodiments described above are only schematic; the units and modules described as separate components may or may not be physically separate. In addition, some or all of the units and modules may be selected as needed to achieve the purpose of the embodiments. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
The above discloses only preferred implementations of the invention, but the invention is not limited thereto. Any non-creative variations conceivable by those skilled in the art, and any improvements and modifications made without departing from the principle of the invention, fall within the scope of protection of the invention.

Claims (12)

1. A voiceprint password authentication method, characterized by comprising:
receiving a voice signal recorded by a login user;
extracting a voiceprint feature sequence from the voice signal;
performing speech recognition on the voice signal to obtain the password content of the login user;
if the obtained password content is different from the registration password text corresponding to the login user, determining that the login user is a non-authenticated user;
if the obtained password content is identical to the registration password text corresponding to the login user,
determining a background model corresponding to the login user, the background model comprising a text-independent universal background model and a text-dependent optimized background model;
respectively calculating a likelihood between the voiceprint feature sequence and a speaker voiceprint model corresponding to the login user and a likelihood between the voiceprint feature sequence and the background model;
calculating a likelihood ratio according to the likelihood between the voiceprint feature sequence and the speaker voiceprint model and the likelihood between the voiceprint feature sequence and the background model; and
if the likelihood ratio is greater than a preset threshold, determining that the login user is a validly authenticated user; otherwise, determining that the login user is a non-authenticated user.
2. The method according to claim 1, characterized in that determining the background model corresponding to the login user comprises:
if there is an optimized background model corresponding to the password text of the login user, selecting that optimized background model as the background model corresponding to the login user; otherwise, selecting the universal background model as the background model corresponding to the login user.
3. The method according to claim 1, characterized in that the method further comprises:
writing the voice signal recorded by the login user, or the voiceprint feature sequence extracted therefrom, into a buffer area corresponding to the password text associated with that voice signal;
receiving a registration voice signal recorded by a registered user;
performing speech recognition on the registration voice signal to obtain the registration password text of the registered user;
writing the registration voice signal, or the voiceprint feature sequence extracted therefrom, into a buffer area corresponding to the password text associated with that registration voice signal;
training a speaker voiceprint model corresponding to the registered user according to the registration voice signal recorded by the registered user; and
constructing or updating, in real time according to the data in each buffer area, the optimized background model associated with the password text corresponding to that buffer area.
4. The method according to claim 3, characterized in that constructing or updating, in real time according to the data in each buffer area, the optimized background model associated with the password text corresponding to that buffer area comprises:
if the amount of data stored in a buffer area reaches a first preset value and there is currently no optimized background model associated with the password text corresponding to that buffer area, taking the universal background model as the initial model, generating the optimized background model associated with the password text corresponding to that buffer area according to the data in the buffer area, and deleting the data stored in the buffer area; and if the amount of data stored in a buffer area reaches the first preset value and there is currently an optimized background model associated with the password text corresponding to that buffer area, taking that optimized background model as the initial model, updating the optimized background model according to the data in the buffer area, and deleting the data stored in the buffer area.
5. The method according to claim 3, characterized in that constructing or updating, in real time according to the data in each buffer area, the optimized background model associated with the password text corresponding to that buffer area comprises:
if the amount of data stored in a buffer area reaches an integral multiple of a second preset value, taking the universal background model as the initial model and regenerating, according to the data in the buffer area, the optimized background model associated with the password text corresponding to that buffer area.
6. The method according to any one of claims 3 to 5, characterized in that the registration voice signal of the registered user is recorded repeatedly; and
performing speech recognition on the registration voice signal to obtain the registration password text of the registered user comprises:
performing speech recognition on each recorded registration voice signal respectively to obtain a plurality of recognition results and a recognition likelihood score corresponding to each recognition result; and
selecting the recognition result with the highest likelihood score as the registration password text of the registered user.
7. A voiceprint password authentication system, characterized by comprising:
a receiving unit, configured to receive, when a user logs in, a voice signal recorded by the login user;
a voiceprint feature extraction unit, configured to extract a voiceprint feature sequence from the voice signal;
a voice recognition unit, configured to perform speech recognition on the voice signal to obtain the password content of the login user;
a judging unit, configured to judge whether the password content obtained by the voice recognition unit is identical to the registration password corresponding to the login user;
an authentication result unit, configured to determine that the login user is a non-authenticated user when the judging unit judges that the password content obtained by the voice recognition unit is not identical to the registration password text corresponding to the login user;
a model determining unit, configured to determine a background model corresponding to the login user when the judging unit judges that the password content obtained by the voice recognition unit is identical to the registration password text of the login user, the background model comprising a text-independent universal background model and a text-dependent optimized background model;
a first computing unit, configured to respectively calculate a likelihood between the voiceprint feature sequence and a speaker voiceprint model corresponding to the login user and a likelihood between the voiceprint feature sequence and the background model determined by the model determining unit; and
a second computing unit, configured to calculate a likelihood ratio according to the likelihood between the voiceprint feature sequence and the speaker voiceprint model and the likelihood between the voiceprint feature sequence and the background model;
wherein the judging unit is further configured to judge whether the likelihood ratio calculated by the second computing unit is greater than a preset threshold; and
the authentication result unit is further configured to determine that the login user is a validly authenticated user when the judging unit judges that the likelihood ratio calculated by the second computing unit is greater than the preset threshold, and otherwise determine that the login user is a non-authenticated user.
8. The system according to claim 7, characterized in that the system further comprises:
an inspection unit, configured to check whether there is an optimized background model corresponding to the registration password text of the login user;
wherein the model determining unit is specifically configured to select that optimized background model as the background model corresponding to the login user when the check result of the inspection unit is that there is an optimized background model corresponding to the registration password text of the login user, and otherwise select the universal background model as the background model corresponding to the login user.
9. The system according to claim 8, characterized in that:
the voice recognition unit is further configured to write the voice signal recorded by the login user, or the voiceprint feature sequence extracted therefrom, into a buffer area corresponding to the password text associated with that voice signal;
the receiving unit is further configured to receive a registration voice signal recorded by a registered user;
the voice recognition unit is further configured to perform speech recognition on the registration voice signal to obtain the registration password text of the registered user; and
the system further comprises:
a speaker voiceprint model construction unit, configured to train a speaker voiceprint model corresponding to the registered user according to the registration voice signal recorded by the registered user; and
a background model construction unit, configured to construct or update, in real time according to the data in each buffer area, the optimized background model associated with the password text corresponding to that buffer area.
10. The system according to claim 9, characterized in that:
the background model construction unit is specifically configured to: when the amount of data stored in a buffer area reaches a first preset value and there is currently no optimized background model associated with the password text corresponding to that buffer area, take the universal background model as the initial model, generate the optimized background model associated with the password text corresponding to that buffer area according to the data in the buffer area, and delete the data stored in the buffer area; and when the amount of data stored in a buffer area reaches the first preset value and there is currently an optimized background model associated with the password text corresponding to that buffer area, take that optimized background model as the initial model, update the optimized background model according to the data in the buffer area, and delete the data stored in the buffer area.
11. The system according to claim 9, characterized in that:
the background model construction unit is specifically configured to: when the amount of data stored in a buffer area reaches an integral multiple of a second preset value, take the universal background model as the initial model and regenerate, according to the data in the buffer area, the optimized background model associated with the password text corresponding to that buffer area.
12. The system according to any one of claims 9 to 11, characterized in that the registration voice signal of the registered user is recorded repeatedly;
the voice recognition unit performs speech recognition on each recorded registration voice signal respectively to obtain a plurality of recognition results and a recognition likelihood score corresponding to each recognition result; and
the system further comprises:
a password determining unit, configured to select, from the plurality of recognition results obtained by the voice recognition unit, the recognition result with the highest likelihood score as the registration password text of the registered user.
CN2011102180429A 2011-08-01 2011-08-01 Voiceprint password authentication method and system Active CN102238189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011102180429A CN102238189B (en) 2011-08-01 2011-08-01 Voiceprint password authentication method and system

Publications (2)

Publication Number Publication Date
CN102238189A true CN102238189A (en) 2011-11-09
CN102238189B CN102238189B (en) 2013-12-11

Family

ID=44888394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011102180429A Active CN102238189B (en) 2011-08-01 2011-08-01 Voiceprint password authentication method and system

Country Status (1)

Country Link
CN (1) CN102238189B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1526505A1 (en) * 2003-10-24 2005-04-27 Aruze Corp. Vocal print authentication system and vocal print authentication program
CN1547191A (en) * 2003-12-12 2004-11-17 北京大学 Semantic and sound groove information combined speaking person identity system
US7386448B1 (en) * 2004-06-24 2008-06-10 T-Netix, Inc. Biometric voice authentication
CN101124623A (en) * 2005-02-18 2008-02-13 富士通株式会社 Voice authentication system
US20090171660A1 (en) * 2007-12-20 2009-07-02 Kabushiki Kaisha Toshiba Method and apparatus for verification of speaker authentification and system for speaker authentication

Cited By (62)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102510426A (en) * 2011-11-29 2012-06-20 安徽科大讯飞信息科技股份有限公司 Personal assistant application access method and system
US9424837B2 (en) 2012-01-24 2016-08-23 Auraya Pty Ltd Voice authentication and speech recognition system and method
CN104185868B (en) * 2012-01-24 2017-08-22 澳尔亚有限公司 Authentication voice and speech recognition system and method
CN103456304A (en) * 2012-05-31 2013-12-18 新加坡科技研究局 Method and system for dual scoring for text-dependent speaker verification
CN103456304B (en) * 2012-05-31 2018-06-01 新加坡科技研究局 For the dual methods of marking and system with the relevant speaker verification of text
CN103685185A (en) * 2012-09-14 2014-03-26 上海掌门科技有限公司 Mobile equipment voiceprint registration and authentication method and system
CN103685185B (en) * 2012-09-14 2018-04-27 上海果壳电子有限公司 Mobile equipment voiceprint registration, the method and system of certification
CN103035247B (en) * 2012-12-05 2017-07-07 北京三星通信技术研究有限公司 Based on the method and device that voiceprint is operated to audio/video file
CN103035247A (en) * 2012-12-05 2013-04-10 北京三星通信技术研究有限公司 Method and device of operation on audio/video file based on voiceprint information
CN104021790A (en) * 2013-02-28 2014-09-03 联想(北京)有限公司 Sound control unlocking method and electronic device
WO2015081681A1 (en) * 2013-12-03 2015-06-11 Tencent Technology (Shenzhen) Company Limited Systems and methods for audio command recognition
US10013985B2 (en) 2013-12-03 2018-07-03 Tencent Technology (Shenzhen) Company Limited Systems and methods for audio command recognition with speaker authentication
CN104765996A (en) * 2014-01-06 2015-07-08 讯飞智元信息科技有限公司 Voiceprint authentication method and system
CN104765996B (en) * 2014-01-06 2018-04-27 讯飞智元信息科技有限公司 Voiceprint password authentication method and system
WO2015106728A1 (en) * 2014-01-20 2015-07-23 Tencent Technology (Shenzhen) Company Limited Data processing method and system
CN107077848B (en) * 2014-09-18 2020-12-25 纽昂斯通讯公司 Method, computer system and program product for performing speaker recognition
CN107077848A (en) * 2014-09-18 2017-08-18 纽昂斯通讯公司 Method and apparatus for performing Speaker Identification
US10540980B2 (en) 2015-02-05 2020-01-21 Beijing D-Ear Technologies Co., Ltd. Dynamic security code speech-based identity authentication system and method having self-learning function
CN104616655A (en) * 2015-02-05 2015-05-13 清华大学 Automatic vocal print model reconstruction method and device
CN104616655B (en) * 2015-02-05 2018-01-16 北京得意音通技术有限责任公司 The method and apparatus of sound-groove model automatic Reconstruction
WO2016150369A1 (en) * 2015-03-24 2016-09-29 中兴通讯股份有限公司 Method and device for recording and recognising voice password
CN104901807B (en) * 2015-04-07 2019-03-26 河南城建学院 A kind of vocal print cryptographic methods can be used for low side chip
CN104901807A (en) * 2015-04-07 2015-09-09 合肥芯动微电子技术有限公司 Vocal print password method available for low-end chip
CN104734858B (en) * 2015-04-17 2018-01-09 黑龙江中医药大学 The USB identity authorization systems and method for the anti-locking that data are identified
CN104734858A (en) * 2015-04-17 2015-06-24 黑龙江中医药大学 Anti-lock USB (universal serial bus) identity authentication system and anti-lock USB identity authentication method by means of recognizing data
CN104795068B (en) * 2015-04-28 2018-08-17 深圳市锐曼智能装备有限公司 The wake-up control method and its control system of robot
CN104795068A (en) * 2015-04-28 2015-07-22 深圳市锐曼智能装备有限公司 Robot awakening control method and robot awakening control system
CN106302339A (en) * 2015-05-25 2017-01-04 腾讯科技(深圳)有限公司 Login validation method and device, login method and device
CN106373575A (en) * 2015-07-23 2017-02-01 阿里巴巴集团控股有限公司 Method, device and system for constructing user voiceprint model
US11043223B2 (en) 2015-07-23 2021-06-22 Advanced New Technologies Co., Ltd. Voiceprint recognition model construction
CN106373575B (en) * 2015-07-23 2020-07-21 阿里巴巴集团控股有限公司 User voiceprint model construction method, device and system
US10714094B2 (en) 2015-07-23 2020-07-14 Alibaba Group Holding Limited Voiceprint recognition model construction
WO2017012496A1 (en) * 2015-07-23 2017-01-26 阿里巴巴集团控股有限公司 User voiceprint model construction method, apparatus, and system
JP2018527609A (en) * 2015-07-23 2018-09-20 アリババ グループ ホウルディング リミテッド Method, apparatus and system for building user voiceprint model
CN105225664A (en) * 2015-09-24 2016-01-06 百度在线网络技术(北京)有限公司 The generation method and apparatus of Information Authentication method and apparatus and sample sound
CN105225664B (en) * 2015-09-24 2019-12-06 百度在线网络技术(北京)有限公司 Information verification method and device and sound sample generation method and device
CN107046517A (en) * 2016-02-05 2017-08-15 阿里巴巴集团控股有限公司 A kind of method of speech processing, device and intelligent terminal
CN106100846B (en) * 2016-06-02 2019-05-03 百度在线网络技术(北京)有限公司 Voiceprint registration, authentication method and device
CN106100846A (en) * 2016-06-02 2016-11-09 百度在线网络技术(北京)有限公司 Voiceprint registration, authentication method and device
WO2017215558A1 (en) * 2016-06-12 2017-12-21 腾讯科技(深圳)有限公司 Voiceprint recognition method and device
CN108023856A (en) * 2016-11-01 2018-05-11 中国移动通信有限公司研究院 A kind of method and device of information sharing
CN108023856B (en) * 2016-11-01 2020-10-16 中国移动通信有限公司研究院 Information sharing method and device
CN106782564B (en) * 2016-11-18 2018-09-11 百度在线网络技术(北京)有限公司 Method and apparatus for handling voice data
CN106789957A (en) * 2016-11-30 2017-05-31 无锡小天鹅股份有限公司 The voice login method and its smart machine of laundry applications
CN106782572A (en) * 2017-01-22 2017-05-31 清华大学 The authentication method and system of speech cipher
CN106782572B (en) * 2017-01-22 2020-04-07 清华大学 Voice password authentication method and system
CN107426143A (en) * 2017-03-09 2017-12-01 福建省汽车工业集团云度新能源汽车股份有限公司 The quick accessing method of user vehicle and device based on Application on Voiceprint Recognition
CN106921668A (en) * 2017-03-09 2017-07-04 福建省汽车工业集团云度新能源汽车股份有限公司 User vehicle fast verification method and device based on Application on Voiceprint Recognition
CN107105010A (en) * 2017-03-23 2017-08-29 福建省汽车工业集团云度新能源汽车股份有限公司 The quick accessing method of user vehicle and device based on GPS position information
CN107105010B (en) * 2017-03-23 2020-02-07 福建省汽车工业集团云度新能源汽车股份有限公司 Automobile user rapid login method and device based on GPS (global positioning system) position information
CN107221331A (en) * 2017-06-05 2017-09-29 深圳市讯联智付网络有限公司 A kind of personal identification method and equipment based on vocal print
CN107690684A (en) * 2017-08-22 2018-02-13 福建联迪商用设备有限公司 A kind of cashier's machine user management method and terminal
WO2019036904A1 (en) * 2017-08-22 2019-02-28 福建联迪商用设备有限公司 Cash register user management method and terminal
CN111566729A (en) * 2017-12-26 2020-08-21 罗伯特·博世有限公司 Speaker identification with ultra-short speech segmentation for far-field and near-field sound assistance applications
CN111566729B (en) * 2017-12-26 2024-05-28 罗伯特·博世有限公司 Speaker identification with super-phrase voice segmentation for far-field and near-field voice assistance applications
CN109346086A (en) * 2018-10-26 2019-02-15 平安科技(深圳)有限公司 Method for recognizing sound-groove, device, computer equipment and computer readable storage medium
CN110364168A (en) * 2019-07-22 2019-10-22 南京拓灵智能科技有限公司 A kind of method for recognizing sound-groove and system based on environment sensing
CN110364168B (en) * 2019-07-22 2021-09-14 北京拓灵新声科技有限公司 Voiceprint recognition method and system based on environment perception
CN111145758A (en) * 2019-12-25 2020-05-12 厦门快商通科技股份有限公司 Voiceprint recognition method, system, mobile terminal and storage medium
CN111554307A (en) * 2020-05-20 2020-08-18 浩云科技股份有限公司 Voiceprint acquisition registration method and device
CN112233679A (en) * 2020-10-10 2021-01-15 安徽讯呼信息科技有限公司 Artificial intelligence speech recognition system
CN112233679B (en) * 2020-10-10 2024-02-13 安徽讯呼信息科技有限公司 Artificial intelligence speech recognition system

Also Published As

Publication number Publication date
CN102238189B (en) 2013-12-11

Similar Documents

Publication Publication Date Title
CN102238189B (en) Voiceprint password authentication method and system
CN102238190B (en) Identity authentication method and system
JP7362851B2 (en) Neural network for speaker verification
US7813927B2 (en) Method and apparatus for training a text independent speaker recognition system using speech data with text labels
EP1989701B1 (en) Speaker authentication
CN104900235B (en) Method for recognizing sound-groove based on pitch period composite character parameter
JP6303971B2 (en) Speaker change detection device, speaker change detection method, and computer program for speaker change detection
CN111418009A (en) Personalized speaker verification system and method
JP5106371B2 (en) Method and apparatus for verification of speech authentication, speaker authentication system
US20090119103A1 (en) Speaker recognition system
CN102509547A (en) Method and system for voiceprint recognition based on vector quantization based
US20230401338A1 (en) Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium
CN104765996A (en) Voiceprint authentication method and system
Kannadaguli et al. A comparison of Gaussian mixture modeling (GMM) and hidden Markov modeling (HMM) based approaches for automatic phoneme recognition in Kannada
Sturim et al. Classification methods for speaker recognition
Trabelsi et al. A multi level data fusion approach for speaker identification on telephone speech
JP3920749B2 (en) Acoustic model creation method for speech recognition, apparatus thereof, program thereof and recording medium thereof, speech recognition apparatus using acoustic model
CN109872721A (en) Voice authentication method, information processing equipment and storage medium
Panda et al. Study of speaker recognition systems
Hsu et al. Speaker verification without background speaker models
Ahmad et al. Vector quantization decision function for Gaussian Mixture Model based speaker identification
Ahn et al. On effective speaker verification based on subword model
Logan et al. A real time speaker verification system using hidden Markov models
Hernaez et al. Evaluation of Speaker Verification Security and Detection of HMM-based Synthetic Speech
Talwar HMM-based non-intrusive speech quality and implementation of Viterbi score distribution and hiddenness based measures to improve the performance of speech recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: IFLYTEK CO., LTD.

Free format text: FORMER NAME: ANHUI USTC IFLYTEK CO., LTD.

CP03 Change of name, title or address

Address after: No. 666, Wangjiang Road, High-tech Development Zone, Hefei, Anhui 230088

Patentee after: Iflytek Co., Ltd.

Address before: No. 616, Huangshan Road, High-tech Development Zone, Hefei, Anhui 230088

Patentee before: Anhui USTC iFLYTEK Co., Ltd.