CN102238189A - Voiceprint password authentication method and system

Publication number: CN102238189A (application CN2011102180429A; granted and published as CN102238189B)
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Granted, Active
Inventors: 何婷婷, 胡国平, 胡郁, 王智国, 刘庆峰
Assignee (original and current): iFlytek Co Ltd
Application filed by iFlytek Co Ltd

Landscapes

  • Collating Specific Patterns (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a voiceprint password authentication method and system. The method comprises: receiving a voice signal recorded by a login user; extracting a voiceprint feature sequence from the voice signal; performing speech recognition on the voice signal to obtain the password content of the login user; if the recognized password content differs from the registration password text, determining that the login user is not authenticated; otherwise, computing the likelihood of the voiceprint feature sequence against the speaker voiceprint model corresponding to the login user and against a background model selected for the login user, where the background model comprises a text-independent universal background model and a text-dependent optimized background model; computing a likelihood ratio from the two likelihoods; and if the likelihood ratio is greater than a preset threshold, determining that the login user is successfully authenticated, otherwise determining that the user is not authenticated. The method and system improve the accuracy of voiceprint password authentication.

Description

Voiceprint password authentication method and system
Technical field
The present invention relates to the field of password authentication technology, and in particular to a voiceprint password authentication method and system.
Background art
Voiceprint Recognition (VPR), also referred to as speaker recognition, comes in two types: speaker identification and speaker verification. The former determines which of several candidate speakers uttered a given segment of speech and is a "one-of-many" selection problem; the latter confirms whether a given segment of speech was spoken by a specified person and is a "one-to-one" discrimination problem. Different tasks and applications call for different voiceprint recognition techniques.
Voiceprint verification determines a speaker's identity from a collected voice signal and belongs to the "one-to-one" discrimination problem. Mainstream voiceprint verification systems adopt a hypothesis-testing framework: the likelihoods of the voiceprint signal with respect to a speaker voiceprint model and a background model are computed separately, and the identity decision is made by comparing their likelihood ratio against a threshold set in advance from experience. Clearly, the accuracy of the background model and the speaker voiceprint model directly affects verification performance, and under the data-driven statistical modeling paradigm, the larger the amount of training data, the better the resulting model.
Voiceprint password authentication is a text-dependent speaker identity authentication method. It requires the user to speak a fixed password text and confirms the speaker's identity accordingly. In this application, both registration and authentication use speech input of the fixed password text, so the user's voiceprints tend to be more consistent and better authentication performance can be obtained than with text-independent speaker verification.
In a voiceprint password authentication system, the user replaces the traditional character-string password with a voice input, and the authentication system stores the user's voiceprint password in the form of a speaker voiceprint model. Most existing voiceprint password authentication systems compute the likelihoods of the voiceprint signal with respect to the speaker voiceprint model and a background model, and determine the user's identity by comparing the resulting likelihood ratio against a preset threshold. Therefore, the precision of the background model and the speaker voiceprint model directly affects the effectiveness of voiceprint password authentication.
In the prior art, voiceprint password authentication systems generally adopt a universal background model to simulate text-independent voiceprint characteristics of users; specifically, a single universal background model is trained offline on collected multi-speaker data. Although such a universal background model has good generality, its description is not precise enough and its discriminability is low, which affects the accuracy of password authentication to some extent.
Summary of the invention
Embodiments of the invention provide a voiceprint password authentication method and system to improve the accuracy of identity authentication based on voiceprint passwords.
A voiceprint password authentication method comprises:
receiving a voice signal recorded by a login user;
extracting a voiceprint feature sequence from the voice signal;
performing speech recognition on the voice signal to obtain the password content of the login user;
if the obtained password content differs from the registration password text corresponding to the login user, determining that the login user is not authenticated;
if the obtained password content is identical to the registration password text corresponding to the login user,
determining the background model corresponding to the login user, the background model comprising a text-independent universal background model and a text-dependent optimized background model;
computing, respectively, the likelihood of the voiceprint feature sequence against the speaker voiceprint model corresponding to the login user and the likelihood of the voiceprint feature sequence against the background model;
computing a likelihood ratio from the likelihood of the voiceprint feature sequence against the speaker voiceprint model and the likelihood of the voiceprint feature sequence against the background model;
and if the likelihood ratio is greater than a preset threshold, determining that the login user is successfully authenticated, otherwise determining that the login user is not authenticated.
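A minimal Python sketch of this decision logic is given below, assuming the recognized password text, the two likelihoods, and the threshold have already been computed as described above; the function and variable names are illustrative and not taken from the patent.

```python
def authenticate(recognized_text: str, registered_text: str,
                 lik_speaker: float, lik_background: float,
                 threshold: float) -> bool:
    """Accept the login only if the spoken password text matches the
    registered text AND the likelihood ratio exceeds the preset threshold."""
    if recognized_text != registered_text:
        return False                        # password content differs: reject
    likelihood_ratio = lik_speaker / lik_background
    return likelihood_ratio > threshold     # accept only above the threshold


# Example: matching text but a likelihood ratio below the threshold is still rejected.
print(authenticate("open sesame", "open sesame", 0.9, 1.0, threshold=1.2))  # False
```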
Preferably, determining the background model corresponding to the login user comprises:
if an optimized background model corresponding to the password content of the login user exists, selecting that optimized background model as the background model corresponding to the login user; otherwise, selecting the universal background model as the background model corresponding to the login user.
Preferably, the method further comprises:
writing the voice signal recorded by the login user, or the voiceprint feature sequence extracted from it, into the buffer corresponding to the password content of that voice signal;
receiving a registration voice signal recorded by a registering user;
performing speech recognition on the registration voice signal to obtain the registration password text of the registering user;
writing the registration voice signal, or the voiceprint feature sequence extracted from it, into the buffer corresponding to the password content of that registration voice signal;
training the speaker voiceprint model corresponding to the registering user from the registration voice signal recorded by the registering user;
constructing or updating, in real time, the optimized background model associated with the password content corresponding to each buffer according to the data in that buffer.
Optionally, constructing or updating in real time the optimized background model associated with the password content corresponding to each buffer according to the data in that buffer comprises:
if the amount of data stored in a buffer reaches a first preset value and no optimized background model associated with the password content corresponding to that buffer currently exists, generating, with the universal background model as the initial model, the optimized background model associated with that password content from the data in the buffer, and deleting the data stored in the buffer; if the amount of data stored in a buffer reaches the first preset value and an optimized background model associated with that password content already exists, updating that optimized background model, with it as the initial model, from the data in the buffer, and deleting the data stored in the buffer.
Optionally, constructing or updating in real time the optimized background model associated with the password content corresponding to each buffer according to the data in that buffer comprises:
if the amount of data stored in a buffer reaches an integer multiple of a second preset value, regenerating, with the universal background model as the initial model, the optimized background model associated with the password content corresponding to that buffer from the data in the buffer.
Preferably, the registration voice signal is recorded repeatedly by the registering user;
performing speech recognition on the registration voice signal to obtain the registration password text of the registering user then comprises:
performing speech recognition separately on each recorded registration voice signal to obtain multiple recognition results and a recognition likelihood score corresponding to each recognition result;
selecting the recognition result with the highest likelihood score as the registration password text of the registering user.
A voiceprint password authentication system comprises:
a receiving unit, configured to receive, when a user logs in, the voice signal recorded by the login user;
a voiceprint feature extraction unit, configured to extract a voiceprint feature sequence from the voice signal;
a speech recognition unit, configured to perform speech recognition on the voice signal to obtain the password content of the login user;
a judging unit, configured to judge whether the password content obtained by the speech recognition unit is identical to the registration password corresponding to the login user;
an authentication result unit, configured to determine that the login user is not authenticated when the judging unit determines that the password content obtained by the speech recognition unit differs from the registration password text corresponding to the login user;
a model determining unit, configured to determine the background model corresponding to the login user when the judging unit determines that the password content obtained by the speech recognition unit is identical to the registration password text of the login user, the background model comprising a text-independent universal background model and a text-dependent optimized background model;
a first computing unit, configured to compute, respectively, the likelihood of the voiceprint feature sequence against the speaker voiceprint model corresponding to the login user and the likelihood of the voiceprint feature sequence against the background model determined by the model determining unit;
a second computing unit, configured to compute a likelihood ratio from the likelihood of the voiceprint feature sequence against the speaker voiceprint model and the likelihood of the voiceprint feature sequence against the background model;
the judging unit is further configured to judge whether the likelihood ratio computed by the second computing unit is greater than a preset threshold;
the authentication result unit is further configured to determine that the login user is successfully authenticated when the judging unit determines that the likelihood ratio computed by the second computing unit is greater than the preset threshold, and to determine that the login user is not authenticated otherwise.
Preferably, the system further comprises:
a checking unit, configured to check whether an optimized background model corresponding to the registration password text of the login user exists;
the model determining unit is specifically configured to select that optimized background model as the background model corresponding to the login user when the checking unit finds an optimized background model corresponding to the registration password text of the login user, and to select the universal background model as the background model corresponding to the login user otherwise.
Preferably, the speech recognition unit is further configured to write the voice signal recorded by the login user, or the voiceprint feature sequence extracted from it, into the buffer corresponding to the password content of that voice signal;
the receiving unit is further configured to receive a registration voice signal recorded by a registering user;
the speech recognition unit is further configured to perform speech recognition on the registration voice signal to obtain the registration password text of the registering user;
the system further comprises:
a speaker voiceprint model construction unit, configured to train the speaker voiceprint model corresponding to the registering user from the registration voice signal recorded by the registering user;
a background model construction unit, configured to construct or update, in real time, the optimized background model associated with the password content corresponding to each buffer according to the data in that buffer.
Optionally, the background model construction unit is specifically configured to: when the amount of data stored in a buffer reaches a first preset value and no optimized background model associated with the password content corresponding to that buffer currently exists, generate, with the universal background model as the initial model, the optimized background model associated with that password content from the data in the buffer, and delete the data stored in the buffer; and when the amount of data stored in a buffer reaches the first preset value and an optimized background model associated with that password content already exists, update that optimized background model, with it as the initial model, from the data in the buffer, and delete the data stored in the buffer.
Optionally, the background model construction unit is specifically configured to, when the amount of data stored in a buffer reaches an integer multiple of a second preset value, regenerate, with the universal background model as the initial model, the optimized background model associated with the password content corresponding to that buffer from the data in the buffer.
Preferably, the registration voice signal is recorded repeatedly by the registering user;
the speech recognition unit performs speech recognition separately on each recorded registration voice signal to obtain multiple recognition results and a recognition likelihood score corresponding to each recognition result;
the system further comprises:
a password determining unit, configured to select, from the multiple recognition results obtained by the speech recognition unit, the recognition result with the highest likelihood score as the registration password text of the registering user.
With the voiceprint password authentication method and system provided by the embodiments of the invention, identity verification not only performs speech recognition on the voice signal recorded at login to determine its password content, but also performs voiceprint verification on it. Voiceprint verification is based on multiple background models, namely a text-independent universal background model and text-dependent optimized background models; by selecting a suitable background model, accurate matching is achieved and the accuracy of identity authentication based on voiceprint passwords is effectively improved.
Description of drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the voiceprint password authentication method of an embodiment of the invention;
Fig. 2 is a flowchart of constructing the text-independent universal background model in an embodiment of the invention;
Fig. 3 is a flowchart of one way of constructing the text-dependent optimized background model in an embodiment of the invention;
Fig. 4 is a flowchart of performing speech recognition on the registration voice signal recorded by a registering user in an embodiment of the invention;
Fig. 5 is a schematic structural diagram of a voiceprint password authentication system of an embodiment of the invention;
Fig. 6 is another schematic structural diagram of the voiceprint password authentication system of an embodiment of the invention;
Fig. 7 is another schematic structural diagram of the voiceprint password authentication system of an embodiment of the invention.
Detailed description of the embodiments
The technical solutions of the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the described embodiments without creative effort fall within the scope of protection of the invention.
As shown in Fig. 1, the voiceprint password authentication method of an embodiment of the invention comprises the following steps:
Step 101: receive the voice signal recorded by the login user.
Step 102: extract the voiceprint feature sequence from the voice signal.
The voiceprint feature sequence comprises a set of voiceprint features that can effectively distinguish different speakers while remaining relatively stable across variations of the same speaker.
For example, common voiceprint features include spectral envelope parameters, pitch contour, formant frequency and bandwidth, linear prediction coefficients, and cepstral coefficients. Considering the quantifiability of these features, the amount of training data, and the evaluation of system performance, MFCC (Mel Frequency Cepstral Coefficient) features may be used: short-time analysis is performed on each frame of speech, with a 25 ms window and a 10 ms frame shift, to obtain the MFCC parameters and their first- and second-order differences, 39 dimensions in total. In this way, each voice signal can be quantized into a 39-dimensional voiceprint feature sequence X.
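As an illustration, the 39-dimensional feature described here (13 MFCCs plus their first- and second-order differences, 25 ms window, 10 ms shift) could be computed roughly as sketched below; librosa is an assumed third-party choice and is not named in the patent.

```python
import numpy as np
import librosa  # assumed feature-extraction library; the patent does not name one

def extract_voiceprint_features(signal: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Return a (T, 39) voiceprint feature sequence: 13 MFCCs plus their
    first- and second-order differences, 25 ms window, 10 ms frame shift."""
    win = int(0.025 * sr)   # 25 ms analysis window
    hop = int(0.010 * sr)   # 10 ms frame shift
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                                n_fft=win, win_length=win, hop_length=hop)
    d1 = librosa.feature.delta(mfcc)            # first-order difference
    d2 = librosa.feature.delta(mfcc, order=2)   # second-order difference
    return np.vstack([mfcc, d1, d2]).T          # shape (T, 39)

# Quick check on one second of synthetic audio.
X = extract_voiceprint_features(np.random.randn(16000).astype(np.float32))
print(X.shape)  # roughly (101, 39)
```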
Step 103: perform speech recognition on the voice signal to obtain the password content of the login user.
Any existing speech recognition method may be used; the details are not described here.
Step 104: judge whether the obtained password content is identical to the registration password text of the current login user; if so, go to step 105; otherwise, go to step 110.
Step 105: determine the background model corresponding to the login user.
Here, the speaker voiceprint model is used to model the pronunciation characteristics of a registered user on the fixed password text, and the background model is used to model the pronunciation characteristics common to many speakers.
In embodiments of the invention, the speaker voiceprint model can be built from the registration voice signal recorded when the user registers, using existing construction methods. The background model comprises a text-independent universal background model and a text-dependent optimized background model, which can be built in two ways: the text-independent universal background model can be trained offline on multi-speaker data collected in advance (the training process can follow existing methods and is not limited by the embodiments of the invention), while the text-dependent optimized background model can be trained online from the voiceprint feature sequences extracted from the voice signals recorded during user registration and login.
Accordingly, in this step, the background model corresponding to the login user can be selected in different ways as required, which is described in detail later.
Step 106: compute, respectively, the likelihood of the voiceprint feature sequence against the speaker voiceprint model corresponding to the login user and the likelihood of the voiceprint feature sequence against the background model.
The speaker voiceprint model can be trained online from the registration voice signal when the user registers. For example, with the universal background model as the initial model, part of the model parameters are adjusted from a small amount of speaker data by various adaptation methods, such as the commonly used maximum a posteriori (MAP) adaptation algorithm, which adapts the common voiceprint characteristics of the universal model toward the individual characteristics of the current speaker. Of course, the speaker voiceprint model may also be trained in other ways, which is not limited by the embodiments of the invention.
Suppose the extracted voiceprint feature sequence X has T frames. Its likelihood with respect to the background model is:
$$p(X \mid \mathrm{UBM}) = \frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M} c_m\, N(X_t; \mu_m, \Sigma_m) \qquad (1)$$
where $c_m$ is the weight coefficient of the m-th Gaussian, satisfying $\sum_{m=1}^{M} c_m = 1$, and $\mu_m$ and $\Sigma_m$ are the mean and variance of the m-th Gaussian, respectively. $N(\cdot)$ denotes the normal distribution and is used to compute the likelihood of the voiceprint feature vector $X_t$ at time t on a single Gaussian component:
$$N(X_t; \mu_m, \Sigma_m) = \frac{1}{\sqrt{(2\pi)^n \lvert\Sigma_m\rvert}}\, e^{-\frac{1}{2}(X_t-\mu_m)'\,\Sigma_m^{-1}(X_t-\mu_m)} \qquad (2)$$
The likelihood of the voiceprint feature sequence X with respect to the speaker voiceprint model is computed in the same way and is not detailed here.
Step 107: compute the likelihood ratio from the likelihood of the voiceprint feature sequence against the speaker voiceprint model and its likelihood against the background model.
The likelihood ratio is:
$$p = \frac{p(X \mid U)}{p(X \mid \mathrm{UBM})} \qquad (3)$$
where $p(X \mid U)$ is the likelihood of the voiceprint feature sequence against the speaker voiceprint model and $p(X \mid \mathrm{UBM})$ is its likelihood against the background model.
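A compact numpy sketch of equations (1) through (3) follows, assuming diagonal covariances and toy random parameters (the patent does not fix a covariance structure, model size, or any concrete values).

```python
import numpy as np

def gaussian_pdf(X, mu, cov):
    """Equation (2) for every frame of X, with a diagonal covariance vector cov."""
    n = X.shape[1]
    diff = X - mu
    exponent = -0.5 * np.sum(diff * diff / cov, axis=1)
    norm = np.sqrt((2 * np.pi) ** n * np.prod(cov))
    return np.exp(exponent) / norm

def gmm_likelihood(X, weights, means, covs):
    """Equation (1): frame-averaged mixture likelihood of the feature sequence X."""
    per_frame = sum(c * gaussian_pdf(X, mu, cov)
                    for c, mu, cov in zip(weights, means, covs))
    return per_frame.mean()

# Toy example: T=50 frames of 3-dimensional features, 2-component models.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
w = np.array([0.4, 0.6])
mu = rng.standard_normal((2, 3))
cov = np.ones((2, 3))                              # diagonal variances
speaker_lik = gmm_likelihood(X, w, mu + 0.1, cov)  # stand-in speaker voiceprint model
ubm_lik = gmm_likelihood(X, w, mu, cov)            # stand-in background model
print("likelihood ratio (eq. 3):", speaker_lik / ubm_lik)
```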
Step 108: judge whether the likelihood ratio is greater than a preset threshold; if so, go to step 109; otherwise, go to step 110.
The threshold can be preset by the system. In general, the larger the threshold, the more sensitive the system: at login the user must pronounce the recorded voice signal (the password) as closely as possible to the way it was pronounced at registration. Conversely, a smaller threshold lowers the sensitivity and tolerates some variation between the pronunciation at login and the pronunciation at registration.
Step 109: determine that the login user is successfully authenticated.
Step 110: determine that the login user is not authenticated.
It should be noted that, to improve the robustness of the system, noise reduction may also be performed on the voice signal between step 101 and step 102. For example, the continuous voice signal is first divided into independent speech segments and non-speech segments by analyzing its short-time energy and short-time zero-crossing rate; front-end noise reduction then reduces the interference of channel noise and background noise, improving the signal-to-noise ratio and providing a clean signal for subsequent processing.
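A rough sketch of the energy and zero-crossing-rate segmentation mentioned above is shown below; the thresholds and the speech/non-speech decision rule are illustrative assumptions, since the patent does not specify them.

```python
import numpy as np

def segment_speech(signal, sr=16000, frame_ms=25, hop_ms=10,
                   energy_ratio=0.1, zcr_thresh=0.25):
    """Label each frame as speech (True) or non-speech (False) from its
    short-time energy and short-time zero-crossing rate."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energies, zcrs = [], []
    for start in range(0, len(signal) - frame + 1, hop):
        w = signal[start:start + frame]
        energies.append(np.sum(w * w))                          # short-time energy
        zcrs.append(np.mean(np.abs(np.diff(np.sign(w))) > 0))   # zero-crossing rate
    energies, zcrs = np.array(energies), np.array(zcrs)
    energy_thresh = energy_ratio * energies.max()
    # Speech: high energy, or moderate energy with a high zero-crossing rate
    # (keeps weaker unvoiced consonants).
    return (energies > energy_thresh) | (
        (energies > 0.3 * energy_thresh) & (zcrs > zcr_thresh))

# Example: half low-level noise, half louder speech-like signal.
sig = np.concatenate([0.01 * np.random.randn(8000), np.random.randn(8000)])
print(int(segment_speech(sig).sum()), "speech frames detected")
```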
As mentioned above, in embodiments of the invention the background model can comprise a text-independent universal background model and a text-dependent optimized background model, and the background model corresponding to the login user can be selected in different ways as required. For example, during a system initialization phase (for instance, a preset period of time), the text-independent universal background model can be selected, to accommodate the various voiceprint passwords recorded by users; as the system runs, the collected user data associated with specific password texts keeps growing, and optimized background models associated with those password texts can be trained from this data. After that, the corresponding background model can be selected according to the password content of the current login user obtained in step 103. Of course, to simplify the implementation, the corresponding background model may also be selected according to the password content of the current login user from system startup.
The text-independent universal background model can be built with existing methods, for example as a Gaussian mixture model with 1024 or more Gaussians; its parameter training process is shown in Fig. 2.
Step 201: extract voiceprint features from the training speech signals of many speakers; each voiceprint feature serves as a feature vector.
Step 202: cluster the feature vectors with a clustering algorithm to obtain the initial means of K Gaussians, where K is the preset number of mixture components.
For example, the traditional LBG (Linde, Buzo, Gray) clustering algorithm can be used, which approximates the optimal reproduction codebook from the training vector set through an iterative algorithm.
Step 203: iteratively update the means, variances, and the weight coefficient corresponding to each Gaussian with the EM (Expectation Maximization) algorithm to obtain the text-independent universal background model.
The concrete iterative update process is the same as in the prior art and is not detailed here.
Of course, the text-independent universal background model may also be built in other ways, which is not limited by the embodiments of the invention.
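For illustration only, a small-scale stand-in for this training procedure using scikit-learn (an assumed tool, not named in the patent) is sketched below; GaussianMixture initializes with k-means rather than LBG, and a real universal background model would use 1024 or more components and far more multi-speaker data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # assumed tooling; the patent describes LBG + EM

def train_ubm(features: np.ndarray, n_components: int = 64) -> GaussianMixture:
    """Train a GMM universal background model on pooled multi-speaker features.
    init_params="kmeans" plays the role of the codebook initialization of
    step 202; the EM iterations of step 203 refine weights, means, variances."""
    ubm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          init_params="kmeans",
                          max_iter=100,
                          random_state=0)
    ubm.fit(features)
    return ubm

# Toy run on random 39-dimensional "features" pooled from many speakers.
pooled = np.random.randn(5000, 39)
ubm = train_ubm(pooled, n_components=8)
print(ubm.weights_.shape, ubm.means_.shape)  # (8,) (8, 39)
```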
In embodiments of the invention, regardless of whether the user is in login mode or registration mode, the voice signal recorded by the user, or the voiceprint features extracted from it, can be written into the buffer corresponding to the password text recognized from that voice signal, and the optimized background model associated with that password text can be built or updated in real time from the data in the buffer. In this way, data relevant to a specific password text can be collected quickly, so that the optimized background model is refined rapidly and the efficiency and accuracy of voiceprint recognition are improved.
Of course, in practical applications, to reduce the computational load of the system, the optimized background model associated with a password text may also be built or updated only in registration mode or only in login mode; this is not limited by the embodiments of the invention.
Therefore, the flow shown in Fig. 1 may further comprise the following steps: writing the voice signal recorded by the login user, or the voiceprint feature sequence extracted from it, into the buffer corresponding to its password content. During registration: receiving the registration voice signal recorded by the registering user; performing speech recognition on the registration voice signal to obtain the registration password text of the registering user; and writing the registration voice signal, or the voiceprint feature sequence extracted from it, into the buffer corresponding to the password content of that registration voice signal. In addition, the speaker voiceprint model corresponding to the registering user needs to be trained from the registration voice signal recorded by the registering user, and the optimized background model associated with the password content corresponding to each buffer needs to be built or updated in real time from the data in that buffer.
In embodiments of the invention, a corresponding buffer can be established for each password text, with different password texts corresponding to different buffers. A buffer stores the voice signals corresponding to the same password text, or the voiceprint feature sequences extracted from those voice signals. These voice signals include not only the voice signals recorded by login users but also the registration voice signals recorded by registering users; naturally, the voice signals from different users stored in one buffer all correspond to the same password text.
When building or updating in real time the optimized background model associated with the password text corresponding to each buffer, the model may be updated every time new data is added to the buffer. Alternatively, to reduce system overhead and computation, the optimized background model may be built or updated from the data in a buffer only when the data stored in that buffer satisfies a predetermined condition. In practice, there are multiple choices for this condition and for the corresponding way of building or updating the optimized background model, for example:
One way: if the amount of data stored in a buffer reaches a first preset value (such as 500 or 600) and no optimized background model associated with the password text corresponding to that buffer currently exists, generate, with the universal background model as the initial model, the optimized background model associated with that password text from the data in the buffer, and delete the data stored in the buffer; if the amount of data stored in a buffer reaches the first preset value and an optimized background model associated with that password text already exists, update that optimized background model, with it as the initial model, from the data in the buffer, and delete the data stored in the buffer.
In this way, the amount of data used for each construction or update of the optimized background model is the same; when the optimized background model is first built, the initial model is the universal background model, and when it is updated, the initial model is the current optimized background model. Moreover, whether an optimized model is being built or the current one is being updated, the data in the corresponding buffer is cleared afterwards so that the next batch of data can be collected. This reduces the demand on buffer storage space.
Another way: if the amount of data stored in a buffer reaches an integer multiple of a second preset value (such as 500 or 600), regenerate, with the universal background model as the initial model, the optimized background model associated with the password text corresponding to that buffer from the data in the buffer.
In this way, the amount of data used differs between successive constructions or updates of the optimized background model, and whether the model is being built or updated, the initial model is always the universal background model. Moreover, the data in the corresponding buffer does not have to be cleared after each construction or update, but the demand on cache space is larger, so this approach suits environments with massive cache space. Of course, a treatment similar to the first way can also be adopted: when the amount of data in the buffer reaches a certain level (such as 50000), the data in the buffer is cleared; to preserve the characteristics of the optimized background model, when the amount of data in the buffer next reaches the second preset value, the update is performed with the current optimized background model, rather than the universal background model, as the initial model; and when the data in the buffer subsequently reaches the update condition again, updates revert to using the universal background model as the initial model.
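The two buffer-triggered policies described above can be sketched as follows; the thresholds are illustrative, and train_fn stands in for the actual model construction or adaptation routine (the MAP-style mean update shown with Fig. 3).

```python
class PasswordBufferManager:
    """Per-password-text buffers with the two update policies described above."""

    def __init__(self, ubm, train_fn, first_preset=500, mode="clear_after_update"):
        self.ubm = ubm                  # text-independent universal background model
        self.train_fn = train_fn        # train_fn(data, initial_model) -> model (stand-in)
        self.first_preset = first_preset
        self.mode = mode
        self.buffers = {}               # password text -> list of feature sequences
        self.optimized = {}             # password text -> optimized background model

    def add(self, password_text, feature_sequence):
        buf = self.buffers.setdefault(password_text, [])
        buf.append(feature_sequence)
        if self.mode == "clear_after_update":
            # Policy 1: every first_preset items, build (from the UBM) or update
            # (from the current optimized model), then clear the buffer.
            if len(buf) >= self.first_preset:
                initial = self.optimized.get(password_text, self.ubm)
                self.optimized[password_text] = self.train_fn(buf, initial)
                buf.clear()
        else:
            # Policy 2: at every integer multiple of first_preset, rebuild from
            # the UBM using all buffered data; the buffer is kept.
            if len(buf) % self.first_preset == 0:
                self.optimized[password_text] = self.train_fn(buf, self.ubm)

    def background_model_for(self, password_text):
        # Selection rule of step 105: optimized model if one exists, else the UBM.
        return self.optimized.get(password_text, self.ubm)


# Usage with trivial stand-ins for the UBM and the training routine.
mgr = PasswordBufferManager(ubm="UBM", first_preset=3,
                            train_fn=lambda data, init: f"model({len(data)})")
for _ in range(4):
    mgr.add("open sesame", feature_sequence=[0.0])
print(mgr.background_model_for("open sesame"))   # model(3): built after 3 items
print(mgr.background_model_for("other text"))    # UBM: no optimized model yet
```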
As shown in Fig. 3, one flow of building or updating the optimized background model in an embodiment of the invention comprises the following steps:
Step 301: adaptively update the means $\mu_m$ of the universal background model's Gaussian mixtures using all the voiceprint feature sequences in the buffer.
Specifically, the new Gaussian mean $\hat{\mu}_m$ is computed as a weighted average of the sample statistics and the original Gaussian mean, that is:
$$\hat{\mu}_m = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T_i}\gamma_m(x_t)\,x_t + \tau\,\mu_m}{\sum_{i=1}^{N}\sum_{t=1}^{T_i}\gamma_m(x_t) + \tau} \qquad (4)$$
where N is the total number of voiceprint feature sequences, $T_i$ is the total frame length of the i-th voiceprint feature sequence, $x_t$ denotes the voiceprint feature of frame t, $\gamma_m(x_t)$ denotes the probability that the voiceprint feature of frame t falls within the m-th Gaussian, and $\tau$ is a forgetting factor that balances the historical mean against the update strength of the new samples. In general, the larger the value of $\tau$, the more the new mean is constrained by the original mean; if $\tau$ is small, the new mean is determined mainly by the sample statistics and better reflects the distribution of the new samples. The value of $\tau$ can be preset by the system, or a parameter value that gradually changes over time can be chosen to continually increase the influence of the new sample data.
Step 302: copy the variances of the universal background model as the variances of the optimized background model associated with the password text.
Step 303: generate the optimized background model associated with the password text.
The process of updating the optimized background model associated with a registration password text from the data in a buffer is similar to the above and is not repeated here.
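A numpy sketch of the mean-only adaptation of equation (4) is given below; the component count, dimensionality, and the value of the forgetting factor tau are illustrative. The same kind of routine also underlies the MAP adaptation of the speaker voiceprint model mentioned under step 106.

```python
import numpy as np

def map_adapt_means(sequences, weights, means, covs, tau=16.0):
    """Equation (4): new means are a weighted average of data statistics and the
    original UBM means; weights and (diagonal) variances are copied unchanged
    (steps 302 and 303)."""
    M, dim = means.shape
    num = tau * means.copy()                 # tau * mu_m
    den = np.full(M, tau)                    # tau
    for X in sequences:                      # each X: (T_i, dim) feature sequence
        # Responsibilities gamma_m(x_t) under the current (UBM) parameters.
        diff = X[:, None, :] - means[None, :, :]                    # (T, M, dim)
        log_gauss = (-0.5 * np.sum(diff * diff / covs, axis=2)
                     - 0.5 * np.sum(np.log(2 * np.pi * covs), axis=1))
        log_post = np.log(weights) + log_gauss
        log_post -= log_post.max(axis=1, keepdims=True)
        gamma = np.exp(log_post)
        gamma /= gamma.sum(axis=1, keepdims=True)                   # (T, M)
        num += gamma.T @ X                   # sum_t gamma_m(x_t) * x_t
        den += gamma.sum(axis=0)             # sum_t gamma_m(x_t)
    return num / den[:, None]

# Toy run: adapt a 4-component, 39-dimensional model toward two buffered sequences.
rng = np.random.default_rng(1)
w = np.full(4, 0.25)
mu = rng.standard_normal((4, 39))
cov = np.ones((4, 39))
new_mu = map_adapt_means([rng.standard_normal((80, 39)) for _ in range(2)], w, mu, cov)
print(new_mu.shape)  # (4, 39)
```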
It should be noted that, in embodiments of the invention, the registration voice signal may be recorded once by the registering user, or recorded repeatedly several times, to ensure the accuracy of the registration password.
If it is recorded repeatedly, then when the registration password text of the registering user is determined by speech recognition, speech recognition can be performed separately on each recorded registration voice signal to obtain multiple recognition results and a recognition likelihood score corresponding to each recognition result; the recognition result with the highest likelihood score is then selected as the registration password text of the registering user.
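This selection step amounts to a simple argmax over the recognized texts, as in the short sketch below (the example texts and scores are made up).

```python
def pick_registration_password(recognition_results):
    """Given (recognized_text, likelihood_score) pairs from the repeated
    enrollment recordings, keep the text with the highest score."""
    text, _ = max(recognition_results, key=lambda r: r[1])
    return text

# Example with three assumed enrollment passes of the same spoken password.
print(pick_registration_password([("open sesame", -512.3),
                                  ("open says me", -530.9),
                                  ("open sesame", -508.7)]))  # "open sesame"
```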
This is briefly explained below in conjunction with the concrete speech recognition process.
Suppose the system supports arbitrary user-defined password content. As shown in Fig. 4, performing speech recognition on the registration voice signal recorded by the registering user comprises the following steps:
Step 401: obtain the voice signal that currently needs to be recognized.
Step 402: extract an acoustic feature sequence from the voice signal.
Step 403: search for the optimal path corresponding to the acoustic feature sequence obtained in step 402 in the search network of large-vocabulary continuous speech recognition, and record the accumulated historical probability of the path (the likelihood score mentioned above). The concrete process is similar to the prior art and is not detailed here.
Considering that there are too many Chinese characters, which easily leads to excessive memory usage, smaller speech units can be chosen for modeling each character, such as the roughly 400 toneless syllables or the roughly 1300 tonal syllables, and the search network is built accordingly.
It should be noted that, in embodiments of the invention, a range of candidate password texts may also be preset, such as common Chinese idioms or common passwords, for users to choose from. In this case, the speech recognition of the registration voice signal can be performed in command-word recognition mode according to the candidate password texts (that is, the above search network is built from them) to improve decoding efficiency.
Of course, in practical applications, the password text may also be selected or customized by the user.
It should also be noted that if the registering user records the registration voice signal several times at registration, each recorded registration voice signal, or the voiceprint feature sequence extracted from it, can also be written into the storage area corresponding to the password text of that voice signal, to increase the user data for that password text and to provide sufficient data for refining the background model associated with that password text.
With the voiceprint password authentication method provided by the embodiments of the invention, identity verification not only performs speech recognition on the voice signal recorded at login to determine its password content, but also performs voiceprint verification on it. Voiceprint verification is based on multiple background models, namely a text-independent universal background model and text-dependent optimized background models; by selecting a suitable background model, accurate matching is achieved and the accuracy of identity authentication based on voiceprint passwords is effectively improved.
In embodiments of the invention, user registration and login data are used to train the optimized background models, so that the system is continually refined from the initial single universal background model into multiple background models corresponding to different password texts. This provides more targeted background models for the different passwords of users, improves the discriminability between models, and thereby improves the accuracy and efficiency of recognition.
Correspondingly, an embodiment of the invention also provides a voiceprint password authentication system; Fig. 5 shows one schematic structural diagram of this system.
In this embodiment, the voiceprint password authentication system comprises:
a receiving unit 501, configured to receive, when a user logs in, the voice signal recorded by the login user;
a voiceprint feature extraction unit 502, configured to extract a voiceprint feature sequence from the voice signal;
the voiceprint feature sequence comprises a set of voiceprint features that can effectively distinguish different speakers while remaining relatively stable across variations of the same speaker. For example, common voiceprint features include spectral envelope parameters, pitch contour, formant frequency and bandwidth, linear prediction coefficients, and cepstral coefficients; considering the quantifiability of these features, the amount of training data, and the evaluation of system performance, MFCC (Mel Frequency Cepstral Coefficient) features may be used: short-time analysis is performed on each frame of speech, with a 25 ms window and a 10 ms frame shift, to obtain the MFCC parameters and their first- and second-order differences, 39 dimensions in total, so that each voice signal can be quantized into a 39-dimensional voiceprint feature sequence X;
a speech recognition unit 503, configured to perform speech recognition on the voice signal to obtain the password content of the login user; any existing speech recognition method may be used and is not detailed here;
a judging unit 504, configured to judge whether the password content obtained by the speech recognition unit 503 is identical to the registration password corresponding to the login user;
an authentication result unit 505, configured to determine that the login user is not authenticated when the judging unit 504 determines that the password content obtained by the speech recognition unit 503 differs from the registration password text of the login user;
a model determining unit 506, configured to determine the background model corresponding to the login user when the judging unit 504 determines that the password content obtained by the speech recognition unit 503 is identical to the registration password text of the login user, the background model comprising a text-independent universal background model and a text-dependent optimized background model; in practical applications, the model determining unit 506 can determine the background model corresponding to the login user in different ways as required, as described above;
a first computing unit 507, configured to compute, respectively, the likelihood of the voiceprint feature sequence against the speaker voiceprint model corresponding to the login user and the likelihood of the voiceprint feature sequence against the background model;
a second computing unit 508, configured to compute a likelihood ratio from the likelihood of the voiceprint feature sequence against the speaker voiceprint model and the likelihood of the voiceprint feature sequence against the background model.
The concrete computation processes of the first computing unit 507 and the second computing unit 508 can refer to the description of the method embodiment above and are not detailed here.
In this embodiment, the judging unit 504 is further configured to judge whether the likelihood ratio computed by the second computing unit 508 is greater than a preset threshold; correspondingly, the authentication result unit 505 is further configured to determine that the login user is successfully authenticated when the judging unit 504 determines that the likelihood ratio computed by the second computing unit 508 is greater than the preset threshold, and to determine that the login user is not authenticated otherwise.
The threshold can be preset by the system. In general, the larger the threshold, the more sensitive the system: at login the user must pronounce the recorded voice signal (the password) as closely as possible to the way it was pronounced at registration. Conversely, a smaller threshold lowers the sensitivity and tolerates some variation between the pronunciation at login and the pronunciation at registration.
Fig. 6 shows another schematic structural diagram of the voiceprint password authentication system of an embodiment of the invention.
Different from the embodiment shown in Fig. 5, in this embodiment the system further comprises:
a checking unit 601, configured to check whether an optimized background model corresponding to the registration password text of the login user exists.
Correspondingly, the model determining unit 506 can select that optimized background model as the background model corresponding to the login user when the checking unit 601 finds an optimized background model corresponding to the registration password text of the login user, and select the universal background model as the background model corresponding to the login user otherwise.
Of course, in the voiceprint password authentication system of an embodiment of the invention, the model determining unit 506 can also select the background model corresponding to the login user in other ways as required. For example, during a system initialization phase (for instance, a preset period of time), the text-independent universal background model can be selected to accommodate the various voiceprint passwords recorded by users; as the system runs, the collected user data associated with a specific password keeps growing, and an optimized background model associated with the text can be trained from this data. This optimized background model is associated with the user's password text; after that, the corresponding background model can be selected according to the password content of the current login user.
Fig. 7 shows another schematic structural diagram of the voiceprint password authentication system of an embodiment of the invention.
Different from the embodiment shown in Fig. 6, in this embodiment the system further comprises a background model construction unit 701 and a speaker voiceprint model construction unit 702.
In addition, in this embodiment the speech recognition unit 503 is further configured to write the voice signal recorded by the login user, or the voiceprint feature sequence extracted from it, into the buffer corresponding to its password content.
The receiving unit 501 is further configured to receive the registration voice signal recorded by the registering user; correspondingly, the speech recognition unit 503 is further configured to perform speech recognition on the registration voice signal to obtain the registration password text of the registering user.
The background model construction unit 701 is configured to construct or update, in real time, the optimized background model associated with the password content corresponding to each buffer according to the data in that buffer.
The speaker voiceprint model construction unit 702 is configured to train the speaker voiceprint model corresponding to the registering user from the registration voice signal recorded by the registering user.
Of course, in practical applications, the voiceprint feature extraction unit 502 may instead write the voice signal (including the voice signal recorded by the login user and the registration voice signal recorded by the registering user) into the buffer corresponding to the password content recognized by the speech recognition unit 503; this is not limited by the embodiments of the invention.
In the system of the embodiments of the invention, a corresponding buffer can be established for each password text, with different password texts corresponding to different buffers. A buffer stores the voice signals corresponding to the same password text, or the voiceprint feature sequences extracted from those voice signals. These voice signals include not only the voice signals recorded by login users but also the registration voice signals recorded by registering users; naturally, the voice signals from different users stored in one buffer all correspond to the same password text.
When the background model construction unit 701 builds or updates in real time the optimized background model associated with the password content corresponding to each buffer from the data in that buffer, it may update the model every time new data is added to the buffer. Alternatively, to reduce system overhead and computation, the corresponding optimized background model may be built or updated from the data in a buffer only after the data stored in that buffer satisfies a predetermined condition. In practice, there are multiple choices for this condition and for the corresponding way of building or updating the optimized background model. For example, in one implementation, the background model construction unit 701 can, when the amount of data stored in a buffer reaches a first preset value and no optimized background model associated with the password content corresponding to that buffer currently exists, generate, with the universal background model as the initial model, the optimized background model associated with that password content from the data in the buffer, and delete the data stored in the buffer; and when the amount of data stored in a buffer reaches the first preset value and an optimized background model associated with that password content already exists, update that optimized background model, with it as the initial model, from the data in the buffer, and delete the data stored in the buffer.
In another implementation, the background model construction unit 701 can, when the amount of data stored in a buffer reaches an integer multiple of a second preset value, regenerate, with the universal background model as the initial model, the optimized background model associated with the password content corresponding to that buffer from the data in the buffer.
The concrete process by which the background model construction unit 701 builds or updates the optimized background model associated with a password text in the above two implementations can refer to the description of the method embodiment above and is not repeated here.
It should be noted that, in practical applications, the registration voice signal may be recorded once by the registering user, or recorded repeatedly several times. If it is recorded repeatedly, the speech recognition unit 503 can perform speech recognition separately on each recorded registration voice signal to obtain multiple recognition results and a recognition likelihood score corresponding to each recognition result.
Correspondingly, the system may further comprise a password determining unit (not shown), configured to select, from the multiple recognition results obtained by the speech recognition unit 503, the recognition result with the highest likelihood score as the registration password text of the registering user. The concrete process can refer to the description above and is not repeated here.
With the voiceprint password authentication system provided by the embodiments of the invention, identity verification not only performs speech recognition on the voice signal recorded at login to determine its password content, but also performs voiceprint verification on it. Voiceprint verification is based on multiple background models, namely a text-independent universal background model and text-dependent optimized background models; by selecting a suitable background model, accurate matching is achieved and the accuracy of identity authentication based on voiceprint passwords is effectively improved.
In embodiments of the invention, user registration and login data are used to train the optimized background models, so that the system is continually refined from the initial single universal background model into multiple background models corresponding to different password texts. This provides more targeted background models for the different passwords of users, improves the discriminability between models, and thereby improves the accuracy and efficiency of recognition.
The embodiments in this specification are described in a progressive manner; for identical or similar parts the embodiments can refer to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts can refer to the description of the method embodiments. The system embodiments described above are only schematic; the units and modules described as separate components may or may not be physically separate. In addition, some or all of the units and modules may be selected as needed to achieve the purpose of the embodiments. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
The above discloses only preferred implementations of the invention, but the invention is not limited thereto. Any non-creative variations conceivable by those skilled in the art, and any improvements and modifications made without departing from the principle of the invention, fall within the scope of protection of the invention.

Claims (12)

1. A voiceprint password authentication method, characterized by comprising:
receiving a voice signal recorded by a login user;
extracting a voiceprint feature sequence from the voice signal;
performing speech recognition on the voice signal to obtain the password content of the login user;
if the obtained password content is different from the registration password text corresponding to the login user, determining that the login user is a non-authenticated user;
if the obtained password content is identical to the registration password text corresponding to the login user,
determining a background model corresponding to the login user, the background model comprising a text-independent universal background model and a text-dependent optimized background model;
respectively calculating a likelihood between the voiceprint feature sequence and a speaker voiceprint model corresponding to the login user and a likelihood between the voiceprint feature sequence and the background model;
calculating a likelihood ratio according to the likelihood between the voiceprint feature sequence and the speaker voiceprint model and the likelihood between the voiceprint feature sequence and the background model; and
if the likelihood ratio is greater than a preset threshold, determining that the login user is a validly authenticated user; otherwise, determining that the login user is a non-authenticated user.
2. The method according to claim 1, characterized in that determining the background model corresponding to the login user comprises:
if there is an optimized background model corresponding to the password text of the login user, selecting that optimized background model as the background model corresponding to the login user; otherwise, selecting the universal background model as the background model corresponding to the login user.
3. The method according to claim 1, characterized in that the method further comprises:
writing the voice signal recorded by the login user, or the voiceprint feature sequence extracted therefrom, into a buffer area corresponding to the password text associated with that voice signal;
receiving a registration voice signal recorded by a registered user;
performing speech recognition on the registration voice signal to obtain the registration password text of the registered user;
writing the registration voice signal, or the voiceprint feature sequence extracted therefrom, into a buffer area corresponding to the password text associated with that registration voice signal;
training a speaker voiceprint model corresponding to the registered user according to the registration voice signal recorded by the registered user; and
constructing or updating, in real time according to the data in each buffer area, the optimized background model associated with the password text corresponding to that buffer area.
4. The method according to claim 3, characterized in that constructing or updating, in real time according to the data in each buffer area, the optimized background model associated with the password text corresponding to that buffer area comprises:
if the amount of data stored in a buffer area reaches a first preset value and there is currently no optimized background model associated with the password text corresponding to that buffer area, taking the universal background model as the initial model, generating the optimized background model associated with the password text corresponding to that buffer area according to the data in the buffer area, and deleting the data stored in the buffer area; and if the amount of data stored in a buffer area reaches the first preset value and there is currently an optimized background model associated with the password text corresponding to that buffer area, taking that optimized background model as the initial model, updating the optimized background model according to the data in the buffer area, and deleting the data stored in the buffer area.
5. The method according to claim 3, characterized in that constructing or updating, in real time according to the data in each buffer area, the optimized background model associated with the password text corresponding to that buffer area comprises:
if the amount of data stored in a buffer area reaches an integral multiple of a second preset value, taking the universal background model as the initial model and regenerating, according to the data in the buffer area, the optimized background model associated with the password text corresponding to that buffer area.
6. The method according to any one of claims 3 to 5, characterized in that the registration voice signal of the registered user is recorded repeatedly; and
performing speech recognition on the registration voice signal to obtain the registration password text of the registered user comprises:
performing speech recognition on each recorded registration voice signal respectively to obtain a plurality of recognition results and a recognition likelihood score corresponding to each recognition result; and
selecting the recognition result with the highest likelihood score as the registration password text of the registered user.
7. A voiceprint password authentication system, characterized by comprising:
a receiving unit, configured to receive, when a user logs in, a voice signal recorded by the login user;
a voiceprint feature extraction unit, configured to extract a voiceprint feature sequence from the voice signal;
a voice recognition unit, configured to perform speech recognition on the voice signal to obtain the password content of the login user;
a judging unit, configured to judge whether the password content obtained by the voice recognition unit is identical to the registration password corresponding to the login user;
an authentication result unit, configured to determine that the login user is a non-authenticated user when the judging unit judges that the password content obtained by the voice recognition unit is not identical to the registration password text corresponding to the login user;
a model determining unit, configured to determine a background model corresponding to the login user when the judging unit judges that the password content obtained by the voice recognition unit is identical to the registration password text of the login user, the background model comprising a text-independent universal background model and a text-dependent optimized background model;
a first computing unit, configured to respectively calculate a likelihood between the voiceprint feature sequence and a speaker voiceprint model corresponding to the login user and a likelihood between the voiceprint feature sequence and the background model determined by the model determining unit; and
a second computing unit, configured to calculate a likelihood ratio according to the likelihood between the voiceprint feature sequence and the speaker voiceprint model and the likelihood between the voiceprint feature sequence and the background model;
wherein the judging unit is further configured to judge whether the likelihood ratio calculated by the second computing unit is greater than a preset threshold; and
the authentication result unit is further configured to determine that the login user is a validly authenticated user when the judging unit judges that the likelihood ratio calculated by the second computing unit is greater than the preset threshold, and otherwise determine that the login user is a non-authenticated user.
8. The system according to claim 7, characterized in that the system further comprises:
an inspection unit, configured to check whether there is an optimized background model corresponding to the registration password text of the login user;
wherein the model determining unit is specifically configured to select that optimized background model as the background model corresponding to the login user when the check result of the inspection unit is that there is an optimized background model corresponding to the registration password text of the login user, and otherwise select the universal background model as the background model corresponding to the login user.
9. The system according to claim 8, characterized in that:
the voice recognition unit is further configured to write the voice signal recorded by the login user, or the voiceprint feature sequence extracted therefrom, into a buffer area corresponding to the password text associated with that voice signal;
the receiving unit is further configured to receive a registration voice signal recorded by a registered user;
the voice recognition unit is further configured to perform speech recognition on the registration voice signal to obtain the registration password text of the registered user; and
the system further comprises:
a speaker voiceprint model construction unit, configured to train a speaker voiceprint model corresponding to the registered user according to the registration voice signal recorded by the registered user; and
a background model construction unit, configured to construct or update, in real time according to the data in each buffer area, the optimized background model associated with the password text corresponding to that buffer area.
10. The system according to claim 9, characterized in that:
the background model construction unit is specifically configured to: when the amount of data stored in a buffer area reaches a first preset value and there is currently no optimized background model associated with the password text corresponding to that buffer area, take the universal background model as the initial model, generate the optimized background model associated with the password text corresponding to that buffer area according to the data in the buffer area, and delete the data stored in the buffer area; and when the amount of data stored in a buffer area reaches the first preset value and there is currently an optimized background model associated with the password text corresponding to that buffer area, take that optimized background model as the initial model, update the optimized background model according to the data in the buffer area, and delete the data stored in the buffer area.
11. The system according to claim 9, characterized in that:
the background model construction unit is specifically configured to: when the amount of data stored in a buffer area reaches an integral multiple of a second preset value, take the universal background model as the initial model and regenerate, according to the data in the buffer area, the optimized background model associated with the password text corresponding to that buffer area.
12. The system according to any one of claims 9 to 11, characterized in that the registration voice signal of the registered user is recorded repeatedly;
the voice recognition unit performs speech recognition on each recorded registration voice signal respectively to obtain a plurality of recognition results and a recognition likelihood score corresponding to each recognition result; and
the system further comprises:
a password determining unit, configured to select, from the plurality of recognition results obtained by the voice recognition unit, the recognition result with the highest likelihood score as the registration password text of the registered user.
CN2011102180429A 2011-08-01 2011-08-01 Voiceprint password authentication method and system Active CN102238189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011102180429A CN102238189B (en) 2011-08-01 2011-08-01 Voiceprint password authentication method and system

Publications (2)

Publication Number Publication Date
CN102238189A true CN102238189A (en) 2011-11-09
CN102238189B CN102238189B (en) 2013-12-11

Family

ID=44888394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011102180429A Active CN102238189B (en) 2011-08-01 2011-08-01 Voiceprint password authentication method and system

Country Status (1)

Country Link
CN (1) CN102238189B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1526505A1 (en) * 2003-10-24 2005-04-27 Aruze Corp. Vocal print authentication system and vocal print authentication program
CN1547191A (en) * 2003-12-12 2004-11-17 北京大学 Semantic and sound groove information combined speaking person identity system
US7386448B1 (en) * 2004-06-24 2008-06-10 T-Netix, Inc. Biometric voice authentication
CN101124623A (en) * 2005-02-18 2008-02-13 富士通株式会社 Voice authentication system
US20090171660A1 (en) * 2007-12-20 2009-07-02 Kabushiki Kaisha Toshiba Method and apparatus for verification of speaker authentification and system for speaker authentication

Cited By (62)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102510426A (en) * 2011-11-29 2012-06-20 安徽科大讯飞信息科技股份有限公司 Personal assistant application access method and system
US9424837B2 (en) 2012-01-24 2016-08-23 Auraya Pty Ltd Voice authentication and speech recognition system and method
CN104185868B (en) * 2012-01-24 2017-08-22 澳尔亚有限公司 Authentication voice and speech recognition system and method
CN103456304A (en) * 2012-05-31 2013-12-18 新加坡科技研究局 Method and system for dual scoring for text-dependent speaker verification
CN103456304B (en) * 2012-05-31 2018-06-01 新加坡科技研究局 For the dual methods of marking and system with the relevant speaker verification of text
CN103685185A (en) * 2012-09-14 2014-03-26 上海掌门科技有限公司 Mobile equipment voiceprint registration and authentication method and system
CN103685185B (en) * 2012-09-14 2018-04-27 上海果壳电子有限公司 Mobile equipment voiceprint registration, the method and system of certification
CN103035247B (en) * 2012-12-05 2017-07-07 北京三星通信技术研究有限公司 Based on the method and device that voiceprint is operated to audio/video file
CN103035247A (en) * 2012-12-05 2013-04-10 北京三星通信技术研究有限公司 Method and device of operation on audio/video file based on voiceprint information
CN104021790A (en) * 2013-02-28 2014-09-03 联想(北京)有限公司 Sound control unlocking method and electronic device
WO2015081681A1 (en) * 2013-12-03 2015-06-11 Tencent Technology (Shenzhen) Company Limited Systems and methods for audio command recognition
US10013985B2 (en) 2013-12-03 2018-07-03 Tencent Technology (Shenzhen) Company Limited Systems and methods for audio command recognition with speaker authentication
CN104765996A (en) * 2014-01-06 2015-07-08 讯飞智元信息科技有限公司 Voiceprint authentication method and system
CN104765996B (en) * 2014-01-06 2018-04-27 讯飞智元信息科技有限公司 Voiceprint password authentication method and system
WO2015106728A1 (en) * 2014-01-20 2015-07-23 Tencent Technology (Shenzhen) Company Limited Data processing method and system
CN107077848B (en) * 2014-09-18 2020-12-25 纽昂斯通讯公司 Method, computer system and program product for performing speaker recognition
CN107077848A (en) * 2014-09-18 2017-08-18 纽昂斯通讯公司 Method and apparatus for performing Speaker Identification
US10540980B2 (en) 2015-02-05 2020-01-21 Beijing D-Ear Technologies Co., Ltd. Dynamic security code speech-based identity authentication system and method having self-learning function
CN104616655A (en) * 2015-02-05 2015-05-13 清华大学 Automatic vocal print model reconstruction method and device
CN104616655B (en) * 2015-02-05 2018-01-16 北京得意音通技术有限责任公司 The method and apparatus of sound-groove model automatic Reconstruction
WO2016150369A1 (en) * 2015-03-24 2016-09-29 中兴通讯股份有限公司 Method and device for recording and recognising voice password
CN104901807B (en) * 2015-04-07 2019-03-26 河南城建学院 A kind of vocal print cryptographic methods can be used for low side chip
CN104901807A (en) * 2015-04-07 2015-09-09 合肥芯动微电子技术有限公司 Vocal print password method available for low-end chip
CN104734858B (en) * 2015-04-17 2018-01-09 黑龙江中医药大学 The USB identity authorization systems and method for the anti-locking that data are identified
CN104734858A (en) * 2015-04-17 2015-06-24 黑龙江中医药大学 Anti-lock USB (universal serial bus) identity authentication system and anti-lock USB identity authentication method by means of recognizing data
CN104795068B (en) * 2015-04-28 2018-08-17 深圳市锐曼智能装备有限公司 The wake-up control method and its control system of robot
CN104795068A (en) * 2015-04-28 2015-07-22 深圳市锐曼智能装备有限公司 Robot awakening control method and robot awakening control system
CN106302339A (en) * 2015-05-25 2017-01-04 腾讯科技(深圳)有限公司 Login validation method and device, login method and device
CN106373575A (en) * 2015-07-23 2017-02-01 阿里巴巴集团控股有限公司 Method, device and system for constructing user voiceprint model
US11043223B2 (en) 2015-07-23 2021-06-22 Advanced New Technologies Co., Ltd. Voiceprint recognition model construction
CN106373575B (en) * 2015-07-23 2020-07-21 阿里巴巴集团控股有限公司 User voiceprint model construction method, device and system
US10714094B2 (en) 2015-07-23 2020-07-14 Alibaba Group Holding Limited Voiceprint recognition model construction
WO2017012496A1 (en) * 2015-07-23 2017-01-26 阿里巴巴集团控股有限公司 User voiceprint model construction method, apparatus, and system
JP2018527609A (en) * 2015-07-23 2018-09-20 アリババ グループ ホウルディング リミテッド Method, apparatus and system for building user voiceprint model
CN105225664A (en) * 2015-09-24 2016-01-06 百度在线网络技术(北京)有限公司 The generation method and apparatus of Information Authentication method and apparatus and sample sound
CN105225664B (en) * 2015-09-24 2019-12-06 百度在线网络技术(北京)有限公司 Information verification method and device and sound sample generation method and device
CN107046517A (en) * 2016-02-05 2017-08-15 阿里巴巴集团控股有限公司 A kind of method of speech processing, device and intelligent terminal
CN106100846B (en) * 2016-06-02 2019-05-03 百度在线网络技术(北京)有限公司 Voiceprint registration, authentication method and device
CN106100846A (en) * 2016-06-02 2016-11-09 百度在线网络技术(北京)有限公司 Voiceprint registration, authentication method and device
WO2017215558A1 (en) * 2016-06-12 2017-12-21 腾讯科技(深圳)有限公司 Voiceprint recognition method and device
CN108023856A (en) * 2016-11-01 2018-05-11 中国移动通信有限公司研究院 A kind of method and device of information sharing
CN108023856B (en) * 2016-11-01 2020-10-16 中国移动通信有限公司研究院 Information sharing method and device
CN106782564B (en) * 2016-11-18 2018-09-11 百度在线网络技术(北京)有限公司 Method and apparatus for handling voice data
CN106789957A (en) * 2016-11-30 2017-05-31 无锡小天鹅股份有限公司 The voice login method and its smart machine of laundry applications
CN106782572A (en) * 2017-01-22 2017-05-31 清华大学 The authentication method and system of speech cipher
CN106782572B (en) * 2017-01-22 2020-04-07 清华大学 Voice password authentication method and system
CN107426143A (en) * 2017-03-09 2017-12-01 福建省汽车工业集团云度新能源汽车股份有限公司 The quick accessing method of user vehicle and device based on Application on Voiceprint Recognition
CN106921668A (en) * 2017-03-09 2017-07-04 福建省汽车工业集团云度新能源汽车股份有限公司 User vehicle fast verification method and device based on Application on Voiceprint Recognition
CN107105010A (en) * 2017-03-23 2017-08-29 福建省汽车工业集团云度新能源汽车股份有限公司 The quick accessing method of user vehicle and device based on GPS position information
CN107105010B (en) * 2017-03-23 2020-02-07 福建省汽车工业集团云度新能源汽车股份有限公司 Automobile user rapid login method and device based on GPS (global positioning system) position information
CN107221331A (en) * 2017-06-05 2017-09-29 深圳市讯联智付网络有限公司 A kind of personal identification method and equipment based on vocal print
CN107690684A (en) * 2017-08-22 2018-02-13 福建联迪商用设备有限公司 A kind of cashier's machine user management method and terminal
WO2019036904A1 (en) * 2017-08-22 2019-02-28 福建联迪商用设备有限公司 Cash register user management method and terminal
CN111566729A (en) * 2017-12-26 2020-08-21 罗伯特·博世有限公司 Speaker identification with ultra-short speech segmentation for far-field and near-field sound assistance applications
CN111566729B (en) * 2017-12-26 2024-05-28 罗伯特·博世有限公司 Speaker identification with super-phrase voice segmentation for far-field and near-field voice assistance applications
CN109346086A (en) * 2018-10-26 2019-02-15 平安科技(深圳)有限公司 Method for recognizing sound-groove, device, computer equipment and computer readable storage medium
CN110364168A (en) * 2019-07-22 2019-10-22 南京拓灵智能科技有限公司 A kind of method for recognizing sound-groove and system based on environment sensing
CN110364168B (en) * 2019-07-22 2021-09-14 北京拓灵新声科技有限公司 Voiceprint recognition method and system based on environment perception
CN111145758A (en) * 2019-12-25 2020-05-12 厦门快商通科技股份有限公司 Voiceprint recognition method, system, mobile terminal and storage medium
CN111554307A (en) * 2020-05-20 2020-08-18 浩云科技股份有限公司 Voiceprint acquisition registration method and device
CN112233679A (en) * 2020-10-10 2021-01-15 安徽讯呼信息科技有限公司 Artificial intelligence speech recognition system
CN112233679B (en) * 2020-10-10 2024-02-13 安徽讯呼信息科技有限公司 Artificial intelligence speech recognition system

Also Published As

Publication number Publication date
CN102238189B (en) 2013-12-11

Similar Documents

Publication Publication Date Title
CN102238189B (en) Voiceprint password authentication method and system
CN102238190B (en) Identity authentication method and system
JP7362851B2 (en) Neural network for speaker verification
US7813927B2 (en) Method and apparatus for training a text independent speaker recognition system using speech data with text labels
EP1989701B1 (en) Speaker authentication
CN104900235B (en) Method for recognizing sound-groove based on pitch period composite character parameter
JP6303971B2 (en) Speaker change detection device, speaker change detection method, and computer program for speaker change detection
CN111418009A (en) Personalized speaker verification system and method
JP5106371B2 (en) Method and apparatus for verification of speech authentication, speaker authentication system
US20090119103A1 (en) Speaker recognition system
CN102509547A (en) Method and system for voiceprint recognition based on vector quantization based
US20230401338A1 (en) Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium
CN104765996A (en) Voiceprint authentication method and system
Kannadaguli et al. A comparison of Gaussian mixture modeling (GMM) and hidden Markov modeling (HMM) based approaches for automatic phoneme recognition in Kannada
Sturim et al. Classification methods for speaker recognition
Trabelsi et al. A multi level data fusion approach for speaker identification on telephone speech
JP3920749B2 (en) Acoustic model creation method for speech recognition, apparatus thereof, program thereof and recording medium thereof, speech recognition apparatus using acoustic model
CN109872721A (en) Voice authentication method, information processing equipment and storage medium
Panda et al. Study of speaker recognition systems
Hsu et al. Speaker verification without background speaker models
Ahmad et al. Vector quantization decision function for Gaussian Mixture Model based speaker identification
Ahn et al. On effective speaker verification based on subword model
Logan et al. A real time speaker verification system using hidden Markov models
Hernaez et al. Evaluation of Speaker Verification Security and Detection of HMM-based Synthetic Speech
Talwar HMM-based non-intrusive speech quality and implementation of Viterbi score distribution and hiddenness based measures to improve the performance of speech recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: IFLYTEK CO., LTD.

Free format text: FORMER NAME: ANHUI USTC IFLYTEK CO., LTD.

CP03 Change of name, title or address

Address after: No. 666, Wangjiang Road, High-tech Development Zone, Hefei, Anhui 230088

Patentee after: Iflytek Co., Ltd.

Address before: No. 616, Huangshan Road, High-tech Development Zone, Hefei, Anhui 230088

Patentee before: Anhui USTC iFLYTEK Co., Ltd.