CN1924998A - Method and system for verifying speakers - Google Patents

Method and system for verifying speakers

Info

Publication number
CN1924998A
Authority
CN
China
Prior art keywords
speech
score value
coupling score
obviously
ubm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2005100976490A
Other languages
Chinese (zh)
Inventor
黄伟
韩兆兵
张亚昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc
Priority to CNA2005100976490A
Publication of CN1924998A
Legal status: Pending

Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

A method and system for speaker verification. An unvoiced/voiced (U/V) speech classifier classifies an input training speech signal to output clean voiced speech components, and classifies an input test speech signal to output noisy unvoiced speech components. A clean target speech model (CTM) is generated from a universal background model (UBM) using the clean voiced speech components, and a noisy target speech model (NTM) is generated using the noisy unvoiced speech components. An initial CTM match score is computed and normalized with the match score output from the UBM to create a first preliminary match score; an initial NTM match score is computed and normalized with the match score output from the UBM to create a second preliminary match score. The first and second preliminary match scores are then used to determine a final match score.

Description

Method and system for speaker verification
Technical field
The present invention relates generally to methods and systems for speaker verification. More specifically, although not exclusively, the invention relates to speech verification using a target model derived from noisy unvoiced speech components.
Background
Biometrics are often an ideal way to protect access to a device or facility. Unlike conventional security locks that rely on a physical key or a typed password, a biometric lock can be operated only by a specific, authorized individual. Such a lock evaluates a person's identity by measuring a unique biological characteristic, such as a fingerprint, an eye pattern, or a voice signature. When someone attempts to open such a lock, one or more biological characteristics of that person are measured and compared with information in a database of authorized persons. If a match is found, the lock opens; otherwise it remains closed. Because there is no key or password to lose, have stolen, or forget, and because biometric signatures can be highly reliable and unique, biometric locks are likely to become increasingly common.
Biometric locks based on speaker verification, or voice authentication, focus on biometric matching of voice signatures. Speaker verification is a particularly convenient technique for protecting access because the user can easily perform it in a hands-free manner. This makes speaker verification a desirable security technique for devices that often operate in hands-free mode, such as mobile phones and personal digital assistants (PDAs).
Numerous algorithms therefore attempt to classify and match the characteristics of human speech so that a voice signature can serve reliably as a biometric key. One such algorithm is the Gaussian mixture model universal background model (GMM-UBM) method. In GMM-UBM speaker recognition, authorized speakers are modeled with GMMs. A high-order, speaker-independent UBM is first created using a large speech corpus. A model of an individual speaker is then derived from the UBM using a Bayesian or maximum a posteriori (MAP) adaptation method. The models are then compared with input speech feature vectors to determine whether a particular input utterance matches one of the GMM-UBM models.
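As an illustration of how an utterance can be compared with a GMM, the following minimal Python sketch computes the average per-frame log-likelihood of a sequence of feature vectors under a diagonal-covariance mixture. The diagonal covariance, the function names, and the list-based data layout are assumptions made for this example; the patent does not specify them.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of `frames` under a GMM (assumed layout)."""
    total = 0.0
    for x in frames:
        # Log of each weighted component density for this frame.
        logs = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Log-sum-exp over components, shifted by the peak for stability.
        peak = max(logs)
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total / len(frames)
```

A higher value means the frames fit the model better, so the same frames can be scored against a speaker model and against the UBM and the two values compared.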
Like most detection systems, a speaker verification system is usually tuned to provide a desired receiver operating characteristic (ROC). The detection error tradeoff (DET) curve is a common way of measuring the ROC; it assesses two types of error: the false rejection rate and the false acceptance rate. In speaker verification, a false rejection occurs when an authorized person attempts to match his or her voice to a voice model but is improperly rejected by the verification system. A false acceptance occurs when an unauthorized person, such as an impostor, successfully matches his or her voice to a voice model created for another person and thereby gains improper access to a device or facility.
Many detection systems are calibrated so that the system operates where the false acceptance rate curve and the false rejection rate curve intersect. This condition is usually called the equal error rate (EER) point, and it provides a balance between too many false acceptances and too many false rejections. However, variations in background noise level often upset the calibration of a speaker verification system, resulting in an unacceptable number of false acceptances or false rejections.
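The two error rates and the EER point can be illustrated with a short Python sketch. It assumes that a trial is accepted when its score is at or above the threshold, and it approximates the EER by trying every observed score as a threshold; the function names and this sweep strategy are illustrative choices, not part of the patent.

```python
def error_rates(target_scores, impostor_scores, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) at a threshold.

    Assumption: a trial is accepted when its score is >= threshold.
    """
    fr = sum(s < threshold for s in target_scores) / len(target_scores)
    fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fr, fa

def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping every observed score as a threshold."""
    best = None
    for t in sorted(set(target_scores) | set(impostor_scores)):
        fr, fa = error_rates(target_scores, impostor_scores, t)
        gap = abs(fr - fa)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (fr + fa) / 2.0, t)
    _, eer, threshold = best
    return eer, threshold
```

With finite data the two curves rarely cross exactly, so the sketch reports the midpoint of the two rates at the closest threshold.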
Brief description of the drawings
In order that the invention may be readily understood and put into practical effect, reference will now be made to exemplary embodiments as illustrated in the accompanying drawings, in which like reference numerals refer to identical or functionally similar elements throughout the separate views. The drawings, together with the detailed description below, are incorporated in and form part of the specification, and serve to further illustrate the embodiments and to explain various principles and advantages, in accordance with the present invention, wherein:
Fig. 1 is a schematic diagram illustrating a wireless communication device in the form of a radio telephone;
Fig. 2 is a schematic diagram illustrating a MAP adaptation process;
Fig. 3 is a graph illustrating a typical set of receiver operating characteristic (ROC) curves;
Fig. 4 illustrates two sets of histogram score distributions from two types of speakers (target speakers and impostors);
Fig. 5 is a schematic diagram of a speaker verification system that, in accordance with an embodiment of the present invention, provides improved robustness against background noise; and
Fig. 6 is a general flow diagram illustrating a speaker verification method in accordance with an embodiment of the present invention.
Those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some elements in the figures may be exaggerated relative to other elements to help improve understanding of embodiments of the present invention.
Detailed description
Before describing in detail embodiments that are in accordance with the present invention, it should be observed that the embodiments reside primarily in combinations of method steps and apparatus components related to a method and system for speaker verification. Accordingly, the apparatus components and method steps have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the present invention, so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art.
In this document, relational terms such as first and second, top and bottom, and the like are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. The term "comprises", or any other variation thereof, is intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by "comprises a" does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Referring to Fig. 1, a schematic diagram illustrates a wireless communication device in the form of a radio telephone 100. The radio telephone 100 comprises a radio frequency communications unit 102 coupled to be in communication with a processor 103. The radio telephone 100 also has a keypad 106 and a display screen 105 coupled to be in communication with the processor 103. As will be apparent to those skilled in the art, the screen 105 may be a touch screen, thereby making the keypad 106 optional.
The processor 103 includes an encoder/decoder 111 with an associated code read-only memory (ROM) 112 storing data for encoding and decoding voice or other signals that may be transmitted or received by the radio telephone 100. The processor 103 also includes a microprocessor 113 coupled, by a common data and address bus 117, to the encoder/decoder 111, a character read-only memory (ROM) 114, a random access memory (RAM) 104, a static programmable memory 116, and a SIM interface 118. The static programmable memory 116 and a SIM (often called a SIM card) operatively coupled to the SIM interface 118 can each store, among other things, selected incoming text messages and a telephone number database (TND), or phone book, comprising a number field for telephone numbers and a name field for identifiers, each identifier in the name field being associated with one of the numbers. The SIM card and the static memory 116 may also store passwords or a corpus of training speech signals for allowing access to protected functions on the radio telephone 100.
The microprocessor 113 has ports for coupling to the keypad 106, the screen 105, and an alert 115 that typically contains an alert speaker, a vibrator motor, and associated drivers. The microprocessor 113 also has ports for coupling to a microphone 135 and a communications speaker 140. The character read-only memory 114 stores code for decoding or encoding text messages that may be received by the communications unit 102. In this embodiment, the character read-only memory 114 also stores operating code (OC) for the microprocessor 113 and code for performing functions associated with the radio telephone 100.
The radio frequency communications unit 102 is a combined receiver and transmitter having a common antenna 107. The communications unit 102 has a transceiver 108 coupled to the antenna 107 via a radio frequency amplifier 109. The transceiver 108 is also coupled to a combined modulator/demodulator 110 that couples the communications unit 102 to the processor 103.
To provide a clear and complete description of the present invention, some additional background material concerning the prior-art MAP adaptation process and EER curves is now described with reference to Figs. 2 and 3, respectively.
Referring to Fig. 2, a schematic diagram illustrates a MAP adaptation process in accordance with the prior art. The four ellipses 205 on the left represent a speaker model comprising four Gaussian probability density functions (PDFs) in a universal background model. The points 210 represent training speech sample scores from a target speaker. The MAP adaptation process recalculates the distribution of each Gaussian PDF and effectively reshapes the PDFs based on the nearby training speech sample scores, as represented by the modified ellipses 215 on the right of Fig. 2, which define a modified speaker model.
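The effect shown in Fig. 2 can be sketched in Python using one widely used form of MAP adaptation: mean-only adaptation with a relevance factor. The patent does not state which MAP variant is used, so the update rule, the `relevance` parameter, its default of 16, and all function names below are assumptions for illustration only.

```python
import math

def _log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (a - m) ** 2 / v)
               for a, m, v in zip(x, mean, var))

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation (assumed variant) of a background GMM."""
    k, dim = len(weights), len(means[0])
    counts = [0.0] * k                       # soft frame count per component
    sums = [[0.0] * dim for _ in range(k)]   # posterior-weighted feature sums
    for x in frames:
        logs = [math.log(w) + _log_gauss(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        peak = max(logs)
        post = [math.exp(l - peak) for l in logs]
        norm = sum(post)
        for i in range(k):
            p = post[i] / norm               # posterior of component i given x
            counts[i] += p
            for d in range(dim):
                sums[i][d] += p * x[d]
    adapted = []
    for i in range(k):
        # Components that saw more data move further toward the data mean.
        alpha = counts[i] / (counts[i] + relevance)
        data_mean = ([s / counts[i] for s in sums[i]]
                     if counts[i] > 0.0 else means[i])
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(data_mean, means[i])])
    return adapted
```

Components near the training samples shift toward them, while components that receive almost no data stay close to their background values, which matches the reshaped ellipses described for Fig. 2.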
Referring to Fig. 3, a graph illustrates a typical set of receiver operating characteristic (ROC) curves as known in the art. The y-axis represents the error rate, and the x-axis represents the threshold setting at which a particular detection system operates to produce a given set of error rates. As applied to speaker verification (SV) technology, such as may be included in the security features of the radio telephone 100, the false acceptance (FA) curve represents the rate of errors in which an unauthorized person, such as an impostor, successfully matches his or her voice to a voice model created for another person and thereby gains improper access to the telephone 100. The false rejection (FR) curve represents the rate of errors in which an authorized person attempts to match his or her voice to a voice model but is improperly denied access to the telephone 100. The intersection of the two curves is often called the equal error rate (EER) point. As known in the art, detection systems are often calibrated to operate at or near the EER point to provide optimal performance.
For an SV system included in the radio telephone 100, if the system is calibrated to operate at the EER point corresponding to threshold setting T0, the telephone 100 can provide an authorized user with a convenient level of access security, in which the telephone 100 quickly and reliably verifies the authorized user's voice and denies access to unauthorized users. However, if the user requires the telephone 100 to recognize the authorized user's voice more reliably, the system can be calibrated to operate at a lower FR rate corresponding to threshold setting T1. On the other hand, if the user requires greater access security for the telephone 100, the SV system can be calibrated to operate at a lower FA rate corresponding to threshold setting T2. For a given threshold setting, however, varying levels of background noise can change the desired FA/FR rates.
Referring to Fig. 4, two sets of histogram score distributions are illustrated for two types of speakers (a target speaker and a number of impostors). The x-axis represents the SV test score, and the y-axis represents the number of test utterances. It has been observed that an SV system operates according to different FA/FR ROC curves in different background noise environments. One way of measuring background noise in an SV system uses the signal-to-noise ratio (SNR). A quiet background yields a higher SNR, and a noisy background yields a lower SNR. When an SV system moves from an environment with a high SNR to an environment with a low SNR, the FA/FR curves that define the ROC of the system change. The distributions shown in Fig. 4 are based on SNRs varying between 5 dB and 25 dB. Fig. 4 thus shows that in a quiet background environment (SNR = 25 dB), SV scores from impostors generally differ from SV scores from the target speaker. In a relatively noisy background environment (SNR = 5 dB), however, SV scores from impostors are generally closer to SV scores from the target speaker, and the distributions overlap more.
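For reference, the SNR in decibels is ten times the base-10 logarithm of the ratio of signal power to noise power. The Python sketch below computes it from sample sequences and also derives the gain to apply to a noise recording so that a mixture reaches a target SNR. The helper names are hypothetical, and the patent does not describe how its noisy test signals were prepared.

```python
import math

def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))

def noise_gain_for_snr(signal, noise, target_db):
    """Gain to apply to `noise` so that signal vs. scaled noise has target_db SNR."""
    ratio = 10.0 ** (target_db / 10.0)
    return math.sqrt(mean_power(signal) / (mean_power(noise) * ratio))
```

For instance, a signal with 100 times the power of the noise has an SNR of 20 dB, and lowering the target SNR raises the gain applied to the noise.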
Fig. 4 further illustrates that SV scores from the target speaker are generally more sensitive to the background noise level than SV scores from impostors. This is shown by the larger lateral shift, from 25 dB to 5 dB, of the target speaker's SV scores compared with the impostors' SV scores. The target speaker's SV scores have increased sensitivity to background noise because the training model for the target speaker is generally created in a relatively quiet or "clean" environment, whereas the test speech from impostors is generally created in a relatively noisy "real" environment.
Referring to Fig. 5, a schematic diagram illustrates an SV system 500 that, in accordance with an embodiment of the present invention, provides improved robustness against background noise. The system 500 includes an unvoiced/voiced (U/V) speech classifier 505 that classifies input speech signals. Three speech models are operatively coupled to the U/V classifier 505: a universal background model (UBM) 515, a clean target speech model (CTM) 510, and a noisy target speech model (NTM) 520. The U/V classifier 505 classifies input speech signal frames into three components: silence, clean voiced speech, and noisy unvoiced speech. The CTM 510 is generated from the UBM 515 using clean voiced speech from input training speech, so that it contains only information about one or more specific speakers. The CTM 510 can thus be defined as any target speech model generated or adapted from the UBM 515 using voiced speech components from a relatively quiet background environment. The NTM 520 is generated from the CTM 510 using the silence and noisy unvoiced speech components of test utterances from a real environment. The NTM 520 thus contains information about both the specific speaker and the background noise environment. The NTM 520 can therefore be defined as any target speech model generated or adapted from the UBM 515 using unvoiced speech components from a relatively noisy background environment.
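The patent does not say how the U/V classifier 505 decides between silence, voiced, and unvoiced frames. Purely as an illustration of a three-way frame classifier, the Python sketch below uses a classic heuristic based on short-time energy and zero-crossing rate. The heuristic itself, the threshold values, and the function name are all assumptions and are not taken from the patent.

```python
def classify_frame(frame, silence_energy=1e-4, zcr_threshold=0.3):
    """Label one frame as 'silence', 'voiced', or 'unvoiced'.

    Assumed heuristic: very low energy means silence; otherwise a high
    zero-crossing rate suggests noise-like (unvoiced) content and a low
    rate suggests periodic (voiced) content. Thresholds are hypothetical.
    """
    energy = sum(s * s for s in frame) / len(frame)
    if energy < silence_energy:
        return "silence"
    # Count sign changes between consecutive samples.
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    zcr = crossings / (len(frame) - 1)
    return "unvoiced" if zcr > zcr_threshold else "voiced"
```

In a system like the one in Fig. 5, frames labeled voiced during enrollment would feed CTM generation, while silence and unvoiced frames from test speech would feed NTM generation.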
The system 500 thus comprises two subsystems: a baseline system comprising the U/V classifier 505, the UBM 515, and the CTM 510; and an environment adaptation system comprising the NTM 520. After the U/V classifier 505 receives an input training speech signal, the system 500 performs an enrollment process in which the CTM 510 is generated from the UBM 515 using the clean voiced speech components 525 of the input training speech signal and, for example, a Bayesian or maximum a posteriori (MAP) adaptation method.
The U/V classifier 505 also receives an input test speech signal, from which it then outputs noisy unvoiced speech components. Following the enrollment process described above, a further adaptation process then generates the NTM 520 from the noisy unvoiced speech components 530.
Those skilled in the art will recognize the cost efficiency of embodiments of the present invention. For example, although the system 500 includes three speech models, the CTM 510 and the NTM 520 are both generated, directly or indirectly, from the UBM 515. In a specific embodiment in which the UBM 515 comprises 128 original Gaussian speech models, only five additional Gaussian speech models therefore need to be computed per frame to generate both the CTM 510 and the NTM 520. The additional computational cost of the improved noise robustness of the system 500 relative to the prior art is thus negligible.
After the CTM 510 and the NTM 520 are generated, components of the input test speech signal are processed by each of the CTM 510, the UBM 515, and the NTM 520. As shown in Fig. 5, in accordance with one embodiment of the present invention, an initial CTM match score is computed for the input test speech signal and normalized with the match score output from the UBM to create a first preliminary match score (score 1). An initial NTM match score is also computed for the input test speech signal and normalized with the match score output from the UBM to create a second preliminary match score (score 2). The normalization process can involve various techniques, such as simply subtracting the UBM match score. The first and second preliminary match scores are then used to determine a final match score. For example, the final match score can equal the sum of the first and second preliminary match scores.
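The scoring arithmetic described above can be written out in a few lines of Python. The sketch uses the two options the text names explicitly: normalization by subtracting the UBM match score, and a final score equal to the sum of the two preliminary scores. The function names and the threshold-based `verify` helper are hypothetical additions for illustration.

```python
def preliminary_score(target_score, ubm_score):
    """Normalize a target-model match score by subtracting the UBM score."""
    return target_score - ubm_score

def final_match_score(ctm_score, ntm_score, ubm_score):
    """Sum of the two UBM-normalized preliminary match scores."""
    score_1 = preliminary_score(ctm_score, ubm_score)  # CTM vs. UBM
    score_2 = preliminary_score(ntm_score, ubm_score)  # NTM vs. UBM
    return score_1 + score_2

def verify(ctm_score, ntm_score, ubm_score, threshold):
    """Hypothetical decision rule: accept when the final score meets a threshold."""
    return final_match_score(ctm_score, ntm_score, ubm_score) >= threshold
```

If the match scores are log-likelihoods, subtracting the UBM score gives a log-likelihood ratio, so a positive preliminary score means the target model explains the test speech better than the background model does.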
Example
The following experimental results show error reductions produced in accordance with embodiments of the present invention using input test speech signals from a variety of background environments. The background environments comprise babble noise (Table 1), airport noise (Table 2), railway car noise (Table 3), street noise (Table 4), restaurant noise (Table 5), and train station noise (Table 6). A telephone speech database called Polycost was used as the input speech signals. The Polycost database is a large mixed speech corpus involving more than 100 speakers, including English spoken by non-native speakers. The database mainly comprises digits and some free speech, collected over international telephone lines, with more than eight sessions per speaker. The different background environments represent a range of SNRs. The parameters comprise 36-dimensional mel-frequency cepstral coefficients (MFCC) (for example, 12 MFCC + 12 ΔMFCC + 12 ΔΔMFCC). Speaker models were adapted from a UBM with 128 Gaussian speech models using three utterances.
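To show how a 36-dimensional vector arises from 12 static coefficients, the Python sketch below appends first-order (Δ) and second-order (ΔΔ) differences to each frame. It uses a simple two-point difference with edge repetition, which is an assumption; practical front ends often use a regression over a wider window, and the patent does not specify the delta formula.

```python
def deltas(frames):
    """Two-point time difference of each coefficient, repeating edge frames."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

def stack_features(mfcc):
    """Concatenate static, delta, and delta-delta coefficients per frame."""
    d1 = deltas(mfcc)   # first-order differences
    d2 = deltas(d1)     # second-order differences
    return [c + a + b for c, a, b in zip(mfcc, d1, d2)]
```

Given frames of 12 static coefficients, `stack_features` returns frames of 36 values in the order static, Δ, ΔΔ.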
Table 1: Error reduction, babble noise
SNR (dB)   Baseline (EER)   Environment adapted (EER)   Error reduction
25         2%               1.44%                       28.0%
20         2.31%            1.38%                       40.26%
15         2.88%            1.73%                       39.93%
10         3.75%            3.17%                       15.47%
5          5.19%            4.7%                        9.44%
Table 2: Error reduction, airport noise
SNR (dB)   Baseline (EER)   Environment adapted (EER)   Error reduction
25         1.73%            1.36%                       21.39%
20         1.92%            1.4%                        27.08%
15         2.19%            1.44%                       34.25%
10         2.85%            1.73%                       39.3%
5          3.46%            2.57%                       25.72%
Table 3: Error reduction, railway car noise
SNR (dB)   Baseline (EER)   Environment adapted (EER)   Error reduction
25         1.97%            1.14%                       42.13%
20         2.02%            1.44%                       28.71%
15         2.32%            1.73%                       25.43%
10         2.94%            2.02%                       31.3%
5          3.68%            2.60%                       29.35%
Table 4: Error reduction, street noise
SNR (dB)   Baseline (EER)   Environment adapted (EER)   Error reduction
25         1.73%            1.44%                       16.77%
20         2.31%            1.73%                       25.1%
15         2.59%            2.20%                       15.06%
10         2.81%            2.59%                       7.8%
5          4.2%             3.36%                       20%
Table 5: Error reduction, restaurant noise
SNR (dB)   Baseline (EER)   Environment adapted (EER)   Error reduction
25         1.14%            1.10%                       3.5%
20         1.48%            1.36%                       8.1%
15         1.78%            1.62%                       9.0%
10         2.58%            1.99%                       22.9%
5          5.01%            4.03%                       19.56%
Table 6: Error reduction, train station noise
SNR (dB)   Baseline (EER)   Environment adapted (EER)   Error reduction
25         2.07%            1.89%                       8.7%
20         2.32%            2.07%                       10.78%
15         2.43%            2.27%                       6.58%
10         4.17%            3.25%                       22.06%
5          5.48%            4.07%                       25.73%
The experimental data presented above show that the speaker verification method and system in accordance with the present invention significantly improve speaker verification performance across a broad range of noisy environments. The error reduction ranges from 3.5% under restaurant background noise to 42.13% under railway car background noise. The average EER reduction is about 22%.
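The error reduction column in the tables is consistent with a relative reduction, that is, the drop in EER divided by the baseline EER. The short Python sketch below reproduces two entries of Table 1 under that reading; the function name is hypothetical.

```python
def relative_reduction(baseline_eer, adapted_eer):
    """Relative EER reduction in percent: drop in EER over the baseline EER."""
    return 100.0 * (baseline_eer - adapted_eer) / baseline_eer

# Table 1 (babble noise), EER values in percent.
at_25_db = relative_reduction(2.0, 1.44)
at_20_db = relative_reduction(2.31, 1.38)
```

Rounded to two decimals, these evaluate to 28.0 and 40.26, matching the 25 dB and 20 dB rows of Table 1.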
In summary, referring to Fig. 6, a general flow diagram illustrates a speaker verification method 600 in accordance with an embodiment of the present invention. First, at step 605, the unvoiced/voiced (U/V) speech classifier 505 classifies an input training speech signal to output clean voiced speech components, and classifies an input test speech signal to output noisy unvoiced speech components. Next, at step 610, the CTM 510 is generated from the UBM 515 using the clean voiced speech components of the training speech. At step 615, the NTM 520 is generated from the CTM 510 using the noisy unvoiced speech components of the test speech. At step 620, an initial CTM match score is computed for the speech components of the input test speech signal and normalized with the match score output from the UBM 515 to create a first preliminary match score. At step 625, an initial NTM match score is then computed for the speech components of the input test speech signal and normalized with the match score output from the UBM 515 to create a second preliminary match score. Finally, at step 630, the first and second preliminary match scores are used to determine a final match score.
Advantages of the present invention thus include a more robust speaker verification system 500 and method 600 that are less sensitive to background noise. Furthermore, the present invention is computationally cost-efficient because, although at least three models 510, 515, 520 are used, the CTM 510 and the NTM 520 are derived from the UBM 515, so that only a relatively small number of additional Gaussian speech models are computed.
The above detailed description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the present invention. Rather, the detailed description of the exemplary embodiments provides those skilled in the art with an enabling description for implementing the exemplary embodiments of the invention. It should be understood that various changes can be made in the function and arrangement of elements and steps without departing from the spirit and scope of the invention as set forth in the appended claims. Those skilled in the art will recognize that the embodiments of the invention described herein may comprise one or more conventional processors and unique stored program instructions that control the one or more processors to implement, in conjunction with certain non-processor circuits, some, most, or all of the functions of speaker verification as described herein. The non-processor circuits may include, but are not limited to, a radio receiver, a radio transmitter, signal drivers, clock circuits, power source circuits, and user input devices. As such, these functions may be interpreted as steps of a method for performing speaker verification. Alternatively, some or all functions could be implemented by a state machine that has no stored program instructions, or in one or more application-specific integrated circuits (ASICs), in which each function or some combinations of certain functions are implemented as custom logic. Of course, a combination of the two approaches could be used. Thus, methods and means for these functions have been described herein. Further, it is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein, will be readily capable of generating such software instructions, programs, and ICs with minimal experimentation.
In the foregoing specification, specific embodiments of the present invention have been described. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present invention. The benefits, advantages, solutions to problems, and any elements that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as critical, required, or essential features or elements of any or all of the claims. The invention is defined solely by the appended claims, including any amendments made during the pendency of this application, and all equivalents of those claims.

Claims (18)

1. A method for speaker verification, comprising:
classifying an input training speech signal, using an unvoiced/voiced (U/V) speech classifier, to output clean voiced speech components, and classifying an input test speech signal to output noisy unvoiced speech components;
generating a clean target speech model (CTM) from a universal background model (UBM) using the clean voiced speech components;
generating a noisy target speech model (NTM) from the CTM using the noisy unvoiced speech components;
computing an initial CTM match score for the input test speech signal, and normalizing it with a match score output from the UBM to create a first preliminary match score;
computing an initial NTM match score for the input test speech signal, and normalizing it with a match score output from the UBM to create a second preliminary match score; and
determining a final match score using the first and second preliminary match scores.
2. The method of claim 1, wherein the UBM comprises more than 100 computed Gaussian speech models.
3. The method of claim 2, wherein no more than five additional Gaussian speech models are computed to generate the CTM and the NTM.
4. The method of claim 1, wherein the initial CTM match score is normalized by subtracting the match score output from the UBM to create the first preliminary match score.
5. The method of claim 1, wherein the initial NTM match score is normalized by subtracting the match score output from the UBM to create the second preliminary match score.
6. The method of claim 1, wherein generating the NTM from the CTM also uses a silence component output from the U/V speech classifier to define a background noise level.
7. The method of claim 1, wherein the final match score is the sum of the first and second preliminary match scores.
8. The method of claim 1, wherein the U/V speech classifier classifies speech signals into three components: silence, voiced speech, and unvoiced speech.
9. The method of claim 1, wherein the CTM and the NTM are generated using a maximum a posteriori (MAP) adaptation method.
10. A system for speaker verification, comprising:
an unvoiced/voiced (U/V) speech classifier that receives an input training speech signal to output clean voiced speech components, and receives an input test speech signal to output noisy unvoiced speech components;
a universal background model (UBM) operatively coupled to the U/V speech classifier;
a clean target speech model (CTM) operatively coupled to the U/V speech classifier and the UBM, the CTM being generated from the UBM using the clean voiced speech components; and
a noisy target speech model (NTM) operatively coupled to the U/V speech classifier, the UBM, and the CTM, the NTM being generated from the CTM using the noisy unvoiced speech components;
wherein an initial CTM match score is computed for the input test speech signal and normalized with a match score output from the UBM to create a first preliminary match score, an initial NTM match score is computed for the input test speech signal and normalized with a match score output from the UBM to create a second preliminary match score, and the first and second preliminary match scores are used to determine a final match score.
11. The system of claim 10, wherein the UBM comprises more than 100 computed Gaussian speech models.
12. The system of claim 11, wherein no more than five additional Gaussian speech models are computed to generate the CTM and the NTM.
13. The system of claim 10, wherein the initial CTM matching score is normalized by subtracting the matching score output from the UBM, to create the first preliminary matching score.
14. The system of claim 10, wherein the initial NTM matching score is normalized by subtracting the matching score output from the UBM, to create the second preliminary matching score.
15. The system of claim 10, wherein generating the NTM from the CTM also uses a silence component output from the U/V speech classifier to define a background noise level.
16. The system of claim 10, wherein the final matching score is the sum of the first and second preliminary matching scores.
17. The system of claim 10, wherein the U/V speech classifier classifies a speech signal into three components: silence, voiced speech, and unvoiced speech.
18. The system of claim 10, wherein the CTM and the NTM are generated using a maximum a posteriori (MAP) adaptation method.
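
The score normalization and combination recited in the dependent claims (subtracting the UBM score from each initial score, then summing the two results) can be illustrated with a minimal Python sketch. This is not the applicant's implementation. It assumes each model has already produced a scalar matching score for the test utterance (for example an average log-likelihood, which the claims do not specify). The names VerificationScores, fuse_scores and accept are hypothetical, and the decision threshold is an assumption that does not appear in the claims.

```python
from dataclasses import dataclass


@dataclass
class VerificationScores:
    first_preliminary: float   # initial CTM score minus UBM score
    second_preliminary: float  # initial NTM score minus UBM score
    final: float               # sum of the two preliminary scores


def fuse_scores(ctm_score: float, ntm_score: float, ubm_score: float) -> VerificationScores:
    """Normalize both target-model scores by the UBM score and sum them.

    The inputs are assumed to be scalar matching scores computed elsewhere
    for the same test utterance.
    """
    first = ctm_score - ubm_score    # normalization by subtraction
    second = ntm_score - ubm_score   # same normalization for the NTM score
    return VerificationScores(first, second, first + second)


def accept(scores: VerificationScores, threshold: float = 0.0) -> bool:
    """Hypothetical decision rule: accept when the final score exceeds a threshold."""
    return scores.final > threshold


if __name__ == "__main__":
    scores = fuse_scores(ctm_score=-40.0, ntm_score=-42.0, ubm_score=-45.0)
    print(scores, accept(scores))
```

In this sketch a test utterance that both target models explain better than the UBM yields two positive preliminary scores, so their sum rises well above the threshold, while an impostor utterance that the UBM explains better yields a negative sum.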
CNA2005100976490A 2005-08-29 2005-08-29 Method and system for verifying speakers Pending CN1924998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2005100976490A CN1924998A (en) 2005-08-29 2005-08-29 Method and system for verifying speakers

Publications (1)

Publication Number Publication Date
CN1924998A true CN1924998A (en) 2007-03-07

Family

ID=37817607

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2005100976490A Pending CN1924998A (en) 2005-08-29 2005-08-29 Method and system for verifying speakers

Country Status (1)

Country Link
CN (1) CN1924998A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390406A (en) * 2012-05-11 2013-11-13 MediaTek Inc. Speaker authentication method, preparation method of speaker authentication and electronic device
US9548054B2 (en) 2012-05-11 2017-01-17 Mediatek Inc. Speaker authentication methods and related methods of electronic devices using calendar data
CN103714818A (en) * 2013-12-12 2014-04-09 Tsinghua University Speaker recognition method based on noise shielding nucleus
CN103714818B (en) * 2013-12-12 2016-06-22 Tsinghua University Method for distinguishing speakers based on noise shielding core

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20070307