CN103390406A

CN103390406A - Speaker authentication method, preparation method of speaker authentication and electronic device

Info

Publication number: CN103390406A
Application number: CN201210537148XA
Authority: CN
Inventors: 孙良哲; 郑尧文
Original assignee: MediaTek Inc
Current assignee: MediaTek Inc
Priority date: 2012-05-11
Filing date: 2012-12-12
Publication date: 2013-11-13
Anticipated expiration: 2032-12-12
Also published as: CN103390406B

Abstract

A speaker authentication method, a preparation method for speaker authentication, and an electronic device are provided. The electronic device includes a side information extractor, a speaker authenticator, and a message generator. The side information extractor is configured to extract side information. The speaker authenticator is configured to perform speaker authentication using speech data and a trained speaker model, wherein the speech data represents an utterance made by a user. The message generator is coupled to the side information extractor and the speaker authenticator, and is configured to generate a feedback message for the user based on the side information and the authentication result provided by the speaker authenticator. The speaker authentication method, the preparation method for speaker authentication, and the electronic device can provide more meaningful feedback messages and thereby improve the user experience.

Description

Speaker authentication method, preparation method for speaker authentication, and electronic device

Technical field

The present invention relates to speaker authentication, and in particular to a speaker authentication method, a preparation method for speaker authentication, and an electronic device that generate feedback messages based on additionally extracted side information.

Background

Before granting access, an electronic device can apply biometric authentication to check a user's identity. In general, after checking the user's identity, a conventional electronic device has two options: it either grants access and responds with a fixed greeting message, or denies access and responds with a fixed fail message. These fixed greeting and fail messages are of little help to the user.

For example, if the user is the real owner of the electronic device but has been rejected by it several times, the user may conclude that the device's biometric authentication function does not work well. Without understanding the underlying cause of these false rejections, the user may disable the biometric authentication function and thereby give up its potential benefits.

Summary of the invention

In view of this, the following technical solutions are provided:

An embodiment of the present invention provides a preparation method for speaker authentication, comprising: receiving speech data, the speech data representing an utterance made by a user; extracting side information; checking the side information to determine whether to allow speaker model training using the speech data; and, if speaker model training using the speech data is not allowed, generating a feedback message for the user based on the side information.

An embodiment of the present invention further provides a speaker authentication method, comprising: receiving speech data, the speech data representing an utterance made by a user; extracting side information; performing speaker authentication using the speech data and a trained speaker model; and generating a feedback message for the user based on the side information and the result of the speaker authentication.

An embodiment of the present invention further provides an electronic device, comprising a side information extractor, a speaker model generator, and a message generator. The side information extractor is configured to extract side information and to check the side information to determine whether to allow speaker model training using speech data, wherein the speech data represents an utterance made by a user. The speaker model generator is coupled to the side information extractor and is configured to train a speaker model using the speech data when the side information extractor allows speaker model training using the speech data. The message generator is coupled to the side information extractor and is configured to generate a feedback message for the user based on the side information when the side information extractor does not allow speaker model training using the speech data.

An embodiment of the present invention further provides an electronic device, comprising a side information extractor, a speaker authenticator, and a message generator. The side information extractor is configured to extract side information. The speaker authenticator is configured to perform speaker authentication using speech data and a trained speaker model, wherein the speech data represents an utterance made by a user. The message generator is coupled to the side information extractor and the speaker authenticator, and is configured to generate a feedback message for the user based on the side information and the result provided by the speaker authenticator.

The speaker authentication method, the preparation method for speaker authentication, and the electronic device described above can provide more meaningful feedback messages and thereby improve the user experience.

Brief description of the drawings

Fig. 1 is a simplified schematic diagram of an electronic device according to an embodiment of the present invention.

Fig. 2 is a simplified schematic diagram of another electronic device according to an embodiment of the present invention.

Fig. 3 is a simplified schematic diagram of the side information extractor of Fig. 1 or Fig. 2 according to an embodiment of the present invention.

Fig. 4 is a simplified flowchart of a preparation method for speaker authentication according to an embodiment of the present invention.

Fig. 5 is a simplified flowchart of a speaker authentication method according to an embodiment of the present invention.

Detailed description

Certain terms are used throughout the specification and claims to refer to particular elements. A person of ordinary skill in the art will appreciate that hardware manufacturers may refer to the same element by different names. This specification and the claims do not distinguish elements by name but by function. Throughout the specification and claims, the term "comprising" is open-ended and should be interpreted as "including but not limited to". In addition, the term "coupled" covers any direct or indirect means of electrical connection. Therefore, if a first device is described as being coupled to a second device, the first device may be electrically connected to the second device directly, or indirectly through other devices or connection means.

An electronic device can use a speaker authentication function to determine whether its user is an authorized person or an impostor. In general, this function requires the authorized user to first train a speaker model in a preparation phase. Then, in a testing phase, the electronic device uses the speaker model trained for the authorized user to verify whether the person attempting to use the device is the authorized user. In these preparation and testing phases, a conventional electronic device usually does not attempt to collect side information related to the current environment, the speaker's emotion or health condition, or recent events, let alone use such side information to improve the user experience.

Fig. 1 is a simplified schematic diagram of electronic device 100 according to an embodiment of the present invention. For example, electronic device 100 may be part of a user equipment and may be able to convert an utterance made by the user into speech data. The user equipment may be a smart phone, a tablet computer, or a smart television, but is not limited to these. In another embodiment, electronic device 100 may be part of a cloud that provides a cloud computing service, and may receive, from the user equipment, speech data representing the utterance made by the user.

As shown in Fig. 1, electronic device 100 includes side information extractor 120, speaker model generator 130, speaker authenticator 140, and message generator 150, but is not limited thereto; electronic device 100 may also include other components, which are omitted from the figure for simplicity. Side information extractor 120 is responsible for extracting side information, which may reveal one or any combination of the following: the environment in which the speech data was produced, the health condition of the person who made the utterance represented by the speech data, that person's emotion, and that person's recent events. Side information extractor 120 is further responsible for enabling or disabling speaker model generator 130 in the preparation phase.

Fig. 2 is a simplified schematic diagram of another electronic device 200 according to an embodiment of the present invention. Electronic device 200 in Fig. 2 is very similar to electronic device 100 in Fig. 1. One difference is that side information extractor 120 in Fig. 2 extracts the side information from side data that is independent of the speech data, whereas side information extractor 120 in Fig. 1 extracts the side information from the speech data. For example, the side data may relate to, or be provided by, one or any combination of the following: a calendar, a schedule, an alarm clock, a positioning module, and a weather application.

Fig. 3 is a simplified schematic diagram of side information extractor 120 of Fig. 1 or Fig. 2 according to an embodiment of the present invention. As shown in Fig. 3, side information extractor 120 includes feature extractor 122, classifier 124, and decision module 126. In short, feature extractor 122 extracts features from the speech data or side data, classifier 124 classifies the extracted features, and decision module 126 generates the side information based on the classification result and enables or disables speaker model generator 130 based on that side information. In other embodiments, some or all of the functions of decision module 126 may be performed by message generator 150. For example, decision module 126 may generate the side information based on the classification result and let message generator 150 decide whether to enable or disable speaker model generator 130.
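
The three-stage structure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the class name `SideInfoExtractor`, the toy feature `level`, the 0.5 cut-off, and the set of labels that block training are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Features = Dict[str, float]


@dataclass
class SideInfoExtractor:
    """Hypothetical stand-in for side information extractor 120."""
    extract_features: Callable[[List[float]], Features]  # role of feature extractor 122
    classify: Callable[[Features], str]                   # role of classifier 124
    blocking_labels: FrozenSet[str]                       # labels that disable training

    def run(self, samples: List[float]) -> Tuple[Dict[str, str], bool]:
        features = self.extract_features(samples)
        label = self.classify(features)
        # Role of decision module 126: build the side information and
        # decide whether the speaker model generator may be enabled.
        side_info = {"environment": label}
        allow_training = label not in self.blocking_labels
        return side_info, allow_training


def mean_abs_level(samples: List[float]) -> Features:
    # Toy feature (assumption): average absolute amplitude of the samples.
    return {"level": sum(abs(s) for s in samples) / len(samples)}


def level_classifier(features: Features) -> str:
    # Toy classifier (assumption): a fixed cut-off instead of a trained model.
    return "noisy" if features["level"] > 0.5 else "quiet"


extractor = SideInfoExtractor(mean_abs_level, level_classifier, frozenset({"noisy"}))
```

Calling `extractor.run(samples)` returns the side information together with a flag saying whether training may proceed. A real device would plug in the features and classifiers named in the description in place of the toy functions.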

Electronic device 100/200 has at least a preparation phase and a testing phase. Fig. 4 is a simplified flowchart of a preparation method for speaker authentication according to an embodiment of the present invention; the method is performed by electronic device 100/200 in the preparation phase. First, in step 410, speaker model generator 130 receives speech data representing an utterance made by a user, who may be the authorized user.

Then, in step 420, side information extractor 120 extracts side information. The side information may be extracted from the speech data or from side data, and may reveal information about the environment in which the utterance was made. For example, the side information may be independent of the verbal content of the utterance.

Step 420 can be divided into three sub-steps. In the first sub-step, feature extractor 122 extracts features from the speech data or side data. If feature extractor 122 receives speech data as its input, the features may include spectral features, prosodic features, spectro-temporal features, and other features. The spectral features may be, for example, Mel-Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), Line Spectral Pairs (LSP), and Linear Prediction Cepstral Coefficients (LPCC). The prosodic features may be, for example, pitch, delta-pitch, formant, and vocal-tract-related features. The spectro-temporal features may be, for example, Gabor features, RelAtive SpecTrA (RASTA), temporal pattern, and speaking rate. The other features may be, for example, Signal-to-Noise Ratio (SNR). If feature extractor 122 receives side data as its input, the features may relate to recent events, plans, or tasks contained in the side data, or to recent location information.
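
As a small worked example of one of the features listed above, the sketch below estimates a Signal-to-Noise Ratio in decibels as 10·log10 of the ratio of mean powers. It assumes, hypothetically, that the caller has already separated samples containing speech from noise-only samples; the description does not say how SNR is measured, and the function names are illustrative.

```python
import math


def mean_power(samples):
    # Mean of the squared sample values.
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """Rough SNR estimate in dB from two already-separated sample lists."""
    p_speech = mean_power(speech_samples)
    p_noise = mean_power(noise_samples)
    if p_noise == 0.0:
        return math.inf   # no measurable noise
    if p_speech == 0.0:
        return -math.inf  # no measurable speech energy
    return 10.0 * math.log10(p_speech / p_noise)
```

A low value from such an estimate is the kind of evidence a classifier could use to label the environment as noisy.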

Then, in the second sub-step of step 420, classifier 124 classifies the features provided by feature extractor 122. In doing so, classifier 124 may use one or any combination of the following: Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), neural network/Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), Random Forest, Decision Tree, Conditional Random Field (CRF), AdaBoost, and others. In another embodiment, classifier 124 may use a pre-trained model to perform the classification.

Then, in the third sub-step of step 420, decision module 126 generates the side information based on the classification result. In one embodiment, feature extractor 122, classifier 124, and decision module 126 operate together as an environment detector, and the side information reveals details of the environment in which the user made the utterance. For example, the side information may indicate whether the environment is noisy or quiet, windy or rainy, or outdoors or indoors; more specifically, it may indicate whether the environment resembles or is a restaurant, a stadium, or a train station.

In another embodiment, side information extractor 120 may act as a health detector that detects the health condition of the user who made the utterance. For example, the side information may indicate whether the user is coughing, snuffling, or has a runny nose, or whether the user is ill (because the calendar contains a recent doctor's appointment). In another embodiment, side information extractor 120 may act as an emotion detector that detects the emotion of the user who made the utterance. For example, the side information may indicate whether the user is happy, angry, or sad. In another embodiment, side information extractor 120 may act as an event detector that detects the user's recent events.
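
To make the side-data path concrete, the following sketch derives side information from calendar entries. It is an illustration under assumptions: the entry format (title, start time), the key names, the keyword match on "doctor", and the two time windows are all hypothetical, and treating a "recent" appointment as one in the past few days is only one possible reading.

```python
from datetime import datetime, timedelta


def side_info_from_calendar(entries, now,
                            soon=timedelta(hours=1),
                            recent=timedelta(days=3)):
    """entries: iterable of (title, start_time) pairs taken from a calendar."""
    info = {}
    for title, start in entries:
        # An appointment starting shortly after 'now' counts as an upcoming event.
        if timedelta(0) <= start - now <= soon:
            info["appointment_soon"] = title
        # A doctor's appointment in the recent past hints that the user may be ill.
        if "doctor" in title.lower() and timedelta(0) <= now - start <= recent:
            info["possibly_ill"] = True
    return info
```

The resulting dictionary plays the role of side information that does not depend on the speech data at all.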

Then, in step 430, decision module 126 checks the side information to determine whether the speech data is suitable for speaker model training. If it is suitable, electronic device 100/200 proceeds to step 440; otherwise, electronic device 100/200 goes to step 450.

Ideally, the speech data is suitable for speaker model training if the user was calm and in good health, and the environment was quiet, when the utterance was made. If electronic device 100/200 trains the speaker model with unsuitable speech data, the resulting speaker model will be of poor quality and will not work properly in the subsequent testing phase. For example, a speaker model trained on unsuitable speech data will cause more false rejections and false acceptances in the testing phase. Step 430 therefore acts as a gatekeeper that ensures electronic device 100/200 trains the speaker model with suitable speech data.

For example, electronic device 100/200 may decide not to train the speaker model, going directly from step 430 to step 450, if the side information indicates one or any combination of the following: the utterance was made in an outdoor, noisy, windy, or rainy environment and contains too much noise; the utterance (which may contain personal/sensitive information) was made in an environment where other people could overhear it, such as a restaurant; or the user is unwell or not calm, so the utterance was made in a voice that is not the user's normal voice.
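
The gatekeeping check of step 430 can be sketched as a simple rule set. The dictionary keys (`environment`, `others_nearby`, `health`, `emotion`) and the label values are assumptions introduced for illustration; the description only lists the kinds of facts that may block training.

```python
# Environment labels treated as too noisy for training (hypothetical label names).
NOISY_ENVIRONMENTS = {"outdoor", "noisy", "windy", "rainy"}


def unsuitability_reasons(side_info):
    """Return the reasons why the speech data should not be used for training."""
    reasons = []
    if side_info.get("environment") in NOISY_ENVIRONMENTS:
        reasons.append("too much noise")
    if side_info.get("others_nearby", False):
        reasons.append("utterance may be overheard")
    if (side_info.get("health", "healthy") != "healthy"
            or side_info.get("emotion", "calm") != "calm"):
        reasons.append("voice is not the user's normal voice")
    return reasons


def next_step(side_info):
    # Step 440 trains the speaker model; step 450 generates a feedback message.
    return 450 if unsuitability_reasons(side_info) else 440
```

Returning the list of reasons, rather than a bare yes/no, keeps the information needed to choose a feedback message in step 450.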

In step 440, decision module 126 allows (enables) speaker model generator 130 to train the speaker model using the speech data. If speaker model generator 130 completes the training process successfully, it can send the trained speaker model to speaker authenticator 140.

In step 450, decision module 126 does not allow speaker model training (for example, it disables speaker model generator 130), and message generator 150 generates a feedback message for the user based on the side information. For example, electronic device 100/200 may display the generated feedback message or play it as sound, so that the user can see or hear its content. The feedback message helps the user understand why the speech data is unsuitable for speaker model training, and thus prevents the user from misjudging the reliability and quality of the speaker authentication function of electronic device 100/200.

For example, based on the side information, message generator 150 may select one message from a group of candidate messages prepared for various situations and use it as the feedback message. This comprehensive group of candidate messages may include the following examples:

"It's noisy here. It's advised that you perform speaker model training at a quiet place."

"This place seems crowded with people. If you perform speaker model training here, other people may overhear your private/sensitive information."

"You sound ill today. It's advised that you perform speaker model training after restoring from illness. Your voice should be more stable then."

"You sound angry. Maybe you should perform speaker model training when you're calm and have stable voice."

"The calendar indicates that you're going to have an appointment soon. Maybe you should perform speaker model training next time."

Each of the candidate messages above explains to the user a likely reason why speaker model training is not advisable under the current circumstances, and is therefore more meaningful than the fixed, predetermined messages with which a conventional electronic device responds without relying on side information. The comprehensive group of candidate messages used by message generator 150 can improve the user experience and prevent the user from unfairly blaming the speaker authentication function.
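
Selecting a candidate message from side information can be sketched as a table lookup. The message texts are the examples given in this description; the lookup keys and the fallback text are hypothetical and only serve the illustration.

```python
# Candidate messages keyed by a side-information label (keys are assumptions).
TRAINING_FEEDBACK = {
    "noisy": "It's noisy here. It's advised that you perform speaker model "
             "training at a quiet place.",
    "crowded": "This place seems crowded with people. If you perform speaker "
               "model training here, other people may overhear your "
               "private/sensitive information.",
    "ill": "You sound ill today. It's advised that you perform speaker model "
           "training after restoring from illness. Your voice should be more "
           "stable then.",
    "angry": "You sound angry. Maybe you should perform speaker model training "
             "when you're calm and have stable voice.",
    "appointment_soon": "The calendar indicates that you're going to have an "
                        "appointment soon. Maybe you should perform speaker "
                        "model training next time.",
}

# Hypothetical fallback for labels without a prepared message.
FALLBACK = "Speaker model training is not available right now."


def select_feedback(label):
    return TRAINING_FEEDBACK.get(label, FALLBACK)
```

A table keeps the wording of each message separate from the detection logic, so new situations can be covered by adding entries.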

After electronic device 100/200 has produced the trained speaker model for the user in step 440, the device itself or the user may further be allowed to adjust the authentication threshold. For example, electronic device 100/200 may first adjust the authentication threshold according to the user's instruction. It may then run a speaker authentication test with the adjusted threshold to determine whether to keep it. If the test shows that the adjusted threshold carries too high a risk of false rejection, electronic device 100/200 may decide not to keep the adjusted threshold and advise the user not to set the threshold so high. If the test shows that the adjusted threshold carries too high a risk of false acceptance, electronic device 100/200 may decide not to keep the adjusted threshold and advise the user not to set the threshold so low.
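
One way to picture the test on an adjusted threshold is to estimate the false rejection rate and false acceptance rate on previously collected scores. This is a sketch under assumptions: the description does not say how the risk is measured, so the score lists, the rate limits `max_frr` and `max_far`, and the advice strings are all hypothetical. Scores above the threshold count as accepted, as in the testing phase.

```python
def evaluate_threshold(threshold, genuine_scores, impostor_scores,
                       max_frr=0.1, max_far=0.01):
    """Return (keep, advice) for an adjusted authentication threshold."""
    # False rejection: a genuine score that does not exceed the threshold.
    frr = sum(s <= threshold for s in genuine_scores) / len(genuine_scores)
    # False acceptance: an impostor score that exceeds the threshold.
    far = sum(s > threshold for s in impostor_scores) / len(impostor_scores)
    if frr > max_frr:
        return False, "Please do not set the threshold so high."
    if far > max_far:
        return False, "Please do not set the threshold so low."
    return True, "Threshold kept."
```

With this shape, the device keeps the user's setting only when both error rates stay within the chosen limits.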

As mentioned above, electronic device 100/200 also has a testing phase. Fig. 5 is a simplified flowchart of a speaker authentication method according to an embodiment of the present invention; the method is performed by electronic device 100/200 in the testing phase. First, in step 510, speaker authenticator 140 receives speech data representing an utterance made by a user. Although step 510 is similar to step 410, in step 510 electronic device 100/200 is not yet certain whether the current user is the authorized user corresponding to the trained speaker model. Step 510 may take place when the user wants to turn on or unlock electronic device 100/200, or wants to access personal/sensitive information through electronic device 100/200.

In step 520, side information extractor 120 extracts side information for the user. Because step 520 is very similar to step 420, which is described in detail above, its detailed description is omitted here for brevity.

In step 530, speaker authenticator 140 performs speaker authentication using the speech data and the trained speaker model. If the user passes authentication, electronic device 100/200 performs step 540; if the user fails authentication, electronic device 100/200 performs step 550.

For example, in step 530, speaker authenticator 140 may produce a score representing the degree of similarity between the speech data and the trained speaker model. The higher the score, the more closely the user who made the utterance resembles the authorized user corresponding to the trained speaker model. If the score falls within a first range, for example the range of scores above the authentication threshold, electronic device 100/200 performs step 540 to grant the user access. If the score falls within a second range, for example the range of scores equal to or below the authentication threshold, electronic device 100/200 performs step 550 to deny the user access.

In step 530, electronic device 100/200 may base its decision not only on the speech data and the trained speaker model, but also on the side information. For example, if the side information shows that the user is in a noisy environment, electronic device 100/200 may take one or any combination of the following actions in step 530 to avoid a false rejection: setting a lower authentication threshold, suppressing noise more aggressively, and using classifier 124 as a noisy classifier. In another embodiment, if the side information shows that the user is angry, electronic device 100/200 may take one or any combination of the following actions in step 530 to avoid a false rejection: giving pitch-related features a higher confidence score, and normalizing the pitch-related features in the speech data.
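
The threshold comparison of step 530, combined with one of the side-information actions named above (a lower threshold in a noisy environment), can be sketched as follows. The amount of relief `noise_relief`, the dictionary key, and the function names are assumptions; the description gives no numeric values.

```python
def adjusted_threshold(base_threshold, side_info, noise_relief=0.05):
    """Lower the authentication threshold when the environment is noisy."""
    if side_info.get("environment") == "noisy":
        return base_threshold - noise_relief
    return base_threshold


def step_530(score, base_threshold, side_info):
    """Return 540 (grant access) or 550 (deny access)."""
    threshold = adjusted_threshold(base_threshold, side_info)
    # A score above the threshold passes; a score equal to or below it fails.
    return 540 if score > threshold else 550
```

Here the same score of 0.72 against a base threshold of 0.75 would pass in a noisy environment and fail in a quiet one, which illustrates how side information can reduce false rejections at the cost of a looser check.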

After step 530, electronic device 100/200 may generate a feedback message for the user based on the side information and the authentication result. Specifically, if the result shows that the user has passed authentication, electronic device 100/200 grants the user access in step 540 and generates a greeting message (the feedback message) for the user based on the side information. For example, electronic device 100/200 may display the generated feedback message or play it as sound, so that the user can see or hear its content.

To make the response to the user more informative, message generator 150 may, in step 540, select the feedback message based on the side information from a message group containing a plurality of candidate greeting messages. The following are example candidate greeting messages for various situations that may be included in this comprehensive message group:

"Welcome. The weather doesn't sound good. Be careful not to catch a cold."

"Good morning/day/evening."

"Today is a sunny day. Good luck."

"Good to see you again. Don't forget your appointment at one o'clock."

"How about the lunch today?"

"Please be quiet when you are in a movie theater."

"Have fun at this playground."

"You sound ill. Should I make a doctor's appointment for you?"

"Don't be sad. Everything will be fine."

"Don't be angry. Please calm down."

For the user, these candidate greeting messages are more meaningful than the fixed, predetermined messages with which a conventional device responds without relying on side information. These candidate greeting messages can improve the user experience.

In step 550, electronic device 100/200 denies the user access and generates a feedback message for the user based on the side information. For example, electronic device 100/200 may display the generated feedback message or play it as sound, so that the user can see or hear its content.

To make the response to the user more informative, message generator 150 may, in step 550, select the feedback message based on the side information from a message group containing a plurality of candidate fail messages. The following are example candidate fail messages for various situations that may be included in this comprehensive message group:

"Your voice is too low. Please speak louder."

"You sound ill. Maybe that's why your voice didn't pass the authentication test."

"It's noisy here. Please consider speaking closer to the microphone or using the touchscreen to unlock."

"You sound too agitated. Please try again when you are calm."

"This place seems to be crowded with people. Be careful not to let others overhear your password."

Even if the user fails authentication, if the score falls within a third range, for example between the authentication threshold and the authentication threshold minus a small margin, electronic device 100/200 may apply the concept of a sequential test and give the user another chance to be tested. For example, electronic device 100/200 may first generate a fail message in step 550 that encourages the user to try again. After the second test, electronic device 100/200 may decide whether to grant the user access based on the mean of the user's two test results.
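
The sequential test described above can be sketched as follows. The function name, the callback `get_second_score` that stands for a second authentication attempt, and the inclusive treatment of the range boundaries are assumptions made for the example.

```python
def sequential_decision(first_score, threshold, margin, get_second_score):
    """Return True to grant access, False to deny it."""
    if first_score > threshold:
        return True                      # passed on the first attempt
    if first_score < threshold - margin:
        return False                     # clearly below the near-miss range
    # Near miss: the score lies between (threshold - margin) and threshold,
    # so the user gets one more attempt and the two scores are averaged.
    second_score = get_second_score()
    return (first_score + second_score) / 2 > threshold
```

The second attempt is requested only for near misses, so a clear pass or a clear failure is decided immediately.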

For the user, these candidate fail messages are more meaningful than the fixed, predetermined messages (for example, "access denied") with which a conventional device responds without relying on side information. In this case, even if the user is falsely rejected by electronic device 100/200, he or she can understand the likely cause of the failure and will not unfairly blame the speaker authentication function.

The above describes only preferred embodiments of the present invention. All equivalent changes and modifications made in accordance with the claims of the present invention fall within the scope of the present invention.

Claims

1. A preparation method for speaker authentication, comprising:

receiving speech data, the speech data representing an utterance made by a user;

extracting side information;

checking the side information to determine whether to allow speaker model training using the speech data; and

if the speaker model training using the speech data is not allowed, generating a feedback message for the user based on the side information.

2. speaker verification's as claimed in claim 1 preparation method, is characterized in that, more comprises:

Adjust verification threshold according to user's instruction; And

Utilize adjusted this verification threshold to carry out speaker verification's test, to determine whether to keep adjusted this verification threshold.

3. speaker verification's as claimed in claim 1 preparation method, is characterized in that, the text-independent of this side information and this speech.

4. The speaker verification preparation method as claimed in claim 1, characterized in that the step of extracting the side information comprises:

Extracting the side information from the speech data.

5. The speaker verification preparation method as claimed in claim 1, characterized in that the step of extracting the side information comprises:

Extracting the side information from side data that is independent of the speech data.

6. The speaker verification preparation method as claimed in claim 1, characterized in that the step of extracting the side information comprises:

Detecting at least one of an environment, a health condition of the user, an emotion of the user, and a recent event of the user, wherein the user makes the speech in the environment.

7. A speaker verification method, comprising:

Extracting side information;

Performing speaker verification using the speech data and a trained speaker model; and

Producing a feedback message for the user based on the side information and a result of the speaker verification.

8. The speaker verification method as claimed in claim 7, characterized by further comprising:

If the user obtains a first verification score within a particular range, obtaining a second verification score for the user by giving the user another speaker verification test, and determining, based on the first verification score and the second verification score, whether to grant the user access.

9. The speaker verification method as claimed in claim 7, characterized in that the side information is independent of the text of the speech.

10. The speaker verification method as claimed in claim 7, characterized in that the step of extracting the side information comprises:

Extracting the side information from the speech data.

11. The speaker verification method as claimed in claim 7, characterized in that the step of extracting the side information comprises:

12. The speaker verification method as claimed in claim 7, characterized in that the step of extracting the side information comprises:

13. An electronic device, comprising:

A side information extractor, configured to extract side information and to check the side information to determine whether to allow speaker model training using speech data, wherein the speech data represents speech made by a user;

A speaker model generator, coupled to the side information extractor and configured to train a speaker model using the speech data when the side information extractor allows the speaker model training using the speech data; and

A message generator, coupled to the side information extractor and configured to produce a feedback message for the user based on the side information when the side information extractor does not allow the speaker model training using the speech data.

14. The electronic device as claimed in claim 13, characterized in that the side information extractor comprises:

A feature extractor, configured to extract a plurality of features from the speech data;

A classifier, coupled to the feature extractor and configured to classify the extracted features; and

A decision module, coupled to the classifier, the speaker model generator, and the message generator, and configured to produce the side information based on a classification result provided by the classifier and to control the speaker model generator based on the side information.

15. The electronic device as claimed in claim 13, characterized in that the side information extractor comprises:

A feature extractor, configured to extract a plurality of features from side data that is independent of the speech data;

16. The electronic device as claimed in claim 13, characterized in that the side information extractor is further configured to detect at least one of an environment, a health condition of the user, an emotion of the user, and a recent event of the user, wherein the user makes the speech in the environment.

17. An electronic device, comprising:

A side information extractor, configured to extract side information;

A speaker verification device, configured to perform speaker verification using speech data and a trained speaker model, wherein the speech data represents speech made by a user; and

A message generator, coupled to the side information extractor and the speaker verification device, and configured to produce a feedback message for the user based on the side information and a result provided by the speaker verification device.

18. The electronic device as claimed in claim 17, characterized in that the side information extractor comprises:

A decision module, coupled to the classifier and the message generator, and configured to produce the side information based on a classification result provided by the classifier.

19. The electronic device as claimed in claim 17, characterized in that the side information extractor comprises:

20. The electronic device as claimed in claim 17, characterized in that the side information extractor is further configured to detect at least one of an environment, a health condition of the user, an emotion of the user, and a recent event of the user, wherein the user makes the speech in the environment.