CN105575402A - Network teaching real time voice analysis method - Google Patents


Info

Publication number
CN105575402A
Authority
CN
China
Prior art keywords
user
voice
real
text
web
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201510971830.3A
Other languages
Chinese (zh)
Inventor
陈拥权
李建中
郑荣稳
鲁加旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Huanjing Information Technology Co Ltd
Original Assignee
Hefei Huanjing Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Huanjing Information Technology Co Ltd filed Critical Hefei Huanjing Information Technology Co Ltd
Priority application: CN201510971830.3A
Publication: CN105575402A
Legal status: Withdrawn


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 19/00 Teaching not covered by other main groups of this subclass
    • G09B 19/04 Speaking
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 19/00 Teaching not covered by other main groups of this subclass
    • G09B 19/06 Foreign languages

Abstract

The invention provides a real-time voice analysis method for network teaching, comprising the steps of capturing a voice input, performing real-time recognition of the voice input, analyzing the recognized voice input in order to mark possible errors in the speech, and processing the resulting text in order to extract contextual dialogue prompts, wherein the voice input comprises speech coming from the user or from at least one other speaker. The real-time recognition comprises converting the voice input to text using automatic speech recognition (ASR). The contextual dialogue prompts are used to detect at least one of a candidate sound, a candidate word and a candidate phrase to correct. The invention has the advantage that the method provides real-time, passive monitoring of the user's speech and does not require the user's active participation. The method is highly interactive, highly personalized, and able to exploit context and dialogue semantics.

Description

Real-time voice analysis method for web-based teaching
Technical field
The invention belongs to the field of speech analysis technology, and specifically relates to a method for real-time voice analysis in web-based teaching.
Background technology
Speech is an indispensable part of daily life. Accurate speech (e.g., pronunciation, grammar) plays an important role in effective communication. Speaking effectively makes a person easy to understand, sound confident, and able to express key points clearly.
Conventional means of correcting and improving speech include human instruction and computer-aided tools. In conventional human instruction, a teacher (i.e., a speech-language trainer, linguist, etc.) helps to correct and improve speech: the learner can, for example, attend an in-person workshop or complete an online course. However, working with a live teacher takes considerable time, the cost is usually high, and this approach lacks much-needed flexibility.
In a conventional computer-aided tool, the user opens the software and reads aloud the text the software displays (pre-selected or randomly chosen). The computer analyzes the user's speech and identifies errors, for example by measuring how close the speech is to the required pronunciation, or by converting the speech input to text with a speech recognition component and then measuring how close the converted text is to the original text.
However, such computer-aided tools lack a personal touch, and it is difficult for a computer to represent the user's actual, real speech content. Moreover, the user usually still has to spend considerable time using the tool.
The speech recognition component of a conventional tool is trained in advance and is therefore not highly personalized. In practice, a conventional computer aid cannot dynamically adapt to the user's speech or to the content of the user's conversations with others. Conventional methods also require active practice. Texts selected in advance may not correspond to the words and phrases the user says most often, so conventional techniques can miss the things the user habitually says, such as certain terms of art.
Summary of the invention
To address the shortcomings and disadvantages of the conventional methods and structures described above, the invention provides a real-time voice analysis method that corrects and improves the user's speech in web-based teaching in a highly personalized and timely manner.
The invention adopts the following scheme to achieve this purpose: a method of real-time voice analysis for a user in web-based teaching, characterized in that the method comprises the following steps:
A) capturing a voice input;
B) performing real-time recognition of the voice input;
C) analyzing the recognized voice input to identify possible errors in the user's speech;
D) processing the resulting text to extract contextual dialogue prompts;
wherein the voice input comprises speech from the user and from at least one other speaker, and the real-time recognition comprises converting the voice input to text using automatic speech recognition (ASR).
The contextual dialogue prompts are used to detect at least one of a candidate sound, a candidate word and a candidate phrase to correct.
Preferably, the possible errors comprise at least one of a mispronunciation, a syntax error and a grammatical error.
Preferably, the analysis comprises conventional semantic analysis.
Preferably, performing the real-time recognition comprises using voice prompts from the at least one other speaker.
Preferably, the possible errors are identified by using the contextual dialogue prompts.
Preferably, the method further comprises: suggesting error corrections to the user in real time.
Preferably, the method further comprises: creating a customized learning session for the user, wherein the learning session comprises an interactive learning session and is based on frequent error patterns.
Preferably, the method further comprises: outputting to the user at least one of the identified errors, a visual correction, an audible correction and a suggested synonym.
Preferably, the method further comprises: extracting the errors produced by the user; aggregating frequent error patterns with the help of a machine learning algorithm; and storing at least one of the errors produced by the user and the frequent error patterns in a user profile.
Preferably, the user profile comprises at least one of the user's nationality, the user's accent and the user's history, wherein the user's history comprises at least one of analyzed user speech, identified errors, previous responses, previous user feedback and the user's error-tolerance preferences.
The invention has the advantage that the web-based teaching real-time voice analysis method provides real-time, passive monitoring of the user's speech without requiring the user's active participation. The method is highly interactive, highly personalized, and able to exploit context and dialogue semantics.
Brief description of the drawings
Fig. 1 is an architecture diagram of the web-based teaching real-time voice analysis system in an embodiment of the present invention;
Fig. 2 is a block diagram of the steps of the method of the present invention.
Detailed description of embodiments
Referring now to the drawings, and more particularly to Figs. 1-2, exemplary embodiments of the method and structures of the present invention are shown.
In today's busy world, time is scarce. The present invention requires no active practice; instead, it provides real-time, passive monitoring of the user's speech.
Further, every person is unique. As far as speech is concerned, one person's weakness may be another's strength, so correcting and improving speech with default words and phrases only goes so far. By analyzing actual speech from the user's daily life rather than pre-selected texts, the present invention covers a representative and complete set of the user's high-frequency vocabulary.
The present invention also supports a highly personalized mispronunciation profile and speech recognition component. The customized, interactive lessons it provides can target the errors unique to the user and focus on the user's particular problems.
The present invention deploys an interactive user interface that not only uses user feedback in analyzing speech errors, but also suggests corrections to the user.
The present invention can use dialogue-context information to help mark errors. With contextual information, dialogue semantics, topic identification and the like, errors can be identified more easily. That is, the present invention can rely on contextual information in the user's speech and/or in exchanges between the user and one or more other speakers. This contextual information is generally referred to as contextual dialogue prompts.
In one exemplary embodiment, the present invention monitors the user's speech in daily life (e.g., conversations, phone calls, meetings) in real time.
The present invention can use speech recognition technology to convert the speech to text and mark problematic words/phrases by means of several metrics, which can include but are not limited to one or more of the following: the confidence score from speech recognition; lexical context analysis (e.g., using text mining techniques to identify words that rarely co-occur with the remainder of the context); and semantic context analysis (e.g., identifying the other party's questions and repetitions/corrections).
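A minimal sketch of the first two metrics above (ASR confidence and lexical-context membership). The function name, the 0.6 threshold, and the toy word list are illustrative assumptions, not values taken from the patent:

```python
def flag_problematic(words, conf_threshold=0.6, context_vocab=None):
    """Flag ASR words whose confidence score is below the threshold, or
    that fall outside a vocabulary of words expected in this context
    (a crude stand-in for text-mining co-occurrence analysis)."""
    context_vocab = context_vocab or set()
    flagged = []
    for word, conf in words:
        if conf < conf_threshold:
            flagged.append((word, "low-confidence"))
        elif context_vocab and word not in context_vocab:
            flagged.append((word, "out-of-context"))
    return flagged

words = [("buy", 0.97), ("pure", 0.92), ("kearn", 0.41), ("sugar", 0.95)]
print(flag_problematic(words, context_vocab={"buy", "pure", "cane", "sugar"}))
# [('kearn', 'low-confidence')]
```

In a real system the confidence scores would come from the ASR component and the context vocabulary from a mined co-occurrence model rather than a hand-written set.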
The present invention can correct problematic text without involving the user; it can optionally highlight the problematic text in the user interface and ask the user to correct it, or to confirm the automatic correction, verbally or graphically.
The present invention can compare the user's pronunciation in the original speech with the standard pronunciation of the corrected text, mark the errors, and store them in the user profile.
The present invention can provide corrections to the user in real time, via audio and via a graphical interface with voice.
The present invention can aggregate the user's frequent error patterns, display them to the user, and automatically schedule lessons intended to correct these errors. The present invention can maintain a histogram of the user's error patterns.
The present invention can be installed on a portable device (such as a smartphone), for example by downloading an application, or provided as a service over the Internet or through the various other channels by which programs and applications can be delivered.
In one exemplary embodiment, the claimed invention can provide pronunciation correction and training. Indeed, mispronunciation is usually a major problem in a user's speech, so the present invention can be particularly useful in providing pronunciation correction and improvement.
In one exemplary embodiment, the method of the present disclosure can be realized by training an automatic speech recognition (ASR) system with native speakers. The present invention then continuously receives spoken samples from the user into the ASR, and receives the ASR output (e.g., text) together with a confidence level associated with each word. The present invention then marks, in the text, one or more words or phrases that may not be the meaning the user expressed (referred to as "problematic" text).
Problematic text can be identified by selecting words with low confidence scores. Further, the present invention can pick out words that do not fit the context, and can also use dialogue semantics to identify problematic words.
A threshold on the confidence score used to find problematic text can be set, for example, based on test results, and can be adjusted and tuned. Adjustment and tuning help prevent the threshold from being set too high and therefore too strict, which would cause occasional false alarms; conversely, they help prevent the threshold from being set too low, where it would lack the required sensitivity and sometimes miss errors.
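Assuming a small labeled test set of (confidence, was-an-error) pairs, the adjustment and tuning described above could be sketched as a sweep over candidate thresholds. The F1 criterion and all names here are illustrative choices, not specified by the patent:

```python
def tune_threshold(scored, candidates):
    """scored: list of (confidence, is_error) pairs from a labeled test set.
    Return the candidate threshold with the best F1 score, balancing the
    false alarms of a too-strict (high) threshold against the misses of a
    too-lenient (low) one."""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        tp = sum(1 for c, err in scored if c < t and err)       # caught errors
        fp = sum(1 for c, err in scored if c < t and not err)   # false alarms
        fn = sum(1 for c, err in scored if c >= t and err)      # missed errors
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t

scored = [(0.3, True), (0.4, True), (0.5, False), (0.8, False), (0.9, False)]
print(tune_threshold(scored, [0.2, 0.45, 0.6, 0.95]))
# 0.45
```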
Various techniques can then be used to correct the problematic words, phrases, etc. These techniques can include, for example, querying the frequent error patterns in the user profile, and selecting the word(s) with similar pronunciation that fit better in the context and the statistical language model; the user can be asked, via an audio or graphical interface, to correct or confirm the automatic correction. Extracting the errors the user produces can be accomplished in various ways, for example by comparing the phonetics of the corrected text with the user's original utterance, or by sending the true (correct) text through an automatic speech generation (ASG) system and then comparing the ASG speech output with the user's original utterance.
The present invention can also provide the user with optional, real-time feedback/correction via an audio or graphical interface. This feedback and correction can include storing the errors in the user profile, aggregating the user's errors into common patterns, and storing those patterns in the user profile. Further, the present invention can create useful graphical data in the form of a histogram of the user's pronunciation error patterns.
The present invention can be used as a standalone application on a mobile device, or as a service over the Internet. It can also be used as a tool in training for translation between languages, as a tool for children learning to speak, or in any other application where there is reason to monitor and correct speech and/or pronunciation.
As mentioned above, problematic text is text that the user did not actually mean, but that the ASR recognized because of the user's error (e.g., faulty pronunciation). "True" text carries the meaning the user actually intended. For example, suppose the user mispronounces "cane sugar" as "kearn sugar". In this example, the problematic text would be "kearn sugar" and the "true" text would be "cane sugar".
Fig. 1 illustrates an exemplary embodiment of the present disclosure, showing a system 100. The system comprises a capture component 110, an automatic speech recognition (ASR) component 120, an error identification component 130, an error extraction component 140, a user interface 150, a storage component 160, an error aggregation component 170, a user profile component 180, a lesson planner component 190, and an active learning component 195.
The storage component 160 can represent, for example, a disk drive, a magnetic storage drive, an optical storage device, a flash memory device, other types of storage devices, and any combination thereof.
The capture component 110 receives the voice input, possibly from one or more sources. The voice input can comprise the speech of multiple speakers. That is, in one exemplary embodiment, the voice input comprises speech from a single user; in another exemplary embodiment, the voice input comprises a dialogue, which can include a conversation between the user and one or more other speakers.
The input of the error identification component 130 comprises the text output from the ASR, which can include dialogue text. The user's speech text is separated from that of the other parties to the dialogue, and the speech can also carry a confidence score associated with each word/phrase. Further, the error identification component 130 can rely on the information stored in the user profile component 180, and can respond to user feedback generated from the user interface 150. The dotted arrow from the user interface 150 represents optional input from the user. User feedback can comprise, for example, confirming and/or amending (when prompted) a certain problematic text with the actual (true) text.
The user interface 150 can be separate interfaces or a single interface, and can be audio as well as graphical/textual.
Further, in addition to the most probable text output from the ASR, the present invention can also output a list of multiple possible texts. In this respect, the present invention can use an N-best list (the top N most probable texts) for each sentence (see the retrieval of "true" text below).
The present invention can detect problematic text in various ways, and these ways are neither exclusive nor limiting. In one exemplary embodiment, the error identification module relies on question detection and question retrieval to detect problematic text. In this embodiment, the error identification module checks whether the other party questions the user's prior statement, for example with "Did you mean ...?" or "Pardon?", thereby determining errors by analyzing the dialogue between the speakers.
Further, the error identification component 130 can perform and rely on a similarity calculation. The similarity calculation checks whether the other party attempts to repeat or rephrase the user's prior statement. If someone else attempts to repeat or rephrase something the user said, this is usually a strong indication of an error.
In addition, the error identification component 130 can perform and rely on topic extraction. Topic extraction checks whether the user's statement appears off-topic; if what the user said is off-topic, it may differ from the true text. The error identification component 130 can also refer to the confidence scores from the ASR and/or to the user profile (i.e., which errors the user frequently produces).
The exemplary methods and techniques relied on by the error identification component 130, described above, can be performed together or used separately, and other techniques and methods can be used in a similar manner.
The present invention can also retrieve the true text in various ways. The error identification component 130 can use prompts from the other speakers to retrieve the true text, for example when a speaker repeats or rephrases what the user said. The true text can also be identified by finding words or phrases whose pronunciation is similar to the problematic text but which fit the topic of the conversation. Further, the true text can be retrieved with reference to the "N-best" list for the speech containing the problematic text, or by reference to the information in the user profile (e.g., which errors the user frequently produces).
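The question-detection and similarity checks described above can be sketched with standard-library string matching. The cue list and the 0.7 ratio are illustrative assumptions, not parameters from the patent:

```python
import difflib

# Hypothetical clarification cues of the "Did you mean ...?" / "Pardon?" kind.
CLARIFICATION_CUES = ("did you mean", "pardon", "what?", "sorry?")

def dialogue_error_signal(user_utterance, reply):
    """Return a heuristic error signal derived from the other party's reply:
    an explicit clarification question, or a near-repetition of the user's
    statement (a strong indication of an error)."""
    low = reply.lower()
    if any(cue in low for cue in CLARIFICATION_CUES):
        return "clarification-question"
    ratio = difflib.SequenceMatcher(None, user_utterance.lower(), low).ratio()
    if ratio > 0.7:          # the other party echoed the user almost verbatim
        return "repetition"
    return None

print(dialogue_error_signal("buy some pure kearn sugar",
                            "Did you mean cane sugar?"))
# clarification-question
```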
The error identification component 130 can output various data and information. That is, the output of the error identification component 130 can comprise a text output, which can include the problematic text, marked and possibly shown together with multiple candidates for the true text.
Alternatively, the error identification component 130 can also output alternative text to help the user avoid a common mispronunciation. The alternative text can comprise a suggestion to the user of text that is semantically similar to the true text but is not a tongue-twister for the user. For example, if the user has trouble pronouncing the word "automobile", the error identification component 130 can suggest that the user substitute "car". One way of performing this operation is by using a synonym dictionary or similar software.
In one exemplary embodiment, the output of the error identification component 130 is supplied to the input of the error extraction component 140.
Further, the output of the error identification component 130 can also be received by the user interface 150.
The error extraction component 140 extracts the errors using the information it receives. Indeed, in one exemplary embodiment, the error extraction component 140 receives an input comprising the problematic text, the true text, and/or the original audio from which the ASR generated the problematic text.
The error extraction component 140 can then output, for example, the errors displayed as text. This can be done by comparing the problematic text with the true text and finding the differences; the differences found in the comparison are the extracted errors.
The error extraction component 140 can output phonetic errors. This is done when the error extraction module compares the phonetics of the problematic text with the phonetics of the "true" text; again, the differences found in the comparison are the extracted errors.
In addition, the error extraction component 140 can output the errors as audio. This is done when the error extraction component sends the true text to the automatic speech generation (ASG) module, which generates the correct pronunciation of the true text. The correct pronunciation is then compared with the original audio, and the differences are the errors in the audio.
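The "compare the texts and take the differences" operation described for component 140 can be sketched as a word-level diff; `extract_errors` is an illustrative name, not one from the patent:

```python
import difflib

def extract_errors(problematic, true_text):
    """Word-level diff between the problematic ASR transcript and the
    'true' text; each replaced span is one extracted error, returned as a
    (mispronounced, intended) pair."""
    a, b = problematic.split(), true_text.split()
    sm = difflib.SequenceMatcher(None, a, b)
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for op, i1, i2, j1, j2 in sm.get_opcodes() if op == "replace"]

print(extract_errors("buy some pure kearn sugar",
                     "buy some pure cane sugar"))
# [('kearn', 'cane')]
```

The phonetic and audio comparisons would follow the same pattern, applied to phoneme sequences or aligned audio features instead of words.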
The output of the error extraction component is supplied to the user interface 150, and can also be supplied to the storage component 160.
The storage component 160 stores any errors found. The output of the storage component 160 is supplied to the error aggregation component 170; thus, the data stored in the storage component 160 is the input to the error aggregation component 170.
The error aggregation component 170 can detect patterns formed by the user's errors, and can thus aggregate the user's frequent error patterns. These error patterns can be displayed to the user, and can also be relied upon, optionally, to schedule lessons for the user intended to correct these errors.
The data compiled in the error aggregation component 170 is output to and stored in the user profile component 180. The user profile can comprise and store various information about the user, including but not limited to the user's nationality, any accent the user has, and historical information about the user. This historical information can comprise the user's frequent error patterns, any analyzed user speech, any previous responses to identified errors, and any feedback from the user. The user may choose to ignore certain error patterns he produces, so that if he produces such an error again within a certain period, the system tolerates it. This tolerance preference can also be stored in the user profile, and the configuration can be changed easily.
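The profile fields enumerated above could be held in a structure like the following sketch; the field names and the `should_flag` helper are illustrative assumptions, not the patent's data model:

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    """Sketch of a per-user profile for component 180."""
    nationality: str = ""
    accent: str = ""
    analyzed_speech: list = field(default_factory=list)     # past utterances
    identified_errors: list = field(default_factory=list)   # past errors
    feedback_history: list = field(default_factory=list)    # user responses
    tolerated_patterns: set = field(default_factory=set)    # patterns to ignore

    def should_flag(self, pattern):
        # Respect the user's tolerance preference for known patterns.
        return pattern not in self.tolerated_patterns

profile = UserProfile(nationality="CN", accent="Mandarin")
profile.tolerated_patterns.add("*earn")
print(profile.should_flag("*earn"))
# False
```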
Creating customized lessons intended to correct and improve the user's speech can also bring many benefits to the user. Information from the user profile component 180 can be output to the lesson planner component 190, which can schedule lessons for the user. These lessons are highly interactive and highly customizable. Lessons can be created by relying on user input, user feedback, user error patterns, or other user data, and with this information the user can customize the lessons further. Lessons can target particular types of errors, the user's problem areas, and other fields of difficulty, and can be scheduled passively, without requiring any user time or cooperation.
The user may also wish to participate actively in some lessons, which come from the active learning component 195. This component may not use dialogue, but may require the user's active participation; its lessons can, for example, comprise reference sentences in which errors are easier to determine. In one exemplary embodiment, the error aggregation component 170 can be trained by machine learning. For example, given the expected pronunciation and the incorrect pronunciation, a machine learning algorithm can automatically classify errors and extract error patterns, using contextual, phonetic and/or morphological information as features. The classification can be performed with different machine learning techniques (such as decision trees, SVMs, etc.). The error identification component 130 can rely on the information in the user profile component 180.
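A sketch of what the feature extraction and pattern grouping might look like. A real system would feed features such as these to a decision-tree or SVM classifier; `cluster_by_suffix` is a deliberately simple stand-in for learned pattern extraction, and every name here is an assumption:

```python
def error_features(expected, actual):
    """Crude orthographic features for one (expected, actual) pronunciation
    pair -- the kind of characteristics a decision tree or SVM could consume."""
    return {
        "same_first_letter": expected[:1] == actual[:1],
        "same_last_letter": expected[-1:] == actual[-1:],
        "length_delta": len(actual) - len(expected),
    }

def cluster_by_suffix(error_pairs, k=4):
    """Group extracted (expected, actual) errors by the actual word's
    last-k characters -- a toy stand-in for learned error patterns."""
    patterns = {}
    for expected, actual in error_pairs:
        patterns.setdefault("*" + actual[-k:], []).append((expected, actual))
    return patterns

pairs = [("lane", "learn"), ("cane", "kearn")]
print(cluster_by_suffix(pairs))
# {'*earn': [('lane', 'learn'), ('cane', 'kearn')]}
```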
An exemplary method according to an exemplary embodiment of the present invention can also be performed based on Fig. 1.
Fig. 2 illustrates an exemplary method according to one exemplary embodiment of the present invention. In step 200, the voice input is captured. Then, in step 210, real-time recognition is performed on the voice input. Then, in step 220, the recognized voice input is analyzed and errors can be identified.
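The steps above, together with the prompt-extraction step of the claimed method, can be sketched as a small pipeline. Every helper name and the toy stand-ins below are illustrative assumptions, not an implementation from the patent:

```python
def analyze_utterance(audio, asr_transcribe, find_errors, extract_prompts):
    """One pass of the sketched pipeline over a captured audio chunk."""
    # Step 210: real-time recognition -- convert speech to text via ASR.
    words = asr_transcribe(audio)              # [(word, confidence), ...]
    # Step 220: analyze the recognized input to identify possible errors.
    errors = find_errors(words)
    # Claimed step D: process the text to extract contextual dialogue prompts.
    prompts = extract_prompts([w for w, _ in words])
    return {"words": words, "errors": errors, "prompts": prompts}

# Toy stand-ins so the sketch runs end to end (step 200's capture is mocked
# by the byte string below).
fake_asr = lambda audio: [("buy", 0.97), ("kearn", 0.41), ("sugar", 0.95)]
low_confidence = lambda words: [w for w, c in words if c < 0.6]
topic_words = lambda words: [w for w in words if w == "sugar"]

result = analyze_utterance(b"raw-audio", fake_asr, low_confidence, topic_words)
print(result["errors"], result["prompts"])
# ['kearn'] ['sugar']
```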
The method of an exemplary embodiment of the present disclosure can be illustrated by way of example. The following example uses the dialogue context (i.e., contextual dialogue prompts).
In this example, the present invention runs in the background on the smartphone of a user, Sally. Sally gives driving directions to her husband: "You should go to the left learn when you see the department store, then take a left turn at the next light." Her husband agrees, but sounds unsure: "Uh-huh...". Using this contextual information, the present invention interrupts Sally, in a way only she can hear, to remind her: "you probably mean lane and not learn". Sally then corrects herself to her husband: "Make sure you stay in the left lane."
Later, Sally calls again to ask her husband to buy some sugar: "Buy some pure kearn sugar." Her husband answers: "Where do they keep the corn sugar? All I can find near the flour is cane sugar." The present invention notices that Sally probably means cane and, since she has difficulty with it, suggests the synonym sugarcane. It then prompts Sally to "try sugarcane instead of cane" and records her difficulty with the word cane. Sally then explains to her husband that she needs cane sugar, and is proud of having made herself clearly understood.
At any time, when Sally needs to, she can use the application to review the errors she has produced, such as today's most frequent error pattern: pronouncing "*ane" as "*earn". This pattern is stored in Sally's personal profile and helps catch more of her errors later. Lessons customized to correct this error pattern can be scheduled for Sally.
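The error-pattern histogram the example describes can be tallied with a standard-library `Counter`; the suffix-based pattern key is an illustrative simplification of whatever pattern representation a real system would learn:

```python
from collections import Counter

def pattern_histogram(errors, suffix_len=4):
    """Histogram of error patterns, keyed by the suffix of the mispronounced
    form -- a sketch of the per-user error-pattern histogram."""
    return Counter("*" + actual[-suffix_len:] for _, actual in errors)

hist = pattern_histogram([("lane", "learn"), ("cane", "kearn"), ("light", "right")])
print(hist.most_common(1))
# [('*earn', 2)]
```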
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.), or an embodiment combining software and hardware aspects, which may all generally be referred to herein as a "circuit", "module" or "system". Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
Any combination of one or more computer readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate or transport a program for use by or in connection with an instruction execution system, apparatus or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or system. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions, when executed via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowchart and/or block diagram.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in one or more blocks of the flowchart and/or block diagram.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process, such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in one or more blocks of the flowchart and/or block diagram.
Above-described embodiment is only a kind of specific implementation of the present invention, and it describes comparatively concrete and detailed, but therefore can not be interpreted as the restriction to the scope of the claims of the present invention.It should be pointed out that for the person of ordinary skill of the art, without departing from the inventive concept of the premise, can also make some distortion and improvement, these apparent replacement forms all belong to protection scope of the present invention.

Claims (10)

1. A network teaching real-time voice analysis method, characterized in that the method comprises the steps of:
A) capturing a voice input;
B) performing real-time recognition of the voice input;
C) analyzing the recognized voice input to identify possible errors in the voice of a user;
D) processing the text to extract contextual dialog cues;
wherein the voice input comprises voice from the user and from at least one other speaker; the real-time recognition comprises converting the voice input into text using automatic speech recognition (ASR);
and the contextual dialog cues are used to detect at least one of a candidate sound, a candidate word and a candidate phrase for correction.
2. The network teaching real-time voice analysis method according to claim 1, characterized in that the possible errors comprise at least one of a pronunciation error, a syntax error and a grammatical error.
3. The network teaching real-time voice analysis method according to claim 1, characterized in that the analyzing comprises conventional semantic analysis.
4. The network teaching real-time voice analysis method according to claim 1, characterized in that performing the real-time recognition comprises using voice cues from the at least one other speaker.
5. The network teaching real-time voice analysis method according to claim 1, characterized in that the possible errors are identified by using the contextual dialog cues.
6. The network teaching real-time voice analysis method according to claim 1, characterized in that the method further comprises: providing suggested error corrections to the user in real time.
7. The network teaching real-time voice analysis method according to claim 1, characterized in that the method further comprises: creating a customized user learning session, wherein the learning session comprises an interactive learning session and is based on frequent error patterns.
8. The network teaching real-time voice analysis method according to claim 1, characterized in that the method further comprises: outputting to the user at least one of an identified error, a visual correction, an audible correction and a suggested synonym.
9. The network teaching real-time voice analysis method according to claim 1, characterized in that the method further comprises: extracting errors produced by the user; aggregating frequent error patterns with the aid of a machine learning algorithm; and storing at least one of the errors produced by the user and the frequent error patterns in a user profile.
10. The network teaching real-time voice analysis method according to claim 9, characterized in that the user profile comprises at least one of a user nationality, a user accent and a user history, and the user history comprises at least one of analyzed user speech, identified errors, previous responses, previous user feedback and a user error-tolerance preference.
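The pipeline of claims 1, 6 and 9 (capturing and recognizing voice input, flagging possible errors with the help of contextual dialog cues, suggesting corrections, and aggregating frequent error patterns into a user profile) can be sketched as follows. This is a minimal illustration only: the patent specifies no implementation, every name below (`RealTimeVoiceAnalyzer`, `recognize`, `analyze`, `update_profile`) is hypothetical, and a real ASR engine is replaced by a stub that passes text through.

```python
import re
from collections import Counter

class RealTimeVoiceAnalyzer:
    """Hypothetical sketch of the claimed pipeline: recognize speech
    (claim 1, step B), flag possible errors using contextual dialog
    cues (steps C-D), suggest corrections (claim 6), and aggregate
    frequent error patterns into a user profile (claim 9)."""

    def __init__(self, lexicon):
        self.lexicon = set(lexicon)      # known-good words for error spotting
        self.error_counter = Counter()   # per-token error counts (claim 9)
        self.user_profile = {"errors": [], "frequent_patterns": []}

    def recognize(self, audio_segment):
        # Step B: real-time recognition. A real system would call an
        # ASR engine here; this stub treats the input as the ASR text.
        return audio_segment.lower().strip()

    def analyze(self, text, context_words=()):
        # Steps C-D: flag out-of-lexicon tokens as possible errors and
        # use cues from the other speaker's speech to propose candidate
        # corrections (the "candidate word" limitation of claim 1).
        errors = []
        for token in re.findall(r"[a-z']+", text):
            if token not in self.lexicon:
                candidates = [w for w in context_words
                              if w[:2] == token[:2]]   # crude cue matching
                errors.append({"token": token, "candidates": candidates})
                self.error_counter[token] += 1
        return errors

    def update_profile(self):
        # Claim 9: store the user's errors and frequent error patterns.
        self.user_profile["errors"] = list(self.error_counter)
        self.user_profile["frequent_patterns"] = [
            t for t, n in self.error_counter.items() if n >= 2]
        return self.user_profile

analyzer = RealTimeVoiceAnalyzer(
    lexicon=["i", "went", "to", "the", "library", "yesterday",
             "was", "closed"])
context = ("library", "books")               # words spoken by the other speaker
for utterance in ("I went to the libary", "The libary was closed"):
    text = analyzer.recognize(utterance)     # step B
    errors = analyzer.analyze(text, context) # steps C-D
profile = analyzer.update_profile()
print(profile["frequent_patterns"])          # -> ['libary']
```

A production system would replace the prefix heuristic with phonetic or language-model scoring; the sketch only shows how the claimed steps compose.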
CN201510971830.3A 2015-12-18 2015-12-18 Network teaching real time voice analysis method Withdrawn CN105575402A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510971830.3A CN105575402A (en) 2015-12-18 2015-12-18 Network teaching real time voice analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510971830.3A CN105575402A (en) 2015-12-18 2015-12-18 Network teaching real time voice analysis method

Publications (1)

Publication Number Publication Date
CN105575402A true CN105575402A (en) 2016-05-11

Family

ID=55885453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510971830.3A Withdrawn CN105575402A (en) 2015-12-18 2015-12-18 Network teaching real time voice analysis method

Country Status (1)

Country Link
CN (1) CN105575402A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104252864A (en) * 2013-06-28 2014-12-31 国际商业机器公司 Real-time speech analysis method and system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108780542A (en) * 2016-06-21 2018-11-09 日本电气株式会社 Job support system, management server, portable terminal, job support method and program
CN108780542B (en) * 2016-06-21 2023-05-02 日本电气株式会社 Job support system, management server, mobile terminal, job support method, and program
CN109213971A (en) * 2017-06-30 2019-01-15 北京国双科技有限公司 Method and device for generating court trial records
CN107633721A (en) * 2017-08-25 2018-01-26 刘宸瑜 Error analysis method, apparatus and system for language learning
CN107908674A (en) * 2017-10-26 2018-04-13 费非 Voice judging method and device, storage medium and processor
CN111131525A (en) * 2020-02-04 2020-05-08 潍坊科技学院 Computer multimedia teaching auxiliary system
CN113973095A (en) * 2020-07-24 2022-01-25 林其禹 Pronunciation teaching method

Similar Documents

Publication Publication Date Title
CN104252864B (en) Real-time voice analysis method and system
CN112804400B (en) Customer service call voice quality inspection method and device, electronic equipment and storage medium
CN108962282B (en) Voice detection analysis method and device, computer equipment and storage medium
CN105575402A (en) Network teaching real time voice analysis method
US10593333B2 (en) Method and device for processing voice message, terminal and storage medium
US10810997B2 (en) Automated recognition system for natural language understanding
CN110047481B (en) Method and apparatus for speech recognition
US10140976B2 (en) Discriminative training of automatic speech recognition models with natural language processing dictionary for spoken language processing
US20140156276A1 (en) Conversation system and a method for recognizing speech
CN110136749A (en) The relevant end-to-end speech end-point detecting method of speaker and device
CN108682420B (en) Audio and video call dialect recognition method and terminal equipment
CN109410664B (en) Pronunciation correction method and electronic equipment
US20140337024A1 (en) Method and system for speech command detection, and information processing system
US10540994B2 (en) Personal device for hearing degradation monitoring
US9251808B2 (en) Apparatus and method for clustering speakers, and a non-transitory computer readable medium thereof
CN108305618B (en) Voice acquisition and search method, intelligent pen, search terminal and storage medium
CN109712610A (en) The method and apparatus of voice for identification
CN105551502A (en) Network-teaching real-time voice analysis system
CN104700831A (en) Analyzing method and device of voice features of audio files
JP6306447B2 (en) Terminal, program, and system for reproducing response sentence using a plurality of different dialogue control units simultaneously
CN112037772B (en) Response obligation detection method, system and device based on multiple modes
CN110021295B (en) Method and system for identifying erroneous transcription generated by a speech recognition system
CN113053186A (en) Interaction method, interaction device and storage medium
KR20140121169A (en) Apparatus and method for situation adaptive speech recognition for hearing impaired
KR101983031B1 (en) Language teaching method and language teaching system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20160511

WW01 Invention patent application withdrawn after publication