CN101739870B - Interactive language learning system and method

Info

Publication number: CN101739870B
Application number: CN2009101887026A
Authority: CN
Inventors: 王岚; 李崇国; 陈金玉; 蒙美玲
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2009-12-03
Filing date: 2009-12-03
Publication date: 2012-07-04
Anticipated expiration: 2029-12-03
Also published as: CN101739870A

Abstract

The invention relates to an interactive language learning system and an interactive language learning method. The core of the system comprises a feature extraction module, a speech recognition module, a pronunciation evaluation module, a prosody detection module and a prosody evaluation module, which together form a pronunciation and prosody detection module. The system can judge the learner's speech input and give feedback in real time, so that the learner knows exactly which pronunciation errors were made, and it dynamically provides memorization content by combining the feedback results with a memory curve. The learner can therefore improve language proficiency step by step in an interactive style of learning.

Description

Interactive language learning system and interactive language learning method

[technical field]

The present invention relates to an interactive language learning system and an interactive language learning method.

[background technology]

Language learning is an important part of acquiring knowledge, and more and more people rely on language learning aids to improve the speed and efficiency of their learning. Rich learning content, interactive operation, personalized courses and ease of use are the inevitable trends in the development of language learning systems.

A dictionary is in essence an assisted learning system. Its only medium is text, so although it helps with reading and writing, it offers no direct help with listening and speaking. With the continuing development of computer, multimedia and speech technologies, assisted learning systems that support listening, speaking, reading and writing to some degree, or in part, keep emerging. From electronic dictionaries to repeaters and point-reading machines, and on to learning software such as listening and writing programs, the forms and functions of assisted learning systems have gradually been enriched.

The weakness of these systems is that they support listening, speaking, reading and writing only in part. They do not organically combine the stages of language learning, they lack real-time error judgment and feedback, and the learner only receives content passively. Some of them include pronunciation quality evaluation, but what the learner finally gets is only a score or a grade, and such a score can hardly be accurate or authoritative. More importantly, what learners care about is the specific errors in their own pronunciation, that is, which parts are wrong. Such pronunciation evaluation systems can hardly provide that result, and they do not tell the learner how to correct the errors.

Therefore, the prior art is deficient and needs improvement.

[summary of the invention]

In view of the above problems, it is necessary to provide an interactive language learning system and an interactive language learning method that can feed back learning errors in real time and that support interactive practice and interactive memorization.

An interactive language learning system comprises: a voice acquisition module for collecting the learner's speech data; a pronunciation and prosody detection module for extracting from the speech data the feature parameters used for pronunciation and prosody error detection, further judging the errors and controlling the level of detail at which errors are displayed, and obtaining the final phoneme errors and prosody errors; a data storage and statistics module for recording the phoneme errors and prosody errors, giving an overall evaluation of the learner's pronunciation based on these errors, and feeding the evaluation result back to an interaction module; and the interaction module, which comprises a display interface for displaying the phoneme errors and prosody errors, the overall evaluation of the learner's pronunciation and the help options, and for providing a pronunciation prompt.

The pronunciation and prosody detection module comprises: a feature extraction module for extracting from the speech data the feature parameters used for pronunciation and prosody error detection; a speech recognition module for recognizing the feature parameters based on an acoustic model, in combination with a language model or a word network, to obtain the word sequence, the phoneme sequence, the corresponding time boundaries and the likelihood values; a pronunciation evaluation module for comparing and aligning the recognized phoneme sequence with the system's reference phonemes to obtain phoneme errors and help options; a prosody detection module for obtaining the word stress pattern, the whole-sentence intonation and the temporal rhythm with a statistical model, combining the feature parameters, the phoneme sequence, the time boundary information and the likelihood values; and a prosody evaluation module for comparing the word stress pattern, whole-sentence intonation and temporal rhythm with the reference pronunciation to obtain prosody errors and help options.

Preferably, the speech data collected by the voice acquisition module includes speech data obtained by repeating after the pronunciation prompt provided by the system, and speech data obtained by speaking according to a pronunciation scenario.

Preferably, the pronunciation evaluation module first uses a statistical method, combining the word sequence, phoneme sequence, time boundaries and likelihood values, to judge the content at the word level. If the content is inconsistent, the system records a content error, the interaction module indicates that the content of the whole sentence does not meet the requirement, and the learner is asked to input the speech again. Otherwise, the phonemes are checked to obtain phoneme errors, which comprise insertion, deletion and substitution errors of phonemes within a word.

Preferably, the word stress pattern is judged in units of syllables and comprises the positions of the primary stressed syllable and the secondary stressed syllable in the word; the whole-sentence intonation is the sentence stress of the whole sentence, i.e. the positions of the stressed syllables in the sentence, and it reflects the syllable-based trend of the fundamental frequency over the whole sentence and the intonation; the temporal rhythm is a judgment of speaking rate and duration.

Preferably, the pronunciation prompt of the interaction module is a pronunciation text, which is the learner's target learning content; or a reference pronunciation, which is the standard pronunciation produced by a native speaker of the target language; or a pronunciation scenario, which is a situation provided by the system in which the learner is required to speak.

Preferably, the interaction module further comprises an input interface, which is used to select the memory mode or the learning content, or to exit the system. The display interface is also used to display the information fed back by the system, including audio and spelling information and the information fed back by the data storage and statistics module. The interaction module selects the language learning material and prompts the learner by audio or by text. An audio prompt is the pronunciation, provided by the system, of the content to be memorized, and the learner is required to spell it and repeat it. A spelling prompt is a text prompt, provided by the system, for the spelling content to be memorized, and the learner is required to spell it, which yields the spelling content. The interactive language learning system further comprises a text collection module and a text spelling detection module. The text collection module collects the spelling content to obtain the input text. The text spelling detection module checks the input text and obtains spelling errors by calculating the edit distance between the input text and the standard answer text. The data storage and statistics module also records the spelling errors and further comprises a database, into which the detailed error statistics are written promptly; the database stores not only the learning records but also the learning content. Based on the current error records, the selected memory mode and the learning content stored in the database, the system selects and generates new learning content together with audio and spelling prompts and feeds them back to the interaction module, so that the next round of interactive learning begins; alternatively, the learner reselects the learning content according to the current learning progress, or exits the system.

Preferably, the spelling errors comprise substitution, insertion and deletion errors.

Preferably, the interaction module is also used to display a group of dialogue scenarios in the form of tasks. After a dialogue scenario is selected through the interaction module, subtasks appear, and the learner completes the task by performing interactive operations, pronouncing and spelling according to the information provided by the interaction module. The interactive language learning system further comprises a user interface and an operation judgment module: the user interface collects the interactive operations, and the operation judgment module judges whether the interactive operations meet the task requirements, obtaining operation errors. The data storage and statistics module also records the operation errors, and the database also stores dialogue-related information. The interactive language learning system further comprises a dialogue scenario module, which dynamically generates a new dialogue scenario from the error statistics and dialogue-related information output by the data storage and statistics module and displays it through the interaction module. Through the interaction module the learner can choose to enter a new round of learning or to quit learning.

Preferably, the interactive language learning system is implemented in one of a client/server mode, a browser/server mode, and a standalone mode based on an embedded system.

An interactive language learning method comprises: collecting the speech data obtained when the learner pronounces as required by the program; extracting from the speech data the feature parameters used for pronunciation and prosody error detection; recognizing the feature parameters based on an acoustic model, in combination with a language model or a word network, to obtain the word sequence, the phoneme sequence, the corresponding time boundaries and the likelihood values; comparing and aligning the phoneme sequence with the system's reference phonemes to obtain phoneme errors and help options; obtaining the word stress pattern, the whole-sentence intonation and the temporal rhythm with a statistical model, combining the feature parameters, the phoneme sequence and the time boundary information; comparing the word stress pattern, whole-sentence intonation and temporal rhythm with the reference pronunciation to obtain prosody errors and help options; and displaying the phoneme and prosody errors, the overall evaluation of the pronunciation and the help options, and providing a pronunciation prompt.

Preferably, the method further comprises the following steps: before the speech data is collected, outputting the memorization material as audio or text and requiring the learner to pronounce and spell it; collecting the spelling content to be memorized to obtain the input text; checking the input text to obtain spelling errors; compiling error statistics from the phoneme, prosody and spelling errors obtained, recording the specific phoneme errors, prosody errors and spelling errors, and giving an evaluation score and feedback information; displaying the evaluation score and feedback information; and receiving an instruction to select the memory mode or the learning content, or to exit the program.

Preferably, the method further comprises the following steps: displaying a dialogue scenario, in which the learner pronounces, spells and performs interactive operations as the dialogue scenario requires; collecting the interactive operations; judging whether the interactive operations meet the task requirements, obtaining operation errors; compiling error statistics from the phoneme, prosody, spelling and operation errors obtained, recording the specific phoneme pronunciation, prosody, spelling and operation errors, and giving an evaluation score and feedback; and dynamically generating and displaying a new dialogue scenario.

The above interactive language learning system can judge the learner's speech input and give feedback in real time. It performs phoneme-level pronunciation detection and word-level prosody detection on the learner's input audio, so that learners can accurately grasp the specific errors in their own pronunciation. By combining the feedback results with a memory curve to provide memorization content dynamically, it lets the learner improve language proficiency step by step and forms an interactive mode of learning.

[description of drawings]

Fig. 1 is a schematic diagram of a first embodiment of the interactive language learning system.

Fig. 2 is a schematic diagram of the pronunciation and prosody detection module.

Fig. 3 is a schematic diagram of a second embodiment of the interactive language learning system.

Fig. 4 is a schematic diagram of a third embodiment of the interactive language learning system.

[embodiment]

The technical solution and other beneficial effects of the present invention will become apparent from the following detailed description of specific embodiments, given with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of the first embodiment of the interactive language learning system. The system comprises two major parts: a user end 11 facing the user, and a data processing end 12 that performs background processing. The user end 11 provides the devices that capture the learner's behavior and the display interface, and comprises a voice acquisition module 112 and an interaction module 111. The data processing end 12 processes the data collected by the user end 11 and generates the information to display, and comprises a pronunciation and prosody detection module 121 and a data storage and statistics module 122.

The voice acquisition module 112 collects the learner's speech data. Silence detection is first performed on the collected speech: audio features such as energy and zero-crossing rate are computed to judge whether there is speech input or whether the input is silent. If no speech input or only silence is detected, the speech must be collected again.
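The silence check can be illustrated with a minimal sketch. The text names only the two features (energy and zero-crossing rate); the frame sizes, the thresholds and the rule "at least a few speech-like frames" are assumptions chosen for illustration, as are all function names.

```python
import math

def frame_signal(samples, frame_len=400, hop=160):
    """Split a list of samples into overlapping frames (sizes are assumed)."""
    last_start = max(len(samples) - frame_len + 1, 1)
    return [samples[i:i + frame_len] for i in range(0, last_start, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_silent(samples, energy_thresh=1e-4, zcr_thresh=0.25, min_voiced_frames=5):
    """Return True when too few frames look like speech.

    A frame counts as speech-like when its energy is above energy_thresh
    and its zero-crossing rate is below zcr_thresh. Both thresholds are
    hypothetical values, not taken from the patent.
    """
    voiced = 0
    for frame in frame_signal(samples):
        if len(frame) < 2:
            continue
        if (short_time_energy(frame) > energy_thresh
                and zero_crossing_rate(frame) < zcr_thresh):
            voiced += 1
    return voiced < min_voiced_frames

# One second of digital silence and one second of a 200 Hz tone at 16 kHz.
silence = [0.0] * 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(16000)]
```

With these inputs, `is_silent(silence)` is true and `is_silent(tone)` is false; in a real system the thresholds would have to be tuned to the microphone and the noise level.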

The pronunciation and prosody detection module 121 extracts from the speech data the feature parameters used for pronunciation and prosody error detection, further judges the errors and controls the level of detail at which errors are displayed, and obtains the final phoneme errors and prosody errors.

The data storage and statistics module 122 records content errors, phoneme errors and prosody errors, gives an overall evaluation of the learner's pronunciation based on these errors, and feeds the evaluation result back to the interaction module 111.

The interaction module 111 displays the content, phoneme and prosody errors, the overall evaluation of the pronunciation and the help options to the learner, and provides a pronunciation prompt in the form of a pronunciation text, a reference pronunciation or a pronunciation scenario. The pronunciation text is the learner's target learning content, such as a word, phrase or sentence. The reference pronunciation is the standard pronunciation produced by a native speaker of the target language. The pronunciation scenario is a situation provided by the system, for example meeting a friend on the road and greeting them, in which the learner is required to speak.

Fig. 2 is a schematic diagram of the pronunciation and prosody detection module. The pronunciation and prosody detection module 121 comprises a feature extraction module 202, a speech recognition module 203, a pronunciation evaluation module 204, a prosody detection module 205 and a prosody evaluation module 206.

The feature extraction module 202 extracts from the speech data the feature parameters used for pronunciation and prosody error detection. Examples are perceptual linear prediction (PLP) coefficients; Mel-frequency cepstral coefficients (MFCC); the frame-average energy, i.e. the energy of all frames spanned by a vowel; the frame-average fundamental frequency (pitch), i.e. the fundamental frequency of all frames spanned by a vowel averaged over the number of frames it spans; and their forward and backward difference parameters, including the forward frame-average energy difference, the backward frame-average energy difference, the forward consonant frame-average energy difference, the forward and backward frame-average fundamental frequency differences, and the forward and backward duration differences.
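The vowel-level features can be illustrated with a small sketch. It assumes that per-frame energy and pitch values and the frame span of each vowel are already available (for example from the recognizer's time boundaries). "Forward" is taken here to mean the preceding vowel and "backward" the following vowel, which is an interpretation rather than something the text specifies; the function name and the dictionary keys are hypothetical, and the consonant-based difference is left out.

```python
def vowel_features(energy, pitch, vowel_spans):
    """Compute per-vowel averages and their forward/backward differences.

    energy, pitch: per-frame values (lists of equal length).
    vowel_spans: list of (start, end) frame indices, end exclusive.
    """
    base = []
    for start, end in vowel_spans:
        n = end - start
        base.append({
            "energy": sum(energy[start:end]) / n,   # frame-average energy
            "pitch": sum(pitch[start:end]) / n,     # frame-average pitch
            "duration": n,                          # duration in frames
        })

    feats = []
    for i, cur in enumerate(base):
        row = dict(cur)
        for key in ("energy", "pitch", "duration"):
            # At the edges there is no neighbour, so the difference is 0.
            prev_val = base[i - 1][key] if i > 0 else cur[key]
            next_val = base[i + 1][key] if i + 1 < len(base) else cur[key]
            row["fwd_" + key + "_diff"] = cur[key] - prev_val
            row["bwd_" + key + "_diff"] = cur[key] - next_val
        feats.append(row)
    return feats

# Three vowels of two frames each, with made-up energy and pitch values.
demo = vowel_features(
    energy=[1, 1, 3, 3, 2, 2],
    pitch=[100, 100, 200, 200, 150, 150],
    vowel_spans=[(0, 2), (2, 4), (4, 6)],
)
```

Here the middle vowel is louder and higher than both neighbours, so all of its difference features are positive, which is the kind of cue a stress classifier could use.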

The speech recognition module 203 recognizes the feature parameters based on an acoustic model, in combination with a language model or a word network, and obtains the word-level and phoneme-level sequences, the corresponding time boundaries and the corresponding likelihood values. An acoustic model based on hidden Markov models (HMM) and a pronunciation dictionary can be used. The acoustic model is trained on speech collected from native speakers of the target language and covering all phonemes; the pronunciation dictionary contains not only the correct pronunciations but also possible mispronunciations. The language model or word network is a statistical model of word-level occurrence probabilities. For speech data that the learner inputs by repeating after a prompt, the speech recognition module 203 can use forced alignment, recognizing against the pronunciation text, to obtain the word sequence and phoneme sequence together with the time boundaries and likelihood values. For speech data that the learner inputs by speaking as a scenario requires, the speech recognition module can decode with the word network or language model to obtain the word sequence and phoneme sequence together with the time boundaries.

The pronunciation evaluation module 204 first uses a statistical method, combined with the results from the speech recognition module 203, to judge the content at the word level. If the word sequence of speech obtained by repeating after the pronunciation prompt differs from that of the reference pronunciation, or the speech obtained by speaking in a pronunciation scenario differs from the standard answer content, no phoneme-level judgment is made. Instead, processing passes directly to the data storage and statistics module 122, which records a content error; the interaction module 111 indicates that the content of the whole sentence does not meet the requirement, and the user is asked to input the speech again. Otherwise, a string alignment algorithm such as a dynamic programming algorithm is used to compare and align the recognized phoneme sequence with the reference phonemes provided by the system and thereby evaluate the pronunciation. According to the configured feedback error precision, this yields the phoneme errors, which fall into three types, namely insertion, deletion and substitution of phonemes within a word, together with the help options.
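The alignment step can be sketched as follows. This is a generic minimum-edit-distance alignment with unit costs, one common dynamic programming formulation; the patent does not specify the costs or the tie-breaking, and the function name and phoneme labels are illustrative only.

```python
def align_phonemes(reference, recognized):
    """Align two phoneme lists and list the edit operations between them.

    Returns a list of (error_type, reference_phoneme, recognized_phoneme)
    tuples, where error_type is "substitution", "deletion" or "insertion".
    """
    n, m = len(reference), len(recognized)
    # cost[i][j] = minimum edits to turn reference[:i] into recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (reference[i - 1] != recognized[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the operations.
    errors = []
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and reference[i - 1] != recognized[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            if mismatch:
                errors.append(("substitution", reference[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            errors.append(("deletion", reference[i - 1], None))
            i -= 1
        else:
            errors.append(("insertion", None, recognized[j - 1]))
            j -= 1
    errors.reverse()
    return errors

# "think" pronounced with /s/ instead of /th/ (labels are illustrative).
think_errors = align_phonemes(["th", "ih", "ng", "k"], ["s", "ih", "ng", "k"])
```

For this example the result is a single substitution of "th" by "s", which is the kind of phoneme-specific feedback the module is meant to give instead of a bare score.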

The prosody detection module 205 performs word-level lexical stress detection and prosody detection. It combines the results of the speech recognition module 203, namely the phoneme sequence, the corresponding time boundary information and the likelihood values, with the frame-average energy and frame-average fundamental frequency obtained by the feature extraction module 202, and uses a statistical model provided by the system to obtain the word stress patterns, the whole-sentence intonation and the temporal rhythm of the sentence in the speech data. The statistical model can be a trained support vector machine (SVM), a neural network, a hidden Markov model (HMM), or the like. The word stress pattern is judged in units of syllables and comprises the positions of the primary stressed syllable and the secondary stressed syllable in the word. The whole-sentence intonation is the sentence stress of the whole sentence, i.e. the positions of the stressed syllables in the sentence, based on the syllable-level trend of the fundamental frequency over the whole sentence and the intonation. The temporal rhythm is a judgment of speaking rate and duration.

The prosody evaluation module 206 compares the word stress pattern, whole-sentence intonation and temporal rhythm with the reference pronunciation and, according to the configured feedback error precision, obtains the word stress pattern errors with correction help, as well as prosody errors such as those in whole-sentence stressed syllables, whole-sentence intonation and rhythm, with help options.
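A simplified sketch of such a comparison is given below. It assumes that the detector has already produced a per-syllable stress label (1 for primary, 2 for secondary, 0 for unstressed) and the utterance duration; the encoding, the error names and the 25 % tempo tolerance are all assumptions made for illustration, and intonation comparison is not covered.

```python
def compare_stress(reference, detected):
    """Compare two per-syllable stress patterns (1 primary, 2 secondary, 0 none)."""
    if len(reference) != len(detected):
        return [("syllable_count_mismatch", len(reference), len(detected))]
    errors = []
    for level, name in ((1, "primary"), (2, "secondary")):
        ref_pos = [i for i, s in enumerate(reference) if s == level]
        det_pos = [i for i, s in enumerate(detected) if s == level]
        if ref_pos != det_pos:
            errors.append((name + "_stress_misplaced", ref_pos, det_pos))
    return errors

def compare_tempo(ref_duration_s, learner_duration_s, syllables, tolerance=0.25):
    """Classify the speaking rate relative to the reference (syllables per second)."""
    ref_rate = syllables / ref_duration_s
    rate = syllables / learner_duration_s
    ratio = rate / ref_rate
    if ratio > 1 + tolerance:
        return "too_fast"
    if ratio < 1 - tolerance:
        return "too_slow"
    return "ok"

# Reference stress on the first syllable, learner stressed the second one.
stress_errors = compare_stress([1, 0, 2], [0, 1, 2])
tempo = compare_tempo(ref_duration_s=2.0, learner_duration_s=3.0, syllables=8)
```

The returned positions can drive the help options, for example by highlighting the syllable that should have carried the primary stress.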

Fig. 3 is a schematic diagram of the second embodiment of the interactive language learning system. It differs from the first embodiment in that a text collection module 113 is added to the user end 11 and a text spelling detection module 123 is added to the data processing end 12, and in that the functions of the interaction module 111 and the data storage and statistics module 122, which are directly connected to these two modules, are extended accordingly.

The interaction module 111 comprises a display interface and an input interface. The display interface displays the information that the system feeds back to the learner, including audio and spelling information and the information fed back by the data storage and statistics module 122. The input interface is used to select the memory mode or the learning content, or to exit the system. For the purpose of language memorization, the interaction module 111 presents the language learning material, such as a word, a phrase or a passage of text, selected by the learner or automatically by the system, to the learner as text or audio. An audio prompt is the pronunciation, provided by the system, of the content to be memorized, and the learner is required to spell it and repeat it. A spelling prompt is spelling content to be memorized that the system provides, such as some of the letters of a word or some of the words of a sentence. Following the prompt, the learner spells and reads aloud the content to be memorized at the same time, and thus memorizes through pronunciation and spelling together.

The text collection module 113 collects the content to be memorized as spelled by the learner, obtaining the input text.

The text spelling detection module 123 checks the input text. By calculating the edit distance (Levenshtein distance) between the input text and the standard answer text, it obtains the specific spelling errors, such as substitutions, insertions and deletions.
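For reference, the Levenshtein distance itself can be computed with a compact two-row dynamic program, as sketched below. The normalized similarity score is an added convenience that the text does not define, and recovering the individual substitution, insertion and deletion errors would additionally require keeping the full table and tracing back through it.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits that turn string a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]

def spelling_similarity(typed, answer):
    """Similarity in [0, 1]; 1.0 means the strings are identical (assumed metric)."""
    longest = max(len(typed), len(answer))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(typed, answer) / longest

distance = levenshtein("recieve", "receive")
```

Swapping two adjacent letters, as in "recieve", counts as two substitutions under plain Levenshtein distance, so `distance` is 2.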

The data storage and statistics module 122 compiles error statistics from the speech errors and spelling errors obtained, records the learner's specific phoneme pronunciation errors, prosody errors and spelling errors, and gives an evaluation score and feedback, which are displayed through the interaction module 111. The data storage and statistics module 122 comprises a database into which the detailed error statistics are written promptly. The database stores not only the learner's learning records but also the learning content, including the corresponding multimedia information and standard answers. Based on the current user's errors, the selected memory mode and the learning content stored in the database, the system selects and generates new learning content together with audio and spelling prompts, and the next round of interactive memorization begins. The learner can also reselect the learning content according to the current learning progress, or exit this subsystem.
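The patent does not say how the next content is chosen from the error records, nor what form its memory curve takes. Purely as an illustration, the sketch below assumes an exponential retention curve R = exp(-t / S) and ranks items by estimated forgetting plus a weighted error count; the record fields, the weight 0.5 and the function names are all hypothetical.

```python
import math

def retention(hours_since_review, stability_hours):
    """Assumed forgetting curve: estimated fraction still remembered."""
    return math.exp(-hours_since_review / stability_hours)

def pick_next_items(records, now_hours, count=2, error_weight=0.5):
    """Return the items most in need of review.

    records: list of dicts with keys "item", "errors",
    "last_review_hours" and "stability_hours" (hypothetical schema).
    """
    def priority(rec):
        elapsed = now_hours - rec["last_review_hours"]
        forgotten = 1.0 - retention(elapsed, rec["stability_hours"])
        return forgotten + error_weight * rec["errors"]

    ranked = sorted(records, key=priority, reverse=True)
    return [rec["item"] for rec in ranked[:count]]

records = [
    {"item": "apple", "errors": 0, "last_review_hours": 0, "stability_hours": 24},
    {"item": "banana", "errors": 3, "last_review_hours": 20, "stability_hours": 24},
    {"item": "cherry", "errors": 0, "last_review_hours": 23, "stability_hours": 24},
]
next_items = pick_next_items(records, now_hours=24)
```

In this example the item with recorded errors is scheduled first, followed by the item that has gone longest without review, while the recently reviewed, error-free item is deferred.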

Fig. 4 is a schematic diagram of the third embodiment of the interactive language learning system. Its main difference from the second embodiment is that a user interface 114 is added to the user end 11, and an operation judgment module 124 and a dialogue scenario module 125 are added to the data processing end 12, and that the functions of the interaction module 111 and the data storage and statistics module 122, which are directly connected to these three modules, are extended accordingly. The third embodiment combines language memorization with dialogue, gives full practice in the four elements of language learning (listening, speaking, reading and writing), and ties them to specific scenes, so that the learner learns through dialogue how the language is used on specific occasions.

The interaction module 111 is an interface device facing the learner. It shows the learner a group of dialogue scenarios in the form of tasks, such as asking for directions, buying vegetables, traveling and other situations in which language is used, and the learner completes the tasks specified by the system. After the learner selects a dialogue scenario through this module, subtasks such as dialogue, spelling, repeating and selection appear one after another, and the learner completes the task by performing interactive operations and inputting speech and text according to the information provided by the dialogue scenario.

The user interface 114 collects the learner's interactive operations with the system, for example controlling direction with the keyboard or making a selection with the mouse, and obtains the learner's specific choice of dialogue content or answer.

The operation judgment module 124 judges whether the learner's interactive operations meet the task requirements, obtaining operation errors.

The data storage and statistics module 122 compiles error statistics from the speech errors, spelling errors and operation errors obtained, records the learner's specific phoneme pronunciation errors, prosody errors, spelling errors and operation errors, and gives an evaluation score, which is displayed through the interaction module 111. The data storage and statistics module 122 comprises a database into which the detailed error statistics are written promptly. The database stores not only the learner's learning records but also the learning content, including the corresponding multimedia information and standard answers, as well as dialogue-related information such as dialogue scenario information and task information.

The dialogue scenario module 125 dynamically generates a new dialogue scenario from the error statistics output by the data storage and statistics module 122 and from the dialogue scenario and task information, and shows it to the learner through the interaction module 111. Through the interaction module 111 the learner can choose to enter a new round of dialogue scenario learning or to quit learning.

The above interactive language learning system can be implemented in several ways, for example in a network-based client/server mode, in a network-based browser/server mode, or in a standalone mode based on an embedded system.

Network-based client/server mode: the client is the learner's access terminal. It provides speech input, text input, audio playback and mouse and keyboard operation, and performs functions such as silence detection and feature extraction on the input audio, network transmission and dialogue scenario generation. The server performs functions such as mispronunciation detection on the input speech, word stress pattern detection, prosody detection, spell checking, error feedback, help option feedback, dialogue scenario content generation, database operations, learning information statistics and network transmission.

Network-based browser/server mode: the browser is the learner's access terminal. It provides speech input, text input, audio playback, mouse and keyboard operation, network transmission and the dialogue scenarios, and uses a plug-in to perform operations such as silence detection and feature extraction on the input audio. The server side comprises a data processing server and a Web server. The data processing server performs functions such as mispronunciation detection on the input speech, word stress pattern detection, prosody detection, spell checking, error feedback, help option feedback, dialogue content generation, database operations, learning information statistics and network transmission. The Web server is the access server for the browser, and data is transmitted directly between the browser and the data processing server.

Standalone mode based on an embedded system: speech input, text input, audio playback, audio silence detection, audio feature extraction, mispronunciation detection and word stress pattern detection on the input speech, prosody detection, spell checking, error feedback, dialogue content generation, database operations, learning information statistics and so on are all completed within a single program framework.

Above-mentioned interacting language learning system has made up a kind of interacting language learning platform; Make listening, speaking, reading and writing four key elements in the abundant practice language study of learner; Organically combine each link of language learning; Provide the high scene dialogue study form of degree of freedom to improve learner's interest, the enthusiasm of transferring the learner initiatively participates in the middle of the study it, and provides real-time false judgment and feedback.

Above-mentioned interacting language learning system detects and the rhythm (Prosody) detection the incorrect pronunciations (Mispronunciation) that learner's input audio frequency carries out real-time phone-level (Phone-level), and the rhythm detects the word accent pattern (Lexical stress) that comprises word level and detects and correct help, the rhythm (Prosody) detection and imitate help; Wherein the incorrect pronunciations of phone-level detects the input voice is carried out the speech recognition of phone-level, and points out the concrete phoneme that it makes a mistake; Wherein the word accent mode detection of word level and the aligned phoneme sequence that correct to help the detection according to phone-level to obtain are carried out the identification of word level, identify the word stress pattern of word and provide the error type that compares with correct word stress pattern; Wherein rhythm detection and imitation help to comprise the sentence stressed (Sentence Stress) to the pronunciation statement; Rhythm (Rhythm); The contrast of the rhythm of the check and analysis of intonation aspects such as (Intonation) and the statement of RP is differentiated, and be given on the rhythm evaluation and with the help options of imitation RP statement.Make the learner can accurately hold oneself the pronunciation concrete wrong part.And combine feedback result and memory curve that the memory content dynamically is provided, make the raising language proficiency that the learner can be incremental.

The embodiments described above represent only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the claims. It should be noted that a person of ordinary skill in the art may make modifications and improvements without departing from the concept of the invention, and all of these fall within the scope of protection of the invention. The scope of protection of this patent is therefore defined by the appended claims.

Claims

1. An interactive language learning system, characterized by comprising:

a voice acquisition module, configured to collect the learner's speech data;

a pronunciation and prosody detection module, configured to extract from the speech data the feature parameters used for pronunciation and prosody error detection, to further judge the errors and control the degree to which errors are displayed, and to obtain the final phoneme errors and prosody errors;

a data storage and statistics module, configured to record the phoneme errors and prosody errors, to give an overall evaluation of the learner's pronunciation on the basis of these errors, and to feed the evaluation result back to the interactive module;

an interactive module comprising a display interface, the display interface being configured to display the phoneme errors and prosody errors, the overall evaluation of the learner's pronunciation, and help options, and to provide pronunciation prompts;

wherein the pronunciation and prosody detection module comprises:

a feature extraction module, configured to extract from the speech data the feature parameters used for pronunciation and prosody error detection;

a speech recognition module, configured to recognize the feature parameters on the basis of an acoustic model combined with a language model or a speech network, and to obtain a word sequence, a phoneme sequence, the corresponding time boundaries, and likelihood values;

a pronunciation evaluation module, configured to compare and align the recognized phoneme sequence with the system's reference phonemes to obtain phoneme errors and help options;

a prosody detection module, configured to combine the feature parameters, phoneme sequence, time boundary information and likelihood values and to use a statistical model to obtain the word stress pattern, whole-sentence intonation and temporal rhythm;

a prosody evaluation module, configured to compare the word stress pattern, whole-sentence intonation and temporal rhythm with the reference pronunciation to obtain prosody errors and help options.

2. The interactive language learning system according to claim 1, characterized in that the speech data collected by the voice acquisition module comprises speech data obtained by reading after a pronunciation prompt provided by the system and speech data obtained by speaking according to a pronunciation scenario.

3. The interactive language learning system according to claim 1, characterized in that the pronunciation evaluation module first uses a statistical method, combining the word sequence, phoneme sequence, time boundaries and likelihood values, to judge the content at the word level; if the content does not match, the system records a content error, indicates in the interactive module that the content of the whole sentence does not meet the requirement, and asks the learner to input the speech again; otherwise it performs phoneme detection to obtain phoneme errors, including insertion, deletion and substitution errors of phonemes within a word.
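The claim does not say how the recognized phonemes are aligned with the reference. As a minimal sketch, assuming a standard unit-cost edit-distance alignment with backtrace, insertion, deletion and substitution errors could be extracted as follows. The function names and the ARPAbet-style phoneme symbols in the usage note are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# (operation, reference phoneme, recognized phoneme)
Edit = Tuple[str, Optional[str], Optional[str]]


def align_phonemes(ref: List[str], hyp: List[str]) -> List[Edit]:
    """Align recognized phonemes to the reference with unit-cost edit distance."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right corner to recover the operations.
    edits: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            op = "match" if ref[i - 1] == hyp[j - 1] else "substitution"
            edits.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            edits.append(("deletion", ref[i - 1], None))
            i -= 1
        else:
            edits.append(("insertion", None, hyp[j - 1]))
            j -= 1
    edits.reverse()
    return edits


def phoneme_errors(ref: List[str], hyp: List[str]) -> List[Edit]:
    """Keep only the insertion, deletion and substitution errors."""
    return [e for e in align_phonemes(ref, hyp) if e[0] != "match"]
```

For example, `phoneme_errors(["TH", "IH", "NG", "K"], ["S", "IH", "NG", "K"])` reports a single substitution of `TH` by `S`. The word-level content check described in the claim would run before this step and is not shown.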

4. The interactive language learning system according to claim 1, characterized in that the word stress pattern is judged in units of syllables and comprises the position of the primary stressed syllable and the position of the secondary stressed syllable in the word; the whole-sentence intonation is the sentence stress of the whole sentence, i.e. the position of the stressed syllables in the sentence, and reflects the overall fundamental-frequency trend based on syllables and intonation; and the temporal rhythm is a judgment of speaking rate and duration.
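The statistical model that detects the stress pattern is not specified in the claim and is not reproduced here. The sketch below covers only the comparison step, assuming the detector already outputs one stress mark per syllable (1 = primary, 2 = secondary, 0 = unstressed, the convention used by the CMU Pronouncing Dictionary) and that speaking rate is measured in syllables per second. The function names and the 20% tolerance are hypothetical.

```python
from typing import List, Optional


def stress_errors(ref: List[int], hyp: List[int]) -> List[str]:
    """Compare per-syllable stress marks: 1 = primary, 2 = secondary, 0 = unstressed."""
    if len(ref) != len(hyp):
        return ["syllable count differs from the reference"]
    errors = []
    for mark, name in ((1, "primary"), (2, "secondary")):
        ref_pos = [i for i, s in enumerate(ref) if s == mark]
        hyp_pos = [i for i, s in enumerate(hyp) if s == mark]
        if ref_pos != hyp_pos:
            errors.append(f"{name} stress on syllable(s) {hyp_pos}, expected {ref_pos}")
    return errors


def rate_error(n_syllables: int, duration_s: float, ref_rate: float,
               tolerance: float = 0.2) -> Optional[str]:
    """Flag a speaking rate outside ref_rate * (1 +/- tolerance)."""
    rate = n_syllables / duration_s  # syllables per second
    if rate < ref_rate * (1 - tolerance):
        return "too slow"
    if rate > ref_rate * (1 + tolerance):
        return "too fast"
    return None
```

With the verb "record" as reference (`[0, 1]`), a learner who stresses the first syllable (`[1, 0]`) gets one primary-stress error. Syllable indices in the messages are zero-based.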

5. The interactive language learning system according to claim 1, characterized in that the pronunciation prompt of the interactive module uses a pronunciation text, the pronunciation text being the learner's target learning content; or uses a reference pronunciation, the reference pronunciation being standard speech produced by a speaker from a country of the target language; or uses a pronunciation scenario, the pronunciation scenario being a scenario provided by the system, according to which the learner is required to speak.

6. The interactive language learning system according to claim 1, characterized in that the interactive module further comprises an input interface, the input interface being used to select a memory mode or learning content, or to exit the system; the display interface is further used to display information fed back by the system, including audio and spelling information and the information fed back by the data storage and statistics module; the interactive module selects language learning material and prompts the learner by audio or by text, wherein an audio prompt is a pronunciation, provided by the system, of the content to be memorized, which the learner is required to spell and to read after, and a spelling prompt is a text prompt, provided by the system, for the spelling content to be memorized, which the learner is required to spell, thereby obtaining spelling content;

the interactive language learning system further comprises a text collection module and a text spelling detection module, the text collection module being used to collect the spelling content to obtain input text, and the text spelling detection module being used to check the input text and to obtain spelling errors by computing the similarity edit distance between the input text and the model answer text;

the data storage and statistics module is further used to record the spelling errors; the data storage and statistics module further comprises a database to which the specific error statistics are written in a timely manner, the database storing not only learning records but also learning content; the system selects and generates new learning content and audio and spelling prompts according to the current error records, the selected memory mode and the learning content stored in the database, and feeds them back to the interactive module so as to enter the next round of interactive learning, or, according to the current learning progress, learning content is reselected or the system is exited.

7. The interactive language learning system according to claim 6, characterized in that the spelling errors comprise substitution, insertion and deletion errors.
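The claims name a "similarity edit distance" between the input text and the model answer but do not define it. A minimal sketch, assuming the Levenshtein distance (which counts exactly substitutions, insertions and deletions) normalized by the longer string's length, might look like this. The function names and the normalization are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (ca != cb),  # substitution (or match)
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
            ))
        prev = curr
    return prev[-1]


def spelling_similarity(answer: str, typed: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means an exact match."""
    longest = max(len(answer), len(typed))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(answer, typed) / longest
```

A system built this way could treat any similarity below 1.0 as a spelling error and use the score to grade how close the learner's attempt was.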

8. The interactive language learning system according to claim 7, characterized in that the interactive module is further used to display a set of dialogue scenarios in the form of tasks; after a dialogue scenario is selected through the interactive module, subtasks appear, and the learner performs interactive operations, pronounces and spells according to the information provided by the interactive module in order to complete the tasks;

the interactive language learning system further comprises a user interface and an operation discrimination module, the user interface being used to collect the interactive operations, and the operation discrimination module being used to judge whether the interactive operations meet the task requirements, thereby obtaining operation errors;

the data storage and statistics module is further used to record the operation errors, and the database further stores dialogue-related information;

the interactive language learning system further comprises a dialogue scenario module, which dynamically generates new dialogue scenarios according to the error statistics and dialogue-related information output by the data storage and statistics module and displays them through the interactive module; through the interactive module the learner may choose to enter a new round of learning or to quit.

9. The interactive language learning system according to claim 1, characterized in that the system is implemented in one of a client/server mode, a browser/server mode, and a standalone mode based on an embedded system.

10. An interactive language learning method, comprising:

collecting speech data obtained when the learner pronounces as required by the program;

extracting from the speech data the feature parameters used for pronunciation and prosody error detection;

recognizing the feature parameters on the basis of an acoustic model combined with a language model or a speech network, to obtain a word sequence, a phoneme sequence, the corresponding time boundaries, and likelihood values;

comparing and aligning the phoneme sequence with the system's reference phonemes to obtain phoneme errors and help options;

combining the feature parameters, phoneme sequence and time boundary information, and using a statistical model to obtain the word stress pattern, whole-sentence intonation and temporal rhythm;

comparing the word stress pattern, whole-sentence intonation and temporal rhythm with the reference pronunciation to obtain prosody errors and help options;

displaying the phoneme and prosody errors, the overall evaluation of the pronunciation, and the help options, and providing pronunciation prompts.

11. The interactive language learning method according to claim 10, characterized by further comprising the following steps:

before collecting the speech data, outputting memory material in audio or text form and requiring the learner to pronounce and spell;

collecting the spelling content to be memorized to obtain input text;

checking the input text to obtain spelling errors;

performing error statistics on the phoneme, prosody and spelling errors obtained, recording the specific phoneme errors, prosody errors and spelling errors, and giving an evaluation score and feedback information;

displaying the evaluation score and feedback information;

receiving an instruction to select a memory mode or learning content, or to quit the program.
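The claim does not define how the evaluation score is computed from the error statistics. As a minimal sketch, assuming errors arrive as (type, detail) pairs and that each error type carries a fixed, hypothetical penalty subtracted from 100, the statistics and score could be produced like this. The penalty values and the names `PENALTY` and `summarize` are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical penalty per error type, in points out of 100.
PENALTY: Dict[str, float] = {"phoneme": 5.0, "prosody": 3.0, "spelling": 4.0}


def summarize(errors: Iterable[Tuple[str, str]]) -> Tuple[Counter, float]:
    """Count errors by type and derive a 0-100 score.

    Each error is a (type, detail) pair, e.g. ("phoneme", "TH -> S").
    Error types without an entry in PENALTY are counted but not penalized.
    """
    counts = Counter(kind for kind, _ in errors)
    lost = sum(PENALTY.get(kind, 0.0) * n for kind, n in counts.items())
    return counts, max(0.0, 100.0 - lost)
```

The per-type counts are what would be written to the learning record, while the score and the error details form the feedback shown to the learner.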

12. The interactive language learning method according to claim 11, characterized by further comprising the following steps:

displaying a dialogue scenario, the learner pronouncing, spelling and performing interactive operations as required by the dialogue scenario;

collecting the interactive operations;

judging whether the interactive operations meet the task requirements, to obtain operation errors;

performing error statistics on the phoneme, prosody, spelling and operation errors obtained, recording the specific phoneme, prosody, spelling and operation errors, and giving an evaluation score and feedback;

dynamically generating a new dialogue scenario and displaying it.