CN108831212B - Auxiliary device and method for oral teaching - Google Patents

Auxiliary device and method for oral teaching

Info

Publication number
CN108831212B
CN108831212B (application CN201810689188.3A)
Authority
CN
China
Prior art keywords
processing unit
standard
syllable
group
weighted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810689188.3A
Other languages
Chinese (zh)
Other versions
CN108831212A (en)
Inventor
何光耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
He Guangyao
Original Assignee
Shenzhen Langease Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Langease Education Technology Co ltd filed Critical Shenzhen Langease Education Technology Co ltd
Priority to CN201810689188.3A priority Critical patent/CN108831212B/en
Publication of CN108831212A publication Critical patent/CN108831212A/en
Application granted granted Critical
Publication of CN108831212B publication Critical patent/CN108831212B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 5/00: Electrically-operated educational appliances
    • G09B 5/06: Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B 5/065: Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 19/00: Teaching not covered by other main groups of this subclass
    • G09B 19/06: Foreign languages
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Multimedia (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Signal Processing (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention relates to a spoken language teaching auxiliary device and method. The device comprises a processing unit and a storage unit, a voice input unit and an output unit that are each electrically connected to the processing unit. The storage unit stores a plurality of spoken language learning data; each spoken language learning data comprises a training sentence, a plurality of standard syllable groups and weighting information. The training sentence is composed of a plurality of standard characters arranged in sequence, and the standard syllable groups correspond to the standard characters and are obtained by parsing the standard characters in advance. The invention has strong speech processing capability: it can accurately distinguish the user's pronunciation, evaluate the accuracy of the user's speech, and present a virtual reality image to the user through voice interaction. The interactivity is strong, the user experience is good and the sense of immersion is strong, which increases the user's interest in spoken language learning.

Description

Auxiliary device and method for oral teaching
Technical Field
The invention belongs to the technical field of oral teaching, and particularly relates to an auxiliary device and method for oral teaching.
Background
Spoken language learning is one of the necessary parts of learning a foreign language, and spoken language teaching is an important means of spoken language learning. Existing approaches to spoken language teaching mainly include audio-visual teaching, software teaching and teaching by a real person (on-site or remote). These approaches have the following disadvantages. Audio-visual teaching offers no interactivity with the user. Software teaching struggles to give the user a realistic sense of the situations in which spoken language is actually used, so the user experience is poor. Real-person teaching is expensive, and some users feel too shy or nervous to open up when facing a real teacher.
Disclosure of Invention
In view of the above problems in the prior art, an object of the present invention is to provide a spoken language teaching assistance device and method that can avoid the above technical drawbacks.
In order to achieve the above object, the present invention provides the following technical solutions:
a spoken language teaching auxiliary device comprises a processing unit, and a storage unit, a voice input unit and an output unit which are respectively electrically connected with the processing unit.
Further, the output unit comprises a display screen and a loudspeaker which are respectively electrically connected with the processing unit.
Further, the voice input unit includes a microphone.
Further, the storage unit stores a plurality of spoken language learning data, each of which includes a training sentence, a plurality of standard syllable sets and weighting information; the training sentence is composed of a plurality of standard characters arranged in sequence, and the standard syllable sets correspond to the standard characters and are obtained by parsing the standard characters in advance; the weighting information has a weighted character corresponding to one of the standard characters included in the training sentence and a weighted value corresponding to the weighted character.
Further, the processing unit controls the output unit to output a training sentence; when receiving a voice from the voice input unit, the processing unit analyzes the voice to obtain a sentence to be judged consisting of a plurality of characters to be judged and a plurality of syllable groups to be judged respectively corresponding to the characters to be judged; when the processing unit determines that the standard syllable set is not matched with at least one syllable set to be determined, the processing unit takes the at least one standard character corresponding to the at least one syllable set which is not matched as at least one unmatched character; the processing unit generates an original score related to the at least one unmatched character according to the training sentence and the at least one unmatched character by using the language identification model; when the processing unit judges that the at least one unmatched character corresponds to the weighted character, taking a weighted value corresponding to the weighted character as a target weighted value; the processing unit generates a weighted score according to the original score and the target weighted value, and controls the output unit to output an evaluation related to the voice according to the weighted score.
A spoken language teaching assistance method realized by the spoken language teaching assistance apparatus according to claim 1, comprising the steps of:
(A) the processing unit controls the output unit to output a training sentence;
(B) when receiving the voice from the voice input unit, the processing unit analyzes the voice to obtain a sentence to be determined which is composed of a plurality of characters to be determined and a plurality of syllable groups to be determined which respectively correspond to the characters to be determined;
(C) when the processing unit determines that the standard syllable set is not matched with at least one syllable set to be determined, the processing unit takes the at least one standard character corresponding to the at least one syllable set which is not matched as at least one unmatched character;
(D) the processing unit generates an original score related to the at least one unmatched character according to the training sentence and the at least one unmatched character by utilizing a language identification model;
(E) when the processing unit judges that the at least one unmatched character corresponds to the weighted character, taking a weighted value corresponding to the weighted character as a target weighted value; and
(F) the processing unit generates a weighted score according to the original score and the target weighted value, and controls the output unit to output an evaluation related to the voice according to the weighted score.
Further, the step (C) includes:
(c1) when the processing unit determines that at least one of the standard syllable sets has no corresponding syllable set to be determined, it determines that this at least one standard syllable set does not match;
(c2) when the processing unit determines that the standard syllable sets respectively correspond to the syllable sets to be determined, it further determines whether each standard syllable set is identical to the corresponding syllable set to be determined, and determines that any standard syllable set that is not identical does not match.
Further, in the step (E), the processing unit controls the display screen and/or the speaker to output the evaluation, and controls the display screen to display the at least one non-conforming character.
Further, before the step (A), the processing unit controls the display screen to display the virtual reality image of the multimedia data according to the multimedia data included in the spoken language learning data.
Further, the language identification model is an N-Gram model.
The spoken language teaching auxiliary device and method provided by the invention have strong speech processing capability: they can accurately distinguish the user's pronunciation, evaluate the accuracy of the user's speech, and present a virtual reality image to the user through voice interaction. The interactivity is strong, the user experience is good and the sense of immersion is strong, which increases the user's interest in spoken language learning and meets the needs of practical application well.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A spoken language teaching auxiliary device comprises a storage unit, a voice input unit, an output unit and a processing unit, wherein the storage unit, the voice input unit and the output unit are electrically connected with the processing unit respectively.
The storage unit stores a plurality of spoken language learning data, each spoken language learning data comprises a training sentence, a plurality of standard syllable sets and weighting information, the training sentence is composed of a plurality of standard characters which are sequentially arranged, the standard syllable sets correspond to the standard characters and are obtained by analyzing the standard characters in advance; the weighted information has a weighted word corresponding to one of the standard words included in the training sentence and a weighted value corresponding to the weighted word.
The processing unit controls the output unit to output training sentences; when the processing unit receives a voice from the voice input unit, the processing unit analyzes the voice to obtain a sentence to be judged consisting of a plurality of characters to be judged and a plurality of syllable groups to be judged respectively corresponding to the characters to be judged; the processing unit takes the at least one standard character corresponding to the at least one syllable group not matched as at least one unmatched character when the standard syllable group is determined to be not matched with the at least one syllable group to be determined; the processing unit generates an original score related to the at least one unmatched character according to the training sentence and the at least one unmatched character by utilizing a language identification model; the language identification model is an N-Gram language identification model; when the processing unit judges that the at least one unmatched character corresponds to the weighted character, taking a weighted value corresponding to the weighted character as a target weighted value; the processing unit generates a weighted score according to the original score and the target weighted value, and controls the output unit to output an evaluation related to the voice according to the weighted score.
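A minimal sketch of this scoring flow is given below. All function and variable names are illustrative, and the weighting formula (multiplying the original score by the target weighted value) is an assumption, since the patent only states that the weighted score is generated from the original score and the target weighted value.

```python
# Illustrative sketch only: names and the weighting formula are assumptions.

def score_utterance(standard_words, standard_syllables,
                    judged_syllables, weighted_word, weighted_value,
                    original_score_fn):
    """Compare syllable groups, collect unmatched words, and weight the score."""
    # Step (C): any standard syllable group that has no identical counterpart
    # among the syllable groups parsed from the user's speech is unmatched.
    unmatched = []
    for i, std_group in enumerate(standard_syllables):
        if i >= len(judged_syllables) or std_group != judged_syllables[i]:
            unmatched.append(standard_words[i])

    if not unmatched:
        return 1.0, unmatched  # every syllable group matched

    # Step (D): the language identification model yields an original score
    # from the training sentence and the unmatched words.
    original = original_score_fn(standard_words, unmatched)

    # Steps (E)-(F): if an unmatched word is the weighted word, apply the
    # target weighted value to obtain the weighted score.
    if weighted_word in unmatched:
        return original * weighted_value, unmatched
    return original, unmatched
```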
The output unit comprises a display screen and a loudspeaker, the display screen and the loudspeaker are respectively electrically connected with the processing unit, and the processing unit controls the display screen and/or the loudspeaker to output the evaluation and controls the display screen to display the at least one character which is not matched.
When determining whether the standard syllable groups match the syllable groups to be determined, the processing unit determines that any standard syllable group that has no corresponding syllable group to be determined does not match. When the standard syllable groups respectively correspond to the syllable groups to be determined, the processing unit further determines whether each standard syllable group is identical to its corresponding syllable group to be determined, and determines that any standard syllable group that is not identical does not match.
Each piece of learning data also comprises multimedia data, the multimedia data is provided with a virtual reality image related to the training sentences, and the processing unit controls the display screen to display the virtual reality image of the multimedia data according to the multimedia data contained in the learning data before controlling the output unit to output the training sentences of the learning data.
The voice input unit comprises a microphone. The spoken language teaching auxiliary device further comprises a helmet-shaped housing, and the storage unit, the voice input unit, the output unit and the processing unit are all assembled on this housing, so that the device is a head-mounted display device that can be worn on the user's head.
The processing unit is provided with a speech recognition engine, which comprises three main modules. A foreign-language dictionary module: for example, this embodiment uses English, so an English dictionary is required as the standard. A word-analysis module: parses each word (vocabulary entry) in the foreign-language dictionary module into a series of syllables. A language-model module: analyses, according to the usage habits of different languages, the proportions in which words of the language appear in succession.
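A hypothetical outline of these three modules is sketched below; the dictionary entries and syllable groups are toy placeholders rather than the patent's actual data.

```python
# Hypothetical outline of the three modules; the dictionary entries and
# syllable groups below are toy placeholders, not the patent's actual data.
from collections import Counter

FOREIGN_DICTIONARY = ["caught", "cold", "temperature"]   # dictionary module

SYLLABLE_TABLE = {                                        # word-analysis module
    "caught": ["K", "AO", "T"],
    "cold": ["K", "OW", "L", "D"],
    "temperature": ["T", "EH", "M", "P", "R", "AH", "CH", "ER"],
}

def word_to_syllables(word):
    # Parse a dictionary word into its syllable group (placeholder lookup).
    return SYLLABLE_TABLE.get(word, [word])

def bigram_model(corpus_sentences):
    # Language-model module: count how often words appear in succession.
    follows = Counter()
    for sentence in corpus_sentences:
        words = sentence.lower().split()
        follows.update(zip(words, words[1:]))
    return follows
```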
A spoken language teaching auxiliary method is realized by the spoken language teaching auxiliary device, and comprises the following steps:
(A) the processing unit controls the output unit to output a training sentence;
(B) when receiving the voice from the voice input unit, the processing unit analyzes the voice to obtain a sentence to be determined which is composed of a plurality of characters to be determined and a plurality of syllable groups to be determined which respectively correspond to the characters to be determined;
(C) when the processing unit determines that the standard syllable set is not matched with at least one syllable set to be determined, the processing unit takes the at least one standard character corresponding to the at least one syllable set which is not matched as at least one unmatched character;
(D) the processing unit generates an original score related to the at least one unmatched character according to the training sentence and the at least one unmatched character by utilizing a language identification model;
(E) when the processing unit judges that the at least one unmatched character corresponds to the weighted character, taking a weighted value corresponding to the weighted character as a target weighted value; and
(F) the processing unit generates a weighted score according to the original score and the target weighted value, and controls the output unit to output an evaluation related to the voice according to the weighted score.
In the step (E), the processing unit controls the display screen and/or the speaker to output the evaluation, and controls the display screen to display the at least one non-conforming character.
The step (C) comprises:
(c1) when the processing unit determines that at least one of the standard syllable sets has no corresponding syllable set to be determined, it determines that this at least one standard syllable set does not match; and
(c2) when the processing unit determines that the standard syllable sets respectively correspond to the syllable sets to be determined, it further determines whether each standard syllable set is identical to the corresponding syllable set to be determined, and determines that any standard syllable set that is not identical does not match.
Before the step (A), the processing unit controls the display screen to display the virtual reality image of the multimedia data according to the multimedia data contained in the spoken language learning data.
In step (D), the language identification model used is an N-Gram language model. In NLP (Natural Language Processing), an N-Gram model can predict or evaluate whether a sentence is reasonable, and it can also be used to evaluate the degree of difference between two strings.
Assume S represents a meaningful sentence consisting of a sequence of words w1, w2, w3, ..., wn in a particular order, where n is the length of the sentence. The probability that S appears in a text (corpus), written mathematically as P(S), is:
P(S) = P(w1, w2, w3, ..., wn) = P(w1) P(w2|w1) P(w3|w1, w2) ... P(wn|w1, w2, ..., wn-1);
the defects of the calculation method are as follows:
the parameter space is too large: the conditional probability P(wn|w1, w2, ..., wn-1) has far too many parameters to estimate, so it cannot be used in practice;
the data sparsity is severe: a great many word combinations never appear in the corpus at all, so their maximum likelihood probability estimate is 0. The end result is that the model can compute a probability for only a few sentences, while most sentences are assigned a probability of 0.
In order to solve the problem of the overly large parameter space, the Markov assumption is introduced: the probability of any word occurring is related only to the limited number of words that precede it. If the occurrence of a word depends only on the single word before it, the model is called a bigram:
P(S) = P(w1, w2, w3, ..., wn) = P(w1) P(w2|w1) P(w3|w1, w2) ... P(wn|w1, w2, ..., wn-1) ≈ P(w1) P(w2|w1) P(w3|w2) ... P(wn|wn-1);
If the occurrence of a word depends only on the two words before it, the model is called a trigram:
P(S) = P(w1, w2, w3, ..., wn) = P(w1) P(w2|w1) P(w3|w1, w2) ... P(wn|w1, w2, ..., wn-1) ≈ P(w1) P(w2|w1) P(w3|w2, w1) ... P(wn|wn-1, wn-2);
In general, the N-Gram model assumes that the probability of the current word depends only on the N-1 words preceding it. These probability parameters can be estimated from a large-scale corpus; for example, a trigram probability is:
P(wi|wi-1, wi-2) ≈ count(wi-2 wi-1 wi) / count(wi-2 wi-1).
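For illustration, a minimal sketch of estimating such a trigram probability by counting over a whitespace-tokenised corpus follows; the function and variable names are illustrative, not part of the patent.

```python
# Illustrative estimate of a trigram probability by counting, as in the
# formula above; tokenisation is simple whitespace splitting.
from collections import Counter

def trigram_probability(corpus_sentences, w_prev2, w_prev1, w):
    """P(w | w_prev2, w_prev1) ~= count(w_prev2 w_prev1 w) / count(w_prev2 w_prev1)"""
    tri, bi = Counter(), Counter()
    for sentence in corpus_sentences:
        words = sentence.split()
        bi.update(zip(words, words[1:]))
        tri.update(zip(words, words[1:], words[2:]))
    if bi[(w_prev2, w_prev1)] == 0:
        return 0.0
    return tri[(w_prev2, w_prev1, w)] / bi[(w_prev2, w_prev1)]
```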
the data smoothing technology is an important means for constructing a highly robust language model, and the effect of data smoothing is related to the scale of a training corpus. The smaller the training corpus scale is, the more remarkable the data smoothing effect is; the larger the training corpus, the less pronounced or even negligible the effect of data smoothing. The purpose of data smoothing is two: one is to make the sum of all the N-Gram probabilities 1; one is to make all the N-Gram probabilities not 0. The main strategy is to properly reduce the probability of events occurring in the training sample and then assign the reduced probability density to events that do not occur in the training corpus. In practice, there are many kinds of smoothing algorithms, such as: add-one smoothing; Witten-Bell smoothing; Good-Turing smoothing; KatzBackoff; stupid backkoff.
String distance defined based on the N-Gram model:
The key to fuzzy matching is how to measure the "difference" between two long words (or strings), often called their "distance". Besides the edit distance between two strings (typically computed with the Needleman-Wunsch or Smith-Waterman algorithm), an N-Gram distance can also be defined between them. Given a string S, its N-Gram segmentation is the set of fragments obtained by cutting the original string into pieces of length N, that is, all substrings of S with length N. If two strings are segmented in this way, the N-Gram distance between them can be defined in terms of the number of substrings they share. Simply counting common substrings is clearly insufficient, however, because it ignores the effect of a difference in the lengths of the two strings. For example, the strings girl and girlfriend share exactly as many common substrings as girl shares with itself, yet girl and girlfriend obviously cannot be treated as an equivalent match. To solve this problem, some scholars have proposed defining the N-Gram distance on non-repeating N-Gram segments, with the formula expressed as follows:
|GN(S1)|+|GN(S2)|-2×|GN(S1)∩GN(S2)|;
where |GN(S1)| denotes the number of elements in the N-Gram set of the string S1, and N is generally taken as 2 or 3. For example, segmenting the strings Gorbachev and Gorbechyov with N = 2 gives the following results:
1. Gorbachev: Go or rb ba ac ch he ev
2. Gorbechyov: Go or rb be ec ch hy yo ov
Combining these with the above formula, the distance between the two strings is 8 + 9 - 2 × 4 = 9. Obviously, the smaller the distance between two strings, the closer they are; when two strings are completely equal, their distance is 0.
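As a check on this arithmetic, the short sketch below computes the non-repeating N-Gram distance defined above; the function names are illustrative.

```python
# Sketch of the non-repeating N-Gram distance defined above.
def ngram_set(s, n=2):
    # All distinct substrings of length n.
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def ngram_distance(s1, s2, n=2):
    g1, g2 = ngram_set(s1, n), ngram_set(s2, n)
    return len(g1) + len(g2) - 2 * len(g1 & g2)

# Reproduces the worked example: 8 + 9 - 2 x 4 = 9
print(ngram_distance("Gorbachev", "Gorbechyov"))  # 9
```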
Using the N-Gram model to evaluate whether a sentence is reasonable:
from a statistical point of view, a sentence S in natural language can be made up of any string of words, but the probability p (S) is large or small. Given a section of a sentence, it is possible to guess what the following words should be, for example:
the large green ___ . (mountain or tree?)
Kate swallowed the large green ___ . (pill or broccoli?)
Suppose now that there is a corpus as follows, where <s1><s2> are sentence-start markers and </s2></s1> are sentence-end markers:
1<s1><s2>yes no no no no yes</s2></s1>
2<s1><s2>no no no yes yes yes no</s2></s1>
The probability of the following sentence is then evaluated:
1<s1><s2>yes no no yes</s2></s1>
the result of calculating the probability using the trigram model is:
P(yes|<s1>,<s2>) = 1/2, P(no|<s2>,yes) = 1, P(no|yes,no) = 1/2, P(yes|no,no) = 2/5, P(</s2>|no,yes) = 1/2, P(</s1>|yes,</s2>) = 1
the resulting probability is equal to: 1/2 × 1 × 1/2 × 2/5 × 1/2 × 1 is 0.05.
The spoken language teaching auxiliary device is a helmet-type wearable device provided with a helmet-shaped housing. In use, after the user puts on the helmet-shaped housing, the display screen faces the user's eyes directly; the user can turn the head freely and see a 360-degree real-time three-dimensional virtual scene and characters through the viewing window of the housing, and a virtual character serves as the partner with whom the user practises spoken language. Through the speaker on the housing, the user hears the spoken language teaching situations and course content customised in advance by the program. When the program requires the user to interact with it by speaking, the user simply talks, and the program receives the user's speech (spoken input) through a microphone mounted on the housing. The customised program is provided with a speech recognition module, so the spoken input received by the microphone can be interpreted and evaluated in real time, and the evaluation result ("very good", "needs improvement", and so on) is fed back to the user, so the user knows immediately how well the spoken language was produced. Through a record/replay function, the user can also compare the spoken language in detail with the standard spoken language pre-recorded in the program by a real person, so as to improve spoken language ability.
The storage unit stores text data (i.e. the conversation course) that the user is required to read aloud, and the user speaks so that the microphone receives the audio data. Through the speech recognition engine, the processing unit converts the pre-customised text data into an identification template against which the spoken input is compared. After the user speaks, the processing unit parses the audio of the spoken input into a series of syllables through the word-analysis module of the speech recognition engine, and then converts the syllables into a spoken language sample through the language-model module. The processing unit compares the spoken language sample with the preset identification template. During the comparison, predefined weight parameters are used, and fuzzy logic produces a floating-point number between 0 and 1 that represents the degree of difference between the spoken language sample and the identification template: 0 means the two are completely different, 1 means they are identical, and 0.7 means they are about 70% the same.
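A sketch of one way such a 0-to-1 similarity could be derived from the N-Gram distance described earlier is shown below; the normalisation formula is an assumption, since the patent does not specify it.

```python
# One possible normalisation of the N-Gram distance into a 0-to-1 similarity;
# this exact formula is an assumption, not specified by the patent.
def fuzzy_similarity(sample, template, n=2):
    g1 = {sample[i:i + n] for i in range(len(sample) - n + 1)}
    g2 = {template[i:i + n] for i in range(len(template) - n + 1)}
    if not g1 or not g2:
        return 1.0 if sample == template else 0.0
    distance = len(g1) + len(g2) - 2 * len(g1 & g2)
    return 1.0 - distance / (len(g1) + len(g2))

print(fuzzy_similarity("have you checked your temperature",
                       "have you checked your temperature"))  # 1.0
```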
Data comparison can be performed by a speech recognition engine in two ways:
1) Semantic analysis and comparison: the sentence in the identification template is first split into single words, and the words are labelled with different tags according to their features; the semantic structure of the sentence is then found from the words before and after each tag. The sentence read aloud by the user is compared with the sentence structure in the identification template, the algorithm analyses whether the semantic structures of the two sentences are similar, and it gives a score between 0 and 1, where a higher score means greater similarity. For example (a toy code sketch follows the three examples below):
example 1
□ sentence in the identification template: Let's go to the three o'clock show.
□ user 1: Let's go to the show.
□ user 2: Let's go to the three clock show.
□ analysis results: user 1 misses the important time "three o'clock" and gets only 0.5 points. User 2 read "o'clock" as "clock", but because the words before and after it are read correctly, a relatively high score of 0.8 can still be obtained.
Example 2
□ sentence in the identification template: Uh-huh, then go right to the next block and the post office'll be on your left.
□ user 1: Uh-huh, then go right to the next block and the police office be on your right.
□ user 2: Uh-huh, go to the next block, the post office be on your left.
□ analysis results: the sentence structure of user 1 is similar to that of the template, but the post office and the police office are different places and the direction is wrong, so the score obtained is a lower 0.4. Although user 2 omits words such as "then", "right", "and" and "will", the meaning is similar to the template, and a good score of 0.7 can be obtained.
Example 3
□ sentence in the identification template: Thanks! Can I get off at any stop along the route?
□ user 1: Thanks! Can I get off store up the road?
□ user 2: Thanks! Can I get on at any stop arriving at the route?
□ analysis results: the two users obtain similar scores, but user 2 pronounced the key phrase "get off" as "get on", which is completely contrary to the meaning of the template, so a very large deduction is made and the score is only 0.6.
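A toy sketch in the spirit of this semantic comparison is given below, using content-word overlap as a crude stand-in for semantic-structure similarity; the stop-word list and the Jaccard-style scoring are illustrative assumptions, not the patent's algorithm, although with example 1 above it happens to reproduce the 0.5 score for user 1.

```python
# Toy stand-in for the semantic comparison: overlap of content words between
# the template and the user's sentence.  The stop-word list and the scoring
# are illustrative assumptions only.
FUNCTION_WORDS = {"the", "a", "an", "to", "and", "then", "be", "on", "your",
                  "go", "will", "at"}

def content_words(sentence):
    return {w.strip(".,!?").lower() for w in sentence.split()} - FUNCTION_WORDS

def semantic_similarity(template, utterance):
    t, u = content_words(template), content_words(utterance)
    if not t:
        return 1.0
    return len(t & u) / len(t | u)

# Example 1 above: user 1 drops the time expression and scores 0.5.
print(semantic_similarity("Let's go to the three o'clock show.",
                          "Let's go to the show."))  # 0.5
```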
2) Single-word weight comparison: the sentence in the identification template and the sentence spoken by the user are compared word by word, and the score is adjusted according to the importance weight of each word. For example (a short sketch of the weight correction follows the example):
□ sentence in the identification template: I guess you've caught a cold. Have you checked your temperature?
□ user: I guess you've catch a cold. Have you checked your temperature?
□ analysis results: the user read "caught" as "catch"; the original score obtained is 0.91, and after the weighting calculation, because "caught" is a weighted word, the score is corrected to 0.78.
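A sketch of applying such a single-word weight is shown below; the weight value 0.86 is chosen only so that the example reproduces the correction from 0.91 to roughly 0.78, and it is not a value given in the patent.

```python
# Sketch of applying the single-word weight; 0.86 is an assumed weight chosen
# so this example reproduces 0.91 -> 0.78.
def apply_word_weight(original_score, mispronounced_words, weighted_word, weight):
    if weighted_word in mispronounced_words:
        return round(original_score * weight, 2)
    return original_score

print(apply_word_weight(0.91, ["caught"], "caught", 0.86))  # 0.78
```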
Comparison evaluation: the processing unit assigns a score to the result of the data comparison and feeds it back to the user. For example, if the result of the data comparison after the user's spoken input is above 0.8, the evaluation "very good" can be given; if the result is below 0.4, the system gives the evaluation "needs improvement", and the user can practise repeatedly.
Degree of difference: according to the result of the speech recognition engine's data comparison, the processing unit can mark the places where the compared data do not match, so that in the "spoken input" stage the user knows which words the system evaluated as not matching (the non-matching words in the text data can be marked in a special colour and displayed on the display screen in real time).
The spoken language teaching auxiliary device and method provided by the invention have strong speech processing capability: they can accurately distinguish the user's pronunciation, evaluate the accuracy of the user's speech, and present a virtual reality image to the user through voice interaction. The interactivity is strong, the user experience is good and the sense of immersion is strong, which increases the user's interest in spoken language learning and meets the needs of practical application well.
The above-mentioned embodiments only express the embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (8)

1. The auxiliary device for the oral teaching is characterized by comprising a processing unit, and a storage unit, a voice input unit and an output unit which are respectively and electrically connected with the processing unit;
the storage unit stores a plurality of spoken language learning data, each spoken language learning data comprises a training sentence, a plurality of standard syllable groups and weighting information, the training sentence is composed of a plurality of standard characters which are sequentially arranged, and the standard syllable groups correspond to the standard characters and are obtained by analyzing the standard characters in advance; the weighted information has a weighted word and a weighted value corresponding to the weighted word, the weighted word corresponds to one of the standard words included in the training sentence;
the processing unit controls the output unit to output a training sentence; when receiving a voice from a voice input unit, a processing unit analyzes the voice to obtain a sentence to be judged consisting of a plurality of characters to be judged and a plurality of syllable groups to be judged corresponding to the characters to be judged respectively; when the processing unit judges that the standard syllable group is not matched with at least one syllable group of the syllable group to be judged, the processing unit takes at least one standard character corresponding to the at least one syllable group which is not matched as at least one unmatched character; the processing unit generates an original score related to at least one unmatched character according to the training sentence and the at least one unmatched character by using the language identification model; when the processing unit judges that at least one unmatched character corresponds to the weighted character, taking the weighted value corresponding to the weighted character as a target weighted value; the processing unit generates a weighted score according to the original score and the target weighted value, and controls the output unit to output an evaluation related to the voice according to the weighted score.
2. The spoken language teaching aid of claim 1, wherein the output unit includes a display screen and a speaker electrically connected to the processing unit, respectively.
3. The device of claim 1, wherein the voice input unit comprises a microphone.
4. A spoken language teaching assistance method implemented by the spoken language teaching assistance apparatus according to any one of claims 1 to 3, comprising the steps of:
(A) the processing unit controls the output unit to output a training sentence;
(B) when receiving the voice from the voice input unit, the processing unit analyzes the voice to obtain a sentence to be judged consisting of a plurality of characters to be judged and a plurality of syllable groups to be judged respectively corresponding to the characters to be judged;
(C) when the processing unit judges that the standard syllable group is not matched with at least one syllable group of the syllable group to be judged, the processing unit takes at least one standard character corresponding to the at least one syllable group which is not matched as at least one unmatched character;
(D) the processing unit generates an original score related to at least one unmatched character according to the training sentence and the at least one unmatched character by utilizing a language identification model;
(E) when the processing unit judges that at least one unmatched character corresponds to the weighted character, taking the weighted value corresponding to the weighted character as a target weighted value; and
(F) the processing unit generates a weighted score according to the original score and the target weighted value, and controls the output unit to output an evaluation related to the voice according to the weighted score.
5. The spoken language teaching assistance method of claim 4, wherein the step (C) comprises:
(c1) when the processing unit judges that at least one of the standard syllable groups has no corresponding syllable group to be judged, judging that the at least one non-corresponding standard syllable group does not match;
(c2) when the processing unit judges that the standard syllable groups respectively correspond to the syllable groups to be judged, determining whether each standard syllable group is identical to the corresponding syllable group to be judged, and judging that any standard syllable group that is not identical does not match.
6. The spoken language teaching assistance method of claim 4, wherein in step (E), the processing unit controls the display screen and/or the speaker to output the rating and controls the display screen to display at least one non-conforming word.
7. The spoken language teaching assistance method of claim 4, wherein before step (A), the processing unit controls the display screen to display a virtual reality image of the multimedia data according to the multimedia data included in the spoken language learning data.
8. The spoken language teaching assistance method of claim 4 wherein the language identification model is an N-Gram model.
CN201810689188.3A 2018-06-28 2018-06-28 Auxiliary device and method for oral teaching Active CN108831212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810689188.3A CN108831212B (en) 2018-06-28 2018-06-28 Auxiliary device and method for oral teaching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810689188.3A CN108831212B (en) 2018-06-28 2018-06-28 Auxiliary device and method for oral teaching

Publications (2)

Publication Number Publication Date
CN108831212A CN108831212A (en) 2018-11-16
CN108831212B true CN108831212B (en) 2020-10-23

Family

ID=64133588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810689188.3A Active CN108831212B (en) 2018-06-28 2018-06-28 Auxiliary device and method for oral teaching

Country Status (1)

Country Link
CN (1) CN108831212B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109243230A (en) * 2018-11-21 2019-01-18 汕头市美致模型有限公司 A kind of interactive teaching system based on toy robot
CN111639217A (en) * 2020-05-12 2020-09-08 广东小天才科技有限公司 Spoken language rating method, terminal device and storage medium
GB2613563A (en) * 2021-12-03 2023-06-14 Learnlight Uk Ltd Apparatus, computing device and method for speech analysis

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1064464C * 1991-12-12 2001-04-11 致远科技股份有限公司 Speech processing system based on multiple evaluation function
US5340316A (en) * 1993-05-28 1994-08-23 Panasonic Technologies, Inc. Synthesis-based speech training system
US7657221B2 (en) * 2005-09-12 2010-02-02 Northwest Educational Software, Inc. Virtual oral recitation examination apparatus, system and method
TW200900969 (A) * 2007-06-27 2009-01-01 Inventec Besta Co Ltd Chinese character pronunciation learning apparatus with pronunciation correction function and method thereof
CN102169642B (en) * 2011-04-06 2013-04-03 沈阳航空航天大学 Interactive virtual teacher system having intelligent error correction function
TWI432179B (en) * 2011-08-12 2014-04-01 Taipei Veterans General Hospital Vac Interactive speech testing and training platform
CN102930866B (en) * 2012-11-05 2014-05-21 广州市神骥营销策划有限公司 Evaluation method for student reading assignment for oral practice
CN103151042B (en) * 2013-01-23 2016-02-24 中国科学院深圳先进技术研究院 Full-automatic oral evaluation management and points-scoring system and methods of marking thereof
JP2014164261A (en) * 2013-02-27 2014-09-08 Canon Inc Information processor and information processing method
CN103578465B (en) * 2013-10-18 2016-08-17 威盛电子股份有限公司 Speech identifying method and electronic installation
CN104599680B (en) * 2013-10-30 2019-11-26 语冠信息技术(上海)有限公司 Real-time spoken evaluation system and method in mobile device
CN103559892B (en) * 2013-11-08 2016-02-17 科大讯飞股份有限公司 Oral evaluation method and system
CN104778865A (en) * 2014-01-14 2015-07-15 王萍丽 Method for conducting spoken language correction through speech recognition technology and language learning machine
CN104810017B (en) * 2015-04-08 2018-07-17 广东外语外贸大学 Oral evaluation method and system based on semantic analysis
TWM529913U (en) * 2016-06-22 2016-10-01 Yu Da University Of Science And Technology Language learning system
CN106875941B (en) * 2017-04-01 2020-02-18 彭楚奥 Voice semantic recognition method of service robot
CN106875764B (en) * 2017-04-26 2020-03-31 北京大生在线科技有限公司 Virtual reality foreign language learning system based on network and control method
CN107464476A (en) * 2017-09-03 2017-12-12 佛山神航科技有限公司 A kind of instrument for aiding in English study
CN108052499B (en) * 2017-11-20 2021-06-11 北京百度网讯科技有限公司 Text error correction method and device based on artificial intelligence and computer readable medium

Also Published As

Publication number Publication date
CN108831212A (en) 2018-11-16

Similar Documents

Publication Publication Date Title
CN110517689B (en) Voice data processing method, device and storage medium
CN114694076A (en) Multi-modal emotion analysis method based on multi-task learning and stacked cross-modal fusion
CN112784696B (en) Lip language identification method, device, equipment and storage medium based on image identification
CN107632980A (en) Voice translation method and device, the device for voiced translation
CN108831212B (en) Auxiliary device and method for oral teaching
CN113380271B (en) Emotion recognition method, system, device and medium
CN109584906B (en) Method, device and equipment for evaluating spoken language pronunciation and storage equipment
CN112466279B (en) Automatic correction method and device for spoken English pronunciation
CN112151015A (en) Keyword detection method and device, electronic equipment and storage medium
CN111192659A (en) Pre-training method for depression detection and depression detection method and device
JP6810580B2 (en) Language model learning device and its program
US20210151036A1 (en) Detection of correctness of pronunciation
Pervaiz et al. Emotion recognition from speech using prosodic and linguistic features
EP1398758B1 (en) Method and apparatus for generating decision tree questions for speech processing
Hrúz et al. Automatic fingersign-to-speech translation system
Hori et al. A statistical approach to automatic speech summarization
CN110853669A (en) Audio identification method, device and equipment
CN114254096A (en) Multi-mode emotion prediction method and system based on interactive robot conversation
CN113393841B (en) Training method, device, equipment and storage medium of voice recognition model
JP4934090B2 (en) Program character extraction device and program character extraction program
KR20210131698A (en) Method and apparatus for teaching foreign language pronunciation using articulator image
Rasipuram et al. Automatic prediction of fluency in interface-based interviews
CN111681680B (en) Method, system, device and readable storage medium for acquiring audio frequency by video recognition object
CN114420159A (en) Audio evaluation method and device and non-transient storage medium
CN112766101A (en) Method for constructing Chinese lip language identification modeling unit set

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240314

Address after: No. 7 Gongyuan Road, Jiangdu District, Yangzhou City, Jiangsu Province, 225200

Patentee after: He Guangyao

Country or region after: China

Address before: 518057, Room 405, Building A, Zhongke Neng R&D Center, Yuexing Sixth Road, Yuehai Street, Nanshan District, Shenzhen, Guangdong Province

Patentee before: SHENZHEN LANGEASE EDUCATION TECHNOLOGY Co.,Ltd.

Country or region before: China