TWI683290B - Spoken language teaching auxiliary method and device - Google Patents


Info

Publication number
TWI683290B
TWI683290B (application TW107122214A)
Authority
TW
Taiwan
Prior art keywords
syllable
text
weighted
standard
processing unit
Prior art date
Application number
TW107122214A
Other languages
Chinese (zh)
Other versions
TW202001825A (en)
Inventor
吳雲中
Original Assignee
吳雲中
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 吳雲中
Priority to TW107122214A
Publication of TW202001825A
Application granted
Publication of TWI683290B

Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

A spoken language teaching auxiliary device stores a piece of spoken-language learning data comprising a training sentence and a piece of weighted information. The training sentence is composed of a plurality of standard words arranged in sequence; the weighted information has a weighted word corresponding to one of the standard words and a weighted value. The device outputs the training sentence and, upon receiving a speech input, parses it to obtain a plurality of candidate words. Any standard word that does not match the candidate words is taken as at least one unmatched word. Using a language recognition model, the device generates an original score related to the at least one unmatched word, and when it determines that the at least one unmatched word corresponds to the weighted word, it controls the output unit to output an evaluation of the speech based on the original score and the target weighted value.

Description

Spoken language teaching auxiliary method and device

The present invention relates to a teaching auxiliary method and device, and more particularly to a spoken language teaching auxiliary method and device.

In recent years, conventional language-teaching approaches have mainly included audio-visual teaching, software-based teaching, and live teaching (in person or remote) by human instructors.

Live teaching is expensive, and some users are too embarrassed to speak up in front of a human teacher. Audio-visual teaching cannot interact with the user. Software-based teaching struggles to give the user a sense of immersion in real spoken-language situations. In particular, a user can only repeat after the audio-visual or software content, and has difficulty judging whether the repetition is correct.

Therefore, one object of the present invention is to provide a spoken language teaching auxiliary method that overcomes at least one drawback of the prior art.

Accordingly, the spoken language teaching auxiliary method of the present invention is performed by a spoken language teaching auxiliary device that includes a storage unit, a speech input unit, an output unit, and a processing unit electrically connected to the storage unit, the speech input unit, and the output unit. The storage unit stores a plurality of pieces of spoken-language learning data. Each piece includes a training sentence, a plurality of standard syllable groups, and a piece of weighted information. The training sentence is composed of a plurality of standard words arranged in sequence; the standard syllable groups respectively correspond to the standard words and are obtained by parsing those words in advance; and the weighted information has a weighted word and a weighted value corresponding to the weighted word, the weighted word corresponding to one of the words contained in the training sentence. The spoken language teaching auxiliary method comprises:

(A) the processing unit controls the output unit to output the training sentence;

(B) upon receiving a speech input from the speech input unit, the processing unit parses the speech to obtain a candidate sentence composed of a plurality of candidate words, and a plurality of candidate syllable groups respectively corresponding to the candidate words;

(C) when the processing unit determines that at least one of the standard syllable groups does not match the candidate syllable groups, it takes the at least one standard word corresponding to the non-matching syllable group(s) as at least one unmatched word;

(D) the processing unit uses a language recognition model to generate, from the training sentence and the at least one unmatched word, an original score related to the at least one unmatched word;

(E) when the processing unit determines that the at least one unmatched word corresponds to the weighted word, it takes the weighted value corresponding to the weighted word as a target weighted value; and

(F) the processing unit controls the output unit to output an evaluation of the speech based on the original score and the target weighted value.

In some embodiments, the output unit includes a display screen and a speaker; in step (E), the processing unit controls the display screen or the speaker to output the evaluation, and controls the display screen to display the at least one unmatched word.

In some embodiments, step (C) comprises:

(c1) when the processing unit determines that at least one of the standard syllable groups has no corresponding candidate syllable group, it determines that the non-corresponding standard syllable group(s) do not match the candidate syllable groups; and

(c2) when the processing unit determines that the standard syllable groups each correspond to a candidate syllable group, it determines whether each standard syllable group is identical to its corresponding candidate syllable group, and when at least one standard syllable group differs from its corresponding candidate syllable group, it determines that the differing standard syllable group(s) do not match the candidate syllable groups.

In some embodiments, each piece of learning data further includes multimedia data having a virtual-reality image related to the training sentence, and before step (A) the processing unit controls the display screen to display the virtual-reality image according to the multimedia data contained in the learning data.

In some embodiments, the spoken language training device is a head-mounted display device, the output unit further includes a speaker, and the speech input unit includes a microphone.

In some embodiments, the language recognition model is an N-gram language recognition model.

Therefore, another object of the present invention is to provide a spoken language teaching auxiliary device that overcomes at least one drawback of the prior art.

Accordingly, the spoken language teaching auxiliary device of the present invention includes a storage unit, a speech input unit, an output unit, and a processing unit electrically connected to the storage unit, the speech input unit, and the output unit. The storage unit stores a plurality of pieces of spoken-language learning data; each piece includes a training sentence, a plurality of standard syllable groups, and a piece of weighted information. The training sentence is composed of a plurality of standard words arranged in sequence; the standard syllable groups respectively correspond to the standard words and are obtained by parsing those words in advance; and the weighted information has a weighted word and a weighted value corresponding to the weighted word, the weighted word corresponding to one of the words contained in the training sentence.

The processing unit controls the output unit to output the training sentence and, upon receiving a speech input from the speech input unit, parses the speech to obtain a candidate sentence composed of a plurality of candidate words and a plurality of candidate syllable groups respectively corresponding to the candidate words.

When the processing unit determines that at least one of the standard syllable groups does not match the candidate syllable groups, it takes the at least one standard word corresponding to the non-matching syllable group(s) as at least one unmatched word.

The processing unit uses a language recognition model to generate, from the training sentence and the at least one unmatched word, an original score related to the at least one unmatched word.

When the processing unit determines that the at least one unmatched word corresponds to the weighted word, it takes the weighted value corresponding to the weighted word as a target weighted value.

The processing unit controls the output unit to output an evaluation of the speech based on the original score and the target weighted value.

In some embodiments, the output unit includes a display screen and a speaker; the processing unit controls the display screen or the speaker to output the evaluation, and further controls the display screen to display the at least one unmatched word.

In some embodiments, when determining whether the standard syllable groups match the candidate syllable groups, the processing unit determines, upon finding at least one standard syllable group with no corresponding candidate syllable group, that the non-corresponding standard syllable group(s) do not match the candidate syllable groups; and when the standard syllable groups each correspond to a candidate syllable group, it further determines whether each standard syllable group is identical to its corresponding candidate syllable group, and upon finding at least one standard syllable group that differs from its corresponding candidate syllable group, determines that the differing standard syllable group(s) do not match the candidate syllable groups.

In some embodiments, each piece of learning data further includes multimedia data having a virtual-reality image related to the training sentence, and before controlling the output unit to output the training sentence, the processing unit controls the display screen to display the virtual-reality image according to the multimedia data contained in the learning data.

In some embodiments, the spoken language training device is a head-mounted display device, the output unit further includes a speaker, and the speech input unit includes a microphone.

In some embodiments, the language recognition model is an N-gram language recognition model.

The effect of the present invention is as follows: the language recognition model obtains an original score from the training sentence and the at least one unmatched word, and an evaluation related to the original score and the weighted value is output, so that the user can judge from the evaluation how far his or her speech deviates from the meaning of the training sentence. The approach also spares the user the embarrassment that keeps some learners from speaking in front of a human teacher.

Before the present invention is described in detail, it should be noted that in the following description similar elements are denoted by the same reference numerals.

Referring to Fig. 1, an embodiment of the spoken language teaching auxiliary device 100 of the present invention includes a storage unit 1, a speech input unit 2, an output unit 3, and a processing unit 4 electrically connected to the storage unit 1, the speech input unit 2, and the output unit 3.

In this embodiment, the storage unit 1 stores evaluation information 11 and a plurality of pieces of spoken-language learning data 12. The evaluation information 11 includes a plurality of evaluations that correspond to different score intervals. For example, the evaluation information 11 includes three evaluations: "needs improvement" for scores from 0 up to (but not including) 0.4, "not bad" for scores from 0.4 to 0.8, and "excellent" for scores greater than 0.8 up to 1, though the invention is not limited thereto.
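The interval-to-evaluation lookup described above can be sketched as follows. This is an illustrative reading of the embodiment, not code from the patent; the function name and English labels are assumptions, while the interval boundaries follow the example just given.

```python
def evaluation_for(score: float) -> str:
    """Map a score in [0, 1] to an evaluation label (evaluation information 11)."""
    if score < 0.4:
        return "needs improvement"   # 0 <= score < 0.4
    elif score <= 0.8:
        return "not bad"             # 0.4 <= score <= 0.8
    else:
        return "excellent"           # 0.8 < score <= 1
```

With the worked example later in the description, a weighted score of 0.35 would fall in the first interval.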

Each piece of spoken-language learning data 12 includes multimedia data 121 and at least one training sentence 122. Taking as an example learning data 12 that contains only one training sentence 122, the training sentence 122 corresponds to a plurality of standard syllable groups 123 and a piece of weighted information 124. The multimedia data 121 is related to a virtual-reality image whose scenario is related to the training sentence 122. The training sentence 122 is a sentence composed of a plurality of words arranged in sequence, and its standard syllable groups 123 respectively correspond to those words and are obtained by parsing them in advance. The weighted information 124 corresponding to the training sentence 122 has a weighted word 124A and a weighted value 124B corresponding to it; the weighted word 124A corresponds to one of the standard words contained in the training sentence 122. For example, the virtual-reality image presents a scenario, built from three-dimensional lifelike scenes and characters, of a conversation about an invitation. The conversation includes the training sentence 122 and other sentences; the training sentence 122 is, for example, "Let's go to the three o'clock show", whose standard words are "Let's", "go", "to", "the", "three", "o'clock", and "show". The standard syllable group 123 corresponding to the standard word "o'clock" contains text-type data for the two syllables "o" and "clock", the weighted word 124A is "o'clock", and its corresponding weighted value is 0.7.

It should be noted that the number of pieces of spoken-language learning data 12 stored in the storage unit 1, and the number of training sentences 122 and corresponding pieces of weighted information 124 contained in each piece, are not limited to this embodiment. In other implementations, the storage unit 1 may store multiple pieces of spoken-language learning data 12, and each piece may contain multiple training sentences 122 and multiple corresponding pieces of weighted information 124.

In this embodiment, the speech input unit 2 includes a microphone 21, through which the user can input speech by speaking. The output unit 3 includes a display screen 31 and a speaker 32: the display screen 31 displays the virtual-reality image, and the speaker 32 outputs the sound associated with it, namely the training sentence 122 described above. In this embodiment, the spoken language teaching auxiliary device 100 is implemented, for example, as a head-mounted display device (not shown) suitable for wearing on the head. When worn, the user sees the three-dimensional lifelike scenes and characters of the virtual image through the display screen 31 and hears the associated sound, i.e. the training sentence 122, through the speaker 32; the content shown on the display screen 31 adapts to the rotation of the user's head, and the user can, when needed, input speech into the device 100 by speaking into the microphone 21. The device 100 is not limited to a head-mounted display device, however; it can also be realized with a personal computer, notebook computer, tablet, or smartphone together with built-in or external speakers and a microphone.

The following describes how the spoken language teaching auxiliary device 100 of this embodiment performs an embodiment of the teaching auxiliary method of the present invention.

First, in step S1, the processing unit 4 controls the display screen 31 to display the virtual-reality image according to the multimedia data 121 contained in the spoken-language learning data 12. Continuing the previous example, the virtual-reality image displayed on the display screen 31 is the conversation scenario about an invitation, so a user wearing the device 100 (i.e. the head-mounted display device) can freely rotate his or her head and see the three-dimensional lifelike scene and characters on the display screen 31 from every angle.

Next, in step S2, the processing unit 4 controls the output unit 3 to output training information related to the training sentence 122, and upon receiving speech from the speech input unit 2, in step S3, parses the speech to obtain a candidate sentence composed of a plurality of candidate words and a plurality of candidate syllable groups respectively corresponding to the candidate words. Continuing the previous example, the display screen 31 shows the invitation scenario and the speaker 32 plays the conversation, which includes the training sentence 122 "Let's go to the three o'clock show". The user then speaks into the microphone 21, saying for example "Let's go to the show"; the processing unit 4 parses the candidate words as "Let's", "go", "to", "the", and "show", the candidate sentence as "Let's go to the show", and the candidate syllable groups corresponding to those words.

Then, in step S4, the processing unit 4 determines whether the standard syllable groups 123 match the candidate syllable groups. When at least one standard syllable group 123 does not match, the flow proceeds to step S5, in which the processing unit 4 takes the at least one standard word corresponding to the non-matching syllable group(s) as at least one unmatched word; otherwise the flow proceeds to step S6. In step S6, because the candidate sentence corresponding to the input speech is identical to the training sentence 122, the processing unit 4 controls the display screen 31 to display the "excellent" evaluation.

In this embodiment, in step S4, when determining whether the standard syllable groups 123 match the candidate syllable groups, the processing unit 4 first checks whether any standard syllable group 123 has no corresponding candidate syllable group; any such non-corresponding standard syllable group is determined not to match. Continuing the previous example, the candidate syllable groups corresponding to the candidate words ("Let's", "go", "to", "the", "show") are missing the two standard syllable groups 123 of the fifth and sixth standard words, "three" and "o'clock" (of "Let's", "go", "to", "the", "three", "o'clock", "show"). That is, the standard syllable groups 123 corresponding to "three" and "o'clock" have no corresponding candidate syllable group, so "three" and "o'clock" are determined to be two unmatched words. In addition, when the standard syllable groups 123 each correspond to a candidate syllable group, the processing unit 4 further determines whether each standard syllable group 123 is identical to its corresponding candidate syllable group; any standard syllable group 123 that differs is determined not to match. For ease of explanation, suppose the user says "Let's go to the nine o'clock show". The processing unit 4 determines that the candidate words of the candidate sentence "Let's go to the nine o'clock show" each correspond to a standard word of the training sentence (i.e. no word is missing), then determines that "nine" differs from "three", the fifth standard word of the training sentence 122, so "nine" is taken as an unmatched word.
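The two matching branches just described can be sketched as follows. The patent does not specify the alignment algorithm, so this hedged sketch stands in word-level string comparison for syllable-group comparison, and returns the standard-side words in both branches (note the description's second example instead reports the mis-spoken candidate word "nine"); the function name is an assumption.

```python
def find_unmatched(standard: list[str], candidate: list[str]) -> list[str]:
    """Return the standard words judged unmatched, following steps (c1)/(c2)."""
    if len(candidate) == len(standard):
        # (c2): every standard group has a positional counterpart;
        # flag the positions where the two sides differ.
        return [s for s, c in zip(standard, candidate) if s != c]
    # (c1): some standard groups have no counterpart; as a simple stand-in,
    # flag every standard word absent from the candidate sentence.
    return [s for s in standard if s not in candidate]
```

On the worked examples, "Let's go to the show" yields the unmatched words "three" and "o'clock", and "Let's go to the nine o'clock show" yields a single mismatch at the fifth position.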

Then, in step S7, the processing unit 4 uses a language recognition model to generate, from the training sentence 122 and the at least one unmatched word, an original score related to the candidate sentence. In this embodiment the language recognition model is an N-gram language recognition model, but it is not limited thereto; in other implementations a recurrent neural network (RNN) or a long short-term memory (LSTM) model may be used. The N-gram language recognition model can, for example, judge the importance of the at least one unmatched word from the meaning of the training sentence 122 and the unmatched word(s) in the candidate sentence, and from that importance determine a score between 0 and 1 as the original score. How the N-gram model judges that importance is prior art and is not described further here. Continuing the previous example, the processing unit 4 uses the N-gram model to judge the importance of "three" and "o'clock" from the meaning of the training sentence "Let's go to the three o'clock show" and the two unmatched words, and determines a score of 0.5, so the original score of the candidate sentence is 0.5. "three" and "o'clock" are semantically important words of the training sentence 122; in other words, if the user fails to say these two words or says them incorrectly, the spoken sentence deviates greatly from the training sentence 122, so the original score is only half of the full score (the full score being 1).
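Since the patent defers the scoring internals to prior art, the sketch below only illustrates the input/output contract: an importance-based penalty mapped into [0, 1]. The importance table is entirely made up for illustration (chosen so the worked example reproduces the 0.5 score); a real N-gram model would derive importance from corpus statistics.

```python
# Hypothetical importance weights -- illustrative only, not from the patent.
IMPORTANCE = {
    "the": 0.05, "to": 0.05, "go": 0.1,
    "three": 0.25, "o'clock": 0.25, "show": 0.3,
}

def original_score(unmatched: list[str]) -> float:
    """Deduct the importance of each unmatched word from a full score of 1."""
    penalty = sum(IMPORTANCE.get(w.lower(), 0.1) for w in unmatched)
    return max(0.0, 1.0 - penalty)
```

With the unmatched words "three" and "o'clock", the penalty is 0.25 + 0.25, giving an original score of 0.5 as in the example above.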

Next, in step S8, the processing unit 4 determines whether the at least one unmatched word of the candidate sentence corresponds to the weighted word; if so, the flow proceeds to step S9, otherwise it proceeds to step S11. In step S9, the processing unit 4 takes the weighted value of the weighted word corresponding to the at least one unmatched word as a target weighted value. Continuing the previous example, since one of the two unmatched words, "o'clock", corresponds to the weighted word ("o'clock"), 0.7 is taken as the target weighted value.

In step S10, the processing unit 4 computes a weighted score from the original score and the target weighted value, and based on the weighted score controls the output unit 3 to output an evaluation of the speech and to display the at least one unmatched word. Continuing the previous example, the processing unit 4 takes the product of the original score and the target weighted value as the weighted score, so the weighted score is 0.35, whose corresponding evaluation is "needs improvement". The display screen 31 therefore displays the "needs improvement" evaluation together with the words the user said incorrectly or omitted, such as "three" and "o'clock", so the user learns that these two words were missing from his or her speech. In other implementations, the evaluation may instead be output as sound through the speaker 32. It is worth noting that the weighted information 124 is preset by the provider according to its own teaching goals; weighting the deduction for "o'clock" by the target weighted value brings the scoring closer to those goals.

It should be added that, in other embodiments, the spoken language learning data 12 includes multiple pieces of weighting information, the weighted word of each corresponding to one standard word. In that case, when step S8 finds multiple unmatched words respectively corresponding to multiple weighted words, the weighted score in step S10 is the product of the original score and all of the target weighting values.
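The scoring rule of steps S8 to S10, including the multi-weight case above, can be sketched as follows. This is a minimal illustration rather than the patented implementation; the evaluation threshold (0.6) is an assumption, since the description only names the "needs improvement" label, and the original score of 0.5 is implied by 0.35 = 0.5 × 0.7.

```python
def weighted_score(original_score, unmatched_words, weighting_info):
    """Multiply the original score by the weighting value of every
    unmatched word that has an entry in the weighting information."""
    score = original_score
    for word in unmatched_words:
        if word in weighting_info:           # step S8: word is a weighted word
            score *= weighting_info[word]    # steps S9/S10: apply its weight
    return score

def evaluation(score):
    # Assumed threshold; the description only names "needs improvement".
    return "good" if score >= 0.6 else "needs improvement"

# Example from the description: unmatched words "three" and "o'clock",
# with a weighting value of 0.7 on "o'clock" and an original score of 0.5.
s = weighted_score(0.5, ["three", "o'clock"], {"o'clock": 0.7})
# s is 0.35 (up to floating-point rounding), which maps to "needs improvement".
```

With several weighted words among the unmatched words, each weighting value is multiplied in, matching the product rule described above.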

In step S11, since the at least one unmatched word of the sentence to be evaluated does not correspond to the weighted word 124A, the processing unit 4 controls the output unit 3 to output an evaluation of the speech based on the original score alone, and displays the at least one unmatched word.

In summary, the embodiment above uses the language recognition model to identify, in the sentence derived from the user's spoken input, at least one word that does not match the training sentence, and obtains an original score according to the importance of the mismatched words. The processing unit 4 then multiplies the original score by the weighting value to apply an extra deduction, and outputs an evaluation based on the original score or on the weighted score after the extra deduction. The user can thus judge how far the spoken input deviates from the meaning of the training sentence and learn which words were omitted or mispronounced. The approach also provides the immersion of a real conversational setting while sparing users the embarrassment of being unable to speak up in front of a human teacher.

The foregoing are merely embodiments of the present invention and shall not limit the scope of its practice; all simple equivalent changes and modifications made according to the claims and the specification of the present invention remain within the scope covered by this patent.

100 spoken language teaching auxiliary device; 1 storage unit; 11 evaluation information; 12 spoken language learning data; 121 multimedia data; 122 training sentence; 123 standard syllable groups; 124 weighting information; 124A weighted word; 124B weighting value; 2 speech input unit; 21 microphone; 3 output unit; 31 display screen; 32 speaker; 4 processing unit; S1~S11 steps of the method

Other features and effects of the present invention will be clearly presented in the embodiments described with reference to the drawings, in which: FIG. 1 is a block diagram illustrating an embodiment of the spoken language teaching auxiliary device of the present invention; and FIG. 2 is a flowchart illustrating an embodiment of the spoken language teaching auxiliary method of the present invention.


Claims (12)

A spoken language teaching auxiliary method, executed by a spoken language teaching auxiliary device, the device comprising a storage unit, a speech input unit, an output unit, and a processing unit electrically connected to the storage unit, the speech input unit, and the output unit, the storage unit storing a plurality of spoken language learning data records, each record comprising a training sentence, a plurality of standard syllable groups corresponding to the training sentence, and a piece of weighting information, the training sentence being composed of a plurality of standard words arranged in sequence, the standard syllable groups respectively corresponding to the words and being obtained by parsing the words in advance, the weighting information having a weighted word and a weighting value corresponding to the weighted word, the weighted word corresponding to one of the words contained in the training sentence, the method comprising: (A) the processing unit controlling the output unit to output the training sentence; (B) upon receiving a speech from the speech input unit, the processing unit parsing the speech to obtain a sentence to be evaluated composed of a plurality of words to be evaluated, and a plurality of syllable groups to be evaluated respectively corresponding to the words to be evaluated; (C) when the processing unit determines that at least one of the standard syllable groups does not match the syllable groups to be evaluated, taking the at least one standard word corresponding to the unmatched at least one syllable group as at least one unmatched word; (D) the processing unit using a language recognition model to generate, from the training sentence and the at least one unmatched word, an original score related to the at least one unmatched word; (E) when the processing unit determines that the at least one unmatched word corresponds to the weighted word, taking the weighting value corresponding to the weighted word as a target weighting value; and (F) the processing unit generating a weighted score from the original score and the target weighting value, and controlling the output unit to output an evaluation related to the speech according to the weighted score.

The spoken language teaching auxiliary method of claim 1, wherein the output unit includes a display screen and a speaker, and in step (F) the processing unit controls the display screen or the speaker to output the evaluation and controls the display screen to display the at least one unmatched word.
The spoken language teaching auxiliary method of claim 1 or 2, wherein step (C) comprises: (c1) when the processing unit determines that at least one of the standard syllable groups has no corresponding group among the syllable groups to be evaluated, determining that the non-corresponding at least one standard syllable group does not match the syllable groups to be evaluated; and (c2) when the processing unit determines that the standard syllable groups respectively correspond to the syllable groups to be evaluated, determining whether each standard syllable group is identical to its corresponding syllable group to be evaluated, and, upon determining that at least one standard syllable group differs from its corresponding group, determining that the differing at least one standard syllable group does not match.

The spoken language teaching auxiliary method of claim 3, wherein each learning data record further comprises multimedia data having a virtual reality image related to the training sentence, and before step (A) the processing unit controls the display screen to display the virtual reality image of the multimedia data contained in the learning data record.

The spoken language teaching auxiliary method of claim 4, wherein the spoken language teaching auxiliary device is a head-mounted display device and the speech input unit includes a microphone.
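Sub-steps (c1) and (c2) describe a two-stage comparison: first check that each standard syllable group has a corresponding candidate group, then check that corresponding groups are identical. The sketch below illustrates this with a simple positional alignment; the tuple-of-phones representation of a syllable group is an assumption for the example, and a real recognizer would align the sequences more robustly (e.g. by edit distance):

```python
def unmatched_standard_indices(standard_groups, candidate_groups):
    """Indices of standard syllable groups that fail either check:
    (c1) no corresponding candidate group at that position, or
    (c2) a corresponding candidate group that is not identical."""
    unmatched = []
    for i, std in enumerate(standard_groups):
        if i >= len(candidate_groups):       # (c1): nothing corresponds
            unmatched.append(i)
        elif candidate_groups[i] != std:     # (c2): corresponds but differs
            unmatched.append(i)
    return unmatched

# Hypothetical syllable groups for "what time is it"; the response
# mispronounces "time" and omits "it".
standard = [("W", "AH", "T"), ("T", "AY", "M"), ("IH", "Z"), ("IH", "T")]
heard    = [("W", "AH", "T"), ("T", "IY", "M"), ("IH", "Z")]
unmatched_standard_indices(standard, heard)  # [1, 3]
```

The returned indices identify the standard words ("time" and "it" here) that become the unmatched words of step (C).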
The spoken language teaching auxiliary method of claim 1, wherein the language recognition model is an N-gram language recognition model.

A spoken language teaching auxiliary device, comprising: a storage unit storing a plurality of spoken language learning data records, each record comprising a training sentence, a plurality of standard syllable groups, and a piece of weighting information, the training sentence being composed of a plurality of standard words arranged in sequence, the standard syllable groups respectively corresponding to the standard words and being obtained by parsing the standard words in advance, the weighting information having a weighted word and a weighting value corresponding to the weighted word, the weighted word corresponding to one of the words contained in the training sentence; a speech input unit; an output unit; and a processing unit electrically connected to the storage unit, the speech input unit, and the output unit; wherein the processing unit controls the output unit to output the training sentence; wherein, upon receiving a speech from the speech input unit, the processing unit parses the speech to obtain a sentence to be evaluated composed of a plurality of words to be evaluated, and a plurality of syllable groups to be evaluated respectively corresponding to the words to be evaluated; wherein, when the processing unit determines that at least one of the standard syllable groups does not match the syllable groups to be evaluated, the processing unit takes the at least one standard word corresponding to the unmatched at least one syllable group as at least one unmatched word; wherein the processing unit uses a language recognition model to generate, from the training sentence and the at least one unmatched word, an original score related to the at least one unmatched word; wherein, when the processing unit determines that the at least one unmatched word corresponds to the weighted word, the processing unit takes the weighting value corresponding to the weighted word as a target weighting value; and wherein the processing unit generates a weighted score from the original score and the target weighting value, and controls the output unit to output an evaluation related to the speech according to the weighted score.

The spoken language teaching auxiliary device of claim 7, wherein the output unit includes a display screen and a speaker, and the processing unit controls the display screen or the speaker to output the evaluation and controls the display screen to display the at least one unmatched word.
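Claims 6 and 12 specify an N-gram language recognition model as the source of the original score. One simple form such a model can take is a bigram score in which omitting or replacing a word drives the sentence probability down; the bigram table below is invented for illustration and is not taken from the patent:

```python
import math

# Toy bigram probabilities; a real N-gram model would be estimated
# from a training corpus with smoothing.
BIGRAMS = {("what", "time"): 0.8, ("time", "is"): 0.9, ("is", "it"): 0.7,
           ("what", "is"): 0.1}

def bigram_score(words, floor=1e-4):
    """Geometric mean of bigram probabilities; unseen bigrams
    receive a small floor probability."""
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return floor
    log_p = sum(math.log(BIGRAMS.get(pair, floor)) for pair in pairs)
    return math.exp(log_p / len(pairs))

full    = bigram_score("what time is it".split())   # all bigrams seen
partial = bigram_score("what is it".split())        # "time" omitted, score drops
```

Because dropping an important word breaks high-probability bigrams, the partial sentence scores lower than the full one, which is the behavior the original score relies on.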
The spoken language teaching auxiliary device of claim 7 or 8, wherein, in determining whether the standard syllable groups match the syllable groups to be evaluated: when the processing unit determines that at least one of the standard syllable groups has no corresponding group among the syllable groups to be evaluated, it determines that the non-corresponding at least one standard syllable group does not match; and when the processing unit determines that the standard syllable groups respectively correspond to the syllable groups to be evaluated, it further determines whether each standard syllable group is identical to its corresponding syllable group to be evaluated, and, upon determining that at least one standard syllable group differs from its corresponding group, determines that the differing at least one standard syllable group does not match.

The spoken language teaching auxiliary device of claim 9, wherein each learning data record further comprises multimedia data having a virtual reality image related to the training sentence, and the processing unit, before controlling the output unit to output the training sentence, controls the display screen to display the virtual reality image of the multimedia data contained in the learning data record.
The spoken language teaching auxiliary device of claim 10, wherein the spoken language teaching auxiliary device is a head-mounted display device, the output unit further includes a speaker, and the speech input unit includes a microphone.

The spoken language teaching auxiliary device of claim 7, wherein the language recognition model is an N-gram language recognition model.
TW107122214A 2018-06-28 2018-06-28 Spoken language teaching auxiliary method and device TWI683290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW107122214A TWI683290B (en) 2018-06-28 2018-06-28 Spoken language teaching auxiliary method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW107122214A TWI683290B (en) 2018-06-28 2018-06-28 Spoken language teaching auxiliary method and device

Publications (2)

Publication Number Publication Date
TW202001825A TW202001825A (en) 2020-01-01
TWI683290B true TWI683290B (en) 2020-01-21

Family

ID=69941605

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107122214A TWI683290B (en) 2018-06-28 2018-06-28 Spoken language teaching auxiliary method and device

Country Status (1)

Country Link
TW (1) TWI683290B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102169642A (en) * 2011-04-06 2011-08-31 李一波 Interactive virtual teacher system having intelligent error correction function
CN103594087A (en) * 2013-11-08 2014-02-19 安徽科大讯飞信息科技股份有限公司 Method and system for improving oral evaluation performance
TWM529913U (en) * 2016-06-22 2016-10-01 Yu Da University Of Science And Technology Language learning system
CN106875764A (en) * 2017-04-26 2017-06-20 北京大生在线科技有限公司 Network virtual reality foreign language learning system and control method
US20170256179A1 (en) * 2013-02-15 2017-09-07 Voxy, Inc. Language learning systems and methods


Also Published As

Publication number Publication date
TW202001825A (en) 2020-01-01

Similar Documents

Publication Publication Date Title
CN106575500B (en) Method and apparatus for synthesizing speech based on facial structure
US11328616B2 (en) Interactive educational system and method
Engwall et al. Designing the user interface of the computer-based speech training system ARTUR based on early user tests
US20160321953A1 (en) Pronunciation learning support system utilizing three-dimensional multimedia and pronunciation learning support method thereof
TW200849218A (en) Voice processing methods and systems, and machine readable medium thereof
KR20140071070A (en) Method and apparatus for learning pronunciation of foreign language using phonetic symbol
US20200193961A1 (en) System for synchronizing speech and motion of character
Priya et al. Indian and English language to sign language translator-an automated portable two way communicator for bridging normal and deprived ones
JP2016224283A (en) Conversation training system for foreign language
TWI683290B (en) Spoken language teaching auxiliary method and device
KR20140087956A (en) Apparatus and method for learning phonics by using native speaker's pronunciation data and word and sentence and image data
Napier et al. Look-Pause-Nod: A linguistic case study of a Deaf professional and interpreters working together
Hanson Computing technologies for deaf and hard of hearing users
US20090291419A1 (en) System of sound representaion and pronunciation techniques for english and other european languages
KR102479035B1 (en) System and method for language learning for the deaf
KR101681673B1 (en) English trainning method and system based on sound classification in internet
JP6824547B1 (en) Active learning system and active learning program
RU2715792C1 (en) Training-demonstration module and training system for deaf, dumb and deaf-mute people of colloquial speech
JP2004101637A (en) Online educational system, information processor, information providing method and program
Jo et al. Effective computer‐assisted pronunciation training based on phone‐sensitive word recommendation
KR20210020350A (en) English Learning System
TW201209769A (en) Method and apparatus for providing language learning
CN114842690B (en) Pronunciation interaction method, system, electronic equipment and storage medium for language courses
WO2013086575A1 (en) Speech visualisation tool
WO2023095474A1 (en) Information processing device, information processing method, and program