TWI846240B

TWI846240B - Speech training method, speech training system and user interface thereof

Info

Publication number: TWI846240B
Application number: TW111150154A
Authority: TW
Inventors: 林書宇; 何冠廷; 陳昱璋; 田鈞獻
Original assignee: 財團法人工業技術研究院
Priority date: 2022-12-27
Filing date: 2022-12-27
Publication date: 2024-06-21
Also published as: TW202427414A

Abstract

A speech training method for intelligently generating a plurality of training words for pronunciation training comprises the following steps. An input search device receives a plurality of word factors and searches a cloud platform according to the word factors to obtain a plurality of search words. A computing device performs recognition processing, parsing processing, and translation processing on the search words to generate the training words. A feedback device presents the training words in a list form or a graphic form.

Description

Speech training method, speech training system and user interface thereof

The present disclosure relates to a method and system for data processing, and more particularly to a method, a system, and a user interface thereof for speech training.

An articulation disorder is a set of speech errors or unclear speech caused by incorrect articulatory placement, inaccurate airflow direction and speed, or uncoordinated oral movements. It may lead to stammering, disfluent speech, or delayed language development. Articulation disorders are an indicator used in assessing the growth and development of preschool children.

In existing treatments for articulation disorders, a trainer (e.g., a speech therapist) conducts pronunciation training for a trainee (e.g., a child) by presenting training materials (i.e., training words) on paper, supplemented by games. Before the course of training, the trainer assesses the trainee to determine which pronunciations are abnormal. During the course, the trainer provides the trainee with suitable training words for pronunciation practice; one course of treatment typically comprises about six sessions.

Between sessions, the trainee can practice at home. However, if the interval is long, the trainee can only practice repeatedly with the existing bank of training words, which tends to cause fatigue and reduce the effect of training. If the trainee (or the trainee's parents) could generate and update training words on their own, the trainee's interest, and thus the training effect, could be improved.

Therefore, there is a need in the art for an improved speech training method and speech training system that can adaptively provide training words so that the trainee can practice pronunciation at any time.

According to one aspect of the present disclosure, a speech training system for intelligently generating a plurality of training words for pronunciation training is provided. The system includes an input search device, a computing device, and a feedback device. The input search device receives a plurality of word factors and searches a cloud platform according to the word factors to obtain a plurality of search words. The computing device performs recognition processing, parsing processing, and translation processing on the search words to generate the training words. The feedback device presents each training word in a list form or a graphic form.

According to another aspect of the present disclosure, a speech training method for intelligently generating a plurality of training words for pronunciation training is provided, comprising the following steps. An input search device receives a plurality of word factors and searches a cloud platform according to the word factors to obtain a plurality of search words. A computing device performs recognition processing, parsing processing, and translation processing on the search words to generate the training words. A feedback device presents each training word in a list form or a graphic form.

Other aspects and advantages of the present disclosure will become apparent upon reading the following drawings, detailed description, and claims.

Technical terms in this specification follow the customary usage of this technical field; where this specification explains or defines a term, that explanation or definition governs. Each embodiment of this disclosure has one or more technical features. Where implementation permits, a person of ordinary skill in the art may selectively implement some or all of the technical features of any embodiment, or selectively combine some or all of the technical features of these embodiments.

FIG. 1A is a functional block diagram of a speech training system 1000 according to an embodiment of the present disclosure. As shown in FIG. 1A, the speech training system 1000 includes an input search device 100, a computing device 200, a feedback device 300, and a local database 400. The input search device 100 is connected to the computing device 200, and the computing device 200 is connected to the feedback device 300 and the local database 400. The speech training system 1000 is communicatively connected to a cloud platform 500 and a terminal device 600: specifically, the input search device 100 is communicatively connected to the cloud platform 500, and the feedback device 300 is communicatively connected to the terminal device 600.

The speech training system 1000 intelligently and automatically generates a plurality of training words tw, which together form a training word cluster TW. The training words tw are provided to a trainee with an articulation disorder. The trainee, acting as user U20, practices pronunciation with the training words tw to correct the articulation disorder. For example, if the trainee often misarticulates the Mandarin syllable ㄆㄧㄥ (written in Zhuyin; pinyin "ping"), the training word cluster TW includes training words tw containing that syllable, such as 蘋果 ("apple") and 瓶子 ("bottle").

On the other side, the trainer, acting as user U10, operates the speech training system 1000 to generate the training word cluster TW. The trainer is, for example, the trainee's parent or a speech therapist at the trainee's treatment institution. The trainer examines the pronunciations the trainee misarticulates and decomposes each into its finals, initials, medials, rhymes, and compound finals, which serve as the word factors wf. For example, the trainer decomposes the syllable ㄆㄧㄥ into the initial ㄆ and the finals ㄧ and ㄥ. These finals and initials (or rhymes, etc.) serve as the word factors wf.
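In the disclosure the trainer performs this decomposition by hand. As a minimal Python sketch of the same idea, the function below splits a Zhuyin syllable into an initial and a list of finals. It assumes syllables are given as Zhuyin strings and, following the ㄆㄧㄥ example, treats every symbol after the initial (including a medial such as ㄧ) as a final; the name `word_factors` is hypothetical.

```python
from __future__ import annotations

# The 21 Zhuyin initials, and the tone symbols to ignore (the first tone is
# normally unmarked, so only the other four are listed).
INITIALS = set("ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄐㄑㄒㄓㄔㄕㄖㄗㄘㄙ")
TONE_MARKS = set("ˊˇˋ˙")


def word_factors(syllable: str) -> tuple[str, list[str]]:
    """Split a Zhuyin syllable into (initial, finals), ignoring tone marks.

    Hypothetical helper: every symbol after the initial is returned in the
    list of finals, matching how the text treats ㄧ and ㄥ in ㄆㄧㄥ.
    """
    symbols = [ch for ch in syllable if ch not in TONE_MARKS]
    if symbols and symbols[0] in INITIALS:
        return symbols[0], symbols[1:]
    return "", symbols  # syllable without an initial, e.g. ㄢ


print(word_factors("ㄆㄧㄥˊ"))  # ('ㄆ', ['ㄧ', 'ㄥ'])
print(word_factors("ㄍㄨㄛˇ"))  # ('ㄍ', ['ㄨ', 'ㄛ'])
```

A finer split into initial, medial, and final would only need a third set for the medials ㄧ, ㄨ, and ㄩ.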

The user U10 (i.e., the trainer) enters the word factors wf via the input search device 100. If the input search device 100 has a text input function, the user U10 types the word factors wf ㄆ, ㄧ, and ㄥ. Alternatively, if the input search device 100 has a voice input function, the user U10 speaks the word factors wf.

The input search device 100 searches the cloud platform 500 according to the word factors wf. The cloud platform 500 is an open platform on the Internet, such as a web search engine, an online video platform, or a blog. The search returns a plurality of search words sw, which are common or popular words circulating on the cloud platform 500. For example, the search words sw related to the word factors ㄆ, ㄧ, and ㄥ include 蘋果 ("apple"), 瓶子 ("bottle"), 平劇 ("Peking opera"), 平行空間 ("parallel space"), and 屏風 ("folding screen"). These search words sw form a search word cluster SW.

In another example, the user U10 may enter source material directly through the input search device 100; for example, the user U10 types 蘋果 ("apple") directly. In addition, the input search device 100 searches the cloud platform 500 for auxiliary images sg related to the word factors wf, for example real photographs or hand-drawn pictures of an apple or a bottle.

The computing device 200 performs recognition processing, parsing processing, and translation processing on the search words sw in the search word cluster SW to generate a plurality of training words tw. The training words tw are selected from the search words sw as those suitable for the trainee's pronunciation training, and together form the training word cluster TW. More specifically, the computing device 200 analyzes the data retrieved from the cloud platform 500 (i.e., the search words sw), distinguishes product information from user messages in the data, determines the semantics of the data, and consolidates data in different languages. For example, the data retrieved from the cloud platform 500 include the search words sw 蘋果 ("apple"), 瓶子 ("bottle"), 平劇 ("Peking opera"), 平行空間 ("parallel space"), and 屏風 ("folding screen"). When the trainee is a young child (e.g., 3 to 10 years old), the search words sw the trainee can understand are "apple" and "bottle". Accordingly, the computing device 200 filters out "Peking opera", "parallel space", and "folding screen", and selects "apple" and "bottle" as the training words tw, which form the training word cluster TW. The computing device 200 also analyzes the retrieved auxiliary images sg to determine whether they are protected by someone else's copyright, and selects only auxiliary images sg that raise no infringement concern. The training word cluster TW and the auxiliary images sg are then transmitted to the feedback device 300 and stored in the local database 400.
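The disclosure does not say how the computing device 200 judges whether a word suits a young child or whether an image is free of copyright concerns. The minimal sketch below assumes both decisions can be reduced to lookups: a hypothetical whitelist of child-level words (`CHILD_VOCABULARY`) and hypothetical license metadata on each image checked against a set of reusable licenses (`SAFE_LICENSES`). All names and values are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical whitelist of words judged understandable by young children.
CHILD_VOCABULARY = {"蘋果", "瓶子", "乒乓", "噴泉"}

# Hypothetical set of license tags treated as raising no infringement concern.
SAFE_LICENSES = {"CC0", "public-domain", "own-work"}


@dataclass
class AuxImage:
    url: str
    license_id: str | None  # None when no license metadata was found


def filter_for_age(search_words: list[str], age: int) -> list[str]:
    """Keep only words a 3-to-10-year-old trainee is expected to understand."""
    if 3 <= age <= 10:
        return [w for w in search_words if w in CHILD_VOCABULARY]
    return list(search_words)


def filter_images(images: list[AuxImage]) -> list[AuxImage]:
    """Keep only images whose license metadata marks them as reusable."""
    return [img for img in images if img.license_id in SAFE_LICENSES]


sw = ["蘋果", "瓶子", "平劇", "平行空間", "屏風"]
print(filter_for_age(sw, age=5))  # ['蘋果', '瓶子']
```

Images with missing license metadata are dropped, which errs on the side of caution, in line with the text's aim of keeping only images without infringement concerns.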

The local database 400 stores the currently generated training word cluster TW and auxiliary images sg, as well as the training word clusters TW the trainee has used in the past. The local database 400 also stores the history of the trainee's pronunciation training, including the training schedule and the training results; this history reflects the trainee's learning progress. From the current and previously used training word clusters TW, the auxiliary images sg, and the training history, the local database 400 builds a training word bank TWL.
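One possible layout for the local database 400 is sketched below with Python's built-in `sqlite3` module. The schema is an assumption: the disclosure lists what is stored (clusters, images, schedule, results) but not how. Here the training word bank TWL is modeled as a query that joins each word with its practice history.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE training_word (
    id INTEGER PRIMARY KEY,
    cluster_id INTEGER NOT NULL,   -- training word cluster TW it belongs to
    word TEXT NOT NULL,
    image_url TEXT                 -- auxiliary image sg, if any
);
CREATE TABLE training_record (
    id INTEGER PRIMARY KEY,
    word_id INTEGER NOT NULL REFERENCES training_word(id),
    practiced_at TEXT NOT NULL,    -- training schedule (ISO 8601 timestamp)
    correct INTEGER NOT NULL       -- training result: 1 correct, 0 incorrect
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def word_bank(conn: sqlite3.Connection) -> list:
    """Each stored word with its number of attempts and correct attempts."""
    return conn.execute(
        """
        SELECT w.word, COUNT(r.id), COALESCE(SUM(r.correct), 0)
        FROM training_word AS w
        LEFT JOIN training_record AS r ON r.word_id = w.id
        GROUP BY w.id
        ORDER BY w.id
        """
    ).fetchall()


conn = open_db()
conn.executemany(
    "INSERT INTO training_word (id, cluster_id, word) VALUES (?, ?, ?)",
    [(1, 1, "蘋果"), (2, 1, "瓶子")],
)
conn.executemany(
    "INSERT INTO training_record (word_id, practiced_at, correct) VALUES (?, ?, ?)",
    [(1, "2023-01-05T10:00", 1), (1, "2023-01-06T10:00", 0)],
)
print(word_bank(conn))  # [('蘋果', 2, 1), ('瓶子', 0, 0)]
```

The `LEFT JOIN` keeps words that have never been practiced, so newly generated clusters still appear in the bank.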

The feedback device 300 processes the training words tw according to the training word bank TWL and the trainee's training history so that the training words tw are presented in a suitable manner. For example, the feedback device 300 performs arrangement processing to put the training words tw in list form; the order of the list can be specified by the trainer or determined by the feedback device 300 from the trainee's history. The feedback device 300 also performs visualization processing to give the training words tw a graphic form (e.g., a playful font suitable for children).
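The text says only that the list order comes from the trainer or from the trainee's history. The sketch below assumes one plausible history-based rule, weakest words first, where accuracy is correct attempts divided by attempts and never-practiced words count as zero. The function name and the rule itself are hypothetical.

```python
from __future__ import annotations


def order_training_words(
    words: list[str],
    history: dict[str, tuple[int, int]],
    trainer_order: list[str] | None = None,
) -> list[str]:
    """Return the order in which training words are listed.

    history maps a word to (attempts, correct). A trainer-specified order
    takes precedence; words it omits are appended in their input order.
    Otherwise, words with the lowest past accuracy come first.
    """
    if trainer_order is not None:
        ordered = [w for w in trainer_order if w in words]
        return ordered + [w for w in words if w not in ordered]

    def accuracy(word: str) -> float:
        attempts, correct = history.get(word, (0, 0))
        return correct / attempts if attempts else 0.0

    return sorted(words, key=accuracy)  # stable sort: ties keep input order


print(order_training_words(["蘋果", "瓶子", "乒乓"], {"蘋果": (4, 3), "瓶子": (2, 1)}))
# ['乒乓', '瓶子', '蘋果']
```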

The feedback device 300 is communicatively connected to the terminal device 600 and transmits the training words tw, in list form or graphic form, to the terminal device 600. The terminal device 600 presents the training words tw in list or graphic form to the user U20 (i.e., the trainee), together with the retrieved auxiliary images sg.

The terminal device 600 is the device the trainee currently uses. It may be a portable mobile computing device, such as a smartphone, a tablet, or a laptop, or a stationary computing device, such as a desktop computer or a smart TV.

In this embodiment, the speech training system 1000 is independent of the terminal device 600; for example, the speech training system 1000 is disposed in another terminal device 700.

The input interface of the terminal device 700 (e.g., its touch screen or microphone) can serve as the input search device 100 of the speech training system 1000. The central processing unit (CPU) or graphics processing unit (GPU) of the terminal device 700, or an application executed by it, can serve as the computing device 200 and the feedback device 300. The memory or storage medium of the terminal device 700 can serve as the local database 400.

When the speech training system 1000 is independent of the terminal device 600, the user U10 (i.e., the trainer) operates the speech training system 1000 on the terminal device 700 to generate the training words tw. The terminal device 700 is communicatively connected to the terminal device 600 and transmits the training words tw to it via wired or wireless communication. The user U20 (i.e., the trainee) then views the training words tw, in list or graphic form, and the auxiliary images sg on the terminal device 600 to practice pronunciation.

Alternatively, the speech training system 1000 can be integrated into the terminal device 600. FIG. 1B is a functional block diagram of a speech training system 1000b according to another embodiment of the present disclosure. As shown in FIG. 1B, the speech training system 1000b is disposed in the terminal device 600, and each of its components (the input search device 100, the computing device 200, the feedback device 300, and the local database 400) corresponds to the CPU, GPU, memory, or storage medium of the terminal device 600, or to an application executed by the terminal device 600.

When the speech training system 1000b is integrated into the terminal device 600, the terminal device 600 is first handed to the user U10 (i.e., the trainer), who operates the speech training system 1000b to generate the training words tw. The terminal device 600 is then handed to the user U20 (i.e., the trainee), who views the training words tw, in list or graphic form, and the auxiliary images sg.

In one example, the trainee may also act as user U10 and operate the speech training system 1000b on the terminal device 600 to generate training words tw. Thus, while the trainee is at home and away from the medical institution (e.g., between two clinic visits), the trainee can operate the speech training system 1000b to generate training material for self-practice.

Furthermore, the trainee's terminal device 600 can be synchronized with the terminal device currently used by the trainer (not shown), and the feedback device 300 of the speech training system 1000b can provide a communication channel between the trainee and the trainer. The trainer can issue instructions through the feedback device 300 to guide the trainee, and the trainee can report training results through the feedback device 300. Thus, between two clinic visits, the trainer need not wait for the trainee to generate training words tw, and the trainee can keep track of the training status at any time, saving time for both parties.

FIG. 1C is a functional block diagram of the computing device 200 of the speech training system 1000. As shown in FIG. 1C, the computing device 200 includes a recognition device 210, a parsing device 220, and a translation device 230.

The recognition device 210, the parsing device 220, and the translation device 230 are each connected to the input search device 100 to receive the search word cluster SW and the auxiliary images sg, and they process the search word cluster SW to generate the training word cluster TW. They are also each connected to the feedback device 300 to transmit the training word cluster TW and the auxiliary images sg to it.

In operation, the recognition device 210 can execute a detection program that performs recognition processing on the search words sw in the search word cluster SW to distinguish product information from user messages. The recognition device 210 can also determine data on homophones and polyphonic characters (characters with more than one reading) for the search words sw. In one example, the recognition device 210 detects whether the search words sw are training material that meets the expectations of the user U10 (i.e., the trainer) and filters out those that do not. For example, the search words sw include 澎湖 (Penghu) and 皮球 ("rubber ball"), whose pronunciations do not match ㄆ, ㄧ, and ㄥ (澎 is read ㄆㄥ, without ㄧ; 皮 is read ㄆㄧ, without ㄥ); they fail the comparison and are filtered out. As another example, the search words sw include 屁股 ("buttocks"), which is indecent in meaning and is therefore filtered out. The input search device 100 then searches the cloud platform 500 for further words more closely related to ㄆ, ㄧ, and ㄥ.
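The two filters in this example, pronunciation matching and an indecency check, can be sketched as follows. The sketch assumes a hypothetical pronunciation lexicon (`LEXICON`) mapping each word to one Zhuyin reading per character and a hypothetical `BLOCKLIST`; a real system would need a dictionary that also records the alternative readings of polyphonic characters. A word passes if any of its syllables, with tone marks removed, equals the target syllable built from the word factors.

```python
from __future__ import annotations

TONE_MARKS = set("ˊˇˋ˙")

# Hypothetical lexicon: word -> Zhuyin reading of each character.
LEXICON = {
    "蘋果": ["ㄆㄧㄥˊ", "ㄍㄨㄛˇ"],
    "瓶子": ["ㄆㄧㄥˊ", "˙ㄗ"],
    "澎湖": ["ㄆㄥˊ", "ㄏㄨˊ"],
    "皮球": ["ㄆㄧˊ", "ㄑㄧㄡˊ"],
    "屁股": ["ㄆㄧˋ", "ㄍㄨˇ"],
}

# Hypothetical list of words judged indecent.
BLOCKLIST = {"屁股"}


def strip_tone(syllable: str) -> str:
    return "".join(ch for ch in syllable if ch not in TONE_MARKS)


def recognize(search_words: list[str], target: str) -> list[str]:
    """Keep words that contain the target syllable and are not blocklisted."""
    kept = []
    for word in search_words:
        if word in BLOCKLIST:
            continue  # indecent meaning
        syllables = LEXICON.get(word, [])  # unknown words cannot be verified
        if any(strip_tone(s) == target for s in syllables):
            kept.append(word)
    return kept


print(recognize(["蘋果", "瓶子", "澎湖", "皮球", "屁股"], "ㄆㄧㄥ"))  # ['蘋果', '瓶子']
```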

The parsing device 220 can execute a parsing program to analyze sentences, strings, or popular words related to the word factors wf. For example, a popular term on the cloud platform 500 is 乒乓球比賽 ("ping-pong match"), which relates to the word factors ㄆ, ㄧ, and ㄥ. In one example, the parsing device 220 is further connected to the local database 400 and retrieves the trainee's training history to analyze the trainee's personal background, such as which types of words the trainee is more interested in. The parsing device 220 can then cluster the search words sw by feature screening and select from them the training words tw suitable for the trainee.
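The disclosure does not name the features or the clustering method. As a minimal sketch, assume each search word carries a hypothetical topic tag (`category_of`) and the training history yields a hypothetical interest score per topic (`interest`). Grouping words by tag stands in for clustering, and the words in the trainee's highest-scoring topics are kept.

```python
from __future__ import annotations

from collections import defaultdict


def select_by_interest(
    search_words: list[str],
    category_of: dict[str, str],
    interest: dict[str, int],
    top_k: int = 1,
) -> list[str]:
    """Group words by topic tag and keep those in the top-k topics.

    category_of: hypothetical topic tag for each search word.
    interest:    hypothetical per-topic score derived from training history.
    """
    clusters: dict[str, list[str]] = defaultdict(list)
    for word in search_words:
        clusters[category_of.get(word, "other")].append(word)

    # sorted() is stable even with reverse=True, so equal scores keep
    # their first-seen order.
    ranked = sorted(clusters, key=lambda c: interest.get(c, 0), reverse=True)
    return [w for c in ranked[:top_k] for w in clusters[c]]


words = ["蘋果", "乒乓球比賽", "瓶子", "平劇"]
tags = {"蘋果": "food", "乒乓球比賽": "sports", "瓶子": "household", "平劇": "arts"}
print(select_by_interest(words, tags, {"sports": 5, "food": 3}, top_k=2))
# ['乒乓球比賽', '蘋果']
```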

The translation device 230 can execute a translation program that converts words into similar words with a different final, expanding the training words tw and giving the trainee more varied practice. For example, the final ㄥ among the word factors ㄆ, ㄧ, and ㄥ can be converted to the final ㄣ, yielding similar words pronounced with ㄆ, ㄧ, and ㄣ, such as 拼圖 ("jigsaw puzzle") and 品格 ("character"). The trainee can thus receive training words tw such as "apple", "bottle", "jigsaw puzzle", and "character" from the feedback device 300.
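The final substitution can be sketched as an expansion of the factor set, after which each variant could be searched like the original. Only the ㄥ to ㄣ pair comes from the text; the reverse pair in `SIMILAR_FINALS` is an added assumption for symmetry, and the function name is hypothetical.

```python
from __future__ import annotations

# Hypothetical table of similar finals; the text gives only ㄥ -> ㄣ.
SIMILAR_FINALS = {"ㄥ": "ㄣ", "ㄣ": "ㄥ"}


def expand_factors(initial: str, finals: list[str]) -> list[tuple[str, tuple[str, ...]]]:
    """Return the original factor set plus variants with one final swapped."""
    variants = [(initial, tuple(finals))]
    for i, final in enumerate(finals):
        if final in SIMILAR_FINALS:
            swapped = list(finals)
            swapped[i] = SIMILAR_FINALS[final]
            variants.append((initial, tuple(swapped)))
    return variants


print(expand_factors("ㄆ", ["ㄧ", "ㄥ"]))
# [('ㄆ', ('ㄧ', 'ㄥ')), ('ㄆ', ('ㄧ', 'ㄣ'))]
```

Searching with the variant ㄆ, ㄧ, ㄣ is what would surface words such as 拼圖 and 品格.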

FIG. 2A is a schematic diagram of an operation interface 31a of the input search device 100 of the speech training system 1000 or 1000b, described using the speech training system 1000b as an example. The speech training system 1000b is disposed in the terminal device 600 (corresponding to the embodiment of FIG. 1B), which is, for example, a smartphone. The trainer, acting as user U10, operates the input search device 100 through the operation interface 31a to set up the question types to be used with the trainee.

The operation interface 31a is, for example, a graphical user interface (GUI) that includes operation areas 311 to 313 and 321 to 326. The operation area 311 is a main menu with options such as "Home", "Visit Records", and "Home Training". When "Home" is selected, the operation areas 312 and 313 display outlines of question types; for example, the operation area 312 displays "Standard Training" and the operation area 313 displays "Children's Training", which suits younger trainees.

The operation areas 321 to 326 selectively display more detailed custom question types. For example, the operation area 322 displays the custom question type "Children's Home Training", and the operation area 325 displays "Comprehensive Training (Revised Edition)".

FIG. 2B is a schematic diagram of another operation interface 31b of the input search device 100 of the speech training system 1000 or 1000b, again using the speech training system 1000b as an example. After creating the question-type outline and/or custom question types through the operation interface 31a of FIG. 2A, the trainer selects training words tw through the operation interface 31b of FIG. 2B.

The trainer first enters the word factors wf through the operation areas 341 and 342 of the operation interface 31b; for example, the initial ㄆ is entered in the operation area 341 and the final ㄣ in the operation area 342. The operation area 343 displays the search words sw related to the initial ㄆ and the final ㄣ retrieved from the cloud platform 500, such as 噴泉 ("fountain"), 拼音 ("pinyin"), 臉盆 ("wash basin"), and 乒乓 ("ping-pong").

The computing device 200 then performs recognition processing, parsing processing, and translation processing on these search words sw and selects from them the training words tw suitable for the trainee, such as "fountain" and "ping-pong". The operation area 345 displays the selected training words tw "fountain" and "ping-pong".

FIG. 3A is a schematic diagram of an operation interface 41a of the feedback device 300 of the speech training system 1000 or 1000b, using the speech training system 1000b as an example. The trainee, acting as user U20, operates the feedback device 300 through the operation interface 41a to practice pronunciation. As shown in FIG. 3A, the operation area 411 of the operation interface 41a displays the training words tw one at a time in list order; the training word tw currently displayed is 蘋果 ("apple"). The feedback device 300 can also emphasize the training word tw in graphic form, for example by displaying "apple" in the operation area 411 in a playful font 414 suitable for children.

The operation area 412 is a record button. The trainee touches or slides the record button to activate the recording function of the terminal device 600, and the trainee's pronunciation of "apple" is recorded into the local database 400 of the speech training system 1000b. The operation area 413 displays the corresponding recording time.

FIG. 3B is a schematic diagram of another operation interface 41b of the feedback device 300 of the speech training system 1000 or 1000b. As shown in FIG. 3B, the feedback device 300 displays an auxiliary image sg related to "apple" through the operation interface 41b: the operation area 421 displays the word "apple" as text, and the operation area 423 displays the auxiliary image sg of an apple. When the trainee is a young child, the auxiliary image sg helps hold the trainee's interest, and the operation area 424 displays a guiding prompt that directs the trainee's attention to the auxiliary image sg during practice. As in the operation interface 41a of FIG. 3A, the operation area 422 of the operation interface 41b is a record button.

FIG. 4 is a schematic diagram of yet another operation interface 51 of the feedback device 300 of the speech training system 1000 or 1000b. The operation interface 51 is operated by the trainer acting as user U10, who controls the feedback device 300 through it to review the trainee's training results. As shown in FIG. 4, the operation area 511 of the operation interface 51 shows that the training word tw under review is 蘋果 ("apple"); the feedback device 300 can retrieve the trainee's recorded pronunciation of "apple" from the local database 400 and play it back.

By listening to the playback, the trainer reviews the trainee's result for 蘋果 and determines whether the trainee's pronunciation shows an articulation disorder. The trainer can record the results in the operation areas 512 to 515. For example, if the trainee correctly pronounces the initial ㄆ and the finals ㄧ and ㄥ of the first character 蘋, the trainer selects "T" (True) in the operation areas 512 and 513, indicating a positive result.

Similarly, if the trainee correctly pronounces the final components "ㄨ" and "ㄛ" of the second character "果", the trainer selects "T" in operation area 515 to indicate a positive training result. If, on the other hand, the trainee mispronounces the initial "ㄍ" of "果", the trainer selects "F" (False) in operation area 514 to indicate a negative training result. In one example, a negative training result "F" is displayed in a high-contrast color.
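The per-character T/F judgements described above map naturally onto a small record type. The following Python sketch is illustrative only. The `SyllableResult` class and `negative_results` function are hypothetical names that do not appear in the disclosure. The sketch assumes one T/F judgement for the initial and one for the final of each character, mirroring operation areas 512 to 515.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SyllableResult:
    """Trainer's judgement for one character (hypothetical structure)."""
    char: str
    initial: str      # Zhuyin initial, e.g. "ㄆ"
    final: str        # Zhuyin final, e.g. "ㄧㄥ"
    initial_ok: bool  # True corresponds to "T", False to "F"
    final_ok: bool


def negative_results(results: List[SyllableResult]) -> List[Tuple[str, str, str]]:
    """Return (character, component, symbol) for every component marked "F"."""
    errors = []
    for r in results:
        if not r.initial_ok:
            errors.append((r.char, "initial", r.initial))
        if not r.final_ok:
            errors.append((r.char, "final", r.final))
    return errors


# The "蘋果" example: ㄆ and ㄧㄥ judged correct; ㄍ wrong, ㄨㄛ correct.
apple = [
    SyllableResult("蘋", "ㄆ", "ㄧㄥ", True, True),
    SyllableResult("果", "ㄍ", "ㄨㄛ", False, True),
]
print(negative_results(apple))  # [('果', 'initial', 'ㄍ')]
```

Keeping the initial and final as separate fields lets negative results be aggregated per phonetic symbol across sessions, which is the kind of history record later steps rely on.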

FIG. 5 is a flow chart of a speech training method according to an embodiment of the present disclosure. The method can be carried out with the speech training system 1000 or 1000b of the embodiments of FIGS. 1A to 4. Referring to FIG. 5, first, in step S102, the input search device 100 receives the vocabulary factors wf and searches the cloud platform 500 for search words sw according to the vocabulary factors wf.
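Step S102 leaves the cloud search itself unspecified. The sketch below assumes the search results have already been fetched as a plain list of words. It shows only one plausible way to keep the words that contain a target initial or final. The `ZHUYIN` table, the sample words and the function names are hypothetical, and tones are omitted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical lookup table: character -> (initial, final) in Zhuyin, tones omitted.
ZHUYIN: Dict[str, Tuple[str, str]] = {
    "蘋": ("ㄆ", "ㄧㄥ"), "果": ("ㄍ", "ㄨㄛ"),
    "葡": ("ㄆ", "ㄨ"),   "萄": ("ㄊ", "ㄠ"),
    "香": ("ㄒ", "ㄧㄤ"), "蕉": ("ㄐ", "ㄧㄠ"),
    "西": ("ㄒ", "ㄧ"),   "瓜": ("ㄍ", "ㄨㄚ"),
    "公": ("ㄍ", "ㄨㄥ"), "雞": ("ㄐ", "ㄧ"),
}


def matches_factor(word: str, initials: Set[str], finals: Set[str]) -> bool:
    """True if any character of `word` has a target initial or final."""
    for ch in word:
        if ch not in ZHUYIN:
            continue  # characters missing from the table are ignored in this sketch
        ini, fin = ZHUYIN[ch]
        if ini in initials or fin in finals:
            return True
    return False


def select_search_words(results: List[str], initials: Set[str], finals: Set[str]) -> List[str]:
    """Keep search results containing a problematic initial or final, preserving order."""
    return [w for w in results if matches_factor(w, initials, finals)]


# Stand-in for words returned by a cloud search (hypothetical data).
cloud_results = ["蘋果", "葡萄", "香蕉", "西瓜", "公雞"]
print(select_search_words(cloud_results, {"ㄍ"}, set()))  # ['蘋果', '西瓜', '公雞']
```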

Next, in step S104, the computing device 200 performs recognition, parsing and translation processing on the search words sw to generate the training words tw. Step S104 includes the following sub-steps. In step S104-1, the recognition device 210 identifies homophones and polyphonic characters (characters with more than one pronunciation) among the search words sw. In step S104-2, the parsing device 220 analyzes character strings or popular words related to the vocabulary factors wf. In step S104-3, the translation device 230 translates similar words related to the final.
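Sub-step S104-1 can be illustrated with a reading dictionary. The disclosure does not state how the recognition device 210 performs this check. The Python sketch below assumes a hypothetical `READINGS` table that maps each character to all of its Zhuyin readings. A character with more than one reading is flagged as polyphonic. Words whose unambiguous readings match syllable by syllable are grouped as homophones.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical reading dictionary: character -> every Zhuyin reading (with tone).
READINGS: Dict[str, List[str]] = {
    "長": ["ㄔㄤˊ", "ㄓㄤˇ"],  # polyphonic: cháng (long) / zhǎng (grow)
    "頸": ["ㄐㄧㄥˇ"],
    "鹿": ["ㄌㄨˋ"],
    "路": ["ㄌㄨˋ"],
    "公": ["ㄍㄨㄥ"],
    "園": ["ㄩㄢˊ"],
    "元": ["ㄩㄢˊ"],
}


def polyphonic_chars(word: str) -> List[str]:
    """Characters in `word` that have more than one known reading."""
    return [ch for ch in word if len(READINGS.get(ch, [])) > 1]


def homophone_groups(words: List[str]) -> List[List[str]]:
    """Group words whose syllable-by-syllable readings are identical."""
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for w in words:
        # Skip words with unknown or ambiguous readings in this simple sketch.
        if any(ch not in READINGS for ch in w) or polyphonic_chars(w):
            continue
        key = tuple(READINGS[ch][0] for ch in w)
        groups[key].append(w)
    return [g for g in groups.values() if len(g) > 1]


print(polyphonic_chars("長頸鹿"))  # ['長']
print(homophone_groups(["長頸鹿", "公園", "公元", "鹿", "路"]))  # [['公園', '公元'], ['鹿', '路']]
```

Skipping words that contain a polyphonic character is a simplification. A fuller version would need context to choose the correct reading before comparing.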

Next, in step S106, the training words tw and a history record of the pronunciation training are stored in the local database 400. In step S108, a training word question bank TWL is generated from the training words tw and the history record. In step S110, the feedback device 300 processes the training words tw according to the question bank TWL and the history record, so that the training words tw take a list form or a graphic form.
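The disclosure does not specify how the question bank TWL is derived from the history record. One possible approach, sketched below purely as an assumption, computes a per-symbol error rate from past T/F judgements. It then ranks the training words by the worst-performing initial or final each one contains. The function names and data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def error_rates(history: List[Tuple[str, bool]]) -> Dict[str, float]:
    """history: (Zhuyin symbol, judged correct) pairs from past sessions."""
    attempts: Counter = Counter()
    errors: Counter = Counter()
    for symbol, ok in history:
        attempts[symbol] += 1
        if not ok:
            errors[symbol] += 1
    return {s: errors[s] / attempts[s] for s in attempts}


def build_question_bank(words: Dict[str, List[str]],
                        history: List[Tuple[str, bool]],
                        size: int) -> List[str]:
    """words maps each training word to its initials/finals; rank by worst error rate."""
    rates = error_rates(history)

    def score(w: str) -> float:
        return max((rates.get(s, 0.0) for s in words[w]), default=0.0)

    # sorted() is stable even with reverse=True, so ties keep their input order.
    return sorted(words, key=score, reverse=True)[:size]


words = {
    "蘋果": ["ㄆ", "ㄧㄥ", "ㄍ", "ㄨㄛ"],
    "葡萄": ["ㄆ", "ㄨ", "ㄊ", "ㄠ"],
    "公雞": ["ㄍ", "ㄨㄥ", "ㄐ", "ㄧ"],
}
history = [("ㄍ", False), ("ㄍ", False), ("ㄍ", True), ("ㄆ", True), ("ㄆ", False)]
print(build_question_bank(words, history, 2))  # ['蘋果', '公雞']
```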

Then, in step S112, the feedback device 300 presents the training words tw in the list form or the graphic form. Finally, in step S114, the feedback device 300 transmits the training words tw in the list form or the graphic form to the terminal device 600.
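Steps S110 to S114 can be pictured as rendering the same training words in two forms and serializing them for the terminal device 600. The disclosure does not define a transmission format. The JSON schema, the `image` field and the function names below are hypothetical.

```python
import json
from typing import Dict, List


def to_list_form(words: List[str]) -> str:
    """List form: one numbered training word per line."""
    return "\n".join(f"{i}. {w}" for i, w in enumerate(words, start=1))


def to_payload(words: List[str], images: Dict[str, str], form: str) -> str:
    """Serialize training words for a terminal device (hypothetical JSON schema)."""
    if form not in ("list", "graphic"):
        raise ValueError("form must be 'list' or 'graphic'")
    items = []
    for w in words:
        item = {"word": w}
        if form == "graphic":
            item["image"] = images.get(w)  # None if no auxiliary image was found
        items.append(item)
    # ensure_ascii=False keeps the Chinese characters readable in the payload.
    return json.dumps({"form": form, "items": items}, ensure_ascii=False)


print(to_list_form(["蘋果", "公雞"]))
print(to_payload(["蘋果"], {"蘋果": "apple.png"}, "graphic"))
# {"form": "graphic", "items": [{"word": "蘋果", "image": "apple.png"}]}
```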

Although the present invention has been disclosed in detail above by way of preferred embodiments and examples, it is to be understood that these examples are illustrative rather than restrictive. Those of ordinary skill in the art may conceive various modifications and combinations, which fall within the spirit of the present invention and the scope of the appended claims.

1000, 1000b: speech training system
100: input search device
200: computing device
300: feedback device
400: local database
500: cloud platform
600, 700: terminal device
210: recognition device
220: parsing device
230: translation device
31a, 31b, 41a, 41b, 51: operation interface
311-313, 321-326, 341-345: operation area
411-414, 421-424, 511-515: operation area
U10, U20: user
wf: vocabulary factor
SW: search word cluster
sw: search word
sg: auxiliary image
TW: training word cluster
tw: training word
TWL: training word question bank
S102-S114: steps

FIG. 1A is a functional block diagram of a speech training system according to an embodiment of the present disclosure.
FIG. 1B is a functional block diagram of a speech training system according to another embodiment of the present disclosure.
FIG. 1C is a functional block diagram of the computing device of the speech training system.
FIG. 2A is a schematic diagram of an operation interface of the input search device of the speech training system.
FIG. 2B is a schematic diagram of another operation interface of the input search device of the speech training system.
FIG. 3A is a schematic diagram of an operation interface of the feedback device of the speech training system.
FIG. 3B is a schematic diagram of another operation interface of the feedback device of the speech training system.
FIG. 4 is a schematic diagram of yet another operation interface of the feedback device of the speech training system.
FIG. 5 is a flow chart of a speech training method according to an embodiment of the present disclosure.


Claims

A speech training system for intelligently generating a plurality of training words for pronunciation training, comprising: an input search device for receiving a plurality of vocabulary factors and searching a cloud platform for a plurality of search words according to the vocabulary factors; a computing device for performing a recognition process, a parsing process and a translation process on the search words to generate the training words; and a feedback device for presenting each of the training words in a list form or a graphic form.

The speech training system of claim 1, further comprising: a local database for storing the training words and a history record of the pronunciation training, and for generating a training word question bank based on the training words and the history record.

The speech training system of claim 2, wherein the feedback device processes the training words according to the training word question bank and the history record, so that each of the training words has the list form or the graphic form.

The speech training system of claim 1, wherein the vocabulary factors include at least one initial and one final with abnormal articulation.

The speech training system of claim 4, wherein the computing device comprises: a recognition device for determining homophones and polyphonic characters of each of the search words; a parsing device for analyzing character strings or popular words related to the vocabulary factors; and a translation device for translating similar words related to the final.

The speech training system of claim 1, wherein the cloud platform is an Internet search engine, an Internet video platform or an Internet blog.

The speech training system of claim 6, wherein the search words are common words or popular words circulating on the cloud platform.

The speech training system of claim 1, wherein the feedback device is communicatively connected to a terminal device and transmits each of the training words in the list form or the graphic form to the terminal device.

A speech training method for intelligently generating a plurality of training words for pronunciation training, comprising: receiving a plurality of vocabulary factors by an input search device, and searching a cloud platform for a plurality of search words according to the vocabulary factors; performing a recognition process, a parsing process and a translation process on the search words by a computing device to generate the training words; and presenting each of the training words in a list form or a graphic form by a feedback device.

The speech training method of claim 9, further comprising: storing the training words and a history record of the pronunciation training in a local database; and generating a training word question bank based on the training words and the history record.

The speech training method of claim 10, further comprising: processing, by the feedback device, the training words according to the training word question bank and the history record, so that each of the training words has the list form or the graphic form.

The speech training method of claim 9, wherein the vocabulary factors include at least one initial and one final with abnormal articulation.

The speech training method of claim 12, wherein the step of performing the recognition process, the parsing process and the translation process on the search words by the computing device comprises: determining, by a recognition device, homophones and polyphonic characters of each of the search words; analyzing, by a parsing device, character strings or popular words related to the vocabulary factors; and translating, by a translation device, similar words related to the final.

The speech training method of claim 9, wherein the cloud platform is an Internet search engine, an Internet video platform or an Internet blog.

The speech training method of claim 14, wherein the search words are common words or popular words circulating on the cloud platform.

The speech training method of claim 9, wherein the feedback device is communicatively connected to a terminal device, and the speech training method further comprises: transmitting, by the feedback device, each of the training words in the list form or the graphic form to the terminal device.

A user interface of a speech training system, for operating the speech training system to intelligently generate a plurality of training words for pronunciation training, the speech training system comprising an input search device and a feedback device, the user interface comprising: a first operation interface, associated with the input search device, for inputting a plurality of vocabulary factors; and a second operation interface, associated with the feedback device, for displaying each of the training words in a list form or a graphic form; wherein the speech training system searches a cloud platform for a plurality of search words based on the vocabulary factors, and performs a recognition process, a parsing process and a translation process on the search words to generate the training words.

The user interface of claim 17, wherein the first operation interface correspondingly displays the search words.

The user interface of claim 17, wherein the speech training system searches the cloud platform for a plurality of auxiliary images based on the vocabulary factors, and the second operation interface correspondingly displays the auxiliary images.

The user interface of claim 17, wherein the feedback device is communicatively connected to a terminal device, and each of the training words displayed on the second operation interface is transmitted to the terminal device.