TW202403695A - Voice correction auxiliary method and system characterized by providing the practice of articulation and pronunciation at home and providing references for speech therapists


Info

Publication number: TW202403695A
Application number: TW111126129A
Authority: TW (Taiwan)
Prior art keywords: data, word, sound, similarity, recording
Other languages: Chinese (zh)
Other versions: TWI806703B (en)
Inventors: 塗雅雯, 阮聖彰, 蕭丞軒, 陳俞瑾
Original Assignee: 國泰醫療財團法人國泰綜合醫院
Application filed by 國泰醫療財團法人國泰綜合醫院; granted and published as TWI806703B, with the application published as TW202403695A

Abstract

A voice correction auxiliary method includes a training program executed by a training device and an execution program executed by an execution device. The execution program includes: playing first sound data corresponding to a first word in voice sample data when it is determined that a first pronunciation button is triggered; recording first recording data corresponding to the first word and generating data to be analyzed on the basis of the first recording data when it is determined that a first recording button is triggered; and comparing the similarity between the data to be analyzed and the first sound data to generate an analysis result when it is determined that a start-analysis button is triggered. The training program includes: using training data to train an AI model to establish a sound comparison model, and transmitting the sound comparison model to the execution device. The present invention provides a tool for practicing articulation and pronunciation at home, and can analyze the user's articulation status on the basis of the data to be analyzed, which records the user's pronunciations, so as to provide references for speech therapists.

Description

Speech correction auxiliary method and system

A speech correction auxiliary method and system, particularly a speech correction auxiliary method and system that can assist in analyzing and correcting pronunciation.

Some children encounter difficulties in articulating and pronouncing words as they grow up, and the difficulties and errors encountered during pronunciation are referred to as speech disorders. A speech disorder is a condition that can improve with treatment.

The treatment of speech disorders must be directed by a professional speech therapist, and the child's parents must additionally cooperate in supervising the child's pronunciation practice at home for the disorder to improve effectively. However, when parents supervise their child's pronunciation practice at home, they cannot discern the pronunciation errors the child makes during practice as keenly as a professional speech therapist can. Parents may therefore fail to give the child immediate feedback at the moment a word is mispronounced, causing the child to repeatedly practice the incorrect pronunciation and impeding the overall treatment of the speech disorder.

Furthermore, even if parents do notice an error in their child's pronunciation, most are unable to teach the child in a systematic way how to change the manner of articulation. In other words, even when parents can detect that the child's pronunciation is wrong, they do not understand how the error is produced and therefore cannot offer targeted remedies. This places a burden on both the child and the parents.

In view of the above problems, the present invention provides a speech correction auxiliary method and system.

The speech correction auxiliary system of the present invention includes a training device and an execution device. The execution device further includes a display module, an audio module, a memory module, a communication module, and a processing module. The processing module is electrically connected to the display module, the audio module, the memory module, and the communication module.

The memory module stores first image information, first word information corresponding to the first image information, and voice sample data. The first word information includes a first word, and the voice sample data includes first sound data corresponding to the first word. The communication module connects to a network so as to communicatively connect with the training device.

The speech correction auxiliary method of the present invention includes a training program and an execution program. The training program is executed by the training device, while the execution program is executed by the processing module of the execution device and includes the following steps:
a. Display, through the display module of the execution device, a start-analysis button, the first image information, and the first word information corresponding to the first image information, and display the first word included in the first word information together with a first pronunciation button and a first recording button corresponding to the first word;
b. When it is determined that the first pronunciation button is triggered, play, through the audio module of the execution device, the first sound data corresponding to the first word in the voice sample data;
c. When it is determined that the first recording button is triggered, record first recording data corresponding to the first word through the audio module, and generate data to be analyzed based on the first recording data;
d. When it is determined that the start-analysis button is triggered, compare the similarity between the data to be analyzed and the first sound data to generate an analysis result.

The present invention provides an auxiliary tool with which a parent can help a child with a speech disorder correct pronunciation. When the present invention plays the first sound data of the first word and presents the first image information through the execution program, the child can learn how to pronounce the word correctly. After the present invention records the first recording data corresponding to the first word and generates the analysis result, the parent can, with the machine's assistance, know whether the child's pronunciation is correct. Furthermore, after the present invention has been used several times to record the first recording data and generate analysis results, the parent can, with the machine's assistance, see whether the child's pronunciation is improving, that is, whether the similarity between the first sound data and the first recording data is growing ever closer.

The present invention can provide a tool that assists the parent in analyzing and correcting the child's articulation during home practice, and it can also provide reference analysis results to a speech therapist as an aid in assessing the child's speech disorder.

Referring to Figure 1, the present invention provides a speech correction auxiliary method and system. The speech correction auxiliary system of the present invention includes a training device 100 and an execution device 200. The execution device 200 includes a display module 10, an audio module 20, a memory module 30, a processing module 40, and a communication module 50. The processing module 40 is electrically connected to the display module 10, the audio module 20, and the memory module 30.

The memory module 30 stores first image information, first word information corresponding to the first image information, and voice sample data. The first word information includes a first word, and the voice sample data includes first sound data corresponding to the first word. The communication module 50 of the execution device 200 connects to a network so as to communicatively connect with the training device 100.

Referring to Figure 2, the speech correction auxiliary method includes a training program S100 and an execution program S200. The training program S100 is executed by the training device 100, and the execution program S200 is executed by the processing module 40 of the execution device 200. The execution program S200 includes the following steps:
Step S210: Display, through the display module 10 of the execution device 200, a start-analysis button, the first image information, and the first word information corresponding to the first image information, and display the first word included in the first word information together with the first pronunciation button and the first recording button corresponding to the first word.
Step S220: When it is determined that the first pronunciation button is triggered, play, through the audio module 20 of the execution device 200, the first sound data corresponding to the first word in the voice sample data.
Step S230: When it is determined that the first recording button is triggered, record first recording data corresponding to the first word through the audio module 20, and generate data to be analyzed based on the first recording data.
Step S240: When it is determined that the start-analysis button is triggered, compare the similarity between the data to be analyzed and the first sound data to generate an analysis result.

In an embodiment of the present invention, the execution device 200 further includes a human-computer interaction module 60 electrically connected to the processing module 40. The human-computer interaction module 60 generates a selection signal. A user of the execution device 200 can interact with the present invention through the human-computer interaction module 60, that is, listen to the first sound data played by the present invention by selecting the first pronunciation button, and record the first recording data corresponding to the first word by selecting the first recording button.

When the processing module 40 receives the selection signal generated by the human-computer interaction module 60 and the selection signal corresponds to selecting the first pronunciation button, the processing module 40 determines that the first pronunciation button has been triggered and further plays the first sound data through the audio module 20. When the processing module 40 receives the selection signal and it corresponds to selecting the first recording button, the processing module 40 determines that the first recording button has been triggered and further records the first recording data corresponding to the first word through the audio module 20.
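The trigger-and-dispatch behavior of steps S210 to S240 and of the selection-signal handling above can be summarized in a short sketch. The following Python is a minimal illustration only: the class and method names (ExecutionProgram, on_pronunciation_button, and so on) are assumptions made for readability, and the real execution device is a touch-screen application, not this console-style skeleton.

```python
# Minimal sketch of the execution program's dispatch logic (steps S210-S240).
# All names are illustrative assumptions; they are not identifiers from the patent.

class ExecutionProgram:
    def __init__(self, audio_module, sound_model, voice_samples):
        self.audio = audio_module    # plays sound data and records the user
        self.model = sound_model     # trained sound comparison model
        self.samples = voice_samples # word -> stored sound data (the voice sample data)
        self.pending = []            # recordings accumulated as the data to be analyzed

    def on_pronunciation_button(self, word):
        # Step S220: play the stored sample for the selected word.
        self.audio.play(self.samples[word])

    def on_recording_button(self, word):
        # Step S230: record the user's attempt and add it to the data to be analyzed.
        recording = self.audio.record()
        self.pending.append((word, recording))

    def on_start_analysis_button(self):
        # Step S240: compare each recording with its sample to produce the analysis result.
        if not self.pending:
            return None  # nothing has been recorded yet
        results = [self.model.similarity(rec, self.samples[word])
                   for word, rec in self.pending]
        # Combine per-word similarities into one analysis result
        # (one embodiment described later averages them).
        return sum(results) / len(results)
```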

In this embodiment, the first word information is a term, and the first image information is a static image or an animated image corresponding to the term, for example one in the Graphics Interchange Format (GIF). The first word is one of the characters of the term, the first sound data is a sound file in which the first word is read aloud correctly, and the first recording data is a sound file of the user reading the first word aloud.

In this embodiment, the network connected to the communication module 50 is an encrypted network; that is, the user must first log in to the network and pass identity verification before downloading software update data. The processing module 40 of the present invention downloads the software update data from the network through the communication module 50 and updates the voice sample data stored in the memory module 30 according to the downloaded software update data, thereby expanding the stored data. After the communication module 50 disconnects from the network, the present invention can still operate normally offline, because the processing performed by the processing module 40 of the execution device 200 does not depend on cloud computing.

For example, in this embodiment the execution device 200 is a smartphone: the processing module 40 is a processor, the communication module 50 is a network module, the display module 10 and the human-computer interaction module 60 are together a touch screen, the audio module 20 is a speaker and a microphone, and the memory module 30 is a memory. In another embodiment, the execution device 200 is a tablet computer. In yet another embodiment, the execution device 200 is a computer, the display module 10 is a screen, and the human-computer interaction module 60 is a keyboard and a mouse. In addition, in this embodiment the training device 100 is a computer or a cloud server connectable to the network, and the execution device 200 communicatively connects with the training device 100 through the network via the communication module 50.

Referring to Figure 3, the execution program S200 of the speech correction auxiliary method further includes the following steps:

Step S201: Download software update data through the communication module 50, and update the voice sample data stored in the memory module 30 according to the software update data.

Step S202: Download questionnaire data through the communication module 50, and display the questionnaire data through the display module 10.

Step S203: When the questionnaire data has been completely filled in, generate user data and store the user data in the memory module 30.

Displaying the questionnaire data means that the present invention asks the user questions about personal information and health information through the display module 10. As before, the user can select answers to the questions through the human-computer interaction module 60, which generates the corresponding selection signal to make selections in the questionnaire data. Between step S202 and step S203, the processing module 40 determines whether every question in the questionnaire data has been answered via the selection signal. If the questionnaire is incomplete, no user data is generated. The user data generated by the present invention is stored only in the memory module 30 and is never sent out through the communication module 50; it is therefore held by the user alone, which protects the user's privacy. The user data helps the user organize and present his or her own information and physical and mental condition.

In this embodiment, the questionnaire data includes a plurality of question items, which may be fill-in questions, single-choice questions, or multiple-choice questions. Single-choice and multiple-choice items further include a corresponding set of candidate answers. For example, a fill-in question asks for the child's name; a single-choice question asks the user to choose one eating status among good, picky eating, poor appetite, difficulty chewing, or drooling; another single-choice question asks whether the breathing status is normal, noisy, or mouth breathing; and a multiple-choice question allows selecting options such as unclear articulation, articulation disorder ("thick tongue"), voice disorder (hoarseness), stuttering, or delayed language development. When a corresponding answer has been selected for every question item, the present invention generates the user data comprising the selected answers and stores it in the memory module 30.
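The questionnaire flow of steps S202 and S203 reduces to a completeness check before any user data is generated. The sketch below is a minimal illustration under assumed field names taken from the examples above; the patent does not specify how the questionnaire or the user data is structured internally.

```python
# Sketch of the questionnaire completeness check (steps S202-S203).
# Field names are taken from the examples in the description and are assumptions.
# The generated user data is kept only in local memory and never sent out.

questionnaire = {
    "兒童姓名": None,  # fill-in question
    "進食狀況": None,  # single choice: 良好 / 挑食 / 胃口不佳 / 咀嚼困難 / 流口水
    "呼吸狀況": None,  # single choice: 正常 / 有雜音 / 由口呼吸
    "語言問題": [],    # multiple choice: 口齒不清楚 / 大舌頭 / 沙啞 / 口吃 / 語言發展遲緩
}

def all_answered(form):
    # User data is generated only after every question has been answered.
    return all(v not in (None, []) for v in form.values())

def submit(form, memory_module):
    if all_answered(form):
        memory_module["user_data"] = dict(form)  # stored locally only, for privacy
        return True
    return False  # incomplete questionnaire: no user data is generated
```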

Referring to Figure 4, in this embodiment the first word information stored in the memory module 30 further includes a second word, and the voice sample data further includes second sound data corresponding to the second word.

When the processing module 40 executes step S210, it further displays the second word included in the first word information, together with a second pronunciation button and a second recording button corresponding to the second word. The execution program S200 further includes the following steps before step S240:

Step S231: When it is determined that the second pronunciation button is triggered, play the second sound data corresponding to the second word in the voice sample data through the audio module 20.

Step S232: When it is determined that the second recording button is triggered, record second recording data corresponding to the second word through the audio module 20, and update the data to be analyzed according to the second recording data. Once updated, the data to be analyzed includes both the first recording data and the second recording data.

In addition, in this embodiment the training device 100 stores an artificial intelligence (AI) model and training data used to train the AI model. In the training device 100 the AI model is a convolutional neural network (CNN) model, and the training device 100 trains the CNN model on the training data to build a sound comparison model. Once training of the sound comparison model is complete, the training device 100 stores the sound comparison model into the memory module 30 of the execution device 200 by a communication means, such as the network, for use by the processing module 40 of the execution device 200. Because the training device 100 is communicatively connected to the communication module 50 of the execution device 200, the trained sound comparison model can be transmitted through the communication module 50 to the processing module 40 and stored in the memory module 30. The processing module 40 of the execution device 200 then uses the sound comparison model to compare the similarity between the data to be analyzed and the first sound data to generate the analysis result.
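The patent states only that the AI model is a CNN trained into a sound comparison model; it does not disclose the architecture. The following PyTorch sketch is therefore one plausible shape under that assumption: a small convolutional classifier over mel-spectrogram input whose softmax output can be read as the normal-sound similarity plus the abnormal-sound similarities described later.

```python
import torch
import torch.nn as nn

class SoundComparisonCNN(nn.Module):
    """Illustrative architecture only: the patent says a CNN is trained into a
    sound comparison model but gives no layer details, so every size here is
    an assumption."""

    def __init__(self, n_classes=15):  # 1 normal class + 14 abnormal-sound categories
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes),
        )

    def forward(self, mel):  # mel: (batch, 1, n_mels, time)
        logits = self.head(self.features(mel))
        # Softmax gives percentages summing to 1, matching the percentage-style
        # similarities in the analysis result.
        return torch.softmax(logits, dim=-1)
```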

Referring to Figure 5, the training device 100 further stores values such as an adjustment semitone count, a shift time, a speed scaling percentage, an amplification percentage, and an environmental noise level. The training data includes a plurality of child voice files used to train the AI model. The training program S100 further includes the following steps:

Step S110: Train the AI model through a pitch-shift step, that is, shift the audio pitch of each child voice file up and down by the adjustment semitone count to train the AI model.

For example, if the adjustment semitone count is two, step S110 shifts the audio pitch of each child voice file up and down by two semitones. This simulates the pitch differences between different human voices, training the AI model to recognize those differences.

Step S120: Train the AI model through a time-shift step, that is, randomly shift the audio timeline of each child voice file by the shift time to train the AI model.

For example, if the shift time is one second, step S120 randomly shifts the audio timeline of each child voice file forward or backward by one second. This trains the AI model to recognize how audio changes when shifted in time, strengthening its ability to interpret the child voice files.

Step S130: Train the AI model through a speed-scaling step, that is, randomly scale the audio speed of each child voice file by the speed scaling percentage to train the AI model.

For example, if the speed scaling percentage is 25%, step S130 randomly scales the audio speed of each child voice file by up to 25% of the original speed. This simulates variations in speaking rate, training the AI model to recognize speech at different speeds.

Step S140: Train the AI model through a volume-increase step, that is, raise the audio volume of each child voice file by the amplification percentage to train the AI model.

For example, if the amplification percentage is 15%, step S140 raises the audio volume of each child voice file by 15% of the original volume. This simulates variations in speaking loudness, training the AI model to recognize speech at different volumes.

Step S150: Train the AI model through a white-noise step, that is, add the environmental noise to the audio of each child voice file to train the AI model.

For example, because the environmental noise is full-spectrum noise, it perturbs every frequency in each child voice file, training the AI model to recognize the many ways the child voice files can change under noise.
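Steps S110 to S150 amount to a standard audio-augmentation pipeline. The sketch below reproduces the five steps with the example parameter values given above (two semitones, one second, 25%, 15%, plus full-spectrum noise); the use of librosa and the noise amplitude of 0.005 are assumptions of this illustration, not choices stated in the patent.

```python
import numpy as np
import librosa

def augment(y, sr):
    """Produce one augmented copy of a child voice clip, following steps S110-S150.
    Parameter values come from the examples in the description; librosa is an
    assumed tool choice."""
    # S110 pitch shift: move the pitch up or down by two semitones.
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=np.random.choice([-2, 2]))
    # S120 time shift: shift the waveform forward or backward by up to one second.
    y = np.roll(y, np.random.randint(-sr, sr))
    # S130 speed scaling: stretch the tempo randomly within +/-25%.
    y = librosa.effects.time_stretch(y, rate=np.random.uniform(0.75, 1.25))
    # S140 volume increase: amplify by 15% of the original volume.
    y = y * 1.15
    # S150 white noise: add full-spectrum noise so every frequency band is perturbed.
    y = y + 0.005 * np.random.randn(len(y))
    return y
```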

Step S160: End the training of the AI model, and establish the trained sound comparison model.

Step S170: Transmit the trained sound comparison model to the execution device 200.

In step S170, the training device 100 transmits the trained sound comparison model through the network to the communication module 50 of the execution device 200. The processing module 40 of the execution device 200 receives the trained sound comparison model through the communication module 50 and establishes or updates it in the memory module 30. When the processing module 40 of the execution device 200 executes step S240, it compares the similarity between the data to be analyzed and the first sound data through the sound comparison model to generate the analysis result. The order of steps S110 to S170 and the combination of training steps used are not limited to this embodiment.

Referring to Figure 6, in another embodiment the training data includes the child voice files and a plurality of adult voice files used to train the AI model. The training program S100 further includes the following steps:

Step S100A: Train the AI model through a mel-spectrum step, that is, time-frequency transform each adult voice file and each child voice file, extract the frequency-band audio within a plurality of signal windows, filter the frequency-band audio within those signal windows, time-frequency transform it again, and use the result to train the AI model.

Step S100B: End the training of the AI model, and establish the trained sound comparison model.

Step S100C: Transmit the trained sound comparison model to the execution device 200.

In this embodiment, step S100A is performed by the training device 100 through the MelSpectrogram command of Matrix Laboratory (Matlab). In detail, the signal windows comprise three signal windows of different sizes, each of which extracts a different frequency band from the spectrogram obtained by time-frequency transforming each adult voice file and each child voice file. When these extracted bands are filtered, noise can be removed at the noise frequencies, which optimizes the signal quality of the extracted bands and thereby trains the AI model to compare audio more accurately. In other words, when the processing module 40 of the execution device 200 executes step S240, it can compare the similarity between the data to be analyzed and the first sound data more accurately through the sound comparison model to produce the analysis result. More specifically, when step S100A trains the AI model through the mel-spectrum step, the training device 100 first obtains Mel-scale Frequency Cepstral Coefficients (MFCC) and then uses them to build the coefficients of the Mel-Frequency Cepstrum (MFC), yielding a spectrum on the nonlinear mel scale, that is, a mel spectrum. A mathematical logarithmic-scale (log-scale) conversion formula exists between the mel scale and the linear frequency scale in hertz (Hz).

Further, when filtering the frequency-band audio within the signal windows, the training device 100 uses a filter bank (FBank) to filter the time-frequency-transformed frequency distributions of the adult voice files and child voice files, filtering out noise outside the signal windows while retaining the frequency-band audio within them. Noise outside the signal windows includes, for example, high-frequency noise above the range of human vocalization or low-frequency noise below it. In this embodiment the filter bank is a digital filter bank, and the frequencies it filters can be set by the training device 100.
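The patent performs step S100A with Matlab's MelSpectrogram command; the sketch below shows an equivalent computation in Python. The three window sizes and the 80 Hz to 8 kHz retained band are illustrative assumptions, since the description says only that three differently sized signal windows are used and that noise outside the human vocal range is filtered out.

```python
import librosa

def mel_features(y, sr, window_sizes=(256, 512, 1024)):
    """Sketch of the mel-spectrum step (S100A) using librosa instead of Matlab.
    The window sizes and band limits are assumptions of this illustration."""
    feats = []
    for n_fft in window_sizes:  # three differently sized signal windows
        mel = librosa.feature.melspectrogram(
            y=y, sr=sr, n_fft=n_fft, hop_length=n_fft // 4,
            fmin=80, fmax=8000,  # filter-bank limits, akin to the FBank filtering
        )
        # Log scaling, matching the logarithmic relation between mel and hertz.
        feats.append(librosa.power_to_db(mel))
    return feats
```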

In another embodiment, the similarity produced by the sound comparison model in the analysis result is the so-called Percentage of Consonants Correct (PCC). The consonant accuracy is obtained by inputting a plurality of incorrect sounds and a plurality of correct sounds when training the AI model. The AI model can thus acoustically understand different degrees of speech disorder and perform subsequent analysis and error-type classification. Once the sound comparison model has been generated, it can present the categories of erroneous sounds that occur. For example, in training the AI model on the Mandarin phonetic symbols (Zhuyin, glossed in the original as the International Phonetic Alphabet, IPA), there are logical relations between different kinds of mispronunciation, for example:

Affricate ㄐ = stop ㄉ + fricative ㄒ.

Affricate ㄓ = stop ㄉ + fricative ㄕ.

Affricate ㄗ = stop ㄉ + fricative ㄙ, and so on.

Further, in acoustic terms, when mispronounced stops, affricates, fricatives, or velar (tongue-root) sounds are time-frequency transformed, the spectral characteristics of each mispronunciation can be observed on the spectrogram. For example, the spectrum corresponding to a velar sound has more energy at lower frequencies, which can be expressed as higher intensity at low frequencies or higher sample density at low frequencies. As for the time-domain spectrographic features of stops, the observable characteristics include aspiration, voice onset time (VOT), and formant transitions. The time-domain spectrum of a stop mainly shows several relatively close formants, and a distinct VOT can be observed at the onset of articulation.

After training, the AI model of the present invention can synthesize the acoustic knowledge and logic described above to produce the sound comparison model. The acoustic knowledge and logic above are merely simple examples of this embodiment and are not limiting. When the sound comparison model is used by the execution device 200, the execution device 200 can obtain the analysis result, which the AI model derives from its learned acoustic knowledge and logic, without performing a time-frequency transform of the recorded sound.

Referring to Figure 7, in this embodiment, when the processing module 40 of the execution device 200 executes step S240, the processing module 40 performs the following steps:

Step S241: Determine whether the start-analysis button is triggered. When it is determined that the start-analysis button has not been triggered, repeat step S241.

Step S242: When it is determined that the start-analysis button is triggered, compare the similarity between the first recording data in the data to be analyzed and the first sound data through the sound comparison model to generate a first result.

Step S243: Compare the similarity between the second recording data in the data to be analyzed and the second sound data through the sound comparison model to generate a second result.

Step S244: Generate the analysis result according to the first result and the second result.

Further, in this embodiment, step S244 averages the first result and the second result to produce the analysis result. In other words, this embodiment uses steps S242 and S243 to carry out two comparison analyses with the sound comparison model and then averages the results of the two analyses to obtain the analysis result, that is, the analysis result = (the first result + the second result) / 2.

In another embodiment of the present invention, the first word information includes a third word, together with a third pronunciation button and a third recording button corresponding to the third word. Following the logic described above, when the third pronunciation button is triggered, third sound data corresponding to the third word in the voice sample data is played through the audio module 20. When the third recording button is triggered, third recording data corresponding to the third word is recorded through the audio module 20, and the data to be analyzed is updated according to the third recording data. By analogy, after the processing module 40 executes step S243, it further compares the similarity between the third recording data in the data to be analyzed and the third sound data through the sound comparison model to generate a third result. The analysis result is then generated from all of the produced results, that is, the first, second, and third results. In other words, the analysis result = (the sum of all produced results) / (the number of produced results) = (the first result + the second result + the third result) / 3.
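The combination rule generalizes to any number of recorded words as a plain arithmetic mean, as the following one-function sketch (an illustration, not text from the patent) makes explicit.

```python
def combine(results):
    # Steps S242-S244 generalized: the analysis result is the arithmetic mean of
    # the per-word comparison results, e.g. (r1 + r2 + r3) / 3 for three words.
    return sum(results) / len(results)
```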

Referring to Figure 8, in another embodiment of the present invention, when the processing module 40 of the execution device 200 executes step S240, the processing module 40 performs the following steps:

Step S240A: Determine whether the start-analysis button is triggered. When it is determined that the start-analysis button has not been triggered, repeat step S240A.

Step S240B: When it is determined that the start-analysis button is triggered, compare together, through the sound comparison model, the similarity between the first recording data in the data to be analyzed and the first sound data and the similarity between the second recording data in the data to be analyzed and the second sound data, so as to generate the analysis result.

In other words, this embodiment uses the sound comparison model to compare and analyze in a single pass the similarity between the first recording data and the first sound data and the similarity between the second recording data and the second sound data, producing one comprehensive result as the analysis result with only a single use of the sound comparison model.

In yet another embodiment, in which the first word information includes the third word together with its third pronunciation button and third recording button, when the third recording button is triggered, the third recording data corresponding to the third word is recorded through the audio module 20 and the data to be analyzed is updated accordingly. When it is determined that the start-analysis button is triggered, following the logic of the preceding embodiment, the sound comparison model compares together the similarity between the first recording data and the first sound data, the similarity between the second recording data and the second sound data, and the similarity between the third recording data and the third sound data in the data to be analyzed, so as to generate the analysis result comprehensively.

The analysis result includes a normal-sound similarity and a plurality of abnormal-sound similarities, each of which is a percentage produced by the sound comparison model. The percentage numerically expresses whether the user's pronunciation is normal, that is, the combined analysis of whether the first recording data approximates the first sound data, the second recording data approximates the second sound data, and the third recording data approximates the third sound data. A normal-sound similarity of 0% means no similarity at all, while 100% means complete similarity.

In detail, the present invention further interprets the normal-sound similarity according to normal-sound similarity threshold data stored in the memory module 30. When, according to the threshold data, the normal-sound similarity is greater than 85%, the user is judged to have a mild speech disorder. When the normal-sound similarity is greater than or equal to 65% and less than or equal to 85%, the user is judged to have a mild-to-moderate speech disorder. When the normal-sound similarity is greater than or equal to 50% and less than or equal to 64%, the user is judged to have a moderate-to-severe speech disorder. When the normal-sound similarity is less than 50%, the user is judged to have a severe speech disorder. Based on these interpretations of the normal-sound similarity, the present invention generates speech-disorder interpretation data and stores it in the memory module 30.
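The threshold interpretation can be written directly as a cascade of comparisons. The sketch below follows the ranges above; treating values between 64% and 65% as moderate-to-severe is an assumption of this illustration, since the description states the bands over whole percentages.

```python
def interpret(normal_similarity):
    """Severity interpretation from the normal-sound similarity threshold data."""
    if normal_similarity > 85:
        return "mild speech disorder"
    elif normal_similarity >= 65:   # 65% <= similarity <= 85%
        return "mild-to-moderate speech disorder"
    elif normal_similarity >= 50:   # 50% <= similarity <= 64% (gap assumed included)
        return "moderate-to-severe speech disorder"
    else:                           # below 50%
        return "severe speech disorder"
```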

Referring to Figure 9, after step S240 has been executed, the processing module 40 of the present invention further executes the following steps of the execution program S200:

Step S250: Display the normal-sound similarity and the abnormal-sound similarities of the analysis result through the display module 10.

Step S260: Interpret the normal-sound similarity according to the normal-sound similarity threshold data to generate the speech-disorder interpretation data.

In detail, the abnormal-sound similarities can be subdivided into values such as a stopping similarity, a vowelization similarity, a vowel-omission similarity, a fronting similarity, a backing similarity, a deaspiration similarity, a nasal-final similarity, a lateralization similarity, an interdental similarity, a consonant-omission similarity, a frication similarity, a medial-omission similarity, an affrication similarity, and a compound-final-omission similarity. When the processing module 40 of the execution device 200 executes step S240, the present invention can analyze, through the sound comparison model, that is, the trained AI model, how the values of these abnormal-sound similarities are distributed. The higher an abnormal-sound similarity percentage, the higher the probability, according to the sound comparison model's analysis, that the user faces the corresponding pronunciation difficulty. For example, when the stopping similarity is 99% and the frication similarity is 1%, the user's abnormal pronunciation is very likely a stopping problem and only marginally likely a frication problem.

In this embodiment, when some component among the abnormal-sound similarities is zero, the display module 10 omits displaying that 0% component. For example, suppose that in one analysis the user's stopping similarity is 0.56%, the backing similarity is 1.95%, the nasal-final similarity is 0.31%, and the affrication similarity is 91.17%, while the remaining vowelization, vowel-omission, fronting, deaspiration, lateralization, interdental, consonant-omission, frication, medial-omission, and compound-final-omission similarities are all 0%. The display module 10 then displays only the representative stopping, backing, nasal-final, and affrication similarities, whose percentages are greater than 0%.
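The display rule for zero-valued components is a simple filter over the category-to-percentage mapping. The sketch below reproduces the worked example above; the dictionary-based representation is an assumption of this illustration.

```python
def displayable(abnormal_similarities):
    # Only abnormal-sound components above 0% are shown on the display module;
    # zero-valued components are omitted.
    return {name: pct for name, pct in abnormal_similarities.items() if pct > 0}

# Example from the description (percentages); the remaining categories are all 0%.
shown = displayable({
    "塞音化": 0.56, "舌根音化": 1.95, "聲隨韻母": 0.31, "塞擦音化": 91.17,
    "母音化": 0.0, "擦音化": 0.0,
})
# shown keeps only the four categories with a percentage greater than 0%.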

在本發明另一實施例中,本發明之該執行裝置200執行步驟S260後,進一步將該待分析資料和該分析結果通過該通訊模組50回傳至該訓練裝置100,以回饋一次分析之相關資料。該訓練裝置100可根據回饋之該待分析資料和該分析結果而檢視和調整該人工智慧模型,藉以有更多之數據做為未來訓練該人工智慧模型的教材。In another embodiment of the present invention, after executing step S260, the execution device 200 of the present invention further transmits the data to be analyzed and the analysis results to the training device 100 through the communication module 50 to feed back the results of an analysis. Related information. The training device 100 can review and adjust the artificial intelligence model based on the feedback data to be analyzed and the analysis results, so as to have more data as teaching materials for training the artificial intelligence model in the future.

請參閱圖10所示,圖10示意了該執行裝置200之該顯示模組10顯示的畫面,且本發明之該語音矯正輔助方法以一應用程式(Application;APP)實現。Please refer to FIG. 10 . FIG. 10 illustrates the screen displayed by the display module 10 of the execution device 200 , and the speech correction auxiliary method of the present invention is implemented by an application program (Application; APP).

圖10中,該顯示模組10顯示了一開始畫面10A。該開始畫面10A中包括一下載選項2和一開始測驗選項3。當該下載選項2受到選擇時,即執行程序S201,並且下載的進度由一進度百分比1所顯示。In Figure 10, the display module 10 displays a start screen 10A. The start screen 10A includes a download option 2 and a start test option 3. When the download option 2 is selected, the process S201 is executed, and the download progress is displayed by a progress percentage of 1.

請參閱圖11所示,當該開始測驗選項3受到選取後,本發明跳出了該開始畫面10A而進入一測驗畫面10B。該測驗畫面10B中,該顯示模組10顯示了該第一圖像資訊4和對應該第一圖像資訊4的該第一語詞資訊5,即為布丁。其中,該第一語詞資訊5的該第一單字5A為布丁的布字,而該第一語詞資訊5的該第二單字5B為布丁的丁字。該顯示模組10也顯示該第一單字5A所對應的該第一發音按鈕5AS和該第一錄音按鈕5AMic,以及該第二單字5B所對應的該第二發音按鈕5BS和該第二錄音按鈕5BMic。該第一圖像資訊4的下方為一第二圖像資訊6,而該第二圖像資訊6為一青菜。該第一圖像資訊4的布丁和該第二圖像資訊6的青菜為同樣的道理,即協助該使用者分析和矯正咬字發音的不同物件。另外,該顯示模組10顯示了一開始分析選項7。當該開始分析選項7受到選取後,即停止取樣,而根據目前所有的錄製聲音樣本作分析。Please refer to FIG. 11 . When the start test option 3 is selected, the present invention jumps out of the start screen 10A and enters a test screen 10B. In the test screen 10B, the display module 10 displays the first image information 4 and the first word information 5 corresponding to the first image information 4, which is pudding. The first word 5A of the first word information 5 is the word "bu" for pudding, and the second word 5B of the first word information 5 is the word "D" for pudding. The display module 10 also displays the first pronunciation button 5AS and the first recording button 5AMic corresponding to the first word 5A, and the second pronunciation button 5BS and the second recording button corresponding to the second word 5B. 5BMic. Below the first image information 4 is a second image information 6, and the second image information 6 is a green vegetable. The pudding of the first image information 4 and the vegetables of the second image information 6 are for the same reason, that is, they are different objects that assist the user in analyzing and correcting pronunciation of words. In addition, the display module 10 displays a start analysis option 7. When the start analysis option 7 is selected, sampling is stopped and analysis is performed based on all currently recorded sound samples.

在本發明另一實施例中,當該開始測驗選項3受到選取後,本發明係跳出了該開始畫面10A後先進入一仿說畫面,而後才跳至該測驗畫面10B。該仿說畫面中顯示複數仿說字眼,以協助該使用者進行仿說。該些仿說字眼例如「阿」和「1、2、3、4、5、6、7、8、9、10」等。此目的為希望能誘導該使用者習慣閱讀該顯示模組10顯示之字眼,以利而後進入該測驗畫面10B後錄音該使用者說話的品質能後更好,也就是以利透過該音訊模組20錄製對應該第一單字的該第一錄音資料和對應該第二單字的該第二錄音資料能夠因該使用者習慣閱讀後而品質更好、以錄製該使用者更趨正常放鬆情況下所做出的發音。In another embodiment of the present invention, when the start test option 3 is selected, the present invention jumps out of the start screen 10A and first enters a simulation screen, and then jumps to the test screen 10B. A plurality of imitating words are displayed in the imitation speaking screen to assist the user in imitating speaking. These imitative words include "A" and "1, 2, 3, 4, 5, 6, 7, 8, 9, 10" and so on. The purpose of this is to induce the user to get used to reading the words displayed by the display module 10, so that the quality of recording the user's speech after entering the test screen 10B can be better, that is, to facilitate the use of the audio module 20 Recording the first recording material corresponding to the first word and the second recording material corresponding to the second word can have better quality due to the user's habit of reading, so as to record the user's normal and relaxed situation. Pronunciation made.

請參閱圖12所示,當該開始分析選項7受到選取後,本發明跳出了該測驗畫面10B而進入一分析結果畫面10C。該分析結果畫面10C中,該顯示模組10顯示了該分析結果,即顯示一第一項目8N以及其對應的該正常音相似度8,以代表該使用者所錄製的該待分析資料和正常發音約99.11%相似。Please refer to FIG. 12 . When the start analysis option 7 is selected, the present invention jumps out of the test screen 10B and enters an analysis result screen 10C. In the analysis result screen 10C, the display module 10 displays the analysis result, that is, a first item 8N and its corresponding normal sound similarity 8 are displayed to represent the data to be analyzed and the normal sound recorded by the user. The pronunciation is about 99.11% similar.

該顯示模組10進一步顯示了複數異常發音資訊和對應的該些異常音相似度。該顯示模組10顯示了一第一異常發音資訊8AN和對應的該塞音化相似度8A、一第二異常發音資訊8BN和對應的該舌根音化相似度8B、一第三異常發音資訊8CN和對應的該聲隨韻母相似度8C、一第四異常發音資訊8DN和對應的該塞擦音化相似度8D。The display module 10 further displays a plurality of abnormal pronunciation information and corresponding similarities of the abnormal pronunciations. The display module 10 displays a first abnormal pronunciation information 8AN and the corresponding stop consonantization similarity 8A, a second abnormal pronunciation information 8BN and the corresponding tongue base consonantization similarity 8B, a third abnormal pronunciation information 8CN and The corresponding consonant-final similarity 8C, a fourth abnormal pronunciation information 8DN and the corresponding affricate similarity 8D.

該顯示模組10進一步顯示一儲存和上傳選項9和一下一頁選項11。當該儲存和上傳選項9受到選取時,該處理模組40即通過該通訊模組50連接的該加密網路上傳該待分析資料以及該分析結果至一雲端資料庫做紀錄。另外,當該顯示模組10顯示的該些異常發音資訊不夠顯始於一頁面時,該下一頁選項11即可受到選擇而更新顯示頁面為下一頁,以繼續顯示其餘之該些異常發音資訊。The display module 10 further displays a storage and upload option 9 and a next page option 11 . When the storage and upload option 9 is selected, the processing module 40 uploads the data to be analyzed and the analysis results to a cloud database for recording through the encrypted network connected to the communication module 50 . In addition, when the abnormal pronunciation information displayed by the display module 10 is not enough to be displayed on one page, the next page option 11 can be selected to update the display page to the next page to continue to display the remaining abnormalities. Pronunciation information.

請參閱圖13所示,當該儲存和上傳選項9受到選取後,本發明跳出了該分析結果畫面10C而進入一輔導資訊畫面10D。該處理模組40通過該通訊模組50下載了複數健康常識資訊12,而在該輔導資訊畫面10D中,該處理模組40通過該顯示模組10顯示該些健康常識資訊12和一結束選項13。該顯示模組10顯示的該些健康常識資訊12能協助該使用者增加常識,以使該使用者了解該些異常發音資訊8AN、8BN、8CN、8DN所代表的意義。例如,塞擦音化的異常發音即可能源自於某發音部位的發聲過程出現了某一種的錯誤。當該使用者了解該些異常發音資訊8AN、8BN、8CN、8DN所代表的意義後,該使用者更能了解如何糾正發音錯誤的問題。例如,因為該使用者的該塞擦音化相似度8D為91.17%非常高之百分比,所以代表該使用者面臨了發音上塞擦音化的較大困難,因此比起其他的發音問題,塞擦音化的問題需要優先受到改善。如此,本發明除了可以使該使用者受到矯正發音的輔助,更能提供該使用者參考性的分析數據做矯正發音的紀錄。Please refer to Figure 13. When the storage and upload option 9 is selected, the present invention jumps out of the analysis result screen 10C and enters a coaching information screen 10D. The processing module 40 downloads a plurality of health knowledge information 12 through the communication module 50, and in the counseling information screen 10D, the processing module 40 displays the health knowledge information 12 and an end option through the display module 10 13. The health knowledge information 12 displayed by the display module 10 can help the user increase their knowledge, so that the user understands the meaning of the abnormal pronunciation information 8AN, 8BN, 8CN, and 8DN. For example, abnormal pronunciation of affricates may originate from a certain error in the pronunciation process of a certain articulatory part. After the user understands the meaning of the abnormal pronunciation information 8AN, 8BN, 8CN, and 8DN, the user can better understand how to correct the pronunciation error. For example, because the user's similarity of affricate 8D is 91.17%, which is a very high percentage, it means that the user faces greater difficulties in pronunciation of affricate. Therefore, compared with other pronunciation problems, affricate The problem of fricativeization needs to be improved as a priority. In this way, the present invention can not only enable the user to receive assistance in correcting pronunciation, but also provide the user with reference analysis data to record pronunciation correction.

When the end option 13 is selected, the present invention leaves the coaching information screen 10D and returns to the start screen 10A; that is, one complete cycle of recording and analyzing sound similarity ends, and the system waits on the start screen 10A for the next cycle. When the start test option 3 on the start screen 10A is selected, a new recording-and-analysis cycle begins.

The present invention provides an auxiliary tool with which a parent can help a child with a speech disorder correct pronunciation. The users of the present invention can be the parent and the child: the parent operates the human-computer interaction module 60, while the child, who may have a speech disorder, views the first image information 4 and the first word information 5 corresponding to the first image information 4, listens to the first sound data corresponding to the first word 5A, and records the first recording data corresponding to the first word 5A. When the present invention plays the first sound data of the first word 5A and presents the first image information 4, the child can learn how to pronounce the word correctly. When the present invention records the first recording data corresponding to the first word 5A and generates the analysis result, the parent can tell, with the machine's assistance, whether the child's pronunciation is correct. Furthermore, after the present invention has been used several times to record the first recording data and generate the normal sound similarity 8, the analysis and the long-term records help the parent see whether the child's pronunciation is improving, that is, whether the similarity between the first sound data and the first recording data keeps increasing, or whether the combined similarity, formed from the similarity between the first sound data and the first recording data and the similarity between the second sound data and the second recording data, keeps approaching normal. As the percentage of the normal sound similarity 8 rises, the child's pronunciation is becoming more accurate.
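The long-term progress check described here can be approximated by comparing recent sessions with earlier ones. A hedged sketch, assuming each session stores one normal sound similarity percentage (the function name and window size are assumptions):

```python
# Rough trend test over stored session scores: "improving" when the mean
# of the most recent sessions exceeds the mean of the earlier ones.
from statistics import mean

def improving(history: list[float], window: int = 3) -> bool:
    if len(history) < 2 * window:
        return False  # too few sessions to judge a trend
    return mean(history[-window:]) > mean(history[:-window])

sessions = [82.4, 85.1, 88.0, 90.2, 93.5, 96.8, 99.11]  # illustrative
print(improving(sessions))  # True: recent scores are higher
```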

After the display module 10 displays the analysis result, the parent combines it with the health knowledge information 12 displayed by the display module 10 to help the child practice articulation correctly, and provides the records of the present invention to a speech therapist as reference material on the child's pronunciation.

1: progress percentage
2: download option
3: start test option
4: first image information
5: first word information
5A: first word
5Amic: first recording button
5AS: first pronunciation button
5B: second word
5Bmic: second recording button
5BS: second pronunciation button
6: second image information
7: start analysis option
8: normal sound similarity
8N: first item
8A: stopping similarity
8AN: first abnormal pronunciation information
8B: velarization (backing) similarity
8BN: second abnormal pronunciation information
8C: nasal-final similarity
8CN: third abnormal pronunciation information
8D: affrication similarity
8DN: fourth abnormal pronunciation information
9: save-and-upload option
10: display module
10A: start screen
10B: test screen
10C: analysis result screen
10D: coaching information screen
11: next page option
12: health knowledge information
13: end option
20: audio module
30: memory module
40: processing module
50: communication module
60: human-computer interaction module
100: training device
200: execution device
S100: training program
S110, S120, S130, S140, S150, S160, S170: steps
S100A, S100B, S100C: steps
S200: execution program
S201~S203, S210, S220, S230~S232, S240~S244: steps
S240A, S240B: steps
S250, S260: steps

FIG. 1 is a block diagram of a speech correction auxiliary system of the present invention.
FIG. 2 is a flow chart of a speech correction auxiliary method of the present invention.
FIG. 3 is a flow chart of an execution program of the speech correction auxiliary method of the present invention.
FIG. 4 is another flow chart of the execution program of the speech correction auxiliary method of the present invention.
FIG. 5 is a flow chart of a training program of the speech correction auxiliary method of the present invention.
FIG. 6 is another flow chart of the training program of the speech correction auxiliary method of the present invention.
FIG. 7 is a further flow chart of the execution program of the speech correction auxiliary method of the present invention.
FIG. 8 is a further flow chart of the execution program of the speech correction auxiliary method of the present invention.
FIG. 9 is a further flow chart of the execution program of the speech correction auxiliary method of the present invention.
FIG. 10 is a schematic diagram of an execution device of the speech correction auxiliary system of the present invention displaying a start screen.
FIG. 11 is a schematic diagram of the execution device of the speech correction auxiliary system of the present invention displaying a test screen.
FIG. 12 is a schematic diagram of the execution device of the speech correction auxiliary system of the present invention displaying an analysis result screen.
FIG. 13 is a schematic diagram of the execution device of the speech correction auxiliary system of the present invention displaying a coaching information screen.

S100: training program

S200: execution program

S210, S220, S230, S240: steps

Claims (11)

1. A speech correction auxiliary method, comprising:
an execution program, executed by a processing module of an execution device, comprising the following steps:
a. displaying, through a display module of the execution device, a start analysis button, a first image information, and a first word information corresponding to the first image information, and displaying a first word included in the first word information together with a first pronunciation button and a first recording button corresponding to the first word;
b. when it is determined that the first pronunciation button is triggered, playing, through an audio module of the execution device, a first sound data corresponding to the first word in a voice sample data;
c. when it is determined that the first recording button is triggered, recording, through the audio module, a first recording data corresponding to the first word, and generating a data to be analyzed based on the first recording data;
d. when it is determined that the start analysis button is triggered, comparing the similarity between the data to be analyzed and the first sound data to generate an analysis result.

2. The speech correction auxiliary method as claimed in claim 1, wherein:
the first word information includes a second word;
when step a is performed, the second word included in the first word information is further displayed, together with a second pronunciation button and a second recording button corresponding to the second word;
before step d, the execution program further comprises the following steps:
c1. when it is determined that the second pronunciation button is triggered, playing, through the audio module, a second sound data corresponding to the second word in the voice sample data;
c2. when it is determined that the second recording button is triggered, recording, through the audio module, a second recording data corresponding to the second word, and updating the data to be analyzed based on the second recording data, wherein the data to be analyzed includes the first recording data and the second recording data.

3. The speech correction auxiliary method as claimed in claim 1 or 2, further comprising:
a training program, executed by a training device, comprising the following steps:
A. training an artificial intelligence model with a training data to build a sound comparison model;
B. transmitting the sound comparison model to the execution device;
wherein, when step d is performed, the similarity between the data to be analyzed and the first sound data is compared through the sound comparison model to generate the analysis result.
4. The speech correction auxiliary method as claimed in claim 3, wherein, when step d is performed and it is determined that the start analysis button is triggered, the following steps are executed:
d1. comparing, through the sound comparison model, the similarity between the first recording data in the data to be analyzed and the first sound data to generate a first result;
d2. comparing, through the sound comparison model, the similarity between the second recording data in the data to be analyzed and the second sound data to generate a second result;
d3. generating the analysis result based on the first result and the second result.

5. The speech correction auxiliary method as claimed in claim 4, wherein step d3 averages the first result and the second result to generate the analysis result.

6. The speech correction auxiliary method as claimed in claim 3, wherein, when step d is performed and it is determined that the start analysis button is triggered, the similarity between the first recording data in the data to be analyzed and the first sound data and the similarity between the second recording data in the data to be analyzed and the second sound data are compared together through the sound comparison model to generate the analysis result.

7. The speech correction auxiliary method as claimed in claim 3, wherein the analysis result includes a normal sound similarity and a plurality of abnormal sound similarities, and the normal sound similarity and each of the abnormal sound similarities is a percentage generated through the sound comparison model.

8. The speech correction auxiliary method as claimed in claim 3, wherein:
the training data includes a plurality of child voice files;
when step A is performed, the artificial intelligence model is trained through a pitch shift step, a time shift step, a speed scaling step, a volume increase step, and a white noise addition step;
the pitch shift step adjusts the audio pitch of each of the child voice files up and down by an adjustment number of semitones to train the artificial intelligence model;
the time shift step randomly shifts the audio time axis of each of the child voice files by a shift time to train the artificial intelligence model;
the speed scaling step randomly scales the audio speed of each of the child voice files by a speed adjustment percentage to train the artificial intelligence model;
the volume increase step increases the audio volume of each of the child voice files to train the artificial intelligence model;
the white noise addition step adds an environmental noise to the audio of each of the child voice files to train the artificial intelligence model.
9. The speech correction auxiliary method as claimed in claim 3, wherein:
the training data includes a plurality of adult voice files and a plurality of child voice files;
when step A is performed, the artificial intelligence model is trained through a Mel spectrum step;
the Mel spectrum step time-frequency transforms each of the adult voice files and each of the child voice files, captures the frequency-band audio within a plurality of signal windows, filters the frequency-band audio within the signal windows, performs time-frequency transformation again, and uses the result to train the artificial intelligence model.

10. The speech correction auxiliary method as claimed in claim 9, wherein:
when the frequency-band audio within the signal windows is filtered, a filter bank (FBank) is used to filter out the noise outside the signal windows;
the filter bank is a digital filter bank, and the frequencies filtered by the filter bank can be set by the training device.

11. A speech correction auxiliary system, comprising:
a training device, executing the training program of the speech correction auxiliary method as claimed in any one of claims 3 to 10;
an execution device, comprising:
a display module;
an audio module;
a memory module, storing a first image information, a first word information corresponding to the first image information, and a voice sample data, wherein the first word information includes a first word, and the voice sample data includes a first sound data corresponding to the first word;
a processing module, electrically connected to the display module, the audio module, and the memory module, respectively;
a communication module, electrically connected to the processing module and connected to a network to communicatively connect with the training device;
wherein the processing module executes the execution program of the speech correction auxiliary method as claimed in any one of claims 1 to 10.
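The five augmentation steps named in claim 8 (pitch shift, time shift, speed scaling, volume increase, added noise) are standard audio augmentations. The sketch below shows one conventional realisation with librosa and NumPy; the parameter ranges, file name, and seed are illustrative assumptions, not values from the patent.

```python
# A hedged sketch of the claim-8 style augmentation pipeline using librosa;
# all ranges below are illustrative assumptions.
import numpy as np
import librosa

def augment(y: np.ndarray, sr: int, rng: np.random.Generator) -> np.ndarray:
    # Pitch shift: move the pitch up or down by a few semitones.
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=int(rng.integers(-2, 3)))
    # Time shift: rotate the waveform along the time axis by up to 0.5 s.
    y = np.roll(y, int(rng.integers(0, sr // 2)))
    # Speed scaling: stretch or compress playback speed by up to 10 percent.
    y = librosa.effects.time_stretch(y, rate=float(rng.uniform(0.9, 1.1)))
    # Volume increase: raise the amplitude.
    y = y * float(rng.uniform(1.0, 1.5))
    # White noise: add low-level Gaussian noise as synthetic background.
    return y + 0.005 * rng.standard_normal(len(y))

y, sr = librosa.load("child_voice.wav", sr=16000)  # hypothetical file
y_aug = augment(y, sr, np.random.default_rng(0))
```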
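Claims 9 and 10 describe a Mel filter-bank (FBank) front end: a time-frequency transform into windows, Mel filtering of each window's spectrum, and a further transform of the filtered output. Below is a minimal sketch of a log Mel filter-bank feature extractor under that reading; the frame, hop, and filter counts are assumed values.

```python
# A hedged sketch of a log Mel filter-bank (FBank) front end; frame,
# hop, and filter counts are assumptions, not from the patent.
import numpy as np
import librosa

def fbank_features(y: np.ndarray, sr: int, n_mels: int = 40,
                   n_fft: int = 512, hop_length: int = 160) -> np.ndarray:
    """Return an (n_mels, frames) matrix of log Mel filter-bank features."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length,
                                         n_mels=n_mels, power=2.0)
    return librosa.power_to_db(mel)  # log compression

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t).astype(np.float32)  # 1 s test tone
print(fbank_features(tone, sr).shape)  # (40, frames)
```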
TW111126129A 2022-07-12 2022-07-12 Auxiliary method and system for voice correction TWI806703B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111126129A TWI806703B (en) 2022-07-12 2022-07-12 Auxiliary method and system for voice correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW111126129A TWI806703B (en) 2022-07-12 2022-07-12 Auxiliary method and system for voice correction

Publications (2)

Publication Number Publication Date
TWI806703B (en) 2023-06-21
TW202403695A (en) 2024-01-16

Family

ID=87803504

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111126129A TWI806703B (en) 2022-07-12 2022-07-12 Auxiliary method and system for voice correction

Country Status (1)

Country Link
TW (1) TWI806703B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5843894B2 (en) * 2014-02-03 2016-01-13 山本 一郎 Recording and recording equipment for articulation training
JP6710893B2 (en) * 2015-02-26 2020-06-17 カシオ計算機株式会社 Electronics and programs
CN106357715A (en) * 2015-07-17 2017-01-25 深圳新创客电子科技有限公司 Method, toy, mobile terminal and system for correcting pronunciation
CN112767961B (en) * 2021-02-07 2022-06-03 哈尔滨琦音科技有限公司 Accent correction method based on cloud computing
CN114596880A (en) * 2021-12-30 2022-06-07 苏州清睿智能科技股份有限公司 Pronunciation correction method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
TWI806703B (en) 2023-06-21
