TW201023176A - Evaluation system for sound construction anomaly - Google Patents

Evaluation system for sound construction anomaly

Info

Publication number
TW201023176A
TW201023176A (application number TW97148626A)
Authority
TW
Taiwan
Prior art keywords
evaluation system
language
pronunciation
sound construction
anomaly
Prior art date
Application number
TW97148626A
Other languages
Chinese (zh)
Inventor
You-Zun Chen
Jing-Wei Huang
Original Assignee
Univ Southern Taiwan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Southern Taiwan filed Critical Univ Southern Taiwan
Priority to TW97148626A priority Critical patent/TW201023176A/en
Publication of TW201023176A publication Critical patent/TW201023176A/en

Links

Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention provides an evaluation system for articulation disorders. The system comprises a picture-naming pronunciation step; a step of automatically segmenting the speech signal and labeling it; a step of identifying erroneous pronunciation patterns; and a dependency-network database step. Through these steps, a speech pathologist can identify the types of articulation errors produced by a person with an articulation disorder, and a speech trainer can train that person according to the examination results.

Description

VI. Description of the Invention

[Technical Field of the Invention]

The present invention provides an evaluation system for articulation disorders. The system allows a speech pathologist to detect the types of articulation errors produced by a person with an articulation disorder, and allows a speech trainer to train that person according to the detected results.

[Prior Art]

Language disorders are the most common problem among school-age children and severely affect a child's communication ability, learning outcomes, daily life, social adaptation, interpersonal relationships, and emotional development. Speech recognition was first studied in the 1950s, when speech parameters were mostly frequency-domain features. Several important technical breakthroughs arrived in the 1960s, such as the fast Fourier transform, cepstral analysis, and linear predictive coding, and these laid an important foundation for the development of speech recognition in the 1970s. Continuing research at home and abroad has steadily raised the level of speech recognition technology to an acceptable degree, so the technology is now widely applied: medical record entry, medical consultation, health education, voice control, e-commerce, automated switchboards, security control, teaching aids, audio-visual entertainment, all kinds of query and transaction systems, and many more. In medicine, speech recognition has shown great potential in recent years for the rehabilitation of people with speech disorders, yet it has rarely been developed for that purpose. Several domestic papers do mention technology for assisting people with language disorders, but most of that work addresses only voice control, dictation, speaking aids, and the like; there is almost no research on speech training, that is, on speech recognition systems that help people with speech disorders practice articulation accuracy in clinical settings. In brief, the prior art has the following shortcomings: 1. Clinically, the recorded articulation data are listened to manually, and the articulation error types are judged manually. 2. In automated processing, only isolated syllables are currently used for articulation evaluation, which requires a long test time and can only evaluate error types. For these reasons, to improve medical effectiveness it is necessary to develop a system targeting Mandarin articulation and phonological disorders, to assist clinical speech therapists in diagnosis and to help users with corrective treatment.

SUMMARY OF THE INVENTION

The present invention provides an evaluation system for articulation disorders, comprising: a picture-naming step; a step of automatically segmenting the speech signal and labeling it; a step of identifying erroneous pronunciation patterns; and a dependency-network database step. Through these steps, a speech pathologist can detect the types of articulation errors produced by a person with an articulation disorder, and a speech trainer can train that person according to the results.

[Embodiment]

As shown in the first figure, the evaluation system of the present invention has two main databases: the database of pictures or photographs used in picture naming, and the dependency network (DN) database built on scientific clinical knowledge. When a person with a speech disorder is prompted to name, and thereby pronounce, the picture or photograph shown, the corresponding speech signal is captured for comparison. The system then segments the user's speech signal into segments and labels them, and the dependency network (DN) database, which specifies erroneous pronunciation patterns, is compared against the labeled signal to evaluate their similarity.

A. Design of the picture-naming test:

In clinical practice, speech-language pathologists use a picture naming test (PNT), presented as pictures (taking the place of the graphics or photo database above), to obtain information on a child's speech production; a speaker's erroneous pronunciation patterns are then identified from these utterances. The PNT should therefore contain easily recognizable pictures and familiar vocabulary, so that when a child sees the stimulus picture, the child pronounces its name without having to imitate the clinician. First, the PNT should cover the production of all phonemes of the target language, and each target phoneme should appear in at least two word positions. Second, the PNT should evaluate production in progressively more complex contexts, and the test items should include both monosyllabic and polysyllabic words.

A PNT with a set of N familiar test words can thus be expressed as

    W = {w_1, w_2, ..., w_N}    (1)

where N is the number of test words. Each test word w_i can be viewed as a string of phonemes and written as

    w_i = s_1 s_2 ... s_Ni    (2)

where s_j denotes the j-th phoneme and N_i is the number of phonemes contained in the test word w_i. The PNT as a whole can therefore be viewed as one string of phonemes,

    W = s_1 s_2 ... s_M    (3)

where M is the total number of phonemes contained in the test vocabulary W. Each phoneme s_j is modeled with a hidden Markov model (HMM).

B. Design of the clinical-knowledge-based dependency network (DN):

In the clinical procedure of speech evaluation, when a subject is prompted to pronounce a target phoneme s_m, the observation o_m of the corresponding speech signal is captured for comparison. The speech-language pathologist then uses s_m and o_m to manually segment o_m into segments and label them as s̃_m. The dependency network (DN) for this labeling is constructed as shown in Figure 2a. The joint probability of this DN is composed of a set of conditional probability distributions:

    P(s_m, s̃_m, o_m) = P(s̃_m | s_m) P(o_m | s̃_m) P(s_m)    (4)

Since the labeling of s̃_m involves both linguistic and acoustic information, s̃_m can be designed as a node composed of s̃l_m and s̃a_m, where s̃l_m is the labeling result based on linguistic information and s̃a_m is the labeling result based on acoustic information. The DN for the automatic labeling process can then be modeled as shown in Figure 2b, and the conditional probability of the node s̃_m can be written as

    P(s̃_m | s_m, o_m) = P(s̃l_m | s_m)^wl · P(s̃a_m | o_m)^wa    (5)

where wl is the weighting factor for the linguistic information, wa is the weighting factor for the acoustic information, P(s̃l_m | s_m) is the probability based on the linguistic information, and P(s̃a_m | o_m) is the probability based on the acoustic information.

The speech-language pathologist uses the target phoneme s_m and the labeled result s̃_m to identify erroneous pronunciation patterns, and the corresponding dependency network can be designed as shown in the third figure. For a target phoneme s_m, the probability of the erroneous pronunciation pattern E_m can be estimated as

    P(E_m | s_m, o_m) = P(E_m | s_m, s̃_m) P(s̃l_m | s_m)^wl P(s̃a_m | o_m)^wa    (6)

C. Automatic segmentation and labeling:

When a subject is prompted to pronounce a sequence of phonemes w_i = s_1 s_2 ... s_Ni, the observation O_i of the corresponding speech signal is recorded for comparison. The DN defined in Figure 2b is then applied to automatically find the segmentation O_i = o_1 o_2 ... o_Ni and the labeling result with the maximum a posteriori probability, which is expressed as

    W̃_i = argmax P(w_i, W̃_i, O_i) = argmax P(s_1 s̃_1 o_1, s_2 s̃_2 o_2, ..., s_Ni s̃_Ni o_Ni)    (7)
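The weighted combination of linguistic and acoustic probabilities in equation (5) can be sketched as follows. This is a minimal illustration only: the candidate labels and probability values are hypothetical stand-ins, not data or code from the invention.

```python
def label_score(p_linguistic, p_acoustic, w_l=0.5, w_a=0.5):
    """Combined score of equation (5): P(s~|s)^wl * P(s~|o)^wa."""
    return (p_linguistic ** w_l) * (p_acoustic ** w_a)

def best_label(candidates, w_l=0.5, w_a=0.5):
    """Pick the candidate label with the highest combined score."""
    return max(candidates, key=lambda c: label_score(c["p_ling"], c["p_ac"], w_l, w_a))

# Hypothetical candidates for one target phoneme: the linguistically expected
# label "s" versus an acoustically better-matching label "sh".
candidates = [
    {"label": "s",  "p_ling": 0.6, "p_ac": 0.2},
    {"label": "sh", "p_ling": 0.3, "p_ac": 0.7},
]
print(best_label(candidates)["label"])  # → sh
```

Raising w_l toward 1 shifts the decision toward the linguistically expected label, mirroring the role of the weighting factors wl and wa in the description.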

For speakers with speech disorders, the pronunciation information of adjacent phonemes is consistent, and the coarticulation effect is modeled with context-dependent models. Each phoneme is therefore assumed to be independent of the others, and integrating equations (4) and (5) into equation (7) yields

    W̃_i = argmax ∏_{j=1..Ni} P(s̃l_j | s_j)^wl P(s̃a_j | o_j)^wa / ( P(s_j) P(o_j) )    (8)

Because the observations o_j and the targets s_j are constants in each computation, the denominators P(o_j) and P(s_j) can both be omitted to reduce computational complexity. Finally, the probability of the labeling and segmentation result can be computed as

    W̃_i = argmax ∏_{j=1..Ni} P(s̃l_j | s_j)^wl P(s̃a_j | o_j)^wa    (9)

In the present invention, the probability P(s̃a_j | o_j) is computed with the hidden-Markov-model (HMM) method, while the probability P(s̃l_j | s_j) is computed with the maximum-likelihood-estimation (MLE) method as

    P(s̃l_j | s_j) = C(s̃l_j, s_j) / C(s_j)    (10)

where C(·) denotes the number of occurrences over the whole training set.

To accelerate the segmentation and labeling, the Viterbi algorithm provides an efficient way of searching for the most likely labeling result. Furthermore, a pronunciation confusion network (PCN) is constructed to represent the dependency network (DN) and to improve recognition accuracy. The j-th candidate phoneme s̃_ij of a target phoneme s_i is identified by the speech-language pathologist from the collected candidate phonemes. The final state of each phoneme model is connected by a null transition to a collecting state, and the collecting state is connected to the initial state by another null transition whose transition probability is P(s̃_ij | s_i).

D. Confirmation of erroneous pronunciation patterns:

To confirm the subject's i-th erroneous pronunciation pattern, the decision model with M tests shown in the third figure must be applied, and the likelihood of the subject's i-th erroneous pronunciation pattern E^i can be estimated as

    L(E^i) = [ (1/M) ∑_{m=1..M} P(E^i_m | s_m, s̃_m, o_m)^η ]^(1/η)    (11)

where η is a positive number that may serve as the coefficient for a multiple-alternative decision. When η is 1, the likelihood of E^i is the average over all alternatives; when η approaches infinity, the likelihood of E^i becomes max_m P(E^i_m | s_m, s̃_m, o_m). The i-th erroneous pronunciation pattern E^i is confirmed if the following condition holds:

    L(E^i) > H_i    (12)

where H_i is a predefined decision threshold for E^i. Since the observations o_m and the targets s_m are constants in each computation, the denominators P(o_m) and P(s_m) can both be omitted. According to equation (6), equation (11) can be rewritten as

    L(E^i) = [ (1/M) ∑_{m=1..M} ( P(E^i_m | s_m, s̃_m) P(s̃l_m | s_m)^wl P(s̃a_m | o_m)^wa )^η ]^(1/η)    (13)

Using maximum-likelihood estimation (MLE), the term P(E^i_m | s_m, s̃_m) of equation (13) can be estimated as

    P(E^i_m | s_m, s̃_m) = C(E^i_m, s_m, s̃_m) / C(s_m, s̃_m)    (14)

When the labeling result s̃_m is recognized manually by the speech-language pathologist, the conditional probability of equation (13) can be estimated directly, and the likelihood of the manually labeled result can therefore be derived as

    L(E^i) = [ (1/M) ∑_{m=1..M} P(E^i_m | s_m, s̃_m)^η ]^(1/η)    (15)

From the above, the evaluation system for articulation disorders of the present invention has the following advantages:
1. Picture cards are used for continuous-speech input, providing a highly friendly operating interface.
2. Speech recognition technology is applied to detect articulation disorders.
3. Articulation error types are discriminated through probabilistic statistical models.
In sum, the present invention has more practical value and is more progressive than the prior art, and an application for an invention patent is therefore filed.

[Brief Description of the Drawings]
First figure: block diagram of identifying erroneous pronunciation patterns.
Figure 2a: the original manual labeling scheme.
Figure 2b: the improved automatic labeling scheme.
Third figure: schematic diagram of the dependency network that uses s_m to confirm the erroneous pronunciation pattern E_m.
Fourth figure: schematic diagram of the PCN for w_i.

[Description of Main Element Symbols]
None
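The confirmation rule of equations (11) and (12) is a generalized mean followed by a threshold test. A minimal sketch of that rule, with hypothetical per-test probabilities rather than values from the invention:

```python
def pattern_likelihood(probs, eta=1.0):
    """Equation (11): L(E) = ((1/M) * sum_m p_m^eta)^(1/eta).
    eta = 1 gives the plain average; a large eta approaches the maximum."""
    m = len(probs)
    return (sum(p ** eta for p in probs) / m) ** (1.0 / eta)

def confirm(probs, threshold, eta=1.0):
    """Equation (12): the error pattern is confirmed if L(E) exceeds H."""
    return pattern_likelihood(probs, eta) > threshold

# Hypothetical probabilities P(E_m | s_m, s~_m, o_m) over M = 4 tests:
probs = [0.9, 0.1, 0.8, 0.7]
print(pattern_likelihood(probs, eta=1.0))   # average of the four values
print(pattern_likelihood(probs, eta=50.0))  # close to max(probs) = 0.9
print(confirm(probs, threshold=0.5))
```

The choice of η trades off evidence aggregation against single-test dominance, exactly as the description notes: η = 1 averages all alternatives, while a large η lets the strongest single test decide.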

Claims (1)

VII. Claims:
1. An evaluation system for articulation disorders, comprising:
a picture-naming step;
a step of automatically segmenting the speech signal and labeling it;
a step of identifying erroneous pronunciation patterns; and
a dependency-network database step;
whereby a speech pathologist detects the types of articulation errors produced by a person with an articulation disorder, and a speech trainer trains the person with the articulation disorder according to the results.
2. The evaluation system for articulation disorders according to claim 1, wherein the picture-naming step uses a database composed of pictures or photographs.
3. The evaluation system for articulation disorders according to claim 2, wherein the pictures or photographs presented are preferably ones the subject can easily recognize.
4. The evaluation system for articulation disorders according to claim 1, wherein the dependency-network database step comprises a sequence of target phonemes s_m; when the subject pronounces the sequence of phonemes, the observation o_m of the corresponding speech signal is captured for comparison; the speech-language pathologist uses s_m and o_m to manually segment o_m into segments and label them as s̃_m; the labeling process of s̃_m includes linguistic and acoustic information, so s̃_m can be designed as a node composed of s̃l_m and s̃a_m, where s̃l_m is the labeling result based on the linguistic information and s̃a_m is the labeling result based on the acoustic information; the dependency network (DN) for automatic labeling can thereby be modeled.
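The four claimed steps can be read as a pipeline. The sketch below illustrates that flow only; every function name, the "|"-delimited stand-in for a segmented signal, and the error-pattern table are illustrative assumptions, not the patent's implementation.

```python
def picture_naming(item):
    """Step 1: prompt the subject with a picture and capture the utterance."""
    return {"targets": item["phonemes"], "signal": item["recording"]}

def segment_and_label(utterance):
    """Step 2: split the signal into per-phoneme segments and label each one
    (standing in for the DN/HMM labeling described above)."""
    segments = utterance["signal"].split("|")  # pretend segmentation is done
    return list(zip(utterance["targets"], segments))

def identify_errors(labeled):
    """Step 3: flag segments whose label differs from the target phoneme."""
    return [(t, s) for t, s in labeled if t != s]

def consult_dn_database(errors, dn_patterns):
    """Step 4: map each detected error to a known pattern in the DN database."""
    return [dn_patterns.get(e, "unknown pattern") for e in errors]

item = {"phonemes": ["sh", "u"], "recording": "s|u"}  # subject says "s" for "sh"
dn = {("sh", "s"): "fronting"}  # one hypothetical error-pattern entry
labeled = segment_and_label(picture_naming(item))
print(consult_dn_database(identify_errors(labeled), dn))  # → ['fronting']
```

In the claimed system, step 2 would be driven by the DN with HMM acoustic scores and step 4 by clinically catalogued error patterns; here both are reduced to table lookups to keep the flow visible.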
TW97148626A 2008-12-12 2008-12-12 Evaluation system for sound construction anomaly TW201023176A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97148626A TW201023176A (en) 2008-12-12 2008-12-12 Evaluation system for sound construction anomaly

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW97148626A TW201023176A (en) 2008-12-12 2008-12-12 Evaluation system for sound construction anomaly

Publications (1)

Publication Number Publication Date
TW201023176A true TW201023176A (en) 2010-06-16

Family

ID=44833290

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97148626A TW201023176A (en) 2008-12-12 2008-12-12 Evaluation system for sound construction anomaly

Country Status (1)

Country Link
TW (1) TW201023176A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI766575B (en) * 2021-02-05 2022-06-01 國立陽明交通大學 System and method for improving speech conversion efficiency of articulatory disorder
TWI780738B (en) * 2021-05-28 2022-10-11 宇康生科股份有限公司 Abnormal articulation corpus amplification method and system, speech recognition platform, and abnormal articulation auxiliary device


Similar Documents

Publication Publication Date Title
CN110556129B (en) Bimodal emotion recognition model training method and bimodal emotion recognition method
Narayanan et al. Behavioral signal processing: Deriving human behavioral informatics from speech and language
Schuller et al. Affective and behavioural computing: Lessons learnt from the first computational paralinguistics challenge
Busso et al. Iterative feature normalization scheme for automatic emotion detection from speech
McKechnie et al. Automated speech analysis tools for children’s speech production: A systematic literature review
CN106782603B (en) Intelligent voice evaluation method and system
Kadi et al. Fully automated speaker identification and intelligibility assessment in dysarthria disease using auditory knowledge
Pompili et al. The INESC-ID multi-modal system for the ADReSS 2020 challenge
Parish-Morris et al. Exploring autism spectrum disorders using HLT
Levitan et al. Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection.
Samareh et al. Detect depression from communication: How computer vision, signal processing, and sentiment analysis join forces
Qin et al. Automatic speech assessment for aphasic patients based on syllable-level embedding and supra-segmental duration features
Lopez-Otero et al. Depression Detection Using Automatic Transcriptions of De-Identified Speech.
Kantithammakorn et al. Using automatic speech recognition to assess Thai speech language fluency in the Montreal cognitive assessment (MoCA)
Qadri et al. A critical insight into multi-languages speech emotion databases
Shahin et al. Automatic screening of children with speech sound disorders using paralinguistic features
Deepa et al. Speech technology in healthcare
Jothi et al. A systematic review of machine learning based automatic speech assessment system to evaluate speech impairment
TW201023176A (en) Evaluation system for sound construction anomaly
Ng et al. Automatic detection of phonological errors in child speech using siamese recurrent autoencoder
Klumpp et al. The phonetic footprint of covid-19?
Anthony et al. A Review on Speech Disorders and Processing of Disordered Speech
Smith et al. Factors underlying short-term fundamental frequency variation during vocal onset and offset
Zhan et al. Application of machine learning and image target recognition in English learning task
Safdar et al. Prediction of Specific Language Impairment in Children using Cepstral Domain Coefficients