200818117

VII. Designated representative drawing:
(1) The representative drawing designated for this application is Figure 1.
(2) Brief description of the reference symbols in the representative drawing:
S11~S16: process steps.

VIII. If this application contains a chemical formula, disclose the formula that best shows the features of the invention:

IX. Description of the invention:

[Technical Field of the Invention]
The present invention provides a method for building a vocabulary database for a speech recognition system and a method for searching and comparing against that database. In particular, it provides a vocabulary database construction method that supports polyphonic characters (破音字, characters with more than one reading) and a more efficient search and comparison method.

[Prior Art]
Conventional speech recognition systems do not handle polyphonic characters, so during voice input the user must pronounce a polyphonic character with its other reading for recognition to succeed. For example, the character 「行」 in the personal name 陳力行 must be pronounced 「ㄏㄤˊ」 (háng) to be recognized; if the user pronounces it 「ㄒㄧㄥˊ」 (xíng), it is not recognized correctly. Likewise, the character 「樂」 in 樂團 (band) must be pronounced 「ㄌㄜˋ」 (lè) to be recognized; if it is pronounced 「ㄩㄝˋ」 (yuè), it is not recognized correctly either. This way of speaking differs greatly from how ordinary users pronounce these words.

In addition, a speech recognition system usually recognizes speech by using the Viterbi algorithm to compute the probability value of the acoustic model corresponding to each character of a word. This computation is where the system spends most of its processing, so repeatedly computing the same characters adds unnecessary load and slows recognition. This led us to consider how to avoid recomputing identical characters in order to reduce the overall amount of computation.

Drawing on years of research and practical experience, and after extensive study, design, and investigation, the inventors propose a vocabulary database construction method for a speech recognition system, and a search and comparison method for it, as a way of realizing the aims above.
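For reference, the Viterbi scoring that the conventional approach repeats for every character can be sketched as follows. This is a minimal illustration, not the implementation of any particular recognizer: the function name `viterbi_log_score` and the toy numbers are assumptions made for the example, and the per-frame emission log-likelihoods (which a real system would derive from MFCC frames and trained HMM states) are assumed to be computed elsewhere.

```python
import math
from typing import Sequence


def viterbi_log_score(
    log_init: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    log_emit: Sequence[Sequence[float]],
) -> float:
    """Return the log-probability of the single best state path.

    log_init[s]     : log P(first state is s)
    log_trans[r][s] : log P(next state is s | current state is r)
    log_emit[t][s]  : log-likelihood of frame t under state s (at least one frame)
    """
    n_states = len(log_init)
    # Best score of any path that ends in state s after the first frame.
    delta = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    for frame in log_emit[1:]:
        # Extend every best path by one frame, keeping the best predecessor.
        delta = [
            max(delta[r] + log_trans[r][s] for r in range(n_states)) + frame[s]
            for s in range(n_states)
        ]
    return max(delta)


# Toy two-state model and three frames (invented numbers, illustration only).
log = math.log
demo_init = [log(0.6), log(0.4)]
demo_trans = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]
demo_emit = [[log(0.5), log(0.1)], [log(0.4), log(0.3)], [log(0.1), log(0.6)]]

demo_score = viterbi_log_score(demo_init, demo_trans, demo_emit)
print(round(math.exp(demo_score), 5))  # 0.01512
```

Every frame costs on the order of the square of the number of states, which is why repeating this scoring for characters that have already been scored is wasteful.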
[Summary of the Invention]
In view of the problems above, the object of the present invention is to provide a vocabulary database construction method for a speech recognition system and a search and comparison method for it, in particular a construction method that supports polyphonic characters and a more efficient search and comparison method.

To achieve this object, the vocabulary database construction method for a speech recognition system according to the present invention comprises the following steps:
(a) providing polyphonic-character data;
(b) inputting a word;
(c) comparing the word against the polyphonic-character data to determine whether the word contains at least one polyphonic character; if so, building a plurality of acoustic models, one for each pronunciation of the polyphonic character contained in the word; if not, building a single corresponding acoustic model for the word; and
(d) storing the word and its corresponding acoustic model or models in the vocabulary database.

As described above, the construction method and the search and comparison method according to the present invention make it possible to build a vocabulary database that supports polyphonic characters, so that the speech recognition system is more user-friendly and closer to the pronunciation habits of ordinary users. In addition, the search and comparison method according to the present invention avoids repeated computation by the system.

To give the examiners a fuller understanding of the technical features of the present invention and the effects it achieves, preferred embodiments and the related drawings are provided below, together with a detailed description.

[Embodiments]
The vocabulary database construction method for a speech recognition system and its search and comparison method according to preferred embodiments of the present invention are described below with reference to the related drawings, in which the same elements are denoted by the same reference symbols.
The speech recognition system of the present invention performs recognition mainly with Hidden Markov Models (HMMs). An HMM describes pronunciation with a probabilistic model, treating the production of a short segment of speech as a sequence of successive state transitions in a Markov model. The speech feature parameters used in recognition are Mel-frequency cepstral coefficients (MFCCs). Besides taking into account how the human ear perceives different frequencies, MFCCs also reflect the characteristics of the vocal-tract model and the excitation signal, so recognition is not affected by how loudly the speaker talks or by the five tones of Mandarin (the first, second, third, and fourth tones and the neutral tone).

Based on these properties, the polyphonic characters suited to this system are selected from 245 Chinese polyphonic characters. Because the feature parameters used in recognition are Mel-frequency cepstral coefficients, polyphonic characters whose readings differ only in tone are not included among the characters to be handled. For example, 「少」 has two readings, 「ㄕㄠˇ」 (shǎo) and 「ㄕㄠˋ」 (shào); they differ only in tone, so this character is discarded. The characters that remain after this screening form the polyphonic-character data, which include roughly the following: 行、仔、樂、和、重、說、乾、…、曾、沈、冒、沒、校、從、都、落、朝、傳、單、彷、…、調、參、黏、省、塞、差、蓋、傍、般、識、…、暴、熟、模、給、薄、告、嚇、藏、還、翟、…、繫、覺、露、屬、攪, and so on.
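The screening rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as actually implemented: readings are written as pinyin with a trailing tone digit (the specification itself uses zhuyin), the names `strip_tone` and `select_polyphones` are hypothetical, and collapsing the retained readings to their toneless forms is a design choice made for the example.

```python
from typing import Dict, List


def strip_tone(reading: str) -> str:
    """Drop the trailing tone digit from a pinyin reading such as 'shao3'."""
    return reading.rstrip("012345")


def select_polyphones(candidates: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep characters that still have two or more readings once tone is ignored."""
    selected = {}
    for char, readings in candidates.items():
        toneless = sorted({strip_tone(r) for r in readings})
        if len(toneless) >= 2:
            selected[char] = toneless
    return selected


# Three candidates taken from the examples in the text.
candidates = {
    "少": ["shao3", "shao4"],  # differs only in tone, so it is discarded
    "行": ["xing2", "hang2"],
    "樂": ["le4", "yue4"],
}
polyphone_data = select_polyphones(candidates)
print(polyphone_data)  # {'行': ['hang', 'xing'], '樂': ['le', 'yue']}
```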
Referring to Figure 1, which is a flowchart of the steps of the vocabulary database construction method for the speech recognition system of the present invention, the steps are as follows:
Step S11: provide polyphonic-character data;
Step S12: input a word;
Step S13: compare the word against the polyphonic-character data to determine whether the word contains at least one polyphonic character; if so, build a plurality of acoustic models, one for each pronunciation of the polyphonic character contained in the word; if not, build a single corresponding acoustic model for the word; and
Step S14: store the word and its acoustic model or models in the vocabulary database.
The polyphonic-character data comprise a plurality of polyphonic characters and their pronunciations, and each acoustic model is a Hidden Markov Model.

Referring to Figure 2, which is a flowchart of the steps of a preferred embodiment of the construction method, a vocabulary database of singer names is built as an example. The steps are as follows:
Step S21: read in a singer name;
Step S22: compare the name against the polyphonic-character data to determine whether it contains at least one polyphonic character; if so, go to step S23; if not, go to step S24;
Step S23: add a set of names in which the polyphonic character is replaced by its other pronunciation;
Step S24: convert each character of the name or names into its Hidden Markov Model representation;
Step S25: determine whether the last singer name has been read; and
Step S26: end initialization and enter the recognition flow.

A vocabulary database built according to the present invention supports polyphonic characters, so users can speak with their customary pronunciation and still obtain the correct recognition result.
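The construction flow can be sketched as follows. This is a hedged illustration rather than the patented implementation: readings are written as toneless pinyin, the `lexicon` of single-reading characters and the function names are hypothetical, and each initial or final is represented only by its name, standing in for a trained Hidden Markov Model.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Pinyin initials; two-letter initials come first so "zh" is not read as "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")


def split_syllable(syllable: str) -> Tuple[str, str]:
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # syllable without an initial


def build_vocabulary(
    words: Sequence[str],
    lexicon: Dict[str, str],
    polyphones: Dict[str, List[str]],
) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (word, acoustic-unit sequence) entries, one per pronunciation."""
    database = []
    for word in words:  # S12 / S21: take the next word
        # S13 / S22: a polyphonic character contributes all of its readings.
        choices = [polyphones[ch] if ch in polyphones else [lexicon[ch]]
                   for ch in word]
        for reading in product(*choices):  # S23: one entry per combination
            # S24: each syllable becomes an initial unit plus a final unit.
            units = tuple(u for syl in reading
                          for u in split_syllable(syl) if u)
            database.append((word, units))  # S14: store word and its models
    return database


lexicon = {"陳": "chen", "力": "li", "宏": "hong"}
polyphones = {"行": ["hang", "xing"]}
database = build_vocabulary(["陳力行", "陳力宏"], lexicon, polyphones)
for entry in database:
    print(entry)
```

In this sketch the name 陳力行 yields two entries, one per reading of 「行」, while 陳力宏 yields one; every entry consists of six units, one initial and one final per character.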
In addition, a Chinese character can be decomposed into an initial and a final: the initial is at the front of the syllable and the final at its end, and each Chinese character is represented by the acoustic models of its initial and its final. Therefore, if the words in the vocabulary database are sorted by their characters and the probability values of the characters pronounced the same as in the previous word are recorded, then only the probability values of the characters in the current word that are pronounced differently from the previous word need to be computed. The probability values of the same-sounding characters are not recomputed, which reduces the computation needed for search and comparison.

Referring to Figure 3, which is a flowchart of the steps of the vocabulary database search and comparison method for the speech recognition system of the present invention, the steps are as follows:
Step S31: provide a vocabulary database containing a plurality of words, the words being stored in sorted order and corresponding one-to-one to a plurality of acoustic models;
Step S32: input a speech signal;
Step S33: extract a feature parameter of the speech signal;
Step S34: compare the feature parameter against the acoustic models of the words one by one, each acoustic model producing a probability value for the feature parameter, wherein each word inherits the probability values produced by the identically pronounced characters of the preceding adjacent word; and
Step S35: recognize the speech signal from the probability values of the words.
The acoustic model is a Hidden Markov Model, the feature parameter is a set of Mel-frequency cepstral coefficients, and the probability values are computed with the Viterbi algorithm.

Taking a vocabulary database of singer names as an example, suppose it contains 692 singer names in total … the probabilities of the same-sounding characters are recorded, so when the current singer name is computed …

Referring to Figure 4, which is a flowchart of the steps of a preferred embodiment of the search and comparison method, the steps are as follows:
Step S41: input the Mel-frequency cepstral coefficients of the speech;
Step S42: read in a singer-name model;
Step S43: determine whether the current singer name has characters pronounced the same as the previous entry; if so, go to step S44; if not, go to step S45;
Step S44: substitute the recorded probabilities for the characters pronounced the same, then continue the probability computation from the characters pronounced differently;
Step S45: compute the probability with the Viterbi algorithm;
Step S46: store the probability of each character of the current singer name;
Step S47: determine whether probabilities have been computed for all singer names; if so, go to step S48; if not, repeat from step S42; and
Step S48: list the five singer names with the highest probabilities.
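The search flow of steps S41 to S48 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each character's reading is treated as one scoring unit, `score_unit` is a hypothetical stand-in for the Viterbi score of that unit's acoustic models against the input features, scores are log-probabilities that add up along a name, and the toy numbers are invented for the demonstration.

```python
from typing import Callable, List, Sequence, Tuple

Entry = Tuple[str, Tuple[str, ...]]  # (name, reading of each character)


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading units that two readings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def rank_vocabulary(
    entries: Sequence[Entry],
    score_unit: Callable[[str], float],
    top_n: int = 5,
) -> Tuple[List[Tuple[float, str]], int]:
    """Score every entry, reusing the scores of the prefix shared with the
    previous entry. Returns (best entries, number of units actually scored)."""
    ordered = sorted(entries, key=lambda e: e[1])  # sorted by pronunciation
    prev_units: Tuple[str, ...] = ()
    prev_scores: List[float] = []  # cumulative score after each previous unit
    totals = []
    computed = 0
    for name, units in ordered:                        # S42
        shared = common_prefix_len(prev_units, units)  # S43
        scores = prev_scores[:shared]                  # S44: inherit
        running = scores[-1] if scores else 0.0
        for unit in units[shared:]:                    # S45: score the rest
            running += score_unit(unit)
            computed += 1
            scores.append(running)
        totals.append((running, name))
        prev_units, prev_scores = units, scores        # S46: remember
    totals.sort(key=lambda t: t[0], reverse=True)      # S48: best first
    return totals[:top_n], computed


# Toy log scores standing in for Viterbi results (invented numbers).
toy_scores = {"chen": -1.0, "li": -1.5, "hang": -4.0, "hong": -3.0, "xing": -0.5}
entries = [
    ("陳力行", ("chen", "li", "hang")),
    ("陳力行", ("chen", "li", "xing")),
    ("陳力宏", ("chen", "li", "hong")),
]
best, computed = rank_vocabulary(entries, toy_scores.__getitem__)
print(best[0], computed)  # (-3.0, '陳力行') 5
```

With these three entries only five units are scored instead of nine, because the second and third entries inherit the cumulative score of their shared first two characters.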
Taking the singer name 「陳力行」 (Chen Lixing) as an example, its first two characters are pronounced the same as those of the singer name 「陳力宏」 (Chen Lihong). Therefore, in the Viterbi computation, the input Mel-frequency cepstral coefficients are first scored against the six acoustic models that represent 「陳力行」. When the input speech is next scored against 「陳力宏」, the probabilities already obtained for the two characters 「陳力」 are reused, and adding the probability of 「宏」 gives the full probability of 「陳力宏」.

The above is illustrative only and not restrictive. Any equivalent modification or alteration that does not depart from the spirit and scope of the present invention shall be included in the appended claims.

[Brief Description of the Drawings]
Figure 1 is a flowchart of the steps of the vocabulary database construction method for the speech recognition system of the present invention;
Figure 2 is a flowchart of the steps of a preferred embodiment of the vocabulary database construction method;
Figure 3 is a flowchart of the steps of the vocabulary database search and comparison method for the speech recognition system of the present invention; and
Figure 4 is a flowchart of the steps of a preferred embodiment of the vocabulary database search and comparison method.

[Description of Main Reference Symbols]
S11~S16: process steps;
S21~S26: process steps;
S31~S35: process steps; and
S41~S48: process steps.