TW200818117A

TW200818117A - Lexicon database implementation method for audio recognition system and search/match method thereof

Info

Publication number: TW200818117A
Application number: TW95137548A
Authority: TW
Inventors: Chung-Po Liao
Original assignee: Inventec Besta Co Ltd
Priority date: 2006-10-12
Filing date: 2006-10-12
Publication date: 2008-04-16
Also published as: TWI299854B

Abstract

A lexicon database implementation method for an audio recognition system includes the following steps: (a) providing heteronym data; (b) inputting a lexicon; (c) matching against the heteronym data to determine whether the lexicon includes at least one heteronym; if so, establishing a plurality of acoustic models for the plurality of pronunciations of the heteronyms in the lexicon; if not, establishing a single acoustic model for the lexicon; and (d) saving the lexicon and the corresponding acoustic models to the lexicon database.

Description

VII. Designated representative figure:
(1) The designated representative figure of this case is Figure 1.
(2) Brief description of the reference symbols in the representative figure: S11~S16: process steps.

VIII. If this case involves a chemical formula, disclose the chemical formula that best shows the features of the invention.

IX. Description of the invention:

[Technical Field]
The present invention relates to a method of building a lexicon database for a speech recognition system and to a method of searching and matching against that database. In particular, it relates to a lexicon database building method that supports heteronyms (破音字, characters with more than one pronunciation) and to a more efficient search/match method.
[Prior Art]
Conventional speech recognition systems do not handle heteronyms. As a result, users must pronounce a heteronym with one particular reading before the system will recognize it. For example, the character 行 in the personal name 陳力行 (Chen Lixing) must be pronounced ㄏㄤˊ (háng) to be recognized. If the user says ㄒㄧㄥˊ (xíng), the name is not recognized correctly. Likewise, the 樂 in 樂團 ("band") must be pronounced ㄌㄜˋ (lè), and the reading ㄩㄝˋ (yuè) is not recognized. This way of entering speech differs considerably from the pronunciation habits of ordinary users.

In addition, a speech recognition system usually performs recognition by using the Viterbi algorithm to compute, for each character of a lexicon entry, the probability value of the corresponding acoustic model. This computation accounts for most of the system's computational load. Repeatedly computing the same characters therefore adds unnecessary work and slows recognition. This led us to consider how to avoid recomputing identical characters and so reduce the overall amount of computation.

Drawing on many years of research and practical experience, the inventor proposes a lexicon database building method and a search/match method for a speech recognition system as a way of realizing the above goals.

[Summary of the Invention]
In view of the above problems, an object of the present invention is to provide a lexicon database building method and a search/match method for a speech recognition system. In particular, the invention provides a lexicon database building method that supports heteronyms and a more efficient search/match method.

To achieve the above object, the lexicon database building method for a speech recognition system according to the present invention comprises the following steps:
(a) providing heteronym data;
(b) inputting a lexicon entry;
(c) matching the entry against the heteronym data to determine whether it contains at least one heteronym; if so, building a plurality of acoustic models corresponding respectively to the plurality of pronunciations of the heteronym(s) contained in the entry; if not, building a single corresponding acoustic model for the entry; and
(d) storing the entry and its corresponding acoustic models in the lexicon database.

As described above, the lexicon database building method and search/match method of the present invention can build a lexicon database that supports heteronyms. This makes the speech recognition system more user-friendly and closer to the pronunciation habits of ordinary users. In addition, the search/match method of the present invention avoids redundant computation.

To give the examiners a better understanding of the technical features and effects of the present invention, preferred embodiments and related drawings are described in detail below.

[Embodiments]
The lexicon database building method and search/match method according to preferred embodiments of the present invention are described below with reference to the drawings, in which identical elements are denoted by identical reference symbols.

The speech recognition system of the present invention performs recognition mainly with hidden Markov models (HMMs). An HMM describes pronunciation with a probability model and treats the articulation of a short segment of speech as a sequence of state transitions in a Markov model. The speech features used in recognition are Mel-Frequency Cepstrum Coefficients (MFCCs). Besides accounting for the human ear's sensitivity to different frequencies, MFCCs also reflect the characteristics of the vocal-tract model and the excitation signal. As a result, recognition is not affected by the speaker's volume or by the five tones of Mandarin (the first, second, third and fourth tones and the neutral tone).

Based on these properties, the heteronyms suitable for this system were selected from 245 Chinese heteronyms. Because the feature parameters used in recognition are MFCCs, heteronyms whose readings differ only in tone are excluded. For example, the heteronym 少 has two readings, ㄕㄠˇ (shǎo) and ㄕㄠˋ (shào), which differ only in tone, so it is discarded. The characters that remain form the heteronym data. They include, among others: 行, 仔, 樂, 和, 重, 說, 乾, 曾, 沈, 冒, 沒, 校, 從, 都, 落, 朝, 傳, 單, 彷, 調, 參, 黏, 省, 塞, 差, 蓋, 傍, 般, 識, 暴, 熟, 模, 給, 薄, 告, 嚇, 藏, 還, 翟, 騎, 繫, 覺, 露, 屬 and 攪.

Figure 1 is a flowchart of the lexicon database building method of the speech recognition system of the present invention. Its steps are as follows:
Step S11: provide heteronym data;
Step S12: input a lexicon entry;
Step S13: match the entry against the heteronym data to determine whether it contains at least one heteronym; if so, build a plurality of corresponding acoustic models for the plurality of pronunciations of the heteronym(s) contained in the entry; if not, build a single corresponding acoustic model for the entry; and
finally, store the entry and its acoustic models in the lexicon database.
The heteronym data comprise a plurality of heteronyms and their pronunciations, and each acoustic model is a hidden Markov model.
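The building method of Figure 1 can be sketched in Python. The sketch covers the heteronym check and storage, together with the rule that heteronyms differing only in tone are discarded. It is a minimal illustration under assumptions. The heteronym table, the pinyin readings with tone digits, and the helper names (`filter_heteronyms`, `build_lexicon`, `SINGLE_READING`) are hypothetical. Each pronunciation variant, a tuple of toneless syllables, stands in for the HMM that a real system would build from it.

```python
from itertools import product

# Hypothetical heteronym data: character -> readings in pinyin with a tone digit.
RAW_HETERONYMS = {
    "行": ["xing2", "hang2"],
    "樂": ["yue4", "le4"],
    "少": ["shao3", "shao4"],  # readings differ only in tone
}

# Hypothetical toneless reading for characters with a single pronunciation.
SINGLE_READING = {"陳": "chen", "力": "li", "團": "tuan"}


def strip_tone(reading):
    """Drop the trailing tone digit, e.g. 'hang2' -> 'hang'."""
    return reading.rstrip("012345")


def filter_heteronyms(raw):
    """Keep only characters whose readings still differ once tone is removed
    (MFCC features do not separate tones, so tone-only heteronyms are dropped)."""
    kept = {}
    for ch, readings in raw.items():
        toneless = sorted({strip_tone(r) for r in readings})
        if len(toneless) > 1:
            kept[ch] = toneless
    return kept


def build_lexicon(entries, heteronyms, single_reading):
    """Heteronym check and storage: one pronunciation variant per reading
    combination. Each variant stands in for one acoustic model (an HMM chain)."""
    database = {}
    for entry in entries:
        choices = [heteronyms[ch] if ch in heteronyms else [single_reading[ch]]
                   for ch in entry]
        database[entry] = list(product(*choices))  # exactly one variant if no heteronym
    return database


heteronyms = filter_heteronyms(RAW_HETERONYMS)
lexicon_db = build_lexicon(["陳力行", "樂團"], heteronyms, SINGLE_READING)
for entry, variants in lexicon_db.items():
    print(entry, variants)
```

An entry with no heteronym yields exactly one variant, which matches the "if not" branch of step S13. An entry with n two-reading heteronyms yields 2^n variants.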
Figure 2 is a flowchart of a preferred embodiment of the lexicon database building method of the speech recognition system of the present invention. It uses singer names as an example and builds a singer-name lexicon database. Its steps are as follows:
Step S21: read in a singer name;
Step S22: match the name against the heteronym data to determine whether it contains at least one heteronym; if so, go to step S23; if not, go to step S24;
Step S23: add a set of names in which the heteronyms take their alternative pronunciations;
Step S24: convert each character of the name into its hidden Markov model representation;
Step S25: check whether the last singer name has been read; and
Step S26: end initialization and enter the recognition process.

With the lexicon database built by the present invention, users can pronounce names according to their usual habits and still obtain correct recognition results.

In addition, a Chinese character can be decomposed into an initial (聲母) and a final (韻母). The initial is at the beginning of the syllable and the final at the end, so each Chinese character is represented by the acoustic models of its initial and final. Suppose the entries in the lexicon database are sorted character by character and the probability values of the previous entry's characters are recorded. The current probability value then needs to be computed only where the current entry's pronunciation differs from that of the previous entry. This reduces the computation required during search and matching, because homophonous characters do not have to be recomputed.
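The sorting idea can be sketched in a few lines of Python. This is a minimal sketch that assumes a hypothetical table splitting each character into a toneless initial and final (pinyin spelling). It treats two characters as "the same" when their initial/final pairs match. The code orders names so that shared pronunciation prefixes are adjacent and counts how many leading characters each entry can inherit from its predecessor. The names `SYLLABLES`, `pronunciation` and `plan_search` are illustrative only.

```python
# Hypothetical initial/final table (toneless pinyin); a real system would
# derive this from its pronunciation dictionary.
SYLLABLES = {
    "陳": ("ch", "en"),
    "力": ("l", "i"),
    "行": ("x", "ing"),
    "宏": ("h", "ong"),
    "林": ("l", "in"),
}


def pronunciation(name):
    """Return the name as a tuple of (initial, final) pairs, one per character."""
    return tuple(SYLLABLES[ch] for ch in name)


def shared_prefix(a, b):
    """Number of leading characters whose pronunciations are identical."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def plan_search(names):
    """Sort by pronunciation so entries with the same prefix are adjacent, then
    report (name, characters, characters inherited from the previous entry)."""
    plan, prev = [], ()
    for name in sorted(names, key=pronunciation):
        pron = pronunciation(name)
        plan.append((name, len(pron), shared_prefix(prev, pron)))
        prev = pron
    return plan


plan = plan_search(["陳力行", "林宏", "陳力宏"])
for row in plan:
    print(row)
total = sum(chars for _, chars, _ in plan)
inherited = sum(k for _, _, k in plan)
print(f"{inherited} of {total} characters reuse an earlier result")
```

Sorting on the pronunciation tuple rather than on the characters themselves makes homophones with different characters land next to each other. This is the property the inheritance step relies on.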
Figure 3 is a flowchart of the lexicon database search/match method of the speech recognition system of the present invention. Its steps are as follows:
Step S31: provide a lexicon database comprising a plurality of lexicon entries. The entries are sorted so that entries with the same prefix are adjacent, and they correspond one-to-one to a plurality of acoustic models;
Step S32: input a speech signal;
Step S33: extract a feature parameter from the speech signal;
Step S34: compare the feature parameter with the acoustic models of the entries one by one. Each acoustic model produces a probability value for the feature parameter, and each entry inherits the probability values already produced for the identically pronounced characters of the previous adjacent entry; and
Step S35: recognize the speech signal according to the probability values of the entries.
The acoustic models are hidden Markov models, the feature parameters are Mel-Frequency Cepstrum Coefficients, and the probability values are computed with the Viterbi algorithm.
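The inheritance in step S34 can be made exact rather than approximate, given one assumption that the patent does not state. The assumption is that each entry's acoustic model is a left-to-right chain of HMM states, built by concatenating the initial/final models of its characters. Write $a_{ij}$ for transition probabilities, $b_j(\mathbf{o}_t)$ for the output density of state $j$ at frame $t$, and $\delta_j(t)$ for the best log score of being in state $j$ at frame $t$ ($j = 1,\dots,N$; $t = 1,\dots,T$). The log-domain Viterbi recursion is then:

$$
\delta_1(1) = \log b_1(\mathbf{o}_1), \qquad \delta_j(1) = -\infty \quad (j > 1)
$$

$$
\delta_1(t) = \delta_1(t-1) + \log a_{11} + \log b_1(\mathbf{o}_t), \qquad t > 1
$$

$$
\delta_j(t) = \log b_j(\mathbf{o}_t) + \max\bigl(\delta_j(t-1) + \log a_{jj},\ \delta_{j-1}(t-1) + \log a_{j-1,j}\bigr), \qquad j > 1,\ t > 1
$$

$$
\text{score} = \delta_N(T)
$$

Row $\delta_j(\cdot)$ depends only on rows $1,\dots,j$. Suppose two adjacent entries use identical models for their first $m$ states. Their rows $\delta_1,\dots,\delta_m$ then coincide, and the second entry only needs rows $m+1,\dots,N$. Treat each initial or final as a single state, a simplification. For 陳力行 and 陳力宏 this gives $N = 6$ and $m = 4$ (the models of 陳力), so only the two rows for 宏 are new.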

Consider, for example, a singer-name lexicon database containing 692 singer names.

Figure 4 is a flowchart of a preferred embodiment of the lexicon database search/match method of the speech recognition system of the present invention. Its steps are as follows:
Step S41: input the Mel-Frequency Cepstrum Coefficients of the speech;
Step S42: read in a singer-name model;
Step S43: determine whether the current singer name begins with the same pronunciation as the previous singer name; if so, go to step S44; if not, go to step S45;
Step S44: reuse the previous name's probabilities for the identically pronounced characters, then continue with the characters whose pronunciation differs;
Step S45: compute the probability with the Viterbi algorithm;
Step S46: store the probability of each character of the current singer name;
Step S47: check whether probabilities have been computed for all singer names; if so, go to step S48; if not, repeat from step S42; and
Step S48: list the five singer names with the highest probabilities.

Take the singer name 陳力行 (Chen Lixing) as an example.
Its first two characters, 陳力, are the same as those of the singer name 陳力宏 (Chen Lihong). During the Viterbi computation, the input MFCCs are first scored against the six acoustic models that represent 陳力行. When the input speech is then scored against 陳力宏, the probability already obtained for the two characters 陳力 is used directly. Only the probability of 宏 is added to obtain the complete probability of 陳力宏.

The above description is illustrative only and not restrictive. Any equivalent modification or change that does not depart from the spirit and scope of the present invention shall be included in the scope of the appended claims.

[Brief Description of the Drawings]
Figure 1 is a flowchart of the lexicon database building method of the speech recognition system of the present invention;
Figure 2 is a flowchart of a preferred embodiment of the lexicon database building method of the speech recognition system of the present invention;
Figure 3 is a flowchart of the lexicon database search/match method of the speech recognition system of the present invention; and
Figure 4 is a flowchart of a preferred embodiment of the lexicon database search/match method of the speech recognition system of the present invention.

[Description of Main Reference Symbols]
S11~S16: process steps;
S21~S26: process steps;
S31~S35: process steps; and
S41~S48: process steps.
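The Figure 4 flow and the 陳力行/陳力宏 example can be sketched in Python. This is a toy under stated assumptions, not the patent's implementation:
- Each initial or final is modeled as a single left-to-right HMM state with a 1-D unit-variance Gaussian output density.
- All transition probabilities are 0.5.
- The frames are scalar stand-ins for MFCC vectors.

The names `search`, `add_rows`, `full_score`, `MEANS` and `LEXICON` are hypothetical. The "inherited probability" of step S44 is kept as the Viterbi trellis rows of the shared characters rather than as a single number. That is what makes the shortcut give exactly the same score as a full computation.

```python
import math

NEG_INF = float("-inf")
LOG_STAY = math.log(0.5)  # assumed self-loop probability of every state
LOG_NEXT = math.log(0.5)  # assumed probability of advancing to the next state


def log_emission(mean, x):
    """Toy output density: 1-D unit-variance Gaussian log-likelihood
    with the constant term dropped."""
    return -0.5 * (x - mean) ** 2


def add_rows(rows, means, frames):
    """Append one Viterbi trellis row per state of a left-to-right chain.
    rows[j][t] is the best log score of being in state j at frame t."""
    for mean in means:
        prev = rows[-1] if rows else None
        row = []
        for t, x in enumerate(frames):
            if prev is None:        # very first state of the entry
                best = 0.0 if t == 0 else row[t - 1] + LOG_STAY
            elif t == 0:            # later states are unreachable at frame 0
                best = NEG_INF
            else:
                best = max(row[t - 1] + LOG_STAY, prev[t - 1] + LOG_NEXT)
            row.append(best + log_emission(mean, x))
        rows.append(row)
    return rows


def full_score(frames, syllables, means):
    """Reference: score one entry from scratch, without inheritance."""
    units = [means[u] for syl in syllables for u in syl]
    return add_rows([], units, frames)[-1][-1]


def search(frames, lexicon, means, top_n=5):
    """Score every entry, reusing the trellis rows of the characters it shares
    with the previous entry (S42 to S47), then rank the entries (S48).
    Returns the top_n (score, name) pairs and the number of rows computed."""
    rows, prev, bounds = [], (), []  # bounds[i]: row count after character i
    computed, scores = 0, []
    for name in sorted(lexicon, key=lexicon.get):  # same prefixes adjacent
        syllables = lexicon[name]
        k = 0
        while k < min(len(prev), len(syllables)) and prev[k] == syllables[k]:
            k += 1
        rows = rows[:bounds[k - 1]] if k else []   # inherit shared characters
        bounds = bounds[:k]
        for syl in syllables[k:]:                  # compute only the rest
            add_rows(rows, [means[u] for u in syl], frames)
            computed += len(syl)
            bounds.append(len(rows))
        scores.append((rows[-1][-1], name))        # end state at the last frame
        prev = syllables
    scores.sort(reverse=True)
    return scores[:top_n], computed


MEANS = {"ch": 0.0, "en": 1.0, "l": 2.0, "i": 3.0, "h": 4.0, "ong": 5.0,
         "x": 6.0, "ing": 7.0, "in": 2.5}
LEXICON = {
    "陳力行": (("ch", "en"), ("l", "i"), ("x", "ing")),
    "陳力宏": (("ch", "en"), ("l", "i"), ("h", "ong")),
    "林宏": (("l", "in"), ("h", "ong")),
}
frames = [0.1, 0.9, 2.1, 2.9, 4.2, 5.1, 5.0]  # stand-in for MFCC frames (S41)
top, computed = search(frames, LEXICON, MEANS)
print(top[0][1], computed)  # 陳力宏 12
```

With these toy inputs the three entries need 12 trellis rows instead of 16, because the second 陳力 entry reuses the four rows already computed for 陳力. The toy sort key happens to put 陳力宏 before 陳力行. The saving would be the same in the opposite order.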

Claims

X. Scope of the patent claims:

1. A lexicon database building method for a speech recognition system, comprising at least:
providing heteronym data;
inputting a lexicon entry;
matching the entry against the heteronym data to determine whether the entry contains at least one heteronym; if so, building a plurality of corresponding acoustic models for the plurality of pronunciations of the heteronym(s) contained in the entry; if not, building a single corresponding acoustic model for the entry; and
storing the entry and the acoustic models in the lexicon database.

2. The lexicon database building method for a speech recognition system as claimed in claim 1, wherein the heteronym data comprise a plurality of heteronyms and the pronunciations of the heteronyms.

3. The lexicon database building method for a speech recognition system as claimed in claim 1, wherein the acoustic model is a hidden Markov model (HMM).

4. A lexicon database search/match method for a speech recognition system, comprising at least:
providing a lexicon database comprising a plurality of lexicon entries, the entries being sorted so that entries with the same prefix are adjacent and corresponding one-to-one to a plurality of acoustic models;
inputting a speech signal;
extracting a feature parameter of the speech signal;
comparing the feature parameter with the acoustic models of the entries one by one, each acoustic model producing a probability value corresponding to the feature parameter, wherein each entry inherits the probability values produced for the identically pronounced characters in the previous adjacent entry; and
recognizing the speech signal according to the probability values of the entries.

5. The lexicon database search/match method for a speech recognition system as claimed in claim 4, wherein the acoustic model is a hidden Markov model.

6. The lexicon database search/match method for a speech recognition system as claimed in claim 4, wherein the feature parameter is a Mel-Frequency Cepstrum Coefficient (MFCC).

7. The lexicon database search/match method for a speech recognition system as claimed in claim 4, further comprising computing the probability value by using the Viterbi algorithm.