TWI299854B - Lexicon database implementation method for audio recognition system and search/match method thereof - Google Patents

Lexicon database implementation method for audio recognition system and search/match method thereof

Info

Publication number
TWI299854B
Authority
TW
Taiwan
Prior art keywords
vocabulary
word
broken
recognition system
words
Prior art date
Application number
TW95137548A
Other languages
Chinese (zh)
Other versions
TW200818117A (en)
Inventor
Chung Po Liao
Original Assignee
Inventec Besta Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Besta Co Ltd
Priority to TW95137548A
Publication of TW200818117A
Application granted
Publication of TWI299854B

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

VII. Designated representative drawing:
(1) The designated representative drawing of this case is Fig. 1.
(2) Brief description of the reference symbols in the representative drawing:
S11~S16: step flow.

VIII. If this case involves a chemical formula, disclose the formula that best shows the features of the invention:

IX. Description of the invention:

[Technical Field of the Invention]
The present invention provides a lexicon database construction method for a speech recognition system and a search and comparison method thereof, and in particular a lexicon database construction method that supports polyphonic characters (破音字, characters with more than one reading) together with a more efficient search and comparison method.

[Prior Art]
Conventional speech recognition systems do not handle polyphonic characters, so during voice input the user must speak a polyphonic character with its other reading for recognition to succeed. For example, the character 「行」 in the personal name 陳力行 must be pronounced 「ㄏㄤ」 (hang; the tone mark is garbled in the source) to be recognized, and if the user pronounces it 「ㄒㄧㄥˊ」 (xíng) it is not recognized correctly. Likewise, the character 「樂」 in 樂團 (band) must be pronounced 「ㄌㄜˋ」 (lè) to be recognized, and if it is pronounced 「ㄩㄝˋ」 (yuè) it is not recognized correctly. This style of voice input differs greatly from the pronunciation habits of ordinary users.

In addition, during recognition a speech recognition system usually uses the Viterbi algorithm to compute the probability value of the acoustic model corresponding to each character of a word, and this computation is where the system spends most of its processing. If the same characters are computed over and over, the system's computational load grows unnecessarily and its speed drops. This prompted us to consider how to avoid recomputing identical characters so as to reduce the overall amount of computation.

Based on many years of research and practical experience, and after study, design and discussion, the inventor proposes in the present invention a lexicon database construction method for a speech recognition system and a search and comparison method thereof as a way of realizing the above goals.

[Summary of the Invention]
In view of the above, an object of the present invention is to provide a lexicon database construction method for a speech recognition system and a search and comparison method thereof, and in particular a lexicon database construction method that supports polyphonic characters together with a more efficient search and comparison method. To this end, the lexicon database construction method comprises at least the following steps:
(a) providing polyphone data;
(b) inputting a word;
(c) comparing the word against the polyphone data to determine whether the word contains at least one polyphonic character; if so, building a plurality of acoustic models corresponding respectively to the plurality of pronunciations of the polyphonic character contained in the word; if not, building a single corresponding acoustic model for the word; and
(d) storing the word and its corresponding acoustic model(s) in the lexicon database.
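
Steps (a) through (d) can be sketched as follows. This is a minimal illustration, not the patented implementation: a tuple of toneless pinyin syllables stands in for each acoustic model, and the tables `POLYPHONES` and `DEFAULT_READING`, along with every function name, are hypothetical.

```python
from itertools import product

# Step (a): hypothetical polyphone data, character -> list of readings.
# Toneless pinyin is used because the description ignores tone differences.
POLYPHONES = {
    "行": ["xing", "hang"],
    "樂": ["le", "yue"],
}

# Hypothetical single readings for the non-polyphonic characters used below.
DEFAULT_READING = {"陳": "chen", "力": "li", "宏": "hong", "團": "tuan"}


def readings_for(char):
    """Return every reading of a character (one unless it is polyphonic)."""
    if char in POLYPHONES:
        return POLYPHONES[char]
    return [DEFAULT_READING[char]]


def build_entries(word):
    """Step (c): one pronunciation sequence per combination of readings."""
    return [tuple(seq) for seq in product(*(readings_for(c) for c in word))]


def build_lexicon(words):
    """Steps (b) and (d): store each word with its pronunciation sequences."""
    return {word: build_entries(word) for word in words}


lexicon = build_lexicon(["陳力行", "陳力宏", "樂團"])
for word, entries in lexicon.items():
    print(word, entries)
```

A word without polyphonic characters keeps a single entry, while 陳力行 and 樂團 each receive two, one per reading of the polyphonic character.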

As described above, the lexicon database construction method for a speech recognition system and the search and comparison method thereof according to the present invention make it possible to build a lexicon database that handles polyphonic characters, so that the speech recognition system is more user-friendly and closer to the pronunciation habits of ordinary users. In addition, the lexicon database search and comparison method of the present invention avoids repeated computation by the system.

To give the examiners a better understanding of the technical features of the present invention and the effects it achieves, preferred embodiments and the related drawings are provided below, accompanied by a detailed description.

[Embodiments]
The lexicon database construction method for a speech recognition system and the search and comparison method thereof according to preferred embodiments of the present invention are described below with reference to the drawings, in which identical elements are denoted by identical reference symbols.

The speech recognition system of the present invention performs recognition with the Hidden Markov Model (HMM) method. It describes pronunciation with a probabilistic model, treating the pronunciation of a short segment of speech as a sequence of successive state transitions in a Markov model. The speech feature used during recognition is the Mel-Frequency Cepstrum Coefficients (MFCC). Besides taking into account how sensitive the human ear is to different frequencies, the MFCC also separates the characteristics of the vocal tract model from those of the excitation signal, so that recognition is not affected by the speaker's volume or by the five tones of Mandarin speech (first, second, third and fourth tone and the neutral tone).

Based on these characteristics, we select the polyphonic characters for the recognition system of the present invention from 245 Chinese polyphonic characters. Because the feature parameter used in recognition is the MFCC, polyphonic characters whose readings differ only in tone are not included among the polyphonic characters to be handled. For example, one character (illegible in the source) has two readings that differ only in tone, so we discard it. What finally remains is our polyphone data, which roughly includes the characters 行, 仔, 樂, 和, 重, 沒, 校, 從, 都, 落, 朝, 傳, 單, 彷, 強, 調, 參, 黏, 省, 塞, 差, 蓋, 傍, 般, 暴, 熟, 模, 給, 薄, 告, 嚇, 藏, 還, 翟, 識, 騎, 繫, 覺, 露, 屬, 攪 and so on (several entries in the list are illegible in the source).

Please refer to Fig. 1, which is a flowchart of the steps of the lexicon database construction method for a speech recognition system of the present invention. The steps are as follows:
Step S11: providing polyphone data;
Step S12: inputting a word;
Step S13: comparing the word against the polyphone data to determine whether the word contains at least one polyphonic character; if so, building a plurality of acoustic models corresponding respectively to the plurality of pronunciations of the polyphonic character contained in the word; if not, building a single corresponding acoustic model for the word; and
Step S14: storing the word and the acoustic models in the lexicon database.
The polyphone data comprises a plurality of polyphonic characters and their pronunciations, and the acoustic model is a hidden Markov model.

Please refer to Fig. 2, which is a flowchart of the steps of a preferred embodiment of the lexicon database construction method for a speech recognition system of the present invention. It takes singer names as an example and builds a lexicon database of singer names. The steps are as follows:
Step S21: reading in a singer name;
Step S22: comparing the singer name against the polyphone data to determine whether it contains at least one polyphonic character; if so, executing step S23; if not, executing step S24;
Step S23: adding a copy of the name in which the polyphonic character is replaced by its alternative pronunciation;
Step S24: converting each character of the name into its hidden Markov model representation;
Step S25: determining whether the last singer name has been read; and
Step S26: ending initialization and entering the recognition flow.
The lexicon database built by the present invention handles polyphonic characters, so the user can speak with the customary pronunciation and still obtain the correct recognition result.

In addition, in speech recognition technology each Chinese character can be decomposed into an initial (聲母) and a final (韻母). The initial appears at the front of the syllable and the final at its end, so each Chinese character can be represented by two acoustic models, one for the initial and one for the final, and recognition is decided by the probability values of these acoustic models. Therefore, if the words in the lexicon database are sorted so that words with the same leading characters are placed together, and the probability values of the previous word's characters are recorded, then only the probability values of the characters that differ in pronunciation between the current word and the previous word need to be computed. The probability values of the identically pronounced characters are not recomputed, which saves computation during search and comparison.

Please refer to Fig. 3, which is a flowchart of the steps of the lexicon database search and comparison method for a speech recognition system of the present invention. The steps are as follows:
Step S31: providing a lexicon database containing a plurality of words, the words being sorted so that words with the same leading characters are adjacent, and the words corresponding one-to-one to a plurality of acoustic models;
Step S32: inputting a speech signal;
Step S33: extracting a feature parameter of the speech signal;
Step S34: comparing the feature parameter one by one against the acoustic models of the words, each acoustic model producing a probability value for the feature parameter, wherein each word inherits the probability values produced by the identically pronounced characters of the preceding adjacent word; and
Step S35: recognizing the speech signal through the probability values of the words.
The acoustic model is a hidden Markov model, the feature parameter is the Mel-Frequency Cepstrum Coefficients (MFCC), and the probability values are computed using the Viterbi algorithm.
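
The sorting in step S31 and the inheritance in step S34 can be made concrete by counting how many leading characters each sorted entry shares with its predecessor. The sketch below is illustrative only: the entries are hypothetical toneless-pinyin sequences standing in for acoustic models, and `shared_prefix_len` and `plan_evaluations` are names invented for this example.

```python
def shared_prefix_len(prev, cur):
    """Count the leading syllables that two pronunciation sequences share."""
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n


def plan_evaluations(entries):
    """Sort entries so equal prefixes are adjacent (S31) and record, for each
    entry, how many leading syllables can be inherited from its predecessor."""
    plan = []
    prev = ()
    for cur in sorted(entries):
        plan.append((cur, shared_prefix_len(prev, cur)))
        prev = cur
    return plan


# Hypothetical sample entries.
entries = [
    ("chen", "li", "hong"),
    ("zhang", "xue", "you"),
    ("chen", "li", "xing"),
    ("chen", "yi", "xun"),
]

plan = plan_evaluations(entries)
total = sum(len(entry) for entry, _ in plan)   # syllables without any reuse
saved = sum(k for _, k in plan)                # syllables inherited
for entry, k in plan:
    print(entry, "inherits", k)
print("evaluations needed:", total - saved, "of", total)
```

Sorting is what makes the reuse cheap: only the immediately preceding entry has to be remembered, rather than a cache keyed on every prefix seen so far.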

Take the singer-name lexicon database as an example. If there are 692 singer names in total, comprising 2233 characters, then when probabilities are computed with the Viterbi algorithm each utterance goes through a large number of searches, and a share of those searches are repeated computations (the exact counts are illegible in the source). The present invention therefore sorts the singer names so that singers with the same surname are placed together, and records the probabilities of the previous name's characters, so that when the next singer name is scored only the probabilities of the differently pronounced characters need to be computed.

Please refer to Fig. 4, which is a flowchart of the steps of a preferred embodiment of the lexicon database search and comparison method for a speech recognition system of the present invention. The steps are as follows:
Step S41: inputting the Mel-Frequency Cepstrum Coefficients of the speech;
Step S42: reading in a singer-name model;
Step S43: determining whether the pronunciation of the current singer name repeats that of the previous singer name; if so, executing step S44; if not, executing step S45;
Step S44: for the identically pronounced characters, substituting the probabilities recorded for the previous name, then proceeding to the next step with the differently pronounced characters;
Step S45: computing probabilities with the Viterbi algorithm;
Step S46: storing the probability of each character of the current singer name;
Step S47: determining whether probabilities have been computed for all singer names; if so, executing step S48; if not, repeating from step S42; and
Step S48: listing the five singer names with the highest probabilities.
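
The loop of steps S42 to S48 might be sketched as below. This is a simplification under stated assumptions: a toy `score_char` function stands in for Viterbi scoring of one character's acoustic model, each character's log-probability is treated as independent of its neighbours (as the description does), and all names and scores are hypothetical.

```python
import heapq


def rank_names(entries, score_char, top_n=5):
    """Score sorted pronunciation sequences, reusing per-character scores
    that the previous entry already computed for a shared prefix."""
    results = []
    prev_entry, prev_scores = (), []
    for entry in sorted(entries):                      # S42: next model
        k = 0                                          # S43: shared prefix
        while (k < min(len(prev_entry), len(entry))
               and prev_entry[k] == entry[k]):
            k += 1
        scores = prev_scores[:k]                       # S44: reuse scores
        for pos in range(k, len(entry)):               # S45: score the rest
            scores.append(score_char(pos, entry[pos]))
        results.append((sum(scores), entry))
        prev_entry, prev_scores = entry, scores        # S46: remember scores
    return heapq.nlargest(top_n, results)              # S48: best candidates


# Toy scorer: pretend the user said "chen li xing"; count how often it runs.
spoken = ("chen", "li", "xing")
calls = {"n": 0}


def score_char(pos, syllable):
    calls["n"] += 1
    match = pos < len(spoken) and spoken[pos] == syllable
    return -0.1 if match else -5.0                     # hypothetical log-probs


entries = [
    ("chen", "li", "hong"),
    ("zhang", "xue", "you"),
    ("chen", "li", "xing"),
    ("chen", "yi", "xun"),
]
ranked = rank_names(entries, score_char)
print("best match:", ranked[0][1])
print("character evaluations:", calls["n"])
```

A real recognizer aligns speech frames to HMM states, so an inherited score is only valid if the alignment of the shared prefix is also carried over. The sketch sidesteps this by scoring each character position independently.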

Take the singer name 陳力行 as an example. It is adjacent to the singer name 陳力宏, and the first two characters of the two names are pronounced the same. In the Viterbi computation, the Mel-Frequency Cepstrum Coefficients of the input speech are therefore first scored against the acoustic models representing 陳力行, and the probability value of each character is stored. Next, when the input speech is scored against 陳力宏, only the probability value of the acoustic model for 宏 has to be computed, and combining it with the stored values yields the complete probability of 陳力宏.

The above description is illustrative only and not restrictive. Any equivalent modification or alteration that does not depart from the spirit and scope of the present invention shall be included in the appended claims.

[Brief Description of the Drawings]
Fig. 1 is a flowchart of the steps of the lexicon database construction method for a speech recognition system of the present invention;
Fig. 2 is a flowchart of the steps of a preferred embodiment of the lexicon database construction method for a speech recognition system of the present invention;
Fig. 3 is a flowchart of the steps of the lexicon database search and comparison method for a speech recognition system of the present invention; and
Fig. 4 is a flowchart of the steps of a preferred embodiment of the lexicon database search and comparison method for a speech recognition system of the present invention.

[Description of Main Reference Symbols]
S11~S16: step flow;
S21~S26: step flow;
S31~S35: step flow; and
S41~S48: step flow.

Claims (1)

Scope of the patent application:

1. A lexicon database construction method for a speech recognition system, comprising at least:
providing polyphone data;
inputting a word;
comparing the word against the polyphone data to determine whether the word contains at least one polyphonic character; if so, building a plurality of acoustic models corresponding respectively to the plurality of pronunciations of the polyphonic character contained in the word; if not, building a single corresponding acoustic model for the word; and
storing the word and the acoustic models in the lexicon database.

2. The lexicon database construction method for a speech recognition system as described in claim 1, wherein the polyphone data comprises a plurality of polyphonic characters and the pronunciations of the polyphonic characters.

3. The lexicon database construction method for a speech recognition system as described in claim [number illegible in the source], wherein the acoustic model is a hidden Markov model (Hidden Markov Model, HMM).

4. A lexicon database search and comparison method for a speech recognition system, comprising at least:
providing a lexicon database containing a plurality of words, the words being sorted so that words with the same leading characters are adjacent, and the words corresponding one-to-one to a plurality of acoustic models;
inputting a speech signal;
extracting a feature parameter of the speech signal;
comparing the feature parameter one by one against the acoustic models of the words, each acoustic model producing a probability value for the feature parameter, wherein each word inherits the probability values produced by the identically pronounced characters of the preceding adjacent word; and
recognizing the speech signal through the probability values of the words.

5. The lexicon database search and comparison method for a speech recognition system as described in claim 4, wherein the acoustic model is a hidden Markov model.

6. The lexicon database search and comparison method for a speech recognition system as described in claim 4, wherein the feature parameter is the Mel-Frequency Cepstrum Coefficients (MFCC).

7. The lexicon database search and comparison method for a speech recognition system as described in claim 4, further comprising computing the probability values using a Viterbi algorithm.
TW95137548A 2006-10-12 2006-10-12 Lexicon database implementation method for audio recognition system and search/match method thereof TWI299854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW95137548A TWI299854B (en) 2006-10-12 2006-10-12 Lexicon database implementation method for audio recognition system and search/match method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW95137548A TWI299854B (en) 2006-10-12 2006-10-12 Lexicon database implementation method for audio recognition system and search/match method thereof

Publications (2)

Publication Number Publication Date
TW200818117A TW200818117A (en) 2008-04-16
TWI299854B TWI299854B (en) 2008-08-11

Family

ID=44769501

Family Applications (1)

Application Number Title Priority Date Filing Date
TW95137548A TWI299854B (en) 2006-10-12 2006-10-12 Lexicon database implementation method for audio recognition system and search/match method thereof

Country Status (1)

Country Link
TW (1) TWI299854B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI421857B (en) * 2009-12-29 2014-01-01 Ind Tech Res Inst Apparatus and method for generating a threshold for utterance verification and speech recognition system and utterance verification system
US8655655B2 (en) 2010-12-03 2014-02-18 Industrial Technology Research Institute Sound event detecting module for a sound event recognition system and method thereof
TWI660340B (en) * 2017-11-03 2019-05-21 財團法人資訊工業策進會 Voice controlling method and system

Also Published As

Publication number Publication date
TW200818117A (en) 2008-04-16

Similar Documents

Publication Publication Date Title
CN107195296B (en) Voice recognition method, device, terminal and system
Gold et al. Speech and audio signal processing: processing and perception of speech and music
Yamagishi et al. Thousands of voices for HMM-based speech synthesis–Analysis and application of TTS systems built on various ASR corpora
Juang et al. Automatic speech recognition–a brief history of the technology development
CN104380373B (en) The system and method pronounced for title
US20130090921A1 (en) Pronunciation learning from user correction
JP3933750B2 (en) Speech recognition method and apparatus using continuous density Hidden Markov model
JP4296231B2 (en) Voice quality editing apparatus and voice quality editing method
WO2019214047A1 (en) Method and apparatus for establishing voice print model, computer device, and storage medium
WO2017114172A1 (en) Method and device for constructing pronunciation dictionary
CN108630200B (en) Voice keyword detection device and voice keyword detection method
TW201118854A (en) Method and apparatus for builiding phonetic variation models and speech recognition
Sharma et al. NHSS: A speech and singing parallel database
Zhang et al. Durian-sc: Duration informed attention network based singing voice conversion system
JP5326169B2 (en) Speech data retrieval system and speech data retrieval method
WO2022089097A1 (en) Audio processing method and apparatus, electronic device, and computer-readable storage medium
CN108346426A (en) Speech recognition equipment and audio recognition method
Mittal et al. Development and analysis of Punjabi ASR system for mobile phones under different acoustic models
TWI299854B (en) Lexicon database implementation method for audio recognition system and search/match method thereof
Renals et al. Speech recognition
Wrench A new resource for production modelling in speech technology
CN101217035A (en) A vocabulary database construction method and the corresponding hunting and comparison method for voice identification system
Nakano et al. A drum pattern retrieval method by voice percussion
Selvan et al. Speaker recognition system for security applications
Veisi et al. Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees