TW449734B - Keyword spotting method for mandarin speech without using filler models - Google Patents


Info

Publication number
TW449734B
Authority
TW
Taiwan
Application number
TW88115161A
Other languages
Chinese (zh)
Inventor
Chung-Mou Pengwu
Sen-Chia Chang
Chun-Hsien Chen
Szu-Chen Jou
Original Assignee
Ind Tech Res Inst
Priority date
1999-09-03
Filing date
1999-09-03
Publication date
2001-08-11
Application filed by Ind Tech Res Inst
Priority to TW88115161A
Application granted
Publication of TW449734B


Abstract

The invention is a keyword spotting method for Mandarin speech that does not use filler models. It relies on the regular structure of Mandarin syllables, in which an initial is followed by a final. The method first identifies the possible endpoints of the syllable signals in the input speech. It then forms a syllable graph from the speech segments bounded by those endpoints and from the ways those segments connect. Finally, it recognizes and verifies keywords in the syllable graph to spot them in the speech signal.

Description

V. Description of the Invention

[Field of the Invention]
The present invention relates to speech recognition, and in particular to a Chinese keyword spotting method that does not use filler models.

[Background of the Invention]
A typical speech recognition system can recognize only a limited vocabulary. The speaker's input must match one vocabulary entry exactly, and any additional speech mixed into the input easily causes recognition errors. Consider a voice ticket-booking system whose recognizable vocabulary is the set of station names. A user asked for the destination must answer exactly "Taipei" and cannot say "I want to go to Taipei"; otherwise the correct recognition result cannot be obtained.

Keyword spotting was developed to remove this restriction. The user may speak freely, and the recognition system automatically extracts the likely keywords. The conventional approach is shown in Figure 5. A keyword model (51) and a filler model (52) are first connected, as illustrated, into a speech model (53). The Viterbi algorithm then matches the speech signal (55) against the possible paths through the speech model (53), producing the matching-path diagram of Figure 6. The maximum-likelihood (ML) matching path (61) shows which segments of the signal are filler speech and which segments are the pronunciation of which keyword. Each extracted keyword segment is then verified, and the result is an accept or reject decision. The accepted segments and their recognition results are the extracted keywords.
However, the conventional keyword spotting method requires building a filler model (also called a garbage model) (52). European Patent EP0800158 A1 971008, "Word spotting", points out that good spotting performance requires building the filler model from the filler words that users commonly say in the target application. For Chinese speech recognition, US Patent USP5649057, "Speech recognition employing keyword modeling and non-keyword modeling", describes the same phenomenon.

The range of filler words varies with the application (task domain) and with users' speaking habits. Good spotting performance therefore requires building the model from the filler vocabulary that actually occurs in use. In other words, the filler model is task dependent, must be adjusted for each application, and every adjustment means rebuilding it. This is the drawback of the conventional method: each time the application domain changes, one must collect the filler words users are likely to produce in that application and build a suitable filler model from them. Collecting such vocabulary is an extremely tedious process, so the conventional method needs improvement.

Accordingly, the inventors set out to devise a Chinese keyword spotting method that solves the above problems without filler models. After repeated research and experiments, they completed the present invention.

[Summary of the Invention]
The object of the present invention is to provide a Chinese keyword spotting method that extracts the keywords in a speech signal without requiring a filler model to be built for each application domain.

To achieve this object, the method first exploits the structure of Chinese syllables, in which an initial is followed by a final. From this structure it finds the possible syllable endpoints and syllable segments in the input speech signal and builds a syllable graph made of a number of connected syllable segments. It then identifies keywords in the syllable graph according to the keyword length (in syllables). Chinese keywords can thus be extracted correctly without a filler model.

Because the design is novel, industrially applicable, and demonstrably effective, a patent is applied for according to law.

To help the examiners further understand the structure, features, and objects of the invention, the drawings and a detailed description of preferred embodiments follow.

[Brief Description of the Drawings]
Figure 1: processing flow of the filler-model-free Chinese keyword spotting method of the invention.
Figure 2: the generic initial and final models used by the method.
Figure 3: the maximum likelihoods obtained by matching the speech signal under hypotheses of different syllable counts.
Figure 4: the waveform of the speech signal "從新竹到桃園" ("from Hsinchu to Taoyuan") and the syllable graph formed from it by the method.
Figure 5: a conventional speech model built from a keyword model and a filler model.
Figure 6: the matching-path diagram obtained by matching a speech signal against the speech model with the Viterbi algorithm.

[Reference Numerals]
(21) initial model
(22) final model
(31) (42) (43) (44) syllable endpoint groups
(41) (55) speech signal waveforms
(45) syllable graph
(51) keyword model
(52) filler model
(53) speech model
(61) matching path

[Detailed Description of the Preferred Embodiments]
Refer first to Figure 1. A preferred embodiment of the method comprises two processing stages: finding the syllable segments of the speech signal (S1) and keyword search (S2). Step S1 finds the possible positions of the syllable endpoints in the speech signal to be recognized and builds a syllable graph from them. Step S2 searches the syllable graph for keywords. Together, the two stages extract Chinese keywords correctly without a filler model.
In step S1, each Chinese syllable is produced as an initial followed by a final. The initial is a sound made when the airflow is obstructed by the articulators (for example, the d of 電, diàn). The final is a sound made by vocal-fold vibration (for example, the ian of 電). Because of this regular initial-then-final structure, a speech model and a model-to-signal matching procedure can locate the initial and final segments in the speech signal, and from them the syllable endpoints and syllable segments.

This stage therefore first builds the generic initial/final model of Figure 2: an initial model (21) followed by a final model (22). The model only has to decide whether a stretch of speech is an initial or a final. Unlike a conventional recognizer, it does not need to identify which initial or final it is. The generic model can be trained on data pooled from all initials and all finals, so that it distinguishes only initials from finals. Alternatively, it can be obtained by merging existing initial-specific and final-specific models.
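The description does not say how existing specific models are merged. The sketch below assumes that each initial-specific (or final-specific) model already produces a per-frame log-likelihood, and treats the generic model as an equally weighted mixture of those models. The resulting class-level score indicates "some initial" or "some final" without deciding which one is present. The function names `logsumexp` and `generic_class_scores` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def generic_class_scores(specific: Dict[str, List[float]]) -> List[float]:
    """Per-frame log-likelihood of a generic class (all initials, or all
    finals), modeled as an equally weighted mixture of specific models.

    `specific` maps a model name (e.g. one Mandarin initial) to that
    model's per-frame log-likelihoods; all lists must have equal length.
    """
    names = list(specific)
    n_frames = len(specific[names[0]])
    log_weight = -math.log(len(names))  # equal mixture weights 1/M
    return [logsumexp([specific[name][t] for name in names]) + log_weight
            for t in range(n_frames)]
```

Taking the maximum over the specific models instead of the log-sum-exp would be a cheaper, Viterbi-style approximation.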
Once the generic initial/final model has been built, a model-to-signal matching procedure locates the initial and final segments in the speech signal, and from them the syllable endpoints and syllable segments. In this preferred embodiment the matching procedure is the Viterbi algorithm.

As shown in Figure 3, assume the input signal contains at most L syllables. Matching the input under each syllable count from 1 to L yields the maximum likelihoods ML_1 to ML_L, and each of these corresponds to a group of syllable endpoints (31). The Viterbi segmentation may contain a few insertion or deletion errors. In other words, the true syllable count of the input may differ by one or two from the count with the highest maximum likelihood.

To tolerate such errors, the N highest maximum likelihoods among ML_1 to ML_L are kept instead of only the single highest one. The syllable counts of these N hypotheses then very likely include the true syllable count of the input. Besides the syllable counts, the matching also yields the positions of the syllable segments in the signal. These segments and the ways they connect form a syllable graph, which likewise very likely contains the correctly connected syllable segments.
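The following sketch shows the Viterbi matching under each syllable-count hypothesis. It is an illustrative reconstruction under stated assumptions, not the patent's implementation:

- The inputs `init` and `fin` are per-frame log-likelihoods from the generic initial and final models.
- Each syllable is one initial state followed by one final state, with self-loops.
- Transition probabilities are ignored.

`segment_n_syllables` and `top_n_hypotheses` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = -math.inf


def segment_n_syllables(init: Sequence[float], fin: Sequence[float],
                        n: int) -> Tuple[float, List[int]]:
    """Viterbi-align T frames to n syllables, each one generic initial
    state followed by one generic final state.

    The 2n states form a left-to-right chain with self-loops; every state
    covers at least one frame and transition scores are taken as 0.
    Returns (best log-likelihood, syllable boundaries [0, ..., T]); the
    score is -inf when there are fewer than 2n frames.
    """
    T, S = len(init), 2 * n
    if S > T:
        return NEG_INF, []

    def emit(s: int, t: int) -> float:
        # Even states are initials, odd states are finals.
        return init[t] if s % 2 == 0 else fin[t]

    score = [[NEG_INF] * S for _ in range(T)]
    entered = [[False] * S for _ in range(T)]  # True: came from state s-1
    score[0][0] = emit(0, 0)
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else NEG_INF
            best = max(stay, move)
            entered[t][s] = move > stay
            if best > NEG_INF:
                score[t][s] = best + emit(s, t)

    # Backtrack from the last state at the last frame; a syllable starts
    # wherever the path enters an initial (even-numbered) state.
    starts, s = [0], S - 1
    for t in range(T - 1, 0, -1):
        if entered[t][s]:
            if s % 2 == 0:
                starts.append(t)
            s -= 1
    return score[T - 1][S - 1], sorted(starts) + [T]


def top_n_hypotheses(init: Sequence[float], fin: Sequence[float],
                     max_syllables: int, n_best: int):
    """Match under every syllable count 1..L (giving ML_1..ML_L) and keep
    the N hypotheses with the highest maximum likelihood, best first."""
    hyps = []
    for n in range(1, max_syllables + 1):
        ml, bounds = segment_n_syllables(init, fin, n)
        if ml > NEG_INF:
            hyps.append((n, ml, bounds))
    hyps.sort(key=lambda h: h[1], reverse=True)
    return hyps[:n_best]
```

Consider toy scores where frames 0-1 and 5 look like initials and the remaining frames look like finals. Here `segment_n_syllables` with `n=2` finds a perfect alignment with boundaries `[0, 5, 8]`. Keeping the top N hypotheses, rather than only this one, preserves the neighbouring syllable counts in case the best count is off by one or two.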
Figure 4 shows a practical example. The waveform (41) of the utterance "從新竹到桃園" ("from Hsinchu to Taoyuan") is matched with the Viterbi algorithm, and the three highest maximum likelihoods are kept. They correspond to syllable counts of 6, 7, and 8, which include the correct count of 6. The endpoint groups (42, 43, 44) of these three hypotheses are then merged. The result is the set of all possible syllable endpoints 0 to 9, the syllable segments (each drawn as an arrow from its start point to its end point), and the syllable graph (45) formed by connecting those segments.
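The sketch below merges the endpoint groups and enumerates candidate keyword locations. It assumes that each hypothesis is given as a sorted list of boundary frame indices, and that every graph edge is one syllable segment proposed by some hypothesis. The frame numbers used in the example are invented and do not reproduce Figure 4.

```python
from typing import Dict, List, Sequence, Set, Tuple


def build_syllable_graph(endpoint_groups: Sequence[Sequence[int]]):
    """Merge the endpoint groups of the N best hypotheses into one graph.

    Nodes are the distinct boundary frames, renumbered 0, 1, 2, ... in
    time order; each edge is one syllable segment proposed by some
    hypothesis (consecutive boundaries within that hypothesis).
    """
    frames = sorted({f for group in endpoint_groups for f in group})
    node = {f: i for i, f in enumerate(frames)}
    edges: Dict[int, Set[int]] = {i: set() for i in range(len(frames))}
    for group in endpoint_groups:
        for a, b in zip(group, group[1:]):
            edges[node[a]].add(node[b])
    return frames, edges


def spans_of_length(edges: Dict[int, Set[int]],
                    k: int) -> List[Tuple[int, int]]:
    """All (start, end) node pairs joined by a path of exactly k syllable
    segments, i.e. candidate locations of a k-syllable keyword."""
    spans = set()

    def walk(start: int, current: int, depth: int) -> None:
        if depth == k:
            spans.add((start, current))
            return
        for nxt in edges[current]:
            walk(start, nxt, depth + 1)

    for start in edges:
        walk(start, start, 0)
    return sorted(spans)
```

Take the hypothetical groups `[[0, 10, 20, 30], [0, 10, 25, 30], [0, 15, 30]]`. The merged nodes are frames 0, 10, 15, 20, 25 and 30, numbered 0 to 5, and `spans_of_length(edges, 2)` returns `[(0, 3), (0, 4), (0, 5), (1, 5)]`. These are the counterparts of two-syllable segments such as 1-4 in the patent's example.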

Once the syllable graph of the input speech has been built, the second stage, keyword search (S2), searches it for keywords.

Continuing the example, suppose the candidate keywords are 29 major Taiwanese place names, all two syllables long. The syllable graph (45) of Figure 4 then contains 11 two-syllable segments: 0-2, 0-3, 1-3, 1-4, 2-4, 3-5, 4-6, 4-7, 5-8, 5-9 and 6-9. Segments 1-4 and 5-9 are the pronunciations of 新竹 (Hsinchu) and 桃園 (Taoyuan). A Chinese speech recognizer, for example one based on hidden Markov models (HMMs), should therefore recognize them correctly.

However, each of the remaining segments is also recognized as some place name. The result of recognizing a segment can therefore only be treated as a keyword hypothesis, which gives 11 hypotheses in all. Each hypothesis must be verified further to decide whether to accept or reject it. The hypotheses finally accepted are the extracted keywords.
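Stage S2 amounts to a recognize-then-verify loop over the candidate spans. In the sketch below, `recognize` and `verify` are placeholders for an HMM-based recognizer and a verification (confidence) scorer. The threshold is an assumed tuning parameter, since the patent does not specify the verification measure. `spot_keywords` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]


def spot_keywords(spans: Sequence[Span],
                  recognize: Callable[[Span], str],
                  verify: Callable[[Span, str], float],
                  threshold: float) -> List[Tuple[Span, str]]:
    """Recognize every k-syllable span as its best-matching keyword (a
    keyword hypothesis), then keep only the hypotheses whose
    verification score reaches the threshold."""
    accepted = []
    for span in spans:
        hypothesis = recognize(span)        # e.g. an HMM-based recognizer
        if verify(span, hypothesis) >= threshold:
            accepted.append((span, hypothesis))
    return accepted
```

Stub the recognizer with a lookup table such as `{(1, 4): "Hsinchu", (5, 9): "Taoyuan", (0, 2): "Zhongli"}`, and give only spans 1-4 and 5-9 made-up verification scores above the threshold. The function then returns just those two hypotheses, mirroring the accept and reject pattern of the example.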
For this example, the recognition and verification results are:

Segment   Recognition result   Verification
0-2       中壢 (Zhongli)       Reject
0-3       中壢 (Zhongli)       Reject
1-3       新營 (Xinying)       Reject
1-4       新竹 (Hsinchu)       Accept
2-4       新竹 (Hsinchu)       Reject
5-8       桃園 (Taoyuan)       Reject
5-9       桃園 (Taoyuan)       Accept


Segments 1-4 and 5-9 obtain higher verification scores, so their recognition results are accepted. The other segments are rejected for their low scores, whatever they were recognized as. The keywords 新竹 (Hsinchu) for segment 1-4 and 桃園 (Taoyuan) for segment 5-9 are therefore the extracted keywords.

Another preferred embodiment speeds up Chinese keyword spotting by reducing the number of recognitions and verifications. It does so by cutting down both the number of keyword hypotheses and the number of candidate keywords. It differs from the previous embodiment in using the information each syllable segment provides to filter out segments that cannot possibly be keywords. The most reliable such information is the final recognition result of each syllable segment, because finals are recognized comparatively reliably in Chinese speech.

For each syllable segment, take the top F final recognition results as its candidate finals; a typical recognizer reaches over 98% accuracy with F = 5 to 10. For a keyword of k syllables, it then suffices to check whether the final of each keyword syllable is among the top F candidates of the corresponding segment. This tells whether k connected syllable segments could be that keyword.
Without screening, recognizing a k-syllable span means comparing it against every k-syllable candidate keyword in turn. With the top-F finals of each syllable available, only the candidate keywords whose finals pass the screening need to be compared. If no candidate remains after screening, none of the top-F finals of the span's syllables occurs in the keywords, and the span needs no further processing. This greatly reduces the number of keyword recognitions and verifications and effectively speeds up spotting.

In the example of the previous embodiment, the original 11 syllable segments are reduced to 7, so the number of verifications drops from 11 to 7. Recognition time falls as well. Originally, each segment had to be compared against all 29 place-name pronunciations. After screening with this embodiment, each remaining segment (if any) has only 1 or 2 candidate words left to compare, which noticeably speeds up keyword spotting.
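The final-based screening of this embodiment can be sketched as a set-membership test. The lexicon gives each keyword's syllable finals in a simplified pinyin-style notation. Both these labels and the candidate sets in the example are illustrative assumptions, not data from the patent. `screen_keywords` and `LEXICON` are hypothetical names.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical lexicon: keyword -> final of each syllable (pinyin-style).
LEXICON: Dict[str, Sequence[str]] = {
    "Hsinchu": ["in", "u"],       # 新竹 xin-zhu
    "Xinying": ["in", "ing"],     # 新營 xin-ying
    "Taoyuan": ["ao", "üan"],     # 桃園 tao-yuan
    "Zhongli": ["ong", "i"],      # 中壢 zhong-li
}


def screen_keywords(top_finals: Sequence[Set[str]],
                    lexicon: Dict[str, Sequence[str]] = LEXICON) -> List[str]:
    """Keep the k-syllable keywords whose i-th syllable final is among the
    top-F candidate finals of the i-th segment in the span. An empty
    result means the span can be skipped without recognition."""
    k = len(top_finals)
    return [word for word, finals in lexicon.items()
            if len(finals) == k
            and all(f in cands for f, cands in zip(finals, top_finals))]
```

Suppose a two-syllable span has candidate finals `{"in", "ing", "i"}` and `{"u", "ou"}`. Then only "Hsinchu" survives the screening and needs full recognition. A span whose candidates match no keyword is dropped entirely.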

The filler-model-free Chinese keyword spotting method of the invention performed very well in actual tests. The application tested was telephone lookup of personal extension numbers, with 200 personal names as keywords. The user simply says something like "請轉XXX" ("Please transfer me to XXX") or "請問XXX的分機號碼" ("What is XXX's extension number?"). The system must then extract the name and map it to the extension number. On 152 test sentences, the conventional keyword spotting method with a filler model failed on 10. Using the same recognizer, the method of the invention failed on 8. The invention therefore achieves excellent keyword spotting without building a filler model for each application domain.

In summary, in its object, means, and effect the invention differs clearly from the prior art and represents a major breakthrough in the design of Chinese keyword spotting methods. The examiners are respectfully requested to grant the patent at an early date for the benefit of society.
It should be noted that the embodiments above are examples given only for convenience of explanation. The scope of the claimed rights is defined by the claims and is not limited to these embodiments.

Claims (1)

1. A Chinese keyword spotting method without filler models, for extracting from a speech signal the keywords contained in that speech signal, the method mainly comprising the steps of:
(A) using the structure of Chinese syllables, in which an initial is followed by a final, to find the possible syllable endpoints and syllable segments in the input speech signal, and building a syllable graph made of a plurality of connected syllable segments; and
(B) identifying and verifying keywords in the syllable graph according to the keyword length.

2. The method of claim 1, wherein step (A) mainly comprises the sub-steps of:
(A1) connecting a generic initial model and a generic final model into a speech model, and matching the input speech signal to be recognized against the speech model a plurality of times using a model-to-signal matching procedure, so as to obtain the maximum likelihood of each match and find the corresponding syllable endpoints and syllable segments; and
(A2) finding the several highest of these maximum likelihoods, finding the corresponding groups of syllable endpoints, and combining those groups of syllable endpoints into the syllable graph.

3. The method of claim 1, wherein step (B) mainly comprises the sub-steps of:
(B1) finding in the syllable graph all connected sequences of syllable segments whose length matches the keyword length; and
(B2) recognizing and verifying each connected sequence of syllable segments found.
4. The method of claim 1, wherein step (B) mainly comprises the sub-steps of:
(B1) recognizing the final of each syllable segment in the syllable graph, so as to find for each segment the several candidate finals with the highest recognition scores; and
(B2) finding in the syllable graph all connected sequences of syllable segments whose length matches the keyword length, and, for each sequence found, selecting for recognition and verification the keywords that match one of the corresponding candidate finals.

5. The method of claim 2, wherein the model-to-signal matching procedure in sub-step (A1) is the Viterbi algorithm.

6. The method of claim 2, wherein in sub-step (A1) the generic initial model is used only to determine whether speech is an initial, and the generic final model is used only to determine whether speech is a final.

7. The method of claim 2, wherein in sub-step (A1) the number of syllables in the speech model is set to each of 1 to L, so that the input speech signal is matched L times and L maximum likelihoods are obtained, L being the maximum number of syllables allowed in the input speech signal.
8. The method of claim 3, wherein in sub-step (B2) recognition is performed with a Chinese speech recognizer based on hidden Markov models.

9. The method of claim 3, wherein in sub-step (B2) recognizing each syllable segment yields a keyword hypothesis, and a keyword hypothesis that is verified and accepted is an extracted keyword.

10. The method of claim 3, wherein in sub-step (B2) recognizing each syllable segment yields a keyword hypothesis, and a keyword hypothesis that is verified and accepted is an extracted keyword.

11. The method of claim 4, wherein in sub-step (B1) the 5 to 10 candidate finals with the highest recognition scores are found.

12. The method of claim 4, wherein in sub-step (B2) recognition is performed with a Chinese speech recognizer based on hidden Markov models.

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW88115161A TW449734B (en) 1999-09-03 1999-09-03 Keyword spotting method for mandarin speech without using filler models


Publications (1)

Publication Number Publication Date
TW449734B true TW449734B (en) 2001-08-11



Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees