TW201023175A - System for consulting dictionary by speech recognition and method thereof - Google Patents


Info

Publication number
TW201023175A
TW201023175A
Authority
TW
Taiwan
Prior art keywords
word
voice
syllable
sound
file
Prior art date
Application number
TW97148169A
Other languages
Chinese (zh)
Inventor
Fred Chen
Saral Liou
Original Assignee
Inventec Besta Co Ltd
Priority date
Filing date
Publication date
Application filed by Inventec Besta Co Ltd filed Critical Inventec Besta Co Ltd
Priority to TW97148169A priority Critical patent/TW201023175A/en
Publication of TW201023175A publication Critical patent/TW201023175A/en


Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

A system and method for consulting a dictionary by speech recognition are provided to solve the problem of ineffective querying of a word whose pronunciation is known but whose spelling is not. The system recognizes the speech signal of the word, analyzes the syllables produced by that recognition, and compares the syllables with preset samples to provide the definition of the word. This solves the aforementioned problem and further reduces the system load.

Description

IX. Description of the Invention

[Technical Field of the Invention]
The present invention relates to a system and method for querying words by speech, and more particularly to a word query system and method that recognizes a speech signal and provides the definition of the spoken word.

[Prior Art]
For foreign-language learners, portable electronic dictionaries and online electronic dictionaries are almost indispensable learning tools. Each has its own strengths: the former is easy to carry and can be consulted at any time, while the latter offers a large volume of data at low cost.

However, with either a portable or an online electronic dictionary, a user who wants to look up a word whose pronunciation is known but whose spelling is not can only guess the spelling from the pronunciation and type that guess into the dictionary. If the guess is wrong, the dictionary will report that no such word exists, or even display a word other than the one intended. For example, a user who wants to look up the short-vowel word "pitch" may mistakenly enter the long-vowel spelling "peach"; the dictionary will then show a word and definition the user did not want. Such trial-and-error querying is not only inconvenient but also time-consuming, and it increases the load on online query systems. Moreover, because the keyboard of a portable electronic dictionary is small, typing errors are frequent, making the trial-and-error process even more cumbersome.

In summary, the prior art has never been able to effectively handle queries for words whose pronunciation is known but whose spelling is not, and an improved technical means is needed to solve this problem.

[Summary of the Invention]
In view of the prior-art problem of being unable to effectively provide the definition of a word whose pronunciation is known but whose spelling is not, the present invention discloses a system and method for querying words by speech.

The disclosed system for querying words by speech comprises a syllable database, an audio file database, a word database, a receiving module, a speech recognition module, a search module, a confirmation module, and a display module. The syllable database stores a plurality of syllable files and a letter combination corresponding to each syllable file. The audio file database stores a plurality of audio files and an audio file keyword corresponding to each audio file. The word database stores word data corresponding to each audio file keyword. The receiving module receives a speech signal. The speech recognition module performs endpoint detection on the received speech signal, extracts features of the speech signal from the endpoint-detection samples, and segments the speech signal into at least one syllable according to those features. The search module finds the syllable file matching each syllable in the syllable database, extracts the letter combination corresponding to each matched syllable file, and concatenates the letter combinations in order into a word. The confirmation module confirms that the audio file database contains an audio file keyword matching the word and then looks up the word data corresponding to that keyword in the word database. The display module displays the word data.

The disclosed method for querying words by speech first establishes a syllable database containing a plurality of syllable files and a letter combination corresponding to each syllable file, an audio file database containing a plurality of audio files and an audio file keyword corresponding to each audio file, and a word database containing word data corresponding to each audio file keyword. The method then receives a speech signal; performs endpoint detection on the speech signal; extracts features of the speech signal from the endpoint-detection samples; segments the speech signal into at least one syllable according to those features; finds the syllable file matching each syllable in the syllable database and extracts the corresponding letter combination; concatenates the letter combinations in order into a word; confirms that the audio file database contains an audio file keyword matching the word; looks up the word data corresponding to that keyword in the word database; and finally displays the word data.

The system and method disclosed above differ from the prior art in that the present invention performs speech recognition on the received speech signal and analyzes and matches the syllables produced by that recognition. Through these technical means, the invention achieves the technical effect of querying a word by speech and effectively providing its definition.

[Embodiments]
Embodiments of the present invention are described in detail below with reference to the drawings, so that the way the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and practiced.

Fig. 1 is a block diagram of the system for querying words by speech according to the present invention. Referring to Fig. 1, the speech word query system 100 comprises a syllable database 110, an audio file database 120, a word database 130, a receiving module 140, a speech recognition module 150, a search module 160, a confirmation module 170, and a display module 180. The syllable database 110 stores a plurality of syllable files and a letter combination corresponding to each syllable file; a syllable file

may be a ".wav", ".mp3", or other-format sound file, stored together with its corresponding letter combination and phonetic symbol; for example, the syllable file numbered 1 may have the letter combination "A" and the phonetic symbol [ei]. The audio file database 120 stores a plurality of audio files and an audio file keyword corresponding to each audio file; the audio source may be recordings of a human speaker, and the format may likewise be ".wav", ".mp3", or another sound format. The difference from the aforementioned syllable files is that a syllable file is the sound of a syllable, whereas an audio file is the sound of an audio file keyword. The word database 130 stores word data corresponding to each audio file keyword; for example, the word data corresponding to the audio file keyword "A" is as shown in Fig. 3.

The receiving module 140 receives a user-recorded speech signal such as the one shown in Fig. 4. The speech recognition module 150 performs endpoint detection on the received speech signal, extracts features of the speech signal from the endpoint-detection samples, and, using the characteristics of voiced sounds reflected in those features, segments the speech signal into at least one syllable. Speech recognition technology focuses on extracting the features of a speech signal so that it can be compared against reference waveforms (the syllable files) for identification. Possible implementation details of the speech recognition module 150 are described below.
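The three databases described above can be pictured with a minimal in-memory sketch. This is a Python illustration only; every entry, file name, and the `lookup_letters` helper are hypothetical examples, not the patent's actual data:

```python
# Minimal sketch of the three databases (hypothetical example data).
# Syllable database 110: syllable file -> letter combination and phonetic symbol.
SYLLABLE_DB = {
    "syl_001.wav": {"letters": "A", "phonetic": "ei"},
    "syl_002.wav": {"letters": "ba", "phonetic": "ba"},
    "syl_003.wav": {"letters": "by", "phonetic": "bi"},
}

# Audio file database 120: audio file keyword -> recorded pronunciation file.
AUDIO_DB = {
    "A": "word_a.mp3",
    "baby": "word_baby.mp3",
}

# Word database 130: audio file keyword -> word data (definition).
WORD_DB = {
    "A": "the first letter of the English alphabet",
    "baby": "a very young child",
}

def lookup_letters(syllable_file):
    """Return the letter combination stored for a syllable file."""
    return SYLLABLE_DB[syllable_file]["letters"]

# Concatenate the letter combinations of two matched syllable files into a word,
# then resolve the word through the keyword databases.
word = lookup_letters("syl_002.wav") + lookup_letters("syl_003.wav")
print(word)               # -> baby
print(WORD_DB.get(word))  # -> a very young child
```

In the patent the syllable files are waveforms matched by acoustic comparison; plain dictionary keys stand in for that matching in this sketch.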
合,該音槽資料庫120儲存有複數個音檔與對應各該音檔 之一音槽關鍵字,字詞資料庫13〇儲存有對應各該音播關 鍵字之一字詞資料;接收-語音訊號,並對該語音訊號進 行端點酬(倾⑽),·罐概物狀縣,提取該 201023175 語音訊號之·(倾22G);依魏語音減之特徵 該語音訊號切分為至少一音節(步驟23〇);於音節資料庫 no查找崎合各音節之音_,以提取其巾各 ==Γ:240);將該些字母組合依序拼合2 -(步驟5〇),確認音檔資料庫⑽有符合該單詞之— 參 ❹ 曰。檔關,子(步驟26〇);於字詞資料庫13〇查找對應該音 檔關鍵予之予詞資料,並顯示該字詞資料(步驟挪)。 、、採用h時距i量與短時越零率進行端點檢 測’而端點檢測產生之樣本用以作為判斷濁音之依據;此 外’於步驟230之射更包含修整各音節的步驟,以最故 確定各音節的起始點與終止點。於執行步驟270之顯示該 字詞資料的同時,可更包含依據使用者的外部操作,輸出z 對應該音檔關鍵字之音檔,或是ETTS發音。 本發明為接收使用者錄製之語音訊號,並透過端點檢 f等技術’域濁音之概雜語音峨切分為至少一音 即’再以線性测分析等技術轉出與各音祕配之預設 音節檔’以操取其中各音節槽對應之字母組合,將該些字 母組合依序拼合為-單詞’並查找與該單詞符合之音槽關 鍵字;若有該音檔關鍵字,則進一步顯示該音檔關鍵字之 字詞資料。本㈣可顧_帶型電子顺,更可延伸應 用至主從式架構。 〜 综上所述,可知本發明與先前技術之間的差異在於具 有對接收之語音訊號進行語音辨識,以及對語音辨識產生 12 201023175 的曰郎進行分析比對的技術手段,藉由此一技術手段可以 解決先前技術所存在的問題,進而達成透過語音查詢單詞 並有效提供單詞釋義的技術功效。 雖然本發明所揭露之實施方式如上,惟所述之内容並 非用以直接限定本發明之專利保護範圍。任何本發明所屬 技術領域中具有通常知識者,在不脫離本發明所揭露之精 神和範_前提下,可以在實施的形式上及細節上作些^ 之更動。本發明之專利保護範圍,仍須以所附之申請專利 範圍所界定者為準。 【圖式簡單說明】 第1圖為本發明之透過語音查詢單詞的系統其方塊示 意圖。 第2圖為本發明之透過語音查詢單詞的方法其步驟流 程圖。 第3圖為本發明之字詞資料庫其資料示意圖。 第4圖為接收之語音訊號其示意圖。 第5A圖為確認有對應字詞資料之音檔其介面示意 圖。 第5B圖為確認無對應字詞資料之音檔其介面示意 圖。 【主要元件符號說明】 100 語音單詞查詢系統 110 音節資料庫 13 201023175 120 音檔資料庫 130 字詞資料庫 140 接收模組 150 語音辨識模組 160 查找模組 170 確認模組 180 顯示模組The key of the key is the material, and the word data corresponding to the audio key "A" is as shown in "Fig. 3". The receiving module 140 is configured to receive a voice signal recorded by the user as shown in FIG. 4, and the voice recognition module 15 is configured to perform a payment on the received message number, and extract the sample. The characteristics of the voice signal, Wei _ 音 and the county tone subtraction sign, the voice signal is divided into at least - syllables. The speech fiber technology focuses on the capture of the characteristics of the speech signal so that it can be compared to the reference waveform (syllabic file) for identification. The speech recognition module 15 will be described in detail below with regard to its possible implementation details. 
It is reasonable to assume that a speech signal varies continuously and slowly, so a common practice is to divide the received speech signal into a number of short frames, each 20 ms to 30 ms long, and to apply a window to each frame. The Hamming window is the most common choice, as it suppresses both ends of the frame while preserving the middle section; alternatives include the rectangular window and the Hanning window. The speech recognition module 150 may perform endpoint detection on each frame using the short-time energy (or average magnitude) and the short-time average zero-crossing rate. The short-time energy represents amplitude and is later used to discard small noise; the short-time zero-crossing rate is the number of times the signal waveform crosses the horizontal axis and serves as a basis for estimating the period. Assuming the speech signal is sampled at 8 kHz and each frame is 20 ms long, each frame contains 160 sample points; the short-time energy and zero-crossing rate are computed once every 20 ms (i.e., once per frame), with Ei denoting the short-time energy of the i-th frame and Zi its short-time zero-crossing rate.

The features of each frame of the speech signal are then extracted from the endpoint-detection samples above. Because the recording of a single word is much shorter than that of a full sentence, and its data volume correspondingly smaller, the short-time energies and zero-crossing rates of the frames can first be sorted (for example, by quicksort, in ascending order) to find the median and the values at the 1/3 and 2/3 positions, from which peak and valley thresholds are computed. A valley is located at the first frame whose energy is the minimum among its 10 neighboring frames; if no such valley exists, the valley is defined as the minimum-energy frame between two adjacent peaks. Exploiting the characteristics of voiced sounds, namely their large amplitude and their pitch period, the speech signal is then segmented using the computed values: each peak represents the core of a syllable, and the first valley between adjacent peaks is taken as the boundary point between syllables, so that the speech signal is divided into at least one syllable.

Each syllable is then trimmed. The pitch period at each peak is examined; two neighboring peaks less than 80 ms apart are merged by deleting the valley between them and taking the higher-energy peak as the new peak. In addition, for peaks whose energy is too small and valleys whose energy is too large, the neighboring pitch-period parameters are examined so that unstable peaks and valleys can be removed, and the start and end points of each syllable are finally determined.

The search module 160 obtains parameters including linear predictive cepstrum coefficients (LPCC) through linear prediction analysis, compares each syllable against the syllable files in the syllable database 110, and finds the syllable file matching each syllable. To handle cases where the recorded speech signal is inaccurate or unclear and would otherwise lead to a word the user did not intend (for example, a user who wants to query "pitch" but mispronounces it, so that "peach" would be displayed), the matching criterion may be set to a 50% match. The letter combinations corresponding to the matched syllable files (such as "ba" and "by") are extracted and concatenated in order into at least one word (such as "baby").

For each word, the confirmation module 170 confirms that the audio file database 120 contains an audio file keyword matching the word, and then looks up the word data corresponding to that keyword in the word database 130; the display module 180 displays the word data. At this point a "standard pronunciation" option 510 and a "pronunciation comparison" option 520 may also be displayed (see Fig. 5A), and an output module (not shown) may be added to the speech word query system 100, so that when the user selects the standard pronunciation option 510 or the pronunciation comparison option 520, the output module outputs the audio file corresponding to the audio file keyword, or outputs a comparison result generated from that audio file and the user's recorded speech signal. If the confirmation module 170 determines that the audio file database 120 contains no audio file keyword matching any of the words, the word database 130 may be searched directly for word data matching the word; if matching word data is found in the word database 130, the word data is displayed together with an ETTS (English Text to Speech) pronunciation option 530 (see Fig. 5B), so that when the user selects it, the output module outputs an ETTS pronunciation of the word. Otherwise, the display module 180 may show a prompt informing the user that the query failed.

Fig. 2 is a flowchart of the steps of the method for querying words by speech according to the present invention.
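As a concrete illustration of the endpoint-detection features discussed above, the following pure-Python sketch computes the per-frame short-time energy Ei and zero-crossing rate Zi with a Hamming window. The 8 kHz sampling rate and 160-sample (20 ms) frames follow the example in the text; the test signal and everything else here are hypothetical:

```python
import math

def hamming(n, size):
    # Hamming window coefficient: suppresses the frame's ends, keeps the middle.
    return 0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1))

def frame_features(signal, frame_size=160):
    """Per-frame (short-time energy Ei, zero-crossing rate Zi) pairs.

    frame_size=160 matches the text's example: 8 kHz sampling, 20 ms frames.
    """
    features = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = [signal[start + n] * hamming(n, frame_size) for n in range(frame_size)]
        energy = sum(x * x for x in frame)                           # Ei
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)  # Zi
        features.append((energy, zcr))
    return features

# Hypothetical test signal: 20 ms of a 500 Hz tone followed by 20 ms of tiny,
# rapidly alternating noise.
tone = [math.sin(2 * math.pi * 500 * t / 8000) for t in range(160)]
noise = [0.001 * (-1) ** t for t in range(160)]
feats = frame_features(tone + noise)
print(feats[0][0] > feats[1][0])  # the voiced frame carries far more energy
print(feats[1][1] > feats[0][1])  # the noise-like frame crosses zero far more often
```

A signal-processing library such as NumPy would normally vectorize this; the loop form is kept so that the definitions of Ei and Zi stay explicit.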
Referring to Fig. 2, the syllable database 110, the audio file database 120, and the word database 130 are established in advance, wherein the syllable database 110 stores a plurality of syllable files and a letter combination corresponding to each syllable file, the audio file database 120 stores a plurality of audio files and an audio file keyword corresponding to each audio file, and the word database 130 stores word data corresponding to each audio file keyword. A speech signal is received and endpoint detection is performed on it (step 210); the features of the speech signal are extracted from the endpoint-detection samples (step 220); the speech signal is segmented into at least one syllable according to those features (step 230); the syllable file matching each syllable is found in the syllable database 110 and its corresponding letter combination is extracted (step 240); the letter combinations are concatenated in order into a word (step 250); the audio file database 120 is confirmed to contain an audio file keyword matching the word (step 260); and the word data corresponding to that keyword is looked up in the word database 130 and displayed (step 270).

In step 210, endpoint detection is performed using the short-time energy and the short-time zero-crossing rate, and the samples produced by endpoint detection serve as the basis for identifying voiced sounds. Step 230 further includes trimming each syllable so that the start and end points of each syllable are finally determined. While the word data is displayed in step 270, the audio file corresponding to the audio file keyword, or an ETTS pronunciation, may additionally be output in response to the user's operation.

The present invention receives a speech signal recorded by the user; segments it into at least one syllable using techniques such as endpoint detection and the characteristics of voiced sounds; finds the preset syllable file matching each syllable using techniques such as linear prediction analysis; extracts the letter combination corresponding to each matched syllable file; concatenates the letter combinations in order into a word; and searches for an audio file keyword matching the word. If such a keyword exists, the word data of that keyword is displayed. The invention is applicable to portable electronic dictionaries and can further be extended to client-server architectures.

In summary, the difference between the present invention and the prior art is that the invention performs speech recognition on the received speech signal and analyzes and matches the syllables produced by that recognition. This technical means solves the problems of the prior art and achieves the technical effect of querying a word by speech and effectively providing its definition.

Although embodiments of the present invention are disclosed above, they are not intended to directly limit the scope of patent protection of the invention. Those of ordinary skill in the art may make changes in form and detail without departing from the spirit and scope of the disclosure. The scope of patent protection of the invention is defined by the appended claims.

[Brief Description of the Drawings]
Fig. 1 is a block diagram of the system for querying words by speech according to the present invention.
Fig. 2 is a flowchart of the steps of the method for querying words by speech according to the present invention.
Fig. 3 is a schematic diagram of the data in the word database of the present invention.
Fig. 4 is a schematic diagram of a received speech signal.
Fig. 5A is a schematic diagram of the interface when an audio file corresponding to the word data is confirmed.
Fig. 5B is a schematic diagram of the interface when no audio file corresponding to the word data is found.
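The resolution stages of the method above (steps 240 through 270) can be sketched as a small pipeline. The databases, the per-file similarity scores, and the helper names are hypothetical; only the 0.5 threshold mirrors the 50% match criterion stated in the text:

```python
# Hypothetical databases for the sketch.
SYLLABLE_FILES = {"ba": "ba", "by": "by", "pi": "pi"}  # syllable file -> letter combination
AUDIO_KEYWORDS = {"baby": "baby.mp3"}                  # audio file keyword -> audio file
WORD_DATA = {"baby": "a very young child"}             # audio file keyword -> word data

def best_match(scores, threshold=0.5):
    """Step 240: pick the best-scoring syllable file, if it clears the 50% match."""
    name, score = max(scores.items(), key=lambda kv: kv[1])
    return name if score >= threshold else None

def query(syllable_scores):
    """Resolve a list of per-syllable similarity scores into word data."""
    matches = [best_match(s) for s in syllable_scores]
    if None in matches:
        return "query failed"
    # Step 250: concatenate the matched letter combinations in order.
    word = "".join(SYLLABLE_FILES[m] for m in matches)
    # Steps 260-270: confirm the audio file keyword, then look up the word data.
    if word in AUDIO_KEYWORDS:
        return WORD_DATA[word]
    # Fallback described in the text: search the word database directly.
    return WORD_DATA.get(word, "query failed")

# Two recognized syllables with hypothetical similarity scores per syllable file.
print(query([{"ba": 0.9, "pi": 0.2}, {"by": 0.7, "pi": 0.4}]))  # -> a very young child
```

In the patent the scores would come from LPCC comparison of the recorded syllables against the syllable files; here they are supplied directly to keep the control flow visible.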
[Description of Main Component Symbols]
100 speech word query system
110 syllable database
120 audio file database
130 word database
140 receiving module
150 speech recognition module
160 search module
170 confirmation module
180 display module

510 standard pronunciation
520 pronunciation comparison
530 ETTS pronunciation
Step 210: receive a speech signal and perform endpoint detection on it
Step 220: extract the features of the speech signal from the endpoint-detection samples
Step 230: segment the speech signal into at least one syllable according to its features
Step 240: find the syllable file matching each syllable in the syllable database and extract the letter combination corresponding to each syllable file
Step 250: concatenate the letter combinations in order to produce a word
Step 260: confirm that the audio file database contains an audio file keyword matching the word
Step 270: look up the word data corresponding to the audio file keyword in the word database and display it
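The segmentation in step 230 can be sketched as peak/valley analysis over the per-frame energies: locally maximal frames are syllable cores, the minimum-energy frame between adjacent cores is the boundary, and cores closer than 80 ms (4 frames at 20 ms per frame, per the text) are merged. The energy contour, the noise floor, and the helper names are hypothetical:

```python
def find_peaks(energies, floor=0.2):
    """Indices of local energy maxima above a noise floor (syllable cores)."""
    return [i for i in range(1, len(energies) - 1)
            if energies[i] > floor
            and energies[i] >= energies[i - 1]
            and energies[i] >= energies[i + 1]]

def merge_close_peaks(peaks, energies, min_gap=4):
    """Merge peaks closer than min_gap frames, keeping the higher-energy one."""
    merged = []
    for p in peaks:
        if merged and p - merged[-1] < min_gap:
            if energies[p] > energies[merged[-1]]:
                merged[-1] = p
        else:
            merged.append(p)
    return merged

def syllable_boundaries(energies):
    """Boundary frame index between each pair of adjacent syllable cores."""
    peaks = merge_close_peaks(find_peaks(energies), energies)
    return [min(range(a, b + 1), key=energies.__getitem__)
            for a, b in zip(peaks, peaks[1:])]

# Hypothetical energy contour of a two-syllable word: two humps with a dip between.
contour = [0.0, 0.1, 0.8, 0.9, 0.7, 0.3, 0.1, 0.4, 0.8, 1.0, 0.6, 0.1, 0.0]
print(syllable_boundaries(contour))  # -> [6]
```

The pitch-period checks the text uses to discard unstable peaks and valleys are omitted here; only the energy-based splitting and the 80 ms merge rule are shown.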


Claims (1)

Scope of the Patent Application:
1. A system for querying words by speech, comprising:
   a syllable database storing a plurality of syllable files and a letter combination corresponding to each syllable file;
   an audio file database storing a plurality of audio files and an audio file keyword corresponding to each audio file;
   a word database storing word data corresponding to each audio file keyword;
   a receiving module for receiving a speech signal;
   a speech recognition module for performing endpoint detection on the speech signal, extracting features of the speech signal from the endpoint-detection samples, and segmenting the speech signal into at least one syllable according to those features;
   a search module for finding the syllable file matching each syllable in the syllable database, extracting the letter combination corresponding to each matched syllable file, and concatenating the letter combinations in order to produce a word;
   a confirmation module for confirming that the audio file database contains an audio file keyword matching the word and further looking up the word data corresponding to that keyword in the word database; and
   a display module for displaying the word data.
2. The system for querying words by speech of claim 1, wherein the speech recognition module performs speech recognition according to the characteristics of voiced sounds.
3. The system for querying words by speech of claim 1, wherein the speech recognition module further trims each syllable after segmenting the speech signal into at least one syllable.
4. The system for querying words by speech of claim 1, further comprising an output module for outputting the audio file corresponding to the audio file keyword.
5. The system for querying words by speech of claim 1, wherein the audio files are generated from human pronunciation or ETTS pronunciation.
6. The system for querying words by speech of claim 1, wherein the system is extensible to a client-server architecture.
7. A method for querying words by speech, which establishes in advance a syllable database containing a plurality of syllable files and a letter combination corresponding to each syllable file, an audio file database containing a plurality of audio files and an audio file keyword corresponding to each audio file, and a word database containing word data corresponding to each audio file keyword, the method comprising the steps of:
   receiving a speech signal;
   performing endpoint detection on the speech signal;
   extracting features of the speech signal from the endpoint-detection samples;
   segmenting the speech signal into at least one syllable according to those features;
   finding the syllable file matching each syllable in the syllable database and extracting the letter combination corresponding to each matched syllable file;
   concatenating the letter combinations in order to produce a word;
   confirming that the audio file database contains an audio file keyword matching the word;
   looking up the word data corresponding to that keyword in the word database; and
   displaying the word data.
8. The method for querying words by speech of claim 7, wherein the endpoint-detection samples serve as the basis for identifying voiced sounds.
9. The method for querying words by speech of claim 7, wherein the step of segmenting the speech signal into at least one syllable according to its features further comprises trimming each syllable.
10. The method for querying words by speech of claim 7, further comprising outputting the audio file corresponding to the audio file keyword.
11. The method for querying words by speech of claim 7, wherein the audio files are generated from human pronunciation or ETTS pronunciation.
12. The method for querying words by speech of claim 7, wherein the method is extensible to a client-server architecture.
TW97148169A 2008-12-11 2008-12-11 System for consulting dictionary by speech recognition and method thereof TW201023175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97148169A TW201023175A (en) 2008-12-11 2008-12-11 System for consulting dictionary by speech recognition and method thereof


Publications (1)

Publication Number Publication Date
TW201023175A true TW201023175A (en) 2010-06-16

Family

ID=44833289


Country Status (1)

Country Link
TW (1) TW201023175A (en)
