TW201216253A - A speech recognition method on sentences in all languages - Google Patents

A speech recognition method on sentences in all languages

Info

Publication number
TW201216253A
TW201216253A (application TW99134580A)
Authority
TW
Taiwan
Prior art keywords
sentence
sound
matrix
cepstrum
sentences
Prior art date
Application number
TW99134580A
Other languages
Chinese (zh)
Other versions
TWI460718B (en)
Inventor
Tze-Fen Li
Lee Tai-Jan Li
Shih-Tzung Li
Shih-Hon Li
Li-Chuan Liao
Original Assignee
Tze-Fen Li
Lee Tai-Jan Li
Shih-Tzung Li
Shih-Hon Li
Li-Chuan Liao
Priority date
Filing date
Publication date
Application filed by Tze-Fen Li, Lee Tai-Jan Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao filed Critical Tze-Fen Li
Priority to TW099134580A priority Critical patent/TWI460718B/en
Publication of TW201216253A publication Critical patent/TW201216253A/en
Application granted granted Critical
Publication of TWI460718B publication Critical patent/TWI460718B/en

Landscapes

  • Machine Translation (AREA)

Abstract

The invention can recognize sentences in all languages. A sentence can be a syllable, a word, a name or a full sentence. The most important feature of this invention is that every sentence is represented by an E*P = 12*12 matrix of linear predictive coding cepstra (LPCC), produced by E=12 equal-sized elastic frames without filters and without overlap. Prior speech recognition methods must compute and compare a series of feature matrices, one per word; the invention computes and compares only one E*P matrix of LPCC per sentence. 1000 different voices are transformed into 1000 different matrices of LPCC to represent 1000 different databases. The E*P matrices of known sentences, after deletion of noise and of the time intervals between words and between syllables, are placed into their closest databases. To classify an unknown sentence, the distance is used to find its F closest databases among the 1000 databases, and then, among the known sentences in the F closest databases, the known sentence matching the unknown one. The invention needs no samples and can find a sentence in one second using Visual Basic. Any person, without training, can immediately and freely communicate with a computer in any language. The invention can recognize up to 7200 English words, 500 sentences (in any language) and 500 Chinese words, and can input 4400 Chinese words.

Description

VI. Description of the Invention

[Technical Field of the Invention]

The present invention can recognize sentences in any language. It uses E=12 equal-length elastic frames (windows), without filters and without overlap, to convert the sound wave of a sentence — composed of one to many words, long or short — into an ExP = 12x12 matrix of linear predictive coding cepstra (LPCC). All known sentences to be recognized are first classified by similarity into one thousand different databases. To recognize an unknown sentence, it is first converted into an ExP LPCC matrix; the distance from this matrix to the thousand database matrices finds the closest databases, and the known sentences in those databases are then searched by distance for the sentence to be recognized.

After the user speaks, Visual Basic identifies the desired sentence in less than one second. The method is simple, needs no samples, and can be used immediately by anyone, including speakers with non-standard or incorrect pronunciation. Where prior methods must compute and compare the feature values of every word in a sentence, the present invention computes and compares only one ExP matrix per sentence. It is fast and accurate; Mandarin, Taiwanese, English, Japanese and German pronunciation have all been tested, and a large vocabulary can be recognized.

[Prior Art]

Conventional methods first segment an unknown sentence into single sounds or single words. Segmentation is a difficult technique, especially for English, where one word has several syllables and is hard to cut accurately; a one-syllable error causes the sentence to be misrecognized, so the speaker must talk slowly, carefully and clearly, with long pauses between words. Each segmented word of the unknown sentence is then compared with the known words in a word database, where a single word error again misrecognizes the sentence. Finally, the known words found in the word database are concatenated, in the order of the words of the unknown sentence, into a known sentence, and the most probable known sentence is selected from a sentence database. Such methods are hard to make accurate, are time-consuming, and do not allow normal free conversation with a computer. They also require time-consuming sample collection with statistical estimation and recognition, which is inherently inexact, since statistics can only estimate.

The pronunciation of a sentence is represented by its sound wave, a system that varies nonlinearly with time; the wave of a sentence carries dynamic characteristics that also vary nonlinearly and continuously with time. When the same sentence is pronounced twice, the same series of dynamic characteristics appears in the same time order, nonlinearly stretched or compressed, so the same characteristics fall at different time positions. Arranging the same dynamic characteristics of the same sentence at the same time positions is very difficult.

A computerized speech recognition system must first extract from the sound wave the language information — the dynamic characteristics — and filter out noise unrelated to language, such as the speaker's timbre, pitch, emotion and physiology. It must then arrange the same features of the same sentence at the same time positions. The resulting series of features is represented by equal-length feature vectors and is called the feature model of the sentence. Producing feature models of uniform size is too complicated in current speech recognition systems, and because the same features of the same sentence are hard to arrange at the same time positions — especially in English — comparison and recognition are difficult.

Common features of a continuous sound wave include energy, zero crossings, extreme count, formants, the linear predictive coding cepstrum (LPCC) and the Mel-frequency cepstrum (MFCC); of these, LPCC and MFCC are the most effective and the most widely used. The LPCC is the most reliable, stable and accurate language feature for a continuous sound: the wave is represented by a linear regression model, the regression coefficients are computed by least-squares estimation, and the estimates are converted into a cepstrum, yielding the LPCC. The MFCC transforms the wave into frequencies by the Fourier transform and models the auditory system according to the Mel-frequency scale.

According to the paper by S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 28, No. 4, 1980, with dynamic time warping (DTW) the MFCC feature gives a higher recognition rate than the LPCC feature. In many speech recognition experiments (including the inventor's previous inventions) using Bayesian classification, however, the LPCC feature gives a higher recognition rate than the MFCC feature, and saves time.

As for recognition methods, many exist: dynamic time warping, vector quantization and the hidden Markov model (HMM). If the same pronunciation varies in timing, DTW pulls the same features to the same time positions while comparing; the recognition rate is good, but pulling the same features to the same positions is difficult, the warping takes too long, and the method cannot be applied. Vector quantization is inaccurate and time-consuming when it must recognize a large number of single sounds. The recent HMM method recognizes well, but it is complicated: too many unknown parameters must be estimated, and computing the estimates and recognizing cost time.

In T. F. Li, "Speech recognition of mandarin monosyllables", Pattern Recognition, Vol. 36, 2003; Li, Tze Fen, "Apparatus and Method for Normalizing and Categorizing Linear Prediction Code Vectors using Bayesian Categorization Technique", U.S.A. Patent No. 5,704,004, Dec. 30, 1997; the speech recognition method of R.O.C. Patent No. I 297487 (2008); and the method of R.O.C. Patent No. I 310543 (2009) for recognizing similar Mandarin monosyllables by a consecutive quadratic Bayesian classification, series of LPCC vectors of different lengths are compressed by various methods into feature models of the same size and classified by the Bayesian method. The recognition results are better than the HMM method of Y. K. Chen, C. Y. Liu, G. H. Chiang and M. T. Lin, "The recognition of mandarin monosyllables based on the discrete hidden Markov model", Proceedings of Telecommunication Symposium, Taiwan, 1990. The compression process, however, is complicated and time-consuming; it is hard to compress the same features of the same sound to the same time positions, and similar single sounds remain hard to distinguish.

Addressing the above shortcomings, the speech recognition method of the present invention starts from the principle that a sound wave carries a speech feature varying nonlinearly with time, and naturally derives a feature extraction scheme that represents a sentence of any language by one equal-sized ExP = 12x12 matrix.

[Summary of the Invention]

(1) The most important object of the invention is to recognize any sentence of any language quickly and accurately. Where prior methods compute and compare the feature values of all the words of a sentence, the invention computes and compares only one ExP matrix per sentence, achieving the goal of free conversation with a computer.
(2) To achieve the object of (1), the invention applies a method of normalizing the sound wave of a sentence and extracting its features. A small number E=12 of equal-length elastic frames, without filters and without overlap, freely adjust to the length of the sentence wave so as to cover the whole sentence, and convert every sentence — from one word to many words, long or short — into one ExP = 12x12 LPCC matrix. The series of dynamic characteristics inside a sentence, varying nonlinearly with time, is converted into an equal-sized ExP LPCC matrix, and the feature models of the same sentence carry the same features at the same time positions, so they can be compared at once for real-time recognition by computer.

(3) The invention uses one thousand different databases, so it can recognize a large number of sentences, fast and with greatly improved accuracy. All known sentences are distributed among the thousand databases, each into the database whose voice is closest. When recognizing an unknown sentence, the F databases closest to the unknown sentence's voice are found first, and the unknown sentence is then sought among the known sentences of those F databases. The F closest databases contain few known sentences, so recognition is easy, accurate and fast. Where prior methods compute and compare the feature-value matrices of all the words in a sentence, the invention computes and compares only one ExP matrix per sentence.

(4) The invention uses no samples and no statistical estimation and recognition; it recognizes by mathematical computation of the distances between the LPCC ExP matrices of sentences.

(5) The invention can recognize sentences spoken too fast or too slow. When speech is too fast, the sentence wave is short, and the length of the equal elastic frames shrinks, still covering the short wave with the same number E of equal-length frames and producing E LPCC vectors. When speech is too slow, the sentence wave is long, and the frame length stretches; the same number E of LPCC vectors still represents the long sentence effectively.

(6) The invention provides a correction technique: pronounce the misrecognized sentence clearly once more.

[Embodiments]

The first and second figures illustrate the execution of the invention: the first shows the building of the databases, each containing similar known sentences; the second shows the procedure by which a user recognizes an unknown sentence.

First there are M=1000 different voices (1). Each voice wave is digitized into signal points (10), cleaned of noise and silent periods (20), and normalized by dividing all voiced signal points into E equal time segments, each segment forming one frame. A voice thus has E equal-length frames (30), without filters and without overlap; the E equal frame lengths adjust freely to the length of the wave so as to cover all signal points. The frame is therefore called an equal-length elastic frame: its length stretches freely, but the E elastic frames have the same length, unlike a Hamming window, which has a filter, half overlap and a fixed length that cannot adjust to the wavelength. Since a signal point can be estimated from the preceding signal points, a regression model linear in time closely estimates the nonlinearly varying wave, with the coefficients computed by least squares; each elastic frame yields P=12 LPCC values (40), so the ExP LPCC matrix of each voice represents one database, one thousand databases in all (50). Each known sentence to be recognized is pronounced clearly once, cleaned of silence and noise before and after the sentence and between words and between syllables, converted by the E equal elastic frames into an LPCC ExP matrix (60), and placed by distance into the closest database (70); there are M=1000 databases, each containing similar known sentences (80).

The second figure shows the recognition of an unknown sentence: the user pronounces it clearly (2), the wave is digitized (10), silence and noise are removed (20), the E equal elastic frames extract an ExP LPCC matrix (41), the F closest databases are found by distance (84), and the unknown sentence is found among their known sentences (90).

The invention is described in detail below:

(1) After a voice (sentence) is pronounced clearly (1), its sound wave is converted into a series of digitized signal points (sampled points) (10). Sampled points carrying no speech — the silence and noise before and after the voice (sentence), between words and between syllables — are deleted (20); the remaining points represent all voiced signal points of the voice (sentence). The wave is normalized and its features extracted: all signal points are divided into E equal time segments, each segment forming one frame, so one voice (sentence) has E equal-length elastic frames, without filters and without overlap, stretching freely to cover all signal points (30). In each equal-length elastic frame, the signal points vary nonlinearly with time and are hard to represent by a mathematical model.
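The specification gives no code for the silence-deletion and elastic-framing steps; the following is a minimal Python sketch. The energy threshold, the 80-sample detection chunk, and the function names are illustrative assumptions — the patent says only that non-speech sampled points are deleted and that the E=12 frames stretch to the sentence length, not how silence is detected.

```python
import numpy as np

def remove_silence(signal, chunk=80, threshold=0.01):
    """Drop low-energy stretches (silence/noise) before, after, and
    between words, keeping only voiced sample points.
    The chunk size and energy threshold are illustrative assumptions."""
    keep = []
    for start in range(0, len(signal) - chunk + 1, chunk):
        piece = signal[start:start + chunk]
        if np.mean(piece ** 2) > threshold:   # simple energy test
            keep.append(piece)
    return np.concatenate(keep) if keep else np.array([])

def elastic_frames(signal, E=12):
    """Split the voiced signal into E equal-length, non-overlapping
    'elastic' frames; the frame length adapts to the sentence length,
    so fast (short) and slow (long) utterances both yield E frames."""
    n = len(signal) // E            # frame length stretches/shrinks
    return [signal[i * n:(i + 1) * n] for i in range(E)]
```

With this sketch, a short and a long utterance of the same sentence both produce exactly E=12 frames, which is what lets every sentence map to a fixed 12x12 feature matrix.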
Because J. Markhoul, "Linear Prediction: A tutorial review", Proceedings of the IEEE, Vol. 63, No. 4, 1975, and Li, Tze Fen, "Apparatus and Method for Normalizing and Categorizing Linear Prediction Code Vectors using Bayesian Categorization Technique", U.S.A. Patent No. 5,704,004, Dec. 30, 1997, show that a signal point has a linear relationship with the preceding signal points, a regression model that is linear in time can closely estimate the nonlinearly varying signal points. The signal point S(n) can be estimated from the P preceding signal points; its estimate S'(n) is given by the regression model

    S'(n) = sum_{k=1..P} a_k S(n-k),  n > 0                          (1)

where a_k, k = 1, ..., P, are the unknown regression coefficients to be estimated and P is the number of preceding signal points. The least-squares estimates are computed with Durbin's recursive formulas, given in L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall PTR, Englewood Cliffs, New Jersey, 1993, and in U.S.A. Patent No. 5,704,004; the resulting set of estimates is called the linear predictive coding (LPC) vector. The LPC vector of the signal points in a frame is computed as follows. Let E_1 denote the sum of squared differences between the signal points S(n) and their estimates S'(n):

    E_1 = sum_n [ S(n) - sum_{k=1..P} a_k S(n-k) ]^2                 (2)

The regression coefficients are chosen to minimize E_1. Taking the partial derivative of (2) with respect to each unknown coefficient a_i, i = 1, ..., P, and setting it to zero gives the P normal equations

    sum_{k=1..P} a_k sum_n S(n-k) S(n-i) = sum_n S(n) S(n-i),  1 <= i <= P    (3)

Expanding (2) and substituting (3) gives the minimum total squared error

    E_P = sum_n S^2(n) - sum_{k=1..P} a_k sum_n S(n) S(n-k)          (4)

Equations (3) and (4) are converted into

    sum_{k=1..P} a_k R(i-k) = R(i),  1 <= i <= P                     (5)

    E_P = R(0) - sum_{k=1..P} a_k R(k)                               (6)

where, in (5) and (6), with N denoting the number of signal points in the frame,

    R(i) = sum_{n=0..N-i} S(n) S(n+i),  i >= 0                       (7)

Durbin's recursion computes the LPC vector quickly:

    E_0 = R(0)                                                       (8)

    k_i = [ R(i) - sum_{j=1..i-1} a_j^(i-1) R(i-j) ] / E_{i-1}       (9)

    a_i^(i) = k_i                                                    (10)

    a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1),  1 <= j <= i-1          (11)

    E_i = (1 - k_i^2) E_{i-1}                                        (12)

Computing (8)-(12) recursively for i = 1, ..., P gives the least-squares estimates of the regression coefficients (the LPC vector):

    a_j = a_j^(P),  1 <= j <= P                                      (13)

The LPC vector is then converted into the more stable linear predictive coding cepstrum (LPCC) vector (a'_1, ..., a'_P):

    a'_i = a_i + sum_{j=1..i-1} (j/i) a'_j a_{i-j},  1 <= i <= P     (14)

    a'_i = sum_{j=i-P..i-1} (j/i) a'_j a_{i-j},  P < i               (15)
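The autocorrelation and Durbin steps described above, formulas (7)-(14), can be sketched in Python as follows. This is an illustrative implementation of the standard Levinson-Durbin procedure the description cites, not the patent's own code; only the first P cepstral terms of formula (14) are kept, matching P=12, and formula (15) for terms beyond P is omitted.

```python
import numpy as np

def lpcc(frame, P=12):
    """LPC coefficients of one elastic frame via Durbin's recursion
    (formulas (7)-(13)), then the LPC-to-cepstrum conversion (14)."""
    N = len(frame)
    # autocorrelation R(i), formula (7)
    R = np.array([np.dot(frame[:N - i], frame[i:]) for i in range(P + 1)])
    a = np.zeros(P + 1)       # a[1..i] hold the order-i coefficients
    E = R[0]                  # formula (8)
    for i in range(1, P + 1):
        # formula (9): reflection coefficient k_i
        k = (R[i] - np.dot(a[1:i], R[i - 1:0:-1])) / E
        a_new = a.copy()
        a_new[i] = k                         # formula (10)
        for j in range(1, i):
            a_new[j] = a[j] - k * a[i - j]   # formula (11)
        a, E = a_new, (1 - k * k) * E        # formula (12)
    # formula (14): cepstral recursion, first P terms only
    c = np.zeros(P + 1)
    for i in range(1, P + 1):
        c[i] = a[i] + sum((j / i) * c[j] * a[i - j] for j in range(1, i))
    return c[1:]
```

Applied to each of the E=12 elastic frames, this yields the E rows of the 12x12 LPCC matrix that represents one sentence.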

Each elastic frame thus produces one linear predictive coding cepstrum (LPCC) vector (a'_1, ..., a'_P) (40). According to the speech recognition method of the invention, P=12 is used, because the later LPCC values are almost zero. A voice (sentence) is represented by E LPCC vectors, i.e. by one matrix of ExP = 12x12 LPCC values (50).

(2) After each known sentence to be recognized is pronounced, all silence and noise before and after the sentence and between words and between syllables are deleted, and formulas (8)-(15) convert the known sentence into an LPCC ExP matrix (60). The distance, or weighted distance, between the known sentence's LPCC ExP matrix and the LPCC ExP matrices of all M=1000 different voices finds the closest database, into which the known sentence's LPCC ExP matrix is placed (70). There are M=1000 databases, each containing similar known sentences (80).

(3) To recognize an unknown sentence, the user first pronounces the unknown sentence clearly (2). The wave of the unknown sentence is digitized into signal points (10), and silence and noise are removed (20): before and after the unknown sentence, between words and between syllables, all silence and noise are deleted. E equal-length elastic frames normalize the wave and extract the features: all voiced signal points of the unknown sentence are divided into E equal time segments, each segment forming an elastic frame (30); the E equal-length elastic frames, without filters and without overlap, stretch freely to cover all signal points. In each frame, since a signal point can be estimated from the preceding signal points, the least-squares method computes the estimates of the unknown regression coefficients; the P=12 estimates of each frame form a linear predictive coding (LPC) vector, which formula (14) converts into the more stable LPCC vector, so that the unknown sentence is represented by one ExP LPCC matrix (41).
The invention then computes the distance or weighted distance between the unknown sentence's LPCC ExP matrix and the LPCC ExP matrices of the M=1000 databases to find the F closest databases, i.e. the F databases whose distances to the unknown sentence's matrix are smallest (84). From the distances or weighted distances between the unknown sentence's LPCC ExP matrix and the LPCC ExP matrices of the known sentences in those F closest databases, the invention finds the unknown sentence the user wants (90).

(4) To verify that the invention quickly and accurately recognizes any sentence of any language and allows free conversation with a computer, the inventors used 1000 English word voices to represent 1000 different databases and pronounced 928 sentences (80 English sentences, 284 Chinese sentences, 3 Taiwanese sentences, 2 Japanese sentences, 160 English words, 398 Chinese words, 1 German word). After testing, the sentences and English words all ranked first; where prior methods must compute and compare the feature values of all the words of a sentence, the invention computes and compares only one ExP matrix. The Chinese words ranked in the top two — there are too many homophones — with recognition in under one second. The inventors also pronounced 7200 English words, which ranked in the top five with recognition in under two seconds, and 4400 Chinese words, which ranked in the top twenty with recognition in under two seconds. The 4400 Chinese words serve as Chinese speech-input software, with which this specification was input.

(5) Figures 6 and 7 show fragments of this specification input with the software (Visual Basic); Figures 3 to 5 show the invention recognizing Chinese and English sentences.

[Brief Description of the Drawings]

The first and second figures illustrate the execution procedure of the invention.
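The two-stage distance search — assigning known sentences to databases per step (2), then finding the F closest databases and the nearest known sentence per step (3) — can be sketched in Python. This is a minimal illustration assuming a plain squared Euclidean distance between 12x12 matrices; the weighted distance the patent also allows is not shown, and all names are assumptions.

```python
import numpy as np

def build_databases(voice_matrices, known_sentences):
    """Assign each known sentence's 12x12 LPCC matrix to the closest
    of the M database (voice) matrices."""
    dbs = [[] for _ in voice_matrices]
    for mat in known_sentences:
        d = [np.sum((mat - v) ** 2) for v in voice_matrices]
        dbs[int(np.argmin(d))].append(mat)
    return dbs

def classify(unknown, voice_matrices, dbs, F=3):
    """Find the F databases closest to the unknown sentence's matrix,
    then the nearest known sentence inside those F databases."""
    d = np.array([np.sum((unknown - v) ** 2) for v in voice_matrices])
    candidates = [m for i in np.argsort(d)[:F] for m in dbs[i]]
    if not candidates:
        return None
    dists = [np.sum((unknown - m) ** 2) for m in candidates]
    return candidates[int(np.argmin(dists))]
```

Because only the known sentences in the F closest databases are compared, the final search touches a small candidate set rather than all known sentences, which is the source of the speed claim.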
The first figure shows the establishment of M=1〇〇0 [S] 201216253 different databases, each containing similar known sentences. The second figure shows the process of identifying unknown sentences. The first is to use the yis (10) software to input the segment of the present invention and to identify Chinese and English sentences. [Main component symbol description] (1) First M=1000 different sounds Lu (1〇) sound wave digitization (20) Remove noise and mute Time Segment (30) E equal length elastic frames normalized all sound waves (40) every fresh long _, Saki small flat wire glaring linear prediction coding cepstrum (50) - a linear prediction of the sound encoding cepstrum Εχ ρ matrix Represents a database with a total of one thousand databases. • (60) Clearly pronounces known sentences - times, removes mutes and noises, converts them into linear predictive coding cepstrum (Lpcc) ΕχΡ matrix (70) with distance The known sentence linear predictive coding cepstrum (LpCc) matrix is divided into the closest database (80) with Μ=1000 databases, each containing similar known sentences (2) Forbearance of unknown sentences is clearly pronounced (41) Within each equal-length elastic frame, ρ linear predictive coding cepstrums are calculated by the least square method, and an unknown sentence is coded by linear predictive cepstrum ε 201216253 χΡ matrix representation (84) distance Find the unknown sentences in the F closest database with the closest known database (90) in the database of 1000=1000 databases, and find the unknown sentences to be identified by distance.
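The normalization and matching steps above can be sketched in a few lines. This is an illustrative reconstruction, not the patented implementation: NumPy, the function names, and the squared-Euclidean distance are assumptions, and the conversion of each frame into P cepstra is omitted here.

```python
import numpy as np

def equal_frames(signal, E=12):
    """Divide the voiced sampled points into E equal, non-overlapping
    'elastic' frames that together cover the whole waveform
    (no filter, no fixed-length Hamming window)."""
    edges = np.linspace(0, len(signal), E + 1, dtype=int)
    return [signal[edges[i]:edges[i + 1]] for i in range(E)]

def closest_databases(unknown, databases, F=3):
    """Rank the databases by the distance between E*P LPCC matrices.
    `unknown` is an (E, P) matrix; `databases` maps a label to the
    representative (E, P) matrix.  Returns the F closest labels."""
    dists = {label: float(np.sum((unknown - mat) ** 2))
             for label, mat in databases.items()}
    return sorted(dists, key=dists.get)[:F]
```

Because every utterance, long or short, is reduced to the same fixed-size E×P matrix, recognition needs only one matrix comparison per database rather than a frame-by-frame alignment of a variable-length feature sequence.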

Claims (1)

VII. Scope of the patent application:
1. A method for recognizing sentences in all languages, the steps comprising:
(1) a sentence may be a syllable, word, name or sentence in any language; start with M=1000 different voices;
(2) a pre-processor deletes, before and after the sentence (voice), between two words and between two syllables, all silence and noise sampled points carrying no speech;
(3) a method of normalizing a voice or sentence waveform and extracting features: E equal elastic frames, without filter and without overlap, normalize the waveform of a voice or sentence and convert it into an equal-sized linear predict coding cepstrum (LPCC) E×P matrix;
(4) the LPCC E×P matrices of the M=1000 different voices represent M=1000 different databases;
(5) the user pronounces a known sentence clearly once; delete, before and after the sentence and between two words and two syllables, all silence and noise sampled points carrying no speech; E equal elastic frames normalize the voiced waveform of the known sentence and convert it into an equal-sized LPCC E×P matrix;
(6) use the distance or weighted distance between the known sentence's LPCC E×P matrix and the LPCC E×P matrices of all M=1000 different voices to find the closest database, and assign the known sentence's LPCC E×P matrix to that database; in the same way, the LPCC E×P matrix of every known sentence of any language to be recognized is assigned, by distance or weighted distance, to the database whose representative LPCC E×P matrix is nearest, so that similar known sentences are placed in the same database;
(7) to recognize an unknown sentence, after the user pronounces the desired sentence, the invention likewise uses the distance or weighted distance between the unknown sentence's LPCC E×P matrix and the LPCC E×P matrices of all M=1000 different voices to find the F closest databases, and then uses the distance or weighted distance between the unknown sentence's LPCC E×P matrix and the LPCC E×P matrices of the similar known sentences in the F closest databases to find the unknown sentence the user wants;
(8) if recognition is unsuccessful, the user pronounces the sentence once more; E equal elastic frames convert the sentence into an LPCC E×P matrix, the distance assigns that matrix to the closest database, and recognizing the sentence again will succeed.
2. The method for recognizing sentences in all languages according to claim 1, wherein step (3) comprises using E equal elastic frames, of equal length, without filter and without overlap, to normalize the waveform of a voice or sentence and extract a feature matrix of uniform size, by the following steps:
(a) delete, before and after the sentence (voice) and between two words and two syllables, all silence and noise sampled points carrying no speech, and then divide all voiced sampled points of the sentence (voice) equally: in order to closely estimate the nonlinearly varying waveform with a linearly varying regression model, divide the full voiced waveform into E=12 equal segments, each segment forming an elastic frame, so that a sentence (voice) has E equal-length elastic frames, without filter, non-overlapping, freely stretching to cover the full waveform, not fixed-length Hamming windows;
(b) within each equal-length elastic frame, estimate the waveform, which varies nonlinearly with time, by a regression model that varies linearly with time;
(c) use Durbin's recursion
$R(i) = \sum_{n=1}^{N-i} S(n)S(n+i),\quad i \ge 0$
$E_0 = R(0)$
$k_i = \Big[R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j)\Big] \big/ E_{i-1}$
$a_i^{(i)} = k_i$
$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)},\quad 1 \le j \le i-1$
$E_i = (1 - k_i^2)\, E_{i-1}$
$a_j = a_j^{(P)},\quad 1 \le j \le P$
to compute the least-squares estimates of the P=12 regression coefficients, called the linear predict coding (LPC) vector, and then use
$\hat{a}_i = a_i + \sum_{j=1}^{i-1} \tfrac{j}{i}\, \hat{a}_j\, a_{i-j},\quad 1 \le i \le P$
$\hat{a}_i = \sum_{j=i-P}^{i-1} \tfrac{j}{i}\, \hat{a}_j\, a_{i-j},\quad P < i$
to convert the linear predict coding (LPC) vector into a stable linear predict coding cepstrum (LPCC) vector $\hat{a}_i$, $1 \le i \le P$;
(d) represent a sentence (or a voice) by E linear predict coding cepstrum (LPCC) vectors (a linear predict coding cepstrum (LPCC) E×P matrix).
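Steps (c) and (d) above, Durbin's recursion followed by the LPC-to-cepstrum conversion, can be sketched as follows. This is a hedged reading of the recited formulas, not the patentee's code: NumPy and the function names are assumptions, and the cepstrum step uses the standard LPC-cepstrum recursion.

```python
import numpy as np

def durbin(r, p):
    """Durbin's recursion: solve for the p LPC (regression) coefficients
    a_1..a_p from the autocorrelation sequence r[0..p]."""
    a = np.zeros(p + 1)        # a[0] unused; a[j] holds a_j
    e = r[0]                   # E_0 = R(0)
    for i in range(1, p + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / e    # k_i
        prev = a.copy()
        a[i] = k                                          # a_i^(i) = k_i
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]              # a_j^(i)
        e *= (1.0 - k * k)                                # E_i
    return a[1:]               # a_1 .. a_p

def lpc_to_lpcc(a, P):
    """Convert LPC coefficients a_1..a_p into P cepstral coefficients
    with the recursion  c_i = a_i + sum_{j<i} (j/i) c_j a_{i-j}."""
    p = len(a)
    c = np.zeros(P)
    for i in range(1, P + 1):
        acc = a[i - 1] if i <= p else 0.0
        for j in range(max(1, i - p), i):
            acc += (j / i) * c[j - 1] * a[i - j - 1]
        c[i - 1] = acc
    return c
```

Applying `lpc_to_lpcc` to each of the E=12 frames and stacking the results yields the 12×12 LPCC matrix that represents a sentence; the cepstra are obtained directly from the least-squares coefficients, so no Fourier transform is needed.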
TW099134580A 2010-10-11 2010-10-11 A speech recognition method on sentences in all languages TWI460718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW099134580A TWI460718B (en) 2010-10-11 2010-10-11 A speech recognition method on sentences in all languages

Publications (2)

Publication Number Publication Date
TW201216253A true TW201216253A (en) 2012-04-16
TWI460718B TWI460718B (en) 2014-11-11

Family

ID=46787165


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100827097B1 (en) * 2004-04-22 2008-05-02 삼성전자주식회사 Method for determining variable length of frame for preprocessing of a speech signal and method and apparatus for preprocessing a speech signal using the same
TWI370441B (en) * 2008-02-19 2012-08-11 Tze Fen Li A speech recognition method for both english and chinese

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI610294B (en) * 2016-12-13 2018-01-01 財團法人工業技術研究院 Speech recognition system and method thereof, vocabulary establishing method and computer program product
US10224023B2 (en) 2016-12-13 2019-03-05 Industrial Technology Research Institute Speech recognition system and method thereof, vocabulary establishing method and computer program product

Also Published As

Publication number Publication date
TWI460718B (en) 2014-11-11


Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees