TWI395200B - A speech recognition method for all languages without using samples - Google Patents


Info

Publication number
TWI395200B
TWI395200B TW98126015A
Authority
TW
Taiwan
Prior art keywords
continuous
unknown
sound
sentence
name
Prior art date
Application number
TW98126015A
Other languages
Chinese (zh)
Other versions
TW201106340A (en)
Inventor
Tze Fen Li
Tai Jan Lee Li
Shih Tzung Li
Shih Hon Li
Li Chuan Liao
Original Assignee
Tze Fen Li
Tai Jan Lee Li
Shih Tzung Li
Shih Hon Li
Li Chuan Liao
Priority date
Filing date
Publication date
Application filed by Tze Fen Li, Tai Jan Lee Li, Shih Tzung Li, Shih Hon Li, Li Chuan Liao filed Critical Tze Fen Li
Priority to TW98126015A priority Critical patent/TWI395200B/en
Publication of TW201106340A publication Critical patent/TW201106340A/en
Application granted granted Critical
Publication of TWI395200B publication Critical patent/TWI395200B/en


Description

A speech recognition method that can identify all languages without using samples

A continuous tone contains one or more syllables (monosyllables). The present invention can recognize all languages without using samples of continuous tones.

The invention uses 12 elastic frames (windows) of equal length, with no filter and no overlap, to convert the waveform of a continuous tone of any length into a 12×12 matrix of linear predictive coding cepstra (LPCC); an unknown continuous tone is thus represented by a 12×12 LPCC matrix. A 12×12 matrix is regarded as a vector in a 144-dimensional space, and the vectors of many unknown continuous tones are scattered through that space. When a speaker utters a known continuous tone, the feature of the known continuous tone is simulated and computed from the features (LPCC) of the surrounding unknown continuous tones.
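The nearest-neighbour simulation described above can be sketched as follows. This is an illustrative reading only: the function names are invented, the distance is plain Euclidean rather than the patent's Bayesian distance, and the average is unweighted rather than the patent's weighted average.

```python
# Sketch: a 12x12 LPCC matrix as a 144-dimensional vector, and a known
# tone's feature estimated from its N nearest unknown-tone neighbours.
# (Illustrative simplification of the patent's procedure.)

def flatten(matrix):
    """Treat an E x P LPCC matrix as one vector (144-dim for 12x12)."""
    return [x for row in matrix for x in row]

def nearest_unknown(known_vec, unknown_vecs, n):
    """Return the n unknown-tone vectors closest to the known tone
    (Euclidean distance here; the patent uses a Bayesian distance)."""
    dist = lambda v: sum((a - b) ** 2 for a, b in zip(known_vec, v))
    return sorted(unknown_vecs, key=dist)[:n]

def estimate_feature(known_vec, unknown_vecs, n):
    """Estimate the known tone's feature as the average of itself and
    its n nearest unknown neighbours (an unweighted stand-in for the
    patent's weighted average of N+1 values)."""
    neighbours = nearest_unknown(known_vec, unknown_vecs, n)
    cols = zip(known_vec, *neighbours)
    return [sum(c) / (n + 1) for c in cols]
```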

The invention comprises 12 elastic frames that normalize the waveform of a continuous tone; a Bayesian matching method that finds, in the database, a known continuous tone for the speaker's unknown continuous tone; a segmentation step that divides a speaker's unknown sentence into D unknown continuous tones; and a window screening method that screens known sentences for the speaker's unknown sentence.

When a continuous tone is uttered, its pronunciation is expressed as a sound wave. A sound wave is a system that changes nonlinearly with time, and a continuous-tone waveform contains dynamic characteristics that also change nonlinearly with time. When the same continuous tone is uttered again, the same series of dynamic characteristics appears, nonlinearly stretched and contracted in time: the dynamic characteristics occur in the same order, but at different time positions. It is therefore very difficult to align the same dynamic characteristics of the same continuous tone at the same time positions, and the large number of similar continuous tones makes recognition even harder.

A computerized language recognition system must first extract the linguistic information carried by the sound wave, that is, the dynamic characteristics, and filter out noise unrelated to language: the speaker's timbre and pitch, and the psychological, physiological, and emotional factors of speech, which are irrelevant to recognition and are deleted first. The same features of the same continuous tone are then aligned at the same time positions. This series of features is represented by an equal-length series of feature vectors, called the feature model of a continuous tone. For current speech recognition systems, producing feature models of uniform size is too complicated and time-consuming, because the same features of the same continuous tone are hard to align at the same time positions, especially in English, which makes matching and recognition difficult.

A general sentence or name recognition method involves the following five main tasks: cutting the unknown sentence or name into D unknown continuous tones; extracting features; normalizing the features (so that feature models are of uniform size and the same features of the same continuous tone are aligned at the same time positions); recognizing the unknown continuous tones; and finding a suitable sentence or name in the sentence or name database. Commonly used features of a continuous-tone waveform include energy, zero crossings, extreme count, formants, the linear predictive coding cepstrum (LPCC), and the Mel-frequency cepstrum (MFCC); among these, LPCC and MFCC are the most effective and the most widely used. The LPCC is the most reliable, stable, and accurate linguistic feature for representing a continuous tone. It represents the continuous-tone waveform by a linear regression model, computes the regression coefficients by least squares estimation, and converts the estimates into a cepstrum, yielding the LPCC. The MFCC converts the sound wave into frequencies by the Fourier transform and models the auditory system according to the Mel frequency scale. According to the paper by S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 28, No. 4, 1980, which used dynamic time warping (DTW), the MFCC feature gives a higher recognition rate than the LPCC feature. However, in many speech recognition experiments (including the inventor's previous inventions) using the Bayesian classification method, the LPCC feature gives a higher recognition rate than the MFCC feature and saves time.

As for language recognition, many methods have been adopted: dynamic time warping (DTW), vector quantization, and the hidden Markov model (HMM). When the same pronunciation varies in time, DTW pulls the same features to the same time positions while matching. The recognition rate is good, but pulling the same features to the same positions is difficult, and the warping takes too long to be practical. Vector quantization, when used to recognize a large number of continuous tones, is both inaccurate and time-consuming. The hidden Markov model gives good recognition, but the method is complicated: too many unknown parameters must be estimated, and computing the estimates and performing recognition are time-consuming. In the paper by T. F. Li, "Speech recognition of mandarin monosyllables", Pattern Recognition, Vol. 36, 2003, the Bayesian classification method, which compresses series of LPCC vectors of various lengths into classification models of the same size, gave better recognition results on the same database than the HMM method of Y. K. Chen, C. Y. Liu, G. H. Chiang, and M. T. Lin, "The recognition of mandarin monosyllables based on the discrete hidden Markov model", Proceedings of Telecommunication Symposium, Taiwan, 1990. However, the compression process is complicated and time-consuming, it is hard to compress the same features of the same continuous tone to the same time positions, and similar continuous tones remain hard to distinguish.

The speech recognition method of the present invention addresses the above shortcomings. On theoretical grounds, starting from the fact that a sound wave carries speech features that change nonlinearly with time, a set of feature extraction methods is naturally derived. A continuous-tone waveform is first normalized and then converted into an equal-sized feature model sufficient to represent the continuous tone, and the same continuous tones have the same features at the same time positions in their feature models. No manual or experimental tuning of unknown parameters or thresholds is required. Using a simple Bayesian classification method, the classification model of an unknown continuous tone is compared with the standard models of known continuous tones in the feature database, with no need to compress, warp, or search for matching features. The speech recognition method of the invention therefore completes feature extraction, feature normalization, and recognition quickly.

(1) The most important object of the present invention is to use the features of multiple unknown continuous tones to simulate and compute the feature of any known continuous tone in any language; the invention can therefore establish the feature of any continuous tone of any language without samples, that is, it can correctly recognize all languages without samples. In detail, for any known continuous tone of any language, the invention uses the Bayesian distance to find N unknown continuous-tone matrices in the 144-dimensional space to simulate and compute the known continuous tone, so that the feature of any known continuous tone can be established without samples of that tone. Any language can therefore be recognized.

(2) The present invention provides a language recognition method that can delete non-speech sound waves.

(3) The present invention provides a method for normalizing a continuous-tone waveform and extracting its features. It uses E equal elastic frames, without overlap and without filters, which adjust freely to the length of the continuous tone so as to cover the whole waveform. It converts the series of nonlinearly time-varying dynamic characteristics in the waveform into an equal-sized feature model, and the feature models of the same continuous tone have the same features at the same time positions. Recognition can thus be performed in time, achieving real-time recognition on a computer.
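The elastic-frame normalization in (3) can be sketched as follows: the waveform is split into E equal, contiguous, non-overlapping frames whose common length adapts to the total number of signal points (E = 12 in the patent). The function name and the remainder-distribution detail are illustrative assumptions.

```python
# Sketch of elastic framing: E non-overlapping frames of (near-)equal
# length that together cover every signal point, however long or short
# the continuous tone is.

def elastic_frames(samples, e=12):
    """Split a list of signal points into e contiguous frames; frame
    lengths differ by at most one sample so the whole waveform is
    covered with no filter and no overlap."""
    n = len(samples)
    frames, start = [], 0
    for i in range(e):
        end = ((i + 1) * n) // e  # spreads any remainder across frames
        frames.append(samples[start:end])
        start = end
    return frames
```

Because the frame count is fixed at E, a fast (short) utterance shrinks every frame and a slow (long) utterance stretches every frame, which is exactly the behaviour claimed in (7) below.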

(4) The present invention provides a simple and effective Bayesian method for recognizing unknown continuous tones, which minimizes the probability of misclassification, requires little computation, recognizes quickly, and achieves a high recognition rate.
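One plausible form of the simplified Bayesian matching in (4), under the independent-normal assumption described later in the text, is a variance-weighted distance between the unknown tone's LPCC values and a known tone's stored means. The score below is the standard independent-Gaussian negative log-likelihood (up to constants); the exact decision rule in the patent may differ, and the names are illustrative.

```python
# Sketch: score an unknown tone's LPCC vector against known-tone standard
# models (mean, variance per component); the smallest score wins.

import math

def bayes_score(lpcc, means, variances):
    """Smaller is more similar: squared distance to the means scaled by
    the variances, plus the log-variance term of an independent-normal
    log-likelihood."""
    return sum((x - m) ** 2 / v + math.log(v)
               for x, m, v in zip(lpcc, means, variances))

def best_match(lpcc, standard_models):
    """standard_models: dict mapping tone name -> (means, variances)."""
    return min(standard_models,
               key=lambda name: bayes_score(lpcc, *standard_models[name]))
```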

(5) The present invention provides a method for extracting continuous-tone features. A continuous-tone waveform has dynamic characteristics that change nonlinearly with time. The invention estimates the nonlinearly time-varying waveform with a regression model that is linear in time, producing the least squares estimates of the unknown regression coefficients (the LPC vector).

(6) The present invention uses all signal points that carry speech (sampled waveform points). With the small number E = 12 of equal elastic frames, without filters and without overlap, all signal points are covered. A continuous tone is neither deleted for having too short a waveform, nor truncated or compressed for having too long a waveform. As long as human hearing can discern the continuous tone, the invention can extract its features. The speech recognition method of the invention thus uses every speech-bearing signal point and extracts as much speech information as possible. Since the E = 12 elastic frames do not overlap, the number of frames is small, which greatly reduces the time for feature extraction and LPCC computation.

(7) The recognition method of the present invention can recognize continuous tones spoken too fast or too slow. When speech is too fast, the continuous-tone waveform is very short; the elastic frames of the invention shrink, and the same number E of equal-length frames still covers the short waveform, producing E LPCC vectors. As long as the short tone is discernible by a human, those E LPCC vectors effectively represent its feature model. When speech is too slow, the waveform is long and the elastic frames stretch; the E LPCC vectors produced likewise effectively represent the long tone.

(8) The present invention provides a method for stabilizing and adjusting the features of all known continuous tones in the database, so that the features of all continuous tones occupy their own positions and regions in the 144-dimensional space, enabling correct recognition.

(9) To recognize a sentence or name, the unknown sentence or name is first cut into D unknown continuous tones. For each unknown continuous tone, the invention uses the Bayesian method to select the F most similar known continuous tones from the continuous-tone feature database, so a sentence is represented by D×F known continuous tones. Because segmentation is difficult, the sentence may be cut into more or fewer unknown continuous tones than it actually contains. The invention therefore compares each known continuous tone of a candidate sentence or name against the F similar known tones in three rows, the row of the current unknown continuous tone and the rows before and after it; that is, for every sentence or name in the sentence and name database, each known continuous tone is screened with a 3×F window of similar known tones, and the most probable sentence or name is then found in the database. The method is simple and has a high success rate (recognizing 70 English sentences and names and 407 Mandarin sentences and names).
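The screening in (9) might be sketched roughly as below. The window-advance rule and the score (fraction of a candidate sentence's tones found in a moving 3-row window of the D×F similarity matrix) are one plausible reading of the description, not the patent's exact algorithm, and all names are illustrative.

```python
# Sketch: score a candidate database sentence against the D x F matrix of
# similar known tones.  similar_rows has D rows; row i holds the F tones
# most similar to the i-th unknown tone of the utterance.

def sentence_score(candidate, similar_rows):
    """Fraction of the candidate's known tones found inside a 3 x F
    window (previous, current, and next rows) that advances through the
    similarity matrix."""
    d = len(similar_rows)
    hits, pos = 0, 0
    for tone in candidate:
        window = set()
        for row in range(max(0, pos - 1), min(d, pos + 2)):
            window.update(similar_rows[row])
        if tone in window:
            hits += 1
            pos = min(pos + 1, d - 1)  # move the window forward
    return hits / len(candidate)
```

The highest-scoring sentence or name in the database would then be reported as the speaker's utterance.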

(10) The present invention provides two techniques for revising the features of continuous tones so that unknown continuous tones and unknown sentences or names are successfully recognized.

(11) The present invention treats a monosyllabic word as a continuous tone with a single syllable; the features of both Chinese and foreign languages are represented by matrices of the same size.

The present invention can therefore recognize various languages at the same time.

The execution of the invention is explained with the first and second figures. The first figure shows the flow for building three databases: the permanent database of known continuous tones, the feature database of known continuous tones, and the sentence and name database. The continuous-tone feature database contains the standard models of all known continuous tones, representing their features. A known continuous tone, or a sentence or name (which will be cut into several continuous tones) 1, enters the receiver 20 in the form of a continuous sound wave 10. The digital converter 30 converts the continuous sound wave into a sequence of digitized signal points. The pre-processor 45 has two deletion methods: (1) compute the variance of the signal points within a small time interval and the variance of typical noise; if the former is smaller than the latter, the interval carries no speech and is deleted; (2) compute the sum of distances between consecutive signal points within a small time interval and the corresponding sum for typical noise; if the former is smaller than the latter, the interval carries no speech and is deleted. After the pre-processor 45, a sequence of signal points carrying the known continuous tone is obtained.

The waveform is normalized before feature extraction: all signal points of the known continuous tone are divided into E equal time intervals, each interval forming a frame. A continuous tone thus has E equal-length frames 50 in total, with no filter and no overlap; according to the total number of signal points, the length of the E frames adjusts freely so that all signal points are covered. The frame is therefore called an elastic frame: its length stretches freely, but all E elastic frames have the same length. This is unlike the Hamming window, which has a filter, half overlap, and a fixed length, and cannot adjust to the wavelength. Since a continuous-tone waveform changes nonlinearly with time, it contains speech dynamics that also change nonlinearly with time. Because the frames do not overlap, the invention needs only a few (E = 12) elastic frames to cover the whole waveform. Since a signal point can be estimated from the preceding signal points, a regression model that is linear in time is used to closely estimate the nonlinearly varying waveform, and the unknown regression coefficients are estimated by least squares. Each frame yields a set of least squares estimates of the unknown coefficients, called the linear predictive coding (LPC) vector. The LPC vector is then converted into the more stable linear predictive coding cepstrum (LPCC). The sequence of nonlinearly time-varying speech dynamics contained in a continuous-tone waveform is thus converted into E equal-sized LPCC vectors 60.

To extract the feature of a known continuous tone, a permanent database of known continuous tones is prepared first. Each known continuous tone is pronounced once by a standard, clear speaker. If speech with a heavy or non-standard accent is to be recognized, that person pronounces all the known continuous tones, which are converted into E×P LPCC matrices and stored in the permanent database. To extract the feature of a known continuous tone in the permanent database, a database of unknown continuous tones is prepared first. There are two kinds of unknown continuous-tone databases: one in which the unknown continuous tones have samples, and one in which they do not. For a database with samples, the mean and variance of each unknown continuous tone are computed first. In the sampled unknown continuous-tone database, the Bayesian distance is used to find the N unknown continuous tones nearest to the known continuous tone. The weighted average of the N means of the N unknown tones and the LPCC of the known continuous tone, N+1 values in all, is taken as the mean of the known continuous tone, and the weighted average of the N variances of the N unknown tones as its variance; this E×P matrix of means and variances is the preliminary feature 79 of the known continuous tone, stored in the continuous-tone feature database. If the unknown database has no samples, the minimum absolute distance is used in the unknown continuous-tone database to find N unknown continuous tones around the known continuous tone. The LPCCs of the known continuous tone and the N unknown continuous tones are treated as (N+1) numbers; their weighted average is taken as the mean of the known continuous tone, and the variance of the (N+1) numbers as its variance. This E×P matrix of means and variances represents the preliminary feature of the known tone and is stored in the known continuous-tone feature database 79.

In the feature database, if the Bayesian distance between the mean of a known continuous tone and the LPCC of the same known continuous tone in the permanent database is not the smallest in the feature database, then the Bayesian distance is used to find the N known continuous tones in the feature database whose Bayesian distances to that LPCC are the N smallest. The weighted average of the N means of these N known continuous tones and the LPCC of the known tone is taken as the new mean of the known continuous tone, and the weighted average of their N variances as its new variance. This computation of new means and variances is repeated several times for every known continuous tone in the feature database; the final E×P matrix of means and variances, called the standard model, represents the known continuous tone and is stored in the feature database 80. The sentence and name database 85 is then built from the known continuous tones of the feature database.
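The iterative re-estimation in the last step above might be sketched like this. The equal weights and the function name are assumptions, since the patent describes weighted averages without spelling out the weights.

```python
# Sketch: re-estimate a known tone's standard model (mean, variance) from
# its own LPCC vector and the stored models of its N nearest known tones.
# (Equal weighting is an illustrative assumption.)

def update_model(lpcc, neighbours, weight_self=1.0):
    """lpcc: the tone's own LPCC vector (flattened E x P matrix).
    neighbours: list of (mean_vector, variance_vector) pairs for the N
    nearest known tones.  Returns the new (mean, variance) vectors."""
    n = len(neighbours)
    total = n + weight_self
    new_mean = [(weight_self * x + sum(m[i] for m, _ in neighbours)) / total
                for i, x in enumerate(lpcc)]
    new_var = [sum(v[i] for _, v in neighbours) / n
               for i in range(len(lpcc))]
    return new_mean, new_var
```

Repeating this update for every tone in the feature database corresponds to the "repeated several times" adjustment that yields the final standard models.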

The second figure shows the flow of the method for recognizing an unknown sentence or name. When an unknown sentence or name 2 is input to the speech recognition method of the invention, it enters the receiver 20 as a set of unknown continuous sound waves 11, which the digital converter 30 converts into a series of signal points. The waveform of the unknown sentence or name is cut into the waveforms 40 of D unknown continuous tones, and the pre-processor 45 of the first figure deletes the waveforms that carry no speech. Each unknown continuous-tone waveform is then normalized and its features extracted: all speech-bearing signal points of each unknown continuous tone of the sentence or name are divided into E equal time intervals, each interval forming an elastic frame 50. Each continuous tone has E elastic frames in total, with no filter and no overlap, stretching freely to cover all signal points. In each frame, since a signal point can be estimated from the preceding signals, the least squares method is used to estimate the unknown regression coefficients. The set of least squares estimates produced in each frame is called the linear predictive coding (LPC) vector; the LPC vector has a normal distribution, and it is converted into the more stable linear predictive coding cepstrum (LPCC) vector 60. An unknown continuous tone is represented by E LPCC vectors as its feature model, called the classification model 90, which has the same size as the standard model of a known continuous tone. A sentence has D classification models in total, representing its D unknown continuous tones 90. If a known continuous tone is the unknown continuous tone, the means of its standard model are the closest to the LPCCs of the classification model of the unknown tone. The simple Bayesian recognition method of the invention therefore compares 100 the classification model of each unknown continuous tone with the standard model of every known continuous tone in the continuous-tone database 80. If a known continuous tone is the unknown continuous tone, then, to save computation, all LPCCs in the classification model of the unknown tone are assumed to have independent normal distributions, with means and variances estimated by those of the standard model of the known tone. The simple Bayesian method computes the distance between the LPCCs of the unknown continuous tone and the means of a known continuous tone, adjusted by the variances of the known tone; the resulting value represents the similarity between the unknown tone and that known tone. The F known continuous tones with the highest similarity to an unknown continuous tone are selected to represent it, so an unknown sentence or name is represented 110 by D×F known continuous tones.

After an unknown sentence or name is cut into D unknown continuous tones, it is hard to obtain exactly the continuous tones, and the number of them, that the unknown sentence or name contains: sometimes one continuous tone is cut into two, and sometimes two continuous tones pronounced close together are cut into one. D is therefore not necessarily the true number of the speaker's continuous tones, and a given row of F similar known continuous tones does not necessarily contain the speaker's tone. To recognize an unknown sentence or name, every known sentence and name in the sentence and name database 85 is tested. To test whether a sentence or name is the speaker's, each known continuous tone of that sentence or name, starting from the first, is matched against three rows of the D×F matrix of similar continuous tones, namely the current row and the rows before and after it (the first comparison can of course use only the current and following rows). The 3×F window (three rows of similar known tones) is then moved 120 to find the second known continuous tone of the sentence, and so on until all known continuous tones of the tested sentence have been tested. The sentence or name with the highest probability in the database (the number of known continuous tones of the tested sentence or name found in the 3×F windows, divided by the number of continuous tones in the tested sentence or name) is taken as the speaker's sentence or name 130. Of course, to save time, only sentences or names in the database whose length is approximately equal to that of the unknown sentence or name (D unknown continuous tones) need be compared. If the sentence or name cannot be recognized, the Bayesian classification method is used to find the N most similar continuous tones in the feature database (79) to improve the continuous-tone features in the sentence. The invention is described in detail below:

(1) After a continuous sound enters the speech recognition method, its waveform is converted into a series of digitized signal sampled points, and the points that carry no speech are deleted. The invention provides two deletion criteria: the first computes the variance of the signal points within a short time segment; the second computes the sum of the distances between adjacent signal points within that segment. In theory the first is better, because the variance of speech signal points is larger than the variance of noise, indicating the presence of speech; in the recognition experiments of this invention, however, both criteria give the same recognition rate, and the second is faster.
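The two deletion criteria above can be sketched as follows; the segment length and noise threshold are illustrative assumptions, not values fixed by the invention.

```python
def variance_measure(segment):
    # Criterion 1: variance of the sampled points in a short segment.
    m = sum(segment) / len(segment)
    return sum((s - m) ** 2 for s in segment) / len(segment)

def adjacent_distance_measure(segment):
    # Criterion 2: sum of distances between adjacent sampled points.
    return sum(abs(b - a) for a, b in zip(segment, segment[1:]))

def delete_non_speech(points, seg_len=80, threshold=1.0,
                      measure=adjacent_distance_measure):
    """Keep only segments whose measure exceeds a noise threshold
    (seg_len and threshold are hypothetical values)."""
    kept = []
    for i in range(0, len(points), seg_len):
        seg = points[i:i + seg_len]
        if len(seg) > 1 and measure(seg) > threshold:
            kept.extend(seg)
    return kept
```

Either measure can be passed in; as the description notes, the adjacent-distance criterion avoids computing a mean per segment and is therefore cheaper.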

(2) After the non-speech points are deleted, the remaining signal points represent the whole continuous sound. The waveform is first normalized and its features extracted: all signal points are divided into E equal time segments, each segment forming one frame. A continuous sound thus has E equal-length elastic frames, with no filter and no overlap, which stretch freely to cover all signal points. The signal points within an elastic frame vary nonlinearly with time and are hard to describe with a mathematical model. J. Markhoul, in the paper Linear Prediction: A tutorial review, Proceedings of IEEE, Vol. 63, No. 4, 1975, shows that a signal point has an approximately linear relation with the preceding signal points, so a regression model that is linear in time can estimate the nonlinearly varying signal. The signal point S(n) can be estimated from the preceding points, its estimate S'(n) given by the regression model

S'(n) = Σ_{k=1}^{P} a_k S(n-k) (1)

where a_k, k = 1, ..., P, are the unknown regression coefficients to be estimated and P is the number of preceding points used. The least-squares estimates, called the linear predictive coding (LPC) vector, are computed with Durbin's recursion as given in L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall PTR, Englewood Cliffs, New Jersey, 1993. The computation of the LPC vector of the signal points in a frame is detailed as follows. Let E_1 denote the sum of squared differences between the signal points S(n) and their estimates S'(n):

E_1 = Σ_n [S(n) - Σ_{k=1}^{P} a_k S(n-k)]² (2)

The regression coefficients are chosen to minimize the sum of squares E_1. Setting the partial derivative of (2) with respect to each unknown coefficient a_i, i = 1, ..., P, to zero yields the P normal equations

Σ_{k=1}^{P} a_k Σ_n S(n-k)S(n-i) = Σ_n S(n)S(n-i), i = 1, ..., P (3)

Expanding (2) and substituting (3) gives the minimum total squared error E_P:

E_P = Σ_n S(n)² - Σ_{k=1}^{P} a_k Σ_n S(n)S(n-k) (4)

Equations (3) and (4) are then converted into the autocorrelation form

Σ_{k=1}^{P} a_k R(|i-k|) = R(i), i = 1, ..., P (5)

E_P = R(0) - Σ_{k=1}^{P} a_k R(k) (6)

In (5) and (6), with N denoting the number of signal points in the frame, R is the autocorrelation

R(i) = Σ_{n=i}^{N-1} S(n)S(n-i), i = 0, 1, ..., P (7)

Durbin's recursion computes the linear predictive coding (LPC) vector quickly as follows:

E_0 = R(0) (8)

k_i = [R(i) - Σ_{j=1}^{i-1} a_j^{(i-1)} R(i-j)] / E_{i-1} (9)

a_i^{(i)} = k_i (10)

a_j^{(i)} = a_j^{(i-1)} - k_i a_{i-j}^{(i-1)}, j = 1, ..., i-1 (11)

E_i = (1 - k_i²) E_{i-1} (12)

Cycling through formulas (8)-(12) for i = 1, ..., P gives the least-squares estimates of the regression coefficients, i.e. the linear predictive coding (LPC) vector:

a_j = a_j^{(P)}, j = 1, ..., P (13)

The LPC vector is then converted into the more stable linear predictive coding cepstrum (LPCC) vector a'_j, j = 1, ..., P, by

a'_j = a_j + Σ_{k=1}^{j-1} (k/j) a'_k a_{j-k}, 1 ≤ j ≤ P (14)

a'_j = Σ_{k=j-P}^{j-1} (k/j) a'_k a_{j-k}, j > P (15)

One elastic frame produces one linear predictive coding cepstrum (LPCC) vector (a'_1, ..., a'_P). The speech recognition method of this invention uses P = 12, since the later LPCC coefficients are almost zero. A continuous sound is therefore characterized by E LPCC vectors; that is, a matrix of E×P linear predictive coding cepstra represents one continuous sound, and a continuous sound contains one or more syllables.
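Formulas (7)-(14) can be sketched in code as follows; the frame values used in the usage note are illustrative, and P is kept small for readability (the invention uses P = 12).

```python
def lpcc_of_frame(frame, P):
    """Compute the LPC vector by Durbin's recursion, formulas (7)-(13),
    then convert it to the LPCC vector with formula (14)."""
    N = len(frame)
    # Autocorrelation, formula (7).
    R = [sum(frame[n] * frame[n - i] for n in range(i, N))
         for i in range(P + 1)]
    a = [0.0] * (P + 1)
    E = R[0]                                    # formula (8)
    for i in range(1, P + 1):
        k = (R[i] - sum(a[j] * R[i - j] for j in range(1, i))) / E  # (9)
        new_a = a[:]
        new_a[i] = k                            # formula (10)
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]      # formula (11)
        a = new_a
        E = (1 - k * k) * E                     # formula (12)
    # Cepstrum conversion, formula (14), for 1 <= j <= P.
    c = [0.0] * (P + 1)
    for j in range(1, P + 1):
        c[j] = a[j] + sum((k / j) * c[k] * a[j - k] for k in range(1, j))
    return c[1:]
```

On an exactly first-order autoregressive frame such as s(n) = 0.5·s(n-1), the recursion recovers a_1 ≈ 0.5 as its first coefficient, which is a quick sanity check of the implementation.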

(3) In the same way, formulas (8)-(15) compute the E linear predictive coding cepstrum (LPCC) vectors of an unknown continuous sound wave, giving an E×P LPCC matrix of the same size, called the classification model of the unknown continuous sound.

(4) In the second figure, the speech recognizer 100 receives the classification model of an unknown continuous sound, an E×P LPCC matrix, denoted X = {X_jl}, j = 1, ..., E, l = 1, ..., P. When comparing it with a known continuous sound c_i, i = 1, ..., m (m is the total number of known continuous sounds), in order to compute the comparison value quickly, the {X_jl} are assumed to be E×P independent normal random variables whose means and variances (μ_ijl, σ_ijl²) are estimated by the means and variances in the standard model of the known continuous sound. Let f(x|c_i) denote the conditional density function of X. Following the decision theory in T. F. Li, Speech recognition of mandarin monosyllables, Pattern Recognition, Vol. 36, 2003, the Bayesian classifier is derived as follows. Suppose the feature database contains the standard models of m known continuous sounds. Let θ_i, i = 1, ..., m, denote the probability that the continuous sound c_i occurs, i.e. the prior probability, with Σ_{i=1}^{m} θ_i = 1. Let d denote a decision rule. Define a simple loss function, i.e. the misclassification probability of d: if d misclassifies an unknown continuous sound, d(x) ≠ c_i, the loss is L(c_i, d(x)) = 1; if d classifies it correctly, d(x) = c_i, there is no loss, L(c_i, d(x)) = 0. The recognition rule is built as follows. Let Γ_i, i = 1, ..., m, denote the region of matrix values X = x assigned to the known continuous sound c_i; that is, when x lies in Γ_i, d decides that the unknown continuous sound is c_i. The average misclassification probability of d is

R(τ, d) = Σ_{i=1}^{m} θ_i ∫_{Γ_i^c} f(x|c_i) dx (16)

where τ = (θ_1, ..., θ_m) and Γ_i^c is the region outside Γ_i. Let D denote all speech recognition rules, i.e. all ways of partitioning the space into the regions of the m known continuous sounds. A recognition rule d_τ is sought in D that minimizes the average misclassification probability (16):

R(τ, d_τ) = min_{d ∈ D} R(τ, d) (17)

The rule d_τ satisfying (17) is called the Bayesian classifier with respect to the prior τ. It can be written as

d_τ(x) = c_i if θ_i f(x|c_i) > θ_j f(x|c_j) (18)

for all j = 1, ..., m, j ≠ i; that is, the region assigned to the known continuous sound c_i is Γ_i = {x | θ_i f(x|c_i) > θ_j f(x|c_j) for all j ≠ i}. When all known continuous sounds occur with equal probability, the Bayesian classifier reduces to the maximum likelihood rule.

To recognize an unknown continuous sound, the Bayesian classifier (18) first computes the conditional density functions of X for all i = 1, ..., m (the total number of known continuous sounds):

f(x|c_i) = Π_{j=1}^{E} Π_{l=1}^{P} (2π σ_ijl²)^{-1/2} exp[-(x_jl - μ_ijl)² / (2σ_ijl²)] (19)

For computational convenience, take the logarithm of (19) and delete the constants, giving the Bayesian distance

l(c_i) = Σ_{j=1}^{E} Σ_{l=1}^{P} [ln σ_ijl² + (x_jl - μ_ijl)² / σ_ijl²] (20)

The Bayesian classifier (18) thus computes the value l(c_i) in (20) for each known continuous sound c_i and assigns the unknown continuous sound to the c_i with the smallest l(c_i); l(c_i) is also called the similarity between the unknown continuous sound and the known continuous sound c_i, or the Bayesian distance. In (20), x = {x_jl}, j = 1, ..., E, l = 1, ..., P, are the linear predictive coding cepstrum (LPCC) values of the classification model of the unknown continuous sound, and {μ_ijl, σ_ijl²} are estimated by the means and variances in the standard model of the known continuous sound. The most important contribution of this invention is that, without samples, it finds in the known continuous sound feature database for every known continuous sound c_i a mutually stable center c_i = {μ_ijl} and a clearly non-overlapping region

Γ_i = {x | θ_i f(x|c_i) > θ_j f(x|c_j)}, j ≠ i, (21)

where x = {x_jl} is an E×P LPCC matrix, representing the region of the known continuous sound c_i.
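The Bayesian distance (20) and the resulting classifier can be sketched as below; the tiny means and variances in the usage note are illustrative placeholders for standard models, not values from the invention's database.

```python
import math

def bayes_distance(x, mean, var):
    """Bayesian distance (20) between an unknown E x P LPCC matrix x and a
    known continuous sound's standard model (mean, var); smaller = more similar."""
    return sum(
        math.log(var[j][l]) + (x[j][l] - mean[j][l]) ** 2 / var[j][l]
        for j in range(len(x)) for l in range(len(x[0]))
    )

def classify(x, models):
    """Assign x to the known continuous sound with the smallest Bayesian
    distance, as in (18) with equal priors; models maps name -> (mean, var)."""
    return min(models, key=lambda name: bayes_distance(x, *models[name]))
```

For example, with models = {"a": ([[0.0, 0.0]], [[1.0, 1.0]]), "b": ([[5.0, 5.0]], [[1.0, 1.0]])}, the matrix [[0.1, -0.1]] is classified as "a".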

(5) To extract the features of a known continuous sound, a database of unknown continuous sounds is prepared first. There are two kinds of unknown continuous sound databases: one in which the unknown continuous sounds have samples, and one without samples. With a sample database, first compute the mean and variance of each unknown continuous sound. Then use the Bayesian distance to find, in the sampled unknown continuous sound database, the N unknown continuous sounds nearest to the known continuous sound. The weighted average of the N means of those unknown sounds and the linear predictive coding cepstrum (LPCC) of the known continuous sound, N+1 values in all, is taken as the mean of the known continuous sound, and the weighted average of the N variances of the N unknown sounds as its variance; this E×P matrix of means and variances is the preliminary feature 79 of the known continuous sound, placed in the continuous sound feature database. If the unknown continuous sound database has no samples, the N unknown continuous sounds around the known continuous sound are found with the minimum absolute-value distance. The LPCCs of the known continuous sound and of the N unknown continuous sounds are treated as (N+1) numbers: their weighted average is taken as the mean of the known continuous sound, and their variance as its variance; this E×P matrix of means and variances represents the preliminary feature of the known continuous sound, placed in the known continuous sound feature database 79. In the feature database, if the Bayesian distance between the mean of a known continuous sound and the LPCC of the same known continuous sound in the permanent known continuous sound database is not the smallest in the feature database, then N known continuous sounds whose Bayesian distances to that LPCC are the N smallest are found in the feature database. The weighted average of their N means and the LPCC of the known continuous sound is taken as the new mean of the known continuous sound, and the weighted average of their N variances as its new variance. This computation of new means and variances is repeated several times for every known continuous sound in the feature database; the final E×P matrix of means and variances, called the standard model, represents the known continuous sound and is placed in the feature database 80. The known continuous sounds of the feature database are then used to build the sentence and name database 85.

(6) If an unknown continuous sound c is recognized incorrectly, the invention provides two techniques for revising the old features so that the continuous sound is recognized correctly:

(a) Use the Bayesian classifier (20) to find in the feature database the N continuous sounds {μ_ijl, σ_ijl²}, i = 1, 2, ..., N, most similar to the continuous sound c, then take their average (or weighted average). Let {μ_jl, σ_jl²}, j = 1, ..., E, l = 1, ..., P, represent the new standard model of the unknown continuous sound and store it in the continuous sound feature database; retesting the sound is then certain to succeed.

(b) Take the weighted average of the means of the N most similar known continuous sounds of item (a) and the linear predictive coding cepstrum (LPCC) of the unknown continuous sound as the new mean of the unknown continuous sound, and the weighted average of the variances of the N most similar continuous sounds as its new variance; {μ_jl, σ_jl²}, j = 1, ..., E, l = 1, ..., P, then represents the new standard model of the unknown continuous sound.
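Steps (a) and (b) amount to pulling a misrecognized sound's model toward its N most similar neighbours. A minimal sketch follows; the weighting scheme (equal weights, with w_self on the sound's own LPCC) is a hypothetical choice, since the description does not fix the weights.

```python
def revise_model(x, neighbors, w_self=1.0):
    """Revise a standard model from an unknown sound's E x P LPCC matrix x and
    the (mean, var) pairs of its N most similar known sounds, as in (a)-(b).
    All matrices are E x P nested lists; the weights are an assumption."""
    E, P = len(x), len(x[0])
    N = len(neighbors)
    new_mean = [[0.0] * P for _ in range(E)]
    new_var = [[0.0] * P for _ in range(E)]
    for j in range(E):
        for l in range(P):
            total = w_self * x[j][l] + sum(m[j][l] for m, _ in neighbors)
            new_mean[j][l] = total / (N + w_self)       # weighted mean
            new_var[j][l] = sum(v[j][l] for _, v in neighbors) / N
    return new_mean, new_var
```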

(7) To confirm that the invention can recognize any language simultaneously, speech recognition experiments with 2 speakers were carried out.

(a) First an unknown continuous sound database was established. The monosyllable database was purchased from Academia Sinica in Taiwan. It contains 388 Mandarin monosyllables in total (third figure), all pronounced by female speakers, with from 6 to 99 samples each; many monosyllables are pronounced almost identically.

(b) Using the method of section (2), all samples were converted into E×P LPCC matrices, 12400 matrices in total.

(c) For the 388 Mandarin monosyllables, the sample means and variances were computed.

(d) The 388 Mandarin monosyllables were blindly mixed, turning the 388 monosyllables with sample means and variances into a database of 388 unknown continuous sounds (a Mandarin monosyllable is also a continuous sound of a single syllable).

(e) One man and one woman each pronounced once 654 Mandarin monosyllables, 154 English words, 1 German word, 1 Japanese word and 3 Taiwanese words, establishing two permanent known continuous sound databases of 813 sounds each, every continuous sound represented by an E×P linear predictive coding cepstrum (LPCC) matrix.

(f) For each of the 813 known continuous sounds in the permanent known continuous sound database, the Bayesian distance (20) was used to find N = 15 unknown continuous sounds among the 388 unknown continuous sounds. The weighted average of the N+1 values formed by the LPCC of the known continuous sound and the sample means of the N unknown continuous sounds was taken as the mean of the known continuous sound, and the weighted average of the sample variances of the N unknown continuous sounds as its variance. This 12×12 matrix of means and variances, called the preliminary feature 79 of the known continuous sound, was stored in the known continuous sound feature database; that is, the feature database contains 813 12×12 matrices of means and variances 80.

(g) In the feature database, if the Bayesian distance between the mean of a known continuous sound and the LPCC of the same known continuous sound in the permanent continuous sound database was not the smallest, N = 15 known continuous sounds were found by Bayesian distance among the 813 continuous sound features. The weighted average of their N means and the LPCC of the known continuous sound was taken as the new mean of the known continuous sound, and the weighted average of the variances of the N known continuous sounds as its new variance. The computation of the new mean and variance was repeated several times. The final 12×12 matrix of means and variances, called the standard model, represents the feature of the known continuous sound and is stored in the known continuous sound feature database 80.

The invention carried out the following continuous sound recognition tests. The recognition rate depends on the speaker; because many sounds are too similar, a result is counted as correct if the answer is among the top three candidates:

① Recognizing 384 Mandarin monosyllables, 1 German word, 1 Japanese word, 2 Taiwanese words (third figure) (recognition rate very good)

② Recognizing 154 English words, 1 German word (fourth figure) (recognition rate very good)

③ Simultaneously recognizing 154 English words and 388 Mandarin monosyllables, 1 German, 1 Japanese, 2 Taiwanese (recognition rate very good)

④ Recognizing 654 Mandarin monosyllables, 1 German, 1 Japanese, 3 Taiwanese (third and fifth figures) (recognition rate good, not as good as the first three)

(8) To recognize a speaker's sentence or name, an English and Mandarin sentence and name database is first established, in which the continuous sounds of every sentence or name are arbitrarily composed from the (384+154) known English and Mandarin sounds in the continuous sound feature database: 70 English sentences and names are composed from the 154 English words, and 407 Mandarin sentences and names from the 384 Mandarin monosyllables (sixth figure). The recognition method is as follows:

(a) Segment an unknown sentence or name into D unknown continuous sounds: for each unit time segment, compute the sum of the distances between adjacent signal points; if it is too small, the segment is noise or silence. When too many adjacent unit segments without speech accumulate (longer than the gap between two syllables of a continuous sound), the whole stretch is noise or silence and marks the boundary between two continuous sounds, where the utterance is cut. The utterance is thus cut into D unknown continuous sounds, which are converted into E×P LPCC matrices by procedures 45, 50, 60 and 90 of the second figure. For each unknown continuous sound, the Bayesian classifier (20) selects the F most similar known continuous sounds from the English and Mandarin feature database (possibly containing both English and Mandarin (figure)); an unknown sentence or name is thus represented by the D×F most similar known continuous sounds.

(b) To find the speaker's sentence or name in the sentence and name database, select from the 477 English and Mandarin sentences and names those whose length is (D±1) known continuous sounds.

(c) If a candidate sentence or name selected from the database has the same length as the speaker's utterance (D known continuous sounds), compare the D rows of F similar known continuous sounds, in order, with the D known continuous sounds of the candidate, and check whether each row of F similar continuous sounds contains the corresponding known continuous sound of the candidate. If every row contains the corresponding known continuous sound, i.e. D continuous sounds are recognized correctly, the candidate sentence or name is the speaker's sentence or name.

(d) If the candidate sentence or name contains D-1 or D+1 known continuous sounds, or the number recognized correctly in (c) is not D, the invention screens with a 3×F window. For the i-th known continuous sound of the candidate sentence or name (in the database), compare it against the three rows of similar known continuous sounds around row i of the D×F matrix (rows i-1, i, i+1). Count how many known continuous sounds of the candidate fall inside the D×F matrix in this way and divide by the total number D to obtain the probability of the candidate sentence or name; the sentence or name with the largest probability in the database is selected as the speaker's utterance.
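The 3×F window screening of step (d) can be sketched as follows; sounds are represented by plain strings here purely for illustration.

```python
def sentence_probability(similar, candidate):
    """Score a candidate sentence against the D x F matrix of similar sounds:
    the i-th candidate sound counts as matched if it appears in rows
    i-1, i or i+1 of the matrix (a 3 x F window); score = matches / D."""
    D = len(similar)
    matches = 0
    for i, sound in enumerate(candidate):
        window = [s for row in similar[max(0, i - 1):i + 2] for s in row]
        if sound in window:
            matches += 1
    return matches / D

def recognize(similar, database):
    """Pick the database sentence with the largest window-matching probability."""
    return max(database, key=lambda cand: sentence_probability(similar, cand))
```

Because matching is done through a sliding window rather than position by position, a candidate one sound longer or shorter than the segmentation (D±1) can still score highly, which is the point of step (d).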

(e) If a sentence or name is misrecognized, one or more of the D unknown continuous sounds must lie outside their F similar known continuous sounds. For each such sound, use the Bayesian classifier (20) to find the N = 15 nearest known continuous sounds among the (154+384) known continuous sounds, and take the LPCC weighted average of the N similar continuous sounds and the unknown continuous sound to improve the unknown continuous sound, so that all D unknown continuous sounds fall within their F similar known continuous sounds; retesting is then certain to succeed.

The invention performed the following English and Mandarin sentence and name recognition; recognition was almost entirely correct, varying with the speaker:

① Recognizing 70 English sentences and names (very good).

② Recognizing 407 Mandarin sentences and names (very good)

③ Recognizing 70 English sentences and names together with 407 Mandarin sentences and names (very good).

* Two attached Visual Basic recognition screenshots (seventh and eighth figures) show simultaneous recognition of English and Mandarin sentences and names.

(1)‧‧‧Establish a permanent database of known continuous sounds: pronounce a continuous sound or a sentence, the sentence being divided into several known continuous sounds

(10)‧‧‧Continuous sound wave of a continuous sound

(20)‧‧‧Receiver

(30)‧‧‧Sound-wave digital converter

(45)‧‧‧Noise removal

(50)‧‧‧E elastic frames normalize the sound wave

(60)‧‧‧Least-squares computation of the linear predictive coding cepstrum vector

(70)‧‧‧Using the Bayesian distance (absolute-value distance), find for each known continuous sound (permanent database) the N nearest unknown continuous sounds in the unknown continuous sound database

(79)‧‧‧For each known continuous sound (permanent database), take the weighted average of the LPCCs of the N surrounding unknown continuous sounds and of the known continuous sound as its preliminary feature, placed in the feature database. Then, in the feature database, use the Bayesian distance to find N known continuous sounds, take the weighted average with the known continuous sound's LPCC, and repeat several times. The final weighted averages (E×P means and variances) represent the standard model of the known continuous sound

(80)‧‧‧The known continuous sound feature database contains the standard models of all means and variances

(85)‧‧‧Using the continuous sounds of the known continuous sound feature database, build the sentence and name database of the sentences and names to be recognized

(2)‧‧‧Input an unknown sentence or name

(11)‧‧‧A group of unknown continuous sound waves

(40)‧‧‧Cut a sentence or name into D unknown continuous sounds

(90)‧‧‧The linear predictive coding cepstrum matrices of the D unknown continuous sounds represent D unknown continuous sound classification models

(100)‧‧‧Compare each known continuous sound standard model with the unknown continuous sound classification models using the Bayesian classifier

(110)‧‧‧For each unknown continuous sound of a sentence or name, find the F most similar known continuous sounds; a sentence or name is represented by D×F most similar known continuous sounds

(120)‧‧‧In the sentence and name database, screen every known continuous sound of all sentences and names with the 3×F window of similar known continuous sounds

(130)‧‧‧Find the most probable sentence or name in the sentence and name database

The first and second figures illustrate the execution procedures of the invention. The first figure shows the flow for building the three databases: the permanent database of known continuous sounds, the known continuous sound feature database, and the sentence and name database. The second figure shows the flow of the method for recognizing an unknown sentence or name.

The third figure shows recognition of 384 Mandarin monosyllables, 1 German word, 1 Japanese word, 2 Taiwanese words.

The fourth figure shows recognition of 154 English words, 1 German word.

The fifth figure shows recognition of 269 Mandarin monosyllables, 3 Taiwanese words.

The sixth figure shows that the sentence and name database has 70 English sentences and 407 Chinese sentences and names.

The seventh and eighth figures show Visual Basic recognition screens illustrating simultaneous recognition of English and Mandarin sentences and names.


Claims (9)

1. A speech recognition method capable of identifying all languages without using samples, comprising the steps of: (1) providing a sample-free database containing the waveforms of unknown continuous sounds; (2) providing a permanent database of known continuous sounds, containing the waveform of each known continuous sound pronounced once by the user, together with a sentence-and-name database composed of known continuous sounds; (3) using a processor to delete sampled points carrying no speech signal, and noise; (4) for every unknown continuous sound in the sample-free database and every known continuous sound in the permanent database, normalizing the waveform and extracting its features: partitioning the waveform with E elastic frames and converting it into an E×P feature matrix of linear predict coding cepstra (LPCC) of uniform size; (5) normalizing the waveform of the user's unknown continuous sound and extracting its features: converting the waveform into an E×P LPCC matrix that serves as the classification model of the unknown continuous sound; (6) for each known continuous sound, finding in the sample-free database the N unknown continuous sounds whose E×P LPCC matrices have the smallest distance to the known sound's E×P LPCC matrix, and computing a weighted mean and variance from these N+1 matrices; the resulting E×P mean and variance matrices form the standard model of the known continuous sound, stored in a known-continuous-sound feature database; (7) applying a Bayesian classification algorithm: comparing the classification model of the user's unknown continuous sound with the standard models of the known continuous sounds, and recognizing the unknown sound as the known continuous sound with the smallest Bayesian distance; (8) segmenting an unknown sentence or name uttered by the user into D unknown continuous sounds; (9) using the Bayesian classifier to select, for each of the D unknown continuous sounds, the F known continuous sounds with the shortest Bayesian distances in the feature database, i.e. the F most similar known continuous sounds, so that the unknown sentence or name is represented by a D×F matrix of known continuous sounds; (10) comparing the D×F matrix of the unknown sentence or name against every sentence and name in the sentence-and-name database, computing for each candidate a probability equal to the number of its known continuous sounds falling inside a 3×F window divided by D, and selecting the candidate with the highest probability as the user's sentence or name; (11) providing a means of correcting the features of a continuous sound.
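For illustration only (not part of the claims), the elastic-frame feature extraction of steps (4)-(5) can be sketched in Python. The choice E = P = 12 follows the description; the function name `lpcc_matrix` and all implementation details are assumptions of this minimal sketch:

```python
import numpy as np

def lpcc_matrix(signal, E=12, P=12):
    """Normalize a variable-length waveform into an E x P LPCC matrix:
    E equal-length, non-overlapping elastic frames; LPC per frame by
    Durbin's recursion; then the LPC -> cepstrum conversion."""
    signal = np.asarray(signal, dtype=float)
    feat = np.empty((E, P))
    for row, frame in enumerate(np.array_split(signal, E)):
        N = len(frame)
        # autocorrelation R(0..P) of the frame
        R = np.array([frame[:N - k] @ frame[k:] for k in range(P + 1)])
        # Durbin's recursion for the LPC coefficients a[1..P]
        a = np.zeros(P + 1)
        err = max(R[0], 1e-12)
        for i in range(1, P + 1):
            k = (R[i] - a[1:i] @ R[i - 1:0:-1]) / err
            a_new = a.copy()
            a_new[i] = k
            a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
            a = a_new
            err = max((1.0 - k * k) * err, 1e-12)  # floor guards sinusoidal frames
        # LPC -> LPCC cepstral recursion
        c = np.zeros(P + 1)
        for n in range(1, P + 1):
            c[n] = a[n] + sum(j / n * c[j] * a[n - j] for j in range(1, n))
        feat[row] = c[1:]
    return feat
```

Because the frames are elastic (equal fractions of the whole utterance rather than fixed-length windows), two utterances of very different durations both map to the same 12×12 matrix.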
2. The identification method as claimed in claim 1, wherein step (3) of deleting sampled points without speech signal, and noise, further comprises: (a) within a short time interval, computing the variance of the sampled points; when it is smaller than the variance of non-speech sampled points, deleting the interval; (b) within a short time interval, computing the sum of distances between adjacent sampled points; when it is smaller than the corresponding sum for non-speech sampled points, deleting the interval.
3. The identification method as claimed in claim 1, wherein step (4) further comprises normalizing a continuous-sound waveform and extracting a feature matrix of uniform size, as follows: (a) partitioning the sampled points of the continuous sound equally: in order to closely estimate the nonlinearly varying waveform with a linear regression model, the full length of the waveform is divided into E equal segments, each segment forming one elastic frame; a continuous sound thus has E equal-length elastic frames, without filters and without overlap, which stretch freely to cover the full waveform; (b) within each frame, estimating the waveform, which varies nonlinearly with time, by a regression model that is linear in time; (c) estimating the sampled point S(n) from the preceding sampled points; the estimate S'(n) is given by the regression model

S'(n) = Σ_{k=1}^{P} a_k S(n−k), (1)

where a_k, k = 1, ..., P, are the unknown regression coefficients to be estimated and P is the number of preceding sampled points; denoting by E_1 the sum of squared differences between S(n) and its estimate S'(n),

E_1 = Σ_n [S(n) − Σ_{k=1}^{P} a_k S(n−k)]², (2)

the regression coefficients are found by minimizing E_1: setting the partial derivative of (2) with respect to each a_i, i = 1, ..., P, to zero yields the P normal equations

Σ_{k=1}^{P} a_k Σ_n S(n−k)S(n−i) = Σ_n S(n)S(n−i), i = 1, ..., P; (3)

expanding (2) and substituting (3) gives the minimum total squared error

E_P = Σ_n S²(n) − Σ_{k=1}^{P} a_k Σ_n S(n)S(n−k); (4)

equations (3) and (4) are converted to

Σ_{k=1}^{P} a_k R(i−k) = R(i), i = 1, ..., P, (5)
E_P = R(0) − Σ_{k=1}^{P} a_k R(k), (6)

where, with N denoting the number of sampled points in the frame, the autocorrelation in (5) and (6) is

R(i) = Σ_{n=0}^{N−1−i} S(n)S(n+i), i ≥ 0; (7)

the linear predict coding (LPC) vector is then computed rapidly by Durbin's recursion:

E_0 = R(0), (8)
k_i = [R(i) − Σ_{j=1}^{i−1} a_j^{(i−1)} R(i−j)] / E_{i−1}, (9)
a_i^{(i)} = k_i, (10)
a_j^{(i)} = a_j^{(i−1)} − k_i a_{i−j}^{(i−1)}, 1 ≤ j ≤ i−1, (11)
E_i = (1 − k_i²) E_{i−1}; (12)

cycling (8)-(12) through i = 1, ..., P yields the least-squares estimates a_j, j = 1, ..., P, called the LPC vector; the LPC vector is then converted into the more stable linear predict coding cepstrum (LPCC) vector a'_j, j = 1, ..., P, by

a'_1 = a_1,
a'_n = a_n + Σ_{j=1}^{n−1} (j/n) a'_j a_{n−j}, 1 < n ≤ P,

so that each elastic frame produces one LPCC vector (a'_1, ..., a'_P); (d) representing the continuous sound by its E LPCC vectors, i.e. by the E×P LPCC matrix.
4. The identification method as claimed in claim 1, wherein step (5) further comprises computing the classification model of the unknown continuous sound, as follows: (a) dividing the waveform of the unknown continuous sound into E equal segments, each forming one elastic frame, so that the unknown sound has E equal-length elastic frames, without filters and without overlap, stretching freely to cover all sampled points; (b) within each elastic frame, estimating the nonlinearly varying waveform by a regression model linear in time; (c) computing the least-squares estimates a_j, 1 ≤ j ≤ P, of the regression coefficients by Durbin's recursion of equations (8)-(12), called the LPC vector; (d) converting the LPC vector into the stable LPCC vector a'_i, 1 ≤ i ≤ P, by the conversion formulas of claim 3; (e) using the E LPCC vectors, an E×P LPCC matrix, as the classification model of the unknown continuous sound.
5. The identification method as claimed in claim 1, wherein step (7) further comprises a Bayesian algorithm for recognizing the unknown continuous sound, as follows: (a) the classification model of the unknown continuous sound is an E×P LPCC matrix X = {X_jl}, j = 1, ..., E, l = 1, ..., P, whose E×P values are treated as independent, normally distributed random variables; when the unknown sound is compared with a known continuous sound c_i, i = 1, ..., m (m being the total number of known continuous sounds), the mean and variance (μ_ijl, σ_ijl²) of X_jl are estimated by the mean and variance in the standard model of c_i, so that the conditional density of X is

f(x | c_i) = Π_{j=1}^{E} Π_{l=1}^{P} (2π σ_ijl²)^{−1/2} exp[−(x_jl − μ_ijl)² / (2σ_ijl²)],

where X = {X_jl} holds the LPCC of the unknown sound's classification model; (b) the similarity of a known continuous sound c_i to the unknown sound X is measured by f(x | c_i), and the Bayesian classifier searches the known-continuous-sound feature database for the sound most like X; (c) taking the logarithm of f(x | c_i) and discarding constants that need not be computed gives the Bayesian distance

l(c_i) = Σ_{j=1}^{E} Σ_{l=1}^{P} [ (x_jl − μ_ijl)² / (2σ_ijl²) + ln σ_ijl ];

(d) the distance l(c_i) is computed for every known continuous sound c_i, i = 1, ..., m; (e) the known continuous sound c'_i with the smallest Bayesian distance l(c'_i) is selected from the feature database and declared to be the unknown continuous sound.
6. The identification method as claimed in claim 1, wherein step (8) further comprises segmenting an unknown sentence or name into D unknown continuous sounds: (a) in each unit time interval, the sum of distances between adjacent sampled points is computed and compared with the corresponding sum for non-speech sampled points; when the former is smaller, the interval is deleted as silence or noise carrying no speech signal; (b) when consecutive silence or noise intervals accumulate to longer than the gap between two syllables within one continuous sound, the accumulated span is judged to be the boundary between two continuous sounds, and the unknown sentence or name is cut into D unknown continuous sounds; (c) silence and noise are removed from each continuous sound, the elastic frames are normalized, and the least-squares computation yields an E×P LPCC matrix representing each unknown continuous sound, so that the sentence or name is represented by D E×P LPCC matrices.
7. The identification method as claimed in claim 1, wherein step (9) further comprises: (a) after the user's unknown sentence or name is cut into D unknown continuous sounds, computing for each unknown sound {X_jl} the Bayesian distance l(c_i) against every known continuous sound c_i = {μ_ijl, σ_ijl²} in the feature database and selecting the F nearest known continuous sounds; these F known sounds, which may come from several languages at once, represent the unknown continuous sound; (b) representing the unknown sentence or name by the D rows of F similar known continuous sounds.
8. The identification method as claimed in claim 1, wherein step (10) further comprises sentence and name recognition: (a) selecting from the sentence-and-name database the candidate sentences and names composed of D−1, D, or D+1 known continuous sounds; (b) when a selected sentence or name has exactly D known continuous sounds, the same length as the speaker's utterance, comparing its D known sounds in order against the D rows of F similar known sounds, and judging whether each row of F similar known sounds contains a known continuous sound of the candidate; when every row does, all D unknown continuous sounds are matched and the candidate identifies the speaker's sentence or name; (c) when the speaker's D continuous sounds are not all matched by a candidate of D known continuous sounds, or when the candidate has D−1 or D+1 known continuous sounds, screening with a 3×F window: for each known continuous sound of a candidate with D or D±1 known continuous sounds, searching three consecutive rows of the D×F matrix of similar known sounds, and selecting as the speaker's sentence or name the candidate with the highest probability, computed as the number of its known continuous sounds falling inside the 3×F window divided by D.
9. The identification method as claimed in claim 1, wherein step (11) further comprises correcting the features of a continuous sound: (a) when a sentence or name is recognized incorrectly because one or more of its D unknown continuous sounds are absent from their F similar known continuous sounds, denoting such an unknown sound by c, using its N most similar known continuous sounds: the average, plain or weighted by order of similarity, of the N feature sets {μ_ijl, σ_ijl²}, i = 1, ..., N, j = 1, ..., E, l = 1, ..., P, gives a mean {μ_jl, σ_jl²} representing the new feature of the unknown continuous sound c; (b) in item (a), taking the weighted average of the LPCC of the tester's pronunciation together with the N means of the N most similar known continuous sounds, N+1 values in all, as the new mean of the unknown continuous sound, and the weighted average of the N variances of the N most similar known sounds as its new variance; the new E×P mean and variance matrices form the new standard model of the unknown continuous sound; (c) testing the unknown sentence or name again.
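For illustration only, the Bayesian distance of claim 5 reduces to a few lines once each known sound's standard model is stored as a (mean, variance) matrix pair. The function names and dictionary layout here are assumptions of this sketch, not the patent's implementation:

```python
import numpy as np

def bayes_distance(x, mean, var):
    """Bayesian distance l(c_i) of claim 5(c): negative log-likelihood of
    the E x P matrix x under an independent-normal model, constants dropped.
    Note ln(sigma) == 0.5 * ln(sigma^2)."""
    var = np.maximum(var, 1e-8)  # guard against zero variance
    return float(np.sum((x - mean) ** 2 / (2.0 * var) + 0.5 * np.log(var)))

def classify(x, models):
    """Return the known sound whose standard model (mean, var) minimizes
    the Bayesian distance to the unknown matrix x."""
    return min(models, key=lambda name: bayes_distance(x, *models[name]))
```

Sorting the same distances and keeping the F smallest gives the per-sound candidate rows used in claims 7 and 8.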
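The 3×F-window matching of claim 8 can likewise be sketched; here `cand_rows` is the D×F matrix as a list of D rows of F candidate syllables, and the scoring convention (a hit when a candidate's syllable appears in the previous, current, or next row) is this sketch's reading of the claim:

```python
def match_score(cand_rows, sentence, D):
    """Fraction of a candidate sentence's known continuous sounds that fall
    inside the 3 x F window (previous, current, next row) of the D x F matrix."""
    hits = 0
    for pos, syllable in enumerate(sentence):
        lo, hi = max(0, pos - 1), min(len(cand_rows), pos + 2)
        if any(syllable in cand_rows[r] for r in range(lo, hi)):
            hits += 1
    return hits / D  # probability per claim 8(c)

def recognize(cand_rows, database):
    """Among database entries of length D-1, D, or D+1, return the one
    with the highest 3 x F window probability."""
    D = len(cand_rows)
    pool = [s for s in database if abs(len(s) - D) <= 1]
    return max(pool, key=lambda s: match_score(cand_rows, s, D))
```

The one-row slack on either side is what lets a candidate of length D−1 or D+1, or a mis-segmented utterance, still score highly.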
TW98126015A 2009-08-03 2009-08-03 A speech recognition method for all languages without using samples TWI395200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW98126015A TWI395200B (en) 2009-08-03 2009-08-03 A speech recognition method for all languages without using samples


Publications (2)

Publication Number Publication Date
TW201106340A TW201106340A (en) 2011-02-16
TWI395200B true TWI395200B (en) 2013-05-01

Family

ID=44814314

Family Applications (1)

Application Number Title Priority Date Filing Date
TW98126015A TWI395200B (en) 2009-08-03 2009-08-03 A speech recognition method for all languages without using samples

Country Status (1)

Country Link
TW (1) TWI395200B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI460613B (en) * 2011-04-18 2014-11-11 Tze Fen Li A speech recognition method to input chinese characters using any language

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6336108B1 (en) * 1997-12-04 2002-01-01 Microsoft Corporation Speech recognition with mixtures of bayesian networks
US20070033027A1 (en) * 2005-08-03 2007-02-08 Texas Instruments, Incorporated Systems and methods employing stochastic bias compensation and bayesian joint additive/convolutive compensation in automatic speech recognition
CN101079103A (en) * 2007-06-14 2007-11-28 上海交通大学 Human face posture identification method based on sparse Bayesian regression


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tze Fen Li and Shui-Ching Chang "Classification on Defective Items Using Unidentified Samples," the Journal of the Pattern Recognition Society, Elsevier Computer Science, Pattern Recognition 38, 2005, pp. 51-58. *


Similar Documents

Publication Publication Date Title
TWI396184B (en) A method for speech recognition on all languages and for inputing words using speech recognition
US8160877B1 (en) Hierarchical real-time speaker recognition for biometric VoIP verification and targeting
Kumar et al. Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm
Bezoui et al. Feature extraction of some Quranic recitation using mel-frequency cepstral coeficients (MFCC)
EP0838805B1 (en) Speech recognition apparatus using pitch intensity information
US20190279644A1 (en) Speech processing device, speech processing method, and recording medium
Yücesoy et al. Gender identification of a speaker using MFCC and GMM
Ismail et al. Mfcc-vq approach for qalqalahtajweed rule checking
Yusnita et al. Malaysian English accents identification using LPC and formant analysis
US20010010039A1 (en) Method and apparatus for mandarin chinese speech recognition by using initial/final phoneme similarity vector
Hidayat et al. Wavelet detail coefficient as a novel wavelet-mfcc features in text-dependent speaker recognition system
JP5091202B2 (en) Identification method that can identify any language without using samples
Kamble et al. Emotion recognition for instantaneous Marathi spoken words
TWI297487B (en) A method for speech recognition
TWI395200B (en) A speech recognition method for all languages without using samples
Hasija et al. Recognition of Children Punjabi Speech using Tonal Non-Tonal Classifier
CN110838294B (en) Voice verification method and device, computer equipment and storage medium
TWI460718B (en) A speech recognition method on sentences in all languages
US20120116764A1 (en) Speech recognition method on sentences in all languages
Mousa MareText independent speaker identification based on K-mean algorithm
Chakraborty et al. An automatic speaker recognition system
Sailaja et al. Text independent speaker identification with finite multivariate generalized gaussian mixture model and hierarchical clustering algorithm
Li et al. Speech recognition of mandarin syllables using both linear predict coding cepstra and Mel frequency cepstra
Dutta et al. A comparative study on feature dependency of the Manipuri language based phonetic engine
Patro et al. Statistical feature evaluation for classification of stressed speech

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees