TW201106340A - A speech recognition method for all languages without using samples

Info

Publication number: TW201106340A
Application number: TW98126015A
Authority: TW
Inventors: Tze-Fen Li; Lee Tai-Jan Li; Shih-Tzung Li; Shih-Hon Li; Li-Chuan Liao
Original assignee: Tze-Fen Li; Lee Tai-Jan Li; Shih-Tzung Li; Shih-Hon Li; Li-Chuan Liao
Priority date: 2009-08-03
Filing date: 2009-08-03
Publication date: 2011-02-16
Also published as: TWI395200B

Abstract

The invention recognizes several languages at the same time without using samples. The key technique is that the features of a known word in any language are extracted from unknown words or continuous sounds. The unknown words, each represented by a matrix, are scattered in a 144-dimensional space, and the feature of a known word of any language, also represented by a matrix, is simulated from the surrounding unknown words. The invention uses 12 elastic frames of equal length, without filter and without overlap, to normalize the variable-length signal waveform of a word, which has one to several syllables, into a 12x12 matrix that serves as the feature of the word. The invention can also improve the features so that an unknown sentence is recognized correctly. Without samples, it can recognize any language, such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese and Taiwanese.

Description

[Technical Field of the Invention]

A continuous sound contains one or more syllables (a monosyllable has a single syllable). The invention recognizes all languages without samples of the continuous sounds. It uses 12 elastic frames (windows) of equal length, without filter and without overlap, to convert the waveform of a continuous sound of any length into a 12x12 matrix of linear predictive coding cepstra (LPCC); an unknown continuous sound is likewise represented by a 12x12 LPCC matrix. A 12x12 matrix is regarded as a vector in a 144-dimensional space, and the vectors of many unknown continuous sounds are scattered in that space. When a speaker pronounces a known continuous sound, the feature of that known continuous sound is simulated and computed from the features (LPCC) of the surrounding unknown continuous sounds. The invention comprises elastic frames that normalize the waveform of a continuous sound, a Bayesian matching method that finds a known continuous sound in the database for a speaker's unknown continuous sound, the segmentation of a speaker's unknown sentence into D unknown continuous sounds, and a window that screens the known sentences to find the speaker's unknown sentence.

[Prior Art]

When a continuous sound is pronounced, it is expressed as a sound wave. A sound wave is a system that changes nonlinearly with time; the waveform of a continuous sound carries a dynamic characteristic that also changes nonlinearly and continuously with time. Whenever the same continuous sound is pronounced, it shows the same series of dynamic characteristics, nonlinearly stretched and contracted in time: the characteristics appear in the same order, but at different times. Arranging the same dynamic characteristics of the same continuous sound at the same time positions is therefore very difficult, and the large number of similar continuous sounds makes recognition harder still.

A computerized speech recognition system must first extract the language-related information from the sound wave.

That language-related information is the dynamic characteristic of the sound wave. Noise unrelated to language, such as the speaker's timbre and pitch and the psychological, physiological and emotional state while speaking, has nothing to do with speech recognition and is filtered out. The same features of the same continuous sound are then arranged at the same time positions. This series of features is represented by a sequence of feature vectors of equal length, called the feature model of the continuous sound. In current speech recognition systems, producing feature models of uniform size is too complicated and time-consuming, because the same features of the same continuous sound are hard to arrange at the same time positions, especially in English, and this makes matching and recognition difficult.

A typical method for recognizing a sentence or name performs the following main tasks: segmenting the unknown sentence or name into D unknown continuous sounds, extracting features, normalizing the features (so that feature models have the same size and the same features of the same continuous sound sit at the same time positions), recognizing the unknown continuous sounds, and finding the matching sentence or name in the sentence and name database. Commonly used features of the waveform of a continuous sound include energy, zero crossings, extreme count, formants, linear predictive coding cepstra (LPCC) and Mel-frequency cepstral coefficients (MFCC); of these, LPCC and MFCC are the most effective and the most widely used. The LPCC is the most reliable, stable and accurate linguistic feature for representing a continuous sound. It represents the waveform with a linear regression model, computes the regression coefficients by least squares, and converts the estimates into cepstra, which gives the LPCC. The MFCC converts the sound wave into frequencies with the Fourier transform and then models the auditory system according to the Mel frequency scale. In the paper by S.B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 28, No. 4, 1980, which used dynamic time warping (DTW), the MFCC feature gave a higher recognition rate than the LPCC feature. In many speech recognition experiments (including the inventor's earlier inventions) with a Bayesian classifier, however, the LPCC feature gave a higher recognition rate than the MFCC feature and took less time.

As for recognition, many methods have been used, including dynamic time warping, vector quantization and the hidden Markov model (HMM). If the same pronunciation varies in time, dynamic time warping pulls the same features to the same time positions while matching. The recognition rate is good, but pulling the same features to the same positions is difficult and the warping takes too long to be practical. Vector quantization, when used to recognize a large number of continuous sounds, is both inaccurate and slow. The hidden Markov model recognizes well, but the method is complicated, has too many unknown parameters to estimate, and both the estimation and the recognition are time-consuming. T.F. Li (黎自奮), in "Speech recognition of mandarin monosyllables", Pattern Recognition, Vol. 36, 2003, used a Bayesian classifier on the same database, compressing LPCC vector sequences of various lengths into classification models of the same size, and obtained better recognition results than the HMM method of Y.K. Chen, C.Y. Liu, G.H. Chiang and M.T. Lin, "The recognition of mandarin monosyllables based on the discrete hidden Markov model", Proceedings of Telecommunication Symposium, 1990. The compression process, however, is complicated and time-consuming, and it is hard to compress the same features of the same continuous sound to the same time positions, so similar continuous sounds are hard to recognize.

The speech recognition method of the invention addresses these drawbacks. Starting from the observation that a sound wave carries a speech feature that changes nonlinearly with time, it derives a natural method of feature extraction. The waveform of a continuous sound is first normalized and then converted into a feature model of fixed size that is sufficient to represent the continuous sound, so that the same continuous sound has the same features at the same time positions of its feature model. No manual or experimental tuning is required. The classification model of an unknown continuous sound is matched directly against the standard models of the known continuous sounds in the feature database, without searching for matching features. The method therefore completes feature extraction, feature normalization and recognition quickly.

[Summary of the Invention]
(1) The main purpose of the invention is to simulate and compute the feature of any continuous sound of any language from the features of unknown continuous sounds. The feature of any continuous sound of any language can thus be established without samples, and the invention can recognize all languages correctly without samples. In detail, for any known continuous sound of any language, the invention uses the Bayesian distance to find unknown continuous-sound matrices in the 144-dimensional space and uses them to simulate and compute the known continuous sound. The feature of any known continuous sound can therefore be established without samples of it, and any language can be recognized.

(2) The invention provides a speech recognition method that can delete sound waves that do not carry speech.

(3) The invention provides a method for normalizing the waveform of a continuous sound and extracting its features. It makes [...] have the same features at the same time positions, so that matching can be done in time and real-time recognition by computer is achieved.

(6) The invention uses all sound waves that carry speech (signal points). A small number (E = 12) of equal elastic frames, without filter and without overlap, covers the features of all signal points. A continuous sound is not discarded because its waveform is too short, and no signal points are deleted or compressed because the waveform is too long. As long as human hearing can distinguish the continuous sound, the invention can extract its feature. The method therefore uses every signal point that carries speech and extracts as much of the speech feature as possible. Because the E = 12 elastic frames do not overlap and are few in number, the time needed for feature extraction and for computing the LPCC is greatly reduced.

(7) The method can recognize continuous sounds that are spoken too fast or too slowly. When speech is too fast, the waveform of a continuous sound is short; the elastic frames shrink, and the same number E of equal elastic frames still covers the short waveform and produces E LPCC vectors. As long as a human can distinguish the short sound, the E LPCC vectors effectively represent its feature model. Speech that is too slow gives a longer waveform; the elastic frames stretch, and the resulting E LPCC vectors represent the long sound equally well.

(8) The invention provides a method of stabilizing and adjusting the features of all known continuous sounds in the database, so that every feature occupies its own position and region in the 144-dimensional space and recognition is correct.

(9) To recognize a sentence or name, the unknown sentence or name is first segmented into D unknown continuous sounds. For each unknown continuous sound, the Bayesian method selects the F most similar known continuous sounds from the continuous-sound feature database, so a sentence is represented by D×F known continuous sounds. Because segmentation is difficult, the sentence may be cut into more or fewer unknown continuous sounds than it contains. The invention therefore matches each known continuous sound of a candidate sentence or name against the three rows of F similar known continuous sounds around the corresponding unknown continuous sound. That is, for each sentence or name in the sentence and name database, a 3×F window of similar known continuous sounds screens each known continuous sound, and the most probable sentence or name is then selected from the database. The method is simple and the success rate is high (tested on 70 English sentences and names and 407 Mandarin sentences and names).

(10) The invention provides two techniques for correcting the features of continuous sounds, so that unknown continuous sounds and unknown sentences or names are recognized successfully.

(11) The invention treats a Mandarin monosyllable as a continuous sound with a single syllable. The features of Chinese and of foreign languages are all represented by matrices of the same size, so the invention can recognize several languages at the same time.

[Embodiments]

Figures 1 and 2 illustrate the procedure of the invention. Figure 1 shows how the permanent database of known continuous sounds and the feature database of known continuous sounds are built. A known continuous sound 1 enters the receiver 20 as a continuous sound wave 10, and the digital converter 30 turns the continuous sound wave into a sequence of digitized signal points. The preprocessing step 45 deletes the parts without speech by one of two methods: (1) compute the variance of the signal points in a short time segment and the variance of ordinary noise; if the former is smaller than the latter, the segment carries no speech and is deleted. (2) compute the sum of the distances between consecutive signal points in a short time segment and the same sum for ordinary noise; if the former is smaller than the latter, the segment carries no speech and is deleted. After preprocessing 45, a sequence of signal points that carry the known continuous sound remains. The waveform is then normalized and its features are extracted.
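
The two deletion methods of preprocessing 45 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the segment length and the use of a separate noise-only recording as the reference are hypothetical, and the sum of adjacent distances is divided by the number of gaps so that a shorter final segment stays comparable.

```python
def variance(xs):
    """Population variance of the signal points in one segment."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def mean_adjacent_distance(xs):
    """Sum of distances between consecutive points, divided by the gap count."""
    if len(xs) < 2:
        return 0.0
    return sum(abs(b - a) for a, b in zip(xs, xs[1:])) / (len(xs) - 1)


def remove_non_speech(signal, noise, segment_len=240, method="distance"):
    """Keep only the segments whose activity exceeds that of the noise reference.

    signal, noise: lists of sampled signal points (noise is assumed speech-free).
    method: "variance" (method 1) or "distance" (method 2).
    """
    measure = variance if method == "variance" else mean_adjacent_distance
    threshold = measure(noise)
    kept = []
    for start in range(0, len(signal), segment_len):
        segment = signal[start:start + segment_len]
        # A segment that does not exceed the noise level is treated as non-speech.
        if measure(segment) > threshold:
            kept.extend(segment)
    return kept
```

Both measures are cheap, but the distance measure needs no mean and no squaring, which is consistent with the remark in the description that the second method takes less time.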

All signal points of the known continuous sound are divided into E equal time segments, each of which forms a frame. There are E frames of equal length 50, without filter and without overlap; their length is adjusted freely according to the total number of signal points of the continuous sound, so that they cover all signal points. The frames are therefore called elastic frames: their length stretches and shrinks freely, but the E elastic frames always have the same length. A Hamming window, by contrast, has a filter, half overlap and a fixed length, and cannot adjust to the length of the waveform. Because the waveform of a continuous sound changes nonlinearly with time, it carries a speech dynamic characteristic that also changes nonlinearly with time. Because the frames do not overlap, the invention needs only a few (E = 12) elastic frames to cover the whole waveform. Since a signal point can be estimated from the preceding signal points, a regression model that changes linearly with time is used to estimate the nonlinearly changing waveform closely, and the unknown regression coefficients are estimated by least squares. Each frame yields one set of least-squares estimates of the unknown coefficients, called a linear predictive coding (LPC) vector. The LPC vector is then converted into the more stable linear predictive coding cepstrum (LPCC) vector. The waveform of a continuous sound, which contains a series of speech dynamic characteristics that change nonlinearly with time, is thus converted into E LPCC vectors of equal size 60.
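
The division into elastic frames can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and when the number of signal points is not a multiple of E the frame lengths here differ by at most one point, a rounding choice that the description does not specify.

```python
def elastic_frames(signal, num_frames=12):
    """Split signal points into num_frames contiguous, non-overlapping frames.

    The frame length scales with len(signal), so a short (fast) and a long
    (slow) pronunciation both produce exactly num_frames frames.
    """
    n = len(signal)
    if n < num_frames:
        raise ValueError("need at least one signal point per frame")
    # Integer boundaries 0 = b_0 <= b_1 <= ... <= b_E = n, as evenly spaced as possible.
    bounds = [i * n // num_frames for i in range(num_frames + 1)]
    return [signal[bounds[i]:bounds[i + 1]] for i in range(num_frames)]
```

Every signal point falls in exactly one frame, so nothing is discarded or duplicated, in contrast to fixed-length, half-overlapping Hamming windows.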
To extract the features of known continuous sounds, a permanent database of known continuous sounds is prepared first. Each known continuous sound is pronounced once by a speaker with standard, clear pronunciation. If the speaker to be recognized has a heavy accent or nonstandard pronunciation, the known continuous sounds are pronounced by that speaker instead. Each waveform is converted into an E×P LPCC matrix, which is stored in the permanent database of known continuous sounds.

To extract the feature of a known continuous sound in the permanent database, a database of unknown continuous sounds is also prepared. This database is of one of two kinds: one has samples of the unknown continuous sounds, the other has no samples. If samples are available, the mean and variance of each unknown continuous sound are computed from its samples. In the database with samples, the Bayesian distance is used to find the N nearest unknown continuous sounds around the known continuous sound. The weighted average of N + 1 LPCC values, namely the N means of the N unknown continuous sounds and the LPCC of the known continuous sound, is taken as the mean of the known continuous sound, and the weighted average of the N variances of the N unknown continuous sounds is taken as its variance. The resulting E×P matrices of means and variances are the preliminary feature of the known continuous sound 79 and are stored in the continuous-sound feature database. If the database of unknown continuous sounds has no samples, the minimum absolute distance is used to find N unknown continuous sounds around the known continuous sound. The LPCC of the known continuous sound and the LPCC of the N unknown continuous sounds give N + 1 numbers; their weighted average is taken as the mean of the known continuous sound and their variance as its variance. These E×P matrices of means and variances are the preliminary feature of the known continuous sound and are stored in the feature database 79.

Within the feature database, the mean of a known continuous sound may fail to have the smallest Bayesian distance to the LPCC of that same sound in the permanent database. In that case its feature is re-estimated from the N nearest known continuous sounds in the feature database.
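
The sample-free case can be sketched as follows. This is a minimal sketch under assumptions: each E×P matrix is flattened into a list of 144 numbers, the weights of the "weighted average" are not given in the description and are taken as equal here, and the function names are hypothetical.

```python
def absolute_distance(a, b):
    """Sum of absolute differences between two flattened LPCC matrices."""
    return sum(abs(x - y) for x, y in zip(a, b))


def simulate_feature(known, unknown_pool, n_neighbors):
    """Estimate the mean and variance of a known sound from its N nearest unknown sounds.

    known: flattened LPCC matrix of the known continuous sound (one utterance).
    unknown_pool: list of flattened LPCC matrices of unknown continuous sounds.
    Returns (means, variances), each a list with one value per LPCC position.
    """
    nearest = sorted(unknown_pool, key=lambda u: absolute_distance(known, u))[:n_neighbors]
    group = [known] + nearest            # the N + 1 values per LPCC position
    size = len(group)
    columns = list(zip(*group))
    means = [sum(col) / size for col in columns]          # equal weights (assumption)
    variances = [sum((x - m) ** 2 for x in col) / size
                 for col, m in zip(columns, means)]
    return means, variances
```

For example, `simulate_feature([1.0, 2.0], [[3.0, 2.0], [100.0, 100.0]], 1)` uses only the nearer unknown sound and ignores the distant one.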

For this re-estimation, the N known continuous sounds in the feature database whose Bayesian distances to the LPCC of the known continuous sound are the N smallest are selected. The weighted average of the N means of these known continuous sounds and the LPCC of the known continuous sound becomes its new mean, and the weighted average of their N variances becomes its new variance. The new mean and variance of every known continuous sound in the feature database are recomputed in this way several times. The final E×P matrices of new means and variances are called the standard model of the known continuous sound and are stored in the feature database 80. The known continuous sounds of the database are then used to build the sentence and name database 85.

Figure 2 shows the flow for recognizing an unknown sentence or name. An unknown sentence or name 2 is input to the method as a set of unknown continuous sound waves 11; it enters the receiver 20, and the digital converter 30 turns it into a series of signal points. The waveform of the unknown sentence or name is segmented into the waveforms of D unknown continuous sounds 40, and the preprocessing 45 of Figure 1 deletes the parts without speech. The waveform of each unknown continuous sound is then normalized and its features are extracted: all speech-bearing signal points of each unknown continuous sound are divided into E equal time segments, each forming an elastic frame 50. Each continuous sound has E elastic frames, without filter and without overlap, which stretch freely to cover all signal points. Within each frame, because a signal point can be estimated from the preceding signal points, the unknown regression coefficients are estimated by least squares. The set of least-squares estimates produced in each frame is called the linear predictive coding (LPC) vector, and the LPC vector is normally distributed.

The normally distributed LPC vector is converted into the more stable LPCC vector 60. The E LPCC vectors of an unknown continuous sound form its classification model, which has the same size as the standard model of a known continuous sound. A sentence thus has D classification models, representing D unknown continuous sounds 90. If a known continuous sound is the unknown continuous sound, the means of its standard model are the closest to the LPCC of the classification model of the unknown continuous sound. The simplified Bayesian recognition method of the invention therefore compares the classification model of the unknown continuous sound with the standard model of every known continuous sound in the continuous-sound feature database 100. For fast computation it is assumed that, if a known continuous sound is the unknown continuous sound, all LPCC in the classification model of the unknown continuous sound are independent and normally distributed, with means and variances estimated by the means and variances of the standard model of the known continuous sound. The simplified Bayesian method computes the distance between the LPCC of the unknown continuous sound and the means of the known continuous sound, adjusted by the variances of the known continuous sound; the resulting value represents the similarity between the unknown continuous sound and the known continuous sound. The F known continuous sounds most similar to the unknown continuous sound are selected to represent it, so an unknown sentence or name is represented by a D×F matrix of known continuous sounds 110. When an unknown sentence or name is segmented into D unknown continuous sounds, it is hard to cut it into exactly the continuous sounds it contains: sometimes one continuous sound is cut into two, and sometimes two continuous sounds pronounced close together are cut as one. The D unknown continuous sounds are therefore not necessarily the true number of continuous sounds spoken, and a given row of F similar known continuous sounds does not necessarily contain the speaker's continuous sound.
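
The variance-adjusted distance and the selection of the F most similar known continuous sounds can be sketched as follows. This is a sketch under assumptions, not the formula of the invention: with independent normal distributions and equal prior probabilities, the negative log-likelihood (up to a constant) is one standard form of such a distance, and the names `bayes_distance`, `top_f` and the dictionary layout are hypothetical.

```python
import math


def bayes_distance(x, means, variances):
    """Negative log-likelihood (up to a constant) of x under independent normals.

    x: flattened E x P LPCC matrix of the unknown continuous sound.
    means, variances: flattened standard model of one known continuous sound.
    A smaller value means a higher similarity.
    """
    return sum(0.5 * math.log(v) + 0.5 * (xi - m) ** 2 / v
               for xi, m, v in zip(x, means, variances))


def top_f(x, models, f):
    """Return the labels of the f known sounds most similar to x.

    models: dict mapping a label to a (means, variances) pair.
    """
    ranked = sorted(models, key=lambda label: bayes_distance(x, *models[label]))
    return ranked[:f]
```

Applying `top_f` to each of the D unknown continuous sounds of a sentence gives the D×F matrix of candidate known continuous sounds 110.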
To recognize an unknown sentence or name, every known sentence and name in the sentence and name database 85 is tested. To test whether a known sentence or name is the one spoken, each of its known continuous sounds, starting from the first, is compared with the three neighbouring rows of similar known continuous sounds in the D×F matrix (the first comparison can use only two rows, the current row and the following one). The 3×F window (the preceding, current and following rows of similar known continuous sounds) 120 is then moved to look for the second known continuous sound of the sentence, and so on until all known continuous sounds of the test sentence have been tested. The sentence or name in the database with the highest probability is taken as the speaker's sentence or name, where the probability is the number of known continuous sounds of the test sentence or name found in their 3×F windows divided by the number of continuous sounds in the test sentence or name 130. To save time, only the sentences or names whose length is about the same as that of the unknown sentence or name (D unknown continuous sounds) need be compared. If a sentence or name cannot be recognized, the Bayesian classifier finds the N most similar continuous sounds in the feature database 140 and uses them to improve the features of the continuous sounds in the sentence, after which recognition is expected to succeed. The invention is described in detail below.

(1) After a continuous sound is input to the method, its continuous sound wave is converted into a series of digitized signal points (signal sampled points), and the signal points without speech are deleted. The invention provides two methods: the first computes the variance of the signal points in a short time segment; the second computes the sum of the distances between adjacent signal points in the segment.
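
The 3×F window screening of steps 120 and 130 can be sketched as follows. This is an illustrative sketch: the names `sentence_score` and `best_sentence` are hypothetical, known continuous sounds are represented by plain labels, and the k-th known continuous sound of a candidate sentence is assumed to be aligned with the k-th row of the D×F matrix.

```python
def sentence_score(sentence, candidates):
    """Fraction of the sentence's known sounds found in their 3 x F windows.

    sentence: list of labels of the known sounds of one database sentence.
    candidates: D rows, each holding the F labels most similar to one unknown sound.
    """
    hits = 0
    for k, sound in enumerate(sentence):
        # Rows k-1, k and k+1; at the sentence start only rows k and k+1 exist.
        window = candidates[max(0, k - 1):k + 2]
        if any(sound in row for row in window):
            hits += 1
    return hits / len(sentence)


def best_sentence(database, candidates):
    """Pick the database sentence with the highest window score."""
    return max(database, key=lambda s: sentence_score(s, candidates))
```

Looking one row before and one row after tolerates a segmentation that produced one continuous sound too many or too few near that position.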
In theory the first method is better, because a variance of the signal points that is larger than the variance of noise indicates that speech is present. In recognizing continuous sounds with the invention, however, the two methods gave the same recognition rate, and the second takes less time.

(2) After the signal points without speech are deleted, the remaining signal points represent the whole continuous sound. The waveform is first normalized and its features are then extracted: all signal points are divided into E equal time segments, each forming a frame. A continuous sound has E elastic frames of equal length, without filter and without overlap, which stretch freely to cover all signal points. Within an elastic frame the signal changes nonlinearly with time and is hard to express with a mathematical model, so the linear regression model reviewed by J. Makhoul (Proceedings of the IEEE, 1975) is used to approximate it.

J. Makhoul, in "Linear Prediction: A Tutorial Review", Proceedings of the IEEE, Vol. 63, No. 4, 1975, shows that a signal point has a linear relation with the preceding signal points, so a regression model that changes linearly with time can estimate the nonlinearly changing signal points. The signal point S(n) is estimated from the preceding signal points, and its estimate S'(n) is given by the regression model

$$S'(n) = \sum_{k=1}^{P} a_k\, S(n-k), \quad n \ge 0 \tag{1}$$

In (1), $a_k$, $k = 1, \dots, P$, are the estimates of the unknown regression coefficients and P is the number of preceding signal points. The least-squares estimates are computed with Durbin's recursion as given in L. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice Hall PTR, Englewood Cliffs, New Jersey, 1993; this set of estimates is called the linear predictive coding (LPC) vector. The LPC vector of the signal points in a frame is obtained as follows.

Let $E_1$ denote the sum of squared differences between the signal points S(n) and their estimates S'(n):

$$E_1 = \sum_{n=0}^{N} \Big[ S(n) - \sum_{k=1}^{P} a_k\, S(n-k) \Big]^2 \tag{2}$$

The regression coefficients are chosen to minimize $E_1$. Taking the partial derivative of (2) with respect to each unknown coefficient $a_i$, $i = 1, \dots, P$, and setting it to 0 gives P normal equations:

$$\sum_{k=1}^{P} a_k \sum_{n} S(n-k)\, S(n-i) = \sum_{n} S(n)\, S(n-i), \quad 1 \le i \le P \tag{3}$$

Expanding (2) and substituting (3) gives the minimum total squared error $E_P$:

$$E_P = \sum_{n} S^2(n) - \sum_{k=1}^{P} a_k \sum_{n} S(n)\, S(n-k) \tag{4}$$

Equations (3) and (4) are rewritten as

$$\sum_{k=1}^{P} a_k\, R(i-k) = R(i), \quad 1 \le i \le P \tag{5}$$

$$E_P = R(0) - \sum_{k=1}^{P} a_k\, R(k) \tag{6}$$

In (5) and (6), with N denoting the number of signal points in the frame,

$$R(i) = \sum_{n=0}^{N-i} S(n)\, S(n+i), \quad i \ge 0 \tag{7}$$

Durbin's recursion computes the LPC vector quickly as follows:

$$E_0 = R(0) \tag{8}$$

$$k_i = \Big[ R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)}\, R(i-j) \Big] \Big/ E_{i-1} \tag{9}$$

$$a_i^{(i)} = k_i \tag{10}$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \tag{11}$$

$$E_i = (1 - k_i^2)\, E_{i-1} \tag{12}$$

Computing (8)-(12) recursively for $i = 1, \dots, P$ gives the least-squares estimates $a_j$, $1 \le j \le P$, of the regression coefficients (the LPC vector):

$$a_j = a_j^{(P)}, \quad 1 \le j \le P \tag{13}$$

aj = ajP). \<j<P 再用下列公式將LPC向量轉換較穩定線性 (LPCC)向量<，y = 1，”户， ;=1 i Μ (14) (15) —個彈性減生—轉性預估編碼倒辆（服）向量 :’.··0。根據本發縣音辨認方法，用帥，因最後的線性預估編碼倒頻譜（LPCC)幾乎為0。一個連續音以錄線性預估編碼倒賴（脱）_示特徵，切—個含挪個線性 Z編碼倒頻譜（_的矩陣表示—個連續音…個連續音包含一至多個音節。、 =)同樣方法以（8-⑸式計算出—個未知連續音音波的以固線 =估編碼倒頻譜（聞向量，有同樣大小辦個LPCC的矩陣，叫做未知連續音的分類模型。 2第二财，語音辨認_，__個未知連續音的分類一個細LPCC的矩陣。❹.卞”川=1’户， q未知連續音分類模型。在與—個已知連續音d心、所有連績音總數）’比對時，為了快速計算比對值，假定⑸ 201106340Aj = ajP). \<j<P Then use the following formula to convert the LPC vector to a more stable linear (LPCC) vector<,y = 1," household, ;=1 i Μ (14) (15) - elasticity The reduction-rotation prediction code reversing vehicle (service) vector: '.··0. According to the method of identification of the local county, the handsome, because the final linear prediction coding cepstrum (LPCC) is almost 0. A continuous The sound is recorded by the linear prediction code. The cut-off feature contains a linear Z-coded cepstrum (the matrix of _ is a continuous tone... a continuous tone contains one or more syllables. , =) The method calculates the cepstrum of the unknown continuous sound wave by the equation (8-(5)) (sense vector, the matrix of the LPCC with the same size, called the classification model of the unknown continuous sound. 2 second wealth, Speech recognition _, __ classification of unknown continuous sounds A matrix of fine LPCC. ❹.卞"chuan=1' household, q unknown continuous sound classification model. In and with a known continuous tone d heart, all continuation Total) 'In comparison, in order to quickly calculate the alignment value, assume (5) 201106340
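Equations (7) to (15) can be sketched in code as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, the frame split inside `lpcc_matrix` lets frame lengths differ by one point, and the recursion stops with an error if the prediction error is not positive.

```python
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """R(i) of eq. (7) for i = 0..order."""
    n = len(frame)
    return [sum(frame[t] * frame[t + i] for t in range(n - i))
            for i in range(order + 1)]


def durbin_lpc(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Durbin's recursion, eqs. (8)-(13). Returns (a_1..a_P, E_P)."""
    a = [0.0] * (order + 1)          # a[j] holds a_j^(i); index 0 is unused
    err = r[0]                       # eq. (8)
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("prediction error is not positive")
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # eq. (9)
        prev = a[:]
        a[i] = k                     # eq. (10)
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]                          # eq. (11)
        err = (1.0 - k * k) * err    # eq. (12)
    return a[1:], err                # eq. (13)


def lpc_to_lpcc(a: Sequence[float], num_cepstra: int) -> List[float]:
    """Convert an LPC vector (a_1..a_P) to LPCC values, eqs. (14)-(15)."""
    p = len(a)
    c = [0.0] * (num_cepstra + 1)    # c[i] holds a'_i; index 0 is unused
    for i in range(1, num_cepstra + 1):
        start = 1 if i <= p else i - p
        acc = sum((j / i) * a[i - j - 1] * c[j] for j in range(start, i))
        c[i] = (a[i - 1] if i <= p else 0.0) + acc
    return c[1:]


def lpcc_matrix(signal: Sequence[float], num_frames: int = 12,
                order: int = 12) -> List[List[float]]:
    """E x P LPCC matrix of one continuous sound (E frames, P coefficients)."""
    n = len(signal)
    bounds = [k * n // num_frames for k in range(num_frames + 1)]
    rows = []
    for k in range(num_frames):
        frame = signal[bounds[k]:bounds[k + 1]]
        lpc, _ = durbin_lpc(autocorrelation(frame, order), order)
        rows.append(lpc_to_lpcc(lpc, order))
    return rows
```

As a small check, the autocorrelation sequence R = (1, 0.5, 0.25) gives the LPC vector (0.5, 0) with prediction error 0.75, and the corresponding cepstrum values 0.5, 0.125 and 0.125/3.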

(4) In Fig. 2, the speech recognition stage (100) receives the classification model of an unknown continuous sound, an E×P LPCC matrix, denoted $X = \{X_{jl}\}$, $j = 1, \ldots, E$, $l = 1, \ldots, P$. When it is compared with a known continuous sound $c_i$, $i = 1, \ldots, m$ ($m$ is the total number of known continuous sounds), the entries $\{X_{jl}\}$ are assumed, for fast computation of the comparison value, to be E×P independent normal random variables whose means and variances $(\mu_{ijl}, \sigma_{ijl}^2)$ are estimated by the means and variances in the standard model of the known continuous sound. Let $f(x \mid c_i)$ denote the conditional density function of $X$. Following the decision theory in T. F. Li, "Speech recognition of Mandarin monosyllables," Pattern Recognition, Vol. 36, 2003, the Bayes classification is described as follows.

Suppose the feature database holds the standard models of $m$ known continuous sounds. Let $\theta_i$, $i = 1, \ldots, m$, be the probability that the continuous sound $c_i$ occurs, that is, the prior probability, so that $\sum_{i=1}^{m} \theta_i = 1$. Let $d$ denote a decision rule. Define a simple loss function, which gives the misclassification probability of $d$: if the decision rule $d$ misclassifies an unknown continuous sound, $d(x) \ne c_i$, the loss is $L(c_i, d(x)) = 1$; if $d$ classifies it correctly, $d(x) = c_i$, there is no loss, $L(c_i, d(x)) = 0$. Let $\Gamma_i$, $i = 1, \ldots, m$, denote the region of matrix values $X$ assigned to the known continuous sound $c_i$; that is, when $X \in \Gamma_i$, $d$ decides that the unknown continuous sound is $c_i$. The average misclassification probability of $d$ is

$$R(\tau, d) = \sum_{i=1}^{m} \theta_i \int L(c_i, d(x)) f(x \mid c_i)\, dx = \sum_{i=1}^{m} \theta_i \int_{\Gamma_i^c} f(x \mid c_i)\, dx \tag{16}$$

In (16), $\tau = (\theta_1, \ldots, \theta_m)$ and $\Gamma_i^c$ is the complement of $\Gamma_i$. Let $D$ denote the set of all recognition methods, that is, all ways of partitioning the space into the regions of the $m$ known continuous sounds. A recognition method $d_\tau$ is sought in $D$ that minimizes the average misclassification probability (16):

$$R(\tau, d_\tau) = \min_{d \in D} R(\tau, d) \tag{17}$$

The recognition method $d_\tau$ that satisfies (17) is called the Bayes classification rule with respect to the prior probability $\tau$. It can be written as

$$d_\tau(x) = c_i \quad \text{if} \quad \theta_i f(x \mid c_i) > \theta_j f(x \mid c_j) \tag{18}$$

In (18), $j = 1, \ldots, m$, $j \ne i$; that is, the region belonging to the known continuous sound $c_i$ is $\Gamma_i = \{x \mid \theta_i f(x \mid c_i) > \theta_j f(x \mid c_j) \text{ for all } j \ne i\}$. If all known continuous sounds are equally likely, the Bayes classification is the same as the maximum likelihood rule. To recognize an unknown continuous sound, the Bayes classification (18) first computes the conditional density functions $f(x \mid c_i)$, $i = 1, \ldots, m$, of $X$:

$$f(x \mid c_i) = \Big[ \prod_{j,l} \frac{1}{\sqrt{2\pi}\, \sigma_{ijl}} \Big] \exp\Big( -\frac{1}{2} \sum_{j,l} \Big( \frac{x_{jl} - \mu_{ijl}}{\sigma_{ijl}} \Big)^2 \Big) \tag{19}$$

In (19), $i = 1, \ldots, m$ ($m$ is the total number of known continuous sounds). For convenience of computation, taking the logarithm of (19), changing its sign and dropping the constant gives the Bayes distance

$$l(c_i) = \sum_{j,l} \ln(\sigma_{ijl}) + \frac{1}{2} \sum_{j,l} \Big( \frac{x_{jl} - \mu_{ijl}}{\sigma_{ijl}} \Big)^2, \quad i = 1, \ldots, m \tag{20}$$
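The Bayes distance (20) and the resulting classification can be sketched as follows. The sketch is illustrative only: the names `bayes_distance` and `classify`, and the dictionary layout that maps a sound's name to its (mean, variance) matrices, are assumptions made for the example, and equal prior probabilities are assumed.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def bayes_distance(x: Matrix, mean: Matrix, var: Matrix) -> float:
    """Eq. (20): sum of ln(sigma) plus half the sum of squared z-scores."""
    total = 0.0
    for x_row, m_row, v_row in zip(x, mean, var):
        for xv, m, v in zip(x_row, m_row, v_row):
            # ln(sigma) = 0.5 * ln(sigma^2); v is the variance sigma^2.
            total += 0.5 * math.log(v) + 0.5 * (xv - m) ** 2 / v
    return total


def classify(x: Matrix, models: Dict[str, Tuple[Matrix, Matrix]],
             top: int = 1) -> List[str]:
    """Names of the `top` known sounds with the smallest Bayes distance to x."""
    ranked = sorted(models, key=lambda name: bayes_distance(x, *models[name]))
    return ranked[:top]
```

With `top=1` the function returns the single most similar known sound; a larger `top` returns a ranked list of similar sounds, which is the form used later for sentence recognition.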

With (20), the Bayes classification (18) reduces to computing the value $l(c_i)$ for every known continuous sound $c_i$. The value $l(c_i)$ is also called the similarity between the unknown continuous sound and the known continuous sound $c_i$, or the Bayesian distance. In (20), $x = \{x_{jl}\}$, $j = 1, \ldots, E$, $l = 1, \ldots, P$, are the LPCC values in the classification model of the unknown continuous sound, and $\{\mu_{ijl}, \sigma_{ijl}^2\}$ are estimated by the means and variances in the standard model of the known continuous sound. The most important contribution of the invention is that, without samples, it finds for every known continuous sound $c_i$ in the feature database a mutually stable centre $\mu_i = \{\mu_{ijl}\}$ and a definite, non-overlapping region; here $\{\mu_{ijl}\}$ is the E×P matrix that locates the LPCC region of the known continuous sound $c_i$.

(5) To extract the features of a known continuous sound, a database of unknown continuous sounds is prepared first. There are two kinds of unknown continuous sound database: one in which the unknown continuous sounds have samples, and one in which they have none.

For a database with samples, the mean and variance of each unknown continuous sound are computed first. For each known continuous sound, the N nearest unknown continuous sounds are found in the database with the Bayes distance. The weighted average of N+1 items, namely the N means of these unknown continuous sounds and the LPCC of the known continuous sound, is taken as the mean of the known continuous sound, and the weighted average of the N variances of the N unknown continuous sounds is taken as its variance. This E×P matrix of means and variances is the preliminary feature of the known continuous sound and is stored in the feature database of known continuous sounds (79).

If the unknown continuous sound database has no samples, the N nearest unknown continuous sounds around the known continuous sound are found with the smallest absolute distance. The LPCC values of the known continuous sound and of the N unknown continuous sounds are treated as N+1 numbers; their weighted mean is taken as the mean of the known continuous sound and their variance as its variance. This E×P matrix of means and variances is the preliminary feature of the known continuous sound and is stored in the feature database (79).

In the feature database, if the Bayes distance between the mean of a known continuous sound and the LPCC of the same known continuous sound in the permanent database is not the smallest in the feature database, the N known continuous sounds nearest to the LPCC of that known continuous sound are found in the feature database with the Bayes distance. The weighted average of the N means of these N known continuous sounds and the LPCC of the known continuous sound is taken as its new mean, and the weighted average of their N variances as its new variance. This computation is repeated many times for every known continuous sound in the feature database. The final E×P matrix of means and variances is called the standard model of the known continuous sound and is stored in the feature database (80). The continuous sounds of the feature database are then used to build the sentence and name database.
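The preliminary feature of step (5), for a database with samples, can be sketched as below. The text does not state the weights of the weighted averages, so this sketch assumes equal weights; the names `preliminary_feature` and `average` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def bayes_distance(x: Matrix, mean: Matrix, var: Matrix) -> float:
    """Bayes distance of eq. (20) between x and a (mean, variance) model."""
    return sum(0.5 * math.log(v) + 0.5 * (xv - m) ** 2 / v
               for xr, mr, vr in zip(x, mean, var)
               for xv, m, v in zip(xr, mr, vr))


def average(mats: Sequence[Matrix]) -> Matrix:
    """Element-wise average of equally sized matrices (equal weights assumed)."""
    count = len(mats)
    return [[sum(m[j][l] for m in mats) / count
             for l in range(len(mats[0][0]))]
            for j in range(len(mats[0]))]


def preliminary_feature(known_lpcc: Matrix,
                        unknown_db: Sequence[Tuple[Matrix, Matrix]],
                        n_neighbours: int = 15) -> Tuple[Matrix, Matrix]:
    """Mean and variance matrices of a known sound from its N nearest unknowns.

    `unknown_db` holds one (sample mean, sample variance) pair per unknown sound.
    """
    ranked = sorted(unknown_db,
                    key=lambda mv: bayes_distance(known_lpcc, mv[0], mv[1]))
    nearest = ranked[:n_neighbours]
    # Mean: average of the N neighbour means and the known sound's own LPCC.
    mean = average([known_lpcc] + [m for m, _ in nearest])
    # Variance: average of the N neighbour variances.
    var = average([v for _, v in nearest])
    return mean, var
```

The same averaging step, applied to the N nearest known sounds in the feature database instead of the unknown ones, gives one pass of the repeated stabilization described above.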

(6) If an unknown continuous sound is recognized incorrectly, the invention provides a way to correct the old feature so that the continuous sound is recognized correctly:

(a) With the Bayes classification (20), the N known continuous sounds most similar to the unknown continuous sound are found in the feature database, and the weighted averages of their N means and of their N variances are computed, (21) and (22). The resulting $\{\mu_{jl}, \sigma_{jl}^2\}$, $j = 1, \ldots, E$, $l = 1, \ldots, P$, represents the new standard model of the unknown continuous sound and is stored in the feature database of continuous sounds. When the sound is tested again, recognition succeeds.

(b) Alternatively, the weighted average of the means of the N most similar known continuous sounds and the LPCC of the unknown continuous sound is taken as the mean of the unknown continuous sound, and the weighted average of the variances of the N most similar continuous sounds is taken as its variance. The resulting $\{\mu_{jl}, \sigma_{jl}^2\}$, $j = 1, \ldots, E$, $l = 1, \ldots, P$, represents the new standard model of the unknown continuous sound.
(7) To verify that the invention can recognize any language, a speech recognition experiment was carried out with two speakers.

(a) A database of unknown continuous sounds is established first. The monosyllable database was purchased from Academia Sinica in Taiwan. It contains 388 Mandarin monosyllables (Fig. 3), all pronounced by female speakers, with 6 to 99 samples each. Many of the monosyllables sound almost identical.

(b) With the method of (2), all samples are converted into E×P LPCC matrices, 12,400 matrices in total.

(c) For each of the 388 Mandarin monosyllables, the sample mean and variance are computed.

(d) The 388 Mandarin monosyllables are mixed blindly, so that the monosyllables with their sample means and variances become a database of 388 unknown continuous sounds (a Mandarin monosyllable is a continuous sound of one syllable).

(e) One male and one female speaker each pronounce 654 Mandarin monosyllables, 154 English words, 1 German word, 1 Japanese word and 3 Taiwanese words once, which gives two permanent databases of 813 known continuous sounds. Each continuous sound is represented by an E×P LPCC matrix.

(f) For each of the 813 known continuous sounds in the permanent database, the N=15 nearest unknown continuous sounds are found among the 388 unknown continuous sounds with the Bayes distance (20). The weighted average of N+1 items, the LPCC of the known continuous sound and the sample means of the N unknown continuous sounds, is taken as the mean of the known continuous sound, and the weighted average of the N sample variances as its variance. This E×P matrix of means and variances is the preliminary feature of the known continuous sound (79) and is stored in the feature database of known continuous sounds. The feature database thus contains 813 matrices of means and variances (80).

(g) In the feature database, if the Bayes distance between the mean of a known continuous sound and the LPCC of the same known continuous sound in the permanent database is not the smallest, the N=15 nearest known continuous sounds are found among the 813 features with the Bayes distance. The weighted average of the N means of these known continuous sounds and the LPCC of the known continuous sound is taken as its new mean, and the weighted average of the N variances as its new variance. The new means and variances are recomputed many times. The final E×P matrix of means and variances is called the standard model; it represents the feature of the known continuous sound and is stored in the feature database (80).

The invention then carries out the following recognition tests. The recognition rate depends on the speaker; because there are so many similar sounds, a result among the top three is counted as correct.

① Recognition of 384 Mandarin monosyllables, 1 German, 1 Japanese and 2 Taiwanese words (Fig. 3): the recognition rate is very good.
② Recognition of 154 English words and 1 German word (Fig. 4): the recognition rate is very good.
③ Simultaneous recognition of 154 English words, 388 Mandarin monosyllables, 1 German, 1 Japanese and 2 Taiwanese words: the recognition rate is very good.
④ Recognition of 654 Mandarin monosyllables, 1 German, 1 Japanese and 3 Taiwanese words (Figs. 3 and 5): the recognition rate is good, but not as good as in the first three tests.

Recognition of a speaker's sentence or name: a database of English and Mandarin sentences and names is established first. The continuous sounds of every sentence or name are all taken from the known English and Mandarin continuous sounds in the feature database. There are 70 English sentences and names, and the 384 Mandarin monosyllables make up 407 Mandarin sentences and names (Fig. 6). The recognition method is as follows:

(a) An unknown sentence or name is cut into D unknown continuous sounds. For every unit time segment the sum of the distances between adjacent signal points is computed; if the sum is too small, the segment is noise or silence. If too many adjacent unit time segments without speech accumulate (more than the time between two syllables of an English continuous sound), the place should be the boundary between two continuous sounds and is cut there. In this way the sentence or name is cut into D unknown continuous sounds. Each continuous sound is converted into an E×P LPCC matrix through the steps of Fig. 2 ((50), (60), (90)). For each unknown continuous sound, the Bayes classification (20) selects the F most similar known continuous sounds in the English and Mandarin feature database (they may include both English and Mandarin). An unknown sentence or name is thus represented by D×F most similar known continuous sounds (110).

(b) In the sentence and name database, the speaker's sentence or name is searched for. Among the 407 Mandarin and 70 English sentences and names, those whose length is approximately equal to that of the speaker's sentence or name (D±1 known continuous sounds) are selected as matching sentences or names.

(c) If a matching sentence or name has exactly the same length as the speaker's sentence or name (D unknown continuous sounds), the D rows of F similar known continuous sounds are compared in order with the D known continuous sounds of the matching sentence or name, to see whether each row of F similar known continuous sounds contains the corresponding known continuous sound of the matching sentence or name. If every row does, D continuous sounds are recognized correctly and the matching sentence or name is the speaker's sentence or name.

(d) If the number of correctly recognized continuous sounds in (c) is not D, or if the matching sentence or name (in the database) has D−1 or D+1 known continuous sounds, the invention screens with a 3×F window. The i-th known continuous sound of the matching sentence or name is compared with the three rows of similar known continuous sounds before and after it in the D×F matrix (rows i−1, i and i+1). The number of known continuous sounds of the matching sentence or name that are found in the D×F matrix in this way, divided by the total number D, gives the probability of that sentence or name. The sentence or name in the database with the highest probability is selected as the speaker's sentence or name (130).

If a sentence or name is recognized incorrectly, one or more of its continuous sounds must be missing from their F similar known continuous sounds. For such a sound, the top N=15 known continuous sounds are found among the (155+384) known continuous sounds, and the weighted average of these N similar continuous sounds and the LPCC of the unknown continuous sound is computed, so that the continuous sound appears among its F similar known continuous sounds.
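The matching of steps (b) to (d) can be sketched as follows. This is a simplified illustration, not the patented procedure: the names `exact_match`, `window_score` and `recognize` are hypothetical, sounds are represented by plain strings, and the score is divided by the length of the candidate sentence (D or D±1), which is one reading of the text.

```python
from typing import Dict, List, Optional, Sequence

Rows = Sequence[Sequence[str]]   # D rows, each holding F similar known sounds


def exact_match(candidate: Sequence[str], rows: Rows) -> bool:
    """Step (c): same length, and row i contains the i-th candidate sound."""
    return len(candidate) == len(rows) and all(
        sound in row for sound, row in zip(candidate, rows))


def window_score(candidate: Sequence[str], rows: Rows) -> float:
    """Step (d): fraction of candidate sounds found in rows i-1, i, i+1."""
    hits = 0
    for i, sound in enumerate(candidate):
        window = rows[max(0, i - 1): i + 2]      # the 3 x F window
        if any(sound in row for row in window):
            hits += 1
    return hits / len(candidate)


def recognize(rows: Rows, sentence_db: Dict[str, List[str]]) -> Optional[str]:
    """Pick the database sentence or name that best matches the D x F matrix."""
    d = len(rows)
    # Step (b): keep only sentences with D-1, D or D+1 known sounds.
    candidates = {name: sounds for name, sounds in sentence_db.items()
                  if sounds and abs(len(sounds) - d) <= 1}
    for name, sounds in candidates.items():
        if exact_match(sounds, rows):
            return name
    if not candidates:
        return None
    return max(candidates, key=lambda name: window_score(candidates[name], rows))
```

The window tolerates a sentence that was cut into one sound too many or too few, because each candidate sound may be found one row earlier or later than its nominal position.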

The invention carries out the following recognition of English and Mandarin sentences and names; almost all of them are recognized correctly:

① Recognition of 70 English sentences and names (very good).
② Recognition of 407 Mandarin sentences and names (very good).
③ Simultaneous recognition of 70 English sentences and names and 407 Mandarin sentences and names (very good).

* A Visual Basic recognition screen is attached (Figs. 7 and 8); it shows the method recognizing English and Mandarin sentences and names at the same time.

[Brief description of the drawings]

Figs. 1 and 2 illustrate the execution procedure of the invention. Fig. 1 shows the flow for establishing three databases: the permanent database of known continuous sounds, the feature database of known continuous sounds, and the sentence and name database. Fig. 2 shows the flow of the method for recognizing an unknown sentence or name.
Fig. 3 shows the recognition of 384 Mandarin monosyllables, 1 German, 1 Japanese and 2 Taiwanese words.
Fig. 4 shows the recognition of 154 English words and 1 German word.
Fig. 5 shows the recognition of 269 Mandarin monosyllables and 3 Taiwanese words.
Fig. 6 shows the sentence and name database, with 70 English sentences and 407 Chinese sentences and names.
Figs. 7 and 8 show Visual Basic recognition screens for the simultaneous recognition of English and Mandarin sentences and names.

[Explanation of main component symbols]

(1) Establish a permanent database of known continuous sounds: a continuous sound or a sentence is pronounced, and a sentence is then divided into several known continuous sounds.
(10) Continuous waveform of a continuous sound
(20) Receiver
(30) Sound wave digital converter
(45) Noise removal
(50) E elastic frames normalize the waveform
(60) Least-squares method computes the linear predictive coding cepstrum (LPCC) vectors
With the Bayes distance (or the absolute distance), for each known continuous sound (permanent database), the N nearest unknown continuous sounds are found in the unknown continuous sound database.
(79) For each known continuous sound (permanent database), the weighted average of the N surrounding unknown continuous sounds and the LPCC of the known continuous sound is the preliminary feature of the known continuous sound, stored in the feature database. Then, in the feature database, N known continuous sounds are found with the Bayes distance and their weighted average with the LPCC of the known continuous sound is computed; this is repeated many times. The final E×P means and variances represent the standard model of the known continuous sound.
(80) The feature database of known continuous sounds contains the standard models, that is, all means and variances.
The continuous sounds of the feature database of known continuous sounds are used to build the database of sentences and names to be recognized.
(2) Input an unknown sentence or name
(11) A set of unknown continuous sound waveforms
(40) Cut a sentence or name into D unknown continuous sounds
(90) E×P linear predictive coding cepstrum (LPCC) matrices represent the classification models of the unknown continuous sounds
(100) The Bayes classification compares every known continuous sound standard model with the unknown classification model
(110) For every unknown continuous sound in a sentence or name, the F most similar known continuous sounds are found; a sentence or name is represented by D×F most similar known continuous sounds
(120) In the sentence and name database, a 3×F window of similar known continuous sounds screens every known continuous sound of all sentences and names
(130) The most probable sentence or name is found in the sentence and name database


Claims

VII. Scope of the patent application:

1. A speech recognition method for all languages without using samples, comprising the steps of:
(1) an unknown continuous sound database (with or without samples);
(2) a permanent database of known continuous sounds, in which each known continuous sound is pronounced once by a speaker with clear, standard pronunciation, or by the tester if the tester's accent is heavy;
(3) a pre-processor that deletes the signal points (sampled points) that carry no speech, or noise;
(4) a method for normalizing the waveform of a continuous sound and extracting its feature: E elastic frames normalize the waveform and convert it into an equal-sized E×P feature matrix of linear predictive coding cepstrum (LPCC) values;
(5) a method for normalizing the waveform of an unknown continuous sound and extracting its feature: the waveform is normalized and converted into a feature matrix that is equal in size to the standard model of a known continuous sound (represented by an E×P matrix of means and variances), contains the linear predictive coding cepstrum (LPCC) values, and is called the classification model of the unknown continuous sound;
(6) a simplified Bayes classification: the classification model of the unknown continuous sound is compared with the standard models of all known continuous sounds in the feature database of known continuous sounds (each represented by an E×P matrix of means and variances), and the known continuous sound with the smallest Bayes distance to the unknown continuous sound is recognized as the unknown continuous sound;
(7) in the permanent database of known continuous sounds, for each known continuous sound, if the unknown continuous sound database has samples, the N nearest unknown continuous sounds are found in it with the Bayes distance, and if the unknown continuous sound database has no samples, the N nearest unknown continuous sounds are found with the absolute distance;
(8) if the unknown continuous sound database has samples, the weighted average of N+1 items, the N means of the N nearest unknown continuous sounds and the linear predictive coding cepstrum (LPCC) of the known continuous sound, is the mean of the known continuous sound, and the weighted average of the N variances of the N unknown continuous sounds is its variance; the E×P matrix of means and variances is called the preliminary feature of the known continuous sound and is placed in the feature database of known continuous sounds;
(9) if the unknown continuous sound database has no samples, the weighted mean and the variance of N+1 numbers, the linear predictive coding cepstrum (LPCC) values of the N unknown continuous sounds and of the known continuous sound, are computed; the E×P matrix of means and variances is called the preliminary feature of the known continuous sound and is stored in the feature database of known continuous sounds;
(10) a method for repeatedly computing and stabilizing the feature of every known continuous sound, so that every known continuous sound in the feature database has a mutually stable feature (represented by an E×P matrix of means and variances), called the standard model of the known continuous sound, which is placed in the feature database of known continuous sounds;
(11) a method for cutting an unknown sentence or name into D unknown continuous sounds;
(12) a simplified Bayes classification that, for each of the D unknown continuous sounds, selects the F most similar known continuous sounds in the feature database of known continuous sounds, so that an unknown sentence or name is represented by a D×F matrix of known continuous sounds;
(13) with the D×F known similar continuous sounds, all sentences and names in the sentence and name database are compared, and the most probable known sentence or name is found;
(14) a method for correcting the feature of a continuous sound, so that the speaker's sentence or name is recognized successfully.

2. The speech recognition method for all languages without using samples according to claim 1, wherein step (3), deleting the waveform without speech or the noise, comprises:
(a) within a small time segment of signal points, the variance of the signal points and the variance of general noise are computed, and if the variance of the signal points is smaller than the variance of the noise, the segment is deleted;
(b) within a small time segment of signal points, the sum of the distances between adjacent signal points and the same sum for general noise are computed, and if the former is smaller than the latter, the segment is deleted.

3. The speech recognition method for all languages without using samples according to claim 1, wherein step (4) comprises a method for normalizing the waveform of a continuous sound and extracting an equal-sized feature matrix, with the following steps:
(a) a method for dividing the signal points of the waveform of a continuous sound into equal parts: so that a regression model that varies linearly can closely estimate the nonlinearly varying waveform, the full length of the waveform is divided into E equal time segments, each of which forms an elastic frame; a continuous sound has E elastic frames of equal length, without filter and without overlap, which stretch freely to cover the full length of the waveform and are not fixed-length Hamming windows;
(b) within each frame, a regression model that varies linearly with time estimates the waveform that varies nonlinearly with time;
(c) with Durbin's recursion

$$R(i) = \sum_{n=0}^{N-i} S(n) S(n+i), \quad i \ge 0$$

$$E_0 = R(0)$$

$$k_i = \Big[ R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j) \Big] \Big/ E_{i-1}$$

$$a_i^{(i)} = k_i$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i \, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$

$$E_i = (1 - k_i^2) E_{i-1}$$

$$a_j = a_j^{(P)}, \quad 1 \le j \le P$$

the least-squares estimates $a_j$, $1 \le j \le P$, of the regression coefficients are obtained, called the linear predictive coding (LPC) vector, and with

$$a'_i = a_i + \sum_{j=1}^{i-1} \frac{j}{i} \, a_{i-j} \, a'_j, \quad 1 \le i \le P$$

$$a'_i = \sum_{j=i-P}^{i-1} \frac{j}{i} \, a_{i-j} \, a'_j, \quad P < i$$

the LPC vector is converted into the more stable linear predictive coding cepstrum (LPCC) vector $a'_i$, $1 \le i \le P$;
(d) E linear predictive coding cepstrum (LPCC) vectors (an E×P LPCC matrix) represent a continuous sound.

4. The speech recognition method for all languages without using samples according to claim 1, wherein step (5) further comprises a method for normalizing the waveform of an unknown continuous sound and extracting its feature, with the following steps:
(a) the waveform of the unknown continuous sound is divided into E equal time segments, each of which forms an elastic frame; an unknown continuous sound has E elastic frames of equal length, without filter and without overlap, which stretch freely to cover all signal points;
(b) within each elastic frame, a regression model that varies linearly with time estimates the waveform that varies nonlinearly with time;
(c) with Durbin's recursion

$$R(i) = \sum_{n=0}^{N-i} S(n) S(n+i), \quad i \ge 0$$

$$E_0 = R(0)$$

$$k_i = \Big[ R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j) \Big] \Big/ E_{i-1}$$

$$a_i^{(i)} = k_i$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i \, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$$

$$E_i = (1 - k_i^2) E_{i-1}$$

$$a_j = a_j^{(P)}, \quad 1 \le j \le P$$

the least-squares estimates $a_j$, $1 \le j \le P$, of the regression coefficients are computed, called the linear predictive coding (LPC) vector;
(d) the LPC vector is converted into the more stable linear predictive coding cepstrum (LPCC) vector $a'_i$, $1 \le i \le P$, with

$$a'_i = a_i + \sum_{j=1}^{i-1} \frac{j}{i} \, a_{i-j} \, a'_j, \quad 1 \le i \le P$$

$$a'_i = \sum_{j=i-P}^{i-1} \frac{j}{i} \, a_{i-j} \, a'_j, \quad P < i$$

(e) E linear predictive coding cepstrum (LPCC) vectors (an E×P LPCC matrix) serve as the classification model of the unknown continuous sound.

The financial law's step (6) contains a simple Bayesian (Bay which identifies the unknown continuous sound method, the steps are as follows: (4) - The characteristics of an unknown continuous sound are classification models, using a cadence matrix 5 7 -1, ·., £ 5 ^=1 d The main one ^L· ,.·.,ρ, the table is not, in order to quickly identify, the office has an LPCC and tampers the independent random variables, there is a normal allocation, if the continuous continuous sound and A known number of filaments 9 c, ι·-1'.··'% (the calendar is the total number of known continuous sounds) a comparison of the average number and the number of variances (/w4) Knowing the continuous (quad) mean value and the estimate of the variance, the conditional density function of Lai X

Π μ 4ΐπσίΗ Knowing the classification pattern of continuous sounds _ line (4) Estimating the cepstral variation lemma 4) Using the average number in the known continuous phonological model and the _ Yi Bei's classification to materialize the continuous sound characteristics and find the continuous sound ^ Most like this is the unknown continuous standing, the male 曰 X known continuous sound to the unknown even 34 201106340 Continuation χ similarity in the following formula Zhong Zhong Ci) indicates [Υ42πσΰ (\ , C) for rapid identification, with logarithm Simplify the conditional density function (b) in (b), and delete the constants that do not need to be calculated. The Bayesian distance is also called Bayesian classification. %)); ^ 2 n %

(4) Calculate the Bayer distance (10) value in (c) for each known continuous tone e; ' f = 1,%; (6) Select - a known continuous sum in the feature database, its shell distance % ;) The value 疋 is the smallest and is judged as the unknown continuous sound. 6. According to the application of patent scope i, a non-sample can identify the recognition method of each language's step (10) contains - money to calculate and stabilize all known continuous tone features of the library: ω if known in continuous The sound feature #料库 has a known continuous sound, which is not the smallest in the characteristic data 2 of the same known continuous sound in the permanent database of known continuous sounds, and the mine (four) distance is shouted in the special shoe library. The linear predictive coding cepstrum of the continuous sound is also known as the permanent data (10) 〇 the most recent known continuous performance sound; (b) the estimated coded cepstrum of the most recent continuous sound (LPCC) N+1 N The average value and the linear pre-weighted average of the known continuous sound are the new average of 35 201106340 of the known continuous sound, and the weighted average of the N variances of the (four) most recent consecutive sounds is the new variance. This new feature of the new mean and new variability is placed in the feature database; (7) Repeating the butterfly a) to (6) multiple times, the last new feature, (4) the mean and the matrix of the variance 'called the known continuous sound Registration model; 'U secret Chou material library encoding a known side continuous tone spectrum (Cong) unchanged.

7. The speech recognition method for all languages without using samples according to claim 1, wherein step (11) includes a method of cutting an unknown sentence or name into D unknown continuous sounds: (a) in each unit time period, the sum of the distances between every two adjacent signal points is calculated; if the sum is too small, the period is silence or noise and contains no speech signal; (b) if the adjacent unit periods of silence or noise accumulate for too long (longer than the time between two syllables within a continuous sound), that point is taken as the boundary between two continuous sounds and the signal is cut there, so that the unknown sentence or name is cut into D unknown continuous sounds; (c) silence and noise are removed from each continuous sound, the sound is normalized by the elastic frames, and the least squares method is used to calculate the linear predictive coding cepstrum (LPCC) vectors; the unknown sentence or name is thus represented by D E×P matrices of LPCC.

8. The speech recognition method for all languages without using samples according to claim 1, wherein step (12) further includes: (a) after an unknown sentence or name has been cut into D unknown continuous sounds, the Bayesian classifier compares each unknown continuous sound $\{X_{jl}\}$ with each known continuous sound $c_i$ in the feature database using the Bayesian distance

$$l(c_i) = \sum_{j,l}\ln(\sigma_{ijl}) + \frac{1}{2}\sum_{j,l}\left(\frac{x_{jl}-\mu_{ijl}}{\sigma_{ijl}}\right)^{2}$$

and finds the F nearest known continuous sounds (the F sounds may include several languages at the same time); the unknown continuous sound is represented by these F similar known continuous sounds; (b) an unknown sentence or name is therefore represented by D rows of F similar known continuous sounds, that is, by a D×F matrix of similar known continuous sounds, and the probability that this D×F matrix contains the known continuous sounds of the sentence or name is very high.
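The cutting procedure in claim 7, steps (a) and (b), can be sketched as below. The sketch makes several assumptions that are not in the claims: the function name `split_sounds`, the parameters `period` (samples per unit time period), `threshold` (activity below which a period counts as silence or noise) and `min_gap` (number of consecutive silent periods that marks a boundary), and the choice to ignore trailing samples that do not fill a whole period.

```python
def split_sounds(signal, period=4, threshold=1.0, min_gap=2):
    """Cut a sampled signal into continuous sounds.

    Returns a list of (start, end) sample index pairs, end exclusive.
    """
    n_periods = len(signal) // period  # trailing partial period is ignored
    silent = []
    for k in range(n_periods):
        chunk = signal[k * period:(k + 1) * period]
        # Sum of distances between every two adjacent signal points.
        activity = sum(abs(b - a) for a, b in zip(chunk, chunk[1:]))
        silent.append(activity < threshold)

    segments = []
    start = None  # first period of the sound currently being collected
    gap = 0       # consecutive silent periods seen inside that sound
    for k, is_silent in enumerate(silent):
        if not is_silent:
            if start is None:
                start = k
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Silence lasted too long: close the sound before the gap.
                segments.append((start * period, (k - gap + 1) * period))
                start, gap = None, 0
    if start is not None:
        segments.append((start * period, (n_periods - gap) * period))
    return segments
```

A pause shorter than `min_gap` periods stays inside one continuous sound, which mirrors the claim's condition that only a silence longer than the time between two syllables counts as a boundary.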
9. The speech recognition method for all languages without using samples according to claim 1, wherein step (13) further includes the following method of recognizing sentences and names: (a) in the sentence and name database, a matching sentence or name is selected whose length is approximately equal to that of the speaker's sentence or name (that is, a sentence or name with D ± 1 known continuous sounds); (b) if the selected matching sentence or name has exactly the same length as the speaker's sentence or name (D unknown continuous sounds), the D rows of F similar known continuous sounds are compared in order with the D known continuous sounds of the matching sentence or name, to see whether each row of F similar known continuous sounds contains the corresponding known continuous sound of the matching sentence or name; if every row contains, in order, the known continuous sound of the matching sentence or name, all D unknown continuous sounds are recognized correctly and the matching sentence or name is the speaker's sentence or name; (c) if the matching sentence or name has D known continuous sounds but the speaker's D continuous sounds are not all recognized correctly (some are not among their F similar known continuous sounds), or the matching sentence or name does not have length D, the invention screens with a 3×F window: each known continuous sound of the matching sentence or name is compared in order with the F similar known continuous sounds in three rows of the D×F matrix (the preceding row, the corresponding row and the following row); this is done for every sentence or name in the sentence and name database that has D or D ± 1 known continuous sounds, and the most probable matching sentence or name in the database is selected as the speaker's sentence or name; the probability is the number of known continuous sounds of the sentence or name that fall within their 3×F windows divided by its full length (D or D ± 1).
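The matching and 3×F window screening of claim 9 can be sketched as follows. This is a hedged illustration under assumptions: the names `window_score` and `recognize_sentence` are hypothetical, sounds are represented by string labels, `rows` stands for the D×F matrix of similar known sounds from claim 8, and ties between equally probable sentences are broken by database order, which the claim does not specify.

```python
def window_score(rows, sentence):
    """Fraction of the sentence's known sounds found in their 3 x F windows."""
    hits = 0
    for i, sound in enumerate(sentence):
        # Rows i-1, i and i+1 of the D x F matrix (fewer at the edges).
        window = rows[max(0, i - 1):i + 2]
        if any(sound in row for row in window):
            hits += 1
    return hits / len(sentence)


def recognize_sentence(rows, sentences):
    """Pick the database sentence that best matches the D x F matrix."""
    d = len(rows)
    # Step (a): keep sentences with D - 1, D or D + 1 known sounds.
    candidates = [s for s in sentences if abs(len(s) - d) <= 1]
    # Step (b): an exact in-order match of all D sounds wins immediately.
    for s in candidates:
        if len(s) == d and all(sound in row for sound, row in zip(s, rows)):
            return s
    # Step (c): otherwise screen with the 3 x F window and take the best.
    return max(candidates, key=lambda s: window_score(rows, s), default=None)
```

The function returns `None` when no sentence in the database has a suitable length.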
10. The speech recognition method for all languages without using samples according to claim 1, wherein step (14) further includes a method of correcting the features of continuous sounds so that a sentence or name is recognized correctly: (a) if a sentence or name is recognized incorrectly, one or more of its D unknown continuous sounds must be absent from their F similar known continuous sounds; let c denote an unknown continuous sound that is not among its F similar known continuous sounds; the N most similar known continuous sounds are found, with features $\{\mu_{ijl}, \sigma_{ijl}^{2}\}$, $i = 1, \ldots, N$, $j = 1, \ldots, E$, $l = 1, \ldots, P$, and the average (or an average weighted by order) of these N features is taken; this average represents the new feature of the unknown continuous sound c; (b) in item (a), the weighted average of N+1 items, namely the linear predictive coding cepstrum (LPCC) of the tester's pronunciation and the N means of the N most similar known continuous sounds, is the new mean of the unknown continuous sound, and the weighted average of the N variances of the N most similar known continuous sounds is the new variance of the unknown continuous sound; the E×P matrices of this new mean and new variance represent the new standard model of the unknown continuous sound; (c) the unknown sentence or name is tested again and should be recognized successfully.
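The N+1 weighted averaging in step (b) of claim 10 (claim 6, step (b), has the same form) can be sketched as below. The names `weighted_average` and `update_feature` are hypothetical, and the default of equal weights is an assumption; the claims say only that the averages are weighted.

```python
def weighted_average(matrices, weights):
    """Element-wise weighted average of equally sized E x P matrices."""
    total = sum(weights)
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * m[j][l] for m, w in zip(matrices, weights)) / total
         for l in range(n_cols)]
        for j in range(n_rows)
    ]


def update_feature(own_lpcc, neighbor_means, neighbor_vars, weights=None):
    """Return (new_mean, new_var) for one sound.

    own_lpcc: LPCC matrix of the pronunciation being corrected.
    neighbor_means, neighbor_vars: matrices of the N most similar known sounds.
    weights: N + 1 weights, own_lpcc first (assumed equal if omitted).
    """
    n = len(neighbor_means)
    if weights is None:
        weights = [1.0] * (n + 1)  # assumption: equal weighting
    # New mean: weighted average of the N + 1 LPCC matrices.
    new_mean = weighted_average([own_lpcc] + neighbor_means, weights)
    # New variance: weighted average of the N neighbor variances only.
    new_var = weighted_average(neighbor_vars, weights[1:])
    return new_mean, new_var
```

Passing weights that decrease with rank would give the "weighted by order" variant mentioned in step (a).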
