TW201106340A - A speech recognition method for all languages without using samples - Google Patents

A speech recognition method for all languages without using samples

Info

Publication number
TW201106340A
Authority
TW
Taiwan
Prior art keywords
continuous
sound
unknown
sentence
name
Prior art date
Application number
TW98126015A
Other languages
Chinese (zh)
Other versions
TWI395200B (en)
Inventor
Tze-Fen Li
Lee Tai-Jan Li
Shih-Tzung Li
Shih-Hon Li
Li-Chuan Liao
Original Assignee
Tze-Fen Li
Lee Tai-Jan Li
Shih-Tzung Li
Shih-Hon Li
Li-Chuan Liao
Priority date
Filing date
Publication date
Application filed by Tze-Fen Li, Lee Tai-Jan Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
Priority to TW98126015A (patent TWI395200B)
Publication of TW201106340A
Application granted
Publication of TWI395200B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention can recognize several languages at the same time without using samples. The key technique is that the features of known words in any language are derived from unknown words or continuous sounds. The unknown words, each represented by a matrix, are scattered in a 144-dimensional space, and the feature of a known word of any language, also represented by a matrix, is simulated from the surrounding unknown words. The invention uses 12 elastic frames of equal length, without filter and without overlap, to normalize the variable-length waveform of a word of one or more syllables into a 12x12 matrix that serves as the feature of the word. The invention can further improve the features so that an unknown sentence is recognized correctly. It can correctly recognize any language without samples, such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese and Taiwanese.
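
The normalization described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes, as the description states, that each elastic frame is summarized by least-squares linear-prediction coefficients obtained with Durbin's recursion, and it stops at the LPC coefficients (the further conversion to cepstra, LPCC, is omitted). The function names `elastic_frames`, `lpc_durbin` and `feature_matrix` are hypothetical, and `np.array_split` only makes the frame lengths equal to within one sample.

```python
import numpy as np

def elastic_frames(signal, num_frames=12):
    """Cut a 1-D signal into num_frames contiguous, non-overlapping frames.

    np.array_split keeps the frame lengths within one sample of each other,
    so the frames stretch or shrink with the length of the utterance.
    """
    return np.array_split(np.asarray(signal, dtype=float), num_frames)

def lpc_durbin(frame, order=12):
    """Linear-prediction coefficients a_1..a_order of one frame, computed
    from the autocorrelation R(i) by the Levinson-Durbin recursion."""
    n = len(frame)
    if n <= order:
        raise ValueError("frame must be longer than the prediction order")
    # R(i) = sum over n of S(n) * S(n + i)
    r = np.array([np.dot(frame[:n - i], frame[i:]) for i in range(order + 1)])
    a = np.zeros(order + 1)          # a[j] holds a_j; a[0] is unused
    err = r[0]                       # E_0 = R(0)
    for i in range(1, order + 1):
        if err <= 0.0:               # silent or perfectly predictable frame
            break
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        updated = a.copy()
        updated[i] = k
        updated[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = updated
        err *= 1.0 - k * k           # E_i = (1 - k_i^2) E_{i-1}
    return a[1:]

def feature_matrix(signal, num_frames=12, order=12):
    """num_frames x order matrix: one row of LPC coefficients per frame."""
    frames = elastic_frames(signal, num_frames)
    return np.vstack([lpc_durbin(f, order) for f in frames])
```

Whatever the length of the input (as long as every frame has more than `order` samples), `feature_matrix` returns a 12x12 array, which is the point of the elastic frames: short and long utterances of a word end up as matrices of the same size.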

Description

Description of the invention

[Technical Field]

A continuous sound contains one or more syllables (single sounds). The invention can recognize all languages without using samples of the continuous sounds.

The invention uses 12 elastic frames (windows) of equal length, without filter and without overlap, to convert the waveform of a continuous sound of any length into a 12x12 matrix of linear predictive coding cepstra (LPCC); an unknown continuous sound is thus represented by a 12x12 LPCC matrix. A 12x12 matrix is regarded as a vector in a 144-dimensional space, and the vectors of many unknown continuous sounds are scattered in that space. When a speaker utters a known continuous sound, the feature of that known sound is simulated and computed from the features (LPCC) of the surrounding unknown continuous sounds.

The invention comprises: elastic frames that normalize the waveform of a continuous sound; a Bayesian classifier that finds, in the database, a known continuous sound for the speaker's unknown continuous sound; the segmentation of a speaker's unknown sentence into D unknown continuous sounds; and a window that screens the known sentences to select one as the speaker's unknown sentence.

[Prior Art]

When a continuous sound is uttered, its pronunciation is represented by a sound wave. The sound wave is a system that varies nonlinearly with time, and the waveform of a continuous sound contains a dynamic feature that also varies nonlinearly and continuously with time. Two utterances of the same continuous sound carry the same series of dynamic features, which stretch and contract nonlinearly with time: the features occur in the same order but at different times. It is therefore very difficult to align the same dynamic features of the same continuous sound at the same time positions. Because there are also many similar continuous sounds, recognition is harder still.

A computerized speech recognition system must first extract the language-related information in the sound wave, that is, the dynamic features, and filter out noise that is unrelated to language, such as the speaker's timbre and pitch and the psychological, physiological and emotional state while speaking. It must then align the same features of the same continuous sound at the same time positions. This series of features is represented by a sequence of feature vectors of equal length, called the feature model of a continuous sound. For current speech recognition systems, producing feature models of uniform size is too complicated and time-consuming, because the same features of the same continuous sound are hard to align at the same time positions, especially in English, which makes matching and recognition difficult.
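
The idea stated in the technical-field paragraph above, that the feature of a known sound is simulated from the unknown sounds surrounding it in the 144-dimensional space, can be sketched as follows. This is only an illustration under assumptions: it follows the no-sample case of the description (nearest neighbours by minimum absolute distance, mean and variance taken over the known sound's LPCC matrix and its N neighbours), uses equal weights where the source speaks of a weighted average without giving the weights, and the name `simulate_known_feature` and the default `n_neighbors=5` are hypothetical.

```python
import numpy as np

def simulate_known_feature(known_lpcc, unknown_lpccs, n_neighbors=5):
    """Estimate the mean and variance matrices of a known sound from one
    utterance of it and the N nearest unknown sounds.

    known_lpcc    : (E, P) LPCC matrix of the single known utterance
    unknown_lpccs : (M, E, P) LPCC matrices of M unknown sounds
    """
    x = np.asarray(known_lpcc, dtype=float)
    pool = np.asarray(unknown_lpccs, dtype=float)
    # Absolute (L1) distance between the known sound and every unknown sound.
    dists = np.abs(pool - x).sum(axis=(1, 2))
    nearest = np.argsort(dists, kind="stable")[:n_neighbors]
    # N + 1 matrices: the known utterance plus its N nearest neighbours.
    stack = np.concatenate([x[None], pool[nearest]], axis=0)
    mean = stack.mean(axis=0)   # equal weights: an assumption of this sketch
    var = stack.var(axis=0)
    return mean, var, nearest
```

With 12x12 inputs, `mean` and `var` are 12x12 matrices that play the role of the preliminary feature of the known sound; no repeated samples of the known sound itself are needed, only the pool of unknown sounds.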
- General sentence or name recognition method - even red main work: unknown sentence or name cut into D unknown continuous sounds, extracted features, feature normalization (feature model size - and the same feature arrangement of the same continuous sound At the same time position), unknown continuous sounds, and find a suitable sentence or name in the sentence or name of the bribe. - A continuous sound wave feature is commonly used to shun kind of energy · bribe gy, zero crossing points (zero. Secretary), the extreme number (mail · c shout), peak (f〇rmants) 'linear prediction The coded cepstrum (10) and the Mel frequency cepstrum (MFCC) are most effective in linearly difficult coding _哉 (Lpcc) and Mel's frequency (10) C). The mis-predictive coding cepstrum (lpct) is the most stable and accurate linguistic feature of a continuous tone. Its secretive regression mode represents continuous sound waves, and the small square estimation method calculates Jingsong. _ (4), the gorge is the linear pre-201106340 estimated coded cepstrum (LPCC), while the Mel frequency cepstrum (MFCC) converts the sound wave into a frequency using the Fourier transform method, and then estimates the auditory system according to the ratio of the Mel frequency. SB Davis and P. Mermelstein published in 1980 in IEEE Transactions on Acoustics, Speech Signal Processing, Vol. 28, No. 4, "Comparison of parametric representations for monosyllabic word recognition in the continuous sentences" using dynamic time warping (DTW) The 'Mel frequency cepstrum (MFCC) feature is higher than the linear predictive coding cepstrum (LPCC) feature recognition rate. However, after repeated speech recognition experiments (including my invention), using Bayesian classification, linear pre- The estimated coded cepstrum (LPCC) feature recognition rate is higher than the Mel frequency cepstrum (MFCC) feature and saves time. Identify the 'Wei Xiang method lion. 
There are secret time-warping (vector), vector quantization (vect〇rquantizati〇n) and hidden Markov model (method). If the same pronunciation is in time There is a difference in the change, one side compares the same feature to the same time position. The recognition rate will be very good, but the same feature is pulled to the same position and the time is difficult to report and the distortion time is too long to be applied. Vector quantization method, such as identifying a large number of Continuous sound is not only inaccurate, but also time consuming. Recently, the hidden Markov model method is good, but the method is complicated. 'Taixiang unknown parameters are needed, and the calculation seems to be time-consuming. Recently TF Li (Li Zifen) was published in Pattern ReCognition in 2 years. , 201106340 vo 1· 36 Published in the paper Speech recognition of mandarin monosyllables using Bayesian classification, with the same database, a series of LPCC vectors of various lengths and compressions into the same size classification model, the recognition result ratio γ. κ.

Chen,C.Y.Liu,G.H. Chiang,Μ·Τ. Lin 於 1990 年出版在Chen, C.Y. Liu, G.H. Chiang, Μ·Τ. Lin published in 1990

Proceedings of Telec〇mmUnicati〇n Symp〇sium,加酬發 表的論文 The rec〇gnition of mandarin m〇n〇syllables based on伽discrete hidden Markov model中用隱藏式馬可夫模 式法HMM方法要好。但壓縮過程複雜費時,且相同連續音很難 將相同特徵壓、___位置,對於相似連續音,很難辨認。 本發明語切财法觸±频點,從學財面,根據音 波有-種語音特徵,隨時間作非線性變化,自然導出一套抽取 語音特徵方法。將—猶續音音波先正常倾轉換成-個足以 代表該連續音的大小轉特徵模型,並且相 徵模型内相同時間位置有相 、在匕們特 择日㈣⑽士不需要人域實驗調節本 音特徵資料庫内已知連續音標準模型比 明語音辨認方尋找相陶徵來崎。所以本發 【發明内容】‘速元成特徵抽取,特徵正常化及辨認。 ⑴本&明㈣要的目的是帛乡 擬及計算任何1語謂任何—個 ^以特徵來模 發明可以不用樣本,就可以建立任何的特徵,因此本 種-s的任何一個連續 201106340 音的特徵,即本發明不用樣本也能正確辨認各種語言。詳細地 5兒,本發明對任何一種語言的任何—個已知連續音,用貝氏距 離在144度空間個未知連續音矩陣來模擬及計算該已知連 續音,以達到不用已知連續音的樣本,仍能夠建立任何已知連 續音的特徵。因此可以辨認任何語言。 ⑵本發明提供一種語言辨認綠。它能將不具語言音波刪 ⑶本發明提供一種連續音音波正常化及抽取特徵方法。它使Proceedings of Telec〇mmUnicati〇n Symp〇sium, the paper of the remuneration of the paper The rec〇gnition of mandarin m〇n〇syllables based on the hidden discrete hidden Markov model with the hidden Markov model HMM method is better. However, the compression process is complicated and time consuming, and it is difficult for the same continuous tone to press the same feature and the ___ position, which is difficult to recognize for similar continuous sounds. The speech-cutting financial method of the present invention touches the frequency point, and from the learning plane, according to the sound characteristics of the sound wave, the nonlinear change with time, naturally derives a set of methods for extracting the voice feature. The gradual normalization of the syllable wave is converted into a size-transformation model sufficient to represent the continuous sound, and the phase of the same time position in the symmetry model has a phase, and in our special selection day (4) (10), the human domain experiment is not required to adjust the sound. The known continuous sound standard model in the feature database is better than the clear speech recognition party. Therefore, the present invention [Summary] ‘speed element into feature extraction, feature normalization and recognition. 
(1) The purpose of this & Ming (4) is to calculate and calculate any of the words in the township. Any feature can be used to model the invention. Any feature can be created without the sample, so any one of the -s- consecutive 201106340 sounds The feature that the present invention can correctly recognize various languages without using samples. In detail, the present invention simulates and calculates the known continuous tone for any known continuous tone of any language using a Bayesian distance of 144 degrees of spatially unknown continuous tone matrix to achieve no known continuous tone. The sample is still able to establish the characteristics of any known continuous sound. Therefore, any language can be identified. (2) The present invention provides a language recognition green. It can delete non-verbal sound waves. (3) The present invention provides a method for normalizing and extracting continuous sound waves. it makes

•…a〜乃风依主佑邗冏吋間位置上有相同特 *。可以及日请認’達到電腦即時辨認效果。•...a~ is the same as the wind in the main 邗冏吋 邗冏吋 position. Please recognize the day and reach the computer's instant recognition effect.

(6) The invention uses all the signal points of the waveform that carry speech. It uses a small number (E = 12) of non-overlapping elastic frames to cover the features of all signal points. A continuous sound is not discarded because its waveform is too short, and signal points are not deleted or compressed because the waveform is too long. As long as human hearing can distinguish the continuous sound, the invention can extract its features. The method therefore uses every signal point that carries speech and extracts as much of the speech feature as possible. Because the E = 12 elastic frames do not overlap and are few in number, the time needed for feature extraction and for computing the linear predictive coding cepstra (LPCC) is greatly reduced.

(7) The method can recognize continuous sounds that are spoken too fast or too slowly. When speech is too fast, the waveform of a continuous sound is short; the elastic frames of the invention shrink, and the same number E of equal-length elastic frames still covers the short waveform, producing E LPCC vectors. As long as a human can distinguish the short sound, these E LPCC vectors effectively represent its feature model. When speech is too slow, the waveform is longer and the elastic frames lengthen; the E LPCC vectors produced also effectively represent the long sound.

(8) The invention provides a method for stabilizing and adjusting the features of all known continuous sounds in the database, so that in the 144-dimensional space the feature of each continuous sound occupies its own position and region and recognition is correct.

(9) To recognize a sentence or name, the unknown sentence or name is first segmented into D unknown continuous sounds. For each unknown continuous sound, the invention uses the Bayesian method to select the F most similar known continuous sounds from the continuous-sound feature database, so that a sentence is represented by DxF known continuous sounds. Because segmentation is difficult, the sentence may be cut into more or fewer unknown continuous sounds than it really contains. The invention therefore compares each known continuous sound of a candidate sentence or name with the three rows of F similar known continuous sounds around the corresponding unknown continuous sound (the row before, the row itself and the row after). That is, for every sentence or name in the sentence-and-name database, a 3xF window of similar known continuous sounds is used to screen each known continuous sound, and the most probable sentence or name is then chosen from the database. The method is simple and the success rate is high (tested on 70 English sentences and names and 407 Mandarin sentences and names).
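
The 3xF window screening of item (9) can be sketched as follows. This is a minimal illustration under assumptions rather than the patented procedure: the i-th known sound of a database sentence is aligned with row i of the DxF candidate matrix, a sentence is scored by the fraction of its sounds found in their windows (the description defines the probability this way), and the names `window_score` and `recognize` and the string labels are hypothetical. The description's suggestion to compare only sentences of about the same length as the unknown one is not implemented here.

```python
def window_score(sentence, candidates):
    """Fraction of a sentence's known sounds found in their 3xF windows.

    sentence   : non-empty list of known-sound labels of one database entry
    candidates : D rows, each holding the F known sounds most similar to one
                 unknown sound of the spoken sentence
    """
    hits = 0
    for i, sound in enumerate(sentence):
        # Rows i-1, i and i+1; at the ends the window is clipped to two rows.
        window = candidates[max(0, i - 1):i + 2]
        if any(sound in row for row in window):
            hits += 1
    return hits / len(sentence)

def recognize(candidates, database):
    """Return the database sentence or name with the highest window score."""
    return max(database, key=lambda s: window_score(s, candidates))
```

Because each sound is looked up in the rows before and after its own position, a sentence can still score 1.0 when segmentation has produced one row too many or too few near that sound, which is the situation the window is meant to tolerate.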

(10) The invention provides two techniques for correcting the features of continuous sounds so that unknown continuous sounds and unknown sentences or names are recognized successfully.

(11) The invention treats a Mandarin monosyllable as a continuous sound with only one syllable. The features of Chinese and of foreign-language sounds are all represented by matrices of the same size, so the invention can recognize various languages at the same time.

[Embodiments]

Figures 1 and 2 illustrate the procedure of the invention. Figure 1 shows how the permanent database of known continuous sounds, the feature database of known continuous sounds and the sentence-and-name database are built. A known continuous sound 1 enters the receiver 20 in the form of a continuous sound wave 10. A digital converter 30 converts the continuous sound wave into a sequence of digital signal points. The preprocessor 45 deletes non-speech in one of two ways: (1) it computes the variance of the signal points in a short time segment and the variance of ordinary noise; if the former is smaller than the latter, the segment carries no speech and is deleted. (2) It computes the sum of the distances between consecutive signal points in a short time segment and the corresponding sum for ordinary noise; if the former is smaller than the latter, the segment carries no speech and is deleted. After preprocessing 45, a sequence of signal points carrying the known continuous sound is obtained.

The waveform is first normalized and the features are then extracted. All signal points of the known continuous sound are divided into E equal time segments, each forming a frame. A continuous sound has E equal-length frames 50 in total, without filter and without overlap; according to the total number of signal points of the continuous sound, the frame length is adjusted freely so that the E frames cover all signal points. The frames are therefore called elastic frames: their length stretches and shrinks freely, but the E elastic frames of one sound have the same length. This differs from the Hamming window, which has a filter, overlaps by half, has a fixed length and cannot adjust freely to the length of the waveform. Because the waveform of a continuous sound varies nonlinearly with time, it contains a speech dynamic feature that also varies nonlinearly with time. Because the frames do not overlap, the invention uses fewer (E = 12) elastic frames to cover the whole waveform of the continuous sound. Because a signal point can be estimated from the preceding signal points, a regression model that varies linearly with time is used to closely estimate the nonlinearly varying waveform, and the unknown regression coefficients are estimated by least squares. Each frame yields one set of least-squares estimates of the unknown coefficients, called the linear predictive coding (LPC) vector. The LPC vector is then converted into the more stable linear predictive coding cepstrum (LPCC) vector. The waveform of a continuous sound contains a sequence of speech dynamic features that vary nonlinearly with time; in the invention it is converted into E LPCC vectors of equal size 60.

To extract the feature of a known continuous sound, a permanent database of known continuous sounds is prepared first. Each known continuous sound is pronounced once by a speaker with clear, standard pronunciation. If the speech of a person with a heavy or non-standard accent is to be recognized, that person pronounces the sounds instead. Each utterance is converted into an ExP LPCC matrix and stored in the permanent database of known continuous sounds.

To extract the feature of a known continuous sound in the permanent database, a database of unknown continuous sounds is also prepared. This database is of one of two kinds: either each unknown continuous sound has samples, or it has none. If there are samples, the sample mean and variance of each unknown continuous sound are computed. In the database of unknown continuous sounds with samples, the Bayesian distance is used to find the N nearest unknown continuous sounds around the known continuous sound. The weighted average of N+1 values, namely the N means of the unknown continuous sounds and the LPCC of the known continuous sound, is taken as the mean of the known continuous sound, and the weighted average of the N variances of the N unknown continuous sounds is taken as its variance. This ExP matrix of means and variances is the preliminary feature 79 of the known continuous sound and is stored in the continuous-sound feature database. If the unknown-sound database has no samples, the minimum absolute distance is used to find N unknown continuous sounds around the known continuous sound; the LPCC of the known continuous sound and of the N unknown continuous sounds give (N+1) numbers. The weighted average of the (N+1) numbers is taken as the mean of the known continuous sound and the variance of the (N+1) numbers as its variance. This ExP matrix of means and variances represents the preliminary feature of the known continuous sound and is stored in the feature database 79.

In the feature database, if the Bayesian distance between the mean of a known continuous sound and the LPCC of the same known continuous sound in the permanent database is not the smallest in the feature database, the Bayesian distance is used to find, in the feature database, the N known continuous sounds whose Bayesian distances to the LPCC of that known continuous sound are the N smallest. The weighted average of the N means of these N known continuous sounds and the LPCC of the known continuous sound is taken as the new mean of the known continuous sound, and the weighted average of the N variances of the N known continuous sounds as its new variance. This computation of new means and variances is repeated several times for every known continuous sound in the feature database. The final ExP matrix of new means and variances is called the standard model of the known continuous sound and is stored in the feature database 80. The known continuous sounds of the feature database are then used to build the sentence-and-name database 85.

Figure 2 shows the flow for recognizing an unknown sentence or name. An unknown sentence or name 2 is input to the speech recognition method of the invention; it enters the receiver 20 as a set of unknown continuous sound waves 11, and the digital converter 30 converts it into a series of signal points. The waveform of the unknown sentence or name is cut into the waveforms of D unknown continuous sounds 40, and the preprocessing 45 of Figure 1 deletes the parts of the waveform that carry no speech. The waveform of each unknown continuous sound is then normalized and its features extracted: all speech-bearing signal points of each unknown continuous sound are divided into E equal time segments, each forming an elastic frame 50. Each continuous sound has E elastic frames in total, without filter and without overlap, stretching freely to cover all signal points. Within each frame, because a signal point can be estimated linearly from the preceding signal points, least squares is used to estimate the unknown regression coefficients. The set of least-squares estimates produced in each frame is called the linear predictive coding (LPC) vector; the LPC vector has a normal distribution, and it is converted into the more stable LPCC vector 60. The E LPCC vectors of an unknown continuous sound are called its classification model, which has the same size as the standard model of a known continuous sound. A sentence has D classification models in total, representing D unknown continuous sounds 90. If a known continuous sound is this unknown continuous sound, the means of its standard model are closest to the LPCC of the unknown sound's classification model. The simplified Bayesian method of the invention therefore compares the classification model of the unknown continuous sound with the standard model of every known continuous sound in the feature database 80 (100). If a known continuous sound is the unknown continuous sound, then, to save computation time, the LPCC in the classification model of the unknown continuous sound are assumed to have independent normal distributions whose means and variances are estimated by the means and variances in the standard model of the known continuous sound. The simplified Bayesian method computes the distance between the LPCC of the unknown continuous sound and the means of the known continuous sound and adjusts it by the variances of the known continuous sound; the resulting value represents the similarity between the unknown continuous sound and the known continuous sound. The F known continuous sounds most similar to the unknown continuous sound are selected to represent it, so an unknown sentence or name is represented by a DxF matrix of known continuous sounds 110.

When an unknown sentence or name is cut into D unknown continuous sounds, it is hard to cut it exactly into the continuous sounds it contains and into the right number of them. Sometimes one continuous sound is cut into two; sometimes two continuous sounds pronounced close together are cut into one. The D unknown continuous sounds are therefore not necessarily the speaker's true continuous sounds, and a row of F similar known continuous sounds does not necessarily contain the speaker's continuous sound. To recognize an unknown sentence or name, every known sentence and name in the sentence-and-name database 85 is tested. To test whether a known sentence or name is the speaker's, its known continuous sounds are compared, starting from the first, with the similar known continuous sounds in three consecutive rows of the DxF matrix (the first comparison can use only the middle and following rows); the 3xF window (three consecutive rows of similar known continuous sounds) 120 is then moved to look for the second known continuous sound of the sentence, and so on until all known continuous sounds of the test sentence have been tested. The sentence or name with the highest probability in the sentence-and-name database is taken as the speaker's sentence or name, where the probability is the number of known continuous sounds of the test sentence or name found in their 3xF windows divided by the number of continuous sounds in the test sentence or name 130. To save time, only the sentences or names in the database whose length is about equal to that of the unknown sentence or name (D unknown continuous sounds) need be compared. If the sentence or name cannot be recognized, the Bayesian classifier is used to find the N most similar continuous sounds in the feature database and to improve the features of the continuous sounds in the sentence; recognition will then succeed.

The invention is described in detail below.

(1) After a continuous sound is input, its continuous sound wave is converted into a series of digitized signal points (sampled points), and the signal points that carry no speech are deleted. The invention provides two methods: the first computes the variance of the signal points in a short time segment; the second computes the sum of the distances between adjacent signal points in the segment. In theory the first method is better, because a signal-point variance larger than the noise variance indicates that speech is present. In the recognition of continuous sounds by the invention, however, the two methods give the same recognition rate, and the second takes less time.

(2) After the signal points without speech are deleted, the remaining signal points represent all the signal points of a continuous sound. The waveform is first normalized and the features are then extracted: all signal points are divided into E equal time segments, each forming a frame. A continuous sound has E equal-length elastic frames in total, without filter and without overlap, stretching freely to cover all signal points. The signal points within an elastic frame vary nonlinearly with time and are hard to express by a mathematical model. J. Makhoul, "Linear Prediction: A tutorial review", Proceedings of the IEEE, Vol. 63, No. 4 (1975), shows that a signal point has a linear relation with the preceding signal points, so a regression model that varies linearly with time can be used to estimate the nonlinearly varying signal points. The signal point S(n) can be estimated from the preceding signal points; its estimate S'(n) is given by the regression model

$$S'(n) = \sum_{k=1}^{P} a_k S(n-k), \quad n \ge 0 \tag{1}$$

In (1), a_k, k = 1, ..., P, are the estimates of the unknown regression coefficients and P is the number of preceding signal points. The least-squares estimates are obtained with Durbin's recursion, given in L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall PTR, Englewood Cliffs, New Jersey (1993); this set of estimates is called the linear predictive coding (LPC) vector. The computation of the LPC vector for the signal points in a frame is detailed as follows.

Let E_1 denote the sum of squared differences between the signal points S(n) and their estimates S'(n):

$$E_1 = \sum_{n=0}^{N} \Big[ S(n) - \sum_{k=1}^{P} a_k S(n-k) \Big]^2 \tag{2}$$

The regression coefficients are chosen to minimize E_1. For each unknown regression coefficient a_i, i = 1, ..., P, the partial derivative of (2) is taken and set to zero, which gives P normal equations:

$$\sum_{k=1}^{P} a_k \sum_{n} S(n-k)\,S(n-i) = \sum_{n} S(n)\,S(n-i), \quad 1 \le i \le P \tag{3}$$

Expanding (2) and substituting (3) gives the minimum total squared error $E_P$:

$$E_P = \sum_{n} S^2(n) - \sum_{k=1}^{P} a_k \sum_{n} S(n)\,S(n-k) \tag{4}$$

Equations (3) and (4) are rewritten as

$$\sum_{k=1}^{P} a_k R(i-k) = R(i), \quad 1 \le i \le P \tag{5}$$

$$E_P = R(0) - \sum_{k=1}^{P} a_k R(k) \tag{6}$$

In (5) and (6), with N denoting the number of signal points in the frame,

$$R(i) = \sum_{n=0}^{N-i} S(n)\,S(n+i), \quad i \ge 0 \tag{7}$$

Durbin's recursion computes the LPC vector quickly as follows:

$$E_0 = R(0) \tag{8}$$

$$k_i = \Big[ R(i) - \sum_{j=1}^{i-1} a_j^{(i-1)} R(i-j) \Big] \Big/ E_{i-1} \tag{9}$$

$$a_i^{(i)} = k_i \tag{10}$$

$$a_j^{(i)} = a_j^{(i-1)} - k_i\, a_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \tag{11}$$

$$E_i = (1 - k_i^2)\, E_{i-1} \tag{12}$$

Computing (8) to (12) recursively gives the least-squares estimates $a_j$, $j = 1, \ldots, P$, of the regression coefficients (the LPC vector):

$$a_j = a_j^{(P)}, \quad 1 \le j \le P \tag{13}$$

The LPC vector is then converted into the more stable linear predictive coding cepstrum (LPCC) vector $a'_j$, $j = 1, \ldots, P$, with the following formulas:

$$a'_i = a_i + \sum_{j=1}^{i-1} \frac{j}{i}\, a_{i-j}\, a'_j, \quad 1 \le i \le P \tag{14}$$

$$a'_i = \sum_{j=i-P}^{i-1} \frac{j}{i}\, a_{i-j}\, a'_j, \quad i > P \tag{15}$$

One elastic frame yields one LPCC vector $(a'_1, \ldots, a'_P)$. The speech recognition method of the invention uses P = 12, because the last LPCC values are almost 0. A continuous sound is characterized by E LPCC vectors, that is, one E×P matrix of LPCC values represents one continuous sound. A continuous sound contains one to several syllables.

(3) In the same way, equations (8) to (15) are used to compute the E LPCC vectors of an unknown continuous sound wave. The resulting E×P LPCC matrix, of the same size, is called the classification model of the unknown continuous sound.

(4) In the second figure, the speech recognition method represents the classification model of an unknown continuous sound by the E×P LPCC matrix $X = \{x_{j\ell}\}$, $j = 1, \ldots, E$, $\ell = 1, \ldots, P$. X is compared with each known continuous sound $c_i$, $i = 1, \ldots, m$, where m is the total number of known continuous sounds. To compute the comparison value quickly, a distributional assumption is placed on X.
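The feature extraction of steps (2) and (3) above, that is, E elastic frames, Durbin's recursion (8) to (13) and the LPCC conversion (14), can be sketched in Python as follows. This is a minimal illustration, not the inventors' code: the function names are hypothetical, the sum in (7) is taken over the point pairs actually available in a zero-indexed frame, and a frame with no energy is assumed to give an all-zero vector.

```python
import math


def elastic_frames(signal, num_frames):
    """Split all signal points into num_frames equal, non-overlapping frames."""
    n = len(signal)
    bounds = [k * n // num_frames for k in range(num_frames + 1)]
    return [signal[bounds[k]:bounds[k + 1]] for k in range(num_frames)]


def autocorrelation(frame, max_lag):
    """R(i) of (7); the sum runs over the point pairs available in the frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + i] for t in range(n - i))
            for i in range(max_lag + 1)]


def durbin_lpc(frame, order):
    """LPC vector (a_1..a_P) by Durbin's recursion (8)-(13)."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)          # a[j] holds a_j; a[0] is unused
    energy = r[0]                    # E_0 = R(0)
    for i in range(1, order + 1):
        if not energy > 0.0:         # silent or degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / energy
        updated = a[:]
        updated[i] = k               # (10)
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]   # (11)
        a = updated
        energy *= 1.0 - k * k        # (12)
    return a[1:]


def lpc_to_lpcc(lpc):
    """LPCC vector (a'_1..a'_P) from the LPC vector by (14)."""
    order = len(lpc)
    c = [0.0] * (order + 1)          # c[i] holds a'_i; c[0] is unused
    for i in range(1, order + 1):
        c[i] = lpc[i - 1] + sum((j / i) * lpc[i - j - 1] * c[j]
                                for j in range(1, i))
    return c[1:]


def feature_matrix(signal, num_frames=12, order=12):
    """E x P LPCC matrix of one continuous sound (E = P = 12 is assumed)."""
    return [lpc_to_lpcc(durbin_lpc(frame, order))
            for frame in elastic_frames(signal, num_frames)]
```

Because the frames are obtained by dividing the whole signal into E parts, a short word and a long word both give a matrix of the same size, which is what allows the entry-by-entry comparison used in step (4).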

The E×P entries $\{x_{j\ell}\}$ of X are assumed to be E×P independent, normally distributed random variables whose means and variances $(\mu_{ij\ell}, \sigma^2_{ij\ell})$ are estimated by the means and variances in the standard model of the known continuous sound $c_i$. Let $f(x \mid c_i)$ denote the conditional density function of X. The decision theory in T. F. Li, "Speech recognition of mandarin monosyllables", Pattern Recognition, Vol. 36, 2003, explains the Bayesian classification method as follows.

Suppose the feature database contains m standard models of known continuous sounds. Let $\theta_i$, $i = 1, \ldots, m$, denote the probability that the continuous sound $c_i$ occurs, that is, the prior probability, with $\sum_{i=1}^{m} \theta_i = 1$. Let d denote a decision rule. Define a simple loss function, which gives the misclassification probability of d: if the decision rule d misclassifies an unknown continuous sound that belongs to $c_i$, the loss is $L(c_i, d(x)) = 1$; if d classifies it correctly, $d(x) = c_i$, there is no loss, $L(c_i, d(x)) = 0$. The recognition method is as follows.

Let $\Gamma_i$, $i = 1, \ldots, m$, denote the range of matrix values X that belong to the known continuous sound $c_i$; that is, when X falls in $\Gamma_i$, d judges the unknown continuous sound to be $c_i$. The average misclassification probability of d is

$$R(\tau, d) = \sum_{i=1}^{m} \theta_i \int L(c_i, d(x))\, f(x \mid c_i)\, dx = \sum_{i=1}^{m} \theta_i \int_{\Gamma_i^c} f(x \mid c_i)\, dx \tag{16}$$

In (16), $\tau = (\theta_1, \ldots, \theta_m)$ and $\Gamma_i^c$ is the range outside $\Gamma_i$. Let D denote the set of all recognition methods, that is, all ways of partitioning the space into the ranges of the m known continuous sounds. A recognition method $d_\tau$ is sought in D whose average misclassification probability (16) is the smallest:

$$R(\tau, d_\tau) = \min_{d \in D} R(\tau, d) \tag{17}$$

The recognition method $d_\tau$ that satisfies (17) is called the Bayesian classification method with respect to the prior probability $\tau$. It can be expressed as

$$d_\tau(x) = c_i \quad \text{if} \quad \theta_i f(x \mid c_i) > \theta_j f(x \mid c_j) \tag{18}$$

In (18), $j = 1, \ldots, m$, $j \ne i$; that is, the range belonging to the known continuous sound $c_i$ is $\Gamma_i = \{x \mid \theta_i f(x \mid c_i) > \theta_j f(x \mid c_j)\}$ for all $j \ne i$. If all known continuous sounds are equally likely to occur, the Bayesian classification method is the same as the maximum likelihood method.

To recognize an unknown continuous sound with the Bayesian classification method (18), the conditional density function $f(x \mid c_i)$ of X is first computed for every known continuous sound:

$$f(x \mid c_i) = \Big[ \prod_{j,\ell} \frac{1}{\sqrt{2\pi}\, \sigma_{ij\ell}} \Big] \exp\Big( -\frac{1}{2} \sum_{j,\ell} \Big( \frac{x_{j\ell} - \mu_{ij\ell}}{\sigma_{ij\ell}} \Big)^2 \Big) \tag{19}$$

In (19), $i = 1, \ldots, m$ (the total number of known continuous sounds). For ease of computation, the logarithm of (19) is taken and the constant is dropped, which gives the Bayesian distance

$$l(c_i) = \sum_{j,\ell} \ln(\sigma_{ij\ell}) + \frac{1}{2} \sum_{j,\ell} \Big( \frac{x_{j\ell} - \mu_{ij\ell}}{\sigma_{ij\ell}} \Big)^2, \quad i = 1, \ldots, m \tag{20}$$
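The Bayesian distance (20) and the resulting choice of the nearest known continuous sound can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and the dictionary layout of the models are hypothetical, equal prior probabilities are assumed, and all variances are assumed to be positive.

```python
import math


def bayes_distance(x, mean, var):
    """Bayesian distance l(c_i) of (20) between an unknown feature matrix x
    and a known standard model given by its mean and variance matrices."""
    total = 0.0
    for x_row, m_row, v_row in zip(x, mean, var):
        for value, m, v in zip(x_row, m_row, v_row):
            # ln(sigma) = 0.5 * ln(sigma^2); variances are assumed positive.
            total += 0.5 * math.log(v) + 0.5 * (value - m) ** 2 / v
    return total


def most_similar(x, models, count=1):
    """Names of the `count` known sounds with the smallest Bayesian distance.

    models maps a (hypothetical) sound name to a (mean, var) pair of matrices.
    """
    ranked = sorted(models, key=lambda name: bayes_distance(x, *models[name]))
    return ranked[:count]
```

Calling `most_similar(x, models, count=F)` returns the F most similar known continuous sounds, which is the form in which the distance is used later for sentences and names.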

With (20), the Bayesian classification method (18) reduces to computing the value $l(c_i)$ for each known continuous sound $c_i$. The value $l(c_i)$ is also called the similarity between the unknown continuous sound and the known continuous sound $c_i$, or the Bayesian distance. In (20), $x = \{x_{j\ell}\}$, $j = 1, \ldots, E$, $\ell = 1, \ldots, P$, are the LPCC values in the classification model of the unknown continuous sound, and $\{\mu_{ij\ell}, \sigma^2_{ij\ell}\}$ are estimated by the means and variances in the standard model of the known continuous sound. The most important contribution of the invention is that, without using samples, it finds for every known continuous sound $c_i$ in the known continuous sound feature database a mutually stable center point $\{\mu_{ij\ell}\}$ and a clearly non-overlapping range given by $\{\sigma^2_{ij\ell}\}$, which represents the range of the E×P LPCC matrix of the known continuous sound $c_i$.

(5) To extract the feature of a known continuous sound, a database of unknown continuous sounds is prepared first. There are two kinds of unknown continuous sound database: one in which the unknown continuous sounds have samples, and one in which they have none.

If the database has samples, the mean and variance of each unknown continuous sound are computed first. In this database the Bayesian distance is used to find the N nearest unknown continuous sounds around the known continuous sound. The weighted average of the N means of these N unknown continuous sounds and the LPCC of the known continuous sound (N+1 values) is taken as the mean of the known continuous sound, and the weighted average of the N variances of the N unknown continuous sounds is taken as its variance. This E×P matrix of means and variances is called the preliminary feature 79 of the known continuous sound and is stored in the known continuous sound feature database.

If the unknown continuous sound database has no samples, the minimum absolute-value distance is used to find the N unknown continuous sounds around the known continuous sound. The LPCC of the known continuous sound and of the N unknown continuous sounds are treated as (N+1) numbers. The weighted average of the (N+1) numbers is taken as the mean of the known continuous sound, and the variance of the (N+1) numbers as its variance. This E×P matrix of means and variances is likewise called the preliminary feature of the known continuous sound and is stored in the feature database 79.

In the known continuous sound feature database, if the Bayesian distance between the mean of a known continuous sound and the LPCC of the same known continuous sound in the permanent database is not the smallest in the feature database, the Bayesian distance is used to find, in the feature database, the N known continuous sounds whose Bayesian distances to the LPCC of that known continuous sound are the N smallest. The weighted average of the N means of these N known continuous sounds and the LPCC of the known continuous sound is taken as its new mean; the N variances of the same known continuous sounds are used for its new variance.
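The construction of a preliminary feature in step (5), for the case where the unknown continuous sound database has samples, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method as filed: the text only says "weighted average" without giving the weights, so equal weights are assumed here, feature matrices are flattened into plain lists, and the function names are hypothetical.

```python
import math


def bayes_distance(x, mean, var):
    """Bayesian distance (20) for features flattened into plain lists."""
    return sum(0.5 * math.log(v) + 0.5 * (a - m) ** 2 / v
               for a, m, v in zip(x, mean, var))


def preliminary_feature(known_lpcc, unknown_models, n):
    """Mean and variance of one known sound from its N nearest unknown sounds.

    unknown_models is a list of (mean, var) pairs computed from samples.
    Equal weights are an assumption; the text only says "weighted average".
    """
    nearest = sorted(unknown_models,
                     key=lambda mv: bayes_distance(known_lpcc, mv[0], mv[1]))[:n]
    size = len(known_lpcc)
    # N neighbor means plus the known sound's own LPCC: N + 1 values.
    mean = [(known_lpcc[k] + sum(m[k] for m, _ in nearest)) / (len(nearest) + 1)
            for k in range(size)]
    # Average of the N neighbor variances.
    var = [sum(v[k] for _, v in nearest) / len(nearest) for k in range(size)]
    return mean, var
```

The same function could be reused for the later refinement by passing the models of the feature database instead of the unknown continuous sound database; that reuse is a design suggestion, not something stated in the text.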

The new variance is the weighted average of those N variances. This calculation of a new mean and a new variance is repeated several times for every known continuous sound in the feature database. The final E×P matrix of new means and variances is called the standard model of the known continuous sound and is stored in the known continuous sound feature database 80; the continuous sounds of the feature database are then used to build the sentence and name database 85.

(6) If an unknown continuous sound c is misrecognized, the invention provides two methods to correct the old feature so that the continuous sound is recognized correctly:

(a) Use the Bayesian classification method (20) to find the N most similar known continuous sounds in the feature database, with means and variances $\{\mu_{ij\ell}, \sigma^2_{ij\ell}\}$, $i = 1, \ldots, N$. Their averages

$$\mu_{j\ell} = \frac{1}{N} \sum_{i=1}^{N} \mu_{ij\ell}, \quad \sigma^2_{j\ell} = \frac{1}{N} \sum_{i=1}^{N} \sigma^2_{ij\ell}, \quad j = 1, \ldots, E, \ \ell = 1, \ldots, P \tag{22}$$

represent the new standard model of the unknown continuous sound, which is stored in the continuous sound feature database. When the sound is tested again, recognition succeeds.

(b) Take the weighted average of the means of the N most similar known continuous sounds and the LPCC of the unknown continuous sound as the mean of the unknown continuous sound, and the weighted average of the variances of the N most similar continuous sounds as its variance. The resulting $\{\mu_{j\ell}, \sigma^2_{j\ell}\}$, $j = 1, \ldots, E$, $\ell = 1, \ldots, P$, represent the new standard model of the unknown continuous sound.

(7) To confirm that the invention can recognize any languages at the same time, speech recognition experiments were carried out with two speakers.

(a) An unknown continuous sound database is built first. The monosyllable database was purchased from Academia Sinica, Taiwan. It contains 388 Mandarin monosyllables (third figure), all pronounced by female speakers, with 6 to 99 samples each; many monosyllables are pronounced almost identically.

(b) With the method of (2), all samples are converted into E×P LPCC matrices, 12400 matrices in total.

(c) For the 388 Mandarin monosyllables, the sample mean and variance are computed.

(d) The 388 Mandarin monosyllables are mixed blindly, so that the monosyllables with sample means and variances become a database of 388 unknown continuous sounds (a Mandarin monosyllable is a continuous sound with one syllable).

(e) One male and one female speaker then each pronounce, once, 654 Mandarin monosyllables, 154 English words, 1 German word, 1 Japanese word and 3 Taiwanese words, which gives two permanent databases of 813 known continuous sounds. Each continuous sound is represented by an E×P LPCC matrix.

(f) For each of the 813 known continuous sounds in the permanent database, the Bayesian distance (20) is used to find N = 15 unknown continuous sounds among the 388 unknown continuous sounds. The weighted average of the LPCC of the known continuous sound and the sample means of the N unknown continuous sounds (N+1 values) is the mean of the known continuous sound, and the weighted average of the sample variances of the N unknown continuous sounds is its variance. This 12×12 matrix of means and variances is called the preliminary feature 79 of the known continuous sound and is stored in the known continuous sound feature database. The feature database thus contains 813 matrices of means and variances 80.
(g) In the feature database, the Bayesian distance between the mean of a known continuous sound and the LPCC of the same known continuous sound in the permanent database may not be the smallest. In that case, the Bayesian distance is used to find N = 15 known continuous sounds among the 813 continuous sound features. The weighted average of the N means of these N continuous sounds and the LPCC of the known continuous sound is the new mean of the known continuous sound, and the weighted average of the N variances is its new variance. The new means and variances are recomputed many times. The final 12×12 matrix of means and variances is called the standard model; it represents the feature of the known continuous sound and is stored in the known continuous sound feature database 80.

The invention carried out the following continuous sound recognition. The recognition rate depends on the speaker; because many sounds are very similar, a result within the top three is counted as correct:

① Recognition of 384 Mandarin monosyllables, 1 German word, 1 Japanese word and 2 Taiwanese words (third figure): recognition rate very good.
② Recognition of 154 English words and 1 German word (fourth figure): recognition rate very good.
③ Simultaneous recognition of 154 English words, 388 Mandarin monosyllables, 1 German word, 1 Japanese word and 2 Taiwanese words: recognition rate very good.
④ Recognition of 654 Mandarin monosyllables, 1 German word, 1 Japanese word and 3 Taiwanese words (third and fifth figures): recognition rate good, but not as good as the first three.

(8) To recognize a speaker's sentence or name, an English and Mandarin sentence and name database is built first. The continuous sounds of every sentence or name are composed freely from the (155 + 384) known English and Mandarin continuous sounds in the continuous sound feature database. There are 70 English sentences and names, and the 384 Mandarin monosyllables compose 407 Mandarin sentences and names (sixth figure). The recognition method is as follows:

(a) Cut the unknown sentence or name into D unknown continuous sounds. For each unit time segment, compute the sum of the distances between adjacent signal points; if the sum is too small, the segment is noise or silence. When too many adjacent unit segments without speech accumulate (more than the gap between two syllables inside an English word), the position must be the boundary between two continuous sounds, and the signal is cut there. The sentence or name is thus cut into D unknown continuous sounds, and each of them is converted into an E×P LPCC matrix (50, 60, 90). For each unknown continuous sound, the Bayesian classification method (20) selects the F most similar known continuous sounds from the English and Mandarin feature database (they may include both English and Mandarin sounds). An unknown sentence or name is then represented by a D×F matrix of the most similar known continuous sounds.

(b) Search the sentence and name database for the speaker's sentence or name. Among the 70 English and 407 Mandarin sentences and names, select those with the same length as the speaker's sentence or name (D known continuous sounds). The D known continuous sounds of a candidate sentence or name are compared, in order, with the D rows of similar known continuous sounds. If every row of F similar known continuous sounds contains the corresponding known continuous sound of the candidate, all D sounds are recognized and the candidate is the speaker's sentence or name.

(c) If the candidate sentence or name in the database has D-1 or D+1 known continuous sounds, or if the number of correctly recognized continuous sounds in (b) is not D, the invention screens with a 3×F window. The i-th known continuous sound of the candidate sentence or name is compared with the similar known continuous sounds in three neighboring rows of the D×F matrix (rows i-1, i and i+1). The number of known continuous sounds of the candidate found in the D×F matrix in this way, divided by the number of known continuous sounds of the candidate, gives its probability. The sentence or name with the highest probability in the database is selected as the speaker's sentence or name.

If a sentence or name is misrecognized, one or more of its continuous sounds must be absent from their F similar known continuous sounds. For such a sound, the N = 15 most similar known continuous sounds are found among the (155 + 384) known continuous sounds with the Bayesian distance (20), and the weighted average of these N similar continuous sounds and the LPCC of the unknown continuous sound is computed (22), so that the continuous sound appears among its F similar known continuous sounds.
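The 3×F window screening of step (8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and the dictionary layout of the sentence and name database are hypothetical, sounds are identified by plain strings, and candidates are restricted to lengths D-1, D and D+1.

```python
def window_score(candidate, similar_rows):
    """Fraction of the candidate's known sounds found in the 3 x F window.

    candidate: known continuous sounds of one database sentence or name.
    similar_rows: D rows, each holding the F known sounds most similar to
    one unknown continuous sound of the spoken sentence.
    """
    hits = 0
    for i, sound in enumerate(candidate):
        window = similar_rows[max(0, i - 1):i + 2]   # rows i-1, i, i+1
        if any(sound in row for row in window):
            hits += 1
    return hits / len(candidate)


def recognize_sentence(similar_rows, database):
    """Most probable database entry among those of length D-1, D or D+1.

    database maps a (hypothetical) sentence label to its list of known sounds.
    Returns None when no entry has a suitable length.
    """
    d = len(similar_rows)
    candidates = {name: sounds for name, sounds in database.items()
                  if abs(len(sounds) - d) in (0, 1)}
    return max(candidates,
               key=lambda name: window_score(candidates[name], similar_rows),
               default=None)
```

Looking at the neighboring rows as well as row i is what lets a candidate still score well when the cutting step produced one continuous sound too many or too few.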

The following English and Mandarin sentence and name recognition was carried out, and almost all results were correct:

① Recognition of 70 English sentences and names (very good).
② Recognition of 407 Mandarin sentences and names (very good).
③ Recognition of the 70 English sentences and names together with the 407 Mandarin sentences and names (very good).

* A Visual Basic recognition screen (seventh and eighth figures) shows the method recognizing English and Mandarin sentences and names at the same time.

[Brief description of the drawings]

The first and second figures illustrate the execution procedure of the invention. The first figure shows the flow for building the three databases: the known continuous sound permanent database, the known continuous sound feature database, and the sentence and name database. The second figure shows the flow of the method for recognizing an unknown sentence or name.
The third figure shows the recognition of 384 Mandarin monosyllables, 1 German word, 1 Japanese word and 2 Taiwanese words.
The fourth figure shows the recognition of 154 English words and 1 German word.
The fifth figure shows the recognition of 269 Mandarin monosyllables and 3 Taiwanese words.
The sixth figure shows that the sentence and name database has 70 English sentences and names and 407 Chinese sentences and names.
The seventh and eighth figures show the Visual Basic recognition screen for recognizing English and Mandarin sentences and names at the same time.

[Description of main reference numerals]

(1) Build a permanent database of known continuous sounds by pronouncing a continuous sound or a sentence; a sentence is further divided into several known continuous sounds.
(10) Continuous sound wave of a continuous sound
(20) Receiver
(30) Sound wave digital converter
(45) Noise removal
(50) E elastic frames normalize the sound wave
(60) Least-squares method computes the linear predictive coding cepstrum (LPCC) vector
Using the Bayesian distance (or absolute-value distance), for each known continuous sound (permanent database), find the N nearest unknown continuous sounds in the unknown continuous sound database.
(79) For each known continuous sound (permanent database), the weighted average of the surrounding N unknown continuous sounds and the LPCC of the known continuous sound is its preliminary feature, stored in the feature database. In the feature database, the Bayesian distance is then used to find N known continuous sounds, and the weighted average with the LPCC of the known continuous sound is computed, repeatedly. The final values (E×P means and variances) represent the standard model of the known continuous sound.
(80) The known continuous sound feature database contains the standard models of all means and variances
(85) The continuous sounds of the known continuous sound feature database are used to build the sentence and name database of the sentences and names to be recognized
(2) Input an unknown sentence or name
(11) A group of unknown continuous sound waves
(40) Cut a sentence or name into D unknown continuous sounds
(90) E LPCC vectors represent the classification model of an unknown continuous sound
(100) The Bayesian classification method compares each known continuous sound standard model with the unknown continuous sound classification model
(110) For each unknown continuous sound in a sentence or name, find the F most similar known continuous sounds; a sentence or name is represented by D×F most similar known continuous sounds in total
(120) In the sentence and name database, the similar known continuous sounds in the 3×F window are used to screen each known continuous sound of every sentence and name
(130) Find the most probable sentence or name in the sentence and name database


Claims (1)

VII. Scope of the patent application:

1. A recognition method capable of recognizing various languages without using samples, comprising the steps of:
(1) an unknown continuous sound database (with or without samples);
(2) a permanent database of known continuous sounds, in which a speaker with clear, standard pronunciation pronounces each known continuous sound once; if the test speaker has a heavy accent, the test speaker pronounces them instead;
(3) a pre-processor that deletes the signal points (sampled points) without speech sound waves, or noise;
(4) a method for normalizing a continuous sound wave and extracting its feature: E elastic frames normalize the sound wave and convert it into an E×P feature matrix of linear predictive coding cepstra (LPCC) of equal size;
(5) a method for normalizing an unknown continuous sound wave and extracting its feature: the sound wave is normalized and converted into a feature matrix equal in size to the known continuous sound standard model (represented by an E×P matrix of means and variances), called the unknown continuous sound classification model, which contains the linear predictive coding cepstra (LPCC);
(6) a simplified Bayesian classification method: the unknown continuous sound classification model is compared with all known continuous sound standard models in the known continuous sound feature database (each represented by an E×P matrix of means and variances) to find the known continuous sound whose Bayesian distance to the unknown continuous sound is the smallest, which is recognized as the unknown continuous sound;
(7) in the known continuous sound permanent database, for each known continuous sound, the Bayesian distance is used to find the N nearest unknown continuous sounds in an unknown continuous sound database with samples; if the unknown continuous sound database has no samples, the absolute-value distance is used to find the N nearest unknown continuous sounds;
(8) if the unknown continuous sound database has samples, the weighted average of the N means of the N nearest unknown continuous sounds and the LPCC of the known continuous sound (N+1 values) is the mean of the known continuous sound, and the weighted average of the N variances of the N unknown continuous sounds is its variance; this E×P matrix of means and variances is called the preliminary feature of the known continuous sound and is placed in the known continuous sound feature database;
(9) if the unknown continuous sound database has no samples, the LPCC of the N nearest unknown continuous sounds and the LPCC of the known continuous sound are treated as (N+1) numbers, and the weighted average and variance of the (N+1) numbers are computed; this E×P matrix of means and variances is called the preliminary feature of the known continuous sound and is stored in the known continuous sound feature database;
(10) a method for repeatedly computing and stabilizing the feature of each known continuous sound, so that in the known continuous sound feature database every known continuous sound has a mutually stable feature (represented by an E×P matrix of means and variances), called the standard model of the known continuous sound, placed in the known continuous sound feature database;
(11) a method for cutting an unknown sentence or name into D unknown continuous sounds;
(12) a simplified Bayesian classification method that, for each of the D unknown continuous sounds, selects the F most similar known continuous sounds from the known continuous sound feature database, so that an unknown sentence or name is represented by a D×F matrix of known continuous sounds;
(13) the D×F matrix of known continuous sounds is compared with all sentences and names in the sentence and name database to find the most probable known sentence or name;
(14) a method for correcting the feature of a continuous sound, so that the speaker's sentence or name is recognized correctly.
2 • According to the identification method of the patent application scope 1, the method of the steps: the item mentioned in the item - the miscellaneous words recognize various languages (3) delete the sound waves or murmurs without speech, including the signal points in the second a small segment 'Calculating the number of variances of signal points and the variation of the general murmurs' as recorded _ the number of variograms is smaller than the murmur, and the coffee goes to that time; (^-hours are the points of the number, the sum of the peaks of the two recorded points and the general miscellaneous _ The age of the 辑 跋 辑 Μ 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者 后者The matrix, the steps are as follows: (= 2 aliquots - a continuous sound wave recording method 1 (10) Sexual changes closely estimate the nonlinear changes of the sound waves, the full length of the sound waves into the transition ^ 'mother time period formation - the rotation box, one Continuous sound has a paste frame, no tender 11 (FUW stack, can be freely stretched and covered 201106340 ,長音波’不是固定長度的漢明(Harami⑻窗; (b) 母框内,用—隨時間作線性 線性變化的音波; 的轉模式估計隨時間作非 (c) 用Durbin’s循環方式 R(i) = ^S(n)S(n + i\ ,·>〇 η=〇 E〇=R(〇) 是,+ =_-乞〇_ 刀]/κ }-\ 卜1 α,ω Et =(1-^2)£m aj=aT^ l<j<P 求迴歸係數別、平方估計值#W,叫絲性預估編媽 (LPC)向量,再用 | 1-1 / α'«·=α.·+Σ(~)α.·-;α,/·> ι</<ρ j-\ l α’.=Σ(Ζ«, p<i /=α-ρ l 轉換線性預估編碼(LPC)向量為穩定的線性預估編碼倒頻 譜(LPCC)向量α',·,ΐϋΡ ; (d)用彳個線性預估編碼倒頻譜(LPCC)向量表示一個連續音。 32 201106340 4根據申轉利範圍第1項所述之一 的辨認方法’其中步驟⑸又包含-型方法,其步驟如下: ㈤將^撕音波㈣輪,輪_ ^未知_音以鱗長框,沒錢波g,, 自由伸縮含蓋全部音_號點; "201106340, Long sound wave 'is not a fixed length Hamming (Hari (8) window; (b) inside the mother box, with - linearly varying sound waves over time; the mode of rotation is estimated over time (c) with Durbin's loop mode R ( i) = ^S(n)S(n + i\ ,·>〇η=〇E〇=R(〇) Yes, + =_-乞〇_刀]/κ }-\ 卜1 α,ω Et = (1-^2) £m aj=aT^ l<j<P Find regression coefficient, square estimate #W, call 
the silky estimate mother (LPC) vector, reuse | 1-1 / α '«·=α.·+Σ(~)α.·-;α,/·>ι</<ρ j-\ l α'.=Σ(Ζ«, p<i /=α-ρ The converted linear predictive coding (LPC) vector is a stable linear predictive coding cepstrum (LPCC) vector α', ·, ΐϋΡ; (d) a linear predictive coding cepstrum (LPCC) vector is used to represent a continuous tone 32 201106340 4 According to one of the methods described in the first paragraph of the application of the scope of the transfer, the step (5) further includes a -type method, the steps are as follows: (5) will tear the sound wave (four) wheel, round _ ^ unknown _ sound with scale length Box, no money wave g,, free telescopic cover all sounds _ point; " 個不用樣本能辨認各種語言 個計算未知連續音的分類模 計隨 (_個彈性框内,用—個隨時間作線性變化的迴歸模式估 k間作非線性變化的音波; (c)用Durbin’s循環方式 N -i m^^)S(n+iX i>() Λ=〇 〜U Ε0 = R(Q) 綱— ,(0 Ei=(l-kf)E τ(Ρ) is JSP 計算迴歸係數最小平方姑計值化…(LPC向量) (d)再將LPC向量用公式 尸 33 201106340 a’i=£(f)W μ 轉換成穩定線性預估編碼例頻譜(Lpcc)向量 ⑹用亀線性預估編碼倒頻飢⑽向量,(撕Lpcc矩陣), 作為該未知連續音的分類模型。 根據申%專利㈣第丨項所述—個不用樣本誠認各種語言的辨A sample that can be used to identify unknown continuous sounds in various languages can be identified (with a regression pattern that varies linearly with time in a flexible box). (c) Use Durbin's Cyclic mode N -im^^)S(n+iX i>() Λ=〇~U Ε0 = R(Q) Outline - , (0 Ei=(l-kf)E τ(Ρ) is JSP Calculate the regression coefficient The least squares are calculated... (LPC vector) (d) The LPC vector is further transformed into a stable linear prediction coded spectrum (Lpcc) vector (6) using the formula corpus 33 201106340 a'i=£(f)W μ The estimated code scrambling hunger (10) vector, (tearing Lpcc matrix), as the classification model of the unknown continuous sound. According to the application of the patent (4), the use of a sample does not recognize the various languages. 
5. The recognition method capable of recognizing all languages without using samples according to claim 1, wherein step (6) comprises a simplified Bayesian method for recognizing an unknown continuous sound, with the following steps:
(a) the feature of an unknown continuous sound is its classification model, represented by an E×P LPCC matrix $X=\{X_{jl}\}$, $j=1,\dots,E$, $l=1,\dots,P$; for fast recognition, the E×P LPCC values $\{X_{jl}\}$ are assumed to be E×P independent random variables with normal distributions; when the unknown continuous sound is compared with a known continuous sound $c_i$, $i=1,\dots,m$ (m being the total number of known continuous sounds), the mean and variance $(\mu_{ijl},\sigma_{ijl}^2)$ of $\{X_{jl}\}$ are estimated by the means and variances in the standard model of that known continuous sound, so that the conditional density function of X is

$$f(x\mid c_i)=\Big[\prod_{j,l}\frac{1}{\sqrt{2\pi}\,\sigma_{ijl}}\Big]\exp\Big\{-\frac{1}{2}\sum_{j,l}\Big(\frac{x_{jl}-\mu_{ijl}}{\sigma_{ijl}}\Big)^2\Big\}$$

where $X=\{X_{jl}\}$ is the LPCC classification model of the unknown continuous sound;
(b) the simplified Bayesian classifier searches the feature database for the known continuous sound $c_i$ that most resembles the unknown continuous sound X; the similarity of a known continuous sound $c_i$ to the unknown continuous sound X is expressed by the conditional density $f(x\mid c_i)$ of (a);
(c) for fast recognition, taking the logarithm of the conditional density function $f(x\mid c_i)$ and deleting the constants that need not be computed gives the Bayesian distance, also called the Bayesian classifier,

$$l(c_i)=\sum_{j,l}\ln(\sigma_{ijl})+\frac{1}{2}\sum_{j,l}\Big(\frac{x_{jl}-\mu_{ijl}}{\sigma_{ijl}}\Big)^2$$

(d) for each known continuous sound $c_i$, $i=1,\dots,m$, computing the Bayesian distance $l(c_i)$ of (c);
(e) selecting, in the feature database, the known continuous sound whose Bayesian distance $l(c_i)$ is smallest, and recognizing the unknown continuous sound as that known continuous sound.

6. The recognition method capable of recognizing all languages without using samples according to claim 1, wherein step (10) comprises a method of computing and stabilizing the features of all known continuous sounds in the feature database:
(a) if, for a known continuous sound in the feature database, its Bayesian distance to the same known continuous sound in the permanent database of known continuous sounds is not the smallest in the feature database, using the Bayesian distance to find, in the feature database, the N known continuous sounds nearest to the linear predictive coding cepstrum (LPCC) of the same known continuous sound in the permanent database;
(b) computing the weighted average of the N means of the N nearest continuous sounds and the LPCC of the known continuous sound (N+1 values in total) as the new mean of the known continuous sound, and computing the weighted average of the N variances of the N nearest continuous sounds as its new variance; the E×P matrix of new means and new variances represents the new feature of the known continuous sound and is stored in the feature database;
(c) repeating steps (a) to (b) several times; the final new feature, represented by the matrix of means and variances, is called the standard model of the known continuous sound;
(d) the LPCC of the known continuous sounds in the permanent database remain unchanged.
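The Bayesian distance of claim 5 can be sketched as follows. This is a minimal illustration only: the names `bayes_distance` and `classify`, and the layout of the feature database as a dictionary mapping a label to a (mean matrix, variance matrix) pair, are assumptions made for the example. The code uses $\ln\sigma=\tfrac{1}{2}\ln\sigma^2$ so that it can work directly with stored variances.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def bayes_distance(x: Matrix, mean: Matrix, var: Matrix) -> float:
    """l(c_i) = sum ln(sigma) + 1/2 * sum ((x - mu) / sigma)^2 over all
    E x P entries, with var holding sigma^2."""
    total = 0.0
    for x_row, m_row, v_row in zip(x, mean, var):
        for xv, m, v in zip(x_row, m_row, v_row):
            total += 0.5 * math.log(v) + 0.5 * (xv - m) ** 2 / v
    return total


def classify(x: Matrix, models: Dict[str, Tuple[Matrix, Matrix]]) -> str:
    """Return the label of the known sound with the smallest Bayesian distance."""
    return min(models, key=lambda name: bayes_distance(x, *models[name]))


# Hypothetical 1 x 2 models, far smaller than a real E x P feature.
models = {
    "A": ([[0.0, 0.0]], [[1.0, 1.0]]),
    "B": ([[3.0, 3.0]], [[1.0, 1.0]]),
}
unknown = [[0.2, -0.1]]
best = classify(unknown, models)
```

Dropping the constant $\sqrt{2\pi}$ terms and working in the log domain avoids multiplying many small densities together, so the comparison needs only sums and one logarithm per matrix entry.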
7. The recognition method capable of recognizing all languages without using samples according to claim 1, wherein step (11) further comprises a method of cutting an unknown sentence or name into D unknown continuous sounds:
(a) for each unit time period, computing the sum of the distances between adjacent signal points; if the sum is too small, the period is silence or noise and contains no speech signal;
(b) if too many adjacent silent or noisy unit periods accumulate (more than the gap between two syllables within one continuous sound), the period is the boundary between two continuous sounds and the signal is cut there; an unknown sentence or name is thereby cut into D unknown continuous sounds;
(c) for each continuous sound, removing silence and noise, normalizing with elastic frames, and computing the linear predictive coding cepstrum (LPCC) vectors by least squares, so that an E×P LPCC matrix represents one unknown continuous sound; a sentence or name is represented by D such E×P LPCC matrices.

8. The recognition method capable of recognizing all languages without using samples according to claim 1, wherein step (12) further comprises:
(a) after an unknown sentence or name is cut into D unknown continuous sounds, using the simplified Bayesian classifier, for each unknown continuous sound $\{x_{jl}\}$, to compute in the feature database the Bayesian distance between each known continuous sound $c_i$ with $\{\mu_{ijl},\sigma_{ijl}^2\}$ and the unknown continuous sound,

$$l(c_i)=\sum_{j,l}\ln(\sigma_{ijl})+\frac{1}{2}\sum_{j,l}\Big(\frac{x_{jl}-\mu_{ijl}}{\sigma_{ijl}}\Big)^2$$

and finding the F nearest known continuous sounds (the F sounds may come from several languages at once); the unknown continuous sound is represented by these F similar known continuous sounds;
(b) an unknown sentence or name is therefore represented by D rows of F similar known continuous sounds, that is, the probability that the sentence or name lies within the D×F matrix of similar known continuous sounds is very high.
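The cutting procedure of claim 7 can be sketched as follows. This is a rough illustration under stated assumptions: the claim gives no numeric thresholds, so `unit`, `noise_level`, and `min_gap_units` are hypothetical parameters, and the function name `split_sentence` is invented for the example. A unit period counts as silent when the sum of distances between adjacent samples does not exceed `noise_level`, and a cut is made once `min_gap_units` silent periods occur in a row.

```python
from typing import List, Tuple


def split_sentence(signal: List[float], unit: int, noise_level: float,
                   min_gap_units: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges, one per continuous sound."""
    segments = []
    start = None          # first sample of the sound being collected
    last_voiced_end = 0   # end of the most recent voiced unit period
    silent_run = 0        # number of consecutive silent unit periods
    for u in range(0, len(signal), unit):
        chunk = signal[u:u + unit]
        # Sum of distances between adjacent signal points in this period.
        activity = sum(abs(chunk[t + 1] - chunk[t]) for t in range(len(chunk) - 1))
        if activity > noise_level:
            if start is None:
                start = u
            silent_run = 0
            last_voiced_end = u + len(chunk)
        else:
            silent_run += 1
            # A long enough pause marks the boundary between two sounds.
            if start is not None and silent_run >= min_gap_units:
                segments.append((start, last_voiced_end))
                start = None
    if start is not None:
        segments.append((start, last_voiced_end))
    return segments


# Hypothetical signals: "voiced" stretches alternate between +1 and -1.
voiced = [(-1) ** t for t in range(100)]
two_sounds = voiced + [0] * 100 + voiced
one_sound = voiced[:50] + [0] * 20 + voiced[:50]
cuts_two = split_sentence(two_sounds, unit=10, noise_level=1.0, min_gap_units=3)
cuts_one = split_sentence(one_sound, unit=10, noise_level=1.0, min_gap_units=3)
```

In the second example the 20-sample pause spans only two unit periods, fewer than `min_gap_units`, so it is treated as a gap between syllables inside one continuous sound rather than as a boundary.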
9. The recognition method capable of recognizing all languages without using samples according to claim 1, wherein step (13) further comprises the following method of recognizing sentences and names:
(a) from the sentence and name database, selecting the sentences or names whose length is approximately equal to that of the speaker's sentence or name (that is, sentences and names with D±1 known continuous sounds);
(b) if a candidate sentence or name selected from the sentence and name database has exactly the same length as the speaker's sentence or name (D unknown continuous sounds), comparing, in order, the D rows of F similar known continuous sounds with the D known continuous sounds of the candidate, to check whether each row of F similar known continuous sounds contains the corresponding known continuous sound of the candidate; if every row in turn contains the corresponding known continuous sound, all D unknown continuous sounds are recognized correctly and the candidate is the speaker's sentence or name;
(c) if a candidate in the sentence and name database has D known continuous sounds but the D continuous sounds are not all recognized correctly (some are not among the F similar known continuous sounds), or the candidate does not have length D, screening with a 3×F window: each known continuous sound of every candidate with D or D±1 known continuous sounds is compared, in order, with the similar known continuous sounds in the three neighboring rows (previous, current, and next) of the D×F matrix, and the candidate with the highest probability is selected from the database as the speaker's sentence or name, the probability being the number of the candidate's known continuous sounds that fall within their 3×F windows divided by the candidate's full length (D or D±1).
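The two-pass matching of claim 9 can be sketched as follows. This is an illustrative reading of the claim rather than a definitive implementation: the names `window_score` and `recognize`, the use of string labels for known continuous sounds, and the representation of the D×F matrix as a list of D sets are all assumptions. How ties between equally scored candidates are broken is not stated in the claim; here the first candidate with the best score wins.

```python
from typing import List, Optional, Set


def window_score(rows: List[Set[str]], sentence: List[str]) -> float:
    """Fraction of the sentence's sounds found in the 3 x F window
    (previous, current and next row) around their position."""
    D = len(rows)
    hits = 0
    for pos, sound in enumerate(sentence):
        window = range(max(0, pos - 1), min(D, pos + 2))
        if any(sound in rows[r] for r in window):
            hits += 1
    return hits / len(sentence)


def recognize(rows: List[Set[str]], database: List[List[str]]) -> Optional[List[str]]:
    """rows[d] holds the F similar known sounds for the d-th unknown sound."""
    D = len(rows)
    # Pass 1: a candidate of length D whose every sound is in the matching row.
    for sentence in database:
        if len(sentence) == D and all(s in row for s, row in zip(sentence, rows)):
            return sentence
    # Pass 2: 3 x F window screening over candidates of length D or D +/- 1.
    pool = [s for s in database if abs(len(s) - D) <= 1]
    if not pool:
        return None
    return max(pool, key=lambda s: window_score(rows, s))


# Hypothetical labels; F = 2 in the first example, F = 1 in the second.
db = [["a", "b", "c"], ["d", "e"], ["x", "y", "z"]]
exact = recognize([{"a", "q"}, {"b", "r"}, {"c", "s"}], db)
# One sound was lost during cutting, so only D = 2 rows are available.
fuzzy = recognize([{"a"}, {"c"}], db)
```

The window lets a candidate still score well when the cutting step produced one sound too many or too few, because each sound may match a row shifted by one position.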
10. The recognition method capable of recognizing all languages without using samples according to claim 1, wherein step (14) further comprises a method of correcting continuous-sound features so that a sentence or name is recognized correctly:
(a) if a sentence or name is recognized incorrectly, one or more of its D unknown continuous sounds must be absent from their F similar known continuous sounds; letting c denote one such unknown continuous sound, taking the N most similar known continuous sounds and computing the average (or an average weighted by order of similarity) of their N features $\{\mu_{ijl},\sigma_{ijl}^2\}$, $i=1,\dots,N$, $j=1,\dots,E$, $l=1,\dots,P$; this average $\{\mu_{jl},\sigma_{jl}^2\}$ represents the new feature of the unknown continuous sound c;
(b) in item (a), taking the weighted average of the linear predictive coding cepstrum (LPCC) of the tester's pronunciation and the N means of the N most similar known continuous sounds (N+1 values in total) as the new mean of the unknown continuous sound, and the weighted average of the N variances of the N most similar known continuous sounds as its new variance; the E×P matrix of new means and variances represents the new standard model of the unknown continuous sound;
(c) testing the unknown sentence or name again, which should then succeed.
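The averaging in item (a) of claim 10 can be sketched as follows. This is only a sketch: it uses equal weights, although the claim also permits weighting by order of similarity without giving the weights, and the name `average_features` and the (mean matrix, variance matrix) pair layout are assumptions made for the example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def average_features(neighbors: List[Tuple[Matrix, Matrix]]) -> Tuple[Matrix, Matrix]:
    """Equal-weight average of the N nearest features.
    Each neighbor is a (mean, variance) pair of E x P matrices."""
    N = len(neighbors)
    E = len(neighbors[0][0])
    P = len(neighbors[0][0][0])
    new_mean = [[sum(nb[0][j][l] for nb in neighbors) / N for l in range(P)]
                for j in range(E)]
    new_var = [[sum(nb[1][j][l] for nb in neighbors) / N for l in range(P)]
               for j in range(E)]
    return new_mean, new_var


# Hypothetical 1 x 2 features for N = 2 nearest known sounds.
nearest = [
    ([[1.0, 3.0]], [[1.0, 1.0]]),
    ([[3.0, 5.0]], [[3.0, 1.0]]),
]
mean, var = average_features(nearest)
```

The variant in item (b) would add the tester's own LPCC matrix as an (N+1)-th term of the mean, leaving the variance averaged over the N neighbors only.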
TW98126015A 2009-08-03 2009-08-03 A speech recognition method for all languages without using samples TWI395200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW98126015A TWI395200B (en) 2009-08-03 2009-08-03 A speech recognition method for all languages without using samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW98126015A TWI395200B (en) 2009-08-03 2009-08-03 A speech recognition method for all languages without using samples

Publications (2)

Publication Number Publication Date
TW201106340A true TW201106340A (en) 2011-02-16
TWI395200B TWI395200B (en) 2013-05-01

Family

ID=44814314

Family Applications (1)

Application Number Title Priority Date Filing Date
TW98126015A TWI395200B (en) 2009-08-03 2009-08-03 A speech recognition method for all languages without using samples

Country Status (1)

Country Link
TW (1) TWI395200B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI460613B (en) * 2011-04-18 2014-11-11 Tze Fen Li A speech recognition method to input chinese characters using any language

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6807537B1 (en) * 1997-12-04 2004-10-19 Microsoft Corporation Mixtures of Bayesian networks
US20070033027A1 (en) * 2005-08-03 2007-02-08 Texas Instruments, Incorporated Systems and methods employing stochastic bias compensation and bayesian joint additive/convolutive compensation in automatic speech recognition
CN101079103A (en) * 2007-06-14 2007-11-28 上海交通大学 Human face posture identification method based on sparse Bayesian regression

Also Published As

Publication number Publication date
TWI395200B (en) 2013-05-01

Similar Documents

Publication Publication Date Title
JP6906067B2 (en) How to build a voiceprint model, devices, computer devices, programs and storage media
Kinnunen Spectral features for automatic text-independent speaker recognition
Zhan et al. Vocal tract length normalization for large vocabulary continuous speech recognition
TWI396184B (en) A method for speech recognition on all languages and for inputing words using speech recognition
CN111161702A (en) Personalized speech synthesis method and device, electronic equipment and storage medium
Pawar et al. Review of various stages in speaker recognition system, performance measures and recognition toolkits
Paulose et al. Performance evaluation of different modeling methods and classifiers with MFCC and IHC features for speaker recognition
Subhashree et al. Speech Emotion Recognition: Performance Analysis based on fused algorithms and GMM modelling
Hansen et al. Speaker height estimation from speech: Fusing spectral regression and statistical acoustic models
CN113782032B (en) Voiceprint recognition method and related device
Yousfi et al. Holy Qur'an speech recognition system Imaalah checking rule for warsh recitation
CN110838294B (en) Voice verification method and device, computer equipment and storage medium
JP5091202B2 (en) Identification method that can identify any language without using samples
Kinnunen Optimizing spectral feature based text-independent speaker recognition
TWI297487B (en) A method for speech recognition
CN114724589A (en) Voice quality inspection method and device, electronic equipment and storage medium
TW201106340A (en) A speech recognition method for all languages without using samples
Kupryjanow et al. Real-time speech signal segmentation methods
Bose et al. Robust speaker identification using fusion of features and classifiers
Koolwaaij Automatic speaker verification in telephony: a probabilistic approach
Mittal et al. Age approximation from speech using Gaussian mixture models
Pan et al. Mandarin vowel pronunciation quality evaluation by a novel formant classification method and its combination with traditional algorithms
JP4236502B2 (en) Voice recognition device
Hautamäki Fundamental Frequency Estimation and Modeling for Speaker Recognition
Mital Speech enhancement for automatic analysis of child-centered audio recordings

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees