Printed by the Employee Consumer Cooperative of the Central Standards Bureau, Ministry of Economic Affairs    322567    A7 B7
V. Description of the Invention (1)

The present invention provides a pitch-synchronous linear predictive speech synthesizer, in particular a low-data-rate, parametric synthesizer for use in the field of speech synthesis.

A speech synthesizer is a mathematical model that simulates the human vocal organs, so it can generate a digital synthetic speech signal from a set of input parameters. Most conventional synthesizers are built on the linear prediction model, for example LPC, MPLPC, RPLPC, and CELP; other methods include PSOLA.
By their characteristics these methods fall into two broad categories, parametric and non-parametric. Parametric synthesizers make it easier to adjust speech prosody and to compress speech data, but they introduce more distortion. Non-parametric synthesizers produce higher-quality speech, but they are less flexible in accepting prosody adjustment and need a large amount of data.

In a text-to-speech (TTS) system, prosodic information and spectral information are generated freely and passed to the speech synthesizer. The synthesizer in a TTS system must therefore be able to accept varying prosodic and spectral information; in other words, it must be capable of prosody modification. The LPC synthesizer is widely used in TTS systems precisely because it modifies prosody well. However, its model of the excitation is too simple, so the quality of its synthesized speech is poor. Modified synthesizers such as MPLPC, RPLPC, and CELP were therefore developed. They are all based on linear prediction but model the excitation with more complex models, and as a result they achieve good speech quality. They cannot, however, adjust the prosodic parameters of the excitation to synthesize speech with different prosody, so they cannot be used in TTS systems. The PSOLA synthesizer, by contrast, works directly in the time domain: it extracts the phonetic units of the speech waveform one by one with windows and then applies prosody modification to synthesize speech. Although it can modify prosody in a TTS system and its speech quality is good, its output quality degrades badly under large prosody modifications, and it requires a very large memory.

In short, each of the techniques used by the above synthesizers offers either easy parameter adjustment or high sound quality, or else requires a very large memory; none combines easy parameter adjustment, low data volume, and high speech quality. This has been a shortcoming in the development of speech synthesizers.

In view of these shortcomings of conventional speech synthesizers, the inventor, after many years of research, arrived at the present "Pitch-Synchronous Linear Predictive Speech Synthesizer".

The main object of the present invention is to provide a pitch-synchronous linear predictive speech synthesizer that combines low data volume, good speech quality, and parameter adjustability, and whose synthesized speech quality does not degrade when the parameters are adjusted, even by large amounts.
The principal feature of the present invention is that speech analysis is divided into voiced speech analysis and unvoiced speech analysis. In voiced speech analysis, a known-pitch-period output unit controls a pulse generator that models the excitation through an open-loop recursive computation, and a first random codebook models the excitation through a closed-loop recursive computation. These two computations yield, respectively, the positions and amplitudes of the optimal pitch-synchronous pulses, and the optimal pitch-synchronous codeword, its energy, and the long-term prediction coefficient of the voiced speech. In unvoiced speech analysis, a second random codebook models the unvoiced excitation signal and yields the optimal time-synchronous codeword and its energy. Prosody can then be adjusted on the basis of these parameters.
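As a rough illustration of how the analysed parameters allow prosody to be changed, the Python sketch below rebuilds a voiced excitation at a new pitch period and passes it through the short-term filter H(z) = 1/(1 + Σ a_i z^-i), realised in IIR direct form. It is a minimal sketch, not the patent's implementation. It relies on several assumptions that the text does not state: one pulse is placed at the start of each new period; the codeword is laid out from the start of the period; the long-term branch is modelled as v[n] = g2·a1[n] + r·v[n-P], with a one-period delay and unit gain on the fifth multiplier; and one fixed coefficient set a is used for the whole signal. The names VoicedPeriod, all_pole_filter, and synthesize_voiced are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VoicedPeriod:
    """Hypothetical record of the parameters analysed for one pitch period."""
    pulse_gain: float      # g1, amplitude of the pitch-synchronous pulse
    codeword: List[float]  # a1[n], codeword taken from the first random codebook
    codeword_gain: float   # g2, energy (gain) of the codeword
    ltp_coeff: float       # r, long-term prediction coefficient


def all_pole_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """IIR direct form of H(z) = 1 / (1 + sum_i a_i z^-i)."""
    y: List[float] = []
    for n, xn in enumerate(x):
        acc = xn
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * y[n - i]
        y.append(acc)
    return y


def synthesize_voiced(periods: Sequence[VoicedPeriod], new_period: int,
                      a: Sequence[float]) -> List[float]:
    """Rebuild the voiced excitation with every period set to new_period samples."""
    excitation: List[float] = []
    v: List[float] = []  # codeword + long-term branch, fed back with a one-period delay
    for p in periods:
        for k in range(new_period):
            c = p.codeword[k] if k < len(p.codeword) else 0.0
            delayed = v[-new_period] if len(v) >= new_period else 0.0  # v[n - P]
            v.append(p.codeword_gain * c + p.ltp_coeff * delayed)
            pulse = p.pulse_gain if k == 0 else 0.0  # assumed new pulse position
            excitation.append(v[-1] + pulse)
    return all_pole_filter(excitation, a)
```

Changing new_period alone moves the pulse positions while reusing the stored gains and codewords. This is the kind of parameter-level prosody adjustment the invention aims at. A fuller sketch would also update the LPC coefficients frame by frame.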
To achieve the above object, the present invention processes voiced speech and unvoiced speech separately. It generally comprises a voiced speech analysis device, a voiced speech synthesis device, an unvoiced speech analysis device, and an unvoiced speech synthesis device, wherein:

The voiced speech analysis device comprises a pitch-synchronous pulse analysis loop and a pitch-synchronous codeword analysis loop. In the pitch-synchronous pulse analysis loop, an output unit supplying a pitch period known in advance controls a pulse generator. The pulse generator produces an approximately periodic pitch pulse train of varying energy that models the excitation, and a first short-term prediction filter turns this train into a first-stage voiced synthetic speech signal. The original voiced speech is compared with this signal to produce a first-stage voiced speech error signal, which is fed back to control the pulse generator; in this way the positions and amplitudes of the optimal pitch pulses are computed. In the pitch-synchronous codeword analysis loop, a first random codebook, aligned with the above pitch pulse positions, models the excitation and assigns it an energy. A long-term prediction filter and a second short-term prediction filter then produce a second-stage voiced synthetic speech signal. The first-stage voiced speech error signal from the pulse analysis loop is compared with this second-stage signal to produce a second-stage voiced speech error signal. That error signal undergoes error masking in an error weighting filter and is fed back to the first random codebook and the long-term prediction filter, which yields the optimal pitch-synchronous codeword, the energy of that codeword, and the long-term prediction coefficient.

The voiced speech synthesis device receives the parameters obtained by the voiced speech analysis device: the amplitudes of the optimal pitch pulses, the optimal pitch-synchronous codeword, its energy, and the long-term prediction coefficient. It also receives the pitch period of the speech to be synthesized. It uses this new pitch period to generate new pitch pulse positions, adjusts the parameters accordingly, and synthesizes the voiced speech signal through a third short-term prediction filter.

The unvoiced speech analysis device uses a second random codebook to model the unvoiced excitation and assigns it an energy, and a fourth short-term prediction filter then outputs an unvoiced synthetic speech signal. This signal is subtracted from the original unvoiced speech to obtain an unvoiced speech error signal. The error signal undergoes error masking in an error weighting filter and is fed back to the second random codebook to adjust the energy, so that the optimal time-synchronous codeword of the unvoiced speech and its energy are computed.

The unvoiced speech synthesis device passes the optimal time-synchronous codeword and its energy, as obtained by the unvoiced speech analysis device, through a fifth short-term prediction filter to produce synthetic unvoiced speech corresponding to the original unvoiced speech.

Furthermore, the present invention may use a single speech synthesis device to receive the parameters analyzed by both the voiced and the unvoiced speech analysis devices. This speech synthesis device comprises a second multiplier, a third multiplier, a fourth multiplier, a fifth multiplier, a third adder, a fourth adder, a delay unit, and a third short-term prediction filter. The voiced speech analysis device obtains the optimal pitch pulse value, the optimal pitch-synchronous codeword with its energy, and the long-term prediction coefficient from the pitch period known in advance. These are energy-adjusted by the second, third, and fifth multipliers respectively, while the fourth multiplier serves as the input unit for the long-term prediction coefficient.
The optimal pitch-synchronous codeword and the long-term prediction coefficient, after energy adjustment by the third and fifth multipliers respectively, are summed by the third adder. The sum follows two paths. First, it is fed back through the delay unit (delay of one period) to the fourth multiplier, where it is multiplied by the long-term prediction coefficient. Second, it is sent to the fourth adder, added to the optimal pulse value energy-adjusted by the second multiplier, and passed to the third short-term prediction filter to synthesize the voiced speech signal. The optimal codeword and its energy obtained by the unvoiced speech analysis device are energy-adjusted by the third multiplier and then passed through the third adder, the fourth adder, and the third short-term prediction filter to synthesize unvoiced speech.

The technical means adopted by the present invention to achieve the above objects and features, and their effects, are described below by way of a preferred embodiment with reference to the drawings:

Figure 1 is a waveform of the original speech 『ㄕㄚ』 (sha).
Figure 2 is a block diagram of the voiced speech analysis of the preferred embodiment of the present invention.
Figure 3 is a block diagram of the voiced speech synthesis of the preferred embodiment.
Figure 4 is a block diagram of the unvoiced speech analysis of the preferred embodiment.
Figure 5 is a block diagram of the unvoiced speech synthesis of the preferred embodiment.
Figure 6 shows the first-stage analysis waveforms of the voiced speech analysis of the preferred embodiment.
Figure 7 is a flowchart of the voiced speech analysis of the preferred embodiment.
Figure 8 shows the second-stage analysis waveforms of the voiced speech analysis of the preferred embodiment.
Figure 9 shows the waveforms of the unvoiced speech analysis of the preferred embodiment.
Figure 10 is a flowchart of the unvoiced speech analysis of the preferred embodiment.
Figure 11 shows waveforms of the pitch-period adjustment of synthesized speech in the preferred embodiment.
Figures 12A and 12B show waveforms of the duration adjustment of synthesized speech in the preferred embodiment.

List of reference numerals:
10 voiced speech analysis device
11 pitch-synchronous pulse analysis loop
111 pulse generator
112 first short-term prediction filter
113 first adder
114 known-pitch-period output unit
12 pitch-synchronous codeword analysis loop
121 first random codebook
122 first multiplier
123 long-term prediction filter
124 second short-term prediction filter
125 second adder
126 third error weighting filter
20 voiced speech synthesis device
21 second multiplier
22 third multiplier
23 fourth multiplier
24 fifth multiplier
25 third adder
26 delay unit
27 fourth adder
28 third short-term prediction filter
29 new-pitch-period output unit
30 unvoiced speech analysis device
31 second random codebook
32 sixth multiplier
33 fourth short-term prediction filter
34 fifth adder
35 second error weighting filter
40 unvoiced speech synthesis device
41 seventh multiplier
42 fifth short-term prediction filter

For ease of explanation and to aid understanding of the embodiment, the elements common to speech synthesizers and their models are first described briefly:

(1) By its characteristics, speech can be divided into voiced sound (VOICE) and unvoiced or clear sound (UNVOICE). Figure 1 shows the waveforms of the original syllables 『ㄕ』 and 『ㄚ』. 『ㄕ』 (sh) is unvoiced (UNVOICE) and its waveform is irregular, while 『ㄚ』 (a) is voiced (VOICE) and its waveform varies periodically. The following embodiment uses the original speech 『ㄕ』『ㄚ』 as its example.

(2) Short-term prediction filter: models the shape of the oral and nasal cavities; it reflects the input signal and lets it decay over time. It is usually represented by its impulse response h[n], and in the z-domain by

$$H(z)=\frac{1}{1+\sum_{i} a_i z^{-i}},\qquad h[n]=\mathcal{Z}^{-1}\{H(z)\},$$

where $\mathcal{Z}^{-1}$ denotes the inverse z-transform. In this invention the short-term prediction filter is realized in two ways. The first short-term prediction filter 112 and the second short-term prediction filter 124 in the voiced speech analysis device 10, and the fourth short-term prediction filter 33 in the unvoiced speech analysis device 30, all use the first method. In this method H(z) is converted to h[n], which is truncated to the length of one period and used as the impulse response of an FIR filter, implemented by convolution.
The third short-term prediction filter 28 in the voiced speech synthesis device 20 and the fifth short-term prediction filter 42 in the unvoiced speech synthesis device 40 both use the second method, an IIR direct-form realization of H(z). The parameters a_i of H(z) are LPC coefficients. In this embodiment one set a_i, i = 1, 2, ..., 12, is computed from the speech signal of each frame, and the frame length is 10 ms (100 samples at a 10 kHz sampling rate).

(3) Long-term prediction filter: designed to further reduce signal distortion. It removes the part that is common to successive periods of the signal, so that after a periodic signal passes through it only the differences between periods remain. The smaller this remaining difference signal, the smaller the error of the synthesized speech. In the mathematical analysis, the long-term prediction filter is computed together with the short-term prediction filter; its z-domain transfer function is

$$H(z)=\frac{1}{1-r\,z^{-P}},$$

where P is the pitch period and r is the long-term prediction coefficient.

(4) Random codebook: used to model the excitation signal. It is Gaussian noise made up of the values +1, -1, and 0, about 2500 to 3000 samples long depending on the application. The codeword length is 25, 50, or 100 samples, a codeword starting point is taken every two samples, and a codeword is written mathematically as a[n].

(5) Error weighting filter: makes the noise in the synthesized speech less audible to the human ear, and is therefore also called an auditory perception model or error-signal masking (ERROR SIGNAL MASKING) filter. Its impulse response is written w[n], and in the z-domain

$$W(z)=\frac{1+\sum_{i} a_i z^{-i}}{1+\sum_{i} a_i c^{i} z^{-i}},\qquad w[n]=\mathcal{Z}^{-1}\{W(z)\},$$

where c is a constant.

Referring to Figures 2, 3, 4, and 5, the pitch-synchronous linear predictive speech synthesizer of the present invention generally comprises a voiced speech analysis device 10, a voiced speech synthesis device 20, an unvoiced speech analysis device 30, and an unvoiced speech synthesis device 40, wherein:

The voiced speech analysis device 10 comprises a first-stage pitch-synchronous pulse analysis loop 11 and a second-stage pitch-synchronous codeword analysis loop 12.

The pitch-synchronous pulse analysis loop 11 comprises a known-pitch-period output unit 114, a pulse generator 111, a first short-term prediction filter 112, and a first adder 113 (see also Figure 6). The pulse generator 111 uses the pitch period supplied by the known-pitch-period output unit 114 to generate a nearly periodic pitch pulse train δ[n] of varying energy (waveform (B) of Figure 6). This train models the excitation, that is, the airflow through the mouth during phonation. δ[n] is fed to the first short-term prediction filter 112, which synthesizes the first-stage voiced synthetic speech signal S0[n] (waveform (C) of Figure 6). The first adder 113 subtracts S0[n] from the original speech S[n] (waveform (A) of Figure 6) and outputs the first-stage voiced speech error signal e0[n] (waveform (D) of Figure 6). This error signal is fed back to the pulse generator 111 to adjust the energy and position of each period pulse. The first-stage pitch-synchronous pulse analysis loop 11 works mainly by pitch-synchronous analysis: for each period pulse, it finds the optimal pulse value by an optimization (OPTIMAL) method.
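The first-stage loop just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, and the helper names are hypothetical. The short-term coefficients a_i are taken as given. The impulse response h[n] is truncated to one pitch period P, following the FIR method of item (2). The pulse position is chosen by exhaustive search over caller-supplied candidates. For each candidate position n1, the pulse amplitude is the least-squares gain g1 = Σ S[n] h[n-n1] / Σ h²[n-n1] over the P samples n1 ... n1+P-1, and the candidate with the smallest error E1 is kept.

```python
from typing import List, Optional, Sequence, Tuple


def truncated_impulse_response(a: Sequence[float], length: int) -> List[float]:
    """First `length` samples of h[n] for H(z) = 1 / (1 + sum_i a_i z^-i)."""
    h: List[float] = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0          # unit impulse input
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                acc -= ai * h[n - i]          # all-pole recursion
        h.append(acc)
    return h


def pulse_gain(speech: Sequence[float], h: Sequence[float],
               n1: int, period: int) -> Tuple[float, float]:
    """Least-squares amplitude g1 of a pulse at n1 and the error E1 over n1..n1+P-1."""
    seg = speech[n1:n1 + period]
    num = sum(s * hk for s, hk in zip(seg, h))
    den = sum(hk * hk for hk in h[:len(seg)])
    g1 = num / den if den > 0.0 else 0.0
    err = sum((s - g1 * hk) ** 2 for s, hk in zip(seg, h))
    return g1, err


def search_pulse(speech: Sequence[float], h: Sequence[float],
                 candidates: Sequence[int], period: int) -> Tuple[int, float, float]:
    """Try every candidate position and keep the one with the smallest E1."""
    best: Optional[Tuple[int, float, float]] = None
    for n1 in candidates:
        g1, err = pulse_gain(speech, h, n1, period)
        if best is None or err < best[2]:
            best = (n1, g1, err)
    if best is None:
        raise ValueError("no candidate positions")
    return best
```

Truncating h[n] to P samples keeps each pulse's contribution inside its own analysis window. That is what makes this pulse-by-pulse search possible.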
This optimization proceeds as follows. Let the pitch pulse currently being processed lie at position n1 with amplitude g1, and let P be the known pitch period. Let S[n] be the original speech and S0[n] = g1 h[n - n1] the first-stage synthetic speech within the current period. The error is E1 = Σ e0²[n], with e0[n] = S[n] - S0[n]. When ∂E1/∂g1 = 0, g1 is the optimal pulse value g1,opt:

$$\frac{\partial E_1}{\partial g_1}=\frac{\partial}{\partial g_1}\sum_{n}\big(S[n]-S_0[n]\big)^2=0
\;\Longrightarrow\;
g_1=g_{1,\mathrm{opt}}=\frac{\sum_{n} S[n]\,h[n-n_1]}{\sum_{n} h^2[n-n_1]} \qquad (1)$$

This is the optimal pulse value sought by the pitch-synchronous pulse analysis loop 11. The optimal pulse position n1 is found by search. Each position within the possible range is substituted in turn, and the position that yields the minimum error E1 is the optimal synchronous pulse position. The analysis is performed pulse by pulse in a pitch-synchronous manner, so E1 is computed over n = n1, ..., n1 + P - 1, a total of P samples.

The pitch-synchronous codeword analysis loop 12 comprises a first random codebook 121, a first multiplier 122, a long-term prediction filter 123, a second short-term prediction filter 124, a second adder 125, and a third error weighting filter 126. The first random codebook 121 consists of a number of fixed-length first codewords a1[n] and supplies the modeled excitation. The first multiplier 122 takes a first codeword a1[n] from the first random codebook 121.
With the first codeword energy g2, and then through the long-term estimation filter 123 and the second short-term estimation filter 124 to simulate the second-stage synthesized speech signal St [η], and the second adder 125 will be The first-level speech error signal e3 [η] obtained by the first-level 20-cycle synchronous pulse analysis loop is subtracted from the second-level synthesized speech signal SiU] to output the second-level voiced speech error signal ei [n], and then After the second error weighting filter 126 sends the second-level speech error weighted signal, the second-level speech error weighted signal h'U is returned to the first random code book page 13 The standard is applicable to China National Standard (CNS) Α4 specification (210Χ297mm) JA binding (1 ί (please read the precautions on the back before filling in this page) 322567 printed by Beigong Consumer Cooperative of Central Bureau of Standards of the Ministry of Economy V. Description of invention (//) 01 121, the long-term estimation filter 124 and adjustment performance The most g2. In the second stage of the base synchronization codeword analysis, the way 12 is mainly based on the best base cycle synchronization pulse position obtained in the first stage, and the best base cycle synchronization code word is obtained by the base cycle synchronization analysis method. 
, its energy, and the long-term prediction coefficient. These are computed as follows.

To obtain the long-term prediction coefficient $r$, first let the error signal obtained from the first-stage analysis be $e_e[n] = S_1[n]$, let $h[n]$ be the combined impulse response of the long-term prediction filter 123 and the second short-term prediction filter 124, and let $w[n]$ be the impulse response of the first error weighting filter 126. In the second-stage analysis, let $P[n]$ be the accumulated first codewords analyzed up to the previous period; passing $P[n]$ through the long-term prediction filter 123 and the second short-term prediction filter 124 gives the synthesized signal $S_{11}[n]$. The long-term prediction coefficient $r$ of the current period is then obtained by setting $\partial E_2/\partial r = 0$:

$$E_2 = \sum_n \big[(S_1[n] - S_{11}[n]) * w[n]\big]^2, \qquad S_{11}[n] = r\,P[n] * h[n]$$

$$r = \frac{\sum_n S_1'[n]\,P'[n]}{\sum_n (P'[n])^2} \tag{2}$$

where $*$ denotes convolution, $S_1'[n] = S_1[n] * w[n]$, and $P'[n] = P[n] * h[n] * w[n]$.

The optimal pitch-synchronous codeword $a_{1,\mathrm{opt}}[n]$ and the optimal pitch-synchronous codeword energy $g_{2,\mathrm{opt}}$ are computed as follows. First, let the second-stage original speech $S_1[n]$ minus the long-term contribution $S_{11}[n]$ be the second-stage voiced speech error signal $e_{11}[n]$; that is, $e_{11}[n]$ is the first-stage voiced speech error signal $e_e[n]$ minus $S_{11}[n]$. Setting $\partial E_3/\partial g_2 = 0$ then gives the optimal codeword energy $g_2 = g_{2,\mathrm{opt}}$:

$$E_3 = \sum_n \big[(e_{11}[n] - S_{12}[n]) * w[n]\big]^2, \qquad S_{12}[n] = g_2\,a_1[n] * h[n] \tag{3}$$

$$\frac{\partial E_3}{\partial g_2} = \frac{\partial}{\partial g_2}\sum_n \big(e_{11}[n]*w[n] - S_{12}[n]*w[n]\big)^2 = 0$$

$$g_{2,\mathrm{opt}} = \frac{\sum_n e_{11}'[n]\,a_1'[n]}{\sum_n (a_1'[n])^2} \tag{4}$$

where $e_{11}'[n] = e_{11}[n] * w[n]$ and $a_1'[n] = a_1[n] * h[n] * w[n]$.

More specifically, referring to Figs. 7 and 8: at the start, the original speech signal $S_o[n]$ of one pitch period is supplied; the first-stage pitch-synchronous pulse analysis loop 11 determines the optimal pulse value $g_{1,\mathrm{opt}}$ and its position, and in the second-stage pitch-synchronous codeword analysis loop 12 the long-term prediction coefficient
$r$ is first obtained. The first-stage voiced speech error signal $e_e[n]$ produced by the first-stage pitch-synchronous pulse analysis loop 11 is fed into the second-stage pitch-synchronous codeword analysis loop 12, where it is treated as the second-stage voiced original speech $S_1[n]$; subtracting the long-term contribution $S_{11}[n]$ gives $e_{11}[n]$. A first codeword $a_1[n]$ drawn from the first random codebook 121 is substituted into formula (4) to obtain the optimal codeword energy $g_{2,\mathrm{opt}}$, which is then substituted into formula (3) to compute the weighted error $E_3$ between $e_{11}[n]$ and the synthesized signal $S_{12}[n]$ contributed by the codeword; this error is recorded. By full search (FULL-SEARCH), each first codeword $a_1[n]$ of the first random codebook is substituted into formula (4) in turn, and the corresponding weighted error $E_3$ is computed and recorded in the same way. The codeword with the smallest weighted error $E_3$ is the optimal synchronous codeword $a_{1,\mathrm{opt}}[n]$, and its energy is the optimal synchronous codeword energy $g_{2,\mathrm{opt}}$; $a_{1,\mathrm{opt}}[n]$ and $g_{2,\mathrm{opt}}$ are then stored. The original speech signal of the next period is then supplied, and the above procedure is repeated to find and record the optimal codeword and the optimal codeword energy for that signal.
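The two-stage voiced analysis described above (formula (1) with the pulse-position search, formula (2) for $r$, and formulas (3) and (4) with the full codebook search) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: signals are plain lists, filters are represented by truncated impulse responses `h` (the same `h` is used in both stages for brevity), `w` is the weighting filter's impulse response, `p_prev` stands in for $P[n]$, and all function names are hypothetical.

```python
# Hypothetical sketch of the two-stage voiced analysis for one pitch period.
# Signals are lists of floats; h and w are truncated impulse responses.


def conv(x, h, n_out):
    """Linear convolution x * h, truncated to the first n_out samples."""
    y = [0.0] * n_out
    for i, xi in enumerate(x[:n_out]):
        for k, hk in enumerate(h):
            if i + k >= n_out:
                break
            y[i + k] += xi * hk
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def pulse_search(speech, h, P, candidates):
    """Stage 1: formula (1) gain at each candidate n1, keep the smallest E1.

    E1 is taken over the P samples n1 .. n1+P-1, so each candidate must
    satisfy n1 + P <= len(speech).
    """
    hk = (list(h) + [0.0] * P)[:P]            # h[n - n1] on that window
    eh = dot(hk, hk)
    best = None
    for n1 in candidates:
        seg = speech[n1:n1 + P]
        g1 = dot(seg, hk) / eh                # formula (1)
        e1 = sum((s - g1 * v) ** 2 for s, v in zip(seg, hk))
        if best is None or e1 < best[2]:
            best = (n1, g1, e1)
    return best


def ltp_coefficient(s1, p_prev, h, w):
    """Formula (2): r = sum(S1'[n] P'[n]) / sum(P'[n]^2)."""
    n = len(s1)
    s1w = conv(s1, w, n)                      # S1'[n] = S1[n] * w[n]
    pw = conv(conv(p_prev, h, n), w, n)       # P'[n] = P[n] * h[n] * w[n]
    den = dot(pw, pw)
    return dot(s1w, pw) / den if den > 0.0 else 0.0


def codebook_search(e11, codebook, h, w):
    """Formulas (3)-(4): full search, optimal gain per codeword, smallest E3."""
    n = len(e11)
    tw = conv(e11, w, n)                      # e11'[n] = e11[n] * w[n]
    best = None
    for idx, a1 in enumerate(codebook):
        aw = conv(conv(a1, h, n), w, n)       # a1'[n] = a1[n] * h[n] * w[n]
        den = dot(aw, aw)
        if den == 0.0:
            continue
        g2 = dot(tw, aw) / den                # formula (4)
        e3 = sum((t - g2 * v) ** 2 for t, v in zip(tw, aw))   # formula (3)
        if best is None or e3 < best[2]:
            best = (idx, g2, e3)
    return best


def analyze_period(speech, candidates, P, h, w, p_prev, codebook):
    """Run both stages for one period; p_prev stands in for P[n]."""
    n1, g1, _ = pulse_search(speech, h, P, candidates)
    hk = (list(h) + [0.0] * P)[:P]
    s1 = [s - g1 * v for s, v in zip(speech[n1:n1 + P], hk)]   # e_e[n] = S1[n]
    r = ltp_coefficient(s1, p_prev, h, w)
    s11 = [r * v for v in conv(p_prev, h, P)]                  # S11[n]
    e11 = [a - b for a, b in zip(s1, s11)]                     # e11[n]
    idx, g2, _ = codebook_search(e11, codebook, h, w)
    return {"n1": n1, "g1": g1, "r": r, "codeword": idx, "g2": g2}
```

Because each candidate is scored with the gain that is optimal for it, the searches only need to compare residual errors. Convolution is linear, so $\sum_n (e_{11}'[n] - g_2 a_1'[n])^2$ is exactly $E_3$ of formula (3).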
In total, the voiced speech analysis device 10 thus obtains the position and amplitude of the optimal pitch pulse, the optimal first codeword, the optimal first codeword energy, and the long-term prediction coefficient. All of the first codeword energies, optimal pitch pulses, and long-term prediction coefficients obtained are normalized (NORMALIZED), so that during synthesis the normalized codeword energies, pulse amplitudes, and long-term prediction coefficients can simply be multiplied by the energy of the speech to be synthesized, without regard to energy differences in the analyzed speech. These data are compressed by software and stored in memory (not shown) to form a database.

Referring to Fig. 3, the voiced speech synthesis device 20 receives the optimal pitch pulse value, the optimal first codeword $a_{1,\mathrm{opt}}[n]$, the optimal first codeword energy, and the long-term prediction coefficient $r$ obtained by the voiced speech analysis device 10. The second multiplier 21, the third multiplier 22, and the fifth multiplier 24 multiply by the energy for energy adjustment, while the fourth multiplier 23 serves as the input unit for the long-term prediction coefficient $r$. The optimal codeword $a_{1,\mathrm{opt}}[n]$ and the long-term prediction coefficient $r$ are energy-adjusted by the third multiplier 22 and the fifth multiplier 24, respectively, and summed by the third adder 25. The sum is, on the one hand, fed back through a delay 26 (with a delay of one period) to the fourth multiplier 23 and multiplied by the long-term prediction coefficient; on the other hand, it is sent to the fourth adder 27, where it is added to the optimal periodic pulse, which is controlled by the new pitch value output device 29 and energy-adjusted by the second multiplier 21. The result is sent to the third short-term
prediction filter 28, which synthesizes the voiced speech signal; its waveform is shown in Fig. 8(G). In Fig. 8, Fig. 8(E) is the original waveform of 'ㄚ', and Fig. 8(F) is the synthesized speech obtained from the first-stage pitch-synchronous pulse analysis loop 11 alone. Comparing Figs. 8(G), 8(E), and 8(F) shows clearly that the synthesized waveform obtained through both the first- and second-stage analysis loops is better than the one obtained through the first-stage loop alone. Fig. 8(H) is the error signal obtained by comparing Fig. 8(G) with Fig. 8(E).

The foregoing are the technical means the invention uses for voiced speech analysis and synthesis. Unvoiced (voiceless) speech is analyzed as described below, with reference to Figs. 4 and 5.
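Before turning to the unvoiced path, the voiced synthesis of device 20 can be sketched as follows. This is one hypothetical reading of Fig. 3, not the patent's code. It assumes that each period begins with its pitch pulse, that the period lengths come from a list standing in for the new pitch value output device 29, that codewords and the one-period-delayed signal are truncated or zero-padded to the new length, and that the third short-term prediction filter 28 is an all-pole filter with illustrative coefficients `lpc`.

```python
# Hypothetical sketch of voiced synthesis device 20, one reading of Fig. 3.


def all_pole(x, a, state):
    """Short-term filter 1/A(z) with A(z) = 1 + sum_k a[k-1] z^-k.

    `state` holds past outputs (newest last); returns (y, new_state).
    """
    hist = list(state)
    y = []
    for xn in x:
        acc = xn
        for k, ak in enumerate(a, start=1):
            if k <= len(hist):
                acc -= ak * hist[-k]
        y.append(acc)
        hist.append(acc)
    return y, (hist[-len(a):] if a else [])


def fit(v, n):
    """Truncate or zero-pad v to n samples (assumed handling of a new pitch)."""
    return (list(v) + [0.0] * n)[:n]


def synthesize_voiced(periods, new_pitch, energy, lpc):
    """periods: dicts with normalized 'g1', 'a1' (list), 'g2', 'r' per period.

    new_pitch: period lengths in samples (>= 1), standing in for device 29.
    """
    out, u_prev, state = [], [], []
    for p, P in zip(periods, new_pitch):
        code = [energy * p["g2"] * v for v in fit(p["a1"], P)]   # multiplier 22
        r_eff = energy * p["r"]                                   # multiplier 24
        delayed = fit(u_prev, P)                                  # delay 26
        u = [c + r_eff * d for c, d in zip(code, delayed)]        # multiplier 23, adder 25
        exc = list(u)
        exc[0] += energy * p["g1"]      # multiplier 21 + adder 27: pulse at period start
        y, state = all_pole(exc, lpc, state)                      # filter 28
        out.extend(y)
        u_prev = u                      # fed back through the delay for the next period
    return out
```

In this sketch, passing a different `new_pitch` list changes only the spacing of the pitch pulses and codewords, and `energy` rescales the normalized parameters, as the text describes.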
The unvoiced speech analysis device 30 includes a second random codebook 31, a sixth multiplier 32, a fourth short-term prediction filter 33, a fifth adder 34, and a second error weighting filter 35. Referring to Fig. 9, the second random codebook 31 uses codewords $a_2[n]$ of fixed length to simulate the excitation signal of unvoiced speech. The sixth multiplier 32 multiplies a second codeword $a_2[n]$ drawn from the second random codebook 31 by an energy $g_3$ (the energy-scaled codeword is shown in Fig. 9(J)) and sends the product to the fourth short-term prediction filter 33, which outputs an unvoiced synthesized speech signal $S_s'[n]$. The fifth adder 34 subtracts this synthesized signal from the original unvoiced speech $S_o'[n]$ (waveform in Fig. 9(I)) to obtain an unvoiced speech error signal $e_o'[n]$, which is processed by the second error weighting filter 35; the resulting unvoiced weighted error signal $e_o''[n]$ is fed back to the second random codebook 31 and used to adjust the energy $g_3$. The main task of the unvoiced speech analysis device 30 is to find the optimal codeword $a_{2,\mathrm{opt}}[n]$ of the second random codebook 31 and the optimal
codeword energy $g_{3,\mathrm{opt}}$, computed as follows. Let $w[n]$ be the impulse response of the second error weighting filter 35 and $S_o'[n]$ the original unvoiced speech. Setting $\partial E_4/\partial g_3 = 0$ gives the optimal codeword energy $g_3 = g_{3,\mathrm{opt}}$, where $E_4 = \sum_{n=1}^{N} (e_o''[n])^2$ and, in this embodiment, $N$ is fixed at 50:

$$E_4 = \sum_n \big[(S_o'[n] - S_s'[n]) * w[n]\big]^2, \qquad S_s'[n] = g_3\,a_2[n] * h[n] \tag{5}$$

$$\frac{\partial E_4}{\partial g_3} = \frac{\partial}{\partial g_3}\sum_n \big(S_o'[n]*w[n] - S_s'[n]*w[n]\big)^2 = 0$$

$$g_{3,\mathrm{opt}} = \frac{\sum_n S_o''[n]\,a_2'[n]}{\sum_n (a_2'[n])^2} \tag{6}$$

where $h[n]$ is the impulse response of the fourth short-term prediction filter 33, $S_o''[n] = S_o'[n] * w[n]$, and $a_2'[n] = a_2[n] * h[n] * w[n]$.

Thus, starting from a segment of original unvoiced speech $S_o'[n]$, a second random codeword $a_2[n]$ drawn from the second random codebook 31 is substituted into formula (6) to obtain the optimal codeword energy $g_{3,\mathrm{opt}}$, which is then substituted into formula (5) to compute the weighted error $E_4$ between the original speech $S_o'[n]$ and the unvoiced synthesized signal $S_s'[n]$; this error is recorded. By full search, each codeword $a_2[n]$ of the second random codebook is substituted into formula (6) in turn, and the corresponding error is computed from formula (5) and recorded. Comparing these errors, the codeword with the smallest error is the optimal codeword $a_{2,\mathrm{opt}}[n]$, and its energy is the optimal codeword energy $g_{3,\mathrm{opt}}$.
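The unvoiced search above can be sketched as follows, assuming 50-sample segments at 10 kHz as in the embodiment (so each segment spans 5 ms). The Gaussian codebook, the filter responses `h` and `w`, and all function names are illustrative assumptions; the text does not say how the second random codebook is generated.

```python
# Hypothetical sketch of unvoiced analysis device 30: fixed 50-sample segments,
# a Gaussian random codebook (an assumption) and full search with formula (6).
import random

FRAME = 50          # samples per segment in the described embodiment
FS = 10_000         # sampling rate in Hz, so FRAME / FS = 5 ms


def conv(x, h, n_out):
    """Linear convolution x * h, truncated to the first n_out samples."""
    y = [0.0] * n_out
    for i, xi in enumerate(x[:n_out]):
        for k, hk in enumerate(h):
            if i + k >= n_out:
                break
            y[i + k] += xi * hk
    return y


def make_codebook(size, length=FRAME, seed=0):
    """Hypothetical stand-in for codebook 31: i.i.d. Gaussian codewords."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


def search_segment(seg, codebook, h, w):
    """Return (index, g3) of the codeword minimizing E4 of formula (5)."""
    n = len(seg)
    target = conv(seg, w, n)                       # So''[n] = So'[n] * w[n]
    best = None
    for idx, a2 in enumerate(codebook):
        aw = conv(conv(a2, h, n), w, n)            # a2'[n] = a2[n] * h[n] * w[n]
        den = sum(v * v for v in aw)
        if den == 0.0:
            continue
        g3 = sum(t * v for t, v in zip(target, aw)) / den          # formula (6)
        e4 = sum((t - g3 * v) ** 2 for t, v in zip(target, aw))    # formula (5)
        if best is None or e4 < best[2]:
            best = (idx, g3, e4)
    return best[:2]


def analyze_unvoiced(speech, codebook, h, w, frame=FRAME):
    """Analyze consecutive fixed-length segments; a trailing partial one is ignored."""
    return [search_segment(speech[i:i + frame], codebook, h, w)
            for i in range(0, len(speech) - frame + 1, frame)]
```

Each segment yields one (codeword index, energy) pair, which is the data the text stores per unvoiced segment.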
The optimal codeword $a_{2,\mathrm{opt}}[n]$ and the optimal codeword energy $g_{3,\mathrm{opt}}$ are then compressed by software and stored in the system memory; the next segment of original unvoiced speech is then supplied, and the above procedure is repeated to find and record the optimal codeword and optimal codeword energy for that segment. Each segment of original unvoiced speech $S_o'[n]$ to be analyzed has a fixed length of 50 samples, and each second random codeword $a_2[n]$ drawn from the second random codebook 31 likewise has a fixed length of 50 samples. In this embodiment, the speech sampling rate is 10 kHz.

The unvoiced speech synthesis device 40 receives the optimal codeword $a_{2,\mathrm{opt}}[n]$ and the optimal codeword energy $g_{3,\mathrm{opt}}$ obtained by the unvoiced speech analysis device 30. After energy adjustment by a seventh multiplier 41, the codeword passes through the fifth short-term prediction filter 42 to synthesize the unvoiced speech signal (waveform in Fig. 9(K)); the error between Fig. 9(K) and the original unvoiced waveform of Fig. 9(I) is shown in Fig. 9(L).

The voiced speech synthesis device 20 and the unvoiced speech synthesis device 40 together perform a complete speech synthesis, which can be output through a loudspeaker (not shown).
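The unvoiced synthesis path of device 40 (energy adjustment by the seventh multiplier 41, then the fifth short-term prediction filter 42) amounts to scaling each stored codeword and filtering it. The following is a minimal sketch, assuming an all-pole filter with illustrative coefficients `lpc` whose state carries over from one segment to the next; the function names are hypothetical.

```python
# Hypothetical sketch of unvoiced synthesis device 40 (multiplier 41 + filter 42).


def all_pole(x, a, state):
    """Short-term filter 1/A(z) with A(z) = 1 + sum_k a[k-1] z^-k.

    `state` holds past outputs (newest last); returns (y, new_state).
    """
    hist = list(state)
    y = []
    for xn in x:
        acc = xn
        for k, ak in enumerate(a, start=1):
            if k <= len(hist):
                acc -= ak * hist[-k]
        y.append(acc)
        hist.append(acc)
    return y, (hist[-len(a):] if a else [])


def synthesize_unvoiced(params, codebook, lpc, energy=1.0):
    """params: (codeword index, g3) per segment, e.g. from the unvoiced analysis."""
    out, state = [], []
    for idx, g3 in params:
        exc = [energy * g3 * v for v in codebook[idx]]    # multiplier 41
        y, state = all_pole(exc, lpc, state)             # filter 42, state carried over
        out.extend(y)
    return out
```

Carrying the filter state across segments avoids restarting the filter every 50 samples; whether the patent's filter 42 does this is not stated, so it is an assumption here.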
The unvoiced speech synthesis device 40 may also be implemented by the voiced speech synthesis device 20 under software control, so that when an unvoiced speech signal is synthesized only the optimal codeword $a_{2,\mathrm{opt}}[n]$ and the optimal codeword energy $g_{3,\mathrm{opt}}$ pass through the third multiplier 22, the third adder 25, the fourth adder 27, and the third short-term prediction filter 28 to synthesize the unvoiced speech. In that case the third adder 25 and the fourth adder 27 have no effect, because no other signal is added when unvoiced speech is processed. Likewise, the first random codebook and the second random codebook may be realized as a single random codebook under software control.

With the above structure, the invention obtains five parameters related to the voiced speech signal: the position and the amplitude of the optimal periodic pulse, the optimal pitch-synchronous codeword, the optimal pitch-synchronous codeword energy, and the long-term prediction coefficient. The parameters related to the unvoiced speech signal are the optimal timing codeword and the optimal timing codeword energy. Based on these parameters, the following adjustments can be made when synthesizing speech:

(1) Energy adjustment: the desired energy is simply multiplied by the periodic pulse signal, the optimal pitch-synchronous codeword, and the long-term prediction coefficient of voiced speech, or by the optimal timing codeword of unvoiced speech.
(2) Pitch adjustment: the pitch is adjusted by directly changing the new pitch value output device 29 in the voiced speech synthesis device 20, which changes the frequency of the input pitch pulses and codewords. Fig. 11 shows synthesized speech after pitch adjustment of the original speech: Fig. 11(M) is the original speech; in (N) the frequency is raised, so the synthesized speech resembles a female voice; in (P) the frequency is lowered, so the synthesized speech resembles an adult male voice; and (O) is a more neutral pitch.

(3) Length adjustment: speech is lengthened or shortened by repeatedly inserting or deleting whole periods of the excitation signal. The period excitation signal closest to the optimal periodic pulse positions is selected and either deleted (in Fig. 12A, (Q) is the original excitation signal and (R) the excitation signal after deletion) or inserted ((S) is the original excitation signal and (T) the excitation signal after insertion). For example, if the analysis yields 10 pitch periods, these 10 sets of data are copied or deleted: to lengthen the sound by 0.5 times, one period is duplicated for every two periods; to shorten it by half, one period is removed for every two periods.

Therefore, compared with conventional synthesizers, the invention not only allows its parameters to be adjusted to obtain better synthesized speech; as the waveforms of Figs. 6, 8, and 9 and Table 1 below show, the pitch-synchronous speech synthesizer of the invention also achieves better sound quality while its parameters remain adjustable,
and even when the parameters are adjusted excessively, the sound quality is not degraded.

| Synthesizer | Complexity | Data rate | Parameter adjustment | Sound quality | Sound quality after excessive parameter adjustment |
|---|---|---|---|---|---|
| LPC | Medium | Low | Possible | Poor | Almost unchanged |
| MPLPC | Medium | High | Not possible | Good | - |
| CELP | High | Medium | Not possible | Good | - |
| PSOLA | Low | Very high | Possible | Good | Degrades |
| Present invention | Medium | Medium | Possible | Good | Almost unchanged |

Table 1

In summary, the pitch-synchronous linear predictive speech synthesizer of the invention can indeed achieve the intended purpose and effect through the structure and devices disclosed above. It had not appeared in any publication or been publicly used before this application, and it meets the requirements for an invention patent, including inventiveness, novelty, and progressiveness.

However, the drawings and description above are merely embodiments of the invention and are not intended to limit its implementation. Other equivalent changes or modifications made by persons skilled in the art according to the features and scope of the invention shall all fall within the scope of the claims of this application set out below.