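317631 A7 B7
V. Description of the Invention
(Printed by the Employees' Consumer Cooperative of the Central Bureau of Standards, Ministry of Economic Affairs. Paper size: Chinese National Standard (CNS) A4, 210 x 297 mm.)

[Technical Field of the Invention]
The present invention relates to a speech encoding device and a speech encoding/decoding device that compress and encode a speech signal into a digital signal.

[Prior Art]
Fig. 9 shows an example of the overall configuration of a conventional speech encoding/decoding device that separates the input speech into spectral envelope information and excitation signal information and encodes the excitation signal information frame by frame. It is the same as the device shown in Japanese Patent Laid-Open No. 64-40899.

In the figure, 1 is an encoder, 2 is a decoder, 3 is a multiplexer, 4 is demultiplexing means, 5 is the input speech, 6 is a transmission line, and 7 is the output speech. The encoder 1 consists of elements 8 to 15: 8 is linear prediction parameter analysis means, 9 is linear prediction parameter encoding means, 10 is an adaptive excitation codebook, 11 is adaptive excitation search means, 12 is error signal generation means, 13 is a driving excitation codebook, 14 is driving excitation search means, and 15 is excitation signal generation means. The decoder 2 consists of elements 16 to 22: 16 is linear prediction parameter decoding means, 17 is an adaptive excitation codebook, 18 is adaptive excitation decoding means, 19 is a driving excitation codebook, 20 is driving excitation decoding means, 21 is excitation signal generation means, and 22 is a synthesis filter.

The operation of this conventional device, which separates the input speech into spectral envelope information and excitation signal information and encodes the excitation signal information frame by frame, is described below.

First, in the encoder 1, a digital speech signal sampled at, for example, 8 kHz is received as the input speech 5. The linear prediction parameter analysis means 8 analyzes the input speech 5 and extracts linear prediction parameters, which are the spectral envelope information of the speech. The linear prediction parameter encoding means 9 then quantizes the extracted linear prediction parameters, outputs the corresponding code to the multiplexer 3, and outputs the quantized linear prediction parameters to the adaptive excitation search means 11, the error signal generation means 12, and the driving excitation search means 14.

Next, the encoding of the excitation signal information is described. The adaptive excitation codebook 10 stores the previously generated excitation signal received from the excitation signal generation means 15, and outputs to the adaptive excitation search means 11 an adaptive excitation vector of frame length corresponding to the delay parameter l received from the adaptive excitation search means 11. Here, the adaptive excitation vector is obtained by cutting a frame-length excitation signal out of the past excitation starting l samples earlier. When the delay parameter l is shorter than the frame length, the vector is generated by repeating the l-sample excitation signal until the frame length is reached. Fig. 10(a) shows an example of the adaptive excitation vector when l >= frame length, and Fig. 10(b) shows an example when l < frame length.

For each delay parameter l in a range of, for example, 20 <= l <= 128, the adaptive excitation search means 11 performs linear prediction synthesis on the adaptive excitation vector received from the adaptive excitation codebook 10, using the quantized linear prediction parameters received from the linear prediction parameter encoding means 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the input speech vector, cut out of the input speech 5 frame by frame, and the synthesized speech vector. It compares and evaluates the distortions to find the delay parameter L that minimizes the distortion and the corresponding adaptive excitation gain β, outputs the codes of the delay parameter L and the adaptive excitation gain β to the multiplexer 3, generates an adaptive excitation signal by multiplying the adaptive excitation vector corresponding to the delay parameter L by the adaptive excitation gain β, and outputs this signal to the error signal generation means 12 and the excitation signal generation means 15.

The error signal generation means 12 performs linear prediction synthesis on the adaptive excitation signal received from the adaptive excitation search means 11, using the quantized linear prediction parameters received from the linear prediction parameter encoding means 9, to generate a synthesized speech vector. It then obtains the error signal vector, which is the difference between the input speech vector cut out of the input speech 5 frame by frame and the synthesized speech vector, and outputs it to the driving excitation search means 14.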
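The way the conventional adaptive excitation codebook 10 builds a frame-length vector from the past excitation can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `adaptive_vector`, the list-based buffer, and the integer delay are hypothetical choices for the example and do not come from the specification.

```python
def adaptive_vector(past_excitation, delay, frame_len):
    """Build a frame-length adaptive excitation vector (illustrative sketch).

    past_excitation: previously generated excitation samples, oldest first.
    delay: integer delay parameter l, in samples (assumed 1 <= l <= len(past)).
    frame_len: number of samples in one frame.
    """
    if not 1 <= delay <= len(past_excitation):
        raise ValueError("delay must lie inside the stored past excitation")
    start = len(past_excitation) - delay  # position l samples earlier
    if delay >= frame_len:
        # l >= frame length: cut a frame-length segment starting l samples back
        return past_excitation[start:start + frame_len]
    # l < frame length: repeat the last l samples until the frame is filled
    segment = past_excitation[start:]
    return [segment[n % delay] for n in range(frame_len)]


past = list(range(10))                # stand-in for stored excitation samples
short = adaptive_vector(past, 3, 8)   # [7, 8, 9, 7, 8, 9, 7, 8]
long_ = adaptive_vector(past, 6, 4)   # [4, 5, 6, 7]
```

The two calls correspond to the cases of Fig. 10(b) and Fig. 10(a), respectively.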
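The driving excitation codebook 13 stores, for example, N driving excitation vectors generated from random noise, and outputs the driving excitation vector corresponding to the driving excitation code i received from the driving excitation search means 14. For each of the N driving excitation vectors, the driving excitation search means 14 performs linear prediction synthesis on the driving excitation vector received from the driving excitation codebook 13, using the quantized linear prediction parameters received from the linear prediction parameter encoding means 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the error signal vector received from the error signal generation means 12 and the synthesized speech vector. It compares and evaluates the distortions to find the driving excitation code I that minimizes the distortion and the corresponding driving excitation gain γ, outputs the codes of the driving excitation code I and the driving excitation gain γ to the multiplexer 3, generates a driving excitation signal by multiplying the driving excitation vector corresponding to the driving excitation code I by the driving excitation gain γ, and outputs this signal to the excitation signal generation means 15.

The excitation signal generation means 15 adds the adaptive excitation signal received from the adaptive excitation search means 11 and the driving excitation signal received from the driving excitation search means 14 to generate the excitation signal, and outputs it to the adaptive excitation codebook 10.

After the above encoding is finished, the multiplexer 3 sends onto the transmission line 6 the code corresponding to the quantized linear prediction parameters and the codes corresponding to the delay parameter L, the driving excitation code I, and the excitation gains β and γ.

Next, the operation of the decoder 2 is described.

The demultiplexing means 4, which receives the output of the multiplexer 3, outputs the code of the linear prediction parameters to the linear prediction parameter decoding means 16, the codes of the delay parameter L and the excitation gain β to the adaptive excitation decoding means 18, and the codes of the driving excitation code I and the excitation gain γ to the driving excitation decoding means 20.

The linear prediction parameter decoding means 16 decodes the linear prediction parameters from their code and outputs them to the synthesis filter 22. The adaptive excitation decoding means 18 reads from the adaptive excitation codebook 17 the adaptive excitation vector corresponding to the delay parameter L, decodes the adaptive excitation gain β from its code, generates an adaptive excitation signal by multiplying the adaptive excitation vector by the adaptive excitation gain β, and outputs it to the excitation signal generation means 21. The driving excitation decoding means 20 reads from the driving excitation codebook 19 the driving excitation vector corresponding to the driving excitation code I, decodes the driving excitation gain γ from its code, generates a driving excitation signal by multiplying the driving excitation vector by the driving excitation gain γ, and outputs it to the excitation signal generation means 21.

The excitation signal generation means 21 adds the adaptive excitation signal received from the adaptive excitation decoding means 18 and the driving excitation signal received from the driving excitation decoding means 20 to generate the excitation signal, and outputs it to the adaptive excitation codebook 17 and the synthesis filter 22.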
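The analysis-by-synthesis search performed by the conventional driving excitation search means 14 can be illustrated roughly as below. The sketch makes several assumptions that are not in the specification: the perceptual weighting is omitted and a plain squared error is used instead, the synthesis filter is written with the sign convention s[n] = e[n] + sum of a[k] * s[n-1-k], the filter starts from a zero state, and the names `lp_synthesize` and `search_codebook` are hypothetical.

```python
def lp_synthesize(excitation, lpc):
    """All-pole synthesis with zero initial state (sign convention assumed):
    s[n] = e[n] + sum_k lpc[k] * s[n - 1 - k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain, distortion) of the codebook vector whose
    gain-scaled synthesis is closest to the target (unweighted squared error)."""
    best = None
    for index, vector in enumerate(codebook):
        synth = lp_synthesize(vector, lpc)
        energy = sum(y * y for y in synth)
        if energy == 0.0:
            continue  # an all-zero synthesis cannot match any target
        gain = sum(t * y for t, y in zip(target, synth)) / energy  # optimal gain
        distortion = sum((t - gain * y) ** 2 for t, y in zip(target, synth))
        if best is None or distortion < best[2]:
            best = (index, gain, distortion)
    return best


lpc = [0.5]
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
target = [2.0 * y for y in lp_synthesize(codebook[1], lpc)]
index, gain, distortion = search_codebook(target, codebook, lpc)  # index 1, gain 2.0
```

In the conventional device the target passed to this search would be the error signal vector from the error signal generation means 12. Every candidate is synthesized over the whole frame, which is the computational cost that the invention sets out to reduce.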
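The synthesis filter 22 performs linear prediction synthesis on the excitation signal received from the excitation signal generation means 21, using the linear prediction parameters received from the linear prediction parameter decoding means 16, and outputs the output speech 7.

As prior art that improves on the above conventional speech encoding/decoding device and obtains higher-quality output speech, there is the device disclosed in P. Kroon and B. S. Atal, "Pitch predictors with high temporal resolution" (ICASSP '90, pp. 661-664, 1990).

In this improved conventional device, within the configuration of the conventional device shown in Fig. 9, the delay parameter searched by the adaptive excitation search means 11 may take non-integer rational values as well as integer values. The adaptive excitation codebooks 10 and 17 generate the adaptive excitation vector for such a non-integer rational delay parameter by interpolating between the samples of the previously generated excitation signal, and output it. Fig. 11 shows examples of the adaptive excitation vector when the delay parameter l is a non-integer rational number. Fig. 11(a) is an example for l >= frame length, and Fig. 11(b) is an example for l < frame length.

With this configuration the delay parameter is determined with a precision finer than the sampling period of the input speech, so an adaptive excitation vector can be generated that yields higher-quality output speech than the device disclosed in Japanese Patent Laid-Open No. 64-40899.

As other prior art for conventional speech encoding/decoding devices, there is Japanese Patent Laid-Open No. 4-344699. Fig. 12 is a configuration diagram showing an example of the overall configuration of that conventional speech encoding/decoding device.

In Fig. 12, parts identical to those of Fig. 9 carry the same reference numerals and their description is omitted.

In Fig. 12, 23 and 24 are driving excitation codebooks, which differ from the driving excitation codebooks of Fig. 9.

The operation of the encoding/decoding device configured in this way is described below.

First, in the encoder 1, for each delay parameter l in a range of, for example, 20 <= l <= 128, the adaptive excitation search means 11 performs linear prediction synthesis on the adaptive excitation vector received from the adaptive excitation codebook 10, using the quantized linear prediction parameters received from the linear prediction parameter encoding means 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the input speech vector, cut out of the input speech 5 frame by frame, and the synthesized speech vector. It compares and evaluates the distortions to find the delay parameter L that minimizes the distortion and the corresponding adaptive excitation gain β, outputs the codes of the delay parameter L and the adaptive excitation gain β to the multiplexer 3 and the driving excitation codebook 23, generates an adaptive excitation signal by multiplying the adaptive excitation vector corresponding to the delay parameter L by the adaptive excitation gain β, and outputs it to the error signal generation means 12 and the excitation signal generation means 15.

The driving excitation codebook 23 stores, for example, N driving excitation vectors generated from random noise. It periodizes the driving excitation vector corresponding to the driving excitation code i received from the driving excitation search means 14 by repeating it with a period corresponding to the delay parameter L, and outputs the result. Fig. 13(a) shows an example of a periodized driving excitation vector. As shown in Fig. 13(b), when the delay parameter L is a non-integer rational number, the periodized vector is generated by interpolating between the samples of the driving excitation vector.

For each of the N driving excitation vectors, the driving excitation search means 14 performs linear prediction synthesis on the periodized driving excitation vector received from the driving excitation codebook 23, using the quantized linear prediction parameters received from the linear prediction parameter encoding means 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the error signal vector received from the error signal generation means 12 and the synthesized speech vector. It compares and evaluates the distortions to find the driving excitation code I that minimizes the distortion and the corresponding driving excitation gain γ, outputs the codes of the driving excitation code I and the driving excitation gain γ to the multiplexer 3, generates a driving excitation signal by multiplying the periodized driving excitation vector corresponding to the driving excitation code I by the driving excitation gain γ, and outputs it to the excitation signal generation means 15.

After the encoding is finished, the multiplexer 3 sends onto the transmission line 6 the code corresponding to the quantized linear prediction parameters and the codes corresponding to the delay parameter L, the driving excitation code I, and the excitation gains β and γ.

Next, the operation of the decoder 2 is described.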
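The demultiplexing means 4, which receives the output of the multiplexer 3, outputs the code of the linear prediction parameters to the linear prediction parameter decoding means 16, the codes of the delay parameter L and the excitation gain β to the adaptive excitation decoding means 18 and the driving excitation codebook 24, and the codes of the driving excitation code I and the excitation gain γ to the driving excitation decoding means 20.

The driving excitation codebook 24 stores the same N driving excitation vectors as the driving excitation codebook 23 on the encoding side. It periodizes the driving excitation vector corresponding to the driving excitation code I received from the driving excitation decoding means 20 by repeating it with a period corresponding to the delay parameter L, and outputs the result to the driving excitation decoding means 20.

The driving excitation decoding means 20 decodes the driving excitation gain γ from its code, generates a driving excitation signal by multiplying the periodized driving excitation vector received from the driving excitation codebook 24 by the excitation gain γ, and outputs it to the excitation signal generation means 21.

The excitation signal generation means 21 adds the adaptive excitation signal received from the adaptive excitation decoding means 18 and the driving excitation signal received from the driving excitation decoding means 20 to generate the excitation signal, and outputs it to the adaptive excitation codebook 17 and the synthesis filter 22. The synthesis filter 22 performs linear prediction synthesis on the excitation signal received from the excitation signal generation means 21, using the linear prediction parameters received from the linear prediction parameter decoding means 16, and outputs the output speech 7.

[Problem to Be Solved by the Invention]
In the conventional speech encoding/decoding devices described above, the excitation search during encoding periodizes the adaptive excitation vector or the driving excitation vector according to the delay parameter to form an excitation vector of frame length, performs linear prediction synthesis on it to generate a synthesized speech vector, and obtains the distortion between the input speech vector and the synthesized speech vector over the frame-length interval. Because the linear prediction synthesis requires a large amount of computation, the excitation search as a whole requires a very large amount of computation.

The present invention was made to solve this problem. Its object is to obtain a speech encoding device and a speech encoding/decoding device that, when encoding speech, avoid degradation of the synthesized speech quality and generate good-quality synthesized speech with a small amount of computation.

[Means for Solving the Problem]
To solve the above problem, the speech encoding device of the present invention comprises: target speech generation means that generates, from the input speech, a target speech vector of a vector length corresponding to a delay parameter; an adaptive excitation codebook that generates, from the previously generated excitation signal, an adaptive excitation vector of the vector length corresponding to the delay parameter; adaptive excitation search means that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the adaptive excitation vector and searches for the adaptive excitation vector that minimizes the distortion; and frame excitation generation means that generates an excitation signal of frame length from the adaptive excitation vector that minimizes the distortion.

The speech encoding device of the present invention further comprises: second target speech generation means that generates a second target speech vector from the target speech vector and the adaptive excitation vector that minimizes the distortion; a driving excitation codebook that generates a driving excitation vector of the vector length corresponding to the delay parameter; driving excitation search means that evaluates the distortion, relative to the second target speech vector, of the second synthesized speech vector obtained from the driving excitation vector and searches for the driving excitation vector that minimizes the distortion; and second frame excitation generation means that generates a second excitation signal of frame length from the driving excitation vector that minimizes the distortion.

Alternatively, the speech encoding device of the present invention comprises: target speech generation means that generates, from the input speech, a target speech vector of a vector length corresponding to a delay parameter; a driving excitation codebook that generates a driving excitation vector of the vector length corresponding to the delay parameter; driving excitation search means that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the driving excitation vector and searches for the driving excitation vector that minimizes the distortion; and frame excitation generation means that generates an excitation signal of frame length from the driving excitation vector that minimizes the distortion.

Further, in the speech encoding device of the present invention, the vector length of the target speech vector and of the driving excitation vector is determined according to the pitch period of the input speech.

Further, in the speech encoding device of the present invention, the vector length corresponding to the delay parameter takes rational values.

Further, in the speech encoding device of the present invention, the target speech generation means divides the input speech of a frame into segments of the vector length corresponding to the delay parameter and averages these segments to generate the target speech vector.

Further, in the speech encoding device of the present invention, the target speech generation means divides input speech whose length is an integer multiple of the vector length corresponding to the delay parameter into segments of that vector length and averages these segments to generate the target speech vector.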
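Further, in the speech encoding device of the present invention, the integer multiple of the vector length corresponding to the delay parameter is equal to or greater than the frame length.

Further, in the speech encoding device of the present invention, the target speech generation means determines the weights used when averaging the input speech per vector length to generate the target speech vector according to a feature quantity of the input speech of each vector length corresponding to the delay parameter.

Further, in the speech encoding device of the present invention, the feature quantity of the input speech of each vector length corresponding to the delay parameter includes at least power information of the input speech.

Further, in the speech encoding device of the present invention, the feature quantity of the input speech of each vector length corresponding to the delay parameter includes at least correlation information of the input speech.

Further, in the speech encoding device of the present invention, the target speech generation means determines the weights used when averaging the input speech per vector length to generate the target speech vector according to the temporal relationship of the input speech of each vector length corresponding to the delay parameter.

Further, in the speech encoding device of the present invention, the target speech generation means finely adjusts the temporal relationship of the input speech of each vector length when averaging the input speech per vector length corresponding to the delay parameter.

Further, in the speech encoding device of the present invention, the frame excitation generation means periodizes the excitation signal of the vector length corresponding to the delay parameter by repeating it every vector length, to generate an excitation signal of frame length.

Further, in the speech encoding device of the present invention, the frame excitation generation means generates the excitation signal by interpolating, between frames, the excitation signal of the vector length corresponding to the delay parameter.

Further, in the speech encoding device of the present invention, the adaptive excitation search means has a synthesis filter and uses the impulse response of this synthesis filter to compute iteratively the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the adaptive excitation vector.

Further, the speech encoding device of the present invention comprises input speech up-sampling means that up-samples the input speech, and the target speech generation means generates the target speech vector from the up-sampled input speech.

Further, the speech encoding device of the present invention comprises excitation signal up-sampling means that up-samples the previously generated excitation signal, and the adaptive excitation codebook generates the adaptive excitation vector from the up-sampled, previously generated excitation signal.

Further, in the speech encoding device of the present invention, the up-sampling means changes the up-sampling factor according to the delay parameter.

Further, in the speech encoding device of the present invention, the up-sampling means changes the up-sampling factor of the input speech or of the excitation signal only within a range that depends on the vector length corresponding to the delay parameter.

The speech encoding/decoding device of the present invention comprises, on the encoding side: target speech generation means that generates, from the input speech, a target speech vector of a vector length corresponding to a delay parameter; an adaptive excitation codebook that generates, from the previously generated excitation signal, an adaptive excitation vector of the vector length corresponding to the delay parameter; adaptive excitation search means that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the adaptive excitation vector and searches for the adaptive excitation vector that minimizes the distortion; and frame excitation generation means that generates an excitation signal of frame length from the adaptive excitation vector that minimizes the distortion. On the decoding side it comprises: an adaptive excitation codebook that generates an adaptive excitation vector of the vector length corresponding to the delay parameter; and frame excitation generation means that generates an excitation signal of frame length from the adaptive excitation vector.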
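The speech encoding/decoding device of the present invention further comprises, on the encoding side: second target speech generation means that generates a second target speech vector from the target speech vector and the adaptive excitation vector; a driving excitation codebook that generates a driving excitation vector of the vector length corresponding to the delay parameter; driving excitation search means that evaluates the distortion, relative to the second target speech vector, of the second synthesized speech vector obtained from the driving excitation vector and searches for the driving excitation vector that minimizes the distortion; and second frame excitation generation means that generates a second excitation signal of frame length from the driving excitation vector that minimizes the distortion. On the decoding side it further comprises: a driving excitation codebook that generates a driving excitation vector of the vector length corresponding to the delay parameter; and second frame excitation generation means that generates a second excitation signal of frame length from the driving excitation vector.

Alternatively, the speech encoding/decoding device of the present invention comprises, on the encoding side: target speech generation means that generates, from the input speech, a target speech vector of a vector length corresponding to a delay parameter; a driving excitation codebook that generates a driving excitation vector of that vector length; driving excitation search means that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the driving excitation vector and searches for the driving excitation vector that minimizes the distortion; and frame excitation generation means that generates an excitation signal of frame length from the driving excitation vector that minimizes the distortion. On the decoding side it comprises: a driving excitation codebook that generates a driving excitation vector of the vector length corresponding to the delay parameter; and frame excitation generation means that generates an excitation signal of frame length from the driving excitation vector.

[Embodiments of the Invention]
Embodiment 1.
Fig. 1 is a block diagram showing the overall configuration of the speech encoding device and the speech decoding device according to Embodiment 1 of the present invention.

In Fig. 1, 1 is an encoder, 2 is a decoder, 3 is a multiplexer, 4 is demultiplexing means, 5 is the input speech, 6 is a transmission line, and 7 is the output speech.

The encoder 1 consists of elements 8, 9, 15, and 25 to 36: 8 is linear prediction parameter analysis means; 9 is linear prediction parameter encoding means; 15 is excitation signal generation means; 25 is pitch analysis means that extracts the pitch period of the input speech; 26 is delay parameter search range determination means that determines the search range of the delay parameter used when searching for the adaptive excitation vector; 27 is input speech up-sampling means that up-samples the input speech; 28 is target speech generation means that generates a target speech vector of the vector length corresponding to the delay parameter; 29 is excitation signal up-sampling means that up-samples the previously generated excitation signal; 30 is an adaptive excitation codebook that outputs, from the previously generated excitation signal, an adaptive excitation vector of the vector length corresponding to the delay parameter; 31 is adaptive excitation search means that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the adaptive excitation vector and searches for the adaptive excitation vector that minimizes the distortion; 32 is frame excitation generation means that generates an excitation signal of frame length from the adaptive excitation signal of the vector length corresponding to the delay parameter; 33 is second target speech generation means that generates the target speech vector, of the vector length corresponding to the delay parameter, used in the driving excitation vector search; 34 is a driving excitation codebook that outputs a driving excitation vector of the vector length corresponding to the delay parameter; 35 is driving excitation search means that evaluates the distortion, relative to the second target speech vector, of the synthesized speech vector obtained from the driving excitation vector and searches for the driving excitation vector that minimizes the distortion; and 36 is second frame excitation generation means that generates a driving excitation signal of frame length from the driving excitation signal of the vector length corresponding to the delay parameter.

The decoder 2 consists of elements 16, 21, 22, and 37 to 43: 16 is linear prediction parameter decoding means; 21 is excitation signal generation means; 22 is a synthesis filter; 37 is excitation signal up-sampling means that up-samples the previously generated excitation signal; 38 is an adaptive excitation codebook that outputs an adaptive excitation vector of the vector length corresponding to the delay parameter; 39 is adaptive excitation decoding means that decodes the adaptive excitation signal of the vector length corresponding to the delay parameter;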
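40 is frame excitation generation means that generates an excitation signal of frame length from the adaptive excitation signal of the vector length corresponding to the delay parameter; 41 is a driving excitation codebook that outputs a driving excitation vector of the vector length corresponding to the delay parameter; 42 is driving excitation decoding means that decodes the driving excitation signal of the vector length corresponding to the delay parameter; and 43 is second frame excitation generation means that generates a driving excitation signal of frame length from the driving excitation signal of the vector length corresponding to the delay parameter.

The operation is described below.

First, in the encoder 1, a digital speech signal sampled at, for example, 8 kHz is received as the input speech 5. The linear prediction parameter analysis means 8 analyzes the input speech 5 and extracts linear prediction parameters, which are the spectral envelope information of the speech. The linear prediction parameter encoding means 9 then quantizes the extracted linear prediction parameters, outputs the corresponding code to the multiplexer 3, and outputs the quantized linear prediction parameters to the adaptive excitation search means 31, the second target speech generation means 33, and the driving excitation search means 35.

The pitch analysis means 25 analyzes the input speech 5 and extracts the pitch period P. The delay parameter search range determination means 26 then determines from the pitch period P, for example according to equation (1), the search range l_min <= l <= l_max of the delay parameter l used when searching for the adaptive excitation vector, and outputs it to the input speech up-sampling means 27, the excitation signal up-sampling means 29, and the adaptive excitation search means 31. Here, ΔP is set to, for example, P/10.

l_min = P - ΔP
l_max = P + ΔP    (1)

The input speech up-sampling means 27 up-samples the input speech 5, for example over the frame interval that is the unit for encoding the excitation signal, at a sampling rate that depends on the delay parameter search range received from the delay parameter search range determination means 26, and outputs the result to the target speech generation means 28. The up-sampling rate is determined, for example, as follows.

When l_min < 45, up-sample by a factor of 4.
When 45 <= l_min < 65, up-sample by a factor of 2.
When 65 <= l_min, do not up-sample.

The target speech generation means 28 divides the up-sampled, frame-length input speech received from the input speech up-sampling means 27 according to the delay parameter l received from the adaptive excitation search means 31, for example into segments of period l, and averages these segments of the vector length corresponding to the delay parameter l. It thereby generates a target speech vector of the vector length corresponding to the delay parameter l, and outputs it to the adaptive excitation search means 31 and the second target speech generation means 33. Here, the delay parameter l may take non-integer rational values as well as integer values. Depending on the range in which l lies, and with l_int denoting an integer delay, it may take, for example, the following values.

When l < 45: l_int, l_int + 1/4, l_int + 1/2, l_int + 3/4
When 45 <= l < 65: l_int, l_int + 1/2
When 65 <= l: l_int

Fig. 2 shows an example of the target speech vector of the vector length corresponding to the delay parameter l, generated from frame-length input speech. When l >= frame length, the averaging is not performed and the frame-length input speech is used as the target speech vector.

The excitation signal up-sampling means 29 up-samples the previously generated excitation signal received from the excitation signal generation means 15, only over the interval needed for the adaptive excitation search given the delay parameter search range received from the delay parameter search range determination means 26, at a sampling rate that depends on that search range, and outputs the result to the adaptive excitation codebook 30. The up-sampling rate is determined, for example, as follows.

In the interval l < 45, up-sample by a factor of 4.
In the interval 45 <= l < 65, up-sample by a factor of 2.
In the interval 65 <= l, do not up-sample.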
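The search range of equation (1), the choice of up-sampling factor, and the averaging that produces the target speech vector can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the delay is assumed to be expressed as an integer number of samples of the (possibly up-sampled) signal, and the handling of a final partial segment (it contributes only to the positions it covers) is an assumption, since Fig. 2 is not reproduced here.

```python
def delay_search_range(pitch):
    """Equation (1) with delta P = P / 10: returns (l_min, l_max)."""
    delta = pitch / 10
    return pitch - delta, pitch + delta


def upsampling_factor(l_min):
    """Up-sampling factor chosen from the lower end of the search range."""
    if l_min < 45:
        return 4
    if l_min < 65:
        return 2
    return 1


def target_vector(frame, delay):
    """Average consecutive length-`delay` segments of `frame`.

    `delay` is an integer sample count in the up-sampled domain, e.g. a delay
    of 20.25 samples becomes 81 samples after 4x up-sampling.
    """
    if delay >= len(frame):
        return list(frame)  # l >= frame length: no averaging is performed
    sums = [0.0] * delay
    counts = [0] * delay
    for n, x in enumerate(frame):
        sums[n % delay] += x
        counts[n % delay] += 1
    return [s / c for s, c in zip(sums, counts)]


l_min, l_max = delay_search_range(40)   # (36.0, 44.0)
factor = upsampling_factor(l_min)       # 4
target = target_vector([1, 2, 3, 3, 4, 5, 5, 6], 3)  # [3.0, 4.0, 4.0]
```

Because the target has only `delay` samples instead of a full frame, each candidate in the later searches needs a correspondingly shorter synthesis.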
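From the up-sampled excitation signal received from the excitation signal up-sampling means 29, the adaptive excitation codebook 30 outputs to the adaptive excitation search means 31 an adaptive excitation vector of the vector length corresponding to the delay parameter l received from the adaptive excitation search means 31. Here, the adaptive excitation vector is the past l samples of the excitation signal cut out for the delay parameter l. When l >= frame length, a frame-length excitation signal is cut out starting l samples earlier.

The adaptive excitation search means 31 has a synthesis filter and obtains the impulse response of the synthesis filter from the quantized linear prediction parameters received from the linear prediction parameter encoding means 9. Then, for each delay parameter l in the range l_min <= l <= l_max, it synthesizes the adaptive excitation vector received from the adaptive excitation codebook 30 by iterative computation with this impulse response to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the target speech vector received from the target speech generation means 28 and the synthesized speech vector. It compares and evaluates the distortions to find the delay parameter L that minimizes the distortion and the corresponding adaptive excitation gain β, outputs the codes of the delay parameter L and the adaptive excitation gain β to the multiplexer 3 and the driving excitation codebook 34, generates an adaptive excitation signal by multiplying the adaptive excitation vector corresponding to the delay parameter L by the adaptive excitation gain β, and outputs it to the frame excitation generation means 32 and the second target speech generation means 33. Here, the adaptive excitation signal is a signal of L samples when L < frame length and a signal of frame length when L >= frame length.

The frame excitation generation means 32 periodizes the adaptive excitation signal received from the adaptive excitation search means 31, for example by repeating it with period L, to generate an adaptive excitation signal of frame length, and outputs it to the excitation signal generation means 15.

The second target speech generation means 33 performs linear prediction synthesis on the adaptive excitation signal received from the adaptive excitation search means 31, using the quantized linear prediction parameters received from the linear prediction parameter encoding means 9, to generate a synthesized speech vector. It then obtains the difference between the target speech vector received from the target speech generation means 28 and the synthesized speech vector, and outputs it to the driving excitation search means 35 as the second target speech vector.

The driving excitation codebook 34 stores, for example, N driving excitation vectors generated from random noise. It cuts the driving excitation vector corresponding to the driving excitation code i received from the driving excitation search means 35 to the vector length corresponding to the delay parameter L and outputs it. When L >= frame length, it outputs a driving excitation vector of frame length.

For each of the N driving excitation vectors, the driving excitation search means 35 performs linear prediction synthesis on the cut-out driving excitation vector received from the driving excitation codebook 34, using the quantized linear prediction parameters received from the linear prediction parameter encoding means 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the second target speech vector received from the second target speech generation means 33 and the synthesized speech vector. It compares and evaluates the distortions to find the driving excitation code I that minimizes the distortion and the corresponding driving excitation gain γ, outputs the codes of the driving excitation code I and the driving excitation gain γ to the multiplexer 3, generates a driving excitation signal by multiplying the driving excitation vector corresponding to the driving excitation code I by the driving excitation gain γ, and outputs it to the second frame excitation generation means 36.

The second frame excitation generation means 36 periodizes the driving excitation signal received from the driving excitation search means 35, for example by repeating it with period L, to generate a driving excitation signal of frame length, and outputs it to the excitation signal generation means 15.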
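Three of the steps in Embodiment 1 lend themselves to a short sketch: synthesis of a short vector with the impulse response of the synthesis filter, formation of the second target speech vector, and periodization of a length-L excitation signal to frame length. The code below is a minimal illustration under assumptions that are not in the specification: the impulse response is given as a finite list, synthesis is a convolution truncated to the vector length with a zero initial state, and all function names are hypothetical.

```python
def synthesize_with_impulse_response(excitation, impulse_response):
    """Truncated convolution: y[n] = sum_k h[k] * x[n - k] for n < len(x)."""
    out = []
    for n in range(len(excitation)):
        acc = 0.0
        for k in range(min(n + 1, len(impulse_response))):
            acc += impulse_response[k] * excitation[n - k]
        out.append(acc)
    return out


def second_target(target, synthesized):
    """Second target vector: target minus the adaptive-excitation synthesis."""
    return [t - s for t, s in zip(target, synthesized)]


def periodize(vector, frame_len):
    """Repeat a length-L excitation signal with period L up to frame length."""
    if not vector:
        raise ValueError("vector must not be empty")
    period = len(vector)
    return [vector[n % period] for n in range(frame_len)]


synth = synthesize_with_impulse_response([1.0, 0.0, 2.0], [1.0, 0.5, 0.25])
# synth == [1.0, 0.5, 2.25]
residual = second_target([3.0, 4.0, 4.0], synth)   # [2.0, 3.5, 1.75]
frame_excitation = periodize([1, 2, 3], 7)         # [1, 2, 3, 1, 2, 3, 1]
```

The searches operate on vectors of length L, and the frame-length excitation is built only once, for the winning candidate.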
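The excitation signal generation means 15 adds the frame-length adaptive excitation signal received from the frame excitation generation means 32 and the frame-length driving excitation signal received from the second frame excitation generation means 36 to generate the excitation signal, and outputs it to the excitation signal up-sampling means 29.

After the above encoding is finished, the multiplexer 3 sends onto the transmission line 6 the code corresponding to the quantized linear prediction parameters and the codes corresponding to the delay parameter L, the driving excitation code I, and the excitation gains β and γ.

The above is the characteristic operation of the speech encoding device of Embodiment 1.

Next, the operation of the decoder 2 is described.

The demultiplexing means 4, which receives the output of the multiplexer 3, outputs the code of the linear prediction parameters to the linear prediction parameter decoding means 16, the code of the delay parameter L to the adaptive excitation decoding means 39 and the driving excitation codebook 41, the code of the excitation gain β to the adaptive excitation decoding means 39, and the codes of the driving excitation code I and the excitation gain γ to the driving excitation decoding means 42.

The adaptive excitation decoding means 39 first outputs the delay parameter L to the excitation signal up-sampling means 37 and the adaptive excitation codebook 38. The excitation signal up-sampling means 37 up-samples the previously generated excitation signal received from the excitation signal generation means 21, over the interval needed to generate the adaptive excitation vector for the value of the delay parameter L received from the adaptive excitation decoding means 39, at a sampling rate that depends on the value of the delay parameter L, and outputs the result to the adaptive excitation codebook 38. The up-sampling rate is determined in the same way as in the excitation signal up-sampling means 29 of the encoder.

From the up-sampled excitation signal received from the excitation signal up-sampling means 37, the adaptive excitation codebook 38 outputs to the adaptive excitation decoding means 39 an adaptive excitation vector of the vector length corresponding to the delay parameter L received from the adaptive excitation decoding means 39. Here, the adaptive excitation vector is the past L samples of the excitation signal cut out for the delay parameter L. When L >= frame length, a frame-length excitation signal is cut out starting L samples earlier.

The adaptive excitation decoding means 39 decodes the adaptive excitation gain β from its code, generates an adaptive excitation signal by multiplying the adaptive excitation vector received from the adaptive excitation codebook 38 by the adaptive excitation gain β, and outputs it to the frame excitation generation means 40. The frame excitation generation means 40 periodizes the adaptive excitation signal received from the adaptive excitation decoding means 39, for example by repeating it with period L, to generate an adaptive excitation signal of frame length, and outputs it to the excitation signal generation means 21.

The driving excitation codebook 41 stores the same N driving excitation vectors as the driving excitation codebook 34 on the encoding side. It cuts the driving excitation vector corresponding to the driving excitation code I received from the driving excitation decoding means 42 to the vector length corresponding to the delay parameter L, and outputs it to the driving excitation decoding means 42.

The driving excitation decoding means 42 decodes the driving excitation gain γ from its code, generates a driving excitation signal by multiplying the cut-out driving excitation vector received from the driving excitation codebook 41 by the excitation gain γ, and outputs it to the second frame excitation generation means 43. The second frame excitation generation means 43 periodizes the driving excitation signal received from the driving excitation decoding means 42, for example by repeating it with period L, to generate a driving excitation signal of frame length, and outputs it to the excitation signal generation means 21.

The excitation signal generation means 21 adds the frame-length adaptive excitation signal received from the frame excitation generation means 40 and the frame-length driving excitation signal received from the second frame excitation generation means 43 to generate the excitation signal, and outputs it to the excitation signal up-sampling means 37 and the synthesis filter 22. The synthesis filter 22 performs linear prediction synthesis on the excitation signal received from the excitation signal generation means 21, using the linear prediction parameters received from the linear prediction parameter decoding means 16, and outputs the output speech 7.

The above is the characteristic operation of the speech decoding device of Embodiment 1.

According to Embodiment 1, when the optimum delay parameter is determined and the delay parameter l is shorter than the frame length, the input speech is averaged periodically to generate a target speech vector of vector length l, and the distortion is evaluated between this target and the synthesized speech vector obtained by linear prediction synthesis of an adaptive excitation vector of vector length l. Likewise, when the optimum driving excitation code is determined, the distortion evaluation uses the synthesized speech vector obtained by linear prediction synthesis of a driving excitation vector of vector length l. This avoids degradation of the synthesized speech quality and makes it possible to generate good-quality synthesized speech with a small amount of computation.

Embodiment 2.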
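In Embodiment 1 above, the frame excitation generation means 32 and 40 and the second frame excitation generation means 36 and 43 periodize the adaptive excitation signal or the driving excitation signal of the vector length corresponding to the delay parameter L by repeating it with period L, to generate an adaptive excitation signal or a driving excitation signal of frame length. Instead, the adaptive excitation signal or the driving excitation signal of the vector length corresponding to the delay parameter L may be interpolated between frames, for example by waveform interpolation for each period L, to generate the adaptive excitation signal or the driving excitation signal of frame length.

According to Embodiment 2, the excitation signal changes smoothly between frames, which improves the reproducibility of the synthesized speech and raises its quality.

Embodiment 3.
In Embodiments 1 and 2 above, the frame excitation generation means and the second frame excitation generation means generate a frame-length adaptive excitation signal and a frame-length driving excitation signal from the adaptive excitation signal and the driving excitation signal of the vector length corresponding to the delay parameter L, and these are added to generate the frame-length excitation signal. Instead, the adaptive excitation signal and the driving excitation signal of the vector length corresponding to the delay parameter L may be added to generate an excitation signal of that vector length, which is then periodized, for example by repeating it with period L, to generate the frame-length excitation signal.

Embodiment 4.
In Embodiment 1 above, both the encoder and the decoder use the new configuration. Instead, the encoder of Embodiment 1 may be used together with the conventional decoder shown in Fig. 12.

Embodiment 5.
In Embodiment 1 above, the target speech generation means 28 generates the target speech vector of the vector length corresponding to the delay parameter l from frame-length input speech. Instead, as shown in Fig. 3, the target speech vector may be generated from input speech whose length is an integer multiple of the vector length corresponding to the delay parameter l.

According to Embodiment 5, the averaging performed when generating the target speech vector does not have to handle segments of different lengths, so the processing is simple. In addition, because the evaluation during encoding uses input speech that extends beyond the frame length, the reproducibility of the synthesized speech improves and its quality rises.

Embodiment 6.
In Embodiment 1 above, the target speech generation means 28 uses a simple average when generating the target speech vector of the vector length corresponding to the delay parameter l from the input speech. Instead, as shown in Fig. 4, the averaging may be weighted according to the power of the input speech in each segment of the vector length corresponding to the delay parameter l, for example with a larger weight for a larger power.

According to Embodiment 6, the averaging performed when generating the target speech vector gives more weight to the high-power portions of the input speech. This improves the reproducibility of the high-power portions of the synthesized speech, which have a large influence on subjective quality, and raises the quality.

Embodiment 7.
In Embodiment 1 above, the target speech generation means 28 uses a simple average when generating the target speech vector of the vector length corresponding to the delay parameter l from the input speech. Instead, as shown in Fig. 5, the averaging may be weighted according to the correlation of the input speech in each segment of the vector length corresponding to the delay parameter l, for example with a smaller weight for a segment whose correlation with the other segments is low.

According to Embodiment 7, the averaging performed when generating the target speech vector gives less weight to the portions of the input speech whose correlation is low when the input is regarded as periodic with period l. Even for input speech whose pitch period varies, a target speech vector corresponding to one pitch period can therefore be generated with little distortion, which improves the reproducibility of the synthesized speech and raises its quality.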
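The weighted averaging of Embodiment 6 can be illustrated with a small sketch. It assumes that the input has already been divided into segments of equal length (as in Embodiment 5) and that the segment power itself is used as the weight. The specification only says that a larger power should receive a larger weight, so this particular weighting rule and the function names are assumptions made for the example.

```python
def power_weights(segments):
    """One weight per segment: the segment's power (sum of squared samples)."""
    return [sum(x * x for x in segment) for segment in segments]


def weighted_target(segments, weights):
    """Weighted average of equal-length segments, sample position by position."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    length = len(segments[0])
    return [
        sum(w * segment[n] for w, segment in zip(weights, segments)) / total
        for n in range(length)
    ]


segments = [[1.0, 1.0], [3.0, 3.0]]
weights = power_weights(segments)             # [2.0, 18.0]
target = weighted_target(segments, weights)   # each entry is 56 / 20 = 2.8
```

Weights derived from correlation (Embodiment 7) could be passed to `weighted_target` in the same way. Only the rule that produces the weights would change.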
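Embodiment 8.
In Embodiment 1 above, the target speech generation means 28 uses a simple average when generating the target speech vector of the vector length corresponding to the delay parameter l from the input speech. Instead, as shown in Fig. 6, the averaging may apply a weight to each segment of the vector length corresponding to the delay parameter l, for example a larger weight for input speech near the frame boundary.

According to Embodiment 8, the target speech vector is generated and encoded with a larger weight on the input speech near the frame boundary. This improves the reproducibility of the synthesized speech near the frame boundary and makes the synthesized speech change smoothly between frames. The effect is especially marked when the excitation signal is generated by interpolation between frames as in Embodiment 2.

Embodiment 9.
In Embodiment 1 above, the target speech generation means 28 averages the input speech for each period l when generating the target speech vector of the vector length corresponding to the delay parameter l. Instead, as shown in Fig. 7, the positions at which the input speech is cut out may be finely adjusted before averaging, for example so that the cross-correlation between the segments of the vector length corresponding to the delay parameter l is maximized.

According to Embodiment 9, the cut-out positions are finely adjusted so that the cross-correlation between the segments of the vector length corresponding to the delay parameter l is maximized. Even for input speech whose pitch period varies, a target speech vector corresponding to one pitch period can therefore be generated with little distortion, which improves the reproducibility of the synthesized speech and raises its quality.

Embodiment 10.
Fig. 8 is a block diagram showing the overall configuration of the speech encoding device and the speech decoding device according to Embodiment 10 of the present invention. In this figure, parts identical to those of Fig. 1 carry the same reference numerals and their description is omitted.

In Fig. 8, the elements that are new compared with Fig. 1 are as follows: 44 is input speech up-sampling means that up-samples the input speech; 45 is target speech generation means that generates a target speech vector of the vector length corresponding to the pitch period; 46 and 51 are driving excitation codebooks that output a driving excitation vector of the vector length corresponding to the pitch period; 47 is driving excitation search means that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the driving excitation vector and searches for the driving excitation vector that minimizes the distortion; 48 is second target speech generation means that generates the target speech vector, of the vector length corresponding to the pitch period, used in the second driving excitation vector search; 49 and 54 are second driving excitation codebooks that output a second driving excitation vector of the vector length corresponding to the pitch period; 50 is second driving excitation search means that evaluates the distortion, relative to the second target speech vector, of the synthesized speech vector obtained from the second driving excitation vector and searches for the driving excitation vector that minimizes the distortion; 52 is driving excitation decoding means that decodes the driving excitation signal of the vector length corresponding to the pitch period; 53 is frame excitation generation means that generates a driving excitation signal of frame length from the driving excitation signal of the vector length corresponding to the pitch period; 55 is second driving excitation decoding means that decodes the second driving excitation signal of the vector length corresponding to the pitch period; and 56 is second frame excitation generation means that generates a driving excitation signal of frame length from the second driving excitation signal of the vector length corresponding to the pitch period.

The operation is described below, centering on these new elements.

First, in the encoder 1, the pitch analysis means 25 analyzes the input speech 5, extracts the pitch period P, and outputs it to the multiplexer 3, the input speech up-sampling means 44, the target speech generation means 45, the driving excitation codebook 46, and the second driving excitation codebook 49.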
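Here, the pitch period P may take non-integer rational values as well as integer values. Depending on the range in which P lies, and with P_int denoting an integer pitch period, it takes, for example, the following values.

When P < 45: P_int, P_int + 1/4, P_int + 1/2, P_int + 3/4
When 45 <= P < 65: P_int, P_int + 1/2
When 65 <= P: P_int

The input speech up-sampling means 44 up-samples the input speech 5, for example over the frame interval that is the unit for encoding the excitation signal, at a sampling rate that depends on the pitch period received from the pitch analysis means 25, and outputs the result to the target speech generation means 45. The up-sampling rate is determined, for example, as follows.

When P < 45, up-sample by a factor of 4.
When 45 <= P < 65, up-sample by a factor of 2.
When 65 <= P, do not up-sample.

The target speech generation means 45 averages the up-sampled, frame-length input speech received from the input speech up-sampling means 44 according to the pitch period P received from the pitch analysis means 25, for example for each period P, to generate a target speech vector of vector length P, and outputs it to the driving excitation search means 47 and the second target speech generation means 48. When P >= frame length, the averaging is not performed and the frame-length input speech is used as the target speech vector.

The driving excitation codebook 46 stores, for example, N driving excitation vectors generated from random noise. It cuts the driving excitation vector corresponding to the driving excitation code i received from the driving excitation search means 47 to the vector length corresponding to the pitch period P received from the pitch analysis means 25, and outputs it. When P >= frame length, it outputs a driving excitation vector of frame length.

For each of the N driving excitation vectors, the driving excitation search means 47 performs linear prediction synthesis on the cut-out driving excitation vector received from the driving excitation codebook 46, using the quantized linear prediction parameters received from the linear prediction parameter encoding means 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the target speech vector received from the target speech generation means 45 and the synthesized speech vector. It compares and evaluates the distortions to find the driving excitation code I that minimizes the distortion and the corresponding driving excitation gain γ, outputs the codes of the driving excitation code I and the driving excitation gain γ to the multiplexer 3, generates a driving excitation signal by multiplying the driving excitation vector corresponding to the driving excitation code I by the driving excitation gain γ, and outputs it to the second target speech generation means 48.

The second target speech generation means 48 performs linear prediction synthesis on the driving excitation signal received from the driving excitation search means 47, using the quantized linear prediction parameters received from the linear prediction parameter encoding means 9, to generate a synthesized speech vector. It then obtains the difference between the target speech vector received from the target speech generation means 45 and the synthesized speech vector, and outputs it to the second driving excitation search means 50 as the second target speech vector.

The second driving excitation codebook 49 stores, for example, N driving excitation vectors generated from random noise. It cuts the second driving excitation vector corresponding to the driving excitation code j received from the second driving excitation search means 50 to the vector length corresponding to the pitch period P received from the pitch analysis means 25, and outputs it. When P >= frame length, it outputs a driving excitation vector of frame length.

For each of the N driving excitation vectors, the second driving excitation search means 50 performs linear prediction synthesis on the cut-out second driving excitation vector received from the second driving excitation codebook 49, using the quantized linear prediction parameters received from the linear prediction parameter encoding means 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the second target speech vector received from the second target speech generation means 48 and the synthesized speech vector. It compares and evaluates the distortions to find the second driving excitation code J that minimizes the distortion and the corresponding driving excitation gain γ2, and outputs the codes of the second driving excitation code J and the driving excitation gain γ2 to the multiplexer 3.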
The synthesis filter 22 performs linear prediction synthesis on the sound source signal input from the sound source signal generation mechanism 21, using the linear prediction parameters input from the linear prediction parameter decoding mechanism 16, and outputs the speech output 7.

As prior art that improves on the conventional speech encoding and decoding device described above and obtains speech output of higher quality, there is P. Kroon and B. S. Atal, "Pitch predictors with high temporal resolution" (ICASSP '90, pp. 661-664, 1990). In this improved device, which has the configuration of the conventional speech encoding and decoding device shown in Fig. 9, the delay parameters searched by the adaptive sound source search mechanism 11 take non-integer rational values as well as integer values. The adaptive sound source codebooks 10 and 17 generate the adaptive sound source vector corresponding to such a non-integer rational delay parameter by interpolating between the samples of the sound source signal generated in the past, and output it. Fig. 11 shows examples of the adaptive sound source vector generated when the delay parameter l is a non-integer rational number: Fig. 11(a) is the case l >= frame length, and Fig. 11(b) is the case l < frame length.

With this configuration, the delay parameter can be determined with a precision finer than the sampling period of the speech input and the adaptive sound source vector generated accordingly, so that speech output of higher quality can be produced than with the device disclosed in Japanese Patent Laid-Open No. 64-40899.
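The generation of an adaptive sound source vector for a non-integer delay, including the repetition that occurs when l < frame length, can be sketched as follows. This is only an illustration under stated assumptions: the text says the vector is interpolated between samples but does not name the interpolation filter, so simple linear interpolation is assumed here, and the function name `adaptive_vector` and its arguments are hypothetical.

```python
import math


def adaptive_vector(past, delay, frame_len):
    """Build a frame-length adaptive vector for a possibly fractional delay.

    past      -- previously generated sound source samples (oldest first)
    delay     -- delay parameter l in samples, 1 <= delay <= len(past)
    frame_len -- number of samples to generate

    Assumption: linear interpolation between neighbouring samples; the
    source only states that interpolation between samples is used.
    """
    if delay < 1 or delay > len(past):
        raise ValueError("delay must lie between 1 and len(past)")
    signal = list(past)
    start = len(past)
    for n in range(frame_len):
        # Position, in the already known signal, that lies `delay` samples back.
        t = start + n - delay
        i = math.floor(t)
        frac = t - i
        if frac == 0:
            value = signal[i]
        else:
            value = (1 - frac) * signal[i] + frac * signal[i + 1]
        # Appending the new sample lets later samples refer to it, which
        # yields the periodic repetition when delay < frame_len.
        signal.append(value)
    return signal[start:]
```

With an integer delay the loop reduces to copying and repeating the last l samples; with a fractional delay each output sample is a blend of its two nearest neighbours.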
As further prior art relating to the conventional speech encoding and decoding device, there is Japanese Patent Laid-Open No. 4-344699. Fig. 12 is a configuration diagram showing an example of the overall configuration of that conventional speech encoding and decoding device. Parts in Fig. 12 that are the same as in Fig. 9 are denoted by the same reference numerals, and their description is omitted.

In Fig. 12, 23 and 24 are driving sound source codebooks, which differ from the driving sound source codebooks of Fig. 9.

The operation of the encoding and decoding device configured in this way is described below. First, in the encoder 1, for each delay parameter l in a range such as 20 <= l <= 128, the adaptive sound source search mechanism 11 performs linear prediction synthesis on the adaptive sound source vector input from the adaptive sound source codebook 10, using the quantized linear prediction parameters input from the linear prediction parameter encoding mechanism 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the speech input vector, cut out of the speech input 5 frame by frame, and each synthesized speech vector.
Next, it compares the distortions to find the delay parameter L that minimizes the distortion and the corresponding adaptive sound source gain β, and outputs the codes of the delay parameter L and the adaptive sound source gain β to the multiplexer 3 and the driving sound source codebook 23. At the same time it generates an adaptive sound source signal, which is the adaptive sound source vector corresponding to the delay parameter L multiplied by the adaptive sound source gain β, and outputs it to the error signal generation mechanism 12 and the sound source signal generation mechanism 15.

The driving sound source codebook 23 stores, for example, N driving sound source vectors generated from random noise. It repeats the driving sound source vector corresponding to the driving sound source code i input from the driving sound source search mechanism 14 at every period corresponding to the delay parameter L, so as to make it periodic, and outputs the result. Fig. 13(a) shows an example of the periodized driving sound source vector, and Fig. 13(b) shows an example for the case in which the delay parameter L is a non-integer rational number.

For the N driving sound source vectors, the driving sound source search mechanism 14 performs linear prediction synthesis on each periodized driving sound source vector input from the driving sound source codebook 23, using the quantized linear prediction parameters input from the linear prediction parameter encoding mechanism 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the error signal vector input from the error signal generation mechanism 12 and the synthesized speech vector. Next, it compares the distortions to find the driving sound source code I that minimizes the distortion and the corresponding driving sound source gain γ, and outputs the codes of the driving sound source code I and the driving sound source gain γ to the multiplexer 3. At the same time it generates a driving sound source signal, which is the periodized driving sound source vector corresponding to the driving sound source code I multiplied by the driving sound source gain γ, and outputs it to the sound source signal generation mechanism 15.

After the above encoding is completed, the multiplexer 3 sends the code corresponding to the quantized linear prediction parameters and the codes corresponding to the delay parameter L, the driving sound source code I, and the sound source gains β and γ onto the transmission line 6.

Next, the operation of the decoder 2 is described.
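The periodization of a codebook vector and the search for the code and gain that minimize the distortion can be sketched as below. This is a simplified illustration, not the method of the cited publication: the linear prediction synthesis and perceptual weighting steps are left out, the delay is assumed to be an integer, and the gain is assumed to be the least-squares gain for each candidate. The names `periodize` and `search_codebook` are hypothetical.

```python
def periodize(vector, period, frame_len):
    """Repeat the first `period` samples of `vector` across `frame_len` samples.

    Assumes an integer period and len(vector) >= min(period, frame_len).
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    return [vector[n % period] for n in range(frame_len)]


def search_codebook(target, codebook, period):
    """Return (code index, gain) minimizing the squared error to `target`.

    Assumption: no synthesis filter or perceptual weighting is applied, so
    the distortion is the plain squared error between the target and the
    gain-scaled periodized vector.
    """
    frame_len = len(target)
    best = None
    for index, vector in enumerate(codebook):
        candidate = periodize(vector, period, frame_len)
        cross = sum(t * c for t, c in zip(target, candidate))
        energy = sum(c * c for c in candidate)
        if energy == 0:
            continue  # an all-zero candidate cannot match anything
        gain = cross / energy  # least-squares gain for this candidate
        error = sum((t - gain * c) ** 2 for t, c in zip(target, candidate))
        if best is None or error < best[0]:
            best = (error, index, gain)
    if best is None:
        raise ValueError("codebook contains no usable vector")
    return best[1], best[2]
```

For example, `search_codebook([0, 2, 0, 2], [[1, 0, 0, 0], [0, 1, 0, 0]], 2)` periodizes both candidates with a period of 2 samples and selects the second one with a gain of 2.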
First, the separation mechanism 4, which receives the output of the multiplexer 3, outputs the code of the linear prediction parameters to the linear prediction parameter decoding mechanism 16; the delay parameter L and the code of the sound source gain β to the adaptive sound source decoding mechanism 18 and the driving sound source codebook 24; and the codes of the driving sound source code I and the sound source gain γ to the driving sound source decoding mechanism 20.

The driving sound source codebook 24 stores the same N driving sound source vectors as the driving sound source codebook 23 on the encoding side. It repeats the driving sound source vector corresponding to the driving sound source code I input from the driving sound source decoding mechanism 20 at every period corresponding to the delay parameter L, so as to make it periodic, and outputs the result to the driving sound source decoding mechanism 20.

The driving sound source decoding mechanism 20 decodes the driving sound source gain γ from the code of the driving sound source gain γ, generates a driving sound source signal, which is the periodized driving sound source vector input from the driving sound source codebook 24 multiplied by the driving sound source gain γ, and outputs it to the sound source signal generation mechanism 21.

The sound source signal generation mechanism 21 adds the adaptive sound source signal input from the adaptive sound source decoding mechanism 18 and the driving sound source signal input from the driving sound source decoding mechanism 20 to generate a sound source signal, and outputs it to the adaptive sound source codebook 17 and the synthesis filter 22.
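The decoder-side signal path, in which the two gain-scaled vectors are added and the sum is passed through the synthesis filter 22, can be sketched as follows. This is an illustration under assumptions not stated in the text: the synthesis filter is taken to be an all-pole filter with the sign convention s[n] = e[n] + sum over k of a_k * s[n-k], and it starts from a zero state, whereas a real decoder would carry the filter memory over from the previous frame. The function names are hypothetical.

```python
def mix_excitation(adaptive, driving, beta, gamma):
    """Add the gain-scaled adaptive and driving sound source vectors."""
    return [beta * a + gamma * d for a, d in zip(adaptive, driving)]


def lp_synthesize(excitation, coeffs):
    """All-pole synthesis: s[n] = e[n] + sum_k coeffs[k-1] * s[n-k].

    Assumptions: the sign convention above and a zero initial filter state.
    """
    output = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a * output[n - k]
        output.append(acc)
    return output
```

The inner loop over the filter coefficients runs once per output sample, which is why the cost of linear prediction synthesis grows with the number of samples synthesized.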
The synthesis filter 22 performs linear prediction synthesis on the sound source signal input from the sound source signal generation mechanism 21, using the linear prediction parameters input from the linear prediction parameter decoding mechanism 16, and outputs the speech output 7.

[Problems to be Solved by the Invention]

In the conventional speech encoding and decoding devices described above, when the sound source code is searched, the adaptive sound source vector or the driving sound source vector is made periodic in accordance with the delay parameter to generate a frame-length sound source vector, linear prediction synthesis is performed on that vector to generate a synthesized speech vector, and the distortion between the speech input vector and the synthesized speech vector is obtained over the frame-length interval. Because linear prediction synthesis involves a large amount of computation, however, there is the problem that the sound source search requires a large amount of computation.

The present invention was made to solve this problem. Its object is to obtain a speech encoding device and a speech encoding and decoding device that avoid degradation of the quality of the synthesized speech when speech is encoded and that can generate synthesized speech of good quality with a small amount of computation.
[Means for Solving the Problems]

To solve the above problem, the speech encoding device of the present invention comprises: a target speech generation mechanism that generates, from the speech input, a target speech vector of the vector length corresponding to a delay parameter; an adaptive sound source codebook that generates, from the sound source signal generated in the past, an adaptive sound source vector of the vector length corresponding to the delay parameter; an adaptive sound source search mechanism that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the adaptive sound source vector and searches for the adaptive sound source vector that minimizes the distortion; and a frame sound source generation mechanism that generates a frame-length sound source signal from the adaptive sound source vector that minimizes the distortion.

In addition, the speech encoding device of the present invention further comprises: a second target speech generation mechanism that generates a second target speech vector from the target speech vector and the adaptive sound source vector that minimizes the distortion; a driving sound source codebook that generates a driving sound source vector of the vector length corresponding to the delay parameter; a driving sound source search mechanism that evaluates the distortion, relative to the second target speech vector, of the second synthesized speech vector obtained from the driving sound source vector and searches for the driving sound source vector that minimizes the distortion; and a second frame sound source generation mechanism that generates a second frame-length sound source signal from the driving sound source vector that minimizes the distortion.

The speech encoding device of the present invention also comprises: a target speech generation mechanism that generates, from the speech input, a target speech vector of the vector length corresponding to a delay parameter; a driving sound source codebook that generates a driving sound source vector of the vector length corresponding to the delay parameter; a driving sound source search mechanism that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the driving sound source vector and searches for the driving sound source vector that minimizes the distortion; and a frame sound source generation mechanism that generates a frame-length sound source signal from the driving sound source vector that minimizes the distortion.

In addition, the speech encoding device of the present invention further determines the vector length of the target speech vector and of the driving sound source vector in accordance with the pitch period of the speech input.

In addition, the speech encoding device of the present invention further allows the vector length corresponding to the delay parameter to take rational values.

In addition, in the speech encoding device of the present invention, the target speech generation mechanism divides the speech input of a frame at every vector length corresponding to the delay parameter and averages the speech inputs of each vector length to generate the target speech vector.

In addition, in the speech encoding device of the present invention, the target speech generation mechanism divides, at every vector length corresponding to the delay parameter, a speech input whose length is an integer multiple of that vector length, and averages the speech inputs of each vector length to generate the target speech vector.

In addition, in the speech encoding device of the present invention, the length that is an integer multiple of the vector length corresponding to the delay parameter is made equal to or greater than the frame length.

In addition, in the speech encoding device of the present invention, the target speech generation mechanism determines the weights used in averaging the speech inputs of each vector length, when generating the target speech vector, in accordance with a characteristic quantity of the speech input of each vector length corresponding to the delay parameter.

In addition, in the speech encoding device of the present invention, the characteristic quantity of the speech input of each vector length corresponding to the delay parameter includes at least power information of the speech input.

In addition, in the speech encoding device of the present invention, the characteristic quantity of the speech input of each vector length corresponding to the delay parameter includes at least correlation information of the speech input.

In addition, in the speech encoding device of the present invention, the target speech generation mechanism determines the weights used in averaging the speech inputs of each vector length, when generating the target speech vector, in accordance with the temporal relationship of the speech inputs of each vector length corresponding to the delay parameter.

In addition, in the speech encoding device of the present invention, the target speech generation mechanism finely adjusts the temporal relationship of the speech inputs of each vector length when averaging the speech inputs of each vector length corresponding to the delay parameter.

In addition, in the speech encoding device of the present invention, the frame sound source generation mechanism repeats, at every vector length, the sound source vector of the vector length corresponding to the delay parameter so as to make it periodic, and thereby generates a frame-length sound source signal.

In addition, in the speech encoding device of the present invention, the frame sound source generation mechanism interpolates, between frames, the sound source vector of the vector length corresponding to the delay parameter to generate the sound source signal.

In addition, in the speech encoding device of the present invention, the adaptive sound source search mechanism has a synthesis filter and uses the impulse response of the synthesis filter to calculate iteratively the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the adaptive sound source vector.

In addition, the speech encoding device of the present invention further comprises a speech input up-sampling mechanism that up-samples the speech input, and the target speech generation mechanism generates the target speech vector from the up-sampled speech input.

In addition, the speech encoding device of the present invention further comprises a sound source signal up-sampling mechanism that up-samples the sound source signal generated in the past, and the adaptive sound source codebook generates the adaptive sound source vector from the up-sampled past sound source signal.

In addition, in the speech encoding device of the present invention, the up-sampling mechanism changes the up-sampling rate in accordance with the delay parameter.

In addition, in the speech encoding device of the present invention, the up-sampling mechanism changes the up-sampling rate of the speech input or of the sound source signal only within the range corresponding to the vector length of the delay parameter.
The speech encoding and decoding device of the present invention comprises, on the encoding side: a target speech generation mechanism that generates, from the speech input, a target speech vector of the vector length corresponding to a delay parameter; an adaptive sound source codebook that generates, from the sound source signal generated in the past, an adaptive sound source vector of the vector length corresponding to the delay parameter; an adaptive sound source search mechanism that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the adaptive sound source vector and searches for the adaptive sound source vector that minimizes the distortion; and a frame sound source generation mechanism that generates a frame-length sound source signal from the adaptive sound source vector that minimizes the distortion. On the decoding side it comprises: an adaptive sound source codebook that generates an adaptive sound source vector of the vector length corresponding to the delay parameter; and a frame sound source generation mechanism that generates a frame-length sound source signal from that adaptive sound source vector.

In addition, the speech encoding and decoding device of the present invention further comprises, on the encoding side: a second target speech generation mechanism that generates a second target speech vector from the target speech vector and the adaptive sound source vector that minimizes the distortion; a driving sound source codebook that generates a driving sound source vector of the vector length corresponding to the delay parameter; a driving sound source search mechanism that evaluates the distortion, relative to the second target speech vector, of the second synthesized speech vector obtained from the driving sound source vector and searches for the driving sound source vector that minimizes the distortion; and a second frame sound source generation mechanism that generates a second frame-length sound source signal from the driving sound source vector that minimizes the distortion. On the decoding side it further comprises: a driving sound source codebook that generates a driving sound source vector of the vector length corresponding to the delay parameter; and a second frame sound source generation mechanism that generates a second frame-length sound source signal from that driving sound source vector.

The speech encoding and decoding device of the present invention also comprises, on the encoding side: a target speech generation mechanism that generates, from the speech input, a target speech vector of the vector length corresponding to a delay parameter; a driving sound source codebook that generates a driving sound source vector of the vector length corresponding to the delay parameter; a driving sound source search mechanism that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the driving sound source vector and searches for the driving sound source vector that minimizes the distortion; and a frame sound source generation mechanism that generates a frame-length sound source signal from the driving sound source vector that minimizes the distortion. On the decoding side it comprises: a driving sound source codebook that generates a driving sound source vector of the vector length corresponding to the delay parameter; and a frame sound source generation mechanism that generates a frame-length sound source signal from that driving sound source vector.

[Embodiments of the Invention]

Embodiment 1.

Fig. 1 is a block diagram showing the overall configuration of the speech encoding device and the speech decoding device of Embodiment 1 of the present invention.

In Fig. 1, 1 is an encoder, 2 is a decoder, 3 is a multiplexer, 4 is a separation mechanism, 5 is the speech input, 6 is a transmission line, and 7 is the speech output.

The encoder 1 is composed of the following components 8, 9, 15, and 25 to 36. 8 is a linear prediction parameter analysis mechanism; 9 is a linear prediction parameter encoding mechanism; 15 is a sound source signal generation mechanism; 25 is a pitch analysis mechanism that extracts the pitch period of the speech input; 26 is a delay parameter search range determination mechanism that determines the search range of the delay parameter used when the adaptive sound source vector is searched; 27 is a speech input up-sampling mechanism that up-samples the speech input; 28 is a target speech generation mechanism that generates a target speech vector of the vector length corresponding to the delay parameter; 29 is a sound source signal up-sampling mechanism that up-samples the sound source signal generated in the past; 30 is an adaptive sound source codebook that outputs, from the sound source signal generated in the past, an adaptive sound source vector of the vector length corresponding to the delay parameter; 31 is an adaptive sound source search mechanism that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the adaptive sound source vector and searches for the adaptive sound source vector that minimizes the distortion; and 32 is a frame sound source generation mechanism that generates a frame-length sound source signal from the adaptive sound source signal of the vector length corresponding to the delay parameter. Further, 33 is a second target speech generation mechanism that generates the target speech vector, of the vector length corresponding to the delay parameter, used in the search for the driving sound source vector; 34 is a driving sound source codebook that outputs a driving sound source vector of the vector length corresponding to the delay parameter; 35 is a driving sound source search mechanism that evaluates the distortion, relative to the second target speech vector, of the synthesized speech vector obtained from the driving sound source vector and searches for the driving sound source vector that minimizes the distortion; and 36 is a second frame sound source generation mechanism that generates a frame-length driving sound source signal from the driving sound source signal of the vector length corresponding to the delay parameter.

The decoder 2 is composed of the following components 16, 21, 22, and 37 to 43. 16 is a linear prediction parameter decoding mechanism; 21 is a sound source signal generation mechanism; 22 is a synthesis filter; 37 is a sound source signal up-sampling mechanism that up-samples the sound source signal generated in the past; 38 is an adaptive sound source codebook that outputs an adaptive sound source vector of the vector length corresponding to the delay parameter; 39 is an adaptive sound source decoding mechanism; 40 is a frame sound source generation mechanism that generates a frame-length sound source signal from the adaptive sound source signal of the vector length corresponding to the delay parameter; 41 is a driving sound source codebook that outputs a driving sound source vector of the vector length corresponding to the delay parameter; 42 is a driving sound source decoding mechanism that decodes the driving sound source signal of the vector length corresponding to the delay parameter; and 43 is a second frame sound source generation mechanism that generates a frame-length driving sound source signal from the driving sound source signal of the vector length corresponding to the delay parameter.

The operation is described below.
First, in the encoder 1, for example, a digital speech signal sampled at 8 kHz is input as a speech input 5. The media pre-congestion parameter analysis structure 8 analyzes the aforementioned speech round 5 and extracts the media prediction parameters as the frequency envelope information of the speech. Next, the linear pre-holding I parameter configuration 9 quantizes the extracted linear pre-holding parameters above and compiles the corresponding _ to the multi-work paper size using the Chinese National Standard (CNS) A4 specification (2 丨 〇 > < 297mm > -18-(Read the notes on the back first and then fill in this page) Packing & Threading * Α7 Β7 Printed by the Consumers ’Cooperative of the Central Bureau of Economic Development of the Ministry of Economy V. Invention Instructions (16) 3. At the same time, Dong Zihua ’s media prediction parameters are simultaneously output to the adaptive. Yinya search structure 31, the second-order speech generation mechanism 33, and the dynamic sound source search suspect structure 35. The pitch analyzing mechanism 25 5 analyzes the voice input 5M and extracts the pitch pass period P. Next, the delay # number search range determines the structure 26 from the aforementioned pitch pass period P, for example, according to formula (1) to determine the search parameter group 1 of the delay parameter 1 when retrieving from the episodic sound source vector 1 »t ηέΐέl» a X * The output is input to the voice round sampling sample structure 27, the sound source signal is increased by the sample structure 29, and the self-responding sound source retrieval mechanism 31. Here, ΔP is set to P / 10, for example. 
$l_{\min} = P - \Delta P, \quad l_{\max} = P + \Delta P$ (1)

The speech input up-sampling mechanism 27 up-samples the speech input 5, for example over the frame interval that is the object of sound source signal encoding, at a sampling rate that depends on the delay parameter search range input from the delay parameter search range determining mechanism 26, and outputs the result to the target speech generating mechanism 28. The up-sampling rate is determined, for example, as follows.

When l_min < 45, 4-times up-sampling is performed.
When 45 ≤ l_min < 65, 2-times up-sampling is performed.
When 65 ≤ l_min, no up-sampling is performed.

The target speech generating mechanism 28 takes the up-sampled speech input of frame length input from the speech input up-sampling mechanism 27 and, in accordance with the delay parameter l input from the adaptive sound source search mechanism 31, divides it, for example, at every period l and averages the resulting segments of that vector length by addition. It thereby generates a target speech vector whose vector length corresponds to the delay parameter l, and outputs it to the adaptive sound source search mechanism 31 and the second target speech generating mechanism 33. The delay parameter l may take non-integer rational values as well as integer values. Depending on the range in which l lies, and writing l_int for an integer value, it can take, for example, the following values.

When l < 45: l_int, l_int + 1/4, l_int + 1/2, l_int + 3/4
When 45 ≤ l < 65: l_int, l_int + 1/2
When 65 ≤ l: l_int

Fig. 2 shows an example of a target speech vector, of the vector length corresponding to the delay parameter l, generated from the speech input of frame length. When l ≥ frame length, the averaging is not performed and the speech input of frame length is used as the target speech vector.

The sound source signal up-sampling mechanism 29 up-samples the previously generated sound source signal input from the sound source signal generating mechanism 15. It does so only over the interval needed for the adaptive sound source search within the search range input from the delay parameter search range determining mechanism 26, at a sampling rate corresponding to that search range, and outputs the result to the adaptive sound source codebook 30. The up-sampling rate is determined, for example, as follows.

In the interval l < 45, 4-times up-sampling is performed.
In the interval 45 ≤ l < 65, 2-times up-sampling is performed.
In the interval 65 ≤ l, no up-sampling is performed.

From the up-sampled sound source signal input from the sound source signal up-sampling mechanism 29, the adaptive sound source codebook 30 outputs to the adaptive sound source search mechanism 31 the adaptive sound source vector whose vector length corresponds to the delay parameter l input from the adaptive sound source search mechanism 31. The adaptive sound source vector is the sound source signal of l samples cut out starting l samples in the past; when l ≥ frame length, a sound source signal of frame length is cut out starting l samples in the past.

The adaptive sound source search mechanism 31 includes a synthesis filter and obtains the impulse response of the synthesis filter using the quantized linear prediction parameters input from the linear prediction parameter encoding mechanism 9. Next, for each delay parameter l in the range l_min ≤ l ≤ l_max, it generates a synthesized speech vector by applying the impulse response, in a cyclic (periodically repeated) computation, to the adaptive sound source vector input from the adaptive sound source codebook 30. It then obtains the perceptually weighted distortion between the target speech vector input from the target speech generating mechanism 28 and the synthesized speech vector. Next, it compares and evaluates the distortion to find the delay parameter L that minimizes it and the corresponding adaptive sound source gain β, and outputs the codes of the delay parameter L and the adaptive sound source gain β to the multiplexer 3 and the driving sound source codebook 34. At the same time, it generates an adaptive sound source signal by multiplying the adaptive sound source vector corresponding to the delay parameter L by the adaptive sound source gain β, and outputs it to the frame sound source generating mechanism 32 and the second target speech generating mechanism 33.
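As a rough illustration of equation (1), the example up-sampling rule, the example set of fractional delay values, and the periodic averaging performed by the target speech generating mechanism 28, the following Python sketch may help. All function names are hypothetical, ΔP is simply passed in as an argument, and the handling of a final partial segment (each position is averaged over however many segments cover it) is an assumption; the text does not specify it. The lag passed to `target_vector` is expressed in (up-sampled) samples so that it is an integer.

```python
from fractions import Fraction


def search_range(pitch, delta):
    """Equation (1): l_min = P - dP, l_max = P + dP."""
    return pitch - delta, pitch + delta


def upsampling_factor(l_min):
    """Example rule from the text: 4x below 45, 2x from 45 up to 65, else none."""
    if l_min < 45:
        return 4
    if l_min < 65:
        return 2
    return 1


def candidate_lags(l_int):
    """Example delay values around the integer l_int (quarter, half or whole steps)."""
    steps = upsampling_factor(l_int)
    return [l_int + Fraction(k, steps) for k in range(steps)]


def target_vector(frame, lag):
    """Average successive lag-long segments of `frame` (assumed integer lag).

    If lag >= len(frame), no averaging is done and the frame is returned.
    """
    if lag >= len(frame):
        return list(frame)
    sums = [0.0] * lag
    counts = [0] * lag
    for i, x in enumerate(frame):
        sums[i % lag] += x      # accumulate the sample at its phase in the period
        counts[i % lag] += 1    # a partial last segment covers only some phases
    return [s / c for s, c in zip(sums, counts)]
```

For a non-integer delay such as l_int + 1/4, the frame would first be up-sampled by the corresponding factor so that the lag becomes an integer number of up-sampled samples; the up-sampling filter itself is not shown here.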
Here, the adaptive sound source signal has a length of L samples when L < frame length, and of frame length when L ≥ frame length.

The frame sound source generating mechanism 32 generates an adaptive sound source signal of frame length from the adaptive sound source signal input from the adaptive sound source search mechanism 31, for example by repeating it periodically at every period L, and outputs it to the sound source signal generating mechanism 15.

The second target speech generating mechanism 33 performs linear prediction synthesis on the adaptive sound source signal input from the adaptive sound source search mechanism 31, using the quantized linear prediction parameters input from the linear prediction parameter encoding mechanism 9, to generate a synthesized speech vector. It then obtains the difference between the target speech vector input from the target speech generating mechanism 28 and this synthesized speech vector, and outputs it as the second target speech vector to the driving sound source search mechanism 35.

The driving sound source codebook 34 stores, for example, N driving sound source vectors generated from random noise. It cuts out, at the vector length corresponding to the delay parameter L, the driving sound source vector corresponding to the driving sound source code i input from the driving sound source search mechanism 35, and outputs it. When L ≥ frame length, it outputs a driving sound source vector of frame length.
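The periodic repetition used by the frame sound source generating mechanism 32 and the cut-out performed by the driving sound source codebook 34 can be sketched as below. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and taking the leading samples of the stored vector when cutting out is an assumption, since the text does not say which portion is used.

```python
def cut_out(stored_vector, vector_length, frame_length):
    """Return a codebook vector shortened to the delay-dependent length.

    Assumption: the leading samples are kept. When the requested length is
    at least the frame length, a frame-length vector is returned instead.
    """
    length = min(vector_length, frame_length)
    return list(stored_vector[:length])


def periodize(vector, frame_length):
    """Repeat `vector` cyclically until `frame_length` samples are produced."""
    if not vector:
        raise ValueError("vector must not be empty")
    return [vector[i % len(vector)] for i in range(frame_length)]
```

For example, a three-sample signal repeated into an eight-sample frame simply cycles through its samples, ending partway through the third repetition.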
The driving sound source search mechanism 35 performs, for each of the N driving sound source vectors, linear prediction synthesis on the cut-out driving sound source vector input from the driving sound source codebook 34, using the quantized linear prediction parameters input from the linear prediction parameter encoding mechanism 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the second target speech vector input from the second target speech generating mechanism 33 and the synthesized speech vector. Next, it compares and evaluates the distortion to find the driving sound source code I that minimizes it and the corresponding driving sound source gain γ, and outputs the codes of the driving sound source code I and the driving sound source gain γ to the multiplexer 3. At the same time, it generates a driving sound source signal by multiplying the driving sound source vector corresponding to the driving sound source code I by the driving sound source gain γ, and outputs it to the second frame sound source generating mechanism 36.

The second frame sound source generating mechanism 36 generates a driving sound source signal of frame length from the driving sound source signal input from the driving sound source search mechanism 35, for example by repeating it periodically at every period L, and outputs it to the sound source signal generating mechanism 15.
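The search just described, which synthesizes each candidate vector and keeps the one with the smallest distortion together with its gain, might be sketched as follows. This is a simplified illustration under stated assumptions: the distortion is a plain squared error (the perceptual weighting mentioned in the text is omitted), the synthesis is a zero-state convolution with a given impulse response, and the gain is the unquantized least-squares value. The names `synthesize` and `search_codebook` are hypothetical.

```python
def synthesize(excitation, impulse_response):
    """Zero-state synthesis: convolve and truncate to the excitation length."""
    out = []
    for i in range(len(excitation)):
        taps = min(i + 1, len(impulse_response))
        out.append(sum(impulse_response[k] * excitation[i - k] for k in range(taps)))
    return out


def search_codebook(target, codebook, impulse_response):
    """Return (index, gain, distortion) of the entry minimizing squared error.

    For a candidate y, the best gain is <t, y> / <y, y> and the resulting
    error is <t, t> - <t, y>^2 / <y, y>.
    """
    target_energy = sum(t * t for t in target)
    best = None
    for index, vector in enumerate(codebook):
        y = synthesize(vector, impulse_response)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        corr = sum(t * v for t, v in zip(target, y))
        distortion = target_energy - corr * corr / energy
        if best is None or distortion < best[2]:
            best = (index, corr / energy, distortion)
    return best
```

Because the candidate vectors have the delay-dependent length rather than the frame length, each synthesis and inner product runs over fewer samples, which is where the reduction in computation claimed in the text comes from.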
The sound source signal generating mechanism 15 adds the adaptive sound source signal of frame length input from the frame sound source generating mechanism 32 and the driving sound source signal of frame length input from the second frame sound source generating mechanism 36 to generate a sound source signal, and outputs it to the sound source signal up-sampling mechanism 29.

After the above encoding is finished, the multiplexer 3 sends the code corresponding to the quantized linear prediction parameters and the codes corresponding to the delay parameter L, the driving sound source code I, and the sound source gains β and γ onto the transmission line 6.

The above is the characteristic operation of the speech encoding device of Embodiment 1.

Next, the operation of the decoder 2 will be described.

First, the separating mechanism 4, which receives the output of the multiplexer 3, outputs the code of the linear prediction parameters to the linear prediction parameter decoding mechanism 16; the code of the delay parameter L to the adaptive sound source decoding mechanism 39 and the driving sound source codebook 41; the code of the sound source gain β to the adaptive sound source decoding mechanism 39; and the codes of the driving sound source code I and the sound source gain γ to the driving sound source decoding mechanism 42.

The adaptive sound source decoding mechanism 39 first outputs the delay parameter L to the sound source signal up-sampling mechanism 37 and the adaptive sound source codebook 38. The sound source signal up-sampling mechanism 37 up-samples the previously generated sound source signal input from the sound source signal generating mechanism 21. It does so only over the interval needed to generate the adaptive sound source vector for the value of the delay parameter L input from the adaptive sound source decoding mechanism 39, at a sampling rate that depends on the value of L, and outputs the result to the adaptive sound source codebook 38. The up-sampling rate is determined in the same way as in the sound source signal up-sampling mechanism 29 of the encoder.

From the up-sampled sound source signal input from the sound source signal up-sampling mechanism 37, the adaptive sound source codebook 38 outputs to the adaptive sound source decoding mechanism 39 the adaptive sound source vector whose vector length corresponds to the delay parameter L input from the adaptive sound source decoding mechanism 39. The adaptive sound source vector is the sound source signal of L samples cut out starting L samples in the past; when L ≥ frame length, a sound source signal of frame length is cut out starting L samples in the past.
The adaptive sound source decoding mechanism 39 decodes the adaptive sound source gain β from the code of the adaptive sound source gain β, generates an adaptive sound source signal by multiplying the adaptive sound source vector input from the adaptive sound source codebook 38 by the adaptive sound source gain β, and outputs it to the frame sound source generating mechanism 40. The frame sound source generating mechanism 40 generates an adaptive sound source signal of frame length from the adaptive sound source signal input from the adaptive sound source decoding mechanism 39, for example by repeating it periodically at every period L, and outputs it to the sound source signal generating mechanism 21.

The driving sound source codebook 41 stores the same N driving sound source vectors as the driving sound source codebook 34 on the encoding side. It cuts out, at the vector length corresponding to the delay parameter L, the driving sound source vector corresponding to the driving sound source code I input from the driving sound source decoding mechanism 42, and outputs it to the driving sound source decoding mechanism 42.

The driving sound source decoding mechanism 42 decodes the driving sound source gain γ from the code of the driving sound source gain γ, generates a driving sound source signal by multiplying the cut-out driving sound source vector input from the driving sound source codebook 41 by the driving sound source gain γ, and outputs it to the second frame sound source generating mechanism 43. The second frame sound source generating mechanism 43 generates a driving sound source signal of frame length from the driving sound source signal input from the driving sound source decoding mechanism 42, for example by repeating it periodically at every period L, and outputs it to the sound source signal generating mechanism 21.

The sound source signal generating mechanism 21 adds the adaptive sound source signal of frame length input from the frame sound source generating mechanism 40 and the driving sound source signal of frame length input from the second frame sound source generating mechanism 43 to generate a sound source signal, and outputs it to the sound source signal up-sampling mechanism 37 and the synthesis filter 22. The synthesis filter 22 performs linear prediction synthesis on the sound source signal input from the sound source signal generating mechanism 21, using the linear prediction parameters input from the linear prediction parameter decoding mechanism 16, and outputs the speech output 7.

The above is the characteristic operation of the speech decoding device of Embodiment 1.

According to Embodiment 1, when the optimum delay parameter is determined and the delay parameter l is shorter than the frame length, the speech input is averaged periodically to generate a target speech vector of vector length l, and the distortion of the synthesized speech vector generated by linear prediction synthesis of the adaptive sound source vector of vector length l is evaluated against it. Likewise, when the optimum driving sound source code is determined, the distortion of the synthesized speech vector generated by linear prediction synthesis of the driving sound source vector of vector length l is evaluated. Deterioration of the synthesized speech quality can therefore be avoided, and high-quality synthesized speech can be generated with a small amount of computation.

Embodiment 2.
In Embodiment 1 above, the frame sound source generating mechanisms 32 and 40 and the second frame sound source generating mechanisms 36 and 43 generate an adaptive sound source signal or driving sound source signal of frame length by periodically repeating, at every period L, the adaptive sound source signal or driving sound source signal whose vector length corresponds to the delay parameter L. Alternatively, the adaptive sound source signal or driving sound source signal of the vector length corresponding to the delay parameter L may be waveform-interpolated between frames, for example at every period L, to generate an adaptive sound source signal or driving sound source signal of frame length.

According to Embodiment 2, the change of the sound source signal between frames becomes smooth, the reproducibility of the synthesized speech is good, and the quality is improved.

Embodiment 3.
In Embodiments 1 and 2 above, the frame sound source generating mechanism and the second frame sound source generating mechanism generate an adaptive sound source signal of frame length and a driving sound source signal of frame length from the adaptive sound source signal and the driving sound source signal whose vector length corresponds to the delay parameter L, and these are added to generate a sound source signal of frame length. Alternatively, the adaptive sound source signal and the driving sound source signal of the vector length corresponding to the delay parameter L may first be added to generate a sound source signal of that vector length, which is then, for example, repeated periodically at every period L to generate a sound source signal of frame length.

Embodiment 4.

In Embodiment 1 above, both the encoder and the decoder adopt new configurations. However, the encoder of Embodiment 1 may be used as the encoder together with the conventional decoder shown in Fig. 12 as the decoder.

Embodiment 5.

In Embodiment 1 above, the target speech generating mechanism 28 generates the target speech vector of the vector length corresponding to the delay parameter l from the speech input of frame length. However, as shown in Fig. 3, the target speech vector may instead be generated from a speech input whose length is an integer multiple of the vector length corresponding to the delay parameter l.
According to Embodiment 5, the averaging process for generating the target speech vector does not need to handle vectors of different vector lengths, so the processing is simple. Furthermore, because speech input extending beyond the frame length is used in the evaluation during speech encoding, the reproducibility of the synthesized speech is good and the quality is improved.

Embodiment 6.

In Embodiment 1 above, the target speech generating mechanism 28 uses a simple average when generating, from the speech input, the target speech vector of the vector length corresponding to the delay parameter l. However, as shown in Fig. 4, the average may instead be weighted according to the power of the speech input of each vector length corresponding to the delay parameter l, for example with a larger weight for higher power.

According to Embodiment 6, in the averaging process for generating the target speech vector, speech encoding is performed with larger weights on the high-power portions of the speech input. The reproducibility of the high-power portions of the synthesized speech, which strongly affect subjective quality, is therefore good, and the quality is improved.

Embodiment 7.

In Embodiment 1 above, the target speech generating mechanism 28 uses a simple average when generating, from the speech input, the target speech vector of the vector length corresponding to the delay parameter l. However, as shown in Fig. 5, the average may instead be weighted according to the correlation among the speech inputs of each vector length corresponding to the delay parameter l, for example with a smaller weight for a speech input whose correlation with the others is low.

According to Embodiment 7, in the averaging process for generating the target speech vector, speech encoding is performed with reduced weights on the low-correlation portions when the speech input has periodicity of period l. Even for a speech input whose pitch period varies, a target speech vector with little distortion corresponding to one pitch period can therefore be generated, so the reproducibility of the synthesized speech is good and the quality is improved.

Embodiment 8.

In Embodiment 1 above, the target speech generating mechanism 28 uses a simple average when generating, from the speech input, the target speech vector of the vector length corresponding to the delay parameter l. However, as shown in Fig. 6, the average may instead be weighted according to the temporal position of the speech input of each vector length corresponding to the delay parameter l, for example with a larger weight for speech input near the frame boundary.
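Embodiments 6 to 8 all replace the simple average with a weighted average and differ only in how the weight of each segment is chosen. A minimal sketch of that common structure is given below, with a power-based weight as in Embodiment 6. The function names and the exact weight formula (mean squared amplitude of the segment) are assumptions for illustration; the text only says that higher power should receive a larger weight.

```python
def weighted_target_vector(frame, lag, weight_of):
    """Weighted average of successive lag-long segments of `frame`.

    `weight_of(index, segment)` returns a non-negative weight for a segment.
    If lag >= len(frame), the frame is returned without averaging.
    """
    if lag >= len(frame):
        return list(frame)
    sums = [0.0] * lag
    totals = [0.0] * lag
    for index, start in enumerate(range(0, len(frame), lag)):
        segment = frame[start:start + lag]
        weight = weight_of(index, segment)
        for i, x in enumerate(segment):
            sums[i] += weight * x
            totals[i] += weight
    # A position whose segments all have zero weight falls back to 0.0.
    return [s / t if t > 0 else 0.0 for s, t in zip(sums, totals)]


def power_weight(_index, segment):
    """Assumed weight for Embodiment 6: mean squared amplitude of the segment."""
    return sum(x * x for x in segment) / len(segment)
```

A position-based weight in the spirit of Embodiment 8 could be supplied in the same way, for instance a `weight_of` that grows with the segment index so that segments nearer the end of the frame count more.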
According to Embodiment 8, in the averaging process for generating the target speech vector, the target speech vector is generated with larger weights on the speech input near the frame boundary and is then encoded. The reproducibility of the synthesized speech near the frame boundary is therefore good, and the change of the synthesized speech between frames becomes smooth. This effect is especially significant in Embodiment 2, in which the sound source signal is generated by interpolation between frames.

Embodiment 9.

In Embodiment 1 above, the target speech generating mechanism 28 averages the speech input at every period l when generating, from the speech input, the target speech vector of the vector length corresponding to the delay parameter l. However, as shown in Fig. 7, the cut-out positions of the speech input may be finely adjusted before averaging, for example so that the cross-correlation between the speech inputs of each vector length corresponding to the delay parameter l becomes maximum.
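The fine adjustment of cut-out positions described for Embodiment 9 could look roughly like the following sketch. It is only one possible reading: the first segment is used as the reference, each later segment is shifted by at most `max_shift` samples to maximize its unnormalized correlation with that reference, only complete segments are averaged, and the next nominal position follows the adjusted one. None of these details, nor the function names, come from the text.

```python
def best_shift(reference, frame, start, lag, max_shift):
    """Shift in [-max_shift, max_shift] maximizing correlation with `reference`.

    Returns None when no shifted segment fits inside the frame.
    """
    best, best_corr = None, None
    for shift in range(-max_shift, max_shift + 1):
        begin = start + shift
        if begin < 0 or begin + lag > len(frame):
            continue
        corr = sum(r * x for r, x in zip(reference, frame[begin:begin + lag]))
        if best_corr is None or corr > best_corr:
            best, best_corr = shift, corr
    return best


def aligned_target_vector(frame, lag, max_shift=2):
    """Average lag-long segments after aligning each one to the first segment."""
    if lag >= len(frame):
        return list(frame)
    if max_shift >= lag:
        raise ValueError("max_shift must be smaller than lag")
    reference = [float(x) for x in frame[:lag]]
    sums = list(reference)
    count = 1
    start = lag
    while True:
        shift = best_shift(reference, frame, start, lag, max_shift)
        if shift is None:
            break  # no complete segment is left
        begin = start + shift
        for i, x in enumerate(frame[begin:begin + lag]):
            sums[i] += x
        count += 1
        start = begin + lag  # follow the drift of the pitch period
    return [s / count for s in sums]
```

With a pulse train whose period lengthens by one sample, the aligned average keeps a single sharp pulse, whereas averaging at fixed positions would smear it over two samples.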
According to Embodiment 9, in the averaging process for generating the target speech vector, the cut-out positions are finely adjusted so that the cross-correlation between the speech inputs of each vector length corresponding to the delay parameter l is maximized. Even for a speech input whose pitch period varies, a target speech vector with little distortion corresponding to one pitch period can therefore be generated, so the reproducibility of the synthesized speech is good and the quality is improved.

Embodiment 10.

Fig. 8 is a block diagram showing the overall configuration of the speech encoding device and speech decoding device of Embodiment 10 of the present invention. In this figure, parts identical to those in Fig. 1 are given the same reference numerals and their description is omitted.

In Fig. 8, the configuration that is new relative to Fig. 1 is as follows. 44 is a speech input up-sampling mechanism that up-samples the speech input. 45 is a target speech generating mechanism that generates a target speech vector of the vector length corresponding to the pitch period. 46 and 51 are driving sound source codebooks that output a driving sound source vector of the vector length corresponding to the pitch period. 47 is a driving sound source search mechanism that evaluates the distortion, relative to the target speech vector, of the synthesized speech vector obtained from the driving sound source vector, and searches for the driving sound source vector that minimizes the distortion. 48 is a second target speech generating mechanism that generates the target speech vector, of the vector length corresponding to the pitch period, used in the search for the second driving sound source vector. 49 and 54 are second driving sound source codebooks that output a second driving sound source vector of the vector length corresponding to the pitch period. 50 is a second driving sound source search mechanism that evaluates the distortion, relative to the second target speech vector, of the synthesized speech vector obtained from the second driving sound source vector, and searches for the driving sound source vector that minimizes the distortion. 52 is a driving sound source decoding mechanism that decodes the driving sound source signal of the vector length corresponding to the pitch period. 53 is a frame sound source generating mechanism that generates a driving sound source signal of frame length from the driving sound source signal of the vector length corresponding to the pitch period. 55 is a second driving sound source decoding mechanism that decodes the second driving sound source signal of the vector length corresponding to the pitch period. 56 is a second frame sound source generating mechanism that generates a driving sound source signal of frame length from the second driving sound source signal of the vector length corresponding to the pitch period.

The operation is described below, centering on the new configuration above. First, in the encoder 1, the pitch analysis mechanism 25 analyzes the speech input 5, extracts the pitch period P, and outputs it to the multiplexer 3, the speech input up-sampling mechanism 44, the target speech generating mechanism 45, the driving sound source codebook 46, and the second driving sound source codebook 49. The pitch period P may take non-integer rational values as well as integer values. Depending on the range in which P lies, and writing P_int for an integer value, it can take, for example, the following values.

When P < 45: P_int, P_int + 1/4, P_int + 1/2, P_int + 3/4
When 45 ≤ P < 65: P_int, P_int + 1/2
When 65 ≤ P: P_int

The speech input up-sampling mechanism 44 up-samples the speech input 5, for example over the frame interval that is the object of sound source signal encoding, at a sampling rate that depends on the pitch period input from the pitch analysis mechanism 25, and outputs the result to the target speech generating mechanism 45. The up-sampling rate is determined, for example, as follows.

When P < 45, 4-times up-sampling is performed.
When 45 ≤ P < 65, 2-times up-sampling is performed.
When 65 ≤ P, no up-sampling is performed.

The target speech generating mechanism 45 takes the up-sampled speech input of frame length input from the speech input up-sampling mechanism 44 and, in accordance with the pitch period P input from the pitch analysis mechanism 25, averages it by addition, for example at every period P, to generate a target speech vector of vector length P. It outputs this vector to the driving sound source search mechanism 47 and the second target speech generating mechanism 48. When P ≥ frame length, the averaging is not performed and the speech input of frame length is used as the target speech vector.
The driving sound source codebook 46 stores, for example, N driving sound source vectors generated from random noise. It cuts out, at the vector length corresponding to the pitch period P input from the pitch analysis mechanism 25, the driving sound source vector corresponding to the driving sound source code i input from the driving sound source search mechanism 47, and outputs it. When P ≥ frame length, it outputs a driving sound source vector of frame length.

The driving sound source search mechanism 47 performs, for each of the N driving sound source vectors, linear prediction synthesis on the cut-out driving sound source vector input from the driving sound source codebook 46, using the quantized linear prediction parameters input from the linear prediction parameter encoding mechanism 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the target speech vector input from the target speech generating mechanism 45 and the synthesized speech vector. Next, it compares and evaluates the distortion to find the driving sound source code I that minimizes it and the corresponding driving sound source gain γ, and outputs the codes of the driving sound source code I and the driving sound source gain γ to the multiplexer 3. At the same time, it generates a driving sound source signal by multiplying the driving sound source vector corresponding to the driving sound source code I by the driving sound source gain γ, and outputs it to the second target speech generating mechanism 48.

The second target speech generating mechanism 48 performs linear prediction synthesis on the driving sound source signal input from the driving sound source search mechanism 47, using the quantized linear prediction parameters input from the linear prediction parameter encoding mechanism 9, to generate a synthesized speech vector. It then obtains the difference between the target speech vector input from the target speech generating mechanism 45 and this synthesized speech vector, and outputs it as the second target speech vector to the second driving sound source search mechanism 50.

The second driving sound source codebook 49 stores, for example, N driving sound source vectors generated from random noise. It cuts out, at the vector length corresponding to the pitch period P input from the pitch analysis mechanism 25, the second driving sound source vector corresponding to the driving sound source code j input from the second driving sound source search mechanism 50, and outputs it. When P ≥ frame length, it outputs a driving sound source vector of frame length.

The second driving sound source search mechanism 50 performs, for each of the N driving sound source vectors, linear prediction synthesis on the cut-out second driving sound source vector input from the second driving sound source codebook 49, using the quantized linear prediction parameters input from the linear prediction parameter encoding mechanism 9, to generate a synthesized speech vector. It then obtains the perceptually weighted distortion between the second target speech vector input from the second target speech generating mechanism 48 and the synthesized speech vector. Next, it compares and evaluates the distortion to find the second driving sound source code J that minimizes it and the corresponding driving sound source gain γ2, and outputs the codes of the second driving sound source code J and the driving sound source gain γ2 to the multiplexer 3.
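The two-stage structure of Embodiment 10, in which the second search works on what the first stage left unexplained, can be illustrated with the sketch below. It is an assumption-laden simplification: squared error stands in for the perceptually weighted distortion, gains are unquantized, synthesis is a zero-state convolution with a given impulse response `h`, and the names `convolve`, `best_entry`, and `two_stage_search` are hypothetical.

```python
def convolve(excitation, h):
    """Zero-state synthesis filter output, truncated to the excitation length."""
    return [sum(h[k] * excitation[i - k] for k in range(min(i + 1, len(h))))
            for i in range(len(excitation))]


def best_entry(target, codebook, h):
    """Return (index, gain, synthesized vector) with the smallest squared error."""
    best = None
    for index, vector in enumerate(codebook):
        y = convolve(vector, h)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        score = corr * corr / energy  # maximizing this minimizes the error
        if best is None or score > best[0]:
            best = (score, index, corr / energy, y)
    if best is None:
        raise ValueError("codebook has no usable entry")
    return best[1], best[2], best[3]


def two_stage_search(target, first_codebook, second_codebook, h):
    """Search the first codebook, subtract its contribution, search the second."""
    i, gain1, y1 = best_entry(target, first_codebook, h)
    second_target = [t - gain1 * v for t, v in zip(target, y1)]
    j, gain2, _ = best_entry(second_target, second_codebook, h)
    return i, gain1, j, gain2
```

In this sketch the second target plays the role of the second target speech vector produced by mechanism 48, and the pair of indices and gains corresponds to the codes I, J, γ, and γ2 sent to the multiplexer.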
After the above encoding is complete, the multiplexer 3 sends the code corresponding to the quantized linear prediction parameters, together with the codes corresponding to the pitch period P, the driving sound source codes I and J, and the sound source gains γ and γ2, onto the transmission line 6.

The above is the characteristic operation of the speech encoding device of Embodiment 10.

Next, the operation of the decoder 2 is described.

First, the separation mechanism 4, which receives the output of the multiplexer 3, outputs the code of the linear prediction parameters to the linear prediction parameter decoding mechanism 16; outputs the code of the pitch period P to the driving sound source codebook 51 and the second driving sound source codebook 54; outputs the codes of the driving sound source code I and the sound source gain γ to the driving sound source decoding mechanism 52; and outputs the codes of the second driving sound source code J and the sound source gain γ2 to the second driving sound source decoding mechanism 55.

The driving sound source codebook 51 stores the same N driving sound source vectors as the driving sound source codebook 46 on the encoding side. It cuts out, at the vector length corresponding to the pitch period P, the driving sound source vector corresponding to the driving sound source code I input from the driving sound source decoding mechanism 52, and outputs it to the driving sound source decoding mechanism 52.
The driving sound source decoding mechanism 52 decodes the driving sound source gain γ from its code, generates a driving sound source signal by multiplying the cut-out driving sound source vector input from the driving sound source codebook 51 by the gain γ, and outputs it to the frame sound source generating mechanism 53. The frame sound source generating mechanism 53 takes the driving sound source signal input from the driving sound source decoding mechanism 52 and, for example, repeats it periodically at every period P to generate a driving sound source signal of frame length, which it outputs to the sound source signal generating mechanism 21.

The second driving sound source codebook 54 stores the same K driving sound source vectors as the second driving sound source codebook 49 on the encoding side. It cuts out, at the vector length corresponding to the pitch period P, the second driving sound source vector corresponding to the second driving sound source code J input from the second driving sound source decoding mechanism 55, and outputs it to the second driving sound source decoding mechanism 55. The second driving sound source decoding mechanism 55 decodes the second driving sound source gain γ2 from its code, generates a second driving sound source signal by multiplying the cut-out second driving sound source vector input from the second driving sound source codebook 54 by the gain γ2, and outputs it to the second frame sound source generating mechanism 56.
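The cut-out, gain scaling, and periodic repetition performed on the decoder side can be sketched in Python as follows. This is only an illustration under assumptions: the pitch period is taken to be a whole number of samples, "cutting out" is assumed to mean keeping the first P samples of the stored vector, and the function names `decode_excitation` and `periodize` are hypothetical.

```python
def decode_excitation(codebook, code, pitch_period, gain):
    """Cut the selected codebook vector to pitch_period samples and scale it.

    Assumption: the cut keeps the first pitch_period samples of the vector.
    """
    vector = codebook[code][:pitch_period]
    return [gain * v for v in vector]


def periodize(signal, frame_length):
    """Repeat signal end to end until frame_length samples have been produced."""
    if not signal:
        raise ValueError("signal must contain at least one sample")
    period = len(signal)
    return [signal[n % period] for n in range(frame_length)]
```

Applying `periodize` to the output of `decode_excitation` yields a frame-length signal; when the frame length is not a multiple of the period, the last repetition is simply truncated.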
The second frame sound source generating mechanism 56 takes the second driving sound source signal input from the second driving sound source decoding mechanism 55 and, for example, repeats it periodically at every period P to generate a driving sound source signal of the second frame length, which it outputs to the sound source signal generating mechanism 21.

The sound source signal generating mechanism 21 adds the frame-length sound source signal input from the frame sound source generating mechanism 53 and the driving sound source signal of the second frame length input from the second frame sound source generating mechanism 56 to generate a sound source signal, and outputs it to the synthesis filter 22. The synthesis filter 22 performs linear predictive synthesis on the sound source signal input from the sound source signal generating mechanism 21, using the linear prediction parameters input from the linear prediction parameter decoding mechanism 16, and outputs the speech output 7.

The above is the characteristic operation of the speech decoding device of Embodiment 10.

According to Embodiment 10, when the pitch period P of the speech input is shorter than the frame length, the speech input is periodically added and averaged to generate a target speech vector of vector length P, and the distortion relative to this target of the speech synthesis vector obtained by linear predictive synthesis of a driving sound source vector of vector length P is evaluated. This avoids degradation of the synthesized speech quality and allows high-quality synthesized speech to be generated with a small amount of computation.
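The periodic add-and-average step that produces the target speech vector can be illustrated with a short Python sketch. The assumptions here are not from the specification: the pitch period is an integer number of samples, segments are aligned to the start of the input, every segment is weighted equally, and a partial final segment contributes only to the positions it covers. The function name `target_speech_vector` is hypothetical.

```python
def target_speech_vector(speech, pitch_period):
    """Average consecutive pitch_period-long segments of speech, sample by sample."""
    if pitch_period <= 0:
        raise ValueError("pitch_period must be positive")
    if len(speech) < pitch_period:
        raise ValueError("speech must cover at least one pitch period")
    sums = [0.0] * pitch_period
    counts = [0] * pitch_period
    for n, sample in enumerate(speech):
        position = n % pitch_period  # position inside the current pitch cycle
        sums[position] += sample
        counts[position] += 1
    return [s / c for s, c in zip(sums, counts)]
```

The resulting vector has length P rather than frame length, which is why the subsequent codebook search only needs to synthesize and compare P samples per candidate.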
[Effects of the Invention]

As detailed above, according to the inventions described in claims 1 to 4, 6, 8, 9, and 11 to 14, the speech encoding device comprises: a target speech generating mechanism that generates, from the speech input, a target speech vector of the vector length corresponding to a delay parameter; an adaptive sound source codebook that generates, from previously generated sound source signals, an adaptive sound source vector of the vector length corresponding to the delay parameter; an adaptive sound source search mechanism that evaluates the distortion, relative to the target speech vector, of the speech synthesis vector obtained from the adaptive sound source vector, and searches for the adaptive sound source vector that minimizes the distortion; and a frame sound source generating mechanism that generates a sound source signal of frame length from the adaptive sound source vector that minimizes the distortion. This avoids degradation of the synthesized speech quality and allows high-quality synthesized speech to be generated with a small amount of computation.

Further, according to the invention described in claim 5, the vector length of the target speech vector is taken to be a rational number, so that when the target speech vector is generated from the speech input, it can be generated accurately regardless of the sampling period of the speech input. This avoids degradation of the synthesized speech quality and allows high-quality synthesized speech to be generated with a small amount of computation.
Further, according to the invention described in claim 7, the target speech generating mechanism divides a speech input whose length is an integer multiple of the vector length corresponding to the delay parameter into segments of that vector length, and adds and averages those segments to generate the target speech vector. The averaging performed when generating the target speech vector therefore never has to handle vectors of differing lengths, which keeps the processing simple, avoids degradation of the synthesized speech quality, and allows high-quality synthesized speech to be generated with a small amount of computation.

Further, if the speech input used to generate the target speech vector, whose length is an integer multiple of the vector length, is made at least as long as the frame length, then the speech input beyond the frame length is used in the evaluation during speech encoding, and the code is determined taking into account the influence that the synthesized speech of the frame has on subsequent frames. This improves the reproducibility of the synthesized speech and raises its quality.

Further, if the feature quantities of the speech input for each vector length include at least power information of the speech input, speech encoding is performed with greater weight on the high-power portions of the speech input. This improves the reproducibility of the high-power portions of the synthesized speech, which strongly affect subjective quality, and raises quality.

Further, if the feature quantities of the speech input for each vector length include at least power information of the speech input, then when the speech input is periodic with period l, speech encoding is performed with reduced weight on the portions of low correlation. Even for speech input whose pitch period fluctuates, a low-distortion target speech vector corresponding to one pitch period can be generated, which improves the reproducibility of the synthesized speech and raises quality.

Further, if the target speech generating mechanism determines the weights used when adding and averaging the speech input for each vector length according to the temporal relationship of the speech input of each vector length, then the target speech vector is generated and encoded with greater weight on the speech input near the frame boundary. This improves the reproducibility of the synthesized speech near the frame boundary and makes the change in synthesized speech between frames smooth.

Further, if the target speech generating mechanism finely adjusts the temporal relationship of the speech input of each vector length when adding and averaging it, then the cut-out positions are adjusted so that the cross-correlation between the speech input segments of vector length is maximized. Even for speech input whose pitch period fluctuates, a low-distortion target speech vector corresponding to one pitch period can be generated, which improves the reproducibility of the synthesized speech and raises quality.

Further, according to the invention described in claim 10, the frame sound source generating mechanism generates the sound source signal by interpolating sound source vectors of vector length between frames. This makes the change in the sound source signal between frames smooth, improves the reproducibility of the synthesized speech, and raises quality.

[Brief Description of the Drawings]

Fig. 1 is a block diagram showing the overall configuration of the speech encoding device and speech decoding device of Embodiment 1 of the present invention.
Fig. 2 is an explanatory diagram showing an example of the operation of the target speech generating mechanism of Embodiment 1 of the present invention.
Fig. 3 is an explanatory diagram showing an example of the operation of the target speech generating mechanism of Embodiment 5 of the present invention.
Fig. 4 is an explanatory diagram showing an example of the operation of the target speech generating mechanism of Embodiment 6 of the present invention.
Fig. 5 is an explanatory diagram showing an example of the operation of the target speech generating mechanism of Embodiment 7 of the present invention.
Fig. 6 is an explanatory diagram showing an example of the operation of the target speech generating mechanism of Embodiment 8 of the present invention.
Fig. 7 is an explanatory diagram showing an example of the operation of the target speech generating mechanism of Embodiment 9 of the present invention.
Fig. 8 is a block diagram showing the overall configuration of the speech encoding device and speech decoding device of Embodiment 10 of the present invention.
Fig. 9 is a block diagram showing the overall configuration of an example of a conventional speech encoding and decoding device.
Fig. 10(a) and (b) are explanatory diagrams showing an example of the adaptive sound source vector in the conventional speech encoding and decoding device.
Fig. 11(a) and (b) are explanatory diagrams showing an example of the adaptive sound source vector in an improved conventional speech encoding and decoding device.
Fig. 12 is a block diagram showing the overall configuration of another, different example of a conventional speech encoding and decoding device.
Fig. 13(a) and (b) are explanatory diagrams showing an example of the periodized driving sound source vector in the conventional speech encoding and decoding device.

[Explanation of Reference Numerals]

1: encoder
2: decoder
3: multiplexer
4: separation mechanism
5: speech input
6: transmission line
7: speech output
8: linear prediction parameter analysis mechanism
9: linear prediction parameter encoding mechanism
10, 17: adaptive sound source codebook
11: adaptive sound source search mechanism
12: error signal generating mechanism
13, 19: driving sound source codebook
14: driving sound source search mechanism
15, 21: sound source signal generating mechanism
16: linear prediction parameter decoding mechanism
18: adaptive sound source decoding mechanism
20: driving sound source decoding mechanism
22: synthesis filter
23, 24: driving sound source codebook
25: pitch analysis mechanism
26: delay parameter search range determining mechanism
27: speech input upsampling mechanism
28: target speech generating mechanism
29, 37: sound source signal upsampling mechanism
30, 38: adaptive sound source codebook
31: adaptive sound source search mechanism
32, 40: frame sound source generating mechanism
33: second target speech generating mechanism
34, 41: driving sound source codebook
35: driving sound source search mechanism
36, 43: second frame sound source generating mechanism
39: adaptive sound source decoding mechanism
42: driving sound source decoding mechanism
44: speech input upsampling mechanism
45: target speech generating mechanism
46, 51: driving sound source codebook
47: driving sound source search mechanism
48: second target speech generating mechanism
49, 54: second driving sound source codebook
50: second driving sound source search mechanism
52: driving sound source decoding mechanism
53: frame sound source generating mechanism
55: second driving sound source decoding mechanism
56: second frame sound source generating mechanism