TW535140B

TW535140B - Phoneme decoder

Info

Publication number: TW535140B
Application number: TW90130355A
Authority: TW
Inventors: Huang-Lin Yang
Original assignee: Inventec Besta Co Ltd
Priority date: 2001-12-07
Filing date: 2001-12-07
Publication date: 2003-06-01

Abstract

The present invention provides a phoneme decoder that synthesizes a speech signal from speech data encoded with three parameters: pitch, root mean square (RMS) amplitude, and reflection coefficients. The phoneme decoder comprises an initialization unit, a parameter loading unit, a smoothing unit, a synthesis unit, and a speech output unit. The initialization unit generates an initialization signal, after which the three speech parameters (pitch, RMS, and reflection coefficients) are loaded into the smoothing unit. The smoothing unit receives the speech parameter data, smooths it by interpolation, and passes the processed parameter data to the synthesis unit. The synthesis unit synthesizes speech from the pitch, reflection coefficients, and RMS, in that order, and outputs it to the speech output unit.

Description

[Field of the Invention]
The present invention relates to a speech synthesizer, and in particular to a speech phoneme decoder that decodes speech encoded on a phoneme basis.

[Background of the Invention]
In the low- to mid-range electronic dictionary market, real-human pronunciation has become a main selling point. To improve their competitiveness, manufacturers concentrate on improving speech functions while also lowering production costs. The real-human pronunciation promoted by some manufacturers requires recording specific speech; because the data volume is large and the speech the system can output is severely limited, this approach is costly. Most manufacturers therefore use speech analysis-synthesis to approximate real-human pronunciation, which lets an electronic dictionary save memory while improving sound quality.

Speech analysis-synthesis analyzes the speech signal by a given processing method, extracts the necessary feature parameters, and uses these parameters to synthesize speech according to a model of speech production. Each choice of feature parameters therefore has its own corresponding speech coding and speech synthesis methods.

Because speech analysis-synthesis represents the original signal with as little digital data as possible, it is also commonly called speech compression, and it involves speech sampling, encoding, and decoding. Adaptive differential pulse code modulation (ADPCM), a waveform coding method, for example, aims to make the reconstructed waveform as close as possible to the original; mathematically, it uses the minimum mean square error criterion. However, when the ADPCM bit rate falls below 24 kbps (kilobits per second), the quality of the reconstructed sound deteriorates, and the computational load is large.

Speech analysis-synthesis as described above can greatly compress speech data and also offers the additional advantage of confidential communication (using encryption). Its drawback, however, is that the loudness, harmonic content, and pitch period of the synthesized speech often differ from those of natural speech, making it sound unnatural and sometimes hard to recognize.

Even compressed speech analysis-synthesis still leaves room to save memory. Moreover, existing analysis-synthesis techniques mostly operate online, so they must also decide whether the speech is voiced; this decision often misclassifies voiced and unvoiced segments, which makes the synthesized speech sound hoarse. Three questions have therefore become important research topics: how to make the speech produced by analysis-synthesis closer to natural speech, that is, improve its quality; how to achieve maximum compression, that is, consume the least memory; and how to keep the analysis-synthesis process simple.

[Objective and Summary of the Invention]
In view of the above problems of the prior art, the present invention provides a coding method based on phoneme classification. Speech phonemes are classified into three types, voiced, unvoiced, and silent, and only the voiced portions are encoded. During decoding, the speech phoneme decoder of the present invention need only be applied to the encoded voiced portions to perform the computationally intensive speech decoding.
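A minimal sketch of how decoding could dispatch on the three phoneme classes. It assumes, as the detailed description specifies, that unvoiced segments are stored as raw samples and silent segments only as a length. The class names and the `synthesize_voiced` callback are hypothetical stand-ins for the phoneme decoder, not part of the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union


@dataclass
class Voiced:
    frames: List[int]    # LPC-encoded frames (e.g. 54-bit words); hypothetical layout


@dataclass
class Unvoiced:
    samples: List[int]   # original samples, stored uncompressed


@dataclass
class Silence:
    length: int          # only the duration (in samples) is recorded


Segment = Union[Voiced, Unvoiced, Silence]


def decode_segments(
    segments: Sequence[Segment],
    synthesize_voiced: Callable[[List[int]], List[int]],
) -> List[int]:
    """Rebuild a sample stream; only voiced segments go through the phoneme decoder."""
    out: List[int] = []
    for seg in segments:
        if isinstance(seg, Voiced):
            out.extend(synthesize_voiced(seg.frames))  # computation-heavy path
        elif isinstance(seg, Unvoiced):
            out.extend(seg.samples)                    # copied back as stored
        elif isinstance(seg, Silence):
            out.extend([0] * seg.length)               # regenerate the silent gap
        else:
            raise TypeError(f"unknown segment type: {type(seg).__name__}")
    return out
```

The design point is that the expensive synthesis path is entered only for voiced segments; the other two classes cost a copy or a fill.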

According to the technique disclosed herein, the present invention provides a speech phoneme decoder for decoding speech data encoded with an amplitude parameter (RMS), a pitch-period parameter (Pitch), and spectral parameters (RCs) obtained by linear predictive coding (LPC). The encoded speech data is stored in a speech database, from which the decoder reads and decodes it. The speech phoneme decoder of the present invention comprises an initialization unit, a parameter loading unit, a smoothing unit, a synthesis unit, and a speech output unit.

The initialization unit generates an initialization signal (initial). The parameter loading unit is connected to the initialization unit; it receives the initialization signal and loads speech data from the speech database one frame at a time. The smoothing unit receives the current frame's speech data loaded by the parameter loading unit and, stepping by one pitch period (Pitch) of the frame at a time, interpolates the amplitude, pitch-period, and spectral parameters of the frame separately. After the smoothing unit has finished processing the current frame, it sends a next-frame signal to the parameter loading unit, which then loads the next frame's speech data. The synthesis unit receives the speech data of each pitch period processed by the smoothing unit and synthesizes it into a speech signal; after processing each pitch period, it sends a next-pitch signal to the smoothing unit, which then processes the next pitch period. Finally, the synthesis unit sends the synthesized speech signal to the speech output unit, which outputs the speech.

The smoothing is performed by interpolation, which requires computing a proportion parameter (Prop). Moreover, because synthesis uses the pitch period as its unit, i.e., one period is synthesized at a time, the total length of the periods synthesized within a frame (Synths) must be less than the length of speech to be synthesized for that frame (Frame_len). The remaining unsynthesized length, $\mathrm{Frame\_res} = \mathrm{Frame\_len} - \mathrm{Synths}$, is carried over to the next frame, so the length to be synthesized in the next frame (180 samples being the nominal frame length) is

$$\mathrm{Frame\_len} = \mathrm{Frame\_res} + 180,$$

where

$$\mathrm{Prop} = \frac{\mathrm{Synths} + \mathrm{Pitch}_1}{\mathrm{Frame\_len}}.$$
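The frame-length bookkeeping can be illustrated with a short sketch that assigns whole pitch periods to frames and carries the remainder forward. It assumes 180-sample frames and, for simplicity, holds the pitch constant within each frame instead of interpolating it; `schedule_periods` and `FrameSchedule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

FRAME_SAMPLES = 180  # nominal frame length: 180 samples at 8 kHz


@dataclass
class FrameSchedule:
    frame_len: int        # Frame_len: samples to synthesize in this frame
    periods: List[int]    # whole pitch periods synthesized in this frame
    synths: int           # Synths: total samples synthesized in this frame
    frame_res: int        # Frame_res = Frame_len - Synths, carried to the next frame


def schedule_periods(frame_pitches: List[int]) -> List[FrameSchedule]:
    """Assign whole pitch periods to frames and carry the leftover forward.

    Simplification: the pitch is held constant within a frame (the decoder
    in the text interpolates it period by period).
    """
    schedules: List[FrameSchedule] = []
    frame_res = 0
    for pitch in frame_pitches:
        if pitch <= 0:
            raise ValueError("pitch period must be positive")
        frame_len = frame_res + FRAME_SAMPLES          # Frame_len = Frame_res + 180
        periods: List[int] = []
        synths = 0
        while synths + pitch <= frame_len:             # never overrun Frame_len
            periods.append(pitch)
            synths += pitch
        frame_res = frame_len - synths                 # Frame_res = Frame_len - Synths
        schedules.append(FrameSchedule(frame_len, periods, synths, frame_res))
    return schedules
```

With a constant 50-sample pitch, the frame lengths come out as 180, 210, and 190 samples, and the synthesized samples plus the final remainder always add up to 180 samples per frame.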

The features and implementation of the present invention are described in detail below by way of a preferred embodiment, with reference to the drawings.

[Detailed Description of the Invention]

Because speech processing in the electronic dictionary market is relatively regular and requires a high degree of data compression, the present invention uses linear predictive coding (LPC) for encoding and decoding. LPC is based on a model of speech production: it achieves compression by estimating the parameters of the signal's vocal tract filter and its fundamental period (pitch), and it can reach a very low bit rate, which makes it well suited as the coding method of the present invention. The invention classifies sound into three basic types of speech phonemes: voiced, unvoiced (aspirated), and silent. Voiced phonemes are compressed and encoded, unvoiced phonemes are kept as the original uncompressed audio, and for silent segments only the duration of the silence is recorded. The parameters computed under this classification are of three kinds: amplitude (RMS, root mean square), pitch period (Pitch, i.e., tone), and spectrum (RCs, reflection coefficients). The amplitude and pitch parameters are computed frame by frame (one frame is 180 samples at a sampling rate of 8 kHz). The spectral parameters (RCs) are derived from the LPC model, i.e., from the following transfer function H(z) in the z-domain:

$$H(z) = \frac{A_0}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_{10} z^{-10}}$$

where $A_0$ is the amplitude parameter, $z = e^{j\omega}$ is a complex variable, and $a_1$ to $a_{10}$ are the LPC parameters. With these three kinds of parameters, one voiced frame (180 samples) is encoded in 54 bits, a compressed bit rate of 2.4 kbps. The bits are allocated to the parameter fields as 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 3, and 3 bits, 54 bits in total.

When speech encoded by this phoneme coding method is decompressed, the voiced portions are restored by smoothing the amplitude, pitch, and spectral parameters by interpolation and then passing them through a speech synthesizer; the unvoiced portions are restored by retrieving the original audio by its address; and for the silent portions, only the silence duration needs to be retrieved. Because the speech database built in this way uses the above three parameters as the basis of its encoding, the speech phoneme decoder only needs to be designed according to the rules by which the database was built.
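The 54-bit budget and the 2.4 kbps figure follow from 180 samples at 8 kHz: one frame lasts 22.5 ms, and 54 bits / 0.0225 s = 2400 bit/s. The sketch below packs and unpacks one frame. Assigning the widths to fields in the order pitch, RMS, then RC1 to RC10 follows the encoding order stated in claim 2 but is otherwise an assumption, as are MSB-first packing and the field names; the values are quantizer indices, because the dequantization tables are not given.

```python
from typing import Dict, List, Tuple

SAMPLE_RATE = 8000   # Hz
FRAME_SAMPLES = 180  # samples per frame

# Assumed field order and widths: pitch, RMS, then RC1..RC10.
FIELDS: List[Tuple[str, int]] = [("pitch", 6), ("rms", 6)] + [
    (f"rc{i}", w) for i, w in enumerate([5, 5, 5, 5, 4, 4, 4, 4, 3, 3], start=1)
]
FRAME_BITS = sum(width for _, width in FIELDS)          # 54
BIT_RATE = FRAME_BITS * SAMPLE_RATE / FRAME_SAMPLES     # 2400.0 bit/s


def pack_frame(codes: Dict[str, int]) -> int:
    """Concatenate the quantizer indices MSB-first into one 54-bit word."""
    word = 0
    for name, width in FIELDS:
        value = codes[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_frame(word: int) -> Dict[str, int]:
    """Split a 54-bit word back into its fields (roughly the role of parameter decoder 241)."""
    codes: Dict[str, int] = {}
    shift = FRAME_BITS
    for name, width in FIELDS:
        shift -= width
        codes[name] = (word >> shift) & ((1 << width) - 1)
    return codes
```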

The speech phoneme decoder operates as follows. First, a bit stream, i.e., the encoded speech data selected from the speech database, is converted back into the parameters it was encoded from (amplitude, pitch, and spectral parameters), and these parameters are then passed to a synthesizer to synthesize speech. Synthesis proceeds one pitch period (Pitch) at a time. One set of parameters is read per frame, and the parameters of the previous frame (RMS0, RC0, Pitch0) are retained; the parameters used to synthesize each period (RMS, RC, Pitch) are obtained by smoothing between the current frame's parameters and the previous frame's. The smoothing is performed by interpolation, which requires computing a proportion parameter (Prop). Moreover, because synthesis uses the pitch period as its unit, i.e., one period is synthesized at a time, the total length of the periods synthesized within a frame must be less than the length of speech to be synthesized for that frame (Frame_len). The remaining unsynthesized length, $\mathrm{Frame\_res} = \mathrm{Frame\_len} - \mathrm{Synths}$, is carried over to the next frame, so the length to be synthesized in the next frame is

$$\mathrm{Frame\_len} = \mathrm{Frame\_res} + 180,$$

where $\mathrm{Prop} = (\mathrm{Synths} + \mathrm{Pitch}_1)/\mathrm{Frame\_len}$.

The speech phoneme decoder designed with the above coding method is described in detail below.

First, refer to Figure 1, the system architecture of the speech phoneme decoder of the present invention, which comprises an initialization unit 10, a parameter loading unit 20, a smoothing unit 30, a synthesis unit 40, and a speech output unit 50. The initialization unit 10 first generates an initialization signal (initial), according to which the parameter loading unit 20 sets the initial values of the parameters. The parameter loading unit 20 then loads, in sequence, all parameter values of the frame to be synthesized, i.e., the three speech parameters of one frame at a time. The smoothing unit 30 smooths the speech parameters loaded by the parameter loading unit 20, processing one pitch period of a frame at a time, sends the smoothed parameters to the synthesis unit 40 to be synthesized into speech, and sends a Next_Frame signal to the parameter loading unit 20 so that it loads the speech parameters of the next frame. The speech signal synthesized by the synthesis unit 40 is sent to the speech output unit 50 for output, and the synthesis unit 40 sends a Next_Pitch signal to the smoothing unit 30 so that it processes the speech parameters of the next pitch period.

Next, the speech phoneme decoder of the present invention is described by way of a specific embodiment; refer to Figure 2, which shows its signal transmission architecture. The initialization unit 10 generates the initialization signal (initial). The parameter loading unit 20 sets the initial values according to the initialization signal; it also loads the three parameters of each speech phoneme (RCj(10), RMSj, Pitchj), retains the three parameters of the previous frame (RC0(10), RMS0, Pitch0), and, from the synthesized length (L) of each frame reported by the smoothing unit, produces the length (M) to be processed in the next frame. The smoothing unit 30 receives the parameters transmitted by the parameter loading unit 20 and smooths the three parameters of the current frame (RCj(10), RMSj, Pitchj).


It then sends the processed parameters (RC(10), RMS, Pitch) to the synthesis unit 40, one pitch period at a time; the Next_Pitch signal requests the smoothing unit 30 to transmit the parameters of the next pitch period. The smoothing unit 30 also sends the synthesized length (L), i.e., the length of speech synthesized this time, to the parameter loading unit 20. Finally, the synthesis unit 40 synthesizes speech from the three parameters and sends it to the speech output unit 50 for output.

The bit widths of the signals, as shown in Figure 2, are:
initial: 1-bit control signal
RC0(10), RCj(10), RC: signed 8-bit
RMS0, RMSj, RMS: unsigned 16-bit
Pitch0, Pitchj, Pitch: unsigned 8-bit
M (synthesized frame length) and L (synthesized length): unsigned 9-bit
Next_Frame, Next_Pitch: 1-bit control signals
output of the synthesis unit: signed 16-bit

Next, refer to Figure 3, which shows the signal generation architecture from the initialization unit to the parameter loading unit. The initialization signal (initial) generated by the initialization unit 10 causes the parameter loading unit 20 to set the initial values of the parameters, including the synthesized length (L = 0), the synthesized frame length (M = 180 samples, taking a sampling rate of 8000 samples per second as an example), the amplitude (RMS0 = 0, where RMSj is the root mean square of the j-th frame), the pitch (Pitch0 = Pitch1, where Pitchj is the fundamental period of the j-th frame), and the spectral parameters (RC0(i) = RC1(i), i = 0, 1, 2, ..., 9; reflection coefficients). Data is read by the data loading unit 24: the 54-bit sequence read from the speech database as speech phoneme data is decoded by the parameter decoder 241 into RCj(10), RMSj, and Pitchj, which are input to the second register 25. The second register 25 passes the data it has read to the next stage, i.e., to the smoothing unit 30 and the third register 26. The third register 26 temporarily stores the data read this time as the reference for the next frame's speech parameters: when the Next_Frame command is received from the smoothing unit, the current parameters become the previous frame's parameter values (RC0(10), RMS0, Pitch0), which are input to the smoothing unit 30 as the reference for smoothing.

In addition, because the initial frame length (180) is not necessarily an integer multiple of the pitch period, a remainder is left over, and it is merged into the next frame's length as follows. The initialization signal (initial) is first input to the register 21 and to the synthesized-length calculation unit 33; at this point the register 21 outputs zero, so the output of the adder 23 is the first synthesized frame length (M = 180). To compute the next synthesized frame length, the register 21 loads M (9 bits) into the subtractor 22, which subtracts the previously synthesized length (L), and the adder 23 adds the preset frame-length constant (180), giving the length of the next frame:

$$M = M - L + 180.$$
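In software, the parameter loading unit of Figure 3 can be sketched as a pair of parameter registers plus the M update. This is an illustrative model, not the patented circuit: `Params` and `ParameterLoader` are hypothetical names, and the parameters are plain numbers rather than the fixed-width signals listed for Figure 2.

```python
from dataclasses import dataclass
from typing import Iterator, List

FRAME_SAMPLES = 180  # preset frame-length constant added by adder 23


@dataclass
class Params:
    rc: List[float]  # RC(0..9)
    rms: float
    pitch: int


class ParameterLoader:
    """Software model of the current/previous registers and the M update (Fig. 3)."""

    def __init__(self, frames: List[Params]) -> None:
        if not frames:
            raise ValueError("need at least one frame")
        self._frames: Iterator[Params] = iter(frames)
        self.m = FRAME_SAMPLES                # first M: register 21 outputs 0, adder adds 180
        self.current = next(self._frames)     # second register 25: RCj, RMSj, Pitchj
        # Third register 26 starts with RMS0 = 0, Pitch0 = Pitch1, RC0(i) = RC1(i).
        self.previous = Params(rc=list(self.current.rc), rms=0.0,
                               pitch=self.current.pitch)

    def next_frame(self, synthesized_len: int) -> bool:
        """Handle Next_Frame: compute M = M - L + 180 and shift the registers."""
        nxt = next(self._frames, None)
        if nxt is None:
            return False                      # no more frames to load
        self.m = self.m - synthesized_len + FRAME_SAMPLES
        self.previous = self.current          # current values become RC0, RMS0, Pitch0
        self.current = nxt
        return True
```

Starting the previous amplitude at zero means the first frame is interpolated up from silence, while pitch and spectrum start at the first frame's own values.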

Next, the smoothing unit 30 performs the smoothing; refer to Figure 4. The smoothing unit 30 comprises a pitch parameter smoothing unit 31, a proportion calculation unit 32, a synthesized-length calculation unit 33, an amplitude parameter smoothing unit 34, a spectral parameter smoothing unit 35, and a register 36. Once the smoothing unit 30 has received the parameter data of two frames, i.e., the speech phoneme parameters of the current frame (RCj(10), RMSj, Pitchj) and those of the previous frame (RC0(10), RMS0, Pitch0), it begins smoothing, performing one smoothing step per pitch period. First, the proportion calculation unit 32 computes the proportion parameter

$$\mathrm{Prop} = \frac{L}{M}.$$

The pitch parameters (Pitchj, Pitch0) are then processed by the pitch parameter smoothing unit 31 to obtain the smoothed pitch period

$$\mathrm{Pitch} = \mathrm{Pitch}_0 (1 - \mathrm{Prop}) + \mathrm{Pitch}_j \, \mathrm{Prop},$$

which is sent to the register 36 for temporary storage. The amplitude parameters (RMSj, RMS0) are smoothed by the amplitude parameter smoothing unit 34 to obtain the smoothed amplitude

$$\mathrm{RMS} = \mathrm{RMS}_0 (1 - \mathrm{Prop}) + \mathrm{RMS}_j \, \mathrm{Prop},$$

which is likewise stored in the register 36. The spectral parameters (RCj(10), RC0(10)) are smoothed by the spectral parameter smoothing unit 35 to obtain

$$\mathrm{RC}(i) = \mathrm{RC}_0(i) (1 - \mathrm{Prop}) + \mathrm{RC}_j(i) \, \mathrm{Prop}, \quad i = 0, 1, \ldots, 9,$$

which are likewise stored in the register 36. The pitch, amplitude, and spectral parameters held in the register 36 are sent to the next stage, the synthesis unit 40, which synthesizes this pitch period from them and then sends a Next_Pitch signal.
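The three interpolation formulas share one form, so the smoothing unit can be sketched as a single linear interpolation applied field by field. Rounding the interpolated pitch to a whole number of samples is an assumption made here (Pitch is an 8-bit integer in Figure 2); the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Params:
    rc: List[float]   # spectral parameters RC(0..9)
    rms: float        # amplitude (root mean square)
    pitch: float      # pitch period in samples


def smooth(prev: Params, cur: Params, synthesized_len: int, frame_len: int) -> Params:
    """Interpolate X = X0 * (1 - Prop) + Xj * Prop with Prop = L / M."""
    if frame_len <= 0:
        raise ValueError("frame length M must be positive")
    prop = synthesized_len / frame_len

    def lerp(old: float, new: float) -> float:
        return old * (1.0 - prop) + new * prop

    return Params(
        rc=[lerp(a, b) for a, b in zip(prev.rc, cur.rc)],
        rms=lerp(prev.rms, cur.rms),
        pitch=round(lerp(prev.pitch, cur.pitch)),  # assumption: whole samples
    )
```

At L = 90 of M = 180 (Prop = 0.5), a pitch moving from 40 to 60 samples is smoothed to 50; at L = 0 the previous frame's values are returned unchanged.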

The Next_Pitch signal controls the output of the register 36: when the register receives it, the smoothed speech parameters of the next pitch period are loaded. When the synthesized-length calculation unit 33 in the smoothing unit 30 receives the Next_Pitch signal, it updates the length synthesized so far in the current frame, $L = L + \mathrm{Pitch}$, and compares it with M; once L exceeds M, it sends a Next_Frame signal to the parameter loading unit 20, which loads the parameters of the next frame, and resets L to 0. The initialization signal (initial) from the initialization unit 10 is also sent to the synthesized-length calculation unit 33 and sets L = 0 to initialize the unit.

The remaining work is performed by the synthesis unit 40; refer to Figure 5. The synthesis unit 40 comprises a pulse sequence generator 41, a vocal tract filter 42, an amplitude adjustment unit 43, and a memory 44. The pulse sequence generator 41 outputs one period of a pulse signal that models the waveform of vocal-fold vibration. This waveform is stored in advance in the generator's internal memory, and its first Pitch samples are output; if Pitch exceeds the length of the stored pulse sequence, the excess is padded with zeros. For example, if the stored pulse sequence is $\{p[1], p[2], \ldots, p[25]\}$, then

$$e(n) = \begin{cases} \{p[1], p[2], \ldots, p[25], 0, \ldots, 0\}, & \mathrm{Pitch} > 25, \\ \{p[1], p[2], \ldots, p[\mathrm{Pitch}]\}, & \mathrm{Pitch} \le 25. \end{cases}$$
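The excitation rule above (take the first Pitch samples of the stored pulse, padding with zeros when Pitch exceeds its length) is sketched below. The 25 pulse values are placeholders, since the patent does not give the stored waveform; `excitation_period` is a hypothetical name.

```python
from typing import List, Sequence

# Placeholder glottal pulse p[1..25]; the real stored waveform is not given.
GLOTTAL_PULSE: List[int] = [
    0, 2, 8, 18, 30, 42, 52, 58, 60, 56, 46, 30, 12,
    -6, -20, -28, -30, -26, -20, -14, -8, -4, -2, -1, 0,
]


def excitation_period(pitch: int, pulse: Sequence[int] = GLOTTAL_PULSE) -> List[int]:
    """Return e(n) for one period: the first `pitch` pulse samples, zero-padded if needed."""
    if pitch <= 0:
        raise ValueError("pitch must be positive")
    head = list(pulse[:pitch])                 # truncate when pitch <= len(pulse)
    return head + [0] * (pitch - len(head))    # pad with zeros when pitch > len(pulse)
```

Because every call returns exactly `pitch` samples, each excitation period lines up with one smoothing step.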
The vocal tract filter 42 models the resonance that the oral cavity, nasal cavity, and vocal tract impose on the vibration of the vocal folds. It can be implemented as an all-pole filter or as a lattice filter, and its input filter parameters are RC(i), i = 0, 1, 2, ..., 9.
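One common way to realize the lattice form is an all-pole lattice synthesis filter driven directly by the reflection coefficients. The sketch cross-checks it against the equivalent direct-form filter obtained with the step-up recursion. The sign convention used (1/A(z) with A(z) = 1 - sum of alpha_j z^-j, so the patent's a_j would equal -alpha_j) is an assumption; other texts use the opposite sign for reflection coefficients. Function names are hypothetical.

```python
from typing import List, Sequence


def rc_to_lpc(k: Sequence[float]) -> List[float]:
    """Step-up recursion: reflection coefficients k_1..k_p -> predictor alpha_1..alpha_p.

    Convention (assumption): H(z) = 1 / (1 - sum_j alpha_j z^-j).
    """
    alpha: List[float] = []
    for i, ki in enumerate(k, start=1):
        prev = alpha
        # alpha_j^(i) = alpha_j^(i-1) - k_i * alpha_{i-j}^(i-1), and alpha_i^(i) = k_i
        alpha = [prev[j] - ki * prev[i - 2 - j] for j in range(i - 1)] + [ki]
    return alpha


def lattice_synthesis(excitation: Sequence[float], k: Sequence[float]) -> List[float]:
    """All-pole lattice filter driven by `excitation`, with reflection coefficients k_1..k_p."""
    p = len(k)
    b = [0.0] * p          # b[m] holds the backward signal b_m[n-1], m = 0..p-1
    out: List[float] = []
    for x in excitation:
        f = float(x)       # f_p[n]
        for m in range(p, 0, -1):
            f += k[m - 1] * b[m - 1]              # f_{m-1}[n] = f_m[n] + k_m b_{m-1}[n-1]
            if m < p:
                b[m] = b[m - 1] - k[m - 1] * f    # b_m[n] = b_{m-1}[n-1] - k_m f_{m-1}[n]
        if p:
            b[0] = f                              # b_0[n] = f_0[n]
        out.append(f)
    return out


def direct_form_synthesis(excitation: Sequence[float], alpha: Sequence[float]) -> List[float]:
    """Reference all-pole filter y[n] = e[n] + sum_j alpha_j y[n-j], for cross-checking."""
    y: List[float] = []
    for n, x in enumerate(excitation):
        acc = float(x)
        for j, a in enumerate(alpha, start=1):
            if n - j >= 0:
                acc += a * y[n - j]
        y.append(acc)
    return y
```

The lattice form needs no conversion to direct-form coefficients and stays stable whenever every |k_i| < 1, which is one reason it suits coefficients that are stored as reflection coefficients.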

The speech signal is synthesized by passing the pulse sequence through the vocal tract filter 42 and then through the amplitude adjustment unit 43, which computes the required amplitude adjustment from the RMS parameter. After a period has been synthesized, the amplitude adjustment unit 43 sends a Next_Pitch signal to the smoothing unit 30. The memory 44 temporarily stores the speech signal computed by the vocal tract filter 42 and the amplitude adjustment unit 43. In this way, the synthesis unit 40 synthesizes one fundamental period of speech from the parameters processed by the smoothing unit 30 and sends it from the memory 44 to the speech output unit 50 for output. The speech output unit 50 has at least one memory buffer in which each synthesized speech period is stored.

Although the present invention has been disclosed above by way of the foregoing preferred embodiment, it is not intended to limit the invention. Those skilled in the relevant art may make minor changes and refinements without departing from the spirit and scope of the present invention; the scope of patent protection of the invention is therefore defined by the claims appended to this specification.
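The amplitude adjustment unit 43 is described only as computing the required gain from RMS. One plausible rule, assumed here rather than taken from the patent, scales each synthesized period so that its root mean square equals the smoothed RMS value, then rounds and clips the result to the signed 16-bit output range listed for Figure 2. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def apply_rms_gain(period: Sequence[float], target_rms: float) -> List[float]:
    """Scale one synthesized period so that its RMS equals target_rms (assumed rule)."""
    if not period:
        return []
    mean_square = sum(x * x for x in period) / len(period)
    if mean_square == 0.0:
        return [0.0] * len(period)             # nothing to scale
    gain = target_rms / math.sqrt(mean_square)
    return [gain * x for x in period]


def to_int16(samples: Sequence[float]) -> List[int]:
    """Round and clip to the signed 16-bit range of the synthesis unit's output."""
    return [max(-32768, min(32767, round(s))) for s in samples]
```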

[Brief Description of the Drawings]
Figure 1 is a system architecture diagram of the speech phoneme decoder of the present invention;
Figure 2 shows a specific embodiment of the speech phoneme decoder of the present invention;
Figure 3 is an architecture diagram of the initialization unit and the parameter loading unit in the embodiment of the speech phoneme decoder;
Figure 4 is an architecture diagram of the smoothing unit in the embodiment of the speech phoneme decoder; and
Figure 5 is an architecture diagram of the synthesis unit in the embodiment of the speech phoneme decoder.

[Description of Reference Numerals]
10 initialization unit
20 parameter loading unit
21 register
22 subtractor
23 adder
24 data loading unit
241 parameter decoder
25 second register
26 third register
30 smoothing unit
31 pitch parameter smoothing unit
32 proportion calculation unit
33 synthesized-length calculation unit
34 amplitude parameter smoothing unit
35 spectral parameter smoothing unit

第16頁 535140Page 16 535140

圖式簡單說明 36 40 41 42 43 44 50 8u 8 s 16u 16s initial L M Next_F rame Next_P i tch Pitch P i t c h j PitchO Prop RC RC j RC〇 RMS 暫存器合成單元脈衝序列產生器聲道濾波器振幅調整單元記憶體語音輸出單元未帶符號之八位元（unsigned 8 bits) 帶符號定之八位元（signed 8 bits) 未帶符號之十六位元（unsigned 16 bits 帶符號之十六位元（signed 16 bits) 初始化信號已合成長度合成音框長度下一個音框(Next_F rame ) 下一個基週（Next_Pitch) 基週此次處理的基週上一次處理的基週比{列（Proportion) 頻譜參數此次處理之頻譜參數上一次處理之頻譜參數振幅參數（均方根值）Brief description of the drawing 36 40 41 42 43 44 50 8u 8 s 16u 16s initial LM Next_F rame Next_P i tch Pitch P itchj PitchO Prop RC RC j RC〇RMS register synthesizer unit pulse sequence generator channel filter amplitude adjustment unit Memory speech output unit unsigned 8 bits (signed 8 bits) signed fixed 8 bits (unsigned 16 bits signed 16 bits (signed 16 bits) Initialization signal has been synthesized length Synthesized sound frame length Next sound frame (Next_Frame) Next base cycle (Next_Pitch) Base cycle This processed base week The last processed base cycle ratio {Column (Proportion) Spectrum parameters this time Processed Spectrum Parameters Last Processed Spectrum Parameter Amplitude Parameter (root mean square value)



Claims

1. A speech phoneme decoder for decoding speech data encoded with an amplitude parameter (RMS), a pitch period parameter (Pitch), and spectral parameters (RCs) obtained by linear predictive coding (LPC), the encoded speech data being stored in a speech database, the speech phoneme decoder comprising:
an initialization unit for generating an initialization signal;
a parameter loading unit, connected to the initialization unit, for receiving the initialization signal and loading the speech data of one frame at a time from the speech database;
a smoothing unit for receiving the speech data of the frame and, taking one pitch period of the frame as the processing length, separately processing the amplitude parameter, the pitch period parameter, and the spectral parameters in the speech data of the frame by interpolation, and for sending a next-frame signal to the parameter loading unit so that the speech data of the next frame is loaded;
a synthesis unit for receiving the speech data of the pitch period processed by the smoothing unit and synthesizing it into a speech signal, the synthesis unit sending a next-pitch-period signal to the smoothing unit after it has processed the speech data of the pitch period, so that the speech data of the next pitch period is processed; and
a speech output unit for receiving the speech signal sent by the synthesis unit and outputting speech.

2. The speech phoneme decoder according to claim 1, wherein the parameter loading unit includes a parameter decoder that decodes the encoded data in the encoding order of the pitch period parameter, the amplitude parameter, and the spectral parameters, and outputs the decoded parameters in parallel to the smoothing unit.

3. The speech phoneme decoder according to claim 1, wherein, after loading the speech data of a frame, the parameter loading unit temporarily stores the speech data of the current frame and, upon receiving the next-frame signal from the smoothing unit, loads the speech data of the next frame and sends the speech data of the current frame together with the speech data of the next frame to the smoothing unit.

4. The speech phoneme decoder according to claim 1, wherein the smoothing unit processes the speech data of the current frame and the speech data of the next frame by interpolation and outputs the result to the synthesis unit.

5. The speech phoneme decoder according to claim 1, wherein the smoothing unit comprises:
a ratio calculation unit for calculating the ratio of the synthesized length of the current frame to the synthesized frame length;
a pitch parameter smoothing unit for receiving the pitch period parameter of the current frame and that of the next frame and calculating a synthesized pitch period parameter by interpolation;
an amplitude parameter smoothing unit for receiving the amplitude parameter of the current frame and that of the next frame and calculating a synthesized amplitude parameter by interpolation;
a spectral parameter smoothing unit for receiving the spectral parameters of the current frame and those of the next frame and calculating synthesized spectral parameters by interpolation;
a length calculation unit for calculating the synthesized length of the current frame, inputting the result to the ratio calculation unit, and outputting the next-frame signal to the parameter loading unit; and
a register for storing the synthesized pitch period parameter, the synthesized amplitude parameter, and the synthesized spectral parameters and outputting them to the synthesis unit.

6. The speech phoneme decoder according to claim 1, wherein the synthesis unit comprises:
a pulse sequence generator for generating an excitation signal from the pitch period parameter;
a vocal tract filter for receiving the excitation signal and processing it into a synthesized speech signal, using the spectral parameters as the filter parameters of the vocal tract filter; and
an amplitude adjustment unit for multiplying the synthesized speech signal by the amplitude parameter to produce restored speech and outputting the restored speech to the speech output unit.

7. The speech phoneme decoder according to claim 1 or 6, wherein the synthesis unit further includes a memory for temporarily storing the synthesized speech signal and the restored speech, and for outputting the restored speech to the speech output unit.
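The frame-loading and smoothing behaviour recited in claims 1 to 5 can be sketched in Python as a generator that yields one interpolated parameter set per pitch period. This is a minimal illustration under stated assumptions, not the patented implementation.

* Frames are flat records ordered (pitch, RMS, RC1, …, RCp), following the decoding order in claim 2.
* Interpolation is linear in Prop = L / M, where L is the length already synthesized and M is the synthesized frame length.
* Overshoot past the frame boundary carries into the next frame.
* The names `Frame`, `decode_frame`, `lerp` and `smooth_periods` are hypothetical.

Because interpolation always needs the next frame (claim 3), the sketch stops once the last frame becomes current. The patent does not describe an end-of-stream rule.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    pitch: float              # pitch period in samples (Pitch)
    rms: float                # amplitude parameter (RMS)
    rcs: Tuple[float, ...]    # spectral parameters / reflection coefficients (RC)


def decode_frame(record: Sequence[float]) -> Frame:
    """Split one record in the coding order: pitch, amplitude, spectral parameters."""
    pitch, rms, *rcs = record
    return Frame(float(pitch), float(rms), tuple(float(k) for k in rcs))


def lerp(a: float, b: float, prop: float) -> float:
    """Linear interpolation from a (prop = 0) towards b (prop = 1)."""
    return a + prop * (b - a)


def smooth_periods(records: Iterable[Sequence[float]], frame_len: int) -> Iterator[Frame]:
    """Yield one interpolated parameter set per pitch period.

    frame_len plays the role of M (synthesized frame length) and
    `synthesized` the role of L (length already synthesized).
    """
    frames = (decode_frame(r) for r in records)
    current = next(frames, None)
    if current is None:
        return
    synthesized = 0
    for nxt in frames:                      # each pass corresponds to one Next_Frame request
        while synthesized < frame_len:
            prop = synthesized / frame_len  # Prop = L / M
            period_len = max(1, round(lerp(current.pitch, nxt.pitch, prop)))
            yield Frame(
                period_len,
                lerp(current.rms, nxt.rms, prop),
                tuple(lerp(a, b, prop) for a, b in zip(current.rcs, nxt.rcs)),
            )                               # the synthesis unit then answers with Next_Pitch
            synthesized += period_len
        synthesized -= frame_len            # carry any overshoot into the next frame
        current = nxt
```

For example, with frames whose pitch values are 40, 60 and 80 samples and M = 100, the sketch yields pitch periods of 40, 48, 58 and 69 samples. Each yielded set could be passed to a per-period synthesis routine.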
