TW526466B - Encoding and voice integration method of phoneme - Google Patents

Encoding and voice integration method of phoneme

Info

Publication number
TW526466B
TW526466B TW90126503A
Authority
TW
Taiwan
Prior art keywords
speech
phoneme
parameter
voice
encoding
Prior art date
Application number
TW90126503A
Other languages
Chinese (zh)
Inventor
Huang-Lin Yang
Original Assignee
Inventec Besta Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Besta Co Ltd filed Critical Inventec Besta Co Ltd
Priority to TW90126503A priority Critical patent/TW526466B/en
Application granted granted Critical
Publication of TW526466B publication Critical patent/TW526466B/en


Abstract

The invention provides a phoneme encoding and speech synthesis method that samples speech off-line. The sampled speech data are classified into three phoneme types — voiced, unvoiced, and silent. Voiced phonemes are encoded according to a pitch parameter, an amplitude parameter, and a spectrum parameter; unvoiced phonemes are recorded directly; for silent phonemes only the duration is recorded. The encoded phoneme data are then stored in a voice database. Speech is recovered simply by decoding and synthesizing the encoded phoneme data in that database: voiced phonemes are reconstructed by a speech synthesizer designed around the pitch, amplitude, and spectrum parameters; unvoiced phonemes are played back directly; and silent phonemes are rendered as silence of the recorded length. The result is synthesized speech close to the original.

Description


[Application Field of the Invention]

[Background of the Invention]

In the low- to mid-range electronic dictionary market, pronunciation by a real human voice has become the main advertised feature of electronic dictionaries. To stay competitive, every manufacturer must reduce production costs. Some manufacturers store directly recorded speech, but the data volume is large and the variety of output the system can produce is limited, making the approach costly; most manufacturers therefore use speech analysis-synthesis to approach natural human pronunciation, which saves memory in the electronic dictionary and improves sound quality.

Speech analysis-synthesis is a technique that, following a given processing method, analyzes the speech signal, extracts the necessary feature parameters, and uses those parameters to synthesize speech from a speech-production model. Because the analysis represents the original signal with the least possible digital data, it is also commonly called speech compression; it involves speech sampling as well as encoding and decoding. In speech waveform coding such as Adaptive Differential Pulse Code Modulation (ADPCM), the emphasis is on making the reconstructed signal resemble the original waveform as closely as possible — mathematically, the Minimum Mean Square Error Criterion is adopted. When the bit rate of the ADPCM method falls below 24 kbps (kilobits per second), however, the reconstructed quality degrades and the computational load is large.

The analysis-synthesis described above can compress the amount of speech data substantially, and (by applying encryption) offers the additional advantage of secure communication. Its drawback is that the stress, articulation, and pitch of the synthesized speech often differ from natural speech, making the result unnatural and at times even hard to recognize.

Even with compressed analysis-synthesis, there is still room to save memory. Moreover, existing analysis-synthesis techniques mostly operate on-line, so a run-time decision on whether the speech is voiced or unvoiced must be added; this decision frequently confuses the "voiced" and "unvoiced" parts, giving the synthesized speech a hoarse quality.

How, then, can analysis-synthesis produce speech close to natural speech (i.e., improve sound quality); how can it reach maximum compression (i.e., consume the least memory); and how can the analysis-synthesis process be kept simple? All of these have become important research topics.

[Objects and Summary of the Invention]

In view of the above problems of the prior art, the present invention provides a phoneme encoding and speech synthesis method whose purpose is to separate the phonemes of speech into voiced and unvoiced phonemes off-line and to process them separately, thereby simplifying the speech synthesis process.

Voiced phonemes are encoded by computing amplitude, pitch, and spectrum parameters, the spectrum parameter being encoded as LPC parameters; unvoiced (breath; unvoiced) phoneme files keep their original sound uncompressed; for the silent part, only the silence length is recorded. For decompression, the voiced part merely needs its amplitude, pitch, and spectrum parameters smoothed by interpolation and is then restored by a speech synthesizer; the unvoiced part is fetched by its address and restored as the original sound; and the silent part simply plays out the recorded silence duration.

According to the disclosed technique, the method comprises a voice-database construction stage and a speech-synthesis stage, described as follows.

The database construction stage includes the following steps: separating the speech phonemes into voiced, unvoiced, and silent phonemes; compression-encoding the voiced phonemes and address-encoding the unvoiced phonemes; and storing the encoded voiced phonemes together with the unvoiced and silent phonemes in the voice database.

Once a user types text data, the phonemes of the text are parsed and the corresponding phoneme data are read from the voice database; the method then enters the next stage.

The speech-synthesis stage synthesizes the pronunciation of the text from the phoneme data in the database, and includes the following steps: reading the voiced phoneme code, the unvoiced phoneme code, and the silent phoneme code of the phoneme data; synthesizing a voiced speech from the voiced phoneme code via a speech synthesizer; producing an unvoiced speech according to the unvoiced phoneme code; and generating a silent speech according to the silent phoneme code.

In the voice-database construction stage, voiced phonemes are encoded according to the pitch parameter, the amplitude parameter, and the spectrum parameter; unvoiced phonemes are encoded according to the pitch parameter and an address parameter; and silent phonemes are encoded according to the pitch parameter and a time parameter.

During synthesis, it suffices to extract the voiced, unvoiced, and silent speech codes from the voice database according to the speech-encoding rules and then decode and synthesize each of them to obtain the synthesized speech. The voiced speech is produced by a speech synthesizer designed according to the pitch parameter, the spectrum parameter, and the amplitude parameter.

The features and implementation of the invention are described in detail below, with reference to the drawings, through a preferred embodiment.

[Detailed Description of the Invention]

Judged by pronunciation, most languages are multi-syllabic. Taking English as an example: if English is subdivided into the different monosyllables formed from its phonetic symbols, it can be reduced to a few thousand basic pronunciation units. These units are the phonemes, and each distinct phoneme carries its own pitch. Speech can therefore be encoded and decoded in terms of the phonemes on which its pronunciation is based, and the present invention is an application of exactly this idea.

Next, because speech processing in the electronic dictionary market is fairly regular and demands a large amount of data compression, the invention adopts Linear Predictive Coding (LPC) for its encoding and decoding. LPC is based on a model of speech production (an all-pole vocal-tract filter) and achieves compression at a very low bit rate, so it is well suited as the coding method of the invention.

Referring to Fig. 1, the phoneme encoding and speech synthesis method of the invention comprises the following steps: distinguishing voiced, unvoiced, and silent phonemes (step 10); encoding the phonemes (step 20); storing the encoded voiced phoneme codes, unvoiced phonemes, and silent phonemes (step 30); decoding and smoothing the phonemes (step 40); and synthesizing speech (step 50). The flow divides into an encoding stage (steps 10 to 30) and a decoding stage (steps 40 to 50). The encoding stage centres on building the voice database, so it may also be called the database-construction stage. Once the user of the electronic dictionary presses the text to be pronounced, the dictionary parses the phonemes of that text, extracts the corresponding codes according to the encoding rules of the invention, decodes them, and restores and synthesizes the speech; this may be called the synthesis stage. The individual steps are explained one by one below.

First, in step 10, the speech phonemes (phonemes) are separated out from the pronunciation of the text, and each is classified as "voiced" (voiced), "unvoiced" (unvoiced), or silent. "Voiced" phonemes are the periodic part of the speech signal and can be compressed further; "unvoiced" phonemes are the aperiodic (nonperiodic) part, so they are recorded directly without compression; for silence, only its length is recorded.

Taking the English pronunciation in an electronic dictionary as an example, the letters and phonetic symbols (phonetic alphabet) follow rules that allow the voiced and unvoiced parts to be distinguished syllable by syllable, so the voiced and unvoiced parts of each entry can be marked in advance through the English database. For instance, unvoiced sounds include f, s, t, p and the like: in the pronunciation of "free", [fri], the [f] part is unvoiced while [ri] is voiced. The same reasoning applies to the speech processing of Mandarin and other languages. Using the information in the language itself, all phonemes can thus be divided off-line (off-line), before speech encoding, into the "voiced" and "unvoiced" classes. In this processing, the breath initials of syllables are treated as unvoiced and only the finals as voiced; the silent portions (which may contain slight noise) are all set to zero, and only the silence length is recorded.

After the phonemes of the speech are classified, the flow enters step 20, phoneme encoding. Since the invention divides the phonemes into the three pre-classified types "voiced", "unvoiced", and silent, each type is encoded in its own way. For voiced speech, the encoding uses three parameters: an amplitude parameter (RMS, root of mean square), a pitch parameter (Pitch), and spectrum parameters (RC's; reflection coefficients).

The amplitude parameter and the pitch parameter are obtained frame by frame (one frame being 180 sample points at an 8 kHz sampling rate), computing their values step by step. The spectrum parameters (RC's) are obtained by the LPC method, i.e., according to the transfer function

A0 / (1 + a1·z^-1 + a2·z^-2 + … + a10·z^-10)

where A0 is the amplitude parameter, z is the z-transform variable, and a1 to a10 are the LPC coefficients.
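The all-pole transfer function above corresponds to a simple difference equation: each output sample is the scaled input minus a weighted sum of previous outputs. A minimal sketch follows; the coefficient values are made up for illustration (the patent derives a1–a10 from the reflection coefficients RC0–RC9, a conversion not shown here):

```python
def lpc_synthesize(excitation, a, gain=1.0):
    """All-pole synthesis: s[n] = gain*e[n] - sum_k a[k]*s[n-k].

    Direct-form IIR realization of H(z) = A0 / (1 + a1*z^-1 + ... + aP*z^-P).
    """
    order = len(a)
    out = []
    for n, e in enumerate(excitation):
        acc = gain * e
        for k in range(1, order + 1):
            if n - k >= 0:
                acc -= a[k - 1] * out[n - k]
        out.append(acc)
    return out

# Illustrative low-order stable coefficients, driven by a unit impulse.
a = [-0.5, 0.25]
impulse = [1.0, 0.0, 0.0, 0.0]
print(lpc_synthesize(impulse, a))  # [1.0, 0.5, 0.0, -0.125]
```

The impulse response decays, as expected of a stable synthesis filter; the patent's order-10 case works identically with ten coefficients.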

From the three kinds of parameters above, one "voiced" speech frame (180 samples) can be encoded into 54 bits, a compressed bit rate equivalent to about 2.4 kbps. The bits of each parameter are allocated as follows:

Pitch (6 bits), RMS (6 bits), RC's (RC0–RC9) — allocation string 6 6 5 5 5 5 4 4 4 4 3 3, read as Pitch 6 bits, RMS 6 bits, and RC0–RC9 widths of 5, 5, 5, 5, 4, 4, 4, 4, 3, and 3 bits (54 bits in total).

As for "unvoiced" speech frames, since the invention records them directly, their pitch parameter value is defined as 1, and they are encoded as follows:

Pitch (6 bits), Index_of_unvoiced_speech (8 bits) — 6 + 8 bits (Idx)

where Idx is the index of the actual (breath) speech, i.e., the address at which it is stored.

A "silent" speech frame has its pitch parameter value set to 0 and is encoded as follows:


Pitch (6 bits), Length_of_silence (8 bits) — 6 + 8 bits (Ls)

where Ls is the length of the silence.

Next, the speech data encoded as above are recorded into the voice database, i.e., step 30. The steps above set out the encoding rules: the "voiced", "unvoiced", and "silent" parts of the phonemes are handled in different ways, which saves a considerable amount of memory space.

The completed voice database then serves as the basis for synthesis: speech data are read keyed on the pitch parameter. If Pitch > 1, a total of 54 bits is read and decoded back into voiced speech; if Pitch = 1, a further 8 bits (Idx) are read and the actual unvoiced (breath) data are loaded from address Idx — for English, the unvoiced data occupy roughly 1 Mbyte of memory; if Pitch = 0, 8 bits (Ls) are read and decoded into silence of length Ls.

In other words, because the adopted strategy processes the "voiced", "unvoiced", and "silent" parts of speech separately, the encoded data types differ as described above. Synthesizing speech simply follows the encoding rules of the invention. The operation of the synthesis stage, steps 40 to 50, is introduced next.
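The three record formats can be modelled as a bit-packing scheme. In this sketch the RC bit widths follow the allocation string read as Pitch 6, RMS 6, and 5, 5, 5, 5, 4, 4, 4, 4, 3, 3 bits for RC0–RC9 (54 bits total); that reading of the garbled allocation line, and the helper names, are assumptions:

```python
RC_BITS = [5, 5, 5, 5, 4, 4, 4, 4, 3, 3]  # assumed split of the 42 RC bits

def pack_voiced(pitch, rms, rcs):
    """Pack one voiced frame: 6-bit pitch, 6-bit RMS, quantised RC0-RC9."""
    assert pitch > 1 and len(rcs) == 10
    bits = (pitch & 0x3F) << 48 | (rms & 0x3F) << 42
    shift = 42
    for rc, width in zip(rcs, RC_BITS):
        shift -= width
        bits |= (rc & ((1 << width) - 1)) << shift
    return bits  # a 54-bit integer

def unpack_voiced(bits):
    pitch = (bits >> 48) & 0x3F
    rms = (bits >> 42) & 0x3F
    rcs, shift = [], 42
    for width in RC_BITS:
        shift -= width
        rcs.append((bits >> shift) & ((1 << width) - 1))
    return pitch, rms, rcs

frame = pack_voiced(40, 33, [17, 9, 3, 0, 7, 1, 2, 15, 5, 4])
print(unpack_voiced(frame))  # (40, 33, [17, 9, 3, 0, 7, 1, 2, 15, 5, 4])
```

Unvoiced and silent records would pack analogously as 6 + 8 bits (pitch = 1 plus Idx, or pitch = 0 plus Ls).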

First, the phoneme decoding and smoothing of step 40 is described; it, too, handles the three phoneme types separately.

For "voiced" phonemes, refer to Fig. 2, a block diagram of the speech synthesizer 100 of the invention. At synthesis time, the appropriate phonemes are first extracted, according to the phoneme decomposition rules, from the text data the user typed. An impulse train generator (excitation signal source) 101, capable of producing a pulse sequence whose period equals the pitch of the voiced phoneme, is used first; the sequence is then passed through a vocal tract filter 102, whose frequency response is determined by the RC's values; finally, the output speech energy is adjusted via a multiplier 103 according to the RMS value.

The impulse train generator 101 simulates the vibration of the human vocal cords; see Fig. 3. It builds a periodic sequence e(n) from the sequence

p[25] = { 8, -16, 26, -48, 86, -162, 294, -502, 718, -728, 184, 672, -610, -672, 184, 728, 718, 502, 294, 162, 86, 48, 26, 16, 8 }

with period equal to the pitch parameter. If Pitch > 25, then e(n) = {p[1], p[2], …, p[25], 0, …, 0}; if Pitch <= 25, then e(n) = {p[1], p[2], …, p[Pitch]}. The sequence e(n) is then passed through a lowpass filter (1 + 0.75·z^-1 + 0.125·z^-2) to obtain the excitation signal (Excitation Signal) fed to the vocal tract filter.

The vocal tract filter 102 simulates the frequency response of the oral cavity; its filter parameters are the spectrum parameters RC's computed by the LPC method. Its input signal is e(n) and its output is the speech s(n). Because the LPC encoding applied pre-emphasis (1 − 0.9875·z^-1) to strengthen the correct handling of high-frequency components, decoding must add a de-emphasis filter 1/(1 − 0.9875·z^-1).

In the multiplier of Fig. 2, a gain value (Gain) is applied: the RMS value of the decoded speech signal — the amplitude parameter above — is multiplied into the output of the vocal tract filter 102 to restore the speech energy, where:

Gain = RMS
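The excitation construction just described — the fixed 25-point pulse shape repeated at the pitch period, then lowpass-filtered — can be sketched directly from the numbers in the text. The original line is garbled enough that the lowpass could also be intended as an inverse (all-pole) form; it is applied here as the FIR polynomial as written:

```python
P = [8, -16, 26, -48, 86, -162, 294, -502, 718, -728, 184, 672,
     -610, -672, 184, 728, 718, 502, 294, 162, 86, 48, 26, 16, 8]

def one_period(pitch):
    """One period of the impulse train: p zero-padded or truncated to Pitch."""
    if pitch > 25:
        return P + [0] * (pitch - 25)
    return P[:pitch]

def excitation(pitch, n_samples):
    """Periodic e(n), then the lowpass 1 + 0.75*z^-1 + 0.125*z^-2 (FIR)."""
    period = one_period(pitch)
    e = [period[i % pitch] for i in range(n_samples)]
    out = []
    for n in range(n_samples):
        y = e[n]
        if n >= 1:
            y += 0.75 * e[n - 1]
        if n >= 2:
            y += 0.125 * e[n - 2]
        out.append(y)
    return out

print(len(one_period(40)), len(one_period(20)))  # 40 20
print(excitation(40, 3))                         # [8, -10.0, 15.0]
```

The filtered excitation would then drive the vocal tract filter 102, with the de-emphasis stage applied after synthesis.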

In addition, when synthesizing voiced phonemes, the pitch must be kept synchronous. The synchronization method is to synthesize, frame by frame, several consecutive pitch periods; the synthesized length must reach the frame length (180 sample points). Sample points left over from the previous synthesized frame — a period not yet filled to the total sample count — are carried over and completed in the next frame.

As shown in Fig. 3, at a sampling rate of 8,000 samples per second, one frame is about 180 points long; when a pitch period does not end exactly at the 180th point, the remainder of that period continues into the next frame, and so on.

Afterwards comes the second phase of step 40, smoothing: the pitch, amplitude, and RC parameters are smoothed. The parameters are smoothed by interpolation:

current parameter = previous frame parameter × (1 − prop) + current frame parameter × prop

where 0 <= prop (proportion) <= 1 and prop = (number of sample points already synthesized in the current frame) / (total number of sample points in the current frame).
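The interpolation rule above — blending the previous frame's parameters into the current frame's in proportion to how much of the frame has been synthesized — can be sketched as follows (the parameter values are illustrative):

```python
def smooth(prev, cur, n_done, frame_len):
    """param_j = prev*(1 - prop) + cur*prop, with prop = n_done/frame_len."""
    prop = n_done / frame_len
    return prev * (1.0 - prop) + cur * prop

def smooth_frame(prev_params, cur_params, n_done, frame_len=180):
    """Apply the rule to pitch, RMS and each RC in one step."""
    return [smooth(p, c, n_done, frame_len)
            for p, c in zip(prev_params, cur_params)]

# Halfway through a frame, each parameter sits midway between the two frames.
prev = [40.0, 0.5, 0.25]   # pitch, RMS, one RC (illustrative values)
cur = [44.0, 0.75, 0.125]
print(smooth_frame(prev, cur, 90))  # [42.0, 0.625, 0.1875]
```

At prop = 0 the previous frame's parameters are used unchanged; at prop = 1 the current frame's take over completely, so parameter trajectories vary continuously across frame boundaries.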

The encoding of voiced phonemes was described above in some detail, and the same description covers its synthesis. The synthesis flow for the three phoneme types is now summarized together; refer to Fig. 4, the phoneme decoding flowchart of the invention, which makes the concrete operation of steps 40 and 50 clearer.

Because the encoding keys every record on the pitch (Pitch) parameter — the "voiced" pitch parameter is the computed value (greater than 1), the "unvoiced" pitch parameter is 1, and the "silent" pitch parameter is 0 — the decoder can classify each record as "voiced", "unvoiced", or "silent" from the pitch data alone and process each case separately.

First, 6 bits are read in (step 401) to decide among "voiced", "unvoiced", and "silent". If the pitch is greater than 1 (step 402), the record must be a "voiced" phoneme; the next 48 bits of data — the amplitude parameter (RMS) and the spectrum parameters (RC's) — are read in (step 408), and the encoded "voiced" speech is processed by the speech synthesizer (step 409). If the pitch equals 0 (step 403), the record must be "silence"; 8 bits are read in (step 404) to obtain the silence length, and Ls × 8 points of silence are produced (step 407). Otherwise, the pitch is neither greater than 1 nor 0, so it must equal 1; 8 bits are read in (step 405) to find the storage address of the unvoiced (breath) sound, and the unvoiced sample points are read in from the database (step 406). Finally, the speech is output (step 410), restoring the original "voiced", "unvoiced", and "silent" parts respectively.
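The branch structure of Fig. 4 can be sketched as a reader over the bit stream. The bit-reader and database helpers here are hypothetical stand-ins for the patent's storage layer, and the voiced branch returns its payload unprocessed rather than running the synthesizer:

```python
class BitReader:
    """Minimal MSB-first bit reader over a list of 0/1 ints (hypothetical)."""
    def __init__(self, bits):
        self.bits = bits
        self.pos = 0
    def read(self, n):
        val = 0
        for _ in range(n):
            val = (val << 1) | self.bits[self.pos]
            self.pos += 1
        return val

def decode_record(reader, unvoiced_db):
    pitch = reader.read(6)                  # step 401: read 6 bits
    if pitch > 1:                           # step 402: voiced
        payload = reader.read(48)           # step 408: RMS + RC's
        return ("voiced", pitch, payload)   # step 409 would synthesize here
    if pitch == 0:                          # step 403: silence
        ls = reader.read(8)                 # step 404: read Ls
        return ("silence", [0] * (ls * 8))  # step 407: Ls*8 zero samples
    idx = reader.read(8)                    # step 405: pitch == 1, unvoiced
    return ("unvoiced", unvoiced_db[idx])   # step 406: fetch stored breath sound

def to_bits(value, width):
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]

# A silence record: pitch = 0, Ls = 3.
reader = BitReader(to_bits(0, 6) + to_bits(3, 8))
kind, samples = decode_record(reader, unvoiced_db={})
print(kind, len(samples))  # silence 24
```

Dispatching on the 6-bit pitch field alone is what lets the three record types share one stream without any extra type tag.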


Referring now to Fig. 5, the signal-processing flowchart of the speech synthesizer of the invention, the synthesis of "voiced" phonemes can be explained more clearly.

A "voiced" record occupies about 54 bits; the synthesis flow is as follows. First, in step 411, the parameters of the first frame are read in. Then, in step 412, set

N = 0, L = 180, Pitch0 = Pitch, RMS0 = 0, RC0(i) = RC(i), i = 0, 1, …, 9

after reading the RC parameters. Parameter smoothing is then carried out so that the sound quality improves; this is step 413:

prop = N / L
Pitch_j = Pitch0 × (1 − prop) + Pitch × prop
RMS_j = RMS0 × (1 − prop) + RMS × prop
RC_j(i) = RC0(i) × (1 − prop) + RC(i) × prop, i = 0, 1, …, 9

where prop is the proportion (proportion) and L is the frame size, initially L = 180.

Next, if N + Pitch_j > L (step 414) — that is, once more than one frame length has been consumed — the next frame is read in, entering step 415:

L = L − N + 180
N = 0

with the previous-frame parameters updated to Pitch0 = Pitch and RMS0 = RMS.
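The frame bookkeeping of steps 412 to 415 — consume one pitch period at a time with interpolated parameters, then pull in the next frame's parameters once the current frame's 180 points are spent — can be sketched as a loop. The exact ordering in Fig. 5 is only partially recoverable from the garbled text, so this follows the step numbers as reconstructed; the `frames` input and the returned schedule of (pitch, RMS) pairs are illustrative simplifications (real synthesis would emit waveform samples here):

```python
def voiced_schedule(frames, frame_len=180):
    """Walk pitch periods through successive frames, per steps 412-415.

    `frames` is a list of (pitch, rms) parameter pairs; returns the
    interpolated (pitch, rms) used for each synthesized pitch period.
    """
    schedule = []
    prev_pitch, prev_rms = frames[0][0], 0.0  # step 412: Pitch0 = Pitch, RMS0 = 0
    it = iter(frames)
    pitch, rms = next(it)
    n, L = 0, frame_len
    while True:
        prop = n / L                               # step 413: interpolate
        p = prev_pitch * (1 - prop) + pitch * prop
        r = prev_rms * (1 - prop) + rms * prop
        schedule.append((p, r))
        n += round(p)                              # one pitch period synthesized
        if n > L:                                  # step 414: frame exhausted
            try:
                prev_pitch, prev_rms = pitch, rms
                pitch, rms = next(it)              # read next frame's parameters
            except StopIteration:
                return schedule
            L, n = L - n + frame_len, 0            # step 415

sched = voiced_schedule([(40, 0.5), (44, 0.7)])
print(sched[0])  # (40.0, 0.0) - first period uses the fade-in initial values
```

The leftover-sample carry (L = L − N + 180) is what keeps periods pitch-synchronous across frame boundaries instead of truncating them at sample 180.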


The memory required for the phoneme files is small — the voiced data compress to about 2.4 kbps, far below direct 16-bit sampled recordings — so a large amount of memory space is saved while the sound quality is improved. Applying the smoothing further improves speech whose phonemes would otherwise join poorly. Moreover, because this encoding method processes voiced and unvoiced speech individually, the voiced/unvoiced misjudgments that occur in ordinary on-line speech coding — conditions that make the output sound hoarse — cannot arise; the unvoiced part keeps the original breath sound, maintaining the best unvoiced quality.

Although the invention is disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Those skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, whose protection is therefore defined by the scope of the claims appended to this specification.

[Brief Description of the Drawings]

Fig. 1 is a flowchart of the phoneme encoding and speech synthesis method of the invention;
Fig. 2 is a block diagram of the speech synthesizer of the invention;
Fig. 3 is a diagram of the simulated human vocal-cord vibration of the invention;
Fig. 4 is the phoneme decoding flowchart of the invention;
Fig. 5 is the signal-processing flowchart of the speech synthesizer of the invention;
Fig. 6A is the original speech waveform of the word "abbreviation";
Fig. 6B is the speech waveform of the word "abbreviation" encoded and synthesized by the method of the invention;
Fig. 6C is the speech waveform of the word "abbreviation" encoded and synthesized in the ordinary way;
Fig. 7A is the spectrogram of Fig. 6A;
Fig. 7B is the spectrogram of Fig. 6B; and
Fig. 7C is the spectrogram of Fig. 6C.

[Description of Reference Numerals]

100 speech synthesizer
101 impulse train generator
102 vocal tract filter
103 multiplier
Step 10 distinguish voiced, unvoiced, and silent phonemes
Step 20 encode the phonemes
Step 30 store the encoded voiced phoneme codes, unvoiced phonemes, and silent phonemes
Step 40 decode and smooth the phonemes

Step 50 synthesize speech
Step 401 read in 6 bits
Step 402 pitch > 1
Step 403 pitch = 0
Step 404 read in 8 bits
Step 405 read in 8 bits
Step 406 read in the unvoiced sample points from the database
Step 407 produce Ls × 8 points of silence
Step 408 read in 48 bits
Step 409 process via the speech synthesizer
Step 410 output speech
Step 411 read in the first frame parameters
Step 412 set N = 0, L = 180, Pitch0 = Pitch; RMS0 = 0, RC0(i) = RC(i), i = 0, 1, …, 9
Step 413 prop = N/L; Pitch_j = Pitch0 × (1 − prop) + Pitch × prop; RMS_j = RMS0 × (1 − prop) + RMS × prop; RC_j(i) = RC0(i) × (1 − prop) + RC(i) × prop, i = 0, 1, …, 9
Step 414 N + Pitch_j > L
Step 415 set L = L − N + 180, N = 0, Pitch0 = Pitch, RMS0 = RMS



Claims (1)

526466 VI. Scope of Patent Application
1. A phoneme encoding and speech synthesis method, which samples the speech of a language off-line and performs encoding and speech synthesis on the sampled speech phonemes of the language, comprising the steps of:
building a speech database, comprising the steps of:
sampling the speech phonemes of the language;
classifying the speech phonemes into voiced, unvoiced, and mute phonemes;
compression-encoding the voiced phonemes, address-encoding the unvoiced phonemes, and duration-encoding the mute phonemes; and
storing the compression-encoded voiced phonemes and the unvoiced and mute phonemes into the speech database;
when a user keys in a text, analyzing the phonemes of the text and reading the corresponding phoneme data from the speech database; and
synthesizing the speech of the text according to the phoneme data of the speech database, comprising the steps of:
reading the voiced phoneme codes, the unvoiced phoneme codes, and the mute phoneme codes of the phoneme data; and
synthesizing a voiced speech through a speech synthesizer according to the voiced phoneme codes, generating an unvoiced speech according to the unvoiced phoneme codes, and generating a mute speech according to the mute phoneme codes.
2. The phoneme encoding and speech synthesis method as recited in claim 1, wherein the sampling rate for sampling the language is 8,000 samples per second.
3. The phoneme encoding and speech synthesis method as recited in claim 1, wherein the compression encoding of the voiced phonemes is performed according to a pitch-period parameter, an amplitude parameter, and a spectrum parameter; the encoding of the unvoiced phonemes is performed with the pitch-period parameter and an address parameter; and the duration encoding of the mute phonemes is performed with the pitch-period parameter and a time parameter.
4. The phoneme encoding and speech synthesis method as recited in claim 3, wherein the pitch-period parameter and the amplitude parameter of the voiced phonemes are computed progressively in units of frames.
5. The phoneme encoding and speech synthesis method as recited in claim 3, wherein the spectrum parameter is encoded by linear predictive coding (LPC).
6. The phoneme encoding and speech synthesis method as recited in claim 1 or 3, wherein the address parameter records the storage address of the unvoiced phonemes of the sampled speech.
7. The phoneme encoding and speech synthesis method as recited in claim 1 or 3, wherein the time parameter records the silence duration of the sampled speech.
8. The phoneme encoding and speech synthesis method as recited in claim 1 or 3, wherein the pitch-period parameter value of the unvoiced phonemes is defined as 1.
9. The phoneme encoding and speech synthesis method as recited in claim 1 or 3, wherein the voiced speech is synthesized by the speech synthesizer according to the pitch-period parameter, the amplitude parameter, and the spectrum parameter, the speech synthesizer comprising:
a pulse-sequence generator for outputting the pitch-period parameter as an excitation signal;
a vocal-tract filter, which takes the spectrum parameter as its filtering parameter, for receiving the excitation signal and outputting it as a speech signal; and
a multiplier for multiplying the speech signal by the amplitude parameter to output a restored speech.
10. The phoneme encoding and speech synthesis method as recited in claim 1 or 3, wherein the unvoiced speech is generated by reading an unvoiced speech phoneme from the speech database according to the address parameter and generating the unvoiced speech from that unvoiced speech phoneme.
11. The phoneme encoding and speech synthesis method as recited in claim 1 or 3, wherein the mute speech is generated according to the time parameter by outputting a silence of amplitude 0 whose length corresponds to the time parameter.
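Claim 9's synthesizer is a classic source-filter pipeline: a pulse-sequence generator produces the excitation, an all-pole vocal-tract filter shapes it, and a multiplier applies the amplitude. The sketch below is a simplified illustration of that structure, not the patented implementation: it drives the filter with direct-form LPC coefficients (the patent itself stores reflection coefficients RC(0..9)), and all parameter values are made up for the example.

```python
import numpy as np

def synthesize_voiced(pitch_period, lpc_coeffs, gain, n_samples):
    """Sketch of the claim-9 pipeline: pulse generator -> all-pole
    vocal-tract filter -> amplitude multiplier."""
    # Pulse-sequence generator: one impulse every pitch period (excitation).
    excitation = np.zeros(n_samples)
    excitation[::pitch_period] = 1.0

    # Vocal-tract filter: s[n] = e[n] + sum_k a[k] * s[n-1-k] (all-pole IIR).
    speech = np.zeros(n_samples)
    for n in range(n_samples):
        acc = excitation[n]
        for k, a in enumerate(lpc_coeffs):
            if n - 1 - k >= 0:
                acc += a * speech[n - 1 - k]
        speech[n] = acc

    # Multiplier: scale the speech signal by the amplitude parameter.
    return gain * speech

# Illustrative values: 40-sample pitch period at 8 kHz (~200 Hz), a stable
# 2nd-order filter, and a gain of 0.8 over one 160-sample stretch.
out = synthesize_voiced(pitch_period=40, lpc_coeffs=[0.5, -0.25], gain=0.8,
                        n_samples=160)
print(out[:3])
```

Unvoiced speech (claim 10) would instead replay stored samples fetched by the address parameter, and mute speech (claim 11) is simply a run of zero-amplitude samples of the recorded duration.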
TW90126503A 2001-10-26 2001-10-26 Encoding and voice integration method of phoneme TW526466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW90126503A TW526466B (en) 2001-10-26 2001-10-26 Encoding and voice integration method of phoneme

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW90126503A TW526466B (en) 2001-10-26 2001-10-26 Encoding and voice integration method of phoneme

Publications (1)

Publication Number Publication Date
TW526466B true TW526466B (en) 2003-04-01

Family

ID=28450663

Family Applications (1)

Application Number Title Priority Date Filing Date
TW90126503A TW526466B (en) 2001-10-26 2001-10-26 Encoding and voice integration method of phoneme

Country Status (1)

Country Link
TW (1) TW526466B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI397902B (en) * 2004-03-01 2013-06-01 Dolby Lab Licensing Corp Method for encoding n input audio channels into m encoded audio channels and decoding m encoded audio channels representing n audio channels and apparatus for decoding


Similar Documents

Publication Publication Date Title
US6161091A (en) Speech recognition-synthesis based encoding/decoding method, and speech encoding/decoding system
JP2787179B2 (en) Speech synthesis method for speech synthesis system
CN105637583B (en) Adaptive bandwidth extended method and its device
JP5628163B2 (en) Apparatus and method for generating bandwidth extended output data
JP3557662B2 (en) Speech encoding method and speech decoding method, and speech encoding device and speech decoding device
US20070106513A1 (en) Method for facilitating text to speech synthesis using a differential vocoder
CN100568343C (en) Generate the apparatus and method of pitch cycle waveform signal and the apparatus and method of processes voice signals
CN103531203B (en) The method for coding and decoding voice and audio integration signal
JP6096934B2 (en) Decoder for generating frequency-extended audio signal, decoding method, encoder for generating encoded signal, and encoding method using compact selection side information
US4398059A (en) Speech producing system
EP1422690A1 (en) Apparatus and method for generating pitch waveform signal and apparatus and method for compressing/decompressing and synthesizing speech signal using the same
JPH079600B2 (en) Method and apparatus for encoding and decoding audio signals
CN106663441A (en) Improving classification between time-domain coding and frequency domain coding
US20070011009A1 (en) Supporting a concatenative text-to-speech synthesis
TW200822062A (en) Time-warping frames of wideband vocoder
KR101706123B1 (en) User-customizable voice revision method of converting voice by parameter modification and voice revision device implementing the same
KR20160025029A (en) Unvoiced/voiced decision for speech processing
JPH0563000B2 (en)
JPS60239798A (en) Voice waveform coder/decoder
TW526466B (en) Encoding and voice integration method of phoneme
Bergstrom et al. Code-book driven glottal pulse analysis
JP2000132193A (en) Signal encoding device and method therefor, and signal decoding device and method therefor
Lee et al. Applying a speaker-dependent speech compression technique to concatenative TTS synthesizers
JP3554513B2 (en) Speech synthesis apparatus and method, and recording medium storing speech synthesis program
JP2583883B2 (en) Speech analyzer and speech synthesizer

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees