TW454173B - Semi-automatic human voice dubbing method - Google Patents

Semi-automatic human voice dubbing method

Info

Publication number
TW454173B
Authority
TW
Taiwan
Prior art keywords
signal
length
channel
cycle
dubbing
Prior art date
Application number
TW88122862A
Other languages
Chinese (zh)
Inventor
Hung-Yan Gu
Mau-Sung Hung
Original Assignee
Gu Hung Yan
Priority date
Filing date
Publication date
Application filed by Gu Hung Yan
Priority to TW88122862A
Application granted
Publication of TW454173B

Landscapes

  • Auxiliary Devices For Music (AREA)

Abstract

A semi-automatic human voice dubbing method in the field of voice dubbing and timbre conversion, which converts an original voice timbre into a variety of different timbres. Voice dubbing for dramatic works such as movies usually requires many dubbers. The invention therefore uses computer-based timbre conversion so that a single dubber can produce voices of different timbres, which saves manpower and makes a personal dubbing studio feasible. The system is called "semi-automatic" because the expression of emotions such as joy, anger, sorrow, and happiness still comes from the voice signal of the person who provides the input. The invention provides a semi-automatic human voice dubbing method that changes voice timbre by independently controlling the fundamental frequency, the vocal tract length, the voice source signal, and the internal proportions of the vocal tract. The method has also been implemented as an online real-time system, and listening tests confirm that it yields a rich variety of timbres.

Description

PA880503.TWP - 3/33

[Technical Field]

The present invention relates to a semi-automatic human voice dubbing method, and in particular to a method that uses a computer to simulate and control the characteristics of certain vocal organs in order to change voice timbre. The results of the invention are implemented as a real-time semi-automatic voice dubbing system.

[Prior Art]

Earlier research on timbre conversion, such as voice conversion, has concentrated almost entirely on the mapping between one specific speaker and another specific speaker, that is, on converting the timbre of speaker A into the timbre of speaker B. These methods must first record the same sentences spoken by both A and B, derive the timbre correspondence between the two speakers from those sentences, and only then convert A's speech into B's speech according to that correspondence. The conversion methods fall roughly into two classes. The first class is parametric; its effectiveness depends entirely on whether the parameters describe the speaker's characteristics accurately [1-3]. The second class is non-parametric; it uses various training procedures to obtain the timbre correspondence between the two speakers and needs a sufficient number of training sentences to give good results [4,5]. In addition, Baudoin and Stylianou compared the voice conversion performance of four frequency-domain methods.

On the question of how to convert one timbre into many different timbres, we have not found directly related work by others. The voice conversion research described above is nevertheless a useful reference.

Because the known techniques still have shortcomings and cannot provide a dubbing function, the inventors sought to improve on them and, after extensive research, succeeded in developing the present semi-automatic human voice dubbing method.

[Purpose of the Invention]

The purpose of the invention is to provide a semi-automatic human voice dubbing method that changes voice timbre by independently controlling the fundamental frequency, the vocal tract length, the voice source signal, and the internal proportions of the vocal tract. For example, the voice of an adult male can be input and converted by the dubbing system into the voice of a woman, a child, or a man with a different timbre.

A second purpose of the invention is to use a computer to simulate and control the characteristics of certain vocal organs and thereby change the timbre. The method has been implemented as a system that operates online in real time, and listening tests confirm that it yields a rich variety of timbre conversions.

As an application of voice conversion, consider the dubbing work in dramatic works such as movies, which usually requires many dubbers. We hope to use a computer for timbre conversion so that a single dubber can produce many different timbres. This saves the labor cost of voice dubbing in dramatic works and audiobooks and makes a personal dubbing studio feasible. We call the voice dubbing system studied here semi-automatic because the expression of joy, anger, sorrow, and happiness is still controlled by the person who provides the original voice signal; the system is responsible only for timbre conversion. We define a fully automatic voice dubbing system as one that needs no input voice signal and in which emotion is controlled by parameters, that is, a text-to-speech system. With current technology, building such a fully automatic system is clearly quite difficult.

[Technical Content]

The semi-automatic human voice dubbing method with the above advantages is a timbre conversion method that changes voice timbre by independently controlling the fundamental frequency, the vocal tract length, the voice source signal, and the internal proportions of the vocal tract. The method has been implemented as a voice dubbing system that operates online in real time, and listening tests confirm that it yields a rich variety of timbre conversions. The overall architecture comprises:

Real-time pitch detection: the input speech signal is cut into frames, and the position of each pitch peak within the frames is located. This part uses a newly proposed method.

Pitch and vocal tract length adjustment: this part modifies the previously proposed TIPW (Time Proportioned Interpolation of Pitch Waveform) syllable signal synthesis method [7] (Patent Publication No. 309588) so that, under real-time constraints, the speech signal can be adjusted according to the specified pitch and vocal tract length parameters.

Voice source signal adjustment: the voice source signal is obtained through LPC (linear prediction coding) analysis and is then adjusted on the basis of earlier research results [8,9].

Internal vocal tract proportion adjustment: based on the vocal tract model built from LPC analysis, the length ratio between the back and front parts of the vocal tract (pharyngeal cavity, oral cavity) is changed to simulate differences in internal vocal tract proportions between speakers.

[Brief Description of the Drawings]

The technical content, purpose, and effects of the invention can be understood further from the following detailed description of the embodiment and its drawings:

Figure 1 is the processing flow architecture of the semi-automatic human voice dubbing method;
Figure 2 is the flowchart of pitch position detection;
Figure 3(A) is a diagram of the buffer setting for pitch detection;
Figure 3(B) is a diagram of the buffer setting for pitch detection;
Figure 4 shows the distribution of the waveform-peak array elements X[1] to X[K];
Figure 5 shows a pitch selection result that contains errors;
Figure 6 shows the pitch correction range;
Figure 7 shows the result after pitch correction;
Figure 8 shows the voice source waveform when T1 increases;
Figure 9 shows how the period is set when computing the LPC residual signal;
Figure 10 is the flowchart of voice source signal adjustment;
Figure 11 shows the correspondence between the lattice filter and the vocal tract;
Figure 12 compares the cross-sectional areas of the /a/ sound under internal proportion adjustment;
Figure 13(a) is the spectrum analysis for internal proportion adjustment (original sound);
Figure 13(b) is the spectrum analysis for internal proportion adjustment (u = 1.4);
Figure 13(c) is the spectrum analysis for internal proportion adjustment (u = 0.6);
Table 1 gives the accuracy of pitch position selection;
Table 2 compares three vowels at u = 1.4 and u = 0.6;
Table 3 lists the 10 parameter settings used for timbre conversion;
Table 4 gives the listening test results for intelligibility and naturalness;
Table 5 gives the results of the discriminability evaluation.

[Reference Numerals of Main Parts]

1 Speech signal input
2 Real-time pitch detection
3 Pitch and vocal tract length adjustment
4 Voice source signal adjustment
5 Internal vocal tract proportion adjustment
6 Speech signal output

[Embodiment]

The main processing flow of the invention is shown in Figure 1. Each processing block is introduced briefly here; the details are given in the sections that follow.

1. Basic architecture

Speech signal input 1.

Pitch detection 2: the input speech signal is cut into frames, and the position of each pitch peak within the frames is located in real time.

Pitch and vocal tract length adjustment 3: this part modifies the TIPW (Time Proportioned Interpolation of Pitch Waveform) syllable signal synthesis method [7] (Patent Publication No. 309588) previously proposed by the applicant so that, under real-time constraints, the speech signal can be adjusted according to the specified pitch and vocal tract length parameters.

Voice source signal adjustment 4: the voice source signal is obtained through LPC (linear prediction coding) analysis and is then adjusted on the basis of earlier research results [8,9].

Internal vocal tract proportion adjustment 5: based on the vocal tract model built from LPC analysis, the length ratio between the back and front parts of the vocal tract (pharyngeal cavity, oral cavity) is changed to simulate differences in internal vocal tract proportions between speakers.

Speech signal output 6.

2. Real-time pitch position detection

Finding the pitch period (its length or its position) is an important part of many speech signal processing applications. For timbre conversion the pitch period is an especially important parameter, and the accuracy of the detected pitch positions has a major effect on system performance. Published pitch detection methods [10] fall roughly into two classes. One class derives the pitch period length from spectral or correlation analysis, for example from the autocorrelation coefficients of the LPC residual signal [11] or from cross-correlation coefficients [12]. The other class finds the pitch period directly on the waveform, for example by detecting peaks to locate pitch positions [13,14]. In general, the spectral and correlation methods give more accurate period lengths but take more time, while the time-domain waveform methods are less accurate but need less processing time.

2.1 Pitch position detection method

Some speech processing applications need only the pitch period length; they do not need the pitch positions and do not have to run in real time. The system of this invention, however, needs accurately located pitch positions and must also run in real time.

To meet the real-time requirement, pitch detection is performed directly on the time-domain waveform. Figure 2 is the flowchart of the pitch detection method developed for this invention; the function of each block is explained in the subsections below.

2.1.1 Buffer and frame length

The buffer length is set to 500 * sampling_rate / 11,025 samples, which is long enough to contain one full period of any signal at 22 Hz or above. Figure 3 shows how buffers are taken: pitch position detection on the waveform in the buffer of Figure 3(A) gives the pitch positions shown in Figure 3(B), and Figure 3(B) also shows where the next buffer is placed. The samples in a buffer are processed in frames. The frame length is half the buffer length, and each frame overlaps the next by 50%, so one buffer is divided into three frames. This division is used to increase the accuracy of pitch detection.

2.1.2 Energy and zero-crossing rate

Energy is defined here as the sum of the squared amplitudes of the samples in a frame. A periodic signal usually has much more energy than silence or noise, so the question is what frame energy threshold is reasonable. From experience, for a frame of 250 16-bit samples the energy threshold can be set to 256,000: when the energy of a frame is below 256,000, the periodicity flag of that frame is set to zero; otherwise it is set to 1.

The zero-crossing rate is the number of times the signal crosses the time axis per unit time. A periodic signal usually has a lower zero-crossing rate than a noise signal. The invention computes the zero-crossing rate per frame. For a frame of 250 samples (at a sampling rate of 11,025 Hz) the threshold is set to 60: when the zero-crossing rate of a frame exceeds 60, the frame is considered to contain no periodic signal and its periodicity flag is set to zero; otherwise it is set to 1.

Combining the energy and zero-crossing results, a buffer is considered to contain no periodic signal only when the periodicity flags of all three frames are zero.

2.1.3 Pitch position selection

The processing steps for pitch position selection are as follows.

Step (1): Set the periodicity flags from the energy and the zero-crossing rate, then check whether all flags are zero. If they are (no periodic signal), return zero periods immediately.

Step (2): Take the 15 largest amplitude values from each of the three frames, merge them, sort them in time order, and store them in the array Y[1] to Y[45].

Step (3): Compute the amplitude threshold and store it in the variable Clip. Take the maximum amplitude from each of the three frames of the current buffer and call them max1, max2, and max3. If the previous buffer had no periodic signal, set Clip = (max1 + max2 + max3) * 0.2. If it had a periodic signal, set Clip = min(max1, max2, max3) * 0.6.

Step (4): The points Y[1] to Y[45] do not all lie on peaks; they form clusters at and around the peaks. Therefore take, in order, those elements of Y[1] to Y[45] whose amplitude exceeds Clip and that are peak values, and store them in the array X[1] to X[K]. Figure 4 shows an example of the distribution of X[1] to X[K].

Step (5): Set the lower and upper thresholds on the period length. If the previous buffer had a periodic signal, use the average period of the previous buffer:

    lower period length threshold = ave_pitch * 0.75
    upper period length threshold = ave_pitch * 1.75        (1)

Otherwise:

    lower period length threshold = 35 * sampling_rate / 11,025
    upper period length threshold = 200 * sampling_rate / 11,025        (2)

In other words, when a periodic signal first appears, its fundamental frequency is required to lie between 55 Hz and 315 Hz. When the buffer being analyzed is not the start of a run of periodic signal, the thresholds follow the average period length observed so far.

Step (6): Find all the period peak positions in the current buffer from the array X[1] to X[K]. Taking the start of the buffer as the reference point, find the elements X[i] whose distance from the reference point lies between the lower and upper period length thresholds, and take the one with the largest amplitude as a period boundary point. Then take this boundary point as the new reference point, find the next group of X[i] that lie between the lower and upper thresholds, and pick the one with the largest amplitude as the next period boundary point. Continue in this way.
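The frame tests of section 2.1.2 and steps (3), (5), and (6) above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the energy and zero-crossing tests are combined with a logical AND (the text describes them separately), "maximum amplitude" is taken as the largest positive sample, and steps (2) and (4) (building the candidate arrays Y and X) are left out.

```python
import numpy as np

BASE_RATE = 11025  # sampling rate for which the constants in the text are quoted


def split_frames(buf):
    """Split one buffer into three half-length frames with 50% overlap."""
    half = len(buf) // 2
    hop = half // 2
    return [buf[i * hop:i * hop + half] for i in range(3)]


def periodic_flag(frame, energy_thr=256_000, zcr_thr=60):
    """Return 1 if the frame looks periodic, else 0 (section 2.1.2).

    Assumption: a frame must pass both the energy test and the
    zero-crossing test to be flagged as periodic.
    """
    x = np.asarray(frame, dtype=np.int64)
    energy = int(np.sum(x * x))
    negative = x < 0
    crossings = int(np.sum(negative[:-1] != negative[1:]))
    return int(energy >= energy_thr and crossings <= zcr_thr)


def clip_threshold(frames, prev_periodic):
    """Amplitude threshold Clip of step (3)."""
    peaks = [float(np.max(f)) for f in frames]
    if prev_periodic:
        return min(peaks) * 0.6
    return sum(peaks) * 0.2


def period_bounds(prev_periodic, ave_pitch, rate=BASE_RATE):
    """Lower and upper period-length thresholds of step (5)."""
    if prev_periodic:
        return ave_pitch * 0.75, ave_pitch * 1.75      # equation (1)
    return 35 * rate / BASE_RATE, 200 * rate / BASE_RATE  # equation (2)


def select_peaks(positions, amplitudes, lower, upper, start=0):
    """Step (6): chain period boundary points from the candidates X[1..K].

    positions and amplitudes describe the candidate peaks; the search
    stops when no candidate lies within [lower, upper] of the reference.
    """
    boundaries, ref = [], start
    while True:
        window = [(a, p) for p, a in zip(positions, amplitudes)
                  if lower <= p - ref <= upper]
        if not window:
            return boundaries
        ref = max(window)[1]  # position of the largest-amplitude candidate
        boundaries.append(ref)
```

With the wide first-buffer bounds of equation (2), `select_peaks` may skip over true peaks in favor of a larger one further away, which is one way the selection errors discussed in the text can arise; the tighter bounds of equation (1) reduce that risk once an average period is known.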


2.1.4 Pitch position correction

The selection method above already finds roughly 80% of the pitch positions correctly, but that accuracy is not yet adequate. The selection also tends to miss correct pitch positions whose amplitude is below Clip, or to split one longer period into two periods, and either error lowers the quality of the later timbre processing. To remedy these shortcomings, a method for correcting the selection result was developed. It rests on the observation that two adjacent period lengths do not differ drastically, and it uses the average period length from the previous subsection together with the amplitude change between adjacent peaks to correct the detected pitch positions.

Based on repeated experimental observation, the invention defines two period lengths T1 and T2 to be close if they satisfy the condition

    ABS(T1 - T2) < MAX(T1, T2) * 1/5        (3)

With this condition, T2 and T3 in Figure 5 are detected as not close. Figure 5 shows one buffer after pitch position selection, where Pi denotes the amplitude at point Si. The following steps are then carried out to correct the error.

Step (a): Take the two period lengths T1 and T2 from the period sequence. Because T1 and T2 satisfy condition (3) and also satisfy the limit on amplitude change,

    MAX(P1, P2) / MIN(P1, P2) < 2        (4)

T1 and T2 are kept without correction, and the average period length is set to

    ave_pitch = (T1 + T2) / 2        (5)

Step (b): Compare T2 and T3. Because they do not satisfy condition (3), T3 is corrected, that is, the position of S4 is corrected. As shown in Figure 6, the possible position of S4 is restricted to the range from a to b, where

    a = S3 + ave_pitch * 90%
    b = S3 + ave_pitch * 110%        (6)

The interval [a, b] is then searched for the peak with the largest amplitude. That peak becomes the new S4, the new value of T3 is computed, and the average period length is updated to

    ave_pitch = (ave_pitch + T3) / 2        (7)

Step (c): Continue checking Ti and Ti+1, applying step (a) or step (b) as appropriate, until the whole buffer has been processed. The result is the pitch position detection shown in Figure 7.

2.2 Evaluation of pitch position detection

After the pitch position correction was added, evaluation experiments were run. The results are shown in Table 1. Overall, the correction raises the pitch position selection accuracy to 95% or above. Table 1 also shows that the largest effect on accuracy comes from pitch periods that were not selected at all. On inspection, all of these missed periods lie in the first one or two periods where the sound changes from silence to a periodic signal, or in the last one or two periods where the periodic signal gives way to silence. The reason is that in these two transition regions the period length varies considerably and the energy is below the energy threshold, so the frames are judged to be non-periodic. The selection errors in Table 1 occur mainly on plosives, such as /六/ and /勹/.

Table 1. Accuracy of pitch position selection

                  Correctly   Incorrectly   Missed    Total
                  selected    selected      periods   periods   Accuracy
    Experiment 1     122           0            6       128      95.3%
    Experiment 2     283           3           11       297      95.3%
    Experiment 3    1279           8           60      1347      95.0%
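The correction procedure of section 2.1.4, steps (a) to (c), can be sketched in Python as below. Treat it as a hedged illustration rather than the patent's own code: the function names are hypothetical, the 1/5 tolerance follows condition (3) as printed here, the average period is seeded with the first period when no value from a previous buffer is available, a pair that fails only the amplitude test (4) is also re-searched, and any old marks overtaken by a corrected mark are dropped. None of these last three choices is spelled out in the text.

```python
import numpy as np


def similar(t1, t2, tol=1 / 5):
    """Condition (3): period lengths t1 and t2 are close."""
    return abs(t1 - t2) < max(t1, t2) * tol


def correct_marks(marks, signal, ave_pitch=None):
    """Correct pitch marks S1, S2, ... within one buffer.

    marks     : sample indices produced by the selection step
    signal    : the buffer samples, used for the amplitudes P_i
    ave_pitch : average period from the previous buffer, if known
    """
    signal = np.asarray(signal)
    s = list(marks)
    if len(s) < 3:
        return s
    if ave_pitch is None:
        ave_pitch = s[1] - s[0]  # assumption: seed with the first period
    i = 1
    while i < len(s) - 1:
        t_prev, t_next = s[i] - s[i - 1], s[i + 1] - s[i]
        p1, p2 = float(signal[s[i]]), float(signal[s[i + 1]])
        amp_ok = max(p1, p2) < 2 * min(p1, p2)            # condition (4)
        if similar(t_prev, t_next) and amp_ok:
            ave_pitch = (t_prev + t_next) / 2             # equation (5)
        else:
            # Search window [a, b] of equation (6), kept inside the buffer.
            a = max(s[i] + 1, s[i] + int(round(ave_pitch * 0.9)))
            b = min(len(signal) - 1, s[i] + int(round(ave_pitch * 1.1)))
            if a > b:  # the window lies beyond the buffer
                del s[i + 1:]
                break
            new = a + int(np.argmax(signal[a:b + 1]))
            # Assumption: keep only the old marks that lie after the new one.
            s[i + 1:] = [new] + [m for m in s[i + 1:] if m > new]
            ave_pitch = (ave_pitch + (new - s[i])) / 2    # equation (7)
        i += 1
    return s
```

For a pulse train with peaks every 100 samples, a mark list that misses the peak at 300 (`[0, 100, 200, 400, 500]`) or that contains a spurious mark at 250 (`[0, 100, 200, 250, 300, 400]`) is repaired to evenly spaced marks, which mirrors the two failure cases named in the text.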

3. Pitch and vocal tract length adjustment

Pitch and vocal tract length adjustment here covers two parts: pitch adjustment and vocal tract length adjustment. Broadly speaking, the factors that affect timbre most are the pitch (fundamental frequency) and the formant frequencies of the vocal tract. The formant frequencies are closely tied to vocal tract length: when the tract becomes shorter the formant frequencies rise, and when it becomes longer they fall, so the two are inversely related. A change in vocal tract length can therefore be achieved by raising or lowering the formant frequencies.

A simple older way to control pitch and formant frequencies is to speed up or slow down the playback of a tape recorder. This does raise or lower pitch and formant frequencies together and in proportion, but the duration of the utterance shrinks or stretches as well, so it is not a good method. The present invention therefore uses the TIPW syllable-signal synthesis method previously proposed by the applicant (Patent Publication No. 309588). TIPW controls duration, pitch, and formant frequencies independently, which solves the problem above and allows pitch and vocal tract length to be controlled separately. The processing steps of TIPW are described briefly below; for details see the original article [7].

3.1 Fundamental frequency adjustment

We first describe adjustment of the fundamental frequency (pitch). Physically, the fundamental frequency is the vibration frequency of the vocal folds. It is usually highest in children, next highest in adult women, and lowest in adult men. Adjusting the fundamental frequency already yields a first level of timbre conversion. The processing steps are:

(STEP 1) From the pitch-period adjustment ratio and the pitch-period lengths in the input signal, compute the length of the pitch period to be synthesized.

(STEP 2) From the time position of the period being synthesized, find the two corresponding original pitch periods in the input signal, then compute a weight for each of the two original waveforms by linear time ratio.

(STEP 3) Multiply each of the two original pitch-period waveforms by its own weight.

(STEP 4) Determine the length of the cosine window jointly from the length of the synthesized period and the length of the original period. Multiply the original pitch-period waveform by the two half cosine windows, align the results with the left and right boundaries of the synthesized pitch period, and overlap-add them.

(STEP 5) Add the two processed original pitch-period waveforms together.

3.2 Vocal tract length adjustment

Vocal tract length is adjusted by adjusting the formant frequencies of the tract, and the formant frequencies are adjusted by resampling. For example, to raise all formant frequencies to 1.25 times their original values, a new sample point is placed every 1.25 sample points along the original waveform (the sampling rate itself is unchanged), so a new sample point may fall between two old [...]
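The resampling idea of Section 3.2 can be sketched as follows. The interpolation formula used in the patent is not legible at this point, so the sketch substitutes plain linear interpolation between the two neighbouring old samples; that substitution, the function name `resample_period`, and the list-based waveform are assumptions made for illustration only.

```python
def resample_period(period, ratio):
    """Read one pitch-period waveform at steps of `ratio` old samples.

    ratio > 1 compresses the waveform (formant frequencies move up);
    ratio < 1 stretches it (formant frequencies move down).
    Linear interpolation is an assumed stand-in for the source's own
    interpolation formula.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    last = len(period) - 1
    out = []
    k = 0
    while True:
        pos = k * ratio              # position of the k-th new sample
        if pos > last:
            break
        i = int(pos)                 # old sample just to the left
        frac = pos - i               # fractional distance to the next one
        nxt = period[min(i + 1, last)]
        out.append(period[i] * (1.0 - frac) + nxt * frac)
        k += 1
    return out
```

With `ratio = 1.25` an 11-sample period yields 9 new samples, which matches the description of taking one new sample every 1.25 old samples while leaving the sampling rate unchanged.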

[...] Once A, B, and C have been found, substituting the value x − x0 for x in f(x) gives the new sample value. Continuing in this way yields a new period waveform. Finally, the resampled waveform replaces the original pitch-period waveform used in (STEP 3) of the pitch adjustment in Section 3.1, which gives a newly synthesized period waveform that has undergone both pitch adjustment and vocal tract length adjustment.

4. Glottal source signal adjustment

The source signal is the airflow signal produced by vibration of the vocal folds. Its frequency (the rate at which the vocal folds open and close) is the fundamental frequency mentioned earlier, whose adjustment was discussed in the previous section. Source signal adjustment in this section means adjusting the waveform of the source signal. As shown in Figure 8, one period of the source signal is divided at instant a, the moment when the vocal folds are opened widest, into two parts T1 and T2, which are lengthened and shortened against each other so that T1 + T2 = T is still maintained. According to earlier research, such a change alters the timbre of the voice: increasing T1 gives the voice a [...] effect, while increasing T2 gives it a shimmer effect.

So that the method can run in real time, the source waveform is not obtained by integrating the residual signal; an approximation is used instead, as described later in this section. A further reason is that the source signal obtained from LPC analysis is often less ideal than the one in Figure 8 and does not contain just a single peak, so how would a program tell which peak is the one wanted?

Before explaining how instant a, the highest point of the source signal, is determined, we first explain how the residual signal is obtained. One period of the speech signal is taken and analyzed by LPC to build an all-pole model of the vocal tract; passing the analyzed speech signal through the inverse of the all-pole model then gives the residual signal [11]. Comparing the residual signal with the source signal shows that the point where the residual amplitude is largest is usually point a of the source signal in Figure 8. Examining the correspondence between the residual signal and the original signal further shows that the residual amplitude also reaches its maximum when the original speech signal reaches a pitch-period endpoint. We therefore infer that a pitch-period endpoint of the original speech signal is usually the position of maximum amplitude of the source signal.

The one-period speech signal referred to above is the signal between b and c in Figure 9, where b and c are defined as b = (S1 + S2) / 2 and c = (S2 + S3) / 2. The invention also sets time point a of Figure 8 equal to S2, the pitch-period boundary point in the original speech signal. LPC analysis is applied to this one-period speech signal.

4.2 Source signal adjustment method

Following the inference in Section 4.1, the parameters a, b, and c of Figure 9 are used to set the values of T1 and T2 in Figure 8:

$$T_1 = a - b, \qquad T_2 = c - a \tag{10}$$

Because point a is placed on the period boundary S2, T1 in Figure 8 is usually not equal to T2. To keep the period length unchanged, the adjustments of T1 and T2 are coupled, and the adjustment rule is

$$\begin{aligned}
\text{if } T_1 < T_2:&\quad T_1' = T_1 \times R, \qquad T_2' = T - T_1' \\
\text{otherwise}:&\quad T_2' = T_2 \times (2 - R), \qquad T_1' = T - T_2'
\end{aligned} \tag{11}$$

where T = T1 + T2 and R is the adjustment ratio, with 0 < R < 2. Once the new T1′ and T2′ are known, the number of samples in the rising and falling intervals of the source signal is changed by resampling in the same way as in Section 3.2. The complete source signal adjustment flow is shown in Figure 10. Perceptually, when R is much larger than 1.0 (for example 1.5), the adjusted speech sounds like a voice produced with a dry (inflamed) throat; when R is much smaller than 1.0 (for example 0.5), the speech has an unsteady, flickering quality.
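Formulas (10) and (11) amount to a small piece of arithmetic, sketched here in Python. The function name `adjust_source_split` and the use of sample indices S1, S2, S3 as plain numbers are illustrative assumptions; the definitions of a, b, c and the two branches follow the text.

```python
def adjust_source_split(s1, s2, s3, r):
    """Return the adjusted (T1', T2') for one source period.

    s1, s2, s3 : three consecutive pitch marks (sample positions)
    r          : adjustment ratio R, with 0 < R < 2
    """
    if not 0.0 < r < 2.0:
        raise ValueError("R must satisfy 0 < R < 2")

    b = (s1 + s2) / 2.0   # start of the analysed period
    c = (s2 + s3) / 2.0   # end of the analysed period
    a = s2                # instant a is placed on the pitch mark S2

    # Formula (10)
    t1 = a - b
    t2 = c - a
    t = t1 + t2           # original period length, kept unchanged

    # Formula (11): scale the shorter side, give the rest to the other side.
    if t1 < t2:
        new_t1 = t1 * r
        new_t2 = t - new_t1
    else:
        new_t2 = t2 * (2.0 - r)
        new_t1 = t - new_t2
    return new_t1, new_t2
```

In both branches R > 1 lengthens T1 and R < 1 lengthens T2, and because the scaled side is never longer than T/2 and its factor is below 2, both results stay positive and still sum to T.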

5. Internal vocal tract ratio adjustment

The human vocal tract consists of the pharyngeal, oral, and nasal cavities that follow the vocal folds. Because everyone's physiology differs, the timbre of the voice differs too. Our consideration here is that two people with the same vocal tract length still sound different. We believe that the internal ratio of the vocal tract (the ratio of the lengths of the pharyngeal and oral cavities) is one of the most influential causes, so this section describes a method for adjusting that internal ratio.

From earlier research it is known that the reflection coefficients obtained by LPC analysis represent the cross-sectional area ratios of adjacent sections inside the vocal tract [15], and that the lattice filter built from the reflection coefficients, shown in Figure 11, can serve as a model of the vocal tract. Because the reflection coefficients K1 to Kn from LPC analysis represent the area ratios of the n adjacent sections between the vocal folds and the lips, the invention uses them as the basis for adjusting the internal ratio, as detailed below.

5.1 Method for adjusting the internal vocal tract ratio

The method first derives the relative cross-sectional area of each section of the vocal tract from the reflection coefficients, then adjusts the areas of the sections by interpolation, and finally converts the changed areas back into a new set of reflection coefficients, which define the new vocal tract model after the internal ratio adjustment. The detailed steps are:

(STEP a) Perform LPC analysis on each period of the speech signal to obtain the reflection coefficients and the residual signal, with the LPC order L set to 24.

(STEP b) Substitute K_1 to K_L into formula (12) to obtain the vocal tract cross-sectional areas Area_1 to Area_L, with the first area in the recursion set to 100:

$$\mathrm{Area}_{i+1} = \frac{1 - K_i}{1 + K_i} \times \mathrm{Area}_i \tag{12}$$

(STEP c) According to the desired internal adjustment ratio u, interpolate the leading group of areas, starting at Area_1, into a new leading group starting at Area′_1, and interpolate the remaining areas, ending at Area_L, into the remaining new areas, ending at Area′_L. [The subscripts that mark the boundary between the two groups are illegible in the source.]

(STEP d) Substitute Area′_1 to Area′_L into formula (13) to obtain the adjusted reflection coefficients K′_1 to K′_L:

$$K_i' = \frac{\mathrm{Area}_i' - \mathrm{Area}_{i+1}'}{\mathrm{Area}_i' + \mathrm{Area}_{i+1}'} \tag{13}$$

(STEP e) Replace the reflection coefficients K in Figure 11 with K′, then feed the residual signal obtained in (STEP a) into the lattice filter. The output is the speech signal with the internal vocal tract ratio changed.
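Steps (b) to (d) can be sketched as below. Several details are assumptions rather than the patent's specification, because the relevant subscripts are illegible in the source: the recursion is started from an area of 100 and produces L + 1 areas, the tract is split into two groups at the midpoint of the area list, the new boundary is taken as `round(u * half)`, and each group is stretched by linear interpolation. All function names are hypothetical. Only the two conversions, formulas (12) and (13), are taken directly from the text.

```python
def areas_from_reflection(k, first_area=100.0):
    """Formula (12): Area[i+1] = (1 - K[i]) / (1 + K[i]) * Area[i]."""
    areas = [first_area]
    for ki in k:
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return areas  # len(k) + 1 values (assumed indexing)


def reflection_from_areas(areas):
    """Formula (13): K'[i] = (A[i] - A[i+1]) / (A[i] + A[i+1])."""
    return [(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1])
            for i in range(len(areas) - 1)]


def stretch(segment, new_len):
    """Linearly interpolate `segment` onto new_len evenly spaced points."""
    if new_len < 1:
        raise ValueError("new_len must be at least 1")
    if len(segment) == 1 or new_len == 1:
        return [segment[0]] * new_len
    step = (len(segment) - 1) / (new_len - 1)
    out = []
    for j in range(new_len):
        pos = j * step
        i = min(int(pos), len(segment) - 2)
        frac = pos - i
        out.append(segment[i] * (1.0 - frac) + segment[i + 1] * frac)
    return out


def adjust_internal_ratio(k, u):
    """Steps (b)-(d) under the assumptions listed above."""
    areas = areas_from_reflection(k)
    n = len(areas)
    half = n // 2                                  # assumed original boundary
    new_half = max(1, min(n - 1, int(round(u * half))))  # assumed new boundary
    new_areas = (stretch(areas[:half], new_half)
                 + stretch(areas[half:], n - new_half))
    return reflection_from_areas(new_areas)
```

With u = 1.0 the areas are left unchanged and the original coefficients come back; as long as every input coefficient has magnitude below 1, all areas stay positive and so do the magnitudes of the new coefficients, which keeps the lattice filter stable.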
5.2 Verification and comparison

5.2.1 Verification on cross-sectional area

One way to verify the internal ratio adjustment method described above is to compare the vocal tract cross-sectional areas obtained by LPC analysis of the original speech signal with those obtained by LPC analysis of the signal synthesized after the areas were adjusted, and so judge whether the method achieves the intended effect. For example, Figure 12 compares the cross-sectional areas obtained by LPC analysis of signals synthesized from the vowel /a/ under three settings of the adjustment ratio u (one of which is u = 0.6). Because the method directly changes the length ratio of the front and back segments of the vocal tract, in theory only that internal length ratio should be affected. After analyzing the synthesized speech, however, we found that the cross-sectional areas are affected as well. The probable reason is that the residual signal from LPC analysis is not a clean source signal: it still carries some vocal tract information, and when such a residual passes through the adjusted vocal tract model the cross-sectional areas change. The analysis of other phonemes, such as /i/ and /u/, shows a similar effect.

5.2.2 Comparison in the spectrum

For the spectral comparison only the analysis of /a/ is described here; the spectra of /i/ and /u/ give similar results. Figure 13 shows the spectra obtained by LPC analysis, with panels (a), (b), and (c) corresponding to the adjustment ratio u set to 1.0, 1.4, and 0.6 respectively. Figure 13 shows that when u = 1.4 the amplitude of F1 tends to increase and that of F2 to decrease, while the frequency positions of both F1 and F2 are unchanged; this can be interpreted as an increase of energy in the low-frequency region. When u = 0.6 the amplitude of F1 tends to decrease and that of F2 to increase; in frequency, F1 tends to move toward F2 while F2 stays where it is, which can be interpreted as an increase of energy in the high-frequency region.

5.2.3 Comparison in timbre

For the timbre comparison, experiments were run on the three vowels /a/, /i/, and /u/. Actual listening gave the results recorded in Table 2. With the adjustment ratio u = 0.6, all three vowels consistently become lighter and duller (the pitch seems to rise); with u = 1.4, all three consistently become fuller (the pitch seems to fall). This can be explained by the spectral changes observed in Section 5.2.2. The internal ratio adjustment method also changes the timbre of different phonemes in a consistent direction, with no cases of phonemes changing in opposite ways.

Table 2. Timbre of the three vowels under the two adjustment ratios

| | u = 0.6 | u = 1.4 |
|---|---|---|
| /a/ | lighter and duller than the original | fuller than the original |
| /i/ | lighter and duller than the original | fuller than the original |
| /u/ | lighter and duller than the original | fuller than the original |

6. System evaluation

The semi-automatic voice dubbing method of the invention was built and developed on a personal computer running a [...] operating system. Because recording and playback of the converted voice must proceed at the same time, the invention uses two [...] to handle recording and playback. In the software, the processing steps of Figure 1 are carried out sequentially and independently, one signal period at a time. The CPU of the personal computer used is an AMD K6-2 / 333 MHz.

6.1 Timbre test

Timbre was evaluated by listening tests. Beforehand, one man and one woman each recorded one sentence, which was saved to a file. The two recordings were then processed by the system with different parameter settings, and for each speaker ten representative synthesized sentences with different timbres were selected and played to the listeners. Eighteen listeners took part, rating the ten synthesized sentences for clarity, naturalness, and discrimination. Clarity is judged against the original recording: a sentence that is just as clear and free of noise scores the full ten points, and points are deducted if it is worse. Naturalness asks whether the synthesized sentence sounds like a man pretending to speak in a woman's voice, or a woman pretending to speak in a man's voice: if not, it scores the full ten points; otherwise points are deducted according to how strong the impression is.

During the test the speech sampling frequency was 44,100 Hz and the test sentence was 「請把這籃兔子送走」 ("Please send this basket of rabbits away"). The other parameters, namely pitch change, vocal tract length, source adjustment, and internal vocal tract ratio, were set as listed in Table 3. For each of the ten sentences played, the listeners gave a clarity score and a naturalness score according to their impression; the resulting scores are listed in Table 4. In the male-voice part of Table 4, the fifth and ninth sentences score relatively low in both clarity and naturalness. According to Table 3, both lower the pitch and lengthen the vocal tract, that is, they move the voice toward a bass; compared with the seventh sentence these are fairly large adjustments, so the result is poorer. Conversely, the sixth and eighth sentences score highest in naturalness and clarity. According to Table 3, they raise the pitch and shorten the vocal tract, that is, they move the voice toward a soprano; since the original voice is male, the effect of this change is fairly good.

Table 3. The ten parameter settings used for timbre conversion

| Sentence | Pitch % | Vocal tract length % | R % | u % |
|---|---|---|---|---|
| 1 | 125 | 100 | 100 | 140 |
| 2 | 140 | 70 | 100 | 100 |
| 3 | 400 | 50 | 100 | 100 |
| 4 | 60 | 60 | 100 | 100 |
| 5 | 50 | 150 | 100 | 100 |
| 6 | 200 | 60 | 120 | 100 |
| 7 | 80 | 110 | 100 | 170 |
| 8 | 140 | 70 | 100 | 80 |
| 9 | 50 | 130 | 80 | 80 |
| 10 | 70 | 90 | 100 | 30 |

In the female-voice part of Table 4, the third and fourth sentences score lower in clarity and naturalness, while the fifth, seventh, and ninth are the three highest. Comparing with Table 3 shows that converting the originally female voice into a male voice (sentences 5, 7, and 9) works better, whereas converting it into a soprano voice (sentences 3 and 4) works less well.

Looking at the averages, male and female voices differ little in naturalness, but clarity is clearly higher for the female voice. On analysis we found that when the two original recordings (one read by the man, the other by the woman) are played back, the original male sentence is already somewhat less clear than the female one, which accounts for this result. Taken together, both clarity and naturalness are at an above-average level. The adjusted speech signal always picks up some noise (for example, because pitch-mark detection is not 100% correct), which affects clarity. In addition, the listeners knew in advance that the test sentences had been converted from a male (or female) voice, so in rating naturalness they tended to give scores somewhat below ten.

Table 4. Listening test results for clarity and naturalness

| Sentence | Female voice: clarity | Female voice: naturalness | Male voice: clarity | Male voice: naturalness |
|---|---|---|---|---|
| 1 | 9.1 | 7.71 | 7.42 | 7.87 |
| 2 | 9.02 | 7.61 | 7.52 | 7.66 |
| 3 | 7.17 | 6.67 | 7.37 | 7.24 |
| 4 | 7.5 | 6.65 | 6.53 | 6.7 |
| 5 | 8.99 | 8.54 | 6.32 | 5.75 |
| 6 | 7.28 | 6.87 | 8.4 | 7.77 |
| 7 | 9.31 | 8.15 | 7.71 | 7.41 |
| 8 | 9.24 | 7.33 | 8.51 | 8.51 |
| 9 | 8.85 | 8.36 | 6.52 | 6.06 |
| 10 | 7.5 | 7.29 | 7.05 | 7.13 |
| Average | 8.396 | 7.518 | 7.335 | 7.21 |

The discrimination evaluation asks whether listeners can tell from the timbre of the synthesized sentences whether the speakers are the same person. The procedure is as follows: 10 test sentences are drawn at random, with repetition allowed, from the 10 synthesized sentences of the male (or female) part and played to the listener in the order drawn. The listener is asked how many different people spoke the 10 sentences. The discrimination score is the number of people P reported by the listener divided by the true number of different timbres Q.

For example, if 4 of the 10 randomly drawn sentences are repeats, then Q = 6. Suppose there are four listeners and they report P = 4, 5, 3, and 7. The discrimination score is then (4 + 5 + 3 + 5) / (6 × 4) = 70.83%. Note that the fourth listener reported 7 different timbres although Q = 6, so that listener's P is replaced by P′:

$$P' = Q - (P - Q), \quad \text{if } P > Q \tag{14}$$

Defined this way, a higher discrimination score means the timbres are more distinct from one another. The results of our discrimination experiment are shown in Table 5. Both values are above 85%, which means that the 10 converted timbres are, on average, taken to come from about 8.5 different people. The timbres converted from the female voice also show a somewhat higher discrimination score. We cannot yet conclude, however, that timbre conversion of female voices gives higher discrimination in general, because only one male and one female voice were tested.
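The scoring rule, including the correction of formula (14), can be written as a short function. The function name `discrimination_score` and the list-of-answers interface are illustrative assumptions; the arithmetic reproduces the worked example in the text.

```python
def discrimination_score(answers, q):
    """Average of P / Q over listeners, with formula (14) applied.

    answers : number of distinct speakers reported by each listener (P)
    q       : true number of distinct timbres among the played sentences (Q)
    """
    total = 0
    for p in answers:
        if p > q:
            # Formula (14): overcounting is penalised like undercounting.
            p = q - (p - q)
        total += p
    return total / (q * len(answers))


# Worked example from the text: Q = 6, four listeners report 4, 5, 3, 7.
score = discrimination_score([4, 5, 3, 7], 6)
print(f"{score:.2%}")  # 70.83%
```

The fourth answer of 7 is folded back to 5 before averaging, which gives 17 / 24.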
Table 5: Results of the discrimination evaluation
                  Male voice    Female voice
Discrimination    85.7%         92.9%

6.2 Real-time test

The system accepts input in two ways, from a file or from a microphone, and supports three sampling frequencies: 11,025 Hz, 22,050 Hz and 44,100 Hz. Four parameters can be adjusted, so the real-time test was carried out with all four parameters adjusted. Real-time operation is defined as follows: when a speech signal is fed to the system continuously, the timbre-converted speech signal output from the loudspeaker is never interrupted.
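One way to make this definition testable is to compare the time spent converting each buffer with the playback deadlines. The sketch below is only a model under stated assumptions: buffer-based processing, sequential conversion, and playback that starts as soon as the first converted buffer is ready. The function name `has_dropouts`, the buffer size of 512 samples and the timing values are all hypothetical and are not measurements from the patent.

```python
def has_dropouts(buffer_samples, sampling_rate, processing_times, eps=1e-9):
    """Return True if playback would be interrupted.

    Hypothetical model: buffer k is fully recorded at (k + 1) * T, where
    T = buffer_samples / sampling_rate, buffers are converted one after
    another, and playback starts when the first converted buffer is ready.
    processing_times[k] is the time in seconds spent converting buffer k.
    """
    duration = buffer_samples / sampling_rate
    finish = 0.0
    playback_start = None
    for k, cost in enumerate(processing_times):
        recorded = (k + 1) * duration
        # Conversion starts once the buffer is recorded and the CPU is free.
        finish = max(recorded, finish) + cost
        if playback_start is None:
            playback_start = finish
        elif finish > playback_start + k * duration + eps:
            return True  # buffer k is not ready when the loudspeaker needs it
    return False


# Illustrative numbers only (not measurements from the patent).
print(has_dropouts(512, 11025, [0.010] * 100))  # False
print(has_dropouts(512, 44100, [0.020] * 100))  # True
```

In this model a higher sampling rate shortens the time available per buffer, which is consistent with the observation that real-time operation becomes harder to maintain as the sampling rate rises.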

Considering the combinations of input method and sampling frequency, the test results are as follows. With file input, real-time timbre conversion is achieved at all three sampling rates. With microphone input, however, real-time operation cannot be achieved above a certain sampling rate when all parameters are adjusted (the exact rate is illegible in the source). When one of the adjustments is left out (which one is likewise illegible), microphone input at the higher sampling rate can still meet the real-time requirement.

[Features and Effects]

The present invention proposes a method that uses vocal tract characteristics to convert a voice into a variety of timbres, and uses this method to implement a real-time semi-automatic human voice dubbing system. In the conversion process, the invention considers several mechanisms of the vocal organs, namely the vibration frequency, the vocal tract length, the source signal adjustment and the internal vocal tract proportion, and changes these characteristics to simulate changes in the shape of the vocal tract, with the aim of obtaining diverse timbres. Because the semi-automatic dubbing method requires real-time pitch detection, and because the accuracy of pitch-peak position detection strongly affects the downstream timbre conversion and the speech quality, considerable effort was devoted to the problem of real-time pitch detection. At present, the accuracy of pitch detection reaches 95%. In addition, listening tests on speech synthesized by the implemented system show that the method of the invention can indeed convert a voice into diverse timbres.

The detailed description above is a specific description of one feasible embodiment of the present invention. The embodiment is not intended to limit the patent scope of the invention; any equivalent implementation or modification that does not depart from the technical spirit of the invention shall be included in the patent scope of this case.

In summary, this case is not only innovative in its technical idea but also improves on conventional processing methods, and it fully meets the statutory requirements of novelty and inventive step for an invention patent. The application is therefore filed in accordance with the law, and the Bureau is respectfully requested to approve this invention patent application in order to encourage invention.

[References]

[1] H. Kuwabara and Y. Sagisaka, "Acoustic Characteristics of Speaker Individuality: Control and Conversion", Speech Communication, Vol. 16, pp. 165-174, 1995.
[2] H. Mizuno and M. Abe, "Voice Conversion Algorithm Based on Piecewise Linear Conversion Rules of Formant Frequency and Spectrum Tilt", Speech Communication, Vol. 16, pp. 153-164, 1995.

[3] Y. Stylianou and O. Cappe, "A System for Voice Conversion Based on Probabilistic Classification and a Harmonic plus Noise Model", IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 281-284, 1998.
[4] M. Narendranath, H. A. Murthy, S. Rajendran, and B. Yegnanarayana, "Transformation of Formants for Voice Conversion Using Artificial Neural Networks", Speech Communication, Vol. 16, pp. 207-216, 1995.
[5] N. Iwahashi and Y. Sagisaka, "Speech Spectrum Conversion Based on Speaker Interpolation and Multi-functional Representation with Weighting by Radial Basis Function Networks", Speech Communication, Vol. 16, pp. 139-152, 1995.
[6] G. Baudoin and Y. Stylianou, "On the Transformation of the Speech Spectrum for Voice Conversion", Int. Conf. on Spoken Language Processing, Vol. 3, pp. 1405-1408, 1996.
[7] H. Y. Gu and W. L. Shiu, "A Mandarin-syllable Signal Synthesis Method with Increased Flexibility in Duration, Tone and Timbre Control", Proceedings of the National Science Council, Republic of China, Part A: Physical Science and Engineering, Vol. 22, No. 3, pp. 385-395, 1998.
[8] D. G. Childers, "Glottal Source Modeling for Voice Conversion", Speech Communication, Vol. 16, pp. 127-138, 1995.
[9] P. H. Milenkovic, "Voice Source Model for Continuous Control of Pitch Period", J. Acoust. Soc. Am., Vol. 93, No. 2, pp. 1087-1096, 1993.
[10] L. R. Rabiner, et al., "A Comparative Performance Study of Several Pitch Detection Algorithms", IEEE Trans. Acoust., Speech, and Signal Processing, pp. 399-418, Oct. 1976.
[11] J. D. Markel and A. H. Gray, Jr., Linear Prediction of Speech, New York: Springer-Verlag, 1976.
[12] Y. Medan, E. Yair, and D. Chazan, "Super Resolution Pitch Determination of Speech Signals", IEEE Trans. Signal Processing, pp. 40-48, Jan. 1991.
[13] Gu Hongyan (古鴻炎) and Tan Baihua (譚百華), "Real-Time Conversion System from Singing Voice to Instrument Sound" (in Chinese), Proceedings of the National Computer Conference (Zhongli), pp. 228-234, 1995.
[14] J. F. Wang, et al., "A Hierarchical Neural Network Model Based on a C/V Segmentation Algorithm for Isolated Mandarin Speech Recognition", IEEE Trans. Signal Processing, pp. 2141-2146, Sep. 1991.
[15] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.

Claims (1)

1. A semi-automatic human voice dubbing method, being a timbre conversion method that converts one timbre into a variety of different timbres through processing steps that change the fundamental frequency, the vocal tract length, the source signal and the internal vocal tract proportion; the main processing flow of the method comprises, in order:
- Pitch detection: the input speech signal is segmented in units of frames, and the positions of the pitch peaks in each frame are then found in real time;
- Pitch and vocal tract length adjustment: this part modifies the TIPW (Time Proportioned Interpolation of Pitch Waveform) syllable signal synthesis method previously proposed by the applicant, so that the speech signal can be adjusted according to the set pitch and vocal tract length parameters under the real-time requirement;
- Source signal adjustment: this part obtains the source signal through LPC (linear prediction coding) analysis and adjusts that source signal;
- Internal vocal tract proportion adjustment: based on the vocal tract model constructed by LPC analysis, the length ratio between the front and rear parts of the vocal tract is changed to simulate the differences in internal vocal tract proportion among different people.

2. The semi-automatic human voice dubbing method as described in claim 1, wherein the positions of the pitch peaks are selected by the following steps:
Step (1): set the periodicity flags according to the energy and the zero-crossing rate, then check whether the periodicity flags are all zero; if they are all zero (that is, there is no periodic signal), return zero periods directly;
Step (2): merge the 15 largest amplitude values taken from each of the three frames, sort them in time order, and store them in the array Y[1] to Y[45];
Step (3): compute the amplitude threshold and store it in the variable Clip. Clip is set as follows: take one amplitude maximum from each of the three frames of the current buffer, denoted max1, max2 and max3, then check whether the previous buffer contained a periodic signal; if not, let Clip = (max1 + max2 + max3) * 0.2; if so, let Clip = min(max1, max2, max3) * 0.6;
Step (4): not every point in Y[1] to Y[45] lies at a peak; rather, the points form clusters at and near the peaks. Therefore, take in order those points of Y[1] to Y[45] whose amplitude is greater than Clip and which are peaks, and store them in the array X[1] to X[K];
Step (5): set the lower and upper period-length thresholds:
if the previous buffer contained a periodic signal,
lower period-length threshold = average period of the previous buffer times 0.75, that is, (ave_pitch) * 0.75
upper period-length threshold = average period of the previous buffer times 1.75, that is, (ave_pitch) * 1.75
otherwise,
lower period-length threshold = 35 * sampling_rate / 11,025
upper period-length threshold = 200 * sampling_rate / 11,025
That is, when the first periodic signal begins to appear, its fundamental frequency is required to lie in the range from 55 Hz to 315 Hz; when the buffer currently being analyzed is not the start of a continuous periodic signal, the thresholds follow the average period length observed so far;
Step (6): find all pitch peak positions in the current buffer from the array X[1] to X[K]. Taking the start of the buffer as the reference point, find the X[i] whose distance from the reference point lies between the lower and upper period-length thresholds, and take the one with the largest amplitude as a period boundary point. Then, taking this boundary point as the new reference point, search forward in X[1] to X[K] for the next group of X[i] lying between the lower and upper period-length thresholds, and pick the X[i] with the largest amplitude as the next period boundary point; continue in this way.
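Steps (3), (5) and (6) of claim 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the representation of X[1] to X[K] as (position, amplitude) pairs, and the sample data are all hypothetical. Only the constants 0.2, 0.6, 0.75, 1.75, 35, 200 and 11,025 come from the claim.

```python
def clip_threshold(frame_maxima, prev_periodic):
    """Step (3): amplitude threshold from the per-frame maxima (max1..max3)."""
    if prev_periodic:
        return min(frame_maxima) * 0.6
    return sum(frame_maxima) * 0.2


def period_bounds(sampling_rate, prev_avg_period=None):
    """Step (5): (lower, upper) period-length thresholds in samples.

    prev_avg_period is ave_pitch of the previous buffer, or None when the
    previous buffer held no periodic signal.
    """
    if prev_avg_period is not None:
        return 0.75 * prev_avg_period, 1.75 * prev_avg_period
    scale = sampling_rate / 11025
    return 35 * scale, 200 * scale  # about 315 Hz down to about 55 Hz


def pick_pitch_peaks(candidates, lower, upper, start=0):
    """Step (6): chain period boundary points through the buffer.

    candidates -- (position, amplitude) pairs, i.e. X[1]..X[K] from step (4)
    start      -- reference point, initially the start of the buffer
    """
    if lower <= 0 or upper < lower:
        raise ValueError("need 0 < lower <= upper")
    peaks = []
    reference = start
    while True:
        in_range = [c for c in candidates if lower <= c[0] - reference <= upper]
        if not in_range:
            return peaks
        position, _ = max(in_range, key=lambda c: c[1])
        peaks.append(position)
        reference = position  # the new boundary becomes the next reference


# Made-up candidate peaks for illustration (position in samples, amplitude).
x = [(100, 0.90), (104, 0.60), (200, 0.85), (203, 0.50), (300, 0.80)]
lower, upper = period_bounds(11025, prev_avg_period=100)
print(pick_pitch_peaks(x, lower, upper))  # [100, 200, 300]
```

Because the lower threshold is strictly positive, each new reference point lies later than the previous one, so the search always terminates at the end of the candidate list.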
TW88122862A 1999-12-24 1999-12-24 Semi-automatic human voice dubbing method TW454173B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW88122862A TW454173B (en) 1999-12-24 1999-12-24 Semi-automatic human voice dubbing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW88122862A TW454173B (en) 1999-12-24 1999-12-24 Semi-automatic human voice dubbing method

Publications (1)

Publication Number Publication Date
TW454173B true TW454173B (en) 2001-09-11

Family

ID=21643533

Family Applications (1)

Application Number Title Priority Date Filing Date
TW88122862A TW454173B (en) 1999-12-24 1999-12-24 Semi-automatic human voice dubbing method

Country Status (1)

Country Link
TW (1) TW454173B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115602182A (en) * 2022-12-13 2023-01-13 广州感音科技有限公司(Cn) Sound conversion method, system, computer device and storage medium

Similar Documents

Publication Publication Date Title
Sundberg The perception of singing
Sundberg et al. Effects on the glottal voice source of vocal loudness variation in untrained female and male voices
Boersma et al. Spectral characteristics of three styles of Croatian folk singing
Umbert et al. Expression control in singing voice synthesis: Features, approaches, evaluation, and challenges
JP5039865B2 (en) Voice quality conversion apparatus and method
Ardaillon Synthesis and expressive transformation of singing voice
JP2010014913A (en) Device and system for conversion of voice quality and for voice generation
Roekhaut et al. A model for varying speaking style in TTS systems
JP2014048472A (en) Voice synthesis system for karaoke and parameter extractor
Vurma et al. Where is a singer's voice if it is placed “forward”?
Raitio et al. Phase perception of the glottal excitation and its relevance in statistical parametric speech synthesis
TW454173B (en) Semi-automatic human voice dubbing method
Janer Singing-driven interfaces for sound synthesizers
Loscos Spectral processing of the singing voice.
Aso et al. Speakbysinging: Converting singing voices to speaking voices while retaining voice timbre
JP4349316B2 (en) Speech analysis and synthesis apparatus, method and program
Howard The vocal tract organ and the vox humana organ stop
Raitio Hidden Markov model based Finnish text-to-speech system utilizing glottal inverse filtering
JP2022065554A (en) Method for synthesizing voice and program
JP5560769B2 (en) Phoneme code converter and speech synthesizer
Roberts et al. A time-scale modification dataset with subjective quality labels
Deng et al. Speech analysis: the production-perception perspective
Bous A neural voice transformation framework for modification of pitch and intensity
Glasner The Development of the Operatic Voice in the 20th Century: An Analysis of the Effect of Early Recording Technology
Howard Virtual choirs

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MM4A Annulment or lapse of patent due to non-payment of fees