TWI375214B - System for simulating human singing and method thereof - Google Patents

System for simulating human singing and method thereof

Info

Publication number
TWI375214B
Authority
TW
Taiwan
Prior art keywords
voice
module
song
lyrics
text
Prior art date
Application number
TW97150669A
Other languages
Chinese (zh)
Other versions
TW201025300A (en)
Inventor
Nick Liao
Original Assignee
Inventec Besta Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Besta Co Ltd filed Critical Inventec Besta Co Ltd
Priority to TW97150669A priority Critical patent/TWI375214B/en
Publication of TW201025300A publication Critical patent/TW201025300A/en
Application granted granted Critical
Publication of TWI375214B publication Critical patent/TWI375214B/en

Landscapes

  • Reverberation, Karaoke And Other Acoustics (AREA)

Description

IX. Description of the Invention

[Technical Field]
The present invention relates to a voice simulation system and method, and in particular to a system and method for simulating human singing.

[Prior Art]
Singing has become one of the leisure activities of modern life. Besides singing at a KTV with karaoke equipment, home karaoke equipment now lets singers practice on their own. Whether at a KTV or at home, the singer sees the song's lyrics on a screen or television and hears the song's tune, that is, its rhythm and melody, and can sing the displayed lyrics along with the tune; the equipment does not play the lyrics as sung by the original singer.

In some cases a singer does not know how a song should be sung and wants a vocal guide, that is, the original singer's voice played along with the tune so that the singer can follow it. For this reason, the songs stored in KTV karaoke equipment include, besides the lyrics, the tune, and the cue marks that prompt the singer to follow along, a recording of the original singer's vocal. The original vocal and the tune are stored on separate audio channels; when the singer does not need the vocal guide, the channel carrying the original vocal is muted so that the singer does not hear it.

However, the number of songs is enormous, and storing the original vocal of every song would require far more storage space than typical karaoke equipment provides. The original vocals of all songs therefore cannot be stored, and vocal guidance cannot be offered for every song.

In summary, the prior art has long been unable to provide a vocal-guide function for all songs, and improved technical means are needed to solve this problem.

[Summary of the Invention]
In view of the prior-art problem that vocal guidance cannot be provided for all songs, the present invention discloses a system and method for simulating human singing.

The disclosed system is applied in karaoke equipment that outputs a song's lyrics and the song's tune. It comprises: a storage module for storing a plurality of voice files, each corresponding to a character; an input module for inputting the song's lyrics and tune; a voice capture module for extracting, from the voice files corresponding to the characters of the lyrics, the voice signal corresponding to each character; a voice adjustment module for adjusting the pitch and playback length of each voice signal to equal the pitch and playback length of the corresponding character in the tune; a speech synthesis module for synthesizing the voice signals into voice data in the order of the characters in the lyrics; and a voice output module for outputting the voice data. By extracting the voice signal of each character of the lyrics, adjusting its pitch and playback length, and then synthesizing the adjusted signals in lyric order, the system solves the problem of the prior art.

The disclosed method is likewise applied in karaoke equipment that outputs a song's lyrics and the song's tune.

The method includes: storing a plurality of voice files, each corresponding to a character; inputting the song's lyrics and tune; extracting, from the voice files corresponding to the characters of the lyrics, the voice signal corresponding to each character; adjusting the pitch and playback length of each voice signal to equal the pitch and playback length of the corresponding character in the tune; synthesizing the corresponding voice signals into voice data in the order of the characters in the lyrics; and outputting the voice data. By extracting the voice signals for the characters of the lyrics, adjusting their pitch and playback length according to the tune, and then synthesizing the adjusted signals in the order in which the characters appear, the method solves the problem of the prior art.

The disclosed system and method differ from the prior art in that the invention first extracts the voice signals corresponding to the characters in a song's lyrics and, after adjusting the pitch and playback length of each signal according to the song's tune, synthesizes the adjusted signals in the order in which the characters appear in the lyrics. Through these technical means, the invention can provide a sing-along vocal guide for every song and achieves the technical effect of simulating human-like singing.
[Embodiments]
The detailed features and embodiments of the present invention are described below with reference to the drawings. The description is sufficient for anyone familiar with the relevant art to understand the technical means by which the invention solves the stated problem, to practice them, and to appreciate the effects the invention achieves.

Fig. 1 illustrates the environment in which simulated human singing is provided. The invention runs on a karaoke device 100, which together with a video playback device 600 and an audio playback device 700 forms the environment in which the user sings. The karaoke device 100 includes, but is not limited to, jukeboxes and computers that provide a singing service through web pages; the video playback device includes, but is not limited to, televisions and computer monitors. The karaoke device 100 is connected to the video playback device 600 and the audio playback device 700.

Fig. 2 shows the architecture of the system for simulating human singing proposed by the invention. As shown, the system comprises a storage module 110, an input module 120, a voice capture module 130, a voice adjustment module 150, a speech synthesis module 170, and a voice output module 180.

The storage module 110 stores a plurality of voice files. Each stored voice file corresponds to at least one character, and each character may correspond to several different voice files. The characters referred to in the invention include, but are not limited to, the characters of languages such as English and Chinese; a voice file may record the pronunciation of a single character, a word, or a short phrase, that is, the pronunciation of one or more characters.
In general, voice files are associated with characters by pronunciation: regardless of whether their glyphs differ, characters with the same pronunciation correspond to the same voice file. The invention is not limited to this, however; a one-to-many correspondence between distinct glyphs and voice files may also be established.

In addition, the same character is pronounced slightly differently depending on its position in a sentence and on where the sentence places its emphasis. For example, the character 「我」 ("I") exhibits different acoustic characteristics in each of the three sentences 「『我』是一個常用的文字」, 「一般人講話都會用到『我』這個字」, and 「尤其是講話開頭都使用『我』」, so 「我」 has at least three pronunciations, and the storage module 110 accordingly stores three voice files corresponding to it.
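Purely as an illustration of the mapping just described (the patent does not specify a data layout, and every name below is hypothetical), the storage module's correspondence can be sketched as a table keyed by syllable, with one or more candidate recordings per syllable:

```python
# Minimal sketch of the storage module's mapping. Hypothetical names only:
# each syllable key maps to one or more candidate voice files, so characters
# that share a pronunciation share an entry, while context variants simply
# add further candidates under the same key.
from collections import defaultdict

class VoiceBank:
    def __init__(self):
        # syllable -> list of recording paths (one-to-many, as in the text)
        self._files = defaultdict(list)

    def add(self, syllable, path):
        self._files[syllable].append(path)

    def candidates(self, syllable):
        # returns [] for a syllable with no stored recording
        return list(self._files[syllable])

bank = VoiceBank()
bank.add("wo3", "wo3_neutral.wav")    # 「我」 mid-sentence
bank.add("wo3", "wo3_stressed.wav")   # 「我」 emphasized
bank.add("wo3", "wo3_initial.wav")    # 「我」 sentence-initial
print(len(bank.candidates("wo3")))    # → 3
```

Because the table is keyed by pronunciation, the three recordings of 「我」 described above become three candidates under one key; a later selection step picks among them.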

The input module 120 inputs a song's lyrics and the song's tune. Because the singer sings the lyrics along with the tune as the song plays, the lyrics and the tune have a fixed correspondence: each character of the lyrics corresponds to a segment of pitch in the tune, and the segment corresponding to a character may contain more than one pitch.

The start and stop times of each character's pitch are recorded in the lyrics, but the invention is not limited to recording start and stop times. Alternatively, only the playback duration of each character's pitch may be recorded, in which case the start time of a character is obtained by accumulating the playback durations of all preceding characters. Beyond per-character timing, the lyrics may also be divided into paragraphs, each containing one or more characters, with the times recorded in the lyrics being the playback lengths of the paragraphs.

In most cases the lyrics and the tune are stored in the storage module 110, and the input module 120 reads them from the storage module 110 according to the singer's song selection. The invention is not limited to this; for example, the input module 120 may also let the user supply the lyrics and the tune directly.
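As a sketch of the duration-only timing alternative described above (illustrative only; the patent does not prescribe a storage format), each character's start time can be recovered by accumulating the playback durations of the characters before it:

```python
# Sketch of the duration-only timing scheme: the lyrics record only each
# character's playback duration (here in beats); the start time of character
# i is the sum of the durations of characters 0..i-1.
from itertools import accumulate

def start_times(durations):
    if not durations:
        return []
    # prefix sums shifted right by one position, starting at 0.0
    return [0.0] + list(accumulate(durations))[:-1]

beats = [2.0, 1.0, 1.0, 0.5, 0.5]   # hypothetical per-character durations
print(start_times(beats))           # → [0.0, 2.0, 3.0, 4.0, 4.5]
```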
After the input module 120 inputs the lyrics, the voice capture module 130 reads each character in the lyrics and retrieves, from the voice file in the storage module 110 corresponding to each character, a voice signal; each retrieved voice signal corresponds to one character. If a voice file records the pronunciation of a single character, the signal recorded in the file is exactly the signal for that character, and the voice capture module 130 takes the entire file as the voice signal. If a voice file records the pronunciation of a word or a short phrase, the voice capture module 130 extracts only the portion of the file corresponding to the character.

In general, the voice capture module 130 extracts the voice signals in the order in which the characters appear in the lyrics, but the extraction order is not limited to this; any procedure that lets the voice capture module 130 obtain the voice signals for all characters of the lyrics may be used in the invention.

If the voice files stored in the storage module 110 have a many-to-one correspondence with characters, the voice capture module 130 may use a hidden Markov model (HMM) to select the voice signal corresponding to a character, although the invention is not limited to this way of choosing among candidate files.
If the stored voice files instead correspond one-to-one with characters, the storage module 110 may keep a lookup table recording the correspondence between voice files and characters, and the voice signal for a character is simply retrieved from its corresponding file.

It is worth noting that if the voice capture module 130 selects voice files with a hidden Markov model, training data must first be provided: the HMM computes mel-frequency cepstral coefficients (MFCCs) from the training data, trains a model for each context, and trains a transition function for every state of each model. When a character and its candidate voice files are then given as input, the HMM selects the voice signal best suited for the voice adjustment module 150. Moreover, when an input voice file records a word or short phrase containing the character, the Viterbi algorithm is used to obtain the most likely state sequence for the input file; that sequence lets the HMM segment the file's pronunciation, so the voice capture module 130 can extract the voice signal of an individual character from a voice file that records a word or phrase.

The voice adjustment module 150 adjusts the pitch of each voice signal extracted by the voice capture module 130 to equal the pitch of the corresponding character in the tune, and then adjusts the playback length of the voice signal to equal the playback length of that pitch in the tune. The voice adjustment module 150 usually processes the voice signals in the order in which the voice capture module 130 extracted them, but the invention is not limited to this order.
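The state-sequence search mentioned above can be illustrated with a generic Viterbi decoder. This is a textbook sketch, not the patent's trained model: it assumes per-frame state log-likelihoods and log transition probabilities are already available, which is what an HMM trained on MFCC features would supply.

```python
# Generic Viterbi sketch: given each frame's per-state log-likelihood and
# log transition probabilities, recover the most likely state sequence.
# Segmenting a word recording into per-character spans uses the same idea:
# the frames assigned to a character's states delimit its voice signal.
import math

def viterbi(obs_loglik, trans_logp, init_logp):
    n_frames, n_states = len(obs_loglik), len(init_logp)
    dp = [[-math.inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    for s in range(n_states):
        dp[0][s] = init_logp[s] + obs_loglik[0][s]
    for t in range(1, n_frames):
        for s in range(n_states):
            prev = max(range(n_states),
                       key=lambda p: dp[t - 1][p] + trans_logp[p][s])
            dp[t][s] = (dp[t - 1][prev] + trans_logp[prev][s]
                        + obs_loglik[t][s])
            back[t][s] = prev
    # backtrack from the best final state
    path = [max(range(n_states), key=lambda s: dp[-1][s])]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Two-state left-to-right toy example: frames favour state 0, then state 1.
obs = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
trans = [[-0.1, -2.0], [-10.0, -0.1]]
print(viterbi(obs, trans, [0.0, -10.0]))  # → [0, 0, 1, 1]
```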
If the pitch of an extracted voice signal already equals the pitch of its corresponding character in the tune before any adjustment, the voice adjustment module 150 leaves that signal unadjusted.

In general, the voice adjustment module 150 uses the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) algorithm to adjust pitch: the algorithm first locates the position of every peak in the voice signal and then changes the spacing between the peaks to complete the pitch adjustment. The invention is not limited to TD-PSOLA; other techniques, such as a phase vocoder, may also be used.

The speech synthesis module 170 synthesizes the voice signals whose pitch has been adjusted by the voice adjustment module 150 into voice data, joining the signal of each character according to the character's position in the lyrics.
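The following is a deliberately simplified sketch of the TD-PSOLA idea. It assumes a known, constant pitch period and places analysis marks at even multiples of that period, whereas a real implementation, as described above, first locates the actual peaks of the signal; the function and its parameters are illustrative, not the patent's implementation.

```python
# Crude TD-PSOLA-style pitch shift: take two-period Hann-windowed grains
# centred on evenly spaced analysis marks, then overlap-add them at the new
# spacing. Duration is preserved; only the grain spacing (pitch) changes.
import math

def psola_shift(signal, period_in, period_out):
    n = len(signal)
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / (2 * period_in))
           for i in range(2 * period_in)]
    out = [0.0] * n
    t_out = 0
    while t_out < n:
        # nearest (assumed) analysis mark in the input
        mark = round(t_out / period_in) * period_in
        for i in range(2 * period_in):
            j = mark - period_in + i   # source sample index
            k = t_out - period_in + i  # destination sample index
            if 0 <= j < n and 0 <= k < n:
                out[k] += win[i] * signal[j]
        t_out += period_out
    return out

# Raise the pitch of a 20-sample-period sine by re-spacing grains at 15.
sig = [math.sin(2 * math.pi * i / 20.0) for i in range(200)]
shifted = psola_shift(sig, 20, 15)
```

Tighter grain spacing raises the perceived pitch without changing the overall playback length, which is exactly the separation of pitch and duration the module relies on.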
If the voice capture module 130 extracted the voice signals in the order of the characters in the lyrics and the voice adjustment module 150 adjusted them in that same order, the speech synthesis module 170 can simply concatenate the adjusted signals in the order in which they were adjusted. If the capture or the adjustment was not performed in lyric order, the speech synthesis module 170 reads the lyrics supplied by the input module 120 to obtain the order of the characters, so that the adjusted voice signals are synthesized in the correct sequence.

It should be noted that the playback length of each adjusted voice signal within the voice data equals the playback length, in the tune, of the pitch corresponding to that signal's character. The voice data produced by the speech synthesis module 170 therefore sings the lyrics in time with the tune, but it contains only the vocal, not the tune itself.

The voice output module 180 outputs the voice data produced when the speech synthesis module 170 synthesizes the voice signals. It may do so by playing the voice data, by saving it as a new file, or in other ways.
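The synthesis step itself reduces to ordered concatenation. A minimal sketch, with hypothetical names, assuming each signal has already been stretched to its note's playback length:

```python
# Sketch of the synthesis step: signals may have been extracted in any
# order, but are joined back in lyric order, so the result lines up with
# the tune note by note.

def synthesize(adjusted, lyric_order):
    """adjusted: dict mapping lyric position -> list of samples."""
    voice_data = []
    for pos in lyric_order:
        voice_data.extend(adjusted[pos])
    return voice_data

adjusted = {0: [0.1, 0.2], 1: [0.3], 2: [0.4, 0.5]}
print(synthesize(adjusted, [0, 1, 2]))  # → [0.1, 0.2, 0.3, 0.4, 0.5]
```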

The invention is not limited to those output modes. If the voice output module 180 outputs the voice data by playing it, the karaoke device 100 running the invention sends the voice data and the song's tune directly to the connected audio playback device 700 for playback; if the voice output module 180 outputs the voice data as a file, the karaoke device 100 first reads the output file and then sends the voice data and the tune to the connected audio playback device 700. In addition, while the karaoke device 100 sends the voice data and the tune to the audio playback device 700, it also sends the song's lyrics to the video playback device 600, so that the lyrics, the tune, and the synthesized voice data play back in synchrony, giving the user a vocal guide to sing along with.
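As an illustrative sketch of the synchronized output: the patent sends the tune and the voice data to the audio playback device 700 as parallel streams, so mixing them into one stream here, and the gain value, are assumptions made only to keep the example short.

```python
# Sketch of playing the accompaniment (tune) and the synthesized vocal
# together: sample-aligned streams are summed, padding the shorter one
# with silence so the two stay in sync for their full length.

def mix(tune, vocal, vocal_gain=0.8):
    n = max(len(tune), len(vocal))
    tune = tune + [0.0] * (n - len(tune))
    vocal = vocal + [0.0] * (n - len(vocal))
    return [t + vocal_gain * v for t, v in zip(tune, vocal)]

print(mix([0.5, 0.5, 0.5], [1.0, 0.0]))  # → [1.3, 0.5, 0.5]
```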

事實上’本發明的輸入模组12〇更負責提供使用者輸 入if感參數’ If感參數係提供給可附加於本發明之效果 整模組160做為調整語音訊號之輸入效果的依據,藉以模 擬在不_情感參數所表示的情感之下,語㈣號的音 南、音量會產生相對觸,讓_的語音訊號隨著不 同的情感參數㈣應不同的輸出效果。 效果調整模組160負責在輸人模組12嗜人情感參类 Ϊ 情感參數調整各文字對應之語音訊號的輸_ 斤明的調正’即疋效果調整模組⑽在語音訊號中办 、改變語音訊號的振幅等機,細讓語音訊號初 ^=產峨爾料、射、淡人、淡出、關 或曰置變化等輸出效果。其中,效果調整模組⑽並不一 14 1375214 定會對所有的語音訊號的輸出效果進行調整,只會對需要 調整輸出效果的語音訊號進行調整,例如,抖音效果通常 只會出現在音長超過一定時間長度的音調上,因此效果調 整模組160只會將被播放之播放長度超過一定 語音訊號調整為帶有抖音的輸出效果。 、又, 接著以一個實施例來解說本發明的運作系統與方 法,並請參照「第3圖」本發明所提之模擬人聲歌唱之方 馨 法流程圖。在本實施例中,以執行有本發明的伴唱設備 1〇〇、電視(影像播放設備6〇〇)以及家用音響(聲音播放 設備700)來提供歌唱環境給使用者。 在使用— 者使用本發—明之前,執行本發明之伴唱設備 100上的儲存模組110會先儲存與所有中文字對應的語音 檔案(步騾210)。假設在本實施例中,是以中文字的字音 與3吾音槽案建立對應關係,則由於中文字共有個基本 音節(字音)’因此儲存模組110中至少會儲存411個語 • 音檔案,且不同的字音都至少對應到一個語音檔案。 當使用者在歌唱環境中開啟人聲導唱的功能,但執行 本發明之伴唱設備100的沒有儲存使用者點唱的歌曲之原 曰歌聲時’輸入模組120會輸入使用者在歌唱環境中所點 唱之歌曲的歌詞,以及同一首歌曲的曲調(步驟221)。假 設使用點選的歌曲為「紫竹調」,則輸入模組120至儲^ 模組110中讀出歌曲「紫竹調」的曲調以及如「第4圖」 所不之歌曲「紫竹調」的歌詞3〇〇,藉以完成歌曲「紫竹 15 1375214 調」之歌詞以及曲調的輸入。 之後,語音擷取模組130會至儲存模組11〇中讀取歌 曲「紫竹調」之歌詞300中的各個文字所對應的語音訊號 (步驟23G),由於在本實施例中,各個文字係以文字的字 音對應到語音槽案,因此語音操取模組13()會對歌詞3〇〇 中的各個文字分猶行語音職的鮮,也就是說,語音 擷取模組130會依據文字「一」的字音搜尋相對應的語音 檔案’並讀取對應文字「一」的語音檔案,由藉以由對應 文字「一」的語音檔案中擷取出對應文字「一」的語音訊 號’而在搜尋出對應文字「_」的語音職後,繼續搜尋 對應文字「根」、文字「紫」、…、文字「會」以及文字「了」 的語音檔案並由語音檔案巾擷取出語音訊號。假設儲存模 組110所儲存的多個語音檔案與文字「根」義,則語音 榻取模組13G會由與文字「根」對應的多個語音樓案中, 以隱馬爾可夫觀搜尋出最適合語音縦模組15()調整的 語音檔案。其餘文字可以類推,故不再贅述。 在θ擷取模組13〇至儲存模組丨中摘取出歌詞中 之各個文字對應的語音訊號(步驟23G)後,語音調整模 組150會調整語音操取模系且13〇所擷取出來之語音訊號的 音調以及敝長度,使得語音喊的音難雌語音訊號 的文子在曲調巾所對應之音調相同,且語音訊號的播放長 度與該音調之播放長度相同(步驟25〇) ^為了方便說明, 以下將以如「第5圖」所示簡譜侧來描述歌曲「紫竹調」 的曲調’文子「―」在曲調中所對應的音調為「sol」’且 其之播放紐為兩拍’因此語音調整模組15G可以使用時 域基頻同步疊加演算法將對應文字「一」之語音訊號的音 調調整為「S01」’且將語音訊號的播放長度調整(延長或 縮紐)為兩拍,但語音調整模組150並不限於使用時域基 頻同步璺加演算法調整語音訊號。語音調整模組150調整 歌詞300中其餘文字所對應之語音訊號的方式可以類推, 故不資述。 在语音擷取模組130擷取出語音訊號(步驟23〇)以 及語音婦模組15G婦語音域的音調浦放長度(步 騍250)的步騍中,語音調整-模—組——⑽一可以在語音指_莫—一 2 130擷取出一個文字對應的語音訊號後,就立刻調整語 曰戒號的音調與播放長度,也可以在語音擷取模組130將 歌凋_所有文字對應的語音訊號都操取出來後,在分別調 整所有擷取出之語音訊號的音調與播放長度。 在語音調整模組150調整完語音擷取模組13〇搜尋出 之所有語音訊號的音調與播放長度(步驟250)後,語音 合成模組170會依據各個文字在歌詞中的順序,依序合1 文予對應的s吾音訊號,並在完成歌詞中所有文字的合成後 產生語音資料(步驟270),也就是說’語音合成模組17〇 
會依據文字「一」'文字「根」、文字「紫」、…、文字「會」 以及文字「了」的順序,將對應文字「根」的語音訊號θ接」 在對應文字「一」的語音訊號之後合成語音資料,並將"對 應文字「紫」的語音訊號接在已合成的語音資料之後,合 f的語音資料,…,依此類推,直到合成文字「了」的 語音訊號產生最_語音資料5〇〇,合成後的語音資料5〇〇 被播放後’歌詞300中各文字的聲音在語音資料5〇〇中之 播放長度將如「第6圖」所示。 在語音合成模組170將文字對應的語音訊號合成產生 語音資料(步驟270)後,語音輸出模組18〇會輸出合成 後的語音資料(步驟280)。假設在本實施例中,語音輸出 杈組180係以播放語音資料的方式輸出語音資料,則執行 本發明的伴唱設備1〇〇會直接將語音輸出模組18〇播放語 音贺粹以及儲存在儲存模組打0-,之歌菊的曲調傳送到京一 用音響,同時,執行本發明的伴唱設備100也會將歌曲的 歌詞傳送給電視顯示,使得歌曲的曲調、歌曲的歌詞以及 語音資料同步的播放,讓使用者能夠在歌唱環境中聆聽到 隨著曲調之音調吟唱歌詞的人聲,以及觀看到該吟唱的歌 詞。 如前所述,本發明可以進一步提供效果調整模組 160 ’當加入效果調整模組16〇之後,使用者在透過輸入 模組120輸入歌詞300以及曲調(步驟221)的同時或之 後’還可以透過輸入模組120輸入情感參數(步驟223 ), 而在語音調整模組150調整語音訊號的音調與播放長度 (步驟250)之後’效果調整模組160會依據情感參數調 整各個語音訊號的輸出效果(步驟260),例如當使用者輸 入的情感參數為「自信」’則效果赃模組16G便可以依 據情感參數「自信」在發聲較長語音訊號,例如「直苗苗」 中的第一個「苗」字所對應的語音訊號最後二分之一拍調 整為抖音’也就是難語音訊號最後H拍的音高, 讓音南些稍脫離原先的音高;或是在細者輸入的情感 參數為「憤怒」時,效果調整模址160會依據情感參數為 「憤怒」難語音訊號的輸出時的振幅,也就是調整語音 讯號的音直,但本發明所提之調整輸出效果並不以上述兩 者為限。 在語音調整模組150調整語音訊號的音調與播放長度 (步驟250)以及效果調整模組16〇調整各個語音訊號的 輸出效果(步驟260)的步驟中,效果調整模組16〇可以 在έ吾音調整模組150完成-個語音訊狀音贿播放長度 的調整後,就立刻調整語音訊號的輸出效果,也可以在語 音調整模組15G將所有文字職之語音訊號的音調與播放 長度都調整完畢後’在分騎需要調錄纽果的語音訊 號進行調整。 在效果调整模組160調整各個語音訊號的輸出效果 (步驟260)後,語音合成模組17〇會將經過語音調整模 組150以及效果調整模組160調整後的語音訊號合成為語 音資料(步驟270 ),並由語音輸出模組丨8〇輸出合成後的 語音資料(步驟280),如此使用者便可以獲得以人聲吟唱 歌詞的語音資料。 7曰 综上所述’可知本發明與先前技術之間的差異在於具 有擷取與歌曲之歌詞中各文字對應的語音訊號,並依據歌 曲之曲調調整語音訊號的音調以及播放長度後,再依據歌 词中各文字的排列順序合成各個調整後的語音訊號藉由 此一技術手段可以解決先前技術所存在無法為所有歌曲 提供人聲導唱功能的問題,進而達成模擬類似人聲歌唱的 功效。In fact, the input module 12 of the present invention is more responsible for providing the user with the input of the if parameter. The If parameter is provided to the effect module 160 that can be attached to the present invention as a basis for adjusting the input effect of the voice signal. Under the emotion expressed by the non-emotional parameter, the sound and the volume of the language (4) will produce a relative touch, so that the voice signal of _ should have different output effects with different emotional parameters (4). 
The effect adjustment module 160 is responsible for adjusting the input and output of the voice signal corresponding to each character in the input parameter of the input module 12, and the effect adjustment module (10) in the voice signal is changed and changed in the voice signal. The amplitude of the voice signal is equal to the output of the voice signal, such as the production of the material, shooting, light, fade, off or change. Among them, the effect adjustment module (10) does not adjust the output effect of all the voice signals, and only adjusts the voice signal that needs to adjust the output effect. For example, the vibrato effect usually only appears in the length of the sound. The tone adjustment module 160 only adjusts the length of the played play beyond a certain voice signal to an output effect with vibrato. Further, the operation system and method of the present invention will be explained by way of an embodiment, and the flow chart of the simulation method for simulating vocal singing proposed by the present invention will be described with reference to "Fig. 3". In the present embodiment, the singing environment is provided to the user by the sing-along device 1 〇〇, the television (video playback device 6 〇〇), and the home audio (sound playback device 700) having the present invention. Before using the present invention, the storage module 110 on the accompaniment device 100 of the present invention first stores the voice files corresponding to all the Chinese characters (step 210). It is assumed that in the present embodiment, the Chinese character has a corresponding relationship with the 3 voice channel, and since the Chinese character has a basic syllable (word), at least 411 voice files are stored in the storage module 110. And different words and sounds correspond to at least one voice file. 
When the user turns on the function of the vocal guide in the singing environment, but the singer 100 of the present invention does not store the original sing of the song sung by the user, the input module 120 inputs the user in the singing environment. The lyrics of the song sung and the tune of the same song (step 221). Assuming that the selected song is "Zizhu Tune", the input module 120 to the storage module 110 reads the tune of the song "Zizhu Tune" and the lyrics of the song "Zizhu Tune" as shown in "Fig. 4". 3〇〇, to complete the song "Zizhu 15 1375214 tune" lyrics and the input of the tune. After that, the voice capturing module 130 reads the voice signal corresponding to each character in the lyrics 300 of the song "Zizhu Tun" in the storage module 11 (step 23G), because in this embodiment, each text system The word sound of the text corresponds to the voice slot, so the voice operation module 13() will separate the words in the lyrics 3〇〇, that is, the voice capture module 130 will follow the text. The word "one" is searched for the corresponding voice file' and the voice file corresponding to the text "one" is read. The voice signal corresponding to the word "one" is extracted from the voice file corresponding to the word "one". After the voice job corresponding to the text "_", continue to search for the voice file corresponding to the text "root", the text "purple", ..., the text "will" and the text "了" and take the voice signal from the voice file. Assuming that the plurality of voice files and texts stored in the storage module 110 are "root", the voice couching module 13G searches for a hidden Markov concept from a plurality of voice buildings corresponding to the "root" of the text. The most suitable voice file for voice 縦 module 15 () adjustment. The rest of the text can be analogized, so I won't go into details. 
After the θ extraction module 13 〇 to the storage module 摘 extracts the voice signal corresponding to each character in the lyrics (step 23G), the voice adjustment module 150 adjusts the voice operation module and captures the voice operation module. The pitch of the voice signal and the length of the voice signal are such that the voice of the voice is difficult to be the same as the tone of the voice towel, and the length of the voice signal is the same as the length of the tone (step 25〇) ^ For convenience of explanation, the tune "Wenzi" of the song "Zizhu Tune" will be described as "sol" in the tune as "sol" in the tune side as shown in "5th figure" and the play button is two shots. Therefore, the voice adjustment module 15G can adjust the pitch of the voice signal corresponding to the text "1" to "S01" using the time domain fundamental frequency synchronization superposition algorithm and adjust (extend or shorten) the playback length of the voice signal into two. Beat, but the voice adjustment module 150 is not limited to adjusting the voice signal using the time domain baseband synchronization plus algorithm. The manner in which the voice adjustment module 150 adjusts the voice signal corresponding to the rest of the characters in the lyrics 300 can be analogized, and therefore is not described. In the step of the voice capture module 130 extracting the voice signal (step 23〇) and the tone placement length of the voice module 15G voice field (step 250), the voice adjustment mode-group-(10) one After the voice signal corresponding to the text is taken out, the tone and the length of the voice ring can be adjusted immediately, and the voice capture module 130 can also correspond to all the characters in the voice capture module 130. After the voice signals are all taken out, the pitch and play length of all the extracted voice signals are adjusted separately. 
After the voice adjustment module 150 has adjusted the pitch and play length of all the voice signals extracted by the voice capture module 130 (step 250), the speech synthesis module 170 joins the corresponding voice signals in the order in which the characters appear in the lyrics, and generates the voice data once the signals of all characters in the lyrics have been joined (step 270). That is, following the order of the characters "one", "root", "purple", ..., "will", and "le", the speech synthesis module 170 appends the voice signal corresponding to "root" to the voice signal corresponding to "one", then appends the voice signal corresponding to "purple" to the synthesized voice data, and so on, until the voice signal of the final character has been appended, producing the final voice data 500. When the synthesized voice data is played, the play length of each character in the lyrics 300 is as shown in Figure 6. After the speech synthesis module 170 has synthesized the voice signals corresponding to the characters into the voice data (step 270), the voice output module 180 outputs the synthesized voice data (step 280). Assume that in this embodiment the voice output module 180 outputs the voice data by playing it: the accompaniment device 100 executing the present invention plays the voice data through the voice output module 180 and, at the same time, transmits the melody of the song stored in the storage module 110 to the audio playback device. The accompaniment device 100 also transmits the lyrics of the song to the television display, so that the melody of the song, the lyrics of the song, and the voice data are played synchronously.
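The concatenation of step 270 reduces to appending the adjusted per-character signals in lyric order; a minimal sketch, with hypothetical toy data in place of real audio samples:

```python
# Sketch of step 270: join the adjusted per-character voice signals, in the
# order the characters appear in the lyrics, into one stream of voice data.

def synthesize(adjusted_signals):
    """Concatenate the per-character signals into voice data."""
    voice_data = []
    for signal in adjusted_signals:  # already ordered by the lyrics
        voice_data.extend(signal)
    return voice_data

voice_data = synthesize([[1, 2], [3], [4, 5]])
```

The total length of the result is the sum of the per-character play lengths, which is what Figure 6 depicts for the lyrics 300.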
This playback lets the user, in the singing environment, listen to a simulated human voice that sings along with the tune and, at the same time, read the lyrics being sung. As mentioned above, the present invention can further provide the effect adjustment module 160. With the effect adjustment module 160 added, the user can, besides inputting the lyrics 300 and the tune through the input module 120 (step 221), also input an emotion parameter through the input module 120 (step 223). After the voice adjustment module 150 has adjusted the pitch and play length of the voice signals (step 250), the effect adjustment module 160 adjusts the output effect of each voice signal according to the emotion parameter (step 260). For example, when the emotion parameter input by the user is "confidence", the effect adjustment module 160 can, according to this parameter, add vibrato to the longer voice signals; for instance, the second half of the voice signal corresponding to the character "Miao" can be adjusted to vibrate, that is, the pitch of the last half beat of the signal is made to deviate slightly and periodically from the original pitch. When the emotion parameter input by the user is "angry", the effect adjustment module 160 raises the amplitude of the voice signals according to this parameter, that is, increases the volume of the adjusted voice signals. The output effects adjusted by the present invention are, however, not limited to the above two.
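The two effects named above can be sketched as follows. The vibrato shape, depth, and the 1.5x volume factor are hypothetical choices for illustration only; the patent does not fix these magnitudes.

```python
import math

# Sketch of step 260: two of the output effects the patent names. For
# "confidence", add vibrato to the second half of a sustained signal by
# wobbling samples periodically around their original values; for "angry",
# raise the amplitude (volume). Magnitudes are hypothetical.

def apply_emotion(signal, emotion):
    if emotion == "confidence":
        half = len(signal) // 2
        out = list(signal[:half])
        for i, s in enumerate(signal[half:]):
            # Slight periodic deviation from the original value: vibrato.
            out.append(s + 0.05 * math.sin(2 * math.pi * i / 4))
        return out
    if emotion == "angry":
        return [s * 1.5 for s in signal]  # louder output
    return list(signal)  # unrecognized parameter: leave the signal as-is

loud = apply_emotion([0.2, 0.4], "angry")
```

Other effects listed in the claims, such as tremolo, fade-in, fade-out, or delay, could be added as further branches of the same function.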
As for the ordering of the step in which the voice adjustment module 150 adjusts the pitch and play length of the voice signals (step 250) and the step in which the effect adjustment module 160 adjusts the output effect of each voice signal (step 260), the effect adjustment module 160 may adjust the output effect of a voice signal immediately after the voice adjustment module 150 has finished adjusting that signal's pitch and play length, or the voice adjustment module 150 may first finish adjusting the pitch and play length of all voice signals, after which the output effects of the voice signals that require it are adjusted. After the effect adjustment module 160 has adjusted the output effect of each voice signal (step 260), the speech synthesis module 170 synthesizes the voice signals adjusted by the voice adjustment module 150 and the effect adjustment module 160 into the voice data (step 270), and the voice output module 180 outputs the synthesized voice data (step 280), so that the user obtains voice data simulating human singing. From the above description it can be seen that the difference between the present invention and the prior art is that the present invention stores voice signals corresponding to the characters in the lyrics of a song, adjusts the pitch and play length of each voice signal according to the tune of the song, and then joins the adjusted voice signals according to the order of the characters in the lyrics. By this technical means, the prior art's inability to provide a vocal guide function for all songs is overcome, thereby achieving the effect of simulating human singing.

Furthermore, the method for synthesizing vocal songs of the present invention can be implemented in hardware, in software, or in a combination of hardware and software. It can also be implemented in a centralized manner in a single computer system, or in a distributed manner in which different components are spread across several interconnected computer systems.

Although the embodiments of the present invention are disclosed above, the content described is not intended to limit the patent protection scope of the present invention directly. Any person with ordinary knowledge in the technical field to which the present invention belongs may, without departing from the spirit and scope disclosed herein, make minor changes and adjustments to the form and details of the implementation; the patent protection scope of the present invention shall still be as defined by the appended claims.

[Brief description of the drawings]
Figure 1 is a schematic diagram of the environment in which the present invention provides synthesized vocal songs.
Figures 2 and 3 are flowcharts of the method for synthesizing vocal songs according to the present invention.
Figure 4 shows the lyrics of the embodiment of the present invention.
Figure 5 shows the numbered musical notation of the embodiment of the present invention.
Figure 6 is a schematic diagram of the play lengths of the voice data of the embodiment of the present invention when played.

[Main component symbol description]
100 accompaniment device
110 storage module
120 input module
130 voice capture module
150 voice adjustment module
160 effect adjustment module
170 speech synthesis module
180 voice output module
300 lyrics
400 numbered notation
500 voice data
600 video playback device
700 audio playback device
Step 210: store voice files corresponding to characters
Step 221: input the lyrics of the song and the tune of the song
Step 223: input an emotion parameter
Step 230: extract, from the voice files corresponding to the characters in the lyrics, the voice signal corresponding to each character
Step 250: adjust the pitch and play length of each voice signal to be the same as the pitch and play length of the corresponding character in the tune
Step 260: adjust the output effect of the voice signals according to the emotion parameter
Step 270: synthesize the corresponding voice signals into voice data in the order of the characters in the lyrics
Step 280: output the voice data

Claims (1)

Patent application scope:
1. A method for simulating human singing, applied in an accompaniment device, the accompaniment device outputting the lyrics of a song and the tune of the song, the method comprising the following steps: storing a plurality of voice files each corresponding to at least one character; inputting the lyrics and the tune; extracting, from the voice files corresponding to the characters in the lyrics, the voice signals respectively corresponding to the characters; adjusting the pitch and play length of each voice signal to be the same as the pitch and play length of the corresponding character in the tune; synthesizing the corresponding voice signals into voice data according to the order of the characters in the lyrics; and outputting the voice data.
2. The method for simulating human singing of claim 1, wherein the step of extracting the voice signals further comprises searching the voice files respectively with a Hidden Markov Model (HMM).
3. The method for simulating human singing of claim 1, wherein the step of extracting the voice signals from the voice files extracts, from each of the voice files, the portion corresponding to the character as the voice signal.
4. The method for simulating human singing of claim 1, wherein the step of adjusting the pitch of each voice signal adjusts each voice signal with the Time Domain Pitch Synchronous Overlap-Add (TD-PSOLA) algorithm.
5. The method for simulating human singing of claim 1, further comprising, after the step of adjusting the pitch of each voice signal, providing for the input of an emotion parameter and adjusting the output effect of the voice signals according to the emotion parameter.
6. The method for simulating human singing of claim 5, wherein the step of adjusting the output effect of the voice signals comprises adjusting at least one of the pitch and the volume of the voice signals.
7. A system for simulating human singing, applied in an accompaniment device, the accompaniment device outputting the lyrics of a song and the tune of the song, the system comprising: a storage module for storing a plurality of voice files, each voice file corresponding to at least one character; an input module for inputting the lyrics and the tune; a voice capture module for extracting, from the voice files corresponding to the characters in the lyrics, the voice signals respectively corresponding to the characters; a voice adjustment module for adjusting the pitch and play length of each voice signal to be the same as the pitch and play length of the corresponding character in the tune; a speech synthesis module for synthesizing the voice signals into voice data according to the order of the characters in the lyrics; and a voice output module for outputting the voice data.
8. The system for simulating human singing of claim 7, wherein the voice capture module searches the voice files corresponding to the characters with a Hidden Markov Model.
9. The system for simulating human singing of claim 7, wherein the voice capture module further extracts, from each of the voice files, the portion corresponding to the character as the voice signal.
10. The system for simulating human singing of claim 7, wherein the voice adjustment module adjusts the voice signals with the time domain pitch synchronous overlap-add algorithm.
11. The system for simulating human singing of claim 7, further comprising an effect adjustment module for adjusting the output effect of the voice signals according to an emotion parameter, wherein the emotion parameter is input through the input module.
12. The system for simulating human singing of claim 11, wherein the output effect is at least one of vibrato, tremolo, fade-in, fade-out, delay, and volume.
13. The system for simulating human singing of claim 11, wherein the effect adjustment module adjusts at least one of the pitch and the volume of the voice signals according to the emotion parameter.
TW97150669A 2008-12-25 2008-12-25 System for simulating human singing and method thereof TWI375214B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW97150669A TWI375214B (en) 2008-12-25 2008-12-25 System for simulating human singing and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW97150669A TWI375214B (en) 2008-12-25 2008-12-25 System for simulating human singing and method thereof

Publications (2)

Publication Number Publication Date
TW201025300A TW201025300A (en) 2010-07-01
TWI375214B true TWI375214B (en) 2012-10-21

Family

ID=44852569

Family Applications (1)

Application Number Title Priority Date Filing Date
TW97150669A TWI375214B (en) 2008-12-25 2008-12-25 System for simulating human singing and method thereof

Country Status (1)

Country Link
TW (1) TWI375214B (en)

Also Published As

Publication number Publication date
TW201025300A (en) 2010-07-01

Similar Documents

Publication Publication Date Title
WO2014169700A1 (en) Performance method of electronic musical instrument and music
Larsen Essential Guide to Irish Flute and Tin Whistle
US9601029B2 (en) Method of presenting a piece of music to a user of an electronic device
TW201108202A (en) System, method, and apparatus for singing voice synthesis
JP6452229B2 (en) Karaoke sound effect setting system
CN107146598A (en) The intelligent performance system and method for a kind of multitone mixture of colours
JP4748568B2 (en) Singing practice system and singing practice system program
JP2004233698A (en) Device, server and method to support music, and program
JP2007264569A (en) Retrieval device, control method, and program
TW200907874A (en) Karaoke system providing user with self-learning function
JP2011095437A (en) Karaoke scoring system
CN108346418A (en) A kind of method, system and terminal that song generates
JP4808641B2 (en) Caricature output device and karaoke device
JP4038836B2 (en) Karaoke equipment
Lebon The Versatile Vocalist: Singing Authentically in Contrasting Styles and Idioms
TWI375214B (en) System for simulating human singing and method thereof
JP5193654B2 (en) Duet part singing system
TW201040939A (en) Method for generating self-recorded singing voice
JP6144593B2 (en) Singing scoring system
Pischnotte The Saxophone Music of Jacob ter Veldhuis: A Discussion of Pitch Black, Garden of Love, and Buku
JP2014191331A (en) Music instrument sound output device and music instrument sound output program
WO2023153033A1 (en) Information processing method, program, and information processing device
Kuhns Beatboxing and the flute: its history, repertoire, and pedagogical importance
Silvera-Jensen A comparison of stylistic, technical and pedagogical perspectives in vocal instruction among classical and jazz voice teachers
JP2007233078A (en) Evaluation device, control method, and program

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees