TWI272511B - Animation generation system and method - Google Patents

Animation generation system and method Download PDF

Info

Publication number
TWI272511B
TWI272511B TW094131779A TW94131779A
Authority
TW
Taiwan
Prior art keywords
audio signal
sound
pronunciation
waveform
mouth shape
Prior art date
Application number
TW094131779A
Other languages
Chinese (zh)
Other versions
TW200712954A (en)
Inventor
Hung-Lien Shen
Original Assignee
Culture Com Technology Macau Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Culture Com Technology Macau Ltd filed Critical Culture Com Technology Macau Ltd
Priority to TW094131779A priority Critical patent/TWI272511B/en
Application granted granted Critical
Publication of TWI272511B publication Critical patent/TWI272511B/en
Publication of TW200712954A publication Critical patent/TW200712954A/en


Landscapes

  • Processing Or Creating Images (AREA)

Abstract

Disclosed are an animation generation system and method applicable to animation production application programs. A database pre-stores audio signal decomposition rules, a correspondence table of sound packet units and pronunciations, and a correspondence table of pronunciations and mouth shape parameters. A decomposition mechanism decomposes a received audio signal into at least one sound packet unit according to the audio signal decomposition rules; an identification mechanism identifies, from the correspondence table of sound packet units and pronunciations, the pronunciation represented by each sound packet unit; and an output mechanism retrieves, from the correspondence table of pronunciations and mouth shape parameters, the mouth shape parameters associated with the pronunciation and supplies them to the animation application program, which then generates the mouth shape animation. Because the mouth shape animation is generated from the associated audio signal, the integration and precision of the mouth shape animation and the audio signal are significantly enhanced.

Description

IX. Description of the Invention

[Technical Field of the Invention]
The present invention relates to animation generation systems and methods, and more particularly, to an animation generation system and method for use in an animation production application program.

[Prior Art]
With advances in information technology, network communication and related technologies, creative works that formerly could be produced only with pen and paper can now be created and distributed in new ways. Text, music and images alike are now commonly created with the aid of data processing devices such as workstations and personal computers.

Taking animation as an example, early animation was drawn frame by frame on physical paper and played back at tens of frames per second, whereas computer animation produced on a data processing device through an animation production application program requires no physical pen and paper and may be two-dimensional or three-dimensional. In either case, when an animated character is to "speak," its mouth movements must be synchronized with the dialogue or other sounds called for by the script.

Specifically, in the conventional production process, the animator first draws the character's mouth movements at given points in time, and a voice actor then performs lip-synchronized dubbing to give the animated character its voice.

This conventional dubbing approach has numerous drawbacks. The voice actor must first memorize the character's mouth movements and then speak or make sounds in time with them; if the dubbed content fails to match the mouth shapes, action and sound fall out of step, the dubbing must be redone, and production is delayed and made more expensive. Dubbing against pre-drawn mouth shapes also demands considerable experience and skill, since the actor must convey the character's emotion while keeping the timing exact, which raises the difficulty of the work. Moreover, the mouth shapes drawn during production are typically quite simple: the animated mouth is often controlled only by the number of words spoken, and special mouth shapes for laughter or other expressions are rarely drawn, so the result is imprecise. The prior art thus suffers from imprecise character mouth shapes, an excessively high skill threshold for dubbing, and low dubbing efficiency at high cost. How to provide an animation generation technique that improves the precision of a character's mouth shapes, lowers the skill threshold for dubbing, raises dubbing efficiency and reduces dubbing cost has therefore become a problem urgently awaiting solution.

[Summary of the Invention]
To overcome the foregoing drawbacks of the prior art, a primary objective of the present invention is to provide an animation generation system and method capable of improving the precision of an animated character's mouth shapes. Another objective of the present invention is to provide an animation generation system and method capable of lowering the skill threshold for dubbing. A further objective is to provide an animation generation system and method capable of raising dubbing efficiency, and yet another objective is to provide an animation generation system and method capable of reducing dubbing cost.

To achieve these and other objectives, the animation generation system of the present invention, applied in an animation production application program, comprises: a database storing audio signal decomposition rules, a correspondence table of sound packet units and pronunciations, and a correspondence table of pronunciations and mouth shape parameters; a parsing module for decomposing a received audio signal into at least one sound packet unit according to the audio signal decomposition rules; an identification module for identifying the pronunciation represented by the sound packet unit according to the correspondence table of sound packet units and pronunciations; and an output module for retrieving, according to the correspondence table of pronunciations and mouth shape parameters, the mouth shape parameters corresponding to the pronunciation and outputting them to the animation production application program.

In one form of the invention, the parsing module converts the received audio signal into the original waveform of the audio signal according to the audio signal decomposition rules and then decomposes the original waveform into at least one sound packet unit. In another form, the parsing module converts the received audio signal into the original waveform of the audio signal, segments the original waveform into a plurality of waveform segments, and decomposes each waveform segment into at least one sound packet unit.

The animation generation method of the present invention, executed through the animation generation system, comprises: establishing a database including at least audio signal decomposition rules, a correspondence table of sound packet units and pronunciations, and a correspondence table of pronunciations and mouth shape parameters; decomposing a received audio signal into at least one sound packet unit according to the audio signal decomposition rules; identifying the pronunciation represented by the sound packet unit according to the correspondence table of sound packet units and pronunciations; and retrieving, according to the correspondence table of pronunciations and mouth shape parameters, the mouth shape parameters corresponding to the pronunciation and outputting them to the animation production application program. The method admits the same two forms of decomposition described above for the system.

Compared with the prior art, the database, parsing, identification and output mechanisms provided by the animation generation system and method of the present invention cause the mouth shape animation of the animation production application program to be formed according to the corresponding audio signal, so that the integration and precision of the mouth shape animation and the audio signal can be significantly enhanced.
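The database, parsing, identification and output mechanisms summarized above amount to a decompose, identify, look-up pipeline. The sketch below is purely illustrative: the patent specifies no code, so every function name, table entry and parameter value here is a hypothetical stand-in, and the "signal" is a list of pre-labeled unit types standing in for real waveform analysis.

```python
# Illustrative sketch only; all names and values are hypothetical stand-ins,
# not taken from the patent.

# Correspondence tables as plain dictionaries:
# sound-packet-unit type -> pronunciation, pronunciation -> mouth shape parameters.
UNIT_TO_PRONUNCIATION = {"u_zhe": "zhe", "u_shi": "shi", "u_wo": "wo"}
PRONUNCIATION_TO_MOUTH = {"zhe": (0.4, 0.6), "shi": (0.2, 0.8), "wo": (0.7, 0.3)}

def decompose(audio_signal):
    """Parsing module: decompose the signal into sound packet units.
    Here the 'signal' is already a list of unit labels."""
    return list(audio_signal)

def identify(units):
    """Identification module: map each sound packet unit to its pronunciation."""
    return [UNIT_TO_PRONUNCIATION[u] for u in units]

def output(pronunciations):
    """Output module: retrieve mouth shape parameters for the animation program."""
    return [PRONUNCIATION_TO_MOUTH[p] for p in pronunciations]

mouth_params = output(identify(decompose(["u_zhe", "u_shi", "u_wo"])))
print(mouth_params)  # [(0.4, 0.6), (0.2, 0.8), (0.7, 0.3)]
```

In a real implementation the animation production application program would consume each parameter tuple per frame; here the tuples simply illustrate that every pronunciation resolves to one table row.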
[Embodiments]
The following illustrates the implementation of the present invention by way of specific embodiments. Those skilled in the art can readily appreciate the other advantages and effects of the invention from the disclosure of this specification. The invention may also be practiced or applied through other, different embodiments, and the details of this specification may be modified and varied from different viewpoints and for different applications without departing from the spirit of the invention.

Referring to FIG. 1, a schematic diagram of the application architecture of the animation generation system of the present invention: as shown, the animation generation system 1 comprises a database 10, a parsing module 12, an identification module 14 and an output module 16, and in this embodiment is applied in an animation production application program 2. The animation production application program 2 allows a user to design, by entering parameters, changes to part or all of the features of an animated character; in this embodiment, the feature in question is the character's mouth shape. The animation production application program 2 is executed in a data processing device 3, which may be a workstation, a personal computer, a notebook computer, a personal digital assistant or any other device with data storage and processing functions.

The database 10 stores the audio signal decomposition rules, the correspondence table of sound packet units and pronunciations, and the correspondence table of pronunciations and mouth shape parameters. In this embodiment, the audio signal decomposition rules include a rule for converting an audio signal into its original waveform, a rule for segmenting the original waveform into a plurality of waveform segments, and a rule for decomposing the original waveform into the corresponding sound packet units. The correspondence table of sound packet units and pronunciations includes sound packet units of various types and the characters corresponding to them. It should be noted that in this embodiment a sound packet unit is the smallest meaningful unit that can be cut out of a segment of the original waveform. For example, if the original waveform corresponds to the sentence 「動畫生成系統以及方法」 ("animation generation system and method"), the original waveform includes the sound packet units corresponding to the characters 「動」, 「畫」, 「生」, 「成」, 「系」, 「統」, 「以」, 「及」, 「方」 and 「法」, respectively.
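Because a sound packet unit type is keyed by pronunciation rather than by individual character, homophones automatically resolve to one set of mouth shape parameters. The sketch below illustrates that sharing with two hypothetical tables; the pinyin keys and the parameter values are invented for illustration and do not come from the patent.

```python
# Hypothetical correspondence tables; keys and values are invented
# for illustration and are not from the patent.

# Characters with the same pronunciation map to one unit type...
CHARACTER_TO_UNIT_TYPE = {
    "動": "dong4",  # "move"
    "洞": "dong4",  # "hole": a homophone, so the same unit type
    "恫": "dong4",  # also pronounced dong4
    "畫": "hua4",   # "picture"
}

# ...so a single row of mouth shape parameters serves every homophone.
UNIT_TYPE_TO_MOUTH_SHAPE = {
    "dong4": {"jaw_open": 0.5, "lip_round": 0.8},
    "hua4": {"jaw_open": 0.7, "lip_round": 0.4},
}

def mouth_shape_for(character):
    """Look up the shared mouth shape parameters via the unit type."""
    return UNIT_TYPE_TO_MOUTH_SHAPE[CHARACTER_TO_UNIT_TYPE[character]]

# Homophones retrieve identical parameters:
assert mouth_shape_for("動") == mouth_shape_for("洞") == mouth_shape_for("恫")
```

Keying both tables on the pronunciation-level unit type is what keeps the mouth shape table small: one entry per distinct pronunciation instead of one per character.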

Furthermore, in this embodiment, identical waveform segments have sound packet units of the same type; that is, characters with the same pronunciation have sound packet units of the same type. For example, characters sharing one pronunciation, such as 「動」, 「洞」 and 「恫」 (all pronounced dong), have sound packet units of the same type. Accordingly, in the correspondence table of pronunciations and mouth shape parameters, characters with the same pronunciation also share the same mouth shape parameters, because they have sound packet units of the same type.

The parsing module 12 decomposes a received audio signal into at least one sound packet unit according to the audio signal decomposition rules. In this embodiment, the parsing module 12 converts the received audio signal into the original waveform of the audio signal, segments the original waveform into a plurality of waveform segments, and then decomposes each waveform segment into at least one sound packet unit. As stated above, the audio signal decomposition rules in this embodiment include a rule for converting an audio signal into its original waveform, a rule for segmenting the original waveform into a plurality of waveform segments, and a rule for decomposing the original waveform into the corresponding sound packet units.

Thus, if the user inputs, through an audio input device 30 connected to the data processing device 3 (for example, a microphone), the sentence 「這是我生平第一次喝到這麼好的東西，可惜不知道是什麼」 ("This is the first time in my life I have tasted something this good; unfortunately I do not know what it is"), the parsing module 12 first converts the sentence into the original waveform shown schematically in FIG. 2A, according to the rule for converting an audio signal into its original waveform. Next, according to the rule for segmenting the original waveform into a plurality of waveform segments, the parsing module 12 cuts the original waveform into the waveform segments I, II, III and IV shown in FIG. 2B. The parsing module 12 then cuts the original waveform within those segments into the plurality of sound packet units shown in FIG. 2C, according to the rule for decomposing the original waveform into the corresponding sound packet units.

The identification module 14 identifies the pronunciation represented by a sound packet unit according to the correspondence table of sound packet units and pronunciations. Referring to FIG. 3, a schematic diagram of the correspondence table of sound packet units and pronunciations: in this embodiment, the identification module 14 uses the table to identify the pronunciations 「這」, 「是」, 「我」, 「生」 and so on represented by the respective sound packet units.

The output module 16 retrieves, according to the correspondence table of pronunciations and mouth shape parameters, the mouth shape parameters corresponding to each pronunciation and outputs them to the animation production application program 2. Referring to FIG. 4, a schematic diagram of the correspondence table of pronunciations and mouth shape parameters: in this embodiment, the output module 16 retrieves from the table the series of mouth shape parameters corresponding to the pronunciations 「這」, 「是」, 「我」 and so on, and outputs them to the animation production application program 2, so that the animation production application program 2 can generate the mouth shape animation.

Referring to FIG. 5, a flowchart of the animation generation method executed through the animation generation system of the present invention: as shown, in step S501, a database is established that includes at least the audio signal decomposition rules, the correspondence table of sound packet units and pronunciations, and the correspondence table of pronunciations and mouth shape parameters; the flow then proceeds to step S502. In step S502, the received audio signal is decomposed into at least one sound packet unit according to the audio signal decomposition rules; the flow then proceeds to step S503. In step S503, the pronunciation represented by the sound packet unit is identified according to the correspondence table of sound packet units and pronunciations; the flow then proceeds to step S504. In step S504, the mouth shape parameters corresponding to the pronunciation are retrieved according to the correspondence table of pronunciations and mouth shape parameters and are output to the animation production application program.

In another embodiment of the present invention, the step of decomposing the received audio signal into at least one sound packet unit according to the audio signal decomposition rules converts the received audio signal into the original waveform of the audio signal and then decomposes the original waveform into at least one sound packet unit. In yet another embodiment, that step converts the received audio signal into the original waveform of the audio signal according to the audio signal decomposition rules, segments the original waveform into a plurality of waveform segments, and then decomposes each waveform segment into at least one sound packet unit.

In summary, through the database, parsing, identification and output mechanisms provided by the animation generation system and method of the present invention, the mouth shape animation of the animation production application program is formed according to the corresponding audio signal, so that the integration and precision of the mouth shape animation and the audio signal can be significantly enhanced.
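The decomposition in step S502 first cuts the original waveform into segments (FIG. 2B) before extracting sound packet units (FIG. 2C). The patent does not state how segment boundaries are found; the sketch below assumes a simple amplitude threshold, treating runs of low-energy samples as the gaps between segments, which is only one plausible criterion.

```python
# Minimal segmentation sketch. The patent does not specify the boundary
# criterion; the amplitude threshold used here is an assumption.

def segment_waveform(samples, threshold=0.1):
    """Cut a waveform (a list of amplitude values) into segments separated
    by low-energy samples, analogous to segments I-IV in FIG. 2B."""
    segments, current = [], []
    for s in samples:
        if abs(s) >= threshold:
            current.append(s)       # sample belongs to the open segment
        elif current:
            segments.append(current)  # low-energy sample closes the segment
            current = []
    if current:
        segments.append(current)      # close a segment that ran to the end
    return segments

wave = [0.0, 0.5, 0.6, 0.0, 0.0, 0.4, 0.3, 0.0]
print(segment_waveform(wave))  # [[0.5, 0.6], [0.4, 0.3]]
```

Each resulting segment would then be matched against the stored sound packet unit types, the step the patent leaves to the rule for decomposing the original waveform into corresponding sound packet units.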
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Those skilled in the art may modify and vary the above embodiments without departing from the spirit and scope of the invention. The scope of protection of the invention shall therefore be as set forth in the appended claims.

[Brief Description of the Drawings]
FIG. 1 is a schematic diagram of the application architecture of the animation generation system of the present invention;
FIG. 2A is a schematic diagram of the original waveform after audio conversion;
FIG. 2B is a schematic diagram of the original waveform cut into different segments;
FIG. 2C is a schematic diagram of the original waveform cut into sound packet units;
FIG. 3 is a schematic diagram of the correspondence table of sound packet units and pronunciations;
FIG. 4 is a schematic diagram of the correspondence table of pronunciations and mouth shape parameters; and
FIG. 5 is a flowchart of the animation generation method executed through the animation generation system of the present invention.

[Description of Reference Numerals]
1 animation generation system
10 database
12 parsing module
14 identification module
16 output module
2 animation production application program
3 data processing device
30 audio input device
I, II, III, IV waveform segments
S501 to S504 steps


Claims (1)

X. Claims:
1. An animation generation system, applied in an animation production application program, comprising:
a database for storing audio signal decomposition rules, a correspondence table of sound packet units and pronunciations, and a correspondence table of pronunciations and mouth shape parameters;
a parsing module for decomposing a received audio signal into at least one sound packet unit according to the audio signal decomposition rules;
an identification module for identifying the pronunciation represented by the sound packet unit according to the correspondence table of sound packet units and pronunciations; and
an output module for retrieving, according to the correspondence table of pronunciations and mouth shape parameters, the mouth shape parameters corresponding to the pronunciation and outputting them to the animation production application program.
2. The animation generation system of claim 1, wherein the audio signal decomposition rules comprise a rule for converting an audio signal into an original waveform, a rule for segmenting the original waveform into a plurality of waveform segments, and a rule for decomposing the original waveform into corresponding sound packet units.
3. The animation generation system of claim 1, wherein the correspondence table of sound packet units and pronunciations comprises sound packet units of various types and the characters corresponding to them.
4. The animation generation system of claim 1, wherein a sound packet unit is the smallest meaningful unit that can be cut from a segment of an original waveform.
5. The animation generation system of claim 1, wherein, in the correspondence table of pronunciations and mouth shape parameters, characters with the same pronunciation have sound packet units of the same type and share the same mouth shape parameters.
6. The animation generation system of claim 1, wherein the parsing module converts the received audio signal into the original waveform of the audio signal according to the audio signal decomposition rules and then decomposes the original waveform into at least one sound packet unit.
7. The animation generation system of claim 1, wherein the parsing module converts the received audio signal into the original waveform of the audio signal according to the audio signal decomposition rules, segments the original waveform into a plurality of waveform segments, and then decomposes each waveform segment into at least one sound packet unit.
8. An animation generation method, applied in an animation production application program, comprising:
establishing a database including at least audio signal decomposition rules, a correspondence table of sound packet units and pronunciations, and a correspondence table of pronunciations and mouth shape parameters;
decomposing a received audio signal into at least one sound packet unit according to the audio signal decomposition rules;
identifying the pronunciation represented by the sound packet unit according to the correspondence table of sound packet units and pronunciations; and
retrieving, according to the correspondence table of pronunciations and mouth shape parameters, the mouth shape parameters corresponding to the pronunciation and outputting them to the animation production application program.
9. The animation generation method of claim 8, wherein the audio signal decomposition rules comprise a rule for converting an audio signal into an original waveform, a rule for segmenting the original waveform into a plurality of waveform segments, and a rule for decomposing the original waveform into corresponding sound packet units.
10. The animation generation method of claim 8, wherein the correspondence table of sound packet units and pronunciations comprises sound packet units of various types and the characters corresponding to them.
11. The animation generation method of claim 8, wherein a sound packet unit is the smallest meaningful unit that can be cut from a segment of an original waveform.
12. The animation generation method of claim 8, wherein, in the correspondence table of pronunciations and mouth shape parameters, characters with the same pronunciation have sound packet units of the same type and share the same mouth shape parameters.
13. The animation generation method of claim 8, wherein the step of decomposing the received audio signal into at least one sound packet unit according to the audio signal decomposition rules converts the received audio signal into the original waveform of the audio signal and then decomposes the original waveform into at least one sound packet unit.
14. The animation generation method of claim 8, wherein the step of decomposing the received audio signal into at least one sound packet unit according to the audio signal decomposition rules converts the received audio signal into the original waveform of the audio signal, segments the original waveform into a plurality of waveform segments, and then decomposes each waveform segment into at least one sound packet unit.
TW094131779A 2005-09-15 2005-09-15 Animation generation system and method TWI272511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW094131779A TWI272511B (en) 2005-09-15 2005-09-15 Animation generation system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW094131779A TWI272511B (en) 2005-09-15 2005-09-15 Animation generation system and method

Publications (2)

Publication Number Publication Date
TWI272511B true TWI272511B (en) 2007-02-01
TW200712954A TW200712954A (en) 2007-04-01

Family

ID=38441281

Family Applications (1)

Application Number Title Priority Date Filing Date
TW094131779A TWI272511B (en) 2005-09-15 2005-09-15 Animation generation system and method

Country Status (1)

Country Link
TW (1) TWI272511B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI416356B (en) * 2010-10-13 2013-11-21 Univ Nat Cheng Kung Database system

Also Published As

Publication number Publication date
TW200712954A (en) 2007-04-01

Similar Documents

Publication Publication Date Title
Sutton et al. Voice as a design material: Sociophonetic inspired design strategies in human-computer interaction
Wolfert et al. A review of evaluation practices of gesture generation in embodied conversational agents
Ghorbani et al. ZeroEGGS: Zero‐shot Example‐based Gesture Generation from Speech
TWI313418B (en) Multimodal speech-to-speech language translation and display
US9047868B1 (en) Language model data collection
JP7232293B2 (en) MOVIE GENERATION METHOD, APPARATUS, ELECTRONICS AND COMPUTER-READABLE MEDIUM
Al-onazi et al. Transformer-based multilingual speech emotion recognition using data augmentation and feature fusion
Meyerhoff Turning variation on its head: Analysing subject prefixes in Nkep (Vanuatu) for language documentation
Mussakhojayeva et al. KazakhTTS: An open-source Kazakh text-to-speech synthesis dataset
CN109166409A (en) A kind of sign language conversion method and device
Huang Toward multimodal corpus pragmatics: Rationale, case, and agenda
TWI269192B (en) Semantic emotion classifying system
Graham et al. Evaluating OpenAI's Whisper ASR: Performance analysis across diverse accents and speaker traits
Knight A multi-modal corpus approach to the analysis of backchanneling behaviour
CN104217039A (en) Method and system for recording telephone conversations in real time and converting telephone conversations into declarative sentences
Chang et al. Knowledge transfer for on-device speech emotion recognition with neural structured learning
TWI272511B (en) Animation generation system and method
JP6511192B2 (en) Discussion support system, discussion support method, and discussion support program
Liu et al. Contrastive Learning based Modality-Invariant Feature Acquisition for Robust Multimodal Emotion Recognition with Missing Modalities
Evangeline A survey on Artificial Intelligent based solutions using Augmentative and Alternative Communication for Speech Disabled
Le et al. Emotional Vietnamese Speech Synthesis Using Style-Transfer Learning.
Ramachandra et al. Human centered computing in digital persona generation
Bharti et al. An approach for audio/text summary generation from webinars/online meetings
Kadam et al. A Survey of Audio Synthesis and Lip-syncing for Synthetic Video Generation
CN112233648A (en) Data processing method, device, equipment and storage medium combining RPA and AI

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees