TWI423051B - Semantic-based visual effect generating system and program product - Google Patents

Semantic-based visual effect generating system and program product Download PDF

Info

Publication number
TWI423051B
TWI423051B TW99131706A
Authority
TW
Taiwan
Prior art keywords
semantic
data
category
text
image
Prior art date
Application number
TW99131706A
Other languages
Chinese (zh)
Other versions
TW201214153A (en)
Inventor
Ya Chi Chuang
Chueh Pin Ko
Ming Shan Liu
Original Assignee
Acer Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Acer Inc filed Critical Acer Inc
Priority to TW99131706A priority Critical patent/TWI423051B/en
Publication of TW201214153A publication Critical patent/TW201214153A/en
Application granted granted Critical
Publication of TWI423051B publication Critical patent/TWI423051B/en

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Description

Semantic-based visual effect generating system and program product

The present invention relates to a visual effect generating system, and more particularly to a semantic-based visual effect generating system and program product.

With the rapid development of the Internet, various kinds of multimedia data can be obtained through network platforms. Semantic analysis techniques also exist that can analyze such multimedia data to obtain related semantic data (also called metadata). Such semantic data is generally used to describe, classify, or organize the associated multimedia data, so as to speed up its subsequent retrieval.

For example, US 7,065,250 discloses an automated image interpretation and retrieval system, which analyzes digital image and video data and then annotates it with semantic labels for use by subsequent procedures, such as content-based retrieval and video abstract generation.

However, beyond retrieval, the above semantic data can also reflect the context, emotion, imagery, and other connotations of the multimedia data; if semantic data could be further applied to vision-related post-processing, it would bring users a deeper visual experience.

Accordingly, an object of the present invention is to provide a semantic-based visual effect generating system.

Thus, the semantic-based visual effect generating system of the present invention comprises a semantic processing module and a visual processing module. The semantic processing module includes a semantic classification unit, which receives a semantic data set related to a source material and performs semantic/category correspondence classification on it, so that the content of the semantic data set is classified into at least one semantic category it belongs to, thereby producing a semantic category union that includes that semantic category; the source material includes image data, and at least one of sound data and text data. The visual processing module includes a visual parameter generating unit and a visual post-production unit. The visual parameter generating unit receives the semantic category union and derives a visual parameter set from it; the visual post-production unit generates at least one visual effect according to the visual parameter set, combines the image data of the source material with the visual effect, and displays the result.

Another object of the present invention is to provide a program product that stores a semantic-based visual effect generating program.

Thus, the program product of the present invention, which stores a semantic-based visual effect generating program, can be loaded into and executed by an electronic device to carry out the functions of the semantic processing module and the visual processing module of the above semantic-based visual effect generating system.

The effect of the present invention is that the semantic processing module and the visual processing module produce a visual effect reflecting the connotation of the source material, and the image data of the source material is displayed in combination with that visual effect, giving users a deeper visual experience.

The foregoing and other technical content, features, and effects of the present invention will be clearly presented in the following detailed description of a preferred embodiment with reference to the drawings.

Referring to FIG. 1, a preferred embodiment of the semantic-based visual effect generating system of the present invention comprises a source data classification module 1, a semantic processing module 2 coupled to the source data classification module 1, and a visual processing module 3 coupled to the semantic processing module 2. In this preferred embodiment, the semantic-based visual effect generating system is implemented in software, as a program product storing a semantic-based visual effect generating program; after an electronic device (for example, a computer processor, not shown) loads and executes the program, the functions of the source data classification module 1, the semantic processing module 2, and the visual processing module 3 are carried out.

The source data classification module 1 receives a source material and classifies it. The source material includes image data and at least one of sound data and text data; the image data may be a still image, or a video comprising a series of images. In this preferred embodiment, the source material is digital multimedia data that includes image data, sound data, and text data.

The semantic processing module 2 includes an image analyzer 21, a sound analyzer 22, a text analyzer 23, a semantic classification unit 24 coupled to the image, sound, and text analyzers 21~23, an image object database 25 coupled to the image analyzer 21, a keyword database 26 coupled to the text analyzer 23, and a semantic/category database 27 coupled to the semantic classification unit 24.

The image analyzer 21, the sound analyzer 22, and the text analyzer 23 receive the classified source material and perform semantic analysis on its image, sound, and text data respectively, to obtain a semantic data set related to the source material. The processing performed by the image analyzer 21, the sound analyzer 22, and the text analyzer 23 is further described below.

The image analyzer 21 analyzes the image data of the source material to obtain part of the semantic data set. For one image of the image data, the image analyzer 21 computes a brightness value and a contrast value corresponding to the image, and performs matching against a set of image objects stored in the image object database 25 to extract at least one key object from the image; the semantic data set includes the brightness value, the contrast value, and the key object corresponding to the image. In this preferred embodiment, the image analyzer 21 takes the average luminance of all pixels in the image as the brightness value, and takes the difference between the maximum and minimum pixel luminance as the contrast value. The image object set is established in advance and stored in the image object database 25; it includes image objects common in daily life, for example vehicles, people, and buildings, and the image object database 25 can be expanded and updated as needed.
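The brightness and contrast computation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; a grayscale 2-D array with a 0-255 luminance range is an assumption, and object extraction against the image object database is omitted.

```python
import numpy as np

def analyze_image(gray):
    """Compute the brightness and contrast values described for image analyzer 21.

    `gray` is assumed to be a 2-D array of per-pixel luminance values (0-255).
    """
    gray = np.asarray(gray, dtype=float)
    brightness = gray.mean()            # average luminance of all pixels
    contrast = gray.max() - gray.min()  # maximum minus minimum luminance
    return brightness, contrast

# e.g. a half-dark, half-bright image
b, c = analyze_image([[30, 30], [220, 220]])  # b = 125.0, c = 190.0
```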

The sound analyzer 22 analyzes the sound data of the source material to obtain part of the semantic data set. For one audio segment of the sound data, the sound analyzer 22 determines at least one frequency and at least one amplitude corresponding to the segment; the semantic data set further includes the frequency and the amplitude corresponding to the segment. In this preferred embodiment, the frequency kept is the higher frequency present in the segment, and the amplitude kept is the larger amplitude present in the segment.
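One way to realize this audio-segment analysis is sketched below. Using an FFT to pick the dominant frequency and the peak sample magnitude as the amplitude is an assumption; the patent only states that a higher frequency and a larger amplitude of the segment are kept.

```python
import numpy as np

def analyze_audio(segment, sample_rate):
    """Return a dominant frequency (Hz) and a peak amplitude for one audio segment."""
    segment = np.asarray(segment, dtype=float)
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    dominant_freq = freqs[np.argmax(spectrum)]  # strongest spectral component
    amplitude = np.max(np.abs(segment))         # largest sample magnitude
    return dominant_freq, amplitude

# a pure 350 Hz tone sampled at 8 kHz, matching the application example below
sr = 8000
t = np.arange(sr) / sr
f, a = analyze_audio(0.8 * np.sin(2 * np.pi * 350 * t), sr)  # f = 350.0
```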

The text analyzer 23 analyzes the text data of the source material to obtain part of the semantic data set. For one text segment of the text data, the text analyzer 23 performs matching against a keyword set to extract at least one representative keyword from the segment; the text analyzer 23 also computes a text speed for the segment from the time interval the segment lasts and its total word count. The semantic data set further includes the keyword and the text speed corresponding to the segment. In this preferred embodiment, the keyword set is established in advance and stored in the keyword database 26; it includes commonly used keywords, for example proper nouns and place names, and the keyword database 26 can be expanded and updated as needed.
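A minimal sketch of this keyword extraction and text-speed computation follows. Simple substring matching and counting characters as "words" are assumptions made for illustration; the patent does not specify the matching or counting method.

```python
def analyze_text(segment, keyword_set, duration_minutes):
    """Extract keywords present in a text segment and compute its text speed."""
    keywords = [kw for kw in keyword_set if kw in segment]  # naive substring match
    speed = len(segment) / duration_minutes                 # characters per minute
    return keywords, speed

# a 15-second subtitle segment checked against a two-entry keyword set
kws, speed = analyze_text(
    "the blasting show at the park",
    keyword_set={"blasting show", "gunpowder"},
    duration_minutes=0.25,
)
# kws == ["blasting show"], speed == 116.0
```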

The semantic classification unit 24 performs semantic/category correspondence classification on the semantic data set, so that the content of the semantic data set is classified into at least one semantic category it belongs to, thereby producing a semantic category union that includes that category. In this preferred embodiment, the semantic classification unit 24 classifies the semantic data set according to a semantic/category relation set, and the resulting semantic category union includes a plurality of semantic categories. The semantic/category relation set is established in advance and stored in the semantic/category database 27; it includes a brightness-value/category correspondence, a contrast-value/category correspondence, an image-object/category correspondence, a frequency/category correspondence, an amplitude/category correspondence, a keyword/category correspondence, and a text-speed/category correspondence. The semantic/category relation set is built by collecting statistics over a large amount of digital multimedia data, and serves mainly to map the content of the semantic data set to semantic categories with specific meanings, as shown in Table 1 below.
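This semantic/category classification can be sketched as a lookup over a relation set, with the result gathered into a union. The relation entries and thresholds below are invented stand-ins for the patent's semantic/category relation set, chosen only to mirror its structure.

```python
def classify(semantic_data, relations):
    """Map each semantic datum to its category via a semantic/category
    relation set, and return the union of the resulting categories."""
    union = set()
    for kind, value in semantic_data:
        for predicate, category in relations[kind]:
            if predicate(value):
                union.add(category)
    return union

# toy relation set; the real correspondences are built statistically
relations = {
    "brightness": [(lambda v: v >= 128, "brightness_3"),
                   (lambda v: v < 128, "brightness_1")],
    "amplitude":  [(lambda v: v >= 65, "amplitude_7")],
}
cats = classify([("brightness", 135), ("amplitude", 70), ("amplitude", 80)],
                relations)
# cats == {"brightness_3", "amplitude_7"}
```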

The visual processing module 3 includes a visual parameter generating unit 31 and a visual post-production unit 32 coupled to the visual parameter generating unit 31.

The visual parameter generating unit 31 receives the semantic category union and produces a visual parameter set according to the semantic categories in the union. Specifically, it derives the visual parameter set from a pre-established semantic-category/adjustment-parameter association, a semantic-category/subtitle-parameter association, and a semantic-category/text-parameter association, together with the semantic categories of the union. In this preferred embodiment, these three associations are implemented as a plurality of judgment conditions integrated into the program; however, they may also be established in advance in a database (not shown), and are not limited to what this preferred embodiment discloses. The semantic-category/adjustment-parameter association determines, from the semantic categories of the union and their intersections or unions, the visual parameter set used to adjust a single image, which includes a special-effect filter, a background contrast adjustment parameter, and a background brightness adjustment parameter set. The semantic-category/subtitle-parameter association determines, from the semantic categories of the union and their intersections or unions, the visual parameter set related to a semantic enhanced subtitle, which includes at least one of a font size, a color, a font, and a subtitle effect for the semantic enhanced subtitle.
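The judgment conditions described above can be sketched as rules over the category union. The specific conditions and parameter values here are invented placeholders for the patent's category/parameter associations, shown only to illustrate the mapping from categories to a visual parameter set.

```python
def visual_parameters(category_union):
    """Derive a visual parameter set from a semantic category union.

    The rules below are hypothetical; the real associations are judgment
    conditions integrated into the program (or stored in a database).
    """
    params = {}
    # adjustment-parameter rule: bright amusement-park imagery -> brighten background
    if {"brightness_3", "image_object_2"} <= category_union:
        params["background_brightness"] = "+20%"
    # subtitle-parameter rule: high-pitched loud speech -> emphatic subtitle
    if {"frequency_6", "amplitude_7"} <= category_union:
        params["subtitle"] = {"size": "large", "color": "orange",
                              "effect": "flame"}
    return params

p = visual_parameters({"brightness_3", "image_object_2",
                       "frequency_6", "amplitude_7"})
```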

The visual post-production unit 32 generates at least one visual effect according to the visual parameter set, and outputs and displays the image data of the source material combined with the visual effect; the visual effect may be dynamic or static. The visual effect includes at least one of a single-image effect used for image adjustment of the image data, the semantic enhanced subtitle superimposed on the image data for display, and a text adjustment effect used to adjust the text data. It is worth mentioning that the detailed implementation of the visual post-production unit 32 is well known to those skilled in the art, and much visual post-production software already exists, so it is not described here.

Referring to FIG. 1, FIG. 2, and FIG. 3, and corresponding to the above preferred embodiment, a semantic-based visual effect generating method and an application example are given below to explain the interaction among the source data classification module 1, the semantic processing module 2, and the visual processing module 3. The semantic-based visual effect generating method includes the following steps.

As shown in step S41, the source data classification module 1 classifies a source material to obtain image data, sound data, and text data.

In this application example, the source material is subtitled digital audio-visual data; one image 5 of the image data and one text segment 6 of the text data are shown in FIG. 3. The text segment 6 is the content of one subtitle, namely, "the Janfusun amusement park will add a gunpowder blasting show to its performances", and one audio segment of the sound data is the stretch of sound corresponding to the text segment 6.

As shown in step S42, the image analyzer 21, the sound analyzer 22, and the text analyzer 23 of the semantic processing module 2 perform semantic analysis on the image, sound, and text data classified in step S41, respectively, to obtain a semantic data set related to the source material.

In this application example, the semantic data set includes: a brightness value (assume 135), a contrast value (assume 90), and a key object 51 (assume a Ferris wheel) corresponding to the image 5; a frequency (assume 350 Hz) and two amplitudes (assume 70 dB and 80 dB) corresponding to the audio segment; several keywords (assume Janfusun, gunpowder, and blasting show); and a text speed (assume 80 words/minute).

As shown in step S43, the semantic classification unit 24 of the semantic processing module 2 classifies the brightness value, the contrast value, the key object, the frequency, the amplitudes, the keywords, and the text speed obtained in step S42 into their respective semantic categories according to the semantic/category relation set shown in Table 1, and produces a semantic category union including those categories.

In this application example, the brightness value (135) is classified into brightness category_3, indicating that the image 5 has high brightness; the contrast value (90) into contrast category_5, indicating medium-high contrast; the key object 51 (Ferris wheel) into image object category_2, indicating that the image 5 is related to an amusement park; the frequency (350 Hz) into frequency category_6, indicating a high-pitched female voice; the amplitudes (70 dB and 80 dB) both into amplitude category_7, indicating loud volume; the keywords (Janfusun, gunpowder, blasting show) into keyword category_7, keyword category_10, and keyword category_3 respectively, indicating an amusement park name, a proper noun, and an activity name; and the text speed (80 words/minute) into text speed category_4, indicating fast speech.

Further, assume that according to statistics collected in advance, "high brightness" means the imagery of the image 5 is happy; "medium-high contrast" means it is happy or lively; "related to an amusement park" means it is happy; "high-pitched" represents excitement or agitation; "loud volume" represents agitation; "amusement park name, proper noun, and activity name" represent a lively situation; and "fast speech" represents excitement or agitation. The semantic category union produced by the semantic classification unit 24 is expressed as follows: {brightness category_3, contrast category_5, image object category_2} + Janfusun{keyword category_7} + gunpowder{amplitude category_7, keyword category_10} + blasting show{frequency category_6, amplitude category_7, keyword category_3}. From this it can be seen that the semantic category union reflects the connotation of the source material.

As shown in step S44, the visual parameter generating unit 31 of the visual processing module 3 derives the visual parameter set from the semantic-category/adjustment-parameter association, the semantic-category/subtitle-parameter association, the semantic-category/text-parameter association, and the semantic categories of the union obtained in step S43.

In this application example, the visual parameter generating unit 31 judges according to the semantic-category/adjustment-parameter association together with brightness category_3, contrast category_5, and image object category_2, and obtains as the visual parameter set a background brightness adjustment parameter set for brightening the background of the image 5. Judging according to the semantic-category/subtitle-parameter association together with frequency category_6, amplitude category_7, keyword category_7, and text speed category_4, it obtains a visual parameter set that includes a large font size, an orange color, a bold shadowed font, and a flame subtitle effect. Moreover, since in the semantic category union, gunpowder{amplitude category_7, keyword category_10} + blasting show{frequency category_6, amplitude category_7, keyword category_3} indicates that "gunpowder" and "blasting show" are not only keywords but were also spoken at a larger volume and higher frequency, the visual parameter generating unit 31 selects these two keywords as subtitle content. Finally, judging according to the semantic-category/text-parameter association together with keyword category_7, it obtains as the visual parameter set an adjustment parameter set that enlarges the font of the text corresponding to keyword category_7 in the text data (namely, Janfusun).

Referring to FIG. 1, FIG. 2, and FIG. 4, as shown in steps S45~S46, the visual post-production unit 32 of the visual processing module 3 generates several corresponding visual effects according to the visual parameter set obtained in step S44, and outputs and displays the image data of the source material combined with those visual effects.

In this application example, the visual post-production unit 32 brightens a background 52 of the image 5 according to the background brightness adjustment parameter set for subsequent output and display. It also generates, from the font size, the color, the font, the subtitle effect, and the subtitle content, a semantic enhanced subtitle 7 as shown in FIG. 4, and superimposes the semantic enhanced subtitle 7 on the image 5 for subsequent output and display. It further enlarges the font of the text 61 in the text segment 6 corresponding to keyword category_7, according to the font-enlarging adjustment parameter set. However, the text adjustment effect is not limited to font adjustment of specific text; the text segment 6 may also be adjusted by addition or deletion. For example, if the semantic categories related to the text segment 6 indicate vulgar words, those words may be crossed out, deleted, or replaced with spaces.

Referring to FIG. 1, FIG. 4, and FIG. 5, the subtitle effect produced by the visual parameter generating unit 31 may also be an animated subtitle effect, and the visual post-production unit 32 may generate the semantic enhanced subtitle 7 with animation accordingly; for instance, in the semantic enhanced subtitle 7 of FIG. 4 and FIG. 5, the shaded part of "blasting show" may rotate back and forth. The subtitle effect produced by the visual parameter generating unit 31 may also be a specific animation effect, such as the flickering flame effect in FIG. 4 and FIG. 5.

It is worth mentioning that this example is described with a single image 5; however, the visual post-production unit 32 may also perform similar processing on a video comprising a series of images, and is not limited to what this example discloses.

In summary, the present invention produces the semantic category union by means of the semantic processing module 2, and produces the corresponding visual effect by means of the visual processing module 3 to visually enhance the source material before output. Letting users simultaneously see the source material and visual effects reflecting its connotation indeed brings them a deeper visual experience, so the object of the present invention is achieved.

The above description covers only a preferred embodiment of the present invention and cannot be used to limit the scope of its practice; all simple equivalent changes and modifications made according to the claims and the description of the invention remain within the scope covered by this patent.

1 ... Source data classification module

2 ... Semantic processing module

21 ... Image analyzer

22 ... Sound analyzer

23 ... Text analyzer

24 ... Semantic classification unit

25 ... Image object database

26 ... Keyword database

27 ... Semantic/category database

3 ... Visual processing module

31 ... Visual parameter generating unit

32 ... Visual post-production unit

S41~S46 ... Steps

5 ... Image

51 ... Key object

52 ... Background

6 ... Text segment

61 ... Text

7 ... Semantic enhanced subtitle

FIG. 1 is a system diagram illustrating a preferred embodiment of the semantic-based visual effect generating system of the present invention;

FIG. 2 is a flow chart illustrating a semantic-based visual effect generating method corresponding to the preferred embodiment of the present invention;

FIG. 3 is a schematic diagram showing one image of the image data of a source material, and one text segment of its text data;

FIG. 4 is a schematic diagram showing the image data of the source material combined with various visual effects according to the preferred embodiment of the present invention; and

FIG. 5 is a schematic diagram, in conjunction with FIG. 4, illustrating an animated subtitle effect.


Claims (13)

1. A semantic-based visual effect generating system, comprising: a semantic processing module including a semantic classification unit, the semantic classification unit being configured to receive a semantic data group related to a source material and to perform semantic/category classification on the semantic data group, so as to classify the content of the semantic data group into at least one semantic category to which it belongs and thereby produce a semantic category union including the semantic category, wherein the source material includes image data and at least one of sound data and text data; and a visual processing module including a visual parameter generating unit and a visual post-production unit, the visual parameter generating unit being configured to receive the semantic category union and to derive a visual parameter set from the semantic category union, and the visual post-production unit being configured to generate at least one visual effect according to the visual parameter set and to display the image data of the source material combined with the visual effect.

2. The semantic-based visual effect generating system of claim 1, wherein the semantic processing module further includes an image analyzer for analyzing the image data of the source material to obtain the semantic data group, wherein, for one image of the image data, the image analyzer obtains a brightness value corresponding to the image, the semantic data group includes the brightness value corresponding to the image, and the semantic classification unit classifies the brightness value into the semantic category to which it belongs according to a pre-established brightness value/category association.

3. The semantic-based visual effect generating system of claim 1, wherein the semantic processing module further includes an image analyzer for analyzing the image data of the source material to obtain the semantic data group, wherein, for one image of the image data, the image analyzer obtains a contrast value corresponding to the image, the semantic data group includes the contrast value corresponding to the image, and the semantic classification unit classifies the contrast value into the semantic category to which it belongs according to a pre-established contrast value/category association.

4. The semantic-based visual effect generating system of claim 1, wherein the semantic processing module further includes an image analyzer for analyzing the image data of the source material to obtain the semantic data group, wherein, for one image of the image data, the image analyzer performs comparison against a pre-established image object set to extract at least one important object from the image, the semantic data group includes the important object, and the semantic classification unit classifies the important object into the semantic category to which it belongs according to a pre-established image object/category association.

5. The semantic-based visual effect generating system of claim 1, wherein the semantic processing module further includes a sound analyzer, the source material includes the image data and the sound data, and the sound analyzer is configured to analyze the sound data to obtain the semantic data group, wherein, for one sound segment of the sound data, the sound analyzer obtains at least one frequency corresponding to the sound segment, the semantic data group includes the frequency corresponding to the sound segment, and the semantic classification unit classifies the frequency into the semantic category to which it belongs according to a pre-established frequency/category association.

6. The semantic-based visual effect generating system of claim 1, wherein the semantic processing module further includes a sound analyzer, the source material includes the image data and the sound data, and the sound analyzer is configured to analyze the sound data to obtain the semantic data group, wherein, for one sound segment of the sound data, the sound analyzer obtains at least one amplitude corresponding to the sound segment, the semantic data group includes the amplitude corresponding to the sound segment, and the semantic classification unit classifies the amplitude into the semantic category to which it belongs according to a pre-established amplitude/category association.
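Claims 2 through 6 all describe the same mechanism: an analyzer produces a numeric feature (brightness, contrast, dominant frequency, amplitude), and the semantic classification unit maps that value onto a semantic category through a pre-established value/category association. A minimal sketch of such a lookup is shown below; the category names and threshold ranges are illustrative assumptions, since the claims do not disclose concrete tables.

```python
# Hypothetical sketch of the claimed semantic classification unit:
# analyzer outputs are mapped to semantic categories through
# pre-established value/category associations. All category names and
# thresholds are illustrative assumptions, not values from the patent.

BRIGHTNESS_CATEGORIES = [   # (lower bound, upper bound, category)
    (0.0, 0.33, "dark"),
    (0.33, 0.66, "neutral"),
    (0.66, 1.01, "bright"),
]

FREQUENCY_CATEGORIES = [    # dominant frequency in Hz -> category
    (20, 250, "low-pitched"),
    (250, 2000, "mid-range"),
    (2000, 20000, "high-pitched"),
]

def classify(value, table):
    """Return the semantic category whose range contains `value`."""
    for low, high, category in table:
        if low <= value < high:
            return category
    return None

def semantic_category_union(semantic_data):
    """Collect the union of categories for one semantic data group."""
    union = set()
    if "brightness" in semantic_data:
        union.add(classify(semantic_data["brightness"], BRIGHTNESS_CATEGORIES))
    if "frequency" in semantic_data:
        union.add(classify(semantic_data["frequency"], FREQUENCY_CATEGORIES))
    union.discard(None)  # drop features that fell outside every range
    return union

print(semantic_category_union({"brightness": 0.8, "frequency": 440}))
```

The same table-driven pattern extends to contrast, amplitude, or any other scalar feature by adding another range table.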
7. The semantic-based visual effect generating system of claim 1, wherein the semantic processing module further includes a text analyzer, the source material includes the image data and the text data, and the text analyzer is configured to analyze the text data to obtain the semantic data group, wherein, for one text segment of the text data, the text analyzer performs comparison against a pre-established keyword set to extract at least one keyword from the text segment, the semantic data group includes the keyword corresponding to the text segment, and the semantic classification unit classifies the keyword into the semantic category to which it belongs according to a pre-established keyword/category association.

8. The semantic-based visual effect generating system of claim 1, wherein the semantic processing module further includes a text analyzer, the source material includes the image data and the text data, and the text analyzer is configured to analyze the text data to obtain the semantic data group, wherein, for one text segment of the text data, the text analyzer obtains a text speed corresponding to the text segment, the semantic data group includes the text speed corresponding to the text segment, and the semantic classification unit classifies the text speed into the semantic category to which it belongs according to a pre-established text speed/category association.
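The text analyzer of claims 7 and 8 compares a text segment against a pre-established keyword set and then classifies the extracted keywords through a keyword/category association. The sketch below illustrates that two-step flow; the keyword set and category names are invented for demonstration and are not disclosed in the patent.

```python
# Illustrative sketch of the claimed text analyzer and keyword/category
# association. The keyword set and categories are hypothetical examples.

KEYWORD_SET = {"goal", "explosion", "wedding", "storm"}

KEYWORD_CATEGORY = {   # pre-established keyword/category association
    "goal": "excitement",
    "explosion": "action",
    "wedding": "romance",
    "storm": "tension",
}

def extract_keywords(text_segment):
    """Claimed comparison step: keep only words found in the keyword set."""
    words = text_segment.lower().split()
    return [w for w in words if w in KEYWORD_SET]

def classify_keywords(keywords):
    """Claimed classification step: map each keyword to its category."""
    return {KEYWORD_CATEGORY[k] for k in keywords}

segment = "The storm hit just before the wedding"
print(classify_keywords(extract_keywords(segment)))
```

A production implementation would use proper tokenization and possibly stemming, but the table-lookup structure matches what the claims recite.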
9. The semantic-based visual effect generating system of claim 1, wherein the visual effect is a single-image special effect that applies image adjustment processing to the image data of the source material, and the visual parameter generating unit derives the visual parameter set from a pre-established semantic category/adjustment parameter association and the semantic category union.

10. The semantic-based visual effect generating system of claim 1, wherein the visual effect is a semantic enhanced subtitle to be superimposed on the image data of the source material for display, and the visual parameter generating unit derives the visual parameter set from a pre-established semantic category/subtitle parameter association and the semantic category union.

11. The semantic-based visual effect generating system of claim 10, wherein the visual parameter set includes at least one of a font size, a color, a font, and a subtitle effect corresponding to the semantic enhanced subtitle.

12. The semantic-based visual effect generating system of claim 1, wherein the source material includes the image data and the text data, the visual effect is a text adjustment effect that applies adjustment processing to the text data, and the visual parameter generating unit derives the visual parameter set from a pre-established semantic category/text parameter association and the semantic category union.
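Claims 10 and 11 describe a visual parameter generating unit that looks up a semantic category/subtitle parameter association with the semantic category union, yielding a font size, color, font, and subtitle effect. A hedged sketch of that lookup follows; the parameter values and the first-match selection policy are assumptions, as the claims do not specify how conflicts between multiple matching categories are resolved.

```python
# Hedged sketch of the claimed visual parameter generating unit for the
# semantic enhanced subtitle. Parameter values are illustrative only.

SUBTITLE_PARAMS = {   # semantic category -> subtitle parameters
    "excitement": {"font_size": 36, "color": "#FF3300",
                   "font": "Impact", "effect": "shake"},
    "romance":    {"font_size": 28, "color": "#FF99CC",
                   "font": "Script", "effect": "fade-in"},
}

DEFAULT_PARAMS = {"font_size": 24, "color": "#FFFFFF",
                  "font": "Sans", "effect": "none"}

def subtitle_parameters(category_union):
    """Return parameters for the first matching category (assumed policy)."""
    for category in category_union:
        if category in SUBTITLE_PARAMS:
            return SUBTITLE_PARAMS[category]
    return DEFAULT_PARAMS

params = subtitle_parameters({"excitement"})
print(params["font_size"], params["effect"])
```

The same association structure applies to the adjustment parameters of claim 9 and the text parameters of claim 12, with different value tables.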
13. A program product storing a semantic-based visual effect generating program which, when loaded into an electronic device and executed, performs the functions of the semantic processing module and the visual processing module recited in any one of claims 1 to 12.
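Taken together, the claims describe a two-stage pipeline: the semantic processing module turns source-data features into a semantic category union, and the visual processing module turns that union into visual effects applied to the image data. The end-to-end sketch below makes the data flow concrete; all module internals, feature names, and effect names are simplified assumptions for illustration.

```python
# End-to-end sketch of the claimed system. The internals of both modules
# are simplified assumptions; only the module boundaries and the semantic
# category union passed between them follow the claims.

def semantic_processing(source):
    """Semantic processing module: analyzers + semantic classification."""
    union = set()
    if source.get("brightness", 0) > 0.66:      # assumed brightness rule
        union.add("bright")
    if "goal" in source.get("text", "").lower():  # assumed keyword rule
        union.add("excitement")
    return union

def visual_processing(image, category_union):
    """Visual processing module: parameter generation + post-production."""
    effects = []
    if "bright" in category_union:
        effects.append("boost-saturation")
    if "excitement" in category_union:
        effects.append("animated-subtitle")
    return {"image": image, "effects": effects}

source = {"image": "frame_001", "brightness": 0.8, "text": "What a goal!"}
result = visual_processing(source["image"], semantic_processing(source))
print(result["effects"])
```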
TW99131706A 2010-09-17 2010-09-17 Semantic-based visual effect generating system and program product TWI423051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW99131706A TWI423051B (en) 2010-09-17 2010-09-17 Semantic-based visual effect generating system and program product


Publications (2)

Publication Number Publication Date
TW201214153A TW201214153A (en) 2012-04-01
TWI423051B true TWI423051B (en) 2014-01-11

Family

ID=46786416


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080062196A1 (en) * 1999-07-26 2008-03-13 Rackham Guy J J System and method for enhancing the visual effect of a video display
TW200928954A (en) * 2007-12-28 2009-07-01 E Ten Information Sys Co Ltd Electronic device and method capable of generating screen picture transformation visual effects


