TWI283375B - Anchor person detection for television news segmentation based on audiovisual features - Google Patents


Info

Publication number
TWI283375B
Authority
TW
Taiwan
Prior art keywords
image
color
sound
segment
pixels
Prior art date
Application number
TW94126220A
Other languages
Chinese (zh)
Other versions
TW200707336A (en)
Inventor
Shih-Hung Lee
Chia-Hung Yeh
Hsuan-Huei Shih
Chung-Chieh Kuo
Original Assignee
Mavs Lab Inc
Priority date
Filing date
Publication date
Application filed by Mavs Lab Inc
Priority to TW94126220A
Publication of TW200707336A
Application granted
Publication of TWI283375B


Abstract

A video segmentation method for segmenting video clips according to their content is disclosed. The method comprises scanning pixels of video frames with a first horizontal scan line to determine whether the colors of the pixels fall within a predetermined color range; creating a color map from the pixels located on the first horizontal scan line in a plurality of successive video frames; labeling the current video segment as a candidate video segment if the color map indicates the presence of a stable region of pixels within the predetermined color range for a predetermined number of successive video frames; and performing color histogram comparisons on the stable regions to detect shot transitions. Audio signals of the video clips may also be analyzed to further verify the candidate video segments.

Description

IX. DESCRIPTION OF THE INVENTION:

[Technical Field]
The present invention relates to video segmentation, and more particularly to a method of detecting the anchor of a television news program and segmenting the program accordingly.

[Prior Art]
As the number of television news channels grows, so does the amount of available news, and it becomes increasingly difficult for viewers to search for and find the news programs they want. A news program usually contains several distinct news stories, and consecutive stories usually have little to do with one another. To make searching and classifying individual stories easier, shots of the television news anchor can be used to determine when each story begins and ends. Anchor shots are therefore the most important part of each story: the anchor usually introduces a story at its beginning, or comments on or summarizes it at its end.
Anchor shots therefore convey the main idea of each story effectively; viewers can browse a news program by its anchor shots, and each story can be identified by detecting the anchor.

Conventional news segmentation methods use machine-learning techniques that classify news automatically, but these known techniques have limitations. Other methods use more complex algorithms, such as face recognition and speaker identification, because the identity of the anchor and the anchor's position in the frame are not known in advance. Known segmentation approaches include head detection, mouth detection, classification or recognition of voice and music, closed-caption extraction, optical character recognition (OCR) of on-screen text, and model-based methods. However, all of these methods rely on highly complex algorithms.

[Summary of the Invention]
One object of the present invention is to solve the above problems by providing a method of scanning the video frames of a news program that detects whether a television news anchor appears in a frame by comparing pixel colors against a skin-color range.

According to an embodiment of the present invention, a video segmentation method for segmenting video clips according to their content is disclosed.
The method comprises: receiving a video signal comprising a plurality of video frames; analyzing the video frames with a first horizontal scan line, the first horizontal scan line selecting at least one row of pixels for analysis; analyzing the pixels located on the first horizontal scan line to determine whether their colors fall within a predetermined color range; indicating the regions of each frame covered by adjacent pixels within the predetermined color range; creating a color map from the pixels on the first horizontal scan line in a plurality of successive video frames; if the color map indicates that a predetermined number of successive video frames all contain a stable region of pixels within the predetermined color range, labeling the current video segment as a candidate video segment; for each candidate video segment, selecting one video frame out of every N video frames and generating a color histogram of the stable region of each selected frame; performing a first color histogram comparison between the histograms of each pair of consecutively selected frames; when the first histogram difference obtained from the first comparison exceeds a first threshold, performing a second color histogram comparison on the stable regions of each pair of consecutive frames lying between that pair of selected frames; and when the second histogram difference obtained from the second comparison exceeds a second threshold, indicating a shot change in the candidate video segment.

According to another embodiment of the present invention, a video segmentation method for segmenting video clips according to their content is disclosed.
The method comprises: receiving a video signal comprising a plurality of video frames; receiving an audio signal associated with the received video signal; analyzing the video frames with a first horizontal scan line and a second horizontal scan line, each selecting at least one row of pixels for analysis; setting a pixel to logic value "1" if the color of a pixel on the first or second horizontal scan line falls within a predetermined color range; performing an "OR" logic operation on corresponding pixels of the first and second horizontal scan lines to generate combined pixel data; using the combined pixel data to indicate the regions of the frame covered by adjacent pixels within the predetermined color range; creating a color map from the combined pixel data of a plurality of successive frames; if the color map indicates that a predetermined number of successive frames all contain a stable region of pixels within the predetermined color range, labeling the current video segment as a candidate video segment; for each candidate video segment, selecting one frame out of every N frames and generating a color histogram of the stable region of each selected frame; performing a first color histogram comparison between the histograms of each pair of consecutively selected frames; when the first histogram difference exceeds a first threshold, performing a second color histogram comparison on the stable regions of each pair of consecutive frames lying between that pair of selected frames; when the second histogram difference exceeds a second threshold, indicating a shot change in the candidate video segment; and analyzing the audio signal to filter the candidate video segments, wherein features of the audio signal are obtained by processing a plurality of audio frames of a predetermined size.

According to yet another embodiment of the present invention, a video segmentation method is disclosed that segments a television news video clip by detecting the television news anchor in the clip.
The method comprises: receiving a video signal comprising a plurality of news video frames; analyzing the news video frames with a first horizontal scan line that selects at least one row of pixels for analysis; analyzing the pixels on the first horizontal scan line to determine whether their colors fall within a predetermined color range, so as to detect the skin color of the television news anchor; indicating the regions of the news video frames covered by adjacent pixels within the predetermined color range; creating a color map from the pixels on the first horizontal scan line in a plurality of successive news video frames; if the color map indicates that a predetermined number of successive news video frames all contain a stable region of pixels within the predetermined color range, labeling the current video segment as a candidate video segment; for each candidate video segment, selecting one news video frame out of every N news video frames and generating a color histogram of the stable region of each selected frame; performing a first color histogram comparison between the histograms of each pair of consecutively selected news video frames; when the first histogram difference exceeds a first threshold, performing a second color histogram comparison on the stable regions of each pair of consecutive news video frames lying between that pair of selected frames; and when the second histogram difference exceeds a second threshold, indicating a shot change in the candidate video segment.
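The candidate-labeling rule shared by the embodiments above (a stable skin-color region across a predetermined number of successive frames) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: it takes precomputed per-frame 0/1 scan-line masks as input, uses non-overlapping 30-frame windows (one second at 30 frames per second), treats a column as stable only if it is skin-colored in every frame of the window, and uses a hypothetical minimum width `min_width`. The function names are hypothetical.

```python
from typing import List


def longest_true_run(flags: List[bool]) -> int:
    """Length of the longest run of consecutive True values."""
    best = cur = 0
    for flag in flags:
        cur = cur + 1 if flag else 0
        best = max(best, cur)
    return best


def label_candidates(masks: List[List[int]], window: int = 30,
                     min_width: int = 5) -> List[bool]:
    """Label each non-overlapping window of `window` frames as candidate or not.

    masks: one 0/1 skin mask per frame, sampled along the scan line(s).
    Stacking the masks of a window gives a color map (one row per frame).
    The window is a candidate when at least `min_width` adjacent columns
    are skin-colored in every row, i.e. a stable region exists.
    A trailing partial window is ignored.
    """
    labels = []
    for start in range(0, len(masks) - window + 1, window):
        color_map = masks[start:start + window]
        stable = [all(row[c] for row in color_map)
                  for c in range(len(color_map[0]))]
        labels.append(longest_true_run(stable) >= min_width)
    return labels
```

For example, 30 frames whose masks all mark the same six adjacent columns as skin yield one `True` label, while a window in which even one frame loses those columns yields `False`.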
The method of the present invention uses a simple algorithm to determine whether pixels falling within the skin-color range appear in the video frames, and to locate shot changes in the news. Because the analysis is locked onto the position where the anchor's head appears, the anchor can still be detected when a frame contains split screens. In short, the present invention provides a simple computational method for segmenting television news programs.

[Detailed Description of the Embodiments]
Referring to Fig. 1, Fig. 1 is a block diagram of a television news segmentation system. System 10 segments television news by detecting the television news anchor. System 10 includes an image processing circuit 30, which generates candidate segments 40 of the news video from the video signal; the audio information of candidate segments 40 is then analyzed further to confirm the video analysis.

Image processing circuit 30 includes a shot detection circuit 32, a face skin color detection circuit 34, and a post-processing circuit 36. Face skin color detection circuit 34 detects pixels of the video frames that fall within a predetermined range, namely the skin-color range. Referring to Figs. 2 and 3, Fig. 2 shows the use of a first horizontal scan line 102 and a second horizontal scan line 104 to detect whether the face of a television news anchor appears in video frame 100, and Fig. 3 is a flowchart of detecting the anchor's face according to the present invention.

Studies show that camera operators usually place the anchor's face about one third of the way down from the top of the frame. Face skin color detection circuit 34 therefore uses the first horizontal scan line 102, sometimes together with the second horizontal scan line 104, to detect skin-colored pixels. Although detection generally needs only the first horizontal scan line 102, adding the second horizontal scan line 104 lets face skin color detection circuit 34 produce more accurate results. For example, a horizontal scan line may pass through the anchor's eyes or mouth.
In that case, although the scan line still crosses the anchor's face, the detected colors are not skin colors, which can lead to inaccurate detection. Two horizontal scan lines are therefore used, both to reduce the chance of this happening and to provide more data for locating the anchor's face.

Each of the first horizontal scan line 102 and the second horizontal scan line 104 analyzes at least one row of pixels in video frame 100, producing sampled pixel colors 112 and 114, respectively. Note that scan lines 102 and 104 are placed as close as possible to one third of the way down the screen to increase the likelihood of crossing the anchor's face. The steps shown in Fig. 3 are described below.

Step 150: Start.
Step 152: Convert the color space of video frame 100 from RGB to Lab. The Lab color space is well suited to skin detection and is widely used; however, other color spaces, such as RGB, YCbCr, and IRgBy, may also be used.
Step 154: Determine whether the first horizontal scan line 102 (sometimes together with the second horizontal scan line 104) crosses any pixels of video frame 100 that fall within the skin-color range. The skin-color range can be adjusted for the local region or the studio lighting conditions.
Step 156: Determine whether there is a sufficiently large, contiguous skin-color region, that is, a group of adjacent pixels, more numerous than a predetermined value, that all fall within the skin-color range. If so, go to Step 158; otherwise, go to Step 160.
Step 158: Set the current video segment as a candidate video segment. Because the segment undergoes further video and audio analysis afterwards, it may later cease to be a candidate.
Step 160: End.

Referring to Fig. 4, Fig. 4 shows how a logic color map is obtained from two scan lines to detect a television news anchor. During a newscast the anchor's position is generally fixed, so this fact can be used to check whether successive video frames contain skin-colored pixels at roughly the same position. The examples in this description assume 30 video frames per second of video; this frame rate only makes the method easier to explain and should not be taken as a limitation.

The first horizontal scan line 102 and the second horizontal scan line 104 produce sampled pixel colors 112 and 114 in a plurality of video frames 210, for example 30 consecutive frames. Once sampled pixel colors 112 and 114 are obtained, skin color detection procedure 220 classifies each pixel: a pixel within the skin-color range is represented by logic value "1", and a pixel outside it by logic value "0". The results for sampled pixel colors 112 and 114 are shown as index arrays 222 and 224. An "OR" logic operation 226 is then performed on index arrays 222 and 224 to obtain result array 232. After each of the 30 consecutive frames has been analyzed, its result array 232 is stored in color map 230. Color block 240 is an illustrative example of color map 230: its 30 rows correspond to the 30 analyzed frames, white squares represent pixels within the skin-color range, and black squares represent pixels outside it.
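Steps 152 to 156 and the OR operation of Fig. 4 can be sketched as follows. This is a minimal Python sketch under assumptions: frames are lists of rows of 8-bit (R, G, B) tuples, the RGB input is treated as sRGB and converted to CIE L*a*b* with a D65 white point, and the skin-color box `SKIN_L`/`SKIN_A`/`SKIN_B` and all function names are hypothetical. The text only states that the range is predetermined and adjustable.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def srgb_to_lab(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Step 152: convert an 8-bit sRGB color to CIE L*a*b* (D65 white)."""
    def linear(c: float) -> float:
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linear(r), linear(g), linear(b)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    def f(t: float) -> float:
        d = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > d ** 3 else t / (3 * d * d) + 4.0 / 29.0

    fx, fy, fz = f(x / 0.95047), f(y), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


# Hypothetical skin-color box in Lab; tune for region or studio lighting.
SKIN_L, SKIN_A, SKIN_B = (30.0, 90.0), (5.0, 30.0), (5.0, 35.0)


def is_skin(rgb: Pixel) -> bool:
    """Step 154: does one pixel fall inside the skin-color box?"""
    L, a, b = srgb_to_lab(*rgb)
    return (SKIN_L[0] <= L <= SKIN_L[1] and SKIN_A[0] <= a <= SKIN_A[1]
            and SKIN_B[0] <= b <= SKIN_B[1])


def combined_mask(frame: List[List[Pixel]], rows: List[int]) -> List[int]:
    """Logic 1/0 mask for each scan-line row, then a column-wise OR."""
    masks = [[1 if is_skin(p) else 0 for p in frame[r]] for r in rows]
    return [1 if any(col) else 0 for col in zip(*masks)]


def has_skin_run(mask: List[int], min_run: int) -> bool:
    """Step 156: is there a contiguous run of at least min_run skin pixels?"""
    best = cur = 0
    for v in mask:
        cur = cur + 1 if v else 0
        best = max(best, cur)
    return best >= min_run
```

For a 9-row frame, `combined_mask(frame, [3, 4])` samples two adjacent rows starting at one third of the frame height and ORs their masks, so a run broken on one line by the eyes or mouth can be filled in by the other.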
In color block 240, the skin-color pixels near region 245, roughly from pixel 210 to pixel 330 from left to right, remain stable, which indicates that a television news anchor is probably the subject of these frames. Further analysis can be performed to obtain a more reliable result.

Once candidate video segments have been identified, shot detection circuit 32 can help identify when a segment changes. For example, by analyzing the color properties of the video frames, shot detection circuit 32 can detect when a shot that steadily shows the anchor switches to another shot. Referring to Fig. 5, Fig. 5 illustrates shot change detection: a shot change is detected by comparing the local color histograms of image strips 315 and 325 of two video frames 312 and 322. To reduce computational complexity, shot detection circuit 32 first detects shot changes at a coarse scale and, once a change is found, narrows down to a smaller range to locate exactly where the change occurs.

Fig. 5 shows two groups of video frames 310 and 320. In this example, each group contains 30 video frames, that is, one second of video. One frame is selected from each of groups 310 and 320; for simplicity, the 30th frame of each group is usually selected for comparison. Image strips 315 and 325 are taken from the two consecutively selected frames 312 and 322 at the position of region 245 in color block 240, which represents the stable skin-color pixels; in other words, image strips 315 and 325 lie where the anchor's head appears. The first histogram comparison compares the color histograms of image strips 315 and 325 in the consecutively selected frames 312 and 322. If the histogram difference from this first comparison exceeds a first threshold, a second histogram comparison is performed on the corresponding image strips of each pair of consecutive frames among the 30 intervening frames, to find the exact frame at which the shot change occurs. Because only the regions represented by image strips 315 and 325 are analyzed, that is, histogram comparison is applied to only part of each frame, the invention can correctly handle frames that contain split screens.

After face skin color detection circuit 34 and shot detection circuit 32 have produced the candidate segments, post-processing circuit 36 can optionally perform additional steps.
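Returning to shot detection circuit 32, its coarse-to-fine histogram comparison can be sketched as follows. This is a hedged Python sketch, not the patented implementation: frames are lists of rows of (R, G, B) tuples; the image strip is assumed to be the vertical band spanning the stable columns found in the color map; the histogram binning (8 levels per channel), the L1 distance, and the thresholds `t1` and `t2` are all assumptions, since the text does not specify them. Function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]


def strip_histogram(frame: Frame, cols: Tuple[int, int],
                    bins: int = 8) -> List[float]:
    """Normalized RGB histogram (bins**3 cells) of columns cols[0]:cols[1]."""
    c0, c1 = cols
    hist = [0.0] * (bins ** 3)
    count = 0
    for row in frame:
        for r, g, b in row[c0:c1]:
            idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + b * bins // 256
            hist[idx] += 1
            count += 1
    return [h / count for h in hist] if count else hist


def hist_diff(h1: List[float], h2: List[float]) -> float:
    """L1 distance between two normalized histograms, in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_changes(frames: List[Frame], cols: Tuple[int, int],
                        n: int = 30, t1: float = 0.5,
                        t2: float = 0.5) -> List[int]:
    """Return the index of the first frame of each detected new shot."""
    cache: Dict[int, List[float]] = {}

    def hist_of(i: int) -> List[float]:
        if i not in cache:
            cache[i] = strip_histogram(frames[i], cols)
        return cache[i]

    picks = list(range(n - 1, len(frames), n))  # the N-th frame of each group
    changes = []
    for p, q in zip(picks, picks[1:]):
        if hist_diff(hist_of(p), hist_of(q)) > t1:      # first (coarse) pass
            for i in range(p, q):                       # second (fine) pass
                if hist_diff(hist_of(i), hist_of(i + 1)) > t2:
                    changes.append(i + 1)
    return changes
```

The fine pass only runs inside the one-second intervals flagged by the coarse pass, which is where the saving in computation comes from; the cache avoids histogramming a selected frame twice.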
Examples of such post-processing include removing candidate segments shorter than a predetermined length, such as one second or three seconds, because such short segments are unlikely to contain an anchor shot. For verification purposes, the percentage of video frames that contain the stable skin-color region can also be calculated.

After image processing circuit 30 produces candidate segments 40, audio analysis can provide additional information so that news segments are detected more accurately. For example, some news shots show many faces, such as crowd scenes, and relying on video data alone may cause errors in anchor detection. Likewise, news reports and interviews may contain large, stable close-ups of faces; without audio analysis, these frames would also be classified as anchor shots.

Audio data can also serve as the primary information for determining candidate segments, rather than merely supporting the video data; with reliable audio processing techniques, such as speech recognition, audio data can also achieve high reliability.

Referring back to Fig. 1, the audio signal becomes very useful once statistics of its waveform have been computed. For this reason, non-overlapping sliding window circuit 12 divides the audio signal into independent 25-millisecond audio segments; the window may be longer or shorter, and 25 ms is only an example. Fast Fourier transform (FFT) circuit 14 then applies an FFT to each audio window, converting the audio samples to the frequency domain so that their frequency response can be analyzed, and passes the result to audio energy analysis circuit 20, which analyzes the energy of the audio samples. Audio energy analysis circuit 20 includes circuits 22, 24, and 26.
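The framing and FFT stage of circuits 12 and 14 can be sketched as follows. This is an illustrative Python sketch using only the standard library: the 32 kHz sample rate in the example is an assumption (the text gives none, but analyzing bands up to 13 kHz requires a rate above 26 kHz), and the recursive radix-2 FFT with zero-padding to a power of two is one common choice, not necessarily what FFT circuit 14 uses. Function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def split_frames(samples: List[float], sample_rate: int,
                 frame_ms: float = 25.0) -> List[List[float]]:
    """Non-overlapping windows of frame_ms milliseconds (partial tail dropped)."""
    size = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, size)]


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum(frame: List[float],
                   sample_rate: int) -> Tuple[List[float], List[float]]:
    """Zero-pad to a power of two; return (bin frequencies in Hz, power) up to Nyquist."""
    n = 1
    while n < len(frame):
        n *= 2
    spec = fft(list(frame) + [0.0] * (n - len(frame)))
    half = n // 2 + 1
    freqs = [k * sample_rate / n for k in range(half)]
    return freqs, [abs(s) ** 2 for s in spec[:half]]
```

With a 32 kHz signal, each 25 ms window holds 800 samples and is zero-padded to 1024 points, giving a bin spacing of 31.25 Hz.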
Circuit 22 calculates the energy of audio samples below 13 kHz, circuit 24 calculates the energy of audio samples between 8 and 13 kHz, and circuit 26 calculates the frequency centroid of the audio samples. The frequency centroid is the average frequency of the spectrum and indicates where the spectral energy is centered. The outputs of circuits 22, 24, and 26 of audio energy analysis circuit 20 are then combined with the output of image processing circuit 30, so that video analysis and audio analysis can be processed together.

A background energy level circuit 42 calculates the energy level of the background noise as the average of the ten lowest local energies. The number need not be ten; more or fewer values may be used, but averaging in this way yields a more accurate background noise level for the audio data.

All of the energy-level information calculated by audio energy analysis circuit 20 and background energy level circuit 42 is then passed to ratio calculation circuit 50, which calculates various energy ratios that characterize the received audio data. Circuit 52 calculates the ratio of the background audio energy level to the overall audio energy level; circuit 54 calculates the ratio of the average energy of audio between 8 and 13 kHz to the overall audio energy level; circuit 56 calculates the variance of the frequency centroid over the current candidate segment; and circuit 58 calculates the silence ratio, that is, the number of audio segments whose energy is below the background level divided by the total number of audio segments.
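The per-frame measurements of circuits 22, 24, and 26, the background level of circuit 42, the four ratios of circuits 52 to 58, and the comparison of those ratios against predetermined ranges can be sketched as follows. This is a minimal Python sketch under assumptions: each audio frame is given as a (frequencies, power) spectrum, the "overall" audio level is taken to be the mean per-frame energy below 13 kHz (assumed nonzero), and the acceptance ranges passed to `passes_audio_filter` are hypothetical placeholders, because the text does not state their values. Function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple


def frame_features(freqs: Sequence[float],
                   power: Sequence[float]) -> Tuple[float, float, float]:
    """(energy below 13 kHz, energy in 8-13 kHz, frequency centroid in Hz)."""
    total = sum(p for f, p in zip(freqs, power) if f < 13000)
    band = sum(p for f, p in zip(freqs, power) if 8000 <= f <= 13000)
    weight = sum(power)
    centroid = sum(f * p for f, p in zip(freqs, power)) / weight if weight else 0.0
    return total, band, centroid


def audio_ratios(features: List[Tuple[float, float, float]],
                 lowest: int = 10) -> Dict[str, float]:
    """The four ratios of circuits 52, 54, 56 and 58 for one candidate segment."""
    totals = [t for t, _, _ in features]
    background = mean(sorted(totals)[:lowest])  # mean of the `lowest` smallest energies
    overall = mean(totals)                      # assumed overall level
    return {
        "background_ratio": background / overall,
        "band_ratio": mean([b for _, b, _ in features]) / overall,
        "centroid_variance": pvariance([c for _, _, c in features]),
        "silence_ratio": sum(1 for t in totals if t < background) / len(totals),
    }


def passes_audio_filter(ratios: Dict[str, float],
                        ranges: Dict[str, Tuple[float, float]]) -> bool:
    """Keep the candidate only if every checked ratio lies in its range."""
    return all(lo <= ratios[k] <= hi for k, (lo, hi) in ranges.items())
```

Only the ratios named in `ranges` are checked, so each of the four tests can be enabled independently.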
After ratio calculation circuit 50 has calculated all of the ratios output by circuits 52, 54, 56, and 58, it compares them with a plurality of predetermined ranges. Candidate segments whose features fall outside these ranges are discarded, and the remaining segments are output from ratio calculation circuit 50 as segments containing anchor shots.

In summary, the present invention determines whether the pixels of a news segment that fall within the skin-color range remain stable at a position corresponding to the anchor's head. Furthermore, by comparing color histograms, it can quickly determine when the anchor no longer appears on screen. Audio analysis is then performed to further reduce the number of candidate segments.

Compared with other news segmentation methods, the present invention has many advantages. For example, anchor detection remains effective even when the frame contains two or more split screens. The analysis can use a single horizontal scan line, with lower computational complexity but less accurate results, or two horizontal scan lines, with higher computational complexity but more accurate results. The method also applies to frames showing one or more anchors, and to multi-angle shots. Performing pixel measurements and comparisons in the Lab color space helps ensure effective detection of the skin-color range, although using Lab is not essential to the invention. The thresholds can also be adjusted for different skin tones or different makeup. In short, the present invention provides a simple computational method for segmenting television news programs.
The above description covers only preferred embodiments of the present invention. All equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.

[Brief Description of the Drawings]
Figure 1 is a block diagram of the TV news segmentation system.
Figure 2 shows how a first horizontal scan line and a second horizontal scan line are used to detect whether a TV news anchor's face appears in an image frame.
Figure 3 is a flowchart for detecting a news anchor's face image.
Figure 4 shows how a logical color map is obtained from two scan lines and used to detect the TV news anchor.
Figure 5 is a diagram of shot transitions, which are detected by comparing the local color histograms of two image frames.

[Description of Main Component Symbols]
10 system
12 non-overlapping shifting window circuit
14 fast Fourier transform circuit
22, 24 energy sample calculation circuits
26 frequency sample calculation circuit
20 sound energy analysis circuit
30 image processing circuit
32 shot detection circuit
34 face skin color detection circuit
36 post-processing circuit
40 news image candidate segments
50 ratio calculation circuit
52, 54 sound energy level ratio calculation circuits
42 background energy level circuit
56 frequency centroid variance calculation circuit
58 silence ratio calculation circuit
100, 210 image frames
102 first horizontal scan line
104 second horizontal scan line
112, 114 sample pixel colors
220 skin color detection procedure
222, 224 indicator arrays
226 "OR" logic operation
230 color map
232 result array
240 color block
245 stable region presenting skin color
310, 320 image frame groups
312, 322 image frames
315, 325 image bands

Claims (1)

1. A video segmentation method for segmenting video clips according to the content of the video clips, the method comprising:
receiving a video signal comprising a plurality of video frames;
analyzing the video frames of the video signal using a first horizontal scan line, wherein the first horizontal scan line selects at least one row of pixels for analysis;
analyzing the pixels of a video frame located on the first horizontal scan line to determine whether the colors of the pixels fall within a predetermined color range;
identifying, in the video frame, the region covered by adjacent pixels that fall within the predetermined color range;
generating a color map using the pixels located on the first horizontal scan line in a plurality of consecutive video frames;
if the color map shows that a predetermined number of consecutive video frames all contain a stable region of pixels, and those pixels all fall within the predetermined color range, labeling the current video segment as a candidate video segment;
for each candidate video segment, selecting one video frame out of every N video frames, and generating a color histogram for the stable region of each selected video frame;
performing a first color histogram comparison, comparing the color histograms of each pair of consecutively selected video frames;
when a first color histogram difference obtained from the first color histogram comparison is greater than a first threshold, performing a second color histogram comparison, comparing the stable regions of each pair of consecutive video frames lying between the pair of consecutively selected video frames whose color histogram difference is greater than the first threshold; and
when a second color histogram difference obtained from the second color histogram comparison is greater than a second threshold, indicating a shot change in the candidate video segment.

2. The method of claim 1, wherein selecting one video frame out of every N video frames comprises selecting the Nth video frame.

3. The method of claim 1, wherein the first horizontal scan line is located approximately one third of the way down from the top of a video frame.

4. The method of claim 1, further comprising performing an RGB-to-Lab color space conversion before analyzing the pixels of the video frame located on the first horizontal scan line, to determine whether the colors of the pixels fall within the predetermined color range.

5. The method of claim 1, wherein the predetermined number of consecutive video frames constitutes three seconds of video.

6. The method of claim 1, further comprising:
analyzing the video frames of the video signal using a second horizontal scan line, wherein the first and second horizontal scan lines select the same number of rows for analysis;
setting a pixel to the logical value "1" if the color of the pixel located on the first or second horizontal scan line of a video frame falls within the predetermined color range;
performing an "OR" logical operation on corresponding pixels located on the first and second horizontal scan lines to generate combined pixel data; and
using the combined pixel data to identify the region of adjacent pixels in the video frame that fall within the predetermined color range, and using the plurality of consecutive video frames to generate the color map.

7. The method of claim 6, wherein the first and second horizontal scan lines are located approximately one third of the way down from the top of a video frame.

8. The method of claim 1, further comprising removing candidate video segments whose length is less than a predetermined time.

9. The method of claim 1, further comprising:
receiving an audio signal associated with the received video signal; and
analyzing the audio signal to filter the candidate video segments, wherein the audio signal is processed in audio frames of a predetermined size.

10. The method of claim 9, further comprising converting the audio samples to the frequency domain to analyze the frequency response of the audio frames, and calculating the overall sound energy level of the audio frames.

11. The method of claim 10, further comprising:
calculating a background sound energy level of the audio frames;
comparing the background sound energy level with the overall sound energy level; and
eliminating the candidate video segment if the ratio of the background sound energy level to the overall sound energy level does not fall within a first specific range.

12. The method of claim 11, further comprising:
calculating the ratio of the number of audio frames whose sound energy level is lower than the background sound energy level to the total number of audio frames; and
eliminating the candidate video segment if the ratio does not fall within a second specific range.

13. The method of claim 10, further comprising:
calculating the average sound energy of audio frames at frequencies of 8-13 kHz;
calculating the ratio of the average sound energy at 8-13 kHz to the overall sound energy level; and
eliminating the candidate video segment if the ratio does not fall within a specific range.

14. The method of claim 10, further comprising:
calculating the variance of the frequency centroid of the current candidate video segment; and
eliminating the candidate video segment if the variance of the frequency centroid does not fall within a specific range.

15. A video segmentation method for segmenting video clips according to the content of the video clips, the method comprising:
receiving a video signal comprising a plurality of video frames;
receiving an audio signal associated with the received video signal;
analyzing the video frames of the video signal using a first horizontal scan line and a second horizontal scan line, wherein the first and second horizontal scan lines each select at least one row of pixels for analysis;
setting a pixel to the logical value "1" if the color of the pixel located on the first or second horizontal scan line of a video frame falls within a predetermined color range;
performing an "OR" logical operation on corresponding pixels located on the first and second horizontal scan lines to generate combined pixel data;
using the combined pixel data to identify the region of adjacent pixels in the video frame that fall within the predetermined color range;
generating a color map using the combined pixel data of a plurality of consecutive video frames;
if the color map shows that a predetermined number of consecutive video frames all contain a stable region of pixels, and those pixels all fall within the predetermined color range, labeling the current video segment as a candidate video segment;
for each candidate video segment, selecting one video frame out of every N video frames, and generating a color histogram for the stable region of each selected video frame;
performing a first color histogram comparison, comparing the color histograms of each pair of consecutively selected video frames;
when a first color histogram difference obtained from the first color histogram comparison is greater than a first threshold, performing a second color histogram comparison, comparing the stable regions of each pair of consecutive video frames lying between the pair of consecutively selected video frames whose color histogram difference is greater than the first threshold;
when a second color histogram difference obtained from the second color histogram comparison is greater than a second threshold, indicating a shot change in the candidate video segment; and
analyzing the audio signal to filter the candidate video segments, wherein features of the audio signal are obtained by processing audio frames of a predetermined size of the audio signal.

16. The method of claim 15, wherein the first and second horizontal scan lines are located approximately one third of the way down from the top of a video frame.
17. A method for detecting a television news anchor in a television news video segment, the method comprising:
receiving a video signal comprising a plurality of news video frames;
analyzing the news video frames of the video signal using a first horizontal scan line;
analyzing the pixels of a news video frame located on the first horizontal scan line to determine whether the colors of the pixels fall within a predetermined color range, so as to detect the skin color of the television news anchor;
identifying, in the news video frame, the region covered by adjacent pixels that fall within the predetermined color range;
generating a color map using the pixels located on the first horizontal scan line in a plurality of consecutive news video frames;
if the color map shows that a predetermined number of consecutive news video frames all contain a stable region of pixels, and those pixels all fall within the predetermined color range, labeling the current video segment as a candidate video segment;
for each candidate video segment, selecting one news video frame out of every N news video frames, and generating a color histogram for the stable region of each selected news video frame;
performing a first color histogram comparison, comparing the color histograms of each pair of consecutively selected news video frames;
when a first color histogram difference obtained from the first color histogram comparison is greater than a first threshold, performing a second color histogram comparison, comparing the stable regions of each pair of consecutive news video frames lying between the pair of consecutively selected news video frames whose color histogram difference is greater than the first threshold; and
when a second color histogram difference obtained from the second color histogram comparison is greater than a second threshold, indicating a shot change in the candidate video segment.
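The two-pass shot-change test that closes claims 1, 15, and 17 can be sketched as shown here. The following are hypothetical placeholders: each frame is represented only by the list of RGB pixels inside its stable region; the histogram uses 4 bins per channel; the histogram difference is the L1 distance; and N, both thresholds, and all function names are placeholders. Histograms are computed lazily, so the fine pass only touches frames between sampled pairs whose coarse difference exceeds the first threshold. That is where the saving over comparing every consecutive pair comes from.

```python
def color_histogram(pixels, bins=4):
    """Normalized RGB histogram with `bins` levels per channel."""
    step = 256 // bins
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + b // step] += 1
    return [v / len(pixels) for v in hist]


def hist_diff(h1, h2):
    """L1 distance between normalized histograms: 0 if identical, 2 if disjoint."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_changes(regions, n=5, t1=0.5, t2=0.5):
    """Return indices i such that a shot change lies between frames i-1 and i.

    regions[i] holds the stable-region pixels of frame i in one candidate segment.
    Only the Nth, 2Nth, ... frames are compared in the coarse pass.
    """
    cache = {}

    def hist(i):
        if i not in cache:
            cache[i] = color_histogram(regions[i])
        return cache[i]

    sampled = list(range(n - 1, len(regions), n))  # every Nth frame (cf. claim 2)
    changes = []
    for prev, cur in zip(sampled, sampled[1:]):
        if hist_diff(hist(prev), hist(cur)) <= t1:
            continue  # coarse pass: the sampled pair looks alike
        for i in range(prev + 1, cur + 1):  # fine pass over consecutive pairs
            if hist_diff(hist(i - 1), hist(i)) > t2:
                changes.append(i)
    return changes
```

This sketch does not examine a cut that falls before the first sampled frame or after the last one. A fuller version would also compare the segment boundaries.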
TW94126220A 2005-08-02 2005-08-02 Anchor person detection for television news segmentation based on audiovisual features TWI283375B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW94126220A TWI283375B (en) 2005-08-02 2005-08-02 Anchor person detection for television news segmentation based on audiovisual features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW94126220A TWI283375B (en) 2005-08-02 2005-08-02 Anchor person detection for television news segmentation based on audiovisual features

Publications (2)

Publication Number Publication Date
TW200707336A TW200707336A (en) 2007-02-16
TWI283375B true TWI283375B (en) 2007-07-01

Family

ID=39428161

Family Applications (1)

Application Number Title Priority Date Filing Date
TW94126220A TWI283375B (en) 2005-08-02 2005-08-02 Anchor person detection for television news segmentation based on audiovisual features

Country Status (1)

Country Link
TW (1) TWI283375B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9741345B2 (en) 2013-08-15 2017-08-22 Chunghwa Telecom Co., Ltd. Method for segmenting videos and audios into clips using speaker recognition

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI700925B (en) * 2018-01-04 2020-08-01 良知股份有限公司 Digital news film screening and notification methods
CN111866610B (en) * 2019-04-08 2022-09-30 百度时代网络技术(北京)有限公司 Method and apparatus for generating information

Also Published As

Publication number Publication date
TW200707336A (en) 2007-02-16

Similar Documents

Publication Publication Date Title
US7305128B2 (en) Anchor person detection for television news segmentation based on audiovisual features
RU2494566C2 (en) Display control device and method
EP0720114B1 (en) Method and apparatus for detecting and interpreting textual captions in digital video signals
JP4269473B2 (en) Method, computer storage medium and computer system for segmenting audio-visual recordings
US6546185B1 (en) System for searching a particular character in a motion picture
Ren et al. Fusion of intensity and inter-component chromatic difference for effective and robust colour edge detection
US20030068087A1 (en) System and method for generating a character thumbnail sequence
EP1081960A1 (en) Signal processing method and video/voice processing device
CN104134435B (en) Image processing equipment and image processing method
CN100589532C (en) Caption region extracting device and method
US20110007975A1 (en) Image Display Apparatus and Image Display Method
US20090067807A1 (en) Signal processing apparatus and method thereof
CN107430780A (en) The method created for the output based on video content characteristic
TW200536389A (en) Intelligent key-frame extraction from a video
WO2014000515A1 (en) Advertisement video detection method
US8630532B2 (en) Video processing apparatus and video processing method
CN109876416A (en) A kind of rope skipping method of counting based on image information
TWI283375B (en) Anchor person detection for television news segmentation based on audiovisual features
US8311269B2 (en) Blocker image identification apparatus and method
KR101471204B1 (en) Apparatus and method for detecting clothes in image
CN103124325A (en) Image processing device, image processing method, and recording medium
Wang et al. Robust image chroma-keying: a quadmap approach based on global sampling and local affinity
JP2007515891A (en) Image format conversion
CN101827224A (en) Detection method of anchor shot in news video
US8718366B2 (en) Moving text detection in video

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees