TWI283375B

TWI283375B - Anchor person detection for television news segmentation based on audiovisual features

Info

Publication number: TWI283375B
Application number: TW94126220A
Authority: TW
Inventors: Shih-Hung Lee; Chia-Hung Yeh; Hsuan-Huei Shih; Chung-Chieh Kuo
Original assignee: Mavs Lab Inc
Priority date: 2005-08-02
Filing date: 2005-08-02
Publication date: 2007-07-01
Also published as: TW200707336A

Abstract

A video segmentation method for segmenting video clips according to content of the video clips is disclosed. The method comprises scanning pixels of video frames with a first horizontal scan line to determine if colors of the pixels fall within a predetermined color range; creating a color map utilizing pixels located on the first horizontal scan line from a plurality of successive video frames; labeling the current video segment as a candidate video segment if the color map indicates the presence of a stable region of pixels falling within the predetermined color range for a predetermined number of successive video frames; and performing histogram color comparisons on the stable regions for detecting shot transitions. Audio signals of the video clips may also be analyzed to further verify the candidate video segments.

Description

IX. Description of the Invention:

[Technical Field]

The present invention relates to a video segmentation technique, and more particularly to a method for detecting a television news anchor and segmenting a television news program.

[Prior Art]

As the number of television news channels grows, more and more news information becomes available, and viewers find it increasingly difficult to search for and locate the news programs they want. A news program usually contains several different news stories, and the stories usually have little to do with one another. To make searching and classifying individual stories more convenient, shots of the television news anchor can be used to identify when each story begins and ends. Anchor shots are therefore the most important shots in every news story: the anchor usually gives an introduction at the start of each story, or comments on or summarizes the news content at the end of each story.
Anchor shots can therefore convey the main ideas of the news content effectively. Viewers can browse a news program by its anchor shots, and each news story can be identified by detecting the anchor.

Conventional news segmentation methods use machine learning techniques that classify the material automatically, but such known techniques are limited in how they represent data from different sources. Other methods use more complex algorithms together with speaker identification or, for example, face recognition, because who the anchor is and where the anchor sits in the frame are not known in advance. Known segmentation methods include head detection, mouth detection, classification or recognition of speech and music, closed-caption extraction and video optical character recognition (OCR), and model-based methods. All of these methods, however, rely on highly complex algorithms.

[Summary of the Invention]

One object of the present invention is to provide a method of scanning the video frames of a news program that solves the above problems. The method detects whether a television news anchor appears in a video frame by comparing pixel colors against a skin-color range.

According to an embodiment of the present invention, a video segmentation method is disclosed for segmenting video clips according to the content of the video clips.
The method comprises: receiving a video signal containing a plurality of video frames; analyzing the video frames of the video signal with a first horizontal scan line, wherein the first horizontal scan line selects at least one row of pixels for analysis; analyzing the pixels located on the first horizontal scan line in a video frame to determine whether the color of each pixel falls within a predetermined color range; indicating, in the video frame, the regions covered by adjacent pixels that fall within the predetermined color range; creating a color map from the pixels located on the first horizontal scan line in a plurality of successive video frames; if the color map shows that a predetermined number of successive video frames all contain a stable region of pixels, and those pixels all fall within the predetermined color range, labeling the current video segment as a candidate video segment; for each candidate video segment, selecting one video frame out of every N video frames and generating a color histogram for the stable region of each selected video frame; performing a first histogram comparison that compares the color histograms of each pair of successively selected video frames; when a first histogram difference obtained from the first histogram comparison exceeds a first threshold, performing a second histogram comparison that compares the stable regions of each pair of consecutive video frames lying between the pair of successively selected video frames whose histogram difference exceeds the first threshold; and when a second histogram difference obtained from the second histogram comparison exceeds a second threshold, indicating that a shot change occurs in the candidate video segment.

According to another embodiment of the present invention, a video segmentation method is also disclosed for segmenting video clips according to the content of the video clips.
The method comprises: receiving a video signal containing a plurality of video frames; receiving an audio signal associated with the received video signal; analyzing the video frames of the video signal with a first horizontal scan line and a second horizontal scan line, wherein the first horizontal scan line and the second horizontal scan line each select at least one row of pixels for analysis; if the color of a pixel located on the first or the second horizontal scan line in a video frame falls within a predetermined color range, setting that pixel to the logical value "1"; performing an OR operation on corresponding pixels located on the first and second scan lines to generate combined pixel data; using the combined pixel data to indicate the regions of the video frame covered by adjacent pixels that fall within the predetermined color range; creating a color map from the combined pixel data of a plurality of successive video frames; if the color map shows that a predetermined number of successive video frames all contain a stable region of pixels, and those pixels all fall within the predetermined color range, labeling the current video segment as a candidate video segment; for each candidate video segment, selecting one video frame out of every N video frames and generating a color histogram for the stable region of each selected video frame; performing a first histogram comparison that compares the color histograms of each pair of successively selected video frames; when a first histogram difference obtained from the first histogram comparison exceeds a first threshold, performing a second histogram comparison that compares the stable regions of each pair of consecutive video frames lying between the pair of successively selected video frames whose histogram difference exceeds the first threshold; when a second histogram difference obtained from the second histogram comparison exceeds a second threshold, indicating that a shot change occurs in the candidate video segment; and analyzing the audio signal to filter the candidate video segments, wherein the features of the audio signal are obtained by processing a plurality of fixed-size audio frames of the audio signal.

According to another embodiment of the present invention, a video segmentation method is also disclosed that segments a television news video clip by detecting the television news anchor in the clip.
The method comprises: receiving a video signal containing a plurality of news video frames; analyzing the news video frames of the video signal with a first horizontal scan line, wherein the first horizontal scan line selects at least one row of pixels for analysis; analyzing the pixels located on the first horizontal scan line in a news video frame to determine whether the color of each pixel falls within a predetermined color range, so as to detect the skin color of the television news anchor; indicating, in the news video frame, the regions covered by adjacent pixels that fall within the predetermined color range; creating a color map from the pixels located on the first horizontal scan line in a plurality of successive news video frames; if the color map shows that a predetermined number of successive news video frames all contain a stable region of pixels, and those pixels all fall within the predetermined color range, labeling the current video segment as a candidate video segment; for each candidate video segment, selecting one news video frame out of every N news video frames and generating a color histogram for the stable region of each selected news video frame; performing a first histogram comparison that compares the color histograms of each pair of successively selected news video frames; when a first histogram difference obtained from the first histogram comparison exceeds a first threshold, performing a second histogram comparison that compares the stable regions of each pair of consecutive news video frames lying between the pair of successively selected news video frames whose histogram difference exceeds the first threshold; and when a second histogram difference obtained from the second histogram comparison exceeds a second threshold, indicating that a shot change occurs in the candidate video segment.
The method proposed by the present invention uses a simple algorithm: it checks whether pixels within the skin-color range appear in the video frames and locates where news shots change. As a result, even frames that contain split screens can be segmented according to where the anchor appears. In short, the present invention provides a simple computational method for segmenting television news programs.

[Embodiments]

Please refer to Fig. 1, which is a block diagram of a television news segmentation system. System 10 segments television news by detecting the television news anchor. System 10 includes an image processing circuit 30, which generates candidate segments 40 of the news video. The audio information of candidate segments 40 is then analyzed further to confirm that the image analysis is correct.

Image processing circuit 30 includes a shot detection circuit 32, a face skin-color detection circuit 34, and a post-processing circuit 36. Face skin-color detection circuit 34 detects the pixels of a video frame that fall within a predetermined range, namely the skin-color range. Please refer to Fig. 2 and Fig. 3. Fig. 2 shows the use of a first horizontal scan line 102 and a second horizontal scan line 104 to detect whether a video frame 100 contains the face of a television news anchor. Fig. 3 is a flowchart of detecting the face of a television news anchor according to the present invention. Studies show that camera operators usually place the anchor's face about one third of the way down from the top of the frame. Face skin-color detection circuit 34 therefore uses first scan line 102, sometimes together with second horizontal scan line 104, to detect skin-colored pixels. Although detection requires only first horizontal scan line 102, additionally using second horizontal scan line 104 lets face skin-color detection circuit 34 produce more accurate results. For example, a horizontal scan line may pass through the anchor's eyes or mouth.
In that case, although the scan line still crosses the anchor's face, the detected color is not skin color, which leads to an inaccurate detection result. To make this less likely, and to provide more data for locating the anchor's face, two horizontal scan lines are used. Both first horizontal scan line 102 and second horizontal scan line 104 analyze at least one row of pixels in video frame 100, and they produce sampled pixel colors 112 and 114, respectively. Note that first horizontal scan line 102 and second horizontal scan line 104 are placed as close as possible to one third of the frame height, to increase the chance of scanning the anchor's face. The steps shown in Fig. 3 are described below.

Step 150: Start.
Step 152: Convert the color space of video frame 100 from the RGB color space to the Lab color space. The Lab color space is better suited to skin-color detection and is also more commonly used for it. The present invention may nevertheless use other color spaces, for example RGB, YCbCr, and IRgBy.
Step 154: Determine whether first horizontal scan line 102 (sometimes together with second horizontal scan line 104) scans any pixels in video frame 100 that fall within the skin-color range. The skin-color range can be adjusted for the local region or for the lighting conditions of the studio.
Step 156: Determine whether there is a sufficiently large and continuous skin-colored region, that is, whether there is a group of adjacent pixels whose number exceeds a predetermined value and which all fall within the skin-color range. If there is such a region, go to step 158; otherwise go to step 160.
Step 158: Set the current video segment as a candidate video segment. Because further video and audio analysis is performed on the segment later, the segment may subsequently cease to be a candidate.
Step 160: End.

Please refer to Fig. 4, which shows how a logical color map for detecting the television news anchor is obtained from the two scan lines. During a news broadcast the anchor's position is generally fixed, and this fact can be used to detect whether successive video frames contain skin-colored pixels at roughly the same position. The examples given here assume that a video segment contains 30 video frames per second. This playback rate is chosen only to make the proposed method easier to explain, and it should not be taken as a limitation of the invention. First horizontal scan line 102 and second horizontal scan line 104 are used to produce sampled pixel colors 112 and 114 over a plurality of video frames 210, for example 30 successive video frames. Once sampled pixel colors 112 and 114 have been produced, skin-color detection procedure 220 classifies every pixel: a pixel that falls within the skin-color range is represented by the logical value "1", and a pixel that does not is represented by the logical value "0". The results for sampled pixel colors 112 and 114 are shown as indicator arrays 222 and 224. An OR operation 226 is then performed on indicator arrays 222 and 224 to obtain result array 232. After each of the 30 successive video frames has been analyzed, the result arrays 232 are stored in a color map 230. Color block 240 is an illustrated example of color map 230. The 30 rows of color block 240 correspond to the 30 analyzed video frames; white blocks represent pixels within the skin-color range, and black blocks represent pixels outside it.
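
By way of illustration only, the construction of indicator arrays 222 and 224, OR operation 226, and the search for a stable region in the color map may be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the disclosed circuit: the function names, the caller-supplied `is_skin` predicate, and the rule that a column counts as stable only when it is "1" in every analyzed frame are all introduced here for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pixel = Tuple[float, float, float]  # one color triple, e.g. (L, a, b); assumed layout


def indicator_array(row: Sequence[Pixel],
                    is_skin: Callable[[Pixel], bool]) -> List[int]:
    """Map each sampled pixel to 1 (inside the color range) or 0 (outside)."""
    return [1 if is_skin(p) else 0 for p in row]


def result_array(row_a: Sequence[Pixel], row_b: Sequence[Pixel],
                 is_skin: Callable[[Pixel], bool]) -> List[int]:
    """OR the indicator arrays of the two scan lines, column by column."""
    a = indicator_array(row_a, is_skin)
    b = indicator_array(row_b, is_skin)
    return [x | y for x, y in zip(a, b)]


def stable_region(color_map: Sequence[Sequence[int]],
                  min_width: int) -> Optional[Tuple[int, int]]:
    """Return the widest run of columns that is 1 in every row.

    The run is a half-open (start, end) pair. None is returned when the
    widest run is narrower than min_width (hypothetical criterion).
    """
    if not color_map:
        return None
    width = len(color_map[0])
    stable = [all(row[c] for row in color_map) for c in range(width)]
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    for c, flag in enumerate(stable + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = c
        elif not flag and start is not None:
            if best is None or c - start > best[1] - best[0]:
                best = (start, c)
            start = None
    if best is not None and best[1] - best[0] >= min_width:
        return best
    return None
```

In use, one result array would be appended to the color map per analyzed frame, and `stable_region` would be called once the predetermined number of frames has been collected.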
When the pixels near region 245 of color block 240, that is, the skin-colored pixels running from about pixel 210 to pixel 330 from left to right, remain stable, the television news anchor is probably the subject of the video frames. Further analysis can be performed to verify this and obtain a more accurate result.

Once candidate video segments have been identified, shot detection circuit 32 helps identify when a video segment changes. For example, shot detection circuit 32 can analyze the color properties of the video frames to detect when a shot that stably shows the television news anchor switches to another shot. Please refer to Fig. 5, which is a schematic diagram of shot-change detection: a shot change is detected by comparing the local color histograms of image strips 315 and 325 of two video frames 312 and 322. To reduce computational complexity, shot detection circuit 32 first detects shot changes at a coarse scale and, once a change is found, narrows the search to a smaller range to find exactly where the change occurs. Fig. 5 shows two groups of video frames 310 and 320. In this example each group 310 or 320 contains 30 video frames, that is, one second of video. One video frame is selected from each of groups 310 and 320; for simplicity, the 30th frame is usually the one selected for comparison. Image strips 315 and 325 are selected from the two successively selected video frames 312 and 322.

Image strips 315 and 325 correspond to the position of region 245 in color block 240, which represents the stable skin-colored pixels. In other words, image strips 315 and 325 are located where the head of the television news anchor appears. In the first histogram comparison, the color histograms of image strips 315 and 325 in the two successively selected video frames 312 and 322 are compared. If the histogram difference obtained from the first comparison exceeds a first threshold, each pair of frames among the 30 intervening color frames is taken, and a second histogram comparison is performed on their corresponding image strips to find the exact video frame at which the shot change occurs. Because the comparison is confined to the region represented by image strips 315 and 325, the present invention can correctly handle video frames that contain split screens: only part of each frame is analyzed by histogram comparison.

After face skin-color detection circuit 34 and shot detection circuit 32 have generated candidates, post-processing circuit 36 optionally performs additional steps. For example, segments shorter than a predetermined length, such as less than one second or three seconds, can be removed, because such segments most likely contain no anchor shot.
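
By way of illustration only, the coarse-to-fine histogram comparison of Fig. 5 and the removal of short candidate segments may be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the disclosure: each image strip is reduced to a list of single-channel values in the range 0 to 255, the histogram difference is the sum of absolute bin differences, and the names `find_shot_change` and `drop_short_segments` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def histogram(strip: Sequence[int], bins: int = 8,
              max_value: int = 256) -> List[float]:
    """Normalized histogram of one image strip (single channel assumed)."""
    counts = [0] * bins
    for v in strip:
        counts[min(v * bins // max_value, bins - 1)] += 1
    total = len(strip) or 1
    return [c / total for c in counts]


def hist_diff(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences (one possible difference measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def find_shot_change(strips: Sequence[Sequence[int]], step: int,
                     coarse_thr: float, fine_thr: float) -> Optional[int]:
    """Return the index of the first frame of a new shot, or None.

    strips holds the stable-region strip of every frame. Every step-th
    frame is compared first; only when that coarse difference exceeds
    coarse_thr are the consecutive frames in between compared.
    """
    hists = [histogram(s) for s in strips]
    for prev in range(step - 1, len(hists) - step, step):
        nxt = prev + step
        if hist_diff(hists[prev], hists[nxt]) > coarse_thr:
            for i in range(prev, nxt):
                if hist_diff(hists[i], hists[i + 1]) > fine_thr:
                    return i + 1
    return None


def drop_short_segments(segments: Sequence[Tuple[int, int]], fps: float,
                        min_seconds: float) -> List[Tuple[int, int]]:
    """Keep only half-open (start, end) frame ranges lasting min_seconds or more."""
    return [(s, e) for s, e in segments if (e - s) / fps >= min_seconds]
```

With `step` set to 30 and a frame rate of 30 frames per second, the coarse pass compares one frame per second, which corresponds to the grouping shown in Fig. 5.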
In addition, for the same purpose, the percentage of video frames that contain a stable skin-colored region can also be calculated.

After image processing circuit 30 generates candidate segments 40, audio analysis can be performed to provide more information and ensure that anchor shots are detected more accurately. News footage can show many faces, for example shots of a crowd, and such footage can cause the anchor detection to produce false results. Footage of a report or an interview can likewise contain large, stable close-ups of a face. Without audio analysis, these shots would also be judged to be anchor shots. Audio data can also serve as the primary information for deciding candidate segments, rather than only as auxiliary information for the video data. If reliable audio processing techniques such as speech recognition are used, audio data alone can give high reliability.

Referring back to Fig. 1, the audio signal becomes extremely useful once statistics of the waveform have been generated. For this reason, non-overlapping sliding window circuit 12 divides the audio signal into separate 25-millisecond audio segments. The length may of course be longer or shorter; 25 milliseconds is only an example. Fast Fourier transform (FFT) circuit 14 then applies a fast Fourier transform to each audio window, and the result is passed to audio energy analysis circuit 20, which analyzes the energy of the audio samples. FFT circuit 14 converts the audio samples to the frequency domain, and the frequency response of the audio samples is then analyzed. Audio energy analysis circuit 20 includes circuits 22, 24, and 26. Circuit 22 calculates the energy of audio samples at frequencies below 13 kHz, and circuit 24 calculates the energy of audio samples at frequencies between 8 and 13 kHz.
Circuit 26 calculates the frequency centroid of the audio samples. The frequency centroid is an arithmetic mean taken over the whole spectrum and indicates the center of the frequency response. The outputs of circuits 22, 24, and 26 of audio energy analysis circuit 20 are subsequently combined with the output of image processing circuit 30, so that the image analysis and the audio analysis can be processed together.

A background energy level circuit 42 calculates the energy level of the background noise. Background energy level circuit 42 takes the average of the ten lowest local energies. The number need not be ten and may be larger or smaller, but averaging in this way gives a more accurate estimate of the background noise level of the audio data. All the energy-level information calculated by audio energy analysis circuit 20 and background energy level circuit 42 is then passed to ratio calculation circuit 50, which calculates various energy ratios used to characterize the received audio data. Circuit 52 calculates the ratio between the background energy level and the total energy level. Circuit 54 calculates the ratio between the average energy level of sound falling within a given kHz frequency band and the total energy level. Circuit 56 calculates the variance of the frequency centroid over the current candidate segment. Circuit 58 calculates the silence ratio, that is, the proportion of audio segments whose energy level is below the background energy level. After ratio calculation circuit 50 has calculated the ratios output by circuits 52, 54, 56, and 58, it compares the calculated ratios with a plurality of predetermined thresholds.
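
By way of illustration only, the audio features handled by circuits 20, 42, and 50 may be sketched in Python as follows. This is a rough sketch under stated assumptions, not the disclosed circuitry: a naive discrete Fourier transform stands in for the FFT so that the example needs only the standard library, the centroid is computed as a power-weighted mean frequency, the 8 to 13 kHz band is used for the band ratio, the "total energy level" is taken as the mean frame energy, and all function and key names are hypothetical.

```python
import cmath
import math
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple


def frame_features(samples: Sequence[float], sample_rate: float,
                   band: Tuple[float, float] = (8000.0, 13000.0)
                   ) -> Tuple[float, float, float]:
    """Return (total_energy, band_energy, centroid_hz) for one audio frame."""
    n = len(samples)
    total = band_energy = weighted = 0.0
    for k in range(n // 2 + 1):
        # Naive DFT bin; a practical implementation would use an FFT.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        power = abs(coeff) ** 2
        freq = k * sample_rate / n
        total += power
        weighted += freq * power
        if band[0] <= freq <= band[1]:
            band_energy += power
    centroid = weighted / total if total > 0 else 0.0
    return total, band_energy, centroid


def segment_features(frames: Sequence[Sequence[float]], sample_rate: float,
                     lowest_k: int = 10) -> Dict[str, float]:
    """Combine per-frame values into the four segment-level quantities."""
    if not frames:
        raise ValueError("at least one audio frame is required")
    feats: List[Tuple[float, float, float]] = [
        frame_features(f, sample_rate) for f in frames]
    energies = [e for e, _, _ in feats]
    total = sum(energies)
    # Background level: average of the lowest_k frame energies.
    background = mean(sorted(energies)[:lowest_k])
    return {
        "background_ratio": background / mean(energies) if total > 0 else 0.0,
        "band_ratio": sum(b for _, b, _ in feats) / total if total > 0 else 0.0,
        "centroid_variance": pvariance([c for _, _, c in feats]),
        "silence_ratio": sum(e < background for e in energies) / len(energies),
    }
```

The four returned values would then be compared against predetermined thresholds, which the sketch leaves to the caller.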
Candidate news segments whose audio features do not fall within these thresholds are filtered out. The remaining segments are output from ratio calculation circuit 50 and are regarded as shots containing the television news anchor.

In summary, the present invention determines whether a news video frame contains the anchor by checking whether the pixels at the position corresponding to the anchor's head fall within the skin-color range and remain stable. Furthermore, by comparing color histograms, the invention can quickly determine when the anchor no longer appears in the news frames. Audio analysis is then performed to narrow the number of candidate segments further.

Compared with other news segmentation methods, the present invention has many advantages. For example, the anchor detection remains effective even when a video frame contains two or more split screens. The invention can perform the analysis with one horizontal scan line, which gives lower computational complexity but less accurate results, or with two horizontal scan lines, which gives higher computational complexity but more accurate results. The proposed method also applies to frames in which one or more anchors appear, and to multi-angle shots. Performing the pixel measurement and comparison in the Lab color space helps ensure that the skin-color range is detected effectively, but the Lab color space is not essential to the invention. The thresholds can also be adjusted for different skin colors or different makeup. In short, the present invention provides a simple computational method for segmenting television news programs.

The above describes only preferred embodiments of the present invention. All equivalent changes and modifications made in accordance with the claims of the present invention fall within the scope of the present invention.
[Brief Description of the Drawings]

Fig. 1 is a block diagram of a television news segmentation system.
Fig. 2 shows the use of a first horizontal scan line and a second horizontal scan line to detect whether a video frame contains the face of a television news anchor.
Fig. 3 is a flowchart of detecting the face of a television news anchor according to the present invention.
Fig. 4 shows how a logical color map for detecting the television news anchor is obtained from the two scan lines.
Fig. 5 is a schematic diagram of shot-change detection, in which a shot change is detected by comparing the local color histograms of the image strips of two video frames.

[Description of Main Reference Numerals]

10 system
12 non-overlapping sliding window circuit
14 fast Fourier transform circuit
20 audio energy analysis circuit
22, 24 circuits for calculating the energy of audio samples
26 circuit for calculating the frequency centroid of audio samples
30 image processing circuit
32 shot detection circuit
34 face skin-color detection circuit
36 post-processing circuit
40 candidate segments of news video
42 background energy level circuit
50 ratio calculation circuit
52, 54 audio energy-level ratio calculation circuits
56 circuit for calculating the variance of the frequency centroid
58 silence ratio calculation circuit
100, 210 video frames
102 first horizontal scan line
104 second horizontal scan line
112, 114 sampled pixel colors
220 skin-color detection procedure
222, 224 indicator arrays
226 OR operation
230 color map
232 result array
240 color block
245 stable skin-colored region
310, 320 groups of video frames
312, 322 video frames
315, 325 image strips

Claims

X. Claims:

1. A video segmentation method for segmenting video clips according to the content of the video clips, the method comprising: receiving a video signal comprising a plurality of video frames; analyzing the video frames of the video signal with a first horizontal scan line, wherein the first horizontal scan line selects at least one row of pixels for analysis; analyzing the pixels located on the first horizontal scan line in each video frame to determine whether the color of each pixel falls within a predetermined color range; indicating, in the video frame, the region covered by adjacent pixels that fall within the predetermined color range; generating a color map using the pixels located on the first horizontal scan line in a plurality of successive video frames; labeling the current video segment as a candidate video segment if the color map shows that a predetermined number of successive video frames contain a stable region of pixels, all of which fall within the predetermined color range; for each candidate video segment, selecting one video frame out of every N video frames and generating a color histogram for the stable region of each selected video frame; performing a first histogram comparison that compares the color histograms of each pair of successively selected video frames; when a first histogram difference obtained from the first histogram comparison is greater than a first threshold, performing a second histogram comparison that compares the stable regions of each pair of successive video frames lying between the pair of successively selected video frames whose histogram difference is greater than the first threshold; and when a second histogram difference obtained from the second histogram comparison is greater than a second threshold, indicating that a shot change exists in the candidate video segment.

2. The method of claim 1, wherein selecting one video frame out of every N video frames comprises selecting every Nth video frame.

3. The method of claim 4, wherein the first horizontal scan line is located about one third of the way down from the top of the video frame.

4. The method of claim 1, further comprising performing an RGB-to-Lab color space conversion before analyzing the pixels located on the first horizontal scan line in the video frame to determine whether the color of each pixel falls within the predetermined color range.

5. The method of claim 1, wherein the predetermined number of successive video frames make up three seconds of video.

6. The method of claim 1, further comprising: analyzing the video frames of the video signal with a second horizontal scan line, wherein the second horizontal scan line selects at least one row of pixels for analysis; setting a pixel to the logical value "1" if the color of the pixel located on the first and second horizontal scan lines in the video frame falls within the predetermined color range; performing an OR operation on corresponding pixels located on the first and second horizontal scan lines to generate combined pixel data; and using the combined pixel data to indicate the region of adjacent pixels of the video frame that fall within the predetermined color range, and using the combined pixel data in the plurality of successive video frames to generate the color map.

7. The method of claim 6, wherein the first and second horizontal scan lines are located about one third of the way down from the top of the video frame.

8. The method of claim 1, further comprising removing candidate video segments whose length is less than a predetermined time.

9. The method of claim 1, further comprising: receiving an audio signal associated with the received video signal; and analyzing the audio signal to filter the candidate video segments, wherein the audio signal is processed in audio frames of a predetermined size.

10. The method of claim 9, further comprising converting the audio samples to the frequency domain to analyze the frequency response of the audio frames, and calculating the overall sound energy level of the audio frames.

11. The method of claim 1, further comprising: calculating a background sound energy level of the audio frames; comparing the background sound energy level with the overall sound energy level; and eliminating the candidate video segment if the result of the comparison does not fall within a first specific range.

12. The method of claim 2, further comprising: calculating a ratio of the audio frames whose sound energy level is lower than the background sound energy level to all audio frames; and eliminating the candidate video segment if the ratio does not fall within a second specific range.

13. The method of claim 10, further comprising: calculating an average sound energy of the audio frames whose frequency falls within 8-13 kHz; calculating the ratio of that average sound energy to the overall sound energy level; and eliminating the candidate video segment if the ratio does not fall within a specific range.

14. The method of claim 1, further comprising: calculating the frequency centroid variance of the current candidate video segment; and eliminating the candidate video segment if the frequency centroid variance does not fall within a specific range.

15. A video segmentation method for segmenting video clips according to the content of the video clips, the method comprising: receiving a video signal comprising a plurality of video frames; receiving an audio signal associated with the received video signal; analyzing the video frames of the video signal with a first horizontal scan line and a second horizontal scan line, wherein the first horizontal scan line and the second horizontal scan line each select at least one row of pixels for analysis; setting a pixel to the logical value "1" if the color of the pixel located on the first and second horizontal scan lines in the video frame falls within a predetermined color range; performing an OR logic operation on corresponding pixels located on the first and second horizontal scan lines to generate combined pixel data; using the combined pixel data to indicate the region of adjacent pixels of the video frame that fall within the predetermined color range; generating a color map using the combined pixel data in a plurality of successive video frames; labeling the current video segment as a candidate video segment if the color map shows that a predetermined number of successive video frames all contain a stable region of pixels, all of which fall within the predetermined color range; for each candidate video segment, selecting one video frame out of every N video frames and generating a color histogram for the stable region of each selected video frame; performing a first histogram comparison that compares the color histograms of each pair of successively selected video frames; when a first histogram difference obtained from the first histogram comparison is greater than a first threshold, performing a second histogram comparison that compares the stable regions of each pair of successive video frames lying between the pair of successively selected video frames whose histogram difference is greater than the first threshold; when a second histogram difference obtained from the second histogram comparison is greater than a second threshold, indicating that a shot change exists in the candidate video segment; and analyzing the audio signal to filter the candidate video segments.

16. The method of claim 15, wherein the first and second horizontal scan lines are located about one third of the way down from the top of the video frame.

17. A method for segmenting television news video clips according to the content of the television news video clips, the method comprising: receiving a video signal comprising a plurality of news video frames; analyzing the news video frames of the video signal with a first horizontal scan line; analyzing the pixels located on the first horizontal scan line in each news video frame to determine whether the color of each pixel falls within a predetermined color range, the predetermined color range depending on the skin color of a television news anchor; indicating, in the news video frame, the region covered by adjacent pixels that fall within the predetermined color range; generating a color map using the pixels located on the first horizontal scan line in a plurality of successive news video frames; labeling the current video segment as a candidate video segment if the color map shows that a predetermined number of successive news video frames contain a stable region of pixels, all of which fall within the predetermined color range; for each candidate video segment, selecting one news video frame out of every N news video frames and generating a color histogram for the stable region of each selected news video frame; performing a first histogram comparison that compares the color histograms of each pair of successively selected news video frames; when a first histogram difference obtained from the first histogram comparison is greater than a first threshold, performing a second histogram comparison that compares the stable regions of each pair of successive news video frames lying between the pair of successively selected news video frames whose histogram difference is greater than the first threshold; and when a second histogram difference obtained from the second histogram comparison is greater than a second threshold, indicating that a shot change exists in the candidate video segment.
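As an illustration of the audio filtering recited in claims 9 to 14, the following sketch computes a frequency centroid per audio frame, its variance over a segment, and the share of energy in a frequency band. It is an assumption-laden example rather than the claimed circuitry: the naive DFT, the function names, and the acceptance range passed to `keep_segment` are all hypothetical, and the claims do not state numeric thresholds.

```python
import cmath
import math
from statistics import pvariance
from typing import List, Sequence, Tuple


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power at each non-negative frequency bin, via a naive DFT (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]


def frequency_centroid(frame: Sequence[float], sample_rate: float) -> float:
    """Power-weighted mean frequency of one audio frame, in Hz."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0:
        return 0.0
    n = len(frame)
    return sum(k * sample_rate / n * p for k, p in enumerate(power)) / total


def band_energy_ratio(frame: Sequence[float], sample_rate: float, lo: float, hi: float) -> float:
    """Fraction of the frame's energy whose frequency lies in [lo, hi] Hz."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0:
        return 0.0
    n = len(frame)
    return sum(p for k, p in enumerate(power) if lo <= k * sample_rate / n <= hi) / total


def keep_segment(frames: Sequence[Sequence[float]], sample_rate: float,
                 centroid_var_range: Tuple[float, float]) -> bool:
    """Keep a candidate segment only if its centroid variance lies in the given range."""
    centroids = [frequency_centroid(f, sample_rate) for f in frames]
    low, high = centroid_var_range
    return low <= pvariance(centroids) <= high
```

In this sketch each element of `frames` is one fixed-size, non-overlapping block of samples, matching the predetermined-size audio frames of claim 9. A practical system would replace the naive DFT with a fast Fourier transform.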
