TW201504955A

TW201504955A - Method of detecting news anchorperson shot using face recognition

Info

Publication number: TW201504955A
Application number: TW102126632A
Authority: TW
Inventors: Li-Wu Cai; Ji-Yi Liu; He-Can Zheng
Original assignee: Chunghwa Telecom Co Ltd
Priority date: 2013-07-25
Filing date: 2013-07-25
Publication date: 2015-02-01
Also published as: TWI520077B

Abstract

Provided is a method of detecting news anchorperson shots using face recognition, which recognizes the anchorperson's face to detect each anchorperson shot in a news program. The method comprises two major steps: (1) anchorperson face learning: from a segment at the beginning of the news program, a face model of the anchorperson is trained by unsupervised learning, together with a criterion for determining whether a face image is the anchorperson's face; (2) anchorperson shot detection: after step (1) is finished, the anchorperson face model and the anchorperson face criterion are used to determine when the anchorperson's face appears in the news program, so that each anchorperson shot and its starting playback time are detected. This information can be used to segment a news program into individual reports.

Description

Method for detecting news anchor shots using face recognition

The present invention relates to a method for detecting anchor shots in a news program, and in particular to a news anchor shot detection technique that combines image processing with visual content analysis. The method treats the anchor's face as the key feature of an anchor shot, so face recognition is needed to identify the faces that appear in the news. In addition, the invention incorporates unsupervised learning so that the method can run fully automatically.

An anchor shot is a distinctive scene in a news program. In general, a news program consists of dozens of news reports, and each report consists of an anchor shot followed immediately by the reporter's field footage.

The basic idea of anchor shot detection is to detect the salient features and properties of anchor shots. Common methods fall roughly into two types. The first is scene change detection, which finds the transition points between anchor shots and field footage. It rests on the observation that an anchor shot has a special composition, with the anchor in a fixed position and a background that changes little, and is therefore more stable than field footage shot in varied settings. When the news moves from a changing scene to a stable one, an anchor shot has very likely begun, so the span between a changing-to-stable transition and the following stable-to-changing transition is taken as an anchor shot. The second is scene clustering, which analyzes a complete news program, clusters similar shots together, and then filters the anchor shots out of the clusters. It rests on the observation that, because anchor shots have a fixed composition, anchor shots belonging to different reports remain highly similar and are grouped together by the clustering process. Since anchor shots also recur at a fairly regular frequency, the cluster that actually consists of anchor shots can then be identified.

In recent years, however, advances in technology have exposed new shortcomings in both types of method. First, today's news visuals are flashier and more varied than before. Traditional methods assume that the anchor stays in the same position throughout the anchor shots and that the background behind the anchor is fixed or changes only slightly. In current news programs, however, the anchor's position changes within a single program, and clips of field footage and report-related animation effects have replaced the static studio set as part of the anchor shot background. The accuracy of methods that find anchor shots by measuring how much the background changes therefore drops sharply. For example, the earlier Republic of China (Taiwan) patent M386559 and US patent 7720281B2 use skin color detection to exclude the regions of a news frame that may be the anchor's face, and then apply scene change detection to the remaining background to find anchor shots. Moreover, the skin color must be defined in advance, yet skin color can vary with the source of the news signal, which undermines the stability of the method's performance.

In addition, traditional scene clustering is unsuitable for live news streaming because it must analyze the shot distribution of a complete news program. Smart TVs, which are becoming the new generation of television products, differ from earlier sets in offering network connectivity and human-machine interaction. Viewers are no longer just recipients of information and can interact with the television in real time. Providing viewers with real-time interaction with news content requires technology that analyzes the news in real time and without human intervention. Techniques that cannot begin analysis until the whole program has finished playing do not meet this need.

The conventional approaches described above thus still have many shortcomings; they are not sound designs and are in urgent need of improvement.

In view of the shortcomings of the conventional approaches described above, the inventors set out to improve on them and, after years of dedicated research, succeeded in developing the present method for detecting news anchor shots using face recognition.

An object of the present invention is to provide a method for detecting anchor shots in news programs that can analyze a live news stream and, in step with playback, detect each anchor shot and its starting playback time.

A second object of the present invention is to provide a method for detecting news anchor shots without analyzing the whole frame, in response to the trend that the stability of anchor shots is becoming less and less pronounced.

A further object of the present invention is to provide an unsupervised learning method for a news anchor face model that uses a segment at the beginning of a news program to train the program's anchor face model automatically and in real time, without human intervention.

The method for detecting news anchor shots using face recognition that achieves the above objects uses anchor face recognition to detect every anchor shot in the news. Because the anchor's face is a very stable key feature of an anchor shot and is unaffected by the background, the appearance of an anchor shot can be detected quickly from the anchor's face. To determine whether a news frame contains the anchor's face, face detection first finds all face images in the frame, and each face image is then checked to see whether it is the anchor's face. This recognition step requires a face model of the news anchor: if the difference between a face image and the anchor face model falls within the allowed error range, the face is deemed to be the anchor's. In the proposed method, the anchor face model is trained automatically and in real time, by unsupervised learning, from a segment at the beginning of the news program. Thereafter, as the news plays, whenever several consecutive frames all contain the anchor's face, an anchor shot may have appeared.

The proposed method comprises two major steps. (1) Anchor face learning: all face images in a segment at the beginning of the news program are first captured and analyzed to filter out the most likely anchor face images, which are used to train the anchor face model for this news program. Once the model has been obtained, all face images in the next segment of the news are captured, and the differences between these face images and the anchor face model are analyzed to derive the criterion for deciding whether a face image is the news anchor's face. (2) Anchor shot detection: after step (1) is complete, the anchor face model and the anchor face criterion are used to recognize when frames containing the anchor's face appear in the news being played. Whenever frames containing the anchor's face appear frequently within a period of time, that period contains an anchor shot, and the starting playback time of the anchor shot is estimated.

A method for detecting news anchor shots using face recognition comprises the steps of: a. anchor face learning, which uses a segment at the beginning of a news program to train, in real time and by unsupervised learning, the anchor face model and anchor face criterion of the news program; and b. anchor shot detection, which uses the anchor face model and anchor face criterion obtained in step a to recognize the anchor each time the anchor appears in the news, so as to detect the anchor shots and their starting playback times in real time.

The anchor face learning comprises the steps of: c. anchor face model training, which uses the face images in one segment of the news program and automatically filters out the anchor's face images to train the anchor face model of the news program; and d. anchor face criterion evaluation, which uses the anchor face model obtained in step c and the face images in another segment of the news program to evaluate the criterion by which a face image is judged to be the anchor's face.

The anchor face model training comprises the steps of: sequentially capturing frames from the news video signal through a news video receiving component; extracting all face images from a frame through a face detection component using face detection; and training the anchor face model through an anchor face model training component, using the plurality of face images obtained from the face detection component.

The anchor face criterion evaluation comprises the steps of: sequentially capturing frames from the news video signal through a news video receiving component; extracting all face images from a frame through a face detection component using face detection; and evaluating the anchor face criterion through an anchor face criterion evaluation component, using the anchor face model and the plurality of face images obtained from the face detection component.

The anchor shot detection comprises the steps of: sequentially capturing frames from the news video signal through a news video receiving component; extracting all face images from a frame through a face detection component using face detection; determining, through an anchor frame recognition component using anchor face recognition, whether a frame contains a face image of the news program's anchor, that is, whether it is an anchor frame; and detecting, through an anchor shot detection component, the anchor shots and their starting playback times from the anchor frames.

The anchor frame recognition component performs processing steps comprising: e. extracting the image feature values of all face images in a frame; f. computing, from the image feature values, the dissimilarity value between each face image and the anchor face model, and then using the anchor face criterion to compute the standard score of each dissimilarity value; and g. determining from the standard scores whether any face image is the anchor's face, and accordingly marking a frame that contains the anchor's face as an anchor frame.
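
Steps e to g can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the Euclidean distance is used as the dissimilarity value, as in step S12 of the embodiment, while the decision rule (accept a face when its standard score is at most `z_max`) and the default `z_max = 2.0` are assumptions, because the text does not state how the standard score is thresholded. The names `is_anchor_frame`, `crit_mean`, and `crit_std` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def is_anchor_frame(face_features, model, crit_mean, crit_std, z_max=2.0):
    """Return True if any face in the frame is accepted as the anchor's face.

    face_features: feature vectors of all faces detected in one frame (step e).
    model: feature vector of the anchor face model.
    crit_mean, crit_std: the anchor face criterion (mean and standard deviation
        of anchor-face dissimilarity values); crit_std is assumed to be > 0.
    z_max: ASSUMED acceptance threshold on the standard score.
    """
    for feat in face_features:
        dissimilarity = euclidean(feat, model)          # step f: dissimilarity
        z = (dissimilarity - crit_mean) / crit_std      # step f: standard score
        if z <= z_max:                                  # step g (assumed rule)
            return True
    return False
```

A frame with no detected faces is reported as a non-anchor frame, since the loop body never runs.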

The anchor shot detection component performs processing steps comprising: h. storing the frame type of each received frame in a variable that holds the frame types of the past $N_D$ frames, including the current frame; i. counting the number $R$ of anchor frames in the variable; j. checking whether the condition $R > N_D/2$ holds; k. when the condition in j holds, an anchor shot has been detected, so the current anchor shot detection state is set to the anchor state, and otherwise it is set to the non-anchor state; and l. when the condition in j holds and the anchor shot detection state has changed from the non-anchor state to the anchor state, outputting the playback time of the earliest anchor frame in the variable as the starting playback time of this anchor shot.

The anchor shot detection state refers to the two states used during anchor shot detection: the anchor state, in which the news is currently in an anchor shot; and the non-anchor state, in which the news is currently not in an anchor shot.

The initial anchor shot detection state is the non-anchor state.
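
The sliding-window logic of steps h to l, together with the two detection states, can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `AnchorShotDetector`, the method `push`, and the use of one timestamp per frame are assumptions made for the example, and the window size is left to the caller because the text gives no value for N_D.

```python
from collections import deque


class AnchorShotDetector:
    """Majority vote over the frame types of the last n_d frames."""

    def __init__(self, n_d):
        self.n_d = n_d
        # Step h: the "variable" holding the last n_d (frame type, time) pairs.
        self.window = deque(maxlen=n_d)
        # The initial detection state is the non-anchor state.
        self.in_anchor_state = False

    def push(self, is_anchor_frame, timestamp):
        """Feed one frame.

        Returns the starting playback time when the state changes from
        non-anchor to anchor, and None for every other frame.
        """
        self.window.append((is_anchor_frame, timestamp))        # step h
        r = sum(1 for flag, _ in self.window if flag)            # step i
        detected = r > self.n_d / 2                              # step j
        start = None
        if detected and not self.in_anchor_state:                # step l
            # Earliest anchor frame still held in the window.
            start = next(t for flag, t in self.window if flag)
        self.in_anchor_state = detected                          # step k
        return start


detector = AnchorShotDetector(n_d=5)
frame_types = [False, False, True, True, True, True, False, False, False]
for t, flag in enumerate(frame_types):
    start_time = detector.push(flag, t)
    if start_time is not None:
        print("anchor shot starts at frame time", start_time)
```

Because the vote is taken over a window of frames, an isolated misclassified frame inside a long run of anchor frames does not flip the state.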

The anchor face model training component performs processing steps comprising: m. extracting the image feature values of each of the plurality of face images; n. dividing all face images into three groups by a clustering method according to the similarity of their image feature values; o. for each face image group, computing the group's clustering degree value, defined as the number of face images in the group divided by the standard deviation of the differences between the image feature values in the group and the group's mean image feature value; and p. selecting the face image group with the largest clustering degree value and using its face images to train the anchor face model.
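
Step m does not name the image feature at this point; the embodiment (step S7) uses the local binary pattern (LBP). The sketch below shows one common basic form of LBP, offered only as an illustration: the 3x3 neighbourhood, the bit ordering, and the normalized 256-bin histogram are assumptions, since the text does not specify which LBP variant is used. The function name `lbp_histogram` is hypothetical, and the input is assumed to be a grayscale image given as a list of rows.

```python
def lbp_histogram(gray):
    """Return a normalized 256-bin histogram of basic 3x3 LBP codes.

    gray: grayscale image as a list of equal-length rows of numbers.
    Border pixels are skipped because they lack a full neighbourhood.
    """
    height, width = len(gray), len(gray[0])
    # Eight neighbours, clockwise from the top-left (ASSUMED bit order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbour at least as bright as the center sets its bit.
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist) or 1  # avoid division by zero for tiny images
    return [count / total for count in hist]
```

The resulting fixed-length vector can be compared with the Euclidean distance that the embodiment uses for similarity.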

The anchor face criterion evaluation component performs processing steps comprising: q. extracting the image feature values of each of the plurality of face images; r. computing, from the image feature values, the dissimilarity value between each face image and the anchor face model; s. dividing all dissimilarity values into two groups by a clustering method and computing the mean dissimilarity value of each group; and t. selecting the group with the lower mean dissimilarity value and evaluating from it the anchor face criterion, which comprises the mean and the standard deviation of the dissimilarity values.

Compared with conventional techniques, the method for detecting news anchor shots using face recognition disclosed in the present invention has the following advantages:

1. The anchor shot detection method of the present invention detects anchor shots in step with news playback, so it can support real-time interaction with the news, such as fast-forwarding and rewinding by individual report.

2. The anchor shot detection method of the present invention uses only the anchor's face as the key feature for finding anchor shots and does not need to analyze the news background, so it can cope with increasingly complex news visuals.

3. The anchor shot detection method of the present invention confirms the anchor's appearance by face recognition, so its performance is more stable than detecting the anchor from low-level image features such as color alone.

4. The anchor face learning method of the present invention trains the anchor face model of a news program automatically and in real time by unsupervised learning; because no human intervention is required, the method is well suited to a system operating in real time.

5. The anchor face learning method of the present invention trains the anchor face model of a news program in real time, which ensures that the model is consistent with the current anchor's face.

The detailed description above is a specific description of one feasible embodiment of the present invention. The embodiment is not intended to limit the patent scope of the invention; any equivalent implementation or modification that does not depart from the spirit of the invention shall fall within the patent scope of this case.

In summary, this case is not only innovative in form but also improves on conventional articles in the respects described above, and should fully satisfy the statutory requirements of novelty and inventive step for an invention patent. The application is therefore filed in accordance with the law, and the Office is respectfully requested to approve this invention patent application so as to encourage invention.

11‧‧‧Anchor face learning

12‧‧‧Anchor shot detection

21‧‧‧Anchor face model training

211‧‧‧News video receiving component

212‧‧‧Face detection component

213‧‧‧Anchor face model training component

22‧‧‧Anchor face criterion evaluation

221‧‧‧News video receiving component

222‧‧‧Face detection component

223‧‧‧Anchor face criterion evaluation component

51‧‧‧News video receiving component

52‧‧‧Face detection component

53‧‧‧Anchor frame recognition component

54‧‧‧Anchor shot detection component

S1~S24‧‧‧Process steps

FIG. 1 is a flowchart of the method of the present invention for detecting anchor shots using face recognition.

FIG. 2 is a flowchart of anchor face learning in the present invention.

FIG. 3 is a flowchart of training in the anchor face model training component of the present invention.

FIG. 4 is a flowchart of evaluation in the anchor face criterion evaluation component of the present invention.

FIG. 5 is a flowchart of anchor shot detection in the present invention.

FIG. 6 is a flowchart of recognition in the anchor frame recognition component of the present invention.

FIG. 7 is a flowchart of detection in the anchor shot detection component of the present invention.

FIG. 8 is a state transition diagram of anchor shot detection in the present invention.

To help the examiner understand the technical features, content, advantages, and achievable effects of the present invention, the invention is described in detail below with reference to the accompanying drawings and by way of embodiments. The drawings are intended only for illustration and to support the specification; they do not necessarily reflect the true proportions or precise configuration of the invention as implemented, and the proportions and layout of the drawings should therefore not be used to interpret or limit the scope of the invention in actual practice.

Referring to FIG. 1, which is a flowchart of the method of the present invention for detecting anchor shots using face recognition, the method comprises two major steps: anchor face learning 11 and anchor shot detection 12.

S1: First, the news program starts playing;

S2: Anchor face learning 11 uses a segment at the beginning of the news program to train the anchor face model and anchor face criterion of the news program automatically and in real time by unsupervised learning;

S3: After step S2 is complete, anchor shot detection 12 uses the anchor face model and anchor face criterion obtained from anchor face learning 11 to detect, in real time as the news plays, each anchor shot that appears and its starting playback time;

S4: The news program finishes playing and detection stops.

Referring to FIG. 2, which is a flowchart of anchor face learning in the method of the present invention, anchor face learning comprises two steps:

S5: First, the news video receiving component 211 in anchor face model training 21 sequentially captures frames from the news being played and sends them to the face detection component 212, which extracts all face images in each frame. The extracted face images are passed to the anchor face model training component 213. Once the anchor face model training component 213 has obtained $N_{T1}$ face images, the news video receiving component 211 stops capturing frames, and the anchor face model training component 213 trains the anchor face model from the $N_{T1}$ face images in real time by unsupervised learning. Here $N_{T1}$ is the number of face images used to train the anchor face model, defined as in equation (1):

$$N_{T1} = s_1 \times t_1 \tag{1}$$

where $s_1$ is the number of frames the news video receiving component 211 captures from each second of news, and $t_1$ is the number of seconds of news from which face images are needed. Completing anchor face model training 21 yields an anchor face model;

S6: The news video receiving component 221 in anchor face criterion evaluation 22 sequentially captures the frames of the news played next and sends them to the face detection component 222, which extracts all face images in each frame. The extracted face images are passed to the anchor face criterion evaluation component 223. Once the anchor face criterion evaluation component 223 has obtained $N_{T2}$ face images, the news video receiving component 221 stops capturing frames, and the anchor face criterion evaluation component 223 uses the anchor face model from S5 and the $N_{T2}$ face images to evaluate the anchor face criterion automatically. Here $N_{T2}$ is the number of face images used to evaluate the anchor face criterion, defined as in equation (2):

$$N_{T2} = s_2 \times t_2 \tag{2}$$

where $s_2$ is the number of frames the news video receiving component 221 captures from each second of news video, and $t_2$ is the number of seconds of news from which face images are needed. Completing anchor face criterion evaluation 22 yields a set of anchor face criteria.
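
The image budgets in equations (1) and (2) can be illustrated with a short sketch. The sampling rates and durations below are made-up values, and the function name `split_learning_segments` is hypothetical; the text gives no concrete numbers.

```python
from itertools import islice


def split_learning_segments(face_images, s1, t1, s2, t2):
    """Split a stream of face images into the two learning segments.

    Returns (training images, evaluation images, remaining stream), where the
    first segment holds N_T1 = s1 * t1 images and the second holds
    N_T2 = s2 * t2 images.
    """
    n_t1 = s1 * t1                              # equation (1)
    n_t2 = s2 * t2                              # equation (2)
    stream = iter(face_images)
    training = list(islice(stream, n_t1))       # used in step S5
    evaluation = list(islice(stream, n_t2))     # used in step S6
    return training, evaluation, stream         # the rest feeds detection


# Hypothetical values: 2 images per second, 3 s of training, 2 s of evaluation.
training, evaluation, rest = split_learning_segments(range(15), 2, 3, 2, 2)
print(len(training), len(evaluation))
```

Using a single iterator keeps the two segments consecutive and non-overlapping, matching the order in which S5 and S6 consume the news.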

Referring to FIG. 3, which is a flowchart of training in the anchor face model training component of the method of the present invention:

S7: After the $N_{T1}$ face images have been obtained, the image feature values of each face image are first extracted, namely the local binary pattern (LBP);

S8: Based on the similarity of the image feature values of the face images, the k-means clustering algorithm divides all face images into three groups. The similarity of image feature values is defined by the pairwise Euclidean distance between feature values; the smaller the distance, the higher the similarity;

S9: Next, for each face image group, the clustering degree value $C(m)$ of the group is computed, defined as in equation (3):

$$C(m) = \frac{N_m}{\mathrm{std}_m} \tag{3}$$

where $m = 1, 2, 3$, $N_m$ is the number of face images in the $m$-th face image group, and $\mathrm{std}_m$ is the dispersion of the image feature values in the $m$-th face image group, defined as in equation (4):

$$\mathrm{std}_m = \sqrt{\frac{1}{N_m}\sum_{i=1}^{N_m}\left(d_{mi} - D_m\right)^2} \tag{4}$$

where $d_{mi}$ is the Euclidean distance between the image feature value of the $i$-th face image in the $m$-th face image group and the group's mean image feature value, and $D_m$ is the mean of all $d_{mi}$ in the $m$-th face image group. The value $X_{m,j}$ of the $j$-th dimension of the group's mean image feature value is defined as in equation (5):

$$X_{m,j} = \frac{1}{N_m}\sum_{i=1}^{N_m} x_{mi,j} \tag{5}$$

where $x_{mi,j}$ is the value of the $j$-th dimension of the image feature value of the $i$-th face image in the $m$-th face image group. Because the anchor's face images appear frequently in a news program and are highly similar to one another, the larger a face image group's $C(m)$, the more likely the group consists of the anchor's face;

S10: Finally, the face image group with the largest clustering degree value is selected from the three groups, and its face images are used to train the anchor face model of the news program. In this embodiment, the image feature value of the anchor face model is also LBP.
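
Steps S8 to S10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the farthest-point initialization of k-means, the population form of the standard deviation, the exclusion of groups with fewer than three images (the distances of two points to their own centroid are always equal, so their dispersion is zero), and the use of the group mean as the "trained" model are all assumptions, since the text does not specify them. The function names are hypothetical.

```python
import math
from statistics import pstdev


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(points):
    """Per-dimension mean of a non-empty list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            groups[nearest].append(p)
        updated = [mean_vector(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return groups


def clustering_degree(group):
    """C(m): group size divided by the dispersion of distances to the mean."""
    center = mean_vector(group)
    distances = [dist(p, center) for p in group]
    return len(group) / max(pstdev(distances), 1e-9)  # guard against zero


def train_anchor_model(features):
    """Pick the group with the largest C(m); return its mean feature vector."""
    groups = [g for g in kmeans(features, 3) if len(g) >= 3]  # ASSUMED filter
    best = max(groups, key=clustering_degree)
    return mean_vector(best)  # ASSUMED model: mean of the selected group
```

A large, tight group scores highest because its size enlarges the numerator while its small spread shrinks the denominator, which matches the rationale given for $C(m)$.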

請參閱第4圖所示，為本發明利用人臉辨識偵測主播畫面方法之主播人臉標準評估組件中的評估流程圖。 Please refer to FIG. 4, which is an evaluation flowchart of the anchor face standard evaluation component of the method for detecting the anchor picture by using the face recognition method of the present invention.

S11：係在得到N_T2張人臉圖像後，擷取這些人臉圖像各自的影像特徵值LBP。 S11: After obtaining the N _T2 face images, the image feature values LBP of the face images are captured.

S12：利用影像特徵值計算每一張人臉圖像與主播人臉模型之間相異值，其定義為影像特徵值之間的歐幾里得距離。當相異值愈大，代表此張人臉圖像與主播人臉模型相似度愈低； S12: Calculate a difference value between each face image and the anchor face model by using image feature values, which is defined as a Euclidean distance between image feature values. When the difference value is greater, it represents this person. The lower the similarity between the face image and the anchor face model;

S13: The K-means clustering algorithm is then used to divide all dissimilarity values into two groups, and the mean dissimilarity value of each group is computed. Because the dissimilarity values between anchor faces and the anchor face model are low, while those between non-anchor faces and the anchor face model are high, the two kinds of dissimilarity values have clearly different distributions;

S14: Finally, the group with the lower mean dissimilarity value is selected, since it shows how much anchor faces differ from the anchor face model. The mean crit₁ and the standard deviation crit₂ of this group's dissimilarity values are therefore used as the anchor face criteria, defined by equations (6) and (7):

Here, N_T3 and e_i denote, respectively, the number of dissimilarity values in the group with the lower mean dissimilarity value and the i-th dissimilarity value in that group.
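
Steps S13 and S14 can be sketched in Python as follows. Equations (6) and (7) are not reproduced in this text, so the sketch assumes that crit₁ is the arithmetic mean and crit₂ the population standard deviation (dividing by N_T3) of the lower group. The one-dimensional two-means routine, its initialization at the minimum and maximum values, and all function names are hypothetical illustrations rather than the patented implementation.

```python
import math


def two_means_1d(values, max_iterations=100):
    """Split scalar dissimilarity values into a low group and a high group (K-means, K=2)."""
    lo, hi = min(values), max(values)   # assumed initial centers
    group_lo, group_hi = list(values), []
    for _ in range(max_iterations):
        group_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        group_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not group_hi:
            break                        # all values identical: only one group exists
        new_lo = sum(group_lo) / len(group_lo)
        new_hi = sum(group_hi) / len(group_hi)
        if new_lo == lo and new_hi == hi:
            break                        # centers stopped moving
        lo, hi = new_lo, new_hi
    return group_lo, group_hi


def anchor_face_criteria(dissimilarities):
    """Return (crit1, crit2): mean and standard deviation of the lower group (step S14)."""
    low_group, _ = two_means_1d(dissimilarities)
    n_t3 = len(low_group)
    crit1 = sum(low_group) / n_t3
    crit2 = math.sqrt(sum((e - crit1) ** 2 for e in low_group) / n_t3)
    return crit1, crit2
```

For example, the values 1, 2, 3, 10, 11, 12 split into a low group {1, 2, 3} and a high group {10, 11, 12}, giving crit₁ = 2.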

Please refer to FIG. 5, which is a flowchart of anchor shot detection in the method of the present invention for detecting anchor shots using face recognition. It illustrates how anchor shots are detected each time a news frame is received.

S15: The news image receiving component 51 receives the news signal being broadcast, extracts news frames from it, and sends them in sequence to the face detection component 52;

S16: The face detection component 52 extracts all face images in a frame and passes them to the anchor frame recognition component 53;

S17: The anchor frame recognition component 53 uses the extracted face images to determine whether the frame is an anchor frame or a non-anchor frame, and passes the result to the anchor shot detection component 54;

S18: Finally, the anchor shot detection component 54 takes the frame's type into account when determining whether the news is currently in an anchor shot. As soon as a newly appearing anchor shot is detected, its starting playback time is output immediately. If the anchor shot has already been detected, its starting playback time is not output again.

Please refer to FIG. 6, which is a flowchart of the recognition performed in the anchor frame recognition component of the method of the present invention. It illustrates how a frame is classified as an anchor frame or a non-anchor frame. An anchor frame is a frame that contains an anchor face image; a non-anchor frame is a frame that contains no face image, or whose face images include no anchor face. The steps for determining whether a face image is an anchor face image are as follows:

S19: First, the LBP image feature value of the face image is extracted.

S20: Next, the image feature value is used to compute the dissimilarity value ef between the face image and the anchor face model, defined as the Euclidean distance between their image feature values. The standard score (z-score) of this dissimilarity value is then computed, as defined by equation (8):

S21: When z is less than a threshold θ, the face image is an anchor face, where θ is between 2 and 4. Finally, the frame's type is determined from the recognition results of its face images.
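
Steps S20 and S21 can be sketched in Python as follows. Equation (8) is not reproduced in this text; the sketch assumes the conventional standard score z = (ef − crit₁) / crit₂ built from the anchor face criteria. The function names, the default θ = 3 (chosen from the stated range of 2 to 4), and the error raised for a zero crit₂ are hypothetical.

```python
def standard_score(ef, crit1, crit2):
    """Assumed form of eq. (8): z-score of a dissimilarity value against the anchor face criteria."""
    if crit2 == 0:
        raise ValueError("crit2 must be non-zero to compute a standard score")
    return (ef - crit1) / crit2


def is_anchor_face(ef, crit1, crit2, theta=3.0):
    """Step S21: a face is treated as the anchor's when its z-score is below the threshold."""
    return standard_score(ef, crit1, crit2) < theta


def is_anchor_frame(face_dissimilarities, crit1, crit2, theta=3.0):
    """A frame is an anchor frame when at least one of its faces is an anchor face."""
    return any(is_anchor_face(ef, crit1, crit2, theta) for ef in face_dissimilarities)
```

A frame with no detected faces yields an empty list and is therefore classified as a non-anchor frame, which matches the definition given with FIG. 6.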

Please refer to FIG. 7, which is a flowchart of the detection performed in the anchor shot detection component of the method of the present invention. It illustrates how, each time a frame is received, the component detects whether an anchor shot has appeared.

S22: First, the anchor frame recognition result received from the anchor frame recognition component 53 is stored in a variable. This variable records the types of the past N_D frames, including the current frame. N_D is defined as follows:
N_D = s₃ × t₃ (9)

Here, s₃ is the number of frames that the news image receiving component 51 extracts from each second of news video, and t₃ is the minimum duration of an anchor shot in seconds, with a value between 5 and 10.

S23: The number R of anchor frames in this variable is computed;

S24: Finally, the value R controls changes in the anchor shot detection state, and when a new anchor shot is detected, its starting playback time is output. The state changes are shown in FIG. 8. The anchor shot detection process has two states: the anchor state and the non-anchor state. The anchor state means that the news is currently in an anchor shot; the non-anchor state means that it is not. The non-anchor state is the initial state of the detection process. Transitions between the two states depend on the relationship between R and N_D. The starting playback time of an anchor shot is output only when more than half of the frames in the variable are anchor frames, i.e. R > N_D/2, and the detection state changes from the non-anchor state to the anchor state. The starting playback time of this anchor shot is the playback time of the earliest anchor frame currently in the variable.
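
Steps S22 to S24 amount to a sliding window with a two-state machine, which can be sketched in Python as follows. The class name, method name, and the choice of a `collections.deque` to hold the last N_D frame types are hypothetical; the transition rule R > N_D/2 and the reported start time follow the description above.

```python
from collections import deque


class AnchorShotDetector:
    """Two-state detector over the most recent N_D = s3 * t3 frame types."""

    def __init__(self, frames_per_second, min_shot_seconds):
        self.n_d = frames_per_second * min_shot_seconds   # eq. (9)
        self.window = deque(maxlen=self.n_d)              # oldest entries drop out automatically
        self.in_anchor_state = False                      # initial state: non-anchor

    def push(self, play_time, is_anchor_frame):
        """Record one frame; return the shot's start time on a new detection, else None."""
        self.window.append((play_time, is_anchor_frame))
        r = sum(1 for _, flag in self.window if flag)     # step S23
        now_anchor = r > self.n_d / 2
        start_time = None
        if now_anchor and not self.in_anchor_state:
            # Start time is the play time of the earliest anchor frame in the window.
            start_time = next(t for t, flag in self.window if flag)
        self.in_anchor_state = now_anchor
        return start_time
```

With one frame per second and t₃ = 5, feeding two non-anchor frames, five anchor frames, five non-anchor frames, and three anchor frames reports start times 2 and 12; the second shot is reported only after the state has first fallen back to the non-anchor state.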

2. The anchor shot detection method of the present invention uses only the anchor's face as the key feature for locating anchor shots and does not need to analyze the news background, so it can cope with increasingly complex news visuals.

The above detailed description specifically illustrates one feasible embodiment of the present invention. The embodiment is not intended to limit the patent scope of the present invention; any equivalent implementation or modification that does not depart from the technical spirit of the present invention shall be included in the patent scope of this case.

In summary, this case is not only innovative in its technical concept but also provides the effects described above, which conventional methods cannot achieve. It fully satisfies the statutory requirements of novelty and inventive step for an invention patent. The application is therefore filed in accordance with the law, and the Office is respectfully requested to approve this invention patent application so as to encourage invention.

11‧‧‧Anchor face learning

12‧‧‧Anchor shot detection

S1~S4‧‧‧Process steps

Claims

A method for detecting news anchor shots using face recognition, comprising the steps of: a. anchor face learning, in which a segment at the beginning of a news program is used to train, in real time and by an unsupervised learning method, an anchor face model and anchor face criteria for the news program; and b. anchor shot detection, in which the anchor face model and the anchor face criteria obtained in step a are used to recognize the anchor appearing in the news, so as to detect in real time each anchor shot and its starting playback time.

The method for detecting news anchor shots using face recognition according to claim 1, wherein the anchor face learning comprises: c. anchor face model training, in which the anchor's face images are automatically selected from the face images in a segment of the news program and used to train the anchor face model of the news program; and d. anchor face criteria evaluation, in which the anchor face model obtained in step c and the face images in another segment of the news program are used to evaluate the criteria for a face image to be an anchor face.

The method for detecting news anchor shots using face recognition according to claim 2, wherein the anchor face model training comprises: capturing, by a news image receiving component, the frames in the news video signal in sequence; extracting, by a face detection component using face detection technology, all face images in a frame; and training, by an anchor face model training component, the anchor face model using a plurality of face images obtained from the face detection component.

The method for detecting news anchor shots using face recognition according to claim 3, wherein the anchor face criteria evaluation comprises: capturing, by a news image receiving component, the frames in the news video signal in sequence; extracting, by a face detection component using face detection technology, all face images in a frame; and evaluating, by an anchor face criteria evaluation component, the anchor face criteria using the anchor face model and a plurality of face images obtained from the face detection component.

The method for detecting news anchor shots using face recognition according to claim 4, wherein the anchor shot detection comprises: capturing, by a news image receiving component, the frames in the news video signal in sequence; extracting, by a face detection component using face detection technology, all face images in a frame; determining, by an anchor frame recognition component using anchor face recognition technology, whether a frame contains a face image of the news program's anchor, i.e. whether it is an anchor frame; and detecting, by an anchor shot detection component using the anchor frames, each anchor shot and its starting playback time.

The method for detecting news anchor shots using face recognition according to claim 5, wherein the anchor frame recognition component performs the following steps: e. extracting the image feature values of all face images in a frame; f. computing, based on the image feature values, the dissimilarity value between each face image and the anchor face model, and then computing the standard score of the dissimilarity value using the anchor face criteria; and g. determining from the standard scores whether any face image is an anchor face, and setting a frame that contains an anchor face as an anchor frame accordingly.

The method for detecting news anchor shots using face recognition according to claim 6, wherein the anchor shot detection component performs the following steps: h. storing the frame type of a received frame in a variable that holds the frame types of the past N_D frames, including the received frame; i. computing the number R of anchor frames in the variable; j. checking whether the condition R > N_D/2 holds; k. when the condition in step j holds, an anchor shot has been detected, so the current anchor shot detection state is set to the anchor state, and otherwise it is set to the non-anchor state; and l. when the condition in step j holds and the anchor shot detection state changes from the non-anchor state to the anchor state, outputting the playback time of the earliest anchor frame in the variable as the starting playback time of the anchor shot.

The method for detecting news anchor shots using face recognition according to claim 7, wherein the anchor shot detection state refers to one of two states in the anchor shot detection process: an anchor state, in which the news is currently in an anchor shot; and a non-anchor state, in which the news is currently not in an anchor shot.

The method for detecting news anchor shots using face recognition according to claim 8, wherein the anchor shot detection state is initially the non-anchor state.

The method for detecting news anchor shots using face recognition according to claim 4, wherein the anchor face model training component performs the following steps: m. extracting the image feature value of each of the plurality of face images; n. dividing all face images into three groups by a clustering method according to the similarity of their image feature values; o. computing, for each face image group, a clustering degree value, defined as the number of face images in the group divided by the standard deviation of the dissimilarity values between the image feature values and the average image feature value; and p. selecting the face images in the face image group with the largest clustering degree value to train the anchor face model.

The method for detecting news anchor shots using face recognition according to claim 10, wherein the anchor face criteria evaluation component performs the following steps: q. extracting the image feature value of each of the plurality of face images; r. computing, based on the image feature values, the dissimilarity value between each face image and the anchor face model; s. dividing all dissimilarity values into two groups by a clustering method and computing the mean dissimilarity value of each group; and t. selecting the group with the lower mean dissimilarity value and evaluating from it the anchor face criteria, which comprise the mean and the standard deviation of the dissimilarity values.

The method for detecting news anchor shots using face recognition according to claim 1, wherein the anchor shot detection comprises: capturing, by a news image receiving component, the frames in the news video signal in sequence; extracting, by a face detection component using face detection technology, all face images in a frame; determining, by an anchor frame recognition component using anchor face recognition technology, whether a frame contains a face image of the news program's anchor, i.e. whether it is an anchor frame; and detecting, by an anchor shot detection component using the anchor frames, each anchor shot and its starting playback time.