TW202023285A - Video extracting method and electronic device using the same

Video extracting method and electronic device using the same

Info

Publication number
TW202023285A
Authority
TW
Taiwan
Prior art keywords
time point
video
clip
expression
feature information
Prior art date
2018-12-06
Application number
TW107143895A
Other languages
Chinese (zh)
Other versions
TWI688269B (en)
Inventor
黃彥碩
謝憶得
Original Assignee
宏碁股份有限公司 (Acer Incorporated)
Priority date
2018-12-06
Filing date
2018-12-06
Publication date
2020-06-16
Application filed by 宏碁股份有限公司 (Acer Incorporated)
Priority to TW107143895A
Application granted
Publication of TWI688269B
Publication of TW202023285A


Abstract

A video extracting method is provided. The method is used for extracting at least one clip from a video by an electronic device, where the video corresponds to a viewer chat room. The video extracting method includes: obtaining feature information of the video at a plurality of time points of the video, where the feature information includes a refresh frequency of the viewer chat room; and extracting the at least one clip from the video according to the feature information at each time point. In addition, an electronic device using the method is also provided.

Description

Video extracting method and electronic device using the same

The present invention relates to capture technology, and more particularly, to a video extracting method and an electronic device using this method.

With advances in technology, the spread of live-streaming platforms, and the rise of various online video platforms, entertainment video is readily available, and people's viewing habits have changed. Viewers have shifted from sitting in front of a television at fixed times to watching on-demand online video, which brings new interactive viewing experiences.

However, as the barrier to live streaming has dropped, anyone can become a streamer anytime and anywhere simply by downloading a live-streaming application (APP) or logging in to a streaming platform, and live streaming has become a new form of entertainment. A live stream, however, often runs longer than an hour with no fixed length, and viewers will not follow a streamer's complete daily output at all times. Therefore, to attract viewers, streamers usually spend extra time filtering lengthy stream content and editing highlight clips for fans to watch. In addition, streamers can share condensed highlight videos on other social platforms to raise their profile and gain subscribers. Accordingly, how to design an automatic method for editing highlight clips is one of the topics studied by those skilled in the art.

The present invention provides a video extracting method and an electronic device using the method, which can automatically analyze a video to extract its highlight segments, thereby saving the user time spent editing the video.

The video extracting method of the invention is used for extracting at least one clip from a video by an electronic device, where the video corresponds to a viewer chat room. The video extracting method includes: obtaining feature information of the video at a plurality of time points, where the feature information includes a refresh frequency of the viewer chat room; and extracting the at least one clip from the video according to the feature information at each time point.

In an embodiment of the invention, the step of extracting the at least one clip from the video according to the feature information at each time point includes: classifying, by a neural network model, the image of the video at each time point into a highlight category or a non-highlight category according to the feature information at each time point, where the neural network model is constructed based on a plurality of historical highlight videos; and generating the at least one clip from the images classified into the highlight category.

In an embodiment of the invention, the step of obtaining the feature information of the video at the time points includes: recording, at a first time point, the most recently updated message of the viewer chat room; finding, at a second time point, the position of that most recently updated message in the viewer chat room; and calculating the refresh frequency of the video at the first time point according to the first time point, the second time point, and the position.

In an embodiment of the invention, the feature information includes an expression score and a volume, and the step of obtaining the feature information of the video at the time points includes: performing face detection on the image of the video at each time point to obtain one of a plurality of expression categories; and setting the expression score according to the expression category.

In an embodiment of the invention, the step of extracting the at least one clip from the video according to the feature information at each time point includes: calculating a feature score according to the refresh frequency, the expression score, and the volume at each time point; and extracting the at least one clip from the video according to the feature score, where, in calculating the feature score, the weight of the refresh frequency is higher than the weights of the expression score and the volume.

The electronic device of the invention is used for extracting at least one clip from a video, where the video corresponds to a viewer chat room. The electronic device includes: a storage device recording a plurality of modules; and a processor coupled to the storage device to access and execute the modules recorded in the storage device. The modules include: a data collection module that obtains feature information of the video at a plurality of time points, where the feature information includes a refresh frequency of the viewer chat room; and a video extracting module that extracts the at least one clip from the video according to the feature information at each time point.

In an embodiment of the invention, the video extracting module classifies, by a neural network model, the image of the video at each time point into a highlight category or a non-highlight category according to the feature information at each time point, where the neural network model is constructed based on a plurality of historical highlight videos, and generates the at least one clip from the images classified into the highlight category.

In an embodiment of the invention, the data collection module records, at a first time point, the most recently updated message of the viewer chat room; finds, at a second time point, the position of that most recently updated message in the viewer chat room; and calculates the refresh frequency of the video at the first time point according to the first time point, the second time point, and the position.

In an embodiment of the invention, the feature information includes an expression score and a volume, and the data collection module performs face detection on the image of the video at each time point to obtain one of a plurality of expression categories, and sets the expression score according to the expression category.

In an embodiment of the invention, the video extracting module calculates a feature score according to the refresh frequency, the expression score, and the volume at each time point, and extracts the at least one clip from the video according to the feature score, where, in calculating the feature score, the weight of the refresh frequency is higher than the weights of the expression score and the volume.

Based on the above, the video extracting method and the electronic device using the method provided by embodiments of the invention can feed the feature information of a video at multiple time points into a neural network model to classify the image at each time point into a highlight category or a non-highlight category, and extract the images classified into the highlight category, thereby automatically analyzing the video to extract its highlight segments. In this way, the time the user spends editing the video can be saved.

To make the above features and advantages of the invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of an electronic device according to an embodiment of the invention.

Referring to FIG. 1, the electronic device 100 includes a storage device 110 and a processor 120. The electronic device 100 is, for example, a desktop computer, a notebook computer, a tablet PC, a smartphone, a portable game console (PSP), or any other device with sufficient computing power to provide an editing function; the invention is not limited in this regard.

The storage device 110 records a data collection module 112 and a video extracting module 114; these modules are, for example, programs stored in the storage device 110. In some embodiments, the storage device 110 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, a similar element, or a combination of the above elements; the invention is not limited in this regard.

The processor 120 is coupled to the storage device 110 and loads the program code of the data collection module 112 and the video extracting module 114 from the storage device 110 to execute the video extracting method of the embodiments of the invention. In some embodiments, the processor 120 is, for example, a central processing unit (CPU), or another programmable general-purpose or special-purpose microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), programmable logic device (PLD), other similar device, or a combination of these devices; the invention is not limited in this regard.

In order to automatically extract the highlight segments of a video, the electronic device 100 of the embodiments of the invention analyzes the feature information of the video at each time point to perform the extracting operation, where the analyzed feature information includes the refresh frequency of the viewer chat room corresponding to the video at each time point. In this way, the electronic device 100 can immediately provide the highlight segments of the video, reducing the time the user spends editing it.

FIG. 2 is a flowchart of a video extracting method according to an embodiment of the invention.

The video extracting method of the embodiment of FIG. 2 is applicable to the electronic device 100 of the embodiment of FIG. 1, and is described in detail below with reference to the elements of FIG. 1. It should be noted that the video in this embodiment corresponds to a viewer chat room. While the video is playing, its viewers can post messages in the chat room. In some embodiments, the image of the video includes the viewer chat room; the invention is not limited in this regard.

In step S220, the processor 120 executes the data collection module 112 to obtain feature information of the video at a plurality of time points, where the feature information includes the refresh frequency of the viewer chat room. Specifically, the processor 120 obtains the refresh frequency as follows. First, the processor 120 records, at a first time point, the most recently updated message of the viewer chat room. Next, at a second time point, the processor 120 finds the position of that message in the viewer chat room. Finally, according to the first time point, the second time point, and the position, the processor 120 counts how many new messages appeared between the two time points, and divides that count by the difference between the two time points to obtain the refresh frequency of the video at the first time point.

The following example illustrates how the processor 120 obtains the refresh frequency of the viewer chat room.

FIG. 3A and FIG. 3B are schematic diagrams of images in a video according to an embodiment of the invention. FIG. 3A shows the image 300a at time point t seconds of the video; in the image 300a, the most recently updated message of the viewer chat room 320 is the message 322. FIG. 3B shows the image 300b at time point t+3 seconds; the image 300b shows the position of the message 322 in the viewer chat room 320 at t+3 seconds. Based on time point t, time point t+3, and the position of the message 322 at t+3, the processor 120 counts nine new messages between t and t+3 seconds, so the refresh frequency of the video at time point t is 3 messages per second.
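
For illustration, the counting step can be sketched in a few lines of Python; the list-based chat-log representation and the helper name `refresh_frequency` are assumptions for the example, not part of the patent.

```python
from typing import List

def refresh_frequency(messages_at_t2: List[str], last_seen_at_t1: str,
                      t1: float, t2: float) -> float:
    """Estimate the chat-room refresh frequency at t1.

    messages_at_t2: messages visible at the second time point, oldest first.
    last_seen_at_t1: the most recently updated message recorded at t1.
    """
    # Locate the message recorded at t1 within the chat room at t2;
    # everything after that position is a new message.
    position = messages_at_t2.index(last_seen_at_t1)
    new_messages = len(messages_at_t2) - position - 1
    return new_messages / (t2 - t1)

# Worked example from FIG. 3A/3B: nine new messages appear in three seconds.
chat = ["m0"] + [f"m{i}" for i in range(1, 10)]  # "m0" was the latest at t
print(refresh_frequency(chat, "m0", t1=0.0, t2=3.0))  # 3.0 messages/second
```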

In some embodiments, the feature information further includes an expression score and a volume. The processor 120 obtains the volume as follows: it computes the decibel level at each time point from the audio track of the video, and this decibel level is the volume. The processor 120 obtains the expression score as follows. First, the processor 120 performs face detection on the image of the video at each time point using an emotion detection model (for example, but not limited to, Python's DLIB library) and obtains one of a plurality of expression categories, such as a neutral expression, a scared expression, or a happy expression; the invention is not limited in this regard. Next, the processor 120 sets the expression score according to the expression category. In detail, the processor 120 takes faces showing a scared or happy expression and quantizes such expressions to the value 1 (the maximum); that is, the processor 120 quantizes expressions to values between 0 and 1, where 0 is a neutral expression and 1 is a scared or happy expression (this value is the expression score). Notably, in some embodiments the image of the video further includes a streamer window, and the processor 120 performs face detection only within the streamer window to reduce the computational load of the system; the invention is not limited in this regard.

For example, referring to FIG. 3A and FIG. 3B, the image 300a of FIG. 3A and the image 300b of FIG. 3B include a streamer window 340. In the example of FIG. 3A, at time point t seconds of the video, the face in the streamer window 340 shows a neutral expression, so the processor 120 sets the expression score to 0 after detecting the neutral expression. In the example of FIG. 3B, at time point t+1 second of the video, the face in the streamer window 340 shows a scared expression, so the processor 120 sets the expression score to 1 after detecting the scared expression.
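
A hedged sketch of how these two features could be computed follows. The patent names only dlib as an example face/emotion tool, so the `classify_emotion` callable and the label-to-score mapping below are assumptions for illustration.

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()

# Assumed mapping: neutral -> 0, scared/happy -> 1, per the description.
EXPRESSION_SCORES = {"neutral": 0.0, "scared": 1.0, "happy": 1.0}

def expression_score(frame_gray: np.ndarray, classify_emotion) -> float:
    """Detect a face in the streamer window and map its expression to [0, 1]."""
    faces = detector(frame_gray)
    if not faces:
        return 0.0
    label = classify_emotion(frame_gray, faces[0])  # e.g. "scared"
    return EXPRESSION_SCORES.get(label, 0.0)

def volume_db(samples: np.ndarray) -> float:
    """Volume of one time point's audio as decibels of the RMS amplitude."""
    rms = np.sqrt(np.mean(np.square(samples.astype(np.float64))))
    return 20.0 * np.log10(max(rms, 1e-12))
```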

Returning to the flowchart of FIG. 2, in step S240, the processor 120 executes the video extracting module 114 to extract at least one clip from the video according to the feature information at each time point. In some embodiments, the processor 120 uses a neural network model to classify the image of the video at each time point into a highlight category or a non-highlight category according to the feature information at each time point, and then generates at least one clip from the images classified into the highlight category.

In some embodiments, the processor 120 calculates a feature score according to the refresh frequency, the expression score, and the volume at each time point, and extracts at least one clip from the video according to the feature score. In particular, when the processor 120 calculates the feature score, the weight of the refresh frequency is higher than the weights of the expression score and the volume. For example, the processor 120 feeds the feature score into the neural network model to output a probability value and determines whether the probability value exceeds a preset threshold. If it does, the processor 120 classifies the image at that time point into the highlight category; otherwise, the processor 120 classifies it into the non-highlight category. The processor 120 then extracts the images classified into the highlight category to generate at least one clip. In some embodiments, the feature score equals the refresh frequency plus the product of the expression score and the volume; the invention is not limited in this regard.
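
A minimal sketch of this scoring-and-thresholding step follows; the `model_probability` callable stands in for the neural network model, and the threshold value is illustrative, since the patent does not specify one.

```python
THRESHOLD = 0.5  # assumed preset threshold; the actual value is not disclosed

def feature_score(refresh_freq: float, expr_score: float, volume: float) -> float:
    # One combination the description mentions: refresh frequency plus the
    # product of expression score and volume, so refresh frequency dominates.
    return refresh_freq + expr_score * volume

def classify_frames(model_probability, scores):
    """Label each time point as highlight (True) or non-highlight (False)."""
    return [model_probability(s) > THRESHOLD for s in scores]

def clips_from_labels(labels, fps: float = 1.0):
    """Group consecutive highlight time points into (start, end) clips in seconds."""
    clips, start = [], None
    for i, is_highlight in enumerate(labels):
        if is_highlight and start is None:
            start = i
        elif not is_highlight and start is not None:
            clips.append((start / fps, i / fps))
            start = None
    if start is not None:
        clips.append((start / fps, len(labels) / fps))
    return clips
```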

In other embodiments, the processor 120 may also feed the refresh frequency, the expression score, and the volume directly into the neural network model to classify the image at each time point into the highlight category or the non-highlight category; the invention is not limited in this regard.

It is worth mentioning that, in some embodiments, the neural network model is constructed based on a plurality of historical highlight videos, as follows. First, the processor 120 obtains at least one historical highlight video, where the historical highlight video includes a plurality of highlight images and each highlight image includes feature information. A historical highlight video is, for example, a highlight video already edited by a streamer. In detail, after obtaining the historical highlight video, the processor 120 uses video analysis (video object detection) to segment each highlight image into the streamer window and the viewer chat room, and obtains the refresh frequency of the viewer chat room, the expression score of the streamer window, and the sound of each highlight image. The processor 120 then trains the neural network model with the highlight images and the feature information of each highlight image (the refresh frequency of the viewer chat room, the expression score of the streamer window, and the sound).
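
Since the patent does not disclose the network architecture, the following sketch trains a single logistic unit on (refresh frequency, expression score, volume) triples as an illustrative stand-in for the model; all names here are assumptions.

```python
import numpy as np

# X: one row of (refresh_frequency, expression_score, volume) per time point.
# y: 1 for frames taken from historical highlight videos, 0 otherwise.
def train(X: np.ndarray, y: np.ndarray, lr: float = 0.01, epochs: int = 1000):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted highlight probability
        grad = p - y                             # gradient of cross-entropy loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict_probability(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Probability that the time point belongs to the highlight category."""
    return float(1.0 / (1.0 + np.exp(-(x @ w + b))))
```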

In addition, in some embodiments, the processor 120 can use data feedback to run a re-training process based on whether the user actually uses the at least one clip as a highlight segment. The re-training process performs machine learning on the neural network model with the different refresh frequencies, different expression scores, and different sounds of the images in the newly adopted highlight segments, thereby optimizing the neural network model so that its future outputs are more accurate and better match personal preferences.
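
Continuing the sketch above, one simple way this feedback loop could be implemented (an assumption, not the patent's stated procedure) is to fold the accepted and rejected clips back into the training set and re-fit:

```python
import numpy as np  # reuses train() from the previous sketch

def retrain_with_feedback(X_old, y_old, X_new, accepted):
    """Re-train on the augmented data set.

    accepted: 1 where the user kept a suggested clip, 0 where it was discarded.
    """
    X = np.vstack([X_old, X_new])
    y = np.concatenate([y_old, np.asarray(accepted)])
    return train(X, y)  # re-fit from scratch on old data plus user feedback
```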

In summary, the video extracting method and the electronic device using the method provided by embodiments of the invention can feed the feature information of a video at multiple time points into a neural network model to classify the image at each time point into a highlight category or a non-highlight category, and extract the images classified into the highlight category, thereby automatically analyzing the video to extract its highlight segments. In this way, the time the user spends editing the video can be saved.

Although the invention has been disclosed above by way of embodiments, they are not intended to limit the invention. Anyone with ordinary skill in the art may make some changes and modifications without departing from the spirit and scope of the invention; the scope of protection of the invention shall therefore be defined by the appended claims.

100: electronic device
110: storage device
112: data collection module
114: video extracting module
120: processor
S220, S240: steps of the video extracting method
300a, 300b: image
320: viewer chat room
322: message
340: streamer window

FIG. 1 is a block diagram of an electronic device according to an embodiment of the invention. FIG. 2 is a flowchart of a video extracting method according to an embodiment of the invention. FIG. 3A and FIG. 3B are schematic diagrams of images in a video according to an embodiment of the invention.

S220, S240: steps of the video extracting method

Claims (10)

1. A video extracting method for extracting at least one clip from a video by an electronic device, wherein the video corresponds to a viewer chat room, the video extracting method comprising: obtaining feature information of the video at a plurality of time points, wherein the feature information comprises a refresh frequency of the viewer chat room; and extracting the at least one clip from the video according to the feature information at each time point.

2. The video extracting method of claim 1, wherein extracting the at least one clip from the video according to the feature information at each time point comprises: classifying, by a neural network model, the image of the video at each time point into a highlight category or a non-highlight category according to the feature information at each time point, wherein the neural network model is constructed based on a plurality of historical highlight videos; and generating the at least one clip from the images classified into the highlight category.

3. The video extracting method of claim 1, wherein obtaining the feature information of the video at the time points comprises: recording, at a first time point, a most recently updated message of the viewer chat room; finding, at a second time point, a position of the most recently updated message in the viewer chat room; and calculating the refresh frequency of the video at the first time point according to the first time point, the second time point, and the position.

4. The video extracting method of any one of claims 1 to 3, wherein the feature information comprises an expression score and a volume, and obtaining the feature information of the video at the time points comprises: performing face detection on the image of the video at each time point to obtain one of a plurality of expression categories; and setting the expression score according to the expression category.

5. The video extracting method of claim 4, wherein extracting the at least one clip from the video according to the feature information at each time point comprises: calculating a feature score according to the refresh frequency, the expression score, and the volume at each time point; and extracting the at least one clip from the video according to the feature score, wherein, in calculating the feature score, a weight of the refresh frequency is higher than a weight of the expression score and a weight of the volume.
6. An electronic device for extracting at least one clip from a video, wherein the video corresponds to a viewer chat room, the electronic device comprising: a storage device recording a plurality of modules; and a processor coupled to the storage device to access and execute the modules recorded in the storage device, the modules comprising: a data collection module obtaining feature information of the video at a plurality of time points, wherein the feature information comprises a refresh frequency of the viewer chat room; and a video extracting module extracting the at least one clip from the video according to the feature information at each time point.

7. The electronic device of claim 6, wherein the video extracting module classifies, by a neural network model, the image of the video at each time point into a highlight category or a non-highlight category according to the feature information at each time point, wherein the neural network model is constructed based on a plurality of historical highlight videos, and generates the at least one clip from the images classified into the highlight category.

8. The electronic device of claim 6, wherein the data collection module records, at a first time point, a most recently updated message of the viewer chat room, finds, at a second time point, a position of the most recently updated message in the viewer chat room, and calculates the refresh frequency of the video at the first time point according to the first time point, the second time point, and the position.

9. The electronic device of any one of claims 6 to 8, wherein the feature information comprises an expression score and a volume, and the data collection module performs face detection on the image of the video at each time point to obtain one of a plurality of expression categories, and sets the expression score according to the expression category.

10. The electronic device of claim 9, wherein the video extracting module calculates a feature score according to the refresh frequency, the expression score, and the volume at each time point, and extracts the at least one clip from the video according to the feature score, wherein, in calculating the feature score, a weight of the refresh frequency is higher than a weight of the expression score and a weight of the volume.