TWI741550B

TWI741550B - Method for bookmark frame generation, and video player device with automatic generation of bookmark and user interface thereof

Info

Publication number: TWI741550B
Application number: TW109111082A
Authority: TW
Inventors: 黃純敏; 盧科丞; 劉東榮
Original assignee: 國立雲林科技大學
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2021-10-01
Also published as: TW202139150A

Abstract

The present invention discloses a method for bookmark frame generation, a video player device with automatic generation of bookmarks, and a user interface thereof. Based on the concept that "the presentation of important frames should allow enough time for careful reading", the method for bookmark frame generation includes: extracting a plurality of time-ordered frames from a video; calculating the structural similarity between each two adjacent frames; when the structural similarity between a first frame and the adjacent second frame that follows it is less than a preset value, and the structural similarities corresponding to all frames after the first frame and within a preset time are greater than or equal to the preset value, designating the second frame as a bookmark frame; and storing the bookmark frame and its time of occurrence.

Description

Method for generating bookmark frame, video and audio playback device for automatically generating bookmark, and user interface thereof

The present invention relates to a method for generating bookmark frames, an audio-visual playback device that automatically generates bookmarks, and a user interface thereof.

In recent years, digital learning has become increasingly common in universities and colleges. Videos can recreate the classroom experience, and more detailed supplementary material can be added to help students understand the course content. Digital learning frees students from the constraints of place and time and lets them absorb knowledge at their own pace. Although it still has shortcomings, digital learning is clearly the direction in which learning is heading.

Among current digital learning formats, instructional videos are the most widely used. One problem with videos, however, is that reviewing them is cumbersome. A few instructional videos provide hyperlinks (bookmarks) to their individual segments, but these are usually prepared by hand, which is time-consuming and laborious, so most instructional videos do not offer this feature. Without it, review is difficult: instructional videos are often long, while the content a student wants to review may last only a minute or two, so finding it is like looking for a needle in a haystack.

In addition, audio-visual analysis technology has advanced considerably; by analyzing the various data in a video, it can give users better control over the video's content. Today's video platforms (such as YouTube or BiliBili), however, have not advanced significantly. The players on most platforms still offer only basic functions such as a timeline, fast forward, rewind, and pause, and do not exploit the potential of audio-visual analysis technology.

The object of the present invention is to provide a method for generating bookmark frames, an audio-visual playback device that automatically generates bookmarks, and a user interface thereof. The bookmark frames of a video let the user jump directly to the position of interest and quickly find the desired content. The present invention also lets users browse videos in a smarter way, saving the time spent searching for and watching video clips.

Based on the core concept that "the presentation of important frames should allow time for careful reading", the present invention proposes a method for generating bookmark frames from the presentation time of similar pictures, an audio-visual playback device that automatically generates bookmarks, and a user interface thereof. The inventors found that, in order to convey important information, video producers keep important pictures on screen for a longer time. Based on this characteristic, the present invention extracts the important pictures of a video according to how long similar frames are presented.

To achieve the above object, a method for generating bookmark frames according to the present invention includes: extracting a plurality of time-ordered frames from a video; calculating the structural similarity between each two adjacent frames; when the structural similarity between a first frame and the adjacent second frame that follows it is less than a preset value, and the structural similarities corresponding to all frames after the first frame and within a preset time are greater than or equal to the preset value, designating the second frame as a bookmark frame; and storing the image of the bookmark frame and its time of appearance.

To achieve the above object, an audio-visual playback device that automatically generates bookmarks according to the present invention includes one or more processing units and a memory unit. The memory unit is electrically connected to the one or more processing units and stores one or more program instructions. When the one or more program instructions are executed by the one or more processing units, the one or more processing units at least perform: extracting a plurality of time-ordered frames from a video; calculating the structural similarity between each two adjacent frames; when the structural similarity between a first frame and the adjacent second frame that follows it is less than a preset value, and the structural similarities corresponding to all frames after the first frame and within a preset time are greater than or equal to the preset value, designating the second frame as a bookmark frame and storing the image of the bookmark frame and its time of appearance; and performing image text recognition on the bookmark frame to generate a bookmark.

To achieve the above object, in a user interface of an audio-visual playback device that automatically generates bookmarks according to the present invention, the audio-visual playback device includes a display unit that presents the user interface, and the user interface includes a bookmark presentation area and a video playback area. The bookmark presentation area displays at least one bookmark, which the audio-visual playback device generates automatically by the following steps: extracting a plurality of time-ordered frames from a video; calculating the structural similarity between each two adjacent frames; when the structural similarity between a first frame and the adjacent second frame that follows it is less than a preset value, and the structural similarities corresponding to all frames after the first frame and within a preset time are greater than or equal to the preset value, designating the second frame as a bookmark frame; storing the image of the bookmark frame and its time of appearance; and performing image text recognition on the bookmark frame to generate the bookmark. The video playback area is adjacent to the bookmark presentation area. When the user clicks the bookmark, the audio-visual playback device plays the video clip corresponding to the bookmark in the video playback area.

In one embodiment, the preset value is between 0.8 and 0.9.

In one embodiment, the preset time is between 1 second and 3 seconds.

In one embodiment, the bookmark contains text, or a combination of text and image.

In one embodiment, the one or more processing units further perform the following step: storing the bookmark in the memory unit.

In one embodiment, the audio-visual playback device further includes a display unit electrically connected to the one or more processing units, and the one or more processing units further perform the following steps: causing the display unit to present a user interface that includes a bookmark presentation area; and displaying the bookmark in the bookmark presentation area.

In one embodiment, there are multiple bookmarks, and they are displayed as a list in the bookmark presentation area.

In one embodiment, the user interface further includes a bookmark search area adjacent to the bookmark presentation area. When the user enters a keyword in the bookmark search area and searches, the video playback area displays the bookmark frame corresponding to the keyword.

In summary, in the method for generating bookmark frames, the audio-visual playback device that automatically generates bookmarks, and the user interface thereof according to the present invention, the bookmark frames of a video and their corresponding bookmarks are found by calculating the structural similarity between adjacent frames and applying the judgment conditions. The bookmarks can be displayed in the user interface, where they let the user jump quickly to the position of interest and find the content to be watched. The present invention therefore lets users browse videos in a smarter way, saves the time spent searching for and watching video clips, and is particularly suitable for instructional videos used in digital learning.

1: Audio-visual playback device

11: Processing unit

12: Memory unit

13: Display unit

A1: Video playback area

A2: Bookmark presentation area

A3: Bookmark search area

BF1, BF2: Bookmark frames

BM1, BM2, BM3: Bookmarks

F: Video

F1, F2, F3: Frames

S1, S2, S3, S31, S32, S33, S4: Steps

UI: User interface

FIG. 1A is a flowchart of a method for generating bookmark frames according to a preferred embodiment of the present invention.

FIG. 1B is a functional block diagram of an audio-visual playback device that automatically generates bookmarks according to a preferred embodiment of the present invention.

FIG. 2A is a schematic diagram of the consecutive frames of a video arranged in time order according to an embodiment of the present invention.

FIG. 2B is a schematic diagram of the video clips of a video arranged in time order according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of an audio-visual playback device according to an embodiment of the present invention.

The method for generating bookmark frames, the audio-visual playback device that automatically generates bookmarks, and the user interface thereof according to preferred embodiments of the present invention are described below with reference to the drawings, in which the same elements are denoted by the same reference signs.

FIG. 1A is a flowchart of a method for generating bookmark frames according to a preferred embodiment of the present invention, and FIG. 1B is a functional block diagram of an audio-visual playback device that automatically generates bookmarks according to a preferred embodiment of the present invention. It should first be noted that, in typical instructional videos, the picture usually changes substantially at the transition between video clips. A frame (also called an image or picture) that differs greatly from the preceding one can therefore be regarded as the beginning of a new clip, and such a frame is referred to herein as a "bookmark frame".

Referring to FIG. 1A, the method for generating bookmark frames of the present invention includes at least the following steps: extracting a plurality of time-ordered frames from a video (step S1); calculating the structural similarity (SSIM) between each two adjacent frames (step S2); when the structural similarity between a first frame and the adjacent second frame that follows it is less than a preset value, and the structural similarities corresponding to all frames after the first frame and within a preset time are greater than or equal to the preset value, designating the second frame as a bookmark frame (step S3, which comprises steps S31 to S33); and storing the image of the bookmark frame and its time of appearance (step S4).

The structural similarity (SSIM) of step S2 can be defined as follows. Each input frame is first converted to grayscale, so that every pixel in the frame has a value between 0 and 255.

Given two frame signals $x$ and $y$, each represented as a one-dimensional array, their structural similarity can be defined as:

$$\mathrm{SSIM}(x,y) = [\,l(x,y)\,]^{\alpha}\,[\,c(x,y)\,]^{\beta}\,[\,s(x,y)\,]^{\gamma},$$

where $l(x,y)$ compares the luminance of $x$ and $y$, $c(x,y)$ compares their contrast, and $s(x,y)$ compares their structure. $\mu_x$ and $\mu_y$ are the means of $x$ and $y$, $\sigma_x$ and $\sigma_y$ are their standard deviations, and $\sigma_{xy}$ is the covariance of $x$ and $y$. $C_1$, $C_2$, and $C_3$ are constants that keep $l(x,y)$, $c(x,y)$, and $s(x,y)$ numerically stable. The exponents $\alpha>0$, $\beta>0$, and $\gamma>0$ adjust the relative importance of $l(x,y)$, $c(x,y)$, and $s(x,y)$. A larger SSIM value indicates that the two signals are more similar. If SSIM is computed on two identical frames (that is, two exactly equal pictures), so that $\mu_x=\mu_y$ and $\sigma_x=\sigma_y$, then each of the three terms equals 1 and $\mathrm{SSIM}(x,y)=1$.
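For reference, the three comparison terms are conventionally written as below in the SSIM literature. The surrounding text names the symbols but does not spell out these expressions, so they are given here as the standard forms under that assumption, not as a quotation of the patent.

```latex
l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1},
\qquad
c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2},
\qquad
s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}
```

With these forms, two identical frames give $\sigma_{xy} = \sigma_x\sigma_y = \sigma_x^2$, so every term evaluates to 1.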

Because the structure term has very little effect on the result, and to simplify the computation, the exponents are set to $\alpha=\beta=\gamma=1$ and $C_3=C_2/2$ is substituted, which simplifies the formula to:

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
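A minimal sketch of the simplified SSIM (with α = β = γ = 1 and C3 = C2/2) may make the computation concrete. It treats each frame as one flat sequence of grayscale values and computes a single global score; practical SSIM implementations usually average the score over local windows. The function name `global_ssim` and the constant values are assumptions for illustration: the constants follow the conventional choice for 8-bit images and are not stated in this document.

```python
from statistics import fmean

# Stabilising constants. Conventional values for 8-bit images
# (K1 = 0.01, K2 = 0.03, dynamic range 255); an assumption, not from the text.
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2


def global_ssim(x, y):
    """Simplified SSIM of two equal-length sequences of grayscale values
    (0..255), computed once over the whole signal."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)            # sigma_x squared
    var_y = fmean((b - mu_y) ** 2 for b in y)            # sigma_y squared
    cov_xy = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)
    denominator = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return numerator / denominator
```

Two identical signals score 1 (up to floating-point rounding), while a signal and its inverted counterpart score well below the 0.8 to 0.9 range used as the preset value.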

In addition, the audio-visual playback device of the present invention may be, without limitation, a mobile phone, a computer (including a tablet, notebook, or desktop computer), or a media player.

Referring to FIG. 1B, the audio-visual playback device 1 that automatically generates bookmarks in this embodiment includes one or more processing units 11 and a memory unit 12. The memory unit 12 may be electrically connected to the one or more processing units 11 through, for example, a bus. FIG. 1B shows one processing unit 11 and one memory unit 12 as an example.

The processing unit 11 can access the data stored in the memory unit 12. The processing unit 11 may contain the core control components of the audio-visual playback device 1; for example, it may include at least one central processing unit (CPU) and a memory, or other control hardware, software, or firmware. The memory unit 12 may store at least one application, which may include one or more program instructions. When the one or more program instructions of the application are executed by the one or more processing units 11, the one or more processing units 11 perform at least steps S1 to S4 of the bookmark frame generation method described above, and perform image text recognition on the bookmark frame to generate a bookmark.

The memory unit 12 may be a non-transitory computer-readable storage medium and may include, for example, at least one memory, a memory card, an optical disc, a video tape, a computer tape, or any combination thereof. The memory may include read-only memory (ROM), flash memory, a field-programmable gate array (FPGA), other forms of memory, or a combination thereof. The memory unit 12 may be memory built into the audio-visual playback device 1, or cloud memory located in a cloud device. The application may therefore also be stored in the cloud device and loaded from there into the audio-visual playback device 1 for execution.

In addition, the audio-visual playback device 1 of this embodiment may further include a display unit 13 electrically connected to the processing unit 11. The display unit 13 may be a display screen, a display panel, or a monitor.

The method for generating bookmark frames, the audio-visual playback device that automatically generates bookmarks, and the user interface thereof are described in detail below with reference to FIG. 1A and FIG. 1B in conjunction with FIG. 2A, FIG. 2B, and FIG. 3. FIG. 2A is a schematic diagram of the consecutive frames of a video arranged in time order according to an embodiment of the present invention, FIG. 2B is a schematic diagram of the video clips of a video arranged in time order according to an embodiment of the present invention, and FIG. 3 is a schematic diagram of an audio-visual playback device according to an embodiment of the present invention. The audio-visual playback device 1 in FIG. 3 is a notebook computer by way of example.

First, after the audio-visual playback device 1 loads the video F, the processing unit 11 may use transcoding software (for example, ffmpeg) to convert the video F into a format supported by the audio-visual playback device 1 (for example, from MKV or MP4 format to FLV format). At the same time it identifies basic data of the video F, such as its resolution and duration, for use in subsequent processing.

Next, the processing unit 11 extracts a plurality of time-ordered frames from the video F (step S1), for example the frames F1 to F5, ... shown in FIG. 2A. After extracting the frames, the processing unit 11 uses image processing software (for example, OpenCV) to convert them into grayscale frames for the SSIM calculation of step S2. Frames before and after conversion are both referred to herein simply as "frames".

The processing unit 11 then calculates the structural similarity between each two adjacent frames (step S2). Specifically, the processing unit 11 calculates the SSIM value for every pair of adjacent frames (the SSIM value between frames F1 and F2, between frames F2 and F3, and so on) until the SSIM values corresponding to all frames have been obtained. The SSIM value between frames F1 and F2 is the SSIM value corresponding to frame F1, the SSIM value between frames F2 and F3 is the SSIM value corresponding to frame F2, and so on. With 100 frames, there are 99 corresponding SSIM values.
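The indexing convention above (one score per adjacent pair, attributed to the earlier frame) can be sketched as follows. The names `adjacent_similarities` and `toy_similarity` are hypothetical; the toy function is only a stand-in for SSIM so that the example stays self-contained.

```python
def adjacent_similarities(frames, similarity):
    """Return one score per adjacent pair: scores[i] compares frames[i]
    with frames[i + 1] and is the score "corresponding to" frames[i]."""
    return [similarity(a, b) for a, b in zip(frames, frames[1:])]


def toy_similarity(a, b):
    # Hypothetical stand-in for SSIM: 1.0 for identical frames, else 0.0.
    return 1.0 if a == b else 0.0


# Five frames in time order: three of one slide, then two of another.
frames = ["slide-1"] * 3 + ["slide-2"] * 2
scores = adjacent_similarities(frames, toy_similarity)
print(len(frames), len(scores))  # n frames always yield n - 1 scores
```

In this toy run the only low score sits at index 2, the last frame of the first slide, which is the "first frame" in the sense of step S31.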

The processing unit 11 then checks the SSIM values in order and makes the judgment of step S3. When the structural similarity between a first frame and the adjacent second frame that follows it is less than a preset value (step S31), it checks whether the SSIM values corresponding to all frames after the first frame and within a preset time are greater than or equal to the preset value (step S32). That is, it checks the SSIM values corresponding to the second frame, the third frame, the fourth frame, and so on that appear after the first frame within the preset time. If the result is "yes" for every one of these frames, the second frame is designated as a bookmark frame (step S33), and the image of the bookmark frame and its time of appearance are stored in the memory unit 12 (step S4). If the SSIM value corresponding to any one of these frames is less than the preset value, the result of step S32 is "no".

Specifically, if the SSIM value corresponding to frame F1 is less than the preset value, the images of frames F1 and F2 differ greatly, and the SSIM values corresponding to the frames F2, F3, ... that appear within the subsequent preset time are then checked one by one. If all of these SSIM values are greater than or equal to the preset value, the frames appearing within the preset time differ little from one another (they are pictures of the same video clip), so frame F2 is a bookmark frame, and the processing unit 11 stores the image of the bookmark frame (frame F2) and its time of appearance in the memory unit 12.

After the SSIM value corresponding to frame F1 has been checked, the processing unit 11 goes on to check the SSIM value corresponding to frame F2, the frame after F1; that is, it returns to step S31. This continues until the SSIM values corresponding to all frames have been checked, at which point all bookmark frames in the video F and their corresponding video clips have been found. Note that the first frame, second frame, third frame, and so on in the steps above are consecutive time-ordered frames of the video F. They may be the very first frames of the video F, or consecutive frames appearing at any other time in the video F (for example, in the middle).

For instructional videos, the preset value may be between 0.8 and 0.9, for example 0.85, and the preset time may be between 1 second and 3 seconds, for example 2 seconds. In other words, in an instructional video, if the SSIM value corresponding to a certain frame (the first frame) is less than, say, 0.85, and the SSIM values corresponding to all frames appearing within, say, 2 seconds after the first frame (the second frame, third frame, fourth frame, and so on) are greater than or equal to 0.85, then the second frame, which follows the first frame, is regarded as a bookmark frame (such as bookmark frames BF1 and BF2 in FIG. 2B). The video clip between two bookmark frames is the clip corresponding to the earlier bookmark frame; for example, video clip 1 between bookmark frames BF1 and BF2 is the clip corresponding to bookmark frame BF1. Note that when the video is not an instructional video, suitable ranges for the preset value and preset time may differ, depending on the characteristics of the video.
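The mapping from bookmark frames to video clips described above can be sketched in a few lines. The function name `clips_from_bookmarks` is hypothetical, and letting the last clip run to the end of the video is an assumption, since the text only describes clips that lie between two bookmark frames.

```python
def clips_from_bookmarks(bookmark_times, video_duration):
    """Map each bookmark time (seconds) to the (start, end) of its clip.

    A clip starts at its bookmark frame and ends at the next bookmark frame.
    Assumption: the final clip ends at the end of the video.
    """
    starts = sorted(bookmark_times)
    ends = starts[1:] + [video_duration]
    return list(zip(starts, ends))


# Bookmark frames BF1 at 1.0 s and BF2 at 4.5 s in a 10-second video.
print(clips_from_bookmarks([1.0, 4.5], 10.0))
```

Here the first tuple plays the role of video clip 1 in FIG. 2B, the span between BF1 and BF2.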

Conversely, if the result of step S31 is "no", the first and second frames do not differ enough (they may be related frames of the same video clip), so the process returns to step S31 and checks whether the SSIM value corresponding to the second frame satisfies the condition. Likewise, if the result of step S31 is "yes" but the result of step S32 is "no", the process returns to step S31. Once the SSIM values corresponding to all frames in the video F have been checked, all bookmark frames in the video F have been found.
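The decision loop of steps S31 to S33 can be sketched as below, operating on the list of adjacent-frame SSIM values. The function name `find_bookmark_frames` and the `fps` parameter (used to turn the preset time into a number of frames) are assumptions for illustration; the defaults 0.85 and 2 seconds are the example values given in the text. Skipping a candidate whose look-ahead window runs past the end of the video is also an assumption, as the text does not cover that case.

```python
def find_bookmark_frames(scores, fps, threshold=0.85, hold_seconds=2.0):
    """scores[i] is the similarity between frame i and frame i + 1.

    Returns a list of (frame_index, time_in_seconds), one entry per frame
    designated as a bookmark frame.
    """
    window = max(1, round(hold_seconds * fps))   # frames within the preset time
    bookmarks = []
    for i, score in enumerate(scores):
        if score >= threshold:                   # step S31 is "no": move on
            continue
        following = scores[i + 1 : i + 1 + window]
        if len(following) < window:              # assumption: incomplete window
            continue
        if all(s >= threshold for s in following):       # step S32 is "yes"
            bookmarks.append((i + 1, (i + 1) / fps))     # steps S33 and S4
    return bookmarks


# Synthetic scores for a 2 fps video, so the 2-second window spans 4 frames.
demo_scores = [0.99, 0.5, 0.95, 0.96, 0.97, 0.98, 0.4,
               0.9, 0.3, 0.99, 0.99, 0.99, 0.99]
print(find_bookmark_frames(demo_scores, fps=2))
```

In the synthetic data, the drop at index 6 is rejected because another drop follows within the window, which mirrors the "yes at S31, no at S32" branch.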

After a bookmark frame is found, the processing unit 11 may perform image text recognition on it to generate a bookmark corresponding to the bookmark frame, and store the bookmark in the memory unit 12. In other words, after obtaining the bookmark frames BF1, BF2, ..., the processing unit 11 may use image recognition software (for example, Google Cloud Vision) to recognize the text messages in each bookmark frame. A text-emphasis weighting algorithm is then applied to these text messages to find the representative description of each frame. The weighting formula is defined over the following terms.

F1 is the proportion of a typeface among all typefaces in the picture, C is the color weight, SZ is the font size, T is the total number of bookmark frames in the video, CT is the number of frames in the video that contain the text, and NLP is the speech-emphasis weight. F1 is used to detect special typefaces in the frame, such as bold or italic; such typefaces account for a small proportion of the text but mostly mark important content. C is a user-defined weight based on commonly understood text colors, for example red highest and black lowest. SZ is the font size of the text message, in pixels. The term log(T/CT) computes the inverse document frequency, which suppresses text messages that recur across many bookmark frames. NLP is the TextRank weight of the text message obtained by analyzing the speech track.

The text message with the highest weight in a bookmark frame is selected as the representative description of that bookmark frame (BF1, BF2, ...), and the representative description becomes the name of the bookmark. The other text messages, after word segmentation, become the keywords of the bookmark frame, so that the user can search for the corresponding bookmark frame, and hence the corresponding video clip, by video content. The name of the bookmark may of course also be one of the keywords of the bookmark frame. Besides text, the processing unit 11 may also capture part or all of the image in the bookmark frame to represent it as the bookmark. In other words, the bookmark corresponding to a bookmark frame may contain text representing the bookmark frame, or a combination of text and image. The processing unit 11 may also store the associations among the bookmarks, the bookmark frames, and the video clips.
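Of the weighting terms above, the inverse-frequency factor is the one whose form can be read directly from the definitions of T and CT. The sketch below shows only that factor; how it is combined with the typeface, color, size, and TextRank terms is not recoverable from this text and is deliberately left out. The function name and the choice of the natural logarithm are assumptions.

```python
import math


def inverse_frame_frequency(total_bookmark_frames, frames_containing_text):
    """log(T / CT): near zero for text that appears in almost every bookmark
    frame, larger for text that appears in only a few of them."""
    if total_bookmark_frames <= 0 or frames_containing_text <= 0:
        raise ValueError("T and CT must both be positive")
    # Natural logarithm is an assumption; the base is not stated in the text.
    return math.log(total_bookmark_frames / frames_containing_text)
```

A course title printed on every slide (CT equal to T) therefore contributes a factor of zero, which is how repetitive text is kept from becoming a bookmark name.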

Finally, the processing unit 11 may further perform the following step: causing the display unit 13 to present a user interface UI. As shown in FIG. 3, the user interface UI presented by the display unit 13 includes a video playback area A1 and a bookmark presentation area A2. As the names imply, the video playback area A1 is the area in which the video is played, and the bookmark presentation area A2 is the area in which at least one bookmark (name) is displayed. Therefore, after obtaining a bookmark, the processing unit 11 can control the bookmark presentation area A2 of the user interface UI to display the bookmark corresponding to the bookmark frame, that is, to display the bookmark name corresponding to the bookmark frame in the bookmark presentation area A2. Of course, if the video has multiple bookmarks, such as the bookmarks BM1, BM2, BM3, and so on in FIG. 3 (one bookmark name corresponds to one bookmark frame and one video clip), the bookmarks BM1, BM2, BM3, and so on can be displayed as a list in the bookmark presentation area A2. Moreover, when the user clicks a bookmark in the bookmark presentation area A2, for example the bookmark BM1, the bookmark frame BF1 corresponding to the bookmark BM1 can be displayed in the video playback area A1, and the corresponding video clip 1 can of course also be played. In addition, in some embodiments, the processing unit 11 can also automatically generate a text file containing the bookmark frames and their transcripts as the user's study notes, saving the user note-taking and study time.
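
As an illustration of the automatically generated study notes, the following minimal sketch formats one section per bookmark in time order. The `Bookmark` record, its field names, and the note layout are assumptions made for this example; the specification does not define a file format.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    name: str        # representative description shown in area A2
    time_s: float    # appearance time of the bookmark frame, in seconds
    transcript: str  # transcript of the corresponding video clip


def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def build_notes(bookmarks):
    """Return study notes as plain text, one section per bookmark in time order."""
    sections = []
    for bm in sorted(bookmarks, key=lambda b: b.time_s):
        header = f"[{format_timestamp(bm.time_s)}] {bm.name}"
        sections.append(f"{header}\n{bm.transcript}")
    return "\n\n".join(sections)


# Hypothetical bookmarks, deliberately given out of order.
notes = build_notes([
    Bookmark("B", 75.0, "second"),
    Bookmark("A", 3.5, "first"),
])
```

The returned string could then be written to a text file; keeping the formatting separate from the file output makes the sketch easy to check.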

In addition to the video playback area A1 and the bookmark presentation area A2, the user interface UI of this embodiment may also include a bookmark search area A3 disposed adjacent to the bookmark presentation area A2. The bookmark search area A3 can be located above, below, to the left of, or to the right of the bookmark presentation area A2, without limitation. When the user enters a keyword in the bookmark search area A3 and searches, the corresponding bookmark can be found. The keyword can be the name of the bookmark or the text content of the bookmark frame corresponding to the bookmark. After the corresponding bookmark is found and the user clicks on it, the video playback area A1 can display the bookmark frame corresponding to the keyword, and the user can press play to watch the video clip corresponding to that bookmark frame in the video playback area A1. Users therefore do not need to drag the timeline from memory to find the content they want to watch; searching by bookmark lets them browse the video in a smarter way and saves viewing time.
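
The search behavior of the bookmark search area A3 can be sketched as follows. This is a minimal illustration that assumes each bookmark is stored as a dictionary with a `name` and a list of `keywords` taken from the bookmark frame's text; the dictionary layout, the case-insensitive substring matching, and the sample data are assumptions, not details given in the specification.

```python
def search_bookmarks(bookmarks, keyword):
    """Return bookmarks whose name or frame keywords contain the keyword."""
    needle = keyword.strip().lower()
    if not needle:
        return []  # an empty query matches nothing
    matches = []
    for bm in bookmarks:
        in_name = needle in bm["name"].lower()
        in_text = any(needle in k.lower() for k in bm["keywords"])
        if in_name or in_text:
            matches.append(bm)
    return matches


# Hypothetical bookmarks BM1 and BM2 with their appearance times in seconds.
bookmarks = [
    {"name": "Quicksort", "keywords": ["pivot", "partition"], "time_s": 12.0},
    {"name": "Shortest paths", "keywords": ["Dijkstra", "graph"], "time_s": 95.5},
]
hits_by_name = search_bookmarks(bookmarks, "quick")
hits_by_text = search_bookmarks(bookmarks, "dijkstra")
```

Each hit carries its appearance time, so a player could display the matching bookmark frame and start playback of the corresponding clip from that time.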

Therefore, by using the above bookmark frame generation method and image text analysis technology, the audio-visual playback device 1 of the present invention can obtain the text content of the bookmark frames and thereby the corresponding bookmarks (and keywords), so that the user can perform keyword queries in the user interface UI displayed by the display unit 13. Moreover, the bookmarks displayed in the user interface UI let the user jump quickly to the corresponding video positions, and also fast-forward and rewind through the video segment by segment. Therefore, even in a long video the user can accurately find the desired content in a short time, which increases the user's grasp of the video and enhances its value.
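
Segment-wise fast-forward and rewind can be sketched as a lookup over the sorted appearance times of the bookmark frames. The function names and the use of binary search are illustrative assumptions; the specification does not say how the jump target is computed.

```python
from bisect import bisect_left, bisect_right


def next_bookmark(times, position):
    """Return the first bookmark time strictly after position, or None."""
    i = bisect_right(times, position)
    return times[i] if i < len(times) else None


def previous_bookmark(times, position):
    """Return the last bookmark time strictly before position, or None."""
    i = bisect_left(times, position)
    return times[i - 1] if i > 0 else None


# Hypothetical appearance times (seconds) of three bookmark frames, sorted.
times = [12.0, 95.5, 240.0]
forward = next_bookmark(times, 100.0)
backward = previous_bookmark(times, 100.0)
```

Because the times are kept sorted, each jump takes logarithmic time regardless of how many bookmarks the video has.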

In summary, in the method for generating bookmark frames, the audio-visual playback device for automatically generating bookmarks, and the user interface thereof of the present invention, the bookmark frames in a video and their corresponding bookmarks can be found by calculating the structural similarity between two adjacent frames and applying the judgment conditions. These bookmarks can be displayed in the user interface, and through them the user can quickly jump to the video position to be queried and thereby find the content to be watched. Therefore, the present invention lets users browse videos in a smarter way and saves time spent searching for and watching video clips, and it is especially suitable for teaching videos for digital learning.
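
The judgment condition on structural similarity can be sketched as follows. This is one possible reading, under stated assumptions: the similarities between adjacent frames are computed beforehand (for example with an SSIM routine such as scikit-image's `structural_similarity`), the frames are sampled at a constant rate `fps`, and a change is accepted only when a full window of later similarities is available. The function name and the defaults of 0.85 and 2 seconds are illustrative picks from within the claimed ranges.

```python
def find_bookmark_frames(sims, fps, preset_value=0.85, preset_time=2.0):
    """Return the indices of frames designated as bookmark frames.

    sims[i] is the structural similarity between frame i and frame i + 1.
    """
    # Number of later adjacent pairs that must stay at or above the preset value.
    hold = int(round(preset_time * fps))
    bookmark_frames = []
    for i, s in enumerate(sims):
        if s >= preset_value:
            continue  # no significant change between frame i and frame i + 1
        window = sims[i + 1 : i + 1 + hold]
        # Require a full window so a change near the end is not accepted untested.
        if len(window) == hold and all(v >= preset_value for v in window):
            bookmark_frames.append(i + 1)  # the second frame of the pair
    return bookmark_frames


# Hypothetical similarities for frames sampled at 2 frames per second.
sims = [0.99, 0.5, 0.95, 0.97, 0.4, 0.6, 0.99, 0.98, 0.3]
frames = find_bookmark_frames(sims, fps=2, preset_time=1.0)
times_s = [index / 2 for index in frames]  # appearance times in seconds
```

In the sample data, the change at index 4 is rejected because the picture changes again immediately afterwards, which matches the idea that an important frame stays on screen long enough to be read.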

The above descriptions are merely illustrative and not restrictive. Any equivalent modification or alteration that does not depart from the spirit and scope of the present invention shall be included in the scope of the appended claims.

S1, S2, S3, S31, S32, S33, S4: steps

Claims

A method for generating bookmark frames, comprising: extracting a plurality of frames sorted by time from a video; calculating the structural similarity between two adjacent frames of the frames; when the structural similarity between a first frame of the frames and the adjacent following second frame is less than a preset value, and the structural similarities corresponding to all frames after the first frame and within a preset time are greater than or equal to the preset value, designating the second frame as a bookmark frame; and storing the image of the bookmark frame and its appearance time.

The generating method according to claim 1, wherein the preset value is between 0.8 and 0.9.

The generating method according to claim 1, wherein the preset time is between 1 second and 3 seconds.

An audio-visual playback device for automatically generating bookmarks, comprising: one or more processing units; and a memory unit electrically connected to the one or more processing units, the memory unit storing one or more program instructions, wherein, when the one or more program instructions are executed by the one or more processing units, the one or more processing units at least perform: extracting a plurality of frames sorted by time from a video; calculating the structural similarity between two adjacent frames of the frames; when the structural similarity between a first frame of the frames and the adjacent following second frame is less than a preset value, and the structural similarities corresponding to all frames after the first frame and within a preset time are greater than or equal to the preset value, designating the second frame as a bookmark frame and storing the image of the bookmark frame and its appearance time; and performing image text recognition on the bookmark frame to generate a bookmark.

The audio-visual playback device according to claim 4, wherein the preset value is between 0.8 and 0.9.

The audio-visual playback device according to claim 4, wherein the preset time is between 1 second and 3 seconds.

The audio-visual playback device according to claim 4, wherein the bookmark contains text, or a combination of text and image.

The audio-visual playback device according to claim 4, wherein the one or more processing units further perform the following steps: storing the bookmark in the memory unit.

The audio-visual playback device according to claim 4, further comprising: a display unit electrically connected to the one or more processing units, wherein the one or more processing units further perform the following steps: causing the display unit to present a user interface, wherein the user interface includes a bookmark presentation area; and displaying the bookmark in the bookmark presentation area.

The audio-visual playback device according to claim 9, wherein there are a plurality of the bookmarks, and the bookmarks are displayed as a list in the bookmark presentation area.

A user interface of an audio-visual playback device, the audio-visual playback device including a display unit for presenting the user interface, the user interface comprising: a bookmark presentation area displaying at least one bookmark, wherein the bookmark is automatically generated by the audio-visual playback device according to the following steps: extracting a plurality of frames sorted by time from a video; calculating the structural similarity between two adjacent frames of the frames; when the structural similarity between a first frame of the frames and the adjacent following second frame is less than a preset value, and the structural similarities corresponding to all frames after the first frame and within a preset time are greater than or equal to the preset value, designating the second frame as a bookmark frame; storing the image of the bookmark frame and its appearance time; and performing image text recognition on the bookmark frame to generate the bookmark; and a video playback area adjacent to the bookmark presentation area; wherein, when the user clicks on the bookmark, the audio-visual playback device plays the video clip corresponding to the bookmark in the video playback area.

The user interface according to claim 11, wherein the bookmark includes text, or a combination of text and image.

The user interface according to claim 11, wherein there are a plurality of the bookmarks, and the bookmarks are displayed as a list in the bookmark presentation area.

The user interface according to claim 11, further comprising: a bookmark search area adjacent to the bookmark presentation area; wherein, when the user enters a keyword in the bookmark search area and searches, the video playback area displays the bookmark frame corresponding to the keyword.