TWI699663B - Segmentation method, segmentation system and non-transitory computer-readable medium - Google Patents

Segmentation method, segmentation system and non-transitory computer-readable medium

Info

Publication number
TWI699663B
TWI699663B
Authority
TW
Taiwan
Prior art keywords
subtitle
paragraph
sentence
segmentation
sentences
Prior art date
Application number
TW108104097A
Other languages
Chinese (zh)
Other versions
TW202011232A (en)
Inventor
藍國誠
詹詩涵
Original Assignee
台達電子工業股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 台達電子工業股份有限公司
Publication of TW202011232A
Application granted
Publication of TWI699663B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying
    • G06F16/435Filtering based on additional data, e.g. user or group profiles
    • G06F16/437Administration of user profiles, e.g. generation, initialisation, adaptation, distribution

Abstract

The present disclosure relates to a segmentation method, a segmentation system and a non-transitory computer-readable medium. The segmentation method includes the following operations: receiving captioning information, wherein the captioning information includes a plurality of captioning sentences; selecting the captioning sentences according to a default value and dividing the selected captioning sentences into a first paragraph; performing a common segmentation vocabulary judgment on a first captioning sentence, wherein the first captioning sentence is one of the captioning sentences; and generating a second paragraph or merging the first captioning sentence into the first paragraph according to a judgment result of the common segmentation vocabulary judgment.

Description

Segmentation method, segmentation system and non-transitory computer-readable medium

The present disclosure relates to a segmentation method, a segmentation system and a non-transitory computer-readable medium, and more particularly to a segmentation method, a segmentation system and a non-transitory computer-readable medium for subtitles.

An online learning platform is a network service that stores a large body of learning materials on a server so that users can connect to the server over the Internet and browse the materials at any time. Current online learning platforms offer learning materials such as videos, audio, presentations, documents and forums.

Because an online learning platform stores an enormous amount of learning material, the text of the material needs to be segmented automatically and paragraph keywords need to be generated so that users can navigate it conveniently. How to process the differences between the contents of a learning video so that similar topics in the video are segmented and tagged with keywords is therefore a problem to be solved in this field.

A first aspect of the present disclosure provides a segmentation method. The segmentation method includes the following steps: receiving subtitle information, wherein the subtitle information includes a plurality of subtitle sentences; selecting subtitle sentences according to a set value and dividing the selected subtitle sentences into a first paragraph; performing a common segmentation vocabulary judgment on a first subtitle sentence, wherein the first subtitle sentence is one of the subtitle sentences; and generating a second paragraph or merging the first subtitle sentence into the first paragraph according to a judgment result of the common segmentation vocabulary judgment.

A second aspect of the present disclosure provides a segmentation system that includes a storage unit and a processor. The storage unit stores the subtitle information, a segmentation result, an annotation corresponding to the first paragraph and an annotation corresponding to the second paragraph. The processor is electrically connected to the storage unit and receives the subtitle information, the subtitle information including a plurality of subtitle sentences. The processor includes a segmentation unit, a common word detection unit and a paragraph generation unit. The segmentation unit selects subtitle sentences in a specific order using a set value and divides the selected subtitle sentences into the first paragraph. The common word detection unit is electrically connected to the segmentation unit and performs a common segmentation vocabulary judgment on a first subtitle sentence, the first subtitle sentence being one of the subtitle sentences. The paragraph generation unit is electrically connected to the common word detection unit and generates a second paragraph or merges the first subtitle sentence into the first paragraph according to a judgment result of the common segmentation vocabulary judgment.

A third aspect of the present disclosure provides a non-transitory computer-readable medium containing at least one instruction program that is executed by a processor to perform a segmentation method comprising the following steps: receiving subtitle information, wherein the subtitle information includes a plurality of subtitle sentences; selecting subtitle sentences according to a set value and dividing the selected subtitle sentences into a first paragraph; performing a common segmentation vocabulary judgment on a first subtitle sentence, wherein the first subtitle sentence is one of the subtitle sentences; and generating a second paragraph or merging the first subtitle sentence into the first paragraph according to a judgment result of the common segmentation vocabulary judgment.

The segmentation method, segmentation system and non-transitory computer-readable medium of the present disclosure mainly address the problem that video paragraphs were previously marked manually, which consumes considerable manpower and time. The keywords corresponding to each subtitle sentence are computed first, a common segmentation vocabulary judgment is then performed on each subtitle sentence, and a second paragraph is generated or the first subtitle sentence is merged into the first paragraph according to the judgment result, so as to produce a segmentation result. This achieves the function of segmenting similar topics in a learning video and tagging them with keywords.

Several embodiments of the present disclosure are described below with reference to the drawings. For clarity, many practical details are explained in the following description. It should be understood, however, that these practical details are not intended to limit the disclosure; in some embodiments of the present disclosure they are unnecessary. In addition, to simplify the drawings, some conventional structures and elements are shown in a simple schematic manner.

In this document, when an element is referred to as being "connected" or "coupled", it may mean "electrically connected" or "electrically coupled". "Connected" or "coupled" may also mean that two or more elements operate or interact with each other. Moreover, although terms such as "first" and "second" are used herein to describe different elements, the terms merely distinguish elements or operations described with the same technical term. Unless the context clearly indicates otherwise, the terms neither refer to nor imply an order or sequence, nor are they intended to limit the present invention.

Reference is made to FIG. 1, which is a schematic diagram of a segmentation system 100 according to some embodiments of the present disclosure. As shown in FIG. 1, the segmentation system 100 includes a storage unit 110 and a processor 130. The storage unit 110 is electrically connected to the processor 130 and stores subtitle information, a segmentation result, a common segmentation vocabulary database DB1, a course database DB2, an annotation corresponding to the first paragraph, and an annotation corresponding to the second paragraph.

Following the above, the processor 130 includes a keyword extraction unit 131, a segmentation unit 132, a common word detection unit 133, a paragraph generation unit 134 and an annotation generation unit 135. The segmentation unit 132 is electrically connected to the keyword extraction unit 131 and the common word detection unit 133, the paragraph generation unit 134 is electrically connected to the common word detection unit 133 and the annotation generation unit 135, and the common word detection unit 133 is electrically connected to the annotation generation unit 135.

In various embodiments of the present invention, the storage unit 110 may be implemented as a memory, a hard disk, a flash drive, a memory card, and so on. The processor 130 may be implemented as an integrated circuit such as a microcontroller, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC), a logic circuit, other similar elements, or a combination of the above elements.

Reference is made to FIG. 2, which is a flowchart of a segmentation method 200 according to some embodiments of the present disclosure. In one embodiment, the segmentation method 200 shown in FIG. 2 may be applied to the segmentation system 100 of FIG. 1, and the processor 130 segments the subtitle information according to the steps of the segmentation method 200 described below to produce a segmentation result and an annotation corresponding to each paragraph. As shown in FIG. 2, the segmentation method 200 first executes step S210 to receive subtitle information. In one embodiment, the subtitle information includes a plurality of subtitle sentences. For example, the subtitle information is the subtitle file of a video; the subtitle file has already divided the video content into a plurality of subtitle sentences according to the playback time of the video, and the subtitle sentences are ordered by playback time.
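
For illustration only, a minimal sketch of how the subtitle information of step S210 could be represented in code; the `SubtitleSentence` type and the pre-parsed `(start_time, text)` input are assumptions, since the patent only requires that the subtitle sentences be ordered by playback time.

```python
from dataclasses import dataclass

@dataclass
class SubtitleSentence:
    index: int         # position in playback order (1-based)
    start_time: float  # seconds from the start of the video
    text: str          # text of the subtitle sentence

def receive_caption_info(raw_entries):
    """Step S210 (sketch): build the ordered list of subtitle sentences from
    pre-parsed (start_time, text) pairs taken from the video's subtitle file."""
    ordered = sorted(raw_entries, key=lambda entry: entry[0])
    return [SubtitleSentence(i, start, text)
            for i, (start, text) in enumerate(ordered, start=1)]
```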

Next, the segmentation method 200 executes step S220 to select subtitle sentences according to a set value and divide the selected subtitle sentences into the current paragraph. In one embodiment, the set value may be any positive integer; a set value of 3 is used here as an example, so in this step three subtitle sentences are selected according to the playback time of the video to form the current paragraph. For example, if there are N subtitle sentences in total, the 1st to 3rd subtitle sentences may be selected to form the current paragraph.
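
As a minimal sketch (assuming the set value of 3 used in the example above), step S220 can be pictured as taking the first `set_value` sentences, in playback order, as the current paragraph:

```python
def init_current_paragraph(sentences, set_value=3):
    """Step S220 (sketch): the first `set_value` subtitle sentences, in playback
    order, form the current paragraph; the returned index points at the sentence
    that step S230 examines next."""
    current_paragraph = list(sentences[:set_value])
    next_index = min(set_value, len(sentences))
    return current_paragraph, next_index
```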

Next, the segmentation method 200 executes step S230 to perform a common segmentation vocabulary judgment on the current subtitle sentence. In one embodiment, the common segmentation vocabularies are stored in the common segmentation vocabulary database DB1, and the common word detection unit 133 detects whether a common segmentation vocabulary appears. Common segmentation vocabularies can be divided into common opening words and common ending words. For example, common opening words may be "接下來" ("next") or "開始說明" ("let us begin with"), and common ending words may be "以上說明到此" ("this concludes the explanation") or "今天到這裡告一段落" ("we will stop here for today"). In this step, both the presence of a common segmentation vocabulary and its type (common opening word or common ending word) are detected.
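
A minimal sketch of the detection performed by the common word detection unit 133, assuming DB1 is simply two word lists and that a substring match is sufficient (the patent does not specify the matching rule):

```python
# Hypothetical contents of the common segmentation vocabulary database DB1.
OPENING_WORDS = ["接下來", "開始說明"]                # "next", "let us begin with"
ENDING_WORDS = ["以上說明到此", "今天到這裡告一段落"]  # "this concludes the explanation", "we will stop here for today"

def detect_common_segment_word(sentence_text):
    """Step S230 (sketch): report whether a common segmentation vocabulary appears
    in the sentence and, if so, whether it is an opening or an ending word."""
    if any(word in sentence_text for word in OPENING_WORDS):
        return "opening"
    if any(word in sentence_text for word in ENDING_WORDS):
        return "ending"
    return None
```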

Next, the segmentation method 200 executes step S240 to generate the next paragraph or merge the current subtitle sentence into the current paragraph according to the judgment result of the common segmentation vocabulary judgment. In one embodiment, based on the detection result of the common word detection unit 133 described above, it is decided whether to generate a new paragraph or to merge the currently processed subtitle sentence into the current paragraph. For example, if the current paragraph consists of the 1st to 3rd subtitle sentences, the currently processed subtitle sentence may be the 4th subtitle sentence; according to the judgment result, the 4th subtitle sentence is either merged into the current paragraph or used as the beginning of a new paragraph.

Following the above, after step S240 merges the current subtitle sentence into the current paragraph, the common segmentation vocabulary judgment is performed on the next subtitle sentence, so the judgment of step S230 is executed again. For example, after the 4th subtitle sentence is merged into the current paragraph, the common segmentation vocabulary judgment is performed on the 5th subtitle sentence. If step S240 generates the next paragraph, the set value is then used to select subtitle sentences in a specific order and the selected subtitle sentences are divided into the next paragraph, so the operation of step S220 is executed again. For example, if the 4th subtitle sentence is classified into the next paragraph, the 5th, 6th and 7th subtitle sentences are selected and added to the next paragraph. The segmentation operation is therefore repeated until all subtitle sentences have been segmented, and the segmentation result is finally produced.
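
Putting steps S220 to S240 together, one possible sketch of the repeated segmentation loop is shown below; `detect_common_segment_word` is the hypothetical helper sketched above, and `is_similar` stands for the TF-IDF-based similarity check sketched later in this description (both names are assumptions, not part of the patent):

```python
def segment(sentences, set_value=3, threshold=0.5):
    """Sketch of the loop over steps S220-S240: each subtitle sentence is either
    merged into the current paragraph or used to start the next paragraph."""
    paragraphs = []
    current = list(sentences[:set_value])   # step S220
    i = set_value
    while i < len(sentences):
        if len(sentences) - i < set_value:  # fewer remaining sentences than the set value:
            current.extend(sentences[i:])   # merge them directly into the current paragraph
            break
        sentence = sentences[i]
        kind = detect_common_segment_word(sentence.text)    # step S230
        if kind == "opening":                                # step S2412
            paragraphs.append(current)
            current = list(sentences[i:i + set_value])       # sentence starts the next paragraph
            i += set_value
        elif kind == "ending":                               # step S2413
            current.append(sentence)
            paragraphs.append(current)
            current = list(sentences[i + 1:i + 1 + set_value])
            i += 1 + set_value
        elif is_similar(sentence, current, threshold):       # steps S2421-S2422
            current.append(sentence)
            i += 1
        else:                                                # step S2423
            paragraphs.append(current)
            current = list(sentences[i:i + set_value])
            i += set_value
    if current:
        paragraphs.append(current)
    return paragraphs
```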

Step S240 further includes steps S241 and S242. Reference is made to FIG. 3, which is a flowchart of step S240 according to some embodiments of the present disclosure. As shown in FIG. 3, the segmentation method 200 further executes step S241: if the current subtitle sentence is associated with a common segmentation vocabulary, a segmentation process is performed to generate the next paragraph, and the set value is used to select subtitle sentences in a specific order and add the selected subtitle sentences to the next paragraph. Step S241 further includes steps S2411 to S2413. Reference is made to FIG. 4, which is a flowchart of step S241 according to some embodiments of the present disclosure. As shown in FIG. 4, the segmentation method 200 further executes step S2411 to decide, according to the judgment result, whether the current subtitle sentence is associated with one of an opening segmentation vocabulary and an ending segmentation vocabulary. Following the above embodiment, the judgment result of step S230 determines whether the current subtitle sentence is associated with an opening segmentation vocabulary or an ending segmentation vocabulary.

Following the above, the segmentation method 200 further executes step S2412: if the current subtitle sentence is associated with an opening segmentation vocabulary, the current subtitle sentence is used as the starting sentence of the next paragraph. For example, if the aforementioned judgment result detects the word "接下來" ("next") in the 4th subtitle sentence, the 4th subtitle sentence is used as the starting sentence of the next paragraph.

Following the above, the segmentation method 200 further executes step S2413: if the current subtitle sentence is associated with an ending segmentation vocabulary, the current subtitle sentence is used as the ending sentence of the current paragraph. For example, if the aforementioned judgment result detects the phrase "以上說明到此" ("this concludes the explanation") in the 4th subtitle sentence, the 4th subtitle sentence is used as the ending sentence of the current paragraph. After the operation of step S241 is completed, the set value is used to select subtitle sentences in a specific order and the selected subtitle sentences are divided into the next paragraph, so the operation of step S220 is executed again, which is not repeated here.

Next, the segmentation method 200 further executes step S242: if the current subtitle sentence is not associated with a common segmentation vocabulary, a similarity value calculation is performed between the current subtitle sentence and the current paragraph, and if they are similar, the current subtitle sentence is merged into the current paragraph. Step S242 further includes steps S2421 to S2423. Reference is made to FIG. 5, which is a flowchart of step S242 according to some embodiments of the present disclosure. As shown in FIG. 5, the segmentation method 200 further executes step S2421 to compare whether a difference value between at least one feature corresponding to the current subtitle sentence and at least one feature corresponding to the current paragraph is greater than a threshold.

Following the above, in one embodiment a plurality of keywords are extracted from the subtitle sentences, and the extracted keywords are the at least one feature corresponding to the current subtitle sentence. The keywords corresponding to a subtitle sentence are computed with the TF-IDF (Term Frequency-Inverse Document Frequency) statistical method. TF-IDF evaluates how important a word is to a document in a corpus: the importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to the frequency with which it appears in the corpus. In this embodiment, the TF-IDF statistical method computes the keywords of the current subtitle sentence. Then a similarity value between the at least one feature (keyword) of the current subtitle sentence and the at least one feature (keyword) of the current paragraph is computed; the higher the similarity value, the closer the content of the current subtitle sentence is to that of the current paragraph.
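
A minimal sketch of the keyword and similarity computation described above; the tokenization (a plain `split()` stands in for a real tokenizer), the number of keywords kept, and the use of keyword-set overlap as the similarity value are all assumptions, since the patent only specifies that TF-IDF extracts the keywords and that a similarity value is computed from the features:

```python
import math
from collections import Counter

def tfidf_keywords(tokens, corpus, top_k=5):
    """Extract keywords from one subtitle sentence (a list of tokens) using
    TF-IDF against a corpus of token lists (e.g. all subtitle sentences)."""
    tf = Counter(tokens)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc)        # document frequency
        idf = math.log(len(corpus) / (1 + df))
        scores[term] = (count / len(tokens)) * idf
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:top_k])

def is_similar(sentence, paragraph, threshold, corpus=()):
    """Hypothetical similarity check for steps S2421-S2422: compare the keyword
    set of the current subtitle sentence with the keyword set of the current
    paragraph, using Jaccard overlap as the (unspecified) similarity measure."""
    corpus = corpus or [s.text.split() for s in paragraph] + [sentence.text.split()]
    sentence_keywords = tfidf_keywords(sentence.text.split(), corpus)
    paragraph_keywords = set()
    for s in paragraph:
        paragraph_keywords |= tfidf_keywords(s.text.split(), corpus)
    if not sentence_keywords or not paragraph_keywords:
        return False
    overlap = len(sentence_keywords & paragraph_keywords) / len(sentence_keywords | paragraph_keywords)
    return overlap >= threshold
```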

Following the above, the segmentation method 200 further executes step S2422: if the difference value is less than the threshold, the current subtitle sentence is merged into the current paragraph. In one embodiment, the threshold is used to screen the similarity value; when the similarity value is not less than the threshold, the content of the current subtitle sentence is relatively similar to that of the current paragraph, so the current subtitle sentence can be merged into the current paragraph. For example, if the similarity value between the 4th subtitle sentence and the current paragraph is not less than the threshold, the content of the 4th subtitle sentence is relatively similar to that of the current paragraph, so the 4th subtitle sentence is added to the current paragraph.

Following the above, the segmentation method 200 further executes step S2423: if the difference value is not less than the threshold, the current subtitle sentence is used as the starting sentence of the next paragraph, and the set value is used to select subtitle sentences in a specific order and divide the selected subtitle sentences into the next paragraph. When the similarity value is less than the threshold, the content of the current subtitle sentence differs from that of the current paragraph, so the current subtitle sentence is determined to be the starting sentence of the second paragraph. For example, if the similarity value between the 4th subtitle sentence and the current paragraph is less than the threshold, the content of the 4th subtitle sentence differs from that of the current paragraph, so the 4th subtitle sentence is used as the starting sentence of the next paragraph. After this operation is completed, the set value is used to select subtitle sentences in a specific order and the selected subtitle sentences are divided into the next paragraph, so the operation of step S220 is executed again, which is not repeated here.

From the segmentation operations described above, it can be seen that each time the segmentation computation of one subtitle sentence is completed, the segmentation computation of the next subtitle sentence is performed, until all subtitle sentences have been processed. If the number of remaining subtitle sentences is less than the set value, the segmentation computation is no longer performed on them; instead, the remaining subtitle sentences are merged directly into the current paragraph. For example, if the number of remaining subtitle sentences is 2, which is less than the aforementioned set value (set to 3 above), the remaining 2 subtitle sentences are merged into the current paragraph.

Next, after the segmentation steps described above are completed, the segmentation method 200 executes step S250 to generate the annotation corresponding to each paragraph. For example, if the subtitle sentences are divided into 3 paragraphs after all of them have been processed, the annotations of the 3 paragraphs are computed separately; an annotation can be generated from the keywords corresponding to the subtitle sentences in the paragraph. Finally, the divided paragraphs and their corresponding annotations are stored in the course database DB2 of the storage unit 110. For example, if the difference value is less than the threshold, the current subtitle sentence is relatively similar to the current paragraph, so the keywords of the subtitle sentence can be used as the at least one feature corresponding to the current paragraph. If the difference value is not less than the threshold, the current subtitle sentence is not similar to the current paragraph, so the keywords of the subtitle sentence can be used as the at least one feature corresponding to the next paragraph.
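
A minimal sketch of step S250, assuming (as the example above suggests) that a paragraph's annotation is simply built from the keywords of the subtitle sentences it contains; the `tfidf_keywords` helper is the hypothetical one sketched earlier, and keeping the three most frequent keywords is an assumption:

```python
from collections import Counter

def annotate_paragraphs(paragraphs, corpus):
    """Step S250 (sketch): produce one annotation per paragraph from the
    TF-IDF keywords of the subtitle sentences in that paragraph."""
    annotations = []
    for paragraph in paragraphs:
        counts = Counter()
        for sentence in paragraph:
            counts.update(tfidf_keywords(sentence.text.split(), corpus))
        annotations.append([keyword for keyword, _ in counts.most_common(3)])
    return annotations
```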

From the embodiments of the present disclosure described above, it can be seen that the disclosure mainly addresses the problem that video paragraphs were previously marked manually, which consumes considerable manpower and time. The keywords corresponding to each subtitle sentence are computed first, a common segmentation vocabulary judgment is then performed on each subtitle sentence, and the next paragraph is generated or the current subtitle sentence is merged into the current paragraph according to the judgment result, so as to produce a segmentation result. This achieves the function of segmenting similar topics in a learning video and tagging them with keywords.

In addition, the above examples include exemplary steps in sequence, but these steps need not be executed in the order shown. Executing the steps in a different order falls within the scope of the present disclosure. Within the spirit and scope of the embodiments of the present disclosure, steps may be added, replaced, reordered and/or omitted as appropriate.

Although the present disclosure has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention; therefore, the scope of protection of the present invention shall be defined by the appended claims.

100: segmentation system
110: storage unit
130: processor
DB1: common segmentation vocabulary database
DB2: course database
131: keyword extraction unit
132: segmentation unit
133: common word detection unit
134: paragraph generation unit
135: annotation generation unit
200: segmentation method
S210~S250, S241~S242, S2411~S2413, S2421~S2423: steps

In order to make the above and other objectives, features, advantages and embodiments of the present invention more comprehensible, the accompanying drawings are described as follows:
FIG. 1 is a schematic diagram of a segmentation system according to some embodiments of the present disclosure;
FIG. 2 is a flowchart of a segmentation method according to some embodiments of the present disclosure;
FIG. 3 is a flowchart of step S240 according to some embodiments of the present disclosure;
FIG. 4 is a flowchart of step S241 according to some embodiments of the present disclosure; and
FIG. 5 is a flowchart of step S242 according to some embodiments of the present disclosure.

200: segmentation method

S210~S250: steps

Claims (15)

1. A segmentation method, comprising: receiving subtitle information, wherein the subtitle information comprises a plurality of subtitle sentences; selecting the subtitle sentences according to a set value and dividing the selected subtitle sentences into a first paragraph; performing a common segmentation vocabulary judgment on a first subtitle sentence, wherein the first subtitle sentence is one of the subtitle sentences; and generating a second paragraph or merging the first subtitle sentence into the first paragraph according to a judgment result of the common segmentation vocabulary judgment; wherein generating the second paragraph or merging the first subtitle sentence into the first paragraph according to the judgment result further comprises: if the first subtitle sentence is associated with a common segmentation vocabulary, performing a segmentation process to generate the second paragraph, and selecting the subtitle sentences according to a first specific order using the set value and adding the selected subtitle sentences to the second paragraph; and if the first subtitle sentence is not associated with the common segmentation vocabulary, performing a similarity value calculation between the first subtitle sentence and the first paragraph, and, if they are similar, merging the first subtitle sentence into the first paragraph.

2. The segmentation method of claim 1, wherein, after the first subtitle sentence is merged into the first paragraph, the common segmentation vocabulary judgment is performed on a second subtitle sentence, wherein the second subtitle sentence follows the first subtitle sentence in a second specific order.

3. The segmentation method of claim 1, wherein, when the second paragraph is generated, the subtitle sentences are selected according to a second specific order using the set value, and the selected subtitle sentences are added to the second paragraph.

4. The segmentation method of claim 1, wherein the segmentation process comprises: determining, according to the judgment result, whether the first subtitle sentence is associated with one of an opening segmentation vocabulary and an ending segmentation vocabulary; if the first subtitle sentence is associated with the opening segmentation vocabulary, using the first subtitle sentence as the starting sentence of the second paragraph; and if the first subtitle sentence is associated with the ending segmentation vocabulary, using the first subtitle sentence as the ending sentence of the first paragraph.

5. The segmentation method of claim 1, wherein the similarity value calculation comprises: comparing whether a difference value between at least one feature corresponding to the first subtitle sentence and at least one feature corresponding to the first paragraph is greater than a threshold; if the difference value is less than the threshold, merging the first subtitle sentence into the first paragraph; and if the difference value is not less than the threshold, using the first subtitle sentence as the starting sentence of the second paragraph, and selecting the subtitle sentences according to the first specific order using the set value and dividing the selected subtitle sentences into the second paragraph.

6. The segmentation method of claim 5, wherein a plurality of keywords are extracted from the subtitle sentences, and the keywords are the at least one feature corresponding to the first subtitle sentence.

7. The segmentation method of claim 6, wherein the at least one feature corresponding to the first paragraph is generated from the keywords extracted from the subtitle sentences in the first paragraph.

8. A segmentation system, comprising: a storage unit configured to store subtitle information, a segmentation result, a common segmentation vocabulary database, an annotation corresponding to a first paragraph and an annotation corresponding to a second paragraph; and a processor electrically connected to the storage unit and configured to receive the subtitle information, wherein the subtitle information comprises a plurality of subtitle sentences, and the processor comprises: a segmentation unit configured to select the subtitle sentences using a set value and divide the selected subtitle sentences into the first paragraph; a common word detection unit electrically connected to the segmentation unit and configured to perform a common segmentation vocabulary judgment on a first subtitle sentence, wherein the first subtitle sentence is one of the subtitle sentences; and a paragraph generation unit electrically connected to the common word detection unit and configured to generate the second paragraph or merge the first subtitle sentence into the first paragraph according to a judgment result of the common segmentation vocabulary judgment; wherein the paragraph generation unit is further configured to perform the following steps according to the judgment result: if the first subtitle sentence is associated with a common segmentation vocabulary, performing a segmentation process to generate the second paragraph, and selecting the subtitle sentences according to a first specific order using the set value and adding the selected subtitle sentences to the second paragraph; and if the first subtitle sentence is not associated with the common segmentation vocabulary, performing a similarity value calculation between the first subtitle sentence and the first paragraph, and, if they are similar, merging the first subtitle sentence into the first paragraph.

9. The segmentation system of claim 8, wherein, after the first subtitle sentence is merged into the first paragraph, the common word detection unit is further configured to perform the common segmentation vocabulary judgment on a second subtitle sentence, wherein the second subtitle sentence follows the first subtitle sentence in a second specific order.

10. The segmentation system of claim 8, wherein, when the second paragraph is generated, the segmentation unit is further configured to select the subtitle sentences according to a second specific order using the set value and add the selected subtitle sentences to the second paragraph.

11. The segmentation system of claim 8, wherein the segmentation process comprises: determining, according to the judgment result, whether the first subtitle sentence is associated with one of an opening segmentation vocabulary and an ending segmentation vocabulary; if the first subtitle sentence is associated with the opening segmentation vocabulary, using the first subtitle sentence as the starting sentence of the second paragraph; and if the first subtitle sentence is associated with the ending segmentation vocabulary, using the first subtitle sentence as the ending sentence of the first paragraph.

12. The segmentation system of claim 8, wherein the similarity value calculation comprises: comparing whether a difference value between at least one feature corresponding to the first subtitle sentence and at least one feature corresponding to the first paragraph is greater than a threshold; if the difference value is less than the threshold, merging the first subtitle sentence into the first paragraph; and if the difference value is not less than the threshold, using the first subtitle sentence as the starting sentence of the second paragraph, and selecting the subtitle sentences according to the first specific order using the set value and dividing the selected subtitle sentences into the second paragraph.

13. The segmentation system of claim 12, further comprising: a keyword extraction unit electrically connected to the segmentation unit and configured to extract a plurality of keywords from the subtitle sentences, wherein the keywords are the at least one feature corresponding to the first subtitle sentence.

14. The segmentation system of claim 13, wherein the at least one feature corresponding to the first paragraph is generated from the keywords extracted from the subtitle sentences in the first paragraph.

15. A non-transitory computer-readable medium comprising at least one instruction program executed by a processor to perform a segmentation method, the segmentation method comprising: receiving subtitle information, wherein the subtitle information comprises a plurality of subtitle sentences; selecting the subtitle sentences according to a set value and dividing the selected subtitle sentences into a first paragraph; performing a common segmentation vocabulary judgment on a first subtitle sentence, wherein the first subtitle sentence is one of the subtitle sentences; and generating a second paragraph or merging the first subtitle sentence into the first paragraph according to a judgment result of the common segmentation vocabulary judgment; wherein generating the second paragraph or merging the first subtitle sentence into the first paragraph according to the judgment result further comprises: if the first subtitle sentence is associated with a common segmentation vocabulary, performing a segmentation process to generate the second paragraph, and selecting the subtitle sentences according to a specific order using the set value and adding the selected subtitle sentences to the second paragraph; and if the first subtitle sentence is not associated with the common segmentation vocabulary, performing a similarity value calculation between the first subtitle sentence and the first paragraph, and, if they are similar, merging the first subtitle sentence into the first paragraph.
TW108104097A 2018-09-07 2019-02-01 Segmentation method, segmentation system and non-transitory computer-readable medium TWI699663B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862728082P 2018-09-07 2018-09-07
US62/728,082 2018-09-07

Publications (2)

Publication Number Publication Date
TW202011232A TW202011232A (en) 2020-03-16
TWI699663B true TWI699663B (en) 2020-07-21

Family

ID=69745778

Family Applications (5)

Application Number Title Priority Date Filing Date
TW108104065A TWI709905B (en) 2018-09-07 2019-02-01 Data analysis method and data analysis system thereof
TW108104097A TWI699663B (en) 2018-09-07 2019-02-01 Segmentation method, segmentation system and non-transitory computer-readable medium
TW108104105A TWI700597B (en) 2018-09-07 2019-02-01 Segmentation method, segmentation system and non-transitory computer-readable medium
TW108104107A TWI725375B (en) 2018-09-07 2019-02-01 Data search method and data search system thereof
TW108111842A TWI696386B (en) 2018-09-07 2019-04-03 Multimedia data recommending system and multimedia data recommending method

Family Applications Before (1)

Application Number Title Priority Date Filing Date
TW108104065A TWI709905B (en) 2018-09-07 2019-02-01 Data analysis method and data analysis system thereof

Family Applications After (3)

Application Number Title Priority Date Filing Date
TW108104105A TWI700597B (en) 2018-09-07 2019-02-01 Segmentation method, segmentation system and non-transitory computer-readable medium
TW108104107A TWI725375B (en) 2018-09-07 2019-02-01 Data search method and data search system thereof
TW108111842A TWI696386B (en) 2018-09-07 2019-04-03 Multimedia data recommending system and multimedia data recommending method

Country Status (4)

Country Link
JP (3) JP6829740B2 (en)
CN (5) CN110891202B (en)
SG (5) SG10201905236WA (en)
TW (5) TWI709905B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI756703B (en) * 2020-06-03 2022-03-01 南開科技大學 Digital learning system and method thereof
CN117351794A (en) * 2023-10-13 2024-01-05 浙江上国教育科技有限公司 Online course management system based on cloud platform

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200825900A (en) * 2006-12-13 2008-06-16 Inst Information Industry System and method for generating wiki by sectional time of handout and recording medium thereof
CN101382937A (en) * 2008-07-01 2009-03-11 深圳先进技术研究院 Multimedia resource processing method based on speech recognition and on-line teaching system thereof
WO2014100893A1 (en) * 2012-12-28 2014-07-03 Jérémie Salvatore De Villiers System and method for the automated customization of audio and video media
CN106231399A (en) * 2016-08-01 2016-12-14 乐视控股(北京)有限公司 Methods of video segmentation, equipment and system

Family Cites Families (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07311539A (en) * 1994-05-17 1995-11-28 Hitachi Ltd Teaching material edition supporting system
KR100250540B1 (en) * 1996-08-13 2000-04-01 김광수 Studying method of foreign language dictation with apparatus of playing caption video cd
JP2002041823A (en) * 2000-07-27 2002-02-08 Nippon Telegr & Teleph Corp <Ntt> Information distributing device, information receiving device and information distributing system
JP3685733B2 (en) * 2001-04-11 2005-08-24 株式会社ジェイ・フィット Multimedia data search apparatus, multimedia data search method, and multimedia data search program
JP2002341735A (en) * 2001-05-16 2002-11-29 Alice Factory:Kk Broadband digital learning system
CN1432932A (en) * 2002-01-16 2003-07-30 陈雯瑄 English examination and score estimation method and system
TW200411462A (en) * 2002-12-20 2004-07-01 Hsiao-Lien Wang A method for matching information exchange on network
WO2004090752A1 (en) * 2003-04-14 2004-10-21 Koninklijke Philips Electronics N.V. Method and apparatus for summarizing a music video using content analysis
JP4471737B2 (en) * 2003-10-06 2010-06-02 日本電信電話株式会社 Grouping condition determining device and method, keyword expansion device and method using the same, content search system, content information providing system and method, and program
JP4426894B2 (en) * 2004-04-15 2010-03-03 株式会社日立製作所 Document search method, document search program, and document search apparatus for executing the same
JP2005321662A (en) * 2004-05-10 2005-11-17 Fuji Xerox Co Ltd Learning support system and method
JP2006003670A (en) * 2004-06-18 2006-01-05 Hitachi Ltd Educational content providing system
KR20070116945A (en) * 2005-03-31 2007-12-11 코닌클리케 필립스 일렉트로닉스 엔.브이. Augmenting lectures based on prior exams
US9058406B2 (en) * 2005-09-14 2015-06-16 Millennial Media, Inc. Management of multiple advertising inventories using a monetization platform
JP5167546B2 (en) * 2006-08-21 2013-03-21 国立大学法人京都大学 Sentence search method, sentence search device, computer program, recording medium, and document storage device
JP5010292B2 (en) * 2007-01-18 2012-08-29 株式会社東芝 Video attribute information output device, video summarization device, program, and video attribute information output method
JP5158766B2 (en) * 2007-10-23 2013-03-06 シャープ株式会社 Content selection device, television, content selection program, and storage medium
TW200923860A (en) * 2007-11-19 2009-06-01 Univ Nat Taiwan Science Tech Interactive learning system
US8140544B2 (en) * 2008-09-03 2012-03-20 International Business Machines Corporation Interactive digital video library
CN101453649B (en) * 2008-12-30 2011-01-05 浙江大学 Key frame extracting method for compression domain video stream
JP5366632B2 (en) * 2009-04-21 2013-12-11 エヌ・ティ・ティ・コミュニケーションズ株式会社 Search support keyword presentation device, method and program
JP5493515B2 (en) * 2009-07-03 2014-05-14 富士通株式会社 Portable terminal device, information search method, and information search program
US20110177482A1 (en) * 2010-01-15 2011-07-21 Nitzan Katz Facilitating targeted interaction in a networked learning environment
JP2012038239A (en) * 2010-08-11 2012-02-23 Sony Corp Information processing equipment, information processing method and program
US8839110B2 (en) * 2011-02-16 2014-09-16 Apple Inc. Rate conform operation for a media-editing application
CN102222227B (en) * 2011-04-25 2013-07-31 中国华录集团有限公司 Video identification based system for extracting film images
CN102348049B (en) * 2011-09-16 2013-09-18 央视国际网络有限公司 Method and device for detecting position of cut point of video segment
CN102509007A (en) * 2011-11-01 2012-06-20 北京瑞信在线系统技术有限公司 Method, system and device for multimedia teaching evaluation and multimedia teaching system
JP5216922B1 (en) * 2012-01-06 2013-06-19 Flens株式会社 Learning support server, learning support system, and learning support program
US9846696B2 (en) * 2012-02-29 2017-12-19 Telefonaktiebolaget Lm Ericsson (Publ) Apparatus and methods for indexing multimedia content
US20130263166A1 (en) * 2012-03-27 2013-10-03 Bluefin Labs, Inc. Social Networking System Targeted Message Synchronization
US9058385B2 (en) * 2012-06-26 2015-06-16 Aol Inc. Systems and methods for identifying electronic content using video graphs
TWI513286B (en) * 2012-08-28 2015-12-11 Ind Tech Res Inst Method and system for continuous video replay
CN102937972B (en) * 2012-10-15 2016-06-22 上海外教社信息技术有限公司 A kind of audiovisual subtitle making system and method
JP6205767B2 (en) * 2013-03-13 2017-10-04 カシオ計算機株式会社 Learning support device, learning support method, learning support program, learning support system, and server device
TWI549498B (en) * 2013-06-24 2016-09-11 wu-xiong Chen Variable audio and video playback method
CN104572716A (en) * 2013-10-18 2015-04-29 英业达科技有限公司 System and method for playing video files
KR101537370B1 (en) * 2013-11-06 2015-07-16 주식회사 시스트란인터내셔널 System for grasping speech meaning of recording audio data based on keyword spotting, and indexing method and method thereof using the system
US20150206441A1 (en) * 2014-01-18 2015-07-23 Invent.ly LLC Personalized online learning management system and method
CN104123332B (en) * 2014-01-24 2018-11-09 腾讯科技(深圳)有限公司 The display methods and device of search result
US9892194B2 (en) * 2014-04-04 2018-02-13 Fujitsu Limited Topic identification in lecture videos
US20150293995A1 (en) * 2014-04-14 2015-10-15 David Mo Chen Systems and Methods for Performing Multi-Modal Video Search
JP6334431B2 (en) * 2015-02-18 2018-05-30 株式会社日立製作所 Data analysis apparatus, data analysis method, and data analysis program
US20160239155A1 (en) * 2015-02-18 2016-08-18 Google Inc. Adaptive media
CN105047203B (en) * 2015-05-25 2019-09-10 广州酷狗计算机科技有限公司 A kind of audio-frequency processing method, device and terminal
CN104978961B (en) * 2015-05-25 2019-10-15 广州酷狗计算机科技有限公司 A kind of audio-frequency processing method, device and terminal
TWI571756B (en) * 2015-12-11 2017-02-21 財團法人工業技術研究院 Methods and systems for analyzing reading log and documents corresponding thereof
CN105978800A (en) * 2016-07-04 2016-09-28 广东小天才科技有限公司 Method and system for pushing subjects to mobile terminal and server
CN106202453B (en) * 2016-07-13 2020-08-04 网易(杭州)网络有限公司 Multimedia resource recommendation method and device
CN106331893B (en) * 2016-08-31 2019-09-03 科大讯飞股份有限公司 Real-time caption presentation method and system
CN108122437A (en) * 2016-11-28 2018-06-05 北大方正集团有限公司 Adaptive learning method and device
CN107256262B (en) * 2017-06-13 2020-04-14 西安电子科技大学 Image retrieval method based on object detection
CN107623860A (en) * 2017-08-09 2018-01-23 北京奇艺世纪科技有限公司 Multi-medium data dividing method and device

Also Published As

Publication number Publication date
CN110891202A (en) 2020-03-17
TW202011231A (en) 2020-03-16
TW202011749A (en) 2020-03-16
CN110889034A (en) 2020-03-17
TWI696386B (en) 2020-06-11
SG10201906347QA (en) 2020-04-29
CN110888896B (en) 2023-09-05
SG10201905532QA (en) 2020-04-29
CN110888994A (en) 2020-03-17
JP2020042777A (en) 2020-03-19
TWI700597B (en) 2020-08-01
TWI709905B (en) 2020-11-11
TWI725375B (en) 2021-04-21
SG10201905236WA (en) 2020-04-29
TW202011221A (en) 2020-03-16
TW202011222A (en) 2020-03-16
TW202011232A (en) 2020-03-16
CN110895654A (en) 2020-03-20
SG10201905523TA (en) 2020-04-29
CN110888896A (en) 2020-03-17
CN110891202B (en) 2022-03-25
JP2020042770A (en) 2020-03-19
JP6829740B2 (en) 2021-02-10
JP2020042771A (en) 2020-03-19
SG10201907250TA (en) 2020-04-29

Similar Documents

Publication Publication Date Title
CN108009293B (en) Video tag generation method and device, computer equipment and storage medium
US9438850B2 (en) Determining importance of scenes based upon closed captioning data
CN102483743B (en) Detecting writing systems and languages
EP3401802A1 (en) Webpage training method and device, and search intention identification method and device
US8843815B2 (en) System and method for automatically extracting metadata from unstructured electronic documents
JP6335898B2 (en) Information classification based on product recognition
CN106557545B (en) Video retrieval method and device
CN107463548B (en) Phrase mining method and device
CN109275047B (en) Video information processing method and device, electronic equipment and storage medium
US20110276523A1 (en) Measuring document similarity by inferring evolution of documents through reuse of passage sequences
Moncrieff et al. Affect computing in film through sound energy dynamics
TWI699663B (en) Segmentation method, segmentation system and non-transitory computer-readable medium
CN107924398B (en) System and method for providing a review-centric news reader
CN106610990A (en) Emotional tendency analysis method and apparatus
Park et al. Exploiting script-subtitles alignment to scene boundary dectection in movie
CN112214984A (en) Content plagiarism identification method, device, equipment and storage medium
WO2022143608A1 (en) Language labeling method and apparatus, and computer device and storage medium
US20190205320A1 (en) Sentence scoring apparatus and program
CN116029280A (en) Method, device, computing equipment and storage medium for extracting key information of document
CN109344254B (en) Address information classification method and device
US9934218B2 (en) Systems and methods for extracting attributes from text content
JP5366849B2 (en) Function expression complementing apparatus, method and program
US11423208B1 (en) Text encoding issue detection
US20180307669A1 (en) Information processing apparatus
KR20200063316A (en) Apparatus for searching video based on script and method for the same