TW201039149A - Robust algorithms for video text information extraction and question-answer retrieval - Google Patents

Robust algorithms for video text information extraction and question-answer retrieval

Info

Publication number
TW201039149A
Authority
TW
Taiwan
Prior art keywords
text
word
words
information extraction
paragraph
Prior art date
Application number
TW98112787A
Other languages
Chinese (zh)
Inventor
Yu-Chieh Wu
Original Assignee
Yu-Chieh Wu
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yu-Chieh Wu filed Critical Yu-Chieh Wu
Priority to TW98112787A priority Critical patent/TW201039149A/en
Publication of TW201039149A publication Critical patent/TW201039149A/en

Links

Abstract

This invention proposes a robust algorithm for video closed-caption extraction and retrieval. The algorithm can automatically detect text information, localize text areas, segment text fragments, track text across multiple frames, and recognize the text. By setting a reasonable video frame size, it identifies text of reasonable size that exceeds certain height and width thresholds. The first appearance time stamp of each recognized text is also recorded for later retrieval. On the basis of the video text extraction algorithm and the stored time stamp information, a fixed-size passage segmentation (grouping) method groups neighboring sentences: every g consecutive sentences are merged to form a passage. A robust retrieval algorithm is then designed to search the recognized video OCR transcripts. Independently of Chinese word segmentation, this algorithm gives higher weight to passages that contain dense and long n-gram matching patterns.
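The fixed-size passage grouping described in the abstract can be sketched as below. This is an illustrative reconstruction, not the patented code; the `(time_stamp, text)` input format and the `overlap` parameter are assumptions (the claims suggest each passage repeats the last sentences of the previous one so a query phrase spanning a boundary is not lost).

```python
def group_passages(sentences, g=5, overlap=2):
    """Group OCR'd caption sentences into fixed-size passages.

    `sentences` is a list of (time_stamp, text) pairs, one per recognized
    caption line.  Every `g` consecutive sentences form a passage, and each
    passage repeats the last `overlap` sentences of the previous one.
    Each passage keeps the time stamp of its first sentence for playback.
    """
    passages = []
    step = max(g - overlap, 1)
    for start in range(0, len(sentences), step):
        chunk = sentences[start:start + g]
        if not chunk:
            break
        first_time = chunk[0][0]
        text = " ".join(t for _, t in chunk)
        passages.append((first_time, text))
        if start + g >= len(sentences):
            break  # last window already covers the tail
    return passages
```

With `g=3, overlap=1`, six caption lines yield three overlapping passages, each carrying the time stamp at which its first line appeared.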

Description

201039149 VI. Description of the Invention:

[Technical Field]
[0001] The present invention relates to a robust method for automatic video text information extraction and question-answer retrieval. It extracts the text information embedded in a video and provides an input interface through which a user can pose a question in natural language, receive the answer together with the corresponding video and its playback start time, and then watch that video segment.

[Prior Art]
[0002] The present invention touches on several fields of knowledge: (1) multimedia processing, including image processing, text localization and extraction, and optical character recognition (OCR); and (2) document processing, including information retrieval and Chinese/English text processing. As for overall architecture, prior domestic and foreign techniques each reach a certain level, but none has been organized into a complete system, none addresses natural-language input, and most rely on keyword search (apart from the present inventor's own earlier publications). An integrated invention of this kind, with its automatic question-answering algorithm, has not been seen before. The following reviews the related work from an international perspective.

In recent years, academic research on content retrieval inside video has proposed, for example, video optical character recognition (video OCR) and speech recognition to support text search over video. Early on, Dr. Lin Chuan-Jie and colleagues in Taiwan applied a simple term-weighting method to video retrieval. They recognized only the "white" subtitles in a video and manually built a keyword vocabulary to raise the weights. In 2003, researchers abroad proposed a rather complex automatic question-answering system restricted to English news video, named VideoQA. Their system assembled very large external resources: the WordNet relational dictionary, a shallow syntactic parser, a named-entity tagger, electronic news from the Internet, hand-written rules, and a self-built ontology. Like the earlier work, they used only keyword weighting to find the most likely answer. Porting such a system to another language or domain requires rebuilding these external resources, and not every domain or language has as many resources available as English does, so porting to Chinese would be a very difficult task. Moreover, the Chinese word segmentation problem remains unsolved: segmentation results directly affect the quality of the extracted keywords, and once an unknown or mis-segmented word occurs, the keyword can never be found. Most importantly, past research simply chose not to tackle Chinese segmentation at all.

Within the past five years, the present inventor designed the world's first cross-language (English-to-Chinese) automatic question-answering system over video subtitles. It mainly answered questions about proper nouns naming specific people, events, times, places, and things; through the system, a user could retrieve Chinese videos with English questions. The underlying techniques were a self-developed passage retrieval and answer matching algorithm. That work used machine translation to convert all Chinese subtitles into English and then an English named-entity recognizer to point out candidate answers. Clearly, this approach does not carry over to Chinese-to-Chinese queries in a purely Chinese setting, and the passage retrieval method it developed was still word-based, so for Chinese the segmentation problem remains the first obstacle to overcome.

Although these published studies share goals similar to those of the present invention, they differ fundamentally. Most importantly, past research treated video as broadcast news and ignored the subtitle messages appearing on screen. Some videos do come with subtitle files, making extraction unnecessary, but for the great majority of videos today, including those broadcast on television, that assumption does not hold. In Taiwan in particular, films and news mostly convey their message through subtitles, so extracting the text of subtitles and scenes matches what today's videos actually express.

In addition, past literature focused on transferring existing techniques and applying conventional document retrieval to search for the video information a user needs. Document retrieval is mature and widely applied in education and industry, but it was originally designed to search human-typed documents with almost no typographical errors. Applying it directly merely shows how well existing techniques handle the recognized text, with no adaptation to this domain. Unlike traditional documents, text recognized from video (whether extracted from subtitles or from audio) often differs in format from human-typed documents; worse, many recognition errors are unavoidable, since current OCR reaches only about 90% accuracy under good conditions, and speech recognition only about 70% even for a single consistent speaker. Using traditional document retrieval directly on such documents is clearly problematic, because many words are missing or misrecognized. And for Chinese, segmenting words out of the text is itself a great challenge.

[Summary of the Invention]
[0003] The object of the present invention is to propose a novel, robust automatic video content question-answering system. With the text information extraction technique proposed herein, subtitle information in many types of video (news, advertisements, drama series, and so on) is recognized together with the time at which it first appears; the automatic video question-answering technique proposed herein then parses the query question or keywords and finally returns the answering or related videos to the user.

It is also an object of the present invention to provide an automatic video content question-answering system that lets the user query in keywords or natural language, receive the answer together with the corresponding video and its playback start time, and watch that video segment.

It is also an object of the present invention to provide such a system that lets the user query in keywords or natural language and directly search the recognized subtitle text.

It is also an object of the present invention to provide such a system that lets the user watch the video passages relevant to the query, with a precise playback start time, and directly fast-forward, rewind, pause, stop, and play the video.

It is also an object of the present invention to provide such a system that integrates the video subtitle recognition technique developed herein with analysis of search behavior, so that the user can search and browse video subtitles and text in the same way a common search engine is used.

[Embodiments]
[0004] Please refer to the first figure, which shows the overall operation and flow of the invention, including its core modules and a conceptual diagram of user input and system output. The operation of an embodiment is explained as follows.

From the viewpoint of the whole system, modules (100)-(103) relate to the user: (100) is the question or keywords the user submits, (101) denotes the automatic video question-answering system itself, (102) is the answer video the system decides to return, and (103) is the video library, a prepared source database.

First, text recognition over the video library. (130) denotes the video text recognition system. To recognize the content of the video library (103), frames are extracted from the video at a fixed sampling interval, and (131) text block detection and segmentation detects all text blocks and cuts them precisely at their image positions. Each block then passes through (132), the non-text block filter developed in this invention, which removes blocks that are not text. Since frames are consecutive, the (133) text tracking module tracks how many frames each text appears in and merges occurrences of the same text. (134) Text block binarization extracts the text color from the color block image, turning the text black and the background white. (135) Recognition takes the binarized block image and recognizes the characters, producing (137) the recognized text file. The final stage, the (136) passage segmentation module, merges the per-frame text into passages.

The second part is (110) Chinese word extraction, which separates phrase units from a Chinese string, since Chinese, unlike English, has no spaces as word boundaries. (111) Chinese word splitting is a segmentation step; in this invention, fixed units of (112) 1-character, (113) 2-character, and (114) 3-character words are used as the cutting units, so no dictionary is needed and the string is segmented in these three ways directly.

The third part is (120) the initial retrieval system: a modified BM-25 is used as the main weighting scheme to realize a (121) simple retrieval model that extracts (122) the Top-N passages as answer candidates.

Finally, in the (140) answer matching module, the invention first designs a (141) best matching method that lists all possible matching patterns; (142) dynamic programming then matches the results and finds the best one-to-one pairing between query string and passage. For that pairing, (143) the string is re-cut, and (144) passage weight assignment and detection scores the finely cut strings with the weighting scheme of this invention, re-ranks all results by this weight, and presents to the user the corresponding answer videos, the times at which they appear in the video, and the recognized text.

[Brief Description of the Drawings]
[0005]
Figure 1 is a schematic diagram of the system architecture of the invention.
Figure 2 is a schematic diagram of the video text recognition module.
Figure 3 is a schematic diagram of the answer matching module.
Figure 4 shows the query interface of the system.
Figure 5 shows the browsing interface of the system.
Figure 6 shows the video viewing interface of the invention.
Figure 7 shows the retrieval view of the web interface of the invention.

[Description of Reference Numerals]
[0006] 101-144: system step flow; 201-206: answer matching algorithm step flow.
[0007]
100 user query question or keywords
101 automatic video question-answering system
102 answer video
103 video library
110 Chinese word extraction
111 Chinese word splitting
112 1-character words
113 2-character words
114 3-character words
120 initial retrieval system
121 simple retrieval model
122 Top-N passages
130 video text recognition system
131 text block detection and segmentation
132 non-text block filtering
133 text tracking
134 text block binarization
135 recognition
136 passage segmentation
137 recognized text
140 answer matching
141 best matching method
142 matching results
143 re-cutting
144 passage weight assignment and detection
[0008]
201 Top-N passages
202 fast string matching algorithm
203 best matching position combination algorithm
204 re-cutting algorithm
205 weight configuration algorithm
206 overall passage ranking score computation
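The dictionary-free segmentation used by the Chinese word extraction module (110) can be sketched as below. This is a reconstruction under assumptions: all overlapping 1-, 2-, and 3-character units are kept as index terms, and runs of English letters or digits are kept whole, as the claims describe.

```python
import re

def ngram_tokens(text, max_n=3):
    """Cut a string into overlapping 1-, 2-, and 3-character units,
    with no dictionary.  Runs of ASCII letters/digits are kept whole,
    since the patent treats English words and numbers as single tokens
    rather than character n-grams."""
    tokens = []
    # split into ASCII word runs vs non-ASCII (e.g. CJK) runs
    for run in re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]+", text):
        if run[0].isascii():
            tokens.append(run)            # whole English/number token
            continue
        for n in range(1, max_n + 1):     # overlapping n-grams
            tokens.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return tokens
```

A four-character Chinese string thus yields four 1-grams, three 2-grams, and two 3-grams, all of which feed the 1-, 2-, and 3-character index libraries without any word list.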

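The combined binarization threshold of claim 5 (Threshold = 0.75 * Otsu + 0.25 * Entropy) can be illustrated as below. The Otsu part is the standard between-class-variance algorithm; the entropy part is a Kapur-style maximum-entropy stand-in, since the patent does not spell out its entropy measure; the second-stage removal of border-connected pixels is omitted.

```python
import numpy as np

def otsu_threshold(gray):
    """Classic Otsu: maximize between-class variance over the histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    grand_sum = (hist * np.arange(256)).sum()
    best_t, best_var = 0, -1.0
    cum, cum_mean = 0.0, 0.0
    for t in range(256):
        cum += hist[t]
        cum_mean += t * hist[t]
        w0 = cum / total
        if w0 in (0.0, 1.0):
            continue
        m0 = cum_mean / cum
        m1 = (grand_sum - cum_mean) / (total - cum)
        var = w0 * (1 - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def _entropy(q):
    q = q[q > 0]
    q = q / q.sum()
    return float(-(q * np.log(q)).sum())

def entropy_threshold(gray):
    """Stand-in entropy threshold (Kapur-style maximum-entropy split)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_h = 1, -1.0
    for t in range(1, 255):
        lo, hi = p[:t], p[t:]
        if lo.sum() == 0 or hi.sum() == 0:
            continue
        h = _entropy(lo) + _entropy(hi)
        if h > best_h:
            best_h, best_t = h, t
    return best_t

def binarize_block(gray):
    # Stage 1 of claim 5: combined threshold; polarity normalization
    # (making text black, background white) is not handled here.
    thr = 0.75 * otsu_threshold(gray) + 0.25 * entropy_threshold(gray)
    return (gray > thr).astype(np.uint8)
```

On a clean bimodal block the combined threshold lands between the two gray-level modes, separating text pixels from background in one pass.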
Claims (1)

201039149 VII. Claims:

1. An automatic video text information extraction and retrieval system, being a computational process that extracts the text appearing in a video and retrieves it, comprising:
a video text recognition module, comprising:
a text block detection and segmentation unit, which detects the regions of an image that may be text blocks and cuts them into units;
a non-text block filter, which re-examines and filters out cut units that are not text blocks;
a text tracking unit, which merges and tracks the same text block appearing consecutively across multiple images;
a text block binarization unit, which separates the colors of a text block so that the text color is fully distinct from the background color;
a text recognition unit, which performs final character recognition on the binarized text region;
a passage segmentation module, in which all text recognized within one frame is treated as one sentence, every K sentences form one unit with the previous L sentences repeated, multiple sentences are gathered into a passage, and the appearance time is recorded;
a Chinese word extraction module, comprising:
a Chinese word splitting module, which cuts the query question or string into three kinds of units: 1-character, 2-character, and 3-character words;
an initial passage retrieval module, comprising:
a simple retrieval model, which builds multi-gram indexes, including a 1-character index library, a 2-character index library, and a 3-character index library, and decomposes the input question and the recognized text strings into 1-character words to match N candidate passages;
an answer matching module, comprising:
a best string matching algorithm, which, from the N matched candidate passages, finds for each candidate the best multi-word and densest-word matching result;
a re-cutting module, which, according to the matching result, re-cuts the string against the 1-character, 2-character, and 3-character index libraries;
a passage weight assignment and detection module, which scores each matched passage by the weights of its corresponding 1-character, 2-character, and 3-character words, assigns the final weight to the whole passage, and ranks and retrieves the passages by this score.

2. The automatic video text information extraction and retrieval system of claim 1, wherein the text block detection and segmentation first applies Canny edge detection to the extracted image, filters non-salient edge points with a global Otsu method, then filters local non-salient edge points with a local (20x16) entropy method; the text is first cut by horizontal projection, then by vertical projection allowing up to 7 discontinuous pixels per vertical text block; the above steps are then repeated once more to cut out all possible text blocks.

3. The automatic video text information extraction and retrieval system of claim 1, wherein non-text blocks are removed by the following rules, leaving all text blocks:
height < 9 pixels;
horizontal zero crossings < 9;
horizontal peaks < 7.

4. The automatic video text information extraction and retrieval system of claim 1, wherein text tracking uses the vertical edge point projection as a vector, tolerates an offset of one cell to either side, and compares across multiple frames; if the Euclidean distance between two vectors is below a threshold, they are regarded as the same text and recorded; merging keeps a pixel of a text block only if its probability of appearing across the frames exceeds a given percentage, otherwise the pixel is treated as background and deleted.

5. The automatic video text information extraction and retrieval system of claim 1, wherein text block binarization is a two-stage process:
in the first stage an overall threshold is computed as
Threshold = 0.75 * Otsu + 0.25 * Entropy
and the text block is binarized with this threshold;
in the second stage the block border is extended outward by a fixed proportion, such as 5 pixels, and pixels connected to the border are removed from the outside inward by connected components.

6. The automatic video text information extraction and retrieval system of claim 1, wherein text recognition is a multi-stage process comprising character segmentation, size normalization, feature extraction, character grouping, character recognition, and recognition result correction, as shown in Figure 4.

7. The automatic video text information extraction and retrieval system of claim 1, wherein passage segmentation takes each frame as a unit sentence and records its appearance time, then merges K consecutive sentences, the first sentences of each passage repeating the last M sentences of the previous passage.

8. The automatic video text information extraction and retrieval system of claim 1, wherein Chinese word extraction uses multiple fixed word lengths (multiple n-grams), cutting the input string into 1-character, 2-character, and 3-character units; strings containing English letters or digits are not cut this way but are cut by word.

9. The automatic video text information extraction and retrieval system of claim 1, wherein the simple retrieval model uses 1-character words as terms and first retrieves the Top-N relevant passages with BM-25 term weighting.

10. The automatic video text information extraction and retrieval system of claim 1, wherein the best string matching algorithm, from the N matched candidate passages, finds for each candidate the best multi-word and densest-word matching result; matching uses dynamic programming to find a best one-to-one pairing rather than traditional term-frequency counting.

11. The automatic video text information extraction and retrieval system of claim 1, wherein the re-cutting module, according to the matching result, re-cuts the strings of the Top-N passages against the 1-character, 2-character, and 3-character index libraries into words of at most 3 characters.

12. The automatic video text information extraction and retrieval system of claim 1, wherein the passage weight assignment and detection module scores each matched passage by the weights of its corresponding 1-character, 2-character, and 3-character words, assigns the final weight to the whole passage, and ranks and retrieves the passages by this score; the weight is computed as:

Passage_Score(P) = max{ λ × QW_Density(Q, P1) + (1 - λ) × QW_Weight(Q, P1),
                        λ × QW_Density(Q, P2) + (1 - λ) × QW_Weight(Q, P2) }
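A hypothetical rendering of the claim-12 scoring function is sketched below. QW_Density and QW_Weight are not fully defined in the text, so the measures used here (match-span density and a squared-length n-gram weight), the value λ = 0.5, and the halving of the passage into P1 and P2 are all assumptions for illustration.

```python
LAMBDA = 0.5  # interpolation weight between density and n-gram weight (assumed)

def qw_weight(query_grams, passage):
    # longer matched n-grams contribute more (squared length, assumed)
    return sum(len(g) ** 2 for g in query_grams if g in passage)

def qw_density(query_grams, passage):
    # matches per character of the span they cover: dense, close-together
    # matches score higher than the same matches spread far apart
    hits = [passage.find(g) for g in query_grams if g in passage]
    if len(hits) < 2:
        return float(len(hits))
    span = max(hits) - min(hits) + 1
    return len(hits) / span

def passage_score(query_grams, passage):
    # claim 12 takes the max over two sub-passages P1, P2; here they are
    # simply the two halves of the passage (an assumption)
    mid = len(passage) // 2
    halves = (passage[:mid], passage[mid:]) if mid else (passage,)
    return max(LAMBDA * qw_density(query_grams, p)
               + (1 - LAMBDA) * qw_weight(query_grams, p)
               for p in halves)
```

A passage whose first half contains the whole query as one long, contiguous n-gram thus outranks a passage where the same characters occur scattered, which matches the abstract's stated preference for dense, longer n-gram matching patterns.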
TW98112787A 2009-04-17 2009-04-17 Robust algorithms for video text information extraction and question-answer retrieval TW201039149A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW98112787A TW201039149A (en) 2009-04-17 2009-04-17 Robust algorithms for video text information extraction and question-answer retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW98112787A TW201039149A (en) 2009-04-17 2009-04-17 Robust algorithms for video text information extraction and question-answer retrieval

Publications (1)

Publication Number Publication Date
TW201039149A true TW201039149A (en) 2010-11-01

Family

ID=44995335

Family Applications (1)

Application Number Title Priority Date Filing Date
TW98112787A TW201039149A (en) 2009-04-17 2009-04-17 Robust algorithms for video text information extraction and question-answer retrieval

Country Status (1)

Country Link
TW (1) TW201039149A (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI479344B (en) * 2011-02-02 2015-04-01 Microsoft Corp Information retrieval using subject-aware document ranker
CN107087225A (en) * 2011-05-25 2017-08-22 谷歌公司 Closed caption stream is used for device metadata
CN107087225B (en) * 2011-05-25 2020-04-03 谷歌有限责任公司 Using closed captioning streams for device metadata
CN103761345A (en) * 2014-02-27 2014-04-30 苏州千视通信科技有限公司 Video retrieval method based on OCR character recognition technology
CN109800757A (en) * 2019-01-04 2019-05-24 西北工业大学 A kind of video text method for tracing based on layout constraint
CN109800757B (en) * 2019-01-04 2022-04-19 西北工业大学 Video character tracking method based on layout constraint
CN111325195A (en) * 2020-02-17 2020-06-23 支付宝(杭州)信息技术有限公司 Text recognition method and device and electronic equipment
CN111325195B (en) * 2020-02-17 2024-01-26 支付宝(杭州)信息技术有限公司 Text recognition method and device and electronic equipment
TWI821671B (en) * 2020-08-14 2023-11-11 大陸商中國銀聯股份有限公司 A method and device for positioning text areas
WO2023037283A1 (en) * 2021-09-09 2023-03-16 L&T Technology Services Limited Methods and system for extracting text from a video

Similar Documents

Publication Publication Date Title
Yang et al. Content based lecture video retrieval using speech and video text information
Albanie et al. Bbc-oxford british sign language dataset
CN111078943B (en) Video text abstract generation method and device
TW201039149A (en) Robust algorithms for video text information extraction and question-answer retrieval
WO2015054627A1 (en) Methods and systems for aggregation and organization of multimedia data acquired from a plurality of sources
CN106126619A (en) A kind of video retrieval method based on video content and system
CN116361510A (en) Method and device for automatically extracting and retrieving scenario segment video established by utilizing film and television works and scenario
Baidya et al. LectureKhoj: automatic tagging and semantic segmentation of online lecture videos
Wu et al. Leveraging social Q&A collections for improving complex question answering
CN114880496A (en) Multimedia information topic analysis method, device, equipment and storage medium
Varol et al. Scaling up sign spotting through sign language dictionaries
Metze et al. Beyond audio and video retrieval: topic-oriented multimedia summarization
AlMousa et al. Nlp-enriched automatic video segmentation
Pal et al. Anubhuti--An annotated dataset for emotional analysis of Bengali short stories
Qi et al. Automated coding of political video ads for political science research
Langlois et al. VIRUS: video information retrieval using subtitles
Wu et al. CLVQ: Cross-language video question/answering system
US20210342393A1 (en) Artificial intelligence for content discovery
Lee Text-based video genre classification using multiple feature categories and categorization methods
Lin et al. A simple method for Chinese video OCR and its application to question answering
Hauptmann et al. A hybrid approach to improving semantic extraction of news video
Gonsalves et al. ML-Based Indexing of Media Libraries for Insights and Search
Muraoka et al. Visual Concept Naming: Discovering Well-Recognized Textual Expressions of Visual Concepts
Khollam et al. A survey on content based lecture video retrieval using speech and video text information
Rajarathinam et al. Analysis on video retrieval using speech and text for content-based information