TW202338664A - Word recognition system and word recognition method - Google Patents

Word recognition system and word recognition method Download PDF

Info

Publication number
TW202338664A
TW202338664A (application TW111111157A)
Authority
TW
Taiwan
Prior art keywords
text
recognition
screenshot
module
information
Prior art date
Application number
TW111111157A
Other languages
Chinese (zh)
Other versions
TWI792957B (en
Inventor
周世恩
楊雅汝
Original Assignee
多利曼股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 多利曼股份有限公司 filed Critical 多利曼股份有限公司
Priority to TW111111157A priority Critical patent/TWI792957B/en
Application granted granted Critical
Publication of TWI792957B publication Critical patent/TWI792957B/en
Publication of TW202338664A publication Critical patent/TW202338664A/en


Abstract

A word recognition system and word recognition method are provided. Image information is input into a collection module, which generates a text screenshot; a recognition module then receives the text screenshot and obtains string data without being affected by background interference in the image information, thereby reducing the likelihood of degraded recognition accuracy.

Description

文字辨識系統及方法 Text recognition system and method

本發明提供一種文字辨識技術,尤指一種適用於網路影像內容之文字辨識系統及方法。 The present invention provides a text recognition technology, particularly a text recognition system and method suitable for online image content.

隨著網路平台的崛起,一般人都能輕易地透過網路平台獲取自己所需的影音內容。例如,近年來直播系統逐漸成熟,電視台除了在傳統管道播送訊號外,更整合如Youtube、Facebook等網路社群平台,將影音內容訊號透過該網路社群平台之直播工具播放,甚至一些網路媒體亦嘗試導入直播室。 With the rise of online platforms, ordinary people can easily obtain the audio and video content they need through them. For example, live-broadcast systems have gradually matured in recent years: in addition to broadcasting signals through traditional channels, TV stations have integrated online social platforms such as Youtube and Facebook to play audio and video content through the live-broadcast tools of those platforms, and even some online media outlets are trying to introduce live-broadcast rooms.

由於網路資訊過於發達,相關產業更需仰賴該些網路資訊。例如,品牌媒體或公關公司等產業需求端若需即時掌握社群輿情,需擷取該些網路社群平台之影音內容進行分析。 Because online information is so abundant, related industries increasingly rely on it. For example, demand-side players such as brand media or public relations companies that need to grasp social-media public opinion in real time must capture the audio and video content of these online social platforms for analysis.

然而,該些影音內容中包含影像資料與文字資料,致使該些影音內容之檔案大小過於龐大,導致存取不易。 However, these audio and video contents include image data and text data, which makes the file size of these audio and video contents too large, making access difficult.

再者,該產業需求端雖可透過外掛軟體僅擷取該些影音內容中之文字資料,但習知外掛軟體對於該影音內容中之影像資料之背景干擾(noise)非常敏感,往往會受該背景干擾而影響辨識正確率。另一方面,習知外掛軟體於辨識文字時,需於對比度極高(如白底黑字或黑底白字)之條件下進行,否則辨識正確率極低。例如,若該影像資料中之標題背景呈現漸層顏色或其它圖樣,將導致辨識正確率急劇下降。 Furthermore, although the demand side can use plug-in software to capture only the text data in such audio and video content, conventional plug-in software is very sensitive to background interference (noise) in the image data of that content, and its recognition accuracy often suffers from such interference. Moreover, conventional plug-in software can only recognize text under extremely high-contrast conditions (such as black text on a white background or white text on a black background); otherwise the recognition accuracy is extremely low. For example, if the title background in the image data shows gradient colors or other patterns, the recognition accuracy will drop sharply.

因此,如何克服上述習知技術的種種問題,實已成目前亟欲解決的課題。 Therefore, how to overcome the various problems of the above-mentioned conventional technologies has become an urgent issue to be solved.

本發明提供一種線上文字辨識系統,係包括:收集模組,係接收影像資訊以產生文字截圖;以及辨識模組,係通訊連接該收集模組以接收該文字截圖,並進行辨識作業以獲取包含字串資料之目標資訊。 The present invention provides an online text recognition system, which includes: a collection module that receives image information to generate a text screenshot; and a recognition module communicatively connected to the collection module to receive the text screenshot and perform a recognition operation to obtain target information containing string data.

前述之文字辨識系統中,該收集模組係採用網路爬蟲型式,以自動搜尋及收集於網路上所公開之該影像資訊。 In the aforementioned text recognition system, the collection module adopts a web crawler mode to automatically search and collect the image information published on the Internet.

前述之文字辨識系統中,該收集模組係將該影像資訊進行截圖動作,以獲取初始截圖,且自該初始截圖中產生該文字截圖。 In the aforementioned text recognition system, the collection module performs a screenshot action on the image information to obtain an initial screenshot, and generates the text screenshot from the initial screenshot.

前述之文字辨識系統中,該辨識模組係為光學文字辨識形式之人工智慧模組。 In the aforementioned character recognition system, the recognition module is an artificial intelligence module in the form of optical character recognition.

前述之文字辨識系統中,復包括通訊連接該辨識模組以接收該目標資訊之彙整模組,係將該目標資訊整理成進階資訊,其中,該進階資訊係包含該字串資料與其所對應之參考資料。 The aforementioned text recognition system further includes a compilation module communicatively connected to the recognition module to receive the target information and organize it into advanced information, where the advanced information includes the string data and its corresponding reference data.

本發明復提供一種文字辨識方法,係包括:藉由收集模組進行收集作業,以將影像資訊輸入至該收集模組,使該收集模組產生文字截圖;以及藉由辨識模組進行辨識作業,使該辨識模組接收該文字截圖,以獲取包含字串資料之目標資訊。 The present invention further provides a text recognition method, which includes: performing a collection operation through a collection module by inputting image information into the collection module so that the collection module generates a text screenshot; and performing a recognition operation through a recognition module so that the recognition module receives the text screenshot to obtain target information including string data.

前述之文字辨識方法中,該收集作業係採用網路爬蟲型式,以自動搜尋及收集於網路上所公開之該影像資訊。 In the aforementioned text recognition method, the collection operation adopts a web crawler method to automatically search and collect the image information published on the Internet.

前述之文字辨識方法中,該收集作業係對該影像資訊進行截圖動作,以獲取初始截圖,且自該初始截圖中產生該文字截圖。 In the aforementioned text recognition method, the collection operation is to perform a screenshot action on the image information to obtain an initial screenshot, and generate the text screenshot from the initial screenshot.

前述之文字辨識方法中,該辨識模組係為光學文字辨識形式之人工智慧模組。 In the aforementioned text recognition method, the recognition module is an artificial intelligence module in the form of optical character recognition.

前述之文字辨識方法中,復包括藉由彙整模組接收該目標資訊,以將該目標資訊整理成進階資訊,其中,該進階資訊係包含該字串資料與其所對應之參考資料。 The aforementioned text recognition method further includes receiving the target information through a compilation module to organize the target information into advanced information, where the advanced information includes the string data and its corresponding reference data.

由上可知,本發明之文字辨識系統及方法,主要藉由該收集模組與該辨識模組之配合,以獲取所需之字串資料,供後續需求端應用,故相較於習知外掛軟體,本發明可有效獲取字串資料,而不受該影像資訊之背景干擾之影響,因而減少影響辨識正確率之機率,且本發明之收集模組可產生該文字截圖,再藉由該辨識模組從該文字截圖中直接辨識出該字串資料,因而可提高辨識正確率。 As can be seen from the above, the text recognition system and method of the present invention mainly obtain the required string data through the cooperation of the collection module and the recognition module for subsequent demand-side applications. Compared with conventional plug-in software, the present invention can therefore effectively acquire string data without being affected by background interference in the image information, reducing the chance that recognition accuracy is degraded; moreover, the collection module of the present invention generates the text screenshot and the recognition module recognizes the string data directly from that text screenshot, thereby improving recognition accuracy.

1:文字辨識系統 1: Text recognition system

1a:主機 1a:Host

10:收集模組 10:Collect modules

101:擷取模型 101: Retrieve model

102:處理模型 102: Processing models

103:暫存區 103: Temporary storage area

11:辨識模組 11:Identification module

111:訓練模型 111:Training model

112:分析模型 112:Analytical model

12:彙整模組 12: Compilation module

12a:資料庫 12a:Database

2:電子裝置 2: Electronic devices

A1,A2,A3:文字區塊 A1,A2,A3: text block

P:初始截圖 P:Initial screenshot

T1,T2,T3:文字截圖 T1, T2, T3: Text screenshots

S30~S33:步驟 S30~S33: steps

圖1係為本發明之文字辨識系統之架構示意圖。 Figure 1 is a schematic structural diagram of the text recognition system of the present invention.

圖2係為本發明之文字辨識系統之配置示意圖。 Figure 2 is a schematic configuration diagram of the character recognition system of the present invention.

圖3係為本發明之文字辨識方法之流程圖。 Figure 3 is a flow chart of the text recognition method of the present invention.

圖4A至圖4C係為本發明之文字辨識系統之收集模組於運作時之實施例示意圖。 4A to 4C are schematic diagrams of an embodiment of the collection module of the character recognition system of the present invention during operation.

須知,本說明書所附圖式所繪示之結構、比例、大小等,均僅用以配合說明書所揭示之內容,以供熟悉此技藝之人士之瞭解與閱讀,並非用以限定本發明可實施之限定條件,故不具技術上之實質意義,任何結構之修飾、比例關係之改變或大小之調整,在不影響本發明所能產生之功效及所能達成之目的下,均應仍落在本發明所揭示之技術內容得能涵蓋之範圍內。同時,本說明書中所引用之如「一」、「第一」、「第二」、「上」及「下」等之用語,亦僅為便於敘述之明瞭,而非用以限定本發明可實施之範圍,其相對關係之改變或調整,在無實質變更技術內容下,當視為本發明可實施之範疇。 It should be noted that the structures, proportions, sizes, etc. shown in the drawings attached to this specification are provided only to accompany the content disclosed herein, for the understanding and reading of those familiar with the art; they are not intended to limit the conditions under which the present invention can be implemented and therefore have no substantive technical significance. Any structural modification, change in proportion, or adjustment in size should still fall within the scope covered by the technical content disclosed herein, as long as it does not affect the effects the present invention can produce and the purposes it can achieve. Meanwhile, terms such as "a", "first", "second", "upper" and "lower" cited in this specification are only for clarity of description and are not intended to limit the implementable scope of the present invention; changes or adjustments of their relative relationships, without substantive changes to the technical content, shall also be regarded as within the implementable scope of the present invention.

圖1係為本發明之文字辨識系統1之架構示意圖。如圖1所示,所述之文字辨識系統1係包括:一收集模組10、一辨識模組11以及一彙整模組12。 Figure 1 is a schematic structural diagram of the character recognition system 1 of the present invention. As shown in FIG. 1 , the character recognition system 1 includes: a collection module 10 , a recognition module 11 and a collection module 12 .

於本實施例中,如圖2所示,該文字辨識系統1係配載於一主機1a中,如伺服器、雲端、具有各種處理器之電腦裝置或行動裝置等。 In this embodiment, as shown in Figure 2, the character recognition system 1 is installed in a host 1a, such as a server, a cloud, a computer device or a mobile device with various processors, etc.

所述之收集模組10係用以收集影像資訊,其中,該影像資訊係包含影像資料及參考資料(如影像來源、截圖時間、原始畫面或其它項目等)。 The collection module 10 is used to collect image information, where the image information includes image data and reference materials (such as image source, screenshot time, original screen or other items, etc.).

於本實施例中,該影像資訊係適用於網路傳輸格式。例如,該影像資訊係公開於網路平台上,且該網路平台可為網路社群平台、媒體、品牌主網站等公開網站,如臉書(Facebook)粉絲頁資料、Instagram商業帳號資料、LINE官方帳號資料、Youtube頻道資料、谷歌(Google)地圖商家資料、MOD數位影音資料等。 In this embodiment, the image information is suitable for network transmission format. For example, the image information is disclosed on an online platform, and the online platform can be an online social platform, media, brand main website and other public websites, such as Facebook fan page information, Instagram business account information, LINE official account information, Youtube channel information, Google Maps business information, MOD digital audio and video information, etc.

再者,該收集模組10係可採用網路爬蟲(Video Crawler)型式,以自動搜尋及節錄網路上所公開之影像資訊。例如,該收集模組10適用於直播畫面。 Furthermore, the collection module 10 can adopt a video crawler mode to automatically search and extract image information published on the Internet. For example, the collection module 10 is suitable for live broadcast images.

又,該收集模組10係包含一用以擷取畫面之擷取模型101及一用以預處理畫面之處理模型102。 In addition, the collection module 10 includes a capture model 101 for capturing images and a processing model 102 for preprocessing images.

所述之擷取模型101係用以擷取畫面,以當輸入如Youtube之影音網址或Facebook之直播網址等初始參數(甚至可輸入如華視、公視、三立、TVBS或其它新聞網名稱之初始參數)時,該擷取模型將輸出如新聞直播畫面之初始截圖。例如,該擷取模型101係使用python程式語言編程,如Selenium、cv2、numpy、collections、PIL或其它等相關程式套件。進一步,該擷取模型101係採用Selenium建構,其使用無頭瀏覽器(headless)設定為隱藏視窗,以節省系統資源,且於該文字辨識系統1啟動後,該擷取模型101會暫停一分鐘並等待廣告結束後,再以預定秒數截圖一次。 The capture model 101 is used to capture frames: when initial parameters such as a Youtube video URL or a Facebook live-broadcast URL are input (initial parameters such as the names of news networks like CTS, PTS, SET, TVBS or others may even be entered), the capture model outputs an initial screenshot such as a live news frame. For example, the capture model 101 is programmed in the python programming language with related program packages such as Selenium, cv2, numpy, collections, PIL or others. Further, the capture model 101 is built with Selenium and uses a headless browser configured as a hidden window to save system resources; after the text recognition system 1 is started, the capture model 101 pauses for one minute to wait for advertisements to end, and then takes a screenshot once every predetermined number of seconds.
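The patent publication contains no program code; the following is only a minimal Python sketch of how a capture model like the one described above could be arranged with Selenium's headless mode. The stream URL, the 60-second advertisement wait, the screenshot interval, and the output directory are illustrative assumptions, not values disclosed in the patent.

```python
# Sketch of the capture model (101): open a live-stream URL in a headless
# browser and save an initial screenshot every few seconds.
import os
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def capture_initial_screenshots(stream_url: str, out_dir: str = "temp_area",
                                interval_sec: int = 30, count: int = 10) -> None:
    os.makedirs(out_dir, exist_ok=True)
    options = Options()
    options.add_argument("--headless")             # hidden window to save system resources
    options.add_argument("--window-size=1920,1080")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(stream_url)
        time.sleep(60)                             # pause roughly one minute for pre-roll ads
        for i in range(count):
            driver.save_screenshot(f"{out_dir}/initial_{i:04d}.png")
            time.sleep(interval_sec)               # one screenshot per predetermined interval
    finally:
        driver.quit()

# Example (hypothetical URL):
# capture_initial_screenshots("https://www.youtube.com/watch?v=<live-id>")
```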

所述之處理模型102係用以提取感興趣之文字區塊(Extracting Text Region of Interest),以當輸入如新聞直播畫面之初始截圖時,該處理模型102將輸出如白底黑字形式之標題或副標題圖檔,供作為文字截圖。例如,該處理模型102於鎖定該新聞直播畫面中對應之標題及/或副標題的絕對位置後,進行該絕對位置處之文字區塊之截圖作業,且待調整該文字區塊之大小後,再將該文字區塊轉換成白底黑字之規格,供作為該文字截圖,以儲存至一如雲端資料庫之暫存區103。應可理解地,大多新聞標題區塊常因新聞主題而轉換背景顏色,故該處理模型102會視文字區塊之背景所使用之顏色豐富度,以準確將該文字區塊轉換至白底黑字規格之文字截圖。 The processing model 102 is used for extracting a text region of interest: when an initial screenshot such as a live news frame is input, the processing model 102 outputs an image file of the title or subtitle in black-text-on-white-background form, which serves as the text screenshot. For example, after locking onto the absolute position of the title and/or subtitle in the live news frame, the processing model 102 takes a screenshot of the text block at that absolute position, adjusts the size of the text block, and then converts the text block into the black-text-on-white-background format to serve as the text screenshot, which is stored in a temporary storage area 103 such as a cloud database. It should be understood that most news title blocks change their background color with the news topic, so the processing model 102 takes the color richness of the text block's background into account in order to accurately convert the text block into a text screenshot in the black-text-on-white-background format.
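A rough sketch of this region-extraction step using cv2 (OpenCV), which the text names among the program packages, is given below. The pixel coordinates of the title bar and the choice of Otsu thresholding are illustrative assumptions; the patent only states that the absolute position is locked in advance and that the block is converted to black text on a white background.

```python
# Sketch of the processing model (102): crop a fixed title region from an
# initial screenshot, enlarge it, and normalize it to black text on white.
import cv2

def extract_text_screenshot(initial_png: str, out_png: str,
                            region=(830, 960, 60, 1220)) -> None:
    y0, y1, x0, x1 = region                            # assumed absolute position of the title bar
    img = cv2.imread(initial_png)
    block = img[y0:y1, x0:x1]
    block = cv2.resize(block, None, fx=2.0, fy=2.0,
                       interpolation=cv2.INTER_CUBIC)  # adjust (enlarge) before recognition
    gray = cv2.cvtColor(block, cv2.COLOR_BGR2GRAY)
    # Otsu's threshold separates text from a colorful or gradient background.
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    if bw.mean() < 127:                                # mostly black => text came out white, so invert
        bw = cv2.bitwise_not(bw)
    cv2.imwrite(out_png, bw)
```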

所述之辨識模組11係通訊連接該收集模組10以接收該文字截圖,並進行辨識作業以獲取目標資訊。 The recognition module 11 communicates with the collection module 10 to receive the text screenshot, and performs recognition operations to obtain target information.

於本實施例中,該辨識模組11係為光學文字辨識(Optical Character Recognition,簡稱OCR)形式之人工智慧(artificial intelligence,簡稱AI)模組,其包含至少一訓練模型111及一分析模型112。 In this embodiment, the recognition module 11 is an artificial intelligence (AI) module in the form of Optical Character Recognition (OCR), which includes at least one training model 111 and an analysis model 112 .

所述之訓練模型111之建構係分為準備階段及訓練階段,且該準備階段係準備該訓練階段用之特徵檔案。 The construction of the training model 111 is divided into a preparation stage and a training stage, and the preparation stage prepares the feature files used in the training stage.

例如,該準備階段係輸入所有欲訓練之中文字型檔(如檔名為.tiff),以輸出該特徵檔案。於本實施例中,以第一開源工具(如jTessBoxEditorFX)將如word軟體之字型檔中之各字元區塊擷取出來,且合併成一張合成圖片,並利用第二開源工具(如Tesseract)生成暫存之box檔案,再藉由第三開源工具(如jTessBoxEditorFX)標示該合成圖片的訓練特徵範圍,之後儲存成人工智慧(AI)訓練用之特徵檔案。應可理解地,因為手動調整該合成圖片之框線需耗費大量時間,故此處tiff檔的生成係採用python程式語言生成各種字型/大小的單一文字tiff檔,再設定文字之位置,即可節省大量時間。 For example, the preparation stage takes as input all Chinese font files to be trained (e.g., files named .tiff) and outputs the feature file. In this embodiment, a first open-source tool (such as jTessBoxEditorFX) extracts each character block from a font file such as one used by word-processing software and merges them into a composite image; a second open-source tool (such as Tesseract) generates a temporary box file; and a third open-source tool (such as jTessBoxEditorFX) marks the training feature ranges of the composite image, which is then saved as a feature file for artificial intelligence (AI) training. It should be understood that manually adjusting the bounding boxes of the composite image is very time-consuming, so the tiff files here are generated with the python programming language as single-character tiff files in various fonts and sizes with preset text positions, which saves a great deal of time.
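One way the tiff generation described above could look in Python with PIL (also named among the program packages) is sketched below. The font paths, sizes, character set, and file-naming scheme are assumptions for illustration only.

```python
# Sketch: generate single-character training images in several fonts and sizes,
# so the bounding box of each character is known by construction.
import os
from PIL import Image, ImageDraw, ImageFont

def make_char_tiffs(chars, font_paths, sizes=(32, 48, 64), out_dir="train_tiffs"):
    os.makedirs(out_dir, exist_ok=True)
    for f_idx, font_path in enumerate(font_paths):
        for size in sizes:
            font = ImageFont.truetype(font_path, size)
            for ch in chars:
                canvas = Image.new("L", (size * 2, size * 2), color=255)      # white background
                draw = ImageDraw.Draw(canvas)
                draw.text((size // 2, size // 2), ch, fill=0, font=font)      # black glyph at a preset position
                canvas.save(f"{out_dir}/{ord(ch):05X}_{f_idx}_{size}.tiff")

# Example (hypothetical font file): make_char_tiffs("新聞直播", ["kaiu.ttf"], sizes=(48,))
```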

再者,該訓練階段之訓練方式係輸入該特徵檔案,以輸出該訓練模型。於本實施例中,採用該準備階段所生成的AI訓練用之特徵檔案,再使用AI文字辨識機器學習用之第四開源工具(如Tesseract套件),其核心演算法為深度學習中的長短期記憶(Long Short-Term Memory,簡稱LSTM)類神經演算法,以依不同字型與字型大小輸出成各種不同之訓練模型。 Furthermore, the training stage takes the feature file as input and outputs the training model. In this embodiment, the feature file for AI training generated in the preparation stage is used together with a fourth open-source tool for AI text-recognition machine learning (such as the Tesseract suite), whose core algorithm is the Long Short-Term Memory (LSTM) neural-network algorithm from deep learning, to output various training models according to different fonts and font sizes.

應可理解地,有關開源工具之種類繁多,可依需求選擇,並不限於上述,且可依需求使用相同開源工具,如第一與第三開源工具、或第二與第四開源工具,並無特別限制。 It should be understood that there are many kinds of open-source tools, which can be selected as needed and are not limited to the above; the same open-source tool can also be used for different roles as needed, such as the first and third open-source tools, or the second and fourth open-source tools, without particular limitation.

所述之分析模型112係輸入該訓練模型111與任一白底黑字型之文字截圖,再輸出包含字串資料(如文字檔,而非圖檔)之目標資訊。例如,採用第五開源工具(如Tesseract套件),將該文字截圖進行銳利化處理後,再傳輸至指定之訓練模型111進行辨識,使該第五開源工具(如Tesseract套件)輸出對應該文字截圖之字串資料,其中,該字串資料係包含對應該文字截圖中之所有文字。 The analysis model 112 takes as input the training model 111 and any black-text-on-white-background text screenshot, and outputs target information containing string data (i.e., a text file rather than an image file). For example, a fifth open-source tool (such as the Tesseract suite) sharpens the text screenshot and then passes it to the designated training model 111 for recognition, so that the fifth open-source tool outputs the string data corresponding to the text screenshot, where the string data contains all of the text in that text screenshot.
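As a sketch of this step, pytesseract (a common Python wrapper for Tesseract, not named in the patent) can sharpen the screenshot and run it against a chosen trained model. The traineddata name "news_tc" and the page-segmentation mode are assumptions.

```python
# Sketch of the analysis model (112): sharpen a black-on-white text screenshot
# and recognize it with Tesseract using a selected trained model.
from PIL import Image, ImageFilter
import pytesseract

def recognize_text_screenshot(text_png: str, model_name: str = "news_tc") -> str:
    img = Image.open(text_png).convert("L")
    img = img.filter(ImageFilter.SHARPEN)          # sharpen before recognition
    # --psm 7 treats the image as a single text line (a title or subtitle bar)
    return pytesseract.image_to_string(img, lang=model_name, config="--psm 7").strip()
```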

應可理解地,有關該人工智慧(AI)模組之種類及其建置方式繁多,並不限於上述。 It should be understood that there are many types of artificial intelligence (AI) modules and their construction methods, which are not limited to the above.

所述之彙整模組12係通訊連接該辨識模組11,以將該目標資訊整理成進階資訊。 The compilation module 12 is communicatively connected to the identification module 11 to organize the target information into advanced information.

於本實施例中,該彙整模組12係將該辨識模組11所得之字串資料與其所對應之影像來源、影片快照畫面(即截圖)、截圖時間(即該收集模組10擷取畫面之時間)及/或其它項目等參考資料(來自該暫存區103)整理成該進階資訊,如結構化的資料格式,以存放於一資料庫12a(如Google BigQuery、MySQL、ElasticSearch或其它常用類型)中,供後續進行相關應用(例如,當分析師在回溯畫面內容時,因內容皆已辨識為一般文字,可透過搜尋系統立即取得某一刻提及特定字眼的新聞畫面,同時達到影像知識保存目的)。 In this embodiment, the compilation module 12 organizes the string data obtained by the recognition module 11 together with its corresponding reference data (from the temporary storage area 103), such as the image source, the video snapshot (i.e., the screenshot), the screenshot time (i.e., the time at which the collection module 10 captured the frame) and/or other items, into the advanced information, for example a structured data format, which is stored in a database 12a (such as Google BigQuery, MySQL, ElasticSearch or other common types) for subsequent applications (for example, when an analyst reviews past frame content, since the content has already been recognized as plain text, the news frame that mentioned a particular word at a given moment can be retrieved immediately through a search system, which also serves the purpose of preserving image knowledge).
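The following sketch shows one way such a structured record could be stored. SQLite stands in here for the databases named in the text (BigQuery, MySQL, ElasticSearch); the table name and columns are illustrative assumptions.

```python
# Sketch of the compilation module (12): combine the recognized string data with
# its reference data into one structured record and persist it.
import sqlite3

def store_advanced_info(db_path: str, text: str, source_url: str,
                        shot_time: str, snapshot_path: str) -> None:
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS advanced_info (
                       text TEXT, source_url TEXT,
                       shot_time TEXT, snapshot_path TEXT)""")
    con.execute("INSERT INTO advanced_info VALUES (?, ?, ?, ?)",
                (text, source_url, shot_time, snapshot_path))
    con.commit()
    con.close()
```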

圖3係為本發明之文字辨識方法之流程圖。於本實施例中,該文字辨識方法係藉由該文字辨識系統1執行。 Figure 3 is a flow chart of the text recognition method of the present invention. In this embodiment, the character recognition method is executed by the character recognition system 1 .

於步驟S31中,使用者藉由電子裝置2(如圖2所示之智慧型手機、電腦或其它)啟動該文字辨識系統1,以藉由該收集模組10進行收集作業,令至少一影像資訊輸入至該收集模組10,使該收集模組10產生文字截圖。 In step S31, the user activates the text recognition system 1 through an electronic device 2 (a smartphone, computer or other device as shown in Figure 2) so that a collection operation is performed by the collection module 10, whereby at least one piece of image information is input into the collection module 10 and the collection module 10 generates a text screenshot.

於本實施例中,該收集作業係採用網路爬蟲(Video Crawler)方式,以令該收集模組10自動收集網路平台(如圖3所示之步驟S30之資料來源)上之影像資訊,使該收集作業可對該影像資訊進行截圖動作,以獲取初始截圖,且自該初始截圖中產生該文字截圖。例如,該影像資訊係為新聞直播影片,其包含複數連續畫面,故藉由該擷取模型101擷取多張初始截圖P(如圖4A所示之其中一張),並藉由該處理模型102提取至少一張初始截圖中之文字區塊A1,A2,A3(如圖4B所示),以獲取至少一文字截圖T1,T2,T3(如圖4C所示),再儲存至該暫存區103。 In this embodiment, the collection operation adopts a web crawler (Video Crawler) approach so that the collection module 10 automatically collects image information from network platforms (the data sources of step S30 shown in Figure 3); the collection operation takes screenshots of the image information to obtain initial screenshots, and generates the text screenshots from the initial screenshots. For example, the image information is a live news video containing a plurality of consecutive frames, so the capture model 101 captures multiple initial screenshots P (one of which is shown in Figure 4A), and the processing model 102 extracts the text blocks A1, A2, A3 in at least one initial screenshot (as shown in Figure 4B) to obtain at least one text screenshot T1, T2, T3 (as shown in Figure 4C), which is then stored in the temporary storage area 103.

於步驟S32中,藉由辨識模組11進行辨識作業,使該辨識模組11接收該暫存區103中之至少一文字截圖,以獲取包含字串資料之目標資訊。 In step S32, the recognition module 11 performs a recognition operation so that the recognition module 11 receives at least one text screenshot in the temporary storage area 103 to obtain target information including string data.

於本實施例中,藉由該辨識模組11同時將該文字截圖與該訓練模型111輸入置該分析模型112,以令該分析模型112提供對應該文字截圖之字串資料,如下表所示: In this embodiment, the recognition module 11 simultaneously inputs the text screenshot and the training model 111 into the analysis model 112, so that the analysis model 112 provides string data corresponding to the text screenshot, as shown in the following table :

[Table: example string data recognized from the text screenshots — reproduced in the original publication as image 111111157-A0101-12-0010-1]

再者,該辨識模組11之訓練模型111可同時接收該暫存區103中之文字截圖,以進行機器學習作業,如圖3所示之步驟S32a。例如,該機器學習作業係包含準備階段,如圖3所示之步驟S32b,以輸入所需之字型檔(如檔名為.tiff)。 Furthermore, the training model 111 of the recognition module 11 can simultaneously receive the text screenshots in the temporary storage area 103 to perform a machine learning operation, as shown in step S32a of Figure 3. For example, the machine learning operation includes a preparation stage, shown as step S32b in Figure 3, which takes the required font files (e.g., files named .tiff) as input.

於步驟S33中,藉由彙整模組12進行整編作業,以儲存該目標資訊。 In step S33, the compilation module 12 performs a compilation operation to store the target information.

於本實施例中,該彙整模組12係將該字串資料與其所對應之參考資料存放至一資料庫12a,以供後續進行應用。例如,將直播內容透過AI方式辨識擷取字幕內容、字幕發生時間與原始圖片,再一併存入該資料庫12a,使需求端(如品牌公關部門、媒體公關部門、公關公司、分析師或其它等)更易於透過步驟S34之搜尋引擎(如Google)快速摘錄特定時間範圍內的電視輿情內容,而不需額外打字人員幫忙,即可即時獲取內容。 In this embodiment, the compilation module 12 stores the string data and its corresponding reference data in a database 12a for subsequent applications. For example, the subtitle content, subtitle occurrence time and original pictures of live broadcasts are recognized and captured by AI and stored together in the database 12a, making it easier for the demand side (such as a brand public relations department, media public relations department, public relations company, analyst or others) to quickly extract TV public-opinion content within a specific time range through the search engine (such as Google) of step S34, obtaining the content immediately without the help of additional typists.
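A minimal sketch of this demand-side lookup, reusing the illustrative advanced_info table from the earlier SQLite sketch, might look as follows; the keyword and time-range format are assumptions.

```python
# Sketch: find every stored subtitle that mentions a keyword within a time range,
# together with its source and screenshot reference.
import sqlite3

def search_opinion(db_path: str, keyword: str, start: str, end: str):
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT text, source_url, shot_time, snapshot_path FROM advanced_info "
        "WHERE text LIKE ? AND shot_time BETWEEN ? AND ? ORDER BY shot_time",
        (f"%{keyword}%", start, end)).fetchall()
    con.close()
    return rows

# Example: search_opinion("media.db", "颱風", "2022-03-24 08:00", "2022-03-24 12:00")
```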

於另一應用中,需求端可開發快訊推播功能,以整合常用之Line與E-mail等推播工具,故當直播影片內容提及特定字詞時,於E-mail收件夾或Line聊天群組可即時收到對應之推播訊息(如包含螢幕畫面與對應來源說明等內容),需求端僅需確認推播訊息之正確性,即可針對該推播訊息之重要性進行後續應用。 In another application, the demand side can develop an alert-push function that integrates commonly used push tools such as Line and E-mail, so that when the live video content mentions a specific word, a corresponding push message (containing, for example, the screen image and a description of the source) is immediately received in the E-mail inbox or the Line chat group; the demand side only needs to confirm the correctness of the push message and can then follow up according to its importance.
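As a sketch of such an alert push, the fragment below checks a recognized subtitle against registered keywords and sends an e-mail; the SMTP server, addresses, and keyword list are illustrative assumptions, and a Line bot call could replace the e-mail step.

```python
# Sketch: push an alert when a recognized subtitle contains a watched keyword.
import smtplib
from email.message import EmailMessage

WATCH_WORDS = {"颱風", "停電"}   # hypothetical keywords registered by the demand side

def maybe_push_alert(text: str, source_url: str, snapshot_path: str) -> None:
    hits = [w for w in WATCH_WORDS if w in text]
    if not hits:
        return
    msg = EmailMessage()
    msg["Subject"] = f"Live-broadcast alert: {', '.join(hits)}"
    msg["From"] = "monitor@example.com"
    msg["To"] = "pr-team@example.com"
    msg.set_content(f"Subtitle: {text}\nSource: {source_url}\nScreenshot: {snapshot_path}")
    with smtplib.SMTP("smtp.example.com") as s:      # assumed SMTP relay
        s.send_message(msg)
```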

或者,將該目標資訊傳輸至該辨識模組11之訓練模型111,以進行機器學習作業。 Alternatively, the target information can be transmitted to the training model 111 of the recognition module 11 for machine learning.

因此,該資料庫12a中可儲存多組目標資訊,供需求端(如品牌媒體或公關公司)統計分析各時期之熱門話題,如下表所示: Therefore, multiple sets of target information can be stored in the database 12a for the demand side (such as brand media or public relations companies) to statistically analyze hot topics in each period, as shown in the following table:

[Table: example statistics of hot topics aggregated from the stored target information — reproduced in the original publication as image 111111157-A0101-12-0011-2]

綜上所述,本發明之文字辨識系統及方法,主要藉由該收集模組與該辨識模組之配合,以從Youtube直播、Facebook直播、數位電視訊號源或其它影片資料獲取所需之字串資料,供後續需求端應用,故相較於習知外掛軟體,本發明能有效獲取字串資料,而不受該影像資訊之背景干擾之影響,因而能減少影響辨識正確率之機率,且本發明之收集模組能產生文字截圖,再以辨識模組從該文字截圖中直接辨識出字串資料,因而能大幅提高辨識正確率。 To sum up, the text recognition system and method of the present invention mainly rely on the cooperation of the collection module and the recognition module to obtain the required string data from Youtube live broadcasts, Facebook live broadcasts, digital TV signal sources or other video data for subsequent demand-side applications. Compared with conventional plug-in software, the present invention can therefore effectively obtain string data without being affected by background interference in the image information, reducing the chance that recognition accuracy is degraded; and because the collection module generates text screenshots from which the recognition module recognizes the string data directly, recognition accuracy can be greatly improved.

因此,若需求端(如品牌公關部門、媒體公關部門、公關公司等)需即時掌握大眾輿情,藉由該文字辨識系統1所提供之目標資訊能更快掌握多台電視(或社群網路平台)之輿情。 Therefore, if the demand side (such as a brand public relations department, media public relations department, public relations company, etc.) needs to grasp public opinion in real time, the target information provided by the text recognition system 1 allows the public opinion of multiple TV channels (or social network platforms) to be grasped more quickly.

再者,將直播內容透過AI技術之辨識模組獲取字幕內容(字串資料),且將該字串資料與字幕發生時間(截圖時間)與原始畫面(或圖片)一併存入該資料庫,使需求端之分析師更易於透過搜尋引擎(如Google)快速摘錄特定時間範圍內的大眾輿情與相關截圖。 Furthermore, the subtitle content (string data) of the live broadcast is obtained through the AI-based recognition module, and the string data is stored in the database together with the subtitle occurrence time (screenshot time) and the original frame (or picture), making it easier for demand-side analysts to quickly extract public opinion and related screenshots within a specific time range through a search engine (such as Google).

又,本發明之文字辨識系統之辨識成果可用於使用者預先註冊追蹤特定字詞或特定輿情快訊,當該文字辨識系統擷取的內容包含特定字詞或特定輿情快訊時,本發明之文字辨識系統之辨識成果可結合推播系統(如Line、E-mail或其它通訊方式等)告知使用者,以令使用者(或企業)能以更高效率掌握多家新聞媒體輿情,有效將反應時間縮減至分鐘內。 In addition, the recognition results of the text recognition system of the present invention allow a user to register in advance to track specific words or specific public-opinion alerts; when the content captured by the text recognition system contains such words or alerts, the recognition results can be combined with a push system (such as Line, E-mail or other communication methods) to notify the user, so that the user (or enterprise) can grasp the public opinion of multiple news media more efficiently, effectively reducing the response time to within minutes.

另外,當該進階資訊(或字串資料)累積一段時間後,需求端可整合如自然語言處理等之分類技術進行統計分析,例如,分析過往各主題於各新聞媒體直播的秒數與出現頻率。進一步,使用者可同時監看多方的Youtube影音或其它來源等之新聞頻道,而不需額外雇用多位監播人員觀看,以利於提供24小時不間斷監播之服務,使需求端(如公關團隊)能更輕鬆且即時掌握所需之輿情。 In addition, after the advanced information (or string data) has accumulated for a period of time, the demand side can integrate classification techniques such as natural language processing for statistical analysis, for example, analyzing the duration in seconds and the frequency with which each topic appeared in past live broadcasts of various news media. Further, a user can simultaneously monitor news channels from multiple Youtube video sources or other sources without hiring additional monitoring personnel, which facilitates a 24-hour uninterrupted monitoring service and allows the demand side (such as a public relations team) to grasp the required public opinion more easily and in real time.
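A trivial sketch of the kind of statistical analysis mentioned here is shown below, counting how often each watched topic appears in the accumulated string data; the topic list and simple substring matching are assumptions, and an NLP classifier could replace them in practice.

```python
# Sketch: count topic occurrences across accumulated subtitle strings.
from collections import Counter

def topic_frequency(texts, topics=("選舉", "疫情", "颱風")):
    counts = Counter()
    for text in texts:
        for topic in topics:
            if topic in text:
                counts[topic] += 1
    return counts
```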

上述實施形態僅例示性說明本發明之原理、特點及其功效,並非用以限制本發明之可實施範疇,任何熟習此項技藝之人士均可在不違背本發明之精神及範疇下,對上述實施形態進行修飾與改變。任何運用本發明所揭示內容而完成之等效改變及修飾,均仍應為申請專利範圍所涵蓋。因此,本發明之權利保護範圍,應如申請專利範圍所列。 The above embodiments are merely illustrative of the principles, features and effects of the present invention and are not intended to limit its implementable scope; anyone skilled in the art may modify and change the above embodiments without departing from the spirit and scope of the present invention. Any equivalent changes and modifications made using the content disclosed herein shall still be covered by the scope of the claims. Therefore, the scope of protection of the present invention shall be as set forth in the claims.


Claims (10)

1. 一種文字辨識系統,係包括:收集模組,係接收影像資訊以產生文字截圖;以及辨識模組,係通訊連接該收集模組以接收該文字截圖,以對該文字截圖進行辨識作業,俾獲取包含字串資料之目標資訊。 A text recognition system, comprising: a collection module that receives image information to generate a text screenshot; and a recognition module communicatively connected to the collection module to receive the text screenshot and perform a recognition operation on it, so as to obtain target information including string data.

2. 如請求項1所述之文字辨識系統,其中,該收集模組係採用網路爬蟲型式,以自動搜尋及收集於網路上所公開之該影像資訊。 The text recognition system of claim 1, wherein the collection module adopts a web crawler mode to automatically search for and collect the image information disclosed on the Internet.

3. 如請求項1所述之文字辨識系統,其中,該收集模組係將該影像資訊進行截圖動作,以獲取初始截圖,且自該初始截圖中產生該文字截圖。 The text recognition system of claim 1, wherein the collection module takes a screenshot of the image information to obtain an initial screenshot, and generates the text screenshot from the initial screenshot.

4. 如請求項1所述之文字辨識系統,其中,該辨識模組係為光學文字辨識形式之人工智慧模組。 The text recognition system of claim 1, wherein the recognition module is an artificial intelligence module in the form of optical character recognition.

5. 如請求項1所述之文字辨識系統,復包括通訊連接該辨識模組以接收該目標資訊之彙整模組,係將該目標資訊整理成進階資訊,其中,該進階資訊係包含該字串資料與其所對應之參考資料。 The text recognition system of claim 1, further comprising a compilation module communicatively connected to the recognition module to receive the target information and organize it into advanced information, wherein the advanced information includes the string data and its corresponding reference data.

6. 一種文字辨識方法,係包括:藉由收集模組進行收集作業,將影像資訊輸入至該收集模組,使該收集模組產生文字截圖;以及由辨識模組接收該文字截圖以進行辨識作業,俾獲取包含字串資料之目標資訊。 A text recognition method, comprising: performing a collection operation through a collection module by inputting image information into the collection module so that the collection module generates a text screenshot; and receiving the text screenshot by a recognition module to perform a recognition operation, so as to obtain target information including string data.

7. 如請求項6所述之文字辨識方法,其中,該收集作業係採用網路爬蟲型式,以自動搜尋及收集於網路上所公開之該影像資訊。 The text recognition method of claim 6, wherein the collection operation adopts a web crawler mode to automatically search for and collect the image information disclosed on the Internet.

8. 如請求項6所述之文字辨識方法,其中,該收集作業係對該影像資訊進行截圖動作,以獲取初始截圖,且自該初始截圖中產生該文字截圖。 The text recognition method of claim 6, wherein the collection operation takes a screenshot of the image information to obtain an initial screenshot, and generates the text screenshot from the initial screenshot.

9. 如請求項6所述之文字辨識方法,其中,該辨識模組係為光學文字辨識形式之人工智慧模組。 The text recognition method of claim 6, wherein the recognition module is an artificial intelligence module in the form of optical character recognition.

10. 如請求項6所述之文字辨識方法,復包括藉由彙整模組接收該目標資訊,以將該目標資訊整理成進階資訊,其中,該進階資訊係包含該字串資料與其所對應之參考資料。 The text recognition method of claim 6, further comprising receiving the target information through a compilation module to organize the target information into advanced information, wherein the advanced information includes the string data and its corresponding reference data.
TW111111157A 2022-03-24 2022-03-24 Word recognition system and word recognition method TWI792957B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111111157A TWI792957B (en) 2022-03-24 2022-03-24 Word recognition system and word recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW111111157A TWI792957B (en) 2022-03-24 2022-03-24 Word recognition system and word recognition method

Publications (2)

Publication Number Publication Date
TWI792957B TWI792957B (en) 2023-02-11
TW202338664A true TW202338664A (en) 2023-10-01

Family

ID=86689184

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111111157A TWI792957B (en) 2022-03-24 2022-03-24 Word recognition system and word recognition method

Country Status (1)

Country Link
TW (1) TWI792957B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582163A (en) * 2017-09-29 2019-04-05 神讯电脑(昆山)有限公司 The intercept method of area image

Also Published As

Publication number Publication date
TWI792957B (en) 2023-02-11

Similar Documents

Publication Publication Date Title
CN110020437B (en) Emotion analysis and visualization method combining video and barrage
CN1477590B (en) System and method for white writing board and voice frequency catching
CN108462888B (en) Intelligent correlation analysis method and system for user television and internet behavior
CN109684513B (en) Low-quality video identification method and device
CN106331778A (en) Video recommendation method and device
US20130097644A1 (en) Generation and Consumption of Discrete Segments of Digital Media
CN109408672B (en) Article generation method, article generation device, server and storage medium
CN105786969A (en) Information display method and apparatus
CN113010711B (en) Method and system for automatically generating movie poster based on deep learning
CN107679227A (en) Video index label setting method, device and server
CN113038153A (en) Financial live broadcast violation detection method, device and equipment and readable storage medium
DE102017125474A1 (en) CONTEXTUAL COMMENTING OF INQUIRIES
KR101618084B1 (en) Method and apparatus for managing minutes
CN104881428A (en) Information graph extracting and retrieving method and device for information graph webpages
CN113094512A (en) Fault analysis system and method in industrial production and manufacturing
CN106708963B (en) Website editor article entry method and system in artificial intelligence mode
CN115134668A (en) OTT-based family member age group and family structure dividing method and device
KR101088787B1 (en) Issue Analyzing System and Issue Data Generation Method
CN111163366B (en) Video processing method and terminal
TWI792957B (en) Word recognition system and word recognition method
CN109086440B (en) Knowledge extraction method and system
TWM631021U (en) Word recognition equipment
CN115565193A (en) Questionnaire information input method and device, electronic equipment and storage medium
JP2019160259A (en) Information presentation device, information presentation system, information presentation method, and program
JP4755122B2 (en) Image dictionary generation method, apparatus, and program