TWM631021U - Word recognition equipment

Word recognition equipment

Info

Publication number
TWM631021U
Authority
TW
Taiwan
Prior art keywords
text
screenshot
module
recognition device
collection module
Prior art date
Application number
TW111202960U
Other languages
Chinese (zh)
Inventor
周世恩
楊雅汝
Original Assignee
多利曼股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 多利曼股份有限公司
Priority to TW111202960U priority Critical patent/TWM631021U/en
Publication of TWM631021U publication Critical patent/TWM631021U/en


Abstract

Word recognition equipment is provided. Image information is input to a collection module, which generates a text screenshot; a recognition module receives the text screenshot and obtains character string data without being affected by background interference in the image information, thereby reducing the probability that such interference degrades recognition accuracy.

Description

Text recognition device

The present utility model provides a text recognition technology, and in particular a text recognition device suitable for online video content.

With the rise of Internet platforms, ordinary users can easily obtain the audio-visual content they need through them. For example, live-streaming systems have matured in recent years: in addition to broadcasting over traditional channels, TV stations now integrate online social platforms such as YouTube and Facebook to play their audio-visual content through those platforms' live-streaming tools, and even some online media outlets have tried to introduce live-broadcast studios.

Because online information is so abundant, related industries increasingly rely on it. For example, if a demand-side industry such as a brand's media team or a public relations firm needs to grasp social-media sentiment in real time, it must capture and analyze the audio-visual content on these online social platforms.

However, such audio-visual content contains both image data and text data, so the files are very large and difficult to access.

Furthermore, although the demand side can use plug-in software to extract only the text data from the audio-visual content, conventional plug-ins are very sensitive to background noise in the image data, and their recognition accuracy often suffers from that interference. Moreover, conventional plug-ins can only recognize text under conditions of extremely high contrast (such as black text on a white background or white text on a black background); otherwise the recognition accuracy is very low. For example, if the title background in the image data uses a color gradient or another pattern, the recognition accuracy drops sharply.

Therefore, how to overcome the above problems of the prior art has become an urgent issue.

The present utility model provides an online text recognition device, comprising: a host; a collection module deployed in the host, which receives image information to generate a text screenshot; and a recognition module deployed in the host and communicatively connected to the collection module, which receives the text screenshot and performs a recognition operation to obtain target information containing character string data.

In the aforementioned text recognition device, the collection module is implemented as a web crawler that automatically searches for and collects image information published on the Internet.

In the aforementioned text recognition device, the collection module takes a screenshot of the image information to obtain an initial screenshot, and generates the text screenshot from that initial screenshot.

In the aforementioned text recognition device, the recognition module is an artificial intelligence module in the form of optical character recognition.

The aforementioned text recognition device further comprises an aggregation module deployed in the host and communicatively connected to the recognition module to receive the target information; it organizes the target information into advanced information, where the advanced information contains the character string data together with its corresponding reference data.

As can be seen from the above, the text recognition device of the present utility model relies mainly on the cooperation of the collection module and the recognition module to obtain the required character string data for subsequent demand-side applications. Compared with conventional plug-in software, it can obtain the character string data effectively without being affected by background interference in the image information, reducing the probability that such interference degrades recognition accuracy. In addition, the collection module generates the text screenshot and the recognition module recognizes the character string data directly from it, which further improves the recognition accuracy.

1: text recognition device

1a: host

10: collection module

101: capture model

102: processing model

103: temporary storage area

11: recognition module

111: training model

112: analysis model

12: aggregation module

12a: database

2: electronic device

A1, A2, A3: text blocks

P: initial screenshot

T1, T2, T3: text screenshots

S30~S33: steps

FIG. 1 is a schematic diagram of the architecture of the text recognition device of the present utility model.

FIG. 2 is a schematic diagram of the configuration of the text recognition device of the present utility model.

FIG. 3 is a flowchart of the recognition method of the text recognition device of the present utility model.

FIG. 4A to FIG. 4C are schematic diagrams of an embodiment of the collection module of the text recognition device in operation.

It should be noted that the structures, proportions, and sizes shown in the drawings of this specification are provided only to accompany the disclosed content for the understanding of those skilled in the art; they are not intended to limit the conditions under which the present utility model may be practiced and thus carry no technical significance. Any modification of structure, change of proportion, or adjustment of size that does not affect the effects and objectives achievable by this utility model shall still fall within the scope covered by the disclosed technical content. Likewise, terms such as "a", "first", "second", "upper", and "lower" cited in this specification are used only for clarity of description and not to limit the practicable scope; changes or adjustments of their relative relationships, without substantial change to the technical content, shall also be regarded as within the practicable scope of the present utility model.

FIG. 1 is a schematic diagram of the architecture of the text recognition device 1 of the present utility model. As shown in FIG. 1, the text recognition device 1 comprises a host 1a, a collection module 10, a recognition module 11, and an aggregation module 12.

In this embodiment, as shown in FIG. 2, the collection module 10, the recognition module 11, and the aggregation module 12 are deployed in the host 1a, and the host 1a may be a server, a cloud service, or a computer or mobile device equipped with any of various processors.

The collection module 10 is used to collect image information, where the image information contains image data and reference data (such as the image source, screenshot time, original frame, or other items).

In this embodiment, the image information is in a format suitable for network transmission. For example, the image information is published on an online platform, which may be a public website such as an online social platform, a media outlet, or a brand's main website, e.g. Facebook fan-page data, Instagram business-account data, LINE official-account data, YouTube channel data, Google Maps business data, or MOD digital audio-visual data.

Furthermore, the collection module 10 may be implemented as a web crawler (video crawler) that automatically searches for and excerpts image information published on the Internet. For example, the collection module 10 is suitable for live-broadcast frames.

The collection module 10 also contains a capture model 101 for capturing frames and a processing model 102 for preprocessing them.

The capture model 101 captures frames: given initial parameters such as a YouTube video URL or a Facebook live-stream URL (or even the name of a news network such as CTS, PTS, SET, or TVBS), it outputs an initial screenshot of, for example, a live news frame. For example, the capture model 101 is programmed in Python using packages such as Selenium, cv2, numpy, collections, and PIL. Further, the capture model 101 is built with Selenium configured as a headless browser with a hidden window to save system resources; after the text recognition device 1 starts, the capture model 101 pauses for one minute to wait for advertisements to finish, and then takes a screenshot every predetermined number of seconds.
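The capture cadence described above (pause while the pre-roll advertisement plays, then screenshot at a fixed interval) can be sketched in Python. This is a minimal illustration, not the patent's actual code: the function name `run_capture_loop` and its parameters are hypothetical, and the screenshot callable is injected so that, with Selenium, it could be a headless driver's `get_screenshot_as_png` method.

```python
import time
from typing import Callable, List

def run_capture_loop(take_screenshot: Callable[[], bytes],
                     ad_wait_s: float = 60.0,
                     interval_s: float = 5.0,
                     max_shots: int = 3,
                     sleep: Callable[[float], None] = time.sleep) -> List[bytes]:
    """Wait out the pre-roll ad, then grab a screenshot every interval_s seconds.

    take_screenshot and sleep are injected so the loop can be driven by any
    backend (e.g. a headless Selenium driver's get_screenshot_as_png).
    """
    shots: List[bytes] = []
    sleep(ad_wait_s)                  # pause (one minute by default) while the ad plays
    for _ in range(max_shots):
        shots.append(take_screenshot())
        sleep(interval_s)             # fixed screenshot cadence
    return shots
```

In real use `take_screenshot` would be bound to the headless browser session and `max_shots` replaced by a stop condition tied to the stream's lifetime; both are assumptions here.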

The processing model 102 extracts text regions of interest: given an initial screenshot of, for example, a live news frame, it outputs a title or subtitle image in black-on-white form to serve as the text screenshot. For example, after locating the absolute position of the corresponding title and/or subtitle in the live news frame, the processing model 102 takes a screenshot of the text block at that position, resizes the block, and converts it to a black-text-on-white-background format to serve as the text screenshot, which is stored in a temporary storage area 103 such as a cloud database. It should be understood that most news title blocks change their background color with the news topic, so the processing model 102 takes the color richness of the block's background into account in order to convert the block accurately into a black-on-white text screenshot.
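The black-on-white normalization can be illustrated with a minimal pure-Python sketch. The patent does not disclose a concrete algorithm, so `to_black_on_white`, the fixed threshold, and the mean-based inversion heuristic are all assumptions for illustration: crop the title block at its absolute position, binarize it, and invert if the background came out dark.

```python
from statistics import mean

def to_black_on_white(gray, top, left, height, width, thresh=128):
    """Crop the title block at a fixed (absolute) position and normalize it
    to black text on a white background, regardless of the original colors.

    gray is a row-major list of lists of 0-255 grayscale values.
    """
    block = [row[left:left + width] for row in gray[top:top + height]]
    # Binarize: anything darker than thresh -> 0 (black), else 255 (white).
    binary = [[0 if px < thresh else 255 for px in row] for row in block]
    # If most pixels came out black, the background was dark and the text
    # light, so invert to reach the black-on-white spec the OCR stage expects.
    flat = [px for row in binary for px in row]
    if mean(flat) < 128:
        binary = [[255 - px for px in row] for row in binary]
    return binary
```

A production version would work on real image arrays (e.g. cv2/numpy, which the capture model already imports) and would need a smarter heuristic for the gradient backgrounds mentioned above.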

The recognition module 11 is communicatively connected to the collection module 10 to receive the text screenshot, and performs a recognition operation to obtain target information.

In this embodiment, the recognition module 11 is an artificial intelligence (AI) module in the form of optical character recognition (OCR), containing at least one training model 111 and one analysis model 112.

The construction of the training model 111 is divided into a preparation stage and a training stage; the preparation stage prepares the feature files used by the training stage.

For example, the preparation stage takes all Chinese font files to be trained (e.g. .tiff files) as input and outputs the feature files. In this embodiment, a first open-source tool (such as jTessBoxEditorFX) extracts each character block from a font file (e.g. from word-processing software) and merges them into a composite image; a second open-source tool (such as Tesseract) generates a temporary box file; and a third open-source tool (such as jTessBoxEditorFX) marks the training feature ranges of the composite image, which are then saved as feature files for artificial intelligence (AI) training. It should be understood that manually adjusting the bounding boxes of the composite image is very time-consuming, so the .tiff files here are generated with Python: single-character .tiff files of various fonts and sizes are produced and the position of each character is set programmatically, saving a great deal of time.
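The time-saving idea above (characters rendered at known grid positions, so their bounding boxes never need manual correction in jTessBoxEditorFX) can be sketched as a generator of Tesseract `.box` lines, whose format is `char left bottom right top page` with a bottom-left origin. The function name, margin, and grid layout are illustrative assumptions; the actual rendering of the .tiff itself (e.g. with PIL) is omitted.

```python
def box_lines(chars, cell_w, cell_h, img_h, cols, margin=2, page=0):
    """Emit Tesseract .box lines for characters laid out on a fixed grid.

    Because the script that renders the training .tiff also chose each
    character's cell, the bounding boxes are known exactly and need no
    manual adjustment. Box coordinates use Tesseract's bottom-left origin:
    "char left bottom right top page".
    """
    lines = []
    for i, ch in enumerate(chars):
        col, row = i % cols, i // cols
        left = col * cell_w + margin
        right = (col + 1) * cell_w - margin
        top_px = row * cell_h + margin           # top edge, image (top-left) origin
        bottom_px = (row + 1) * cell_h - margin  # bottom edge, image origin
        bottom = img_h - bottom_px               # flip to bottom-left origin
        top = img_h - top_px
        lines.append(f"{ch} {left} {bottom} {right} {top} {page}")
    return lines
```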

The training stage takes the feature files as input and outputs the training model. In this embodiment, the feature files generated in the preparation stage are fed into a fourth open-source tool for AI text-recognition machine learning (such as the Tesseract suite), whose core algorithm is the Long Short-Term Memory (LSTM) neural network from deep learning, to output different training models for different fonts and font sizes.

It should be understood that there are many kinds of open-source tools; they can be chosen as needed and are not limited to the above, and the same tool may serve more than one role (e.g. as both the first and third, or the second and fourth), without particular restriction.

The analysis model 112 takes the training model 111 and any black-on-white text screenshot as input, and outputs target information containing character string data (a text file rather than an image file). For example, a fifth open-source tool (such as the Tesseract suite) sharpens the text screenshot and then passes it to the designated training model 111 for recognition, so that the tool outputs the character string data corresponding to the text screenshot, where the string data contains all the text in the screenshot.
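The sharpening pre-step can be illustrated with a minimal pure-Python 3x3 sharpening convolution. This is an assumption standing in for whatever filter the fifth open-source tool applies; in practice one might simply call PIL's `ImageFilter.SHARPEN` before handing the image to the OCR engine.

```python
def sharpen(gray):
    """Apply a 3x3 sharpening kernel (center 5, cross neighbors -1) to a
    grayscale image before OCR; border pixels are copied unchanged.

    gray is a row-major list of lists of 0-255 values.
    """
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = (5 * gray[y][x]
                 - gray[y - 1][x] - gray[y + 1][x]
                 - gray[y][x - 1] - gray[y][x + 1])
            out[y][x] = max(0, min(255, v))  # clamp to the valid 0-255 range
    return out
```

The sharpened image would then go to the recognizer, e.g. a hypothetical `image_to_string(img, model=...)` call against the custom training model.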

It should be understood that there are many kinds of artificial intelligence (AI) modules and ways to build them, not limited to the above.

The aggregation module 12 is communicatively connected to the recognition module 11 to organize the target information into advanced information.

In this embodiment, the aggregation module 12 organizes the character string data obtained by the recognition module 11, together with its corresponding reference data from the temporary storage area 103 (the image source, the video snapshot (i.e. the screenshot), the screenshot time (i.e. when the collection module 10 captured the frame), and/or other items), into the advanced information, such as a structured data format, and stores it in a database 12a (such as Google BigQuery, MySQL, ElasticSearch, or another common type) for subsequent applications. For example, when an analyst reviews past frames, because the content has already been recognized as plain text, a search system can immediately retrieve the news frame that mentioned a particular word at a particular moment, which also serves the purpose of preserving image knowledge.
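The "structured data format" and the retrieval it enables can be sketched as follows; the record fields and the `search` function are illustrative assumptions, standing in for a real BigQuery/MySQL/ElasticSearch schema and query.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class TargetRecord:
    text: str                 # recognized character string data
    source: str               # image source, e.g. channel name or URL
    screenshot_time: datetime # when the collection module captured the frame
    snapshot_path: str        # where the original frame snapshot was kept

def search(records: List[TargetRecord], keyword: str,
           start: datetime, end: datetime) -> List[TargetRecord]:
    """Return records whose recognized text mentions keyword within a time
    window: the 'find the news frame that said X at moment T' query."""
    return [r for r in records
            if keyword in r.text and start <= r.screenshot_time <= end]
```

A database-backed version would express the same query in SQL or an ElasticSearch filter; this in-memory form only shows the record shape.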

FIG. 3 is a flowchart of the recognition method of the text recognition device 1 of the present utility model.

In step S31, the user activates the text recognition device 1 through an electronic device 2 (a smartphone, computer, or other device, as shown in FIG. 2) so that the collection module 10 performs the collection operation: at least one piece of image information is input to the collection module 10, which generates a text screenshot.

In this embodiment, the collection operation uses a web (video) crawler so that the collection module 10 automatically collects image information from online platforms (the data sources of step S30 in FIG. 3); the collection operation takes screenshots of the image information to obtain initial screenshots and generates the text screenshots from them. For example, the image information is a live news video containing a plurality of consecutive frames, so the capture model 101 captures several initial screenshots P (one of which is shown in FIG. 4A), and the processing model 102 extracts the text blocks A1, A2, A3 from at least one initial screenshot (as shown in FIG. 4B) to obtain at least one text screenshot T1, T2, T3 (as shown in FIG. 4C), which is then stored in the temporary storage area 103.

In step S32, the recognition module 11 performs the recognition operation: it receives at least one text screenshot from the temporary storage area 103 to obtain target information containing character string data.

In this embodiment, the recognition module 11 inputs the text screenshot together with the training model 111 into the analysis model 112, so that the analysis model 112 provides the character string data corresponding to the text screenshot, as shown in the table below:

(table image: Figure 111202960-A0101-12-0008-1)

Furthermore, the training model 111 of the recognition module 11 can also receive the text screenshots from the temporary storage area 103 for machine learning, as in step S32a of FIG. 3. For example, the machine learning operation includes a preparation stage, step S32b of FIG. 3, for inputting the required font files (e.g. .tiff files).

In step S33, the aggregation module 12 performs the compilation operation to store the target information.

In this embodiment, the aggregation module 12 stores the character string data and its corresponding reference data in a database 12a for subsequent applications. For example, AI recognition extracts the subtitle content, the time the subtitle appeared, and the original picture from the live-broadcast content and stores them together in the database 12a, so that the demand side (a brand PR department, media PR department, PR firm, analyst, etc.) can more easily use the search engine of step S34 (e.g. Google) to quickly excerpt televised public-opinion content within a specific time range, obtaining it in real time without the help of additional typing staff.

In another application, the demand side can develop an alert push function integrating common push tools such as LINE and e-mail: when the live video content mentions a specific word, a corresponding push message (including the screen frame and the corresponding source description) arrives in the e-mail inbox or LINE chat group in real time; the demand side only needs to confirm the correctness of the push message before taking follow-up action according to its importance.

Alternatively, the target information is transmitted to the training model 111 of the recognition module 11 for machine learning.

Therefore, the database 12a can store multiple sets of target information for the demand side (such as brand media or PR firms) to statistically analyze the hot topics of each period, as shown in the table below:

(table image: Figure 111202960-A0101-12-0010-2)

In summary, the text recognition device and method of the present utility model rely mainly on the cooperation of the collection module and the recognition module to obtain the required character string data from YouTube live streams, Facebook live streams, digital TV signal sources, or other video material for subsequent demand-side applications. Compared with conventional plug-in software, this utility model can obtain character string data effectively without being affected by background interference in the image information, reducing the probability that such interference degrades recognition accuracy; moreover, the collection module generates text screenshots and the recognition module recognizes the character string data directly from them, which greatly improves the recognition accuracy.

Therefore, if the demand side (a brand PR department, media PR department, PR firm, etc.) needs to grasp public opinion in real time, the target information provided by the text recognition device 1 lets it grasp the sentiment of multiple TV channels (or social platforms) more quickly.

Furthermore, the subtitle content (character string data) of live broadcasts is obtained through the AI recognition module, and the string data is stored in the database together with the subtitle time (screenshot time) and the original frame (or picture), making it easier for demand-side analysts to quickly excerpt public opinion and related screenshots within a specific time range through a search engine (such as Google).

In addition, the recognition results of this text recognition device can serve users who pre-register to track specific words or public-opinion alerts: when the captured content contains such a word or alert, the results can be combined with a push system (LINE, e-mail, or other communication channels) to notify the user, so that the user (or enterprise) can monitor the sentiment of multiple news media more efficiently, effectively cutting the response time to within minutes.

Moreover, after the advanced information (or string data) has accumulated for a period of time, the demand side can apply classification techniques such as natural language processing for statistical analysis, for example analyzing how many seconds and how often each topic appeared in each news outlet's live broadcasts. Further, a user can monitor news channels from multiple YouTube or other sources simultaneously without hiring additional monitoring staff, facilitating a 24-hour uninterrupted monitoring service so that the demand side (such as a PR team) can grasp the required public opinion more easily and in real time.

The above embodiments only illustrate the principles, features, and effects of the present utility model and are not intended to limit its practicable scope; anyone skilled in the art may modify and change the above embodiments without departing from its spirit and scope. Any equivalent change or modification accomplished using the content disclosed herein shall still be covered by the scope of the patent claims. Therefore, the scope of protection of this utility model shall be as listed in the claims.


Claims (5)

1. A text recognition device, comprising: a host; a collection module, deployed in the host, which receives image information to generate a text screenshot; and a recognition module, deployed in the host and communicatively connected to the collection module, which receives the text screenshot and performs a recognition operation on it to obtain target information containing character string data.

2. The text recognition device of claim 1, wherein the collection module is implemented as a web crawler to automatically search for and collect the image information published on the Internet.

3. The text recognition device of claim 1, wherein the collection module takes a screenshot of the image information to obtain an initial screenshot, and generates the text screenshot from the initial screenshot.

4. The text recognition device of claim 1, wherein the recognition module is an artificial intelligence module in the form of optical character recognition.

5. The text recognition device of claim 1, further comprising an aggregation module, deployed in the host and communicatively connected to the recognition module to receive the target information, which organizes the target information into advanced information, wherein the advanced information contains the character string data and its corresponding reference data.
TW111202960U 2022-03-24 2022-03-24 Word recognition equipment TWM631021U (en)


Publications (1)

Publication Number Publication Date
TWM631021U true TWM631021U (en) 2022-08-21

Family

ID=83783916



Legal Events

GD4K: Issue of patent certificate for granted utility model filed before June 30, 2004