TWM618756U - Image recognition system - Google Patents

Image recognition system

Info

Publication number
TWM618756U
Authority
TW
Taiwan
Prior art keywords
image
data
module
recognition
graphic
Prior art date
Application number
TW110208671U
Other languages
Chinese (zh)
Inventor
張天豪
林易瑩
黃建勛
Original Assignee
永豐金融控股股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 永豐金融控股股份有限公司
Priority to TW110208671U
Publication of TWM618756U

Abstract

The present utility model proposes a novel image recognition system comprising the following modules. (i) A control module that controls the operation of the image recognition system. (ii) An image preprocessing module, coupled to the control module, that adjusts the image quality; the image preprocessing module further contains a positioning unit that locates the graphic coordinates within the image. (iii) A recognition module, coupled to the image preprocessing module, that adjusts the graphic area to be processed in the image via the graphic coordinates and outputs recognition results according to that graphic area.

Description

Image recognition system

This utility model relates to an image recognition system and, more particularly, to a recognition system in which a recognition module compares the image data to be recognized and translates it into the text or symbols that the image data represents.

In recent years, as hardware computing power has steadily improved and scanners and handheld devices have become ubiquitous, image recognition technology has found widespread use in the market. Image recognition, broadly speaking, is the technology of identifying the content of image data. A common application is to use a scanner to capture printed text or images in a specific format by optical scanning, recognize them against a specific encoding, and thereby replace manual input, shortening working time; this is of great help when processing large volumes of printed material. In addition to optical scanning, handwriting recognition systems are also available on the market for handwritten characters or symbols, recognizing text entered through devices such as handwriting tablets, touchpads, and touch screens.

Optical scanning technology was first used to process large volumes of newspapers, magazines, documents, receipts, and reports. For organizations holding huge numbers of paper documents, digitizing them with text recognition not only reduces storage space but also allows electronic tags to be assigned to documents for easier classification and management, so recognition of printed text developed relatively early. For printed Chinese characters, the earliest research on input by optical scanning was published in 1966 in an IEEE journal by two IBM researchers, Casey and Nagy. In that paper, the glyphs of printed Chinese characters were split into binarized-matrix templates and stored in a database, and the database contents were then used for template matching against the characters to be recognized, ultimately achieving recognition of 1,000 printed Chinese characters.

As for recognizing text on handwritten documents, among earlier related techniques in Taiwan, the Industrial Technology Research Institute disclosed a method for recognizing Chinese and English forms by OCR in Taiwan Patent Publication No. TW 294804. That patent uses a form-learning module to learn the position coordinates of fields, characters, grid lines, and so on in the form to be recognized, so as to improve the accuracy of locating the field containing the characters to be extracted and to correct tilt and offset introduced when the form is scanned. After the characters in the predetermined fields are extracted, they are compared against characters stored in a database using either a single comparison algorithm (as described in the '804 case, Detailed Description of the Preferred Embodiment, C. Data Extraction, 3. Character Extraction, subsection (ii) Handwritten Character Extraction) or several comparison algorithms, depending on the needs of the application, and the resulting confidence score determines whether the form requires manual correction.

Following on from the above, whether the scanned content is text or images, the most important factors affecting the recognition rate of subsequent text recognition are feature extraction and the comparison of those features. In recent years, with the rise of machine learning and the support of hardware, convolutional neural networks (CNNs) have become the mainstream approach to feature extraction in image recognition. A typical CNN consists of one or more convolutional layers topped by fully connected layers, and often also includes shared weights and pooling layers. As each convolution kernel, corresponding to a different feature, slides over the input pixel matrix, a convolution is computed between the kernel and the pixel matrix: the corresponding elements of the pixel matrix and the kernel are multiplied and summed, projecting the information of a specific region of the pixel matrix into a feature matrix and extracting features such as vertical lines, horizontal lines, diagonal lines, and circles.
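
A minimal sketch of that multiply-and-sum step, assuming NumPy; the toy image and kernel values are illustrative only and are not taken from the utility model:

```python
import numpy as np

def convolve2d(pixels: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid (no padding) 2D convolution: slide the kernel over the pixel
    matrix, multiply corresponding elements, and sum them."""
    kh, kw = kernel.shape
    oh = pixels.shape[0] - kh + 1
    ow = pixels.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(pixels[i:i + kh, j:j + kw] * kernel)
    return out

# A toy 5x5 binary "image" with one vertical stroke and a vertical-line kernel.
image = np.array([[0, 1, 0, 0, 0],
                  [0, 1, 0, 0, 0],
                  [0, 1, 0, 0, 0],
                  [0, 1, 0, 0, 0],
                  [0, 1, 0, 0, 0]])
vertical_kernel = np.array([[-1, 2, -1],
                            [-1, 2, -1],
                            [-1, 2, -1]])
print(convolve2d(image, vertical_kernel))  # strongest response over the vertical stroke
```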

However, although the text recognition techniques described above have advanced considerably in recent years and can reach very high recognition rates for images or printed text in a fixed, tidy format, the recognition rate drops sharply for relatively sloppy or highly variable input, especially handwriting, since every person's writing habits and strokes differ. Such systems can therefore only serve entertainment purposes, or their output must be corrected manually afterwards; they fall short of the level required for formal use, particularly when the recognized text is to appear in formal documents such as bank checks, contracts, and legal documents. In addition, the prior art largely lacks applications that interpret the meaning of the recognized text and convert it into another format, for example converting, on a bank check or a sales contract, handwritten Chinese amounts written in different ways, such as 「伍仟零壹圓整」, 「五仟零一圓整」, 「伍仟零壹元」, or 「五千零一元整」, into the Arabic numeral "5001" that they all denote. As a result, businesses still need to spend considerable human resources reviewing fairly routine documents. At present, therefore, there is still room for further improvement in image and text recognition technology.

In view of the foregoing shortcomings of the prior art, this utility model proposes an image recognition system whose architecture comprises: a control module, which provides operation and management of the system; an input module, which inputs image data; an image preprocessing module, coupled to the input module, which adjusts the data quality of the image data and includes a positioning unit that locates the graphic coordinates in the image data; and a recognition module, coupled to the image preprocessing module, which decomposes the image data into a plurality of decomposition data and outputs recognition data to the control module according to the aforementioned graphic coordinates. The image data referred to in this utility model may include text or symbols printed or handwritten on various documents or objects, for example 「伍仟零壹圓整」; the recognition data refers to output in formats such as .txt, .doc, .docx, or .xlsx that corresponds to the printed or handwritten text or symbols, for example 「伍仟零壹圓整」 or the Arabic numeral "5001". In one embodiment of this utility model, the Arabic numeral may be 0-20 digits long.
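
Purely as an illustrative sketch of how the modules listed above couple to one another (the class and method names, and the trivial placeholder bodies, are assumptions made for illustration and are not defined by the utility model):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PositioningUnit:
    def locate(self, image) -> Tuple[int, int]:
        return (0, 0)  # would return the graphic coordinates of the region to recognize

@dataclass
class ImagePreprocessingModule:
    positioning_unit: PositioningUnit = field(default_factory=PositioningUnit)
    def adjust_quality(self, image):
        return image   # placeholder for denoising / contrast adjustment

@dataclass
class RecognitionModule:
    def recognize(self, image, coords) -> str:
        return "5001"  # placeholder: decompose the region at coords and decode it

@dataclass
class ImageRecognitionSystem:
    preprocessing: ImagePreprocessingModule = field(default_factory=ImagePreprocessingModule)
    recognition: RecognitionModule = field(default_factory=RecognitionModule)
    results: List[str] = field(default_factory=list)   # held by the control module

    def run(self, image) -> str:                       # control-module coordination
        image = self.preprocessing.adjust_quality(image)
        coords = self.preprocessing.positioning_unit.locate(image)
        result = self.recognition.recognize(image, coords)
        self.results.append(result)
        return result

print(ImageRecognitionSystem().run(image="<scanned check>"))  # "5001"
```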

According to this utility model, the image preprocessing module further includes a cropping unit, which crops out of the image data, according to the graphic coordinates, the graphic region that needs to be recognized and transmits the recognition result for that region back to the recognition module, improving the accuracy of the graphic region and of the graphic coordinates located by the positioning unit.

According to this utility model, the image preprocessing module further includes a correction unit, which determines the graphic angle or graphic color of the image data according to the graphic coordinates and transmits the recognition result for the angle or color back to the recognition module, improving recognition accuracy.

According to this utility model, the image preprocessing module further includes a classification unit, which determines the graphic type of the image data according to the graphic coordinates and transmits the recognition result for the graphic type back to the recognition module, improving the accuracy of graphic-type recognition.

According to this utility model, the recognition module is based on an encoder-decoder architecture and includes an encoding unit, which converts the image data into encoded data in vector form through a feature extraction matrix, and a decoding unit, which computes the feature vectors of the encoded data and outputs them as decoded data.

According to one embodiment of this utility model, the image recognition system may, as the application requires, optionally include a data storage module coupled to the control module that stores the decoded data and/or the recognition data as training data. The decoded data includes the graphic coordinates, graphic region, graphic angle, graphic color, and graphic type.

In one embodiment of this utility model, taking one application scenario as an example, the positioning unit may perform an automatic positioning step that locates the target to be recognized. The input image data (a check, for example) may first be cropped by the cropping unit into a portion required by the application, such as the upper half or the lower half. The positioning unit then outputs a graphic coordinate, for example (x, y), for the area cropped by the cropping unit, and, taking that coordinate as the reference, the range to be recognized is cropped upward, downward, leftward, and rightward according to preset values, for example from x-Δx to x+Δx on the horizontal axis and from y-Δy to y+Δy on the vertical axis. This range can cover, for example, the graphic extent of 「伍仟零壹圓整」 or of the Arabic numeral "5001", and the range cropped by the cropping unit is then used for the subsequent recognition flow.

In addition, to remedy the shortcomings of the prior art, this utility model also proposes an image recognition method whose steps include: inputting the image data to be recognized and the training data through the input module; feeding the image data and the training data into the recognition module and adjusting the vector weights in the encoding unit and the decoding unit by comparing the image data with the training data; and having the recognition module, based on the vector weights and the image data, interpret the recognition data corresponding to the image data and output it to the desired terminal, which may be, but is not limited to, a smartphone, tablet, notebook computer, desktop computer, or smart wearable device.

In this disclosure, the above steps include, during image preprocessing, cropping the graphic region that needs to be recognized.

According to one embodiment of this utility model, the steps of the image recognition method include storing the recognition data again as training data.

According to one embodiment of this utility model, during training the steps of the image recognition method further include comparing the recognition data with the training data, outputting a correlation coefficient α, and determining whether the correlation coefficient α is greater than a preset value K.

According to this utility model, the steps of the image recognition method further include locating the graphic coordinates of the image data.

According to this utility model, the steps of the image recognition method further include classifying the graphic type of the image data.

According to this utility model, the steps of the image recognition method further include correcting the graphic angle of the image data.

According to this utility model, the steps of the image recognition method further include converting, by means of the recognition module, the above recognition data into the corresponding text or symbols.

The foregoing explains the purpose of this utility model, its technical means, and the effects it can achieve; those skilled in the relevant art will understand this utility model more clearly from the exemplary embodiments below, the accompanying drawings, and the scope of the patent claims.

100: image recognition system
101: control module
103: input module
105: image preprocessing module
105a: cropping unit
105c: correction unit
105e: classification unit
105g: positioning unit
107: data storage module
109: recognition module
109a: encoding unit
109c: decoding unit
201: image data
203: decomposition data
205: encoded data
207: decoded data
400: image recognition method
S401-S407: method steps
S501-S507: method steps

The following detailed description of this utility model and the accompanying schematic drawings of its embodiments should allow it to be understood more fully; it should be understood, however, that they serve only as a reference for understanding its application and do not limit this utility model to any particular embodiment.

Figure 1 illustrates the system architecture of the image recognition system.
Figure 2 shows how the encoding unit and the decoding unit in the recognition module process the input image data.
Figure 3A illustrates how the input image data is converted into encoded data by the image recognition system.
Figure 3B illustrates how the input image data is converted into encoded data by the image recognition system.
Figure 4 illustrates the steps of the image recognition method.
Figure 5 further illustrates the steps of achieving image recognition through machine learning.

This utility model will be described in detail with reference to preferred embodiments and viewpoints. The following description provides specific implementation details so that the reader can thoroughly understand how these embodiments are carried out; those skilled in the art should understand, however, that this utility model can also be practiced without these details. Moreover, it can be applied and implemented in other specific embodiments, and the details set forth in this specification can be varied for different requirements and modified in various ways without departing from its spirit. The preferred embodiments and viewpoints described here explain the structure of this utility model; they are illustrative only and do not limit the scope of the patent claims. In the method of this utility model, the steps may be executed sequentially or simultaneously, and their order may be adjusted according to the needs of practical applications in the field. The terms used in the following description are to be interpreted in their broadest reasonable sense, even when used together with the detailed description of a particular embodiment.

The purpose of this utility model is to improve on prior image and text recognition techniques, which achieve good recognition rates for images or text in a fixed, tidy format but whose recognition rate drops sharply for sloppy handwriting or for text that can be written in many ways while denoting the same meaning. The specific technical means by which this utility model addresses that deficiency is machine learning: finding an optimized algorithmic architecture, a matching computation model, and the overall range of parameters required, for example balancing training and recognition efficiency within a limited amount of memory, converging quickly to a good solution when the computation model extracts features from the image data, adjusting the weight of each extracted feature, and determining the position of the graphic data to be recognized, so as to achieve a higher recognition rate and allow image and text recognition to be applied to more formal documents such as bank checks, contracts, and legal documents. The further technical means of this utility model are described in detail below.

To achieve the above goal, the strategy of this utility model is that, before the image recognition system (100) can interpret the input image data (201), it must first be trained by machine learning to produce a computation model. That model covers determining the coordinates, region, angle, and type of the graphics to be interpreted in the image data (201) (for example bank checks, contracts, legal documents, and so on; if a bank check, it may be a VIP check, a non-VIP check, or another kind of check), the form in which the graphics appear (for example printed or handwritten), and the recognition data those graphics express (for example 「伍仟零壹圓整」 or 「伍千零壹元」, both denoting the Arabic numeral "5001"). Because the various coordinates, regions, angles, and types have already been learned by the image recognition system (100) before the image data (201) is actually recognized, feature extraction and the feature-vector computations become more efficient, allowing the data to be recognized within the image data (201) to be detected precisely.

Therefore, based on the above strategy and referring to Figure 1, this utility model proposes an image recognition system (100) whose architecture comprises: a control module (101), which provides operation and management of the system; an input module (103), which inputs image data (201); an image preprocessing module (105), coupled to the input module (103), which adjusts the data quality of the image data (201) and includes a positioning unit (105g) that locates the graphic coordinates of the image data (201); and a recognition module (109), coupled to the image preprocessing module (105), which decomposes the image data (201) into a plurality of decomposition data (203), adjusts the graphic region to be processed using the graphic coordinates located by the positioning unit (105g), transmits the data coordinates back to the image preprocessing module (105), and outputs recognition data to the control module (101) according to that graphic region. In this utility model, the control module (101) typically includes a processing chip, memory, a display device, a network communication module, an operating system, application programs, and so on, interconnected in a generally known manner to perform computation, temporary storage, display, and data transmission and to provide functions such as operation and management coordination of the image recognition system (100); since the control module (101) follows a generally known architecture, it is not described further here.

In addition, in one embodiment of this utility model, the recognition data can be fed back to the recognition module (109), either through manual comparison or automatically, to further train the machine-learning computation model and improve the accuracy of data recognition. According to one aspect of this utility model, besides the recognition data, the graphic coordinates, graphic region, graphic angle, graphic color, and graphic type can also, as the application requires, be fed back to the recognition module (109) repeatedly to improve the accuracy of the cropping unit (105a), correction unit (105c), classification unit (105e), and positioning unit (105g).

The image recognition system (100) includes a data storage module (107) that stores the training data input through the input module (103). In a preferred embodiment of this utility model, the sources of training data may include recognition data for graphics in any form of expression (for example the aforementioned bank checks, contracts, and legal documents, or text such as 「伍仟零壹圓整」 and 「伍千零壹元」, which denote the Arabic numeral "5001"), together with image data (201) whose graphic coordinates, graphic region, graphic angle, graphic color, and graphic type have been labeled in advance; alternatively, the recognition module (109) feeds each piece of recognition data it outputs back into itself to further refine the aforementioned graphic coordinates, graphic region, graphic angle, graphic color, and graphic type. That is, after each piece of recognition data is produced and its result corrected (manually or automatically by the image recognition system 100), it is used to improve the accuracy of the next round of recognition.

Referring to Figures 2, 3A, and 3B, according to one aspect of this utility model, to better handle the sequence-to-sequence problem of converting the image data (201) into encoded data (205) expressed in vector form, computing decoded data (207) from the encoded data (205) via a feature extraction matrix, and finally converting it into recognition data in the recognition module (109), the recognition module (109) in a preferred embodiment adopts an encoder-decoder training architecture and therefore contains an encoding unit (109a) and a decoding unit (109c). In the processes in which the encoding unit (109a) encodes the decomposition data (203) into encoded data (205) and the decoding unit (109c) decodes the encoded data (205) into decoded data (207), the computation models of the encoding unit (109a) and the decoding unit (109c) can each be chosen according to the needs of the application, for example a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory network (LSTM), a bidirectional recurrent neural network (BiRNN), a gated recurrent unit (GRU), or an attention model. For example, the recognition module may use a gated recurrent unit as the encoding unit (109a) but a bidirectional recurrent neural network in the decoding unit (109c). The encoded data (205) and decoded data (207) described in this utility model can be represented as high-dimensional vectors. For the encoded data (205), in one embodiment it can be represented as a matrix of binary values according to the grayscale level of each piece of decomposition data (203); for example, parts with a higher grayscale level are represented as 1 and parts with a lower grayscale level as 0. For the character 「伍」, the encoded data (205) can then be expressed as a u × v matrix, where u and v are non-zero positive integers whose size can be chosen according to the needs of the application.
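
A minimal sketch of that binarization step, assuming an 8-bit grayscale glyph stored as a NumPy array; the threshold value and the toy glyph are illustrative assumptions, not values given in the utility model:

```python
import numpy as np

def binarize(glyph: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Map a u x v grayscale glyph (0-255) to a binary matrix:
    1 where the grayscale level is high, 0 where it is low."""
    return (glyph >= threshold).astype(np.uint8)

glyph = np.array([[ 10, 200,  30],
                  [ 15, 220,  25],
                  [ 12, 210,  20]], dtype=np.uint8)
print(binarize(glyph))
# [[0 1 0]
#  [0 1 0]
#  [0 1 0]]
```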

Following on from the above, in a preferred embodiment the encoding unit (109a) uses a convolutional neural network and the decoding unit (109c) uses a long short-term memory network, with an attention model used to strengthen the vector weights of the feature vectors in the encoded data (205), avoiding the weakening of the recognition module's feature vectors that can occur when the amount of image data (201) and training data to be learned is very large. In addition, according to one aspect of this utility model, so that the encoding unit (109a) and decoding unit (109c) achieve good training and recognition efficiency when applied in the image recognition system (100), for example in building the training data, in the judgments made by the image preprocessing module (105) (preprocessing of graphic coordinates, graphic region, graphic angle, graphic color, graphic type, and so on), and in the recognition module (109) outputting recognition data from the foregoing data, the parameters of the image recognition system (100) can be set as follows: epochs (number of training passes): 5-65; batch size: 10-1024; early stop: 5-30; learning rate: 10^-2 to 10^-6. The advantage of these settings is that they strike a balance between memory efficiency and memory capacity while also optimizing recognition efficiency, feature extraction, and recognition rate, allowing the image recognition system (100) to be applied to the recognition of formal documents such as bank checks, contracts, or legal documents.
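
As a sketch only, assuming PyTorch: the layer sizes, vocabulary size, dot-product form of the attention, and the concrete hyperparameter picks within the quoted ranges are all illustrative assumptions rather than values defined by the utility model.

```python
import torch
import torch.nn as nn

# Concrete picks inside the parameter ranges quoted above (illustrative only).
CONFIG = {"epochs": 30,           # within 5-65
          "batch_size": 64,       # within 10-1024
          "early_stop": 10,       # within 5-30
          "learning_rate": 1e-4}  # within 10^-2 .. 10^-6

class CNNEncoder(nn.Module):
    """Encode a 1-channel cropped glyph region into a sequence of feature vectors."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, feat_dim, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
    def forward(self, x):                       # x: (B, 1, H, W)
        f = self.conv(x)                        # (B, C, H/4, W/4)
        return f.flatten(2).transpose(1, 2)     # (B, T, C) sequence for the decoder

class AttnLSTMDecoder(nn.Module):
    """LSTM decoder with dot-product attention over the encoder sequence
    (embedding size must equal the encoder feature size for the dot product)."""
    def __init__(self, vocab_size: int, feat_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden + feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)
    def forward(self, enc_seq, tokens, state=None):
        emb = self.embed(tokens)                                  # (B, L, H)
        scores = torch.bmm(emb, enc_seq.transpose(1, 2))          # (B, L, T)
        context = torch.bmm(torch.softmax(scores, -1), enc_seq)   # (B, L, C)
        out, state = self.lstm(torch.cat([emb, context], -1), state)
        return self.out(out), state                               # (B, L, vocab)

# Toy forward pass: two cropped amount fields, teacher-forced token ids.
enc, dec = CNNEncoder(), AttnLSTMDecoder(vocab_size=5000)
img = torch.randn(2, 1, 64, 256)
tokens = torch.randint(0, 5000, (2, 8))
logits, _ = dec(enc(img), tokens)   # logits: (2, 8, 5000)
```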

According to this disclosure, the image preprocessing module (105) includes a cropping unit (105a); its way of cropping the image data (201) is shown in Figure 3A. In this embodiment, when the image data (201) to be recognized contains multiple pieces of text or graphics, the cropping unit (105a) crops out, based on the training data stored in the data storage module (107), graphic regions of a size appropriate to the graphics to be recognized (such as fields or ranges), pre-selecting the graphic region in which each piece of text lies. In a preferred embodiment, the text is treated as a single complete input: the cropping unit (105a) separates the graphic region containing the text from the original image according to the graphic coordinates, without cropping the individual characters separately (for example, without cutting 伍, 仟, 零, 壹, 圓, 整 apart). Likewise, when the computation model is built, the text is treated as a whole, such as 「伍仟零一圓整」, so that the recognition module (109) can interpret its overall meaning rather than individual unrelated characters. In another embodiment, the cropping unit (105a) may also crop characters individually as the application requires, to prevent strokes of adjacent characters that touch each other from causing the image recognition system (100) to treat two characters as one, which would lead the subsequent recognition module (109) to make mistakes, for example treating the two characters 「伍」 and 「仟」 as a single character rather than two independent ones. After the cropping unit (105a) has cropped the appropriate graphic region, each result can be transmitted back to the recognition module (109) to further refine the graphic region located by the positioning unit (105g) next time, improving the accuracy of graphic-region positioning (manually or corrected automatically by the image recognition system 100).

According to one embodiment of this utility model, the cropping method of the cropping unit (105a) described above is called automatic positioning recognition; a detailed implementation is shown in Figure 3A. The input image data (a check, for example) may first be cropped into a portion required by the application, such as the upper half or the lower half. For example, in Figure 3A the cropping unit (105a) can, based on the usual layout of a check, crop out the graphic region to be recognized, such as the amount field. Specifically, the positioning unit (105g) automatically locates the graphic coordinate (x, y), and with (x, y) as the reference the range to be recognized is cropped upward, downward, leftward, and rightward by ranges Δx and Δy, for example from x-Δx to x+Δx on the horizontal axis and from y-Δy to y+Δy on the vertical axis, so as to cut out the graphic region of 「伍仟零壹圓整」. The recognition module (109) then carries out the subsequent recognition flow on the position cropped by the cropping unit (105a).
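
A minimal sketch of that coordinate-based crop, assuming the scanned image is a NumPy array and that (x, y) and the half-widths Δx, Δy have already been determined; the numeric values below are illustrative only:

```python
import numpy as np

def crop_region(image: np.ndarray, x: int, y: int, dx: int, dy: int) -> np.ndarray:
    """Cut out the rectangle spanning x-dx..x+dx horizontally and
    y-dy..y+dy vertically around the located coordinate (x, y)."""
    h, w = image.shape[:2]
    left, right = max(0, x - dx), min(w, x + dx)
    top, bottom = max(0, y - dy), min(h, y + dy)
    return image[top:bottom, left:right]

# Illustrative use: crop the amount field of a scanned check.
check = np.zeros((600, 1400), dtype=np.uint8)      # stand-in for a scanned check
amount_field = crop_region(check, x=900, y=150, dx=250, dy=40)
print(amount_field.shape)                          # (80, 500)
```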

According to this disclosure, the image preprocessing module (105) includes a correction unit (105c) for correcting the color of the image data (201), for example the intensity of the R, G, and B channels. In one embodiment, when any of the RGB channels of the graphics in the image data (201) to be recognized is too strong, for example when the red of a background stamp behind the text 「伍仟零壹圓整」 is so strong that it could interfere with recognition of the text by the recognition module (109), the correction unit (105c) can reduce the red intensity of the image data (201) based on the training data stored in the data storage module (107), preventing recognition errors caused by an overly strong background color. Preferably, after the correction unit (105c) has corrected the graphic color, each result is transmitted back to the recognition module (109) and, together with manual or automatic correction based on the recognition result, is used to further refine the color corrected by the correction unit (105c) next time, improving the accuracy of the correction.
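
A minimal sketch of that kind of channel attenuation, assuming an RGB image stored as a NumPy array; the 0.5 scale factor is an illustrative choice, not a value given in the utility model:

```python
import numpy as np

def suppress_red_background(rgb: np.ndarray, scale: float = 0.5) -> np.ndarray:
    """Attenuate the red channel so a red stamp behind the text
    interferes less with recognition."""
    corrected = rgb.astype(np.float32)
    corrected[..., 0] *= scale          # channel 0 = R in RGB ordering
    return np.clip(corrected, 0, 255).astype(np.uint8)
```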

According to another embodiment of this utility model, the correction unit (105c) can also be used to correct the angle of the image data (201); the correction method is shown in Figure 3A. In this embodiment, when the graphics in the image data (201) to be recognized are tilted by an angle δ, the correction unit (105c) corrects the angle of the graphics in the graphic region based on the training data stored in the data storage module (107), preventing stroke-recognition errors caused by an excessive tilt. Preferably, after the correction unit (105c) has corrected the graphic region, each result is transmitted back to the recognition module (109) to further refine the graphic region corrected next time, improving the accuracy of the correction; in one embodiment, this refinement may be manual or automatic.
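
A minimal deskew sketch, assuming Pillow and that the tilt angle δ (in degrees) has already been estimated; the fill color and the 7-degree example value are assumptions:

```python
from PIL import Image

def deskew(region: Image.Image, delta_degrees: float) -> Image.Image:
    """Rotate the cropped graphic region back by the estimated tilt angle."""
    # Rotating by -delta undoes a counter-clockwise tilt of delta degrees.
    return region.rotate(-delta_degrees, expand=True, fillcolor="white")

# Illustrative use: undo a 7-degree tilt measured on the amount field.
field = Image.new("RGB", (500, 80), "white")   # stand-in for a cropped field
straightened = deskew(field, 7.0)
```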

According to this disclosure, the image preprocessing module (105) includes a classification unit (105e). In this embodiment, before the recognition module (109) recognizes the graphics in the image data (201), the classification unit (105e) classifies, based on the training data stored in the data storage module (107), whether the document type of the image data (201) is a bank check, a contract, or a legal document. For example, when the document in the image data (201) is a bank check, the classification unit (105e) uses the training data to classify it as a VIP check, a non-VIP check, or another kind of check. Each classification result is transmitted back to the recognition module (109) and the image preprocessing module (105), so that the cropping unit (105a), correction unit (105c), and positioning unit (105g) can crop, correct, and locate the graphic coordinates, graphic region, graphic angle, and graphic color more precisely, improving the accuracy of the image preprocessing module (105).

Therefore, based on the purpose and strategy of this utility model and referring to Figures 4 and 5, an image recognition method (400) is proposed whose steps include: in step (S405), classifying the graphic type of the image data (201); in step (S401), inputting the training data and the image data (201) to be recognized through the input module (103) and labeling the training data corresponding to the image data (201); in step (S501), feeding the training data and the image data (201) into the encoding unit (109a) of the recognition module (109) and converting them by a feature extraction matrix into encoded data (205) in vector form; in step (S502), feeding the encoded data (205) into the decoding unit (109c) and converting it by a feature extraction matrix into decoded data (207) in vector form; in step (S503), sending the decoded data (207) to the recognition module (109); and executing step (S406), in which the recognition module (109) converts the decoded data into the corresponding recognition data.
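
Purely as an illustrative sketch of how the steps above could be chained, with the individual stages passed in as callables (the function names and the toy stand-ins are assumptions, not defined by the utility model):

```python
from typing import Callable

def recognize_image(image_data,
                    training_data,
                    classify: Callable,   # S405: classify the graphic type
                    crop: Callable,       # S404: crop the graphic region (preprocessing)
                    encode: Callable,     # S501: encoding unit (109a)
                    decode: Callable,     # S502: decoding unit (109c)
                    to_text: Callable,    # S503 + S406: decoded data -> recognition data
                    store: Callable):     # S407: keep results as new training data
    doc_type = classify(image_data)
    region = crop(image_data, doc_type)
    encoded = encode(region, training_data)
    decoded = decode(encoded)
    recognition = to_text(decoded)
    store(decoded, recognition)
    return recognition

# Toy wiring with stand-in callables, just to show the order of the steps.
result = recognize_image(
    image_data="<scanned check>", training_data=[],
    classify=lambda img: "bank check",
    crop=lambda img, doc_type: img,
    encode=lambda region, train: [0.1, 0.9],
    decode=lambda enc: [0.9, 0.1],
    to_text=lambda dec: "5001",
    store=lambda dec, rec: None)
print(result)  # "5001"
```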

In one embodiment, the image recognition method includes the step (S404) executed during image preprocessing, namely cropping out the graphic region of the image data (201) that needs to be recognized, and step (S407), storing the decoded data (207) and the recognition data as training data.

In a preferred embodiment of this utility model, the encoder of the encoder-decoder network uses a convolutional neural network and the decoder uses a long short-term memory network, with an attention model strengthening the vector weights of the feature vectors in the encoded data (205), avoiding the weakening of the recognition module's feature vectors that can occur when the amount of training image data (201) is very large.

In a preferred embodiment of this utility model, the image data (201) and the recognition data are compared by, in step (S504), comparing the recognition data corresponding to the image data (201) with the training data and outputting a correlation coefficient α. In step (S505), it is determined whether the correlation coefficient α is greater than a preset value K; if so, step (S507) is executed to store the decoded data (207) as training data and decide whether to accept, pass, or retain and store the vector weights; if not, step (S404) is executed in step (S506) to readjust the feature-vector weights and decide whether to stop or reject the subsequent steps.
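
A minimal sketch of that accept-or-retry decision, assuming the recognition output and the training reference are available as numeric vectors, using Pearson correlation as the correlation coefficient α and an illustrative threshold K = 0.9 (none of these concrete choices are fixed by the utility model):

```python
import numpy as np

def check_recognition(recognized: np.ndarray, reference: np.ndarray, k: float = 0.9) -> bool:
    """Return True (store the result as training data) when the correlation
    coefficient alpha between recognition output and training data exceeds K."""
    alpha = np.corrcoef(recognized, reference)[0, 1]   # Pearson correlation
    if alpha > k:
        return True        # S507: keep the decoded data and vector weights
    return False           # S506: readjust the feature-vector weights instead

recognized = np.array([0.9, 0.1, 0.8, 0.2])
reference = np.array([1.0, 0.0, 1.0, 0.0])
print(check_recognition(recognized, reference))  # True for this toy pair
```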

According to this utility model, the image recognition method (400) further includes executing step (S402), locating the graphic coordinates of the image data (201) with the positioning unit (105g), and, in step (S403), correcting the graphic angle and graphic color of the image data (201) with the correction unit (105c).

The foregoing describes preferred embodiments of this utility model. Those skilled in the art should appreciate that it is intended to illustrate the utility model rather than to limit the scope of the patent rights claimed, which is determined by the appended claims and their equivalents. Any changes or refinements made by those familiar with this field without departing from the spirit or scope of this patent are equivalent changes or designs completed under the spirit disclosed here and should be included in the scope of the following claims.


Claims (10)

1. An image recognition system, comprising: a control module that provides operation and management of the system; an input module that inputs image data; an image preprocessing module, coupled to the input module, that adjusts the data quality of the image data, wherein the image preprocessing module further comprises a positioning unit that locates at least one graphic coordinate of the image data; and a recognition module, coupled to the image preprocessing module, that decomposes the image data into a plurality of decomposition data and recognizes and outputs at least one piece of recognition data according to the at least one graphic coordinate.

2. The image recognition system of claim 1, wherein the image preprocessing module further comprises a cropping unit that crops at least one graphic region out of the image data according to the at least one graphic coordinate located by the positioning unit, thereby adjusting the range to be processed by the image recognition system.

3. The image recognition system of claim 1, wherein the image preprocessing module further comprises a correction unit that adjusts a graphic color in the image data when the intensity of a color in the graphic data could affect recognition by the recognition module.

4. The image recognition system of claim 2, wherein the image preprocessing module further comprises a classification unit that classifies the document type in the image data according to the at least one graphic region.

5. The image recognition system of claim 1, wherein the recognition module further comprises an encoding unit that encodes the plurality of decomposition data into at least one piece of encoded data in vector form, and a decoding unit that outputs the at least one piece of encoded data as at least one piece of decoded data.

6. The image recognition system of claim 5, wherein the computation model of the encoding unit is a convolutional neural network and the computation model of the decoding unit is a long short-term memory network.

7. The image recognition system of claim 1, wherein the control module sets the number of training epochs of the recognition module to 5-65.

8. The image recognition system of claim 1, wherein the control module sets the batch size of the recognition module to 10-1024.

9. The image recognition system of claim 1, wherein the control module sets the early-stop value of the recognition module to 5-30.

10. The image recognition system of claim 1, wherein the control module sets the learning rate of the recognition module to 10^-2 to 10^-6.
TW110208671U 2021-07-21 2021-07-21 Image recognition system TWM618756U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110208671U TWM618756U (en) 2021-07-21 2021-07-21 Image recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW110208671U TWM618756U (en) 2021-07-21 2021-07-21 Image recognition system

Publications (1)

Publication Number Publication Date
TWM618756U 2021-10-21

Family

ID=79603905

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110208671U TWM618756U (en) 2021-07-21 2021-07-21 Image recognition system

Country Status (1)

Country Link
TW (1) TWM618756U (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI807467B (en) * 2021-11-02 2023-07-01 中國信託商業銀行股份有限公司 Key-item detection model building method, business-oriented key-value identification system and method


Similar Documents

Publication Publication Date Title
US10846553B2 (en) Recognizing typewritten and handwritten characters using end-to-end deep learning
CN100576233C (en) Detect the direction of the character in the file and picture
WO2020232872A1 (en) Table recognition method and apparatus, computer device, and storage medium
CN103488711B (en) A kind of method and system of quick Fabrication vector font library
CN108509881A (en) A kind of the Off-line Handwritten Chinese text recognition method of no cutting
US20240037969A1 (en) Recognition of handwritten text via neural networks
CN111914825B (en) Character recognition method and device and electronic equipment
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN105184329A (en) Cloud-platform-based off-line handwriting recognition method
CN111507330A (en) Exercise recognition method and device, electronic equipment and storage medium
CN107463866A (en) A kind of method of the hand-written laboratory report of identification for performance evaluation
Husain et al. Online Urdu Character Recognition System.
US11837001B2 (en) Stroke attribute matrices
TWM618756U (en) Image recognition system
CN114005127A (en) Image optical character recognition method based on deep learning, storage device and server
WO2021143058A1 (en) Image-based information comparison method, apparatus, electronic device, and computer-readable storage medium
CN109508712A (en) A kind of Chinese written language recognition methods based on image
CN116704523B (en) Text typesetting image recognition system for publishing and printing equipment
US20050276480A1 (en) Handwritten input for Asian languages
Al Sayed et al. Survey on Handwritten Recognition
TWI773444B (en) Image recognition system and method
CN109522892A (en) The character image information labeling method of neural network aiding
US11341760B2 (en) Form processing and analysis system
CN114419636A (en) Text recognition method, device, equipment and storage medium
Palani et al. Detecting and extracting information of medicines from a medical prescription using deep learning and computer vision