TW202004549A - A method for character pattern recognition - Google Patents

A method for character pattern recognition Download PDF

Info

Publication number
TW202004549A
TW202004549A TW107118676A
Authority
TW
Taiwan
Prior art keywords
neuron
image
text
recognition
pattern recognition
Prior art date
Application number
TW107118676A
Other languages
Chinese (zh)
Other versions
TWI685796B (en)
Inventor
陳正倫
黃照凱
吳庭禎
Original Assignee
國立中興大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 國立中興大學
Priority to TW107118676A priority Critical patent/TWI685796B/en
Publication of TW202004549A publication Critical patent/TW202004549A/en
Application granted granted Critical
Publication of TWI685796B publication Critical patent/TWI685796B/en

Landscapes

  • Character Discrimination (AREA)

Abstract

The present invention provides a method for character pattern recognition. A network with a single hidden layer is chosen; an initialization count sets the number of hidden-layer neurons, whose parameters are generated randomly. The weights of the output-layer links are computed by passing character-training images through the network. A new neuron is then created and its parameters are generated by random number; the weight of its output link is computed from the character-training images, and whether the neuron is kept or discarded is decided by the overall recognition error after adding it to the current network. The steps of creating neurons and deciding whether to discard them are repeated until a stopping condition is met, after which character images are input and recognition begins. The network needs only a single hidden layer, training reduces to computing the output-link parameters, and every newly added neuron is guaranteed to decrease the recognition error. Besides simplifying the neural-network structure and accelerating its training, the method also reduces recognition time and improves precision.

Description

Intelligent character pattern recognition method

The invention relates to a character pattern recognition method, and in particular to an intelligent, machine-learning-based character pattern recognition method that has a streamlined computing architecture, improves recognition accuracy, and reduces the time recognition requires.

Pattern recognition technology is the origin of artificial intelligence and one of the main technologies driving AI products. It is now widely used in electronic products and large-scale systems, including handwritten text recognition, license plate recognition, portrait recognition, face recognition, and text recognition.

Pattern recognition techniques divide broadly into structural pattern recognition and decision-theoretic (also called statistical) pattern recognition. For character patterns, the commonly used structural techniques include the eight-connectivity method, the projection method, template matching, and edge statistics; the commonly used decision-theoretic techniques include the support vector machine (SVM), the radial basis function (RBF) network, and the convolutional neural network (CNN).

Structural pattern recognition identifies characters by their structure, and it was the dominant approach before machine learning became prevalent. However, because structural methods are designed around a specific character shape, a different recognition procedure must be designed for each kind of character pattern, and each recognition run involves multiple image-processing and statistical steps, so every recognition takes a long time. Structural methods are also easily degraded by noise, and their overall accuracy is lower than that of machine learning and neural networks; they have therefore gradually been replaced by decision-theoretic recognition.

In decision-theoretic recognition, character images first pass through filter-based preprocessing and are then used to train a machine-learning model. Compared with structural recognition, decision-theoretic recognition applies to a wider range of inputs and is less susceptible to noise. After training, the machine only needs to store the trained weight parameters; a new data point is pushed through those parameters as a matrix operation to obtain the output, so recognition takes little time.

The convolutional neural network, currently the most common network for pattern recognition, consists mainly of a set of convolutional layers and a set of flat (fully connected) layers, each containing many neurons; the convolutional layers adjust image-processing weights and the flat layers adjust neuron weights. In current practice, stacking more convolutional layers and improving them does raise overall accuracy, but more layers mean more network parameters to tune, so the required training time rises and recognition after training costs more computation and time. Moreover, there is no theoretical basis for planning the number of layers, so network structures tend to grow complicated. In special fields that demand fast recognition of license plate characters, such as community security or stolen-vehicle tracking, recognition methods built on overly complicated neural networks are unsuitable, and a better scheme is needed.

In view of the poor accuracy of structural pattern recognition and of the longer recognition time caused by decision-theoretic methods that stack many layers of neurons, the present invention proposes an intelligent character pattern recognition method with a streamlined computing architecture that shortens training time, improves recognition accuracy, and reduces the time recognition requires.

The technical means adopted to achieve the above purpose make the intelligent character pattern recognition method comprise the following steps:

preprocess the training images to obtain a character region image and its corresponding feature vector;

set an initialization count and a stopping condition;

provide a neural network with a single hidden layer, set the number of hidden-layer neurons to the initialization count, and use a Gaussian function, of the standard form

exp(−‖x − xc‖² / (2σ²)),

as the excitation function of each neuron, generating each Gaussian's center point xc and standard deviation σ by random number;

input the feature vectors of the preprocessed character region images into the network, and compute the output-layer link weights and the recognition error;

create a new neuron, using a Gaussian function as its excitation function, with the Gaussian's center point and standard deviation generated by random number;

input the feature vectors of the preprocessed character region images and compute the overall recognition error caused by adding the neuron to the network's hidden layer; if the new error is smaller than the previous error, keep the neuron, otherwise discard it;

repeat the recursive steps of creating a neuron, generating its parameters by random number, computing the output link weight, and computing the recognition error, until the stopping condition is reached;

input test images and perform character pattern recognition.
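The training loop above (single hidden layer, random Gaussian parameters, least-squares output weights, keep a neuron only if the error drops) can be sketched as follows. This is a minimal sketch, not the patent's implementation: all names, the bias column, the candidate-width range, and the use of numpy's `lstsq` solver are illustrative assumptions.

```python
import numpy as np

def train_incremental_rbf(X, Y, max_neurons=50, mse_goal=1e-3, seed=None):
    # X: (N, d) feature vectors (e.g. HOG of character images)
    # Y: (N, c) one-hot target outputs
    rng = np.random.default_rng(seed)
    N = len(X)
    centers, sigmas = [], []  # parameters of accepted hidden neurons

    def hidden(Xq):
        # Gaussian activations of the accepted neurons, plus a bias
        # column so the "zero hidden neurons" starting point is defined
        cols = [np.ones((len(Xq), 1))]
        for c, s in zip(centers, sigmas):
            d2 = np.sum((Xq - c) ** 2, axis=1, keepdims=True)
            cols.append(np.exp(-d2 / (2.0 * s ** 2)))
        return np.hstack(cols)

    def fit():
        # training only solves for the output-layer link weights
        H = hidden(X)
        W, *_ = np.linalg.lstsq(H, Y, rcond=None)
        return W, float(np.mean((H @ W - Y) ** 2))

    W, best_mse = fit()
    tries = 0
    while len(centers) < max_neurons and best_mse > mse_goal and tries < 10 * max_neurons:
        tries += 1
        # candidate neuron: random center and width, never tuned later
        centers.append(X[rng.integers(N)])
        sigmas.append(float(rng.uniform(0.5, 2.0)))
        W_new, mse = fit()
        if mse < best_mse:          # keep only error-reducing neurons
            W, best_mse = W_new, mse
        else:                       # otherwise discard the candidate
            centers.pop()
            sigmas.pop()
    return centers, sigmas, W, best_mse
```

Because rejected candidates are popped again, the hidden layer only ever grows by neurons that strictly reduced the training error, which is the property the description emphasizes.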

The present invention exploits the fact that a neural network with a single hidden layer already achieves nonlinear classification, so a single-hidden-layer network suffices for character pattern recognition. The main difference from conventional neural networks is that the parameters of the hidden-layer node functions, namely the center point and standard deviation of each Gaussian, are generated by random number and need no adjustment during training. Furthermore, hidden-layer neurons are added one at a time as the required recognition accuracy demands, and each added neuron is guaranteed to reduce the recognition error, so a sufficiently accurate network model is found on a simplified network structure. This technique not only reduces the computational complexity of network training but also shortens the time needed for character pattern recognition after training.

Referring to FIG. 1, the intelligent character pattern recognition method of the present invention comprises:

perform image preprocessing to obtain a character region image and its corresponding feature vector;

set an initialization count and a stopping condition; in one embodiment, the initialization count may be set to 0, and the stopping condition may be a neuron count, an error threshold, or a maximum number of recursions;

provide a neural network with a single hidden layer, set the number of hidden-layer neurons to the initialization count, and use a Gaussian function, of the standard form

exp(−‖x − xc‖² / (2σ²)),

as the excitation function of each neuron, generating each Gaussian's center point xc and standard deviation σ by random number;

input the feature vectors of the preprocessed character region images into the network, and compute the output-layer link weights and the recognition error;

create a new neuron, using a Gaussian function as its excitation function, with the Gaussian's center point and standard deviation generated by random number;

use the preprocessed character region training images to compute the neuron's output link weight, and then compute the overall recognition error caused by adding the neuron to the network's hidden layer; if the new error is smaller than the previous error, keep the neuron, otherwise discard it;

repeat the recursive steps of creating a neuron, generating its parameters by random number, computing the output link weight, and computing the recognition error, until the stopping condition is reached;

input test images and perform character pattern recognition.
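Once training is done, recognition reduces to one pass through the fixed Gaussian layer and one matrix product with the stored output weights, which is why the trained system is fast. A minimal sketch, in which the bias term and all names are illustrative assumptions rather than the patent's notation:

```python
import numpy as np

def recognize(feature, centers, sigmas, W, class_labels):
    # push one feature vector through the fixed Gaussian hidden layer
    phi = [1.0]  # bias term matching the training-time design matrix
    for c, s in zip(centers, sigmas):
        phi.append(float(np.exp(-np.sum((feature - c) ** 2) / (2.0 * s ** 2))))
    scores = np.asarray(phi) @ W      # linear output layer
    return class_labels[int(np.argmax(scores))]
```

Only `centers`, `sigmas`, and `W` need to be stored after training; nothing in the hidden layer is ever adjusted at recognition time.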

Referring further to FIG. 2, which illustrates the above steps with an embodiment: the first time the error is computed, the initial number of hidden-layer neurons is 0, which amounts to having no hidden layer, so the error between the actual output and the desired output is computed directly; in one embodiment this error can be computed as the least squared difference. The neuron-adding step then creates a neuron whose excitation function is a Gaussian, generates the Gaussian's center point and standard deviation by random number, and again computes the error between the actual output and the desired output. If this error is smaller than the previous error, the neuron helps accuracy, so it is added to the network and the next neuron is created; if not, the neuron does not help accuracy and is discarded. The above steps generate the hidden-layer neurons φ1 to φn in sequence.

The step above of computing the overall recognition error caused by adding the neuron to the network's hidden layer can use the mean squared error,

MSE = (1/N) Σ_{i=1..N} (y_i − ŷ_i)²,

where y_i is the desired output and ŷ_i the actual output for the i-th training sample. The neuron's excitation, the new weight ct, and the benchmark bestcon for admission into the neural network are computed by the accompanying formulas (reproduced in the source only as equation images, Figures 02_image007 to 02_image011).

When contri < bestcon, the neuron is added into the neural network and the weights are updated. Referring further to FIG. 3, the neuron-adding step may instead create k neurons, generate the parameters of each neuron's Gaussian, namely the center point and standard deviation, by random number, compute each neuron's output link weight and change in recognition error separately, add to the current network the neuron whose error change is negative and most negative in value, and discard the other neurons. The steps of creating multiple neurons and screening them are then repeated. Overall, this embodiment converges faster: as shown in FIGS. 4a and 4b, its convergence curve (FIG. 4b) converges faster than that of the previous embodiment.
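The k-candidate screening of this variant can be sketched as below. Scoring each candidate by the least-squares error reduction its activation column alone would give is an illustrative stand-in for the patent's contri/bestcon criterion, and every name and the width range are assumptions:

```python
import numpy as np

def best_of_k(X, residual, k=8, seed=None):
    # generate k random Gaussian neurons and keep the one promising
    # the largest reduction of the current residual error
    rng = np.random.default_rng(seed)
    N = len(X)
    best = None
    for _ in range(k):
        center = X[rng.integers(N)]
        sigma = float(rng.uniform(0.5, 2.0))
        phi = np.exp(-np.sum((X - center) ** 2, axis=1) / (2.0 * sigma ** 2))
        # squared projection of the residual onto this activation column
        gain = float(np.sum((phi @ residual) ** 2) / (phi @ phi))
        if best is None or gain > best[0]:
            best = (gain, center, sigma)
    return best
```

Screening k candidates per step costs k error evaluations but tends to accept a useful neuron every step, which matches the faster convergence the text attributes to this embodiment.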

Referring further to FIG. 5, the intelligent character pattern recognition method of the present invention can be used for water meter digit recognition and license plate recognition. Taking water meter digit recognition first as an example, the image preprocessing steps include:

image grayscaling and binarization, as shown in FIGS. 6a and 6b;

removing undersized pixel connection points from the image, as shown in FIG. 6c;

erosion and closing image processing, as shown in FIG. 6d;

filling image regions, as shown in FIG. 6e; in this step the coordinates of regions that may contain characters can be found;

obtaining a character region from the specified coordinates;

capturing the corresponding character region image, as shown in FIG. 6f;

binarizing the character region image;

after finding each character with the eight-connectivity method, extracting the character's Histogram of Oriented Gradients (HOG) as the feature vector and feeding it into the above neural network for training or character pattern recognition.
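The HOG extraction of the final step can be illustrated with a drastically simplified orientation histogram; a real HOG descriptor adds cell and block structure plus block normalization, so everything below is an illustrative assumption, not the patent's feature pipeline:

```python
import numpy as np

def orientation_histogram(char_img, bins=9):
    # magnitude-weighted histogram of unsigned gradient orientations,
    # a toy stand-in for one HOG cell
    img = np.asarray(char_img, dtype=float)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # fold into [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

The resulting fixed-length vector is the kind of input the single-hidden-layer network above consumes; in practice a library HOG implementation would replace this toy version.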

Referring further to FIG. 7, taking license plate recognition as an example, the image preprocessing steps include:

image binarization, as shown in FIG. 8a;

image edge detection; in one embodiment, Sobel edge detection may be used, as shown in FIG. 8b;

removing undersized pixel connection points from the image, as shown in FIG. 8c;

erosion and closing image processing, as shown in FIG. 8d; in this step the coordinates of regions that may contain characters can be found;

obtaining a character region;

capturing the corresponding character region image, as shown in FIG. 8e;

binarizing the character region image, as shown in FIG. 8f;

after finding each character with the eight-connectivity method, extracting the character's Histogram of Oriented Gradients (HOG) as the feature vector and feeding it into the above neural network for training or character pattern recognition.
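The character segmentation by the eight-connectivity method amounts to connected-component labelling with diagonal neighbors counted. A plain flood-fill sketch is below; an optimized labelling routine from an image-processing library would be used in practice, and all names here are illustrative:

```python
import numpy as np

def eight_connected_components(binary_img):
    # label each 8-connected blob of foreground pixels by flood fill
    img = np.asarray(binary_img, dtype=bool)
    labels = np.zeros(img.shape, dtype=int)
    count = 0
    for start in zip(*np.nonzero(img)):
        if labels[start]:
            continue                      # already part of a blob
        count += 1
        stack = [start]
        labels[start] = count
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):     # all 8 neighbors (and self)
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                            and img[nr, nc] and not labels[nr, nc]):
                        labels[nr, nc] = count
                        stack.append((nr, nc))
    return labels, count
```

Each labelled blob's bounding box would then be cropped out as one character image before HOG extraction.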

In actual comparisons on water meter digit recognition, the present invention achieves 83% accuracy with a recognition time of 0.3 to 0.5 seconds. The traditional template matching method achieves 54% accuracy in 0.9 to 1 second; the traditional edge statistics method, 29% in 0.8 to 1 second; the support vector machine, 79% in 1.5 to 1.8 seconds; and the radial basis function network, only 20.8% in 0.1 to 0.3 seconds. The method of the present invention thus offers higher accuracy and faster recognition.

Taking license plate recognition as an example, the two embodiments of the present invention achieve 77.5% and 87.5% accuracy with recognition times of 0.4 and 0.3 seconds respectively. Template matching achieves 75% accuracy in 3.5 seconds; edge statistics, 25% in 2.5 seconds; the support vector machine, 67.5% in 0.9 seconds; and the radial basis function network, 57.5% in 1.6 seconds. Here too the method of the present invention offers higher accuracy and faster recognition.

In summary, the intelligent character pattern recognition method of the present invention improves the accuracy of character pattern recognition, and the trained recognition system recognizes quickly. Besides overcoming the accuracy bottleneck of traditional character recognition, it can also be used in fields that require fast recognition, such as license plate tracking, overcoming the technical defects of traditional character pattern recognition.

The above are only preferred embodiments of the present invention and are not intended to limit the scope of the rights claimed; all other equivalent changes or modifications that do not depart from the spirit disclosed by the present invention shall be included within the scope of the patent application of the present invention.

FIG. 1 is a flowchart of the steps of an embodiment of the invention. FIG. 2 is a schematic diagram of the neural network of an embodiment of FIG. 1. FIG. 3 is a flowchart of the steps of another embodiment of FIG. 1. FIG. 4a is a convergence-speed plot for the embodiment of FIG. 1. FIG. 4b is a convergence-speed plot for the embodiment of FIG. 3. FIG. 5 is a flowchart of the image preprocessing steps of an embodiment of FIG. 1. FIGS. 6a to 6f are schematic diagrams of the image processing in the steps of FIG. 5. FIG. 7 is a flowchart of the image preprocessing steps of another embodiment of FIG. 1. FIGS. 8a to 8f are schematic diagrams of the image processing in the steps of FIG. 7.

Claims (7)

1. An intelligent character pattern recognition method, comprising: performing training image preprocessing to obtain a character region image and its corresponding feature vector; setting an initialization count and a stopping condition; providing a neural network with a single hidden layer, setting the number of hidden-layer neurons to the initialization count, using a Gaussian function as the excitation function of each neuron, and generating each Gaussian's center point and standard deviation by random number; inputting the feature vectors of the preprocessed character region images into the network and computing the output-layer link weights and the recognition error; creating a new neuron, using a Gaussian function as its excitation function, with its center point and standard deviation generated by random number; inputting the feature vectors of the preprocessed character region images and computing the change in recognition error caused by adding the neuron to the current network, adding the neuron to the current network if the change is negative and discarding it if the change is positive; repeating the recursive steps of creating a new neuron, generating its parameters by random number, computing the output link weight, computing the change in recognition error, and keeping or discarding the neuron, until the stopping condition is reached; and inputting test images and performing character pattern recognition.

2. The intelligent character pattern recognition method of claim 1, comprising: image binarization; removing undersized pixel connection points from the image; erosion and closing image processing; filling image regions; obtaining a character region from specified coordinates; capturing the corresponding character region image and binarizing it; finding each character with the eight-connectivity method; and extracting the feature vector.

3. The intelligent character pattern recognition method of claim 1, comprising: image binarization; image edge detection; removing undersized pixel connection points from the image; erosion and closing image processing; obtaining a character region; capturing the corresponding character region image; binarizing the character region image; finding each character with the eight-connectivity method; and extracting the feature vector.

4. The intelligent character pattern recognition method of any one of claims 1 to 3, further comprising a recursive neuron-adding step: creating a plurality of neurons, generating each neuron's center point and standard deviation by random number, computing each neuron's output link weight and change in recognition error separately, adding to the current network the neuron whose error change is negative and most negative in value, and discarding the other neurons.

5. The intelligent character pattern recognition method of claim 1, wherein the stopping condition is a neuron count.

6. The intelligent character pattern recognition method of claim 1, wherein the stopping condition is an error threshold.

7. The intelligent character pattern recognition method of claim 1, wherein the stopping condition is a maximum number of recursions.
TW107118676A 2018-05-31 2018-05-31 A method for character pattern recognition TWI685796B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW107118676A TWI685796B (en) 2018-05-31 2018-05-31 A method for character pattern recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW107118676A TWI685796B (en) 2018-05-31 2018-05-31 A method for character pattern recognition

Publications (2)

Publication Number Publication Date
TW202004549A true TW202004549A (en) 2020-01-16
TWI685796B TWI685796B (en) 2020-02-21

Family

ID=69942216

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107118676A TWI685796B (en) 2018-05-31 2018-05-31 A method for character pattern recognition

Country Status (1)

Country Link
TW (1) TWI685796B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200937308A (en) * 2008-02-27 2009-09-01 Microsoft Corp Method and apparatus for constructing HMM models for recognizing online eastern asian characters
CN101488182B (en) * 2008-12-24 2010-12-29 华南理工大学 Image characteristics extraction method used for handwritten Chinese character recognition
CN102394062B (en) * 2011-10-26 2013-02-13 华南理工大学 Method and system for automatically identifying voice recording equipment source
CN105431866A (en) * 2013-07-16 2016-03-23 株式会社汤山制作所 Optical character recognition device
CN107229914B (en) * 2017-05-26 2020-07-03 北京工业大学 Handwritten digit recognition method based on deep Q learning strategy

Also Published As

Publication number Publication date
TWI685796B (en) 2020-02-21
