TWI766930B - Classification-based character skew correction apparatus and method thereof

Info

Publication number: TWI766930B
Application number: TW107101057A
Authority: TW
Inventors: 簡浩宇
Original assignee: 台達電子工業股份有限公司
Priority date: 2018-01-11
Filing date: 2018-01-11
Publication date: 2022-06-11
Also published as: TW201931202A

Abstract

A classification-based character skew correction method is provided. The method includes the steps of: obtaining an input character image; recognizing a plurality of target objects from the input character image; classifying the target objects into at least one candidate character category; determining a primary character category from the at least one candidate character category; calculating a skew angle of the target objects in the primary character category; and performing image skew correction on the input character image according to the calculated skew angle of the primary character category.

Description

Classification-based character skew correction device and method thereof

The present invention relates to image processing, and in particular to a classification-based character skew correction device and a method thereof.

In machine-vision inspection, the characters in an input image to be recognized may be skewed because of how they were printed or how the part was positioned. In practical applications, skewed characters also appear under many other conditions, such as different fonts (dot-matrix or printed type), varying font sizes, and cluttered backgrounds, all of which reduce character recognition accuracy. The character orientation therefore has to be detected and corrected before character recognition is performed on the characters in the image.

Conventional skew correction algorithms, such as the Hough transform, the Radon transform, the minimum standard deviation method, and the shortest distance square method, generally consider the overall skew of all characters in the image to be recognized. In other words, they search for the single angle that best fits the characters in the input image as a whole and cannot take the relationships between individual characters into account. The skew angle obtained by these methods is therefore unstable and easily affected by other factors (for example different fonts, mixed uppercase and lowercase characters, or noise), which leads to character correction errors.

Therefore, a classification-based character skew correction device and a method thereof are needed to solve the above problems.

The present invention provides a classification-based character skew correction method, comprising: obtaining an input character image; identifying a plurality of target objects from the input character image; classifying the plurality of target objects into at least one candidate character category; determining a primary character category from the at least one candidate character category; calculating a skew angle corresponding to the target objects in the primary character category; and performing image skew correction on the input image according to the calculated skew angle of the primary character category to generate a corrected image.

The present invention further provides a classification-based character skew correction device, comprising: a memory unit for storing a character skew correction program; and a processing unit for reading the character skew correction program from the memory unit and executing it to perform the following steps: obtaining an input character image; identifying a plurality of target objects from the input character image; classifying the plurality of target objects into at least one candidate character category; determining a primary character category from the at least one candidate character category; calculating a skew angle corresponding to the target objects in the primary character category; and performing image skew correction on the input image according to the calculated skew angle of the primary character category to generate a corrected image.

100‧‧‧Character skew correction device

110‧‧‧Processing unit

120‧‧‧Memory unit

121‧‧‧Volatile memory

122‧‧‧Non-volatile memory

123‧‧‧Character skew correction program

210‧‧‧Input character image

θ₁‧‧‧Rotation angle

Y_TP1, Y_BP1, Y_TP2, Y_BP2‧‧‧Coordinates

S310-S360‧‧‧Steps

FIG. 1 is a functional block diagram of a character skew correction device according to an embodiment of the present invention.

FIG. 2A is a schematic diagram of calculating the object height of a target object according to an embodiment of the present invention.

FIG. 2B is a schematic diagram of calculating the object height of a target object according to another embodiment of the present invention.

FIG. 3 is a flowchart of a classification-based character skew correction method according to an embodiment of the present invention.

FIGS. 4A to 4F are schematic diagrams of classification conditions for classifying target objects according to an embodiment of the present invention.

To make the above objects, features, and advantages of the present invention easier to understand, a preferred embodiment is described in detail below with reference to the accompanying drawings.

FIG. 1 is a functional block diagram of a character skew correction device according to an embodiment of the present invention. As shown in FIG. 1, the character skew correction device 100 includes a processing unit 110 and a memory unit 120. The memory unit 120 includes a volatile memory 121 and a non-volatile memory 122. The volatile memory 121 may be a random access memory, such as a static random access memory (SRAM) or a dynamic random access memory (DRAM), but the invention is not limited thereto. The non-volatile memory 122 may be, for example, a hard disk drive, a solid-state drive, a flash memory, or a read-only memory, but the invention is not limited thereto.

The non-volatile memory 122 stores a character skew correction program 123. The processing unit 110 reads the character skew correction program 123 from the non-volatile memory 122 into the volatile memory 121 and executes it, wherein the character skew correction program 123 includes the program code of a character skew correction method.

The character skew correction method of the present invention considers the orientation of only some of the characters in the input character image, rather than directly considering the overall orientation of all characters. The overall orientation of all characters is easily affected by noise, and differences between the characters themselves also affect the result. For example, suppose the input character image contains the string "A---", in which A is clearly different from the other characters. When the skew angle of the string is calculated, the character A shifts the overall orientation of the string, so the calculated skew angle is wrong. If only the orientation of the "-" characters is considered, the calculated skew angle is correct.

FIG. 2A is a schematic diagram of calculating the object height of a target object according to an embodiment of the present invention. FIG. 2B is a schematic diagram of calculating the object height of a target object according to another embodiment of the present invention. In this embodiment, a target object is a single character, such as "D" (an uppercase letter), "x" (a lowercase letter), or ";" (a punctuation mark).

As shown in FIG. 2A, the target object may be, for example, the uppercase letter D. Assuming the rotation angle of the target object is 0, the processing unit 110 can calculate the vertex coordinate TP and the base point coordinate BP of the target object, where the vertex is the highest point of the target object in the vertical direction (Y axis) and the base point is the lowest point of the target object in the vertical direction (Y axis). Note that when the input character image is rotated about its center point, the vertex and base point of the target object do not stay at fixed positions. At every rotation angle, the vertex is the highest point of the target object in the vertical direction (Y axis) at that angle, and the base point is the lowest point of the target object in the vertical direction (Y axis) at that angle. The embodiment of FIG. 2A shows one of the target objects in the input character image. If the number of target objects is 1, no additional classification is needed and the skew angle of the target object can be determined directly.
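The vertex and base point calculation described above can be sketched as follows. This is a minimal illustration, assuming each target object is available as a list of (x, y) outline points and that the Y axis points upward (in an image array, where row indices grow downward, the roles of max and min would swap). The names rotate_points and top_and_base, and the rectangle standing in for a glyph, are hypothetical and not taken from the patent.

```python
import math

def rotate_points(points, theta_deg, center=(0.0, 0.0)):
    """Rotate (x, y) points by theta_deg about center (counter-clockwise, Y up)."""
    t = math.radians(theta_deg)
    cx, cy = center
    cos_t, sin_t = math.cos(t), math.sin(t)
    return [
        (cx + (x - cx) * cos_t - (y - cy) * sin_t,
         cy + (x - cx) * sin_t + (y - cy) * cos_t)
        for x, y in points
    ]

def top_and_base(points, theta_deg, center=(0.0, 0.0)):
    """Return (TP, BP): the points with the largest and smallest Y after rotation."""
    rotated = rotate_points(points, theta_deg, center)
    tp = max(rotated, key=lambda p: p[1])
    bp = min(rotated, key=lambda p: p[1])
    return tp, bp

# A 2-wide, 4-tall rectangle standing in for the outline of a glyph (assumption).
glyph = [(0, 0), (2, 0), (2, 4), (0, 4)]
tp0, bp0 = top_and_base(glyph, 0)
tp30, bp30 = top_and_base(glyph, 30)
h0 = tp0[1] - bp0[1]      # object height h at 0 degrees
h30 = tp30[1] - bp30[1]   # object height h at 30 degrees
print(h0, h30)
```

The vertex and base point are recomputed at every angle, which is why the object height h changes as the image rotates.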

If the number of target objects is greater than 1, the processing unit 110 calculates, for each target object, the vertex coordinates TP (X_TP, Y_TP) and the base point coordinates BP (X_BP, Y_BP) at each rotation angle θ within a rotation angle range (for example, between -45 and +45 degrees), where the angles may be sampled at a predetermined interval such as 1 degree or 5 degrees. The vertical distance between the vertex and the base point is the object height h. The processing unit 110 then compares, for every pairwise combination of target objects in the input character image, the object height difference diff at each rotation angle θ (considering, for example, only the Y-axis coordinates), which can be expressed by the following equation:

$$\mathrm{diff}_{ij} = \left|\mathrm{object}_i(Y_{TP}) - \mathrm{object}_j(Y_{TP})\right| + \left|\mathrm{object}_i(Y_{BP}) - \mathrm{object}_j(Y_{BP})\right| \qquad (1)$$

where i and j are positive integers between 1 and N, and i is not equal to j. In one embodiment, if the number of target objects is greater than 1 (that is, there are a plurality of target objects), the processing unit 110 classifies any two target objects according to the object height difference diff calculated between them at each rotation angle. For example, the processing unit 110 determines whether the object height difference diff is less than a predetermined value. If it is, the processing unit 110 determines that the two target objects belong to the same candidate character category. If the object height difference is not less than the predetermined value, the two target objects are classified into different candidate character categories.
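The pairwise test built on equation (1) can be sketched as below. This is a hedged illustration evaluated at a single rotation angle; the text does not spell out how the per-angle results are combined, so that part is left out. The function names, the (y_tp, y_bp) tuple layout, the sample coordinates, and the threshold of 2 are all assumptions made for the example.

```python
def height_diff(obj_i, obj_j):
    """Equation (1) for two objects given as (y_tp, y_bp) at one rotation angle."""
    return abs(obj_i[0] - obj_j[0]) + abs(obj_i[1] - obj_j[1])

def same_category(obj_i, obj_j, threshold):
    """True when the height difference is below the predetermined value."""
    return height_diff(obj_i, obj_j) < threshold

# Hypothetical (y_tp, y_bp) values for two uppercase D glyphs and one lowercase a.
d_left, d_right, a = (10, 0), (11, 0), (6, 0)
THRESHOLD = 2  # hypothetical predetermined value

print(height_diff(d_left, d_right), same_category(d_left, d_right, THRESHOLD))
print(height_diff(d_left, a), same_category(d_left, a, THRESHOLD))
```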

For example, suppose the input character image 210 contains five target objects, "aDcDc". When the input character image is rotated (for example, clockwise by an angle θ₁), the five target objects rotate accordingly, as shown in FIG. 2B. The total number of pairwise height-difference comparisons is $\binom{5}{2} = 10$. The pairs compared are: (1) aD, (2) ac, (3) aD, (4) ac, (5) Dc, (6) DD, (7) Dc, (8) cD, (9) cc, (10) Dc. That is, the height difference diff is compared for every pairwise combination of the objects. For example, for the uppercase letter D on the left, the Y-axis coordinates of its vertex and base point are Y_TP1 and Y_BP1, respectively; for the uppercase letter D on the right, the Y-axis coordinates of its vertex and base point are Y_TP2 and Y_BP2, respectively. The height difference at the rotation angle θ₁ is then diff = |Y_TP1 - Y_TP2| + |Y_BP1 - Y_BP2|. In this example, the target objects D and D are classified into one category, and the target objects a, c, and c are classified into another.
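The count of 10 comparisons and the order of the pairs listed above can be reproduced with a short sketch using the Python standard library. The variable names are illustrative only.

```python
from itertools import combinations
from math import comb

# The five target objects from the example string.
objects = list("aDcDc")

# Every unordered pair of objects, in left-to-right order.
pairs = ["".join(p) for p in combinations(objects, 2)]

print(len(pairs), comb(len(objects), 2))
print(pairs)
```

Because combinations works on positions rather than values, the two D glyphs and the two c glyphs are treated as distinct objects, as in the text.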

In detail, characters of the same font usually have similar object heights. English uppercase letters, for example, show very close object height differences at several specific angles under the above test, so they are classified into the same category. The heights of English lowercase letters, however, are not uniform. Lowercase letters are classified into the same category only if they are the same letter, are all ordinary lowercase letters (for example w, n, c, e, a), or are all the same kind of special lowercase letter (for example g, j, p, q, y, or f, h, b, d, k). The heights of punctuation marks (+-*/) are also not uniform. Identical punctuation marks are classified into the same category, whereas different punctuation marks are not necessarily classified into the same category.

For example, suppose the input character image contains the string "WWWxxyy--", where W is an English uppercase letter, x and y are English lowercase letters, and - is a punctuation mark. The above test yields four candidate character categories: (1) WWW, (2) xx, (3) yy, and (4) --.
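The grouping in this example can be sketched as follows. The sketch assumes hypothetical (y_tp, y_bp) values per glyph and a hypothetical threshold of 2, and it merges pairs transitively with a small union-find structure. The transitive merge is a design choice of this sketch; the text only describes the pairwise test.

```python
from itertools import combinations

def group_objects(extents, threshold):
    """Group object indices whose pairwise equation-(1) difference is below threshold."""
    parent = list(range(len(extents)))

    def find(i):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(extents)), 2):
        (tp_i, bp_i), (tp_j, bp_j) = extents[i], extents[j]
        if abs(tp_i - tp_j) + abs(bp_i - bp_j) < threshold:
            parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(len(extents)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Hypothetical (y_tp, y_bp) per glyph with the Y axis pointing up.
EXTENT = {"W": (10, 0), "x": (6, 0), "y": (6, -3), "-": (4, 3)}
text = "WWWxxyy--"
groups = group_objects([EXTENT[ch] for ch in text], threshold=2)
labels = ["".join(text[i] for i in g) for g in groups]
print(labels)
```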

In one embodiment, the processing unit 110 takes the candidate character category containing the most target objects as the primary character category and calculates its skew angle. For example, the skew angle φ is the rotation angle at which the object height differences between the target objects in the primary character category are at their minimum. The skew angle φ can be expressed, for example, by the following formula: [formula image not reproduced in this text]
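One way to write the condition stated above is sketched here. It assumes that the pairwise differences of equation (1) are summed over the objects of the primary character category and that the search runs over the sampled rotation angle range; the symbol M (the number of objects in the primary category) and the bounds θ_min and θ_max are introduced for this sketch and do not appear in the text.

```latex
\varphi \;=\; \operatorname*{arg\,min}_{\theta \,\in\, [\theta_{\min},\, \theta_{\max}]}
\;\sum_{i=1}^{M} \sum_{j=i+1}^{M} \mathrm{diff}_{ij}(\theta),
\qquad
\mathrm{diff}_{ij}(\theta) = \left| Y_{TP,i}(\theta) - Y_{TP,j}(\theta) \right|
                           + \left| Y_{BP,i}(\theta) - Y_{BP,j}(\theta) \right|
```

Here diff_ij(θ) is equation (1) evaluated with the vertex and base point coordinates obtained at rotation angle θ, and the range could be, for example, -45 to +45 degrees as mentioned earlier.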

After determining the skew angle of the primary character category, the processing unit 110 performs image skew correction on the input character image according to that skew angle. In an optical character recognition (OCR) pipeline, image skew correction must be applied to the input character image first; character segmentation and character recognition follow, and the final recognition result is output.

The character skew correction method of the present invention allows the characters in the corrected image to better match the actual character skew, which in turn increases character recognition accuracy.

In some embodiments, the processing unit 110 may also use other criteria to classify the target objects in the input character image, for example: (1) the aspect ratio of the character itself, as shown in FIG. 4A; (2) the center point of the character's bounding rectangle, as shown in FIG. 4B; (3) the area of the character's bounding rectangle, as shown in FIG. 4C; (4) the center of the character's circumscribed circle, as shown in FIG. 4D; (5) the area of the character's circumscribed circle, as shown in FIG. 4E; (6) the radius of the character's circumscribed circle, as shown in FIG. 4F. That is, the processing unit 110 may use at least one method to classify the target objects in the input character image and produce at least one candidate character category.
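The six features listed above can be sketched for an object given as a list of (x, y) points. This is an illustration under stated assumptions: the bounding rectangle is axis-aligned, and the circle is a simple enclosing circle centered on the rectangle center rather than a true minimal circumscribed circle. The function name bounding_features and the dictionary keys are hypothetical.

```python
import math

def bounding_features(points):
    """Compute simple geometric features of an object from its (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    # Assumption: enclosing circle centered on the rectangle center (not minimal).
    radius = max(math.hypot(x - cx, y - cy) for x, y in points)
    return {
        "aspect_ratio": width / height if height else float("inf"),
        "rect_center": (cx, cy),
        "rect_area": width * height,
        "circle_center": (cx, cy),
        "circle_area": math.pi * radius ** 2,
        "circle_radius": radius,
    }

features = bounding_features([(0, 0), (2, 0), (2, 4), (0, 4)])
print(features)
```

Any of these values could replace the height difference in the pairwise comparison, with two objects placed in the same category when the chosen feature differs by less than some threshold.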

In other embodiments, if the style of the characters to be recognized is known in advance, a conventional classifier can be used for classification. That is, each character to be recognized can be fed to the classifier for training, producing a classification result for each character after training. The classifier may use, for example, a support vector machine (SVM), K-nearest neighbor classification, a convolutional neural network, or deep learning, but the invention is not limited thereto.

Continuing from the above embodiments, when the processing unit 110 determines the primary character category from the at least one candidate character category, besides selecting the candidate character category with the smallest sum of differences as in the preceding embodiment, it may use any of several other methods. For example, it may select the candidate character category that: (1) has the largest total character area; (2) has the largest total edge strength; (3) has the largest number of characters; (4) has the smallest difference between characters; (5) has the largest sum of character standard deviations; (6) has the largest sum of character variances; or (7) has the smallest average character brightness (for black characters) or the largest (for white characters), but the invention is not limited thereto.
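Several of the selection criteria above amount to scoring each candidate category and keeping the best one, which can be sketched as below. The function name pick_primary, the AREA table, and the sample categories are hypothetical values chosen so that two criteria disagree.

```python
def pick_primary(categories, score):
    """Return the candidate category with the highest score."""
    return max(categories, key=score)

# Hypothetical per-glyph areas in pixels, for illustration only.
AREA = {"W": 40, "-": 4}

# Candidate categories for a string such as "WW----".
categories = [["W", "W"], ["-", "-", "-", "-"]]

by_count = pick_primary(categories, score=len)
by_area = pick_primary(categories, score=lambda cat: sum(AREA[g] for g in cat))

print("".join(by_count))  # category chosen by number of characters
print("".join(by_area))   # category chosen by total character area
```

Criteria that call for a minimum, such as the smallest character difference, fit the same shape by negating the score.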

FIG. 3 is a flowchart of a classification-based character skew correction method according to an embodiment of the present invention. In step S310, an input character image is obtained. The input character image may come from an external image capture device, or may be obtained from other computer equipment through a wired or wireless transmission interface.

In step S320, a plurality of target objects are identified from the input character image. The target objects may include characters or noise. For example, before identifying the target objects, the processing unit 110 may use at least one algorithm to remove noise from the input character image, such as thresholding, filtering, and/or morphological operations. Thresholding may use conditions such as gray value, edge magnitude, area, height, width, or a combination thereof, but the invention is not limited thereto. Filtering may use, for example, a mean filter, a median filter, a min/max filter, a peak and valley filter, a Gaussian filter, a low-pass filter, or a high-pass filter, but the invention is not limited thereto. Morphological operations for removing noise include, for example, erosion and opening, but the invention is not limited thereto. After noise removal, the processing unit 110 identifies the plurality of target objects from the denoised input character image.
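As one concrete instance of the area threshold mentioned above, the following sketch labels connected foreground regions in a binary image and drops regions that are too small. Treating each connected region as a target object, the 4-connectivity, the name find_objects, and the min_area value are assumptions of this sketch; the text does not specify how objects are extracted.

```python
from collections import deque

def find_objects(binary, min_area=3):
    """Return 4-connected foreground regions with at least min_area pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_area:  # area threshold removes small specks
                objects.append(pixels)
    return objects

image = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]
sizes = [len(obj) for obj in find_objects(image)]
print(sizes)
```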

In step S330, the plurality of target objects are classified into at least one candidate character category. For example, the processing unit 110 may classify the plurality of target objects into at least one candidate character category according to a first determination mechanism. Besides the object height difference, the first determination mechanism may use other criteria to classify the target objects in the input character image, for example: (1) the aspect ratio of the character itself; (2) the center point of the character's bounding rectangle; (3) the area of the character's bounding rectangle; (4) the center of the character's circumscribed circle; (5) the area of the character's circumscribed circle; (6) the radius of the character's circumscribed circle. In other embodiments, the processing unit 110 may also use a support vector machine (SVM), K-nearest neighbor classification, a convolutional neural network, or deep learning to train on and classify the possible input characters in advance.

In step S340, a primary character category is determined from the at least one candidate character category. For example, the processing unit 110 may determine the primary character category from the at least one candidate character category according to a second determination mechanism. Besides selecting the candidate character category with the smallest sum of differences, the second determination mechanism may use any of several other methods. For example, it may select the candidate character category that: (1) has the largest total character area; (2) has the largest total edge strength; (3) has the largest number of characters; (4) has the smallest difference between characters; (5) has the largest sum of character standard deviations; (6) has the largest sum of character variances; or (7) has the smallest average character brightness (for black characters) or the largest (for white characters), but the invention is not limited thereto.

In step S350, a skew angle corresponding to the target objects in the primary character category is calculated. Put simply, the skew angle of the target objects in the input character image is determined by the skew angle of the target objects in the primary character category.
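The angle search in this step can be sketched as a sweep over sampled rotation angles that keeps the angle with the smallest total height difference. The sketch assumes the objects of the primary category are lists of (x, y) points with the Y axis pointing up, sums equation (1) over all pairs, and rotates about the origin; the function names and the box-shaped test glyphs are hypothetical.

```python
import math
from itertools import combinations

def rotate(points, theta_deg):
    """Rotate (x, y) points by theta_deg about the origin."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def extents(points):
    """Return (y_tp, y_bp): the largest and smallest Y of the object."""
    ys = [y for _, y in points]
    return max(ys), min(ys)

def total_diff(glyphs, theta_deg):
    """Sum of equation (1) over all object pairs after rotating by theta_deg."""
    ext = [extents(rotate(g, theta_deg)) for g in glyphs]
    return sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in combinations(ext, 2))

def estimate_skew(glyphs, lo=-45, hi=45, step=1):
    """Return the sampled rotation angle that minimizes the total difference."""
    return min(range(lo, hi + 1, step), key=lambda th: total_diff(glyphs, th))

def box(x0, w=2.0, h=4.0):
    """A rectangle on the baseline, standing in for one glyph."""
    return [(x0, 0.0), (x0 + w, 0.0), (x0 + w, h), (x0, h)]

upright = [box(0.0), box(5.0), box(10.0)]
skewed = [rotate(g, 10) for g in upright]  # simulate a 10-degree skew
angle = estimate_skew(skewed)
print(angle)
```

In this sketch the returned value is the rotation that brings the glyphs back onto a common baseline, so it has the opposite sign of the simulated skew.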

In step S360, image skew correction is performed on the input image according to the calculated skew angle of the primary character category to generate a corrected image. Because classifying the target objects and determining the primary character category from the candidate character categories allows the skew angle to be calculated accurately, image skew correction based on the calculated skew angle gives better results and improves the accuracy of subsequent character recognition.
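The correction step amounts to rotating the image by the calculated angle. A minimal sketch using nearest-neighbor inverse mapping on a list-of-lists image follows; the text does not name an interpolation method, so that choice, the rotation about the array center, and the name rotate_image are assumptions. With rows growing downward, a positive angle appears as a clockwise rotation.

```python
import math

def rotate_image(img, theta_deg, fill=0):
    """Rotate a 2D list about its center by theta_deg (nearest neighbor)."""
    rows, cols = len(img), len(img[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for col in range(cols):
            dx, dy = col - cx, r - cy
            # Inverse mapping: rotate the output coordinate by -theta
            # to find the source pixel it came from.
            sx = round(cx + dx * c + dy * s)
            sy = round(cy - dx * s + dy * c)
            if 0 <= sy < rows and 0 <= sx < cols:
                out[r][col] = img[sy][sx]
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
quarter_turn = rotate_image(img, 90)
print(quarter_turn)
```

Pixels whose source falls outside the original array keep the fill value, which is why a production implementation would usually enlarge the canvas or pad the input first.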

In summary, the present invention provides a classification-based character skew correction device and a method thereof, which classify the characters in the input character image, determine a primary character category, calculate the corresponding skew angle, and use the calculated skew angle to perform image skew correction, thereby obtaining a better corrected image. In addition, a better corrected image also increases the accuracy of character segmentation and character recognition in the subsequent steps of optical character recognition.

The method of the present invention, or specific forms or portions thereof, may be embodied as program code on physical media, such as floppy disks, optical discs, hard disks, or any other machine-readable (for example, computer-readable) storage medium, wherein, when the program code is loaded and executed by a machine such as a computer, the machine becomes a device or system for practicing the present invention. The method, system, and device of the present invention may also be transmitted as program code over a transmission medium, such as electrical wire or cable, optical fiber, or any other form of transmission, wherein, when the program code is received, loaded, and executed by a machine such as a computer, the machine becomes a device or system for practicing the present invention. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique device that operates analogously to application-specific logic circuits.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the scope of the present invention. Anyone with ordinary skill in the art may make minor changes and modifications without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.

Claims

A classification-based character skew correction method, comprising: obtaining an input character image; identifying a plurality of target objects from the input character image; classifying the plurality of target objects into at least one candidate character category; determining a primary character category from the at least one candidate character category; calculating a skew angle corresponding to the target objects in the primary character category; and performing image skew correction on the input image according to the calculated skew angle of the primary character category to generate a corrected image, wherein the step of classifying the plurality of target objects into at least one candidate character category comprises: calculating a vertex coordinate and a base point coordinate of each target object; calculating an object height difference between any two of the plurality of target objects at each rotation angle within a rotation angle range; and when the object height difference is less than a predetermined value, classifying the two target objects into the same candidate character category, and when the object height difference is not less than the predetermined value, classifying the two target objects into different candidate character categories.

The classification-based character skew correction method of claim 1, further comprising: determining, among the at least one candidate character category, the candidate character category having the largest number of characters as the primary character category.

The classification-based character inclination correction method of claim 1, further comprising: classifying the plurality of target objects into the at least one candidate character category according to one of, or a combination of, the aspect ratio of each target object, the center point of its circumscribed rectangle, the area of its circumscribed rectangle, the center of its circumscribed circle, the area of its circumscribed circle, and the radius of its circumscribed circle.

The classification-based character inclination correction method of claim 1, further comprising: classifying the plurality of target objects into the at least one candidate character category using a support vector machine, nearest neighbor classification, a convolutional neural network, or deep learning.

The classification-based character inclination correction method of claim 1, further comprising: determining, as the main character category, the candidate character category having the largest sum of character areas, the largest sum of edge strengths, the largest number of characters, the smallest character difference, the largest sum of character standard deviations, the largest sum of character variances, or the smallest average brightness among the candidate character categories.

A classification-based character inclination correction device, comprising: a memory unit for storing a character inclination correction program; and a processing unit for reading the character inclination correction program from the memory unit and executing it to perform the following steps: obtaining an input character image; identifying a plurality of target objects from the input character image; classifying the plurality of target objects into at least one candidate character category; determining a main character category from the at least one candidate character category; calculating an inclination angle corresponding to each target object in the main character category; and performing image inclination correction on the input character image according to the calculated inclination angle of the main character category to generate a corrected image, wherein, in the step of classifying the plurality of target objects into the at least one candidate character category, the processing unit further calculates a vertex coordinate and a base point coordinate of each target object, and calculates an object height difference value between any two of the plurality of target objects at each rotation angle within a rotation angle range, wherein when the object height difference value is less than a predetermined value, the processing unit classifies the two target objects into the same candidate character category, and when the object height difference value is not less than the predetermined value, the processing unit classifies the two target objects into different candidate character categories.

The classification-based character inclination correction device of claim 6, wherein the processing unit determines, as the main character category, the candidate character category having the largest number of characters among the at least one candidate character category.

The classification-based character inclination correction device of claim 6, wherein the processing unit classifies the plurality of target objects into the at least one candidate character category according to one of, or a combination of, the aspect ratio of each target object, the center point of its circumscribed rectangle, the area of its circumscribed rectangle, the center of its circumscribed circle, the area of its circumscribed circle, and the radius of its circumscribed circle.

The classification-based character inclination correction device of claim 6, wherein the processing unit classifies the plurality of target objects into the at least one candidate character category using a support vector machine, nearest neighbor classification, a convolutional neural network, or deep learning.

The classification-based character inclination correction device of claim 6, wherein the processing unit determines, as the main character category, the candidate character category having the largest sum of character areas, the largest sum of edge strengths, the largest number of characters, the smallest character difference, the largest sum of character standard deviations, the largest sum of character variances, or the smallest average brightness among the candidate character categories.
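
For illustration only, the height-difference classification recited in the independent claims, followed by the largest-count selection of the main character category, might be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation: each target object is assumed to be given as a list of (x, y) outline points; the claims do not say how the per-angle difference values are combined, so the sketch assumes the smallest difference over the rotation angle range is compared with the predetermined value; and the grouping is a simple greedy pass against the first member of each category. The names `box`, `rotated_height`, `height_difference`, `classify`, and `main_category` are hypothetical.

```python
import math


def box(cx, cy, w, h):
    """Hypothetical helper: corner points of an axis-aligned box standing in for a character."""
    return [(cx - w / 2, cy - h / 2), (cx + w / 2, cy - h / 2),
            (cx + w / 2, cy + h / 2), (cx - w / 2, cy + h / 2)]


def rotated_height(points, angle_deg):
    """Distance between the vertex (topmost) and base (bottommost) points after rotation."""
    a = math.radians(angle_deg)
    # Only the rotated y-coordinate matters for the object height.
    ys = [x * math.sin(a) + y * math.cos(a) for x, y in points]
    return max(ys) - min(ys)


def height_difference(obj_a, obj_b, angles):
    """Assumption: combine the per-angle differences by taking the smallest one."""
    return min(abs(rotated_height(obj_a, t) - rotated_height(obj_b, t))
               for t in angles)


def classify(objects, angles, threshold):
    """Greedy grouping: an object joins the first category whose first member
    differs from it in height by less than the threshold; otherwise it starts
    a new category. Returns lists of object indices."""
    categories = []
    for idx, obj in enumerate(objects):
        for cat in categories:
            if height_difference(objects[cat[0]], obj, angles) < threshold:
                cat.append(idx)
                break
        else:
            categories.append([idx])
    return categories


def main_category(categories):
    """Pick the candidate category containing the most objects."""
    return max(categories, key=len)


if __name__ == "__main__":
    # Three small and two large boxes; the sizes and positions are made up.
    objs = [box(0, 0, 6, 10), box(20, 3, 12, 30), box(40, 1, 6, 10),
            box(60, 5, 12, 30), box(80, 2, 6, 10)]
    cats = classify(objs, range(-30, 31, 5), threshold=2.0)
    print(cats, main_category(cats))
```

The rotation angle range (here -30 to 30 degrees in 5-degree steps) and the threshold are arbitrary illustration values. A wider range can make objects of different sizes appear equal in height at some angle, so both parameters would need tuning against real character images.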