TW399187B - On-line character recognizing device - Google Patents
On-line character recognizing device
- Publication number
- TW399187B (application No. TW087100891A)
- Authority
- TW
- Taiwan
- Prior art keywords
- feature
- dictionary
- segment
- character
- input
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/0354—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of 2D relative movements between the device, or an operating part thereof, and a plane or surface, e.g. 2D mice, trackballs, pens or pucks
- G06F3/03545—Pens or stylus
Landscapes
- Character Discrimination (AREA)
Abstract
Description
A7 Printed by the Employees' Consumer Cooperative of the Central Standards Bureau, Ministry of Economic Affairs
V. Description of the Invention (1)

Technical Field of the Invention

The present invention relates to an on-line character recognition device that recognizes characters handwritten into a pen-input computer or similar device, and in particular to an on-line character recognition device with an improved recognition rate for cursive (connected-stroke) characters and the like.

Prior Art

An on-line character recognition device is an essential technology for entering character codes into a computer that uses a pen and a tablet as its input device. For characters written in regular (block) script, high-precision recognition is achieved with the well-known basic-stroke method (the shapes of several kinds of strokes are defined in advance as basic strokes, and each character is expressed as a combination of basic strokes) or with various other recognition methods. However, recognition performance for cursive characters still falls short of that for regular script, and on-line recognition methods that can handle cursive characters have therefore long been studied. One example is "On-line character recognition independent of stroke number and stroke order using selective combination of basic strokes", published in the Transactions of the Institute of Electronics and Communication Engineers (IECE), Vol. J66-D, No. 5, pp. 593-600, referred to below as Conventional Example 1. In Conventional Example 1, the input pattern and a dictionary pattern are compared stroke by stroke, a stroke being the unit of coordinate string from pen-down to pen-up: each stroke of whichever pattern has fewer strokes is put in one-to-one correspondence with a stroke of the pattern that has more.
On the side with more strokes, the strokes left without a partner are selectively combined with strokes that already have one; the distance between the coordinate points of the combined dictionary pattern and the input pattern is then computed by DP (dynamic programming) matching, and candidate characters are output, so that the character can be recognized. DP matching is described, for example, on page 62 of "Pattern Recognition" (Noboru Funakubo, Kyoritsu Shuppan) and is not detailed here. Conventional Example 1 uses coordinate points as the DP-matching feature, but it is also possible to use, as shown in Fig. 18, the directional components (direction codes) between coordinate points obtained by dividing the trace at equal intervals along the writing direction.

(Please read the notes on the back before filling in this page.) This paper size conforms to the Chinese National Standard (CNS) A4 specification (210 × 297 mm).

In Fig. 18, the strokes of the input pattern are joined along the writing direction into one fully connected trace, as if written in a single stroke, and the trace is then divided into parts of approximately equal, suitably chosen length. The directional component between successive division points (a1, a2, a3, a4, ..., a21) is approximated by, for example, the 8-direction code shown in Fig. 19, and cursive characters can also be recognized by using these directional components as the DP-matching feature.

Among methods based on basic strokes, there are methods that recognize cursive characters by preparing dictionary entries for the cursive forms, and methods in which the dictionary holds stroke-splitting data and the input strokes are split until the input and the dictionary have the same number of strokes.
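The equal-interval direction-code feature of Figs. 18 and 19 can be sketched as follows. This is a minimal illustration, not the method of any cited reference: the resampling routine, the joining of strokes by straight pen-up connections, and the code numbering (1 = rightward, increasing counterclockwise, y axis pointing up) are assumptions. The numbering is chosen only because it agrees with the 「石」 example of Conventional Example 2, where a horizontal stroke is (1), a left-falling stroke is (6) and a vertical stroke is (7); Fig. 19 itself is not reproduced here.

```python
import math

def resample_equal(points, step):
    """Resample a polyline [(x, y), ...] at equal arc-length intervals of `step`."""
    out = [points[0]]
    since_last = 0.0  # arc length from the last emitted sample to the current vertex
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = step - since_last  # position of the next sample on this edge
        while pos <= seg:
            t = pos / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += step
        since_last = seg - (pos - step)
    return out

def direction_code8(dx, dy):
    """Quantize a vector to 8 directions: 1 = right, counterclockwise, so 7 = down (assumed)."""
    return round(math.atan2(dy, dx) / (math.pi / 4)) % 8 + 1

def equal_interval_codes(strokes, step):
    """Join all strokes into one trace (as if written in a single stroke), divide it
    at equal intervals and return the direction code between successive points."""
    trace = [p for stroke in strokes for p in stroke]
    pts = resample_equal(trace, step)
    return [direction_code8(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
```

For a rightward stroke followed by a downward stroke, `equal_interval_codes([[(0, 0), (4, 0)], [(4, 0), (4, -4)]], 1.0)` yields four codes of 1 followed by four codes of 7.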
An example of the basic-stroke approach is the method disclosed in Japanese Examined Patent Publication No. H2-10473, referred to below as Conventional Example 2. Its structure and operation are described here.

Fig. 20 is a block diagram showing the basic configuration of the on-line character recognition device of Conventional Example 2. As shown in Fig. 20, the device comprises: a coordinate input device 21; a basic line-segment recognition circuit 22 that receives the output of the coordinate input device 21; a line-segment code output circuit 23 that receives the output of circuit 22 and outputs the line-segment codes in sequence; a line-segment code buffer 24; a line-segment decomposition circuit 25 that successively decomposes the line segments of strokes when the decision circuit 29 performs re-recognition; a control circuit 26; a comparison circuit 27 that compares the output of the line-segment code output circuit 23 with the output of the dictionary storage unit 28; a decision circuit 29 that receives the output of the comparison circuit 27 and performs character recognition; and a dictionary storage unit 28 that sends the stored dictionary data to the comparison circuit 27 in sequence.

The basic line-segment recognition circuit 22, which receives the time-series coordinate-point data output by the coordinate input device 21, approximates each stroke by a polyline and expresses the direction of each polyline segment with the 8-direction code shown in Fig. 19. It then uses the correspondence table between direction-code strings and basic strokes shown in Fig. 21 to determine which basic stroke each input stroke belongs to.

Next, the operation of Conventional Example 2 is described with reference to Fig. 22. In Fig. 22, the strokes are written in the order of line segments 101, 102, 103, 104 and 105. When each stroke is approximated by a polyline and expressed in stroke order with the 8-direction code, the result is {(1), (6), (7), (1, 7), (1)}. Using the basic-stroke table of Fig. 21, the basic-stroke string {(1), (3), (4), (7), (1)} is obtained. Because the character pattern of Fig. 22 is written with 5 strokes, the comparison circuit 27 matches it against the 5-stroke dictionary, and the decision circuit 29 then makes the selection. The pattern is judged to match the character 「石」 ("stone") in the dictionary, and its character code is output.

Next, the operation is described for the cursive pattern of Fig. 23, which is written in the order of line segments 106, 107, 108 and 109. For this pattern, the basic line-segment recognition circuit 22 likewise yields the basic-stroke string {(1), (3), (4), (21)}. Because the stroke count is 4, the comparison circuit 27 compares it with the 4-stroke dictionary. If the dictionary storage unit 28 holds no 4-stroke entry for 「石」, the decision circuit 29 cannot output a character. Control therefore returns to the control circuit 26, and the line-segment decomposition circuit 25 successively decomposes the strokes.

Fig. 24 shows the decomposition rules for cursive strokes.
The stroke decompositions and decomposition rules are registered in advance in the dictionary storage unit 28. Here, using the rule shown in Fig. 24, basic stroke (21) is split into (7) and (1), and the stroke count is set to 5. The basic-stroke string of the input pattern is thereby corrected to {(1), (3), (4), (7), (1)}, and the comparison circuit 27 matches it against the 5-stroke dictionary in the dictionary storage unit 28. The pattern now matches the character 「石」 in the dictionary, and the decision circuit 29 outputs the result.
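The control flow of Conventional Example 2 (table lookup, comparison with the dictionary of the same stroke count, and decomposition followed by a retry) can be sketched as below. The table, rule and dictionary contents are hypothetical fragments reconstructed only from the 「石」 walk-through above, and exact equality stands in for the comparison and decision circuits; the real tables of Figs. 21 and 24 are larger.

```python
# Hypothetical fragments of the Fig. 21 table, the Fig. 24 rules and the dictionary,
# reconstructed only from the 「石」 example.
BASIC_STROKE_TABLE = {(1,): 1, (6,): 3, (7,): 4, (1, 7): 7}  # direction-code string -> basic stroke
DECOMPOSITION_RULES = {21: [7, 1]}                             # cursive basic stroke -> its parts
DICTIONARY = {5: {"石": [1, 3, 4, 7, 1]}}                       # stroke count -> {char: basic strokes}

def to_basic_strokes(direction_strings):
    """Map each stroke's 8-direction code string to a basic stroke (circuit 22)."""
    return [BASIC_STROKE_TABLE[tuple(codes)] for codes in direction_strings]

def recognize(basic_strokes):
    """Compare with the dictionary of the same stroke count (circuits 27 and 29);
    on failure, decompose one stroke (circuit 25) and retry."""
    strokes = list(basic_strokes)
    while True:
        for char, entry in DICTIONARY.get(len(strokes), {}).items():
            if entry == strokes:  # simplification: exact match replaces the real comparison
                return char
        for k, s in enumerate(strokes):
            if s in DECOMPOSITION_RULES:
                strokes[k:k + 1] = DECOMPOSITION_RULES[s]
                break
        else:
            return None  # no applicable rule: the character cannot be recognized
```

Both `recognize(to_basic_strokes([[1], [6], [7], [1, 7], [1]]))` and `recognize([1, 3, 4, 21])` return 「石」, whereas an input whose connected stroke has no registered rule returns `None`, which is the weakness of this approach discussed below.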
Problems to Be Solved by the Invention

Conventional Example 1 can recognize cursive characters. However, when the dictionary pattern (a) of Fig. 25 is matched against the input pattern (b), positional shift or deformation enlarges the distances between corresponding coordinate points, as shown in Fig. 25(c). The distance to the correct dictionary entry therefore becomes large, and misrecognition is likely.

When direction codes are used as the DP-matching feature as in Fig. 18, cursive characters can still be recognized and robustness to positional shifts such as those of Fig. 25 improves, but characters written with similar pen directions, such as 「伎」 and 「低」, 「却」 and 「劫」, or 「村」 and 「杖」, are easily confused.

Furthermore, when a character is written so sloppily that its directions differ greatly from the corresponding parts of the dictionary entry, the cost obtained by DP matching the input pattern against the dictionary is larger than for a neatly written pattern, so the character is easily misrecognized as another character.

In addition, as shown in Fig. 26, the similarity distance between a character (a) that contains "hook" and "press" components and the dictionary entry (b) for a character without those components (that is, one that has no corresponding components) becomes large, so recognition errors are likely.
One conceivable countermeasure is to ignore, or to weight, the direction codes of the turn-back components near the start and end of a stroke (for example, a start or end segment whose angle difference from the adjoining straight part is 90 degrees or less). Without knowing which character was written, however, it cannot be determined whether such a turn-back component near the start or end is noise or a feature the character needs. The start and end portions of strokes therefore cannot simply be ignored.
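To make the positional-shift problem of Fig. 25 concrete, the sketch below compares a coordinate-point distance with per-stroke direction codes when one stroke of a character is written slightly out of place. The coordinates are toy values, not taken from the figure, and corresponding points are assumed to be already paired (Conventional Example 1 pairs them by DP matching; that step is skipped here to isolate the effect).

```python
import math

def direction_code8(dx, dy):
    """8-direction code, 1 = right, counterclockwise (assumed numbering)."""
    return round(math.atan2(dy, dx) / (math.pi / 4)) % 8 + 1

def stroke_codes(strokes):
    """Direction codes between consecutive points, stroke by stroke."""
    return [[direction_code8(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(s, s[1:])]
            for s in strokes]

def mean_point_distance(strokes_a, strokes_b):
    """Average distance between corresponding coordinate points."""
    pairs = [(p, q) for sa, sb in zip(strokes_a, strokes_b) for p, q in zip(sa, sb)]
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

dictionary = [[(0, 0), (2, 0), (4, 0)], [(2, 1), (2, -1), (2, -3)]]
# Same shape, but the second stroke is written 2 units further to the right.
written = [[(0, 0), (2, 0), (4, 0)], [(4, 1), (4, -1), (4, -3)]]
```

Here `mean_point_distance(dictionary, written)` is 1.0 although the shapes agree, while both patterns give the direction codes `[[1, 1], [7, 7]]`. The same insensitivity is what lets characters with similar writing directions, such as 「村」 and 「杖」, collide.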
In recognition methods that use stroke features such as basic strokes, the distance cannot be calculated unless the dictionary and the input pattern have the same number of strokes. To handle cursive characters, the cursive patterns must therefore be registered in advance, or the parts of each character that tend to be connected must be described in its dictionary pattern. In Conventional Example 2, for instance, if the decomposition dictionary holds no rule for decomposing the connected strokes of the input pattern, the strokes cannot be decomposed and a recognition error results. Moreover, covering the various connected strokes of every character requires an enormous dictionary capacity.

The present invention was conceived to solve the above problems. Its object is to provide an on-line character recognition device that prevents the drop in recognition rate caused by unexpected components in the input character pattern and that improves the recognition rate for cursive characters and the like.

Means for Solving the Problems

To achieve the above object, the on-line character recognition device of the first invention is an on-line character recognition device that receives coordinate-point string data of a character pattern and outputs the character code corresponding to the input character pattern, comprising:

input means for receiving the coordinate-point string data on the strokes traced when the input character pattern is written;
feature extraction means for taking as segments the straight-line portions obtained by approximating with a polyline the coordinate points, arranged in time-series order, contained in the coordinate-point string data received by the input means, and for extracting feature data on each segment and the feature points, which are the end points of the segments;

dictionary storage means for storing in advance a dictionary that holds, for each character, the feature data and feature points of the segments constituting the character;

feature-point correspondence means for establishing correspondences between the segments constituting each character in the dictionary and the segments obtained from the input character pattern, on the basis of the feature data of each character described in the dictionary and the feature data extracted by the feature extraction means, and for calculating a segment correspondence distance;

designated-section feature extraction means for extracting, as corresponding-stroke features, the feature data of the section of the input character pattern's strokes delimited by the feature-point group that corresponds to a feature-point group on a stroke designated by the dictionary;

feature matching means for matching the corresponding-stroke features extracted by the designated-section feature extraction means against the feature data in the dictionary and calculating a corresponding-stroke feature distance; and

output means for outputting the candidate character codes obtained by the feature matching unit 5,

wherein the feature matching means identifies the character in the dictionary that corresponds to the input character pattern on the basis of the calculated corresponding-stroke feature distance and the segment correspondence distance calculated by the feature-point correspondence means.
The on-line character recognition device of the second invention is that of the first invention, wherein the feature extraction means combines adjacent segments into a single segment when their directions are similar.
The on-line character recognition device of the third invention is that of the first invention, wherein the feature-point correspondence means maps the start point and the end point of each stroke of a character in the dictionary each to one coordinate point of the input character pattern, and the designated-section feature extraction means takes, as a feature-point group, the feature point of the input character pattern corresponding to the start point and the feature point of the input character pattern corresponding to the end point.

The on-line character recognition device of the fourth invention is that of the first invention, wherein the feature matching means weights the corresponding-stroke feature distance and the segment correspondence distance when identifying the character in the dictionary.
The on-line character recognition device of the fifth invention is that of the first invention, wherein the data on the segments include the direction and length of each segment, and the feature-point correspondence means calculates the cost of each pair of corresponding segments from the directions and lengths calculated by the feature extraction means and then calculates the segment correspondence distance from those costs.

The on-line character recognition device of the sixth invention is that of the fifth invention, wherein the feature-point correspondence means does not allow a segment corresponding to a part of a dictionary character that is drawn as a stroke to correspond to a segment corresponding to an undrawn part (that is, a virtual stroke) of the strokes of the input character pattern.

The on-line character recognition device of the seventh invention is that of the fifth invention, wherein the dictionary storage means stores direction-independent data attached to the feature data of designated segments, and the feature-point correspondence means treats the cost calculated for a segment carrying direction-independent data as a constant.

The on-line character recognition device of the eighth invention is that of the seventh invention, wherein direction-independent data are attached to the feature data of those segments, among the segments constituting the characters in the dictionary, whose direction varies with the input character pattern.
The on-line character recognition device of the ninth invention is that of the seventh invention, wherein the feature-point correspondence means sets to 0 the cost calculated for a segment carrying direction-independent data.

The on-line character recognition device of the tenth invention is that of the seventh invention, wherein an upper limit is set on the number of segments obtained from the input character pattern that can correspond to a segment carrying direction-independent data.

Embodiments of the Invention

Preferred embodiments of the present invention are described below with reference to the drawings.

Embodiment 1

Fig. 1 is a block diagram of Embodiment 1 of the on-line character recognition device according to the present invention. The on-line character recognition device of this embodiment consists of an input unit 1, a feature extraction unit 2, a feature-point correspondence unit 3, a designated-section feature extraction unit 4, a feature matching unit 5, a dictionary storage unit 6 and an output unit 7. The input unit 1 serves as the input means and receives the coordinate-point string data on the strokes traced when the user writes character data (the input character pattern) with a pen on a tablet or the like. The feature extraction unit 2 serves as the feature extraction means: it takes as segments the straight-line portions obtained by approximating with a polyline the time-ordered coordinate points contained in the coordinate-point string data received by the input unit 1, and extracts feature data on each segment and the feature points, which are the end points of the segments.
The feature-point correspondence unit 3 serves as the feature-point correspondence means: it establishes correspondences between the segments constituting each character in the dictionary and the segments obtained from the input character pattern, on the basis of the feature data of each character described in the dictionary and the feature data extracted by the feature extraction unit 2, and calculates the segment correspondence distance. The designated-section feature extraction unit 4 serves as the designated-section feature extraction means: it extracts, as corresponding-stroke features, the feature data of the section delimited by the feature-point group on the strokes of the input character pattern that corresponds to a feature-point group on a stroke designated by the dictionary. The feature matching unit 5 serves as the feature matching means: it matches the corresponding-stroke features extracted by the designated-section feature extraction unit 4 against the feature data in the dictionary and calculates the corresponding-stroke feature distance. The dictionary storage unit 6 serves as the dictionary storage means and stores the above dictionary in advance; in this embodiment, the dictionary holds, for each character, the feature data and feature points of the segments constituting the character. The output unit 7 serves as the output means and outputs the candidate character codes obtained by the feature matching unit 5.

Fig. 2 shows, in table form, the dictionary data for the character 「家」, and Fig. 3 shows the dictionary data for the character 「琢」.
The dictionary stored in the dictionary storage unit 6 contains the character code; the direction code and length of each segment, as the segment feature data; and the width and height of the bounding rectangle of each stroke. Segment direction codes and lengths are held for virtual strokes as well as for strokes. A stroke is the unit of coordinate string from pen-down to pen-up; here it is called a real stroke, and the connection from the end point of one stroke (pen-up position) to the start point of the next stroke (pen-down position) is called a virtual stroke. A real stroke may hold multiple segments, whereas each virtual stroke is treated as a single segment. In Figs. 2 and 3, the start-to-end direction of each stroke that holds multiple segments is shown in parentheses. Although not shown in Figs. 2 and 3, each segment also carries a stroke identification code indicating whether it belongs to a real stroke or a virtual stroke.

Fig. 4 is a flowchart of the character recognition processing in this embodiment. Fig. 5 is a flowchart of the processing of the feature extraction unit 2. Fig. 6(a) shows the 16-direction code, and Fig. 6(b) shows the table of values used in DP matching.
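The dictionary layout described above can be sketched as a data structure. This is an illustrative assumption, not the patent's storage format: the class and field names are hypothetical, the stroke identification code is modeled as a boolean, and the usage values are toy numbers rather than the 「家」 or 「琢」 entries of Figs. 2 and 3.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    direction: int   # 16-direction code (Fig. 6(a))
    length: float    # in multiples of the reference width
    virtual: bool    # stroke identification code: True for a virtual (pen-up) stroke

@dataclass
class Stroke:
    segments: list     # one or more segments of a real stroke, in writing order
    box_width: float   # width of the stroke's bounding rectangle
    box_height: float  # height of the stroke's bounding rectangle

@dataclass
class DictEntry:
    char_code: str
    strokes: list            # real strokes in writing order
    virtual_segments: list   # exactly one segment between consecutive real strokes

    def __post_init__(self):
        if len(self.virtual_segments) != max(len(self.strokes) - 1, 0):
            raise ValueError("need one virtual segment between each pair of strokes")

    def segment_sequence(self):
        """Time-ordered list of real and virtual segments, as used for DP matching."""
        seq = []
        for k, stroke in enumerate(self.strokes):
            if k > 0:
                seq.append(self.virtual_segments[k - 1])
            seq.extend(stroke.segments)
        return seq
```

For a toy two-stroke entry whose second stroke has two segments, `segment_sequence()` returns four segments with the single virtual segment in second position, mirroring how a virtual stroke counts as one segment between real strokes.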
In this embodiment, the correspondence between the segments constituting each dictionary character and the segments obtained from the input character pattern is established by DP (dynamic programming) matching. The flow of the recognition process is described below following the flowchart of Fig. 4. First, input unit 1 acquires, in time-series order, the coordinate string of the characters that the user writes with a pen on a tablet or similar device (step 100). Next, feature extraction unit 2 performs preprocessing and feature extraction (step 101); this processing is detailed in the flowchart of Fig. 5. Feature extraction unit 2 compares the distance between consecutive coordinate points of the input coordinate string with a reference width and thins out points whose distance does not exceed the reference width (step 201). In this embodiment, each straight-line portion obtained by approximating the thinned, time-ordered coordinate points is called a segment. Next, the direction code of each segment is extracted (step 202); segment direction codes are extracted for imaginary strokes as well as for real strokes. The direction code string is extracted using the 16-direction code shown in Fig. 6(a). Then, the segment combining process is performed (step 203).
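The thinning and direction coding of steps 201 to 204 might look like the following sketch (the ±1 combining rule of step 203 and the length measure of step 204 are described below). It assumes points are (x, y) tuples, that code 1 points along +x with codes increasing counterclockwise (the actual layout of Fig. 6(a) is not reproduced in the text), that each pair of consecutive thinned points starts out as one segment, and that a combined segment's code is recomputed from its end points. All function names are hypothetical.

```python
import math

NUM_DIRECTIONS = 16  # 16-direction code as in Fig. 6(a)


def thin_points(points, ref_width):
    """Step 201: drop points lying within ref_width of the last kept point."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        if math.dist(p, kept[-1]) > ref_width:
            kept.append(p)
    return kept


def direction_code(p, q):
    """Quantize direction p->q to a code 1..16 (hypothetical orientation)."""
    angle = math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)
    return round(angle / (2 * math.pi / NUM_DIRECTIONS)) % NUM_DIRECTIONS + 1


def direction_difference(c1, c2):
    """Circular difference between two codes; at most 8 for 16 directions."""
    d = abs(c1 - c2) % NUM_DIRECTIONS
    return min(d, NUM_DIRECTIONS - d)


def real_stroke_segments(points, ref_width):
    """Steps 201-204 for one real stroke; returns [(code, length), ...]."""
    pts = thin_points(points, ref_width)
    segs = [[pts[k], pts[k + 1]] for k in range(len(pts) - 1)]
    merged = []
    for seg in segs:
        # step 203: combine with the previous segment if codes differ by <= 1
        if merged and direction_difference(direction_code(*merged[-1]),
                                           direction_code(*seg)) <= 1:
            merged[-1][1] = seg[1]
            continue
        merged.append(seg)
    # step 204: length in multiples of the reference width
    return [(direction_code(s, e), max(1, round(math.dist(s, e) / ref_width)))
            for s, e in merged]


def imaginary_segment(prev_end, next_start, ref_width):
    """A pen-up move between strokes: always one segment, never combined."""
    return (direction_code(prev_end, next_start),
            max(1, round(math.dist(prev_end, next_start) / ref_width)))


print(real_stroke_segments([(0, 0), (2, 0), (4, 0), (4, 2), (4, 4)], 1))  # [(1, 4), (5, 4)]
```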
In this process, when adjacent segments have similar directions, specifically when their direction codes differ by ±1, the segments are combined and the direction code of the combined segment is calculated. For example, when a segment with direction code 8 is followed by a segment with direction code 9, the two are combined into a single segment with direction code 8. The combining process is not applied to imaginary strokes. Next, the length of each combined segment is calculated (step 204); segment lengths are expressed as multiples of the reference width used in the thinning process. Thus, in this embodiment, the direction code and the length of each segment are extracted as feature data. Fig. 7 shows the input pattern after this feature extraction, and Fig. 8 lists the extracted features in table form.

Returning to Fig. 4, feature point correspondence unit 3 takes the data of one character out of the dictionary (step 102); in this example, the dictionary entry for "家" shown in Fig. 2. Feature point correspondence unit 3 then establishes the segment correspondence between the input pattern and the dictionary entry "家" by DP matching (step 103), as follows. Let the segments of the input pattern be $S_i = \{s_i(1), s_i(2), \ldots, s_i(i), \ldots, s_i(I)\}$ and the segments of the dictionary be $S_d = \{s_d(1), s_d(2), \ldots, s_d(j), \ldots, s_d(J)\}$. The following recurrence is then evaluated (Mathematical Formula 1):

$$d[i+1][j+1] = \min\begin{cases} d[i][j] + 2\,D[s_i(i+1)][s_d(j+1)] \\ d[i][j+1] + D[s_i(i+1)][s_d(j+1)] \\ d[i+1][j] + D[s_i(i+1)][s_d(j+1)] \end{cases}$$

where the function min returns the smallest of its arguments. The local cost is defined as follows (Mathematical Formula 2):

$$D[s_i(i+1)][s_d(j+1)] = a[s_i(i+1), s_d(j+1)] \cdot \left(|s_i(i+1)| + |s_d(j+1)|\right)$$

In Mathematical Formula 1, $d[i+1][j+1]$ is the cumulative correspondence cost from the starting point up to $s_i(i+1)$ and $s_d(j+1)$. In Mathematical Formula 2, $D[s_i(i+1)][s_d(j+1)]$ is the correspondence cost between segment $s_i(i+1)$ and segment $s_d(j+1)$; $a[s_i(i+1), s_d(j+1)]$ is determined by the direction difference between the two segments and is taken from the table of Fig. 6(b); and $|s_i(i+1)|$ and $|s_d(j+1)|$ are the lengths of the two segments. Although not shown in the figures, a path table recording which term attained the minimum is also maintained. Mathematical Formula 1 is evaluated recursively, and finally the following is calculated (Mathematical Formula 3):

$$\mathrm{dist\_dp} = \frac{d[I][J]}{I+J}$$

The value of Mathematical Formula 3 is taken as the DP matching cost against the dictionary entry, that is, the segment correspondence distance. This DP matching follows the method described in "Pattern Recognition" (Noboru Funakubo, Kyoritsu Shuppan).

A cursive character is written with fewer strokes than the same character written with the correct number of strokes. When the strokes and segments of a cursive input pattern are matched against a dictionary entry with the correct stroke count, several components of the dictionary may therefore correspond to a single component of the input pattern.
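The recurrence of Mathematical Formulas 1 to 3 can be sketched in Python as below. Segments are assumed to be (code, length, is_imaginary) tuples. Fig. 6(b) is not reproduced in the text, so the weight a = 5 × (circular direction difference) is a hypothetical stand-in chosen only because it matches the two values quoted in this document (difference 0 gives 0, difference 4 gives 20). The boundary conditions, the value of `PENALTY`, and the back-pointer traceback standing in for the path table of step 104 are likewise assumptions; the penalty implements the rule, described next, that an imaginary input segment must not match a real dictionary segment.

```python
import math

NUM_DIRECTIONS = 16
PENALTY = 10 ** 6  # hypothetical "large penalty distance"; no value is given in the text


def direction_difference(c1, c2):
    """Circular difference between two 16-direction codes (0..8)."""
    d = abs(c1 - c2) % NUM_DIRECTIONS
    return min(d, NUM_DIRECTIONS - d)


def local_cost(s_in, s_dic):
    """Mathematical Formula 2 for (code, length, is_imaginary) segments."""
    if s_in[2] and not s_dic[2]:
        return PENALTY  # input pen-up move vs. real dictionary segment
    a = 5 * direction_difference(s_in[0], s_dic[0])  # stand-in for Fig. 6(b)
    return a * (s_in[1] + s_dic[1])


def dp_match(inp, dic):
    """Mathematical Formulas 1 and 3; returns (dist_dp, d[I][J], path).

    path lists matched (input index, dictionary index) pairs, 0-based.
    d[0][0] = 0 and infinity elsewhere on the border are assumptions.
    """
    I, J = len(inp), len(dic)
    d = [[math.inf] * (J + 1) for _ in range(I + 1)]
    back = [[None] * (J + 1) for _ in range(I + 1)]
    d[0][0] = 0.0
    for i in range(I):
        for j in range(J):
            c = local_cost(inp[i], dic[j])
            candidates = [
                (d[i][j] + 2 * c, (i, j)),      # diagonal step, weighted twice
                (d[i][j + 1] + c, (i, j + 1)),  # next input segment, same dictionary segment
                (d[i + 1][j] + c, (i + 1, j)),  # next dictionary segment, same input segment
            ]
            d[i + 1][j + 1], back[i + 1][j + 1] = min(candidates, key=lambda t: t[0])
    path, cell = [], (I, J)
    while cell != (0, 0):  # follow back-pointers to recover the correspondence
        i, j = cell
        path.append((i - 1, j - 1))
        cell = back[i][j]
    path.reverse()
    return d[I][J] / (I + J), d[I][J], path
```

For the second stroke of "木" discussed in Embodiment 2 (input {9, 13} with lengths 7 and 1, dictionary {9} with length 7), `dp_match` yields d[I][J] = 160, the value quoted there; the hook segment is absorbed by the step that advances only the input.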
In general, however, a real stroke or real segment of a dictionary entry never appears as an imaginary stroke of the input pattern. The segments corresponding to the strokes represented in the character portion of the dictionary must therefore not correspond to segments of the portions that are not represented among the strokes of the input character pattern (its imaginary strokes). For this reason, in the DP matching of segments, a large penalty distance is assigned to $D[s_i(i)][s_d(j)]$ whenever an imaginary stroke of the input pattern is matched against a real-stroke component of the dictionary, so that such segments do not correspond and correspondences that cannot occur in practice are prevented. As a result, misrecognition can be prevented more reliably even for cursive characters. Other methods may of course be used to keep such segments from corresponding.

Computing the correspondence between the features of the "家" dictionary entry in Fig. 2 and the input pattern features in Fig. 8 with Mathematical Formulas 1 to 3 and Fig. 6 gives dist_dp = 682. Next, in Fig. 4, feature point correspondence unit 3 uses the path table (not shown) obtained in step 103 to find the coordinate points of the input pattern corresponding to the start and end points of each dictionary stroke (step 104). Fig. 9 shows the input pattern coordinates corresponding to the start and end points of "家" in the dictionary. Next, designated section feature extraction unit 4 extracts features between the corresponding points of the input pattern (step 105). Here, the start and end points of each stroke of "家" shown in Fig. 9 are taken as a feature point group, and the feature point of the input character pattern corresponding to the start point (the input start point) together with the feature point corresponding to the end point (the input end point) forms the feature point group of the input character pattern. Then, from the coordinate point string bounded by each feature point group of the input character pattern, the width and height of its bounding rectangle and the direction from the start point to the end point are calculated. In addition, the direction and distance of the imaginary stroke between adjacent feature point groups are calculated, that is, of the vector from the point corresponding to the end point of one stroke to the point corresponding to the start point of the next stroke.
These features are called corresponding stroke features; the results are shown in Fig. 10. Next, feature comparison unit 5 compares the corresponding stroke features of the dictionary with those of the input pattern (step 106). The comparison is computed, for example, as (difference in bounding rectangle width) + (difference in bounding rectangle height) + (difference in direction from the start point to the end point) + (difference in imaginary stroke direction) + (difference in imaginary stroke length). When the input pattern has no imaginary stroke corresponding to one in the dictionary, that part is not computed. Performing this calculation between the features of Fig. 2 and Fig. 10 gives a corresponding stroke feature distance of dist_st = 93 against the dictionary entry "家". Next, feature comparison unit 5 determines whether all dictionary entries have been compared (step 107). If other characters remain in the dictionary, the process returns to step 102 and the next character is compared.
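A sketch of steps 105 and 106 under stated assumptions: input points are (x, y) tuples; `pairs` gives, for each dictionary stroke, the indices of the input points matched to its start and end (as would be read off the DP path); directions are compared as 16-direction codes using a hypothetical quantization (code 1 along +x, counterclockwise); and the five differences are summed without scaling. The text gives only the form of the sum, so the units and weighting here are assumptions, and the helper names are hypothetical.

```python
import math

NUM_DIRECTIONS = 16


def direction_code(p, q):
    """Hypothetical 16-direction quantization: code 1 along +x, counterclockwise."""
    angle = math.atan2(q[1] - p[1], q[0] - p[0]) % (2 * math.pi)
    return round(angle / (2 * math.pi / NUM_DIRECTIONS)) % NUM_DIRECTIONS + 1


def direction_difference(c1, c2):
    d = abs(c1 - c2) % NUM_DIRECTIONS
    return min(d, NUM_DIRECTIONS - d)


def corresponding_stroke_features(points, pairs):
    """Step 105: features of each input section delimited by a feature point group."""
    feats = []
    for k, (s, e) in enumerate(pairs):
        xs = [p[0] for p in points[s:e + 1]]
        ys = [p[1] for p in points[s:e + 1]]
        f = {
            "width": max(xs) - min(xs),   # bounding rectangle width
            "height": max(ys) - min(ys),  # bounding rectangle height
            "direction": direction_code(points[s], points[e]),
        }
        if k + 1 < len(pairs):
            nxt = points[pairs[k + 1][0]]  # start point of the next corresponding stroke
            f["imag_direction"] = direction_code(points[e], nxt)
            f["imag_length"] = math.dist(points[e], nxt)
        feats.append(f)
    return feats


def stroke_feature_distance(inp_feats, dic_feats):
    """Step 106: unweighted sum of the five differences listed above."""
    total = 0.0
    for fi, fd in zip(inp_feats, dic_feats):
        total += abs(fi["width"] - fd["width"]) + abs(fi["height"] - fd["height"])
        total += direction_difference(fi["direction"], fd["direction"])
        # imaginary-stroke terms are skipped when either side lacks them
        if "imag_direction" in fi and "imag_direction" in fd:
            total += direction_difference(fi["imag_direction"], fd["imag_direction"])
            total += abs(fi["imag_length"] - fd["imag_length"])
    return total
```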
In this example another character remains, so the input is compared with "琢" in Fig. 3. Feature point correspondence unit 3 performs the same processing as above in steps 102 and 103 and obtains a DP matching cost of dist_dp = 674 for "琢". Likewise, feature point correspondence unit 3 executes step 104 to find the coordinate points of the input pattern corresponding to the start and end points of the strokes of "琢"; the result is shown in Fig. 11. Next, designated section feature extraction unit 4 executes step 105 and extracts the corresponding stroke features in the same way as for "家"; the result is shown in Fig. 12. Feature comparison unit 5 then calculates the distance from the corresponding stroke features in Fig. 12 and the dictionary of Fig. 3, obtaining dist_st = 223. This flow continues until no unreferenced characters remain in the dictionary; output unit 7 then sorts the recognition results (step 108). For the sorting, the following value is computed for each dictionary entry (Mathematical Formula 4):

$$\mathrm{dist\_all} = \alpha \cdot \mathrm{dist\_dp} + \beta \cdot \mathrm{dist\_st}$$

where $\alpha$ and $\beta$ are weighting constants. With $\alpha = 1$ and $\beta = 1$, dist_all("家") = 682 + 93 = 775 and dist_all("琢") = 674 + 223 = 897. Sorting dist_all in ascending order makes "家" the first candidate character and "琢" the second. Finally, output unit 7 outputs the candidate characters "家" and "琢", and the process ends (step 109). After the above processing, the final recognition result is "家".
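Mathematical Formula 4 and the ascending sort of step 108 reduce to a few lines. The sketch below (function name hypothetical) reproduces the worked numbers above, and setting β = 0 shows that DP matching alone would rank "琢" first.

```python
def rank_candidates(scores, alpha=1.0, beta=1.0):
    """Mathematical Formula 4 plus the ascending sort of step 108.

    scores maps each dictionary character to (dist_dp, dist_st); returns the
    characters sorted by dist_all (ascending) and the dist_all values.
    """
    dist_all = {ch: alpha * dp + beta * st for ch, (dp, st) in scores.items()}
    return sorted(dist_all, key=dist_all.get), dist_all


scores = {"家": (682, 93), "琢": (674, 223)}
order, totals = rank_candidates(scores)
print(order, totals)  # ['家', '琢'] {'家': 775.0, '琢': 897.0}
```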
With the DP matching result alone, "琢" would be the first candidate (674 < 682), but computing the candidates with the corresponding stroke features as well yields the correct answer. As described above, by combining DP matching with corresponding stroke features in the recognition process, this embodiment can recognize patterns of cursive characters with distorted shapes without holding dictionary data specific to cursive forms. In Embodiment 1, the input character pattern is compared with every character in the dictionary; alternatively, a coarse classification may first be performed using a small number of features, and the DP matching and corresponding stroke features then computed only for the result of the coarse classification. In the above example, $\alpha = 1$ and $\beta = 1$ weight the DP matching and the corresponding stroke features equally, but the values are not limited to these. Likewise, although the final distance (Mathematical Formula 4) is defined as the weighted sum of the DP matching result and the corresponding stroke feature result, the weighting need not use simple coefficients; other calculation methods or conditions may be added so that the correct answer is obtained. For example, after sorting by the corresponding stroke features, if the distance of the first candidate exceeds a certain value, the candidates may be re-sorted using only the DP matching result and the top-ranked character taken as the candidate.
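The alternative rule mentioned above admits more than one reading; one possible sketch ranks by dist_st and falls back to dist_dp alone when the best dist_st exceeds a threshold. The threshold values and the function name are hypothetical.

```python
def rank_with_fallback(scores, threshold):
    """scores maps each character to (dist_dp, dist_st).

    Rank by dist_st; if the best dist_st exceeds threshold, re-rank by
    dist_dp alone (one reading of the variant described in the text).
    """
    by_st = sorted(scores, key=lambda ch: scores[ch][1])
    if scores[by_st[0]][1] > threshold:
        return sorted(scores, key=lambda ch: scores[ch][0])
    return by_st


scores = {"家": (682, 93), "琢": (674, 223)}
print(rank_with_fallback(scores, threshold=100))  # ['家', '琢']
print(rank_with_fallback(scores, threshold=50))   # ['琢', '家']
```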
The segment correspondence uses DP matching, but it is not limited to DP matching; other methods, such as relaxation, may be used. Likewise, the corresponding stroke features have been described using the width, height, and start-to-end direction of the corresponding portion and the length and direction of the imaginary stroke, but other features, such as basic strokes, may be extracted instead.

Embodiment 2

Next, with reference to Fig. 6 (used in Embodiment 1) and Figs. 13 to 16, a method of suppressing the increase in the DP matching cost of Embodiment 1 caused by variations in the character pattern is described. Fig. 13 shows examples of characters containing portions that vary greatly, Fig. 14 shows an example of the contents of the segment dictionary for the character "木", Fig. 15 shows the segment features of the character "木" in Fig. 13(d), and Fig. 16 shows the segment dictionary for "木" using a direction-independent code. When DP matching is performed with segment direction codes and lengths as in Embodiment 1, the direction code of a segment or imaginary stroke can vary greatly, for example when a writer adds a hook to a stroke of the input character, or when the end of one stroke lies close to the start of the next stroke. In the characters of Figs. 13(a) to 13(c), the direction of the imaginary stroke in the circled portion differs between patterns by 8 on the 16-direction code, which increases the DP matching cost. Also, as shown in Fig. 13(d), when the input pattern has a hook but the dictionary has none, the DP matching cost becomes large.
When a character contains several such strokes, it may be misread as another character.

This embodiment is characterized in that, to prevent this, the direction difference is not computed for portions whose direction varies greatly from writer to writer or from pattern to pattern; instead, segments are provided whose cost is calculated from the stroke length data alone, which avoids the problem.

For example, the direction code string and segment lengths extracted from the pattern of Fig. 13(d) are shown in Fig. 15, and those of the dictionary entry "木" in Fig. 14. In the DP matching of segment features, if imaginary strokes of the input pattern are prevented from corresponding to real strokes of the dictionary, then, because Figs. 14 and 15 both have 4 strokes, the strokes correspond one-to-one in stroke order. That is, the direction codes {9, 13} of the second input stroke in Fig. 15 correspond to the direction code {9} of the second dictionary stroke. Computing the correspondence cost between the hook segment of the second input stroke and the dictionary with Mathematical Formula 2 and Fig. 6(b) gives cost = (value for the direction difference) × (sum of segment lengths) = 20 × (7 + 1) = 160.

Fig. 16 shows an example of a dictionary, characteristic of this embodiment, that handles such variation in direction codes. In this embodiment, direction-independent data is attached to the feature data of prescribed segments, namely those whose direction code is considered liable to vary greatly with the input character pattern, as described above (individual differences, hooks, and the like). A direction-independent code number is used as the direction-independent data. Relative to the 16-direction code of Fig. 6(a), the direction-independent code number is taken to be 17, and it is held in the second stroke of Fig. 16. For a segment with direction code 17, the segment cost is defined as the DP matching cost with the direction difference set to 0. The cost between the second stroke of Fig. 16 and the second stroke of Fig. 15 is computed by DP matching as in the case of Fig. 14, using Mathematical Formula 5. The second strokes correspond as in the case of Fig. 14: for the dictionary {9, 17} and the input pattern {9, 13}, {9} corresponds to {9} and {17} to {13}. Under Mathematical Formula 5, the cost of {17} and {13} is 0 × (1 + 1) = 0 and the cost of {9} and {9} is 0 × (7 + 7) = 0, so the cost for this stroke is 0. Thus, by using the direction-independent code, the DP matching cost becomes smaller than the cost of 160 obtained with the dictionary of Fig. 14; as a result, the distance to the dictionary becomes smaller and recognition errors can be prevented.

However, DP matching with a direction-independent code may not always produce the expected correspondence. This is explained with Fig. 17. Suppose the polyline of Fig. 17(a) is DP-matched against the dictionary of Fig. 17(b). Fig. 17(a) has five segments 11 to 15, and Fig. 17(b) has two segments 16 and 17. Using Fig. 6(a), the direction codes are 9 for segment 11, 5 for segment 12, 9 for segment 13, 5 for segment 14, and 9 for segment 15; segment 16 has code 9, and segment 17 has the direction-independent code 17. All segment lengths are 1. The correspondence between Figs. 17(a) and 17(b) is computed with Mathematical Formulas 1 and 2. First, segment 11 corresponds to segment 16 with cost 0 × (1 + 1) = 0. Next, in Mathematical Formula 1, segments 12 and 16 differ in direction by 4, so by Fig. 6(b) and Mathematical Formula 2 their cost is 20 × (1 + 1) = 40, whereas segments 11 and 17 cost 0 because segment 17 carries the direction-independent code, and segments 12 and 17 likewise cost 0; the lowest-cost correspondence is therefore segment 11 or segment 12 with segment 17. Calculating in the same way, the remaining segments 13, 14, and 15 all correspond to segment 17 at cost 0, so the cost of the correspondence obtained with Mathematical Formula 3 is 0. This result means that Figs. 17(a) and 17(b) are judged identical, which differs from the intended correspondence. Thus, when a dictionary entry contains a direction-independent code, characters of completely dissimilar shape may be recognized as matching. To solve this, this embodiment sets an upper limit on the number of segments of the input character pattern that may correspond to a segment carrying the direction-independent code.

For example, let the upper limit for correspondence with the direction-independent code be M. For Figs. 17(a) and 17(b), segments 11 to 15 then correspond to segment 16 and segment 15 corresponds to segment 17, and the cost becomes (segments 11 and 16) 0 + (segments 12 and 16) 40 + (segments 13 and 16) 0 + (segments 14 and 16) 40 + (segments 15 and 16) 0 + (segments 15 and 17) 0 = 80, which, unlike the previous cost of 0, is the expected value.
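The two rules of Embodiment 2 can be sketched together as below: a dictionary segment carrying code 17 is scored with direction difference 0, and an optional upper limit M caps how many input segments may match such a segment. To enforce the limit, the sketch tracks, in each DP cell, how many input segments have been matched to the current dictionary segment. Segments are (code, length) pairs, and the weight a = 5 × direction difference is again a hypothetical stand-in for Fig. 6(b). The text does not state the value of M; with M = 1 this sketch reproduces the costs worked out above (0 without the limit, 80 with it), as well as 160 versus 0 for the second stroke of "木" against the dictionaries of Figs. 14 and 16.

```python
import math

NUM_DIRECTIONS = 16
DIRECTION_FREE = 17  # direction-independent code number


def direction_difference(c1, c2):
    d = abs(c1 - c2) % NUM_DIRECTIONS
    return min(d, NUM_DIRECTIONS - d)


def local_cost(s_in, s_dic):
    """Formula 2 with the Embodiment 2 rule; segments are (code, length)."""
    diff = 0 if s_dic[0] == DIRECTION_FREE else direction_difference(s_in[0], s_dic[0])
    return 5 * diff * (s_in[1] + s_dic[1])  # a = 5 * diff is a stand-in for Fig. 6(b)


def dp_match_limited(inp, dic, max_free=None):
    """Formula 1 with an optional upper limit M (max_free); returns d[I][J].

    d[i][j][k]: best cost with k input segments matched so far to dic[j - 1].
    For a DIRECTION_FREE dictionary segment, k may not exceed max_free.
    """
    I, J = len(inp), len(dic)
    d = [[[math.inf] * (I + 1) for _ in range(J + 1)] for _ in range(I + 1)]
    d[0][0][0] = 0.0
    for i in range(I):
        for j in range(J):
            c = local_cost(inp[i], dic[j])
            limited = dic[j][0] == DIRECTION_FREE and max_free is not None
            limit = max_free if limited else I
            cell = d[i + 1][j + 1]
            # diagonal or "next dictionary segment" step: first input segment for dic[j]
            cell[1] = min(min(d[i][j]) + 2 * c, min(d[i + 1][j]) + c)
            # "next input segment" step: one more input segment for the same dic[j]
            for k in range(1, limit):
                cell[k + 1] = min(cell[k + 1], d[i][j + 1][k] + c)
    return min(d[I][J])


fig17_input = [(9, 1), (5, 1), (9, 1), (5, 1), (9, 1)]  # segments 11 to 15
fig17_dict = [(9, 1), (DIRECTION_FREE, 1)]              # segments 16 and 17
print(dp_match_limited(fig17_input, fig17_dict))              # 0.0
print(dp_match_limited(fig17_input, fig17_dict, max_free=1))  # 80.0
```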
In the above embodiment, the direction difference for a segment with direction code 17 is set to 0 so that its computed cost becomes 0 and segments such as hooks are ignored; alternatively, the cost may be set to a fixed value other than 0, or the direction difference may be set so that the cost does not become 0. Also, in the above example, Mathematical Formulas 1 to 3 are used to compute the corresponding points in DP matching, but the invention is not limited to them, and other formulas may be used.

[Effects of the Invention]
According to the present invention, for cursive characters that are easily misrecognized, or for characters with similar stroke directions, character recognition is performed using both segment correspondence and corresponding stroke features, so characters can be verified in more detail and recognized with high accuracy. Moreover, for the same stroke order, a single dictionary entry can recognize a character written cursively at any part, so there is no need to provide a separate dictionary entry for each cursive variant. Consequently, even scribbled cursive character patterns can be recognized without dictionary data specific to cursive forms, with further benefits such as reducing the labor of building the dictionary and the dictionary size.

In addition, since the weighting of segment correspondence and corresponding stroke features can be set when the final character is determined by calculation, more accurate character recognition can be provided.

In addition, since segments corresponding to the strokes represented in the character portion of the dictionary are prevented from corresponding to segments of the portions not represented among the strokes of the input character pattern, recognition errors can be prevented more reliably even for cursive characters.

In addition, since a dictionary with direction-independent data is provided, the segment correspondence distance does not change greatly because of partial variations of a character such as hooks or press-downs, so character recognition can be performed with higher accuracy.

In addition, since an upper limit is set on the number of segments of the input character pattern that may correspond to a segment carrying the direction-independent code, character recognition can be performed with higher accuracy.
[Schematic description] 23 (Please read the notes on the back before filling out this page) The size of the paper used in this edition is applicable to China National Standard (CNS) 8-4 specifications (210X297 mm). V. Description of the Invention (21) FIG. 1 is a block diagram showing Embodiment 1 of the online character recognition device of the present invention. FIG. 2 is a diagram showing an example of the content of data stored in the character "home" in the dictionary used in the first embodiment. -Fig. 3 is a diagram showing an example of the content of the data "word" stored in the dictionary used in Example 1. Fig. 4 is a flowchart showing a character recognition process in the first embodiment. Fig. 5 is a flowchart showing processing performed by a feature extraction unit in the character recognition processing of the first embodiment. Fig. 6 (a) is a diagram showing an example of a 16-direction code used in Embodiment 1, and (b) is a diagram showing a table in which a chirp used for DP comparison is set. Fig. 7 is a diagram showing a pattern after the feature extraction processing is performed on the input pattern in the first embodiment. Fig. 8 is a table showing the features extracted by performing feature extraction processing on the input pattern in the first embodiment. Fig. 9 is a diagram showing the coordinate points of the input pattern corresponding to the start and end points of the word "home" in the dictionary. Fig. 10 is a graph showing the characteristics of the handwriting line corresponding to the input pattern of the character "JJ" in the dictionary. Fig. 11 is a coordinate point of the input pattern corresponding to the starting point and the end point of the character "cut" in the dictionary. Figure. Fig. 12 is a diagram showing the characteristics of the handwriting line corresponding to the input pattern of the word "cut" in the dictionary. Figure 13 is an example of a part that shows a large change according to the text pattern. 
Fig. 14 is a diagram showing part of the dictionary data for the character "wood".
Fig. 15 is a diagram showing the features of the portion of the character "wood" shown in Fig. 13.
Fig. 16 is a diagram showing the segment correspondence for the dictionary character "wood" with direction-independent codes added.
Fig. 17 is a diagram showing the features of Embodiment 2 and direction-independent imaginary strokes.
Fig. 18 is a diagram explaining a conventional recognition method that uses direction codes.
Fig. 19 is a diagram showing an example of the 8-direction code.
Fig. 20 is a block diagram showing the structure of the online character recognition device of Conventional Example 2.
Fig. 21 is a diagram showing the correspondence table between direction code strings and basic stroke codes.
Fig. 22 is a diagram showing the input pattern of the character "stone" used to explain the operation of Conventional Example 2.
Fig. 23 is a diagram showing the cursive input pattern of the character "stone" used to explain the operation of Conventional Example 2.
Fig. 24 is a diagram showing the decomposition rule for cursive strokes.
Fig. 25 is a diagram showing the effect of positional deviation on the distance calculation of Conventional Example 1.
Fig. 26 is a diagram showing an example of an input pattern that includes hooks and press-downs and a dictionary that does not include them.
Fig. 27 is a diagram showing an example in which imaginary strokes are represented using cursive characters.
[Description of Reference Numerals]
1: input unit; 2: feature extraction unit; 3: feature point correspondence unit; 4: specified-section feature extraction unit; 5: feature matching unit; 6: dictionary storage unit; 7: output unit
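Figs. 6(a) and 19 refer to 16-direction and 8-direction codes for the segments of a pen trajectory. As an illustration only, the sketch below shows one common way to quantize segment vectors into such codes. The numbering convention (code 0 along +x, codes increasing counterclockwise, y-axis pointing up) is an assumption, since the figures themselves are not reproduced in this text, and the function names are hypothetical.

```python
import math

N_DIRS = 16  # assumed 16-direction code, after Fig. 6(a)


def direction_code(dx: float, dy: float, n_dirs: int = N_DIRS) -> int:
    """Quantize a segment vector into one of n_dirs direction codes.

    Assumed convention: code 0 points along +x and codes increase
    counterclockwise with the y-axis pointing up.
    """
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = 2 * math.pi / n_dirs
    # shift by half a sector so each code is centred on its nominal direction
    return int((angle + sector / 2) // sector) % n_dirs


def segment_codes(points, n_dirs: int = N_DIRS):
    """Direction code of each segment between consecutive (x, y) points."""
    return [
        direction_code(x1 - x0, y1 - y0, n_dirs)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
```

For example, `segment_codes([(0, 0), (10, 0), (10, 10), (0, 10)])` gives `[0, 4, 8]`, and passing `n_dirs=8` yields 8-direction codes instead. Tablet coordinates often have y pointing down; in that case `dy` would be negated before quantizing.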
Claims (1)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP04864897A JP3657077B2 (en) | 1997-03-04 | 1997-03-04 | Online character recognition device |
Publications (1)
Publication Number | Publication Date |
---|---|
TW399187B (en) | 2000-07-21 |
Family ID: 12809190
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW087100891A (TW399187B) | 1997-03-04 | 1998-01-22 | On-line character recognizing device |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP3657077B2 (en) |
KR (1) | KR100301216B1 (en) |
CN (1) | CN1096043C (en) |
TW (1) | TW399187B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4817297B2 (en) * | 2006-02-10 | 2011-11-16 | 富士通株式会社 | Character search device |
US20080049969A1 (en) * | 2006-08-25 | 2008-02-28 | Jason David Koziol | Methods And Systems For Generating A Symbol Identification Challenge For An Automated Agent |
CN109101973B (en) * | 2018-08-06 | 2019-12-10 | 掌阅科技股份有限公司 | Character recognition method, electronic device and storage medium |
-
1997
- 1997-03-04 JP JP04864897A patent/JP3657077B2/en not_active Expired - Fee Related
-
1998
- 1998-01-22 TW TW087100891A patent/TW399187B/en not_active IP Right Cessation
- 1998-02-25 KR KR1019980005891A patent/KR100301216B1/en not_active IP Right Cessation
- 1998-03-02 CN CN98105119A patent/CN1096043C/en not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
JPH10247221A (en) | 1998-09-14 |
KR100301216B1 (en) | 2001-11-30 |
JP3657077B2 (en) | 2005-06-08 |
CN1096043C (en) | 2002-12-11 |
KR19980079762A (en) | 1998-11-25 |
CN1201207A (en) | 1998-12-09 |
Legal Events
Code | Title |
---|---|
GD4A | Issue of patent certificate for granted invention patent |
MM4A | Annulment or lapse of patent due to non-payment of fees |