JPH0247788B2 - Google Patents
Info
- Publication number
- JPH0247788B2 (application number JP57111912A / JP11191282A)
- Authority
- JP
- Japan
- Prior art keywords
- kanji
- handwritten
- furigana
- character
- text
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/28—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
- G06V30/287—Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Discrimination (AREA)
Description
[Detailed Description of the Invention]
(1) Technical Field of the Invention
The present invention relates to a character recognition device for handwritten Japanese text, and in particular to a recognition processing method that improves the recognition accuracy of handwritten Japanese text by adding furigana (kana readings) only to the kanji portions.
(2) Background of the Technology
In general, a character recognition device first observes the pattern. It then finds characteristics such as whether the character shape is large or small and whether it has many or few strokes; that is, it extracts the features of the pattern. These features are compared with features obtained from past data stored in the device, and the most similar pattern is selected. In other words, the features are processed, and a decision identifying the given pattern is then made from the result. Character recognition is performed in this way, and recent character recognition devices have been developed not only for handwritten kana but also for handwritten kanji.
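The generic pipeline just described (observe the pattern, extract features, compare them with stored features, decide) can be sketched as a nearest-prototype classifier. This is a minimal illustration rather than the device's circuitry: the two features (relative size and stroke count), the prototype values, and the names `PROTOTYPES` and `rank_candidates` are assumptions made for the example.

```python
import math

# Hypothetical stored "past data": one reference feature vector per category.
# Assumed features: (relative size of the character, number of strokes).
PROTOTYPES = {
    "一": (0.9, 1.0),
    "二": (0.9, 2.0),
    "三": (0.9, 3.0),
    "口": (0.8, 3.0),
}


def rank_candidates(features, prototypes=PROTOTYPES, k=3):
    """Compare extracted features with the stored ones; return the k most similar categories."""
    scored = sorted(
        (math.dist(features, proto), label) for label, proto in prototypes.items()
    )
    return [label for _, label in scored[:k]]


# Decision step: the most similar stored pattern identifies the input.
observed = (0.88, 2.9)  # features extracted from an observed pattern
print(rank_candidates(observed)[0])  # 三
```

Returning a ranked list rather than a single label matters later in the embodiment, where several candidates per character are kept and narrowed down.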
(3) Prior Art and Problems
Kanji recognition has the problem that accuracy is hard to raise, because feature extraction is more difficult than in conventional recognition, which went only as far as katakana. Japanese text also poses the following problem.
Many characters, particularly katakana and kanji, are nearly identical in shape, and distinguishing them is very difficult. For example, the katakana エ in エア ("air") cannot be told apart from the kanji 工 in 工事 ("construction"), nor the kanji 力 in 力士 ("sumo wrestler") from the katakana カ in カサ ("umbrella"). A known countermeasure is for the recognition device to judge the context of the sentence in order to distinguish non-kanji from kanji, but this method is complex and cannot always make the distinction.
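Although the shapes coincide, the characters in each confusable pair are distinct in Unicode and belong to different scripts, so a recognizer that decides on shape alone must still settle which script a stroke pattern belongs to. The sketch below uses Python's standard `unicodedata` module to show this; the `script_of` helper and its block ranges (Hiragana U+3040 to U+309F, Katakana U+30A0 to U+30FF, CJK Unified Ideographs U+4E00 to U+9FFF) are a simplification assumed for illustration.

```python
import unicodedata


def script_of(ch):
    """Rough script label from the code point (covers only the main blocks)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


# Lookalike pairs from the text: katakana on the left, kanji on the right.
LOOKALIKES = [("エ", "工"), ("カ", "力")]

for kana, kanji in LOOKALIKES:
    for ch in (kana, kanji):
        print(f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)} ({script_of(ch)})")
```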
Another conceivable method is to add identification marks for non-kanji and kanji to the handwritten Japanese text itself, but writing such marks inside the character frames is troublesome and impractical.
(4) Object of the Invention
The object of the present invention is to eliminate the above drawbacks of the prior art and to provide a recognition processing method for handwritten Japanese text that improves kanji recognition accuracy and easily distinguishes non-kanji from kanji.
(5) Structure of the Invention
According to the present invention, this object is achieved by a recognition processing method for handwritten Japanese text in which the handwritten Japanese text and the handwritten furigana characters added, in correspondence with that text, only to its kanji are read separately; kanji are recognized by collating the recognition output of the handwritten Japanese text with the kana-kanji correspondence output obtained from the recognition output of the handwritten furigana; and portions with no corresponding furigana are recognized as non-kanji.
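The last part of this rule, that positions without furigana are non-kanji, can be shown in a few lines: every body-text position covered by a furigana annotation is a kanji position (to be resolved through the reading dictionary), and every uncovered position is non-kanji. The `(start, end, reading)` span representation, the sample sentence (assembled from the words named in the Fig. 3 description), and the function name `label_positions` are assumptions for this sketch; the patent describes circuits, not a data format.

```python
def label_positions(body_length, furigana_spans):
    """Label each body-text position as 'kanji' or 'non-kanji'.

    furigana_spans: list of (start, end, reading) with end exclusive; a position
    inside any span has furigana written over it and is treated as kanji.
    """
    labels = ["non-kanji"] * body_length
    for start, end, _reading in furigana_spans:
        for pos in range(start, end):
            labels[pos] = "kanji"
    return labels


# Assumed sample: furigana written only over 当社 and 工事業者.
body = "当社はエアコン工事業者"
spans = [(0, 2, "トウシャ"), (7, 11, "コウジギョウシャ")]
for ch, label in zip(body, label_positions(len(body), spans)):
    print(ch, label)
```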
(6) Embodiment of the Invention
An embodiment of the present invention is described in detail below with reference to the drawings. Fig. 1 is a block diagram of a configuration for implementing the handwritten Japanese text recognition processing method according to the present invention.
In Fig. 1, 1 is a medium to be read, such as a form, on which handwritten Japanese text is written. As shown in Fig. 2, the medium carries the handwritten body text 11 and the handwritten furigana characters 12 added only to the kanji of the body text 11. The characters on the medium 1 (furigana 12 and body text 11) are scanned and read by a photoelectric conversion device 2 such as a CCD and input as image information to a recognition preprocessing circuit 3.
The recognition preprocessing circuit 3 extracts the features of each character and selects and outputs candidates for each character. For the furigana, the correct answer is obtained in almost all cases, so it suffices to select only the first-ranked candidate.
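The different treatment of furigana and body text can be illustrated as a top-k cut over recognition scores: a single candidate for each furigana character, which is recognized reliably, and a longer list (20 candidate categories per character in this embodiment) for the kanji-bearing body text. The score values, the sample characters, and the `top_k` helper are hypothetical.

```python
import heapq

FURIGANA_K = 1  # furigana: the first-ranked candidate is almost always correct
BODY_K = 20     # body text: keep the whole candidate list for later narrowing


def top_k(scores, k):
    """Return up to k category labels in order of decreasing similarity score."""
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [label for label, _ in best]


# Hypothetical similarity scores for one furigana character and one body character.
furigana_scores = {"ト": 0.97, "卜": 0.41, "ド": 0.30}
body_scores = {"工": 0.82, "エ": 0.81, "ヱ": 0.35, "土": 0.20}

print(top_k(furigana_scores, FURIGANA_K))  # ['ト']
print(top_k(body_scores, BODY_K))          # ['工', 'エ', 'ヱ', '土']
```

The near-tie between 工 and エ in the body scores is exactly the case where keeping several candidates, rather than committing to the first, pays off.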
For the body text, which contains kanji, however, the candidates for each character are output as they are. Buffers 4 and 5 store the furigana and the body text, respectively; the candidate outputs from the recognition preprocessing circuit 3 are sorted into them according to whether they are furigana or body text. A readout controller 6 accesses a reading dictionary memory 7 according to the furigana output from the furigana buffer 4, and sequentially collates the retrieved kana-kanji correspondence output with the body-text candidate group output from the text buffer 5. In this way it screens the kanji candidates in the body text, narrowing each character from 20 candidate categories to 2 or 3. When this collation shows that a character of the body text has no corresponding furigana, the readout controller 6 judges that character to be non-kanji: katakana, hiragana, Latin letters, Arabic numerals, or another symbol.
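The readout controller's collation can be sketched as follows: look up the furigana reading in a reading dictionary, keep only the dictionary words whose every character appears in the body-text candidate list at the corresponding position, and take the surviving characters as the narrowed candidates. The dictionary contents, the candidate lists, and the names `READING_DICT` and `narrow_candidates` are assumptions for illustration; the patent does not specify the dictionary format or the exact matching rule.

```python
# Hypothetical reading dictionary: katakana reading -> kanji spellings.
READING_DICT = {
    "トウシャ": ["当社", "投射", "透写"],
    "コウジ": ["工事", "公示", "麹"],
}


def narrow_candidates(reading, body_candidates):
    """Narrow per-position body candidates using a furigana reading.

    body_candidates: one list of candidate characters per body position,
    covering exactly the span the furigana is written over.
    Returns (matching dictionary words, narrowed candidates per position).
    """
    matches = [
        word
        for word in READING_DICT.get(reading, [])
        if len(word) == len(body_candidates)
        and all(ch in cands for ch, cands in zip(word, body_candidates))
    ]
    narrowed = [sorted({word[i] for word in matches}) for i in range(len(body_candidates))]
    return matches, narrowed


# Made-up candidate lists for the two body characters under the furigana トウシャ.
cands = [["当", "半", "投", "光"], ["社", "杜", "射", "祉"]]
matches, narrowed = narrow_candidates("トウシャ", cands)
print(matches)   # dictionary words consistent with the body candidates
print(narrowed)  # two candidates remain at each position
```

Here 透写 is rejected because 透 is not among the body candidates, leaving two candidates per position, in line with the 2 or 3 the embodiment mentions.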
The character candidates selected in this way by the readout controller 6 (2 or 3 per character) are stored in a candidate buffer 8. The character candidate output from the candidate buffer 8 is input to a selection circuit 9, where the candidates are arranged side by side to form sentences and processing is performed to select the single final correct category. The character output of the selection circuit 9 is passed to an output circuit 10, such as a floppy disk or a computer.
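The patent does not state how the selection circuit 9 chooses the final category from the 2 or 3 remaining candidates. As one assumed stand-in, the sketch below scores every candidate sequence with a toy table of adjacent-character pair weights and keeps the best; the table `PAIR_SCORE`, its values, and `select_sentence` are hypothetical, and an actual system could use a different contextual rule.

```python
from itertools import product

# Hypothetical weights for adjacent character pairs that plausibly occur together.
PAIR_SCORE = {
    ("当", "社"): 3.0,
    ("投", "射"): 2.0,
    ("社", "は"): 1.5,
    ("射", "は"): 0.2,
}


def select_sentence(candidates):
    """Pick one character per position, maximizing the summed pair score."""
    def score(seq):
        return sum(PAIR_SCORE.get(pair, 0.0) for pair in zip(seq, seq[1:]))

    best = max(product(*candidates), key=score)
    return "".join(best)


# Narrowed candidates as they might arrive from the candidate buffer 8.
print(select_sentence([["当", "投"], ["社", "射"], ["は"]]))  # 当社は
```

Exhaustive search with `product` grows exponentially with sentence length; with pairwise scores like these, a dynamic-programming (Viterbi-style) pass finds the same optimum in time linear in the number of positions.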
Fig. 3 illustrates the operation of the readout controller 6 in Fig. 1. Fig. 3a shows an example of body text 11 and its furigana 12, each numbered consecutively according to its character frame. Figs. 3b to 3e show how the kanji candidates of the body text 11 corresponding to furigana 12 portions … to … of Fig. 3a are recognized and decided: each shows, for a different start point within the furigana portions, whether the kana-kanji correspondence output for that furigana reading matches any of the candidate categories from the text buffer 5 (Fig. 1). In the case shown, the kanji candidate corresponding to furigana portions … to … is determined to be 当社 ("our company"). Likewise, the kanji corresponding to furigana portions … to … are determined to be 工事業者 ("construction contractor"), so the body portion … to …, 「はエアコン」, is recognized as non-kanji, in this case kana (the particle は and the katakana エアコン).
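Figs. 3b to 3e (and claim 1) describe trying the furigana reading from successively shifted start points and checking whether the resulting kana-kanji correspondence output matches the body candidates. A minimal sketch of that idea, under stated assumptions: the furigana are concatenated into one kana string, readings are tried from each start point longest first, and a dictionary word is accepted when its characters appear among the candidates of consecutive body positions. `READING_DICT`, the candidate lists, and the helper names are hypothetical.

```python
# Hypothetical reading dictionary: katakana reading -> kanji spellings.
READING_DICT = {
    "トウシャ": ["当社"],
    "コウジ": ["工事", "公示"],
    "ギョウシャ": ["業者"],
    "コウジギョウシャ": ["工事業者"],
}


def find_in_body(word, body_candidates):
    """First body position where every character of word is among the candidates."""
    for p in range(len(body_candidates) - len(word) + 1):
        if all(ch in body_candidates[p + j] for j, ch in enumerate(word)):
            return p
    return None


def lookup(reading, body_candidates):
    """Kana-kanji correspondence output for one reading, kept only if it fits the body."""
    for word in READING_DICT.get(reading, []):
        pos = find_in_body(word, body_candidates)
        if pos is not None:
            return word, pos
    return None


def match_by_shifting(kana, body_candidates):
    """Scan the furigana kana, moving the start point one character at a time.

    From each start point the longest reading is tried first; on a match the scan
    jumps past the matched kana, otherwise the start point shifts by one.
    """
    matches, start = [], 0
    while start < len(kana):
        for end in range(len(kana), start, -1):
            hit = lookup(kana[start:end], body_candidates)
            if hit is not None:
                matches.append((kana[start:end], *hit))
                start = end
                break
        else:
            start += 1  # nothing matched from this start point
    return matches


# Made-up candidate lists for an assumed body text 当社はエアコン工事業者 (one per frame).
body = [["当", "半"], ["社", "杜"], ["は", "ば"], ["エ", "工"], ["ア", "了"],
        ["コ", "ロ"], ["ン", "ソ"], ["工", "エ"], ["事", "争"], ["業", "棠"], ["者", "着"]]
for reading, word, pos in match_by_shifting("トウシャコウジギョウシャ", body):
    print(reading, word, "at body position", pos)
```

Body positions not covered by any match (2 to 6 here, はエアコン) are then treated as non-kanji. The sketch ignores where each furigana group sits on the form; using that position to restrict `find_in_body` would avoid matching a word at the wrong place.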
(7) Effects of the Invention
As explained above, the present invention uses furigana, which are easier to recognize than kanji, to discriminate between kanji and non-kanji characters of the same shape. This improves kanji recognition accuracy and makes it easy to distinguish and recognize non-kanji and kanji.
Fig. 1 is a block diagram for implementing the recognition processing method of the present invention, Fig. 2 shows an example of characters written on the medium 1 to be read in Fig. 1, and Fig. 3 illustrates the operation of the readout controller 6 in Fig. 1.
In the drawings, 1 is the medium to be read on which the Japanese text 11 and the furigana 12 are written, 2 is a photoelectric conversion device, 3 is a recognition preprocessing circuit, 4 is a furigana buffer, 5 is a text buffer, 6 is a readout controller, and 7 is a reading dictionary memory.
Claims (1)
1. A recognition processing method for handwritten Japanese text, characterized in that: the handwritten Japanese text and the handwritten furigana characters, added word by word only to the kanji of the handwritten Japanese text, are read separately; kana-kanji correspondence outputs are obtained by shifting the start position of the kana obtained from the handwritten furigana characters one character at a time; kanji are recognized by collating each kana-kanji correspondence output with the recognition output of the handwritten Japanese text; and portions with no corresponding furigana are recognized as non-kanji.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP57111912A JPS592191A (en) | 1982-06-29 | 1982-06-29 | Recognizing and processing system of handwritten japanese sentence |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP57111912A JPS592191A (en) | 1982-06-29 | 1982-06-29 | Recognizing and processing system of handwritten japanese sentence |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| JPS592191A JPS592191A (en) | 1984-01-07 |
| JPH0247788B2 true JPH0247788B2 (en) | 1990-10-22 |
Family
ID=14573230
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| JP57111912A Granted JPS592191A (en) | 1982-06-29 | 1982-06-29 | Recognizing and processing system of handwritten japanese sentence |
Country Status (1)
| Country | Link |
|---|---|
| JP (1) | JPS592191A (en) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS6334680A (en) * | 1986-07-29 | 1988-02-15 | Toshiba Corp | Character reader |
| JPH01147829A (en) * | 1987-12-04 | 1989-06-09 | Toshiba Corp | Manufacturing method of semiconductor device |
| JPH04242491A (en) * | 1991-01-17 | 1992-08-31 | Nec Corp | Optical character reader |
| US5326713A (en) * | 1992-09-04 | 1994-07-05 | Taiwan Semiconductor Manufacturies Company | Buried contact process |
| ES2993665T3 (en) | 2021-02-22 | 2025-01-03 | Zeiss Carl Vision Int Gmbh | Devices and methods for processing eyeglass prescriptions |
| EP4101367A1 (en) | 2021-06-09 | 2022-12-14 | Carl Zeiss Vision International GmbH | Method and device for determining a visual performance |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPS5699573A (en) * | 1980-01-09 | 1981-08-10 | Hitachi Ltd | Kanji (chinese character) distinction system using katakana (square form of japanese syllabary) |
| JPS5699581A (en) * | 1980-01-10 | 1981-08-10 | Toshiba Corp | Kanji (chinese character) read method |
- 1982-06-29: application JP57111912A filed in Japan (published as JPS592191A; active, granted)
Also Published As
| Publication number | Publication date |
|---|---|
| JPS592191A (en) | 1984-01-07 |