JPS6145378A - Word reader - Google Patents

Word reader

Info

Publication number
JPS6145378A
Authority
JP
Japan
Prior art keywords
characters
word
heading
recognition candidate
character
Prior art date
1984-08-10
Legal status
Pending
Application number
JP59166418A
Other languages
Japanese (ja)
Inventor
Haruo Mizukami
水上 治雄
Masataka Yamamoto
山本 勝敬
Current Assignee
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date
1984-08-10
Filing date
1984-08-10
Publication date
1986-03-05
Application filed by Mitsubishi Electric Corp
Priority to JP59166418A
Publication of JPS6145378A

Abstract

PURPOSE: To improve both the accuracy and the speed of a word reader on words that contain external characters, by giving each entry of the word dictionary, besides the 1st heading, a 2nd heading in which any external characters of the 1st heading are replaced with the most similar internal characters and which is otherwise identical to the 1st heading. CONSTITUTION: The characters on a form 1 are scanned by a scanning means 2 and undergo preprocessing such as binarization in a preprocessing means 3. The recognition candidate characters selected by a character recognition means 4 are stored in a recognition candidate character storage means 5. A word determining means 6 uses a word dictionary 7 to determine a word name from the recognition candidate characters stored in the means 5 and outputs it. The dictionary 7 holds each word name to be read as the 1st heading. If the characters forming the 1st heading include external characters, those external characters are replaced with internal characters of similar shape; the result is the 2nd heading, which otherwise has the same content as the 1st heading.

Description

[Detailed Description of the Invention]
[Technical Field of the Invention]
The present invention relates to a word reading device that recognizes the characters constituting a word one character at a time and reads the word using the results.

[Prior Art]

Conventional word reading devices recognize the characters constituting a word one character at a time and read the word from the results. Words consisting only of characters that the device is designed to recognize (hereinafter, internal characters) can therefore be read, but words containing characters outside the recognition set (hereinafter, external characters) cannot: the reading result is either a misreading or a rejection, so it has to be corrected.
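The failure described above can be illustrated with a minimal sketch. It assumes, hypothetically, that a conventional reader accepts a word only if some combination of per-character recognition candidates spells a target word exactly. The function name, the candidate lists, and the word list are illustrative only and are not taken from any actual device.

```python
from itertools import product


def conventional_read(candidates, word_list):
    """Hypothetical conventional lookup: return a word from word_list that is
    spelled by picking one candidate per character position, or None (reject)."""
    words = set(word_list)
    for combo in product(*candidates):
        word = "".join(combo)
        if word in words:
            return word
    return None


# Hypothetical recognizer output for the handwritten word 麻雀. Only internal
# characters can be candidates, so the external character 雀 never appears.
candidates = [["麻", "床"], ["省", "看"]]
print(conventional_read(candidates, ["麻雀", "麻酔"]))  # prints None
```

Because no candidate combination can contain 雀, the target word 麻雀 can never be matched, however good the recognizer is on the other characters.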

The simplest way to correct a reading result is to re-enter the character by its character code, such as its JIS kuten (row-cell) number. With this method, however, the code of each character to be re-entered must be looked up in a correspondence table, which is very time-consuming.

In another correction method, for each internal character, the external characters whose shapes resemble it are stored in advance in order of frequency of use. To enter an external character, the operator selects a recognition candidate character whose shape resembles that external character; the associated external characters are then displayed, and the operator picks the correct one from among them. This method does not require looking up the code of the external character, but it has the drawback that a human must intervene to select each character.

[Summary of the Invention]
The present invention was made to eliminate the drawbacks described above. Its object is to provide a word reading device that reads words containing external characters quickly and with high accuracy.

To achieve this object, the present invention provides a word reading device that recognizes and reads words consisting of a plurality of characters, comprising:
character recognition means that recognizes the characters constituting a word one at a time and, for each character, selects a plurality of recognition candidate characters in descending order of their likelihood of being the input character;
recognition candidate character storage means that stores the recognition candidate characters selected by the character recognition means;
a word dictionary that holds, as a first heading item, the name of each word to be read (that is, each possible input word), and that also holds a second heading item which, for a word containing characters outside the recognition set, has each such character replaced with a recognition-target character of similar shape, and which, for a word containing no such characters, is identical to the first heading item; and
word determining means that compares combinations of the recognition candidate characters in the recognition candidate character storage means with the second heading items in the word dictionary, determines the word name for which the two match, and outputs the first heading item corresponding to that second heading item as the final word reading result.

[Embodiments of the Invention]

An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the embodiment. The characters on a form 1 are scanned by a scanning means 2 and photoelectrically converted, a preprocessing means 3 performs preprocessing such as binarization, and the recognition candidate characters selected by a character recognition means 4 are stored in a recognition candidate character storage means 5. A word determining means 6 uses a word dictionary 7 to determine a word name from the recognition candidate characters stored in the storage means 5 and outputs it.

FIG. 2 illustrates the operation of the character recognition means 4. Suppose the input word 8 to be read, written on the form 1, consists of the characters 「麻」 9 and 「雀」 10. The character recognition means 4 segments the input word 8 into its individual characters, recognizes each one using a known technique such as pattern matching, and for each character selects a plurality of recognition candidate characters 11, ranked in descending order of likelihood as the input character (in the illustrated example, 「麻」 12 and 「省」 13 are the first-ranked candidates for their respective characters). These recognition candidate characters 11 are stored in the recognition candidate character storage means 5.

Now assume that 「麻」 9 in the input word 8 is an internal character and 「雀」 10 is an external character. Then 「麻」 9 is included among the recognition candidate characters 11, but 「雀」 10 can never be.
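The ranking step of FIG. 2 can be sketched as follows, assuming the recognizer has already produced a similarity score for every internal character (how the scores are computed, for example by pattern matching, is outside the sketch). The function name, the cut-off `k`, and the scores are hypothetical.

```python
def select_candidates(scores, k=3):
    """Return the k internal characters with the highest similarity scores,
    best first; the candidate at index i has rank i + 1."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [char for char, _score in ranked[:k]]


# Hypothetical similarity scores for the handwritten 雀. The recognizer can only
# score internal characters, so 雀 itself is not a key.
scores = {"省": 0.91, "看": 0.74, "少": 0.55, "目": 0.40}
print(select_candidates(scores))  # prints ['省', '看', '少']
```

Since 雀 is not in the score table, the best the recognizer can offer for it is the most similar internal character, here the hypothetical first-ranked 「省」.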

FIG. 3 shows an example structure of the word dictionary 7. The word dictionary 7 holds the name of each word to be read as a first heading item. It also holds a second heading item: if the characters of the first heading item include external characters, each external character is replaced with an internal character of similar shape, and the second heading item is otherwise identical to the first heading item.
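The dictionary structure of FIG. 3 can be sketched as a mapping from second heading item to first heading item. This is a minimal illustration, not the patent's storage format. It assumes a hypothetical substitution table `similar_internal` from each external character to its most similar internal character. The collision check is an addition of this sketch: the description argues that two words rarely collapse onto the same second heading, but the code does not rely on that.

```python
def build_dictionary(first_headings, similar_internal):
    """Return a mapping from second heading item to first heading item.

    similar_internal (hypothetical) maps each external character to the most
    similar internal character. Characters not in the table are treated as
    internal and copied unchanged, so a word made only of internal characters
    gets a second heading identical to its first heading.
    """
    dictionary = {}
    for first in first_headings:
        second = "".join(similar_internal.get(ch, ch) for ch in first)
        # Refuse to let two different words share one second heading.
        if second in dictionary and dictionary[second] != first:
            raise ValueError(f"{first} and {dictionary[second]} share {second}")
        dictionary[second] = first
    return dictionary


print(build_dictionary(["麻雀", "麻酔"], {"雀": "省"}))
# prints {'麻省': '麻雀', '麻酔': '麻酔'}
```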

The second heading item 15 corresponding to a first heading item 14 that contains external characters is created as follows: the device is made to recognize the external characters contained in the words to be read, the internal character that each external character is most frequently misread as is determined, and that internal character replaces the external character in the first heading item.
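The construction of the substitution table can be sketched as taking, for each external character, the most frequent entry in its row of a confusion count. The input format (a list of `(external_char, recognized_char)` pairs from running the recognizer on samples of each external character) and the sample data are hypothetical.

```python
from collections import Counter


def most_confused_internal(misreadings):
    """Map each external character to the internal character it was most often
    recognized as; ties go to the recognition result observed first."""
    counts = {}
    for external, recognized in misreadings:
        counts.setdefault(external, Counter())[recognized] += 1
    return {ext: counter.most_common(1)[0][0] for ext, counter in counts.items()}


# Hypothetical recognition results for samples of two external characters.
samples = [("雀", "省"), ("雀", "看"), ("雀", "省"), ("麿", "麻")]
print(most_confused_internal(samples))  # prints {'雀': '省', '麿': '麻'}
```

Choosing the most frequent misreading, rather than a hand-picked look-alike, makes the second heading match what the recognizer actually tends to output for that external character.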

Using the recognition candidate characters 11 obtained as described above and the word dictionary 7, the word determining means 6 determines the word name as follows.
First, among the combinations of the recognition candidate characters 11, it finds those that match a second heading item in the word dictionary 7. If several combinations match, it may, for example, select the combination with the smallest sum of the ranks assigned to its recognition candidate characters. In this example, only the combination of the first-ranked 「麻」 12 and 「省」 13 is found among the second heading items of the word dictionary 7; that is, the word 「麻省」 15 has been determined for the input word 8. The first heading item 「麻雀」 14, which corresponds to the second heading item 「麻省」 15, is then output as the final reading result, so the input word 8 is read correctly.
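The word determination procedure, including the smallest-rank-sum tie-break, can be sketched as below. The function name and the data are hypothetical. For brevity the sketch enumerates every candidate combination, whose count grows as the product of the candidate-list lengths; a practical reader would likely prune this search.

```python
from itertools import product


def determine_word(candidates, dictionary):
    """Return the first heading item for the best-matching candidate combination.

    candidates: one list per character position, best first (index i is rank i + 1).
    dictionary: second heading item -> first heading item.
    Among combinations equal to a second heading item, the smallest rank sum
    wins. Returns None (rejection) if no combination matches.
    """
    ranked_positions = [list(enumerate(chars, start=1)) for chars in candidates]
    best_sum, best_second = None, None
    for combo in product(*ranked_positions):
        second = "".join(char for _rank, char in combo)
        if second in dictionary:
            rank_sum = sum(rank for rank, _char in combo)
            if best_sum is None or rank_sum < best_sum:
                best_sum, best_second = rank_sum, second
    return None if best_second is None else dictionary[best_second]


dictionary = {"麻省": "麻雀", "麻酔": "麻酔"}
candidates = [["麻", "床"], ["省", "看", "酔"]]
print(determine_word(candidates, dictionary))  # prints 麻雀
```

Here both 麻省 (rank sum 2) and 麻酔 (rank sum 4) match a second heading; the smaller sum selects 麻省, whose first heading 麻雀 is output.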

In fact, among the characters that make up words, the probability that one kanji is followed by a particular other kanji (the kanji-to-kanji concatenation probability) is known to be 7% or less. Replacing an external character with the internal character most similar to it in shape is therefore very unlikely to produce some other meaningful word, and the word reading device of the present invention can read words containing external characters with a sufficiently high probability.

Although the above embodiment describes a two-character word containing one external character, the present invention is not limited to this case: it can also be applied to words of one character or of three or more characters, and to words containing two or more external characters.

[Effect of the Invention]

As described above, according to the present invention, the word dictionary holds the name of each word to be read as a first heading item and also holds a second heading item in which any external character of the first heading item is replaced with the internal character most similar to it in shape, and which is otherwise identical to the first heading item. As a result, not only words consisting only of internal characters but also words containing external characters can be read quickly and with high accuracy.

[Brief Description of the Drawings]
FIG. 1 is a block diagram showing an embodiment of the present invention, FIG. 2 illustrates the operation of the character recognition means, and FIG. 3 shows an example structure of the word dictionary.
In the drawings: 4, character recognition means; 5, recognition candidate character storage means; 6, word determining means; 7, word dictionary; 8, input word; 11, recognition candidate characters. The same reference numerals denote the same or corresponding parts throughout the drawings.
Procedural amendment (voluntary), April 5, Showa 60 (1985)

Claims (2)

[Claims]
(1) A word reading device that recognizes and reads words consisting of a plurality of characters, comprising: character recognition means that recognizes the characters constituting a word one at a time and selects, for each character, a plurality of recognition candidate characters in descending order of likelihood as the input character; recognition candidate character storage means that stores the recognition candidate characters selected by the character recognition means; a word dictionary that holds, as a first heading item, the name of each word to be read as an input word, and that holds a second heading item which, for a word to be read that contains non-recognition-target characters, has those characters replaced with recognition-target characters of similar shape, and which, for a word containing no non-recognition-target characters, is identical to the first heading item; and word determining means that compares combinations of the recognition candidate characters in the recognition candidate character storage means with the second heading items in the word dictionary to determine the word name for which the two match, and outputs the first heading item corresponding to that second heading item as the final word reading result.
(2) The word reading device according to claim 1, wherein, when a plurality of combinations of the recognition candidate characters match the second heading items, the word determining means selects the combination with the smallest sum of the ranks assigned to its recognition candidate characters.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59166418A JPS6145378A (en) 1984-08-10 1984-08-10 Word reader

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP59166418A JPS6145378A (en) 1984-08-10 1984-08-10 Word reader

Publications (1)

Publication Number Publication Date
JPS6145378A (en) 1986-03-05

Family

ID=15831053

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59166418A Pending JPS6145378A (en) 1984-08-10 1984-08-10 Word reader

Country Status (1)

Country Link
JP (1) JPS6145378A (en)

Similar Documents

Publication Publication Date Title
US5113452A (en) Hand-written character recognition apparatus and method
JP3139521B2 (en) Automatic language determination device
US5394484A (en) Image recognition apparatus
US5881172A (en) Hierarchical character recognition system
JPS6145378A (en) Word reader
US6961465B2 (en) System and method for efficient determination of recognition initial conditions
EP0625764A2 (en) Accelerated OCR classification
JPH0157837B2 (en)
JPH07271920A (en) Character recognizing device
JP3025382B2 (en) Document processing device
JPS63263588A (en) Character reader
JP2677271B2 (en) Character recognition device
JPS6095689A (en) Optical character reader
JPS61153781A (en) Optical character reading device
JPS63249282A (en) Multifont printed character reader
JPH0721303A (en) Character recognizing device
JP3416975B2 (en) Character recognition device and method of correcting recognized characters
JPS6174087A (en) Word reading device
JPS59188783A (en) Character discriminating and processing system
JPH11296619A (en) Character recognition device
JPH0484380A (en) Character recognizing device
JPS6095690A (en) Character reader
JPH11120294A (en) Character recognition device and medium
JPS59197970A (en) Form for character reading device
JPS61114388A (en) Character input device