JPH08241314A

JPH08241314A - Document filing system

Info

Publication number: JPH08241314A
Application number: JP4579495A
Authority: JP
Inventors: Kazuto Kikuchi (菊池和人); Makoto Kawakita (川北誠); Kiyousuke Hirono (廣野恭資); Hideyuki Yoshida (吉田秀行)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1995-03-06
Filing date: 1995-03-06
Publication date: 1996-09-17
Anticipated expiration: 2022-06-13
Also published as: JP3928739B2

Abstract

PURPOSE: To provide a document filing system that reads the information entered in a document by exploiting the characteristics of the documents to be recognized, and that efficiently performs the input and itemizing work. CONSTITUTION: A document filing system that reads characters carrying information on a plurality of items entered in a document according to a fixed format, converts them into character codes, and stores them comprises: character recognizing means 111 that recognizes the characters on the basis of the dot patterns representing the characters contained in an image corresponding to the document and outputs the corresponding character codes as the recognition result; decomposing means 112 that decomposes the text information obtained as the recognition result into parts of speech, its constituent elements; and classifying means 113 that analyzes the relationships among the series of parts of speech obtained by the decomposing means 112, classifies the information represented by the series of parts of speech by item, and stores it.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a document filing system that applies a character recognition device for recognizing characters from a document image, and in particular to a document filing system for documents, such as family register data, for which the recognition rate of the character recognition device is expected to be low.

[0002] A character recognition device encodes the content of a document by matching the character patterns contained in an optically read document image against its built-in character patterns. Many products have been commercialized for documents in which printed-type characters are arranged horizontally at suitable intervals. By applying such a character recognition device to a document filing system, the contents of the vast number of documents in an office, including books and various materials and reports printed in type, can be encoded and stored as compact files. Searching also becomes easy, so information can be shared.

[0003] In recent years, with the spread of word processors and the like, documents printed in type have become overwhelmingly predominant. Nevertheless, there remains a considerable quantity of handwritten documents and of documents that mix handwritten portions with typewritten portions, such as the original family registers managed by local governments throughout the country, and these materials also need to be encoded and stored.

[0004] In particular, an original family register must be preserved for 80 years after every person listed in it has been removed from the register. Registers compiled before the introduction of typewriters therefore make up a considerable share of the total, and the presence of handwritten characters must be taken into account when encoding and storing family register data.

[0005]

[Prior Art] As described above, conventional character recognition devices handle documents in which printed-type characters are arranged at regular intervals. It is very difficult for them to recognize individual characters from the image of a document in which characters are written with a brush, packed very tightly in the vertical direction, on a ruled template.

[0006] For this reason, conventionally, as shown in FIG. 8, a microfilm 301 on which the original family register has been photographed is loaded into a microfilm reader 302, and the image of the original family register is printed on paper to create a copy 303. An operator reads the information in the original family register from this copy 303 and enters the reading results through an input device such as a keyboard 311 provided in a document filing system 310.

[0007] In response to the input of the reading results, an edit processing unit 312 extracts the information for each item in the family register and creates a family register data file 313. To check the information for each item, a collation list creation unit 314 prints a collation list 304 that lists the information item by item, and the collation list 304 is manually collated against the copy 303.

[0008] If an error is found at this point, the terminal operator again operates the keyboard 311 to correct the relevant portion. Only when the collation process no longer finds any errors is the code information for each item stored in the family register data file 313. To support the work of reading the information in the original family register, marks indicating the portions that correspond to the required items are sometimes applied to the copy 303 in advance (hereinafter called marking work).

[0009] If the marking work clearly indicates the boundaries of each item, the edit processing unit 312 receives information that is already divided by item, so it only needs to determine whether each piece of information fits its item and reflect the result in the editing process. If, on the other hand, the marking work indicates only the range to be entered as necessary information, the edit processing unit 312 must extract the portion corresponding to each item from the entered information, but the effort required for the marking work is greatly reduced.

[0010] Thus, the marking work and the processing by the edit processing unit 312 have conventionally made the entry of family register information somewhat more efficient. However, reading the information from the copy 303, entering it, and checking the results were all done manually, and no attempt had been made to automate these tasks.

[0011]

[Problems to Be Solved by the Invention] As described above, processing all of the reading, input, and collation work manually places too great a burden on the operator. Many errors are therefore likely to be induced at various stages, such as misreadings, simple typing errors when entering the information read, and oversights during collation.

[0012] Moreover, digitizing all of the enormous volume of original family registers by such a manual method would require a huge workforce and, consequently, an astronomical cost. To file family register information, it is therefore necessary to automate the reading and input work and to support the collation work.

[0013] An approach specialized for the particular kind of document in which character strings are arranged vertically on a ruled template has made it feasible to recognize individual characters, at a reasonable recognition rate, from the image data of documents such as original family registers. This has made it possible to automate the encoding of the individual characters contained in such documents.

[0014] However, simply encoding the characters in this way also encodes information that is unnecessary as item content, such as item names, so the process of extracting the information for each item from the encoded text requires some ingenuity. In addition, recognition failures can occur quite frequently with handwritten characters, so recognition failures of the character recognition device must also be taken into account.

[0015] An object of the present invention is to provide a document filing system that can read information automatically by exploiting the characteristics of the document, and a document filing system that can support collation between the encoded information and the original document.

[0016]

[Means for Solving the Problems] FIG. 1 is a block diagram showing the principle of the document filing system of claims 1, 5, and 6. The invention of claim 1 is a document filing system that reads characters recording information on a plurality of items written on an original document according to a fixed format, converts them into character codes, and stores them, the system comprising: character recognition means 111 that recognizes each character on the basis of the dot patterns representing the characters contained in the image corresponding to the original document and outputs the corresponding character codes as the recognition result; decomposing means 112 that decomposes the text information obtained as the recognition result into parts of speech, which are its constituent elements; and classifying means 113 that analyzes the relationships among the series of parts of speech obtained by the decomposing means 112, classifies the information represented by the series of parts of speech into the plurality of items, and provides it for storage processing.

[0017] FIG. 2 is a block diagram showing the principle of the document filing system of claims 2 to 4. The invention of claim 2 is a document filing system that reads characters written on an original document, converts them into character codes, and stores them, the system comprising: character recognition means 111 that recognizes each character on the basis of the dot patterns representing the characters contained in the image corresponding to the original document and outputs the corresponding character codes as the recognition result; decomposing means 112 that decomposes the text information obtained as the recognition result into parts of speech, which are its constituent elements; inconsistency detecting means 114 that analyzes the relationships among the series of parts of speech obtained by the decomposing means 112 and detects inconsistencies in the information represented by the characters; and correcting means 115 that corrects the text information of the recognition result on the basis of the detection result of the inconsistency detecting means 114 and provides it for storage processing.

[0018] The invention of claim 3 is the document filing system of claim 1 or claim 2, further comprising complementing means 120 that complements the recognition result for dot patterns that the character recognition means 111 could not recognize and sends it to the decomposing means 112. The complementing means 120 comprises: input means 121 for entering a candidate character corresponding to an unrecognized dot pattern; first synthesizing means 122 that places a character pattern representing the candidate character in an area adjacent to the unidentified dot pattern in the image corresponding to the original document and combines it with that image; first display means 123 that displays the image obtained by the first synthesizing means 122; and confirming means 124 that, in response to the input of a confirmation instruction, confirms the candidate character entered via the input means 121 as the recognition result for the dot pattern in question and sends it to the decomposing means 112.

[0019] The invention of claim 4 is the document filing system of claim 3, wherein the first synthesizing means 122 comprises: image replacing means 126 that, in response to the input of a character pattern representing the candidate character, replaces the image information of the area adjacent to the unrecognized dot pattern with that character pattern; and converting means 125 that, in response to the input of an instruction designating a typeface, converts the candidate character into the designated typeface and sends it to the image replacing means 126.

[0020] The invention of claim 5 is the document filing system of claim 1 or claim 2, wherein the character recognition means 111 comprises: a pattern dictionary 131 that stores, for each character code, a character pattern representing the corresponding character; collating means 132 that, in response to the input of a dot pattern representing each character contained in the image corresponding to the original document, collates it against each of the character patterns stored in the pattern dictionary 131 and outputs, as the recognition result, the character code corresponding to the character pattern that matches the dot pattern; and registering means 133 that, according to the collation result of the collating means 132, assigns a new character code to the character pattern corresponding to the dot pattern and registers it in the pattern dictionary.

[0021] The invention of claim 6 is the document filing system of claim 1, comprising: position designating means 141 that, on the basis of the position that the information of each classified item occupies in the text information of the recognition result, designates as an item area the range of dot patterns on the original document to which that information corresponds; second synthesizing means 142 that combines, into the area adjacent to each item area of the image data corresponding to the original document, character patterns representing the corresponding information; and second display means 143 that displays the image data obtained by the second synthesizing means 142.

[0022]

[Operation] In the invention of claim 1, the character string recognized by the character recognition means 111 is decomposed into morphemes by the decomposing means 112 and provided to the classification processing of the classifying means 113, so that the work of reading the information of each item written in the original document and the work of sorting it into items can be automated.

[0023] In the invention of claim 2, the recognition result of the character recognition means 111 is decomposed into a series of morphemes by the decomposing means 112 and provided to the detection processing of the inconsistency detecting means 114. The correcting means 115 operates according to the detection result, so the recognition result can be corrected with the consistency of the information taken into account. In the invention of claim 3, through the operation of the first synthesizing means 122 of the complementing means 120, the candidate character entered via the input means 121 and the unrecognized dot pattern can be displayed side by side on the display screen of the first display means 123, so the recognition result can be corrected while the candidate character and the unrecognized dot pattern are carefully compared. When the operator judges that the entered candidate character is appropriate as the recognition result for the unrecognized dot pattern, the operator enters a confirmation instruction. The confirming means 124 operates in response and sends the character code of the entered candidate character to the decomposing means 112 as the recognition result. The recognition result of the character recognition means 111 can thus be complemented with the operator's judgment.

[0024] In the invention of claim 4, the converting means 125 converts the candidate character into a character pattern of the designated typeface and provides it to the image replacing means 126, so the candidate character can be displayed on the display means 123 in a typeface similar to that of the character to be recognized in the original document. In the invention of claim 5, within the character recognition means 111, the registering means 133 registers a character pattern corresponding to a new character code in the pattern dictionary 131 according to the collation result of the collating means 132, so that this new character pattern can thereafter also be used in character recognition. This makes it possible to deal flexibly with the so-called coined characters (造字) that frequently appear in personal names and the like, and to improve the recognition rate of the character recognition means 111.
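The registration step of claim 5 can be illustrated with a minimal sketch. The class name `PatternDictionary`, the exact-match lookup, and the choice of the Unicode Private Use Area (starting at U+E000) as the source of new character codes are all assumptions made for illustration; the source does not say how new codes are chosen or how patterns are stored.

```python
class PatternDictionary:
    """Toy pattern dictionary: maps a bitmap (tuple of tuples) to a character code."""

    def __init__(self):
        self.patterns = {}
        # Assumption: new codes are drawn from the Unicode Private Use Area.
        self._next_code = 0xE000

    def lookup(self, pattern):
        """Return the character code for an exactly matching pattern, else None."""
        return self.patterns.get(pattern)

    def register(self, pattern):
        """Assign a new character code to an unknown pattern and store it."""
        existing = self.lookup(pattern)
        if existing is not None:
            return existing
        code = chr(self._next_code)
        self._next_code += 1
        self.patterns[pattern] = code
        return code


dictionary = PatternDictionary()
coined = ((1, 0), (0, 1))            # hypothetical 2x2 bitmap of a coined character
first_code = dictionary.register(coined)
```

Once registered, the same bitmap resolves to the same code on later lookups, which is the behaviour the paragraph above relies on.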

[0025] In the invention of claim 6, the second synthesizing means 142 operates in response to instructions from the position designating means 141, combining the image corresponding to the original document with the information for each item obtained by the classifying means 113 and displaying them side by side on the display screen of the second display means 143. Collation can therefore proceed while the information for each item is carefully compared with the image of the characters written in the original document.

[0026]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 3 shows the configuration of an embodiment of a family register information filing system to which the inventions of claims 1 and 3 are applied. In FIG. 3, instead of printing on paper the image from the microfilm on which the original family register was photographed, the microfilm reader 302 sends image data corresponding to that image to a character recognition device 210 via an image buffer 201.

[0027] In response, an area extraction unit 211 of the character recognition device 210 cuts out the area corresponding to each character from the image data held in the image buffer 201, and a pattern collation unit 212 recognizes the character represented by the dot pattern of each area by collating that dot pattern against the character patterns in a pattern dictionary 213.

[0028] As the collation result for the dot pattern of each area, the pattern collation unit 212 outputs the character code of the character represented by the matching character pattern as the recognition result for that area when the match rate between the dot pattern and the character pattern is equal to or greater than a predetermined threshold. When the match rate is equal to or less than the predetermined threshold, it outputs an indication that the recognition result is undetermined.
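The threshold rule above can be sketched as follows. This is only an illustration under stated assumptions: the 3x3 bitmaps, the pixel-agreement definition of the match rate, the threshold value 0.9, and the names `PATTERN_DICTIONARY`, `match_rate`, and `recognize` are hypothetical. The sketch also treats a match rate exactly equal to the threshold as a match, since the source leaves that boundary case open.

```python
from typing import Optional

# Hypothetical 3x3 binary glyphs; a real pattern dictionary would hold
# much larger bitmaps for several typefaces.
PATTERN_DICTIONARY = {
    "一": ((0, 0, 0), (1, 1, 1), (0, 0, 0)),
    "十": ((0, 1, 0), (1, 1, 1), (0, 1, 0)),
}


def match_rate(a, b) -> float:
    """Fraction of pixels that agree between two same-sized bitmaps."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(x == y for x, y in pairs) / len(pairs)


def recognize(region, threshold: float = 0.9) -> Optional[str]:
    """Return the best-matching character, or None if the result is undetermined."""
    best_char, best_rate = None, 0.0
    for char, pattern in PATTERN_DICTIONARY.items():
        rate = match_rate(region, pattern)
        if rate > best_rate:
            best_char, best_rate = char, rate
    return best_char if best_rate >= threshold else None


clean_region = ((0, 1, 0), (1, 1, 1), (0, 1, 0))
noisy_region = ((1, 0, 1), (0, 1, 0), (1, 0, 1))
```

Here `recognize(clean_region)` yields a character, while `recognize(noisy_region)` yields `None`, which plays the role of the "undetermined" output passed on to the complement processing.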

[0029] The pattern dictionary 213 holds character patterns for brush-written script in addition to those for standard typefaces, such as Mincho, used in ordinary typewritten printing. For each typeface it holds patterns not only for the tōyō kanji (kanji for general use) but also for kanji used in personal names and for old character forms. A recognition complement processing unit 220, which corresponds to the complementing means 120, operates in response to the input of the recognition results thus obtained for each area.

[0030] In the recognition complement processing unit 220 shown in FIG. 3, a code holding unit 221 holds the codes representing the recognition results of the character recognition device 210 and sends them to an image synthesizing unit 223 via a pattern conversion unit 222. The image synthesizing unit 223 combines the image data in the image buffer 201 with the series of character patterns, received from the pattern conversion unit 222, that represent the recognition results, and provides the result to a display device 202 for display.

[0031] The image synthesizing unit 223 corresponds to the first synthesizing means 122. As shown in FIG. 4(a), it first combines, into the area adjacent to the image area of each character written in the original family register, a character pattern representing either the character obtained as the recognition result or a mark indicating that the result is undetermined (shown with the symbol "?" in FIG. 4).

[0032] In FIG. 3, a candidate input unit 224 corresponds to the input means 121. In response to operator instructions entered via a keyboard (not shown) or the like, it accepts a character code indicating a candidate for the character corresponding to the dot pattern of an undetermined area, and rewrites the corresponding character code in the code holding unit 221 via a replacement processing unit 225.

[0033] In this case, for example, when the candidate character 「編」 is entered via the candidate input unit 224 for the dot pattern marked with a reference symbol in FIG. 4(a), the replacement processing unit 225 replaces the corresponding recognition result in the code holding unit 221 (here, "?" indicating an undetermined result) with the character code of the candidate character 「編」. The pattern conversion unit 222 then produces the character pattern representing 「編」, which is provided to the image synthesizing unit 223 for synthesis. As a result, as shown in FIG. 4(b), the dot pattern of the corresponding area is rewritten with the character pattern representing the candidate character 「編」.

[0034] In this way, the replacement processing unit 225 replaces the contents of the code holding unit 221. When replacement has been completed for the dot patterns of all undetermined areas, the code holding unit 221 outputs its contents as the recognition result. This realizes the function of the confirming means 124 described in claim 3: the characters corresponding to the undetermined dot patterns are confirmed, and the recognition result of the character recognition device 210 is complemented.
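The replace-then-confirm behaviour of the code holding unit can be sketched as below. The class `CodeHolder`, its method names, and the sample string are hypothetical and chosen only to mirror the description; the sketch assumes that "?" is never a legitimate recognized character.

```python
UNDETERMINED = "?"  # marker for a character the recognizer could not determine


class CodeHolder:
    """Holds recognition results and lets an operator fill in undetermined ones."""

    def __init__(self, recognized):
        self.codes = list(recognized)

    def undetermined_positions(self):
        """Indices whose recognition result is still undetermined."""
        return [i for i, code in enumerate(self.codes) if code == UNDETERMINED]

    def replace(self, position, candidate):
        """Overwrite the code at `position` with the operator's candidate."""
        self.codes[position] = candidate

    def confirm(self):
        """Output the held text only once no undetermined code remains."""
        if self.undetermined_positions():
            raise ValueError("undetermined characters remain")
        return "".join(self.codes)


holder = CodeHolder("?成日")          # first character was not recognized
pending = holder.undetermined_positions()
holder.replace(pending[0], "編")      # operator supplies the candidate character
confirmed_text = holder.confirm()
```

Refusing to output while any "?" remains corresponds to the condition that replacement must be finished for all undetermined areas before the result is passed on.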

[0035] In this case, the image synthesizing unit 223 displays the undetermined dot pattern and the character pattern of the candidate character side by side, so the operator can carefully compare the dot pattern to be recognized with the character pattern of the candidate character and judge intuitively and accurately whether the two patterns match. This makes it possible to deal flexibly with characters, such as handwritten characters, for which the recognition rate of the character recognition device 210 tends to be low, and to support accurate character recognition. Documents containing handwritten characters, such as original family registers, can thus be converted into character codes to obtain text information representing the content of the original document.

[0036] The text information thus obtained is processed by an analysis processing unit 230, which corresponds to the decomposing means 112 and the classifying means 113. In the analysis processing unit 230 shown in FIG. 3, a decomposition processing unit 231 realizes the function of the decomposing means 112 by decomposing the input text information into morphemes on the basis of a morpheme dictionary 232, and provides the series of morphemes to a syntax analysis unit 233 and a semantic analysis unit 234.

[0037] The morpheme dictionary 232 has areas corresponding to the types of information contained in the family register, each storing the relevant morphemes. For example, place names such as municipality names may be stored in an address area, and family names and given names may be stored separately in a name area. The analysis processing unit 230 also includes a syntax rule holding unit 235 that holds rules on how morphemes connect in the sentences written in the family register. The syntax analysis unit 233 analyzes the connections among the series of morphemes obtained by the decomposition processing unit 231 by referring to the rules held in the syntax rule holding unit 235.

[0038] Here, the syntactic analysis unit 233 may group the series of morphemes according to the syntax rules and associate each group with an item. For example, the six morphemes "Tokyo-to", "Marunouchi", "1", "chome", "1", and "ban", obtained by decomposing the recognition result of the original family register shown in FIG. 3, may be associated with the item name "registered domicile" as a group of morphemes representing the registered domicile. In the same way, other item names such as "name" and "compilation date" may be associated with their corresponding groups of morphemes.
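The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the category tags, the rule table, the greedy grouping strategy, and the two name morphemes ("Yamada", "Taro") are all assumptions made for the example.

```python
# Hypothetical sketch: group a morpheme sequence into items with simple rules.
# Category tags and the rule table are illustrative assumptions.

# Each morpheme is (surface, category); the category would come from the
# dictionary area in which the morpheme is stored.
MORPHEMES = [
    ("Tokyo-to", "place"), ("Marunouchi", "place"), ("1", "number"),
    ("chome", "address_unit"), ("1", "number"), ("ban", "address_unit"),
    ("Yamada", "family_name"), ("Taro", "given_name"),  # made-up name morphemes
]

# A rule maps an item name to the categories allowed inside that item.
RULES = {
    "registered_domicile": {"place", "number", "address_unit"},
    "name": {"family_name", "given_name"},
}

def group_into_items(morphemes, rules):
    """Greedily collect consecutive morphemes whose categories fit one item."""
    items = []
    current_item, current_group = None, []
    for surface, category in morphemes:
        item = next((name for name, cats in rules.items() if category in cats), None)
        if item != current_item and current_group:
            items.append((current_item, current_group))
            current_group = []
        current_item = item
        current_group.append(surface)
    if current_group:
        items.append((current_item, current_group))
    return items
```

Calling `group_into_items(MORPHEMES, RULES)` yields one group for the registered domicile and one for the name. A real rule set would also need to describe ordering and connection constraints between morphemes, which this sketch ignores.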

[0039] The semantic analysis unit 234, in turn, analyzes the meaning of the group of morphemes associated with each item and determines whether that meaning is consistent with the item. The semantic analysis unit 234 holds, for each item, information on the range of values that may be associated with it. For example, it may determine consistency by comparing the date represented by the group of morphemes associated with the item name "compilation date" against the range of dates allowed for that item.
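The range check described above might look like the following sketch. The item key `compilation_date` and the range bounds are placeholders chosen for illustration; the patent does not state concrete ranges.

```python
from datetime import date

# Hypothetical allowed range per item; the bounds are placeholders, not values
# taken from the patent.
ALLOWED_DATE_RANGES = {
    "compilation_date": (date(1872, 1, 1), date(1995, 3, 6)),
}

def is_consistent(item_name, value, ranges=ALLOWED_DATE_RANGES):
    """Return True if the date falls inside the range held for the item."""
    low, high = ranges[item_name]
    return low <= value <= high
```

A date outside the held range, which often results from a misread digit, would be reported as an inconsistency for that item.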

[0040] When the semantic analysis unit 234 finds each item consistent with its corresponding information, the analysis control unit 236 of the analysis processing unit 230 combines each item with its group of morphemes, stores the result in the family register data file 203, and sends it to the output processing unit 240 for output processing for the collation work. When the syntactic analysis unit 233 or the semantic analysis unit 234 detects an inconsistency, the analysis control unit 236 activates the correction processing unit 237. The correction processing unit 237 receives the necessary correction instructions from the operator, corrects the morpheme boundaries or the original text information itself, and passes the result back to the syntactic analysis unit 233 and the semantic analysis unit 234.
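The control flow described above (analyze, then either store or ask for a correction and analyze again) can be sketched as a small loop. All function names and the `max_rounds` limit are hypothetical; the callbacks stand in for the decomposition, syntactic analysis, semantic analysis, and correction units.

```python
def run_analysis(text, decompose, parse, check, ask_operator, max_rounds=3):
    """Classify text into items, requesting a correction whenever a check fails.

    decompose:    text -> list of morphemes
    parse:        morphemes -> dict mapping item name to its morpheme group
    check:        (item name, group) -> True if the group fits the item
    ask_operator: (text, failing item names) -> corrected text
    """
    for _ in range(max_rounds):
        items = parse(decompose(text))
        problems = [name for name, group in items.items() if not check(name, group)]
        if not problems:
            return items  # consistent: ready to be stored and output
        text = ask_operator(text, problems)  # correction step, then re-analyze
    raise ValueError("still inconsistent after operator corrections")
```

The loop limit is a design choice of this sketch; the text only says that corrected input is passed back to the two analysis units.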

[0041] In this way, the syntactic analysis unit 233, the semantic analysis unit 234, and the correction processing unit 237 operate under instructions from the analysis control unit 236 to realize the function of the classifying means 113, so that the information for each item contained in the text information obtained as the recognition result of the character recognition device can be automatically classified and stored as coded information.

[0042] This automates the reading and input of family register information and eliminates the conventional manual marking work, greatly reducing the burden on the operator. Furthermore, combining the processing of the analysis processing unit 230 with that of the character recognition device 210 and the recognition complementation processing unit 220 described above eliminates most of the manual information input work, which greatly reduces the operator's burden.

[0043] That is, applying the invention of claim 1 eliminates manual marking and information input work, automates the conversion of family register data, and allows the result to be passed to the final collation work. For example, the image data in the image buffer 201 may be sent to the output processing unit 240 together with the contents of the family register data file 203, and the output processing unit 240 may print, side by side, the image of the original family register represented by this image data and a family register obtained by arranging the itemized information in the new family register format.

[0044] This allows the collation work to be performed while comparing the original family register and the new-format family register on the same sheet. Even after the correction processing by the recognition complementation processing unit 220, however, errors may remain in the text information of the recognition result. Such recognition errors can be resolved by feeding the analysis results of the analysis processing unit 230 back to the recognition complementation processing unit 220.

[0045] FIG. 5 shows the configuration of an embodiment of a family register information filing system to which the invention of claim 2 is applied. In this case, when a character string that is not registered in the morpheme dictionary 232 is input, the decomposition processing unit 231 of the analysis processing unit 230 may provisionally decompose this character string as a proper noun and pass it, together with the other decomposition results, to the syntactic analysis unit 233 and the semantic analysis unit 234.

[0046] The syntactic analysis unit 233 classifies the provisionally decomposed proper noun into an appropriate item based on its relation to the preceding and following morphemes and on the syntax rules. The semantic analysis unit 234, in addition to its usual determination, determines whether the item into which the proper noun was classified is one in which proper nouns are permitted, and thereby detects inconsistencies between an item and its content. For example, proper nouns are not permitted in items that hold place names, such as the registered domicile or the place of notification, or in items that hold dates. If the information for such an item contains a provisional proper noun as described above, this is detected as an inconsistency. In response, the analysis control unit 236 may activate the recognition complementation processing unit 220, instead of the correction processing unit 237, and instruct it to retry the correction of the recognition result.
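As a rough sketch of the check described above: strings missing from the dictionary are tagged as provisional proper nouns, and any such tag inside an item that does not accept proper nouns is reported. The item keys, the tag name, and the assumption that the input is already split into tokens are all illustrative.

```python
# Items assumed, for illustration, not to accept proper nouns.
NO_PROPER_NOUN_ITEMS = {"registered_domicile", "place_of_notification", "date"}

def decompose(tokens, dictionary):
    """Tag each token with its dictionary category, or provisionally as a proper noun."""
    return [(t, dictionary.get(t, "provisional_proper_noun")) for t in tokens]

def find_inconsistencies(items):
    """Return (item, token) pairs where a provisional proper noun sits in a disallowed item."""
    return [
        (item, token)
        for item, tagged in items.items()
        if item in NO_PROPER_NOUN_ITEMS
        for token, category in tagged
        if category == "provisional_proper_noun"
    ]
```

For example, a misread place name such as "Marunouohi" would not be found in the dictionary, would be tagged provisionally, and would then be flagged because it was classified into the registered domicile item. Each flagged pair is what would trigger the retry of recognition correction.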

[0047] Because the syntactic analysis unit 233 has already classified the information containing the character string judged to be a misrecognition into the appropriate item, the analysis control unit 236 may send, together with the retry instruction, the location where the inconsistency was detected and, as the reason for the inconsistency, a notice that the content does not fall within the range of information for the item concerned. In response to the retry instruction, the image synthesis unit 223 may again combine the image of the original family register with the character patterns representing the recognition result held in the code holding unit 221 and display the result on the display device 202.

[0048] At this time, the image synthesis unit 223 may highlight, in the image synthesized as described above, the part of the recognition result found inconsistent and the dot pattern corresponding to that part, to show where correction is needed. It may also create display data representing the reason for the inconsistency, and send both to the display device 202.

[0049] In this way, the units of the recognition complementation processing unit 220 operate under instructions from the analysis control unit 236 to realize the function of the correcting means 115, so that the recognition result can be corrected at the locations where the syntactic or semantic analysis detected an inconsistency. In this case, the recognition result for a part where an inconsistency was detected is treated in the same way as a part whose recognition result is undetermined. The part whose character recognition result is likely to be wrong is displayed, together with the recognition results before and after it, in an area adjacent to the corresponding dot pattern, and the reason for each inconsistency is displayed as well.

[0050] The operator can therefore infer the correct character string using three clues: the range of information types that fit the item indicated by the inconsistency reason, the image data of the area concerned, and the recognition results before and after it. The operator enters the inferred string as candidate characters into the candidate input unit 224 via a keyboard or the like, and the replacement processing unit 225 replaces the corresponding codes in the code holding unit 221 with the character codes of the candidate characters, thereby correcting the character recognition result for that part.

[0051] This makes it possible to correct selectively the parts where the reading result is likely to be wrong, and to do so on the basis of information from several angles. In particular, because the reason for the inconsistency is provided, the operator can narrow down the range of characters that could correspond to the image data of the area concerned. This supports the operator's correction work and yields more accurate reading results.

[0052] Furthermore, the address area of the morpheme dictionary 232 may hold information on place name changes, such as changes of town names, together with the place names in each period. If the semantic analysis unit 234, when determining the consistency of information classified into an item that holds a place name, refers to the information in the item holding the corresponding date and to this place name change information, a more precise determination becomes possible.

[0053] In this case, for example, a morpheme corresponding to the misrecognized result may be retrieved from the address area of the morpheme dictionary 232, based on the neighboring place names and the place name change information, and provided to the recognition complementation processing unit 220 as an example candidate character string. This uses the contents of the morpheme dictionary 232 and of related entries in the original family register to support character recognition from the image more strongly.
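The date-aware place name check and the candidate lookup described in the two paragraphs above can be sketched together. The history table, its place names, and its dates are made up for illustration; the patent does not specify how the change information is stored.

```python
from datetime import date

# Hypothetical change history: name -> (valid_from, valid_until).
# None means the name is still in use. Names and dates are invented.
PLACE_HISTORY = {
    "Old Town": (date(1900, 1, 1), date(1960, 3, 31)),
    "New Town": (date(1960, 4, 1), None),
}

def place_valid_on(name, when, history=PLACE_HISTORY):
    """Return True if the place name was in use on the given date."""
    if name not in history:
        return False
    start, end = history[name]
    return start <= when and (end is None or when <= end)

def candidate_names(when, history=PLACE_HISTORY):
    """List place names in use on the given date, as correction candidates."""
    return sorted(name for name in history if place_valid_on(name, when, history))
```

When a recognized place name fails `place_valid_on` for the date written in the related item, `candidate_names` gives the names that would have been valid at that time, which could be offered to the operator as candidate strings.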

[0054] In addition, by devising the structure of the pattern dictionary 213, the system can flexibly handle specially created characters outside the Joyo and Toyo kanji sets (hereinafter "created characters") and use them in subsequent character recognition. FIG. 6 shows the configuration of an embodiment of a family register information filing system to which the invention of claim 5 is applied.

[0055] In FIG. 6, the family register information filing system is the system shown in FIG. 3 with an added registration processing unit 250, which corresponds to the registering means 133. In response to an instruction from the operator, the registration processing unit 250 associates a new character code with the dot pattern of a designated area and registers it in the created-character area 214 provided in the pattern dictionary 213 of the character recognition device 210.

[0056] In the registration processing unit 250, the image cutout unit 251 reads the designated dot pattern from the image buffer 201 in response to a registration instruction from the user, and the pattern creation unit 252 creates, from this dot pattern, a created-character pattern to be newly registered as a character pattern. The pattern creation unit 252 may, for example, apply thinning to the dot pattern of the designated area to extract a pattern in which at least one line segment is arranged in a specific positional relationship, and send this pattern to the write processing unit 254 as the matching character pattern for the character represented by the dot pattern. If the original dot pattern is the image of a brush-written character, the dot pattern itself may be used as the matching character pattern for brush script.

[0057] In response to the registration instruction, the code determination unit 253 searches the created-character area 214 for an unregistered character code and outputs it as the character code for the new character pattern. The write processing unit 254 writes the created-character pattern and the dot pattern itself into the created-character area 214 of the pattern dictionary 213 in association with this character code.
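The code assignment and write steps described above can be sketched as follows. The code range for the created-character area and the dictionary layout are assumptions; the text gives neither.

```python
# Assumed code range for the created-character area (not stated in the text).
AREA_START, AREA_END = 0xF000, 0xF0FF

def next_free_code(area):
    """Return the first code in the area that has no pattern registered yet."""
    for code in range(AREA_START, AREA_END + 1):
        if code not in area:
            return code
    raise RuntimeError("created-character area is full")

def register(area, matching_pattern, dot_pattern):
    """Store both patterns under a newly assigned code and return that code."""
    code = next_free_code(area)
    area[code] = {"matching_pattern": matching_pattern, "dot_pattern": dot_pattern}
    return code
```

Here `area` is a plain dict standing in for the created-character area of the pattern dictionary, so successive registrations receive consecutive free codes.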

[0058] In this way, when the matching processing unit 212, which corresponds to the matching means 132, finds no corresponding character pattern in the pattern dictionary 213, a new character pattern can be registered as a created-character pattern as needed. For example, for a dot pattern left undetermined by the matching processing unit 212, the operator compares it against various candidate characters using the recognition complementation processing unit 220. If the operator then judges that the dot pattern corresponds to a created character, the operator may enter a registration instruction using the keyboard or the like to start the registration processing described above.

[0059] In this case, when a character code for a created character is input, the decomposition processing unit 231 of the analysis processing unit 230 may decompose the character string containing this code as a proper noun and pass it to the syntactic analysis unit 233 and the semantic analysis unit 234. As a result, the text information obtained as the recognition result can be classified by item through the processing of the analysis processing unit 230, whether or not it contains created characters.

[0060] Once a new created character has been registered in the pattern dictionary 213 as described above, the matching processing unit 212 and the recognition complementation processing unit 220 can thereafter perform character recognition that includes this character, which improves the recognition rate. Furthermore, the analysis processing unit 230 may determine, based on the processing result of the syntactic analysis unit 233, the appropriate part of speech (for example, family name or given name) for a proper noun containing a created character and register it as a new entry under that part of speech in the morpheme dictionary 232. The proper noun can thereafter be handled like any other morpheme.

[0061] In this way, the system flexibly handles created characters, which often appear in personal names, strongly supports the recognition processing of the character recognition device 210, and can pass recognition results containing created characters to the itemization processing of the analysis processing unit 230, so the filing of family register information can proceed more efficiently. Furthermore, instead of comparing the new-format family register with the original family register on paper, the two can also be compared on the screen of the display device 202.

[0062] FIG. 7 shows the configuration of an embodiment of a family register information filing system to which the invention of claim 6 is applied. In FIG. 7, the system includes a collation data creation unit 261 in place of the output processing unit 240 shown in FIG. 3. Collation data created from the contents of the family register data file 203 is sent, via the pattern conversion unit 222, to the image synthesis unit 223 of the recognition complementation processing unit 220, where it is combined with the image of the original family register held in the image buffer 201.

[0063] The collation data creation unit 261 may, for example, compare the contents of the family register data file 203 with the text information obtained as the recognition result and convert every character code outside the overlapping parts into the character code for a blank. This yields collation data in which the information for each item sits at the position it occupies in the original text information and everything else is blank.
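The blanking step described above can be sketched as follows. This is a simplification: it keeps every occurrence of each item value found by substring search, whereas a real system would presumably track the exact position each item was taken from. The function name and sample strings are hypothetical.

```python
def make_collation_data(recognized_text, item_values, blank=" "):
    """Keep characters that belong to an item value and blank out the rest.

    The output has the same length as the input, so each kept character stays
    at the position it occupied in the recognized text.
    """
    keep = [False] * len(recognized_text)
    for value in item_values:
        start = recognized_text.find(value)
        while start != -1 and value:
            for i in range(start, start + len(value)):
                keep[i] = True
            start = recognized_text.find(value, start + 1)
    return "".join(ch if k else blank for ch, k in zip(recognized_text, keep))
```

For instance, with the recognized text `"Domicile: Tokyo Name: Yamada"` and the item values `["Tokyo", "Yamada"]`, only those two strings survive, each at its original offset. Preserving offsets is what lets the non-blank positions double as display positions in the next step.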

[0064] In this case, the positions of the non-blank character codes in the collation data indicate where the information for each item should be displayed, and the collation data creation unit 261 thus realizes the function of the position designating means 141. The pattern conversion unit 222 receives this collation data in place of the recognition result from the code holding unit 221, retrieves the corresponding character patterns from the pattern dictionary 213 of the character recognition device 210, and sends them in sequence to the image synthesis unit 223.

[0065] In response, the image synthesis unit 223, in the same way as when combining the recognition result, places the received character patterns in sequence in areas adjacent to the ranges over which the dot patterns representing each character are distributed in the image of the original family register, and sends the combined image to the display device 202. In this way, the pattern conversion unit 222 and the image synthesis unit 223 operate in response to the input of the collation data to realize the function of the second synthesizing means 142. The image of the original family register is combined with a series of character patterns representing the information of each item and displayed on the display device 202, which corresponds to the second display means 143.

[0066] The operator can thus carry out the collation work while comparing, at very close range, the information written in the original family register with the itemized information. The operator can intuitively and accurately grasp which information in the original corresponds to each item and work efficiently, which greatly reduces the workload. If the image synthesis unit 223 also highlights, in the image of the original family register, the dot pattern area corresponding to each item, the correspondence becomes even easier to grasp.

[0067] Furthermore, the pattern conversion unit 222 may be configured so that, in response to an instruction from the operator, it retrieves brush-script character patterns from the pattern dictionary 213 for a designated item instead of standard-typeface character patterns. This realizes the function of the converting means 125 described in claim 4, and for parts written with a brush in the original family register the information of the corresponding item can be displayed in brush script.

[0068] By displaying the information of an item adjacent to the original image data in a typeface similar to that of the original family register, the operator can intuitively collate the information written in the original against the itemized information as a match or mismatch of dot patterns. This further reduces the operator's workload during collation.

[0069] In addition, because the collation work is performed on the screen as described above, when an inconsistency is found during collation the operator can move directly to the processing of the recognition complementation processing unit 220. For example, the operator may enter a candidate character string into the candidate input unit 224 of the recognition complementation processing unit 220 via the keyboard and have the replacement processing unit 225 replace the information of the item concerned with the character codes of the candidate character string.

[0070] Since detected errors can be corrected one by one as the collation work proceeds, the operability of the collation work and of the final correction work that accompanies it is greatly improved, and the filing of family register information can proceed efficiently. As described above, the present invention can complement the recognition result of the character recognition device through the processing of the recognition complementation processing unit 220 and the analysis processing unit 230 and through collation using the recognition complementation processing unit 220. It can therefore be applied to the filing of various documents that could not previously be handled by this kind of filing work.

[0071] For example, a pattern dictionary 213 covering cursive (sosho) and semi-cursive (gyosho) scripts may be prepared for the character recognition device 210, and the morphemes and syntax rules of classical Japanese may be stored in the morpheme dictionary 232 and the syntax rule holding unit 235, respectively. The system of the present invention can then also be applied to the filing of old documents and the like, contributing greatly to the preservation and use of valuable cultural assets.

[0072]

[Effects of the Invention] As described above, the invention of claim 1 adopts morphological analysis so that recognition results can be classified automatically by item. This automates the itemization work as well as the reading and input of the character information written in the original, greatly reducing the user's workload.

[0073] The invention of claim 2 corrects the recognition result in light of the morphological analysis result, so that when a recognition error occurs it can be corrected using the connections between the surrounding character strings as a clue, complementing the recognition processing of the character recognizing means. The invention of claim 3 displays an unrecognized dot pattern and the candidate characters side by side, so that the result of correction work performed while comparing them carefully can be taken as the recognition result. This complements the recognition processing of the character recognizing means and yields a more accurate recognition result.

[0074] In particular, by converting the candidate pattern into the typeface corresponding to the unrecognized character before displaying it, the recognition result can be corrected while taking into account the features of character shapes in that typeface, which supports the recognition processing of the character recognizing means even more strongly. Applying the invention of claim 5 and registering new character patterns in the pattern dictionary as needed can be expected to improve the recognition rate further.

[0075] According to the invention of claim 6, the information classified by item can be displayed superimposed on the image of the original document, so that the information of each item can be compared carefully with the corresponding part of the original. This strongly supports the collation work and greatly reduces the burden on the user.

[Brief description of drawings]

FIG. 1 is a principle block diagram of the document filing system according to claims 1, 5, and 6.

FIG. 2 is a principle block diagram of the document filing system according to claims 2 to 4.

FIG. 3 is a configuration diagram of an embodiment of a family register information filing system to which the inventions of claims 1 and 3 are applied.

FIG. 4 is a diagram illustrating the image synthesis processing.

FIG. 5 is a configuration diagram of an embodiment of a family register information filing system to which the invention of claim 2 is applied.

FIG. 6 is a configuration diagram of an embodiment of a family register information filing system to which the invention of claim 5 is applied.

FIG. 7 is a configuration diagram of an embodiment of a family register information filing system to which the invention of claim 6 is applied.

FIG. 8 is a diagram showing a configuration example of a conventional family register information filing system.

[Explanation of symbols]

111 character recognition means
112 decomposing means
113 classifying means
114 inconsistency detecting means
115 correcting means
120 complementing means
121 input means
122 first synthesizing means
123 first display means
124 confirming means
125 converting means
126 image replacing means
131, 213 pattern dictionary
132 collating means
133 registering means
141 position designating means
142 second synthesizing means
143 second display means
201 image buffer
202 display device (display)
203, 313 family register data file
210 character recognition device
211 area extraction unit
212 pattern collation unit
214 character creation area
220 recognition complement processing unit
221 code holding unit
222 pattern conversion unit
223 image synthesis unit
224 candidate input unit
225 replacement processing unit
230 analysis processing unit
231 decomposition processing unit
232 morpheme dictionary
233 syntax analysis unit
234 semantic analysis unit
235 syntax rule holding unit
236 analysis control unit
237 correction processing unit
240 output processing unit
250 registration processing unit
251 image cutout unit
252 pattern creation unit
253 code determination unit
254 write processing unit
261 collation data creation unit
301 microfilm
302 microfilm reader
303 copy
304 collation list
310 document filing system
311 keyboard
312 edit processing unit
314 collation list creation unit

Continuation of front page
(51) Int.Cl.⁶, identification symbol, office reference number, FI, technical display location: 9194-5L G06F 15/403 310C 15/62 330D
(72) Inventor: Kyosuke Hirono, c/o Fujitsu Keihin System Engineering Co., Ltd., 2-8-4 Kitasaiwai, Nishi-ku, Yokohama-shi, Kanagawa
(72) Inventor: Hideyuki Yoshida, c/o Fujitsu Keihin System Engineering Co., Ltd., 2-8-4 Kitasaiwai, Nishi-ku, Yokohama-shi, Kanagawa

Claims

[Claims]

1. A document filing system that reads characters which are written on a manuscript according to a certain format and carry information on a plurality of items, converts the characters into character codes, and stores them, the system comprising: character recognition means for recognizing each character based on the dot patterns representing the characters included in an image corresponding to the manuscript and outputting the corresponding character code as a recognition result; decomposing means for decomposing the text information obtained as the recognition result into the parts of speech that constitute it; and classifying means for analyzing the relationships among the series of parts of speech obtained by the decomposing means, classifying the information represented by the series of parts of speech into the plurality of items, and submitting it to storage processing.

2. A document filing system that reads characters written on a manuscript, converts the characters into character codes, and stores them, the system comprising: character recognition means for recognizing each character based on the dot patterns representing the characters included in an image corresponding to the manuscript and outputting the corresponding character code as a recognition result; decomposing means for decomposing the text information obtained as the recognition result into the parts of speech that constitute it; inconsistency detecting means for analyzing the relationships among the series of parts of speech obtained by the decomposing means and detecting inconsistencies in the information represented by the characters; and correcting means for correcting the text information of the recognition result based on the detection result of the inconsistency detecting means and submitting it to storage processing.

3. The document filing system according to claim 1, further comprising complementing means for complementing the recognition result for a dot pattern that the character recognition means could not recognize and sending the result to the decomposing means, wherein the complementing means comprises: input means for inputting a candidate character corresponding to the unrecognized dot pattern; first synthesizing means for synthesizing a character pattern representing the candidate character into the area, of the image corresponding to the manuscript, adjacent to the unrecognized dot pattern; first display means for displaying the image obtained by the first synthesizing means; and confirming means for, in response to input of a confirmation instruction, confirming the candidate character input via the input means as the recognition result of the corresponding dot pattern and sending it to the decomposing means.

4. The document filing system according to claim 3, wherein the first synthesizing means comprises: image replacing means for, in response to input of a character pattern representing a candidate character, replacing the image information of the area adjacent to the unrecognized dot pattern with that character pattern; and converting means for, in response to input of an instruction designating a typeface, converting the candidate character into a character pattern of the designated typeface and sending it to the image replacing means.

5. The document filing system according to claim 1 or 2, wherein the character recognition means comprises: a pattern dictionary that stores, in association with each character code, a character pattern representing the corresponding character; collating means for, in response to input of a dot pattern representing each character included in the image corresponding to the manuscript, collating the dot pattern with each of the character patterns stored in the pattern dictionary and outputting, as the recognition result, the character code corresponding to the character pattern that matches the dot pattern; and registering means for, according to the collation result of the collating means, assigning a new character code to the character pattern corresponding to the dot pattern and registering it in the pattern dictionary.

6. The document filing system according to claim 1, further comprising: position designating means for designating, based on the position that each piece of classified information occupies in the text information of the recognition result, the range of dot patterns on the manuscript corresponding to that information as an item area; second synthesizing means for synthesizing a character pattern representing the corresponding information into the area adjacent to each item area of the image data corresponding to the manuscript; and second display means for displaying the image data obtained by the second synthesizing means.
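
As an informal illustration of the decomposition, classification, and inconsistency detection recited in claims 1 and 2, the following sketch may help. It is not part of the claims and not the method of the embodiment: the longest-match segmentation, the four-entry morpheme dictionary, the single classification rule (a place name is attached to the event that follows it), and the names `decompose`, `classify`, and `find_inconsistencies` are all assumptions chosen to keep the example small.

```python
from typing import Dict, List, Optional, Tuple

Morpheme = Tuple[str, str]  # (surface text, part-of-speech tag)

# Hypothetical morpheme dictionary; a real one would be far larger.
MORPHEME_DICT: Dict[str, str] = {
    "東京": "place",
    "東京都": "place",
    "で": "particle",
    "出生": "event",   # "birth"
}


def decompose(text: str) -> List[Morpheme]:
    """Split text by longest dictionary match; unmatched characters are tagged 'unknown'."""
    result: List[Morpheme] = []
    i = 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            surface = text[i:i + length]
            if surface in MORPHEME_DICT:
                result.append((surface, MORPHEME_DICT[surface]))
                i += length
                break
        else:
            # No dictionary entry starts here: keep the character but mark it.
            result.append((text[i], "unknown"))
            i += 1
    return result


def classify(morphemes: List[Morpheme]) -> Dict[str, str]:
    """Attach each place to the event that follows it, keyed as '<event> place'."""
    items: Dict[str, str] = {}
    pending_place: Optional[str] = None
    for surface, tag in morphemes:
        if tag == "place":
            pending_place = surface
        elif tag == "event" and pending_place is not None:
            items[f"{surface} place"] = pending_place
            pending_place = None
    return items


def find_inconsistencies(morphemes: List[Morpheme]) -> List[int]:
    """Return the indexes of morphemes that the dictionary could not explain."""
    return [i for i, (_surface, tag) in enumerate(morphemes) if tag == "unknown"]
```

For example, if the recognizer misreads 都 as 郡, `decompose("東京郡で出生")` leaves 郡 tagged as unknown, and `find_inconsistencies` reports its index so that the text can be corrected before it is stored.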