JP2890788B2

JP2890788B2 - Document recognition device

Info

Publication number: JP2890788B2
Application number: JP2271865A
Authority: JP
Inventors: 昇清水
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1990-10-09
Filing date: 1990-10-09
Publication date: 1999-05-17
Anticipated expiration: 2014-05-17
Also published as: JPH04147391A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to an apparatus that recognizes a document image and creates an electronic document.

[Conventional technology]

With the advance of character recognition technology in recent years, various OCR (Optical Character Reader) devices have been developed.

Methods proposed so far for extracting the elements that make up a page (characters, figures and tables, photographs, and so on) without prior information about the format include methods that use the run lengths of both black and white pixels, methods that use the pixel density within a window of fixed size, and methods that use the neighborhood line density method. These methods are mainly intended to separate grayscale figures from binary figures, and figure and table regions from character regions, on the premise of image storage and editing or image transmission.
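As a rough illustration of the first of these approaches, the run lengths of black and white pixels on one scan line can be computed as below. The function name `run_lengths` and the 0/1 pixel encoding are assumptions made for this sketch; the cited methods themselves are not specified in this document.

```python
from itertools import groupby


def run_lengths(row):
    """Return (pixel_value, run_length) pairs for one scan line.

    Assumed encoding for this sketch: 0 = white pixel, 1 = black pixel.
    """
    return [(value, sum(1 for _ in group)) for value, group in groupby(row)]


row = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0]
print(run_lengths(row))  # [(0, 2), (1, 3), (0, 1), (1, 1), (0, 3)]
```

Long white runs tend to indicate gaps between columns or lines, while short alternating runs are typical of text, which is why run lengths are a plausible feature for separating regions.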

On the other hand, as a method of extracting character strings on the premise of character recognition, a method based on tracking connected components of black pixels has been reported. There is also a method in which rough information about the format, such as the positions of columns, is given in advance and used as a clue for extracting the page components.
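The connected-component approach mentioned above can be illustrated with a standard breadth-first labeling of black pixels. This is a generic textbook technique swapped in for illustration, not the reported method; the function name, the 4-connectivity choice, and the 0/1 encoding are all assumptions.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (left, top, right, bottom) of 4-connected black regions.

    `image` is a list of rows; 1 = black pixel, 0 = white pixel (assumed encoding).
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Start a new component and flood outward from this pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


image = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
print(connected_components(image))  # [(0, 0, 1, 1), (4, 0, 4, 1), (2, 2, 2, 2)]
```

Bounding boxes of this kind are the raw material from which character rectangles and text lines are typically grouped.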

Furthermore, the paper by Kida and Masuda, "Page Component Extraction Method Not Relying on Format Specification Information," Transactions of the Institute of Electronics and Communication Engineers, 83/1, Vol. J66-D, No. 1, pp. 111-118, proposes a method that automatically extracts page components without using any prior information about their positions on the page. The pages it handles are composed of field separators (straight-line figures that forcibly divide regions on the page) and the character strings of headlines and body text.

When the arrangement of page components, that is, the layout, is recognized automatically in this way, it is sometimes recognized incorrectly. For this reason, a human operator has had to correct the automatically recognized layout.

In a conventional document recognition device, recognition results are corrected by correcting only the layout when the layout is being corrected, and only the characters when the characters are being corrected. Consequently, even when a character recognition error is caused by incorrect recognition of the layout structure, the correction ends with the layout alone, and the characters are not corrected automatically.

To give a concrete example, consider an original image such as that shown in FIG. 3(a), consisting of two columns of text with characters such as line numbers between them. Layout analysis should divide it into the two columns and the characters between them, as in FIG. 3(b). However, because the intermediate characters and the character lines in the blocks on either side are close together horizontally, the intermediate characters may be merged with the lines of the adjacent columns, as in FIG. 3(c). In that case, as shown by region 6 enclosed by a broken line in FIG. 3(d), one line of the left column, the intermediate characters, and one line of the right column are treated as a single continuous line for context processing. The portion that should be context-processed as "気です。明日10ロークでゴー" is then wrongly context-processed as "気です。明日100−7でゴー", as shown in FIG. 3(e). This happens because, as described later, recognition of digits is sometimes given priority.

When layout analysis is performed incorrectly in this way, the knowledge processing, which performs context processing on the character recognition results to produce text that makes sense, cannot function properly, and an incorrect recognition result such as that shown in FIG. 3(e) is output.

In such a case the operator corrects the layout manually. Conventionally, however, even in cases where correct layout analysis would have allowed the knowledge processing of the character recognition results to give the right answer, the character recognition results were not corrected automatically when the operator corrected the layout.

[Problems to be solved by the invention]

Thus, in a conventional document recognition device, even when the operator corrects an error in the layout analysis, the correction does nothing to improve the character recognition results.

The present invention has been made to solve the above problem, and its object is to improve the efficiency of correcting recognition results.

[Means for solving the problem]

To achieve the above object, the document recognition device of the present invention comprises: an image input unit that inputs a document as an image; a layout analysis unit that extracts the layout of the document image input by the image input unit; a character recognition unit that recognizes the character portions based on the layout information obtained from the layout analysis unit and outputs candidate characters together with the recognition result; a knowledge processing unit that performs context processing on the character recognition results of the character recognition unit; a correction interface unit that displays the layout information from the layout analysis unit and the character recognition results from the character recognition unit and allows the layout to be corrected; and a reprocessing instruction unit that instructs the knowledge processing unit to reprocess based on the layout correction information from the correction interface unit.

[Action]

According to the present invention, a general document is input as a digital image from the image input unit, and the layout analysis unit extracts the layout of the original image input from the image input unit. The character recognition unit then performs character recognition, including the output of candidate characters, on the regions that layout analysis has judged to be character blocks. Next, the knowledge processing unit performs context processing on the character recognition results, including the candidate characters, to produce text that makes sense. The correction interface unit then displays the character recognition results, as passed through the layout analysis unit and the knowledge processing unit, on the recognition result display unit, and the operator makes corrections using an input device such as a keyboard or mouse. If the layout information is changed as a result, the reprocessing instruction unit feeds the layout correction information back to the knowledge processing unit, and the newly knowledge-processed result is displayed on the recognition result display unit. The target document is created by repeating this correction process.

[Example]

The features of the present invention are described concretely below, based on the embodiment shown in the drawings.

FIG. 1 is a configuration diagram of the document recognition device of this embodiment, and FIG. 2 is a configuration diagram of the correction unit used in the document recognition device.

The document recognition device comprises an image input unit 1 that inputs a document as an image; a layout analysis unit 2 that extracts the layout of the input document image; a character recognition unit 3 that recognizes the character portions based on the obtained layout information and outputs candidate characters together with the recognition result; a knowledge processing unit 4 that performs grammatical processing and the like on the character recognition results; a correction interface unit 51 that displays the layout information and the character recognition results and allows the layout to be corrected; and a reprocessing instruction unit 52 that instructs the knowledge processing unit 4 to reprocess based on this correction information.

The correction interface unit 51 and the reprocessing instruction unit 52 together make up the correction unit. The correction interface unit 51 consists of a keyboard/mouse 511 serving as the input device, a recognition result display unit 512 that displays the recognition results of the character recognition unit 3, and a correction processing control unit 513 that receives the character recognition results. The correction processing control unit 513 also has a function for creating documents based on input from the keyboard/mouse 511, that is, a word processor function.

Next, the operation of the document recognition device described above is explained.

First, the document image input and binarized by the image input unit 1 is analyzed by the layout analysis unit 2. The analysis uses a method such as the one proposed in the reference cited above, "Page Component Extraction Method Not Relying on Format Specification Information," that is, a method that automatically extracts page components from a page composed of field separators (straight-line figures that forcibly divide regions on the page) and headline and body character strings, without using any prior information about positions on the page. The character recognition unit 3 then performs character recognition on the blocks judged to be character-only regions.

A known method is used for character recognition. In addition to the first-ranked recognized character, the character recognition unit 3 also outputs several candidate characters that are plausible readings, ranked by recognition confidence. The knowledge processing unit 4 applies grammatical processing and the like to the recognition results output from the character recognition unit 3 and converts them into a character string that conforms to a prepared grammar.

An example of this processing is described using the image shown in FIG. 3. Suppose that, for the original character string, the recognition results including candidate characters output from the character recognition unit 3 are as shown in Table 1.

Given recognition results including candidate characters such as those in Table 1, if the character immediately preceding the character of interest in the first-ranked character string is a digit, the character of interest is also likely to be a digit. Therefore, if the first-ranked character is not a digit but the second-ranked character is, the second-ranked character is output as the recognition result. The same is done for the character types other than digits: kanji, hiragana, and katakana. For the character string "10ローク" in Table 1, using only the first-ranked characters gives the correct string "10ローク". However, if "0−7" exists as the second-ranked candidates for the recognition result "ローク", applying the knowledge processing described above pulls in the intermediate characters, that is, the line number "10", and generates the character string "100−7". In this way, the knowledge processing unit 4 automatically revises the result of the character recognition unit 3.
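The digit-priority rule described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `is_numeric` and `prefer_numeric`, the representation of candidates as ranked lists, and the candidate values (illustrative stand-ins for Table 1) are all assumptions. Two further assumptions are made so that the sketch reproduces the "100−7" outcome: the rule looks at the previously output character, and the minus sign is counted as numeric.

```python
from typing import List

MINUS = "\u2212"  # the minus sign that appears in "100−7"


def is_numeric(ch: str) -> bool:
    """Assumption for this sketch: digits and minus signs form the numeric class."""
    return ch.isdigit() or ch in ("-", MINUS)


def prefer_numeric(candidates: List[List[str]]) -> str:
    """Choose one character per position from ranked candidate lists.

    If the previously output character is numeric, the first-ranked candidate
    is not, and the second-ranked candidate is, the second-ranked one is used.
    """
    out: List[str] = []
    for ranked in candidates:
        choice = ranked[0]
        if out and is_numeric(out[-1]) and not is_numeric(ranked[0]):
            if len(ranked) > 1 and is_numeric(ranked[1]):
                choice = ranked[1]
        out.append(choice)
    return "".join(out)


# Right-hand column line, with hypothetical second-ranked candidates.
right_line = [["ロ", "0"], ["ー", MINUS], ["ク", "7"], ["で"], ["ゴ"], ["ー"]]

# Wrong layout: left column, line number "10", and right column merged into one line.
merged_line = [["明"], ["日"], ["1"], ["0"]] + right_line

print(prefer_numeric(merged_line))  # 明日100−7でゴー
print(prefer_numeric(right_line))   # ロークでゴー
```

Run on the right-hand column alone, the same rule leaves "ロークでゴー" untouched, which is why rerunning the knowledge processing after a layout correction can fix the text without any character-level editing.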

However, if the character string to be processed has not been cut out correctly, that is, if the layout analysis unit 2 has not produced the intended layout, this processing lowers the recognition rate, as shown in FIG. 3. The layout analysis results (block type, position and size of each character, and so on), the character recognition results after knowledge processing (including candidate characters), and a table indicating which position in which block each recognized character corresponds to are also sent to the correction unit 5.

In the correction unit 5, the correction processing control unit 513 of the correction interface unit 51 first receives the layout analysis results and the knowledge-processed character recognition results, and displays the layout and the character recognition results on the recognition result display unit 512.

The operator then makes corrections using an input device such as the keyboard/mouse 511 while viewing the recognition result display unit 512.

When displaying the results, blocks can be color-coded, for example blue for character-only blocks and red for figure blocks, so that the operator can easily see the type of each block.

Blocks that have been corrected and blocks that have not can also be distinguished by color. In the correction procedure, the operator first corrects the layout, which is easy to check visually. A correction changes the type of a block (character-only block, figure block, and so on), its position, its size, and the like. The table held by the correction processing control unit 513, which contains information such as each block's position, size, and the characters it contains, is rewritten accordingly.

When the layout correction is finished and the intended layout has been obtained, the reprocessing instruction unit 52 instructs the knowledge processing unit 4 to reprocess the portions of the layout that were changed.
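The selective reprocessing step might look like the following sketch. The `Block` record, its `dirty` flag, and the function names are hypothetical; the text only states that the reprocessing instruction unit 52 asks the knowledge processing unit 4 to reprocess the changed layout portions. The stand-in knowledge processing here simply joins the first-ranked candidates.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Block:
    """One row of a hypothetical block table kept by the correction unit."""
    block_id: int
    kind: str                          # "text" or "figure"
    bbox: tuple[int, int, int, int]    # (left, top, right, bottom)
    candidates: list[list[str]] = field(default_factory=list)  # ranked candidates per character
    text: str = ""
    dirty: bool = False                # set when the operator changes this block


def first_ranked(candidates: list[list[str]]) -> str:
    """Stand-in for knowledge processing: take the first-ranked character everywhere."""
    return "".join(ranked[0] for ranked in candidates)


def reprocess_changed(blocks: list[Block],
                      knowledge: Callable[[list[list[str]]], str]) -> list[int]:
    """Rerun knowledge processing only on text blocks the operator changed."""
    done = []
    for block in blocks:
        if block.kind == "text" and block.dirty:
            block.text = knowledge(block.candidates)
            block.dirty = False
            done.append(block.block_id)
    return done


# After the operator splits the merged region into separate blocks,
# only the newly created blocks are marked dirty.
blocks = [
    Block(1, "text", (0, 0, 200, 30), [["気"], ["で"], ["す"]], text="気です"),
    Block(2, "text", (210, 0, 230, 30), [["1"], ["0"]], dirty=True),
    Block(3, "text", (240, 0, 440, 30), [["ロ", "0"], ["ー", "\u2212"], ["ク", "7"]], dirty=True),
]
print(reprocess_changed(blocks, first_ranked))  # ids of the reprocessed blocks
print([b.text for b in blocks])
```

Tracking changes per block keeps the cost of each correction proportional to what the operator actually touched, and the same loop works whether it is triggered once at the end or after every individual correction.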

As a result, when a document with a layout like that of FIG. 3(a) has been analyzed as in FIG. 3(c), the operator only needs to correct the layout as in FIG. 3(b) for the knowledge processing unit to process the incorrectly analyzed region 6 correctly as well.

Finally, any remaining errors in the character recognition results are corrected using the word processor function of the correction processing control unit 513. Through these steps, the document the operator intends is created.

The above example describes an embodiment for document creation only, but the invention can also be used for form (slip) entry.

In the above embodiment, the reprocessing instruction is issued to the knowledge processing after all layout corrections have been completed, but it is also possible to issue the instruction each time an individual layout correction is made.

The above embodiment describes only reprocessing by the knowledge processing, but reprocessing by the character recognition is also possible.

A reprocessing instruction is issued to the character recognition when the segmentation of individual characters, the smallest unit in layout analysis, has failed.

Specifically, if the character "林" is not segmented correctly, it is recognized as two characters, "木" and "木". In this case, the operator corrects the two rectangles enclosing "木" and "木" so that they are treated as a single character, which issues a reprocessing instruction to the character recognition unit.
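The rectangle correction in this example amounts to taking the union of the two character boxes and passing the merged box back to the recognizer. The sketch below assumes boxes are `(left, top, right, bottom)` tuples; `merge_boxes`, `rerecognize_as_one`, and the `recognize` callback are hypothetical names, and the coordinates are made up.

```python
from typing import Callable, Iterable, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def merge_boxes(boxes: Iterable[Box]) -> Box:
    """Return the smallest rectangle that encloses all the given rectangles."""
    boxes = list(boxes)
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def rerecognize_as_one(boxes: Iterable[Box], recognize: Callable[[Box], str]) -> str:
    """Merge the operator-selected boxes and rerun recognition on the merged region.

    `recognize` stands in for the character recognition unit and is supplied by the caller.
    """
    return recognize(merge_boxes(boxes))


# Two rectangles that were wrongly segmented as separate "木" characters.
left_tree = (100, 40, 115, 72)
right_tree = (117, 40, 132, 72)
print(merge_boxes([left_tree, right_tree]))  # (100, 40, 132, 72)
```

Passing the recognizer in as a callback keeps the geometry step independent of whichever recognition method is actually used.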

[Effect of the invention]

As described above, according to the present invention, the result of a layout correction is processed by the knowledge processing unit, so that the knowledge processing works as intended and the efficiency of character correction during document creation is markedly improved.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of a document recognition device showing an embodiment of the present invention, FIG. 2 is a configuration diagram of the correction unit used in the document recognition device, and FIG. 3 is a diagram illustrating the problems of the conventional example.

1: image input unit
2: layout analysis unit
3: character recognition unit
4: knowledge processing unit
5: correction unit
51: correction interface unit
511: keyboard/mouse
512: recognition result display unit
513: correction processing control unit
52: reprocessing instruction unit
6: incorrectly analyzed layout region

Claims

(57) [Claims]

A document recognition device comprising: an image input unit that inputs a document as an image; a layout analysis unit that extracts the layout of the document image input by the image input unit; a character recognition unit that recognizes the character portions based on the layout information obtained from the layout analysis unit and outputs candidate characters together with the recognition result; a knowledge processing unit that performs context processing on the character recognition results of the character recognition unit; a correction interface unit that displays the layout information from the layout analysis unit and the character recognition results from the character recognition unit and allows the layout to be corrected; and a reprocessing instruction unit that instructs the knowledge processing unit to reprocess based on the layout correction information from the correction interface unit.