JP2701350B2

JP2701350B2 - Document reading device

Info

Publication number: JP2701350B2
Application number: JP63211146A
Authority: JP
Inventors: 善丈辻
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-08-25
Filing date: 1988-08-25
Publication date: 1998-01-21
Anticipated expiration: 2013-01-21
Also published as: JPH0259880A

Description

DETAILED DESCRIPTION OF THE INVENTION (Industrial Field of Application) The present invention relates to a document reading apparatus that extracts an arbitrary character reading area from a document image of a book or the like and performs character reading, and in particular to a document reading apparatus designed to reduce the burden on the user.

(Prior Art) Automatically reading a desired reading area in an existing document image, such as a page of a general book, with a character recognition device is important for achieving efficient storage and transmission of existing document images. When such an existing document is read, a problem arises in that the area to be read, which may be the whole document or only a part of it such as an abstract, is not uniquely determined but depends on how the document is used. In addition, existing document images may contain multiple columns as well as figures and tables.

Conventionally, the most common way of determining the area to be read from such an existing document image is a first method, in which the reading area is obtained as a rectangular area by designating two points on the document image.

Also known is a second method, shown for example in Japanese Patent Application No. 61-281377, "Image Understanding Method", in which the area to be extracted is obtained according to a grammar that defines a document image as a set of rectangular areas.

Meanwhile, as a method of dividing a general document image containing a two-column layout, figures, tables and the like into basic elements such as character lines, figures and tables and automatically extracting a desired area, a third method is known. It is described, for example, in the technical paper by the present inventor, "Structural Analysis of Page Images Based on the Split Detection Method" (Institute of Electronics and Communication Engineers of Japan Technical Report, Pattern Recognition and Learning, PRL85-17, 1985-6, pp. 63-70).

Further, a fourth method is known in which a document image is divided into basic elements such as character lines, figures and tables and is then structured as a tree that hierarchically expresses the elements constituting the document image and the arrangement relationships between them. It is described, for example, in Japanese Patent Application No. 62-172199, "Document Image Analysis Method", by the present inventor.

(Problems to be Solved by the Invention) In the first method described in the "Prior Art" section, at least two points must always be designated, while viewing the document image, in order to express the desired reading area as a rectangular area. When possible degradation of the image display accuracy on the display screen is also taken into account, the burden on the user becomes large. Moreover, for document images with two-column layouts and the like, the designation had to be made in several separate steps.

In the second method described in the "Prior Art" section, defining the positions and sizes of all the rectangular areas in absolute or relative coordinates requires considerable effort, and the coordinate-based definition of rectangular areas places a still greater burden on the user when the number of lines changes or when figures and the like are mixed in.

The third method described in the "Prior Art" section, by the present inventor, can automatically extract a desired area from an existing document image, but because the reading area is determined in a fixed manner, the mode of use is also fixed.

The fourth method described in the "Prior Art" section, also by the present inventor, describes how to structure the elements constituting a document image and the arrangement relationships between them as a tree, but it does not show a concrete method of determining the reading area selected by the user and performing character reading.

Thus, each of the four conventional methods described above for determining the area to be read from a document image has a problem to be solved.

Accordingly, an object of the present invention is to solve the above conventional problems by providing a document reading apparatus that reduces the burden on the user. The apparatus structures the elements constituting a document in an input image, and the arrangement relationships between those elements, as a tree structure; it then uses the tree structure to extract the layout structure of the document image and lets the user designate a reading area with a single point on an element area shown in the layout display.

Another object of the present invention is to provide a document reading apparatus that readily reduces the burden on the user by allowing a desired reading area, which may be required at various levels of an existing document image, to be designated with a single point and with relaxed constraints on the designated position.

(Means for Solving the Problems) To solve the above problems, the document reading apparatus provided by the present invention comprises: area dividing means for decomposing a document image into basic element blocks such as character lines and figures; document structuring means for sequentially structuring the plurality of basic element blocks into a tree structure that hierarchically expresses the element blocks constituting the document image and the arrangement relationships between the element blocks; layout extracting means for extracting, using the tree structure, and displaying the layout structure of element blocks at a predetermined level or of an element block selected by the redisplay area selecting means described below; redisplay area selecting means for selecting, from the one or more element blocks shown in the layout display, an element block for which detailed layout information is to be displayed; area determining means for determining a reading area designated with a single point from among the one or more element blocks shown in the layout display; line reading means for sequentially reading out only the character lines within the reading area, in the order obtained by a depth-first search of the tree structure; and character recognition means for segmenting each read character line into individual characters, matching them against a recognition dictionary, and sequentially outputting the character reading results.

(Embodiments) Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is an example for explaining a method of dividing a horizontally written document image into basic elements and then structuring them into the elements that constitute the document. FIG. 2 is an example showing the result of dividing the document image of FIG. 1 into basic elements and structuring them, so that the elements constituting the document image and the arrangement structure between those elements are generated as a tree structure.

In FIG. 1, the hatched circles represent characters. The document image can be divided into basic elements (the character line blocks denoted by the symbols Si (i = 1, 2, ..., 20) in the figure) by, for example, the method of the aforementioned paper by the present inventor, "Structural Analysis of Page Images Based on the Split Detection Method" (Institute of Electronics and Communication Engineers of Japan Technical Research Report, Pattern Recognition and Learning, PRL85-17, 1985-6). Needless to say, with the above prior art and the like, the image can be divided into basic elements even when, for example, figures are mixed in or the text is written vertically. Next, the processing that structures the basic elements step by step and generates the elements constituting the document image, and the arrangement structure between them, as a tree structure is described.

First, the text block is an important element constituting a document image. Here, a text block is defined as a set of character lines that share the same character arrangement direction and are arranged at no more than a predetermined line pitch. The text block described below may therefore be regarded as an area structured in units of paragraphs of an ordinary document.
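The text block definition above can be sketched as a simple grouping pass. This is only an illustration under stated assumptions: the names `CharLine` and `group_into_text_blocks` are hypothetical, the line pitch is taken as the distance between the top edges of consecutive lines, and a single column of lines is assumed. The patent itself refers to an earlier application for the actual structuring method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CharLine:
    name: str        # e.g. "S3" (hypothetical label)
    top: int         # y coordinate of the line's upper edge (assumed unit: pixels)
    direction: str   # character arrangement direction: "horizontal" or "vertical"


def group_into_text_blocks(lines: List[CharLine], max_pitch: int) -> List[List[CharLine]]:
    """Group consecutive lines with the same direction and a pitch <= max_pitch."""
    blocks: List[List[CharLine]] = []
    for line in sorted(lines, key=lambda l: l.top):
        if blocks:
            prev = blocks[-1][-1]
            # Assumption: pitch is measured top edge to top edge.
            if prev.direction == line.direction and line.top - prev.top <= max_pitch:
                blocks[-1].append(line)
                continue
        blocks.append([line])  # start a new text block
    return blocks


# Hypothetical coordinates: S3 to S5 are closely spaced, S6 is farther down.
lines = [
    CharLine("S3", 100, "horizontal"),
    CharLine("S4", 120, "horizontal"),
    CharLine("S5", 140, "horizontal"),
    CharLine("S6", 200, "horizontal"),
]
blocks = group_into_text_blocks(lines, max_pitch=25)
names = [[line.name for line in block] for block in blocks]
print(names)
```

With these made-up coordinates the first three lines fall into one block and the fourth starts a new one.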

For example, the character lines S₃, S₄ and S₅ in FIG. 1 are structured into the text block T₂ according to the above definition. In the following description, an area combining text blocks and basic element blocks such as character lines, figures, tables and photographs is called a virtual block. For example, the area M₃ formed by combining the character line block S₆ and the text block T₃ in FIG. 1 is a virtual block. Next, vertical, horizontal and inclusion relationships are introduced as the arrangement relationships between the element blocks constituting the document image. For example, in the figure, the character line blocks S₁ and S₂ are in a vertical relationship, and the virtual block M₃ and the text block T₄ are in a horizontal (left-right) relationship. The virtual block M₂, which represents the two-column area, and the virtual block M₃ are in an inclusion relationship.

When the areas are structured with the arrangement relationships described above taken into account, a tree structure such as that shown in FIG. 2 can be generated for the document image shown in FIG. 1. In FIG. 2, the nodes drawn as circles represent the element blocks, and the symbol inside each node denotes the corresponding element block of FIG. 1 (the symbol P denotes the one-page area). The arrows ↓ and → in the figure denote the vertical and horizontal arrangement relationships, respectively. For example, it is easy to see from FIG. 2 that the one-page area P contains two virtual blocks in a horizontal relationship. Each element block is assumed to hold, as its information, its position, size, element name, pointers indicating the arrangement relationships, and so on.
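The per-block information listed above (position, size, element name, and pointers expressing the arrangement relationships) could be held in a node type like the following. This is a minimal sketch: the class name `ElementBlock`, its field names, and the coordinates are assumptions made for illustration, and the inclusion relationship is represented here simply by the `children` list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ElementBlock:
    name: str                       # element name, e.g. "P", "M2", "T2", "S3"
    kind: str                       # "page", "virtual", "text", "line", "figure", ...
    x: int                          # position of the upper-left corner
    y: int
    width: int                      # size of the block
    height: int
    relation: Optional[str] = None  # arrangement of the children: "vertical" or "horizontal"
    children: List["ElementBlock"] = field(default_factory=list)  # inclusion relationship


# Hypothetical coordinates for text block T2 and its three character lines.
t2 = ElementBlock(
    "T2", "text", 50, 100, 500, 60, relation="vertical",
    children=[
        ElementBlock("S3", "line", 50, 100, 500, 20),
        ElementBlock("S4", "line", 50, 120, 500, 20),
        ElementBlock("S5", "line", 50, 140, 500, 20),
    ],
)
child_names = [child.name for child in t2.children]
print(t2.name, t2.relation, child_names)
```

Storing the relation on the parent rather than on each sibling pair keeps the node small; an implementation that needs explicit sibling pointers could add them without changing the rest of the structure.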

Now consider performing an ordinary depth-first search of the tree structure of FIG. 2, starting from the one-page area P, and extracting the character line blocks Si (i = 1, ..., 20) one after another: character line S₁ is found first, character line S₂ next, and character line S₃ last. That is, where a vertical relationship holds, the character lines are read out in order from top to bottom, and where a horizontal relationship holds, they are read out in order from left to right in the horizontally written example of FIG. 1, so the character lines are detected in the order in which the text should be read.
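The reading order described above can be sketched as a depth-first traversal that visits children top to bottom under a vertical relationship and left to right under a horizontal one. The `Node` type, the function name `read_lines`, and the small tree are hypothetical; the tree is a reduced stand-in, not a reproduction of FIG. 2, and every leaf is assumed to be a character line.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    name: str
    x: int = 0
    y: int = 0
    relation: str = "vertical"   # arrangement of the children
    children: List["Node"] = field(default_factory=list)


def read_lines(node: Node) -> Iterator[str]:
    """Yield leaf names (assumed to be character lines) in reading order."""
    if not node.children:
        yield node.name
        return
    # Vertical relationship: top to bottom. Horizontal: left to right.
    key = (lambda c: c.y) if node.relation == "vertical" else (lambda c: c.x)
    for child in sorted(node.children, key=key):
        yield from read_lines(child)


# Reduced, hypothetical tree; children are deliberately stored out of order.
left = Node("M3", x=0, y=100, children=[Node("S7", y=130), Node("S6", y=100)])
right = Node("T4", x=300, y=100, children=[Node("S8", x=300, y=100), Node("S9", x=300, y=120)])
columns = Node("M2", y=100, relation="horizontal", children=[right, left])
header = Node("M1", y=0, children=[Node("S1", y=0), Node("S2", y=20)])
page = Node("P", children=[columns, header])

order = list(read_lines(page))
print(order)
```

For vertically written text the horizontal case would presumably run right to left instead; that variation is not shown here.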

The structuring method that also covers the arrangement relationships of the elements of the document image, as shown in FIG. 2, can be realized by using, for example, the method described in the aforementioned Japanese Patent Application No. 62-172199, "Document Image Analysis Method", by the present inventor.

FIG. 3 is an example in which the tree structure of FIG. 2 is searched depth-first, the text blocks or basic element blocks are extracted, and the layout is displayed according to their position and size information. The area designation method of the document reading apparatus of the first invention of the present application is described below with reference to FIG. 3.

In the layout display of FIG. 3, the element blocks may be distinguished by element name using color information, graphic patterns or the like. A reduced document image can also easily be displayed in association with the layout display of FIG. 3, for example. The symbols a, b, c and d indicated by arrows in the figure each show an example of a position designated by the user on the display screen. A known device such as a mouse can be used as the pointing device for the display screen, but the pointing device is not limited to this.

Now, when an arbitrary point within the text block T₂, such as that indicated by arrow a in the figure, is designated as the reading area, the character line blocks S₃, S₄ and S₅ contained in the text block T₂ shown in FIG. 2 are read out in sequence, and the characters in each character line block are then segmented one at a time and recognized. In this way, the character images of the text block T₂ are converted, as text, into a character code string. Conventional, publicly known techniques can be used for the character segmentation and character recognition.

Similarly, by designating an arbitrary point within the text block T₃, such as that indicated by arrow b, as the reading area, the character images of the text block T₃ can easily be converted one after another into a character code string.

Next, consider converting the entire text within the virtual block M₂ representing the two-column area shown in FIG. 1, that is, the character lines Sk (k = 6, ..., 20), into a character code string with a single designation. At the display level of the element blocks shown in FIG. 3, when the position of arrow c, for example, is designated as the reading area, that position is contained in the virtual block M₂ of FIG. 2 but not in the virtual block M₃ or the text block T₄ below it, so designating arrow c determines the virtual block M₂. In other words, the tree structure of FIG. 2 is searched, and the target is obtained as the last-detected element block that contains the designated position.
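The one-point selection rule described above (take the last block found during the tree search that contains the designated position) can be sketched as follows. The `Block` type, the function `pick_block`, and all coordinates are hypothetical. One assumption is added that the text only implies: the search does not descend into a text block, so a click inside T₂ selects T₂ rather than one of its lines, matching the display level of FIG. 3.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    name: str
    kind: str  # "page", "virtual", "text" or "line"
    x: int
    y: int
    w: int
    h: int
    children: List["Block"] = field(default_factory=list)

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def pick_block(root: Block, px: int, py: int) -> Optional[Block]:
    """Return the last block, in depth-first order, that contains (px, py)."""
    picked = None
    stack = [root]
    while stack:
        block = stack.pop()
        if not block.contains(px, py):
            continue
        picked = block
        if block.kind != "text":  # assumption: lines inside a text block are not selectable
            stack.extend(reversed(block.children))
    return picked


# Hypothetical geometry: two columns separated by a gap at x = 290..309.
t2 = Block("T2", "text", 50, 100, 500, 60)
m1 = Block("M1", "virtual", 50, 50, 500, 150, [t2])
s6 = Block("S6", "line", 50, 250, 240, 20)
t3 = Block("T3", "text", 50, 280, 240, 470)
m3 = Block("M3", "virtual", 50, 250, 240, 500, [s6, t3])
t4 = Block("T4", "text", 310, 250, 240, 500)
m2 = Block("M2", "virtual", 50, 250, 500, 500, [m3, t4])
page = Block("P", "page", 0, 0, 600, 800, [m1, m2])

print(pick_block(page, 100, 120).name)  # a point inside T2
print(pick_block(page, 300, 400).name)  # a point in the gap between the columns
print(pick_block(page, 10, 10).name)    # a point in the page margin
```

In this geometry the gap between the columns plays the role of arrow c: it lies inside M₂ but outside M₃ and T₄, so M₂ is selected, while a point in the page margin selects the whole page as with arrow d.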

Similarly, when the position of arrow d is designated, the entire document image, that is, all the character lines in the document image, is converted into a character code string in the order obtained by a depth-first search of the tree structure of FIG. 2.

As described above, according to the first invention of the present application, the two-point designation conventionally required to obtain a rectangular area is replaced by a one-point designation, and the constraint on the designated position is considerably relaxed, so the burden on the user is markedly reduced. However, in the case of the designation by arrow c shown in FIG. 3, the designated position is somewhat more restricted than in the case of arrow b and the like. Needless to say, the one-point reading area designation may be performed several times.

FIG. 4 is an example for explaining the area designation method of the document reading apparatus of the second invention of the present application. The first invention provides a technique in which, as with the reading area designation indicated by arrow c in FIG. 3, an element block that is not itself displayed is designated as the reading area with a single point, based on the positions of the displayed element blocks. The second invention, by contrast, provides a technique in which the display level is changed step by step so that the desired element block can be designated with a single point. This requires several displays triggered by redisplay designations, but the restriction on the position of the single designated point is considerably relaxed.

FIG. 4(a) shows the layout display of the one-page area P of FIG. 2. When pointing is performed at the position of arrow e as a redisplay designation, the virtual blocks M₁ and M₂ directly under the one-page area P in FIG. 2 are displayed, as shown in FIG. 4(b). If, on the other hand, the position of arrow e is designated as the reading area as described for FIG. 3, the character lines in the document are converted one after another into a character code string. Similarly, when pointing is performed at the position of arrow f in FIG. 4(b) as a redisplay designation, the virtual block M₃ and the text block T₄ directly under the virtual block M₂ in FIG. 2 are displayed, as shown in FIG. 4(c).
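The step-by-step redisplay described above can be sketched as replacing the displayed block that contains the pointed position with the blocks one level below it. The `Block` type, the function `redisplay`, and the coordinates are hypothetical. The sketch also assumes that only the children of the pointed block are shown after a redisplay; the text does not say whether sibling blocks remain on screen.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    name: str
    x: int
    y: int
    w: int
    h: int
    children: List["Block"] = field(default_factory=list)

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def redisplay(displayed: List[Block], px: int, py: int) -> List[Block]:
    """Return the blocks to show after a redisplay designation at (px, py)."""
    for block in displayed:
        if block.contains(px, py):
            # Show the blocks one level below; a leaf block stays as it is.
            return block.children or [block]
    return displayed  # the point hit no displayed block: keep the current display


# Hypothetical geometry loosely following the block names used for FIG. 4.
m3 = Block("M3", 50, 250, 240, 500)
t4 = Block("T4", 310, 250, 240, 500)
m1 = Block("M1", 50, 50, 500, 150)
m2 = Block("M2", 50, 250, 500, 500, [m3, t4])
page = Block("P", 0, 0, 600, 800, [m1, m2])

level_a = [page]                         # initial display: the one-page area
level_b = redisplay(level_a, 300, 400)   # pointing inside P
level_c = redisplay(level_b, 300, 400)   # pointing inside M2
print([b.name for b in level_b], [b.name for b in level_c])
```

Each call moves the display one level down the tree, so a deeply nested block is reached after a few coarse clicks rather than one precisely placed click.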

In the display shown in FIG. 4, the display magnification may be changed. Further, as in FIG. 3, a reduced document image may be displayed in association with the layout display.

FIG. 5 is a block diagram showing an embodiment of the present invention.

In the figure, reference numeral 1 denotes an image memory that stores the document image as quantized image information. Reference numeral 2 denotes an area dividing unit. The area dividing unit 2 divides the document image stored in the image memory 1 into basic element blocks such as character lines, figures and tables, and stores the result in the structured data storage unit 4. The document structuring unit 3 sequentially reads and updates the contents of the structured data storage unit 4, thereby generating, as described for FIG. 2, the element blocks constituting the document image and the arrangement relationships between them as a tree structure, which it stores in the structured data storage unit 4. The layout extraction unit 5 searches the tree structure stored in the structured data storage unit 4 in accordance with the content supplied by the redisplay selection unit 15 described later, and transfers the information of each element block to be displayed to the display unit 12. Based on the transferred element block information, the display unit 12 presents the layout display on a display screen (omitted from the figure), as shown in FIG. 3. The display unit 12 is also assumed to have a function of reading the document image from the image memory 1, reducing it, and displaying it on the screen in association with the layout display. Next, when the user, referring to the layout display on the screen, designates a reading area with a pointing device such as a mouse as described for FIG. 3, the pointing input position information is transferred from the area designation unit 7 to the redisplay selection unit 15. The area determination unit 8 searches the tree structure stored in the structured data storage unit 4 for the element block to be read, in accordance with the pointing input position information, and transfers it to the line reading unit 9. The line reading unit 9 reads out only the character lines contained in the element block to be read, in the order obtained by a depth-first search of the tree structure stored in the structured data storage unit 4, and transfers them to the character segmentation unit 10. The character segmentation unit 10 segments character images one at a time from the document image stored in the image memory 1, in accordance with the character block information transferred in sequence, and transfers them to the recognition unit 11. The recognition unit 11 matches the sequentially input character images against the recognition dictionary 13 and converts them into character codes, which are stored in sequence in the recognition result storage unit 14.

In this embodiment, as described above, the area of the desired element block is designated while the display level is changed step by step. For this purpose, as mentioned in the description of the designation at arrow e in FIG. 4(a), the area designation unit 7 outputs, together with the input position information designated with the pointing device, redisplay information indicating whether that input position information signifies a redisplay designation or not (that is, a reading area designation), and both are transferred to the redisplay selection unit 15. If the input position information is a redisplay designation, the redisplay selection unit 15 transfers it to the layout extraction unit 5; otherwise it transfers the input position information to the area determination unit 8. The function of the layout extraction unit 5 described for FIG. 5, namely searching the tree structure stored in the structured data storage unit 4 and transferring the information of each element block to be displayed to the display unit 12, is equivalent to that of the embodiment of the first invention, but the search processing for the element blocks to be displayed differs. That is, the element block containing the input position information transferred from the redisplay selection unit 15 is first detected, from among the element blocks that have been transferred to and displayed on the display unit 12, as the redisplay element block. The tree structure stored in the structured data storage unit 4 is then searched, and the element blocks one level below the redisplay element block are extracted as the element blocks to be displayed and transferred to the display unit 12.
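The routing performed by the redisplay selection unit 15 can be sketched as a small dispatcher. The function name `make_redisplay_selector` and the callback parameters are hypothetical stand-ins for the layout extraction unit 5 and the area determination unit 8. How the user signals a redisplay designation (for example a separate mouse button) is not specified in the text and is left to the caller here.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, int]


def make_redisplay_selector(
    to_layout_extraction: Callable[[Point], None],
    to_area_determination: Callable[[Point], None],
) -> Callable[[Point, bool], None]:
    """Build a selector that forwards a pointed position to one of two units."""

    def select(position: Point, is_redisplay: bool) -> None:
        if is_redisplay:
            to_layout_extraction(position)    # show the next display level
        else:
            to_area_determination(position)   # determine the reading area
    return select


# Record where each designation is routed, using lists as stand-in units.
layout_requests: List[Point] = []
reading_requests: List[Point] = []
select = make_redisplay_selector(layout_requests.append, reading_requests.append)

select((300, 400), True)    # redisplay designation
select((100, 120), False)   # reading area designation
print(layout_requests, reading_requests)
```

Passing the two units in as callables keeps the selector independent of how they search the tree, which mirrors the separation between blocks 5, 8 and 15 in the block diagram.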

The layout extraction unit 5 is assumed to transfer the one-page area as the initial element block information sent to the display unit 12.

(Effects of the Invention) As described above, the document reading apparatus of the present invention structures the elements constituting a document in an input image, and the arrangement relationships between those elements, as a tree structure, and lets the user designate an element area shown in a layout display derived from that tree structure with a single point and with relaxed constraints on the designated position. This markedly reduces the burden on the user and makes it easy to read the characters of a desired area of an existing document image.

[Brief description of the drawings]

FIG. 1 is a diagram explaining a method of dividing a document image into basic elements and then structuring them into the elements that constitute the document. FIG. 2 is a diagram showing an example of the result of generating, as a tree structure, the arrangement structure of the document obtained for the document image of FIG. 1. FIG. 3 is a diagram explaining the area designation method of the document reading apparatus of the first invention of the present application, applied to the document image of FIG. 1. FIG. 4 is a diagram explaining the area designation method of the document reading apparatus of the second invention of the present application, applied to the document image of FIG. 1. FIG. 5 is a block diagram showing an embodiment of the present invention.
In the figures, 1 is an image memory, 2 is an area dividing unit, 3 is a document structuring unit, 4 is a structured data storage unit, 5 is a layout extraction unit, 7 is an area designation unit, 8 is an area determination unit, 9 is a line reading unit, 10 is a character segmentation unit, 11 is a recognition unit, 12 is a display unit, 13 is a recognition dictionary, 14 is a recognition result storage unit, and 15 is a redisplay selection unit.

Claims

(57) [Claims]

1. A document reading apparatus comprising: area dividing means for decomposing a document image into basic element blocks such as character lines and figures; document structuring means for sequentially structuring the plurality of basic element blocks into a tree structure that hierarchically expresses the element blocks constituting the document image and the arrangement relationships between the element blocks; layout extracting means for extracting, using the tree structure, and displaying the layout structure of element blocks at a predetermined level or of an element block selected by the redisplay area selecting means recited below; redisplay area selecting means for selecting, from the one or more element blocks shown in the layout display, an element block for which detailed layout information is to be displayed; area determining means for determining a reading area designated with a single point from among the one or more element blocks shown in the layout display; line reading means for sequentially reading out only the character lines within the reading area, in the order obtained by a depth-first search of the tree structure; and character recognition means for segmenting each read character line into individual characters, matching them against a recognition dictionary, and sequentially outputting the character reading results.