JPH07107711B2

JPH07107711B2 - Document image processing device

Info

Publication number: JPH07107711B2
Application number: JP60193738A
Authority: JP
Inventors: 純一東野; 康明中野; 浩道藤沢; 博唯上田; 誠治柏岡
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1985-09-04
Filing date: 1985-09-04
Publication date: 1995-11-15
Anticipated expiration: 2010-11-15
Also published as: JPS6255769A

Description

Detailed Description of the Invention

[Field of the Invention]

The present invention relates to an image processing system, and more particularly to an image understanding system suitable for an apparatus that electronically changes the layout of an input document image and outputs the resulting image.

[Background of the Invention]

A conventional electronic document image filing device merely stores each page of a document as an image, so secondary information for retrieval had to be supplied externally through a code input means such as a keyboard. Likewise, to edit a document by correcting or changing its layout for readability, or to extract in advance an index image suggestive of the document's content, the image region had to be specified externally through a coordinate input means such as a crosshair cursor or a mouse. To reduce the labor of file input and editing, however, it is desirable to read the title, author, and similar items written in the document automatically and to generate the secondary information from them. More advanced retrieval further requires automatic input of figure and table captions and of chapter and section titles, or automatic keyword extraction by recognizing the body text itself. Dividing the image of a target document into parts such as title, author, abstract, body text, figures, and photographs, recognizing and encoding the character portions, and extracting the necessary regions from multiple documents to compile abstract journals and the like have also been called for, in order to reduce storage space and diversify the units of retrieval.

To address these problems, the prior art has proposed methods in which a target region in a document is designated externally and its image is either extracted or automatically converted into character data by a character recognition device.

Information Processing Society of Japan (IPSJ), 28th National Convention, lecture paper "A Retrieval Method for Document Image Files Using Image Processing" (Proceedings 5C-1, 1984)
Japanese Patent Laid-Open No. 60-17565, "Image Storage and Retrieval Device" (published January 29, 1985)

However, this method has the problem that designating the region to be searched requires manual work.

To address the problem of manual designation, there are methods that extract the target region automatically by image processing, or that understand the content of a document and process the document on the basis of that understanding.

IPSJ 29th National Convention, lecture paper "A Newspaper Headline Extraction Algorithm for a Document Image Information System" (Proceedings 6M-9, 1984)
Japanese Patent Publication No. 59-39065, "Mail Address Character String Detection Device" (published September 20, 1984)
IPSJ 23rd National Convention, lecture paper "Automatic Analysis of Document Images with Complex Structure" (Proceedings 6C-2, 1981)

However, some of these document processing techniques simply process the image data (bottom-up approach), which makes it difficult to assign meaning to the extracted regions. Others assign meaning to regions in advance and then process the image data (top-down approach), but because the processing procedure is written into the control program, changing the type of document to be processed was not easy.

In other words, because these document understanding techniques target newspapers and mail addresses, they are not necessarily suited to efficiently extracting the secondary information needed for retrieval, such as titles and authors, from documents such as journals and patent gazettes, which are standardized to some extent but whose formats vary widely. Moreover, there is no suitable means of improving the extraction method when the extraction of secondary information fails.

As for methods of outputting images, there are conventional character-oriented systems (TeX) and graphics-oriented systems (GKS). The former aims to automate the selection of type for printing. The latter aims to compose figures and images, so it does not consider understanding a document as an image and then outputting an image that depends on its content. As for image placement, these systems allow the coordinate system to be set arbitrarily to match the resolution of the display device, and the position on the screen can also be set arbitrarily by a rectangular region called a viewport. However, coordinate systems and viewports cannot be expressed hierarchically.

Furthermore, there has been no suitable method for rearranging image regions to which semantic information has been attached.

[Object of the Invention]

An object of the present invention is to provide a document image processing apparatus that, for documents of a largely standardized format, automatically analyzes the document structure from the image layout and obtains a document image rearranged into a desired layout.

[Outline of Invention]

To achieve this object, the processing apparatus of the present invention is characterized by comprising:

means for inputting an unknown document as an image and converting it into a digital image;

storage means having at least a first storage area holding first format data that defines a format common to the documents to be processed, by expressing the position and size of each rectangular region, including variables, in a notation that describes an image as a set of rectangular regions; a second storage area holding second format data that defines the layout of the image to be output in the same notation; a third storage area holding the digital image; and a fourth storage area in which the image to be output is stored;

a control unit that executes structural analysis processing, which refers to the first format data in the first storage area and searches the digital image in the third storage area for each rectangular region described in the first format data, thereby identifying the region where the partial image corresponding to each format element of the unknown document's format exists, and image processing, which uses the regions so identified to cut out from the third storage area the partial image corresponding to each format element described in the second format data and transfers it to the position in the fourth storage area defined by the second format data; and

an image output unit that displays the set of partial images accumulated in the fourth storage area.

In this notation, a document image is expressed as a set of rectangular regions. In particular, to define a format common to the documents to be processed, the first format data contains as variables the absolute or relative size of the rectangular region on the document image corresponding to each format element (for example, when a technical paper consists of a title, author names, body text, and drawings, each of these is called a format element), and quantities expressing the absolute or relative relationships between rectangular regions. A search method for each rectangular region can also be specified. Further, a rectangular region can itself be expressed as a set of rectangular regions, and this hierarchical expression allows the format of a document to be described down to its details.

In the parsing process, when an unknown document is input, rectangular regions are searched for according to the search method specified in the first format data, and information on whether each search succeeded is extracted together with the numerical values of the parameters determined during the search (the absolute or relative sizes of the rectangular regions and the absolute or relative relationships between them). The parsing unit substitutes these parameter values for the variables in the first format data and proceeds to the next analysis step, thereby advancing the structural analysis of the document step by step. In the image processing performed after the analysis has finished and the content of the image has been understood, the image is rearranged and output according to the second format data. Accordingly, the image can be output in a different format by changing the content of the second format data stored in the second storage area.

The principle of the present invention is described below. FIGS. 1 and 2 each show an example of one page of a technical paper having a nearly fixed format. The following description takes technical papers as the example, but a format common to other kinds of documents can be defined by changing the content of the format data or part of the notation, so the invention is applicable to them as well and is not limited to the technical-paper example. FIG. 3 is an output image obtained by understanding the content of FIGS. 1 and 2 and rearranging the title, author names, and representative figure into a layout with a tabular structure.

Next, an example of a notation for describing the structure of a document (hereinafter abbreviated as the document structure representation) is shown.

    (defform F
      (form F1 (10 90 10 40))
      (form F2 ………)
      (form F3 ………))
    (defform F1
      (form F11 (10 90 10 50))
      (form F12 (10 90 60 90)))
    (defmac LINE-1 (%1)
      (point ?Y1 (mode IN Y LESS))
      (point ?Y2 (mode OUT Y LESS))
      (form %1 (0 ?W ?Y1 ?Y2)))

This document structure representation is explained with reference to the example document image in FIG. 4.

The first definition, defform F ..., indicates that format F consists of format element F1 with format elements F2 and F3 placed side by side beneath it, as shown in FIG. 5. In FIG. 4, the portions corresponding to F1, F2, and F3 of FIG. 5 are enclosed in broken lines. The four values 10 90 10 40 in parentheses following the format element name F1 give the position of the region of F1 when the whole region corresponding to format F is taken as 100 × 100. The origin of the coordinate system is the upper-left corner. The values specifying a region are, in order, the minimum X coordinate, the maximum X coordinate, the minimum Y coordinate, and the maximum Y coordinate. When the parameter values are known, as in this example, they can simply be written directly. Format elements F2 and F3 are likewise described as rectangular regions.

The next definition, defform F1 ..., indicates that format element F1 in turn consists of format elements F11 and F12 arranged vertically. That is, F11 occupies 10 to 50 in the Y direction and F12 occupies 60 to 90. The positions of F11 and F12 are described in a coordinate system whose origin is the upper-left corner of F1; seen from format F, therefore, they are in a relative coordinate system.
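The relation between the coordinates of F11 inside F1 and its coordinates in format F can be checked numerically. The following is a minimal Python sketch, assuming that a child's coordinates are percentages of the parent's width and height (the text states the 100 × 100 convention for format F; applying it to nested elements is an assumption). The function name to_parent_coords is hypothetical.

```python
def to_parent_coords(parent, child):
    """Map a child rectangle, given on a 100 x 100 grid inside `parent`,
    into the coordinate system in which `parent` itself is expressed.

    Rectangles are (xmin, xmax, ymin, ymax), as in the notation above.
    """
    pxmin, pxmax, pymin, pymax = parent
    cxmin, cxmax, cymin, cymax = child
    width = pxmax - pxmin
    height = pymax - pymin
    # Multiply before dividing so that whole-number results stay exact.
    return (
        pxmin + cxmin * width / 100,
        pxmin + cxmax * width / 100,
        pymin + cymin * height / 100,
        pymin + cymax * height / 100,
    )


F1 = (10, 90, 10, 40)   # F1 inside format F
F11 = (10, 90, 10, 50)  # F11 inside F1

F11_in_F = to_parent_coords(F1, F11)
print(F11_in_F)
```

Under this assumption F11 maps to (18 82 13 25) in format F, which agrees with the absolute-coordinate description of F11 given for FIG. 6.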

By expressing a format as rectangular regions, and expressing each region hierarchically as a set of further regions, an image can be represented in a generalized way. The description need not be hierarchical, of course; as shown in FIG. 6, it may be written in an absolute coordinate system referenced to format F. In that case, the same rectangular regions as in FIG. 5 are specified as follows.

    (defform F
      (form F11 (18 82 13 25))
      (form F12 (18 82 28 38))
      (form F2 ………)
      (form F3 ………))

The part beginning with defmac LINE-1 (%1) is a macro definition. The following three lines, which form the body of this macro definition, express that the first line from the top of the rectangular region is format element %1.

    (point ?Y1 (mode IN Y LESS))
    (point ?Y2 (mode OUT Y LESS))
    (form %1 (0 ?W ?Y1 ?Y2))

Here, ?W denotes the horizontal size of the format element and ?H its vertical size. ?Y1 and ?Y2 are variables whose values are determined by search, as described below.

point indicates that a point satisfying a given condition is searched for and assigned to a variable. The search condition is specified by mode. IN or OUT indicates whether the point sought is a change from white to black or a change from black to white, Y indicates the search axis (X or Y), and LESS indicates the search direction. Although it does not appear in this example, area is a counterpart to point; as described later, it specifies the region to be searched.

The search method is explained with reference to FIG. 7, using the macro definition as an example. (A) shows that lines reading Title ... and Author ... are present in the format. (B) and (C) describe the Y coordinates of these lines: the first line lies between ?Y1 and ?Y2, and the second line lies between ?Y3 and ?Y4. As noted above, (B) is a macro defining that the format element on the first line is %1; likewise, (C) is a macro defining that the format element on the second line is %1. Variables prefixed with % are placeholders that are replaced by the arguments of the macro call before execution. These macros are therefore called as follows.

    (LINE-1 F1)
    (LINE-2 F2)

That is, the format element on the first line is named F1 and the one on the second line is named F2. The search condition for the coordinate ?Y1, specified by the point on the second line of (B), is IN Y LESS: the point sought is a change from white to black, the search axis is Y, and the direction is LESS, meaning that the search proceeds from smaller Y coordinates. To search from larger Y coordinates instead, GREATER is specified. The point satisfying these conditions is the upper boundary ?Y1 of the first line. The lower boundary ?Y2 of the first line, specified by the point on the third line of (B), is described as a change from black to white under the same search conditions; that is, the search condition for ?Y2 is OUT Y LESS.
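The replacement of %-prefixed placeholders by call arguments can be sketched in Python as follows. This is only an illustration: representing the notation as nested Python lists and the function name expand_macro are assumptions, not part of the described apparatus.

```python
def expand_macro(body, args):
    """Return a copy of a nested-list macro body in which the
    placeholders %1, %2, ... are replaced by the call arguments."""
    if isinstance(body, list):
        return [expand_macro(item, args) for item in body]
    if isinstance(body, str) and body.startswith("%") and body[1:].isdigit():
        return args[int(body[1:]) - 1]
    return body


# Body of the LINE-1 macro, written as nested lists.
LINE_1 = [
    ["point", "?Y1", ["mode", "IN", "Y", "LESS"]],
    ["point", "?Y2", ["mode", "OUT", "Y", "LESS"]],
    ["form", "%1", [0, "?W", "?Y1", "?Y2"]],
]

# Equivalent of the call (LINE-1 F1).
expanded = expand_macro(LINE_1, ["F1"])
print(expanded[2])
```

After expansion the form clause names F1, while the search variables ?Y1 and ?Y2 are left untouched for the later search step.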

Next, (C), which defines the second line in the format, is explained. Since the second line follows the first, the lower boundary ?Y2 of the first line is searched for first, and for ?Y3 the region to be searched is specified with area. That is, by setting the rectangular region to be searched to 0 ?W ?Y2 ?H, the same search as in (B) can be carried out starting from the lower boundary of the first line.
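The point searches for the first and second lines can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the image is a list of rows of 0/1 pixels with 1 meaning black, a row counts as black if any pixel inside the search region is black, the region is half-open (ymax excluded), pixels outside the region are treated as white, and an OUT search returns the first white row after a black one. The function name find_point is hypothetical, and only the Y axis is handled.

```python
def find_point(image, area, transition, direction="LESS"):
    """Search along Y inside `area` = (xmin, xmax, ymin, ymax).

    transition "IN"  : first row where the profile changes white -> black
    transition "OUT" : first row where the profile changes black -> white
    direction "LESS" scans from small Y, "GREATER" from large Y.
    Returns the row index, or None when no such point exists.
    """
    xmin, xmax, ymin, ymax = area
    if direction == "LESS":
        rows = range(ymin, ymax)
    else:
        rows = range(ymax - 1, ymin - 1, -1)
    previous_black = False  # outside the region counts as white
    for y in rows:
        current_black = any(image[y][xmin:xmax])
        if transition == "IN" and current_black and not previous_black:
            return y
        if transition == "OUT" and previous_black and not current_black:
            return y
        previous_black = current_black
    return None


# Tiny test page: two text lines separated by a blank row.
page = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
W, H = 4, 8

# First line: search the whole region, as in (B).
y1 = find_point(page, (0, W, 0, H), "IN")
y2 = find_point(page, (0, W, 0, H), "OUT")

# Second line: restrict the region to (0 ?W ?Y2 ?H), as in (C).
y3 = find_point(page, (0, W, y2, H), "IN")
y4 = find_point(page, (0, W, y2, H), "OUT")
print(y1, y2, y3, y4)
```

Restricting the region to start at ?Y2 is what lets the same IN and OUT searches find the second line instead of the first.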

In the structural analysis of a document, the first format data written in the above notation is consulted, and the document is checked, region by region, for the presence of each rectangular region described there. When a rectangular region described with variables is found, the numerical values of those variables are obtained, and from then on those values are substituted for the variables.
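The substitution of found values for variables can be sketched in Python as follows. The nested-list representation, the function name substitute, and the numerical values in the example are all hypothetical and serve only to illustrate the step.

```python
def substitute(expression, bindings):
    """Return a copy of a nested-list expression in which every variable
    that already has a value in `bindings` is replaced by that value.
    Unbound variables and other symbols are left as they are."""
    if isinstance(expression, (list, tuple)):
        return [substitute(item, bindings) for item in expression]
    if isinstance(expression, str):
        return bindings.get(expression, expression)
    return expression


form = ["form", "F1", [0, "?W", "?Y1", "?Y2"]]

# Hypothetical values: ?W is known, ?Y1 has been found, ?Y2 not yet.
bindings = {"?W": 100, "?Y1": 12}
partly_resolved = substitute(form, bindings)

# Once ?Y2 is found as well, the region is fully numeric.
bindings["?Y2"] = 20
resolved = substitute(form, bindings)
print(partly_resolved, resolved)
```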

Next, operations between rectangular regions are explained. Regions of non-rectangular shape also appear in real documents. FIGS. 8(A) and 8(B) are examples of such regions, and FIG. 8(C) shows a single rectangular region split into two. As the broken lines indicate, FIGS. 8(A) and 8(B) can each be regarded as the sum or difference of two rectangular regions. In the case of (C), the description becomes simpler if the two rectangular regions are regarded as joined into a single virtual rectangular region. To make such operations between rectangular regions possible, a virtual transfer of regions is defined as follows.

    (map&form F
      (space ?W ?H)
      (position ((?X0 ?Y0) (?Xmin ?Xmax ?Ymin ?Ymax))
                (……)))

FIG. 9 illustrates the meaning of this definition. space establishes a new rectangular region of width ?W and height ?H as format F and indicates that the transfer takes place into this region. position gives the upper-left coordinates of the destination rectangular region. The source rectangular region, given by the four values (?Xmin ?Xmax ?Ymin ?Ymax), is copied to that destination.

This virtual transfer is explained concretely with reference to FIG. 10. Suppose the actual format to be analyzed is arranged as in (A), a layout known as multi-column or double-column. Format elements F1 and F2 are placed side by side spatially, but semantically they are arranged one above the other as in (B). This operation between rectangular regions can be expressed as

    (map&form F
      (space 50 60)
      (position ((10 10) (10 40 10 40))
                ((10 40) (40 70 10 30))))

For the virtual format shown in (B), space establishes a rectangular region of width 50 and height 60. The relationship between (A) and (B) is then expressed as (position ((10 10) (10 40 10 40)) ((10 40) (40 70 10 30))). The rectangular region (10 40 10 40) in (A) is transferred to the region in (B) whose origin is (10 10).
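The virtual transfer for FIG. 10 can be sketched in Python as follows. This is an illustration only: the function name virtual_transfer, the use of half-open pixel ranges, and the synthetic page (pixels valued 1 in the F1 region and 2 in the F2 region) are assumptions. The numerical values are those of the position expression (position ((10 10) (10 40 10 40)) ((10 40) (40 70 10 30))) given in the text.

```python
def virtual_transfer(source, space, mappings):
    """Copy rectangular regions of `source` into a new image.

    space    : (width, height) of the new virtual region
    mappings : list of ((x0, y0), (xmin, xmax, ymin, ymax)) pairs, where
               (x0, y0) is the upper-left corner of the destination and
               the four values give the source rectangle.
    """
    width, height = space
    target = [[0] * width for _ in range(height)]
    for (x0, y0), (xmin, xmax, ymin, ymax) in mappings:
        for dy in range(ymax - ymin):
            for dx in range(xmax - xmin):
                target[y0 + dy][x0 + dx] = source[ymin + dy][xmin + dx]
    return target


# Synthetic double-column page: F1 on the left, F2 on the right.
page = [[0] * 100 for _ in range(50)]
for y in range(10, 40):
    for x in range(10, 40):
        page[y][x] = 1          # region of F1
for y in range(10, 30):
    for x in range(40, 70):
        page[y][x] = 2          # region of F2

stacked = virtual_transfer(
    page,
    (50, 60),
    [((10, 10), (10, 40, 10, 40)),
     ((10, 40), (40, 70, 10, 30))],
)
print(stacked[10][10], stacked[40][10])
```

In the result F1 occupies rows 10 to 39 and F2 rows 40 to 59 of the same columns, so the two columns of the page become one vertical sequence.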

By combining the virtual transfers described above, regions of complex shape such as those in FIG. 8 can be expressed as operations between two or more rectangular regions. For example, FIG. 8(A) can be expressed as two rectangular regions of different sizes transferred so that they adjoin each other.

Next, the image processing that rearranges images for display in a desired layout is described, in particular the method of transferring an image when the aspect ratio changes. FIG. 11 shows transfers in which the scale of the image is changed to match the aspect ratio of the destination rectangular region. In (1), the rectangular region of the input image has width W and height H, and the image in that region is the pattern of the character A. (2) shows the case where the destination region has width W′, (3) the case where it has height H′, and (4) the case where it has width H′ and height W′. When the structural analysis has replaced the variables of the first format data with numerical values and the region containing the partial image for a given format element has been identified as in FIG. 11(1), the image processing cuts out that partial image and transfers it to the rectangular region in the second storage area. If the rectangular region defined in the second format data differs in shape from the original region, as in FIG. 11(2), (3), or (4), the image, that is, the character pattern, is modified accordingly before transfer. FIG. 12 shows transfers that leave the aspect ratio of the output image unchanged. The input image of (1) is transferred with the interior of the destination rectangular region divided into x, y, and z, as in (2). (3) is the case x = 0, (4) the case y = 0, (5) the case x = y, and (6) the case of an arbitrary division ratio.
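The two transfer modes can be sketched in Python as follows. This is an interpretation, not the patent's stated procedure: it assumes that in FIG. 12 x and y are the free margins on either side of the transferred image inside the destination region, so that x = 0 places the image flush with one edge, y = 0 flush with the other, and x = y centers it. The function names and the lead_fraction parameter are hypothetical.

```python
def stretch_factors(src_w, src_h, dst_w, dst_h):
    """FIG. 11 style: scale each axis independently so that the image
    fills the destination region (the aspect ratio may change)."""
    return dst_w / src_w, dst_h / src_h


def fit_preserving_aspect(src_w, src_h, dst_w, dst_h, lead_fraction=0.5):
    """FIG. 12 style: scale uniformly so that the image fits inside the
    destination, then split the free space between the two sides.

    lead_fraction is the share of the free space placed before the image:
    0.0 gives x = 0, 1.0 gives y = 0, 0.5 gives x = y.
    Returns (offset_x, offset_y, out_w, out_h).
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    out_w, out_h = src_w * scale, src_h * scale
    free_x, free_y = dst_w - out_w, dst_h - out_h
    return free_x * lead_fraction, free_y * lead_fraction, out_w, out_h


stretched = stretch_factors(40, 20, 100, 20)
centered = fit_preserving_aspect(40, 20, 100, 20, 0.5)
leading = fit_preserving_aspect(40, 20, 100, 20, 0.0)
trailing = fit_preserving_aspect(40, 20, 100, 20, 1.0)
print(stretched, centered, leading, trailing)
```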

As the above description shows, the document structure representation proposed in the present invention treats the structure of a document as a combination of rectangular regions and describes the relationships between them. This increases the expressive power available for describing documents, so that cases that were previously difficult to handle, such as a region with an indeterminate number of lines or a rectangular region that may or may not appear, can also be described. A wide variety of documents can therefore be analyzed and rearranged.

[Example of Invention]

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 13 is a block diagram showing the configuration of an apparatus that employs the image understanding system according to one embodiment of the present invention. In this embodiment the images to be understood are documents, but the invention is also applicable to general images, including grayscale images such as figures and photographs. Each part of the apparatus is connected to bus 1, and overall operation is controlled by control unit 2. The information on document 3 (document image information) is photoelectrically converted and digitized by scanner 4 into a digital image, which is stored in memory 61 via bus 1. Memory 61, together with memories 62, 63, 64, 65, 66, and 67 described later, forms part of memory 6. Instead of being obtained from scanner 4, the digital image in memory 61 may be read from a digital image filing device such as an optical disc. Alternatively, character code information may be obtained from input unit 5 and the image patterns corresponding to those codes read from file 8. The following description assumes binarization to one bit per pixel, but a pixel may be represented by multiple levels, and color information may be added by photoelectric conversion with a color scanner. Control unit 2 applies known position correction, skew correction, and similar processing to the document image, and the resulting normalized image is stored in memory 62.

The format data common to the target documents, written in the document structure representation with variables described above, is assumed to be stored in memory 64 in advance. Control unit 2 uses this format data to perform structural analysis of the normalized image. Structural analysis here means decomposing the normalized image into a plurality of rectangular regions, searching for each rectangular region by reference to the format data stored in memory 64, and replacing the variables in the format data with numerical values according to the search results. Of the regions obtained by structural analysis, those designated in advance as recognition targets have their partial images sent to character/graphic recognition unit 7, which recognizes the character and graphic patterns within them. The original document image generally has a complex shape, but the regions obtained by structural analysis are rectangular, so characters and graphics can easily be segmented and recognized by known methods. The character code string obtained as the character recognition result, or an edited version of it, or the vector data or symbol string obtained as the graphic recognition result, can be used as retrieval information for the designated region. Image patterns corresponding to this retrieval information can be read from file 8 and used as the patterns to be rearranged. The retrieval information of the input document obtained in this way is output to file 9, and the digital image of the rearranged document is output to file 8. When the digital image of the document is output to file 8, the decomposed rectangular regions may be output separately, region by region. File 8 and file 9 may also be one and the same.

In addition, format data for the output document to be displayed, written in the notation described above, is stored in the memory 65 in advance. The format data stored in the memory 65 contains no variables: the size and position of the rectangular area corresponding to each format element are specified explicitly as numerical values. The control unit 2 uses this second format data to perform image processing on the normalized image. Here, image processing means cutting out from the normalized image the partial image corresponding to each format element described in the second format data, and storing it in the area of the memory 63, which holds the output image, that the second format data specifies for that element. In other words, the plurality of rectangular areas are recombined and stored in the memory 63. The image output unit 10 then displays the output image.
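The cut-out-and-store step described above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the function name `recompose`, the dictionary layout, the half-open pixel ranges, and the nearest-neighbour sampling are all assumptions. Rectangles follow the (Xmin Xmax Ymin Ymax) order used in the format notation.

```python
def recompose(normalized, source_areas, output_areas, width, height, background=0):
    """Copy each format element from the normalized image into the output image.

    normalized   -- 2-D list of pixel values (stands in for the normalized image)
    source_areas -- {element: (xmin, xmax, ymin, ymax)} found by structure analysis
    output_areas -- {element: (xmin, xmax, ymin, ymax)} from the second format data
    All names and the half-open range convention are hypothetical.
    """
    output = [[background] * width for _ in range(height)]
    for element, (dx0, dx1, dy0, dy1) in output_areas.items():
        sx0, sx1, sy0, sy1 = source_areas[element]
        for y in range(dy0, dy1):
            # Nearest-neighbour mapping from output row to source row.
            sy = sy0 + (y - dy0) * (sy1 - sy0) // (dy1 - dy0)
            for x in range(dx0, dx1):
                sx = sx0 + (x - dx0) * (sx1 - sx0) // (dx1 - dx0)
                output[y][x] = normalized[sy][sx]
    return output
```

Because the source and destination rectangles may differ in size, the same loop also covers the scaled transfer that the description mentions later.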

The document parsing process is described in detail below. FIGS. 14 and 15 illustrate the flow of the document understanding process. The flow is written in PAD (Program Analysis Diagram) form. In step 100, the contours of the document image are extracted and stored in the memory 66. Any known method may be used for contour extraction, and a so-called connected-region extraction method may be used instead. In step 200, the minimum and maximum X and Y coordinates, Xmin(i), Xmax(i), Ymin(i), and Ymax(i), are extracted from each extracted contour i. These four values determine the circumscribed rectangle of contour i. Steps 300, 400, and 500 are, respectively, the initialization, the main body, and the end determination of the parsing process.
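As a minimal sketch of step 200, assuming each contour is available as a list of (x, y) points (a representation chosen here for illustration only), the four values can be computed like this:

```python
def bounding_box(contour):
    """Return (xmin, xmax, ymin, ymax) of a contour given as (x, y) points."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    return min(xs), max(xs), min(ys), max(ys)


def bounding_boxes(contours):
    """Map each contour number i (starting at 1) to its circumscribed rectangle."""
    return {i: bounding_box(c) for i, c in enumerate(contours, start=1)}
```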

In step 300, the format data stored in the memory 64 is copied to the working memory 67, and the various tables and internal program variables are initialized.

The main body 400 of the parsing process consists of steps 410 to 460. Step 410 controls the loop so that steps 420 to 450 are repeated until the end determination is made in step 460. Step 420 fetches a statement from the format data. An unprocessed statement is a line that contains a variable whose value has not yet been determined, or whose corresponding document area has not yet been determined. Step 430 is a test that skips step 440 when no unprocessed statement remains; in that case the end determination is performed. When the statement fetched in step 420 is an unprocessed statement, step 440 is performed. Step 440 determines the type of the statement and branches, so the processing depends on the statement type. FIGS. 14 and 15 and the following description cover only the form statement, that is, (form F0 (?Xmin ?Xmax ?Ymin ?Ymax) (shrink ?X ?Y)), but other statements are likewise handled by processing specific to each statement.

Steps 441 to 448 in FIG. 15 process the predicate form. Step 441 checks whether the format name F0 has already been registered; if it has not, F0 is registered in the format table in step 442. Step 442 examines whether each of the character strings written at the positions ?Xmin, ?Xmax, ?Ymin, ?Ymax, ?X, and ?Y is a variable or a numerical value and, if it is a variable, whether it has already been registered; unregistered variables are entered in the variable table. If a variable is already registered, its value is checked to see whether it has been determined. If the value has not been determined, form processing ends (in which case the statement remains unprocessed). If it has been determined, the variable name in the statement is replaced with that value.

As a concrete example, when ?Xmin = 0, ?Xmax = 90, ?X = 5, and ?Y = 5, and ?Ymin and ?Ymax are unregistered, the above statement is rewritten as (form F0 (0 90 ?Ymin ?Ymax) (shrink 5 5)), and the variables ?Ymin and ?Ymax are registered in the variable table with their values undetermined.
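The rewriting in this example can be sketched as below. The nested-list representation of a statement, the use of `None` for "registered but undetermined", and the function names are assumptions made for illustration; the patent does not specify the data structures.

```python
def substitute(statement, variables):
    """Replace '?name' tokens whose value is known; register unknown ones.

    statement -- nested list, e.g. ["form", "F0", ["?Xmin", ...], ["shrink", ...]]
    variables -- variable table; None marks a registered but undetermined value
    """
    rewritten = []
    for token in statement:
        if isinstance(token, list):
            rewritten.append(substitute(token, variables))
        elif isinstance(token, str) and token.startswith("?"):
            value = variables.setdefault(token, None)  # registers if missing
            rewritten.append(token if value is None else value)
        else:
            rewritten.append(token)
    return rewritten


def is_fully_bound(statement):
    """True when no '?name' token remains (the branch condition of step 443)."""
    return all(
        is_fully_bound(t) if isinstance(t, list)
        else not (isinstance(t, str) and t.startswith("?"))
        for t in statement
    )


table = {"?Xmin": 0, "?Xmax": 90, "?X": 5, "?Y": 5}
form = ["form", "F0", ["?Xmin", "?Xmax", "?Ymin", "?Ymax"], ["shrink", "?X", "?Y"]]
rewritten = substitute(form, table)
```

With the values of the example, `rewritten` corresponds to (form F0 (0 90 ?Ymin ?Ymax) (shrink 5 5)), and `table` gains the entries ?Ymin and ?Ymax with no value.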

Step 443 branches on whether all variable names in the statement have been replaced with numerical values; when they all have, the form execution processing of step 444 is performed. The details of form execution are shown in steps 445 to 448. Step 445 indicates that the following processing is repeated for each contour i extracted in step 200. Step 446 compares the minimum and maximum X and Y coordinates of contour i, Xmin(i), Xmax(i), Ymin(i), and Ymax(i), with the numerical values corresponding to the statement variables ?Xmin, ?Xmax, ?Ymin, ?Ymax, ?X, and ?Y, and determines whether the contour satisfies all of the following:

    ?Xmin < Xmin(i) < Xmax(i) < ?Xmax
    ?Ymin < Ymin(i) < Ymax(i) < ?Ymax
    ?X < Xmax(i) - Xmin(i)
    ?Y < Ymax(i) - Ymin(i)

In step 447, when these conditions hold, contour i is registered in the component table of F0. In step 448, when no contour satisfies the conditions, the analysis failure flag is set.
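A minimal sketch of steps 445 to 448 follows, assuming the circumscribed rectangles are held in a dictionary keyed by contour number. The function name and the sample rectangles are hypothetical; only the four inequalities come from the description.

```python
def execute_form(boxes, region, shrink):
    """Return (component_table, failure_flag) for one form statement.

    boxes  -- {i: (xmin, xmax, ymin, ymax)} circumscribed rectangles
    region -- (?Xmin, ?Xmax, ?Ymin, ?Ymax) after substitution
    shrink -- (?X, ?Y) minimum width and height
    """
    x_lo, x_hi, y_lo, y_hi = region
    min_w, min_h = shrink
    components = [
        i for i, (xmin, xmax, ymin, ymax) in boxes.items()
        if x_lo < xmin < xmax < x_hi
        and y_lo < ymin < ymax < y_hi
        and min_w < xmax - xmin
        and min_h < ymax - ymin
    ]
    return components, not components  # flag is set when nothing matched


# Made-up rectangles: one character-sized contour inside the region,
# one outside it, and one small speck inside it.
sample = {1: (25, 35, 15, 40), 2: (25, 35, 55, 80), 5: (60, 62, 20, 22)}
```

With (shrink 0 0) the speck is accepted along with the character; with (shrink 5 5) it is filtered out, which is the behaviour the description attributes to the shrink specification.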

As described above, the processing of steps 441 to 448 detects whether a structure corresponding to the form statement in the format data exists in the input image. The same applies to statements other than form. The form statement produces no output data, but some statements assign parameters obtained during analysis to the variables in the statement, and the results are then used by other statements.

Step 450 examines the analysis failure flag and, when the analysis has failed, backtracks and retries. In this case, control returns to an already analyzed statement, the variables to which parameters were assigned are restored to their previous state, and another possibility is explored.
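The backtrack-and-retry control can be illustrated with a small depth-first search. This is a generic backtracking sketch under strong assumptions, not the procedure of FIG. 14: here each statement is modeled as a hypothetical function that receives the current variable bindings and returns the candidate bindings it could add (an empty result means the statement fails).

```python
def solve(statements, bindings=None, index=0):
    """Try statements in order; on failure, undo and try the next candidate.

    statements -- list of callables: bindings -> iterable of dicts of new bindings
    Returns the complete bindings, or None if every possibility fails.
    """
    bindings = bindings or {}
    if index == len(statements):
        return dict(bindings)
    for candidate in statements[index](bindings):
        # A fresh dict per trial means earlier bindings are restored
        # automatically when this branch fails.
        trial = {**bindings, **candidate}
        result = solve(statements, trial, index + 1)
        if result is not None:
            return result
    return None  # corresponds to the failure flag remaining set
```

Building a new dictionary for each trial is a design shortcut: it avoids explicit undo bookkeeping at the cost of copying, which is acceptable for a handful of variables.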

Step 460 makes the end determination by detecting whether the analysis failure flag is not set, or whether the flag is still set after backtracking and retrying.

Step 500 passes the data obtained by the analysis to the outside. This data includes, for example, the coordinates on the document of the rectangular area detected for each format name.

When the analysis fails at a statement that is designated to set the analysis failure flag, the document cannot be understood, and reject processing is performed. For example, the final or intermediate result of document understanding is displayed on the console 11 and corrected interactively by the operator.

Next, the form execution processing is described concretely with reference to FIG. 16. FIG. 16(A) shows a case in which the image contains a noise component and the character components 1, A, 2, and B.

(B) shows the case in which the parameters at execution of the form statement are (form F (20 80 10 50) (shrink 0 0)), and (C) shows the case in which they are (form F (20 80 10 50) (shrink 5 5)). As the figure shows, in case (B) the noise component and the character components 1 and A are registered in the component table of format F. In case (C) the character components 1 and A are registered, but the noise component is not registered because of the shrink specification and is thus removed. Furthermore, after form is executed, the rectangular area of format F can be normalized to the character components contained in it, as shown in the figure, so the size of the area can be determined flexibly according to the image content.
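One plausible reading of "normalizing the area by the character components it contains" is to shrink the rectangle to the union of the registered components' circumscribed rectangles. The sketch below shows that reading only; the exact rule is not stated in the text, and the function name is hypothetical.

```python
def normalize_region(component_boxes):
    """Return the smallest (xmin, xmax, ymin, ymax) enclosing all given boxes."""
    boxes = list(component_boxes)
    if not boxes:
        raise ValueError("no components registered for this format")
    return (
        min(b[0] for b in boxes),
        max(b[1] for b in boxes),
        min(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
```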

FIG. 17 illustrates concretely how contour components are selected when the above form is executed. FIG. 17(A) shows the circumscribed rectangles obtained by processing the contour image of FIG. 16(A) in step 200 of FIG. 14. Here, 5 is a noise component, 1-8 are character components, and 6-8 are so-called inner contours. Xmin, Xmax, Ymin, and Ymax of these components are shown in (B). Whether a component belongs to format F is determined by whether the following conditions hold:

    20 < Xmin(i) < Xmax(i) < 80
    10 < Ymin(i) < Ymax(i) < 50
    5 < Xmax(i) - Xmin(i)
    5 < Ymax(i) - Ymin(i)

In this example, the conditions hold for contours i = 1 and 3. Since character component 3 contains component 6, it may be omitted from format F.

Next, the image processing that rearranges and displays the image based on the result of the parsing process is described in detail. FIG. 18 outlines how partial images are arranged. From the character code information ABCD, the corresponding image patterns are arranged as shown in (2). In this example the width of a character pattern is W, its height is H, and the character spacing is S. Then, as shown in (2), the partial image of (1) is transferred to the format 3 defined by the format 2. The scale is determined, as described earlier, by the ratio of the rectangular areas of the formats. The image transfer function may also include reduction, enlargement, rotation, affine transformation, and the like.

Character patterns are stored as contour data or pixel data, as shown in FIG. 19(1) and (2). FIG. 20 describes the features of the contour data shown in FIG. 19(1). In (1), the bend points are described by a flag and a point sequence. The flag indicates whether the contour is an outer contour or an inner contour; here 1 denotes an outer contour and 2 an inner contour. Each point in the sequence is described, as shown in (2), by a number, an X coordinate, and a Y coordinate. For simplicity of explanation this embodiment uses contour data composed of straight-line elements, but the contours may of course be expressed by mathematical functions such as quadratic functions or spline functions. Representing an image by contour data has the advantage that reduction, enlargement, rotation, affine transformation, and similar processing are easier than with pixel data. The method can therefore also be used to create fonts that come in many sizes, as character patterns do, and modified typefaces.

FIG. 21 explains formats described by rectangular areas. From the partial images (1) and (2) shown in the figure, an image having the format shown in (3) is obtained by (form G-ABC (20 80 10 50) (free Y)) (form G-123 (20 80 50 60) (free X)). free is the parameter, explained with FIG. 12, that controls the placement when an image is transferred. In this example, when (free Y) is specified, the ratio in the Y direction is set so that x = y; (free X) is the same with the Y axis replaced by the X axis.

FIG. 22 shows how arranged partial images are regarded as a single partial image and arranged again. (A) and (B) are described by the following formats (1) and (2).
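To illustrate why contour data makes scaling and affine transformation straightforward, the sketch below stores a contour as a flag plus a point sequence and transforms only the points. The dictionary layout and function names are assumptions for illustration, and the `free` parameter is not modeled.

```python
def affine(points, a, b, c, d, tx, ty):
    """Apply x' = a*x + b*y + tx, y' = c*x + d*y + ty to every point."""
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def fit_to_rect(contour, src, dst):
    """Scale and move a contour from rectangle src to rectangle dst.

    contour  -- {"flag": 1 (outer) or 2 (inner), "points": [(x, y), ...]}
    src, dst -- (xmin, xmax, ymin, ymax)
    """
    sx0, sx1, sy0, sy1 = src
    dx0, dx1, dy0, dy1 = dst
    kx = (dx1 - dx0) / (sx1 - sx0)  # horizontal scale from the area ratio
    ky = (dy1 - dy0) / (sy1 - sy0)  # vertical scale from the area ratio
    moved = affine(contour["points"], kx, 0, 0, ky, dx0 - kx * sx0, dy0 - ky * sy0)
    return {"flag": contour["flag"], "points": moved}
```

Only the vertices are transformed, so the cost depends on the number of bend points rather than on the number of pixels, which is the advantage the description claims for contour data.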

    (defform G-ABC-123 (width 100) (height 60)
      (form G-ABC (20 80 10 50) (free Y))
      (form G-123 (20 80 50 60) (free X)))      ... (1)

    (defform G (width 100) (height 80)
      (form G-ABC-123 (0 80 0 60))
      (form G-ABC-123 (50 100 50 80)))          ... (2)

In (2), the format G-ABC-123 defined in (1) is placed twice with different scale ratios. The format G-ABC-123 has a size of 100 × 60 (the horizontal length is given by width and the vertical length by height). G-ABC is located in the area (20 80 10 50) of this format, and G-123 in the area (20 80 50 60). The format G is then defined using the format created here: (2) states that G-ABC-123 is present in the areas (0 80 0 60) and (50 100 50 80) of format G.
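The nesting of formats (1) and (2) can be made concrete by computing where each leaf format ends up in the coordinate system of G. The following is a sketch under assumptions: the dictionary encoding of defform, the function name `expand`, and the purely proportional scaling are illustrative, and the `free` parameter is ignored.

```python
FORMATS = {
    "G-ABC-123": {"width": 100, "height": 60,
                  "children": [("G-ABC", (20, 80, 10, 50)),
                               ("G-123", (20, 80, 50, 60))]},
    "G": {"width": 100, "height": 80,
          "children": [("G-ABC-123", (0, 80, 0, 60)),
                       ("G-ABC-123", (50, 100, 50, 80))]},
}


def expand(name, rect, formats):
    """Return [(leaf_name, (xmin, xmax, ymin, ymax))] for `name` placed at `rect`."""
    fmt = formats.get(name)
    if fmt is None:  # not defined by defform: treat as a leaf partial image
        return [(name, rect)]
    px0, px1, py0, py1 = rect
    leaves = []
    for child, (x0, x1, y0, y1) in fmt["children"]:
        # Scale the child's area by the ratio of the placement to the format size.
        child_rect = (
            px0 + x0 * (px1 - px0) / fmt["width"],
            px0 + x1 * (px1 - px0) / fmt["width"],
            py0 + y0 * (py1 - py0) / fmt["height"],
            py0 + y1 * (py1 - py0) / fmt["height"],
        )
        leaves.extend(expand(child, child_rect, formats))
    return leaves


layout = expand("G", (0, 100, 0, 80), FORMATS)
```

Under these assumptions the first placement keeps G-ABC at (16 64 10 50), while the second, half-scale placement puts it at (60 90 55 75).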

With the processing method described above, images can be arranged according to an arbitrary format. FIG. 23 shows how, when there are multiple documents of the same format, the content of the document is understood page by page. By arranging the results again according to the format shown in FIG. 24, an abstract such as that shown in FIG. 25 can be created. This method is explained in detail below. In FIG. 23, the document formats are F, F1, F11, F12, F2, and F3, and the character string corresponding to format F11 is extracted for each page. FIG. 24 represents the format G of the output image. The title column consists of G1, G2, and G3, whose contents for each page are G1(page i), G2(page i), and G3(page i), where i is the page number. FIG. 25 shows the partial images for each format to be arranged again. These partial images are arranged according to the format data in the memory 65 by the method described with FIG. 22.

[Effects of the Invention]

As described above, the present invention makes it possible to analyze an input target document automatically and to output it in an arbitrary layout. Moreover, because the format data describing the target document and the format data describing the format of the output document can use the same representation, a wide variety of formats can be handled easily. A further advantage is that, even when the structure of the documents to be handled changes, the change can be accommodated immediately by modifying the format data.

[Brief Description of the Drawings]

FIGS. 1 and 2 show an example of an input document; FIG. 3 shows an example of an output document; FIGS. 4, 5, 6, 7, 8, 9, 10, 11, and 12 are explanatory diagrams of the principle of the present invention; FIG. 13 is a block diagram showing the configuration of an apparatus that implements the document processing method of the present invention; FIGS. 14 and 15 are flowcharts explaining the processing in the control unit 2 of FIG. 13; FIGS. 16, 17, 18, 19, 20, 21, and 22 are explanatory diagrams of the processing of FIG. 10; FIGS. 23 and 25 show the application to creating document abstracts; and FIG. 24 shows the format of the output image. 1: bus, 2: control unit, 3: document, 4: scanner, 5: input unit, 6: memory, 7: character/graphic recognition unit, 8, 9: files, 10: image output unit, 11: console.

Continuation of front page
(72) Inventor 上田博唯, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor 柏岡誠治, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo

Claims

[Claims]

What is claimed is:
1. A document image processing apparatus that captures a substantially standardized document as an image and outputs an image in which the layout has been rearranged, comprising:
an input unit that inputs an unknown document as an image and converts it into a digital image;
storage means having at least a first storage area in which first format data is stored in advance, the first format data defining a format common to the documents to be processed by a notation that describes an image as a set of a plurality of rectangular areas and expresses the arrangement and size of each rectangular area, a second storage area in which second format data defining the layout of the image to be output by the same notation is stored in advance, a third storage area in which the digital image is stored, and a fourth storage area in which the image to be output is stored;
a control unit that executes structure analysis processing, which refers to the first format data in the first storage area, searches the digital image in the third storage area for the area corresponding to each rectangular area described in the first format data, and thereby specifies the area in which the partial image corresponding to each format element of the format of the unknown document exists, and image processing, which uses the areas specified by the structure analysis processing to cut out from the third storage area the partial image corresponding to each format element described in the second format data and transfers it to the position in the fourth storage area defined by the second format data; and
an image output unit that displays the set of partial images accumulated in the fourth storage area.