JPH0776970B2 - Document shaping device

Info

Publication number: JPH0776970B2
Application number: JP63009580A
Authority: JP
Inventors: 勇岩井; 洋一竹林; 美和子土井; 美佳福井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1988-01-21
Filing date: 1988-01-21
Publication date: 1995-08-16
Anticipated expiration: 2010-08-16
Also published as: JPH01185761A

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) The present invention relates to a document shaping device that automatically lays out a document within a designated area based on the document's logical structure.

(Prior Art) In recent years, document processing systems have been under development that extract the logical and semantic structure of document data and, from that structure data, automatically shape the document into a predetermined output format (Isamu Iwai, Miwako Doi, and Mika Fukui, "Document Structure Generation Function in an Intelligent Document Processing System," Proceedings of the 34th National Convention of the Information Processing Society of Japan, pp. 1309-1310). In this type of system, when a document is output in a given format, frames for placing the document are defined in advance and the document is poured into those frames. Taking the formatting of a technical paper as an example, on the first page a single-column frame for the title, author name, affiliation, abstract, and so on is set at the top of the page, and a double-column frame for the body text is set below it. On the second and subsequent pages, only double-column frames are set. Conventionally, then, the frame structure was fixed in advance to suit the nature of the document, and the document data was poured into the frames so defined.

However, when frames are set in advance, a change to the content of the document data can, for example, leave the document data unable to fit in the frames, or conversely leave so little document data relative to the frame area that large blank spaces appear and spoil the document's appearance. In such cases the frames must be set again, which is very tedious when the frame structure is complex. The same problem also arose when data was corrected after ruled lines had been set.

(Problems to Be Solved by the Invention) As described above, in conventional document shaping devices the preset layout areas and the document data to be placed in them could fail to match, so that the document data could not be laid out or the appearance of the document suffered.

The present invention has been made in view of these problems. Its object is to provide a document shaping device that shapes document data based on area information, always keeps the document data and the area information consistent, and can produce documents that are easy to read and look good.

[Structure of the Invention] (Means for Solving the Problems) As shown in FIG. 1, the present invention comprises: an input unit 10 for inputting document data; a structure analysis unit 20 that analyzes the structural attributes of the document data input via the input unit 10; an area information storage unit 30 that stores area information specifying, for each page, the allocation areas corresponding to the structural attributes of the document data; a document shaping unit 40 that allocates the document data to the allocation area corresponding to its structural attribute, based on the structural attributes obtained by the structure analysis unit 20 and the area information stored in the area information storage unit 30; an output state monitoring unit 50 that monitors the allocation performed by the document shaping unit 40 and detects any mismatch between the document data and the allocation areas; an area information correction unit 60 that, when the output state monitoring unit 50 detects a mismatch, corrects the contents of the area information storage unit 30 so that the mismatch is eliminated; and an output unit 70 that outputs the document allocated by the document shaping unit 40.

(Operation) According to the present invention, the area information storage unit stores area information specifying, for each page, the allocation areas corresponding to the structural attributes of the document data, so the document shaping unit can determine from this area information and the structural attribute information of the document data which area the document data should be allocated to. The output state monitoring unit detects any mismatch between the document data and the area information that arises when the document shaping unit allocates the document data, and the area information correction unit corrects the contents of the area information storage unit to eliminate the mismatch. The document data and the area information are therefore always consistent, and documents that are easy to read and look good can be shaped automatically.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

FIGS. 2 to 10 are diagrams for explaining the document shaping device according to this embodiment.

As shown in the overall configuration of FIG. 2, this device comprises an input unit 10, a structure analysis unit 20, a frame information storage unit 31, a document shaping unit 40, an output state monitoring unit 51, a frame information correction unit 61, and an output unit 70.

The input unit 10 is for inputting document data. It takes document data, for example in paragraph units, from a keyboard 11 for entering source text directly or from a disk file 12 holding previously created source text, via an input control unit 13, and passes it to the structure analysis unit 20. As shown for example in FIG. 3, the input control unit 13 splits the input source text at delimiter codes, such as line-feed codes, that mark the units defining the structure of the document.
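The splitting performed by the input control unit 13 can be illustrated with a minimal Python sketch. The function name `split_into_paragraphs` and the choice to drop empty units are assumptions made for illustration; the text only states that the source is divided at delimiter codes such as line feeds.

```python
from __future__ import annotations


def split_into_paragraphs(source_text: str) -> list[str]:
    """Split raw source text into paragraph units at line-feed codes.

    Assumption: empty units between consecutive line feeds are dropped.
    """
    units = source_text.split("\n")
    return [unit.strip() for unit in units if unit.strip()]


# Hypothetical input resembling the opening of the document in FIG. 3.
sample = "Document Shaping Device\nTaro Yamada\n\n1. Introduction\n"
paragraphs = split_into_paragraphs(sample)
```

Each resulting unit would then be handed to the structure analysis stage one at a time.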

The structure analysis unit 20 extracts the structural attributes of the document data input in paragraph units from the input unit 10. It consists of a structure attribute extraction unit 21, a structure analysis dictionary 22, and a document data storage unit 23. The structure analysis dictionary 22 contains a Japanese dictionary, morphological analysis rules, logical analysis rules, and the like. The structure attribute extraction unit 21 refers to the structure analysis dictionary 22 and, based on the relationships among the morphemes of the input document data and the structure of the preceding and following sentences, extracts structural attributes such as title, chapter heading, and chapter paragraph. As shown in FIG. 3, the document data storage unit 23 stores the input document data in paragraph units in association with the extracted structural attributes.

The frame information storage unit 31 stores information about frames, which are the allocation areas for document data. FIG. 4 shows its contents. For each page, the frame information storage unit 31 holds page information and frame information. The page information describes the outer boundary within which frames are placed and consists of the page number, the page size, the number of characters per line, the number of lines per page, and the number of frames on the page. The frame information specifies, for each of the one or more frames defined on the page, the frame number, the frame type (single column or double column), the frame X and Y position, the frame X and Y size, the numbers of adjacent frames, the structural attributes of the document data to be allocated, and so on. This information is modified, added to, or deleted as needed by the frame information correction unit 61 described later.
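One way to picture the page and frame records described above is as a pair of Python dataclasses. This is only a sketch: the field names, the units (characters and lines), and all concrete values other than the three-line first frame and its accepted attributes are assumptions, since FIG. 4 is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frame:
    number: int
    kind: str                      # "single" or "double" column
    x: int                         # X position (assumed unit: characters)
    y: int                         # Y position (assumed unit: lines)
    width: int                     # X size
    height: int                    # Y size
    adjacent: list[int] = field(default_factory=list)   # adjacent frame numbers
    attributes: set[str] = field(default_factory=set)   # accepted structural attributes


@dataclass
class Page:
    number: int
    size: str
    chars_per_line: int
    lines_per_page: int
    frames: list[Frame] = field(default_factory=list)

    @property
    def frame_count(self) -> int:
        """Number of frames on the page, derived rather than stored."""
        return len(self.frames)


# Hypothetical first page: a three-line header frame above two body columns.
first_page = Page(
    number=1, size="A4", chars_per_line=40, lines_per_page=50,
    frames=[
        Frame(1, "single", x=0, y=0, width=40, height=3, adjacent=[2, 3],
              attributes={"title", "author name", "affiliation"}),
        Frame(2, "double", x=0, y=3, width=20, height=47, adjacent=[1, 3],
              attributes={"chapter heading", "chapter paragraph"}),
        Frame(3, "double", x=20, y=3, width=20, height=47, adjacent=[1, 2],
              attributes={"chapter heading", "chapter paragraph"}),
    ],
)
```

Deriving the frame count from the list avoids the stored count drifting out of step when frames are added or deleted.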

The document shaping unit 40 allocates the document data stored in the document data storage unit 23 to the designated frames, based on its structural attribute information and the frame information stored in the frame information storage unit 31. It consists of a document shaping control unit 41 and a shaping rule dictionary 42. As shown for example in FIG. 5, the shaping rule dictionary 42 holds allocation rules for each structural attribute.

The output state monitoring unit 51 detects mismatches between the document data and the frames, such as document data allocated by the document shaping unit 40 running past the frame boundary, an excessively large margin being left in a frame, or no frame existing on the page for document data that should be allocated there.
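The three kinds of mismatch named above can be sketched as a small check in Python. The dictionary layout, the function names, and in particular the `max_blank_ratio` threshold are assumptions for illustration; the text does not quantify how large a margin counts as excessive.

```python
from __future__ import annotations

from enum import Enum


class Mismatch(Enum):
    NONE = "none"
    OVERFLOW = "overflow"
    NO_FRAME = "no frame"
    EXCESS_MARGIN = "excess margin"


def check_allocation(frames: list[dict], attribute: str, lines: int) -> Mismatch:
    """Check whether `lines` lines with the given attribute fit in some frame."""
    candidates = [f for f in frames if attribute in f["attributes"]]
    if not candidates:
        return Mismatch.NO_FRAME
    if all(f["used"] + lines > f["height"] for f in candidates):
        return Mismatch.OVERFLOW
    return Mismatch.NONE


def check_margin(frame: dict, max_blank_ratio: float = 0.5) -> Mismatch:
    """Flag a frame whose unused share exceeds a hypothetical threshold."""
    blank = frame["height"] - frame["used"]
    if blank / frame["height"] > max_blank_ratio:
        return Mismatch.EXCESS_MARGIN
    return Mismatch.NONE


# A full three-line header frame, as in the FIG. 6(a) situation.
first_frame = {"attributes": {"title", "author name", "affiliation"},
               "height": 3, "used": 3}
overflow = check_allocation([first_frame], "affiliation", 1)
missing = check_allocation([first_frame], "footnote", 1)
```

Here `overflow` reports that a fourth line does not fit, and `missing` reports that no frame accepts a footnote.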

When the output state monitoring unit 51 detects a mismatch between the document data and a frame, the frame information correction unit 61 corrects the contents of the frame information storage unit 31 so that the mismatch no longer occurs.

The output unit 70 outputs the document shaped by the document shaping unit 40 after the contents of the frame information storage unit 31 have been corrected to a mismatch-free state. It consists of an output control unit 71 that governs output, a CRT display 72 that displays the document under the control of the output control unit 71, and a printer 73 that outputs the document displayed on the CRT display 72 on paper.

Next, the operation of the device of this embodiment configured as described above is explained.

When the document shown in FIG. 3 is input to the structure analysis unit 20 in paragraph units via the input unit 10, the structure attribute extraction unit 21 extracts the structural attributes of the input document while referring to the structure analysis dictionary 22. In the example of FIG. 3, sentence number 1 is the first sentence and ends with the noun phrase "document shaping device," so it is judged to be the title. Sentence number 2 is found, from the Japanese dictionary in the structure analysis dictionary 22, to be the proper noun "Taro Yamada," so from the context it is judged to be the author name. Logical structure attributes such as affiliation, chapter heading, and chapter paragraph are extracted in the same way. Once the structural attributes have been extracted for all paragraphs of the input document data, the extracted structural attribute information is stored in the document data storage unit 23 together with the document data.

The document data stored in the document data storage unit 23 is then passed to the document shaping unit 40 for document shaping. Suppose, for example, that the frame information for the first page stored in the frame information storage unit 31 defines three frames #1, #2, and #3 as shown in FIG. 6(a); that the "frame Y size" of the first frame is defined as "3 lines" and its "structural attributes of the document to allocate" as "title," "author name," and "affiliation"; and that the "structural attributes of the document to allocate" for the second and third frames are defined as "chapter heading," "chapter paragraph," and so on. The document shaping unit 40 first reads the first sentence, "document shaping device." Because its attribute is "title," the unit allocates it to the first frame, which accepts "title," centering it on the first line in accordance with the rules of FIG. 5. The document shaping unit 40 then allocates the second sentence, "Taro Yamada," and the third sentence, "○x△ Co., Ltd.," to the first frame in the same way, following the shaping rule dictionary 42. Next it takes the fourth sentence, "xxxx Research Laboratory," and tries to allocate it to the first frame in accordance with its attribute "affiliation." As shown in FIG. 6(a), however, the first frame has space for only three lines, so the fourth sentence overflows. When this happens, the output state monitoring unit 51 detects it and outputs to the frame information correction unit 61 information indicating that an overflow of one sentence has occurred.

On receiving this information, the frame information correction unit 61 accesses the frame information storage unit 31 and increases by one the Y size in the information for the current frame (the first frame). It then retrieves the numbers of the frames adjacent below (here #2 and #3), decreases the Y size of these second and third frames by one, and increases their Y position by one. As a result, as shown in FIG. 6(b), the first frame grows by one line and the second and third frames shrink by one line. The document shaping control unit 41 then retrieves the information for the first frame again from the shaping rule dictionary 42 and allocates the fourth sentence.
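The overflow correction just described (grow the overflowing frame, then shrink and shift the frames adjacent below it) can be sketched in Python as follows. The dictionary keys, the function name `grow_frame`, and the starting heights of frames #2 and #3 are assumptions for illustration.

```python
from __future__ import annotations


def grow_frame(frames: dict[int, dict], number: int, lines: int = 1) -> None:
    """Enlarge one frame and push the frames adjacent below it down.

    The frames below lose the same number of lines, so the page total is unchanged.
    """
    frame = frames[number]
    frame["height"] += lines
    for neighbor in frame["below"]:
        frames[neighbor]["height"] -= lines   # Y size shrinks
        frames[neighbor]["y"] += lines        # Y position moves down


# Hypothetical FIG. 6(a) state: frame 1 is three lines tall, frames 2 and 3 sit below it.
frames = {
    1: {"y": 0, "height": 3, "below": [2, 3]},
    2: {"y": 3, "height": 47, "below": []},
    3: {"y": 3, "height": 47, "below": []},
}
grow_frame(frames, 1)  # corresponds to the transition from FIG. 6(a) to FIG. 6(b)
```

After the call, frame 1 is four lines tall and frames 2 and 3 each start one line lower and are one line shorter.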

From the fifth sentence onward, the structural attributes continue as "chapter heading," "chapter paragraph," and so on, so allocation proceeds from the second frame to the third frame.

When the 11th sentence is then taken out, its structural attribute is "footnote." At this point no frame for holding a "footnote" is defined in the frame information for the first page, so the output state monitoring unit 51 again detects this and outputs to the frame information correction unit 61 information indicating that an allocation to an undefined frame has occurred.

On receiving this information, the frame information correction unit 61, because the structural attribute is "footnote," extracts the place where the footnote is referenced and decreases by one the Y size in the information for the frame containing that reference (the second frame). It then registers new frame information so as to form a new fourth frame for the "footnote" below the second frame. As a result, a fourth frame for the footnote is formed, as shown in FIG. 6(c).
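The footnote handling can be sketched in the same style: the frame containing the footnote reference gives up one line, and a new frame is registered directly beneath it. The function name `add_footnote_frame`, the dictionary keys, and the concrete coordinates are assumptions for illustration; locating the referencing frame is taken as already done.

```python
from __future__ import annotations


def add_footnote_frame(frames: dict[int, dict], host_number: int, lines: int = 1) -> int:
    """Shrink the referencing frame and register a footnote frame below it."""
    host = frames[host_number]
    host["height"] -= lines
    new_number = max(frames) + 1
    frames[new_number] = {
        "x": host["x"],
        "y": host["y"] + host["height"],   # starts where the shrunken host ends
        "width": host["width"],
        "height": lines,
        "attributes": {"footnote"},
    }
    return new_number


# Hypothetical FIG. 6(b) state after the header frame has already grown to four lines.
body = {"chapter heading", "chapter paragraph"}
frames = {
    1: {"x": 0, "y": 0, "width": 40, "height": 4,
        "attributes": {"title", "author name", "affiliation"}},
    2: {"x": 0, "y": 4, "width": 20, "height": 46, "attributes": set(body)},
    3: {"x": 20, "y": 4, "width": 20, "height": 46, "attributes": set(body)},
}
footnote_number = add_footnote_frame(frames, 2)
```

The new frame inherits the horizontal extent of its host, so it occupies the bottom of the same column.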

When the 54th sentence has been allocated, no further sentence exists, so this sentence is known to be the last sentence of the document data. If the frame being allocated to at this point is a double-column frame and, as shown in FIG. 7(a), a margin b1 remains within the Y size a21 + b1 of the sixth frame, the output state monitoring unit 51 detects this and outputs to the frame information correction unit 61 information indicating that a margin has arisen in a double-column frame.

On receiving this information, the frame information correction unit 61 changes the Y size of the sixth frame and of the fifth frame horizontally adjacent to it from a11 to (a11 + a21)/2 = c1 = c2. As a result, as shown in FIG. 7(b), a document can be shaped in which the left and right Y sizes of the double-column frames are equal.
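The balancing rule c1 = c2 = (a11 + a21)/2 is simple enough to check with a short Python sketch. Rounding up to a whole number of lines is an assumption added here for the case where a11 + a21 is odd; the text gives only the formula, and the line counts below are hypothetical.

```python
import math


def balanced_height(a11: int, a21: int) -> int:
    """Common Y size for both columns: (a11 + a21) / 2, rounded up to whole lines."""
    return math.ceil((a11 + a21) / 2)


# Hypothetical counts: 30 filled lines in the fifth frame, 10 in the sixth.
c1 = c2 = balanced_height(30, 10)
```

With these values both columns end up 20 lines tall, so the text that overflowed the shortened fifth frame moves into the sixth.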

Once correct frame information has been registered in this way, the document shaping unit 40 performs the final allocation of the document and outputs the resulting document via the output unit 70.

FIGS. 8 to 10 show the above processing. As shown in FIG. 8, the device shapes the document one paragraph at a time, checks whether an overflow or a missing frame occurs, and runs frame correction when either is detected. As shown in FIG. 9, frame correction selects either the frame correction process or the new frame setting process according to whether the attributes accepted by the original frame match the attribute of the overflowing document data. In FIG. 8, once allocation is complete, document-end shaping is performed. As shown in FIG. 10, when the current frame is double-column and a margin remains in the frame, document-end shaping equalizes the sizes of the left and right frames as described above.
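The paragraph-by-paragraph loop and the choice between frame correction and new frame setting can be sketched as below. This is a deliberately simplified model, not the flow of FIGS. 8 to 10 as drawn: frames are plain dictionaries, the last frame accepting an attribute is used, neighboring frames are not shrunk, and document-end shaping is omitted. All names are hypothetical.

```python
from __future__ import annotations


def shape_document(paragraphs: list[tuple[str, int]], frames: list[dict]) -> list[str]:
    """Allocate (attribute, line count) paragraphs and record each repair made."""
    repairs = []
    for attribute, lines in paragraphs:
        matching = [f for f in frames if attribute in f["attributes"]]
        if not matching:
            # No frame accepts this attribute: new frame setting.
            frame = {"attributes": {attribute}, "height": lines, "used": 0}
            frames.append(frame)
            repairs.append("new frame")
        else:
            frame = matching[-1]
            if frame["used"] + lines > frame["height"]:
                # Attribute matches but the text overflows: frame correction.
                frame["height"] = frame["used"] + lines
                repairs.append("frame correction")
        frame["used"] += lines
    return repairs


frames = [
    {"attributes": {"title", "author name", "affiliation"}, "height": 3, "used": 0},
    {"attributes": {"chapter heading", "chapter paragraph"}, "height": 46, "used": 0},
]
paragraphs = [
    ("title", 1), ("author name", 1), ("affiliation", 1), ("affiliation", 1),
    ("chapter heading", 1), ("chapter paragraph", 5), ("footnote", 1),
]
repairs = shape_document(paragraphs, frames)
```

In this run the second affiliation line triggers a frame correction and the footnote triggers a new frame, mirroring the two cases walked through in the embodiment.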

According to the above embodiment, not only are the frame sizes kept consistent with the document data, but the column heights at the end of a double-column document can also be made equal, so a good-looking, easy-to-read document can be shaped.

The above embodiment described the automatic setting of frames as the allocation areas for document data. The present invention can likewise be applied to the setting of ruled lines by determining, based on structural attributes, which document data falls inside and outside the ruled lines.

[Effects of the Invention] As described above, according to the present invention, document data is allocated to predetermined areas based on its structural attributes and the area information while consistency between the document data and the area information is maintained, so the area information finally set is consistent with the document data. This makes it possible to shape documents that look good and are easy to read.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the present invention. FIGS. 2 to 10 are diagrams for explaining a document shaping device according to an embodiment of the present invention: FIG. 2 is an overall block diagram; FIG. 3 shows document data and the structural attributes extracted by the structure attribute extraction unit; FIG. 4 shows the contents of the frame information storage unit; FIG. 5 shows the contents of the shaping rule dictionary; FIGS. 6 and 7 show in time sequence how frames are formed by the device; and FIGS. 8 to 10 are flowcharts explaining the operation of the device.

10: input unit; 11: keyboard; 12: disk file; 13: input control unit; 20: structure analysis unit; 21: structure attribute extraction unit; 22: structure analysis dictionary; 23: document data storage unit; 40: document shaping unit; 41: document shaping control unit; 42: shaping rule dictionary; 50, 51: output state monitoring unit; 60: area information correction unit; 61: frame information correction unit; 70: output unit; 71: output control unit; 72: CRT display; 73: printer.

Claims

[Claims]

1. A document shaping device comprising: an input unit for inputting document data; a structure analysis unit that analyzes structural attributes of the document data input via the input unit; an area information storage unit that stores area information specifying, for each page, allocation areas corresponding to the structural attributes of the document data; a document shaping unit that allocates the document data to the allocation area corresponding to its structural attribute, based on the structural attributes of the document data obtained by the structure analysis unit and the area information stored in the area information storage unit; an output state monitoring unit that monitors the allocation performed by the document shaping unit and detects any mismatch between the document data and the allocation areas; an area information correction unit that, when the output state monitoring unit detects the mismatch, corrects the stored contents of the area information storage unit so that the mismatch is eliminated; and an output unit that outputs the document allocated by the document shaping unit.

2. The document shaping device according to claim 1, wherein the area information storage unit stores, for each page, page information such as the page number, the page size, and the number of lines per page, and frame information such as the frame number, frame type, frame position, frame size, and allocatable structural attributes of each frame included in the page.

3. The document shaping device according to claim 2, wherein the output state monitoring unit detects a mismatch between the storage space of the allocation area and the document data to be allocated, or the absence of an area to allocate to, and the area information correction unit corrects the frame size or adds a new frame based on information from the output state monitoring unit.