JP2680540B2

JP2680540B2 - Document layout method

Info

Publication number: JP2680540B2
Application number: JP6095145A
Authority: JP
Inventors: 勇岩井; 美和子土井; 利夫岡本
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1994-05-09
Filing date: 1994-05-09
Publication date: 1997-11-19
Anticipated expiration: 2012-11-19
Also published as: JPH06342428A

Description

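Detailed Description of the Invention

[0001] [Industrial Field of Application] The present invention relates to a document processing device that can effectively support the output layout format of document data according to a hierarchical logical structure obtained by analyzing the document structure of that data.

[0002] [Prior Art] In a document processing device such as a word processor, document data is input as a sequence of code information such as character codes and punctuation codes. The document data represented by that sequence is then registered in a document file or output to a printer or display.

[0003] However, simply outputting the character string represented by the code information as-is makes the document very hard to read. It is therefore common practice to format the document by inserting a line-feed code at the end of each coherent group of sentences and a space code at the start of the following line to begin a paragraph.

[0004] Furthermore, the whole document may be divided into several ranges such as chapters and sections, with a heading attached to each, and control codes such as tabs and indents may be inserted to make the document easier to read.

[0005] [Problems to Be Solved by the Invention] However, writing a document while inserting control codes such as these line-feed codes, which have no direct relation to the document content itself, interrupts the writer's train of thought and lowers the efficiency of document creation.

[0006] Moreover, when a document created in this way is re-edited, these control codes must be deleted or inserted elsewhere. Because the control codes inserted into the document data change its document structure, re-editing a document that spans several pages, for example, takes a great deal of effort to keep the document structure consistent throughout; keeping the format consistent requires steps such as looking back at the format used several pages earlier. Editing therefore cannot proceed easily, and no improvement in processing efficiency could be expected.

[0007] The present invention was made in view of these circumstances. Its object is to provide a document processing device that removes the burdens of document processing described above and enables effective document processing by actively using the hierarchical logical structure of a document.

[0008] [Means for Solving the Problems] The present invention distinguishes, in input document data, sentences that serve as headings from the sentences that follow them; compares the heading sentences identified in the document data with one another to determine, successively, the relationships between them; and expands the document data onto output means with at least one of each heading sentence and the sentence following it laid out in a layout format that corresponds to the determined relationship between the heading sentences.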
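[0009] [0010] [Operation] Accordingly, the present invention distinguishes, in the input document data, sentences that serve as headings from the sentences that follow them. The identified heading sentences are then compared with one another to determine, successively, the relationships between them. At least one of each heading sentence and the sentence following it is put into the layout format that corresponds to the determined relationship, and the document data is then expanded. The output layout format of the document data can thus be decided automatically with reference to the heading relationships obtained beforehand, and the document can be displayed or printed in the layout format that corresponds to those results.

[0011] Consequently, when document data is input or edited, it can be processed without attending to its output layout format each time, that is, without considering where to insert line-feed codes and without setting indents or tabs. Moreover, even when the document is processed freely in this way, the document data is still expanded into a predetermined layout format and displayed or printed.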
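[0012] [Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a schematic configuration diagram of the embodiment apparatus. In FIG. 1, reference numeral 1 denotes a document management unit that forms the main body of the apparatus. Document data input as a sequence of code information through an input unit 2 such as a keyboard has a document structure like that shown, for example, in FIG. 2; it is stored in an original-text storage unit 3 under the control of the document management unit 1 and made available for document processing. Document data processed by the document management unit 1 is displayed on a display unit 5 under the control of a display control unit 4.

[0013] FIG. 3 shows the procedure for analyzing the document structure of input document data, and the functions of the apparatus are described following this flow. The document management unit 1 receives document data from the input unit 2 (step a) and stores it in the original-text storage unit 3. At the same time, it detects delimiter codes in the document data, such as line-feed codes, extracts in turn each coherent unit of text delimited by those codes as one sentence, and measures the length of each sentence. Taking each extracted sentence as the unit, it manages and controls the execution of the processing described below.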
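[0014] A heading extraction unit 6 determines whether a sentence extracted by the document management unit 1 could be a heading, using the sentence length measured as described above and a headword dictionary 6a (step b). The headword dictionary 6a registers in advance the words, phrases and symbols that frequently appear in headings, classified by category as shown, for example, in FIG. 4. Specifically, phrases frequently used as headings, such as 「はじめに」 ("Introduction") and 「あらすじ」 ("Synopsis"), are registered together in a category called "heading reserved words", and numerals and symbols that frequently appear in headings are registered under their respective categories.

[0015] The heading extraction unit 6 determines whether the length of the extracted sentence is within a predetermined number of characters (for example, 40 characters) and, if so, judges that the sentence could be a heading. It then searches the headword dictionary 6a for the elements of the sentence (the words, numerals and symbols represented by the code sequence) (step b), and if a matching entry is registered in the headword dictionary 6a, it treats the sentence as a heading candidate (step c).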
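The candidate test of paragraphs [0014] and [0015] can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: FIG. 4 is not reproduced, so the dictionary entries, the category names, the longest-match scan and the function names are all hypothetical.

```python
from typing import Optional

# Hypothetical excerpt of a headword dictionary like FIG. 4: entry -> category.
HEADWORD_DICT = {
    "はじめに": "heading_reserved_word",  # "Introduction"
    "あらすじ": "heading_reserved_word",  # "Synopsis"
    "．": "postfix",                      # full-width period after a number
    ".": "postfix",
    ")": "postfix",
}
MAX_HEADING_LEN = 40  # example threshold given in paragraph [0015]


def scan_categories(sentence: str) -> list[str]:
    """Map a sentence to a category sequence; unregistered text becomes 'body'."""
    cats: list[str] = []
    i = 0
    while i < len(sentence):
        if sentence[i].isspace():
            i += 1
            continue
        if sentence[i].isdigit():  # also true for full-width digits such as "１"
            while i < len(sentence) and sentence[i].isdigit():
                i += 1
            cats.append("number")
            continue
        # Longest dictionary entry that starts at position i, if any.
        match = next((sentence[i:j] for j in range(len(sentence), i, -1)
                      if sentence[i:j] in HEADWORD_DICT), None)
        if match is not None:
            cats.append(HEADWORD_DICT[match])
            i += len(match)
            continue
        if not cats or cats[-1] != "body":  # merge runs of unregistered characters
            cats.append("body")
        i += 1
    return cats


def heading_candidate(sentence: str) -> Optional[list[str]]:
    """Return the category sequence of a heading candidate, or None."""
    if len(sentence) > MAX_HEADING_LEN:
        return None  # too long to be a heading
    cats = scan_categories(sentence)
    if all(c == "body" for c in cats):
        return None  # no registered word, numeral or symbol was found
    return cats
```

With this toy dictionary, the title and author lines of FIG. 2 return None, while 『１．はじめに』 returns the (number part)(postfix part)(heading reserved word) sequence described in paragraph [0023].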
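[0016] A heading determination unit 7 then determines, on the basis of the heading rules stored in its heading rule dictionary 7a and shown in FIGS. 5, 6, 7(a) and 7(b), whether a sentence extracted as a heading candidate by the heading extraction unit 6 matches those rules (step c); if the candidate matches a heading rule, it is judged to be a heading sentence (step e). A sentence in which the heading extraction unit 6 found no headword, and a sentence that was judged a heading candidate but does not satisfy the heading rules, are judged not to be heading sentences (step f); that is, they are judged to be body text such as paragraphs.

[0017] A document structure determination unit 8 determines, for each sentence classified as above as a heading sentence or otherwise, its logical role in the document, such as chapter heading, section heading or paragraph, following the document structure rules stored in a document structure rule dictionary 8a and illustrated, for example, in FIGS. 8, 9, 10(a) and 10(b) (step g). The logical structure of each sentence determined by the document structure determination unit 8 is obtained (step h), and this information is stored in a logical structure storage unit 10 in association with each sentence (step i).

[0018] If logical structure analysis based on the document structure rules fails for a sentence, its logical structure is regarded as erroneous, and error processing is performed, for example prompting the user to correct the input document data (step j).

[0019] A document expansion unit 9 expands each input sentence onto the output section in the layout format defined for its logical structure, following the hierarchical logical structure of the input document data obtained by the analysis above. Specifically, a layout rule dictionary 9a of the document expansion unit 9 stores layout rules that determine the output layout format defined for each type of logical structure. According to the logical structure obtained for each sentence, the document expansion unit 9 retrieves from the layout rule dictionary 9a the layout rule for displaying that sentence's document data on the display unit 5, and expands each sentence's document data to the display control unit 4 according to that rule.

[0020] The display control unit 4 controls the display of the document data on the display unit 5 according to the structure into which it has been expanded. With the apparatus configured as described, the hierarchical logical structure of input document data is analyzed as follows.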
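[0021] When document data is input from the input unit 2, it is stored successively in the original-text storage unit 3 and segmented by the document management unit 1. Segmentation works by determining whether each input code is a line-feed code, a space code, or a delimiter symbol such as 「…」, 「；」 or 「：」, and cutting the input code sequence into individual sentences at these delimiter codes. At the same time, the length (number of characters) of each sentence is measured, for example by counting the codes in each sequence delimited by the delimiter codes.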
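A minimal Python sketch of the segmentation in paragraph [0021]. It assumes the delimiter codes are simply dropped and that the full-width space (U+3000) counts as a space code; the text does not say what happens to the delimiter itself, and the function name is hypothetical.

```python
import re

# Delimiter codes named in paragraph [0021]: line feed, space, "…", "；" and "：".
# Including the full-width space U+3000 is an assumption.
DELIMITERS = re.compile(r"[\n \u3000…；：]")


def split_sentences(text: str) -> list[tuple[str, int]]:
    """Cut the input code sequence into sentences and measure each length."""
    return [(piece, len(piece))            # length = number of characters
            for piece in DELIMITERS.split(text)
            if piece]                       # skip empty runs between delimiters
```

Fed the first three lines of FIG. 2, this returns each line together with its character count, which is the length information the heading extraction unit 6 uses.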
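[0022] Consider the input of the document data shown in FIG. 2. When the first-line sentence 『文書構造理解システム』 ("Document Structure Understanding System") and the second-line sentence 『大川太郎』 ("Taro Okawa"), each delimited by a line-feed code, are given, the heading extraction unit 6 judges that neither is a heading, because none of their words is registered in the headword dictionary 6a. The document structure determination unit 8 judges the attribute of the first-line sentence to be the title, because it satisfies rules such as "a noun phrase appearing at the beginning of the document". It judges the second-line sentence to be the author name, following rules such as "a proper noun, in particular one denoting a person's name, that appears after the title".

[0023] Next, when the third-line sentence 『１．はじめに』 ("1. Introduction") is given, its components 『１』, 『．』 and 『はじめに』 are each found in the headword dictionary 6a. The sentence is therefore judged to be heading candidate A, and the sequence of categories that make up the candidate is obtained as "(number part)(postfix part)(heading reserved word)".

[0024] The heading determination unit 7 then consults the heading rule dictionary 7a to check whether the structure of the sentence judged to be heading candidate A conforms to the heading rules. This check first parses the sequence of categories that make up candidate A and then determines whether the parsed structure satisfies the heading conditions shown in FIG. 5. Here, the category sequence "(number part)(postfix part)(heading reserved word)" is parsed as shown in FIG. 11 according to the rules of FIGS. 5, 6, 7(a) and 7(b); since it is confirmed to form a heading pattern, the sentence is judged to be heading B. If this check finds that the category sequence matches none of the conditions shown in FIGS. 5, 6, 7(a) and 7(b), the candidate is determined not to be a heading.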
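Because FIGS. 5 to 7 are not reproduced, the following Python sketch stands in for the heading rules with regular expressions over the category sequence. The rule set, the heading-type labels and the function name are hypothetical; the sketch only illustrates the check of paragraph [0024], in which a candidate is accepted as a heading only if its whole category sequence fits some rule.

```python
import re
from typing import Optional

# Hypothetical heading rules: (heading type, pattern over space-joined categories).
HEADING_RULES = [
    ("number+postfix",
     re.compile(r"number postfix (heading_reserved_word|body)")),
    ("number.number",
     re.compile(r"number postfix number (postfix )?(heading_reserved_word|body)")),
    ("circled_number",
     re.compile(r"circled_number (heading_reserved_word|body)")),
]


def match_heading(categories: list[str]) -> Optional[str]:
    """Return the heading type of the first rule the sequence matches, else None."""
    sequence = " ".join(categories)
    for heading_type, rule in HEADING_RULES:
        if rule.fullmatch(sequence):  # the whole sequence must fit the rule
            return heading_type
    return None  # candidate rejected: judged not to be a heading
```

Returning a heading type rather than a plain yes/no is a design choice made here so that a later step can compare headings by type.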
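[0025] Once heading B has been obtained, the document structure determination unit 8 determines which of the rules illustrated in FIGS. 8, 9, 10(a) and 10(b) its document structure corresponds to. In this case, the logical structures of the sentences analyzed so far are "title" and "author name", as described above, and no chapter heading has yet appeared, so the heading is checked against the rules shown in FIG. 8. This check finds that the heading candidate 『１．はじめに』 matches conditions (1), (1,1), (1,1,1) and (1,1,1,1) of the rules in FIG. 8, and it is uniquely determined that the heading 『１．はじめに』 constitutes chapter heading C. This logical structure information is then stored in the logical structure storage unit 10.

[0026] When the sentence spanning lines 4 and 5 is then input, its character count exceeds the predetermined number of characters within which a sentence may be a heading, so it is judged to be a non-heading sentence. Since it satisfies the document structure rule shown in FIG. 10(a), it is judged to be a sentence forming a paragraph.

[0027] In the same way, each sentence delimited by the delimiter codes is judged to be a heading sentence or not, and its document structure is obtained in turn by checking it against the document structure rule dictionary 8a.

[0028] For example, when another sentence forming a heading is input, it is detected as heading candidate A as in the previous example and judged to conform to the heading rules of FIGS. 5, 6, 7(a) and 7(b). For this heading B, the contents of the logical structure storage unit 10 show that a chapter heading has already been detected, so the document structure rules of FIG. 9 are consulted first to find which condition the pattern of heading B matches. If the pattern of the candidate matches conditions (1,1), (2,1), (3,1) and (4,1), it is judged that the heading may be at the same level as the previously obtained chapter heading C. The heading is then checked against the rules of FIG. 8; if it matches conditions (1), (1,1), (1,1,2), (1,1,2,2) and (1,1,2,2,1) of the document structure rules in FIG. 8, it is judged to be a chapter heading at the same level as the chapter heading already determined. In other words, the levels of previously obtained headings are always consulted: a heading of a different type from those that appeared earlier is assigned a new level, and a heading of the same type as one that appeared earlier is assigned the same level.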
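The rule summarized at the end of paragraph [0028] (a heading of a type already seen gets that heading's level; a new type gets a new level) can be sketched as follows. The class name and heading-type strings are hypothetical, and placing a new type one level below the most recent heading is an assumption that stands in for the FIG. 8 and FIG. 9 conditions, which are not reproduced.

```python
class LevelAssigner:
    """Assign heading levels by comparing each heading with earlier ones."""

    def __init__(self) -> None:
        self.level_of_type: dict[str, int] = {}  # heading type -> level seen before
        self.current_level = 0                   # level of the latest heading (0 = none)

    def assign(self, heading_type: str) -> int:
        if heading_type in self.level_of_type:
            # Same type as an earlier heading: reuse that heading's level.
            level = self.level_of_type[heading_type]
        else:
            # New type (assumption): open a level just below the latest heading.
            level = self.current_level + 1
            self.level_of_type[heading_type] = level
        self.current_level = level
        return level
```

For the type sequence "number+postfix", "number.number", "circled_number", "number.number", "number+postfix" this yields levels 1, 2, 3, 2, 1. A limitation of the sketch: a brand-new type that first appears after a return to a shallow level may receive a level number already held by another type, a case this sketch does not resolve.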
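[0029] When a heading sentence beginning with a circled numeral is given, no heading with a similar pattern has been detected before, so matching against the rules (conditions) of FIG. 9 fails. As a result, the heading is judged to be at a different level from the headings obtained earlier. Subsequent matching against the document structure rules of FIG. 8 then detects a match with, for example, conditions (1), (1,1), (1,1,2), (1,1,2,2), (1,1,2,2,2) and (1,1,2,2,2,1), and the heading sentence is judged to be an itemized-list heading.

[0030] When a sentence is judged to be a paragraph, it may be unclear which level of heading the paragraph belongs under. In such a case, the connection between the paragraph and the headings can be determined, and the paragraph's level set, by referring for example to the rules shown in FIG. 10(b).

[0031] Through this document structure analysis, the document structure of each sentence making up the input document data is obtained, and the resulting hierarchical logical structure information is stored in the logical structure storage unit 10 in association with the input document data, for example as shown in FIG. 12.

[0032] Briefly, in the logical structure data of the document shown in FIG. 12, the information for each sentence is enclosed in { } and associated with the document data enclosed in [ ]. The leading number gives the hierarchical level of the sentence, the next item gives its sentence attribute, and the data beginning with the symbol z give the analysis result for the document data. These analysis results have, for example, the following meanings.

[0033] z1 is the symbol part, z2 the opening alphanumeric bracket, z3 the reserved-word part, z4 alphanumeric part 1, z5 alphanumeric part 2, z6 the closing alphanumeric bracket, z7 postfix part 1, ..., z21 the heading reserved word, z22 the start position of the main heading body, z23 the end position of the main heading body, z24 the trailing heading symbol, ..., z28 the start position of the content part, and z29 the end position of the content part.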
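As a rough illustration of the FIG. 12 notation described in paragraphs [0032] to [0034], the following Python sketch renders one sentence as its text in [ ] followed by {level, attribute, z-codes}. FIG. 12 is not reproduced, so the field separators, the "code:value" spelling, the attribute name "chapter" and the class name are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class StructureRecord:
    text: str       # the sentence's document data, shown in [ ]
    level: int      # hierarchical level, the leading number inside { }
    attribute: str  # sentence attribute, e.g. "title"
    analysis: dict[str, str] = field(default_factory=dict)  # z-code -> what it covers

    def render(self) -> str:
        codes = " ".join(f"{code}:{value}" for code, value in self.analysis.items())
        parts = [str(self.level), self.attribute, codes]
        body = " ".join(p for p in parts if p)  # drop an empty z-code part
        return f"[{self.text}]{{{body}}}"
```

For 『１．はじめに』 with z4 (alphanumeric part 1), z7 (postfix part 1) and z21 (heading reserved word), this yields `[１．はじめに]{1 chapter z4:１ z7:． z21:はじめに}` in the assumed notation.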
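[0034] Each such symbol is followed by information stating what the document data is or at which character position it lies, and together these express the analysis result. The document expansion unit 9 then expands the document data onto the output section, following the hierarchical logical structure information obtained for it as described above, in the following way.

[0035] FIG. 13 shows the hardware configuration of the document expansion unit 9. A logical structure data reading unit 11 reads the document data and its logical structure data, as shown in FIG. 12, from the logical structure storage unit 10; the document data is cut out one character at a time by a single-character cutting unit 12 and passed to a document data allocation unit 13. Meanwhile, a sentence attribute extraction unit 14 extracts sentence attribute data such as "title" and "auther" shown in FIG. 12 and passes it to a layout information detection unit 15, and a layout information analysis unit 16 reads the logical structure information described with the z symbols above.

[0036] The layout rule dictionary 9a stores layout rules consisting of information on the frames (document frames) in which document data is output and information on how each sentence is to be laid out according to its sentence attribute, as shown for example in FIGS. 14(a) and 14(b). Specifically, as shown in FIG. 14(a), the frame information consists of a frame management number on the output device, such as a display or printer, the frame's output position, and its size; these define the frame layout of the display screen or printed page on which the document data is output, as shown for example in FIG. 15. Frame information is defined for each of the frames set up for outputting the document data.

[0037] A layout rule reading unit 17 reads these frame rules from the layout rule dictionary 9a, stores them in a frame format information buffer 18, and passes the information to an arithmetic unit 19, which calculates the maximum number of columns and the maximum number of lines of the frame. These values are set in a maximum column/line value buffer 20 and used to control the expansion and output of character data, as described later.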
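The calculation attributed to the arithmetic unit 19 in paragraph [0037] amounts to dividing the frame size by the size of one character cell. A minimal Python sketch, assuming frame positions and sizes are given in dots and that every character occupies a fixed-size cell; neither unit is stated in the text, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    number: int  # frame management number (FIG. 14(a))
    x: int       # output position: left edge, in dots (assumed unit)
    y: int       # output position: top edge, in dots
    width: int   # frame size, in dots
    height: int


def max_columns_lines(frame: Frame, char_width: int, line_pitch: int) -> tuple[int, int]:
    """Maximum whole characters per line and whole lines that fit in the frame."""
    return frame.width // char_width, frame.height // line_pitch
```

A 400 by 300 dot frame with 16-dot characters on a 24-dot line pitch gives 25 columns and 12 lines; partial cells are discarded by the integer division.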
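[0038] The layout rules for sentence attributes stored in the layout rule dictionary 9a are defined, for example, as shown in FIG. 14(b). These rules specify, for each type of sentence attribute, how its document data is to be laid out on the frames defined as described above.

[0039] Here, for example, "lf" indicates a forced line break at the end of the target character string, and "Ce" indicates that the target string is centered. "ds;1" indicates one line feed after the target document data is output, and "us;1" indicates one line feed before it is output. "ls;1" indicates a one-character margin (blank) from the left edge of the frame. For a paragraph, the first character of its first line is left blank.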
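The codes listed in paragraph [0039] can be decoded into a small record. This Python sketch assumes a rule is written as space-separated codes such as "lf Ce ds;1" and that a missing count means 0; FIG. 14(b) may use a different syntax, and the class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LayoutRule:
    break_after: bool = False  # "lf": forced line break at the end of the string
    center: bool = False       # "Ce": center the string
    lines_after: int = 0       # "ds;n": n line feeds after output
    lines_before: int = 0      # "us;n": n line feeds before output
    left_margin: int = 0       # "ls;n": n-character margin from the frame's left edge


def parse_rule(codes: str) -> LayoutRule:
    """Decode a space-separated layout rule string (assumed syntax)."""
    rule = LayoutRule()
    for code in codes.split():
        name, _, arg = code.partition(";")
        count = int(arg) if arg else 0
        if name == "lf":
            rule.break_after = True
        elif name == "Ce":
            rule.center = True
        elif name == "ds":
            rule.lines_after = count
        elif name == "us":
            rule.lines_before = count
        elif name == "ls":
            rule.left_margin = count
        else:
            raise ValueError(f"unknown layout code: {code}")
    return rule
```

The title rule of paragraph [0046], "lf Ce ds;1" in this assumed syntax, decodes to a centered string with a forced break and one line feed afterwards.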
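[0040] These layout rules are read from the layout rule dictionary 9a according to the type of sentence attribute, under the control of the layout information detection unit 15, and passed to the layout information analysis unit 16.

[0041] On the basis of the layout rules obtained in this way, the layout information analysis unit 16 controls the document data allocation unit 13 so that the document data, cut out one character at a time as described above, is expanded in order into the specified frame.

[0042] That is, the layout information analysis unit 16 determines, from the frame information, the frame into which the document data is to be expanded, and at the same time determines that frame's size. It then controls the expansion of the document data into the specified frame according to the layout rule for the sentence attribute.

[0043] Each time the single-character cutting unit 12 cuts out a character of the document data, the count of a column/line counter 21 is incremented in step with it. A comparison unit 22 successively compares this count with the values set in the maximum column/line value buffer 20, and a judgment unit 23 evaluates the comparison result to determine whether the layout position of the character cut out by the single-character cutting unit 12 lies within the currently specified frame.

[0044] When it is determined that the allocation position of the document data being laid out has gone beyond the specified frame, a frame switching unit 24 is activated, and the frame to which the document data allocation unit 13 allocates document data is advanced to the next frame.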
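The counter-and-compare loop of paragraphs [0043] and [0044] can be modeled in a few lines of Python. This is a sketch under simplifying assumptions: each frame is reduced to its (maximum columns, maximum lines) pair, positions are 0-based, the function name is hypothetical, and running out of frames raises an error, a case the text does not discuss.

```python
def allocate(text: str, frames: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Give each character a (frame index, line, column) position, all 0-based.

    `frames` holds (max_columns, max_lines) for each frame in output order.
    """
    positions = []
    frame, line, col = 0, 0, 0  # plays the role of the column/line counter
    for _ in text:
        max_cols, max_lines = frames[frame]
        if col >= max_cols:    # past the last column: continue on the next line
            col, line = 0, line + 1
        if line >= max_lines:  # past the last line: switch to the next frame
            frame, line, col = frame + 1, 0, 0
            if frame >= len(frames):
                raise ValueError("text does not fit in the available frames")
        positions.append((frame, line, col))
        col += 1               # count up once per character cut out
    return positions
```

With a first frame of 3 columns by 2 lines, the seventh character overflows the frame and is placed at the start of the second frame.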
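[0045] In this way, the document expansion unit 9 sets the frame in which the document data is to be output according to the layout rules held in its layout rule dictionary 9a, and expands the document data into that frame according to the layout rule set for each sentence attribute.

[0046] For document data with the logical structure obtained as shown in FIG. 12, for example, the first sentence is the title, whose layout rule is given by "lf", "Ce" and "ds;1". The layout information analysis unit 16 therefore first obtains the maximum column and line counts of the set frame from the maximum column/line value buffer 20 and computes the character position at which centered output should begin, for example as (MAX line length − number of characters in the target sentence) ÷ 2. The count of the column/line counter 21 is preset to this value, and the target sentence's document data is expanded, that is, assigned character positions, starting from that position.

[0047] When allocation of the target sentence's character data has finished, a forced line break is performed according to "ds;1", and one blank line is set below the character string of the output title.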
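Paragraphs [0046] and [0047] describe how the title rule "lf", "Ce", "ds;1" is applied. A minimal Python sketch for a sentence that fits on one line, assuming the centering offset uses the frame's maximum column count with integer division and is clamped at zero; the text gives only the formula (MAX line length minus the number of characters) ÷ 2, and the function name is hypothetical.

```python
def lay_out_line(text: str, max_columns: int, center: bool = False,
                 lines_before: int = 0, lines_after: int = 0,
                 left_margin: int = 0) -> list[str]:
    """Return the output lines for one sentence short enough to fit on one line."""
    if center:
        # Preset the column counter to (max columns - length) // 2, never below 0.
        start = max((max_columns - len(text)) // 2, 0)
    else:
        start = left_margin  # "ls;n" margin, if any
    blank = [""]  # one blank line, as produced by "us;1" or "ds;1"
    return blank * lines_before + [" " * start + text] + blank * lines_after
```

In this sketch the forced break of "lf" is implicit, because each returned string is one output line; a four-character title in a ten-column frame starts at column 3 and is followed by one blank line.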
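[0048] This output (expansion) control is performed in turn for each sentence cut out as described above, according to the logical structure obtained for it, and the document data is expanded in order into the specified frames. As a result, document data whose hierarchical logical structure has been obtained as shown in FIG. 12 is output according to the layout rules specified for it, as illustrated for example in FIG. 16.

[0049] Thus, with the apparatus configured as described, the hierarchical logical structure of the document is obtained from the input document data, and the document data is expanded, frame by frame, in the layout format defined for each sentence attribute by the layout rules that follow that hierarchical logical structure.

[0050] Specifically, the document data is allocated in order to the defined frames, and controls such as forced line breaks and centering are applied according to the document structure information of each sentence, which follows the hierarchical logical structure of the text, that is, according to its sentence attribute.

[0051] Because the document data can be expanded onto the output means in a layout format that corresponds to the relationships obtained by comparing the heading sentences identified in the document data, for those heading sentences and the sentences following them, document creation and editing can be carried out efficiently with attention only to the document's content and none to its layout format. Moreover, even when the document is processed in this way, its output layout format is controlled automatically by the hierarchical document structure analyzed as described above and by the layout rules specified by that structure (the sentence attribute), so the document can be produced in a format that is consistent throughout.

[0052] The writer's train of thought is therefore not interrupted by the need to control the output format, and document processing can be carried out simply and effectively, which is of considerable practical benefit.

[0053] The present invention is not limited to the embodiment described above. For example, the data format representing the logical structure of a document, the rules for obtaining that logical structure, and the layout rules that determine the output format of the document data may be defined according to the specifications at hand. Heading extraction, document structure determination and output control may also take variations in typeface (character font) into account. In that case, highlighted or reverse display of output characters can of course be controlled at the same time; it is sufficient to set such control information as layout rules.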
論理構造を求めた後、その出力レイアウト形式の制御を
行ったが、１文毎にそのレイアウト展開処理を行っても
良い。この場合、その論理構造を一旦蓄積することな
く、直接的にレイアウト展開処理に用いるようにしても
良い。更には、１つの文属性について複数のレイアウト
規則を準備しておき、その中から選択的にレイアウト規
則を適用することも可能である。これによって同一文属
性で異なった用途のレイアウト形式、例えば学会論文形
式と社内報告書形式等に適宜変換することができ、種々
のレイアウト編集を容易に行うことが可能となる。また
ここでは日本語を例に説明したが、他国語に対しても同
様に適用可能なことは勿論のことである。その他、本発
明はその要旨を逸脱しない範囲で種々変形して実施する
ことができる。【００５５】【発明の効果】以上説明したように本発明によれば、文
書データから判別される見出し文および該見出し文に続
く文について、見出し文同士を比較して求められた関係
に応じたレイアウト形式で文書データを出力手段に展開
できるので、文書処理をレイアウト形式を配慮すること
なしに効率良く実行することができ、これにより文書出
力形式の制御のためにその思考過程を妨げられることな
く、その文書処理を効果的に進めることを可能ならしめ
るなどの実用上多大なる効果を奏せられる。DETAILED DESCRIPTION OF THE INVENTION [0001] BACKGROUND OF THE INVENTION The present invention provides a document structure for document data.
According to the hierarchical logical structure obtained by analysis, the document
To effectively support the output layout format of the data
The present invention relates to a document processing device that can be used. [0002] 2. Description of the Related Art Document processors such as word processors are used.
Is a system of code information such as character codes and punctuation marks.
Document data is input as a column. And that code
Register document data represented by a series of information in a document file
Or output to a printer or display. However, the character string indicated by the code information is
The document is very difficult to read if you just output it.
No. So, in general, a group of sentence breaks
Inserts a line break code at the position and the next point after the line break.
Insert a space code on the head and add a paragraph,
The document format is adjusted. Furthermore, the entire document is divided into a plurality of chapters and sections, for example.
Divide into the range of, and add a heading for each group,
To make the document easier to read, tabs, indents, etc.
The control code of is also inserted. [0005] [Problems to be Solved by the Invention] However, the original document data
-There is no direct relation to the data, such as the line feed code mentioned above
Creating a document while inserting the control code of
This hinders the thinking of the document and causes a decrease in document creation efficiency.
ing. Further, the document created in this way is edited.
When re-doing, delete the control code mentioned above or
It is necessary to insert it in another place. this
At this time, the document structure is controlled by the control code inserted in the document data.
Due to changes in construction, for example, several pages of documents
When editing, unify the document structure as a whole
Had problems such as requiring a lot of labor. For example
To unify the document formats, see the document formats several pages before.
Procedures such as lighting are required. Therefore, you can easily
It is impossible to proceed with the editing process and the processing efficiency is improved.
I couldn't hope for it. The present invention has been made in consideration of such circumstances.
The purpose of the document is the hierarchical structure of the document.
By actively using the logical structure for document processing
To eliminate the annoyance of document processing described above
Providing a document processing device that enables call processing
is there. [0008] According to the present invention, an input sentence
The sentence that becomes the heading from the calligraphy data and the sentence that follows this heading sentence
And a headline sentence determined from the document datasame
MasterBy sequentially obtaining the relationship between the headline sentences,
At least one of the heading sentence and the sentence following the heading sentence
The requested headlineOne anotherLayout according to the relationship of
Output the above document data as a formatmeansTo expand to
doing. [0009] [0010] [Action]As a result, according to the present invention, the input document data
The sentence that becomes the heading from the data and the sentence that follows this heading sentence
Determine. Then, for this determined headline sentence,
Compare the headline sentences and find the relationship between them.
Confuse. The headline sentence and the number of sentences following the headline sentence are small.
At least one of them, depending on the desired heading sentence companionship
The document data is expanded after the layout format is changed.
I'm trying. This allows the document data output layout
Do not refer to the relationship between headline sentences
However, the layout can be determined automatically, and the layout according to these results
Can be displayed on the display or printed out depending on the format
You. Therefore, at the time of inputting document data, or its edition
Consider the output layout format one by one when collecting data.
Not, that is, considering the insertion position of the line feed code,
Process the document without setting dents or tabs
It becomes possible. Moreover, in this way
However, the document data has a predetermined layout.
Expanded to format and displayed or printed
Will be empowered. [0012] Embodiments of the present invention will now be described with reference to the drawings.
explain. FIG. 1 is a schematic configuration diagram of the embodiment apparatus. FIG.
In FIG. 1, reference numeral 1 is a document management unit that constitutes the main body of the apparatus. Keyboard
-A series of code information via the input unit 2 composed of code etc.
The document data input as is, for example, a sentence as shown in FIG.
It has a document structure and is under the control of the document management unit 1 described above.
Is stored in the original text storage unit 3 and is used for document processing. Soshi
The document data processed by the document management unit 1 is displayed.
Under the control of the control unit 4, it is displayed on the display unit 5.
ing. FIG. 3 shows a document structure solution for input document data.
It shows the analysis processing procedure, and along with this flow
The function of the device will be described. Is the document management unit 1 the input unit 2?
Enter the document data (step a),
The delimiter code in the document data is stored in the storage unit 3.
-For example, a line feed code is detected and this delimiter code is
A group of sentences that are further separated is extracted in order. same
Sometimes the length of one sentence is measured. And extracted 1
Manages and controls the execution of the following processes in units of statements
ing. The headline extraction unit 6 is extracted by the document management unit 1.
Check whether there is a possibility that the one sentence
Information on the length of one sentence measured as described above, and the entry word dictionary
The determination is made by referring to 6a (step b). This headword
The dictionary 6a is a word or symbol that frequently appears as a headline.
For example, as shown in FIG.
It is registered in advance. Specifically, it appears as a headline
Frequent words such as "Introduction" and "Synopsis"
Are collectively registered in the category of "heading reserved word", and
Numbers and symbols that appear frequently as headings
Are registered together for each category. The headline extraction unit 6 determines that the length of the extracted sentence is
Determine whether it is within a predetermined number of characters (for example, 40 characters)
If the number of characters is within the specified number, the possibility of heading
It is determined that there is. And about this sentence, that sentence
(Words, numbers and symbols shown in the code information sequence)
It is searched whether it is registered in the word dictionary 6a (step
B), the corresponding word / phrase is registered in the headword dictionary 6a.
If this is the case, this is the heading candidate (step
C). Therefore, the headline judging section 7 determines the headline rule.
5, 6 and 7 (a) stored in the rule dictionary 7a
Based on the heading rule as shown in (b), the heading extraction
The sentence extracted as a heading candidate in part 6 is the heading rule
Is determined (step c), and
If the heading candidate matches the heading rule,
It is determined that it is a headline sentence (step e). In addition, before
The sentence in which the entry word is not detected by the entry extraction unit 6,
And even if the sentence is judged as a heading candidate,
Sentences that do not meet the issuing rules are judged not to be headline sentences
(Step f). In other words, it is the text of a document such as a paragraph.
Is determined. The document structure determination unit 8 finds the index as described above.
For each sentence determined to be a sentence or other sentence
Stored in the document structure rule dictionary 8a, for example, in FIG.
Document structure rules as shown in FIGS. 9 and 10A and 10B
The sentence is a chapter heading or section heading
The logical structure of the document such as
(Step g). This is determined by the document structure determination unit 8.
The logical structure of each of the above sentences (step h)
Logical structure memory by associating physical structure information with each sentence
It is stored in the section 10 (step i). Note that the logical structure analysis based on the document structure rule is lost.
Regarding the lost sentence, it is said that there is an error in its logical structure.
Error such as urging correction of input document data.
Processing is performed (step j). The document expanding section 9 is for the above document structure analysis processing.
Hierarchical logic for input document data obtained by
According to the structure, each sentence that is input according to its logical structure
It is expanded to the output part in the specified layout format. That is,
The layout rule dictionary 9a of the document expansion unit 9 stores its logic.
Determine the output layout format determined according to the type of structure
The layout rule to be specified is stored. Document development unit 9
Is the logic obtained for each sentence as described above.
The document data of each sentence is displayed on the display unit 5 according to the structure.
The above-mentioned layout rule dictionary 9 is used to show the layout rules for indicating.
a, and the document data of each sentence according to the layout rule.
The data are expanded in the display control unit 4. The display control unit 4 is developed in this way.
Of the document data described above according to the development structure of the document data.
The display on the display unit 5 is controlled. Hiding
According to the device configured as
The hierarchical logical structure of the document data is analyzed. When the document data is input from the input unit 2,
When the document data is sequentially stored in the original text storage unit 3,
Then, the document management unit 1 performs a division process. This break
The reason is that the input code information is line feed code and space code.
Or is it a delimiter such as "..." ";" ":"
The input code information is judged by these delimiter codes.
This is done by cutting out a series of reports for each sentence. This
In this case, the code information separated by the above delimiter code
The length of the sentence (character
Number) is measured. Now, the document data shown in FIG.
To explain the case where it is entered, the line feed code
The first line sentence "Document structure understanding system" separated by
And given the sentence "Taro Okawa" on the second line,
For each of these sentences, the corresponding phrase is found in the entry word dictionary 6a.
Since it is not registered, the headline extraction section 6
It is determined that it is not a headline. The document structure determination unit 8
For example, the sentence in the first line is a noun phrase that appears at the beginning of the document.
Since it conforms to certain rules, its attribute is the title.
It is judged as Also, the sentence on the second line is unique
A noun, especially a proper noun indicating a person's name, that appears after the title
It is judged as the author's name according to the rules such as
You. Then, the sentence "1. Introduction ”
Is given, "1" ". 』
The word "Introduction" is changed from the entry word dictionary 6a.
Each is found. As a result, this sentence is heading candidate A.
And the categories that make up the heading candidate at the same time.
Gori says “(number part) (postfix part) (heading reserved word)”
Required. Then, the headline determination section 7 determines that this headline candidate
The sentence structure judged as A conforms to the heading rules
Whether or not to search is checked by referring to the index rule dictionary 7a. this
In the heading determination process, first, the categories that form heading candidate A are
The sequence of gorigo is analyzed, and the analysis structure is as shown in FIG.
It is determined whether or not the conditions for delivery are satisfied. this
In the case of the above category “(number part) (postfix part)
About) is shown in FIGS. 5, 6 and 7 (a) (b).
According to the rule, the heading pattern is analyzed as shown in FIG.
It is confirmed that the
Is determined. It should be noted that the above categorization is performed by this determination processing.
The arrangement of the lines is shown in FIGS. 5, 6 and 7 (a) (b).
If it is determined that none of the
It is determined that the heading candidate is not a heading. When the above-mentioned heading B is obtained,
In addition, the document structure determination unit 8 has the document structure shown in FIGS.
And which of the rules illustrated in FIGS. 10 (a) and 10 (b)
It is determined whether to do it. In this case, ever analyzed
As described above, the logical structure of the sentence is “title” and “author name”.
And it is shown in Fig. 8 because the chapter headings have not appeared.
Matched with heading rules. The above sentence by this collation
"1. The headline candidate "Introduction" is the rule article shown in Fig. 8.
Items (1) (1,1) (1,1,1) (1,1,1,1)
Is found to match the heading “1. "Introduction"
It is uniquely determined that the chapter heading C is formed. So
Then, the information of the logical structure is stored in the logical structure storage unit 10.
Is done. Then, the sentences on the 4th to 5th lines are input.
If the number of characters is
Since it exceeds the number of characters, it is judged as a sentence other than the headline
Is done. And in this case, the sentence is shown in FIG.
A paragraph is constructed because it conforms to the document structure rules shown below.
It is determined that the sentence is Thereafter, similarly, the division code is used for division.
It is determined whether the cut sentence is a headline sentence, and
The document structure is sequentially obtained by collating with the construction rule dictionary 8a.
It is. For example, the sentences forming the headline are input again.
Then, it is detected as the heading candidate A in the same manner as the previous example.
FIG. 5, FIG. 6 andFigureHeading rule shown in 7 (a) (b)
It is determined that the rule is met. And in this heading B
Regarding the contents of the logical structure storage unit 10,
Since it is shown that the protrusion is detected, first, in FIG.
Referring to the document structure rules shown below, the pattern of heading B above
Find out which conditions apply. And the headline candidate
Pattern is condition (1,1) (2,1) (3,1)
If it falls under (4,1), find the chapter found earlier.
It is possible that the heading is the same level as C
Is determined. After that, whether it complies with the rules shown in FIG. 8 above
The condition (1) of the document structure rule shown in FIG. 8 is checked.
(1,1) (1,1,2) (1,1,2,2) (1,
1, 2, 2, 1), the headline is already
The chapter heading of the same level as the previous chapter heading judged by
To determine.That is, here is the finding that was sought before
Always refer to the Shino level and differ from the headlines that appeared earlier
If it is a heading of a type, it is determined as a new level heading.
Defined and is the same type of headline that came out earlier
You'll come to a decision with the same level of headlines. The headline with a circled number at the beginning gives
If a similar pattern heading was issued before that,
Matching with the rules (conditions) shown in Fig. 9 because they have not been detected
Matching is unsuccessful. As a result of this,
It is determined that the
It is. After that, in the collation with the document structure rule shown in FIG.
Therefore, for example, condition (1) (1,1) (1,1,2)
(1,1,2,2) (1,1,2,2,2) (1,1,
2, 2, 2, 1) is detected and the headline
Judged as a bulleted heading. When the sentence is determined to be a paragraph
Tells you what level of heading the paragraph received.
I may not understand. In such a case, for example,
Refer to the rules shown in 10 (b), and connect paragraphs and headings.
You should determine the continuity relationship and set the level.
No. By such document structure analysis processing,
The document structure of each sentence that makes up the input document data of
The information of the hierarchical logical structure that is obtained is input document data.
For example, as shown in FIG.
It is stored in the manufacturing storage unit 10. In the logical structure data of the document shown in FIG.
To explain briefly, the information of the document is [].
Corresponding to the document data shown in the box, {}
It is surrounded by, and the numerical value at the beginning is the hierarchical level of the sentence,
The following information describes the sentence attribute and data starting with the symbol z.
Shows the analysis result of document data. This document data
The data analysis results have the following meanings, for example. Here, z1 is a symbol part, and z2 is an alphanumeric bracket.
At the beginning, z3 is a reserved word part, z4 is an alphanumeric part 1, z5 is an alphanumeric character.
The letter part 2, z6 is the end of alphanumeric brackets, z7 is the trailing part 1, ...,
z21 is a reserved word for heading, and z22 is the starting position of the main heading body.
Position, z23 is the end position of the main body of the main heading, and z24 is the rear heading.
Symbols, ~, z28 is the start position of the content part, and z29 is the end position of the content part.
It is a position. After a symbol having such a meaning, the sentence
Information about what the writing data is or what letter character
The analysis result is expressed by adding. Well, the document exhibition
The opening 9 is for the document data obtained as described above.
According to the information of the hierarchical logical structure, the document data is as follows.
It is being developed in the output section. FIG. 13 shows the hardware structure of the document expanding section 9.
It shows. The logical structure data reading unit 11
As shown in FIG. 12 stored in the logical structure storage unit 10.
It reads the document data and its logical structure data.
The document data of is cut out for each character by the one-character cutout unit 12.
It is provided to the document data allocation unit 13. On this occasion,
The sentence attribute extracting unit 14 displays the "titl" shown in FIG.
The sentence attribute data such as "e" and "auther" is extracted and extracted.
The layout information is given to the output information detection unit 15 again.
The information analysis unit 16 is described with the above-mentioned z symbol attached.
Read out the logical structure information. In the layout rule dictionary 9a, for example,
For example, as shown in FIGS. 14 (a) and 14 (b), the document data is output.
According to the information about the frame (document frame)
From the information on how to output the sentence in layout
The following layout rules are stored. Specifically above
The frame information is displayed as shown in FIG.
Frame management number on the output device such as
Output position information and frame size information
The display image from which the document data is output.
The frame or the composition of the frame on the printing paper is, for example,
It is defined as shown in FIG. these
Frame information is set to output document data.
It is defined for each of a plurality of frames. The layout rule reading unit 17
Reading the frame rule from the layout rule dictionary 9a
It is stored in the frame format information buffer 18 and
Information is given to the arithmetic unit 19 and the maximum column of the frame
I want the number and maximum number of lines. This maximum column
The maximum number of columns and line number data
Expansion output control of character data set to 20 and described later
Used for Also stored in the layout rule dictionary 9a.
The layout rules relating to sentence attributes, also stored in the layout rule dictionary 9a, are defined, for example, as shown in FIG. (b). For each type of sentence attribute, these rules specify how the document data is to be laid out in the frames defined above. Here, "lf" forces a line break at the end of the target character string, and "Ce" centers the target characters within the columns. "ds;1" indicates a one-line break after the target document data is output, and "us;1" indicates a one-line break before the target document data is output. Furthermore, "ls;1" indicates that a one-character margin (blank) is provided from the left side of the frame. In the case of a paragraph, the first character of the first line is left blank.
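The rule codes above are compact enough to decode mechanically. The sketch below, assuming the `name;count` spelling used in the text, turns a list of codes into a structured rule; the `LayoutRule` class and `parse_rule` function are hypothetical, and the paragraph first-line indent is left out because the text gives no code for it.

```python
from dataclasses import dataclass


@dataclass
class LayoutRule:
    line_feed_after: bool = False  # "lf": force a line break after the target string
    center: bool = False           # "Ce": center the target string in the frame's columns
    lines_before: int = 0          # "us;n": n line breaks before the target is output
    lines_after: int = 0           # "ds;n": n line breaks after the target is output
    left_margin: int = 0           # "ls;n": n blank characters from the frame's left side


# Codes that carry a count after ";" and the field each one sets.
_COUNT_CODES = {"us": "lines_before", "ds": "lines_after", "ls": "left_margin"}


def parse_rule(codes: list[str]) -> LayoutRule:
    """Turn layout codes such as ["lf", "Ce", "ds;1"] into a LayoutRule."""
    rule = LayoutRule()
    for code in codes:
        name, _, arg = code.partition(";")
        if name == "lf":
            rule.line_feed_after = True
        elif name == "Ce":
            rule.center = True
        elif name in _COUNT_CODES:
            # int() raises ValueError if the count after ";" is missing or not a number
            setattr(rule, _COUNT_CODES[name], int(arg))
        else:
            raise ValueError(f"unknown layout code: {code!r}")
    return rule


# The title rule from the worked example in the text.
title_rule = parse_rule(["lf", "Ce", "ds;1"])
print(title_rule)
# LayoutRule(line_feed_after=True, center=True, lines_before=0, lines_after=1, left_margin=0)
```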
Such layout rules are read from the layout rule dictionary 9a according to the type of sentence attribute, under the control of the layout information detection unit 15, and are given to the layout information analysis unit 16. Based on the layout rules thus obtained, the layout information analysis unit 16 controls the document data allocation unit 13 so that the document data, cut out character by character as described above, is expanded in order into the specified frame. That is, the layout information analysis unit 16 specifies, according to the frame information, the frame into which the document data is to be expanded, and at the same time obtains the size of that frame. It then controls the expansion so that the document data is laid out in the specified frame according to the layout rule for its sentence attribute.
At this time, each time the one-character cutout unit 12 cuts out one character of the document data, the count value of the column/line counter 21 is incremented in synchronization. The comparison unit 22 successively compares the count value of the column/line counter 21 with the values set in the maximum column/line value buffer 20, and the determination unit 23 evaluates the comparison result, that is, determines whether the layout position of the character cut out by the one-character cutout unit 12 lies within the frame specified at that time. When it is determined that the allocation position of the document data being laid out and expanded exceeds the specified frame, the frame switching unit 24 is activated, and the frame to which the document data allocation unit 13 allocates the document data is updated to the next frame.

In this way, the document expanding unit 9 sets the frames in which the document data is to be output according to the layout rules set in the layout rule dictionary 9a, and expands the document data into each set frame according to the layout rules set for each sentence attribute.
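The counter, compare, and switch loop described above can be pictured with a small generator. This is a simplified sketch under stated assumptions: each frame is reduced to its maximum column and line counts, every character occupies one column, and the names `allocate` and `frames` are hypothetical. The comments map each step to the unit that performs it in the text only by analogy.

```python
from typing import Iterator


def allocate(text: str, frames: list[tuple[int, int]]) -> Iterator[tuple[str, int, int, int]]:
    """Yield (character, frame index, line, column) for each character of text.

    frames lists (max_columns, max_lines) per frame, as held in buffer 20.
    A newline character in text is treated as a forced line break.
    """
    frame, line, col = 0, 0, 0            # plays the role of column/line counter 21
    for ch in text:                       # one character at a time, like cutout unit 12
        if ch == "\n":
            line, col = line + 1, 0
            continue
        max_cols, max_lines = frames[frame]
        if col >= max_cols:               # past the last column: wrap to the next line
            line, col = line + 1, 0
        if line >= max_lines:             # position no longer inside the current frame
            frame += 1                    # move on, like frame switching unit 24
            if frame >= len(frames):
                raise ValueError("text does not fit in the defined frames")
            line, col = 0, 0
        yield ch, frame, line, col        # the allocation made by unit 13
        col += 1


for placed in allocate("abcde", [(2, 2), (2, 2)]):
    print(placed)
```

With two frames of 2 columns by 2 lines, the first four characters fill frame 0 and the fifth character, "e", is placed at line 0, column 0 of frame 1.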
For example, in the document data whose logical structure has been obtained as shown in FIG., the first sentence is the title, and its layout rules are "lf", "Ce", and "ds;1". The layout information analysis unit 16 therefore first uses the maximum number of columns and lines of the frame held in the maximum column/line value buffer 20 to obtain the character position at which the title is to be displayed centered, for example as (MAX line length - number of characters in the target sentence) / 2. According to this data, the value of the column/line counter 21 is changed in a preset manner, and the document data of the target sentence is expanded, that is, its character positions are assigned. When the output allocation of the character data of the target sentence is finished, a forced line break is performed, and according to the information "ds;1", a blank line is set below the title character string that has been output.
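As a worked example of the title rule, take a frame 40 columns wide and a 15-character title: the title starts at column (40 - 15) / 2 = 12.5, which becomes 12 if whole columns are used. The sketch below assumes that MAX is the number of characters one line of the frame can hold, that one character occupies one column (as with fixed-pitch Japanese text), and that fractions are rounded down; `center_start` and `layout_title` are illustrative names, not taken from the text.

```python
def center_start(max_columns: int, text: str) -> int:
    """Start column for centering: (MAX - number of characters) // 2.

    Floor division and the clamp at 0 for over-long strings are assumptions.
    """
    return max(0, (max_columns - len(text)) // 2)


def layout_title(text: str, max_columns: int) -> list[str]:
    """Lay out a title under the rules "lf", "Ce" and "ds;1"."""
    centered = " " * center_start(max_columns, text) + text  # "Ce", then "lf" ends the line
    return [centered, ""]                                     # "ds;1": one blank line after


for row in layout_title("Document Layout", 40):
    print(repr(row))
```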
Such output (expansion) control of the document data is performed sequentially for each sentence cut out as described above, according to the obtained logical structure, so that the document data is expanded in order into the predetermined frames. As a result, document data whose hierarchical logical structure has been obtained, for example as illustrated in FIG., is output as illustrated in FIG. 16, expanded according to the specified layout rules.

Thus, with the device constructed as described above, the hierarchical logical structure of the document is obtained from the input document data, and the document data is expanded, following that hierarchical logical structure, in a layout format set according to the sentence attribute of each sentence. Specifically, the document data is assigned in sequence to the defined frames, and control such as forced line feeds and centering is applied according to the document structure information of each sentence in the hierarchical logical structure, that is, according to its sentence attribute, as the document data is expanded.
Therefore, since the headline sentences identified from the document data and the sentences following them can be expanded in a layout format according to the relationships obtained by comparing the headline sentences with one another, document processing during document creation and editing can be performed efficiently, focusing only on the document content without regard to the layout format. Moreover, even when the document is processed freely in this way, the output layout format of the document data is controlled automatically according to the hierarchical document structure analyzed as described above and the layout rules specified by that document structure (sentence attributes), so the document as a whole can be created in a uniform format. Consequently, because controlling the output format of the document does not disturb thought processes such as composing the text, documents can be processed easily and effectively, which is of great practical benefit.

The present invention is not limited to the embodiment described above.
For example, the data format representing the logical structure of the document, the rules for obtaining that logical structure, and the layout rules that determine the output format of the document data may be set as appropriate. Heading extraction, document structure determination, and output control may also be performed taking changes of font (typeface) into account. In this case, highlighted (high-intensity) display or reverse display of output characters can of course also be specified; it suffices to set such control information as layout rules. In the example above, the output layout format was controlled after the hierarchical logical structure of the document data had been determined, but the layout expansion process may instead be performed sentence by sentence. In that case, the logical structure need not be stored; it may be used directly for the layout expansion process. Furthermore, a plurality of layout rules may be prepared for one sentence attribute, and a layout rule selected from among them may be applied. This allows a document with the same content to be easily converted into layout formats for different uses, such as an academic paper format or an in-house report format.
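Selecting among several layout rules for one sentence attribute amounts to keying the rule dictionary by intended use as well as by attribute. In the sketch below, the style names "paper" and "report" and every code assignment except the title's "lf", "Ce", "ds;1" are made up for illustration; the text does not list the rules for either format, and `STYLES` and `rules_for` are hypothetical names.

```python
# Hypothetical rule sets: the same sentence attributes mapped to different layout codes.
STYLES: dict[str, dict[str, list[str]]] = {
    "paper": {"title": ["lf", "Ce", "ds;1"], "paragraph": ["lf", "ls;1"]},
    "report": {"title": ["lf", "ds;2"], "paragraph": ["lf"]},
}


def rules_for(style: str, attribute: str) -> list[str]:
    """Return the layout codes for one sentence attribute under the chosen style."""
    try:
        return STYLES[style][attribute]
    except KeyError:
        raise KeyError(f"no rule for {attribute!r} in style {style!r}") from None


# The document content stays the same; only the selected rule set changes.
document = [("title", "Document Layout"), ("paragraph", "Body text.")]
for style in STYLES:
    print(style, [rules_for(style, attribute) for attribute, _ in document])
```

Because the sentence attributes come from the structure analysis rather than from codes embedded in the text, switching formats touches only the rule table, not the document data.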
Although Japanese has been used as an example here, the invention can of course be applied to other languages as well. The present invention can also be implemented with various other modifications without departing from its scope.

[0055] As described above, according to the present invention, the document data is expanded onto the output means with the headline sentences identified from the document data, and the sentences following them, placed in a layout format according to the relationships obtained by comparing the headline sentences with one another. Document processing can therefore be performed efficiently without regard to the layout format, and because control of the document's output format does not interfere with the thought process, documents can be processed effectively. This is of great practical benefit.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic configuration diagram of one embodiment of the present invention.
FIG. 2 is a diagram showing an example of input document data.
FIG. 3 is a flowchart showing the flow of the document structure analysis process in the apparatus according to the embodiment.
FIG. 4 is a diagram showing an example of a headword dictionary.
FIG. 5 is a diagram showing a configuration example of a heading extraction rule dictionary.
FIG. 6 is a diagram showing a configuration example of a heading extraction rule dictionary.
FIG. 7 is a diagram showing a configuration example of a heading extraction rule dictionary.
FIG. 8 is a diagram showing a configuration example of a document structure rule dictionary.
FIG. 9 is a diagram showing a configuration example of a document structure rule dictionary.
FIG. 10 is a diagram showing a configuration example of a document structure rule dictionary.
FIG. 11 is a diagram showing the analysis structure of a headline sentence.
FIG. 12 is a diagram showing an example of document structure information stored in the logical structure storage unit.
FIG. 13 is a diagram showing a configuration example of the document development unit.
FIG. 14 is a diagram showing a configuration example of the layout rule dictionary.
FIG. 15 is a diagram showing an example of a frame (document frame) indicated by a layout rule.
FIG. 16 is a diagram showing an example of a document format output after layout development processing.

[Description of Signs]
1 ... Document management unit, 2 ... Input unit, 3 ... Original text storage unit, 4 ... Display control unit, 5 ... Display unit, 6 ... Heading extraction unit, 6a ... Headword dictionary, 7 ... Heading determination unit, 7a ... Heading rule dictionary, 8 ... Document structure determination unit, 8a ... Document structure rule unit, 9 ... Document development unit, 9a ... Layout rule dictionary, 10 ... Logical structure storage unit.

Continuation of front page
(72) Inventor: Toshio Okamoto, c/o Toshiba Corporation Research Institute, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa
(56) References cited: JP-A-60-24622 (JP, A); JP-A-60-17522 (JP, A); bit, Vol. 15, No. 5, Kyoritsu Shuppan Co., Ltd., pp. 44-50; Computing Surveys, Vol. 14, No. 3 (Sep. 1982), pp. 417-472

Claims

(57) [Claims] A document layout method, characterized by: discriminating, from input document data, sentences that serve as headlines and sentences that follow each headline sentence; comparing the headline sentences discriminated from the document data with one another to sequentially obtain the relationships between the headline sentences; and expanding the document data onto output means with at least one of each headline sentence and the sentences following it placed in a layout format corresponding to the obtained relationships between the headline sentences.