JPS61190653A

JPS61190653A - Document processor

Info

Publication number: JPS61190653A
Application number: JP60030290A
Authority: JP
Inventors: Toshio Okamoto; 利夫岡本; Isamu Iwai; 岩井　勇
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1985-02-20
Filing date: 1985-02-20
Publication date: 1986-08-25
Anticipated expiration: 2010-01-30
Also published as: JPH077409B2

Abstract

PURPOSE: To facilitate editing operations in units of chapters and sections and to improve functionality by determining a hierarchical structure from the headings and holding that structure. CONSTITUTION: Document data designated through an input device is loaded into a document data storage part 7. A heading candidate finding part 8 extracts heading candidates and their start positions from the document data in the storage part 7 and stores them in storage parts 9 and 10. The data in storage parts 9 and 10 is sent to a heading determining part 11, which decides whether each candidate is a heading and, if so, classifies it according to the rules held in a discrimination rule storage part 12. Once all heading data cells have been generated, a document structure determining part 14 uses a hierarchical structure determining rule storage part 15 to link the heading data cells into a hierarchical structure.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field of the Invention]

The present invention relates to a document processing device that generates and holds the logical structure expressed in the form of document data.

[Technical background of the invention and its problems]

To make a document easier to read, it is common to divide the whole into several parts and place a short phrase, called a heading, at the start of each part, and then to divide each part further and give each subdivision its own heading. A heading generally begins with a heading symbol that indicates the range the heading covers and its relationship to the other headings.

For example, "Chapter 1" and "Section 3" are heading symbols in this sense. Conventional devices that process documents with such a structure by computer take no account of it at all. Because of device constraints, they handle the document in units unrelated to its structure, such as the amount that can be shown on the display screen at once or the amount that fits on one printed page. As a result, editing a structured document on such devices is hard for users to understand.

For example, to swap chapters on a conventional device, the user must mark the range to be moved by indicating the beginning and end of the chapter with a cursor or similar means. If the chapter is long, the display screen must be refreshed many times between specifying the beginning and specifying the end, so the wrong position is often specified.

A device that holds the document structure, however, lets the user designate the desired chapter directly, so the cumbersome range specification described above becomes unnecessary and operability should improve considerably.

Furthermore, with a document written by someone else, or one the user wrote long ago and whose content has been forgotten, a conventional device reveals little more than the file name of the document file, and it is often difficult to infer the content from that. If the headings of the document can be viewed together at a glance, the content is much easier to infer.

In short, conventional devices have the drawback that more advanced document processing that exploits the structure of a document is difficult to realize.

[Purpose of the invention]

The present invention was made in view of the above circumstances, and its object is to extract the logical structure expressed in the form of a document and to construct a hierarchical structure from it.

[Summary of the invention]

The present invention extracts heading candidates from document data written as character codes, determines the relationships among the headings from the symbol part of each heading, and thereby makes it possible to determine the hierarchical structure of the headings.

As a concrete example, from the character strings delimited by line feed codes, those whose length from the start of the line is no more than one line are extracted, and a string that begins with a heading symbol such as "Chapter 1", "(3)", or "O" is taken to be a heading.

When the heading symbol contains digits or letters, the rank of the heading being processed in the hierarchy is decided from their typeface, ordering, and format; when it consists only of a symbol, the rank is decided from its relationship to the preceding and following headings. The invention is carried out by building a data structure that holds the logical structure among the headings.

[Effect of the invention]

According to the present invention, when the document being processed has a hierarchical structure expressed through its headings, the hierarchical structure is determined from the headings and held. Using that data structure makes editing in units of chapters and sections easy and improves operability. It also makes more advanced functions that exploit the hierarchy easy to realize, for example generating a table of contents, printing with regular indentation for each chapter and section so that the document is easier to read, pointing out errors in heading symbols, and renumbering heading symbols. Moreover, the document data input to the device does not need to carry hierarchical structure data, so documents created on conventional document processing devices can be processed by this device. When creating a new document on this device, the operator can work exactly as before without being aware of the hierarchical structure, so no additional burden is placed on the operator.

[Embodiments of the invention]

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows an overall block diagram of an embodiment of the invention. The input device 1, output device 2, display device 3, external storage device 4, and internal storage device 5 are those conventionally and widely used in document processing and are not specified further.

The block diagram of FIG. 2 shows the control device 6 in detail.

In FIG. 1, the file with the name designated through the input device 1 is sent from the external storage device 4, which holds the document files, to the document data storage section 7 in the internal storage device 5.

The heading candidate finding section 8 processes the document held in the document data storage section 7 to obtain the heading candidates and the start position of each candidate, which are stored in storage sections 10 and 9 respectively. The stored data is then sent to the heading determination section 11, which decides whether each item is a heading and, if so, what kind of heading it is, based on the determination rules held in the determination rule storage section 12. For each item found to be a heading, a heading data cell is created, which gathers the various pieces of information about that heading into a single unit, and it is stored in the heading data cell storage section 13.

The processing of the heading candidate finding section 8 and the heading determination section 11 continues until the whole of the document being processed has been handled.

When all the heading data cells have been created, processing moves to the document structure determination section 14, which uses the hierarchical structure determination rule storage section 15 to link the heading data cells into a hierarchical structure. This completes the processing.

Next, the operation of each section is described in detail.

FIG. 3 shows a block diagram of the heading candidate finding section 8.

Under the direction of the read position control section 16, characters are sent one at a time, in order, from the document data storage section 7 to the comparison section 17, and the position of each character in the document is stored in the address storage section 18 at the same time.

The comparison section 17 compares each character it receives with the line feed code held in register 1. The character is simultaneously sent to the line buffer 19, where it is accumulated, and the counter 21 counts the number of characters. The comparison section 22 compares the value of the counter 21 with the value held in register 2, which is a predetermined length of one line (for example, 40 characters). The decision section 20 then checks which event occurs first. If the line feed code is matched before, or at the same time as, the counter reaches the value in register 2, the buffered data is judged to be a heading candidate and the contents of the line buffer are stored in the heading candidate data storage section 10. The arithmetic section 23 subtracts the counter value from the value in the address storage section, which gives the start address of the line buffer contents, and this is stored in the heading candidate data start position storage section 9. The line buffer and the counter are then reset to their initial state and the heading candidate finding process is repeated.

Conversely, if the counter reaches the one-line length first, the data in the line buffer is judged unable to be a heading. The line buffer and the counter are reset to their initial state and the heading candidate finding process is repeated.
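The candidate search described above can be sketched in software as follows. This is only an illustrative sketch, not the circuit of FIG. 3: the function name `find_heading_candidates` is hypothetical, the text is processed line by line rather than character by character, and empty lines are skipped, which is an assumption the source does not state.

```python
from typing import List, Tuple

# Length of "one line" held in register 2; 40 is the example value in the text.
MAX_LINE_LENGTH = 40


def find_heading_candidates(text: str, max_len: int = MAX_LINE_LENGTH) -> List[Tuple[int, str]]:
    """Return (start_position, line) pairs for lines short enough to be headings."""
    candidates = []
    start = 0  # position in the document of the first character of the current line
    for line in text.split("\n"):
        # A line is a candidate when its line feed arrives no later than the
        # moment the character count reaches max_len.
        # Assumption: empty lines are not treated as candidates.
        if 0 < len(line) <= max_len:
            candidates.append((start, line))
        start += len(line) + 1  # move past the line and its line feed
    return candidates
```

The start position plays the role of the value kept in storage section 9, and the line text that of the data kept in storage section 10.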

When the document data runs out during this processing, processing is handed over to the heading determination section 11.

Next, FIG. 4 shows a block diagram of the heading determination section.

The determination rule storage section 24 holds the heading determination rules, which are expressed as regular expressions as shown in FIG. 5. The determination rule application control section 25 applies these rules: if the application succeeds, the candidate is determined to be a heading; if not, it is determined not to be a heading.

In FIG. 5, symbols enclosed in circles are called nonterminal symbols, and symbols enclosed in squares are terminal symbols, which are the character code data itself. Each rule states that the symbol on its left side, which is enclosed in neither a circle nor a square, is replaced by the nonterminal and terminal symbols on its right side. The symbols are applied in order in the direction of the arrows. Where arrows are stacked vertically, the upper rule is applied first, and the lower rule is applied only if the upper one does not succeed. When the rules have been applied until every nonterminal symbol has been replaced by terminal symbols, the application has succeeded; if a replacement cannot be made partway through, the application has failed.

For example, suppose the document of FIG. 6 is input and its fifth line, which reads "1. Introduction" preceded by blank codes (denoted □), is passed to the heading determination section. Applying the rules as shown in FIG. 7 succeeds, and the line is determined to be a heading. Applying them in any other way does not succeed.

Since the rules can be applied in many ways, the candidate is treated as a heading if any one way of applying them succeeds, and as not a heading only if every way fails.
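The idea of trying alternative rules in order, and accepting a candidate as soon as one of them succeeds, can be sketched as below. The rules of FIG. 5 are not reproduced in this text, so the patterns in `HEADING_RULES` and the function name `match_heading` are hypothetical stand-ins written as Python regular expressions; each one splits a heading symbol into a prefix, a body, and a suffix.

```python
import re
from typing import Dict, Optional

# Hypothetical rules (not the ones in FIG. 5). They are tried from top to
# bottom; a lower rule is used only when the rules above it fail.
HEADING_RULES = [
    ("chapter", re.compile(r"^\s*(?P<prefix>Chapter )(?P<body>\d+)(?P<suffix>)\s")),
    ("paren_number", re.compile(r"^\s*(?P<prefix>\()(?P<body>\d+)(?P<suffix>\))\s")),
    ("number_dot", re.compile(r"^\s*(?P<prefix>)(?P<body>\d+)(?P<suffix>\.)\s")),
    ("paren_letter", re.compile(r"^\s*(?P<prefix>\()(?P<body>[a-z])(?P<suffix>\))\s")),
]


def match_heading(candidate: str) -> Optional[Dict[str, str]]:
    """Return the parts of the heading symbol, or None if no rule succeeds."""
    for name, pattern in HEADING_RULES:
        m = pattern.match(candidate)
        if m:
            # One successful application is enough to accept the candidate.
            return {"rule": name, **m.groupdict()}
    # Every rule failed, so the candidate is not a heading.
    return None
```

Leading blanks are skipped by each pattern, which mirrors the example line that begins with blank codes before "1. Introduction".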

When a candidate is determined to be a heading, processing proceeds to the heading data cell creation section 26. Here one heading data cell of the kind shown in FIG. 8 is created for each heading, and the result of the successful way of applying the heading determination rules is written into the cell. For a heading to which the numeric part or the alphabetic part was applied, a value called the order is also stored: for a numeric part it is the numerical value itself, and for an alphabetic part it is the position of the letter in alphabetical order. For example, when the heading symbol is "Chapter 2" the order is 2, and for a symbol whose alphabetic part is "c" the order is 3.

At this point the part of the heading data cell marked λ in FIG. 8 has been determined. Data cells of this kind are stored in the heading data cell storage section 27.

For example, for the heading used as the example in FIG. 7, there is no prefix, the numeric part is numeral type A (FIG. 5(e)) with order 1, and the suffix is ".". The start position of this heading is held in the heading candidate data start position storage section 9, so it is also copied into the data cell. These operations produce the data cell shown in FIG. 9. If the candidate is determined not to be a heading, no data cell is created, and the heading candidate data and its start position data are discarded.
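A heading data cell and its order value might be modelled as follows. FIG. 8 is not reproduced in this text, so the class name `HeadingCell`, its field names, and the helper `compute_order` are assumptions chosen to match the fields the description mentions (prefix, numeric or alphabetic part, suffix, order, start position, and the links filled in later).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class HeadingCell:
    # Fields filled in when the cell is created.
    prefix: str
    kind: str        # "numeric" or "alphabetic" (hypothetical labels)
    suffix: str
    order: int
    start: int       # start position of the heading in the document
    # Fields filled in later by the document structure determination step.
    parent: Optional["HeadingCell"] = None
    next_sibling: Optional["HeadingCell"] = None
    first_child: Optional["HeadingCell"] = None
    level: Optional[int] = None
    error: bool = False


def compute_order(body: str) -> int:
    """Numeric part: its value. Alphabetic part: position in the alphabet."""
    if body.isdecimal():
        return int(body)
    if len(body) == 1 and body.isascii() and body.isalpha():
        return ord(body.lower()) - ord("a") + 1
    raise ValueError(f"cannot derive an order from {body!r}")
```

With this sketch, the heading "1. Introduction" would give `HeadingCell(prefix="", kind="numeric", suffix=".", order=compute_order("1"), start=...)`, with the start position taken from the candidate search.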

The above processing is repeated until no input data remains, at which point processing proceeds to the document structure determination section 14.

Finally, the document structure determination section 14 is described in detail.

This section links the heading data cells created in the preceding stage into a hierarchical structure.

The processing procedure is shown in FIGS. 10, 11, and 12.

To explain with an example: for the document of FIG. 6, five cells are created, as shown in FIG. 13.

First, the first cell, "1. Introduction", is input.

Following the flow of FIG. 10, the first cell is linked as a child of the root cell, which is provided in advance. That is, the address of the root is written into the first cell's field for the start address of its parent cell. The fields for the start address of the next sibling cell, the start address of the first child cell, and the error flag do not apply yet, so they are filled with a special value meaning that they point to nothing. The level field is set to 1, meaning a heading at the topmost level.

Next, the second cell, "2. Features of the present invention", is input.

The current cell (the first cell) has no prefix, a numeric alphanumeric part, the suffix ".", and order 1. The next cell (the second cell) is the same except that its order is 2. This falls under case ■ of the rules in FIG. 12, so the second cell is made a sibling of the first. That is, the start address of the second cell is written into the first cell's field for the start address of the next sibling cell, and the second cell's parent field receives the same address as the first cell's parent field (the address of the root cell). The level is set to 1, the same as the level of the current cell, and the remaining fields are set to the not-applicable value.

Next, the third cell, "(1) Prior art", is input.

Compared with the current cell, which is now the second cell, this cell differs in prefix, numeral type, and suffix, and its order is 1. This falls under case ■ of FIG. 12: the third cell is made a child of the second cell, and the level is increased by one to 2. The cells are then linked by the same kind of pointer operations as before. When the fourth cell, "(2) Explanation of the features", is input and processed in the same way, it becomes a sibling of the third cell.

When the fifth cell, "3. Conclusion", is input, case ■ of FIG. 12 applies: the level is reduced by one to 1, and the cell is compared with the parent of the fourth cell, that is, the second cell, "2. Features of the present invention". The comparison shows that it is a sibling of that cell, so the fifth cell is linked as a sibling of the second cell.

Once all the cells have been processed, the hierarchical structure shown in FIG. 13 is complete and processing ends.
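The walk-through above can be reproduced with a small linking routine. This is a simplified sketch, not the procedure of FIGS. 10 to 12, which are not reproduced in this text: the names `Cell` and `link_cells` are hypothetical, the form of a heading symbol is reduced to a (prefix, kind, suffix) tuple, and the order value is carried but not checked.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Cell:
    title: str
    form: Tuple[str, str, str]   # (prefix, kind, suffix) of the heading symbol
    order: int
    level: int = 0
    parent: Optional["Cell"] = None
    first_child: Optional["Cell"] = None
    next_sibling: Optional["Cell"] = None


def link_cells(cells: List[Cell]) -> Cell:
    """Link cells in document order into a tree and return the root cell."""
    root = Cell(title="<root>", form=("", "root", ""), order=0)
    current = root
    for cell in cells:
        # Climb from the current cell toward the root, looking for a cell
        # whose heading symbol has the same form as the new cell.
        node = current
        while node is not root and node.form != cell.form:
            node = node.parent
        if node is not root:
            # Same form found: the new cell becomes its next sibling.
            cell.parent = node.parent
            cell.level = node.level
            node.next_sibling = cell
        else:
            # Form not seen on this path: the new cell becomes a child
            # of the current cell, one level deeper.
            cell.parent = current
            cell.level = current.level + 1
            current.first_child = cell
        current = cell
    return root
```

Fed the five headings of the example, this yields levels 1, 1, 2, 2, 1, with "(1)" and "(2)" as children of heading 2 and heading 3 as its sibling, which agrees with the result described in the text.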

When a heading symbol is wrong, this processing makes a reasonable guess and still tries to link the hierarchy.

For example, suppose "Chapter 1" is followed by "Chapter 3" instead of "Chapter 2" (the order skips), or "Chapter 1" is followed by another "Chapter 1" (the order repeats). The order is assumed to have been meant to continue, so an error flag is set in the cell and the cell is linked as a sibling at the same level. The order that should have been used is written into the designated field of the cell, and subsequent processing continues with that value. Likewise, suppose "(1)" is followed by "[2]", where the form differs but the order continues, and that form has not appeared in any parent. The wrong form is assumed to have been used by mistake, so the two are made siblings, an error flag is set, and the first form is added to the cell.

If a cell's level is still unknown after the error handling above, the cell is treated as a sibling of the current cell, an error flag is set, and the cell inherits the form and order of its elder sibling cell.
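The order correction for skipped or repeated numbering can be sketched as a small check. The function name `check_order` and its return shape are hypothetical; the sketch covers only the order case and not the wrong-form case.

```python
from typing import Tuple


def check_order(previous_order: int, written_order: int) -> Tuple[int, bool]:
    """Return (order to use from here on, error flag) for a new sibling."""
    expected = previous_order + 1
    if written_order == expected:
        return written_order, False
    # The order skips or repeats: assume it was meant to continue, flag the
    # cell, and carry on with the corrected value.
    return expected, True
```

For instance, `check_order(1, 3)` models "Chapter 1" followed by "Chapter 3": the cell is flagged and processing continues as if it were order 2.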

In the embodiment described above, all heading candidates are found first, each is then judged to be a heading or not, and the hierarchical structure is determined only after all judgments are complete. Alternatively, the processing may be performed iteratively: as soon as one heading candidate is found, all processing up to the determination of the hierarchical structure is carried out for it, and when that is finished the next heading candidate is sought.

Also, the heading candidate finding section does not accept as a heading candidate any sentence longer than one line. Instead, every sentence delimited by a sentence-delimiting code such as a line feed code may be assumed to be a heading candidate, leaving the decision to the next stage, the heading determination section.

In the embodiment above, the hierarchical structure of the headings is held as a data structure in which each heading cell stores the start addresses of its parent, its sibling, and its first child, but the data structure may be realized in other ways.

The contents of each heading cell are likewise not limited to this embodiment; for example, the body text that follows a heading may be attached to its cell.

The invention is not limited to building a hierarchy of headings. It can also be applied to other data that has a hierarchical structure in its form, such as an organization chart, by changing the rule data, namely the heading determination rules and the document structure determination rules.

This embodiment handles Japanese documents, but the invention is not limited to Japanese. Documents in other languages can be handled in the same way, provided they have headings, by rewriting the determination rules and the hierarchical structure determination rules for that language.

[Brief explanation of drawings]

FIG. 1 is an overall configuration diagram of the device.
FIG. 2 is a detailed configuration diagram of the control section.
FIG. 3 is a configuration diagram of the heading candidate finding section within the control section.
FIG. 4 is a configuration diagram of the heading determination section within the control section.
FIG. 5 shows, as regular expressions, the determination rules stored in the determination rule storage section of the heading determination section.
FIG. 6 shows an example of a structured document to be input to the device.
FIG. 7 shows an example of how the determination rules of FIG. 5 are applied to one of the heading candidates in the example document of FIG. 6.
FIG. 8 is a data structure diagram of the heading data cell.
FIG. 9 shows an example of how one heading in the example document of FIG. 6 is represented in a data cell.
FIGS. 10, 11, and 12 show the processing performed in the document structure determination section within the control section.
FIG. 13 shows an example of how the heading data cells of the example document of FIG. 6 are linked after passing through the document structure determination section.

1: input device, 2: output device, 3: display device, 4: external storage device, 5: internal storage device, 6: control device.

Agent: Patent attorney Norichika and one other.

Claims

[Claims]

A document processing device characterized by comprising: an input means for inputting document data written as character codes; an extraction means for extracting heading candidates from the document data input through the input means, based on the length of each sentence as delimited by line feed codes or the like; and a means for extracting, for each heading candidate extracted by the extraction means, the symbol at the beginning of the heading candidate, and for determining and storing a hierarchical structure of the heading candidates, such as inclusion relationships and ordering relationships, based on the format of the symbol.