JP2013016108A

JP2013016108A - Document processing method

Info

Publication number: JP2013016108A
Application number: JP2011149939A
Authority: JP
Inventors: Tokuji Kobayashi; 徳滋小林
Original assignee: ANTENNA HOUSE KK
Current assignee: ANTENNA HOUSE KK
Priority date: 2011-07-06
Filing date: 2011-07-06
Publication date: 2013-01-24
Anticipated expiration: 2031-07-06
Also published as: JP5739253B2

Abstract

PROBLEM TO BE SOLVED: To provide a method for simply and efficiently performing document processing in which hierarchical numbers are assigned to headings, figure and table captions, annotations, etc. in a plurality of document files as consecutive numbers running across the respective document files, a new document file such as a list composed of the headings, captions, annotations, terms, index terms, etc. is created, and links are set that mutually reference related locations in the original document files and the new document file.
SOLUTION: A document processing method comprises the steps of: creating, with an editing unit provided on a computer, a tree structure definition that shows the relations among a plurality of document files; reading the plurality of document files with a computer program and combining them into one document file using the tree structure definition; assigning consecutive hierarchical numbers to marked-up locations in the combined document file; and dividing the combined document file, after the consecutive numbers have been assigned, back into the plurality of original document files.

Description

The present invention relates to a mechanism for efficiently creating hierarchical numbers and references that span a large number of marked-up electronic document files (hereinafter also simply referred to as "documents"), and to a document processing method and a document processing apparatus that use this mechanism.
In particular, the present invention relates to a method for processing structured documents, in which the data in a document is classified into a plurality of elements forming a tree structure, and to a document processing apparatus using that method.

A huge number of electronic document files circulate on the Internet. Most of the electronic document files published on the WWW (World Wide Web) are marked up in HTML, XHTML, or the like and are published in a distributed state on the storage devices of computers; readers follow information through links that define reference relationships between the distributed files.

However, these electronic document files are not systematized, and it has been pointed out that such a huge number of electronic document files cannot be organized efficiently.
That is, in order to publish a large number of marked-up document files as an electronic book, organized, systematized, and gathered into a single package, there is a need for a method and an apparatus that can organize and systematize a plurality of marked-up electronic document files simply and efficiently.

In view of the current state of electronic document files described above, an object of the present invention is to provide a method and an apparatus that simply and efficiently perform the following document processing: arranging a plurality of document files in a hierarchy and in a defined order so as to systematize them; assigning hierarchical numbers to headings, figure and table captions, annotations, and the like in the document files as consecutive numbers that run across the files; creating a new document file, such as a list, by extracting the headings, captions, annotations, terms, index terms, and the like; and setting links that mutually reference related locations in the original document files and the new document file.

That is, the present invention aims to provide a document processing method and a document processing apparatus capable of solving the above problem. This object is achieved by the combination of the inventions described in independent claims 1 and 2, or by the invention described in independent claim 4 or 5. The invention of dependent claim 3 defines a further advantageous embodiment of the present invention. The present invention is described in detail below.

The document processing apparatus of the present invention, made to solve the above problem, comprises: a storage device that records and stores a large number of document files, each marked up as a tree structure; a publication organization mechanism unit that defines the relationships among the document files as a tree structure, defines a hierarchical number format according to the level (depth) from the root of the tree structure, and defines the processing for extracting and collecting predetermined markup portions to create a new document file and for setting cross-references; a document processing unit that combines the document files retrieved from the storage device in accordance with the tree structure definition, assigns consecutive numbers hierarchized by the hierarchical number format definition, extracts the predetermined markup portions from the document files using the extraction definition to create a new document, creates cross-reference links between the new document and the original documents, and then divides the combined electronic document file back into the original file units; and an electronic book output mechanism unit that packages the output of the document processing unit to create an electronic book.

The document processing method of the present invention that solves the above problem is configured as follows.
The first step reads the file names and names of the electronic files from a recording device that records a large number of marked-up document files; defines, as a tree structure, the order and hierarchical relationships in which the electronic document files are to be read when they are packaged into one; defines a hierarchical number format; and defines how a new document is created using the markup and how cross-references are set between related locations in the original documents and the new document.
The second step consists of: reading the electronic document files from the recording device that records the marked-up document files, combining them according to the tree structure specified among the document files, and assigning hierarchical consecutive numbers to the portions designated by markup in the combined electronic document file; copying or moving the designated markup portions of the combined electronic document file to create a new electronic document file, and marking up bidirectional references at related locations between the combined electronic document file and the new electronic document file; and dividing the combined electronic document file into the original electronic document file units.
The third step receives the group of electronic document files produced in the second step and packages them in the form of an electronic book. The document processing method of the present invention comprises the first to third steps above.

As the electronic book format, the present invention produces a format such as EPUB, in which a large number of XHTML files are packaged into a single archive for distribution; however, it is not limited to EPUB and can be used to create electronic publications in similar package formats. The markup may be written in any of XML, HTML, XHTML, or a format that extends or restricts them.
The recording medium in the document processing apparatus of the present invention is a computer-readable recording medium that records a program for causing a computer to execute any of the methods described above.
The electronic documents of the present invention are recorded in the storage device of a computer as computer-readable electronic files; however, except for the finished electronic book, they need not be recorded permanently and may be erased once the processing steps are complete.
Note that the above summary of the invention does not enumerate all of the features necessary for the configuration of the present invention, and sub-combinations of these features may also constitute inventions.

The present invention processes a plurality of document files by arranging them in a hierarchy and in a defined order so as to systematize them, assigning hierarchical numbers to the headings, figure and table captions, annotations, and the like of the document files as consecutive numbers that run across the files, creating a new document file such as a list by extracting the headings, captions, annotations, terms, index terms, and the like, and setting links that mutually reference related locations in the original document files and the new document file. As a result, a large number of marked-up document files can be organized and systematized efficiently and formed, easily and simply, into an electronic book gathered into a single package.

FIG. 1 is a block diagram showing an example of the overall structure of the document processing apparatus units according to the present invention.
FIG. 2 is an explanatory diagram of the tree structure definition unit for a document file group.
FIG. 3 is an explanatory diagram of the contents of file-1 in FIG. 2.
FIG. 4 is an explanatory diagram of the contents of file-2 in FIG. 2.
FIG. 5 is an explanatory diagram of the contents of file-3 in FIG. 2.
FIG. 6 is an explanatory diagram of the hierarchical number format definition unit.
FIG. 7 is an explanatory diagram of the document combining and hierarchical numbering processing of the electronic document processing unit 30.
FIG. 8 is an explanatory diagram of the document combining and hierarchical numbering processing of the electronic document processing unit 30.
FIG. 9 is an explanatory diagram of the contents of the combined document file.
FIG. 10 is an explanatory diagram of the contents of the hierarchically numbered document file.
FIG. 11 is an explanatory diagram of a document file obtained by dividing the combined file into the original document file units.
FIG. 12 is an explanatory diagram of a document file obtained by dividing the combined file into the original document file units.
FIG. 13 is an explanatory diagram of a document file obtained by dividing the combined file into the original document file units.
FIG. 14 is an explanatory diagram of the document file set 400 and the set 460 of document files and an annotation list.
FIG. 15 is an explanatory diagram of the document combining and annotation list creation processing of the electronic document processing unit 30.
FIG. 16 is an explanatory diagram of the document combining and annotation list creation processing of the electronic document processing unit 30.
FIG. 17 is an explanatory diagram of the document file 10 for annotation processing.
FIG. 18 is an explanatory diagram of the document file 11 for annotation processing.
FIG. 19 is an explanatory diagram of the document file 340 obtained by combining the document file 10 and the document file 11.
FIG. 20 is an explanatory diagram of the link settings of the document file 520 after annotation file processing.
FIG. 21 is an explanatory diagram of the contents of the newly created annotation file 530.
FIG. 22 is an explanatory diagram of the annotation-processed combined file divided into the original document files.
FIG. 23 is an explanatory diagram of the annotation-processed combined file divided into the original document files.

The present invention is described below through embodiments of the invention.
The following embodiments do not limit the invention according to the claims, and not all combinations of the features described in the embodiments are necessarily essential to the solution provided by the invention.
That is, the following description omits, for the embodiments of the present invention, any explanation of portions that are unrelated to the claimed invention, so a complete solution cannot be realized from the description of the embodiments alone.

FIG. 1 shows the configuration of the units that make up the processing apparatus of this embodiment. In FIG. 1, the document file recording unit 10 stores a large number of marked-up document files using a storage medium such as a hard disk. The publication organization instruction unit 20, when it has a screen for interacting with the user, displays the file names and contents of the document files on that screen and, following the user's requests, selects from the many document files those to be packaged as an electronic book; it provides a function for creating a tree structure that sets the order and the upper and lower hierarchy of the document files constituting the electronic book, and it sets the hierarchical number format. The electronic document processing unit 30 receives the document files from the document file recording unit 10, combines them into one by referring to the tree structure setting data created by the publication organization instruction unit 20, assigns to the combined document file the hierarchical number format defined by the publication organization instruction unit 20, and divides the result back into the original document file units. The electronic book output generation unit 40 packages the document files into the electronic book 50.

FIG. 2 is an explanatory diagram showing how a tree structure among document files is created for the document file group 110 stored in the document file recording unit 10, using the publication tree structure definition unit 120, which is provided in the publication organization instruction unit 20 and has a function for interacting with the user. The publication tree structure definition unit 120 organizes the many files in the document file group 110 and represents the electronic book package as a tree structure. In FIG. 2 the package name of the electronic book is set to Pub1, and Pub1 corresponds to the root (level 0). Needless to say, the package name can be changed freely. The figure shows a tree in which the first file among the children of the root (level 1) is file-1 and the next is file-2, the first file among the children of file-2 (level 2) is file-3, and further files up to file-n are placed at that same level. This tree structure definition is recorded on the storage medium as publication tree structure definition data 130.
In this tree structure, the document files are retrieved in the order file-1, file-2, file-3, and so on up to file-n.
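
The tree definition and retrieval order described for FIG. 2 can be sketched as a pre-order walk. This is only an illustration: the `Node` class, the `reading_order` function, and the stand-in name `file-4` (in place of "file-n") are assumptions and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One entry in the publication tree: the package root or a document file."""
    name: str
    children: List["Node"] = field(default_factory=list)


def reading_order(root: Node) -> List[Tuple[str, int]]:
    """Pre-order walk returning (file name, level); the root (level 0) is skipped."""
    result: List[Tuple[str, int]] = []

    def walk(node: Node, level: int) -> None:
        if level > 0:
            result.append((node.name, level))
        for child in node.children:
            walk(child, level + 1)

    walk(root, 0)
    return result


# Pub1 is the root; file-4 is a hypothetical stand-in for "file-n".
pub1 = Node("Pub1", [
    Node("file-1"),
    Node("file-2", [Node("file-3"), Node("file-4")]),
])
order = reading_order(pub1)
print(order)
```

A pre-order walk visits a parent before its children, which yields the file-1, file-2, file-3 order stated above while also recording each file's level.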

FIG. 3 schematically illustrates the contents of file-1 in FIG. 2. file-1 is a document with an internal tree structure, written in an XML vocabulary similar to XHTML; portions unrelated to the present invention are omitted from FIG. 3. The descriptions of document file contents below are abbreviated in the same way. Within file-1, the h1 and h2 elements are markup representing headings. Because XHTML headings have no concept of hierarchy, this example expresses the hierarchy by enclosing content in div elements. HTML5 introduces the section element to represent hierarchy, which means that the hierarchical structure is expressed with section instead of div. Thus, when electronic documents are marked up with XML, equivalent markup can be expressed with different combinations of elements and attributes; embodiments that use a markup scheme logically equivalent to the one described here, even if it looks different at first glance, are also within the scope of the present invention.

FIG. 4 shows the contents of file-2. file-2 contains only a heading element represented by h1.

FIG. 5 shows the contents of file-3. file-3 contains a heading element represented by h1 and a heading element represented by h2, which are marked up with div elements to form a hierarchy.

FIG. 6 schematically shows how the hierarchical number format definition unit 210, provided in the publication organization instruction unit 20, is used to configure the assignment of hierarchical sequence numbers to the headings inside the document files of the electronic book. Here, the settings specify that structured numbers are assigned to the contents of the heading elements h1 to h3: the number is structured according to the hierarchy level counted from the root of the tree structure of the document files making up the publication, and it is incremented according to the order of appearance within the same level.

FIG. 7 and FIG. 8 are explanatory diagrams schematically showing the processing inside the electronic document processing unit 30. The document file combination processing unit 320 retrieves from the document file group 110 the document files that make up the electronic book package recorded in the publication tree structure definition data 130, combines them into a single file according to the tree structure determined by that data, and creates the combined document file 340 on the storage medium. The hierarchical numbering processing unit 350 takes the combined document file 340 as input and, using the hierarchical number format definition data 220, converts it into the hierarchically numbered document file 360 shown in FIG. 8. The document file combination processing unit 320 and the hierarchical numbering processing unit 350 can easily be built using a standard XSLT processor. Next, the document file division processing unit 370 divides the hierarchically numbered document file 360 into the original document file units, creating the document file group 380 that makes up the electronic book package, and records it in the document file recording unit 10.

FIG. 9 schematically shows the inside of the combined document file 340. The root element of the tree structure of the combined document file 340 is publish, and the package name of the electronic book, Pub1, is set in its attribute. A file element is placed at each boundary between the combined document files, and the contents of each document file are taken into its file element. The markup records, in the level attribute of the file element, the level of that document file as recorded in the hierarchical number format definition data 220, and, in the name attribute, the file name. Various representations that are logically equivalent to this XML-based markup scheme are conceivable, and embodiments that use a combined-document markup scheme equivalent to that of the present invention are within the scope of the invention.
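
A minimal sketch of the combining step follows, under stated assumptions. The specification says this unit can be built with an XSLT processor; Python's standard `xml.etree.ElementTree` is used here instead, purely for illustration. The sample document contents, the attribute name `name` on `publish`, and the choice to emit `file` elements as flat siblings are assumptions, since FIG. 9 is not reproduced in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical source documents keyed by file name (stand-ins for files in unit 10).
SOURCES = {
    "file-1": "<body><div><h1>Intro</h1><div><h2>Scope</h2></div></div></body>",
    "file-2": "<body><div><h1>Part</h1></div></body>",
    "file-3": "<body><div><h1>Chapter</h1><div><h2>Detail</h2></div></div></body>",
}
# (file name, level) pairs in retrieval order, as given by the tree definition.
ORDER = [("file-1", 1), ("file-2", 1), ("file-3", 2)]


def combine(package, order, sources):
    """Wrap each document in a file element under a single publish root."""
    root = ET.Element("publish", {"name": package})
    for name, level in order:
        wrapper = ET.SubElement(root, "file", {"level": str(level), "name": name})
        wrapper.append(ET.fromstring(sources[name]))
    return root


combined = combine("Pub1", ORDER, SOURCES)
print(ET.tostring(combined, encoding="unicode"))
```

Keeping the file name and level on each `file` element is what later allows the numbering step to compute depths and the division step to restore the original file units.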

FIG. 10 schematically illustrates the inside of the hierarchically numbered document file 360 created by converting the combined document file 340. The underlined portions in FIG. 10 are the hierarchical numbers. Because the hierarchical number format definition data 220 designates the heading elements h1 to h6 as the targets for numbering, hierarchical numbers are added to the contents of the headings. As for the hierarchy, the publication tree structure definition data 130 defines the hierarchical structure as a tree among the document files, the level of each document file itself is expressed by the level attribute of its file element, and inside each document file the hierarchy is expressed with div elements. The combined document file is therefore itself a tree structure, and for every heading element h1 to h6 within it, both the level it occupies as seen from publish, the root element of the tree, and the order in which it appears within that level can be determined uniquely.
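
The numbering rule above can be sketched as one counter per depth. The sketch below is illustrative only: the dot-separated number format, the sample headings, and the rule "depth = file level + div nesting inside the file" are assumptions drawn from the description rather than the exact definition in FIG. 6 or FIG. 10, and `xml.etree.ElementTree` stands in for the XSLT processor named in the specification.

```python
import xml.etree.ElementTree as ET

COMBINED = """
<publish name="Pub1">
  <file level="1" name="file-1"><div><h1>Intro</h1><div><h2>Scope</h2></div></div></file>
  <file level="1" name="file-2"><div><h1>Part</h1></div></file>
  <file level="2" name="file-3"><div><h1>Chapter</h1><div><h2>Detail</h2></div></div></file>
</publish>
"""
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def number_headings(root):
    """Prefix each heading with a number built from per-depth counters."""
    counters = []  # counters[d - 1] is the running count at depth d

    def visit(elem, depth):
        for child in elem:
            if child.tag == "div":
                visit(child, depth + 1)      # each div adds one level
            elif child.tag in HEADINGS:
                d = max(depth, 1)
                del counters[d:]             # reset counters of deeper levels
                while len(counters) < d:
                    counters.append(0)       # pad levels that were skipped
                counters[d - 1] += 1
                label = ".".join(str(n) for n in counters)
                child.text = f"{label} {child.text or ''}"
            else:
                visit(child, depth)

    for file_elem in root.findall("file"):
        # A level-1 file starts at depth 0, so its outermost div is depth 1.
        visit(file_elem, int(file_elem.get("level")) - 1)
    return root


numbered = number_headings(ET.fromstring(COMBINED))
labels = [e.text for e in numbered.iter() if e.tag in HEADINGS]
print(labels)
```

Because the counters live outside any single `file` element, the numbers continue across file boundaries, which is the point of combining the documents before numbering.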

FIG. 11 to FIG. 13 show the contents of the document files obtained by dividing the combined file 360, after the hierarchical numbering processing, into the original document file units. Because the name of each document file is held as the value of the name attribute of the file element in the combined file of FIG. 10, each document file produced by the division can be given its original file name.
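
The division step can be sketched as follows, assuming the combined markup described for FIG. 9 and FIG. 10. The sample content and the `split_combined` function are hypothetical, and the sketch assumes each `file` element wraps exactly one document element.

```python
import xml.etree.ElementTree as ET

NUMBERED = (
    '<publish name="Pub1">'
    '<file level="1" name="file-1"><body><h1>1 Intro</h1></body></file>'
    '<file level="2" name="file-3"><body><h1>1.1 Chapter</h1></body></file>'
    '</publish>'
)


def split_combined(root):
    """Return {original file name: serialized document} for each file element."""
    parts = {}
    for file_elem in root.findall("file"):
        document = file_elem[0]  # assumption: one wrapped document per file element
        parts[file_elem.get("name")] = ET.tostring(document, encoding="unicode")
    return parts


parts = split_combined(ET.fromstring(NUMBERED))
for name, text in parts.items():
    print(name, text)
```

In a real pipeline each value would be written to a file with the recovered name; the sketch returns strings so that it has no side effects.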

FIG. 14 outlines an application example that implements the invention of claim 2. In the processing outlined in FIG. 14, the document file set 400 consists of two document files, 410 and 420. Annotation text is marked up in the contents of document files 410 and 420. Document file 410 is converted into document file 430 by deleting the annotation text and inserting in its place an annotation reference and a link to the related location in the annotation list document file 450; likewise, document file 420 is converted into document file 440 by deleting the annotation text and inserting in its place an annotation reference and a link to the related location in the annotation list document file 450. In the annotation list document 450, a link is set from the text of each annotation to the location in the document file where that text originally appeared.

FIG. 15 and FIG. 16 are explanatory diagrams illustrating the processing inside the electronic document processing unit 30 when the invention of claim 2 is implemented. This embodiment is the same as the embodiment described above up to the point where the combined document file 340 is created. Next, the annotation processing unit 510, provided in the electronic document processing unit 30, uses the annotation processing instruction data 500 to delete the annotation text inside the combined document file 340 and insert in its place an annotation reference number and a link to the reference destination in the annotation file 530, converting the file into the annotation-processed combined document file 520. The annotation text from the combined document file 340 is collected in the annotation file 530, where links referring back to the annotation reference numbers in the annotation-processed combined document file 520 are set. The annotation processing unit 510 can easily be built using a standard XSLT processor. Next, the document file division processing unit 370, provided in the electronic document processing unit 30, divides the annotation-processed combined document file 520 into the original document file units, creating the document file group 380 that makes up the electronic book package, and records it in the document file recording unit 10.

FIG. 17 and FIG. 18 show the contents of the two document files, file-10 and file-11, that are to undergo annotation processing. Annotations are marked up with note elements. Each note element has an identifier, set as the value of its id attribute, that indicates the location of the annotation. The same effect as in this embodiment can be obtained even if annotations are marked up with an element other than note, so embodiments that use equivalent markup are within the scope of the present invention.

FIG. 19 shows the contents of the combined document file 340 obtained by combining the document file of FIG. 17 and the document file of FIG. 18.

FIG. 20 shows the combined document file 340 after the text that formed the content of each annotation markup has been deleted and, in its place, a link referring to the corresponding annotation number in the annotation file 530, which was created by collecting the annotation text, has been inserted. In this embodiment the annotation numbers are assigned in the order in which the note elements appear, regardless of hierarchy, but it is also easy to assign hierarchical numbers to the annotations by applying the method of claim 1.

FIG. 21 shows the contents of the newly created annotation file 530. Each annotation is marked up with an li element, and each li element is given an id corresponding to the id of the note element of the annotation in the original document. This id makes it possible to set a link from the combined document file 340 to the annotation file 530. Each annotation also carries a reference link back to the corresponding location in the original document.
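The annotation processing described above could be sketched roughly as follows. The description states that the annotation processing unit 510 can be built with an XSLT processor; this sketch uses Python's standard xml.etree.ElementTree instead, purely for illustration. The function name extract_notes, the file names notes.xml and combined.xml, the "li-" id prefix, and the use of a and ol elements are all assumptions, and note elements are assumed not to be nested.

```python
import xml.etree.ElementTree as ET


def extract_notes(combined_xml, notes_href="notes.xml", source_href="combined.xml"):
    """Move note text into a separate list and leave numbered links behind.

    Returns (processed combined tree, annotation list element).
    File names and the "li-" id prefix are illustrative assumptions.
    """
    root = ET.fromstring(combined_xml)
    notes_list = ET.Element("ol")
    # Snapshot the notes first so the tree can be modified safely in the loop.
    for number, note in enumerate(list(root.iter("note")), start=1):
        note_id = note.get("id")
        text = "".join(note.itertext())
        # Empty the note but keep the element, its id, and the text after it.
        for child in list(note):
            note.remove(child)
        note.text = None
        link = ET.SubElement(note, "a", {"href": f"{notes_href}#li-{note_id}"})
        link.text = str(number)
        # Entry in the annotation file, with a link back to the note location.
        li = ET.SubElement(notes_list, "li", {"id": f"li-{note_id}"})
        back = ET.SubElement(li, "a", {"href": f"{source_href}#{note_id}"})
        back.text = str(number)
        back.tail = " " + text
    return root, notes_list


doc = ('<doc><p>Text<note id="n1">First.</note> more'
       '<note id="n2">Second.</note></p></doc>')
body, notes = extract_notes(doc)
```

Deriving the li id from the note id is what lets each link be written without a lookup table: the forward link and the back link are both computed from the same identifier.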

FIGS. 22 and 23 show the annotation-processed combined file after it has been divided back into the original document files file-10 and file-11.
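The division step can be sketched as below, under a strong assumption that the description does not state: that the combining step wrapped the root of each original file in a file element whose name attribute records the original file name. The element names package and file, the attribute name, and the function split_combined are all hypothetical.

```python
import xml.etree.ElementTree as ET


def split_combined(combined_root):
    """Return {original file name: serialized XML} for each wrapped file.

    Assumes each <file name="..."> wrapper holds exactly one child element,
    the root of the original document.
    """
    parts = {}
    for wrapper in combined_root.iter("file"):
        original_root = wrapper[0]
        # Drop whitespace that followed the child inside the wrapper, since
        # ET.tostring would otherwise append it to the output.
        original_root.tail = None
        parts[wrapper.get("name")] = ET.tostring(original_root, encoding="unicode")
    return parts


combined = ET.fromstring(
    '<package>'
    '<file name="file-10.xml"><doc><p>A</p></doc></file>'
    '<file name="file-11.xml"><doc><p>B</p></doc></file>'
    '</package>'
)
parts = split_combined(combined)
```

Any numbers and links inserted while the files were combined stay in place after the split, because the split only cuts the tree at the recorded file boundaries.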

In the annotation processing outlined in FIG. 14 and described as an embodiment in paragraphs 0023 to 0028, the locations marked up as annotations in the original document files were deleted and annotation numbers were inserted in their place. In the present invention, a mechanism that creates a new document file collecting and listing the captions of figures and tables, and that sets cross-reference links between each caption in the original document file and the corresponding caption in the new document file, can be realized in the same way as the above example. It follows that embodiments that operate differently on the marked-up locations of the original document are also within the scope of the present invention.
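A caption list of this kind differs from the annotation case in that the marked-up text stays in the original document and is only duplicated into the new file. A minimal sketch follows; the element name caption, the generated "cap-N" ids, the href attribute placed on the caption, and the file names captions.xml and combined.xml are assumptions for illustration, not details given in the description.

```python
import xml.etree.ElementTree as ET


def build_caption_list(root, list_href="captions.xml", source_href="combined.xml"):
    """Copy every caption into a list and link the two copies to each other."""
    listing = ET.Element("ul")
    for index, caption in enumerate(list(root.iter("caption")), start=1):
        # Reuse an existing id, or generate one so the caption can be targeted.
        caption_id = caption.get("id") or f"cap-{index}"
        caption.set("id", caption_id)
        # Forward link: original caption -> entry in the caption list.
        caption.set("href", f"{list_href}#list-{caption_id}")
        # Back link: list entry -> original caption. The text is duplicated.
        item = ET.SubElement(listing, "li", {"id": f"list-{caption_id}"})
        back = ET.SubElement(item, "a", {"href": f"{source_href}#{caption_id}"})
        back.text = "".join(caption.itertext())
    return listing


doc = ET.fromstring(
    '<doc><figure><caption id="f1">Overview</caption></figure>'
    '<table><caption>Results</caption></table></doc>'
)
listing = build_caption_list(doc)
```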

As described above, the present invention is extremely useful for creating electronic books from a large number of document files.

10 Document file recording unit
20 Publication organization instruction unit
30 Electronic document processing unit
40 Electronic book generation and output unit
50 Electronic book

Claims

A document processing method for assigning hierarchical serial numbers, running across a plurality of document files, to marked-up locations in the plurality of document files, each of which is marked up as a tree structure, the method comprising: a step of creating, with an editing unit provided in a computer, a tree structure definition representing the relationships among the plurality of document files; a step of reading the data of the plurality of document files with a computer program and combining them into a single document file using the tree structure definition; a step of assigning hierarchical serial numbers to the marked-up locations in the combined document file; and a step of dividing the combined document file, after the serial numbers have been assigned, into the plurality of original document files.

A document processing method for generating a new file using the markup of a plurality of document files marked up to represent tree structures, the method comprising: a step of reading the data of the plurality of document files with a computer program and combining them into a single document file using a tree structure definition of the relationships among the document files; a step of creating a new document file that lists the marked-up locations in the combined document file, and establishing a bidirectional reference relationship between each item of the new document file and the corresponding item in the combined document file; and a step of dividing the combined document file, after the reference relationships have been established, into the plurality of original document files.

The document processing method according to claim 2, wherein the marked-up document files are written in any one of XML, HTML, XHTML, or an extended or restricted form thereof.

A document processing method comprising: a step of reading the file name and name of each electronic file from a recording device that records a large number of marked-up document files, defining as a tree structure the order and hierarchy in which the electronic document files are to be read into one package, and defining settings for creating a new document using the markup; a step of reading the electronic document files from the recording device and combining them in accordance with the previously defined tree structure of the document files; a step of adding hierarchical serial numbers to the locations specified by the markup of the combined electronic document file; a step of duplicating the specified marked-up portions of the electronic document file thus created to create a new electronic document file, and marking up bidirectional references between the combined electronic document file and the new electronic document file; a step of dividing the combined electronic document file into units of the original electronic document files; and a step of receiving the group of electronic document files obtained by the above steps and packaging it in the form of an electronic book.

A document processing apparatus comprising:
a storage device that records and stores a large number of document files, each marked up as a tree structure;
a document processing unit having a function of defining the relationships among the document files as a tree structure, combining the document files retrieved from the storage device in accordance with the tree structure definition, and assigning hierarchical serial numbers running across the documents, a function of defining a process that creates a new document file by extracting and collecting predetermined marked-up portions, and using that definition to create the new document from the predetermined marked-up portions of the document files and to create links that realize cross-references between the new document and the original documents, and a function of dividing the electronic document file into the original file units after the links have been created; and
an electronic book output mechanism unit that creates an electronic book by packaging the output of the document processing unit.