JP2002117055A

JP2002117055A - Structured document managing method

Info

Publication number: JP2002117055A
Application number: JP2000307025A
Authority: JP
Inventors: Takuya Okamoto; 卓哉岡本; Atsushi Shimada; 敦史島田; Masayoshi Matsumoto; 正義松本; Makoto Imachi; 真琴井町; Toru Takahashi; 亨高橋
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2000-10-06
Filing date: 2000-10-06
Publication date: 2002-04-19

Abstract

PROBLEM TO BE SOLVED: To provide a structured document management method that manages structured documents efficiently, supports fast updates, and makes it possible to obtain a structured document that reflects the update results. SOLUTION: The structured document management system 101 comprises a memory that stores data, files, and the programs 104, 106, 110, 112, 114, and 116, which operate according to instructions from an input/output device 102. A structured document 103 to be registered is registered in a structured document DB 105 by the document registration program 104. An attribute information extraction and registration processing program 106 extracts attribute information from the structured document 103 according to extraction element definitions 107 and registers it in an attribute information DB 108, and an attribute information update program 110 updates the attribute information in the attribute information DB 108 according to an attribute information update specification 109. The structured document readout program 112 and the attribute information readout processing program 114 then read out structured document contents 113 and attribute information 115 according to a readout document specification 111, and the updated structured document generation processing program 116 uses the structured document contents 113 and the attribute information 115 to generate an updated structured document 117.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a document management method for managing structured documents in a document management system. In particular, it relates to a system that, when managing a structured document, extracts the content of specific elements of the document and manages that content in association with the document, and it provides a method for keeping the content of the structured document consistent with the results of updates made to the extracted content.

[0002]

[Description of the Related Art] With the spread of word processors and similar tools, more and more document information is created in electronic form. Such electronic documents have formats specific to the device or software that created them, so they either cannot be used on other devices or software or require some means of conversion. Various kinds of structured documents have been proposed as common formats for exchanging such documents. These structured documents can define not only the hierarchical structure that forms the basic structure of a document, such as chapters, sections, and subsections, but also layout information. One of the languages being standardized for describing structured documents is XML (eXtensible Markup Language). XML represents the structure of a document by embedding specific character strings, called tags, in the text. In XML, the names and contents of tags, as well as the document structure that the tags express, can be defined by a DTD (Document Type Definition). XML is being standardized by the W3C (World Wide Web Consortium), and the Version 1.0 Recommendation was published in February 1998. Although XML was originally standardized as a language for describing structured documents, it is increasingly used in many industries as a format for exchanging data over the Internet, because it offers a high degree of descriptive freedom, its text form makes data interchangeable, and its content is human-readable. XML used as a data exchange format is regarded not as a document but as data given meaning by tags. Consequently, even when such data is exchanged over the Internet as XML, it is often managed by decomposing the data identified by each tag and storing it in a database or similar store.

[0003] In addition, XML-related standards define DTDs and description rules for specific purposes, and these are used in a variety of applications. One such standard is XSLT (XSL Transformations, where XSL stands for eXtensible Stylesheet Language). XSLT Version 1.0 likewise became a W3C Recommendation in November 1999. XSLT is an XML vocabulary for writing definitions that convert a document written in XML into a structured document with a different structure. XSLT engines exist that, given a conversion definition written in XSLT and a source XML document, generate the converted XML. XPath (XML Path Language) exists as a standard for designating elements within a structured document; it too was standardized by the W3C in November 1999. In XPath, the <title> element under the top-level <document> element is written as "/document/title". XSLT uses XPath to identify the elements of a structured document that are to be converted.

[0004] When XML is used as a data exchange format, its content is extracted and managed as data, as described above. However, when such data is exchanged over the Internet, XML must be generated from the updated extracted data. One conventional technique for this (hereinafter, prior art 1) extracts the content of all elements of a structured document, manages it together with the definition of the extraction method, and reconstructs the structured document from the extracted data and the extraction rules, thereby generating a structured document that reflects updates to the extracted data. A white paper on this technique is available at http://www.infoteria.com/jp/contents/product/xml-s-c/WPiCon.pdf. Prior art 1 makes it possible both to store data described in XML in a database and to generate XML from data stored in the database. Thus, by storing the content of every XML element in the database, a structured document reflecting the updated content can be generated.

[0005] Another conventional technique (hereinafter, prior art 2) parses a structured document and stores it as structural analysis data that retains the hierarchical structure information, making it possible to refer to and update any element of the structured document. Because prior art 2 holds the parse result of the structured document, the content of any element can be referenced and updated. Rather than extracting and holding the data to be updated, it therefore holds the document as structural analysis data and updates that analysis data. Since this technique can generate, at update time, a structured document that reflects the update result, the structured document can be updated. The specification of this technique is given at http://www-4.ibm.com/software/data/db2/extenders/xmlext/docs/v71wrk/dxxawmst.pdf.

[0006] The content of these conventional techniques will be described with reference to FIGS. 20 and 21. FIG. 20 illustrates prior art 1. In this technique, the XML to be registered (2001) is decomposed and stored in database columns (2002) or the like, and conversely, the XML (2001) can be reconstructed from the data stored in the database columns (2002). As shown in the figure, the XML to be registered (2001) is converted by XML conversion processing (2003) into intermediate-format XML (2004) that corresponds to the database storage format. To achieve this, conversion information generation processing (2007) is executed to associate the DTD of the XML to be registered (2005) with the database column information (2006), generating conversion information (2008). The XML conversion processing (2003) generates the intermediate-format XML (2004) from this conversion information (2008). Because the intermediate-format XML (2004) describes information corresponding to the database columns, DB input/output processing (2009) can parse it and store its content in the database columns (2002) or the like. Conversely, to generate the XML (2001) from data stored in the database columns (2002), the DB input/output processing (2009) first generates intermediate-format XML (2004) with tags corresponding to the column names. The XML conversion processing (2003) then applies the conversion information (2008) to this intermediate-format XML (2004) to generate the XML document. With prior art 1, all of the information in the structured document (2001) is stored in database columns (2002) or the like and updated through database operations. Generating a structured document from the updated information then yields a structured document that reflects the updated content.
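The round trip of prior art 1 can be illustrated with a minimal Python sketch. It is not the Infoteria product's implementation: the mapping CONVERSION_INFO stands in for the conversion information (2008), the flat <document> layout and the table and column names are hypothetical, and sqlite3 substitutes for the database. The sketch also shows why prior art 1 must extract everything: an element with no mapped column does not survive reconstruction.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical conversion information: element name -> database column.
# Only mapped elements of a flat <document> survive the round trip.
CONVERSION_INFO = {"title": "doc_title", "author": "doc_author", "body": "doc_body"}
COLUMNS = list(CONVERSION_INFO.values())


def create_table(conn):
    cols = ", ".join(f"{c} TEXT" for c in COLUMNS)
    conn.execute(f"CREATE TABLE docs (id INTEGER PRIMARY KEY, {cols})")


def store(conn, xml_text):
    """Decompose the document into column values and insert one row."""
    root = ET.fromstring(xml_text)
    values = [root.findtext(tag) for tag in CONVERSION_INFO]
    placeholders = ", ".join("?" for _ in COLUMNS)
    cur = conn.execute(
        f"INSERT INTO docs ({', '.join(COLUMNS)}) VALUES ({placeholders})", values)
    return cur.lastrowid


def load(conn, doc_id):
    """Rebuild the XML document from the stored column values."""
    row = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM docs WHERE id = ?", (doc_id,)).fetchone()
    root = ET.Element("document")
    for tag, value in zip(CONVERSION_INFO, row):
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")
```

Updating a column with an ordinary SQL UPDATE and then calling load() yields XML with the new value; an unmapped element such as a <note> child is silently lost.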

[0007] FIG. 21 illustrates prior art 2. As shown in the figure, a registered XML document (2101) is parsed by XML registration processing (2102), which generates and holds structural analysis data (2103). The structural analysis data (2103) represents the document structure as a tree, and the content of a specific element of the tree can be designated for reference or update. Furthermore, the structured document can be restored from the structural analysis data (2103) by XML acquisition processing (2104). Thus, by updating the structural analysis data, an updated XML (2105) reflecting the result of the update can be generated.
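As a rough illustration of prior art 2, the following Python sketch keeps the parsed tree in memory as the "structural analysis data", updates a designated element, and serializes the tree again. The function names are hypothetical and the element paths use xml.etree.ElementTree's limited XPath subset relative to the root; the IBM DB2 XML Extender itself works differently and is not modeled here.

```python
import xml.etree.ElementTree as ET


def register(xml_text):
    """XML registration (cf. 2102): parse once and keep the tree (cf. 2103)."""
    return ET.ElementTree(ET.fromstring(xml_text))


def update_element(tree, path, new_text):
    """Designate an element by a path relative to the root and replace its text."""
    node = tree.getroot().find(path)
    if node is None:
        raise KeyError(path)
    node.text = new_text


def acquire_xml(tree):
    """XML acquisition (cf. 2104): serialize the possibly updated tree (cf. 2105)."""
    return ET.tostring(tree.getroot(), encoding="unicode")
```

Every update here goes through the full tree, which is the cost the next paragraph attributes to prior art 2.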

[0008]

[Problems to be Solved by the Invention] As the conventional techniques above show, when an arbitrary part of a structured document is updated, a structured document reflecting the update can be obtained. With the method of prior art 1, however, a structured document reflecting the update cannot be generated unless the entire content of the structured document is extracted and managed. The update processing on structural analysis data in prior art 2, in turn, requires steps such as locating the designated structure and generating structural analysis data that reflects the update, so updates take a long time. Moreover, with either technique, if the document management function also keeps the registered structured document as the original document, both the content obtained as structural analysis data or extracted data and the original structured document must be retained, which increases the amount of data to be managed. In a structured document management system that extracts and manages only part of a registered structured document as attribute information while managing the structured document itself unchanged as the original document, these problems remain whichever conventional technique is applied. The object of the present invention is to address these problems by providing a structured document management method that manages structured documents efficiently, provides a fast update function, and makes it possible to obtain a structured document reflecting the update results.

[0009]

[Means for Solving the Problems] To solve the above problems, in a document management system with functions for managing, registering, and referring to electronic documents, the present invention manages, when a structured document is registered, the structured document, the attribute information extracted from it, and the definition of how that attribute information is extracted. The extracted attribute information can be updated independently of the structured document, and when the structured document is retrieved, a structured document is obtained in which the locations from which attributes were extracted in the original have been rewritten with the updated values.

More specifically, the method comprises: a structured document content extraction step that extracts the content of designated elements from the structured document to be registered, based on extraction element designation information that designates the elements to extract; a structured document management step that registers and manages the extracted content in association with the structured document being registered; an extracted content update step that updates the extracted content based on update information; and an updated structured document generation step that generates, from the registered structured document, a structured document reflecting the content updated in the extracted content update step.

The structured document management step includes one of the following: managing the extracted content as attribute information of the structured document; storing the extracted content in different columns of the same database record; or storing it in records of different database tables, with a column holding the key value that links those records.

In the updated structured document generation step, a designated element content update step uses the extraction element designation information from the structured document content extraction step to identify the designated elements and where their content is stored, and, based on this location information, replaces the content extracted from the structured document used for registration with its updated value, thereby generating a structured document that reflects the updates to the extracted content.

In the designated element content update step, structured document conversion definition information is generated from the extraction element designation information used in the structured document content extraction step and the content updated in the extracted content update step; this definition converts the content of the designated elements into the updated content. A structured document conversion step then converts the content of the structured document used for registration according to this conversion definition information.

The updated structured document generation step further includes an updated structured document registration step that registers the generated structured document in the document registration system, so that the updated structured document can be managed in association with the updated attribute information.

In the structured document management step, extracted content update information is held that indicates whether the content extracted in the structured document content extraction step has been updated since the structured document was registered. The extracted content update step sets, in the extracted content update information for the updated content, a value indicating that the extracted content has been updated, and the updated structured document generation step consults this information so that only the updated extracted content is treated as an update target when generating the structured document reflecting the updates.

Finally, the updated structured document registration step includes a step that resets the extracted content update information to indicate "not updated", so that subsequent updates to the structured document can likewise be limited to extracted content that has been updated again.
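The steps above can be sketched in Python as follows. This is a minimal, in-memory illustration under stated assumptions, not the patented implementation: the Entry record, the EXTRACTION_SPEC mapping (attribute name to an element path relative to the root, in xml.etree.ElementTree's limited XPath subset) and all function names are hypothetical. Where [0009] describes generating a conversion definition (for example, an XSLT stylesheet) and applying it, this sketch instead rewrites the parsed original directly, touching only elements whose update flag is set.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Entry:
    original_xml: str                            # document kept as original text
    attributes: dict                             # extracted attribute name -> value
    updated: dict = field(default_factory=dict)  # attribute name -> update flag


# Hypothetical extraction element designation: attribute name -> element path.
EXTRACTION_SPEC = {"title": "title", "status": "status"}


def register(xml_text):
    """Extract designated content and keep it alongside the original document."""
    root = ET.fromstring(xml_text)
    attrs = {name: root.findtext(path) for name, path in EXTRACTION_SPEC.items()}
    return Entry(xml_text, attrs, {name: False for name in attrs})


def update_attribute(entry, name, value):
    """Update extracted content independently of the document and flag it."""
    entry.attributes[name] = value
    entry.updated[name] = True


def generate_updated(entry):
    """Rewrite only the flagged elements of the stored original document."""
    root = ET.fromstring(entry.original_xml)
    for name, flagged in entry.updated.items():
        if flagged:
            root.find(EXTRACTION_SPEC[name]).text = entry.attributes[name]
    return ET.tostring(root, encoding="unicode")


def reregister(entry):
    """Store the regenerated document and reset every flag to 'not updated'."""
    entry.original_xml = generate_updated(entry)
    entry.updated = {name: False for name in entry.updated}
```

Unlike the prior art 1 sketch style of decomposition, elements that were never extracted (a <body>, say) pass through untouched because the original document is the starting point. Re-serialization through ElementTree may still normalize details such as the XML declaration or attribute quoting.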

[0010]

[Embodiments of the Invention] FIG. 1 is a block diagram showing the processing of a first embodiment of the present invention. This embodiment is a document management system that registers, manages, and refers to structured documents. When a structured document is registered, part of its content is extracted and managed as attribute information, and when the structured document is retrieved, a structured document reflecting the extracted attribute information can be obtained. The structured document management system (101) is a general-purpose computer comprising a CPU, memory, an external storage device, a system bus, a portable medium device, and a network device. The input/output device (102) is used to give operating instructions to the structured document management system (101) and to display information. The structured document management system consists of programs that run on the CPU of this system, memory that stores the data being processed and the programs, and a file system that manages the registered documents and the like.

[0011] The structured document management system comprises: a document registration processing program (104) that registers a structured document to be registered (103) in the system; a structured document database (structured document DB) (105) that stores registered structured documents; an attribute information extraction/registration processing program (106) that, when the structured document (103) is registered, extracts content written in the structured document and registers it as attribute information of the document; an extraction element definition (107), which is the definition information under which the attribute information extraction/registration processing program (106) operates; an attribute information database (attribute information DB) (108) that stores the extracted attribute information in association with the structured document; an attribute information update program (110) that operates according to a specified attribute information update designation (109); a structured document read processing program (112) that reads a structured document from the structured document DB (105) according to a read document designation (111) and outputs the structured document contents (113) to memory; an attribute information read processing program (114) that reads attribute information from the attribute information DB (108) according to the read document designation (111) and outputs the attribute information (115) to memory; and an updated structured document generation processing program (116) that, based on the structured document contents (113), attribute information (115), and extraction element definition (107) held in memory, generates a structured document in which the updates to the attribute information (115) are reflected in the structured document contents (113), and outputs it as an updated structured document (117).

[0012] FIG. 2 is a flowchart of the document registration processing (104) and the attribute information extraction/registration processing (106). In step 201, the structured document to be registered is read. In step 202, the read structured document is stored in the structured document DB. In step 203, attribute information is obtained from the structured document read in step 201 and stored in the attribute information DB in association with the document in the structured document DB. In step 204, it is checked whether any structured documents remain to be registered; if so, steps 201 through 203 are repeated for the next document. The registration processing ends when no structured documents remain to be registered.
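The registration loop of FIG. 2 can be sketched in Python with sqlite3. This is an assumption-laden illustration: the two tables (documents and attributes, linked by a doc_id key, which corresponds to one of the storage options listed in [0009]), the spec mapping from attribute name to an element path relative to the root, and the function names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def init_db(conn):
    conn.execute("CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, content TEXT)")
    conn.execute("CREATE TABLE attributes ("
                 "doc_id INTEGER REFERENCES documents(doc_id), name TEXT, value TEXT)")


def register_all(conn, documents, spec):
    """Steps 201-204: store each document, then extract and store its attributes."""
    ids = []
    for xml_text in documents:             # step 204: repeat while documents remain
        # Step 201: the document text has been read (here it is passed in).
        cur = conn.execute("INSERT INTO documents (content) VALUES (?)", (xml_text,))
        doc_id = cur.lastrowid             # step 202: stored in the document table
        root = ET.fromstring(xml_text)
        for name, path in spec.items():    # step 203: extract and link by doc_id
            conn.execute("INSERT INTO attributes VALUES (?, ?, ?)",
                         (doc_id, name, root.findtext(path)))
        ids.append(doc_id)
    return ids
```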

[0013] FIG. 3 shows the processing performed in step 203 to obtain the information that becomes attribute information from the content of the structured document and register it in the attribute information DB. First, structured document structure analysis processing (301) is performed on the structured document (103). This processing can be implemented with an existing XML parser. The structure analysis produces structure analysis data (302) that expresses the hierarchical relationship of the elements making up the structured document. The structure analysis data is represented as a structure tree whose root is the top-level element, whose branches are the child elements of each element, and whose leaves are the terminal elements. The content to be extracted as attribute information can be obtained by traversing the elements of the structure analysis data (302) according to the extraction element definition (107). An extraction element definition is written in the following format:

extraction attribute name=attribute data type="element specification of extraction target"

To obtain attribute information as a character string, write string as the attribute data type; to obtain it as an integer, write int. When int is specified, the obtained character string is converted to an integer before it is stored in the attribute information DB. If the attribute information is array-format data, write string_array as the attribute data type when each array element is a character string, and int_array when each element is a number. In this case, the multiple items obtained with the extraction target element specification are stored in order in the elements of the array. Alternatively, the data type of the attribute information can be managed separately as attribute data type information, so that the attribute data type is not written in the extraction element definition. In that case, the extraction element definition omits the attribute data type, the attribute data type information is searched by the extraction attribute name to obtain the attribute data type, and the subsequent processing is carried out in the same way.
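A minimal Python sketch of reading this definition format follows. It assumes the exact syntax name=type="path" with no spaces around the equals signs (the text does not specify whitespace or escaping rules), and it supports the alternative form name="path" by looking the type up in a separately managed mapping, here a hypothetical type_info dict. The convert helper shows how the four data types shape the list of strings found at the path.

```python
import re

VALID_TYPES = {"string", "int", "string_array", "int_array"}

# Assumed syntax: name=type="path", or name="path" when the type is kept elsewhere.
DEFINITION = re.compile(r'^(?P<name>[^=]+)=(?:(?P<type>[^="]+)=)?"(?P<path>[^"]*)"$')


def parse_definition(line, type_info=None):
    """Return (attribute name, data type, element specification)."""
    m = DEFINITION.match(line.strip())
    if m is None:
        raise ValueError(f"malformed extraction element definition: {line!r}")
    data_type = m["type"]
    if data_type is None:                  # type managed separately, look it up
        data_type = (type_info or {}).get(m["name"])
    if data_type not in VALID_TYPES:
        raise ValueError(f"unknown attribute data type: {data_type!r}")
    return m["name"], data_type, m["path"]


def convert(values, data_type):
    """Apply the declared data type to the strings found at the element path."""
    if data_type == "string":
        return values[0] if values else None
    if data_type == "int":
        return int(values[0]) if values else None
    if data_type == "string_array":
        return list(values)
    return [int(v) for v in values]        # int_array
```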

[0014] In the extraction element definition, the element to be extracted is specified using XPath notation. Thus, by parsing the XPath expression "/document/title/text()" and traversing the elements of the structure analysis data (302), the character string (CDATA) (302-c) within the element "title" (302-b) under the top-level element "document" (302-a) can be obtained. The obtained content is stored in the attribute information DB (108) as the value of the attribute "title". Likewise, by parsing the XPath expression "/document/author/text()" and traversing the elements of the structure analysis data (302), the character string (CDATA) (302-e) within the element "author" (302-d) under the top-level element "document" (302-a) can be obtained. The obtained content is stored in the attribute information DB (108) as the value of the first element of the array-format attribute "author". If there are multiple authors, each is stored in the attribute information DB (108) as a successive element of the attribute "author", in order of appearance.
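The traversal described here can be sketched with a tiny path walker in Python. It handles only a restricted subset of XPath 1.0 (absolute child steps by element name plus a trailing text()) and is not a general XPath engine; select_text is a hypothetical name. Starting at the top-level element, it follows every matching child at each step, which is what lets repeated "author" elements come back in document order.

```python
import xml.etree.ElementTree as ET


def select_text(root, xpath):
    """Evaluate a restricted absolute path such as /document/author/text().

    Supports only child steps by element name and a trailing text(). For the
    leaf elements used here, ElementTree's .text equals the XPath text() value;
    elements with no text are skipped, as text() would select no node for them.
    """
    steps = [s for s in xpath.split("/") if s]
    if not steps or steps[-1] != "text()":
        raise ValueError("path must end with text()")
    names = steps[:-1]
    if not names or names[0] != root.tag:
        return []
    nodes = [root]
    for name in names[1:]:
        nodes = [child for node in nodes for child in node if child.tag == name]
    return [node.text for node in nodes if node.text is not None]
```

A scalar attribute such as "title" would take the first result; an array attribute such as "author" would take the whole list.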

[0015] FIG. 4 is a flowchart showing the detailed procedure, in step 203 of FIG. 2, for obtaining the content that becomes attribute information from a structured document and storing it as an attribute value. In step 401, the structured document to be registered is parsed and structure analysis data (302) is generated. In step 402, one extraction element definition is taken from the extraction element definition information (107) and its description is parsed. In step 403, based on the "extraction target element specification" of the definition parsed in step 402, the structure analysis data is traced from the top-level element "document" down to the specified element. In step 404, the character string inside the element reached in step 403 is obtained from the structure analysis data (302). In step 405, the obtained content is stored as the value of the attribute named by the extraction attribute name of the definition. Because all content written in XML is handled as character strings, when the data type of the attribute is an integer, the string is converted to a number and then stored as an integer value. In C, functions such as atoi() are available for converting a string to an integer and can be used for this. In step 406, it is checked whether any unprocessed definitions remain in the extraction element definition information (107). If so, steps 402 through 405 are repeated for the next extraction element definition.
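Steps 401 to 406 can be sketched as a loop over extraction element definitions. The definition syntax name=type="path" is modeled on the form quoted later in this description (title=string="/document/title/text()"); the type keyword integer, the function names, and keeping every attribute as a list (so that array attributes such as author share one shape) are assumptions for illustration. Python's int() stands in for C's atoi(); unlike atoi(), it raises ValueError on non-numeric text instead of returning 0.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical definition syntax: name=type="/top/child/text()"
DEFINITION = re.compile(r'^(\w+)=(\w+)="([^"]+)"$')


def texts_at(root, path):
    """Follow a simple absolute /.../text() path and return the texts found."""
    steps = [s for s in path.split("/") if s]
    if len(steps) < 2 or steps[-1] != "text()":
        raise ValueError(f"unsupported path: {path!r}")
    if steps[0] != root.tag:
        return []
    relative = "/".join(steps[1:-1])
    matches = root.findall(relative) if relative else [root]
    return [m.text or "" for m in matches]


def extract_attributes(xml_text, definitions):
    root = ET.fromstring(xml_text)  # step 401: parse into a tree
    attributes = {}
    for line in definitions:  # steps 402 and 406: each definition in turn
        match = DEFINITION.match(line.strip())
        if match is None:
            raise ValueError(f"malformed definition: {line!r}")
        name, data_type, path = match.groups()
        values = texts_at(root, path)  # steps 403-404: trace and read text
        if data_type == "integer":
            # Step 405: XML content is text, so convert before storing.
            values = [int(v) for v in values]
        attributes[name] = values  # every attribute kept as a list
    return attributes
```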

[0016] FIG. 5 shows management format example 1 for a structured document and its attribute information. In this example, the obtained attribute information is stored in the property storage area (502) of the structured document (501) file. When the structured document and attribute information are managed in the format of FIG. 5, the attribute information extracted from the document content is managed as properties of the structured document file, so it can be obtained by reading the file's properties and updated by updating them.

[0017] FIG. 6 shows management format example 2 for a structured document and its attribute information. In this example, the structured document is stored in the structured document content column (602) of the structured document management table (601), a single table in a database. The attribute information extracted from the structured document is stored in the attribute columns (603, 604) of the same table (601). Storing the structured document and its attribute information in the same record associates them with each other. The document identifier (605) is used to identify the structured document or its attribute information when acquiring or updating them. Because a structured document stored in the format of FIG. 6 and its attribute information sit in the same record of the table, both can be obtained at once by a single search on the table. The attribute information can be updated by updating the column contents of the record specified by the document identifier.
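A minimal sketch of management format example 2 using Python's built-in sqlite3 module. Table and column names are hypothetical, and encoding the array-type author attribute as a JSON string is an assumption; the text does not say how array attributes map to columns. The sketch shows the point of the format: one query returns the document and its attributes, and an attribute update touches only one column.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE structured_document (
           doc_id  INTEGER PRIMARY KEY,  -- document identifier (605)
           content TEXT NOT NULL,        -- structured document content (602)
           title   TEXT,                 -- attribute column (603)
           author  TEXT                  -- attribute column (604), JSON array
       )"""
)


def register(doc_id, content, title, authors):
    # Document and attributes go into the same record.
    conn.execute(
        "INSERT INTO structured_document VALUES (?, ?, ?, ?)",
        (doc_id, content, title, json.dumps(authors)),
    )


def fetch(doc_id):
    # A single lookup returns the document together with its attributes.
    content, title, author = conn.execute(
        "SELECT content, title, author FROM structured_document WHERE doc_id = ?",
        (doc_id,),
    ).fetchone()
    return content, title, json.loads(author)


def update_title(doc_id, title):
    # Only the attribute column changes; the stored document is untouched.
    conn.execute(
        "UPDATE structured_document SET title = ? WHERE doc_id = ?",
        (title, doc_id),
    )


register(1, "<document><title>Old</title></document>", "Old", ["Author One"])
update_title(1, "New")
```

Note that after update_title the stored document text still says "Old"; reconciling the two is the job of the updated structured document generation process described later.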

[0018] FIG. 7 shows management format example 3 for a structured document and its attribute information. In this example, the structured document is stored as data in the structured document content column (702) of a record in the structured document management table (701), the first table in the database, and a document identifier that uniquely identifies the structured document is stored in the document identifier column (703) of the same record. In the attribute information management table (704), the second table, the attribute information extracted from the structured document and the document identifier that uniquely identifies it are stored in columns (705, 706, 707) of the same record. The document identifier column (707) holds the same value as the document identifier column (703) of the first table (701). A structured document stored in the format of FIG. 7 and its attribute information can be obtained by searching the first and second tables with the document identifier. The attribute information can be updated by updating the column contents of the record specified by the document identifier.
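Management format example 3 can be sketched the same way with two sqlite3 tables linked by the document identifier. Table and column names and the JSON encoding of author are again assumptions. The JOIN is one way to "search the first and second tables with the document identifier"; two separate lookups would work equally well.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document_table (          -- first table (701)
        doc_id  INTEGER PRIMARY KEY,        -- document identifier (703)
        content TEXT NOT NULL               -- structured document content (702)
    );
    CREATE TABLE attribute_table (         -- second table (704)
        title  TEXT,                        -- attribute (705)
        author TEXT,                        -- attribute (706), JSON array
        doc_id INTEGER NOT NULL REFERENCES document_table(doc_id)  -- (707)
    );
    """
)


def register(doc_id, content, title, authors):
    conn.execute("INSERT INTO document_table VALUES (?, ?)", (doc_id, content))
    # The same identifier value in both tables links document and attributes.
    conn.execute(
        "INSERT INTO attribute_table VALUES (?, ?, ?)",
        (title, json.dumps(authors), doc_id),
    )


def fetch(doc_id):
    content, title, author = conn.execute(
        """SELECT d.content, a.title, a.author
           FROM document_table AS d
           JOIN attribute_table AS a ON a.doc_id = d.doc_id
           WHERE d.doc_id = ?""",
        (doc_id,),
    ).fetchone()
    return content, title, json.loads(author)


def update_authors(doc_id, authors):
    # Updating an attribute touches only the second table.
    conn.execute(
        "UPDATE attribute_table SET author = ? WHERE doc_id = ?",
        (json.dumps(authors), doc_id),
    )
```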

[0019] FIG. 8 is a flowchart showing the procedure of the attribute information update process (110). First, in step 801, the document whose attribute information is to be updated is searched for. The search can use attribute information such as the document identifier or title, or a full-text search of the structured documents. In step 802, the attribute information of the found document is obtained. In step 803, the updated attribute information is stored in the attribute information DB (108), which completes the update.
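Steps 801 to 803 might look like the following against a two-table layout like that of FIG. 7. The sketch re-creates a hypothetical schema so that it runs on its own; SQL LIKE is only a crude stand-in for a real full-text search, and merging the requested changes into the current attribute values is a design choice, not something the text specifies.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document_table (doc_id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE attribute_table (title TEXT, author TEXT, doc_id INTEGER);
    """
)


def find_documents(title=None, text=None):
    """Step 801: search by an attribute value or by document text."""
    if title is not None:
        rows = conn.execute(
            "SELECT doc_id FROM attribute_table WHERE title = ?", (title,))
    else:
        # LIKE is a crude stand-in for a real full-text index.
        rows = conn.execute(
            "SELECT doc_id FROM document_table WHERE content LIKE ?",
            (f"%{text}%",))
    return [row[0] for row in rows]


def update_attributes(doc_id, changes):
    """Steps 802-803: read current attributes, apply changes, write back."""
    title, author = conn.execute(
        "SELECT title, author FROM attribute_table WHERE doc_id = ?",
        (doc_id,)).fetchone()
    current = {"title": title, "author": json.loads(author)}
    current.update(changes)
    conn.execute(
        "UPDATE attribute_table SET title = ?, author = ? WHERE doc_id = ?",
        (current["title"], json.dumps(current["author"]), doc_id))
    return current
```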

[0020] FIG. 9 shows example 1 of generating an updated structured document that reflects the update of extracted attribute information. In this example, the content (901-a) obtained from the "title" element of the structured document (901) is stored in the attribute "title". The attribute update process has changed the value of "title" from "Introduction to Structured Documents" (構造化文書概説) to "Overview of Structured Documents" (構造化文書の概要) (902). Reflecting this update in the original structured document produces a structured document (903) in which the content enclosed by the "title" tags has been updated to "Overview of Structured Documents" (903-a).

[0021] FIG. 10 shows example 2 of generating a structured document that reflects the update of extracted attribute information. In this example, the content (1001-a) obtained from the "author" element of the structured document (1001) is stored in the first element, "author[1]", of the attribute "author". The attribute update process has added "Jiro Hitachi" (日立次郎) as the second element, "author[2]", of the attribute "author" (1002). Reflecting this update in the original structured document produces a structured document (1003) to which one "author" tag has been added, with "Jiro Hitachi" stored as the content enclosed by the new tag (1003-a).

[0022] FIG. 11 shows a first example of the processing performed by the updated structured document generation process (116), which generates a structured document reflecting the update of the attribute information. First, the structured document (1101) obtained from the structured document DB (105) is subjected to structured document structure analysis (1102) to generate structure analysis data (1103). This step can be realized with an XML parser. Next, the attribute information (1104) corresponding to the structured document (1101) is obtained from the attribute information DB (108). Using the obtained attribute information (1104) and the extraction element definition information (1105), the structured document update process (1106) updates the content of the specified elements in the structure analysis data (1103). To do so, the elements that were extracted as attribute information are first located in the structure analysis data (1103) based on the extraction element definitions (1105). Because the extraction target element specification of the definition reads title=string="/document/title/text()", the attribute "title" is known to have been obtained from the character string (CDATA) (1103-c) reached by tracing from the top-level "document" element (1103-a) to the "title" element (1103-b). The attribute "title" now holds "Overview of Structured Documents", and replacing the corresponding content of the structure analysis data with this value updates the structure analysis data.
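The replacement step for a single-valued attribute such as title can be sketched on an ElementTree parse tree, which here stands in for the structure analysis data (1103). The helper name apply_scalar and the English element names are assumptions, and only simple absolute /.../text() paths are supported.

```python
import xml.etree.ElementTree as ET


def apply_scalar(root, path, value):
    """Replace the text of the element addressed by /top/.../text()."""
    steps = [s for s in path.split("/") if s]
    if len(steps) < 2 or steps[-1] != "text()" or steps[0] != root.tag:
        raise ValueError(f"unsupported path: {path!r}")
    relative = "/".join(steps[1:-1])
    target = root.find(relative) if relative else root
    if target is None:
        raise LookupError(f"no element at {path!r}")
    target.text = value  # overwrite the CDATA with the updated attribute


root = ET.fromstring(
    "<document><title>Introduction to Structured Documents</title></document>")
apply_scalar(root, "/document/title/text()", "Overview of Structured Documents")
updated = ET.tostring(root, encoding="unicode")
# '<document><title>Overview of Structured Documents</title></document>'
```

Serializing the edited tree with ET.tostring corresponds to the final structured document generation step (1107).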

[0023] The other attribute, "author", is an array-type attribute, and the update has added one element to it. Because the extraction target element specification of the definition reads author=string="/document/author/text()", the attribute "author" is known to have been obtained from the character string (CDATA) (1103-e) reached by tracing from the top-level "document" element (1103-a) to the "author" element (1103-d). Because the number of elements of "author" has increased, it is not enough to update the character string (CDATA) (1103-e); a new element (1103-f) must also be added to hold the added value "author[2]". Storing "Jiro Hitachi", the content of "author[2]", in the added "author" element yields the structure analysis data after the attribute update. From the structure analysis data (1103) updated in this way, the structured document generation process (1107) generates the updated structured document (1108). When an attribute value is an integer, it is converted to a character string after it is obtained, and the remaining steps are the same.
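For an array attribute such as author, existing elements are overwritten in order and any extra values become new sibling elements, as with element 1103-f. The sketch below inserts each new element directly after the last existing one; that placement, the helper name apply_array, and leaving surplus elements in place when the array shrinks (a case the text does not discuss) are assumptions.

```python
import xml.etree.ElementTree as ET


def apply_array(root, path, values):
    """Write an array attribute back: overwrite existing elements in order,
    then add new sibling elements for any extra values."""
    steps = [s for s in path.split("/") if s]
    if len(steps) < 3 or steps[-1] != "text()" or steps[0] != root.tag:
        raise ValueError(f"unsupported path: {path!r}")
    parent_path = "/".join(steps[1:-2])
    parent = root.find(parent_path) if parent_path else root
    if parent is None:
        raise LookupError(f"no parent element for {path!r}")
    tag = steps[-2]
    existing = parent.findall(tag)
    for element, value in zip(existing, values):
        element.text = value
    # Insert extra elements right after the last existing one (or at the end).
    position = list(parent).index(existing[-1]) + 1 if existing else len(parent)
    for value in values[len(existing):]:
        new_element = ET.Element(tag)
        new_element.text = value
        parent.insert(position, new_element)
        position += 1


root = ET.fromstring(
    "<document><title>T</title><author>Author One</author>"
    "<date>2000</date></document>")
apply_array(root, "/document/author/text()", ["Author One", "Jiro Hitachi"])
updated = ET.tostring(root, encoding="unicode")
```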

[0024] FIG. 12 is a flowchart of the procedure described with reference to FIG. 11, as carried out by the structured document readout process (112), the attribute information readout process (114), and the updated structured document generation process (116). In step 1201, the structured document to be obtained is searched for; when it is specified by document identifier, the document is obtained by searching on that identifier. In step 1202, the content of the structured document found in step 1201 is obtained. In step 1203, the attribute information of the structured document found in step 1201 is obtained. In step 1204, the structured document obtained in step 1202 is parsed to generate structure analysis data. In step 1205, the element from which an attribute was obtained is identified from the extraction element definition information. In step 1206, the content of the element identified in step 1205 is replaced with the updated attribute value. In step 1207, it is determined whether the content has been updated for all attributes. If any attribute has not yet been processed, steps 1205 and 1206 are repeated. Once all attributes have been processed, a structured document is generated from the updated structure analysis data in step 1208.

[0025] FIG. 13 shows a second example of the processing performed by the updated structured document generation process (116), which generates a structured document reflecting the update of the attribute information. First, the structured document (1301) is obtained from the structured document DB (105). Next, the attribute information (1302) corresponding to the structured document (1301) is obtained from the attribute information DB (108). From the obtained attribute information (1302) and the extraction element definition information (1303), the structured document conversion definition generation process (1304) generates a structured document conversion definition (1305). The content of the conversion definition (1306) describes a rule that replaces the content extracted according to the extraction element definitions (1303) with the attribute information (1302). Feeding this conversion definition (1306) and the structured document (1301) to the structured document conversion process (1307) produces an updated structured document (1308) in which the content specified by the extraction element definitions (1303) has been replaced with the content of the attribute information (1302). The conversion can be realized by writing the conversion definition (1306) in XSLT and using an XSLT engine for the conversion process (1307).
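A conversion definition of the kind described above could be generated as an XSLT 1.0 stylesheet: an identity template copies the document, and one template per attribute value replaces the matching text node. This is a hedged sketch: the function name and input shapes are hypothetical, only existing non-empty text nodes are replaced (new author elements are not created), and the stylesheet is only generated here, not applied, because Python's standard library has no XSLT engine.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def build_stylesheet(definitions, attributes):
    """definitions: {name: "/top/.../text()"}; attributes: {name: [values]}.
    Returns XSLT 1.0 source that copies a document unchanged except for
    the text nodes addressed by the definitions."""
    templates = []
    for name, path in definitions.items():
        if not path.endswith("/text()"):
            raise ValueError(f"unsupported path: {path!r}")
        element_path = path[: -len("/text()")]
        for index, value in enumerate(attributes.get(name, []), start=1):
            # e.g. /document/author[2]/text() selects the 2nd author's text
            pattern = f"{element_path}[{index}]/text()"
            templates.append(
                f"<xsl:template match={quoteattr(pattern)}>"
                f"<xsl:text>{escape(value)}</xsl:text>"
                "</xsl:template>"
            )
    identity = (
        '<xsl:template match="@*|node()">'
        '<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>'
        "</xsl:template>"
    )
    return (
        f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">'
        + identity
        + "".join(templates)
        + "</xsl:stylesheet>"
    )


stylesheet = build_stylesheet(
    {"title": "/document/title/text()", "author": "/document/author/text()"},
    {"title": ["Overview of Structured Documents"], "author": ["Author One"]},
)
ET.fromstring(stylesheet)  # raises ParseError if the output is not well-formed
```

To apply the result, a third-party XSLT engine such as lxml can be used, for example by passing the parsed stylesheet to etree.XSLT and calling the returned transform on the parsed document. Under XSLT 1.0 default priorities, a pattern with a predicate such as /document/author[1]/text() has priority 0.5, which beats the identity template's node() pattern (priority -0.5), so the replacement template wins.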

[0026] FIG. 14 is a flowchart of the procedure described with reference to FIG. 13, as carried out by the structured document readout process (112), the attribute information readout process (114), and the updated structured document generation process (116). In step 1401, the structured document to be obtained is searched for; when it is specified by document identifier, the document is obtained by searching on that identifier. In step 1402, the content of the structured document found by the search in step 1401 is obtained. In step 1403, the attribute information of the structured document found in step 1401 is obtained. In step 1404, a structured document conversion definition that converts the content of each element from which an attribute was obtained into the updated attribute value is generated from the extraction element definition information and the updated attribute information. In step 1405, the conversion definition generated in step 1404 is used to convert the structured document obtained in step 1402 into an updated structured document that reflects the updated attribute values.

[0027] Next, a second embodiment of the present invention is described. In the second embodiment, when the attribute information extracted from a structured document is managed, an update flag indicating whether each attribute has been updated is held, so that only the attributes that were actually updated are used to update the structured document. In addition, when a structured document is obtained, the structured document managed in the structured document DB is replaced with the updated structured document, keeping the managed attribute information consistent with the content of the structured document. FIG. 15 is a processing block diagram of the second embodiment. It differs from the block diagram of the first embodiment (FIG. 1) only in that a structured document update processing program (1501), which stores the updated structured document (117) in the structured document DB (105), has been added.

[0028] FIG. 16 shows management format example 4 for a structured document and its attribute information, used in the second embodiment; it extends management format example 3 of FIG. 7 so that attribute update information is held. In this example, the structured document is stored as data in the structured document content column (1602) of a record in the structured document management table (1601), the first table in the database, and a document identifier that uniquely identifies the structured document is stored in the document identifier column (1603) of the same record. In the attribute information management table (1604), the second table, the attribute information extracted from the structured document and the document identifier that uniquely identifies it are stored in columns (1605, 1606, 1607) of the same record. The document identifier column (1607) holds the same value as the document identifier column (1603) of the first table (1601). In addition, an update flag (1608) for the attribute "title" and an update flag (1609) for the attribute "author" are held. A structured document stored in the format of FIG. 16 and its attribute information can be obtained by searching the first and second tables with the document identifier, and the attribute information can be updated by updating the column contents of the record specified by the document identifier. Attribute update flags can be held in the same way in management format example 1 (FIG. 5) and management format example 2 (FIG. 6).

[0029] FIG. 17 is a flowchart of the document registration process (104) and the attribute information extraction/registration process (106) in the second embodiment. In step 1701, a structured document to be registered (103) is read. In step 1702, the structured document (103) that was read is stored in the structured document DB (105). In step 1703, attribute information is obtained from the structured document (103) read in step 1701 and stored in the attribute information DB (108) in association with the document in the structured document DB (105). In step 1704, the attribute update flags are initialized to the not-updated state. In step 1705, it is checked whether any structured documents remain to be registered; if so, steps 1701 through 1704 are repeated for the next document. When no documents remain, the registration process ends.

[0030] FIG. 18 is a flowchart showing the procedure of the attribute information update process (110) in the second embodiment. First, in step 1801, the document whose attribute information is to be updated is searched for. The search can use attribute information such as the document identifier or title, or a full-text search of the structured documents. In step 1802, the attribute information of the found document is obtained. In step 1803, the updated attribute information is stored in the attribute information DB (108). In step 1804, the update flag of each updated attribute is set to the updated state.
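The flag columns of FIG. 16 together with steps 1704 and 1804 can be sketched with sqlite3, again using hypothetical table and column names and a JSON encoding for author. Flags are stored as 0 (not updated) and 1 (updated); the whitelist check is what makes it safe to format the column name into the UPDATE statement.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document_table (
        doc_id  INTEGER PRIMARY KEY,                -- (1603)
        content TEXT NOT NULL                       -- (1602)
    );
    CREATE TABLE attribute_table (
        title          TEXT,                        -- (1605)
        author         TEXT,                        -- (1606), JSON array
        doc_id         INTEGER NOT NULL,            -- (1607)
        title_updated  INTEGER NOT NULL DEFAULT 0,  -- update flag (1608)
        author_updated INTEGER NOT NULL DEFAULT 0   -- update flag (1609)
    );
    """
)
ATTRIBUTES = ("title", "author")  # whitelist of attribute column names


def register(doc_id, content, title, authors):
    """Steps 1702-1704: store document and attributes, flags not updated."""
    conn.execute("INSERT INTO document_table VALUES (?, ?)", (doc_id, content))
    conn.execute(
        "INSERT INTO attribute_table (title, author, doc_id) VALUES (?, ?, ?)",
        (title, json.dumps(authors), doc_id),
    )  # title_updated and author_updated default to 0


def update_attribute(doc_id, name, value):
    """Steps 1803-1804: store the new value and mark that attribute updated."""
    if name not in ATTRIBUTES:
        raise ValueError(f"unknown attribute: {name!r}")
    stored = json.dumps(value) if name == "author" else value
    # Safe to format: name was checked against the fixed whitelist above.
    conn.execute(
        f"UPDATE attribute_table SET {name} = ?, {name}_updated = 1 "
        "WHERE doc_id = ?",
        (stored, doc_id),
    )


def update_flags(doc_id):
    return conn.execute(
        "SELECT title_updated, author_updated FROM attribute_table "
        "WHERE doc_id = ?",
        (doc_id,),
    ).fetchone()
```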

[0031] FIG. 19 is a flowchart of the procedure described with reference to FIG. 13, as carried out in the second embodiment by the structured document readout process (112), the attribute information readout process (114), the updated structured document generation process (116), and the structured document update process (1501). In the second embodiment, only the attribute values that have been updated are applied when the updated structured document is generated. In step 1901, the structured document to be obtained is searched for; when it is specified by document identifier, the document is obtained by searching on that identifier. In step 1902, the content of the structured document found by the search in step 1901 is obtained. In step 1903, the attribute information of that structured document and the attribute update flags are obtained. In step 1904, the update flags obtained in step 1903 are checked; if no attribute has been updated, the process proceeds to step 1905, in which the content of the structured document obtained in step 1902 is output unchanged. If any attribute has been updated, the process proceeds to step 1906, in which a structured document conversion definition that converts the content of each element from which an attribute was obtained into the updated attribute value is generated from the extraction element definition information and the updated attribute information. The conversion definition generated here describes conversions only for the elements whose update flags, obtained in step 1903, are set to updated. In step 1907, the conversion definition generated in step 1906 is used to convert the structured document obtained in step 1902 into an updated structured document that reflects the updated attribute values. In step 1908, the updated structured document overwrites the source structured document in the structured document DB (105), updating its content. In step 1909, the update flags of the attributes of the re-registered structured document are reset to not-updated.
With this processing, the original structured document is output as is when no attribute has been updated; otherwise, a conversion definition covering only the updated attributes is generated and used to produce a structured document that reflects the updates to the attribute information extracted from it.
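The flag-driven logic of steps 1904 to 1907 can be sketched as follows. One deliberate substitution: instead of generating and applying an XSLT conversion definition, as the text describes, this sketch edits an ElementTree parse tree directly for the flagged attributes only, which keeps it runnable with the standard library. It also only overwrites existing elements. The function name, argument shapes, and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def regenerate(content, definitions, attributes, updated):
    """content: XML text; definitions: {name: "/top/.../text()"};
    attributes: {name: [values]}; updated: {name: bool}.
    Returns (document_text, changed)."""
    dirty = [name for name, flag in updated.items() if flag]
    if not dirty:
        # Steps 1904-1905: nothing updated, so output the stored text as is.
        return content, False
    root = ET.fromstring(content)
    for name in dirty:
        # Steps 1906-1907, done here as a direct tree edit instead of XSLT.
        steps = [s for s in definitions[name].split("/") if s]
        if steps[0] != root.tag or steps[-1] != "text()":
            raise ValueError(f"unsupported path for {name!r}")
        relative = "/".join(steps[1:-1])
        targets = root.findall(relative) if relative else [root]
        for element, value in zip(targets, attributes[name]):
            element.text = value
    return ET.tostring(root, encoding="unicode"), True


definitions = {"title": "/document/title/text()",
               "author": "/document/author/text()"}
stored = "<document><title>Old</title><author>A</author></document>"
```

After storing the returned document back into the structured document DB (step 1908), the caller would reset the flags (step 1909), for example with updated = dict.fromkeys(updated, False).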

[0032]

[Effects of the Invention] According to the present invention, in a structured document management system, information extracted as attribute information from the content of structured documents can be updated quickly with less management information, and when a structured document is obtained, a structured document that reflects the updates to the content extracted as attribute information can be generated.

[Brief description of the drawings]

【図１】第一の実施例の処理内容を示すブロック図であ
る。FIG. 1 is a block diagram showing processing contents of a first embodiment.

【図２】第一の実施例の文書登録、属性情報抽出・登録
処理のフローチャートを示す図である。FIG. 2 is a flowchart illustrating a document registration and attribute information extraction / registration process according to the first embodiment.

【図３】属性情報の抽出および登録処理の内容を示す図
である。FIG. 3 is a diagram showing contents of extraction and registration processing of attribute information.

【図４】構造化文書属性抽出・登録処理のフローチャー
トを示す図である。FIG. 4 is a diagram showing a flowchart of structured document attribute extraction / registration processing.

【図５】構造化文書および属性情報の管理形式の例１を
示す図である。FIG. 5 is a diagram illustrating a first example of a management format of a structured document and attribute information.

【図６】構造化文書および属性情報の管理形式の例２を
示す図である。FIG. 6 is a diagram illustrating an example 2 of a management format of a structured document and attribute information.

【図７】構造化文書および属性情報の管理形式の例３を
示す図である。FIG. 7 is a diagram illustrating an example 3 of a management format of a structured document and attribute information.

【図８】第一の実施例の属性情報更新処理のフローチャ
ートを示す図である。FIG. 8 is a diagram illustrating a flowchart of attribute information update processing according to the first embodiment.

【図９】属性情報の更新および構造化文書への反映例１
を示す図である。FIG. 9 is an example of updating attribute information and reflecting it in a structured document 1
FIG.

【図１０】属性情報の更新および構造化文書への反映例
２を示す図である。FIG. 10 is a diagram showing a second example of updating attribute information and reflecting it in a structured document.

【図１１】属性情報の更新結果を反映した更新済構造化
文書生成処理１の内容を示す図である。FIG. 11 is a diagram showing the contents of an updated structured document generation process 1 reflecting an update result of attribute information.

【図１２】属性情報の更新結果を反映した更新済構造化
文書生成処理１のフローチャートを示す図である。FIG. 12 is a diagram illustrating a flowchart of an updated structured document generation process 1 reflecting an update result of attribute information.

【図１３】属性情報の更新結果を反映した更新済構造化
文書生成処理２の内容を示す図である。FIG. 13 is a diagram illustrating the content of an updated structured document generation process 2 reflecting the update result of the attribute information.

【図１４】属性情報の更新結果を反映した更新済構造化
文書生成処理２のフローチャートを示す図である。FIG. 14 is a diagram illustrating a flowchart of an updated structured document generation process 2 reflecting an update result of attribute information.

【図１５】第二の実施例の処理内容を示すブロック図で
ある。FIG. 15 is a block diagram illustrating processing contents of a second embodiment.

FIG. 16 is a diagram showing a fourth example of the management format for structured documents and attribute information.

FIG. 17 is a diagram showing a flowchart of the document registration and attribute information extraction/registration processing according to the second embodiment.

FIG. 18 is a diagram showing a flowchart of the attribute information update processing according to the second embodiment.

FIG. 19 is a diagram showing a flowchart of the updated structured document generation processing according to the second embodiment.

FIG. 20 is a diagram showing a method, according to prior art 1, of generating a structured document that reflects attribute information.

FIG. 21 is a diagram showing a method, according to prior art 2, of generating a structured document that reflects attribute information.

[Explanation of symbols]

101 Structured document management system
102 Input/output device
103 Structured document to be registered
104 Document registration processing program
105 Structured document DB
106 Attribute information extraction/registration processing program
107 Extraction element definitions
108 Attribute information DB
109 Attribute information update designation
110 Attribute information update processing program
111 Readout document designation
112 Structured document readout processing program
113 Structured document contents
114 Attribute information readout processing program
115 Attribute information
116 Updated structured document generation processing program
117 Updated structured document
1501 Structured document update processing program
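The reference signs above describe a pipeline of registration (104), attribute extraction (106), attribute update (110), readout (112, 114) and regeneration (116). The following is a minimal Python sketch of that flow, not the applicant's implementation. It assumes the structured documents are XML, that the extraction element definitions (107) are simple ElementTree path expressions, and that the structured document DB (105) and attribute information DB (108) are two tables in one in-memory SQLite database joined by a `doc_id` key, in the manner of claim 2. All function, table and variable names (`register`, `update_attribute`, `read_updated`, `documents`, `attributes`, `EXTRACT_PATHS`) are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical extraction element definitions (107): paths of the elements
# whose text is copied out as attribute information. Assumed for illustration.
EXTRACT_PATHS = ["./title", "./author"]

def open_db() -> sqlite3.Connection:
    """Structured document DB (105) and attribute information DB (108) as two
    tables joined by a shared doc_id key (cf. claim 2)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, body TEXT)")
    db.execute("CREATE TABLE attributes (doc_id INTEGER, path TEXT, value TEXT, "
               "PRIMARY KEY (doc_id, path))")
    return db

def register(db: sqlite3.Connection, xml_text: str) -> int:
    """Document registration (104) plus attribute extraction/registration (106)."""
    cur = db.execute("INSERT INTO documents (body) VALUES (?)", (xml_text,))
    doc_id = cur.lastrowid
    root = ET.fromstring(xml_text)
    for path in EXTRACT_PATHS:
        elem = root.find(path)
        if elem is not None:
            db.execute("INSERT INTO attributes VALUES (?, ?, ?)",
                       (doc_id, path, elem.text or ""))
    return doc_id

def update_attribute(db: sqlite3.Connection, doc_id: int, path: str, value: str) -> None:
    """Attribute information update (110): only the attributes table is written;
    the registered document body is left untouched."""
    db.execute("UPDATE attributes SET value = ? WHERE doc_id = ? AND path = ?",
               (value, doc_id, path))

def read_updated(db: sqlite3.Connection, doc_id: int) -> str:
    """Readout (112, 114) and updated structured document generation (116):
    the stored values replace the text of the designated elements."""
    (body,) = db.execute("SELECT body FROM documents WHERE doc_id = ?",
                         (doc_id,)).fetchone()
    root = ET.fromstring(body)
    for path, value in db.execute(
            "SELECT path, value FROM attributes WHERE doc_id = ?", (doc_id,)):
        elem = root.find(path)
        if elem is not None:
            elem.text = value
    return ET.tostring(root, encoding="unicode")

db = open_db()
doc = register(db, "<doc><title>Draft</title><author>Sato</author><body>text</body></doc>")
update_attribute(db, doc, "./title", "Final")
print(read_updated(db, doc))
# <doc><title>Final</title><author>Sato</author><body>text</body></doc>
```

Because an update writes only to the attributes table, the registered document stays as it was, and the change is merged in only when the document is read out.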

Continued from the front page
(72) Inventor: Masayoshi Matsumoto, c/o Business Solution Division, Hitachi, Ltd., 890 Kashimada, Saiwai-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Makoto Imachi, c/o Business Solution Division, Hitachi, Ltd., 890 Kashimada, Saiwai-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Toru Takahashi, c/o Business Solution Division, Hitachi, Ltd., 890 Kashimada, Saiwai-ku, Kawasaki-shi, Kanagawa
F-terms (reference): 5B009 NA05 SA00; 5B075 NR02 NR20; 5B082 AA11 EA01

Claims

1. A structured document management method in a document management system having functions for managing, registering, and referencing electronic documents, the method comprising: a structured document content extraction step of extracting, on the basis of extraction element designation information that designates elements to be extracted from a structured document to be registered, the content of the designated elements from the structured document; a structured document management step of registering and managing the extracted content in association with the structured document to be registered; an extracted content update step of updating the extracted content on the basis of update information; and an updated structured document generation step of generating, on the basis of the registered structured document, a structured document that reflects the extracted content updated in the extracted content update step.

2. The structured document management method according to claim 1, wherein, in the structured document management step, the structured document to be registered is stored in a record of one table in a database, the content extracted in the structured document content extraction step is stored in a record of a table different from the table storing the structured document, and key information for joining the two records is stored in a column of each record.

3. The structured document management method according to claim 1, wherein, in the updated structured document generation step, the elements designated in the structured document content extraction step and the storage locations of their contents are identified on the basis of the extraction element designation information used in that step, and a structured document reflecting the update results of the extracted content is generated by a designated element content update step that, on the basis of the information identifying the storage locations, replaces the content extracted from the structured document used for registration with the updated content.

4. The structured document management method according to claim 3, wherein, in the designated element content update step, structured document conversion definition information for converting the content of the designated elements of the structured document into the content updated in the extracted content update step is generated on the basis of the extraction element designation information used in the structured document content extraction step and the content updated in the extracted content update step, and the structured document used for registration is converted in a structured document conversion step that converts a structured document according to the structured document conversion definition information.

5. The structured document management method according to claim 1, wherein, in the structured document management step, extracted content update information is managed that indicates whether the content extracted in the structured document content extraction step has been updated since the structured document was registered; in the extracted content update step, information indicating that the extracted content has been updated is set in the extracted content update information corresponding to the updated content; and, in the updated structured document generation step, only the updated extracted content is treated as an update target by referring to the extracted content update information, and a structured document reflecting the update results of the extracted content is generated.

6. A structured document management device that operates in a document management system having functions for managing, registering, and referencing electronic documents, the device comprising: structured document content extraction means for extracting, on the basis of extraction element designation information that designates elements to be extracted from a structured document to be registered, the content of the designated elements from the structured document; structured document management means for registering and managing the extracted content in association with the structured document to be registered; extracted content update means for updating the extracted content on the basis of update information; and updated structured document generation means for generating, on the basis of the registered structured document, a structured document that reflects the content updated by the extracted content update means.

7. A computer-readable recording medium recording a structured document management program that operates in a document management system having functions for managing, registering, and referencing electronic documents, the program comprising: a structured document content extraction procedure for extracting, on the basis of extraction element designation information that designates elements to be extracted from a structured document to be registered, the content of the designated elements from the structured document; a structured document management procedure for registering and managing the extracted content in association with the structured document to be registered; an extracted content update procedure for updating the extracted content on the basis of update information; and an updated structured document generation procedure for generating, on the basis of the registered structured document, a structured document that reflects the content updated in the extracted content update procedure.
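Claims 3 to 5 refine the generation step: the storage locations of the designated elements are identified, a conversion definition is derived from the extraction definitions and the updated content, and an update flag limits regeneration to content that has actually changed. The sketch below illustrates claims 4 and 5 together under stated assumptions: documents are XML, storage locations are ElementTree paths, and the "conversion definition" is a hypothetical list of (path, new text) rules standing in for whatever conversion format (an XSLT stylesheet, for example) a real system might generate. The names `Attribute`, `update`, `build_conversion_definition` and `convert` are illustrative only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Attribute:
    """Extracted content of one designated element, with an update flag (cf. claim 5)."""
    path: str              # storage location: an ElementTree path (assumption)
    value: str             # current, possibly updated, content
    updated: bool = False  # set once the content changes after registration

def update(attrs: dict[str, Attribute], path: str, value: str) -> None:
    """Extracted content update step: store the new value and set the flag."""
    attrs[path].value = value
    attrs[path].updated = True

def build_conversion_definition(attrs: dict[str, Attribute]) -> list[tuple[str, str]]:
    """Hypothetical conversion definition (cf. claim 4): one (path, new text)
    rule per updated attribute; unflagged attributes produce no rule."""
    return [(a.path, a.value) for a in attrs.values() if a.updated]

def convert(xml_text: str, rules: list[tuple[str, str]]) -> str:
    """Conversion step: apply the rules to the registered document."""
    root = ET.fromstring(xml_text)
    for path, text in rules:
        for elem in root.findall(path):
            elem.text = text
    return ET.tostring(root, encoding="unicode")

registered = "<doc><title>Draft</title><status>open</status></doc>"
parsed = ET.fromstring(registered)
attrs = {p: Attribute(p, parsed.find(p).text) for p in ("./title", "./status")}

update(attrs, "./status", "closed")
rules = build_conversion_definition(attrs)
print(rules)                       # [('./status', 'closed')]
print(convert(registered, rules))  # <doc><title>Draft</title><status>closed</status></doc>
```

Here the unchanged title produces no rule, so the conversion touches only the element whose flag was set, which is the saving claim 5 aims at when many elements are extracted but few are updated.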