JP2008146178A

JP2008146178A - Structured document processing device and method

Info

Publication number: JP2008146178A
Application number: JP2006329919A
Authority: JP
Inventors: Wataru Shimizu; 渉清水
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2006-12-06
Filing date: 2006-12-06
Publication date: 2008-06-26

Abstract

PROBLEM TO BE SOLVED: To improve the reliability of error handling even in a compact device by validating a structured document with little memory, without holding all of the schema information. SOLUTION: The structured document processing device reads the structured document to be validated one node at a time and reads only as much of the schema as is needed to validate that node (S207). After validating a node, the device determines whether the held schema information may be needed later and discards it if not (S208, S210). These steps are repeated until the entire structured document has been read. COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a technique for validating structured documents.

In recent years, XML (Extensible Markup Language) has become widespread as a format for documents handled on computers, application configuration files, and data communication. XML is a standard for giving meaning to text by means of start tags ('<' tag name '>') and end tags ('</' tag name '>'). In XML, tag names are arbitrary, and the specific meaning of a tag with a given name depends on the particular format built on XML.

For example, XHTML is a standard for displaying on a screen a document that contains links to other documents. In this standard, the tag b means that text is set in bold. That is, the description "This is very <b>important</b>" is rendered as "This is very important", with the word "important" displayed in bold.

Because anyone is free to create an XML-based language, a great many XML-based formats exist, some defined by standards bodies and others defined independently by individuals.

An XML document is usually processed with a dedicated analyzer (an XML parser). An XML parser reads and parses an XML document, splits it into the units that make up XML (nodes), such as elements and character data, and passes them to the application.

Two kinds of error can therefore occur when XML is processed: violations of the grammar of XML itself (well-formedness errors) and violations of the grammar of a particular XML-based format (validity errors).

For example, in the document "<A>Hello</B>", the start tag and the end tag have different names. This document violates the XML grammar. On the other hand, the document "<XYZ>Hello</XYZ>" satisfies the XML grammar. However, XHTML has no <XYZ> tag, so an error results. An error against such a format-specific grammar is called a validity error, and checking a document for validity errors is called validation.
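The distinction between the two error kinds can be illustrated with a short Python sketch. It is not part of the patent: the ALLOWED_TAGS set is a hypothetical stand-in for a real format grammar, and only the standard-library ElementTree parser is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, tiny stand-in for the tag names that a format permits.
ALLOWED_TAGS = {"html", "body", "p", "b"}

def classify(text):
    """Return 'well-formedness error', 'validity error', or 'ok'."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # The text is not XML at all (for example, mismatched tags).
        return "well-formedness error"
    for elem in root.iter():
        if elem.tag not in ALLOWED_TAGS:
            # Well-formed XML, but not permitted by the format.
            return "validity error"
    return "ok"
```

Under these assumptions, classify("<A>Hello</B>") reports a well-formedness error, while classify("<XYZ>Hello</XYZ>") reports a validity error.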

Validation requires a document (a schema) that defines the grammar of each format. Validators that read a schema and an XML document and verify whether the XML document is valid have existed for some time; the XML parser bundled with JAVA (registered trademark) is one example. A method has also been proposed in which a table is prepared that identifies, for each node of the XML document, the proxy needed to validate that node, and the proxy identified by the table performs the validation.

Patent Document 1: JP 2005-63416 A

However, the JAVA (registered trademark) validator first reads the entire schema, holds it in memory, and validates the XML document using the held schema information. The method of Patent Document 1 likewise needs to read the entire schema in advance in order to create the table that identifies the proxies. Consequently, a large schema requires a large amount of memory, and validation cannot be performed on devices with little memory.

An object of the present invention is to provide a structured document processing apparatus that can verify the validity of a structured document without consuming a large amount of memory.

A structured document processing apparatus according to one aspect of the present invention comprises: structured document sequential analysis means for sequentially analyzing, node by node, a structured document and a schema that is itself described as a structured document; schema information analysis means for obtaining schema information, which is constraint information of the schema, from the schema node information obtained when the structured document sequential analysis means analyzes the schema; holding means for holding the schema information obtained by the schema information analysis means; schema information amount determination means for determining, from structural constraint information in the schema information, whether the schema information needed to validate the current node of the structured document is held; validation means for validating, using the schema information, a node obtained when the structured document sequential analysis means analyzes the structured document; schema information necessity determination means for determining, from the structural constraint information in the schema information, whether the schema information held by the holding means is needed for subsequent validation; and discarding means for discarding schema information determined to be unnecessary by the schema information necessity determination means.

According to the present invention, when the validity of a structured document is verified using a schema, validation is performed without holding all of the schema information. Validation can therefore be achieved with little memory, and the reliability of error handling can be improved even in small devices.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. The present invention is not limited to the following embodiments, which merely show specific examples that are advantageous for carrying out the invention. Furthermore, not all combinations of the features described in the following embodiments are essential to the means by which the invention solves the problem.

<Embodiment 1>
FIG. 1 shows the configuration of the structured document processing apparatus according to the first embodiment of the present invention.

In FIG. 1, a CPU 101 is a system control unit that controls the entire apparatus. A ROM 102 stores the CPU's control programs and various fixed data. A RAM 103 consists of SRAM, DRAM, or the like and stores program control variables and similar data. Various setting parameters and work buffers are also stored in the RAM 103. A storage unit 104 is a hard disk or the like and stores file data.

The structured document sequential analysis unit 105 parses the structured document and the schema, which is itself written as a structured document, and stores node information in the RAM 103. The schema information analysis unit 106 interprets the node information that the structured document sequential analysis unit 105 obtained by parsing the schema as schema constraint information, and stores it in the RAM 103 as schema information usable by the structured document verification unit 108.

The structured document sequential analysis unit 105 is capable of sequential processing. Sequential processing means that the target document is not read in all at once; instead, it is read and used a portion at a time from the beginning, and information that is no longer needed can be discarded as appropriate.
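As a rough illustration of sequential processing, the sketch below feeds a document to Python's standard xml.etree.ElementTree.XMLPullParser in arbitrary chunks and yields nodes as they become available. The function name iter_nodes and the chunking are assumptions made for this example; the patent does not prescribe any particular parser.

```python
import xml.etree.ElementTree as ET

def _drain(parser):
    """Yield the (event, tag) pairs that the parser has produced so far."""
    for event, elem in parser.read_events():
        yield event, elem.tag
        if event == "end":
            elem.clear()  # drop content that is no longer needed

def iter_nodes(chunks):
    """Yield (event, tag) pairs while the document arrives chunk by chunk."""
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)
        yield from _drain(parser)
    parser.close()
    yield from _drain(parser)  # events flushed only when the input ends
```

The caller never needs the whole document in memory at once: each chunk is handed over as it is read, and the content of finished elements is cleared.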

The schema information amount determination unit 107 determines whether the node read by the structured document sequential analysis unit 105 can be validated with the schema information currently held. The structured document verification unit 108 uses the schema information held in the RAM 103 to verify whether the node held in the RAM 103 is valid. The schema information necessity determination unit 109 determines whether the schema information held in the RAM 103 will be used again later.

This structured document processing apparatus validates a structured document stored in the storage unit 104 using a schema that is also stored in the storage unit 104. In this embodiment, the structured document is written in an XML-based format and the schema is written in RELAX NG (http://www.relaxng.org/). To simplify the explanation, the listings in this embodiment omit parts that are not essential to the present invention.

FIG. 3 shows an example schema. In a RELAX NG schema, an element element represents an element that should exist in the XML document, and its name attribute gives that element's name. When an element element contains another element element, the inner element must appear as the content of the outer element. When the content of an element element is a text element, the element may contain an arbitrary character string. An interleave element means that the elements described inside it must all be present, but in any order. A zeroOrMore element means that the elements described inside it may appear repeatedly, or not at all. Other constructs also exist, such as the choice element, which indicates that any one of several elements must be present.

That is, the schema in FIG. 3 specifies an element named person whose content must begin with a familyName element and a givenName element in either order, followed by zero or more certification elements. The content of the familyName, givenName, and certification elements may be any character string.
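FIG. 3 itself is not reproduced in this text. The following Python sketch embeds a simplified RELAX NG schema reconstructed purely from the prose above (so the actual figure may differ in detail, and namespaces are omitted) and prints its structure using the standard ElementTree parser. The helper name outline is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the prose only; the actual FIG. 3 may differ in detail.
SCHEMA = """
<element name="person">
  <interleave>
    <element name="familyName"><text/></element>
    <element name="givenName"><text/></element>
  </interleave>
  <zeroOrMore>
    <element name="certification"><text/></element>
  </zeroOrMore>
</element>
"""

def outline(node, depth=0):
    """Return indented one-line summaries of the schema constructs."""
    name = node.get("name")
    label = node.tag if name is None else f"{node.tag} {name}"
    lines = ["  " * depth + label]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines

if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(SCHEMA))))
```

The outline makes the three constructs visible at a glance: a required person element, an unordered pair inside interleave, and a repeatable certification element inside zeroOrMore.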

FIGS. 4 and 5 show examples of documents that are valid against this schema. Each has a person element containing familyName and givenName elements, followed by repeated certification elements.

In contrast, FIG. 6 shows an example of a document that is not valid against the schema. This document contains an element named age (604); because this element does not exist in the schema, the document is considered invalid.

FIG. 2 shows the processing flow of the structured document processing apparatus according to the first embodiment. In this example, the schema of FIG. 3 is used and the document of FIG. 5 is analyzed. First, the apparatus checks whether the entire document to be validated has been read (step S202). If not, it reads one node of the document (step S203). A node is a unit that makes up a structured document, such as an XML start tag, end tag, or character data. In this example, the <person> tag (501) is read.

Next, the apparatus checks whether it holds the schema information needed to validate the node just read (step S204). At the outset, none of the schema has been read and the necessary schema information is not held, so the apparatus reads one node of the schema (step S205), interprets it as schema information, and stores it in the RAM 103 (step S206). The first node of the schema (301) is an element tag whose name attribute is person, which shows that the first element must be person.

The apparatus then determines once more whether it holds the schema information needed to validate the node just read (step S204). This determination can be made from the structural constraint information in the schema information. A RELAX NG schema is built mainly from four kinds of structure: sequence, choice, unordered, and repetition. A sequence structure, such as consecutive element elements, indicates that the specified elements must appear in the given order. A choice structure indicates that one of several candidates must be present. An unordered structure requires all of the specified elements to be present but in any order. A repetition structure indicates that the given element may appear repeatedly. In RELAX NG, choice, unordered, and repetition structures are expressed by choice, interleave, and zeroOrMore (among others), respectively.

Of these four structures, choice and unordered structures cannot be validated unless their entire content is held. This is because there are several candidate constraints for a node, and all of them must be taken into account. For sequence and repetition structures, on the other hand, only one candidate node is possible, so validation can be performed after reading just that one constraint. In addition, while validation is in progress inside an unordered structure, the held schema information suffices and no further reading is needed. In RELAX NG terms, when the schema node read is an element or text element, the apparatus can determine that validation is possible at that point. When it is a choice or interleave element, the apparatus determines that it must read to the end of that element. While the elements inside an interleave element are being validated, the apparatus can determine that no more of the schema needs to be read, because the interleave element was read through to its end when it was first encountered.
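The rule for how much of the schema to read can be sketched in Python as follows. This is a simplified illustration rather than the patented implementation: the schema is a reconstruction of FIG. 3 from the prose, the names read_needed, READ_TO_END, and VERIFIABLE are invented for the example, and the standard-library iterparse stands in for the sequential analysis unit.

```python
import io
import xml.etree.ElementTree as ET

# Reconstructed from the description of FIG. 3; the real figure may differ.
SCHEMA = b"""<element name="person">
  <interleave>
    <element name="familyName"><text/></element>
    <element name="givenName"><text/></element>
  </interleave>
  <zeroOrMore>
    <element name="certification"><text/></element>
  </zeroOrMore>
</element>"""

READ_TO_END = {"choice", "interleave"}  # every candidate must be held
VERIFIABLE = {"element", "text"}        # a single constraint is enough

def label(elem):
    name = elem.get("name")
    return elem.tag if name is None else f"{elem.tag}:{name}"

def read_needed(events):
    """Read schema nodes until one document node can be verified.

    Returns the labels of the schema nodes read in this step.
    """
    read = []
    for event, elem in events:
        if event != "start":
            continue
        read.append(label(elem))
        if elem.tag in READ_TO_END:
            # Choice and unordered structures: read through to the closing tag.
            for inner_event, inner in events:
                if inner_event == "start":
                    read.append(label(inner))
                elif inner is elem:
                    return read
        elif elem.tag in VERIFIABLE:
            return read
        # Otherwise (e.g. zeroOrMore) continue to the first constraint inside.
    return read

events = ET.iterparse(io.BytesIO(SCHEMA), events=("start", "end"))
steps = [read_needed(events) for _ in range(3)]
```

With this reconstructed schema, the first step reads only the person element, the second reads the whole interleave block, and the third reads zeroOrMore together with its first constraint.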

As a result of this determination, because the schema node first read is an element element (301), the apparatus determines that validation is possible at this stage and performs it (step S207). The element specified in the schema matches the held node, so the node is judged valid. Any other node would be considered invalid and would result in an error (steps S208 and S209).

Next, the apparatus determines whether the schema information currently held will still be needed (step S210). This determination can also be made from the structural constraint information in the schema information. Of the four structures described above, a repetition structure may match the same content several times in a row, so its schema information must be retained. A repetition structure can be judged unnecessary only when the element that follows it appears and the repetition can be known for certain to have ended. Conversely, the other three structures can be judged never to be used again. When this determination finds the information unnecessary, the schema information held in the RAM 103 is discarded (step S211).
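A minimal Python sketch of this retention rule is shown below. The function name still_needed and the string labels for the four structures are assumptions made for illustration, and the function is meant to be called once a construct has been matched.

```python
def still_needed(kind, repeated_name=None, next_name=None):
    """Decide whether held schema information must be kept after a match.

    kind          -- "sequence", "choice", "interleave", or "repeat"
    repeated_name -- element name that the repetition structure allows
    next_name     -- name of the next document element, or None if unknown
    """
    if kind != "repeat":
        # Sequence, choice, and unordered structures are not used again.
        return False
    # A repetition is kept until a different element shows that it has ended.
    return next_name is None or next_name == repeated_name
```

For example, still_needed("repeat", "certification", "certification") keeps the held information, whereas a different following element, or any non-repetition structure, allows it to be discarded.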

The schema element at 301 in FIG. 3 is not a repetition structure, so it is judged unnecessary and its information is discarded.

At this stage, the document to be validated has not yet been read completely, so the next node is read (steps S202 and S203). The next document node read is the familyName element (502), and the next schema node read is the interleave element (302). Because the internal structure of an interleave element is unordered, validation is judged to require everything up to the end of the interleave (305), and the schema is read that far. As a result, 502 is judged valid. The next element, givenName (503), lies within the interleave element, so it is validated in the same way except that no further schema needs to be read.

Similarly, when the certification element (504) is read, the zeroOrMore element (306) is read from the schema. Because this is a repetition structure, the apparatus judges that validation is possible once it has read as far as the first constraint, the element element (307), and performs validation. After validation, it determines whether the held schema information is still needed; for a repetition structure such as zeroOrMore the information is retained, so the certification element 505 can also be validated with it.

These steps are repeated until the document to be validated has been read to the end; if no validity error occurs, the document is judged valid against the schema.
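The overall loop can be approximated by the following Python sketch. It is heavily simplified and should be read as an illustration under stated assumptions, not as the apparatus itself: it validates only the child element names of a single parent, the schema is a flat list of particles reconstructed from the description of FIG. 3, and the tuple encoding and the name validate are invented for this example.

```python
# Flat content model reconstructed from the description of FIG. 3 (assumption).
SCHEMA = [
    ("interleave", ["familyName", "givenName"]),
    ("zeroOrMore", "certification"),
]

def validate(child_names, schema_particles):
    """Validate child element names while holding at most one particle."""
    schema = iter(schema_particles)   # stands in for reading the schema lazily
    held = None                       # schema information currently in memory
    for name in child_names:
        while True:
            if held is None:
                particle = next(schema, None)   # read only when needed
                if particle is None:
                    return False                # node not allowed by the schema
                kind, body = particle
                held = (kind, set(body) if kind == "interleave" else body)
            kind, body = held
            if kind == "zeroOrMore":
                if name == body:
                    break                       # keep it: it may repeat
                held = None                     # repetition ended: discard
                continue                        # retry with the next particle
            if kind == "interleave":
                if name not in body:
                    return False
                body.discard(name)
                if not body:
                    held = None                 # fully matched: discard
                break
            # kind == "element": a single required element
            held = None                         # never needed again: discard
            if name != body:
                return False
            break
    if held is not None and held[0] == "interleave":
        return False                            # required elements are missing
    # Any unread particle must be optional for the document to be valid.
    return all(kind == "zeroOrMore" for kind, _ in schema)
```

In this sketch, the children of a document like FIG. 5 (familyName, givenName, then certification elements) validate successfully, while a document containing an age element, as in FIG. 6, is rejected as soon as that element is reached.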

<Embodiment 2>
FIG. 7 shows the configuration of the structured document processing apparatus according to the second embodiment of the present invention. When this apparatus receives, from an external device on a network or the like, a schema that cannot be processed sequentially, it converts and stores the schema so that sequential processing becomes possible.

In FIG. 7, components 701 to 709 are the same as components 101 to 109 in FIG. 1, respectively. FIG. 7 additionally includes a schema conversion unit 710 and a network interface 711. The schema conversion unit 710 converts the schema information read by the schema information analysis unit 706 into a form that allows sequential processing and outputs it. The network interface 711 is a network interface such as a LAN board or a modem and enables communication with other devices on the network.

This structured document processing apparatus is connected to the network shown in FIG. 8. In FIG. 8, 801 is called a provider and provides some service to other devices on the network. 802 is called a requester and requests services from the provider. These devices are connected by a network 803 and can communicate with each other. The structured document processing apparatus of this embodiment corresponds to the requester.

Before this structured document processing apparatus (the requester) can use a new provider service, it must first know the format, that is, the schema, of the messages it sends to or receives from the provider. A schema can be added manually or obtained from the provider, for example.

Schemas are generally modularized for human readability, maintainability, and extensibility, and in that form they often cannot be processed sequentially.

FIG. 9 shows an example of a schema that cannot be processed sequentially. This example is also written in a simplified form of RELAX NG. Here, the start element indicates the starting point of the schema as a whole. That is, the first element element (903) inside the start element represents the constraint on the first element of an XML document that conforms to this schema; in this example it is addressBook. The element element (904) inside it shows that an element named name is required next. The ref element at 905 indicates that something defined elsewhere is to be used. Definitions are written with define elements, and the name attribute of a ref element identifies the definition it refers to. In other words, the ref element (905) refers to the part from 910 to 913 later in the document.

Placing a particular part elsewhere and referring to it in this way makes the overall structure of the schema easier to grasp and allows the part to be reused in other schemas. However, when a schema refers to a later part of the document, validation cannot be performed until the referenced part has been read, so sequential processing is impossible. The schema therefore needs to be converted according to the flow shown in FIG. 14.

First, the schema is saved in the storage device 704 as multiple files, divided into the parts that can be reference targets and the remaining part (step S1402). Next, the file holding the remaining part is read again and combined with the files holding the reference targets to generate a new schema for sequential processing (step S1403). After that, XML is received through the network interface (step S1404) and is parsed and validated (step S1405).

FIG. 15 shows the details of the file saving process in step S1402.

First, one node of the schema is read (step S1502). The apparatus checks whether the node is a define element (step S1503); nodes other than define elements are output as they are (steps S1504 and S1510). A define element is saved to a separate file (step S1505). Any file name may be used as long as it is not already in use, and the saved file name and the reference name (the value of the define element's name attribute) are added to a table (step S1506). Here the file is assumed to be saved under the name nameData.rng. All nodes up to the end of the define element are then output to the newly created file (steps S1507, S1508, and S1509). In this way, every node of the schema is output to one of the separate files.
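The splitting step can be sketched in Python as follows. Several simplifications are assumed: the sketch works on an in-memory tree rather than node by node, it keeps the "files" in a dictionary instead of writing them to storage, and the example schema is reconstructed from the description of FIG. 9, with the content of the nameData definition invented for illustration.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the description of FIG. 9; the body of the nameData
# definition is hypothetical.
SCHEMA = """
<grammar>
  <start>
    <element name="addressBook">
      <element name="name">
        <ref name="nameData"/>
      </element>
    </element>
  </start>
  <define name="nameData">
    <element name="familyName"><text/></element>
    <element name="givenName"><text/></element>
  </define>
</grammar>
"""

def split_schema(grammar_xml):
    """Separate <define> blocks from the rest of a schema.

    Returns (main_xml, files, table): files maps a file name to XML text,
    and table maps a reference name to the file name that holds it.
    """
    root = ET.fromstring(grammar_xml)
    files, table = {}, {}
    for define in root.findall("define"):
        ref_name = define.get("name")
        file_name = f"{ref_name}.rng"   # any unused name would do
        files[file_name] = ET.tostring(define, encoding="unicode")
        table[ref_name] = file_name
        root.remove(define)
    return ET.tostring(root, encoding="unicode"), files, table
```

Applied to the reconstructed schema, the table contains the single entry nameData mapped to nameData.rng, matching the file name used in the text.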

For the schema of FIG. 9, the result is a file containing everything other than the reference targets (FIG. 10), a file containing the information referenced by the reference name nameData (FIG. 11), and a table of reference names and file names (FIG. 12).

Next, FIG. 16 shows the details of the process of creating the schema for sequential processing in step S1403.

First, the schema containing everything other than the reference targets is read one node at a time (step S1602), and everything other than ref elements is output unchanged (step S1604). For a ref element, the file name corresponding to the reference name is looked up in the table (step S1605), and everything in that file except the define tag itself is output unchanged (step S1606). This process is applied to all nodes (step S1607). As a result, the schema of FIG. 13 is output. In this schema the references have been resolved, so sequential processing that proceeds in order from the beginning of the schema is possible. The processing flow for actually validating an XML document is the same as in the first embodiment.
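A possible Python sketch of this reference-resolution step is given below. The inputs MAIN, FILES, and TABLE are hypothetical stand-ins for the contents of FIGS. 10 to 12, the processing is tree-based rather than node by node, and recursive definitions (a define that refers to itself) are not handled and would recurse without end.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the files and table of FIGS. 10 to 12.
MAIN = ('<grammar><start><element name="addressBook"><element name="name">'
        '<ref name="nameData"/></element></element></start></grammar>')
FILES = {"nameData.rng": ('<define name="nameData">'
                          '<element name="familyName"><text/></element>'
                          '<element name="givenName"><text/></element>'
                          '</define>')}
TABLE = {"nameData": "nameData.rng"}

def _expand(parent, files, table):
    """Replace each <ref> child of parent with the body of its definition."""
    new_children = []
    for child in list(parent):
        if child.tag == "ref":
            define = ET.fromstring(files[table[child.get("name")]])
            for part in define:          # everything except the define tag
                _expand(part, files, table)
                new_children.append(part)
        else:
            _expand(child, files, table)
            new_children.append(child)
    parent[:] = new_children

def inline_refs(main_xml, files, table):
    """Return a schema in which every reference has been resolved in place."""
    root = ET.fromstring(main_xml)
    _expand(root, files, table)
    return ET.tostring(root, encoding="unicode")
```

After the call, the schema contains no ref elements, so it can be read from top to bottom in the same way as the schema in the first embodiment.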

In this embodiment, XML is used as the structured document format and RELAX NG as the schema language, but other structured document formats and schema languages can also be used. For W3C XML Schema, for example, the same processing is possible by mapping the sequence, choice, unordered, and repetition structures to sequence, choice, all, and maxOccurs (among others), respectively, and by treating ref as the reference structure.

The embodiments perform validation when XML is parsed, but by configuring the apparatus to generate XML after validation, the technique can also be used for validation when XML is generated.

In this embodiment, when the amount of schema information is determined, validation is performed only after all of the elements inside a choice element have been read, but it is also possible to read the elements inside the choice one at a time and validate against each in turn. For example, when a choice element contains three element elements, each is read and checked in turn; the node is judged valid as soon as one of them matches, and a validity violation is reported only when none of the three matches. This saves even more memory.
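This variation can be sketched in Python with a generator that hands over the alternatives one at a time. The names matches_choice and read_alternatives are hypothetical, and the log list exists only to show how many schema nodes were actually read.

```python
def read_alternatives(names, log):
    """Yield the alternatives of a choice one by one, recording each read."""
    for name in names:
        log.append(name)   # records how many schema nodes were actually read
        yield name

def matches_choice(name, alternatives):
    """Check a node against a choice without holding all alternatives."""
    for candidate in alternatives:
        if candidate == name:
            return True    # valid as soon as one alternative matches
    return False           # no alternative matched: validity violation
```

Only one alternative is held at any moment. In a full implementation, the remaining alternatives would still have to be skipped in the schema stream after a match so that reading can resume after the end of the choice.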

(Other embodiments)
Although embodiments of the present invention have been described in detail above, the present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device.

The present invention is also achieved by supplying a program that implements the functions of the above-described embodiments to a system or apparatus, directly or remotely, and having a computer included in that system or apparatus read and execute the supplied program code.

Accordingly, the program code itself that is installed in a computer in order to implement the functions and processing of the present invention on that computer also implements the present invention. In other words, the computer program for implementing the above functions and processing is itself one aspect of the present invention.

In that case, the program may take any form, such as object code, a program executed by an interpreter, or script data supplied to an OS, as long as it has the functions of the program.

Recording media for supplying the program include, for example, flexible disks, hard disks, optical disks, magneto-optical disks, MO, CD-ROM, CD-R, and CD-RW. Other examples include magnetic tape, non-volatile memory cards, ROM, and DVD (DVD-ROM, DVD-R).

The program may also be downloaded from a website on the Internet using a browser on a client computer. That is, the computer program of the present invention itself, or a compressed file that includes an automatic installation function, may be downloaded from the website to a recording medium such as a hard disk. It is also possible to divide the program code constituting the program of the present invention into a plurality of files and download each file from a different website. In other words, a WWW server that allows a plurality of users to download the program files for implementing the functions and processing of the present invention on a computer may also be a constituent element of the present invention.

The program of the present invention may also be encrypted, stored on a storage medium such as a CD-ROM, and distributed to users. In this case, only users who satisfy a predetermined condition are allowed to download key information for decryption from a website via the Internet; the encrypted program may then be decrypted with that key information, executed, and installed on a computer.

The functions of the above-described embodiments may also be realized by a computer executing the program it has read. An OS or the like running on the computer may perform part or all of the actual processing based on the instructions of the program. In this case as well, the functions of the above-described embodiments can be realized.

Furthermore, the program read from the recording medium may be written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer. A CPU or the like provided on the function expansion board or in the function expansion unit may then perform part or all of the actual processing based on the instructions of the program. The functions of the above-described embodiments may also be realized in this way.

A block diagram showing the configuration of the structured document processing apparatus in the first embodiment of the present invention.
A flowchart showing the verification processing of the structured document processing apparatus in the first embodiment of the present invention.
A diagram showing an example of a schema.
A diagram showing an example of a document that is valid against the schema of FIG. 3.
A diagram showing an example of a document that is not valid against the schema of FIG. 3.
A block diagram showing the configuration of the structured document processing apparatus in the second embodiment of the present invention.
A diagram showing the relationship between the structured document processing apparatus and other devices in the second embodiment of the present invention.
A diagram showing an example of a schema that contains reference descriptions within the schema.
A diagram showing an example of a schema from which the reference descriptions have been omitted.
A diagram showing an example of the result of outputting only the referenced portions.
A diagram showing an example of a table that relates reference names to file names.
A diagram showing an example of the schema obtained by conversion so that sequential processing is possible.
A flowchart showing an overview of the processing of the structured document processing apparatus in the second embodiment of the present invention.
A flowchart showing details of the primary processing of schema conversion.
A flowchart showing details of the secondary processing of schema conversion.

Claims

A structured document processing apparatus comprising:
structured document sequential analysis means for sequentially analyzing, node by node, a structured document and a schema written as a structured document;
schema information analysis means for obtaining schema information, which is constraint information of the schema, from the node information of the schema obtained when the structured document sequential analysis means analyzes the schema;
holding means for holding the schema information obtained by the schema information analysis means;
schema information amount determination means for determining, from the structure constraint information of the schema information, whether the schema information necessary for verifying the current node of the structured document has been obtained;
validity verification means for verifying, using the schema information, the validity of a node obtained when the structured document sequential analysis means analyzes the structured document;
schema information necessity determination means for determining, from the structure constraint information of the schema information, whether the schema information held by the holding means is necessary for subsequent verification; and
discarding means for discarding schema information determined to be unnecessary by the schema information necessity determination means.

A structured document processing apparatus comprising:
structured document sequential generation means for receiving node information of a structured document and sequentially generating the structured document;
structured document sequential analysis means for sequentially analyzing, node by node, a schema written as a structured document;
schema information analysis means for obtaining schema information, which is constraint information of the schema, from the node information of the schema obtained when the structured document sequential analysis means analyzes the schema;
holding means for holding the schema information obtained by the schema information analysis means;
schema information amount determination means for determining, from the structure constraint information of the schema information, whether the schema information necessary for verifying the current node has been obtained;
validity verification means for verifying, using the schema information, the validity of the node information of the structured document;
schema information necessity determination means for determining, from the structure constraint information of the schema information, whether the schema information held by the holding means is necessary for subsequent verification; and
discarding means for discarding schema information determined to be unnecessary by the schema information necessity determination means.

The structured document processing apparatus according to claim 1, further comprising:
reference destination file creation means for outputting, to a separate file, any portion of the schema information obtained by the schema information analysis means that may be referred to from another location; and
output means for, when the schema information obtained by the schema information analysis means contains a description referring to another location, reading and outputting the contents of the file created by the reference destination file creation means, and for outputting portions that do not refer to another location as they are.

A structured document processing method comprising:
a structured document sequential analysis step of sequentially analyzing, node by node, a structured document and a schema written as a structured document;
a schema information analysis step of obtaining schema information, which is constraint information of the schema, from the node information of the schema obtained by analyzing the schema in the structured document sequential analysis step;
a storage step of storing the schema information obtained in the schema information analysis step in a memory;
a schema information amount determination step of determining, from the structure constraint information of the schema information, whether the schema information necessary for verifying the current node of the structured document has been obtained;
a validity verification step of verifying, using the schema information, the validity of a node obtained by analyzing the structured document in the structured document sequential analysis step;
a schema information necessity determination step of determining, from the structure constraint information of the schema information, whether the schema information stored in the memory is necessary for subsequent verification; and
a deletion step of deleting from the memory schema information determined to be unnecessary in the schema information necessity determination step.

A structured document processing method comprising:
a structured document sequential generation step of receiving node information of a structured document and sequentially generating the structured document;
a structured document sequential analysis step of sequentially analyzing, node by node, a schema written as a structured document;
a schema information analysis step of obtaining schema information, which is constraint information of the schema, from the node information of the schema obtained by analyzing the schema in the structured document sequential analysis step;
a storage step of storing the schema information obtained in the schema information analysis step in a memory;
a schema information amount determination step of determining, from the structure constraint information of the schema information, whether the schema information necessary for verifying the current node has been obtained;
a validity verification step of verifying, using the schema information, the validity of the node information of the structured document;
a schema information necessity determination step of determining, from the structure constraint information of the schema information, whether the schema information stored in the memory is necessary for subsequent verification; and
a deletion step of deleting from the memory schema information determined to be unnecessary in the schema information necessity determination step.

The structured document processing method according to claim 4 or 5, further comprising:
a reference destination file creation step of outputting, to a separate file, any portion of the schema information obtained in the schema information analysis step that may be referred to from another location; and
an output step of, when the schema information obtained in the schema information analysis step contains a description referring to another location, reading and outputting the contents of the file created in the reference destination file creation step, and of outputting portions that do not refer to another location as they are.

A program for causing a computer to execute the structured document processing method according to any one of claims 4 to 6.