JP2007122278A - Document processing device and method, and program

Info

Publication number: JP2007122278A
Application number: JP2005311802A
Authority: JP
Inventors: Takeya Soeda; 岳也添田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2005-10-26
Filing date: 2005-10-26
Publication date: 2007-05-17

Abstract

PROBLEM TO BE SOLVED: To solve the problem that an XML parser which is only capable of analyzing canonical XML cannot ensure equipment compatibility.

SOLUTION: A document processing device has a canonical DOM parser part 502 for analyzing canonical XML documents, a canonicalization part 504 for converting noncanonical XML documents into canonical XML documents and supplying them to the canonical DOM parser part 502, and a grammar violation information recognition part 503 for identifying whether or not input documents are at least canonical XML documents. XML documents identified as noncanonical XML documents are converted into canonical XML documents by the canonicalization part 504 and analyzed by the canonical DOM parser part 502.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a document processing apparatus and method, and a program, for processing the document structure of a structured document.

In recent years, XML (Extensible Markup Language; Non-patent Document 1) has come to be used to ensure data compatibility between devices. Because XML is text-based, it allows a high degree of freedom in character encoding and formatting. On the other hand, because it allows more freedom than conventional device-dependent binary data, an XML parser that analyzes the information also becomes complex. To keep the XML parser from becoming complex, the XML format must be restricted.

One way to restrict the XML format is to describe the data exchanged between devices in normalized XML (Canonical XML; Non-patent Document 2). Normalized XML is a specification that restricts the many formats permitted in XML to a single format while still expressing the same information as XML, and it was originally intended for uses such as digital signatures. If all data exchanged between networks could be limited to normalized XML, the analysis capability required of an XML parser would be greatly reduced, making the parser lighter and faster.
Non-patent Document 1: "Extensible Markup Language (XML) 1.0 (Third Edition)", W3C, 2004, http://www.w3.org/TR/REC-XML/
Non-patent Document 2: "Canonical XML Version 1.0", W3C, 2001, http://www.w3.org/TR/XML-c14n

In a local network, the devices to connect to are limited, so the above restriction can be imposed on all of them. However, when connection to devices on a wide-area network is assumed, it is impossible to impose the restriction on an unspecified number of devices, and an XML parser capable of analyzing only normalized XML cannot ensure interoperability between devices.
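To make the difference between general XML and its single canonical form concrete, the following is a minimal sketch using Python's standard library. It is only an illustration and not part of the described apparatus: `xml.etree.ElementTree.canonicalize` (Python 3.8 or later) implements the later C14N 2.0 specification rather than the Canonical XML 1.0 cited above, and the sample element `item` is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same element: single vs. double quotes, and an
# empty-element tag vs. an explicit start/end tag pair.
variant_a = "<item id='1'/>"
variant_b = '<item id="1"></item>'

# canonicalize() returns the canonical serialization as a string.
canonical_a = ET.canonicalize(variant_a)
canonical_b = ET.canonicalize(variant_b)

print(canonical_a)
print(canonical_a == canonical_b)
```

Both inputs serialize to the same text, which is why a parser restricted to the canonical form has far fewer cases to handle.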

An object of the present invention is to solve the above-described problems of the conventional art.

A feature of the present invention is to provide a technique for analyzing structured documents efficiently.

In order to achieve the above object, a document processing apparatus according to an aspect of the present invention has the following configuration. That is,
a document processing apparatus for analyzing an input structured document comprises:
normalized XML analysis means for analyzing a normalized XML document;
normalization means for converting a non-normalized XML document into a normalized XML document and supplying it to the normalized XML analysis means; and
identification means for identifying at least whether or not the input document is a normalized XML document,
wherein an XML document identified by the identification means as not being a normalized XML document is converted into a normalized XML document by the normalization means and analyzed by the normalized XML analysis means.

In order to achieve the above object, a document processing method according to an aspect of the present invention includes the following steps. That is,
a document processing method for inputting and analyzing a structured document comprises:
a normalized XML analysis step of analyzing a normalized XML document;
a normalization step of converting a non-normalized XML document into a normalized XML document and supplying it to the normalized XML analysis step; and
an identification step of identifying at least whether or not the input document is a normalized XML document,
wherein an XML document identified in the identification step as not being a normalized XML document is converted into a normalized XML document in the normalization step and analyzed in the normalized XML analysis step.

According to the present invention, a minimal document analysis function and a normalization function that converts general-purpose descriptions into a minimal set of grammar rules can be used separately. As a result, normalized XML documents can be analyzed with minimal resources and time, while non-normalized XML documents can be analyzed efficiently by adding only the resources they require.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims, and not all combinations of the features described in the embodiments are necessarily essential to the solution provided by the invention.

<Embodiment 1>
Embodiment 1 of the present invention describes an example in which a normalized XML document, a non-normalized XML document, and a non-XML document are analyzed using a DOM parser.

FIG. 1 is a block diagram illustrating a schematic configuration of the information processing apparatus according to the present embodiment.

In FIG. 1, reference numeral 101 denotes a central control unit (hereinafter, CPU) that controls the entire information processing apparatus. The ROM 102 stores programs, parameters, and various data that do not need to be changed. The RAM 103 temporarily stores programs and data supplied from an external device or the like. The storage unit 104 is an external storage device, which may include a hard disk or memory card fixedly installed in the device, or removable media such as a flexible disk (FD), an optical disk such as a Compact Disk (CD), a magnetic or optical card, an IC card, or a memory card. An OS and various application programs are installed in the storage unit 104; at execution time these programs are loaded into the RAM 103 and executed under the control of the CPU 101. The operation unit 105 includes input devices that accept user operations and input data, such as a pointing device, a keyboard, hardware keys, and a touch panel, together with their interfaces. The display unit 106 includes a monitor for displaying data held by or supplied to the apparatus, together with its interface.

A document to be analyzed is stored in the storage unit 104 in advance. The procedure for storing this document is outside the scope of the present invention and is not described here. The stored document is read out and processed, and after processing it is stored in the storage unit 104 again as necessary.

The program executed by the CPU 101 includes an XML parser that analyzes normalized XML documents, and a normalization unit that converts an input XML document into a normalized XML document in accordance with the 14 normalization conversion rules defined in Non-patent Document 2. Characteristic examples of these rules are as follows.
(1) The line break character is "#xA" (the character with hexadecimal code A, which is 10 in decimal; the same notation is used below).
(2) The XML declaration, DTD declaration, and comments are deleted.
(3) Each empty-element tag is replaced with a pair of start and end tags.
(4) The delimiter enclosing attribute values is unified to the double quote (").
To realize (1), the normalization unit reads the input XML document in order from the beginning and looks for the symbols used as line break characters (#xA, #xD). In general-purpose XML documents, three patterns are used as a single line break: "#xA" alone, "#xD" alone, and the sequence "#xD#xA". The normalization unit checks in turn which pattern each input line break matches, replaces every pattern with "#xA", and outputs the result as the normalization result.
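The line break handling of rule (1) can be sketched as follows. This is a minimal illustration, not the normalization unit itself; the function name `normalize_line_breaks` is an assumption made for the example.

```python
def normalize_line_breaks(text: str) -> str:
    """Replace each line-break pattern (#xD#xA, lone #xD, lone #xA) with #xA."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r":
            # "#xD#xA" counts as ONE line break, so skip the following #xA.
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
            out.append("\n")
        else:
            # "#xA" and ordinary characters pass through unchanged.
            out.append(ch)
        i += 1
    return "".join(out)


print(repr(normalize_line_breaks("a\r\nb\rc\nd")))
```

Checking for "#xD#xA" before treating "#xD" on its own keeps the two-character sequence from being turned into two line breaks.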

To realize (2), the normalization unit locates the parts of the input XML document that correspond to the XML declaration, the DTD declaration, and comments, and does not output any of them in the normalization result.
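A minimal sketch of rule (2) is shown below. It is an illustration under simplifying assumptions, not the actual normalization unit: the function name is hypothetical, CDATA sections and attribute values that happen to contain "<!--" are not handled, and a DTD declaration is assumed to contain no ">" inside quoted strings outside its internal subset.

```python
def strip_declarations_and_comments(text: str) -> str:
    """Drop the XML declaration, DTD declaration, and comments; copy the rest."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if text.startswith("<!--", i):
            end = text.find("-->", i + 4)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 3
        elif (text.startswith("<?xml", i) and i + 5 < n
              and text[i + 5] in " \t\r\n"):
            # Whitespace after "<?xml" distinguishes the XML declaration
            # from processing instructions such as "<?xml-stylesheet ...?>".
            end = text.find("?>", i)
            if end == -1:
                raise ValueError("unterminated XML declaration")
            i = end + 2
        elif text.startswith("<!DOCTYPE", i):
            depth = 0  # nesting level of the "[ ... ]" internal subset
            j = i
            while j < n:
                c = text[j]
                if c == "[":
                    depth += 1
                elif c == "]":
                    depth -= 1
                elif c == ">" and depth == 0:
                    break
                j += 1
            if j == n:
                raise ValueError("unterminated DTD declaration")
            i = j + 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


sample = '<?xml version="1.0"?><!DOCTYPE a [<!ELEMENT a ANY>]><a><!-- c -->x</a>'
print(strip_declarations_and_comments(sample))
```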

To realize (3), the normalization unit searches for the symbol "<", which marks the beginning of a start tag, end tag, or empty-element tag. These tags take the forms "<element-name attribute attribute ...>", "</element-name>", and "<element-name attribute attribute .../>", respectively, so the normalization unit switches its normalization method as it recognizes each form. It first checks whether the next character is "/"; if so, the tag is an end tag and is output unchanged as an ordinary end tag. If instead the next character is one that can be used in an element name, the tag is either a start tag or an empty-element tag, and the normalization unit outputs the element name while keeping a copy of it. Any attributes that follow are output as they are, without being copied. If ">" finally appears, the tag is recognized as a start tag; the copied name is discarded and ">" is output to complete an ordinary start tag.

If "/>" appears instead, the tag is recognized as an empty-element tag. In that case the normalization unit outputs ">" to close the start tag and immediately afterwards outputs an end tag bearing the copied element name. This adds to the normalized XML document the end tag that closes the start tag just output. Following this process normalizes every empty-element tag into a pair of start and end tags.
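The tag handling described for rule (3) can be sketched as a small scanner. This is an illustrative sketch only: the function name is an assumption, and it presumes that comments and declarations have already been removed, so that every "<" not followed by "/", "!" or "?" opens a start tag or an empty-element tag.

```python
def expand_empty_element_tags(text: str) -> str:
    """Rewrite every <name .../> as <name ...></name>."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        # End tags ("</"), declarations ("<!") and PIs ("<?") pass through.
        if ch != "<" or i + 1 >= n or text[i + 1] in "/!?":
            out.append(ch)
            i += 1
            continue
        # Start or empty-element tag: output the name and keep a copy of it.
        j = i + 1
        while j < n and text[j] not in " \t\r\n/>":
            j += 1
        name = text[i + 1:j]
        out.append("<" + name)
        quote = None  # delimiter of the attribute value being read, if any
        while j < n:
            c = text[j]
            if quote:
                out.append(c)
                if c == quote:
                    quote = None
            elif c in "\"'":
                quote = c
                out.append(c)
            elif c == ">":
                out.append(">")  # ordinary start tag; the copy is not needed
                j += 1
                break
            elif text.startswith("/>", j):
                out.append("></" + name + ">")  # close, then add the end tag
                j += 2
                break
            else:
                out.append(c)
            j += 1
        i = j
    return "".join(out)


print(expand_empty_element_tags('<a><b x="1/>"/><c/></a>'))
```

Tracking the current attribute value delimiter keeps a "/>" that occurs inside an attribute value from being mistaken for the end of the tag.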

To realize (4), the normalization unit checks the delimiter enclosing each attribute value in the input XML document. An attribute is written either as attribute-name="attribute-value" or as attribute-name='attribute-value'. If the normalization unit finds the double quote (") after "=", the attribute is already in normalized form, so it is output to its end without modification.

If, on the other hand, the normalization unit finds the single quote (') after "=", it outputs the double quote (") in its place and then outputs the subsequent input characters as they are, checking that each one is neither a double quote nor a single quote. If a double quote is input, the normalization unit outputs &quot;, the entity reference that stands for the double quote, so that a bare double quote is never output inside the attribute value.

If a single quote (') is input, the normalization unit recognizes that the attribute value has ended and outputs a double quote ("). It then moves on to look for the next attribute or for the symbol that closes the tag.
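The delimiter handling of rule (4) can be sketched as follows. This is a simplified illustration: the function name is an assumption, and the sketch presumes that comments and declarations have already been removed and that the input contains no CDATA sections, so that quotes found between "<" and ">" always delimit attribute values.

```python
def normalize_attribute_quotes(text: str) -> str:
    """Rewrite name='value' as name="value", escaping any inner double quote."""
    out = []
    in_tag = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if not in_tag:
            in_tag = ch == "<"
            out.append(ch)
            i += 1
        elif ch == ">":
            in_tag = False
            out.append(ch)
            i += 1
        elif ch == '"':
            # Already in normalized form: copy the value through unchanged.
            end = text.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated attribute value")
            out.append(text[i:end + 1])
            i = end + 1
        elif ch == "'":
            # Single-quoted value: switch delimiters and escape inner quotes.
            end = text.find("'", i + 1)
            if end == -1:
                raise ValueError("unterminated attribute value")
            value = text[i + 1:end]
            out.append('"' + value.replace('"', "&quot;") + '"')
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(normalize_attribute_quotes("<a x='say \"hi\"' y=\"it's\">don't</a>"))
```

Apostrophes in character data, such as the one in "don't" above, are left alone because they occur outside a tag.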

By combining the above processes, the normalization unit can convert an XML document into a normalized XML document.

FIGS. 2 to 4 respectively show examples of a normalized XML document, a non-normalized XML document, and a non-XML document to be analyzed in the present embodiment.

The document in FIG. 2 is written in accordance with the normalized XML specification. FIG. 3 shows an example that uses, as the attribute value delimiter, a symbol (') that is permitted in XML but not in normalized XML. The document in FIG. 4 has no delimiter around an attribute value and therefore falls outside the category of XML documents altogether. In these figures, line numbers, line breaks, and whitespace have been added where appropriate for explanation.

FIG. 5 is a block diagram showing the functional configuration of the DOM parser 501 according to Embodiment 1. The specific hardware configuration of the DOM parser 501 is the same as that shown in FIG. 1.

The DOM parser 501 has a normalized DOM parser unit 502 that is capable of analyzing normalized XML documents. The normalized DOM parser unit 502 analyzes normalized XML documents and distinguishes normalized XML documents from non-normalized XML documents. A document to be analyzed is acquired from the storage unit 508, and the analysis result is passed to the application 509 via the output interface (I/F) unit 507. The storage unit 508 stores a plurality of documents to be analyzed. A stored document is sent to the normalized DOM parser unit 502 via the input I/F unit 506.

When the normalized DOM parser unit 502 finds a grammar violation in the document being analyzed, it generates grammar violation information and passes it to the grammar violation information confirmation unit 503. The grammar violation information confirmation unit 503 then uses the grammar violation information confirmation table stored in the database 505 to determine whether the received grammar violation information is of a kind generated when a non-normalized XML document is analyzed. The normalization unit 504 normalizes a document whose analysis has failed and sends it to the normalized DOM parser unit 502.

FIG. 6 is a flowchart explaining the processing procedure in the DOM parser 501 according to Embodiment 1 of the present invention. The procedure for analyzing the documents to be analyzed (FIGS. 2 to 4) is described below. The program that executes the processing shown in this flowchart is stored in the RAM 103 at execution time and executed under the control of the CPU 101.
(A) The case where the normalized XML document shown in FIG. 2 is acquired from the storage unit 508 will be described.

The DOM parser 501 acquires the normalized XML document shown in FIG. 2 via the input I/F unit 506. First, in step S601, the normalized DOM parser unit 502 analyzes the document. In this case, the normalized DOM parser unit 502 completes the analysis without detecting any grammar violation and builds a DOM tree internally. Because no grammar violation was found, step S602 judges the XML document being analyzed to be a normalized XML document. In step S603, the DOM tree built inside the normalized DOM parser unit 502 is taken as the final analysis result and passed to the application 509 via the output I/F unit 507. The analysis of the normalized XML document then ends normally in step S604.
(B) The case where the non-normalized XML document shown in FIG. 3 is acquired from the storage unit 508 will be described.

In this case, during the analysis by the normalized DOM parser unit 502 in step S601, grammar violation information is generated for the attribute value delimiter written on the second line of FIG. 3. The analysis therefore does not succeed in step S602, the document being analyzed is judged to be either a non-normalized XML document or a non-XML document, and the process proceeds to step S605. At this point, the partially built DOM tree inside the normalized DOM parser unit 502 is discarded, and the generated grammar violation information is passed to the grammar violation information confirmation unit 503. The grammar violation information confirmation unit 503 then uses the grammar violation information confirmation table 505 shown in FIG. 7 to determine whether the received grammar violation information is of a kind generated when a non-normalized XML document is analyzed.

FIG. 7 shows a specific example of the grammar violation information correspondence table 505 according to Embodiments 1 and 2 of the present invention.

The table lists each "error content" that analysis may report as an error, together with its cause: either "an error that occurred because the document is a non-XML document" or "an error that occurred because the document is a non-normalized document". FIG. 7 shows only one specific example; other error items and error causes may be included.
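A table of this kind can be sketched as a simple lookup. The contents of FIG. 7 are not reproduced in this text, so everything concrete here is an assumption for illustration: the error identifiers are hypothetical, the two entries mirror the violations discussed for FIG. 3 and FIG. 4, and treating unknown errors as non-XML is a design choice of this sketch only.

```python
from enum import Enum


class Cause(Enum):
    NON_XML = "error that occurred because the document is a non-XML document"
    NON_NORMALIZED = "error that occurred because the document is non-normalized"


# Hypothetical error identifiers mapped to their cause.
VIOLATION_TABLE = {
    # "=" followed by a single quote: valid XML, but not normalized XML.
    "ATTR_VALUE_SINGLE_QUOTED": Cause.NON_NORMALIZED,
    # "=" followed by neither kind of quote: not XML at all.
    "ATTR_VALUE_UNQUOTED": Cause.NON_XML,
}


def may_be_non_normalized(error_id: str) -> bool:
    """True if the violation can arise from a non-normalized XML document."""
    # Assumption of this sketch: unknown violations are treated as non-XML.
    return VIOLATION_TABLE.get(error_id, Cause.NON_XML) is Cause.NON_NORMALIZED


print(may_be_non_normalized("ATTR_VALUE_SINGLE_QUOTED"))
```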

For the document in FIG. 3, the grammar violation information received states that, during attribute analysis, the single quote (') appeared immediately after the symbol "=". This corresponds to the error content indicated by 700 in FIG. 7, so the document being analyzed is judged to be possibly a non-normalized XML document. The process therefore proceeds from step S606 to step S608, where the normalization unit 504 normalizes the document whose analysis failed. If no error occurs in step S609, the process proceeds to step S610, where the normalized XML document is supplied to the normalized DOM parser unit 502 again and analyzed. If no grammar violation is detected in step S611, the process proceeds to step S612, where the document is judged to be a non-normalized XML document and the DOM tree produced by the second analysis is passed to the application 509. Processing of the non-normalized XML document then ends normally in step S613.

On the other hand, if a grammar violation is detected in step S609 or step S611, the document is judged to be a non-XML document and the process proceeds to step S614, where the application 509 is notified to that effect and processing of the non-XML document (abnormal processing) ends.
(C) The case where the non-XML document shown in FIG. 4 is analyzed will be described.

When the document shown in FIG. 4 is analyzed, the normalized DOM parser unit 502 generates grammar violation information stating that, during attribute analysis, the symbol "=" was followed by a symbol other than the double quote (") or the single quote ('). Step S606 therefore judges this to be error content that is generated only for non-XML documents, and the process proceeds to step S607, where the application 509 is notified to that effect and processing of the non-XML document (abnormal processing) ends.

As described above, Embodiment 1 provides a DOM parser 501 that can analyze both normalized and non-normalized XML documents by combining, as appropriate, the normalized DOM parser unit 502, which supports only normalized XML documents, with the normalization unit 504. With this DOM parser, an input document can be analyzed at an optimal cost.
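The control flow of FIG. 6, as described in cases (A) to (C), can be sketched as below. This is a hedged outline rather than the actual implementation: `GrammarViolation`, `analyze`, and the three injected callables (`strict_parse`, `normalize`, `may_be_non_normalized`) are hypothetical names, and a `ValueError` from `normalize` stands in for a normalization error.

```python
class GrammarViolation(Exception):
    """Raised by the strict parser; error_id is a hypothetical identifier."""

    def __init__(self, error_id):
        super().__init__(error_id)
        self.error_id = error_id


def analyze(document, strict_parse, normalize, may_be_non_normalized):
    """Return (kind, tree); kind is 'normalized', 'non-normalized' or 'non-xml'."""
    try:
        return "normalized", strict_parse(document)        # S601 to S604
    except GrammarViolation as violation:                  # S602 -> S605
        if not may_be_non_normalized(violation.error_id):  # S606 -> S607
            return "non-xml", None
    try:
        normalized = normalize(document)                   # S608
    except ValueError:                                     # S609 -> S614
        return "non-xml", None
    try:
        tree = strict_parse(normalized)                    # S610
    except GrammarViolation:                               # S611 -> S614
        return "non-xml", None
    return "non-normalized", tree                          # S612, S613
```

A document that is already normalized never reaches `normalize`, so the extra cost is paid only for documents that need it.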

<Embodiment 2>
Next, Embodiment 2 of the present invention will be described. The hardware configuration of the parser according to Embodiment 2 is the same as that of Embodiment 1 shown in FIG. 1, so its description is omitted.

Embodiment 2 describes an example in which a normalized XML document, a non-normalized XML document, and a non-XML document are analyzed using a SAX parser unit. Embodiment 2 differs from Embodiment 1 in that the normalized DOM parser unit 502 is replaced with a normalized SAX parser unit 802.

FIG. 8 is a block diagram showing the functional configuration of the SAX parser 801 according to Embodiment 2 of the present invention. Parts in common with the configuration of Embodiment 1 are denoted by the same reference symbols, and their description is omitted. The specific hardware configuration of the SAX parser 801 is the same as that shown in FIG. 1.

As in Embodiment 1, the SAX parser 801 acquires the documents to be analyzed shown in FIGS. 2 to 4 from the storage unit 508. Unlike the DOM parser 501 of Embodiment 1, the SAX parser 801 notifies the application 509 of analysis results successively as the analysis proceeds.

FIG. 9 is a flowchart explaining the processing procedure in the SAX parser 801 according to Embodiment 2 of the present invention. The procedure for analyzing the documents to be analyzed (FIGS. 2 to 4) is described below. The program that executes the processing shown in this flowchart is stored in the RAM 103 at execution time and executed under the control of the CPU 101.
(A) The case where the normalized XML document shown in FIG. 2 is analyzed will be described.

In this case, the normalized SAX parser unit 802 detects no grammar violation, so, as in Embodiment 1, the analysis is carried out in steps S901 to S904; once the whole document has been analyzed, the process proceeds from step S904 to step S905 and the analysis of the normalized XML document ends normally.
(B) The case where the non-normalized XML document shown in FIG. 3 is analyzed will be described.

As in Embodiment 1, when a grammar violation is detected in step S902, grammar violation information is generated and the process proceeds to step S906. In step S906, unlike in Embodiment 1, the application 509 is notified regarding the analysis results obtained so far. This allows the application 509 to be given information invalidating those results before processing moves on to the grammar violation information confirmation unit 503 (S906 to S907). The subsequent steps S907 to S915 are the same as steps S605 to S614 of Embodiment 1. In Embodiment 2, however, after the normalization unit 504 normalizes the XML document in step S910, the error determination corresponding to step S609 is omitted, and the normalized SAX parser unit 802 analyzes the normalized XML document in step S911. If the analysis succeeds and finishes in step S914, the analysis of the normalized XML document ends normally in step S915. If an analysis error occurs here, processing ends abnormally in step S909.
(C) The case where the non-XML document shown in FIG. 4 is analyzed will be described.

In this case, an error occurs in step S902 and the process proceeds to step S906, where, unlike in Embodiment 1, the application 509 is notified regarding the analysis results obtained so far. This allows the application 509 to be given information invalidating those results before processing moves on to the grammar violation information confirmation unit 503 (S906 to S907). In step S908, the error is judged not to be one that occurs for non-normalized documents, so the process proceeds to step S909 and ends abnormally as a non-XML document.

As described above, according to Embodiment 2, the SAX parser 801 can likewise perform analysis at an optimal cost by combining the normalized SAX parser unit 802, which supports only normalized XML documents, with the normalization unit 504.
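The streaming variant of FIG. 9 can be sketched in the same spirit. All names below are hypothetical (`analyze_streaming`, `strict_events`, and the `on_event` / `on_invalidate` callbacks on `app`), and delivering the invalidation as a separate callback is an assumption about the form of the notification in step S906; the text does not say whether a second invalidation is sent if the re-analysis also fails, so the sketch sends none.

```python
class GrammarViolation(Exception):
    """Raised by the strict event parser; error_id is a hypothetical identifier."""

    def __init__(self, error_id):
        super().__init__(error_id)
        self.error_id = error_id


def analyze_streaming(document, strict_events, normalize,
                      may_be_non_normalized, app):
    """Forward parse events to app as they occur; return the document kind."""
    try:
        for event in strict_events(document):              # S901 to S904
            app.on_event(event)
        return "normalized"                                # S905
    except GrammarViolation as violation:                  # S902 -> S906
        app.on_invalidate()  # events delivered so far are no longer valid
        if not may_be_non_normalized(violation.error_id):  # S908 -> S909
            return "non-xml"
    try:
        for event in strict_events(normalize(document)):   # S910 to S914
            app.on_event(event)
    except GrammarViolation:
        return "non-xml"                                   # S909
    return "non-normalized"                                # S915
```

Because events reach the application before the whole document has been checked, the invalidation step is what lets the application discard partial results before the second pass starts.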

<Embodiment 3>
Next, Embodiment 3 of the present invention will be described. Embodiment 3 describes an example in which a normalized XML document, a non-normalized XML document, and a non-XML document are analyzed using the parser according to Embodiment 3. In Embodiment 3, unlike Embodiments 1 and 2, the normalization unit 504 analyzes the document to be analyzed (FIGS. 2 to 4) first. The normalized parser unit 1002 may be either the normalized DOM parser unit 502 or the normalized SAX parser unit 802, or any other parser capable of analyzing normalized XML documents.

FIG. 10 is a block diagram explaining the configuration of the parser 1001 according to Embodiment 3 of the present invention. Parts in common with the configuration of the embodiment described above (FIG. 5) are denoted by the same reference symbols. The specific hardware configuration of the parser 1001 is the same as that shown in FIG. 1.

In FIG. 10, the input document is first supplied to the normalization unit 504 and normalized, and is then sent to the normalized parser unit 1002 in the subsequent stage.

FIG. 11 is a flowchart illustrating the processing procedure in the parser 1001 according to the third embodiment of the present invention. The procedure for analyzing the analysis target documents (FIGS. 2 to 4) is described below. The program that executes the processing shown in this flowchart is stored in the RAM 103 at execution time and is executed under the control of the CPU 101.

First, in step S1101, the analysis target document acquired from the storage unit 508 is normalized by the normalization unit 504.
(A) The case of analyzing the normalized XML document shown in FIG. 2 is described first.

In this case, no grammar violation is detected during normalization, so no error occurs in step S1102. The normalization unit 504 determines that the analysis target document is a normalized XML document, and the process proceeds to step S1103. In step S1103, the normalized XML document is analyzed by the normalized parser unit 1002. If the analysis succeeds in step S1104, the process advances to step S1105, and the result of the analysis is reported to the application 509.

If the application 509 needs to know whether the document was normalized XML, the determination can be reported in step S1106 using the information held by the normalization unit 504 on whether any characters were replaced during normalization. That is, if characters were replaced, the process proceeds from step S1106 to step S1107 and ends normally with the document classified as a non-normalized XML document. If no characters were replaced, the process proceeds from step S1106 to step S1108 and ends normally with the document classified as a normalized XML document.
(B) The case of analyzing the non-normalized XML document shown in FIG. 3 is described next.

In this case as well, the normalization unit 504 performs normalization in the same way as for FIG. 2. In steps S1102 to S1106, the normalization result is processed in the same manner as for the normalized XML document shown in FIG. 2. The analysis result can thus be reported to the application 509.
(C) The case of analyzing the non-XML document shown in FIG. 4 is described next.

If the normalization unit 504 or the normalized parser unit 1002 detects a grammar violation (S1102, S1104), the process proceeds to step S1109, the application 509 is notified of the violation, and processing ends abnormally with the document classified as a non-XML document.

As described above, according to the third embodiment, analysis using the normalized parser unit 1002 can be performed efficiently even in an environment where many non-normalized XML documents are analyzed.
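The normalize-first flow of FIG. 11 can be sketched as follows. This is only an illustration under stated assumptions: the standard library's `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8 or later) stands in for the normalization unit 504, `ET.fromstring` stands in for the normalized parser unit 1002, and "characters were replaced" is approximated by comparing the text before and after normalization. The function names are hypothetical, and the normalized form defined in this document is not necessarily C14N.

```python
import xml.etree.ElementTree as ET


def normalize(text: str) -> str:
    """Stand-in normalizer (assumption: C14N 2.0 from the standard library)."""
    return ET.canonicalize(text)  # raises ET.ParseError on a grammar violation


def process(text: str):
    """Classify and parse a document, following steps S1101 to S1109."""
    try:
        normalized = normalize(text)       # S1101 / S1102
        root = ET.fromstring(normalized)   # S1103 / S1104
    except ET.ParseError:
        return "non-xml", None             # S1109: abnormal end
    # S1106: did normalization change anything?
    kind = "normalized" if normalized == text else "non-normalized"
    return kind, root                      # S1107 / S1108: normal end
```

For example, `process("<a><b></a>")` ends in the non-XML branch, while a document with an empty-element tag such as `<b/>` is classified as non-normalized under this stand-in, because C14N rewrites it as a start and end tag pair.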

<Embodiment 4>
The fourth and fifth embodiments show examples of nodes that include the parser according to the present embodiments and exchange XML documents over a network. The fourth embodiment describes the case in which the normalization function is provided on the node side.

FIG. 12 is a diagram illustrating a network to which nodes having the parser according to the fourth embodiment of the present invention are connected.

The normalized XML compatible nodes 1213 to 1215 and 1223 to 1224 each include the parser according to the fourth embodiment. When sending an XML document to another node, these nodes always output a normalized XML document. In contrast, the general-purpose XML compatible nodes 1233 to 1234 have a general parser and output general-purpose XML documents to other nodes. The parser of the general-purpose XML compatible nodes 1233 to 1234 is able to analyze XML documents that are not normalized.

These nodes are connected to the LANs 1211, 1221, and 1231, which are in turn connected to a common WAN 1201 through the gateways 1212, 1222, and 1232, respectively. Every node has a unique identifier on the networks 1201, 1211, 1221, and 1231. Through the LANs 1211, 1221, and 1231, the gateways 1212, 1222, and 1232, and the WAN 1201, every node has a communication path for exchanging data with every other node.

FIG. 13 is a block diagram showing the configuration of the normalized XML compatible node 1213 according to the fourth embodiment. The hardware configuration of this node is the same as in the first embodiment, so its description is omitted.

The normalized XML compatible node 1213 has a parser unit 1301, which contains an input I/F unit 1302 for receiving XML documents via the LAN and a normalization determination unit 1303 that determines whether an input XML document is a normalized XML document. The other components are the same as in the preceding embodiments. The other normalized XML compatible nodes 1214 to 1215 and 1223 to 1224 have the same configuration.

The normalization determination unit 1303 acquires in advance a correspondence table between the normalized XML compatible nodes on the network and the network addresses of those nodes. Information that designates multiple nodes collectively, such as a domain, a subnet address, or a gateway address, can also be used as the network address. In the fourth embodiment, subnet addresses are used to designate normalized XML compatible nodes on a per-LAN basis.

If an XML document comes from a normalized XML compatible node, the normalization determination unit 1303 sends it to the normalized parser unit 1002, and the result is output to the application 509 via the output interface unit 507. If the XML document does not come from a normalized XML compatible node, the received document is sent to the normalization unit 504 and normalized, the result is sent to the normalized parser unit 1002, and the analysis result is output to the application 509 via the output interface unit 507.

In FIG. 12, the LAN 1211 and the LAN 1221 consist of normalized XML compatible nodes, so the normalized XML compatible nodes can be designated by the subnet address corresponding to each LAN.
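The per-LAN designation by subnet address can be sketched with Python's standard `ipaddress` module. The subnet values below are hypothetical documentation-range addresses chosen for illustration; the document does not give actual addresses for the LANs 1211, 1221, and 1231, and the function name `is_normalized_sender` is likewise an assumption.

```python
import ipaddress

# Hypothetical correspondence table: subnets standing in for
# LAN 1211 and LAN 1221, whose nodes always send normalized XML.
NORMALIZED_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),     # assumed address of LAN 1211
    ipaddress.ip_network("198.51.100.0/24"),  # assumed address of LAN 1221
]


def is_normalized_sender(address: str) -> bool:
    """Return True if the sender lies in a subnet listed as normalized."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in NORMALIZED_SUBNETS)
```

A sender in an unlisted subnet, such as one standing in for LAN 1231, is treated as a general-purpose XML compatible node, so one table entry per LAN covers every node on that LAN.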

FIG. 14 is a flowchart illustrating the processing procedure in the parser unit 1301 according to the fourth embodiment. The procedure for analyzing an XML document received from another node is described below. The program that executes the processing shown in this flowchart is stored in the RAM 103 at execution time and is executed under the control of the CPU 101.

When the normalized XML compatible node 1213 receives an XML document via the network, it first uses the normalization determination unit 1303 in step S1401 to check the address of the node that sent the document. In step S1402, if the source address belongs to a subnet corresponding to normalized XML compatible nodes, the received document is judged to be a normalized XML document and the process advances to step S1406. In step S1406, the XML document is passed to the normalized parser unit 1002 and analyzed, and step S1407 determines whether the analysis succeeded. If it succeeded, the process advances to step S1408, where the result of the analysis by the normalized parser unit 1002 is reported to the application, and processing ends normally in step S1409. If step S1407 determines that the analysis did not succeed, the process proceeds to step S1405 and ends abnormally.

In the fourth embodiment, the nodes 1214 to 1215 and 1223 to 1224 connected to the LAN 1211 and the LAN 1221 are judged to be normalized XML compatible nodes from their subnet address information. XML documents received from these nodes are therefore all treated as normalized XML documents and are analyzed by the normalized parser unit 1002 without normalization.

If, on the other hand, the source address does not belong to a subnet of normalized XML compatible nodes, step S1402 determines that the XML document was sent from a general-purpose XML compatible node. In this case the process proceeds to step S1403, where the document is sent to the normalization unit 504 and normalized. If no error occurs, the process proceeds to step S1406, and the normalized parser unit 1002 analyzes the document and the result is reported to the application (S1406 to S1408). If an error occurs, processing ends abnormally in step S1405.
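The branch structure of FIG. 14 can be sketched as below. As before, this is an assumption-laden illustration: `ET.canonicalize` stands in for the normalization unit 504, `ET.fromstring` stands in for the normalized parser unit 1002 (it is in fact a general parser), and the `sender_is_normalized` flag represents the outcome of the address check in steps S1401 and S1402. The function name `handle_document` is hypothetical.

```python
import xml.etree.ElementTree as ET


def handle_document(text: str, sender_is_normalized: bool):
    """Return ("ok", root) or ("error", None), following S1401 to S1409."""
    try:
        if not sender_is_normalized:        # S1402: general-purpose sender
            text = ET.canonicalize(text)    # S1403: normalize first
        root = ET.fromstring(text)          # S1406 / S1407: analyze
    except ET.ParseError:
        return "error", None                # S1405: abnormal end
    return "ok", root                       # S1408 / S1409: report, normal end
```

The normalization cost is paid only on the general-purpose branch; documents from trusted subnets go straight to the parser.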

As described above, the fourth embodiment provides a parser that normalizes only when necessary and, for normalized XML documents, uses only the fast and lightweight normalized parser unit. With this parser, analysis can be performed at an optimal cost for each piece of received data.

Even when the normalization determination unit 1303 has no information associating network addresses with normalized and non-normalized networks, any of the parsers described in the first to third embodiments can be used as the parser unit 1301 in a normalized XML compatible node. XML documents can then be exchanged efficiently with both normalized and general-purpose XML compatible nodes.

<Embodiment 5>
The fifth embodiment shows an example in which the normalization function is provided on the gateway side, so that the gateway and a node together form one general-purpose parser. Unlike the fourth embodiment, the normalized XML compatible nodes do not include the normalization determination unit 1303 or the normalization unit 504. Instead, the gateways 1212 and 1222, which manage the LANs to which the normalized XML compatible nodes belong, include the normalization determination unit 1303 and the normalization unit 504.

FIG. 15 is a block diagram showing the configuration of the gateway 1501 according to the fifth embodiment of the present invention. Parts common to FIG. 13 are denoted by the same reference symbols, and their description is omitted.

The WAN interface unit 1502 connects the WAN 1201 to the gateway 1501. The normalization determination unit 1303 determines whether a document input from the WAN interface unit 1502 is a normalized XML document. If it is, the document is output to the LAN 1211 via the LAN interface unit 1503. If it is not, the received XML document is sent to the normalization unit 504 and normalized, and the result is output to the LAN 1211 via the LAN interface unit 1503.

FIG. 16 is a block diagram showing the configuration of the node 1601 according to the fifth embodiment. Parts common to FIGS. 5 and 13 are denoted by the same reference symbols, and their description is omitted.

The gateway 1501 shown in FIG. 15 contains the normalization determination unit 1303 and distinguishes normalized from non-normalized XML documents by the same method as in the fourth embodiment. The normalization unit 504 also performs the same processing as the normalization unit of the fourth embodiment.

The normalized XML compatible node 1601 shown in FIG. 16 contains the normalized parser unit 1002, which also performs the same processing as in the fourth embodiment.

FIG. 17 is a flowchart showing the processing procedure of the gateway 1501 according to the fifth embodiment of the present invention. The program that executes the processing shown in this flowchart is stored in the RAM 103 at execution time and is executed under the control of the CPU 101.

When XML documents are exchanged between nodes within the LAN 1211, they do not pass through the gateway 1212, so normalized XML documents are exchanged directly between the nodes. This procedure is not illustrated, but when, for example, a message is sent from the normalized XML compatible node 1214 to the normalized XML compatible node 1213, the node 1214 outputs only normalized XML documents. The normalized parser unit 1002 of the normalized XML compatible node 1213 therefore analyzes the document as is and passes the content of the message to the application 509 (FIG. 16).

When, for example, the normalized XML compatible node 1214 receives a message from the normalized XML compatible node 1223 on another LAN 1221, the message is forwarded through the gateways 1222 and 1212 in that order. The gateway 1222 forwards messages travelling from the LAN 1221 toward the WAN 1201 without modifying them. The gateway 1212, in contrast, performs the same normalization determination as in the fourth embodiment on messages travelling from the WAN 1201 to the LAN 1211. Because this message was sent from the normalized XML compatible node 1223, it is judged to be a normalized XML document (S1702) and is forwarded to the LAN 1211 unchanged (S1706). The normalized XML compatible node 1214 analyzes the received message as a normalized XML document without further processing and reports its content to the application 509.

Similarly, when the normalized XML compatible node 1214 receives a message from the general-purpose XML compatible node 1233 on another LAN 1231, the message is forwarded through the gateways 1232 and 1212 in that order. The gateway 1232 is an ordinary gateway, so it forwards the message from the LAN 1231 toward the WAN 1201 without modifying it. The gateway 1212 performs the same normalization determination as in the fourth embodiment on messages travelling from the WAN 1201 to the LAN 1211. Because this message was sent from the general-purpose XML compatible node 1233, step S1702 judges it to be a non-normalized XML document. The process therefore proceeds to step S1703, where the normalization unit 504 normalizes the document, and then to step S1706, where the normalized XML document is forwarded to the LAN 1211. Because the received message has already been normalized, the normalized XML compatible node 1214 analyzes it as a normalized XML document without further processing and reports its content to the application 509.
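The gateway behaviour in the two cases above can be sketched as follows. The subnet values are hypothetical documentation-range addresses, `ET.canonicalize` again stands in for the normalization unit 504, and the `forward` function and its `direction` strings are names invented for this illustration. Error handling for documents that cannot be normalized is left out: in this sketch the `ET.ParseError` simply propagates.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Hypothetical subnets of LANs made up of normalized XML compatible nodes.
NORMALIZED_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),     # assumed address of LAN 1211
    ipaddress.ip_network("198.51.100.0/24"),  # assumed address of LAN 1221
]


def forward(message: str, source: str, direction: str) -> str:
    """Return the message the gateway places on the outgoing link."""
    if direction == "lan_to_wan":
        return message                        # outbound: forwarded unmodified
    addr = ipaddress.ip_address(source)
    if any(addr in net for net in NORMALIZED_SUBNETS):  # S1702
        return message                        # S1706: already normalized
    return ET.canonicalize(message)           # S1703, then S1706
```

Only WAN-to-LAN traffic from unlisted senders is rewritten, so nodes behind the gateway see normalized documents regardless of who sent them.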

As described above, according to the fifth embodiment, only normalized XML documents are transferred inside a LAN to which normalized XML compatible nodes belong. The normalization determination unit and normalization unit that the normalized XML compatible nodes had in the fourth embodiment are therefore unnecessary, and the normalized XML compatible nodes can be implemented as very lightweight, fast devices.

The load on the gateway does increase, but a gateway can bear relatively more cost than a node, which is expected to be small, light, and inexpensive. Moreover, because a gateway is not handled directly by users and is not under strong pressure to be small and light, it can accommodate the additional load more easily. Concentrating the resources needed for normalization in the gateway rather than in each node can therefore improve overall performance.

In the fifth embodiment the gateway includes the normalization determination unit 1303, but instead of relying on host information, the gateway can also analyze the transferred data itself to determine whether it is a normalized or a non-normalized document. In that case, for example, the normalization determination unit can hold the grammar information required of normalized XML and compare the transferred data against it.

FIG. 18 is a block diagram showing the configuration of a gateway 1501 that is a modification of the fifth embodiment of the present invention. As shown in FIG. 18, the normalization determination unit 1303 of FIG. 15 may be omitted, and every XML document transferred from the WAN 1201 to the LAN may be normalized. Normalizing an XML document that is already normalized does not change the result, so, as with the gateway of FIG. 15, the LAN remains an environment in which only normalized XML documents are used, and the same effect is obtained.
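The claim that normalizing an already normalized document does not change the result is an idempotence property, which can be checked with a small sketch. The standard library's C14N 2.0 canonicalization is used here purely as a stand-in for the normalization unit 504; whether the normalization defined in this document has exactly the same rules is an assumption.

```python
import xml.etree.ElementTree as ET


def normalize(text: str) -> str:
    """Stand-in normalizer (assumption: C14N 2.0 from the standard library)."""
    return ET.canonicalize(text)


# A document with extra whitespace inside a tag and an empty-element tag.
doc = '<a  x="1"><b/></a>'
once = normalize(doc)     # first pass rewrites the document
twice = normalize(once)   # second pass leaves it unchanged
```

Because `once == twice`, a gateway that normalizes everything produces the same LAN traffic as one that first checks the sender, at the price of redundant work on documents that were already normalized.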

(Other embodiments)
The embodiments of the present invention have been described in detail above. The present invention may be applied to a document processing system composed of multiple devices, or to a document processing apparatus consisting of a single device.

The present invention can also be achieved by supplying a software program that implements the functions of the above-described embodiments to a system or apparatus, directly or remotely, and having the computer of that system or apparatus read and execute the supplied program. In the above embodiments, this is a program corresponding to the flowcharts of FIGS. 5 to 7 and FIGS. 9 to 13. In that case, as long as it has the functions of the program, the form need not be a program. Accordingly, the program code itself that is installed in a computer in order to realize the functional processing of the present invention on that computer also realizes the present invention. In other words, the claims of the present invention include the computer program itself for realizing the functional processing of the present invention. The program may take any form as long as it has the functions of the program, such as object code, a program executed by an interpreter, or script data supplied to an OS.

Various recording media can be used to supply the program, for example a floppy (registered trademark) disk, hard disk, optical disk, magneto-optical disk, MO, CD-ROM, CD-R, CD-RW, magnetic tape, nonvolatile memory card, ROM, or DVD (DVD-ROM, DVD-R).

As another supply method, the program can be supplied by connecting to a home page on the Internet using the browser of a client computer and downloading the program from the home page to a recording medium such as a hard disk. What is downloaded may be the computer program of the present invention itself or a compressed file that includes an automatic installation function. The supply can also be realized by dividing the program code that constitutes the program of the present invention into multiple files and downloading each file from a different home page. In other words, a WWW server that allows multiple users to download the program files for realizing the functional processing of the present invention on a computer is also included in the claims of the present invention.

The program of the present invention may also be encrypted, stored on a storage medium such as a CD-ROM, and distributed to users. In that case, users who satisfy predetermined conditions are allowed to download key information for decryption from a home page via the Internet, and by using that key information the encrypted program is installed on the computer in executable form.

The functions of the above-described embodiments can also be realized in forms other than the computer executing the program it has read. For example, an OS or the like running on the computer may perform part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiments can be realized by that processing as well.

Furthermore, the program read from the recording medium may be written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer. In this case, a CPU or the like provided on the function expansion board or in the function expansion unit then performs part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiments are realized by that processing.

FIG. 1 is a block diagram illustrating the schematic configuration of an information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a normalized XML document to be analyzed in the present embodiment.
FIG. 3 is a diagram showing an example of a non-normalized XML document to be analyzed in the present embodiment.
FIG. 4 is a diagram showing an example of a non-XML document to be analyzed in the present embodiment.
FIG. 5 is a block diagram showing the functional configuration of the DOM parser according to Embodiment 1 of the present invention.
FIG. 6 is a flowchart illustrating the processing procedure in the DOM parser according to Embodiment 1 of the present invention.
FIG. 7 is a diagram showing a specific example of the grammar violation information correspondence table according to Embodiments 1 and 2 of the present invention.
FIG. 8 is a block diagram showing the functional configuration of the DOM parser according to Embodiment 2 of the present invention.
FIG. 9 is a flowchart illustrating the processing procedure in the DOM parser according to Embodiment 2.
FIG. 10 is a block diagram illustrating the configuration of the parser according to Embodiment 3 of the present invention.
FIG. 11 is a flowchart illustrating the processing procedure in the parser according to Embodiment 3.
FIG. 12 is a diagram illustrating a network to which nodes having the parser according to Embodiments 4 and 5 of the present invention are connected.
FIG. 13 is a block diagram showing the configuration of a normalized XML compatible node according to Embodiment 4.
FIG. 14 is a flowchart illustrating the processing procedure in the parser unit according to Embodiment 4.
FIG. 15 is a block diagram showing the configuration of the gateway according to Embodiment 5 of the present invention.
FIG. 16 is a block diagram showing the configuration of a node according to Embodiment 5.
FIG. 17 is a flowchart showing the processing procedure of the gateway according to Embodiment 5 of the present invention.
FIG. 18 is a block diagram showing the configuration of a gateway that is a modification of Embodiment 5 of the present invention.

Claims

A document processing device for analyzing an input structured document,
Normalized XML analysis means for analyzing the normalized XML document;
Normalization means for converting an unnormalized XML document into a normalized XML document and supplying the normalized XML document to the normalized XML analysis means;
Identification means for identifying whether the input document is at least a normalized XML document,
A document processing apparatus characterized in that an XML document identified as not being a normalized XML document by the identifying means is converted into a normalized XML document by the normalizing means and analyzed by the normalized XML analyzing means.

The document processing apparatus according to claim 1, wherein the identification means has a database for storing grammatical rule information of XML documents,
identifies the grammar of the input document with reference to the database, and classifies the input document as one of a non-normalized XML document, a normalized XML document, and a non-XML document.

The document processing apparatus according to claim 1 or 2, wherein, when the identification means determines that the input document is a normalized XML document, the input document is processed based on a result of analysis by the normalized XML analysis means.

A document processing apparatus for analyzing an input structured document, comprising:
normalized XML analysis means for analyzing a normalized XML document;
normalization means for converting the input document into a normalized XML document and supplying the normalized XML document to the normalized XML analysis means; and
determination means for determining that a document that cannot be normalized by the normalization means is a non-XML document.

The document processing apparatus according to claim 1, wherein the identification means performs the identification based on an identifier of the input document.

A document processing method for inputting and analyzing a structured document, comprising:
a normalized XML analysis step of analyzing a normalized XML document;
a normalization step of converting a non-normalized XML document into a normalized XML document and supplying it to the normalized XML analysis step; and
an identification step of identifying at least whether or not the input document is a normalized XML document,
wherein an XML document identified in the identification step as not being a normalized XML document is converted into a normalized XML document in the normalization step and analyzed in the normalized XML analysis step.
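
The flow recited in this method claim (identify, normalize if needed, then analyze) might be sketched as follows. This is only an illustration under a loud assumption: "normalized XML" is taken here to mean W3C Canonical XML 2.0 as produced by Python's `xml.etree.ElementTree.canonicalize` (Python 3.8 or later), which may differ from the normalized form the specification defines. The function names `is_normalized`, `normalize`, `parse_normalized`, and `process` are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_normalized(doc: str) -> bool:
    """Identification step (assumed criterion): the text already equals
    its Canonical XML 2.0 serialization."""
    try:
        return ET.canonicalize(doc) == doc
    except ET.ParseError:
        # Not well-formed XML, so certainly not normalized XML.
        return False


def normalize(doc: str) -> str:
    """Normalization step: rewrite the document into the assumed normalized
    form. Malformed input raises ET.ParseError here."""
    return ET.canonicalize(doc)


def parse_normalized(doc: str) -> ET.Element:
    """Normalized XML analysis step: stands in for a parser that only
    needs to handle the normalized form."""
    return ET.fromstring(doc)


def process(doc: str) -> ET.Element:
    """Route non-normalized input through normalization before analysis."""
    if not is_normalized(doc):
        doc = normalize(doc)
    return parse_normalized(doc)
```

In this sketch a document that is already normalized goes straight to the analysis step, so the normalization cost is paid only for input that needs it.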

The document processing method according to claim 6, wherein the identification step identifies the grammar of the input document with reference to a database storing grammar rule information for XML documents, and classifies the input document as one of a non-normalized XML document, a normalized XML document, and a non-XML document.
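
The three-way classification in this claim could look roughly like the sketch below. Everything concrete in it is an assumption made for illustration: the `GRAMMAR_RULES` table is a hypothetical stand-in for the grammar rule database, and the three rules it holds (no XML declaration, no empty-element tags, no single-quoted attribute values) are invented examples, not the rules of the specification.

```python
import re
import xml.etree.ElementTree as ET
from enum import Enum


class DocKind(Enum):
    NORMALIZED_XML = "normalized XML"
    NON_NORMALIZED_XML = "non-normalized XML"
    NON_XML = "non-XML"


# Hypothetical stand-in for the grammar rule database: each entry names a
# construct that the assumed normalized form does not allow.
GRAMMAR_RULES = {
    "xml-declaration": re.compile(r"^\s*<\?xml\b"),
    "empty-element-tag": re.compile(r"/>"),
    "single-quoted-attribute": re.compile(r"=\s*'"),
}


def violations(doc: str) -> list:
    """Return the names of the rules in the table that the document violates."""
    return [name for name, pattern in GRAMMAR_RULES.items() if pattern.search(doc)]


def classify(doc: str) -> DocKind:
    """Classify the input as non-XML, non-normalized XML, or normalized XML."""
    try:
        ET.fromstring(doc)  # well-formedness check only
    except ET.ParseError:
        return DocKind.NON_XML
    if violations(doc):
        return DocKind.NON_NORMALIZED_XML
    return DocKind.NORMALIZED_XML
```

The patterns are matched against the raw text, so character data that happens to contain `/>` or `='` would be flagged as well. In this sketch that only errs on the side of sending a document through normalization again.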

The document processing method according to claim 6 or 7, wherein, when the input document is determined in the identification step to be a normalized XML document, the input document is processed based on the result of analysis in the normalized XML analysis step.

A document processing method for inputting and analyzing a structured document, comprising:
a normalized XML analysis step of analyzing a normalized XML document;
a normalization step of converting the input document into a normalized XML document and supplying the normalized XML document to the normalized XML analysis step; and
a determination step of determining that a document that cannot be normalized in the normalization step is a non-XML document.
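
This variant skips the separate identification step: every input is normalized first, and a failure to normalize is itself taken as the sign of a non-XML document. A minimal sketch, again assuming that Canonical XML 2.0 via `xml.etree.ElementTree.canonicalize` (Python 3.8 or later) stands in for the normalized form, and using the hypothetical function name `analyze`:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def analyze(doc: str) -> Optional[ET.Element]:
    """Return the parsed tree, or None when the input is judged non-XML."""
    try:
        # Normalization step: applied to every input document.
        normalized = ET.canonicalize(doc)
    except ET.ParseError:
        # Determination step: input that cannot be normalized is non-XML.
        return None
    # Normalized XML analysis step.
    return ET.fromstring(normalized)
```

Compared with identifying first, this trades an extra normalization pass on already-normalized input for a simpler control flow.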

The document processing method according to claim 6, wherein, in the identification step, the identification is performed based on an identifier of the input document.

A program for executing the document processing method according to any one of claims 6 to 9.