JP2001325248A

JP2001325248A - Document data processor

Info

Publication number: JP2001325248A
Application number: JP2000144947A
Authority: JP
Inventors: Nobuo Iwata; 伸夫岩田
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2000-05-17
Filing date: 2000-05-17
Publication date: 2001-11-22

Abstract

PROBLEM TO BE SOLVED: To provide a document data processor with improved processing efficiency. A conventional document data processor runs an HTML parser only after an XML parser has finished, so its processing efficiency is low. SOLUTION: In this document data processor, a CPU 11 reads document data and processes it as an XML parser. When a tag that is not an XML tag is detected during XML parsing, the start tag processing part or the end tag processing part of the HTML parser is activated to process the relevant portion. Further, when a tag related to CDATA or to preformatted text is found, the corresponding processing is performed.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a document data processing apparatus for displaying and otherwise handling structured document data, and in particular to a document data processing apparatus capable of supporting both XML (Extensible Markup Language) and HTML (HyperText Markup Language).

[0002]

2. Description of the Related Art
In recent years, a markup language called HTML has been widely used on the Internet as a format for document data. HTML is described in accordance with a meta-language grammar called SGML (Standard Generalized Markup Language). Documents described in SGML allow a wide variety of specifications, but processing them is complicated; a simplified language called XML is therefore being formulated. In addition, XML allows a document to contain attributes of the document and data to be subjected to specific processing (data that is not displayed directly but is used by the processing apparatus).

[0003] Document data described in XML is acquired and processed in the same manner as HTML. However, a personal computer running a Web browser, which serves as a conventional document data processing apparatus, cannot process XML directly. It therefore first converts the acquired XML document into HTML and then processes the result again as an HTML document.

[0004] In XML and SGML, a syntax is declared by placing a DTD (Document Type Definition) in the document file or in another file referenced by the document file, and the document data is processed according to that declaration.

[0005] For example, in SGML, the tags used for markup (data that encloses content to be subjected to specific processing) can be defined using a DTD. SGML allows a DTD definition to state that the tag marking the end of an element may be omitted. When processing document data in which end tags are omitted in this way, the processor must determine whether each piece of content belongs to the open tag and, when it finds content that does not, process the data as if the end tag had been present. Specifically, a DTD can define that the end tag (</P>) corresponding to HTML's paragraph start tag <P> may be omitted. SGML can also define that a start tag that is required in the document structure may be omitted from the document data.

[0006] In XML, by contrast, the kinds of start and end tags are restricted, and the types of content are defined more narrowly than in SGML with respect to reference data and tags. SGML has three content types: PCDATA (Parsed Character Data), in which the tags and the reference data (for image data and the like) in the content are processed; RCDATA (Replaceable Character Data), in which tags are ignored and only the reference data is processed; and CDATA (Character Data), in which both tags and reference data are treated as plain character strings. In XML, only PCDATA is available. CDATA is used not for display or printing but for writing script programs in JavaScript.
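The three SGML content types can be contrasted with a small Python sketch. It is illustrative only and rests on simplifying assumptions: tags are recognized with a crude regular expression, PCDATA tags are simply stripped rather than turned into child elements, reference expansion uses the standard library's `html.unescape`, and the function name `interpret` is hypothetical.

```python
import html
import re

# Crude tag pattern: "<" or "</" followed by a name and anything up to ">".
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def interpret(raw: str, content_type: str) -> str:
    """Return the character data passed on for `raw` under each content type.

    PCDATA: tags are markup (stripped here for brevity), references expanded.
    RCDATA: tags are ordinary text, but references are still expanded.
    CDATA:  tags and references are both kept as literal characters.
    """
    if content_type == "PCDATA":
        # Strip tags first so that an expanded "&lt;" cannot form a new tag.
        return html.unescape(TAG.sub("", raw))
    if content_type == "RCDATA":
        return html.unescape(raw)
    if content_type == "CDATA":
        return raw
    raise ValueError(f"unknown content type: {content_type}")


sample = "<b>a &lt; b</b>"
for kind in ("PCDATA", "RCDATA", "CDATA"):
    print(kind, interpret(sample, kind))
```

Only the CDATA branch leaves both the tags and the `&lt;` reference untouched, which is why script text is a natural fit for it.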

[0007] For this reason, to support both XML and HTML, a conventional document data processing apparatus first processes XML data to convert it into HTML data and then processes the converted HTML data.

[0008]

As described above, the conventional document data processing apparatus runs an XML parser to convert XML into HTML and then runs an HTML parser on the result, which increases the processing load. In addition, many existing HTML documents have no associated DTD definition, so the XML parser may not operate properly. Furthermore, many existing HTML documents omit end tags, and the XML parser cannot process them correctly as they are.

[0009] Further, recent HTML document data contains tags such as <SCRIPT> for JavaScript and <STYLE> for specifying style sheets, and some documents assume that the content of these tags is processed as CDATA. Other documents use a tag called <PRE> so that spaces and line feeds are displayed exactly as they appear in the content. The XML parser cannot process such document data correctly as it is.
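The failure described above is easy to reproduce with a strict XML parser. The sketch below assumes Python's standard `xml.etree.ElementTree` as a stand-in for the conventional XML parser; the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text: str) -> bool:
    """True if a strict XML parser accepts `text`."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    # "<" inside script text is not allowed unescaped in XML.
    "script": "<HTML><SCRIPT>if (a < b) go();</SCRIPT></HTML>",
    # <BR> without an end tag leaves </BODY> mismatched.
    "omitted end tag": "<HTML><BODY>line 1<BR>line 2</BODY></HTML>",
    # The same content written as well-formed XML.
    "well-formed": "<HTML><BODY>line 1<BR/>line 2</BODY></HTML>",
}

for name, text in samples.items():
    print(f"{name}: {is_well_formed_xml(text)}")
```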

SUMMARY OF THE INVENTION
[0010] The present invention has been made in view of the above circumstances, and its object is to provide a document data processing apparatus that can correctly process XML data mixed into HTML and that improves processing efficiency.

[0011]

To solve the problems of the conventional example described above, the invention according to claim 1 is a document data processing apparatus for processing tagged document data that contains at least one of partial document data described in HTML and partial document data described in XML, comprising: HTML parser means for processing predetermined partial document data as an HTML document and outputting the result; and XML parser means for sequentially analyzing the tagged document data to be processed, one piece of partial document data at a time, and selectively performing, according to the result of the analysis, either an operation of activating the HTML parser means or an operation of processing the analyzed partial document data as an XML document.

[0012] With this XML parser means, the document data of HTML portions detected during the XML parser's analysis is processed by the HTML parser, so processing efficiency can be improved. To solve the problems of the conventional example, the invention according to claim 2 is the document data processing apparatus according to claim 1, further comprising: means for acquiring a data type attribute of the tagged document data to be processed; and means for acquiring a default data type definition predefined in association with the data type attribute, wherein the XML parser means performs processing according to the acquired default data type definition. As a result, a default DTD can be applied even to document data with no associated DTD definition, and the document data can be processed correctly.

[0013] It is also preferable that the XML parser means perform processing according to the acquired default data type definition only when the tagged document data to be processed has no XML declaration.
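A minimal sketch of the rule in [0012] and [0013], assuming the data type attribute is a media-type string such as `text/html`. The table `DEFAULT_DTDS`, the DTD file name, and both function names are hypothetical; the patent does not specify how the attribute is obtained or how default DTDs are stored.

```python
import re
from typing import Optional

# Hypothetical table of predefined default DTDs keyed by data type attribute.
DEFAULT_DTDS = {
    "text/html": "default-html.dtd",
}

# An XML declaration must open the document: "<?xml" followed by whitespace.
XML_DECL = re.compile(r"<\?xml\s")


def has_xml_declaration(document: str) -> bool:
    # Ignore a leading byte-order mark before checking for the declaration.
    return XML_DECL.match(document.lstrip("\ufeff")) is not None


def default_dtd_for(document: str, data_type: str) -> Optional[str]:
    """Return the default DTD to apply, or None when none should be applied."""
    if has_xml_declaration(document):
        return None  # the document declares itself as XML; no default applied
    return DEFAULT_DTDS.get(data_type)
```

Returning `None` for unknown data types leaves the decision of what to do next to the caller, which keeps the sketch neutral about behavior the patent does not describe.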

[0014] To solve the problems of the conventional example, the invention according to claim 4 is the document data processing apparatus according to claim 1, further comprising: means for acquiring the tagged document data to be processed and a data type definition associated with that document data; means for acquiring a data type attribute of the tagged document data; and means for acquiring a default data type definition predefined in association with the data type attribute, wherein the XML parser means performs processing according to both the acquired data type definition and the default data type definition.

[0015] To solve the problems of the conventional example, the invention according to claim 5 is the document data processing apparatus according to any one of claims 1 to 4, further comprising: for each special tag, parser means for processing the partial document data related to that special tag; and means for setting at least one pair consisting of information on a special tag and the parser means corresponding to that special tag, wherein the XML parser means analyzes the tagged document data and, when the analysis detects the special tag, activates the parser means corresponding to that special tag.

[0016] Here, the special tags preferably include at least one of a tag related to CDATA and a tag related to preformatting.

[0017] To solve the problems of the conventional example, the invention according to claim 7 is the document data processing apparatus according to any one of claims 1 to 6, further comprising means for storing, for each tag, information on whether the tag may be omitted, wherein, upon detecting a tag set as omissible, the XML parser means analyzes whether the tag has actually been omitted and, when the analysis shows that it has, supplies the omitted tag and processes the document data. This makes it possible to process even document data from which tags have been omitted correctly.

[0018] To solve the problems of the conventional example, the invention according to claim 8 is a document data processing apparatus for processing document data that contains at least one of partial document data described according to a first rule and partial document data described according to a second rule, comprising: first parser means for processing predetermined partial document data according to the first rule and outputting the result; and second parser means for sequentially analyzing the document data to be processed, one piece of partial document data at a time, and selectively performing, according to the result of the analysis, either an operation of activating the first parser means or an operation of processing the analyzed partial document data according to the second rule.

[0019]

An embodiment of the present invention will be described with reference to the drawings. The document data processing apparatus according to the embodiment is a personal computer. Specifically, as shown in FIG. 1, it basically comprises a CPU 11, a boot ROM 12, a RAM 13, a hard disk 14, a display 15, an operation unit 16, a LAN interface 17, and an external storage device 18.

[0020] Immediately after power-on, the CPU 11 performs initialization by loading the program stored in the boot ROM 12 into the RAM 13 and executing it. Through this initialization, the CPU 11 loads the operating system stored on the hard disk 14 into the RAM 13 and starts processing. Then, in response to an instruction entered from the operation unit 16, the CPU 11 loads a browser that processes document data from the hard disk 14 into the RAM 13 and executes it. The browser processing is described in detail later.

[0021] The boot ROM 12 stores programs related to the initialization processing of the CPU 11. The RAM 13 operates as the work memory of the CPU 11. The hard disk 14 stores the various programs executed by the CPU 11, as well as data required for the processing of the CPU 11 (for example, a preset default DTD).

[0022] The display 15 displays various data according to instructions from the CPU 11. The operation unit 16 is a keyboard, a mouse, or the like, and conveys the content of the user's operations to the CPU 11. The LAN interface 17 is connected to a Web server via a LAN (Local Area Network) or the Internet. It transmits data over the network according to instructions from the CPU 11, and it receives data arriving over the network and outputs it to the CPU 11.

[0023] The external storage device 18 reads data from a computer-readable recording medium that holds data optically or magnetically, such as a floppy (registered trademark) disk or a magneto-optical disk, and outputs the data to the CPU 11. The CPU 11 installs the data read from the external storage device 18 on the hard disk 14 as a processing program.

[0024] The document data processing of the CPU 11 is described next. As shown in FIG. 2, the browser program for document data processing executed by the CPU 11 in this embodiment comprises a TCP/IP protocol processing unit 21, an XML parse unit 22, an HTML parse unit 23, a browser core unit 24, and a drawing unit 25. The HTML parse unit 23 comprises a preformat processing unit 31, a CDATA processing unit 32, a start tag processing unit 33, and an end tag processing unit 34. The XML parse unit 22 corresponds to the XML parser means or second parser means of the present invention, and the HTML parse unit 23 corresponds to the HTML parser means or first parser means. The preformat processing unit 31 and the CDATA processing unit 32 correspond to the parser means associated with the special tags of the present invention: document data to be preformatted is associated with the special tag "<PRE>", and the special tags related to CDATA include "<SCRIPT>".

[0025] The TCP/IP protocol processing unit 21 acquires document data over the network using the TCP/IP protocol and outputs it to the XML parse unit 22. A concrete example of the document data acquired by the TCP/IP protocol processing unit 21 is shown in FIG. 3. In FIG. 3, each portion enclosed by the characters "<" and ">" (for example, <HTML> at the beginning) is called a tag. Tags beginning with "</" are end tags, and all others are start tags. As FIG. 3 shows, the document data assumed by the document data processing apparatus of this embodiment consists of nested basic structures, each made up of a start tag, content data, and an end tag. That is, the content between the start tag <HTML> and the end tag </HTML> contains further content enclosed by the start tag <BODY> and the end tag </BODY>, and that content in turn contains several basic structures. In the document data of FIG. 3, as in many HTML documents, the end tag corresponding to the "<BR>" tag, for example, is omitted.

[0026] Although it does not appear in FIG. 3, in XML a tag ending with "/>" is generally treated as an end tag for convenience.
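The tag classification described in [0025] and [0026] can be sketched as follows. This is a simplified tokenizer under stated assumptions: a tag is anything between "<" and the next ">" (so comments containing ">" are not handled), whitespace-only text between tags is dropped, and `tokenize` and `classify` are hypothetical names. Tokens beginning with "<!" or "<?" are labeled separately because the parser treats them as XML tags.

```python
import re
from typing import Iterator

TAG_RE = re.compile(r"<[^>]*>")


def tokenize(document: str) -> Iterator[str]:
    """Split a document into tags and the text runs between them."""
    pos = 0
    for match in TAG_RE.finditer(document):
        text = document[pos:match.start()]
        if text.strip():          # drop whitespace-only runs
            yield text
        yield match.group()
        pos = match.end()
    tail = document[pos:]
    if tail.strip():
        yield tail


def classify(token: str) -> str:
    """Label a token as 'xml', 'start', 'end', or 'content'."""
    if token.startswith(("<!", "<?")):
        return "xml"              # declarations, comments, processing instructions
    if token.startswith("</") or (token.startswith("<") and token.endswith("/>")):
        return "end"              # "</X>", and "<X/>" treated as an end tag
    if token.startswith("<"):
        return "start"
    return "content"


doc = "<HTML><BODY>Hello<BR>World<BR/></BODY></HTML>"
for tok in tokenize(doc):
    print(classify(tok), tok)
```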

[0027] The processing that the CPU 11 performs as the XML parse unit 22 on the document data acquired by the TCP/IP protocol processing unit 21 is described with reference to FIG. 4. In the following description, it is assumed that operation parameters and a default DTD, which corresponds to the default data type definition, have been set in advance and stored on the hard disk 14. As shown in FIG. 5, the operation parameters associate the following: a pointer (A) to the start tag processing unit 33; a pointer (B) to the end tag processing unit 34; a pointer (C) to the CDATA processing unit 32; an array (D) of names of tags to be processed as CDATA; a pointer (E) to the preformat processing unit 31; an array (F) of names of tags to be processed by the preformat processing unit 31; and a flag (G) indicating whether end tags may be omitted. The default DTD includes element declarations and attribute declarations. As shown in FIG. 6(a), an element declaration associates an identifier (H), a tag name (I), a type (J), a flag (K) indicating whether the start tag and end tag may be omitted, and a list (L) of tags that may be contained in the content. As shown in FIG. 6(b), an attribute declaration associates an identifier (M), a tag name (N), an attribute name (O), and an attribute value type (P); when the attribute value type is an enumeration, an array (Q) of permissible values is also associated. In FIGS. 5 and 6, as is widely known, the end of each array or list is marked by NULL.
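The tables of FIGS. 5 and 6 map naturally onto record types. The Python sketch below is one possible encoding, not the patent's own: handler pointers become callables, NULL-terminated arrays become tuples, and the single omission flag (K) is split into separate start and end booleans. All class and field names, and the sample declarations, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class OperationParameters:
    """Fields (A) to (G) of FIG. 5; handler pointers become callables."""
    start_tag_handler: Callable[..., object]   # (A) start tag processing unit 33
    end_tag_handler: Callable[..., object]     # (B) end tag processing unit 34
    cdata_handler: Callable[..., object]       # (C) CDATA processing unit 32
    cdata_tags: Tuple[str, ...]                # (D) tags processed as CDATA
    preformat_handler: Callable[..., object]   # (E) preformat processing unit 31
    preformat_tags: Tuple[str, ...]            # (F) tags to preformat
    end_tag_omissible: bool                    # (G)


@dataclass
class ElementDeclaration:
    """Fields (H) to (L) of FIG. 6(a)."""
    identifier: int                            # (H)
    tag_name: str                              # (I)
    content_type: str                          # (J) e.g. "EMPTY"
    omit_start: bool                           # (K), start tag part
    omit_end: bool                             # (K), end tag part
    allowed_children: Tuple[str, ...] = ()     # (L)


@dataclass
class AttributeDeclaration:
    """Fields (M) to (Q) of FIG. 6(b)."""
    identifier: int                            # (M)
    tag_name: str                              # (N)
    attribute_name: str                        # (O)
    value_type: str                            # (P) e.g. "CDATA", "enumeration"
    allowed_values: Optional[Tuple[str, ...]] = None  # (Q), enumerations only


P = ElementDeclaration(1, "P", "ELEMENT", omit_start=False, omit_end=True,
                       allowed_children=("A", "B", "I", "BR"))
BR = ElementDeclaration(2, "BR", "EMPTY", omit_start=False, omit_end=True)
P_ALIGN = AttributeDeclaration(3, "P", "ALIGN", "enumeration",
                               ("LEFT", "CENTER", "RIGHT"))
```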

[0028] As shown in FIG. 4, as the processing of the XML parse unit 22, the CPU 11 first loads the operation parameters from the hard disk 14 (S1) and loads the default DTD from the hard disk 14 (S2). It then reads document data (S3) and checks whether the document data has ended (S4); if it has (Yes), the processing ends.

[0029] If the document data has not ended in step S4 (No), the CPU 11 checks whether the data read is an XML tag (S5). Here, an XML tag is a special tag beginning with "<!" or "<?"; one example is a comment beginning with "<!--". If it is an XML tag (Yes in S5), the CPU 11 executes XML tag processing (S6) and returns to step S3 to continue (A). If it is not an XML tag (No), the CPU 11 further checks whether it is a start tag (S7). If it is a start tag (Yes), the CPU 11 refers to the default DTD using the tag name of the start tag as a key, obtains the pointer to the start tag processing unit 33, and activates the start tag processing unit 33 with the read document data as an argument (S8). The operation of the start tag processing unit 33 is described later.

[0030] When the processing of the start tag processing unit 33 is complete, the CPU 11 resumes the XML parser processing and, referring to the operation parameters, checks whether the start tag processed in step S8 is a tag to be processed as CDATA (S9). If it is (Yes), the CPU 11 activates the CDATA processing unit 32 (S10). As the operation of the CDATA processing unit 32, the CPU 11 processes the data as CDATA until the corresponding end tag is read and adds it to the tree structure. When the operation of the CDATA processing unit 32 is complete, the CPU 11 returns to step S3 and continues the XML parser processing.
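The key property of the CDATA processing unit 32 is that it does not look for markup until the matching end tag. A minimal sketch, assuming the parser hands over the input string and the offset just after the start tag; `read_cdata` is a hypothetical name, and the sketch returns the raw text rather than adding it to a tree.

```python
import re
from typing import Tuple


def read_cdata(document: str, pos: int, tag: str) -> Tuple[str, int]:
    """Return (raw text, offset just past the matching end tag).

    `pos` is the offset just after the start tag. Nothing before the end tag
    is interpreted as markup, so "<" inside a script stays literal text.
    """
    end_tag = re.compile(r"</\s*" + re.escape(tag) + r"\s*>", re.IGNORECASE)
    match = end_tag.search(document, pos)
    if match is None:                  # unterminated: take the rest of the input
        return document[pos:], len(document)
    return document[pos:match.start()], match.end()


doc = "<SCRIPT>if (a < b) { go(); }</script><P>next"
text, nxt = read_cdata(doc, len("<SCRIPT>"), "SCRIPT")
print(repr(text), repr(doc[nxt:]))
```

The end tag is matched case-insensitively because HTML tag names are case-insensitive; a strict XML reader would not do this.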

[0031] If, in step S9, the tag is not to be processed as CDATA (No), the CPU 11 further checks, referring to the operation parameters, whether the start tag is a tag to be preformatted (S11). If it is (Yes), the CPU 11 activates the preformat processing unit 31 (S12). As the processing of the preformat processing unit 31, the CPU 11 adds the data to the tree structure as PCDATA until the end tag corresponding to the tag is read. When the preformat processing unit 31 detects a line feed code, it replaces it with the line break tag <BR>. When the processing of the preformat processing unit 31 is complete, the CPU 11 returns to step S3 and continues the XML parser processing. If, in step S11, the tag is not to be preformatted, the CPU 11 returns directly to step S3 and continues the XML parser processing.
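The line-feed replacement performed by the preformat processing unit 31 can be sketched on a single run of text. This illustration rests on assumptions: it treats CR LF, CR, and LF alike as line feeds, keeps spaces untouched, and represents the inserted <BR> tags as marker strings; the function name `preformat` is hypothetical.

```python
import re
from typing import List

# CR LF is tried first so that it counts as a single line feed.
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def preformat(body: str) -> List[str]:
    """Split PRE content into text runs and "<BR>" markers.

    Each run is kept exactly as written, spaces included; every line feed
    becomes a "<BR>" marker in its place.
    """
    parts: List[str] = []
    for i, line in enumerate(LINE_BREAK.split(body)):
        if i > 0:
            parts.append("<BR>")
        if line:
            parts.append(line)
    return parts


print(preformat("  a  b\r\nc\n\nd"))
```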

[0032] If, in step S7, the data is not a start tag (No), the CPU 11 checks whether the data read is an end tag (S13). If it is an end tag (Yes), the CPU 11 activates the end tag processing unit 34 (S14), performs end tag processing, and returns to step S3 to continue. The processing of the end tag processing unit 34 is described later.

[0033] If, in step S13, the data is not an end tag (No), the CPU 11 processes it as ordinary content (S15) and returns to step S3 to continue.
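Steps S3 through S15 amount to a dispatch loop over tags and text. The sketch below assumes the input has already been split into tokens (tags and text runs), hard-codes the tag arrays (D) and (F) as `SCRIPT`/`STYLE` and `PRE`, and records each decision in a list instead of calling real processing units. It also assumes that the CDATA step stops just before the matching end tag, so that the end tag itself still goes through S13 and S14. All names are hypothetical.

```python
from typing import List, Tuple

CDATA_TAGS = {"SCRIPT", "STYLE"}   # assumed contents of array (D)
PREFORMAT_TAGS = {"PRE"}           # assumed contents of array (F)


def tag_name(token: str) -> str:
    return token.strip("</>").split()[0].upper()


def is_xml_tag(token: str) -> bool:
    return token.startswith(("<!", "<?"))


def is_end_tag(token: str) -> bool:
    return token.startswith("</") or (token.startswith("<") and token.endswith("/>"))


def is_start_tag(token: str) -> bool:
    return token.startswith("<") and not is_end_tag(token)


def xml_parse(tokens: List[str]) -> List[Tuple[str, str]]:
    """Walk the tokens as in steps S3 to S15 and return the actions taken."""
    actions: List[Tuple[str, str]] = []
    in_pre = 0                                  # depth of open preformatted elements
    i = 0
    while i < len(tokens):                      # S3: read; S4: stop at end of data
        token = tokens[i]
        i += 1
        if is_xml_tag(token):                   # S5
            actions.append(("xml", token))      # S6
        elif is_start_tag(token):               # S7
            name = tag_name(token)
            actions.append(("start", name))     # S8: start tag processing unit 33
            if name in CDATA_TAGS:              # S9
                j = i                           # S10: raw text up to the end tag
                while j < len(tokens) and not (
                        is_end_tag(tokens[j]) and tag_name(tokens[j]) == name):
                    j += 1
                actions.append(("cdata", "".join(tokens[i:j])))
                i = j                           # the end tag goes through S13/S14
            elif name in PREFORMAT_TAGS:        # S11
                in_pre += 1                     # S12: following text is preformatted
        elif is_end_tag(token):                 # S13
            name = tag_name(token)
            if name in PREFORMAT_TAGS and in_pre:
                in_pre -= 1
            actions.append(("end", name))       # S14: end tag processing unit 34
        else:                                   # S15: ordinary content
            text = token.replace("\n", "<BR>") if in_pre else token
            actions.append(("content", text))
    return actions


for action in xml_parse(["<HTML>", "<PRE>", "a\nb", "</PRE>", "</HTML>"]):
    print(action)
```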

[0034] The operation of the start tag processing unit 33 performed by the CPU 11 is described next. The CPU 11 activates the start tag processing unit 33 with the read document data as an argument. It first adds the element of the read tag one level below the element currently pointed to in the tree structure, and then points to the added element as the new current element. If the tag is the first tag (that is, no tree structure exists yet), the CPU 11 registers it as the first level (root) of the tree structure and points to the root as the current element.

[0035] Specifically, for the document data shown in FIG. 3, the CPU 11 forms the tree structure shown in FIG. 7. That is, it recognizes <HTML> on the first line of the document data as a start tag and, as the processing of the start tag processing unit 33, sets "HTML" as the root of the tree structure and points to it (X). It next recognizes <HEAD> on the second line as a start tag, adds a "HEAD" element one level below the currently pointed element (HTML) (Y), and points to "HEAD". Since <TITLE> on the third line is also a start tag, it adds a "TITLE" element one further level down and attaches the following "HOME PAGE" to it as PCDATA (Z).
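The pointer-based tree construction of the start tag processing unit 33 can be sketched with a small node class. The names `Node` and `TreeBuilder` are hypothetical, and the sketch assumes a single root element.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    name: str
    # Back-pointer to the parent; excluded from repr/eq to avoid recursion.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List[Union["Node", str]] = field(default_factory=list)


class TreeBuilder:
    """Keeps a pointer to the current element, as unit 33 does."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self.current: Optional[Node] = None

    def start_tag(self, name: str) -> None:
        node = Node(name, parent=self.current)
        if self.current is None:       # first tag: it becomes the root
            self.root = node
        else:                          # otherwise one level below the current
            self.current.children.append(node)
        self.current = node            # the new element becomes current

    def text(self, data: str) -> None:
        """Attach character data (PCDATA) to the current element."""
        if self.current is not None:
            self.current.children.append(data)
```

For the example above, calling `start_tag("HTML")`, `start_tag("HEAD")`, `start_tag("TITLE")` and then `text("HOME PAGE")` leaves `current` pointing at TITLE, whose only child is the string "HOME PAGE".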

[0036] The operation of the end tag processing unit 34 performed by the CPU 11 is described next. The CPU 11 activates the end tag processing unit 34 with the read document data and the element currently pointed to in the tree structure as arguments. It analyzes that element of the XML tree structure as HTML, converts it into an HTML document, stores the result in the RAM 13, moves the pointer in the tree structure one level up, and points to that element as the new current element. Specifically, in FIGS. 3 and 7, the CPU 11 recognizes "</TITLE>", which follows the PCDATA "HOME PAGE" (Z) attached to the "TITLE" element, as an end tag; analyzes the basic structure "<TITLE>HOME PAGE</TITLE>", running from the start tag "<TITLE>" to the end tag "</TITLE>", as an HTML document; and stores it in the RAM 13 as an HTML document. It then points to "HEAD" (Y), one level above "TITLE", which completes the end tag processing. The HTML document obtained in this way from the processing of the end tag processing unit 34 is converted into a line structure, as shown in FIG. 8.
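A sketch of the end tag processing unit 34, assuming Python's standard `xml.etree.ElementTree` for the tree and its `method="html"` serializer as a stand-in for "analyzing the element as HTML". A list plays the role of the HTML stored in the RAM 13, and the class name `EndTagProcessor` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


class EndTagProcessor:
    """Stack of open elements; the top of the stack is the current element."""

    def __init__(self, root_name: str) -> None:
        self.stack: List[ET.Element] = [ET.Element(root_name)]
        self.html_out: List[str] = []    # stands in for the HTML stored in RAM

    def start(self, name: str, text: str = "") -> None:
        child = ET.SubElement(self.stack[-1], name)
        child.text = text or None
        self.stack.append(child)

    def end(self) -> str:
        """Serialize the current element as HTML, then point one level up."""
        element = self.stack[-1]
        html_text = ET.tostring(element, encoding="unicode", method="html")
        self.html_out.append(html_text)
        if len(self.stack) > 1:          # the root stays current at the top
            self.stack.pop()
        return html_text


proc = EndTagProcessor("HTML")
proc.start("HEAD")
proc.start("TITLE", "HOME PAGE")
print(proc.end())
```

Building HEAD and TITLE (with text "HOME PAGE") under an HTML root and then calling `end()` yields `<TITLE>HOME PAGE</TITLE>` and leaves HEAD as the current element, matching the example above.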

[0037] When the processing of the XML parse unit 22 is complete, the CPU 11 activates the browser core unit 24, which refers to the HTML document stored in the RAM 13 (the document converted into HTML from the mixed HTML and XML document by the operation of the XML parse unit 22 and the HTML parse unit 23) and outputs drawing instructions to the drawing unit 25 based on that HTML document. The drawing unit 25 displays the content of the document data on the display 15 as the drawing result, taking into account the absolute coordinates and size of each line of the document, the text and image portions and their absolute coordinates and sizes, the fonts for the text, URLs (Uniform Resource Locators) as information for referring to images, and the like.

[0038] In the description so far, the default DTD is loaded in advance. Alternatively, when the tag "<!DOCTYPE", which declares the data type, is detected during XML tag processing, the default DTD may be loaded only if that data type is a specific one (for example, HTML 3.2). As another alternative, when the start tag processing unit 33 detects the "<HTML>" tag, the default DTD may be loaded only if no DTD declaration ("<!DOCTYPE") has been detected up to that point.
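The two alternatives in [0038] can be sketched in Python as a small state object. This is only an illustration under stated assumptions: the set `DEFAULT_DTD_TYPES`, the helper `doctype_data_type`, and its regular expression for reading the data type out of a public identifier are all hypothetical, and the two variants are shown side by side although a given device would presumably use one of them.

```python
import re
from typing import Optional

# Assumption: data types for which the default DTD is loaded; [0038] names HTML 3.2 only as an example.
DEFAULT_DTD_TYPES = {"HTML 3.2"}


def doctype_data_type(decl: str) -> Optional[str]:
    """Hypothetical helper: extract a data type such as 'HTML 3.2' from a <!DOCTYPE ...> declaration."""
    m = re.search(r"//DTD\s+(HTML\s+[\d.]+)", decl)
    return m.group(1) if m else None


class DefaultDtdPolicy:
    """Tracks whether the default DTD should be loaded, following the two variants in [0038]."""

    def __init__(self) -> None:
        self.doctype_seen = False
        self.default_loaded = False

    def on_doctype(self, decl: str) -> None:
        # Variant 1: a <!DOCTYPE was found during XML tag processing;
        # load the default DTD only if it declares one of the specific data types.
        self.doctype_seen = True
        if doctype_data_type(decl) in DEFAULT_DTD_TYPES:
            self.default_loaded = True

    def on_html_start_tag(self) -> None:
        # Variant 2: the start tag processing unit met <HTML>;
        # load the default DTD only if no DTD declaration has been seen so far.
        if not self.doctype_seen:
            self.default_loaded = True
```

For example, a document that opens directly with `<HTML>` triggers the default DTD through `on_html_start_tag`, while one declaring an HTML 4.01 DOCTYPE does not.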

[0039] Further, if there is a DTD associated with the read document data, either contained within the document data or referenced from it, the CPU 11 preferably loads this DTD and uses it in the processing of the XML parse unit 22, either in place of the default DTD or together with it.
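Paragraph [0039] allows the document's own DTD to be used either instead of the default DTD or together with it. A minimal Python sketch follows, modelling a DTD hypothetically as a mapping from element name to the set of tags its content may contain. When both DTDs declare the same element, the document's declaration wins here; that precedence rule is an assumption, not something the text specifies.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical simplified DTD: element name -> tags permitted in its content.
Dtd = Dict[str, FrozenSet[str]]


def effective_dtd(default: Dtd, document: Optional[Dtd], combine: bool = True) -> Dtd:
    """Choose the DTD used by the XML parse unit.

    - no document DTD: use the default DTD
    - combine=False: the document DTD replaces the default DTD
    - combine=True: both are used; for an element declared in both,
      the document's declaration takes precedence (assumed rule)
    """
    if document is None:
        return dict(default)
    if not combine:
        return dict(document)
    merged = dict(default)
    merged.update(document)  # document declarations override the defaults
    return merged
```

Combining keeps HTML elements known from the default DTD available while letting the document add or redefine its own XML elements.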

[0040] When the flag in the operation parameters that indicates whether an end tag may be omitted is set so that a specific tag may be omitted, the CPU 11, upon reading a start tag, obtains the element one level above the element being added to the tree structure and refers to the loaded DTD. If omission is allowed, and either the type is EMPTY (a tag ending with "/>") or the tag is not EMPTY but is not in the list of tags that may be included in the content of the element declaration, the CPU 11 assumes that the end tag has been omitted and activates the end tag processing unit 34.

[0041] Conversely, when the tag cannot be omitted, or when it is not EMPTY and is in the list of tags that may be included in the content of the element declaration, the CPU 11 adds the tag to the tree structure and activates the start tag processing unit 33.
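The decision in [0040] and [0041] can be sketched in Python as follows. Several points are assumptions: "the element one level above" is read as the currently open element under which the new tag would be added; a stack of element names stands in for the tree; the `events` list stands in for invoking the start and end tag processing units; and the check is repeated on the newly exposed element (the text describes a single check), so that several omitted end tags can be inferred in a row. `ElementDecl` and `handle_start_tag` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ElementDecl:
    """Hypothetical per-element record merging the operation parameters (FIG. 5) and the DTD (FIG. 6)."""
    end_tag_omissible: bool   # flag in the operation parameters
    empty: bool               # element declared EMPTY
    content: FrozenSet[str]   # tags that may appear in the element's declared content


def handle_start_tag(stack: List[str], tag: str,
                     decls: Dict[str, ElementDecl], events: List[str]) -> None:
    """Process a start tag, first closing any open element whose end tag was omitted."""
    while stack:
        decl = decls.get(stack[-1])  # currently open element, one level above the new tag
        if decl is None or not decl.end_tag_omissible:
            break  # end tag may not be omitted: nothing to infer
        if decl.empty or tag not in decl.content:
            # Omitted end tag inferred: run the end tag processing unit 34 implicitly.
            events.append(f"end:{stack.pop()}")
        else:
            break  # the new tag is permitted content of the open element
    # Add the tag to the tree and run the start tag processing unit 33.
    stack.append(tag)
    events.append(f"start:{tag}")
```

With `P` declared as end-tag-omissible and not permitting `P` in its content, the sequence `<P>...<P>` yields an implicit `end:P` before the second `start:P`, which is how the omitted end tag is supplied.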

【００４２】[0042]

[Effects of the Invention] According to the present invention, the XML parse means activates the HTML parse means as needed, so document data can be processed efficiently.
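The efficiency claim rests on a single pass in which the XML parse means hands individual tags to the HTML side only when needed, instead of running a full HTML parse after the XML parse. A minimal Python sketch of that dispatch follows. Assumptions: a tag is treated as HTML if its name is in the hypothetical set `HTML_TAGS`; CDATA sections and `<PRE>` blocks are routed through a hypothetical `SPECIAL` table standing in for the CDATA processing unit 32 and the pre-format processing unit 31; and each emitted event stands in for invoking a processing unit. Declarations such as `<!DOCTYPE`, comments, and processing instructions are not handled.

```python
import re
from typing import Dict, List, Set, Tuple

# Assumption: a tag counts as HTML if its name is in this set (e.g. names from the default DTD);
# every other tag stays on the XML path.
HTML_TAGS: Set[str] = {"HTML", "HEAD", "TITLE", "BODY", "P"}

# Hypothetical special-tag table (cf. claims 5 and 6): opener -> (closer, kind).
SPECIAL: Dict[str, Tuple[str, str]] = {
    "<![CDATA[": ("]]>", "cdata"),  # CDATA processing unit 32
    "<PRE>": ("</PRE>", "pre"),     # pre-format processing unit 31
}

TAG_RE = re.compile(r"<(/?)([A-Za-z][\w:.-]*)[^>]*>")


def parse_mixed(doc: str, html_tags: Set[str] = HTML_TAGS,
                special: Dict[str, Tuple[str, str]] = SPECIAL) -> List[Tuple[str, str]]:
    """Single pass over mixed XML/HTML, routing each piece to the unit that handles it."""
    events: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(doc):
        lt = doc.find("<", pos)
        if lt < 0:
            events.append(("text", doc[pos:]))
            break
        if lt > pos:
            events.append(("text", doc[pos:lt]))
        for opener, (closer, kind) in special.items():
            if doc.startswith(opener, lt):
                end = doc.find(closer, lt + len(opener))
                if end < 0:
                    raise ValueError(f"unterminated {kind} section at offset {lt}")
                # Special content is passed through verbatim, without tag analysis.
                events.append((kind, doc[lt + len(opener):end]))
                pos = end + len(closer)
                break
        else:
            m = TAG_RE.match(doc, lt)
            if m is None:
                raise ValueError(f"malformed markup at offset {lt}")
            side = "end" if m.group(1) else "start"
            name = m.group(2)
            # HTML tags go to the HTML parser's start/end tag units; others stay XML.
            unit = "html" if name.upper() in html_tags else "xml"
            events.append((f"{unit}-{side}", name))
            pos = m.end()
    return events
```

For example, in `<HTML><memo>hi</memo></HTML>` the `HTML` tags are routed to the HTML side while `memo` stays on the XML path, all within one scan of the input.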

[0043] Also, according to the present invention, a default data type definition is used, so document data that is not associated with any data type definition can still be processed correctly.

[0044] Furthermore, according to the present invention, when a tag that may be omitted is present, the omitted end tag is supplied during processing, so the document can be processed correctly even when the tag is omitted.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a document data processing device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the structure of the browser software executed by the CPU 11.

FIG. 3 is an explanatory diagram showing an example of document data.

FIG. 4 is a flowchart showing the processing performed by the CPU 11 as an XML parser.

FIG. 5 is an explanatory diagram showing an example of operation parameters.

FIG. 6 is an explanatory diagram showing an example of a DTD.

FIG. 7 is an explanatory diagram showing an example of a tree structure.

FIG. 8 is an explanatory diagram showing an example of a line structure.

[Explanation of symbols]

11 CPU, 12 boot ROM, 13 RAM, 14 hard disk, 15 display, 16 operation unit, 17 LAN interface, 18 external storage device, 21 TCP/IP protocol analysis unit, 22 XML parse unit, 23 HTML parse unit, 24 browser core unit, 25 drawing unit, 31 pre-format processing unit, 32 CDATA processing unit, 33 start tag processing unit, 34 end tag processing unit.

Claims

1. A document data processing apparatus for processing tagged document data that includes at least one of partial document data described in HTML and partial document data described in XML, comprising: HTML parser means for processing predetermined document data and outputting it as an HTML document; and XML parser means for sequentially analyzing the tagged document data to be processed, one piece of partial document data at a time, and, according to the result of the analysis, selectively performing one of an operation of activating the HTML parser means and an operation of processing the partial document data as an XML document.

2. The document data processing apparatus according to claim 1, further comprising: means for acquiring a data type attribute of the tagged document data to be processed; and means for acquiring a default data type definition predefined in association with the data type attribute, wherein the XML parser means performs processing according to the acquired default data type definition.

3. The document data processing apparatus according to claim 2, wherein the XML parser means performs processing according to the acquired default data type definition only when there is no XML declaration in the tagged document data to be processed.

4. The document data processing apparatus according to claim 1, further comprising: means for acquiring the tagged document data to be processed and a data type definition associated with that document data; means for acquiring a data type attribute of the document data; and means for acquiring a default data type definition predefined in association with the data type attribute, wherein the XML parser means performs processing according to the acquired data type definition and the default data type definition.

5. The document data processing apparatus according to claim 1, further comprising: parser means, provided for each special tag, for processing partial document data related to that special tag; and means for setting at least one pair consisting of information on a special tag and the parser means corresponding to that special tag, wherein, when the XML parser means analyzes the tagged document data and detects a special tag as a result of the analysis, it activates the parser means corresponding to that special tag.

6. The document data processing apparatus according to claim 5, wherein the special tags include at least one of a tag related to CDATA and a tag related to preformat.

7. The document data processing apparatus according to claim 1, further comprising means for storing omissibility information for each tag, wherein, when the XML parser means detects a tag set as omissible, it analyzes whether that tag has been omitted and, when the result of the analysis shows that the tag has been omitted, processes the document data by supplying the omitted tag.

8. A document data processing apparatus for processing document data that includes at least one of partial document data described according to a first rule and partial document data described according to a second rule, comprising: a first parser for processing document data according to the first rule and outputting it; and a second parser for sequentially analyzing the document data to be processed, one piece of partial document data at a time, and, according to the result of the analysis, selectively performing one of an operation of activating the first parser and an operation of processing the analyzed partial document data according to the second rule.

9. A computer-readable recording medium storing a document data processing program for processing document data that includes at least one of partial document data described according to a first rule and partial document data described according to a second rule, the program comprising: a first parser module for processing document data according to the first rule and outputting it; and a second parser module for sequentially analyzing the document data to be processed, one piece of partial document data at a time, and, according to the result of the analysis, selectively performing one of an operation of activating the first parser module and an operation of processing the analyzed partial document data according to the second rule.