JP2005011183A

JP2005011183A - Information processor, information processing method, and program

Info

Publication number: JP2005011183A
Application number: JP2003176378A
Authority: JP
Inventors: Naoko Sato; 直子佐藤; Masatoshi Tagawa; 昌俊田川; Masayoshi Sakakibara; 正義榊原; Masaki Satake; 雅紀佐竹; Yoshiyuki Naito; 芳幸内藤
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2003-06-20
Filing date: 2003-06-20
Publication date: 2005-01-13

Abstract

PROBLEM TO BE SOLVED: To stop processing according to the processing state while a document that hierarchically describes a plurality of elements is being converted into a tree structure, before an error occurs because the converted tree-structure object cannot be stored due to insufficient storage capacity. SOLUTION: An XML document is input. The input XML document is analyzed, a partial tree whose nodes are the analyzed elements is created based on the analysis result, and the created partial tree is stored in a first storage part 14; these steps are repeated, so that a DOM tree composed of a plurality of partial trees is created. When a predetermined stop condition is judged to be satisfied during the repetition, the processing is temporarily stopped, nodes of the DOM tree already stored in the first storage part 14 are deleted in accordance with a deletion condition, and the processing is then restarted. COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、複数の要素が階層的に記述された文書を処理する情報処理装置、情報処理方法、及びプログラムに係り、特に該文書を構成する各要素をノードとする木構造で表現したオブジェクトを作成する情報処理装置、情報処理方法、及びプログラムに関するものである。
【０００２】
【従来の技術】
計算機あるいはアプリケーションによってばらばらであったデータ形式を、異なる計算機やアプリケーションでも使用できるようにするための統一規格として、ＸＭＬ（ｅＸｔｅｎｓｉｂｌｅＭａｒｋｕｐＬａｎｇｕａｇｅ）が知られている。ＸＭＬは、複数の要素を階層的に記述する構造化文書の代表的な規格である。
【０００３】
ＸＭＬ文書のような構造化文書をアプリケーションで操作するためのＡＰＩ（ＡｐｐｌｉｃａｔｉｏｎＰｒｏｇｒａｍｍｉｎｇＩｎｔｅｒｆａｃｅ）として、ＤＯＭ（ＤｏｃｕｍｅｎｔＯｂｊｅｃｔＭｏｄｅｌ）が知られている。ＤＯＭは、構造化文書を木構造のオブジェクトとして扱うためのＡＰＩである。
【０００４】
構造化文書をＤＯＭに変換することにより、アプリケーションは、構造化文書の木構造を認識でき、木をたどって必要な要素にアクセスすることができる。
【０００５】
しかし、構造化文書をＤＯＭに変換する場合、構造化文書のサイズが大きいほど、変換後の木構造のオブジェクトが大きくなり、これを記憶するために多くの記憶容量が必要となる。また、階層が深い場合にも、階層構造を表すための記憶容量が多く必要となる。従って、木構造に変換しても容量不足により記憶装置に格納できずに、エラーとなって処理が中断する、或いは格納できたとしてもアクセス効率が悪くなってしまう、等の問題が発生する。
【０００６】
必要な記憶容量を削減するための装置としては、構造化文書の階層を浅くすることにより動作メモリ量を削減し、データアクセス効率を改善する装置（例えば、特許文献１を参照。）や、複数の要素を、それらの位置関係を示す情報と共に一つの要素として合成する装置（例えば、特許文献２を参照。）が知られている。
【０００７】
【特許文献１】
特開２００２−２９７５６９号公報
【特許文献２】
特開２００２−１０８８５０号公報
【０００８】
【発明が解決しようとする課題】
しかしながら、上述した従来の装置は、実際の処理状態（例えば、オブジェクトの作成量等）を判断することなく処理する装置であるため、必要なだけ動作記憶容量を削減できない場合がある。従って、木構造に変換しても容量不足により記憶装置に格納できずにエラーとなって処理が中断する、という問題や、格納できたとしてもアクセス効率が悪くなってしまう、という問題は依然として解消されない。
【０００９】
本発明は、上述した問題を解決するために提案されたものであり、複数の要素を階層的に記述する文書を木構造に変換する処理の途中で、記憶容量が不足して変換後の木構造のオブジェクトを記憶できずにエラーが発生する前に、処理状態に応じて処理を停止することができる情報処理装置、情報処理方法、及びプログラムを提供することを目的とする。
【００１０】
【課題を解決するための手段】
上記目的を達成するために、本発明の情報処理装置は、複数の要素が階層的に記述された文書を入力する入力手段と、前記入力手段により入力された文書を解析する解析処理と、前記解析処理の解析結果に基づいて、前記文書を構成する各要素をノードとする木構造で表現したオブジェクトを作成する作成処理と、前記作成処理により作成されたオブジェクトを記憶する記憶処理と、を繰り返す処理手段と、予め定められた停止条件が成立したか否かを判断する判断手段と、前記判断手段により前記停止条件が成立したと判断された場合に、前記解析処理、前記作成処理、及び前記記憶処理の少なくとも１つが停止されるように前記処理手段を制御する制御手段と、を含んで構成されている。
【００１１】
入力手段は、複数の要素が階層的に記述された文書を入力する。複数の要素が階層的に記述された文書は、例えば、ＸＭＬ文書等に代表される構造化文書とすることができる。
【００１２】
処理手段は、解析処理と、作成処理と、記憶処理とを繰り返して行う。解析処理は、入力手段により入力された文書を解析する処理である。例えば、入力された文書を上から順に読み込みながら、文書に記述された各要素の階層構造を解析するようにすることができる。
【００１３】
作成処理は、解析処理の解析結果に基づいて、入力した文書を構成する各要素をノード（節点）とする木構造で表現したオブジェクトを作成する処理である。
【００１４】
記憶処理は、作成処理により作成されたオブジェクトを記憶する。オブジェクトを記憶するための記憶媒体は、特に限定されず、例えば、メインメモリとして一般的に用いられるＲＡＭとすることもできる。
【００１５】
判断手段は、予め定められた停止条件が成立したか否かを判断する。例えば、前記解析処理の解析量が所定の量に達した場合、前記作成処理の作成量が所定の量に達した場合、及び、予め指定された要素が前記解析処理で検出された場合、の少なくともいずれかの場合に、前記停止条件が成立したと判断することができる。
【００１６】
制御手段は、判断手段により停止条件が成立したと判断された場合に、解析処理、作成処理、及び記憶処理の少なくとも１つが停止されるように処理手段を制御する。
【００１７】
このように、停止条件が成立したと判断した場合には、解析処理、作成処理、及び記憶処理の少なくとも１つが停止されるように処理手段を制御するようにしたため、記憶容量が不足して、作成された木構造のオブジェクトを記憶できずにエラーが発生する前に、処理手段の解析処理、作成処理、及び記憶処理の少なくとも１つを停止することができる。
【００１８】
更に、本発明の情報処理装置は、前記停止条件の成立後、前記記憶されたオブジェクトのノードを削除する削除手段を更に含んで構成することもできる。
【００１９】
前記削除手段は、前記停止条件の成立後、外部から指示を受けた場合に、前記記憶されたオブジェクトのノードを削除するようにすることもできる。
【００２０】
すなわち、削除手段は、外部から、例えば、作成されたオブジェクトを参照或いは操作するアプリケーションから削除するように指示を受けた場合に、削除を行うことができる。
【００２１】
前記削除手段は、ノードを削除する際に、階層が深いノード及び該ノードより上位のノードを優先的に削除する、参照回数の少ない要素を起点とする木を構成するノードを優先的に削除する、最後に参照されたときからの経過時間が長い要素を起点とする木を構成するノードを優先的に削除する、他の要素から参照されている要素を起点とする木を構成するノード以外のノードを優先的に削除する、予め指定された要素を起点とする木を構成するノードを優先的に削除する、予め定義された集合に属する要素毎に削除する、予め定められた優先順位に従って削除する、外部から指定された要素を起点とする木を構成するノードを優先的に削除する、の少なくとも１つを実行するようにすることもできる。
【００２２】
例えば、階層が深いノードを含む木は、階層が深い分だけ、階層を表すための記憶容量が多く必要となる。従って、階層が深いノード及び該ノードより上位のノードを優先的に削除するようにすることが好ましい。
【００２３】
例えば、過去、作成されたオブジェクトを参照するアプリケーションにより参照された回数が少ない要素の場合には、今後も参照される可能性が低い。このため、参照回数の少ない要素を起点とする木を構成するノードを優先的に削除するのが好ましい。なお、ここでいう「参照」は、アプリケーションがオブジェクトを操作する場合も含む。
【００２４】
例えば、作成されたオブジェクトを参照するアプリケーションにより最後に参照されたときから長い時間が経過している場合には、今後も参照される可能性が低い。このため、最後に参照されたときからの経過時間が長い要素を起点とする木を構成するノードを優先的に削除するのが好ましい。なお、同様に、ここでいう「参照」は、アプリケーションがオブジェクトを操作する場合も含む。
【００２５】
例えば、ある要素の情報を他の要素も使う場合、すなわち、ある要素を他の要素が参照する場合には、参照先の要素が削除されてしまうと、ある要素を参照する他の要素は参照先の情報が使えなくなってしまう。このため、他の要素から参照されている要素を起点とする木を構成するノード以外のノードを優先的に削除するのが好ましい。
【００２６】
例えば、予め参照される可能性が低い要素がわかっている場合には、予めその要素を指定しておき、該要素を起点とする木を構成するノードを優先的に削除することができる。
【００２７】
例えば、作成されたオブジェクトを参照するアプリケーションが提供するサービス毎に、参照する要素が異なる場合には、各サービスと各サービスが参照する要素（サービスに属する要素）を定義しておき、サービス毎に削除することもできる。このように、予め定義された集合に属する要素毎に削除することができる。
【００２８】
例えば、予め参照される可能性が低い要素、或いは参照される可能性が高い要素がわかっている場合には、予め優先順位を定めておけば、該優先順位に従って削除することができる。
【００２９】
例えば、予め削除する要素を指定した場合には、該指定された要素を起点とする木を構成するノードを優先的に削除することができる。
【００３０】
また、前記制御手段は、前記削除手段により削除されたノードの情報を、前記記憶処理で記憶する記憶領域とは別の記憶領域に記憶するように更に制御することもできる。
【００３１】
前記制御手段は、外部から指示を受けた場合に、前記別の記憶領域に記憶するように制御することもできる。
【００３２】
前記制御手段は、前記削除手段により削除されたノードの情報をテキストデータとして記憶するように制御することもできる。
【００３３】
すなわち、削除されたノードの情報を、木構造のオブジェクトのまま記憶せずにテキストデータとして記憶することにより、記憶するデータサイズが小さくなる。
【００３４】
前記制御手段は、複数の同じ要素名、または複数の同じ属性名が存在する場合には、これら要素名または属性名を対応付けられた番号に置換えて記憶するように制御することもできる。
【００３５】
すなわち、複数の同じ要素名を対応付けられた番号に置換えて格納することにより、記憶するデータサイズが小さくなる。また、要素名に限らず、要素に付されている属性（要素の特性を示す情報）の名前（属性名）についても、同じ名前が複数存在する場合には、同様にして記憶するようにすることができる。
【００３６】
本発明の情報処理装置は、予め定められた再開条件が成立したか否かを判断する再開条件判断手段を更に含み、前記制御手段は、前記停止条件の成立後、前記再開条件判断手段により前記再開条件が成立したと判断された場合に、前記停止を解除し、停止している処理が再開されるように前記処理手段を制御するようにすることもできる。
【００３７】
再開条件判断手段は、予め定められた再開条件が成立したか否かを判断する。例えば、外部から再開の指示を受けた場合、及び前記記憶処理により記憶された前記オブジェクトの記憶量が所定の量以下となった場合、の少なくとも一方の場合に、前記再開条件が成立したと判断するようにすることもできる。
【００３８】
制御手段は、停止条件の成立後、再開条件判断手段により再開条件が成立したと判断された場合に、停止を解除し、停止している処理が再開されるように処理手段を制御する。
【００３９】
前記処理手段は、予め指定された要素が前記解析処理で検出された場合には、該要素及び該要素に従属する要素については前記作成処理を行わないようにすることもできる。
【００４０】
作成処理を行わないように指定する要素は、例えば、作成されたオブジェクトを参照・操作するアプリケーションが参照・操作する頻度の低い要素とすることができる。
【００４１】
前記処理手段の記憶処理は、前記作成処理を行わなかった要素及び該要素に従属する要素の情報を、前記作成されたオブジェクトを記憶する記憶領域とは別の記憶領域に記憶するようにすることもできる。
【００４２】
本発明の情報処理方法は、複数の要素が階層的に記述された文書を入力する入力工程と、前記入力工程により入力された文書を解析する解析処理と、前記解析処理の解析結果に基づいて、前記文書を構成する各要素をノードとする木構造で表現したオブジェクトを作成する作成処理と、前記作成処理により作成されたオブジェクトを記憶する記憶処理と、を繰り返す処理工程と、予め定められた停止条件が成立したか否かを判断する判断工程と、前記判断工程により前記停止条件が成立したと判断された場合に、前記解析処理、前記作成処理、及び前記記憶処理の少なくとも１つを停止する停止工程と、を含んで構成されている。
【００４３】
この情報処理方法によれば、停止条件が成立したと判断した場合には、解析処理、作成処理、及び記憶処理の少なくとも１つを停止するようにしたため、記憶容量が不足して、作成された木構造のオブジェクトを記憶できずにエラーが発生する前に、解析処理、作成処理、及び記憶処理の少なくとも１つを停止することができる。
【００４４】
本発明の情報処理方法は、前記停止条件の成立後、前記記憶されたオブジェクトのノードを削除する削除工程を更に含んで構成することもできる。
【００４５】
本発明の情報処理方法は、前記削除工程により削除されたノードの情報を、前記記憶処理で記憶する記憶領域とは別の記憶領域に記憶する記憶工程を更に含んで構成することもできる。
【００４６】
本発明の情報処理方法は、予め定められた再開条件が成立したか否かを判断する再開条件判断工程と、前記停止条件の成立後、前記再開条件判断工程により前記再開条件が成立したと判断された場合に、前記停止を解除し、停止している処理を再開する再開工程と、を更に含んで構成することもできる。
【００４７】
本発明のプログラムは、コンピュータに、複数の要素が階層的に記述された文書を入力する入力工程と、前記入力工程により入力された文書を解析する解析処理と、前記解析処理の解析結果に基づいて、前記文書を構成する各要素をノードとする木構造で表現したオブジェクトを作成する作成処理と、前記作成処理により作成されたオブジェクトを記憶する記憶処理と、を繰り返す処理工程と、予め定められた停止条件が成立したか否かを判断する判断工程と、前記判断工程により前記停止条件が成立したと判断された場合に、前記解析処理、前記作成処理、及び前記記憶処理の少なくとも１つを停止する停止工程と、を実行させる。
【００４８】
このようなプログラムをコンピュータに実行させることにより、停止条件が成立したと判断した場合には、解析処理、作成処理、及び記憶処理の少なくとも１つを停止するようにしたため、記憶容量が不足して、作成された木構造のオブジェクトを記憶できずにエラーが発生する前に、解析処理、作成処理、及び記憶処理の少なくとも１つを停止することができる。
【００４９】
なお、プログラムを記憶する記憶媒体は、特に限定されず、ＲＯＭや、ＣＤ−ＲＯＭやＤＶＤディスク、光磁気ディスクやＩＣカード、あるいはハードディスク等であってもよいし、電気通信回線上の搬送波のような伝送媒体であってもよい。
【００５０】
本発明のプログラムは、前記停止条件の成立後、前記記憶されたオブジェクトのノードを削除する削除工程を更に含んで構成することもできる。
【００５１】
本発明のプログラムは、前記削除工程により削除されたノードの情報を、前記記憶処理で記憶する記憶領域とは別の記憶領域に記憶する記憶工程を更に含んで構成することもできる。
【００５２】
本発明のプログラムは、予め定められた再開条件が成立したか否かを判断する再開条件判断工程と、前記停止条件の成立後、前記再開条件判断工程により前記再開条件が成立したと判断された場合に、前記停止を解除し、停止している処理を再開する再開工程と、を更に含んで構成することもできる。
【００５３】
【発明の実施の形態】
以下、本発明の好ましい実施の形態について図面を参照しながら詳細に説明する。
【００５４】
［第１の実施の形態］
図１（Ａ）は、本実施の形態に係る構造化文書処理システムの機能構成及び構造化文書処理システムで行われる処理を概念的に示した図である。図示されるように、テキスト形式で記述されたＸＭＬ文書３０が、本発明の情報処理装置としてのＸＭＬプロセッサ（ＸＭＬパーザ）１０に入力されると、ＸＭＬプロセッサ１０は、ＸＭＬ文書３０の各要素をノードとする木構造で表現したオブジェクト（ＤＯＭツリー４０）を作成する。上位ＸＭＬアプリケーション５０は、作成されたＤＯＭツリー４０を参照或いは操作することができる。。
【００５５】
図１（Ｂ）は、図１（Ａ）で示された構造化文書処理システムの機能構成及び構造化文書処理システムで行われる処理を更に詳細に示した図である。
【００５６】
（１）では、上位ＸＭＬアプリケーション５０がＸＭＬ文書３０を参照及び操作するために、ＤＯＭインタフェース６０を介して、ＸＭＬプロセッサ１０にＤＯＭツリー４０を作成するように指示を与える。
【００５７】
（２）では、指示を受けたＸＭＬプロセッサ１０は、ＸＭＬ文書３０を入力して解析する。
【００５８】
図２は、ＸＭＬ文書３０の一例を示している。図示されるようにＸＭＬ文書３０は、テキスト形式で、要素の始まりを示す開始タグ＜○○○＞と要素の終了を示す終了タグ＜／○○○＞により、複数の要素が階層的に記述されている。
【００５９】
具体的には、ＸＭＬプロセッサ１０は、図２に示されるようなＸＭＬ文書３０を上から順に読み込み、開始タグ及び終了タグを検出していくことにより、ＸＭＬ文書３０の各要素の階層構造を解析する。
【００６０】
更にＸＭＬプロセッサ１０は、解析結果に基づいて、ＤＯＭツリー４０を作成する。
【００６１】
図３は、図２のＸＭＬ文書３０を解析して作成されたＤＯＭツリー４０の一例を示している。図示されるように、各要素（Ｅｌｅｍｅｎｔ）が木構造のノード（節点）として構成されている（例えば、図のノード７０）。また、要素の内容、すなわち、開始タグと終了タグで挟まれた文字列（Ｔｅｘｔ）もノードとして構成されている（例えば、図のノード７２）。
【００６２】
なお、図３の▲１▼で示された部分が、図２の▲１▼のｓｃａｎ要素の記述に対応する部分木（全体の木の一部）である。また、図３の▲２▼で示された部分が、図２の▲２▼のｓｔｏｒａｇｅ要素の記述に対応する部分木である。
【００６３】
このように、ＤＯＭツリー４０は、複数の部分木により構成されている。
【００６４】
（３）では、上位ＸＭＬアプリケーション５０が、作成されたＤＯＭツリー４０をＤＯＭインタフェース６０を介して参照或いは操作する。
【００６５】
図４は、ＸＭＬプロセッサ１０の機能を実現するためのハードウェア資源としての情報処理装置１１の構成を示すブロック図である。
【００６６】
図示されるように、情報処理装置１１は、操作部１２、第１記憶部１４、入出力部１６、ＣＰＵ２０、ネットワークＩ／Ｆ２２、及びＲＯＭ２４、を含んで構成され、これらはバスにより相互に接続されている。
【００６７】
操作部１２は、例えばキーボード等により構成され、ユーザは操作部１２を用いて任意のデータを入力し、所定の操作を行う。
【００６８】
ネットワークＩ／Ｆ２２は、各種ネットワークに接続するためのインタフェースである。
【００６９】
ＲＯＭ２４は、ＸＭＬプロセッサ１０の機能としての、ＸＭＬ文書からＤＯＭツリーを作成するためのＤＯＭツリー作成処理ルーチンのプログラムが記憶されている。更にＲＯＭ２４には、ＸＭＬ文書を扱う上位ＸＭＬアプリケーション５０とＤＯＭインタフェース６０の機能を実現するためのプログラムも記憶されている。
【００７０】
ＣＰＵ２０は、ＲＯＭ２４等に記憶されたプログラムを実行することにより各種機能を実現する。
【００７１】
第１記憶部１４は、例えばＲＡＭにより構成され、作成されたＤＯＭツリー等を記憶する。
【００７２】
入出力部１６は、各種データを入出力するためのインタフェースである。入出力部１６には、第２記憶部１８が接続されている。第２記憶部１８は、例えば、第１記憶部１４に比してアクセススピードは遅いが、記憶容量が大きい記憶媒体、例えば、ハードディスク装置等とすることができる。
【００７３】
図１に示されたＸＭＬプロセッサ１０は、図４に示される情報処理装置１１をハードウェア資源とし、ＲＯＭ２４に記憶されたプログラムをソフトウェア資源として用いて実現される機能である。なお、上位ＸＭＬアプリケーション５０及びＤＯＭインタフェース６０も、同様にして実現される機能である。
【００７４】
以下、ＤＯＭツリー作成処理ルーチンを、図５を参照して説明する。
【００７５】
ステップ１００では、ＸＭＬプロセッサ１０は上位ＸＭＬアプリケーション５０からのＤＯＭツリー作成指示を受け、構造化文書（ここではＸＭＬ文書）を取得（入力）する。例えば、ＸＭＬ文書が記憶された記憶装置が第２記憶部１８である場合には、第２記憶部１８から入出力部１６を介して取得され、ネットワーク上の他のコンピュータシステムに記憶されている場合には、ネットワークＩ／Ｆ２２を介して取得される。
【００７６】
ステップ１０２では、取得したＸＭＬ文書を解析する。前述したように、開始タグ及び終了タグを順に検出することにより、これらタグで記述された要素毎にその構造と内容を解析する。
【００７７】
ステップ１０４では、ステップ１０２の解析結果に基づいて、木構造のオブジェクト（ＤＯＭツリー）を作成する。より詳しくは、解析された要素をノードとする部分木としてのＤＯＭツリーを作成する。
【００７８】
以下、この部分木を作成する処理については、部分木作成処理と呼称し、ＤＯＭツリー作成処理と区別して説明する。また、この部分木作成処理で作成されたＤＯＭツリーは、部分木と呼称することにより全体のＤＯＭツリーと区別して説明する。
【００７９】
ステップ１０６では、第１記憶部１４の空容量が不足しているか否かを判断する。ここで、空容量が不足していない、すなわち、ステップ１０４で作成された部分木を格納するのに十分な空容量があると判断した場合には、ステップ１０８で、作成された部分木を第１記憶部１４に格納する。ステップ１０６で、空容量が不足していると判断した場合には、ステップ１１０の第２記憶部格納処理に移行し、作成された部分木を第２記憶部１８に格納する。
【００８０】
図６は、第２記憶部格納処理の流れを示すフローチャートである。
【００８１】
ステップ２００では、第２記憶部１８の空容量が不足しているか否かを判断する。ここで、空容量が不足していないと判断した場合には、ステップ２０２で、作成された部分木をそのまま第２記憶部１８に格納する。また、空容量が不足していると判断した場合には、ステップ２０４に移行し、作成された部分木をテキストデータに変換して、データサイズを小さくする。変換後は、ステップ２０２に移行し、テキストデータに変換されたものを第２記憶部１８に格納する。
【００８２】
このように、作成された部分木を第１記憶部１４或いは第２記憶部１８に格納した後は、図５のステップ１１２に移行し、停止条件が成立したか否かを判断する。
【００８３】
停止条件は、予め設定されており、ここでは、ＤＯＭツリーを格納するための使用メモリ量を予め定めておき、ＤＯＭツリーの作成量（部分木の作成量の累積）が該定められた量に達した場合に停止条件が成立したと判断するように設定されている。使用メモリ量は、任意に設定可能である。
【００８４】
また、予め指定された要素がステップ１０２の解析処理中に検出された場合に、停止条件が成立したと判断されるように設定されていてもよい。この要素の指定も、予めユーザが任意に設定することができる。
【００８５】
更にまた、ＤＯＭツリーの作成量が予め定められた量に達した場合、及び、予め指定された要素が検出された場合、の少なくともいずれか一方の場合に、停止条件が成立したと判断されるように設定されていてもよい。
【００８６】
ステップ１１２で、停止条件は成立していないと判断した場合には、ステップ１１６に移行し、ＸＭＬ文書全体の解析が終了したか否かを判断する。
【００８７】
ステップ１１６でＸＭＬ文書全体の解析が終了していないと判断した場合には、ステップ１０２に戻り、更にＸＭＬ文書を読み込んで、次の開始タグ及び終了タグを検出することにより、ＸＭＬ文書の解析を続ける。
【００８８】
ステップ１１２で、停止条件が成立したと判断した場合には、ステップ１１４に移行し、停止条件成立時の処理を行う。
【００８９】
図７は、停止条件成立時の処理の流れを示すフローチャートである。
【００９０】
ステップ３００で、上位ＸＭＬアプリケーション５０に停止条件が成立した旨を通知する。なお、ここで、これまで作成したＤＯＭツリーを上位ＸＭＬアプリケーション５０に渡すこともできる。その場合には、ＸＭＬプロセッサ１０は上位ＸＭＬアプリケーション５０に対して、作成したＤＯＭツリーの格納場所をＤＯＭインタフェース６０を介して通知する形でＤＯＭツリーを渡す。
【００９１】
ステップ３０２で、上位ＸＭＬアプリケーション５０から削除要求を取得したか否かを判断する。削除要求を取得していないと判断した場合には、ステップ３０８に移行し、第２記憶部への格納要求を取得したか否かを判断する。ここで第２記憶部への格納要求を取得していないと判断した場合には、ステップ３０２に戻る。すなわち、ＸＭＬプロセッサ１０は、上位ＸＭＬアプリケーション５０からの要求を取得するまで待機状態を維持する。
【００９２】
ステップ３０２で、削除要求を取得したと判断した場合には、ステップ３０４に移行し、削除する要素（ノード）の検索を行う。
【００９３】
例えば、階層が深いノード及び該ノードより上位のノードを検索するようにし、該ノードが優先的に削除されるようにしてもよい。
【００９４】
以下、この検索処理について図８及び図９を参照しながら、詳細に説明する。図８は、元のＸＭＬ文書を示し、図９は、図８のＸＭＬ文書から作成されて第１記憶部１４に格納されているＤＯＭツリーを示している。
【００９５】
図９に示されるように、点線▲３▼で囲まれた部分木のａｕｔｈｏｒ要素は、同じ階層の他の要素ｐ、ｔｉｔｌｅと比較して、階層が深いノードを従属している。例えば、要素ノード（ｆｉｒｓｔ、ｌａｓｔ）やＴｅｘｔノード（ａａａ、ｂｂｂ、ｓｓｓｓ）は、他の要素ｐ、ｔｉｔｌｅを起点とする部分木の末端のノードの階層より深い。
【００９６】
従って、ここで削除の対象として検索されるノードは、点線▲３▼で囲まれた部分木のノードである。
【００９７】
このようなノードを削除対象とすることにより、浅い階層の部分木、すなわちデータサイズの小さい部分木を削除する場合に比べて、第１記憶部１４の空容量を増やすことができる。なお、後述するように、第１記憶部１４から削除したノードの情報を第２記憶部１８に格納しておき、必要に応じて第２記憶部１８から読み出して第１記憶部１４に展開することができるようにする場合には、大きいサイズの部分木を構成するノードを削除して格納する方が、小さいサイズの部分木を多く削除して格納する場合に比して、第２記憶部１８から読み出して第１記憶部１４に展開する処理回数を減らすことができ、装置の負荷を低減させることができる。
【００９８】
なお、上位ＸＭＬアプリケーション５０の削除要求に、削除する要素の指示が含まれていてもよい。これにより、上位ＸＭＬアプリケーション５０からの指示に基づいて削除する要素（ノード）の検索を行うことができる。
【００９９】
ステップ３０６で、検索したノードを第１記憶部１４から削除する。
【０１００】
また、削除要求を取得する代わりに、ステップ３０８で、第２記憶部格納要求を取得した場合には、ステップ３１０及びステップ３１２で、上述したステップ３０４及びステップ３０６と同様の処理を行った後、ステップ３１４の第２記憶部格納処理に移行し、削除したノードの情報を第２記憶部に格納する。第２記憶部１８に格納することにより、必要に応じて、削除したノードを第１記憶部１４に展開して用いることができる。第２記憶部格納処理の流れの詳細は前述した通りであるため、説明を省略する。
【０１０１】
ステップ３０６またはステップ３１４の処理の後は、ステップ３１６に移行し、第１記憶部１４に記憶されているＤＯＭツリーの記憶量が指定量以下になったか否かを判断する。指定量以下になっていないと判断した場合には、ステップ３００に戻り、上位ＸＭＬアプリケーション５０にＤＯＭツリーの記憶量が指定量以下になっていない旨を通知して、ステップ３０２以降の処理を繰り返す。ステップ３００からステップ３１６までの処理は、ステップ３１６で肯定判断されるまで繰り返される。
【０１０２】
ステップ３１６で、肯定判断した場合には、ステップ３１８で、上位ＸＭＬアプリケーション５０に対して、ＤＯＭツリーの記憶量が指定量以下になったことを通知する。
【０１０３】
ステップ３２０で、上位ＸＭＬアプリケーション５０から、処理を再開する要求（再開要求）を取得したか否かを判断する。再開要求を取得するまでは、待機が維持されるので、次の解析処理以降が停止される。再開要求を取得した場合には、図５のステップ１１６に戻り、解析処理を再開する。
【０１０４】
図５において、ステップ１１６で肯定判断されるまでは、ステップ１０２のからステップ１１６の処理が繰り返され、順に部分木が作成されて記憶される。
【０１０５】
ステップ１１６で肯定判断した場合には、ステップ１１８に移行し、作成したＤＯＭツリーを上位ＸＭＬアプリケーション５０に渡し、ＤＯＭツリー作成処理を終了する。なお、ここでは、上位ＸＭＬアプリケーション５０及びＤＯＭインタフェース６０はＸＭＬプロセッサ１０と同じハードウェア資源を用いて実現されているため、ＸＭＬプロセッサ１０は、上位ＸＭＬアプリケーション５０に、作成したＤＯＭツリーの格納場所をＤＯＭインタフェース６０を介して通知する形でＤＯＭツリーを渡す。
【０１０６】
以上説明したように、入力したＸＭＬ文書を解析する処理と、解析結果に基づいて部分木を作成する処理と、作成した部分木を第１記憶部１４に格納する処理とを繰り返す間に、停止条件が成立したと判断された場合には、少なくとも１つの処理を停止するようにしたため、第１記憶部１４の容量不足により、作成した部分木を格納できずにエラーが発生する前に、処理を停止することができる。
【０１０７】
また、停止条件が成立した後、第１記憶部１４に記憶されたＤＯＭツリーのノードを削除するようにしたため、第１記憶部１４が容量不足になる事態を防止できる。
【０１０８】
また、削除したノードの情報を、必要に応じて第２記憶部１８に格納するようにしたため、上位ＸＭＬアプリケーション５０の操作に必要なノードを削除した場合であっても、該ノードの情報は失われず、必要に応じて第２記憶部１８から読み出して第１記憶部１４に展開して用いることができる。
【０１０９】
また、停止条件が成立した後、第１記憶部１４に格納したＤＯＭツリーの記憶量が指定量以下となった場合に、停止状態を解除し、停止した処理を再開するようにしたため、ＸＭＬ文書全体について解析を行うことができ、処理が中断されたまま終了するような事態を防止できる。
【０１１０】
なお、削除処理は、上述した例に限定されず、例えば、予め定義された集合に属する要素毎に削除するようにしてもよい。更に、定義された集合毎に優先順位を定めておき、優先順位の高い（或いは低い）集合に属する要素から削除するようにしてもよい。集合を定義する方法として、例えば、取り扱う文書がＸＭＬ文書の場合には、ＸＭＬのＮａｍｅｓｐａｃｅという規格を採用することができる。
【０１１１】
Ｎａｍｅｓｐａｃｅは、要素がどの集合に属するかを指定するために用いられる。
【０１１２】
以下、図１０及び図１１を参照しながら、Ｎａｍｅｓｐａｃｅを用いて集合を定義し、定義した集合毎に優先順位を設けて削除する例について説明する。
【０１１３】
図１０は、Ｎａｍｅｓｐａｃｅを定義するテーブルの一例であり、例えば、上位ＸＭＬアプリケーションの起動時等にＸＭＬプロセッサ１０に登録される。Ｓｅｒｖｉｃｅｎａｍｅ８０は、上位ＸＭＬアプリケーション５０が提供するサービス（すなわち、集合）を定義する。ｎａｍｅｓｐａｃｅＰｒｅｆｉｘ８２は、定義されたサービスと各要素を結びつけるための接頭辞であり、この接頭辞をＸＭＬ文書の開始タグ等に記述しておくことにより、各要素を接頭辞で表されるサービスに属する要素として関連付けることができる。ここで、ｎｏｎｅは、どのサービスにも属さないことを意味する。Ｐｒｉｏｒｉｔｙ８４は、優先順位を定義する。この項目を参照することにより、優先順位の高い（或いは低い）サービスの接頭辞が記述された要素を優先的に削除することができる。
【０１１４】
図１１は、定義された接頭辞を用いて記述されているＸＭＬ文書の一例を示している。
【０１１５】
図示されるように、ａという接頭辞が記述されている要素（例えば＜ａ：ａｃｔｉｏｎ＞及び＜／ａ：ａｃｔｉｏｎ＞で示される要素）は、ａｓｅｒｖｉｃｅに属するとみなされる。また、ｂという接頭辞が記述されている要素（例えば、＜ｂ：ａｃｔｉｏｎ＞及び＜／ｂ：ａｃｔｉｏｎ＞で示される要素）は、ｂｓｅｒｖｉｃｅに属するとみなされる。また、接頭辞の記述がない要素は、どのサービスにも属さないとみなされる。ここで、優先順位の低いサービスが優先的に削除されるように設定されている場合には、図１０に示されるテーブルに従い、接頭辞の記述のない要素ｄａｔｅのノードと該ノードに従属するノードが優先的に削除される。
【０１１６】
例えば、要素を印刷サービスに用いる場合には１度参照するとその後は参照されない場合が多いため、優先順位を低く設定して優先的に削除されるようにし、画面表示サービスに用いる場合には再度参照される可能性が高いため、優先順位を高く設定して削除されないようにすることができる。
【０１１７】
更に、削除処理は、このような例に限定されず、例えば、参照回数の少ない要素を起点とする部分木を構成するノードを優先的に削除するようにしてもよいし、最後に参照されたときからの経過時間が長い要素を起点とする部分木を構成するノードを優先的に削除するようにしてもよい。参照回数が少ない要素、或いは最後に参照されたときからの経過時間が長い要素は、上位ＸＭＬアプリケーション５０が参照或いは操作する可能性が低いとみなせるため、これらを削除の対象とすることが好ましい。このような削除処理を行う場合には、例えば、上位ＸＭＬアプリケーション５０がどの要素をいつ参照したか、という情報を所定のテーブルに登録しておき、このテーブルを参照することにより判断してもよい。
【０１１８】
また、他の要素から参照されている要素を起点とする部分木を構成するノードは削除されないようにすることが好ましい。例えば、ＸＭＬで用いられるＸＰａｔｈを用いて他の要素から参照されている要素については、削除対象から外すようにする。ＸＰａｔｈは、ＤＯＭツリーを構成するノードの位置情報を記述する言語である。ある要素の位置情報を示すＸｐａｔｈを、他の要素に保持させておけば、該Ｘｐａｔｈで指示された要素の内容を該他の要素が参照することができる。他の要素から参照されている要素を削除すると、削除された要素の内容を他の要素が参照できなくなるため、このような要素を起点とする部分木を構成するノードについては削除の対象としないことが好ましい。
【０１１９】
また、予め参照される可能性が低い要素がわかっている場合には、予めその要素を指定しておき、該要素を起点とする木を構成するノードを優先的に削除するようにしてもよい。
【０１２０】
また、要素毎に優先順位を定めておいてもよい。これにより所望のＤＯＭツリーを作成することができる。
【０１２１】
また、上位ＸＭＬアプリケーション５０から、削除する要素の指定があった場合には、該指定された要素を起点とする木を構成するノードを優先的に削除するようにすることができる。
【０１２２】
なお、削除処理は、これらのうち複数を実行してもよいし、いずれか１つを実行するようにしてもよい。
【０１２３】
また、このような削除の仕方は、予めＸＭＬプロセッサ１０に設定しておいてもよいし、上位ＸＭＬアプリケーション５０から指定されてもよいし、ユーザが任意に設定するようにしてもよい。
【０１２４】
［第２の実施の形態］
第１の実施の形態では、ＤＯＭツリーの作成量（部分木の作成量の累積）が所定の量に達した場合、或いは、予め指定された要素が検出された場合に、停止条件が成立したと判断されるように設定されている例について説明したが、本実施の形態では、ＤＯＭツリーの作成量に代えて、解析処理の解析量を用いる例について説明する。
【０１２５】
なお、本実施の形態におけるハードウェア構成及びソフトウェア構成は第１の実施の形態と同様であるため、説明を省略する。
【０１２６】
図１２は、本実施の形態に係るＤＯＭツリー作成処理ルーチンの流れを示したフローチャートである。
【０１２７】
本実施の形態では、ステップ４００の構造化文書取得処理、ステップ４０２の解析処理の後、ステップ４０４で、停止条件が成立したか否かを判断する。
【０１２８】
本実施の形態では、解析処理の解析量（累積）が所定の量に達した場合に、停止条件が成立したと判断されるように設定されている。ここでいう解析量は、ＸＭＬプロセッサ１０が読み込んで解析したＸＭＬ文書の量である。なお、該所定の量はユーザが予め任意に設定することができる。
【０１２９】
また、第１の実施の形態と同様に、予め指定された要素がステップ４０２の解析処理中に検出された場合に、停止条件が成立したと判断されるように設定されていてもよい。この要素の指定も、予めユーザが任意に設定することができる。
【０１３０】
更にまた、解析処理の解析量（累積）が所定の量に達した場合、及び、予め指定された要素が検出された場合、の少なくともいずれか一方の場合に、停止条件が成立したと判断されるように設定されていてもよい。
【０１３１】
ステップ４０４で、停止条件は成立していないと判断した場合には、ステップ４０８に移行し、ステップ４０２の解析結果に基づいて、部分木を作成する。
【０１３２】
ステップ４０４で、停止条件が成立したと判断した場合には、ステップ４０６に移行し、停止条件成立時の処理を行う。停止条件成立時の処理は、第１の実施の形態と同様であるため説明を省略する。
【０１３３】
停止条件成立時の処理後は、ステップ４０８に移行して、ステップ４０２の解析結果に基づいて、部分木の作成を再開する。
【０１３４】
ステップ４１０では、作成された部分木を第１記憶部１４に格納する。
【０１３５】
ステップ４１２では、ＸＭＬ文書全体の解析が終了したか否かを判断する。ここでＸＭＬ文書全体の解析が終了していないと判断した場合には、ステップ４０２に戻り、更にＸＭＬ文書を読み込んで、ＸＭＬ文書の解析を続ける。
【０１３６】
ステップ４１２でＸＭＬ文書全体の解析が終了したと判断した場合には、ステップ４１４に移行し、作成したＤＯＭツリーを上位ＸＭＬアプリケーション５０に渡し、ＤＯＭツリー作成処理を終了する。
【０１３７】
以上説明したように、停止条件が成立したか否かを、解析処理の解析量によって判断するようにしたため、部分木作成処理及び記憶処理前に、停止条件が成立したか否かを判断でき、部分木作成処理を停止して、停止条件成立時の処理を行い第１記憶部１４の空容量を増加させることができる。
【０１３８】
なお、停止条件の成立の判断は、第１の実施の形態及び第２の実施の形態で示した例に限らず、例えば、解析処理の解析量（累積）が所定の量に達した場合、ＤＯＭツリーの作成量が予め定められた量に達した場合、及び、予め指定された要素が検出された場合、の少なくともいずれかの場合に、停止条件が成立したと判断されるように設定されていてもよい。
【０１３９】
［第３の実施形態］
第１の実施の形態及び第２の実施の形態における停止条件成立時の処理では、ＸＭＬプロセッサ１０が上位ＸＭＬアプリケーション５０から削除要求や第２記憶部格納要求を取得した場合に、削除処理や第２記憶部への格納処理を行う例について説明したが、本実施の形態では、ＸＭＬプロセッサ１０により自動的に削除処理や第２記憶部への格納処理を行う例について説明する。
【０１４０】
図１３は、本実施の形態に係る停止条件成立時の処理の流れを示すフローチャートである。
【０１４１】
ステップ５００で、削除する要素（ノード）の検索を行う。
【０１４２】
ステップ５０２で、ステップ５００で検索されたノードを第１記憶部１４から削除する。なお、削除処理は、第１の実施の形態と同様に行うことができる。
【０１４３】
ステップ５０４で、削除したノードの情報を第２記憶部１８へ格納するか否かを判断する。例えば、第１記憶部１４から削除したノードが常に第２記憶部１８に格納されるように設定されていれば、ＸＭＬプロセッサ１０はステップ５０４で肯定判断し、ステップ５０６に移行して削除したノードについて第２記憶部格納処理を実行する。なお、ステップ５０６の第２記憶部格納処理の流れの詳細は、前述した通りであるため、説明を省略する。
【０１４４】
なお、予め指定されている要素についてのみ第２記憶部に格納するように設定されていてもよい。この場合には、該指定のある要素ノード及び該要素の内容を示すＴｅｘｔノードが削除された場合のみ、第２記憶部格納処理を実行する。
【０１４５】
また、第２記憶部１８への格納はいっさい行わないように設定されている場合には、ステップ５０４で常に否定判断し、ステップ５０８に移行する。
【０１４６】
ステップ５０８で、第１記憶部１４に記憶されているＤＯＭツリーの記憶量が指定量以下になったか否かを判断する。指定量以下になっていないと判断した場合には、ステップ５００に戻り、上述の処理を繰り返す。ステップ５００からステップ５０８までの処理は、ステップ５０８で肯定判断されるまで繰り返される。
【０１４７】
ステップ５０８で、肯定判断した場合には、ステップ５１０で、上位ＸＭＬアプリケーション５０に対して、ＤＯＭツリーの記憶量が指定量以下になったことを通知する。
【０１４８】
ステップ５２０で、上位ＸＭＬアプリケーション５０から、処理を再開する要求（再開要求）を取得したか否かを判断する。再開要求を取得するまでは、待機が維持されるので、次の処理（第１の実施の形態では解析処理、第２の実施の形態では部分木作成処理）以降が停止される。再開要求を取得した場合には、図５のステップ１１６或いは図１２のステップ４０８に戻り、処理を再開する。
【０１４９】
以上説明したように、停止条件成立時の処理を、上位ＸＭＬアプリケーション５０からの削除要求或いは第２記憶部格納要求を取得せずに、自動的に判断して削除処理及び格納処理を行うことができる。
【０１５０】
［第４の実施形態］
本実施の形態では、指定された要素については、停止条件が成立したか否かに拘らず、部分木を作成せずに元のＸＭＬ文書のテキスト形式のまま第２記憶部１８に格納する例について説明する。
【０１５１】
第１から第３の実施の形態におけるＤＯＭツリー作成処理ルーチンに、図１４に示されるステップを追加する。
【０１５２】
図５のステップ１０２、或いは図１２のステップ４０２の後、ステップ６００で、解析した要素が、部分木作成前にテキスト形式のまま第２記憶部１８へ格納するように指定されている要素か否かを判断する。
【０１５３】
ここで、否定判断した場合には、第２記憶部１８への格納を行わずに、図５のステップ１０４、或いは図１２のステップ４０４に移行する。
【０１５４】
ステップ６００で、肯定判断した場合には、ステップ６０２で、解析した要素を、テキスト形式のまま第２記憶部１８に格納する。格納後は、図５のステップ１１６、及び図１２のステップ４１２に移行する。
【０１５５】
このように、予め指定のある要素については、部分木を作成せずに第２記憶部に記憶するようにしたため、第１記憶部１４におけるＤＯＭツリーの記憶量を削減することができると共に、無駄な処理を省き、効率的にＤＯＭツリーを作成することができる。
【０１５６】
［第５の実施形態］
本実施の形態では、指定された要素については、停止条件が成立したか否かに拘らず、作成された部分木を常に第２記憶部１８に格納する例について説明する。
【０１５７】
第１から第３の実施の形態におけるＤＯＭツリー作成処理ルーチンに、図１５に示されるステップを追加する。
【０１５８】
図５のステップ１０６、或いは図１２のステップ４０８の後、ステップ７００で、部分木を作成した要素が予め第２記憶部１８へ格納するように指定されている要素か否かを判断する。
【０１５９】
ここで、否定判断した場合には、第２記憶部１８への格納を行わずに、図５のステップ１０８、或いは図１２のステップ４１０に移行する。
【０１６０】
ステップ７００で、肯定判断した場合には、ステップ７０２の第２記憶部格納処理に移行し、作成された部分木を第２記憶部１８に格納する処理を行う。第２記憶部格納処理は、第１の実施の形態と同様であるため説明を省略する。格納後は、図５のステップ１１６、及び図１２のステップ４１２に移行する。
【０１６１】
このように、予め指定のある要素については、停止条件が成立したか否かに拘らず、作成された部分木を第２記憶部に記憶するようにしたため、第１記憶部１４におけるＤＯＭツリーの記憶量を削減することができる。
【０１６２】
［第６の実施形態］
本実施の形態では、第２記憶部１８に同じ要素名を有する要素を複数格納する場合には、該同じ要素名に所定の番号を対応付けておき、該要素名を対応付けられた番号に置換えて記憶する例について説明する。
【０１６３】
図１６は、複数の同じ要素名が記述されたＸＭＬ文書の一例を示した図である。図示されるように、ｃａｔａｌｏｇ、ｂｏｏｋｎａｍｅという名前の要素が複数記述されている。
【０１６４】
図１７は、図１６のＸＭＬ文書を基に作成されたＤＯＭツリーを示した図である。
【０１６５】
ここで、ｃａｔａｌｏｇ、ｂｏｏｋｎａｍｅという要素名のそれぞれに所定の番号を対応付ける。図１８は、その対応付けを格納したテーブルの一例である。ｃａｔａｌｏｇという要素名には番号１が対応付けられ、ｂｏｏｋｎａｍｅという要素名には番号２が対応付けられている。ＸＭＬプロセッサ１０は、同じ名前の要素名が複数存在する場合には、該要素名に所定の番号を対応付け、このテーブルに逐次登録する。
【０１６６】
図１９は、図１８に示されたテーブルを用いて、各要素名を対応付けられた番号におきかえた状態を示している。図示されるように、ｃａｔａｌｏｇという要素名は番号１に置換えられ、ｂｏｏｋｎａｍｅという要素名は番号２に置換えられている。
【０１６７】
このように、複数の同じ要素名を対応付けられた番号に置換えて格納することにより、記憶量を削減することができる。
【０１６８】
なお、要素名に限らず、要素に付されている属性（要素の特性を示す情報）の名前（属性名）についても、同じ名前が複数存在する場合には、同様にしてテーブルを用いて所定の番号を対応付け、該番号に置換えて記憶するようにすることができる。
【０１６９】
【発明の効果】
以上説明したように、複数の要素が階層的に記述された文書を解析する解析処理と、解析処理の解析結果に基づいて、該文書を構成する各要素をノードとする木構造で表現したオブジェクトを作成する作成処理と、該作成処理により作成されたオブジェクトを記憶する記憶処理と、を繰り返す間に、予め定められた停止条件が成立したと判断された場合に、解析処理、作成処理、及び記憶処理の少なくとも１つが停止されるようにしたため、記憶容量が不足して、作成した木構造のオブジェクトを記憶できずにエラーが発生する前に、処理状態に応じて処理を停止することができる、という効果を奏する。
【図面の簡単な説明】
【図１】図１（Ａ）は、本実施の形態に係る構造化文書処理システムの機能構成と、構造化文書処理システムで行われる処理を概念的に示した図であり、図１（Ｂ）は、図１（Ａ）で示された構造化文書処理システムの機能構成と、構造化文書処理システムで行われる処理とを更に詳細に示した図である。
【図２】ＸＭＬ文書の一例を示した図である。
【図３】図２のＸＭＬ文書を解析して作成されたＤＯＭツリーの一例を示した図である。
【図４】ＸＭＬプロセッサの機能を実現するためのハードウェア資源としての情報処理装置の構成を示すブロック図である。
【図５】第１の実施の形態に係るＤＯＭツリー作成処理ルーチンの流れを示したフローチャートである。
【図６】第２記憶部格納処理の流れを示すフローチャートである。
【図７】停止条件成立時の処理の流れを示すフローチャートである。
【図８】ＸＭＬ文書の一例を示した図である。
【図９】図８のＸＭＬ文書から作成されて第１記憶部に格納されているＤＯＭツリーの一例を示した図である。
【図１０】Ｎａｍｅｓｐａｃｅを定義するテーブルの一例である。
【図１１】定義された接頭辞を用いて記述されているＸＭＬ文書の一例を示した図である。
【図１２】第２の実施の形態に係るＤＯＭツリー作成処理ルーチンの流れを示したフローチャートである。
【図１３】第３の実施の形態に係る停止条件成立時の処理の流れを示すフローチャートである。
【図１４】第４の実施の形態に係る追加ステップを示した図である。
【図１５】第５の実施の形態に係る追加ステップを示した図である。
【図１６】複数の同じ要素名が記述されたＸＭＬ文書の一例を示した図である。
【図１７】図１６のＸＭＬ文書を基に作成されたＤＯＭツリーを示した図である。
【図１８】要素名と番号との対応付けを格納したテーブルの一例である。
【図１９】図１８に示されたテーブルを用いて、各要素名を対応付けられた番号におきかえた状態を示した図である。
【符号の説明】
１０ＸＭＬプロセッサ
１１情報処理装置
１４第１記憶部
１８第２記憶部
２０ＣＰＵ
２４ＲＯＭ
５０上位ＸＭＬアプリケーション
[0001]
[Technical field to which the invention belongs]
The present invention relates to an information processing apparatus, an information processing method, and a program for processing a document in which a plurality of elements are hierarchically described, and in particular to an information processing apparatus, an information processing method, and a program that create an object representing the document as a tree structure in which each element constituting the document is a node.
[0002]
[Prior art]
XML (eXtensible Markup Language) is known as a unified standard that allows data formats, which used to differ from one computer or application to another, to be used across different computers and applications. XML is a typical standard for structured documents that hierarchically describe a plurality of elements.
[0003]
DOM (Document Object Model) is known as an API (Application Programming Interface) that lets an application manipulate a structured document such as an XML document. DOM is an API for handling a structured document as a tree-structured object.
[0004]
By converting the structured document to DOM, the application can recognize the tree structure of the structured document and can access necessary elements by tracing the tree.
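The tree access described above can be illustrated with a minimal sketch. It assumes Python's standard `xml.dom.minidom` module as the DOM implementation; the sample document and the helper names `child_element_names` and `text_of` are hypothetical and are not taken from the patent.

```python
from xml.dom import minidom

# Hypothetical sample document; not one of the patent's figures.
SAMPLE = "<job><scan><size>A4</size></scan><storage><path>box1</path></storage></job>"


def child_element_names(node):
    """Return the tag names of the element children of a DOM node."""
    return [c.tagName for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def text_of(document, tag):
    """Return the text content of the first element named `tag`."""
    element = document.getElementsByTagName(tag)[0]
    return element.firstChild.data


# Convert the text document into a DOM tree, then walk it.
document = minidom.parseString(SAMPLE)
root = document.documentElement
print(root.tagName, child_element_names(root), text_of(document, "size"))
```

Once the whole document is held as a tree, any element can be reached by following parent and child links, which is also why the tree's memory footprint grows with document size and depth.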
[0005]
However, when a structured document is converted to DOM, the larger the structured document, the larger the resulting tree-structured object, and the more storage capacity is needed to store it. Likewise, when the hierarchy is deep, more storage capacity is needed to represent the hierarchical structure. As a result, problems arise: the converted tree structure cannot be stored in the storage device for lack of capacity, so that an error occurs and processing is interrupted, or, even if it can be stored, access efficiency deteriorates.
[0006]
Known devices for reducing the required storage capacity include a device that reduces the amount of working memory and improves data access efficiency by making the hierarchy of a structured document shallower (see, for example, Patent Document 1), and a device that merges a plurality of elements into a single element together with information indicating their positional relationship (see, for example, Patent Document 2).
[0007]
[Patent Document 1]
JP 2002-297569 A
[Patent Document 2]
JP 2002-108850 A
[0008]
[Problems to be solved by the invention]
However, because the conventional devices described above operate without judging the actual processing state (for example, the amount of objects created), they may be unable to reduce the working storage capacity as much as is needed. Consequently, the problem that the converted tree structure cannot be stored in the storage device for lack of capacity, causing an error that interrupts processing, and the problem that access efficiency deteriorates even when it can be stored, remain unsolved.
[0009]
The present invention has been proposed to solve the problems described above. Its object is to provide an information processing apparatus, an information processing method, and a program that can stop processing according to the processing state during the conversion of a document that hierarchically describes a plurality of elements into a tree structure, before an error occurs because the converted tree-structure object cannot be stored due to insufficient storage capacity.
[0010]
[Means for Solving the Problems]
To achieve the above object, an information processing apparatus of the present invention includes: input means for inputting a document in which a plurality of elements are hierarchically described; processing means for repeating an analysis process that analyzes the document input by the input means, a creation process that, based on the analysis result of the analysis process, creates an object expressed as a tree structure in which each element constituting the document is a node, and a storage process that stores the object created by the creation process; determination means for determining whether a predetermined stop condition is satisfied; and control means for controlling the processing means so that at least one of the analysis process, the creation process, and the storage process is stopped when the determination means determines that the stop condition is satisfied.
[0011]
The input means inputs a document in which a plurality of elements are hierarchically described. A document in which a plurality of elements are described hierarchically can be, for example, a structured document represented by an XML document or the like.
[0012]
The processing means repeatedly performs the analysis process, the creation process, and the storage process. The analysis process analyzes the document input by the input means. For example, it can analyze the hierarchical structure of the elements described in the document while reading the input document in order from the top.
[0013]
The creation process creates, based on the analysis result of the analysis process, an object expressed as a tree structure in which each element constituting the input document is a node (vertex).
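As a rough illustration of the analysis and creation processes, the sketch below reads a document from the top, reacts to start and end tags, and links a node into a tree for each element. It assumes Python's standard `xml.parsers.expat` module as the tag detector; the `Node` class and the `build_tree` function are hypothetical names introduced only for this example, and the patent does not prescribe this data layout.

```python
import xml.parsers.expat
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node per element (hypothetical structure for illustration)."""
    name: str
    text: str = ""
    children: list = field(default_factory=list)


def build_tree(xml_text):
    """Analyze the document in order and create a node for each element."""
    root = None
    stack = []  # elements whose end tag has not been seen yet

    def on_start(name, attrs):
        nonlocal root
        node = Node(name)
        if stack:
            stack[-1].children.append(node)  # attach under the open parent
        else:
            root = node
        stack.append(node)

    def on_end(name):
        stack.pop()  # the element is complete

    def on_text(data):
        if stack:
            stack[-1].text += data  # text between start and end tags

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return root


tree = build_tree("<job><scan><size>A4</size></scan><storage/></job>")
print([child.name for child in tree.children])
```

The stack of open elements is what encodes the hierarchy: its length at any moment is the depth of the element currently being analyzed.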
[0014]
The storage process stores the object created by the creation process. The storage medium for storing the object is not particularly limited, and may be a RAM generally used as a main memory, for example.
[0015]
The determination means determines whether a predetermined stop condition is satisfied. For example, it can determine that the stop condition is satisfied in at least one of the following cases: the amount analyzed by the analysis process reaches a predetermined amount; the amount created by the creation process reaches a predetermined amount; or a predesignated element is detected by the analysis process.
[0016]
The control means controls the processing means so that at least one of the analysis process, the creation process, and the storage process is stopped when the determination means determines that the stop condition is satisfied.
[0017]
Because the processing means is controlled so that at least one of the analysis process, the creation process, and the storage process is stopped when the stop condition is determined to be satisfied, at least one of these processes can be stopped before an error occurs because the created tree-structured object cannot be stored due to insufficient storage capacity.
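The three example stop conditions can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's `xml.parsers.expat`, measures the analyzed amount with the parser's `CurrentByteIndex`, counts created nodes instead of measuring real memory use, and halts by raising an exception from the tag handler. The names `StopProcessing`, `parse_until_stop`, `max_bytes`, `max_nodes`, and `stop_tags` are hypothetical.

```python
import xml.parsers.expat


class StopProcessing(Exception):
    """Raised inside a handler to halt parsing when a stop condition holds."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


def parse_until_stop(xml_text, *, max_bytes=None, max_nodes=None,
                     stop_tags=frozenset()):
    """Return (reason, nodes_created); reason is "finished" if no stop occurred."""
    created = 0
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        nonlocal created
        # Condition: a predesignated element was detected during analysis.
        if name in stop_tags:
            raise StopProcessing("designated-element")
        # Condition: the analyzed amount reached the predetermined amount.
        if max_bytes is not None and parser.CurrentByteIndex >= max_bytes:
            raise StopProcessing("analyzed-amount")
        created += 1  # stands in for creating and storing a node
        # Condition: the created amount reached the predetermined amount.
        if max_nodes is not None and created >= max_nodes:
            raise StopProcessing("created-amount")

    parser.StartElementHandler = on_start
    try:
        parser.Parse(xml_text, True)
    except StopProcessing as stop:
        return stop.reason, created
    return "finished", created


DOC = "<job><scan><size>A4</size></scan><storage><path>x</path></storage></job>"
print(parse_until_stop(DOC, max_nodes=2))
```

Checking the conditions inside the handler means the halt happens between elements, before any allocation failure could occur, which is the point of stopping "according to the processing state".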
[0018]
Furthermore, the information processing apparatus of the present invention may further include deletion means for deleting nodes of the stored object after the stop condition is satisfied.
[0019]
The deletion means may delete the nodes of the stored object when it receives an instruction from the outside after the stop condition is satisfied.
[0020]
In other words, the deletion means can perform the deletion when instructed to do so from the outside, for example by an application that refers to or manipulates the created object.
[0021]
When deleting nodes, the deletion means may do at least one of the following: preferentially delete nodes deep in the hierarchy together with the nodes above them; preferentially delete nodes constituting a tree that starts from an element with a low reference count; preferentially delete nodes constituting a tree that starts from an element for which a long time has elapsed since it was last referenced; preferentially delete nodes other than those constituting a tree that starts from an element referenced by another element; preferentially delete nodes constituting a tree that starts from a predesignated element; delete elements set by set, for each predefined set to which they belong; delete according to a predetermined priority order; and preferentially delete nodes constituting a tree that starts from an element designated from the outside.
[0022]
For example, a tree that includes nodes deep in the hierarchy requires correspondingly more storage capacity to represent that hierarchy. It is therefore preferable to preferentially delete nodes deep in the hierarchy together with the nodes above them.
[0023]
For example, an element that an application referring to the created object has referenced only a few times in the past is unlikely to be referenced in the future. For this reason, it is preferable to preferentially delete the nodes constituting a tree that starts from an element with a low reference count. Note that "reference" here also covers the case where an application manipulates the object.
[0024]
For example, when a long time has passed since the last reference by an application that refers to the created object, there is a low possibility that it will be referred to in the future. For this reason, it is preferable to preferentially delete nodes constituting a tree starting from an element having a long elapsed time since the last reference. Similarly, “reference” here includes a case where an application manipulates an object.
[0025]
For example, when one element uses information of another element, that is, when an element refers to another element, deleting the referenced element makes the referenced information unavailable to the referring element. For this reason, it is preferable to preferentially delete nodes other than the nodes constituting the tree starting from an element referred to by another element.
[0026]
For example, if an element that is unlikely to be referenced is known in advance, that element can be designated in advance, and the nodes constituting the tree starting from the element can be preferentially deleted.
[0027]
For example, if the elements to be referenced differ for each service provided by the application that refers to the created object, each service and the elements referenced by that service (the elements belonging to the service) can be defined, and deletion can be performed service by service. In this way, deletion can be performed for each predefined set of elements.
[0028]
For example, when elements that are unlikely to be referenced or elements that are highly likely to be referenced are known in advance, a priority order can be determined in advance and elements can be deleted according to that order.
[0029]
For example, when an element to be deleted is designated in advance, nodes constituting a tree starting from the designated element can be preferentially deleted.
[0030]
Further, the control means can also perform control so that the information of the node deleted by the deleting means is stored in a storage area different from the storage area used in the storage process.
[0031]
The control means can also perform control so that the information is stored in the other storage area when an instruction is received from the outside.
[0032]
The control means can also perform control so that the information of the node deleted by the deleting means is stored as text data.
[0033]
That is, by storing the deleted node information as text data without storing it as a tree-structured object, the data size to be stored is reduced.
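The idea of keeping a deleted subtree only as text can be sketched as follows. This is a minimal illustration using Python's standard `xml.etree.ElementTree`; the function names `evict_as_text` and `restore_from_text`, the list used as the "other storage area", and the sample document are all assumptions made for the example, and the text format (re-serialized XML) is not specified by the description.

```python
import xml.etree.ElementTree as ET


def evict_as_text(parent, child, text_store):
    """Remove `child` from `parent`, keeping only its serialized text."""
    text = ET.tostring(child, encoding="unicode")
    text_store.append(text)      # hypothetical stand-in for the other storage area
    parent.remove(child)
    return text


def restore_from_text(parent, text):
    """Re-parse saved text and re-attach the subtree under `parent`."""
    node = ET.fromstring(text)
    parent.append(node)
    return node


# Hypothetical sample document.
root = ET.fromstring(
    "<doc><title>t</title>"
    "<author><first>aaa</first><last>bbb</last></author></doc>"
)
store = []
saved = evict_as_text(root, root.find("author"), store)
```

The saved string can later be passed to `restore_from_text` to rebuild the subtree when the application needs it again.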
[0034]
When the same element name or the same attribute name occurs more than once, the control means can also perform control so that these element names or attribute names are replaced with associated numbers and then stored.
[0035]
That is, the data size to be stored is reduced by replacing a plurality of identical element names with associated numbers and storing them. Further, not only element names but also the names of attributes (information indicating the characteristics of an element) attached to elements can be stored in the same manner when the same name occurs more than once.
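The replacement of repeated names with numbers can be illustrated with a small lookup table. The sketch below is an assumption about one possible encoding (numbers assigned in order of first appearance); the function names and the sample names are hypothetical.

```python
def encode_names(names):
    """Replace each name with a number; identical names share one number."""
    table = {}
    codes = []
    for name in names:
        if name not in table:
            table[name] = len(table)   # assign the next unused number
        codes.append(table[name])
    return table, codes


def decode_names(table, codes):
    """Recover the original names from the table and the stored numbers."""
    reverse = {number: name for name, number in table.items()}
    return [reverse[code] for code in codes]


names = ["item", "price", "item", "price", "item"]   # hypothetical element names
table, codes = encode_names(names)
```

Each distinct name is stored once in the table, so the saving grows with the number of repetitions.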
[0036]
The information processing apparatus of the present invention may further include restart condition determining means for determining whether or not a predetermined restart condition is satisfied, and the control means may release the stop and control the processing means so that the stopped process is restarted when, after the stop condition is satisfied, the restart condition determining means determines that the restart condition is satisfied.
[0037]
The restart condition determining means determines whether or not a predetermined restart condition is satisfied. For example, the restart condition may be determined to be satisfied in at least one of the following cases: when a restart instruction is received from the outside, and when the storage amount of the object stored by the storage process is equal to or less than a predetermined amount.
[0038]
The control means releases the stop and controls the processing means so that the stopped process is resumed when the restart condition determining means determines that the restart condition is satisfied after the stop condition is satisfied.
[0039]
The processing means may be configured not to perform the creation process for the element and elements subordinate to the element when a previously designated element is detected in the analysis process.
[0040]
The element designated as one for which the creation process is not performed can be, for example, an element that is seldom referenced or operated by an application that references or operates the created object.
[0041]
In the storage process of the processing means, the information of the element that has not been subjected to the creation process and of the elements subordinate to it can also be stored in a storage area different from the storage area that stores the created object.
[0042]
The information processing method of the present invention includes: an input step of inputting a document in which a plurality of elements are hierarchically described; a processing step of repeating an analysis process of analyzing the document input in the input step, a creation process of creating, based on the analysis result of the analysis process, an object expressed in a tree structure with each element constituting the document as a node, and a storage process of storing the object created by the creation process; a determination step of determining whether or not a predetermined stop condition is satisfied; and a stop step of stopping at least one of the analysis process, the creation process, and the storage process when the determination step determines that the stop condition is satisfied.
[0043]
According to this information processing method, at least one of the analysis process, the creation process, and the storage process is stopped when the stop condition is determined to be satisfied. The processing can therefore be stopped before an error occurs because the tree-structured object cannot be stored.
[0044]
The information processing method of the present invention may further include a deletion step of deleting the stored object node after the stop condition is satisfied.
[0045]
The information processing method of the present invention may further include a storage step of storing the node information deleted in the deletion step in a storage area different from the storage area used in the storage process.
[0046]
The information processing method of the present invention may further include a restart condition determination step of determining whether or not a predetermined restart condition is satisfied, and a restart step of releasing the stop and restarting the stopped process when, after the stop condition is satisfied, the restart condition determination step determines that the restart condition is satisfied.
[0047]
The program of the present invention causes a computer to execute: an input step of inputting a document in which a plurality of elements are hierarchically described; a processing step of repeating an analysis process of analyzing the document input in the input step, a creation process of creating, based on the analysis result of the analysis process, an object expressed in a tree structure with each element constituting the document as a node, and a storage process of storing the object created by the creation process; a determination step of determining whether or not a predetermined stop condition is satisfied; and a stop step of stopping at least one of the analysis process, the creation process, and the storage process when the determination step determines that the stop condition is satisfied.
[0048]
By causing a computer to execute such a program, at least one of the analysis process, the creation process, and the storage process is stopped when the stop condition is determined to be satisfied, so the processing can be stopped before an error occurs because the created tree-structured object cannot be stored due to insufficient storage capacity.
[0049]
The storage medium for storing the program is not particularly limited; it may be a ROM, a CD-ROM, a DVD, a magneto-optical disk, an IC card, a hard disk, or the like, or it may be a transmission medium such as a carrier wave on a telecommunication line.
[0050]
The program of the present invention may further include a deletion step of deleting the stored object node after the stop condition is satisfied.
[0051]
The program of the present invention may further include a storage step of storing the information of the node deleted in the deletion step in a storage area different from the storage area used in the storage process.
[0052]
The program of the present invention may further include a restart condition determination step of determining whether or not a predetermined restart condition is satisfied, and a restart step of releasing the stop and restarting the stopped process when, after the stop condition is satisfied, the restart condition determination step determines that the restart condition is satisfied.
[0053]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.
[0054]
[First Embodiment]
FIG. 1A is a diagram conceptually showing the functional configuration of the structured document processing system according to the present embodiment and the processing performed in that system. As shown in the figure, when an XML document 30 described in text format is input to an XML processor (XML parser) 10 serving as the information processing apparatus of the present invention, the XML processor 10 creates an object (DOM tree 40) expressed in a tree structure with each element of the XML document 30 as a node. The upper XML application 50 can refer to or operate on the created DOM tree 40.
[0055]
FIG. 1B is a diagram showing in more detail the functional configuration of the structured document processing system shown in FIG. 1A and the processing performed in the structured document processing system.
[0056]
In (1), in order for the upper XML application 50 to refer to and operate the XML document 30, the XML processor 10 is instructed to create the DOM tree 40 via the DOM interface 60.
[0057]
In (2), the XML processor 10 that has received the instruction inputs and analyzes the XML document 30.
[0058]
FIG. 2 shows an example of the XML document 30. As shown in the figure, the XML document 30 is described in a text format in which a plurality of elements are hierarchically described by a start tag <XXX> indicating the start of an element and an end tag </XXX> indicating its end.
[0059]
Specifically, the XML processor 10 analyzes the hierarchical structure of the elements of the XML document 30 by sequentially reading the XML document 30 as shown in FIG. 2 and detecting the start tags and end tags.
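Detecting start tags and end tags in order to recover the hierarchy can be sketched with an event-driven parser. The example below uses the standard `xml.parsers.expat` module; the function name `element_depths` and the sample document (including the element names `job` and `size`) are assumptions for illustration and are not taken from FIG. 2.

```python
import xml.parsers.expat


def element_depths(xml_text):
    """Return (element name, hierarchy level) pairs in document order."""
    depths = []
    level = 0

    def on_start(name, attrs):
        nonlocal level
        level += 1                    # a start tag opens one more level
        depths.append((name, level))

    def on_end(name):
        nonlocal level
        level -= 1                    # an end tag closes the current level

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)
    return depths


sample = "<job><scan><size>A4</size></scan><storage/></job>"
result = element_depths(sample)
```

Here `result` lists each element with the level at which its start tag was detected, which is the information needed to attach the element to the correct parent node.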
[0060]
Further, the XML processor 10 creates a DOM tree 40 based on the analysis result.
[0061]
FIG. 3 shows an example of a DOM tree 40 created by analyzing the XML document 30 shown in FIG. 2. As illustrated, each element (Element) is configured as a node of the tree structure (for example, node 70 in the figure). The content of an element, that is, the character string (Text) between the start tag and the end tag, is also configured as a node (for example, node 72 in the figure).
[0062]
The portion indicated by (1) in FIG. 3 is a subtree (a part of the entire tree) corresponding to the description of the scan element in (1) in FIG. 2. Likewise, the portion indicated by (2) in FIG. 3 is a subtree corresponding to the description of the storage element in (2) in FIG. 2.
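That both elements and their text content become nodes can be checked with the standard `xml.dom.minidom` module, as in the short sketch below. The sample document is hypothetical and is not the content of FIG. 2.

```python
from xml.dom import minidom, Node

# Hypothetical fragment: one element containing one child element with text.
doc = minidom.parseString("<scan><size>A4</size></scan>")
size = doc.documentElement.firstChild   # element node for <size>
text = size.firstChild                  # text node holding "A4"

kinds = (
    size.nodeType == Node.ELEMENT_NODE,
    text.nodeType == Node.TEXT_NODE,
    text.data,
)
```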
[0063]
Thus, the DOM tree 40 is composed of a plurality of subtrees.
[0064]
In (3), the upper XML application 50 refers to or operates the created DOM tree 40 via the DOM interface 60.
[0065]
FIG. 4 is a block diagram showing a configuration of the information processing apparatus 11 as hardware resources for realizing the functions of the XML processor 10.
[0066]
As illustrated, the information processing apparatus 11 includes an operation unit 12, a first storage unit 14, an input/output unit 16, a CPU 20, a network I/F 22, and a ROM 24, which are connected to one another by a bus.
[0067]
The operation unit 12 includes, for example, a keyboard, and the user inputs arbitrary data using the operation unit 12 and performs a predetermined operation.
[0068]
The network I/F 22 is an interface for connecting to various networks.
[0069]
The ROM 24 stores a DOM tree creation processing routine program for creating a DOM tree from an XML document as a function of the XML processor 10. Further, the ROM 24 also stores programs for realizing the functions of the higher-level XML application 50 that handles XML documents and the DOM interface 60.
[0070]
The CPU 20 implements various functions by executing programs stored in the ROM 24 or the like.
[0071]
The first storage unit 14 is constituted by a RAM, for example, and stores the created DOM tree and the like.
[0072]
The input/output unit 16 is an interface for inputting and outputting various data. A second storage unit 18 is connected to the input/output unit 16. The second storage unit 18 can be, for example, a storage medium that has a slower access speed than the first storage unit 14 but a larger storage capacity, such as a hard disk device.
[0073]
The XML processor 10 shown in FIG. 1 is a function realized by using the information processing apparatus 11 shown in FIG. 4 as hardware resources and using a program stored in the ROM 24 as software resources. Note that the upper XML application 50 and the DOM interface 60 are functions realized in the same manner.
[0074]
Hereinafter, the DOM tree creation processing routine will be described with reference to FIG. 5.
[0075]
In step 100, the XML processor 10 receives a DOM tree creation instruction from the upper XML application 50 and acquires (inputs) a structured document (here, an XML document). For example, when the XML document is stored in the second storage unit 18, it is acquired from the second storage unit 18 via the input/output unit 16; when it is stored in another computer system on the network, it is acquired via the network I/F 22.
[0076]
In step 102, the acquired XML document is analyzed. As described above, by detecting the start tag and the end tag in order, the structure and contents of each element described by these tags are analyzed.
[0077]
In step 104, a tree-structured object (DOM tree) is created based on the analysis result in step 102. More specifically, a DOM tree is created as a subtree having the analyzed element as a node.
[0078]
Hereinafter, the process for creating the subtree is referred to as the subtree creation process and is distinguished from the DOM tree creation process. Likewise, the DOM tree created by the subtree creation process is referred to as a subtree and is distinguished from the entire DOM tree.
[0079]
In step 106, it is determined whether or not the free space of the first storage unit 14 is insufficient. If it is determined that the free space is not insufficient, that is, that there is enough free space to store the subtree created in step 104, the created subtree is stored in the first storage unit 14. If it is determined in step 106 that the free space is insufficient, the process proceeds to the second storage unit storage process in step 110, and the created subtree is stored in the second storage unit 18.
[0080]
FIG. 6 is a flowchart showing the flow of the second storage unit storage process.
[0081]
In step 200, it is determined whether or not the free capacity of the second storage unit 18 is insufficient. If it is determined that the free capacity is not insufficient, the created subtree is stored in the second storage unit 18 as it is in step 202. If it is determined that the free capacity is insufficient, the process proceeds to step 204, where the created subtree is converted to text data to reduce the data size. After the conversion, the process proceeds to step 202, and the converted text data is stored in the second storage unit 18.
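The branch of FIG. 6 can be sketched as a single function. In this illustration the second storage unit is modeled as a Python list, and `free_bytes` and `object_size` are hypothetical inputs standing in for however the apparatus measures free capacity and object size; the description does not specify these.

```python
import xml.etree.ElementTree as ET


def store_in_second_unit(subtree, second_store, free_bytes, object_size):
    """Steps 200-204: store the object as-is if it fits, otherwise as text."""
    if free_bytes >= object_size:        # step 200: capacity is not insufficient
        second_store.append(subtree)     # step 202: store the object unchanged
        return "object"
    text = ET.tostring(subtree, encoding="unicode")   # step 204: convert to text
    second_store.append(text)            # step 202: store the converted text
    return "text"
```

A call such as `store_in_second_unit(node, store, free_bytes=10, object_size=100)` takes the text branch, while a call with ample free capacity stores the object itself.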
[0082]
After storing the created subtree in the first storage unit 14 or the second storage unit 18, the process proceeds to step 112 in FIG. 5 to determine whether or not a stop condition is satisfied.
[0083]
The stop condition is set in advance. Here, the amount of memory to be used for storing the DOM tree is determined in advance, and the stop condition is determined to be satisfied when the amount of the DOM tree created (the accumulated amount of subtrees created) reaches that predetermined amount. The amount of memory used can be set arbitrarily.
[0084]
Further, the stop condition may be determined to be satisfied when a predesignated element is detected during the analysis processing in step 102. The designation of this element can also be arbitrarily set in advance by the user.
[0085]
Furthermore, the stop condition may be determined to be satisfied in at least one of the following cases: when the amount of the DOM tree created reaches the predetermined amount, and when a predesignated element is detected.
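The combined stop condition amounts to a logical OR of the two checks. The following sketch is only an illustration; the function name, the parameter names, and the unit of the amounts are assumptions.

```python
def stop_condition_met(created_amount, limit,
                       detected_element=None, stop_elements=frozenset()):
    """True if the created amount reached the limit or a designated element appeared."""
    reached_limit = created_amount >= limit
    hit_element = detected_element in stop_elements
    return reached_limit or hit_element
```

For example, `stop_condition_met(10, 100, "storage", {"storage"})` is true because of the designated element even though the created amount is far below the limit.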
[0086]
If it is determined in step 112 that the stop condition is not satisfied, the process proceeds to step 116 to determine whether or not the analysis of the entire XML document has been completed.
[0087]
If it is determined in step 116 that the analysis of the entire XML document has not been completed, the process returns to step 102, where the XML document is read further and the next start tag and end tag are detected, thereby continuing the analysis of the XML document.
[0088]
If it is determined in step 112 that the stop condition is satisfied, the process proceeds to step 114 to perform processing when the stop condition is satisfied.
[0089]
FIG. 7 is a flowchart showing the flow of processing when the stop condition is satisfied.
[0090]
In step 300, the upper XML application 50 is notified that the stop condition has been satisfied. Here, the DOM tree created so far can also be passed to the upper XML application 50. In that case, the XML processor 10 passes the DOM tree by notifying the upper XML application 50 of the storage location of the created DOM tree via the DOM interface 60.
[0091]
In step 302, it is determined whether a deletion request has been acquired from the upper XML application 50. If it is determined that the deletion request has not been acquired, the process proceeds to step 308 to determine whether a storage request to the second storage unit has been acquired. If it is determined that a request for storing in the second storage unit has not been acquired, the process returns to step 302. In other words, the XML processor 10 maintains a standby state until a request from the upper XML application 50 is acquired.
[0092]
If it is determined in step 302 that a deletion request has been acquired, the process proceeds to step 304 to search for an element (node) to be deleted.
[0093]
For example, a node having a deep hierarchy and a node higher than the node may be searched, and the node may be deleted preferentially.
[0094]
Hereinafter, this search process will be described in detail with reference to FIGS. 8 and 9. FIG. 8 shows an original XML document, and FIG. 9 shows a DOM tree created from the XML document of FIG. 8 and stored in the first storage unit 14.
[0095]
As shown in FIG. 9, the author element of the subtree surrounded by the dotted line (3) has subordinate nodes at deeper levels of the hierarchy than the other elements p and title at the same level. For example, the element nodes (first, last) and the text nodes (aaa, bbb, ssss) lie deeper than the end nodes of the subtrees starting from the other elements p and title.
[0096]
Therefore, the node searched as a deletion target here is a node of the subtree surrounded by the dotted line (3).
[0097]
Selecting such nodes for deletion increases the free capacity of the first storage unit 14 more than deleting a shallow subtree, that is, a subtree with a small data size. As will be described later, the information of nodes deleted from the first storage unit 14 can be stored in the second storage unit 18, read from the second storage unit 18 as necessary, and expanded in the first storage unit 14. In that case, deleting and storing the nodes of one large subtree, rather than deleting and storing many small subtrees, reduces the number of times data is read from the second storage unit 18 and expanded in the first storage unit 14, which reduces the load on the apparatus.
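The search for the deepest subtree among sibling elements can be sketched as below. The sample document is hypothetical and only loosely modeled on the element names mentioned for FIG. 9; note also that `xml.etree.ElementTree` keeps text as an attribute of an element rather than as a separate node, so depth here counts element levels only.

```python
import xml.etree.ElementTree as ET


def depth(node):
    """Number of element levels in the subtree rooted at `node` (a leaf is 1)."""
    return 1 + max((depth(child) for child in node), default=0)


def deepest_child(parent):
    """Return the child whose subtree is deepest (the first one on a tie)."""
    return max(parent, key=depth)


doc = ET.fromstring(
    "<doc><title>t</title><p>text</p>"
    "<author><first>aaa</first><last>bbb</last></author></doc>"
)
target = deepest_child(doc)   # candidate subtree for deletion
```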
[0098]
The deletion request of the upper XML application 50 may include an instruction for the element to be deleted. Thereby, it is possible to search for an element (node) to be deleted based on an instruction from the upper XML application 50.
[0099]
In step 306, the searched node is deleted from the first storage unit 14.
[0100]
If, instead of a deletion request, a second storage unit storage request is acquired in step 308, the same processing as in steps 304 and 306 described above is performed in steps 310 and 312. The process then proceeds to the second storage unit storage process in step 314, where the deleted node information is stored in the second storage unit 18. By storing it in the second storage unit 18, the deleted node can be expanded and used in the first storage unit 14 as necessary. Since the details of the flow of the second storage unit storage process are as described above, the description thereof is omitted.
[0101]
After the processing of step 306 or step 314, the process proceeds to step 316, and it is determined whether or not the storage amount of the DOM tree stored in the first storage unit 14 is equal to or less than the specified amount. If it is determined that the storage amount is not equal to or less than the specified amount, the process returns to step 300, the upper XML application 50 is notified that the storage amount of the DOM tree exceeds the specified amount, and the processing from step 302 is repeated. The processing from step 300 to step 316 is repeated until an affirmative determination is made in step 316.
[0102]
If the determination in step 316 is affirmative, in step 318, the higher level XML application 50 is notified that the storage amount of the DOM tree has become equal to or less than the specified amount.
[0103]
In step 320, it is determined whether or not a request to resume processing (restart request) has been acquired from the upper XML application 50. The standby state is maintained, and the analysis process and subsequent processes remain stopped, until the restart request is acquired. When the restart request is acquired, the process returns to step 116 in FIG. 5 and the analysis process is restarted.
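The flow of FIG. 7 (steps 300 to 320) can be simulated in a few lines. This is a simplified sketch under several assumptions: the first storage unit is modeled as a dictionary from subtree name to size, the requests from the upper application are supplied as a pre-built list of `(kind, name)` pairs, and every deletion or storage request names the subtree to remove. None of these names or data shapes come from the description.

```python
def handle_stop(stored, specified_amount, requests, second_store):
    """Simplified steps 300-320; `stored` maps subtree name -> size."""
    events = []
    pending = iter(requests)
    while True:
        events.append("stop-notified")                # step 300
        kind, name = next(pending)                    # steps 302 and 308
        size = stored.pop(name)                       # steps 304-306 or 310-312
        if kind == "store":
            second_store[name] = size                 # step 314
        if sum(stored.values()) <= specified_amount:  # step 316
            break
    events.append("below-limit-notified")             # step 318
    for kind, _ in pending:                           # step 320: wait for restart
        if kind == "resume":
            events.append("resumed")
            break
    return events


first = {"author": 60, "p": 30, "title": 10}
second = {}
log = handle_stop(first, 40, [("store", "author"), ("resume", None)], second)
```

In this run a single storage request moves the largest subtree out of the first store, which brings the stored amount down to the specified amount, so the loop ends after one pass and processing resumes.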
[0104]
In FIG. 5, the processing from step 102 to step 116 is repeated until affirmative determination is made in step 116, and subtrees are created and stored in order.
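The overall loop of FIG. 5 can be sketched in the same simplified model. Here each analyzed subtree is represented only by a `(name, size)` pair, both storage units are dictionaries, and the stop condition is evaluated against the amount currently held in the first store; that last choice is an assumption, since the description speaks of the accumulated creation amount. The `on_stop` callback stands in for the processing of FIG. 7, and `move_largest` is a hypothetical example of such a callback.

```python
def build_tree(subtrees, first_capacity, stop_limit, on_stop):
    """Simplified steps 102-118 over (name, size) pairs."""
    first, second = {}, {}
    for name, size in subtrees:                             # steps 102-104
        if sum(first.values()) + size <= first_capacity:    # step 106
            first[name] = size                              # store in first unit
        else:
            second[name] = size                             # step 110
        if sum(first.values()) >= stop_limit:               # step 112
            on_stop(first, second)                          # step 114
    return first, second                                    # step 118


def move_largest(first, second):
    """Hypothetical stop handler: move the largest subtree to the second store."""
    name = max(first, key=first.get)
    second[name] = first.pop(name)


first, second = build_tree(
    [("a", 30), ("b", 50), ("c", 20)],
    first_capacity=100, stop_limit=80, on_stop=move_largest,
)
```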
[0105]
If an affirmative determination is made in step 116, the process moves to step 118, the created DOM tree is passed to the upper XML application 50, and the DOM tree creation process is terminated. Here, since the upper XML application 50 and the DOM interface 60 are implemented using the same hardware resources as the XML processor 10, the XML processor 10 passes the DOM tree by notifying the upper XML application 50 of the storage location of the created DOM tree via the DOM interface 60.
[0106]
As described above, if the stop condition is determined to be satisfied while the process of analyzing the input XML document, the process of creating a subtree based on the analysis result, and the process of storing the created subtree in the first storage unit 14 are being repeated, at least one of these processes is stopped. The processing can therefore be stopped before an error occurs because the created subtree cannot be stored due to insufficient capacity of the first storage unit 14.
[0107]
Moreover, since nodes of the DOM tree stored in the first storage unit 14 are deleted after the stop condition is satisfied, a shortage of capacity in the first storage unit 14 can be prevented.
[0108]
In addition, since the deleted node information is stored in the second storage unit 18 as necessary, the node information is not lost even when a node necessary for the operation of the upper XML application 50 is deleted; it can be read from the second storage unit 18 and expanded in the first storage unit 14 as needed.
[0109]
In addition, after the stop condition is satisfied, the stopped state is released and the stopped process is restarted once the storage amount of the DOM tree stored in the first storage unit 14 becomes equal to or less than the specified amount. The entire document can therefore be analyzed, and a situation in which processing ends while still interrupted can be prevented.
[0110]
Note that the deletion process is not limited to the above-described example. For example, the deletion process may be performed for each set of elements belonging to a predefined set. Furthermore, a priority order may be set for each defined set, and elements may be deleted starting from those belonging to a set with a higher (or lower) priority. As a method for defining a set, for example, when the document to be handled is an XML document, the standard called XML Namespace can be adopted.
[0111]
Namespace is used to specify which set an element belongs to.
[0112]
Hereinafter, an example in which a set is defined using Namespace and a priority order is provided for each defined set will be described with reference to FIGS. 10 and 11.
[0113]
FIG. 10 is an example of a table that defines Namespaces; it is registered in the XML processor 10 when, for example, the upper XML application 50 is activated. The service name 80 defines a service (that is, a set) provided by the upper XML application 50. The Namespace Prefix 82 is a prefix for linking a defined service to each element. By describing this prefix in the start tag of an element in the XML document, the element can be associated with the service represented by the prefix. Here, “none” means not belonging to any service. Priority 84 defines the priority order. By referring to this item, elements carrying the prefix of a service with a high (or low) priority can be deleted preferentially.
[0114]
FIG. 11 shows an example of an XML document described using a defined prefix.
[0115]
As illustrated, an element in which the prefix a is described (for example, the element delimited by <a:action> and </a:action>) is regarded as belonging to the a service. Likewise, an element in which the prefix b is described (for example, the element delimited by <b:action> and </b:action>) is regarded as belonging to the b service. An element without a prefix is regarded as not belonging to any service. If the setting is such that elements of a low-priority service are deleted preferentially, then according to the table shown in FIG. 10, the node of the element date, which has no prefix, and the nodes subordinate to it are deleted preferentially.
[0116]
For example, an element used for a printing service is often not referenced again once it has been referenced, so its priority is set low so that it is deleted preferentially. An element used for a screen display service is likely to be referenced again, so it can be protected from deletion by setting its priority high.
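Selecting a deletion candidate by namespace prefix and priority can be sketched as follows. The priority values in `PRIORITY`, the namespace URIs, and the sample document are hypothetical and do not reproduce FIG. 10 or FIG. 11; in this sketch a larger number means a lower priority, and the lowest-priority element is chosen first.

```python
from xml.dom import minidom

# Hypothetical table: prefix -> priority number (3 is the lowest priority).
PRIORITY = {"a": 1, "b": 2, None: 3}


def prefix_of(element):
    """Return the namespace prefix of an element, or None if it has none."""
    name = element.tagName
    return name.split(":", 1)[0] if ":" in name else None


def first_to_delete(root):
    """Pick the child element whose service has the lowest priority."""
    children = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
    return max(children, key=lambda e: PRIORITY[prefix_of(e)])


doc = minidom.parseString(
    '<job xmlns:a="urn:example:a" xmlns:b="urn:example:b">'
    "<a:action>print</a:action><b:action>show</b:action>"
    "<date>2003-06-20</date></job>"
)
victim = first_to_delete(doc.documentElement)
```

With this table the unprefixed `date` element is selected, which matches the behavior described for an element that belongs to no service when low-priority sets are deleted first.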
[0117]
Furthermore, the deletion process is not limited to such an example. For example, the nodes constituting a subtree starting from an element with a low reference count may be deleted preferentially, or the nodes constituting a subtree starting from an element for which a long time has elapsed since the last reference may be deleted preferentially. Elements with a low reference count and elements with a long elapsed time since the last reference are considered less likely to be referenced or operated by the upper XML application 50, and are therefore suitable targets for deletion. When such deletion processing is performed, information indicating which elements the upper XML application 50 has referred to may, for example, be registered in a predetermined table, and the deletion target may be determined by referring to this table.
[0118]
In addition, it is preferable not to delete the nodes constituting a subtree starting from an element that is referenced by another element. For example, an element referred to by another element using XPath, which is used in XML, is excluded from the deletion targets. XPath is a language that describes the position of nodes constituting a DOM tree. If an XPath expression indicating the position of a certain element is held in another element, that other element can refer to the content of the element designated by the XPath expression. If an element referenced by another element is deleted, its content can no longer be referenced by the other element. It is therefore preferable to exclude the nodes constituting the subtree starting from such an element from the deletion targets.
[0119]
In addition, when an element that is unlikely to be referenced is known in advance, the element may be designated in advance, and the nodes constituting the tree starting from the element may be preferentially deleted.
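A reference table of the kind mentioned above could be consulted as in the sketch below. The table layout (element name mapped to a pair of reference count and last reference time), the sample values, and the decision to combine the two criteria by using the last reference time only as a tie-breaker are all assumptions; the description presents the two criteria as alternatives.

```python
def pick_victim(reference_table):
    """Pick the element with the lowest reference count; on a tie, the one
    whose last reference is oldest (longest elapsed time)."""
    return min(reference_table, key=lambda name: reference_table[name])


# Hypothetical table: element name -> (reference count, last reference time).
reference_table = {
    "title": (5, 100.0),
    "p": (1, 90.0),
    "author": (1, 40.0),
}
victim = pick_victim(reference_table)
```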
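Excluding referenced elements from deletion can be sketched with the limited XPath support of `xml.etree.ElementTree`. The attribute name `ref`, the function name, and the sample document are hypothetical; the description does not say how an element holds the XPath expression.

```python
import xml.etree.ElementTree as ET


def deletable_children(root, ref_attr="ref"):
    """Return children of `root` whose subtrees contain no referenced element."""
    protected = set()
    for element in root.iter():
        path = element.get(ref_attr)         # hypothetical attribute holding a path
        if path:
            target = root.find(path)         # resolve the path relative to the root
            if target is not None:
                protected.add(target)
    return [
        child for child in root
        if not any(node in protected for node in child.iter())
    ]


doc = ET.fromstring(
    '<doc><address>Tokyo</address><sender ref="./address"/><memo>m</memo></doc>'
)
candidates = deletable_children(doc)
```

Here `address` is referenced by `sender`, so only `sender` and `memo` remain as deletion candidates.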
[0120]
Further, a priority order may be determined for each element. Thereby, a desired DOM tree can be created.
[0121]
In addition, when an element to be deleted is designated from the upper XML application 50, a node constituting a tree starting from the designated element can be preferentially deleted.
[0122]
Note that a plurality of deletion processes may be executed, or any one of them may be executed.
[0123]
Such a deletion method may be set in the XML processor 10 in advance, may be designated from the upper XML application 50, or may be arbitrarily set by the user.
[0124]
[Second Embodiment]
In the first embodiment, the stop condition is satisfied when the amount of DOM tree creation (accumulation of the amount of subtree creation) reaches a predetermined amount or when a predesignated element is detected. In the present embodiment, an example in which the analysis amount of the analysis process is used instead of the creation amount of the DOM tree will be described.
[0125]
Note that the hardware configuration and software configuration in the present embodiment are the same as those in the first embodiment, and a description thereof will be omitted.
[0126]
FIG. 12 is a flowchart showing the flow of the DOM tree creation processing routine according to the present embodiment.
[0127]
In this embodiment, after the structured document acquisition process in step 400 and the analysis process in step 402, it is determined in step 404 whether or not a stop condition is satisfied.
[0128]
In the present embodiment, the stop condition is determined to be satisfied when the analysis amount (cumulative) of the analysis process reaches a predetermined amount. The amount of analysis here is the amount of the XML document read and analyzed by the XML processor 10. The predetermined amount can be arbitrarily set in advance by the user.
[0129]
Further, similarly to the first embodiment, it may be set so that the stop condition is determined to be satisfied when an element designated in advance is detected during the analysis processing in step 402. The designation of this element can also be arbitrarily set in advance by the user.
[0130]
Furthermore, the stop condition may be determined to be satisfied in at least one of the following cases: when the analysis amount (cumulative) of the analysis process reaches the predetermined amount, and when a predesignated element is detected.
[0131]
If it is determined in step 404 that the stop condition is not satisfied, the process proceeds to step 408, and a subtree is created based on the analysis result in step 402.
[0132]
If it is determined in step 404 that the stop condition is satisfied, the process proceeds to step 406, and processing when the stop condition is satisfied is performed. Since the processing when the stop condition is satisfied is the same as that in the first embodiment, description thereof is omitted.
[0133]
After the processing when the stop condition is satisfied, the process proceeds to step 408, and the creation of the subtree is resumed based on the analysis result of step 402.
[0134]
In step 410, the created subtree is stored in the first storage unit 14.
[0135]
In step 412, it is determined whether or not the analysis of the entire XML document has been completed. If it is determined that the analysis of the entire XML document has not been completed, the process returns to step 402 to further read the XML document and continue the analysis of the XML document.
[0136]
If it is determined in step 412 that the analysis of the entire XML document has been completed, the process proceeds to step 414, the created DOM tree is passed to the upper XML application 50, and the DOM tree creation process is terminated.
[0137]
As described above, since whether or not the stop condition is satisfied is determined based on the analysis amount of the analysis process, the determination can be made before the subtree creation process and the storage process. The subtree creation process can therefore be stopped and the processing for when the stop condition is satisfied performed, which increases the free capacity of the first storage unit 14.
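The flow of FIG. 12 can be sketched as follows. This is only a minimal Python illustration under stated assumptions, not the implementation of the XML processor 10: the names `build_dom`, `limit_bytes`, and `on_stop` are hypothetical, the document is assumed to arrive in pre-split chunks, the analysis amount is assumed to be counted in UTF-8 bytes and reset after each stop, a list stands in for the first storage unit 14, and a string stands in for a subtree.

```python
from typing import Callable, Iterable, List


def build_dom(chunks: Iterable[str],
              limit_bytes: int,
              on_stop: Callable[[List[str]], None]) -> List[str]:
    """Sketch of steps 402-414: test the stop condition before creating a subtree."""
    first_storage: List[str] = []    # stand-in for the first storage unit 14
    analyzed = 0                     # cumulative analysis amount (assumed: bytes)
    for chunk in chunks:             # step 402: read and analyze the next part
        analyzed += len(chunk.encode("utf-8"))
        if analyzed >= limit_bytes:  # step 404: is the stop condition satisfied?
            on_stop(first_storage)   # step 406: processing when it is satisfied
            analyzed = 0             # assumption: the counter restarts after a stop
        subtree = f"subtree({chunk})"    # step 408: placeholder for subtree creation
        first_storage.append(subtree)    # step 410: store the subtree
    return first_storage             # step 414: hand the result to the application
```

Because the check in step 404 runs before step 408, the callback can free space in the first storage unit before the next subtree is created and stored.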
[0138]
The determination of whether the stop condition is satisfied is not limited to the examples shown in the first and second embodiments. For example, the stop condition may be set to be determined as satisfied when the analysis amount (cumulative) of the analysis process reaches a predetermined amount, when the creation amount of the DOM tree reaches a predetermined amount, and/or when a predesignated element is detected.
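One way to picture such a combined stop condition is the sketch below. It is an illustration only: the class name `StopCondition` and its fields are hypothetical, and the three criteria are combined with a logical OR, which is just one of the combinations the text allows.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class StopCondition:
    """Hypothetical container for the three criteria named in the text."""
    max_analyzed: Optional[int] = None   # threshold for the cumulative analysis amount
    max_created: Optional[int] = None    # threshold for the DOM tree creation amount
    stop_elements: Set[str] = field(default_factory=set)  # predesignated elements

    def is_met(self, analyzed: int, created: int, element: str) -> bool:
        # A criterion left as None (or an empty set) is simply not applied.
        return (
            (self.max_analyzed is not None and analyzed >= self.max_analyzed)
            or (self.max_created is not None and created >= self.max_created)
            or element in self.stop_elements
        )
```

For example, `StopCondition(max_analyzed=100, stop_elements={"appendix"})` ignores the creation amount and is met either once 100 units have been analyzed or when an `appendix` element is detected.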
[0139]
[Third Embodiment]
In the processing when the stop condition is satisfied in the first and second embodiments, an example was described in which the XML processor 10 performs the deletion process or the second storage unit storage process when it acquires a deletion request or a second storage unit storage request from the upper XML application 50. In the present embodiment, an example will be described in which the XML processor 10 performs the deletion process and the second storage unit storage process automatically.
[0140]
FIG. 13 is a flowchart showing a process flow when the stop condition is satisfied according to the present embodiment.
[0141]
In step 500, an element (node) to be deleted is searched for.
[0142]
In step 502, the node searched in step 500 is deleted from the first storage unit 14. Note that the deletion process can be performed in the same manner as in the first embodiment.
[0143]
In step 504, it is determined whether or not the information of the deleted node is to be stored in the second storage unit 18. For example, if the setting is such that a node deleted from the first storage unit 14 is always stored in the second storage unit 18, the XML processor 10 makes an affirmative determination in step 504, proceeds to step 506, and executes the second storage unit storage process for the deleted node. The details of the flow of the second storage unit storage process in step 506 are as described above, so the description is omitted.
[0144]
Note that it may be set so that only elements designated in advance are stored in the second storage unit. In this case, the second storage unit storing process is executed only when the specified element node and the Text node indicating the contents of the element are deleted.
[0145]
If it is set not to perform any storage in the second storage unit 18, a negative determination is always made in step 504, and the process proceeds to step 508.
[0146]
In step 508, it is determined whether or not the storage amount of the DOM tree stored in the first storage unit 14 has become equal to or less than the specified amount. If it is determined that the storage amount is not equal to or less than the specified amount, the process returns to step 500 and the above-described processing is repeated. The processing from step 500 to step 508 is repeated until an affirmative determination is made in step 508.
[0147]
If the determination in step 508 is affirmative, in step 510, the upper XML application 50 is notified that the storage amount of the DOM tree has become equal to or less than the specified amount.
[0148]
In step 520, it is determined whether or not a request to resume processing (resume request) has been acquired from the upper XML application 50. Because the standby state is maintained until the resume request is acquired, the next process (the analysis process in the first embodiment, the subtree creation process in the second embodiment) and all processes after it remain stopped. When the resume request is acquired, the process returns to step 116 in FIG. 5 or step 408 in FIG. 12 and processing resumes.
[0149]
As described above, the XML processor 10 can decide by itself what to do when the stop condition is satisfied, and can perform the deletion process and the storage process without acquiring a deletion request or a second storage unit storage request from the upper XML application 50.
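The loop of steps 500 to 508 in FIG. 13 can be sketched as follows. This is a hedged illustration rather than the described implementation: the function name `free_first_storage`, the `keep_copy` callback, and the "largest node first" search policy are assumptions, and both storage units are modeled as dictionaries mapping a node name to its size.

```python
from typing import Callable, Dict, List


def free_first_storage(first_storage: Dict[str, int],
                       second_storage: Dict[str, int],
                       limit: int,
                       keep_copy: Callable[[str], bool]) -> List[str]:
    """Delete nodes until the stored amount is at or below `limit`."""
    deleted: List[str] = []
    while first_storage:
        # Step 500: search for a node to delete (assumed policy: largest first).
        name = max(first_storage, key=lambda k: first_storage[k])
        # Step 502: delete the node from the first storage unit.
        size = first_storage.pop(name)
        # Steps 504 and 506: optionally keep the deleted node in the second storage unit.
        if keep_copy(name):
            second_storage[name] = size
        deleted.append(name)
        # Step 508: finish once the remaining amount is at or below the limit.
        if sum(first_storage.values()) <= limit:
            break
    return deleted  # step 510 would notify the upper application at this point
```

Passing `keep_copy=lambda name: True` corresponds to the setting in which every deleted node is stored in the second storage unit, and `lambda name: False` to the setting in which nothing is stored there.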
[0150]
[Fourth Embodiment]
In the present embodiment, an example will be described in which a designated element is stored in the second storage unit 18 in the text format of the original XML document, without creating a subtree, regardless of whether or not the stop condition is satisfied.
[0151]
Steps shown in FIG. 14 are added to the DOM tree creation processing routine in the first to third embodiments.
[0152]
After step 102 in FIG. 5 or step 402 in FIG. 12, it is determined in step 600 whether or not the analyzed element is designated in advance to be stored in the second storage unit 18 in the text format before subtree creation.
[0153]
Here, if a negative determination is made, the process proceeds to step 104 in FIG. 5 or step 404 in FIG. 12 without performing storage in the second storage unit 18.
[0154]
If an affirmative determination is made in step 600, the analyzed element is stored in the second storage unit 18 in the text format in step 602. After the storage, the process proceeds to step 116 in FIG. 5 or step 412 in FIG. 12.
[0155]
As described above, since elements designated in advance are stored in the second storage unit 18 without creating a subtree, the storage amount of the DOM tree in the first storage unit 14 can be reduced and unnecessary subtree creation is avoided. The DOM tree can therefore be created efficiently.
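The branch added by steps 600 and 602 can be sketched as follows. The sketch assumes, purely for illustration, that the analysis process yields pairs of an element name and its original text; the function name `route_elements` and the dictionary used as a placeholder subtree are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def route_elements(elements: Iterable[Tuple[str, str]],
                   text_only: Set[str]) -> Tuple[List[dict], Dict[str, List[str]]]:
    """Keep designated elements as raw text and build subtrees for the rest."""
    first_storage: List[dict] = []              # subtrees (first storage unit 14)
    second_storage: Dict[str, List[str]] = {}   # raw text (second storage unit 18)
    for name, raw in elements:                  # output of the analysis process
        if name in text_only:                   # step 600: designated element?
            # Step 602: store the original text; no subtree is created.
            second_storage.setdefault(name, []).append(raw)
            continue
        # Otherwise continue with normal subtree creation (placeholder subtree).
        first_storage.append({"name": name, "source": raw})
    return first_storage, second_storage
```

Only the elements that are not designated consume space in the first storage unit, which is the saving the embodiment aims at.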
[0156]
[Fifth Embodiment]
In the present embodiment, an example will be described in which a created partial tree is always stored in the second storage unit 18 for a designated element regardless of whether or not a stop condition is satisfied.
[0157]
Steps shown in FIG. 15 are added to the DOM tree creation processing routine in the first to third embodiments.
[0158]
After step 106 in FIG. 5 or step 408 in FIG. 12, in step 700, it is determined whether or not the element for which the subtree was created is an element designated in advance to be stored in the second storage unit 18.
[0159]
Here, if a negative determination is made, the process proceeds to step 108 in FIG. 5 or step 410 in FIG. 12 without storing in the second storage unit 18.
[0160]
If an affirmative determination is made in step 700, the process proceeds to the second storage unit storage process in step 702, and the created subtree is stored in the second storage unit 18. Since the second storage unit storage process is the same as that of the first embodiment, the description thereof is omitted. After the storage, the process proceeds to step 116 in FIG. 5 or step 412 in FIG. 12.
[0161]
As described above, since the created subtree is stored in the second storage unit 18 for an element designated in advance regardless of whether or not the stop condition is satisfied, the storage amount of the DOM tree in the first storage unit 14 can be reduced.
[0162]
[Sixth Embodiment]
In the present embodiment, an example will be described in which, when a plurality of elements having the same element name are stored in the second storage unit 18, a predetermined number is associated with that element name and the element name is replaced with the associated number before storage.
[0163]
FIG. 16 is a diagram showing an example of an XML document in which a plurality of the same element names are described. As shown in the figure, a plurality of elements named catalog and bookname are described.
[0164]
FIG. 17 is a diagram showing a DOM tree created based on the XML document of FIG. 16.
[0165]
Here, a predetermined number is associated with each of the element names “catalog” and “bookname”. FIG. 18 is an example of a table storing the correspondence. Number 1 is associated with the element name “catalog”, and number 2 is associated with the element name “bookname”. When there are a plurality of element names having the same name, the XML processor 10 associates a predetermined number with the element name and sequentially registers it in this table.
[0166]
FIG. 19 shows a state in which each element name is replaced with its associated number using the table shown in FIG. 18. As shown in the figure, the element name “catalog” is replaced with the number 1, and the element name “bookname” is replaced with the number 2.
[0167]
In this way, the storage amount can be reduced by replacing a plurality of identical element names with associated numbers and storing them.
[0168]
Note that this applies not only to element names but also to the names of attributes (attribute names; an attribute is information indicating a characteristic of an element) attached to elements. When the same attribute name appears a plurality of times, a number can be associated with it using the table in the same manner, and the attribute name can be replaced with that number and stored.
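The replacement of repeated names with numbers can be sketched as follows. This is a minimal illustration: the function name `encode_names` is hypothetical, numbers are assumed to be assigned from 1 in order of first appearance, and the input is assumed to be a flat list of names, which may be element names or attribute names.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union


def encode_names(names: List[str]) -> Tuple[List[Union[int, str]], Dict[str, int]]:
    """Replace every name that occurs more than once with an associated number."""
    counts = Counter(names)
    table: Dict[str, int] = {}           # name -> number
    encoded: List[Union[int, str]] = []
    for name in names:
        if counts[name] > 1:
            # Register the name on first sight; reuse its number afterwards.
            number = table.setdefault(name, len(table) + 1)
            encoded.append(number)
        else:
            encoded.append(name)         # unique names are stored unchanged
    return encoded, table
```

For the names `catalog`, `book`, `bookname`, `catalog`, `bookname`, the table associates `catalog` with 1 and `bookname` with 2, and the unique name `book` is left as it is.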
[0169]
[Effect of the Invention]
As described above, the present invention repeats an analysis process for analyzing a document in which a plurality of elements are described hierarchically, a creation process for creating, based on the analysis result of the analysis process, an object expressed in a tree structure with each element constituting the document as a node, and a storage process for storing the object created by the creation process. When it is determined during this repetition that a predetermined stop condition is satisfied, at least one of the analysis process, the creation process, and the storage process is stopped. The invention therefore has the effect that processing can be stopped according to the processing state before an error occurs because the created tree-structure object cannot be stored due to insufficient storage capacity.
[Brief description of the drawings]
FIG. 1A is a diagram conceptually showing a functional configuration of a structured document processing system according to the present embodiment and processing performed in the structured document processing system. FIG. 1B is a diagram showing in more detail the functional configuration of the structured document processing system shown in FIG. 1A and the processing performed in the structured document processing system.
FIG. 2 is a diagram showing an example of an XML document.
FIG. 3 is a diagram showing an example of a DOM tree created by analyzing the XML document of FIG. 2.
FIG. 4 is a block diagram showing a configuration of an information processing apparatus as hardware resources for realizing the functions of an XML processor.
FIG. 5 is a flowchart showing a flow of a DOM tree creation processing routine according to the first embodiment.
FIG. 6 is a flowchart showing a flow of second storage unit storage processing.
FIG. 7 is a flowchart showing a flow of processing when a stop condition is satisfied.
FIG. 8 is a diagram showing an example of an XML document.
FIG. 9 is a diagram showing an example of a DOM tree created from the XML document of FIG. 8 and stored in the first storage unit.
FIG. 10 is an example of a table defining Namespace.
FIG. 11 is a diagram showing an example of an XML document described using a defined prefix.
FIG. 12 is a flowchart showing a flow of a DOM tree creation processing routine according to the second embodiment.
FIG. 13 is a flowchart showing a flow of processing when a stop condition is satisfied according to the third embodiment.
FIG. 14 is a diagram showing additional steps according to the fourth embodiment.
FIG. 15 is a diagram showing additional steps according to the fifth embodiment.
FIG. 16 is a diagram showing an example of an XML document in which a plurality of the same element names are described.
FIG. 17 is a diagram showing a DOM tree created based on the XML document of FIG. 16.
FIG. 18 is an example of a table storing associations between element names and numbers.
FIG. 19 is a diagram showing a state in which each element name is replaced with an associated number using the table shown in FIG. 18.
[Explanation of symbols]
10 XML processor
11 Information processing device
14 First storage unit
18 Second storage unit
20 CPU
24 ROM
50 Upper XML application

Claims

An information processing apparatus comprising:
Input means for inputting a document in which a plurality of elements are described hierarchically;
Processing means for performing an analysis process for analyzing the document input by the input means, a creation process for creating, based on an analysis result of the analysis process, an object expressed in a tree structure with each element constituting the document as a node, and a storage process for storing the object created by the creation process;
Determination means for determining whether or not a predetermined stop condition is satisfied; and
Control means for controlling the processing means so that at least one of the analysis process, the creation process, and the storage process is stopped when the determination means determines that the stop condition is satisfied.

The information processing apparatus according to claim 1, wherein the determination means determines that the stop condition is satisfied in at least one of the following cases: when the analysis amount of the analysis process reaches a predetermined amount, when the creation amount of the creation process reaches a predetermined amount, and when a predetermined element is detected by the analysis process.

The information processing apparatus according to claim 1, further comprising a deletion unit that deletes a node of the stored object after the stop condition is satisfied.

The information processing apparatus according to claim 3, wherein the deletion unit deletes the node of the stored object when receiving an instruction from the outside after the stop condition is satisfied.

The information processing apparatus according to claim 3 or 4, wherein the deletion means, when deleting a node, executes at least one of the following:
Preferentially deleting a node deep in the hierarchy over a node above that node;
Deleting the nodes constituting a tree rooted at an element with a low reference count;
Preferentially deleting the nodes constituting a tree rooted at an element for which a long time has elapsed since it was last referenced;
Preferentially deleting nodes other than the nodes constituting a tree rooted at an element referenced by another element;
Preferentially deleting the nodes constituting a tree rooted at a predesignated element;
Deleting each element belonging to a predefined set;
Deleting according to a predetermined priority; and
Deleting the nodes constituting a tree rooted at an externally designated element.

The information processing apparatus according to any one of claims 3 to 5, wherein the control means further performs control to store the information of the node deleted by the deletion means in a storage area different from the storage area used in the storage process.

The information processing apparatus according to claim 6, wherein the control means performs control to store the information in the different storage area when receiving an instruction from the outside.

The information processing apparatus according to claim 6 or 7, wherein the control unit performs control so as to store information of the node deleted by the deletion unit as text data.

The information processing apparatus according to any one of claims 6 to 8, wherein, when there are a plurality of the same element names or a plurality of the same attribute names, the control means performs control so that the element names or attribute names are replaced with associated numbers and stored.

The information processing apparatus according to any one of claims 1 to 9, further comprising restart condition determination means for determining whether or not a predetermined restart condition is satisfied,
wherein the control means controls the processing means so as to cancel the stop and restart the stopped process when, after the stop condition is satisfied, the restart condition determination means determines that the restart condition is satisfied.

The information processing apparatus according to claim 10, wherein the restart condition determination means determines that the restart condition is satisfied in at least one of a case where a restart instruction is received from the outside and a case where the storage amount of the object stored by the storage process is equal to or less than a predetermined amount.

The information processing apparatus according to any one of claims 1 to 11, wherein, when an element designated in advance is detected by the analysis process, the processing means does not perform the creation process for that element and the elements subordinate to it.

The information processing apparatus according to claim 12, wherein the storage process of the processing means stores information on the element for which the creation process was not performed, and on the elements subordinate to it, in a storage area different from the storage area that stores the created object.

An information processing method comprising:
An input step of inputting a document in which a plurality of elements are described hierarchically;
A processing step of performing an analysis process for analyzing the document input in the input step, a creation process for creating, based on an analysis result of the analysis process, an object expressed in a tree structure with each element constituting the document as a node, and a storage process for storing the object created by the creation process;
A determination step of determining whether or not a predetermined stop condition is satisfied; and
A stop step of stopping at least one of the analysis process, the creation process, and the storage process when it is determined in the determination step that the stop condition is satisfied.

The information processing method according to claim 14, further comprising a deletion step of deleting a node of the stored object after the stop condition is satisfied.

The information processing method according to claim 15, further comprising a storage step of storing the information of the node deleted by the deletion step in a storage area different from the storage area stored in the storage process.

The information processing method according to any one of claims 14 to 16, further comprising:
A restart condition determination step of determining whether or not a predetermined restart condition is satisfied; and
A restart step of releasing the stop and restarting the stopped process when, after the stop condition is satisfied, it is determined in the restart condition determination step that the restart condition is satisfied.

A program for causing a computer to execute:
An input step of inputting a document in which a plurality of elements are described hierarchically;
A processing step of performing an analysis process for analyzing the document input in the input step, a creation process for creating, based on an analysis result of the analysis process, an object expressed in a tree structure with each element constituting the document as a node, and a storage process for storing the object created by the creation process;
A determination step of determining whether or not a predetermined stop condition is satisfied; and
A stop step of stopping at least one of the analysis process, the creation process, and the storage process when it is determined in the determination step that the stop condition is satisfied.

The program according to claim 18, further comprising a deletion step of deleting the node of the stored object after the stop condition is satisfied.

The program according to claim 19, further comprising a storage step of storing information on the node deleted by the deletion step in a storage area different from the storage area stored in the storage process.

The program according to any one of claims 18 to 20, further comprising:
A restart condition determination step of determining whether or not a predetermined restart condition is satisfied; and
A restart step of releasing the stop and restarting the stopped process when, after the stop condition is satisfied, it is determined in the restart condition determination step that the restart condition is satisfied.