JP4134824B2

JP4134824B2 - Information processing apparatus and program

Info

Publication number: JP4134824B2
Application number: JP2003176823A
Authority: JP
Inventors: 直子佐藤; 昌俊田川; 正義榊原; 雅紀佐竹; 芳幸内藤
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2003-06-20
Filing date: 2003-06-20
Publication date: 2008-08-20
Anticipated expiration: 2023-06-20
Also published as: JP2005011215A

Description

【０００１】
【発明の属する技術分野】
本発明は、複数の要素が階層的に記述された文書を処理する情報処理装置及びプログラムに関し、特に、該文書から作成された木構造を記憶するために必要な記憶容量を削減する情報処理装置及びプログラムに関するものである。
【０００２】
【従来の技術】
計算機あるいはアプリケーションによってばらばらであったデータ形式を、異なる計算機やアプリケーションでも使用できるようにするための統一規格として、ＸＭＬ（eXtensible Markup Language）が知られている。ＸＭＬは、複数の要素を階層的に記述する構造化文書の代表的な規格である。
【０００３】
ＸＭＬ文書のような構造化文書をアプリケーションで操作するためのＡＰＩ（Application Programming Interface）として、ＤＯＭ（Document Object Model）が知られている。ＤＯＭは、構造化文書を木構造のオブジェクトとして扱うためのＡＰＩである。
【０００４】
構造化文書をＤＯＭに変換することにより、アプリケーションは、構造化文書の木構造を認識でき、木をたどって必要な要素にアクセスすることができる。
【０００５】
しかし、構造化文書をＤＯＭに変換する場合、構造化文書のサイズが大きいほど、変換後の木構造が大きくなり、これを記憶するために多くの記憶容量が必要となる。また、階層が深い場合にも、階層構造を表すための記憶容量が多く必要となる。従って、木構造に変換しても容量不足により記憶装置に格納できずに、エラーとなって処理が中断する、あるいは格納できたとしてもアクセス効率が悪くなってしまう、等の問題が発生する。
【０００６】
必要な記憶容量を削減するための装置としては、構造化文書の階層を浅くすることにより動作記憶容量を削減し、データアクセス効率を改善する変換装置（例えば、特許文献１を参照。）や、相対的に同じ位置にある同じ要素名の複数の要素の内容を接合して、一つの新しい要素を生成する構造化文書変換装置（例えば、特許文献２を参照。）が知られている。
【０００７】
【特許文献１】
特開２００２−２９７５６９号公報
【特許文献２】
特開２００２−１０８８５０号公報
【０００８】
【発明が解決しようとする課題】
しかしながら、従来の変換装置及び構造化文書変換装置では、構造化文書の階層構造自体を変化させるため、構造化文書を操作するアプリケーションは階層構造の変化を認識していなくてはならない、という問題がある。また、従来の複数の要素を合成する構造化文書変換装置では、相対的に同じ階層位置に同じ要素名の要素が存在する場合でないと合成できない、という問題もある。
【０００９】
本発明は、上述した問題を解決するために提案されたものであり、複数の要素を階層的に記述する文書から作成された木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる情報処理装置及びプログラムを提供することを目的とする。
【００１０】
【課題を解決するための手段】
上記目的を達成するために、本発明の第１の情報処理装置は、複数の要素が階層的に記述された文書に対応する木構造を記憶した記憶手段と、前記記憶された木構造において、要素、要素の内容、及び要素の階層構造が同一の同一部分を検出する同一部分検出手段と、前記同一部分が検出された場合に、特定の同一部分の記憶場所を示す情報を生成する生成手段と、前記同一部分が検出された場合に、前記記憶手段から前記特定の同一部分以外の同一部分を削除すると共に、削除した同一部分に対応させて前記特定の同一部分の記憶場所を示す情報を前記記憶手段に記憶するように処理する処理手段と、を含んで構成されている。
【００１１】
本発明の第１の情報処理装置の記憶手段には、複数の要素が階層的に記述された文書に対応する木構造が記憶される。記憶手段は特に限定されず、例えば、メインメモリとして一般的に用いられるＲＡＭであってもよい。木構造は、複数の要素や要素の内容をノードとする木構造とすることができる。
【００１２】
複数の要素が階層的に記述された文書は、例えば、ＸＭＬ文書等に代表される構造化文書とすることができる。
【００１３】
同一部分検出手段は、記憶された木構造において、要素、要素の内容、及び要素の階層構造が同一の同一部分を検出する。生成手段は、同一部分が検出された場合に、特定の同一部分の記憶場所を示す情報を生成する。
【００１４】
特定の同一部分は、同一部分検出手段により検出された複数の同一部分のいずれであってもよく、特に限定されない。
【００１５】
処理手段は、同一部分が検出された場合に、記憶手段から特定の同一部分以外の同一部分を削除するように処理する。処理手段は、削除した同一部分に対応させて特定の同一部分の記憶場所を示す情報を記憶手段に記憶するように処理する。
【００１６】
すなわち、特定の同一部分以外の同一部分については、木構造が削除され、代わりに特定の同一部分の記憶場所を示す情報が記憶される。
【００１７】
記憶場所を示す情報のデータ量は小さいため、記憶場所を示す情報を記憶しても記憶手段の記憶量が大幅に増加することはない。一方、木構造のデータ量は大きいため、同一部分の木構造を削除することにより木構造を記憶するために必要な記憶容量が大幅に削減されることとなる。
【００１８】
また、削除された同一部分を参照する場合であっても、特定の同一部分の記憶場所を示す情報を用いて、削除された同一部分の代わりに特定の同一部分の木構造を参照することができるため、木構造に対する操作性が損なわれることはない。
【００１９】
このように、同一部分を共通化できるため、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００２０】
本発明の第１のプログラムは、コンピュータを、複数の要素が階層的に記述された文書に対応する木構造を記憶した記憶手段に記憶された木構造において、要素、要素の内容、及び要素の階層構造が同一の同一部分を検出する同一部分検出手段、前記同一部分が検出された場合に、特定の同一部分の記憶場所を示す情報を生成する生成手段、及び前記同一部分が検出された場合に、前記記憶手段から前記特定の同一部分以外の同一部分を削除すると共に、削除した同一部分に対応させて前記特定の同一部分の記憶場所を示す情報を前記記憶手段に記憶するように処理する処理手段、として機能させる。
【００２１】
本発明の第１のプログラムも、本発明の第１の情報処理装置と同様に作用するため、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００２２】
本発明の第２の情報処理装置は、複数の要素が階層的に記述された文書に対応する木構造を作成する木構造作成手段と、前記作成された木構造において、要素、要素の内容、及び要素の階層構造が同一の同一部分を検出する同一部分検出手段と、前記同一部分が検出された場合に、特定の同一部分の記憶場所を示す情報を生成する生成手段と、前記作成された木構造の同一部分以外の部分及び前記特定の同一部分を記憶すると共に、前記特定の同一部分以外の同一部分に対応させて前記特定の同一部分の記憶場所を示す情報を記憶する記憶手段と、を含んで構成されている。
【００２３】
本発明の第２の情報処理装置の木構造作成手段は、複数の要素が階層的に記述された文書に対応する木構造を作成する。
【００２４】
また、第２の情報処理装置の同一部分検出手段は、作成された木構造において、同一部分を検出する。生成手段は、同一部分が検出された場合に、特定の同一部分の記憶場所を示す情報を生成する。
【００２５】
記憶手段は、作成された木構造の同一部分以外の部分及び特定の同一部分を記憶する。更に、特定の同一部分以外の同一部分に対応させて、特定の同一部分の記憶場所を示す情報を記憶する。
【００２６】
すなわち、検出された同一部分については、特定の同一部分のみ木構造として記憶手段に記憶され、他の同一部分については、木構造の代わりに特定の同一部分の記憶場所を示す情報が記憶される。
【００２７】
このように、同一部分の木構造を記憶しないことにより、木構造を記憶するために必要な記憶容量が大幅に削減されることとなる。
【００２８】
また、記憶されなかった同一部分を参照する場合であっても、特定の同一部分の記憶場所を示す情報を用いて、記憶されなかった同一部分の代わりに特定の同一部分の木構造を参照することができるため、木構造に対する操作性が損なわれることはない。
【００２９】
このように、同一部分を共通化できるため、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００３０】
本発明の第２のプログラムは、コンピュータを、複数の要素が階層的に記述された文書に対応する木構造を作成する木構造作成手段、前記作成された木構造において、要素、要素の内容、及び要素の階層構造が同一の同一部分を検出する同一部分検出手段、前記同一部分が検出された場合に、特定の同一部分の記憶場所を示す情報を生成する生成手段、及び前記作成された木構造の同一部分以外の部分及び前記特定の同一部分を記憶手段に記憶すると共に、前記特定の同一部分以外の同一部分に対応させて前記特定の同一部分の記憶場所を示す情報を前記記憶手段に記憶する処理手段、として機能させる。
【００３１】
本発明の第２のプログラムも、本発明の第２の情報処理装置と同様に作用するため、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００３２】
本発明の第３の情報処理装置は、複数の要素が階層的に記述された文書に対応する木構造を作成する木構造作成手段と、前記木構造作成手段で新たに木構造が作成される毎に、記憶手段に既に記憶されている木構造と前記木構造作成手段で新たに作成された木構造とを比較し、前記木構造作成手段で新たに作成された木構造において前記記憶手段に既に記憶されている木構造と要素、要素の内容、及び要素の階層構造が同一の同一部分を検出する同一部分検出手段と、前記同一部分が検出された場合に、前記記憶手段に既に記憶されている木構造の前記同一部分の記憶場所を示す情報を生成する生成手段と、前記同一部分が検出された場合に、前記作成された木構造の同一部分以外の部分を前記記憶手段に記憶するように処理すると共に、及び前記作成された木構造の同一部分に対応させて前記同一部分の記憶場所を示す情報を前記記憶手段に記憶するように処理する処理手段と、を含んで構成されている。
【００３４】
木構造作成手段は、複数の要素が階層的に記述された文書に対応する木構造を作成する。
【００３５】
同一部分検出手段は、木構造作成手段で新たに木構造が作成される毎に、記憶手段に既に記憶されている木構造と木構造作成手段で新たに作成された木構造とを比較し、木構造作成手段で新たに作成された木構造において該記憶手段に既に記憶されている木構造と要素、要素の内容、及び要素の階層構造が同一の同一部分を検出する。
【００３６】
生成手段は、同一部分が検出された場合に、記憶手段に既に記憶されている木構造の同一部分の記憶場所を示す情報を生成する。
【００３７】
処理手段は、同一部分が検出された場合に、作成された木構造の同一部分以外の部分を記憶手段に記憶するように処理する。また、処理手段は、作成された木構造の同一部分に対応させて、記憶された木構造の同一部分の記憶場所を示す情報を記憶手段に記憶するように処理する。
【００３８】
すなわち、検出された同一部分については、記憶手段に既に記憶されている同一部分の記憶場所を示す情報が木構造の代わりに記憶される。
【００３９】
このように、同一部分の木構造を記憶しないことにより、木構造を記憶するために必要な記憶容量が大幅に削減されることとなる。
【００４０】
また、記憶されなかった同一部分を参照する場合であっても、記憶手段に既に記憶されていた同一部分の記憶場所を示す情報を用いて、記憶されなかった同一部分の代わりに該同一部分の木構造を参照することができるため、木構造に対する操作性が損なわれることはない。
【００４１】
このように、同一部分を共通化できるため、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００４２】
本発明の第３のプログラムは、コンピュータを、複数の要素が階層的に記述された文書に対応する木構造を作成する木構造作成手段、前記木構造作成手段で新たに木構造が作成される毎に、記憶手段に既に記憶されている木構造と前記木構造作成手段で新たに作成された木構造とを比較し、前記木構造作成手段で新たに作成された木構造において前記記憶手段に既に記憶されている木構造と要素、要素の内容、及び要素の階層構造が同一の同一部分を検出する同一部分検出手段、前記同一部分が検出された場合に、前記記憶手段に既に記憶されている木構造の前記同一部分の記憶場所を示す情報を生成する生成手段、及び
前記同一部分が検出された場合に、前記作成された木構造の同一部分以外の部分を前記記憶手段に記憶するように処理すると共に、及び前記作成された木構造の同一部分に対応させて前記同一部分の記憶場所を示す情報を前記記憶手段に記憶するように処理する処理手段、として機能させる。
【００４３】
本発明の第３のプログラムも、本発明の第３の情報処理装置と同様に作用するため、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００４４】
本発明の第４の情報処理装置は、複数の要素が階層的に記述された文書に対応する木構造を記憶した記憶手段と、前記記憶された木構造の中から、要素名が同一の要素を含む部分を抽出する第１抽出手段と、前記第１抽出手段により抽出された部分の各々の中から、前記要素名が同一の要素以外の部分を相違部分として抽出する第２抽出手段と、前記第２抽出手段により抽出された相違部分のデータ量が所定値以下の前記第１抽出手段により抽出された部分を、要素、要素の内容、及び要素の階層構造が類似する類似部分であると判断する類似判断手段と、前記類似判断手段により判断された類似部分に基づいて、各類似部分に共通する情報を示す第１データを作成すると共に、前記類似部分の相違部分の情報を示す第２データを前記類似部分毎に作成するデータ作成手段と、前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を生成する生成手段と、前記記憶手段から前記類似部分の全てを削除し、削除した類似部分に対応させて、前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を前記記憶手段に記憶するように処理する処理手段と、を含んで構成されている。
【００４５】
本発明の第４の情報処理装置の記憶手段には、複数の要素が階層的に記述された文書に対応する木構造が記憶される。
【００４６】
第１抽出手段は、記憶手段に記憶された木構造の中から、要素名が同一の要素を含む部分を抽出する。第２抽出手段は、該第１抽出手段により抽出された部分の各々の中から、要素名が同一の要素以外の部分を相違部分として抽出する。そして、類似判断手段は、第２抽出手段により抽出された相違部分のデータ量が所定値以下の第１抽出手段により抽出された部分を、要素、要素の内容、及び要素の階層構造が類似する類似部分であると判断する。
【００４７】
データ作成手段は、類似判断手段により判断された類似部分に基づいて、各類似部分に共通する情報を示す第１データを作成する。更にデータ作成手段は、類似部分の相違部分の情報を示す第２データを類似部分毎に作成する。
【００４８】
第１データは、例えば、各類似部分に含まれる共通のデータにより構成されていてもよい。また、第２データは、例えば、各類似部分から第１データに含まれる情報を除いた情報により構成されていてもよい。
【００４９】
生成手段は、第１データの記憶場所を示す情報及び各第２データの記憶場所を示す情報を生成する。
【００５０】
処理手段は、類似部分が検出された場合に、記憶手段から類似部分の全てを削除するように処理する。更に、削除した類似部分に対応させて、第１データの記憶場所を示す情報及び各第２データの記憶場所を示す情報を記憶手段に記憶するように処理する。
【００５１】
すなわち、類似部分については、木構造が削除され、代わりに第１データの記憶場所を示す情報及び各第２データの記憶場所を示す情報が記憶される。
【００５２】
記憶場所を示す情報のデータ量は小さいため、記憶場所を示す情報を記憶しても記憶手段の記憶量が大幅に増加することはない。一方、木構造のデータ量は大きいため、類似部分の木構造を削除することにより木構造を記憶するために必要な記憶容量が大幅に削減されることとなる。
【００５３】
また、削除された類似部分を参照する場合であっても、第１データの記憶場所を示す情報及び第２データの記憶場所を示す情報を用いて、第１データ及び第２データを読み出すことができるため、削除された類似部分を参照することができる。
【００５４】
このように、類似部分の木構造を削除して、第１データの記憶場所及び第２データの記憶場所を各類似部分に対応させて記憶するため、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００５５】
本発明の第４のプログラムは、コンピュータを、複数の要素が階層的に記述された文書に対応する木構造を記憶した記憶手段に記憶された木構造の中から、要素名が同一の要素を含む部分を抽出する第１抽出手段、前記第１抽出手段により抽出された部分の各々の中から、前記要素名が同一の要素以外の部分を相違部分として抽出する第２抽出手段、前記第２抽出手段により抽出された相違部分のデータ量が所定値以下の前記第１抽出手段により抽出された部分を、要素、要素の内容、及び要素の階層構造が類似する類似部分であると判断する類似判断手段、前記類似判断手段により判断された類似部分に基づいて、各類似部分に共通する情報を示す第１データを作成すると共に、前記類似部分の相違部分の情報を示す第２データを前記類似部分毎に作成するデータ作成手段、前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を生成する生成手段、及び
前記記憶手段から前記類似部分の全てを削除し、削除した類似部分に対応させて、前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を前記記憶手段に記憶するように処理する処理手段、として機能させる。
【００５６】
本発明の第４のプログラムも、本発明の第４の情報処理装置と同様に作用するため、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００５７】
本発明の第５の情報処理装置は、複数の要素が階層的に記述された文書に対応する木構造を作成する木構造作成手段と、前記作成された木構造の中から、要素名が同一の要素を含む部分を抽出する第１抽出手段と、前記第１抽出手段により抽出された部分の中から、前記要素名が同一の要素以外の部分を相違部分として抽出する第２抽出手段と、前記第２抽出手段により抽出された相違部分のデータ量が所定値以下の前記第１抽出手段により抽出された部分を、要素、要素の内容、及び要素の階層構造が類似する類似部分であると判断する類似判断手段と、前記類似判断手段により判断された類似部分に基づいて、各類似部分に共通する情報を示す第１データを作成すると共に、前記類似部分の相違部分の情報を示す第２データを前記類似部分毎に作成するデータ作成手段と、前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を生成する生成手段と、前記作成された木構造の前記類似部分以外の部分を記憶手段に記憶すると共に、前記作成された木構造の類似部分に対応させて前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を前記記憶手段に記憶するように処理する処理手段と、を含んで構成されている。
【００５８】
本発明の第５の情報処理装置の木構造作成手段は、複数の要素が階層的に記述された文書に対応する木構造を作成する。
【００５９】
第１抽出手段は、作成された木構造の中から、要素名が同一の要素を含む部分を抽出する。第２抽出手段は、第１抽出手段により抽出された部分の中から、要素名が同一の要素以外の部分を相違部分として抽出する。類似判断手段は、第２抽出手段により抽出された相違部分のデータ量が所定値以下の第１抽出手段により抽出された部分を、要素、要素の内容、及び要素の階層構造が類似する類似部分であると判断する。
【００６０】
データ作成手段は、類似判断手段により判断された類似部分に基づいて、各類似部分に共通する情報を示す第１データを作成する。更にデータ作成手段は、類似部分の相違部分の情報を示す第２データを類似部分毎に作成する。生成手段は、第１データの記憶場所を示す情報及び各第２データの記憶場所を示す情報を生成する。
【００６１】
処理手段は、上記作成された木構造の類似部分以外の部分を記憶手段に記憶すると共に、上記作成された木構造の類似部分に対応させて第１データの記憶場所を示す情報及び各第２データの記憶場所を示す情報を記憶手段に記憶するように処理する。
【００６２】
すなわち、類似部分については、木構造は記憶されずに、代わりに第１データの記憶場所を示す情報及び各第２データの記憶場所を示す情報が記憶される。
【００６３】
このように、類似部分の木構造を記憶しないことにより、木構造を記憶するために必要な記憶容量が大幅に削減されることとなる。
【００６４】
また、記憶されなかった類似部分を参照する場合であっても、第１データの記憶場所及び第２データの記憶場所を示す情報を用いて、第１データ及び第２データを読み出すことができるため、記憶されなかった類似部分を参照することができる。
【００６５】
このように、類似部分の木構造は記憶せず、第１データの記憶場所及び第２データの記憶場所を各類似部分に対応させて記憶するため、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００６６】
本発明の第５のプログラムは、コンピュータを、複数の要素が階層的に記述された文書に対応する木構造を作成する木構造作成手段、前記作成された木構造の中から、要素名が同一の要素を含む部分を抽出する第１抽出手段、前記第１抽出手段により抽出された部分の中から、前記要素名が同一の要素以外の部分を相違部分として抽出する第２抽出手段、前記第２抽出手段により抽出された相違部分のデータ量が所定値以下の前記第１抽出手段により抽出された部分を、要素、要素の内容、及び要素の階層構造が類似する類似部分であると判断する類似判断手段、前記類似判断手段により判断された類似部分に基づいて、各類似部分に共通する情報を示す第１データを作成すると共に、前記類似部分の相違部分の情報を示す第２データを前記類似部分毎に作成するデータ作成手段、前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を生成する生成手段、及び前記作成された木構造の前記類似部分以外の部分を記憶手段に記憶すると共に、前記作成された木構造の類似部分に対応させて前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を前記記憶手段に記憶するように処理する処理手段、として機能させる。
【００６７】
本発明の第５のプログラムも、本発明の第５の情報処理装置と同様に作用するため、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００６８】
本発明の第６の情報処理装置は、複数の要素が階層的に記述された文書に対応する木構造を作成する木構造作成手段と、前記木構造作成手段で新たに木構造が作成される毎に、記憶手段に既に記憶されている木構造と前記木構造作成手段で新たに作成された木構造とを比較し、前記木構造作成手段で新たに作成された木構造から、前記記憶手段に既に記憶されている木構造に含まれる要素と要素名が同一の要素を含む部分を抽出する第１抽出手段と、前記第１抽出手段により抽出された部分の中から、前記要素名が同一の要素以外の部分を相違部分として抽出する第２抽出手段と、前記第２抽出手段により抽出された相違部分のデータ量が所定値以下の前記第１抽出手段により抽出された部分を、要素、要素の内容、及び要素の階層構造が類似する類似部分であると判断する類似判断手段と、前記類似判断手段により判断された類似部分に基づいて、各類似部分に共通する情報を示す第１データを作成すると共に、前記類似部分の相違部分の情報を示す第２データを前記類似部分毎に作成するデータ作成手段と、前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を生成する生成手段と、前記作成された木構造の前記類似部分以外の部分を前記記憶手段に記憶すると共に、前記記憶手段から前記類似部分を削除し、該削除した部分及び前記作成された木構造の類似部分に対応させて、前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を前記記憶手段に記憶するように処理する処理手段と、を含んで構成されている。
【００７０】
木構造作成手段は、複数の要素が階層的に記述された文書に対応する木構造を作成する。
【００７１】
第１抽出手段は、木構造作成手段で新たに木構造が作成される毎に、記憶手段に既に記憶されている木構造と該木構造作成手段で新たに作成された木構造とを比較し、該木構造作成手段で新たに作成された木構造から、記憶手段に既に記憶されている木構造に含まれる要素と要素名が同一の要素を含む部分を抽出する。第２抽出手段は、第１抽出手段により抽出された部分の中から、該要素名が同一の要素以外の部分を相違部分として抽出する。類似判断手段は、第２抽出手段により抽出された相違部分のデータ量が所定値以下の第１抽出手段により抽出された部分を、要素、要素の内容、及び要素の階層構造が類似する類似部分であると判断する。
【００７２】
データ作成手段は、類似判断手段により判断された類似部分に基づいて、各類似部分に共通する情報を示す第１データを作成する。更にデータ作成手段は、類似部分の相違部分の情報を示す第２データを類似部分毎に作成する。生成手段は、第１データの記憶場所を示す情報及び各第２データの記憶場所を示す情報を生成する。
【００７３】
処理手段は、上記作成された木構造の類似部分以外の部分を記憶手段に記憶すると共に、記憶手段から類似部分を削除し、該削除した部分及び作成された木構造の類似部分に対応させて、第１データの記憶場所を示す情報及び各第２データの記憶場所を示す情報を記憶手段に記憶するように処理する。
【００７４】
すなわち、作成された木構造の類似部分については、木構造は記憶されずに、代わりに第１データの記憶場所を示す情報及び第２データの記憶場所を示す情報が記憶される。また、既に記憶されている類似部分については、木構造が削除されて、代わりに第１データの記憶場所を示す情報、及び第２データの記憶場所を示す情報が記憶される。
【００７５】
このように、作成された類似部分の木構造を記憶しないと共に、記憶されていた類似部分の木構造を削除することにより、木構造を記憶するために必要な記憶容量が大幅に削減されることとなる。
【００７６】
また、記憶されなかったあるいは削除された類似部分を参照する場合であっても、第１データの記憶場所及び第２データの記憶場所を示す情報を用いて、第１データ及び第２データを読み出すことができるため、記憶されなかったあるいは削除された類似部分を参照することができる。
【００７７】
このような構成によっても、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００７８】
本発明の第６のプログラムは、コンピュータを、複数の要素が階層的に記述された文書に対応する木構造を作成する木構造作成手段と、前記木構造作成手段で新たに木構造が作成される毎に、記憶手段に既に記憶されている木構造と前記木構造作成手段で新たに作成された木構造とを比較し、前記木構造作成手段で新たに作成された木構造から、前記記憶手段に既に記憶されている木構造に含まれる要素と要素名が同一の要素を含む部分を抽出する第１抽出手段と、前記第１抽出手段により抽出された部分の中から、前記要素名が同一の要素以外の部分を相違部分として抽出する第２抽出手段と、前記第２抽出手段により抽出された相違部分のデータ量が所定値以下の前記第１抽出手段により抽出された部分を、要素、要素の内容、及び要素の階層構造が類似する類似部分であると判断する類似判断手段と、前記類似判断手段により判断された類似部分に基づいて、各類似部分に共通する情報を示す第１データを作成すると共に、前記類似部分の相違部分の情報を示す第２データを前記類似部分毎に作成するデータ作成手段と、前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を生成する生成手段と、前記作成された木構造の前記類似部分以外の部分を前記記憶手段に記憶すると共に、前記記憶手段から前記類似部分を削除し、該削除した部分及び前記作成された木構造の類似部分に対応させて、前記第１データの記憶場所を示す情報及び前記各第２データの記憶場所を示す情報を前記記憶手段に記憶するように処理する処理手段と、として機能させる。
【００７９】
本発明の第６のプログラムも、本発明の第６の情報処理装置と同様に作用するため、木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる。
【００８０】
なお、上述したプログラムを記憶するための記憶媒体は、ＲＯＭや、ＣＤ−ＲＯＭやＤＶＤディスク、光磁気ディスクやＩＣカード、あるいはハードディスク等であってもよいし、電気通信回線上の搬送波のような伝送媒体であってもよい。
【００８４】
なお、上述した第４〜６のいずれかの情報処理装置において、前記データ作成手段は、前記第１データに前記第２データを挿入可能に作成することができる。
【００８５】
このように第１データを作成することにより、第１データ及び第２データを用いれば、元の類似部分を示すデータを作成することができるため、これに基づいて類似部分の木構造を作成することができる。
【００８６】
前記データ作成手段は、前記第１データに前記第２データを所定の規則に従って変換するための変換式が含まれるように作成することができる。
【００８７】
このように第１データを作成することにより、第１データを用いて第２データを変換することができる。これにより容易に元の類似部分を示すデータを作成することができるため、これに基づいて類似部分の木構造を作成することができる。
【００８８】
また、上述した第４〜６のいずれかの情報処理装置は、前記第１データの記憶場所及び前記第２データの記憶場所が記憶された類似部分が参照された場合に、前記第１データと前記参照された類似部分に対応する第２データとを用いて木構造を作成して前記記憶手段に記憶する類似部分木構造作成手段を更に含んで構成されることができる。
【００８９】
これにより、類似部分が参照された場合に、容易に参照された類似部分の木構造を作成して記憶することができる。なお、ここでいう「参照」は、類似部分の木構造を操作する場合も含む。
【００９０】
更に、上述した情報処理装置は、前記記憶手段の空き容量が所定値以下になった場合、及び、前記類似部分木構造作成手段により記憶された木構造が所定時間参照されなかった場合、のいずれか一方の場合には、前記類似部分木構造作成手段により記憶された木構造を前記記憶手段から削除する削除手段を更に含んで構成されることができる。
【００９１】
これにより、記憶手段の空き容量を無駄に使用することがなくなる。
【００９２】
【発明の実施の形態】
以下、本発明の好ましい実施の形態について図面を参照しながら詳細に説明する。
【００９３】
［第１の実施の形態］
図１（Ａ）は、本実施の形態に係る構造化文書処理システムの機能構成及び構造化文書処理システムで行われる処理を概念的に示した図である。図示されるように、テキスト形式で記述されたＸＭＬ文書３０が、本発明の情報処理装置としてのＸＭＬプロセッサ（ＸＭＬパーザ）１０に入力されると、ＸＭＬプロセッサ１０は、ＸＭＬ文書３０の各要素をノードとする木構造（ＤＯＭツリー）４０を作成する。上位ＸＭＬアプリケーション５０は、作成されたＤＯＭツリー４０を参照あるいは操作することができる。
【００９４】
図１（Ｂ）は、図１（Ａ）で示された構造化文書処理システムの機能構成及び構造化文書処理システムで行われる処理を更に詳細に示した図である。
【００９５】
（１）では、上位ＸＭＬアプリケーション５０がＸＭＬ文書３０を参照及び操作するために、ＤＯＭインタフェース６０を介して、ＸＭＬプロセッサ１０にＤＯＭツリー４０を作成するように指示を与える。
【００９６】
（２）では、指示を受けたＸＭＬプロセッサ１０は、ＸＭＬ文書３０を入力して解析する。
【００９７】
図２は、ＸＭＬ文書３０の一例を示している。図示されるようにＸＭＬ文書３０は、テキスト形式で、要素の始まりを示す開始タグ＜○○○＞と要素の終了を示す終了タグ＜／○○○＞により、複数の要素が階層的に記述されている。
【００９８】
具体的には、ＸＭＬプロセッサ１０は、図２に示されるようなＸＭＬ文書３０を上から順に読み込み、開始タグ及び終了タグを検出していくことにより、ＸＭＬ文書３０の各要素の階層構造を解析する。
【００９９】
更にＸＭＬプロセッサ１０は、解析結果に基づいて、ＤＯＭツリー４０を作成する。
【０１００】
図３は、図２のＸＭＬ文書３０を解析して作成されたＤＯＭツリー４０の一例を示している。図示されるように、各要素（Element）がＤＯＭツリー４０のノード（節点）として構成されている（例えば、図のノード７０）。また、要素の内容、すなわち、開始タグと終了タグで挟まれた文字列（Text）もノードとして構成されている（例えば、図のノード７２）。
【０１０１】
なお、図３の▲１▼で示された部分が、図２の▲１▼のscan要素の記述に対応する部分木（全体の木の一部）である。また、図３の▲２▼で示された部分が、図２の▲２▼のstorage要素の記述に対応する部分木である。
【０１０２】
このように、ＤＯＭツリー４０は、複数の部分木により構成されている。
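The mapping from elements and their text content to Element and Text nodes can be illustrated with a minimal sketch using Python's standard `xml.dom.minidom`. The sample document is an assumption: only the `scan` and `storage` element names come from the description of Figure 2, and the child elements and values are invented for illustration.

```python
from xml.dom import minidom

# Hypothetical document: only the scan and storage element names appear in
# the description of Figure 2; the children and values are invented.
XML_TEXT = (
    "<job><scan><color>mono</color></scan>"
    "<storage><box>7</box></storage></job>"
)


def describe(node, depth=0):
    """Return (depth, kind, label) for every Element and Text node."""
    rows = []
    if node.nodeType == node.ELEMENT_NODE:
        rows.append((depth, "Element", node.tagName))
    elif node.nodeType == node.TEXT_NODE:
        rows.append((depth, "Text", node.data))
    for child in node.childNodes:
        rows.extend(describe(child, depth + 1))
    return rows


document = minidom.parseString(XML_TEXT)
rows = describe(document.documentElement)
for depth, kind, label in rows:
    print("  " * depth + f"{kind}: {label}")
```

Each of the `scan` and `storage` branches printed here corresponds to one subtree of the whole tree in the sense used above.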
【０１０３】
（３）では、上位ＸＭＬアプリケーション５０が、作成されたＤＯＭツリー４０をＤＯＭインタフェース６０を介して参照あるいは操作する。
【０１０４】
図４は、ＸＭＬプロセッサ１０の機能を実現するためのハードウェア資源としての情報処理装置１１の構成を示すブロック図である。
【０１０５】
図示されるように、情報処理装置１１は、操作部１２、入出力部１８、ＣＰＵ２０、ネットワークＩ／Ｆ２２、ＲＯＭ２４、及びＲＡＭ２６を含んで構成され、これらはバスにより相互に接続されている。
【０１０６】
操作部１２は、例えばキーボード等により構成され、ユーザは操作部１２を用いて任意のデータを入力し、所定の操作を行う。
【０１０７】
ネットワークＩ／Ｆ２２は、各種ネットワークに接続するためのインタフェースである。
【０１０８】
入出力部１８は、各種データを入出力するためのインタフェースである。入出力部１８には、各種データやプログラム等を記憶するためのＨＤＤ２８が接続されている。
【０１０９】
ＲＯＭ２４は、ＸＭＬプロセッサ１０の機能としての、ＸＭＬ文書からＤＯＭツリーを作成すると共に、ＤＯＭツリーを記憶するために必要な記憶容量を削減するための処理ルーチンのプログラムが記憶されている。更にＲＯＭ２４には、ＸＭＬ文書を扱う上位ＸＭＬアプリケーション５０とＤＯＭインタフェース６０の機能を実現するためのプログラムも記憶されている。
【０１１０】
ＣＰＵ２０は、ＲＯＭ２４等に記憶されたプログラムを実行することにより各種機能を実現する。
【０１１１】
ＲＡＭ２６は、ワーク領域１４と木構造記憶領域１６とを含んで構成されている。ワーク領域１４は、作業中のデータを記憶するための作業領域である。木構造記憶領域１６は、作成されたＤＯＭツリーを記憶するための領域である。
【０１１２】
図１に示されたＸＭＬプロセッサ１０は、図４に示される情報処理装置１１をハードウェア資源とし、ＲＯＭ２４に記憶されたプログラムをソフトウェア資源として用いて実現される機能である。なお、上位ＸＭＬアプリケーション５０及びＤＯＭインタフェース６０も、同様にして実現される機能である。
【０１１３】
以下、図５及び図６のフローチャートを参照しながら、本実施の形態のＸＭＬプロセッサ１０により実行される処理ルーチンについて説明する。
【０１１４】
図５は、本実施の形態のメインルーチンを示したフローチャートである。
【０１１５】
ステップ１００では、ＸＭＬプロセッサ１０は上位ＸＭＬアプリケーション５０からのＤＯＭツリー作成指示を受け、構造化文書（ここではＸＭＬ文書）を取得する。例えば、ＸＭＬ文書が記憶された記憶装置がＨＤＤ２８である場合には、ＨＤＤ２８から入出力部１８を介して取得され、ネットワーク上の他のコンピュータシステムに記憶されている場合には、ネットワークＩ／Ｆ２２を介して取得される。
【０１１６】
ステップ１０２では、取得したＸＭＬ文書を解析する。前述したように、開始タグ及び終了タグを順に検出することにより、これらタグで記述された要素毎にその構造と内容を解析する。
【０１１７】
ステップ１０４では、ステップ１０２の解析結果に基づいて、解析された要素をノードとする木構造（部分木）を作成する。作成した部分木は、ＲＡＭ２６のワーク領域１４に保持しておく。なお、以下では、ＲＡＭ２６の木構造記憶領域１６に記憶された木構造をＤＯＭツリーと呼称し、作成してワーク領域１４に保持する木構造を部分木と呼称して区別する。
【０１１８】
ステップ１０６では、木構造記憶領域１６の空き容量が不足しているか否かを判断する。ここで、空き容量が不足していない、すなわち、ステップ１０４で作成された部分木を格納するために十分な空き容量があると判断した場合には、ステップ１１２に移行し、ワーク領域１４の部分木を木構造記憶領域１６に格納する。
【０１１９】
なお、空き容量が不足しているか否かを判断するために、例えば、閾値を設けておき、木構造記憶領域１６の空き容量が閾値以下となった場合に、空き容量が不足していると判断してもよいし、ＤＯＭツリーの記憶量が閾値に到達した場合に、空き容量が不足していると判断してもよい。
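The two alternative shortage criteria in this paragraph can be written as a pair of small predicates. This is only a sketch: the function names, the byte units, and the example numbers are assumptions, since the text does not fix how the thresholds are expressed.

```python
def short_by_free_space(free_bytes, free_threshold):
    """Shortage when the free space of the tree storage area is at or below a threshold."""
    return free_bytes <= free_threshold


def short_by_stored_amount(stored_bytes, stored_limit):
    """Shortage when the amount of stored DOM tree data has reached a limit."""
    return stored_bytes >= stored_limit


# Hypothetical figures for a 1 MiB tree storage area.
area_size = 1024 * 1024
stored = 1000 * 1024
print(short_by_free_space(area_size - stored, free_threshold=64 * 1024))
print(short_by_stored_amount(stored, stored_limit=960 * 1024))
```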
【０１２０】
続いて、ステップ１１４で、ＸＭＬ文書全体の解析が終了したか否かを判断し、ＸＭＬ文書全体の解析が終了していないと判断した場合には、ステップ１０２に戻り、更にＸＭＬ文書を読み込んで、次の開始タグ及び終了タグを検出することにより、ＸＭＬ文書の解析を続ける。
【０１２１】
一方、ステップ１０６で、空き容量が不足していると判断した場合には、ステップ１０８に移行し、ＤＯＭツリーを記憶するために必要な記憶容量を削減するための記憶容量削減処理を実行する。
【０１２２】
図６は、本実施の形態における記憶容量削減処理の流れを示すフローチャートである。
【０１２３】
ステップ２００では、ＸＭＬプロセッサ１０は、木構造記憶領域１６に記憶されているＤＯＭツリーから同一部分を探索する。木構造記憶領域１６に記憶されているＤＯＭツリーは、前述のステップ１０２、１０４、１１２の処理により作成されて記憶された部分木により構成されたＤＯＭツリーである。また、同一部分とは、要素、要素の内容、及び階層構造が同一の部分をいう。
【０１２４】
図７は、同一部分を含むＸＭＬ文書の一例を示した図である。図示されるように、ＸＭＬ文書中に、Ｘで示される部分と、Ｘの要素、要素の内容、及び階層構造が同一のＹで示される部分が含まれている。
【０１２５】
図８（Ａ）は、要素、要素の内容、階層構造が同一の部分を含むＸＭＬ文書から作成したＤＯＭツリーの一例を示した図である。図示されるように、同一部分８０ａ、８０ｂが含まれている。ステップ２００では、このような同一部分を探索する。
【０１２６】
ステップ２０２では、探索の結果、記憶されているＤＯＭツリーの中に、同一部分があるか否かを判断する。
【０１２７】
ここで、同一部分があると判断した場合には、ステップ２０４で、該同一部分のうち、特定の同一部分を除いた全ての同一部分の木構造を木構造記憶領域１６から削除する。ここでは、同一部分の中から特定の同一部分１つを選択し、他のものについては削除する。
【０１２８】
例えば、図８（Ａ）に示される例では、同一部分８０ａを特定の同一部分として選択し、該選択した同一部分８０ａは削除せず、他の同一部分、ここでは、同一部分８０ｂを削除することができる。なお、同一部分が３つ以上ある場合であっても、同様に、特定の同一部分を選択して、選択したものは削除せずに他を削除するようにすればよい。
【０１２９】
図８（Ｂ）は、図８（Ａ）のＤＯＭツリーから同一部分の木構造を削除した状態を示した図である。図示されるように、同一部分８０ｂが削除されている。
【０１３０】
ステップ２０６では、リンク情報を作成する。ここで、リンク情報とは、前述の特定の同一部分の記憶場所を示す情報である。図８（Ｂ）に示される例では、特定の同一部分８０ａの記憶場所を示す情報がリンク情報として作成される。
【０１３１】
ステップ２０８では、作成したリンク情報を、削除した同一部分８０ｂに対応させて木構造記憶領域１６に格納する。図８（Ｂ）に示される例では、ノードＣから、ノードＢ直下のノードＥに対してリンクを張ることに相当する。
【０１３２】
ステップ２１０では、木構造記憶領域１６に記憶されているＤＯＭツリーの探索が終了したか否かを判断する。ここで、探索が終了していないと判断した場合には、ステップ２００に戻り、記憶容量削減処理を続行する。探索が終了したと判断した場合には、本記憶容量削減処理ルーチンを終了する。本処理ルーチンにより、ＤＯＭツリーを記憶するために必要な記憶容量を削減することができる。
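Steps 200 to 210 can be sketched as follows. The `Node` and `Link` classes, the signature tuple, and the choice of keeping the first occurrence as the "specific identical part" are assumptions made for illustration; the text only requires that one copy is kept and that the others are replaced by information indicating where that copy is stored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


@dataclass
class Link:
    """Stands in for a deleted subtree and points at the retained copy."""
    target: Node


def signature(node):
    """Hashable summary of element name, content, and hierarchy."""
    if isinstance(node, Link):
        node = node.target
    return (node.tag, node.text, tuple(signature(c) for c in node.children))


def share_identical(parent, seen=None):
    """Replace repeated subtrees below parent with Link objects."""
    seen = {} if seen is None else seen
    for index, child in enumerate(parent.children):
        if isinstance(child, Link):
            continue
        sig = signature(child)
        if sig in seen:
            # Steps 204 to 208: drop this copy and store link information.
            parent.children[index] = Link(seen[sig])
        else:
            seen[sig] = child  # retained as the specific identical part
            share_identical(child, seen)
    return parent


def subtree_e():
    return Node("E", children=[Node("F", "1"), Node("G", "2")])


# Shape loosely modelled on Figure 8: B and C both contain the same subtree E.
root = Node("A", children=[Node("B", children=[subtree_e()]),
                           Node("C", children=[subtree_e()])])
share_identical(root)
link = root.children[1].children[0]
print(isinstance(link, Link), link.target is root.children[0].children[0])
```

After the call, node C holds a link to the E subtree directly under node B, which matches the description of Figure 8(B). The hierarchy seen by a caller that follows links is unchanged.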
【０１３３】
図５のステップ１０８の記憶容量削減処理が終了すると、ＸＭＬプロセッサ１０は、ステップ１１０で、木構造記憶領域１６の空き容量不足が解消されたか否かを判断する。空き容量不足が解消されなかったと判断した場合には、ステップ１１６で空き容量不足を示すメッセージを図示されない表示部等に表示し、ＤＯＭツリーを作成するための処理を終了する。
【０１３４】
ステップ１１０で、空き容量不足が解消されたと判断した場合、例えば、木構造記憶領域１６の空き容量が前述した閾値を超えた場合等には、ステップ１１２に移行し、ステップ１０４で作成した部分木を木構造記憶領域１６に格納する。
【０１３５】
ステップ１１４では、取得したＸＭＬ文書全体の解析が終了しているか否かを判断し、終了していないと判断した場合には、ステップ１０２に戻り、更にＸＭＬ文書を読み込んで、次の開始タグ及び終了タグを検出することにより、ＸＭＬ文書の解析を続ける。終了したと判断した場合には、ＤＯＭツリーを作成するための処理を終了する。
【０１３６】
なお、ＤＯＭツリー作成後、ＸＭＬプロセッサ１０は、作成したＤＯＭツリーを上位ＸＭＬアプリケーション５０に渡す。なお、ここでは、上位ＸＭＬアプリケーション５０及びＤＯＭインタフェース６０はＸＭＬプロセッサ１０と同じハードウェア資源を用いて実現されているため、ＸＭＬプロセッサ１０は、上位ＸＭＬアプリケーション５０に、作成したＤＯＭツリーの格納場所をＤＯＭインタフェース６０を介して通知する形でＤＯＭツリーを渡す。
【０１３７】
以上説明したように、本実施の形態では、特定の同一部分を除く同一部分を削除し、特定の同一部分の記憶場所を示す情報を、削除した同一部分に対応させて記憶するようにしたため、木構造を記憶するために必要な記憶容量を大幅に削減することができる。
【０１３８】
なお、本実施の形態では、木構造記憶領域１６が容量不足となった場合に、同一部分を探索して削除する記憶容量削減処理を実行する例について説明したが、木構造記憶領域１６の容量不足に拘らず、部分木作成毎、あるいは所定の間隔で、記憶容量削減処理を実行するようにしてもよい。
【０１３９】
また、本実施の形態では、木構造記憶領域１６に記憶されているＤＯＭツリーを対象として記憶容量削減処理を実行する例について説明したが、木構造記憶領域１６に記憶する前段階の、ワーク領域１４に保持された部分木を対象として記憶容量削減処理を実行するようにしてもよい。
【０１４０】
また、部分木を作成する毎に、作成した部分木と、木構造記憶領域１６に記憶されているＤＯＭツリーとを比較し、同一部分があれば、作成した部分木の中で同一部分以外の部分については、木構造の状態で木構造記憶領域１６に記憶し、同一部分については、木構造を記憶する代わりに、既に木構造記憶領域１６に記憶されている同一部分の記憶場所を示す情報を記憶するようにしてもよい。
【０１４１】
このような構成によっても、木構造を記憶するために必要な記憶容量を大幅に削減することができる。
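The variation in which each newly created subtree is compared with what is already stored can be sketched with a small store that keeps an index of subtree signatures. The `TreeStore` class, the nested-tuple representation `(tag, text, children)`, and the integer "locations" are all assumptions for illustration.

```python
class TreeStore:
    """Toy tree storage area that stores each distinct subtree only once."""

    def __init__(self):
        self.records = {}  # location -> (tag, text, child entries)
        self.index = {}    # subtree tuple -> location of the stored copy

    def add(self, subtree):
        """Store subtree = (tag, text, children); return (kind, location)."""
        if subtree in self.index:
            # Identical part already stored: record only its location.
            return ("link", self.index[subtree])
        tag, text, children = subtree
        entries = [self.add(child) for child in children]
        location = len(self.records)
        self.records[location] = (tag, text, entries)
        self.index[subtree] = location
        return ("node", location)


store = TreeStore()
scan = ("scan", "", (("color", "mono", ()), ("size", "A4", ())))
first = store.add(scan)
second = store.add(("job", "", (scan,)))
print(first, second, store.records[second[1]])
```

The second call stores only the new `job` node; its child entry is a link to the location of the `scan` subtree stored by the first call.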
【０１４２】
［第２の実施の形態］
第１の実施の形態では、同一部分を対象として処理する例について説明したが、本実施の形態では、類似する部分を対象として処理する例について説明する。
【０１４３】
なお、本実施の形態におけるハードウェア構成、ソフトウェア構成、及びＸＭＬプロセッサ１０により実行されるメインルーチンは第１の実施の形態と同様であるため、説明を省略する。本実施の形態では、第１の実施の形態で図６のフローチャートを用いて説明した記憶容量削減処理に代えて、以下に説明する記憶容量削減処理を実行する。
【０１４４】
まず、図９（Ａ）、（Ｂ）を参照しながら、本実施の形態の記憶容量削減処理について簡単に説明する。
【０１４５】
まず、木構造記憶領域１６に記憶されているＤＯＭツリーから類似部分を検出する。図９（Ａ）に示されるように、類似部分（部分木）８２、８４は、ノードＧとノードＩとが異なっているが、他のノードや階層構造は同じであり、類似しているといえる。
【０１４６】
図９（Ｂ）に示されるように、検出した類似部分８２，８４から、共通化できる部分については、ＸＭＬ形式の共通のデータ（以下、共通データと呼称）８６を作成し、相違部分については、類似部分それぞれに対応したＸＭＬ形式の相違部分のデータ（以下、ＸＭＬデータと呼称）８８、９０を作成する。類似部分の木構造は削除して、代わりに作成したデータの記憶場所を示す情報を記憶する。
【０１４７】
なお、本実施の形態では、共通データ８６には、類似部分に共通のテキストデータの他、Xinclude要素やXSLT（XSL Transformations：ＸＭＬのデータ変換言語）による記述を含め作成する例について説明する。この記述については後述する。
【０１４８】
図１０は、本実施の形態で実行される記憶容量削減処理ルーチンの流れを詳細に示したフローチャートである。本図を用いて、記憶容量削減処理について更に詳細に説明する。
【０１４９】
ステップ３００では、ＸＭＬプロセッサ１０は、木構造記憶領域１６に記憶されているＤＯＭツリーから同一タグ名を探索する。すなわち、ここで同一タグ名を探索することにより、類似している可能性のある部分を抽出する。具体的には、ＸＭＬプロセッサ１０は、記憶されているＤＯＭツリーを、ルートノードから下位方向に順に探索していき、同一のタグ名を検出する。
【０１５０】
ステップ３０２で、探索の結果、記憶されているＤＯＭツリーの中に、同一タグ名のノードが検出できたか否かを判断する。同一タグ名のノードが検出できなかったと判断した場合には、ＤＯＭツリーには類似部分が無いと判断することができ、本記憶容量削減処理ルーチンを終了する。
【０１５１】
また、ステップ３０２で、同一タグ名のノードが検出できたと判断した場合には、ステップ３０４で検出された同一タグ名のノードの下位に位置するノード（下位ノード）の量を算出する。例えば、図９（Ａ）に示される例において、同一タグ名のノード「Ｃ」の下位ノードは、部分木８２では、ノードＦ、Ｈ、及びＧであり、部分木８４では、ノードＦ、Ｈ、及びＩである。ここでは、全ての下位ノードの量を算出する。
【０１５２】
ステップ３０６では、算出された下位ノードの量が予め指定された量以上であるか否かを判断する。ここで、下位ノードの量が予め指定された量未満である場合には、下位ノードの量は少ないため、これ以降の処理を行ったとしても、木構造記憶領域１６の容量不足を解消することはできないと判断し、ステップ３２２に移行する。
【０１５３】
ステップ３２２では、木構造記憶領域１６に記憶されているＤＯＭツリーの探索が終了したか否かを判断する。ここで、探索が終了していないと判断した場合には、ステップ３００に戻り、記憶容量削減処理を続行する。探索が終了したと判断した場合には、本記憶容量削減処理ルーチンを終了する。
【０１５４】
一方、ステップ３０６で、下位ノードの量が予め指定された量以上である場合には、ステップ３０８で、下位ノードの量に基づいて、閾値を算出する。この閾値は、類似している可能性のある部分として検出された部分木において、相違部分がどの程度存在するかを判断するための指標であり、これにより、検出された類似している可能性のある部分を、類似部分として扱うか否かを定めることができる。なお、ここでは、下位ノード量が多いほど、閾値が高くなるように算出する。
【０１５５】
ステップ３１０では、相違量を算出する。具体的には、下位ノードの各ノードのタグ名、及びタグの配列（配列状態や内容）を比較し、相違部分を抽出してカウントすることにより、相違量を算出する。例えば、図９（Ａ）では、ノードＧ及びノードＩが異なるため、この部分がカウント対象となる。
【０１５６】
相違量を算出する場合に、相違部分の種類によってカウント数を異ならせるようにしてもよい。例えば、内容（数字等）が異なる場合にはカウント数を低くし、タグ名が異なる場合にはカウント数を高くするようにすることもできる。これは、内容（数字等）が異なる場合には、Xinclude要素やXSLTによる記述が容易であり、タグ名が異なる場合には、これらによる記述が困難であることに起因する。Xinclude要素及びXSLTは、本記憶容量削減処理において各類似部分に共通のデータを作成する際に用いられるが、この点については後述する。
【０１５７】
なお、相違量をカウントするためのカウント数の設定は、予め設定されていてもよいし、ユーザが任意に設定するようにしてもよい。
【０１５８】
ステップ３１２では、算出した相違量と算出した閾値とを比較する。相違量が閾値より大きい場合には、類似している可能性があるとして検出した部分は、類似部分ではないと判断し、ステップ３２２に移行し、探索が終了していなければ探索処理を繰り返す。
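Steps 304 to 312 can be sketched as below. The concrete numbers are assumptions: the text states only that the threshold grows with the amount of lower nodes and that a tag-name difference may be counted more heavily than a content difference. Here the amount is taken as the lower-node count of both parts together, the threshold is a fixed ratio of that amount, and the weights are 1 and 2.

```python
from itertools import zip_longest

TEXT_WEIGHT = 1  # content differs (assumed weight)
TAG_WEIGHT = 2   # tag name differs (assumed weight)


def lower_node_count(node):
    """Number of nodes below node, where node = (tag, text, children)."""
    return sum(1 + lower_node_count(child) for child in node[2])


def difference(a, b):
    """Weighted count of differences among the lower nodes of a and b."""
    total = 0
    for ca, cb in zip_longest(a[2], b[2]):
        if ca is None or cb is None:
            present = ca if ca is not None else cb
            total += TAG_WEIGHT * (1 + lower_node_count(present))
        elif ca[0] != cb[0]:
            total += TAG_WEIGHT * (1 + max(lower_node_count(ca),
                                           lower_node_count(cb)))
        else:
            if ca[1] != cb[1]:
                total += TEXT_WEIGHT
            total += difference(ca, cb)
    return total


def is_similar(a, b, min_lower_nodes=4, ratio=0.5):
    if a[0] != b[0]:                       # step 300: same tag name
        return False
    amount = lower_node_count(a) + lower_node_count(b)   # step 304
    if amount < min_lower_nodes:           # step 306: too small to pay off
        return False
    threshold = ratio * amount             # step 308
    return difference(a, b) <= threshold   # steps 310 and 312


def leaf(tag):
    return (tag, "", ())


part_82 = ("C", "", (leaf("F"), leaf("H"), leaf("G")))
part_84 = ("C", "", (leaf("F"), leaf("H"), leaf("I")))
print(is_similar(part_82, part_84))
```

With these assumed settings the two parts modelled on Figure 9(A), which differ only in nodes G and I, are judged similar.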
【０１５９】
ステップ３１２で、相違量が閾値以下である場合には、類似している可能性があるとして検出した部分は、類似部分であると判断し、ステップ３１４に移行し、該類似部分から抽出された相違部分の種類を分析する。相違部分の種類により、Xinclude要素を用いてデータを作成するインサートデータ作成処理、及びXSLTを用いてデータを作成する変換データ作成処理、のいずれの処理を行うかを決定する。
【０１６０】
ステップ３１６で、相違部分の種類がインサートデータ作成処理に適合していると判断した場合には、ステップ３１８のインサートデータ作成処理を実行する。また相違部分の種類が変換データ作成処理に適合していると判断した場合には、ステップ３２０の変換データ作成処理を実行する。
【０１６１】
ここで、インサートデータ作成処理及び変換データ作成処理について詳細に説明する。
【０１６２】
例えば、相違部分に同じタグ名で内容が異なるノードが多く含まれている場合には、共通部分に各類似部分の相違部分を挿入することで各類似部分を表現できる。従って、この場合には、Xinclude要素を用いるインサートデータ作成処理に適合していると判断することができる。
【０１６３】
図１１（Ａ）及び（Ｂ）は、インサートデータ作成処理に適合するＸＭＬ文書の一例である。図示されるように、図１１（Ａ）の元データ１で示されたＸＭＬ文書と図１１（Ｂ）の元データ２で示されたＸＭＬ文書は、共通部分の他に相違部分Ａ１またはＡ２を含んで構成されている。相違部分Ａ１及びＡ２は、同じタグ名kindsであるが、内容が異なっている。従って、インサートデータ作成処理に適合していると判断できる。
【０１６４】
図１２は、インサートデータ作成処理のサブルーチンを示したフローチャートである。
【０１６５】
ステップ４００では、ＸＭＬプロセッサ１０は、Xinclude要素を用いて、共通データを作成すると共に、相違部分のＸＭＬデータを類似部分毎に作成する。
【０１６６】
ステップ４０２では、作成した共通データ及びＸＭＬデータを記憶する。
【０１６７】
作成した共通データ及びＸＭＬデータはテキストデータであるため、データ量が小さく、木構造を記憶する場合に比して記憶するために必要な記憶容量は少なくて済む。また、共通データを作成することにより、共通化できる部分は共通化されているため、記憶するために必要な記憶容量は少なくて済む。
【０１６８】
なお、共通データ及びＸＭＬデータの記憶場所は、特に限定されず、例えば、ＨＤＤ２８とすることもできる。また、ネットワークに接続された他のコンピュータシステムの記憶装置とすることもできる。
【０１６９】
ステップ４０４では、リンク情報を作成する。ここで、リンク情報とは、ステップ４０２で記憶した共通データ及びＸＭＬデータの記憶場所を示す情報である。
【０１７０】
ステップ４０６では、木構造記憶領域１６から類似部分の木構造を削除し、各類似部分に対応させて各リンク情報を木構造記憶領域１６に格納する。また、本実施の形態では、図１４に示されるような管理テーブルを例えばＲＡＭ２６のワーク領域あるいはＨＤＤ２８上に設け、各リンク情報を格納して一元管理する。これにより各類似部分に対応させて各リンク情報を記憶する際に、各リンク情報に代えて管理テーブルの管理番号を記憶することもできる。
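A management table of the kind referred to for Figure 14 could be as simple as a mapping from a management number to the two storage locations. The class below is a sketch under that assumption; the column names and the example paths are hypothetical, since the figure itself is not reproduced in the text.

```python
from itertools import count


class ManagementTable:
    """Maps a management number to the locations of common data and XML data."""

    def __init__(self):
        self._numbers = count(1)
        self.rows = {}

    def register(self, common_location, xml_location):
        number = next(self._numbers)
        self.rows[number] = {"common": common_location, "xml": xml_location}
        return number

    def lookup(self, number):
        return self.rows[number]

    def remove(self, number):
        del self.rows[number]


table = ManagementTable()
# Hypothetical locations on the HDD; both similar parts share one common data file.
n1 = table.register("hdd/common_1.xml", "hdd/xml_data_1.xml")
n2 = table.register("hdd/common_1.xml", "hdd/xml_data_2.xml")
print(n1, table.lookup(n1))
print(n2, table.lookup(n2))
```

Storing only `n1` or `n2` in place of each deleted similar part keeps the entry in the tree storage area small, while the table resolves it to the actual locations.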
【０１７１】
図１３（Ａ）、（Ｂ）、及び（Ｃ）は、図１１の元データ１及び元データ２に基づいて作成された、共通データとＸＭＬデータを示した図である。
【０１７２】
図１３（Ｂ）に示されるように、ＸＭＬデータ１は、図１１（Ａ）の元データ１に対応するデータであり、元データ１の相違部分Ａ１が記述されている。図１３（Ｃ）のＸＭＬデータ２は、図１１（Ｂ）の元データ２に対応するデータであり、元データ２の相違部分Ａ２が記述されている。
【０１７３】
また、図１３（Ａ）の共通データは、元データ１及び元データ２に共通する共通部分の他に、ＸＭＬデータ１及びＸＭＬデータ２に記述されたデータが挿入されるようにXinclude要素Ａ３を含んで構成されている。Xinclude要素Ａ３で指定されるアドレスは、ＸＭＬデータ１及びＸＭＬデータ２の記憶場所を示すものであり、本実施の形態では、例えば、前述の管理テーブルの記憶場所としてもよい。
【０１７４】
このように、共通データは、類似部分の中の共通部分を含むように作成されると共に、類似部分の中の相違部分を挿入可能に作成される。
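How common data with an XInclude element can be recombined with per-part XML data is sketched below using `xml.etree.ElementInclude` from the Python standard library. The document contents are assumptions (only the tag name `kinds` is taken from the description of Figure 11), and the custom loader that ignores `href` and returns the XML data of the part being referenced is a stand-in for resolving the address through the management table.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_NS = "http://www.w3.org/2001/XInclude"

# Hypothetical common data: shared content plus one XInclude element.
COMMON = (
    f'<item xmlns:xi="{XI_NS}">'
    "<name>printer</name>"
    '<xi:include href="xml-data"/>'
    "</item>"
)
# Hypothetical per-part XML data holding only the differing part.
XML_DATA_1 = "<kinds>color</kinds>"
XML_DATA_2 = "<kinds>mono</kinds>"


def expand(common_text, xml_data_text):
    """Rebuild one similar part from the common data and its XML data."""
    root = ET.fromstring(common_text)

    def loader(href, parse, encoding=None):
        # Assumption: the href is resolved to the XML data of this part.
        return ET.fromstring(xml_data_text)

    ElementInclude.include(root, loader=loader)
    return root


part_1 = expand(COMMON, XML_DATA_1)
part_2 = expand(COMMON, XML_DATA_2)
print(ET.tostring(part_1, encoding="unicode"))
print(ET.tostring(part_2, encoding="unicode"))
```

The shared `name` element is stored once in the common data, and each part contributes only its own `kinds` element.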
【０１７５】
一方、例えば、相違部分が、一部分の数字やテキストが異なるが、内容はほぼ同じような場合には、各々の類似部分の相違部分を所定の規則により変換することで各類似部分を表現できる。従って、この場合には、XSLTを用いる変換データ作成処理に適合していると判断することができる。
【０１７６】
図１５（Ａ）及び（Ｂ）は、変換データ作成処理に適合したＸＭＬ文書の一例である。図示されるように、図１５（Ａ）の元データ１で示されたＸＭＬ文書と図１５（Ｂ）の元データ２で示されたＸＭＬ文書は、共通部分の他に、相違部分Ｂ１またはＢ２を含んで構成されている。相違部分Ｂ１及びＢ２は、数字やテキストが異なっているが、内容はほぼ同じである。従って、変換データ作成処理に適合していると判断できる。
【０１７７】
図１６は、変換データ作成処理のサブルーチンを示したフローチャートである。
【０１７８】
ステップ５００では、ＸＭＬプロセッサ１０は、XSLTを用いて、共通データを作成すると共に、相違部分のＸＭＬデータを類似部分毎に作成する。
【０１７９】
ステップ５０２乃至ステップ５０６の処理は、上述したインサートデータ作成処理のサブルーチンのステップ４０２乃至４０６の処理と同様であるため説明を省略する。
【０１８０】
図１７（Ａ）、（Ｂ）、及び（Ｃ）は、図１５の元データ１及び元データ２に基づいて作成された、共通データとＸＭＬデータを示した図である。
【０１８１】
図１７（Ａ）に示されるように、共通データは、元データ１及び元データ２に共通する共通部分の他に、XSLTの変換式Ｂ３を含んで構成されている。
【０１８２】
また、図１７（Ｂ）のＸＭＬデータ１は、元データ１に対応するデータであり、元データ１の相違部分Ｂ１に相当する記述が含まれている。図１７（Ｃ）のＸＭＬデータ２は、元データ２に対応するデータであり、元データ２の相違部分Ｂ２に相当する記述が含まれている。図から明らかなように、ＸＭＬデータの記述は、図１５の元データ１及び２の相違部分そのままが記述されたものにはなっていない。相違部分において、XSLTを用いて変換可能な内容（本例では数字）については、共通データにXSLTの変換式Ｂ３を記述することで対応しているためである。
【０１８３】
図に示される例では、変換式Ｂ３において、ノードをそのままコピーするcopy-of要素のselect属性で、コピーしたノードの位置を表すposition()関数を用いて指定することにより、元データの相違部分に記述された数字が構成され、各ノードの値をそのまま文字列として出力するためのvalue-of要素のselect属性で、カレントノード（図では”．”で示されている）を指定することにより、相違部分に記述された文字列の内容が構成される。
【０１８４】
このように、XSLTの変換式を用いることにより、類似部分を効率的に共通化することができる。
【０１８５】
このように、図１０のステップ３１８またはステップ３２０を実行した後は、ステップ３２２では、木構造記憶領域１６に記憶されているＤＯＭツリーの探索が終了したか否かを判断する。ここで、探索が終了していないと判断した場合には、ステップ３００に戻り、記憶容量削減処理を続行する。探索が終了したと判断した場合には、本記憶容量削減処理ルーチンを終了する。
【０１８６】
以上説明したように、類似部分の木構造を削除し、各類似部分に共通する情報を示す共通データと、相違部分の情報を示すＸＭＬデータとを作成して、各類似部分に対応させて記憶し、作成したデータの記憶場所を示す情報を木構造の代わりに記憶するようにしたため、木構造を記憶するために必要な記憶容量を大幅に削減することができる。
【０１８７】
なお、本実施の形態でも第１の実施の形態と同様に、記憶容量削減処理の実行を、木構造記憶領域１６が容量不足となった場合に限らず、部分木作成毎、あるいは所定の間隔で実行するようにしてもよい。
【０１８８】
また、本実施の形態では、木構造記憶領域１６に記憶されているＤＯＭツリーを対象として記憶容量削減処理を実行する例について説明したが、木構造記憶領域１６に記憶する前段階の、ワーク領域１４に保持された部分木を対象として記憶容量削減処理を実行するようにしてもよい。
【０１８９】
また、部分木を作成する毎に、作成した部分木と、木構造記憶領域１６に記憶されているＤＯＭツリーとを比較して類似部分を探索するようにしてもよい。この場合には、類似部分があれば、作成した部分木の中で類似部分以外の部分については、木構造の状態で木構造記憶領域１６に記憶し、既に木構造記憶領域１６に記憶されているＤＯＭツリーの類似部分については、木構造を削除する。また、各類似部分について、共通データ及びＸＭＬデータを作成して記憶し、この記憶場所を示すリンク情報を木構造の代わりに各類似部分に対応させて記憶するように処理する。
【０１９０】
このような構成によっても、木構造を記憶するために必要な記憶容量を大幅に削減することができる。
【０１９１】
［第３の実施形態］
本実施の形態では、第２の実施の形態で説明した記憶容量削減処理において、相違量が０の場合には、同一部分と判断して処理する例について説明する。
【０１９２】
本実施の形態におけるハードウェア構成、ソフトウェア構成、及びＸＭＬプロセッサ１０により実行されるメインルーチンは第１の実施の形態と同様であるため、説明を省略する。ＸＭＬプロセッサ１０により実行される記憶容量削減処理ルーチンは、第２の実施の形態において図１０のフローチャートを用いて説明した記憶容量削減処理ルーチンのステップ３１２に代えて、図１８に示されるステップ６００乃至ステップ６０８を実行する。
【０１９３】
ステップ３１０で相違量を算出した後、ステップ６００で、ＸＭＬプロセッサ１０は、相違量が０であるか否かを判断する。相違量が０であると判断した場合には、類似している可能性があるとして検出した部分は、同一部分であると判断し、ステップ６０４に移行する。ステップ６０４からステップ６０８では、第１の実施の形態で説明したステップ２０４からステップ２０８の処理と同様に、ＤＯＭツリーから特定の同一部分以外の同一部分を削除し、特定の同一部分の記憶場所を示すリンク情報を作成して、削除した同一部分に対応させて該リンク情報を記憶する。その後は、図１０のステップ３２２に移行し、更に探索処理を続けるか、あるいは記憶容量削減処理を終了する。
【０１９４】
ステップ６００で、相違量が０ではないと判断した場合には、類似している可能性があるとして検出した部分は、同一部分ではないと判断し、ステップ６０２に移行する。ステップ６０２では、相違量が閾値以下であるか否かを判断する。相違量が閾値以下の場合には、類似している可能性があるとして検出した部分は、類似部分であると判断し、ステップ３１４に移行する。その後の処理は、第２の実施の形態と同様であるため、説明を省略する。
【０１９５】
以上説明したように、同一部分及び類似部分の双方を処理対象として記憶容量削減処理を行うようにしたため、木構造を記憶するために必要な記憶容量を効率的に削減することができる。
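The branch added in this embodiment (steps 600 and 602) reduces to a three-way decision on the difference amount. The function name and the returned labels below are assumptions for illustration.

```python
def classify(difference_amount, threshold):
    """Decide how a candidate part is handled from its difference amount."""
    if difference_amount == 0:
        # Step 600: treated as an identical part (steps 604 to 608).
        return "identical"
    if difference_amount <= threshold:
        # Step 602: treated as a similar part (step 314 onward).
        return "similar"
    # Neither identical nor similar: continue the search (step 322).
    return "unrelated"


for amount in (0, 2, 5):
    print(amount, classify(amount, threshold=3))
```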
【０１９６】
［第４の実施形態］
本実施の形態では、第２の実施の形態及び第３の実施の形態で木構造記憶領域１６から削除した類似部分がアクセスされた場合に、類似部分を木構造に展開する例、及び木構造記憶領域１６が容量不足となった場合、または展開した類似部分に所定時間アクセスが無い場合、のいずれか一方の場合に再度削除する例について説明する。
【０１９７】
なお、展開とは、ここでは、前述した処理ルーチンで作成された共通データ及びＸＭＬデータを用いて木構造を作成し、木構造記憶領域１６に記憶する処理をいう。
【０１９８】
図１９は、本実施の形態における展開処理ルーチンを示すフローチャートである。
【０１９９】
ステップ７００では、ＸＭＬプロセッサ１０は、ＤＯＭツリーから削除された類似部分が上位ＸＭＬアプリケーション５０からアクセスされたか否かを判断する。類似部分がアクセスされたと判断された場合には、前述の管理テーブルに該類似部分に対応して記憶されている共通データ及びＸＭＬデータのリンク情報から、共通データ及びＸＭＬデータを読み出す。
【０２００】
ステップ７０４では、読み出した共通データ及びＸＭＬデータを用いて、元の木構造に展開する。具体的には、共通データ及びＸＭＬデータから元のＸＭＬ文書を作成し、該作成されたＸＭＬ文書から木構造を再作成して木構造記憶領域１６に記憶する。
【０２０１】
ステップ７０６では、前述の管理テーブルの、該類似部分に対応して記憶されているリンク情報を削除する。
【０２０２】
ステップ７０８では、展開した類似部分に対応するタイマＴｉをスタートさせる。タイマＴｉは、展開した類似部分毎に設けられ、後述する削除処理ルーチンで、展開した類似部分の木構造を削除するか否かの判断に用いられる。
【０２０３】
このように、類似部分がアクセスされた場合には、木構造に展開するようにしたため、上位ＸＭＬアプリケーション５０は、類似部分と類似部分以外の部分とを区別することなくアクセスすることができる。
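The expansion routine can be sketched as a single function over plain dictionaries. Everything named here is an assumption: the dictionaries stand in for the management table, the external data store, the tree storage area, and the per-part timers Ti, and the `{diff}` placeholder is a simplified substitute for the XInclude or XSLT based recombination of common data and XML data.

```python
import time
import xml.etree.ElementTree as ET


def expand_on_access(number, table, data_store, tree_area, timers,
                     clock=time.monotonic):
    """Rebuild the tree of an accessed similar part and start its timer."""
    link = table[number]                   # read the link information
    common = data_store[link["common"]]
    xml_data = data_store[link["xml"]]
    # Assumed merge rule: the common data carries a {diff} placeholder.
    document = common.replace("{diff}", xml_data)
    tree_area[number] = ET.fromstring(document)   # step 704
    del table[number]                             # step 706
    timers[number] = clock()                      # step 708
    return tree_area[number]


data_store = {
    "hdd/common.xml": "<item><name>printer</name>{diff}</item>",
    "hdd/part1.xml": "<kinds>color</kinds>",
}
table = {1: {"common": "hdd/common.xml", "xml": "hdd/part1.xml"}}
tree_area, timers = {}, {}
tree = expand_on_access(1, table, data_store, tree_area, timers,
                        clock=lambda: 100.0)
print(tree.find("kinds").text, table, timers)
```

Once expanded, the part is an ordinary tree in the storage area, so the caller does not need to know that it was ever replaced by link information.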
【０２０４】
図２０は、本実施の形態における削除処理ルーチンを示すフローチャートである。
【０２０５】
ステップ８００では、木構造記憶領域１６が容量不足となったか否かを判断する。容量不足ではないと判断した場合には、ステップ８０２に移行する。
【０２０６】
ステップ８０２では、ＸＭＬプロセッサ１０は、タイマＴｉが閾値Ｔｔｈ以上になったか否かを判断する。タイマＴｉが閾値Ｔｔｈ未満の場合には、ステップ８０４で、対応の類似部分にアクセスがあったか否かを判断する。アクセスがあったと判断した場合には、タイマＴｉをリセットして、ステップ８００に戻る。また、ステップ８０４でアクセスがないと判断した場合には、タイマＴｉをリセットせずにステップ８００に戻る。
【０２０７】
ステップ８００で、木構造記憶領域１６が容量不足であると判断した場合、または、ステップ８０２で、タイマＴｉが閾値Ｔｔｈ以上になったと判断した場合には、ステップ８０８で、タイマＴｉに対応する展開した類似部分を木構造記憶領域１６から削除する。
【０２０８】
ステップ８１０では、削除した類似部分に対応するリンク情報を作成して再度管理テーブルに格納する。ステップ８１２で、タイマＴｉをリセットして終了する。
【０２０９】
このように、木構造記憶領域１６が容量不足となった場合、または展開した類似部分に所定時間アクセスが無い場合、のいずれか一方の場合に、展開した木構造を木構造記憶領域１６から削除するようにしたため、木構造記憶領域１６が容量不足となってエラーが発生する、あるいはアクセス効率が低下する、といった事態が発生することを防止することができる。
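The deletion routine can be sketched in the same dictionary-based style. The data structures are assumptions, as is the policy of removing every expanded part when the storage area is short of capacity; `links` is a hypothetical mapping that supplies the link information to be stored in the management table again.

```python
def touch(timers, number, now):
    """Reset timer Ti when the expanded part is accessed (step 804)."""
    timers[number] = now


def evict_expanded(tree_area, timers, table, links, capacity_short, now, t_th):
    """Delete expanded parts on capacity shortage or after t_th idle time."""
    removed = []
    for number in list(timers):
        idle = now - timers[number]
        if capacity_short or idle >= t_th:   # steps 800 and 802
            del tree_area[number]            # step 808
            table[number] = links[number]    # step 810
            del timers[number]               # step 812
            removed.append(number)
    return removed


tree_area = {1: "tree-1", 2: "tree-2"}
timers = {1: 0.0, 2: 50.0}
table = {}
links = {1: "link-1", 2: "link-2"}
removed = evict_expanded(tree_area, timers, table, links,
                         capacity_short=False, now=60.0, t_th=30.0)
print(removed, tree_area, table)
```

In this run only part 1 has been idle for at least `t_th`, so it alone is removed from the tree storage area and returned to the management table.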
【０２１０】
【発明の効果】
以上説明したように、本発明によれば、複数の要素を階層的に記述する文書から作成された木構造の階層構造を変化させることなく、木構造を記憶するために必要な記憶容量を削減することができる、という効果を奏する。
【図面の簡単な説明】
【図１】図１（Ａ）は、本実施の形態に係る構造化文書処理システムの機能構成と、構造化文書処理システムで行われる処理を概念的に示した図であり、図１（Ｂ）は、図１（Ａ）で示された構造化文書処理システムの機能構成と、構造化文書処理システムで行われる処理とを更に詳細に示した図である。
【図２】ＸＭＬ文書の一例を示した図である。
【図３】図２のＸＭＬ文書を解析して作成されたＤＯＭツリーの一例を示した図である。
【図４】ＸＭＬプロセッサの機能を実現するためのハードウェア資源としての情報処理装置の構成を示すブロック図である。
【図５】ＸＭＬプロセッサにより実行される処理ルーチンのメインルーチンを示したフローチャートである。
【図６】第１の実施の形態に係る記憶容量削減処理ルーチンを示すフローチャートである。
【図７】同一部分を含むＸＭＬ文書の一例を示した図である。
【図８】図８（Ａ）は、要素、要素の内容、階層構造が同一の部分を含むＸＭＬ文書から作成したＤＯＭツリーの一例を示した図であり、図８（Ｂ）は、図８（Ａ）のＤＯＭツリーから同一部分の木構造を削除した状態を示した図である。
【図９】図９（Ａ）は、類似部分を含むＤＯＭツリーの一例であり、図９（Ｂ）は、類似部分の木構造を削除して、代わりに作成した共通データ及びＸＭＬデータの記憶場所を示す情報を記憶した状態を示した図である。
【図１０】第２の実施の形態に係る記憶容量削減処理ルーチンを示すフローチャートである。
【図１１】インサートデータ作成処理に適合するＸＭＬ文書の一例である。
【図１２】インサートデータ作成処理のサブルーチンを示したフローチャートである。
【図１３】図１３（Ａ）は、図１１の元データ１及び元データ２に基づいて作成された共通データを示した図であり、図１３（Ｂ）及び図１３（Ｃ）は、図１１の元データ１及び元データ２に基づいて作成されたＸＭＬデータを示した図である。
【図１４】共通データ及びＸＭＬデータの記憶場所を示す情報を記憶した管理テーブルの一例である。
【図１５】変換データ作成処理に適合したＸＭＬ文書の一例である。
【図１６】変換データ作成処理のサブルーチンを示したフローチャートである。
【図１７】図１７（Ａ）は、図１５の元データ１及び元データ２に基づいて作成された共通データを示した図であり、図１７（Ｂ）及び図１７（Ｃ）は、図１５の元データ１及び元データ２に基づいて作成されたＸＭＬデータを示した図である。
【図１８】第３の実施の形態に係る記憶容量削減処理ルーチンのフローチャートの一部である。
【図１９】第４の実施の形態に係る展開処理ルーチンを示すフローチャートである。
【図２０】第４の実施の形態に係る削除処理ルーチンを示すフローチャートである。
【符号の説明】
１０ＸＭＬプロセッサ
１１情報処理装置
１４ワーク領域
１６木構造記憶領域
２０ＣＰＵ
２４ＲＯＭ
２６ＲＡＭ
５０上位ＸＭＬアプリケーション
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an information processing apparatus and a program for processing a document in which a plurality of elements are hierarchically described, and in particular to an information processing apparatus and a program that reduce the storage capacity required to store a tree structure created from the document.
[0002]
[Prior art]
XML (eXtensible Markup Language) is known as a unified standard that allows data formats, which formerly differed from one computer or application to another, to be used across different computers and applications. XML is a typical standard for structured documents that hierarchically describe a plurality of elements.
[0003]
DOM (Document Object Model) is known as an API (Application Programming Interface) for operating a structured document such as an XML document by an application. DOM is an API for handling a structured document as a tree-structured object.
[0004]
By converting the structured document to DOM, the application can recognize the tree structure of the structured document and can access necessary elements by tracing the tree.
[0005]
However, when a structured document is converted to DOM, the larger the structured document, the larger the resulting tree structure, and the more storage capacity is required to store it. A deep hierarchy likewise requires a large storage capacity to represent the hierarchical structure. As a result, problems arise: the converted tree structure may not fit in the storage device for lack of capacity, so that processing is interrupted by an error, or, even if it can be stored, access efficiency deteriorates.
[0006]
Known devices for reducing the required storage capacity include a conversion device that reduces the working storage capacity and improves data access efficiency by making the hierarchy of a structured document shallower (see, for example, Patent Document 1), and a structured document conversion apparatus that generates a single new element by joining the contents of a plurality of elements that have the same element name and are at relatively the same position (see, for example, Patent Document 2).
[0007]
[Patent Document 1]
JP 2002-297569 A
[Patent Document 2]
JP 2002-108850 A
[0008]
[Problems to be solved by the invention]
However, the conventional conversion device and structured document conversion device change the hierarchical structure of the structured document itself, so there is a problem that an application that operates on the structured document must be aware of the change in the hierarchical structure. In addition, the conventional structured document conversion apparatus that combines a plurality of elements has the problem that it cannot combine elements unless elements having the same element name exist at relatively the same hierarchical position.
[0009]
The present invention has been proposed in order to solve the above-described problems, and its object is to provide an information processing apparatus and program that can reduce the storage capacity required to store a tree structure created from a document that hierarchically describes a plurality of elements, without changing the hierarchical structure of the tree structure.
[0010]
[Means for Solving the Problems]
In order to achieve the above object, a first information processing apparatus according to the present invention includes: storage means that stores a tree structure corresponding to a document in which a plurality of elements are hierarchically described; identical part detection means for detecting, in the stored tree structure, identical parts whose elements, element contents, and element hierarchical structure are the same; generating means for generating, when identical parts are detected, information indicating the storage location of a specific identical part; and processing means for, when identical parts are detected, deleting the identical parts other than the specific identical part from the storage means and storing in the storage means, in correspondence with each deleted identical part, the information indicating the storage location of the specific identical part.
[0011]
The storage unit of the first information processing apparatus of the present invention stores a tree structure corresponding to a document in which a plurality of elements are described hierarchically. The storage means is not particularly limited, and may be a RAM generally used as a main memory, for example. The tree structure may be a tree structure having a plurality of elements and element contents as nodes.
[0012]
A document in which a plurality of elements are described hierarchically can be, for example, a structured document represented by an XML document or the like.
[0013]
The same part detection means detects the same part having the same element, element content, and element hierarchical structure in the stored tree structure. The generation unit generates information indicating a storage location of a specific identical part when the identical part is detected.
[0014]
The specific identical part may be any of a plurality of identical parts detected by the identical part detection means, and is not particularly limited.
[0015]
The processing means performs processing so as to delete the same part other than the specific identical part from the storage means when the same part is detected. The processing means performs processing so as to store in the storage means information indicating the storage location of the specific identical part corresponding to the deleted identical part.
[0016]
That is, for the same part other than the specific same part, the tree structure is deleted, and information indicating the storage location of the specific same part is stored instead.
[0017]
Since the data amount of the information indicating the storage location is small, even if the information indicating the storage location is stored, the storage amount of the storage means does not increase significantly. On the other hand, since the amount of data in the tree structure is large, the storage capacity necessary for storing the tree structure is greatly reduced by deleting the same part of the tree structure.
[0018]
In addition, even when referring to the deleted identical part, it is possible to refer to the tree structure of the specific identical part in place of the deleted identical part using information indicating the storage location of the specific identical part. Therefore, the operability for the tree structure is not impaired.
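The behavior described in the preceding paragraphs can be sketched roughly as follows. This is an illustrative interpretation, not the patent's implementation: the `Node` and `Ref` classes, the tuple-based `signature`, and the choice of the first occurrence as the "specific identical part" are all assumptions. A Python object reference stands in for the "information indicating the storage location".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                                     # element name
    text: str = ""                                # element content
    children: list = field(default_factory=list)  # Node or Ref entries


@dataclass
class Ref:
    target: Node  # stands in for the storage location of the kept part


def resolve(entry):
    """Follow a Ref to the kept subtree so callers see the original hierarchy."""
    return entry.target if isinstance(entry, Ref) else entry


def signature(entry):
    """Canonical key covering element name, content, and child hierarchy."""
    node = resolve(entry)
    return (node.name, node.text, tuple(signature(c) for c in node.children))


def share_identical(root):
    """Replace each repeated identical subtree with a Ref to its first occurrence."""
    seen = {}  # signature -> first subtree found with that signature

    def visit(node):
        for i, child in enumerate(node.children):
            if isinstance(child, Ref):
                continue
            key = signature(child)
            if key in seen and seen[key] is not child:
                node.children[i] = Ref(seen[key])  # drop the duplicate subtree
            else:
                seen.setdefault(key, child)
                visit(child)

    visit(root)
    return root


def count_stored(node):
    """Count the Node objects that are actually held (Refs are not followed)."""
    return 1 + sum(count_stored(c) for c in node.children if isinstance(c, Node))


def item():
    return Node("item", children=[Node("name", "pen"), Node("price", "100")])


root = Node("order", children=[item(), item(), Node("note", "rush")])
sig_before = signature(root)
before = count_stored(root)
share_identical(root)
after = count_stored(root)
```

Because `signature` resolves every `Ref`, the logical hierarchy seen by a caller is the same before and after sharing; only the number of stored nodes changes. Recomputing signatures on each visit is quadratic in the worst case, which is acceptable for a sketch but not for large trees.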
[0019]
As described above, since the same part can be shared, the storage capacity necessary for storing the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0020]
The first program of the present invention causes a computer to function as: identical part detection means for detecting, in a tree structure stored in storage means that stores a tree structure corresponding to a document in which a plurality of elements are hierarchically described, identical parts whose elements, element contents, and element hierarchical structure are the same; generating means for generating, when identical parts are detected, information indicating the storage location of a specific identical part; and processing means for, when identical parts are detected, deleting the identical parts other than the specific identical part from the storage means and storing in the storage means, in correspondence with each deleted identical part, the information indicating the storage location of the specific identical part.
[0021]
Since the first program of the present invention operates in the same manner as the first information processing apparatus of the present invention, the storage capacity necessary for storing the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0022]
A second information processing apparatus of the present invention includes: tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are hierarchically described; identical part detection means for detecting, in the created tree structure, identical parts whose elements, element contents, and element hierarchical structure are the same; generating means for generating, when identical parts are detected, information indicating the storage location of a specific identical part; and storage means that stores the parts of the created tree structure other than the identical parts together with the specific identical part, and that stores, in correspondence with each identical part other than the specific identical part, the information indicating the storage location of the specific identical part.
[0023]
The tree structure creation means of the second information processing apparatus of the present invention creates a tree structure corresponding to a document in which a plurality of elements are described hierarchically.
[0024]
Moreover, the same part detection means of the second information processing apparatus detects the same part in the created tree structure. The generation unit generates information indicating a storage location of a specific identical part when the identical part is detected.
[0025]
The storage means stores a portion other than the same portion of the created tree structure and a specific same portion. Further, information indicating the storage location of the specific identical part is stored in association with the identical part other than the specific identical part.
[0026]
That is, for the detected identical part, only the specific identical part is stored in the storage means as a tree structure, and for the other identical parts, information indicating the storage location of the specific identical part is stored instead of the tree structure.
[0027]
Thus, by not storing the tree structure of the same part, the storage capacity required for storing the tree structure is greatly reduced.
[0028]
In addition, even when a reference is made to an identical part that was not stored, the tree structure of the specific identical part can be referred to in its place by using the information indicating the storage location of the specific identical part. Therefore, the operability of the tree structure is not impaired.
[0029]
As described above, since the same part can be shared, the storage capacity necessary for storing the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0030]
The second program of the present invention causes a computer to function as: tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are hierarchically described; identical part detection means for detecting, in the created tree structure, identical parts whose elements, element contents, and element hierarchical structure are the same; generating means for generating, when identical parts are detected, information indicating the storage location of a specific identical part; and processing means for storing in storage means the parts of the created tree structure other than the identical parts together with the specific identical part, and for storing in the storage means, in correspondence with each identical part other than the specific identical part, the information indicating the storage location of the specific identical part.
[0031]
Since the second program of the present invention also operates in the same manner as the second information processing apparatus of the present invention, the storage capacity necessary for storing the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0032]
The third information processing apparatus of the present invention includes: tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are hierarchically described; identical part detection means for, each time a new tree structure is created by the tree structure creating means, comparing the tree structure already stored in storage means with the newly created tree structure and detecting, in the newly created tree structure, an identical part whose elements, element contents, and element hierarchical structure are the same as those of the already stored tree structure; generating means for generating, when an identical part is detected, information indicating the storage location of the identical part of the tree structure already stored in the storage means; and processing means for, when an identical part is detected, storing the parts of the created tree structure other than the identical part in the storage means and storing in the storage means, in correspondence with the identical part of the created tree structure, the information indicating the storage location of the identical part.
[0034]
The tree structure creating means creates a tree structure corresponding to a document in which a plurality of elements are described hierarchically.
[0035]
Each time a new tree structure is created by the tree structure creating means, the identical part detection means compares the tree structure already stored in the storage means with the newly created tree structure, and detects, in the newly created tree structure, an identical part whose elements, element contents, and element hierarchical structure are the same as those of the already stored tree structure.
[0036]
If an identical part is detected, the generating means generates information indicating the storage location of the identical part of the tree structure already stored in the storage means.
[0037]
The processing means performs processing so as to store a portion other than the same portion of the created tree structure in the storage means when the same portion is detected. The processing means performs processing so as to store information indicating the storage location of the same portion of the stored tree structure in the storage means in correspondence with the same portion of the created tree structure.
[0038]
That is, for the detected identical part, information indicating the storage location of the identical part already stored in the storage means is stored instead of the tree structure.
[0039]
Thus, by not storing the tree structure of the same part, the storage capacity required for storing the tree structure is greatly reduced.
[0040]
Further, even when a reference is made to an identical part that was not stored, the tree structure of the identical part already stored in the storage means can be referred to in its place by using the information indicating its storage location. Therefore, the operability of the tree structure is not impaired.
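One way to picture this incremental behavior is a store that persists across documents and keeps each distinct subtree only once. The sketch below is an assumption-laden illustration: the `TreeStore` class, the use of integer list indexes as "storage locations", and the sample documents are hypothetical, and attributes and tail text are ignored for brevity. Parsing uses the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET


class TreeStore:
    """Hypothetical store: each distinct subtree is kept once at an integer location."""

    def __init__(self):
        self.records = []  # location -> (tag, text, tuple of child locations)
        self.index = {}    # record -> location, used to detect identical parts

    def add(self, elem):
        """Store elem's subtree, reusing the locations of identical stored subtrees."""
        children = tuple(self.add(child) for child in elem)
        record = (elem.tag, (elem.text or "").strip(), children)
        loc = self.index.get(record)
        if loc is None:  # no identical part stored yet, so store this one
            loc = len(self.records)
            self.records.append(record)
            self.index[record] = loc
        return loc       # for an identical part, only its location is returned

    def add_document(self, xml_text):
        """Parse a newly arrived document and merge its tree into the store."""
        return self.add(ET.fromstring(xml_text))

    def rebuild(self, loc):
        """Follow locations to recover the original hierarchy as nested tuples."""
        tag, text, children = self.records[loc]
        return (tag, text, tuple(self.rebuild(c) for c in children))


store = TreeStore()
first = store.add_document(
    "<order><customer><name>Sato</name></customer><item>pen</item></order>"
)
stored_after_first = len(store.records)
second = store.add_document(
    "<invoice><customer><name>Sato</name></customer><total>100</total></invoice>"
)
stored_after_second = len(store.records)
```

In this example the second document contains four elements, but only two new records are stored, because its `customer` subtree is identical to one already held. `rebuild` shows that the full hierarchy of the second document is still reachable through the stored locations.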
[0041]
As described above, since the same part can be shared, the storage capacity necessary for storing the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0042]
The third program of the present invention causes a computer to function as: tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are hierarchically described; identical part detection means for, each time a new tree structure is created by the tree structure creating means, comparing the tree structure already stored in storage means with the newly created tree structure and detecting, in the newly created tree structure, an identical part whose elements, element contents, and element hierarchical structure are the same as those of the already stored tree structure; generating means for generating, when an identical part is detected, information indicating the storage location of the identical part of the tree structure already stored in the storage means; and processing means for, when an identical part is detected, storing the parts of the created tree structure other than the identical part in the storage means and storing in the storage means, in correspondence with the identical part of the created tree structure, the information indicating the storage location of the identical part.
[0043]
Since the third program of the present invention also operates in the same manner as the third information processing apparatus of the present invention, the storage capacity necessary for storing the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0044]
A fourth information processing apparatus of the present invention includes: storage means that stores a tree structure corresponding to a document in which a plurality of elements are hierarchically described; first extraction means for extracting, from the stored tree structure, parts each including an element having the same element name; second extraction means for extracting, as a different part, the portion other than the element having the same element name from each of the parts extracted by the first extraction means; similarity determination means for determining that each part extracted by the first extraction means whose different part, as extracted by the second extraction means, has a data amount equal to or less than a predetermined value is a similar part whose elements, element contents, and element hierarchical structure are similar; data creation means for creating, based on the similar parts determined by the similarity determination means, first data indicating information common to the similar parts, and for creating, for each similar part, second data indicating information on the different part of that similar part; generating means for generating information indicating the storage location of the first data and information indicating the storage location of each second data; and processing means for deleting all of the similar parts from the storage means and storing in the storage means, in correspondence with the deleted similar parts, the information indicating the storage location of the first data and the information indicating the storage location of each second data.
[0045]
The storage unit of the fourth information processing apparatus of the present invention stores a tree structure corresponding to a document in which a plurality of elements are hierarchically described.
[0046]
The first extraction means extracts, from the tree structure stored in the storage means, parts each including an element having the same element name. The second extraction means extracts, as a different part, the portion other than the element having the same element name from each of the parts extracted by the first extraction means. The similarity determination means determines that each part extracted by the first extraction means whose different part, as extracted by the second extraction means, has a data amount equal to or less than a predetermined value is a similar part, that is, a part whose elements, element contents, and element hierarchical structure are similar.
[0047]
The data creation means creates, based on the similar parts determined by the similarity determination means, first data indicating information common to the similar parts. Furthermore, the data creation means creates, for each similar part, second data indicating information on the different part of that similar part.
[0048]
The first data may be constituted by, for example, common data included in each similar part. In addition, the second data may be configured by information obtained by excluding information included in the first data from each similar part, for example.
[0049]
The generation unit generates information indicating a storage location of the first data and information indicating a storage location of each second data.
[0050]
The processing means performs processing so as to delete all the similar parts from the storage means when the similar parts are detected. Further, in correspondence with the deleted similar part, information indicating the storage location of the first data and information indicating the storage location of each second data are stored in the storage means.
[0051]
That is, for the similar portion, the tree structure is deleted, and information indicating the storage location of the first data and information indicating the storage location of each second data are stored instead.
[0052]
Since the data amount of the information indicating the storage location is small, even if the information indicating the storage location is stored, the storage amount of the storage means does not increase significantly. On the other hand, since the amount of data of the tree structure is large, the storage capacity necessary for storing the tree structure is greatly reduced by deleting the tree structure of similar parts.
[0053]
Further, even when referring to the deleted similar portion, the first data and the second data can be read using the information indicating the storage location of the first data and the information indicating the storage location of the second data. As a result, the deleted similar part can be referred to.
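The split into first data and second data, and the way a deleted similar part is read back, can be sketched as follows. This is a deliberately simplified interpretation: each part is flattened to a dictionary of child element name to text, the "data amount" of a different part is approximated by a character count, and the function names, the `storage` dictionary, and the threshold value are all hypothetical.

```python
def split_similar(parts, max_diff_chars=20):
    """Split same-named parts into shared first data and per-part second data.

    Each part is a dict mapping child element name -> text (a flattened
    stand-in for a subtree). Returns None if any part differs from the
    common data by more than the threshold.
    """
    common = dict(parts[0])
    for part in parts[1:]:
        common = {k: v for k, v in common.items() if part.get(k) == v}
    diffs = [{k: v for k, v in p.items() if k not in common} for p in parts]
    # "Data amount" of a different part is approximated here by character count.
    for diff in diffs:
        if sum(len(k) + len(v) for k, v in diff.items()) > max_diff_chars:
            return None
    return common, diffs


def store_similar(storage, parts, max_diff_chars=20):
    """Store first data once and second data per part; return location pairs."""
    result = split_similar(parts, max_diff_chars)
    if result is None:
        return None

    def put(data):
        loc = len(storage)  # next free integer location
        storage[loc] = data
        return loc

    common, diffs = result
    first_loc = put(common)
    return [(first_loc, put(diff)) for diff in diffs]


def read_part(storage, pointer):
    """Rebuild a similar part from its first-data and second-data locations."""
    first_loc, second_loc = pointer
    return {**storage[first_loc], **storage[second_loc]}


parts = [
    {"maker": "Acme", "unit": "box", "color": "red"},
    {"maker": "Acme", "unit": "box", "color": "blue"},
]
storage = {}
pointers = store_similar(storage, parts)
restored = [read_part(storage, p) for p in pointers]
```

Each original part is replaced by a pair of locations, and merging the two referenced records recovers the part exactly. When the different parts exceed the threshold, `split_similar` returns `None` and the parts would simply be stored as ordinary tree structures.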
[0054]
In this manner, the tree structures of the similar parts are deleted, and the storage location of the first data and the storage location of the second data are stored in correspondence with each similar part. Therefore, the storage capacity required to store the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0055]
The fourth program of the present invention causes a computer to function as: first extraction means for extracting, from a tree structure stored in storage means that stores a tree structure corresponding to a document in which a plurality of elements are hierarchically described, parts each including an element having the same element name; second extraction means for extracting, as a different part, the portion other than the element having the same element name from each of the parts extracted by the first extraction means; similarity determination means for determining that each part extracted by the first extraction means whose different part, as extracted by the second extraction means, has a data amount equal to or less than a predetermined value is a similar part whose elements, element contents, and element hierarchical structure are similar; data creation means for creating, based on the similar parts determined by the similarity determination means, first data indicating information common to the similar parts, and for creating, for each similar part, second data indicating information on the different part of that similar part; generating means for generating information indicating the storage location of the first data and information indicating the storage location of each second data; and processing means for deleting all of the similar parts from the storage means and storing in the storage means, in correspondence with the deleted similar parts, the information indicating the storage location of the first data and the information indicating the storage location of each second data.
[0056]
Since the fourth program of the present invention also operates in the same manner as the fourth information processing apparatus of the present invention, the storage capacity necessary for storing the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0057]
A fifth information processing apparatus of the present invention includes: tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are hierarchically described; first extraction means for extracting, from the created tree structure, parts each including an element having the same element name; second extraction means for extracting, as a different part, the portion other than the element having the same element name from each of the parts extracted by the first extraction means; similarity determination means for determining that each part extracted by the first extraction means whose different part, as extracted by the second extraction means, has a data amount equal to or less than a predetermined value is a similar part whose elements, element contents, and element hierarchical structure are similar; data creation means for creating, based on the similar parts determined by the similarity determination means, first data indicating information common to the similar parts, and for creating, for each similar part, second data indicating information on the different part of that similar part; generating means for generating information indicating the storage location of the first data and information indicating the storage location of each second data; and processing means for storing the parts of the created tree structure other than the similar parts in storage means and storing in the storage means, in correspondence with the similar parts of the created tree structure, the information indicating the storage location of the first data and the information indicating the storage location of each second data.
[0058]
The tree structure creation means of the fifth information processing apparatus of the present invention creates a tree structure corresponding to a document in which a plurality of elements are hierarchically described.
[0059]
The first extraction means extracts, from the created tree structure, parts each including an element having the same element name. The second extraction means extracts, as a different part, the portion other than the element having the same element name from each of the parts extracted by the first extraction means. The similarity determination means determines that each part extracted by the first extraction means whose different part, as extracted by the second extraction means, has a data amount equal to or less than a predetermined value is a similar part, that is, a part whose elements, element contents, and element hierarchical structure are similar.
[0060]
The data creation means creates, based on the similar parts determined by the similarity determination means, first data indicating information common to the similar parts. Furthermore, the data creation means creates, for each similar part, second data indicating information on the different part of that similar part. The generating means generates information indicating the storage location of the first data and information indicating the storage location of each second data.
[0061]
The processing means performs processing so as to store the parts of the created tree structure other than the similar parts in the storage means, and to store in the storage means, in correspondence with the similar parts of the created tree structure, the information indicating the storage location of the first data and the information indicating the storage location of each second data.
[0062]
That is, for the similar portion, the tree structure is not stored, but information indicating the storage location of the first data and information indicating the storage location of each second data are stored instead.
[0063]
As described above, by not storing the tree structure of the similar portion, the storage capacity necessary for storing the tree structure is greatly reduced.
[0064]
Further, even when referring to a similar portion that has not been stored, the first data and the second data can be read using information indicating the storage location of the first data and the storage location of the second data. Similar parts that are not stored can be referred to.
[0065]
Thus, the tree structure of each similar part is not stored, and the storage location of the first data and the storage location of the second data are stored in correspondence with each similar part. Therefore, the storage capacity required to store the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0066]
The fifth program of the present invention causes a computer to function as: tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are hierarchically described; first extraction means for extracting, from the created tree structure, parts each including an element having the same element name; second extraction means for extracting, as a different part, the portion other than the element having the same element name from each of the parts extracted by the first extraction means; similarity determination means for determining that each part extracted by the first extraction means whose different part, as extracted by the second extraction means, has a data amount equal to or less than a predetermined value is a similar part whose elements, element contents, and element hierarchical structure are similar; data creation means for creating, based on the similar parts determined by the similarity determination means, first data indicating information common to the similar parts, and for creating, for each similar part, second data indicating information on the different part of that similar part; generating means for generating information indicating the storage location of the first data and information indicating the storage location of each second data; and processing means for storing the parts of the created tree structure other than the similar parts in storage means and storing in the storage means, in correspondence with the similar parts of the created tree structure, the information indicating the storage location of the first data and the information indicating the storage location of each second data.
[0067]
Since the fifth program of the present invention also operates in the same manner as the fifth information processing apparatus of the present invention, the storage capacity necessary for storing the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0068]
The sixth information processing apparatus of the present invention includes: tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are hierarchically described; first extraction means for, each time a new tree structure is created by the tree structure creating means, comparing the tree structure already stored in storage means with the newly created tree structure and extracting, from the newly created tree structure, a part including an element having the same element name as an element included in the tree structure already stored in the storage means; second extraction means for extracting, as a different part, the portion other than the element having the same element name from the part extracted by the first extraction means; similarity determination means for determining that a part extracted by the first extraction means whose different part, as extracted by the second extraction means, has a data amount equal to or less than a predetermined value is a similar part whose elements, element contents, and element hierarchical structure are similar; data creation means for creating, based on the similar parts determined by the similarity determination means, first data indicating information common to the similar parts, and for creating, for each similar part, second data indicating information on the different part of that similar part; generating means for generating information indicating the storage location of the first data and information indicating the storage location of each second data; and processing means for storing the parts of the created tree structure other than the similar part in the storage means, deleting the similar part from the storage means, and storing in the storage means, in correspondence with the deleted part and the similar part of the created tree structure, the information indicating the storage location of the first data and the information indicating the storage location of each second data.
[0070]
The tree structure creating means creates a tree structure corresponding to a document in which a plurality of elements are described hierarchically.
[0071]
Each time a new tree structure is created by the tree structure creating means, the first extraction means compares the tree structure already stored in the storage means with the newly created tree structure, and extracts, from the newly created tree structure, a part including an element having the same element name as an element included in the tree structure already stored in the storage means. The second extraction means extracts, as a different part, the portion other than the element having the same element name from the part extracted by the first extraction means. The similarity determination means determines that a part extracted by the first extraction means whose different part, as extracted by the second extraction means, has a data amount equal to or less than a predetermined value is a similar part, that is, a part whose elements, element contents, and element hierarchical structure are similar.
[0072]
The data creation means creates, based on the similar parts determined by the similarity determination means, first data indicating information common to the similar parts. Furthermore, the data creation means creates, for each similar part, second data indicating information on the different part of that similar part. The generating means generates information indicating the storage location of the first data and information indicating the storage location of each second data.
[0073]
The processing means performs processing so as to store the parts of the created tree structure other than the similar part in the storage means, to delete the similar part from the storage means, and to store in the storage means, in correspondence with the deleted part and the similar part of the created tree structure, the information indicating the storage location of the first data and the information indicating the storage location of each second data.
[0074]
That is, for the similar part of the created tree structure, the tree structure is not stored, but instead the information indicating the storage location of the first data and the information indicating the storage location of the second data are stored. For similar parts already stored, the tree structure is deleted, and instead, information indicating the storage location of the first data and information indicating the storage location of the second data are stored.
[0075]
In this way, by not storing the tree structure of the similar part of the created tree structure and by deleting the tree structure of the similar part already stored, the storage capacity necessary for storing the tree structure is greatly reduced.
[0076]
Further, even when a similar part that was not stored or that was deleted is referenced, it can still be referred to by reading the first data and the second data using the information indicating the storage location of the first data and the storage location of the second data.
[0077]
Even with such a configuration, the storage capacity required to store the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0078]
The sixth program of the present invention causes a computer to function as: tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are described hierarchically; first extraction means for comparing, every time a new tree structure is created by the tree structure creating means, the tree structure already stored in the storage means with the newly created tree structure, and extracting, from the newly created tree structure, a part including an element having the same element name as an element included in the tree structure already stored in the storage means; second extraction means for extracting, from the part extracted by the first extraction means, the part other than the elements having the same element name as a different part; similarity determination means for determining that a part extracted by the first extraction means, for which the data amount of the different part extracted by the second extraction means is a predetermined value or less, is a similar part in which the elements, the contents of the elements, and the hierarchical structure of the elements are similar; data creation means for creating, based on the similar parts determined by the similarity determination means, first data indicating information common to the similar parts, and creating, for each similar part, second data indicating information on the different part of that similar part; generating means for generating information indicating a storage location of the first data and information indicating a storage location of each second data; and processing means for storing the part of the created tree structure other than the similar part in the storage means, deleting the similar part from the storage means, and performing processing so that the information indicating the storage location of the first data and the information indicating the storage location of each second data are stored in the storage means in correspondence with the deleted part and the similar part of the created tree structure.
[0079]
Since the sixth program of the present invention also operates in the same manner as the sixth information processing apparatus of the present invention, the storage capacity necessary for storing the tree structure can be reduced without changing the hierarchical structure of the tree structure.
[0080]
The storage medium for storing the above-described programs may be a ROM, a CD-ROM, a DVD disk, a magneto-optical disk, an IC card, a hard disk, or the like, or may be a transmission medium such as a carrier wave on a telecommunication line.
[0084]
In any of the fourth to sixth information processing apparatuses described above, the data creation means can create the first data so that the second data can be inserted into the first data.
[0085]
By creating the first data in this way, data indicating the original similar part can be created from the first data and the second data, and the tree structure of the similar part can be created based on that data.
[0086]
The data creation means can create the first data so as to include a conversion formula for converting the second data according to a predetermined rule.
[0087]
By creating the first data in this way, the second data can be converted using the first data. As a result, data indicating the original similar part can be easily created, and a tree structure of the similar part can be created based on the data.
[0088]
Any of the fourth to sixth information processing apparatuses described above may further include similar partial tree structure creating means that, when a similar part for which the storage location of the first data and the storage location of the second data are stored is referenced, creates a tree structure using the first data and the second data corresponding to the referenced similar part, and stores the tree structure in the storage means.
[0089]
Thereby, when a similar part is referred to, it is possible to easily create and store a tree structure of the referenced similar part. The “reference” here includes a case where a tree structure of a similar part is manipulated.
[0090]
Furthermore, the information processing apparatus described above may further comprise deleting means for deleting, from the storage means, the tree structure stored by the similar partial tree structure creating means in either of two cases: when the free capacity of the storage means becomes a predetermined value or less, or when that tree structure has not been referred to for a predetermined time.
[0091]
Thereby, the free capacity of the storage means is not wasted.
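The rebuild-on-reference and deletion behavior described above can be sketched as a small cache. This is a minimal illustration under stated assumptions, not the patented implementation: the class `RebuiltTreeCache`, its parameters, and the use of an entry count (`max_entries`) as a stand-in for the free-capacity check are all hypothetical.

```python
import time


class RebuiltTreeCache:
    """Holds trees rebuilt from first and second data (all names are hypothetical)."""

    def __init__(self, rebuild, max_idle, max_entries, clock=time.monotonic):
        self._rebuild = rebuild          # callable(key) -> rebuilt tree
        self._max_idle = max_idle        # the "predetermined time", in seconds
        self._max_entries = max_entries  # assumed stand-in for the free-capacity limit
        self._clock = clock
        self._trees = {}
        self._last_used = {}

    def get(self, key):
        """Return the tree of a referenced similar part, rebuilding it if needed."""
        if key not in self._trees:
            self._trees[key] = self._rebuild(key)
        self._last_used[key] = self._clock()
        return self._trees[key]

    def sweep(self):
        """Delete rebuilt trees that are stale or that exceed the capacity limit."""
        now = self._clock()
        stale = [k for k, t in self._last_used.items() if now - t >= self._max_idle]
        for key in stale:
            self._drop(key)
        while len(self._trees) > self._max_entries:
            # Drop the least recently referenced tree first.
            self._drop(min(self._last_used, key=self._last_used.get))

    def _drop(self, key):
        del self._trees[key]
        del self._last_used[key]

    def keys(self):
        return sorted(self._trees)


# Usage with a fake clock so the behavior is deterministic.
now = [0.0]
cache = RebuiltTreeCache(lambda key: f"tree-{key}", max_idle=10, max_entries=2,
                         clock=lambda: now[0])
cache.get("a")
now[0] = 5.0
cache.get("b")
now[0] = 12.0
cache.sweep()          # "a" has been idle for 12 s and is deleted; "b" remains
print(cache.keys())
```

Deleting only the rebuilt copies is safe in this sketch because the first and second data remain stored, so a deleted tree can be rebuilt on the next reference.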
[0092]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the drawings.
[0093]
[First Embodiment]
FIG. 1A is a diagram conceptually showing the functional configuration of the structured document processing system according to the present embodiment and the processing performed in the structured document processing system. As shown in the figure, when an XML document 30 described in a text format is input to an XML processor (XML parser) 10 serving as the information processing apparatus of the present invention, the XML processor 10 creates a tree structure (DOM tree) 40 having each element of the XML document 30 as a node. The upper XML application 50 can refer to or operate on the created DOM tree 40.
[0094]
FIG. 1B is a diagram showing in more detail the functional configuration of the structured document processing system shown in FIG. 1A and the processing performed in the structured document processing system.
[0095]
In (1), the upper XML application 50 instructs the XML processor 10, via the DOM interface 60, to create the DOM tree 40 so that it can refer to and operate on the XML document 30.
[0096]
In (2), the XML processor 10 that has received the instruction inputs and analyzes the XML document 30.
[0097]
FIG. 2 shows an example of the XML document 30. As shown in the figure, the XML document 30 is described in a text format in which a plurality of elements are hierarchically described by a start tag <XXX> indicating the start of an element and an end tag </XXX> indicating the end of the element.
[0098]
Specifically, the XML processor 10 analyzes the hierarchical structure of the elements of the XML document 30 by reading the XML document 30 shown in FIG. 2 sequentially and detecting the start tags and end tags.
[0099]
Further, the XML processor 10 creates a DOM tree 40 based on the analysis result.
[0100]
FIG. 3 shows an example of a DOM tree 40 created by analyzing the XML document 30 shown in FIG. 2. As illustrated, each element (Element) is configured as a node of the DOM tree 40 (for example, node 70 in the figure). The content of an element, that is, a character string (Text) sandwiched between the start tag and the end tag, is also configured as a node (for example, node 72 in the figure).
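As an illustration of this node layout, the following minimal sketch parses a small document with Python's standard `xml.dom.minidom` module and lists its element and text nodes. The choice of `minidom`, the sample document, and the helper `describe` are assumptions made for illustration; the patent does not name an implementation.

```python
from xml.dom import minidom

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = "<job><scan><dpi>300</dpi></scan><storage><path>box1</path></storage></job>"


def describe(node, depth=0, out=None):
    """Collect (depth, kind, value) for every element node and text node."""
    out = [] if out is None else out
    if node.nodeType == node.ELEMENT_NODE:
        out.append((depth, "Element", node.tagName))
    elif node.nodeType == node.TEXT_NODE:
        out.append((depth, "Text", node.data))
    for child in node.childNodes:
        describe(child, depth + 1, out)
    return out


doc = minidom.parseString(SAMPLE)
nodes = describe(doc.documentElement)
for depth, kind, value in nodes:
    print("  " * depth + f"{kind}: {value}")
```

Both the `dpi` element and its content `300` appear as separate nodes, which is why deep or large documents produce trees that need considerably more storage than the source text.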
[0101]
The portion indicated by (1) in FIG. 3 is a subtree (a part of the entire tree) corresponding to the description of the scan element in (1) in FIG. 2. Likewise, the portion indicated by (2) in FIG. 3 is a subtree corresponding to the description of the storage element in (2) in FIG. 2.
[0102]
Thus, the DOM tree 40 is composed of a plurality of subtrees.
[0103]
In (3), the upper XML application 50 refers to or operates the created DOM tree 40 via the DOM interface 60.
[0104]
FIG. 4 is a block diagram showing a configuration of the information processing apparatus 11 as hardware resources for realizing the functions of the XML processor 10.
[0105]
As illustrated, the information processing apparatus 11 includes an operation unit 12, an input/output unit 18, a CPU 20, a network I/F 22, a ROM 24, and a RAM 26, which are connected to each other via a bus.
[0106]
The operation unit 12 includes, for example, a keyboard, and the user inputs arbitrary data using the operation unit 12 and performs a predetermined operation.
[0107]
The network I/F 22 is an interface for connecting to various networks.
[0108]
The input/output unit 18 is an interface for inputting/outputting various data. The input/output unit 18 is connected to an HDD 28 for storing various data, programs, and the like.
[0109]
The ROM 24 stores a program of a processing routine for creating a DOM tree from an XML document as a function of the XML processor 10 and reducing a storage capacity necessary for storing the DOM tree. Further, the ROM 24 also stores programs for realizing the functions of the higher-level XML application 50 that handles XML documents and the DOM interface 60.
[0110]
The CPU 20 implements various functions by executing programs stored in the ROM 24 or the like.
[0111]
The RAM 26 includes a work area 14 and a tree structure storage area 16. The work area 14 is a work area for storing data during work. The tree structure storage area 16 is an area for storing the created DOM tree.
[0112]
The XML processor 10 shown in FIG. 1 is a function realized by using the information processing apparatus 11 shown in FIG. 4 as hardware resources and using a program stored in the ROM 24 as software resources. Note that the upper XML application 50 and the DOM interface 60 are functions realized in the same manner.
[0113]
Hereinafter, a processing routine executed by the XML processor 10 of the present embodiment will be described with reference to the flowcharts of FIGS. 5 and 6.
[0114]
FIG. 5 is a flowchart showing the main routine of the present embodiment.
[0115]
In step 100, the XML processor 10 receives a DOM tree creation instruction from the upper XML application 50 and acquires a structured document (here, an XML document). For example, when the XML document is stored in the HDD 28, it is acquired from the HDD 28 via the input/output unit 18; when it is stored in another computer system on a network, it is acquired via the network I/F 22.
[0116]
In step 102, the acquired XML document is analyzed. As described above, by detecting the start tag and the end tag in order, the structure and contents of each element described by these tags are analyzed.
[0117]
In step 104, based on the analysis result in step 102, a tree structure (subtree) having the analyzed elements as nodes is created. The created subtree is held in the work area 14 of the RAM 26. In the following, for distinction, the tree structure stored in the tree structure storage area 16 of the RAM 26 is referred to as a DOM tree, and the tree structure that is created and held in the work area 14 is referred to as a subtree.
[0118]
In step 106, it is determined whether or not the free space in the tree structure storage area 16 is insufficient. If it is determined that the free space is not insufficient, that is, that there is enough free space to store the subtree created in step 104, the process proceeds to step 112, where the subtree held in the work area 14 is stored in the tree structure storage area 16.
[0119]
To determine whether or not the free space is insufficient, for example, a threshold may be provided, and the free space may be determined to be insufficient when the free space in the tree structure storage area 16 is equal to or less than the threshold, or when the stored amount of the DOM tree reaches a threshold.
[0120]
Subsequently, in step 114, it is determined whether or not the analysis of the entire XML document has been completed. If it is determined that the analysis of the entire XML document has not been completed, the process returns to step 102 to further read the XML document. The analysis of the XML document is continued by detecting the next start tag and end tag.
[0121]
On the other hand, if it is determined in step 106 that the free space is insufficient, the process proceeds to step 108, and a storage capacity reduction process for reducing the storage capacity necessary for storing the DOM tree is executed.
[0122]
FIG. 6 is a flowchart showing the flow of the storage capacity reduction process in the present embodiment.
[0123]
In step 200, the XML processor 10 searches for the same part from the DOM tree stored in the tree structure storage area 16. The DOM tree stored in the tree structure storage area 16 is a DOM tree composed of subtrees created and stored by the above-described steps 102, 104, and 112. In addition, the same part refers to a part having the same element, element content, and hierarchical structure.
[0124]
FIG. 7 is a diagram showing an example of an XML document including identical parts. As illustrated, the XML document includes a portion indicated by X and a portion indicated by Y that have the same elements, element contents, and hierarchical structure.
[0125]
FIG. 8A is a diagram showing an example of a DOM tree created from an XML document that includes portions whose elements, element contents, and hierarchical structure are identical. As shown, identical parts 80a and 80b are included. In step 200, such identical parts are searched for.
[0126]
In step 202, it is determined whether there is an identical part in the stored DOM tree as a result of the search.
[0127]
If it is determined that there are identical parts, in step 204, all of the identical parts except a specific identical part are deleted from the tree structure storage area 16. Here, one specific identical part is selected from among the identical parts, and the others are deleted.
[0128]
For example, in the example shown in FIG. 8A, the identical part 80a can be selected as the specific identical part and kept, and the other identical part, here the identical part 80b, can be deleted. When there are three or more identical parts, a specific identical part may similarly be selected and kept while all the other identical parts are deleted.
[0129]
FIG. 8B is a diagram showing a state in which an identical part of the tree structure has been deleted from the DOM tree of FIG. 8A. As illustrated, the identical part 80b is deleted.
[0130]
In step 206, link information is created. Here, the link information is information indicating the storage location of the specific same portion described above. In the example shown in FIG. 8B, information indicating the storage location of a specific identical portion 80a is created as link information.
[0131]
In step 208, the created link information is stored in the tree structure storage area 16 in association with the deleted identical part 80b. In the example shown in FIG. 8B, this corresponds to establishing a link from the node C to the node E immediately below the node B.
[0132]
In step 210, it is determined whether or not the search for the DOM tree stored in the tree structure storage area 16 has been completed. If it is determined that the search has not ended, the process returns to step 200 to continue the storage capacity reduction process. If it is determined that the search has been completed, the present storage capacity reduction processing routine is terminated. This processing routine can reduce the storage capacity required to store the DOM tree.
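Steps 200 to 210 can be sketched as follows. This is a minimal illustration, assuming a simple in-memory `Node` class and a Python object reference as the "link information"; the names `Node`, `signature`, and `reduce_identical` are hypothetical, and the choice to skip leaf nodes (where a link would save nothing) is an added assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)
    link: Optional["Node"] = None  # link information: where the kept copy is stored


def signature(node):
    """Canonical form covering the element, its content, and its hierarchy."""
    return (node.tag, node.text, tuple(signature(c) for c in node.children))


def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children)


def reduce_identical(root):
    """Keep the first copy of each repeated subtree; replace later copies by links."""
    seen = {}

    def visit(node):
        for i, child in enumerate(node.children):
            sig = signature(child)
            if child.children and sig in seen:
                # Steps 204 to 208: delete the duplicate and store a link instead.
                node.children[i] = Node(tag=child.tag, link=seen[sig])
            else:
                seen.setdefault(sig, child)
                visit(child)

    visit(root)
    return root


def make_c():
    return Node("C", children=[Node("E", "x"), Node("F", "y")])


root = Node("A", children=[Node("B", children=[make_c()]),
                           Node("D", children=[make_c()])])
before = count_nodes(root)
reduce_identical(root)
after = count_nodes(root)
print(before, after)
```

The hierarchy seen by a caller is unchanged: the second `C` is still a child of `D`, but its content is reached through `link` rather than stored twice.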
[0133]
When the storage capacity reduction processing in step 108 of FIG. 5 is completed, the XML processor 10 determines in step 110 whether or not the shortage of free space in the tree structure storage area 16 has been resolved. If it is determined that the shortage of free space has not been resolved, a message indicating the shortage of free space is displayed on a display unit (not shown) in step 116, and the process for creating a DOM tree is terminated.
[0134]
If it is determined in step 110 that the shortage of free space has been resolved, for example, if the free space in the tree structure storage area 16 exceeds the above-described threshold, the process proceeds to step 112, and the subtree created in step 104 is stored in the tree structure storage area 16.
[0135]
In step 114, it is determined whether or not the analysis of the entire acquired XML document has been completed. If it is determined that the analysis has not been completed, the process returns to step 102, where the XML document is read further and its analysis is continued by detecting the next start tag and end tag. If it is determined that the analysis has been completed, the process for creating the DOM tree ends.
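The control flow of the main routine (steps 102 to 116) can be sketched as below. This is a simplified illustration under stated assumptions: subtrees are represented only by a name and a size, free space is measured in the same hypothetical unit, and `MemoryError` stands in for the message display and termination of step 116. The functions `build_dom` and `link_duplicates` are hypothetical.

```python
def build_dom(subtrees, capacity, reduce_storage):
    """Store subtrees one by one, reducing storage when free space runs short.

    subtrees       -- iterable of (name, size) pairs standing in for analyzed subtrees
    capacity       -- size of the tree structure storage area (hypothetical unit)
    reduce_storage -- callable(stored) that shrinks `stored` in place (step 108)
    """
    stored = []

    def free_space():
        return capacity - sum(size for _, size in stored)

    for name, size in subtrees:              # steps 102 and 104
        if free_space() < size:              # step 106: free space insufficient?
            reduce_storage(stored)           # step 108: storage capacity reduction
            if free_space() < size:          # step 110: shortage resolved?
                raise MemoryError("insufficient free space")  # stands in for step 116
        stored.append((name, size))          # step 112
    return stored                            # step 114: whole document analyzed


def link_duplicates(stored):
    """Toy reduction: a repeated subtree is replaced by a link of size 1 (assumption)."""
    seen = set()
    for i, (name, _) in enumerate(stored):
        if name in seen:
            stored[i] = (name, 1)
        else:
            seen.add(name)


result = build_dom([("scan", 4), ("scan", 4), ("store", 3)],
                   capacity=10, reduce_storage=link_duplicates)
print(result)
```

In this run the third subtree does not fit until the duplicate `scan` subtree has been replaced by a link, after which storage proceeds normally.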
[0136]
After creating the DOM tree, the XML processor 10 passes the created DOM tree to the upper XML application 50. Here, since the upper XML application 50 and the DOM interface 60 are implemented using the same hardware resources as the XML processor 10, the XML processor 10 passes the DOM tree by notifying the upper XML application 50 of the storage location of the created DOM tree through the DOM interface 60.
[0137]
As described above, in the present embodiment, the identical parts other than the specific identical part are deleted, and information indicating the storage location of the specific identical part is stored in correspondence with each deleted identical part, so that the storage capacity required to store the tree structure can be greatly reduced.
[0138]
In the present embodiment, an example has been described in which the storage capacity reduction process of searching for and deleting identical parts is executed when the capacity of the tree structure storage area 16 becomes insufficient. However, regardless of whether the capacity is insufficient, the storage capacity reduction process may be executed every time a subtree is created or at predetermined intervals.
[0139]
In the present embodiment, an example has also been described in which the storage capacity reduction process is executed on the DOM tree stored in the tree structure storage area 16. However, the storage capacity reduction process may instead be executed on the subtree held in the work area 14, at the stage before it is stored in the tree structure storage area 16.
[0140]
Further, each time a subtree is created, the created subtree may be compared with the DOM tree stored in the tree structure storage area 16. If there is an identical part, the part of the created subtree other than the identical part is stored in the tree structure storage area 16 as a tree structure, and for the identical part, information indicating the storage location of the identical part already stored in the tree structure storage area 16 is stored instead of the tree structure.
[0141]
Even with such a configuration, the storage capacity required for storing the tree structure can be greatly reduced.
[0142]
[Second Embodiment]
In the first embodiment, an example in which the same part is processed has been described. However, in the present embodiment, an example in which a similar part is processed is described.
[0143]
Note that the hardware configuration, software configuration, and main routine executed by the XML processor 10 in the present embodiment are the same as those in the first embodiment, and a description thereof will be omitted. In the present embodiment, a storage capacity reduction process described below is executed in place of the storage capacity reduction process described with reference to the flowchart of FIG. 6 in the first embodiment.
[0144]
First, with reference to FIGS. 9A and 9B, the storage capacity reduction processing of the present embodiment will be briefly described.
[0145]
First, similar parts are detected from the DOM tree stored in the tree structure storage area 16. As shown in FIG. 9A, the similar parts (subtrees) 82 and 84 differ in node G and node I, but their other nodes and hierarchical structures are the same, so they can be said to be similar.
[0146]
As shown in FIG. 9B, from the detected similar parts 82 and 84, common data in XML format (hereinafter referred to as common data) 86 is created for the parts that can be shared, and data of the different parts in XML format (hereinafter referred to as XML data) 88 and 90 is created for each of the similar parts. The tree structure of the similar parts is deleted, and information indicating the storage locations of the created data is stored instead.
[0147]
In the present embodiment, an example will be described in which the common data 86 is created including text data common to the similar parts, as well as descriptions using Xinclude elements and XSLT (XSL Transformations: XML data transformation language). These descriptions are explained later.
[0148]
FIG. 10 is a flowchart showing in detail the flow of the storage capacity reduction processing routine executed in the present embodiment. The storage capacity reduction process will be described in more detail with reference to FIG.
[0149]
In step 300, the XML processor 10 searches for the same tag name from the DOM tree stored in the tree structure storage area 16. That is, by searching for the same tag name, a portion that may be similar is extracted. Specifically, the XML processor 10 searches the stored DOM tree in order from the root node in the lower direction, and detects the same tag name.
[0150]
In step 302, it is determined whether or not a node having the same tag name has been detected in the stored DOM tree as a result of the search. If it is determined that a node with the same tag name could not be detected, it can be determined that there is no similar part in the DOM tree, and this storage capacity reduction processing routine is terminated.
[0151]
If it is determined in step 302 that a node with the same tag name has been detected, in step 304 the amount of nodes positioned below the detected node with the same tag name (lower nodes) is calculated. For example, in the example shown in FIG. 9A, the lower nodes of the node “C” having the same tag name are the nodes F, H, and G in the subtree 82, and the nodes F, H, and I in the subtree 84. Here, the amount of all lower nodes is calculated.
[0152]
In step 306, it is determined whether or not the calculated amount of lower nodes is equal to or greater than a predetermined amount. When the amount of lower nodes is less than the predetermined amount, it is determined that the shortage of capacity in the tree structure storage area 16 could not be resolved even if the subsequent processing were performed, because there are too few lower nodes, and the process proceeds to step 322.
[0153]
In step 322, it is determined whether or not the search for the DOM tree stored in the tree structure storage area 16 has been completed. If it is determined that the search has not ended, the process returns to step 300 to continue the storage capacity reduction process. If it is determined that the search has been completed, the present storage capacity reduction processing routine is terminated.
[0154]
On the other hand, if it is determined in step 306 that the amount of lower nodes is equal to or greater than the predetermined amount, a threshold is calculated in step 308 based on the amount of lower nodes. This threshold is an index of how many different parts may exist in a subtree detected as possibly similar for it still to be treated as a similar part. The threshold is calculated so that it increases as the amount of lower nodes increases.
[0155]
In step 310, a difference amount is calculated. Specifically, the amount of difference is calculated by comparing the tag name of each node of the lower node and the tag arrangement (arrangement state and contents), and extracting and counting the different portions. For example, in FIG. 9A, since the node G and the node I are different, this portion is a count target.
[0156]
When calculating the difference amount, the count number may be varied depending on the type of the difference portion. For example, the count number can be lowered when the contents (numbers, etc.) are different, and the count number can be increased when the tag names are different. This is because the description by the Xinclude element and XSLT is easy when the contents (numbers and the like) are different, and the description by these is difficult when the tag names are different. The Xinclude element and the XSLT are used when creating data common to similar parts in the storage capacity reduction process, which will be described later.
[0157]
Note that the setting of the count number for counting the difference amount may be set in advance or may be arbitrarily set by the user.
[0158]
In step 312, the calculated difference amount is compared with the calculated threshold value. If the difference amount is larger than the threshold value, it is determined that the portion detected as possibly similar is not a similar portion, and the process proceeds to step 322. If the search is not completed, the search process is repeated.
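The similarity decision of steps 302 to 312 can be sketched as follows. The sketch assumes nodes are `(tag, text, children)` tuples, a threshold proportional to the amount of lower nodes, and count weights of 1 for differing content and 3 for differing tag names. These values and the function names are hypothetical; the patent only states that the threshold grows with the amount of lower nodes and that the count may vary by type of difference.

```python
from itertools import zip_longest


def count_lower(node):
    """Amount of lower nodes (step 304): every node below `node`."""
    _, _, children = node
    return sum(1 + count_lower(c) for c in children)


def difference(a, b, w_content=1, w_tag=3):
    """Weighted difference amount (step 310), compared position by position."""
    if a is None or b is None or a[0] != b[0]:
        return w_tag                      # missing node or different tag name
    total = w_content if a[1] != b[1] else 0
    for child_a, child_b in zip_longest(a[2], b[2]):
        total += difference(child_a, child_b, w_content, w_tag)
    return total


def is_similar(a, b, min_amount=3, ratio=0.5):
    if a[0] != b[0]:                      # step 302: same tag name required
        return False
    amount = max(count_lower(a), count_lower(b))
    if amount < min_amount:               # step 306: too few lower nodes
        return False
    threshold = ratio * amount            # step 308: grows with the amount
    return difference(a, b) <= threshold  # step 312


part_82 = ("C", "", [("F", "a", []), ("H", "b", []), ("G", "1", [])])
part_84 = ("C", "", [("F", "a", []), ("H", "b", []), ("G", "2", [])])
part_xx = ("C", "", [("F", "a", []), ("H", "b", []), ("I", "1", [])])

print(is_similar(part_82, part_84))  # differs only in content
print(is_similar(part_82, part_xx))  # differs in a tag name
```

With these hypothetical weights, a content-only difference is accepted while a tag-name difference exceeds the threshold, reflecting the idea that content differences are easier to express in the shared data.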
[0159]
If the difference amount is equal to or smaller than the threshold in step 312, it is determined that the part detected as possibly similar is a similar part, and the process proceeds to step 314, where the types of the different parts extracted from the similar parts are analyzed. Depending on the type of the different part, it is determined whether to perform the insert data creation process, which creates data using the Xinclude element, or the conversion data creation process, which creates data using XSLT.
[0160]
If it is determined in step 316 that the type of the different part is compatible with the insert data creation process, the insert data creation process in step 318 is executed. If it is determined that the type of the difference is compatible with the conversion data creation process, the conversion data creation process in step 320 is executed.
[0161]
Here, the insert data creation process and the conversion data creation process will be described in detail.
[0162]
For example, when a different part includes many nodes having the same tag name and different contents, each similar part can be expressed by inserting a different part of each similar part into the common part. Therefore, in this case, it can be determined that it is suitable for the insert data creation process using the Xinclude element.
[0163]
FIGS. 11A and 11B are examples of XML documents suited to the insert data creation process. As shown in the drawing, the XML document indicated by the original data 1 in FIG. 11A and the XML document indicated by the original data 2 in FIG. 11B each include a different part, A1 or A2, in addition to the common part. The different parts A1 and A2 have the same kinds of tag names but different contents. Therefore, they can be determined to be suited to the insert data creation process.
[0164]
FIG. 12 is a flowchart showing a subroutine of insert data creation processing.
[0165]
In step 400, the XML processor 10 creates common data using the Xinclude element, and creates XML data of different parts for each similar part.
[0166]
In step 402, the created common data and XML data are stored.
[0167]
Since the created common data and XML data are text data, their data amount is small, and the storage capacity required to store them is smaller than that required to store the tree structure. In addition, because the parts that can be shared are consolidated into the common data, the storage capacity required for storage can be reduced further.
[0168]
Note that the storage location of the common data and the XML data is not particularly limited, and may be the HDD 28, for example. It can also be a storage device of another computer system connected to the network.
[0169]
In step 404, link information is created. Here, the link information is information indicating the storage location of the common data and XML data stored in step 402.
[0170]
In step 406, the tree structure of the similar part is deleted from the tree structure storage area 16, and each link information is stored in the tree structure storage area 16 in correspondence with each similar part. Further, in the present embodiment, a management table as shown in FIG. 14 is provided in, for example, the work area of the RAM 26 or the HDD 28, and each link information is stored and centrally managed. Thereby, when storing each link information corresponding to each similar part, the management number of the management table can be stored instead of each link information.
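The role of the management table can be sketched as a mapping from a management number to the pair of storage locations. The layout of FIG. 14 is not reproduced in the text, so the class `ManagementTable`, its methods, and the example paths below are purely hypothetical.

```python
class ManagementTable:
    """Maps a management number to the storage locations of the created data."""

    def __init__(self):
        self._rows = {}
        self._next_number = 1

    def register(self, common_location, xml_location):
        """Store one piece of link information and return its management number."""
        number = self._next_number
        self._rows[number] = (common_location, xml_location)
        self._next_number += 1
        return number

    def lookup(self, number):
        """Return (common data location, XML data location) for a management number."""
        return self._rows[number]


table = ManagementTable()
# Hypothetical storage locations; the common data is shared by both similar parts.
n1 = table.register("/data/common_86.xml", "/data/xml_88.xml")
n2 = table.register("/data/common_86.xml", "/data/xml_90.xml")
print(n1, table.lookup(n1))
print(n2, table.lookup(n2))
```

Storing only the small management number at the position of each deleted similar part keeps the tree structure storage area compact while the locations themselves are managed in one place.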
[0171]
FIGS. 13A, 13B, and 13C are diagrams showing common data and XML data created based on the original data 1 and the original data 2 in FIG. 11.
[0172]
As shown in FIG. 13B, the XML data 1 is data corresponding to the original data 1 in FIG. 11A, and a different portion A1 of the original data 1 is described. The XML data 2 in FIG. 13C is data corresponding to the original data 2 in FIG. 11B, and a different portion A2 of the original data 2 is described.
[0173]
In addition, the common data in FIG. 13A includes, besides the common part shared by the original data 1 and the original data 2, an Xinclude element A3 so that the data described in the XML data 1 or the XML data 2 can be inserted. The address specified by the Xinclude element A3 indicates the storage location of the XML data 1 and the XML data 2, and in this embodiment may be, for example, the storage location of the management table described above.
[0174]
Thus, the common data is created so as to include the common part in the similar part, and is created so that a different part in the similar part can be inserted.
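How a similar part might be restored from common data and XML data can be sketched with Python's standard `xml.etree.ElementInclude` module, which expands XInclude elements. The sample data, the placeholder `href` value, and the custom loader that resolves it to the XML data of the referenced similar part are assumptions for illustration and do not reproduce FIG. 13.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical common data: the shared part plus an XInclude placeholder.
COMMON = (
    '<job xmlns:xi="http://www.w3.org/2001/XInclude">'
    "<name>scan</name>"
    '<xi:include href="different-part"/>'
    "</job>"
)

# Hypothetical XML data: one different part per similar part.
XML_DATA = {
    1: "<settings><dpi>300</dpi><color>mono</color></settings>",
    2: "<settings><dpi>600</dpi><color>full</color></settings>",
}


def rebuild(similar_part_id):
    """Insert the different part of one similar part into the common data."""

    def loader(href, parse, encoding=None):
        # Assumption: the placeholder href resolves to the XML data of the
        # similar part being referenced (for example, via a management table).
        return ET.fromstring(XML_DATA[similar_part_id])

    root = ET.fromstring(COMMON)
    ElementInclude.include(root, loader=loader)
    return root


part1 = rebuild(1)
part2 = rebuild(2)
print(part1.findtext("settings/dpi"), part2.findtext("settings/dpi"))
```

Only the `settings` subtree is stored per similar part in this sketch; the shared `name` element is stored once in the common data.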
[0175]
On the other hand, for example, when different parts have different numbers and texts, but the contents are almost the same, each similar part can be expressed by converting the different parts of each similar part according to a predetermined rule. Therefore, in this case, it can be determined that the conversion data creation process using XSLT is suitable.
[0176]
FIGS. 15A and 15B show an example of XML documents suited to the conversion data creation process. As shown in the drawing, the XML document indicated by the original data 1 in FIG. 15A and the XML document indicated by the original data 2 in FIG. 15B each include a different part, B1 or B2, in addition to the common part. The different parts B1 and B2 have different numbers and texts, but their contents are almost the same. Therefore, they can be determined to be suited to the conversion data creation process.
[0177]
FIG. 16 is a flowchart showing a subroutine of conversion data creation processing.
[0178]
In step 500, the XML processor 10 creates common data using XSLT and creates XML data of different parts for each similar part.
[0179]
The processing from step 502 to step 506 is the same as the processing from step 402 to 406 of the subroutine for the insert data creation processing described above, and a description thereof will be omitted.
[0180]
FIGS. 17A, 17B, and 17C are diagrams showing common data and XML data created based on the original data 1 and the original data 2 in FIG. 15.
[0181]
As shown in FIG. 17A, the common data includes an XSLT conversion formula B3 in addition to the common part common to the original data 1 and the original data 2.
[0182]
The XML data 1 in FIG. 17B is data corresponding to the original data 1 and includes a description corresponding to the different part B1 of the original data 1. The XML data 2 in FIG. 17C is data corresponding to the original data 2 and includes a description corresponding to the different part B2 of the original data 2. As is apparent from the figures, the XML data does not describe the difference between the original data 1 and 2 in FIG. 15 verbatim. This is because content that can be converted using XSLT (numbers in this example) is handled by describing the XSLT conversion formula B3 in the common data.
[0183]
In the example shown in the figure, the conversion formula B3 constructs the different part of the original data by specifying the position() function, which indicates the position of the node to be copied, in the select attribute of the copy-of element, which copies a node as it is. The conversion formula B3 constructs the content of the character string described in the different part by specifying the current node (indicated by "." in the figure) in the select attribute of the value-of element, which outputs the value of each node as a character string.
[0184]
In this way, similar parts can be efficiently shared by using the XSLT transformation formula.
[0185]
As described above, after step 318 or step 320 in FIG. 10 is executed, in step 322, it is determined whether or not the search for the DOM tree stored in the tree structure storage area 16 has been completed. If it is determined that the search has not ended, the process returns to step 300 to continue the storage capacity reduction process. If it is determined that the search has been completed, the present storage capacity reduction processing routine is terminated.
[0186]
As described above, the tree structures of the similar parts are deleted, and common data indicating the information common to the similar parts and XML data indicating the information of the different parts are created and stored in correspondence with each similar part. Because information indicating the storage locations of the created data is stored in place of the tree structure, the storage capacity required to store the tree structure can be greatly reduced.
[0187]
In the present embodiment as well, as in the first embodiment, the storage capacity reduction process is not limited to the case where the capacity of the tree structure storage area 16 is insufficient; it may be executed every time a partial tree is created or at predetermined intervals.
[0188]
In the present embodiment, the example in which the storage capacity reduction process is executed on the DOM tree stored in the tree structure storage area 16 has been described. However, the storage capacity reduction process may instead be executed on the partial tree held in the work area, at the stage before it is stored in the tree structure storage area 16.
[0189]
Further, every time a partial tree is created, similar parts may be searched for by comparing the created partial tree with the DOM tree stored in the tree structure storage area 16. In this case, if a similar part exists, the part of the created partial tree other than the similar part is stored in the tree structure storage area 16 as a tree structure, and the tree structure of the similar part of the DOM tree already stored in the tree structure storage area 16 is deleted. For each similar part, common data and XML data are created and stored, and link information indicating their storage locations is stored in association with each similar part in place of the tree structure.
[0190]
Even with such a configuration, the storage capacity required for storing the tree structure can be greatly reduced.
[0191]
[Third Embodiment]
In the present embodiment, an example will be described in which, in the storage capacity reduction process described in the second embodiment, parts whose difference amount is 0 are determined to be the same part.
[0192]
Since the hardware configuration, software configuration, and main routine executed by the XML processor 10 in the present embodiment are the same as those in the first embodiment, description thereof will be omitted. In the storage capacity reduction processing routine executed by the XML processor 10, steps 600 to 608 shown in FIG. 18 are executed in place of step 312 of the storage capacity reduction processing routine described with reference to the flowchart of FIG. 10 in the second embodiment.
[0193]
After calculating the difference amount in step 310, in step 600, the XML processor 10 determines whether or not the difference amount is 0. When it is determined that the difference amount is 0, the parts detected as possibly similar are determined to be the same part, and the process proceeds to step 604. In steps 604 to 608, as in steps 204 to 208 described in the first embodiment, the same parts other than a specific same part are deleted from the DOM tree, link information indicating the storage location of the specific same part is created, and the link information is stored in association with each deleted same part. Thereafter, the process proceeds to step 322 in FIG. 10, and the search process is continued or the storage capacity reduction process is terminated.
[0194]
If it is determined in step 600 that the difference amount is not 0, it is determined that the parts detected as being likely to be similar are not the same part, and the process proceeds to step 602. In step 602, it is determined whether the difference amount is equal to or less than a threshold value. If the difference amount is less than or equal to the threshold value, it is determined that the portion detected as possibly similar is a similar portion, and the process proceeds to step 314. Since the subsequent processing is the same as that of the second embodiment, the description thereof is omitted.
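The two-stage decision of steps 600 and 602 can be sketched as follows. The function name, the return labels, and the "neither" outcome for a difference amount above the threshold are assumptions introduced for illustration; the description above only states the conditions for the same part and for a similar part.

```python
def classify(difference_amount, threshold):
    """Classify a pair of candidate parts by their difference amount."""
    if difference_amount == 0:          # step 600: no difference at all
        return "same"                   # handled by steps 604 to 608
    if difference_amount <= threshold:  # step 602: small difference
        return "similar"                # handled from step 314 onward
    return "neither"                    # assumed: left in the tree as is


# Hypothetical difference amounts (for example, byte counts) and threshold.
for amount in (0, 12, 500):
    print(amount, classify(amount, threshold=64))
```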
[0195]
As described above, since the storage capacity reduction process is performed for both the same part and the similar part as processing targets, the storage capacity necessary for storing the tree structure can be efficiently reduced.
[0196]
[Fourth Embodiment]
In the present embodiment, an example will be described in which a similar part deleted from the tree structure storage area 16 in the second and third embodiments is expanded into a tree structure when it is accessed, and in which the expanded similar part is deleted again either when the capacity of the tree structure storage area 16 is insufficient or when the expanded similar part has not been accessed for a predetermined time.
[0197]
Here, the term “development” refers to a process of creating a tree structure using the common data and XML data created by the above-described processing routine and storing them in the tree structure storage area 16.
[0198]
FIG. 19 is a flowchart showing a development processing routine in the present embodiment.
[0199]
In step 700, the XML processor 10 determines whether or not a similar part that has been deleted from the DOM tree has been accessed by the upper XML application 50. If it is determined that the similar part has been accessed, the common data and the XML data are read based on the link information of the common data and the XML data stored in the management table in correspondence with the similar part.
[0200]
In step 704, the common data and XML data that have been read are used to expand the similar part into its original tree structure. Specifically, the original XML document is created from the common data and the XML data, a tree structure is recreated from the created XML document, and the tree structure is stored in the tree structure storage area 16.
[0201]
In step 706, the link information stored corresponding to the similar part of the management table is deleted.
[0202]
In step 708, a timer Ti corresponding to the developed similar part is started. The timer Ti is provided for each expanded similar part, and is used to determine whether or not to delete the expanded similar part tree structure in a later-described deletion processing routine.
[0203]
As described above, because the tree structure is expanded when a similar part is accessed, the upper XML application 50 can access similar parts and parts other than similar parts without distinguishing between them.
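The development processing routine can be sketched as below. This is a simplified illustration under stated assumptions: the dictionaries standing in for the management table, the stored data, and the tree structure storage area 16 are hypothetical, and the merge of the common data with the XML data is reduced to appending one element rather than applying XInclude or XSLT.

```python
import time
import xml.etree.ElementTree as ET


def expand_on_access(part_id, management_table, data_store, tree_area,
                     timers, clock=time.monotonic):
    """Expand a deleted similar part when it is accessed (steps 700 to 708)."""
    link = management_table.get(part_id)
    if link is None:
        # No link information: the part is already held as a tree structure.
        return tree_area[part_id]
    # Read the common data and XML data from the linked storage locations.
    common = ET.fromstring(data_store[link["common"]])
    different = ET.fromstring(data_store[link["xml"]])
    common.append(different)            # simplified stand-in for the merge
    tree_area[part_id] = common         # step 704: store the expanded tree
    del management_table[part_id]       # step 706: delete the link information
    timers[part_id] = clock()           # step 708: start timer Ti
    return common


# Hypothetical contents, for illustration only.
data_store = {"c1": "<item><name>widget</name></item>",
              "x1": "<price>100</price>"}
management_table = {"part1": {"common": "c1", "xml": "x1"}}
tree_area, timers = {}, {}
tree = expand_on_access("part1", management_table, data_store,
                        tree_area, timers)
```

After the call, the part is reachable through the tree structure storage area like any other part, which is what lets the application ignore the distinction.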
[0204]
FIG. 20 is a flowchart showing a deletion processing routine in the present embodiment.
[0205]
In step 800, it is determined whether or not the capacity of the tree structure storage area 16 is insufficient. If it is determined that the capacity is not insufficient, the process proceeds to step 802.
[0206]
In step 802, the XML processor 10 determines whether or not the timer Ti has reached or exceeded the threshold value Tth. If the timer Ti is less than the threshold Tth, it is determined in step 804 whether or not the corresponding similar part has been accessed. If it is determined that access has been made, the timer Ti is reset and the routine returns to step 800. If it is determined in step 804 that there is no access, the process returns to step 800 without resetting the timer Ti.
[0207]
If it is determined in step 800 that the capacity of the tree structure storage area 16 is insufficient, or if it is determined in step 802 that the timer Ti has become equal to or greater than the threshold Tth, then in step 808 the expanded similar part corresponding to the timer Ti is deleted from the tree structure storage area 16.
[0208]
In step 810, link information corresponding to the deleted similar part is created and stored again in the management table. In step 812, the timer Ti is reset and the process ends.
[0209]
As described above, the expanded tree structure is deleted from the tree structure storage area 16 when the capacity of the tree structure storage area 16 becomes insufficient or when the expanded similar part is not accessed for a predetermined time. As a result, it is possible to prevent a situation in which the capacity of the tree structure storage area 16 becomes insufficient and an error occurs or access efficiency is reduced.
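One pass of the deletion processing routine can be sketched as follows. The names, the dictionaries, and the representation of each timer Ti as the time of its last reset are assumptions made for this sketch; in particular, the `locations` dictionary is a hypothetical record of where the common data and XML data of each expanded part are kept, so that the link information can be recreated.

```python
def note_access(part_id, last_reset, now):
    """Step 804: an access to an expanded part resets its timer Ti."""
    if part_id in last_reset:
        last_reset[part_id] = now


def collapse_pass(tree_area, management_table, locations, last_reset,
                  capacity_short, now, t_th):
    """Delete expanded similar parts that meet either deletion condition."""
    collapsed = []
    for part_id in list(last_reset):
        elapsed = now - last_reset[part_id]            # value of timer Ti
        if capacity_short or elapsed >= t_th:          # steps 800 and 802
            del tree_area[part_id]                     # step 808
            management_table[part_id] = locations[part_id]  # step 810
            del last_reset[part_id]                    # step 812
            collapsed.append(part_id)
    return collapsed


# Hypothetical state: part "a" is stale, part "b" was accessed recently.
tree_area = {"a": "tree-a", "b": "tree-b"}
locations = {"a": {"common": "c1", "xml": "x1"},
             "b": {"common": "c1", "xml": "x2"}}
management_table = {}
last_reset = {"a": 0.0, "b": 0.0}
note_access("b", last_reset, now=9.0)
collapsed = collapse_pass(tree_area, management_table, locations, last_reset,
                          capacity_short=False, now=10.0, t_th=5.0)
```

With `capacity_short=True` the same pass would collapse every expanded part regardless of its timer, which corresponds to the capacity branch of step 800.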
[0210]
[Effect of the Invention]
As described above, according to the present invention, the storage capacity required for storing a tree structure can be reduced without changing the hierarchical structure of the tree structure created from a document that hierarchically describes a plurality of elements.
[Brief description of the drawings]
FIG. 1A is a diagram conceptually showing a functional configuration of a structured document processing system according to the present embodiment and processing performed in the structured document processing system. FIG. 1B is a diagram showing in more detail the functional configuration of the structured document processing system shown in FIG. 1A and the processing performed in the structured document processing system.
FIG. 2 is a diagram showing an example of an XML document.
FIG. 3 is a diagram showing an example of a DOM tree created by analyzing the XML document in FIG. 2.
FIG. 4 is a block diagram showing a configuration of an information processing apparatus as hardware resources for realizing the function of an XML processor.
FIG. 5 is a flowchart showing a main routine of a processing routine executed by the XML processor.
FIG. 6 is a flowchart showing a storage capacity reduction processing routine according to the first embodiment.
FIG. 7 is a diagram showing an example of an XML document including the same part.
FIG. 8A is a diagram showing an example of a DOM tree created from an XML document that includes parts having the same elements, element contents, and hierarchical structure, and FIG. 8B is a diagram showing a state in which the tree structure of the same part has been deleted from the DOM tree of FIG. 8A.
FIG. 9A is an example of a DOM tree including a similar part, and FIG. 9B is a diagram showing a state in which the tree structure of the similar part has been deleted and information indicating the storage locations of the common data and XML data created in its place has been stored.
FIG. 10 is a flowchart showing a storage capacity reduction processing routine according to the second embodiment.
FIG. 11 is an example of an XML document suitable for insert data creation processing.
FIG. 12 is a flowchart showing a subroutine of insert data creation processing.
FIG. 13A is a diagram showing common data created based on the original data 1 and the original data 2 in FIG. 11, and FIGS. 13B and 13C are diagrams showing the XML data created based on the original data 1 and the original data 2 in FIG. 11.
FIG. 14 is an example of a management table storing information indicating storage locations of common data and XML data.
FIG. 15 is an example of an XML document suitable for conversion data creation processing.
FIG. 16 is a flowchart showing a subroutine of conversion data creation processing.
FIG. 17A is a diagram showing common data created based on the original data 1 and the original data 2 in FIG. 15, and FIGS. 17B and 17C are diagrams showing the XML data created based on the original data 1 and the original data 2 in FIG. 15.
FIG. 18 is a part of a flowchart of a storage capacity reduction processing routine according to the third embodiment.
FIG. 19 is a flowchart showing a development processing routine according to the fourth embodiment.
FIG. 20 is a flowchart showing a deletion processing routine according to the fourth embodiment.
[Explanation of symbols]
10 XML processor
11 Information processing device
16 Work area
18 Tree storage area
20 CPU
24 ROM
26 RAM
50 Upper XML application

Claims

Storage means for storing a tree structure corresponding to a document in which a plurality of elements are hierarchically described;
In the stored tree structure, the same part detecting means for detecting the same part in which the element, the content of the element, and the hierarchical structure of the elements are the same;
Generating means for generating information indicating a storage location of a specific identical part when the identical part is detected;
Processing means for, when the same part is detected, deleting from the storage means the same parts other than the specific same part and storing in the storage means, in correspondence with each deleted same part, the information indicating the storage location of the specific same part;
An information processing apparatus including:

A tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are described hierarchically;
In the created tree structure, the same part detecting means for detecting the same part having the same element, element content, and element hierarchical structure;
Generating means for generating information indicating a storage location of a specific identical part when the identical part is detected;
A part other than the same part of the created tree structure and the specific same part are stored, and information indicating a storage location of the specific same part is stored corresponding to the same part other than the specific same part Storage means;
An information processing apparatus including:

A tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are described hierarchically;
Same part detecting means for, each time a new tree structure is created by the tree structure creating means, comparing the tree structure already stored in the storage means with the tree structure newly created by the tree structure creating means and detecting, in the newly created tree structure, a same part whose elements, element contents, and element hierarchical structure are the same as those of the tree structure already stored in the storage means;
Generating means for generating, when the same part is detected, information indicating the storage location of the same part of the tree structure already stored in the storage means;
Processing means for, when the same part is detected, storing in the storage means the part of the created tree structure other than the same part and storing in the storage means, in correspondence with the same part of the created tree structure, the information indicating the storage location of the same part;
An information processing apparatus including:

Storage means for storing a tree structure corresponding to a document in which a plurality of elements are hierarchically described;
First extraction means for extracting, from the stored tree structure, parts each including an element having the same element name;
Second extraction means for extracting, from each of the parts extracted by the first extraction means, a part other than the element having the same element name as a different part;
Similarity determining means for determining that parts extracted by the first extraction means, for which the data amount of the different part extracted by the second extraction means is equal to or less than a predetermined value, are similar parts having similar elements, element contents, and element hierarchical structures;
Data creation means for creating, based on the similar parts determined by the similarity determining means, first data indicating information common to the similar parts and, for each similar part, second data indicating information on the different part of that similar part;
Generating means for generating information indicating a storage location of the first data and information indicating a storage location of each second data;
Processing means for deleting all of the similar parts from the storage means and storing in the storage means, in correspondence with the deleted similar parts, the information indicating the storage location of the first data and the information indicating the storage location of each second data;
An information processing apparatus including:

A tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are described hierarchically;
First extraction means for extracting, from the created tree structure, parts each including an element having the same element name;
Second extraction means for extracting, from each of the parts extracted by the first extraction means, a part other than the element having the same element name as a different part;
Similarity determining means for determining that parts extracted by the first extraction means, for which the data amount of the different part extracted by the second extraction means is equal to or less than a predetermined value, are similar parts having similar elements, element contents, and element hierarchical structures;
Data creation means for creating, based on the similar parts determined by the similarity determining means, first data indicating information common to the similar parts and, for each similar part, second data indicating information on the different part of that similar part;
Generating means for generating information indicating a storage location of the first data and information indicating a storage location of each second data;
Processing means for storing in storage means the part of the created tree structure other than the similar parts and storing in the storage means, in correspondence with the similar parts of the created tree structure, the information indicating the storage location of the first data and the information indicating the storage location of each second data;
An information processing apparatus including:

A tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are described hierarchically;
First extraction means for, each time a new tree structure is created by the tree structure creating means, comparing the tree structure already stored in the storage means with the tree structure newly created by the tree structure creating means and extracting, from the newly created tree structure, a part including an element having the same element name as an element included in the tree structure already stored in the storage means;
Second extraction means for extracting, from each of the parts extracted by the first extraction means, a part other than the element having the same element name as a different part;
Similarity determining means for determining that parts extracted by the first extraction means, for which the data amount of the different part extracted by the second extraction means is equal to or less than a predetermined value, are similar parts having similar elements, element contents, and element hierarchical structures;
Data creation means for creating, based on the similar parts determined by the similarity determining means, first data indicating information common to the similar parts and, for each similar part, second data indicating information on the different part of that similar part;
Generating means for generating information indicating a storage location of the first data and information indicating a storage location of each second data;
Processing means for storing in the storage means the part of the created tree structure other than the similar parts, deleting the similar parts from the storage means, and storing in the storage means, in correspondence with the deleted similar parts and the similar parts of the created tree structure, the information indicating the storage location of the first data and the information indicating the storage location of each second data;
An information processing apparatus including:

The information processing apparatus according to any one of claims 4 to 6, wherein the data creation means creates the second data so that it can be inserted into the first data.

The information processing apparatus according to any one of claims 4 to 7, wherein the data creation means creates the first data so that a conversion formula for converting the second data according to a predetermined rule is included.

The information processing apparatus according to claim 4, further comprising similar partial tree structure creating means for, when a similar part for which the storage location of the first data and the storage location of the second data are stored is referred to, creating a tree structure using the first data and the second data corresponding to the referenced similar part and storing the tree structure in the storage means.

The information processing apparatus according to claim 9, further comprising deletion means for deleting from the storage means the tree structure stored by the similar partial tree structure creating means in either of the following cases: when the free capacity of the storage means is equal to or less than a predetermined value, or when the tree structure stored by the similar partial tree structure creating means has not been referred to for a predetermined time.

A program for causing a computer to function as:
Same part detecting means for detecting, in a tree structure stored in storage means that stores a tree structure corresponding to a document in which a plurality of elements are described hierarchically, a same part in which the elements, the element contents, and the element hierarchical structure are the same;
Generating means for generating, when the same part is detected, information indicating a storage location of a specific same part; and
Processing means for, when the same part is detected, deleting from the storage means the same parts other than the specific same part and storing in the storage means, in correspondence with each deleted same part, the information indicating the storage location of the specific same part.

A program for causing a computer to function as:
Tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are described hierarchically;
Same part detecting means for detecting, in the created tree structure, a same part having the same elements, element contents, and element hierarchical structure;
Generating means for generating, when the same part is detected, information indicating a storage location of a specific same part; and
Processing means for storing in storage means the part of the created tree structure other than the same part together with the specific same part, and storing in the storage means, in correspondence with each same part other than the specific same part, the information indicating the storage location of the specific same part.

A program for causing a computer to function as:
Tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are described hierarchically;
Same part detecting means for, each time a new tree structure is created by the tree structure creating means, comparing the tree structure already stored in the storage means with the tree structure newly created by the tree structure creating means and detecting, in the newly created tree structure, a same part whose elements, element contents, and element hierarchical structure are the same as those of the tree structure already stored in the storage means;
Generating means for generating, when the same part is detected, information indicating the storage location of the same part of the tree structure already stored in the storage means; and
Processing means for, when the same part is detected, storing in the storage means the part of the created tree structure other than the same part and storing in the storage means, in correspondence with the same part of the created tree structure, the information indicating the storage location of the same part.

A program for causing a computer to function as:
First extraction means for extracting, from a tree structure stored in storage means that stores a tree structure corresponding to a document in which a plurality of elements are described hierarchically, parts each including an element having the same element name;
Second extraction means for extracting, from each of the parts extracted by the first extraction means, a part other than the element having the same element name as a different part;
Similarity determining means for determining that parts extracted by the first extraction means, for which the data amount of the different part extracted by the second extraction means is equal to or less than a predetermined value, are similar parts having similar elements, element contents, and element hierarchical structures;
Data creation means for creating, based on the similar parts determined by the similarity determining means, first data indicating information common to the similar parts and, for each similar part, second data indicating information on the different part of that similar part;
Generating means for generating information indicating a storage location of the first data and information indicating a storage location of each second data; and
Processing means for deleting all of the similar parts from the storage means and storing in the storage means, in correspondence with the deleted similar parts, the information indicating the storage location of the first data and the information indicating the storage location of each second data.

A program for causing a computer to function as:
Tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are described hierarchically;
First extraction means for extracting, from the created tree structure, parts each including an element having the same element name;
Second extraction means for extracting, from each of the parts extracted by the first extraction means, a part other than the element having the same element name as a different part;
Similarity determining means for determining that parts extracted by the first extraction means, for which the data amount of the different part extracted by the second extraction means is equal to or less than a predetermined value, are similar parts having similar elements, element contents, and element hierarchical structures;
Data creation means for creating, based on the similar parts determined by the similarity determining means, first data indicating information common to the similar parts and, for each similar part, second data indicating information on the different part of that similar part;
Generating means for generating information indicating a storage location of the first data and information indicating a storage location of each second data; and
Processing means for storing in storage means the part of the created tree structure other than the similar parts and storing in the storage means, in correspondence with the similar parts of the created tree structure, the information indicating the storage location of the first data and the information indicating the storage location of each second data.

A program for causing a computer to function as:
Tree structure creating means for creating a tree structure corresponding to a document in which a plurality of elements are described hierarchically;
First extraction means for, each time a new tree structure is created by the tree structure creating means, comparing the tree structure already stored in the storage means with the tree structure newly created by the tree structure creating means and extracting, from the newly created tree structure, a part including an element having the same element name as an element included in the tree structure already stored in the storage means;
Second extraction means for extracting, from each of the parts extracted by the first extraction means, a part other than the element having the same element name as a different part;
Similarity determining means for determining that parts extracted by the first extraction means, for which the data amount of the different part extracted by the second extraction means is equal to or less than a predetermined value, are similar parts having similar elements, element contents, and element hierarchical structures;
Data creation means for creating, based on the similar parts determined by the similarity determining means, first data indicating information common to the similar parts and, for each similar part, second data indicating information on the different part of that similar part;
Generating means for generating information indicating a storage location of the first data and information indicating a storage location of each second data; and
Processing means for storing in the storage means the part of the created tree structure other than the similar parts, deleting the similar parts from the storage means, and storing in the storage means, in correspondence with the deleted similar parts and the similar parts of the created tree structure, the information indicating the storage location of the first data and the information indicating the storage location of each second data.